Publications - Honda Research Institute USA
HRI Publications
Publishing allows us to share ideas with the scientific community and establishes our researchers, and the organization as a whole, as credible, contributing members of that community. Such open exchange of information facilitates innovation.
You can search our peer-reviewed publications written by HRI-US researchers, sometimes in collaboration with the academic community. Start your journey by filtering the list by recent projects and research areas, or by searching through the catalog of archived publications.
Transition metal dichalcogenides exhibit a variety of electronic behaviors depending on the number of layers and width. Therefore, developing facile methods for their controllable synthesis is of central importance. We found that nickel nanoparticles promote heterogeneous nucleation of the first layer of molybdenum disulfide while simultaneously catalyzing homoepitaxial tip growth of a second layer via a vapor-liquid-solid (VLS) mechanism, resulting in bilayer nanoribbons with width controlled by the nanoparticle diameter. Simulations further confirm the VLS growth mechanism toward nanoribbons and its orders-of-magnitude higher growth speed compared to the conventional noncatalytic growth of flakes. Width-dependent Coulomb blockade oscillation, observed in the transfer characteristics of the nanoribbons at temperatures up to 60 K, demonstrates the value of this synthesis strategy for future nanoelectronics.
The configuration and local environment of active sites in transition metal dichalcogenides can significantly alter their electrocatalytic activity toward the hydrogen evolution reaction (HER). Herein, we demonstrate that the HER activity of monolayer MoS2 electrocatalysts can be enhanced through the modulation of active sites by introducing a molecular mediator that alters the coverage of adsorbed protons. Sodium dodecyl sulfate (SDS) promotes the intrinsic HER activity of both terrace-based sulfur vacancies (VS) and edge sites during HER operation in an acidic environment, leading to increases in the turnover frequency (TOF) of both sites by up to 5 orders of magnitude. Simulations indicate that SDS facilitates proton adsorption by capturing protons from hydronium ions and releasing them to VS, which reduces the energy barrier by creating a staircase-like free energy profile. Our results highlight the ability to tailor the activity of electrocatalysts by synergistically combining proton transfer mediators with engineered active sites.
Predicting future trajectories of traffic agents in highly interactive environments is an essential and challenging problem for the safe operation of autonomous driving systems. Given that self-driving vehicles are equipped with various types of sensors (e.g., LiDAR scanner, RGB camera, radar, etc.), we propose a Cross-Modal Embedding framework that aims to benefit from the use of multiple input modalities. At training time, our model learns to embed a set of complementary features in a shared latent space by jointly optimizing the objective functions across different types of input data. At test time, a single input modality (e.g., LiDAR data) is required to generate predictions from the input perspective (i.e., in the LiDAR space), while still benefiting from the model trained with multiple sensor modalities. An extensive evaluation is conducted to show the efficacy of the proposed framework using two benchmark driving datasets.
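The cross-modal idea above can be sketched briefly. The following is a minimal illustration, not the authors' implementation: the encoder shapes, the loss weight, and the simple L2 alignment term are all assumptions standing in for the paper's joint objective.

```python
# Minimal sketch of cross-modal embedding training (assumed details, not the
# paper's code): two modality encoders are pulled toward a shared latent space
# so that a single modality suffices to decode a prediction at test time.
import torch
import torch.nn as nn
import torch.nn.functional as F

lidar_enc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
rgb_enc   = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 16))
decoder   = nn.Linear(16, 20)  # flattened 10-step (x, y) future trajectory

params = [*lidar_enc.parameters(), *rgb_enc.parameters(), *decoder.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

lidar_feat = torch.randn(8, 64)   # stand-in per-agent features for one batch
rgb_feat   = torch.randn(8, 128)
future_xy  = torch.randn(8, 20)   # ground-truth future trajectory

z_lidar, z_rgb = lidar_enc(lidar_feat), rgb_enc(rgb_feat)
pred_loss  = F.mse_loss(decoder(z_lidar), future_xy) + \
             F.mse_loss(decoder(z_rgb), future_xy)
align_loss = F.mse_loss(z_lidar, z_rgb)        # pull the embeddings together
(pred_loss + 0.1 * align_loss).backward()
opt.step()

with torch.no_grad():                          # test time: LiDAR branch alone
    pred = decoder(lidar_enc(lidar_feat))
```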
A human-centered robot needs to reason about the cognitive limitations and potential irrationality of its human partner to achieve seamless interactions. This paper proposes a novel anytime game-theoretic planning framework that integrates iterative reasoning models, partially observable Markov decision process, and Monte-Carlo belief tree search for robot behavioral planning. Our planner equips a robot with the ability to reason about its human partner’s latent cognitive states (bounded intelligence and irrationality) and enables the robot to actively learn these latent states to better maximize its utility. Furthermore, our planner handles safety explicitly by enforcing chance constraints. We validate our approach in an autonomous driving domain where our behavioral planner and a low-level motion controller hierarchically control an autonomous car to negotiate traffic merges. Simulations and user studies are conducted to show our planner’s effectiveness.
This paper considers the problem of multi-modal future trajectory forecast with ranking. Here, multi-modality and ranking refer to the multiple plausible path predictions and the confidence in those predictions, respectively. We propose Social-STAGE, Social interaction-aware SpatioTemporal multi-Attention Graph convolution network with novel Evaluation for multi-modality. Our main contributions include analysis and formulation of multi-modality with ranking using interaction and multi-attention, and introduction of new metrics to evaluate the diversity and associated confidence of multi-modal predictions. We evaluate our approach on the existing public datasets ETH and UCY and show that the proposed algorithm outperforms the state of the art on these datasets.
Deep reinforcement learning (DRL) provides a promising way for learning navigation in complex autonomous driving scenarios. However, identifying the subtle cues that can indicate drastically different outcomes remains an open problem in designing autonomous systems that operate in human environments. In this work, we show that explicitly inferring the latent state and encoding spatial-temporal relationships in a reinforcement learning framework can help address this difficulty. We encode prior knowledge on the latent states of other drivers through a framework that combines the reinforcement learner with a supervised learner. In addition, we model the influence passing between different vehicles through graph neural networks (GNNs). The proposed framework significantly improves performance in the context of navigating T-intersections compared with state-of-the-art baseline approaches.
Constructing realistic and real time human-robot interaction models is a core challenge in crowd navigation. In this paper we derive a robot-agent interaction density from first principles of probability theory; we call our approach “first order interacting Gaussian processes” (foIGP). Furthermore, we compute locally optimal solutions—with respect to multi-faceted agent “intent” and “flexibility”—in near real time on a laptop CPU. We test on challenging scenarios from the ETH crowd dataset and show that the safety and efficiency statistics of foIGP are competitive with human safety and efficiency statistics. Further, we compute the safety and efficiency statistics of dynamic window avoidance, a physics based model variant of foIGP, a Monte Carlo inference based approach, and the best performing deep reinforcement learning algorithm; foIGP outperforms all of them.
Multi-agent interacting systems are prevalent in the world, from pure physical systems to complicated social dynamic systems. In many applications, effective understanding of the situation and accurate trajectory prediction of interactive agents play a significant role in downstream tasks, such as decision making and planning. In this paper, we propose a generic trajectory forecasting framework (named EvolveGraph) with explicit relational structure recognition and prediction via latent interaction graphs among multiple heterogeneous, interactive agents. Considering the uncertainty of future behaviors, the model is designed to provide multi-modal prediction hypotheses. Since the underlying interactions may evolve even with abrupt changes, and different modalities of evolution may lead to different outcomes, we address the necessity of dynamic relational reasoning and adaptively evolving the interaction graphs. We also introduce a double-stage training pipeline which not only improves training efficiency and accelerates convergence, but also enhances model performance. The proposed framework is evaluated on both synthetic physics simulations and multiple real-world benchmark datasets in various areas. The experimental results illustrate that our approach achieves state-of-the-art performance in terms of prediction accuracy.
Traditional planning and control methods could fail to find a feasible trajectory for an autonomous vehicle to execute amongst dense traffic on roads. This is because the obstacle-free volume in spacetime is very small in these scenarios for the vehicle to drive through. However, that does not mean the task is infeasible since human drivers are known to be able to drive amongst dense traffic by leveraging the cooperativeness of other drivers to open a gap. The traditional methods fail to take into account the fact that the actions taken by an agent affect the behavior of other vehicles on the road. In this work, we rely on the ability of deep reinforcement learning to implicitly model such interactions and learn a continuous control policy over the action space of an autonomous vehicle. The application we consider requires our agent to negotiate and open a gap in the road in order to successfully merge or change lanes. Our policy learns to repeatedly probe into the target road lane while trying to find a safe spot to move in to. We compare against two model-predictive control-based algorithms and show that our policy outperforms them in simulation.
We propose a Deep RObust Goal-Oriented trajectory prediction Network (DROGON) for accurate vehicle trajectory prediction by considering behavioral intentions of vehicles in traffic scenes. Our main insight is that the behavior (i.e., motion) of drivers can be reasoned from their high level possible goals (i.e., intention) on the road. To succeed in such behavior reasoning, we build a conditional prediction model to forecast goal-oriented trajectories with the following stages: (i) relational inference where we encode relational interactions of vehicles using the perceptual context; (ii) intention estimation to compute the probability distributions of intentional goals based on the inferred relations; and (iii) behavior reasoning where we reason about the behaviors of vehicles as trajectories conditioned on the intentions. In addition, we extend the proposed framework to the pedestrian trajectory prediction task, showing its potential applicability toward general trajectory prediction.
Tactile sensing is inherently contact based. To use tactile data, robots need to make contact with the surface of an object. This is inefficient in applications where an agent needs to make a decision between multiple alternatives that depend on the physical properties of the contact location. We propose a method to obtain tactile data in a non-invasive manner. The proposed method estimates the output of a tactile sensor from the depth data of the surface of the object based on past experiences. An experience dataset is built by allowing the robot to interact with various objects, collecting tactile data and the corresponding object surface depth data. We use the experience dataset to train a neural network to estimate the tactile output from depth data alone. We use GelSight tactile sensors, which are image-based, to generate images that capture detailed surface features at the contact location. We train a network with a dataset containing 578 tactile-image to depth-map correspondences. Given a depth map of the surface of an object, the network outputs an estimate of the response of the tactile sensor, should it make contact with the object. We evaluate the method with the structural similarity index measure (SSIM), a similarity metric between two images commonly used in the image processing community. We present experimental results showing that the proposed method outperforms, with statistical significance, a baseline that uses random images, achieving SSIM scores of 0.84 ± 0.0056 and 0.80 ± 0.0036, respectively.
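SSIM, the evaluation metric named above, compares local luminance, contrast, and structure between two images. A minimal sketch using scikit-image, with random arrays standing in for the measured and predicted GelSight images:

```python
# Scoring a predicted tactile image against a real GelSight reading with SSIM.
# The arrays are random stand-ins; real use would load actual sensor images.
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)
real_tactile = rng.random((240, 320))                     # measured image
pred_tactile = np.clip(real_tactile
                       + 0.05 * rng.standard_normal((240, 320)), 0.0, 1.0)

score = ssim(real_tactile, pred_tactile, data_range=1.0)
print(f"SSIM = {score:.3f}")   # 1.0 would mean identical images
```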
Sequential pulling policies to flatten and smooth fabrics have applications from surgery to manufacturing to home tasks such as bed making and folding clothes. Due to the complexity of fabric states and dynamics, we apply deep imitation learning to learn policies that, given color (RGB), depth (D), or combined color-depth (RGBD) images of a rectangular fabric sample, estimate pick points and pull vectors to spread the fabric to maximize coverage. To generate data, we develop a fabric simulator and an algorithmic supervisor that has access to complete state information. We train policies in simulation using domain randomization and dataset aggregation (DAgger) on three tiers of difficulty in the initial randomized configuration. We present results comparing five baseline policies to learned policies and report systematic comparisons of RGB vs D vs RGBD images as inputs. In simulation, learned policies achieve comparable or superior performance to analytic baselines. In 180 physical experiments with the da Vinci Research Kit (dVRK) surgical robot, RGBD policies trained in simulation attain coverage of 83% to 95% depending on difficulty tier, suggesting that effective fabric smoothing policies can be learned from an algorithmic supervisor and that depth sensing is a valuable addition to color alone. Supplementary material is available at https://sites.google.com/view/fabric-smoothing.
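The DAgger procedure referenced above alternates between rolling out the current learner and relabeling the states it visits with the supervisor's actions. The toy dynamics, linear policy class, and supervisor below are illustrative placeholders, not the fabric simulator from the paper:

```python
# Schematic DAgger loop: roll out the current policy, ask the state-aware
# supervisor to relabel the visited states, aggregate, and refit the policy.
import numpy as np

def supervisor_action(state):
    return -state                       # placeholder corrective action

def rollout(policy_w, steps=10):
    states, s = [], np.ones(4)
    for _ in range(steps):
        states.append(s.copy())
        s = s + 0.1 * (policy_w @ s)    # toy dynamics under learner's action
    return states

dataset, w = [], np.zeros((4, 4))       # linear policy: action = w @ state
for it in range(5):
    for s in rollout(w):
        dataset.append((s, supervisor_action(s)))     # supervisor relabels
    X = np.array([s for s, _ in dataset])
    Y = np.array([a for _, a in dataset])
    w = np.linalg.lstsq(X, Y, rcond=None)[0].T        # refit on the aggregate
```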
Interaction research has initially focused on partially and conditionally automated vehicles. Augmented Reality (AR) may provide a promising way to enhance drivers' experience when using autonomous driving (AD) systems. This study sought to gain insights into drivers' subjective assessment of a simulated driving automation system with AR-based support. A driving simulator study was conducted, and participants' ratings of the AD system in terms of information imparting, nervousness, and trust were collected. Cumulative Link Models (CLMs) were developed to investigate the impacts of AR cues, traffic density, and intersection complexity on drivers' attitudes towards the presented AD system. Random effects were incorporated in the CLMs to account for the heterogeneity among participants. Results indicated that AR graphic cues could significantly improve drivers' experience by providing advice for their decision-making and mitigating their anxiety and stress. However, the magnitude of AR's effect was impacted by traffic conditions (i.e., it diminished at more complex intersections). The study also revealed a strong correlation between self-rated trust and takeover frequency, suggesting that takeover and other driving behaviors need to be further examined in future studies.
Eye-tracking techniques have the potential for estimating driver awareness of road hazards. However, traditional eye-movement measures based on static areas of interest may not capture the unique characteristics of driver eye-glance behavior and challenge the real-time application of the technology on the road. This article proposes a novel method to operationalize driver eye-movement data analysis based on moving objects of interest. A human-subject experiment conducted in a driving simulator demonstrated the potential of the proposed method. Correlation and regression analyses between indirect (i.e., eye-tracking) and direct measures of driver awareness identified some promising variables that feature both spatial and temporal aspects of driver eye-glance behavior relative to objects of interest. Results also suggest that eye-glance behavior might be a promising but insufficient predictor of driver awareness. This work is a preliminary step toward real-time, on-road estimation of driver awareness of road hazards. The proposed method could be further combined with computer-vision techniques such as object recognition to fully automate eye-movement data processing as well as machine learning approaches to improve the accuracy of driver awareness estimation.
We propose a robust solution to future trajectory forecast, which is practically applicable to autonomous agents in highly crowded environments. For this, three aspects are particularly addressed in this paper. First, we use composite fields to predict future locations of all road agents in a single shot, which results in a constant time complexity, regardless of the number of agents in the scene. Second, interactions between agents are modeled as a non-local response, enabling spatial relationships between different locations to be captured temporally as well (i.e., in spatio-temporal interactions). Third, the semantic context of the scene is modeled to take into account the environmental constraints that potentially influence future motion. Finally, we validate the robustness of the proposed approach using the ETH, UCY, and SDD datasets and highlight its practical functionality compared to the current state-of-the-art methods.
This paper presents a trajectory planning algorithm for person following that is more comprehensive than existing algorithms. This algorithm is tailored for a front-wheel-steered vehicle, is designed to follow a person while avoiding collisions with both static and moving obstacles, simultaneously optimizing speed and steering, and minimizing control effort. This algorithm uses nonlinear model predictive control, where the underlying trajectory optimization problem is approximated using a simultaneous method. Results collected in an unknown environment show that the proposed planning algorithm works well with a perception algorithm to follow a person in uneven grass near obstacles and over ditches and curbs, and on asphalt over train tracks and near buildings and cars. Overall, the results indicate that the proposed algorithm can safely follow a person in unknown, dynamic environments.
In this letter, we present a method for resolving kinematic redundancy using a human motion database, with application to teleoperation of bimanual humanoid robots using low-cost devices. Handheld devices for virtual reality applications can realize low-cost interfaces for operating such robots but available information does not uniquely determine the arm configuration. The resulting arm motions may be unnatural and inconsistent due to the kinematic redundancy. The idea explored in this paper is to construct a human motion database in advance using an interface that can directly measure the whole arm configuration such as motion capture. During teleoperation, the database is used to infer the appropriate arm configuration, grasp forces, and object trajectory based on the end effector trajectories measured by low-cost devices. The database employs Bayesian Interaction Primitives that have been used for modeling human-robot interactions.
Spatio-temporal action localization is an important problem in computer vision that involves detecting where and when activities occur, and therefore requires modeling of both spatial and temporal features. This problem is typically formulated in the context of supervised learning, where the learned classifiers operate on the premise that both training and test data are sampled from the same underlying distribution. However, this assumption does not hold when there is a significant domain shift, leading to poor generalization performance on the test data. To address this, we focus on the hard and novel task of generalizing training models to test samples without access to any labels from the latter for spatio-temporal action localization by proposing an end-to-end unsupervised domain adaptation algorithm. We extend the state-of-the-art object detection framework to localize and classify actions. In order to minimize the domain shift, three domain adaptation modules at image level (temporal and spatial) and instance level (temporal) are designed and integrated. We design a new experimental setup and evaluate the proposed method and different adaptation modules on the UCF-Sports, UCF-101 and JHMDB benchmark datasets. We show that significant performance gain can be achieved when spatial and temporal features are adapted separately, or jointly for the most effective results.
Properly calibrated human trust is essential for successful interaction between humans and automation. However, while human trust calibration can be improved by increased automation transparency, too much transparency can overwhelm human workload. To address this tradeoff, we present a probabilistic framework using a partially observable Markov decision process (POMDP) for modeling the coupled trust-workload dynamics of human behavior when interacting with automation. We specifically consider hands-off Level 2 driving automation in a city environment involving multiple intersections where the human chooses whether or not to rely on the automation. We consider automation reliability, automation transparency, and scene complexity, along with human reliance and eye-gaze behavior, to model the dynamics of human trust and workload. We demonstrate that our model framework can appropriately vary automation transparency based on real-time human trust and workload belief estimates to achieve trust calibration.
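The belief estimates mentioned above come from standard POMDP filtering. A generic sketch of the discrete belief update b'(s') ∝ O(o | s') Σ_s T(s' | s) b(s); the two-state trust model and the matrices here are illustrative placeholders, not the paper's calibrated values:

```python
# Generic discrete POMDP belief update of the kind underlying such estimates.
import numpy as np

T = np.array([[0.8, 0.2],    # T[s, s']: transition, e.g., low -> {low, high}
              [0.1, 0.9]])
O = np.array([[0.7, 0.3],    # O[s', o]: likelihood of each observation
              [0.2, 0.8]])   # (e.g., a reliance or eye-gaze cue)

def belief_update(b, o):
    b_pred = b @ T                     # predict the next latent state
    b_new = b_pred * O[:, o]           # weight by the observation likelihood
    return b_new / b_new.sum()         # renormalize

b = np.array([0.5, 0.5])               # uncertain initial trust belief
b = belief_update(b, o=1)              # e.g., observed reliance on automation
```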
New developments in advanced driver assistance systems (ADAS) can help drivers deal with risky driving maneuvers, preventing potential hazard scenarios. A key challenge in these systems is to determine when to intervene. While there are situations where the need for intervention or feedback is clear (e.g., lane departure), it is often difficult to determine scenarios that deviate from normal driving conditions. These scenarios can appear due to errors by the drivers, the presence of pedestrians or bicycles, or maneuvers from other vehicles. We formulate this problem as driving anomaly detection, where the goal is to automatically identify cases that require intervention. Towards addressing this challenging but important goal, we propose a multimodal system that considers (1) physiological signals from the driver, and (2) vehicle information obtained from the controller area network (CAN) bus sensor. The system relies on conditional generative adversarial networks (GAN) where the models are constrained by the signals previously observed. The difference of the scores in the discriminator between the predicted and actual signals is used as a metric for detecting driving anomalies. We collected and annotated a novel dataset for driving anomaly detection tasks, which is used to validate our proposed models. We present the analysis of the results, and perceptual evaluations which demonstrate the discriminative power of this unsupervised approach for detecting driving anomalies.
A significant number of people die in road accidents due to driver errors. To reduce fatalities, there is an urgent need to develop intelligent driving systems that assist drivers in identifying potential risks. Risky situations are generally defined based on collision prediction in the existing works. However, collision is only one source of potential risks, and a more generic definition is required. In this work, we propose a novel driver-centric definition of risk, i.e., objects influencing drivers' behavior are risky. A new task called risk object identification is introduced. We formulate the task as a cause-effect problem and present a novel two-stage risk object identification framework based on causal inference with the proposed object-level manipulable driving model. We demonstrate favorable performance on risk object identification compared with strong baselines on the Honda Research Institute Driving Dataset (HDD). Our framework achieves a substantial average performance boost of 7.5% over a strong baseline.
Maneuvering in dense traffic is a challenging task for autonomous vehicles because it requires reasoning about the stochastic behaviors of many other participants. In addition, the agent must achieve the maneuver within a limited time and distance. In this work, we propose a combination of reinforcement learning and game theory to learn merging behaviors. We design a training curriculum for a reinforcement learning agent using the concept of level-k behavior. This approach exposes the agent to a broad variety of behaviors during training, which promotes learning policies that are robust to model discrepancies. We show that our approach learns more efficient policies than traditional training methods.
Driving anomaly detection is an important problem in advanced driver assistance systems (ADAS). The ability to immediately detect potentially hazardous scenarios will prevent accidents by allowing enough time to react. Toward this goal, our previous work proposed an unsupervised driving anomaly detection system using a conditional generative adversarial network (GAN), which was built with physiological data and features extracted from the controller area network bus (CAN-Bus). The approach generates predictions for the upcoming driving recordings, constrained by the previously observed signals. These predictions were contrasted with actual physiological and CAN-Bus signals by subtracting the corresponding activation outputs from the discriminator. Instead, this study proposes to use a triplet-loss function to contrast the predicted and actual signals. The triplet-loss function creates an unsupervised framework that rewards predictions closer to the actual signals, and penalizes predictions deviating from the expected signals. This approach maximizes the discriminative power of feature embeddings to detect anomalies, leading to measurable improvements over the results observed by our previous approach. The study is implemented and evaluated with recordings from the driving anomaly dataset (DAD), which includes 250 hours of naturalistic data manually annotated with driving events. Objective and subjective metrics validate the benefits of using the proposed triplet-loss function for driving anomaly detection.
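A minimal sketch of the triplet contrast described above, with assumed shapes and pairing: the anchor embeds the model's predicted signal, the positive an actual signal from a normal segment, and the negative one from a deviating segment; the embedding distance then doubles as an anomaly score.

```python
# Minimal sketch of triplet-based anomaly scoring (shapes are assumptions).
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))
triplet = nn.TripletMarginLoss(margin=1.0)

anchor   = embed(torch.randn(4, 32))   # GAN-predicted signal windows
positive = embed(torch.randn(4, 32))   # actual signals, normal segments
negative = embed(torch.randn(4, 32))   # actual signals, deviating segments

triplet(anchor, positive, negative).backward()

# At test time the anchor-to-actual embedding distance is the anomaly score:
score = torch.norm(anchor - positive, dim=1)
```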
Understanding and predicting pedestrian behavior is an important and challenging area of research for realizing safe and effective navigation strategies in automated and advanced driver assistance technologies in urban scenes. This paper focuses on monocular pedestrian action recognition and 3D localization from an egocentric view for the purpose of predicting intention and forecasting future trajectory. A challenge in addressing this problem in urban traffic scenes is attributed to the unpredictable behavior of pedestrians, whereby actions and intentions are constantly in flux and depend on the pedestrian's pose, their 3D spatial relations, and their interaction with other agents as well as with the environment. To partially address these challenges, we consider the importance of pose toward recognition and 3D localization of pedestrian actions. In particular, we propose an action recognition framework using a two-stream temporal relation network with inputs corresponding to the raw RGB image sequence of the tracked pedestrian as well as the pedestrian pose. The proposed method outperforms methods using a single-stream temporal relation network based on evaluations using the JAAD public dataset. The estimated pose and associated body key-points are also used as input to a network that estimates the 3D location of the pedestrian using a unique loss function. The evaluation of our 3D localization method on the KITTI dataset indicates the improvement of the average localization error as compared to existing state-of-the-art methods. Finally, we conduct qualitative tests of action recognition and 3D localization on HRI’s H3D driving dataset.
Understanding how personalities relate to driving styles is crucial for improving Advanced Driver Assistance Systems (ADASs) and driver-vehicle interactions. Focusing on the "high-risk" population of young male drivers, the objective of this study is to investigate the association between personality traits and driving styles. An online survey study was conducted among 46 males aged 21-30 to gauge their personality traits, self-reported driving style, and driving history. Hierarchical Clustering was proposed to identify driving styles and revealed two subgroups of drivers who had either a "risky" or a "compliant" driving style. Compared to the compliant group, the risky cluster sped more frequently, was easily distracted and affected by negative emotion, and often behaved recklessly. The logit model results showed that the risky driving style was associated with lower Agreeableness and Conscientiousness, but higher driving exposure. An interaction effect between age and Extraversion in forming a risky driving style was also detected.
Robotic fabric manipulation has applications in home robotics, textiles, senior care and surgery. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We extend the Visual Foresight framework to learn fabric dynamics that can be efficiently reused to accomplish different fabric manipulation tasks with a single goal-conditioned policy. We introduce VisuoSpatial Foresight (VSF), which builds on prior work by learning visual dynamics on domain randomized RGB images and depth maps simultaneously and completely in simulation. We experimentally evaluate VSF on multi-step fabric smoothing and folding tasks against 5 baseline methods in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at train or test time. Furthermore, we find that leveraging depth significantly improves performance. RGBD data yields an 80% improvement in fabric folding success rate over pure RGB data. Code, data, videos, and supplementary material are available at https://sites.google.com/view/fabric-vsf/.
This paper introduces an accurate nonlinear model predictive control-based algorithm for trajectory following. For accuracy, the algorithm incorporates both the planned state and control trajectories into its cost functional. Current following algorithms do not incorporate control trajectories into their cost functionals. Comparisons are made against two trajectory following algorithms, where the trajectory planning problem is to safely follow a person using an automated ATV with control delays in a dynamic environment while simultaneously optimizing speed and steering, minimizing control effort, and minimizing the time-to-goal. Results indicate that the proposed algorithm reduces collisions, tracking error, orientation error, and time-to-goal. Therefore, tracking the control trajectories with the trajectory following algorithm helps the vehicle follow the planned state trajectories more accurately, which ultimately improves safety, especially in dynamic environments.
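The distinguishing term named above, penalizing deviation from the planned control trajectory alongside the planned state trajectory, can be written as a standard quadratic tracking cost. A sketch with illustrative weights Q and R (not the paper's values):

```python
# Quadratic tracking cost over both planned states and planned controls; the
# weights Q and R and the horizon length are illustrative assumptions.
import numpy as np

def following_cost(x, u, x_ref, u_ref, Q, R):
    """J = sum_k dx_k' Q dx_k + du_k' R du_k, with dx = x - x_ref etc."""
    dx, du = x - x_ref, u - u_ref
    return float(np.einsum('ki,ij,kj->', dx, Q, dx) +
                 np.einsum('ki,ij,kj->', du, R, du))

x, x_ref = np.zeros((20, 4)), np.ones((20, 4))        # executed vs. planned states
u, u_ref = np.zeros((20, 2)), 0.5 * np.ones((20, 2))  # and control trajectories
print(following_cost(x, u, x_ref, u_ref, Q=np.eye(4), R=0.1 * np.eye(2)))
```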
This paper presents an online smooth-path lane-change control framework. We focus on dense traffic where inter-vehicle space gaps are narrow and cooperation with surrounding drivers is essential to achieve the lane-change maneuver. We propose a two-stage control framework that harmonizes Model Predictive Control (MPC) with Generative Adversarial Networks (GAN) by utilizing driving intentions to generate smooth lane-change maneuvers. To improve performance in practice, the system is augmented with an adaptive safety boundary and a Kalman Filter to mitigate sensor noise. Simulation studies are conducted at different levels of traffic density and cooperativeness of other drivers. The simulation results support the effectiveness, driving comfort, and safety of the proposed method.
We consider the problem of predicting the future trajectory of scene agents from egocentric views obtained from a moving platform. This problem is important in a variety of domains, particularly for autonomous systems making reactive or strategic decisions in navigation. In an attempt to address this problem, we introduce TITAN (Trajectory Inference using Targeted Action priors Network), a new model that incorporates prior positions, actions, and context to forecast future trajectory of agents and future ego-motion. In the absence of an appropriate dataset for this task, we created the TITAN dataset that consists of 700 labeled video-clips (with odometry) captured from a moving vehicle on highly interactive urban traffic scenes in Tokyo. Our dataset includes 50 labels including vehicle states and actions, pedestrian age groups, and targeted pedestrian action attributes that are organized hierarchically corresponding to atomic, simple/complex-contextual, transportive, and communicative actions. To evaluate our model, we conducted extensive experiments on the TITAN dataset, revealing significant performance improvement against baselines and state-of-the-art algorithms. We also report promising results from our Agent Importance Mechanism (AIM), a module which provides insight into assessment of perceived risk by calculating the relative influence of each agent on the future ego-trajectory. The dataset is available at https://usa.honda-ri.com/titan
The development of active, durable, and nonprecious electrocatalysts for hydrogen electrochemistry is highly desirable but challenging. In this work, we design and fabricate a novel interface catalyst of Ni and Co2N (Ni/Co2N) for hydrogen evolution reaction (HER) and hydrogen oxidation reaction (HOR). The Ni/Co2N interfacial catalysts not only achieve a current density of −10.0 mA cm–2 with an overpotential of 16.2 mV for HER but also provide a HOR current density of 2.35 mA cm–2 at 0.1 V vs reversible hydrogen electrode (RHE). Furthermore, the electrode couple made of the Ni/Co2N interfacial catalysts requires only a cell voltage of 1.57 V to gain a current density of 10 mA cm–2 for overall water splitting. Hybridization among the Ni-3d, N-2p, and Co-3d orbitals results in charge transfer across the interfacial junction of the Ni and Co2N materials. Our density functional theory calculations show that hydrogen preferentially adsorbs on both the interfacial N and Co sites of Ni/Co2N during the hydrogen catalytic reactions. This study provides a new approach for the construction of multifunctional catalysts for hydrogen electrochemistry.
To enable intelligent automated driving systems, a promising strategy is to understand how humans drive and interact with road users in complicated driving situations. In this paper, we propose a 3D-aware egocentric spatial-temporal interaction framework for automated driving applications. Graph convolutional networks (GCN) are devised for interaction modeling. We introduce three novel concepts into GCN. First, we decompose egocentric interactions into ego-thing and ego-stuff interactions, modeled by two GCNs. In both GCNs, ego nodes are introduced to encode the interaction with thing objects (e.g., cars and pedestrians) and the interaction with stuff objects (e.g., lane markings and traffic lights). Second, objects’ 3D locations are explicitly incorporated into GCN to better model egocentric interactions. Third, to implement ego-stuff interaction in GCN, we propose a MaskAlign operation to extract features for irregular objects.
We validate the proposed framework on tactical driver behavior recognition. Extensive experiments are conducted using the Honda Research Institute Driving Dataset, the largest dataset with diverse tactical driver behavior annotations. Our framework demonstrates a substantial performance boost over baselines on the two experimental settings, by 3.9% and 6.0%, respectively. Furthermore, we visualize the learned affinity matrices, which encode ego-thing and ego-stuff interactions, to showcase that the proposed framework can capture interactions effectively.
A vehicle driving along the road is surrounded by many objects, but only a small subset of them influence the driver’s decisions and actions. Learning to estimate the importance of each object on the driver’s real-time decision-making may help better understand human driving behavior and lead to more reliable autonomous driving systems. Solving this problem requires models that understand the interactions between the ego-vehicle and the surrounding objects. However, interactions among other objects in the scene can potentially also be very helpful, e.g., a pedestrian beginning to cross the road between the ego-vehicle and the car in front will make the car in front less important. We propose a novel framework for object importance estimation using an interaction graph, in which the features of each object node are updated by interacting with others through graph convolution. Experiments show that our model outperforms state-of-the-art baselines with much less input and pre-processing.
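A single round of the graph convolution described above can be sketched as follows: each object node aggregates features from its neighbors through a normalized adjacency before an importance score is read out. All shapes and the readout here are illustrative assumptions:

```python
# One layer of graph convolution over an interaction graph: each object node
# aggregates neighbor features before a per-object importance score is read out.
import numpy as np

def gcn_layer(H, A, W):
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))       # row-normalize by degree
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)  # ReLU(D^-1 A H W)

rng = np.random.default_rng(0)
H = rng.random((5, 8))           # 5 nodes (ego + 4 objects), 8-dim features
A = np.ones((5, 5)) - np.eye(5)  # fully connected interaction graph
W = rng.random((8, 8))

H = gcn_layer(H, A, W)
importance = H @ rng.random(8)   # illustrative linear readout per object
```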
This paper presents a learning-from-demonstration (LfD) framework for teaching human-robot social interactions that involve whole-body haptic interaction, i.e. direct human-robot contact over the full robot body. The performance of existing LfD frameworks suffers in such interactions due to the high dimensionality and spatiotemporal sparsity of the demonstration data. We show that by leveraging this sparsity, we can reduce the data dimensionality without incurring a significant accuracy penalty, and introduce three strategies for doing so. By combining these techniques with an LfD framework for learning multimodal human-robot interactions, we can model the spatiotemporal relationship between the tactile and kinesthetic information during whole-body haptic interactions. Using a teleoperated bimanual robot equipped with 61 force sensors, we experimentally demonstrate that a model trained with 121 sample hugs from 4 participants generalizes well to unseen inputs and human partners.
The role of additives in facilitating the growth of conventional semiconducting thin films is well-established. Their presence appears to be equally decisive in the growth of two-dimensional transition metal dichalcogenides (TMDs), yet their role remains ambiguous. In this work, we show that the use of sodium bromide enables synthesis of TMD monolayers via a surfactant-mediated growth mechanism, without introducing liquefaction of metal oxide precursors. We discovered that sodium ions provided by sodium bromide chemically passivate edges of growing molybdenum disulfide crystals, relaxing in-plane strains to suppress 3D islanding and promote monolayer growth. To exploit this growth model, molybdenum disulfide monolayers were directly grown into desired patterns using predeposited sodium bromide as a removable template. The surfactant-mediated growth not only extends the families of metal oxide precursors but also offers a way for lithography-free patterning of TMD monolayers on various surfaces to facilitate fabrication of atomically thin electronic devices.
Recognition of human actions and associated interactions with objects and the environment is an important problem in computer vision due to its potential applications in a variety of domains. Recently, graph convolutional networks that extract features from the skeleton have demonstrated promising performance. In this paper, we propose a novel Spatio-Temporal Pyramid Graph Convolutional Network (ST-PGN) for online action recognition for ergonomics risk assessment that enables the use of features from all levels of the skeleton feature hierarchy. The proposed algorithm outperforms state-of-the-art action recognition algorithms tested on two public benchmark datasets typically used for postural assessment (TUM and UW-IOM). We also introduce a pipeline to enhance postural assessment methods with online action recognition techniques. Finally, the proposed algorithm is integrated with a traditional ergonomics risk index (REBA) to demonstrate the potential value for assessment of musculoskeletal disorders in occupational safety.
We employ triplet loss as a feature embedding regularizer to boost classification performance. Standard architectures, like ResNet and Inception, are extended to support both losses with minimal hyper-parameter tuning. This promotes generality while fine-tuning pretrained networks. Triplet loss is a powerful surrogate for recently proposed embedding regularizers. Yet it is often avoided due to its large batch-size requirement and high computational cost. Through our experiments, we re-assess these assumptions.
During inference, our network supports both classification and embedding tasks without any computational overhead. Quantitative evaluation highlights a steady improvement on five fine-grained recognition datasets. Further evaluation on an imbalanced video dataset shows a significant improvement. Triplet loss brings feature-embedding capabilities, such as nearest-neighbor retrieval, to classification models. Code available at http://bit.ly/2LNYEqL
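A minimal sketch of the joint objective described above, cross-entropy plus triplet loss on the penultimate embedding, under assumed shapes and a deliberately naive triplet pairing (real use would mine triplets by class label):

```python
# Joint objective: cross-entropy plus triplet loss on the penultimate
# embedding. The backbone and the batch split are illustrative stand-ins.
import torch
import torch.nn as nn

backbone   = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # stand-in trunk
classifier = nn.Linear(64, 10)
triplet    = nn.TripletMarginLoss(margin=0.2)
ce         = nn.CrossEntropyLoss()

x, labels = torch.randn(12, 128), torch.randint(0, 10, (12,))
feats = backbone(x)
anchor, pos, neg = feats[:4], feats[4:8], feats[8:12]      # naive pairing
loss = ce(classifier(feats), labels) + 0.5 * triplet(anchor, pos, neg)
loss.backward()
```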
Many of the existing electrochemical catalysts suffer from poor selectivity, instability, and low exchange current densities. These shortcomings call for a comprehensive exploration of the catalytic processes at fundamental nanometer length scales. Here we exploit infrared (IR) nanoimaging and nanospectroscopy to directly visualize catalytic reactions on the surface of Cu2O polyhedral single crystals with nanoscale spatial resolution. Nano-IR data revealed signatures of this common catalyst after electrochemical reduction of carbon dioxide (CO2). We discuss the utility of nano-IR methods for surface/facet engineering of efficient electrochemical catalysts.
A variety of cooperative multi-agent control problems require agents to achieve individual goals while contributing to collective success. This multi-goal multi-agent setting poses difficulties for recent algorithms, which primarily target settings with a single global reward, due to two new challenges: efficient exploration for learning both individual goal attainment and cooperation for others’ success, and credit-assignment for interactions between actions and goals of different agents. To address both challenges, we restructure the problem into a novel two-stage curriculum, in which single-agent goal attainment is learned prior to learning multi-agent cooperation, and we derive a new multi-goal multi-agent policy gradient with a credit function for localized credit assignment. We use a function augmentation scheme to bridge value and policy functions across the curriculum. The complete architecture, called CM3, learns significantly faster than direct adaptations of existing algorithms on three challenging multi-goal multi-agent problems: cooperative navigation in difficult formations, negotiating multi-vehicle lane changes in the SUMO traffic simulator, and strategic cooperation in a Checkers environment.
The Tactical Driver Behavior modeling problem requires an understanding of driver actions in complicated urban scenarios from rich multimodal signals including video, LiDAR, and CAN signal data streams. However, the majority of deep learning research is focused either on learning the vehicle/environment state (sensor fusion) or the driver policy (from temporal data), but not both. Learning both tasks jointly offers the richest distillation of knowledge but presents challenges in the formulation and successful training. In this work, we propose promising first steps in this direction. Inspired by the gating mechanisms in Long Short-Term Memory units (LSTMs), we propose Gated Recurrent Fusion Units (GRFU) that learn fusion weighting and temporal weighting simultaneously. We demonstrate its superior performance over multimodal and temporal baselines in supervised regression and classification tasks, all in the realm of autonomous navigation. On tactical driver behavior classification using the Honda Driving Dataset (HDD), we report a 10% improvement in mean Average Precision (mAP) score, and similarly, for steering angle regression on the TORCS dataset, we note a 20% drop in Mean Squared Error (MSE) over the state-of-the-art.
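In the spirit of the fusion gating described above, the sketch below weights two sensor streams with learned gates before fusing them. It is a loose illustration, not the GRFU cell from the paper, and all dimensions are assumptions:

```python
# Loose illustration of learned fusion weighting (not the exact GRFU cell):
# per-modality gates decide how much each sensor stream contributes.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, d_video, d_can, d_out):
        super().__init__()
        self.proj_v = nn.Linear(d_video, d_out)
        self.proj_c = nn.Linear(d_can, d_out)
        self.gate = nn.Linear(2 * d_out, 2)   # one scalar gate per modality

    def forward(self, v, c):
        hv = torch.tanh(self.proj_v(v))
        hc = torch.tanh(self.proj_c(c))
        g = torch.softmax(self.gate(torch.cat([hv, hc], dim=-1)), dim=-1)
        return g[..., :1] * hv + g[..., 1:] * hc   # gated sum of streams

fusion = GatedFusion(d_video=256, d_can=16, d_out=64)
fused = fusion(torch.randn(8, 256), torch.randn(8, 16))   # (8, 64) features
```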
Singlet fission is believed to improve the efficiency of solar energy conversion by breaking up the Shockley–Queisser thermodynamic limit. Understanding the triplet excitons generated by singlet fission is essential for solar energy exploitation. Here we employed transient absorption microscopy to examine dynamical behaviors of triplet excitons. We observed anisotropic recombination of triplet excitons in hexacene single crystals. Triplet excitons generated by singlet fission relax through both geminate and non-geminate recombination. For the geminate recombination, the different rates were attributed to the significant difference in their related energy change based on the Redfield quantum dissipation theory. The process is mainly governed by the electron–phonon interaction in hexacene. On the other hand, the non-geminate recombination is of bimolecular origin through energy transfer. In the triplet–triplet bimolecular process, the rates along the two different optical axes in the a–b crystalline plane differ by a factor of 4. This anisotropy in the triplet–triplet recombination rates was attributed to the interference in the coupling probability of dipole–dipole interactions in the different geometric configurations of hexacene single crystals. Our experimental findings provide new insight into the future design of singlet fission materials with desirable triplet exciton behaviors.
Mobile robots moving in crowded environments have to navigate among pedestrians safely. Ideally, the way the robot avoids the pedestrians should not only be physically safe but also perceived safe and comfortable. Despite the rich literature in collision-free crowd navigation, limited research has been conducted on how humans perceive robot behaviors in the navigation context. In this paper, we implement three local pedestrian avoidance strategies inspired by human avoidance behaviors on a self-balancing mobile robot and evaluate their perception in a human-robot crossing scenario through a large-scale user study with 98 participants. The study reveals that the avoidance strategies positively affect the participants' perception of the robot's safety, comfort, and awareness to different degrees. Furthermore, the participants perceive the robot as more intelligent, friendly and reliable in the last trial than in the first even with the same strategy.
According to density functional theory, monolayer (ML) MoS2 is predicted to possess electrocatalytic activity for the hydrogen evolution reaction (HER) that approaches that of platinum. However, its observed HER activity is much lower, which is widely believed to result from a large Schottky barrier between ML MoS2 and its electrical contact. In order to better understand the role of contact resistance in limiting the performance of ML MoS2 HER electrocatalysts, this study has employed well-defined test platforms that allow for the simultaneous measurement of contact resistance and electrocatalytic activity toward the HER during electrochemical testing. At open circuit potential, these measurements reveal that a 0.5 M H2SO4 electrolyte can act as a strong p-dopant that depletes free electrons in MoS2 and leads to extremely high contact resistance, even if the contact resistance of the as-made device in air is originally very low. However, under applied negative potentials this doping is mitigated by a strong electrolyte-mediated gating effect which can reduce the contact and sheet resistances of properly configured ML MoS2 electrocatalysts by more than 5 orders of magnitude. At potentials relevant to HER, the contact resistance becomes negligible and the performance of MoS2 electrodes is limited by HER kinetics. These findings have important implications for the design of low-dimensional semiconducting electrocatalysts and photocatalysts.
Inferring relational behavior between road users as well as road users and their surrounding physical space is an important step toward effective modeling and prediction of navigation strategies adopted by participants in road scenes. To this end, we propose a relation-aware framework for future trajectory forecast. Our system aims to infer relational information from the interactions of road users with each other and with the environment. The first module involves visual encoding of spatio-temporal features, which captures human-human and human-space interactions over time. The following module explicitly constructs pair-wise relations from spatio-temporal interactions and identifies more descriptive relations that highly influence future motion of the target road user by considering its past trajectory. The resulting relational features are used to forecast future locations of the target, in the form of heatmaps with an additional guidance of spatial dependencies and consideration of the uncertainty. Extensive evaluations on the public benchmark datasets demonstrate the robustness and efficacy of the proposed framework, as evidenced by performance surpassing state-of-the-art methods.
Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed. However, important real-time applications including surveillance and driver assistance systems require identifying actions as soon as each video frame arrives, based only on current and historical observations. In this paper, we propose a novel framework, the Temporal Recurrent Network (TRN), to model greater temporal context of each frame by simultaneously performing online action detection and anticipation of the immediate future. At each moment in time, our approach makes use of both accumulated historical evidence and predicted future information to better recognize the action that is currently occurring, and integrates both of these into a unified end-to-end architecture. We evaluate our approach on two popular online action detection datasets, HDD and TVSeries, as well as another widely used dataset, THUMOS'14. The results show that TRN significantly outperforms the state-of-the-art.
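The online constraint described above, producing labels from current and past frames only, can be sketched as a recurrent loop; the anticipation head below is a schematic nod to TRN's future prediction, not the paper's architecture, and all dimensions are assumptions:

```python
# Schematic online loop: labels must come from current and past frames only.
import torch
import torch.nn as nn

gru        = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
anticipate = nn.Linear(64, 64)        # predicts a summary of upcoming features
classify   = nn.Linear(64 + 64, 5)    # current evidence + anticipated future

h = torch.zeros(1, 1, 64)
for t in range(100):                  # frames arrive one at a time
    frame_feat = torch.randn(1, 1, 32)
    _, h = gru(frame_feat, h)         # accumulate historical evidence
    future = anticipate(h[-1])        # hallucinate near-future evidence
    logits = classify(torch.cat([h[-1], future], dim=-1))  # per-frame action
```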
In this paper, we propose a robust, real-time and scalable localization framework for multi-LiDAR equipped vehicles in challenging urban environments. Our test vehicle uses multiple LiDARs with low mounting positions, which makes the localization task very challenging in dense traffic scenarios due to the reduced field-of-view of the LiDARs and the additional uncertainty introduced by dynamic vehicles. In order to increase robustness and provide consistently smooth localization output, LiDAR localization is fused with dead reckoning using a probabilistic scan matching confidence estimation method. We conducted experiments in different urban settings, and the results confirmed that our framework can operate reliably with a low-mounted multi-LiDAR suite under various traffic scenarios.
As a driver prepares to complete a maneuver, his/her internal cognitive state triggers physiological responses that are manifested, for example, in changes in heart rate (HR), breath rate (BR), and electrodermal activity (EDA). This process opens opportunities to understand driving events by observing the physiological data of the driver. In particular, this work studies the relation between driver maneuvers and physiological signals during naturalistic driving recordings. It presents both feature and discriminant analyses to investigate how physiological data can signal the driver's responses during the planning, preparation, and execution of driving maneuvers. We study recordings with extreme values in the physiological data (high and low values in HR, BR, and EDA). The analysis indicates that most of these events are associated with driving events. We evaluate the values obtained from physiological signals as the driver completes specific maneuvers, observing statistically significant deviations from the typical physiological responses seen in normal driving recordings. These results are validated with binary classification problems, where the task is to recognize between a driving maneuver and a normal driving condition (e.g., left turn versus normal). The average F1-score of these classifiers is 72.8%, demonstrating the discriminative power of features extracted from physiological signals.
Corner cases are the main bottlenecks when applying Artificial Intelligence (AI) systems to safety-critical applications. An AI system should be intelligent enough to detect such situations so that system developers can prepare for subsequent planning. In this paper, we propose semi-supervised anomaly detection considering the imbalance of normal situations. In particular, driving data consists of multiple positive/normal situations (e.g., right turn, going straight), some of which (e.g., U-turn) could be as rare as anomalous situations. Existing machine learning based anomaly detection approaches do not fare sufficiently well when applied to such imbalanced data. In this paper, we present a novel multi-task learning based approach that leverages domain-knowledge (maneuver labels) for anomaly detection in driving data. We evaluate the proposed approach both quantitatively and qualitatively on 150 hours of real-world driving data and show improved performance over baseline approaches.
Toward Prediction of Driver Awareness of Automotive Hazards: Driving-Video-Based Simulation Approach
Recent AR research efforts have explored the use of virtual environments to test augmented reality (AR) user interfaces. However, it is yet to be seen what effects the visual fidelity of such virtual environments may have on AR interface assessment, and specifically to what degree assessment results observed in a virtual world would apply to the real world. Automotive AR head-up display (HUD) interfaces provide a meaningful application area to examine this problem, especially given that immersive, 3D-graphics-based driving simulators are established tools for examining in-vehicle interfaces safely before testing in real vehicles. In this work, we present an argument that adequately assessing AR interfaces requires a suite of different measures, and that such measures should be considered when debating the appropriateness of virtual environments for AR interface assessment. We present a case study that examines how an AR interface presented via HUD affects driver performance and behavior in different virtual and real environments. Twelve participants completed the study, which measured driver task performance, eye-gaze behavior, and situational awareness during AR-guided navigation in low- and high-fidelity virtual simulation and in an on-road environment. Our results suggest that the visual fidelity of the environment in which an AR interface is assessed could impact some measures of effectiveness. Discussion is guided by a proposed initial assessment classification for AR user interfaces that may serve to guide future discussions on AR interface evaluation, as well as the suitability of virtual environments for AR assessment.
In this paper, we present motion retargeting and control algorithms for teleoperated physical human-robot interaction (pHRI). We employ unilateral teleoperation in which a sensor-equipped operator interacts with a static object such as a mannequin to provide the motion and force references. The controller takes the references as well as current robot states and contact forces as input, and outputs the joint torques to track the operator's contact forces while preserving the expression and style of the motion. We develop a hierarchical optimization scheme combined with a motion retargeting algorithm that resolves the discrepancy between the contact states of the operator and robot due to different kinematic parameters and body shapes. We demonstrate the controller performance on a dual-arm robot with soft skin and contact force sensors using pre-recorded human demonstrations of hugging.
Dense urban traffic environments can produce situations where accurate prediction and dynamic models are insufficient for successful autonomous vehicle motion planning. We investigate how an autonomous agent can safely negotiate with other traffic participants, enabling the agent to handle potential deadlocks. Specifically, we consider merges where the gap between cars is smaller than the size of the ego vehicle. We propose a game-theoretic framework capable of generating and responding to interactive behaviors. Our main contribution is to show how game-tree decision making can be executed by an autonomous vehicle, including the approximations and reasoning that make the tree search computationally tractable. Additionally, to test our model, we develop a stochastic rule-based traffic agent capable of generating interactive behaviors that can be used as a benchmark for simulating traffic participants in a crowded merge setting.
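The toy Python sketch below illustrates the mechanics of depth-limited game-tree decision making in a merge; the action set, transition model, payoff, and worst-case opponent assumption are all hypothetical simplifications rather than the paper's formulation.

# Toy depth-limited game tree over an interactive merge.
ACTIONS = ["yield", "proceed"]

def transition(state, ego_a, other_a):
    # Toy state: (ego progress, other progress) into the merge gap.
    ego_p, other_p = state
    ego_p += 1 if ego_a == "proceed" else -1
    other_p += 1 if other_a == "proceed" else -1
    return (ego_p, other_p)

def payoff(state):
    ego_p, other_p = state
    if ego_p > 0 and other_p > 0:   # both committed to the same gap
        return -10.0                # conflict: heavily penalized
    return float(ego_p)             # otherwise reward ego progress

def ego_value(state, depth):
    # Ego maximizes; the other agent is treated as worst case, a common
    # simplification that keeps the tree search tractable.
    if depth == 0:
        return payoff(state)
    return max(min(ego_value(transition(state, ea, oa), depth - 1)
                   for oa in ACTIONS)
               for ea in ACTIONS)

best = max(ACTIONS, key=lambda ea: min(
    ego_value(transition((0, 0), ea, oa), 2) for oa in ACTIONS))
print(best)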
Decision making in dense traffic can be challenging for autonomous vehicles. An autonomous system that relies only on predefined road priorities and considers other drivers as moving objects will cause the vehicle to freeze and fail the maneuver. Human drivers leverage the cooperation of other drivers to avoid such deadlock situations and convince others to change their behavior. Decision making algorithms must reason about the interaction with other drivers and anticipate a broad range of driver behaviors. In this work, we present a reinforcement learning approach to learn how to interact with drivers with different cooperation levels. We enhance the performance of traditional reinforcement learning algorithms by maintaining a belief over the level of cooperation of other drivers. We show that our agent successfully learns how to navigate a dense merging scenario with fewer deadlocks than online planning methods.
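A minimal sketch of the belief-maintenance idea follows, assuming a two-level discretization of cooperation and a made-up observation model; the paper's actual belief representation may differ.

# Bayesian belief update over a driver's latent cooperation level.
import numpy as np

LEVELS = ["cooperative", "aggressive"]
belief = np.array([0.5, 0.5])   # uniform prior over cooperation levels

def likelihood(observed_gap_opening, level):
    # Assumed observation model: cooperative drivers open gaps more often.
    p_open = 0.8 if level == "cooperative" else 0.2
    return p_open if observed_gap_opening else 1.0 - p_open

def update(belief, observed_gap_opening):
    post = np.array([likelihood(observed_gap_opening, lvl) for lvl in LEVELS])
    post *= belief
    return post / post.sum()

belief = update(belief, observed_gap_opening=True)
print(belief)   # the belief can be appended to the RL agent's observation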
Road users are a critical part of decision-making for both self-driving cars and driver assistance systems. Some road users, however, are more important for decision-making than others because of their respective intentions, the ego vehicle's intention, and their effects on each other. In this paper, we propose a novel architecture for road-user importance estimation that takes advantage of the local and global context of the scene. For local context, the model exploits the appearance of the road users (which captures orientation, intention, etc.) and their location relative to the ego vehicle. The global context in our model is defined based on the feature map of the convolutional layer of the module that predicts the future path of the ego vehicle; it contains rich global information about the scene (e.g., infrastructure, road lanes, etc.) as well as the ego vehicle's intention information. Moreover, this paper introduces a new dataset of real-world driving, concentrated around intersections, that includes annotations of important road users. Systematic evaluations of our proposed method against several baselines show promising results.
Singlet fission is known to improve solar energy utilization by circumventing the Shockley-Queisser limit. The two essential steps of singlet fission are the formation of a correlated triplet pair and its subsequent quantum decoherence. However, the mechanisms of triplet pair formation and decoherence remain elusive. Here we examined both essential steps in single-crystalline hexacene and discovered remarkable anisotropy of the overall singlet fission rate along different crystal axes. Since triplet pair formation emerges on the same timescale along both crystal axes, the quantum decoherence is likely responsible for the directional anisotropy. The distinct quantum decoherence rates are ascribed to the notable difference in their associated energy loss according to the Redfield quantum dissipation theory. Our hybrid experimental/theoretical framework will not only further our understanding of singlet fission, but also shed light on the systematic design of new materials for third-generation solar cells.
Singlet fission has great potential to overcome the Shockley–Queisser thermodynamic limit and thus improve solar power conversion efficiency. However, the currently limited understanding of detailed singlet fission mechanisms hinders the improved design of versatile singlet fission materials. In the present study, we combined ultrafast transient infrared spectroscopy with ab initio calculations to elucidate the roles played by vibrational normal modes in the singlet fission of hexacene. Our transient infrared experiments revealed three groups of vibrational modes that are prominent in vibronic coupling upon photoexcitation. Through our computational study, those normal modes with notable Franck-Condon shifts have been classified as ring-twisting modes near 1300.0 cm−1, ring-stretching modes near 1600.0 cm−1, and ring-scissoring modes near 1700.0 cm−1. Experimentally, a ring-stretching mode near 1620.0 cm−1 exhibits a significant blue-shift of 4.0 cm−1 during singlet fission, which proceeds with a time constant of 0.59 ± 0.07 ps. More interestingly, the blue-shifted mode was also identified by our functional mode singlet fission theory as the primary driving mode for singlet fission, suggesting the importance of vibronic coupling when a correlated triplet pair of hexacene is directly converted from its first excited singlet exciton. Our findings indicate that ultrafast transient infrared spectroscopy, in conjunction with nonadiabatic transition theory, is a powerful tool to probe the vibronic fingerprint of singlet fission.
Inferring relational behavior between road users, as well as between road users and their surrounding physical space, is an important step toward effective modeling and prediction of the navigation strategies adopted by participants in road scenes. To this end, we propose a relation-aware framework for future trajectory forecast, which aims to infer relational information from the interactions of road users with each other and with their environments. Extensive evaluations on a public dataset demonstrate the robustness of the proposed framework, with performance exceeding that of state-of-the-art methods.
Occupancy maps offer a useful representation of the environment for applications in automated driving and robotics. When created from 3D lidar scans, they can be very accurate. Traditional static occupancy mapping methods, however, are unable to represent a changing environment. In this paper, we present a novel dynamic occupancy mapping (DOM) algorithm. Inspired by the phase congruency idea from computer vision, it has an intuitive formulation and yet is effective in practice. In addition, our framework provides solutions to several common challenges in occupancy mapping, such as multi-lidar fusion and ground estimation. Finally, we use several experiments to quantitatively evaluate the quality of DOM's output; our algorithm runs in real time (10 Hz).
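For background, a standard log-odds occupancy update for a single grid cell is sketched below; the inverse-sensor probabilities are assumed values, and the paper's phase-congruency-based dynamic extension is not reproduced here.

# Standard log-odds occupancy update: a cell accumulates lidar evidence.
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

L_OCC, L_FREE = logit(0.7), logit(0.4)   # assumed inverse-sensor values

def update_cell(l_prev, hit):
    # Add evidence for a hit (occupied) or a pass-through (free).
    return l_prev + (L_OCC if hit else L_FREE)

def probability(l):
    return 1.0 - 1.0 / (1.0 + np.exp(l))

l = 0.0                      # unknown: p = 0.5
for hit in [True, True, False]:
    l = update_cell(l, hit)
print(probability(l))        # belief that the cell is occupied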
Automotive manufacturers are rapidly developing more advanced in-vehicle systems that seek to provide a driver with more active safety and information in real time, in particular human machine interfaces (HMIs) using mixed or augmented reality (AR) graphical elements. However, it is difficult to properly test novel AR interfaces in the same way as traditional HMIs via on-road testing. Simulation could offer a safer and more financially viable alternative for testing AR HMIs, but inconsistent simulation quality may confound HMI research depending on the visual fidelity of each simulation environment. We investigated how visual fidelity in a virtual environment impacts the quality of resulting driver behavior, visual attention, and situational awareness when using the system. We designed two large-scale immersive virtual environments: a “low” graphic fidelity driving simulation representing most current research simulation testbeds, and a “high” graphic fidelity environment created in Unreal Engine that represents state-of-the-art graphical presentation. We conducted a user study with 24 participants who navigated a route in a virtual urban environment following the direction of AR graphical cues while also monitoring the road scene for pedestrian hazards, and recorded their driving performance, gaze patterns, and subjective feedback via a situational awareness survey (SART). Our results show that drivers change both their driving and visual behavior depending upon the visual fidelity presented in the virtual scene. We further demonstrate the value of using multi-tiered analysis techniques to more finely examine driver performance and visual attention.
Understanding ego-motion and surrounding vehicle state is essential to enable automated driving and advanced driving assistance technologies. Typical approaches to this problem fuse multiple sensors such as LiDAR, camera, and radar to recognize surrounding vehicle state, including position, velocity, and orientation. Such sensing modalities are overly complex and costly for production personal-use vehicles. In this paper, we propose a novel machine learning method to estimate ego-motion and surrounding vehicle state using a single monocular camera. Our approach is based on a combination of three deep neural networks that estimate the 3D vehicle bounding box, depth, and optical flow from a sequence of images. The main contribution of this paper is a new framework and algorithm that integrates these three networks to estimate ego-motion and surrounding vehicle state. To realize more accurate 3D position estimation, we perform ground-plane correction in real time. The efficacy of the proposed method is demonstrated through experimental evaluations that compare our results to ground truth data available from other sensors, including the CAN bus and LiDAR.
Navigating urban environments is a complex task for automated vehicles: they must reach their goal safely and efficiently while considering a multitude of traffic participants. We propose a modular decision-making algorithm to autonomously navigate intersections, addressing challenges of existing rule-based and reinforcement learning (RL) approaches. We first present a safe RL algorithm that relies on a model checker to ensure safety guarantees. To make the decision strategy robust to perception errors and occlusions, we introduce a belief update technique using a learning-based approach. Finally, we use a scene decomposition approach to scale our algorithm to environments with multiple traffic participants. We empirically demonstrate that our algorithm outperforms rule-based methods and reinforcement learning techniques on a complex intersection scenario.
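A minimal sketch of the safety-layer pattern that a model-checked RL policy follows is shown below: unsafe actions are masked before taking the argmax over Q-values. The checker here is a stub based on an assumed time-to-collision rule, not the paper's model checker.

# Action masking by a (stubbed) safety checker before policy selection.
import numpy as np

ACTIONS = ["stop", "creep", "go"]

def is_safe(state, action):
    # Stub for the model checker: "go" is deemed unsafe if another vehicle
    # is within an assumed time-to-collision threshold.
    return not (action == "go" and state["ttc"] < 2.0)

def safe_argmax(q_values, state):
    # Mask unsafe actions, then pick the best remaining Q-value.
    masked = [q if is_safe(state, a) else -np.inf
              for a, q in zip(ACTIONS, q_values)]
    return ACTIONS[int(np.argmax(masked))]

print(safe_argmax([0.1, 0.3, 0.9], {"ttc": 1.5}))   # -> "creep"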
Estimating statistical uncertainty allows autonomous agents to communicate their confidence during task execution and is important for applications in safety-critical domains such as autonomous driving. In this work, we present the uncertainty-aware imitation learning (UAIL) algorithm for improving end-to-end control systems via data aggregation. UAIL applies Monte Carlo Dropout to estimate uncertainty in the control output of end-to-end systems, selectively acquiring new training data in the states where the system is uncertain. In contrast to prior data aggregation algorithms that force human experts to visit sub-optimal states at random, UAIL can anticipate its own mistakes and switch control to the expert in order to prevent visiting a series of sub-optimal states. Our experimental results from simulated driving tasks demonstrate that the proposed uncertainty estimation method can be leveraged to reliably predict infractions. Our analysis shows that UAIL outperforms existing data aggregation algorithms on a series of benchmark tasks.
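Monte Carlo Dropout itself is a standard technique, and a minimal PyTorch sketch follows: dropout stays active at test time, and the spread of repeated stochastic forward passes serves as the uncertainty estimate. The network shape and the hand-off threshold are illustrative assumptions, not UAIL's actual configuration.

# Monte Carlo Dropout uncertainty sketch for an end-to-end controller.
import torch
import torch.nn as nn

controller = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(128, 1))           # e.g., a steering command

def mc_dropout_predict(model, x, n_samples=30):
    model.train()                # keep dropout stochastic at test time
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(0), samples.std(0)   # prediction, uncertainty

x = torch.randn(1, 64)
mean, std = mc_dropout_predict(controller, x)
if std.item() > 0.1:             # assumed threshold
    pass                         # hand control to the expert, collect data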
In a multi-agent setting, the optimal policy of a single agent depends largely on the behavior of other agents. We investigate the problem of multi-agent reinforcement learning, focusing on decentralized learning in non-stationary domains for mobile robot navigation. We identify a cause for the difficulty in training non-stationary policies, mutual adaptation to sub-optimal behaviors, and use this to motivate a curriculum-based strategy for learning interactive policies. The curriculum has two stages. First, the agent leverages policy gradient algorithms to learn a policy that is capable of achieving multiple goals. Second, the agent learns a modifier policy for interacting with other agents in a multi-agent setting. We evaluate our approach on both an autonomous driving lane-change domain and a robot navigation domain.
3D multi-object detection and tracking are crucial for traffic scene understanding. However, the community pays less attention to these areas due to the lack of a standardized benchmark dataset to advance the field. Moreover, existing datasets (e.g., KITTI [1]) do not provide sufficient data and labels to tackle challenging scenes where highly interactive and occluded traffic participants are present. To address these issues, we present the Honda Research Institute 3D Dataset (H3D), a large-scale full-surround 3D multi-object detection and tracking dataset collected using a 3D LiDAR scanner. H3D comprises 160 crowded and highly interactive traffic scenes with a total of 1 million labeled instances in 27,721 frames. With its unique size, rich annotations, and complex scenes, H3D is gathered to stimulate research on full-surround 3D multi-object detection and tracking. To effectively and efficiently annotate a large-scale 3D point cloud dataset, we propose a labeling methodology that speeds up the overall annotation cycle. A standardized benchmark is created to evaluate full-surround 3D multi-object detection and tracking algorithms, which are trained and tested on H3D. Finally, sources of error are discussed to inform the development of future algorithms.
We formulate a new problem, Object Importance Estimation (OIE), in on-road driving videos, where road users are considered important objects if they influence the control decision of the ego vehicle's driver. The importance of a road user depends on both its visual dynamics, e.g., appearance, motion, and location, in the driving scene and the driving goal, e.g., the planned path, of the ego vehicle. We propose a novel framework that incorporates both a visual model and a goal representation to conduct OIE. To evaluate our framework, we collect an on-road driving dataset at real-world traffic intersections and conduct human-labeled annotation of the important objects. Experimental results show that our goal-oriented method outperforms baselines, with especially large improvements in left-turn and right-turn scenarios. Furthermore, we explore the possibility of using object importance for driving control prediction and demonstrate that binary brake prediction can be improved with object importance information.
This paper examines the problem of dynamic traffic scene classification under space-time variations in viewpoint that arise from video captured on board a moving vehicle. Solutions to this problem are important for realizing effective driving assistance technologies that must interpret or predict road user behavior. To date, dynamic traffic scene classification has not been adequately addressed, owing to a lack of benchmark datasets that consider the spatiotemporal evolution of traffic scenes resulting from a vehicle's ego-motion. This paper makes three main contributions. First, we release an annotated dataset to enable dynamic scene classification, comprising 80 hours of diverse, high-quality driving video clips collected in the San Francisco Bay Area. The dataset includes temporal annotations for road places, road types, weather, and road surface conditions. Second, we introduce novel and baseline algorithms that utilize the semantic context and temporal nature of the dataset for dynamic classification of road scenes. Finally, we showcase algorithms and experimental results that highlight how features extracted from scene classification serve as strong priors and help with tactical driver behavior understanding. The results show significant improvement over previously reported driving behavior detection baselines in the literature.
Real-time navigation in dense human environments is a challenging problem in robotics. Most existing path planners fail to account for the dynamics of pedestrians because introducing time as an additional dimension in the search space is computationally prohibitive. Alternatively, most local motion planners only address imminent collision avoidance and fail to offer long-term optimality. In this work, we present an approach, called Dynamic Channels, to solve this global-to-local quandary. Our method combines high-level topological path planning with low-level motion planning into a complete pipeline. By formulating the path planning problem as graph search in the triangulation space, our planner is able to explicitly reason about obstacle dynamics and capture environmental change efficiently. We evaluate the efficiency and performance of our approach on public pedestrian datasets and compare it to a state-of-the-art planning algorithm for dynamic obstacle avoidance. Completeness proofs are provided in the supplement at http://caochao.me/files/proof.pdf. An extended version of the paper is available on arXiv.
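As background on the graph-search formulation, the sketch below runs Dijkstra's algorithm over a toy adjacency graph of triangles; the actual Dynamic Channels planner searches a triangulation that it updates as obstacles move, which this sketch does not capture.

# Shortest-path search over a toy triangle-adjacency graph.
import heapq

def dijkstra(graph, start, goal):
    # graph: {node: [(neighbor, cost), ...]}
    pq, seen = [(0.0, start, [start])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph.get(node, []):
            if nbr not in seen:
                heapq.heappush(pq, (cost + w, nbr, path + [nbr]))
    return float("inf"), []

triangles = {"A": [("B", 1.0)], "B": [("C", 2.0), ("D", 0.5)],
             "D": [("C", 1.0)]}
print(dijkstra(triangles, "A", "C"))   # -> (2.5, ['A', 'B', 'D', 'C'])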
We present a multi-agent reinforcement learning algorithm that is a simple yet effective modification of a known algorithm. External agents are modeled as a time-varying environment whose policy parameters are updated periodically, at a slower rate than the planner, to make learning stable and more efficient. The replay buffer, which stores the experiences, is also reset with the same large period so that samples are drawn from a fixed environment. This enables us to address challenging cooperative control problems in highway navigation. The resulting Multi-agent Reinforcement Learning with Periodic Parameter Sharing (MARL-PPS) algorithm outperforms the baselines in the multi-agent highway scenarios we tested.
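A skeleton of the periodic schedule follows, with a hypothetical planner/agent API and a placeholder period K, only to make the fast/slow loop structure concrete; it is not the paper's implementation.

# Periodic parameter sharing schedule (skeleton; API names are assumed).
K = 1000                              # assumed refresh period
replay_buffer = []

def train(planner, other_agents, n_steps):
    for step in range(n_steps):
        # Fast inner loop: only the planner learns; external agents act
        # with frozen parameters, so the environment stays stationary.
        experience = planner.act_and_observe(other_agents)   # hypothetical API
        replay_buffer.append(experience)
        planner.update(replay_buffer)                        # hypothetical API
        if (step + 1) % K == 0:
            # Slow outer loop: share the planner's parameters with the
            # external agents and reset the buffer so future samples come
            # from the new, again-fixed environment.
            for agent in other_agents:
                agent.load_parameters(planner.parameters())  # hypothetical API
            replay_buffer.clear()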
Decomposition methods have been proposed to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks that consider each individual entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong assumptions of independence between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem, where multiple boats must coordinate to maximize their catch over time, as well as on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods scale to multiple boats or pedestrians by reusing strategies computed for a single entity. We verify empirically that the proposed correction method significantly improves on the decomposition method and outperforms a policy trained on the full-scale problem without utility decomposition.
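A minimal sketch of decomposition-plus-correction is given below, assuming an additive arbitrator and made-up tensor shapes; the paper's arbitrator and training setup are not reproduced here.

# Per-entity local Q-values fused additively, then adjusted by a small
# learned correction network.
import torch
import torch.nn as nn

n_entities, n_actions = 3, 5
correction = nn.Sequential(
    nn.Linear(n_entities * n_actions, 32), nn.ReLU(),
    nn.Linear(32, n_actions))

def fused_q(local_q):
    # local_q: (n_entities, n_actions), one row per entity's local task.
    base = local_q.sum(dim=0)               # arbitrator: additive fusion
    return base + correction(local_q.flatten())

local_q = torch.randn(n_entities, n_actions)
action = int(fused_q(local_q).argmax())
# The correction net is trained so fused_q approaches the global optimum
# that pure decomposition misses when local tasks are not independent.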
Predicting the future location of vehicles is essential for safety-critical applications such as advanced driver assistance systems (ADAS) and autonomous driving. This paper introduces a novel approach to simultaneously predict both the location and scale of target vehicles in the first-person (egocentric) view of an ego vehicle. We present a multi-stream recurrent neural network (RNN) encoder-decoder model that separately captures object location and scale as well as pixel-level observations for future vehicle localization. We show that incorporating dense optical flow improves prediction results significantly, since it captures information about motion as well as appearance change. We also find that explicitly modeling the future motion of the ego vehicle improves prediction accuracy, which could be especially beneficial in intelligent and automated vehicles with motion planning capability. To evaluate the performance of our approach, we present a new dataset of first-person videos collected from a variety of scenarios at road intersections, which are particularly challenging moments for prediction because vehicle trajectories are diverse and dynamic.
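A compact PyTorch sketch of a two-stream encoder-decoder in this spirit follows: one GRU encodes past bounding boxes, another encodes flow features, and their fused states seed an autoregressive decoder. All dimensions and the fusion-by-concatenation choice are assumptions, not the paper's architecture.

# Two-stream RNN encoder-decoder for future bounding-box prediction.
import torch
import torch.nn as nn

class TwoStreamPredictor(nn.Module):
    def __init__(self, flow_dim=50, hid=64, horizon=10):
        super().__init__()
        self.box_enc = nn.GRU(4, hid, batch_first=True)     # x, y, w, h
        self.flow_enc = nn.GRU(flow_dim, hid, batch_first=True)
        self.decoder = nn.GRU(4, 2 * hid, batch_first=True)
        self.out = nn.Linear(2 * hid, 4)
        self.horizon = horizon

    def forward(self, boxes, flow):
        _, h_box = self.box_enc(boxes)
        _, h_flow = self.flow_enc(flow)
        h = torch.cat([h_box, h_flow], dim=-1)     # fuse stream states
        inp, outputs = boxes[:, -1:, :], []
        for _ in range(self.horizon):              # autoregressive decode
            dec, h = self.decoder(inp, h)
            inp = self.out(dec)
            outputs.append(inp)
        return torch.cat(outputs, dim=1)           # (B, horizon, 4)

model = TwoStreamPredictor()
pred = model(torch.randn(2, 8, 4), torch.randn(2, 8, 50))
print(pred.shape)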
Recent success suggests that deep neural control networks are likely to be a key component of self-driving vehicles. These networks are trained on large datasets to imitate human actions, but they lack semantic understanding of image contents. This makes them brittle and potentially unsafe in situations that do not match training data. Here, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts the way it attends to the scene (visual attention) and the control (steering and speed). Attention mechanisms tie controller behavior to salient objects in the advice. We evaluate our model on a novel advisable driving dataset with manually annotated human-to-vehicle advice called Honda Research Institute-Advice Dataset (HAD). We show that taking advice improves the performance of the end-to-end network, and that the network attends to a variety of visual features cued by the advice. The dataset is available at https://usa.honda-ri.com/HAD.