Motion Planning and Interactive Decision Making
The deployment of autonomous vehicles (AVs) is already a reality in many cities and on many roads around the world. AVs use improved sensing and perception capabilities to predict the motion of other agents (human-driven vehicles, cyclists, and pedestrians) and plan a safe trajectory accordingly. Self-driving cars can help reduce the number of accidents because of their faster reaction times and lack of distraction; however, they still cannot handle all types of traffic scenarios. Some of the most problematic situations are maneuvers in dense traffic. Humans communicate their intentions to surrounding agents through a combination of visual cues, such as blinkers or hand signals, and actions, such as nudging or braking and accelerating to open or close a gap.
Our Interactive Decision Making work at HRI includes modeling and simulating the behavior of surrounding agents so that the AV can safely coordinate its actions with theirs. Our system creates motion planning behaviors that proactively seek information about others’ intentions while simultaneously trying to convince human drivers to accommodate the AV’s desired maneuver, and in this way it is able to prevent deadlocks.
Reinforcement Learning for Autonomous Driving with Latent State Inference and Spatial-Temporal Relationships
Deep reinforcement learning (DRL) is a promising way to learn navigation in complex autonomous driving scenarios. However, identifying the subtle cues that can indicate drastically different outcomes remains an open problem in designing autonomous systems that operate in human environments. In this work, we show that explicitly inferring the latent state and encoding spatial-temporal relationships in a reinforcement learning framework can help address this difficulty. We encode prior knowledge on the latent states of other drivers through a framework that combines the reinforcement learner with a supervised learner. In addition, we model the influence passing between different vehicles through graph neural networks (GNNs). The proposed framework significantly improves performance in the context of navigating T-intersections compared with state-of-the-art baseline approaches.
Anytime Game-Theoretic Planning with Safe and Active Information Gathering on Humans’ Latent States for Human-Centered Robots
A human-centered robot needs to reason about the cognitive limitations and potential irrationality of its human partner to achieve seamless interactions. This paper proposes a novel anytime game-theoretic planning framework that integrates iterative reasoning models, partially observable Markov decision processes (POMDPs), and Monte-Carlo belief tree search for robot behavioral planning. Our planner equips a robot with the ability to reason about its human partner’s latent cognitive states (bounded intelligence and irrationality) and enables the robot to actively learn these latent states to better maximize its utility. Furthermore, our planner handles safety explicitly by enforcing chance constraints. We validate our approach in an autonomous driving domain where our behavioral planner and a low-level motion controller hierarchically control an autonomous car to negotiate traffic merges. Simulations and user studies are conducted to show our planner’s effectiveness.
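A chance constraint requires the probability of an unsafe outcome, such as a collision, to stay below a tolerance ε. The sketch below shows one generic way to check such a constraint by Monte Carlo sampling. The Gaussian gap model, the helper name `satisfies_chance_constraint`, and all numeric values are illustrative assumptions, not the planner's actual formulation.

```python
import random

def satisfies_chance_constraint(gap_mean, gap_std, min_safe_gap,
                                epsilon, n_samples=10_000, seed=0):
    """Hypothetical Monte Carlo check of P(gap < min_safe_gap) <= epsilon.

    Assumes (for illustration only) that the gap to another vehicle after an
    action is Gaussian with the given mean and standard deviation.
    """
    rng = random.Random(seed)
    violations = sum(
        1 for _ in range(n_samples)
        if rng.gauss(gap_mean, gap_std) < min_safe_gap
    )
    return violations / n_samples <= epsilon

# Hypothetical expected gaps (meters) resulting from each candidate action.
candidate_gaps = {"accelerate": 3.0, "maintain": 6.0, "yield": 10.0}

# Keep only actions whose estimated collision probability is within tolerance.
safe_actions = [
    action for action, mean_gap in candidate_gaps.items()
    if satisfies_chance_constraint(mean_gap, gap_std=1.5,
                                   min_safe_gap=2.0, epsilon=0.05)
]
```

Here "accelerate" is filtered out because roughly a quarter of its sampled gaps fall below the safe distance, while "maintain" and "yield" remain available to the planner.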
Traditional planning and control methods can fail to find a feasible trajectory for an autonomous vehicle to execute in dense traffic. This is because, in these scenarios, the obstacle-free volume in spacetime available for the vehicle to drive through is very small. However, that does not mean the task is infeasible, since human drivers are known to be able to drive in dense traffic by leveraging the cooperativeness of other drivers to open a gap. Traditional methods fail to take into account the fact that the actions taken by an agent affect the behaviour of other vehicles on the road. In this work, we rely on the ability of deep reinforcement learning to implicitly model such interactions and learn a continuous control policy over the action space of an autonomous vehicle. The application we consider requires our agent to negotiate and open a gap in the road in order to successfully merge or change lanes. Our policy learns to repeatedly probe into the target lane while trying to find a safe spot to move into. We compare against two model-predictive control-based algorithms and show that our policy outperforms them in simulation.
This paper presents a trajectory planning algorithm for person following that is more comprehensive than existing algorithms. The algorithm is tailored for a front-wheel-steered vehicle and is designed to follow a person while avoiding collisions with both static and moving obstacles, simultaneously optimizing speed and steering, and minimizing control effort. It uses nonlinear model predictive control, where the underlying trajectory optimization problem is approximated using a simultaneous method. Results collected in an unknown environment show that the proposed planning algorithm works well with a perception algorithm to follow a person in uneven grass near obstacles and over ditches and curbs, and on asphalt over train tracks and near buildings and cars. Overall, the results indicate that the proposed algorithm can safely follow a person in unknown, dynamic environments.
Maneuvering in dense traffic is a challenging task for autonomous vehicles because it requires reasoning about the stochastic behaviors of many other participants. In addition, the agent must achieve the maneuver within a limited time and distance. In this work, we propose a combination of reinforcement learning and game theory to learn merging behaviors. We design a training curriculum for a reinforcement learning agent using the concept of level-k behavior. This approach exposes the agent to a broad variety of behaviors during training, which promotes learning policies that are robust to model discrepancies. We show that our approach learns more efficient policies than traditional training methods.
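In level-k reasoning, a level-0 agent follows a simple non-strategic policy and a level-k agent responds to others it assumes are level-(k-1). The sketch below shows one plausible way to turn that idea into a training schedule, in which stage k exposes the ego agent to traffic drawn from levels 0 through k-1. The function name, the uniform sampling of opponent levels, and the stage sizes are assumptions for illustration, not the curriculum used in the paper.

```python
import random

def level_k_curriculum(max_level, episodes_per_stage, seed=0):
    """Yield (stage, opponent_level) pairs for a hypothetical level-k curriculum.

    Stage k trains the ego agent against traffic drawn uniformly from levels
    0..k-1, so later stages mix in progressively more strategic drivers.
    """
    rng = random.Random(seed)
    for stage in range(1, max_level + 1):
        for _ in range(episodes_per_stage):
            # randrange(stage) returns an integer in [0, stage - 1].
            yield stage, rng.randrange(stage)

schedule = list(level_k_curriculum(max_level=3, episodes_per_stage=4))
```

Mixing lower levels into later stages is one way to realize the "broad variety of behaviors" the abstract describes; training each stage against only level k-1 would be an equally simple alternative.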
This paper presents an online smooth-path lane-change control framework. We focus on dense traffic where inter-vehicle gaps are narrow and cooperation with surrounding drivers is essential to achieve the lane-change maneuver. We propose a two-stage control framework that harmonizes Model Predictive Control (MPC) with Generative Adversarial Networks (GAN) by utilizing driving intentions to generate smooth lane-change maneuvers. To improve performance in practice, the system is augmented with an adaptive safety boundary and a Kalman filter to mitigate sensor noise. Simulation studies are conducted at different levels of traffic density and cooperativeness of other drivers. The simulation results support the effectiveness, driving comfort, and safety of the proposed method.
This paper introduces an accurate nonlinear model predictive control-based algorithm for trajectory following. For accuracy, the algorithm incorporates both the planned state and control trajectories into its cost functional, whereas current trajectory-following algorithms do not incorporate control trajectories into their cost functionals. Comparisons are made against two trajectory-following algorithms, where the trajectory planning problem is to safely follow a person using an automated ATV with control delays in a dynamic environment while simultaneously optimizing speed and steering, minimizing control effort, and minimizing the time-to-goal. Results indicate that the proposed algorithm reduces collisions, tracking error, orientation error, and time-to-goal. Tracking the control trajectories with the trajectory-following algorithm therefore helps the vehicle follow the planned state trajectories more accurately, which ultimately improves safety, especially in dynamic environments.
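As a hedged illustration of what incorporating both trajectories into the cost functional can look like, a generic discrete-time quadratic tracking cost over a horizon of N steps is shown below. The symbols are assumptions for this sketch: x_k and u_k are the state and control at step k, \bar{x}_k and \bar{u}_k are the planned state and control trajectories, and Q, R, and P are positive semidefinite weight matrices. The paper's actual functional may use different terms or weights.

```latex
J = \sum_{k=0}^{N-1} \Big( \lVert x_k - \bar{x}_k \rVert_Q^2
      + \lVert u_k - \bar{u}_k \rVert_R^2 \Big)
    + \lVert x_N - \bar{x}_N \rVert_P^2,
\qquad \lVert v \rVert_M^2 = v^\top M v
```

Dropping the \lVert u_k - \bar{u}_k \rVert_R^2 term recovers a state-only tracking cost, which is the kind of cost functional the abstract attributes to current trajectory-following algorithms.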
A variety of cooperative multi-agent control problems require agents to achieve individual goals while contributing to collective success. This multi-goal multiagent setting poses difficulties for recent algorithms, which primarily target settings with a single global reward, due to two new challenges: efficient exploration for learning both individual goal attainment and cooperation for others’ success, and credit-assignment for interactions between actions and goals of different agents. To address both challenges, we restructure the problem into a novel two-stage curriculum, in which single-agent goal attainment is learned prior to learning multi-agent cooperation, and we derive a new multi-goal multi-agent policy gradient with a credit function for localized credit assignment. We use a function augmentation scheme to bridge value and policy functions across the curriculum. The complete architecture, called CM3, learns significantly faster than direct adaptations of existing algorithms on three challenging multi-goal multi-agent problems: cooperative navigation in difficult formations, negotiating multi-vehicle lane changes in the SUMO traffic simulator, and strategic cooperation in a Checkers environment.
The Tactical Driver Behavior modeling problem requires an understanding of driver actions in complicated urban scenarios from rich multimodal signals, including video, LiDAR, and CAN signal data streams. However, most deep learning research focuses either on learning the vehicle/environment state (sensor fusion) or on learning the driver policy (from temporal data), but not both. Learning both tasks jointly offers the richest distillation of knowledge but presents challenges in formulation and training. In this work, we propose promising first steps in this direction. Inspired by the gating mechanisms in Long Short-Term Memory units (LSTMs), we propose Gated Recurrent Fusion Units (GRFU) that learn fusion weighting and temporal weighting simultaneously. We demonstrate their superior performance over multimodal and temporal baselines in supervised regression and classification tasks, all in the realm of autonomous navigation. On tactical driver behavior classification using the Honda Driving Dataset (HDD), we report a 10% improvement in mean Average Precision (mAP), and for steering angle regression on the TORCS dataset, we observe a 20% drop in Mean Squared Error (MSE) over the state of the art.
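The idea of learning fusion weights and temporal weights with gates can be illustrated with a simplified NumPy sketch. Each modality's features pass through a sigmoid gate conditioned on the previous hidden state, the gated features are summed, and a second gate decides how much of the fused input replaces the old hidden state. The function name, weight shapes, and this particular update rule are assumptions for illustration; this is not the exact GRFU cell from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion_step(features, h_prev, gate_weights, update_weight):
    """One hypothetical gated-fusion step (not the published GRFU equations).

    features: list of per-modality feature vectors, each of shape (d,)
    h_prev: previous hidden state, shape (d,)
    gate_weights: one matrix of shape (d, 2d) per modality (fusion gates)
    update_weight: matrix of shape (d, 2d) for the temporal (update) gate
    """
    fused = np.zeros_like(h_prev)
    for x, W in zip(features, gate_weights):
        g = sigmoid(W @ np.concatenate([x, h_prev]))  # fusion gate for this modality
        fused += g * x                                # element-wise weighting
    z = sigmoid(update_weight @ np.concatenate([fused, h_prev]))  # temporal gate
    return z * np.tanh(fused) + (1.0 - z) * h_prev

rng = np.random.default_rng(0)
d = 4
# Random stand-ins for, e.g., video, LiDAR, and CAN feature vectors.
modalities = [rng.normal(size=d) for _ in range(3)]
gates = [rng.normal(size=(d, 2 * d)) for _ in range(3)]
h = gated_fusion_step(modalities, np.zeros(d), gates, rng.normal(size=(d, 2 * d)))
```

In a trained model the gate matrices would be learned; here they are random, so the sketch only demonstrates the data flow and shapes.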
Dense urban traffic environments can produce situations where accurate prediction and dynamic models are insufficient for successful autonomous vehicle motion planning. We investigate how an autonomous agent can safely negotiate with other traffic participants, enabling the agent to handle potential deadlocks. Specifically, we consider merges where the gap between cars is smaller than the size of the ego vehicle. We propose a game-theoretic framework capable of generating and responding to interactive behaviors. Our main contribution is to show how game-tree decision making can be executed by an autonomous vehicle, including approximations and reasoning that make the tree search computationally tractable. Additionally, to test our model, we develop a stochastic rule-based traffic agent capable of generating interactive behaviors that can be used as a benchmark for simulating traffic participants in a crowded merge setting.
Decision making in dense traffic can be challenging for autonomous vehicles. An autonomous system that relies only on predefined road priorities and treats other drivers as moving objects will cause the vehicle to freeze and fail the maneuver. Human drivers leverage the cooperation of other drivers to avoid such deadlock situations and convince others to change their behavior. Decision making algorithms must reason about the interaction with other drivers and anticipate a broad range of driver behaviors. In this work, we present a reinforcement learning approach to learn how to interact with drivers with different cooperation levels. We enhance the performance of traditional reinforcement learning algorithms by maintaining a belief over the level of cooperation of other drivers. We show that our agent successfully learns to navigate a dense merging scenario with fewer deadlocks than online planning methods.
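Maintaining a belief over another driver's cooperation level amounts to a Bayesian filter over a discrete latent variable. The sketch below assumes three hypothetical cooperation levels and a hand-specified likelihood of observing a "yield" or "no_yield" action under each; the levels, the observations, and the probabilities are illustrative, not values from the paper.

```python
def update_belief(belief, likelihood, observation):
    """Apply Bayes' rule over discrete cooperation levels.

    belief: dict level -> prior probability
    likelihood: dict level -> dict observation -> P(observation | level)
    """
    unnormalized = {lvl: p * likelihood[lvl][observation]
                    for lvl, p in belief.items()}
    total = sum(unnormalized.values())
    return {lvl: v / total for lvl, v in unnormalized.items()}

# Hypothetical model of how likely each driver type is to yield to a merging car.
likelihood = {
    "aggressive":  {"yield": 0.1, "no_yield": 0.9},
    "neutral":     {"yield": 0.5, "no_yield": 0.5},
    "cooperative": {"yield": 0.9, "no_yield": 0.1},
}
belief = {"aggressive": 1 / 3, "neutral": 1 / 3, "cooperative": 1 / 3}

# After observing the same driver yield twice, belief shifts toward "cooperative".
for obs in ["yield", "yield"]:
    belief = update_belief(belief, likelihood, obs)
```

One common way to use such a belief in reinforcement learning is to append it to the agent's observation so that the policy can condition on it.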
Navigating urban environments is a complex task for automated vehicles. They must reach their goal safely and efficiently while considering a multitude of traffic participants. We propose a modular decision making algorithm to autonomously navigate intersections, addressing challenges of existing rule-based and reinforcement learning (RL) approaches. We first present a safe RL algorithm relying on a model checker to ensure safety guarantees. To make the decision strategy robust to perception errors and occlusions, we introduce a belief update technique using a learning-based approach. Finally, we use a scene decomposition approach to scale our algorithm to environments with multiple traffic participants. We empirically demonstrate that our algorithm outperforms rule-based methods and reinforcement learning techniques on a complex intersection scenario.
Decomposition methods have been proposed to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks considering each individual entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong assumptions of independence between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem where multiple boats must coordinate to maximize their catch over time as well as on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods can scale to multiple boats or pedestrians by using strategies involving one entity. We verify empirically that the proposed correction method significantly improves the decomposition method and outperforms a policy trained on the full scale problem without utility decomposition.
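Utility decomposition with a learned correction can be summarized as Q(s, a) ≈ f(Q_1(s_1, a), ..., Q_n(s_n, a)) + δ(s, a), where each Q_i is the utility computed when only entity i is considered, f is the arbitrator's fusion function, and δ is the correction term. In the sketch below, min fusion (a pessimistic choice suited to safety tasks such as pedestrian avoidance) and the constant stand-in correction functions are assumptions; in the paper, δ is represented by a neural network.

```python
from typing import Callable, Dict, List, Sequence

def decomposed_action(
    local_q: Sequence[Dict[str, float]],
    correction: Callable[[str], float],
    actions: List[str],
) -> str:
    """Pick the action maximizing min-fused local utilities plus a correction.

    local_q[i][a] is the utility of action a when only entity i is considered.
    correction(a) stands in for a learned term delta(s, a) at the current state.
    """
    def score(a: str) -> float:
        fused = min(q[a] for q in local_q)  # arbitrator: pessimistic (min) fusion
        return fused + correction(a)
    return max(actions, key=score)

# Two pedestrians, each evaluated independently by a single-entity policy.
local_q = [
    {"accelerate": -1.0, "maintain": 0.5, "brake": 0.2},
    {"accelerate": 0.8, "maintain": 0.3, "brake": 0.1},
]
actions = ["accelerate", "maintain", "brake"]

no_correction = decomposed_action(local_q, lambda a: 0.0, actions)
with_correction = decomposed_action(
    local_q, lambda a: 0.25 if a == "brake" else 0.0, actions)
```

The correction only needs to capture the interaction effects that the independent local utilities miss, which is why it can change the chosen action (here from "maintain" to "brake") without replacing the decomposition itself.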
We present a multi-agent reinforcement learning algorithm that is a simple yet effective modification of a known algorithm. External agents are modeled as a time-varying environment whose policy parameters are updated periodically, at a slower rate than the planner, to make learning stable and more efficient. The replay buffer, which stores experiences, is also reset with the same long period so that samples are drawn from a fixed environment. This enables us to address challenging cooperative control problems in highway navigation. The resulting Multi-agent Reinforcement Learning with Periodic Parameter Sharing (MARL-PPS) algorithm outperforms the baselines in the multi-agent highway scenarios we tested.
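The two update rates can be sketched as a training loop in which the learner updates at every step, while the parameters driving the external agents are copied from the learner, and the replay buffer is cleared, only every `period` steps. The function and argument names (`train_with_periodic_sharing`, `update_fn`, `collect_fn`, `period`) and the scalar stand-in "parameters" are hypothetical; real code would hold network weights and environment rollouts.

```python
from collections import deque
from typing import Callable, List, Tuple

def train_with_periodic_sharing(
    learner_params: float,
    update_fn: Callable[[float, list], float],
    collect_fn: Callable[[float, float], tuple],
    num_steps: int,
    period: int,
    buffer_size: int = 1000,
) -> Tuple[float, List[int]]:
    """Hypothetical MARL-PPS-style loop.

    External agents act with a frozen copy of the learner's parameters that is
    refreshed every `period` steps; the replay buffer is reset at the same time
    so that sampled experience comes from a single, fixed environment.
    """
    external_params = learner_params
    buffer: deque = deque(maxlen=buffer_size)
    sync_steps = []
    for step in range(1, num_steps + 1):
        buffer.append(collect_fn(learner_params, external_params))
        learner_params = update_fn(learner_params, list(buffer))
        if step % period == 0:
            external_params = learner_params  # slow, periodic parameter sharing
            buffer.clear()                    # reset replay with the same period
            sync_steps.append(step)
    return learner_params, sync_steps

params, syncs = train_with_periodic_sharing(
    learner_params=0.0,
    update_fn=lambda p, batch: p + 1.0,  # stand-in for a gradient step
    collect_fn=lambda p, ext: (p, ext),  # stand-in for an environment rollout
    num_steps=10,
    period=3,
)
```

With `period=3`, the external agents see the learner's parameters only at steps 3, 6, and 9, so between syncs the learner faces a stationary environment.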
Providing an efficient strategy to navigate safely through unsignaled intersections is a difficult task that requires determining the intent of other drivers. We explore the effectiveness of deep reinforcement learning in handling intersection problems. Using recent advances in deep RL, we are able to learn policies that surpass the performance of a commonly used heuristic approach in several metrics, including task completion time and goal success rate, and that have a limited ability to generalize. We then explore the system's ability to learn active sensing behaviors that enable safe navigation in the presence of occlusions. Our analysis provides insight into the intersection handling problem: the solutions learned by the network point out several shortcomings of current rule-based methods, and the failures of our current deep reinforcement learning system point to future research directions.
Deep reinforcement learning has emerged as a powerful tool for a variety of learning tasks; however, deep networks typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences will be stored: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space. We show that distribution matching successfully prevents catastrophic forgetting and is consistently the best approach on all domains tested. While distribution matching has better and more consistent performance, we identify one case in which coverage maximization is beneficial: when tasks that receive less training are more important. Overall, our results show that selective experience replay, when suitable selection algorithms are employed, can prevent catastrophic forgetting.
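One standard way to implement distribution matching is reservoir sampling. Every experience receives a uniformly random key, and the long-term memory keeps the experiences with the largest keys, so at any time it holds a uniform random sample of everything seen so far. The class below is a minimal sketch under that assumption; the name `ReservoirMemory` and its interface are hypothetical.

```python
import heapq
import random

class ReservoirMemory:
    """Long-term memory that keeps a uniform random sample of all experiences."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.heap = []  # min-heap of (key, counter, experience)
        self.seen = 0

    def add(self, experience):
        key = self.rng.random()
        item = (key, self.seen, experience)  # counter breaks ties between equal keys
        self.seen += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, item)
        elif key > self.heap[0][0]:
            heapq.heapreplace(self.heap, item)  # evict the entry with the smallest key

    def sample(self, batch_size):
        items = self.rng.sample(self.heap, min(batch_size, len(self.heap)))
        return [exp for _, _, exp in items]

# Train on task A, then on task B; the memory stays roughly half task A.
memory = ReservoirMemory(capacity=100)
for t in range(10_000):
    memory.add(("task_A" if t < 5_000 else "task_B", t))
```

Because task A's experiences keep about half of the memory even after training has moved on to task B, replaying from this memory continues to expose the network to the earlier task, unlike a FIFO buffer that would hold only recent task B data.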