Motion Planning and Interactive Decision Making

The push for the deployment of autonomous vehicles (AV) is already a reality in many cities and roads around the world. AVs make use of the improved capabilities in sensing and perception to predict the motion of other agents (human-driven vehicles, cyclists, and pedestrians) and plan a safe trajectory accordingly. Self-driving cars can help diminish the amount of accidents because of their faster reaction times and lack of distraction; however, they still cannot handle all types of traffic scenarios. Some of the most problematic situations to tackle are maneuvers in dense traffic conditions. Humans use a combination of visual cues like blinkers or hand signs and actions such as nudging, braking/accelerating to open/close a gap in order to communicate their intentions to surrounding agents.
Our Interactive Decision Making work at HRI includes modeling and simulating the behavior of surrounding agents to safely coordinate the combined actions. By creating motion planning behaviors that proactively seek information about others’ intentions while at the same time attempting to convince other human-driven vehicles to accommodate the AV’s desired maneuver, our system is able to prevent deadlocks.
Related Publications
We propose a risk-aware crash mitigation system (RCMS), to augment any existing motion planner (MP), that enables an autonomous vehicle to perform evasive maneuvers in high-risk situations and minimize the severity of collision if a crash is inevitable. In order to facilitate a smooth transition between RCMS and MP, we develop a novel activation mechanism that combines instantaneous as well as predictive collision risk evaluation strategies in a unified hysteresis-band approach. For trajectory planning, we deploy a modular receding horizon optimization-based approach that minimizes a smooth situational risk profile, while adhering to the physical road limits as well as vehicular actuator limits. We demonstrate the performance of our approach in a simulation environment.
Although deep reinforcement learning (DRL) has shown promising results for autonomous navigation in interactive traffic scenarios, existing work typically adopts a fixed behavior policy to control social vehicles in the training environment. This may cause the learned driving policy to overfit the environment, making it difficult to interact well with vehicles with different, unseen behaviors. In this work, we introduce an efficient method to train diverse driving policies for social vehicles as a single meta-policy. By randomizing the interaction-based reward functions of social vehicles, we can generate diverse objectives and efficiently train the meta-policy through guiding policies that achieve specific objectives. We further propose a training strategy to enhance the robustness of the ego vehicle's driving policy using the environment where social vehicles are controlled by the learned meta-policy. Our method successfully learns an ego driving policy that generalizes well to unseen situations with out-of-distribution (OOD) social agents' behaviors in a challenging uncontrolled T-intersection scenario.
This paper discusses the limitations of existing microscopic traffic models in accounting for the potential impacts of on-ramp vehicles on the car-following behavior of main-lane vehicles on highways. We first surveyed U.S. on-ramps to choose a representative set of on-ramps and then collected real-world
observational data from the merging vehicle’s perspective in
various traffic conditions ranging from free-flowing to rushhour
traffic jams. Next, as our core contribution, we introduce
a novel car-following model, called MR-IDM, for highway
driving that reacts to merging vehicles in a realistic way. This
proposed driving model can either be used in traffic simulators
to generate realistic highway driving behavior or integrated
into a prediction module for autonomous vehicles attempting
to merge onto the highway. We quantitatively evaluated the
effectiveness of our model and compared it against several
other methods. We show that MR-IDM has the least error
in mimicking the real-world data, while having
The ability to accurately predict others' behavior is central to the safety and efficiency of robotic systems in interactive settings, such as human-robot interaction and multi-robot teaming tasks.Unfortunately, robots often lack access to key information on which these predictions may hinge, such as other agents' goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as stochastic hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning, mainly due to the fundamental coupling between the robot's trajectory plan and its prediction of other agents' intent. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we use a runtime safety filter (also referred to as a ``shielding'' scheme), which overrides the robot's dual control policy with a safety fallback strategy when a safety-critical event is imminent. We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances the nominal planning performance with the risk of high-cost emergency maneuvers triggered by low-probability agent behaviors.We demonstrate the efficacy of our approach with both simulated driving studies and hardware experiments using 1/10 scale autonomous vehicles.
Autonomous vehicles (AVs) must share the driving space with other drivers and often employ conservative motion planning strategies to ensure safety. These conservative strategies can negatively impact AV’s performance and significantly slow traffic throughput. Therefore, to avoid conservatism, we design an interaction-aware motion planner for the ego vehicle (AV) that interacts with surrounding vehicles to perform complex maneuvers in a locally optimal manner. Our planner uses a neural network-based interactive trajectory predictor and analytically integrates it with model predictive control (MPC). We solve the MPC optimization using the alternating direction method of multipliers (ADMM) and prove the algorithm’s convergence. We provide an empirical study and compare our method with a baseline heuristic method.
A lane-change maneuver on a congested highway could be severely disruptive or even infeasible without the cooperation of neighboring cars. However, cooperation with other vehicles does not guarantee that the performed maneuver will not have a negative impact on traffic flow unless it is explicitly considered in the cooperative controller design. In this letter, we present a socially compliant framework for cooperative lane-change maneuvers for an arbitrary number of CAVs on highways that aims to interrupt traffic flow as minimally as possible. Moreover, we explicitly impose feasibility constraints in the optimization formulation by using reachability set theory, leading to a unified design that removes the need for an iterative procedure used in prior work. We quantitatively evaluate the effectiveness of our framework and compare it against previously offered approaches in terms of maneuver time and incurred throughput disruption.
This paper proposes a hierarchical autonomous vehicle navigation architecture, composed of a high-level speed and lane advisory system (SLAS) coupled with low-level trajectory generation and trajectory following modules. Specifically, we target a multi-lane highway driving scenario where an autonomous ego vehicle navigates in traffic. We propose a novel receding horizon mixed-integer optimization based method for SLAS with the objective to minimize travel time while accounting for passenger comfort. We further incorporate various modifications in the proposed approach to improve the overall computational efficiency and achieve real-time performance.We demonstrate the efficacy of the proposed approach in contrast to the existing methods, when applied in conjunction with state-of-the-art trajectory generation and trajectory following frameworks, in a CARLA simulation environment.
In robot learning from demonstration (LfD), a visual representation of a cost function inferred from Inverse Reinforcement Learning (IRL) provides an intuitive tool for humans to quickly interpret the underlying objectives of the demonstration. The inferred cost function can be used by controllers, for example, Model Predictive Controllers (MPCs). In this work, we improve the recently developed IRL-MPC framework, by enhancing it in a risk-sensitive formulation to be more applicable for safety-critical applications like autonomous driving. Our risk-sensitive MPCs together with the distributional costmap demonstrate lower collision rates in the CARLA simulator for autonomous driving tasks compared to other learning-based baseline methods.
Accurately modeling the behavior of traffic participants is essential for safely and efficiently navigating an autonomous vehicle through heavy traffic. We propose a method, based on the intelligent driver model, that allows us to accurately model individual driver behaviors from only a small number of frames using easily observable features. On average, this method makes prediction errors that have less than 1 meter difference from an oracle with full-information when analyzed over a 10-second horizon of highway driving. We then validate the efficiency of our method through extensive analysis against a competitive data-driven method such as Reinforcement Learning that may be of independent interest.
Micro-mobility is promising to contribute to sustainable cities with its efficiency and low cost. To better design such a sustainable future, it is necessary to understand the trends and challenges. Thus, we examined people's opinions on micro-mobility in the US and the EU using Tweets. We used topic modelling based on advanced natural language processing techniques and categorised the data into seven topics: promotion and service, mobility, technical features, acceptance, recreation, infrastructure and regulations. Furthermore, using sentiment analysis, we investigated people's positive and negative attitudes towards specific aspects of these topics and compared the patterns of the trends and challenges in the US and the EU. We found that (1) promotion and service included the majority of Twitter discussions in the both regions, (2) the EU had more positive opinions than the US, (3) micro-mobility devices were more widely used for utilitarian mobility and recreational purposes in the EU than in the US and (4) compared to the EU, people in the US had many more concerns related to infrastructure and regulation issues. These findings help us design and prioritise micro-mobility to improve their safety and experience across the two areas for designing a more sustainable micro-mobility future.
We derive optimal control policies for a Connected Automated Vehicle (CAV) and cooperating neighboring CAVs to carry out a lane change maneuver consisting of a longitudinal phase where the CAV properly positions itself relative to the cooperating neighbors and a lateral phase where it safely changes lanes. In contrast to prior work on this problem, where the CAV “selfishly” seeks to minimize its maneuver time, we seek to ensure that the fast-lane traffic flow is minimally disrupted (through a properly defined metric) and that highway throughput is improved by optimally selecting the cooperating vehicles. We show that analytical solutions for the optimal trajectories can be derived and are guaranteed to satisfy safety constraints for all vehicles involved in the maneuver. When feasible solutions do not exist, we include a time relaxation method trading off a longer maneuver time with reduced disruption. Our analysis is also extended to multiple sequential maneuvers. Simulation results where the controllers are implemented show their effectiveness in terms of safety guarantees and up to 35 % throughput improvement compared to maneuvers with no vehicle cooperation.
This article presents an online smooth-path lane-change control framework. We focus on dense traffic where intervehicle space gaps are narrow, and cooperation with surrounding drivers is essential to achieve the lane-change maneuver. We propose a two-stage control framework that harmonizes model predictive control (MPC) with generative adversarial networks (GANs) by utilizing driving intentions to generate smooth lane-change maneuvers. To improve performance in practice, the system is augmented with an adaptive safety boundary and a Kalman filter to mitigate sensor noise. Simulation studies are investigated at different levels of traffic density and cooperativeness of other drivers. The simulation results support the effectiveness, driving comfort, and safety of the proposed method.
It can be difficult to autonomously produce driver behavior so that it appears natural to other traffic participants. Through Inverse Reinforcement Learning (IRL), we can automate this process by learning the underlying reward function from human demonstrations. We propose a new IRL algorithm that learns a goal-conditioned spatiotemporal reward function. The resulting costmap is used by Model Predictive Controllers (MPCs) to perform a task without any hand-designing or hand-tuning of the cost function. We evaluate our proposed Goal-conditioned SpatioTemporal Zeroing Maximum Entropy Deep IRL (GSTZ)-MEDIRL framework together with MPC in the CARLA simulator for autonomous driving, lane keeping, and lane changing tasks in a challenging dense traffic highway scenario. Our proposed methods show higher success rates compared to other baseline methods including behavior cloning, state-of-the-art RL policies, and MPC with a learning-based behavior prediction model.
Multi-agent reinforcement learning (MARL) provides an efficient way for simultaneously learning policies for multiple agents interacting with each other. However, in scenarios requiring complex interactions, existing algorithms can suffer from an inability to accurately anticipate the influence of selfactions on other agents. Incorporating an ability to reason about other agents’ potential responses can allow an agent to formulate more effective strategies. This paper adopts a recursive reasoning model in a centralized-training-decentralizedexecution framework to help learning agents better cooperate with or compete against others. The proposed algorithm, referred to as the Recursive Reasoning Graph (R2G), shows state-of-the-art performance on multiple multi-agent particle and robotics games.
A human-centered robot needs to reason about the cognitive limitations and potential irrationality of its human partner to achieve seamless interactions. This paper proposes a novel anytime game-theoretic planning framework that integrates iterative reasoning models, partially observable Markov decision process, and Monte-Carlo belief tree search for robot behavioral planning. Our planner equips a robot with the ability to reason about its human partner’s latent cognitive states(bounded intelligence and irrationality) and enables the robot to actively learn these latent states to better maximize its utility. Furthermore, our planner handles safety explicitly by enforcing change constraints. We validate our approach in an autonomous driving domain where our behavioral planner and a low-level motion controller hierarchically control an autonomous car to negotiate traffic merges. Simulations and user studies are conducted to show our planner’s effectiveness.
Deep reinforcement learning (DRL) provides a promising way for learning navigation in complex autonomous driving scenarios. However, identifying the subtle cues that can indicate drastically different outcomes remains an open problem with designing autonomous systems that operate in human environments. In this work, we show that explicitly inferring the latent state and encoding spatial-temporal relationships in a reinforcement learning framework can help address this difficulty. We encode prior knowledge on the latent states of other drivers through a framework that combines the reinforcement learner with a supervised learner. In addition, we model the influence passing between different vehicles through graph neural networks (GNNs). The proposed framework significantly improves performance in the context of navigating T-intersections compared with state-of-the-art baseline approaches.
Traditional planning and control methods could fail to find a feasible trajectory for an autonomous vehicle to execute amongst dense traffic on roads. This is because the obstacle-free volume in spacetime is very small in these scenarios for the vehicle to drive through. However, that does not mean the task is infeasible since human drivers are known to be able to drive amongst dense traffic by leveraging the cooperativeness of other drivers to open a gap. The traditional methods fail to take into account the fact that the actions taken by an agent affect the behaviour of other vehicles on the road. In this work, we rely on the ability of deep reinforcement learning to implicitly model such interactions and learn a continuous control policy over the action space of an autonomous vehicle. The application we consider requires our agent to negotiate and open a gap in the road in order to successfully merge or change lanes. Our policy learns to repeatedly probe into the target road lane while trying to find a safe spot to move in to. We compare against two model-predictive control-based algorithms and show that our policy outperforms them in simulation.
This paper presents a trajectory planning algo-rithm for person following that is more comprehensive thanexisting algorithms. This algorithm is tailored for a front-wheel-steered vehicle, is designed to follow a person while avoidingcollisions with both static and moving obstacles, simultaneouslyoptimizing speed and steering, and minimizing control effort.This algorithm uses nonlinear model predictive control, wherethe underling trajectory optimization problem is approximatedusing a simultaneous method. Results collected in an unknownenvironment show that the proposed planning algorithm workswell with a perception algorithm to follow a person in unevengrass near obstacles and over ditches and curbs, and on asphaltover train-tracks and near buildings and cars. Overall, theresults indicate that the proposed algorithm can safely followa person in unknown, dynamic environment
Maneuvering in dense traffic is a challenging task for autonomous vehicles because it requires reasoning about the stochastic behaviors of many other participants. In addition, the agent must achieve the maneuver within a limited time and distance. In this work, we propose a combination of reinforcement learning and game theory to learn merging behaviors. We design a training curriculum for a reinforcement learning agent using the concept of level-k behavior. This approach exposes the agent to a broad variety of behaviors during training, which promotes learning policies that are robust to model discrepancies. We show that our approach learns more efficient policies than traditional training methods.
This paper introduces an accurate nonlinear model predictive control-based algorithm for trajectory following. For accuracy, the algorithm incorporates both the planned state and control trajectories into its cost functional. Current following algorithms do not incorporate control trajectories into their cost functionals. Comparisons are made against two trajectory following algorithms, where the trajectory planning problem is to safely follow a person using an automated ATV with control delays in a dynamic environment while simultaneously optimizing speed and steering, minimizing control effort, and minimizing the time-to-goal. Results indicate that the proposed algorithm reduces collisions, tracking error, orientation error, and time-to-goal. Therefore, tracking the control trajectories with the trajectory following algorithm helps the vehicle follow the planned state trajectories more accurately, which ultimately improves safety, especially in dynamic environments
This paper presents an online smooth-path lane-change control framework. We focus on dense traffic where inter-vehicle space gaps are narrow, and cooperation with surroundingdrivers is essential to achieve the lane-change maneuver. Wepropose a two-stage control framework that harmonizes ModelPredictive Control (MPC) with Generative Adversarial Networks(GAN) by utilizing driving intentions to generate smooth lane-change maneuvers. To improve performance in practice, thesystem is augmented with an adaptive safety boundary and aKalman Filter to mitigate sensor noise. Simulation studies are in-vestigated in different levels of traffic density and cooperativenessof other drivers. The simulation results support the effectiveness,driving comfort, and safety of the proposed method.
A variety of cooperative multi-agent control problems require agents to achieve individual goals while contributing to collective success. This multi-goal multiagent setting poses difficulties for recent algorithms, which primarily target settings with a single global reward, due to two new challenges: efficient exploration for learning both individual goal attainment and cooperation for others’ success, and credit-assignment for interactions between actions and goals of different agents. To address both challenges, we restructure the problem into a novel two-stage curriculum, in which single-agent goal attainment is learned prior to learning multi-agent cooperation, and we derive a new multi-goal multi-agent policy gradient with a credit function for localized credit assignment. We use a function augmentation scheme to bridge value and policy functions across the curriculum. The complete architecture, called CM3, learns significantly faster than direct adaptations of existing algorithms on three challenging multi-goal multi-agent problems: cooperative navigation in difficult formations, negotiating multi-vehicle lane changes in the SUMO traffic simulator, and strategic cooperation in a Checkers environment.
The Tactical Driver Behavior modeling problem requires an understanding of driver actions in complicated urban scenarios from rich multimodal signals including video, LiDAR and CAN signal data streams. However, the majority of deep learning research is focused either on learning the vehicle/environment state (sensor fusion) or the driver policy (from temporal data), but not both. Learning both tasks jointly offers the richest distillation of knowledge but presents challenges in the formulation and successful training. In this work, we propose promising first steps in this direction. Inspired by the gating mechanisms in Long ShortTerm Memory units (LSTMs), we propose Gated Recurrent Fusion Units (GRFU) that learn fusion weighting and temporal weighting simultaneously. We demonstrate it’s superior performance over multimodal and temporal baselines in supervised regression and classification tasks, all in the realm of autonomous navigation. On tactical driver behavior classification using Honda Driving Dataset (HDD), we report 10% improvement in mean Average Precision (mAP) score, and similarly, for steering angle regression on TORCS dataset, we note a 20% drop in Mean Squared Error (MSE) over the state-of-the-art
Decision making in dense traffic can be challenging for autonomous vehicles. An autonomous system only relying on predefined road priorities and considering other drivers as moving objects will cause the vehicle to freeze and fail the maneuver. Human drivers leverage the cooperation of other drivers to avoid such deadlock situations and convince others to change their behavior. Decision making algorithms must reason about the interaction with other drivers and anticipate a broad range of driver behaviors. In this work, we present a reinforcement learning approach to learn how to interact with drivers with different cooperation levels. We enhanced the performance of traditional reinforcement learning algorithms by maintaining a belief over the level of cooperation of other drivers. We show that our agent successfully learns how to navigate a dense merging scenario with less deadlocks than with online planning methods.
Dense urban traffic environments can produce situations where accurate prediction and dynamic models are insufficient for successful autonomous vehicle motion planning. We investigate how an autonomous agent can safely negotiate with other traffic participants, enabling the agent to handle potential deadlocks. Specifically we consider merges where the gap between cars is smaller than the size of the ego vehicle. We propose a game theoretic framework capable of generating and responding to interactive behaviors. Our main contribution is to show how game-tree decision making can be executed by an autonomous vehicle, including approximations and reasoning that make the tree-search computationally tractable. Additionally, to test our model we develop a stochastic rule-based traffic agent capable of generating interactive behaviors that can be used as a benchmark for simulating traffic participants in a crowded merge setting.
Navigating urban environments represents a complex task for automated vehicles. They must reach their goal safely and efficiently while considering a multitude of traffic participants. We propose a modular decision making algorithm to autonomously navigate intersections, addressing challenges of existing rule-based and reinforcement learning (RL) approaches. We first present a safe RL algorithm relying on a model-checker to ensure safety guarantees. To make the decision strategy robust to perception errors and occlusions, we introduce a belief update technique using a learning based approach. Finally, we use a scene decomposition approach to scale our algorithm to environments with multiple traffic participants. We empirically demonstrate that our algorithm outperforms rule-based methods and reinforcement learning techniques on a complex intersection scenario.
We present a multi-agent reinforcement learning algorithm that is a simple, yet effective modification of a known algorithm. External agents are modeled as a time-varying environment, whose policy parameters are updated periodically at a slower rate than the planner to make learning stable and more efficient. Replay buffer, which is used to store the experiences, is also reset with the same large period to draw samples from a fixed environment. This enables us to address challenging cooperative control problems in highway navigation. The resulting Multi-agent Reinforcement Learning with Periodic Parameter Sharing (MARL-PPS) algorithm outperforms the baselines in multi-agent highway scenarios we tested.
Decomposition methods have been proposed to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks considering each individual entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong assumptions of independence between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem where multiple boats must coordinate to maximize their catch over time as well as on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods can scale to multiple boats or pedestrians by using strategies involving one entity. We verify empirically that the proposed correction method significantly improves the decomposition method and outperforms a policy trained on the full scale problem without utility decomposition.
Providing an efficient strategy to navigate safely through unsignaled intersections is a difficult task that requires determining the intent of other drivers. We explore the effectiveness of Deep Reinforcement Learning to handle intersection problems. Using recent advances in Deep RL, we are able to learn policies that surpass the performance of a commonly-used heuristic approach in several metrics including task completion time and goal success rate, and have limited ability to generalize. We then explore a system’s ability to learn active sensing behaviors to enable navigating safely in the case of occlusions. Our analysis, provides insight into the intersection handling problem, the solutions learned by the network point out several shortcomings of current rule-based methods, and the failures of our current deep reinforcement learning system point to future research directions.
Deep reinforcement learning has emerged as a powerful tool for a variety of learning tasks, however, deep nets typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences will be stored: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space. We show that distribution matching successfully prevents catastrophic forgetting, and is consistently the best approach on all domains tested. While distribution matching has better and more consistent performance, we identify one case in which coverage maximization is beneficial - when tasks that receive less trained are more important. Overall, our results show that selective experience replay, when suitable selection algorithms are employed, can prevent catastrophic forgetting.
