Rank2Tell Dataset
Introduction
The Rank2Tell dataset was captured from a moving vehicle in highly interactive traffic scenes in the San Francisco Bay Area.
- 116 clips (~20 s each) at 10 FPS, captured using an instrumented vehicle equipped with three Point Grey Grasshopper video cameras with a resolution of 1920 × 1200 pixels, a Velodyne HDL-64E S2 LiDAR sensor, and high-precision GPS.
- Vehicle Controller Area Network (CAN) data is collected for analyzing how drivers manipulate steering, braking, and throttle.
- All sensor data are synchronized and timestamped using ROS and customized hardware and software.
- Includes video-level Q/A, object-level Q/A, LiDAR and 3D bounding boxes (with tracking), fields of view from three cameras (stitched), important-object bounding boxes (multiple important objects per frame, each with one of three importance levels: High, Medium, or Low), free-form captions (multiple captions per object for multiple objects), and ego-car intention.
Annotation Schema
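Each important object in a frame is annotated with an importance level (High, Medium, or Low) and one or more free-form captions. The sketch below shows one way such a record could be represented in Python; the class and field names are illustrative assumptions, not the dataset's actual JSON keys (consult integrated_annotations.json for the real schema).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Importance(Enum):
    """The three importance levels used in Rank2Tell."""
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass
class ImportantObject:
    """Hypothetical per-frame record for one important object.

    Field names are illustrative; see integrated_annotations.json
    in the dataset for the actual schema.
    """
    track_id: int                               # object track ID across frames
    importance: Importance                      # High / Medium / Low
    bbox_2d: Tuple[float, float, float, float]  # (x, y, w, h) in image pixels; format is an assumption
    captions: List[str] = field(default_factory=list)  # multiple free-form captions per object
```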
Dataset Structure
- Processed:
    |---- Rank2Tell_all_obj_vocabulary.json (wordtoidx: word-to-index mapping; idxtoword: index-to-word mapping)
    |---- train_split.txt
    |---- test_split.txt
    |---- val_split.txt
- Scenarios:
    |---- scenario_xxx: sequence of sensor data
          |---- 3_camera_images:
          |     |---- image_*.png (center camera image)
          |     |---- image_left*.png (left camera image)
          |     |---- image_right*.png (right camera image)
          |---- Stitched_camera_frames:
          |     |---- frame_*.png (stitched left, center, and right camera image)
          |---- CAN_data (decoded CAN data):
          |     |---- CAN_yaw_yyy.csv: yaw rate (deg/s)
          |     |---- CAN_vel_yyy.csv: speed (km/h)
          |---- labels [c: center, l: length]:
          |     |---- labels_3d1_yyy.txt: (label, trackerID, state [static/dynamic], c_x, c_y, c_z, l_x, l_y, l_z, yaw) 3D bounding boxes labeled in the Velodyne frame
          |---- pointcloud:
          |     |---- pointcloud1_yyy.ply: full 360-degree point cloud (surfel format; fields: xyz, radius -> intensity, confidence -> ring_number, curvature -> encoder_angle)
          |---- ego_intentions (frame-level intention of the ego car):
          |     |---- scenario_xxx.csv: (frame_num, intention [straight/left/right])
          |---- GPS_data (GPS + IMU data):
          |     |---- gps_yyy.csv: Long_Rel, Lat_Rel, In_Height, Tilt_Roll, Tilt_Pitch, Tilt_Yaw, Vel_x, Vel_y, Vel_z, Std_Dev_x, Std_Dev_y, Std_Dev_z, Std_Dev_roll, Std_Dev_pitch, Std_Dev_yaw, Std_Dev_vel_x, Std_Dev_vel_y, Std_Dev_vel_z, Abs_Lat, Abs_Long
          |---- odom (odometry data):
          |     |---- odom_yyy.txt: translation (tx, ty, tz), rotation (roll, pitch, yaw)
          |---- importance annotations (4W + 1H annotations):
                |---- integrated_annotations.json
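A minimal loading sketch under the layout above. The root path, the helper name, the split-file format, and the pandas-based parsing (including the sniffed delimiter for the label files) are assumptions for illustration; verify column orders and separators against the actual files.

```python
import json
from pathlib import Path

import pandas as pd  # assumed tooling; any CSV reader works

ROOT = Path("Rank2Tell")  # assumed dataset root directory

# Caption vocabulary: word-to-index and index-to-word mappings.
vocab = json.loads((ROOT / "Processed" / "Rank2Tell_all_obj_vocabulary.json").read_text())
idx_to_word = vocab["idxtoword"]

# Scenario splits (assumed format: one scenario ID per line).
train_ids = (ROOT / "Processed" / "train_split.txt").read_text().split()

# Column order for the 3D box labels, taken from the listing above.
BOX_COLS = ["label", "trackerID", "state",
            "c_x", "c_y", "c_z", "l_x", "l_y", "l_z", "yaw"]


def load_scenario(scenario_id: str) -> dict:
    """Collect one scenario's files (hypothetical helper)."""
    scen = ROOT / "Scenarios" / f"scenario_{scenario_id}"
    return {
        # Stitched three-camera frames, sorted by frame number.
        "frames": sorted((scen / "Stitched_camera_frames").glob("frame_*.png")),
        # Ego speed from decoded CAN data (km/h).
        "speed": [pd.read_csv(f) for f in sorted((scen / "CAN_data").glob("CAN_vel_*.csv"))],
        # 3D boxes in the Velodyne frame; delimiter sniffed since it is unspecified.
        "boxes": [pd.read_csv(f, names=BOX_COLS, sep=None, engine="python")
                  for f in sorted((scen / "labels").glob("labels_3d1_*.txt"))],
        # Frame-level ego intention: straight / left / right.
        "intention": pd.read_csv(scen / "ego_intentions" / f"scenario_{scenario_id}.csv"),
        # 4W + 1H importance annotations (directory name as listed above).
        "annotations": json.loads(
            (scen / "importance annotations" / "integrated_annotations.json").read_text()),
    }
```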
Download the dataset
This dataset accompanies the paper "Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning". In the current release, the data is available to researchers from universities. Use this link to submit a download request.
Citation
@inproceedings{sachdeva2024rank2tell,
  title={{Rank2Tell}: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning},
  author={Sachdeva, Enna and Agarwal, Nakul and Chundi, Suhas and Roelofs, Sean and Li, Jiachen and Kochenderfer, Mykel and Choi, Chiho and Dariush, Behzad},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={7513--7522},
  year={2024}
}