Rank2Tell Dataset
Introduction
The Rank2Tell dataset was captured from a moving vehicle in highly interactive traffic scenes in the San Francisco Bay Area.
- 116 clips (~20 s each) at 10 FPS, captured using an instrumented vehicle equipped with three Point Grey Grasshopper video cameras with a resolution of 1920 × 1200 pixels, a Velodyne HDL-64E S2 LiDAR sensor, and high-precision GPS.
- Vehicle Controller Area Network (CAN) data is collected for analyzing how drivers manipulate steering, braking, and throttle.
- All sensor data are synchronized and timestamped using ROS and customized hardware and software.
- Includes video-level Q/A, object-level Q/A, LiDAR and 3D bounding boxes (with tracking), fields of view from three cameras (stitched), important-object bounding boxes (multiple important objects per frame, each with one of three importance levels: High, Medium, or Low), free-form captions (multiple captions per object for multiple objects), and ego-car intention.
Annotation Schema
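Each important object in a frame is annotated with an importance level (High, Medium, or Low) and one or more free-form captions. The sketch below shows one way such a record could be represented in Python; the class and field names are illustrative assumptions, not the dataset's actual JSON keys (consult integrated_annotations.json for the real schema).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Importance(Enum):
    """The three importance levels used in Rank2Tell."""
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass
class ImportantObject:
    """Hypothetical per-frame record for one important object.

    Field names are illustrative; see integrated_annotations.json
    in the dataset for the actual schema.
    """
    track_id: int                               # object track ID across frames
    importance: Importance                      # High / Medium / Low
    bbox_2d: Tuple[float, float, float, float]  # (x, y, w, h) in image pixels; format is an assumption
    captions: List[str] = field(default_factory=list)  # multiple free-form captions per object
```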
Dataset Structure
- Processed:
    |---- Rank2Tell_all_obj_vocabulary.json (wordtoidx: word-to-index mapping; idxtoword: index-to-word mapping)
    |---- train_split.txt
    |---- test_split.txt
    |---- val_split.txt
- Scenarios:
    |---- scenario_xxx: sequence of sensor data
          |---- 3_camera_images:
          |     |---- image_*.png (center camera image)
          |     |---- image_left*.png (left camera image)
          |     |---- image_right*.png (right camera image)
          |---- Stitched_camera_frames:
          |     |---- frame_*.png (stitched left, center, and right camera image)
          |---- CAN_data (decoded CAN data):
          |     |---- CAN_yaw_yyy.csv: yaw rate (deg/s)
          |     |---- CAN_vel_yyy.csv: speed (km/h)
          |---- labels [c: center, l: length]:
          |     |---- labels_3d1_yyy.txt: (label, trackerID, state [static/dynamic], c_x, c_y, c_z, l_x, l_y, l_z, yaw) 3D bounding boxes labeled in the Velodyne frame
          |---- pointcloud:
          |     |---- pointcloud1_yyy.ply: full 360-degree point cloud (surfel format; fields: xyz, radius -> intensity, confidence -> ring_number, curvature -> encoder_angle)
          |---- ego_intentions (frame-level intention of the ego car):
          |     |---- scenario_xxx.csv: (frame_num, intention [straight/left/right])
          |---- GPS_data (GPS + IMU data):
          |     |---- gps_yyy.csv: Long_Rel, Lat_Rel, In_Height, Tilt_Roll, Tilt_Pitch, Tilt_Yaw, Vel_x, Vel_y, Vel_z, Std_Dev_x, Std_Dev_y, Std_Dev_z, Std_Dev_roll, Std_Dev_pitch, Std_Dev_yaw, Std_Dev_vel_x, Std_Dev_vel_y, Std_Dev_vel_z, Abs_Lat, Abs_Long
          |---- odom (odometry data):
          |     |---- odom_yyy.txt: translation (tx, ty, tz), rotation (roll, pitch, yaw)
          |---- importance annotations (4W + 1H annotations):
                |---- integrated_annotations.json
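A minimal loading sketch under the layout above. The root path, the helper name, the split-file format, and the pandas-based parsing (including the sniffed delimiter for the label files) are assumptions for illustration; verify column orders and separators against the actual files.

```python
import json
from pathlib import Path

import pandas as pd  # assumed tooling; any CSV reader works

ROOT = Path("Rank2Tell")  # assumed dataset root directory

# Caption vocabulary: word-to-index and index-to-word mappings.
vocab = json.loads((ROOT / "Processed" / "Rank2Tell_all_obj_vocabulary.json").read_text())
idx_to_word = vocab["idxtoword"]

# Scenario splits (assumed format: one scenario ID per line).
train_ids = (ROOT / "Processed" / "train_split.txt").read_text().split()

# Column order for the 3D box labels, taken from the listing above.
BOX_COLS = ["label", "trackerID", "state",
            "c_x", "c_y", "c_z", "l_x", "l_y", "l_z", "yaw"]


def load_scenario(scenario_id: str) -> dict:
    """Collect one scenario's files (hypothetical helper)."""
    scen = ROOT / "Scenarios" / f"scenario_{scenario_id}"
    return {
        # Stitched three-camera frames, sorted by frame number.
        "frames": sorted((scen / "Stitched_camera_frames").glob("frame_*.png")),
        # Ego speed from decoded CAN data (km/h).
        "speed": [pd.read_csv(f) for f in sorted((scen / "CAN_data").glob("CAN_vel_*.csv"))],
        # 3D boxes in the Velodyne frame; delimiter sniffed since it is unspecified.
        "boxes": [pd.read_csv(f, names=BOX_COLS, sep=None, engine="python")
                  for f in sorted((scen / "labels").glob("labels_3d1_*.txt"))],
        # Frame-level ego intention: straight / left / right.
        "intention": pd.read_csv(scen / "ego_intentions" / f"scenario_{scenario_id}.csv"),
        # 4W + 1H importance annotations (directory name as listed above).
        "annotations": json.loads(
            (scen / "importance annotations" / "integrated_annotations.json").read_text()),
    }
```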
Download the dataset
This dataset accompanies the paper "Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning". In the current release, the data is available to researchers from universities. Use this link to submit a download request.
Citation
@inproceedings{sachdeva2024rank2tell,
  title={{Rank2Tell}: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning},
  author={Sachdeva, Enna and Agarwal, Nakul and Chundi, Suhas and Roelofs, Sean and Li, Jiachen and Kochenderfer, Mykel and Choi, Chiho and Dariush, Behzad},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={7513--7522},
  year={2024}
}