DRAMA Dataset

The DRAMA dataset is captured from a moving vehicle in highly interactive urban traffic scenes in Tokyo.

Introduction

  • 17,785 scenario clips captured using a SEKONIX SF332X-10X video camera (30 Hz frame rate, 1928 × 1280 resolution, 60° H-FOV) and a GoPro Hero 7 camera (60 Hz frame rate, 2704 × 1520 resolution, 118.2° H-FOV), each clipped to 2 seconds in duration.
  • The videos are synchronized with the Controller Area Network (CAN) signals and Inertial Measurement Unit (IMU) information.
  • The videos are filtered based on the ego-driver’s behavioral response to external situations or events that activate braking of the vehicle.
  • The dataset contains several types of annotation: video-level Q/A, object-level Q/A, risk object bounding box, free-form caption, and separate labels for ego-car intention, scene classifier, and suggestions to the driver.
  • The 17,066 risk scenarios contain 12,273 vehicles, 3,344 pedestrians/cyclists, and 1,449 infrastructure objects.
  • The free-form descriptions of reasoning comprise 992 unique words with 306,708 total occurrences.

Data Format

DATA:

----raw data----

combined/
    |-subfolders/flow_xxxxxx.png (flow image)
    |-subfolders/frame_xxxxxx.png (raw camera image)
    |-subfolders/movie.gif (video using the images)

`integrated_output_v2.json`: all the annotations for the data in `combined/` folder
`data_gen.py`: data-loader script used to create the processed data using `combined/` folder and `integrated_output_v2.json`
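
As an illustration of how the raw layout above could be traversed, the following is a minimal sketch that pairs each raw camera image with its flow image inside one clip subfolder. It assumes that the `xxxxxx` part of `frame_xxxxxx.png` and `flow_xxxxxx.png` is a numeric index shared by the two files; the function name `pair_frames_and_flows` is hypothetical and is not part of `data_gen.py`.

```python
import re
from pathlib import Path

# Assumed naming, following the layout above: frame_<index>.png and flow_<index>.png
_NAME = re.compile(r"^(frame|flow)_(\d+)\.png$")


def pair_frames_and_flows(clip_dir):
    """Return sorted (index, frame_path, flow_path) tuples for one clip subfolder.

    Indices that have only a frame or only a flow image are skipped,
    and unrelated files such as movie.gif are ignored.
    """
    found = {}
    for path in Path(clip_dir).iterdir():
        match = _NAME.match(path.name)
        if match:
            kind, index = match.group(1), int(match.group(2))
            found.setdefault(index, {})[kind] = path
    return [
        (index, kinds["frame"], kinds["flow"])
        for index, kinds in sorted(found.items())
        if "frame" in kinds and "flow" in kinds
    ]
```

Calling `pair_frames_and_flows` on a single subfolder of `combined/` would then give the frame/flow pairs in index order, under the naming assumption stated above.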

----processed data structure----

processed/:
    |-train/xxxxxx.pkl
    |-val/yyyyyy.pkl
    |-test/zzzzzz.pkl
    |-wordtoix.pkl (word to index mapping)
    |-ixtoword.pkl (index to word mapping)

Every sample .pkl file contains a dictionary with the keys {img, flow_img, enc_caption, caption_len, bbox, dims}.
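
The sketch below shows one way a processed sample could be inspected with the standard `pickle` module. The key names come from the description above; everything about the value types is an assumption, in particular that `enc_caption` is a sequence of integer word indices, that `caption_len` is the number of valid tokens in it, and that `ixtoword.pkl` holds a dictionary from index to word. The helper names `load_pickle`, `check_sample`, and `decode_caption` are hypothetical.

```python
import pickle

# Keys listed in the processed data description
EXPECTED_KEYS = {"img", "flow_img", "enc_caption", "caption_len", "bbox", "dims"}


def load_pickle(path):
    """Load one pickle file. Only unpickle files from a source you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def check_sample(sample):
    """Return the sorted list of expected keys missing from a sample dictionary."""
    return sorted(EXPECTED_KEYS - set(sample))


def decode_caption(enc_caption, caption_len, ixtoword):
    """Map encoded word indices back to a caption string.

    Assumption: enc_caption holds integer indices and may be padded
    beyond caption_len, so only the first caption_len entries are used.
    """
    indices = list(enc_caption)[: int(caption_len)]
    return " ".join(ixtoword[int(ix)] for ix in indices)
```

With the release downloaded, a sample could be read with `load_pickle("processed/train/xxxxxx.pkl")` and its caption recovered with `decode_caption(sample["enc_caption"], sample["caption_len"], load_pickle("processed/ixtoword.pkl"))`, provided the assumptions above hold for the actual files.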

Citation

This dataset corresponds to the paper, 'DRAMA: Joint Risk Localization and Captioning in Driving', as it appears in the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023. In the current release, the data is available for researchers from universities.

Please cite the following paper if you find this work useful:

@inproceedings{malla2023drama,
  title={DRAMA: Joint Risk Localization and Captioning in Driving},
  author={Malla, Srikanth and Choi, Chiho and Dwivedi, Isht and Choi, Joon Hee and Li, Jiachen},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={1043--1052},
  year={2023}
}

 

Download the dataset

The dataset is available for non-commercial use. To request it, you must be affiliated with a university and use your university email address. Use this link to make the download request.