[Computer Vision] Video Captioning in Traffic Scenes - HRI-US

Job Number: P20INT-38
This title covers multiple positions focused on developing computer vision and machine learning algorithms that generate natural language descriptions of traffic scene events relevant to the development of advanced driver assistance systems.
San Jose, CA

You are expected to: 

  • Use video inputs to generate natural language descriptions of important or unstructured traffic scene events that affect the driver's decision-making and motion planning strategies.
  • Participate in creating a dataset to support activities in video-based captioning of traffic scenes.
  • Develop and evaluate metrics to verify the reliability of the proposed algorithms.
  • Contribute to a portfolio of patents, academic publications, and prototypes to demonstrate research value.

Qualifications:

  • Ph.D. or highly qualified M.S. candidate in computer science, electrical engineering, or related field.
  • Strong familiarity with computer vision and machine learning techniques pertaining to video captioning.
  • Excellent programming skills in Python or C++.

Bonus Qualifications:

  • Familiarity with creating datasets for video captioning, including visual question answering (VQA) methods, is preferred for one position.
  • Experience with open-source deep learning frameworks such as TensorFlow or PyTorch is preferred.

Duration: 3 months

How to apply

Candidates must have the legal right to work in the U.S.A. Please submit your cover letter and CV together in a single document.