Research Intern: Action Understanding Using Narration

Job Number: P25INT-41

Honda Research Institute USA (HRI-US) is seeking a highly motivated and independent PhD research intern to join our team in advancing the frontiers of human action understanding and computer vision in long procedural videos. The project focuses on learning from narration as supervision for downstream tasks such as action segmentation and proficiency estimation in procedural videos. This role is ideal for a researcher with a strong background in video understanding and vision-language models. The intern will work on real-world challenges involving long-horizon human activity videos, and contribute to high-impact publications and patents.

San Jose, CA

Key Responsibilities

Conduct cutting-edge research in learning from narration to recognize actions or evaluate proficiency in long procedural videos.
Design and implement novel algorithms for aligning video and text representations or for training multimodal models using open-source language models.
Perform literature review, formulate hypotheses, run experiments, and analyze results.
Lead or contribute to research paper writing, including potential submission to top-tier computer vision or machine learning conferences (e.g., CVPR, ICCV, NeurIPS, ECCV).
Write well-structured, efficient code using deep learning frameworks such as PyTorch.

Minimum Qualifications

Currently enrolled in a PhD program in Computer Vision, Machine Learning, Artificial Intelligence, or a closely related field.
Publication record in top-tier conferences (e.g., CVPR, ICCV, ECCV, WACV, NeurIPS, ICLR).
Prior research experience with multimodal language models (e.g., Q-Formers, LoRA, and LLMs), OR vision-language representation alignment (e.g., CLIP).
Excellent programming skills, ability to write reproducible research code, and proficiency in deep learning frameworks, especially PyTorch.
Strong written and verbal communication skills.
Ability to independently drive research, from ideation to experimentation and publication.

Bonus Qualifications

Familiarity with procedural video understanding tasks such as weakly supervised temporal action segmentation.
Experience working with procedural human action datasets (e.g., long-form, untrimmed videos) and associated annotations.

Years of Work Experience Required: 0
Desired Start Date: 8/31/2026
Internship Duration: 3 Months
Position Keywords: Action Understanding, Action Segmentation, Narration, Language Models

Alternate Way to Apply

Send an e-mail to careers@honda-ri.com with the following:
- Subject line including the job number(s) you are applying for
- Recent CV
- A cover letter highlighting relevant background (Optional)

Please do not contact our office to inquire about your application status.
