Research Intern: Action Understanding Using Narration
Job Number: P25INT-41
Honda Research Institute USA (HRI-US) is seeking a highly motivated and independent PhD research intern to join our team in advancing the frontiers of human action understanding and computer vision in long procedural videos. The project focuses on learning from narration as supervision for downstream tasks such as action segmentation and proficiency estimation in procedural videos. This role is ideal for a researcher with a strong background in video understanding and vision-language models. The intern will work on real-world challenges involving long-horizon human activity videos, and contribute to high-impact publications and patents.
San Jose, CA
Key Responsibilities
- Conduct cutting-edge research in learning from narration to recognize actions or evaluate proficiency in long procedural videos.
- Design and implement novel algorithms for aligning video and text representations or for training multimodal models using open-source language models.
- Perform literature review, formulate hypotheses, run experiments, and analyze results.
- Lead or contribute to research paper writing, including potential submission to top-tier computer vision or machine learning conferences (e.g., CVPR, ICCV, NeurIPS, ECCV).
- Write well-structured, efficient code using deep learning frameworks such as PyTorch.
Minimum Qualifications
- Currently enrolled in a PhD program in Computer Vision, Machine Learning, Artificial Intelligence, or a closely related field.
- Publication record in top-tier conferences (e.g., CVPR, ICCV, ECCV, WACV, NeurIPS, ICLR).
- Prior research experience with multimodal language models (e.g., Q-Formers, LoRA, and LLMs), OR vision-language representation alignment (e.g., CLIP).
- Excellent programming skills, ability to write reproducible research code, and proficiency in deep learning frameworks, especially PyTorch.
- Strong written and verbal communication skills.
- Ability to independently drive research, from ideation to experimentation and publication.
Bonus Qualifications
- Familiarity with procedural video understanding tasks such as weakly supervised temporal action segmentation.
- Experience working with procedural human action datasets (e.g., long-form, untrimmed videos) and associated annotations.
Years of Work Experience Required: 0
Desired Start Date: 8/31/2026
Internship Duration: 3 Months
Position Keywords: Action Understanding, Action Segmentation, Narration, Language Models
Alternate Way to Apply
Send an e-mail to careers@honda-ri.com with the following:
- Subject line including the job number(s) you are applying for
- Recent CV
- A cover letter highlighting relevant background (Optional)
Please do not contact our office to inquire about your application status.