Research Intern: Vision-Language-Action and World Models for Autonomous Systems

Job Number: P25INT-43

Honda Research Institute USA (HRI-US) is seeking a highly motivated intern to join its intelligent autonomy and AI research efforts. This role focuses on advancing approaches for handling rare, long-tail scenarios in autonomous driving by exploring complementary modeling paradigms. The candidate will work with modern multimodal and predictive modeling techniques, including vision-language(-action) models and world modeling approaches, to better understand and represent complex real-world situations. The work will contribute to improving the robustness, interpretability, and reliability of intelligent autonomous systems.

San Jose, CA

Key Responsibilities

Develop multimodal and predictive models, including vision-language(-action) and world models, using post-training (e.g., fine-tuning) to improve performance in rare, safety-critical scenarios.
Curate and preprocess datasets from public benchmarks with a focus on long-tail and edge-case conditions.
Design experiments to evaluate model behavior in complex scenarios and analyze results to identify strengths, limitations, failure modes, and potential improvements in rare-event settings.
Collaborate with cross-functional teams to align research direction and technical goals.
Contribute to a portfolio of patents, academic publications, and prototypes to demonstrate research value.

Minimum Qualifications

M.S. in Computer Science, Electrical Engineering, Robotics, Artificial Intelligence, Machine Learning, or a related field.
Strong background in machine learning, deep learning, or multimodal AI, including experience with vision-language(-action) models and/or world models.
Experience with model training, fine-tuning, or large-scale data processing.
Proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow).
Strong written and verbal communication skills, with the ability to present technical ideas and results clearly to diverse audiences.

Bonus Qualifications

Ph.D. in Computer Science, Electrical Engineering, Robotics, Artificial Intelligence, Machine Learning, or a related field.
Familiarity with autonomous systems, robotics, or mobility-related datasets.
Experience with parameter-efficient training methods (e.g., LoRA, adapters).
Exposure to long-tail/edge-case analysis or safety-critical systems.
Strong analytical and problem-solving skills for diagnosing model behavior.
Publication record in top-tier conferences (e.g., CVPR, ICCV, ECCV, WACV, NeurIPS, ICLR).

Years of Work Experience Required	0
Desired Start Date	8/31/2026
Internship Duration	3 Months
Position Keywords	Multimodal learning, vision-language(-action) models, world models, long-tail scenarios, autonomous driving

Alternate Way to Apply

Send an e-mail to careers@honda-ri.com with the following:
- Subject line including the job number(s) you are applying for
- Recent CV
- A cover letter highlighting relevant background (Optional)

Please do not contact our office to inquire about your application status.
