Research Intern: Vision-Language-Action and World Models for Autonomous Systems - Honda Research Institute USA

Job Number: P25INT-43
Honda Research Institute USA (HRI-US) is seeking a highly motivated intern to join its intelligent autonomy and AI research efforts. This role focuses on advancing approaches for handling rare, long-tail scenarios in autonomous driving by exploring complementary modeling paradigms. The candidate will work with modern multimodal and predictive modeling techniques, including vision-language(-action) models and world modeling approaches, to better understand and represent complex real-world situations. The work will contribute to improving the robustness, interpretability, and reliability of intelligent autonomous systems.
San Jose, CA

Key Responsibilities

  • Develop multimodal and predictive models, including vision-language(-action) and world models, using post-training methods (e.g., fine-tuning) to improve performance in rare, safety-critical scenarios.
  • Curate and preprocess datasets from public benchmarks with a focus on long-tail and edge-case conditions.
  • Design experiments to evaluate model behavior in complex scenarios and analyze results to identify strengths, limitations, failure modes, and potential improvements in rare-event settings.  
  • Collaborate with cross-functional teams to align research direction and technical goals.
  • Contribute to a portfolio of patents, academic publications, and prototypes to demonstrate research value.

Minimum Qualifications

  • M.S. in Computer Science, Electrical Engineering, Robotics, Artificial Intelligence, Machine Learning, or a related field.
  • Strong background in machine learning, deep learning, or multimodal AI, including experience with vision-language(-action) models and/or world models. 
  • Experience with model training, fine-tuning, or large-scale data processing.
  • Proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow).      
  • Strong written and verbal communication skills, with the ability to present technical ideas and results clearly to diverse audiences.

Bonus Qualifications

  • Ph.D. in Computer Science, Electrical Engineering, Robotics, Artificial Intelligence, Machine Learning, or a related field. 
  • Familiarity with autonomous systems, robotics, or mobility-related datasets.
  • Experience with parameter-efficient training methods (e.g., LoRA, adapters).
  • Exposure to long-tail/edge-case analysis or safety-critical systems.
  • Strong analytical and problem-solving skills for diagnosing model behavior. 
  • Publication record in top-tier conferences (e.g., CVPR, ICCV, ECCV, WACV, NeurIPS, ICLR).

Years of Work Experience Required: 0
Desired Start Date: 8/31/2026
Internship Duration: 3 Months
Position Keywords: Multimodal learning, vision-language(-action) models, world models, long-tail scenarios, autonomous driving

Alternate Way to Apply

Send an e-mail to careers@honda-ri.com with the following:
- Subject line including the job number(s) you are applying for
- Recent CV
- A cover letter highlighting relevant background (optional)

Please do not contact our office to inquire about your application status.