Research Intern: Tactile-Aware Multi-Modal Foundation Models for Dexterous Manipulation - Honda Research Institute USA


Job Number: P25INT-18
Honda Research Institute USA (HRI-US) is seeking a passionate individual to contribute to the development of tactile-aware multi-modal foundation models for robotic dexterous manipulation. The project investigates how large models can integrate tactile, vision, force, audio, and language modalities to build representations of robot-object interactions, and how those representations can be incorporated into action policies for multi-fingered dexterous manipulation. The successful candidate will explore novel approaches for sensor encoding, cross-modal alignment, and context-driven modality selection, and evaluate these methods in both simulated and real-robot environments.
San Jose, CA

 

Key Responsibilities

 

  • Collaborate with researchers to develop and evaluate novel architectures, and assist in multi-modal dataset generation.
  • Design and implement sensor encoders and cross-modal alignment strategies for heterogeneous data streams.
  • Develop action policies or control heads that directly leverage learned representations to perform dexterous manipulation using multi-fingered hands.
  • Conduct experiments in simulation and real-robot platforms, analyzing model performance and generalization.
  • Document findings and contribute to internal research reports.
  • Contribute to the portfolio of patents and, when applicable, publish research results at top-tier conferences and journals in robotics and machine learning.

 

Minimum Qualifications

 

  • Ph.D. or highly qualified M.S. candidate in computer science, robotics or a related field.
  • Strong background in machine learning and deep learning.
  • Proficiency in Python and at least one ML framework (PyTorch or TensorFlow).
  • Research experience in machine learning, robotics, or computer vision.
  • Familiarity with robotics software frameworks such as ROS.

 

Bonus Qualifications

  • Experience with multi-modal learning or representation learning (e.g., integrating visual, tactile, or language data).
  • Experience with tactile or force/torque sensing and dexterous manipulation.
  • Experience with optical tactile sensors such as DIGIT.
  • Experience with large models (e.g., VLMs, multi-modal transformers, or foundation models) for perception or control.
  • Familiarity with simulation platforms such as Isaac Sim, Isaac Lab, or MuJoCo.
  • Experience with robot learning, multi-modal perception, and large model adaptation.

 

Years of Work Experience Required  0
Desired Start Date 5/11/2026
Internship Duration 3 Months
Position Keywords Robotics, perception, representation learning, vision, tactile, visuotactile, vision-language models, VLM, VLA, VTLA

Alternate Way to Apply

Send an e-mail to careers@honda-ri.com with the following:
- Subject line including the job number(s) you are applying for 
- Recent CV 
- A cover letter highlighting relevant background (Optional)

Please do not contact our office to inquire about your application status.