Multimodal Spatial Cognition Intern

Job Number: P24INT-10

We are seeking a highly motivated and intellectually curious Multimodal Spatial Cognition Intern to join our team. In this role, you will contribute to the development of multi-modal foundation models that improve the understanding of 3D environments and spatial reasoning. You will work closely with our team of AI researchers and engineers to explore and implement solutions that integrate diverse data sources (e.g., images, point clouds, depth maps, motion, audio, and language) for better scene understanding and spatial perception. The ideal candidate will have a background in computer science, machine learning, and 3D computer vision, along with a keen interest in advancing spatial cognition using multi-modal AI models.

San Jose, CA

Key Responsibilities

Model Development: Assist in the development, training, and evaluation of multi-modal AI models that combine visual, spatial, and contextual information for understanding 3D scenes.
Data Integration: Work with large, multi-modal datasets including images, depth maps, LiDAR scans, and sensor data, applying techniques to enhance spatial reasoning and scene reconstruction.
Research and Experimentation: Conduct literature reviews and support experimental design to advance state-of-the-art methods for spatial cognition and perception tasks such as object recognition, 3D scene segmentation, and spatial relations.
Performance Analysis: Analyze the performance of multi-modal models and propose improvements, such as model fine-tuning or optimizing for real-time applications.
Documentation & Reporting: Write technical reports, document experimental results, and present findings to the team.

Minimum Qualifications

Currently pursuing or recently completed a PhD degree in Computer Science, Artificial Intelligence, Robotics, or a related field.
Extensive research background in computer science, deep learning, and 3D computer vision.
Strong programming skills in Python, with experience in machine learning libraries such as PyTorch, TensorFlow, or similar.
Experience with multi-modal data fusion, including images, point clouds, depth data, audio, motion, and other sensor data.

Years of Work Experience Required	0
Desired Start Date	5/5/2025
Internship Duration	3 Months
Position Keywords	Multimodal Foundation Models, Scene Segmentation, Computer Vision

Alternate Way to Apply

Send an e-mail to careers@honda-ri.com with the following:
- Subject line including the job number(s) you are applying for
- Recent CV
- A cover letter highlighting relevant background (Optional)

Please do not contact our office to inquire about your application status.

Multimodal Spatial Cognition Intern - Honda Research Institute USA