Research Intern: Semantics Detection for Robotics Perception
Job Number: P25INT-20
Honda Research Institute USA (HRI-US) is seeking a self-motivated PhD student to build semantic feature detection for objects and environments (attributes, affordances, relations, and 6-DoF poses) that a vision-language model (VLM) will use to reason about tasks and synthesize robot motions and motion constraints closely tied to object operations. The successful candidate will conduct research spanning perception, representation, and robotics to turn raw sensor data into a scene description that safely guides planning and control of the robot.
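For concreteness, below is a minimal sketch of what such a per-object scene description might look like in Python. Every class and field name is an illustrative assumption for this posting, not HRI-US's internal representation.

# Hypothetical scene-description containers; names are illustrative only.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ObjectSemantics:
    """Semantic features for one detected object."""
    label: str                # open-vocabulary category, e.g. "mug"
    attributes: list[str]     # e.g. ["ceramic", "full"]
    affordances: list[str]    # e.g. ["graspable", "pourable"]
    relations: dict[str, str] # e.g. {"on_top_of": "table_1"}
    pose: np.ndarray          # 4x4 homogeneous transform (6-DoF pose)
    confidence: float = 1.0


@dataclass
class SceneDescription:
    """Scene-level container a VLM could consume to derive motion goals/constraints."""
    objects: dict[str, ObjectSemantics] = field(default_factory=dict)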
San Jose, CA
Key Responsibilities
- Design and develop software modules for semantics, perception, and constraint reasoning.
- Deploy and demonstrate the task on robot hardware by integrating accelerated inference, ROS 1/2, and planning stacks.
- Document findings and contribute to internal research reports.
- Contribute to the patent portfolio and, when applicable, publish research results at top-tier conferences and journals in robotics and machine learning.
Minimum Qualifications
- Currently pursuing a Ph.D. in machine learning, robotics, computer science, or a related field.
- Strong Python and deep learning fundamentals; solid computer-vision foundations.
- Hands-on experience training a vision model (detection/segmentation/pose/…) on a custom dataset.
- Basic 3D perception (camera models, calibration, transforms) and experience with RGB-D and point clouds; a back-projection sketch follows this list.
- Proficiency in Linux, Git, PyTorch, OpenCV, and reproducible experimentation.
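As a minimal sketch of the 3D-perception basics named above, the following lifts a metric depth image to a camera-frame point cloud, assuming a pinhole camera model with known intrinsics (fx, fy, cx, cy); the intrinsic values in the example are placeholders, not a real calibration.

import numpy as np


def backproject_depth(depth: np.ndarray, fx: float, fy: float,
                      cx: float, cy: float) -> np.ndarray:
    """Lift a metric depth image (H, W) to an (N, 3) point cloud in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx            # standard pinhole back-projection
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels


# Example with synthetic data and placeholder intrinsics:
depth = np.full((480, 640), 1.5)     # 1.5 m everywhere
cloud = backproject_depth(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)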
Bonus Qualifications
- 6-DoF pose & tracking: Experience with FoundationPose/DOPE/CenterPose/GenPose++ or similar; multi-object tracking (e.g., ByteTrack/BoT-SORT) and temporal filtering (EKF); a filtering sketch follows this list.
- Affordances & parts detection: Familiarity with, or experience in, methods for detecting graspable regions, handles, openings, containers, and tools; articulated objects (hinge/prismatic joints).
- Open-vocabulary vision: Experience with CLIP/OWL-ViT/Grounding DINO; segmentation with SAM/Mask2Former; tuning with few-shot/adaptive labeling.
- VLM/robotics interface: Familiarity with VLM prompting/tool-use and turning semantics into motion goals/constraints; basics of motion planning/control (MoveIt, constraint specs, diffusion/learned policies).
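As a minimal sketch of the temporal filtering named in the pose & tracking item, here is a constant-velocity Kalman filter (the linear special case of the EKF) that smooths noisy per-frame object-position estimates; the timestep and noise magnitudes are illustrative assumptions.

import numpy as np


class ConstantVelocityKF:
    def __init__(self, dt: float = 1 / 30, q: float = 1e-3, r: float = 1e-2):
        self.x = np.zeros(6)                    # state: [px, py, pz, vx, vy, vz]
        self.P = np.eye(6)                      # state covariance
        self.F = np.eye(6)                      # constant-velocity transition
        self.F[:3, 3:] = dt * np.eye(3)
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # observe position only
        self.Q = q * np.eye(6)                  # process noise
        self.R = r * np.eye(3)                  # measurement noise

    def step(self, z: np.ndarray) -> np.ndarray:
        # Predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]                       # smoothed position


# Example: smooth a noisy stream of detections around a fixed point.
kf = ConstantVelocityKF()
for z in np.random.normal([0.5, 0.0, 1.0], 0.02, size=(10, 3)):
    smoothed = kf.step(z)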
Years of Work Experience Required: 0
Desired Start Date: 5/11/2026
Internship Duration: 3 Months
Position Keywords: Semantic perception, affordance detection, Vision Language Model (VLM), teleoperation/robotic autonomy
Alternate Way to Apply
Send an e-mail to careers@honda-ri.com with the following:
- Subject line including the job number(s) you are applying for
- Recent CV
- A cover letter highlighting relevant background (optional)
Please do not contact our office to inquire about your application status.