Computer Vision Scientist: Language and Vision

Job Number: P19F04
This position focuses on research problems at the intersection of vision and language for next-generation mobility systems. In particular, the work involves research and development of computer vision and machine learning algorithms for visual scene understanding and for linking visual information to natural language, with applications including video captioning, vision-language navigation (VLN), and visual dialog.
San Jose, CA


Key Responsibilities:

  • Conduct research and development on problems at the intersection of vision and language. The scenes are acquired from a mobile platform and involve a high degree of interaction between agents in the scene and the environment.
  • Design, develop, and integrate software systems and architectures necessary to realize research prototypes
  • Develop and evaluate metrics to verify the reliability of proposed algorithms
  • Participate in data collection, sensor calibration, and data processing
  • Participate in ideation, creation, and evaluation of related technologies in various domains including traffic scenes and indoor robotics
  • Contribute to a portfolio of patents, academic publications, and prototypes to demonstrate research value


Qualifications:

  • Ph.D. or M.S. in computer science, electrical engineering, or a related field
  • Strong familiarity with machine learning techniques at the intersection of vision and language
  • Familiarity with scene modeling and interpretation using spatiotemporal graphs, scene graphs, graph convolutional networks, or similar graphical modeling techniques
  • Familiarity with automatic generation of natural language descriptions from images and videos, and/or with visual question answering and visual dialog research, preferred
  • Experience with open-source deep learning frameworks such as TensorFlow or PyTorch preferred
  • Highly proficient in software engineering using C++ and Python
  • Hands-on experience in handling multi-modal sensor data preferred
  • Strong written and oral communication skills including development and delivery of presentations, proposals, and technical documents
  • Strong publication record in computer vision or machine learning

How to apply

Candidates must have the legal right to work in the U.S.A. Please include your cover letter and CV in the same document.
