Visual Saliency and Crowdsourcing-based Priors for an In-car Situated Dialog System

Abstract

This paper addresses issues in situated language understanding in a moving car. We propose a reference resolution method to identify user queries about specific target objects in their surroundings. We investigate methods of predicting which target object is likely to be queried given a visual scene, and what kinds of linguistic cues users naturally provide to describe a given target object in a situated environment. We propose methods to incorporate the visual saliency of the visual scene as a prior. Crowdsourced statistics of how people describe an object are also used as a prior. We have collected situated utterances from drivers using our research system, which was embedded in a real vehicle. We demonstrate that the proposed algorithms improve the target identification rate by 15.1%.
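To illustrate the kind of reference resolution the abstract describes, the sketch below scores each candidate object in a scene by combining a visual-saliency prior with crowdsourced statistics of how people describe that object. This is an illustrative toy, not the paper's actual model: the scoring rule, object names, saliency values, and word-frequency tables are all invented for the example.

```python
def resolve_reference(query_words, candidates):
    """Return the name of the candidate object with the highest combined score.

    candidates: list of dicts with keys
      'name'      - object label (hypothetical)
      'saliency'  - visual-saliency prior in [0, 1]
      'desc_freq' - counts of words crowd workers used to describe the object
    """
    def score(obj):
        total = sum(obj['desc_freq'].values()) or 1
        # Likelihood proxy: how often the query's words appear in
        # crowdsourced descriptions of this object.
        overlap = sum(obj['desc_freq'].get(w, 0) for w in query_words) / total
        # Combine the saliency prior with the description-based likelihood
        # (small epsilon keeps salient objects rankable on zero overlap).
        return obj['saliency'] * (overlap + 1e-6)

    return max(candidates, key=score)['name']


# Hypothetical roadside scene with two buildings a driver might ask about.
scene = [
    {'name': 'red_cafe',
     'saliency': 0.7,
     'desc_freq': {'red': 5, 'cafe': 4, 'corner': 2}},
    {'name': 'gray_bank',
     'saliency': 0.3,
     'desc_freq': {'gray': 3, 'bank': 6, 'tall': 2}},
]

print(resolve_reference(['red', 'building'], scene))  # -> red_cafe
```

A query mentioning "bank" would instead select the less salient building, since the crowdsourced description statistics outweigh the saliency prior in that case.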

Details

PUBLISHED IN
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
PUBLICATION DATE
09 Nov 2015
AUTHORS
T. Misu