VisuoTactile 6D Pose Estimation of an In-Hand Object using Vision and Tactile Sensor Data
Robotics and Automation Letters (RA-L)
Knowledge of the 6D pose of an object can benefit in-hand object manipulation. Existing 6D pose estimation methods rely on vision data. In-hand 6D object pose estimation is challenging because of the heavy occlusion produced by the robot's grippers, which can adversely affect methods that rely on vision data alone. Many robots are equipped with tactile sensors at their fingertips that could be used to complement vision data. In this paper, we present a method that uses both tactile and vision data to estimate the pose of an object grasped in a robot's hand. The main challenges of this research include 1) the lack of a standard representation for tactile sensor data, 2) the fusion of sensor data from heterogeneous sources---vision and tactile---and 3) the need for large training datasets. To address these challenges, first, we propose using point clouds to represent the object surfaces that are in contact with the tactile sensors. Second, we present a network architecture based on pixel-wise dense fusion that combines vision and tactile data to estimate the 6D pose of an object. Third, we extend NVIDIA's Deep Learning Dataset Synthesizer to produce synthetic photo-realistic vision data and the corresponding tactile point clouds for 11 objects from the YCB Object and Model Set in Unreal Engine 4. We present results of simulated experiments suggesting that using tactile data in addition to vision data improves the 6D pose estimate of an in-hand object. We also present qualitative results of experiments in which we deploy our network on real physical robots, demonstrating successful transfer of a network trained on synthetic data to a real system.
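To make the fusion idea in the abstract concrete, the sketch below illustrates one way a pixel-wise dense-fusion network could combine per-point visual features with a tactile contact point cloud for 6D pose regression. This is a minimal illustration in PyTorch, not the authors' implementation; all module names, feature dimensions, and the quaternion-plus-translation output head are assumptions chosen for clarity.

```python
# Hedged sketch (not the paper's code): pixel-wise dense fusion of per-point
# visual features, object geometry, and tactile contact point clouds for
# 6D pose regression. Names and dimensions are hypothetical.
import torch
import torch.nn as nn


class DenseVisuoTactileFusion(nn.Module):
    """Fuses per-point RGB embeddings, object point-cloud geometry, and a
    tactile contact point cloud into per-point features used to regress a
    6D pose (quaternion rotation + translation) with a confidence score."""

    def __init__(self, rgb_dim=32, geo_dim=64, tac_dim=64):
        super().__init__()
        # 1x1 Conv1d layers act as shared MLPs over the N sampled object points.
        self.geo_mlp = nn.Sequential(
            nn.Conv1d(3, geo_dim, 1), nn.ReLU(),
            nn.Conv1d(geo_dim, geo_dim, 1), nn.ReLU())
        # Tactile contact points are encoded and max-pooled into a global code.
        self.tac_mlp = nn.Sequential(
            nn.Conv1d(3, tac_dim, 1), nn.ReLU(),
            nn.Conv1d(tac_dim, tac_dim, 1), nn.ReLU())
        fused = rgb_dim + geo_dim + tac_dim
        # Per-point pose head: quaternion (4) + translation (3) + confidence (1).
        self.pose_head = nn.Sequential(
            nn.Conv1d(fused, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 8, 1))

    def forward(self, rgb_feat, obj_points, tactile_points):
        # rgb_feat:        (B, rgb_dim, N) color embeddings sampled at the
        #                  pixels corresponding to obj_points
        # obj_points:      (B, 3, N) object points from the depth image
        # tactile_points:  (B, 3, M) contact-surface points from tactile sensors
        geo = self.geo_mlp(obj_points)                                # (B, geo_dim, N)
        tac_global = self.tac_mlp(tactile_points).max(dim=2).values   # (B, tac_dim)
        tac = tac_global.unsqueeze(2).expand(-1, -1, geo.shape[2])    # broadcast to N
        fused = torch.cat([rgb_feat, geo, tac], dim=1)                # dense per-point fusion
        out = self.pose_head(fused)                                   # (B, 8, N)
        quat, trans, conf = out[:, :4], out[:, 4:7], out[:, 7:]
        return quat, trans, conf


if __name__ == "__main__":
    # Example with random tensors: 500 object points, 128 tactile contact points.
    net = DenseVisuoTactileFusion()
    q, t, c = net(torch.randn(2, 32, 500), torch.randn(2, 3, 500), torch.randn(2, 3, 128))
    print(q.shape, t.shape, c.shape)  # (2, 4, 500) (2, 3, 500) (2, 1, 500)
```

The key design point illustrated here is that the tactile point cloud is reduced to a global feature and concatenated with every per-pixel visual/geometric feature, so occluded regions still receive touch-derived information; how the actual network aggregates and weights these streams is described in the paper itself.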