Deep Tactile Experience for Manipulation
Tactile sensing is inherently contact-based: to use tactile data, robots need to make contact with the surface of an object. To manipulate an object, robots often need to make multiple contacts with it to determine a suitable contact surface, which is inefficient and may damage the object. To reduce the number of direct contacts with the object, we propose a method that, based on past experience, estimates the output of a tactile sensor from depth data of the object's surface.
In this project, we present a method that estimates the output of a tactile sensor from depth sensor data. The novelty of this work lies in the way we use depth sensor data to estimate the tactile sensor output. We present qualitative and quantitative results suggesting that the proposed method can estimate tactile sensor output from depth data, along with results of a study suggesting that reducing point cloud density has a negligible adverse effect on the quality of the tactile sensor estimates.
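At its core, the method can be viewed as an image-to-image mapping from a depth map of the object surface to the expected tactile sensor image. The sketch below shows such a mapping as a small encoder-decoder network in PyTorch; the 64x64 resolution, layer sizes, and the name DepthToTactileNet are illustrative assumptions, not the exact architecture used in this project.

```python
# Minimal sketch of a depth-to-tactile mapping (resolution, layer sizes,
# and class name are illustrative assumptions).
import torch
import torch.nn as nn

class DepthToTactileNet(nn.Module):
    """Encoder-decoder that maps a single-channel depth map of the object
    surface to a 3-channel estimate of the tactile sensor image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), # 16x16 -> 8x8
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),    # 32x32 -> 64x64
            nn.Sigmoid(),  # tactile image pixels in [0, 1]
        )

    def forward(self, depth):
        return self.decoder(self.encoder(depth))

# Example: one 64x64 depth map in, one 3-channel tactile estimate out.
model = DepthToTactileNet()
depth_map = torch.rand(1, 1, 64, 64)
tactile_estimate = model(depth_map)   # shape: (1, 3, 64, 64)
```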
Related Publications
In this letter, we introduce ViHOPE, a novel framework for estimating the 6D pose of an in-hand object using visuotactile perception. Our key insight is that the accuracy of the 6D object pose estimate can be improved by explicitly completing the shape of the object. To this end, we introduce a novel visuotactile shape completion module that uses a conditional Generative Adversarial Network to complete the shape of an in-hand object based on a volumetric representation. This approach improves over prior works that directly regress visuotactile observations to a 6D pose. By explicitly completing the shape of the in-hand object and jointly optimizing the shape completion and pose estimation tasks, we improve the accuracy of the 6D object pose estimate. We train and test our model on a synthetic dataset and compare it with the state-of-the-art. In the visuotactile shape completion task, we outperform the state-of-the-art by 265% using the Intersection over Union (IoU) metric and achieve 88% lower Chamfer Distance. In the visuotactile pose estimation task, we present results that suggest our framework reduces position and angular errors by 35% and 64%, respectively. Furthermore, we ablate our framework to confirm the gain on the 6D object pose estimate from explicitly completing the shape. Ultimately, we show that our framework produces models that are robust to sim-to-real transfer on a real-world robot platform.
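The joint optimization of shape completion and pose estimation described above can be summarized as a single training objective that combines a voxel reconstruction term, an adversarial term from the conditional GAN, and a pose regression term. The sketch below is a hedged illustration of such an objective; the loss weights, the position-plus-quaternion pose format, and the function name joint_loss are assumptions rather than the exact formulation used in ViHOPE.

```python
# Illustrative sketch of a jointly optimized objective
# (weights, pose format, and names are assumptions).
import torch
import torch.nn.functional as F

def joint_loss(pred_voxels, gt_voxels, disc_score_fake,
               pred_pose, gt_pose, w_adv=0.01, w_pose=1.0):
    """Combine voxel shape-completion losses with a 6D pose regression loss.

    pred_voxels / gt_voxels: (B, 1, D, D, D) occupancy grids in [0, 1]
    disc_score_fake:         discriminator logits on completed shapes
    pred_pose / gt_pose:     (B, 7) position + quaternion
    """
    # Reconstruction term: how well the completed shape matches ground truth.
    recon = F.binary_cross_entropy(pred_voxels, gt_voxels)

    # Adversarial term: the generator tries to make completions look real.
    adv = F.binary_cross_entropy_with_logits(
        disc_score_fake, torch.ones_like(disc_score_fake))

    # Pose term: position error plus a quaternion distance, 1 - |q_pred . q_gt|.
    pos_err = F.mse_loss(pred_pose[:, :3], gt_pose[:, :3])
    q_pred = F.normalize(pred_pose[:, 3:], dim=-1)
    q_gt = F.normalize(gt_pose[:, 3:], dim=-1)
    ang_err = (1.0 - (q_pred * q_gt).sum(dim=-1).abs()).mean()

    return recon + w_adv * adv + w_pose * (pos_err + ang_err)
```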
Robotic manipulation, in particular in-hand object manipulation, often requires an accurate estimate of the object's 6D pose. To improve the accuracy of the estimated pose, state-of-the-art approaches in 6D object pose estimation use observational data from one or more modalities, e.g., RGB images, depth, and tactile readings. However, existing approaches make limited use of the underlying geometric structure of the object captured by these modalities, thereby increasing their reliance on visual features. This results in poor performance when presented with objects that lack such visual features or when visual features are simply occluded. Furthermore, current approaches do not take advantage of the proprioceptive information embedded in the position of the fingers. To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that propagates information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. We evaluate our model on a diverse subset of objects from the YCB Object and Model Set, and show that our method substantially outperforms existing state-of-the-art work in accuracy and robustness to occlusion. We also deploy our proposed framework on a real robot and qualitatively demonstrate successful transfer to real settings.
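The hierarchical message passing idea, passing messages first within each modality's graph and then exchanging pooled summaries across modalities, can be sketched as follows. The feature dimensions, mean aggregation, and the class name HierarchicalMessagePassing are illustrative assumptions, not the paper's exact operators.

```python
# Minimal sketch of hierarchical (within- then across-modality) message passing.
# Feature sizes, mean aggregation, and the class name are illustrative assumptions.
import torch
import torch.nn as nn

class HierarchicalMessagePassing(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.within_vision = nn.Linear(2 * dim, dim)  # vision node <- vision neighbors
        self.within_touch = nn.Linear(2 * dim, dim)   # touch node  <- touch neighbors
        self.across = nn.Linear(2 * dim, dim)         # fuse summaries across modalities

    @staticmethod
    def aggregate(x, edges):
        # Mean of neighbor features per node; edges is a (2, E) tensor of [src, dst].
        src, dst = edges
        out = torch.zeros_like(x)
        count = x.new_zeros(x.size(0), 1)
        out.index_add_(0, dst, x[src])
        count.index_add_(0, dst, x.new_ones(src.size(0), 1))
        return out / count.clamp(min=1)

    def forward(self, vis_x, vis_edges, tac_x, tac_edges):
        # Level 1: pass messages within each modality's graph.
        vis_h = torch.relu(self.within_vision(
            torch.cat([vis_x, self.aggregate(vis_x, vis_edges)], dim=-1)))
        tac_h = torch.relu(self.within_touch(
            torch.cat([tac_x, self.aggregate(tac_x, tac_edges)], dim=-1)))
        # Level 2: exchange pooled summaries across modalities.
        vis_out = torch.relu(self.across(
            torch.cat([vis_h, tac_h.mean(0, keepdim=True).expand_as(vis_h)], dim=-1)))
        tac_out = torch.relu(self.across(
            torch.cat([tac_h, vis_h.mean(0, keepdim=True).expand_as(tac_h)], dim=-1)))
        return vis_out, tac_out

# Example: 5 vision nodes and 3 tactile nodes connected by simple chain edges.
vis_x, tac_x = torch.rand(5, 64), torch.rand(3, 64)
vis_edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
tac_edges = torch.tensor([[0, 1], [1, 2]])
layer = HierarchicalMessagePassing(dim=64)
vis_out, tac_out = layer(vis_x, vis_edges, tac_x, tac_edges)
```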
Knowledge of the 6D pose of an object can benefit in-hand object manipulation. Existing 6D pose estimation methods use vision data. In-hand 6D object pose estimation is challenging because of heavy occlusion produced by the robot's grippers, which can have an adverse effect on methods that rely on vision data only. Many robots are equipped with tactile sensors at their fingertips that could be used to complement vision data. In this paper, we present a method that uses both tactile and vision data to estimate the pose of an object grasped in a robot's hand. The main challenges of this research include 1) the lack of a standard representation for tactile sensor data, 2) fusion of sensor data from heterogeneous sources (vision and tactile), and 3) a need for large training datasets. To address these challenges, first, we propose the use of point clouds to represent object surfaces that are in contact with the tactile sensor. Second, we present a network architecture based on pixel-wise dense fusion to fuse vision and tactile data to estimate the 6D pose of an object. Third, we extend NVIDIA's Deep Learning Dataset Synthesizer to produce synthetic photo-realistic vision data and the corresponding tactile point clouds for 11 objects from the YCB Object and Model Set in Unreal Engine 4. We present results of simulated experiments suggesting that using tactile data in addition to vision data improves the 6D pose estimate of an in-hand object. We also present qualitative results of experiments in which we deploy our network on real physical robots, showing successful transfer of a network trained on synthetic data to a real system.
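The fusion step can be sketched, in the spirit of pixel-wise dense fusion, by concatenating a visual feature sampled at each contact point's pixel with a geometric feature computed from the tactile point cloud before regressing a per-point pose. The feature sizes, the per-point pose head, and the class name DenseFusionSketch below are assumptions, not the trained network.

```python
# Illustrative sketch of per-point fusion of visual and tactile point features.
# Feature dimensions and the class name are assumptions.
import torch
import torch.nn as nn

class DenseFusionSketch(nn.Module):
    def __init__(self, vis_dim=32, geo_dim=32):
        super().__init__()
        # Per-point geometric encoder for the tactile contact point cloud.
        self.point_mlp = nn.Sequential(nn.Linear(3, geo_dim), nn.ReLU())
        # Per-point pose head on fused (visual + geometric) features.
        self.pose_head = nn.Sequential(
            nn.Linear(vis_dim + geo_dim, 64), nn.ReLU(),
            nn.Linear(64, 7))  # 3D translation + quaternion per point

    def forward(self, vis_feats, points):
        """vis_feats: (N, vis_dim) visual features sampled at the pixels that
        correspond to the N contact points; points: (N, 3) tactile point cloud."""
        geo_feats = self.point_mlp(points)
        fused = torch.cat([vis_feats, geo_feats], dim=-1)  # dense per-point fusion
        per_point_pose = self.pose_head(fused)
        # Average the per-point predictions into a single pose estimate.
        return per_point_pose.mean(dim=0)
```

Averaging the per-point predictions is only one possible aggregation; dense-fusion-style networks often predict a per-point confidence and select or weight predictions by it instead.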
Tactile sensing is inherently contact-based: to use tactile data, robots need to make contact with the surface of an object. This is inefficient in applications where an agent needs to choose between multiple alternatives that depend on the physical properties of the contact location. We propose a method to obtain tactile data in a non-invasive manner. The proposed method estimates the output of a tactile sensor from depth data of the object's surface based on past experience. An experience dataset is built by allowing the robot to interact with various objects, collecting tactile data and the corresponding object surface depth data. We use the experience dataset to train a neural network to estimate the tactile output from depth data alone. We use GelSight tactile sensors, image-based sensors that generate images capturing detailed surface features at the contact location. We train the network with a dataset containing 578 tactile-image-to-depth-map correspondences. Given a depth map of an object's surface, the network outputs an estimate of the response of the tactile sensor, should it make contact with the object. We evaluate the method with the structural similarity index measure (SSIM), a similarity metric between two images commonly used in the image processing community. We present experimental results showing that the proposed method outperforms, with statistical significance, a baseline that uses random images, with SSIM scores of 0.84 ± 0.0056 and 0.80 ± 0.0036, respectively.
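The SSIM evaluation can be reproduced with the structural_similarity function from scikit-image. The snippet below uses synthetic stand-in arrays to illustrate how the network estimates and the random-image baseline would each be scored against ground-truth tactile images; the image size, noise level, and array names are placeholders, not the project's data.

```python
# Hedged sketch of the SSIM comparison (arrays below are synthetic stand-ins).
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)

# Stand-ins for real data: ground-truth tactile images, network estimates,
# and a random-image baseline, all scaled to [0, 1].
ground_truth = rng.random((10, 64, 64))
estimates = np.clip(ground_truth + 0.05 * rng.standard_normal((10, 64, 64)), 0, 1)
random_baseline = rng.random((10, 64, 64))

def mean_ssim(preds, targets):
    # SSIM between each predicted image and its ground-truth counterpart.
    return np.mean([ssim(p, t, data_range=1.0) for p, t in zip(preds, targets)])

print("network  SSIM:", mean_ssim(estimates, ground_truth))
print("baseline SSIM:", mean_ssim(random_baseline, ground_truth))
```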