Touch and Go: Learning from Human-Collected Vision and Touch
NeurIPS 2022 Datasets and Benchmarks
We successfully apply our dataset to a variety of tasks: 1) self-supervised visuo-tactile feature learning, 2) the novel task of tactile-driven image stylization, i.e., making an object look as though it "feels like" a given tactile input, and 3) predicting future frames of a tactile signal from visuo-tactile inputs.
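For the first task, a common recipe is cross-modal contrastive learning over paired RGB and GelSight frames. The snippet below is a minimal, generic sketch of such an objective (a symmetric InfoNCE loss); the encoders, batch construction, and hyperparameters are illustrative assumptions, not the paper's exact training code.

```python
# Generic sketch of self-supervised visuo-tactile contrastive pretraining
# (symmetric InfoNCE over paired RGB / GelSight embeddings).
# Encoder choices and hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

def infonce_loss(img_feats: torch.Tensor,
                 touch_feats: torch.Tensor,
                 temperature: float = 0.07) -> torch.Tensor:
    """Contrast each image embedding against all touch embeddings in the batch."""
    img = F.normalize(img_feats, dim=1)      # (B, D)
    touch = F.normalize(touch_feats, dim=1)  # (B, D)
    logits = img @ touch.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)  # positives on the diagonal
    # Symmetric loss: image-to-touch and touch-to-image.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```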
Human-Powered Data Collection
Vision (RGB) and Touch (GelSight) videos: untrimmed examples from our dataset, and two people collecting data.
Examples from our Dataset
Labeled Video Data
| Label | RGB Video | GelSight Video |
|---|---|---|
| Synthetic Fabric | | |
| Stone | | |
| Brick | | |
| Wood | | |
| Grass | | |
| Concrete | | |
Touch and Go Dataset

We collect a dataset of natural vision-and-touch signals. Our dataset contains multimodal data recorded by humans, who probe objects in their natural locations with a tactile sensor. To more easily train and analyze models on this dataset, we also collect material labels and identify touch onsets.
The touch_and_go directory contains a directory of raw videos, a script extract_frame.py that converts the raw videos to frames, and a label.txt file with material labels for the onset frames.
Each raw video folder in the Dataset folder consists of six items.
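As a rough illustration of working with the raw videos and labels, the sketch below extracts frames and parses material labels. The filenames and the label.txt format shown here are assumptions; the dataset's own extract_frame.py is the authoritative tool.

```python
# Hypothetical sketch of extracting frames from a raw video and reading
# material labels. Paths and the label.txt format are assumptions;
# use the provided extract_frame.py for the real pipeline.
import os
import cv2  # pip install opencv-python

def extract_frames(video_path: str, out_dir: str, stride: int = 1) -> int:
    """Save every `stride`-th frame of `video_path` as a JPEG in `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

def load_labels(label_file: str) -> dict:
    """Parse a whitespace-separated 'frame_id material_label' file (assumed format)."""
    labels = {}
    with open(label_file) as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) >= 2:
                labels[parts[0]] = parts[1]
    return labels

if __name__ == "__main__":
    n = extract_frames("touch_and_go/dataset/0001/video.mp4", "frames/0001", stride=5)
    print(f"saved {n} frames")
    print(list(load_labels("touch_and_go/label.txt").items())[:5])
```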
Applications
Tactile-driven image stylization
In each row, the pictures on the right are restyled to look like the tactile input on the left.
Multimodal video prediction
We predict multiple frames by autoregressively feeding our output images back into the model. We evaluate our model on predicting future tactile signals. In the figure below, we compare a tactile-only model to a multimodal visuo-tactile model and show that the latter obtains better performance. By incorporating our dataset's visual signal, the model gains a consistent performance increase across evaluation metrics, under both experimental settings. The gap becomes larger for longer time horizons, suggesting that visual information may be more helpful in this case.
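To make the autoregressive rollout concrete, here is a minimal sketch of the prediction loop. The `predictor` interface (mapping past tactile frames plus the current visual frame to the next tactile frame) is a simplified assumption, not the paper's architecture.

```python
# Minimal sketch of autoregressive tactile prediction. `predictor` is a
# hypothetical nn.Module mapping (past tactile frames, current visual frame)
# to the next tactile frame; the real model and its inputs differ in detail.
import torch
import torch.nn as nn

@torch.no_grad()
def rollout(predictor: nn.Module,
            tactile_context: torch.Tensor,   # (B, T, C, H, W) observed tactile frames
            visual_frames: torch.Tensor,     # (B, T + horizon, C, H, W) RGB frames
            horizon: int) -> torch.Tensor:
    """Predict `horizon` future tactile frames by feeding predictions back in."""
    context = tactile_context
    preds = []
    for step in range(horizon):
        t = tactile_context.shape[1] + step
        next_frame = predictor(context, visual_frames[:, t])  # (B, C, H, W)
        preds.append(next_frame)
        # Slide the context window: drop the oldest frame, append the prediction.
        context = torch.cat([context[:, 1:], next_frame.unsqueeze(1)], dim=1)
    return torch.stack(preds, dim=1)  # (B, horizon, C, H, W)
```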

Comparison to Other Datasets

Acknowledgements

Touch and Go: Learning from Human-Collected Vision and Touch by Fengyu Yang, Chenyang Ma, Jiacheng Zhang, Jing Zhu, Wenzhen Yuan, and Andrew Owens is licensed under a Creative Commons Attribution 4.0 International License.



