Neural Neural Textures Make Sim2Real Consistent
Ryan Burgert, Jinghuan Shang, Xiang Li, Michael S. Ryoo
Paper #474
Come see us at the last poster session!
Abstract
We present TRITON (Texture Recovering Image Translation Network): an unpaired image translation algorithm that achieves temporal consistency over indefinite timescales by generating neural textures on object surfaces. At inference time, TRITON takes in a special RGB image representing a 3d scene (which we call a UVL image), and outputs a realistic RGB image that can then be used for downstream tasks, such as training a robot. The input UVL images are simple enough that even the most rudimentary 3d renderer can generate them.
Move your cursor along the video to move the divider!
On the left is the UVL input image, and on the right is the translated RGB output image. Note that all training photographs were taken from the same camera position; this video extrapolates new camera positions as well as new object positions. This should be animated - if it's not working, please try a Chromium-based browser.
Two sets of recovered textures, corresponding to the above two videos.
Here's an animation of a robot arm.
How it Works
The Training Data
To train TRITON, you need an unpaired set of photographs and a set of simulated scenes, rendered as U V L images.
We need a set of about 100 or so RGB real-life photographs.
We also need a set of simulated UVL images, which encode the UV coordinates of each object in a scene, as well as the object label L. Because these images can be obtained cheaply, we can use thousands of them.
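As a concrete illustration, here is a minimal sketch (in Python with NumPy) of how even a rudimentary renderer could pack a scene into a UVL image. The channel assignment here (U in red, V in green, normalized object label in blue) is an assumption made for illustration, not necessarily the exact encoding TRITON uses.

```python
import numpy as np

def make_uvl_image(u, v, label, num_labels):
    """Pack per-pixel UV coordinates and an object label into an RGB-like image.

    u, v       : float arrays in [0, 1], shape (H, W), UV coordinates of the
                 surface point visible at each pixel (0 where no object is visible).
    label      : int array, shape (H, W), object index at each pixel (0 = background).
    num_labels : total number of object labels, used to normalize the L channel.

    Channel assignment (an illustrative convention): R <- U, G <- V, B <- normalized label.
    """
    uvl = np.stack([u, v, label.astype(np.float32) / max(num_labels - 1, 1)], axis=-1)
    return (uvl * 255).astype(np.uint8)

# Toy example: a 64x64 "scene" where object 1 covers the right half of the frame.
h, w = 64, 64
u = np.tile(np.linspace(0, 1, w), (h, 1))
v = np.tile(np.linspace(0, 1, h)[:, None], (1, w))
label = np.zeros((h, w), dtype=np.int64)
label[:, w // 2:] = 1        # object 1 occupies the right half
u[:, :w // 2] = 0.0          # background pixels carry no UV information
v[:, :w // 2] = 0.0
uvl_image = make_uvl_image(u, v, label, num_labels=2)
print(uvl_image.shape)       # (64, 64, 3)
```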
Neural Neural Textures
What sets TRITON apart from other image translation algorithms is its use of neural neural textures. Previous works called their learnable textures "neural textures" and parametrized them by a discrete grid of differentiable texels. In contrast, we call our learnable textures neural neural textures, because each texture is itself represented as a neural network function, parameterized continuously over UV space. Using this representation instead of discrete texels allows TRITON to learn faster and yields better results.
Each 3d object gets its own neural neural texture, which is represented continuously with an MLP.
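Below is a minimal PyTorch sketch of what such a per-object texture could look like: an MLP queried at continuous UV coordinates. The Fourier-feature positional encoding and the layer sizes are illustrative assumptions, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class NeuralNeuralTexture(nn.Module):
    """A texture represented as an MLP over continuous UV coordinates.

    Instead of a discrete grid of texels, the texture value at any (u, v) is
    computed by a small network, so it can be queried at arbitrary resolution.
    Architecture details here are illustrative assumptions.
    """

    def __init__(self, out_channels=3, hidden=256, num_frequencies=6):
        super().__init__()
        self.num_frequencies = num_frequencies
        in_dim = 2 * 2 * num_frequencies  # sin and cos features for each of u and v
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_channels), nn.Sigmoid(),
        )

    def positional_encoding(self, uv):
        # uv: (N, 2) in [0, 1]; map to Fourier features so the MLP can
        # represent high-frequency texture detail.
        freqs = 2.0 ** torch.arange(self.num_frequencies, device=uv.device) * torch.pi
        angles = uv.unsqueeze(-1) * freqs                          # (N, 2, F)
        feats = torch.cat([angles.sin(), angles.cos()], dim=-1)    # (N, 2, 2F)
        return feats.flatten(start_dim=1)                          # (N, 4F)

    def forward(self, uv):
        return self.mlp(self.positional_encoding(uv))

# Query one object's texture at a batch of UV coordinates.
texture = NeuralNeuralTexture()
uv = torch.rand(1024, 2)
colors = texture(uv)   # (1024, 3): one texture value per queried UV point
```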
The General Pipeline
This is a simplified version of the TRITON pipeline. It omits GAN losses as well as surface consistency losses. For more information, as well as a more detailed diagram, please read the paper!
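For readers who prefer code to diagrams, here is a rough sketch of how the simplified forward pass could be wired: each pixel of the UVL rendering is textured by looking up its object's texture at its UV coordinates, and the result is refined into a realistic RGB image by an image-to-image generator. The encoding, the stand-in texture modules, and the `translator` placeholder are assumptions for illustration; the GAN and surface consistency losses are omitted, as in the diagram.

```python
import torch
import torch.nn as nn

def apply_textures(uvl, textures):
    """Look up each object's texture at the per-pixel UV coordinates.

    uvl      : (H, W, 3) tensor with channels (u, v, object index) -- illustrative encoding.
    textures : list of callables mapping (N, 2) UV coordinates to (N, 3) colors,
               e.g. the neural neural texture MLPs sketched above.
    Returns a (H, W, 3) textured rendering of the scene.
    """
    h, w, _ = uvl.shape
    uv = uvl[..., :2].reshape(-1, 2)
    labels = uvl[..., 2].long().reshape(-1)
    out = torch.zeros(h * w, 3)
    for idx, tex in enumerate(textures):
        mask = labels == idx
        if mask.any():
            out[mask] = tex(uv[mask])
    return out.reshape(h, w, 3)

# Simplified forward pass (GAN and surface consistency losses omitted): texture the
# UVL rendering, then refine it with an image-to-image generator. `translator` is
# only a placeholder for such a generator, and the textures below are tiny stand-ins.
translator = nn.Identity()
textures = [nn.Sequential(nn.Linear(2, 3), nn.Sigmoid()) for _ in range(2)]
uvl = torch.rand(64, 64, 3)
uvl[..., 2] = (uvl[..., 2] > 0.5).float()           # two object labels: 0 and 1
textured = apply_textures(uvl, textures)            # (64, 64, 3) textured rendering
realistic = translator(textured.permute(2, 0, 1))   # (3, 64, 64) translated output
```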
Results
Comparison to other Image Translators
In the above video, we compare the outputs of TRITON to various other image translation algorithms. TRITON provides higher-quality, temporally consistent results, because it makes better use of 3d geometry. For more comparison videos like this, please see our paper's appendix.
Robot policy trained by sim2real
TRITON enables a robot reacher task. In this sim2real experiment, we train a behavioral cloning policy that takes a single RGB image from a fixed camera in the simulator, and deploy it directly on the real robot without further fine-tuning. The action policy predicts the locations of all the target objects simultaneously and is trained entirely on only 2000 photorealistic images generated by TRITON. Check out the demo video below.
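As a rough sketch of this setup, the snippet below assumes the policy is a small CNN that regresses object locations from a single fixed-camera RGB image and is trained with an L2 behavioral cloning loss; the architecture, loss, and the random tensors standing in for the 2000 TRITON-generated images are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class ReacherPolicy(nn.Module):
    """Behavioral cloning policy: one fixed-camera RGB image -> locations of all
    target objects at once. The CNN architecture here is an illustrative assumption."""

    def __init__(self, num_objects=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_objects * 2)  # (x, y) per target object

    def forward(self, image):
        return self.head(self.encoder(image))

# Training loop sketch. Real training would iterate over the ~2000 TRITON-generated
# photorealistic images; random tensors stand in for them here.
policy = ReacherPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
for step in range(100):
    images = torch.rand(16, 3, 128, 128)   # placeholder for TRITON outputs
    targets = torch.rand(16, 3 * 2)        # placeholder ground-truth object locations
    loss = nn.functional.mse_loss(policy(images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# The trained policy is then deployed on the real robot without fine-tuning.
```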
Citation
If you would like to cite us, please use the BibTeX entry below:
@inproceedings{Burgert2022,
author = {Burgert, Ryan and Shang, Jinghuan and Li, Xiang and Ryoo, Michael},
title = {Neural Neural Textures Make Sim2Real Consistent},
booktitle = {Proceedings of the 6th Conference on Robot Learning},
year = {2022},
url = {https://tritonpaper.github.io}
}