How do we model agent-environment decoupling in visual RL?
Rather than directly modeling a single latent variable from an input image, we add a latent, z_R,
to represent agent-specific visual information.
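As a rough sketch of this split (module names, layer sizes, and latent dimensions here are illustrative, not the paper's exact architecture), the encoder can emit a single latent vector that is sliced into an environment part z_E and an agent part z_R:

import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Encodes an image into an environment latent z_E and an agent latent z_R.
    A minimal sketch: the conv stack and dimensions are placeholders."""
    def __init__(self, env_dim=50, agent_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(env_dim + agent_dim)  # infers input size on first call
        self.env_dim = env_dim

    def forward(self, obs):
        z = self.head(self.conv(obs))
        return z[:, :self.env_dim], z[:, self.env_dim:]  # (z_E, z_R)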
How do we obtain robot-specific visual supervision?
We can obtain robot masks directly from a simulator, or we can fine-tune a segmentation model.
There are also many off-the-shelf segmentation models, such as Segment Anything, that can provide
robot-specific visual information.
Robot masks are a natural and readily-available form of robot-specific visual information.
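As a concrete example, in a dm_control-style simulator the segmentation rendering pass gives per-pixel geom ids, which can be compared against the robot's geoms to form a binary mask (a sketch; the geom names, camera, and resolution are placeholders for your environment):

import numpy as np

def robot_mask_from_sim(physics, robot_geom_names, height=84, width=84, camera_id=0):
    # dm_control's segmentation render returns an (H, W, 2) array where
    # channel 0 is the geom id at each pixel (-1 for background).
    seg = physics.render(height, width, camera_id=camera_id, segmentation=True)
    robot_ids = [physics.model.name2id(name, 'geom') for name in robot_geom_names]
    return np.isin(seg[..., 0], robot_ids).astype(np.float32)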
How do we incorporate this into a visual RL algorithm?
We augment the RL loss with agent-centric and environment-centric losses. In particular, we reconstruct
a robot mask from the agent-centric latent and reconstruct the input image from the environment-centric
latent. We supervise our input image encoder jointly with an RL loss, a robot mask reconstruction loss,
and an image reconstruction loss.
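A minimal sketch of this joint objective in PyTorch (the decoders, loss weights, and the rl_loss term are placeholders; the paper's exact objective and weightings may differ):

import torch.nn.functional as F

def sear_loss(z_env, z_agent, image, robot_mask,
              image_decoder, mask_decoder, rl_loss,
              w_img=1.0, w_mask=1.0):
    # Reconstruct the input image from the environment-centric latent
    # and the robot mask from the agent-centric latent.
    loss_img = F.mse_loss(image_decoder(z_env), image)
    loss_mask = F.binary_cross_entropy_with_logits(mask_decoder(z_agent), robot_mask)
    # The encoder is trained jointly on all three terms.
    return rl_loss + w_img * loss_img + w_mask * loss_mask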
Experimental Setup
We train SEAR on 18 tasks spanning 5 robots across 4 simulation suites, in single-task, transfer, and
multi-task settings.
How does SEAR perform in single-task settings?
SEAR matches or exceeds baselines in single-task settings.
What about transfer learning?
SEAR seems to learn representations useful for transfer learning.
How does SEAR perform for multi-task learning?
While SEAR matches baselines, more work is needed to improve SEAR for the multi-task setting.
How do noisy mask labels impact SEAR's performance?
We generate noisy mask labels in simulation by randomly dropping pixels or downsampling the mask.
While noisy masks hurt performance, SEAR is still able to outperform baselines.
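Both corruptions are simple to reproduce (a sketch; the drop probability and downsampling factor are our choices, not the paper's exact settings):

import torch
import torch.nn.functional as F

def drop_pixels(mask, drop_prob=0.1):
    # Randomly zero out a fraction of mask pixels; mask is (B, 1, H, W).
    keep = (torch.rand_like(mask) > drop_prob).float()
    return mask * keep

def downsample_mask(mask, factor=4):
    # Coarsen the mask by nearest-neighbor down- then re-upsampling.
    h, w = mask.shape[-2:]
    coarse = F.interpolate(mask, size=(h // factor, w // factor), mode='nearest')
    return F.interpolate(coarse, size=(h, w), mode='nearest')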
Key Takeaways
- Takeaway 1: Decoupled representations boost performance.
- Takeaway 2: SEAR can help with transfer.
- Takeaway 3: Masks are readily available from sim or off-the-shelf segmentation models.
- Takeaway 4: SEAR can be easily added to any visual RL approach.
BibTeX
@InProceedings{pmlr-v202-gmelin23a,
  title     = {Efficient {RL} via Disentangled Environment and Agent Representations},
  author    = {Gmelin, Kevin and Bahl, Shikhar and Mendonca, Russell and Pathak, Deepak},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {11525--11545},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/gmelin23a/gmelin23a.pdf},
  url       = {https://proceedings.mlr.press/v202/gmelin23a.html},
}
Acknowledgements
We would like to thank Alexander C. Li and Murtaza Dalal for fruitful discussions. This work is supported by
a Sony Faculty Research Award and NSF IIS-2024594.