Efficient RL via Disentangled Environment and Agent Representations

How do we model agent-environment decoupling in visual RL?

Figure
Rather than directly modeling a single latent variable from an input image, we add a latent, ZR, to represent agent-specific visual information.
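As an illustrative sketch (not the paper's actual architecture, which uses a convolutional encoder), the decoupling can be pictured as a shared backbone with two heads: one producing the agent-specific latent ZR and one producing the environment latent. All dimensions and weight initializations below are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions only; the real encoder operates on raw images.
IMG_DIM, HIDDEN, Z_DIM = 64, 32, 8

# Shared backbone plus two heads: one for the agent-specific latent ZR,
# one for the environment-specific latent.
W_shared = rng.normal(size=(IMG_DIM, HIDDEN)) / np.sqrt(IMG_DIM)
W_agent = rng.normal(size=(HIDDEN, Z_DIM)) / np.sqrt(HIDDEN)
W_env = rng.normal(size=(HIDDEN, Z_DIM)) / np.sqrt(HIDDEN)

def encode(image_flat):
    """Map a flattened image to (z_agent, z_env)."""
    h = np.tanh(image_flat @ W_shared)  # shared features
    z_agent = h @ W_agent               # ZR: agent-specific information
    z_env = h @ W_env                   # environment-specific information
    return z_agent, z_env
```

The two heads are later given different supervision (robot-mask reconstruction vs. full-image reconstruction), which is what pushes agent and environment information into separate latents.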

How do we obtain robot-specific visual supervision?


Figure
We can directly get robot masks from a simulator.

Figure
Or we can fine-tune a segmentation model. Many off-the-shelf segmentation models, such as Segment Anything, can also provide robot-specific visual information.

Robot masks are a natural and readily-available form of robot-specific visual information.

How do we incorporate this into a visual RL algorithm?


Figure
We augment the RL loss with agent-centric and environment-centric losses. In particular, we reconstruct a robot mask from the agent-centric latent and reconstruct the input image from the environment-centric latent. We supervise our input image encoder jointly with an RL loss, a robot mask reconstruction loss, and an image reconstruction loss.
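A minimal sketch of the combined objective, assuming a binary cross-entropy loss for mask reconstruction, a mean-squared-error loss for image reconstruction, and illustrative weights `w_mask` / `w_img` (the exact losses and weightings are hyperparameters, not taken from the paper):

```python
import numpy as np

def mask_bce(pred, target, eps=1e-6):
    """Binary cross-entropy for the robot-mask reconstruction."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def total_loss(rl_loss, pred_mask, true_mask, pred_img, true_img,
               w_mask=1.0, w_img=1.0):
    """Joint objective: RL loss + mask reconstruction + image reconstruction.

    pred_mask is decoded from the agent-centric latent; pred_img is decoded
    from the environment-centric latent. Gradients from all three terms flow
    back into the shared image encoder.
    """
    l_mask = mask_bce(pred_mask, true_mask)       # agent-centric supervision
    l_img = np.mean((pred_img - true_img) ** 2)   # environment-centric supervision
    return rl_loss + w_mask * l_mask + w_img * l_img
```

With perfect reconstructions both auxiliary terms vanish and the objective reduces to the RL loss alone.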

Experimental Setup


Figure
We train SEAR on 18 tasks spanning 5 robots across 4 simulation suites, in single-task, transfer, and multi-task settings.

How does SEAR perform in single-task settings?


Figure

Figure

Figure
SEAR matches or exceeds baselines in single-task settings.

What about transfer learning?


Figure
SEAR seems to learn representations useful for transfer learning.

How does SEAR perform for multi-task learning?


Figure
While SEAR matches baselines, more work is needed to improve SEAR for the multi-task setting.

How do noisy mask labels impact SEAR's performance?


Figure
We generate noisy mask labels in simulation by randomly dropping pixels or downsampling the mask.
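The two corruptions can be sketched as follows; the drop probability and downsampling factor are illustrative values, not the ones used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_pixels(mask, p_drop):
    """Randomly zero out roughly a fraction p_drop of the mask's pixels."""
    keep = rng.random(mask.shape) >= p_drop
    return mask * keep

def downsample_mask(mask, factor):
    """Coarsen the mask by nearest-neighbor downsampling, then upsample back."""
    small = mask[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

# Toy square "robot" mask.
mask = np.zeros((8, 8), dtype=int)
mask[2:6, 2:6] = 1

noisy = drop_pixels(mask, 0.3)     # missing foreground pixels
coarse = downsample_mask(mask, 2)  # blocky, low-resolution boundary
```

Both corruptions preserve the mask's overall shape while degrading its precision, which is what makes them a useful stress test for the mask-reconstruction supervision.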

Figure
While noisy masks hurt performance, SEAR is still able to outperform baselines.

Key Takeaways

  • Takeaway 1: A decoupled representation boosts performance.
  • Takeaway 2: SEAR can help with transfer.
  • Takeaway 3: Masks are readily available from simulation or off-the-shelf segmentation models.
  • Takeaway 4: SEAR can be easily added to any visual RL approach.

BibTeX

        
@InProceedings{pmlr-v202-gmelin23a,
  title = {Efficient {RL} via Disentangled Environment and Agent Representations},
  author = {Gmelin, Kevin and Bahl, Shikhar and Mendonca, Russell and Pathak, Deepak},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {11525--11545},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, 
            Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/gmelin23a/gmelin23a.pdf},
  url = {https://proceedings.mlr.press/v202/gmelin23a.html},
}
    

Acknowledgements

We would like to thank Alexander C. Li and Murtaza Dalal for fruitful discussions. This work is supported by a Sony Faculty Research Award and NSF IIS-2024594.

Page template borrowed from Nerfies.