How do we model agent-environment decoupling in visual RL?
Rather than directly modeling a single latent variable from an input image, we add a latent, z_R,
to represent agent-specific visual information.
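As a rough sketch of this split (module names, layer sizes, and latent dimensions here are illustrative, not the paper's exact architecture), the encoder can emit a single latent vector that is sliced into an environment part z_E and an agent part z_R:

import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Encodes an image into an environment latent z_E and an agent latent z_R.
    A minimal sketch: the conv stack and dimensions are placeholders."""
    def __init__(self, env_dim=50, agent_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(env_dim + agent_dim)  # infers input size on first call
        self.env_dim = env_dim

    def forward(self, obs):
        z = self.head(self.conv(obs))
        return z[:, :self.env_dim], z[:, self.env_dim:]  # (z_E, z_R)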
How do we obtain robot-specific visual supervision?
We can obtain robot masks directly from a simulator, or we can fine-tune a segmentation model.
There are also many off-the-shelf segmentation models, such as Segment Anything, that can provide
robot-specific visual information.
Robot masks are a natural and readily-available form of robot-specific visual information.
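As a concrete example, in a dm_control-style simulator the segmentation rendering pass gives per-pixel geom ids, which can be compared against the robot's geoms to form a binary mask (a sketch; the geom names, camera, and resolution are placeholders for your environment):

import numpy as np

def robot_mask_from_sim(physics, robot_geom_names, height=84, width=84, camera_id=0):
    # dm_control's segmentation render returns an (H, W, 2) array where
    # channel 0 is the geom id at each pixel (-1 for background).
    seg = physics.render(height, width, camera_id=camera_id, segmentation=True)
    robot_ids = [physics.model.name2id(name, 'geom') for name in robot_geom_names]
    return np.isin(seg[..., 0], robot_ids).astype(np.float32)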
How do we incorporate this into a visual RL algorithm?
We augment the RL loss with agent-centric and environment-centric losses. In particular, we reconstruct
a robot mask from the agent-centric latent and reconstruct the input image from the environment-centric
latent. We supervise our input image encoder jointly with an RL loss, a robot mask reconstruction loss,
and an image reconstruction loss.
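A minimal sketch of this joint objective in PyTorch (the decoders, loss weights, and the rl_loss term are placeholders; the paper's exact objective and weightings may differ):

import torch.nn.functional as F

def sear_loss(z_env, z_agent, image, robot_mask,
              image_decoder, mask_decoder, rl_loss,
              w_img=1.0, w_mask=1.0):
    # Reconstruct the input image from the environment-centric latent
    # and the robot mask from the agent-centric latent.
    loss_img = F.mse_loss(image_decoder(z_env), image)
    loss_mask = F.binary_cross_entropy_with_logits(mask_decoder(z_agent), robot_mask)
    # The encoder is trained jointly on all three terms.
    return rl_loss + w_img * loss_img + w_mask * loss_mask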
Experimental Setup
We train SEAR on 18 tasks spanning 5 robots across 4 simulation suites, in single-task, transfer, and
multi-task settings.
How does SEAR perform in single-task settings?
SEAR matches or exceeds baselines in single-task settings.
What about transfer learning?
SEAR seems to learn representations useful for transfer learning.
How does SEAR perform for multi-task learning?
While SEAR matches baselines, more work is needed to improve SEAR for the multi-task setting.
How do noisy mask labels impact SEAR's performance?
We generate noisy mask labels in simulation by randomly dropping pixels or downsampling the mask.
While noisy masks hurt performance, SEAR is still able to outperform baselines.
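Both corruptions are simple to reproduce (a sketch; the drop probability and downsampling factor are our choices, not the paper's exact settings):

import torch
import torch.nn.functional as F

def drop_pixels(mask, drop_prob=0.1):
    # Randomly zero out a fraction of mask pixels; mask is (B, 1, H, W).
    keep = (torch.rand_like(mask) > drop_prob).float()
    return mask * keep

def downsample_mask(mask, factor=4):
    # Coarsen the mask by nearest-neighbor down- then re-upsampling.
    h, w = mask.shape[-2:]
    coarse = F.interpolate(mask, size=(h // factor, w // factor), mode='nearest')
    return F.interpolate(coarse, size=(h, w), mode='nearest')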
Key Takeaways
- Takeaway 1: Decoupled representations boost performance.
- Takeaway 2: SEAR can help with transfer.
- Takeaway 3: Masks are readily available from sim or off-the-shelf segmentation models.
- Takeaway 4: SEAR can be easily added to any visual RL approach.
BibTeX
@InProceedings{pmlr-v202-gmelin23a,
  title     = {Efficient {RL} via Disentangled Environment and Agent Representations},
  author    = {Gmelin, Kevin and Bahl, Shikhar and Mendonca, Russell and Pathak, Deepak},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {11525--11545},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/gmelin23a/gmelin23a.pdf},
  url       = {https://proceedings.mlr.press/v202/gmelin23a.html},
}
Acknowledgements
We would like to thank Alexander C. Li and Murtaza Dalal for fruitful discussions. This work is supported by
a Sony Faculty Research Award and NSF IIS-2024594.