Real-World Experiments
Rod Balancing
Balancing, or dynamically stacking, objects depends critically on an accurate estimate of their inertial parameters. In this task, the agent can interact with a rod to identify its physical parameters, in this case the center of mass, which varies along the rod. Successfully balancing the rod on the tower requires an accurate estimate of these parameters.
Shuffleboard
In shuffleboard, the goal is to strike a puck so that it comes to rest in one of the target regions. We closely follow the original game and pour wax (sand) on the board. This modification makes the task especially difficult because the surface friction on the board changes slightly after each shot as the puck displaces the wax.
Fisher Information Exploration
Intuition
Simulation
To identify the underlying physics parameters that govern the dynamics of the system, we need to collect trajectories that are strongly affected by these parameters. In the case of the rod balancing task, the center of mass affects how the rod rotates when pushed by the robot during the exploration stage. As shown above, the same actions can lead to drastically different rod poses after interaction. Trajectories are uninformative if the rod is not forced to rotate around its center of mass or doesn't move at all. We indicate the center of mass by a blue marker ● on top of the rod in both sim and real (not visible to the policy).
Fisher Information Maximization
The Fisher information matrix plays a key role in the choice of our exploration policy, \(\pi_{exp}\).
Recall that for a distribution over trajectories \(p_\theta\), the Fisher information is defined as:
\(\mathrm{I}(\theta,\pi_{exp}) := \mathrm{E}_{\tau \sim p_{\theta}} \left [ \nabla_{\theta} \log p_{\theta}(\tau; \pi_{exp}) \cdot \nabla_{\theta} \log p_{\theta}(\tau; \pi_{exp})^\top \right ]\)
The Fisher information matrix therefore captures how sensitive the trajectory distribution is to the parameter \(\theta\). Since different exploration policies induce different distributions over trajectories \(\tau\), we can obtain distributions with higher Fisher information by changing the exploration policy. To find an exploration policy that yields the most informative trajectories about the true parameter \(\theta^*\), we formulate the optimization problem as:
\(\mathrm{argmin}_{\pi_{exp}}\quad\mathrm{tr}(\mathrm{I}(\theta^*, \pi_{exp})^{-1})\)
Intuitively, a policy that makes the Fisher information "large" will make \(\mathrm{tr}(\mathrm{I}(\theta^*, \pi_{exp})^{-1})\) "small", which means that the induced trajectories are very sensitive to the unknown parameters and are good candidates for system identification. Because the true parameters \(\theta^*\) are unknown, we solve this optimization problem in simulation while randomizing over the parameters, then roll out the resulting exploration policy in the real world to collect a trajectory.
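As a minimal sketch of this objective (not the authors' implementation), assume a toy system whose observations are the simulated trajectory plus Gaussian noise with standard deviation \(\sigma\). Under that assumption the Fisher information reduces to \(J^\top J / \sigma^2\), where \(J\) is the Jacobian of the noise-free trajectory with respect to \(\theta\). The damped point mass, the function names, and the parameter ranges below are all hypothetical stand-ins for the simulator. The code scores open-loop action sequences by \(\mathrm{tr}(\mathrm{I}^{-1})\) averaged over randomized parameters.

```python
import numpy as np

def rollout(theta, actions, dt=0.1):
    """Hypothetical toy simulator: velocity of a damped point mass.
    theta = (mass, damping); actions are applied forces."""
    mass, damping = theta
    v, velocities = 0.0, []
    for u in actions:
        v = v + dt * (u - damping * v) / mass
        velocities.append(v)
    return np.array(velocities)

def fisher_information(theta, actions, sigma=0.05, eps=1e-5):
    """Fisher information J^T J / sigma^2, assuming Gaussian observation noise."""
    theta = np.asarray(theta, dtype=float)
    columns = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        # Central finite difference of the trajectory w.r.t. parameter i.
        diff = rollout(theta + step, actions) - rollout(theta - step, actions)
        columns.append(diff / (2 * eps))
    jac = np.stack(columns, axis=1)  # shape (T, dim(theta))
    return jac.T @ jac / sigma**2

def exploration_cost(actions, thetas, reg=1e-6):
    """tr(I^{-1}) averaged over sampled parameters.
    The small regularizer reg (an assumption) keeps a singular I invertible."""
    costs = []
    for theta in thetas:
        info = fisher_information(theta, actions)
        costs.append(np.trace(np.linalg.inv(info + reg * np.eye(info.shape[0]))))
    return float(np.mean(costs))

# Randomize over the unknown parameters (ranges are illustrative only).
rng = np.random.default_rng(0)
thetas = np.column_stack([rng.uniform(0.5, 2.0, 8), rng.uniform(0.1, 1.0, 8)])

idle = np.zeros(20)  # never pushes: the trajectory carries no information
push = np.ones(20)   # constant push: the trajectory depends on mass and damping
cost_idle = exploration_cost(idle, thetas)
cost_push = exploration_cost(push, thetas)
print(cost_idle, cost_push)
```

In this toy setting the idle sequence produces a singular Fisher information and therefore a very large cost, mirroring the observation that a trajectory in which nothing moves is uninformative. A learned policy would replace the two fixed action sequences compared here.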
Exploration Behavior
Simulation
Real World
We train the exploration policy \(\pi_{exp}\) in simulation (left) and roll it out in the real world (right) to collect trajectories for system identification. Observe that the exploration policy does not transfer perfectly but still collects informative data about the rod's center of mass. Note that even though the video shows multiple real-world rollouts, we only collect a single one in practice.
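How a single collected trajectory might be turned into a parameter estimate can be sketched as follows. This is an illustrative assumption, not the procedure used in this work: the toy rod model, the names `simulate_rod_angle` and `identify_com`, and the grid search are hypothetical stand-ins for a physics simulator and a proper optimizer. The idea is to pick the center of mass whose simulated trajectory best matches the observed one.

```python
import numpy as np

def simulate_rod_angle(com, pushes, contact=0.3, inertia=0.05, dt=0.05):
    """Toy stand-in for a simulator: rod yaw angle after each push.
    Torque is the push force times the lever arm (contact point minus com)."""
    angle, omega, angles = 0.0, 0.0, []
    for force in pushes:
        omega += dt * force * (contact - com) / inertia
        angle += dt * omega
        angles.append(angle)
    return np.array(angles)

def identify_com(observed, pushes, candidates):
    """Return the candidate whose simulated angles are closest in mean squared error."""
    errors = [np.mean((simulate_rod_angle(c, pushes) - observed) ** 2)
              for c in candidates]
    return float(candidates[int(np.argmin(errors))])

# Synthetic "real-world" trajectory: the toy model plus measurement noise.
rng = np.random.default_rng(0)
true_com = 0.12
pushes = np.ones(20)
observed = simulate_rod_angle(true_com, pushes) + rng.normal(0.0, 0.01, size=20)

candidates = np.linspace(-0.4, 0.4, 81)  # grid with 0.01 spacing
estimate = identify_com(observed, pushes, candidates)
print(estimate)
```

The estimate is only as good as the trajectory is informative: in this toy model, pushing exactly at the center of mass would produce no rotation for that candidate, and candidates could then be told apart only by how much the others deviate from zero.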
Baseline Comparisons
Rod Balancing
ASID correctly identifies the rod's center of mass and successfully balances it on the tower. The baseline trained with domain randomization, i.e., over a distribution of inertia parameters, fails catastrophically as it converges to picking up and placing the rod at a random location. These results show that identifying the correct parameters is crucial to solving dynamic tasks.
Shuffleboard
Due to the changing surface friction caused by previous shot attempts, the domain randomization baseline struggles to shoot the puck to the desired zone. With its dedicated exploration phase, ASID can accurately adapt the simulation to the current conditions and land the puck in the desired zone. A correct parameter estimate is crucial to solving the task accurately.