Here we use information regularization to promote cooperation / competition via intention signalling / hiding in a multi-agent RL problem. The environment is a simple, two-goal grid world built in OpenAI Gym, based on the example here. The first agent, Alice, has access to the goal, is parameterized with a tabular policy and value function, and is trained using REINFORCE, based on an implementation here. Alice's policy is regularized with the mutual information between goal and action given state, I(goal; action | state). Depending on the sign of the information weighting, this regularization encourages her to either signal or hide her private information about the goal. The second agent, Bob, does not have access to the goal and must instead infer it purely by observing Alice's behavior. Information regularization of Alice therefore directly affects Bob's success. In summary, information regularization allows Alice to train alone while being prepared for cooperation / competition with a friend / foe (Bob) introduced later. A minimal sketch of the information term is given below; more detailed notes can be found here.
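As an illustration only (not the repository's actual code), the sketch below shows one way the per-step information term can be computed for a tabular goal-conditioned policy and folded into the REINFORCE reward. The array shapes, the uniform goal prior, and the names `policy`, `info_term`, and `beta` are assumptions for the example.

```python
import numpy as np

def info_term(policy, state, goal, goal_prior):
    """KL( pi(a | s, g) || pi(a | s) ) for one visited (state, goal) pair.

    policy     : (n_goals, n_states, n_actions) tabular goal-conditioned policy
    goal_prior : (n_goals,) prior p(g), e.g. uniform over the two goals
                 (used here as a simple stand-in for p(g | s))

    Averaging this term over states visited under the policy gives an
    estimate of I(goal; action | state).
    """
    p_a_sg = policy[goal, state]                # pi(a | s, g)
    p_a_s = goal_prior @ policy[:, state, :]    # pi(a | s) = sum_g p(g) pi(a | s, g)
    eps = 1e-12
    return float(np.sum(p_a_sg * np.log((p_a_sg + eps) / (p_a_s + eps))))

# In REINFORCE, the term can be added to the per-step reward:
#   r_t  ->  r_t + beta * info_term(policy, s_t, g, goal_prior)
# With beta > 0, Alice is rewarded for actions that reveal the goal (signalling);
# with beta < 0, she is penalized for them (hiding).
```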
TODOS:
- make richer episode visualization
- use I(t) = sum of info up to time t, so that the agent prefers revealing info later
- find a lossy case
- learn friend/foe policies and optimize the mixture parameter
- learn pi(beta) and optimize beta
- try discounting KL / entropy into the future (as in the Distral paper); for high enough beta, Alice should try not to terminate episodes
- under what conditions might Alice "overshoot" to signal?
- under what conditions are I(traj; goal) and I(action; goal | state) approximately equal?