This repo provides the full implementation for the paper "Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming", published at the International Conference on Learning Representations (ICLR) 2022.

Authors: Sachin Konan*, Esmaeil Seraj*, Matthew Gombolay
* Co-first authors; these authors contributed equally to this work.

Full paper (arXiv): https://arxiv.org/pdf/2201.08484.pdf
Installation:
- Download Anaconda
- conda env create --file marl.yml
- cd PettingZoo
- conda activate marl
- python setup.py install
- Follow the StarCraft Multi-Agent Challenge instructions here: https://github.com/oxwhirl/smac
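Every training command below takes a -workers [NUM CPUS] argument. A minimal sketch for picking that value automatically; the function name num_workers is ours, not part of the repo:

```shell
# Hypothetical helper: pick a value for the -workers flag.
# Tries nproc (Linux), then sysctl (macOS), and falls back to 1.
num_workers() {
    nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1
}
```

Usage (illustrative): python distributed_pong_train.py -batch 16 -workers "$(num_workers)" -k 2 -adv info -critic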
Pistonball:
- cd pistonball
- To Execute Experiments:
  - MOA: python test_piston_ball.py -method moa
  - InfoPG: python test_piston_ball.py -method infopg -k [K_LEVELS]
  - Adv. InfoPG: python test_piston_ball.py -method infopg_adv -k [K_LEVELS]
  - Consensus Update: python test_piston_ball.py -method consensus
  - Standard A2C: python test_piston_ball.py -method a2c
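The five variants above differ only in the -method (and -k) arguments. A dry-run sketch that prints each command instead of executing it; K=2 and the run wrapper are illustrative choices of ours, not part of the repo:

```shell
# Dry-run: echo the five Pistonball commands rather than launching them.
# Drop the `echo` inside run() to actually execute the experiments.
K=2  # illustrative k-level for [K_LEVELS]
run() { echo python test_piston_ball.py "$@"; }
run -method moa
run -method infopg -k "$K"
run -method infopg_adv -k "$K"
run -method consensus
run -method a2c
```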
- To Execute PR2-AC Experiments:
  - cd ../pr2-ac/pistonball/
  - python distributed_pistonball_train.py -batch 4 -workers [NUM CPUS]
  - Results will be saved in experiments/pistonball/[DATETIME OF RUN]/
- cd
- MOA: python batch_pistoncase_moa_env.py
- InfoPG: python batch_pistoncase_infopg_env.py
Pong:
- cd pong
- To Execute MOA Experiments:
  - cd pong_moa
  - MOA: python distributed_pong_moa_train.py -batch 16 -workers [NUM CPUS]
  - Results will be saved in experiments/pong/[DATETIME OF RUN]/
- To Execute PR2-AC Experiments:
  - cd ../pr2-ac/pong/
  - python distributed_pong_train.py -batch 16 -workers [NUM CPUS]
  - Results will be saved in experiments/pong/[DATETIME OF RUN]/
- cd
- To Execute Other Experiments:
  - InfoPG: python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv info -critic
  - Adv. InfoPG: python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv normal
  - Consensus Update: python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal -consensus
  - Standard A2C: python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal
  - Results will be saved in experiments/pong/[DATETIME OF RUN]/
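Each run writes to a timestamped directory under experiments/pong/. A small helper for locating the newest run; the name latest_run is ours, not part of the repo:

```shell
# Print the most recently modified results directory under a given root,
# or nothing if the root does not exist yet.
latest_run() {
    ls -td "$1"/*/ 2>/dev/null | head -n 1
}
# e.g. latest_run experiments/pong
```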
Walker:
- cd walker
- To Execute MOA Experiments:
  - cd walker_moa
  - MOA: python distributed_walker_train_moa.py -batch 16 -workers [NUM CPUS]
  - Results will be saved in experiments/walker_moa/[DATETIME OF RUN]/
- To Execute PR2-AC Experiments:
  - cd ../pr2-ac/walker/
  - python distributed_walker_train.py -batch 16 -workers [NUM CPUS]
  - Results will be saved in experiments/walker/[DATETIME OF RUN]/
- cd
- To Execute Other Experiments:
  - InfoPG: python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv info -critic
  - Adv. InfoPG: python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv normal
  - Consensus Update: python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal -consensus
  - Standard A2C: python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal
  - Results will be saved in experiments/walker/[DATETIME OF RUN]/
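The four walker variants above differ only in their flag combinations. A hedged helper (the name flags_for is ours, not the repo's) that maps a method name to the flags listed above:

```shell
# Map a method name to the walker training flags listed above.
# flags_for is an illustrative helper, not part of the repo; $2 is the k-level.
flags_for() {
    case "$1" in
        infopg)     printf '%s\n' "-k $2 -adv info -critic" ;;
        infopg_adv) printf '%s\n' "-k $2 -adv normal" ;;
        consensus)  printf '%s\n' "-k 0 -adv normal -consensus" ;;
        a2c)        printf '%s\n' "-k 0 -adv normal" ;;
        *)          return 1 ;;
    esac
}
# e.g. python distributed_walker_train.py -batch 16 -workers 8 $(flags_for infopg 3)
```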
StarCraft:
- cd starcraft
- To Execute MOA Experiments:
  - cd moa
  - MOA: python distributed_starcraft_train_moa.py -batch 128 -workers [NUM CPUS] -positive_rewards
  - Results will be saved in experiments/starcraft/[DATETIME OF RUN]/
- To Execute PR2-AC Experiments:
  - cd ../pr2-ac/starcraft/
  - python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS]
  - Results will be saved in experiments/starcraft/[DATETIME OF RUN]/
- cd
- To Execute Other Experiments:
  - InfoPG: python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k [K_LEVELS] -adv info -critic -positive_rewards
  - Adv. InfoPG: python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k [K_LEVELS] -adv normal -positive_rewards
  - Consensus Update: python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k 0 -adv normal -consensus -positive_rewards
  - Standard A2C: python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k 0 -adv normal -positive_rewards
  - Results will be saved in experiments/starcraft/[DATETIME OF RUN]/
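The StarCraft variants all append -positive_rewards. A dry-run sketch that assembles each command, assuming the StarCraft training script distributed_starcraft_train.py (as in the PR2-AC section); BATCH, WORKERS, K, and the sc2 wrapper are illustrative choices of ours:

```shell
# Dry-run: echo the StarCraft training commands rather than launching them.
# Remove the `echo` inside sc2() to actually run; flags follow the list above.
BATCH=128; WORKERS=8; K=3
sc2() { echo python distributed_starcraft_train.py -batch "$BATCH" -workers "$WORKERS" "$@" -positive_rewards; }
sc2 -k "$K" -adv info -critic     # InfoPG
sc2 -k "$K" -adv normal           # Adv. InfoPG
sc2 -k 0 -adv normal -consensus   # Consensus Update
sc2 -k 0 -adv normal              # Standard A2C
```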