Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy.
This repo evaluates and analyzes VacSim, a multi-agent LLM framework for simulating health-related decision-making. Paper link: https://arxiv.org/abs/2503.09639
- Create and activate a conda environment (Python 3.10 recommended).
- Install dependencies: `pip install -r requirements.txt`
The main entry file of this repo is driver.py, which creates an EvalSuite object that conducts evaluations for the multi-agent system. Each EvalSuite (see the implementation in src/utils/eval_suite.py) runs the evaluation selected by the `eval_mode` argument:
- 0 -> conduct attitude tuning
- 1 -> incentive policy strength eval
- 2 -> community policy strength eval
- 3 -> mandate policy strength eval
- 4 -> news sanity eval
- 5 -> run simulation with different diseases (by replacing disease names)
- 6 -> compare simulation runs under the strong policy of each kind (incentive, community, mandate)
Each EvalSuite creates a DataParallelEngine or AsyncDataParallelEngine object, which runs parallel inference on vLLM servers or makes async API calls, depending on whether you run local or remote inference. Both DataParallelEngine and AsyncDataParallelEngine inherit from the Engine class, which specifies the behaviors of agents and outlines their routines at each simulation time step.
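To make that structure concrete, here is a minimal sketch of the hierarchy. Only the class names Engine, DataParallelEngine, and AsyncDataParallelEngine come from the repo; the method names, signatures, and bodies below are illustrative assumptions, not the actual interfaces:

```python
# Minimal sketch of the engine hierarchy described above. Only the class
# names come from the repo; methods and bodies are assumed for illustration.

class Engine:
    """Specifies agent behaviors and the routine at each simulation time step."""

    def __init__(self, agents, num_days):
        self.agents = agents
        self.num_days = num_days

    def run(self):
        # One routine per simulated day: each agent reads news, hears from
        # neighbors, and updates its vaccine attitude via an LLM call.
        for day in range(self.num_days):
            for agent in self.agents:
                response = self.infer(agent.build_prompt(day))  # assumed agent API
                agent.update_attitude(response)

    def infer(self, prompt):
        raise NotImplementedError  # subclasses decide how inference runs


class DataParallelEngine(Engine):
    """Local inference: dispatches prompts to parallel vLLM servers."""

    def infer(self, prompt):
        ...  # e.g., round-robin an HTTP request over the --ports list


class AsyncDataParallelEngine(Engine):
    """Remote inference: async API calls (OpenAI, Azure, Anthropic)."""

    def infer(self, prompt):
        ...  # e.g., schedule a coroutine against the provider client
```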
- (Recommended) If you run via a bash script, see the example in `example_run_4_servers.sh` for running parallel servers on 4 GPUs.
- If you use a single interactive GPU, start a server like this:
python -m vllm.entrypoints.openai.api_server \
--model $model --guided-decoding-backend lm-format-enforcer --max-model-len 6144 \
--tensor-parallel-size 1 --port $PORT
where you substitute your model name for `$model` and a port number for `$PORT`.
Example:
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Meta-Llama-3.1-8B-Instruct --guided-decoding-backend lm-format-enforcer --max-model-len 6144 \
--tensor-parallel-size 1 --port 49172
- Parallel: If you use interactive GPUs across N parallel processes, request N GPUs and open N sessions, then run the command above in each session with a distinct port. A programmatic alternative is sketched below.
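If you would rather launch the N servers from one script than open N sessions, a small Python launcher along these lines should work. This is a sketch: the model name and ports are the same examples used above, and pinning each process to its own GPU via CUDA_VISIBLE_DEVICES is an assumption about your setup:

```python
# Launch one vLLM OpenAI-compatible server per GPU. Equivalent to running
# the command above in N separate sessions; adjust MODEL and PORTS as needed.
import os
import subprocess

MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"
PORTS = [49172, 55050, 60050, 60100]  # one port per GPU/process

procs = []
for gpu, port in enumerate(PORTS):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))  # pin to one GPU
    procs.append(subprocess.Popen([
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", MODEL,
        "--guided-decoding-backend", "lm-format-enforcer",
        "--max-model-len", "6144",
        "--tensor-parallel-size", "1",
        "--port", str(port),
    ], env=env))

for p in procs:
    p.wait()  # keep the launcher alive while the servers run
```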
Once the server(s) are running, launch the simulation driver. Example command:
python src/driver.py 1 --warmup_days 5 --run_days 15 \
--news_path data/news/COVID-news-total-k=10000.pkl \
--network_str data/social_network-num=100-incl=neutral.pkl \
--profile_path data/profiles-num=100-incl=neutral.pkl \
--model_type meta-llama/Meta-Llama-3.1-8B-Instruct \
--disease COVID-19 \
--ports 49172 --temperature 0.7
This command runs the policy strength eval (the incentive policy) over five seeds by default. If you want to supply a different list of seeds, add the `--seed_list` argument.
If you use multiple server processes, include all the ports:
python src/driver.py 1 --warmup_days 5 --run_days 15 \
--news_path data/news/COVID-news-total-k=10000.pkl \
--network_str data/social_network-num=100-incl=neutral.pkl \
--profile_path data/profiles-num=100-incl=neutral.pkl \
--model_type meta-llama/Meta-Llama-3.1-8B-Instruct \
--disease COVID-19 \
--ports 49172 55050 60050 60100 --temperature 0.7
If you use closed-source models, we recommend providing your API keys as environment variables; a sketch of `init_client` follows this list. If you use:
- OpenAI models: create an env variable called `OPENAI_API_KEY` and modify the `init_client` method in `src/engines/async_engine`.
- OpenAI models hosted on Azure: create env variables called `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT`, and specify the api_version in the `init_client` method in `src/engines/async_engine`.
- Anthropic models: create an env variable called `ANTHROPIC_API_KEY`.
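For reference, `init_client` might look roughly like the following once the environment variables are set. This is a sketch only: the `provider` parameter, the branching, and the api_version value are assumptions, not the actual code in `src/engines/async_engine`:

```python
# Illustrative init_client: reads provider credentials from environment
# variables. The signature and branching are assumptions about
# src/engines/async_engine, not its actual contents.
import os

def init_client(provider: str):
    if provider == "openai":
        from openai import AsyncOpenAI
        return AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    if provider == "azure":
        from openai import AsyncAzureOpenAI
        return AsyncAzureOpenAI(
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            api_version="2024-02-01",  # set the version your deployment expects
        )
    if provider == "anthropic":
        from anthropic import AsyncAnthropic
        return AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    raise ValueError(f"Unknown provider: {provider}")
```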
Caveat: for some cloud providers, you need to encode the model token limit in the model name. For example, instead of `--model_type gpt-4o`, you need `--model_type gpt-4o-0513-50ktokenperminute`. Keep this in mind when specifying `--model_type`.
The following lists commands and inputs for each kind of eval:
- Attitude tuning (command 0): Do not provide `--temperature`; provide `--temperature_list` instead, because tuning sweeps over a range of temperatures. Example (on four ports):
python src/driver.py 0 --warmup_days 5 --run_days 15 \
--news_path data/news/COVID-news-total-k=10000.pkl \
--network_str data/social_network-num=100-incl=neutral.pkl \
--profile_path data/profiles-num=100-incl=neutral.pkl \
--model_type meta-llama/Meta-Llama-3.1-8B-Instruct \
--disease COVID-19 \
--ports 49172 55050 60050 60100 --temperature_list 0.1 0.3 0.5 0.7 1.0 2.0
- Policy Strength Eval (commands 1, 2, 3): Provide `--temperature` (the temperature selected after attitude tuning). Example (on four ports):
python src/driver.py 1 --warmup_days 5 --run_days 15 \
--news_path data/news/COVID-news-total-k=10000.pkl \
--network_str data/social_network-num=100-incl=neutral.pkl \
--profile_path data/profiles-num=100-incl=neutral.pkl \
--model_type meta-llama/Meta-Llama-3.1-8B-Instruct \
--disease COVID-19 \
--ports 49172 55050 60050 60100 --temperature 0.7
- News Eval (command 4): Provide `--news_list` (paths to the positive and negative news files):
python src/driver.py 4 --warmup_days 5 --run_days 15 \
--news_list data/news/COVID-news-positive-k=5000.pkl data/news/COVID-news-negative-k=5000.pkl \
--network_str data/social_network-num=100-incl=neutral.pkl \
--profile_path data/profiles-num=100-incl=neutral.pkl \
--model_type meta-llama/Meta-Llama-3.1-8B-Instruct \
--disease COVID-19 \
--ports 49172 55050 60050 60100 --temperature 0.7
- Policy Comparison (command 6): Provide a temperature; the command is otherwise similar to the Policy Strength Eval.
python src/driver.py 6 --warmup_days 5 --run_days 15 \
--news_path data/news/COVID-news-total-k=10000.pkl \
--network_str data/social_network-num=100-incl=neutral.pkl \
--profile_path data/profiles-num=100-incl=neutral.pkl \
--model_type meta-llama/Meta-Llama-3.1-8B-Instruct \
--disease COVID-19 \
--ports 49172 55050 60050 60100 --temperature 0.7
The eval outputs will accumulate and can consume substantial storage in the long term, as they contain the output of every individual agent in the simulated population. If you need to save the output to a custom directory, specify the `--save_dir` argument; otherwise, the output directory is created in the current directory.
The output directory contains the following files:
- results: records the summary of all simulations run over the list of seeds, along with the individual run results.
- sim: records detailed information for every simulation run, including all agents' outputs and the trajectory of each individual run.
You can create more evaluations by modifying the eval function in the src/utils/eval_suite.py file. The policy, news, social network, and agent demographics provided to the engine can all be set through the corresponding parameters.
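As an illustration, extending the suite with a new eval mode might look like this. Only EvalSuite and its module path come from this README; `run_one_sim`, `seed_list`, and the keyword arguments are hypothetical names assumed for the sketch:

```python
# Hypothetical extension of EvalSuite with a custom eval mode. Only the
# EvalSuite class and its module path are from the repo; run_one_sim,
# seed_list, and the parameter names are assumed for illustration.
from utils.eval_suite import EvalSuite  # i.e., src/utils/eval_suite.py


class CustomEvalSuite(EvalSuite):
    def eval(self, eval_mode):
        if eval_mode == 7:  # a new, custom evaluation
            for seed in self.seed_list:
                self.run_one_sim(
                    seed=seed,
                    policy="incentive-strong",  # policy handed to the engine
                    news_path="data/news/COVID-news-total-k=10000.pkl",
                    network_path="data/social_network-num=100-incl=neutral.pkl",
                    profile_path="data/profiles-num=100-incl=neutral.pkl",
                )
        else:
            super().eval(eval_mode)  # fall back to the built-in modes 0-6
```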