Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations
This repository contains the code and resources for the NAACL 2025 main conference paper "Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations". The project focuses on adapting large language models (LLMs) to simulate survey response distributions across diverse global populations.
Introduction
This project explores the use of large language models (LLMs) to simulate survey responses for global populations. By fine-tuning LLMs on survey data, we aim to generate realistic response distributions that reflect the diversity of global opinions.
Installation
To set up the environment, clone this repository and install the required dependencies:
pip install -r requirements.txt
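If you prefer to keep the dependencies isolated from your system Python, you can install them into a fresh virtual environment first. This is a minimal sketch, not part of the original instructions; the environment name .venv is arbitrary, and a conda environment would work just as well:
python -m venv .venv          # create a local virtual environment (name is arbitrary)
source .venv/bin/activate     # activate it before installing
pip install -r requirements.txt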
Dataset Preparation
Before running the experiments, you need to prepare the dataset. Run the following script to download and preprocess the data:
sh prepare_data.sh
Alternatively, you can use our preprocessed data directly: Download Dataset.
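Before launching any jobs, it is worth confirming that the processed files were actually written. The data/ directory name below is only an assumption; check prepare_data.sh for the actual output path used in this repository:
sh prepare_data.sh && ls data/   # list the processed files; data/ is a hypothetical output directory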
Zero-shot Evaluation
To evaluate the model in a zero-shot setting, use the following script:
sh infer_slurm.sh
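The script name suggests it is intended for a Slurm cluster. If your cluster uses sbatch, you can submit it as a batch job instead of running it in the current shell; the partition and GPU flags below are placeholders and should be adjusted to your cluster (or dropped entirely if the script already contains #SBATCH directives):
sbatch --partition=gpu --gres=gpu:1 infer_slurm.sh   # example submission; flags are placeholders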
Fine-tuning
To fine-tune the model on the survey dataset, run the following script:
sh train_slurm.sh
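As with inference, the training script can be submitted through Slurm and monitored while it runs. squeue is a standard Slurm command; the user filter below is just an example:
sbatch train_slurm.sh   # submit the fine-tuning job
squeue -u $USER         # check the status of your running and queued jobs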
Evaluation
After fine-tuning, you can evaluate the model's performance with the following script (set INFER_MODE to sft and set CKPT_PATH to your fine-tuned checkpoint):
sh infer_slurm.sh
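Whether INFER_MODE and CKPT_PATH are edited inside infer_slurm.sh or passed in as environment variables depends on the script, so check it first. A sketch assuming they are read from the environment, with a hypothetical checkpoint path:
export INFER_MODE=sft                          # switch from zero-shot to fine-tuned inference
export CKPT_PATH=/path/to/finetuned/checkpoint # hypothetical path to your fine-tuned checkpoint
sh infer_slurm.sh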
Contact
For any questions or issues, please open an issue on this repository or contact yongcao2018@gmail.com.