Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations
This repository contains the code and resources for the NAACL 2025 main conference paper "Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations". The project focuses on adapting large language models (LLMs) to simulate survey response distributions across diverse global populations.
Introduction
This project explores the use of large language models (LLMs) to simulate survey responses for global populations. By fine-tuning LLMs on survey data, we aim to generate realistic response distributions that reflect the diversity of global opinions.
Installation
To set up the environment, clone this repository and install the required dependencies:
pip install -r requirements.txt
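If you prefer to keep the dependencies isolated from your system Python, you can install them into a fresh virtual environment first. This is a minimal sketch, not part of the original instructions; the environment name .venv is arbitrary, and a conda environment would work just as well:
python -m venv .venv          # create a local virtual environment (name is arbitrary)
source .venv/bin/activate     # activate it before installing
pip install -r requirements.txt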
Dataset Preparation
Before running the experiments, you need to prepare the dataset. Run the following script to download and preprocess the data:
sh prepare_data.sh
Alternatively, you can use our preprocessed data directly: Download Dataset.
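Before launching any jobs, it is worth confirming that the processed files were actually written. The data/ directory name below is only an assumption; check prepare_data.sh for the actual output path used in this repository:
sh prepare_data.sh && ls data/   # list the processed files; data/ is a hypothetical output directory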
Zero-shot Evaluation
To evaluate the model in a zero-shot setting, use the following script:
sh infer_slurm.sh
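The script name suggests it is intended for a Slurm cluster. If your cluster uses sbatch, you can submit it as a batch job instead of running it in the current shell; the partition and GPU flags below are placeholders and should be adjusted to your cluster (or dropped entirely if the script already contains #SBATCH directives):
sbatch --partition=gpu --gres=gpu:1 infer_slurm.sh   # example submission; flags are placeholders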
Fine-tuning
To fine-tune the model on the survey dataset, run the following script:
sh train_slurm.sh
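As with inference, the training script can be submitted through Slurm and monitored while it runs. squeue is a standard Slurm command; the user filter below is just an example:
sbatch train_slurm.sh   # submit the fine-tuning job
squeue -u $USER         # check the status of your running and queued jobs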
Evaluation
After fine-tuning, you can evaluate the model's performance with the following script (set INFER_MODE to sft and set CKPT_PATH to your fine-tuned checkpoint):
sh infer_slurm.sh
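Whether INFER_MODE and CKPT_PATH are edited inside infer_slurm.sh or passed in as environment variables depends on the script, so check it first. A sketch assuming they are read from the environment, with a hypothetical checkpoint path:
export INFER_MODE=sft                          # switch from zero-shot to fine-tuned inference
export CKPT_PATH=/path/to/finetuned/checkpoint # hypothetical path to your fine-tuned checkpoint
sh infer_slurm.sh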
Contact
For any questions or issues, please open an issue on this repository or contact yongcao2018@gmail.com.