The First Framework for LLM-Driven Machine Design in Besiege
Paper: Agentic Design of Compositional Machines
Installation • Quick Start • Training • Citation
- Overview
- Installation
- Quick Start
- Fine-tuning
- Performance Leaderboard
- RL Fine-tuning Results
- Citation
- Acknowledgement
- License
## Overview

BesiegeField is a framework that enables Large Language Models (LLMs) to autonomously design and build complex machines in the physics-based game Besiege, bridging AI reasoning with creative engineering tasks.
## Installation

| Component | Version |
|---|---|
| Besiege | Linux v1.60-22044 |
| Ubuntu | 22.04 |
| GLIBC | 2.33 – 2.35 |
| Mono | ≥ 6.8.0.105 |
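To confirm the host meets these requirements, a quick check with standard Ubuntu commands:

```bash
# GLIBC should be in the 2.33–2.35 range
ldd --version | head -n 1
# Mono should be >= 6.8.0.105 (after the Mono install below)
mono --version | head -n 1
# Ubuntu release; expect 22.04
lsb_release -rs
```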
Step 1: Purchase the official copy on Steam
Step 2: Download DepotDownloader
Step 3: Download Besiege v1.60-22044
```bash
./DepotDownloader -app 346010 -depot 346016 -manifest 2732248020700221971 \
    -username <steam_user> -password <password>
```

Step 4: Download v1.20-17395 executables (required for headless operation)

```bash
./DepotDownloader -app 346010 -depot 346016 -manifest 5506301120812842666 \
    -username <steam_user> -password <password>
```

💡 Tip: Find other manifests on SteamDB if needed.
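Both downloads share the same app and depot IDs, so it may help to keep them in separate directories via DepotDownloader's `-dir` flag (a sketch; the directory names here are arbitrary):

```bash
# Download each version into its own folder so the v1.20 binaries can later
# be copied over the v1.60 install.
./DepotDownloader -app 346010 -depot 346016 -manifest 2732248020700221971 \
    -username <steam_user> -password <password> -dir Besiege_v1.60
./DepotDownloader -app 346010 -depot 346016 -manifest 5506301120812842666 \
    -username <steam_user> -password <password> -dir Besiege_v1.20
```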
📥 BesiegeField Plugin (Google Drive)
Standard Installation:
```bash
sudo apt install mono-complete xvfb   # xvfb only needed for headless workstations
mono --version                        # verify >= 6.8.0.105
```

📦 Offline/Manual Installation (click to expand)
If apt is unavailable, use manual installation:
```bash
# Install Mono from the offline archive
cd /path/to/tar
tar -xzf mono-complete-offline.tar.gz
for deb in *.deb; do dpkg -x "$deb" .; done

# Point the current shell at the extracted Mono
# (adjust /path/to/mono to wherever you extracted the packages)
export PATH="/path/to/mono/usr/bin:$PATH"
export LD_LIBRARY_PATH="/path/to/mono/usr/lib:$LD_LIBRARY_PATH"
export PKG_CONFIG_PATH="/path/to/mono/usr/lib/pkgconfig:$PKG_CONFIG_PATH"

# Make the change permanent
cat >> ~/.bashrc <<EOF
export PATH="/path/to/mono/usr/bin:\$PATH"
export LD_LIBRARY_PATH="/path/to/mono/usr/lib:\$LD_LIBRARY_PATH"
export PKG_CONFIG_PATH="/path/to/mono/usr/lib/pkgconfig:\$PKG_CONFIG_PATH"
EOF
source ~/.bashrc
```
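After reloading the shell, it is worth confirming that the offline Mono is the one being picked up:

```bash
# Should print the extracted path and a version >= 6.8.0.105
which mono
mono --version
```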
```bash
# Install Xvfb from the offline archive
cd /path/to/xvfb
tar -xzf xvfb-offline.tar.gz
dpkg -i *.deb
```

Step 1: Extract the plugin archive and copy all files into the v1.60-22044 game folder
Step 2: Copy Besiege.x86 & Besiege.x86_64 from v1.20-17395 into v1.60-22044, overwriting the originals
⚠️ Warning: This enables headless/code control but makes the normal GUI start unstable. Keep a backup if you want to launch v1.60 visually.
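A sketch of Steps 1–2, assuming the plugin ships as a zip archive and the two game versions were downloaded into separate folders (all paths are placeholders):

```bash
# Step 1: unpack the plugin into the v1.60-22044 game folder
unzip BesiegeField_plugin.zip -d /path/to/Besiege_v1.60/

# Step 2: overwrite the v1.60 launchers with the v1.20-17395 ones
# (keep a backup of the v1.60 folder first if you want GUI launches)
cp /path/to/Besiege_v1.20/Besiege.x86     /path/to/Besiege_v1.60/
cp /path/to/Besiege_v1.20/Besiege.x86_64  /path/to/Besiege_v1.60/
```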
Step 3: Set permissions

```bash
chmod -R 777 /path/to/Besiege
```

Step 4: Test the vanilla game (use the backup copy)

```bash
cd /path/to/backup/Besiege && ./run.sh
```

## Quick Start

Create the Python environment:

```bash
conda env create -f environment_inferenceonly.yaml
conda activate <env_name>
```

Folder Structure:
```
your-project/
├── Besiege/       # Game installation
└── AgenticCodes/  # Framework code
```
Edit AgenticCodes/config.py:
| Parameter | Description |
|---|---|
| `APIPATH` | Path to the file storing LLM type, API key, etc. Fill it in yourself. |
| `DEFAULT_SAVE_ROOT` | Root directory for LLM outputs |
| `SCRIPT_PATH` | Must point to `Besiege/run_besiegefield.sh` |
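For orientation, filled-in values might look like the following (a hypothetical sketch printed via a heredoc; edit the real `AgenticCodes/config.py` directly, and note all paths are placeholders):

```bash
cat <<'EOF'
APIPATH = "/home/user/llm_api_config.json"             # LLM type, API key, etc.
DEFAULT_SAVE_ROOT = "/home/user/besiegefield_outputs"  # where generated designs go
SCRIPT_PATH = "/path/to/Besiege/run_besiegefield.sh"   # launch script
EOF
```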
Design a machine to throw projectiles:

```bash
python main.py \
    -use_model deepseek-chat \
    -task catapult/catapult_level1 \
    -env_num 2 \
    -user_input "Design a machine to throw a boulder (type id 36) in a parabolic trajectory."
```

Design a machine to move forward:
```bash
python main.py \
    -use_model deepseek-chat \
    -task car/car_level1 \
    -env_num 2 \
    -user_input "Design a machine to move forward on a straight road."
```

Explore all available tasks in `environments/env_files/level_menus.json`.
Viewing your machines:

- Generated `.bsgmachine` files appear in `DEFAULT_SAVE_ROOT`
- Copy them to `Besiege/Besiege_Data/SavedMachines` (see the sketch below)
- Run `./run.sh` to launch the game
- Inspect and test your AI-designed machines in-game!
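A minimal sketch of that loop (the output subdirectory and file name are placeholders):

```bash
# Copy a generated design into the game's machine folder, then launch
cp /path/to/DEFAULT_SAVE_ROOT/<run_dir>/machine.bsgmachine \
   /path/to/Besiege/Besiege_Data/SavedMachines/
cd /path/to/Besiege && ./run.sh
```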
## Fine-tuning

Add the training-related packages:

```bash
conda activate <env_name>
pip install -r requirements_rl.txt
```

Step 1: Run cold start with Orthogonal Finetuning (the dataset will be downloaded automatically from Hugging Face)
```bash
cd PostTraining/ColdStart
./run_cold_start.sh <model_path>
```

If you want to try cold start with the human dataset (not recommended), run:
```bash
cd PostTraining/ColdStart
./run_cold_start.sh <model_path> true
```

Step 2: Fill in the paths in `merge_ckpts.py`, then run:

```bash
python merge_ckpts.py
```

Step 3: Configure `rl_config.yaml` with your settings (important!), then run:
```bash
cd PostTraining/RL
./rl_single_agent_light.sh
```

## Performance Leaderboard

### Catapult

Performance metrics across different models and methods:
| Models | Single-agent Mean | Single-agent Max | Single-agent Std | Iterative Editing Mean | Iterative Editing Max | Iterative Editing Std | Hierarchical Design Mean | Hierarchical Design Max | Hierarchical Design Std |
|---|---|---|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | 2.30 | 9.00 | 3.86 | 4.67 | 21.95 | 8.68 | 9.83 | 18.19 | 8.35 |
| OpenAI o3 | 2.87 | 5.22 | 1.96 | 9.14 | 14.01 | 3.71 | 2.00 | 11.11 | 3.98 |
| Qwen3-Coder-480B-A35B | 1.75 | 9.24 | 3.17 | 5.10 | 12.02 | 5.54 | 3.90 | 6.52 | 2.54 |
| Doubao Seed 1.6-250615 | 3.18 | 8.20 | 2.99 | 4.82 | 9.10 | 3.41 | 1.73 | 4.76 | 2.39 |
| Claude Opus 4-20250514 | 1.19 | 4.82 | 2.21 | 1.18 | 4.91 | 2.18 | 2.27 | 9.32 | 4.22 |
| DeepSeek-V3 | 3.50 | 4.86 | 2.17 | 3.07 | 5.24 | 2.55 | 2.41 | 4.93 | 2.58 |
| Kimi K2-0711-preview | 2.57 | 9.05 | 3.72 | 2.82 | 11.39 | 5.23 | 5.39 | 12.02 | 5.16 |
| Llama 4 Scout 17B 16E | 3.18 | 5.64 | 1.95 | 1.28 | 5.94 | 2.41 | 3.59 | 11.83 | 4.15 |
### Car

Performance metrics across different models and methods:

| Models | Single-agent Mean | Single-agent Max | Single-agent Std | Iterative Editing Mean | Iterative Editing Max | Iterative Editing Std | Hierarchical Design Mean | Hierarchical Design Max | Hierarchical Design Std |
|---|---|---|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | 33.96 | 40.85 | 6.73 | 34.34 | 41.66 | 13.96 | 29.96 | 41.52 | 7.78 |
| OpenAI o3 | 15.28 | 32.08 | 8.97 | 14.34 | 35.08 | 11.79 | 28.39 | 36.18 | 11.01 |
| Qwen3-Coder-480B-A35B | 8.87 | 11.50 | 4.46 | 15.24 | 28.95 | 13.12 | 12.59 | 34.05 | 10.78 |
| Doubao Seed 1.6-250615 | 3.51 | 9.40 | 4.85 | 8.11 | 10.04 | 3.58 | 18.75 | 26.02 | 4.38 |
| Claude Opus 4-20250514 | 9.83 | 12.98 | 1.28 | 8.07 | 28.04 | 12.48 | 14.56 | 38.67 | 20.69 |
| DeepSeek-V3 | 9.06 | 10.53 | 3.68 | 8.23 | 18.84 | 7.12 | 17.92 | 31.94 | 12.85 |
| Kimi K2-0711-preview | 1.75 | 8.09 | 2.80 | 14.36 | 28.34 | 9.47 | 1.94 | 14.99 | 5.48 |
| Llama 4 Scout 17B 16E | 0.02 | 0.03 | 0.01 | 3.04 | 12.76 | 5.23 | 1.55 | 2.00 | 0.32 |
## RL Fine-tuning Results

Performance comparison of the Qwen2.5-14B-Instruct model with different training strategies:

| Models | Catapult Validity Ratio | Catapult Mean Score | Catapult Max Score | Car Validity Ratio | Car Mean Score | Car Max Score |
|---|---|---|---|---|---|---|
| Qwen2.5-14B-Instruct | 11/50 | 0.06 | 2.41 | 46/50 | 4.97 | 19.10 |
| Qwen2.5-14B-Instruct + Cold-Start | 9/50 | 0.11 | 5.54 | 40/50 | 4.67 | 20.23 |
| Qwen2.5-14B-Instruct + RL | 12/50 | 0.13 | 5.92 | 41/50 | 3.72 | 24.08 |
| Qwen2.5-14B-Instruct + Cold-Start + RL | 11/50 | 0.14 | 7.14 | 42/50 | 5.05 | 45.72 |
## Citation

If you find this repository useful for your research or projects, please consider citing our work:
```bibtex
@article{zhang2025besiegefield,
  title={Agentic Design of Compositional Machines},
  author={Zhang, Wenqian and Liu, Weiyang and Liu, Zhen},
  journal={arXiv preprint arXiv:2510.14980},
  year={2025}
}
```
## Acknowledgement

We'd like to thank the developers of Besiege for creating such an inspiring game and for nurturing such a vibrant player community; without them, this project wouldn't exist.

Big thanks also to the BepInEx team for their amazing modding framework, which made it possible for us to push the boundaries of what's possible in Besiege.

If you find this project helpful, please consider giving it a star! ⭐
## License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. See the LICENSE file for details.

