This is the official repository for ICLR 2025 paper "Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation". We present a World-model-augmented (WMA) web agent that simulates action outcomes for better decision-making in long-horizon web tasks. Unlike existing LLM-based web agents that often make costly mistakes (e.g., repeatedly buying non-refundable tickets), our approach leverages world models to predict potential consequences before taking actions. We propose a transition-focused observation abstraction that enables effective training of LLMs as world models by focusing on natural language descriptions of important state changes. Experiments on WebArena and Mind2Web demonstrate that our world models improve agents' policy selection without training and show superior cost- and time-efficiency compared to tree-search-based approaches.
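At a high level, the agent proposes candidate actions, uses the world model to simulate the outcome of each, scores the simulated outcomes with a value model, and executes the best action. Below is a minimal Python sketch of this simulate-then-score loop; all names (`propose_actions`, `simulate`, `score`) are hypothetical placeholders for illustration, not this repository's API.

```python
# Minimal sketch of the WMA decision loop (hypothetical names, not the repo API).
# For each candidate action, the world model predicts a free-form description of
# the resulting state change; a value model scores how well that outcome advances
# the goal, and the agent executes the highest-scoring action.

def select_action(observation: str, goal: str, policy, world_model, value_model, k: int = 3):
    candidates = policy.propose_actions(observation, goal, n=k)  # sample k candidate actions
    best_action, best_score = None, float("-inf")
    for action in candidates:
        # Predict the *state change* the action would cause (transition-focused abstraction).
        predicted_change = world_model.simulate(observation, action)
        # Estimate how well the simulated outcome advances the user's goal.
        score = value_model.score(goal, observation, action, predicted_change)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```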
- 🤗 Hugging Face (Models and Datasets)
- 📄 Paper
- 🤗 WMA Agent Demo (WIP)
- [2025/01/22] WMA Web Agent is accepted to ICLR 2025!
- [2024/06/12] WMA Web Agent is out!
| Type | Name | Link |
|---|---|---|
| Model | World Model Adapter | Meta-Llama-3.1-8B-Instruct-WM-webarena-16k-adapter |
| Model | Value Model Adapter | Meta-Llama-3.1-8B-Instruct-value-model-16k-qlora-adapter-v2 |
| Dataset | World Model Training | world_model_for_wa_desc_with_tao_dataset |
| Dataset | World Model Formatted | world_model_for_wa_desc_with_tao_formatted_w_cot |
| Dataset | Value Model Test | Mind2Web-cleaned-lite-value-model-w-cot-test |
| Dataset | Value Model Formatted Test | Mind2Web-cleaned-lite-value-model-w-cot-formatted-test |
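As a sketch of how the released adapters might be used, the snippet below loads the world model adapter on top of the base Llama model with `transformers` and `peft`. The full Hugging Face repo ID likely includes an organization prefix (check the links above), so treat the IDs here as placeholders.

```python
# Sketch: loading the world model adapter on top of Meta-Llama-3.1-8B-Instruct.
# Requires `transformers`, `peft`, and access to the gated base Llama weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "Meta-Llama-3.1-8B-Instruct-WM-webarena-16k-adapter"  # placeholder; see table above

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the LoRA adapter to the base model
model.eval()
```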
Abstract

Large language models (LLMs) have recently gained much attention in building autonomous agents. However, the performance of current LLM-based web agents on long-horizon tasks is far from optimal, often yielding errors such as repeatedly buying a non-refundable flight ticket. By contrast, humans can avoid such irreversible mistakes, as we have an awareness of the potential outcomes (e.g., losing money) of our actions, also known as a "world model". Motivated by this, our study first conducts preliminary analyses confirming the absence of world models in current LLMs (e.g., GPT-4o, Claude-3.5-Sonnet). We then present a World-model-augmented (WMA) web agent, which simulates the outcomes of its actions for better decision-making. To overcome the challenges in training LLMs as world models that predict next observations, such as repeated elements across observations and long HTML inputs, we propose a transition-focused observation abstraction, where the prediction objectives are free-form natural language descriptions exclusively highlighting important state differences between time steps. Experiments on WebArena and Mind2Web show that our world models improve agents' policy selection without training and demonstrate our agents' cost- and time-efficiency compared to recent tree-search-based agents.

The dataset construction process consists of the following steps.
First, we collect agent trajectories to gather environment transition data.
```bash
python run_for_trajectory.py
```
Then, to enable efficient training of the world model, we extract the difference between the current observation and the next observation (a conceptual sketch follows the command below).
```bash
python dataset_construction/annotation_for_tao_torch.py
```
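Conceptually, this step keeps only the parts of the observation that changed between time steps. The following is an illustrative sketch using Python's standard `difflib`; the actual implementation in `annotation_for_tao_torch.py` may differ in its details.

```python
# Illustrative sketch of transition-focused diff extraction with difflib.
# Not the repo's exact implementation.
import difflib

def extract_state_difference(obs_t: str, obs_t1: str) -> dict:
    """Return lines deleted from and added to the observation between steps."""
    diff = difflib.unified_diff(obs_t.splitlines(), obs_t1.splitlines(), lineterm="")
    added, deleted = [], []
    for line in diff:
        if line.startswith("+") and not line.startswith("+++"):
            added.append(line[1:])
        elif line.startswith("-") and not line.startswith("---"):
            deleted.append(line[1:])
    return {"added": added, "deleted": deleted}
```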
Next, we convert the extracted differences into free-form natural language descriptions (a sketch of this step follows the command below).
```bash
python dataset_construction/annotation_for_description_with_tao.py
```
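This step asks an LLM to rewrite the raw diff as a free-form description of the state transition. A minimal sketch with the OpenAI Python client is shown below; the prompt wording and model choice are illustrative assumptions, not the repository's exact prompt.

```python
# Sketch: turning a raw observation diff into a natural-language transition
# description with an LLM. Prompt and model are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def describe_transition(action: str, diff: dict) -> str:
    prompt = (
        "An agent took the following action on a web page:\n"
        f"{action}\n\n"
        f"Elements removed from the page:\n{diff['deleted']}\n\n"
        f"Elements added to the page:\n{diff['added']}\n\n"
        "Describe, in plain language, how the page state changed."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```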
Finally, we process the dataset into the training format and split it (a sketch follows the command below).
```bash
python dataset_construction/format_dataset_and_split.py
```
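For intuition, this final step can be thought of as packing each (observation, action, description) record into a chat-style training example and holding out a validation split. The schema below is an assumption for illustration, not the repository's exact format.

```python
# Sketch: packing transition records into chat-style training examples and
# splitting train/validation. The message schema here is an assumption.
import random

def build_example(record: dict) -> dict:
    return {
        "messages": [
            {"role": "user",
             "content": (f"Observation:\n{record['observation']}\n\n"
                         f"Action:\n{record['action']}\n\n"
                         "Predict the resulting state change.")},
            {"role": "assistant", "content": record["description"]},
        ]
    }

def format_and_split(records: list[dict], val_ratio: float = 0.05, seed: int = 42):
    examples = [build_example(r) for r in records]
    random.Random(seed).shuffle(examples)
    n_val = int(len(examples) * val_ratio)
    return examples[n_val:], examples[:n_val]  # (train, val)
```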
We use the axolotl framework for fine-tuning our world models. We will upload the training configuration soon.
Please note that our inference code base is largely based on search-agents. You also need a Docker environment to run experiments on WebArena; please refer to the WebArena repository for setup instructions. To run inference, execute the command below.
```bash
bash scripts/parallel_run_webarena_wma.sh
```
```bibtex
@inproceedings{chae2025web,
  title={Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation},
  author={Hyungjoo Chae and Namyoung Kim and Kai Tzu-iunn Ong and Minju Gwak and Gwanwoo Song and Jihoon Kim and Sunghwan Kim and Dongha Lee and Jinyoung Yeo},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=moWiYJuSGF}
}
```