This repository contains a modified version of Windows Agent Arena (WAA) 🪟, a scalable Windows AI agent platform for testing and benchmarking multi-modal, desktop AI agents. This modified version focuses on integration with UFO, a UI-Focused Agent for Windows OS Interaction.
We highly recommend you have a look at the deployment guide from the original WindowsAgentArena repository. Our guide here assumes you are familiar with the deployment process of the original repository. The following steps will help you set up the environment for running the UFO agent in the Windows Agent Arena.
Clone the repository:

```bash
git clone https://github.com/nice-mee/WindowsAgentArena.git
```
Note: If you want to run OSWorld cases, check out the `2020-qqtcg/dev` branch:

```bash
git checkout 2020-qqtcg/dev
```
Create a `config.json` file in the root of the WAA repo. The API key here doesn't matter, since UFO will only use the key from its own config file:

```json
{
    "OPENAI_API_KEY": "placeholder"
}
```
Next, build the WinArena image locally:
```bash
cd scripts
chmod +x build-container-image.sh # (if required)
chmod +x prepare-agents.sh # (if required)
./build-container-image.sh --build-base-image true
```

This will create the `windowsarena/winarena:latest` image with the latest code from the `src` directory.
You should first configure UFO with `ufo/config/config.json` (refer to the UFO repo for details). Then copy the entire `ufo` folder to `WindowsAgentArena/src/win-arena-container/client/`:

```bash
cp -r src/win-arena-container/vm/setup/mm_agents/UFO/ufo src/win-arena-container/client/
```
Remember to swap the order of the `@staticmethod` and `@functools.lru_cache()` decorators in `src/win-arena-container/client/ufo/llm/openai.py`, so that `@staticmethod` is the outer (topmost) decorator. This works around a Python 3.9 limitation: `staticmethod` objects only became callable in Python 3.10, so placing `@functools.lru_cache()` above `@staticmethod` fails under 3.9. Unfortunately WAA uses Python 3.9, whereas UFO targets Python 3.10.
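A minimal sketch of the swap (the class and method names below are hypothetical; apply it to whichever method in `openai.py` stacks these two decorators):

```python
import functools


class OpenAIService:
    # Before (fine on Python 3.10+, breaks on 3.9): functools.lru_cache receives
    # a staticmethod object, which is not callable until Python 3.10.
    #
    # @functools.lru_cache()
    # @staticmethod
    # def cached_helper(arg): ...

    # After (works on Python 3.9): lru_cache wraps the plain function first,
    # and staticmethod then wraps the cached callable.
    @staticmethod
    @functools.lru_cache()
    def cached_helper(arg):
        return arg  # placeholder body
```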
- Visit Microsoft Evaluation Center, accept the Terms of Service, and download a Windows 11 Enterprise Evaluation (90-day trial, English, United States) ISO file [~6GB]
- After downloading, rename the file to `setup.iso` and copy it to the directory `WindowsAgentArena/src/win-arena-container/vm/image` (see the sketch below)
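A minimal sketch of that step, assuming the ISO landed in `~/Downloads` under a placeholder name and that you run it from the parent directory of the cloned repository:

```bash
# Rename the downloaded evaluation ISO and place it where the VM build expects it.
mv ~/Downloads/<downloaded-iso-name>.iso WindowsAgentArena/src/win-arena-container/vm/image/setup.iso
```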
Before running the arena, you need to prepare a new WAA snapshot (also referred to as the WAA golden image). This 30GB snapshot represents a fully functional Windows 11 VM with all the programs needed to run the benchmark. This VM additionally hosts a Python server which receives and executes agent commands. To learn more about the components at play, see our local and cloud components diagrams.
To prepare the golden snapshot, run once:
```bash
cd ./scripts
./run-local.sh --mode dev --prepare-image true
```
Please do not interfere with the VM while it is being prepared. It will automatically shut down when the provisioning process is complete.
You will find the 30GB WAA golden image in `WindowsAgentArena/src/win-arena-container/vm/storage`.
Start the initial run with this command:
```bash
./run-local.sh --mode dev --json-name "evaluation_examples_windows/test_custom.json" --agent UFO --agent-settings '{"llm_type": "azure", "llm_endpoint": "https://cloudgpt-openai.azure-api.net/openai/deployments/gpt-4o-20240513/chat/completions?api-version=2024-04-01-preview", "llm_auth": {"type": "api-key", "token": ""}}'
```
After booting up, wait until the device code prompt shows up, but do not enter the device code. This blocks the WAA server for as long as the code is not entered. Instead, visit `localhost:8006`, take control of the WAA Windows VM, and do the following:
- Disable Windows Firewall.
- Open Google Chrome and complete the initial setup.
- Open VLC and complete the initial setup.
After completing these steps, kill the WAA client, then copy the "golden" image under the `storage` folder somewhere else as a backup (see the sketch below).
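A minimal sketch, run from the repository root, with a hypothetical backup location:

```bash
# Back up the freshly prepared golden image so later runs can start from a clean state.
mkdir -p ~/waa-golden-backup   # hypothetical backup location
cp -r src/win-arena-container/vm/storage ~/waa-golden-backup/
```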
Before each experiment run, do the following (see the sketch after this list):
- Replace the `storage` image with the previously backed-up golden image
- Delete the UFO logs
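A sketch under the same assumptions as above (the backup path is the hypothetical one from the previous step, and the UFO log location is an assumption; adjust it to wherever your UFO run writes its logs):

```bash
# Restore the clean golden image before the run.
rm -rf src/win-arena-container/vm/storage
cp -r ~/waa-golden-backup/storage src/win-arena-container/vm/storage

# Clear UFO logs left over from the previous run (assumed location).
rm -rf src/win-arena-container/client/ufo/logs
```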
Then run this command:
```bash
./run-local.sh --mode dev --json-name "evaluation_examples_windows/test_full.json" --agent UFO --agent-settings '{"llm_type": "azure", "llm_endpoint": "https://cloudgpt-openai.azure-api.net/openai/deployments/gpt-4o-20240513/chat/completions?api-version=2024-04-01-preview", "llm_auth": {"type": "api-key", "token": ""}}'
```
You will probably use a different LLM type or endpoint, so make sure to change the `--agent-settings` parameter accordingly.
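For example, a sketch that keeps the same JSON shape but points at your own Azure OpenAI deployment (the endpoint, deployment, and token values are placeholders; which other `llm_type` values are accepted depends on how the UFO/WAA agent settings are parsed):

```bash
./run-local.sh --mode dev \
    --json-name "evaluation_examples_windows/test_full.json" \
    --agent UFO \
    --agent-settings '{"llm_type": "azure", "llm_endpoint": "https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment>/chat/completions?api-version=2024-04-01-preview", "llm_auth": {"type": "api-key", "token": "<your-api-key>"}}'
```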
Note: `test_full.json` contains all the test cases where UIA works; `test_all.json` contains all the test cases, even if UIA doesn't work. So please use `test_full.json` if OmniParser is not used.