Computer-Use Agents as Judges for Generative User Interface

Kevin Qinghong Lin¹†, Siyuan Hu²†, Linjie Li³, Zhengyuan Yang³, Lijuan Wang³,
Philip Torr¹, Mike Zheng Shou²

¹University of Oxford ²Show Lab, National University of Singapore ³Microsoft

†Equal contribution


📖 TL;DR

What does an agent-friendly UI look like? Check out the demo below:

The left UI is designed for 🧑🏻‍💻humans—prioritizing aesthetics.

The right UI is redesigned for 🤖agents—focused on clarity and functionality.

Can Computer-Use Agents (CUAs) offer feedback that helps Coders generate UIs?

AUI Framework — **👨‍💻Humans Collaboration *vs.* 🤖️Coder-CUA Collaboration.**
**Left:** Most GUIs are designed by humans and optimized for user experience (e.g. aesthetics), forcing trained agents to adapt to human-oriented behaviors. **Right:** Our Coder-CUA Collaboration framework leverages Coder as Designer and CUA as Judge together, enabling more reliable task execution and improved usability for agents.
**This is AUI, Agent-friendly UI.**
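The Coder-as-Designer, CUA-as-Judge pairing described above can be pictured as a feedback loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the iterative structure, the function name `refine_ui`, the `coder` and `judge` callables, and the toy stand-ins are all hypothetical and exist only to illustrate the idea.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not the AUI codebase):
#   coder(spec, feedback) -> UI source text
#   judge(ui, tasks)      -> (task success rate in [0, 1], textual feedback)
Coder = Callable[[str, Optional[str]], str]
Judge = Callable[[str, List[str]], Tuple[float, str]]


def refine_ui(spec: str, tasks: List[str], coder: Coder, judge: Judge,
              rounds: int = 3) -> Tuple[str, float]:
    """Alternate between designing and judging; keep the best-scoring UI."""
    best_ui, best_score = "", -1.0
    feedback: Optional[str] = None
    for _ in range(rounds):
        ui = coder(spec, feedback)          # Coder acts as designer
        score, feedback = judge(ui, tasks)  # CUA acts as judge
        if score > best_score:
            best_ui, best_score = ui, score
    return best_ui, best_score


# Toy stand-ins so the sketch runs without any model or browser.
def toy_coder(spec: str, feedback: Optional[str]) -> str:
    # Pretend to add whatever controls the judge reported as missing.
    return spec if feedback is None else f"{spec} {feedback}"


def toy_judge(ui: str, tasks: List[str]) -> Tuple[float, str]:
    # A task "succeeds" if its name appears in the UI text.
    missing = [t for t in tasks if t not in ui]
    return (len(tasks) - len(missing)) / len(tasks), " ".join(missing)


if __name__ == "__main__":
    ui, score = refine_ui("timer app: start", ["start", "stop", "reset"],
                          toy_coder, toy_judge, rounds=2)
    print(ui, score)
```

In a real setting the two callables would wrap a code-generating model and a computer-use agent; here they are string manipulations chosen only to make the control flow visible.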

🔧 How does it work?

🎬 Qualitative Examples

We introduce AUI-Gym, a benchmark of 52 applications with 1560 tasks.
Each screenshot shows the initial vs. agent-optimized UI for an individual coder.

📊 New UIs Improve Success Rate

We evaluate our framework using Function Completeness Rate (FC) and CUA Success Rate (SR). The results demonstrate that our Coder-CUA collaboration significantly improves both metrics compared to the baseline, especially for stronger models like GPT-5 and Gemini-3-Pro.

Coder	Method	Function Completeness, FC (%)	CUA Success Rate, SR (%)
GPT-5	Baseline	67.9	24.5
GPT-5	+ Ours	81.5	26.0
Qwen3-Coder-30B	Baseline	42.1	7.3
Qwen3-Coder-30B	+ Ours	60.1	19.0
GPT-4o	Baseline	36.3	8.8
GPT-4o	+ Ours	43.1	16.1
Gemini-3-Pro	Baseline	71.7	35.8
Gemini-3-Pro	+ Ours	72.5	47.0
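To make the per-coder improvements in the table explicit, the short script below computes the gain of "+ Ours" over the baseline in percentage points. The numbers are copied from the table; the dictionary layout and the function name `gains` are illustrative choices, not part of the project.

```python
# (FC %, SR %) per coder, copied from the results table above.
RESULTS = {
    "GPT-5": {"baseline": (67.9, 24.5), "ours": (81.5, 26.0)},
    "Qwen3-Coder-30B": {"baseline": (42.1, 7.3), "ours": (60.1, 19.0)},
    "GPT-4o": {"baseline": (36.3, 8.8), "ours": (43.1, 16.1)},
    "Gemini-3-Pro": {"baseline": (71.7, 35.8), "ours": (72.5, 47.0)},
}


def gains(results):
    """Return {coder: (FC gain, SR gain)} in percentage points."""
    out = {}
    for coder, rows in results.items():
        fc_base, sr_base = rows["baseline"]
        fc_ours, sr_ours = rows["ours"]
        # Round to one decimal to match the table's precision.
        out[coder] = (round(fc_ours - fc_base, 1), round(sr_ours - sr_base, 1))
    return out


if __name__ == "__main__":
    for coder, (d_fc, d_sr) in gains(RESULTS).items():
        print(f"{coder:<16} FC {d_fc:+.1f}  SR {d_sr:+.1f}")
```

Every coder gains on both metrics. The size of the gain varies by coder and by metric, so the two columns are best read separately.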

🎓 Citation

Please kindly cite our paper if you find this project helpful.

@misc{lin2025aui,
      title={Computer-Use Agents as Judges for Generative User Interface}, 
      author={Kevin Qinghong Lin and Siyuan Hu and Linjie Li and Zhengyuan Yang and Lijuan Wang and Philip Torr and Mike Zheng Shou},
      year={2025},
      eprint={2511.15567},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.15567}, 
}
