A Faithful and Scalable Testbed for Mobile UI Task Automation

Dataset • AgentEnv • Evaluator • Paper • Citation

LlamaTouch was accepted at ACM UIST'24. Our 30-second teaser video:

teaser.mp4

LlamaTouch is a testbed for evaluating mobile UI automation agents in real-world mobile environments. Rather than directly comparing two concrete action sequences, it checks an agent's execution trace against a sequence of essential states annotated on the UI interaction traces in the dataset. LlamaTouch achieves high evaluation accuracy while maintaining scalability.

Key features:

- Task execution in real-world mobile environments.
- Faithful and scalable task evaluation powered by LlamaTouch Evaluator and annotated essential states.
- Easy task set annotation and expansion with a rich set of UI state annotation primitives and helper systems.

The workflow and programming demonstrations of LlamaTouch:

Dataset

Note

llamatouch_task_metadata.tsv contains the metadata of the dataset.

See docs to explore and use the dataset.

LlamaTouch comprises 495 mobile UI automation tasks, with 102 tasks sampled from AITW and 393 self-constructed tasks from 46 popular Android applications.
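As a starting point for exploring llamatouch_task_metadata.tsv, the following minimal sketch parses tab-separated rows into dictionaries keyed by the header line. The function name `read_task_metadata` and the column names in the sample are assumptions made for illustration; the real column names are defined by the file itself and described in the docs.

```python
import csv
from typing import Dict, Iterable, List


def read_task_metadata(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-separated metadata rows into one dict per task.

    The keys come from the header row, so this works regardless of
    which columns the file actually contains.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical sample; the real column names may differ.
sample = "task_id\tdescription\nt1\tOpen the clock app\n"
rows = read_task_metadata(sample.splitlines())
print(rows[0]["description"])
```

To read the real file, pass an open file handle instead of the sample lines, for example `open("llamatouch_task_metadata.tsv", newline="", encoding="utf-8")`.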

Each task contains:

- A task description, e.g., "Reserve a rental car in Los Angeles from June 1st-7th, with a budget of up to $60 per day on Expedia."
- A sequence of UI representations and actions to complete the task, labeled by human annotators:
  - UI representations: Pixel-level screenshots, textual view hierarchies, Android activity names of each UI, etc.
  - Actions: Actions performed on each UI to navigate to the next UI.
- Annotated essential states, e.g., a textbox in the UI with the text field displaying "Your cart is empty" as shown below.

A visualized dataset sample is shown in the following figure, with actions marked by blue plus markers and essential states highlighted with red bounding boxes and numeric IDs.
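The three parts of a task described above can be pictured as a small data model. This is only a sketch for orientation: the class and field names (`UIStep`, `EssentialState`, `Task`, and their attributes) are hypothetical and do not reflect the on-disk format of the dataset, which is documented in the docs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UIStep:
    """One annotated step: a UI representation plus the action taken on it."""
    screenshot_path: str   # pixel-level screenshot (hypothetical field)
    view_hierarchy: str    # textual view hierarchy (hypothetical field)
    activity_name: str     # Android activity name (hypothetical field)
    action: str            # action that navigates to the next UI


@dataclass
class EssentialState:
    """An annotated state that must be reached for the task to count as done."""
    step_index: int        # which step's UI carries the annotation
    expected_text: str     # e.g. text a textbox must display


@dataclass
class Task:
    description: str
    steps: List[UIStep] = field(default_factory=list)
    essential_states: List[EssentialState] = field(default_factory=list)


# Illustrative instance; the task and values are made up for this sketch.
example = Task(
    description="Empty the shopping cart",
    steps=[UIStep("0.png", "<hierarchy/>", "CartActivity", "tap")],
    essential_states=[EssentialState(0, "Your cart is empty")],
)
```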

AgentEnv

Note

Check out the doc to use AgentEnv.

AgentEnv bridges a mobile agent and real-world mobile environments (e.g., a real smartphone or an Android emulator) for on-device task execution.

AgentEnv provides basic APIs for completing mobile UI automation tasks, including:

- Retrieving UI representations from mobile environments.
- Forwarding agent decisions (predicted actions) to mobile environments.

All device states are recorded during task execution and will be used in LlamaTouch Evaluator.
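The interaction pattern described above (retrieve a UI representation, let the agent decide, forward the action, record the state) can be sketched as a simple loop. Everything named here is an assumption for illustration: `Env`, `observe`, `act`, `FakeEnv`, `run_episode`, and the `"STOP"` action are hypothetical and are not the AgentEnv API; see the AgentEnv doc for the real interface.

```python
from typing import Callable, List, Protocol, Tuple


class Env(Protocol):
    """Hypothetical minimal environment interface (not the real AgentEnv API)."""
    def observe(self) -> str: ...
    def act(self, action: str) -> None: ...


class FakeEnv:
    """In-memory stand-in for a device: each action moves to the next screen."""
    def __init__(self, screens: List[str]) -> None:
        self.screens = list(screens)
        self.index = 0
        self.received: List[str] = []

    def observe(self) -> str:
        return self.screens[self.index]

    def act(self, action: str) -> None:
        self.received.append(action)
        self.index = min(self.index + 1, len(self.screens) - 1)


def run_episode(env: Env, agent: Callable[[str], str],
                max_steps: int = 10) -> List[Tuple[str, str]]:
    """Run the observe/decide/act loop and record every (state, action) pair."""
    trace: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        state = env.observe()      # retrieve the UI representation
        action = agent(state)      # agent decision (predicted action)
        trace.append((state, action))
        if action == "STOP":       # hypothetical end-of-task signal
            break
        env.act(action)            # forward the action to the environment
    return trace


env = FakeEnv(["home", "settings", "wifi"])
trace = run_episode(env, lambda s: "STOP" if s == "wifi" else "tap")
```

The recorded `trace` plays the role of the agent execution trace that is later handed to the evaluator.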

LlamaTouch Evaluator

Note

Check out the doc to use LlamaTouch Evaluator.

LlamaTouch Evaluator takes essential states from the LlamaTouch dataset and agent execution traces as input. For each task, it iterates through the agent execution trace to detect whether all annotated essential states are traversed to complete the task.
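The idea of walking a trace and checking that every essential state is reached can be illustrated with a small sketch. It makes two simplifying assumptions that are not confirmed by this README: essential states must be matched in their annotated order, and a state matches when every annotated key/value pair appears in the observed UI state. The names `matches` and `task_completed` are hypothetical, and the real evaluator's matching primitives are richer than exact equality.

```python
from typing import Dict, List

UIState = Dict[str, str]


def matches(ui_state: UIState, essential: UIState) -> bool:
    """True if every annotated key/value pair is present in the observed state."""
    return all(ui_state.get(key) == value for key, value in essential.items())


def task_completed(trace: List[UIState], essential_states: List[UIState]) -> bool:
    """Walk the trace once, consuming essential states in annotated order."""
    next_needed = 0
    for ui_state in trace:
        # A single screen may satisfy several consecutive essential states.
        while (next_needed < len(essential_states)
               and matches(ui_state, essential_states[next_needed])):
            next_needed += 1
    return next_needed == len(essential_states)


trace = [
    {"activity": "Home"},
    {"activity": "Cart", "text": "Your cart is empty"},
]
essential = [{"activity": "Home"}, {"text": "Your cart is empty"}]
print(task_completed(trace, essential))
```

Because only the essential states are checked, two traces that take different action paths but pass through the same essential states receive the same verdict.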

With human validation results, it can also report the accuracy of the evaluation approach.
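One plausible reading of this accuracy figure is the fraction of tasks on which the evaluator's verdict agrees with the human verdict. The sketch below computes that agreement rate; the function name `evaluation_accuracy` and this definition of accuracy are assumptions, not taken from the evaluator's code.

```python
from typing import Sequence


def evaluation_accuracy(evaluator_verdicts: Sequence[bool],
                        human_verdicts: Sequence[bool]) -> float:
    """Fraction of tasks where the evaluator agrees with human validation."""
    if len(evaluator_verdicts) != len(human_verdicts):
        raise ValueError("verdict lists must have the same length")
    if not evaluator_verdicts:
        raise ValueError("at least one task is required")
    agreements = sum(e == h for e, h in zip(evaluator_verdicts, human_verdicts))
    return agreements / len(evaluator_verdicts)


# Made-up verdicts for four tasks: the evaluator disagrees on the third one.
print(evaluation_accuracy([True, False, True, True],
                          [True, False, False, True]))
```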

Citation

@misc{zhang2024llamatouch,
      title={LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation}, 
      author={Li Zhang and Shihe Wang and Xianqing Jia and Zhihan Zheng and Yunhe Yan and Longxi Gao and Yuanchun Li and Mengwei Xu},
      year={2024},
      eprint={2404.16054},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2404.16054}, 
}
