Experiments
Main Results
We evaluate FRIDAY on GAIA, a benchmark for general AI assistants featuring 466 challenging question-answering tasks. To answer the questions in GAIA, language agents need skills such as calculation, web browsing, multimodal processing, and file manipulation.
Self-directed Learning
We perform quantitative and qualitative evaluations to analyze FRIDAY’s self-directed learning capability.
QUANTITATIVE ANALYSIS
To showcase FRIDAY’s ability to master unfamiliar applications through self-learning, we conduct experiments on the SheetCopilot-20 dataset. This dataset includes 20 spreadsheet control tasks covering operations such as Formatting, Management, Charts, Pivot Tables, and Formulas, which represent typical use cases of spreadsheets.
QUALITATIVE ANALYSIS
(a) FRIDAY w/o self-directed learning.
(b) FRIDAY after learning text box control.
(c) FRIDAY after mastering image insertion.
In our qualitative analysis, we design a task to create a PowerPoint slide that introduces OS-Copilot. The task instruction specifies the content, font, font size, and other details required for the slide.
The experimental results, as shown in Figure (a), demonstrate that without self-directed learning, FRIDAY struggles to effectively control font types, sizes, and the positioning and sizing of inserted images.
After a period of self-directed learning, however, FRIDAY acquires various text box configuration tools, such as changing the text color, adjusting the font size of slide text, and modifying the line spacing of body text in PowerPoint presentations, as illustrated in Figure (b).
Further exploration leads FRIDAY to learn how to adjust the size and position of inserted images, allowing it to complete the task successfully, as depicted in Figure (c).
Community
Join our community to connect with other enthusiasts, share your tools and demos, and collaborate on innovative projects. Stay engaged and get the latest updates by following us:
- Discord: Join our Discord server for real-time discussions and support, and to share your work with the community.
- Twitter: Follow us on Twitter @oscopilot for the latest news, updates, and highlights from our community.
BibTeX
@misc{wu2024oscopilot,
title={OS-Copilot: Towards Generalist Computer Agents with Self-Improvement},
author={Zhiyong Wu and Chengcheng Han and Zichen Ding and Zhenmin Weng and Zhoumianze Liu and Shunyu Yao and Tao Yu and Lingpeng Kong},
year={2024},
eprint={2402.07456},
archivePrefix={arXiv},
primaryClass={cs.AI}
}