Question Examples
Here are some example questions from GTA. All questions are tool-implicit and step-implicit, and contain multimodal context inputs. They are easy to understand, have clear goals, and are based on real-world scenarios: helpful for humans, yet complex for AI assistants to solve. A data example in JSON format is available on Hugging Face.
Dataset Construction
The dataset construction pipeline has two steps.
- Query construction. Initial exemplars and instruction documents are designed by experts and given to human annotators. Annotators brainstorm and design more samples based on the exemplars.
- Tool chain construction. Annotators manually call the deployed tools to check that each query in the query set is executable. They then annotate the ground-truth tool chain for each query.
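The executability check in the second step can be pictured with a small sketch. Everything below is an illustrative assumption rather than the benchmark's actual schema or tooling: the `ToolCall` and `Query` classes, the `is_executable` helper, the stand-in `Describe` and `Multiply` tools, and the sample query are all hypothetical. The real data format is the one published on Hugging Face.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """One step of an annotated tool chain (hypothetical schema)."""
    tool: str
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Query:
    """A query, its context files, and its annotated tool chain (hypothetical schema)."""
    text: str
    files: List[str]
    tool_chain: List[ToolCall]


def is_executable(query: Query, tools: Dict[str, Callable[..., Any]]) -> bool:
    """Return True if every step names an available tool and runs without raising."""
    for step in query.tool_chain:
        fn = tools.get(step.tool)
        if fn is None:
            return False  # the step refers to a tool that is not deployed
        try:
            fn(**step.args)
        except Exception:
            return False  # the call failed, so the query needs rework
    return True


# Stand-in tools for illustration only; not the benchmark's deployed tools.
TOOLS: Dict[str, Callable[..., Any]] = {
    "Describe": lambda image: f"a description of {image}",
    "Multiply": lambda a, b: a * b,
}

example = Query(
    text="How much do two of the items in this photo cost?",
    files=["photo.jpg"],
    tool_chain=[
        ToolCall("Describe", {"image": "photo.jpg"}),
        ToolCall("Multiply", {"a": 2, "b": 3}),
    ],
)

broken = Query(
    text="A query whose chain names a tool that is not deployed.",
    files=[],
    tool_chain=[ToolCall("Translate", {"text": "hello"})],
)
```

Under these assumptions, `is_executable(example, TOOLS)` passes and `is_executable(broken, TOOLS)` fails, which mirrors the idea that a query is only kept once its tool chain can actually be run.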
🏆 GTA Leaderboard
Notes
- Models labeled with 🔶 are API-based models, while others are open-source models.
- Refer to GitHub to evaluate models on GTA.
BibTeX
@misc{wang2024gtabenchmarkgeneraltool,
title={GTA: A Benchmark for General Tool Agents},
author={Jize Wang and Zerun Ma and Yining Li and Songyang Zhang and Cailian Chen and Kai Chen and Xinyi Le},
year={2024},
eprint={2407.08713},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.08713},
}