You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
CodeScope, an execution-based, multilingual, multi-task, multi-dimensional evaluation benchmark for comprehensively gauging LLM capabilities on coding tasks. CodeScope covers 43 programming languages and 8 coding tasks. It evaluates the coding performance of LLMs from three dimensions (perspectives): difficulty, efficiency, and length.
🌈 Update
[2024.05.15] CodeScope was accepted into the ACL 2024 Main Conference, thanking the academic community for its recognition.
Please cite the paper if you use the data or code from CodeScope.
@misc{yan2023codescope,
title={CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation},
author={Weixiang Yan and Haitian Liu and Yunkun Wang and Yunzhe Li and Qian Chen and Wen Wang and Tingyu Lin and Weishan Zhao and Li Zhu and Shuiguang Deng and Hari Sundaram},
year={2023},
eprint={2311.08588},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Contact
For questions, please feel free to reach out via email at weixiangyan@ucsb.edu.
About
[ACL 2024] CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation