CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

Leaderboard | 📄 Paper | 🤗 Access from HuggingFace datasets

CodeScope is an execution-based, multilingual, multi-task, multi-dimensional benchmark for comprehensively evaluating LLM capabilities on coding tasks. It covers 43 programming languages and 8 coding tasks, and it evaluates the coding performance of LLMs along three dimensions: difficulty, efficiency, and length.

🌈 Update

[2024.05.15] CodeScope was accepted to the ACL 2024 Main Conference. We thank the academic community for its recognition.
[2023.11.15] 🎉🎉🎉 CodeScope is published! 🎉🎉🎉

Datasets

The datasets are available from 🤗 Hugging Face, Google Drive, or GitHub Data.

Code

CodeScope evaluates the ability of LLMs in code understanding and code generation across eight coding tasks.
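As an illustration of what "execution-based" evaluation means, the sketch below runs a candidate Python program against input/output test cases and reports the fraction that pass. It is not CodeScope's ExecEval harness: the function names `run_case` and `pass_rate`, the test-case format, and the whitespace-insensitive comparison are assumptions made for this example only.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def run_case(source: str, stdin_text: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run `source` as a Python script with `stdin_text` on stdin.

    Returns the captured stdout, or None if the script times out
    or exits with a non-zero status. (Hypothetical helper.)
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return None
    if proc.returncode != 0:
        return None
    return proc.stdout


def pass_rate(source: str, cases: List[Tuple[str, str]]) -> float:
    """Fraction of (stdin, expected_stdout) cases the candidate passes.

    Outputs are compared after stripping surrounding whitespace,
    which is an assumption of this sketch.
    """
    if not cases:
        return 0.0
    passed = 0
    for stdin_text, expected in cases:
        actual = run_case(source, stdin_text)
        if actual is not None and actual.strip() == expected.strip():
            passed += 1
    return passed / len(cases)


if __name__ == "__main__":
    # A candidate solution for "read two integers and print their sum".
    candidate = "a, b = map(int, input().split())\nprint(a + b)\n"
    cases = [("1 2\n", "3"), ("10 -4\n", "6")]
    print(pass_rate(candidate, cases))
```

A real harness would also sandbox the process and limit memory, and measuring the efficiency dimension would require recording run time and memory use per case, which this sketch does not do.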

Code Understanding

Code summarization, code smell, code review, and automated testing.

Code Generation

Program synthesis, code translation, code repair, and code optimization.

Citation

Please cite the paper if you use the data or code from CodeScope.

@misc{yan2023codescope,
      title={CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation},
      author={Weixiang Yan and Haitian Liu and Yunkun Wang and Yunzhe Li and Qian Chen and Wen Wang and Tingyu Lin and Weishan Zhao and Li Zhu and Shuiguang Deng and Hari Sundaram},
      year={2023},
      eprint={2311.08588},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

For questions, please feel free to reach out via email at weixiangyan@ucsb.edu.
