We introduce DependEval, a hierarchical benchmark for evaluating LLMs on repository-level code understanding.
DependEval comprises 2,683 curated repositories spanning 8 programming languages and evaluates models on three hierarchical tasks: Dependency Recognition, Repository Construction, and Multi-file Editing.
Our findings highlight key challenges in applying LLMs to large-scale development and lay the groundwork for future improvements in repository-level understanding.
## How to Run
```bash
# Implement your model in the `inference_func` inside run.py
# Then run the following commands for automatic inference and evaluation
conda create -n dependeval python=3.10 -y
conda activate dependeval
pip install -r requirements.txt
bash run.sh
```
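The actual signature of `inference_func` is defined in run.py; the sketch below is only an illustration of how a Hugging Face model could be plugged in, assuming the function takes a prompt string and returns the generated text (the model name and argument names are placeholders, not the repository's fixed interface).

```python
# Hypothetical sketch of an inference_func for run.py (adapt to the real signature).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "deepseek-ai/deepseek-coder-6.7b-instruct"  # placeholder: any causal LM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def inference_func(prompt: str) -> str:
    """Generate a completion for a single benchmark prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    # Strip the echoed prompt so only the newly generated text is returned.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```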
## Citation
If you find DependEval useful, feel free to cite us:
```bibtex
@misc{du2025dependevalbenchmarkingllmsrepository,
      title={DependEval: Benchmarking LLMs for Repository Dependency Understanding},
      author={Junjia Du and Yadi Liu and Hongcheng Guo and Jiawei Wang and Haojian Huang and Yunyi Ni and Zhoujun Li},
      year={2025},
      eprint={2503.06689},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2503.06689},
}
```