TestEval Leaderboard

TestEval: Benchmarking Large Language Models for Test Case Generation

Wenhan Wang¹, Chenyuan Yang², Zhijie Wang¹*, Yuheng Huang³, Zhaoyang Chu⁴,
Da Song¹, Lingming Zhang², An Ran Chen¹, Lei Ma³,¹

¹University of Alberta, ²University of Illinois Urbana-Champaign, ³The University of Tokyo,
⁴Huazhong University of Science and Technology

🏆 TestEval Leaderboard

Leaderboard views: Overall (Line, Branch) and Target (Line, Branch, Path).

🙋 How to interpret the results?

Overall coverage is the line/branch coverage achieved by all N generated test cases together.
Coverage@k is the line/branch coverage achieved when only k of the N test cases are used.
Target line/branch/path coverage is the accuracy with which a model, when instructed to do so, generates a test case that covers a specified line, branch, or path.
Baseline is the accuracy of covering that same line, branch, or path when the model is given no such instruction.
"Size" is the number of model weights activated during inference.
💚 means open weights and open data. 💙 means open weights and open SFT data, but the base model is not data-open. What does this imply? 💚 and 💙 models open-source their data, so one can concretely reason about contamination.
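
As a rough illustration of the metrics above, the sketch below computes overall coverage, coverage@k, and a target-coverage accuracy from per-test line sets. It is not the benchmark's own evaluation code: the function names, the toy data, and the choice to take the first k test cases for coverage@k are all assumptions made for this example (the page only says "k out of N").

```python
from typing import Iterable, List, Set


def coverage(tests: Iterable[Set[int]], coverable: Set[int]) -> float:
    """Fraction of coverable lines hit by at least one test case."""
    if not coverable:
        return 0.0
    hit: Set[int] = set()
    for lines in tests:
        hit |= lines & coverable
    return len(hit) / len(coverable)


def coverage_at_k(tests: List[Set[int]], k: int, coverable: Set[int]) -> float:
    """Coverage using only k test cases.

    Assumption: the first k tests are used; the real selection
    policy is not stated on this page.
    """
    return coverage(tests[:k], coverable)


def target_accuracy(outcomes: List[bool]) -> float:
    """Share of targets (lines, branches, or paths) that the generated test covered."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Hypothetical program with 8 coverable lines and N = 4 generated tests.
# Each set lists the line numbers one test executes.
coverable_lines = {1, 2, 3, 4, 5, 6, 7, 8}
tests = [{1, 2, 3}, {1, 2, 4, 5}, {1, 6}, {1, 2, 3}]

overall = coverage(tests, coverable_lines)          # lines 1-6 hit: 6/8
cov_at_1 = coverage_at_k(tests, 1, coverable_lines)  # lines 1-3 hit: 3/8
cov_at_2 = coverage_at_k(tests, 2, coverable_lines)  # lines 1-5 hit: 5/8

# Hypothetical outcomes for four targeted lines: True if the test covered its target.
targeted = target_accuracy([True, False, True, True])
```

Branch coverage follows the same pattern with branch identifiers in place of line numbers. The baseline figure would reuse `target_accuracy` on outcomes from tests generated without the targeting instruction.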

📖 BibTeX

@inproceedings{wang2025testeval,
 title={TESTEVAL: Benchmarking Large Language Models for Test Case Generation},
 author={Wenhan Wang and Chenyuan Yang and Zhijie Wang and Yuheng Huang and Zhaoyang Chu
 and Da Song and Lingming Zhang and An Ran Chen and Lei Ma},
 booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
 year={2025}
}
