TestEval Leaderboard
TestEval: Benchmarking Large Language Models for Test Case Generation
Wenhan Wang1*, Chenyuan Yang2*, Zhijie Wang1*, Yuheng Huang3, Zhaoyang Chu4,
Da Song1, Lingming Zhang2, An Ran Chen1, Lei Ma3,1
1University of Alberta, 2University of Illinois Urbana-Champaign,
3The University of Tokyo,
4Huazhong University of Science and Technology
🏆 TestEval Leaderboard
🙋 How to interpret the results?
- Overall coverage is the line/branch coverage achieved by all N generated test cases together.
- Coverage@k is the line/branch coverage achieved when only k of the N generated test cases are used.
- Target line/branch/path coverage is the accuracy with which a model, when instructed to do so, generates a test case that covers a specified line/branch/path.
- Baseline is the accuracy of covering the same specified line/branch/path when the model is not given that instruction.
- "Size" is the number of model weights (parameters) activated during inference.
- 💚 means open weights and open data. 💙 means open weights and open SFT data, but the base model is not data-open. What does this imply? Models marked 💚 or 💙 release their data, so one can reason concretely about contamination.
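
To make the difference between overall coverage and coverage@k concrete, here is a minimal Python sketch. It assumes that each test case is summarized by the set of line numbers it executes, and that coverage@k is averaged over every k-sized subset of the N test cases; the benchmark's actual subset selection may differ. The names `coverage` and `coverage_at_k` are hypothetical and are not taken from the TestEval code.

```python
from itertools import combinations


def coverage(per_test_lines, total_lines):
    """Fraction of lines covered by the union of the given per-test line sets."""
    if total_lines == 0:
        return 0.0
    covered = set().union(*per_test_lines)
    return len(covered) / total_lines


def coverage_at_k(per_test_lines, total_lines, k):
    """Mean coverage over all k-sized subsets of the N test cases.

    Averaging over every subset is an assumption made for illustration;
    a benchmark could also take the first k tests or sample subsets randomly.
    """
    if not 1 <= k <= len(per_test_lines):
        raise ValueError("k must be between 1 and the number of test cases")
    subsets = list(combinations(per_test_lines, k))
    return sum(coverage(s, total_lines) for s in subsets) / len(subsets)


# Hypothetical program with 4 executable lines and N = 3 generated test cases.
tests = [{1, 2}, {2, 3}, {4}]
overall = coverage(tests, total_lines=4)          # all N test cases together
cov_at_1 = coverage_at_k(tests, total_lines=4, k=1)
cov_at_2 = coverage_at_k(tests, total_lines=4, k=2)
```

Under these assumptions, overall coverage is the special case k = N, and a large gap between coverage@1 and overall coverage indicates that the generated test cases exercise different parts of the program rather than repeating one another.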
📖 BibTeX
@inproceedings{wang2025testeval,
title={TESTEVAL: Benchmarking Large Language Models for Test Case Generation},
author={Wenhan Wang and Chenyuan Yang and Zhijie Wang and Yuheng Huang and Zhaoyang Chu
and Da Song and Lingming Zhang and An Ran Chen and Lei Ma},
booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
year={2025}
}