EvalPlus Leaderboard
EvalPlus evaluates AI Coders with rigorous tests.
News: Beyond correctness, how's their code efficiency? Check out EvalPerf!
Notes
- Evaluated using HumanEval+ version 0.1.10; MBPP+ version 0.2.0.
- Models are ranked according to pass@1 using greedy decoding. Setup details can be found here.
- ✨ marks models evaluated using a chat setting, while others perform direct code completion.
- Both MBPP and MBPP+ as referred to in our leaderboard use a subset (399 tasks) of hand-verified problems from MBPP-sanitized (427 tasks), to make sure each programming task is well-formed (e.g., test_list is not wrong).
- Model providers are responsible for avoiding data contamination. Models trained on closed data can be affected by contamination.
- 💚 means open weights and open data. 💙 means open weights and open SFT data, but the base model is not data-open. What does this imply? 💚💙 models open-source the data such that one can concretely reason about contamination.
- "Size" here is the number of model parameters activated during inference.
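
To make the ranking metric in the notes above concrete, here is a minimal sketch of pass@1 under greedy decoding. It assumes one deterministic sample per task, so pass@1 reduces to the fraction of tasks whose single sample passes every test. The function name `pass_at_1_greedy`, the task IDs, and the pass/fail outcomes are all hypothetical illustrations; this is not the EvalPlus implementation and not real leaderboard data.

```python
from typing import Mapping


def pass_at_1_greedy(results: Mapping[str, bool]) -> float:
    """Fraction of tasks whose single greedy sample passed all tests.

    `results` maps a task ID to True if the one generated solution
    passed every test for that task, and False otherwise.
    """
    if not results:
        raise ValueError("no task results given")
    return sum(results.values()) / len(results)


# Hypothetical outcomes for four tasks (illustrative only).
# "base" stands for the original tests, "plus" for the extended tests.
base_results = {"task/0": True, "task/1": True, "task/2": False, "task/3": True}
plus_results = {"task/0": True, "task/1": False, "task/2": False, "task/3": True}

base_score = pass_at_1_greedy(base_results)
plus_score = pass_at_1_greedy(plus_results)

print(f"pass@1 (base tests): {base_score:.2f}")
print(f"pass@1 (plus tests): {plus_score:.2f}")
```

In this made-up data, `task/1` passes the base tests but fails the extended ones, so the score on the stricter suite is lower. That gap is the kind of difference more rigorous tests are meant to surface.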