Benchmarks @ EvalPlus
The EvalPlus team aims to build high-quality, precise evaluators for understanding LLM performance on code-related tasks:
🔨 HumanEval+ & MBPP+
HumanEval and MBPP originally shipped with limited tests. EvalPlus built HumanEval+ and MBPP+ by extending their test suites by 80× and 35×, respectively, for rigorous evaluation.
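The idea behind extending a benchmark's tests can be sketched with a toy example (hypothetical; this is not the EvalPlus implementation): a candidate solution that passes a handful of hand-written tests may still disagree with the reference solution once many generated inputs are tried.

```python
# Hypothetical sketch of test extension via differential testing
# against a reference solution. The functions and counts below are
# illustrative only, not part of HumanEval+ or MBPP+.
import random

def reference_common_prefix(a: str, b: str) -> str:
    """Ground-truth solution: longest common prefix of two strings."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def candidate_common_prefix(a: str, b: str) -> str:
    """A buggy 'LLM-generated' solution."""
    prefix = ""
    for ca, cb in zip(a, b):
        if ca != cb:
            return prefix
        prefix += ca
    return a  # bug: should return prefix; wrong when b is a prefix of a

def count_mismatches(candidate, reference, n_tests: int) -> int:
    """Run candidate vs. reference on random inputs; count disagreements."""
    random.seed(0)
    failures = 0
    for _ in range(n_tests):
        a = "".join(random.choice("ab") for _ in range(random.randint(0, 6)))
        b = "".join(random.choice("ab") for _ in range(random.randint(0, 6)))
        if candidate(a, b) != reference(a, b):
            failures += 1
    return failures

# A tiny suite may miss the bug; a much larger generated suite exposes it.
few = count_mismatches(candidate_common_prefix, reference_common_prefix, 3)
many = count_mismatches(candidate_common_prefix, reference_common_prefix, 500)
```

A solution is counted as correct only if it agrees with the reference on every generated input, which is what makes the extended evaluation stricter than the original test suites.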
Go to EvalPlus Leaderboard

🚀 EvalPerf: Code Efficiency Evaluation
Based on Differential Performance Evaluation, proposed in our COLM'24 paper, we rigorously evaluate the efficiency of LLM-generated code using performance-exercising coding tasks and test inputs.
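The spirit of comparing efficiency rather than just correctness can be illustrated with a small sketch (hypothetical; this is not the EvalPerf methodology): two functionally correct solutions are run on a performance-exercising input, and the work each performs is measured with an operation counter rather than wall-clock time.

```python
# Hypothetical sketch: two correct solutions to "count pairs summing to
# a target", instrumented with operation counters so their efficiency
# can be compared on a large input. Illustrative only.

def pair_count_quadratic(nums, target):
    """Correct but O(n^2): try every pair. Returns (answer, op_count)."""
    ops = 0
    count = 0
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            ops += 1
            if nums[i] + nums[j] == target:
                count += 1
    return count, ops

def pair_count_linear(nums, target):
    """Correct and O(n): count complements seen so far with a hash map."""
    ops = 0
    seen = {}
    count = 0
    for x in nums:
        ops += 1
        count += seen.get(target - x, 0)
        seen[x] = seen.get(x, 0) + 1
    return count, ops

# Performance-exercising input: large enough to separate the two.
nums = list(range(2000))
fast_ans, fast_ops = pair_count_linear(nums, 1999)
slow_ans, slow_ops = pair_count_quadratic(nums, 1999)
assert fast_ans == slow_ans  # both correct, but vastly different in work done
```

On this input both solutions agree on the answer, yet the quadratic version performs roughly a thousand times more basic operations, which is the kind of gap a performance-aware evaluation is designed to surface.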
EvalPerf Leaderboard

📦 RepoQA: Long-Context Code Understanding
Repository understanding is crucial for intelligent code agents. With RepoQA, we are designing evaluators for long-context code understanding.
Learn about RepoQA