Benchmarks @ EvalPlus
The EvalPlus team aims to build high-quality, precise evaluators for understanding LLM performance on code-related tasks:
🔨 HumanEval+ & MBPP+
HumanEval and MBPP originally shipped with limited tests. EvalPlus built HumanEval+ and MBPP+ by extending their test suites by 80× and 35×, respectively, for more rigorous evaluation.
Go to EvalPlus Leaderboard
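For reference, a minimal sketch of generating samples against HumanEval+ with the evalplus Python package, based on its documented data helpers; `generate_one_completion` is a hypothetical stand-in for whatever model you are benchmarking:

```python
# Minimal sketch, assuming the evalplus PyPI package's documented
# get_human_eval_plus / write_jsonl helpers.
from evalplus.data import get_human_eval_plus, write_jsonl

def generate_one_completion(prompt: str) -> str:
    """Hypothetical: call your LLM here and return the code solution."""
    raise NotImplementedError

samples = [
    dict(task_id=task_id, solution=generate_one_completion(problem["prompt"]))
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)

# Scoring against the extended test suite is then done with the
# evalplus evaluator, e.g.:
#   evalplus.evaluate --dataset humaneval --samples samples.jsonl
```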
🚀 EvalPerf: Code Efficiency Evaluation
Based on Differential Performance Evaluation, proposed in our COLM'24 paper, we rigorously evaluate the efficiency of LLM-generated code using performance-exercising coding tasks and test inputs.
Go to EvalPerf Leaderboard
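To illustrate the underlying idea (this is not EvalPerf's implementation), a simplified sketch of differential performance evaluation: take two functionally correct solutions to the same task and compare their cost on a performance-exercising input. Wall-clock time is used here as a crude proxy; the real evaluator relies on more robust profiling and test-input generation:

```python
# Simplified illustration only: two correct duplicate-check solutions
# whose efficiency gap is exposed by a performance-exercising input.
import time

def solution_quadratic(nums: list[int]) -> bool:
    # O(n^2) pairwise comparison: correct but slow.
    return any(nums[i] == nums[j]
               for i in range(len(nums)) for j in range(i + 1, len(nums)))

def solution_linear(nums: list[int]) -> bool:
    # O(n) check using a set.
    return len(set(nums)) != len(nums)

def measure(fn, arg) -> float:
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start

# A performance-exercising input: large and duplicate-free, so the
# quadratic solution cannot exit early.
stress_input = list(range(5_000))

for fn in (solution_quadratic, solution_linear):
    print(f"{fn.__name__}: {measure(fn, stress_input):.4f}s")
```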
📦 RepoQA: Long-Context Code Understanding
Repository understanding is crucial for intelligent code agents. With RepoQA, we are designing evaluators for long-context code understanding.
Learn about RepoQA
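As an illustration of the kind of probe RepoQA targets (a sketch in its spirit, not the actual RepoQA harness), one can bury a "needle" function inside a long concatenation of repository code and ask the model to retrieve the function matching a natural-language description. Both the needle and `build_probe` below are hypothetical:

```python
# Illustrative long-context "needle function" probe; NOT RepoQA's harness.
from pathlib import Path

NEEDLE = '''
def rolling_checksum(data: bytes, window: int) -> list[int]:
    """Return the additive checksum of each sliding window over data."""
    return [sum(data[i:i + window]) for i in range(len(data) - window + 1)]
'''

def build_probe(repo_dir: str, description: str) -> str:
    # Concatenate repo files into one long context and hide the needle
    # roughly in the middle.
    files = sorted(Path(repo_dir).rglob("*.py"))
    chunks = [p.read_text(encoding="utf-8", errors="ignore") for p in files]
    chunks.insert(len(chunks) // 2, NEEDLE)
    context = "\n\n".join(chunks)
    return (f"{context}\n\n"
            f"Find and reproduce, verbatim, the function matching this "
            f"description: {description}")

prompt = build_probe("path/to/repo", "computes a sliding-window checksum")
# Feed `prompt` to the model under test; score by comparing the returned
# code against NEEDLE (RepoQA itself uses a more careful similarity check).
```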