Benchmarks @ EvalPlus
The EvalPlus team aims to build high-quality, precise evaluators for understanding LLM performance on code-related tasks:
🔨 HumanEval+ & MBPP+
HumanEval and MBPP originally shipped with limited tests. EvalPlus built HumanEval+ and MBPP+ by extending their test suites by 80× and 35×, respectively, for rigorous evaluation.
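The idea behind extending a benchmark's tests can be sketched with a toy example (hypothetical; this is not the EvalPlus implementation): a candidate solution that passes a handful of hand-written tests may still disagree with the reference solution once many generated inputs are tried.

```python
# Hypothetical sketch of test extension via differential testing
# against a reference solution. The functions and counts below are
# illustrative only, not part of HumanEval+ or MBPP+.
import random

def reference_common_prefix(a: str, b: str) -> str:
    """Ground-truth solution: longest common prefix of two strings."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def candidate_common_prefix(a: str, b: str) -> str:
    """A buggy 'LLM-generated' solution."""
    prefix = ""
    for ca, cb in zip(a, b):
        if ca != cb:
            return prefix
        prefix += ca
    return a  # bug: should return prefix; wrong when b is a prefix of a

def count_mismatches(candidate, reference, n_tests: int) -> int:
    """Run candidate vs. reference on random inputs; count disagreements."""
    random.seed(0)
    failures = 0
    for _ in range(n_tests):
        a = "".join(random.choice("ab") for _ in range(random.randint(0, 6)))
        b = "".join(random.choice("ab") for _ in range(random.randint(0, 6)))
        if candidate(a, b) != reference(a, b):
            failures += 1
    return failures

# A tiny suite may miss the bug; a much larger generated suite exposes it.
few = count_mismatches(candidate_common_prefix, reference_common_prefix, 3)
many = count_mismatches(candidate_common_prefix, reference_common_prefix, 500)
```

A solution is counted as correct only if it agrees with the reference on every generated input, which is what makes the extended evaluation stricter than the original test suites.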
Go to EvalPlus Leaderboard

🚀 EvalPerf: Code Efficiency Evaluation
Based on Differential Performance Evaluation, proposed in our COLM'24 paper, we rigorously evaluate the efficiency of LLM-generated code using performance-exercising coding tasks and test inputs.
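The spirit of comparing efficiency rather than just correctness can be illustrated with a small sketch (hypothetical; this is not the EvalPerf methodology): two functionally correct solutions are run on a performance-exercising input, and the work each performs is measured with an operation counter rather than wall-clock time.

```python
# Hypothetical sketch: two correct solutions to "count pairs summing to
# a target", instrumented with operation counters so their efficiency
# can be compared on a large input. Illustrative only.

def pair_count_quadratic(nums, target):
    """Correct but O(n^2): try every pair. Returns (answer, op_count)."""
    ops = 0
    count = 0
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            ops += 1
            if nums[i] + nums[j] == target:
                count += 1
    return count, ops

def pair_count_linear(nums, target):
    """Correct and O(n): count complements seen so far with a hash map."""
    ops = 0
    seen = {}
    count = 0
    for x in nums:
        ops += 1
        count += seen.get(target - x, 0)
        seen[x] = seen.get(x, 0) + 1
    return count, ops

# Performance-exercising input: large enough to separate the two.
nums = list(range(2000))
fast_ans, fast_ops = pair_count_linear(nums, 1999)
slow_ans, slow_ops = pair_count_quadratic(nums, 1999)
assert fast_ans == slow_ans  # both correct, but vastly different in work done
```

On this input both solutions agree on the answer, yet the quadratic version performs roughly a thousand times more basic operations, which is the kind of gap a performance-aware evaluation is designed to surface.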
EvalPerf Leaderboard

📦 RepoQA: Long-Context Code Understanding
Repository understanding is crucial for intelligent code agents. With RepoQA, we are designing evaluators for long-context code understanding.
Learn about RepoQA