Benchmarks @ EvalPlus
The EvalPlus team aims to build high-quality, precise evaluators for understanding LLM performance on code-related tasks:
🔨 HumanEval+ & MBPP+
HumanEval and MBPP originally shipped with limited tests. EvalPlus built HumanEval+ and MBPP+ by extending their test suites by 80× and 35×, respectively, for more rigorous evaluation.
Go to EvalPlus Leaderboard
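For reference, a minimal sketch of generating samples against HumanEval+ with the evalplus Python package, based on its documented data helpers; `generate_one_completion` is a hypothetical stand-in for whatever model you are benchmarking:

```python
# Minimal sketch, assuming the evalplus PyPI package's documented
# get_human_eval_plus / write_jsonl helpers.
from evalplus.data import get_human_eval_plus, write_jsonl

def generate_one_completion(prompt: str) -> str:
    """Hypothetical: call your LLM here and return the code solution."""
    raise NotImplementedError

samples = [
    dict(task_id=task_id, solution=generate_one_completion(problem["prompt"]))
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)

# Scoring against the extended test suite is then done with the
# evalplus evaluator, e.g.:
#   evalplus.evaluate --dataset humaneval --samples samples.jsonl
```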
🚀 EvalPerf: Code Efficiency Evaluation
Based on Differential Performance Evaluation, proposed in our COLM'24 paper, we rigorously evaluate the efficiency of LLM-generated code using performance-exercising coding tasks and test inputs.
Go to EvalPerf Leaderboard
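To illustrate the underlying idea (this is not EvalPerf's implementation), a simplified sketch of differential performance evaluation: take two functionally correct solutions to the same task and compare their cost on a performance-exercising input. Wall-clock time is used here as a crude proxy; the real evaluator relies on more robust profiling and test-input generation:

```python
# Simplified illustration only: two correct duplicate-check solutions
# whose efficiency gap is exposed by a performance-exercising input.
import time

def solution_quadratic(nums: list[int]) -> bool:
    # O(n^2) pairwise comparison: correct but slow.
    return any(nums[i] == nums[j]
               for i in range(len(nums)) for j in range(i + 1, len(nums)))

def solution_linear(nums: list[int]) -> bool:
    # O(n) check using a set.
    return len(set(nums)) != len(nums)

def measure(fn, arg) -> float:
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start

# A performance-exercising input: large and duplicate-free, so the
# quadratic solution cannot exit early.
stress_input = list(range(5_000))

for fn in (solution_quadratic, solution_linear):
    print(f"{fn.__name__}: {measure(fn, stress_input):.4f}s")
```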
📦 RepoQA: Long-Context Code Understanding
Repository understanding is crucial for intelligent code agents. With RepoQA, we are designing evaluators for long-context code understanding.
Learn about RepoQA
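As an illustration of the kind of probe RepoQA targets (a sketch in its spirit, not the actual RepoQA harness), one can bury a "needle" function inside a long concatenation of repository code and ask the model to retrieve the function matching a natural-language description. Both the needle and `build_probe` below are hypothetical:

```python
# Illustrative long-context "needle function" probe; NOT RepoQA's harness.
from pathlib import Path

NEEDLE = '''
def rolling_checksum(data: bytes, window: int) -> list[int]:
    """Return the additive checksum of each sliding window over data."""
    return [sum(data[i:i + window]) for i in range(len(data) - window + 1)]
'''

def build_probe(repo_dir: str, description: str) -> str:
    # Concatenate repo files into one long context and hide the needle
    # roughly in the middle.
    files = sorted(Path(repo_dir).rglob("*.py"))
    chunks = [p.read_text(encoding="utf-8", errors="ignore") for p in files]
    chunks.insert(len(chunks) // 2, NEEDLE)
    context = "\n\n".join(chunks)
    return (f"{context}\n\n"
            f"Find and reproduce, verbatim, the function matching this "
            f"description: {description}")

prompt = build_probe("path/to/repo", "computes a sliding-window checksum")
# Feed `prompt` to the model under test; score by comparing the returned
# code against NEEDLE (RepoQA itself uses a more careful similarity check).
```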