MHPP Leaderboard
MHPP Evaluates AI Coders' Performance on Diverse Code Generation Challenges
📝 Notes
- Models are ranked by their pass@1 scores under greedy decoding. For the sampling results, we set the temperature to 0.7 and drew 100 samples. We recommend a context length of 1024 tokens, given the length of the problems and of potential responses.
- A '-' in the table indicates that the result was not collected due to limited resources or budget constraints.
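The notes above mention pass@1 and sampling-based results but do not say how the sampled scores are computed. As a rough sketch, assuming the widely used unbiased pass@k estimator (1 - C(n-c, k) / C(n, k) for n samples of which c pass) and assuming the 100 samples are drawn per problem, the calculation could look like the following. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from the MHPP code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem (assumed 100 for sampling,
       1 for greedy decoding)
    c: number of those samples that pass all tests
    k: the k in pass@k
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimate over a benchmark."""
    if not correct_counts:
        raise ValueError("correct_counts must not be empty")
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

With greedy decoding there is a single deterministic output per problem, so `pass_at_k(1, c, 1)` reduces to whether that one output passes, and the benchmark score is simply the fraction of problems solved.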
🤗 Acknowledgement and More Leaderboards
We greatly thank the authors of the EvalPlus Leaderboard for allowing us to borrow their leaderboard code! Beyond the MHPP leaderboard, we recommend building a comprehensive picture of LLM coding ability from a diverse set of benchmarks and leaderboards, such as: