CoRe 🥑
⭐ NeurIPS 2025 D&B Track Spotlight ⭐
Benchmarking LLMs' Code Reasoning Capabilities through Static Analysis Tasks
💡 Overview
CoRe is a high-quality, human-verified benchmark designed to evaluate LLMs on fundamental static analysis tasks. CoRe includes 12,553 task instances spanning data dependency, control dependency, and information flow across programs written in C/C++, Java, and Python. To ensure semantic diversity and reasoning complexity, we propose a semantics-aware diverse sampling strategy that selects targets and task instances based on structural coverage and dependency depth.
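To make the three dependency kinds concrete, here is a minimal sketch on a toy Python function. The program, the line labels `A` to `E`, the `DEPENDS_ON` table, and the `flows_to` helper are all hypothetical illustrations, not items from the CoRe dataset or its tooling. The sketch also assumes the common program-analysis reading in which information flow is reachability over a chain of data and control dependencies.

```python
# Hypothetical toy program; not taken from the CoRe benchmark.
def classify(reading: int, threshold: int) -> str:
    scaled = reading * 2        # A: defines `scaled`
    if scaled > threshold:      # B: reads `scaled`, so B is data-dependent on A
        label = "high"          # C: runs only when B is true, so C is control-dependent on B
    else:
        label = "low"           # D: runs only when B is false, so D is control-dependent on B
    return label                # E: reads `label`, so E is data-dependent on C and on D


# Hand-written dependency edges for the labelled lines above (an assumption
# made for illustration; a real static analyzer would compute these).
DEPENDS_ON = {
    "B": {("A", "data")},
    "C": {("B", "control")},
    "D": {("B", "control")},
    "E": {("C", "data"), ("D", "data")},
}


def flows_to(source: str, target: str) -> bool:
    """Return True if `target` depends on `source` through any chain of edges."""
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node == source:
            return True
        if node in seen:
            continue
        seen.add(node)
        # Walk backwards from a line to everything it depends on.
        stack.extend(dep for dep, _kind in DEPENDS_ON.get(node, ()))
    return False
```

In this sketch, `flows_to("A", "E")` holds even though no single edge connects `A` to `E`: the value computed at `A` decides the branch at `B`, which in turn decides which assignment reaches `E`.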
🏆 Leaderboard
Dependency Classification: F1 Score
| Model | Data Dependency | Control Dependency | Information Flow | Overall |
|---|---|---|---|---|
Trace Generation: Correct Trace Rate (%)
| Model | Data Dependency | Control Dependency | Information Flow | Overall |
|---|---|---|---|---|
Dependency Source Enumeration: Exact Match (%)
| Model | Data Dependency | Control Dependency | Information Flow | Overall |
|---|---|---|---|---|
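For readers unfamiliar with the leaderboard metrics, the sketch below shows the textbook definitions of F1 score and exact-match rate. This page does not describe CoRe's actual scoring scripts, so the function names `f1_score` and `exact_match_rate`, the boolean label encoding, and the set-based comparison are assumptions made for illustration. Correct Trace Rate is not sketched, since it depends on how the benchmark validates a trace.

```python
from typing import Iterable, Sequence


def f1_score(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Standard binary F1: harmonic mean of precision and recall."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, l in pairs if p and l)        # predicted dependency, truly dependent
    fp = sum(1 for p, l in pairs if p and not l)    # predicted dependency, not dependent
    fn = sum(1 for p, l in pairs if not p and l)    # missed a true dependency
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def exact_match_rate(
    predicted: Sequence[Iterable[int]], gold: Sequence[Iterable[int]]
) -> float:
    """Percentage of instances whose predicted set equals the gold set exactly."""
    if not gold:
        return 0.0
    matches = sum(1 for p, g in zip(predicted, gold) if set(p) == set(g))
    return 100.0 * matches / len(gold)
```

Exact match is the stricter of the two: a prediction that lists one dependency source too many or too few scores zero for that instance, whereas F1 gives partial credit across instances.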
📝 Citation
@article{xie2025core,
title={CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks},
author={Xie, Danning and Zheng, Mingwei and Liu, Xuwei and Wang, Jiannan and Wang, Chengpeng and Tan, Lin and Zhang, Xiangyu},
journal={arXiv preprint arXiv:2507.05269},
year={2025}
}