NYU CTF Bench

A benchmark of CTF challenges to test LLM capabilities in cybersecurity
NeurIPS 2024, Datasets and Benchmarks Track
Minghao Shao*, Sofija Jancheska*, Meet Udeshi*, Brendan Dolan-Gavitt*,
Haoran Xi, Kimberly Milner, Boyuan Chen, Max Yin, Siddharth Garg,
Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique
NYU CTF Bench is designed to evaluate the cybersecurity capabilities of LLM agents. We provide difficult, real-world CTF challenges to support research on improving LLMs at interactive cybersecurity tasks and complex automated task planning. Evaluating LLM agents on these challenges offers insight into the potential of AI-driven cybersecurity for real-world threat management. See the paper for details.
@inproceedings{shao2024nyuctfbench,
author = {Shao, Minghao and Jancheska, Sofija and Udeshi, Meet and Dolan-Gavitt, Brendan and Xi, Haoran and Milner, Kimberly and Chen, Boyuan and Yin, Max and Garg, Siddharth and Krishnamurthy, Prashanth and Khorrami, Farshad and Karri, Ramesh and Shafique, Muhammad},
booktitle = {Advances in Neural Information Processing Systems},
pages = {57472--57498},
title = {NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security},
url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/69d97a6493fbf016fff0a751f253ad18-Paper-Datasets_and_Benchmarks_Track.pdf},
volume = {37},
year = {2024}
}
How to Submit
All submissions are managed through the leaderboard submissions GitHub repository. Follow the README in that repository to make a submission.