CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification
Dataset Description
CodeHaluEval is a comprehensive benchmark for assessing the performance of Large Language Models (LLMs) in code generation tasks. It contains 8,883 samples drawn from 699 diverse programming tasks and is specifically designed to quantify and characterize the tendency of LLMs to produce code hallucinations and other errors during code generation. Using our CodeHalu dynamic detection algorithm, which verifies generated code by executing it, researchers can identify and categorize different types of code hallucinations, helping to improve the reliability of LLMs in real-world programming environments.
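The core idea of execution-based verification is to run a model's generated code against test cases and classify the outcome, rather than judging the code statically. The sketch below illustrates this under stated assumptions: the function name `verify_by_execution` and the outcome labels (`pass`, `runtime_error`, `wrong_output`, `timeout`) are illustrative and not taken from the CodeHalu codebase.

```python
import subprocess
import sys

def verify_by_execution(code: str, stdin_text: str, expected: str,
                        timeout: float = 5.0) -> str:
    """Execute candidate code on one test case and classify the outcome.

    A minimal sketch of execution-based verification: run the code in a
    subprocess, feed it the test input, and compare stdout to the expected
    output. Labels here are illustrative, not CodeHalu's taxonomy.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"          # non-terminating or too slow
    if result.returncode != 0:
        return "runtime_error"    # raised an exception, e.g. NameError
    if result.stdout.strip() != expected.strip():
        return "wrong_output"     # ran to completion but output is wrong
    return "pass"

# Correct code passes; hallucinated code (undefined name) is flagged
print(verify_by_execution("print(int(input()) * 2)", "21", "42"))  # pass
print(verify_by_execution("print(undefined_var)", "", ""))         # runtime_error
```

Aggregating such per-sample outcomes over many tasks is what turns individual execution failures into a measurable hallucination rate for a model.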
If you find our work useful, please consider citing:
@misc{tian2024codehaluinvestigatingcodehallucinations,
      title={CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification},
      author={Yuchen Tian and Weixiang Yan and Qian Yang and Xuandong Zhao and Qian Chen and Wen Wang and Ziyang Luo and Lei Ma and Dawn Song},
      year={2024},
      eprint={2405.00253},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.00253},
}