## Contents
- Data Collection
- Naming Patterns
- Complexity and Maintainability
- Code Similarity
- Labels in the Reasoning Process
- Citation
## Data Collection

We collect a total of 19,898 GitHub repositories and 926,935 source code files, corresponding to arXiv papers from the first quarter of 2020 to the first quarter of 2025. The arXiv dataset is organized across two GitHub repositories: Python files are in LLM_code/arxiv_dataset, and C/C++ files are in LLM_code/arxiv_dataset_cpp. Each repository follows the layout below:
```
├── 2020                      // Year
│   ├── Q1                    // Quarter
│   │   ├── repo_name         // Repository name
│   │   │   ├── xxx.py        // Project Python file
│   │   │   ├── ...
│   │   │   └── time_info.txt // File creation/modification time information
```
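A minimal sketch of how the dataset can be traversed, assuming the layout above (the `root` path and the `iter_python_files` helper are illustrative, not part of the released tooling):

```python
import os

def iter_python_files(root="arxiv_dataset"):
    """Yield (year, quarter, repo, path) for every .py file in the tree.

    Assumes the year/quarter/repo_name layout shown above; point `root`
    at your local clone of LLM_code/arxiv_dataset.
    """
    for year in sorted(os.listdir(root)):
        year_dir = os.path.join(root, year)
        if not os.path.isdir(year_dir):
            continue
        for quarter in sorted(os.listdir(year_dir)):
            quarter_dir = os.path.join(year_dir, quarter)
            if not os.path.isdir(quarter_dir):
                continue
            for repo in sorted(os.listdir(quarter_dir)):
                repo_dir = os.path.join(quarter_dir, repo)
                if not os.path.isdir(repo_dir):
                    continue
                for name in os.listdir(repo_dir):
                    if name.endswith(".py"):
                        yield year, quarter, repo, os.path.join(repo_dir, name)

for year, quarter, repo, path in iter_python_files():
    print(year, quarter, repo, path)
```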
We also utilize Code4Bench, a multidimensional benchmark based on Codeforces data. This dataset contains user submissions to Codeforces made before 2020, which were barely affected by LLMs. We then generate code for these problems using LLMs with various prompting strategies.
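The templates below are an illustrative sketch of a direct and a reference-guided prompting strategy; they are hypothetical, not the exact prompts used in the paper:

```python
# Hypothetical prompt templates, for illustration only.
DIRECT_PROMPT = (
    "Solve the following Codeforces problem in Python. "
    "Return only the code.\n\nProblem:\n{problem}"
)

REFERENCE_PROMPT = (
    "Solve the following Codeforces problem in Python, using the reference "
    "solution as a style guide. Return only the code.\n\n"
    "Problem:\n{problem}\n\nReference solution:\n{reference}"
)

def build_prompt(problem: str, reference: str | None = None) -> str:
    """Direct generation when no reference is given, reference-guided otherwise."""
    if reference is None:
        return DIRECT_PROMPT.format(problem=problem)
    return REFERENCE_PROMPT.format(problem=problem, reference=reference)
```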
## Naming Patterns

We categorize variable, function, and file names into several distinct formats (e.g., snake_case), and also take name length into account.
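A rough sketch of such a categorization, using regular expressions (the exact category set in the paper may differ):

```python
import re

# Regex patterns for common identifier formats; checked in order.
PATTERNS = {
    "snake_case": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$"),
    "camelCase":  re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$"),
    "PascalCase": re.compile(r"^([A-Z][a-z0-9]+)+$"),
    "UPPER_CASE": re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"),
    "lowercase":  re.compile(r"^[a-z][a-z0-9]*$"),
}

def classify_name(name: str) -> str:
    """Return the first matching naming format, or 'other'."""
    for style, pattern in PATTERNS.items():
        if pattern.match(name):
            return style
    return "other"

print(classify_name("max_value"), len("max_value"))  # snake_case 9
print(classify_name("maxValue"))                     # camelCase
```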
> [!IMPORTANT]
> Finding 1: The coding style of human-written code may be influenced by LLMs: they may not only mirror existing norms but also subtly reshape them, gradually pushing human developers toward greater stylistic alignment with LLM-preferred conventions.
## Complexity and Maintainability

Cyclomatic complexity is a metric that measures the number of linearly independent paths through the code.
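For Python code, cyclomatic complexity and the Maintainability Index can be computed with the `radon` library; a minimal sketch (one possible tooling choice, not necessarily the one used in the paper):

```python
from radon.complexity import cc_visit
from radon.metrics import mi_visit

source = """
def classify(x):
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    return "zero"
"""

# Cyclomatic complexity per function/class block: base 1, +1 per branch.
for block in cc_visit(source):
    print(block.name, block.complexity)  # classify 3

# Maintainability Index for the whole module (higher = more maintainable).
print(mi_visit(source, multi=True))
```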
> [!IMPORTANT]
> Finding 2: For I/O algorithm problems, LLM-generated code tends to exhibit higher maintainability, lower difficulty, and fewer bugs than human-written solutions, which aligns with the evolution of GitHub code after 2023 Q1. Moreover, the quality of reference-guided code is generally inferior to that of directly generated code.
## Code Similarity

We compare three versions of each problem's code: the original human-authored solution (AC), the LLM's output given only the problem description (ANS), and the LLM's output when additionally conditioned on the human solution (REF). We compute pairwise cosine and Jaccard similarities among AC, ANS, and REF.
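A sketch of the two similarity measures, assuming cosine similarity over token-count vectors and Jaccard similarity over token sets (the paper's exact tokenization may differ):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over the sets of whitespace-delimited tokens."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Toy AC/ANS/REF triple for one problem (illustrative only).
ac  = "n = int(input())\nprint(sum(range(1, n + 1)))"
ans = "n = int(input())\nprint(n * (n + 1) // 2)"
ref = "n = int(input())\nprint(sum(range(1, n + 1)))"

# Cosine similarity over token-count vectors.
vec = CountVectorizer(token_pattern=r"\S+").fit([ac, ans, ref])
cos = cosine_similarity(vec.transform([ac, ans, ref]))
print("cosine  AC/ANS:", cos[0, 1], " AC/REF:", cos[0, 2])
print("jaccard AC/ANS:", jaccard(ac, ans), " AC/REF:", jaccard(ac, ref))
```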
> [!IMPORTANT]
> Finding 3: LLMs can effectively mimic human coding style when given reference code, but without such guidance, their generated solutions diverge significantly from human-written code, especially in I/O algorithm tasks.
## Labels in the Reasoning Process

To further refine our analysis, we individually examine, for each question, how well the algorithm labels extracted from the model's reasoning match the problem's ground-truth labels, and summarize the agreement as a matching rate.
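As a sketch of one natural formalization (our own notation; the paper's exact definition may differ): let $T_i$ be the ground-truth algorithm tags of problem $i$ and $R_i$ the labels extracted from the model's reasoning, over $N$ problems. The matching rate is then the fraction of problems whose reasoning mentions at least one true tag:

$$
M = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\, T_i \cap R_i \neq \emptyset \,\right]
$$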
> [!IMPORTANT]
> Finding 4: LLMs show limited algorithm-analysis capability, are more inclined to approach C/C++ code from an algorithmic perspective, and harder problems may better activate their algorithmic reasoning capabilities.
## Citation

```bibtex
@article{xu2025code_transformed,
  title={code\_transformed: The Influence of Large Language Models on Code},
  author={Xu, Yuliang and Huang, Siming and Geng, Mingmeng and Wan, Yao and Shi, Xuanhua and Chen, Dongping},
  journal={arXiv preprint arXiv:2506.12014},
  year={2025}
}
```