# Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
CanItEdit is a benchmark for evaluating LLMs on instructional code editing, the task of
updating a program given a natural language instruction. The benchmark contains 105
hand-crafted Python programs, each with before and after code blocks,
two types of natural language instructions (descriptive and lazy), and a hidden test suite.
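To make the benchmark's structure concrete, the snippet below inspects one problem using the HuggingFace `datasets` library. This is a minimal sketch: the dataset identifier, split name, and field names are assumptions based on the description above, not confirmed by this README.

```python
# Minimal sketch of loading and inspecting a CanItEdit problem.
# NOTE: the dataset id, split, and field names below are assumed, not confirmed.
from datasets import load_dataset

ds = load_dataset("nuprl/CanItEdit", split="test")  # assumed identifier/split

item = ds[0]
print(item["before"])                   # program before the edit (assumed field)
print(item["instruction_descriptive"])  # detailed edit instruction (assumed field)
print(item["instruction_lazy"])         # terse edit instruction (assumed field)
print(item["after"])                    # reference program after the edit (assumed field)
```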
This repository provides code for evaluating models on the benchmark, as well as code to reproduce
EditPackFT and EditCoder, a dataset and an LLM built for instructional code editing.
The CanItEdit benchmark dataset, the EditCoder model, and the EditPackFT dataset are available on HuggingFace.
- `./benchmark` contains the CanItEdit benchmark dataset and code for generating and evaluating completions
- `./editcoder` contains code to train an EditCoder model
- `./editpackft` contains code to reproduce the EditPackFT dataset
- `./requirements.txt` lists the requirements for running the code in this repository
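For intuition about the task itself, the sketch below prompts a generic instruction-tuned code model to rewrite a program according to an instruction. The model name and prompt format are illustrative assumptions, not the evaluation harness used in `./benchmark`.

```python
# Illustrative sketch of instructional code editing with an off-the-shelf
# model via HuggingFace transformers. The model name and prompt format are
# placeholders, not this repository's actual evaluation setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

before_code = "def add(a, b):\n    return a - b\n"
instruction = "Fix the bug so that add returns the sum of its arguments."

prompt = (
    f"## Code Before:\n{before_code}\n"
    f"## Instruction:\n{instruction}\n"
    "## Code After:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens (the proposed edited program).
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```

In the benchmark itself, a completion like this would be scored by running the hidden test suite against the edited program.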
## Citation
If you use this code or the CanItEdit benchmark, please cite our paper:
@inproceedings{cassano:canitedit,
  title={Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions},
  author={Federico Cassano and Luisa Li and Akul Sethi and Noah Shinn and Abby Brennan-Jones and Anton Lozhkov and Carolyn Jane Anderson and Arjun Guha},
  booktitle={Conference on Language Modeling (COLM)},
  year={2024},
}