Unzip the desired dataset zip and move the resulting folder to ~/data.
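For example, assuming the downloaded archive is named python_data.zip (a hypothetical file name; substitute the dataset you actually downloaded):

unzip python_data.zip
mv python_data ~/data/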
Pre-process data
Parse and filter commits and messages: cd ~/commitgen and run python ./preprocess.py FOLDER_NAME --language LANGUAGE, where FOLDER_NAME is the name of the folder from the previous step. Add the --atomic flag to keep only atomic commits. This generates a pre-processed version of the dataset as a pickle file in ~/data/preprocessing. Run python ./preprocess.py --help for details on additional pre-processing parameters.
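For example, assuming the folder from the previous step is named python_data and the target language is python (both placeholder values), a run that keeps only atomic commits might look like:

cd ~/commitgen
python ./preprocess.py python_data --language python --atomic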
Generate training data: cd ~/commitgen and run ./buildData.sh PICKLE_FILE_NAME LANGUAGE (PICKLE_FILE_NAME without the .pickle extension).
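For example, assuming the previous step produced a pickle file named python_data.pickle in ~/data/preprocessing (a hypothetical name):

cd ~/commitgen
./buildData.sh python_data python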
Train the model
1. Run the model: cd ~/commitgen and run ./run.sh PICKLE_FILE_NAME LANGUAGE (PICKLE_FILE_NAME without the .pickle extension).
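For example, using the same hypothetical pickle file and language as above:

cd ~/commitgen
./run.sh python_data python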
You can also download additional GitHub project data with our crawler: cd ~/commitgen and run python crawl_commits.py --help for details on how to use it.
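For example, to list the crawler's available options:

cd ~/commitgen
python crawl_commits.py --help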
About
Code and data for the paper "A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes"