This repository contains the official data and evaluation code for the NAACL 2022 paper:
Automatic Correction of Human Translations
Jessy Lin, Geza Kovacs, Aditya Shastry, Joern Wuebker, and John DeNero [arXiv]
The ACED Corpus
You can load the corpus directly from the data/ directory. The corpus consists of three translation error correction datasets from different domains: asics (marketing), emerson (technical), and digitalocean (technical). For more information on the data, please refer to our paper.
Each directory contains the train, dev, and test splits, with the following files for each:
.src: English source sentence (s)
.pert: original German translation (t)
.tgt: corrected German translation (t')
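As a minimal sketch of reading one split into aligned triples, assuming the files are named `<split>.src`, `<split>.pert`, and `<split>.tgt` inside `data/<dataset>/` and contain one sentence per line (the exact file naming is an assumption, and `Example` and `load_split` are hypothetical helpers that are not part of this repository):

```python
from pathlib import Path
from typing import List, NamedTuple


class Example(NamedTuple):
    src: str   # English source sentence (s)
    pert: str  # original German translation (t)
    tgt: str   # corrected German translation (t')


def read_lines(path: Path) -> List[str]:
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_split(data_dir: str, dataset: str, split: str) -> List[Example]:
    """Load one split as a list of (src, pert, tgt) triples.

    Assumes the layout <data_dir>/<dataset>/<split>.{src,pert,tgt};
    adjust the paths if the files are named differently.
    """
    base = Path(data_dir) / dataset
    src = read_lines(base / f"{split}.src")
    pert = read_lines(base / f"{split}.pert")
    tgt = read_lines(base / f"{split}.tgt")
    # The three files are parallel, so their line counts must agree.
    if not (len(src) == len(pert) == len(tgt)):
        raise ValueError(
            f"Line count mismatch: {len(src)} src, {len(pert)} pert, {len(tgt)} tgt"
        )
    return [Example(s, p, t) for s, p, t in zip(src, pert, tgt)]
```

Under these assumptions, `load_split("data", "asics", "dev")` would return the ASICS dev set, and comparing `ex.pert` with `ex.tgt` shows which translations were actually edited.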
Evaluation
To install the (minimal) dependencies for evaluation, we recommend setting up a virtualenv.
To calculate precision, recall, and F-scores, we use the errant toolkit to generate and compare MaxMatch (M2) files.
You can evaluate a model output file (one sentence per line) by running:
sh eval_m2.sh <dataset> <split> /path/to/model/output.tgt
where dataset is one of {asics, emerson, digitalocean} and split is one of {dev, test}. This script generates the gold and hypothesis M2 files and outputs the precision, recall, and F-0.5 scores with errant_compare.
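For reference, the reported scores relate to edit counts as in the sketch below. It shows the standard precision, recall, and F-beta formulas with beta = 0.5, which weights precision more heavily than recall. The function name and the handling of zero denominators are assumptions for illustration; the scores reported by errant_compare remain authoritative.

```python
from typing import Tuple


def precision_recall_fbeta(
    tp: int, fp: int, fn: int, beta: float = 0.5
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F-beta from edit counts.

    tp: hypothesis edits that match a gold edit
    fp: hypothesis edits with no matching gold edit
    fn: gold edits that the hypothesis missed
    """
    # Assumed convention: an empty denominator yields a score of 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f
```

With 6 matching edits, 2 spurious edits, and 4 missed edits, precision is 0.75, recall is 0.6, and F-0.5 is 5/7 (about 0.714).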
To compute GLEU scores, you can evaluate a model output file (one sentence per line) by running:
sh eval_gleu.sh <dataset> <split> /path/to/model/output.tgt
Sentence-level Accuracy
Error labels are provided for the ASICS test set. You can evaluate the per-category sentence-level accuracy by running:
sh eval_sentence_level.sh /path/to/model/output.tgt
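The exact metric definition is not spelled out here. One plausible reading, sketched below, counts a sentence as correct when the model output exactly matches the corrected translation and averages that per error label. Both the exact-match criterion and the one-label-per-sentence input format are assumptions, and `per_category_accuracy` is a hypothetical helper rather than the script's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List


def per_category_accuracy(
    hypotheses: List[str], references: List[str], labels: List[str]
) -> Dict[str, float]:
    """Return exact-match accuracy grouped by error label.

    Assumes the three lists are parallel: one model output, one
    corrected translation, and one error label per sentence.
    """
    if not (len(hypotheses) == len(references) == len(labels)):
        raise ValueError("hypotheses, references, and labels must be parallel")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for hyp, ref, label in zip(hypotheses, references, labels):
        total[label] += 1
        # Assumed criterion: whitespace-trimmed exact string match.
        if hyp.strip() == ref.strip():
            correct[label] += 1
    return {label: correct[label] / total[label] for label in total}
```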
Reference
@inproceedings{lin2022automatic,
title={Automatic Correction of Human Translations},
author={Lin, Jessy and Kovacs, Geza and Shastry, Aditya and Wuebker, Joern and DeNero, John},
booktitle={{NAACL}},
year={2022}
}
Acknowledgements
We use code from the following repositories for our evaluation: