This repository contains the scripts, dataset, and evaluation results from the paper:
Katrin Ortmann, Adam Roussel, and Stefanie Dipper. 2019. Evaluating Off-the-Shelf NLP Tools for German. In Proceedings of the 15th Conference on Natural Language Processing (KONVENS), pages 212–222. [pdf] [bib]
Contents
scripts/
The main scripts, which define how the systems are loaded and called (one script per annotation level): tokens.py, pos.py, morph.py, lemmas.py, depparse.py
common.py: document model and morphology format conversion
Evaluation scripts: eval_bounds.py for tokenization and eval_annotations.py for all other annotation levels
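To give a sense of what a boundary evaluation of this kind involves, here is a minimal, hypothetical sketch (not the repository's actual code) that compares token spans as sets of character offsets:

```python
# Hypothetical sketch of a boundary evaluation; the span format
# (start, end character offsets) and all names are illustrative,
# not taken from eval_bounds.py.

def boundary_f1(gold_spans, system_spans):
    """Precision, recall, and F1 over exact span matches."""
    gold, system = set(gold_spans), set(system_spans)
    true_positives = len(gold & system)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One boundary mismatch between gold and system tokenization:
print(boundary_f1([(0, 3), (4, 9), (10, 11)],
                  [(0, 3), (4, 7), (10, 11)]))
```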
eval/
The results of the evaluation are stored here in two CSV tables: results.csv for the accuracy evaluation and timing.csv for the performance evaluation.
The plots and tables generated by scripts/analysis.py are also stored here.
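Either table can be inspected directly with pandas, for instance; note that the column names in this sketch are assumptions and may differ from the actual CSV header:

```python
# Minimal sketch for inspecting eval/results.csv with pandas.
# The column names ("system", "level", "accuracy") are assumptions;
# adjust them to match the actual header of the CSV.
import pandas as pd

results = pd.read_csv("eval/results.csv")
print(results.groupby(["system", "level"])["accuracy"].mean())
```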
data/
Gold standard datasets (data/gold/) and system output (data/system/)
Each system's output is in an appropriately named subdirectory, and each of these system-specific subdirectories contains one annotated output file per domain.
The directory txt/ contains the unannotated original plaintext files.
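Taken together, the layout looks roughly like this (the placeholder names in angle brackets are illustrative, not actual file names):

```
data/
├── gold/            # gold standard files, one per domain
├── system/
│   ├── <system>/    # one subdirectory per evaluated tool
│   │   └── <domain> # one annotated output file per domain
│   └── ...
└── txt/             # unannotated original plaintext files
```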
Usage
In theory you can use the provided Makefile to run the experiments, but in practice installing all of these systems individually is a lot of work. We hope to eventually provide a Dockerfile that makes running all of the experiments easier.
However, performing the evaluation (make evaluate), i.e. comparing the system output to the gold standard, and calculating the performance statistics (make analysis) should work, provided you have NumPy, pandas, Matplotlib, and seaborn installed.
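For example (the pip invocation is just one way to install the Python dependencies; use whatever environment management you prefer):

```sh
pip install numpy pandas matplotlib seaborn
make evaluate   # compare system output to the gold standard
make analysis   # compute performance statistics, plots, and tables
```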
Results Preview
A more detailed evaluation can be found in the paper cited above.