This repository contains the scripts, dataset, and evaluation results from the paper:
Katrin Ortmann, Adam Roussel, and Stefanie Dipper. 2019. Evaluating Off-the-Shelf NLP Tools for German. In Proceedings of the 15th Conference on Natural Language Processing (KONVENS), pages 212–222. [pdf] [bib]
Contents
scripts/
The main scripts, which define how the systems are loaded and called (one script per annotation level): tokens.py, pos.py, morph.py, lemmas.py, depparse.py
common.py: document model and morphology format conversion
Evaluation scripts: eval_bounds.py for tokenization and eval_annotations.py for all other annotation levels
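To give a sense of what a boundary evaluation of this kind involves, here is a minimal, hypothetical sketch (not the repository's actual code) that compares token spans as sets of character offsets:

```python
# Hypothetical sketch of a boundary evaluation; the span format
# (start, end character offsets) and all names are illustrative,
# not taken from eval_bounds.py.

def boundary_f1(gold_spans, system_spans):
    """Precision, recall, and F1 over exact span matches."""
    gold, system = set(gold_spans), set(system_spans)
    true_positives = len(gold & system)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One boundary mismatch between gold and system tokenization:
print(boundary_f1([(0, 3), (4, 9), (10, 11)],
                  [(0, 3), (4, 7), (10, 11)]))
```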
eval/
The results of the evaluation are stored here in two CSV tables: results.csv for the accuracy evaluation and timing.csv for the performance evaluation.
The plots and tables generated by scripts/analysis.py are also stored here.
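Either table can be inspected directly with pandas, for instance; note that the column names in this sketch are assumptions and may differ from the actual CSV header:

```python
# Minimal sketch for inspecting eval/results.csv with pandas.
# The column names ("system", "level", "accuracy") are assumptions;
# adjust them to match the actual header of the CSV.
import pandas as pd

results = pd.read_csv("eval/results.csv")
print(results.groupby(["system", "level"])["accuracy"].mean())
```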
data/
Gold standard datasets (data/gold/) and system output (data/system/)
Each system's output is in an appropriately named subdirectory, and each of these system-specific subdirectories contains one annotated output file per domain.
The directory txt/ contains the unannotated original plaintext files.
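Taken together, the layout looks roughly like this (the placeholder names in angle brackets are illustrative, not actual file names):

```
data/
├── gold/            # gold standard files, one per domain
├── system/
│   ├── <system>/    # one subdirectory per evaluated tool
│   │   └── <domain> # one annotated output file per domain
│   └── ...
└── txt/             # unannotated original plaintext files
```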
Usage
In theory you can use the provided Makefile to run the experiments, but in practice installing all of these systems individually is a lot of work. We hope to eventually provide a Dockerfile that makes running all of the experiments easier.
However, performing the evaluation (make evaluate), i.e. comparing the system output to the gold standard, and calculating the performance statistics (make analysis) should work, provided you have NumPy, pandas, Matplotlib, and seaborn installed.
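For example (the pip invocation is just one way to install the Python dependencies; use whatever environment management you prefer):

```sh
pip install numpy pandas matplotlib seaborn
make evaluate   # compare system output to the gold standard
make analysis   # compute performance statistics, plots, and tables
```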
Results Preview
A more detailed evaluation can be found in the paper cited above.