DocRED is a widely used benchmark for document-level relation extraction. However, it contains a significant percentage of false negative examples caused by incomplete annotation. We revised 4,053 documents in DocRED to resolve these problems and released the result as the Re-DocRED dataset.
The Re-DocRED dataset resolves the following problems of DocRED:
- Resolved the incompleteness problem by adding a large number of missing relation triples.
- Addressed the logical inconsistencies in DocRED.
- Corrected the coreferential errors within DocRED.
Statistics of Re-DocRED
The Re-DocRED dataset is located in the ./data directory. The statistics of the dataset are shown below:
| | Train | Dev | Test |
|---|---|---|---|
| # Documents | 3,053 | 500 | 500 |
| Avg. # Triples | 28.1 | 34.6 | 34.9 |
| Avg. # Entities | 19.4 | 19.4 | 19.6 |
| Avg. # Sents | 7.9 | 8.2 | 7.9 |
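The statistics above can be recomputed from the data files with a short script. The following is a minimal sketch, not part of the released code. It assumes each split is a JSON list of documents in the original DocRED layout, with `sents` (tokenized sentences), `vertexSet` (one entry per entity), and `labels` (one entry per relation triple). The file names `train_revised.json`, `dev_revised.json`, and `test_revised.json` are also assumptions; adjust them to match the contents of ./data.

```python
import json
from pathlib import Path


def summarize(docs):
    """Return the document count and per-document averages.

    Assumes DocRED-style records: 'labels' holds relation triples,
    'vertexSet' holds entities, and 'sents' holds sentences.
    """
    n = len(docs)
    if n == 0:
        return {"documents": 0, "avg_triples": 0.0,
                "avg_entities": 0.0, "avg_sents": 0.0}
    return {
        "documents": n,
        "avg_triples": sum(len(d.get("labels", [])) for d in docs) / n,
        "avg_entities": sum(len(d.get("vertexSet", [])) for d in docs) / n,
        "avg_sents": sum(len(d.get("sents", [])) for d in docs) / n,
    }


def summarize_file(path):
    """Load one split from a JSON file and summarize it."""
    with Path(path).open(encoding="utf-8") as f:
        return summarize(json.load(f))


if __name__ == "__main__":
    # Assumed file names; change them if the files in ./data differ.
    for split in ("train", "dev", "test"):
        path = Path("data") / f"{split}_revised.json"
        if path.exists():
            print(split, summarize_file(path))
        else:
            print(split, "file not found:", path)
```

Documents without a `labels` key are counted as having zero triples, so the script also runs on unlabeled splits.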
Citation
If you find our work useful, please cite our work as:
@inproceedings{tan2022revisiting,
  title={Revisiting DocRED – Addressing the False Negative Problem in Relation Extraction},
  author={Tan, Qingyu and Xu, Lu and Bing, Lidong and Ng, Hwee Tou and Aljunied, Sharifah Mahani},
  booktitle={Proceedings of EMNLP},
  url={https://arxiv.org/abs/2205.12696},
  year={2022}
}