Re-DocRED Dataset

This repository contains the dataset for our EMNLP 2022 paper, Revisiting DocRED – Addressing the False Negative Problem in Relation Extraction.

DocRED is a widely used benchmark for document-level relation extraction. However, the DocRED dataset contains a significant percentage of false negative examples (incomplete annotation). We revised 4,053 documents in the DocRED dataset to address this, and we release the revised data as the Re-DocRED dataset.

The Re-DocRED dataset resolves the following problems in DocRED:

Incompleteness: resolved by adding a large number of missing relation triples.
Logical inconsistencies: addressed across the annotations.
Coreferential errors: corrected.

Statistics of Re-DocRED

The Re-DocRED dataset is located in the ./data directory. The statistics of the dataset are shown below:

	Train	Dev	Test
# Documents	3,053	500	500
Avg. # Triples	28.1	34.6	34.9
Avg. # Entities	19.4	19.4	19.6
Avg. # Sents	7.9	8.2	7.9
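
The figures in the table can be recomputed from the data files with a few lines of Python. The sketch below is illustrative only: it assumes each split is a JSON list of documents in the original DocRED layout, with the keys `vertexSet` (entities), `labels` (relation triples), and `sents` (tokenized sentences). The helper names `load_split` and `dataset_stats` are hypothetical, and the exact file names under ./data are not listed in this README, so the path in the usage comment is a placeholder.

```python
import json


def load_split(path):
    """Load one split, assumed to be a JSON list of DocRED-style documents."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def dataset_stats(docs):
    """Return the document count and per-document averages.

    Assumed keys (from the original DocRED format):
      vertexSet -> list of entities, labels -> list of relation triples,
      sents -> list of tokenized sentences.
    """
    n = len(docs)
    if n == 0:
        return {"documents": 0, "avg_triples": 0.0,
                "avg_entities": 0.0, "avg_sents": 0.0}
    return {
        "documents": n,
        # A document without a "labels" key is counted as having no triples.
        "avg_triples": sum(len(d.get("labels", [])) for d in docs) / n,
        "avg_entities": sum(len(d["vertexSet"]) for d in docs) / n,
        "avg_sents": sum(len(d["sents"]) for d in docs) / n,
    }


# Usage (placeholder path; substitute the actual file name in ./data):
#   stats = dataset_stats(load_split("./data/<split>.json"))
#   print(stats)
```

If the averages computed this way differ from the table, the most likely cause is that one of the assumed keys does not match the released files.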

Citation

If you find our work useful, please cite it as:

@inproceedings{tan2022revisiting,
  title={Revisiting DocRED – Addressing the False Negative Problem in Relation Extraction},
  author={Tan, Qingyu and Xu, Lu and Bing, Lidong and Ng, Hwee Tou and Aljunied, Sharifah Mahani},
  booktitle={Proceedings of EMNLP},
  url={https://arxiv.org/abs/2205.12696},
  year={2022}
}
