This repository was archived by the owner on May 15, 2024. It is now read-only.
If your research is related to or based on our ChID dataset (or the version adapted for the competition), please kindly cite it:
@inproceedings{zheng-etal-2019-chid,
title = "{C}h{ID}: A Large-scale {C}hinese {ID}iom Dataset for Cloze Test",
author = "Zheng, Chujie and Huang, Minlie and Sun, Aixin",
booktitle = "ACL",
year = "2019"
}
content: The given passage where the original idioms are replaced by placeholders #idiom#
realCount: The number of placeholders or blanks
groundTruth: The golden answers in the order of blanks
candidates: The given candidates in the order of blanks
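The four fields above can be read and checked with a few lines of Python. This is a minimal sketch, not the authors' loader. It assumes each record is stored as one JSON object and that candidates holds one list of idioms per blank. Neither assumption is confirmed by this README, and the sample record is made up for illustration.

```python
import json

PLACEHOLDER = "#idiom#"  # placeholder token named in the field description


def fill_blanks(record):
    """Replace each placeholder with its golden answer, left to right."""
    content = record["content"]
    answers = record["groundTruth"]
    n = record["realCount"]
    # realCount should agree with both the blanks and the answers.
    if content.count(PLACEHOLDER) != n or len(answers) != n:
        raise ValueError("realCount does not match the blanks or answers")
    for answer in answers:
        content = content.replace(PLACEHOLDER, answer, 1)
    return content


# A made-up record for illustration only; it is not taken from the dataset.
# Assumption: one JSON object per record, candidates grouped per blank.
line = json.dumps({
    "content": "He finished the report #idiom# and handed it in.",
    "realCount": 1,
    "groundTruth": ["一丝不苟"],
    "candidates": [["一丝不苟", "画蛇添足", "守株待兔"]],
}, ensure_ascii=False)

record = json.loads(line)
filled = fill_blanks(record)
print(filled)
```

Checking realCount against the number of placeholders before filling catches records whose blanks and answers are out of step.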
Baseline Code
Please refer to Codes for the baseline code.
Competition
We are organizing a competition adapted from the ChID dataset. For the adapted data and baseline codes of the competition, please refer to Competition.
Update History
Update 191001
The competition has finished. We have uploaded all split sets of ChID! Feel free to use them in your research.
Update 190702
The file wordList.txt used in baselines (both for paper and for competition) has been uploaded.
Note that, due to potential differences in equipment and word segmentation tools, your segmentation results may not perfectly match the vocabulary we provide. For the sake of performance, we suggest that you do the segmentation and build the vocabulary list yourself.
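One possible way to build your own vocabulary list is sketched below. It assumes the passages have already been segmented into whitespace-separated tokens by a tool of your choice. The function name build_vocab, the min_count threshold, and the sample sentences are hypothetical, and the output is not claimed to match the format of wordList.txt.

```python
from collections import Counter


def build_vocab(segmented_lines, min_count=1):
    """Return words sorted by descending frequency, then by code point."""
    counts = Counter()
    for line in segmented_lines:
        # Each line is assumed to be pre-segmented, with tokens split by spaces.
        counts.update(line.split())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, count in ordered if count >= min_count]


# Made-up, pre-segmented sentences for illustration only.
segmented = ["他 做事 一丝不苟", "她 做事 认真"]
vocab = build_vocab(segmented)
print(vocab)
```

Raising min_count drops rare tokens, which is a common way to keep the vocabulary small when segmentation is noisy.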
Competition
We are organizing a competition based on our ChID dataset; please see the competition website. The adapted corpus establishes connections between blanks and adopts a new type of problem: a list of passages (not an isolated one) is provided, and the answers need to be selected from a given set of candidate idioms with fixed length (for more details, please refer to the competition website).
The public data contains the training data (both the passages with blanks and the golden answers), the development data (passages only; the answers will be made available later), and the corpus of idiom explanations.