Code and corpora for the paper "VCWE: Visual Character-Enhanced Word Embeddings" (NAACL 2019)
Requirements
pytorch: 1.0.0
python: 3.6.5
numpy: 1.15.4
Preparation
The input file is a plain-text corpus. The first line of the vocabulary file contains two numbers describing the corpus: the number of lines, followed by the total number of tokens (counting repeated occurrences). Each subsequent line contains a token and its frequency.
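The vocabulary format described above can be produced with a short script. The following is a minimal sketch, not the repository's own preprocessing code. It assumes the corpus is already segmented into whitespace-separated tokens, that the token and its frequency are separated by a single space, and that entries are written in descending frequency order; none of these details are specified here. The function name `build_vocab` is hypothetical.

```python
from collections import Counter


def build_vocab(corpus_path, vocab_path):
    """Write a vocabulary file for a whitespace-tokenized plain-text corpus.

    Assumed layout (space separator and ordering are assumptions):
        <number of lines> <number of tokens>
        <token> <frequency>
        ...
    """
    counts = Counter()
    num_lines = 0
    with open(corpus_path, encoding="utf-8") as corpus:
        for line in corpus:
            num_lines += 1
            # Assumption: tokens are separated by whitespace.
            counts.update(line.split())

    # Total token count, including repeated occurrences.
    num_tokens = sum(counts.values())

    with open(vocab_path, "w", encoding="utf-8") as vocab:
        # Header line: corpus statistics.
        vocab.write(f"{num_lines} {num_tokens}\n")
        # One token per line, most frequent first (ordering is an assumption).
        for token, freq in counts.most_common():
            vocab.write(f"{token} {freq}\n")

    return num_lines, num_tokens
```

Calling `build_vocab("corpus.txt", "vocab.txt")` (both file names are placeholders) reads the corpus once and writes the header line followed by one token per line.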
Serial  Dataset       Num Pairs  Not Found  Rho
1       CH-297.txt    297        33         0.5582
2       CH-RG-65.txt  65         0          0.7461
3       CH-MC-30.txt  30         0          0.7765
4       CH-240.txt    240        16         0.5554
Citation
@inproceedings{sun-etal-2019-vcwe,
title = "{VCWE}: Visual Character-Enhanced Word Embeddings",
author = "Sun, Chi and
Qiu, Xipeng and
Huang, Xuanjing",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/N19-1277",
pages = "2710--2719"
}