ContProto

This repository contains the code for the ACL 2023 paper "Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning".

Requirements

torch>=1.6
pytorch-lightning>=0.9.0
tokenizers
transformers

To install the dependencies, run:

pip install -r requirements.txt

Our experiments were run on a single NVIDIA V100 32GB GPU.
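To confirm that an environment meets the minimum versions listed above, a small check along the following lines may help. This is only a sketch: the helper names `parse_version` and `check` are illustrative and not part of this repository, and the version parsing is deliberately simple (it keeps only the leading numeric components).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above.
# None means the list does not pin a version.
MINIMUMS = {
    "torch": (1, 6),
    "pytorch-lightning": (0, 9),
    "tokenizers": None,
    "transformers": None,
}


def parse_version(text):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1)."""
    numbers = []
    # Drop any local suffix after '+', then read leading digits of each part.
    for part in text.split("+")[0].split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def check(name, minimum):
    """Return a one-line status string for an installed distribution."""
    try:
        installed = version(name)
    except PackageNotFoundError:
        return f"{name}: not installed"
    if minimum is not None and parse_version(installed) < minimum:
        wanted = ".".join(map(str, minimum))
        return f"{name}: {installed} is older than {wanted}"
    return f"{name}: {installed} ok"


if __name__ == "__main__":
    for name, minimum in MINIMUMS.items():
        print(check(name, minimum))
```

Running the script prints one status line per package; any line reporting "not installed" or "is older than" points to a dependency that needs attention before training.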

Data Format

Before running the experiments, convert the NER dataset from CoNLL format to span-based format by running the following commands:

cd data
bash data_preprocess.sh

We have provided the processed German dataset from CoNLL as an example in data/conll03_de.
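To illustrate what a CoNLL-to-span conversion involves, here is a minimal sketch. It is not the logic of data_preprocess.sh: the exact span-based schema used by this repository is not described here, so the sketch assumes BIO-style tags in the last column and represents each entity as a hypothetical (start, end, label) tuple with inclusive token indices. The function names `read_conll` and `bio_to_spans` are illustrative only.

```python
def read_conll(lines):
    """Yield (tokens, tags) per sentence from CoNLL-style lines.

    Assumes the token is the first column and the NER tag is the last.
    """
    tokens, tags = [], []
    for line in lines:
        line = line.strip()
        # Blank lines and -DOCSTART- markers separate sentences.
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                yield tokens, tags
                tokens, tags = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])
        tags.append(columns[-1])
    if tokens:
        yield tokens, tags


def bio_to_spans(tags):
    """Convert BIO tags to (start, end, label) spans, end index inclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if the label matches.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i - 1, label))
            start, label = None, None
        # Open a span on B-, or on an I- tag with no span open.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags) - 1, label))
    return spans


if __name__ == "__main__":
    sample = [
        "Angela B-PER",
        "Merkel I-PER",
        "besucht O",
        "Berlin B-LOC",
        "",
    ]
    for tokens, tags in read_conll(sample):
        print(tokens, bio_to_spans(tags))
```

For the sample sentence, the two entities come out as (0, 1, 'PER') and (3, 3, 'LOC'). To see the format the training code actually expects, inspect the provided files in data/conll03_de.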

Generating Pseudo-labeled Data

To generate the pseudo-labeled data, run the following script:

bash run_generate_pseudo.sh de conll03

With the command above, the generated German pseudo labels will be stored in train_logs/conll03/genpseudo_de_0/.

Training

To train ContProto, run the following script:

bash run_contpro_auto-margin.sh de conll03

Citation

If you find this repository useful, please cite our paper:

@inproceedings{zhou-etal-2023-improving,
    title = "Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning",
    author = "Zhou, Ran  and
      Li, Xin  and
      Bing, Lidong  and
      Cambria, Erik  and
      Miao, Chunyan",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2023",
}
