PR4NMT provides a general framework for incorporating multiple, arbitrary sources of prior knowledge into neural machine translation (NMT). Please refer to the following paper for details:
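The posterior regularization (PR) idea can be illustrated with a toy, framework-free sketch. It assumes a formulation common to PR-style methods, which should be checked against the paper and the PR4NMT code. In this formulation, each knowledge source is a feature function phi(x, y), and the features are combined into a log-linear distribution Q(y|x) proportional to exp(gamma · phi(x, y)) over a finite set of candidate translations. Training then maximizes lam1 · (log-likelihood) minus lam2 · KL(Q || P), where P is the model distribution renormalized over the same candidates. All function names, the weights `lam1`/`lam2`, and the candidate-set setup are hypothetical. This is not PR4NMT's Theano implementation.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    top = max(scores)
    log_total = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_total for s in scores]


def log_prior(features, gamma):
    """Hypothetical log-linear prior: log Q(y|x), with Q proportional to exp(gamma . phi(x, y)).

    features: one feature vector phi(x, y) per candidate translation y.
    gamma: one weight per feature.
    """
    scores = [sum(g * f for g, f in zip(gamma, phi)) for phi in features]
    return log_softmax(scores)


def kl_divergence(log_q, log_p):
    """KL(Q || P) = sum_y Q(y) * (log Q(y) - log P(y)) over the candidate set."""
    return sum(math.exp(lq) * (lq - lp) for lq, lp in zip(log_q, log_p))


def pr_objective(log_likelihood, model_scores, features, gamma, lam1=1.0, lam2=1.0):
    """Likelihood term minus a KL penalty that pulls P(y|x) toward the prior Q.

    model_scores: the NMT model's log-probabilities for the candidates; they
    are renormalized over the candidate set before the comparison.
    """
    log_q = log_prior(features, gamma)
    log_p = log_softmax(model_scores)
    return lam1 * log_likelihood - lam2 * kl_divergence(log_q, log_p)
```

In this sketch, Q is log-linear, so adding another knowledge source only means appending one more entry to each feature vector and one more weight to `gamma`. When the model already agrees with the prior, the KL term is zero and the objective reduces to the scaled log-likelihood.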
PR4NMT is built on top of THUMT. It requires Theano 0.8.2 or later (0.8.2 is recommended):
pip install theano==0.8.2
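To confirm that the installed Theano meets the minimum version above, a small check like the following can help. It is a sketch: the helper names `parse_version` and `meets_minimum` are hypothetical and not part of PR4NMT or THUMT. Only `theano.__version__` comes from Theano itself. The script runs under both Python 2 and Python 3.

```python
import re

MIN_THEANO = (0, 8, 2)  # minimum version stated in this README


def parse_version(version):
    """Extract a (major, minor, patch) tuple from strings like '0.8.2', '1.0' or '0.9.0rc1'."""
    match = re.match(r'(\d+)(?:\.(\d+))?(?:\.(\d+))?', version)
    if match is None:
        raise ValueError('unrecognized version string: %r' % version)
    # Missing minor/patch components count as 0.
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def meets_minimum(version, minimum=MIN_THEANO):
    """True if `version` is at least `minimum` (pre-release suffixes are ignored)."""
    return parse_version(version) >= minimum


if __name__ == '__main__':
    try:
        import theano
    except ImportError:
        print('Theano is not installed; run: pip install theano==0.8.2')
    else:
        status = 'OK' if meets_minimum(theano.__version__) else 'too old'
        print('Theano %s: %s' % (theano.__version__, status))
```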
Preparation
First, modify THUMT.config in the config directory to specify the features and hyper-parameters. If the bilingual dictionary (or phrase table) feature is selected, use cPickle to serialize the bilingual dictionary (or phrase table) in the following format:
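import cPickle

# Each entry is a [source word, target word] pair; add as many pairs as needed.
word_table = [['source word 1', 'target word 1'], ['source word 2', 'target word 2']]
with open('word_table', 'wb') as f:  # binary mode for pickle output
    cPickle.dump(word_table, f)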
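If the dictionary is kept as a plain-text file with one tab-separated source/target pair per line (an assumption; PR4NMT does not prescribe an input format), a helper like the following could build and save the table. `parse_dictionary`, `save_word_table` and `load_word_table` are hypothetical names. The `cPickle`/`pickle` fallback lets the code run under Python 2 (as in the snippet above) or Python 3, and pickle protocol 2 keeps the file loadable by Python 2. Note that under Python 3 the strings are pickled as unicode, which Python 2 loads as `unicode` objects; whether THUMT accepts that has not been verified.

```python
try:
    import cPickle as pickle  # Python 2, matching the README's snippet
except ImportError:
    import pickle  # Python 3


def parse_dictionary(lines):
    """Convert 'source<TAB>target' lines into [[source, target], ...] pairs.

    Blank lines are skipped. The target side may contain spaces (phrase-table
    entries). A non-blank line without a tab raises ValueError.
    """
    word_table = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        source, target = line.split('\t', 1)
        word_table.append([source.strip(), target.strip()])
    return word_table


def save_word_table(word_table, path):
    # Binary mode plus protocol 2: readable by both Python 2 and Python 3.
    with open(path, 'wb') as f:
        pickle.dump(word_table, f, 2)


def load_word_table(path):
    """Load a pickled word table, e.g. to sanity-check it before training."""
    with open(path, 'rb') as f:
        return pickle.load(f)
```

For example, `save_word_table(parse_dictionary(open('dict.txt')), 'word_table')` would convert a hypothetical `dict.txt` into the `word_table` file in the format shown above.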
Training
The trainer.py script in the scripts folder is used to train NMT models. We recommend initializing MRT with the best model produced by MLE training, passed via the --init-model-file option. The command for running PR is given by:
The source code is dual-licensed. Open-source licensing is under the BSD-3-Clause license, which allows free use for research purposes. For commercial licensing, please email thumt17@gmail.com.
About
Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization