#disclaimer:
This is a work in progress. If you encounter any problems while compiling or using it, it is likely our mistake, not yours. Please contact wammar@cs.cmu.edu with questions, comments, and suggestions.
#description:
This is an implementation of the CRF autoencoder framework for four tasks:
- bitext word alignment
- part-of-speech tagging
- code switching
- dependency parsing
Our NIPS 2014 paper describes the CRF autoencoder framework as well as the bitext word alignment and part-of-speech induction tasks in detail. Details on code-switching can be found in our EMNLP shared task paper.
#how to build
I'm assuming your default compiler is either gcc 4.6.3 or clang 3.1-8 (or later, fingers crossed).
- bitext word alignment: ``make -f Makefile-latentCrfAligner``
- part-of-speech tagging: ``make -f Makefile-latentCrfPosTagger``
- code switching: ``make -f Makefile-latentCrfPosTagger`` (this is not a typo)
- dependency parsing: ``make -f Makefile-latentCrfParser`` (still in the works)
example invocations:
part-of-speech tagging:
```
--output-prefix prefix # just a filename prefix for files generated during training
--train-data sent-per-line-space-delimited-tokens.txt # example file below
--feat LABEL_BIGRAM --feat PRECOMPUTED --feat EMISSION
--feat BOUNDARY_LABELS --feat PRECOMPUTED_XIM2 --feat PRECOMPUTED_XIM1
--feat PRECOMPUTED_XI --feat PRECOMPUTED_XIP1 --feat PRECOMPUTED_XIP2
--feat OTHER_ALIGNERS
--min-relative-diff 0.001
--optimizer adagrad --minibatch-size 8000
--max-iter-count 50
--cache-feats true
--wordpair-feats word-level-features
```
For a list of all options, run ``latentCrfAligner --help``.
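If you drive training from a script, the option list from the example invocation can be assembled as an argument vector. This is only a sketch: ``build_args`` is a hypothetical helper and not part of this repository, the binary name is whatever you pass in, and the flag names and values are copied verbatim from the example. Check ``--help`` for what your build actually accepts.

```python
from typing import List, Sequence

# Feature names copied from the example invocation.
EXAMPLE_FEATURES = [
    "LABEL_BIGRAM", "PRECOMPUTED", "EMISSION",
    "BOUNDARY_LABELS", "PRECOMPUTED_XIM2", "PRECOMPUTED_XIM1",
    "PRECOMPUTED_XI", "PRECOMPUTED_XIP1", "PRECOMPUTED_XIP2",
    "OTHER_ALIGNERS",
]


def build_args(binary: str,
               output_prefix: str,
               train_data: str,
               wordpair_feats: str,
               features: Sequence[str] = EXAMPLE_FEATURES) -> List[str]:
    """Hypothetical helper: return an argv list mirroring the example options."""
    args = [binary,
            "--output-prefix", output_prefix,
            "--train-data", train_data]
    # Each feature template gets its own --feat flag, as in the example.
    for feat in features:
        args += ["--feat", feat]
    args += ["--min-relative-diff", "0.001",
             "--optimizer", "adagrad",
             "--minibatch-size", "8000",
             "--max-iter-count", "50",
             "--cache-feats", "true",
             "--wordpair-feats", wordpair_feats]
    return args


if __name__ == "__main__":
    argv = build_args("latentCrfAligner", "prefix",
                      "sent-per-line-space-delimited-tokens.txt",
                      "word-level-features")
    print(" ".join(argv))
```

The resulting list can be handed to ``subprocess.run(argv)`` once the binary has been built and is on your path.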
### snippet of the file ``sent-per-line-space-delimited-tokens.txt``
Ms. Haag plays Elianti .
Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990 .
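
The training file holds one sentence per line, with tokens separated by spaces (punctuation is already split off as its own token). A minimal sketch for sanity-checking such a file before training; ``read_sentences`` is a hypothetical helper, and skipping blank lines is my assumption rather than documented behaviour of the tools:

```python
from typing import Iterable, List


def read_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Split each non-empty line into whitespace-delimited tokens."""
    sentences = []
    for line in lines:
        tokens = line.split()
        if tokens:  # assumption: blank lines carry no sentence
            sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    sample = [
        "Ms. Haag plays Elianti .\n",
        "Rolls-Royce Motor Cars Inc. said it expects its U.S. sales "
        "to remain steady at about 1,200 cars in 1990 .\n",
    ]
    for tokens in read_sentences(sample):
        print(len(tokens), tokens)
```

To check a real file, pass an open file handle instead of the sample list, for example ``read_sentences(open("sent-per-line-space-delimited-tokens.txt", encoding="utf-8"))``.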