PyTorch-CTC is an implementation of CTC (Connectionist Temporal Classification) beam search decoding for PyTorch. C++ code borrowed liberally from TensorFlow with some improvements to increase flexibility.
Installation
The library is largely self-contained and requires only PyTorch and CFFI. Building the C++ library requires GCC 5 or later; if gcc-5 is not your default compiler, you may point the build at it via the CC and CXX environment variables. KenLM language modeling support is optionally included and is enabled by default.
# get the code
git clone --recursive https://github.com/ryanleary/pytorch-ctc.git
cd pytorch-ctc
# install dependencies (PyTorch and CFFI)
pip install -r requirements.txt
# build the extension and install python package (requires gcc-5 or later)
python setup.py install

# or, if gcc-5 is not your default compiler, point the build at it explicitly:
CC=/path/to/gcc-5 CXX=/path/to/g++-5 python setup.py install
# If you do NOT require kenlm, the `--recursive` flag is not required on git clone
# and `--exclude-kenlm` should be appended to the `python setup.py install` command
API
pytorch-ctc includes a CTC beam search decoder with multiple scorer implementations. A scorer is a function that the decoder calls to condition the probability of a given beam based on its state.
Scorers
Two Scorer implementations are currently provided with pytorch-ctc.
Scorer: a no-op that enables the decoder to perform a vanilla beam decode
scorer=Scorer()
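For orientation, the sketch below shows how a no-op Scorer might be plugged into a beam search decode. The decoder class name, its constructor arguments (CTCBeamDecoder, beam_width, top_paths), and the decode call are assumptions for illustration and may not match the installed version; the label string and tensor shape are likewise placeholders.

import torch
from pytorch_ctc import Scorer, CTCBeamDecoder  # decoder name assumed for illustration

# one character per output unit; blank at index 0, space as the word separator
labels = "_'ABCDEFGHIJKLMNOPQRSTUVWXYZ "

scorer = Scorer()  # no-op scorer -> vanilla beam decode
decoder = CTCBeamDecoder(scorer, labels, top_paths=1, beam_width=20,
                         blank_index=0, space_index=labels.index(" "))

# dummy network output with an assumed shape of (seq_len, batch, num_labels)
probs = torch.randn(50, 1, len(labels)).softmax(dim=-1)

output, scores, out_seq_len = decoder.decode(probs)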
KenLMScorer: conditions beams based on the provided KenLM binary language model.
labels: a string of output labels given in the same order as the output layer
lm_path: path to a binary KenLM language model to use for decoding
trie_path: path to a trie containing the lexicon (see generate_lm_trie)
blank_index: the position in the output distribution that represents the blank class
space_index: the position in the output distribution that represents the word separator class
The KenLMScorer may be further configured with weights for the language model contribution to the score (lm_weight), as well as word and valid word bonuses (to offset decreasing probability as a function of sequence length).
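As a rough sketch of how these pieces fit together (the constructor argument order and the weight-setter names below are assumptions, not confirmed by this README; the file paths are placeholders):

from pytorch_ctc import KenLMScorer, CTCBeamDecoder  # decoder name assumed for illustration

labels = "_'ABCDEFGHIJKLMNOPQRSTUVWXYZ "

# hypothetical paths: a binary KenLM model and a trie built with generate_lm_trie
scorer = KenLMScorer(labels, "lm.binary", "vocab.trie",
                     blank_index=0, space_index=labels.index(" "))

# weight the LM contribution and the word / valid-word bonuses (setter names assumed)
scorer.set_lm_weight(2.0)
scorer.set_word_weight(0.1)
scorer.set_valid_word_weight(1.0)

decoder = CTCBeamDecoder(scorer, labels, top_paths=1, beam_width=40,
                         blank_index=0, space_index=labels.index(" "))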
A vocabulary trie is required by the KenLMScorer. The trie is built from a lexicon, specified as a newline-separated text file of the words in the vocabulary.
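A sketch of generating the trie from such a lexicon file is shown below; the argument names and order of generate_lm_trie are assumptions for illustration, and vocab.txt / lm.binary / vocab.trie are placeholder paths.

from pytorch_ctc import generate_lm_trie

labels = "_'ABCDEFGHIJKLMNOPQRSTUVWXYZ "

# vocab.txt: newline-separated list of vocabulary words (placeholder path)
generate_lm_trie("vocab.txt", "lm.binary", "vocab.trie",
                 labels, blank_index=0, space_index=labels.index(" "))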
Acknowledgements
Thanks to ebrevdo for the original TensorFlow CTC decoder implementation, timediv for his KenLM extension, and SeanNaren for his assistance.