The Transformer model from "Attention Is All You Need": a Keras implementation.
A Keras+TensorFlow implementation of the Transformer: "Attention Is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017)
The code achieves results close to those of the reference repository: about 70% validation accuracy.
With smaller model parameters, such as layers=2 and d_model=256, the validation accuracy is even better, since the task is quite small.
For your own data
Just preprocess your source and target sequences into the same format as en2de.s2s.txt and pinyin.corpus.examples.txt, as sketched below.
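For illustration, a minimal preprocessing sketch under the assumption that each line holds a whitespace-tokenized source/target pair separated by a tab (verify against the two example files before relying on it):

```python
# Assumed corpus format: one example per line, source and target separated
# by a tab, tokens separated by spaces.  Check en2de.s2s.txt /
# pinyin.corpus.examples.txt for the exact format.
pairs = [
    ("I love you .", "Ich liebe dich ."),
    ("How are you ?", "Wie geht es dir ?"),
]
with open('my.corpus.txt', 'w', encoding='utf-8') as f:
    for src, tgt in pairs:
        f.write(src + '\t' + tgt + '\n')
```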
Some notes
For a larger number of layers, the special learning-rate scheduler reported in the paper is necessary (a sketch is given below).
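A minimal sketch of the paper's schedule, lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5), written as a Keras callback. The class name and warmup value are illustrative; transformer.py may already provide an equivalent scheduler.

```python
from keras.callbacks import Callback
import keras.backend as K

class NoamSchedule(Callback):
    """Per-step learning-rate schedule from the Transformer paper (sketch)."""
    def __init__(self, d_model=512, warmup_steps=4000):
        super(NoamSchedule, self).__init__()
        self.d_model = d_model
        self.warmup_steps = warmup_steps
        self.step_num = 0

    def on_batch_begin(self, batch, logs=None):
        self.step_num += 1
        lr = self.d_model ** -0.5 * min(self.step_num ** -0.5,
                                        self.step_num * self.warmup_steps ** -1.5)
        K.set_value(self.model.optimizer.lr, lr)
```

Pass it to training as, e.g., `model.fit(..., callbacks=[NoamSchedule(d_model=256)])`.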
In pinyin_main.py, I tried another method to train the deep network: first train the embedding layer and the first layer, then train a 2-layer model, then a 3-layer model, and so on. It works for this task; a rough sketch of the idea follows.
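The sketch below shows one way such a growing-depth schedule can be written, reusing the weights of the shallower model via Keras name-based weight loading. The `build_model` factory and its arguments are hypothetical; pinyin_main.py implements this differently.

```python
# Illustrative only: train progressively deeper models, reusing the
# embedding and lower layers of the previous stage.
def train_progressively(build_model, x, y, max_layers=6, epochs_per_stage=5):
    weights_file = None
    for n_layers in range(1, max_layers + 1):
        model = build_model(layers=n_layers)   # fresh model with n_layers encoder/decoder layers
        if weights_file is not None:
            # layers sharing names with the shallower model pick up its weights
            model.load_weights(weights_file, by_name=True)
        model.fit(x, y, epochs=epochs_per_stage)
        weights_file = 'stage_%d.h5' % n_layers
        model.save_weights(weights_file)
    return model
```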
Upgrades
Refactored some classes.
It is now easier to reuse the components in other models: just import transformer.py (see the sketch below).
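For example, a hedged usage sketch; the constructor arguments and the token-index objects are assumptions, so check transformer.py and the *_main.py scripts for the actual API:

```python
# Assumed usage pattern; argument names and values are illustrative and
# not verified against transformer.py.
from keras.optimizers import Adam
from transformer import Transformer

def build_seq2seq(itokens, otokens):
    """itokens / otokens: token-index objects built from your corpus."""
    s2s = Transformer(itokens, otokens, len_limit=70,
                      d_model=256, n_head=4, layers=2, dropout=0.1)
    s2s.compile(Adam(0.001, 0.9, 0.98, epsilon=1e-9))
    return s2s
```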
A fast step-by-step decoder has been added, including an upgraded beam search, but both still need to be modified to be reusable.
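For reference, a generic step-by-step beam-search loop looks roughly like the sketch below. `decode_step` is a hypothetical function returning next-token log-probabilities for a given target prefix; this is not the decoder actually shipped in this repository.

```python
import numpy as np

def beam_search(decode_step, start_id, end_id, beam_size=5, max_len=70):
    """Generic beam-search sketch.  decode_step(prefix) must return a
    log-probability vector over the vocabulary for the next token."""
    beams = [([start_id], 0.0)]            # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            log_probs = decode_step(seq)   # shape: (vocab_size,)
            for tid in np.argsort(log_probs)[-beam_size:]:
                candidates.append((seq + [int(tid)], score + float(log_probs[tid])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == end_id else beams).append((seq, score))
        if not beams:                      # every surviving beam has emitted the end token
            break
    return max(finished + beams, key=lambda c: c[1])[0]
```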