You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I used LJSpeech dataset which consists of pairs of text script and wav files. The complete dataset (13,100 pairs) can be downloaded here. I referred https://github.com/keithito/tacotron for the preprocessing code.
File description
hyperparams.py includes all hyper parameters that are needed.
data.py loads training data and preprocess text to index and wav files to spectrogram. Preprocessing codes for text is in text/ directory.
module.py contains all methods, including CBHG, highway, prenet, and so on.
network.py contains networks including encoder, decoder and post-processing network.
train.py is for training.
synthesis.py is for generating TTS sample.
Training the network
STEP 1. Download and extract LJSpeech data at any directory you want.
STEP 2. Adjust hyperparameters in hyperparams.py, especially 'data_path' which is a directory that you extract files, and the others if necessary.
STEP 3. Run train.py.
Generate TTS wav file
STEP 1. Run synthesis.py. Make sure the restore step.
Samples
You can check the generated samples in 'samples/' directory. Training step was only 60K, so the performance is not good yet.