You can find the emotion category and emotion intensity annotations in the `./preprocessed_data/DailyTalk/` folder. Each line of the annotation files looks like

```
1_1_d30|1|{Y EH1 S AY1 N OW1}|yes, i know.|none|1
```

where the pipe-separated fields are `sentence ID|speaker|phoneme sequence|original text|emotion category|emotion intensity`.
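Parsing these annotations is a single `split` per line. Below is a minimal sketch in Python, assuming the pipe-delimited format above; the filename `val.txt` is an assumption, so substitute whichever annotation file you are reading:

```python
from pathlib import Path

# Field order follows the format described above.
FIELDS = ["sentence_id", "speaker", "phonemes", "text", "emotion", "intensity"]

def load_annotations(path):
    """Read one pipe-delimited annotation file into a list of dicts."""
    entries = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            entries.append(dict(zip(FIELDS, line.split("|"))))
    return entries

if __name__ == "__main__":
    # Filename is an assumption; use the actual annotation file.
    for entry in load_annotations("preprocessed_data/DailyTalk/val.txt")[:3]:
        print(entry["sentence_id"], entry["emotion"], entry["intensity"])
```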
## Preprocessing

First, run

```
python3 prepare_align.py --dataset DailyTalk
```

to prepare the data for alignment.
For forced alignment, the Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. Pre-extracted alignments for the dataset are provided here; unzip the files into `preprocessed_data/DailyTalk/TextGrid/`. Alternatively, you can run the aligner yourself. Please note that our pretrained models are not trained with supervised duration modeling (they are trained with `learn_alignment: True`).
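If you run the aligner yourself, a typical MFA 2.x invocation is sketched below; the corpus directory, lexicon path, and acoustic model name are assumptions that depend on how your data and MFA models are laid out:

```
# Sketch only: the corpus, lexicon, model, and output paths are assumptions.
mfa align raw_data/DailyTalk lexicon/librispeech-lexicon.txt english_us_arpa preprocessed_data/DailyTalk/TextGrid
```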
After that, run the preprocessing script:

```
python3 preprocess.py --dataset DailyTalk
```
## Training

Train your model with

```
python3 train.py --dataset DailyTalk
```
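Training progress can be monitored with TensorBoard; the log directory below is an assumption based on common FastSpeech2-style layouts, so adjust it to wherever `train.py` writes its logs:

```
# Log path is an assumption; point --logdir at the actual log directory.
tensorboard --logdir output/log/DailyTalk
```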
## Inference

Only batch inference is supported, since generating a turn may require the contextual history of the conversation. Use the batch synthesis command sketched below to synthesize all utterances in `preprocessed_data/DailyTalk/val_*.txt`.
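A plausible form of the batch synthesis command, following the FastSpeech2-style CLI of the other scripts (the script name `synthesize.py` and the `--restore_step`/`--mode` flags are assumptions, and `RESTORE_STEP` is a placeholder for a trained checkpoint step):

```
# Flag names and RESTORE_STEP are assumptions; check synthesize.py --help.
python3 synthesize.py --source preprocessed_data/DailyTalk/val_*.txt --restore_step RESTORE_STEP --mode batch --dataset DailyTalk
```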
## Citing

To cite this repository:

```
@inproceedings{liu2024emotion,
  title={Emotion rendering for conversational speech synthesis with heterogeneous graph-based context modeling},
  author={Liu, Rui and Hu, Yifan and Ren, Yi and Yin, Xiang and Li, Haizhou},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={17},
  pages={18698--18706},
  year={2024}
}
```