Sound demos for "ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech"
ICLR 2019: paper link
Authors: Wei Ping, Kainan Peng, Jitong Chen. (equal contribution.)
Experiment I: Autoregressive wave generation conditioned on mel-spectrogram
We obtain high-fidelity synthesized speech by training an autoregressive WaveNet with a single Gaussian output distribution.

| Ground-truth | Single Gaussian | Mixture of Gaussian (k = 10) | Mixture of Logistic (k = 10) | Softmax (channel = 2048) |
|---|---|---|---|---|
| 1: Others are students or workers involved in some way with agriculture. | ||||
| 2: It is the purpose of antitrust law to look to the future. | ||||
| 3: May I reserve a deck chair, please? | ||||
| 4: But bullies are like termites. | ||||
| 5: Of course, once I became a full time musician, I discovered that many of those hard working, dedicated professionals also happened to be miscreant winos. | ||||
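
As a rough illustration of the single Gaussian output distribution used in Experiment I, the sketch below computes the negative log-likelihood that such a model would minimize. It assumes the network emits a mean and a log standard deviation for each waveform sample; that parameterization and the names `gaussian_nll` and `mean_nll` are assumptions made for this example, not details taken from this page.

```python
import math


def gaussian_nll(x, mu, log_sigma):
    """Negative log-likelihood of x under N(mu, exp(log_sigma)**2).

    Hypothetical helper: mu and log_sigma stand in for the per-sample
    outputs of an autoregressive network (an assumption of this sketch).
    """
    return (
        0.5 * math.log(2.0 * math.pi)
        + log_sigma
        + 0.5 * (x - mu) ** 2 * math.exp(-2.0 * log_sigma)
    )


def mean_nll(samples, mus, log_sigmas):
    """Average the per-sample NLL over a waveform (lists of equal length)."""
    terms = [gaussian_nll(x, m, s) for x, m, s in zip(samples, mus, log_sigmas)]
    return sum(terms) / len(terms)
```

A single Gaussian needs only two outputs per sample, whereas the mixture and softmax baselines in the table above need many more; the sketch is meant only to show that contrast.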
Experiment II: Parallel wave generation conditioned on mel-spectrogram
We propose a parallel wave generation method based on Gaussian inverse autoregressive flow (IAF). We distill a parallel student-net from an autoregressive teacher-net. Our method generates all samples of an audio waveform in parallel.

| Student-Net-1 (Reverse KLreg + STFT-loss) | Student-Net-1 (Forward KLreg + STFT-loss) | Student-Net-2 (Reverse KLreg + STFT-loss) |
|---|---|---|
| 1: Others are students or workers involved in some way with agriculture. | ||
| 2: It is the purpose of antitrust law to look to the future. | ||
| 3: May I reserve a deck chair, please? | ||
| 4: But bullies are like termites. | ||
| 5: Of course, once I became a full time musician, I discovered that many of those hard working, dedicated professionals also happened to be miscreant winos. | ||
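
The two ingredients named in Experiment II can be sketched in a few lines, under stated assumptions. A Gaussian IAF step maps noise `z` to a waveform through an affine transform, and because student and teacher outputs are both Gaussian, the KL divergence between them has a standard closed form. The function names are hypothetical, the shifts and scales are passed in as plain lists rather than produced by a network, and the regularization term and STFT loss mentioned in the table header are not reproduced here.

```python
import math


def iaf_affine(z, mu, sigma):
    """One Gaussian IAF step: x_t = z_t * sigma_t + mu_t.

    In a real model mu_t and sigma_t would be computed by a network from
    earlier noise values z_{<t}; here they are given lists (assumption).
    Since z is known up front, every x_t can be computed independently.
    """
    return [zt * st + mt for zt, mt, st in zip(z, mu, sigma)]


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) for univariate Gaussians q and p."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 - sigma_p ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
    )
```

With `q` as the student and `p` as the teacher, `gaussian_kl` gives the reverse direction listed in the table; swapping the arguments gives the forward direction.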
Experiment III: End-to-End Text-to-Wave Model
We propose the first text-to-wave model for speech synthesis, which is fully convolutional and enables fast end-to-end training from scratch. We also successfully distill a parallel waveform synthesizer conditioned on the hidden representation in this end-to-end model.

| Text-to-Wave Teacher | Text-to-Wave Student |
|---|---|
| 1: Please call Stella. | |
| 2: Ask her to bring these things with her from the store. | |
| 3: Some have accepted it as a miracle without physical explanation. | |
| 4: The rainbow is a division of white light into many beautiful colors. | |
| 5: Throughout the centuries people have explained the rainbow in various ways. | |
Extension: ClariNet for Mandarin Chinese
We also extend ClariNet with a linguistic conditioner for Mandarin Chinese.

| Gaussian WaveNet Teacher | Gaussian IAF Student |
|---|---|