GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Standard

GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis. / Airaksinen, Manu; Bollepalli, Bajibabu; Juvela, Lauri; Wu, Zhizheng; King, Simon; Alku, Paavo.

Proceedings of the Annual Conference of the International Speech Communication Association: Interspeech'16, San Francisco, USA, Sept. 8-12, 2016. Vol. 08-12-September-2016. International Speech Communication Association, 2016. p. 2473-2477 (Proceedings of the Annual Conference of the International Speech Communication Association).

Harvard

Airaksinen, M, Bollepalli, B, Juvela, L, Wu, Z, King, S & Alku, P 2016, GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis. in Proceedings of the Annual Conference of the International Speech Communication Association: Interspeech'16, San Francisco, USA, Sept. 8-12, 2016. vol. 08-12-September-2016, Proceedings of the Annual Conference of the International Speech Communication Association, International Speech Communication Association, pp. 2473-2477, Interspeech, San Francisco, United States, 08/09/2016. https://doi.org/10.21437/Interspeech.2016-342

APA

Airaksinen, M., Bollepalli, B., Juvela, L., Wu, Z., King, S., & Alku, P. (2016). GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis. In Proceedings of the Annual Conference of the International Speech Communication Association: Interspeech'16, San Francisco, USA, Sept. 8-12, 2016 (Vol. 08-12-September-2016, pp. 2473-2477). (Proceedings of the Annual Conference of the International Speech Communication Association). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2016-342

Vancouver

Airaksinen M, Bollepalli B, Juvela L, Wu Z, King S, Alku P. GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis. In Proceedings of the Annual Conference of the International Speech Communication Association: Interspeech'16, San Francisco, USA, Sept. 8-12, 2016. Vol. 08-12-September-2016. International Speech Communication Association. 2016. p. 2473-2477. (Proceedings of the Annual Conference of the International Speech Communication Association). https://doi.org/10.21437/Interspeech.2016-342

Author

Airaksinen, Manu ; Bollepalli, Bajibabu ; Juvela, Lauri ; Wu, Zhizheng ; King, Simon ; Alku, Paavo. / GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis. Proceedings of the Annual Conference of the International Speech Communication Association: Interspeech'16, San Francisco, USA, Sept. 8-12, 2016. Vol. 08-12-September-2016. International Speech Communication Association, 2016. pp. 2473-2477 (Proceedings of the Annual Conference of the International Speech Communication Association).

BibTeX

@inproceedings{a38fc5d0ebb54a58ae5ca59f5575ae09,
title = "GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis",
abstract = "GlottHMM is a previously developed vocoder that has been successfully used in HMM-based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according to the functioning of the real human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles of its predecessor, GlottHMM, but the new vocoder introduces three main improvements: GlottDNN (1) takes advantage of a new, more accurate glottal inverse filtering method, (2) uses a new method of deep neural network (DNN)-based glottal excitation generation, and (3) proposes a new approach to band-wise processing of full-band speech. The proposed GlottDNN vocoder was evaluated as part of a full-band state-of-the-art DNN-based text-to-speech (TTS) synthesis system, and compared against the release version of the original GlottHMM vocoder and the well-known STRAIGHT vocoder. The results of the subjective listening test indicate that GlottDNN improves the TTS quality over the compared methods.",
keywords = "Deep neural network, Glottal inverse filtering, Speech synthesis, Vocoder",
author = "Manu Airaksinen and Bajibabu Bollepalli and Lauri Juvela and Zhizheng Wu and Simon King and Paavo Alku",
year = "2016",
doi = "10.21437/Interspeech.2016-342",
language = "English",
volume = "08-12-September-2016",
series = "Proceedings of the Annual Conference of the International Speech Communication Association",
publisher = "International Speech Communication Association",
pages = "2473--2477",
booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association",

}

RIS

TY - GEN

T1 - GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis

AU - Airaksinen, Manu

AU - Bollepalli, Bajibabu

AU - Juvela, Lauri

AU - Wu, Zhizheng

AU - King, Simon

AU - Alku, Paavo

PY - 2016

Y1 - 2016

N2 - GlottHMM is a previously developed vocoder that has been successfully used in HMM-based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according to the functioning of the real human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles of its predecessor, GlottHMM, but the new vocoder introduces three main improvements: GlottDNN (1) takes advantage of a new, more accurate glottal inverse filtering method, (2) uses a new method of deep neural network (DNN)-based glottal excitation generation, and (3) proposes a new approach to band-wise processing of full-band speech. The proposed GlottDNN vocoder was evaluated as part of a full-band state-of-the-art DNN-based text-to-speech (TTS) synthesis system, and compared against the release version of the original GlottHMM vocoder and the well-known STRAIGHT vocoder. The results of the subjective listening test indicate that GlottDNN improves the TTS quality over the compared methods.

AB - GlottHMM is a previously developed vocoder that has been successfully used in HMM-based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according to the functioning of the real human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles of its predecessor, GlottHMM, but the new vocoder introduces three main improvements: GlottDNN (1) takes advantage of a new, more accurate glottal inverse filtering method, (2) uses a new method of deep neural network (DNN)-based glottal excitation generation, and (3) proposes a new approach to band-wise processing of full-band speech. The proposed GlottDNN vocoder was evaluated as part of a full-band state-of-the-art DNN-based text-to-speech (TTS) synthesis system, and compared against the release version of the original GlottHMM vocoder and the well-known STRAIGHT vocoder. The results of the subjective listening test indicate that GlottDNN improves the TTS quality over the compared methods.

KW - Deep neural network

KW - Glottal inverse filtering

KW - Speech synthesis

KW - Vocoder

UR - http://www.scopus.com/inward/record.url?scp=84994338062&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2016-342

DO - 10.21437/Interspeech.2016-342

M3 - Conference contribution

VL - 08-12-September-2016

T3 - Proceedings of the Annual Conference of the International Speech Communication Association

SP - 2473

EP - 2477

BT - Proceedings of the Annual Conference of the International Speech Communication Association

PB - International Speech Communication Association

ER -
