GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

29 Citations (Scopus)

Abstract

GlottHMM is a previously developed vocoder that has been successfully used in HMM-based synthesis by parameterizing speech into two parts (glottal flow and vocal tract) according to the functioning of the real human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles of its predecessor, GlottHMM, but introduces three main improvements: GlottDNN (1) takes advantage of a new, more accurate glottal inverse filtering method, (2) uses a new method for deep neural network (DNN)-based glottal excitation generation, and (3) proposes a new approach to band-wise processing of full-band speech. The proposed GlottDNN vocoder was evaluated as part of a full-band, state-of-the-art DNN-based text-to-speech (TTS) synthesis system and compared against the release version of the original GlottHMM vocoder and the well-known STRAIGHT vocoder. The results of the subjective listening test indicate that GlottDNN improves TTS quality over the compared methods.
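The two-part parameterization described above follows the source-filter view of speech: an excitation (here, the glottal flow) is shaped by a vocal tract filter, and inverse filtering removes that filter to recover the excitation. The following is a toy sketch of that round trip only. It assumes a known all-pole vocal tract with hypothetical coefficients and uses an impulse train as a crude stand-in for a glottal pulse; it is not GlottDNN's glottal inverse filtering method, its DNN-based excitation generation, or its band-wise processing.

```python
"""Toy source-filter round trip (illustrative only, not GlottDNN's method).

Assumptions: the vocal tract is a known all-pole filter 1/A(z) with
hypothetical coefficients, and the source is a simple impulse train.
"""
from typing import List, Sequence


def all_pole_filter(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """Synthesis: y[n] = x[n] + sum_k a[k-1] * y[n-k] (vocal tract as 1/A(z))."""
    y: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def inverse_filter(speech: Sequence[float], a: Sequence[float]) -> List[float]:
    """Inverse filtering: e[n] = y[n] - sum_k a[k-1] * y[n-k] (applies A(z))."""
    e: List[float] = []
    for n, y in enumerate(speech):
        acc = y
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * speech[n - k]
        e.append(acc)
    return e


def pulse_train(length: int, period: int) -> List[float]:
    """Hypothetical stand-in for a glottal source: one unit impulse per pitch period."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


if __name__ == "__main__":
    a = [1.3, -0.8]  # hypothetical stable second-order resonance (poles at |z| ~ 0.89)
    source = pulse_train(length=200, period=50)
    speech = all_pole_filter(source, a)
    recovered = inverse_filter(speech, a)
    err = max(abs(s - r) for s, r in zip(source, recovered))
    print(f"max reconstruction error: {err:.2e}")
```

Because A(z) exactly cancels 1/A(z) here, the recovered excitation matches the source up to floating-point rounding. With real speech the vocal tract filter is unknown and has to be estimated from the signal, which is why the accuracy of the glottal inverse filtering method matters.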

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association
Subtitle of host publication: Interspeech'16, San Francisco, USA, Sept. 8-12, 2016
Publisher: International Speech Communication Association
Pages: 2473-2477
Number of pages: 5
Volume: 08-12-September-2016
ISBN (Electronic): 978-1-5108-3313-5
DOI: https://doi.org/10.21437/Interspeech.2016-342
Publication status: Published, 2016
MoE publication type: A4 Article in a conference publication
Event: Interspeech, San Francisco, United States
Duration: 8 Sep 2016 to 12 Sep 2016
Conference number: 17

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
ISSN (Print): 1990-9770
ISSN (Electronic): 2308-457X

Conference

Conference: Interspeech
Country: United States
City: San Francisco
Period: 08/09/2016 to 12/09/2016

Keywords

  • Deep neural network
  • Glottal inverse filtering
  • Speech synthesis
  • Vocoder


Cite this

Airaksinen, M., Bollepalli, B., Juvela, L., Wu, Z., King, S., & Alku, P. (2016). GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis. In Proceedings of the Annual Conference of the International Speech Communication Association: Interspeech'16, San Francisco, USA, Sept. 8-12, 2016 (Vol. 08-12-September-2016, pp. 2473-2477). (Proceedings of the Annual Conference of the International Speech Communication Association). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2016-342