Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Details

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Place of Publication: United States
Publisher: Institute of Electrical and Electronics Engineers
Pages: 5679-5683
Number of pages: 5
Volume: 2018-April
ISBN (Electronic): 978-1-5386-4658-8
ISBN (Print): 978-1-5386-4659-5
State: Published - 10 Sep 2018
MoE publication type: A4 Article in a conference publication
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018
https://2018.ieeeicassp.org/

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (Electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country: Canada
City: Calgary
Period: 15/04/2018 to 20/04/2018


Research units

  • Nippon Telegraph & Telephone
  • National Institute of Informatics

Abstract

This paper proposes a method for generating speech from filterbank mel frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as automatic speech recognition (ASR) but are generally considered unusable for speech synthesis. First, we predict the fundamental frequency and voicing information from the MFCCs with an autoregressive recurrent neural network. Second, the spectral envelope information contained in the MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network (GAN)-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained given only MFCC information at test time.
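The abstract states that the spectral envelope information in the MFCCs is converted to all-pole filters, but it does not give the procedure. The Python sketch below shows one common way such a conversion can be done, and it is not necessarily the authors' method: invert the DCT to recover log mel energies, spread them back onto a linear frequency axis, take the inverse DFT of the resulting power spectrum to obtain an autocorrelation sequence, and solve for linear-prediction coefficients with the Levinson-Durbin recursion. The HTK-style mel formula, the orthonormal DCT-II, natural-log power energies, the interpolation used in place of a proper filterbank pseudo-inverse, and the default parameter values (order 20, 40 mel bands, 16 kHz, 512-point FFT) are all assumptions, and every function name is hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Hz -> mel using the common HTK-style formula (assumed, not stated in the abstract)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """mel -> Hz, the inverse of hz_to_mel()."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def idct_ortho(coeffs, n_out):
    """Inverse of an orthonormal DCT-II; truncated (missing) coefficients count as zero."""
    c = list(coeffs) + [0.0] * (n_out - len(coeffs))
    out = []
    for n in range(n_out):
        s = c[0] / math.sqrt(n_out)
        for k in range(1, n_out):
            s += math.sqrt(2.0 / n_out) * c[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_out))
        out.append(s)
    return out


def mfcc_to_power_spectrum(mfcc, n_mels, sample_rate, n_fft):
    """Approximate power spectrum on bins 0..n_fft/2 from one MFCC frame."""
    log_mel = idct_ortho(mfcc, n_mels)  # assumed natural-log mel power energies
    # Centre frequencies of n_mels triangular filters spaced evenly on the mel scale.
    top = hz_to_mel(sample_rate / 2.0)
    centres = [mel_to_hz(top * (i + 1) / (n_mels + 1)) for i in range(n_mels)]
    spec = []
    for k in range(n_fft // 2 + 1):
        f = k * sample_rate / n_fft
        # Linearly interpolate log energy between neighbouring centres and hold it
        # constant outside them: a crude stand-in for a proper filterbank pseudo-inverse.
        if f <= centres[0]:
            v = log_mel[0]
        elif f >= centres[-1]:
            v = log_mel[-1]
        else:
            j = max(i for i in range(n_mels - 1) if centres[i] <= f)
            t = (f - centres[j]) / (centres[j + 1] - centres[j])
            v = (1.0 - t) * log_mel[j] + t * log_mel[j + 1]
        spec.append(math.exp(v))
    return spec


def power_to_autocorr(spec, n_lags):
    """Autocorrelation r[0..n_lags] as the inverse DFT of a real, even power spectrum."""
    n_fft = 2 * (len(spec) - 1)
    r = []
    for m in range(n_lags + 1):
        s = spec[0] + spec[-1] * math.cos(math.pi * m)  # DC and Nyquist bins appear once
        for k in range(1, len(spec) - 1):
            s += 2.0 * spec[k] * math.cos(2.0 * math.pi * k * m / n_fft)  # mirrored bins
        r.append(s / n_fft)
    return r


def levinson_durbin(r, order):
    """Return (a, reflection coefficients, prediction error) with a[0] = 1,
    so that the all-pole filter is 1 / A(z), A(z) = sum_i a[i] z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    refl = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        refl.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, refl, err


def mfcc_to_allpole(mfcc, order=20, n_mels=40, sample_rate=16000, n_fft=512):
    """MFCC frame -> (A(z) coefficients, gain). All defaults are hypothetical."""
    spec = mfcc_to_power_spectrum(mfcc, n_mels, sample_rate, n_fft)
    r = power_to_autocorr(spec, order)
    a, _, err = levinson_durbin(r, order)
    return a, math.sqrt(err)
```

For example, `a, gain = mfcc_to_allpole(frame)` yields coefficients for a filter `gain / A(z)` that could be applied to an excitation signal. In the paper's pipeline that excitation comes from the trained pitch-synchronous model and the GAN noise model, neither of which this sketch covers. Because the reconstructed power spectrum is strictly positive, the autocorrelation matrix is positive definite and the recursion yields a stable filter.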

Research areas

  • Excitation modeling
  • Generative adversarial networks
  • Mel-filterbank inversion
  • MFCC
  • Pitch prediction

ID: 28749294