Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks

Research output: Article in book/conference proceedings, peer-reviewed

Researchers

Organisations

  • Nippon Telegraph & Telephone
  • National Institute of Informatics

Description

This paper proposes a method for generating speech from filterbank-based mel-frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as automatic speech recognition (ASR) but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from the MFCCs with an autoregressive recurrent neural network. Second, the spectral envelope information contained in the MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained given only MFCC information at test time.
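The second step of the abstract, converting the spectral envelope carried by MFCCs into an all-pole filter, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the conversion, so the sketch assumes an orthonormal DCT-II, natural-log mel energies, 40 mel bands with centers spaced uniformly on the mel scale, a 16 kHz sampling rate, and the autocorrelation method with a Levinson-Durbin recursion. All function names and parameter values are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); an assumed MFCC convention."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * k * (m + 0.5) / n)
    d[0] /= np.sqrt(2.0)
    return d


def mfcc_to_power_envelope(mfcc, n_mels=40, n_fft=512, sr=16000):
    """Approximate a one-sided power envelope from truncated MFCCs."""
    mfcc = np.asarray(mfcc, dtype=float)
    # Inverse of the truncated DCT gives smoothed log mel energies.
    log_mel = dct_matrix(n_mels)[: len(mfcc)].T @ mfcc
    # Assumed band centers: uniform on the mel scale, edges excluded.
    mel_points = np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2)
    centers = mel_to_hz(mel_points[1:-1])
    # Interpolate from mel band centers onto linear FFT bin frequencies.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return np.exp(np.interp(freqs, centers, log_mel))


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for A(z) = 1 + sum a_j z^-j."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def envelope_to_allpole(power, order):
    """Fit an all-pole filter gain / A(z) to a one-sided power spectrum."""
    # The inverse FFT of a power spectrum is the autocorrelation sequence.
    r = np.fft.irfft(power)
    return levinson_durbin(r[: order + 1], order)


# Example with made-up MFCC values (13 coefficients, order-16 filter).
example_mfcc = np.array([10.0, 2.0, -1.5, 1.0, -0.5, 0.3, 0.2,
                         -0.1, 0.1, 0.05, -0.05, 0.02, 0.01])
envelope = mfcc_to_power_envelope(example_mfcc)
a_coeffs, gain_sq = envelope_to_allpole(envelope, order=16)
```

Because the interpolated envelope is strictly positive, the resulting autocorrelation is positive definite and the fitted filter is stable, which is one reason the autocorrelation method is a common choice for this kind of conversion.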

Details

Original language: English
Title: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Status: Published - 10 September 2018
MoE publication type: A4 Article in conference proceedings
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Calgary, Canada
Duration: 15 April 2018 to 20 April 2018
https://2018.ieeeicassp.org/

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country: Canada
City: Calgary
Period: 15/04/2018 to 20/04/2018
Web address

ID: 28749294