A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi

    Research output: Article in a book/conference proceedings › Conference contribution › Scientific › Peer-reviewed

    49 Citations (Scopus)

    Abstract

    Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum-phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which new vocoding and acoustic modeling techniques can be fairly compared with conventional approaches by means of a large-scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model outperformed a conventional recurrent network, with the AR model performing best. Evaluation of vocoders using the same AR acoustic model demonstrated that a WaveNet vocoder outperformed classical source-filter-based vocoders. In particular, speech waveforms generated by combining the AR acoustic model with the WaveNet vocoder achieved a speech quality score similar to that of vocoded speech.

    Original language: English
    Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
    Place of publication: United States
    Publisher: IEEE
    Pages: 4804-4808
    Number of pages: 5
    Volume: 2018-April
    ISBN (Electronic): 978-1-5386-4658-8
    ISBN (Print): 978-1-5386-4659-5
    DOI (permanent link)
    Status: Published - 10 Sep 2018
    Publication type (Finnish Ministry of Education and Culture, OKM): A4 Article in conference proceedings
    Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Calgary, Canada
    Duration: 15 Apr 2018 to 20 Apr 2018
    https://2018.ieeeicassp.org/

    Publication series

    Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
    ISSN (Electronic): 2379-190X

    Conference

    Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
    Abbreviated title: ICASSP
    Country/Territory: Canada
    City: Calgary
    Period: 15/04/2018 to 20/04/2018
