Lombard speech synthesis using long short-term memory recurrent neural networks

Bajibabu Bollepalli, Manu Airaksinen, Paavo Alku

Research output: Chapter in book/conference proceedings, Conference article in proceedings, Scientific, peer-reviewed

10 Citations (Scopus)

Abstract

In statistical parametric speech synthesis (SPSS), a few studies have investigated the Lombard effect, specifically by using hidden Markov model (HMM)-based systems. Recently, artificial neural networks, in particular long short-term memory recurrent neural networks (LSTMs), have demonstrated promising results in SPSS. The Lombard effect, however, has not been studied in LSTM-based speech synthesis systems. In this study, we propose three methods for Lombard speech adaptation in LSTM-based speech synthesis: (1) augmenting the linguistic input features with Lombard-specific information, (2) scaling the hidden activations using the learning hidden unit contributions (LHUC) method, and (3) fine-tuning LSTMs trained on normal speech with a small amount of Lombard speech data. To investigate the effectiveness of the proposed methods, we carry out experiments using small (10 utterances) and large (500 utterances) sets of Lombard speech data. Experimental results confirm the adaptability of the LSTMs, and similarity tests show that the LSTMs can achieve significantly better adaptation performance than the HMMs in both small and large data conditions.
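As a rough illustration of methods (1) and (2), the sketch below appends a speaking-style flag to the linguistic input features and rescales hidden activations per unit. It is not the authors' implementation: the function names, array shapes, and one-dimensional style flag are assumptions made for the example, and the amplitude form 2 * sigmoid(a) follows the commonly cited LHUC formulation rather than anything stated in this abstract.

```python
import numpy as np

def augment_with_style(linguistic, is_lombard):
    """Append a style flag (1.0 = Lombard, 0.0 = normal) to every frame.

    `linguistic` is a (frames, features) array; the single flag column is
    a hypothetical encoding of the Lombard-specific information.
    """
    flag = np.full((linguistic.shape[0], 1), 1.0 if is_lombard else 0.0)
    return np.concatenate([linguistic, flag], axis=1)

def lhuc_scale(hidden, a):
    """Rescale hidden activations per unit with amplitude 2 * sigmoid(a).

    `a` holds one learnable parameter per hidden unit. With a = 0 the
    amplitude is exactly 1, so the unadapted network is recovered.
    """
    amplitude = 2.0 / (1.0 + np.exp(-a))
    return hidden * amplitude

# Hypothetical sizes: 4 frames, 3 linguistic features, 5 hidden units.
frames = np.zeros((4, 3))
augmented = augment_with_style(frames, is_lombard=True)

hidden = np.ones((4, 5))
unadapted = lhuc_scale(hidden, np.zeros(5))
boosted = lhuc_scale(hidden, np.full(5, 10.0))
```

In an adaptation setup along these lines, only `a` would be updated on the Lombard data while the network weights stay fixed, which keeps the number of adapted parameters small when few utterances are available.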

Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: IEEE
Pages: 5505-5509
Number of pages: 5
ISBN (electronic): 9781509041176
Status: Published - 16 Jun 2017
MoE publication type: A4 Article in conference proceedings
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - New Orleans, United States
Duration: 5 Mar 2017 - 9 Mar 2017

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Publisher: IEEE
ISSN (electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country/Territory: United States
City: New Orleans
Period: 05/03/2017 - 09/03/2017
