Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion

Research output: Article in book/conference proceedings, Conference contribution, Scientific, peer-reviewed

Abstract

Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another. In this study, we propose the use of cycle-consistent adversarial networks (CycleGANs) for converting between styles that differ in vocal effort, focusing on conversion between normal and Lombard styles as a case study of this problem. We propose a parametric approach that uses the Pulse Model in Log domain (PML) vocoder to extract speech features. A CycleGAN maps these features from utterances in the source style to the corresponding features of the target speech. Finally, the mapped features are converted into a Lombard speech waveform with the PML vocoder. In subjective listening tests, the CycleGAN was compared with two other standard mapping methods used in conversion, and it performed best both in speech quality and in the magnitude of the perceptual change between the two styles.
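The abstract describes the CycleGAN mapping step only at a high level. As a rough illustration of why a CycleGAN can be trained without parallel (time-aligned) normal/Lombard recordings, the sketch computes a generator-side CycleGAN objective on toy feature frames: a least-squares adversarial term plus an L1 cycle-consistency term. Everything here is an assumption for illustration, not the authors' method: the function names, the toy generators and discriminators, the least-squares adversarial loss, the weight `lambda_cyc = 10.0`, and the use of plain lists instead of PML features.

```python
"""Hypothetical sketch of a CycleGAN generator objective on speech feature
frames. Toy stand-ins only: not the networks, features, or loss weights
used in the paper."""

from typing import Callable, List, Sequence

Frame = List[float]
Mapper = Callable[[Frame], Frame]  # a "generator": frame -> frame
Critic = Callable[[Frame], float]  # a "discriminator": frame -> realness score


def l1_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equal-length feature frames."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def lsgan_generator_term(score: float) -> float:
    """Least-squares GAN generator loss: pushes D(G(x)) toward 1 ('real')."""
    return (score - 1.0) ** 2


def cyclegan_generator_loss(
    x_frames: Sequence[Frame],  # source-style frames (e.g. normal speech)
    y_frames: Sequence[Frame],  # target-style frames (e.g. Lombard), NOT paired with x
    g_xy: Mapper,
    g_yx: Mapper,
    d_x: Critic,
    d_y: Critic,
    lambda_cyc: float = 10.0,  # assumed weight, not taken from the paper
) -> float:
    adv = 0.0
    cyc = 0.0
    for x in x_frames:
        y_fake = g_xy(x)
        adv += lsgan_generator_term(d_y(y_fake))  # try to fool the target-style critic
        cyc += l1_distance(g_yx(y_fake), x)  # x -> y -> x should recover x
    for y in y_frames:
        x_fake = g_yx(y)
        adv += lsgan_generator_term(d_x(x_fake))  # try to fool the source-style critic
        cyc += l1_distance(g_xy(x_fake), y)  # y -> x -> y should recover y
    n = len(x_frames) + len(y_frames)
    return (adv + lambda_cyc * cyc) / n


if __name__ == "__main__":
    normal = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    lombard = [[1.0, 1.5, 2.0]]  # different count: the two sets are unpaired

    def double(f: Frame) -> Frame:
        return [2.0 * v for v in f]

    def halve(f: Frame) -> Frame:
        return [0.5 * v for v in f]

    def always_real(f: Frame) -> float:
        return 1.0

    # Generators that exactly invert each other and critics that are fully
    # fooled: both loss terms vanish, so this prints 0.0.
    print(cyclegan_generator_loss(normal, lombard, double, halve, always_real, always_real))
```

Because each cycle term compares a frame only with its own round trip through both generators, no frame-by-frame correspondence between the normal and Lombard data is required. In a real system the generators and discriminators would be trained neural networks, with this loss minimized by gradient descent alternately with a discriminator loss.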
Original language: English
Title of host publication: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 6835-6839
Number of pages: 5
ISBN (Electronic): 978-1-4799-8131-1
ISBN (Print): 978-1-4799-8132-8
DOI: https://doi.org/10.1109/ICASSP.2019.8682648
Status: Published - 1 May 2019
Publication type (OKM, Finnish Ministry of Education and Culture): A4 Article in conference proceedings
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Brighton, United Kingdom
Duration: 12 May 2019 to 17 May 2019
Conference number: 44

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country: United Kingdom
City: Brighton
Period: 12/05/2019 to 17/05/2019

Projects

ACLEW: Mapping and analysis of children's language experiences on a global scale

Räsänen, O. & Seshadri, S.

01/06/2017 to 25/09/2020

Project: Academy of Finland: Other research funding

Multidisciplinary research project on parametric speech synthesis

Murtola, T., Bollepalli, B., Juvela, L., Airaksinen, M., Bäckström, T. & Alku, P.

01/01/2018 to 24/01/2020

Project: Academy of Finland: Other research funding

Context-dependent computational foundations of human and machine language learning

Räsänen, O.

31/12/2017 to 31/12/2017

Project: Academy of Finland: Other research funding

Cite this

Seshadri, S., Juvela, L., Yamagishi, J., Räsänen, O., & Alku, P. (2019). Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6835-6839). [8682648] (Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing). IEEE. https://doi.org/10.1109/ICASSP.2019.8682648