Abstract
Jitter and shimmer are voice-quality features that have been successfully used to detect voice pathologies and to classify different speaking styles. In this paper, we investigate the usefulness of such voice-quality features in neural-network-based speaker verification systems. To combine the voice-quality features with mel-spectrogram features, the cosine distance scores estimated from the two feature sets are linearly weighted to obtain a single fused score, which is then used to accept or reject a given speaker. Experimental results on the Voxceleb-1 dataset demonstrate that fusing the cosine distance scores extracted from the mel-spectrogram and voice-quality features provides a 15% relative improvement in Equal Error Rate (EER) compared to the baseline system, which is based only on mel-spectrogram features.
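The score-level fusion and EER evaluation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: embeddings are plain lists of floats, the score is taken to be cosine similarity (larger means more likely the same speaker), and the weight `alpha`, the decision threshold, and all function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (assumed convention: higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fused_score(enroll_mel: Sequence[float], test_mel: Sequence[float],
                enroll_vq: Sequence[float], test_vq: Sequence[float],
                alpha: float = 0.8) -> float:
    """Linearly weight the mel-spectrogram and voice-quality cosine scores.

    alpha is a hypothetical fusion weight; in practice it would be tuned
    on development trials.
    """
    s_mel = cosine_similarity(enroll_mel, test_mel)
    s_vq = cosine_similarity(enroll_vq, test_vq)
    return alpha * s_mel + (1.0 - alpha) * s_vq


def accept(score: float, threshold: float) -> bool:
    """Accept the claimed speaker if the fused score reaches the threshold."""
    return score >= threshold


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    At each threshold, the false acceptance rate (FAR) and false rejection
    rate (FRR) are computed; the EER is taken at the threshold where they
    are closest, as their average.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_nontarget
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(eer_baseline: float, eer_fused: float) -> float:
    """Relative EER reduction of the fused system with respect to the baseline."""
    return (eer_baseline - eer_fused) / eer_baseline
```

Under this definition, a 15% relative improvement means the fused system's EER is 0.85 times the baseline EER; the absolute values depend on the trial list and are not reproduced here.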
Original language | English |
---|---|
Title of host publication | 29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings |
Publisher | IEEE |
Pages | 176-180 |
Number of pages | 5 |
ISBN (Electronic) | 978-9-0827-9706-0 |
DOIs | |
Status | Published - 27 Aug 2021 |
Publication type (Finnish Ministry of Education and Culture classification) | A4 Article in conference proceedings |
Event | European Signal Processing Conference - Dublin, Ireland. Duration: 23 Aug 2021 → 27 Aug 2021. Conference number: 29 |
Publication series
Name | European Signal Processing Conference |
---|---|
ISSN (Print) | 2219-5491 |
ISSN (Electronic) | 2076-1465 |
Conference
Conference | European Signal Processing Conference |
---|---|
Acronym | EUSIPCO |
Country/Territory | Ireland |
City | Dublin |
Period | 23/08/2021 → 27/08/2021 |