Voice-quality Features for Deep Neural Network Based Speaker Verification Systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Jitter and shimmer are voice-quality features that have been successfully used to detect voice pathologies and to classify different speaking styles. In this paper, we investigate the usefulness of these voice-quality features in neural-network-based speaker verification systems. To combine the voice-quality features with the mel-spectrogram features, the cosine distance scores estimated from each feature set are linearly weighted to obtain a single fused score, which is then used to accept or reject the claimed speaker. Experiments on the VoxCeleb1 dataset show that fusing the cosine distance scores from the mel-spectrogram and voice-quality features yields a 15% relative improvement in Equal Error Rate (EER) over the baseline system, which uses mel-spectrogram features alone.
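The abstract names jitter and shimmer without defining them. As a minimal sketch, assuming the pitch periods (in seconds) and per-period peak amplitudes of a voiced segment have already been extracted by some pitch tracker, the common "local" variants can be computed as below: local jitter is the mean absolute difference between consecutive pitch periods divided by the mean period, and local shimmer is the same ratio taken over consecutive peak amplitudes. The function names are hypothetical, and the paper may use other jitter/shimmer variants or an external extraction tool; this is not its feature pipeline.

```python
import numpy as np


def _mean_relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        raise ValueError("need at least two consecutive values")
    return float(np.mean(np.abs(np.diff(values))) / np.mean(values))


def local_jitter(periods):
    """Hypothetical helper: local jitter from consecutive pitch periods (seconds).

    Returns a ratio; multiply by 100 to express it as a percentage.
    """
    return _mean_relative_perturbation(periods)


def local_shimmer(amplitudes):
    """Hypothetical helper: local shimmer from consecutive per-period peak amplitudes."""
    return _mean_relative_perturbation(amplitudes)


# Illustrative input values only (not taken from the paper).
periods = [0.010, 0.012, 0.010]      # seconds
amplitudes = [1.0, 0.8, 1.0, 0.8]    # arbitrary linear units

# One possible per-utterance voice-quality vector: [jitter, shimmer].
voice_quality = np.array([local_jitter(periods), local_shimmer(amplitudes)])
print(voice_quality)
```

Both measures are scale-free ratios, so a perfectly periodic, constant-amplitude signal gives 0 for each, and larger values indicate more cycle-to-cycle irregularity.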
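The scoring, fusion, and decision steps described in the abstract can be sketched as follows, assuming each utterance is already represented by a mel-spectrogram embedding (from the neural network) and a voice-quality feature vector. The sketch scores with cosine similarity (1 minus cosine distance) so that higher means "same speaker"; fusing the two cosine distances with the same weights gives exactly 1 minus the fused similarity, so only the direction of the threshold changes. The weight `alpha`, the threshold, and all function names are hypothetical placeholders, since the abstract does not state the weight used. The EER here is approximated by sweeping thresholds over the observed scores and taking the point where the false-acceptance and false-rejection rates are closest.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (1 minus the cosine distance)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def fused_score(mel_enroll, mel_test, vq_enroll, vq_test, alpha):
    """Linear score fusion: alpha weights the mel-spectrogram score and
    (1 - alpha) the voice-quality score. alpha is a hypothetical tuning weight."""
    s_mel = cosine_similarity(mel_enroll, mel_test)
    s_vq = cosine_similarity(vq_enroll, vq_test)
    return alpha * s_mel + (1.0 - alpha) * s_vq


def accept(score, threshold):
    """Accept the claimed identity when the fused score reaches the threshold."""
    return score >= threshold


def equal_error_rate(scores, labels):
    """Approximate EER: sweep thresholds over the observed scores and return
    the mean of FAR and FRR at the threshold where the two rates are closest.
    labels: True for target (same-speaker) trials, False for impostor trials."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    targets, impostors = scores[labels], scores[~labels]
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        far = np.mean(impostors >= t)  # impostor trials wrongly accepted
        frr = np.mean(targets < t)     # target trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return float(eer)


# Illustrative usage with made-up vectors and hypothetical alpha/threshold values.
s = fused_score([0.2, 0.9], [0.25, 0.85], [0.012, 0.05], [0.030, 0.02], alpha=0.8)
print(s, accept(s, threshold=0.7))
```

In practice `alpha` and the decision threshold would be tuned on held-out trials; the relative EER improvement reported in the abstract compares the EER of the fused scores against that of the mel-spectrogram scores alone.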

Original language: English
Title of host publication: European Signal Processing Conference 2021
Publisher: IEEE
Pages: 176-180
Number of pages: 5
ISBN (Electronic): 978-9-0827-9706-0
Publication status: Published - 27 Aug 2021
MoE publication type: A4 Article in a conference publication
Event: European Signal Processing Conference - Dublin, Ireland
Duration: 23 Aug 2021 → 27 Aug 2021
Conference number: 29

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491
ISSN (Electronic): 2076-1465

Conference

Conference: European Signal Processing Conference
Abbreviated title: EUSIPCO
Country/Territory: Ireland
City: Dublin
Period: 23/08/2021 → 27/08/2021

Keywords

  • jitter
  • mel-spectrogram
  • fusion
  • shimmer
  • speech recognition
