Spectral warping based data augmentation for low resource children’s speaker verification

Hemant Kumar Kathania, Virender Kadyan, Sudarsana Reddy Kadiri*, Mikko Kurimo

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

In this paper, we present our effort to develop an automatic speaker verification (ASV) system for low-resource children’s data. For child speakers, only a very limited amount of speech data is available in the majority of languages for training an ASV system, which makes developing such a system under low-resource conditions very challenging. To develop a robust baseline system, we merged out-of-domain adult data with children’s data to train the ASV system and tested it on children’s speech. This kind of system suffers from an acoustic mismatch between training and testing data. To overcome this issue, we propose spectral warping based data augmentation. We modified adult speech data using a spectral warping method (to simulate children’s speech) and added it to the training data to overcome data scarcity and the mismatch between adults’ and children’s speech. The proposed data augmentation gives 20.46% and 52.52% relative improvement (in equal error rate) for the Indian Punjabi and British English speech databases, respectively. We compared our proposed method with well-known data augmentation methods: SpecAugment, speed perturbation (SP) and vocal tract length perturbation (VTLP), and found that the proposed method performed best. The proposed spectral warping method is publicly available at https://github.com/kathania/Speaker-Verification-spectral-warping .
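To give a rough intuition for what warping a spectrum means, the following minimal sketch stretches a magnitude spectrum along the frequency axis by linear interpolation. It is an illustration only and not the authors' implementation, which is available in the repository linked above. The function name `warp_magnitude_spectrum`, the warping factor `alpha`, and the toy spectrum are all assumptions made for this example. With `alpha` greater than 1, spectral peaks move to higher frequencies, which loosely mimics the higher formants of children's speech.

```python
import numpy as np


def warp_magnitude_spectrum(magnitude, alpha):
    """Stretch (alpha > 1) or compress (alpha < 1) a magnitude spectrum.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    bins = np.arange(magnitude.shape[0])
    # Output bin k reads its value from input position k / alpha,
    # so a peak at input bin b ends up near output bin alpha * b.
    source_positions = bins / alpha
    # Positions past the last input bin (possible when alpha < 1) are set to 0.
    return np.interp(source_positions, bins, magnitude, right=0.0)


# Toy spectrum with a single peak at bin 20 (assumed data, not real speech).
spectrum = np.zeros(129)
spectrum[20] = 1.0

warped = warp_magnitude_spectrum(spectrum, alpha=1.2)
peak_bin = int(np.argmax(warped))  # the peak moves from bin 20 to bin 24
```

In an augmentation pipeline, a warp of this kind would be applied frame by frame to adult speech before feature extraction. The warping factors and the exact warping function used in the paper are not stated in the abstract, so the value 1.2 here is only a placeholder.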

Original language: English
Number of pages: 12
Journal: Multimedia Tools and Applications
Publication status: E-pub ahead of print - 3 Nov 2023
MoE publication type: A1 Journal article-refereed

Keywords

  • Children’s speech
  • Low resource languages
  • Speaker verification
  • Spectral warping
  • Speed perturbation
  • Vocal tract length perturbation
