Effect of Speech Modification on Wav2Vec2 Models for Children Speech Recognition

Abhijit Sinha, Mittul Singh, Sudarsana Reddy Kadiri, Mikko Kurimo, Hemant Kumar Kathania

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review


Abstract

Speech modification methods normalize children's speech towards adults' speech, enabling off-the-shelf generic automatic speech recognition (ASR) for this low-resource scenario. On the other hand, ASR models like Wav2Vec2 have shown remarkable robustness towards various speakers, thus streamlining their deployment. This paper examines the benefit of speech modification methods when using Wav2Vec2 models on children's speech. We experimented with prototypical speech modification methods and found that while models trained on large datasets exhibit similar performance across unmodified and modified children's speech, models trained on smaller datasets exhibit notably enhanced performance with modified speech. However, analyzing age effects on PF-Star and CMU Kids evaluation sets, we observe that all Wav2Vec2 variants still underperform for children under 10 years. In this scenario, speech modification methods and their combinations help improve performance for small and large Wav2Vec2 models but have plenty of room for improvement.
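The age analysis described in the abstract amounts to scoring recognition output separately for each age group. The following is a minimal sketch of that kind of breakdown, assuming word error rate (WER) as the metric, which the abstract does not name. It is not the authors' evaluation code; the function names and the group labels in the usage note are hypothetical.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(utterances):
    """Pooled WER per group.

    utterances: iterable of (group, reference, hypothesis) tuples, where
    group is any hashable label (for example an age band).
    Errors and reference words are summed per group before dividing, so
    long utterances weigh more than short ones.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in utterances:
        ref_tokens = ref.split()
        errors[group] += edit_distance(ref_tokens, hyp.split())
        words[group] += len(ref_tokens)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}
```

Calling `wer_by_group` with tuples such as `("under_10", reference, hypothesis)` once for unmodified audio and once for modified audio would give the per-age comparison; the transcripts themselves would come from whichever Wav2Vec2 variant is being evaluated.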

Original language: English
Title of host publication: 2024 International Conference on Signal Processing and Communications, SPCOM 2024
Publisher: IEEE
ISBN (Electronic): 979-8-3503-5045-6
Publication status: Published - 2024
MoE publication type: A4 Conference publication
Event: International Conference on Signal Processing and Communications - Bangalore, India
Duration: 1 Jul 2024 – 4 Jul 2024

Publication series

Name: 2024 International Conference on Signal Processing and Communications, SPCOM 2024

Conference

Conference: International Conference on Signal Processing and Communications
Abbreviated title: SPCOM
Country/Territory: India
City: Bangalore
Period: 01/07/2024 – 04/07/2024

Keywords

  • children speech recognition
  • speech modification
  • wav2vec2
