TY - GEN
T1 - Effect of Speech Modification on Wav2Vec2 Models for Children Speech Recognition
AU - Sinha, Abhijit
AU - Singh, Mittul
AU - Kadiri, Sudarsana Reddy
AU - Kurimo, Mikko
AU - Kathania, Hemant Kumar
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Speech modification methods normalize children's speech towards adults' speech, enabling off-the-shelf generic automatic speech recognition (ASR) for this low-resource scenario. On the other hand, ASR models like Wav2Vec2 have shown remarkable robustness towards various speakers, thus streamlining their deployment. This paper examines the benefit of speech modification methods when using Wav2Vec2 models on children's speech. We experimented with prototypical speech modification methods and found that while models trained on large datasets exhibit similar performance across unmodified and modified children's speech, models trained on smaller datasets exhibit notably enhanced performance with modified speech. However, analyzing age effects on PF-Star and CMU Kids evaluation sets, we observe that all Wav2Vec2 variants still underperform for children under 10 years. In this scenario, speech modification methods and their combinations help improve performance for small and large Wav2Vec2 models but have plenty of room for improvement.
AB - Speech modification methods normalize children's speech towards adults' speech, enabling off-the-shelf generic automatic speech recognition (ASR) for this low-resource scenario. On the other hand, ASR models like Wav2Vec2 have shown remarkable robustness towards various speakers, thus streamlining their deployment. This paper examines the benefit of speech modification methods when using Wav2Vec2 models on children's speech. We experimented with prototypical speech modification methods and found that while models trained on large datasets exhibit similar performance across unmodified and modified children's speech, models trained on smaller datasets exhibit notably enhanced performance with modified speech. However, analyzing age effects on PF-Star and CMU Kids evaluation sets, we observe that all Wav2Vec2 variants still underperform for children under 10 years. In this scenario, speech modification methods and their combinations help improve performance for small and large Wav2Vec2 models but have plenty of room for improvement.
KW - children speech recognition
KW - Speech modification
KW - wav2vec2
UR - http://www.scopus.com/inward/record.url?scp=85203696776&partnerID=8YFLogxK
U2 - 10.1109/SPCOM60851.2024.10631626
DO - 10.1109/SPCOM60851.2024.10631626
M3 - Conference article in proceedings
AN - SCOPUS:85203696776
T3 - 2024 International Conference on Signal Processing and Communications, SPCOM 2024
BT - 2024 International Conference on Signal Processing and Communications, SPCOM 2024
PB - IEEE
T2 - International Conference on Signal Processing and Communications
Y2 - 1 July 2024 through 4 July 2024
ER -