Effect of Linear Prediction Order to Modify Formant Locations for Children Speech Recognition

Udara Laxman Kumar*, Mikko Kurimo, Hemant Kumar Kathania

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

Children’s speech recognition performs poorly compared to adult speech recognition. Neural network models require a large amount of data to achieve good performance, yet very little children’s speech data is publicly available. A baseline system was developed using adult speech for training and children’s speech for testing. Such a system suffers from mismatches between the training and testing speech data. To address one of these mismatches, the difference in formant frequency locations between adults and children, this paper explores the effect of linear prediction order on modifying the formant frequency locations. The method was studied for narrowband and wideband speech and was found to reduce the word error rate (WER) for GMM-HMM, DNN-HMM, and TDNN acoustic models. The TDNN acoustic model gives the best performance among the acoustic models. For the TDNN acoustic model, the best formant modification factor α is 0.1 with linear prediction order 6 for narrowband speech (WER 13.82%) and α is 0.1 with linear prediction order 20 for wideband speech (WER 12.19%). The method was also compared with vocal tract length normalization (VTLN) and speaking rate adaptation (SRA), and it was found to give a larger reduction in WER than VTLN and SRA.

Original language: English
Title of host publication: Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings
Editors: Alexey Karpov, K. Samudravijaya, K. T. Deepak, Rajesh M. Hegde, S. R. Mahadeva Prasanna, Shyam S. Agrawal
Publisher: Springer
Pages: 483-493
Number of pages: 11
ISBN (Print): 978-3-031-48308-0
Publication status: Published - 2023
MoE publication type: A4 Conference publication
Event: International Conference on Speech and Computer - Dharwad, India
Duration: 29 Nov 2023 to 2 Dec 2023
Conference number: 25

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14338 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Speech and Computer
Abbreviated title: SPECOM
Country/Territory: India
City: Dharwad
Period: 29/11/2023 to 02/12/2023

Keywords

  • Children’s speech recognition
  • Formant modification
  • Linear prediction
  • TDNN
