Effect of Linear Prediction Order to Modify Formant Locations for Children Speech Recognition

Udara Laxman Kumar*, Mikko Kurimo, Hemant Kumar Kathania

*Corresponding author for this work

Research output: Article in book/conference proceedings; Conference article in proceedings; Scientific; peer-reviewed

Abstract

Children’s speech recognition performs poorly compared with adult speech recognition. Neural network models need large amounts of data to achieve good performance, but very little children’s speech data is publicly available. A baseline system was developed using adult speech for training and children’s speech for testing. Such a system suffers from mismatches between the training and testing speech data. To address one of these mismatches, the difference in formant frequency locations between adults and children, this paper explores the effect of linear prediction order on modifying the formant frequency locations. The method was studied for narrowband and wideband speech and was found to reduce word error rate (WER) for GMM-HMM, DNN-HMM, and TDNN acoustic models. The TDNN acoustic model gives the best performance of the three. For the TDNN acoustic model, the best formant modification factor α is 0.1 with linear prediction order 6 for narrowband speech (WER 13.82%) and 0.1 with linear prediction order 20 for wideband speech (WER 12.19%). The method was also compared with vocal tract length normalization (VTLN) and speaking rate adaptation (SRA), and it gives a larger reduction in WER than either.

Original language: English
Title of host publication: Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings
Editors: Alexey Karpov, K. Samudravijaya, K. T. Deepak, Rajesh M. Hegde, S. R. Mahadeva Prasanna, Shyam S. Agrawal
Publisher: Springer
Pages: 483-493
Number of pages: 11
ISBN (print): 978-3-031-48308-0
DOI - permanent links
State: Published - 2023
MoE publication type: A4 Article in conference proceedings
Event: International Conference on Speech and Computer - Dharwad, India
Duration: 29 Nov 2023 to 2 Dec 2023
Conference number: 25

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14338 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: International Conference on Speech and Computer
Abbreviated title: SPECOM
Country/Territory: India
City: Dharwad
Period: 29/11/2023 to 02/12/2023
