Exploration of temporal dynamics of frequency domain linear prediction cepstral coefficients for dialect classification

Rashmi Kethireddy*, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty

*Corresponding author for this work

Research output: Journal article, Article, Scientific, peer-reviewed

Abstract

Speakers exhibit dialectal traits in speech at sub-segmental, segmental, and supra-segmental levels. Any feature representation for dialect classification should appropriately represent these dialectal traits. Traditional segmental features such as mel-frequency cepstral coefficients (MFCCs) fail to represent sub-segmental and supra-segmental dialectal traits. This study proposes to use frequency domain linear prediction cepstral coefficients (FDLPCCs) for dialect classification, motivated by their long temporal summarization during pole estimation. The i-vectors and x-vectors derived from both the baseline features (MFCCs, linear prediction cepstral coefficients (LPCCs), perceptual LPCCs (PLPCCs), and RASTA-filtered PLPCCs (PLPCC-Rs)) and the proposed FDLPCC features are used to identify dialects, with a support vector machine (SVM) and a feed-forward neural network (FFNN) as classifiers. The proposed FDLPCC features are shown to outperform baseline features such as MFCCs and PLPCC-Rs (the best among the LPCC variants), with absolute improvements in unweighted average recall (UAR) of 3.4% and 3.9%, respectively, with the i-vector + SVM system, and of 1.6% and 4.6%, respectively, with the i-vector + FFNN system. The proposed and baseline features are also found to carry complementary information. Furthermore, the results of this study are compared with those of previous studies and are found to be better.

Original language: English
Article number: 108553
Journal: Applied Acoustics
Volume: 188
Status: Published - Jan 2022
MoE publication type: A1 Journal article, refereed