Mel-weighted Single Frequency Filtering Spectrogram for Dialect Identification

Rashmi Kethireddy, Sudarsana Kadiri, Paavo Alku, S. V. Gangashetty

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

In this study, we propose Mel-weighted single frequency filtering (SFF) spectrograms for dialect identification. The spectrum derived using SFF has high spectral resolution for harmonics and resonances while simultaneously maintaining good time resolution for some speech excitation features, such as impulse-like events. The SFF spectrum can represent speech characteristics such as burst time and glottal closure instants better than the short-time Fourier transform (STFT) spectrum. Our hypothesis is that these intricate representations in the SFF spectrum should help in distinguishing dialects. Therefore, we built a dialect identification system that uses an unsupervised bottleneck feature representation of the Mel-weighted SFF spectrogram (Mel-SFF spectrogram), learned with sequence-to-sequence deep autoencoders. The language invariance of the proposed system was evaluated using two datasets: the UT-Podcast database (English) and the STYRIALECT database (German). On the UT-Podcast database, the proposed representations gave relative improvements of 9.47% and 4.69% in unweighted average recall (UAR) over the best baseline method on the development and test datasets, respectively. For the STYRIALECT database, the proposed representations performed comparably to the best baseline method. In addition, fusing the autoencoder bottleneck features computed from the Mel-SFF and Mel-STFT spectrograms improved the overall performance, indicating that these features carry complementary information. By further analyzing the performance of the proposed representation for different utterance lengths on the UT-Podcast database, we observed that it performed better on short utterances. The improved performance given by the Mel-SFF spectrogram for recognizing dialects in both databases supports our hypothesis.
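To make the SFF idea concrete, the following is a minimal NumPy sketch of how a Mel-weighted SFF spectrogram could be computed. It assumes the SFF formulation commonly described in the literature: for each analysis frequency f_k, the signal is multiplied by a complex exponential that shifts f_k to fs/2 (normalized frequency π), passed through the single-pole filter H(z) = 1 / (1 + r z^-1) whose pole at z = -r lies near π, and the magnitude of the output is taken as the per-sample envelope at f_k. The Mel-weighting step (triangular Mel filters applied across the SFF frequency grid), the squared-envelope averaging over 10 ms hops, the log compression, and the parameter values (r = 0.995, 20 Hz grid, 40 Mel bands) are illustrative assumptions, not the paper's reported settings. The function names `sff_envelopes`, `mel_filterbank`, and `mel_sff_spectrogram` are hypothetical, and the autoencoder stage is not shown.

```python
import numpy as np


def sff_envelopes(x, fs, freqs_hz, r=0.995):
    """SFF amplitude envelopes, one row per analysis frequency (sketch)."""
    x = np.asarray(x, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    n = np.arange(len(x))
    # Shift each analysis frequency f_k to fs/2 (omega = pi).
    shift = np.pi - 2.0 * np.pi * freqs_hz / fs
    xk = x[None, :] * np.exp(1j * shift[:, None] * n[None, :])
    # Single-pole filter H(z) = 1 / (1 + r z^-1): y[n] = x[n] - r * y[n-1],
    # run in parallel for all analysis frequencies.
    y = np.empty_like(xk)
    prev = np.zeros(len(freqs_hz), dtype=complex)
    for i in range(len(x)):
        prev = xk[:, i] - r * prev
        y[:, i] = prev
    return np.abs(y)


def mel_filterbank(freqs_hz, n_mels, fmin, fmax):
    """Triangular Mel filters sampled on an arbitrary linear frequency grid."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = to_hz(np.linspace(to_mel(fmin), to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, len(freqs_hz)))
    for m in range(n_mels):
        lo, c, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs_hz - lo) / (c - lo)
        falling = (hi - freqs_hz) / (hi - c)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mel_sff_spectrogram(x, fs, n_mels=40, df=20.0, hop_s=0.010, r=0.995):
    """Log Mel-weighted SFF spectrogram, shape (n_mels, n_frames) (sketch)."""
    freqs = np.arange(df, fs / 2.0, df)  # linear SFF analysis grid
    env = sff_envelopes(x, fs, freqs, r)
    hop = int(round(hop_s * fs))
    n_frames = env.shape[1] // hop
    # Assumption: average the squared envelope over non-overlapping hops.
    power = (env[:, : n_frames * hop] ** 2).reshape(len(freqs), n_frames, hop).mean(axis=2)
    fb = mel_filterbank(freqs, n_mels, 0.0, fs / 2.0)
    return np.log(fb @ power + 1e-10)
```

For example, `mel_sff_spectrogram(x, 8000, n_mels=24)` on 0.5 s of an 8 kHz signal returns a 24 × 50 array (one column per 10 ms hop). Because the SFF envelope is available at every sample, the hop length here only controls how the envelopes are summarized, not the underlying time resolution; a sequence of such frames would then be the input to the autoencoder.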
Original language: English
Pages: 174871-174879
Number of pages: 9
Journal: IEEE Access
Volume: 8
Early online date: 2020
Status: Published - 2020
OKM publication type: A1 Journal article-refereed
