Learning Filterbanks from Raw Waveform for Accent Classification

Rashmi Kethireddy, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty

Research output: Chapter in book/conference proceedings › Conference contribution › Scientific › peer-reviewed

4 Citations (Scopus)
130 Downloads (Pure)


Most speech applications use mel-frequency spectral coefficients (MFSC) as features because they match the human perceptual mechanism, with emphasis on vocal tract characteristics. In accent classification, however, a mel-scale distribution of filters may not always be the best representation, e.g., for pitch-accented languages, where vocal source information should be emphasized as well. Motivated by this, we classify accents end-to-end directly from waveforms, which reduces the effort of designing features specific to each corpus. The convolutional neural network (CNN) architecture is designed so that the initial layers operate similarly to MFSC extraction: their weights are initialized with a time-domain approximation of MFSC. The entire network, including the initial layers, is then trained for accent classification. Learning directly from the waveform improved accent classification performance by 10.94% in unweighted average recall (UAR) on the test set of the Common Voice corpus, compared with a CNN trained on hand-engineered features. Analyzing the learned filters, we observed changes in the distribution of center frequencies and in the bandwidths. We further observed the importance of appropriately initializing the CNN filters.
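The initialization idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact form of the time-domain approximation, so the code below assumes Gabor-style band-pass filters (a Gaussian envelope times a cosine carrier) with mel-spaced center frequencies. The function name `mel_gabor_init`, the filter count, the kernel length, the sample rate, and the bandwidth-to-envelope heuristic are all hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common HTK-style mel mapping.
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_gabor_init(n_filters=40, kernel_size=401, sample_rate=16000,
                   f_min=0.0, f_max=None):
    """Return (filters, centers_hz): band-pass FIR kernels on a mel grid.

    Assumption: each filter is a Gaussian-windowed cosine. This is one
    possible time-domain stand-in for a mel filterbank, not necessarily
    the one used in the paper.
    """
    if f_max is None:
        f_max = sample_rate / 2.0
    # n_filters + 2 mel-spaced edges; filter i spans edges[i]..edges[i + 2].
    edges_hz = mel_to_hz(
        np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2))
    centers_hz = edges_hz[1:-1]
    bandwidths_hz = edges_hz[2:] - edges_hz[:-2]

    # Time axis in seconds, centered on the middle tap.
    t = (np.arange(kernel_size) - kernel_size // 2) / sample_rate
    # Heuristic: a wider band gives a shorter envelope in time.
    sigma_t = 1.0 / (np.pi * bandwidths_hz)
    envelope = np.exp(-0.5 * (t[None, :] / sigma_t[:, None]) ** 2)
    carrier = np.cos(2.0 * np.pi * centers_hz[:, None] * t[None, :])

    filters = envelope * carrier
    # Unit L2 norm per filter so all channels start at a comparable scale.
    filters /= np.linalg.norm(filters, axis=1, keepdims=True)
    return filters, centers_hz

filters, centers_hz = mel_gabor_init()
print(filters.shape)  # (40, 401)
```

In a trainable setup, each row of `filters` would be copied into the weights of a first 1-D convolution layer, which is then updated together with the rest of the network. Comparing the magnitude spectra of the rows before and after training is one way to inspect shifts in center frequency and bandwidth of the kind the abstract reports.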

Title: Proceedings of the 2020 International Joint Conference on Neural Networks, IJCNN 2020
ISBN (electronic): 9781728169262
DOI - permanent links
Status: Published - Jul 2020
MoE publication type: A4 Article in conference proceedings
Event: International Joint Conference on Neural Networks - Glasgow, United Kingdom
Duration: 19 Jul 2020 to 24 Jul 2020


Name: Proceedings of International Joint Conference on Neural Networks
ISSN (print): 2161-4393
ISSN (electronic): 2161-4407


Conference: International Joint Conference on Neural Networks

