Learning Filterbanks from Raw Waveform for Accent Classification

Rashmi Kethireddy, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review



Most speech applications use mel-frequency spectral coefficients (MFSC) as features because they match the human perceptual mechanism, with emphasis given to vocal tract characteristics. In accent classification, however, a mel-scale distribution of filters may not always give the best representation, e.g., for pitch-accented languages, where vocal source information should be emphasized as well. Motivated by this, we classify accents end to end, directly from waveforms, which reduces the effort of designing features specific to each corpus. The convolutional neural network (CNN) architecture is designed so that the initial layers operate similarly to MFSC extraction: their weights are initialized with a time-domain approximation of MFSC. The entire network, including the initial layers, is then trained for accent classification. Compared with a CNN trained on hand-engineered features, learning directly from the waveform improved accent classification performance by 10.94% UAR (unweighted average recall) on the test set of the Common Voice corpus. Analyzing the filters after learning, we observed changes in the distribution of their center frequencies and in their bandwidths. We further observed the importance of appropriately initializing the CNN filters.
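The initialization idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the time-domain approximation of MFSC is built, so everything below is an assumption made for illustration: complex Gabor kernels (a Gaussian envelope times a complex sinusoid), center frequencies spaced evenly on the mel scale, bandwidths matched to the full width at half maximum of the usual triangular mel filters, and the hypothetical parameter values `n_filters=40`, `kernel_size=401`, `sample_rate=16000`. The function name `mel_gabor_filters` is likewise invented here.

```python
import numpy as np


def hz_to_mel(f):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_gabor_filters(n_filters=40, kernel_size=401, sample_rate=16000,
                      f_min=60.0, f_max=7800.0):
    """Hypothetical helper: mel-spaced complex Gabor kernels.

    Returns (centers_hz, kernels), where kernels has shape
    (n_filters, kernel_size). All parameter values are illustrative
    assumptions, not values taken from the paper.
    """
    # n_filters + 2 points evenly spaced in mel; interior points are centers.
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    centers = hz_points[1:-1]

    # A triangular mel filter spans hz_points[i]..hz_points[i + 2];
    # its full width at half maximum is half of that base width.
    fwhm = (hz_points[2:] - hz_points[:-2]) / 2.0

    # Gaussian with the same FWHM in frequency, then its time-domain std.
    sigma_f = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    sigma_t = 1.0 / (2.0 * np.pi * sigma_f)

    # Time axis in seconds, centered on the middle tap.
    t = (np.arange(kernel_size) - kernel_size // 2) / sample_rate
    envelope = np.exp(-0.5 * (t[None, :] / sigma_t[:, None]) ** 2)
    carrier = np.exp(2j * np.pi * centers[:, None] * t[None, :])
    kernels = envelope * carrier

    # Unit L2 norm per filter so that all channels start at a similar scale.
    kernels /= np.linalg.norm(kernels, axis=1, keepdims=True)
    return centers, kernels


centers, kernels = mel_gabor_filters()

# Check where each kernel's magnitude response peaks.
n_fft = 8192
spectrum = np.abs(np.fft.fft(kernels, n=n_fft, axis=1))
freqs = np.fft.fftfreq(n_fft, d=1.0 / 16000)
peak_hz = freqs[np.argmax(spectrum, axis=1)]
```

In a learnable front end of this kind, the real and imaginary parts of `kernels` would typically be copied into the weights of a first 1-D convolution layer and then updated by backpropagation together with the rest of the network. Note that with these assumed values the lowest-frequency envelopes are wider than the 401-tap kernel and get truncated, so a longer kernel may be preferable in practice.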

Original language: English
Title of host publication: Proceedings of the 2020 International Joint Conference on Neural Networks, IJCNN 2020
Number of pages: 6
ISBN (Electronic): 9781728169262
Publication status: Published - Jul 2020
MoE publication type: A4 Conference publication
Event: International Joint Conference on Neural Networks - Glasgow, United Kingdom
Duration: 19 Jul 2020 to 24 Jul 2020

Publication series

Name: Proceedings of International Joint Conference on Neural Networks
ISSN (Print): 2161-4393
ISSN (Electronic): 2161-4407


Conference: International Joint Conference on Neural Networks
Abbreviated title: IJCNN
Country/Territory: United Kingdom


  • Accent classification
  • Convolution neural network
  • First order scattering transform
  • Raw waveform

