Abstract
Most speech applications use mel-frequency spectral coefficients (MFSC) as features because they match the human perceptual mechanism, with emphasis on vocal tract characteristics. In accent classification, however, a mel-scale distribution of filters may not always be the best representation, e.g., for pitch-accented languages, where vocal source information should be emphasized as well. Motivated by this, we classify accents end-to-end directly from waveforms, which reduces the effort of designing features specific to each corpus. The convolutional neural network (CNN) architecture is designed so that the initial layers operate similarly to MFSC extraction: their weights are initialized with a time-domain approximation of MFSC. The entire network, including the initial layers, is then trained for accent classification. Learning directly from the waveform improved accent classification performance by 10.94% unweighted average recall (UAR) over a CNN trained on hand-engineered features, measured on the test set of the Common Voice corpus. Analyzing the filters after learning, we observed changes in the distribution of center frequencies and in the filter bandwidths. We further observed the importance of appropriately initializing the CNN filters.
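The abstract does not spell out how the first-layer weights are constructed. As a rough illustration of what a time-domain approximation of a mel filterbank could look like, the sketch below builds complex Gabor filters (a Gaussian envelope multiplied by a complex sinusoid) whose center frequencies are evenly spaced on the mel scale. The function name `mel_gabor_filters`, the filter count, the kernel length, the sample rate, and the bandwidth-to-envelope heuristic are all assumptions made for this example, not values taken from the paper.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (common HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_gabor_filters(n_filters=40, kernel_size=401, sample_rate=16000,
                      f_min=0.0, f_max=None):
    """Hypothetical helper: mel-spaced complex Gabor filters in the time domain.

    Returns (centers_hz, real_part, imag_part); both parts have shape
    (n_filters, kernel_size) and could serve as initial 1-D conv kernels.
    """
    if f_max is None:
        f_max = sample_rate / 2.0

    # n_filters + 2 band edges, evenly spaced in mel; filter i spans
    # edges[i]..edges[i + 2] with its center at edges[i + 1].
    edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max),
                                  n_filters + 2))
    centers = edges[1:-1]
    bandwidths = edges[2:] - edges[:-2]

    # Time axis in seconds, centered on the middle tap of the kernel.
    t = (np.arange(kernel_size) - kernel_size // 2) / sample_rate

    # Assumed heuristic: a wider band gets a shorter Gaussian envelope.
    # This choice gives a frequency-domain standard deviation of bandwidth / 2.
    sigma_t = 1.0 / (np.pi * bandwidths)
    envelope = np.exp(-0.5 * (t[None, :] / sigma_t[:, None]) ** 2)

    phase = 2.0 * np.pi * centers[:, None] * t[None, :]
    real = envelope * np.cos(phase)
    imag = envelope * np.sin(phase)

    # Scale each complex filter to unit energy.
    norm = np.sqrt((real ** 2 + imag ** 2).sum(axis=1, keepdims=True))
    return centers, real / norm, imag / norm


centers, real, imag = mel_gabor_filters()
print(real.shape, imag.shape)
```

In an end-to-end model, arrays like these would typically be copied into the kernels of the first 1-D convolution layer and then updated by backpropagation together with the rest of the network, so that center frequencies and bandwidths can drift away from the mel layout during training.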
Original language | English |
---|---|
Title of host publication | Proceedings of the 2020 International Joint Conference on Neural Networks, IJCNN 2020 |
Publisher | IEEE |
Number of pages | 6 |
ISBN (Electronic) | 9781728169262 |
DOIs | |
Publication status | Published - Jul 2020 |
MoE publication type | A4 Conference publication |
Event | International Joint Conference on Neural Networks, Glasgow, United Kingdom (19 Jul 2020 → 24 Jul 2020) |
Publication series
Name | Proceedings of International Joint Conference on Neural Networks |
---|---|
Publisher | IEEE |
ISSN (Print) | 2161-4393 |
ISSN (Electronic) | 2161-4407 |
Conference
Conference | International Joint Conference on Neural Networks |
---|---|
Abbreviated title | IJCNN |
Country/Territory | United Kingdom |
City | Glasgow |
Period | 19/07/2020 → 24/07/2020 |
Keywords
- Accent classification
- Convolution neural network
- First order scattering transform
- Raw waveform
Fingerprint
Dive into the research topics of 'Learning Filterbanks from Raw Waveform for Accent Classification'. Together they form a unique fingerprint.
Projects
- Interdisciplinary research on statistical parametric speech synthesis (Finished)
  Alku, P., Nonavinakere Prabhakera, N., Bollepalli, B., Bäckström, T., Murtola, T., Airaksinen, M. & Juvela, L.
  01/01/2018 → 31/12/2019
  Project: Academy of Finland: Other research funding