Abstract
In this paper, we propose a software toolkit for easier end-to-end training of deep-learning-based spoken language identification models across several speech datasets. We apply our toolkit to implement three baseline models, one speaker recognition model, and three x-vector architecture variations, which are trained on three datasets previously used in spoken language identification experiments. All models are trained separately on each dataset (closed task) and on a combination of all datasets (open task), after which we compare whether open-task training yields better language embeddings. We begin by training all models end-to-end as discriminative classifiers of spectral features, labeled by language. Then, we extract language embedding vectors from the trained end-to-end models, train separate Gaussian Naive Bayes classifiers on the vectors, and compare which model provides the best language embeddings for the back-end classifier. Our experiments show that the open task condition leads to improved language identification performance on only one of the datasets. In addition, we found that increasing x-vector model robustness with random frequency channel dropout significantly reduces its end-to-end classification performance on the test set, while not affecting the back-end classification performance of its embeddings. Finally, we note that two baseline models consistently outperformed all other models.
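The back-end step described in the abstract (fitting a Gaussian Naive Bayes classifier on language embedding vectors) can be illustrated with a minimal NumPy sketch. This is not the toolkit's API: the function names `fit_gaussian_nb`, `predict_gaussian_nb`, and `run_demo` are hypothetical, the `var_smoothing` parameter is an assumption added for numerical stability, and the synthetic vectors only stand in for embeddings that would normally be extracted from a trained end-to-end model.

```python
import numpy as np


def fit_gaussian_nb(embeddings, labels, var_smoothing=1e-9):
    """Estimate per-language feature means, variances, and log priors."""
    classes = np.unique(labels)
    means = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    variances = np.stack([embeddings[labels == c].var(axis=0) for c in classes])
    # Assumed stabiliser: add a small fraction of the largest feature variance.
    variances = variances + var_smoothing * embeddings.var(axis=0).max()
    log_priors = np.log(np.array([(labels == c).mean() for c in classes]))
    return classes, means, variances, log_priors


def predict_gaussian_nb(model, embeddings):
    """Return the most probable language label for each embedding vector."""
    classes, means, variances, log_priors = model
    # Differences to every class mean, shape (n_samples, n_classes, n_dims).
    diff = embeddings[:, None, :] - means[None, :, :]
    # Diagonal-Gaussian log likelihood, summed over embedding dimensions.
    log_lik = -0.5 * (
        np.log(2.0 * np.pi * variances)[None, :, :] + diff**2 / variances[None, :, :]
    ).sum(axis=2)
    return classes[np.argmax(log_lik + log_priors[None, :], axis=1)]


def run_demo(seed=0, n_languages=3, dim=16, n_per_language=200):
    """Train and evaluate on synthetic 'embeddings' (placeholder data)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(n_languages, dim))
    labels = np.repeat(np.arange(n_languages), n_per_language)
    embeddings = centers[labels] + rng.normal(size=(labels.size, dim))
    # Even-indexed samples form the training split, odd-indexed the test split.
    train, test = slice(0, None, 2), slice(1, None, 2)
    model = fit_gaussian_nb(embeddings[train], labels[train])
    predictions = predict_gaussian_nb(model, embeddings[test])
    return float((predictions == labels[test]).mean())


if __name__ == "__main__":
    print(f"Back-end accuracy on synthetic embeddings: {run_demo():.3f}")
```

Comparing embedding extractors in this setup would amount to calling `fit_gaussian_nb` once per model on that model's embeddings and comparing the resulting test-set accuracies.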
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publisher | International Speech Communication Association (ISCA) |
Pages | 467-471 |
Number of pages | 5 |
Volume | 2020-October |
Publication status | Published - 2020 |
MoE publication type | A4 Conference publication |
Event | Interspeech, Shanghai, China. Duration: 25 Oct 2020 → 29 Oct 2020. Conference number: 21. http://www.interspeech2020.org/ |
Publication series
Name | Interspeech |
---|---|
Publisher | International Speech Communication Association |
ISSN (Print) | 2308-457X |
Conference
Conference | Interspeech |
---|---|
Abbreviated title | INTERSPEECH |
Country/Territory | China |
City | Shanghai |
Period | 25/10/2020 → 29/10/2020 |
Internet address | http://www.interspeech2020.org/ |
Keywords
- Deep learning
- Language embedding
- Spoken language identification
- TensorFlow
- X-vector
Fingerprint
Dive into the research topics of 'Releasing a toolkit and comparing the performance of language embeddings across various spoken language identification datasets'. Together they form a unique fingerprint.
Projects
- MeMAD: Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy
  Kurimo, M. (Principal investigator)
  27/12/2017 → 31/03/2021
  Project: EU: Framework programmes funding
  Status: Finished