Improved subword modeling for WFST-based speech recognition
Research output: Scientific - peer-review › Conference contribution
Details
Original language | English |
---|---|
Title of host publication | Proceedings of Interspeech 2017 |
Publisher | International Speech Communication Association |
Pages | 2551-2555 |
Number of pages | 5 |
State | Published - Aug 2017 |
MoE publication type | A4 Article in a conference publication |
Event | INTERSPEECH - Duration: 1 Jan 1900 → … |
Publication series
Name | Interspeech: Annual Conference of the International Speech Communication Association |
---|---|
ISSN (Electronic) | 1990-9772 |
Conference
Conference | INTERSPEECH |
---|---|
Period | 01/01/1900 → … |
Abstract
Because the number of observed word forms in agglutinative languages is very high, subword units are often used in speech recognition. However, using subword units properly requires careful attention to details such as silence modeling, position-dependent phones, and the combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating a modified lexicon with finite-state transducers so that the subword units are represented correctly. We experiment with multiple types of word boundary markers and achieve the best results by adding a marker to the left or right side of a subword unit whenever it is not preceded or followed by a word boundary, respectively. We also compare three different toolkits that provide data-driven subword segmentations. In our experiments on a variety of Finnish and Estonian datasets, the best subword models outperform both word-based models and naive subword implementations.
The largest relative reduction in word error rate (WER) is 23% over word-based models, obtained on a Finnish read speech dataset. The results are also better than any previously published for the same datasets, and the improvement on every dataset is more than 5%.
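The best-performing marking scheme described in the abstract can be illustrated with a minimal sketch. It is not the paper's Kaldi or finite-state transducer implementation. The marker symbol `+`, the function names `mark_subwords` and `join_subwords`, and the example segmentation are all assumptions made for illustration, and the sketch assumes that subword units never begin or end with the marker themselves.

```python
from typing import List

MARKER = "+"  # assumed marker symbol; the abstract does not name one


def mark_subwords(words: List[List[str]], marker: str = MARKER) -> List[str]:
    """Flatten segmented words into one sequence of marked subword units.

    A unit gets a marker on its left side when it is not preceded by a
    word boundary, and on its right side when it is not followed by one.
    """
    marked = []
    for pieces in words:
        last = len(pieces) - 1
        for i, piece in enumerate(pieces):
            left = marker if i > 0 else ""       # not word-initial
            right = marker if i < last else ""   # not word-final
            marked.append(f"{left}{piece}{right}")
    return marked


def join_subwords(tokens: List[str], marker: str = MARKER) -> List[str]:
    """Rebuild words from marked units (inverse of mark_subwords)."""
    words: List[str] = []
    for token in tokens:
        attach_left = token.startswith(marker)
        core = token[len(marker):] if attach_left else token
        if core.endswith(marker):
            core = core[:-len(marker)]
        if attach_left and words:
            words[-1] += core   # continue the current word
        else:
            words.append(core)  # a word boundary precedes this unit
    return words


# Hypothetical segmentation of two Finnish words, for illustration only.
segmented = [["talo", "i", "ssa"], ["on"]]
units = mark_subwords(segmented)
print(units)
print(join_subwords(units))
```

Because every unit carries its own boundary information, a recognized unit sequence can be turned back into words without a separate word boundary token.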
- speech recognition, Kaldi, subword modeling, Finnish, Estonian
Research areas
ID: 13147108