Character-based units for Unlimited Vocabulary Continuous Speech Recognition

Research output: Article in book/conference proceedings, peer-reviewed

Description

We study character-based language models in a state-of-the-art speech recognition framework. This approach has advantages over both word-based systems and so-called end-to-end ASR systems that do not have separate acoustic and language models. We describe the modifications needed to build an effective character-based ASR system using the Kaldi toolkit and evaluate models based on words, statistical morphs, and characters for both Finnish and Arabic. The morph-based models yield the best recognition results for both well-resourced and lower-resourced tasks, but the character-based models come close to their performance in the lower-resourced tasks, outperforming the word-based models. Character-based models are especially good at predicting novel word forms that were not seen in the training data. Using character-based neural network language models is computationally efficient and yields a larger gain compared to the morph-based and word-based systems.

Details

Original language: English
Title: Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on
Status: Published - 2018
MoE publication type: A4 Article in conference proceedings
Event: IEEE Automatic Speech Recognition and Understanding Workshop - Okinawa, Japan
Duration: 16 December 2017 to 20 December 2017
https://asru2017.org/

Workshop

Workshop: IEEE Automatic Speech Recognition and Understanding Workshop
Abbreviated title: ASRU
Country: Japan
City: Okinawa
Period: 16/12/2017 to 20/12/2017

ID: 14903665