The accuracy of automatic speech recognizers has improved steadily for decades. Aalto University has developed automatic recognition of Finnish speech and achieved very low error rates on clearly spoken standard Finnish, such as news broadcasts. Recognizing natural conversations is much more challenging. The language spoken in Finnish conversations also differs in many ways from standard Finnish, and recognizing it requires data that has previously been unavailable.
This thesis develops automatic speech recognition for conversational Finnish, starting with the collection of training and evaluation data. For language modeling, large amounts of text are collected from the Internet and filtered to match the colloquial speaking style. An evaluation set is published and used to benchmark progress in conversational Finnish speech recognition. The thesis addresses many difficulties that arise from the very large vocabulary used in Finnish conversations. Using deep neural networks for acoustic modeling and recurrent neural networks for language modeling, the thesis achieves conversational speech recognition accuracy that is already useful in practical applications.
- , Supervisor
- Sami Virpioja, Advisor
|Publication status|Published - 2018|
|MoE publication type|G5 Doctoral dissertation (article)|
- automatic speech recognition, language modeling, word classes, artificial neural networks, data collection