Finnish ASR with deep transformer models

Abhilash Jain, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review



Recently, BERT and Transformer-XL based architectures have achieved strong results in a range of NLP applications. In this paper, we explore Transformer architectures (BERT and Transformer-XL) as language models for a Finnish ASR task with different rescoring schemes. With Transformer-XL we achieve strong results on both an intrinsic and an extrinsic task, obtaining 29% better perplexity and 3% better WER than our previous best LSTM-based approach. We also introduce a novel three-pass decoding scheme which improves the ASR performance by 8%. To the best of our knowledge, this is also the first work (i) to formulate an alpha smoothing framework to use the non-autoregressive BERT language model for an ASR task, and (ii) to explore sub-word units with Transformer-XL for an agglutinative language like Finnish.
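The rescoring idea summarized above can be illustrated with a minimal sketch: a first-pass decoder produces an n-best list of (hypothesis, score) pairs, and an external language model re-ranks them via a weighted log-linear combination. The function name, the toy LM, and the single interpolation weight `alpha` are hypothetical illustrations, not the paper's exact alpha smoothing formulation, which also handles BERT's non-autoregressive scoring.

```python
# Illustrative two-pass rescoring sketch (not the paper's exact method):
# combine first-pass ASR scores with an external LM score using a single
# interpolation weight alpha. All scores are log-probabilities, higher = better.

def rescore_nbest(hypotheses, lm_score_fn, alpha=0.5):
    """Re-rank (text, first_pass_score) pairs with an external LM.

    combined = (1 - alpha) * first_pass_score + alpha * lm_score.
    Returns the list sorted best-first by the combined score.
    """
    rescored = []
    for text, first_pass in hypotheses:
        combined = (1 - alpha) * first_pass + alpha * lm_score_fn(text)
        rescored.append((text, combined))
    return sorted(rescored, key=lambda x: x[1], reverse=True)

# Toy stand-in LM that simply favours shorter word sequences.
def toy_lm(text):
    return -float(len(text.split()))

nbest = [("talo on iso", -10.0), ("talon iso", -9.5)]
best_text, best_score = rescore_nbest(nbest, toy_lm, alpha=0.3)[0]
print(best_text)  # → talon iso
```

In practice `lm_score_fn` would be a Transformer-XL (autoregressive) or BERT (pseudo-log-likelihood) scorer, and the weight would be tuned on a development set.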

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association (ISCA)
Number of pages: 5
Publication status: Published - 2020
MoE publication type: A4 Conference publication
Event: Interspeech - Shanghai, China
Duration: 25 Oct 2020 – 29 Oct 2020
Conference number: 21

Publication series

Publisher: International Speech Communication Association
ISSN (Print): 2308-457X


Abbreviated title: INTERSPEECH


  • BERT
  • Language modeling
  • Speech recognition
  • Transformer-XL
  • Transformers


