Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings

Aku Rouhe, Tuomas Kaseva, Mikko Kurimo

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review

Abstract

In speaker-aware training, a speaker embedding is appended to DNN input features. This allows the DNN to learn representations that are robust to speaker variability. We apply speaker-aware training to attention-based end-to-end speech recognition and show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data. We apply state-of-the-art embedding approaches: both i-vectors and neural embeddings, such as x-vectors. We experiment with embeddings trained in two conditions: on the fixed ASR data, and on a large untranscribed dataset. We run our experiments on the TED-LIUM and Wall Street Journal datasets. No embedding consistently outperforms all others, but in many settings neural embeddings outperform i-vectors.
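
The core operation named in the abstract, appending a speaker embedding to the input features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `append_speaker_embedding`, the toy dimensions, and the choice of one fixed embedding per utterance copied onto every frame are assumptions made for illustration. The embedding extractor itself (i-vector or x-vector) is not shown, and a real pipeline would use an array library rather than plain lists.

```python
from typing import List, Sequence


def append_speaker_embedding(
    frames: Sequence[Sequence[float]],
    embedding: Sequence[float],
) -> List[List[float]]:
    """Concatenate one speaker embedding onto every acoustic frame.

    frames:    T frames, each a feature vector of dimension D
               (hypothetical input, e.g. filterbank features).
    embedding: a single speaker vector of dimension E, assumed here
               to be constant over the whole utterance.
    Returns T frames of dimension D + E.
    """
    if not frames:
        raise ValueError("expected at least one frame")
    speaker = list(embedding)
    # Each frame gets its own copy of the same speaker vector appended.
    return [list(frame) + speaker for frame in frames]


# Toy example: 2 frames of dimension 3, speaker embedding of dimension 2.
toy_frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
toy_embedding = [1.0, -1.0]
augmented = append_speaker_embedding(toy_frames, toy_embedding)
print(len(augmented), len(augmented[0]))  # prints "2 5"
```

Under these assumptions, the only change the recognizer sees is a wider input: every frame carries the same speaker vector alongside its acoustic features.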

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: IEEE
Pages: 7064-7068
Number of pages: 5
ISBN (Electronic): 9781509066315
Publication status: Published - May 2020
MoE publication type: A4 Article in a conference publication
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Virtual conference, Barcelona, Spain
Duration: 4 May 2020 to 8 May 2020
Conference number: 45

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country/Territory: Spain
City: Barcelona
Period: 04/05/2020 to 08/05/2020
Other: Virtual conference

Keywords

  • end-to-end speech recognition
  • speaker embedding
  • speaker-adaptation
  • speaker-aware training
