Attention-Based End-To-End Named Entity Recognition From Speech

Research output: Chapter in Book/Report/Conference proceeding, Conference article in proceedings, Scientific, peer-review

4 Citations (Scopus)
213 Downloads (Pure)

Abstract

Named entities are heavily used in spoken language understanding, which takes speech as its input. The standard way of performing named entity recognition from speech is a pipeline of two systems: first, an automatic speech recognition system generates transcripts, and then a named entity recognition system produces named entity tags from those transcripts. In such a pipeline, the automatic speech recognition and named entity recognition systems are trained independently, so the automatic speech recognition component is not optimized for named entity recognition, and vice versa. In this paper, we propose two attention-based approaches for extracting named entities from speech in an end-to-end manner, and both show promising results. We compare the two approaches on Finnish, Swedish, and English data sets, highlighting their strengths and weaknesses.
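The abstract contrasts a two-stage pipeline with end-to-end extraction. One widely used way to let a single speech-to-text model emit entities is to augment the training transcripts with entity tag tokens. The model then learns to output words and entity boundaries in one sequence. This scheme is assumed here purely for illustration and is not necessarily either of the two approaches proposed in this paper. The sketch converts word-level BIO tags into such an augmented target sequence and parses a decoded sequence back into entities. The `<TYPE>`/`</TYPE>` token format and the function names `augment_transcript` and `extract_entities` are hypothetical.

```python
from typing import List, Optional, Tuple


def augment_transcript(words: List[str], tags: List[str]) -> List[str]:
    """Wrap entity spans in hypothetical <TYPE> ... </TYPE> tokens, given BIO tags."""
    out: List[str] = []
    open_type: Optional[str] = None
    for word, tag in zip(words, tags):
        # Close the current span unless this word continues it (I- tag of the same type).
        if open_type is not None and not (tag.startswith("I-") and tag[2:] == open_type):
            out.append(f"</{open_type}>")
            open_type = None
        # Open a new span on B- tags, or on a stray I- tag that has no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and open_type is None):
            open_type = tag[2:]
            out.append(f"<{open_type}>")
        out.append(word)
    if open_type is not None:
        out.append(f"</{open_type}>")
    return out


def extract_entities(tokens: List[str]) -> List[Tuple[str, str]]:
    """Recover (entity_type, entity_text) pairs from an augmented token sequence.

    Assumes tag tokens never collide with ordinary vocabulary tokens. A partial
    span that is never closed (malformed model output) is simply dropped.
    """
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []
    for tok in tokens:
        if tok.startswith("</") and tok.endswith(">"):
            # Closing tag: emit the span using the type of the opening tag.
            if current_type is not None:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        elif tok.startswith("<") and tok.endswith(">"):
            # Opening tag: start a new span.
            current_type, current_words = tok[1:-1], []
        elif current_type is not None:
            current_words.append(tok)
    return entities


if __name__ == "__main__":
    target = augment_transcript(
        ["matti", "lives", "in", "new", "york"],
        ["B-PER", "O", "O", "B-LOC", "I-LOC"],
    )
    print(target)
    print(extract_entities(target))
```

In a pipeline, the tags would come from a separate text-based tagger run on ASR output. In this illustrative end-to-end setup, a single model is trained to produce `target` directly from audio, and `extract_entities` is applied to its decoded output.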
Original language: English
Title of host publication: Text, Speech, and Dialogue - 24th International Conference, TSD 2021, Proceedings
Editors: Kamil Ekštein, František Pártl, Miloslav Konopík
Publisher: Springer
Pages: 469-480
Number of pages: 12
ISBN (Electronic): 978-3-030-83527-9
ISBN (Print): 978-3-030-83526-2
DOIs:
Publication status: Published - 2021
MoE publication type: A4 Conference publication
Event: International Conference on Text, Speech, and Dialogue - Olomouc, Czech Republic
Duration: 6 Sept 2021 to 9 Sept 2021
Conference number: 24

Publication series

Name: Lecture Notes in Computer Science
Volume: 12848
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Text, Speech, and Dialogue
Abbreviated title: TSD
Country/Territory: Czech Republic
City: Olomouc
Period: 06/09/2021 to 09/09/2021
