Topic Identification for Spontaneous Speech: Enriching Audio Features with Embedded Linguistic Information

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review

1 Citation (Scopus)

Abstract

Traditional topic identification solutions from audio rely on an automatic speech recognition (ASR) system to produce transcripts that are used as input to a text-based model. These approaches work well in high-resource scenarios, where there are sufficient data to train both components of the pipeline. However, in low-resource situations, the ASR system, even if available, produces low-quality transcripts, which in turn degrade the text-based classifier. Moreover, spontaneous speech containing hesitations can further degrade the performance of the ASR model. In this paper, we investigate alternatives to the standard text-only solutions by comparing audio-only techniques with hybrid techniques that jointly utilise text and audio features. Evaluation on spontaneous Finnish speech demonstrates that purely audio-based solutions are a viable option when ASR components are not available, while the hybrid multi-modal solutions achieve the best results.
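The difference between the audio-only and hybrid set-ups can be illustrated with a minimal sketch of feature-level (early) fusion. This is not the authors' model: it assumes that frame-level audio features and token-level text embeddings (e.g. derived from an ASR transcript) are already available as plain lists of floats, and it swaps in a simple nearest-centroid classifier purely for illustration. The helper names `mean_pool`, `embed`, `train_centroids` and `predict`, and the toy data, are hypothetical. Passing `text_tokens=None` gives the audio-only variant, mirroring the case where no ASR component is available.

```python
from statistics import fmean
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a sequence of equal-length vectors into a single vector."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    return [fmean(v[d] for v in vectors) for d in range(dim)]


def embed(audio_frames: Sequence[Sequence[float]],
          text_tokens: Optional[Sequence[Sequence[float]]] = None) -> Vector:
    """Utterance-level representation (hypothetical helper).

    Audio-only when text_tokens is None (no ASR available); otherwise
    early fusion by concatenating pooled audio and pooled text features.
    """
    audio_vec = mean_pool(audio_frames)
    if text_tokens is None:
        return audio_vec
    return audio_vec + mean_pool(text_tokens)


def train_centroids(examples: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Stand-in classifier (assumption): one mean vector (centroid) per topic."""
    grouped: Dict[str, List[Vector]] = {}
    for vec, label in zip(examples, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: mean_pool(vecs) for label, vecs in grouped.items()}


def predict(centroids: Dict[str, Vector], vec: Vector) -> str:
    """Return the topic whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


# Toy 2-D features for two hypothetical topics (invented for illustration only).
train = [
    ([[1.0, 0.0], [0.8, 0.2]], [[1.0, 0.0]], "sport"),
    ([[0.0, 1.0], [0.2, 0.8]], [[0.0, 1.0]], "weather"),
]
labels = [label for _, _, label in train]

hybrid = train_centroids([embed(a, t) for a, t, _ in train], labels)
audio_only = train_centroids([embed(a) for a, _, _ in train], labels)

test_audio, test_text = [[0.7, 0.3]], [[0.9, 0.1]]
print(predict(hybrid, embed(test_audio, test_text)))  # sport
print(predict(audio_only, embed(test_audio)))         # sport
```

Concatenation is only the simplest fusion choice; the data above are invented, so the output says nothing about the results reported in the paper.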
Original language: English
Title of host publication: 2023 31st European Signal Processing Conference (EUSIPCO)
Publisher: IEEE
Pages: 396-400
Number of pages: 5
ISBN (Electronic): 978-9-4645-9360-0
ISBN (Print): 979-8-3503-2811-0
Publication status: Published, 4 Sept 2023
MoE publication type: A4 Conference publication
Event: European Signal Processing Conference, Helsinki, Finland
Duration: 4 Sept 2023 to 8 Sept 2023
Conference number: 31
https://eusipco2023.org/

Publication series

Name: European Signal Processing Conference
ISSN (Electronic): 2076-1465

Conference

Conference: European Signal Processing Conference
Abbreviated title: EUSIPCO
Country/Territory: Finland
City: Helsinki
Period: 04/09/2023 to 08/09/2023
