Beam-search SIEVE for low-memory speech recognition

Martino Ciaperoni, Athanasios Katsamanis, Aristides Gionis, Panagiotis Karras

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review

Abstract

A capacity to recognize speech offline eliminates privacy concerns and the need for an internet connection. Despite efforts to reduce the memory demands of speech recognition systems, these demands remain formidable, and thus popular tools such as Kaldi run best via cloud computing. The key bottleneck arises from the fact that a bedrock of such tools, the Viterbi algorithm, requires memory that grows linearly with utterance length even when contained via beam search. A recent recasting of the Viterbi algorithm, SIEVE, eliminates the path-length factor from space complexity, but at the cost of a significant practical runtime overhead. In this paper, we develop a variant of SIEVE that lessens this runtime overhead via beam search, retains the decoding quality of standard beam search, and avoids its linearly growing memory bottleneck. This space-complexity reduction is orthogonal to decoding quality and complementary to memory savings in model representation and training.
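The memory argument in the abstract can be made concrete with a minimal Python sketch, assuming a toy discrete HMM given as log-probability tables. All names here (beam_viterbi, midpoint_viterbi, _pass, path_score, and the toy parameters) are hypothetical illustrations, not code from the paper or from Kaldi. The first function is ordinary beam-pruned Viterbi: it keeps at most `beam` states per frame, yet still stores one backpointer table per frame, so its memory grows linearly with utterance length. The second is a generic Hirschberg-style midpoint recursion, shown only to illustrate how the path-length factor can be removed from working memory at the price of extra passes; it is not the authors' SIEVE or their beam-search variant.

```python
import math

# Toy decoders over a discrete HMM: log_init[s], log_trans[p][s], log_emit[s][symbol].
# Hypothetical illustration only; not the paper's SIEVE and not Kaldi's decoder.


def beam_viterbi(log_init, log_trans, log_emit, obs, beam):
    """Beam-pruned Viterbi with a per-frame backpointer table.
    Returns (path, stored); stored = backpointers kept, about beam * (T - 1)."""
    K = len(log_init)
    start = {s: log_init[s] + log_emit[s][obs[0]] for s in range(K)}
    scores = dict(sorted(start.items(), key=lambda kv: -kv[1])[:beam])
    backptrs = []  # one dict per frame: this is what grows with utterance length
    for o in obs[1:]:
        new_scores, bp = {}, {}
        for s in range(K):
            prev, best = max(((p, v + log_trans[p][s]) for p, v in scores.items()),
                             key=lambda pv: pv[1])
            new_scores[s] = best + log_emit[s][o]
            bp[s] = prev
        scores = dict(sorted(new_scores.items(), key=lambda kv: -kv[1])[:beam])
        backptrs.append({s: bp[s] for s in scores})  # pointers for survivors only
    state = max(scores, key=scores.get)
    path = [state]
    for frame_bp in reversed(backptrs):  # backtrack through the stored table
        state = frame_bp[state]
        path.append(state)
    return path[::-1], sum(len(frame_bp) for frame_bp in backptrs)


def _pass(log_init, log_trans, log_emit, obs, t0, s0, t1, m):
    """Forward pass over frames t0+1..t1 holding O(K) values: each state's best
    score and the state its best path occupied at frame m (s0=None: start at t0=-1)."""
    K = len(log_init)
    first = obs[t0 + 1]
    if s0 is None:
        scores = [log_init[s] + log_emit[s][first] for s in range(K)]
    else:
        scores = [log_trans[s0][s] + log_emit[s][first] for s in range(K)]
    mid = list(range(K)) if t0 + 1 == m else [None] * K
    for t in range(t0 + 2, t1 + 1):
        new_scores, new_mid = [], []
        for s in range(K):
            p = max(range(K), key=lambda q: scores[q] + log_trans[q][s])
            new_scores.append(scores[p] + log_trans[p][s] + log_emit[s][obs[t]])
            new_mid.append(s if t == m else mid[p])
        scores, mid = new_scores, new_mid
    return scores, mid


def midpoint_viterbi(log_init, log_trans, log_emit, obs):
    """Generic Hirschberg-style divide and conquer (NOT the paper's SIEVE):
    no backpointer table; each pass keeps O(K) values and the path is rebuilt
    by recursing on midpoints, which costs O(log T) levels of repeated passes."""
    T = len(obs)

    def solve(t0, s0, t1, s1):
        # Interior states of the best path from (t0, s0) to (t1, s1).
        if t1 - t0 <= 1:
            return []
        m = (t0 + t1) // 2
        _, mid = _pass(log_init, log_trans, log_emit, obs, t0, s0, t1, m)
        sm = mid[s1]
        return solve(t0, s0, m, sm) + [sm] + solve(m, sm, t1, s1)

    final, _ = _pass(log_init, log_trans, log_emit, obs, -1, None, T - 1, None)
    s_end = max(range(len(final)), key=final.__getitem__)
    return solve(-1, None, T - 1, s_end) + [s_end]


def path_score(log_init, log_trans, log_emit, obs, path):
    score = log_init[path[0]] + log_emit[path[0]][obs[0]]
    for t in range(1, len(obs)):
        score += log_trans[path[t - 1]][path[t]] + log_emit[path[t]][obs[t]]
    return score


# Usage on a made-up 3-state, 2-symbol model.
L = math.log
log_init = [L(0.6), L(0.3), L(0.1)]
log_trans = [[L(0.7), L(0.2), L(0.1)], [L(0.3), L(0.5), L(0.2)], [L(0.2), L(0.3), L(0.5)]]
log_emit = [[L(0.9), L(0.1)], [L(0.4), L(0.6)], [L(0.1), L(0.9)]]
obs = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
exact, stored = beam_viterbi(log_init, log_trans, log_emit, obs, beam=3)
print(exact, stored)
print(midpoint_viterbi(log_init, log_trans, log_emit, obs))
```

With beam=3 every state survives, so the first call is exact Viterbi; its stored count is 3 * (len(obs) - 1) and keeps growing as the utterance lengthens, while midpoint_viterbi holds only per-pass score and midpoint arrays (plus O(log T) recursion frames) and reaches the same path score. Trading memory for repeated passes in this way mirrors, in spirit only, the runtime overhead the abstract attributes to SIEVE.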

Original language: English
Title of host publication: Interspeech 2024
Publisher: International Speech Communication Association (ISCA)
Pages: 272-276
Number of pages: 5
Publication status: Published - 2024
MoE publication type: A4 Conference publication
Event: Interspeech - Kos Island, Greece
Duration: 1 Sept 2024 - 5 Sept 2024

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association (ISCA)
ISSN (Print): 2308-457X

Conference

Conference: Interspeech
Country/Territory: Greece
City: Kos Island
Period: 01/09/2024 - 05/09/2024

Keywords

  • memory efficient algorithms
  • speech recognition
