TY - GEN
T1 - Beam-search SIEVE for low-memory speech recognition
AU - Ciaperoni, Martino
AU - Katsamanis, Athanasios
AU - Gionis, Aristides
AU - Karras, Panagiotis
N1 - Publisher Copyright:
© 2024 International Speech Communication Association. All rights reserved.
PY - 2024
Y1 - 2024
N2 - A capacity to recognize speech offline eliminates privacy concerns and the need for an internet connection. Despite efforts to reduce the memory demands of speech recognition systems, these demands remain formidable and thus popular tools such as Kaldi run best via cloud computing. The key bottleneck arises from the fact that a bedrock of such tools, the Viterbi algorithm, requires memory that grows linearly with utterance length even when contained via beam search. A recent recasting of the Viterbi algorithm, SIEVE, eliminates the path length factor from space complexity, but with a significant practical runtime overhead. In this paper, we develop a variant of SIEVE that lessens this runtime overhead via beam search, retains the decoding quality of standard beam search, and waives its linearly growing memory bottleneck. This space-complexity reduction is orthogonal to decoding quality and complementary to memory savings in model representation and training.
AB - A capacity to recognize speech offline eliminates privacy concerns and the need for an internet connection. Despite efforts to reduce the memory demands of speech recognition systems, these demands remain formidable and thus popular tools such as Kaldi run best via cloud computing. The key bottleneck arises from the fact that a bedrock of such tools, the Viterbi algorithm, requires memory that grows linearly with utterance length even when contained via beam search. A recent recasting of the Viterbi algorithm, SIEVE, eliminates the path length factor from space complexity, but with a significant practical runtime overhead. In this paper, we develop a variant of SIEVE that lessens this runtime overhead via beam search, retains the decoding quality of standard beam search, and waives its linearly growing memory bottleneck. This space-complexity reduction is orthogonal to decoding quality and complementary to memory savings in model representation and training.
KW - memory efficient algorithms
KW - speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85214840193&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2024-2457
DO - 10.21437/Interspeech.2024-2457
M3 - Conference article in proceedings
AN - SCOPUS:85214840193
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 272
EP - 276
BT - Interspeech 2024
PB - International Speech Communication Association (ISCA)
T2 - Interspeech
Y2 - 1 September 2024 through 5 September 2024
ER -