An 8.62μW 75dB-DR End-to-End Spoken-Language-Understanding SoC with Channel-Level AGC and Temporal-Sparsity-Aware Streaming-Mode RNN

Sheng Zhou, Zixiao Li, Tobi Delbruck, Kwantae Kim, Shih-Chii Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

1 Citation (Scopus)

Abstract

Voice-controlled IoT nodes and wearable devices require integrated real-time ultra-low-power audio classification circuits to perform tasks such as Keyword Spotting (KWS) and Spoken Language Understanding (SLU). In conventional ADC+DSP implementations [1]-[2], the analog front-end (AFE) and digital feature extractor (FEx) together account for >50% of the system power. Analog FEx designs [3]-[9] reduce power by converting the analog signal directly to features. Voltage-domain FEx [3]-[6] achieved <0.5μW power but demonstrated KWS with only <6 classes. Time-domain FEx [7]-[9] achieved 86%-to-91.5% KWS accuracy with 10-to-12 classes but required amplitude-normalized inputs or a costly off-chip classifier. In addition, prior designs [1]-[10] were limited to single-word audio inputs and did not address the continuous speech inputs required by SLU. Real-world operation also requires >60dB of input dynamic range to cope with variations in speech volume [11] and in speaker distance from the microphone.
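
The two techniques named in the title lend themselves to brief illustration. Channel-level AGC addresses the >60dB input-range requirement by letting each filterbank channel pick its own gain so that its envelope stays within the feature extractor's usable range. The NumPy sketch below is a minimal illustration under assumed parameters; the gain ladder, target level, and the name agc_select_gain are hypothetical and not taken from the paper.

    import numpy as np

    # Hypothetical per-channel AGC gain selection. The gain ladder and
    # target level are illustrative assumptions, not the paper's values.
    GAIN_STEPS_DB = np.array([0, 12, 24, 36, 48, 60])

    def agc_select_gain(envelope, target=0.5, gains_db=GAIN_STEPS_DB):
        """For each channel, pick the largest gain that keeps the amplified
        envelope at or below the target, extending the usable input DR."""
        gains_lin = 10.0 ** (gains_db / 20.0)
        amp = envelope[:, None] * gains_lin[None, :]   # channels x gains
        n_ok = (amp <= target).sum(axis=1)             # gains sorted ascending
        idx = np.clip(n_ok - 1, 0, None)               # largest valid, else 0dB
        return gains_db[idx]

A temporal-sparsity-aware RNN, in the spirit of delta networks, skips the multiply-accumulates for any input channel whose value has barely changed since it was last propagated; for slowly varying audio features, most channels are skipped at most timesteps. The single-cell step below is again a sketch under assumptions (the threshold DELTA_TH and the plain tanh cell are placeholders), not the paper's implementation.

    DELTA_TH = 0.05  # assumed delta threshold, not from the paper

    def delta_rnn_step(x, x_ref, m, h, W_x, W_h, b):
        """One delta-network step: only channels whose change since the last
        propagated value exceeds DELTA_TH contribute MACs. m memoizes the
        running W_x @ x_ref product so skipped work is never redone."""
        dx = x - x_ref
        mask = np.abs(dx) >= DELTA_TH
        m = m + W_x @ np.where(mask, dx, 0.0)  # incremental input projection
        h = np.tanh(m + W_h @ h + b)           # toy cell; real designs also
                                               # threshold the hidden deltas
        x_ref = np.where(mask, x, x_ref)       # advance ref only where updated
        return x_ref, m, h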

Original language: English
Title of host publication: 2025 IEEE International Solid-State Circuits Conference, ISSCC 2025
Publisher: IEEE
Pages: 238-240
Number of pages: 3
ISBN (Electronic): 979-8-3315-4101-9
DOIs
Publication status: Published - 2025
MoE publication type: A4 Conference publication
Event: IEEE International Solid-State Circuits Conference - San Francisco, United States
Duration: 16 Feb 2025 – 20 Feb 2025

Conference

Conference: IEEE International Solid-State Circuits Conference
Country/Territory: United States
City: San Francisco
Period: 16/02/2025 – 20/02/2025
