Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review


Abstract

Speech is streamed at 16 kHz or lower sample rates in many applications (e.g., VoIP, Bluetooth headsets). Extending its bandwidth can produce significant quality improvements. We introduce BBWEXNet, a lightweight neural network that performs blind bandwidth extension of speech from 16 kHz (wideband) to 48 kHz (fullband) in real time on a CPU. Our low-latency approach allows running the model with a maximum algorithmic delay of 16 ms, enabling end-to-end communication in streaming services and in scenarios where the GPU is busy or unavailable. We propose a series of optimizations that take advantage of the U-Net architecture and of vector quantization methods commonly used in speech coding, to produce a model whose performance is comparable to previous real-time solutions while approximately halving the memory footprint and computational cost. Moreover, we show that the model complexity can be further reduced with a marginal impact on the perceived output quality.
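
To make the figures in the abstract concrete, the short sketch below works out what the stated sample rates and delay bound imply in samples and in representable bandwidth. Only the three constants come from the abstract; the helper names are hypothetical, and the sketch does not describe BBWEXNet's actual framing or buffering, which this record does not specify.

```python
# Illustrative arithmetic only; not taken from the paper's implementation.
INPUT_RATE_HZ = 16_000    # wideband input rate (from the abstract)
OUTPUT_RATE_HZ = 48_000   # fullband output rate (from the abstract)
MAX_DELAY_MS = 16         # maximum algorithmic delay (from the abstract)


def samples_in(duration_ms: int, rate_hz: int) -> int:
    """Number of samples spanned by duration_ms at rate_hz (hypothetical helper)."""
    return duration_ms * rate_hz // 1000


def nyquist_hz(rate_hz: int) -> int:
    """Highest frequency representable at rate_hz (hypothetical helper)."""
    return rate_hz // 2


# Ratio between output and input sample rates.
upsampling_factor = OUTPUT_RATE_HZ // INPUT_RATE_HZ

# The delay bound expressed in samples at each rate.
delay_in_samples = samples_in(MAX_DELAY_MS, INPUT_RATE_HZ)
delay_out_samples = samples_in(MAX_DELAY_MS, OUTPUT_RATE_HZ)

# Band that is absent from the input and has to be generated by the model.
missing_band_hz = (nyquist_hz(INPUT_RATE_HZ), nyquist_hz(OUTPUT_RATE_HZ))
```

Under these assumptions the output carries three times as many samples as the input, and the content between the two Nyquist limits is what a blind bandwidth extension model has to synthesize without side information.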
Original language: English
Title of host publication: 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings
Publisher: European Association For Signal and Image Processing
Pages: 31-35
Number of pages: 5
ISBN (Electronic): 978-94-645936-0-0
Publication status: Published - 4 Sept 2023
MoE publication type: A4 Conference publication
Event: European Signal Processing Conference - Helsinki, Finland
Duration: 4 Sept 2023 to 8 Sept 2023
Conference number: 31
https://eusipco2023.org/

Publication series

Name: European Signal Processing Conference
ISSN (Electronic): 2076-1465

Conference

Conference: European Signal Processing Conference
Abbreviated title: EUSIPCO
Country/Territory: Finland
City: Helsinki
Period: 04/09/2023 to 08/09/2023

Keywords

  • bandwidth extension
  • speech processing
  • real-time
  • deep learning
