Abstract
In many applications (e.g. VoIP, Bluetooth headsets), speech is streamed at sample rates of 16 kHz or lower. Extending its bandwidth can produce significant quality improvements. We introduce BBWEXNet, a lightweight neural network that performs blind bandwidth extension of speech from 16 kHz (wideband) to 48 kHz (fullband) in real time on a CPU. Our low-latency approach allows running the model with a maximum algorithmic delay of 16 ms, enabling end-to-end communication in streaming services and in scenarios where the GPU is busy or unavailable. We propose a series of optimizations that take advantage of the U-Net architecture and of vector quantization methods commonly used in speech coding to produce a model whose performance is comparable to previous real-time solutions while approximately halving the memory footprint and computational cost. Moreover, we show that the model complexity can be further reduced with only a marginal impact on the perceived output quality.
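
For orientation, the figures quoted in the abstract can be turned into sample counts. The sketch below is illustrative arithmetic only and is not taken from the BBWEXNet implementation; the helper `samples_in` and all variable names are hypothetical. It assumes nothing beyond the 16 kHz input rate, the 48 kHz output rate, and the 16 ms maximum algorithmic delay stated above.

```python
# Illustrative arithmetic only; not part of the BBWEXNet implementation.
INPUT_RATE_HZ = 16_000   # wideband input rate (from the abstract)
OUTPUT_RATE_HZ = 48_000  # fullband output rate (from the abstract)
MAX_DELAY_MS = 16        # maximum algorithmic delay (from the abstract)


def samples_in(duration_ms: int, rate_hz: int) -> int:
    """Number of samples spanned by duration_ms at rate_hz (hypothetical helper)."""
    return duration_ms * rate_hz // 1000


# Ratio between output and input sample rates.
upsampling_factor = OUTPUT_RATE_HZ // INPUT_RATE_HZ

# The 16 ms delay budget expressed in samples at each rate.
delay_in_samples = samples_in(MAX_DELAY_MS, INPUT_RATE_HZ)
delay_out_samples = samples_in(MAX_DELAY_MS, OUTPUT_RATE_HZ)

# Highest representable frequency (Nyquist limit) at each rate.
input_nyquist_hz = INPUT_RATE_HZ // 2
output_nyquist_hz = OUTPUT_RATE_HZ // 2

print(f"upsampling factor: {upsampling_factor}")
print(f"delay budget: {delay_in_samples} input samples, {delay_out_samples} output samples")
print(f"band to be generated: {input_nyquist_hz} Hz to {output_nyquist_hz} Hz")
```

Under these assumptions, the extension corresponds to a threefold increase in sample rate, and the frequency content between the two Nyquist limits is what a blind bandwidth extension model has to generate, since it is absent from the wideband input.
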
Original language | English |
---|---|
Title of host publication | 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings |
Publisher | European Association For Signal and Image Processing |
Pages | 31-35 |
Number of pages | 5 |
ISBN (Electronic) | 978-94-645936-0-0 |
DOIs | |
Publication status | Published - 4 Sept 2023 |
MoE publication type | A4 Conference publication |
Event | European Signal Processing Conference, Helsinki, Finland. Duration: 4 Sept 2023 → 8 Sept 2023. Conference number: 31. https://eusipco2023.org/ |
Publication series
Name | European Signal Processing Conference |
---|---|
ISSN (Electronic) | 2076-1465 |
Conference
Conference | European Signal Processing Conference |
---|---|
Abbreviated title | EUSIPCO |
Country/Territory | Finland |
City | Helsinki |
Period | 04/09/2023 → 08/09/2023 |
Internet address |
Keywords
- bandwidth extension
- speech processing
- real-time
- deep learning