End-to-End Optimized Multi-Stage Vector Quantization of Spectral Envelopes for Speech and Audio Coding

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

Spectral envelope modeling is an instrumental part of speech and audio codecs, where it enables efficient entropy coding of spectral components. Overall optimization of codecs, including their envelope models, has however been difficult because of the complicated interactions between the different modules of the codec. In this paper, we study an end-to-end optimization methodology that optimizes all modules of a codec jointly, capturing these interactions with a single global loss function. To quantize the spectral envelope parameters at a fixed bitrate, we use multistage vector quantization, which gives high quality while keeping computational complexity low enough for realistic use in embedded devices. The results demonstrate benefits in terms of PESQ and PSNR in comparison to the 3GPP EVS codec as well as our recently proposed PyAWNeS codec.
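
To illustrate the multistage vector quantization mentioned in the abstract, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy codebooks, the greedy stage-by-stage search, and the function names (`nearest`, `msvq_encode`, `msvq_decode`) are assumptions made for illustration, and the end-to-end training of the codebooks is not shown. Each stage quantizes the residual left by the previous stage, so the bitrate is fixed by the codebook sizes.

```python
# Minimal multi-stage vector quantization (MSVQ) sketch, standard library only.
# The codebooks are hypothetical toy values, not trained ones.

def nearest(codebook, vector):
    """Return the index of the codeword closest to `vector` (squared error)."""
    def sq_err(codeword):
        return sum((c - v) ** 2 for c, v in zip(codeword, vector))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))

def msvq_encode(codebooks, vector):
    """Greedy encoding: each stage quantizes the residual of the previous one."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def msvq_decode(codebooks, indices):
    """Reconstruction is the sum of the selected codewords across stages."""
    dim = len(codebooks[0][0])
    output = [0.0] * dim
    for codebook, idx in zip(codebooks, indices):
        output = [o + c for o, c in zip(output, codebook[idx])]
    return output

# Two stages with 4 codewords each: 2 + 2 = 4 bits per vector (fixed bitrate).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],      # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],  # refinement stage
]

envelope = [1.2, 0.9]  # hypothetical 2-dimensional envelope parameter vector
indices = msvq_encode(CODEBOOKS, envelope)
reconstructed = msvq_decode(CODEBOOKS, indices)
print(indices, reconstructed)
```

The greedy search costs one nearest-neighbor lookup per stage, so two 4-entry codebooks cover 16 reconstruction points with only 8 distance evaluations. This kind of saving is why multistage schemes are attractive on embedded devices.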
Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association (ISCA)
Pages: 2728-2732
Number of pages: 5
ISBN (Electronic): 9781713836902
Publication status: Published - Sept 2021
MoE publication type: A4 Conference publication
Event: Interspeech - Brno, Czech Republic
Duration: 30 Aug 2021 – 3 Sept 2021
Conference number: 22

Publication series

Name: Annual Conference of the International Speech Communication Association
ISSN (Print): 1990-9772
ISSN (Electronic): 2308-457X

Conference

Conference: Interspeech
Abbreviated title: INTERSPEECH
Country/Territory: Czech Republic
City: Brno
Period: 30/08/2021 – 03/09/2021
