End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review



Speech coding is the most commonly used application of speech processing. Accumulated layers of improvements have, however, made codecs so complex that optimizing individual modules has become increasingly difficult. This work introduces machine learning methodology to speech and audio coding, so that we can optimize quality in terms of overall entropy. We can then use conventional quantization, coding, and perceptual models without modification, so that the codec adheres to conventional requirements on algorithmic complexity, latency, and robustness to packet loss. Experiments demonstrate that end-to-end optimization of the quantization accuracy of the spectral envelope can be used for a lossless reduction in bitrate of 0.4 kbit/s.


Original language: English
Title of host publication: Proceedings of Interspeech
Publication status: Published - Sep 2019
MoE publication type: A4 Article in a conference publication
Event: Interspeech - Graz, Austria
Duration: 15 Sep 2019 to 19 Sep 2019

Publication series

Name: Interspeech - Annual Conference of the International Speech Communication Association
ISSN (Electronic): 2308-457X


    Research areas

  • speech and audio coding, end-to-end optimization, speech source modeling

ID: 36888052