End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework

    Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

    3 Citations (Scopus)
    471 Downloads (Pure)

    Abstract

    Speech coding is the most commonly used application of speech processing. Accumulated layers of improvements have, however, made codecs so complex that optimizing individual modules has become increasingly difficult. This work introduces machine learning methodology to speech and audio coding, such that we can optimize quality in terms of overall entropy. We can then use conventional quantization, coding, and perceptual models without modification, so that the codec adheres to conventional requirements on algorithmic complexity, latency, and robustness to packet loss. Experiments demonstrate that end-to-end optimization of the quantization accuracy of the spectral envelope can be used for a lossless reduction in bitrate of 0.4 kbit/s.
    Original language: English
    Title of host publication: Proceedings of Interspeech
    Publisher: International Speech Communication Association (ISCA)
    Pages: 3401-3405
    Publication status: Published - Sept 2019
    MoE publication type: A4 Conference publication
    Event: Interspeech - Graz, Austria
    Duration: 15 Sept 2019 to 19 Sept 2019
    https://www.interspeech2019.org/

    Publication series

    Name: Interspeech - Annual Conference of the International Speech Communication Association
    ISSN (Electronic): 2308-457X

    Conference

    Conference: Interspeech
    Country/Territory: Austria
    City: Graz
    Period: 15/09/2019 to 19/09/2019

    Keywords

    • speech and audio coding
    • end-to-end optimization
    • speech source modeling
