Source modeling is an efficient tool in speech and audio coding, yet it has been used less extensively in enhancement applications. Transferring speech source models from coding to enhancement has been difficult because the models are based on linear prediction, which is non-linear in the frequency domain. In this paper we propose a speech source model based on a distribution quantizer, which quantifies the coarse shape of the spectral envelope. The spectral envelope is thus described by a set of parameters whose probability distributions have a simple form. Using these probability distributions, the source parameters are estimated by maximum likelihood from a single-channel noisy observation. Our experiments show that the proposed method is able to track the signal-to-noise ratio with good accuracy. In addition, although trained only on English items, our method also gave relatively good results for German items, which demonstrates the robustness of the estimated source models.
|Title||Speech Communication; 12. ITG Symposium|
|Status||Published - 2016|
|MEC publication type||A4 Article in conference proceedings|
|Event||ITG Symposium on Speech Communication - Paderborn, Germany|
Duration: 5 October 2016 → 7 October 2016
|Conference||ITG Symposium on Speech Communication|
|Period||05/10/2016 → 07/10/2016|