Postfilters are used in speech and audio codecs to improve the quality of the decoded signal. Recent studies have shown that postfilters based on speech models can considerably improve the output quality. In our previous work, we proposed a postfilter for frequency-domain speech coding that took advantage of the time-frequency correlation inherent in the speech magnitude spectrum. However, while that approach showed substantial improvement in output speech quality, it operates in the STFT domain, whereas state-of-the-art transform-domain coding is implemented in the modified discrete cosine transform (MDCT) domain. As a result, the decoded signal must first be transformed back to the time domain and then to the STFT domain before the postfilter can be applied, which increases algorithmic delay and complexity. In this work, we adapt the context-based postfiltering method to the MDCT domain, so that the postfilter can be applied directly to the decoded signal within the codec framework. However, we observe that the gains obtained from postfiltering in the MDCT domain are only half of those obtained in the STFT domain. Further analysis indicates that the correlation between coefficients in the MDCT domain is lower than in the STFT domain. Hence, the modelling methods employed in the STFT-domain postfilter are not directly applicable to a postfilter operating on the MDCT. To improve performance, we propose to jointly model a pair of MDCT coefficients as a complex-valued pseudo-spectrum. The proposed approach shows an average perceptual SNR (PSNR) improvement of 1.5 dB over a plain MDCT approach, which is similar to the quality obtained with an STFT-domain postfilter.
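The abstract does not specify how the MDCT coefficients are paired, so the following Python sketch is only an illustration under stated assumptions. It computes a direct (O(N²)) MDCT of one sine-windowed frame, then forms a pseudo-spectrum by treating adjacent bins 2m and 2m+1 as the real and imaginary parts of a single complex value. This adjacent-bin pairing is a hypothetical choice, not necessarily the one used in this work. It also includes a plain normalized correlation between neighbouring values, which could be used to compare how strongly MDCT magnitudes and pseudo-spectrum magnitudes are correlated. The function names `sine_window`, `mdct`, `pseudo_spectrum` and `adjacent_correlation` are hypothetical.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """Direct O(N^2) MDCT of one windowed frame of 2N samples -> N coefficients."""
    two_n = len(frame)
    if two_n == 0 or two_n % 2:
        raise ValueError("frame length must be a positive even number (2N)")
    n_half = two_n // 2
    w = sine_window(two_n)
    coeffs = []
    for k in range(n_half):
        acc = 0.0
        for n in range(two_n):
            # Standard MDCT kernel: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
            acc += frame[n] * w[n] * math.cos(
                math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5)
            )
        coeffs.append(acc)
    return coeffs


def pseudo_spectrum(coeffs):
    """Hypothetical pairing: bins (2m, 2m+1) become real/imag parts of one value."""
    if len(coeffs) % 2:
        raise ValueError("need an even number of MDCT coefficients")
    return [complex(coeffs[2 * m], coeffs[2 * m + 1]) for m in range(len(coeffs) // 2)]


def adjacent_correlation(values):
    """Normalized (Pearson) correlation between neighbouring entries of a sequence."""
    a, b = values[:-1], values[1:]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
    X = mdct(frame)                   # 32 real MDCT coefficients
    Z = pseudo_spectrum(X)            # 16 complex pseudo-spectrum values
    print(adjacent_correlation([abs(c) for c in X]))
    print(adjacent_correlation([abs(z) for z in Z]))
```

With this pairing, the squared magnitudes of the pseudo-spectrum sum to the same energy as the underlying MDCT coefficients, so a model of |Z| still accounts for all of the frame's energy. A practical codec would use an FFT-based MDCT rather than the direct double loop, which is kept here only for readability.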
|Status||Published - 2020|
|Event||Conference on Electronic Processing of Speech Signals - University of Magdeburg, Magdeburg, Germany|
Duration: 4 March 2020 → 6 March 2020
|Conference||Conference on Electronic Processing of Speech Signals|
|Period||04/03/2020 → 06/03/2020|