Model-Based Speech Enhancement in the Modulation Domain
This paper presents algorithms for modulation-domain speech enhancement using a Kalman filter. The algorithms are derived from two alternative statistical models for the speech and noise spectral coefficients. The proposed models incorporate the estimated dynamics of the speech and noise spectral amplitudes into the minimum mean-square error (MMSE) estimate of the clean-speech amplitude spectrum. Both models assume that speech and noise are additive in the complex domain. The two algorithms differ in that the first models only the spectral dynamics of the clean speech, whereas the second jointly models the spectral dynamics of both speech and noise. For the first algorithm, a closed-form estimator is derived under the assumption that the speech amplitudes follow a form of generalized Gamma distribution and the noise amplitudes follow a Gaussian distribution. For the second algorithm, to model the dynamics of the noise amplitudes jointly with those of the speech amplitudes, we propose a statistical "Gaussring" model: a mixture of Gaussians whose centres lie on a circle in the complex plane. The performance of the proposed algorithms is evaluated using the perceptual evaluation of speech quality (PESQ) measure and the segmental SNR, and both are shown to give a consistent improvement over competing algorithms across a wide range of SNRs.
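To make the "Gaussring" idea more concrete, the following is a minimal NumPy sketch, not the paper's implementation. It builds a mixture of equal-weight Gaussians whose centres are spaced uniformly on a circle of radius equal to a given noise-amplitude mean, and evaluates the resulting density over the complex plane. The function names `gaussring` and `gaussring_pdf`, the rule for choosing the number of components (roughly one standard deviation of arc between neighbouring centres), the equal weights, and the isotropic per-component variance are all assumptions made for illustration. In the abstract's description, the amplitude mean and variance would come from the modulation-domain Kalman filter; here they are simply inputs.

```python
import numpy as np


def gaussring(mean_amp, var_amp, n_components=None):
    """Hypothetical Gaussring construction (illustrative assumptions only).

    Returns (weights, centres, var_dim) for a mixture of isotropic 2-D
    Gaussians on the complex plane whose centres lie on a circle of radius
    `mean_amp`. Each component has variance `var_dim = var_amp` per real
    dimension, so the ring's radial spread roughly matches the amplitude
    variance.
    """
    sigma = np.sqrt(var_amp)
    if n_components is None:
        # Assumed rule: about one standard deviation of arc length between
        # neighbouring centres, and at least one component.
        n_components = max(1, int(np.ceil(2 * np.pi * mean_amp / sigma)))
    k = np.arange(n_components)
    # Centres spaced uniformly in phase: this encodes a uniform noise phase.
    centres = mean_amp * np.exp(2j * np.pi * k / n_components)
    weights = np.full(n_components, 1.0 / n_components)
    return weights, centres, var_amp


def gaussring_pdf(z, weights, centres, var_dim):
    """Evaluate the mixture density at complex point(s) z."""
    z = np.asarray(z, dtype=complex)[..., None]   # broadcast against centres
    d2 = np.abs(z - centres) ** 2                 # squared distance to each centre
    comp = np.exp(-d2 / (2 * var_dim)) / (2 * np.pi * var_dim)
    return comp @ weights                          # weighted sum over components


if __name__ == "__main__":
    w, c, v = gaussring(mean_amp=1.0, var_amp=0.01)
    print(len(w), gaussring_pdf(1.0, w, c, v), gaussring_pdf(0.0, w, c, v))
```

Placing equal-weight components uniformly around the circle is one way to express a noise coefficient whose amplitude is concentrated near its predicted mean while its phase is unknown. A plausible motivation for using Gaussian components is that each one can then be combined with a Gaussian-type observation model term by term, though the paper's exact derivation is not reproduced here.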