In mobile communication systems, conversations are increasingly degraded by
environmental noise superposed on the speech signal. Conventional single-channel
speech enhancement algorithms improve the quality of noisy speech when the noise is
fairly stationary, but they introduce distortions in the enhanced signal. Moreover, they
do not improve intelligibility when the enhanced signal is presented directly to
a human listener. The loss of intelligibility is mostly due to the distortions introduced
by the noise reduction preprocessor. Several speech enhancement algorithms have
been developed over the last two decades, such as the Wiener filter, power spectral
subtraction (Berouti et al., 1979), the modified spectral subtraction method, and the
minimum mean-square error short-time spectral amplitude estimator (Ephraim and Malah,
1984). Improvements are still sought because the existing algorithms assume that the
spectral characteristics of the noise change very slowly compared to those of the speech,
which may not hold in non-stationary environments. There, the noise
characteristics may change appreciably during speech activity, degrading the
performance of the speech enhancement system. The proposed algorithm consists
of a spectral gain and a perceptually motivated weighting function (Anitha Sheela et al.,
2006). Figure 1 illustrates the system for the proposed scheme. The algorithm
includes a short-time spectral amplitude estimator and a procedure for estimating and
updating the noise power spectral density and for estimating the modified spectral gain
and the perceptual weighting filter. Conventional speech enhancement schemes have
proved to be very effective at reducing stationary noise. In non-stationary
environments, however, some residual noise, known as musical noise (Anitha Sheela et al.,
2006), always remains and lowers the speech quality. To reduce this musical noise, this
paper includes a perceptual weighting function derived from the noise masking threshold
characteristics of the human auditory system.
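
To make the conventional baseline concrete, the following is a minimal sketch of power spectral subtraction in the spirit of Berouti et al. (1979), applied to one frame of power-spectrum values. It is an illustration only and not the proposed algorithm: the function names, the recursive smoothing rule for the noise estimate, and the parameter values (`smoothing`, `over_sub`, `floor`) are assumptions chosen for the example, and no perceptual weighting is applied.

```python
import math


def update_noise_psd(noise_psd, frame_power, smoothing=0.9):
    """Recursively smooth the noise PSD estimate using a noise-only frame.

    Assumption: a first-order recursive average; the source does not
    specify the update rule used in the proposed scheme.
    """
    return [smoothing * n + (1.0 - smoothing) * p
            for n, p in zip(noise_psd, frame_power)]


def spectral_subtraction_gain(frame_power, noise_psd, over_sub=2.0, floor=0.01):
    """Per-bin amplitude gain for power spectral subtraction.

    over_sub scales the noise estimate before subtraction; floor keeps a
    small fraction of the noise power so that bins are never set to zero.
    Both values are illustrative assumptions.
    """
    gains = []
    for p, n in zip(frame_power, noise_psd):
        if p <= 0.0:
            gains.append(0.0)  # empty bin: nothing to scale
            continue
        clean_power = max(p - over_sub * n, floor * n)
        gains.append(min(1.0, math.sqrt(clean_power / p)))
    return gains


# Hypothetical three-bin frame: one speech-dominated bin, two noise-dominated bins.
noise_psd = update_noise_psd([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
gains = spectral_subtraction_gain([4.0, 1.0, 0.5], noise_psd)
```

In this sketch the speech-dominated bin is only mildly attenuated, while the noise-dominated bins are pushed down to the spectral floor. When the noise estimate lags behind a rapidly changing noise spectrum, isolated bins escape the subtraction from frame to frame, which is the usual explanation for musical noise.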