I. Cohen, B. Berdugo / Signal Processing 81 (2001) 2403–2418 2405
However, this noise estimate is sensitive to outliers
[24], generally biased [16], and its variance is about
twice as large as the variance of a conventional
noise estimator [15]. Additionally, this method oc-
casionally attenuates low energy phonemes [15]. To
overcome these limitations, the smoothing parameter
and the bias compensation factor are made
time- and frequency-dependent, and are estimated for
each spectral component and each time frame [16].
In [6], a computationally more efficient minimum
tracking scheme is presented. Its main drawbacks
are the very slow update rate of the noise estimate
in case of a sudden rise in the noise energy level,
and its tendency to cancel the signal [19].
Considering the speech estimation, Ephraim and
Malah [8] derived a log-spectral amplitude (LSA)
estimator, which minimizes the mean-square error
of the log-spectra, based on a Gaussian statistical
model. This estimator proved very efficient in reducing
musical residual noise phenomena [6,12,17].
However, the speech spectrum is estimated under
speech presence hypothesis. In contrast to other es-
timators, whose performance improves by utilizing
the speech presence probability [7,10,18,23,25], it
was believed that modification of the LSA estimator
under speech presence uncertainty is “unworthy”
[8]. Malah et al. [13] have recently proposed a
multiplicatively modified LSA (MM-LSA) estimator.
Accordingly, the spectral gain is multiplied by
the conditional speech presence probability, which
is estimated for each frequency bin and each frame.
Unfortunately, the multiplicative modifier is not optimal
[13]. Moreover, their estimate for the a priori
SNR interacts with the estimated a priori speech
absence probability [17]. This adversely affects the
total gain for noise-only bins, and results in an
unnaturally structured residual noise.¹
Kim and Chang [12] proposed to use a small fixed
a priori speech absence probability q (q = 0.0625)
and a multiplicative modifier, which is based on
the global conditional speech absence probability in
each frame. This modifier is applied to the a priori
and a posteriori SNRs. Not only is such a modification
inconsistent with the statistical model, but it is also
insignificant, owing to the small value of q and the
influence of a few noise-only bins on the global
speech absence probability.

¹ Applying a uniform attenuation factor to frames that do not
contain speech eliminates the noise structuring in such frames
[13]. Yet, in speech-plus-noise frames the noise structuring
persists.

In this paper, we present an optimally modified
LSA (OM-LSA) speech estimator and a minima
controlled recursive averaging (MCRA) noise es-
timation approach for robust speech enhancement.
The optimal spectral gain function is obtained as a
weighted geometric mean of the hypothetical gains
associated with the speech presence uncertainty.
The exponential weight of each hypothetical gain
is its corresponding probability, conditional on the
observed signal. The noise spectrum is estimated
by recursively averaging past spectral power val-
ues, using a smoothing parameter that is adjusted
by the speech presence probability in subbands.
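
To make the weighted geometric mean concrete, the following is a minimal sketch of how two hypothetical gains could be combined, with the conditional speech presence probability acting as the exponent. The function names, the use of a positive gain floor for the speech-absence hypothesis, and the numeric values are assumptions made for illustration only; they are not taken from the text above. A multiplicative modification of the kind discussed earlier is included for contrast.

```python
import math


def geometric_mean_gain(gain_presence, gain_absence, presence_prob):
    """Weighted geometric mean of two hypothetical spectral gains.

    gain_presence: gain assumed under the speech-presence hypothesis (> 0)
    gain_absence:  gain assumed under the speech-absence hypothesis (> 0);
                   using a small positive floor here is an assumption
    presence_prob: conditional speech presence probability in [0, 1]
    """
    if not 0.0 <= presence_prob <= 1.0:
        raise ValueError("presence_prob must lie in [0, 1]")
    if gain_presence <= 0.0 or gain_absence <= 0.0:
        raise ValueError("gains must be positive")
    # Work in the log domain: each log-gain is weighted by its probability.
    log_gain = (presence_prob * math.log(gain_presence)
                + (1.0 - presence_prob) * math.log(gain_absence))
    return math.exp(log_gain)


def multiplicative_gain(gain_presence, presence_prob):
    """Multiplicative modification, for contrast: gain times probability."""
    return gain_presence * presence_prob


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p,
              geometric_mean_gain(0.8, 0.1, p),
              multiplicative_gain(0.8, p))
```

In this sketch the geometric mean never falls below the speech-absence gain, whereas the multiplicative form goes to zero as the probability goes to zero.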
We introduce two distinct speech presence prob-
ability functions, one for estimating the speech and
one for controlling the adaptation of the noise spec-
trum. The former is based on the time–frequency
distribution of the a priori SNR. The latter is de-
termined by the ratio between the local energy of
the noisy signal and its minimum within a specified
time window. The probability functions are
estimated for each frame and each subband via a
soft-decision approach, which exploits the strong
correlation of speech presence in neighboring fre-
quency bins of consecutive frames.
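
The noise estimation idea outlined above can be illustrated for a single subband as follows. This is a simplified sketch under stated assumptions, not the estimator derived later in the paper: the raw noisy power stands in for the local energy, the minimum is taken over a plain sliding window, the presence probability is a recursively smoothed indicator, and all parameter values (`window`, `ratio_threshold`, `alpha_noise`, `alpha_prob`) are hypothetical.

```python
def track_noise(noisy_powers, window=8, ratio_threshold=5.0,
                alpha_noise=0.95, alpha_prob=0.2):
    """Return per-frame noise power estimates for one subband.

    noisy_powers: sequence of non-negative noisy power values, one per frame.
    All default parameter values are illustrative assumptions.
    """
    if not noisy_powers:
        return []
    noise = noisy_powers[0]   # initialise from the first frame
    presence_prob = 0.0       # smoothed speech presence probability
    estimates = []
    for t, power in enumerate(noisy_powers):
        # Minimum of the noisy power over the most recent `window` frames.
        minimum = min(noisy_powers[max(0, t - window + 1): t + 1])
        # Power far above the recent minimum is taken as a sign of speech.
        speech_likely = 1.0 if power > ratio_threshold * minimum else 0.0
        # Soft decision: smooth the indicator over time.
        presence_prob = (alpha_prob * presence_prob
                         + (1.0 - alpha_prob) * speech_likely)
        # The smoothing parameter approaches 1 when speech is likely,
        # which slows the noise update during speech activity.
        alpha_eff = alpha_noise + (1.0 - alpha_noise) * presence_prob
        noise = alpha_eff * noise + (1.0 - alpha_eff) * power
        estimates.append(noise)
    return estimates


if __name__ == "__main__":
    # Stationary noise with a short high-energy burst in the middle.
    powers = [1.0] * 10 + [100.0] * 3 + [1.0] * 10
    print(max(track_noise(powers)))
```

With a fixed smoothing parameter of 0.95, the first burst frame alone would raise the estimate to 5.95; in this sketch the estimate stays below 3 throughout the burst, because the update is slowed while the power is far above its recent minimum.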
Objective and subjective evaluation of the
OM-LSA and MCRA estimators is performed un-
der various environmental conditions. We show
that these estimators are superior, particularly for
low input SNRs and non-stationary noise. The
MCRA noise estimate is unbiased, computationally
efficient, robust with respect to the input SNR and
type of underlying additive noise, and characterized
by the ability to quickly follow abrupt changes in
the noise spectrum. Its performance is close to the
theoretical limit. The OM-LSA estimator demon-
strates excellent noise suppression, while retaining
weak speech components and avoiding the musical
residual noise phenomena.
The paper is organized as follows. In Section 2,
we derive the OM-LSA speech estimator and its
corresponding speech presence probability func-
tion. In Section 3, we discuss the problem of the
a priori SNR estimation under speech presence