threshold, it still recovers slowly and does not handle bimodal backgrounds well.
Koller et al. [4] have successfully integrated this method in an automatic traffic monitoring application.
Pfinder [7] uses a multi-class statistical model for the tracked objects, but the background model is a single Gaussian per pixel. After an initialization period where the room is empty, the system reports good results. There have been no reports on the success of this tracker in outdoor scenes.
Friedman and Russell [2] have recently implemented a pixel-wise EM framework for detection of vehicles that bears the most similarity to our work. Their method attempts to explicitly classify the pixel values into three separate, predetermined distributions corresponding to the road color, the shadow color, and colors corresponding to vehicles. Their attempt to mediate the effect of shadows appears to be somewhat successful, but it is not clear how their system would behave for pixels that did not contain these three distributions. For example, pixels may present a single background color or multiple background colors resulting from repetitive motions, shadows, or reflectances.
1.2 Our approach
Rather than explicitly modeling the values of all the pixels as one particular type of distribution, we simply model the values of a particular pixel as a mixture of Gaussians. Based on the persistence and the variance of each of the Gaussians of the mixture, we determine which Gaussians may correspond to background colors. Pixel values that do not fit the background distributions are considered foreground until there is a Gaussian that includes them with sufficient, consistent evidence supporting it.
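As a rough illustration of this classification step (a sketch only, not the implementation detailed later in the paper), the following snippet checks a new pixel value against the Gaussians currently hypothesized as background; the 2.5-standard-deviation match threshold and the example numbers are assumptions made here for illustration.

```python
import numpy as np

# Minimal sketch: decide whether a new pixel value fits any of the
# Gaussians currently hypothesized as background. The 2.5-standard-
# deviation match threshold is an assumption for illustration.
def matches_background(value, means, variances, k=2.5):
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    distances = np.abs(value - means) / np.sqrt(variances)
    return bool(np.any(distances < k))

# Example: a gray-value pixel whose background mixture has two modes,
# e.g. a surface that alternates between two intensities.
print(matches_background(118.0, means=[120.0, 200.0], variances=[25.0, 36.0]))  # True
print(matches_background(60.0,  means=[120.0, 200.0], variances=[25.0, 36.0]))  # False: foreground
```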
Our system adapts to deal robustly with lighting changes, repetitive motions of scene elements, tracking through cluttered regions, slow-moving objects, and introducing or removing objects from the scene. Slowly moving objects take longer to be incorporated into the background, because their color has a larger variance than the background. Also, repetitive variations are learned, and a model for the background distribution is generally maintained even if it is temporarily replaced by another distribution, which leads to faster recovery when objects are removed.
Figure 1: The execution of the program. (a) the current image, (b) an image composed of the means of the most probable Gaussians in the background model, (c) the foreground pixels, (d) the current image with tracking information superimposed. Note: while the shadows are foreground in this case, if the surface was covered by shadows a significant amount of the time, a Gaussian representing those pixel values may be significant enough to be considered background.

Our backgrounding method contains two significant parameters: α, the learning constant, and T, the proportion of the data that should be accounted for by the background. Without needing to alter parameters, our system has been used in an indoor human-computer interface application and, for the past 16 months, has been continuously monitoring outdoor scenes.
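As a hedged illustration of how T might enter an implementation, the sketch below ranks a pixel's Gaussians by weight over standard deviation (favoring persistent, low-variance components) and keeps the smallest set whose combined weight exceeds T; the ranking criterion and all numeric values are assumptions introduced here, not specifications taken from this section.

```python
import numpy as np

ALPHA = 0.01   # learning constant; illustrative value only
T = 0.80       # proportion of data attributed to background; illustrative only

def background_components(weights, variances):
    """Pick which of a pixel's Gaussians to treat as background: rank
    components by weight / standard deviation (persistent, low-variance
    components first) and keep the smallest prefix whose total weight
    exceeds T."""
    weights = np.asarray(weights, dtype=float)
    variances = np.asarray(variances, dtype=float)
    order = np.argsort(-(weights / np.sqrt(variances)))   # best candidates first
    cumulative = np.cumsum(weights[order])
    b = int(np.searchsorted(cumulative, T)) + 1
    return order[:b]

# Example: the two persistent, low-variance Gaussians are kept as
# background; the transient, high-variance one is not.
print(background_components(weights=[0.55, 0.35, 0.10],
                            variances=[20.0, 30.0, 400.0]))   # -> [0 1]
```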
2 The method
If each pixel resulted from a particular surface under particular lighting, a single Gaussian would be sufficient to model the pixel value while accounting for acquisition noise. If only lighting changed over time, a single, adaptive Gaussian per pixel would be sufficient. In practice, multiple surfaces often appear in the view frustum of a particular pixel and the lighting conditions change. Thus, multiple, adaptive Gaussians are necessary. We use a mixture of adaptive Gaussians to approximate this process.
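The sketch below illustrates one plausible online update for such a per-pixel adaptive mixture. The exponential forgetting with rate α, the 2.5σ match test, and the policy of recycling the least-supported Gaussian on a mismatch are assumptions made for this sketch, not the paper's exact update equations.

```python
import numpy as np

class PixelMixture:
    """Sketch of an adaptive mixture of Gaussians for a single pixel.
    The learning rate alpha, the 2.5-sigma match test, and the
    replacement policy are illustrative assumptions."""

    def __init__(self, k=3, alpha=0.01, init_variance=225.0):
        self.alpha = alpha
        self.init_variance = init_variance
        self.means = np.zeros(k)
        self.variances = np.full(k, init_variance)
        self.weights = np.full(k, 1.0 / k)

    def update(self, x):
        # Find the first Gaussian the new value matches, if any.
        dist = np.abs(x - self.means) / np.sqrt(self.variances)
        matched = dist < 2.5
        if matched.any():
            i = int(np.argmax(matched))
            # Pull the matched Gaussian toward the new observation.
            self.means[i] += self.alpha * (x - self.means[i])
            self.variances[i] += self.alpha * ((x - self.means[i]) ** 2 - self.variances[i])
        else:
            # No match: recycle the least-supported Gaussian for the new value.
            i = int(np.argmin(self.weights))
            self.means[i] = x
            self.variances[i] = self.init_variance
        # Weights decay everywhere and grow for the component that owns x.
        self.weights *= (1.0 - self.alpha)
        self.weights[i] += self.alpha
        self.weights /= self.weights.sum()

# Usage: feed the values of one pixel over time (its "pixel process").
pm = PixelMixture()
for value in [120, 119, 121, 200, 118, 201, 120]:
    pm.update(float(value))
```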
Each time the parameters of the Gaussians are updated, the Gaussians are evaluated using a simple heuristic to hypothesize which are most likely to be part of the “background process.” Pixel values that do not match one of the pixel’s “background” Gaussians are grouped using connected components. Finally, the connected components are tracked from frame to frame using a multiple hypothesis tracker. The process is illustrated in Figure 1.
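A rough sketch of that per-frame pipeline is given below; the array layout, the minimum-area filter, and the nearest-centroid association (which stands in for the multiple hypothesis tracker) are simplifications introduced for illustration only.

```python
import numpy as np
from scipy import ndimage

def foreground_mask(frame, means, variances, is_background, k=2.5):
    """Mark pixels that match none of their background Gaussians.
    `frame` is (H, W); `means`, `variances`, and the boolean
    `is_background` arrays are (H, W, K). All names are illustrative."""
    dist = np.abs(frame[..., None] - means) / np.sqrt(variances)
    matched = (dist < k) & is_background
    return ~matched.any(axis=-1)

def foreground_regions(mask, min_area=50):
    """Group foreground pixels into connected components and return the
    centroids of regions large enough to be worth tracking."""
    labels, n = ndimage.label(mask)
    centroids = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if ys.size >= min_area:
            centroids.append((ys.mean(), xs.mean()))
    return centroids

def associate(prev_centroids, centroids, max_dist=30.0):
    """Frame-to-frame association: nearest-centroid matching stands in
    here for the multiple hypothesis tracker."""
    pairs = []
    for j, c in enumerate(centroids):
        if not prev_centroids:
            break
        dists = [np.hypot(c[0] - p[0], c[1] - p[1]) for p in prev_centroids]
        i = int(np.argmin(dists))
        if dists[i] < max_dist:
            pairs.append((i, j))
    return pairs
```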
2.1 Online mixture model
We consider the values of a particular pixel over time as a “pixel process”. The “pixel process” is a time series of pixel values, e.g. scalars for grayvalues