SCALE-SPACE FILTERING
Andrew P. Witkin
Fairchild Laboratory for Artificial Intelligence Research
ABSTRACT—The extrema in a signal and its first
few derivatives provide a useful general-purpose qualitative
description for many kinds of signals. A fundamental prob-
lem in computing such descriptions is scale: a derivative
must be taken over some neighborhood, but there is seldom
a principled basis for choosing its size. Scale-space filtering
is a method that describes signals qualitatively, managing
the ambiguity of scale in an organized and natural way.
The signal is first expanded by convolution with gaussian
masks over a continuum of sizes. This "scale-space" image
is then collapsed, using its qualitative structure, into a tree
providing a concise but complete qualitative description
covering all scales of observation. The description is further
refined by applying a stability criterion, to identify events
that persist of large changes in scale.
1. Introduction
Hardly any sophisticated signal understanding task can
be performed using the raw numerical signal values directly;
some description of the signal must first be obtained. An
initial description ought to be as compact as possible, and
its elements should correspond as closely as possible to
meaningful objects or events in the signal-forming process.
Frequently, local extrema in the signal and its derivatives—
and intervals bounded by extrema—are particularly ap-
propriate descriptive primitives: although local and closely
tied to the signal data, these events often have direct
semantic interpretations, e.g. as edges in images. A
description that characterizes a signal by its extrema and
those of its first few derivatives is a qualitative description
of exactly the kind we were taught to use in elementary
calculus to "sketch" a function.
A great deal of effort has been expended to obtain this
kind of primitive qualitative description (for overviews of
this literature, see [1,2,3].) and the problem has proved
extremely difficult. The problem of scale has emerged con-
sistently as a fundamental source of difficulty, because the
events we perceive and find meaningful vary enormously in
size and extent. The problem is not so much to eliminate
fine-scale noise, as to separate events at different scales
arising from distinct physical processes.[4] It is possible
to introduce a parameter of scale by smoothing the signal
with a mask of variable size, but with the introduction
of scale-dependence comes ambiguity: every setting of the
scale parameter yields a different description; new extremal
points may appear, and existing ones may move or disap-
pear. How can we decide which if any of this continuum of
descriptions is "right"?
There is rarely a sound basis for setting the scale
parameter. In fact, it has become apparent that for many
tasks no one scale of description is categorically correct:
the physical processes that generate signals such as images
act at a variety of scales, none intrinsically more interest-
ing or important than another. Thus the ambiguity intro-
duced by scale is inherent and inescapable, so the goal of
scale-dependent description cannot be to eliminate this am-
biguity, but rather to manage it effectively, and reduce it
where possible.
This line of thinking has led to considerable interest in
multi-scale descriptions [5,2,6,7]. However, merely com-
puting descriptions at multiple scales does not solve the
problem; if anything, it exacerbates it by increasing the
volume of data. Some means must be found to organize or
simplify the description, by relating one scale to another.
Some work has been done in this area aimed at obtaining
"edge pyramids" (e.g. [8]), but no clear-cut criteria for con-
structing them have been put forward. Marr [4] suggested
that zero-crossings that coincide over several scales are
"physically significant," but this idea was neither justified
nor tested.
How, then, can descriptions at different scales be related
to each other in an organized, natural, and compact way?
Our solution, which we call scale-space filtering, begins by
continuously varying the scale parameter, sweeping out a
surface that we call the scale-space image. In this repre-
sentation, it is possible to track extrema as they move con-
tinuously with scale changes, and to identify the singular
points at which new extrema appear. The scale-space image
is then collapsed into a tree, providing a concise but com-
plete qualitative description of the signal over all scales of
observation.
1
2. The Scale-Space Image
Descriptions that depend on scale can be computed in
many ways. As a primitive scale-parameterization, the
gaussian convolution is attractive for a number of its
properties, amounting to "well-behavedness": the gaussian
is symmetric and strictly decreasing about the mean, and
therefore the weighting assigned to signal values decreases
smoothly with distance. The gaussian convolution behaves
well near the limits of the scale parameter, sigma, approach-
ing the un-smoothed signal for small 0, and approaching
the signal's mean for large cr. The gaussian is also readily
differentiated and integrated.
The gaussian is not the only convolution kernel that
meets these criteria. However, a more specific motivation
for our choice is a property of the gaussian convolution's
*A complementary approach to the "natural" scale problem
has been developed by Hoffman [9].