A Fast Median Filter using AltiVec
Priyadarshan Kolte Roger Smith Wen Su
Motorola, Inc., Austin, Texas
pkolte,rogers,wens@ncct.sps.mot.com
Abstract
This paper describes the design and implementation of
a median lter for graphics images on the Motorola
AltiVec architecture. The lter utilizes 16-way SIMD
parallelism to lter images at rates of 1.15 cycles/pixel
for
3
3
squares and 6.6 cycles/pixel for
5
5
squares.
The median lter is based on a new sorting network
which sorts
N
2
numbers (arrangedinan
N
N
square)
by sorting al l columns, rows, and diagonal lines in the
square. This paper also describes a scheme for eÆcient
testing of the sorting network.
1. Intro duction
Two dimensional graphics images are rectangles of pix-
els, where each pixel has a discrete value. A
median
lter
computes the output value of each pixel in an im-
age as the median value of pixels in a small neighbor-
hoo d such as the 9 pixels in the 3
3 square centered
around the pixel [8]. This type of non-linear ltering is
eective in removing random noise from images with-
out causing the blurring that is typical of linear low{
pass lters, and it is oered by popular image pro cess-
ing programs. However, median ltering for images
containing many pixels is a CPU intensive task, and
there is a need to improve execution time.
A dominant factor in the execution time of the lter
is p erforming compares of pixel values to nd the me-
dian value. There are three ways of improving the time
required by these compare operations. The rst is to
reduce the number of compares needed. The second is
to reuse the results of compares p erformed within one
square for nding the medians of adjacentoverlapping
squares. The third way is to use the parallelism of-
fered by mo dern CPU architectures to compare multi-
ple pixel values simultaneously. Wehave developed a
median lter which uses all three ways to improve the
execution time.
Our implementation of the median lter targets the
Motorola AltiVec architecture [1, 4]. This architecture
extends the PowerPC architecture with vector instruc-
tions that can process 16 8-bit data values simultane-
ously. We have measured image pro cessing rates of
1.15 cycles/pixel for 3
3 squares and 6.6 cycles/pixel
for 5
5 squares using our median lter implemen-
tation. A comparable implementation of the median
nding algorithm byPaeth [8] would require at least
2 cycles/pixel for 3
3 squares and 10.9 cycles/pixel
for 5
5 squares. Thus, our implementation improves
processing rates by at least 65% over prior work.
Our median lter is based on a new sorting network
for numbers arranged in squares. This sorting network
for squares operates by successively sorting columns,
rows, and diagonal line segments of varying slop es.
Section 2 describes the scalar median algorithm and
sorting network. Section 3 describ es our technique for
testing the correctness of the median algorithm and
sorting network. Section 4 provides the details of the
AltiVec implementation and explains how we paral-
lelized the scalar median algorithm. Section 5 presents
related work and Section 6 concludes.
2. Scalar median algorithm
We rst discuss a scalar algorithm for nding the me-
dian of a 3
3 square and then generalize the median
algorithm for an
N
N
square, where
N
is o dd. We
also describe sorting networks for completely sorting
an
N
N
square, rst for the case where
N
is odd,
and then for the case where
N
is even.
2.1. Background on sorting networks
A
compare-swap
of two elements
(a,b)
compares
and exchanges
a
and
b
so that we have
a <= b
af-
ter the op eration. A
sorting network
is a sequence
of
compare-swap
operations that depends only on the
numb er of elements to b e sorted, not on the values of
the elements [5]. Bubblesort is an example of a sorting
network whereas quicksort is not a sorting network.
Although a sorting network for a xed numb er of el-
ements usually requires a greater numb er of compare
operations than a sorting method such as quicksort,
评论3