A DOCUMENT SKEW DETECTION METHOD USING
RUN-LENGTH ENCODING AND THE HOUGH TRANSFORM
Stuart C. Hinds,l James L. Fisher, and Donald
P.
D'Amato
The MITRE CorporationlCivii Systems Division
Systems Engineering and Applied Technology
7525
Colshire Drive
McLean, Virginia 22102-3481,
USA
Abstract
As part of the development of a document image analysis
system, we have devised a method, based on the Hough transform,
for the detection of document skew and interline spacing--necessary
parameters for the automatic segmentation of text from graphics.
Because the Hough transform is computationally expensive, we
reduce the amount of data within a document image through the
computation of its horizontal and vertical black run-lengths.
Histograms of these run-lengths are used
to
determine whether the
document is in portrait or landscape orientation. A grey scale "burst
image" is created from the black run lengths that are perpendicular
to
the text lines by placing the length of the
run
in
the run's bottom-
most pixel. This data reduction procedure decreases the processing
time of the Hough transform and reduces the effects of non-textual
data on the determination of skew and interline spacing.
1.
Introduction
As more and more institutions and federal agencies confront an
overwhelming abundance of paper and microfilm-based information,
it has become increasingly important
to
convert such information into
an efficiently stored, computer searchable form. Document image
analysis is the process
of
identifying the components (text and
non-text) of a document's structure and can be divided into three
phases
[l]:
1)
scanning and binarization, 2) segmentation and
labeling of text and figure blocks, and 3) processing of text (usually
by optical character recognition) and figures. Fast top-down
methods, such as the run-length smoothing algorithm (RLSA),
developed by Wong et. al. [2], and projection profile cuts [3], [4],
have been developed for accomplishing phase 2. Unfortunately,
these methods are generally ineffective when applied
to
skewed
documents. Although other researchers
[5],
(61 have successfully
segmented text from figures in skewed documents, their methods
involve a connected components analysis, which can be
computationally intensive.
We have developed a document image analysis system based
on the approach of Wong et. al. Because Wong's system is limited
to
aligned documents, we have added a skew detection stage
to
our
system. While other methods for determining document skew exist
(see
[7]
for a short review), we have based our method on the
Hough transform because of its sensitivity
to
skew and its
applicability in determining a document's interline spacing--a
necessary parameter for the RLSA. Because the Hough transform
is computationally expensive and slowed by noise, we have
developed a procedure, based on the computation of an image's
vertical black run-lengths,
to
reduce the amount of data and
minimize the effects of noise and non-textual material within the
original document image.
~
1
Current address: University of California at San Diego, Neuropsychology
Research
Lab.,
Childrens Hospital,
8001
Frost
St.,
San Diego,
CA
92123
2.
Review
of
the Hough Transform and Its
Use
In Document Skew Detection
The Hough transform can be used
to
detect lines at any
orientation.
It
consists of mapping points in Cartesian space
(xy)
to
sinusoidal curves in
ptl
space via the transformation:
p
=
xcos(e)
+
ysin(8).
Each time a sinusoidal curve intersects another at a particular value
of
p
and
8,
the likelihood increases that a line corresponding
to
that
p8
coordinate value is present in the original image. An accumulator
array (consisting of
R
rows and
T
columns) is used to count the
number of intersections at various
p
and
8
values. Those cells in the
accumulator array with the highest number of counts will correspond
to
lines in the original image. The basic computational steps in the
Hough transform are as follows:
For (x)
For (Y)
if (pixel is black)
(
For
(8)
(
1
Calculate
p
=
xcos(8)
+
ysin(8)
increment accumulator array at
p,8
I
It
should be noted that the Hough transform is usually applied
to
binary images; hence the need
to
check
if
a pixel is black (i.e., part
of
the information within the image).
The angular resolution that the Hough transform will detect
depends on how finely the columns of the accumulator array
(corresponding
to
0
values) are spaced. The increment size
between
0
values and the maximum skew angle that should be
detected will in turn affect the computation time of the Hough
transform. In general,
8
ranges either from
0
to
180
or
-90
to
90
degrees in increments of one degree.
The number of rows,
R,
of the accumulator array
(corresponding
to
p
values) affects how well the Hough transform
resolves lines. To detect fine lines,
R
should be such that each xy
point along a straight column can
be
mapped
to
a unique row.
Therefore,
to
detect every point in a rectangular image (with
8
ranging from
-90
to
90
degrees):
R
=
(w2
+
h2)'Q
where w
=
width and h
=
height.
Because text lines are actually thick lines of sparse density,
the problem of determining the skew angle of text lines becomes
more difficult than determining the skew angle of fine lines. Two
groups of researchers have developed different methods for
detecting document skew with the Hough transform.
464
CH2898-5/90/0000/0464$01
.OO
0
1990
IEEE
评论0
最新资源