Article 1
Recognizing People by Their Gait: The Shape of Motion
James J. Little
Jeffrey E. Boyd
Videre: Journal of Computer Vision Research
Quarterly Journal
Winter 1998, Volume 1, Number 2
The MIT Press
Videre: Journal of Computer Vision Research (ISSN 1089-2788) is a
quarterly journal published electronically on the Internet by The MIT
Press, Cambridge, Massachusetts, 02142. Subscriptions and address
changes should be addressed to MIT Press Journals, Five Cambridge
Center, Cambridge, MA 02142; phone: (617) 253-2889; fax: (617)
577-1545; e-mail: journals-orders@mit.edu. Subscription rates are:
Individuals $30.00, Institutions $125.00. Canadians add additional
7% GST. Prices subject to change without notice.
Subscribers are licensed to use journal articles in a variety of ways,
limited only as required to insure fair attribution to authors and the
Journal, and to prohibit use in a competing commercial product. See
the Journals World Wide Web site for further details. Address inquiries
to the Subsidiary Rights Manager, MIT Press Journals, Five Cambridge
Center, Cambridge, MA 02142; phone: (617) 253-2864; fax: (617)
258-5028; e-mail: journals-rights@mit.edu.
© 1998 by the Massachusetts Institute of Technology
The image flow of a moving figure varies both spatially and temporally. We develop a model-free description of instantaneous motion, the shape of motion, that varies with the type of moving figure and the type of motion. We use that description to recognize individuals by their gait, discriminating them by periodic variation in the shape of their motion. For each image in a sequence, we derive dense optical flow, (u(x, y), v(x, y)). Scale-independent scalar features of each flow, based on moments of the moving points weighted by |u|, |v|, or |(u, v)|, characterize the spatial distribution of the flow. We then analyze the periodic structure of these sequences of scalars. The scalar sequences for an image sequence have the same fundamental period but differ in phase, which provides a phase feature for each signal. Some phase features are consistent for one person and show significant statistical variation among persons. We use the phase feature vectors to recognize individuals by the shape of their motion. As few as three features out of the full set of twelve lead to excellent discrimination.
Keywords: action recognition, gait recognition, motion features, optic flow, motion energy, spatial frequency analysis
Recognizing People by Their Gait: The Shape of Motion
James J. Little,¹ Jeffrey E. Boyd²
1 Introduction
Our goal is to develop a model-free description of image motion, and
then to demonstrate its usefulness by describing the motion of a walking
human figure and recognizing individuals by variation in the character-
istics of the motion description. Such a description is useful in video
surveillance where it contributes to the recognition of individuals and
can indicate aspects of an individual’s behavior. Model-free descriptions
of motion could also prove useful in vision-based user interfaces by help-
ing to recognize individuals, what they are doing, and nuances of their
behavior.
The pattern of motion in the human gait has been studied in kinesiology using data acquired from moving light displays. Using such data, kinesiologists describe the forward propulsion of the torso by the legs, the ballistic motion of swinging arms and legs, and the relationships among these motions [23, 30]. Similarly, in computer vision, model-based approaches to gait analysis recover the three-dimensional structure of a person in a model and then interpret the model. The literature on moving light displays provides an introduction to modeling moving figures [11]. Unuma, Anjyo, and Takeuchi [42] show the value of a structural model in describing variations in gaits. They use Fourier analysis of joint angles in a model to synthesize images of different types of gaits, e.g., a happy walk versus a tired walk.
Alternatives to the model-based approach emphasize determining features of the motion fields, acquired from a sequence of images, without structural reconstruction. Recent theoretical work demonstrates the recoverability of affine motion characteristics from image sequences [38]. It is therefore reasonable to suggest that variations in gaits are recoverable from variations in image sequences and that a model-free approach to gait analysis is viable. Moreover, during periodic motion the varying spatial distribution of motion is apparent. Capturing this variation and analyzing its temporal variation should lead to a useful characterization of periodic motion.
Hogg [16] was among the first to study the motion of a walking figure using an articulated model. There have recently been several attempts to recover characteristics of gait from image sequences, without the aid of annotation via lights [35, 5, 27, 28, 31, 32, 3, 4]. Niyogi and Adelson [27, 28] emphasize segmentation over a long sequence of frames. Their technique relies on recovering the boundaries of moving figures in the xt domain [27] and, more recently [28], in xyt spatiotemporal solids, followed by fitting deformable splines to the contours. These splines are the elements of the articulated nonrigid model whose features aid recognition.
Polana and Nelson [31, 32] characterize the temporal texture of a moving figure by “summing the energy of the highest amplitude frequency and its multiples.” They use Fourier analysis. The results are normalized with respect to total energy so that the measure is 1 for periodic events and 0 for a flat spectrum. Their input is a sequence of 128 frames, each 128 × 128 pixels. Their analysis consists of determining the normal flow, thresholding the magnitude of the flow, determining the centroid of all “moving” points, and computing the mean velocity of the centroid. The motion in xyt of the centroid determines a linear trajectory. They use as motion signals reference curves that are “lines in the temporal solid parallel to the linear trajectory.”

1. Department of Computer Science, University of British Columbia, Vancouver, B.C., Canada V6T 1Z4. little@cs.ubc.ca
2. Department of Electrical and Computer Engineering, University of California, La Jolla, CA 92093-0407. jeffboyd@ece.ucsd.edu

Copyright © 1998 Massachusetts Institute of Technology
mitpress.mit.edu/videre.html

VIDERE 1:2 The Shape of Motion
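A measure of this kind can be sketched as follows. This is our reconstruction from the description above, not Polana and Nelson's implementation; the FFT-based estimator and the choice of the strongest non-DC bin as the fundamental are assumptions:

```python
import numpy as np

def periodicity_measure(signal):
    """Sketch of a normalized periodicity measure: sum spectral energy
    at the dominant frequency and its integer multiples, divided by
    total non-DC energy. Near 1 for a periodic signal, small for a
    flat spectrum."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal))) ** 2
    spectrum[0] = 0.0                            # discard residual DC term
    total = spectrum.sum()
    if total == 0:
        return 0.0
    k = int(np.argmax(spectrum))                 # dominant frequency bin
    harmonics = np.arange(k, len(spectrum), k)   # bins k, 2k, 3k, ...
    return spectrum[harmonics].sum() / total
```

For a pure sinusoid all energy lies at the fundamental, so the measure is essentially 1; for a signal with energy split between two incommensurate frequencies, only the dominant one and its multiples are counted, so the measure drops toward the fraction of energy they carry.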
Polana and Nelson’s more recent work [32, 33] emphasizes the spatial distribution of energies around the moving figure. They compute spatial statistics in a coarse mesh and derive a vector describing the relative magnitudes and periodicity of activity in the regions over time. Their experiments demonstrate that the values so derived can be used to discriminate among differing activities.
Shavit and Jepson [39, 40] use the centroid and moments of a binarized motion figure to represent the distribution of its motion. The movement of the centroid characterizes the external forces on an object, while the deformation of the object is computed from the dispersion (the eigenvalues of the covariance matrix) or the ratio of the axis lengths of the moment ellipse.
Bobick and Davis [6] introduced the Motion Energy Image (MEI), a smoothed description of the cumulative spatial distribution of motion energy in a motion sequence. They match this description of motion against stored models of known actions. Bobick and Davis [7] enhanced the MEI to form a motion-history image (MHI), in which pixel intensity is a function, over time, of the binarized current motion energy and recent activity; they extend this representation in later work [14]. We will discuss these two representations further in Section 2.2.
Baumberg and Hogg [3] present a method of representing the shape of a moving body at an instant in time. Their method produces a description composed of a set of principal spline components and a direction of motion. In later work, Baumberg and Hogg [4] add temporal variation by modeling the changing shape as a vibrating plate. They create a vibration model for a “generic” pedestrian and then are able to measure the quality of fit of the generic data to another pedestrian.
Liu and Picard [22] detect and segment areas of periodic motion in images by detecting spectral harmonic peaks. The method is not model based and identifies regions in the images that exhibit periodic motion.
More recently, elaborate models, often including the kinematics and dynamics of the human figure, have been used to track humans in sequences [36, 9, 19, 18, 43].
Our work, in the spirit of Polana and Nelson, and Baumberg and Hogg, is a model-free approach that makes no attempt to recover a structural model of a human subject. Instead, we describe the shape of the motion with a set of features derived from moments of a dense flow distribution [20]. Our goal is not to fingerprint people, but to determine what content of motion aids recognition. We wish to recognize gaits, both types of gaits and individual gaits. The features are invariant to scale and do not require synchronization of the gait or identification of reference points on the moving figure.
The following sections describe the creation of motion features and an experiment that determines the variation of the features over a set of walking subjects. Results of the experiment show that features acquired by our process exhibit significant variation due to different subjects and are suitable for recognition of people by subtle differences in their gaits, as identified by phase analysis of periodic variations in the shape of motion.

Figure 1. Sample image from experimental data described in Section 3 (number 23 of 84 images, sequence 3 of subject 5).

Figure 2. The structure of the image analysis. Each image sequence produces a vector of m − 1 phase values. The figure depicts the pipeline: an image sequence (n + 1 frames) yields optical flow; each flow field yields the time-varying scalars (s_1, s_2, ..., s_m); the scalars form the sequences S_i = {s_i1, s_i2, ..., s_in}; each sequence yields a phase φ_i; the phase features are F_i = φ_i − φ_m; and the result is the feature vector (F_1, F_2, ..., F_{m−1}).
2 Motion Feature Creation
Image sequences are gathered while the subject walks laterally before a static camera, and are processed offline. Motion stabilization could be accomplished by a tracking system that pursues a moving object, e.g., Little and Kam [21]. However, our focus is on the motion, so we restrict the experimental situation to a single subject moving in the field of view before a static camera. Figure 1 shows an example of the images used, image number 23 of 84 in a sequence taken from the experimental data described in Section 3.
Figure 2 illustrates the data flow through the system that creates our motion features. We begin with an image sequence of n + 1 images and then derive n dense optical flow images. For each of these optical flow images we compute m characteristics that describe the shape of the motion (i.e., the spatial distribution of the flow), for example, the centroid of the moving points and various moments of the flow distribution. Some of these are locations in the image, but we treat all as time-varying scalar values. We arrange the values to form a time series for each scalar.
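The per-frame scalar extraction can be sketched as follows. This is an illustrative reconstruction, not our exact feature set: the feature names (dx, dy, aspect), the magnitude threshold, and the normalization by the spread of the moving points are assumptions made for the sketch:

```python
import numpy as np

def shape_of_motion_scalars(u, v, threshold=0.1):
    """Sketch of scalar shape-of-motion features from one dense flow
    field (u, v). Coordinates are taken relative to the centroid of
    the moving points and divided by their spread, which makes the
    scalars translation- and scale-independent."""
    mag = np.hypot(u, v)
    ys, xs = np.nonzero(mag > threshold)     # "moving" points
    if len(xs) == 0:
        return None                          # no motion in this frame
    w = mag[ys, xs]                          # |(u, v)| weights
    cx, cy = xs.mean(), ys.mean()            # centroid of moving points
    wcx = np.average(xs, weights=w)          # centroid weighted by |(u, v)|
    wcy = np.average(ys, weights=w)
    sx = xs.std() if xs.std() > 0 else 1.0   # spread of the moving region
    sy = ys.std() if ys.std() > 0 else 1.0
    return {
        "dx": (wcx - cx) / sx,    # horizontal offset of motion-weighted centroid
        "dy": (wcy - cy) / sy,    # vertical offset of motion-weighted centroid
        "aspect": ys.std() / sx,  # elongation of the moving region
    }
```

Applied to every flow field in a sequence, each dictionary key yields one time series for the periodic analysis that follows.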
A walking person undergoes periodic motion, returning to a standard position after a certain time period that depends on the frequency of the gait. Thus we analyze the periodic structure of these time series and determine the fundamental frequency of the variation of each scalar. The set of time series for an image sequence share the same frequency, or simple multiples of the fundamental, but their phases vary. To make the data from different sequences comparable, we subtract a reference phase, φ_m, derived from one of the scalars. We characterize each image sequence by a vector, F = (F_1, ..., F_{m−1}), of m − 1 relative phase features. The phase feature vectors are then used to recognize individuals.
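A minimal sketch of this phase-feature step follows. It assumes the shared fundamental is taken from the strongest non-DC peak of the summed power spectra and that the last series supplies the reference phase φ_m; both choices are illustrative, and the actual estimator may differ:

```python
import numpy as np

def phase_features(scalar_series):
    """Sketch of the phase-feature computation. Each row of
    scalar_series is one time series S_i; all rows are assumed to
    share a fundamental frequency. We read each series' phase at the
    shared fundamental bin and subtract the last phase as reference,
    giving F_i = phi_i - phi_m."""
    S = np.asarray(scalar_series, dtype=float)
    S = S - S.mean(axis=1, keepdims=True)        # remove DC per series
    spectra = np.fft.rfft(S, axis=1)
    power = (np.abs(spectra) ** 2).sum(axis=0)   # pooled power spectrum
    power[0] = 0.0
    k = int(np.argmax(power))                    # shared fundamental bin
    phases = np.angle(spectra[:, k])             # phase of each series at k
    rel = phases[:-1] - phases[-1]               # relative phase features
    return (rel + np.pi) % (2 * np.pi) - np.pi   # wrap into (-pi, pi]
```

The wrap at the end keeps the features comparable across sequences whose absolute phase origins differ.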
2.1 Tracking and Optical Flow
The motion of the object is a path in three dimensions; we view its projection. Instead of determining the motion of three-dimensional elements of a figure, we look for characteristics of the periodic variation of the two-dimensional optical flow.
The raw optical flow identifies temporal changes in brightness; however, illumination changes such as reflections, shadows, moving clouds, and inter-reflections between the moving figure and the background, as well as reflections of the moving figure in specular surfaces in the background, pollute the motion signal. To isolate the moving figure, we manually compute the average displacement of the person through the image sequence and then use only the flow within a moving window traveling with the average motion. Within the window there remain many islands of small local variation, so we compute the connected components of each flow field and eliminate all points not in sufficiently large connected regions. The remaining large components form a mask within which we can analyze the flow. This reduces the sensitivity of the moment computations to outlying points.
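The masking step can be sketched as follows, assuming a simple flow-magnitude threshold and a minimum region area; both threshold values are illustrative, not values from this work:

```python
import numpy as np
from scipy import ndimage

def large_component_mask(mag, mag_threshold=0.1, min_area=50):
    """Sketch of the component-based masking: threshold the flow
    magnitude, label 4-connected regions, and keep only regions with
    at least min_area pixels. Returns a boolean mask."""
    moving = mag > mag_threshold
    labels, n = ndimage.label(moving)        # label connected regions
    if n == 0:
        return np.zeros_like(moving)
    areas = np.bincount(labels.ravel())      # areas[0] is the background
    keep = areas >= min_area
    keep[0] = False                          # never keep the background
    return keep[labels]                      # mask of sufficiently large regions
```

Moments computed only inside the returned mask are then insensitive to the small isolated islands of spurious flow.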
Figure 3 shows six subimages from the sequence corresponding to Figure 1. We will refer to the subimages as images from here on, and will display our results in subimages for compactness. All processing is carried out in the coordinates of the original frames.
Unlike other methods, we use dense optical flow fields, generated by minimizing the sum of absolute differences between image patches [10]. The algorithm is sensitive to brightness changes caused by reflections, shadows, and changes of illumination, so we first process the images by computing the logarithm of brightness, transforming the multiplicative effect of illumination change into an additive one. Filtering by a Laplacian of Gaussian (effectively a bandpass filter) removes the additive effects.
The optical flow algorithm searches, for each pixel, among a limited set of discrete displacements for the displacement (u(x, y), v(x, y)) that minimizes the sum of absolute differences between a patch in one image and the corresponding displaced patch in the other image. The algorithm finds a best-matching patch in the second image for each patch in the first. The algorithm is run a second time, switching the roles of the two images. For a correct match, the results will likely agree. In order to remove invalid matches, we compare the results at each point in the