Lucas-Kanade 20 Years On: A Unifying Framework: Part 1
Simon Baker and Iain Matthews
CMU-RI-TR-02-16
Abstract
Since the Lucas-Kanade algorithm was proposed in 1981, image alignment has become one of the
most widely used techniques in computer vision. Applications range from optical flow and tracking
to layered motion, mosaicing, and face coding. Numerous algorithms have been proposed and a
wide variety of extensions have been made to the original formulation. We present an overview of
image alignment, describing most of the algorithms and their extensions in a consistent framework.
We concentrate on the inverse compositional algorithm, an efficient algorithm that we recently
proposed. We examine which of the extensions to Lucas-Kanade can be used with the inverse
compositional algorithm without any significant loss of efficiency, and which cannot. In this paper,
Part 1 of a two-part series, we cover the quantity approximated, the warp update rule, and the gradient
descent approximation. In Part 2 of this series we will cover the choice of the norm,
how to allow linear appearance variation, how to impose priors on the parameters, and various
heuristics to avoid local minima.
Keywords: Image alignment, Lucas-Kanade, a unifying framework, additive vs. compositional algorithms, forwards vs. inverse algorithms, the inverse compositional algorithm, efficiency, steepest descent, Gauss-Newton, Newton, Levenberg-Marquardt.
1 Introduction
Image alignment consists of moving, and possibly deforming, a template to minimize the differ-
ence between the template and an image. Since the first use of image alignment in the Lucas-
Kanade optical flow algorithm [11], image alignment has become one of the most widely used
techniques in computer vision. Besides optical flow, some of its other applications include track-
ing [4, 10], parametric and layered motion estimation [3], mosaicing [14], and face coding [2, 6].
The usual approach to image alignment is gradient descent. A variety of other numerical algorithms such as difference decomposition [9] and linear regression [6] have also been proposed, but
gradient descent is the de facto standard. Gradient descent can be performed in a variety of different
ways, however. For example, one difference between the various approaches is whether they esti-
mate an additive increment to the parameters (the additive approach [11]), or whether they estimate
an incremental warp that is then composed with the current estimate of the warp (the compositional
approach [14]). Another difference is whether the algorithm performs a Gauss-Newton, a Newton,
a steepest-descent, or a Levenberg-Marquardt approximation in each gradient descent step.
We propose a unifying framework for image alignment, describing the various algorithms and
their extensions in a consistent manner. Throughout the framework we concentrate on the inverse
compositional algorithm, an efficient algorithm that we recently proposed [2]. We examine which
of the extensions to Lucas-Kanade can be applied to the inverse compositional algorithm without
any significant loss of efficiency, and which extensions require additional computation. Wherever
possible we provide empirical results to illustrate the various algorithms and their extensions.
In this paper, Part 1 of a 2 part series, we begin in Section 2 by reviewing the Lucas-Kanade
algorithm. We proceed in Section 3 to analyze the quantity that is approximated by the various
image alignment algorithms and the warp update rule that is used. We categorize algorithms as
either additive or compositional, and as either forwards or inverse. We prove the first order equiv-
alence of the various alternatives, derive the efficiency of the resulting algorithms, describe the set
of warps that each alternative can be applied to, and finally empirically compare the algorithms.
In Section 4 we describe the various gradient descent approximations that can be used in each
iteration: Gauss-Newton, Newton, diagonal Hessian, Levenberg-Marquardt, and steepest descent
[12]. We compare these alternatives both in terms of speed and in terms of empirical performance.
We conclude in Section 5 with a discussion. In Part 2 of this framework (currently under
preparation), we will cover the choice of the error norm, how to allow linear appearance variation,
how to add priors on the parameters, and various heuristics to avoid local minima.
2 Background: Lucas-Kanade
The original image alignment algorithm was the Lucas-Kanade algorithm [11]. The goal of Lucas-Kanade is to align a template image $T(\mathbf{x})$ to an input image $I(\mathbf{x})$, where $\mathbf{x} = (x, y)^T$ is a column vector containing the pixel coordinates. If the Lucas-Kanade algorithm is being used to compute optical flow or to track an image patch from time $t = 1$ to time $t = 2$, the template $T(\mathbf{x})$ is an extracted sub-region (a $5 \times 5$ window, maybe) of the image at $t = 1$ and $I(\mathbf{x})$ is the image at $t = 2$.
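For concreteness, this setup can be sketched in a few lines of NumPy; the frames `I1` and `I2`, the patch corner `(x0, y0)`, and the window size are hypothetical stand-ins for illustration, not part of the original formulation:

```python
import numpy as np

# Hypothetical grayscale frames at t = 1 and t = 2 (values in [0, 1]).
I1 = np.random.rand(480, 640)
I2 = np.random.rand(480, 640)

# The template T(x) is a sub-region (here a 5x5 window) of the frame at t = 1.
x0, y0 = 100, 200
T = I1[y0:y0 + 5, x0:x0 + 5].copy()

# I(x) is the frame at t = 2; alignment estimates the warp parameters p
# that map T's coordinate frame into I.
I = I2
```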
Let $\mathbf{W}(\mathbf{x}; \mathbf{p})$ denote the parameterized set of allowed warps, where $\mathbf{p} = (p_1, \ldots, p_n)^T$ is a vector of parameters. The warp $\mathbf{W}(\mathbf{x}; \mathbf{p})$ takes the pixel $\mathbf{x}$ in the coordinate frame of the template $T$ and maps it to the sub-pixel location $\mathbf{W}(\mathbf{x}; \mathbf{p})$ in the coordinate frame of the image $I$. If we are computing optical flow, for example, the warps $\mathbf{W}(\mathbf{x}; \mathbf{p})$ might be the translations:

$$\mathbf{W}(\mathbf{x}; \mathbf{p}) \;=\; \begin{pmatrix} x + p_1 \\ y + p_2 \end{pmatrix} \qquad (1)$$
where the vector of parameters $\mathbf{p} = (p_1, p_2)^T$ is then the optical flow. If we are tracking a larger image patch moving in 3D we may instead consider the set of affine warps:

$$\mathbf{W}(\mathbf{x}; \mathbf{p}) \;=\; \begin{pmatrix} (1 + p_1)\,x + p_3\,y + p_5 \\ p_2\,x + (1 + p_4)\,y + p_6 \end{pmatrix} \;=\; \begin{pmatrix} 1 + p_1 & p_3 & p_5 \\ p_2 & 1 + p_4 & p_6 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \qquad (2)$$
where there are 6 parameters $\mathbf{p} = (p_1, p_2, p_3, p_4, p_5, p_6)^T$ as, for example, was done in [3]. (There are other ways to parameterize affine warps. In Part 2 of this framework we will investigate which is the best way.) In general, the number of parameters $n$ may be arbitrarily large and $\mathbf{W}(\mathbf{x}; \mathbf{p})$ can be arbitrarily complex. One example of a complex warp is the set of piecewise affine warps used in Active Appearance Models [6], Active Blobs [13], and Flexible Appearance Models [2].
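To make the two parameterizations concrete, here is a minimal sketch of Equations (1) and (2) as functions mapping template coordinates to image coordinates. The function names and the convention of storing pixel coordinates as rows of an (N, 2) array are our own choices for illustration:

```python
import numpy as np

def warp_translation(x, p):
    """Equation (1): W(x; p) = (x + p1, y + p2)^T.
    x is an (N, 2) array of (x, y) coordinates; p = (p1, p2)."""
    return x + p

def warp_affine(x, p):
    """Equation (2): affine warp with parameters p = (p1, ..., p6)."""
    A = np.array([[1.0 + p[0], p[2], p[4]],
                  [p[1], 1.0 + p[3], p[5]]])
    # Append a homogeneous 1 to each coordinate, then apply the 2x3 matrix.
    xh = np.hstack([x, np.ones((x.shape[0], 1))])
    return xh @ A.T
```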
2.1 Goal of the Lucas-Kanade Algorithm
The goal of the Lucas-Kanade algorithm is to minimize the sum of squared error between two images, the template $T$ and the image $I$ warped back onto the coordinate frame of the template:

$$\sum_{\mathbf{x}} \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p})) - T(\mathbf{x}) \right]^2. \qquad (3)$$
Warping $I$ back to compute $I(\mathbf{W}(\mathbf{x}; \mathbf{p}))$ requires interpolating the image $I$ at the sub-pixel locations $\mathbf{W}(\mathbf{x}; \mathbf{p})$. The minimization in Equation (3) is performed with respect to $\mathbf{p}$ and the sum is performed over all of the pixels $\mathbf{x}$ in the template image $T(\mathbf{x})$. Minimizing the expression in Equation (3) is a non-linear optimization task even if $\mathbf{W}(\mathbf{x}; \mathbf{p})$ is linear in $\mathbf{p}$ because the pixel values $I(\mathbf{x})$ are, in general, non-linear in $\mathbf{x}$. In fact, the pixel values $I(\mathbf{x})$ are essentially unrelated to the pixel coordinates $\mathbf{x}$. To optimize the expression in Equation (3), the Lucas-Kanade algorithm assumes that a current estimate of $\mathbf{p}$ is known and then iteratively solves for increments to the parameters $\Delta\mathbf{p}$; i.e. the following expression is (approximately) minimized:

$$\sum_{\mathbf{x}} \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p} + \Delta\mathbf{p})) - T(\mathbf{x}) \right]^2 \qquad (4)$$

with respect to $\Delta\mathbf{p}$, and then the parameters are updated:

$$\mathbf{p} \;\leftarrow\; \mathbf{p} + \Delta\mathbf{p}. \qquad (5)$$

These two steps are iterated until the estimates of the parameters $\mathbf{p}$ converge. Typically the test for convergence is whether some norm of the vector $\Delta\mathbf{p}$ is below a threshold $\epsilon$; i.e. $\|\Delta\mathbf{p}\| \leq \epsilon$.
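A sketch of Equation (3) and the sub-pixel sampling it requires might look as follows, using bilinear interpolation via `scipy.ndimage.map_coordinates`; the helper names and the row-major enumeration of template pixels are our own assumptions:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(I, x, p, warp):
    """Compute I(W(x; p)) by sampling I at the warped sub-pixel locations.
    x is an (N, 2) array enumerating T's pixels in row-major order."""
    wx = warp(x, p)
    # map_coordinates expects (row, col) = (y, x) ordering; order=1 is bilinear.
    return map_coordinates(I, [wx[:, 1], wx[:, 0]], order=1)

def ssd_error(I, T, x, p, warp):
    """Equation (3): the sum over x of [I(W(x; p)) - T(x)]^2."""
    r = warp_image(I, x, p, warp) - T.ravel()
    return np.sum(r ** 2)
```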
2.2 Derivation of the Lucas-Kanade Algorithm
The Lucas-Kanade algorithm (which is a Gauss-Newton gradient descent non-linear optimization algorithm) is then derived as follows. The non-linear expression in Equation (4) is linearized by performing a first order Taylor expansion on $I(\mathbf{W}(\mathbf{x}; \mathbf{p} + \Delta\mathbf{p}))$ to give:

$$\sum_{\mathbf{x}} \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p})) + \nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \, \Delta\mathbf{p} - T(\mathbf{x}) \right]^2. \qquad (6)$$
In this expression, $\nabla I = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right)$ is the gradient of image $I$ evaluated at $\mathbf{W}(\mathbf{x}; \mathbf{p})$; i.e. $\nabla I$ is computed in the coordinate frame of $I$ and then warped back onto the coordinate frame of $T$ using the current estimate of the warp $\mathbf{W}(\mathbf{x}; \mathbf{p})$. The term $\frac{\partial \mathbf{W}}{\partial \mathbf{p}}$ is the Jacobian of the warp. If $\mathbf{W}(\mathbf{x}; \mathbf{p}) = (W_x(\mathbf{x}; \mathbf{p}), W_y(\mathbf{x}; \mathbf{p}))^T$ then:

$$\frac{\partial \mathbf{W}}{\partial \mathbf{p}} \;=\; \begin{pmatrix} \frac{\partial W_x}{\partial p_1} & \frac{\partial W_x}{\partial p_2} & \cdots & \frac{\partial W_x}{\partial p_n} \\[4pt] \frac{\partial W_y}{\partial p_1} & \frac{\partial W_y}{\partial p_2} & \cdots & \frac{\partial W_y}{\partial p_n} \end{pmatrix}. \qquad (7)$$
We follow the notational convention that the partial derivatives with respect to a column vector are laid out as a row vector. This convention has the advantage that the chain rule results in a matrix multiplication, as in Equation (6). For example, the affine warp in Equation (2) has the Jacobian:

$$\frac{\partial \mathbf{W}}{\partial \mathbf{p}} \;=\; \begin{pmatrix} x & 0 & y & 0 & 1 & 0 \\ 0 & x & 0 & y & 0 & 1 \end{pmatrix}. \qquad (8)$$
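Stacked over all N template pixels, the Jacobian of Equation (8) can be tabulated once, since for the affine warp it does not depend on p. This is a sketch under the same (N, 2) coordinate convention as above:

```python
import numpy as np

def jacobian_affine(x):
    """Equation (8): the 2x6 Jacobian dW/dp of the affine warp at each pixel.
    x is an (N, 2) array; returns an (N, 2, 6) array."""
    N = x.shape[0]
    J = np.zeros((N, 2, 6))
    J[:, 0, 0] = x[:, 0]   # dWx/dp1 = x
    J[:, 0, 2] = x[:, 1]   # dWx/dp3 = y
    J[:, 0, 4] = 1.0       # dWx/dp5 = 1
    J[:, 1, 1] = x[:, 0]   # dWy/dp2 = x
    J[:, 1, 3] = x[:, 1]   # dWy/dp4 = y
    J[:, 1, 5] = 1.0       # dWy/dp6 = 1
    return J
```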
Equation (6) is a least squares problem and has a closed form solution which can be derived as follows. The partial derivative of the expression in Equation (6) with respect to $\Delta\mathbf{p}$ is:

$$2 \sum_{\mathbf{x}} \left[ \nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^T \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p})) + \nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \, \Delta\mathbf{p} - T(\mathbf{x}) \right] \qquad (9)$$
where we refer to $\nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}}$ as the steepest descent images. (See Section 4.3 for why.) Setting this expression to equal zero and solving gives the closed form solution of Equation (6) as:

$$\Delta\mathbf{p} \;=\; H^{-1} \sum_{\mathbf{x}} \left[ \nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^T \left[ T(\mathbf{x}) - I(\mathbf{W}(\mathbf{x}; \mathbf{p})) \right] \qquad (10)$$

where $H$ is the $n \times n$ (Gauss-Newton approximation to the) Hessian matrix:

$$H \;=\; \sum_{\mathbf{x}} \left[ \nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^T \left[ \nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]. \qquad (11)$$
For reasons that will become clear later we refer to $\sum_{\mathbf{x}} \left[ \nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^T \left[ T(\mathbf{x}) - I(\mathbf{W}(\mathbf{x}; \mathbf{p})) \right]$ as the steepest descent parameter updates. Equation (10) then expresses the fact that the parameter updates $\Delta\mathbf{p}$ are the steepest descent parameter updates multiplied by the inverse of the Hessian matrix.
The Lucas-Kanade Algorithm

Iterate:

(1) Warp $I$ with $\mathbf{W}(\mathbf{x}; \mathbf{p})$ to compute $I(\mathbf{W}(\mathbf{x}; \mathbf{p}))$
(2) Compute the error image $T(\mathbf{x}) - I(\mathbf{W}(\mathbf{x}; \mathbf{p}))$
(3) Warp the gradient $\nabla I$ with $\mathbf{W}(\mathbf{x}; \mathbf{p})$
(4) Evaluate the Jacobian $\frac{\partial \mathbf{W}}{\partial \mathbf{p}}$ at $(\mathbf{x}; \mathbf{p})$
(5) Compute the steepest descent images $\nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}}$
(6) Compute the Hessian matrix using Equation (11)
(7) Compute $\sum_{\mathbf{x}} \left[ \nabla I \, \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^T \left[ T(\mathbf{x}) - I(\mathbf{W}(\mathbf{x}; \mathbf{p})) \right]$
(8) Compute $\Delta\mathbf{p}$ using Equation (10)
(9) Update the parameters $\mathbf{p} \leftarrow \mathbf{p} + \Delta\mathbf{p}$

until $\|\Delta\mathbf{p}\| \leq \epsilon$

Figure 1: The Lucas-Kanade algorithm [11] consists of iteratively applying Equations (10) & (5) until the estimates of the parameters $\mathbf{p}$ converge. Typically the test for convergence is whether some norm of the vector $\Delta\mathbf{p}$ is below a user specified threshold $\epsilon$. Because the gradient $\nabla I$ must be evaluated at $\mathbf{W}(\mathbf{x}; \mathbf{p})$ and the Jacobian $\frac{\partial \mathbf{W}}{\partial \mathbf{p}}$ must be evaluated at $\mathbf{p}$, all 9 steps must be repeated in every iteration of the algorithm.
The Lucas-Kanade algorithm [11] then consists of iteratively applying Equations (10) and (5). See Figures 1 and 2 for a summary. Because the gradient $\nabla I$ must be evaluated at $\mathbf{W}(\mathbf{x}; \mathbf{p})$ and the Jacobian $\frac{\partial \mathbf{W}}{\partial \mathbf{p}}$ at $\mathbf{p}$, they both in general depend on $\mathbf{p}$. For some simple warps such as the translations in Equation (1) and the affine warps in Equation (2) the Jacobian can sometimes be constant; see for example Equation (8). In general, however, all 9 steps of the algorithm must be repeated in every iteration because the estimates of the parameters $\mathbf{p}$ vary from iteration to iteration.
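The nine steps can be assembled into a short Gauss-Newton loop. The sketch below follows Figure 1 step by step; it reuses the hypothetical `warp_image` helper from the earlier snippet, takes the Jacobian as a callable (e.g. `jacobian_affine`), and makes no claim to match the authors' original implementation:

```python
import numpy as np

def lucas_kanade(I, T, x, p, warp, jacobian, eps=1e-4, max_iters=50):
    """Gauss-Newton Lucas-Kanade: iterate Equations (10) and (5)."""
    Iy, Ix = np.gradient(I)          # gradients in I's coordinate frame
    for _ in range(max_iters):
        # Steps 1-2: warp I and compute the error image T(x) - I(W(x; p)).
        err = T.ravel() - warp_image(I, x, p, warp)
        # Step 3: warp the gradient of I with W(x; p).
        grad = np.stack([warp_image(Ix, x, p, warp),
                         warp_image(Iy, x, p, warp)], axis=1)   # (N, 2)
        # Step 4: evaluate the Jacobian dW/dp at (x; p).
        J = jacobian(x)                                          # (N, 2, n)
        # Step 5: steepest descent images grad(I) dW/dp, one row per pixel.
        sd = np.einsum('ni,nij->nj', grad, J)                    # (N, n)
        # Step 6: Gauss-Newton Hessian of Equation (11).
        H = sd.T @ sd
        # Step 7: steepest descent parameter updates.
        b = sd.T @ err
        # Steps 8-9: solve Equation (10) and apply the update of Equation (5).
        dp = np.linalg.solve(H, b)
        p = p + dp
        if np.linalg.norm(dp) < eps:
            break
    return p
```

As a usage note, with the affine warp `x` would enumerate the template pixels in row-major order and the call would be `lucas_kanade(I, T, x, p0, warp_affine, jacobian_affine)` for some initial guess `p0`.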
2.3 Requirements on the Set of Warps
The only requirement on the warps $\mathbf{W}(\mathbf{x}; \mathbf{p})$ is that they are differentiable with respect to the warp parameters $\mathbf{p}$. This condition is required to compute the Jacobian $\frac{\partial \mathbf{W}}{\partial \mathbf{p}}$. Normally the warps are also (piecewise) differentiable with respect to $\mathbf{x}$, but even this condition is not strictly required.
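Differentiability with respect to p is easy to verify numerically. A quick central finite-difference check of an analytic Jacobian against its warp, using the hypothetical helpers sketched above, might look like:

```python
import numpy as np

def check_jacobian(warp, jacobian, x, p, h=1e-6):
    """Return the max deviation between the analytic Jacobian dW/dp
    and a central finite-difference estimate."""
    J = jacobian(x)                                  # (N, 2, n)
    J_fd = np.zeros_like(J)
    for k in range(p.size):
        dp = np.zeros_like(p)
        dp[k] = h
        # Perturb the k-th parameter and difference the warped coordinates.
        J_fd[:, :, k] = (warp(x, p + dp) - warp(x, p - dp)) / (2 * h)
    return np.max(np.abs(J - J_fd))
```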
2.4 Computational Cost of the Lucas-Kanade Algorithm
Assume that the number of warp parameters is $n$ and the number of pixels in $T$ is $N$. Step 1 of the Lucas-Kanade algorithm usually takes time $O(nN)$. For each pixel $\mathbf{x}$ in $T$ we compute $\mathbf{W}(\mathbf{x}; \mathbf{p})$ and then sample $I$ at that location. The computational cost of computing $\mathbf{W}(\mathbf{x}; \mathbf{p})$ depends on $\mathbf{W}$ but for most warps the cost is $O(n)$ per pixel. Step 2 takes time $O(N)$. Step 3 takes the same time as Step 1, usually $O(nN)$. Computing the Jacobian in Step 4 also depends on $\mathbf{W}$ but for most warps the cost is $O(n)$ per pixel. The total cost of Step 4 is therefore $O(nN)$. Step 5 takes time $O(nN)$, Step 6 takes time $O(n^2 N)$, and Step 7 takes time $O(nN)$. Step 8 takes time $O(n^3)$ to invert the $n \times n$ Hessian matrix and multiply the result by the steepest descent parameter updates.