Int J Comput Vis (2008) 77: 125–141
DOI 10.1007/s11263-007-0075-7
Incremental Learning for Robust Visual Tracking
David A. Ross · Jongwoo Lim · Ruei-Sung Lin · Ming-Hsuan Yang
Received: 6 September 2005 / Accepted: 17 July 2007 / Published online: 17 August 2007
© Springer Science+Business Media, LLC 2007
Abstract Visual tracking, in essence, deals with non-stationary image streams that change over time. While most existing algorithms are able to track objects well in controlled environments, they usually fail in the presence of significant variation of the object's appearance or surrounding illumination. One reason for such failures is that many algorithms employ fixed appearance models of the target. Such models are trained using only appearance data available before tracking begins, which in practice limits the range of appearances that are modeled, and ignores the large volume of information (such as shape changes or specific lighting conditions) that becomes available during tracking. In this paper, we present a tracking method that incrementally learns a low-dimensional subspace representation, efficiently adapting online to changes in the appearance of the target. The model update, based on incremental algorithms for principal component analysis, includes two important features: a method for correctly updating the sample mean, and a forgetting factor to ensure less modeling power is expended fitting older observations. Both of these features contribute measurably to improving overall tracking performance. Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large changes in pose, scale, and illumination.

D.A. Ross (✉)
University of Toronto, 10 King's College Road, Toronto, ON, M5S 3G4, Canada
e-mail: dross@cs.toronto.edu

J. Lim · M.-H. Yang
Honda Research Institute, 800 California Street, Mountain View, CA 94041, USA
J. Lim e-mail: jlim@honda-ri.com
M.-H. Yang e-mail: mhyang@ieee.org

R.-S. Lin
Motorola Labs, 1303 E Algonquin Rd., Schaumburg, IL 60196, USA
e-mail: ruei-sung.lin@motorola.com
Keywords Visual tracking · Subspace update · Online algorithms · Adaptive methods · Particle filter · Illumination
1 Introduction
Visual tracking essentially deals with non-stationary data,
both the target object and the background, that change over
time. Most existing algorithms are able to track objects,
either previously viewed or not, for short durations in well-controlled environments. However, these algorithms usually fail to track the object motion, or drift significantly, after some period of time, due to drastic change
in the object’s appearance or large lighting variation in its
surroundings. Although such situations can be ameliorated
with recourse to richer representations, effective prediction schemes, or a combination of the two, most algorithms typically operate on the premise that the model of the target object does
not change drastically over time. Examples abound, ranging
from representation methods based on view-based appear-
ance models (Black and Jepson 1996), contours (Isard and
Blake 1996), parametric templates of geometry and illumi-
nation (Hager and Belhumeur 1996), integration of shape
and color (Birchfield 1998), mixture models (Black et al.
1998), 3D models (La Cascia and Sclaroff 1999), exemplars
(Toyama and Blake 2001), foreground/background models
(Harville 2002); templates with updating (Matthews et al.
2004); prediction schemes using particle filters (Isard and
Blake 1996), joint probabilistic data association filters (Ras-
mussen and Hager 1998), kernel-based filters (Comaniciu
et al. 2003; Georgescu et al. 2004), support vector machines
(Avidan 2001; Williams et al. 2003) and variational infer-
ence (Vermaak et al. 2003). These algorithms usually build
or learn a model of the target object first and then use it for
tracking, without adapting the model to account for changes
in the appearance of the object, e.g., large variation of pose
or facial expression, or the surroundings, e.g., lighting varia-
tion. Furthermore, it is assumed that all images are acquired
with a stationary camera. Such an approach, in our view, is
prone to performance instability, an issue that needs to be addressed
when building a robust visual tracker.
The chief challenge of visual tracking can be attributed
to the difficulty in handling the appearance variability of a
target object. Intrinsic appearance variability includes pose
variation and shape deformation, whereas extrinsic factors, including illumination change, camera motion, camera viewpoint, and occlusion, inevitably cause large appearance variation. Due to the
nature of the tracking problem, it is imperative for a robust
algorithm to model such appearance variation.
In this paper we propose a method¹ that, during visual tracking, efficiently learns and updates a low-dimensional
subspace representation of the target object. The advantages
of this adaptive subspace representation are severalfold.
The subspace representation provides a compact notion of
the “thing” being tracked rather than treating the target as a
set of independent pixels, i.e., “stuff” (Adelson and Bergen
1991), and facilitates object recognition. An efficient incre-
mental method continually updates the subspace model to
reflect changes in appearance caused by intrinsic and extrin-
sic factors, thereby facilitating the tracking process. Incre-
mentally updating the subspace removes the offline learning
phase required by other eigentrackers, allowing one to track
objects for which a database of training images is not even
available. To estimate the locations of the target objects in
consecutive frames, we use a sampling algorithm with like-
lihood estimates, which is in contrast to other tracking meth-
ods that usually solve complex optimization problems using
gradient descent. Furthermore, while numerous algorithms
operate under the assumption that there is no camera motion,
our method is able to track objects without this constraint.
The remaining part of this paper is organized as follows.
We begin, in the next section, by reviewing the most relevant
algorithms that motivated this work. The details of our algo-
rithm are described in Sect. 3, where we propose an efficient
incremental subspace method with a mean update and for-
getting factor, followed by an effective tracking algorithm.
¹ Preliminary results of this paper were presented in Ross et al. (2004) and Lim et al. (2005).
The results of numerous experiments and performance eval-
uation are presented in Sect. 4. We conclude this paper with
remarks on potential extensions for future work. The data,
source code, and videos corresponding to this work can all
be found at http://www.cs.toronto.edu/~dross/ivt/.
2 Related Work and Motivation
There is a rich literature in visual tracking and a thorough
discussion on this topic is beyond the scope of this paper. In
this section we review only the most relevant visual track-
ing work, focusing on algorithms that operate directly on
grayscale images. We contrast our method with these meth-
ods in terms of their representation scheme, target prediction
approach, and their ability to handle changes in illumination
as well as appearance.
Visual tracking problems have conventionally been for-
mulated as prediction tasks within which fixed templates and
optical flow techniques are utilized to estimate the motion of
a target object (Lucas and Kanade 1981). Such approaches
do not take the appearance variability into consideration, and
thus perform well only over short periods of time. To en-
hance the robustness of such object trackers, Black and Jep-
son proposed an algorithm using a pre-trained view-based
eigenbasis representation and a robust error norm (Black
and Jepson 1996). Instead of relying on the brightness constancy principle assumed in optical flow techniques, they
advocated the use of a subspace constancy assumption for
motion estimation. Although their algorithm demonstrated
excellent empirical results, it entailed learning a set of view-
based eigenbases before the tracking task began. To achieve
robust visual tracking with this method, it is imperative to
collect a large set of training images covering the range of
possible appearance variation (including viewing angles and
illumination) from which to construct the eigenbasis, as this
representation, once learned, is not updated.
Observing that low-dimensional linear subspaces are able to effectively model image variation due to illumination (Belhumeur and Kriegman 1997), Hager and Belhumeur developed a tracking algorithm to handle the appearance variation caused by illumination and pose change using parametric models (Hager and Belhumeur 1996). Their method
extends a gradient-based optical flow algorithm by incor-
porating low-dimensional representations (Belhumeur and Kriegman 1997) for object tracking under varying illumination conditions. Before tracking begins, a set of illumination
bases needs to be constructed at a fixed pose in order to ac-
count for changes in appearance due to lighting variation.
However, this basis does not attempt to account for changes
in pose such as out-of-plane rotations.
Realizing the limitations of having a single (unimodal or
Gaussian) hypothesis of target location at each timestep—
as produced by the Kalman filter and its relatives—Isard
and Blake introduced particle filters to visual tracking and
presented the Condensation algorithm for contour track-
ing in which multiple plausible interpretations are propa-
gated over time (Isard and Blake 1996). This probabilis-
tic approach has demonstrated success in tracking the out-
line of target objects in clutter. However, the representa-
tion scheme employed (curves or splines) ignores the inter-
nal appearance of the target, and is not updated to account
for variations in its appearance, due to pose or illumination
change.
Supervised discriminative methods for classification and
regression have also been exploited to solve visual tracking
problems. For example, Avidan (2001) developed a tracking
algorithm that employs the support vector machine (SVM)
classifier within an optical flow framework. Avidan modified
the conventional use of the SVM classification score to in-
stead predict target location, by computing image gradi-
ents as is done in optical flow algorithms. Although this al-
gorithm has demonstrated success in tracking specific objects, e.g., cars viewed from a camera mounted in a moving vehicle, significant effort is required in training an SVM. Along
similar lines, Williams et al. developed a method in which
an SVM-based regressor was used for tracking (Williams
et al. 2003). Instead of relying on optical flow to predict
object location, they learned a perturbation function of spa-
tial in-plane displacements between frames, thereby predict-
ing the most likely object location. As a result of train-
ing the regressor on in-plane image motion, this method
is not effective in tracking objects with out-of-plane move-
ments.
Mixture models have been studied as alternatives to lin-
ear representations, to better account for appearance change
in motion estimation. Black et al. (1998) identified four pos-
sible factors causing appearance change, fitting them with a
mixture model which was then used to estimate image mo-
tion. A more elaborate mixture model fit via an online EM
algorithm was recently proposed by Jepson et al. (2003), in
which three components were used to model the responses
of wavelet filters, and thereby account for appearance vari-
ation during tracking. Their method is able to handle vari-
ations in pose, illumination and expression. However, their
appearance model treats pixels within the target region inde-
pendently (ignoring their covariance) and thus has no notion of the "thing" being tracked. This can result in modeling the background rather than the foreground, thereby failing
to track the target object (Jepson et al. 2003).
An attempt to improve the classic Lucas–Kanade tracker (Lucas and Kanade 1981) with online updates was recently made by Matthews et al. (2004). They developed a template update method for visual tracking, which employs an active
appearance model (Cootes et al. 2001) to account for im-
age variation. Thus instead of using a fixed template, the
object appearance is modeled by a linear combination of ap-
pearance images. The tracking problem is then formulated
as a search (using gradient descent) for the affine parame-
ters and linear combination which minimize the difference
between the target object and the current appearance model.
The newly tracked object is then used to update the appearance
model, as necessary. They demonstrated good tracking re-
sults on vehicles and faces with varying expressions. How-
ever, the authors noted that the computational cost of updating the template increases dramatically, as principal component analysis is carried out at each update, and that their
work covers the case where the visibility of the target object
does not change.
Our work is motivated in part by the prowess of subspace
representations as appearance models (Murase and Nayar
1995; Belhumeur and Kriegman 1997), the effectiveness of
particle filters (Isard and Blake 1996), and the adaptability
of on-line update schemes (Jepson et al. 2003). In contrast to
the eigentracking algorithm (Black and Jepson 1996), our
algorithm does not require a training phase but learns the
eigenbases on-line during the object tracking process. Thus
our appearance model can adapt to changes in pose, view
angle, and illumination not captured by the set of training
images—in fact the need to manually collect training im-
ages prior to tracking is eliminated. Further, our method uses
a particle filter for motion parameter estimation rather than
the gradient descent method, which often gets stuck in local
minima or is distracted by outliers (Black and Jepson 1996).
Our appearance-based model provides a richer description
than simple curves or splines as used in (Isard and Blake
1996), and has a notion of the “thing” being tracked. In ad-
dition, the learned representation can be utilized for other
tasks such as object recognition. With respect to the template
update method (Cootes et al. 2001), we concurrently devel-
oped an efficient subspace update algorithm that facilitates
object tracking under varying pose and lighting conditions.
Furthermore, our algorithm is able to handle camera motion
while learning compact representations and tracking objects.
In this work, an eigenbasis representation is learned directly
from pixel values corresponding to a target object in the
image space. Experiments show that good tracking results
can be obtained using this representation without employing
more complicated wavelet features as in Jepson et al. (2003),
although this elaboration is still possible and may lead to
even better results. Note also that the view-based eigenba-
sis representation has demonstrated its ability to model the
appearance of objects in different poses (Murase and Nayar
1995), and under different lighting conditions (Belhumeur and Kriegman 1997). Consequently, the learned eigenbasis
facilitates tracking objects undergoing illumination and pose
change.
3 Incremental Learning for Tracking
We present details of the proposed incremental learning al-
gorithm for object tracking in this section. First we propose
an efficient method that incrementally updates an eigenbasis
as new observations arrive, which is used to learn the ap-
pearance of the target while tracking progresses. Next we
describe our approach for drawing particles in the motion
parameter space and predicting the most likely object loca-
tion with the help of the learned appearance model. Collec-
tively, we show how these two modules work in tandem to
track objects well under varying conditions.
3.1 Incremental Update of Eigenbasis and Mean
The appearance of a target object may change drastically
due to intrinsic and extrinsic factors as discussed earlier.
Therefore, to produce a robust tracker, it is important to
adapt the appearance model online, while tracking, to re-
flect these changes. The appearance model we have chosen,
an eigenbasis, is typically learned off-line from a set of training images $\{I_1, \ldots, I_n\}$, by computing the eigenvectors $U$ of the sample covariance matrix $\frac{1}{n-1}\sum_{i=1}^{n}(I_i - \bar{I})(I_i - \bar{I})^\top$, where $\bar{I} = \frac{1}{n}\sum_{i=1}^{n} I_i$ is the sample mean of the training images. Equivalently, one can obtain $U$ by computing the singular value decomposition $U\Sigma V^\top$ of the centered data matrix $[(I_1 - \bar{I}) \cdots (I_n - \bar{I})]$, whose columns are the respective training images minus their mean.
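As a quick numerical illustration of this equivalence (not from the paper; random data stands in for the vectorized images and the dimensions are arbitrary), the left singular vectors of the centered data matrix diagonalize the scatter matrix:

```python
import numpy as np

# Batch PCA via SVD of the centered data matrix, as described above.
# A is d x n with one (vectorized) image per column; random data here.
rng = np.random.default_rng(0)
d, n = 20, 8
A = rng.standard_normal((d, n))

I_bar = A.mean(axis=1, keepdims=True)            # sample mean of the images
U, S, Vt = np.linalg.svd(A - I_bar, full_matrices=False)

# U contains the eigenvectors of the scatter matrix (and hence of the
# sample covariance, up to the 1/(n-1) scale): scatter = U diag(S^2) U^T.
scatter = (A - I_bar) @ (A - I_bar).T
print(np.allclose(U @ np.diag(S**2) @ U.T, scatter))
```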
Adapting the appearance model to account for novel views of the target can be thought of as retraining the eigenbasis with an additional $m$ images $\{I_{n+1}, \ldots, I_{n+m}\}$, for some value of $m$. Naively, this update could be performed by computing the singular value decomposition $U'\Sigma'V'^\top$ of the augmented (centered) data matrix $[(I_1 - \bar{I}') \cdots (I_{n+m} - \bar{I}')]$, where $\bar{I}'$ is the average of all $n + m$ training images.
Unfortunately this approach is unsatisfactory for online
applications, like visual tracking, due to its storage and com-
putational requirements. First, the naive approach uses the
entire set of training images for each update. If an update is
made at each video frame, then the number of images which
must be retained grows linearly with the length of the se-
quence. Second, the cost of computing the mean and singu-
lar value decomposition grows with the number of images,
so the algorithm will run ever slower as time progresses. In-
stead, the requirements of our application dictate that any
algorithm for updating the mean and eigenbasis must have
storage and computational requirements that are constant,
regardless of the number of images seen so far.
Numerous more sophisticated algorithms have been developed to efficiently update an eigenbasis as more data
arrive (Golub and Van Loan 1996; Hall et al. 1998; Levy and Lindenbaum 2000; Brand 2002). However, most methods assume the sample mean is fixed when updating the
eigenbasis, or equivalently that the data is inherently zero-
mean. Neither assumption is appropriate in our application.
An exception is the method by Hall et al. (2002), which
does consider the change of the mean as each new datum
arrives. Although similar to our (independently-developed)
algorithm, it lacks the forgetting factor, which hurts its suit-
ability for tracking, and has a greater computational cost.
(Both of these disadvantages are demonstrated quantitatively in Sect. 4.3.) Part of the additional complexity arises because Hall's algorithm, which is based on the notion of adding eigenspaces, must compute the eigenvalue decomposition of each block of new data as it arrives. In this respect our algorithm is simpler, since it incorporates new data directly, without this additional step.
Here we extend one of these efficient update proce-
dures—the Sequential Karhunen–Loeve (SKL) algorithm of
Levy and Lindenbaum (2000)—presenting a new incremen-
tal PCA algorithm that correctly updates the eigenbasis as
well as the mean, given one or more additional training data.
Our algorithm, a variation of which was first presented in
Lim et al. (2005), has also been applied to algorithms where
the subspace mean plays an important role. For example, it
can be applied to adaptively update the between-class and
within-class covariance matrices used in Fisher linear dis-
criminant analysis (Lin et al. 2005). We begin with a sum-
mary of the SKL algorithm, then describe our new incre-
mental PCA algorithm, and follow with a discussion of a
forgetting factor which can be used to down-weight the ef-
fect of earlier observations on the PCA model.
Putting aside for the moment the problem of the sample mean, suppose we have a $d \times n$ data matrix $A = [I_1, \ldots, I_n]$, where each column $I_i$ is an observation (a $d$-dimensional image vector in this paper), for which we have already computed the singular value decomposition $A = U\Sigma V^\top$. When a $d \times m$ matrix $B$ of new observations is available, the goal is to efficiently compute the SVD of the concatenation of $A$ and $B$: $[A \; B] = U'\Sigma'V'^\top$. Letting $\tilde{B}$ be the component of $B$ orthogonal to $U$, we can express the concatenation of $A$ and $B$ in a partitioned form as follows:

$$[A \; B] = [U \; \tilde{B}] \begin{bmatrix} \Sigma & U^\top B \\ 0 & \tilde{B}^\top B \end{bmatrix} \begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix}^\top. \tag{1}$$
Let $R = \begin{bmatrix} \Sigma & U^\top B \\ 0 & \tilde{B}^\top B \end{bmatrix}$, which is a square matrix of size $k + m$, where $k$ is the number of singular values in $\Sigma$. The SVD of $R$, $R = \tilde{U}\tilde{\Sigma}\tilde{V}^\top$, can be computed in constant time regardless of $n$, the initial number of data. Now the SVD of $[A \; B]$ can be expressed as

$$[A \; B] = \big([U \; \tilde{B}]\,\tilde{U}\big) \, \tilde{\Sigma} \, \left( \tilde{V}^\top \begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix}^\top \right).$$

Since an incremental PCA is only interested in computing $U'$ and $\Sigma'$, the matrix $V'$, whose size scales with the number of observed data, need not be computed. Thus we arrive at the following formulation of the SKL algorithm.
Given $U$ and $\Sigma$ from the SVD of $A$, compute $U'$ and $\Sigma'$ from the SVD of $[A \; B]$:

1. Obtain $\tilde{B}$ and $R$ by taking the QR decomposition of $[U\Sigma \; B]$: $[U \; \tilde{B}]\,R \overset{\mathrm{QR}}{=} [U\Sigma \; B]$.
2. Compute the SVD of $R$: $R \overset{\mathrm{SVD}}{=} \tilde{U}\tilde{\Sigma}\tilde{V}^\top$.
3. Finally $U' = [U \; \tilde{B}]\,\tilde{U}$ and $\Sigma' = \tilde{\Sigma}$. If the desired number of basis vectors in $U'$ is less than the number of non-zero singular values, then these excess vectors and singular values may be discarded.
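The steps above can be sketched with NumPy (a sketch only: zero-mean data, no basis truncation, and arbitrary illustrative dimensions). The singular values recovered incrementally match those of a direct SVD of $[A \; B]$:

```python
import numpy as np

# SKL update sketch: given U, S from the SVD of A, fold in new
# observations B without revisiting A. Random data; no truncation.
rng = np.random.default_rng(1)
d, n, m = 30, 10, 4
A = rng.standard_normal((d, n))
B = rng.standard_normal((d, m))
U, S, _ = np.linalg.svd(A, full_matrices=False)

# Step 1: QR decomposition of [U*Sigma  B] yields [U  B_tilde] and R.
Q, R = np.linalg.qr(np.hstack([U * S, B]))       # U * S == U @ diag(S)
# Step 2: SVD of the small (k+m) x (k+m) matrix R.
U_t, S_t, _ = np.linalg.svd(R)
# Step 3: rotate back into image space: U' = [U B_tilde] U_tilde.
U_new = Q @ U_t

# The singular values agree with a batch SVD of the concatenation.
S_direct = np.linalg.svd(np.hstack([A, B]), compute_uv=False)
print(np.allclose(S_t, S_direct))
```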
The algorithm can also be made slightly faster, although somewhat more complicated, by modifying the arrangement of calculations in Step 1. Instead of computing the QR decomposition of $[U\Sigma \; B]$, $\tilde{B}$ and $R$ can be obtained directly as follows: $\tilde{B} = \mathrm{orth}(B - UU^\top B)$ and

$$R = \begin{bmatrix} \Sigma & U^\top B \\ 0 & \tilde{B}^\top (B - UU^\top B) \end{bmatrix},$$

where $\mathrm{orth}()$ performs orthogonalization, perhaps via QR. This reorganization, which follows from (1), avoids performing QR on the entire matrix $[U\Sigma \; B]$ (note that the columns corresponding to $U$ are already orthogonal), instead orthogonalizing only $(B - UU^\top B)$, the component of $B$ not already in the subspace $U$.
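In NumPy terms, the reorganized computation looks like the following sketch (same kind of illustrative random-data setup as before; `orth()` is realized here by an economy QR of the residual):

```python
import numpy as np

# Reorganized Step 1: orthogonalize only the component of B that lies
# outside span(U), instead of QR-factoring all of [U*Sigma  B].
rng = np.random.default_rng(2)
d, n, m = 30, 10, 4
A = rng.standard_normal((d, n))
B = rng.standard_normal((d, m))
U, S, _ = np.linalg.svd(A, full_matrices=False)

proj = U.T @ B                                   # U^T B
residual = B - U @ proj                          # B - U U^T B
B_tilde, _ = np.linalg.qr(residual)              # orth() via economy QR

# R = [[Sigma, U^T B], [0, B_tilde^T (B - U U^T B)]]
k = len(S)
R = np.block([[np.diag(S), proj],
              [np.zeros((m, k)), B_tilde.T @ residual]])

# R has the same singular values as the concatenation [A B].
S_direct = np.linalg.svd(np.hstack([A, B]), compute_uv=False)
S_R = np.linalg.svd(R, compute_uv=False)
print(np.allclose(S_R, S_direct))
```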
The computational advantage of the SKL algorithm over the naive approach is that it has space and time complexity that is constant in $n$, the number of training data seen so far. Specifically, each update makes use of only the $k$ largest singular values and basis vectors from the previous stage. This, together with the storage required for the $m$ new images, reduces the space complexity to $O(d(k + m))$, down from $O(d(n + m)^2)$ with the naive approach. Similarly, the computational requirements are also reduced, to $O(dm^2)$ versus $O(d(n + m)^2)$ for recomputing the entire SVD. More details and complexity analysis of the SKL algorithm are described in Levy and Lindenbaum (2000).
The problem with the SKL algorithm as stated above is
that it makes no attempt to account for the sample mean
of the training data, which changes over time as new data
arrive. We will now show how this can be overcome. The
essence of the approach is, at each update of the eigenbasis,
to augment the new training data with an additional vector
carefully chosen to correct for the time-varying mean. We
begin by proving the following lemma:
Lemma 1 Let $A = [I_1, I_2, \ldots, I_n]$ and $B = [I_{n+1}, I_{n+2}, \ldots, I_{n+m}]$ be data matrices and $C = [A \; B]$ be their concatenation. Denote the means and scatter matrices of $A$, $B$, $C$ as $\bar{I}_A$, $\bar{I}_B$, $\bar{I}_C$, and $S_A$, $S_B$, $S_C$ respectively. It can be shown that

$$S_C = S_A + S_B + \frac{nm}{n+m} (\bar{I}_B - \bar{I}_A)(\bar{I}_B - \bar{I}_A)^\top.$$

In this lemma, we define a scatter matrix to be the outer product of the centered data matrix, for example $S_B = \sum_{i=n+1}^{n+m} (I_i - \bar{I}_B)(I_i - \bar{I}_B)^\top$. Thus a scatter matrix differs from the sample covariance matrix by only a scalar multiple: $S_B = m\,\mathrm{cov}(B)$. The proof of this lemma appears in the Appendix.
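The lemma is easy to verify numerically; the following sketch (random data and arbitrary sizes, for illustration only) checks the scatter identity directly:

```python
import numpy as np

# Numerical check of Lemma 1: the scatter of C = [A B] equals
# S_A + S_B plus a rank-one term built from the difference of means.
rng = np.random.default_rng(3)
d, n, m = 6, 9, 5
A = rng.standard_normal((d, n))
B = rng.standard_normal((d, m))
C = np.hstack([A, B])

def scatter(X):
    Xc = X - X.mean(axis=1, keepdims=True)       # center the columns
    return Xc @ Xc.T

diff = (B.mean(axis=1) - A.mean(axis=1)).reshape(-1, 1)
S_C = scatter(A) + scatter(B) + (n * m) / (n + m) * (diff @ diff.T)
print(np.allclose(S_C, scatter(C)))
```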
From Lemma 1 we can see that the SVD of $(C - \bar{I}_C)$ is equal to the SVD of the horizontal concatenation of $(A - \bar{I}_A)$, $(B - \bar{I}_B)$, and one additional vector $\sqrt{\frac{nm}{n+m}}\,(\bar{I}_B - \bar{I}_A)$. (The slight abuse of notation $(A - \bar{I}_A)$ is meant as a shorthand for the matrix $[(I_1 - \bar{I}_A) \cdots (I_n - \bar{I}_A)]$.) This motivates our new algorithm, appearing in Fig. 1.
As can be seen, this algorithm shares the favorable com-
plexity of the SKL algorithm, incurring only a small con-
stant overhead to store, update, and correct for the changing
sample mean.
3.1.1 Forgetting Factor
In numerous vision applications it is desirable to focus more
on recently-acquired images and less on earlier observa-
tions. For example, when tracking a target with a chang-
ing appearing, it is likely that recent observations will be
more indicative of its appearance than would more distant
ones. Down-weighting the contribution of earlier observa-
tions also plays an important role in online learning. As time
progresses the observation history can become very large, to
the point of overwhelming the relative contribution of each
Given $U$ and $\Sigma$ from the SVD of $(A - \bar{I}_A)$, as well as $\bar{I}_A$, $n$, and $B$, compute $\bar{I}_C$ as well as $U'$ and $\Sigma'$ from the SVD of $(C - \bar{I}_C)$:

1. Compute the mean vectors $\bar{I}_B = \frac{1}{m}\sum_{i=n+1}^{n+m} I_i$ and $\bar{I}_C = \frac{n}{n+m}\bar{I}_A + \frac{m}{n+m}\bar{I}_B$.
2. Form the matrix $\hat{B} = \big[ (I_{n+1} - \bar{I}_B) \; \cdots \; (I_{n+m} - \bar{I}_B) \;\; \sqrt{\frac{nm}{n+m}}\,(\bar{I}_B - \bar{I}_A) \big]$.
3. Compute $\tilde{B} = \mathrm{orth}(\hat{B} - UU^\top\hat{B})$ and $R = \begin{bmatrix} \Sigma & U^\top\hat{B} \\ 0 & \tilde{B}^\top(\hat{B} - UU^\top\hat{B}) \end{bmatrix}$. Note that $\tilde{B}$ will be one column larger than in the SKL algorithm.
4. Compute the SVD of $R$: $R \overset{\mathrm{SVD}}{=} \tilde{U}\tilde{\Sigma}\tilde{V}^\top$.
5. Finally $U' = [U \; \tilde{B}]\,\tilde{U}$ and $\Sigma' = \tilde{\Sigma}$.

Fig. 1 The incremental PCA algorithm with mean update
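A minimal NumPy sketch of the algorithm in Fig. 1 (assuming no truncation of the basis and no forgetting factor; the random data and dimensions are arbitrary stand-ins) confirms that the incrementally updated singular values match a batch SVD of the centered concatenation:

```python
import numpy as np

# Incremental PCA with mean update (Fig. 1), sketched without basis
# truncation or a forgetting factor. Random data stands in for images.
rng = np.random.default_rng(4)
d, n, m = 25, 12, 5
A = rng.standard_normal((d, n))
B = rng.standard_normal((d, m))

mean_A = A.mean(axis=1, keepdims=True)
U, S, _ = np.linalg.svd(A - mean_A, full_matrices=False)

# Step 1: means of B and of the concatenation C = [A B].
mean_B = B.mean(axis=1, keepdims=True)
mean_C = (n * mean_A + m * mean_B) / (n + m)

# Step 2: centered new data, augmented with the mean-correction vector.
B_hat = np.hstack([B - mean_B,
                   np.sqrt(n * m / (n + m)) * (mean_B - mean_A)])

# Steps 3-5: the SKL-style update applied to B_hat.
proj = U.T @ B_hat
residual = B_hat - U @ proj
B_tilde, _ = np.linalg.qr(residual)
k = len(S)
R = np.block([[np.diag(S), proj],
              [np.zeros((B_hat.shape[1], k)), B_tilde.T @ residual]])
U_t, S_t, _ = np.linalg.svd(R)
U_new = np.hstack([U, B_tilde]) @ U_t            # updated eigenbasis

# Compare with a batch SVD of the centered concatenation.
C = np.hstack([A, B])
S_direct = np.linalg.svd(C - mean_C, compute_uv=False)
print(np.allclose(S_t[:len(S_direct)], S_direct))
```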