Visual Object Tracking Based on Combination of
Local Description and Global Representation
Li Sun and Guizhong Liu, Member, IEEE
Abstract—This paper presents a novel method for visual object
tracking based on the combination of local scale-invariant feature
transform (SIFT) description and global incremental principal
component analysis (PCA) representation in loosely constrained
conditions. The state of the object is defined by the position and shape of a parallelogram; that is, tracking results are given by locating the object in every frame with a parallelogram. The
whole method is constructed in the framework of a particle filter, which includes two models: the dynamic model and the observa-
tion model. In the dynamic model, particle states are predicted
with the help of local SIFT descriptors. Local key point matching
between successive frames based on SIFT descriptors provides an important cue for predicting particle states; thus, we can
efficiently spread particles in the neighborhood of the predicted
position. In the observation model, every particle is evaluated by
local key point-weighted incremental PCA representation, which describes the object more accurately by assigning larger weights to the pixels in the influence areas of key points. Moreover, by incorporating a dynamic forgetting factor, we update the PCA eigenvectors online according to the object states, which makes our method more adaptable to different situations.
Experimental results show that, compared with other state-of-the-art methods, the proposed method is especially robust under difficult conditions, such as strong motion of both the object and the background, large pose changes, and illumination changes.
Index Terms—Forgetting factor, object tracking, PCA, SIFT.
I. Introduction
For decades, visual object tracking has drawn many researchers’ attention and has become a hot research topic. It has been widely used in computer vision systems,
such as video surveillance [1], [2], activity analysis and
recognition [3], [4], human–computer interaction [5], [6], and
video retrieval and summarization [7], [8]. The goal of tracking
is to automatically locate the same object in each successive frame of a video sequence once it has been initialized. Although many
methods have already been proposed for different applications,
it is still a challenging task to develop a robust object tracking
algorithm for the following two main reasons. First, there often
exist pose changes and shape deformations of the object, which make it difficult to obtain stable and discriminative features from an image. Second, varying environmental conditions, including illumination variation, cluttered backgrounds, and even possible occlusions, also pose obstacles to the adaptability of the algorithm.
There are usually two major modules in an object tracking
system, which are the appearance description module and the
location determination module. The former module tells us what is being tracked by describing the appearance of the target with specific features. It provides the foundation of the system, since reliable and discriminative features are of extreme importance for object tracking. The description can be based either on global features or representations, such as a raw image patch [9] or histograms of pixel color [10], [11], or on local features or descriptions, such as an object contour [12], [13] or a collection of local key points [14], [15]. The latter module aims to determine the state of the object based on the description features; the state can be the location in the image plane, the curvature of the contour, or the speed of the object. It is essentially an estimation process based on the previous state and can be performed either at a low level (e.g., pixel-based optical flow [16] or block-based matching [17]) or at a high level (e.g., a dynamic state transition model [12], [18]). In a particle filter, these two modules correspond to the observation model and the dynamic model, respectively.
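To make the roles of these two models concrete, the following minimal sketch (Python with NumPy, not the implementation used in this paper) shows one particle filter step in which hypothetical dynamic_model and observation_model callables play the parts described above; the state is simplified to a plain vector rather than the parallelogram used here.

```python
import numpy as np

def particle_filter_step(particles, frame, dynamic_model, observation_model):
    # Dynamic model: predict each particle's new state from its previous state.
    predicted = np.array([dynamic_model(p) for p in particles])
    # Observation model: score how well each predicted state explains the current frame.
    weights = np.array([observation_model(p, frame) for p in predicted])
    weights /= weights.sum()
    # Resample so high-weight particles are duplicated and low-weight ones die out.
    idx = np.random.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[idx]
```

The tracking output for the frame is then typically taken as the weighted mean state or the state of the highest-weight particle.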
In this paper, we focus on both modules and design a compact object tracking scheme based on the combination of global representation by incremental principal component analysis (PCA) and local description by scale-invariant feature transform (SIFT) descriptors. In particular, the proposed scheme is constructed in the framework of a particle filter. In the dynamic model, local key points are reliably matched between successive frames based on SIFT descriptors, which gives a prediction of the object state from its corresponding location in the previous frame. In the observation model, the incremental PCA representation accurately describes the object as a whole; it maintains a set of eigenvectors that are dynamically updated during tracking. Moreover, key point information is also used to calculate the particle weights.
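As an illustration only, the sketch below (Python with NumPy, using OpenCV's SIFT) shows one possible way to realize these two ideas: an inter-frame shift predicted from matched SIFT key points, and a particle score based on a pixel-weighted PCA reconstruction error. The names mean_patch, eigenvectors, and pixel_weights are assumed placeholders for the learned incremental PCA model and the key point-based weighting, not the exact quantities defined later in the paper.

```python
import numpy as np
import cv2

def predict_shift(prev_gray, cur_gray):
    """Estimate the object's inter-frame displacement from matched SIFT key points."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(prev_gray, None)
    kp2, des2 = sift.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    shifts = [np.subtract(kp2[m.trainIdx].pt, kp1[m.queryIdx].pt) for m in matches]
    return np.median(shifts, axis=0)  # robust (dx, dy); particles are spread around it

def particle_score(patch, mean_patch, eigenvectors, pixel_weights):
    """Score a candidate patch by its pixel-weighted PCA reconstruction error."""
    d = (patch - mean_patch).ravel()
    coeffs = eigenvectors @ d               # project onto the current eigenbasis
    residual = d - eigenvectors.T @ coeffs  # part not explained by the subspace
    err = np.sum(pixel_weights.ravel() * residual ** 2)
    return np.exp(-err)                     # larger weight for smaller error
```

In this sketch, weighting the squared residual by pixel_weights mimics the idea of emphasizing pixels near key points, while the eigenbasis would be refreshed online as new frames arrive.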
The proposed method is consistent with human visual perception [19], in which both global and local information are important for locating objects. In object detection and localization, global PCA representation has already achieved considerable success, which can be found