Fast Tracking Algorithms from the Past Two Years (CSDN Library resource)


Uploaded 2016-12-06 10:55:02; 6.43 MB RAR archive.


Archive contents: 近两年跟踪速度较快的算法.rar (5 files)

近两年跟踪速度较快的算法

2013TIP-Real-time Object Tracking via Online Discriminative Feature Selection.pdf 2.18MB

2014CVPR-Adaptive Color Attributes for Real-Time Visual Tracking.pdf 1.22MB

近两年跟踪速度较快的算法.docx 32KB

2014ECCV-Fast Visual Tracking via Dense Spatio-Temporal Context Learning.pdf 2.43MB

2014PAMI-High-Speed Tracking with Kernelized Correlation Filters.pdf 1.25MB

Fast Visual Tracking via Dense Spatio-Temporal Context Learning

Kaihua Zhang, Lei Zhang, Qingshan Liu, David Zhang, and Ming-Hsuan Yang

S-mart Group, Nanjing University of Information Science & Technology
Dept. of Computing, The Hong Kong Polytechnic University
Electrical Engineering and Computer Science, University of California at Merced

zhkhua@gmail.com, cslzhang@comp.polyu.edu.hk, qsliu@nuist.edu.cn, csdzhang@comp.polyu.edu.hk, mhyang@ucmerced.edu

Abstract. In this paper, we present a simple yet fast and robust algorithm which exploits the dense spatio-temporal context for visual tracking. Our approach formulates the spatio-temporal relationships between the object of interest and its locally dense contexts in a Bayesian framework, which models the statistical correlation between the simple low-level features (i.e., image intensity and position) from the target and its surrounding regions. The tracking problem is then posed by computing a confidence map which takes into account the prior information of the target location and thereby alleviates target location ambiguity effectively. We further propose a novel explicit scale adaptation scheme, which is able to deal with target scale variations efficiently and effectively. The Fast Fourier Transform (FFT) is adopted for fast learning and detection in this work, which only needs 4 FFT operations. Implemented in MATLAB without code optimization, the proposed tracker runs at 350 frames per second on an i7 machine. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods in terms of efficiency, accuracy and robustness.

1 Introduction

Visual tracking is one of the most active research topics due to its wide range of applications such as motion analysis, activity recognition, surveillance, and human-computer interaction, to name a few [29]. The main challenge for robust visual tracking is to handle large appearance changes of the target object and the background over time due to occlusion, illumination changes, and pose variation. Numerous algorithms have been proposed with a focus on effective appearance models, which are based on the target appearance [8,1,28,22,17,18,19,23,21,31] or the difference between the appearances of the target and its local background [11,16,14,2,30,15]. However, if the appearance is severely degraded, too little information can be extracted to track the target robustly, whereas the surrounding scene can still provide useful context information to help localize it.

In visual tracking, a local context consists of a target object and its immediate surrounding background within a determined region (see the regions inside the red rectangles in Figure 1). Most of the local context remains unchanged, because changes between two consecutive frames can reasonably be assumed to be smooth when the time interval is small (about 1/30 s at 30 frames per second (FPS)). Therefore, there exists a strong spatio-temporal relationship between the local scenes containing the object in consecutive frames. For instance, the target in Figure 1 undergoes heavy occlusion which makes the object appearance change significantly. However, the local context containing the object does not change much, as the overall appearance remains similar and only a small part of the context region is occluded. Thus, the presence of local context in the current frame helps to predict the object location in the next frame. This temporally proximal information in consecutive frames is the temporal context, which has recently been applied to object detection [10]. Furthermore, the spatial relation between an object and its local context provides specific information about the configuration of a scene (see the middle column in Figure 1), which helps to discriminate the target from the background when its appearance changes significantly.

Fig. 1. The proposed method handles heavy occlusion well by learning dense spatio-temporal context information. Note that the region inside the red rectangle is the context region, which includes the target and its surrounding background. Left: although the target appearance changes considerably due to heavy occlusion, the spatial relationship between the object center (denoted by the solid yellow circle) and most of its surrounding locations in the context region is almost unchanged. Middle: the learned spatio-temporal context model (some regions have similar values, which shows that the corresponding regions in the left frames have similar spatial relations to the target center). Right: the learned confidence map.

2 Related Works

Most tracking algorithms can be categorized as either generative [22,17,18,19,23,21,31] or discriminative [11,16,14,2,30,15] based on their appearance models. A generative tracking method learns an appearance model to represent the target and searches for image regions with the best matching scores as the results. While it is critical to construct an effective appearance model in order to handle various challenging factors in tracking, the involved computational complexity is often increased at the same time. Furthermore, generative methods discard useful information surrounding target regions that can be exploited to better separate objects from backgrounds. Discriminative methods treat tracking as a binary classification problem with local search, which estimates the decision boundary between an object image patch and the background. However, the objective of classification is to predict instance labels, which is different from the goal of tracking, namely to estimate object locations [14]. Moreover, while some efficient feature extraction techniques (e.g., integral image [11,16,14,2,30] and random projection [30]) have been proposed for visual tracking, there often exist a large number of samples from which features need to be extracted for classification, thereby entailing computationally expensive operations. Generally speaking, both generative and discriminative tracking algorithms make trade-offs between effectiveness and efficiency of an appearance model. Although much progress has been made in recent years, it remains a challenging task to develop an efficient and robust tracking algorithm.

Fig. 2. Basic flow of our tracking algorithm. (a) Learn dense spatial context at the t-th frame (panel labels: Frame (t), focus of attention, spatial weight function, confidence map, spatial context model $h^{sc}_t$, FFT, IFFT). (b) Detect object location at the (t+1)-th frame (panel labels: Frame(1) to Frame(t), spatio-temporal context model, tracking at Frame(t+1), old location, new location, FFT, IFFT). The local context regions are inside the red rectangles while the target locations are indicated by the yellow rectangles. FFT denotes the Fast Fourier Transform and IFFT is the inverse FFT.

Recently, several methods [27,13,9,25] have exploited context information to facilitate visual tracking by mining the information of regions with consistent motion correlations to the target object. In [27], a data mining method is used to extract segmented regions surrounding the object as auxiliary objects for collaborative tracking. In [13,9,25], key points surrounding the object are first extracted to find consistent regions that help to locate the object position. SIFT or SURF descriptors are then used to represent these consistent regions. However, computationally expensive operations are required in representing and finding consistent regions. Furthermore, due to the sparse nature of key points and auxiliary objects, some consistent regions that are useful for locating the object position may be discarded. In contrast, the proposed algorithm does not have these problems because all the local regions surrounding the object are considered as potentially consistent regions, and the motion correlations between the object and its local contexts in consecutive frames are learned by the spatio-temporal context model, which is efficiently computed by FFT.


Fig. 3. Graphical model representation of the spatial relationship between an object and its dense local context. The dense local context region $\Omega_c$ is inside the red rectangle, which includes the object region surrounded by the yellow rectangle centered at the tracked result $\mathbf{x}^\star$. The context feature at location $\mathbf{z}$ is denoted by $c(\mathbf{z}) = (I(\mathbf{z}), \mathbf{z})$, including a low-level appearance representation (i.e., image intensity $I(\mathbf{z})$) and location information.

In this paper, we propose a fast and robust tracking algorithm which exploits dense spatio-temporal context information. Figure 2 illustrates the basic flow of our algorithm. First, we learn a spatial context model between the target object and its local surrounding background based on their spatial correlations in a scene by solving a deconvolution problem. Next, the learned spatial context model is used to update a spatio-temporal context model for the next frame. Tracking in the next frame is formulated by computing a confidence map as a convolution problem that integrates the dense spatio-temporal context information, and the best object location can be estimated by maximizing the confidence map (see Figure 2(b)). Finally, based on the estimated confidence map, a novel explicit scale adaptation scheme is presented, which renders an efficient and accurate tracking result.
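The learn, update, and detect steps described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the regularized Fourier-domain division (with the hypothetical constant `eps`), the linear update with a hypothetical rate `rho`, and the periodic boundary handling implied by the FFT are all choices made here for the example. The scale adaptation step is omitted.

```python
import numpy as np


def learn_spatial_context(confidence, context_prior, eps=1e-8):
    """Solve confidence = h (*) context_prior for h by Fourier-domain deconvolution.

    (*) is circular convolution. The conjugate/eps form is an assumed
    regularization that avoids dividing by near-zero Fourier coefficients.
    """
    f_conf = np.fft.fft2(confidence)
    f_prior = np.fft.fft2(context_prior)
    f_h = f_conf * np.conj(f_prior) / (np.abs(f_prior) ** 2 + eps)
    return np.real(np.fft.ifft2(f_h))


def update_model(stc_model, spatial_model, rho=0.1):
    """Blend the newly learned spatial model into the spatio-temporal model.

    A simple linear interpolation with a hypothetical learning rate rho.
    """
    return (1.0 - rho) * stc_model + rho * spatial_model


def detect(stc_model, context_prior):
    """Compute the confidence map by circular convolution and locate its peak."""
    f_map = np.fft.fft2(stc_model) * np.fft.fft2(context_prior)
    conf = np.real(np.fft.ifft2(f_map))
    peak = tuple(int(i) for i in np.unravel_index(np.argmax(conf), conf.shape))
    return conf, peak
```

Because circular convolution is shift-equivariant, shifting the context prior by some offset moves the peak of the detected confidence map by the same offset, which is what lets the peak follow the target from frame to frame in this sketch.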

The key contributions of the proposed algorithm are summarized as follows:

– To the best of our knowledge, it is the first work to use dense context information for visual tracking, and it achieves fast and robust results.

– We propose a novel explicit scale update scheme to deal with the scale variations of the target efficiently and effectively.

– The proposed algorithm is simple and fast: it needs only 4 FFT operations and runs at 350 FPS in MATLAB.

– The proposed algorithm has the merits of both generative and discriminative methods. On the one hand, the context includes the target and its neighboring background, which gives our method the merits of discriminative models. On the other hand, the context treats the target and background as a whole, which gives our method the merits of generative models.

3 Problem Formulation

The tracking problem is formulated by computing a confidence map which estimates the object location likelihood:

$$m(\mathbf{x}) = P(\mathbf{x} \mid o), \tag{1}$$

where $\mathbf{x} \in \mathbb{R}^2$ is an object location and $o$ denotes the object present in the scene. Equation (1) is equal to the posterior probability $P(o \mid \mathbf{x})$ because we use a uniform prior $P(o)$ for the target presence for simplicity. In the following, the spatial context information is used to estimate (1), and Figure 3 shows its graphical model representation.

Fig. 4. Illustration of the characteristic of the non-radially symmetric function $h^{sc}(\cdot)$ in (3). Here, the left eye is the tracked target, whose context is inside the green rectangle, while the right eye is a distractor with context inside the blue rectangle. Although $\mathbf{z}$ has a similar distance to both eyes, their spatial relationships to $\mathbf{z}$ are different (i.e., $h^{sc}$ takes different values at the two relative offsets), and this helps to discriminate the target from the distractor.

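In Figure 3, the object location $\mathbf{x}^\star$ (i.e., the coordinate of the tracked object center) is tracked. The context feature set is defined as $X^c = \{c(\mathbf{z}) = (I(\mathbf{z}), \mathbf{z}) \mid \mathbf{z} \in \Omega_c(\mathbf{x}^\star)\}$, where $I(\mathbf{z})$ denotes the image intensity at location $\mathbf{z}$ and $\Omega_c(\mathbf{x}^\star)$ is the neighborhood of location $\mathbf{x}^\star$ that is twice the size of the target object. By marginalizing the joint probability $P(\mathbf{x}, c(\mathbf{z}) \mid o)$, the object location likelihood function in (1) can be computed by

$$m(\mathbf{x}) = P(\mathbf{x} \mid o) = \sum_{c(\mathbf{z}) \in X^c} P(\mathbf{x}, c(\mathbf{z}) \mid o) = \sum_{c(\mathbf{z}) \in X^c} P(\mathbf{x} \mid c(\mathbf{z}), o)\, P(c(\mathbf{z}) \mid o), \tag{2}$$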

where the conditional probability $P(\mathbf{x} \mid c(\mathbf{z}), o)$ models the spatial relationship between the object location and its context information, which helps to resolve ambiguities when the degraded image measurements allow different interpretations, and $P(c(\mathbf{z}) \mid o)$ is a context prior probability which models the appearance of the local context. The main task in this work is to learn $P(\mathbf{x} \mid c(\mathbf{z}), o)$, as it bridges the gap between the object location and its spatial context.
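The marginalization over context features can be made concrete with a toy calculation. The sketch below is illustrative only: the function name `confidence` and all probability values are hypothetical, chosen so that each column of the conditional table sums to one over the three candidate locations.

```python
def confidence(cond, prior):
    """Return m(x) = sum_z P(x | c(z), o) * P(c(z) | o) for every candidate x.

    cond[x][z] holds P(x | c(z), o); prior[z] holds P(c(z) | o).
    """
    return [sum(p_xz * p_z for p_xz, p_z in zip(row, prior)) for row in cond]


# Hypothetical toy numbers: 3 candidate object locations, 2 context locations.
cond = [
    [0.7, 0.2],
    [0.2, 0.3],
    [0.1, 0.5],
]
prior = [0.6, 0.4]

m = confidence(cond, prior)
# The estimated location is the candidate with the highest confidence.
best = max(range(len(m)), key=m.__getitem__)
```

With these numbers the first candidate receives the largest confidence, since the more probable context feature strongly supports it.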

3.1 Spatial Context Model

The conditional probability function $P(\mathbf{x} \mid c(\mathbf{z}), o)$ in (2) is defined as

$$P(\mathbf{x} \mid c(\mathbf{z}), o) = h^{sc}(\mathbf{x} - \mathbf{z}), \tag{3}$$

where $h^{sc}(\mathbf{x} - \mathbf{z})$ is a function (see Figure 4 and Section 3.4) with respect to the relative distance and direction between the object location $\mathbf{x}$ and its local context location $\mathbf{z}$, thereby encoding the spatial relationship between an object and its spatial context.
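Because the conditional probability depends only on the offset between the two locations, the sum over context locations becomes a convolution, which is why an FFT can evaluate it for all candidate locations at once. The sketch below checks this in one dimension with NumPy. It is an illustration under assumptions: the function names are hypothetical, `w[z]` stands in for the context prior, and the boundary is treated as periodic so that the FFT product matches the explicit sum exactly.

```python
import numpy as np


def confidence_direct(h, w):
    """Explicit sum m[x] = sum_z h[(x - z) mod n] * w[z]."""
    n = len(w)
    return np.array(
        [sum(h[(x - z) % n] * w[z] for z in range(n)) for x in range(n)]
    )


def confidence_fft(h, w):
    """Same quantity via the convolution theorem (circular convolution)."""
    return np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(w)))
```

Note that `h` need not be symmetric: an array with different values at offsets +1 and -1 gives different confidence contributions to the left and right of a context location, which mirrors the direction sensitivity described above.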
