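### Kernel-Based Object Tracking

#### 1. Introduction and Background

In computer vision, real-time object tracking is a key task in many applications, such as surveillance ([44], [16], [32]), perceptual user interfaces ([10]), augmented reality ([26]), smart rooms ([39], [75], [47]), object-based video compression ([11]), and driver assistance ([34], [4]). A typical visual tracker can be divided into two major components: target representation and localization, and filtering and data association.

1. **Target representation and localization**: mostly a bottom-up process that has to cope with changes in the appearance of the target.
2. **Filtering and data association**: mostly a top-down process that deals with the dynamics of the tracked object, the learning of scene priors, and the evaluation of different hypotheses.

How the two components are combined and weighted depends on the application and plays a decisive role in the robustness and efficiency of the tracker. For example, face tracking in a crowded scene relies more on target representation than on target dynamics ([21]), whereas aerial video surveillance may rely more on target dynamics.

#### 2. A New Approach to Target Representation and Localization

The paper proposes a new approach to target representation and localization for nonrigid objects, the central component of visual tracking. Targets are represented by feature histograms, and these representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially smooth similarity functions that are well suited to gradient-based optimization, so the target localization problem can be formulated using the basin of attraction of the local maxima. A metric derived from the Bhattacharyya coefficient serves as the similarity measure, and the mean shift procedure performs the optimization. In the tracking examples presented, the method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed.

#### 3. Key Technical Points

- **Spatial masking with an isotropic kernel**: the masking yields a spatially smooth similarity function, which makes the target localization problem tractable for gradient-based optimization.
- **Bhattacharyya coefficient**: a measure of the similarity between two probability distributions. Here it compares the feature histograms of the target model and the target candidate, and the metric derived from it is the quantity being optimized.
- **Mean shift procedure**: an iterative search procedure that finds the center (mode) of a high-density region. In tracking, it is used to find the most likely target location.
- **Formulation of target localization**: the localization problem is posed as a search in the basin of attraction of a local maximum of the similarity function, so it can be solved by optimization.

#### 4. Potential Applications

- **Exploiting background information**: using information about the scene background can further improve tracking, particularly in complex environments.
- **Kalman tracking with motion models**: combining the method with a Kalman filter and a motion model helps predict and correct the target position, particularly under noise or occlusion.
- **Face tracking**: the method can be applied to face tracking, including scenes with several people.

#### 5. Conclusion

By improving target representation and localization, the proposed method addresses several challenges in tracking nonrigid objects. It combines feature histograms masked by an isotropic kernel, a metric derived from the Bhattacharyya coefficient, and mean shift optimization.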

Kernel-Based Object Tracking

Dorin Comaniciu, Senior Member, IEEE, Visvanathan Ramesh, Member, IEEE, and

Peter Meer, Senior Member, IEEE

Abstract—A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects,

is proposed. The feature histogram-based target representations are regularized by spatial masking with an isotropic kernel. The

masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence, the target localization problem

can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as

similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples, the new method

successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data

association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information,

Kalman tracking using motion models, and face tracking.

Index Terms—Nonrigid object tracking, target localization and representation, spatially-smooth similarity function, Bhattacharyya

coefficient, face tracking.

1 INTRODUCTION

REAL-TIME object tracking is a critical task in many

computer vision applications such as surveillance [44],

[16], [32], perceptual user interfaces [10], augmented reality

[26], smart rooms [39], [75], [47], object-based video compression

[11], and driver assistance [34], [4].

Two major components can be distinguished in a typical

visual tracker. Target Representation and Localization is mostly a

bottom-up process which has also to cope with the changes in

the appearance of the target. Filtering and Data Association is

mostly a top-down process dealing with the dynamics of the

tracked object, learning of scene priors, and evaluation of

different hypotheses. The way the two components are

combined and weighted is application dependent and plays

a decisive role in the robustness and efficiency of the tracker.

For example, face tracking in a crowded scene relies more on

target representation than on target dynamics [21], while in

aerial video surveillance, e.g., [74], the target motion and the

ego-motion of the camera are the more important components.

In real-time applications, only a small percentage of the

system resources can be allocated for tracking, the rest being

required for the preprocessing stages or for high-level tasks

such as recognition, trajectory interpretation, and reasoning.

Therefore, it is desirable to keep the computational complexity

of a tracker as low as possible.

The most abstract formulation of the filtering and data association process is through the state space approach for modeling discrete-time dynamic systems [5]. The information characterizing the target is defined by the state sequence $\{x_k\}_{k=0,1,\ldots}$, whose evolution in time is specified by the dynamic equation $x_k = f_k(x_{k-1}, v_k)$. The available measurements $\{z_k\}_{k=1,\ldots}$ are related to the corresponding states through the measurement equation $z_k = h_k(x_k, n_k)$. In general, both $f_k$ and $h_k$ are vector-valued, nonlinear, and time-varying functions. Each of the noise sequences, $\{v_k\}_{k=1,\ldots}$ and $\{n_k\}_{k=1,\ldots}$, is assumed to be independent and identically distributed (i.i.d.).

The objective of tracking is to estimate the state $x_k$ given all the measurements $z_{1:k}$ up to that moment, or equivalently to construct the probability density function (pdf) $p(x_k \mid z_{1:k})$. The theoretically optimal solution is provided by the recursive Bayesian filter which solves the problem in two steps. The prediction step uses the dynamic equation and the already computed pdf of the state at time $t = k - 1$, $p(x_{k-1} \mid z_{1:k-1})$, to derive the prior pdf of the current state, $p(x_k \mid z_{1:k-1})$. Then, the update step employs the likelihood function $p(z_k \mid x_k)$ of the current measurement to compute the posterior pdf $p(x_k \mid z_{1:k})$.
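For reference, the two steps can be written out explicitly. The block below gives the standard textbook form of the recursive Bayesian filter. It assumes a first-order Markov state and measurements that depend only on the current state; neither these assumptions nor the integral form are spelled out in the text itself.

```latex
\begin{align}
\text{prediction:}\quad
  p(x_k \mid z_{1:k-1}) &= \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1} \\
\text{update:}\quad
  p(x_k \mid z_{1:k}) &= \frac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}
                             {\int p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})\, dx_k}
\end{align}
```

Here the transition density $p(x_k \mid x_{k-1})$ is the one induced by the dynamic equation and its noise term, and the denominator of the update is only a normalizing constant.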

When the noise sequences are Gaussian and $f_k$ and $h_k$ are linear functions, the optimal solution is provided by the

Kalman filter [5, p. 56], which yields a posterior that is also

Gaussian. (We will return to this topic in Section 6.2.) When

the functions $f_k$ and $h_k$ are nonlinear, by linearization the

Extended Kalman Filter (EKF) [5, p. 106] is obtained, the

posterior density being still modeled as Gaussian. A recent

alternative to the EKF is the Unscented Kalman Filter (UKF)

[42] which uses a set of discretely sampled points to

parameterize the mean and covariance of the posterior

density. When the state space is discrete and consists of a

finite number of states, Hidden Markov Models (HMM)

filters [60] can be applied for tracking. The most general

class of filters is represented by particle filters [45], also

called bootstrap filters [31], which are based on Monte Carlo

integration methods. The current density of the state is

represented by a set of random samples with associated

weights and the new density is computed based on these

samples and weights (see [23], [3] for reviews). The UKF can

be employed to generate proposal distributions for particle

filters, in which case the filter is called Unscented Particle

Filter (UPF) [54].
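To make the linear-Gaussian case concrete, here is a minimal sketch of a scalar Kalman filter. It is an illustration only, not the filter used later in the paper. It assumes the simplest possible model, $x_k = x_{k-1} + v_k$ and $z_k = x_k + n_k$, with zero-mean Gaussian noises of variances `q` and `r`; the function name and parameters are hypothetical.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_k = x_{k-1} + v_k, z_k = x_k + n_k.

    q, r   : variances of the process noise v_k and measurement noise n_k
             (assumed zero-mean Gaussian; illustrative model only).
    x0, p0 : mean and variance of the initial state estimate.
    Returns a list of posterior (mean, variance) pairs, one per measurement.
    """
    x, p = x0, p0
    posteriors = []
    for z in measurements:
        # Prediction step: push the previous posterior through the dynamics.
        x_pred = x
        p_pred = p + q
        # Update step: weight the innovation by the Kalman gain.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (z - x_pred)
        p = (1.0 - gain) * p_pred
        posteriors.append((x, p))
    return posteriors


# Two identical measurements pull the estimate toward 5 and shrink the variance.
estimates = kalman_1d([5.0, 5.0], q=0.0, r=1.0)
```

The posterior stays Gaussian at every step, so a mean and a variance describe it completely. That is what makes the Kalman filter cheap compared with the sample-based filters discussed next.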

564 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 5, MAY 2003

D. Comaniciu and V. Ramesh are with the Real-Time Vision and Modeling Department, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540. E-mail: comanici@scr.siemens.com.

P. Meer is with the Electrical and Computer Engineering Department, Rutgers University, 94 Brett Road, Piscataway, NJ 08854-8058.

Manuscript received 21 May 2002; revised 13 Oct. 2002; accepted 16 Oct. 2002.

Recommended for acceptance by M. Irani.

For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 116595.

0162-8828/03/$10.00 © 2003 IEEE Published by the IEEE Computer Society

When the tracking is performed in a cluttered environment

where multiple targets can be present [52], problems

related to the validation and association of the measurements

arise [5, p. 150]. Gating techniques are used to validate only

measurements whose predicted probability of appearance is

high. After validation, a strategy is needed to associate the

measurements with the current targets. In addition to the

Nearest Neighbor Filter, which selects the closest measurement,

techniques such as the Probabilistic Data Association Filter

(PDAF) are available for the single target case. The underlying

assumption of the PDAF is that for any given target only

one measurement is valid, and the other measurements are

modeled as random interference, that is, i.i.d. uniformly

distributed random variables. The Joint Probabilistic Data Association

Filter (JPDAF) [5, p. 222], on the other hand, calculates the

measurement-to-target association probabilities jointly

across all the targets. A different strategy is represented by

the Multiple Hypothesis Filter (MHF) [63], [20], [5, p. 106]

which evaluates the probability that a given target gave rise to

a certain measurement sequence. The MHF formulation can

be adapted to track the modes of the state density [13]. The

data association problem for multiple target particle filtering

is presented in [62], [38].

The filtering and association techniques discussed above

were applied in computer vision for various tracking

scenarios. Boykov and Huttenlocher [9] employed the Kalman

filter to track vehicles in an adaptive framework. Rosales

and Sclaroff [65] used the Extended Kalman Filter to estimate

a 3D object trajectory from 2D image motion. Particle filtering

was first introduced, in vision, as the Condensation algorithm

by Isard and Blake [40]. Probabilistic exclusion for tracking

multiple objects was discussed in [51]. Wu and Huang

developed an algorithm to integrate multiple target clues [76].

Li and Chellappa [48] proposed simultaneous tracking and

verification based on particle filters applied to vehicles and

faces. Chen et al. [15] used the Hidden Markov Model

formulation for tracking combined with JPDAF data association.

Rui and Chen proposed to track the face contour based

on the unscented particle filter [66]. Cham and Rehg [13]

applied a variant of MHF for figure tracking.

The emphasis in this paper is on the other component of

tracking: target representation and localization. While the

filtering and data association have their roots in control

theory, algorithms for target representation and localization

are specific to images and related to registration methods [72],

[64], [56]. Both target localization and registration maximize

a likelihood type function. The difference is that in tracking,

as opposed to registration, only small changes are assumed in

the location and appearance of the target in two consecutive

frames. This property can be exploited to develop efficient,

gradient-based localization schemes using the normalized

correlation criterion [6]. Since the correlation is sensitive to

illumination, Hager and Belhumeur [33] explicitly modeled

the geometry and illumination changes. The method

was improved by Sclaroff and Isidoro [67] using robust

M-estimators. Learning of appearance models by employing

a mixture of stable image structure, motion information, and

an outlier process, was discussed in [41]. In a different

approach, Ferrari et al. [26] presented an affine tracker based

on planar regions and anchor points. Tracking people, which

raises many challenges due to the presence of large 3D,

nonrigid motion, was extensively analyzed in [36], [1], [30],

[73]. Explicit tracking approaches of people [69] are time-consuming

and often the simpler blob model [75] or adaptive

mixture models [53] are also employed.

The main contribution of the paper is to introduce a new

framework for efficient tracking of nonrigid objects. We

show that by spatially masking the target with an isotropic

kernel, a spatially-smooth similarity function can be defined

and the target localization problem is then reduced to a

search in the basin of attraction of this function. The

smoothness of the similarity function allows application of

a gradient optimization method which yields much faster

target localization compared with the (optimized) exhaustive

search. The similarity between the target model and the

target candidates in the next frame is measured using the

metric derived from the Bhattacharyya coefficient. In our

case, the Bhattacharyya coefficient has the meaning of a

correlation score. The new target representation and

localization method can be integrated with various motion

filters and data association techniques. We present tracking

experiments in which our method successfully coped with

complex camera motion, partial occlusion of the target,

presence of significant clutter, and large variations in target

scale and appearance. We also discuss the integration of

background information and Kalman filter based tracking.

The paper is organized as follows: Section 2 discusses

issues of target representation and the importance of a

spatially-smooth similarity function. Section 3 introduces

the metric derived from the Bhattacharyya coefficient. The

optimization algorithm is described in Section 4. Experimental

results are shown in Section 5. Section 6 presents

extensions of the basic algorithm and the new approach is

put in the context of computer vision literature in Section 7.

2 TARGET REPRESENTATION

To characterize the target, first a feature space is chosen.

The reference target model is represented by its pdf q in the

feature space. For example, the reference model can be

chosen to be the color pdf of the target. Without loss of

generality, the target model can be considered as centered

at the spatial location 0. In the subsequent frame, a target

candidate is defined at location y, and is characterized by the

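pdf $p(y)$. Both pdfs are to be estimated from the data. To satisfy the low computational cost imposed by real-time processing, discrete densities, i.e., $m$-bin histograms, should be used. Thus, we have

$$\text{target model:}\quad \hat{q} = \{\hat{q}_u\}_{u=1\ldots m}, \qquad \sum_{u=1}^{m} \hat{q}_u = 1$$

$$\text{target candidate:}\quad \hat{p}(y) = \{\hat{p}_u(y)\}_{u=1\ldots m}, \qquad \sum_{u=1}^{m} \hat{p}_u = 1.$$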

The histogram is not the best nonparametric density

estimate [68], but it suffices for our purposes. Other discrete

density estimates can be also employed.

We will denote by

$$\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] \tag{1}$$

a similarity function between $\hat{p}$ and $\hat{q}$. The function $\hat{\rho}(y)$ plays

the role of a likelihood and its local maxima in the image

indicate the presence of objects in the second frame having

representations similar to $\hat{q}$ defined in the first frame. If only

spectral information is used to characterize the target, the

similarity function can have large variations for adjacent

locations on the image lattice and the spatial information is

lost. To find the maxima of such functions, gradient-based

optimization procedures are difficult to apply and only an

expensive exhaustive search can be used. We regularize the

similarity function by masking the objects with an isotropic

kernel in the spatial domain. When the kernel weights,

carrying continuous spatial information, are used in defining

the feature space representations, $\hat{\rho}(y)$ becomes a smooth

function in $y$.

2.1 Target Model

A target is represented by an ellipsoidal region in the

image. To eliminate the influence of different target

dimensions, all targets are first normalized to a unit circle.

This is achieved by independently rescaling the row and column dimensions with $h_x$ and $h_y$.

Let $\{x_i^\star\}_{i=1\ldots n}$ be the normalized pixel locations in the region defined as the target model. The region is centered at 0. An isotropic kernel, with a convex and monotonic decreasing kernel profile $k(x)$ (see footnote 1),

assigns smaller weights to

pixels farther from the center. Using these weights increases

the robustness of the density estimation since the peripheral

pixels are the least reliable, being often affected by

occlusions (clutter) or interference from the background.

The function $b : R^2 \to \{1 \ldots m\}$ associates to the pixel at location $x_i^\star$ the index $b(x_i^\star)$ of its bin in the quantized feature space. The probability of the feature $u = 1 \ldots m$ in the target model is then computed as

$$\hat{q}_u = C \sum_{i=1}^{n} k\left(\|x_i^\star\|^2\right) \delta\left[b(x_i^\star) - u\right], \tag{2}$$

where $\delta$ is the Kronecker delta function. The normalization constant $C$ is derived by imposing the condition $\sum_{u=1}^{m} \hat{q}_u = 1$, from where

$$C = \frac{1}{\sum_{i=1}^{n} k\left(\|x_i^\star\|^2\right)}, \tag{3}$$

since the summation of delta functions for $u = 1 \ldots m$ is equal to one.
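The computation in (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Epanechnikov profile is assumed because the text here only requires a convex, monotonically decreasing profile, each pixel is passed as a `(location, feature)` pair, bins are indexed from 0 rather than 1, and all function names are hypothetical.

```python
def epanechnikov_profile(x):
    """Profile k(x) of the Epanechnikov kernel, up to a constant factor.

    Assumed here for illustration; any convex, monotonically decreasing
    profile could be substituted.
    """
    return 1.0 - x if x <= 1.0 else 0.0


def target_model(pixels, bin_of, m, profile=epanechnikov_profile):
    """Kernel-weighted m-bin histogram following (2) and (3).

    pixels : iterable of ((row, col), feature) pairs; locations are
             normalized and relative to the target center 0.
    bin_of : function mapping a feature to a bin index in 0..m-1
             (0-based, whereas the text indexes bins 1..m).
    """
    q = [0.0] * m
    for (row, col), feature in pixels:
        weight = profile(row * row + col * col)  # k(||x_i||^2)
        q[bin_of(feature)] += weight             # the delta selects one bin
    total = sum(q)                               # this is 1 / C in (3)
    if total == 0.0:
        raise ValueError("no pixel received a nonzero kernel weight")
    return [value / total for value in q]


# Four pixels, two feature bins. The pixel on the unit circle gets weight 0.
pixels = [((0.0, 0.0), 0), ((0.5, 0.0), 1), ((0.0, 0.5), 1), ((1.0, 0.0), 0)]
q_hat = target_model(pixels, bin_of=lambda feature: feature, m=2)
```

In this toy example the kernel weights are 1, 0.75, 0.75, and 0, so `q_hat` is approximately `[0.4, 0.6]`. The peripheral pixel contributes nothing, which is the down-weighting of unreliable border pixels described above.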

2.2 Target Candidates

Let $\{x_i\}_{i=1\ldots n_h}$ be the normalized pixel locations of the target candidate, centered at $y$ in the current frame. The normalization is inherited from the frame containing the target model. Using the same kernel profile $k(x)$, but with bandwidth $h$, the probability of the feature $u = 1 \ldots m$ in the target candidate is given by

$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \delta\left[b(x_i) - u\right], \tag{4}$$

where

$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)} \tag{5}$$

is the normalization constant. Note that $C_h$ does not depend on $y$, since the pixel locations $x_i$ are organized in a regular lattice and $y$ is one of the lattice nodes. Therefore, $C_h$ can be precalculated for a given kernel and different values of $h$. The bandwidth $h$ defines the scale of the target candidate, i.e., the number of pixels considered in the localization process.
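A candidate histogram following (4) and (5) differs from the model histogram only in that pixel locations are measured relative to the candidate center and scaled by the bandwidth. The sketch below is illustrative. It assumes the Epanechnikov profile, 0-based bin indices, and a hypothetical `(location, feature)` pixel format, none of which is prescribed by the text.

```python
def target_candidate(pixels, y, h, bin_of, m):
    """Kernel-weighted m-bin histogram of a candidate, following (4) and (5).

    pixels : iterable of ((row, col), feature) pairs in normalized coordinates.
    y      : (row, col) center of the target candidate.
    h      : bandwidth; pixels with ||(y - x_i) / h|| > 1 get zero weight.
    bin_of : function mapping a feature to a bin index in 0..m-1.
    """
    p = [0.0] * m
    for (row, col), feature in pixels:
        d2 = ((y[0] - row) ** 2 + (y[1] - col) ** 2) / (h * h)
        weight = 1.0 - d2 if d2 <= 1.0 else 0.0  # Epanechnikov profile (assumed)
        p[bin_of(feature)] += weight
    total = sum(p)                               # this is 1 / C_h in (5)
    if total == 0.0:
        raise ValueError("bandwidth too small: no pixel inside the kernel")
    return [value / total for value in p]


# Three pixels in a row: one in bin 0, two in bin 1.
row_pixels = [((0, 0), 0), ((0, 1), 1), ((0, 2), 1)]
identity = lambda feature: feature
p_centered = target_candidate(row_pixels, y=(0, 1), h=2.0, bin_of=identity, m=2)
p_shifted = target_candidate(row_pixels, y=(0, 0), h=2.0, bin_of=identity, m=2)
```

With the center on the middle pixel the histogram is approximately `[0.3, 0.7]`. Moving the center one pixel to the left gives roughly `[0.571, 0.429]`. Because the weights vary continuously with `y`, the histogram changes gradually as the center moves rather than jumping when a pixel enters or leaves the window. For simplicity the sketch recomputes the normalizer on every call; on a full lattice it could be precalculated, as noted above.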

2.3 Similarity Function Smoothness

The similarity function (1) inherits the properties of the kernel

profile $k(x)$ when the target model and candidate are

represented according to (2) and (4). A differentiable kernel

profile yields a differentiable similarity function and efficient

gradient-based optimization procedures can be used for

finding its maxima. The presence of the continuous kernel

introduces an interpolation process between the locations on

the image lattice. The employed target representations do not

restrict the way similarity is measured and various functions

can be used for $\rho$. See [59] for an experimental evaluation of

different histogram similarity measures.

3 METRIC BASED ON BHATTACHARYYA COEFFICIENT

The similarity function defines a distance among target

model and candidates. To accommodate comparisons among

various targets, this distance should have a metric structure.

We define the distance between two discrete distributions as

$$d(y) = \sqrt{1 - \rho[\hat{p}(y), \hat{q}]}, \tag{6}$$

where we chose

$$\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\, \hat{q}_u}, \tag{7}$$

the sample estimate of the Bhattacharyya coefficient between $p$ and $q$ [43].

The Bhattacharyya coefficient is a divergence-type measure [49] which has a straightforward geometric interpretation. It is the cosine of the angle between the $m$-dimensional unit vectors $\left(\sqrt{\hat{p}_1}, \ldots, \sqrt{\hat{p}_m}\right)^\top$ and $\left(\sqrt{\hat{q}_1}, \ldots, \sqrt{\hat{q}_m}\right)^\top$. The fact that $p$ and $q$ are distributions is thus explicitly taken into account by representing them on the unit hypersphere. At the same time, we can interpret (7) as the (normalized) correlation between the vectors $\left(\sqrt{\hat{p}_1}, \ldots, \sqrt{\hat{p}_m}\right)^\top$ and $\left(\sqrt{\hat{q}_1}, \ldots, \sqrt{\hat{q}_m}\right)^\top$. Properties of the Bhattacharyya coefficient such as its relation to the Fisher measure of information, quality of the sample estimate, and explicit forms for various distributions are given in [22], [43].
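The coefficient (7) and the distance (6) are straightforward to compute for two normalized histograms. The following is a minimal sketch with hypothetical function names. The clamp before the final square root is an added safeguard against floating-point round-off and is not part of the definition.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sample estimate (7): sum over bins of sqrt(p_u * q_u)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def histogram_distance(p, q):
    """Distance (6): sqrt(1 - rho) for two normalized histograms."""
    rho = bhattacharyya_coefficient(p, q)
    # Round-off can push rho slightly above 1 for identical histograms.
    return math.sqrt(max(0.0, 1.0 - rho))


same = histogram_distance([0.5, 0.5], [0.5, 0.5])      # identical histograms
disjoint = histogram_distance([1.0, 0.0], [0.0, 1.0])  # no shared bins
partial = bhattacharyya_coefficient([0.25, 0.75], [0.75, 0.25])
```

Identical histograms give a coefficient of 1 and a distance of 0. Histograms with no bin in common give a coefficient of 0 and a distance of 1. The `partial` case evaluates to about 0.866, the cosine of a 30 degree angle between the two square-root vectors, matching the geometric interpretation above.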

The statistical measure (6) has several desirable properties:

1. It imposes a metric structure (see Appendix). The

Bhattacharyya distance [28, p. 99] or Kullback

divergence [19, p. 18] are not metrics since they violate

at least one of the distance axioms.

2. It has a clear geometric interpretation. Note that the histogram metrics (including histogram intersection [71]) do not enforce the conditions $\sum_{u=1}^{m} \hat{q}_u = 1$ and $\sum_{u=1}^{m} \hat{p}_u = 1$.

3. It uses discrete densities and, therefore, it is invariant

to the scale of the target (up to quantization effects).

4. It is valid for arbitrary distributions, thus being

superior to the Fisher linear discriminant, which

yields useful results only for distributions that are

separated by the mean-difference [28, p. 132].

1. The profile of a kernel $K$ is defined as a function $k : [0, \infty) \to R$ such that $K(x) = k(\|x\|^2)$.
