IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 0, NO. 0, JUNE 2008 1
Context-aware Visual Tracking
Ming Yang, Member, IEEE, Ying Wu, Senior Member, IEEE, and Gang Hua, Member, IEEE
Abstract—Enormous uncertainties in unconstrained environments lead to a fundamental dilemma that many tracking algorithms have
to face in practice: tracking has to be computationally efficient but verifying whether or not the tracker is following the true target tends
to be demanding, especially when the background is cluttered and/or when occlusion occurs. Due to the lack of a good solution to
this problem, many existing methods tend to be either effective but computationally intensive by using sophisticated image observation
models, or efficient but vulnerable to false alarms. This greatly challenges long-duration robust tracking. This paper presents a novel
solution to this dilemma by considering the context of the tracking scene. Specifically, we integrate into the tracking process a set of
auxiliary objects that are automatically discovered in the video on the fly by data mining. Auxiliary objects have three properties, at
least in a short time interval: (1) persistent co-occurrence with the target; (2) consistent motion correlation to the target; and (3) easy
to track. Regarding these auxiliary objects as the context of the target, the collaborative tracking of these auxiliary objects leads to
efficient computation as well as strong verification. Our extensive experiments have exhibited exciting performance in very challenging
real-world testing cases.
Index Terms—Computer vision, visual object tracking, context-aware, collaborative tracking, data mining, robust fusion, belief
inconsistency.
✦
1 INTRODUCTION
Robust long-duration visual tracking is demanded by many
contemporary applications such as video-based surveil-
lance and vision-based interfaces. One fundamental obstacle
in the way is the lack of efficient means for verification, i.e.,
to determine whether the object being followed by the tracker
is really the target. At the extreme, this is in fact a recognition
task. Without effective verification, the tracker is likely to drift
away gradually, or fail when the target is occluded even for
a short period of time. Therefore, although extensive research
efforts have been taken, it is still quite difficult in practice to
achieve robust and efficient long-duration tracking in uncon-
strained real-world environments. Most existing methods are
in a dilemma: either be fast-but-fallible, or be robust-but-slow.
This dilemma originates from the opposite requirements
for the image likelihood models: on one hand, the likelihood
model should be simple for efficient motion estimation and
tracking; on the other hand, it has to be sophisticated for
comprehensive verification of the target. We call them de-
scriptive likelihood and discriminative likelihood, respectively.
In general, the descriptive likelihood is based on descriptive
image features that are easily accessible and specified, e.g.,
contours [1], [2], colors [3], or even image regions [4],
[5]. Matching these image features leads to efficient
computation of the descriptive likelihood and thus fast motion
estimation (e.g., differential methods such as kernel-based
tracking [3], [5], [6]).
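For concreteness, a descriptive likelihood of this kind can be sketched as a Bhattacharyya similarity between normalized color histograms, in the spirit of kernel-based trackers [3], [5], [6]. This is a minimal illustration, not the exact formulation used in those works; the 8-bin grayscale quantization, the function names, and the bandwidth parameter are all illustrative assumptions.

```python
import math

def color_histogram(pixels, bins=8):
    """Quantize intensities (0-255) into a normalized histogram.
    The bin count is an illustrative choice."""
    hist = [0.0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def bhattacharyya(h1, h2):
    """Similarity in [0, 1]; 1 means identical distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))

def descriptive_likelihood(candidate_pixels, target_model, sigma=0.1):
    """Likelihood of a candidate region under the target's color model,
    decaying with histogram distance as in mean-shift style trackers."""
    rho = bhattacharyya(color_histogram(candidate_pixels), target_model)
    return math.exp(-(1.0 - rho) / (sigma ** 2))
```

Because the model reduces a region to a single histogram, evaluating it over many candidate locations is cheap, which is exactly what makes fast differential motion estimation possible.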
• Ming Yang and Ying Wu are with the Electrical Engineering and Computer
Science Department, Northwestern University, 2145 Sheridan Road,
Evanston, IL 60208-3118. Email: m-yang4@u.northwestern.edu,
yingwu@ece.northwestern.edu. Gang Hua is with Microsoft Research,
Redmond, WA 98053. Email: ganghua@microsoft.com.
Manuscript received June 15, 2007; revised January 3, 2008.

However, in practice, many real-world complications, such
as clutters, illumination and view changes, low image quality,
motion blur, and partial occlusions, all may invalidate simple
descriptive likelihood models. As a result, good matches of
these descriptive features do not necessarily correspond to
the true target; background objects may match equally well
and produce false positives. Over the years, there have
been two approaches to address this issue: on-line adaptation
of the descriptive likelihood models [5], [7]–[9], or using
discriminative likelihood models that distinguish the true target
from false positives. Without strong verification that provides
confident supervision, on-line adaptation is risky and lacks a
mechanism to prevent drifting. On the other hand, discrimi-
native likelihood is generally associated with classifiers, e.g.,
the SVM tracker [10]. These classifiers can be trained off-
line or on-line [11], [12]. Since learning a classifier relies on
a large number of training samples and features, it tends to be
computationally demanding.
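To make the contrast with the descriptive case concrete, a discriminative likelihood can be sketched as the confidence of a classifier trained to separate target features from background features, in the spirit of the SVM and on-line classifier trackers [10]-[12]. A simple perceptron stands in for those classifiers here; the feature vectors, training loop, and sigmoid squashing are illustrative assumptions, not the cited methods.

```python
import math

def train_perceptron(pos_feats, neg_feats, epochs=20, lr=0.1):
    """Fit a linear boundary between target (+1) and background (-1)
    feature vectors; this training pass is the expensive step."""
    dim = len(pos_feats[0])
    w, b = [0.0] * dim, 0.0
    data = [(f, 1) for f in pos_feats] + [(f, -1) for f in neg_feats]
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: update the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def discriminative_likelihood(feat, w, b):
    """Margin-based confidence that a candidate is the true target,
    squashed to (0, 1)."""
    score = sum(wi * xi for wi, xi in zip(w, feat)) + b
    return 1.0 / (1.0 + math.exp(-score))
```

Even in this toy form, the cost structure is visible: the likelihood itself is a cheap dot product, but obtaining a reliable boundary requires gathering and iterating over many labeled features, which is what makes the discriminative approach robust but slow.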
Is there a way to get out of the dilemma so as to have more
efficient but still effective verification? In all these existing
methods, the dynamic environment is treated as the tracker's
adversary: it generates false positives, and most of the
computation is spent on separating the true target from the
environment. However, the environment can also
be advantageous to the tracker if it contains objects that are
correlated to the target. For example, if we need to track a face
in a crowd, it is almost impossible to learn a discriminative
model to distinguish the face of interest from the rest of the
crowd. Why do we have to focus our attention only on the
target? If the person (with that face) is wearing a distinctive
shirt (or a hat), then including the shirt (or the hat) in matching
will surely make the tracking much easier and more robust.
By the same token, if another face is always accompanying
the target face, treating them as a geometric structure and
tracking them as a group will be much easier than tracking
either of them. It is clear that this makes the verification much