Feature Correlation Hypergraph: Exploiting
High-order Potentials for Multimodal Recognition
Luming Zhang, Yue Gao, Chaoqun Hong, Yinfu Feng, Jianke Zhu, Member, IEEE, and Deng Cai
Abstract—In computer vision and multimedia analysis, it is common to use multiple features (or multimodal features) to represent an object. For example, to characterize a natural scene image well, we typically extract a set of visual features describing its color, texture, and shape. However, it is challenging to integrate multimodal features optimally, because they are usually high-order correlated; e.g., the histogram of oriented gradients (HOG), the bag of scale-invariant feature transform (SIFT) descriptors, and wavelets are closely related because they collaboratively reflect the image texture. Existing algorithms fail to capture such high-order correlation among multimodal features. To solve this problem, we present a new multimodal feature integration framework. In particular, we first define a new measure of the high-order correlation among multimodal features, which can be deemed a direct extension of the conventional pairwise correlation. Based on this measure, we construct a feature correlation hypergraph (FCH) to model the high-order relations among multimodal features. A clustering algorithm is then performed on the FCH to group the original multimodal features into a set of partitions. Finally, a multiclass boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from each partition. Experimental results on seven popular datasets demonstrate the effectiveness of our approach.
Index Terms—Feature correlation hypergraph, high-order relations, multimodal features.
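To make the pipeline summarized above concrete, the following is a minimal sketch under strong simplifying assumptions: the high-order correlation is approximated by the mean absolute cross-correlation within each modality triple, the FCH is clustered via clique expansion followed by spectral clustering, and the final boosting step is replaced by simple probability averaging. None of these choices are the authors' exact algorithm; the sketch only illustrates the overall data flow.

```python
# Illustrative sketch (assumptions only) of the four-step pipeline:
# (1) score high-order correlation among feature modalities, (2) build a
# hypergraph whose vertices are modalities, (3) cluster it into partitions,
# (4) learn a weak classifier per partition and combine them.
import numpy as np
from itertools import combinations
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=200)                         # toy 3-class labels
modalities = [rng.standard_normal((200, d)) for d in (8, 12, 6, 10)]

# (1) High-order correlation of every modality triple (toy surrogate: mean
#     absolute cross-correlation within the concatenated triple).
def triple_score(blocks):
    c = np.corrcoef(np.hstack(blocks), rowvar=False)
    return np.abs(c[np.triu_indices_from(c, k=1)]).mean()

edges = list(combinations(range(len(modalities)), 3))
weights = [triple_score([modalities[i] for i in e]) for e in edges]

# (2) Hypergraph over modalities; its clique expansion yields a weighted
#     adjacency matrix that standard graph clustering can consume.
A = np.zeros((len(modalities), len(modalities)))
for e, w in zip(edges, weights):
    for i, j in combinations(e, 2):
        A[i, j] += w
        A[j, i] += w

# (3) Partition the modalities (two groups here).
groups = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)

# (4) One weak classifier per partition, combined by averaging class
#     probabilities (a real multiclass boosting step would reweight samples).
def group_matrix(k):
    return np.hstack([modalities[i] for i in np.flatnonzero(groups == k)])

weak = [LogisticRegression(max_iter=1000).fit(group_matrix(k), y)
        for k in range(2)]
probs = np.mean([clf.predict_proba(group_matrix(k))
                 for k, clf in enumerate(weak)], axis=0)
print("toy training accuracy:", (probs.argmax(axis=1) == y).mean())
```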
I. Introduction
To better recognize objects, the human cognitive system usually combines different types of features. For example, it is difficult to separate a pear from a banana by using color information alone because both are yellow. Similarly, it is difficult to separate an apple from a pear by using shape information alone. However, by combining both color and shape into a more discriminative feature and employing it
for recognition, these fruits can be recognized more easily and accurately. Motivated by this observation, researchers have sought to improve recognition accuracy by integrating multimodal features into the recognition process. In contrast to conventional single-modal approaches, multimodal features contain richer cues, and if the different modalities are integrated optimally, a substantial improvement in pattern recognition can be obtained.
Our work is closely related to three research topics: multicue-based integration, modality identification-based integration, and hypergraph learning.
A. Multicue-Based Integration
Multicue integration treats each multimodal feature as a
modality. In the literature, a series of multicue integration
methods have been proposed. In [1], each multimodal feature corresponds to a subclassifier, and the output of the subclassifier with the highest recognition accuracy is treated as the final decision. Kittler et al. [2], [40] proposed a theoretical framework for combining the decisions of subclassifiers, from which several classifier combination strategies, such as the product rule and the sum rule, are derived and analyzed. Greene et al. [3] proposed a supervised algorithm for multimodal feature combination,
wherein the combination is performed by applying matrix fac-
torization to group the related clusters produced on individual
views. Zhao et al. [4] proposed a multimodal feature selection
method by adjusting the covariance matrix obtained from the
multimodal features. Multiple kernel learning (MKL) [5] linearly combines kernels from the different multimodal features into a more expressive one (a minimal sketch is given at the end of this subsection). As a generalized version of MKL,
LPBoost [6] adopts a boosting strategy to integrate multimodal
features. By exploring the complementary properties of different multimodal features, multiview spectral embedding [7] and multiview stochastic neighbor embedding (m-SNE) [8] obtain a physically meaningful embedding of the multimodal
features. It is noticeable that all of the above multicue integration methods treat each multimodal feature as one modality, which is a heuristic choice, since the interdependency among different features is left unexplored. Multicue integration has also been applied by Foresti and Snidaro [53] and Yang et al. [54] for detecting and tracking people, and by Wang et al. [55] and Kankanhalli et al. [57] for video surveillance and traffic monitoring.
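As a concrete illustration of the linear kernel combination behind MKL, the following sketch builds one RBF base kernel per modality and combines them as K = sum_m beta_m * K_m with fixed nonnegative weights summing to one. A real MKL solver learns the weights jointly with the classifier; the RBF base kernels and the hand-fixed weights here are illustrative assumptions only.

```python
# Illustrative sketch of an MKL-style linear kernel combination: one base
# kernel per modality, combined as K = sum_m beta_m * K_m.  The RBF kernels
# and the fixed weights are assumptions; an actual MKL solver would learn
# beta jointly with the SVM.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=120)                        # toy binary labels
modalities = [rng.standard_normal((120, d)) for d in (16, 32, 8)]

kernels = [rbf_kernel(X) for X in modalities]           # one base kernel per cue
beta = np.array([0.5, 0.3, 0.2])                        # fixed illustrative weights
K = sum(b * Km for b, Km in zip(beta, kernels))         # combined kernel

clf = SVC(kernel="precomputed").fit(K, y)
print("toy training accuracy:", clf.score(K, y))
```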
B. Modality Identification-Based Integration
In order to overcome the limitation of multicue integration, modality identification-based integration is proposed by rearranging