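Multi-label classification with label graph superimposing. Multi-label classification is an important task in computer vision and machine learning: recognizing multiple objects or actions in a single image or video. In recent years, the rapid development of deep learning has greatly improved multi-label recognition performance. The graph convolutional network (GCN) has recently emerged as an effective approach to multi-label classification; it can model the correlations among labels and improve feature learning. However, how best to model label correlations, and how to integrate label system awareness into the feature learning process, remain two open problems in multi-label classification. To address them, this paper proposes a multi-label classification framework based on label graph superimposing. We build a label graph from statistical co-occurrence information and superimpose it into a label graph built from knowledge priors. For the final superimposed graph, we then apply multi-layer graph convolutions for label embedding abstraction. This approach models label correlations effectively and improves feature learning. We also propose using the embedding of the whole label system for better representation learning. Specifically, we add lateral connections at shallow, middle, and deep layers to inject label system information into the backbone CNN, improving label awareness during feature learning. Extensive experiments on the MS-COCO and Charades datasets show that our method greatly improves recognition performance and achieves new state-of-the-art results. Multi-label classification is a fundamental task in computer vision and machine learning with broad applications, including image and video recognition, natural language processing, and recommender systems, so studying the performance and efficiency of multi-label classification algorithms is of great importance. The label graph superimposing framework proposed in this paper effectively improves multi-label classification performance and has strong practical value. In the future, we will continue to study the performance and efficiency of multi-label classification algorithms to meet the needs of practical applications.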

Multi-Label Classification with Label Graph Superimposing

Ya Wang $∗, Dongliang He ‡∗, Fu Li ‡, Xiang Long ‡, Zhichao Zhou ‡, Jinwen Ma $†, Shilei Wen ‡

$ School of Mathematical Sciences and LMAM, Peking University, China
‡ Department of Computer Vision Technology (VIS), Baidu Inc., Beijing, China

{wangyachn@, jwma@math}.pku.edu.cn {hedongliang01, lifu, longxiang, zhouzhichao01, wenshilei}@baidu.com

Abstract

Images and videos usually contain multiple objects or actions. Multi-label recognition has achieved strong performance thanks to the rapid development of deep learning technologies. Recently, graph convolutional networks (GCN) have been leveraged to boost the performance of multi-label recognition. However, the best way to model label correlations, and how feature learning can be improved with label system awareness, remain unclear. In this paper, we propose a label graph superimposing framework to improve the conventional GCN+CNN framework developed for multi-label recognition in two respects. First, we model label correlations by superimposing the label graph built from statistical co-occurrence information into the graph constructed from knowledge priors of labels, and then apply multi-layer graph convolutions to the final superimposed graph for label embedding abstraction. Second, we propose to leverage the embedding of the whole label system for better representation learning. In detail, lateral connections between the GCN and the CNN are added at shallow, middle, and deep layers to inject information about the label system into the backbone CNN for label awareness in the feature learning process. Extensive experiments on the MS-COCO and Charades datasets show that our proposed solution greatly improves recognition performance and achieves new state-of-the-art results.

Introduction

Multi-label is a natural property of images and videos: an image or video usually contains multiple objects or actions. In the computer vision community, multi-label recognition is a fundamental and practical task that has attracted increasing research effort. Given the great success of single-label image/video classification brought by deep convolutional networks (He et al. 2015; Carreira and Zisserman 2017; He et al. 2016a; Feichtenhofer et al. 2018; Wu et al. 2019), multi-label recognition can achieve good performance by naively treating each label as independent and applying multiple binary classifications to predict whether each label is present. However, we argue that the following two aspects should be taken into consideration for such a task.

[Figure 1 panels: (a) Examples on MS-COCO: “Sports Ball”; “Sports Ball, Tennis Racket” (0.42). (b) Examples on Charades: “Sitting on Couch”; “Sitting on Couch, Watching Television” (0.20).]

Figure 1: Examples of label relationships in multi-label datasets. (a) illustrates the co-occurrence of “Sports Ball” and “Tennis Racket” on the MS-COCO dataset; the frequency with which “Tennis Racket” co-occurs with “Sports Ball” is as high as 0.42. Similarly, (b) showcases an example of “Sitting on Couch” and “Watching Television” from the Charades dataset.

∗ Equal contribution. This work was done when Ya Wang was a full-time research intern at Baidu.
† Corresponding author.
2020, Association for the Advancement of Artificial Intelligence

arXiv:1911.09243v1 [cs.CV] 21 Nov 2019

First of all, labels co-occur in images and videos according to priors. As illustrated in Figure 1, “Sports Ball” very often appears together with “Tennis Racket”, and a man “Sitting on Couch” is often “Watching Television” at the same time. A question then naturally arises: how should the relations among labels be modeled to leverage such priors for better performance? Secondly, given an input $X$, the common practice for predicting its labels can be formulated as a two-stage mapping $y = F_1 \circ F_0(X)$, where $F_0 : X \mapsto f$ denotes the CNN feature extraction process and $F_1 : f \mapsto y$ is the mapping from feature space to label space. Labels are explicitly involved only in the last stage, as supervision in the training phase. The further question is therefore, for a specific multi-label classification task, whether and how the mutually related label space can explicitly help the feature learning process $F_0$?
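The two-stage mapping described above (feature extraction followed by a feature-to-label mapping) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: a single linear layer with ReLU stands in for the CNN, and the function names, weight matrices, sizes, and 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(x, w_feat):
    """Stage 1 (stand-in for a CNN): map inputs to feature vectors."""
    return np.maximum(x @ w_feat, 0.0)  # linear layer followed by ReLU

def predict_labels(f, w_cls):
    """Stage 2: map features to independent per-label probabilities."""
    logits = f @ w_cls
    return 1.0 / (1.0 + np.exp(-logits))  # element-wise sigmoid

# Hypothetical sizes: 4 inputs of dimension 8, 16 features, 5 labels.
x = rng.normal(size=(4, 8))
w_feat = rng.normal(size=(8, 16))
w_cls = rng.normal(size=(16, 5))

# Composition of the two stages: labels only enter after feature extraction.
probs = predict_labels(extract_features(x, w_feat), w_cls)
predicted = probs > 0.5  # one row per input, one boolean per label
```

Note that `extract_features` never sees any label information in this sketch, which is the limitation the second question in the text points at.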

To take label correlations into account, several approaches have been proposed. For example, probabilistic graph models were used in (Li et al. 2016; Li, Zhao, and Guo 2014) and an RNN was used in (Wang et al. 2016a) to capture dependencies among labels. However, probabilistic graph models may suffer from scalability issues given their computational cost. The RNN model relies on a predefined or learned sequential order of labels and fails to capture global dependencies well. Recently, the graph convolutional network (Kipf and Welling 2016), or GCN, has seen widespread success in modeling relationships among the vertices of a graph. This tool was leveraged to model the relations of the label system for multi-label recognition in (Chen et al. 2019); however, the label graph there was built simply from the frequency of label co-occurrence. Another direction is to implicitly model label correlations via attention over local image regions, as was done in (Wang et al. 2017; Zhu et al. 2017a). In addition, all of the aforementioned solutions follow the conventional practice of two-stage mapping, and the whole structure of the label system is ignored in learning the feature space.
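As a rough illustration of a label graph built from co-occurrence frequency, the sketch below estimates, for each ordered pair of labels, how often the second label appears given that the first one does. The function name, the toy annotation matrix, and the choice to zero the diagonal are assumptions for illustration; the source does not specify how the cited work normalizes or thresholds the graph.

```python
import numpy as np

def cooccurrence_graph(labels):
    """labels: (num_samples, num_labels) binary matrix.

    Returns cond where cond[i, j] estimates the frequency with which
    label j appears among the samples that contain label i.
    """
    labels = np.asarray(labels, dtype=np.float64)
    counts = labels.T @ labels            # counts[i, j]: samples with both i and j
    occurrences = np.diag(counts).copy()  # counts[i, i]: samples with label i
    safe = np.where(occurrences > 0, occurrences, 1.0)  # avoid division by zero
    cond = counts / safe[:, None]
    np.fill_diagonal(cond, 0.0)           # drop self-loops
    return cond

# Toy annotations over three hypothetical labels: [ball, racket, couch].
y = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
])
p = cooccurrence_graph(y)
# p[0, 1] is 0.5 (racket given ball); p[1, 0] is 1.0 (ball given racket).
```

The resulting matrix is generally asymmetric, since a label can almost always accompany another without the reverse being true.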

In this paper, we attempt to find possible answers to these two questions. We propose a label graph superimposed deep convolutional network, called KSSNet, for this task. Superimposing has two meanings in our framework. (1) To model the co-occurrence priors of labels following the GCN paradigm, instead of using label co-occurrence statistics alone to build the relation graph of the label system, we propose to superimpose a knowledge-based graph into the statistics-based graph to construct the final one. (2) To learn better feature representations for a specific multi-label recognition task anchored on its label structure, we design a novel superimposed CNN and GCN network to extract label-structure-aware descriptors. Specifically, we first construct two adjacency matrices $A_S \in \mathbb{R}^{N \times N}$ and $A_K \in \mathbb{R}^{N \times N}$ to denote the correlation graphs of labels, which are constructed from co-occurrence statistics and from a knowledge graph named ConceptNet (Speer, Chin, and Havasi 2017), respectively. The initial embeddings of all nodes (namely, labels) are extracted from ConceptNet. The final adjacency matrix is a superimposed version of the two. We then apply multi-layer graph convolution on the final superimposed graph to model label correlations. Moreover, unlike conventional graph-augmented CNN solutions, which use information about the label system only at the final recognition stage, we add lateral connections between the CNN and the GCN at shallow, middle, and deep layers to inject information about the label system into the backbone CNN for label awareness in feature learning. We have carried out extensive experiments on the MS-COCO dataset (Lin et al. 2014) for multi-label image recognition and on Charades (Sigurdsson et al. 2016) for multi-label video classification. Results show that our solution obtains absolute mAP improvements of 6.4% and 12.0% on MS-COCO and Charades, respectively, with very limited computational overhead compared to its plain CNN counterpart. Our model achieves a new state of the art, outperforming the current state-of-the-art solution by 1.3% and 2.4% in mAP on MS-COCO and Charades, respectively.
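To make the idea of combining a statistics-based graph with a knowledge-based graph and then running multi-layer graph convolution more concrete, here is a minimal NumPy sketch. It is not KSSNet: the combination rule (a convex combination with a hypothetical `weight`), the random toy graphs, the random initial embeddings, and the layer sizes are all assumptions, since this section does not state how the two graphs are superimposed. Only the symmetric normalization follows the standard GCN formulation of Kipf and Welling.

```python
import numpy as np

def superimpose(adj_stat, adj_know, weight=0.5):
    """ASSUMPTION: blend the two label graphs with a convex combination.
    The source does not give the combination rule; `weight` is hypothetical."""
    return weight * adj_stat + (1.0 - weight) * adj_know

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (Kipf and Welling)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1)             # >= 1 for non-negative adj
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(adj, embeddings, weights):
    """Stacked graph convolutions H <- ReLU(A_norm H W); no ReLU on the last layer."""
    a_norm = normalize_adjacency(adj)
    h = embeddings
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def random_symmetric_graph(rng, n):
    """Toy non-negative symmetric adjacency matrix without self-loops."""
    a = rng.random((n, n))
    a = (a + a.T) / 2.0
    np.fill_diagonal(a, 0.0)
    return a

rng = np.random.default_rng(0)
num_labels, embed_dim = 4, 6
adj_stat = random_symmetric_graph(rng, num_labels)  # stand-in for co-occurrence statistics
adj_know = random_symmetric_graph(rng, num_labels)  # stand-in for a knowledge graph
adj_final = superimpose(adj_stat, adj_know, weight=0.5)

label_init = rng.normal(size=(num_labels, embed_dim))  # stand-in for initial label embeddings
layer_weights = [rng.normal(size=(embed_dim, 8)), rng.normal(size=(8, 5))]
label_embeddings = gcn_forward(adj_final, label_init, layer_weights)  # shape (4, 5)
```

Each row of `label_embeddings` is one label's representation after information has propagated over the combined graph. How such embeddings would be injected into CNN layers through lateral connections is not covered by this sketch.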

Related Work

State-of-the-art image and video classification frameworks (He et al. 2016a; Carreira and Zisserman 2017; Feichtenhofer et al. 2018; He et al. 2019; Wu et al. 2019) can be directly applied to multi-label classification by replacing the cross-entropy loss with a multi-binary classification loss. This straightforward extension leaves label correlations unexplored, which degrades recognition performance. We propose our solution to alleviate this problem; it is closely related to the following works.
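The multi-binary classification loss mentioned above is commonly realized as a binary cross-entropy applied independently to every label. The sketch below shows one numerically stable way to compute it from raw logits in NumPy; the function name and the mean reduction over all sample-label pairs are assumptions, and the cited frameworks may weight or reduce the loss differently.

```python
import numpy as np

def multi_label_bce(logits, targets):
    """Mean binary cross-entropy over all (sample, label) pairs.

    Uses the stable identity
        loss = max(z, 0) - z * t + log(1 + exp(-|z|))
    which equals -[t * log(sigmoid(z)) + (1 - t) * log(1 - sigmoid(z))].
    """
    z = np.asarray(logits, dtype=np.float64)
    t = np.asarray(targets, dtype=np.float64)
    per_entry = np.maximum(z, 0.0) - z * t + np.log1p(np.exp(-np.abs(z)))
    return per_entry.mean()

# Two samples, three labels; each label is an independent yes/no decision.
logits = np.array([[4.0, -3.0, 0.5],
                   [-2.0, 5.0, -1.0]])
targets = np.array([[1, 0, 1],
                    [0, 1, 0]])
loss = multi_label_bce(logits, targets)
```

Unlike softmax cross-entropy, this loss does not force the label scores of one sample to compete, so several labels can be predicted as present at once.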

Many existing works on multi-label classification capture label relationships to improve performance. The co-occurrence of labels can be well formulated by probabilistic graph models, and many methods in the literature build on this mathematical theory to model labels (Li et al. 2016; Li, Zhao, and Guo 2014). To reduce the computational burden of probabilistic graph models, neural network based solutions have recently become prevalent. In (Wang et al. 2016a), a recurrent network was used to encode labels into embedding vectors for label correlation modeling. A context gating strategy was used in (Lin, Xiao, and Fan 2018) to integrate the post-processing step of label re-ranking into the whole network architecture. Other works leverage the attention mechanism to model label relationships. In (Wang et al. 2017) and (Zhu et al. 2017a), either an image region-level spatial attention map or attentive semantic-level label correlation modeling was used to boost the final recognition performance. (Wang, Jia, and Breckon 2019) proposed to improve performance by model ensembling.

Graphs have proved to be more effective for label structure modeling. The tree-structured label graph built with the maximum spanning tree algorithm in (Li, Zhao, and Guo 2014) and the knowledge graph for describing label dependency in (Lee et al. 2018) are two typical label graph solutions. Recently, GCN was introduced in (Kipf and Welling 2016) and has been successfully used for modeling non-grid structured data. Researchers have leveraged GCN for many computer vision tasks and achieved strong performance. For instance, it was leveraged in (Yan, Xiong, and Lin 2018; Gao et al. 2018) to model the relationships among human body skeletons for human action recognition, and a knowledge-aware GCN was applied to zero-shot video classification in (Gao, Zhang, and Xu 2019). Our work is most closely related to the one proposed in (Chen et al. 2019), which used GCN to propagate information among labels and merged label information with CNN features at the final classification stage. Differently, our work builds GCN by superimposing the
