appropriate way of modeling non-consecutive and long-distance
semantics is expected for text topical classification.
In this paper, we propose a Hierarchically Regularized Deep
Graph-CNN (HR-DGCNN) framework to tackle the above problems
with the following considerations.
Input.
Instead of viewing long documents as sequences, we first
convert them to graphs. A natural way to construct the graph is
based on word co-occurrence, i.e., if two words co-occur within a small
window of text, we build an edge between them. Given a constructed
graph, any sub-graph can be regarded as a long-distance n-gram [34].
For each node of the graph, we use a pre-trained word2vec [32] vector
as input features. In this way, our input can
be regarded as a graph of vectors. Although word2vec optimization
has been proven to be identical to co-occurrence matrix factorization
under mild conditions [26], it is still preferable to explicitly represent
documents as graphs, since for upper-level convolution, longer-distance
word co-occurrences (which correspond to convolution over sub-graphs)
can be explicitly computed and evaluated.
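To make the construction concrete, the following is a minimal sketch (assuming a tokenized document, an illustrative window size of 3, and a pre-trained word2vec lookup `w2v` mapping words to vectors; these names are hypothetical, not part of our released system):

```python
import networkx as nx
import numpy as np

def build_word_graph(tokens, w2v, window=3):
    """Build a word co-occurrence graph whose nodes carry word2vec features.

    tokens: list of word strings from one document.
    w2v:    mapping from word to its pre-trained embedding vector.
    window: co-occurrence window size (illustrative default).
    """
    g = nx.Graph()
    for i, w in enumerate(tokens):
        if w not in w2v:                      # skip out-of-vocabulary words
            continue
        g.add_node(w, x=np.asarray(w2v[w]))   # node feature = word embedding
        # connect the word to every in-vocabulary word inside the window
        for u in tokens[i + 1:i + window]:
            if u in w2v and u != w:
                g.add_edge(w, u)
    return g
```

Every connected sub-graph of the resulting graph then plays the role of a long-distance n-gram, and the node attribute `x` supplies the input channels for the convolution layers described next.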
Convolution Layers.
For the lower intermediate layers, we follow the graph normalization
approach [33] to make the subsequent convolution operators possible. This
graph normalization can be regarded as a local operator that converts a
graph into a sorted sequence of nodes, where the order is based on the
importance of each node in the graph. Other graph convolution approaches
are discussed in the related work (Section 2). For the upper intermediate
layers, we generally follow the
well-defined AlexNet [22] and VGG [37] networks for ImageNet
classification. Different from image data, which have at most three
channels (i.e., RGB values), word embeddings have many more channels;
a typical word embedding can have 50 dimensions. The input tensor for
convolution is therefore slightly different from that of images, and we
accordingly modify the configuration of all the following convolution
layers to make the feature representation more effective.
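A simplified stand-in for this normalization step is sketched below, using degree as the node-importance measure and breadth-first search to gather a fixed-size receptive field per node; the sizes (64 nodes, field size 5, 50 channels) are illustrative assumptions rather than the exact configuration used in our experiments:

```python
import numpy as np
import networkx as nx

def normalize_graph(g, num_nodes=64, field_size=5, dim=50):
    """Convert a word graph into a (num_nodes, field_size, dim) input tensor.

    Nodes are ranked by degree (a simple proxy for node importance); for each
    selected node a fixed-size neighborhood is collected by breadth-first
    search and the remainder is zero-padded.
    """
    ranked = sorted(g.nodes, key=lambda n: g.degree(n), reverse=True)
    tensor = np.zeros((num_nodes, field_size, dim), dtype=np.float32)
    for i, node in enumerate(ranked[:num_nodes]):
        # BFS gathers the closest neighbors of the selected node first
        field = [node] + [v for _, v in nx.bfs_edges(g, node)][:field_size - 1]
        for j, v in enumerate(field):
            tensor[i, j] = g.nodes[v]["x"][:dim]
    return tensor  # fed into the AlexNet/VGG-style convolution layers
```

Because each of the `dim` embedding dimensions becomes an input channel, the first convolution layer sees far more channels than the three of an RGB image, which is why the layer configuration has to be adapted as described above.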
Output.
For large-scale hierarchical text classification, there have been many
existing studies on designing better output cost functions [12, 46]. Here,
we use the cross-entropy objective function to determine labels and adopt
the simple but effective recursive regularization framework proposed
in [13]. The idea is that if two labels are parent and child in the
hierarchy, we assume that their classifiers, which separate them from the
other labels, should be similar. From the global view of the hierarchy,
this means the child label classifiers should inherit from the parent
classifier. To handle large-scale label sets, we also use a tree-cut
algorithm to automatically divide the hierarchy into parts, and conquer
the regularized models for the different parts.
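As a sketch of how the output layer combines these two ingredients (written in PyTorch for concreteness; the binary cross-entropy formulation, the parameter names, and the strength `lam` are illustrative assumptions, not the exact objective of the paper):

```python
import torch
import torch.nn.functional as F

def hierarchically_regularized_loss(logits, targets, W, edges, lam=1e-3):
    """Cross-entropy plus recursive regularization over the label hierarchy.

    logits:  (batch, num_labels) scores from the final layer.
    targets: (batch, num_labels) multi-label 0/1 indicators (float).
    W:       (num_labels, hidden) weight rows of the final layer.
    edges:   list of (parent, child) label-index pairs in the hierarchy.
    lam:     regularization strength (illustrative value).
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets)
    # pull each child classifier toward its parent classifier
    reg = sum((W[p] - W[c]).pow(2).sum() for p, c in edges)
    return ce + 0.5 * lam * reg
```

Penalizing the squared difference between parent and child weight vectors over every edge of the hierarchy is what makes child classifiers inherit from their parents; the tree-cut step simply partitions the edge set so that each part can be trained separately.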
In the experiments, we compare our proposed approach with
state-of-the-art methods, including traditional algorithms and deep
learning approaches. We use two benchmark datasets to demonstrate
both effectiveness and efficiency. The RCV1 dataset [27] contains 23,149
training news articles and 784,446 testing news articles in 103 classes.
The NYTimes dataset (https://catalog.ldc.upenn.edu/ldc2008t19) contains
1,855,658 news articles in 2,318 categories. The results show that our
approach is very promising for large-scale hierarchical text topical
classification problems.
The contributions of this paper can be highlighted as follows.
•
First, we introduce a Deep Graph-CNN approach to text
classification. It has been shown that bag-of-graph
representation [34] and CNN representation [8] are effective for
text topic classification. However, this is the first attempt to
show that a Graph-CNN is even more effective.
•
Second, for large-scale hierarchical text classification, we
demonstrate that recursive regularization can also be applied to
deep learning. This can serve as a general framework for deep
learning on classification problems where data are classified into
a hierarchy of labels.
•
Third, we use two benchmark datasets to demonstrate the
efficiency and effectiveness of our algorithm. They feature either
a large test set, a large label set, or a large training set.
The rest of the paper is organized as follows. We first review the
related work in Section 2. We then introduce the detailed input and
architecture of our algorithm in Sections 3 and 4, and present the
experiments in Section 5. Finally, we conclude this paper in
Section 6. Our system is publicly available at https://github.com/
HKUST-KnowComp/DeepGraphCNNforTexts.
2 RELATED WORK
In this section, we briefly review the related work in the following two
categories.
2.1 Traditional Text Classification
Traditional text classification uses feature engineering (e.g., extracting
features beyond BOW) and feature selection to obtain good features for
text classification [1]. Dimensionality reduction can also be used to
reduce the feature space. For example, Latent Dirichlet Allocation [4]
has been used to extract “topics” from a corpus and then represent
documents in the topic space. It can be better than BOW when the number
of features is small. However, as the vocabulary size increases, it shows
no advantage over BOW on text classification tasks [4]. There is also
existing work
on converting texts to graphs [34, 42]. Similar to us, they used
co-occurrence to construct graphs from texts, and then either applied
similarity measures on the graphs to define new document similarities [42]
or applied graph mining algorithms to find frequent sub-graphs in the
corpus to define new features for text [34]. Both showed positive results
on classification problems with small label spaces, but the cost of graph
mining is higher than that of our approach, which simply performs
breadth-first search.
For hierarchical classification with a large label space, many efforts
have been devoted to leveraging the label hierarchy to reduce time
complexity or improve classification results. For example, top-down
classification has been shown to be more efficient than bottom-up
classification (or flat classification, which treats each leaf label as a
category) when there are many labels [30, 44]. Moreover, the parent-child
dependency of labels in the hierarchy can also be used to improve the
model. For example, hierarchical cost-sensitive losses have been
developed [5]. The idea of transfer learning can also be borrowed to
improve each of the classifiers in the hierarchy [43]. Recently, a simpler
recursive regularization of the weight vectors of linear classifiers in the
hierarchical model has been developed and shown to be the state of the art
for large-scale hierarchical text classification problems [13, 14].