as new structure information becomes available, the classification
model should remain effective on the updated dataset.
• Several parallel works have been proposed to solve the single-graph case [23–25]. However, we focus on the case of mining graph transactions, i.e., mining a large collection of graphs. The work in [14] employs MapReduce for mining a large collection of graphs. However, the large number of duplicate subgraphs still creates significant performance problems.
After mining graph patterns, building an effective classification model is another issue. An effective classification model should have a fast learning speed and a high classification accuracy. Support Vector Machines (SVM) [26] are adopted in previous works [13,27,28]. SVM is an effective classifier with good classification accuracy. However, its slow training speed and the non-trivial human intervention it requires become severe problems. In recent years, a generalized single-hidden-layer feedforward network, named the Extreme Learning Machine (ELM), has been proposed in [29]. Applications based on ELM show that it has a much faster training speed and provides better generalization performance than traditional learning algorithms for feedforward neural networks. Furthermore, ELM is easy and efficient to implement [30].
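To make the ELM training step concrete, the following is a minimal NumPy sketch of a basic single-hidden-layer ELM; the function names, layer size, and activation are illustrative assumptions, not the implementation of [29] or of this paper. Input weights and biases are assigned randomly and never tuned, and the output weights are solved in closed form with the Moore-Penrose pseudo-inverse, which is the source of ELM's fast training speed.

```python
import numpy as np

# Minimal single-hidden-layer ELM sketch (illustrative; names and sizes are
# our assumptions, not the implementation of [29] or of this paper).
def elm_train(X, Y, n_hidden=128, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never tuned
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ Y                     # output weights via Moore-Penrose pseudo-inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Usage: X holds support graph vectors (one row per graph), Y one-hot class labels.
```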
To address the shortcomings of existing approaches, in this paper we make the following contributions:
• We present a novel compression-based graph pattern mining algorithm. Since there are many common structures among graphs with the same class label, the original graph dataset can be compressed to reduce the number of graphs and the required memory space. We merge graphs with the same class label into one compressed graph by discovering their common structures, and mine graph patterns in the compressed dataset.
• We propose an incremental graph pattern mining algorithm that can adapt to situations where the graph dataset and graph structures are highly dynamic and frequently updated over time. When an update G arrives, it may influence the detected patterns, and there may be new patterns in the updated dataset. In the incremental mining algorithm, we mine new patterns and update the detected patterns based on G instead of recomputing over the entire updated dataset.
• We develop a new distributed graph pattern mining algorithm to handle complex graphs that contain repetitive edges with the same edge label and vertex labels. MapReduce jobs are used to construct subgraphs and count their frequencies. We devise a graph encoding method to encode subgraphs so as to optimize subgraph counting and to prevent constructing duplicate subgraphs (a toy illustration of such an encoding follows this list).
• We extend ELM and its variants to our proposed large graph mining solutions and implement three graph classification frameworks. ELM is an efficient classifier and shows outstanding generalization performance on big graph datasets.
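As a toy illustration of the subgraph encoding idea referenced in the third contribution (the actual method is presented in Section 6), the following sketch assigns a canonical code to a small labeled subgraph by sorting its labeled edges; the function name and tuple representation are our own assumptions.

```python
# Toy subgraph encoding (our illustration only; the encoding of Section 6 may
# differ, and a label-only code like this one cannot distinguish subgraphs
# that repeat the same vertex and edge labels).
def encode(edges):
    # edges: iterable of (u_label, edge_label, v_label) tuples of an undirected subgraph.
    # Sorting endpoint labels, then the edge list, makes the code independent
    # of the order in which the subgraph was grown, so duplicates collapse to one key.
    return tuple(sorted(tuple(sorted((u, v))) + (w,) for (u, w, v) in edges))

# Two growth orders of the same triangle produce the same key:
g1 = [("A", "x", "B"), ("B", "y", "C"), ("C", "z", "A")]
g2 = [("C", "z", "A"), ("A", "x", "B"), ("B", "y", "C")]
assert encode(g1) == encode(g2)
```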
The remainder of this paper is organized as follows: Section 2 briefly introduces some basic concepts and formalizes the graph classification problem. In Section 3, we give a brief introduction to ELM. The compression-based graph classification framework is proposed in Section 4. Section 5 introduces the incremental graph classification framework. Distributed graph classification is described in Section 6. Extensive experiments are conducted on a series of real-life datasets and the performance is evaluated in Section 7. Section 8 discusses related work. Finally, we give our conclusions and future work in Sections 9 and 10.
2. Problem definition
2.1. Graph data and graph isomorphism
Definition 1 (Undirected graph). For an undirected graph $g = (V(g), E(g), L_v, L_e, L_c)$, $V(g)$ is the set of vertices and $E(g)$ is the set of edges, $E(g) \subseteq V(g) \times V(g)$; $L_v$ and $L_e$ are the labels of vertices and edges, respectively. Each edge can be represented by a 3-tuple, $e = (u, w, v)$, where $u$ and $v$ are vertices in $V(g)$ and $w$ is the label of $e$ in $L_e$. $L_c$ is the class label of $g$.
For the graph dataset $D = \{G_1, G_2, \ldots, G_n\}$ in Fig. 1, $G_1, G_2, G_3, G_4$ are labeled with $C_1$, while $G_5, G_6, G_7, G_8$ are labeled with $C_2$.
Definition 2 (Subgraph isomorphism). For two graphs $g$ and $g'$, $l$ is a label function mapping a vertex or an edge to a label. A subgraph isomorphism is an injective function $f: V(g) \rightarrow V(g')$, such that (1) $\forall v \in V(g)$, $l(v) = l'(f(v))$, and (2) $\forall (u, v) \in E(g)$, $(f(u), f(v)) \in E(g')$ and $l(u, v) = l'(f(u), f(v))$, where $l$ and $l'$ are the labeling functions of $g$ and $g'$, respectively. Then $g$ is a subgraph of $g'$, denoted as $g \subseteq g'$.
2.2. Graph pattern mining and graph classification
Definition 3 (Frequency). Given a dataset $D = \{G_1, G_2, \ldots, G_n\}$ and a graph pattern $g$, the supporting dataset of $g$ is $D_g = \{G_i \mid g \subseteq G_i, G_i \in D\}$. The frequency of $g$ is $|D_g| / |D|$, denoted as $freq(g)$.
Definition 4 (Support graph vector). Given a set of patterns $p_1, p_2, \ldots, p_l$, a graph $g$ can be represented as a 0–1 vector $x = [x_1, x_2, \ldots, x_l]$, where $x_i = 1$ if $p_i \subseteq g$; otherwise, $x_i = 0$. The vector $x$ is called the support graph vector of the graph $g$.
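Building on Definitions 3 and 4, both the frequency of a pattern and the support graph vector of a graph follow directly from a subgraph test. The self-contained sketch below repeats the monomorphism check from the sketch after Definition 2; the helper names are our own.

```python
from networkx.algorithms import isomorphism as iso

def is_subgraph(g, G):
    # Same monomorphism test as the sketch after Definition 2.
    m = iso.GraphMatcher(G, g,
                         node_match=iso.categorical_node_match("label", None),
                         edge_match=iso.categorical_edge_match("label", None))
    return m.subgraph_is_monomorphic()

def freq(g, D):
    # Definition 3: freq(g) = |D_g| / |D|.
    return sum(is_subgraph(g, G) for G in D) / len(D)

def support_vector(g, patterns):
    # Definition 4: x_i = 1 if p_i is a subgraph of g, else 0.
    return [1 if is_subgraph(p, g) else 0 for p in patterns]
```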
Definition 5 (Graph classification). Given a graph $g$ and a set of class labels $C_1, C_2, \ldots, C_m$, the goal of the graph classification function $GC(g)$ is to predict the most probable class label for $g$, modeled as $GC(g) = C_i$, where $i \in [1, m]$.
Definition 6 (Graph compression classification). For a graph dataset $D$ and a graph compression function $R$, $D_r = R(D)$ is a graph dataset computed from $D$ by $R$, referred to as the compressed dataset of $D$, such that $|D_r| \ll |D|$, satisfying $GC(D) = GC(D_r)$, where $GC$ is the graph classification function.
The graph compression function reduces the size of the graph dataset, while the structural information of the graphs is still preserved. Fig. 2(c) is a compressed graph constructed by the compression function from the graphs with class label $C_1$. We do not lose any information and obtain a smaller dataset, thus saving memory space.
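For intuition only, a toy compression function might overlay the labeled edges of all graphs in one class and record how many member graphs contain each edge; the actual compression function $R$ of Section 4 discovers common substructures and may differ substantially.

```python
from collections import Counter

# Toy compression (intuition only; not the compression function R of Section 4).
def compress(graphs_of_one_class):
    # Each graph is a list of (u_label, edge_label, v_label) tuples.
    merged = Counter()
    for edges in graphs_of_one_class:
        for e in set(edges):
            merged[e] += 1   # edges shared across graphs collapse into one entry
    # The "compressed graph": one entry per distinct labeled edge, with the
    # number of member graphs that contain it.
    return dict(merged)
```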
Definition 7 (Incremental graph classification). For a graph dataset $D$ and an update $G$, incremental graph classification satisfies $GC(D + G) = GC(D) + GC(G)$.
Incremental graph classification can be adopted to handle dynamic graphs whose structures are frequently updated over time. Dynamic information can be combined with previous results, and the classification model can adjust to the latest status.
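A minimal sketch of the incremental idea follows; it is our own simplification, not the incremental mining algorithm of Section 5. Per-pattern supporting counts are kept, and when an update $G$ arrives, only those counts are adjusted instead of re-mining the whole dataset.

```python
# Incremental maintenance of pattern frequencies (a simplification; names and
# structure are ours, not the incremental mining algorithm of Section 5).
class IncrementalMiner:
    def __init__(self, patterns, is_subgraph):
        self.counts = {p: 0 for p in patterns}  # |D_g| per detected pattern
        self.n = 0                               # |D|
        self.is_subgraph = is_subgraph           # e.g. the test from Definition 2

    def add(self, G):
        # Only the update G is examined; nothing is recomputed over D.
        self.n += 1
        for p in self.counts:
            if self.is_subgraph(p, G):
                self.counts[p] += 1

    def freq(self, p):
        return self.counts[p] / self.n
```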
Definition 8 (Distributed graph classification). For a graph dataset $D$, a distributed graph classification function $DC$ satisfies $GC(D) = DC(D)$.
Distributed storage and computing is another efficient way to handle big graph datasets. We mine graph patterns and train classification models through the MapReduce framework. Distributed solutions can satisfy the demands on computing time and memory.
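As a local sketch of the distributed counting step (our own simulation; Section 6 describes the real MapReduce jobs), each mapper emits a canonical code per constructed subgraph, and the reducer counts the distinct graphs that support each code. The `encode` parameter is assumed to be a canonical subgraph code such as the toy one sketched in Section 1.

```python
from collections import defaultdict

# Local simulation of the MapReduce counting step (illustrative only).
def map_phase(graph_id, subgraphs, encode):
    # Emitting a set of (code, graph_id) pairs collapses duplicate subgraphs
    # constructed within the same graph.
    return {(encode(s), graph_id) for s in subgraphs}

def reduce_phase(all_pairs):
    support = defaultdict(set)
    for code, graph_id in all_pairs:
        support[code].add(graph_id)   # supporting dataset D_g per code
    return {code: len(ids) for code, ids in support.items()}
```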