Besides the distributed framework mentioned above, the alternating direction method of multipliers (ADMM), which aims to find a solution to a global problem by solving local subproblems in a coordinated fashion, is also an effective method for distributed convex optimization [9,17]. It is noteworthy that C. Zhang investigated a large-scale distributed linear classification algorithm in the framework of ADMM and achieved a significant speedup over several other classifiers [18]. C.Y. Lu et al. proposed a fast proximal ADMM with parallel splitting to further reduce the per-iteration complexity for multi-block problems [19].
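For reference, in its standard form (following [9]) ADMM addresses a problem of the form \min_{x,z} f(x) + g(z) subject to Ax + Bz = c by iterating over the augmented Lagrangian L_{\rho}; the generic updates, which are not yet the specific updates derived later in this paper, read

x^{k+1} = \arg\min_{x} L_{\rho}(x, z^{k}, y^{k}),
z^{k+1} = \arg\min_{z} L_{\rho}(x^{k+1}, z, y^{k}),
y^{k+1} = y^{k} + \rho\,(A x^{k+1} + B z^{k+1} - c),

where y is the dual variable and \rho > 0 is the penalty parameter. The x- and z-updates decompose into local subproblems whenever f and g are separable across the data.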
Motivated by the advantages of ADMM in distributed optimization, in this paper we develop a novel distributed extreme learning machine (DELM) that implements the extreme learning machine on multiple machines in parallel. This approach relieves, to some extent, the memory limitation imposed by large-scale data sets. It differs from the traditional extreme learning machine, in which all data are loaded onto one processor and share a common output weight vector of the hidden layer. Instead, the proposed DELM method allows a large-scale data set to be stored in a distributed manner by associating each processor with its own output weight vector, where each output weight vector can be determined in parallel since it depends only on the corresponding sub-dataset. In the framework of ADMM, the shared output weight vector across all processors is derived by combining all local output weight vectors with an included regularization vector.
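Concretely, with the training data split into P sub-datasets, one natural way to cast such a scheme as a global consensus problem (the exact formulation used by DELM is given in Section 3; the splitting shown here is only an illustrative sketch) is

\min_{\beta_1, \cdots, \beta_P,\, z} \; \sum_{p=1}^{P} \frac{C}{2}\,\|H_p \beta_p - T_p\|_2^2 + \frac{1}{2}\,\|z\|_2^2 \quad \text{s.t.} \quad \beta_p = z, \quad p = 1, 2, \cdots, P,

where H_p and T_p denote the hidden-layer output matrix and target vector of the p-th sub-dataset (both defined formally in Section 2); each \beta_p can then be updated in parallel from its own sub-dataset, while the consensus variable z carries the shared regularization.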
The remainder of this paper is organized as follows. In
Section 2 , notations and preliminaries about extreme learning ma-
chine are reviewed briefly. Section 3 focuses on the formulated op-
timization problem of distributed extreme learning machine. A cor-
responding distributed convex optimization algorithm is developed
for the proposed DELM on the basis of ADMM in Section 4 . Ex-
tensive experiments on well-known benchmark data sets are con-
ducted in Section 5 to illustrate the effectiveness and superiority
of the proposed method. Finally, the conclusion is given in Section 6.
2. Principles of extreme learning machine
The extreme learning machine refers to a kind of single-hidden-layer feedforward neural network whose hidden layer need not be tuned. Formally, the output function of the extreme learning machine is formulated as
f_L(x) = \sum_{j=1}^{L} \beta_j h_j(x) = h(x)\beta, \qquad (1)
where \beta = (\beta_1, \beta_2, \cdots, \beta_L)^{\top} \in \mathbb{R}^{L} is the output weight vector of the hidden layer with L nodes, which needs to be estimated analytically; the feature mapping h : \mathbb{R}^{n} \to \mathbb{R}^{L} maps the input variable x \in \mathbb{R}^{n} to the L-dimensional hidden-layer feature space such that
h(x) = (h_1(x), h_2(x), \cdots, h_L(x)), \qquad (2)
where the component h_j(x) = G(a_j, b_j, x) \; (j = 1, 2, \cdots, L) denotes the output function of the j-th hidden node, which is known to users because the parameters \{(a_j, b_j) : j = 1, 2, \cdots, L\} are generated randomly according to any continuous probability distribution [20]. In general,
the Sigmoid function

G(a, b, x) = \frac{1}{1 + \exp(-(a \cdot x + b))},

the Gaussian function

G(a, b, x) = \exp\bigl(-b \,\| x - a \|^{2}\bigr),
and some other nonlinear activation functions are usually adopted as hidden-node output functions in the framework of the extreme learning machine. In addition, some kernel sparse coding methods have also been developed to meet the requirements of many applications [21,22].
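As a minimal illustrative sketch of this random feature mapping (the function names and the uniform sampling range below are our own assumptions; the theory in [20] only requires a continuous distribution for the parameters), one could write:

```python
import numpy as np

def random_hidden_layer(n_features, n_hidden, seed=None):
    """Randomly generate hidden-node parameters (a_j, b_j), j = 1, ..., L."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # column j holds a_j
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # bias b_j per node
    return A, b

def hidden_output(X, A, b):
    """Row-wise map h(x) = (G(a_1,b_1,x), ..., G(a_L,b_L,x)) with the Sigmoid G."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))                 # shape (N, L)
```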
For a data set D = \{(x_k, t_k) : x_k \in \mathbb{R}^{n}, t_k \in \mathbb{R}, k = 1, 2, \cdots, N\} that consists of N training data points x_k with actual outputs t_k \; (k = 1, 2, \cdots, N), the extreme learning machine randomly generates the input weights and estimates the output weight vector \beta by minimizing the training error together with the \ell_2-norm of the output weights for better generalization, i.e.,
\beta = \arg\min_{\beta} \; \frac{1}{2}\|\beta\|_2^2 + \frac{C}{2}\sum_{k=1}^{N} \bigl(h(x_k)\beta - t_k\bigr)^2 = \arg\min_{\beta} \; \frac{1}{2}\|\beta\|_2^2 + \frac{C}{2}\|H\beta - T\|_2^2, \qquad (3)
where C is the trade-off parameter between the training error and the regularization term; T = (t_1, t_2, \cdots, t_N)^{\top} \in \mathbb{R}^{N} denotes the actual output vector; and H \in \mathbb{R}^{N \times L} represents the hidden layer output matrix, i.e.,
H = \begin{pmatrix} h(x_1) \\ h(x_2) \\ \vdots \\ h(x_N) \end{pmatrix} = \begin{pmatrix} h_1(x_1) & h_2(x_1) & \cdots & h_L(x_1) \\ h_1(x_2) & h_2(x_2) & \cdots & h_L(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ h_1(x_N) & h_2(x_N) & \cdots & h_L(x_N) \end{pmatrix}.
It is evident that the closed-form solution of the optimization problem (3) can be obtained as
\beta = H^{\top}\left(\frac{I}{C} + H H^{\top}\right)^{-1} T, \qquad (4)
and therefore the output of the conventional extreme learning machine satisfies
f_L(x) = h(x)\, H^{\top}\left(\frac{I}{C} + H H^{\top}\right)^{-1} T. \qquad (5)
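Assuming the illustrative helpers random_hidden_layer and hidden_output sketched above, the closed-form training step of Eq. (4) and the prediction of Eqs. (1) and (5) can be coded, for instance, as follows; the N x N linear solve here is exactly the step whose cost is discussed at the end of this section.

```python
def train_elm(X, T, n_hidden, C, seed=None):
    """Estimate beta = H^T (I/C + H H^T)^{-1} T, cf. Eq. (4)."""
    A, b = random_hidden_layer(X.shape[1], n_hidden, seed)
    H = hidden_output(X, A, b)            # hidden layer output matrix, N x L
    N = H.shape[0]
    # Solve the N x N system rather than forming an explicit inverse.
    beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
    return A, b, beta

def predict_elm(X_new, A, b, beta):
    """f_L(x) = h(x) beta, cf. Eqs. (1) and (5)."""
    return hidden_output(X_new, A, b) @ beta
```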
If the feature mapping h is not known to users, a kernel matrix \Omega = [\Omega_{i,j}] \in \mathbb{R}^{N \times N} with respect to the data set D can be calculated as
\Omega_{i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j),
where the kernel function K : \mathbb{R}^{n} \times \mathbb{R}^{n} \to \mathbb{R} is usually defined as the Gaussian function

K(x_i, x_j) = G(x_i, x_j, \sigma) = \exp\left(-\frac{\|x_i - x_j\|^{2}}{\sigma^{2}}\right).
According to Eq. (5) , the output of extreme learning machine with
kernel function K is formulated as
f_K(x) = h(x)\, H^{\top}\left(\frac{I}{C} + H H^{\top}\right)^{-1} T \qquad (6)
= \bigl(K(x, x_1), K(x, x_2), \cdots, K(x, x_N)\bigr)\left(\frac{I}{C} + \Omega\right)^{-1} T.
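A corresponding sketch for the kernel case of Eq. (6) (again with illustrative function names; sigma denotes the Gaussian kernel width defined above):

```python
def gaussian_kernel(Xa, Xb, sigma):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2) for all pairs of rows."""
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma ** 2)

def train_kernel_elm(X, T, C, sigma):
    """Solve alpha = (I/C + Omega)^{-1} T with Omega_{ij} = K(x_i, x_j)."""
    Omega = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(np.eye(X.shape[0]) / C + Omega, T)

def predict_kernel_elm(X_new, X_train, alpha, sigma):
    """f_K(x) = (K(x, x_1), ..., K(x, x_N)) alpha, cf. Eq. (6)."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha
```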
As is well known, the conventional extreme learning machine is usually implemented on a single machine and therefore inevitably suffers from memory limitations with large-scale data sets. Indeed, it is often essential to handle large-scale data that are located on different machines; for example, in some applications the data are collected and stored in a distributed manner and can only be accessed on their own machines, and confidentiality makes it impossible to gather all of the data together.
In addition, the traditional extreme learning machine is computation-intensive for large-scale data because of the high complexity of the matrix inversion involving the hidden layer output matrix H (see Eq. (5) for details). The situation becomes even worse in the case of the kernel matrix \Omega \in \mathbb{R}^{N \times N}, since the size of the matrix to be inverted is then determined by the number of samples, which is large for large-scale data sets.
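To make this concrete, a rough back-of-envelope estimate (assuming double-precision storage and a cubic-cost direct factorization, both our own assumptions): for N = 10^{6} training samples,

\text{memory for } \Omega \approx 8 N^{2} \ \text{bytes} = 8\ \text{TB}, \qquad \text{factorization cost} \sim O(N^{3}) = O(10^{18}) \ \text{flops},

which is far beyond the capacity of a single commodity machine and is precisely the bottleneck that motivates the distributed treatment in the following sections.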