A Maximally Split and Relaxed ADMM for
Regularized Extreme Learning Machines
Xiaoping Lai, Member, IEEE, Jiuwen Cao, Member, IEEE, Xiaofeng Huang, Tianlei Wang,
and Zhiping Lin, Senior Member, IEEE
Abstract—One of the salient features of the extreme learning
machine (ELM) is its fast learning speed. However, in a big
data environment, the ELM still suffers from an overly heavy
computational load due to the high dimensionality and the
large amount of data. Using the alternating direction method
of multipliers (ADMM), a convex model fitting problem can be
split into a set of concurrently executable subproblems, each with
just a subset of model coefficients. By maximally splitting across
the coefficients and incorporating a novel relaxation technique, a
maximally split and relaxed ADMM (MS-RADMM), along with
a scalarwise implementation, is developed for the regularized
ELM (RELM). The convergence conditions and convergence
rate of the MS-RADMM are established; the algorithm converges
linearly with a smaller convergence ratio than the unrelaxed
maximally split ADMM. The optimal parameter values of the
MS-RADMM are obtained and a fast parameter selection scheme
is provided. Experiments on ten benchmark classification data
sets are conducted, the results of which demonstrate the fast
convergence and parallelism of the MS-RADMM. Complexity
comparisons with the matrix-inversion-based method, in terms of
the numbers of multiplication and addition operations, the
computation time, and the number of memory cells, are provided
for performance evaluation of the MS-RADMM.
Index Terms—Alternating direction method of multipliers
(ADMM), computational complexity, convergence rate, extreme
learning machine (ELM), parallel algorithm.
I. INTRODUCTION
The extreme learning machine (ELM) [1], developed for
the training of single-hidden-layer feedforward neural
networks (SLFNs), has attracted much attention in the
past decade and has become popular due to its fast learning speed
and satisfactory generalization performance (see [2]–[7] and
the references therein). The fast learning speed of ELMs,
including those in online sequential mode [8]–[10], is due
to the randomly generated hidden nodes and the analytical
calculation of the output weight, which could be traced back
to [11]–[13]. The optimal output weight can be concisely
expressed as H⁺T, where H⁺ is the Moore–Penrose generalized
inverse of the hidden-layer output matrix H of the SLFN and
T represents the target output. The regularized
ELM (RELM) is an improved version of the original ELM
in which a regularization term is incorporated into its cost
function to minimize not only the total squared training error
but also the norm of the output weight [2].
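To make this closed-form training step concrete, the following NumPy sketch computes the RELM output weight under stated assumptions: a sigmoid hidden layer, a regularization parameter C trading off the training error against the weight norm, and illustrative names (relm_output_weight, W, b, beta) that are not taken from the paper.

```python
import numpy as np

def relm_output_weight(X, T, N=100, C=1.0, seed=0):
    """Illustrative closed-form RELM training step (a sketch, not the paper's algorithm).

    X : (M, d) array of input samples; T : (M, m) array of target outputs.
    N hidden nodes with randomly generated input weights and biases;
    C is the regularization parameter.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, N))           # random input weights
    b = rng.standard_normal(N)                # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # hidden-layer output matrix (sigmoid)
    # Regularized least-squares solution of the output weight:
    #   beta = (H'H + I/C)^(-1) H'T,
    # which approaches the Moore-Penrose solution H⁺T as C grows large
    # (for H of full column rank).
    beta = np.linalg.solve(H.T @ H + np.eye(N) / C, H.T @ T)
    return W, b, beta
```

The single N × N solve above is the matrix-inversion step whose memory and arithmetic costs become prohibitive for large M and N, which motivates the parallel and ADMM-based approaches discussed below.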
The noniterative analytical solution of the output weight is
one of the main factors that contribute to the high computational
efficiency of the ELM. However, the growing volume
and increasing complexity of the data sets in big data
applications render the implementation of the ELM highly
challenging. If the number of training samples M and the
number of hidden nodes N are both very large, the hidden-layer
output matrix H is of very high dimension, and the matrix-inversion-based
(MI-based) solutions require a huge memory space and
suffer from a heavy computational load; forming HᵀH alone takes
on the order of MN² operations, and the subsequent N × N inversion
a further O(N³).
To address the above challenges, several enhanced ELMs
were proposed [14]–[21]. The ELM in [14] introduces an
ℓ1-regularized cost that leads to sparse solutions and therefore
favors network pruning, and describes a hardware SLFN
structure that combines parallel and pipelined processing.
Frances-Villora et al. [15] presented two parallel hardware
architectures for on-chip ELMs implemented on
a field-programmable gate array (FPGA) and focused on
the parallelization of the Moore–Penrose generalized inverse
computation based on a QR decomposition. He et al. [16]
used the MapReduce programming model to process
large data sets with a parallel/distributed algorithm on a
cluster, parallelizing the MI-based output weight calculation
and the hidden node mapping. Xin et al. [17], [18] also used
the MapReduce framework to distribute the output weight
calculation. While Xin et al. [17] focused on the decomposition
of matrix multiplication, Xin et al. [18] focused on the
matrix multiplication in incremental/decremental/correctional
learning. The parallel RELM in [19] decomposes the data
matrix by rows or columns into a set of smaller block matrices
and trains the block-matrix-based models in parallel using
a cluster with the message passing interface environment.
Reference [20] divides each of the input data set, the hidden-
layer parameter data set, and the hidden-layer output matrix
into N parts, which are processed in parallel to calculate
the output weight. In [21], an ensemble of ELMs that are
implemented in parallel with multiple GPU and CPU cores
is used to reduce the error in regression problems with large
data sets. In [22], the training time of an ELM is reduced by
outsourcing the training to a computing cloud.