Extreme learning machine for classification over uncertain data
Yongjiao Sun*, Ye Yuan, Guoren Wang
Northeastern University, Shenyang 110004, China
article info
Article history:
Received 17 June 2013
Received in revised form 23 July 2013
Accepted 26 August 2013
Communicated by G.-B. Huang
Available online 18 October 2013
Keywords:
Extreme learning machine
Uncertain data
OS-ELM
SVM
Single hidden layer feedforward neural networks
abstract
Conventional classification algorithms assume that the input data are exact or precise. For various
reasons, including imprecise measurement, network delay, outdated sources and sampling errors, data
uncertainty is common and widespread in real-world applications such as sensor databases, location
databases and biometric information systems. Although many classification approaches exist, few of
them address classification over uncertain data in databases. Therefore, in this paper, we
propose classification algorithms based on conventional and optimized ELM to classify
uncertain data. First, we treat the instances of each uncertain data object as training data for learning.
Then, the probability that an uncertain data object belongs to each class is computed from the
learning results of its instances. Finally, a bound-based approach produces the final classification.
We also extend the proposed algorithms to classification over uncertain data in a distributed
environment based on OS-ELM and Monte Carlo theory. Experiments verify the performance of the
proposed algorithms.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
Recently, classification over uncertain data has gained much
attention because of the uncertainty inherent in the data of many
real-world applications, such as sensor network monitoring [1], object
identification [2], moving object search [3–5], and the like [6,7,36].
A number of factors induce this uncertainty, including data collection
error, measurement error, data sampling error, obsolete sources,
network latency and transmission error. For example, in moving
object databases, limited resources make it impossible for the
database server to know the exact positions of all objects at all
times. In this setting, there are two kinds of uncertainty:
measurement error and sampling error. Measurement error derives
from the imprecision of GPS devices, while sampling error derives
from the update frequency of the moving objects. Therefore, it is
very important to manage and analyze uncertain data effectively and
efficiently.
However, many traditional data classification problems become
particularly challenging in the uncertain case, since traditional
classification algorithms do not work for uncertain data. An
uncertain data object may have many instances, and traditional
classification algorithms treat each instance as a separate data
object. Thus a single uncertain data object can be assigned to many
classes, although it actually belongs to only one class. Moreover,
an uncertain data object may be associated with a probability
density function (pdf) that describes the probability of each
instance appearing in the object. An uncertain classification
algorithm should take these uncertain semantics into account and
efficiently process the computation associated with the pdf.
Obviously, traditional classification algorithms cannot deal with
such challenges. Therefore, in this paper, we propose a new
classification algorithm based on the extreme learning machine
(ELM) [9–17] to process uncertain data objects. Specifically, we use
the conventional ELM [10] to obtain binary classifications over
uncertain data, and the optimized ELM [9] for binary and multiclass
classifications over uncertain data. We also extend these
algorithms to distributed environments based on OS-ELM [8].
Conventional ELM is a good method for classifying data because it
offers good generalization performance while improving the learning
speed of the neural network, maximizing the separating margin, and
minimizing the training error. Optimized ELM, in turn, tends to
have better scalability and to achieve similar (for regression and
binary class cases) or much better (for multiclass cases)
generalization performance at much faster learning speed than
conventional SVM and LS-SVM [18,19]. OS-ELM, which builds on ELM, is
an algorithm that can handle data arriving one-by-one or
chunk-by-chunk with varying chunk size.
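The observation above, that classifying instances one by one can assign a single uncertain object to several classes, can be illustrated with a small sketch. It assumes each uncertain object is given as a list of (feature vector, instance probability) pairs and that some instance-level classifier is already trained. The names `class_probabilities`, `classify_object` and `toy_classifier` are hypothetical and do not come from the paper. The final step takes the class with the largest aggregated probability, which is a simplification and not the bound-based approach of the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# One instance of an uncertain object: (feature vector, probability of the instance).
Instance = Tuple[Sequence[float], float]


def class_probabilities(
    instances: List[Instance],
    classify: Callable[[Sequence[float]], Hashable],
) -> Dict[Hashable, float]:
    """Sum the instance probabilities per predicted class for one uncertain object."""
    total = sum(p for _, p in instances)
    if total <= 0:
        raise ValueError("instance probabilities must sum to a positive value")
    probs: Dict[Hashable, float] = defaultdict(float)
    for features, p in instances:
        # Normalise so that the class probabilities of the object sum to 1.
        probs[classify(features)] += p / total
    return dict(probs)


def classify_object(
    instances: List[Instance],
    classify: Callable[[Sequence[float]], Hashable],
) -> Hashable:
    """Assign the object to exactly one class: the most probable one (assumed rule)."""
    probs = class_probabilities(instances, classify)
    return max(probs, key=probs.get)


def toy_classifier(features: Sequence[float]) -> str:
    """Hypothetical stand-in for a trained instance-level classifier such as an ELM."""
    return "A" if features[0] < 0.5 else "B"


# One uncertain object with three instances; the instances fall into two classes.
obj: List[Instance] = [((0.2,), 0.5), ((0.4,), 0.25), ((0.9,), 0.25)]
print(class_probabilities(obj, toy_classifier))
print(classify_object(obj, toy_classifier))
```

Here the instances of one object are split between classes A and B, yet the object receives a single label once the instance probabilities are aggregated.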
To implement uncertain classifications, we model uncertain data
as an object consisting of instances with arbitrary probability
distribution. Based on ELM, firstly, we train each instance associated
with the uncertain data object. Then, the class probabilities of each
instance are computed according to the learning results. Finally, we
can obtain the final classification results by using a probability
bound-based approach. To obtain more accurate classification results,
Neurocomputing
0925-2312/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.neucom.2013.08.011
* Corresponding author. Tel.: +86 1390 9838 790.
E-mail address: sunyongjiao@ise.neu.edu.cn (Y. Sun).
Neurocomputing 128 (2014) 500–506