Extreme learning machine for classification over uncertain data (resource, CSDN Library)


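A study of the extreme learning machine (ELM) applied to the classification of uncertain data

As technology develops and applications spread, classifying uncertain data has become an important task in big-data processing. In real-world applications such as sensor databases, location databases and biometric information systems, data uncertainty is common and widespread because of imprecise measurement, network delay, outdated sources and sampling errors. Conventional classification algorithms usually assume that the input data is exact, which often does not match reality. Classifying effectively over uncertain data is therefore a major open challenge.

Against this background, the authors, Yongjiao Sun, Ye Yuan and Guoren Wang, propose classification algorithms based on the conventional and the optimized ELM for uncertain data. ELM is a learning method for single-hidden-layer feedforward neural networks (SLFNs): it sets the hidden-node parameters randomly and trains the output weights by solving a linear system, which makes learning fast. Its strengths are fast learning speed and good generalization performance.

The paper first treats the instances of each uncertain data object as training data. It then computes, from the learning result for each instance, the probability that the uncertain object belongs to each class, and obtains the final classification with a bound-based approach. The authors also extend the algorithms to classification over uncertain data in a distributed environment, based on OS-ELM (online sequential ELM) and Monte Carlo theory.

The keywords are: extreme learning machine (ELM), uncertain data, OS-ELM, support vector machine (SVM) and single hidden layer feedforward neural networks (SLFNs).

Compared with conventional classifiers such as SVM, ELM shows a clear speed and performance advantage on large datasets, especially when the data is uncertain. SVM must solve an optimization problem to find the optimal hyperplane, which can run into computational-efficiency and convergence problems on uncertain data, whereas the speed and simplicity of ELM make it a good choice for this kind of problem. Applying ELM to uncertain data classification offers an effective solution for this type of data and may suggest new ideas for data science and artificial intelligence; in sensor network monitoring, object identification and moving object search, for example, it can help improve the accuracy and reliability of classification.

The paper also uses Monte Carlo theory, a computational method based on random sampling that can handle the complexity and randomness of uncertain data. In a distributed environment, combining Monte Carlo theory with OS-ELM makes it possible to classify large datasets efficiently and accurately.

Overall, ELM shows considerable potential for uncertain data classification. The study offers a new perspective on real-world classification problems, and ELM and related techniques for uncertain data can be expected to improve further as research continues.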


Extreme learning machine for classification over uncertain data

Yongjiao Sun, Ye Yuan, Guoren Wang

Northeastern University, Shenyang 110004, China

Article info

Article history:
Received 17 June 2013
Received in revised form 23 July 2013
Accepted 26 August 2013
Communicated by G.-B. Huang
Available online 18 October 2013

Keywords:
Extreme learning machine
Uncertain data
OS-ELM
SVM
Single hidden layer feedforward neural networks

Abstract

Conventional classification algorithms assume that the input data is exact or precise. For various reasons, including imprecise measurement, network delay, outdated sources and sampling errors, data uncertainty is common and widespread in real-world applications such as sensor databases, location databases and biometric information systems. Although many approaches to classification exist, few of them address classification over uncertain data in databases. Therefore, in this paper, we propose classification algorithms based on conventional and optimized ELM to conduct classification over uncertain data. First, we view the instances of each uncertain data object as the training data for learning. Then, the probabilities of an uncertain data object belonging to each class are computed according to the learning results of each instance. Finally, using a bound-based approach, we implement the final classification. We also extend the proposed algorithms to classification over uncertain data in a distributed environment based on OS-ELM and Monte Carlo theory. The experiments verify the performance of our proposed algorithms.

1. Introduction

Recently, classification over uncertain data has gained much attention, due to the inherent uncertainty of data in many real-world applications, such as sensor network monitoring [1], object identification [2], moving object search [3–5], and the like [6,7,36]. A number of factors induce the uncertainty, including data collection error, measurement error, data sampling error, obsolete sources, network latency and transmission error. For example, in moving-object databases, limited resources make it impossible for the database server to know the exact positions of all objects at all times. In this setting there are two kinds of uncertainty: measurement error and sampling error. Measurement errors derive from the imprecision of GPS devices, while sampling errors derive from the update frequency of the moving objects. Therefore, it is very important to manage and analyze uncertain data effectively and efficiently.

However, many traditional data classification problems become particularly challenging in the uncertain case, since traditional classification algorithms cannot work on uncertain data. An uncertain data object may have many instances, and traditional classification algorithms treat each instance as a separate data object. An uncertain data object can thus be assigned to many classes, although it actually belongs to only one. Moreover, an uncertain data object may carry a probability density function (pdf) that describes the probability of each instance appearing in the object. An uncertain classification algorithm should take these uncertain semantics into account and efficiently process the computation associated with the pdf.

Traditional classification algorithms cannot deal with such challenges. Therefore, in this paper, based on the extreme learning machine (ELM) [9–17], we propose a new classification algorithm to process uncertain data objects. Specifically, we use the conventional ELM [10] for uncertain data to obtain binary classifications, and the optimized ELM [9] for binary and multiclass classifications over uncertain data. We also extend these algorithms to distributed environments based on OS-ELM [8]. Conventional ELM is a good learning method for classifying data: it offers good generalization performance while improving the learning speed of neural networks, maximizing the separating margin and minimizing the training errors. However, optimized ELM tends to have better scalability and achieve similar (for regression and binary class cases) or much better (for multiclass cases) generalization performance at much faster learning speed than conventional SVM and LS-SVM [18,19]. OS-ELM, which builds on ELM, is an algorithm that can handle data arriving one-by-one or chunk-by-chunk with varying chunk size.
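The chunk-by-chunk behaviour attributed to OS-ELM above can be illustrated with a short sketch. It assumes sigmoid additive hidden nodes and the usual recursive least-squares update of the output weights; the class name `OSELM`, the helper `hidden`, and the small `ridge` term added for numerical stability are illustrative choices, not details taken from the paper.

```python
import numpy as np

def hidden(X, A, b):
    """Sigmoid hidden-layer output matrix H; one row per input sample."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

class OSELM:
    """Illustrative online sequential ELM (assumed form, not the authors' code)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-node parameters are drawn once at random and never tuned.
        self.A = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.n_hidden = n_hidden
        self.P = None     # running inverse of (H^T H + ridge * I)
        self.beta = None  # output weights

    def init_fit(self, X0, T0, ridge=1e-3):
        """Initial batch: solve the regularized least-squares problem directly."""
        H0 = hidden(X0, self.A, self.b)
        self.P = np.linalg.inv(H0.T @ H0 + ridge * np.eye(self.n_hidden))
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        """Update with one new chunk of any size, without revisiting old data."""
        H = hidden(X, self.A, self.b)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return hidden(X, self.A, self.b) @ self.beta

# Toy data: two inputs, targets in {-1, +1}.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
T = (X[:, :1] + X[:, 1:] > 0).astype(float) * 2 - 1

model = OSELM(n_inputs=2, n_hidden=5)
model.init_fit(X[:20], T[:20])
# Chunks of size 1, 14 and 25: the chunk size may vary freely.
for start, stop in [(20, 21), (21, 35), (35, 60)]:
    model.partial_fit(X[start:stop], T[start:stop])
```

With this update rule the sequentially trained weights should match, up to rounding, the weights obtained by solving the same regularized least-squares problem on all 60 samples at once, which is why only the new chunk has to be transferred at each step.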

To implement uncertain classification, we model uncertain data as an object consisting of instances with an arbitrary probability distribution. Based on ELM, we first train on each instance associated with the uncertain data object. Then, the class probabilities of each instance are computed according to the learning results. Finally, we obtain the final classification results by using a probability bound-based approach. To obtain more accurate classification results, we train the huge and non-huge samples using the optimized ELM. In both cases, we can obtain multiclass results at a time, while SVM needs a lot of iterations. In a distributed environment, based on OS-ELM, we train data one-by-one or chunk-by-chunk with varying or fixed chunk length, so that we can transfer data objects with minimal network overheads in the training and testing phases.

Neurocomputing 128 (2014) 500–506
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/neucom
http://dx.doi.org/10.1016/j.neucom.2013.08.011
Corresponding author. Tel.: +86 1390 9838 790.
E-mail address: sunyongjiao@ise.neu.edu.cn (Y. Sun).

Our contributions in this paper are as follows:

• We develop efficient classification algorithms on uncertain data.
• We use the conventional ELM for uncertain data objects to obtain binary classifications.
• We use the optimized ELM for uncertain data objects to obtain binary and multiclass classifications.
• We also adapt these algorithms to distributed environments based on OS-ELM and a sampling method of Monte Carlo.
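The last contribution mentions a Monte Carlo sampling method without giving details in this part of the paper. One plausible reading, sketched here purely as an assumption, is to draw instances of an uncertain object according to their appearance probabilities and estimate the object's class probabilities from the labels of the drawn instances. The function name `monte_carlo_class_probs`, the toy linear `classify` rule and the sample count are all hypothetical.

```python
import numpy as np

def monte_carlo_class_probs(instances, probs, classify, n_classes,
                            n_samples=2000, seed=0):
    """Estimate class probabilities of one uncertain object by sampling.

    instances: array of shape (n_instances, n_dims)
    probs:     appearance probabilities, summing to 1
    classify:  any function mapping one instance to a class index
    """
    rng = np.random.default_rng(seed)
    # Draw instance indices in proportion to their appearance probabilities.
    idx = rng.choice(len(instances), size=n_samples, p=probs)
    labels = np.array([classify(instances[i]) for i in idx])
    # Fraction of draws that landed in each class.
    return np.bincount(labels, minlength=n_classes) / n_samples

# Hypothetical object with three instances and a toy linear classifier.
instances = np.array([[0.2, 0.1], [1.5, 1.0], [2.0, 0.5]])
probs = np.array([0.1, 0.5, 0.4])
classify = lambda x: int(x[0] + x[1] > 1.0)

estimate = monte_carlo_class_probs(instances, probs, classify, n_classes=2)
```

In this toy case the exact class probabilities are 0.1 and 0.9, so the estimate can be checked directly; sampling becomes useful when an object has too many instances, or a continuous pdf, for exact summation to be practical.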

The remainder of the paper is organized as follows. The related works are presented in Section 2. We formally define uncertain classification and give a review of ELM in Section 3. We propose uncertain classification algorithms based on conventional and optimized ELMs in Section 5. We adapt the uncertain classification algorithms to distributed scenarios in Section 6. We discuss the results of performance tests on real datasets in Section 7 and the conclusions of our work in Section 8.

2. Related works

There are some works on centralized classification. For centralized massive data, base-level classifiers are generated by applying different learning algorithms with heterogeneous models [23,24], or by applying a single learning algorithm to different versions of the given data. Lin et al. [25] theoretically analyzed the rationale behind plurality voting, and Demrekler et al. [26] investigated how to select an optimal set of classifiers. While [27,28] study the classification of uncertain data using the support vector model, [29] performs classification using decision trees. Artificial neural networks have been used in model-based clustering, with a probability obtained from the expectation-maximization algorithm for classification-likelihood learning [30].

There are some classification approaches for distributed scenarios. Collaborative and ensemble approaches [31,32] are the main P2P classification approaches. The collaborative approach generates a single model for the classification, while the ensemble approach combines multiple models (classifiers) for predictions. This is a much more efficient approach which propagates only the statistics of the peers' local data, with the decision tree of each peer converging to the global solution over time. Luo et al. [33] proposed building local classifiers using Ivotes [34] and performed prediction using a communication-optimal distributed voting protocol that requires the propagation of unseen data to most peers.

3. Problem definition and preliminaries

3.1. Problem definition

Uncertainty data model: Consider a set of uncertain data objects $U = \{U_1, \ldots, U_n\}$. Each uncertain object $U_i$ consists of a set of instances $u_{i,1}, \ldots, u_{i,|U_i|}$. Each instance $u_{i,j}$ is associated with a probability $p_{i,j}$, called the appearance probability, with the constraint that $\sum_{j=1}^{|U_i|} p_{i,j} = 1$. Without loss of generality, we assume that each object is independent of other objects.

Note that each instance in an uncertain object is a multidimensional vector. Thus each instance can be viewed as a multidimensional training sample for ELM.
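The data model above can be sketched as a small data structure. The sketch assumes NumPy arrays for instances; the class name `UncertainObject`, the helper `training_set` and the two-dimensional coordinates are hypothetical (the coordinates of Fig. 1 are not given in the text), while the appearance probabilities follow Table 1.

```python
from dataclasses import dataclass
import math
import numpy as np

@dataclass
class UncertainObject:
    name: str
    instances: np.ndarray  # shape (n_instances, n_dims)
    probs: np.ndarray      # appearance probabilities, one per instance

    def __post_init__(self):
        self.instances = np.atleast_2d(np.asarray(self.instances, dtype=float))
        self.probs = np.asarray(self.probs, dtype=float)
        if len(self.instances) != len(self.probs):
            raise ValueError("one appearance probability is needed per instance")
        # Enforce the constraint that appearance probabilities sum to 1.
        if np.any(self.probs < 0) or not math.isclose(self.probs.sum(), 1.0, abs_tol=1e-9):
            raise ValueError("appearance probabilities must be non-negative and sum to 1")

def training_set(objects):
    """Flatten all instances into one matrix, remembering owner and probability."""
    X = np.vstack([o.instances for o in objects])
    owner = np.concatenate([np.full(len(o.probs), k) for k, o in enumerate(objects)])
    weight = np.concatenate([o.probs for o in objects])
    return X, owner, weight

# Probabilities as in Table 1; coordinates are made up for illustration.
objects = [
    UncertainObject("U1", [[0.5, 1.0], [1.0, 2.5]], [0.2, 0.8]),
    UncertainObject("U2", [[2.0, 0.5], [2.5, 1.5]], [0.3, 0.7]),
    UncertainObject("U3", [[3.0, 2.0], [3.5, 3.0], [4.0, 2.5]], [0.1, 0.5, 0.4]),
    UncertainObject("U4", [[1.5, 3.5]], [1.0]),
]
X, owner, weight = training_set(objects)
```

Keeping `owner` and `weight` alongside the flattened matrix lets a classifier treat every instance as an ordinary training sample while still allowing the per-instance results to be aggregated back to the object they came from.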

For example, Fig. 1 shows uncertain objects $U_1$, $U_2$ and $U_3$ whose instances and corresponding appearance probabilities are given in Table 1.

In this example, we assume that the classification of uncertain data objects is a simple linear classification. In Fig. 1, the shaded area and the white area represent two classes, and each instance is classified into the corresponding category. However, many instances of the same uncertain data object are classified into different categories. The problem of uncertain classification should therefore consider the probability distribution of all instances of an object. Uncertain data classification is defined as follows.

Definition 1. Let $\Omega$ be the set of instances in all uncertain objects. Given the number of classes $m$, ELM can classify all instances in $\Omega$ into $m$ categories. Then an uncertain object $U$ belongs to a class $C_i$ ($1 \le i \le m$) if the summarized probability of instances in $C_i$ is the largest.
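Definition 1 can be read as a weighted vote: sum the appearance probabilities of an object's instances per predicted class and pick the class with the largest total. The sketch below implements that direct sum only; it does not reproduce the bound-based approach mentioned in the abstract, and the function name and the instance labels are hypothetical.

```python
import numpy as np

def classify_uncertain_object(instance_labels, probs, n_classes):
    """Return the class with the largest summed appearance probability.

    instance_labels: class index predicted for each instance (0 .. n_classes-1)
    probs:           appearance probability of each instance
    """
    labels = np.asarray(instance_labels, dtype=int)
    # class_mass[c] = sum of appearance probabilities of instances labelled c
    class_mass = np.bincount(labels, weights=probs, minlength=n_classes)
    return int(np.argmax(class_mass)), class_mass

# An object with the probabilities of U_3 in Table 1; the labels are assumed.
label, mass = classify_uncertain_object([0, 0, 1], [0.1, 0.5, 0.4], n_classes=2)
```

Here the first two instances fall in class 0 with total probability 0.6, against 0.4 for class 1, so the object is assigned to class 0 even though one of its instances lies in the other class.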

4. Preliminaries

This subsection briefly gives an overview of ELM.

ELM was originally proposed for single-hidden-layer feedforward neural networks and then extended to generalized single-hidden-layer feedforward networks, where the hidden layer need not be neuron-like [5,6,37]. In ELM, all the hidden node parameters are randomly generated without tuning. The output function of ELM for generalized SLFNs is

$$f_L(x) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x) = \beta \cdot h(x) \qquad (1)$$

where $\beta = [\beta_1, \ldots, \beta_L]^T$ is the vector of the output weights between the hidden layer of $L$ nodes and the output node. $h(x) = [G(a_1, b_1, x), \ldots, G(a_L, b_L, x)]$ is the output (row) vector of

Fig. 1. An example of uncertain data model.

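Table 1
Example of uncertain data model.

| Uncertain object | Instances | Probability |
| --- | --- | --- |
| $U_1$ | $u_{1,1}$, $u_{1,2}$ | $p_{1,1} = 0.2$, $p_{1,2} = 0.8$ |
| $U_2$ | $u_{2,1}$, $u_{2,2}$ | $p_{2,1} = 0.3$, $p_{2,2} = 0.7$ |
| $U_3$ | $u_{3,1}$, $u_{3,2}$, $u_{3,3}$ | $p_{3,1} = 0.1$, $p_{3,2} = 0.5$, $p_{3,3} = 0.4$ |
| $U_4$ | $u_{4,1}$ | $p_{4,1} = 1$ |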
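The output function in Eq. (1) can be illustrated with a minimal batch ELM. The sketch assumes sigmoid additive nodes for $G(a_i, b_i, x)$ and obtains the output weights with the Moore-Penrose pseudoinverse; the function names, the uniform range of the random parameters and the toy data are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, seed=0):
    """Train a single-output ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    # Hidden-node parameters (a_i, b_i) are generated randomly and never tuned.
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ A + b)          # row k is h(x_k)
    beta = np.linalg.pinv(H) @ T    # minimum-norm least-squares solution
    return A, b, beta

def elm_predict(X, A, b, beta):
    """Evaluate h(x) . beta for every row of X."""
    return sigmoid(X @ A + b) @ beta

# Toy binary problem: two well-separated clusters with targets -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(2.0, 0.5, size=(50, 2))])
T = np.concatenate([-np.ones(50), np.ones(50)])

A, b, beta = elm_train(X, T, n_hidden=20)
pred = np.sign(elm_predict(X, A, b, beta))
accuracy = float(np.mean(pred == T))
```

Because the hidden layer is fixed after random initialization, training reduces to one linear least-squares solve, which is the source of the fast learning speed described in the text.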

