用于聚类和分类的多种学习框架资源-CSDN文库

68 浏览量 2021-03-28 10:55:37 上传评论收藏 768KB PDF 举报

标题中的“用于聚类和分类的多种学习框架”指的是在数据挖掘和机器学习领域中，研究者们探索的能够同时处理聚类（clustering）和分类（classification）问题的多种算法和理论框架。聚类是无监督学习的一种，旨在根据数据的内在相似性将数据分为不同的组或簇；而分类则是有监督学习，需要已知的标签信息来指导模型对新数据进行预测。描述中提到的“A manifold learning framework for both clustering and classification”是将流形学习（Manifold Learning）的概念应用于聚类和分类任务的一种方法。流形学习是一种非线性降维技术，它假设高维数据实际上是由低维流形结构决定的，通过学习这种结构，可以更好地理解和分析数据。关键词包括“Pattern recognition（模式识别）”，“Clustering learning（聚类学习）”，“Classiﬁcation learning（分类学习）”，“Bayesian theory（贝叶斯理论）”和“Manifold Learning（流形学习）”。这些关键词表明本文关注的是如何利用贝叶斯理论结合流形学习，提高模式识别、聚类和分类的性能。论文描述了一个两步的学习框架：第一步，使用“ranking on manifolds”的方法进行聚类，探索数据中的结构；第二步，应用贝叶斯规则计算类后验概率。这个框架的关键在于，它建立了流形结构与类别之间的统计关系，从而将聚类学习与分类学习联系起来。这种方法的优点包括： 1. 自动确定聚类参数，无需人工设定； 2. 利用贝叶斯规则建模数据的类后验概率，进行分类学习； 3. 提供了流形结构与给定类别之间的统计联系。实验结果在两个合成数据集和16个真实世界的数据集上取得了令人鼓舞的效果，这表明该框架在实际应用中具有良好的潜力。这篇研究论文提出了一种创新的流形学习框架，将聚类和分类任务结合起来，利用贝叶斯理论解析数据中的流形结构和类别信息之间的关系，为数据挖掘提供了更高效、灵活的方法。这一框架不仅解决了传统聚类和分类算法的一些局限，还展示了其在实际数据集上的有效性。

资源推荐

资源详情

资源评论

A manifold learning framework for both clustering and classiﬁcation

Weiling Cai

⇑

Department of Computer Science & Technology, Nanjing Normal University, Nanjing 210097, PR China

article info

Article history:

Received 12 February 2015

Received in revised form 4 September 2015

Accepted 8 September 2015

Available online 14 September 2015

Keywords:

Pattern recognition

Clustering learning

Classiﬁcation learning

Bayesian theory

Manifold Learning

abstract

In recent years, a great deal of manifold clustering algorithms was presented to identify the subsets of the

manifolds data. Meanwhile, numerous classiﬁcation algorithms were also developed to classiﬁed data

shaped in the form of manifold. However, nearly none of them pay attention to the statistical relationship

between the manifold structures and class labels, thus failing to discover the knowledge concealed in

data. In this paper, a manifold learning framework for both clustering and classiﬁcation is presented,

which involves two steps. In the ﬁrst step, the clustering through ranking on manifolds is executed to

explore structures in data; in the second step, the class posterior probability is calculated by using the

Bayesian rule. The core of this framework lies in employing the Bayesian theory to establish the relation-

ship between manifolds and classes thus creates a bridge between clustering learning and classiﬁcation

learning. Our new manifold learning framework is interesting from a number of perspectives: (1) our

algorithm can perform manifold clustering learning which can auto-determine the clustering parameters

without manual determining; (2) our algorithm can perform manifold classiﬁcation learning which

models the posterior probabilities pð

Þ by using the Bayesian rule; (3) our algorithm can provide

the statistical relationship between the manifold structure and the given classes. Encouraging experi-

mental results are obtained on 2 artiﬁcial and 16 real-life benchmark datasets.

1. Introduction

Data mining [1–3] is the discovery of interesting relationships

and characteristics that may exist implicitly in data. Clustering

and classiﬁcation are two primary data-mining techniques [4,5].

The clustering approaches such as K-Means [6], Fuzzy C-Means

(FCM) [7] and Gaussian Mixture Model [8] are widely utilized to

discover the hidden structure in data. Whereas the classiﬁcation

approaches such as Multi-layer Perceptron (MLP) [9] and Support

Vector Machines (SVM) [10,11] are successfully applied to deter-

mine the class labels of unseen samples. To fuse the advantages

of clustering and classiﬁcation together, numerous researchers

studied on how to design a single approach for both clustering

and classiﬁcation. To bridge clustering and classiﬁcation, Setnes

and Babus

ka [12] proposed Fuzzy Relational Classiﬁer (FRC) which

attempted to utilize the fuzzy composite operators to construct the

relationship between the cluster structures and classes. To

enhance the robustness of FRC, in one of our previous works, we

developed Robust Fuzzy Relational Classiﬁer (RFRC) [13] by replac-

ing FCM and hard class labels with Kernelized FCM (KFCM) [14,15]

and soft labels, respectively. Another famous classiﬁer is Radial

Basis Function neural networks (RBFNN) [16,17] which extracts

signiﬁcant information from the observed data to construct its hid-

den layer.

However, all above algorithms are relatively suitable for the

data shaped in the form of point clouds (group), but unsuitable

for those data in the form of manifold structure. In real-life world,

there are quite a number of data that form paths through a high-

dimensional and expose manifold structure. For instance, motion

segmentation problem in computer vision, the point correspon-

dences in a dynamic scene can generally be represented as mani-

fold; in classiﬁcation of face images, the faces of person lie on

the manifold. For these data exhibiting manifold structure rather

than compact shape, a considerable number of clustering algo-

rithms such as Spectral Clustering [18,19] have been presented

to identify the subsets of the manifolds data. Numerous research

studies proved that incorporating the structure information into

a classiﬁer can enhance its generalization ability, and this research

ﬁnding is consistent with the famous No Free Lunch (NFL) theorem

[20]. In the last decade, a number of manifold or subspace classiﬁ-

cation algorithms such as Plane-Gaussian Function Networks

(PGFN) [21], Laplacian Regularized Least Square Classiﬁcation

(LapRLSC) [22,23] and Laplacian SVM (LapSVM) [24] were pre-

sented. These algorithms only attempt to integrate the manifold

or subspace distribution information into the classiﬁcation model.

http://dx.doi.org/10.1016/j.knosys.2015.09.010

⇑

Tel./fax: +86 25 84441669.

E-mail address: caiwl@nuaa.edu.cn

Knowledge-Based Systems 89 (2015) 641–653

Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

However, nearly none of them pay attention to the underlying rela-

tionship between the manifold distribution and given classes, thus

unable to discover the knowledge concealed in data. As a result, an

open and challenging problem is to design a framework for mani-

fold data with the goal of combining the advantages of clustering

and classiﬁcation and meanwhile revealing the statistical relation-

ship between manifolds and classes.

In this paper, we propose a manifold learning framework for

both clustering and classiﬁcation (MCC). MCC aims to discover

the manifold structure hidden in data, design an effective and

transparent classiﬁcation mechanism and meanwhile exploit the

relationship between manifolds and classes. To achieve these

goals, our framework treats the manifold clustering learning and

classiﬁcation learning in a two-step sequential manner. In the ﬁrst

step, the clustering through ranking on manifolds is performed to

explore structures in data; in the second step, by using the Baye-

sian rule, the class posterior probability is calculated to give class

labels for unseen samples. It is worth mentioning that the number

of manifolds (i.e. clusters) has a signiﬁcant inﬂuence on the result

of manifold clustering [25–27]. To auto-determine this parameter

in our algorithm, the inter-cluster mean distance by ranking on

manifolds is maximized and while the intra-cluster mean distance

is minimized. As a result, our algorithm can auto-determine the

clustering parameters without manual determining. Another key

of this framework is to connect the multi-manifold with the given

classes employ, and then establish a relationship between them.

This relationship creates a bridge between clustering learning

and classiﬁcation learning. Based on such relationship, our frame-

work cannot only group multi-manifold into different clusters, but

also make classiﬁcation decisions for unseen samples. More impor-

tantly, this relationship can successfully reﬂect the probability and

statistics meaning between manifold structures and given classes,

so that we gain some meaningful insights to make MCC prone to be

transparent.

The new manifold learning framework for both clustering and

classiﬁcation is interesting from a number of perspectives:

(1) Our algorithm can perform manifold clustering learning

which can auto-determine the clustering parameters with-

out manual determining.

(2) Our algorithm can perform manifold classiﬁcation learning

which models the posterior probabilities pð

Þ by using

the Bayesian rule.

(3) Our algorithm can provide the statistical relationship

between the manifold structure and the given classes.

The experimental results on both synthetic and real-life data-

sets all demonstrate the effectiveness and potential of MCC.

The rest of this paper is organized as follows: Section 2 reviews

the related works. Section 3 describes the proposed manifold

learning framework for both clustering and classiﬁcation. Prelimi-

nary experimental results are shown in Section 4. Finally, we give

concluding remarks and future work in Section 5.

2. Related works

There have been several recent related works to inherit the

merits of both clustering and classiﬁcation learning. We review

the main works as follows.

2.1. Fuzzy relational classiﬁer

Fuzzy Relational Classiﬁer (FRC) [12] was proposed to provide a

transparent alternative to the black-box techniques such as neural

networks. As show in Fig. 1, in FRC, FCM is ﬁrstly adopted as the

clustering criterion to discover the natural structure in data, and

its objective function is as follows:

FCM

ðU; VÞ¼

j¼1

i¼1



; ð1Þ

where fx

; x

; ...; x

g and f

;

; ...;

g are the training samples

and cluster centers, respectively; and u

is the fuzzy memberships

of x

. By deﬁnition, each sample x

satisﬁes the constraint

j¼1

¼ 1. And then, a relation matrix R is computed for the

obtained fuzzy partition and the given hard class labels. In FRC,

FCM is unable to group the datasets consisting of the non-

spherical clusters, so that the interpretation of the clustering or

classiﬁcation results may be biased.

Afterwards, we have presented Robust FRC (RFRC) [13] to

improve both clustering and classiﬁcation performance of FRC in

our previous work. Speciﬁcally, in the clustering phase, the robust

Kernelized FCM (KFCM) [14,15] is adopted to replace FCM which

can be described as below:

KFCM

ðU; VÞ¼

j¼1

i¼1

/ðx

Þ/ð



; ð2Þ

where / is an implicit nonlinear map from the input space to a

rather high dimensional feature space. Compared to FCM, KFCM

based on RBF kernel is a robust estimator according to

M-estimator and is more ﬂexible for clustering non-spherical data.

Next, in the classiﬁcation phase, the soft class label motivated by

the fuzzy k-nearest-neighbor [28] is employed to replace the hard

class label. With the incorporation of both KFCM and the soft class

labels, RFRC makes the constructed relation matrix R more really

reﬂect the relationship between the classes and clusters, and thus

signiﬁcantly boosts the performance of FRC.

It is worth to point out that in FRC and RFRC, the entries in the

relation matrix R lack the statistical meaning, thus it is difﬁcult to

judge whether the obtained relationship is really reliable.

2.2. Radial basis function neural networks

Radial Basis Function neural networks (RBFNN) [16,17],as

shown in Fig. 2, is a feed-forward multi-layer network. It usually

consists of three layers: input layer, hidden layer and output layer.

Each basis function

corresponds to a hidden unit and w

repre-

sents the weight from the kth basis function or hidden unit to the

lth output units.

In the training phase of RBFNN, the basis function

for each

hidden node can be determined by

RBF

ðÞ¼exp

 x 

; ð3Þ

Training Data

Exploratory Data

Analysis

Logical

Interpretation

Features

Unsupervised

Fuzzy Clustering

Cluster Means

Fuzzy

Partition

Class Labels

-composition,

aggregation

Fuzzy Relation

Fig. 1. Training process of FRC and RFRC.

642 W. Cai / Knowledge-Based Systems 89 (2015) 641–653

剩余12页未读，继续阅读

评论收藏

内容反馈

weixin_38691703

粉丝: 2
资源: 961

用于聚类和分类的多种学习框架

Python-用Keras实现的多种深度学习文本分类模型

基于多种机器学习算法的分类预测研究

以imdb影评数据集为例，实践多种传统机器学习和神经网络用于文本分类

tweetcluster:Python包，用于通过多种方法对小型文档进行聚类

国外开源聚类框架

基于Spark框架和ASPSO的并行划分聚类算法.docx

多种机器学习算法在文本分类上的应用-分析与比较(附源码与数据集)

行业分类-设备装置-用于生产一种或多种金合金的中间合金组合物.zip

行业分类-设备装置-用于从多种媒体检索数据的设备以及其方法.zip

模糊聚类分析及其应用 电子书

基于动态邻域的三支聚类分析

利用C#语言开发K-Means聚类算法

基于storm海量数据的实时聚类

Velodyne Lidar点云聚类算法

generator-pkg:用于多种 NPM 模块类型和测试框架的 Node.JS 生成器

Exchange:用于创建，共享和重用学习资源的数字平台–来自多种资源的资产集合

多种深度特征学习，用于监控视频中的对象检索

XLearning是一款支持多种机器学习、深度学习框架调度系统

基于多种植被指数时间序列与机器学习的作物遥感分类研究.pdf

谱系聚类在综合国力分析中的应用分享.pdf

DMSC(深度多模态子空间聚类基于Pytorch的实现)，融合方式为亲和融合，空间融合很容易也可以实现。

cpp-FAISS是FacebookAI研究团队开源的针对聚类和相似性搜索库

蚁群聚类算法极其改进

最新资源

模糊聚类分析及其应用电子书