Since Zadeh [8] proposed fuzzy sets, which introduced the idea of partial memberships described by membership functions, fuzzy sets have been successfully applied in clustering. Fuzzy clustering has been widely studied and applied in a variety of substantive areas for more than 45 years [9–12], since Ruspini [13] first proposed fuzzy c-partitions as a fuzzy approach to clustering in the 1970s. In fuzzy clustering, the fuzzy c-means (FCM) clustering algorithm, proposed by Dunn [14] and Bezdek [9], is the most well-known and widely used method.
There are many extensions and variants of FCM proposed in the
literature. The first important extension to FCM was proposed by
Gustafson and Kessel (GK) [15] in which the Euclidean distance
in the FCM objective function was replaced by the Mahalanobis
distance. Afterwards, many extensions to FCM were proposed, such as extensions to maximum-entropy clustering (MEC) by Karayiannis [16], Miyamoto and Umayahara [17], and Wei and Fahn [18], extensions to $L_p$ norms by Hathaway et al. [19], the extension of FCM to alpha-cut implemented fuzzy clustering algorithms by Yang et al. [20], the extension of FCM for treating very large data by Havens et al. [21], an augmented FCM for clustering spatiotemporal data by Izakian et al. [22], and so forth. However, these fuzzy clustering algorithms always need the number of clusters to be given a priori. In general, the cluster number $c$ is unknown. In this case, validity indices, which are supposed to be independent of clustering algorithms, can be used to find the cluster number $c$. Many cluster validity indices for fuzzy clustering algorithms have been proposed in the literature, such as the partition coefficient (PC) [23], partition entropy (PE) [24], normalizations of PC and PE [25–26], fuzzy hypervolume (FHV) [27], and XB (Xie and Beni [28]).
Frigui and Krishnapuram [29] proposed the robust competitive agglomeration (RCA) algorithm by adding a loss function of clusters and a weight function of data points to clusters. The RCA algorithm can be used to determine the number of clusters: starting with a large cluster number, RCA reduces the number by discarding clusters with small cardinality. Some initial parameter values are needed in RCA, such as the time constant, the discarding threshold, the tuning factor, etc. Another clustering algorithm, called C-FS, was presented by Rodriguez and Laio [30] for clustering by fast search, using a similarity matrix to find density peaks. They proposed the C-FS algorithm by assigning a cutoff distance $d_c$ and selecting a decision window so that it can automatically determine the number of clusters. In [30], the cutoff distance $d_c$ becomes another parameter on which clustering results are heavily dependent. Recently, Fazendeiro and Oliveira [31] presented a fuzzy clustering algorithm with an unknown number of clusters based on an observer position, called the focal point. With this point, the observer can select a suitable point while searching for clusters, one that is actually appropriate to the underlying data structure. After the focal point is chosen, the initialization of cluster centers must be generated randomly. The inverse of the XB index is used to compute the validity measure, and its maximal value is chosen to get the best number of clusters. Although these algorithms can find the number of clusters during their iteration procedures, they are still dependent on initializations and parameter selections.
Up to now, there has been no work in the literature that makes FCM simultaneously robust to initializations and parameter selection, free of the fuzziness index, and independent of a given number of clusters. We think this may be due to the difficulty of constructing such a robust FCM. In this paper, we try to construct a robust learning-based framework for fuzzy clustering, especially for the FCM algorithm. This framework can automatically find the best number of clusters without any initialization and parameter selection, and it is also free of the fuzziness index $m$. We first consider some entropy-type penalty terms for adjusting the bias, and then create a robust-learning mechanism for finding the best number of clusters. The organization of this paper is as follows. In Section 2, we construct a robust learning-based framework for fuzzy clustering. The robust-learning FCM (RL-FCM) clustering algorithm is also presented in this section. In Section 3, several experimental examples and comparisons with numerical and real data sets are provided to demonstrate the effectiveness of the proposed RL-FCM, which can automatically find the best number of clusters. Finally, conclusions are stated in Section 4.
2. Robust-learning fuzzy c-means clustering algorithm
Let $X = \{x_1, \ldots, x_n\}$ be a data set in a $d$-dimensional Euclidean space $\mathbb{R}^d$, and let $V = \{v_1, \ldots, v_c\}$ be the $c$ cluster centers, with the squared Euclidean distance given by $d_{ik}^2 = \|x_i - v_k\|^2 = \sum_{j=1}^{d} (x_{ij} - v_{kj})^2$. The fuzzy c-means (FCM) objective function [9–10] is $J_m(U, V) = \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ik}^{m} d_{ik}^{2}$, where $m > 1$ is the fuzziness index, $U = [\mu_{ik}]_{n \times c} \in M_{fcn}$ is a fuzzy partition matrix with $M_{fcn} = \{ U = [\mu_{ik}]_{n \times c} \mid \forall i, \forall k,\ 0 \le \mu_{ik} \le 1,\ \sum_{k=1}^{c} \mu_{ik} = 1,\ 0 < \sum_{i=1}^{n} \mu_{ik} < n \}$, and $d_{ik} = \|x_i - v_k\|$ is the Euclidean distance. The FCM algorithm iterates through the necessary conditions for minimizing $J_m(U, V)$, with the updating equations for cluster centers and memberships given by $v_k = \sum_{i=1}^{n} \mu_{ik}^{m} x_i \big/ \sum_{i=1}^{n} \mu_{ik}^{m}$ and $\mu_{ik} = d_{ik}^{-2/(m-1)} \big/ \sum_{t=1}^{c} d_{it}^{-2/(m-1)}$.
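As a concrete illustration of these two alternating updates, a minimal sketch in Python/NumPy is given below; the function name fcm, the tolerance tol, and the random fuzzy initialization are our own illustrative choices, not part of the original algorithm specification.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Minimal FCM sketch: alternate the center and membership updates above."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random fuzzy partition satisfying sum_k mu_ik = 1 for each i
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # v_k = sum_i mu_ik^m x_i / sum_i mu_ik^m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distances d_ik^2 = ||x_i - v_k||^2
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # guard against division by zero
        # mu_ik = d_ik^{-2/(m-1)} / sum_t d_it^{-2/(m-1)}
        U_new = d2 ** (-1.0 / (m - 1.0))
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, V
```

Note that the sketch makes the paper's point explicit: the cluster number c, the fuzziness index m, and the initial partition must all be supplied before the iteration can start.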
We know that the FCM algorithm is dependent on initial values, and some parameters need to be given a priori, such as the fuzziness index $m$, the cluster center initialization, and also the number of clusters. Although there exist some works in the literature that solve some of these problems in FCM, such as Dembélé and Kastner [32] and Schwämmle and Jensen [33] on estimating the fuzziness index $m$ for clustering microarray data, there is no work that makes FCM simultaneously robust to initializations and parameter selection, free of the fuzziness index $m$, and independent of a given number of clusters. Next, we construct a robust learning-based scheme for FCM to solve these problems simultaneously. Our basic idea is as follows: we first consider all data points as initial cluster centers, i.e., the number of data points is the initial number of clusters. After that, we use the mixing proportion $\alpha_k$ of cluster $k$, which acts like a cluster weight, and discard those clusters whose $\alpha_k$ is less than one over the number of data points. The proposed algorithm can iteratively obtain the best number of clusters until it converges.
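To make the discarding rule concrete, a minimal sketch is given below; here alpha, V, and U denote the current mixing proportions, centers, and memberships, and the renormalization after discarding is our assumption for illustration.

```python
import numpy as np

def discard_small_clusters(alpha, V, U):
    """Keep only clusters whose mixing proportion reaches the 1/n floor."""
    n = U.shape[0]
    keep = alpha >= 1.0 / n                    # discard clusters with alpha_k < 1/n
    alpha = alpha[keep] / alpha[keep].sum()    # renormalize so sum_k alpha_k = 1 (assumption)
    U = U[:, keep]
    U /= U.sum(axis=1, keepdims=True)          # restore sum_k mu_ik = 1 (assumption)
    return alpha, V[keep], U
```

Starting from $c = n$ clusters, repeating this step inside the iteration shrinks the cluster set until only clusters with sufficient weight survive.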
For a data set $X = \{x_1, \ldots, x_n\}$ in $\mathbb{R}^d$ with $c$ cluster centers, to have FCM be simultaneously robust to initializations and parameter selection, free of the fuzziness index $m$, and able to automatically find the best number of clusters, we add several entropy terms to the FCM objective function. First, to construct an algorithm free of the fuzziness index $m$, we replace $m$ by adding an extra term that is a function of $\mu_{ik}$. In this sense, we consider the concept of MEC [16–18] by adding the entropy term of memberships, $\sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ik} \ln \mu_{ik}$. Moreover, we use a learning function $r$, i.e. $r \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ik} \ln \mu_{ik}$, to learn the effects of the entropy term for adjusting the bias. We next use the mixing proportions $\alpha = (\alpha_1, \cdots, \alpha_c)$ of the clusters, where $\alpha_k$ represents the probability of a data point belonging to the $k$th cluster, with the constraint $\sum_{k=1}^{c} \alpha_k = 1$. Hence, $-\ln \alpha_k$ is the information in the occurrence of a data point belonging to the $k$th cluster. Thus, we add the entropy term $\sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ik} \ln \alpha_k$ to summarize the average information for the occurrence of a data point belonging to the corresponding cluster over the fuzzy memberships. Furthermore, we borrow the idea of Yang et al. [34] in the EM algorithm by using the entropy term $\sum_{k=1}^{c} \alpha_k \ln \alpha_k$ to represent the average information for the occurrence of each data point belonging to the corresponding cluster. In total, the entropy terms of the mixing proportions in probability and of the average occurrence in probability over fuzzy memberships are used for learning to find the best number of clusters.
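Assembling the terms described above, the augmented objective takes, schematically, a form like the following; the coefficients $r_1, r_2, r_3$ are placeholder learning rates in our notation, and the exact weighting, signs, and update rules are derived in the remainder of this section.

```latex
% Schematic assembly of the augmented FCM objective described above.
% r_1, r_2, r_3 are placeholder learning coefficients (our notation, not the paper's).
\[
J(U,\alpha,V) \;=\; \sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}\, d_{ik}^{2}
\;+\; r_1 \sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}\ln\mu_{ik}
\;-\; r_2 \sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}\ln\alpha_{k}
\;-\; r_3\, n \sum_{k=1}^{c}\alpha_{k}\ln\alpha_{k}
\]
```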