scale data set. Thus, $e^{-scatter}$ will tend to zero and often underflow in the implementation of the algorithm.
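A minimal NumPy sketch of this numerical issue (the value of `scatter` here is an arbitrary illustration, not taken from the paper):

```python
import numpy as np

# In IEEE double precision, exp(-x) underflows to 0.0 once x exceeds
# roughly 745. On a large-scale data set the within-cluster scatter
# easily passes this threshold, so the exponential weight vanishes.
scatter = 1000.0
print(np.exp(-scatter))  # prints 0.0
```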
In this paper, we propose a k-means type clustering framework that weights features in a new fashion with an $l_2$-norm regularization. Based on this method of weighting features, a new clustering algorithm, named $l_2$-Wkmeans, is proposed by extending the W-k-means algorithm [1,8] for clustering numerical data sets, and two new algorithms, named $l_2$-NOF and $l_2$-NDM, are proposed by extending MKM_NOF and MKM_NDM [14], respectively, for clustering categorical data sets. By combining the $l_2$-norm regularization with a nonnegativity constraint on the feature weights, our proposed methods can effectively select discriminative features and reduce the effects of noisy features in the clustering process. In addition, unlike MKM_NOF and MKM_NDM, which have no feature selection capability, and W-k-means, which calculates a single weight for every feature over the whole data set, our proposed methods calculate a weight for every feature in each cluster, which makes them more robust to noise than the original methods. We then derive the iterative updating rules of the three algorithms by minimizing the corresponding objective functions. Extensive experiments on both numerical and categorical data sets corroborate that our proposed methods improve clustering results through more effective feature selection with an $l_2$-norm regularization.
The main contributions of this work are twofold:
• We propose a k-means type clustering framework that weights features in a new fashion with an $l_2$-norm regularization. On the basis of this framework, we propose three clustering algorithms: the $l_2$-Wkmeans algorithm for clustering numerical data sets, and the $l_2$-NOF and $l_2$-NDM algorithms for clustering categorical data sets.
• We develop an objective function for the weighting k-means type clustering framework and give a complete proof of the convergence of our proposed algorithms by optimizing the corresponding objective functions.
The rest of the paper is organized as follows: a brief review of k-means type clustering algorithms on numerical and categorical data sets is presented in Section 2. Section 3 introduces the details of our proposed algorithms. Experiments on both numerical and categorical data sets are presented in Section 4. Finally, we discuss the features of our proposed algorithms in Section 5 and conclude the paper in Section 6.
2. Related work
This section gives a brief survey of previous work on k-means type clustering algorithms for numerical and categorical data sets.
2.1. k-means type clustering on numerical data sets
The last decades have witnessed the rapid development of k-means type clustering algorithms, which range from no weighting k-means type algorithms to various weighting k-means algorithms.
2.1.1. No weighting k-means type clustering algorithms
The basic k-means algorithm aims at finding a partition such that the sum of the squared distances between the empirical means of the clusters and the objects in the clusters is minimized. The basic k-means algorithm has been improved in different aspects to overcome its weaknesses. Since k-means type algorithms are sensitive to the initial centroids, Arthur and Vassilvitskii proposed the k-means++ algorithm [15], which chooses each new initial centroid with probability proportional to its squared distance from the centroids already chosen. Another weakness of k-means type algorithms is the need to manually tune the parameter k (the number of clusters). Pelleg and Moore proposed X-means [16] to automatically seek the number of clusters by optimizing a criterion such as the Akaike Information Criterion or the Bayesian Information Criterion. The basic k-means algorithm and most of its variations usually get stuck at local optima. Therefore, Bagirov proposed a global k-means algorithm [17,18], which dynamically adds one cluster center at a time and uses each data point as a candidate for the k-th cluster center. Since the global k-means algorithm mentioned above incurs a large computational cost, Bai et al. [19] proposed an acceleration mechanism for the production of new cluster centroids by applying local geometrical information to approximately describe the set of objects.
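To make the seeding rule of k-means++ concrete, here is a minimal NumPy sketch (our own simplified illustration, not the implementation from [15]):

```python
import numpy as np

def kmeans_pp_seeds(X, k, rng=None):
    """Choose k initial centroids with the k-means++ rule: each new
    centroid is sampled with probability proportional to its squared
    distance from the nearest centroid chosen so far."""
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]  # first centroid: uniform at random
    while len(centroids) < k:
        # squared distance of every point to its nearest chosen centroid
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])  # D^2 sampling
    return np.array(centroids)
```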
Moreover, in order to meet the demands of clustering data sets in different applications, several variations [20–25] of the k-means algorithm have been proposed. Bradley et al. [20] presented a fast, scalable, single-pass version of k-means that is able to solve the clustering problem for data streams. Dhillon and Modha studied a spherical k-means algorithm [21] for clustering document data sets. To cluster large-scale data sets, the triangle inequality has been employed to accelerate the standard k-means algorithm [22–24]. Similar to our proposed method, Regularized k-means [25] aims at improving clustering results on high-dimensional data sets by using a penalty term to eliminate the effects of redundant features. However, Regularized k-means [25] applies an $l_1$-norm penalty to the centroids, whereas our proposed method applies an $l_2$-norm penalty to the feature weights.
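Schematically, in our own illustrative notation (not the exact objectives of [25] or of Section 3), the two penalties attach to different quantities:

$$\min_{U,Z} \sum_{p=1}^{k} \sum_{i=1}^{n} u_{ip} \|X_i - Z_p\|^2 + \lambda \sum_{p=1}^{k} \|Z_p\|_1 \quad \text{(penalize centroids, as in [25])}$$

$$\min_{U,W,Z} \sum_{p=1}^{k} \sum_{i=1}^{n} u_{ip} \sum_{j=1}^{m} w_{pj} (x_{ij} - z_{pj})^2 + \lambda \sum_{p=1}^{k} \sum_{j=1}^{m} w_{pj}^2 \quad \text{(penalize feature weights)}$$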
This type of k-means algorithm treats all the features equally and has no capability for feature selection. Therefore, the clustering results produced by this type of algorithm are usually not promising when a data set includes noisy features.
2.1.2. Weighting k-means type clustering algorithms
All the features are employed equally in the clustering process of no weighting k-means type clustering algorithms. In practice, however, a useful clustering pattern usually hides in a subspace defined by a subset of all the features. Therefore, many researchers [1,7,8] have devoted themselves to studying various ways of weighting features to find the hidden subspaces. Huang et al. proposed the W-k-means [1] algorithm, which uses the β-th power of the feature weights to tune their distribution. In W-k-means, the same feature shares a single weight value across all clusters. As a matter of fact, the same feature usually has different importance in different clusters in real applications. To solve this problem, Chan et al. proposed the Attributes-Weighting clustering Algorithm (AWA) [8], which uses a weighting technique similar to that of W-k-means to calculate a weight for every feature in each cluster.
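To make this per-cluster weighting concrete, the following minimal NumPy sketch (our own illustration, not the authors' code) evaluates the weighted within-cluster dispersion that AWA minimizes; the formal objective is given as Eq. (1) below:

```python
import numpy as np

def awa_objective(X, U, Z, W, beta=2.0):
    """Eq. (1): sum over clusters p and their members i of
    sum_j W[p, j]**beta * (X[i, j] - Z[p, j])**2."""
    # X: (n, m) data; U: (n, k) binary memberships;
    # Z: (k, m) centroids; W: (k, m) per-cluster feature weights
    total = 0.0
    for p in range(Z.shape[0]):
        members = X[U[:, p] == 1]            # objects assigned to cluster p
        sq = (members - Z[p]) ** 2           # squared deviations, (|C_p|, m)
        total += np.sum(W[p] ** beta * sq)   # weight each feature dimension
    return total
```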
Let $X = \{X_1, X_2, \ldots, X_n\}$ be a set of $n$ objects. Object $X_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$ is characterized by a set of $m$ features. The membership matrix $U$ is an $n \times k$ binary matrix, where $u_{ip} = 1$ indicates that object $i$ is assigned to cluster $p$; otherwise, it is not assigned to cluster $p$. $Z = \{Z_1, Z_2, \ldots, Z_k\}$ is a set of $k$ vectors representing the centroids of the $k$ clusters. $W$ is a weighting matrix in which each row $W_p = \{w_{p1}, w_{p2}, \ldots, w_{pm}\}$ denotes the weight vector of all the features in cluster $p$. The attributes-weighting clustering algorithm can be formulated as
$$P(U, W, Z) = \sum_{p=1}^{k} \sum_{i=1}^{n} u_{ip} \sum_{j=1}^{m} w_{pj}^{\beta} \, (x_{ij} - z_{pj})^2, \qquad (1)$$

subject to

$$u_{ip} \in \{0, 1\}, \quad \sum_{p=1}^{k} u_{ip} = 1, \quad \sum_{j=1}^{m} w_{pj} = 1, \quad 0 \le w_{pj} \le 1, \qquad (2)$$
where $\beta$ is a parameter that tunes the distribution of feature weights; $\beta > 1$, and the larger the value of $\beta$, the more similar the weights of different features within a cluster become. The AWA algorithm can