CLUSTERING METHODS FOR UNSUPERVISED CLASSIFICATION
Rasim Latifovic, Josef Cihlar, Jean Beaubien
Intermap Technologies Ltd., Ottawa, Canada
Canada Centre for Remote Sensing, Ottawa, Canada
Canadian Forest Service, Canada
ABSTRACT
In this paper, we have examined characteristics of spectral clusters produced by several
unsupervised classification algorithms. We have also designed a new cluster merging
strategy for a previously developed unsupervised classification procedure. The clustering
methods were compared using a summer Landsat Thematic Mapper image from central
Canada which contained forest, cropland, wetland and other cover types. We have found
that a strategy employing spectral similarity and cluster size to guide the cluster merging
process yields clusters in which spectral homogeneity and size are well balanced across the
range of spectral clusters found in the scene. The new clustering approach requires only
three control parameters, thus facilitating consistent results when applied in other areas. The
results also suggest that the dual cluster size – spectral homogeneity approach produces
more consistent and refined clustering results than purely statistically-based methods,
although testing over a broader range of conditions is needed to ascertain this with
confidence.
1.0 INTRODUCTION
Image classification is an important analysis method for extracting information from multispectral
remotely sensed images. A major application is land cover mapping, where the goal is to obtain
relatively few classes (~10-30). However, such images typically contain a large number of combinations of
spectral values from individual bands. The aim of classification techniques is thus to reduce
the number of individual combinations to a small number of classes. Among the various classification
approaches, unsupervised image classification methods are designed to make the best possible use of the
overall spectral content of an image, with the important condition that the resulting clusters remain as
thematically uniform as possible. However, thematic content cannot be evaluated until the number of
clusters is relatively small and the clusters can be grouped into the desired classification legend. Thus,
during the clustering process the homogeneity of clusters is typically evaluated using spectral measures.
In the development of an unsupervised classification method, two problems must be solved: locating
the centroids of the initial clusters in the multispectral space, and finding the optimum location of the
final clusters. Various algorithms have been developed to deal with these steps.
For example, the K-means algorithm (Tou and Gonzales, 1974) selects the desired number of cluster
centroids in some manner and then attempts to find their optimum location by moving them around in
subsequent iterations. At each iteration, the location of each centroid is updated based on the mean
spectral characteristics of the pixels currently in the cluster. ISODATA (Tou and Gonzales, 1974) uses a
more complex control of the process leading to new cluster locations; at each iteration, clusters are split or
lumped depending on the spectral cohesiveness within clusters in relation to that among clusters. In their
Classification by Progressive Generalization (CPG) method, Cihlar et al. (1998) assumed that all initial
spectral combinations could be potential cluster centroids, and the challenge is therefore to find those ‘most
important’ in the image. Cihlar et al. (1998) used a series of steps that gradually reduce the number of
spectral combinations, yet cause minimal loss of information from the image (as determined from
visual image interpretation).
∗ This paper, "Clustering Methods for Unsupervised Classification", was presented at the Fourth
International Airborne Remote Sensing Conference and Exhibition / 21st Canadian Symposium on Remote
Sensing, Ottawa, Ontario, Canada, 21-24 June 1999.
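The K-means update described above (assign each pixel to its nearest centroid, then move each centroid to the mean of its pixels) can be sketched as follows. This is a minimal illustration only: the function name `kmeans`, the fixed iteration count, the two-band toy pixels, and the starting centroids are all assumptions made for the example, not details taken from the paper.

```python
import math

def kmeans(pixels, centroids, iterations=10):
    """Plain K-means on spectral vectors (hypothetical helper, not the paper's code)."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel goes to the nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in pixels
        ]
        # Update step: each centroid moves to the mean of the pixels assigned to it.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(band) / len(members) for band in zip(*members)]
    return centroids, labels

# Toy two-band "image" with two obvious spectral groups (made-up values).
pixels = [(10, 12), (11, 13), (12, 11), (50, 52), (51, 50), (49, 51)]
centroids, labels = kmeans(pixels, centroids=[(0, 0), (60, 60)])
```

The number of clusters is fixed by the number of starting centroids, which is why K-means needs no further control parameters.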
In this paper, we examine the effect of various cluster merging procedures on the classification
results. Three unsupervised classification procedures are tested: K-means, ISODATA, and CPG. We also
develop and test another merging strategy which combines the strengths of these approaches. The three
approaches were selected because of their strengths in specific respects. The main advantage of K-means is
that it needs no control parameters and is therefore reproducible. The principal strength of ISODATA is the
sensitive control of spectral homogeneity of the clusters. The main advantage of CPG is its non-iterative
nature, thus preserving the relation between the seed clusters and the final clusters.
2.0 METHODOLOGY
A Landsat Thematic Mapper image from central Saskatchewan, Canada (path 37, row 22, August
1991) was used in this study. The image contains various cover types (mainly forest but also cropland and
wetland) and was previously classified by Beaubien et al. (1999); detailed ground data were available
for this scene.
In merging clusters, the options are to use a) the maximum desired number of clusters; b) relative
cluster size; c) proximity of cluster centroids; d) cluster purity; or a combination of these. For example,
ISODATA uses a) and d) (defined as a maximum standard deviation), K-means uses a), and CPG uses c)
applied to smallest clusters first. To preserve small but potentially important clusters, a different merging
strategy is required. Five clustering methods were evaluated in this study.
2.1 CLASSIFICATION BY PROGRESSIVE GENERALIZATION
Classification by progressive generalization (CPG) is an unsupervised classification method which finds
the means of representative spectral clusters in the data set, assigns every pixel to a cluster, and combines
similar clusters until the remaining clusters can be assigned thematic labels (Cihlar et al., 1998). The
procedure consists of the following steps:
1. Contrast stretch,
2. Quantization,
3. Spatial image filtering,
4. Identification of large seed clusters,
5. Merging of medium-sized pure clusters,
6. Classification,
7. Merging clusters using spectral similarity,
8. Identifying candidates for merging using spectral and spatial similarity,
9. Merging clusters by spectral and spatial measures plus large-scale pattern,
10. Cluster labeling (steps 9 and 10 are supervised by the analyst).
Of specific interest in this paper is step 7, in which the spectral similarity $SS_{ij}$ is computed between
all pairwise combinations of the clusters. $SS_{ij}$ is defined as (Cihlar et al., 1998):
$$SS_{ij} = \frac{S_{ij} + S_{ji}}{SD_{ij}}, \qquad (1)$$
$$S_{ij} = \frac{\sum_{k=1}^{n} (\cos_{ijk} \cdot S_{ik})}{\sum_{k=1}^{n} \cos_{ijk}}, \qquad (2)$$
$$\cos_{ijk} = \frac{|M_{ik} - M_{jk}|}{SD_{ij}}, \qquad (3)$$
$$SD_{ij} = \sqrt{\sum_{k=1}^{n} (M_{ik} - M_{jk})^{2}}, \qquad (4)$$
where $i \neq j$ and $M$ is the arithmetic cluster mean; $S_{ij}$ is the standard deviation of cluster i in the
direction of the cluster j centroid; $S_{ik}$ is the standard deviation of cluster i in spectral channel k; $SD$
is the spectral distance between clusters; $\cos$ is the cosine of the angle between clusters; i, j are cluster
numbers; k is the spectral channel; n is the total number of spectral channels.
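As a worked illustration of Eqs. (1)-(4), the sketch below computes the spectral similarity for one pair of clusters. It assumes Eq. (1) is read as the sum of the two directional standard deviations divided by the spectral distance, and Eq. (4) as a Euclidean distance. The function names and the two-channel sample values are hypothetical.

```python
import math

def spectral_distance(mean_i, mean_j):
    # Eq. (4), assumed Euclidean: distance between the two cluster means.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_i, mean_j)))

def directional_std(mean_i, mean_j, std_i):
    # Eq. (3): direction cosines of the line joining the two centroids.
    sd = spectral_distance(mean_i, mean_j)
    cosines = [abs(a - b) / sd for a, b in zip(mean_i, mean_j)]
    # Eq. (2): per-channel standard deviations of cluster i, weighted by those cosines.
    return sum(c * s for c, s in zip(cosines, std_i)) / sum(cosines)

def spectral_similarity(mean_i, std_i, mean_j, std_j):
    # Eq. (1) as assumed here; requires distinct centroids (non-zero distance).
    s_ij = directional_std(mean_i, mean_j, std_i)
    s_ji = directional_std(mean_j, mean_i, std_j)
    return (s_ij + s_ji) / spectral_distance(mean_i, mean_j)

# Made-up two-channel clusters: means 5 units apart, with different spreads.
sd = spectral_distance((0, 0), (3, 4))
ss = spectral_similarity((0, 0), (1, 2), (3, 4), (2, 1))
```

Under this reading, a larger value means the spreads of the two clusters are large relative to their separation, so the clusters overlap more and are better candidates for merging.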
In its application, all remaining spectral clusters are first sorted according to decreasing size.
Starting with the smallest cluster i, the cluster j which has the lowest $SD_{ij}$ is found. Next, all clusters r
with $SD_{ir} \leq 1.1 \cdot SD_{ij}$ are found. Cluster i is then identified to be merged with cluster p provided that
$SS_{ip} > SS_{iq}$ for $p, q \in r$. That is, if several clusters have similar distance in the multispectral space to i,
the one spectrally closest overall is merged in preference to those that are more distant.
In the remainder of this paper the above clustering approach is labeled ‘CPG’.
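The selection rule above can be sketched as follows: take the smallest cluster, collect every cluster within 1.1 times the nearest spectral distance, and pick the candidate with the highest spectral similarity. The dictionary layout, the function names, and the sample clusters are assumptions for illustration. The similarity is computed as the sum of directional standard deviations divided by the distance, which is an assumed reading of Eq. (1).

```python
import math

def distance(mean_i, mean_j):
    # Euclidean distance between cluster means (SD in the text).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_i, mean_j)))

def similarity(ci, cj):
    # Assumed form of SS: (S_ij + S_ji) / SD_ij, with cosine-weighted standard deviations.
    sd = distance(ci["mean"], cj["mean"])
    cos = [abs(a - b) / sd for a, b in zip(ci["mean"], cj["mean"])]
    s_ij = sum(c * s for c, s in zip(cos, ci["std"])) / sum(cos)
    s_ji = sum(c * s for c, s in zip(cos, cj["std"])) / sum(cos)
    return (s_ij + s_ji) / sd

def pick_merge(clusters, tolerance=1.1):
    # Start from the smallest remaining cluster i.
    i = min(clusters, key=lambda name: clusters[name]["size"])
    sd = {j: distance(clusters[i]["mean"], clusters[j]["mean"])
          for j in clusters if j != i}
    # Candidates r: every cluster no farther than 1.1 times the nearest one.
    nearest = min(sd.values())
    candidates = [j for j in sd if sd[j] <= tolerance * nearest]
    # Among the candidates, merge with the one of highest spectral similarity.
    p = max(candidates, key=lambda j: similarity(clusters[i], clusters[j]))
    return i, p

# Hypothetical clusters: B and C are equally distant from A, but C is much broader.
clusters = {
    "A": {"size": 5, "mean": (10, 10), "std": (1, 1)},
    "B": {"size": 100, "mean": (13, 14), "std": (1, 1)},
    "C": {"size": 80, "mean": (15, 10), "std": (4, 4)},
    "D": {"size": 200, "mean": (40, 40), "std": (2, 2)},
}
smallest, partner = pick_merge(clusters)
```

In this toy case B and C are equally distant from A, so the tie is broken by similarity, and the broader cluster C is chosen.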
2.2 MODIFIED CPG: CPGSM AND CPGCS
As an alternative to step 7 of the original CPG procedure (section 2.1), a clustering approach was
developed which places higher emphasis on the proximity of clusters in the spectral space. It was employed
in two forms, spectral proximity alone (‘CPGsm’) and spectral proximity constrained by cluster size
(‘CPGcs’). The goal of CPGsm is to merge spectrally very similar clusters but also to preserve the
dominance of larger clusters. The main decision rules are:
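CPGsm:
$$\text{if } (N_{current} > N_{cl,end}) \text{ and } (SD_{ij} \leq SD_{max}) \text{ then merge.} \qquad (5)$$
CPGcs:
$$\text{if } (N_{current} > N_{cl,end}) \text{ and } (NP_{i} < NP_{l}) \text{ and } (NP_{j} < NP_{l}) \text{ and } (SD_{ij} \leq SD_{max}) \text{ then merge.} \qquad (6)$$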
where $N_{current}$ is the current number of clusters; $N_{cl,end}$ is the number of desired clusters; $NP_{i}$, $NP_{j}$
are the sizes of clusters i and j; $NP_{l}$ is the threshold cluster size below which a cluster is considered for
merging; $SD_{ij}$ is the spectral distance between the centroids of clusters i and j; $SD_{max}$ is the maximum
allowable SD for i, j to merge. That is, the merging process is constrained by $N_{cl,end}$, $SD_{max}$ and, for
CPGcs, also by $NP_{l}$. In either case, the merging proceeds from the two spectrally closest clusters. If the
remaining number of clusters is greater than the desired number, the cluster pair is found that has the
smallest distance $SD_{ij}$, satisfies the spectral distance threshold and, for CPGcs, also satisfies the
additional cluster size threshold for merging.
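The decision rules (5) and (6) can be sketched as a single loop that repeatedly merges the closest eligible pair. This is an illustration under stated assumptions: the function name `merge_clusters`, the `(size, mean)` tuple layout, and the size-weighted mean used for the merged centroid are not specified in the text, and the threshold values are made up.

```python
import math
from itertools import combinations

def merge_clusters(clusters, n_end, sd_max, np_limit=None):
    """clusters: list of (size, mean). np_limit=None gives CPGsm, a number gives CPGcs."""
    clusters = [(size, tuple(mean)) for size, mean in clusters]
    while len(clusters) > n_end:  # N_current > N_cl,end
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (na, ma), (nb, mb) = clusters[a], clusters[b]
            # CPGcs only: both clusters must be smaller than the size threshold NP_l.
            if np_limit is not None and not (na < np_limit and nb < np_limit):
                continue
            d = math.dist(ma, mb)
            # Keep the closest pair that satisfies SD_ij <= SD_max.
            if d <= sd_max and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:  # no eligible pair is left, so stop early
            break
        _, a, b = best
        (na, ma), (nb, mb) = clusters[a], clusters[b]
        # Assumption: the merged centroid is the size-weighted mean of the two centroids.
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ma, mb))
        clusters[a] = (na + nb, merged)
        del clusters[b]
    return clusters

# Made-up clusters: one large, two small and nearby, one large and distant.
start = [(100, (10, 10)), (5, (11, 10)), (4, (12, 10)), (50, (30, 30))]
sm = merge_clusters(start, n_end=2, sd_max=3)               # CPGsm
cs = merge_clusters(start, n_end=2, sd_max=3, np_limit=10)  # CPGcs
```

With these values CPGsm absorbs both small clusters into the large one and ends with two clusters of sizes 109 and 50. CPGcs merges only the two small clusters with each other and stops at three clusters of sizes 100, 9 and 50, because no remaining pair satisfies the size threshold.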
$SD_{max}$ is the maximum spectral distance between cluster centroids that should be considered for merging. It
thus helps ensure that dissimilar clusters, even though very small, are not merged. This threshold also
allows the spectral similarity constraint to be relaxed gradually as the number of remaining clusters decreases.
$SD_{max}$, $N_{cl,end}$, and $NP_{l}$ are computed as follows: $SD_{max}$ from Eq. (7), $N_{cl,end}$ from the SD table
(Eq. (9)), and $NP_{l}$ from Eq. (8).
$$SD_{max} = q_{k}^{2}, \qquad (7)$$
where $q_{k}$ is the number of digital levels per quantized level in the k-th spectral dimension (Cihlar et al., 1998).
The merging cluster size threshold $NP_{l}$ is related to the number of clusters that would remain after the
merging using $SD_{max}$: