CLUSTERING METHODS FOR UNSUPERVISED CLASSIFICATION


Rasim Latifovic, Josef Cihlar, Jean Beaubien

Intermap Technologies Ltd., Ottawa, Canada

Canada Centre for Remote Sensing, Ottawa, Canada

Canadian Forest Service, Canada

ABSTRACT

In this paper, we have examined characteristics of spectral clusters produced by several unsupervised classification algorithms. We have also designed a new cluster merging strategy for a previously developed unsupervised classification procedure. The clustering methods were compared using a summer Landsat Thematic Mapper image from central Canada which contained forest, cropland, wetland and other cover types. We have found that a strategy employing spectral similarity and cluster size to guide the cluster merging process yields clusters in which spectral homogeneity and size are well balanced across the range of spectral clusters found in the scene. The new clustering approach requires only three control parameters, thus facilitating consistent results when applied in other areas. The results also suggest that the dual approach combining cluster size and spectral homogeneity produces more consistent and refined clustering results than purely statistically based methods, although testing over a broader range of conditions is needed to ascertain this with confidence.

1.0 INTRODUCTION

Image classification is an important analysis method for extracting information from multispectral remotely sensed images. A major application is land cover mapping, where the goal is to obtain relatively few classes (~10-30). However, such images typically contain a large number of combinations of spectral values from the individual bands. The aim of classification techniques is thus to reduce the number of individual combinations to a small number of classes. Among the various classification approaches, unsupervised image classification methods are designed to make the best possible use of the overall spectral content of an image, with the important condition that the resulting clusters remain as thematically uniform as possible. However, thematic content cannot be evaluated until the clusters are few enough to be grouped into the desired classification legend. Thus, during the clustering process the homogeneity of clusters is typically evaluated using spectral measures.

In the development of an unsupervised classification method, two problems must be solved: how to locate the centroids of the initial clusters in the multispectral space, and how to find the optimum location of the final clusters. Various algorithms have been developed to deal with these steps. For example, the K-means algorithm (Tou and Gonzales, 1974) selects the desired number of cluster centroids in some manner and then attempts to find their optimum location by moving them around in subsequent iterations. At each iteration, the location of each centroid is updated to the mean spectral characteristics of the pixels currently in the cluster. ISODATA (Tou and Gonzales, 1974) uses a more complex control of the process leading to new cluster locations; at each iteration, clusters are split or lumped depending on the spectral cohesiveness within clusters in relation to that among clusters. In their Classification by Progressive Generalization (CPG) method, Cihlar et al. (1998) assumed that all initial spectral combinations could be potential cluster centroids, and the challenge is therefore to find those 'most important' in the image. Cihlar et al. (1998) used a series of steps that gradually reduce the number of spectral combinations, yet cause minimal loss of information from the image (as determined from visual image interpretation).

∗ "Clustering Methods for Unsupervised Classification" was presented at the Fourth International Airborne Remote Sensing Conference and Exhibition / 21st Canadian Symposium on Remote Sensing, Ottawa, Ontario, Canada, 21-24 June 1999.
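The K-means procedure described above (choose centroids, then repeatedly move each one to the mean of the pixels assigned to it) can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names, the Euclidean distance, the stopping rule and the handling of empty clusters are all assumptions made here.

```python
import math


def assign(pixels, centroids):
    """Label each pixel with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in pixels
    ]


def kmeans(pixels, centroids, max_iter=20):
    """Plain K-means on spectral vectors given as tuples of band values."""
    centroids = [tuple(float(v) for v in c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(pixels, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                # New centroid = mean of the pixels currently in the cluster.
                updated.append(tuple(sum(band) / len(members) for band in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                updated.append(old)
        if updated == centroids:  # no centroid moved, so stop early
            break
        centroids = updated
    return centroids, assign(pixels, centroids)


pixels = [(10, 10), (12, 10), (11, 12), (50, 52), (52, 50), (51, 51)]
centroids, labels = kmeans(pixels, [(0, 0), (60, 60)])
print(labels)
```

The only control input is the set of starting centroids, which is consistent with the reproducibility the text attributes to K-means.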

In this paper, we examine the effect of various cluster merging procedures on the classification results. Three unsupervised classification procedures are tested: K-means, ISODATA, and CPG. We also develop and test another merging strategy which combines the strengths of these approaches. The three approaches were selected because of their strengths in specific respects. The main advantage of K-means is that it needs no control parameters and is therefore reproducible. The principal strength of ISODATA is the sensitive control of spectral homogeneity of the clusters. The main advantage of CPG is its non-iterative nature, which preserves the relation between the seed clusters and the final clusters.

2.0 METHODOLOGY

A Landsat Thematic Mapper image from central Saskatchewan, Canada (path 37, row 22, August 1991) was used in this study. The image contains various cover types (mainly forest but also cropland and wetland) and was previously classified by Beaubien et al. (1999); detailed ground data were available for this scene.

In merging clusters, the options are to use a) the maximum desired number of clusters; b) relative cluster size; c) proximity of cluster centroids; d) cluster purity; or a combination of these. For example, ISODATA uses a) and d) (with purity defined as a maximum standard deviation), K-means uses a), and CPG uses c) applied to the smallest clusters first. To preserve small but potentially important clusters, a different merging strategy is required. Five clustering methods were evaluated in this study.

2.1 CLASSIFICATION BY PROGRESSIVE GENERALIZATION

Classification by progressive generalization (CPG) is an unsupervised classification which finds

means for representative spectral clusters in the data set, assigns every pixel to a cluster, and combines

similar clusters until the remaining clusters can be assigned thematic labels (Cihlar et al., 1998). The

procedure consists of the following steps:

1. Contrast stretch
2. Quantization
3. Spatial image filtering
4. Identification of large seed clusters
5. Merging medium-sized pure clusters
6. Classification
7. Merging clusters using spectral similarity
8. Identifying candidates for merging using spectral and spatial similarity
9. Merging clusters by spectral and spatial measures plus large-scale pattern
10. Cluster labeling (steps 9 and 10 are supervised by analyst decision)

Of specific interest in this paper is step 7, in which the spectral similarity $SS_{ij}$ is computed between all pairwise combinations of the clusters. $SS_{ij}$ is defined as (Cihlar et al., 1998):

$$SS_{ij} = \frac{S_{ij} + S_{ji}}{SD_{ij}}, \qquad (1)$$

$$S_{ij} = \frac{\sum_{k=1}^{n} \left(\cos_{ijk} \cdot S_{ik}\right)}{\sum_{k=1}^{n} \cos_{ijk}}, \qquad (2)$$

$$\cos_{ijk} = \frac{\left|M_{ik} - M_{jk}\right|}{SD_{ij}}, \qquad (3)$$

$$SD_{ij} = \sqrt{\sum_{k=1}^{n} \left(M_{ik} - M_{jk}\right)^{2}}, \qquad (4)$$

where $i \neq j$ and $M$ is the arithmetic cluster mean; $S_{ij}$ is the standard deviation of cluster i in the direction of the cluster j centroid; $S_{ik}$ is the standard deviation of cluster i in spectral channel k; $SD$ is the spectral distance between clusters; $\cos$ is the cosine of the angle between clusters; i, j are cluster numbers; k is the spectral channel; n is the total number of spectral channels.
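The quantities defined above can be computed from per-cluster band means and standard deviations, as in the sketch below. Several points are assumptions made here rather than statements from the paper: the spectral distance is taken as Euclidean, the direction cosines are taken in absolute value so that they act as non-negative weights, and the similarity is taken as the sum of the two directional standard deviations divided by the distance (so that a larger value means more similar clusters). The function names are hypothetical.

```python
import math


def spectral_distance(mean_i, mean_j):
    """Euclidean distance between two cluster means (assumed form of SD)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_i, mean_j)))


def directional_std(mean_i, std_i, mean_j):
    """Spread of cluster i towards the centroid of cluster j.

    Per-band standard deviations of cluster i are averaged with
    direction-cosine weights. Requires mean_i != mean_j.
    """
    sd = spectral_distance(mean_i, mean_j)
    # Assumption: absolute values keep the weights non-negative.
    cosines = [abs(a - b) / sd for a, b in zip(mean_i, mean_j)]
    return sum(c * s for c, s in zip(cosines, std_i)) / sum(cosines)


def spectral_similarity(mean_i, std_i, mean_j, std_j):
    """Assumed similarity: combined directional spread over distance."""
    s_ij = directional_std(mean_i, std_i, mean_j)
    s_ji = directional_std(mean_j, std_j, mean_i)
    return (s_ij + s_ji) / spectral_distance(mean_i, mean_j)


# Two clusters in a two-band space.
ss = spectral_similarity((0, 0), (1, 2), (3, 4), (2, 1))
print(round(ss, 3))
```

With this form, two clusters whose spreads are large relative to the distance between their means score high and are therefore stronger merge candidates.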

In its application, all remaining spectral clusters are first sorted according to decreasing size. Starting with the smallest cluster i, the cluster j which has the lowest $SD_{ij}$ is found. Next, all clusters r with $SD_{ir} \leq 1.1 \cdot SD_{ij}$ are found. Cluster i is then identified to be merged with cluster p provided that $SS_{ip} > SS_{iq}$ for p, q ∈ r. That is, if several clusters have similar distance in the multispectral space to i, the one spectrally closest overall is merged in preference to those that are more distant.

In the remainder of this paper the above clustering approach is labeled ‘CPG’.

2.2 MODIFIED CPG: CPGSM AND CPGCS

As an alternative to step 7 of the original CPG procedure (section 2.1), a clustering approach was

developed which places higher emphasis on the proximity of clusters in the spectral space. It was employed

in two forms, spectral proximity alone (‘CPGsm’) and spectral proximity constrained by cluster size

(‘CPGcs’). The goal of CPGsm is to merge spectrally very similar clusters but also to preserve the

dominance of larger clusters. The main decision rules are:

CPGsm:

$$\text{If } (N_{current} > N_{cl,end}) \text{ and } (SD_{ij} \leq SD_{max}) \text{ then merge.} \qquad (5)$$

CPGcs:

$$\text{If } (N_{current} > N_{cl,end}) \text{ and } (NP_{i} < NP_{t}) \text{ and } (NP_{j} < NP_{t}) \text{ and } (SD_{ij} \leq SD_{max}) \text{ then merge.} \qquad (6)$$

where $N_{current}$ is the current number of clusters; $N_{cl,end}$ is the number of desired clusters; $NP_{i}$, $NP_{j}$ are the sizes of clusters i and j; $NP_{t}$ is the threshold cluster size to consider a cluster for merging; $SD_{ij}$ is the spectral distance between the centroids of clusters i and j; $SD_{max}$ is the maximum allowable SD for i, j to merge. That is, the merging process is constrained by $N_{cl,end}$, $SD_{max}$ and, for CPGcs, also by $NP_{t}$. In either case, the merging proceeds from the two spectrally closest clusters. If the remaining number of clusters is greater than the desired number, the cluster pair with the smallest distance $SD_{ij}$ is found which satisfies the spectral distance threshold and, for CPGcs, also satisfies the additional cluster size threshold for merging.
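The two decision rules can be sketched as a single loop in which the size constraint is optional. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the spectral distance is Euclidean, and the merged cluster takes the size-weighted mean of its two parents (the text above does not say how the merged centroid is updated).

```python
import math
from itertools import combinations


def merge_clusters(means, sizes, n_end, sd_max, np_limit=None):
    """Merge the closest eligible pair until no pair satisfies the rule.

    means: {cluster_id: tuple of band means}, sizes: {cluster_id: pixel count}
    np_limit=None applies the distance-only rule (CPGsm);
    a number additionally requires both clusters to be smaller than it (CPGcs).
    """
    means, sizes = dict(means), dict(sizes)
    while len(means) > n_end:
        pairs = [
            (math.dist(means[a], means[b]), a, b)
            for a, b in combinations(sorted(means), 2)
            if np_limit is None or (sizes[a] < np_limit and sizes[b] < np_limit)
        ]
        if not pairs:
            break
        sd, a, b = min(pairs)  # the spectrally closest eligible pair
        if sd > sd_max:
            break  # even the closest pair is too dissimilar to merge
        total = sizes[a] + sizes[b]
        # Assumption: merged centroid is the size-weighted mean of the parents.
        means[a] = tuple(
            (sizes[a] * x + sizes[b] * y) / total
            for x, y in zip(means[a], means[b])
        )
        sizes[a] = total
        del means[b], sizes[b]
    return means, sizes


means = {"a": (10, 10), "b": (12, 10), "c": (30, 30), "d": (31, 30)}
sizes = {"a": 100, "b": 50, "c": 5000, "d": 40}
sm_means, sm_sizes = merge_clusters(means, sizes, n_end=2, sd_max=5)
cs_means, cs_sizes = merge_clusters(means, sizes, n_end=2, sd_max=5, np_limit=1000)
print(sorted(sm_means), sorted(cs_means))
```

In this toy case the distance-only rule absorbs the small cluster "d" into the large cluster "c", while the size-constrained rule leaves "d" separate because "c" exceeds the size threshold.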

$SD_{max}$ is the maximum spectral distance between cluster centroids that should be considered for merging. It thus helps ensure that dissimilar clusters, even though very small, are not merged. This threshold also allows the spectral similarity constraint to be relaxed gradually as the number of remaining clusters decreases. $SD_{max}$, $N_{cl,end}$, and $NP_{t}$ are computed as follows: $SD_{max}$ from Eq. (7), $N_{cl,end}$ from the SD table (Eq. (9)), and $NP_{t}$ from Eq. (8).

$$SD_{max} = \sqrt{\sum_{k=1}^{n} q_{k}^{2}}, \qquad (7)$$

where $q_{k}$ = number of digital levels per quantized level in the k-th spectral dimension (Cihlar et al., 1998). The merging cluster size threshold $NP_{t}$ is related to the number of clusters that would remain after the merging using $SD_{max}$
