Mean Shift: A Robust Approach
Toward Feature Space Analysis
Dorin Comaniciu, Member, IEEE, and Peter Meer, Senior Member, IEEE
Abstract: A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and the delineation of
arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean
shift. We prove for discrete data the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying
density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-
Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision
tasks, discontinuity preserving smoothing and image segmentation, are described as applications. In these algorithms, the only
user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental
results illustrate their excellent performance.
Index Terms: Mean shift, clustering, image segmentation, image smoothing, feature space, low-level vision.
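The recursive mean shift procedure named in the abstract can be sketched in a few lines. The following is a minimal illustration rather than the paper's implementation: it uses a flat (uniform) kernel of fixed bandwidth on synthetic two-dimensional data, and the function name, bandwidth, and tolerance values are assumptions made for the example.

```python
import numpy as np

def mean_shift_point(x, data, bandwidth, max_iter=100, tol=1e-4):
    """Move a single start point x toward the nearest density mode.

    At each step, the new position is the mean of all data points that
    fall inside a window of radius `bandwidth` (flat kernel), i.e., the
    sample mean shift. Iteration stops when the shift becomes small.
    """
    y = x.astype(float)
    for _ in range(max_iter):
        dist = np.linalg.norm(data - y, axis=1)
        window = data[dist <= bandwidth]
        if len(window) == 0:          # isolated start point: nothing to shift toward
            break
        y_next = window.mean(axis=0)  # sample mean inside the window
        if np.linalg.norm(y_next - y) < tol:
            y = y_next
            break
        y = y_next
    return y

# Toy bimodal data: two Gaussian clouds. Each start point should converge
# to (approximately) the center of its own cloud, i.e., a mode of the density.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, (200, 2)),
                  rng.normal(3.0, 0.3, (200, 2))])
modes = np.array([mean_shift_point(p, data, bandwidth=1.0) for p in data[:5]])
print(modes)
```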
1 INTRODUCTION
Low-level computer vision tasks are misleadingly diffi-
cult. Incorrect results can be easily obtained since the
employed techniques often rely upon the user correctly
guessing the values for the tuning parameters. To improve
performance, the execution of low-level tasks should be task
driven, i.e., supported by independent high-level informa-
tion. This approach, however, requires that, first, the low-
level stage provides a reliable enough representation of the
input and that the feature extraction process be controlled
only by very few tuning parameters corresponding to
intuitive measures in the input domain.
Feature space-based analysis of images is a paradigm
which can achieve the above-stated goals. A feature space is
a mapping of the input obtained through the processing of
the data in small subsets at a time. For each subset, a
parametric representation of the feature of interest is
obtained and the result is mapped into a point in the
multidimensional space of the parameter. After the entire
input is processed, significant features correspond to denser
regions in the feature space, i.e., to clusters, and the goal of
the analysis is the delineation of these clusters.
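As a concrete instance of this mapping, the sketch below treats each pixel of a color image as a subset of size one and collects the resulting points into a three-dimensional color feature space; a coarse histogram then illustrates that dominant colors appear as dense regions. The stand-in image, array names, and bin counts are assumptions made only for illustration.

```python
import numpy as np

# Stand-in for an H x W x 3 color image (random values here purely for illustration).
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)

# Each pixel is one "subset"; its color is the parametric representation,
# so the feature space is the cloud of H*W points in R^3.
features = image.reshape(-1, 3).astype(float)   # shape (H*W, 3)

# Dense regions of `features` correspond to the dominant colors of the image;
# a coarse 3-D histogram makes those dense regions visible.
hist, _ = np.histogramdd(features, bins=(8, 8, 8),
                         range=((0, 256), (0, 256), (0, 256)))
print(features.shape, "densest bin count:", hist.max())
```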
The nature of the feature space is application dependent.
The subsets employed in the mapping can range from
individual pixels, as in the color space representation of an
image, to a set of quasi-randomly chosen data points, as in
the probabilistic Hough transform. Both the advantage and
the disadvantage of the feature space paradigm arise from
the global nature of the derived representation of the input.
On one hand, all the evidence for the presence of a
significant feature is pooled together, providing excellent
tolerance to a noise level which may render local decisions
unreliable. On the other hand, features with lesser support
in the feature space may not be detected in spite of being
salient for the task to be executed. This disadvantage,
however, can be largely avoided by either augmenting the
feature space with additional (spatial) parameters from the
input domain or by robust postprocessing of the input
domain guided by the results of the feature space analysis.
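A minimal sketch of the first remedy follows: each color feature vector is augmented with the pixel's spatial coordinates, so that features with small support remain anchored in the image plane. The stand-in image and the bandwidth values are illustrative assumptions; the normalization by separate spatial and range scales echoes the joint spatial-range representation discussed later in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)  # stand-in image
H, W = image.shape[:2]
rows, cols = np.mgrid[0:H, 0:W]

h_s, h_r = 8.0, 16.0   # illustrative spatial and range bandwidths

# Each pixel becomes a 5-D point (x/h_s, y/h_s, R/h_r, G/h_r, B/h_r), so spatial
# proximity and color similarity enter the feature space analysis jointly.
joint = np.column_stack([
    cols.ravel() / h_s,
    rows.ravel() / h_s,
    image.reshape(-1, 3).astype(float) / h_r,
])
print(joint.shape)   # (19200, 5) for the 120 x 160 stand-in image
```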
Analysis of the feature space is application independent.
While there is a plethora of published clustering techni-
ques, most of them are not adequate to analyze feature
spaces derived from real data. Methods which rely upon
a priori knowledge of the number of clusters present
(including those which use optimization of a global
criterion to find this number), as well as methods which
implicitly assume the same shape (most often elliptical) for
all the clusters in the space, are not able to handle the
complexity of a real feature space. For a recent survey of
such methods, see [29, Section 8].
In Fig. 1, a typical example is shown. The color image in
Fig. 1a is mapped into the three-dimensional L*u*v* color
space (to be discussed in Section 4). There is a continuous
transition between the clusters arising from the dominant
colors and a decomposition of the space into elliptical tiles
will introduce severe artifacts. Enforcing a Gaussian
mixture model over such data is doomed to fail, e.g., [49],
and even the use of a robust approach with contaminated
Gaussian densities [67] cannot be satisfactory for such
complex cases. Note also that the mixture models require
the number of clusters as a parameter, which raises its own
challenges. For example, the method described in [45]
proposes several different ways to determine this number.
Arbitrarily structured feature spaces can be analyzed
only by nonparametric methods since these methods do not
have embedded assumptions. Numerous nonparametric
clustering methods have been described in the literature and
they can be classified into two large classes: hierarchical
clustering and density estimation. Hierarchical clustering
techniques either aggregate or divide the data based on