Int J Comput Vis
DOI 10.1007/s11263-009-0271-8
Unsupervised Object Discovery: A Comparison
Tinne Tuytelaars · Christoph H. Lampert · Matthew B. Blaschko · Wray Buntine
Received: 28 July 2008 / Accepted: 6 July 2009
© The Author(s) 2009. This article is published with open access at Springerlink.com
Abstract The goal of this paper is to evaluate and compare models and methods for learning to recognize basic entities in images in an unsupervised setting. In other words, we want to discover the objects present in the images by analyzing unlabeled data and searching for re-occurring patterns. We experiment with various baseline methods, methods based on latent variable models, as well as spectral clustering methods. The results are presented and compared on subsets of both Caltech256 and MSRC2, data sets that are larger, more challenging, and include more object classes than what has previously been reported in the literature. A rigorous framework for evaluating unsupervised object discovery methods is proposed.
Keywords Object discovery · Unsupervised object recognition · Evaluation
T. Tuytelaars (✉)
ESAT-PSI, K.U. Leuven, Kasteelpark Arenberg 10, bus 2241, Leuven 3001, Belgium
e-mail: Tinne.Tuytelaars@esat.kuleuven.be
C.H. Lampert · M.B. Blaschko
Max Planck Institute for Biological Cybernetics, Tübingen,
Germany
C.H. Lampert
e-mail: Christoph.Lampert@tuebingen.mpg.de
M.B. Blaschko
University of Oxford, Oxford, UK
e-mail: blaschko@robots.ox.ac.uk
W. Buntine
NICTA, Canberra, Australia
e-mail: Wray.Buntine@nicta.com.au
W. Buntine
Australian National University, Canberra, Australia
1 Introduction

Over the last decade, the category-level object recognition problem has drawn considerable attention within the computer vision research community and, as a result, much progress has been realized. International challenges and benchmarking data sets such as the yearly Pascal Visual Object Classes challenge, Caltech101 and later Caltech256 fostered this research by providing annotated data sets for training and testing. Nevertheless, researchers soon identified the dependence on the availability of annotated training data as a serious restriction: the expensive manual annotation process hampers the extension to large numbers of object classes (on the order of thousands or even more, taking into account that humans easily distinguish around 30,000 object categories (Biederman 1987)). More importantly, it forces methods to extrapolate from a relatively small number of training examples. Better performance, and especially better recall, is to be expected if more data were available for training. In response to this, weakly supervised as well as fully unsupervised methods have been investigated. In this paper, we want to experimentally validate the feasibility of fully unsupervised methods for object recognition, also referred to as object discovery, where only unlabeled data are provided.
Unsupervised learning can be formulated as a search for patterns in the data above and beyond what would be considered pure unstructured noise. Such patterns could be anything, and need not have a semantic interpretation. This makes it hard to evaluate unsupervised methods because multiple solutions (interpretations) can exist and all be equally valid as far as the data itself is concerned. Here, we build on the approach proposed in Sivic et al. (2005): a data set is constructed composed of images from a fixed number of predefined categories. The goal is then to separate the different categories. This approach provides ground truth information against which the results obtained by different methods can be evaluated quantitatively.
In contrast to the supervised case, where systematic comparison on benchmarking data sets and challenges is standard practice, to date no in-depth evaluation and comparison of different unsupervised methods has been performed. Instead, each paper reports results on its own data set, generated ad hoc by throwing together a selection of object category images from one of the many annotated data sets available. Moreover, more often than not, evaluation is limited to one or two such data sets. This may result in over-optimistic results due to parameter fine-tuning.

Various evaluation metrics have been proposed, with classification accuracy probably the most popular one (Sivic et al. 2005). In this case, each discovered category is linked (often manually) to a ground truth category. Images are then assigned to the most probable category and evaluation is performed as for a standard classification problem (with accuracy defined as the sum of the diagonal elements of the confusion matrix). However, as the number of object categories and/or the imbalance in the data set increases, categories can be merged or split. We argue that conditional entropy provides a more intuitive and scalable measure for a proper evaluation. We also propose a new evaluation scheme for the more realistic setting with multiple object categories per image, without imposing pixel-wise classification.
In this paper, a range of different methods are compared, including baseline methods such as random assignment, k-means clustering and principal component analysis, and more advanced methods such as various latent variable models and spectral clustering schemes. All of these start from the same underlying image representation: a simple bag-of-visual-words model that describes the image in terms of a set of quantized local image patch descriptors. We experiment with several local feature detectors, various vocabulary sizes, as well as different normalization schemes. More complex image representations that include spatial configuration information could yield further improvements, but fall outside the scope of this paper. Global image representations are also not considered, as these would probably be more sensitive to changing backgrounds, cannot deal with multiple objects per image, and would make the study too extensive.

We tried to avoid any kind of manual parameter tuning or model selection for a specific data set, as this could be considered a violation of the unsupervised character of our methods. Instead, we selected reasonable parameters in advance and held them fixed for all of the experiments. We report results on more than ten different test sets, each containing 20 or more different object categories, always using the same parameters.
The only parameter that we assume to be known in advance is the number of object categories in the data set. Ultimately, the machine should be able to decide on this too without any human intervention, but here we have preferred to avoid the complex issue of model selection and focus on comparing the different models given a known number of classes.
1.1 Related Work

Probably the first work to tackle the problem of unsupervised object category discovery was Weber et al. (2000), building on a simplified constellation-model-like scheme.

Probabilistic models in particular seem well suited for tackling the unsupervised object discovery task and have been studied by several authors. Sivic et al. (2005) have proposed a method that builds on Probabilistic Latent Semantic Analysis (PLSA), as proposed by Hofmann (1999), to separate images of four distinct object categories (faces, airplanes, rear cars, and motorbikes). They later extended their work, experimenting with both PLSA as well as Latent Dirichlet Allocation (LDA) (Blei et al. 2003) and using multiple image segments as the equivalent of documents, so as to better localize the objects in the images (Russell et al. 2006). Very similar is the work of Tang and Lewis (2008), except that they use non-negative matrix factorization (Lee and Seung 1999). Liu and Chen (2007) extend the PLSA model with the integration of a temporal model so as to discover objects in video.

Grauman and Darrell (2006) propose an alternative method based on spectral clustering. They pay special attention to separating the objects from the background or other objects present in a single image, and also propose a semi-supervised extension.

Finally, Kim et al. (2008) build on link analysis techniques developed in the context of graph mining and typically used in web search engines.

Recently, unsupervised object discovery methods including spatial information (Wang and Grimson 2008; Todorovic and Ahuja 2006) and hierarchical organization of categories (Sivic et al. 2008; Bart et al. 2008) have been proposed as well, but these fall outside the scope of this paper.
The rest of this paper is organized as follows. First, we introduce our evaluation metrics, both for the case of a single object per image and for the case of multiple objects per image (Sect. 2). Then we give a concise description of the image representation we use throughout this paper (Sect. 3). Next, in Sect. 4, we give an overview of different methods for unsupervised object discovery, starting with the baseline methods (Sect. 4.1), followed by latent variable methods (Sect. 4.2) and spectral clustering (Sect. 4.3). In Sects. 5 and 6, all these methods are applied to subsets of the Caltech256 and MSRC2 data sets and compared both quantitatively and qualitatively. Section 7 concludes the paper.
2 Evaluation Metrics

Recent reviews of the literature on measures for clustering can be found in Meila (2007) and Rosenberg and Hirschberg (2007). Standard measures for scoring clustering quality against a known standard include purity, defined as the mean of the maximum class probabilities for the ground truth category labels X and obtained cluster labels Y. Given variables (x, y) sampled from the finite discrete joint space X × Y, this is defined as

$$\mathrm{Purity}(X|Y) = \sum_{y \in Y} p(y) \max_{x \in X} p(x|y).$$

In practice the distribution p(x, y) is unknown, so it is estimated from the observed frequencies in a test sample, resulting in an empirical purity estimate.
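To make this concrete, the following is a minimal sketch (our own code, not from the paper) of the empirical purity estimate computed from a sample of (ground-truth label, cluster label) pairs:

```python
from collections import Counter, defaultdict

def empirical_purity(pairs):
    """Purity(X|Y) = sum_y p(y) max_x p(x|y), estimated from observed
    (ground-truth label x, cluster label y) pairs: for each cluster y,
    count its most frequent true class, then divide by the sample size."""
    joint = Counter(pairs)              # n(x, y)
    best = defaultdict(int)             # max_x n(x, y), per cluster y
    for (x, y), count in joint.items():
        best[y] = max(best[y], count)
    return sum(best.values()) / len(pairs)

# A perfect clustering has purity 1.0; random assignment tends towards
# the relative frequency of the largest class.
pairs = [("car", 0), ("car", 0), ("face", 0), ("face", 1)]
print(empirical_purity(pairs))          # (2 + 1) / 4 = 0.75
```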
A second measure is the mutual information or entropy gain

$$I(X|Y) = H(X) - H(X|Y), \qquad (1)$$

where

$$H(X|Y) = \sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log \frac{1}{p(x|y)}. \qquad (2)$$

Likewise, observed frequencies are used instead of probabilities, so one has an empirical entropy gain estimate. Our experiments have shown that purity and entropy gain are highly correlated in their ranking of different clusterings Y. Note that we usually estimate these quantities from a test set, say T, so when we do, we denote them as $\mathrm{Purity}_T(X|Y)$ and $I_T(X|Y)$.
2.1 Conditional Entropy

The use of entropy gain was originally introduced to allow the measure to be compared across different domains. In a single domain, the base entropy H(X) is constant, so one can also use the conditional entropy, H(X|Y), as a simpler measure to evaluate different algorithms for generating clusters or components. This has a nice and intuitive interpretation, as it gives the average amount of uncertainty that remains about X once the value of Y is known. Here we use it to measure how much uncertainty remains in the true class given the instance's estimated topic or cluster label. Conditional entropy has the following properties:

$$0 \le H(X|Y) \quad \text{with equality iff } Y \text{ determines } X, \qquad (3)$$
$$H(X|Y) \le H(X) \quad \text{with equality iff } Y \text{ is independent of } X. \qquad (4)$$

Roughly, assuming base 2 logarithms, one can interpret H(X|Y) as saying that on average there remain about $2^{H(X|Y)}$ choices for X once Y is known. Thus with 20 classes in the ground truth, H(X|Y) = 1 leaves about 2 of the 20 choices and H(X|Y) = 2 leaves about 4 of the 20 choices.
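A minimal sketch of the empirical conditional entropy estimate of Eq. (2), using base-2 logarithms as in the interpretation above (again our own code, not the authors'):

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """H(X|Y) = sum_y p(y) sum_x p(x|y) log2(1/p(x|y)), estimated from
    observed (ground-truth label x, cluster label y) frequencies."""
    n = len(pairs)
    joint = Counter(pairs)                    # n(x, y)
    cluster = Counter(y for _, y in pairs)    # n(y)
    h = 0.0
    for (x, y), count in joint.items():
        p_y = cluster[y] / n                  # p(y)
        p_x_given_y = count / cluster[y]      # p(x|y)
        h -= p_y * p_x_given_y * math.log2(p_x_given_y)
    return h

# Two clusters, each an even mix of two classes: one bit of uncertainty
# about the true class remains, i.e. about 2^1 = 2 remaining choices.
pairs = [("car", 0), ("face", 0), ("dog", 1), ("cat", 1)]
print(conditional_entropy(pairs))             # 1.0
```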
2.2 Using More Clusters

In experiments, one can easily build more clusters or components than are known in the ground truth, that is, |X| < |Y|. How can these subsequently be evaluated against the ground truth? The problem with the evaluation metrics described so far is that as |Y| increases, purity and conditional entropy get better and better. If |Y| were allowed to grow arbitrarily large, purity would go to 1 and conditional entropy would go to 0. But this is due to over-fitting rather than having a good clustering.
A simple approach is to use an oracle to assign each discovered component to its best known class, and then evaluate the resultant assignments as before, using purity, conditional entropy or entropy gain. In practice, the oracle is created from a data set that we will call the tuning set. Care must be taken to ensure an unbiased evaluation and, therefore, the tuning set of the oracle must be separate from the test set used to estimate the probabilities p(x, y). We use tuning set $T_1$ for the oracle and test set $T_2$ to estimate probabilities. Probabilities estimated from either set are denoted $\hat{p}_{T_1}()$ and $\hat{p}_{T_2}()$ respectively.
We use a mapping σ : Y → X that represents the assignment of clusters to labels used in the ground truth. Each data point (x, y) ∈ X × Y is now mapped to a value (x, σ(y)), a co-domain we denote as X × σ(Y). Thus any joint distribution p(x, y) on X × Y infers a joint distribution p(x, x′) on X × σ(Y). We then take measures such as purity or entropy on this inferred distribution w.r.t. the test set. Note that σ() could be referred to as a prediction function, and x′ = σ(y) as the prediction.
Given the tuning set $T_1$, estimate the best mapping, i.e., the oracle, by maximizing the purity or minimizing the conditional entropy as follows:

$$\sigma_{\mathrm{purity}} = \operatorname*{argmax}_{\sigma} \mathrm{Purity}_{T_1}(X|\sigma(Y)), \qquad (5)$$
$$\sigma_{\mathrm{entropy}} = \operatorname*{argmin}_{\sigma} H_{T_1}(X|\sigma(Y)). \qquad (6)$$
Then use the test set $T_2$ to get an unbiased estimate of the measures:

$$\mathrm{Purity}(X|Y) \approx \mathrm{Purity}_{T_2}(X|\sigma_{\mathrm{purity}}(Y)), \qquad (7)$$
$$H(X|Y) \approx H_{T_2}(X|\sigma_{\mathrm{entropy}}(Y)). \qquad (8)$$
In the case where $T_1 = T_2$ and |Y| = |X|, these measures are equivalent to observed estimates of the previous versions, and the effect of bias is minimal. By keeping the tuning and test sets different, one can see that as |Y| gets arbitrarily large, the purity and conditional entropy measures no longer tend towards their ideal values (one and zero respectively). This is the effect of using unbiased estimates.

Note that, with the above formulation, one can easily do multiple train-test splits, and even perform N-way cross-validation to get, arguably, better estimates.
Also, note that for purity, the estimates can be re-phrased as sample-based sums:

$$\sigma_{\mathrm{purity}} = \operatorname*{argmax}_{\sigma} \sum_{(x,y) \in T_1} \mathbf{1}_{x = \sigma(y)}, \qquad (9)$$
$$\mathrm{Purity}(X|Y) \approx \frac{1}{|T_2|} \sum_{(x,y) \in T_2} \mathbf{1}_{x = \sigma_{\mathrm{purity}}(y)}. \qquad (10)$$

This form may be more intuitive to some people, but the formulation as in (5) and (7) brings out the correspondence with entropy.
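In sample-based form, the purity oracle of (9) reduces to a per-cluster majority vote on the tuning set. Below is a minimal sketch (our own code and naming, not the authors') that fits the mapping on $T_1$ and scores it on the disjoint test set $T_2$ as in (10):

```python
from collections import Counter, defaultdict

def fit_purity_oracle(tuning_pairs):
    """Eq. (9): map each cluster y to the ground-truth class x it most
    often co-occurs with in the tuning set T1 (a majority vote)."""
    votes = defaultdict(Counter)
    for x, y in tuning_pairs:
        votes[y][x] += 1
    return {y: counts.most_common(1)[0][0] for y, counts in votes.items()}

def test_purity(test_pairs, sigma):
    """Eq. (10): the fraction of test-set samples whose mapped cluster
    label sigma(y) equals the true class x. Clusters never seen in the
    tuning set have no mapping and count as errors."""
    hits = sum(1 for x, y in test_pairs if sigma.get(y) == x)
    return hits / len(test_pairs)

# With |Y| > |X|, several clusters may map to the same class; the unbiased
# estimate on the disjoint test set no longer tends to 1 as |Y| grows.
t1 = [("car", 0), ("car", 0), ("face", 1), ("face", 2), ("face", 2)]
t2 = [("car", 0), ("face", 1), ("car", 2)]
sigma = fit_purity_oracle(t1)            # {0: "car", 1: "face", 2: "face"}
print(test_purity(t2, sigma))            # 2/3
```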
2.3 Multi-Class Ground Truth

It is more realistic that each test image contains different classes of content. For instance, one image of a city scene might have labels including buildings, cars and trees. The MSRC2 image set, used in Sect. 6, is such a collection.

Now we could ascribe proportions of each image to the ground truth. Thus one image of a city scene might have proportions including 27% buildings, 10% cars, 22% trees and 41% other (or unknown) class. However, this is not generally considered a good measure of content, since background classes like sky or grass often cover larger parts of the image than the actual foreground objects. Because we do not know of any established means of assessing proportions in a meaningful manner to classes in an image, we only consider the existence or non-existence of each class. So the ground truth for an image is the subset of classes existing in the image.

In this case the purity measure is clearly useless, since it assumes each image belongs to one class. One can instead replace purity by accuracy, or go to the more general measures used in such cases where multiple classes are being assessed, precision and recall. In our case, precision shall be taken to mean the proportion of predictions we make that are accurate, and recall shall be the proportion of true classes that we predict. In evaluation, we use their harmonic mean, called F1, given by

$$F1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.$$
For a given image, let S denote the subset of X giving classes that occur in the image (and thus are positive). Likewise, let S′ denote the subset of σ(Y) giving classes that have been predicted to occur in the image. Then for one image and a particular prediction function σ(), the precision is $\frac{|S \cap S'|}{|S'|}$ and the recall is $\frac{|S \cap S'|}{|S|}$. We wish to accumulate those over the full test set, so the quantities we estimate empirically are instead gathered using a mean. Thus

$$\mathrm{precision}_\sigma = \frac{E(|S \cap S'|)}{E(|S'|)}, \qquad \mathrm{recall}_\sigma = \frac{E(|S \cap S'|)}{E(|S|)}, \qquad (11)$$

where E() is the expected value function, approximated as $E_T()$ by taking the mean of the quantity over a data set T.
Samples now take the form of a subset of classes (denoting the true classes for an image) together with either the best component or the probability scores for the components. Thus a sample is (S, y) where S ⊆ X and y ∈ Y, or, in the case of methods outputting a vector of component scores, (S, q) where S ⊆ X and $q \in \mathbb{R}^Y$. The prediction function σ on Y used previously maps to the space σ(Y) = X ∪ {void}, which now includes a null class. We need the null class since some components may not correspond to true classes but to some “other”. Thus each component either maps to a known class, or it can map to nothing at all. We arbitrarily extend σ() to the score case $q \in \mathbb{R}^Y$ by defining

$$\sigma(q) = \Big\{ x : \sum_{y : \sigma(y) = x} q_y > 0.01 \Big\}. \qquad (12)$$
In this case, the predicted classes S′ are given by σ(q).

The F1 score for the prediction function σ() estimated on the data set T is the harmonic mean of precision and recall given earlier, which simplifies to

$$F1_T^\sigma(X|Y) = \frac{2\,E_T(|S \cap S'|)}{E_T(|S| + |S'|)}. \qquad (13)$$
An unbiased estimate for F1, using the same formulation as for entropy, is then

$$\sigma_{F1} = \operatorname*{argmax}_{\sigma} F1_{T_1}^{\sigma}(X|Y), \qquad (14)$$
$$F1(X|Y) \approx F1_{T_2}^{\sigma_{F1}}(X|Y). \qquad (15)$$
Notice that an unbiased estimate for accuracy also follows the model of purity given earlier:

$$\sigma_{\mathrm{accuracy}} = \operatorname*{argmin}_{\sigma} \sum_{(S,y) \in T_1} \frac{\mathrm{hamming}(S, \sigma(y))}{|X|}, \qquad (16)$$
$$\mathrm{Accuracy}(X|Y) \approx 1 - \frac{1}{|T_2|} \sum_{(S,y) \in T_2} \frac{\mathrm{hamming}(S, \sigma_{\mathrm{accuracy}}(y))}{|X|}. \qquad (17)$$

Here, hamming() denotes the Hamming distance between the indicator vectors of the true and predicted class subsets; since a smaller distance means better agreement, the oracle minimizes it, and accuracy is one minus the mean normalized distance.
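To fix ideas, here is a small sketch (our own code, not the authors') of the multi-label evaluation: predictions per (12) from a score vector, the F1 of (13), and the Hamming-based accuracy of (17), with samples given as (true class set, predicted class set) pairs:

```python
def predict_classes(q, sigma, threshold=0.01):
    """Eq. (12): class x is predicted if the summed score of all components
    mapped to x exceeds the threshold; components mapped to the void class
    (represented by None here) contribute to no prediction."""
    totals = {}
    for y, q_y in enumerate(q):
        x = sigma.get(y)                     # None plays the role of 'void'
        if x is not None:
            totals[x] = totals.get(x, 0.0) + q_y
    return {x for x, total in totals.items() if total > threshold}

def multilabel_f1(samples):
    """Eq. (13): F1 = 2 E(|S ∩ S'|) / E(|S| + |S'|); the means over the
    test set cancel, leaving a ratio of sums."""
    inter = sum(len(s & s_pred) for s, s_pred in samples)
    total = sum(len(s) + len(s_pred) for s, s_pred in samples)
    return 2.0 * inter / total if total else 0.0

def multilabel_accuracy(samples, num_classes):
    """Eq. (17): one minus the mean normalized Hamming distance; for sets,
    the symmetric difference size is the Hamming distance between indicators."""
    errors = sum(len(s ^ s_pred) for s, s_pred in samples)
    return 1.0 - errors / (num_classes * len(samples))

# Example: three components mapped onto classes {building, car} plus void.
sigma = {0: "building", 1: "car", 2: None}
s_pred = predict_classes([0.6, 0.005, 0.395], sigma)   # {"building"}
samples = [({"building", "car"}, s_pred)]
print(multilabel_f1(samples))                          # 2*1/(2+1) ≈ 0.667
print(multilabel_accuracy(samples, num_classes=20))    # 1 - 1/20 = 0.95
```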