Semantic Annotation of High-Resolution Remote
Sensing Images via Gaussian Process
Multi-Instance Multilabel Learning
Keming Chen, Ping Jian, Zhixin Zhou, Jian’en Guo, and Daobing Zhang
Abstract—This letter presents a hierarchical semantic multi-
instance multilabel learning (MIML) framework for high-
resolution (HR) remote sensing image annotation via Gaussian
process (GP). The proposed framework can not only represent
the ambiguities between image contents and semantic labels but
also model the hierarchical semantic relationships contained in
HR remote sensing images. Moreover, it can flexibly incorporate prior knowledge about HR images into the GP framework, which in turn gives a quantitative interpretation of the MIML prediction problem. Experiments carried out on a real HR remote sensing image
data set validate that the proposed approach compares favorably
to the state-of-the-art MIML methods.
Index Terms—Annotation, Gaussian process (GP), hierarchical
semantic, high resolution (HR), multi-instance multilabel learning
(MIML).
I. INTRODUCTION
Semantic annotation of remote sensing images is the task of assigning one or several predefined semantic concepts to a remote sensing image through a learning step in which examples of each concept are provided by the user. It is the basis of remote sensing image indexing for organizing and locating images of interest in a large database. With satellite sensors delivering as much as several terabytes of image data every day, the sheer data volume makes direct access to these images increasingly difficult. Annotating them manually requires considerable human effort, which is expensive and impractical. Therefore, it is urgent to develop techniques that can effectively and efficiently annotate images through their information content at the semantic level.
Manuscript received September 4, 2012; revised November 4, 2012; accepted November 28, 2012. Date of publication March 7, 2013; date of current
version October 10, 2013. K. Chen was supported by the National Key Basic
Research and Development Program of China under Grant 2010CB327906.
P. Jian was supported in part by the Open Project Program of the National
Laboratory of Pattern Recognition and in part by the National Natural Science
Foundation of China under Grant 61202244.
K. Chen is with the Key Laboratory of Technology in GeoSpatial
Information Processing and Application System, Institute of Electronics,
Chinese Academy of Sciences, Beijing 100190, China, and also with
the Beijing Institute of Remote Sensing, Beijing 100192, China (e-mail:
kmchen.ie@gmail.com).
P. Jian is with the Department of Computer Science, Beijing Institute of
Technology, Beijing 100081, China (e-mail: pingjian0121@gmail.com).
Z. Zhou and J. Guo are with the Beijing Institute of Remote Sensing, Beijing
100192, China (e-mail: zhixin.zhou@ia.ac.cn; jianen.guo@gmail.com).
D. Zhang is with the Key Laboratory of Technology in GeoSpatial Information Processing and Application System, Institute of Electronics, Chinese
Academy of Sciences, Beijing 100190, China (e-mail: db.zhang@ie.ac.cn).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LGRS.2012.2237502
Remote sensing image annotation can be regarded as a typical multiclass image classification problem with a number of predefined semantic classes [1]–[3]. However, with the recent advances in satellite sensing technology (e.g., Ikonos, QuickBird, GeoEye-1, and WorldView-2), the abundant geometrical details in high-resolution (HR) scenes completely change the perspective of traditional remote sensing image annotation. It is no longer accurate to regard an HR image as a single indivisible entity with only one semantic label. In fact, with the increasingly rich information contained in an HR image, the image typically carries multiple semantic meanings, and these labels arise from image segments rather than from the whole image.
The multi-instance multilabel learning (MIML) framework [4] provides a general paradigm for multilabel classification. In MIML, images are represented as bags, and instances correspond to segmented regions in the images. A bag, which is composed of a set of instances, is labeled negative if and only if all of its instances are negative; it is labeled positive if at least one of its instances is positive.
This formulation is extremely useful in HR image annotation
tasks for which image semantic labels are freely available or
cheaply obtained, but the target concept is represented by only
a few segmented regions. Zhou and Zhang [4] proposed two MIML algorithms, multi-instance multilabel boosting (MIML-BOOST) and the multi-instance multilabel support vector machine (MIML-SVM), and applied them to natural scene classification. In [5] and [6], Gaussian processes (GPs) were adopted for MIML. These methods incorporate bag class likelihood models into the GP framework and yield a nonparametric probabilistic interpretation of the MIML prediction.
In [7], a multiple-instance regression model was investigated
and applied to the remote sensing field.
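To make the bag/instance formulation above concrete, the following minimal Python sketch illustrates how an image can be represented as a bag of region-level instances and how a bag-level label for one concept follows from the instance labels (positive if at least one instance is positive, negative only if all instances are negative). This is an illustrative sketch rather than the authors' implementation; the names Bag and bag_label_from_instances, as well as the feature values, are hypothetical.

```python
# Sketch of the MIML data representation described above (illustrative only).
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Bag:
    """One image: a set of instances (segmented regions) and its bag-level labels."""
    instances: List[Sequence[float]]   # one feature vector per segmented region
    labels: Dict[str, bool]            # bag-level label for each semantic concept


def bag_label_from_instances(instance_labels: Sequence[bool]) -> bool:
    """MIML/MIL rule: a bag is positive iff at least one of its instances is positive."""
    return any(instance_labels)


# Toy example: an HR image segmented into three regions and annotated with
# two concepts ("building", "farmland"); the feature values are placeholders.
image = Bag(
    instances=[[0.12, 0.80], [0.55, 0.10], [0.33, 0.42]],
    labels={"building": True, "farmland": False},
)

# Suppose a per-instance classifier predicted these labels for the concept "building".
predicted_instance_labels = [True, False, False]
print(bag_label_from_instances(predicted_instance_labels))  # -> True (bag is positive)
```

In a full MIML setting, this aggregation is applied per concept, so each bag carries a vector of labels rather than a single one.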
Although MIML algorithms are quite effective for a variety
of situations, most approaches neglect the hierarchical semantic
correlation among the instances in the bag to some degree.
Additionally, MIML treats all bags identically and does not distinguish which instance is dominant. A remote sensing image usually covers a large region of the earth's surface, whereas the object of interest occupies only a small patch. Thus, the interpretation of an HR remote sensing image is dominantly determined by image segments and is tightly related to the spatial scale. For example, at a coarser resolution scale, we can annotate an HR image as “urban” and “country,” while at a finer scale we can separate a “building” in the city from a “road.” Moreover, these semantic
regions usually have strictly hierarchical relationships. Taking
“farmland” for instance, it will never be subordinate to “sea” or