2015 8th International Congress on Image and Signal Processing (CISP 2015)
Robustness Comparison of Clustering-Based vs.
Non-Clustering Multi-label Classifications for Image
and Video Annotations
Gulisong Nasierding
Yong Li
School of Computer Science and Technology
Xinjiang Normal University
No. 102 Xin Yi Rd, Urumqi, China 830001
gulnas9@gmail.com, liyong@live.com
Atul Sajjanhar
School of Information Technology
Deakin University
221 Burwood HWY, Burwood, VIC 3125, Australia
atul.sajjanhar@deakin.edu.au
Abstract—This paper reports a robustness comparison of clustering-based multi-label classification methods against their non-clustering counterparts for multi-concept image and video annotation. In the experimental setting of this paper, we adopted six popular multi-label classification algorithms, two different base classifiers for the problem-transformation-based multi-label classifications, and three different clustering algorithms for pre-clustering the training data. We conducted the experimental evaluation on two multi-label benchmark datasets: the scene image data and the Mediamill video data. We also employed two multi-label classification evaluation metrics, namely micro F1-measure and Hamming loss, to report the predictive performance of the classifications. The results reveal that different base classifiers and clustering methods contribute differently to the performance of the multi-label classifications. Overall, the pre-clustering methods improve the effectiveness of multi-label classification in certain experimental settings. This provides vital information to users when deciding which multi-label classification method to choose for multi-concept image and video annotation.
Keywords - multi-concept; image and video annotation; clustering-based; multi-label classification; robustness comparison.
I. INTRODUCTION
Automatic image or video annotation refers to the automatic assignment of a set of semantic keywords to unlabeled images or video clips, where the keywords convey the meaning of the image or video content [1-10]. Multi-concept image and video annotation has become popular in line with the growing number of people who rely on online resources for autonomous learning and education. During the learning process, learners have to search and retrieve multimedia information, such as images and videos, from massive digital libraries. Because representing query images or video clips with abstract features is challenging, retrieval based on annotation keywords becomes a more practical option [1, 8]. However, such retrieval requires effective methods for automatically annotating unlabeled images and videos in the training phase, and for using the annotation keywords to search for and retrieve the expected images or videos. This paper therefore pursues fundamental research on effective methods for automatic image and video annotation.
Automatically annotating an image or video is a challenging task when a single image or video clip is associated with multiple semantic concepts. Such problems are tackled by various methods, including multi-label classification (MLC) [7-10]. Research findings show that the clustering-based multi-label classification (CBMLC) framework [10] is effective for a variety of multi-label classification problems.
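To make the idea concrete, the sketch below illustrates the general CBMLC strategy of pre-clustering the training data, training one multi-label classifier per cluster, and routing each test instance to the model of its nearest cluster. The use of k-means and a Binary Relevance (one-vs-rest) learner here is an illustrative assumption, not the specific configuration evaluated in this paper.

```python
# Minimal sketch of the CBMLC idea: pre-cluster the training data, train one
# multi-label classifier per cluster, and predict with the model of the
# nearest cluster.  KMeans and Binary Relevance (one-vs-rest logistic
# regression) are stand-ins chosen for illustration only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


class CBMLCSketch:
    def __init__(self, n_clusters=3):
        self.clusterer = KMeans(n_clusters=n_clusters, random_state=0)
        self.models = {}

    def fit(self, X, Y):
        # X: (n_samples, n_features) array, Y: binary label-indicator matrix.
        cluster_ids = self.clusterer.fit_predict(X)
        for c in np.unique(cluster_ids):
            mask = cluster_ids == c
            # One multi-label classifier (Binary Relevance) per cluster.
            model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
            self.models[c] = model.fit(X[mask], Y[mask])
        return self

    def predict(self, X):
        # Route each test instance to the model of its nearest cluster.
        cluster_ids = self.clusterer.predict(X)
        return np.vstack([self.models[c].predict(x.reshape(1, -1))
                          for c, x in zip(cluster_ids, X)])
```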
The rest of the paper is organized as follows: Section II provides an overview of automatic image and video annotation approaches and of relevant MLC approaches. Section III introduces the experimental setting, including the experimental setup, the evaluation datasets, and the MLC evaluation metrics. Section IV presents the experimental results and discussion. Section V concludes the paper.
II. OVERVIEW
A. Image Annotation Approaches
The process of image annotation involves a number of stages, including pre-processing, annotation, and post-processing [1]. In a typical automatic image annotation (AIA) system, the pre-processing stage usually involves image segmentation and feature extraction. Images are first segmented into sub-structural segments (regions or blobs), and useful features are then extracted from each region. However, the segmentation step can be skipped when global features are used to represent images. The major task of the annotation stage is to predict semantic concepts for the visual content of an image. Annotation methods are based on statistical models, on classification, or on a combination of both approaches [1-9].
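As a simplified illustration of such a classification-based AIA pipeline, the sketch below skips segmentation, represents each image by a global color histogram, and trains a multi-label classifier over the extracted features. The feature choice and the k-nearest-neighbour base learner are assumptions made only for this example, not the features or classifiers used in this paper.

```python
# Illustrative AIA pipeline that skips segmentation and uses a global color
# histogram as the image representation; feature and classifier choices are
# assumptions for the example only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multioutput import MultiOutputClassifier


def global_color_histogram(image, bins=8):
    """image: H x W x 3 uint8 array -> concatenated per-channel histogram."""
    feats = [np.histogram(image[..., ch], bins=bins, range=(0, 255),
                          density=True)[0] for ch in range(3)]
    return np.concatenate(feats)


def train_annotator(train_images, Y_train):
    # Y_train: binary concept-indicator matrix (one column per concept).
    X = np.vstack([global_color_histogram(im) for im in train_images])
    clf = MultiOutputClassifier(KNeighborsClassifier(n_neighbors=5))
    return clf.fit(X, Y_train)


def annotate(model, image, concept_names):
    # Predict the concept-indicator vector and return the active keywords.
    y = model.predict(global_color_histogram(image).reshape(1, -1))[0]
    return [name for name, flag in zip(concept_names, y) if flag]
```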
This work was supported by the National Natural Science Foundation of China (NSFC, Project No. 61262065).