This article presents a Support Vector Machine (SVM)-like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems, but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach based on a large-margin ranking system that shares many properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
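The abstract does not spell out the ranking formulation, but the underlying idea, scoring relevant labels above irrelevant ones, can be illustrated with the standard multi-label ranking loss: the fraction of (relevant, irrelevant) label pairs that are ordered incorrectly. The function below is an illustrative sketch of that evaluation criterion, not the paper's actual learning system.

```python
import numpy as np

def ranking_loss(scores, relevant):
    """Fraction of (relevant, irrelevant) label pairs ranked incorrectly.

    scores   : 1-D array of real-valued label scores for one instance
    relevant : 1-D boolean array, True for labels that apply to the instance
    """
    pos = scores[relevant]            # scores of relevant labels
    neg = scores[~relevant]           # scores of irrelevant labels
    if len(pos) == 0 or len(neg) == 0:
        return 0.0                    # loss is undefined; 0 by convention here
    # count pairs where an irrelevant label scores at least as high
    # as a relevant one (ties counted as errors)
    bad = sum(1 for p in pos for n in neg if p <= n)
    return bad / (len(pos) * len(neg))
```

A perfect ranker puts every relevant label above every irrelevant one and scores 0; a fully inverted ranking scores 1.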
Most current work on classification has focused on learning from a set of instances that are each associated with a single label (i.e., single-label classification). However, many applications, such as gene functional prediction and text categorization, may allow instances to be associated with multiple labels simultaneously. Multi-label classification is a generalization of single-label classification, and its generality makes it much more difficult to solve.
Despite its importance, research on multi-label classification is still lacking. Common approaches simply learn independent binary classifiers for each label and do not exploit dependencies among labels. Also, several small disjuncts may appear due to the possibly large number of label combinations, and neglecting these small disjuncts may degrade classification accuracy. In this paper we propose a multi-label lazy associative classifier, which progressively exploits dependencies among labels. Further, since in our lazy strategy the classification model is induced in an instance-based fashion, the proposed approach can provide better coverage of small disjuncts. Gains of up to 24% are observed when the proposed approach is compared against state-of-the-art multi-label classifiers.
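The binary-decomposition baseline criticized above is simple to state: train one independent binary classifier per label and predict each label separately. A minimal sketch, using a nearest-centroid rule as a stand-in for an arbitrary binary learner (the classifier choice is illustrative, not taken from the paper):

```python
import numpy as np

def train_binary_relevance(X, Y):
    """One independent binary model per label (here: a pair of class centroids).

    X : (n_samples, n_features) feature matrix
    Y : (n_samples, n_labels) binary label matrix
    Returns a (positive_centroid, negative_centroid) pair per label.
    """
    models = []
    for j in range(Y.shape[1]):
        pos = X[Y[:, j] == 1].mean(axis=0)   # centroid of instances with label j
        neg = X[Y[:, j] == 0].mean(axis=0)   # centroid of instances without it
        models.append((pos, neg))
    return models

def predict_binary_relevance(models, x):
    """Predict each label independently: 1 if x is closer to its positive centroid."""
    return np.array([
        1 if np.linalg.norm(x - pos) < np.linalg.norm(x - neg) else 0
        for pos, neg in models
    ])
```

Because each label is handled in isolation, no dependency between labels can be captured, which is exactly the weakness the lazy associative approach targets.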
Multi-label problems arise in various domains such as multi-topic document categorization, pro-
tein function prediction, and automatic image annotation. One natural way to deal with such
problems is to construct a binary classifier for each label, resulting in a set of independent bi-
nary classification problems. Since multiple labels share the same input space, and the seman-
tics conveyed by different labels are usually correlated, it is essential to exploit the correlation
information contained in different labels. In this paper, we consider a general framework for ex-
tracting shared structures in multi-label classification. In this framework, a common subspace is
assumed to be shared among multiple labels. We show that the optimal solution to the proposed
formulation can be obtained by solving a generalized eigenvalue problem, though the problem is
nonconvex. For high-dimensional problems, direct computation of the solution is expensive, and
we develop an efficient algorithm for this case. One appealing feature of the proposed frame-
work is that it includes several well-known algorithms as special cases, thus elucidating their
intrinsic relationships. We further show that the proposed framework can be extended to the
kernel-induced feature space. We have conducted extensive experiments on multi-topic web page
categorization and automatic gene expression pattern image annotation tasks, and results demon-
strate the effectiveness of the proposed formulation in comparison with several representative
algorithms.
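The paper's formulation is not reproduced in the abstract, but its computational core, a symmetric generalized eigenvalue problem A v = λ B v, can be reduced to a standard symmetric eigenproblem by whitening with a Cholesky factor of B. The sketch below shows that reduction on arbitrary stand-in matrices; it is not the paper's specific formulation.

```python
import numpy as np

def generalized_eigh(A, B):
    """Solve the symmetric generalized eigenproblem A v = lambda B v.

    A : symmetric matrix; B : symmetric positive-definite matrix.
    Whitens with the Cholesky factor of B, solves a standard symmetric
    eigenproblem, then maps the eigenvectors back.
    """
    L = np.linalg.cholesky(B)        # B = L L^T
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T            # standard symmetric problem C u = lambda u
    eigvals, U = np.linalg.eigh(C)
    V = Linv.T @ U                   # back-transform: v = L^{-T} u
    return eigvals, V

# sanity check on a small random symmetric pair
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M + M.T                          # symmetric
B = M @ M.T + 4 * np.eye(4)          # symmetric positive definite
w, V = generalized_eigh(A, B)
assert np.allclose(A @ V, B @ V * w) # A v_i = w_i B v_i for each column
```

For high-dimensional problems this direct O(d^3) reduction is exactly the expensive step the abstract refers to, motivating the paper's more efficient algorithm.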
This volume contains research papers accepted for presentation at the 1st International
Workshop on Learning from Multi-Label Data (MLD’09), which will be held in Bled, Slovenia, on September 7, 2009, in conjunction with ECML/PKDD 2009.
MLD’09 is devoted to multi-label learning, an emerging and promising research topic in machine learning. In multi-label learning, each example is associated with multiple labels simultaneously, so traditional (single-label) supervised learning is encompassed as a special case. Multi-label learning is related to various machine learning paradigms, such as classification, ranking, semi-supervised learning, active learning, multi-instance learning, and dimensionality reduction.
Initial attempts at multi-label learning date back to 1999, with work on multi-label text categorization. In recent years, the task of learning from multi-label data has been addressed by a number of methods adapted from various popular learning techniques, such as neural networks, decision trees, k-nearest neighbors, kernel methods, and ensemble methods. More impressively, multi-label learning has manifested its effectiveness in a diversity of real-world applications, such as image/video annotation, bioinformatics, web search and mining, music categorization, collaborative tagging, and directed marketing.
Multi-label learning aims at predicting potentially multiple
labels for a given instance. Conventional multi-label learning approaches
focus on exploiting the label correlations to improve the accuracy of
the learner by building an individual multi-label learner or a combined
learner based upon a group of single-label learners. However, the generalization ability of such an individual learner can be weak. It is well known that ensemble learning can effectively improve the generalization ability of learning systems by constructing multiple base learners, and the performance of an ensemble is related to both the accuracy and diversity of its base learners. In this paper, we study the problem of multi-label
ensemble learning. Specifically, we aim at improving the generalization
ability of multi-label learning systems by constructing a group of multi-
label base learners which are both accurate and diverse. We propose
a novel solution, called EnML, to effectively augment the accuracy as
well as the diversity of multi-label base learners. In detail, we design
two objective functions to evaluate the accuracy and diversity of multi-
label base learners, respectively, and EnML simultaneously optimizes
these two objectives with an evolutionary multi-objective optimization
method. Experiments on real-world multi-label learning tasks validate
the effectiveness of our approach against other well-established methods.
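EnML's exact objective functions are not given in the abstract. As an illustration of the two ingredients it optimizes, the sketch below scores an ensemble of binary prediction matrices with two hypothetical objectives: accuracy as the mean label-wise agreement with the ground truth (one minus Hamming loss, averaged over base learners), and diversity as the mean pairwise disagreement between base learners. Both definitions are assumptions for illustration, not the paper's.

```python
import numpy as np

def accuracy_objective(preds, Y):
    """Mean (1 - Hamming loss) over base learners.

    preds : list of (n_samples, n_labels) binary prediction matrices,
            one per base learner
    Y     : (n_samples, n_labels) binary ground-truth label matrix
    """
    return float(np.mean([(P == Y).mean() for P in preds]))

def diversity_objective(preds):
    """Mean pairwise disagreement rate between base learners."""
    k = len(preds)
    pairs = [(preds[i] != preds[j]).mean()
             for i in range(k) for j in range(i + 1, k)]
    return float(np.mean(pairs))
```

A multi-objective optimizer would then search for ensembles that score highly on both functions at once, since maximizing either alone degrades the other.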
This chapter reviews past and recent work on the rapidly evolving research area
of multi-label data mining. Section 2 defines the two major tasks in learning from
multi-label data and presents a significant number of learning methods. Section 3
discusses dimensionality reduction methods for multi-label data. Sections 4 and 5
discuss two important research challenges, which, if successfully met, can significantly expand the real-world applications of multi-label learning methods: a) exploiting label structure and b) scaling up to domains with a large number of labels.
Section 6 introduces benchmark multi-label datasets and their statistics, while Sec-
tion 7 presents the most frequently used evaluation measures for multi-label learn-