these AdaBoost. Still, some difficulties remain. When
Decision Trees are used as component classifiers, what
is a suitable tree size? When Radial Basis
Function (RBF) Neural Networks are used as component
classifiers, how could the complexity be controlled to avoid
overfitting? Moreover, how should we decide on the optimal
number of centers and the widths of the RBFs? All of these
have to be carefully tuned for AdaBoost to achieve good
performance. Furthermore, diversity is known to be an
important factor which affects the generalization perfor-
mance of Ensemble classifiers (Melville and Mooney, 2005;
Kuncheva and Whitaker, 2003). Several methods have been
proposed to quantify diversity (Kuncheva and Whitaker,
2003; Windeatt, 2005). It is also known that there is
an accuracy/diversity dilemma in AdaBoost (Dietterich,
2000): the more accurate two component classifiers
become, the less they can disagree with each other. Only
when accuracy and diversity are well balanced can
AdaBoost demonstrate excellent
generalization performance. However, the existing Ada-
Boost algorithms do not explicitly take sufficient measures
to deal with this problem.
The Support Vector Machine (SVM) (Vapnik, 1998) is
developed from the theory of Structural Risk Minimiza-
tion. By using a kernel trick to map the training samples
from an input space to a high-dimensional feature space,
SVM finds an optimal separating hyperplane in the feature
space and uses a regularization parameter, C, to control its
model complexity and training error. One of the popular
kernels used in SVM is the RBF kernel, which has a
parameter known as the Gaussian width, σ. In contrast to
RBF networks, SVM with the RBF kernel (RBFSVM in
short) can automatically determine the number and
location of the centers and the weight values (Scholkopf
et al., 1997). Also, it can effectively avoid overfitting by
selecting proper values of C and σ. From the performance
analysis of RBFSVM (Valentini and Dietterich, 2004), we
know that σ is a more important parameter than C:
although RBFSVM cannot learn well when a very low
value of C is used, its performance largely depends on the σ
value if a roughly suitable C is given. This means that, over
a range of suitable C, the performance of RBFSVM can be
changed by simply adjusting the value of σ.
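As a rough illustration of this point (a minimal sketch of our own, not the experiment of Valentini and Dietterich (2004); it assumes scikit-learn, whose RBF kernel parameter relates to the Gaussian width by gamma = 1/(2σ²)), one can hold a roughly suitable C fixed and sweep σ:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic binary problem, used only to make the sketch self-contained.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

C = 10.0  # a "roughly suitable" C, held fixed
for sigma in (0.1, 1.0, 10.0, 100.0):
    gamma = 1.0 / (2.0 * sigma ** 2)  # convert Gaussian width to sklearn's gamma
    acc = cross_val_score(SVC(C=C, kernel="rbf", gamma=gamma), X, y, cv=5).mean()
    print(f"sigma = {sigma:6.1f}   CV accuracy = {acc:.3f}")
```

Varying σ in this way typically moves the cross-validated accuracy across a wide range, whereas changes to C within a suitable band have a far smaller effect.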
Therefore, in this paper, we try to answer the following
questions: Can the SVM be used as an effective component
classifier in AdaBoost? If yes, what will be the general-
ization performance of this AdaBoost? Will this AdaBoost
show some advantages over the existing ones, especially on
the aforementioned problems? Furthermore, compared
with an individual SVM, what is the benefit of using
AdaBoost as a combination of multiple SVMs? In this
paper, RBFSVM is adopted as the component classifier for
AdaBoost. As mentioned above, the parameter σ in
RBFSVM has to be set beforehand. An intuitive way
is to simply apply a single σ to all RBFSVM component
classifiers. However, we observed that this approach cannot
lead to a successful AdaBoost, owing to the over-weak or
over-strong RBFSVM component classifiers encountered
during the Boosting process. Although a single best σ may
exist, we find that AdaBoost with this single best σ,
obtained by cross-validation, does not achieve the best
generalization performance, and the cross-validation itself
adds computational load. Therefore, using a single σ for all
RBFSVM component classifiers should be avoided if
possible.
The following fact opens the door to avoiding the search
for a single best σ while helping AdaBoost achieve even
better generalization performance: the classification
performance of RBFSVM can be conveniently changed by
adjusting the kernel parameter, σ.
Motivated by this, the proposed AdaBoostSVM approach
adaptively adjusts the σ values of the RBFSVM component
classifiers to obtain a set of moderately accurate
RBFSVMs for AdaBoost. As will be shown later, this
gives rise to a better SVM-based AdaBoost. Compared
with the existing AdaBoost approaches with Neural
Networks or Decision Tree component classifiers, our
proposed AdaBoostSVM can achieve better generalization
performance and it can be seen as a proof of concept of the
idea suggested by Valentini and Dietterich (2004) that
AdaBoost with heterogeneous SVMs could work well.
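To make this adaptive adjustment concrete, the sketch below is our own illustration of the idea (not the paper's reference implementation; the function name and the σ-update schedule, here a simple division by a step factor, are assumptions): start from a large σ, and whenever the current RBFSVM is over-weak (weighted error above 0.5), sharpen the kernel and retrain.

```python
import numpy as np
from sklearn.svm import SVC

def adaboost_svm(X, y, T=20, C=10.0, sigma_ini=10.0, sigma_min=0.1, sigma_step=2.0):
    """Labels y must be in {-1, +1}. Returns component SVMs and their weights."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                 # uniform initial sample weights
    sigma, clfs, alphas = sigma_ini, [], []
    while len(clfs) < T and sigma > sigma_min:
        clf = SVC(C=C, kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2))
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        err = w[pred != y].sum()            # weighted training error
        if err > 0.5:                       # over-weak component: decrease sigma
            sigma /= sigma_step
            continue
        err = max(err, 1e-10)               # guard against a zero-error component
        alpha = 0.5 * np.log((1.0 - err) / err)
        w *= np.exp(-alpha * y * pred)      # standard AdaBoost re-weighting
        w /= w.sum()
        clfs.append(clf)
        alphas.append(alpha)
    return clfs, alphas
```

The final ensemble predicts sign(Σ_t α_t h_t(x)), as in standard AdaBoost.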
Furthermore, compared with an individual SVM, Ada-
BoostSVM can achieve much better generalization perfor-
mance on imbalanced data sets. We argue that in
AdaBoostSVM, the Boosting mechanism forces some
RBFSVM component classifiers to focus on the misclassi-
fied samples from the minority class, and this can prevent
the minority class from being treated as noise within the
dominant class and thus wrongly classified. This also
justifies, from another perspective, the significance of
exploring AdaBoost with SVM component classifiers.
Furthermore, since AdaBoostSVM provides a conveni-
ent way to control the classification accuracy of each
RBFSVM component classifier by simply adjusting the σ
value, it also provides an opportunity to deal with the well-
known accuracy/diversity dilemma in Boosting methods.
This is a happy "discovery" made during the investigation
of AdaBoost with RBFSVM-based component classifiers.
Through some parameter adjusting strategies, we can tune
the distributions of accuracy and diversity over these
component classifiers to achieve a good balance. We also
propose an improved version of AdaBoostSVM called
Diverse AdaBoostSVM in this paper. It is observed that,
benefiting from the balance between accuracy and diver-
sity, it can give better generalization performance than
AdaBoostSVM.
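As a concrete handle on the diversity side of this balance, the plain pairwise disagreement measure of Kuncheva and Whitaker (2003) can be computed as in the sketch below (our own illustration; the function name and the layout of preds, holding each component classifier's predictions on a common sample set, are assumptions):

```python
from itertools import combinations
import numpy as np

def disagreement(preds):
    """preds: array of shape (n_classifiers, n_samples), entries in {-1, +1}."""
    preds = np.asarray(preds)
    pairs = combinations(range(len(preds)), 2)
    # Average, over all classifier pairs, the fraction of samples they disagree on.
    return float(np.mean([np.mean(preds[i] != preds[j]) for i, j in pairs]))
```

Higher values indicate that the component classifiers err on different samples, which is the kind of diversity an ensemble benefits from.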
2. Background
2.1. AdaBoost
Given a set of training samples, AdaBoost (Schapire and
Singer, 1999) maintains a weight distribution, W, over
these samples. This distribution is initially set to be uniform.
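Concretely, for N training samples the initial distribution is W_1(i) = 1/N, i = 1, ..., N, so that every sample contributes equally in the first Boosting round.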