Note that the conditions in Theorem 7 are only necessary but
not sufficient. The rules stated above can be useful tools for
practitioners both for checking whether a kernel is an admissible
SV kernel and for actually constructing new kernels. The general
case is given by the following theorem.
Theorem 8 (Schoenberg 1942). A kernel of dot-product type $k(x, x') = k(\langle x, x' \rangle)$ defined on an infinite dimensional Hilbert space, with a power series expansion
$$k(t) = \sum_{n=0}^{\infty} a_n t^n \qquad (27)$$
is admissible if and only if all $a_n \ge 0$.
A slightly weaker condition applies for finite dimensional spaces. For further details see Berg, Christensen and Ressel (1984) and Smola, Óvári and Williamson (2001).
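As a quick illustration of Theorem 8 (a sketch, not part of the original argument), one can read off the series coefficients of a candidate $k(t)$ symbolically; the choice $k(t) = e^t$, i.e. $k(x, x') = \exp(\langle x, x' \rangle)$, is used here purely as an example with all $a_n = 1/n! \ge 0$.

# Sketch: inspect the power-series coefficients a_n of a dot-product kernel k(t);
# by Theorem 8 the kernel is admissible iff all a_n >= 0.
import sympy as sp

t = sp.symbols('t')
k = sp.exp(t)    # k(t) = e^t, i.e. k(x, x') = exp(<x, x'>), chosen for illustration
coeffs = [sp.diff(k, t, n).subs(t, 0) / sp.factorial(n) for n in range(6)]
print(coeffs)    # [1, 1, 1/2, 1/6, 1/24, 1/120]: all nonnegative, hence admissible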
2.4. Examples
In Schölkopf, Smola and Müller (1998b) it has been shown, by explicitly computing the mapping, that homogeneous polynomial kernels $k$ with $p \in \mathbb{N}$ and
$$k(x, x') = \langle x, x' \rangle^p \qquad (28)$$
are suitable SV kernels (cf. Poggio 1975).
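To make "explicitly computing the mapping" concrete, here is a minimal numerical sketch (not from the original text), assuming $p = 2$ and two-dimensional inputs: the map $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ reproduces (28) as an ordinary dot product.

# Sketch: for p = 2 and x in R^2, the feature map Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
# satisfies <Phi(x), Phi(x')> = <x, x'>^2, the homogeneous polynomial kernel (28).
import numpy as np

def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(x), phi(xp)))   # 1.0
print(np.dot(x, xp) ** 2)        # 1.0; the two values agree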
From this observation one can conclude immediately (Boser, Guyon and Vapnik 1992, Vapnik 1995) that kernels of the type
$$k(x, x') = (\langle x, x' \rangle + c)^p \qquad (29)$$
i.e. inhomogeneous polynomial kernels with $p \in \mathbb{N}$, $c \ge 0$, are admissible, too: rewrite $k$ as a sum of homogeneous kernels and apply Corollary 3.
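The decomposition argument can be verified numerically; a sketch, assuming the illustrative values $p = 3$ and $c = 2$: by the binomial theorem, $(\langle x, x' \rangle + c)^p = \sum_{q=0}^{p} \binom{p}{q} c^{p-q} \langle x, x' \rangle^q$, a sum of homogeneous kernels with nonnegative coefficients.

# Sketch: (<x,x'> + c)^p written as a nonnegative combination of homogeneous kernels
# <x,x'>^q, which is admissible by Corollary 3 (p = 3, c = 2 are illustrative values).
import numpy as np
from math import comb

p, c = 3, 2.0
x, xp = np.array([1.0, -2.0, 0.5]), np.array([0.3, 1.0, 2.0])
s = np.dot(x, xp)
lhs = (s + c) ** p
rhs = sum(comb(p, q) * c ** (p - q) * s ** q for q in range(p + 1))
print(lhs, rhs)   # identical up to rounding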
Another kernel that might seem appealing due to its resemblance to neural networks is the hyperbolic tangent kernel
$$k(x, x') = \tanh(\vartheta + \kappa \langle x, x' \rangle). \qquad (30)$$
By applying Theorem 8 one can check that this kernel does not actually satisfy Mercer's condition (Óvári 2000). Curiously, the kernel has been successfully used in practice; cf. Schölkopf (1997) for a discussion of the reasons.
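A sketch of how Theorem 8 applies here, with the illustrative choices $\vartheta = 1$, $\kappa = 2$: expanding $\tanh(\vartheta + \kappa t)$ around $t = 0$ produces a negative coefficient already at second order, so the admissibility condition of Theorem 8 fails.

# Sketch: the power series of k(t) = tanh(theta + kappa*t) contains negative
# coefficients, so by Theorem 8 the kernel (30) is not admissible
# (theta = 1, kappa = 2 are illustrative parameter values).
import sympy as sp

t = sp.symbols('t')
theta, kappa = 1, 2
k = sp.tanh(theta + kappa * t)
coeffs = [float(sp.diff(k, t, n).subs(t, 0) / sp.factorial(n)) for n in range(5)]
print(coeffs)   # the coefficient of t^2 is already negative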
Translation invariant kernels $k(x, x') = k(x - x')$ are quite widespread. It was shown in Aizerman, Braverman and Rozonoér (1964), Micchelli (1986) and Boser, Guyon and Vapnik (1992) that
$$k(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}} \qquad (31)$$
is an admissible SV kernel.
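As a numerical sanity check (not a proof) of the admissibility of (31), one can verify that the Gram matrix of the Gaussian kernel on random data has no eigenvalues below zero beyond round-off; $\sigma = 1$ and the sample size below are arbitrary choices.

# Sketch: the Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2*sigma^2)) of the Gaussian
# kernel (31) is positive semidefinite; here we only check this numerically on
# random data (sigma = 1.0 and n = 50 are illustrative choices).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
sigma = 1.0
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * sigma ** 2))
print(np.linalg.eigvalsh(K).min())   # nonnegative up to numerical round-off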
Moreover one can show (Smola 1996, Vapnik, Golowich and Smola 1997) that ($\mathbf{1}_X$ denotes the indicator function on the set $X$ and $\otimes$ the convolution operation)
$$k(x, x') = B_{2n+1}(x - x') \quad \text{with} \quad B_k := \bigotimes_{i=1}^{k} \mathbf{1}_{[-\frac{1}{2}, \frac{1}{2}]} \qquad (32)$$
B-splines of order $2n+1$, defined by the $(2n+1)$-fold convolution of the unit interval, are also admissible.
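A small sketch of the construction in (32), assuming $k = 3$ (i.e. $n = 1$) and an arbitrary grid spacing: $B_k$ is approximated by repeatedly convolving the sampled indicator of $[-\frac{1}{2}, \frac{1}{2}]$ with itself.

# Sketch: B_k as the k-fold convolution of the indicator of [-1/2, 1/2],
# approximated on a uniform grid; k = 3 gives B_3, i.e. B_{2n+1} with n = 1.
import numpy as np

h = 0.01                                  # grid spacing (illustrative)
grid = np.arange(-0.5, 0.5 + h, h)
box = np.ones_like(grid)                  # indicator of [-1/2, 1/2] sampled on the grid
bk = box.copy()
for _ in range(2):                        # two further convolutions -> three factors
    bk = np.convolve(bk, box) * h         # the factor h approximates the continuous integral
print(bk.max())                           # close to B_3(0) = 3/4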
We shall postpone further considerations to Section 7, where the connection to regularization operators will be pointed out in more detail.
3. Cost functions
So far the SV algorithm for regression may seem rather strange and hardly related to other existing methods of function estimation (e.g. Huber 1981, Stone 1985, Härdle 1990, Hastie and Tibshirani 1990, Wahba 1990). However, once cast into a more standard mathematical notation, we will observe the connections to previous work. For the sake of simplicity we will, again, only consider the linear case, as extensions to the nonlinear one are straightforward by using the kernel method described in the previous section.
3.1. The risk functional
Let us for a moment go back to the case of Section 1.2. There, we had some training data $X := \{(x_1, y_1), \ldots, (x_\ell, y_\ell)\} \subset \mathcal{X} \times \mathbb{R}$. We will assume now that this training set has been drawn iid (independent and identically distributed) from some probability distribution $P(x, y)$. Our goal will be to find a function $f$ minimizing the expected risk (cf. Vapnik 1982)
$$R[f] = \int c(x, y, f(x))\, dP(x, y) \qquad (33)$$
($c(x, y, f(x))$ denotes a cost function determining how we will penalize estimation errors) based on the empirical data $X$. Given that we do not know the distribution $P(x, y)$ we can only use $X$ for estimating a function $f$ that minimizes $R[f]$. A possible approximation consists in replacing the integration by the empirical estimate, to get the so-called empirical risk functional
$$R_{\mathrm{emp}}[f] := \frac{1}{\ell} \sum_{i=1}^{\ell} c(x_i, y_i, f(x_i)). \qquad (34)$$
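As a concrete instance of (34), here is a minimal sketch computing the empirical risk of a linear model $f(x) = \langle w, x \rangle + b$ under the $\varepsilon$-insensitive cost of Section 1; the data, $w$, $b$ and $\varepsilon$ below are illustrative values only.

# Sketch: empirical risk (34) of a linear model f(x) = <w, x> + b under the
# epsilon-insensitive cost c(x, y, f(x)) = max(0, |y - f(x)| - eps)
# (the toy data, w, b and eps are illustrative values only).
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 1.2, 1.8, 3.3])
w, b, eps = np.array([1.0]), 0.0, 0.1

f = X @ w + b
R_emp = np.mean(np.maximum(0.0, np.abs(y - f) - eps))
print(R_emp)   # average epsilon-insensitive loss over the sample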
A first attempt would be to find the empirical risk minimizer $f_0 := \operatorname{argmin}_{f \in H} R_{\mathrm{emp}}[f]$ for some function class $H$. However, if $H$ is very rich, i.e. its "capacity" is very high, as for instance when dealing with few data in very high-dimensional spaces, this may not be a good idea, as it will lead to overfitting and thus bad generalization properties. Hence one should add a capacity control term, in the SV case $\|w\|^2$, which leads to the regularized risk functional (Tikhonov and Arsenin 1977, Morozov 1984, Vapnik 1982)
$$R_{\mathrm{reg}}[f] := R_{\mathrm{emp}}[f] + \frac{\lambda}{2} \|w\|^2 \qquad (35)$$
where $\lambda > 0$ is a so-called regularization constant. Many algorithms like regularization networks (Girosi, Jones and Poggio 1993) or neural networks with weight decay (e.g. Bishop 1995) minimize an expression similar to (35).
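The regularized risk (35) then simply adds the capacity term $\frac{\lambda}{2}\|w\|^2$ to the empirical risk; the following self-contained sketch mirrors the one after (34), with an illustrative value of $\lambda$.

# Sketch: regularized risk (35) = R_emp[f] + (lambda/2)*||w||^2 for a linear model
# (all numerical values are illustrative only).
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 1.2, 1.8, 3.3])
w, b, eps, lam = np.array([1.0]), 0.0, 0.1, 0.5

f = X @ w + b
R_emp = np.mean(np.maximum(0.0, np.abs(y - f) - eps))
R_reg = R_emp + 0.5 * lam * np.dot(w, w)
print(R_emp, R_reg)   # empirical risk, then empirical risk plus capacity term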