Introduction to the Principles of SVM: SVM Concepts and Principles Resource (CSDN文库)


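### Basic Principles and Applications of the Support Vector Machine (SVM)

#### 1. Introduction

The support vector machine (SVM) is a supervised learning algorithm used mainly for classification and regression. The method was first developed for training data that can be separated without error (Boser, Guyon, and Vapnik, 1992). In 1995 Corinna Cortes and Vladimir Vapnik at AT&T Bell Labs extended it to non-separable training data under the name "support-vector network". Since then the SVM has been widely studied and applied in machine learning because of its strong generalization ability. This article draws on that 1995 paper to discuss the basic principles, characteristics, and implementation of the SVM.

#### 2. Basic Concepts of the SVM

The core idea of the SVM is to find an optimal decision boundary (a hyperplane) that separates two classes of data points with the largest possible margin. The maximum-margin principle helps the model generalize, that is, perform well on data it has not seen.

##### 1. Classification

Suppose we have a data set with two classes, C1 and C2. We look for a hyperplane that separates C1 from C2 with the widest possible gap. The width of this gap is called the margin, and it equals twice the distance from the hyperplane to the nearest training points, the support vectors.

##### 2. Non-linearly separable problems

In real applications the data are often not linearly separable. To handle this, the SVM uses the kernel trick: the original data are implicitly mapped into a high-dimensional space in which a linear classifier can realize a non-linear decision boundary in the input space. The mapping is defined by a kernel function such as a polynomial kernel or a radial basis function (RBF) kernel.

#### 3. How the SVM Works

##### 1. The linear case

When the training data are perfectly linearly separable, the optimal hyperplane is found by solving a convex optimization problem (the hard-margin problem), where \(y_i \in \{-1, +1\}\) is the label of training vector \(\mathbf{x}_i\):

\[ \min_{\mathbf{w}, b} \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \; i = 1, \dots, n \]

When the data are not perfectly separable, slack variables relax the constraints and the SVM minimizes the soft-margin objective:

\[ \min_{\mathbf{w}, b, \xi} \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0 \]

Here \(\mathbf{w}\) is the weight vector, \(b\) is the bias term, \(\xi_i\) are the slack variables that measure how far each point violates the margin, and \(C\) is a regularization parameter that trades off a wide margin against classification errors.

##### 2. The non-linear case

For data that are not linearly separable, the SVM uses the kernel trick to turn a non-linear relationship in the low-dimensional input space into a linear one in a high-dimensional feature space. Commonly used kernel functions include:

- **Linear kernel**: \(K(x, y) = x \cdot y\)
- **Polynomial kernel**: \(K(x, y) = (x \cdot y + c)^d\)
- **Radial basis function kernel**: \(K(x, y) = \exp(-\gamma \|x - y\|^2)\)

The choice of kernel function has a strong influence on the performance of the SVM.

#### 4. Characteristics of the SVM

- **Robustness**: the SVM can perform well even with a small number of training samples.
- **Strong generalization**: the maximum-margin principle gives the SVM good generalization ability.
- **Suitability for high-dimensional feature spaces**: the kernel trick lets the SVM work in high-dimensional feature spaces without computing the mapping explicitly.
- **Sparsity**: the final model depends only on the support vectors, which reduces the cost of evaluating the decision function.

#### 5. Application Example

The application discussed in the paper is optical character recognition (OCR). The paper compares the support-vector network with classical learning algorithms that took part in an OCR benchmark study, and it demonstrates high generalization ability for support-vector networks that use polynomial input transformations.

#### 6. Conclusion

The support vector machine is a powerful and practical machine learning algorithm. It handles linear classification problems and, through the kernel trick, non-linear ones as well. Its theoretical foundation and its results in practice have made it a common choice in many fields. For researchers and practitioners who want to apply the technique, a firm grasp of its basic principles is essential.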
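As a concrete reading of the soft-margin objective in Section 3, the sketch below evaluates it for a given hyperplane. It assumes the usual constraint \(y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i\) with labels in \(\{-1, +1\}\), so the smallest feasible slack is \(\xi_i = \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))\). The function name `soft_margin_objective` and the toy data are illustrative choices, not taken from the paper.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C=1.0):
    """Value of 1/2 ||w||^2 + C * sum(xi) for a fixed hyperplane (w, b).

    Assumes labels y_i in {-1, +1}; xi_i is the smallest slack that
    satisfies y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0.
    """
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * xi.sum(), xi

# Toy data: the second point lies inside the margin of the hyperplane x_1 = 0.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0

value, xi = soft_margin_objective(w, b, X, y, C=1.0)
print(value, xi)
```

Only the point inside the margin contributes slack. A larger `C` penalizes such violations more heavily, while a smaller `C` favors a wider margin.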


Machine Learning, 20, 273-297 (1995)
© 1995 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Support-Vector Networks

CORINNA CORTES corinna@neural.att.com
VLADIMIR VAPNIK vlad@neural.att.com
AT&T Bell Labs., Holmdel, NJ 07733, USA

Editor: Lorenza Saitta

Abstract. The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data.

High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Keywords: pattern recognition, efficient learning algorithms, neural networks, radial basis function classifiers, polynomial classifiers.

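1. Introduction

More than 60 years ago R.A. Fisher (Fisher, 1936) suggested the first algorithm for pattern recognition. He considered a model of two normally distributed populations, \(N(\mathbf{m}_1, \Sigma_1)\) and \(N(\mathbf{m}_2, \Sigma_2)\), of n-dimensional vectors \(\mathbf{x}\) with mean vectors \(\mathbf{m}_1\) and \(\mathbf{m}_2\) and co-variance matrices \(\Sigma_1\) and \(\Sigma_2\), and showed that the optimal (Bayesian) solution is a quadratic decision function:

\[ F_{\mathrm{sq}}(\mathbf{x}) = \operatorname{sign}\Big[ \tfrac{1}{2}(\mathbf{x} - \mathbf{m}_2)^T \Sigma_2^{-1} (\mathbf{x} - \mathbf{m}_2) - \tfrac{1}{2}(\mathbf{x} - \mathbf{m}_1)^T \Sigma_1^{-1} (\mathbf{x} - \mathbf{m}_1) + \tfrac{1}{2}\ln\frac{|\Sigma_2|}{|\Sigma_1|} \Big] \qquad (1) \]

In the case where \(\Sigma_1 = \Sigma_2 = \Sigma\) the quadratic decision function (1) degenerates to a linear function:

\[ F_{\mathrm{lin}}(\mathbf{x}) = \operatorname{sign}\Big[ (\mathbf{m}_1 - \mathbf{m}_2)^T \Sigma^{-1} \mathbf{x} - \tfrac{1}{2}\big( \mathbf{m}_1^T \Sigma^{-1} \mathbf{m}_1 - \mathbf{m}_2^T \Sigma^{-1} \mathbf{m}_2 \big) \Big] \qquad (2) \]

To estimate the quadratic decision function one has to determine \(n(n+3)/2\) free parameters. To estimate the linear function only \(n\) free parameters have to be determined. In the case where the number of observations is small (say less than \(10n^2\)) estimating \(O(n^2)\) parameters is not reliable. Fisher therefore recommended, even in the case of \(\Sigma_1 \ne \Sigma_2\), to use the linear discriminator function (2) with \(\Sigma\) of the form:

\[ \Sigma = \tau \Sigma_1 + (1 - \tau) \Sigma_2 \qquad (3) \]

where \(\tau\) is some constant. Fisher also recommended a linear decision function for the case where the two distributions are not normal. Algorithms for pattern recognition were therefore from the very beginning associated with the construction of linear decision surfaces.

Figure 1. A simple feed-forward perceptron with input units, layers of hidden units, and 1 output unit. The gray-shading of the vector entries reflects their numeric value.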
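The linear discriminant that Fisher recommended can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the textbook form of the rule, \(\operatorname{sign}[(\mathbf{m}_1 - \mathbf{m}_2)^T \Sigma^{-1} \mathbf{x} - \tfrac{1}{2}(\mathbf{m}_1^T \Sigma^{-1} \mathbf{m}_1 - \mathbf{m}_2^T \Sigma^{-1} \mathbf{m}_2)]\) with the blended covariance \(\Sigma = \tau \Sigma_1 + (1 - \tau) \Sigma_2\). The function names, the value \(\tau = 0.5\), and the synthetic Gaussian data are assumptions made for the example.

```python
import numpy as np

def fisher_linear_discriminant(X1, X2, tau=0.5):
    """Estimate (w, b) so that sign(w . x + b) is +1 for population 1."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)           # sample covariance of population 1
    S2 = np.cov(X2, rowvar=False)           # sample covariance of population 2
    S = tau * S1 + (1.0 - tau) * S2         # blended covariance
    w = np.linalg.solve(S, m1 - m2)         # Sigma^{-1} (m1 - m2)
    b = -0.5 * (m1 @ np.linalg.solve(S, m1) - m2 @ np.linalg.solve(S, m2))
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

# Synthetic two-class data (illustrative only).
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(200, 2))
X2 = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(200, 2))

w, b = fisher_linear_discriminant(X1, X2)
X = np.vstack([X1, X2])
labels = np.concatenate([np.ones(200), -np.ones(200)])
accuracy = np.mean(predict(X, w, b) == labels)
print(w, b, accuracy)
```

Only the two means and one blended covariance matrix are estimated, which is the practical appeal of the linear rule when observations are scarce.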

In 1962 Rosenblatt (Rosenblatt, 1962) explored a different kind of learning machines: perceptrons or neural networks. The perceptron consists of connected neurons, where each neuron implements a separating hyperplane, so the perceptron as a whole implements a piecewise linear separating surface. See Fig. 1.

No algorithm that minimizes the error on a set of vectors by adjusting all the weights of the network was found in Rosenblatt's time, and Rosenblatt suggested a scheme where only the weights of the output unit were adaptive. According to the fixed setting of the other weights the input vectors are non-linearly transformed into the feature space, Z, of the last layer of units. In this space a linear decision function is constructed:

\[ I(\mathbf{x}) = \operatorname{sign}\Big( \sum_i \alpha_i z_i(\mathbf{x}) \Big) \qquad (4) \]

by adjusting the weights \(\alpha_i\) from the ith hidden unit to the output unit so as to minimize some error measure over the training data. As a result of Rosenblatt's approach, construction of decision rules was again associated with the construction of linear hyperplanes in some space.
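Rosenblatt's scheme, a fixed non-linear transformation followed by an adaptive linear output unit, can be illustrated with a small sketch. The choices below are assumptions made for the example and are not taken from the paper: the hidden units use random weights with a `tanh` activation, the error measure is squared error minimized by least squares, and the data are a four-point XOR problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR-style toy problem: not linearly separable in the input space.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])

# Fixed hidden layer: weights are drawn once and never trained.
n_hidden = 20
W_hidden = rng.normal(size=(2, n_hidden))
c_hidden = rng.normal(size=n_hidden)

def feature_map(X):
    """Non-linear transformation of inputs into the feature space Z."""
    return np.tanh(X @ W_hidden + c_hidden)

# Only the output weights alpha are adapted (least-squares fit).
Z = feature_map(X)
alpha, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Linear decision function in feature space: sign(sum_i alpha_i z_i(x)).
predictions = np.sign(Z @ alpha)
print(predictions)
```

The decision surface is linear in Z but non-linear in the original inputs, which is why the XOR labels can be fitted even though no separating line exists in the input space.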

An algorithm that allows for all weights of the neural network to adapt in order locally to minimize the error on a set of vectors belonging to a pattern recognition problem was found in 1986 (Rumelhart, Hinton & Williams, 1986, 1987; Parker, 1985; LeCun, 1985) when the back-propagation algorithm was discovered. The solution involves a slight modification of the mathematical model of neurons. Therefore, neural networks implement "piece-wise linear-type" decision functions.

In this article we construct a new type of learning machine, the so-called support-vector network. The support-vector network implements the following idea: it maps the input vectors into some high dimensional feature space Z through some non-linear mapping chosen a priori. In this space a linear decision surface is constructed with special properties that ensure high generalization ability of the network.


Note that this bound does not explicitly contain the dimensionality of the space of separation. It follows from this bound that if the optimal hyperplane can be constructed from a small number of support vectors relative to the training set size, the generalization ability will be high, even in an infinite dimensional space. In Section 5 we will demonstrate that the ratio (5) for real-life problems can be as low as 0.03 and the optimal hyperplane generalizes well in a billion dimensional feature space.

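Let

\[ \mathbf{w}_0 \cdot \mathbf{z} + b_0 = 0 \]

be the optimal hyperplane in feature space. We will show that the weights \(\mathbf{w}_0\) for the optimal hyperplane in the feature space can be written as some linear combination of support vectors

\[ \mathbf{w}_0 = \sum_{\text{support vectors}} \alpha_i \mathbf{z}_i \qquad (6) \]

The linear decision function \(I(\mathbf{z})\) in the feature space will accordingly be of the form:

\[ I(\mathbf{z}) = \operatorname{sign}\Big( \sum_{\text{support vectors}} \alpha_i \, \mathbf{z}_i \cdot \mathbf{z} + b_0 \Big) \qquad (7) \]

where \(\mathbf{z}_i \cdot \mathbf{z}\) is the dot-product between support vectors \(\mathbf{z}_i\) and vector \(\mathbf{z}\) in feature space. The decision function can therefore be described as a two layer network (Fig. 3).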
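The two ways of writing the decision function, through the weight vector or through dot-products with the support vectors, can be checked numerically. The sketch assumes that each coefficient \(\alpha_i\) already carries the sign of its support vector's class, as the linear-combination form suggests. The support vectors, coefficients, and bias are made-up values for illustration.

```python
import numpy as np

def weights_from_support_vectors(alphas, Z_sv):
    """w0 = sum_i alpha_i z_i, with one support vector per row of Z_sv."""
    return alphas @ Z_sv

def decide(z, alphas, Z_sv, b0):
    """Two-layer form: dot-products with support vectors, then a weighted sum."""
    return int(np.sign(alphas @ (Z_sv @ z) + b0))

# Hypothetical support vectors and coefficients (signed by class).
Z_sv = np.array([[1.0, 1.0], [-1.0, -1.0]])
alphas = np.array([0.5, -0.5])
b0 = 0.0

w0 = weights_from_support_vectors(alphas, Z_sv)
z_a = np.array([2.0, 0.5])
z_b = np.array([-1.0, -3.0])

# Both forms give the same decision value for any z.
same = np.isclose(w0 @ z_a + b0, alphas @ (Z_sv @ z_a) + b0)
print(w0, decide(z_a, alphas, Z_sv, b0), decide(z_b, alphas, Z_sv, b0), same)
```

The second form never needs the weight vector explicitly, only dot-products with the support vectors.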

However, even if the optimal hyperplane generalizes well the technical problem of how to treat the high dimensional feature space remains. In 1992 it was shown (Boser, Guyon, & Vapnik, 1992), that the order of operations for constructing a decision function can be interchanged: instead of making a non-linear transformation of the input vectors followed by dot-products with support vectors in feature space, one can first compare two vectors in input space (by e.g. taking their dot-product or some distance measure), and then make a non-linear transformation of the value of the result (see Fig. 4). This enables the construction of rich classes of decision surfaces, for example polynomial decision surfaces of arbitrary degree. We will call this type of learning machine a support-vector network.
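The interchange of operations can be verified on a small case. The sketch below uses a homogeneous polynomial of degree 2 in two input dimensions, for which one standard explicit feature map is \(\phi(\mathbf{x}) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)\). The map, the function names, and the test vectors are illustrative assumptions rather than the paper's own example.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (one standard choice)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

def poly2_kernel(x, y):
    """Compare in input space first, then transform the scalar result."""
    return (x @ y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

feature_space_value = phi(x) @ phi(y)    # transform, then dot-product
input_space_value = poly2_kernel(x, y)   # dot-product, then transform
print(feature_space_value, input_space_value)
```

Both routes give the same number, but the kernel route never builds the feature vectors. That matters once the polynomial degree and input dimension make the feature space very large.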

The technique of support-vector networks was first developed for the restricted case of separating training data without errors. In this article we extend the approach of support-vector networks to cover when separation without error on the training vectors is impossible. With this extension we consider the support-vector networks as a new class of learning machine, as powerful and universal as neural networks. In Section 5 we will demonstrate how well it generalizes for high degree polynomial decision surfaces (up to order 7) in a high dimensional space (dimension 256). The performance of the algorithm is compared to that of classical learning machines e.g. linear classifiers, k-nearest neighbors classifiers, and neural networks. Sections 2, 3, and 4 are devoted to the major points of the derivation of the algorithm and a discussion of some of its properties. Details of the derivation are relegated to an appendix.

