Neural Networks 22 (2009) 49–57
A signal theory approach to support vector classification: The sinc kernel

James D.B. Nelson, Robert I. Damper∗, Steve R. Gunn, Baofeng Guo

Information: Signals, Images, Systems Research Group, School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK

This research was supported by the Data and Information Fusion (DIF) Defence Technology Centre, United Kingdom, under DTC Project 8.2, funded by the UK Ministry of Defence and managed by General Dynamics Limited and QinetiQ.

∗ Corresponding author. Tel.: +44 0 23 8059 4577.
E-mail addresses: jn@ecs.soton.ac.uk (J.D.B. Nelson), rid@ecs.soton.ac.uk (R.I. Damper), srg@ecs.soton.ac.uk (S.R. Gunn), bg@ecs.soton.ac.uk (B. Guo).
Article info
Article history:
Received 11 May 2006
Accepted 15 September 2008
Keywords:
Hyperspectral imaging
Parameter estimation
Regularisation
Reproducing kernel Hilbert spaces
Sequency analysis
Signal theory
Sinc kernel
Support vector machines
Abstract
Fourier-based regularisation is considered for the support vector machine classification problem over
absolutely integrable loss functions. By invoking the modest assumption that the decision function
belongs to a Paley–Wiener space, it is shown that the classification problem can be developed in the
context of signal theory. Furthermore, by employing the Paley–Wiener reproducing kernel, namely the
sinc function, it is shown that a principled and finite kernel hyper-parameter search space can be
discerned, a priori. Subsequent simulations performed on a commonly available hyperspectral image data
set reveal that the approach yields results that surpass state-of-the-art benchmarks.
© 2009 Published by Elsevier Ltd
1. Introduction
An often-cited property of the support vector machine (SVM)
learning method is the existence of a unique solution. Another
very desirable attribute, namely flexibility, is readily realised by
the introduction of non-linear kernel methods. But herein lies a
conflict. Although flexibility admits richness, it also introduces
parameters, and thereby precludes uniqueness. Whether the
parameter takes the form of a scaling vector, a scaling number, or
the kernel itself, the fact remains that in the context of non-linear
support vector machines there are uncountably many solutions.
Unfortunately, the only way to determine the best solution is to
build uncountably many kernels. This is, of course, intractable.
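For orientation, the short sketch below (an illustration of ours, not drawn from the paper) shows the usual ad hoc work-around: the continuous hyper-parameter space of an RBF kernel is discretised on an arbitrary grid and searched by cross-validation. The grid values and the toy data are placeholders; the approach developed in this paper aims to replace such an unprincipled discretisation with a finite candidate set derived a priori.

# Ad hoc work-around for the continuum of kernel hyper-parameters:
# discretise the RBF width gamma on an arbitrary grid and cross-validate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data standing in for a real classification problem.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Arbitrary discretisation of the continuous hyper-parameter space.
param_grid = {
    "C": [0.1, 1.0, 10.0],            # soft-margin trade-off
    "gamma": np.logspace(-3, 2, 6),   # ad hoc grid over (0, infinity)
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))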
However, when framed in the context of reproducing kernel
Hilbert spaces, it has been shown by Girosi (1998) that the
choice of kernel and parameters controls the nature and degree of
regularisation that is imposed on the solution. A related issue is
that the so-called curse of dimensionality often turns out not to
have the detrimental effect that is predicted. Some recent machine
learning research has focused on finding cogent explanations for
this phenomenon. Belkin and Niyogi (2004) argue that a possible
reason is that the data lie on a sub-manifold, embedded in the
input space. Indeed, data with a large number of variables may
lie entirely in a much smaller-dimensional manifold. Knowledge
pertaining to the structure of the manifold can be used to
guide the choice of parameters, and thus the nature and degree
of regularisation. Such realisations lead to a more considered
approach: namely, to ascertain, a priori, properties of the space
wherein the data lie. Although there may still exist infinitely many
solutions, the range of an empirical search could then at least be
focused upon subsets of parameters rather than all possible choices
of parameters.
We propose a principled way of reducing the infinite parameter
search space to a finite one that can be searched exhaustively. Our approach
is motivated by sampling theory, where the main goal is to
establish equivalence relations between data sequence spaces
and kernel function spaces. To this end, we employ perhaps the
most elementary function space from sampling theory, namely
the simply connected and zero-centred Paley–Wiener reproducing
kernel Hilbert space, more commonly referred to by engineers as
baseband-limited signals. For a given class of data, we show how to
estimate, a priori, a suitable kernel and parameter subspace. Smale
and Zhou (2004) have also studied the application of sampling
theory and reproducing kernel Hilbert spaces to learning theory.
They consider the least squares loss regression problem and
construct probability estimates for the sampling error. The work
reported here adds to the small body of literature on this
under-explored topic.
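To fix ideas, the sketch below illustrates one possible realisation of the construction just outlined: a separable sinc kernel used as a custom SVM kernel, with the band limit b drawn from a small finite candidate set. The product form of the kernel, the candidate values, and the toy data are assumptions made purely for illustration; the paper derives the candidate set a priori from the data rather than positing it.

# Minimal sketch, assuming a product-form sinc kernel with a single band
# limit b. The candidate values of b below are placeholders, whereas the
# paper derives a finite candidate set a priori from the data.
import numpy as np
from sklearn.svm import SVC

def sinc_kernel(X, Y, b=1.0):
    """Gram matrix of k(x, y) = prod_j sinc(b * (x_j - y_j)).

    np.sinc is the normalised sinc, sin(pi t) / (pi t), which (up to
    scaling conventions) is the reproducing kernel of the Paley-Wiener
    space of baseband-limited signals.
    """
    diff = X[:, None, :] - Y[None, :, :]        # (n, m, d) pairwise differences
    return np.prod(np.sinc(b * diff), axis=-1)  # (n, m) Gram matrix

# Finite candidate set for the band limit (placeholder values here).
candidate_b = [0.5, 1.0, 2.0, 4.0]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

for b in candidate_b:
    clf = SVC(kernel=lambda A, B, b=b: sinc_kernel(A, B, b=b))
    clf.fit(X_train, y_train)
    print(f"b = {b}: training accuracy = {clf.score(X_train, y_train):.3f}")

Because SVC accepts a callable kernel, each candidate band limit yields one fully specified SVM, so the resulting search is finite and exhaustive rather than a search over a continuum.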
The remainder of this paper is structured as follows. In
Section 2, the data class under consideration and its corresponding
reproducing kernel Hilbert space are constructed. Accordingly,
some necessary signal theory concepts are introduced and
discussed in Section 3, and exploited in Section 4. Finally, in
Section 5, we report the best results to date on a popular hyperspectral image data set.