Gaussian Processes for Regression:
A Quick Introduction
M. Ebden, August 2008
Comments to mebden@gmail.com
1 MOTIVATION
Figure 1 illustrates a typical example of a prediction problem: given some noisy observations of a dependent variable at certain values of the independent variable x, what is our best estimate of the dependent variable at a new value, x∗?
If we expect the underlying function f(x) to be linear, and can make some assumptions about the input data, we might use a least-squares method to fit a straight line (linear regression). Moreover, if we suspect f(x) may also be quadratic, cubic, or even nonpolynomial, we can use the principles of model selection to choose among the various possibilities.
Gaussian process regression (GPR) is an even finer approach than this. Rather
than claiming f(x) relates to some specific models (e.g. f(x) = mx + c), a Gaussian
process can represent f(x) obliquely, but rigorously, by letting the data ‘speak’ more
clearly for themselves. GPR is still a form of supervised learning, but the training data
are harnessed in a subtler way.
As such, GPR is a less ‘parametric’ tool. However, it’s not completely free-form, and if we’re unwilling to make even basic assumptions about f(x), then more general techniques should be considered, including those underpinned by the principle of maximum entropy; Chapter 6 of Sivia and Skilling (2006) offers an introduction.
[Figure 1 (plot omitted; axes x and y): Given six noisy data points (error bars are indicated with vertical lines), we are interested in estimating a seventh at x∗ = 0.2.]
2 DEFINITION OF A GAUSSIAN PROCESS
Gaussian processes (GPs) extend multivariate Gaussian distributions to infinite dimensionality. Formally, a Gaussian process generates data located throughout some domain such that any finite subset of the range follows a multivariate Gaussian distribution.
Now, the n observations in an arbitrary data set, y = {y_1, ..., y_n}, can always be imagined as a single point sampled from some multivariate (n-variate) Gaussian distribution, after enough thought. Hence, working backwards, this data set can be partnered with a GP. Thus GPs are as universal as they are simple.
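To make the ‘finite subset’ idea concrete, here is a minimal sketch in Python/NumPy that draws one such n-dimensional sample from a multivariate Gaussian whose covariance comes from a smooth covariance function (the squared exponential introduced below). The input grid, length scale, and jitter value are arbitrary choices for illustration, not taken from the text.

```python
import numpy as np

# Any finite set of inputs x_1..x_n yields an n-variate Gaussian over the function values.
x = np.linspace(-2.0, 2.0, 50)                      # an arbitrary finite subset of the domain
K = np.exp(-(x[:, None] - x[None, :])**2 / 2.0)     # covariance between every pair of inputs
K += 1e-8 * np.eye(len(x))                          # tiny jitter for numerical stability

rng = np.random.default_rng(0)
f = rng.multivariate_normal(mean=np.zeros(len(x)), cov=K)  # one 50-dimensional "single point"

print(f[:5])  # plotted against x, this vector traces out one smooth random function
```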
Very often, it’s assumed that the mean of this partner GP is zero everywhere. What relates one observation to another in such cases is just the covariance function, k(x, x′). A popular choice is the ‘squared exponential’,

\[
k(x, x') = \sigma_f^2 \exp\!\left[ \frac{-(x - x')^2}{2 l^2} \right], \tag{1}
\]

where the maximum allowable covariance is defined as σ_f²; this should be high for functions which cover a broad range on the y axis. If x ≈ x′, then k(x, x′) approaches this maximum, meaning f(x) is nearly perfectly correlated with f(x′). This is good: for our function to look smooth, neighbours must be alike. Now if x is distant from x′, we have instead k(x, x′) ≈ 0, i.e. the two points cannot ‘see’ each other. So, for example, during interpolation at new x values, distant observations will have negligible effect. How much effect this separation has will depend on the length parameter, l, so there is much flexibility built into (1).
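As a quick illustration of (1), the sketch below evaluates the squared exponential covariance for a few input pairs; the values σ_f = 1.0 and l = 1.0 are arbitrary demonstration choices, not taken from the text.

```python
import numpy as np

def sq_exp(x1, x2, sigma_f=1.0, length=1.0):
    """Squared exponential covariance of eq. (1): sigma_f^2 * exp(-(x1 - x2)^2 / (2 l^2))."""
    return sigma_f**2 * np.exp(-(x1 - x2)**2 / (2.0 * length**2))

# Neighbouring inputs are strongly correlated; distant inputs barely 'see' each other.
print(sq_exp(0.0, 0.0))   # the maximum, sigma_f^2
print(sq_exp(0.0, 0.1))   # close to the maximum
print(sq_exp(0.0, 5.0))   # essentially zero
```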
Not quite enough flexibility though: the data are often noisy as well, from measurement errors and so on. Each observation y can be thought of as related to an underlying function f(x) through a Gaussian noise model:

\[
y = f(x) + \mathcal{N}(0, \sigma_n^2), \tag{2}
\]
something which should look familiar to those who’ve done regression before. Regression is the search for f(x). Purely for simplicity of exposition on the next page, we take the novel approach of folding the noise into k(x, x′), by writing

\[
k(x, x') = \sigma_f^2 \exp\!\left[ \frac{-(x - x')^2}{2 l^2} \right] + \sigma_n^2\, \delta(x, x'), \tag{3}
\]
where δ(x, x′) is the Kronecker delta function. (When most people use Gaussian processes, they keep σ_n separate from k(x, x′). However, our redefinition of k(x, x′) is equally suitable for working with problems of the sort posed in Figure 1. So, given n observations y, our objective is to predict y∗, not the ‘actual’ f∗; their expected values are identical according to (2), but their variances differ owing to the observational noise process. For example, in Figure 1 the expected value of y∗, and of f∗, is the dot at x∗.)
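A minimal sketch of the noise-folded covariance (3). The δ(x, x′) term contributes σ_n² only when the two arguments are the same observation, so in practice it adds σ_n² to the diagonal of the training covariance matrix. The function name and the default hyperparameter values are illustrative assumptions.

```python
import numpy as np

def k_noisy(x1, x2, sigma_f=1.0, length=1.0, sigma_n=0.3):
    """Covariance of eq. (3): squared exponential plus sigma_n^2 when the inputs coincide."""
    delta = 1.0 if x1 == x2 else 0.0   # Kronecker delta, delta(x, x')
    return (sigma_f**2 * np.exp(-(x1 - x2)**2 / (2.0 * length**2))
            + sigma_n**2 * delta)

print(k_noisy(0.5, 0.5))   # sigma_f^2 + sigma_n^2: the noise appears on the diagonal
print(k_noisy(0.5, 0.6))   # off the diagonal the noise term vanishes
```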
To prepare for GPR, we calculate the covariance function, (3), among all possible
combinations of these points, summarizing our findings in three matrices:
\[
K = \begin{bmatrix}
k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\
k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_n) \\
\vdots & \vdots & \ddots & \vdots \\
k(x_n, x_1) & k(x_n, x_2) & \cdots & k(x_n, x_n)
\end{bmatrix} \tag{4}
\]

\[
K_* = \begin{bmatrix} k(x_*, x_1) & k(x_*, x_2) & \cdots & k(x_*, x_n) \end{bmatrix}, \qquad
K_{**} = k(x_*, x_*). \tag{5}
\]
Confirm for yourself that the diagonal elements of K are σ_f² + σ_n², and that its extreme off-diagonal elements tend to zero when x spans a large enough domain.
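Assembling the three matrices of (4) and (5) is mechanical. The sketch below does it with explicit loops for clarity, using the noise-folded covariance of (3); the training inputs, the test input, and the hyperparameter values are arbitrary illustrative choices.

```python
import numpy as np

sigma_f, length, sigma_n = 1.0, 1.0, 0.3   # illustrative values only

def k(x1, x2):
    """Noise-folded covariance, eq. (3)."""
    delta = 1.0 if x1 == x2 else 0.0
    return sigma_f**2 * np.exp(-(x1 - x2)**2 / (2 * length**2)) + sigma_n**2 * delta

x = np.array([-1.0, 0.0, 0.5, 2.0])   # arbitrary training inputs
x_star = 1.0                          # arbitrary test input
n = len(x)

K = np.array([[k(x[i], x[j]) for j in range(n)] for i in range(n)])   # eq. (4), n x n
K_star = np.array([k(x_star, x[j]) for j in range(n)])                # eq. (5), 1 x n
K_starstar = k(x_star, x_star)                                        # eq. (5), scalar

print(np.diag(K))     # every diagonal entry is sigma_f^2 + sigma_n^2
print(K[0, -1])       # extreme off-diagonal entry: small when the inputs are far apart
```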
3 HOW TO REGRESS USING GAUSSIAN PROCESSES
Since the key assumption in GP modelling is that our data can be represented as a
sample from a multivariate Gaussian distribution, we have that
\[
\begin{bmatrix} \mathbf{y} \\ y_* \end{bmatrix}
\sim \mathcal{N}\!\left( \mathbf{0},\;
\begin{bmatrix} K & K_*^{T} \\ K_* & K_{**} \end{bmatrix} \right), \tag{6}
\]
where T indicates matrix transposition. We are of course interested in the conditional probability p(y∗ | y): “given the data, how likely is a certain prediction for y∗?”. As explained more slowly in the Appendix, the probability follows a Gaussian distribution:

\[
y_* \mid \mathbf{y} \sim \mathcal{N}\!\left( K_* K^{-1} \mathbf{y},\; K_{**} - K_* K^{-1} K_*^{T} \right). \tag{7}
\]
Our best estimate for y∗ is the mean of this distribution:

\[
y_* = K_* K^{-1} \mathbf{y}, \tag{8}
\]

and the uncertainty in our estimate is captured in its variance:

\[
\operatorname{var}(y_*) = K_{**} - K_* K^{-1} K_*^{T}. \tag{9}
\]
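Equations (8) and (9) translate directly into a few lines of linear algebra. A minimal sketch follows; it solves linear systems with K rather than forming K⁻¹ explicitly, which is mathematically equivalent but numerically better behaved. The function name and the toy numbers in the demonstration are assumptions for illustration, not values from the text.

```python
import numpy as np

def gp_predict(K, K_star, K_starstar, y):
    """Posterior mean, eq. (8), and variance, eq. (9), for a single test point."""
    alpha = np.linalg.solve(K, y)          # K^{-1} y without an explicit inverse
    mean = K_star @ alpha                  # eq. (8): K_* K^{-1} y
    v = np.linalg.solve(K, K_star)         # K^{-1} K_*^T
    var = K_starstar - K_star @ v          # eq. (9)
    return mean, var

# Toy demonstration with two observations (made-up numbers):
K = np.array([[1.09, 0.60],
              [0.60, 1.09]])
K_star = np.array([0.80, 0.90])
K_starstar = 1.09
y = np.array([0.3, -0.1])
print(gp_predict(K, K_star, K_starstar, y))
```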
We’re now ready to tackle the data in Figure 1.
1. There are n = 6 observations y, at

   x = [−1.50, −1.00, −0.75, −0.40, −0.25, 0.00].

   We know σ_n = 0.3 from the error bars. With judicious choices of σ_f and l (more on this later), we have enough to calculate a covariance matrix using (4):

\[
K = \begin{bmatrix}
1.70 & 1.42 & 1.21 & 0.87 & 0.72 & 0.51 \\
1.42 & 1.70 & 1.56 & 1.34 & 1.21 & 0.97 \\
1.21 & 1.56 & 1.70 & 1.51 & 1.42 & 1.21 \\
0.87 & 1.34 & 1.51 & 1.70 & 1.59 & 1.48 \\
0.72 & 1.21 & 1.42 & 1.59 & 1.70 & 1.56 \\
0.51 & 0.97 & 1.21 & 1.48 & 1.56 & 1.70
\end{bmatrix}.
\]

   From (5) we also have K∗∗ = 1.70 and

\[
K_* = \begin{bmatrix} 0.38 & 0.79 & 1.03 & 1.35 & 1.46 & 1.58 \end{bmatrix}.
\]
2. From (8) and (9), y∗ = 0.95 and var(y∗) = 0.21 (a numerical sketch of this calculation follows the list).
3. Figure 1 shows a data point with a question mark underneath, representing the estimation of the dependent variable at x∗ = 0.2.
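As a numerical check on step 1, the sketch below rebuilds K and K∗ from the six x values. The hyperparameters are not stated in this excerpt, so the values below (σ_f ≈ 1.27, l ≈ 1, together with σ_n = 0.3) are inferred from the printed matrix, whose diagonal is σ_f² + σ_n² ≈ 1.70; treat them as assumptions. The six observed y values are only shown in Figure 1 and are not listed here, so the final prediction is indicated in comments rather than computed.

```python
import numpy as np

sigma_f, length, sigma_n = 1.27, 1.0, 0.3   # assumed: consistent with the 1.70 diagonal above

def k(x1, x2):
    delta = 1.0 if x1 == x2 else 0.0
    return sigma_f**2 * np.exp(-(x1 - x2)**2 / (2 * length**2)) + sigma_n**2 * delta

x = np.array([-1.50, -1.00, -0.75, -0.40, -0.25, 0.00])   # the six observation locations
x_star = 0.2

K = np.array([[k(xi, xj) for xj in x] for xi in x])
K_star = np.array([k(x_star, xj) for xj in x])
K_starstar = k(x_star, x_star)

print(np.round(K, 2))        # should roughly match the matrix printed in step 1
print(np.round(K_star, 2))   # roughly [0.38 0.79 1.03 1.35 1.46 1.58]

# With the observed y vector (read off Figure 1, not listed in this excerpt):
# y_star = K_star @ np.linalg.solve(K, y)                        # eq. (8), about 0.95
# var_y  = K_starstar - K_star @ np.linalg.solve(K, K_star)      # eq. (9), about 0.21
```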
We can repeat the above procedure for various other points spread over some portion of the x axis, as shown in Figure 2. (In fact, equivalently, we could avoid the repetition by performing the above procedure once with suitably larger K∗ and K∗∗ matrices. In this case, since there are 1,000 test points spread over the x axis, K∗∗ would be of size 1,000 × 1,000.) Rather than plotting simple error bars, we’ve decided to plot y∗ ± 1.96√var(y∗), giving a 95% confidence interval.
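The parenthetical remark about doing everything in one pass can be sketched as follows: K∗ becomes an m×n matrix and K∗∗ an m×m matrix (1,000 × 1,000 for 1,000 test points), and (8) and (9) are applied once. The hyperparameters are the same assumed values as above, and the y vector below is a placeholder (not the Figure 1 data) included only so the code runs; the 95% band is y∗ ± 1.96√var(y∗) as in the text.

```python
import numpy as np

sigma_f, length, sigma_n = 1.27, 1.0, 0.3        # assumed hyperparameters, as before

def cov(a, b, noise=False):
    """Matrix of eq.-(3) covariances between input sets a (m,) and b (n,)."""
    K = sigma_f**2 * np.exp(-(a[:, None] - b[None, :])**2 / (2 * length**2))
    if noise:                                    # delta term: an input with itself only
        K = K + sigma_n**2 * np.eye(len(a))
    return K

x = np.array([-1.50, -1.00, -0.75, -0.40, -0.25, 0.00])
y = np.array([-1.6, -1.1, -0.4, 0.1, 0.4, 0.8])  # placeholder observations, NOT the Figure 1 data

x_test = np.linspace(-1.6, 0.3, 1000)            # 1,000 test points across the x axis

K = cov(x, x, noise=True)                        # n x n
K_star = cov(x_test, x)                          # m x n
K_starstar = cov(x_test, x_test, noise=True)     # m x m (1,000 x 1,000)

mean = K_star @ np.linalg.solve(K, y)                                # eq. (8), one value per test point
var = np.diag(K_starstar - K_star @ np.linalg.solve(K, K_star.T))    # eq. (9), diagonal only
band = 1.96 * np.sqrt(var)                                           # 95% confidence half-width

print(mean[:3], band[:3])
```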