Generalized Boosted Models:
A guide to the gbm package
Greg Ridgeway
August 3, 2007
Boosting takes on various forms, with different programs using different loss functions, different base models, and different optimization schemes. The gbm package takes the approach described in [2] and [3]. Some of the terminology differs, mostly due to an effort to cast boosting terms into more standard statistical terminology (e.g. deviance). In addition, the gbm package implements boosting for models commonly used in statistics but not commonly associated with boosting. The Cox proportional hazards model, for example, is an incredibly useful model and the boosting framework applies quite readily with only slight modification [5]. Also, some algorithms implemented in the gbm package differ from the standard implementation. The AdaBoost algorithm [1] has a particular loss function and a particular optimization algorithm associated with it. The gbm implementation of AdaBoost adopts AdaBoost's exponential loss function (its bound on the misclassification rate) but uses Friedman's gradient descent algorithm rather than the one originally proposed. So the main purpose of this document is to spell out in detail what the gbm package implements.
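For concreteness, the exponential loss referred to above can be written, in its textbook form with the outcome coded y ∈ {−1, +1}, as

$$\Psi(y, f(x)) = \exp\{-y f(x)\} \;\ge\; \mathbf{1}\{y f(x) \le 0\},$$

so minimizing it drives down an upper bound on the misclassification rate. (The gbm package works with 0/1 outcomes, so the expression it uses internally differs from this one only by a relabeling of the outcome.)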
1 Gradient boosting
This section essentially presents the derivation of boosting described in [2]. The
gbm package also adopts the stochastic gradient boosting strategy, a small but
important tweak on the basic algorithm, described in [3].
1.1 Friedman’s gradient boosting machine
Friedman (2001) and the companion paper Friedman (2002) extended the work
of Friedman, Hastie, and Tibshirani (2000) and laid the groundwork for a new
generation of boosting algorithms. Using the connection between boosting and
optimization, this new work proposes the Gradient Boosting Machine.
In any function estimation problem we wish to find a regression function, f̂(x), that minimizes the expectation of some loss function, Ψ(y, f), as shown in (4).

$$\hat{f}(x) = \arg\min_{f(x)} E_{y,x}\,\Psi(y, f(x)) = \arg\min_{f(x)} E_x\!\left[\,E_{y|x}\,\Psi(y, f(x)) \;\big|\; x\,\right] \qquad (4)$$

Figure 1: Friedman's Gradient Boost algorithm

Initialize f̂(x) to be a constant, $\hat{f}(x) = \arg\min_{\rho} \sum_{i=1}^{N} \Psi(y_i, \rho)$.
For t in 1, . . . , T do

1. Compute the negative gradient as the working response
$$z_i = -\frac{\partial}{\partial f(x_i)} \Psi(y_i, f(x_i)) \,\bigg|_{f(x_i) = \hat{f}(x_i)} \qquad (1)$$

2. Fit a regression model, g(x), predicting $z_i$ from the covariates $x_i$.

3. Choose a gradient descent step size as
$$\rho = \arg\min_{\rho} \sum_{i=1}^{N} \Psi\!\left(y_i, \hat{f}(x_i) + \rho\, g(x_i)\right) \qquad (2)$$

4. Update the estimate of f(x) as
$$\hat{f}(x) \leftarrow \hat{f}(x) + \rho\, g(x) \qquad (3)$$
We will focus on finding estimates of f(x) such that
$$\hat{f}(x) = \arg\min_{f(x)} E_{y|x}\left[\Psi(y, f(x)) \mid x\right] \qquad (5)$$
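As a concrete instance of (5), taking Ψ to be squared-error loss gives the familiar regression target, since the pointwise minimizer is the conditional mean:
$$\arg\min_{f(x)} E_{y|x}\left[(y - f(x))^2 \mid x\right] = E(y \mid x).$$
Other losses estimate other functionals of the conditional distribution; absolute loss, for example, leads to the conditional median.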
Parametric regression models assume that f(x) is a function with a finite number of parameters, β, and estimate them by selecting those values that minimize a loss function (e.g. squared-error loss) over a training sample of N observations on (y, x) pairs as in (6).
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \Psi(y_i, f(x_i; \beta)) \qquad (6)$$
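As an example of (6), squared-error loss with a linear predictor f(x; β) = x′β reduces to ordinary least squares, with the closed-form solution
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left(y_i - x_i^{\top}\beta\right)^2 = (X^{\top}X)^{-1} X^{\top} y,$$
where X is the N × p matrix whose rows are the $x_i^{\top}$ (assuming X has full column rank).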
When we wish to estimate f(x) non-parametrically the task becomes more difficult. Again we can proceed similarly to [4] and modify our current estimate of f(x) by adding a new function f(x) in a greedy fashion. Letting $f_i = f(x_i)$, we see that we want to decrease the N-dimensional function
$$J(f) = \sum_{i=1}^{N} \Psi(y_i, f(x_i)) = \sum_{i=1}^{N} \Psi(y_i, f_i). \qquad (7)$$
The negative gradient of J(f) indicates the direction of the locally greatest decrease in J(f). Gradient descent would then have us modify f as
$$\hat{f} \leftarrow \hat{f} - \rho \nabla J(f) \qquad (8)$$
where ρ is the size of the step along the direction of greatest descent. Clearly, this step alone is far from our desired goal. First, it only fits f at values of x for which we have observations. Second, it does not take into account that observations with similar x are likely to have similar values of f(x). Both of these problems would have disastrous effects on generalization error. However, Friedman suggests selecting a class of functions that use the covariate information to approximate the gradient, usually a regression tree. This line of reasoning produces his Gradient Boosting algorithm shown in Figure 1. At each iteration the algorithm determines the direction, the gradient, in which it needs to improve the fit to the data, and selects a particular model from the allowable class of functions that is most in agreement with that direction. In the case of squared-error loss, $\Psi(y_i, f(x_i)) = (y_i - f(x_i))^2$, the working response in (1) is proportional to the current residual $y_i - \hat{f}(x_i)$, so this algorithm corresponds exactly to residual fitting.
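To make the residual-fitting interpretation concrete, the following is a minimal sketch of the algorithm in Figure 1 for squared-error loss, using a small rpart regression tree as the base learner g(x). It is an illustration of the generic procedure only, not the gbm package's implementation; the function and argument names (gradient_boost_sketch, n.trees, shrinkage) are chosen here purely for exposition.

```r
# Minimal sketch of Figure 1 for squared-error loss Psi(y, f) = (y - f)^2 / 2,
# so that the working response z_i is the current residual y_i - f(x_i).
# Illustrative only; not the gbm package's implementation.
library(rpart)

gradient_boost_sketch <- function(x, y, n.trees = 100, shrinkage = 0.1, depth = 2) {
  train <- data.frame(x)
  f <- rep(mean(y), length(y))   # initial constant: arg min_rho sum_i Psi(y_i, rho)
  trees <- vector("list", n.trees)
  for (t in seq_len(n.trees)) {
    train$z <- y - f             # step 1: negative gradient = residuals
    g <- rpart(z ~ ., data = train,              # step 2: fit g(x) to z_i
               control = rpart.control(maxdepth = depth, cp = 0))
    # steps 3-4: for squared error and a tree that predicts node means of z,
    # the line search in (2) gives rho = 1; shrinkage scales the update
    # (a standard regularization, not part of Figure 1 itself)
    f <- f + shrinkage * predict(g, train)
    trees[[t]] <- g
  }
  list(init = mean(y), trees = trees, shrinkage = shrinkage)
}

# Example use on simulated data
set.seed(1)
x <- data.frame(x1 = runif(200), x2 = runif(200))
y <- sin(2 * pi * x$x1) + x$x2 + rnorm(200, sd = 0.1)
fit <- gradient_boost_sketch(x, y, n.trees = 200)
```

Predictions for new data would accumulate the initial constant plus shrinkage times the sum of the tree predictions, mirroring the update in (3).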
There are various ways to extend and improve upon the basic framework
suggested in Figure 1. For example, Friedman (2001) substituted several choices