Build Generalized Linear Models with Spark MLlib
Yanbo Liang (梁堰波)
Hortonworks
• Generalized Linear Models (GLMs)
• Linear Regression / Logistic Regression
• Generalized Linear Regression
• Accelerated failure time (AFT) Survival Regression
• GLMs in Spark MLlib and SparkR
• Internal Implementation
• Gradient Descent / L-BFGS / OWL-QN
• Weighted Least Squares (WLS) / Iteratively Reweighted Least Squares (IRLS)
• Regularization, standardization, intercept, and performance tips
Outline
• https://spark-summit.org/east-2016/events/generalized-linear-models-in-spark-mllib-and-sparkr/
• Coworkers: Xiangrui Meng, Joseph Bradley, Eric Liang, Yanbo Liang, DB Tsai, et al.
Background & Reference
Linear Regression
• $n$ observations: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
• $x$: explanatory variables
• $y$: dependent variable
• assumes a linear relationship between $x$ and $y$: $y = x^T \beta + \epsilon$
• minimizes the sum of the squares of the errors:
$\underset{\beta}{\text{minimize}} \; \sum_{i=1}^{n} \lVert y_i - x_i^T \beta \rVert_2^2$
• has both analytic solutions and convex optimization methods
Linear Least Squares
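To make the "analytic solutions and convex optimization methods" point concrete, here is a minimal pure-Python sketch for a single feature plus an intercept. It is not Spark MLlib code: the helper names `fit_closed_form` and `fit_gradient_descent`, the toy data, the step size, and the iteration count are all assumptions chosen for illustration. The gradient-descent version minimizes the mean squared error, which has the same minimizer as the sum of squared errors.

```python
# Illustrative sketch only: hypothetical helpers, not Spark MLlib APIs.
# Fits y ≈ intercept + slope * x by least squares in two ways.


def fit_closed_form(xs, ys):
    """Analytic least-squares solution for one feature plus an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered cross-product and sum of squares give the slope directly.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_gradient_descent(xs, ys, step=0.05, iterations=5000):
    """Minimize the mean squared error with plain batch gradient descent."""
    n = len(xs)
    intercept, slope = 0.0, 0.0
    for _ in range(iterations):
        # Residual of the current fit on every observation.
        residuals = [intercept + slope * x - y for x, y in zip(xs, ys)]
        grad_intercept = 2.0 * sum(residuals) / n
        grad_slope = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        intercept -= step * grad_intercept
        slope -= step * grad_slope
    return intercept, slope


# Toy data (assumed for illustration), roughly following y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

closed = fit_closed_form(xs, ys)
iterative = fit_gradient_descent(xs, ys)
print("closed form:     ", closed)
print("gradient descent:", iterative)
```

On this toy data both approaches should agree on an intercept of about 1.06 and a slope of about 1.97. The closed form is exact but requires solving a linear system, while the iterative route only needs gradients, which is one reason iterative solvers are attractive when the number of features grows large.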