Liang Jin (梁劲), Machine Learning Notes: Getting Started With Machine Learning (all in one), Part 2

Part 2 of Liang Jin's machine learning notes, Getting Started With Machine Learning (all in one). The notes give a detailed, clear introduction to the core concepts, supporting mathematics, and classic algorithms of machine learning, explained in an accessible way to lower the barrier to entry. Because the file is large, it is split into two parts; Part 1 is available separately on my resource page.
Linear Regression: Multiple Linear Regression (Normal Equation)

Example: we can use the Normal Equation to find the values of the coefficients/parameters θ that minimize the cost function. Consider the housing-price data:

x1: Size (feet²) | x2: Number of bedrooms | x3: Number of floors | x4: Age of home (years) | y: Price ($1000)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
 852 | 2 | 1 | 36 | 178

Hypothesis: $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$

The inputs can be represented as a matrix X in which each row is a sample and each column is a dimension (a constant column $x_0 = 1$ is prepended for the intercept term). The outputs can be represented as a vector y in which each row is a sample, so that $y = X\theta$:

$$X = \begin{bmatrix} 1 & 2104 & 5 & 1 & 45 \\ 1 & 1416 & 3 & 2 & 40 \\ 1 & 1534 & 3 & 2 & 30 \\ 1 & 852 & 2 & 1 & 36 \end{bmatrix}, \qquad y = \begin{bmatrix} 460 \\ 232 \\ 315 \\ 178 \end{bmatrix}$$

The value of θ that minimizes the cost function can then be solved directly by the Normal Equation:

$$\theta = (X^T X)^{-1} X^T y$$

Linear Regression: Multiple Linear Regression (Gradient Descent)

Alternatively, use Gradient Descent to find the values of the parameters θ that minimize the cost function $J(\theta)$.

Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \dots + \theta_n x_n$

Parameters: $\theta = (\theta_0, \theta_1, \dots, \theta_n)$

Cost function: $J(\theta_0, \dots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Gradient: $\nabla_\theta J(\theta)$ is the vector of partial derivatives of the cost function with respect to the parameters $\theta_0, \theta_1, \dots, \theta_n$:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

where $x_j^{(i)}$ is the jth feature of the ith observation.

Gradient Descent: repeat until convergence, simultaneously updating $\theta_j$ for every $j = 0, 1, \dots, n$:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

For the housing data above, $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$, so each iteration updates all five parameters:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$
$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)}$$
$$\vdots$$
$$\theta_4 := \theta_4 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_4^{(i)}$$

Video: Gradient Descent For Multiple Variables (5 mins): https://youtu.be/okjjoro-b5c

Linear Regression, Summary: Gradient Descent vs Normal Equation

Gradient descent is best used when the parameters cannot be calculated analytically (e.g., using linear algebra) and must be searched for by an optimization algorithm. In the Normal Equation approach, the value of θ that minimizes the cost function is solved directly rather than iteratively: $\theta = (X^T X)^{-1} X^T y$.

Gradient Descent
  Disadvantages: you need to choose the learning rate α, and it needs many iterations.
  Advantage: it works well even when n (the number of features) is large, e.g. n = 10,000.

Normal Equation
  Advantages: no need to choose a learning rate, and no iterations are needed; it works well when n (the number of features) is small, e.g. n = 100.
  Disadvantage: you need to compute $(X^T X)^{-1}$, which is slow when the number of features is very large.

Computational complexity: the Normal Equation computes the inverse of $X^T X$, an n×n matrix (where n is the number of features). The computational complexity of inverting such a matrix is typically about $O(n^{2.4})$ to $O(n^3)$, so if the number of features is large, the computation can be very slow.

Source: https://machinelearningmastery.com/gradient-descent-for-machine-learning
Source: http://cs229.stanford.edu/notes/cs229-notes1.pdf
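To make the Normal Equation concrete, here is a minimal NumPy sketch applied to the four-sample housing table above; the variable names are illustrative, not from the notes. Note that with only four samples and five parameters the system is underdetermined, so the sketch uses the pseudo-inverse rather than a plain matrix inverse:

```python
import numpy as np

# Design matrix X: one row per sample, with a leading 1 for the intercept x0.
# Columns: x0, size (feet^2), bedrooms, floors, age (years).
X = np.array([
    [1, 2104, 5, 1, 45],
    [1, 1416, 3, 2, 40],
    [1, 1534, 3, 2, 30],
    [1,  852, 2, 1, 36],
], dtype=float)

# Target vector y: price in $1000s.
y = np.array([460, 232, 315, 178], dtype=float)

# Normal Equation: theta = (X^T X)^(-1) X^T y.
# np.linalg.pinv is used instead of an explicit inverse because X^T X is
# singular here (4 samples, 5 parameters); pinv returns the minimum-norm
# solution, which fits the training data exactly in this underdetermined case.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y

print("theta:", theta)
print("predictions:", X @ theta)
```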
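And a matching batch Gradient Descent sketch for the same data, implementing the update rule above. The learning rate, iteration count, and the choice to standardize the features first (anticipating the feature-scaling section that follows) are illustrative assumptions, not values from the notes:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for linear regression.

    Implements theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij
    for all j simultaneously, as a single vector operation per iteration.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        error = X @ theta - y           # h_theta(x^(i)) - y^(i) for every sample
        gradient = (X.T @ error) / m    # partial derivatives dJ/dtheta_j
        theta = theta - alpha * gradient
    return theta

# Same housing data as above. Features are standardized first, because with
# the raw scales (size ~2000 vs bedrooms ~3) gradient descent would need a
# tiny learning rate and far more iterations -- see the next section.
features = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

scaled = (features - features.mean(axis=0)) / features.std(axis=0)
X = np.column_stack([np.ones(len(y)), scaled])  # prepend intercept column x0 = 1

print("theta:", gradient_descent(X, y))
```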
Linear Regression: Feature Scaling

One of the most important transformations you need to apply to your data is feature scaling. With few exceptions, Machine Learning algorithms don't perform well when the input numerical attributes have very different scales. When using Gradient Descent, you should ensure that all features have a similar scale, or else it will take much longer to converge. For example, x1 = size (100-2000 feet²) and x2 = number of bedrooms (1-5) have very different scales, so convergence takes a long time.

[Figure: cost-function contours for Gradient Descent with feature scaling (left) and without feature scaling (right).]

As the figure shows, on the left the Gradient Descent algorithm goes straight toward the minimum, thereby reaching it quickly, whereas on the right it first goes in a direction almost orthogonal to the direction of the global minimum, and it ends with a long march down an almost flat valley. It will eventually reach the minimum, but it will take a long time.

Source: Hands-On Machine Learning with Scikit-Learn & TensorFlow
Source of figure: https://medium.com/@imadpha/gradient-descent-algorithm-and-its-variants-10f65280603

How do we make sure multiple features are on a similar scale? There are two common ways to get all attributes onto a similar scale:

Min-max scaling (many people call this normalization): the data is scaled to a fixed range, usually 0 to 1:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is an original value and $x'$ is the normalized value. For example, x1 = size/2000 and x2 = (number of bedrooms)/5 put both features roughly in the range 0 to 1.

Standardization (z-score normalization): rescaling the features so that they have the properties of a standard normal distribution, centered around 0 with a standard deviation of 1:

$$z = \frac{x - \mu}{\sigma}$$

where $x$ is an original value, $z$ is the standard score, $\mu$ is the mean of that feature vector, and $\sigma$ is its standard deviation from the mean.

Source: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html
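A minimal NumPy sketch of the two formulas above, applied to the size and bedroom features; the helper names min_max_scale and standardize are illustrative, not from the notes:

```python
import numpy as np

def min_max_scale(x):
    """Min-max scaling (normalization): maps each value into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization (z-score): centers at 0 with standard deviation 1."""
    return (x - x.mean()) / x.std()

size = np.array([2104, 1416, 1534, 852], dtype=float)   # feet^2
bedrooms = np.array([5, 3, 3, 2], dtype=float)

# After scaling, both features live on comparable ranges, so gradient
# descent no longer zig-zags down an elongated valley.
print(min_max_scale(size), min_max_scale(bedrooms))
print(standardize(size), standardize(bedrooms))
```

In practice, scikit-learn's MinMaxScaler and StandardScaler implement these same two transforms; fit them on the training set only, then apply them to new data.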

Linear Regression: Creating New Features

Feature/variable creation is a process to generate new variables/features based on existing variable(s). In the housing-price example, suppose the model originally uses the frontage width (x1) and the depth (x2) of the lot:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

We can combine frontage and depth into a single new feature "area", $x = \text{width} \times \text{depth}$, and just use that one feature instead of two:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

(Picture of house: Designed by Freepik.)

Linear Regression: Categorical Features

What if the inputs are categorical features instead of numbers? Many machine learning algorithms cannot work with categorical data directly; the categories must be converted into numbers. Example: in $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$, suppose x2 is the roof type, which may be Flat roof, Gable roof, Gambrel roof, Mansard roof, Shed roof, Hipped roof, Dormer roof, etc. The Linear Regression algorithm can't work with such categorical data directly. What to do next?
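The excerpt ends on that question, but one standard conversion is one-hot encoding: each roof type becomes its own 0/1 indicator feature. A minimal sketch in plain NumPy; the helper name one_hot and the sample values are chosen here for illustration, not taken from the notes:

```python
import numpy as np

def one_hot(values, categories):
    """Encode a list of categorical values as 0/1 indicator columns."""
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        encoded[row, index[v]] = 1.0
    return encoded

roof_types = ["Flat roof", "Gable roof", "Gambrel roof",
              "Mansard roof", "Shed roof", "Hipped roof", "Dormer roof"]
samples = ["Gable roof", "Flat roof", "Gable roof", "Hipped roof"]

X_roof = one_hot(samples, roof_types)
print(X_roof)
# Each row is now numeric, so it can be concatenated with the other features
# and fed to linear regression. (One column is often dropped to avoid perfect
# collinearity with the intercept term.)
```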