Linear Regression with Multiple Variables
- Multiple features/variables
+ n = number of features
+ x^(i) = input (features) of ith training example
+ (x_j)^(i) = value of feature j in ith training example
+ h(x) = theta_0 + theta_1 * x_1 + theta_2 * x_2 + ... + theta_n * x_n
+ if x_0 = 1, then vector x = [x_0; x_1; ... x_n]
+ and vector theta = [theta_0; theta_1; ... theta_n]
+ COOL THING: stacking the m training examples as the rows of a design matrix X, we can compute all predictions at once as X * theta (see the Octave sketch below)
+ NOTE: for a single training example, h(x) = theta^T * x, where T denotes transpose and x is that example's feature vector
+ J(theta) = (1/2m)(Sum from i = 1 to m of (h(x^(i))-y^(i))^2 )
+ <b>We want to find the theta that gives us the smallest cost J(theta), by using gradient descent.</b>
+ Remember that GRADIENT DESCENT = algorithm that lets us find the theta vector that gives us the minimal cost!
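A minimal Octave sketch of the formulas above (X, y, theta, and alpha are assumed to be already defined; X is m x (n+1) with a leading column of ones for x_0):

```
m = length(y);

h = X * theta;                           % vectorized hypothesis for all m examples at once
J = (1 / (2 * m)) * sum((h - y) .^ 2);   % the cost J(theta)

% one gradient descent update (every theta_j updated simultaneously)
theta = theta - (alpha / m) * (X' * (h - y));
```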
- Feature Scaling = technique for making gradient descent work better = make sure features are on a similar scale
+ For example, if x_1 ranges over 0 - 2000 and x_2 ranges over 0 - 5, divide each feature by its range: x_1 = x_1/2000 and x_2 = x_2/5, so both features take on values in 0 <= x_i <= 1. Your goal is to get every feature approximately into the range -1 <= x_i <= 1.
- Mean normalization = technique for making gradient descent work better = replace x_i with x_i - mu_i to make
features have approximately zero mean (Do NOT apply to x_0 = 1).
+ new x_i = (x_i - mu_i)/s_i, where mu_i is the average value of x_i over the training set and s_i is either the range (max - min) of x_i or its standard deviation (see the featureNormalize sketch below).
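A sketch of mean normalization as one Octave function; featureNormalize is my own name for it, and it takes s_i to be the standard deviation (using the range max - min would also fit the note above):

```
function [X_norm, mu, sigma] = featureNormalize(X)
  % Replace each feature x_i with (x_i - mu_i) / s_i.
  % Do NOT pass the x_0 = 1 column into this function.
  mu = mean(X);                 % 1 x n row vector of feature means (the mu_i)
  sigma = std(X);               % 1 x n row vector of standard deviations (the s_i)
  X_norm = (X - mu) ./ sigma;   % Octave broadcasts the row vectors over all m rows
end
```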
- Picking the Learning Rate alpha:
+ if alpha is too small, convergence will be slow. Solution = pick a larger alpha.
+ if alpha is too large, J(theta) may not decrease on every iteration of gradient descent and may never converge. Solution = pick a smaller alpha. A diagnostic for this is sketched below.
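One practical diagnostic is to run gradient descent for several values of alpha and plot J(theta) against the iteration number; with a good alpha, J decreases on every iteration. A self-contained sketch, assuming X (with the x_0 column) and y are defined:

```
m = length(y);
num_iters = 100;
for alpha = [0.001, 0.01, 0.1, 1]
  theta = zeros(size(X, 2), 1);
  J_history = zeros(num_iters, 1);
  for it = 1:num_iters
    theta = theta - (alpha / m) * (X' * (X * theta - y));       % one descent step
    J_history(it) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);  % record the cost
  end
  plot(1:num_iters, J_history); hold on;   % a good alpha gives a steadily falling curve
end
xlabel('number of iterations'); ylabel('J(theta)');
```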
- Polynomial Regression
+ if your training data is not linearly distributed, you may consider making your h_theta(x) function quadratic or cubic. This works even if you only have one feature. For example:
+ you currently have h_theta(x) = theta_0 + theta_1 * x_1, but your data isn't LINEAR
+ so rewrite it as h_theta(x) = theta_0 + theta_1 * x_1 + theta_2 * (x_1)^2, giving you a quadratic function (see the sketch below)
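A sketch of building the quadratic design matrix from a single feature column (x1 is an assumed m x 1 vector); note that x_1 and (x_1)^2 take on very different ranges, so feature scaling matters even more here:

```
m = length(x1);
X_poly = [ones(m, 1), x1, x1 .^ 2];   % h(x) = theta_0 + theta_1*x_1 + theta_2*(x_1)^2
% for a cubic fit: X_poly = [ones(m, 1), x1, x1 .^ 2, x1 .^ 3];
% remember to scale the non-constant columns before running gradient descent
```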
- Normal Equation = method to solve for theta at once instead of iteratively with gradient descent
+ To understand the reason behind this method, first imagine you have a quadratic equation of the form
J(theta) = a*theta^2 + b*theta + c
+ to find the value of theta that will give you the smallest value of J(theta) you would take the derivative of J(theta), set the derivative to 0, and then solve for theta.
+ The exact same thing is done for the cost function J(theta_0, theta_1, ..., theta_n) instead of J(theta): take the partial derivative with respect to each theta_j, set it to 0, and solve,
+ where theta = (X^T * X)^-1 * X^T * y and can be coded as ```pinv(X' * X) * X' * y```
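Put together as a short Octave sketch ('features' is an assumed name for the raw m x n input matrix); pinv is used rather than inv so the computation still returns an answer even when X^T * X is non-invertible:

```
m = length(y);
X = [ones(m, 1), features];      % prepend the x_0 = 1 intercept column
theta = pinv(X' * X) * X' * y;   % all parameters in one step: no alpha, no iterations
```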
- When to use Normal Equation vs. Gradient Descent
+ m = # of training examples, n = # of features, alpha = learning rate in gradient descent
|Method           |Pros                                        |Cons                                                          |
|:---------------:|:------------------------------------------:|:------------------------------------------------------------:|
|Gradient Descent |works well even when n is large             |need to choose alpha; needs many iterations                   |
|Normal Equation  |no need to choose alpha; no need to iterate |slow if n is very large; doesn't work for logistic regression |
+ General Rule = <b>If # of features n > 10,000, use Gradient Descent; otherwise the Normal Equation works well</b>
+ NOTE: the Normal Equation is slow even on modern (2013) computers when n > 10,000 because we must compute (X^T * X)^-1, and inverting that n x n matrix takes O(n^3) time. If n > 10^4, use gradient descent instead.