CS231n course materials

Stanford University CS231n course materials: convolutional neural networks and image recognition.
12/31/2016 CS231n Convolutional Neural Networks for Visual Recognition

Interpreting a linear classifier

Notice that a linear classifier computes the score of a class as a weighted sum of all of its pixel values across all 3 of its color channels. Depending on precisely what values we set for these weights, the function has the capacity to like or dislike (depending on the sign of each weight) certain colors at certain positions in the image. For instance, you can imagine that the "ship" class might be more likely if there is a lot of blue on the sides of an image (which could likely correspond to water). You might expect that the "ship" classifier would then have a lot of positive weights across its blue channel weights (presence of blue increases score of ship), and negative weights in the red/green channels (presence of red/green decreases the score of ship).

[Figure: the input image is stretched into a single column $x_i = [56, 231, 24, 2]$ and mapped through $f(x_i; W, b)$ with $W = [0.2, -0.5, 0.1, 2.0;\ 1.5, 1.3, 2.1, 0.0;\ 0, 0.25, 0.2, -0.3]$ and $b = [1.1, 3.2, -1.2]$, giving cat score -96.8, dog score 437.9, ship score 61.95.]
An example of mapping an image to class scores. For the sake of visualization, we assume the image only has 4 pixels (4 monochrome pixels, we are not considering color channels in this example for brevity), and that we have 3 classes (red (cat), green (dog), blue (ship) class). (Clarification: in particular, the colors here simply indicate 3 classes and are not related to the RGB channels.) We stretch the image pixels into a column and perform matrix multiplication to get the scores for each class. Note that this particular set of weights W is not good at all: the weights assign our cat image a very low cat score. In particular, this set of weights seems convinced that it's looking at a dog.

Analogy of images as high-dimensional points. Since the images are stretched into high-dimensional column vectors, we can interpret each image as a single point in this space (e.g. each image in CIFAR-10 is a point in 3072-dimensional space of 32x32x3 pixels).
Analogously, the entire dataset is a (labeled) set of points.

Since we defined the score of each class as a weighted sum of all image pixels, each class score is a linear function over this space. We cannot visualize 3072-dimensional spaces, but if we imagine squashing all those dimensions into only two dimensions, then we can try to visualize what the classifier might be doing:

http://cs231n.github.io/linear-classify/

[Figure: car classifier, airplane classifier, deer classifier.]
Cartoon representation of the image space, where each image is a single point, and three classifiers are visualized. Using the example of the car classifier (in red), the red line shows all points in the space that get a score of zero for the car class. The red arrow shows the direction of increase, so all points to the right of the red line have positive (and linearly increasing) scores, and all points to the left have negative (and linearly decreasing) scores.

As we saw above, every row of W is a classifier for one of the classes. The geometric interpretation of these numbers is that as we change one of the rows of W, the corresponding line in the pixel space will rotate in different directions. The biases b, on the other hand, allow our classifiers to translate the lines. In particular, note that without the bias terms, plugging in $x_i = 0$ would always give a score of zero regardless of the weights, so all lines would be forced to cross the origin.

Interpretation of linear classifiers as template matching. Another interpretation for the weights W is that each row of W corresponds to a template (or sometimes also called a prototype) for one of the classes. The score of each class for an image is then obtained by comparing each template with the image using an inner product (or dot product) one by one to find the one that fits "best". With this terminology, the linear classifier is doing template matching, where the templates are learned.
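The template-matching view can be sketched in a few lines of NumPy: each row of W is treated as a template and scored against the image with an inner product. This is only an illustration. The values of W, b and x are assumed from the toy cat/dog/ship figure in these notes (the signs in particular are an assumption), and the name `class_names` is hypothetical.

```python
import numpy as np

# Assumed toy values (4-pixel image, 3 classes); the signs are an assumption.
W = np.array([[0.2, -0.5, 0.1, 2.0],    # "cat" template
              [1.5, 1.3, 2.1, 0.0],     # "dog" template
              [0.0, 0.25, 0.2, -0.3]])  # "ship" template
b = np.array([1.1, 3.2, -1.2])
x = np.array([56.0, 231.0, 24.0, 2.0])  # image stretched into a vector
class_names = ["cat", "dog", "ship"]

# One inner product per template, then add the bias.
scores = np.array([template.dot(x) for template in W]) + b

# The same computation as a single matrix multiply.
assert np.allclose(scores, W.dot(x) + b)

best = class_names[int(np.argmax(scores))]
for name, s in zip(class_names, scores):
    print(f"{name}: {s:.2f}")
print("best match:", best)
```

With these assumed values the cat score comes out at about -96.8 and the dog score at about 437.9, so the best-matching template is "dog", in line with the remark that this W seems convinced it is looking at a dog. The ship score computes to about 60.75 under these assumptions, slightly different from the 61.95 quoted in the notes.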
Another way to think of it is that we are still effectively doing Nearest Neighbor, but instead of having thousands of training images we are only using a single image per class (although we will learn it, and it does not necessarily have to be one of the images in the training set), and we use the (negative) inner product as the distance instead of the L1 or L2 distance.

[Figure: learned weight templates for the CIFAR-10 classes, including cat, deer, horse, ship, and truck.]
Skipping ahead a bit: Example learned weights at the end of learning for CIFAR-10. Note that, for example, the ship template contains a lot of blue pixels as expected. This template will therefore give a high score once it is matched against images of ships on the ocean with an inner product.

Additionally, note that the horse template seems to contain a two-headed horse, which is due to both left- and right-facing horses in the dataset. The linear classifier merges these two modes of horses in the data into a single template. Similarly, the car classifier seems to have merged several modes into a single template which has to identify cars from all sides and of all colors. In particular, this template ended up being red, which hints that there are more red cars in the CIFAR-10 dataset than of any other color. The linear classifier is too weak to properly account for different-colored cars, but as we will see later neural networks will allow us to perform this task. Looking ahead a bit, a neural network will be able to develop intermediate neurons in its hidden layers that could detect specific car types (e.g. green car facing left, blue car facing front, etc.), and neurons on the next layer could combine these into a more accurate car score through a weighted sum of the individual car detectors.

Bias trick. Before moving on we want to mention a common simplifying trick for representing the two parameters W, b as one.
Recall that we defined the score function as:

$$f(x_i, W, b) = W x_i + b$$

As we proceed through the material it is a little cumbersome to keep track of two sets of parameters (the biases b and weights W) separately. A commonly used trick is to combine the two sets of parameters into a single matrix that holds both of them by extending the vector $x_i$ with one additional dimension that always holds the constant 1, a default bias dimension. With the extra dimension, the new score function will simplify to a single matrix multiply:

$$f(x_i, W) = W x_i$$

With our CIFAR-10 example, $x_i$ is now [3073 x 1] instead of [3072 x 1] (with the extra dimension holding the constant 1), and W is now [10 x 3073] instead of [10 x 3072]. The extra column that W now has corresponds to the bias b. An illustration might help clarify:

[Figure: on the left, $W x_i + b$ with separate W and b; on the right, the new, single W with b as an extra column and a constant 1 appended to $x_i$.]
Illustration of the bias trick. Doing a matrix multiplication and then adding a bias vector (left) is equivalent to adding a bias dimension with a constant of 1 to all input vectors and extending the weight matrix by 1 column, a bias column (right). Thus, if we preprocess our data by appending ones to all vectors we only have to learn a single matrix of weights instead of two matrices that hold the weights and the biases.

Image data preprocessing. As a quick note, in the examples above we used the raw pixel values (which range from [0...255]). In Machine Learning, it is a very common practice to always perform normalization of your input features (in the case of images, every pixel is thought of as a feature). In particular, it is important to center your data by subtracting the mean from every feature.
In the case of images, this corresponds to computing a mean image across the training images and subtracting it from every image to get images where the pixels range from approximately [-127 ... 127]. Further common preprocessing is to scale each input feature so that its values range from [-1, 1]. Of these, zero mean centering is arguably more important, but we will have to wait for its justification until we understand the dynamics of gradient descent.

Loss function

In the previous section we defined a function from the pixel values to class scores, which was parameterized by a set of weights W. Moreover, we saw that we don't have control over the data $(x_i, y_i)$ (it is fixed and given), but we do have control over these weights and we want to set them so that the predicted class scores are consistent with the ground truth labels in the training data.

For example, going back to the example image of a cat and its scores for the classes "cat", "dog" and "ship", we saw that the particular set of weights in that example was not very good at all: we fed in the pixels that depict a cat but the cat score came out very low (-96.8) compared to the other classes (dog score 437.9 and ship score 61.95). We are going to measure our unhappiness with outcomes such as this one with a loss function (or sometimes also referred to as the cost function or the objective). Intuitively, the loss will be high if we're doing a poor job of classifying the training data, and it will be low if we're doing well.

Multiclass Support Vector Machine loss

There are several ways to define the details of the loss function. As a first example we will develop a commonly used loss called the Multiclass Support Vector Machine (SVM) loss.
The SVM loss is set up so that the SVM "wants" the correct class for each image to have a score higher than the incorrect classes by some fixed margin $\Delta$. Notice that it's sometimes helpful to anthropomorphise the loss functions as we did above: the SVM "wants" a certain outcome in the sense that the outcome would yield a lower loss (which is good).

Let's now get more precise. Recall that for the i-th example we are given the pixels of image $x_i$ and the label $y_i$ that specifies the index of the correct class. The score function takes the pixels and computes the vector $f(x_i, W)$ of class scores, which we will abbreviate to $s$ (short for scores). For example, the score for the j-th class is the j-th element: $s_j = f(x_i, W)_j$. The Multiclass SVM loss for the i-th example is then formalized as follows:

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$

Example. Let's unpack this with an example to see how it works. Suppose that we have three classes that receive the scores $s = [13, -7, 11]$, and that the first class is the true class (i.e. $y_i = 0$). Also assume that $\Delta$ (a hyperparameter we will go into more detail about soon) is 10. The expression above sums over all incorrect classes ($j \neq y_i$), so we get two terms:

$$L_i = \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10)$$

You can see that the first term gives zero since [-7 - 13 + 10] gives a negative number, which is then thresholded to zero with the $\max(0, \cdot)$ function. We get zero loss for this pair because the correct class score (13) was greater than the incorrect class score (-7) by at least the margin 10. In fact the difference was 20, which is much greater than 10, but the SVM only cares that the difference is at least 10: any additional difference above the margin is clamped at zero with the max operation. The second term computes [11 - 13 + 10], which gives 8. That is, even though the correct class had a higher score than the incorrect class (13 > 11), it was not greater by the desired margin of 10. The difference was only 2, which is why the loss comes out to 8 (i.e. how much higher the difference would have to be to meet the margin). In summary, the SVM loss function wants the score of the correct class $y_i$ to be larger than the incorrect class scores by at least $\Delta$ (delta). If this is not the case, we will accumulate loss.

Note that in this particular module we are working with linear score functions ($f(x_i, W) = W x_i$), so we can also rewrite the loss function in this equivalent form:

$$L_i = \sum_{j \neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta)$$

where $w_j$ is the j-th row of W reshaped as a column. However, this will not necessarily be the case once we start to consider more complex forms of the score function f.

A last piece of terminology we'll mention before we finish with this section is that the threshold at zero $\max(0, \cdot)$ function is often called the hinge loss. You'll sometimes hear about people instead using the squared hinge loss SVM (or L2-SVM), which uses the form $\max(0, \cdot)^2$ that penalizes violated margins more strongly (quadratically instead of linearly). The unsquared version is more standard, but in some datasets the squared hinge loss can work better. This can be determined during cross-validation.

The loss function quantifies our unhappiness with predictions on the training set.

[Figure: a score axis showing the score for the correct class, the margin delta below it, and the scores for other classes.]
The Multiclass Support Vector Machine "wants" the score of the correct class to be higher than all other scores by at least a margin of delta. If any class has a score inside the red region (or higher), then there will be accumulated loss. Otherwise the loss will be zero. Our objective will be to find the weights that will simultaneously satisfy this constraint for all examples in the training data and give a total loss that is as low as possible.

Regularization. There is one bug with the loss function we presented above. Suppose that we have a dataset and a set of parameters W that correctly classify every example (i.e. all scores are so that all the margins are met, and $L_i = 0$ for all i). The issue is that this set of W is not necessarily unique: there might be many similar W that correctly classify the examples. One easy way to see this is that if some parameters W correctly classify all examples (so loss is zero for each example), then any multiple of these parameters $\lambda W$ where $\lambda > 1$ will also give zero loss, because this transformation uniformly stretches all score magnitudes and hence also their absolute differences. For example, if the difference in scores between a correct class and a nearest incorrect class was 15, then multiplying all elements of W by 2 would make the new difference 30.

In other words, we wish to encode some preference for a certain set of weights W over others to remove this ambiguity. We can do so by extending the loss function with a regularization penalty $R(W)$. The most common regularization penalty is the L2 norm that discourages large weights through an elementwise quadratic penalty over all parameters:

$$R(W) = \sum_k \sum_l W_{k,l}^2$$

In the expression above, we are summing up all the squared elements of W. Notice that the regularization function is not a function of the data, it is only based on the weights. Including the regularization penalty completes the full Multiclass Support Vector Machine loss, which is made up of two components: the data loss (which is the average loss $L_i$ over all examples) and the regularization loss. That is, the full Multiclass SVM loss becomes:

$$L = \underbrace{\frac{1}{N} \sum_i L_i}_{\text{data loss}} + \underbrace{\lambda R(W)}_{\text{regularization loss}}$$

Or expanding this out in its full form:

$$L = \frac{1}{N} \sum_i \sum_{j \neq y_i} \left[ \max(0, f(x_i, W)_j - f(x_i, W)_{y_i} + \Delta) \right] + \lambda \sum_k \sum_l W_{k,l}^2$$

where N is the number of training examples. As you can see, we append the regularization penalty to the loss objective, weighted by a hyperparameter $\lambda$.
There is no simple way of setting this hyperparameter and it is usually determined by cross-validation.

In addition to the motivation we provided above there are many desirable properties to include the regularization penalty, many of which we will come back to in later sections. For example, it turns out that including the L2 penalty leads to the appealing max margin property in SVMs (see CS229 lecture notes for full details if you are interested).

The most appealing property is that penalizing large weights tends to improve generalization, because it means that no input dimension can have a very large influence on the scores all by itself. For example, suppose that we have some input vector $x = [1, 1, 1, 1]$ and two weight vectors $w_1 = [1, 0, 0, 0]$, $w_2 = [0.25, 0.25, 0.25, 0.25]$. Then $w_1^T x = w_2^T x = 1$, so both weight vectors lead to the same dot product, but the L2 penalty of $w_1$ is 1.0 while the L2 penalty of $w_2$ is only 0.25. Therefore, according to the L2 penalty the weight vector $w_2$ would be preferred since it achieves a lower regularization loss. Intuitively, this is because the weights in $w_2$ are smaller and more diffuse. Since the L2 penalty prefers smaller and more diffuse weight vectors, the final classifier is encouraged to take into account all input dimensions to small amounts rather than a few input dimensions very strongly. As we will see later in the class, this effect can improve the generalization performance of the classifiers on test images and lead to less overfitting.

Note that biases do not have the same effect since, unlike the weights, they do not control the strength of influence of an input dimension. Therefore, it is common to only regularize the weights W but not the biases b. However, in practice this often turns out to have a negligible effect.
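The comparison of the two weight vectors above can be checked numerically. The following is a minimal sketch; the helper name `l2_penalty` is hypothetical and simply sums the squared elements, which is how the L2 penalty is described in these notes.

```python
import numpy as np

def l2_penalty(w):
    """Sum of squared elements (the L2 regularization penalty)."""
    return float(np.sum(np.square(w)))

x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

# Both weight vectors give the same dot product with x ...
dot1, dot2 = float(w1.dot(x)), float(w2.dot(x))
# ... but the more diffuse vector w2 has the smaller penalty.
pen1, pen2 = l2_penalty(w1), l2_penalty(w2)

print(dot1, dot2)  # 1.0 1.0
print(pen1, pen2)  # 1.0 0.25
```

Because the data loss sees only the dot products, it cannot tell the two vectors apart; only the penalty term separates them.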
Lastly, note that due to the regularization penalty we can never achieve loss of exactly 0.0 on all examples, because this would only be possible in the pathological setting of W = 0.

Code. Here is the loss function (without regularization) implemented in Python, in both unvectorized and half-vectorized form:

import numpy as np

def L_i(x, y, W):
    """
    unvectorized version. Compute the multiclass svm loss for a single example (x, y)
    - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
      with an appended bias dimension in the 3073-rd position (i.e. bias trick)
    - y is an integer giving index of correct class (e.g. between 0 and 9 in CIFAR-10)
    - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
    """
    delta = 1.0  # see notes about delta later in this section
    scores = W.dot(x)  # scores becomes of size 10 x 1, the scores for each class
    correct_class_score = scores[y]
    D = W.shape[0]  # number of classes, e.g. 10
    loss_i = 0.0
    for j in range(D):  # iterate over all wrong classes
        if j == y:
            # skip for the true class to only loop over incorrect classes
            continue
        # accumulate loss for the i-th example
        loss_i += max(0, scores[j] - correct_class_score + delta)
    return loss_i

def L_i_vectorized(x, y, W):
    """
    A faster half-vectorized implementation. half-vectorized
    refers to the fact that for a single example the implementation contains
    no for loops, but there is still one loop over the examples (outside this function)
    """
    delta = 1.0
    scores = W.dot(x)
    # compute the margins for all classes in one vector operation
    margins = np.maximum(0, scores - scores[y] + delta)
    # on y-th position scores[y] - scores[y] canceled and gave delta. We want
    # to ignore the y-th position and only consider margin on max wrong class
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
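As a follow-up to the two functions above, here is one possible fully vectorized sketch that also adds the L2 regularization term from the full loss. It is an illustration, not the course's reference solution: the function name `svm_loss`, the argument `lam` (standing in for the hyperparameter lambda), and the convention of storing one example per column of X are all assumptions made for this example.

```python
import numpy as np

def svm_loss(W, X, y, delta=1.0, lam=0.0):
    """
    Hypothetical fully vectorized multiclass SVM loss.
    - W: weights, shape (C, D)
    - X: data, shape (D, N), one column per example
    - y: integer labels, shape (N,)
    - lam: regularization strength (lambda in the notes)
    Returns the mean data loss plus lam * sum(W**2).
    """
    N = X.shape[1]
    scores = W.dot(X)                      # (C, N) class scores
    correct = scores[y, np.arange(N)]      # (N,) correct-class scores
    margins = np.maximum(0, scores - correct + delta)
    margins[y, np.arange(N)] = 0           # ignore the correct class
    data_loss = margins.sum() / N
    reg_loss = lam * np.sum(W * W)
    return data_loss + reg_loss

# Check against the worked example: scores [13, -7, 11], true class 0, delta 10.
# With W = identity and x equal to the score vector, W.dot(x) gives those scores.
W = np.eye(3)
X = np.array([[13.0], [-7.0], [11.0]])
y = np.array([0])
print(svm_loss(W, X, y, delta=10.0))           # 8.0
print(svm_loss(W, X, y, delta=10.0, lam=0.5))  # 8.0 + 0.5 * 3 = 9.5
```

The first call reproduces the loss of 8 from the worked example; the second shows the regularization term being added on top of the data loss.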