A Basic Introduction to Separable Convolutions

Unlike spatial separable convolutions, depthwise separable convolutions work with kernels that cannot be factored into two smaller kernels. Hence, it is more commonly used. This is the type of separable convolution seen in keras.layers.SeparableConv2D or tf.layers.separable_conv2d.

The depthwise separable convolution is so named because it deals not just with the spatial dimensions, but with the depth dimension (the number of channels) as well. An input image may have 3 channels: RGB. After a few convolutions, an image may have many channels. You can imagine each channel as a particular interpretation of that image; for example, the "red" channel interprets the "redness" of each pixel, the "blue" channel interprets the "blueness" of each pixel, and the "green" channel interprets the "greenness" of each pixel. An image with 64 channels has 64 different interpretations of that image.

Similar to the spatial separable convolution, a depthwise separable convolution splits a kernel into 2 separate kernels that do two convolutions: the depthwise convolution and the pointwise convolution. But first of all, let's see how a normal convolution works.

Normal Convolution

If you don't know how a convolution works from a 2-D perspective, read this article or check out this site. A typical image, however, is not 2-D; it also has depth as well as width and height. Let us assume that we have an input image of 12x12x3 pixels, an RGB image of size 12x12.

Let's do a 5x5 convolution on the image with no padding and a stride of 1. If we only consider the width and height of the image, the convolution process is kind of like this: 12x12 - (5x5) -> 8x8. The 5x5 kernel undergoes scalar multiplication with every group of 25 pixels, giving out 1 number every time. We end up with an 8x8 pixel image, since there is no padding (12 - 5 + 1 = 8).

However, because the image has 3 channels, our convolutional kernel needs to have 3 channels as well. This means, instead of doing 5x5 = 25 multiplications, we actually do 5x5x3 = 75 multiplications every time the kernel moves. Just like the 2-D interpretation, we do scalar matrix multiplication on every 25 pixels, outputting 1 number. After going through a 5x5x3 kernel, the 12x12x3 image will become an 8x8x1 image.

Image 4: A normal convolution with 8x8x1 output

What if we want to increase the number of channels in our output image? What if we want an output of size 8x8x256? Well, we can create 256 kernels to create 256 8x8x1 images, then stack them up together to create an 8x8x256 image output.

Image 5: A normal convolution with 8x8x256 output

This is how a normal convolution works. I like to think of it like a function: 12x12x3 - (5x5x3x256) -> 8x8x256 (where 5x5x3x256 represents the height, width, number of input channels, and number of output channels of the kernel). Note that this is not matrix multiplication; we're not multiplying the whole image by the kernel, but moving the kernel through every part of the image and multiplying small parts of it separately.
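To make the shapes concrete, here is a minimal sketch of the normal convolution described above, assuming TensorFlow 2.x and the Keras API (the layer call below is my own illustration, not code from the article):

```python
# Minimal sketch (assuming TensorFlow 2.x / Keras) of the normal convolution:
# a 12x12x3 input, 5x5 kernels, no padding, stride 1, 256 output channels.
import tensorflow as tf

inputs = tf.keras.Input(shape=(12, 12, 3))   # 12x12 RGB image
normal = tf.keras.layers.Conv2D(
    filters=256,        # 256 kernels -> 256 output channels
    kernel_size=5,      # each kernel is 5x5x3: its depth matches the input,
                        # so every kernel position costs 5*5*3 = 75 multiplications
    strides=1,
    padding="valid",    # no padding: 12 - 5 + 1 = 8
)(inputs)

print(normal.shape)     # (None, 8, 8, 256)
```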
A depthwise separable convolution separates this process into 2 parts: a depthwise convolution and a pointwise convolution.

Part 1: Depthwise Convolution

In the first part, the depthwise convolution, we give the input image a convolution without changing the depth. We do so by using 3 kernels of shape 5x5x1.

Video 1: Iterating 3 kernels through a 3-channel image

Image 6: Depthwise convolution, using 3 kernels to transform a 12x12x3 image into an 8x8x3 image

Each 5x5x1 kernel iterates over 1 channel of the image (note: 1 channel, not all channels), getting the scalar products of every 25-pixel group and giving out an 8x8x1 image. Stacking these images together creates an 8x8x3 image.

Part 2: Pointwise Convolution

Remember, the original convolution transformed a 12x12x3 image into an 8x8x256 image. Currently, the depthwise convolution has transformed the 12x12x3 image into an 8x8x3 image. Now, we need to increase the number of channels of each image.

The pointwise convolution is so named because it uses a 1x1 kernel, or a kernel that iterates through every single point. This kernel has a depth of however many channels the input image has; in our case, 3. Therefore, we iterate a 1x1x3 kernel through our 8x8x3 image to get an 8x8x1 image.

Image 7: Pointwise convolution, transforming an image of 3 channels into an image of 1 channel

We can create 256 1x1x3 kernels that each output an 8x8x1 image to get a final image of shape 8x8x256.

Image 8: Pointwise convolution with 256 kernels, outputting an image with 256 channels

And that's it! We've separated the convolution into 2: a depthwise convolution and a pointwise convolution. In a more abstract way, if the original convolution function is 12x12x3 - (5x5x3x256) -> 8x8x256, we can illustrate this new convolution as 12x12x3 - (5x5x1x3) -> 8x8x3 - (1x1x3x256) -> 8x8x256.

Alright, but what's the point of creating a depthwise separable convolution? Let's calculate the number of multiplications the computer has to do in the original convolution. There are 256 5x5x3 kernels that move 8x8 times. That's 256x3x5x5x8x8 = 1,228,800 multiplications.

What about the separable convolution? In the depthwise convolution, we have 3 5x5x1 kernels that move 8x8 times. That's 3x5x5x8x8 = 4,800 multiplications. In the pointwise convolution, we have 256 1x1x3 kernels that move 8x8 times. That's 256x1x1x3x8x8 = 49,152 multiplications. Adding them up together, that's 53,952 multiplications.

53,952 is a lot less than 1,228,800. With fewer computations, the network is able to process more in a shorter amount of time.

How does that work, though? The first time I came across this explanation, it didn't really make sense to me intuitively. Aren't the two convolutions doing the same thing? In both cases, we pass the image through a 5x5 kernel, shrink it down to one channel, then expand it to 256 channels. How come one is more than twice as fast as the other?

After pondering it for some time, I realized that the main difference is this: in the normal convolution, we are transforming the image 256 times, and every transformation uses up 5x5x3x8x8 = 4,800 multiplications. In the separable convolution, we only really transform the image once, in the depthwise convolution. Then, we take the transformed image and simply elongate it to 256 channels. Without having to transform the image over and over again, we save on computational power.
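As a quick check on the shapes and multiplication counts above, here is a minimal sketch, again assuming TensorFlow 2.x / Keras; the explicit two-layer version is my own illustration of the same idea, not code from the article:

```python
# Minimal sketch (assuming TensorFlow 2.x / Keras) of the depthwise separable
# convolution: either the fused SeparableConv2D layer, or an explicit
# DepthwiseConv2D followed by a 1x1 (pointwise) Conv2D.
import tensorflow as tf

inputs = tf.keras.Input(shape=(12, 12, 3))

# Single-layer form, as mentioned in the text.
fused = tf.keras.layers.SeparableConv2D(filters=256, kernel_size=5, padding="valid")(inputs)

# Explicit two-step form.
depthwise = tf.keras.layers.DepthwiseConv2D(kernel_size=5, padding="valid")(inputs)  # 8x8x3
pointwise = tf.keras.layers.Conv2D(filters=256, kernel_size=1)(depthwise)            # 8x8x256

print(fused.shape, pointwise.shape)            # both (None, 8, 8, 256)

# Multiplication counts from the text.
normal_mults    = 256 * 5 * 5 * 3 * 8 * 8      # 1,228,800
depthwise_mults = 3 * 5 * 5 * 8 * 8            # 4,800
pointwise_mults = 256 * 1 * 1 * 3 * 8 * 8      # 49,152
print(depthwise_mults + pointwise_mults, "vs", normal_mults)   # 53952 vs 1228800
```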
It's worth noting that in both Keras and TensorFlow, there is an argument called the "depth multiplier". It is set to 1 by default. By changing this argument, we can change the number of output channels in the depthwise convolution. For example, if we set the depth multiplier to 2, each 5x5x1 kernel will give out an output image of 8x8x2, making the total (stacked) output of the depthwise convolution 8x8x6 instead of 8x8x3. Some may choose to manually set the depth multiplier to increase the number of parameters in their neural net so that it can better learn more traits.

Are there disadvantages to a depthwise separable convolution? Definitely! Because it reduces the number of parameters in a convolution, if your network is already small, you might end up with too few parameters and your network might fail to properly learn during training. If used properly, however, it manages to enhance efficiency without significantly reducing effectiveness, which makes it a quite popular choice.

1x1 Kernels

Finally, because pointwise convolutions use the concept, I'd like to touch upon the usages of a 1x1 kernel. A 1x1 kernel, or rather n 1x1xm kernels where n is the number of output channels and m is the number of input channels, can be used outside of separable convolutions. One obvious purpose of a 1x1 kernel is to increase or reduce the depth of an image. If you find that your convolution has too many or too few channels, a 1x1 kernel can help balance it out.

For me, however, the main purpose of a 1x1 kernel is to apply non-linearity. After every layer of a neural network, we can apply an activation layer. Whether it be ReLU, PReLU, Softmax, or another, activation layers are non-linear, unlike convolution layers: a linear combination of lines is still a line. Non-linear layers expand the possibilities for the model, which is what generally makes a "deep" network better than a "wide" network. In order to increase the number of non-linear layers without significantly increasing the number of parameters and computations, we can apply a 1x1 kernel and add an activation layer after it. This helps give the network an added layer of depth.
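Here is a minimal sketch of both of these points, assuming TensorFlow 2.x / Keras (the choice of 64 channels in the 1x1 layer is an arbitrary illustrative value, not from the article):

```python
# Minimal sketch (assuming TensorFlow 2.x / Keras) of the depth multiplier and
# of a 1x1 convolution used to change depth and add a cheap non-linearity.
import tensorflow as tf

inputs = tf.keras.Input(shape=(12, 12, 3))

# depth_multiplier=2: each 5x5x1 kernel now produces 2 output channels,
# so the stacked depthwise output is 8x8x6 instead of 8x8x3.
dw = tf.keras.layers.DepthwiseConv2D(kernel_size=5, depth_multiplier=2, padding="valid")(inputs)
print(dw.shape)           # (None, 8, 8, 6)

# A 1x1 kernel changes the depth (here 6 -> 64) and, followed by an activation,
# adds a non-linear layer at very little extra cost.
bottleneck = tf.keras.layers.Conv2D(filters=64, kernel_size=1, activation="relu")(dw)
print(bottleneck.shape)   # (None, 8, 8, 64)
```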
