# FCN.tensorflow
Tensorflow implementation of [Fully Convolutional Networks for Semantic Segmentation](https://arxiv.org/pdf/1605.06211v1.pdf) (FCNs).
The implementation is largely based on the reference code provided by the authors of the paper [link](https://github.com/shelhamer/fcn.berkeleyvision.org). The model was applied on the Scene Parsing Challenge dataset provided by MIT [http://sceneparsing.csail.mit.edu/](http://sceneparsing.csail.mit.edu/).
1. [Prerequisites](#prerequisites)
2. [Results](#results)
3. [Observations](#observations)
4. [Useful links](#useful-links)
## Prerequisites
- The results were obtained after training for ~6-7 hrs on a 12GB TitanX.
- The code was originally written and tested with `tensorflow0.11` and `python2.7`. The `tf.summary` calls have since been updated to work with TensorFlow 0.12. To work with older versions of TensorFlow, use the branch [tf.0.11_compatible](https://github.com/shekkizh/FCN.tensorflow/tree/tf.0.11_compatible).
- Some of the problems encountered with TensorFlow 1.0 and on Windows are discussed in [Issue #9](https://github.com/shekkizh/FCN.tensorflow/issues/9).
- To train the model, simply run `python FCN.py`
- To visualize results for a random batch of images, use the flag `--mode=visualize`
- The `debug` flag can be set during training to log information about activations, gradients, variables, etc.
- The [IPython notebook](https://github.com/shekkizh/FCN.tensorflow/blob/master/logs/images/Image_Cmaped.ipynb) in the logs folder can be used to view the results in color, as shown below.
## Results
Results were obtained by training the model with a batch size of 2 on images resized to 256x256. Note that although training is done at this image size, nothing prevents the fully convolutional model from working on arbitrarily sized images. No post-processing was done on the predicted images. Training ran for 9 epochs; the short training time explains why some concepts seem semantically understood by the model while others do not. The results below are from randomly chosen images in the validation dataset.

The network design is essentially the same as in the authors' reference Caffe implementation. The weights for the newly added layers were initialized with small values, and training used the Adam optimizer (learning rate = 1e-4).
![](logs/images/inp_1.png) ![](logs/images/gt_c1.png) ![](logs/images/pred_c1.png)
![](logs/images/inp_2.png) ![](logs/images/gt_c2.png) ![](logs/images/pred_c2.png)
![](logs/images/inp_3.png) ![](logs/images/gt_c3.png) ![](logs/images/pred_c3.png)
![](logs/images/inp_4.png) ![](logs/images/gt_c4.png) ![](logs/images/pred_c4.png)
![](logs/images/inp_6.png) ![](logs/images/gt_c6.png) ![](logs/images/pred_c6.png)
## Observations
- The small batch size was necessary to fit the training model in memory, but it also explains the slow learning.
- Concepts with many training examples seem to be identified and segmented correctly: in the examples above, cars and people are identified better. I believe this can be improved by training for more epochs.
- Resizing the images also causes loss of information; you can see this in the fact that smaller objects are segmented with less accuracy.
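The information loss from resizing is easy to demonstrate. Below is a toy nearest-neighbour downsampling sketch (illustrative only; not the repo's actual preprocessing) in which a one-pixel object simply disappears from the label mask:

```python
import numpy as np

def nearest_resize(mask, out_h, out_w):
    """Nearest-neighbour downsample of a 2D label mask (toy illustration)."""
    h, w = mask.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source col for each output col
    return mask[np.ix_(rows, cols)]

mask = np.zeros((256, 256), dtype=int)
mask[101, 101] = 1                     # a one-pixel "object"
small = nearest_resize(mask, 64, 64)   # 4x downsample skips that pixel
```

Any object smaller than the sampling stride can vanish entirely, so small objects in the original image get segmented with correspondingly lower accuracy.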
![](logs/images/sparse_entropy.png)
Now for the gradients:
- If you watch the gradients closely, you will notice that the initial training happens almost entirely in the newly added layers; only after these layers are reasonably trained do the VGG layers see meaningful gradient flow. This is understandable, since changes in the new layers affect the loss objective much more in the beginning.
- The earlier layers of the network are initialized with VGG weights, so conceptually they require less tuning unless the training data is extremely varied, which in this case it is not.
- The first convolutional layer captures low-level information, and since this is entirely dataset dependent, you can see the gradients adjusting the first-layer weights to adapt the model to the dataset.
- The other VGG conv layers receive very small gradients, since the concepts they capture are already good enough for our end objective, segmentation.
- This is the core reason **transfer learning** works so well; it seemed worth pointing out here.
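The first observation above follows directly from the chain rule: the gradient flowing back into a pretrained layer is scaled by the new layer's weights, so small-initialized new layers pass almost no gradient upstream at first. A toy numpy sketch (hypothetical names, one linear layer standing in for VGG and one for the new head):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)              # toy input activation
W_vgg = rng.normal(size=(8, 8))     # stand-in for a pretrained VGG layer
w_new = rng.normal(size=8) * 1e-3   # newly added layer, small init

h = W_vgg @ x                       # forward through the "pretrained" layer
y = w_new @ h                       # forward through the "new" layer
# For loss L = 0.5 * y**2, dL/dy = y. Backprop:
g_new = y * h                       # gradient on the new layer's weights
g_h = y * w_new                     # gradient flowing back toward the VGG layer
```

Since `g_h` is proportional to `w_new` (here ~1e-3) while `g_new` is proportional to the O(1) activations `h`, the new layer's gradients dominate early training by orders of magnitude, exactly the behaviour visible in the gradient plots below.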
![](logs/images/conv_1_1_gradient.png) ![](logs/images/conv_4_1_gradient.png) ![](logs/images/conv_4_2_gradient.png) ![](logs/images/conv_4_3_gradient.png)
## Useful Links
- Video of the presentation given by the authors of the paper - [link](http://techtalks.tv/talks/fully-convolutional-networks-for-semantic-segmentation/61606/)