# Description
This repo contains an ICNet implementation in PyTorch, based on the [paper](https://arxiv.org/abs/1704.08545) by Hengshuang Zhao et al. (ECCV 2018).
Training and evaluation are done on the [Cityscapes dataset](https://www.cityscapes-dataset.com/) by default.
# Requirements
Python 3.6 or later, with the following packages (install via `pip3 install -r requirements.txt`):
- torch==1.1.0
- torchsummary==1.5.1
- torchvision==0.3.0
- numpy==1.17.0
- Pillow==6.0.0
- PyYAML==5.1.2
# Updates
- 2019.11.15: changed `crop_size` to 960; the best mIoU increased to 71.0%. Training took about 2 days. Download [icnet_resnet50_197_0.710_best_model.pth]()
# Performance
| Method | mIoU(%) | Time(ms) | FPS | Memory(GB)| GPU |
|:---:|:---:|:---:|:---:|:---:|:---:|
| ICNet (paper) | 67.7 | 33 | 30.3 | **1.6** | TitanX |
| ICNet (ours) | **71.0** | **19** | **52.6** | 1.86 | GTX 1080Ti |
- Results are based on the Cityscapes dataset: trained only on the training set, evaluated on the validation set, using a single GTX 1080Ti card, with a test-phase input size of 2048x1024x3.
- For the performance of the original paper, see Table 2 in the [paper](https://arxiv.org/abs/1704.08545).
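The inference-speed numbers above can be reproduced with a simple timing loop. Below is a minimal sketch of how such a measurement might look; `measure_fps` is a hypothetical helper (not part of this repo), and the warmup/iteration counts are illustrative. Note the CUDA synchronization calls: GPU kernels launch asynchronously, so timing without them would be misleading.

```python
import time
import torch

def measure_fps(model, size=(1, 3, 1024, 2048), warmup=5, iters=20):
    """Rough FPS measurement at the full Cityscapes test resolution."""
    x = torch.randn(size)
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):        # warmup runs: exclude one-time setup cost
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()    # wait for pending GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```

For a fair comparison with the table, the input tensor and model should both live on the GPU (e.g. `x = x.cuda()`, `model.cuda()`).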
# Demo
|image|predict|
|:---:|:---:|
|![src](https://github.com/liminn/ICNet/raw/master/demo/frankfurt_000001_057181_leftImg8bit_src.png)|![predict](https://github.com/liminn/ICNet/raw/master/demo/frankfurt_000001_057181_leftImg8bit_mIoU_0.716.png)|
|![src](https://github.com/liminn/ICNet/raw/master/demo/lindau_000005_000019_leftImg8bit_src.png)|![predict](https://github.com/liminn/ICNet/raw/master/demo/lindau_000005_000019_leftImg8bit_mIoU_0.700.png) |
|![src](https://github.com/liminn/ICNet/raw/master/demo/munster_000061_000019_leftImg8bit_src.png)|![predict](https://github.com/liminn/ICNet/raw/master/demo/munster_000061_000019_leftImg8bit_mIoU_0.692.png) |
|![src](https://github.com/liminn/ICNet/raw/master/demo/munster_000075_000019_leftImg8bit_src.png)|![predict](https://github.com/liminn/ICNet/raw/master/demo/munster_000075_000019_leftImg8bit_mIoU_0.690.png) |
|![src](https://github.com/liminn/ICNet/raw/master/demo/munster_000106_000019_leftImg8bit_src.png)|![predict](https://github.com/liminn/ICNet/raw/master/demo/munster_000106_000019_leftImg8bit_mIoU_0.690.png) |
|![src](https://github.com/liminn/ICNet/raw/master/demo/munster_000121_000019_leftImg8bit_src.png)|![predict](https://github.com/liminn/ICNet/raw/master/demo/munster_000121_000019_leftImg8bit_mIoU_0.678.png) |
|![src](https://github.com/liminn/ICNet/raw/master/demo/munster_000124_000019_leftImg8bit_src.png)|![predict](https://github.com/liminn/ICNet/raw/master/demo/munster_000124_000019_leftImg8bit_mIoU_0.695.png) |
- All input images come from the Cityscapes validation set; see the `demo/` directory for more demo results.
# Usage
## Training
First, modify the configuration in the `configs/icnet.yaml` file:
```yaml
### 3.Training
train:
  specific_gpu_num: "1"   # for example: "0", "1" or "0, 1"
  train_batch_size: 7     # adjust according to gpu resources
  cityscapes_root: "/home/datalab/ex_disk1/open_dataset/Cityscapes/"
  ckpt_dir: "./ckpt/"     # ckpt and training log will be saved here
```
Then, run: `python3 train.py`
## Evaluation
First, modify the configuration in the `configs/icnet.yaml` file:
```yaml
### 4.Test
test:
  ckpt_path: "./ckpt/icnet_resnet50_197_0.710_best_model.pth"  # set the pretrained model path correctly
```
Then, run: `python3 evaluate.py`
# Discussion
![ICNet](https://github.com/liminn/ICNet/raw/master/ICNet.png)
The structure of ICNet is mainly composed of `sub4`, `sub2`, `sub1` and `head`:
- `sub4`: basically a `pspnet`; the biggest difference is a modified `pyramid pooling module`.
- `sub2`: the first three stages of convolutional layers of `sub4`; `sub2` and `sub4` share these layers.
- `sub1`: three consecutive strided convolutional layers that quickly downsample the original large-size input image.
- `head`: fuses the outputs of the three cascaded branches (`sub4`, `sub2` and `sub1`) through the `CFF` module. Finally, a 1x1 convolution and interpolation produce the output.
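The Cascade Feature Fusion (`CFF`) step used by `head` can be sketched as follows. This is a minimal PyTorch sketch of the idea from the paper (upsample the low-resolution branch, refine it with a dilated 3x3 convolution, project the high-resolution branch with a 1x1 convolution, then sum); the class and argument names are my own, and may differ from this repo's actual code in `models/icnet.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeFeatureFusion(nn.Module):
    """Fuse a low-resolution feature map with a higher-resolution one."""

    def __init__(self, low_channels, high_channels, out_channels, n_classes):
        super().__init__()
        # Dilated 3x3 conv on the upsampled low-res branch (enlarges receptive field).
        self.conv_low = nn.Sequential(
            nn.Conv2d(low_channels, out_channels, 3, padding=2, dilation=2, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # 1x1 projection so the high-res branch matches out_channels.
        self.conv_high = nn.Sequential(
            nn.Conv2d(high_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # Auxiliary classifier on the low-res branch, used for a training-time loss.
        self.conv_low_cls = nn.Conv2d(out_channels, n_classes, 1)

    def forward(self, x_low, x_high):
        x_low = F.interpolate(x_low, size=x_high.shape[2:],
                              mode="bilinear", align_corners=True)
        x_low = self.conv_low(x_low)
        fused = F.relu(x_low + self.conv_high(x_high), inplace=True)
        aux = self.conv_low_cls(x_low)  # auxiliary logits for deep supervision
        return fused, aux
```

`head` applies this fusion twice (`sub4`+`sub2`, then the result with `sub1`) before the final 1x1 classifier and upsampling.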
During training, I found that the `pyramid pooling module` in `sub4` is very important: it significantly improves the performance of this lightweight network.
The most important thing in the data preprocessing phase is to set `crop_size` reasonably: it should be as close as possible to the input size of the prediction phase. Here are my experiments:
- With `base_size` set to 520 (the shorter side of each image is randomly resized to between 520x0.5 and 520x2) and `crop_size` set to 480 (a random 480x480 patch is cropped for training), the final best mIoU was 66.7%.
- With `base_size` set to 1024 (the shorter side is randomly resized to between 1024x0.5 and 1024x2) and `crop_size` set to 720 (a random 720x720 patch is cropped for training), the final best mIoU was 69.9%.
- Because our target dataset is Cityscapes, where the image size is 2048x1024, the larger `crop_size` (720x720) works better. I have not yet tried an even larger `crop_size` (such as 960x960 or 1024x1024), because it would force a very small batch size and be very time-consuming; in addition, the current mIoU is already high. Still, I believe a larger `crop_size` would bring a higher mIoU.
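The random-scale-then-random-crop preprocessing described above can be sketched with Pillow (which is already in the requirements). This is an illustrative helper, not this repo's exact code: the function name, the zero-padding of images, and the use of 255 as the ignore label for mask padding are my assumptions.

```python
import random
from PIL import Image, ImageOps

def random_scale_and_crop(img, mask, base_size=1024, crop_size=720):
    # Random scale: the shorter side ends up in [0.5, 2.0] x base_size.
    short_size = random.randint(int(base_size * 0.5), int(base_size * 2.0))
    w, h = img.size
    if w < h:
        ow, oh = short_size, int(round(h * short_size / w))
    else:
        ow, oh = int(round(w * short_size / h)), short_size
    img = img.resize((ow, oh), Image.BILINEAR)
    mask = mask.resize((ow, oh), Image.NEAREST)  # nearest keeps label ids intact
    # Pad if the rescaled image is smaller than the crop window
    # (image padded with 0, mask with an assumed ignore label of 255).
    pad_w, pad_h = max(crop_size - ow, 0), max(crop_size - oh, 0)
    if pad_w or pad_h:
        img = ImageOps.expand(img, border=(0, 0, pad_w, pad_h), fill=0)
        mask = ImageOps.expand(mask, border=(0, 0, pad_w, pad_h), fill=255)
        ow, oh = img.size
    # Random crop_size x crop_size patch, identical window for image and mask.
    x1 = random.randint(0, ow - crop_size)
    y1 = random.randint(0, oh - crop_size)
    box = (x1, y1, x1 + crop_size, y1 + crop_size)
    return img.crop(box), mask.crop(box)
```

The key points are that the mask must use nearest-neighbour resampling (bilinear would blend label ids) and that the image and mask must share the same crop window.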
In addition, I found a small training technique that improves the performance of the model:
- set the learning rate of `sub4` to the original initial learning rate (0.01), because it has pretrained backbone weights;
- set the learning rate of `sub1` and `head` to 10x the initial learning rate (0.1), because they have no pretrained weights.
This small training technique is really effective: it improves mIoU by 1-2 percentage points.
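In PyTorch this per-branch learning rate is expressed with optimizer parameter groups. A minimal sketch follows; `_Dummy` is a stand-in with the same top-level branches as described above, and the real model's attribute names may differ from `sub4`/`sub1`/`head`.

```python
import torch
import torch.nn as nn

class _Dummy(nn.Module):
    """Stand-in model exposing the three branches discussed above."""
    def __init__(self):
        super().__init__()
        self.sub4 = nn.Conv2d(3, 8, 3)   # pretrained backbone branch
        self.sub1 = nn.Conv2d(3, 8, 3)   # trained from scratch
        self.head = nn.Conv2d(16, 19, 1) # trained from scratch

model = _Dummy()
base_lr = 0.01
optimizer = torch.optim.SGD(
    [
        # sub4 inherits the default lr (0.01), since it starts from pretrained weights.
        {"params": model.sub4.parameters()},
        # sub1 and head train from scratch, so they get 10x the learning rate.
        {"params": model.sub1.parameters(), "lr": base_lr * 10},
        {"params": model.head.parameters(), "lr": base_lr * 10},
    ],
    lr=base_lr, momentum=0.9, weight_decay=1e-4,
)
```

Any learning-rate scheduler applied on top of this optimizer will scale each group's rate, preserving the 1:10 ratio throughout training.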
Any questions, or any mistakes of mine, can be reported as feedback; I will reply as soon as possible.
# Reference
- [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
- [awesome-semantic-segmentation-pytorch](https://github.com/Tramac/awesome-semantic-segmentation-pytorch)
- [Human-Segmentation-PyTorch](https://github.com/thuyngch/Human-Segmentation-PyTorch)