# Faster R-CNN in MXNet with distributed implementation and data parallelization
![example detections](https://cloud.githubusercontent.com/assets/13162287/22101032/92085dc0-de6c-11e6-9228-67e72606ddbc.png)
## Why?
Good implementations of Faster R-CNN already exist, yet they lack support for recent
ConvNet architectures. The aim of reproducing it from scratch is to fully utilize the
MXNet engine and its parallelization for object detection.
| Indicator | py-faster-rcnn (official Caffe implementation) | mx-rcnn (this reproduction) |
| :-------- | :--------------------------------------------- | :-------------------------- |
| Speed [1] | 2.5 img/s training, 5 img/s testing | 3.8 img/s training, 12.5 img/s testing |
| Performance [2] | mAP 73.2 | mAP 75.97 |
| Memory efficiency [3] | 11 GB for Fast R-CNN | 4.6 GB for Fast R-CNN |
| Parallelization [4] | None | 3.8 img/s to 6 img/s with 2 GPUs |
| Extensibility [5] | Old framework and base networks | ResNet |
[1] On Ubuntu 14.04.5 with a Titan X GPU and cuDNN enabled; the experiment is VGG-16 end-to-end training.
[2] VGG network, trained end-to-end on VOC07 trainval + VOC12 trainval, tested on VOC07 test.
[3] VGG network; Fast R-CNN is the most memory-expensive stage.
[4] VGG network (parallelization is limited by bandwidth); ResNet-101 speeds up from 2 img/s to 3.5 img/s.
[5] py-faster-rcnn does not support ResNet or recent Caffe versions.
## Why Not?
* If you value stability and reproducibility over performance and efficiency, please refer to the official implementations.
There is no guarantee of reproducibility in every case or every experiment.
* If you value simplicity: technical details are *very complicated* in MXNet.
This is by design, to attain the maximum possible performance rather than patching fix after fix.
Performance and parallelization are more than a change of a parameter.
* If you want to do CPU training, be advised that it has not been verified yet.
You will not encounter a NOT_IMPLEMENTED_ERROR, so it may still be possible.
* If you are on Windows or Python 3: some people reported that it worked with some modifications,
but those reports have since disappeared.
## Experiments
| Method | Network | Training Data | Testing Data | Reference (mAP) | Result (mAP) | Link |
| :----- | :------ | :------------ | :----------- | :-------------: | :----------: | :---: |
| Fast R-CNN | VGG16 | VOC07 | VOC07test | 66.9 | 66.50 | [Dropbox](https://www.dropbox.com/s/xmxjitv0kl96h7v/vgg_fast_rcnn-0008.params?dl=0) |
| Faster R-CNN alternate | VGG16 | VOC07 | VOC07test | 69.9 | 69.62 | [Dropbox](https://www.dropbox.com/s/fgj71uzxz8h6ajj/vgg_voc_alter-0008.params?dl=0) |
| Faster R-CNN end-to-end | VGG16 | VOC07 | VOC07test | 69.9 | 70.23 | [Dropbox](https://www.dropbox.com/s/gfxnf1qzzc0lzw2/vgg_voc07-0010.params?dl=0) |
| Faster R-CNN end-to-end | VGG16 | VOC07+12 | VOC07test | 73.2 | 75.97 | [Dropbox](https://www.dropbox.com/s/rvktx65s48cuyb9/vgg_voc0712-0010.params?dl=0) |
| Faster R-CNN end-to-end | ResNet-101 | VOC07+12 | VOC07test | 76.4 | 79.35 | [Dropbox](https://www.dropbox.com/s/ge2wl0tn47xezdf/resnet_voc0712-0010.params?dl=0) |
| Faster R-CNN end-to-end | VGG16 | COCO train | COCO val | 21.2 | 22.8 | [Dropbox](https://www.dropbox.com/s/e0ivvrc4pku3vj7/vgg_coco-0010.params?dl=0) |
| Faster R-CNN end-to-end | ResNet-101 | COCO train | COCO val | 27.2 | 26.1 | [Dropbox](https://www.dropbox.com/s/bfuy2uo1q1nwqjr/resnet_coco-0010.params?dl=0) |
The above experiments were conducted with [this mx-rcnn version](https://github.com/precedenceguo/mx-rcnn/tree/6a1ab0eec5035a10a1efb5fc8c9d6c54e101b4d0)
using [an MXNet fork based on the MXNet 0.9.1 nnvm pre-release](https://github.com/precedenceguo/mxnet/tree/simple).
## I'm Feeling Lucky
* Prepare: `bash script/additional_deps.sh`
* Download training data: `bash script/get_voc.sh`
* Download pretrained model: `bash script/get_pretrained_model.sh`
* Training and testing: `bash script/vgg_voc07.sh 0,1` (use GPUs 0 and 1)
## Getting started
See if `bash script/additional_deps.sh` will do the following for you.
* Suppose `HOME` represents where this file is located. All commands, unless stated otherwise, should be started from `HOME`.
* Install the Python packages `cython`, `easydict`, `matplotlib` and `scikit-image`.
* Install MXNet v0.9.5 or higher along with the MXNet Python interface. Open `python` and type `import mxnet` to confirm.
* Run `make` in `HOME`.
Command line arguments have the same meaning as in mxnet/example/image-classification.
* `prefix` refers to the first part of a saved model file name and `epoch` refers to the number in that file name (see the sketch after this list).
In `model/vgg-0000.params`, `prefix` is `"model/vgg"` and `epoch` is `0`.
* `begin_epoch` marks the start of the training process and applies to the numbering of all saved checkpoints.
* Remember to turn off cuDNN auto-tuning: `export MXNET_CUDNN_AUTOTUNE_DEFAULT=0`.
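A minimal sketch of how `prefix` and `epoch` map to files on disk, using the standard MXNet checkpoint loader (this assumes both `model/vgg-symbol.json` and `model/vgg-0000.params` are present):

```python
import mxnet as mx

# prefix 'model/vgg' and epoch 0 refer to the files
# model/vgg-symbol.json and model/vgg-0000.params
sym, arg_params, aux_params = mx.model.load_checkpoint('model/vgg', 0)
print(len(arg_params), 'argument arrays,', len(aux_params), 'auxiliary arrays')
```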
## Demo (Pascal VOC)
* An example trained model (trained on VOC07 trainval) can be downloaded from
[Baidu Yun](http://pan.baidu.com/s/1boRhGvH) (ixiw) or
[Dropbox](https://www.dropbox.com/s/jrr83q0ai2ckltq/final-0000.params.tar.gz?dl=0).
If you put the extracted model `final-0000.params` in `HOME`, use `--prefix final --epoch 0` to access it (a quick check that the file loads is sketched after this list).
* Try out detection by running `python demo.py --prefix final --epoch 0 --image myimage.jpg --gpu 0 --vis`.
Drop `--vis` if you do not have a display or prefer to save the result to a new file.
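If the demo cannot find the model, a quick sanity check is to load the parameter file directly; this sketch assumes `final-0000.params` has been extracted into the current directory:

```python
import mxnet as mx

# a .params checkpoint is a dict of NDArrays,
# with keys typically prefixed by 'arg:' or 'aux:'
params = mx.nd.load('final-0000.params')
print(len(params), 'arrays loaded, e.g.', sorted(params)[0])
```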
## Training Faster R-CNN
The following tutorial is based on VOC data and the VGG network. Supply `--network resnet` and
`--dataset coco` to use other networks and datasets.
Refer to `script/vgg_voc07.sh` and the other experiment scripts for examples.
### Prepare Training Data
See if `bash script/get_voc.sh` and `bash script/get_coco.sh` will do the following for you.
* Make a folder `data` in `HOME`. The `data` folder will hold the training data folders `VOCdevkit` and `coco`.
* Download and extract [Pascal VOC data](http://host.robots.ox.ac.uk/pascal/VOC/) and place the `VOCdevkit` folder in `HOME/data`.
* Download and extract the [COCO dataset](http://mscoco.org/dataset/), place all images in `coco/images` and the annotation JSON files in `data/annotations`.
(Skip this if not interested.) All datasets have three attributes, `image_set`, `root_path` and `dataset_path`; their roles are illustrated in the sketch after this list.
* `image_set` could be `2007_trainval` or something like `2007_trainval+2012_trainval`.
* `root_path` is usually `data`, where `cache`, `selective_search_data` and `rpn_data` will be stored.
* `dataset_path` could be something like `data/VOCdevkit`, where images, annotations and results are stored so that many copies of datasets can be linked to the same actual place.
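The following is only an illustrative sketch of how the three attributes combine, not the repository's actual loader API; the helper name is hypothetical:

```python
import os

def expand_dataset_attrs(image_set, root_path, dataset_path):
    """Hypothetical helper, not part of this repository's API."""
    # a '+'-joined image_set is split into the individual sets
    sets = image_set.split('+')
    # intermediate files (cache, selective_search_data, rpn_data) live under root_path
    cache_dir = os.path.join(root_path, 'cache')
    # raw images, annotations and results are read from / written to dataset_path
    return sets, cache_dir, dataset_path

print(expand_dataset_attrs('2007_trainval+2012_trainval', 'data', 'data/VOCdevkit'))
```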
### Prepare Pretrained Models
See if `bash script/get_pretrained_model.sh` will do this for you. If not,
* Make a folder `model` in `HOME`. The `model` folder will hold model checkpoints produced during training.
It is recommended to set `model` as a symbolic link to somewhere else on disk.
* Download VGG16 pretrained model `vgg16-0000.params` from [MXNet model gallery](https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-1k-vgg.md) to `model` folder.
* Download ResNet pretrained model `resnet-101-0000.params` from [ResNet](https://github.com/tornadomeet/ResNet) to `model` folder.
### Alternate Training
See if `bash script/vgg_alter_voc07.sh 0` (use gpu 0) will do the following for you.
* Start training by running `python train_alternate.py`. This will train the VGG network on VOC07 trainval.
More control of the training process can be found in the argparse help.
* Start testing by running `python test.py --prefix model/final --epoch 0` after the training process completes.
This will test the VGG network on the VOC07 test set with the model in `HOME/model/final-0000.params`.
Adding `--vis` will turn on visualization, and `-h` will show help, as in the training process.
### End-to-end Training (approximate process)
See if `bash script/vgg_voc07.sh 0` (use gpu 0) will do the following for you.
* Start training by running `python train_end2end.py`. This will train the VGG network end-to-end on VOC07 trainval.