# Introduction
Sparse DETR is an efficient end-to-end object detector that **sparsifies encoder tokens** with a learnable DAM (Decoder Attention Map) predictor. It outperforms Deformable DETR on the COCO dataset even when keeping only 10% of the encoder queries.
<p align="center">
<img src="./figs/dam_creation.png" height=350>
</p>
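As a rough illustration of what the keeping ratio means, the sketch below (our own example, not code from this repository) counts how many multi-scale encoder tokens survive sparsification; the feature-map sizes are illustrative assumptions, not values taken from the codebase:

```python
# Illustrative sketch: how the keeping ratio rho translates into the number of
# encoder tokens kept. The level sizes below are made-up examples for a
# ~800x1333 input with strides 8/16/32/64, not values from this repository.
def num_kept_tokens(hw_per_level, rho):
    total = sum(h * w for h, w in hw_per_level)  # tokens from all flattened levels
    return max(1, int(total * rho))              # keep at least one token

levels = [(100, 167), (50, 84), (25, 42), (13, 21)]
print(num_kept_tokens(levels, 0.1))  # 2222 of 22223 tokens survive at rho=0.1
```

At rho = 0.1 the encoder self-attention thus operates on roughly a tenth of the tokens, which is where the FLOPs savings in the tables below come from.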
# Installation
## Requirements
We have tested the code on the following environments:
* Python 3.7.7 / PyTorch 1.6.0 / torchvision 0.7.0 / CUDA 10.1 / Ubuntu 18.04
* Python 3.8.3 / PyTorch 1.7.1 / torchvision 0.8.2 / CUDA 11.1 / Ubuntu 18.04

Run the following command to install dependencies:
```bash
pip install -r requirements.txt
```
## Compiling CUDA operators
```bash
cd ./models/ops
sh ./make.sh
# unit test (all checks should print True)
python test.py
```
# Usage
## Dataset preparation
Please download the [COCO 2017 dataset](https://cocodataset.org/) and organize the files as follows:
```
code_root/
└── data/
└── coco/
├── train2017/
├── val2017/
└── annotations/
├── instances_train2017.json
└── instances_val2017.json
```
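A minimal sketch for creating this layout; downloading the archives (train2017, val2017, and the annotations from cocodataset.org) and unpacking them into these folders is left to you:

```shell
# Create the directory skeleton Sparse DETR expects; unpack the COCO 2017
# archives (train2017.zip, val2017.zip, annotations_trainval2017.zip) into it.
mkdir -p data/coco/train2017 data/coco/val2017 data/coco/annotations
```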
## Training
### Training on a single node
For example, the command for training Sparse DETR with the keeping ratio of 10% on 8 GPUs is as follows:
```bash
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/swint_sparse_detr_rho_0.1.sh
```
### Training on multiple nodes
For example, the command for training Sparse DETR with a keeping ratio of 10% on 2 nodes, each with 8 GPUs, is as follows:
On node 1:
```bash
$ MASTER_ADDR=<IP address of node 1> NODE_RANK=0 GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 16 ./configs/swint_sparse_detr_rho_0.1.sh
```
On node 2:
```bash
$ MASTER_ADDR=<IP address of node 1> NODE_RANK=1 GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 16 ./configs/swint_sparse_detr_rho_0.1.sh
```
### Direct argument control
```bash
# Deformable DETR (with the bounding box refinement and two-stage arguments, if desired)
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python main.py --with_box_refine --two_stage
# Efficient DETR (with the class-specific head as described in their paper)
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python main.py --with_box_refine --two_stage --eff_query_init --eff_specific_head
# Sparse DETR (with the keeping ratio of 10% and encoder auxiliary loss)
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python main.py --with_box_refine --two_stage --eff_query_init --eff_specific_head --rho 0.1 --use_enc_aux_loss
```
### Some tips to speed-up training
* If your file system is slow to read images, consider enabling the `--cache_mode` option, which loads the whole dataset into memory at the beginning of training.
* You can increase the batch size to maximize GPU utilization, depending on your GPU memory, e.g., `--batch_size 3` or `--batch_size 4`.
## Evaluation
You can download a pre-trained Sparse DETR model (links are in the "Main Results" section), then run the following command to evaluate it on the COCO 2017 validation set:
```bash
# Note that you should run the command with the corresponding configuration.
$ ./configs/swint_sparse_detr_rho_0.1.sh --resume <path to pre-trained model> --eval
```
You can also run distributed evaluation by using `./tools/run_dist_launch.sh`.
# Main Results
The tables below demonstrate the detection performance of Sparse DETR on the COCO 2017 validation set when using different backbones.
* **Top-k** : sampling the top-k object queries instead of using the learned object queries (as in Efficient DETR).
* **BBR** : performing bounding box refinement in the decoder block (as in Deformable DETR).
* The **encoder auxiliary loss** proposed in our paper is only applied to Sparse DETR.
* **FLOPs** and **FPS** are measured in the same way as used in Deformable DETR.
* Refer to **Table 1** in the paper for more details.
## ResNet-50 backbone
| Method | Epochs | ρ | Top-k & BBR | AP | #Params(M) | GFLOPs | B4FPS | Download |
|:------------------:|:------:|:---:|:-----------:|:----:|:----------:|:------:|:-----:|:--------:|
| Faster R-CNN + FPN | 109 | N/A | | 42.0 | 42M | 180G | 26 | |
| DETR | 50 | N/A | | 35.0 | 41M | 86G | 28 | |
| DETR | 500 | N/A | | 42.0 | 41M | 86G | 28 | |
| DETR-DC5 | 500 | N/A | | 43.3 | 41M | 187G | 12 | |
| PnP-DETR | 500 | 33% | | 41.1 | | | | |
| | 500 | 50% | | 41.8 | | | | |
| PnP-DETR-DC5 | 500 | 33% | | 42.7 | | | | |
| | 500 | 50% | | 43.1 | | | | |
| Deformable-DETR | 50 | N/A | | 43.9 | 39.8M | 172.9G | 19.1 | |
| | 50 | N/A | o | 46.0 | 40.8M | 177.3G | 18.2 | |
| Sparse-DETR | 50 | 10% | o | 45.3 | 40.9M | 105.4G | 26.5 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_10.pth) |
| | 50 | 20% | o | 45.6 | 40.9M | 112.9G | 24.8 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_20.pth) |
| | 50 | 30% | o | 46.0 | 40.9M | 120.5G | 23.2 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_30.pth) |
| | 50 | 40% | o | 46.2 | 40.9M | 128.0G | 21.8 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_40.pth) |
| | 50 | 50% | o | 46.3 | 40.9M | 135.6G | 20.5 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_50.pth) |
## Swin-T backbone
| Method | Epochs | ρ | Top-k & BBR | AP | #Params(M) | GFLOPs | B4FPS | Download |
|:---------------:|:------:|:---:|:-----------:|:----:|:----------:|:------:|:-----:|:--------:|
| DETR | 50 | N/A | | 35.9 | 45.0M | 91.6G | 26.8 | |
| DETR | 500 | N/A | | 45.4 | 45.0M | 91.6G | 26.8 | |
| Deformable-DETR | 50 | N/A | | 45.7 | 40.3M | 180.4G | 15.9 | |
| | 50 | N/A | o | 48.0 | 41.3M | 184.8G | 15.4 | |
| Sparse-DETR | 50 | 10% | o | 48.2 | 41.4M | 113.4G | 21.2 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_10.pth) |
| | 50 | 20% | o | 48.8 | 41.4M | 121.0G | 20 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_20.pth) |
| | 50 | 30% | o | 49.1 | 41.4M | 128.5G | 18.9 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_30.pth) |
| | 50 | 40% | o | 49.2 | 41.4M | 136.1G | 18 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_40.pth) |
| | 50 | 50% | o | 49.3 | 41.4M | 143.7G | 17.2 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_50.pth) |
## Initializing ResNet-50 backbone with SCRL
The performance of Sparse DETR improves further when the backbone network is initialized with `SCRL` ([Spatially Consistent Representation Learning](https://arxiv.org/abs/2103.06122)), which learns dense representations in a self-supervised way, instead of the default ImageNet-supervised pre-trained weights (denoted `IN-sup` in the table below).
* We obtained pre-trained weights from [Torchvision](https://pytorch.org/tutorials/beginner/finetuning_torchvision