# Introduction
Sparse DETR is an efficient end-to-end object detector that **sparsifies encoder tokens** with a learnable DAM (Decoder Attention Map) predictor. It outperforms Deformable DETR on the COCO dataset even when keeping only 10% of the encoder queries.
<p align="center">
<img src="./figs/dam_creation.png" height=350>
</p>
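As a rough illustration of what the keeping ratio means, the sketch below (our own example, not code from this repository) counts how many multi-scale encoder tokens survive sparsification; the feature-map sizes are illustrative assumptions, not values taken from the codebase:

```python
# Illustrative sketch: how the keeping ratio rho translates into the number of
# encoder tokens kept. The level sizes below are made-up examples for a
# ~800x1333 input with strides 8/16/32/64, not values from this repository.
def num_kept_tokens(hw_per_level, rho):
    total = sum(h * w for h, w in hw_per_level)  # tokens from all flattened levels
    return max(1, int(total * rho))              # keep at least one token

levels = [(100, 167), (50, 84), (25, 42), (13, 21)]
print(num_kept_tokens(levels, 0.1))  # 2222 of 22223 tokens survive at rho=0.1
```

At rho = 0.1 the encoder self-attention thus operates on roughly a tenth of the tokens, which is where the FLOPs savings in the tables below come from.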
# Installation
## Requirements
We have tested the code on the following environments:
* Python 3.7.7 / PyTorch 1.6.0 / torchvision 0.7.0 / CUDA 10.1 / Ubuntu 18.04
* Python 3.8.3 / PyTorch 1.7.1 / torchvision 0.8.2 / CUDA 11.1 / Ubuntu 18.04

Run the following command to install dependencies:
```bash
pip install -r requirements.txt
```
## Compiling CUDA operators
```bash
cd ./models/ops
sh ./make.sh
# unit test (all checks should print True)
python test.py
```
# Usage
## Dataset preparation
Please download the [COCO 2017 dataset](https://cocodataset.org/) and organize the files as follows:
```
code_root/
└── data/
└── coco/
├── train2017/
├── val2017/
└── annotations/
├── instances_train2017.json
└── instances_val2017.json
```
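A minimal sketch for creating this layout; downloading the archives (train2017, val2017, and the annotations from cocodataset.org) and unpacking them into these folders is left to you:

```shell
# Create the directory skeleton Sparse DETR expects; unpack the COCO 2017
# archives (train2017.zip, val2017.zip, annotations_trainval2017.zip) into it.
mkdir -p data/coco/train2017 data/coco/val2017 data/coco/annotations
```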
## Training
### Training on a single node
For example, the command for training Sparse DETR with the keeping ratio of 10% on 8 GPUs is as follows:
```bash
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/swint_sparse_detr_rho_0.1.sh
```
### Training on multiple nodes
For example, the command for training Sparse DETR with a keeping ratio of 10% on 2 nodes, each with 8 GPUs, is as follows:
On node 1:
```bash
$ MASTER_ADDR=<IP address of node 1> NODE_RANK=0 GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 16 ./configs/swint_sparse_detr_rho_0.1.sh
```
On node 2:
```bash
$ MASTER_ADDR=<IP address of node 1> NODE_RANK=1 GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 16 ./configs/swint_sparse_detr_rho_0.1.sh
```
### Direct argument control
```bash
# Deformable DETR (with the bounding box refinement and two-stage arguments, if desired)
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python main.py --with_box_refine --two_stage
# Efficient DETR (with the class-specific head as described in their paper)
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python main.py --with_box_refine --two_stage --eff_query_init --eff_specific_head
# Sparse DETR (with the keeping ratio of 10% and encoder auxiliary loss)
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python main.py --with_box_refine --two_stage --eff_query_init --eff_specific_head --rho 0.1 --use_enc_aux_loss
```
### Some tips to speed-up training
* If your file system is slow to read images, consider enabling the `--cache_mode` option, which loads the whole dataset into memory at the beginning of training.
* You can increase the batch size to maximize GPU utilization, depending on your GPU memory, e.g., `--batch_size 3` or `--batch_size 4`.
## Evaluation
You can download a pre-trained Sparse DETR model (links are in the "Main Results" section), then run the following command to evaluate it on the COCO 2017 validation set:
```bash
# Note that you should run the command with the corresponding configuration.
$ ./configs/swint_sparse_detr_rho_0.1.sh --resume <path to pre-trained model> --eval
```
You can also run distributed evaluation by using `./tools/run_dist_launch.sh`.
# Main Results
The tables below demonstrate the detection performance of Sparse DETR on the COCO 2017 validation set when using different backbones.
* **Top-k** : sampling the top-k object queries instead of using the learned object queries (as in Efficient DETR).
* **BBR** : performing bounding box refinement in the decoder block (as in Deformable DETR).
* The **encoder auxiliary loss** proposed in our paper is only applied to Sparse DETR.
* **FLOPs** and **FPS** are measured in the same way as used in Deformable DETR.
* Refer to **Table 1** in the paper for more details.
## ResNet-50 backbone
| Method | Epochs | ρ | Top-k & BBR | AP | #Params(M) | GFLOPs | B4FPS | Download |
|:------------------:|:------:|:---:|:-----------:|:----:|:----------:|:------:|:-----:|:--------:|
| Faster R-CNN + FPN | 109 | N/A | | 42.0 | 42M | 180G | 26 | |
| DETR | 50 | N/A | | 35.0 | 41M | 86G | 28 | |
| DETR | 500 | N/A | | 42.0 | 41M | 86G | 28 | |
| DETR-DC5 | 500 | N/A | | 43.3 | 41M | 187G | 12 | |
| PnP-DETR | 500 | 33% | | 41.1 | | | | |
| | 500 | 50% | | 41.8 | | | | |
| PnP-DETR-DC5 | 500 | 33% | | 42.7 | | | | |
| | 500 | 50% | | 43.1 | | | | |
| Deformable-DETR | 50 | N/A | | 43.9 | 39.8M | 172.9G | 19.1 | |
| | 50 | N/A | o | 46.0 | 40.8M | 177.3G | 18.2 | |
| Sparse-DETR | 50 | 10% | o | 45.3 | 40.9M | 105.4G | 26.5 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_10.pth) |
| | 50 | 20% | o | 45.6 | 40.9M | 112.9G | 24.8 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_20.pth) |
| | 50 | 30% | o | 46.0 | 40.9M | 120.5G | 23.2 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_30.pth) |
| | 50 | 40% | o | 46.2 | 40.9M | 128.0G | 21.8 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_40.pth) |
| | 50 | 50% | o | 46.3 | 40.9M | 135.6G | 20.5 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_r50_50.pth) |
## Swin-T backbone
| Method | Epochs | ρ | Top-k & BBR | AP | #Params(M) | GFLOPs | B4FPS | Download |
|:---------------:|:------:|:---:|:-----------:|:----:|:----------:|:------:|:-----:|:--------:|
| DETR | 50 | N/A | | 35.9 | 45.0M | 91.6G | 26.8 | |
| DETR | 500 | N/A | | 45.4 | 45.0M | 91.6G | 26.8 | |
| Deformable-DETR | 50 | N/A | | 45.7 | 40.3M | 180.4G | 15.9 | |
| | 50 | N/A | o | 48.0 | 41.3M | 184.8G | 15.4 | |
| Sparse-DETR | 50 | 10% | o | 48.2 | 41.4M | 113.4G | 21.2 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_10.pth) |
| | 50 | 20% | o | 48.8 | 41.4M | 121.0G | 20 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_20.pth) |
| | 50 | 30% | o | 49.1 | 41.4M | 128.5G | 18.9 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_30.pth) |
| | 50 | 40% | o | 49.2 | 41.4M | 136.1G | 18 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_40.pth) |
| | 50 | 50% | o | 49.3 | 41.4M | 143.7G | 17.2 | [link](https://twg.kakaocdn.net/brainrepo/sparse_detr/sparse_detr_swint_50.pth) |
## Initializing ResNet-50 backbone with SCRL
The performance of Sparse DETR improves further when the backbone network is initialized with `SCRL` ([Spatially Consistent Representation Learning](https://arxiv.org/abs/2103.06122)), which learns dense representations in a self-supervised way, instead of the default ImageNet-supervised pre-trained weights (denoted `IN-sup` in the table below).
* We obtained pre-trained weights from [Torchvision](https://pytorch.org/tutorials/beginner/finetuning_torchvision