# ViTDet: Exploring Plain Vision Transformer Backbones for Object Detection
Yanghao Li, Hanzi Mao, Ross Girshick†, Kaiming He†
[[`arXiv`](https://arxiv.org/abs/2203.16527)] [[`BibTeX`](#CitingViTDet)]
This repository provides Detectron2 configs and models for ViTDet, as well as for MViTv2 and Swin backbones, using our implementation and the settings described in the [ViTDet](https://arxiv.org/abs/2203.16527) paper.
## Pretrained Models
### COCO
#### Mask R-CNN
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">pre-train</th>
<th valign="bottom">train<br/>time<br/>(s/im)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: mask_rcnn_vitdet_b_100ep -->
<tr><td align="left"><a href="configs/COCO/mask_rcnn_vitdet_b_100ep.py">ViTDet, ViT-B</a></td>
<td align="center">IN1K, MAE</td>
<td align="center">0.314</td>
<td align="center">0.079</td>
<td align="center">10.9</td>
<td align="center">51.6</td>
<td align="center">45.9</td>
<td align="center">325346929</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/mask_rcnn_vitdet_b/f325346929/model_final_61ccd1.pkl">model</a></td>
</tr>
<!-- ROW: mask_rcnn_vitdet_l_100ep -->
<tr><td align="left"><a href="configs/COCO/mask_rcnn_vitdet_l_100ep.py">ViTDet, ViT-L</a></td>
<td align="center">IN1K, MAE</td>
<td align="center">0.603</td>
<td align="center">0.125</td>
<td align="center">20.9</td>
<td align="center">55.5</td>
<td align="center">49.2</td>
<td align="center">325599698</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/mask_rcnn_vitdet_l/f325599698/model_final_6146ed.pkl">model</a></td>
</tr>
<!-- ROW: mask_rcnn_vitdet_h_75ep -->
<tr><td align="left"><a href="configs/COCO/mask_rcnn_vitdet_h_75ep.py">ViTDet, ViT-H</a></td>
<td align="center">IN1K, MAE</td>
<td align="center">1.098</td>
<td align="center">0.178</td>
<td align="center">31.5</td>
<td align="center">56.7</td>
<td align="center">50.2</td>
<td align="center">329145471</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/mask_rcnn_vitdet_h/f329145471/model_final_7224f1.pkl">model</a></td>
</tr>
</tbody></table>
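
The timing columns above are reported in seconds per image, which can be harder to compare at a glance than a throughput figure. The following is a minimal sketch, not part of the ViTDet code, that converts the inference times copied from the table into images per second by taking the reciprocal. The dictionary name and helper function are hypothetical, and the hardware behind the measurements is not stated in this section, so the numbers are only meaningful relative to each other.

```python
# Inference times copied from the COCO Mask R-CNN table above (seconds per image).
# The dictionary and function names are illustrative, not part of Detectron2.
MASK_RCNN_COCO_INFER_S_PER_IM = {
    "ViTDet, ViT-B": 0.079,
    "ViTDet, ViT-L": 0.125,
    "ViTDet, ViT-H": 0.178,
}


def images_per_second(seconds_per_image: float) -> float:
    """Convert a seconds-per-image timing into an images-per-second throughput."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return 1.0 / seconds_per_image


if __name__ == "__main__":
    for name, s_per_im in MASK_RCNN_COCO_INFER_S_PER_IM.items():
        print(f"{name}: {images_per_second(s_per_im):.1f} im/s (inference)")
```

The same helper applies unchanged to the training-time column.
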
#### Cascade Mask R-CNN
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">pre-train</th>
<th valign="bottom">train<br/>time<br/>(s/im)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: cascade_mask_rcnn_swin_b_in21k_50ep -->
<tr><td align="left"><a href="configs/COCO/cascade_mask_rcnn_swin_b_in21k_50ep.py">Swin-B</a></td>
<td align="center">IN21K, sup</td>
<td align="center">0.389</td>
<td align="center">0.077</td>
<td align="center">8.7</td>
<td align="center">53.9</td>
<td align="center">46.2</td>
<td align="center">342979038</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_swin_b_in21k/f342979038/model_final_246a82.pkl">model</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_swin_l_in21k_50ep -->
<tr><td align="left"><a href="configs/COCO/cascade_mask_rcnn_swin_l_in21k_50ep.py">Swin-L</a></td>
<td align="center">IN21K, sup</td>
<td align="center">0.508</td>
<td align="center">0.097</td>
<td align="center">12.6</td>
<td align="center">55.0</td>
<td align="center">47.2</td>
<td align="center">342979186</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_swin_l_in21k/f342979186/model_final_7c897e.pkl">model</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_mvitv2_b_in21k_100ep -->
<tr><td align="left"><a href="configs/COCO/cascade_mask_rcnn_mvitv2_b_in21k_100ep.py">MViTv2-B</a></td>
<td align="center">IN21K, sup</td>
<td align="center">0.475</td>
<td align="center">0.090</td>
<td align="center">8.9</td>
<td align="center">55.6</td>
<td align="center">48.1</td>
<td align="center">325820315</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_mvitv2_b_in21k/f325820315/model_final_8c3da3.pkl">model</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_mvitv2_l_in21k_50ep -->
<tr><td align="left"><a href="configs/COCO/cascade_mask_rcnn_mvitv2_l_in21k_50ep.py">MViTv2-L</a></td>
<td align="center">IN21K, sup</td>
<td align="center">0.844</td>
<td align="center">0.157</td>
<td align="center">19.7</td>
<td align="center">55.7</td>
<td align="center">48.3</td>
<td align="center">325607715</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_mvitv2_l_in21k/f325607715/model_final_2141b0.pkl">model</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_mvitv2_h_in21k_36ep -->
<tr><td align="left"><a href="configs/COCO/cascade_mask_rcnn_mvitv2_h_in21k_36ep.py">MViTv2-H</a></td>
<td align="center">IN21K, sup</td>
<td align="center">1.655</td>
<td align="center">0.285</td>
<td align="center">18.4*</td>
<td align="center">55.9</td>
<td align="center">48.3</td>
<td align="center">326187358</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_mvitv2_h_in21k/f326187358/model_final_2234d7.pkl">model</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_vitdet_b_100ep -->
<tr><td align="left"><a href="configs/COCO/cascade_mask_rcnn_vitdet_b_100ep.py">ViTDet, ViT-B</a></td>
<td align="center">IN1K, MAE</td>
<td align="center">0.362</td>
<td align="center">0.089</td>
<td align="center">12.3</td>
<td align="center">54.0</td>
<td align="center">46.7</td>
<td align="center">325358525</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_b/f325358525/model_final_435fa9.pkl">model</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_vitdet_l_100ep -->
<tr><td align="left"><a href="configs/COCO/cascade_mask_rcnn_vitdet_l_100ep.py">ViTDet, ViT-L</a></td>
<td align="center">IN1K, MAE</td>
<td align="center">0.643</td>
<td align="center">0.142</td>
<td align="center">22.3</td>
<td align="center">57.6</td>
<td align="center">50.0</td>
<td align="center">328021305</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_l/f328021305/model_final_1a9f28.pkl">model</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_vitdet_h_75ep -->
<tr><td align="left"><a href="configs/COCO/cascade_mask_rcnn_vitdet_h_75ep.py">ViTDet, ViT-H</a></td>
<td align="center">IN1K, MAE</td>
<td align="center">1.137</td>
<td align="center">0.196</td>
<td align="center">32.9</td>
<td align="center">58.7</td>
<td align="center">51.0</td>
<td align="center">328730692</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_h/f328730692/model_final_f05665.pkl">model</a></td>
</tr>
</tbody></table>
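
Every download link in the tables above appears to follow the same layout: a dataset directory, a config-derived name, a directory named `f<model id>`, and a `model_final_<hash>.pkl` file. The sketch below assumes that pattern holds (it is inferred from the links listed here, not from a documented guarantee) and uses a hypothetical helper, `parse_checkpoint_url`, to split a link into those parts, which can help when checking that a downloaded checkpoint matches the intended table row.

```python
import re
from typing import NamedTuple


class Checkpoint(NamedTuple):
    dataset: str       # e.g. "COCO"
    config_name: str   # e.g. "cascade_mask_rcnn_vitdet_b"
    model_id: int      # matches the "model id" column
    filename: str      # e.g. "model_final_435fa9.pkl"


# Pattern inferred from the download links in the tables; treat it as an assumption.
_URL_RE = re.compile(
    r"^https://dl\.fbaipublicfiles\.com/detectron2/ViTDet/"
    r"(?P<dataset>[^/]+)/(?P<config_name>[^/]+)/f(?P<model_id>\d+)/"
    r"(?P<filename>model_final_[0-9a-f]+\.pkl)$"
)


def parse_checkpoint_url(url: str) -> Checkpoint:
    """Split a ViTDet checkpoint URL into dataset, config name, model id, and file name."""
    match = _URL_RE.match(url)
    if match is None:
        raise ValueError(f"unrecognized checkpoint URL: {url}")
    return Checkpoint(
        dataset=match["dataset"],
        config_name=match["config_name"],
        model_id=int(match["model_id"]),
        filename=match["filename"],
    )


if __name__ == "__main__":
    url = (
        "https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/"
        "cascade_mask_rcnn_vitdet_b/f325358525/model_final_435fa9.pkl"
    )
    print(parse_checkpoint_url(url))
```

A link that does not match the assumed layout raises `ValueError` rather than returning partial fields.
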
### LVIS
#### Mask R-CNN
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">pre-train</th>
<th valign="bottom">train<br/>time<br/>(s/im)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: mask_rcnn_vitdet_b_100ep -->
<tr><td align="left"><a href="configs/LVIS/mask_rcnn_vitdet_b_100ep.py">ViTDet, ViT-B</a></td>
<td align="center">IN1K, MAE</td>
<td align="center">0.317</td>
<td align="center">0.085</td>
<td align="center">14.4</td>
<td align="center">40.2</td>
</tr>
</tbody></table>