# CSWin-Transformer, CVPR 2022
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cswin-transformer-a-general-vision/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=cswin-transformer-a-general-vision)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cswin-transformer-a-general-vision/semantic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k-val?p=cswin-transformer-a-general-vision)
This repo is the official implementation of ["CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows"](https://arxiv.org/pdf/2107.00652.pdf).
## Introduction
**CSWin Transformer** (the name `CSWin` stands for **C**ross-**S**haped **Win**dow), introduced in our [arXiv paper](https://arxiv.org/abs/2107.00652), is a new general-purpose backbone for computer vision. It is a hierarchical Transformer that replaces traditional full attention with our newly proposed cross-shaped window self-attention. This mechanism computes self-attention within horizontal and vertical stripes in parallel, which together form a cross-shaped window; each stripe is obtained by splitting the input feature into stripes of equal width. With CSWin, we can realize global attention at a limited computation cost.
CSWin Transformer achieves strong performance on ImageNet classification (87.5% top-1 accuracy on val with only 97G FLOPs) and ADE20K semantic segmentation (55.7 mIoU on val), surpassing previous models by a large margin.
![teaser](teaser.png)
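The stripe splitting behind cross-shaped window attention can be illustrated with a minimal PyTorch sketch. This is not the repo's implementation; the function names and the choice of `sw=7` are illustrative, and real CSWin blocks run the two stripe attentions on separate head groups in parallel:

```python
import torch

def horizontal_stripes(x, sw):
    """Split a (B, C, H, W) feature map into horizontal stripes of height `sw`.

    Returns (B * H//sw, sw * W, C): one token sequence per stripe, so
    self-attention can run independently inside each stripe.
    """
    B, C, H, W = x.shape
    x = x.view(B, C, H // sw, sw, W)   # group rows into H//sw stripes
    x = x.permute(0, 2, 3, 4, 1)       # (B, nStripes, sw, W, C)
    return x.reshape(-1, sw * W, C)

def vertical_stripes(x, sw):
    """Same idea for vertical stripes of width `sw`."""
    B, C, H, W = x.shape
    x = x.view(B, C, H, W // sw, sw)   # group columns into W//sw stripes
    x = x.permute(0, 3, 2, 4, 1)       # (B, nStripes, H, sw, C)
    return x.reshape(-1, H * sw, C)

# A stage-1-sized feature map: attending within the union of a token's
# horizontal and vertical stripe gives the cross-shaped receptive field.
x = torch.randn(2, 64, 56, 56)
h = horizontal_stripes(x, sw=7)  # (16, 392, 64): 2*8 stripes of 7*56 tokens
v = vertical_stripes(x, sw=7)    # (16, 392, 64)
```

Attention inside each stripe costs far less than full `(H*W)^2` attention, while the cross shape lets information propagate globally within a few blocks.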
## Main Results on ImageNet
| model | pretrain | resolution | acc@1 | #params | FLOPs | 22K model | 1K model |
|:---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| CSWin-T | ImageNet-1K | 224x224 | 82.8 | 23M | 4.3G | - | [model](https://github.com/microsoft/CSWin-Transformer/releases/download/v0.1.0/cswin_tiny_224.pth) |
| CSWin-S | ImageNet-1K | 224x224 | 83.6 | 35M | 6.9G | - | [model](https://github.com/microsoft/CSWin-Transformer/releases/download/v0.1.0/cswin_small_224.pth) |
| CSWin-B | ImageNet-1K | 224x224 | 84.2 | 78M | 15.0G | - | [model](https://github.com/microsoft/CSWin-Transformer/releases/download/v0.1.0/cswin_base_224.pth) |
| CSWin-B | ImageNet-1K | 384x384 | 85.5 | 78M | 47.0G | - | [model](https://github.com/microsoft/CSWin-Transformer/releases/download/v0.1.0/cswin_base_384.pth) |
| CSWin-L | ImageNet-22K | 224x224 | 86.5 | 173M | 31.5G | [model](https://github.com/microsoft/CSWin-Transformer/releases/download/v0.1.0/cswin_large_22k_224.pth) | [model](https://github.com/microsoft/CSWin-Transformer/releases/download/v0.1.0/cswin_large_224.pth) |
| CSWin-L | ImageNet-22K | 384x384 | 87.5 | 173M | 96.8G | - | [model](https://github.com/microsoft/CSWin-Transformer/releases/download/v0.1.0/cswin_large_384.pth) |
## Main Results on Downstream Tasks
**COCO Object Detection**
| Backbone | Method | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs |
|:---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| CSWin-T | Mask R-CNN | ImageNet-1K | 3x | 49.0 | 43.6 | 42M | 279G |
| CSWin-S | Mask R-CNN | ImageNet-1K | 3x | 50.0 | 44.5 | 54M | 342G |
| CSWin-B | Mask R-CNN | ImageNet-1K | 3x | 50.8 | 44.9 | 97M | 526G |
| CSWin-T | Cascade Mask R-CNN | ImageNet-1K | 3x | 52.5 | 45.3 | 80M | 757G |
| CSWin-S | Cascade Mask R-CNN | ImageNet-1K | 3x | 53.7 | 46.4 | 92M | 820G |
| CSWin-B | Cascade Mask R-CNN | ImageNet-1K | 3x | 53.9 | 46.4 | 135M | 1004G |
**ADE20K Semantic Segmentation (val)**
| Backbone | Method | pretrain | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) | #params | FLOPs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| CSWin-T | Semantic FPN | ImageNet-1K | 512x512 | 80K | 48.2 | - | 26M | 202G |
| CSWin-S | Semantic FPN | ImageNet-1K | 512x512 | 80K | 49.2 | - | 39M | 271G |
| CSWin-B | Semantic FPN | ImageNet-1K | 512x512 | 80K | 49.9 | - | 81M | 464G |
| CSWin-T | UPerNet | ImageNet-1K | 512x512 | 160K | 49.3 | 50.7 | 60M | 959G |
| CSWin-S | UPerNet | ImageNet-1K | 512x512 | 160K | 50.4 | 51.5 | 65M | 1027G |
| CSWin-B | UPerNet | ImageNet-1K | 512x512 | 160K | 51.1 | 52.2 | 109M | 1222G |
| CSWin-B | UPerNet | ImageNet-22K | 640x640 | 160K | 51.8 | 52.6 | 109M | 1941G |
| CSWin-L | UPerNet | ImageNet-22K | 640x640 | 160K | 53.4 | 55.7 | 208M | 2745G |
Pretrained models and code can be found in the [`segmentation`](segmentation) directory.
## Requirements
The code requires `timm==0.3.4`, `pytorch>=1.4`, `opencv`, ... ; to install the dependencies, run:
```
bash install_req.sh
```
Apex is used for mixed-precision training during finetuning. To install apex, run:
```
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
Data preparation: arrange ImageNet in the following folder structure; you can extract ImageNet into this layout with this [script](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4).
```
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
```
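The layout above can be sanity-checked before training. This is a stdlib-only sketch (the helper name is ours, not part of the repo); here it is demonstrated on a tiny mock tree rather than a real ImageNet download:

```python
import pathlib
import tempfile

def check_imagenet_layout(root):
    """Count classes and images per split for root/{train,val}/<wnid>/*.JPEG."""
    root = pathlib.Path(root)
    counts = {}
    for split in ("train", "val"):
        classes = [d for d in (root / split).iterdir() if d.is_dir()]
        images = [p for d in classes for p in d.glob("*.JPEG")]
        counts[split] = (len(classes), len(images))
    return counts

# Build a minimal mock tree mirroring the structure shown above.
tmp = pathlib.Path(tempfile.mkdtemp()) / "imagenet"
for split, stem in [("train", "n01440764_10026"), ("val", "ILSVRC2012_val_00000293")]:
    d = tmp / split / "n01440764"
    d.mkdir(parents=True)
    (d / f"{stem}.JPEG").touch()

print(check_imagenet_layout(tmp))  # {'train': (1, 1), 'val': (1, 1)}
```

On a full extraction you would expect 1000 class folders per split and roughly 1.28M train / 50K val images.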
## Train
Train the three variants, CSWin-Tiny, CSWin-Small, and CSWin-Base:
```
bash train.sh 8 --data <data path> --model CSWin_64_12211_tiny_224 -b 256 --lr 2e-3 --weight-decay .05 --amp --img-size 224 --warmup-epochs 20 --model-ema-decay 0.99984 --drop-path 0.2
```
```
bash train.sh 8 --data <data path> --model CSWin_64_24322_small_224 -b 256 --lr 2e-3 --weight-decay .05 --amp --img-size 224 --warmup-epochs 20 --model-ema-decay 0.99984 --drop-path 0.4
```
```
bash train.sh 8 --data <data path> --model CSWin_96_24322_base_224 -b 128 --lr 1e-3 --weight-decay .1 --amp --img-size 224 --warmup-epochs 20 --model-ema-decay 0.99992 --drop-path 0.5
```
If you want to train CSWin on images at 384x384 resolution, use `--img-size 384`.
If GPU memory is insufficient, use `-b 128 --lr 1e-3 --model-ema-decay 0.99992`, or enable [gradient checkpointing](https://pytorch.org/docs/stable/checkpoint.html) with `--use-chk`.
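Gradient checkpointing trades compute for memory: intermediate activations inside a block are dropped in the forward pass and recomputed during backward. A minimal sketch with the plain PyTorch API (the block here is a hypothetical stand-in, not a CSWin block, and the `use_reentrant` flag requires PyTorch >= 1.11):

```python
import torch
from torch.utils.checkpoint import checkpoint

# A stand-in for a heavy transformer block whose activations we don't
# want to keep resident in GPU memory during the forward pass.
block = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.GELU(), torch.nn.Linear(256, 64)
)

x = torch.randn(4, 64, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # same output, lower peak memory
y.sum().backward()                             # activations recomputed here
print(x.grad.shape)  # torch.Size([4, 64])
```

The recomputation roughly adds one extra forward pass per checkpointed block, which is usually a worthwhile trade when batch size is memory-bound.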
## Finetune
Finetune CSWin-Base with 384x384 resolution:
```
bash finetune.sh 8 --data <data path> --model CSWin_96_24322_base_384 -b 32 --lr 5e-6 --min-lr 5e-7 --weight-decay 1e-8 --amp --img-size 384 --warmup-epochs 0 --model-ema-decay 0.9998 --finetune <pretrained 224 model> --epochs 20 --mixup 0.1 --cooldown-epochs 10 --drop-path 0.7 --ema-finetune --lr-scale 1 --cutmix 0.1
```
Finetune ImageNet-22K pretrained CSWin-Large with 224x224 resolution:
```
bash finetune.sh 8 --data <data path> --model CSWin_144_24322_large_224 -b 64 --lr 2.5e-4 --min-lr 5e-7 --weight-decay 1e-8 --amp --img-size 224 --warmup-epochs 0 --model-ema-decay 0.9996 --finetune <22k-pretrained model> --epochs 30 --mixup 0.01 --cooldown-epochs 10 --interpolation bicubic --lr-scale 0.05 --drop-path 0.2 --cutmix 0.3 --use-chk --fine-22k --ema-finetune
```
If GPU memory is insufficient, enable [gradient checkpointing](https://pytorch.org/docs/stable/checkpoint.html) with `--use-chk`.
## Cite CSWin Transformer
```
@misc{dong2021cswin,
    title={CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows},
    author={Xiaoyi Dong and Jianmin Bao and Dongdong Chen and Weiming Zhang and Nenghai Yu and Lu Yuan and Dong Chen and Baining Guo},
    year={2021},
    eprint={2107.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
## Acknowledgement
This repository is built using the [timm](https://github.com/rwightman/pytorch-image-models) library and the [DeiT](https://github.com/facebookresearch/deit) repository.
## License
This project is licensed under the license found in the LICENSE file in the root directory of this source tree.