# Swin Transformer
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin-transformer-v2-scaling-up-capacity-and/object-detection-on-coco)](https://paperswithcode.com/sota/object-detection-on-coco?p=swin-transformer-v2-scaling-up-capacity-and)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin-transformer-v2-scaling-up-capacity-and/instance-segmentation-on-coco)](https://paperswithcode.com/sota/instance-segmentation-on-coco?p=swin-transformer-v2-scaling-up-capacity-and)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin-transformer-v2-scaling-up-capacity-and/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=swin-transformer-v2-scaling-up-capacity-and)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin-transformer-v2-scaling-up-capacity-and/action-classification-on-kinetics-400)](https://paperswithcode.com/sota/action-classification-on-kinetics-400?p=swin-transformer-v2-scaling-up-capacity-and)
This repo is the official implementation of ["Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"](https://arxiv.org/pdf/2103.14030.pdf) as well as its follow-ups. It currently includes code and models for the following tasks (a minimal sketch of the core shifted-window operation follows the list):
> **Image Classification**: Included in this repo. See [get_started.md](get_started.md) for a quick start.
> **Object Detection and Instance Segmentation**: See [Swin Transformer for Object Detection](https://github.com/SwinTransformer/Swin-Transformer-Object-Detection).
> **Semantic Segmentation**: See [Swin Transformer for Semantic Segmentation](https://github.com/SwinTransformer/Swin-Transformer-Semantic-Segmentation).
> **Video Action Recognition**: See [Video Swin Transformer](https://github.com/SwinTransformer/Video-Swin-Transformer).
> **Semi-Supervised Object Detection**: See [Soft Teacher](https://github.com/microsoft/SoftTeacher).
> **SSL: Contrastive Learning**: See [Transformer-SSL](https://github.com/SwinTransformer/Transformer-SSL).
> **SSL: Masked Image Modeling**: See [get_started.md#simmim-support](https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md#simmim-support).
> **Mixture-of-Experts**: See [get_started](get_started.md#mixture-of-experts-support) for more instructions.
> **Feature-Distillation**: See [Feature-Distillation](https://github.com/SwinTransformer/Feature-Distillation).
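The namesake operation of Swin partitions a feature map into non-overlapping local windows, computes self-attention within each window, and cyclically shifts the partition between consecutive blocks so information flows across window borders. Below is a condensed sketch of the window partition and cyclic shift (mirroring the logic in this repo's `models/swin_transformer.py`; the tensor sizes are illustrative):

```python
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows of shape
    (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)

# Cyclic shift used by the shifted-window (SW-MSA) blocks: roll the map by half
# a window so the next attention step mixes tokens across window boundaries.
x = torch.randn(1, 56, 56, 96)                         # e.g. a Swin-T stage-1 map
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))  # shift_size = 7 // 2 = 3
windows = window_partition(shifted, window_size=7)
print(windows.shape)                                   # torch.Size([64, 7, 7, 96])
```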
## Updates
***12/29/2022***
1. **Nvidia**'s [FasterTransformer](https://github.com/NVIDIA/FasterTransformer/blob/main/docs/swin_guide.md) now supports Swin Transformer V2 inference, which brings significant speed improvements on `T4 and A100 GPUs`.
***11/30/2022***
1. Models and code of **Feature Distillation** are released. Please refer to [Feature-Distillation](https://github.com/SwinTransformer/Feature-Distillation) for details and for the checkpoints (FD-EsViT-Swin-B, FD-DeiT-ViT-B, FD-DINO-ViT-B, FD-CLIP-ViT-B, FD-CLIP-ViT-L).
***09/24/2022***
1. Merged [SimMIM](https://github.com/microsoft/SimMIM), a **Masked Image Modeling** based pre-training approach applicable to Swin and SwinV2 (and also to ViT and ResNet). Please refer to [get started with SimMIM](get_started.md#simmim-support) to play with SimMIM pre-training; a condensed sketch of the masking objective follows this list.
2. Released a series of Swin and SwinV2 models pre-trained using the SimMIM approach (see [MODELHUB for SimMIM](MODELHUB.md#simmim-pretrained-swin-v2-models)), with model size ranging from SwinV2-Small-50M to SwinV2-giant-1B, data size ranging from ImageNet-1K-10% to ImageNet-22K, and iterations from 125k to 500k. You may leverage these models to study the properties of MIM methods. Please look into the [data scaling](https://arxiv.org/abs/2206.04664) paper for more details.
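The idea behind SimMIM is deliberately simple: hide a large fraction of image patches, run the encoder on the masked input, and regress the raw pixels of the hidden patches with an L1 loss. The sketch below is a hypothetical pixel-space rendition of that objective (`encoder`, `decoder`, and `mask_token` are placeholders; SimMIM actually substitutes a learnable mask token at the patch-embedding level and uses a lightweight one-layer prediction head):

```python
import torch
import torch.nn.functional as F

def simmim_loss(encoder, decoder, images, mask_token, patch_size=32, mask_ratio=0.6):
    """Masked-image-modeling loss in the spirit of SimMIM: L1 on masked pixels only."""
    B, C, H, W = images.shape
    gh, gw = H // patch_size, W // patch_size
    # Random boolean mask over the patch grid; ~mask_ratio of the patches are hidden.
    mask = torch.rand(B, gh, gw, device=images.device) < mask_ratio
    pixel_mask = mask.repeat_interleave(patch_size, 1).repeat_interleave(patch_size, 2)
    masked = images * (~pixel_mask).unsqueeze(1) + mask_token * pixel_mask.unsqueeze(1)
    pred = decoder(encoder(masked))                   # (B, C, H, W) reconstruction
    err = F.l1_loss(pred, images, reduction="none") * pixel_mask.unsqueeze(1)
    return err.sum() / (pixel_mask.sum() * C + 1e-8)  # average over masked pixels
```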
***07/09/2022***
`News`:
1. SwinV2-G achieves `61.4 mIoU` on ADE20K semantic segmentation (+1.5 mIoU over the previous SwinV2-G model) using an additional [feature distillation (FD)](https://github.com/SwinTransformer/Feature-Distillation) approach, **setting a new record** on this benchmark. FD is an approach that generally improves the fine-tuning performance of various pre-trained models, including DeiT, DINO, and CLIP. In particular, it improves CLIP pre-trained ViT-L by +1.6% to reach `89.0%` on ImageNet-1K image classification, which is **the most accurate ViT-L model**.
2. Merged a PR from **Nvidia** that links to a faster Swin Transformer inference implementation with significant speed improvements on `T4 and A100 GPUs`.
3. Merged a PR from **Nvidia** that enables an option to use `pure FP16 (Apex O2)` in training while almost maintaining accuracy; a minimal usage sketch follows this list.
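For reference, enabling Apex `O2` typically looks like the snippet below (a minimal sketch assuming [NVIDIA Apex](https://github.com/NVIDIA/apex) is installed; `O2` casts the model to FP16 while the optimizer keeps an FP32 master copy of the weights):

```python
import torch
from apex import amp  # https://github.com/NVIDIA/apex

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# opt_level="O2": FP16 model weights and activations, FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

loss = model(torch.randn(8, 1024).cuda()).float().pow(2).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # loss scaling avoids FP16 gradient underflow
optimizer.step()
```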
***06/03/2022***
1. Added **Swin-MoE**, the Mixture-of-Experts variant of Swin Transformer implemented using [Tutel](https://github.com/microsoft/tutel) (an optimized Mixture-of-Experts implementation). **Swin-MoE** is introduced in the [Tutel](https://arxiv.org/abs/2206.03382) paper; a toy routing sketch follows below.
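For intuition, a Mixture-of-Experts layer replaces the Transformer block's single MLP with many expert MLPs plus a router that dispatches each token to its best expert(s). The hypothetical top-1 sketch below is plain PyTorch, not the Tutel implementation, which additionally handles capacity limits, load-balancing losses, and expert parallelism across GPUs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Toy top-1 Mixture-of-Experts MLP (illustrative only)."""

    def __init__(self, dim, hidden, num_experts):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # the router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (num_tokens, dim)
        scores = F.softmax(self.gate(x), dim=-1)
        weight, idx = scores.max(dim=-1)         # best expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = idx == e                       # tokens routed to expert e
            if sel.any():
                out[sel] = weight[sel].unsqueeze(1) * expert(x[sel])
        return out

tokens = torch.randn(196, 96)                    # e.g. tokens of one 14x14 stage
print(Top1MoE(96, 384, num_experts=8)(tokens).shape)  # torch.Size([196, 96])
```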
***05/12/2022***
1. Pretrained models of [Swin Transformer V2](https://arxiv.org/abs/2111.09883) on ImageNet-1K and ImageNet-22K are released.
2. ImageNet-22K pretrained models for Swin-V1-Tiny and Swin-V2-Small are released.
***03/02/2022***
1. Swin Transformer V2 and SimMIM were accepted by CVPR 2022. [SimMIM](https://github.com/microsoft/SimMIM) is a self-supervised pre-training approach based on masked image modeling, a key technique that enabled training the 3-billion-parameter Swin V2 model with `40x less labelled data` than previous billion-scale models based on JFT-3B.
***02/09/2022***
1. Integrated into [Hugging Face Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the Web Demo [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/Swin-Transformer)
***10/12/2021***
1. Swin Transformer received ICCV 2021 best paper award (Marr Prize).
***08/09/2021***
1. [Soft Teacher](https://arxiv.org/pdf/2106.09018v2.pdf) will appear at ICCV 2021. The code will be released at the [GitHub Repo](https://github.com/microsoft/SoftTeacher). `Soft Teacher` is an end-to-end semi-supervised object detection method that achieves a new record on COCO test-dev: `61.3 box AP` and `53.0 mask AP`.
***07/03/2021***
1. Added **Swin MLP**, an adaptation of `Swin Transformer` that replaces all multi-head self-attention (MHSA) blocks with MLP layers (more precisely, grouped linear layers); a condensed sketch follows below. The shifted window configuration can also significantly improve the performance of vanilla MLP architectures.
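To illustrate the replacement, the spatial mixing in Swin MLP can be written as a grouped 1x1 convolution over the tokens of each window, with one group per head (this mirrors the `nn.Conv1d(..., groups=num_heads)` construction in this repo's `models/swin_mlp.py`; the sizes below are illustrative Swin-T stage-1 values):

```python
import torch
import torch.nn as nn

num_heads, window_size, C = 3, 7, 96
tokens = window_size ** 2                                # 49 tokens per window

# Grouped linear layer that mixes the 49 token positions within each head.
spatial_mlp = nn.Conv1d(num_heads * tokens, num_heads * tokens,
                        kernel_size=1, groups=num_heads)

windows = torch.randn(64, tokens, C)                     # (num_windows, 49, 96)
x = windows.view(64, tokens, num_heads, C // num_heads)
x = x.transpose(1, 2).reshape(64, num_heads * tokens, C // num_heads)
x = spatial_mlp(x)                                       # token mixing, per head
x = x.view(64, num_heads, tokens, C // num_heads)
x = x.transpose(1, 2).reshape(64, tokens, C)             # back to (64, 49, 96)
print(x.shape)
```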
***06/25/2021***
1. [Video Swin Transformer](https://arxiv.org/abs/2106.13230) is released at [Video-Swin-Transformer](https://github.com/SwinTransformer/Video-Swin-Transformer).
`Video Swin Transformer` achieves state-of-the-art accuracy on a broad range of video recognition benchmarks, including action recognition (`84.9` top-1 accuracy on Kinetics-400 and `86.1` top-1 accuracy on Kinetics-600 with `~20x` less pre-training data and `~3x` smaller model size) and temporal modeling (`69.6` top-1 accuracy on Something-Something v2).
***05/12/2021***
1. Used as a backbone for `Self-Supervised Learning`: [Transformer-SSL](https://github.com/SwinTransformer/Transformer-SSL)
Using Swin Transformer as the backbone for self-supervised learning enables evaluating the transfer performance of the learned representations on downstream tasks, which was missing in previous works due to their use of ViT/DeiT, backbones that had not been well adapted to downstream tasks.
***04/12/2021***
Initial commits:
1. Pretrained models on ImageNet-1K ([Swin-T-IN1K](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth), [Swin-S-IN1K](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_small_patch4_window7_224.pth), [Swin-B-IN1K](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth)).
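Loading one of these checkpoints typically looks like the sketch below (a minimal example assuming the Swin-T checkpoint above; this repo's released checkpoints store the weights under a `'model'` key, and the hyper-parameters shown are the Swin-T defaults):

```python
import torch
from models.swin_transformer import SwinTransformer  # this repo's model definition

# Swin-T (patch4, window7, 224): these match the repo's defaults.
model = SwinTransformer(embed_dim=96, depths=[2, 2, 6, 2],
                        num_heads=[3, 6, 12, 24], window_size=7)
ckpt = torch.load("swin_tiny_patch4_window7_224.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])  # weights live under the 'model' key
model.eval()
```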
## Introduction
Swin Transformer was published by Microsoft Research at ICCV 2021 and received the ICCV 2021 best paper award. Although Vision Transformer (ViT) achieves encouraging results on image classification, its low-resolution feature maps and quadratic complexity in image size make it ill-suited as a backbone for dense vision tasks or high-resolution input images. Swin Transformer was proposed to achieve the best accuracy-speed trade-off.