# Swin Transformer
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin-transformer-v2-scaling-up-capacity-and/object-detection-on-coco)](https://paperswithcode.com/sota/object-detection-on-coco?p=swin-transformer-v2-scaling-up-capacity-and)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin-transformer-v2-scaling-up-capacity-and/instance-segmentation-on-coco)](https://paperswithcode.com/sota/instance-segmentation-on-coco?p=swin-transformer-v2-scaling-up-capacity-and)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin-transformer-v2-scaling-up-capacity-and/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=swin-transformer-v2-scaling-up-capacity-and)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin-transformer-v2-scaling-up-capacity-and/action-classification-on-kinetics-400)](https://paperswithcode.com/sota/action-classification-on-kinetics-400?p=swin-transformer-v2-scaling-up-capacity-and)
This repo is the official implementation of ["Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"](https://arxiv.org/pdf/2103.14030.pdf) as well as its follow-ups. It currently includes code and models for the following tasks (a minimal sketch of the core shifted-window operation follows the list):
> **Image Classification**: Included in this repo. See [get_started.md](get_started.md) for a quick start.
> **Object Detection and Instance Segmentation**: See [Swin Transformer for Object Detection](https://github.com/SwinTransformer/Swin-Transformer-Object-Detection).
> **Semantic Segmentation**: See [Swin Transformer for Semantic Segmentation](https://github.com/SwinTransformer/Swin-Transformer-Semantic-Segmentation).
> **Video Action Recognition**: See [Video Swin Transformer](https://github.com/SwinTransformer/Video-Swin-Transformer).
> **Semi-Supervised Object Detection**: See [Soft Teacher](https://github.com/microsoft/SoftTeacher).
> **SSL: Contrastive Learning**: See [Transformer-SSL](https://github.com/SwinTransformer/Transformer-SSL).
> **SSL: Masked Image Modeling**: See [get_started.md#simmim-support](https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md#simmim-support).
> **Mixture-of-Experts**: See [get_started](get_started.md#mixture-of-experts-support) for more instructions.
> **Feature-Distillation**: See [Feature-Distillation](https://github.com/SwinTransformer/Feature-Distillation).
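The namesake operation of Swin partitions a feature map into non-overlapping local windows, computes self-attention within each window, and cyclically shifts the partition between consecutive blocks so information flows across window borders. Below is a condensed sketch of the window partition and cyclic shift (mirroring the logic in this repo's `models/swin_transformer.py`; the tensor sizes are illustrative):

```python
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows of shape
    (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)

# Cyclic shift used by the shifted-window (SW-MSA) blocks: roll the map by half
# a window so the next attention step mixes tokens across window boundaries.
x = torch.randn(1, 56, 56, 96)                         # e.g. a Swin-T stage-1 map
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))  # shift_size = 7 // 2 = 3
windows = window_partition(shifted, window_size=7)
print(windows.shape)                                   # torch.Size([64, 7, 7, 96])
```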
## Updates
***12/29/2022***
1. **Nvidia**'s [FasterTransformer](https://github.com/NVIDIA/FasterTransformer/blob/main/docs/swin_guide.md) now supports Swin Transformer V2 inference, which brings significant speed improvements on `T4 and A100 GPUs`.
***11/30/2022***
1. Models and code of **Feature Distillation** are released. Please refer to [Feature-Distillation](https://github.com/SwinTransformer/Feature-Distillation) for details and for the checkpoints (FD-EsViT-Swin-B, FD-DeiT-ViT-B, FD-DINO-ViT-B, FD-CLIP-ViT-B, FD-CLIP-ViT-L).
***09/24/2022***
1. Merged [SimMIM](https://github.com/microsoft/SimMIM), a **Masked Image Modeling** based pre-training approach applicable to Swin and SwinV2 (and also to ViT and ResNet). Please refer to [get started with SimMIM](get_started.md#simmim-support) to play with SimMIM pre-training; a condensed sketch of the masking objective follows this list.
2. Released a series of Swin and SwinV2 models pre-trained using the SimMIM approach (see [MODELHUB for SimMIM](MODELHUB.md#simmim-pretrained-swin-v2-models)), with model size ranging from SwinV2-Small-50M to SwinV2-giant-1B, data size ranging from ImageNet-1K-10% to ImageNet-22K, and iterations from 125k to 500k. You may leverage these models to study the properties of MIM methods. Please look into the [data scaling](https://arxiv.org/abs/2206.04664) paper for more details.
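The idea behind SimMIM is deliberately simple: hide a large fraction of image patches, run the encoder on the masked input, and regress the raw pixels of the hidden patches with an L1 loss. The sketch below is a hypothetical pixel-space rendition of that objective (`encoder`, `decoder`, and `mask_token` are placeholders; SimMIM actually substitutes a learnable mask token at the patch-embedding level and uses a lightweight one-layer prediction head):

```python
import torch
import torch.nn.functional as F

def simmim_loss(encoder, decoder, images, mask_token, patch_size=32, mask_ratio=0.6):
    """Masked-image-modeling loss in the spirit of SimMIM: L1 on masked pixels only."""
    B, C, H, W = images.shape
    gh, gw = H // patch_size, W // patch_size
    # Random boolean mask over the patch grid; ~mask_ratio of the patches are hidden.
    mask = torch.rand(B, gh, gw, device=images.device) < mask_ratio
    pixel_mask = mask.repeat_interleave(patch_size, 1).repeat_interleave(patch_size, 2)
    masked = images * (~pixel_mask).unsqueeze(1) + mask_token * pixel_mask.unsqueeze(1)
    pred = decoder(encoder(masked))                   # (B, C, H, W) reconstruction
    err = F.l1_loss(pred, images, reduction="none") * pixel_mask.unsqueeze(1)
    return err.sum() / (pixel_mask.sum() * C + 1e-8)  # average over masked pixels
```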
***07/09/2022***
`News`:
1. SwinV2-G achieves `61.4 mIoU` on ADE20K semantic segmentation (+1.5 mIoU over the previous SwinV2-G model) using an additional [feature distillation (FD)](https://github.com/SwinTransformer/Feature-Distillation) approach, **setting a new record** on this benchmark. FD is an approach that generally improves the fine-tuning performance of various pre-trained models, including DeiT, DINO, and CLIP. In particular, it improves CLIP pre-trained ViT-L by +1.6% to reach `89.0%` on ImageNet-1K image classification, which is **the most accurate ViT-L model**.
2. Merged a PR from **Nvidia** that links to a faster Swin Transformer inference implementation with significant speed improvements on `T4 and A100 GPUs`.
3. Merged a PR from **Nvidia** that enables an option to use `pure FP16 (Apex O2)` in training while almost maintaining accuracy; a minimal usage sketch follows this list.
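For reference, enabling Apex `O2` typically looks like the snippet below (a minimal sketch assuming [NVIDIA Apex](https://github.com/NVIDIA/apex) is installed; `O2` casts the model to FP16 while the optimizer keeps an FP32 master copy of the weights):

```python
import torch
from apex import amp  # https://github.com/NVIDIA/apex

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# opt_level="O2": FP16 model weights and activations, FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

loss = model(torch.randn(8, 1024).cuda()).float().pow(2).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # loss scaling avoids FP16 gradient underflow
optimizer.step()
```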
***06/03/2022***
1. Added **Swin-MoE**, the Mixture-of-Experts variant of Swin Transformer implemented using [Tutel](https://github.com/microsoft/tutel) (an optimized Mixture-of-Experts implementation). **Swin-MoE** is introduced in the [Tutel](https://arxiv.org/abs/2206.03382) paper; a toy routing sketch follows below.
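For intuition, a Mixture-of-Experts layer replaces the Transformer block's single MLP with many expert MLPs plus a router that dispatches each token to its best expert(s). The hypothetical top-1 sketch below is plain PyTorch, not the Tutel implementation, which additionally handles capacity limits, load-balancing losses, and expert parallelism across GPUs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Toy top-1 Mixture-of-Experts MLP (illustrative only)."""

    def __init__(self, dim, hidden, num_experts):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # the router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (num_tokens, dim)
        scores = F.softmax(self.gate(x), dim=-1)
        weight, idx = scores.max(dim=-1)         # best expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = idx == e                       # tokens routed to expert e
            if sel.any():
                out[sel] = weight[sel].unsqueeze(1) * expert(x[sel])
        return out

tokens = torch.randn(196, 96)                    # e.g. tokens of one 14x14 stage
print(Top1MoE(96, 384, num_experts=8)(tokens).shape)  # torch.Size([196, 96])
```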
***05/12/2022***
1. Pretrained models of [Swin Transformer V2](https://arxiv.org/abs/2111.09883) on ImageNet-1K and ImageNet-22K are released.
2. ImageNet-22K pretrained models for Swin-V1-Tiny and Swin-V2-Small are released.
***03/02/2022***
1. Swin Transformer V2 and SimMIM were accepted by CVPR 2022. [SimMIM](https://github.com/microsoft/SimMIM) is a self-supervised pre-training approach based on masked image modeling, a key technique that enabled training the 3-billion-parameter Swin V2 model with `40x less labelled data` than previous billion-scale models based on JFT-3B.
***02/09/2022***
1. Integrated into [Hugging Face Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the Web Demo [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/Swin-Transformer)
***10/12/2021***
1. Swin Transformer received ICCV 2021 best paper award (Marr Prize).
***08/09/2021***
1. [Soft Teacher](https://arxiv.org/pdf/2106.09018v2.pdf) will appear at ICCV 2021. The code will be released at the [GitHub Repo](https://github.com/microsoft/SoftTeacher). `Soft Teacher` is an end-to-end semi-supervised object detection method that achieves a new record on COCO test-dev: `61.3 box AP` and `53.0 mask AP`.
***07/03/2021***
1. Added **Swin MLP**, an adaptation of `Swin Transformer` that replaces all multi-head self-attention (MHSA) blocks with MLP layers (more precisely, grouped linear layers); a condensed sketch follows below. The shifted window configuration can also significantly improve the performance of vanilla MLP architectures.
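To illustrate the replacement, the spatial mixing in Swin MLP can be written as a grouped 1x1 convolution over the tokens of each window, with one group per head (this mirrors the `nn.Conv1d(..., groups=num_heads)` construction in this repo's `models/swin_mlp.py`; the sizes below are illustrative Swin-T stage-1 values):

```python
import torch
import torch.nn as nn

num_heads, window_size, C = 3, 7, 96
tokens = window_size ** 2                                # 49 tokens per window

# Grouped linear layer that mixes the 49 token positions within each head.
spatial_mlp = nn.Conv1d(num_heads * tokens, num_heads * tokens,
                        kernel_size=1, groups=num_heads)

windows = torch.randn(64, tokens, C)                     # (num_windows, 49, 96)
x = windows.view(64, tokens, num_heads, C // num_heads)
x = x.transpose(1, 2).reshape(64, num_heads * tokens, C // num_heads)
x = spatial_mlp(x)                                       # token mixing, per head
x = x.view(64, num_heads, tokens, C // num_heads)
x = x.transpose(1, 2).reshape(64, tokens, C)             # back to (64, 49, 96)
print(x.shape)
```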
***06/25/2021***
1. [Video Swin Transformer](https://arxiv.org/abs/2106.13230) is released at [Video-Swin-Transformer](https://github.com/SwinTransformer/Video-Swin-Transformer).
`Video Swin Transformer` achieves state-of-the-art accuracy on a broad range of video recognition benchmarks, including action recognition (`84.9` top-1 accuracy on Kinetics-400 and `86.1` top-1 accuracy on Kinetics-600 with `~20x` less pre-training data and `~3x` smaller model size) and temporal modeling (`69.6` top-1 accuracy on Something-Something v2).
***05/12/2021***
1. Used as a backbone for `Self-Supervised Learning`: [Transformer-SSL](https://github.com/SwinTransformer/Transformer-SSL)
Using Swin Transformer as the backbone for self-supervised learning enables evaluating the transfer performance of the learned representations on downstream tasks, which was missing in previous works due to their use of ViT/DeiT, backbones that had not been well adapted to downstream tasks.
***04/12/2021***
Initial commits:
1. Pretrained models on ImageNet-1K ([Swin-T-IN1K](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth), [Swin-S-IN1K](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_small_patch4_window7_224.pth), [Swin-B-IN1K](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth)).
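Loading one of these checkpoints typically looks like the sketch below (a minimal example assuming the Swin-T checkpoint above; this repo's released checkpoints store the weights under a `'model'` key, and the hyper-parameters shown are the Swin-T defaults):

```python
import torch
from models.swin_transformer import SwinTransformer  # this repo's model definition

# Swin-T (patch4, window7, 224): these match the repo's defaults.
model = SwinTransformer(embed_dim=96, depths=[2, 2, 6, 2],
                        num_heads=[3, 6, 12, 24], window_size=7)
ckpt = torch.load("swin_tiny_patch4_window7_224.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])  # weights live under the 'model' key
model.eval()
```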
## Introduction
Swin Transformer was published by Microsoft Research at ICCV 2021 and received the ICCV 2021 best paper award. Although Vision Transformer (ViT) achieves encouraging results on image classification, its low-resolution feature maps and quadratic complexity in image size make it ill-suited as a backbone for dense vision tasks or high-resolution input images. Swin Transformer was proposed to achieve the best accuracy-speed trade-off.