PoolFormer source code: machine learning resource (CSDN文库)

106 files in total

py: 91

sh: 9

md: 3


Uploaded 2022-09-21 02:53:01, 442KB ZIP


poolformer source code, machine learning (106 files)

Dockerfile_mmdetseg 1KB

poolformer_demo.ipynb 426KB

LICENSE 11KB

README.md 7KB

README.md 5KB

README.md 4KB

train.py 40KB

poolformer.py 20KB

validate.py 15KB

pytorch2onnx.py 13KB

deploy_test.py 11KB

test.py 9KB

onnx2tensorrt.py 9KB

align_resize.py 9KB

checkpoint.py 9KB

test.py 8KB

train.py 7KB

cascade_mask_rcnn_r50_fpn.py 7KB

train.py 7KB

cascade_mask_rcnn_pvtv2_b2_fpn.py 7KB

coco_stuff10k.py 7KB

train.py 6KB

cascade_rcnn_r50_fpn.py 6KB

pytorch2torchscript.py 6KB

stare.py 6KB

test.py 6KB

train.py 6KB

browse_dataset.py 5KB

coco_stuff164k.py 5KB

analyze_logs.py 4KB

hrf.py 4KB

drive.py 4KB

mask_rcnn_r50_fpn.py 4KB

mask_rcnn_r50_caffe_c4.py 4KB

epoch_based_runner.py 4KB

mmseg2torchserve.py 4KB

faster_rcnn_r50_caffe_c4.py 4KB

faster_rcnn_r50_fpn.py 4KB

faster_rcnn_r50_caffe_dc5.py 3KB

chase_db1.py 3KB

voc_aug.py 3KB

mit2mmseg.py 3KB

pascal_context.py 3KB

checkpoint.py 3KB

swin2mmseg.py 3KB

benchmark.py 2KB

vit2mmseg.py 2KB

fast_rcnn_r50_fpn.py 2KB

wider_face.py 2KB

cityscapes_instance.py 2KB

ade20k.py 2KB

cityscapes_detection.py 2KB

coco_instance_semantic.py 2KB

voc0712.py 2KB

deepfashion.py 2KB

mmseg_handler.py 2KB

cityscapes.py 2KB

rpn_r50_fpn.py 2KB

get_flops.py 2KB

test_torchserve.py 2KB

coco_instance.py 2KB

retinanet_r50_fpn.py 2KB

rpn_r50_caffe_c4.py 2KB

coco_detection.py 2KB

ssd300.py 1KB

fpn_poolformer_s24_ade20k_40k.py 1KB

fpn_poolformer_m48_ade20k_40k.py 1KB

fpn_poolformer_s36_ade20k_40k.py 1KB

fpn_poolformer_m36_ade20k_40k.py 1KB

fpn_poolformer_s12_ade20k_40k.py 1KB

optimizer.py 1KB

publish_model.py 1KB

fpn_r50.py 1KB

print_config.py 1KB

retinanet_poolformer_s12_fpn_1x_coco.py 936B

retinanet_poolformer_s36_fpn_1x_coco.py 935B

retinanet_poolformer_s24_fpn_1x_coco.py 935B

mask_rcnn_poolformer_s12_fpn_1x_coco.py 874B

mask_rcnn_poolformer_s36_fpn_1x_coco.py 873B

mask_rcnn_poolformer_s24_fpn_1x_coco.py 873B

lvis_v0.5_instance.py 786B

lvis_v1_instance.py 736B

fpn_r50_512x512_40k_ade20k.py 681B

schedule_160k.py 382B

schedule_20k.py 379B

schedule_40k.py 379B

schedule_80k.py 379B

default_runtime.py 368B

default_runtime.py 321B

schedule_2x.py 320B

schedule_20e.py 320B

schedule_1x.py 319B

fpn_x101644d_512x512_40k_ade20k.py 249B

fpn_x101324d_512x512_40k_ade20k.py 249B

fpn_r18_512x512_40k_ade20k.py 191B

fpn_r101_512x512_40k_ade20k.py 123B

__init__.py 25B

slurm_test.sh 566B

slurm_train.sh 539B

dist_test.sh 272B


# PoolFormer: [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) (CVPR 2022 Oral)

<p align="center">
<a href="https://arxiv.org/abs/2111.11418" alt="arXiv">
    <img src="https://img.shields.io/badge/arXiv-2111.11418-b31b1b.svg?style=flat" /></a>
<a href="https://huggingface.co/spaces/akhaliq/poolformer" alt="Hugging Face Spaces">
    <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue" /></a>
<a href="https://colab.research.google.com/github/sail-sg/poolformer/blob/main/misc/poolformer_demo.ipynb" alt="Colab">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
</p>

This is a PyTorch implementation of **PoolFormer**, proposed in our paper "[MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418)" (CVPR 2022 Oral).

**Note**: Instead of designing a complicated token mixer to achieve SOTA performance, the goal of this work is to demonstrate that the competence of transformer models largely stems from the general architecture MetaFormer. Pooling/PoolFormer are just the tools to support our claim.

![MetaFormer](https://user-images.githubusercontent.com/15921929/144710761-1635f59a-abde-4946-984c-a2c3f22a19d2.png)
Figure 1: **MetaFormer and performance of MetaFormer-based models on ImageNet-1K validation set.** We argue that the competence of transformer/MLP-like models primarily stems from the general architecture MetaFormer rather than from the specific token mixers they are equipped with. To demonstrate this, we use an embarrassingly simple non-parametric operator, pooling, to conduct extremely basic token mixing. Surprisingly, the resulting model, PoolFormer, consistently outperforms DeiT and ResMLP as shown in (b), which strongly supports the claim that MetaFormer is actually what we need to achieve competitive performance. RSB-ResNet in (b) means the results are from “ResNet Strikes Back”, where ResNet is trained with an improved training procedure for 300 epochs.
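
The pooling token mixer mentioned in the Figure 1 caption can be illustrated with a minimal, dependency-free sketch. It assumes a 3×3 average-pooling window with stride 1 that ignores out-of-bounds positions, and it subtracts the input from the pooled value on the assumption that the surrounding block adds a residual connection. The function name `pool_token_mixer` and the nested-list grid are illustrative choices, not the repository's API.

```python
def pool_token_mixer(grid, pool_size=3):
    """Average each position over its neighborhood, then subtract the input.

    `grid` is a 2D list of floats standing in for one feature-map channel.
    The window size and the input subtraction are assumptions of this sketch.
    """
    half = pool_size // 2
    height, width = len(grid), len(grid[0])
    mixed = []
    for i in range(height):
        row = []
        for j in range(width):
            # Collect in-bounds neighbors only, so padding is not counted.
            window = [
                grid[y][x]
                for y in range(max(0, i - half), min(height, i + half + 1))
                for x in range(max(0, j - half), min(width, j + half + 1))
            ]
            # No learnable parameters: the mixer is a plain local average.
            row.append(sum(window) / len(window) - grid[i][j])
        mixed.append(row)
    return mixed


if __name__ == "__main__":
    spike = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    for row in pool_token_mixer(spike):
        print(row)
```

On a constant grid every output is zero, since each token already equals the average of its neighbors.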

![PoolFormer](https://user-images.githubusercontent.com/15921929/142746124-1ab7635d-2536-4a0e-ad43-b4fe2c5a525d.png)
Figure 2: (a) **The overall framework of PoolFormer.** (b) **The architecture of PoolFormer block.** Compared with the transformer block, it replaces attention with an extremely simple non-parametric operator, pooling, to conduct only basic token mixing.

## Bibtex
```
@article{yu2021metaformer,
  title={MetaFormer is Actually What You Need for Vision},
  author={Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2111.11418},
  year={2021}
}
```

**Detection and instance segmentation on COCO** configs and trained models are [here](detection/).

**Semantic segmentation on ADE20K** configs and trained models are [here](segmentation/).

## Image Classification
### 1. Requirements

torch>=1.7.0; torchvision>=0.8.0; pyyaml; [apex-amp](https://github.com/NVIDIA/apex) (if you want to use fp16); [timm](https://github.com/rwightman/pytorch-image-models) (`pip install git+https://github.com/rwightman/pytorch-image-models.git@9d6aad44f8fd32e89e5cca503efe3ada5071cc2a`)

Data preparation: ImageNet with the following folder structure. You can extract ImageNet with this [script](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4).

```
│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
```

### 2. PoolFormer Models

| Model | #params | Image resolution | Top-1 Acc (%) | Download |
| :--- | :---: | :---: | :---: | :---: |
| poolformer_s12 | 12M | 224 | 77.2 | [here](https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_s12.pth.tar) |
| poolformer_s24 | 21M | 224 | 80.3 | [here](https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_s24.pth.tar) |
| poolformer_s36 | 31M | 224 | 81.4 | [here](https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_s36.pth.tar) |
| poolformer_m36 | 56M | 224 | 82.1 | [here](https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_m36.pth.tar) |
| poolformer_m48 | 73M | 224 | 82.5 | [here](https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_m48.pth.tar) |

All the pretrained models can also be downloaded from [Baidu Yun](https://pan.baidu.com/s/1HSaJtxgCkUlawurQLq87wQ) (password: esac).

#### Comparison with improved ResNet scores

![Updated_ResNet_Scores](https://user-images.githubusercontent.com/15921929/143457150-f9cab201-963b-43f4-ae04-40a60798ac9b.png)

#### Web Demo

Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the Web Demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/poolformer)

#### Usage

We also provide a Colab notebook that runs the steps to perform inference with PoolFormer: [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sail-sg/poolformer/blob/main/misc/poolformer_demo.ipynb)

### 3. Validation

To evaluate our PoolFormer models, run:

```bash
MODEL=poolformer_s12 # poolformer_{s12, s24, s36, m36, m48}
python3 validate.py /path/to/imagenet --model $MODEL \
  --checkpoint /path/to/checkpoint -b 128
```

### 4. Train

We show how to train PoolFormers on 8 GPUs.
The relation between learning rate and batch size is `lr=bs/1024*1e-3`. For convenience, assuming the batch size is 1024, the learning rate is set to 1e-3 (for a batch size of 1024, setting the learning rate to 2e-3 sometimes gives better performance).

```bash
MODEL=poolformer_s12 # poolformer_{s12, s24, s36, m36, m48}
DROP_PATH=0.1 # drop path rates [0.1, 0.1, 0.2, 0.3, 0.4] corresponding to models [s12, s24, s36, m36, m48]
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet \
  --model $MODEL -b 128 --lr 1e-3 --drop-path $DROP_PATH --apex-amp
```

## Acknowledgment

Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful work.

[pytorch-image-models](https://github.com/rwightman/pytorch-image-models), [mmdetection](https://github.com/open-mmlab/mmdetection), [mmsegmentation](https://github.com/open-mmlab/mmsegmentation).

In addition, Weihao Yu would like to thank the TPU Research Cloud (TRC) program for supporting part of the computational resources.
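
As a worked example of the learning-rate rule from the Train section (lr = bs/1024 × 1e-3), the following sketch computes the rate for a given total batch size. It assumes that `-b` in the training command is the per-GPU batch size, so that 8 GPUs × 128 gives a total of 1024. The helper name `scaled_lr` and its parameter names are hypothetical and are not part of the repository's scripts.

```python
def scaled_lr(total_batch_size, base_lr=1e-3, base_batch_size=1024):
    """Linear scaling rule from the README: lr = bs / 1024 * 1e-3."""
    return total_batch_size / base_batch_size * base_lr


if __name__ == "__main__":
    gpus, per_gpu_batch = 8, 128  # assumed meaning of "8" and "-b 128"
    print(scaled_lr(gpus * per_gpu_batch))  # total batch size 1024
    print(scaled_lr(4 * per_gpu_batch))     # total batch size 512
```

Under this rule, halving the number of GPUs while keeping the per-GPU batch size fixed would halve the learning rate passed to `--lr`.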
