# MaxViT: Multi-Axis Vision Transformer (ECCV 2022)
[![Paper](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2204.01697)
[![Tutorial In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-research/maxvit/blob/master/MaxViT_tutorial.ipynb)
[![video](https://img.shields.io/badge/Video-Presentation-F9D371)](https://youtu.be/WEgB4lAZyKM)
This repository hosts the official TensorFlow implementation of MaxViT models:
__[MaxViT: Multi-Axis Vision Transformer](https://arxiv.org/abs/2204.01697)__. ECCV 2022.\
[Zhengzhong Tu](https://twitter.com/_vztu), [Hossein Talebi](https://scholar.google.com/citations?hl=en&user=UOX9BigAAAAJ), [Han Zhang](https://sites.google.com/view/hanzhang), [Feng Yang](https://sites.google.com/view/feng-yang), [Peyman Milanfar](https://sites.google.com/view/milanfarhome/), [Alan Bovik](https://www.ece.utexas.edu/people/faculty/alan-bovik), and [Yinxiao Li](https://scholar.google.com/citations?user=kZsIU74AAAAJ&hl=en)\
Google Research, University of Texas at Austin
*Disclaimer: This is not an officially supported Google product.*
**News**:
- Oct 12, 2022: Added the remaining ImageNet-1K and -21K checkpoints.
- Oct 4, 2022: A list of updates:
  * Added MaxViTTiny and MaxViTSmall checkpoints.
  * Added a Colab tutorial.
- Sep 8, 2022: Our Google AI blog post covering both [MaxViT](https://arxiv.org/abs/2204.01697) and [MAXIM](https://github.com/google-research/maxim) is [live](https://ai.googleblog.com/2022/09/a-multi-axis-approach-for-vision.html).
- Sep 7, 2022: [@rwightman](https://github.com/rwightman) released a few small model weights in [timm](https://github.com/rwightman/pytorch-image-models#aug-26-2022) that achieve even better results than those reported in our paper. See more [here](https://github.com/rwightman/pytorch-image-models#aug-26-2022).
- Aug 26, 2022: our MaxViT models have been implemented in [timm (pytorch-image-models)](https://github.com/rwightman/pytorch-image-models#aug-26-2022). Kudos to [@rwightman](https://github.com/rwightman)!
- July 21, 2022: Initial code release of [MaxViT models](https://arxiv.org/abs/2204.01697): accepted to ECCV'22.
- Apr 6, 2022: MaxViT has been implemented by [@lucidrains](https://github.com/lucidrains): [vit-pytorch](https://github.com/lucidrains/vit-pytorch#maxvit) :scream: :exploding_head:
- Apr 4, 2022: Initial upload to [arXiv](https://arxiv.org/abs/2204.01697).
## MaxViT Models
[MaxViT](https://arxiv.org/abs/2204.01697) is a family of hybrid (CNN + ViT) image classification models that achieve better performance across the board, in both parameter and FLOPs efficiency, than both SoTA ConvNets and Transformers. They also scale well to large datasets such as ImageNet-21K. Notably, because the grid attention used has linear complexity, MaxViT is able to "see" globally throughout the entire network, even in the earlier, high-resolution stages.
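
To make the "global but linear-cost" idea more concrete, here is a minimal NumPy sketch contrasting the two partitioning schemes described in the paper: block (window) partitioning for local attention and grid partitioning for sparse global attention. It is an illustration only, not the TensorFlow code in this repository; the function names `block_partition` and `grid_partition` and the toy 8x8 input are assumptions made for this example.

```python
import numpy as np


def block_partition(x, p):
    """Split an (H, W, C) map into non-overlapping p x p windows.

    Returns shape (num_windows, p * p, C). Attention inside one window
    mixes only spatially adjacent pixels (local attention).
    """
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    x = x.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p, c)


def grid_partition(x, g):
    """Split an (H, W, C) map using a fixed g x g grid.

    Returns shape (num_groups, g * g, C). Each group gathers g * g pixels
    spaced H/g rows and W/g columns apart, so attention inside one group
    spans the whole map (sparse global attention).
    """
    h, w, c = x.shape
    assert h % g == 0 and w % g == 0, "H and W must be divisible by g"
    x = x.reshape(g, h // g, g, w // g, c)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/g, W/g, g, g, C)
    return x.reshape(-1, g * g, c)


# Toy input (an assumption for illustration): each pixel stores its own
# (row, col) coordinate, so we can see which pixels end up grouped together.
coords = np.stack(
    np.meshgrid(np.arange(8), np.arange(8), indexing="ij"), axis=-1
)

blocks = block_partition(coords, p=4)
grids = grid_partition(coords, g=4)

print(sorted(set(blocks[0][:, 0].tolist())))  # rows covered by the first window
print(sorted(set(grids[0][:, 0].tolist())))   # rows covered by the first grid group
```

In both cases every token attends to a fixed number of tokens (`p * p` or `g * g`), so for a fixed window and grid size the attention cost grows linearly with the number of pixels `H * W` rather than quadratically.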
MaxViT meta-architecture:
<p align="center">
<img src = "./doc/maxvit_arch.png" width="80%">
</p>
Results on ImageNet-1k train and test:
<p align="center">
<img src = "./doc/imagenet_results.png" width="80%">
</p>
Results on ImageNet-21k and JFT pre-trained models:
<p align="center">
<img src = "./doc/i21k_jft_results.png" width="80%">
</p>
## Colab Demo
We have released a Google Colab demo showing how to run MaxViT on images. Try it here: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-research/maxvit/blob/master/MaxViT_tutorial.ipynb)
## Pretrained MaxViT Checkpoints
Results and checkpoints for the ImageNet-1K models are listed below:
| Name | Resolution | Top1 Acc. | #Params | FLOPs | Model |
| ---------- | ---------| ------ | ------ | ------ | ------ |
| MaxViT-T | 224x224 | 83.62% | 31M | 5.6B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvittiny/i1k/224)
| MaxViT-T | 384x384 | 85.24% | 31M | 17.7B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvittiny/i1k/384)
| MaxViT-T | 512x512 | 85.72% | 31M | 33.7B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvittiny/i1k/512)
| MaxViT-S | 224x224 | 84.45% | 69M | 11.7B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitsmall/i1k/224)
| MaxViT-S | 384x384 | 85.74% | 69M | 36.1B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitsmall/i1k/384)
| MaxViT-S | 512x512 | 86.19% | 69M | 67.6B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitsmall/i1k/512)
| MaxViT-B | 224x224 | 84.95% | 119M | 24.2B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitbase/i1k/224)
| MaxViT-B | 384x384 | 86.34% | 119M | 74.2B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitbase/i1k/384)
| MaxViT-B | 512x512 | 86.66% | 119M | 138.5B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitbase/i1k/512)
| MaxViT-L | 224x224 | 85.17% | 212M | 43.9B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitlarge/i1k/224)
| MaxViT-L | 384x384 | 86.40% | 212M | 133.1B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitlarge/i1k/384)
| MaxViT-L | 512x512 | 86.70% | 212M | 245.4B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitlarge/i1k/512)
Here is a list of ImageNet-21K pretrained and ImageNet-1K finetuned models:
| Name | Resolution | Top1 Acc. | #Params | FLOPs | 21k model | 1k model |
| ---------- | ------ | ------ | ------ | ------ | ------ | --------|
| MaxViT-B | 224x224 | - | 119M | 24.2B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitbase/i21k_pt/224) | - |
| MaxViT-B | 384x384 | - | 119M | 74.2B | - | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitbase/i21k_i1k/384)
| MaxViT-B | 512x512 | - | 119M | 138.5B | - | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitbase/i21k_i1k/512)
| MaxViT-L | 224x224 | - | 212M | 43.9B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitlarge/i21k_pt/224) | - |
| MaxViT-L | 384x384 | - | 212M | 133.1B | - | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitlarge/i21k_i1k/384)
| MaxViT-L | 512x512 | - | 212M | 245.4B | - | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitlarge/i21k_i1k/512)
| MaxViT-XL | 224x224 | - | 475M | 97.8B | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitxlarge/i21k_pt/224) | - |
| MaxViT-XL | 384x384 | - | 475M | 293.7B | - | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitxlarge/i21k_i1k/384)
| MaxViT-XL | 512x512 | - | 475M | 535.2B | - | [ckpt](https://console.cloud.google.com/storage/browser/gresearch/maxvit/ckpts/maxvitxlarge/i21k_i1k/512)
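
The `ckpt` links in the tables above point to the Cloud Console storage browser. For scripted downloads, a small helper can turn such a link into a `gs://` URI. The sketch below is not part of this repository: the helper name `console_url_to_gs_uri` and the local `./ckpts/` target directory are hypothetical, and it assumes the bucket is publicly readable and that `gsutil` is installed.

```python
from urllib.parse import urlparse

# Path prefix used by Cloud Console storage-browser links.
CONSOLE_PREFIX = "/storage/browser/"


def console_url_to_gs_uri(url: str) -> str:
    """Convert a Cloud Console storage-browser URL into a gs:// URI.

    Hypothetical helper for illustration: everything after
    '/storage/browser/' is treated as '<bucket>/<object prefix>'.
    """
    path = urlparse(url).path
    if not path.startswith(CONSOLE_PREFIX):
        raise ValueError(f"Not a storage-browser URL: {url}")
    return "gs://" + path[len(CONSOLE_PREFIX):].strip("/")


# Example link taken from the MaxViT-T 224x224 row of the table.
url = (
    "https://console.cloud.google.com/storage/browser/"
    "gresearch/maxvit/ckpts/maxvittiny/i1k/224"
)
gs_uri = console_url_to_gs_uri(url)
print(gs_uri)

# Print (do not run) a copy command; './ckpts/' is an assumed local target.
print(f"gsutil -m cp -r {gs_uri} ./ckpts/")
```

The helper only rewrites the URL; it does not check that the checkpoint exists or that you have access to it.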
## Citation
Should you find this repository useful, please consider citing:
```
@article{tu2022maxvit,
title={MaxViT: Multi-Axis Vision Transformer},
author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
journal={ECCV},
year={2022},
}
```
## Other Related Works
* MAXIM: Multi-Axis ML