# DINOv2: Learning Robust Visual Features without Supervision
**[Meta AI Research, FAIR](https://ai.facebook.com/research/)**
Maxime Oquab,
Timothée Darcet,
Théo Moutakanni,
Huy Vo,
Marc Szafraniec,
Vasil Khalidov,
Patrick Labatut,
Armand Joulin,
Piotr Bojanowski
[[`Paper`](https://arxiv.org/abs/2304.07193)] [[`Blog`](https://ai.facebook.com/blog/dino-v2-computer-vision-self-supervised-learning/)] [[`Demo`](https://dinov2.metademolab.com)] [[`BibTeX`](#citing-dinov2)]
PyTorch implementation and pretrained models for DINOv2. For details, see the paper: **DINOv2: Learning Robust Visual Features without Supervision**.
DINOv2 models produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks. These features are robust and perform well across domains without any fine-tuning. The models were pretrained on a dataset of 142 million images without using any labels or annotations.
https://user-images.githubusercontent.com/60359573/230078733-5faffa19-e6ce-4c55-9200-62dd76f8236a.mp4
<div align="center">
Visualization of the first three principal components of the patch features of all frames, mapped to RGB values.
</div>
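The RGB mapping shown in the video can be reproduced for any set of patch features: project onto the first three principal components, then min-max normalize each component into [0, 1]. A minimal sketch using `torch.pca_lowrank`, with a random tensor standing in for the backbone's patch tokens:

```python
import torch

# Stand-in for DINOv2 patch features: (num_patches, feature_dim).
# In practice these come from the backbone's patch tokens.
features = torch.randn(256, 384)

# Center, then project onto the first three principal components.
centered = features - features.mean(dim=0, keepdim=True)
U, S, V = torch.pca_lowrank(centered, q=3)
components = centered @ V  # (256, 3)

# Min-max normalize each component to [0, 1] so it can be shown as RGB.
mins = components.min(dim=0, keepdim=True).values
maxs = components.max(dim=0, keepdim=True).values
rgb = (components - mins) / (maxs - mins)
print(rgb.shape)  # torch.Size([256, 3])
```

Reshaping `rgb` back to the patch grid and upsampling yields the per-frame visualizations above.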
## Pretrained models
<table>
<tr>
<th>model</th>
<th># of<br />params</th>
<th>ImageNet<br />k-NN</th>
<th>ImageNet<br />linear</th>
<th>download</th>
</tr>
<tr>
<td>ViT-S/14 distilled</td>
<td align="right">21 M</td>
<td align="right">79.0%</td>
<td align="right">81.1%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth">backbone only</a></td>
</tr>
<tr>
<td>ViT-B/14 distilled</td>
<td align="right">86 M</td>
<td align="right">82.1%</td>
<td align="right">84.5%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth">backbone only</a></td>
</tr>
<tr>
<td>ViT-L/14 distilled</td>
<td align="right">300 M</td>
<td align="right">83.5%</td>
<td align="right">86.3%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth">backbone only</a></td>
</tr>
<tr>
<td>ViT-g/14</td>
<td align="right">1,100 M</td>
<td align="right">83.5%</td>
<td align="right">86.5%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_pretrain.pth">backbone only</a></td>
</tr>
</table>
### Pretrained models via PyTorch Hub
Please follow the instructions [here](https://pytorch.org/get-started/locally/) to install PyTorch and torchvision (the only required dependencies). Installing both with CUDA support is strongly recommended.
The corresponding model card can be found in the [[`MODEL_CARD.md`](MODEL_CARD.md)] file.
```python
import torch
dinov2_vits14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
dinov2_vitb14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14')
dinov2_vitl14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14')
dinov2_vitg14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14')
```
## Installation
The training and evaluation code requires PyTorch 2.0 and xFormers 0.0.18 as well as a number of other third-party packages. To set up all the required dependencies for training and evaluation, please follow the instructions below:
*conda* **(Recommended)** - Create and activate a `dinov2` conda environment using the provided environment definition:
```shell
conda env create -f conda.yaml
conda activate dinov2
```
*pip* - Use the provided `requirements.txt` to install the dependencies:
```shell
pip install -r requirements.txt
```
## Data preparation
Expected contents for the ImageNet-1k data folder:
- `<root>/test/ILSVRC2012_test_00000001.JPEG`
- `<root>/test/[..]`
- `<root>/test/ILSVRC2012_test_00100000.JPEG`
- `<root>/train/n01440764/n01440764_10026.JPEG`
- `<root>/train/[...]`
- `<root>/train/n15075141/n15075141_9993.JPEG`
- `<root>/val/n01440764/ILSVRC2012_val_00000293.JPEG`
- `<root>/val/[...]`
- `<root>/val/n15075141/ILSVRC2012_val_00049174.JPEG`
- `<root>/labels.txt`
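The `train/` and `val/` splits above group images into synset-named class folders, so a (path, class) index can be built by a simple directory walk. An illustrative sketch that builds a miniature copy of the layout in a temporary directory and indexes it (this is not the repo's own `Dataset` code):

```python
import tempfile
from pathlib import Path

# Build a miniature copy of the expected train/ layout: synset-named
# class folders containing JPEG files.
root = Path(tempfile.mkdtemp())
for synset, name in [("n01440764", "n01440764_10026.JPEG"),
                     ("n15075141", "n15075141_9993.JPEG")]:
    d = root / "train" / synset
    d.mkdir(parents=True)
    (d / name).touch()

# Index into (path, class) pairs; the class is the parent folder name.
samples = sorted(
    (p, p.parent.name) for p in (root / "train").rglob("*.JPEG")
)
print(len(samples))   # 2
print(samples[0][1])  # n01440764
```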
For ImageNet-22k, please adapt the Dataset object accordingly.
## Training
### Fast setup: training DINOv2 ViT-L/16 on ImageNet-1k
Run DINOv2 on 4 A100-80GB nodes (32 GPUs) in a SLURM cluster environment with submitit.
```shell
python dinov2/run/train/train.py \
--nodes 4 \
--config-file dinov2/configs/train/vitl16_short.yaml \
--output-dir <PATH/TO/OUTPUT/DIR> \
train.dataset_path=ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
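The `train.dataset_path` value is a colon-separated spec: a dataset name followed by `key=value` pairs. A purely illustrative parser for this format (the repository ships its own dataset loaders; the helper name here is hypothetical):

```python
def parse_dataset_spec(spec: str):
    """Split 'ImageNet:split=TRAIN:root=/data:extra=/data' into the
    dataset name and a dict of keyword arguments (illustrative only)."""
    name, *pairs = spec.split(":")
    kwargs = dict(pair.split("=", 1) for pair in pairs)
    return name, kwargs

# Example paths below are placeholders, not real dataset locations.
name, kwargs = parse_dataset_spec(
    "ImageNet:split=TRAIN:root=/data/imagenet:extra=/data/imagenet")
print(name)             # ImageNet
print(kwargs["split"])  # TRAIN
```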
Training time is approximately 1 day and the resulting checkpoint should reach 81.6% on k-NN eval and 82.9% on linear eval.
The training code saves the weights of the teacher in the `eval` folder every 12500 iterations for evaluation.
### Long setup: training DINOv2 ViT-L/14 on ImageNet-22k
Run on 12 A100-80GB nodes (96 GPUs) in a SLURM cluster environment with submitit.
```shell
python dinov2/run/train/train.py \
--nodes 12 \
--config-file dinov2/configs/train/vitl14.yaml \
--output-dir <PATH/TO/OUTPUT/DIR> \
train.dataset_path=ImageNet22k:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
Training time is approximately 3.3 days and the resulting checkpoint should reach 82.0% on k-NN eval and 84.5% on linear eval.
## Evaluation
The training code regularly saves the teacher weights. To evaluate the model, run the following evaluations on a single node:
### k-NN classification on ImageNet-1k
```shell
python dinov2/run/eval/knn.py \
--config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
--pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
--output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/knn \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
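The k-NN evaluation classifies each validation image by a vote over its nearest training neighbors in feature space. A toy sketch of the idea on hand-made 2-D features (a simplification, not the repo's `knn.py`):

```python
import torch

# Toy k-NN classification over L2-normalized features: cosine
# similarity to all training features, then majority vote over top-k.
def knn_predict(train_feats, train_labels, test_feats, k=3):
    train_feats = torch.nn.functional.normalize(train_feats, dim=1)
    test_feats = torch.nn.functional.normalize(test_feats, dim=1)
    sims = test_feats @ train_feats.T      # cosine similarities
    idx = sims.topk(k, dim=1).indices      # k nearest training samples
    votes = train_labels[idx]              # (num_test, k)
    return votes.mode(dim=1).values        # majority vote per test sample

train = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = torch.tensor([0, 0, 1, 1])
test = torch.tensor([[1.0, 0.05], [0.05, 1.0]])
print(knn_predict(train, labels, test, k=3).tolist())  # [0, 1]
```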
### Logistic regression classification on ImageNet-1k
```shell
python dinov2/run/eval/log_regression.py \
--config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
--pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
--output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/logreg \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
### Linear classification with data augmentation on ImageNet-1k
```shell
python dinov2/run/eval/linear.py \
--config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
--pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
--output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/linear \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
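All three evaluations above probe frozen features; the linear case amounts to training a single `nn.Linear` head with cross-entropy on top of the backbone's outputs. A minimal self-contained sketch on random stand-in features (the 384-dim features and the toy 2-class target are synthetic):

```python
import torch

# Linear probe on frozen features: only the nn.Linear head is trained.
torch.manual_seed(0)
feats = torch.randn(512, 384)      # stand-in for frozen DINOv2 features
labels = (feats[:, 0] > 0).long()  # toy linearly separable 2-class target

probe = torch.nn.Linear(384, 2)
opt = torch.optim.SGD(probe.parameters(), lr=0.5)
for _ in range(300):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(probe(feats), labels)
    loss.backward()
    opt.step()

# Training accuracy of the probe on this separable toy target.
acc = (probe(feats).argmax(dim=1) == labels).float().mean()
print(round(float(acc), 4))
```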
We release the linear classification heads obtained from evaluating the different models:
<table>
<tr>
<th>model</th>
<th>ImageNet<br />top-1</th>
<th>linear evaluation</th>
</tr>
<tr>
<td>ViT-S/14 distilled</td>
<td align="right">81.1%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_linear_head.pth">linear head weights</a></td>
</tr>
<tr>
<td>ViT-B/14 distilled</td>
<td align="right">84.5%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_linear_head.pth">linear head weights</a></td>
</tr>
<tr>
<td>ViT-L/14 distilled</td>
<td align="right">86.3%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_linear_head.pth">linear head weights</a></td>
</tr>
<tr>
<td>ViT-g/14</td>
<td align="right">86.5%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_linear_head.pth">linear head weights</a></td>
</tr>
</table>