# DINOv2: Learning Robust Visual Features without Supervision
**[Meta AI Research, FAIR](https://ai.facebook.com/research/)**
Maxime Oquab,
Timothée Darcet,
Théo Moutakanni,
Huy Vo,
Marc Szafraniec,
Vasil Khalidov,
Patrick Labatut,
Armand Joulin,
Piotr Bojanowski
[[`Paper`](https://arxiv.org/abs/2304.07193)] [[`Blog`](https://ai.facebook.com/blog/dino-v2-computer-vision-self-supervised-learning/)] [[`Demo`](https://dinov2.metademolab.com)] [[`BibTeX`](#citing-dinov2)]
PyTorch implementation and pretrained models for DINOv2. For details, see the paper: **DINOv2: Learning Robust Visual Features without Supervision**.
DINOv2 models produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks. These features are robust and perform well across domains without any fine-tuning. The models were pretrained on a dataset of 142 million images without using any labels or annotations.
https://user-images.githubusercontent.com/60359573/230078733-5faffa19-e6ce-4c55-9200-62dd76f8236a.mp4
<div align="center">
Visualization of the first three principal components of the patch features of all frames, mapped to RGB values.
</div>
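The RGB mapping shown in the video can be reproduced for any set of patch features: project onto the first three principal components, then min-max normalize each component into [0, 1]. A minimal sketch using `torch.pca_lowrank`, with a random tensor standing in for the backbone's patch tokens:

```python
import torch

# Stand-in for DINOv2 patch features: (num_patches, feature_dim).
# In practice these come from the backbone's patch tokens.
features = torch.randn(256, 384)

# Center, then project onto the first three principal components.
centered = features - features.mean(dim=0, keepdim=True)
U, S, V = torch.pca_lowrank(centered, q=3)
components = centered @ V  # (256, 3)

# Min-max normalize each component to [0, 1] so it can be shown as RGB.
mins = components.min(dim=0, keepdim=True).values
maxs = components.max(dim=0, keepdim=True).values
rgb = (components - mins) / (maxs - mins)
print(rgb.shape)  # torch.Size([256, 3])
```

Reshaping `rgb` back to the patch grid and upsampling yields the per-frame visualizations above.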
## Pretrained models
<table>
<tr>
<th>model</th>
<th># of<br />params</th>
<th>ImageNet<br />k-NN</th>
<th>ImageNet<br />linear</th>
<th>download</th>
</tr>
<tr>
<td>ViT-S/14 distilled</td>
<td align="right">21 M</td>
<td align="right">79.0%</td>
<td align="right">81.1%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth">backbone only</a></td>
</tr>
<tr>
<td>ViT-B/14 distilled</td>
<td align="right">86 M</td>
<td align="right">82.1%</td>
<td align="right">84.5%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth">backbone only</a></td>
</tr>
<tr>
<td>ViT-L/14 distilled</td>
<td align="right">300 M</td>
<td align="right">83.5%</td>
<td align="right">86.3%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth">backbone only</a></td>
</tr>
<tr>
<td>ViT-g/14</td>
<td align="right">1,100 M</td>
<td align="right">83.5%</td>
<td align="right">86.5%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_pretrain.pth">backbone only</a></td>
</tr>
</table>
### Pretrained models via PyTorch Hub
Please follow the instructions [here](https://pytorch.org/get-started/locally/) to install PyTorch and torchvision (the only required dependencies). Installing both with CUDA support is strongly recommended.
The corresponding model card can be found in the [[`MODEL_CARD.md`](MODEL_CARD.md)] file.
```python
import torch
dinov2_vits14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
dinov2_vitb14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14')
dinov2_vitl14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14')
dinov2_vitg14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14')
```
## Installation
The training and evaluation code requires PyTorch 2.0 and xFormers 0.0.18 as well as a number of other third-party packages. To set up all the required dependencies for training and evaluation, please follow the instructions below:
*conda* **(Recommended)** - Create and activate a `dinov2` conda environment using the provided environment definition:
```shell
conda env create -f conda.yaml
conda activate dinov2
```
*pip* - Use the provided `requirements.txt` to install the dependencies:
```shell
pip install -r requirements.txt
```
## Data preparation
Expected contents for the ImageNet-1k data folder:
- `<root>/test/ILSVRC2012_test_00000001.JPEG`
- `<root>/test/[..]`
- `<root>/test/ILSVRC2012_test_00100000.JPEG`
- `<root>/train/n01440764/n01440764_10026.JPEG`
- `<root>/train/[...]`
- `<root>/train/n15075141/n15075141_9993.JPEG`
- `<root>/val/n01440764/ILSVRC2012_val_00000293.JPEG`
- `<root>/val/[...]`
- `<root>/val/n15075141/ILSVRC2012_val_00049174.JPEG`
- `<root>/labels.txt`
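The `train/` and `val/` splits above group images into synset-named class folders, so a (path, class) index can be built by a simple directory walk. An illustrative sketch that builds a miniature copy of the layout in a temporary directory and indexes it (this is not the repo's own `Dataset` code):

```python
import tempfile
from pathlib import Path

# Build a miniature copy of the expected train/ layout: synset-named
# class folders containing JPEG files.
root = Path(tempfile.mkdtemp())
for synset, name in [("n01440764", "n01440764_10026.JPEG"),
                     ("n15075141", "n15075141_9993.JPEG")]:
    d = root / "train" / synset
    d.mkdir(parents=True)
    (d / name).touch()

# Index into (path, class) pairs; the class is the parent folder name.
samples = sorted(
    (p, p.parent.name) for p in (root / "train").rglob("*.JPEG")
)
print(len(samples))   # 2
print(samples[0][1])  # n01440764
```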
For ImageNet-22k, please adapt the Dataset object accordingly.
## Training
### Fast setup: training DINOv2 ViT-L/16 on ImageNet-1k
Run DINOv2 on 4 A100-80GB nodes (32 GPUs) in a SLURM cluster environment with submitit.
```shell
python dinov2/run/train/train.py \
--nodes 4 \
--config-file dinov2/configs/train/vitl16_short.yaml \
--output-dir <PATH/TO/OUTPUT/DIR> \
train.dataset_path=ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
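The `train.dataset_path` value is a colon-separated spec: a dataset name followed by `key=value` pairs. A purely illustrative parser for this format (the repository ships its own dataset loaders; the helper name here is hypothetical):

```python
def parse_dataset_spec(spec: str):
    """Split 'ImageNet:split=TRAIN:root=/data:extra=/data' into the
    dataset name and a dict of keyword arguments (illustrative only)."""
    name, *pairs = spec.split(":")
    kwargs = dict(pair.split("=", 1) for pair in pairs)
    return name, kwargs

# Example paths below are placeholders, not real dataset locations.
name, kwargs = parse_dataset_spec(
    "ImageNet:split=TRAIN:root=/data/imagenet:extra=/data/imagenet")
print(name)             # ImageNet
print(kwargs["split"])  # TRAIN
```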
Training time is approximately 1 day and the resulting checkpoint should reach 81.6% on k-NN eval and 82.9% on linear eval.
The training code saves the weights of the teacher in the `eval` folder every 12500 iterations for evaluation.
### Long setup: training DINOv2 ViT-L/14 on ImageNet-22k
Run on 12 A100-80GB nodes (96 GPUs) in a SLURM cluster environment with submitit.
```shell
python dinov2/run/train/train.py \
--nodes 12 \
--config-file dinov2/configs/train/vitl14.yaml \
--output-dir <PATH/TO/OUTPUT/DIR> \
train.dataset_path=ImageNet22k:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
Training time is approximately 3.3 days and the resulting checkpoint should reach 82.0% on k-NN eval and 84.5% on linear eval.
## Evaluation
The training code regularly saves the teacher weights. To evaluate the model, run the following evaluations on a single node:
### k-NN classification on ImageNet-1k
```shell
python dinov2/run/eval/knn.py \
--config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
--pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
--output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/knn \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
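The k-NN evaluation classifies each validation image by a vote over its nearest training neighbors in feature space. A toy sketch of the idea on hand-made 2-D features (a simplification, not the repo's `knn.py`):

```python
import torch

# Toy k-NN classification over L2-normalized features: cosine
# similarity to all training features, then majority vote over top-k.
def knn_predict(train_feats, train_labels, test_feats, k=3):
    train_feats = torch.nn.functional.normalize(train_feats, dim=1)
    test_feats = torch.nn.functional.normalize(test_feats, dim=1)
    sims = test_feats @ train_feats.T      # cosine similarities
    idx = sims.topk(k, dim=1).indices      # k nearest training samples
    votes = train_labels[idx]              # (num_test, k)
    return votes.mode(dim=1).values        # majority vote per test sample

train = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = torch.tensor([0, 0, 1, 1])
test = torch.tensor([[1.0, 0.05], [0.05, 1.0]])
print(knn_predict(train, labels, test, k=3).tolist())  # [0, 1]
```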
### Logistic regression classification on ImageNet-1k
```shell
python dinov2/run/eval/log_regression.py \
--config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
--pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
--output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/logreg \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
### Linear classification with data augmentation on ImageNet-1k
```shell
python dinov2/run/eval/linear.py \
--config-file <PATH/TO/OUTPUT/DIR>/config.yaml \
--pretrained-weights <PATH/TO/OUTPUT/DIR>/eval/training_24999/teacher_checkpoint.pth \
--output-dir <PATH/TO/OUTPUT/DIR>/eval/training_24999/linear \
--train-dataset ImageNet:split=TRAIN:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
--val-dataset ImageNet:split=VAL:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
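All three evaluations above probe frozen features; the linear case amounts to training a single `nn.Linear` head with cross-entropy on top of the backbone's outputs. A minimal self-contained sketch on random stand-in features (the 384-dim features and the toy 2-class target are synthetic):

```python
import torch

# Linear probe on frozen features: only the nn.Linear head is trained.
torch.manual_seed(0)
feats = torch.randn(512, 384)      # stand-in for frozen DINOv2 features
labels = (feats[:, 0] > 0).long()  # toy linearly separable 2-class target

probe = torch.nn.Linear(384, 2)
opt = torch.optim.SGD(probe.parameters(), lr=0.5)
for _ in range(300):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(probe(feats), labels)
    loss.backward()
    opt.step()

# Training accuracy of the probe on this separable toy target.
acc = (probe(feats).argmax(dim=1) == labels).float().mean()
print(round(float(acc), 4))
```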
We release the linear classification heads obtained from evaluating the different models:
<table>
<tr>
<th>model</th>
<th>ImageNet<br />top-1</th>
<th>linear evaluation</th>
</tr>
<tr>
<td>ViT-S/14 distilled</td>
<td align="right">81.1%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_linear_head.pth">linear head weights</a></td>
</tr>
<tr>
<td>ViT-B/14 distilled</td>
<td align="right">84.5%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_linear_head.pth">linear head weights</a></td>
</tr>
<tr>
<td>ViT-L/14 distilled</td>
<td align="right">86.3%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_linear_head.pth">linear head weights</a></td>
</tr>
<tr>
<td>ViT-g/14</td>
<td align="right">86.5%</td>
<td><a href="https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_linear_head.pth">linear head weights</a></td>
</tr>
</table>