# vit.cpp
Inference of Vision Transformer (ViT) models in plain C/C++ using ggml, without any extra dependencies
## Description
This project presents a standalone implementation of the well-known Vision Transformer (ViT) model family, used in a broad spectrum of applications and SOTA models such as Large Multimodal Models (LMMs). The primary goal is to develop a C/C++ inference engine tailored for ViT models, utilizing [ggml](https://github.com/ggerganov/ggml) to enhance performance, particularly on edge devices. Designed to be both lightweight and self-contained, this implementation can run across diverse platforms.
<details>
<summary>Table of Contents</summary>
1. [Description](#description)
2. [Features](#features)
3. [Vision Transformer Architecture](#vision-transformer-architecture)
4. [Quick Example](#quick-example)
5. [Convert PyTorch to GGUF](#convert-pytorch-to-gguf)
6. [Build](#build)
- [Simple Build](#simple-build)
- [Per Device Optimizations](#per-device-optimizations)
- [OpenMP](#using-openmp)
7. [Run](#run)
8. [Benchmark against PyTorch](#benchmark-against-pytorch)
- [ViT Inference](#vit-inference)
- [Benchmark on Your Machine](#benchmark-on-your-machine)
9. [Quantization](#quantization)
10. [To-Do List](#to-do-list)
</details>
## Features
- Dependency-free and lightweight inference thanks to [ggml](https://github.com/ggerganov/ggml).
- 4-bit, 5-bit and 8-bit quantization support.
- Out-of-the-box support for different timm ViT variants.

An important advantage of `vit.cpp` is its short startup time compared to common DL frameworks, which makes it well suited for serverless deployments where cold starts are an issue.
## Vision Transformer architecture
The implemented architecture is based on the original Vision Transformer from:
- [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)
<p align="center">
<img src="assets/image.png" alt="Vision Transformer overview" width="60%" height="auto">
</p>
<p align="center">
ViT architecture. Taken from the <a href="https://arxiv.org/abs/2010.11929">original paper</a>.
</p>
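To make the figure concrete, here is the token arithmetic for the ViT-Tiny configuration reported by `vit_model_load` in the example below (patch size 16, image size 224, hidden size 192, 3 attention heads). This is only an illustrative sketch, not code from `vit.cpp`:

```cpp
// Illustrative shape arithmetic only; not part of the vit.cpp API.
#include <cstdio>

int main() {
    const int img_size    = 224; // vit_model_load: img_size
    const int patch_size  = 16;  // vit_model_load: patch_size
    const int hidden_size = 192; // vit_model_load: hidden_size (ViT-Tiny)
    const int num_heads   = 3;   // vit_model_load: num_attention_heads

    const int patches_per_side = img_size / patch_size;               // 14
    const int num_patches      = patches_per_side * patches_per_side; // 196
    const int num_tokens       = num_patches + 1;                     // +1 class token -> 197
    const int head_dim         = hidden_size / num_heads;             // 64

    std::printf("tokens: %d, embedding dim: %d, head dim: %d\n",
                num_tokens, hidden_size, head_dim);
    return 0;
}
```

The 224x224 input therefore becomes 196 patch tokens plus one class token, i.e. 197 embeddings of dimension 192, which are then processed by the 12 transformer layers.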
## Quick example
<p align="center">
<img src="assets/magpie.jpeg" alt="example input" width="50%" height="auto">
</p>

<details>
<summary>See output</summary>
<pre>
$ ./bin/vit -t 4 -m ../ggml-model-f16.gguf -i ../assets/magpie.jpeg -k 5
main: seed = 1701176263
main: n_threads = 4 / 8
vit_model_load: loading model from '../ggml-model-f16.gguf' - please wait
vit_model_load: hidden_size = 192
vit_model_load: num_hidden_layers = 12
vit_model_load: num_attention_heads = 3
vit_model_load: patch_size = 16
vit_model_load: img_size = 224
vit_model_load: num_classes = 1000
vit_model_load: ftype = 1
vit_model_load: qntvr = 0
operator(): ggml ctx size = 11.13 MB
vit_model_load: ................... done
vit_model_load: model size = 11.04 MB / num tensors = 152
main: loaded image '../assets/magpie.jpeg' (500 x 470)
vit_image_preprocess: scale = 2.232143
processed, out dims : (224 x 224)
> magpie : 0.87
> goose : 0.02
> toucan : 0.01
> drake : 0.01
> king penguin, Aptenodytes patagonica : 0.01
main: model load time = 17.92 ms
main: processing time = 146.96 ms
main: total time = 164.88 ms
</pre>
</details>
## Convert PyTorch to GGUF
```bash
# clone the repo recursively
git clone --recurse-submodules https://github.com/staghado/vit.cpp.git
cd vit.cpp

# install torch and timm
pip install torch timm

# list available models if needed; note that not all models are supported
python convert-pth-to-ggml.py --list

# convert the weights to gguf: vit tiny with patch size of 16 and an image
# size of 384, pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k
python convert-pth-to-ggml.py --model_name vit_tiny_patch16_384.augreg_in21k_ft_in1k --ftype 1
```
> **Note:** You can also download the converted weights directly from [Hugging Face](https://huggingface.co/staghado/vit.cpp):
> `wget https://huggingface.co/staghado/vit.cpp/resolve/main/tiny-ggml-model-f16.gguf`
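If you want to sanity-check a converted file before running inference, the sketch below lists its metadata keys and tensor count using the `gguf_*` API bundled with ggml. This is an illustrative example, assuming the vendored ggml declares these functions in `ggml.h` (newer ggml releases may place them in a separate header); it is not part of the `vit.cpp` sources:

```cpp
// gguf_inspect.cpp - minimal sketch: list metadata keys and tensor count of a GGUF file.
// Assumes the gguf_* API shipped with the vendored ggml (declared in ggml.h).
#include "ggml.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    struct gguf_init_params params = { /* no_alloc = */ true, /* ctx = */ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) {
        std::fprintf(stderr, "failed to open %s\n", argv[1]);
        return 1;
    }

    std::printf("metadata keys:\n");
    for (int i = 0; i < gguf_get_n_kv(ctx); ++i) {
        std::printf("  %s\n", gguf_get_key(ctx, i));
    }
    std::printf("tensors: %d\n", (int) gguf_get_n_tensors(ctx));

    gguf_free(ctx);
    return 0;
}
```

Compile it against the ggml sources in the submodule and pass the `.gguf` path as the first argument.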
## Build
### Simple build
```bash
# build ggml and vit
mkdir build && cd build
cmake .. && make -j4

# run inference
./bin/vit -t 4 -m ../ggml-model-f16.gguf -i ../assets/tench.jpg
```
The optimal number of threads depends on many factors, and more is not always better. Using a number of threads equal to the number of available physical cores usually gives the best performance in terms of speed.
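As a rough starting point, you can query the logical core count from C++ and halve it on SMT machines; this heuristic sketch is not part of `vit.cpp`, and the result should still be tuned empirically:

```cpp
// thread_hint.cpp - heuristic starting point for the -t option; tune empirically.
#include <cstdio>
#include <thread>

int main() {
    // hardware_concurrency() reports logical threads (e.g. 8 on a 4-core/8-thread CPU).
    const unsigned logical = std::thread::hardware_concurrency();
    // On SMT/Hyper-Threading machines the physical core count is roughly half of that.
    const unsigned suggested = logical > 1 ? logical / 2 : 1;
    std::printf("logical threads: %u, suggested -t: %u\n", logical, suggested);
    return 0;
}
```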
### Per device optimizations
Generate per-device instructions that work best for the given machine rather than using general CPU instructions.
This can be done by specifying `-march=native` in the compiler flags.
* Multi-threading and vectorization
* Loop transformations (unrolling)
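To verify which SIMD extensions `-march=native` actually enabled on your machine, you can compile a small check against the compiler's predefined macros; this sketch is independent of `vit.cpp`:

```cpp
// simd_check.cpp - report the SIMD feature macros the compiler defined.
// Build with: g++ -march=native -O2 simd_check.cpp -o simd_check
#include <cstdio>

int main() {
#if defined(__AVX512F__)
    std::puts("AVX-512F enabled");
#endif
#if defined(__AVX2__)
    std::puts("AVX2 enabled");
#endif
#if defined(__FMA__)
    std::puts("FMA enabled");
#endif
#if defined(__ARM_NEON)
    std::puts("NEON enabled");
#endif
#if !defined(__AVX512F__) && !defined(__AVX2__) && !defined(__ARM_NEON)
    std::puts("no wide SIMD macros detected - check the compiler flags");
#endif
    return 0;
}
```

If no SIMD macro is reported, double-check that the flag is really reaching the compiler (e.g. via `make VERBOSE=1`).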
#### For AMD host processors
You can use a specialized compiler released by AMD to make full use of your specific processor's architecture.
Read more here: [AMD Optimizing C/C++ and Fortran Compilers (AOCC)](https://www.amd.com/en/developer/aocc.html).
You can follow the instructions there to install the AOCC compiler.
Note: for my AMD Ryzen 7 3700U the improvements were not very significant, but more recent processors may see a larger gain from a specialized compiler.
### Using OpenMP
Additionally, you can compile with OpenMP by passing the `-fopenmp` flag to the compiler in the CMakeLists file, allowing multithreaded runs.
Make sure to also enable multiple threads when running, e.g.:
```bash
OMP_NUM_THREADS=4 ./bin/vit -t 4 -m ../ggml-model-f16.bin -i ../assets/tench.jpg
```
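To confirm that `-fopenmp` actually took effect, a minimal check (independent of `vit.cpp`) is:

```cpp
// omp_check.cpp - confirm OpenMP is active and show the thread count.
// Build with: g++ -fopenmp omp_check.cpp -o omp_check
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

int main() {
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single
        std::printf("OpenMP active with %d threads\n", omp_get_num_threads());
    }
#else
    std::puts("compiled without OpenMP");
#endif
    return 0;
}
```

Running it with `OMP_NUM_THREADS=4 ./omp_check` should report 4 threads.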
## Run
```
usage: ./bin/vit [options]

options:
  -h, --help              show this help message and exit
  -s SEED, --seed SEED    RNG seed (default: -1)
  -t N, --threads N       number of threads to use during computation (default: 4)
  -m FNAME, --model FNAME model path (default: ../ggml-model-f16.bin)
  -i FNAME, --inp FNAME   input file (default: ../assets/tench.jpg)
  -k N, --topk N          top k classes to print (default: 5)
  -e FLOAT, --epsilon     epsilon (default: 0.000001)
```
## Benchmark against PyTorch
First experiments on Apple M1 show inference speedups (up to 6x faster for the base model) compared to native PyTorch inference.
### ViT inference
You can efficiently run ViT inference on the CPU.
Memory requirements and inference speed on an AMD Ryzen 7 3700U (4 cores, 8 threads) for both native PyTorch and `vit.cpp`.
Using 4 threads gives the best results on this machine. The reported inference speeds are averages over 10 runs for both PyTorch and `vit.cpp`.
| Model | Max Mem (PyTorch) | Max Mem (vit.cpp) | Speed (PyTorch) | Speed (vit.cpp) |
| :----: | :---------------: | :---------------: | :-------------: | :-------------: |
| tiny | ~780 MB | **~20 MB** | 431 ms | **120 ms** |
| small | ~965 MB | **~52 MB** | 780 ms | **463 ms** |
| base | ~1.61 GB | **~179 MB** | 2393 ms | **1441 ms** |
| large | ~3.86 GB | **~597 MB** | 8151 ms | **4892 ms** |
> **Note:** The models used are of the form `vit_{size}_patch16_224.augreg_in21k_ft_in1k`.
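For reference, the 10-run averaging used above boils down to a timing loop like the sketch below; `run_inference_once()` is a hypothetical stand-in for one full forward pass, and the actual measurements come from the scripts in `scripts/`:

```cpp
// bench_sketch.cpp - illustrative 10-run average timing loop.
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for one full vit.cpp inference pass (load image, run model, read logits).
static void run_inference_once() { /* ... */ }

int main() {
    const int n_runs = 10;
    double total_ms = 0.0;
    for (int i = 0; i < n_runs; ++i) {
        const auto t0 = std::chrono::high_resolution_clock::now();
        run_inference_once();
        const auto t1 = std::chrono::high_resolution_clock::now();
        total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    std::printf("average over %d runs: %.2f ms\n", n_runs, total_ms / n_runs);
    return 0;
}
```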
### Benchmark on your machine
In order to test the inference speed on your machine, you can run the following scripts:
```bash
chmod +x scripts/benchmark.*

# install memory_profiler & threadpoolctl
pip install memory_profiler threadpoolctl

# run the benchmark of PyTorch
python scripts/benchmark.py

# run the benchmark of vit.cpp for the non-quantized model
./scripts/b
```