# vit.cpp
Inference of Vision Transformer (ViT) models in plain C/C++ using ggml, without any extra dependencies
## Description
This project presents a standalone implementation of the well-known Vision Transformer (ViT) model family, used in a broad spectrum of applications and SOTA models such as Large Multimodal Models (LMMs). The primary goal is to develop a C/C++ inference engine tailored for ViT models, utilizing [ggml](https://github.com/ggerganov/ggml) to enhance performance, particularly on edge devices. Designed to be both lightweight and self-contained, this implementation can run across diverse platforms.
<details>
<summary>Table of Contents</summary>
1. [Description](#description)
2. [Features](#features)
3. [Vision Transformer Architecture](#vision-transformer-architecture)
4. [Quick Example](#quick-example)
5. [Convert PyTorch to GGUF](#convert-pytorch-to-gguf)
6. [Build](#build)
- [Simple Build](#simple-build)
- [Per Device Optimizations](#per-device-optimizations)
- [OpenMP](#using-openmp)
7. [Run](#run)
8. [Benchmark against PyTorch](#benchmark-against-pytorch)
- [ViT Inference](#vit-inference)
- [Benchmark on Your Machine](#benchmark-on-your-machine)
9. [Quantization](#quantization)
10. [To-Do List](#to-do-list)
</details>
## Features
- Dependency-free and lightweight inference thanks to [ggml](https://github.com/ggerganov/ggml).
- 4-bit, 5-bit and 8-bit quantization support.
- Out-of-the-box support for different timm ViT variants.

An important advantage of `vit.cpp` is its short startup time compared to common DL frameworks, which makes it well suited for serverless deployments where cold starts are an issue.
## Vision Transformer architecture
The implemented architecture is based on the original Vision Transformer from:
- [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)
<p align="center">
<img src="assets/image.png" alt="Vision Transformer overview" width="60%" height="auto">
</p>
<p align="center">
ViT architecture. Taken from the <a href="https://arxiv.org/abs/2010.11929">original paper</a>.
</p>
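To make the figure concrete, here is the token arithmetic for the ViT-Tiny configuration reported by `vit_model_load` in the example below (patch size 16, image size 224, hidden size 192, 3 attention heads). This is only an illustrative sketch, not code from `vit.cpp`:

```cpp
// Illustrative shape arithmetic only; not part of the vit.cpp API.
#include <cstdio>

int main() {
    const int img_size    = 224; // vit_model_load: img_size
    const int patch_size  = 16;  // vit_model_load: patch_size
    const int hidden_size = 192; // vit_model_load: hidden_size (ViT-Tiny)
    const int num_heads   = 3;   // vit_model_load: num_attention_heads

    const int patches_per_side = img_size / patch_size;               // 14
    const int num_patches      = patches_per_side * patches_per_side; // 196
    const int num_tokens       = num_patches + 1;                     // +1 class token -> 197
    const int head_dim         = hidden_size / num_heads;             // 64

    std::printf("tokens: %d, embedding dim: %d, head dim: %d\n",
                num_tokens, hidden_size, head_dim);
    return 0;
}
```

The 224x224 input therefore becomes 196 patch tokens plus one class token, i.e. 197 embeddings of dimension 192, which are then processed by the 12 transformer layers.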
## Quick example
<p align="center">
<img src="assets/magpie.jpeg" alt="example input" width="50%" height="auto">
</p>

<details>
<summary>See output</summary>
<pre>
$ ./bin/vit -t 4 -m ../ggml-model-f16.gguf -i ../assets/magpie.jpeg -k 5
main: seed = 1701176263
main: n_threads = 4 / 8
vit_model_load: loading model from '../ggml-model-f16.gguf' - please wait
vit_model_load: hidden_size = 192
vit_model_load: num_hidden_layers = 12
vit_model_load: num_attention_heads = 3
vit_model_load: patch_size = 16
vit_model_load: img_size = 224
vit_model_load: num_classes = 1000
vit_model_load: ftype = 1
vit_model_load: qntvr = 0
operator(): ggml ctx size = 11.13 MB
vit_model_load: ................... done
vit_model_load: model size = 11.04 MB / num tensors = 152
main: loaded image '../assets/magpie.jpeg' (500 x 470)
vit_image_preprocess: scale = 2.232143
processed, out dims : (224 x 224)
> magpie : 0.87
> goose : 0.02
> toucan : 0.01
> drake : 0.01
> king penguin, Aptenodytes patagonica : 0.01
main: model load time = 17.92 ms
main: processing time = 146.96 ms
main: total time = 164.88 ms
</pre>
</details>
## Convert PyTorch to GGUF
```bash
# clone the repo recursively
git clone --recurse-submodules https://github.com/staghado/vit.cpp.git
cd vit.cpp

# install torch and timm
pip install torch timm

# list available models if needed; note that not all models are supported
python convert-pth-to-ggml.py --list

# convert the weights to gguf: vit tiny with patch size of 16 and an image
# size of 384, pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k
python convert-pth-to-ggml.py --model_name vit_tiny_patch16_384.augreg_in21k_ft_in1k --ftype 1
```
> **Note:** You can also download the converted weights directly from [Hugging Face](https://huggingface.co/staghado/vit.cpp):
> `wget https://huggingface.co/staghado/vit.cpp/resolve/main/tiny-ggml-model-f16.gguf`
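If you want to sanity-check a converted file before running inference, the sketch below lists its metadata keys and tensor count using the `gguf_*` API bundled with ggml. This is an illustrative example, assuming the vendored ggml declares these functions in `ggml.h` (newer ggml releases may place them in a separate header); it is not part of the `vit.cpp` sources:

```cpp
// gguf_inspect.cpp - minimal sketch: list metadata keys and tensor count of a GGUF file.
// Assumes the gguf_* API shipped with the vendored ggml (declared in ggml.h).
#include "ggml.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    struct gguf_init_params params = { /* no_alloc = */ true, /* ctx = */ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) {
        std::fprintf(stderr, "failed to open %s\n", argv[1]);
        return 1;
    }

    std::printf("metadata keys:\n");
    for (int i = 0; i < gguf_get_n_kv(ctx); ++i) {
        std::printf("  %s\n", gguf_get_key(ctx, i));
    }
    std::printf("tensors: %d\n", (int) gguf_get_n_tensors(ctx));

    gguf_free(ctx);
    return 0;
}
```

Compile it against the ggml sources in the submodule and pass the `.gguf` path as the first argument.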
## Build
### Simple build
```bash
# build ggml and vit
mkdir build && cd build
cmake .. && make -j4

# run inference
./bin/vit -t 4 -m ../ggml-model-f16.gguf -i ../assets/tench.jpg
```
The optimal number of threads depends on many factors, and more is not always better. Using a number of threads equal to the number of available physical cores usually gives the best performance in terms of speed.
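As a rough starting point, you can query the logical core count from C++ and halve it on SMT machines; this heuristic sketch is not part of `vit.cpp`, and the result should still be tuned empirically:

```cpp
// thread_hint.cpp - heuristic starting point for the -t option; tune empirically.
#include <cstdio>
#include <thread>

int main() {
    // hardware_concurrency() reports logical threads (e.g. 8 on a 4-core/8-thread CPU).
    const unsigned logical = std::thread::hardware_concurrency();
    // On SMT/Hyper-Threading machines the physical core count is roughly half of that.
    const unsigned suggested = logical > 1 ? logical / 2 : 1;
    std::printf("logical threads: %u, suggested -t: %u\n", logical, suggested);
    return 0;
}
```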
### Per device optimizations
Generate per-device instructions that work best for the given machine rather than using general CPU instructions.
This can be done by specifying `-march=native` in the compiler flags.
* Multi-threading and vectorization
* Loop transformations (unrolling)
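To verify which SIMD extensions `-march=native` actually enabled on your machine, you can compile a small check against the compiler's predefined macros; this sketch is independent of `vit.cpp`:

```cpp
// simd_check.cpp - report the SIMD feature macros the compiler defined.
// Build with: g++ -march=native -O2 simd_check.cpp -o simd_check
#include <cstdio>

int main() {
#if defined(__AVX512F__)
    std::puts("AVX-512F enabled");
#endif
#if defined(__AVX2__)
    std::puts("AVX2 enabled");
#endif
#if defined(__FMA__)
    std::puts("FMA enabled");
#endif
#if defined(__ARM_NEON)
    std::puts("NEON enabled");
#endif
#if !defined(__AVX512F__) && !defined(__AVX2__) && !defined(__ARM_NEON)
    std::puts("no wide SIMD macros detected - check the compiler flags");
#endif
    return 0;
}
```

If no SIMD macro is reported, double-check that the flag is really reaching the compiler (e.g. via `make VERBOSE=1`).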
#### For AMD host processors
You can use a specialized compiler released by AMD to make full use of your specific processor's architecture.
Read more here: [AMD Optimizing C/C++ and Fortran Compilers (AOCC)](https://www.amd.com/en/developer/aocc.html).
You can follow the instructions there to install the AOCC compiler.
Note: for my AMD Ryzen 7 3700U the improvements were not very significant, but more recent processors may see a larger gain from a specialized compiler.
### Using OpenMP
Additionally, you can compile with OpenMP by passing the `-fopenmp` flag to the compiler in the CMakeLists file, allowing multithreaded runs.
Make sure to also enable multiple threads when running, e.g.:
```bash
OMP_NUM_THREADS=4 ./bin/vit -t 4 -m ../ggml-model-f16.bin -i ../assets/tench.jpg
```
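To confirm that `-fopenmp` actually took effect, a minimal check (independent of `vit.cpp`) is:

```cpp
// omp_check.cpp - confirm OpenMP is active and show the thread count.
// Build with: g++ -fopenmp omp_check.cpp -o omp_check
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

int main() {
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single
        std::printf("OpenMP active with %d threads\n", omp_get_num_threads());
    }
#else
    std::puts("compiled without OpenMP");
#endif
    return 0;
}
```

Running it with `OMP_NUM_THREADS=4 ./omp_check` should report 4 threads.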
## Run
```
usage: ./bin/vit [options]

options:
  -h, --help              show this help message and exit
  -s SEED, --seed SEED    RNG seed (default: -1)
  -t N, --threads N       number of threads to use during computation (default: 4)
  -m FNAME, --model FNAME model path (default: ../ggml-model-f16.bin)
  -i FNAME, --inp FNAME   input file (default: ../assets/tench.jpg)
  -k N, --topk N          top k classes to print (default: 5)
  -e FLOAT, --epsilon     epsilon (default: 0.000001)
```
## Benchmark against PyTorch
First experiments on Apple M1 show inference speedups (up to 6x faster for the base model) compared to native PyTorch inference.
### ViT inference
You can efficiently run ViT inference on the CPU.
Memory requirements and inference speed on an AMD Ryzen 7 3700U (4 cores, 8 threads) for both native PyTorch and `vit.cpp`.
Using 4 threads gives the best results on this machine. The reported inference speeds are averages over 10 runs for both PyTorch and `vit.cpp`.
| Model | Max Mem (PyTorch) | Max Mem (vit.cpp) | Speed (PyTorch) | Speed (vit.cpp) |
| :----: | :---------------: | :---------------: | :-------------: | :-------------: |
| tiny | ~780 MB | **~20 MB** | 431 ms | **120 ms** |
| small | ~965 MB | **~52 MB** | 780 ms | **463 ms** |
| base | ~1.61 GB | **~179 MB** | 2393 ms | **1441 ms** |
| large | ~3.86 GB | **~597 MB** | 8151 ms | **4892 ms** |
> **Note:** The models used are of the form `vit_{size}_patch16_224.augreg_in21k_ft_in1k`.
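For reference, the 10-run averaging used above boils down to a timing loop like the sketch below; `run_inference_once()` is a hypothetical stand-in for one full forward pass, and the actual measurements come from the scripts in `scripts/`:

```cpp
// bench_sketch.cpp - illustrative 10-run average timing loop.
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for one full vit.cpp inference pass (load image, run model, read logits).
static void run_inference_once() { /* ... */ }

int main() {
    const int n_runs = 10;
    double total_ms = 0.0;
    for (int i = 0; i < n_runs; ++i) {
        const auto t0 = std::chrono::high_resolution_clock::now();
        run_inference_once();
        const auto t1 = std::chrono::high_resolution_clock::now();
        total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    std::printf("average over %d runs: %.2f ms\n", n_runs, total_ms / n_runs);
    return 0;
}
```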
### Benchmark on your machine
In order to test the inference speed on your machine, you can run the following scripts:
```bash
chmod +x scripts/benchmark.*

# install memory_profiler & threadpoolctl
pip install memory_profiler threadpoolctl

# run the benchmark of PyTorch
python scripts/benchmark.py

# run the benchmark of vit.cpp for the non-quantized model
./scripts/b
```