# LLaMa 7b in Rust
This repo contains the popular [LLaMa 7b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
language model, fully implemented in the Rust programming language!
Uses [dfdx](https://github.com/coreylowman/dfdx) tensors and CUDA acceleration.
**This runs LLaMa directly in f16, so there is no hardware acceleration on CPU.** Using CUDA is heavily recommended.
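For a sense of scale, f16 stores each parameter in 2 bytes, so the 7b model's weights occupy roughly 14 GB in memory (versus ~28 GB in f32). A quick back-of-envelope check:

```python
# Back-of-envelope memory footprint for LLaMa 7b weights.
params = 7_000_000_000
f16_gb = params * 2 / 1e9   # half precision: 2 bytes per parameter -> ~14 GB
f32_gb = params * 4 / 1e9   # single precision: 4 bytes per parameter -> ~28 GB
print(f16_gb, f32_gb)
```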
Here is the 7b model running on an A10 GPU:
![](llama-7b-a10.gif)
# How To Run
## (Once) Setting up model weights
### Download model weights
1. Install git lfs. On ubuntu you can run `sudo apt install git-lfs`
2. Activate git lfs with `git lfs install`.
3. Run one of the following commands to download the model weights in PyTorch format:
1. LLaMa 7b (~25 GB): `git clone https://huggingface.co/decapoda-research/llama-7b-hf`
2. LLaMa 13b (~75 GB): `git clone https://huggingface.co/decapoda-research/llama-13b-hf`
3. LLaMa 65b (~244 GB): `git clone https://huggingface.co/decapoda-research/llama-65b-hf`
### Convert the model
1. (Optional) Run `python3.x -m venv <my_env_name>` to create a python virtual environment, where `x` is your preferred python version
2. (Optional, requires 1.) Run `source <my_env_name>/bin/activate` (or `<my_env_name>\Scripts\activate` if on Windows) to activate the environment
3. Run `pip install numpy torch`
4. Run `python convert.py` to convert the model weights into a format the Rust code can read:
    a. LLaMa 7b: `python convert.py`
    b. LLaMa 13b: `python convert.py llama-13b-hf`
    c. LLaMa 65b: `python convert.py llama-65b-hf`
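The core idea of the conversion step is writing each weight tensor out as raw little-endian f16 bytes that the Rust side can load directly. The actual `convert.py` uses numpy and torch and its file layout may differ; this is a minimal stdlib-only sketch of that idea, using `struct`'s half-precision `'e'` format:

```python
import struct

def export_f16(values, path):
    # Pack floats as little-endian IEEE-754 half precision ('e' format),
    # producing a contiguous f16 byte buffer a loader could read back.
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}e", *values))

def load_f16(path):
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 2}e", data))

# A small weight vector round-trips through the f16 file format.
# These values are exactly representable in f16, so they survive unchanged.
export_f16([0.0, 1.5, -2.25], "w.bin")
print(load_f16("w.bin"))  # [0.0, 1.5, -2.25]
```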
## (Once) Compile
You can compile with standard Cargo commands.
With CUDA:
```bash
cargo build --release -F cuda
```
Without CUDA:
```bash
cargo build --release
```
## Run the executable
With default args, using one of the three subcommands:
```bash
./target/release/llama-dfdx --model <model-dir> generate "<prompt>"
./target/release/llama-dfdx --model <model-dir> chat
./target/release/llama-dfdx --model <model-dir> file <path to prompt file>
```
To see all available commands and custom args:
```bash
./target/release/llama-dfdx --help
```