# gemma.cpp
gemma.cpp is a lightweight, standalone C++ inference engine for the Gemma
foundation models from Google.
For additional information about Gemma, see
[ai.google.dev/gemma](https://ai.google.dev/gemma). Model weights, including gemma.cpp
specific artifacts, are [available on
kaggle](https://www.kaggle.com/models/google/gemma).
NOTE: 2024-04-04: if using 2B models, please re-download weights from Kaggle and
ensure you have the latest version (-mqa or version 3). We are changing the code
to match the new weights. If you wish to use old weights, change `ConfigGemma2B`
in `configs.h` back to `kVocabSize = 256128` and `kKVHeads = 8`.
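For reference, the relevant edit looks roughly like the sketch below. Only `kVocabSize` and `kKVHeads` are named above; the new values and the surrounding structure are illustrative assumptions, so treat `configs.h` in your checkout as authoritative.

```cpp
// Hypothetical excerpt of ConfigGemma2B in configs.h.
struct ConfigGemma2B {
  static constexpr int kVocabSize = 256000;  // assumed new value; old weights: 256128
  static constexpr int kKVHeads = 1;         // assumed new (-mqa) value; old weights: 8
  // ... remaining model dimensions unchanged ...
};
```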
## Who is this project for?
Modern LLM inference engines are sophisticated systems, often with bespoke
capabilities extending beyond traditional neural network runtimes. With this
come opportunities for research and innovation through co-design of high-level
algorithms and low-level computation. However, there is a gap between
deployment-oriented C++ inference runtimes, which are not designed for
experimentation, and Python-centric ML research frameworks, which abstract away
low-level computation through compilation.
gemma.cpp provides a minimalist implementation of Gemma 2B and 7B models,
focusing on simplicity and directness rather than full generality. This is
inspired by vertically-integrated model implementations such as
[ggml](https://github.com/ggerganov/ggml),
[llama.c](https://github.com/karpathy/llama2.c), and
[llama.rs](https://github.com/srush/llama2.rs).
gemma.cpp targets experimentation and research use cases. It is intended to be
straightforward to embed in other projects with minimal dependencies and also
easily modifiable with a small ~2K LoC core implementation (along with ~4K LoC
of supporting utilities). We use the [Google
Highway](https://github.com/google/highway) Library to take advantage of
portable SIMD for CPU inference.
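To give a flavor of what portable SIMD with Highway looks like, here is a minimal sketch modeled on Highway's quick-start examples; it is not gemma.cpp's actual kernel code (see `ops.h` for that), and the function name is ours.

```cpp
#include <cstddef>

#include "hwy/highway.h"

namespace hn = hwy::HWY_NAMESPACE;

// Dot product vectorized with Highway: the same source compiles to SSE4,
// AVX2, AVX-512, NEON, etc.; the lane count is chosen per target.
float DotProduct(const float* a, const float* b, size_t n) {
  const hn::ScalableTag<float> d;  // widest float vector for this target
  const size_t N = hn::Lanes(d);
  auto acc = hn::Zero(d);
  size_t i = 0;
  for (; i + N <= n; i += N) {
    // acc += a[i..i+N) * b[i..i+N), fused multiply-add per lane
    acc = hn::MulAdd(hn::LoadU(d, a + i), hn::LoadU(d, b + i), acc);
  }
  float sum = hn::ReduceSum(d, acc);      // horizontal sum across lanes
  for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail
  return sum;
}
```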
For production-oriented edge deployments we recommend standard deployment
pathways using Python frameworks like JAX, Keras, PyTorch, and Transformers
([all model variations here](https://www.kaggle.com/models/google/gemma)).
## Contributing
Community contributions large and small are welcome. See
[DEVELOPERS.md](https://github.com/google/gemma.cpp/blob/main/DEVELOPERS.md)
for additional notes for contributing developers, and [join the Discord by following
this invite link](https://discord.gg/H5jCBAWxAe). This project follows
[Google's Open Source Community
Guidelines](https://opensource.google.com/conduct/).
*Active development currently happens on the `dev` branch. Please open pull
requests targeting the `dev` branch instead of `main`, which is intended to be more
stable.*
## Quick Start
### System requirements
Before starting, you should have installed:
- [CMake](https://cmake.org/)
- [Clang C++ compiler](https://clang.llvm.org/get_started.html), supporting at
least C++17.
- `tar` for extracting archives from Kaggle.
Building natively on Windows requires the Visual Studio 2022 Build Tools with the
optional Clang/LLVM C++ frontend (`clang-cl`). This can be installed from the
command line with
[`winget`](https://learn.microsoft.com/en-us/windows/package-manager/winget/):
```sh
winget install --id Kitware.CMake
winget install --id Microsoft.VisualStudio.2022.BuildTools --force --override "--passive --wait --add Microsoft.VisualStudio.Workload.VCTools;installRecommended --add Microsoft.VisualStudio.Component.VC.Llvm.Clang --add Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset"
```
### Step 1: Obtain model weights and tokenizer from Kaggle or Hugging Face Hub
Visit [the Gemma model page on
Kaggle](https://www.kaggle.com/models/google/gemma/frameworks/gemmaCpp) and select `Model Variations
|> Gemma C++`. On this tab, the `Variation` dropdown includes the options below.
Note that bfloat16 weights are higher fidelity, while 8-bit switched floating point
weights enable faster inference. In general, we recommend starting with the
`-sfp` checkpoints.
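For a rough sense of the difference: 8-bit weights occupy about one byte per parameter versus two for bfloat16, so the checkpoint size (and the memory bandwidth needed per generated token) roughly halves. A back-of-envelope sketch using nominal parameter counts; real checkpoint sizes differ due to padding and per-tensor metadata:

```cpp
#include <cstdio>

int main() {
  const double params[] = {2e9, 7e9};  // nominal 2B / 7B parameter counts
  const char* names[] = {"2B", "7B"};
  for (int i = 0; i < 2; ++i) {
    // bfloat16: 2 bytes/param; 8-bit switched floating point: ~1 byte/param
    std::printf("%s: bf16 ~%.0f GB, sfp ~%.0f GB\n", names[i],
                params[i] * 2 / 1e9, params[i] * 1 / 1e9);
  }
  return 0;  // prints: 2B: bf16 ~4 GB, sfp ~2 GB / 7B: bf16 ~14 GB, sfp ~7 GB
}
```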
Alternatively, visit the [gemma.cpp](https://huggingface.co/models?other=gemma.cpp)
models on the Hugging Face Hub. First, go to the repository of the model of interest
(see recommendations below). Then, click the `Files and versions` tab and download the
model and tokenizer files. For programmatic downloading, if you have `huggingface_hub`
installed, you can also download by running:
```sh
huggingface-cli login # Just the first time
huggingface-cli download google/gemma-2b-sfp-cpp --local-dir build/
```
2B instruction-tuned (`it`) and pre-trained (`pt`) models:
| Model name | Description |
| ----------- | ----------- |
| `2b-it` | 2 billion parameter instruction-tuned model, bfloat16 |
| `2b-it-sfp` | 2 billion parameter instruction-tuned model, 8-bit switched floating point |
| `2b-pt` | 2 billion parameter pre-trained model, bfloat16 |
| `2b-pt-sfp` | 2 billion parameter pre-trained model, 8-bit switched floating point |
7B instruction-tuned (`it`) and pre-trained (`pt`) models:
| Model name | Description |
| ----------- | ----------- |
| `7b-it` | 7 billion parameter instruction-tuned model, bfloat16 |
| `7b-it-sfp` | 7 billion parameter instruction-tuned model, 8-bit switched floating point |
| `7b-pt` | 7 billion parameter pre-trained model, bfloat16 |
| `7b-pt-sfp` | 7 billion parameter pre-trained model, 8-bit switched floating point |
> [!NOTE]
> **Important**: We strongly recommend starting off with the `2b-it-sfp` model to
> get up and running.
### Step 2: Extract Files
If you downloaded the models from Hugging Face, skip to step 3.
After you fill out the consent form on Kaggle, the download retrieves a tar
archive file, `archive.tar.gz`. Extract the files from `archive.tar.gz` (this
can take a few minutes):
```sh
tar -xf archive.tar.gz
```
This should produce a weights file such as `2b-it-sfp.sbs` and a tokenizer file
(`tokenizer.spm`). You may want to move these files to a convenient directory
(e.g. the `build/` directory in this repo).
### Step 3: Build
The build system uses [CMake](https://cmake.org/). To build the gemma inference
runtime, create a build directory and generate the build files using `cmake`
from the top-level project directory. Note that if you previously ran `cmake` and
are re-running with a different setting, be sure to clean out the `build/`
directory with `rm -rf build/*` (warning: this will also delete any other files
in the `build/` directory, such as downloaded weights).
For the 8-bit switched floating point weights (sfp), run `cmake` with no options:
#### Unix-like Platforms
```sh
cmake -B build
```
**or** if you downloaded bfloat16 weights (any model *without* `-sfp` in the name),
instead of running `cmake` with no options as above, run `cmake` with `WEIGHT_TYPE`
set to [Highway's](https://github.com/google/highway) `hwy::bfloat16_t` type
(this will be simplified in the future; we recommend `-sfp` weights over
bfloat16 for faster inference):
```sh
cmake -B build -DWEIGHT_TYPE=hwy::bfloat16_t
```
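For intuition, `WEIGHT_TYPE` selects the storage type for weights: with `hwy::bfloat16_t`, each weight occupies 16 bits and is widened to `float` for arithmetic. Below is a scalar sketch of that pattern, assuming Highway's `hwy::F32FromBF16` conversion helper; gemma.cpp's real kernels are vectorized and organized differently.

```cpp
#include <cstddef>

#include "hwy/base.h"  // hwy::bfloat16_t, hwy::F32FromBF16

// Scalar illustration only: bf16-stored weights widened to float per element.
float DotBF16(const hwy::bfloat16_t* w, const float* x, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    sum += hwy::F32FromBF16(w[i]) * x[i];  // widen weight, then multiply
  }
  return sum;
}
```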
After running whichever of the above `cmake` invocations is appropriate for
your weights, you can enter the `build/` directory and run `make` to build the
`./gemma` executable:
```sh
# Enter the generated build directory
cd build
# Build the gemma executable
make -j [number of parallel threads to use] gemma
```
Replace `[number of parallel threads to use]` with a number - the number of
cores available on your system is a reasonable heuristic. For example,
`make -j4 gemma` will build using 4 threads. If the `nproc` command is
available, you can use `make -j$(nproc) gemma` as a reasonable default
for the number of threads.
If you aren't sure of the right value for the `-j` flag, you can simply run
`make gemma` instead and it should still build the `./gemma` executable.
> [!NOTE]
> On Windows Subsystem for Linux (WSL), users should set the number of
> parallel threads to 1. Using a larger number may result in errors.
If the build is successful, you should now have a `gemma` executable in the
`build/` directory.