# bark.cpp
![bark.cpp](./assets/banner.png)
[![Actions Status](https://github.com/PABannier/bark.cpp/actions/workflows/build.yml/badge.svg)](https://github.com/PABannier/bark.cpp/actions)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[Roadmap](https://github.com/users/PABannier/projects/1) / [encodec.cpp](https://github.com/PABannier/encodec.cpp) / [ggml](https://github.com/ggerganov/ggml)
Inference of [SunoAI's bark model](https://github.com/suno-ai/bark) in pure C/C++.
## Description
With `bark.cpp`, our goal is to bring **real-time realistic multilingual** text-to-speech generation to the community.
- [x] Plain C/C++ implementation without dependencies
- [x] AVX, AVX2 and AVX512 for x86 architectures
- [x] CPU and GPU compatible backends
- [x] Mixed F16 / F32 precision
- [x] 4-bit, 5-bit and 8-bit integer quantization
- [x] Metal and CUDA backends
**Models supported**
- [x] [Bark Small](https://huggingface.co/suno/bark-small)
- [x] [Bark Large](https://huggingface.co/suno/bark)
**Models we want to implement! Please open a PR :)**
- [ ] [AudioCraft](https://audiocraft.metademolab.com/) ([#62](https://github.com/PABannier/bark.cpp/issues/62))
- [ ] [AudioLDM2](https://audioldm.github.io/audioldm2/) ([#82](https://github.com/PABannier/bark.cpp/issues/82))
- [ ] [Piper](https://github.com/rhasspy/piper) ([#135](https://github.com/PABannier/bark.cpp/issues/135))
Demo on [Google Colab](https://colab.research.google.com/drive/1JVtJ6CDwxtKfFmEd8J4FGY2lzdL0d0jT?usp=sharing) ([#95](https://github.com/PABannier/bark.cpp/issues/95))
---
Here is a typical run using `bark.cpp`:
```text
make -j && ./main -p "This is an audio generated by bark.cpp"
    __               __
   / /_  ____ ______/ /__        _________  ____
  / __ \/ __ `/ ___/ //_/       / ___/ __ \/ __ \
 / /_/ / /_/ / /  / ,<    _    / /__/ /_/ / /_/ /
/_.___/\__,_/_/  /_/|_|  (_)   \___/ .___/ .___/
                                  /_/   /_/
bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222
Generating semantic tokens: [========> ] (17%)
bark_print_statistics: sample time = 10.98 ms / 138 tokens
bark_print_statistics: predict time = 614.96 ms / 4.46 ms per token
bark_print_statistics: total time = 633.54 ms
Generating coarse tokens: [==================================================>] (100%)
bark_print_statistics: sample time = 3.75 ms / 410 tokens
bark_print_statistics: predict time = 3263.17 ms / 7.96 ms per token
bark_print_statistics: total time = 3274.00 ms
Generating fine tokens: [==================================================>] (100%)
bark_print_statistics: sample time = 38.82 ms / 6144 tokens
bark_print_statistics: predict time = 4729.86 ms / 0.77 ms per token
bark_print_statistics: total time = 4772.92 ms
write_wav_on_disk: Number of frames written = 65600.
main: load time = 324.14 ms
main: eval time = 8806.57 ms
main: total time = 9131.68 ms
```
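The statistics in the log are plain ratios, and they can be combined into a rough real-time factor for the run. The sketch below shows that arithmetic. The function names are hypothetical (they are not part of `bark.cpp`), and the sample rate is an assumption: the log does not print it, so the caller has to supply it (Bark's published output rate is 24 kHz, and the audio is taken to be mono so that one frame equals one sample).

```cpp
#include <cassert>

// Average latency per token: predict time divided by tokens generated.
double ms_per_token(double predict_ms, int n_tokens) {
    assert(n_tokens > 0);
    return predict_ms / n_tokens;
}

// Duration of the written audio in seconds.
// sample_rate_hz is an assumption: the log does not print it.
double audio_seconds(long n_frames, int sample_rate_hz) {
    assert(sample_rate_hz > 0);
    return static_cast<double>(n_frames) / sample_rate_hz;
}

// Real-time factor: wall-clock seconds spent per second of audio.
// Values below 1.0 mean generation is faster than playback.
double real_time_factor(double total_ms, double audio_s) {
    assert(audio_s > 0.0);
    return (total_ms / 1000.0) / audio_s;
}
```

With the numbers from the log, `ms_per_token(614.96, 138)` is about 4.46, matching the semantic stage. Under the 24 kHz assumption, `audio_seconds(65600, 24000)` is about 2.73 s, and `real_time_factor(9131.68, ...)` comes out at roughly 3.3 for this particular run.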
Here are typical audio pieces generated by `bark.cpp`:
https://github.com/PABannier/bark.cpp/assets/12958149/f9f240fd-975f-4d69-9bb3-b295a61daaff
https://github.com/PABannier/bark.cpp/assets/12958149/c0caadfd-bed9-4a48-8c17-3215963facc1
## Usage
Here are the steps to use `bark.cpp`:
### Get the code
```bash
git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive
```
### Build
In order to build bark.cpp you must use `CMake`:
```bash
mkdir build
cd build
cmake ..
cmake --build . --config Release
```
### Prepare data & Run
```bash
# Install Python dependencies
python3 -m pip install -r requirements.txt
# Download the Bark checkpoints and vocabulary
python3 download_weights.py --out-dir ./models --models bark-small bark
# Convert the model to ggml format
python3 convert.py --dir-model ./models/bark-small --use-f16
# Run the inference
./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4
```
### (Optional) Quantize weights
Weights can be quantized using one of the following strategies: `q4_0`, `q4_1`, `q5_0`, `q5_1`, `q8_0`.
Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.
```bash
./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
```
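To give an idea of what a strategy such as `q4_0` does to the weights, here is a simplified sketch of block-wise 4-bit quantization modelled on ggml's `q4_0` scheme: weights are split into blocks of 32, each block stores one scale plus 32 four-bit codes packed two per byte. This is an illustration only, not the code `bark.cpp` runs. The names `BlockQ4`, `quantize_q4` and `dequantize_q4` are hypothetical, and the scale is kept as F32 here, whereas ggml stores it as F16.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBlockSize = 32;  // number of weights sharing one scale

// Hypothetical block layout for illustration (ggml keeps the scale in F16).
struct BlockQ4 {
    float scale;                                  // per-block scale d
    std::array<uint8_t, kBlockSize / 2> quants;   // two 4-bit codes per byte
};

std::vector<BlockQ4> quantize_q4(const std::vector<float>& x) {
    assert(x.size() % kBlockSize == 0);
    std::vector<BlockQ4> out(x.size() / kBlockSize);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* src = x.data() + b * kBlockSize;
        // Find the value with the largest magnitude, keeping its sign.
        float vmax = 0.0f;
        for (int i = 0; i < kBlockSize; ++i)
            if (std::fabs(src[i]) > std::fabs(vmax)) vmax = src[i];
        // Choose d so that vmax maps to the code -8 (stored as 0).
        const float d = vmax / -8.0f;
        const float inv = (d != 0.0f) ? 1.0f / d : 0.0f;
        out[b].scale = d;
        for (int i = 0; i < kBlockSize / 2; ++i) {
            // Shift by 8 and round so codes land in [0, 15]; clamp the top end.
            const int lo = std::min(15, static_cast<int>(src[i] * inv + 8.5f));
            const int hi = std::min(15, static_cast<int>(src[i + kBlockSize / 2] * inv + 8.5f));
            out[b].quants[i] = static_cast<uint8_t>(lo | (hi << 4));
        }
    }
    return out;
}

std::vector<float> dequantize_q4(const std::vector<BlockQ4>& blocks) {
    std::vector<float> y(blocks.size() * kBlockSize);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        float* dst = y.data() + b * kBlockSize;
        for (int i = 0; i < kBlockSize / 2; ++i) {
            const int lo = blocks[b].quants[i] & 0x0F;  // low nibble: first half
            const int hi = blocks[b].quants[i] >> 4;    // high nibble: second half
            dst[i] = (lo - 8) * blocks[b].scale;
            dst[i + kBlockSize / 2] = (hi - 8) * blocks[b].scale;
        }
    }
    return y;
}
```

In this sketch each block of 32 weights takes 20 bytes instead of the 128 bytes needed in F32. The price is a rounding error of at most one quantization step (the block's largest magnitude divided by 8) per weight.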
### Seminal papers
- Bark
    - [Text Prompted Generative Audio](https://github.com/suno-ai/bark)
- Encodec
    - [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438)
- GPT-3
    - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
### Contributing
`bark.cpp` is a continuous endeavour that relies on community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be a:
- bug report: you may encounter a bug while using `bark.cpp`. Don't hesitate to report it in the issue section.
- feature request: you want to add a new model or support a new platform. You can use the issue section to make suggestions.
- pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation. You can submit a pull request and a reviewer will reach out to you.
### Coding guidelines
- Avoid adding third-party dependencies, extra files, extra headers, etc.
- Always consider cross-compatibility with other operating systems and architectures