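SunoAI: AI speech synthesis + multilingual support + C/C++ implementation + x86 architecture optimization + mixed-precision quantization resources - CSDN Library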

28 files in total

txt: 6

cpp: 5

h: 4

Artificial intelligence

Speech generation

x86 architecture

Points required: 5; views: 141; uploaded: 2024-04-25 16:21:02; size: 7.56MB (ZIP)

SUNAI1111.zip (28 files)

bark.cpp-main

bark.h 6KB

CMakeLists.txt 861B

.vscode

settings.json 2KB

tasks.json 738B

launch.json 631B

.github

workflows

banner.png 3.65MB

build.yml 2KB

assets

banner.png 3.65MB

LICENSE 1KB

download_weights.py 1KB

examples

CMakeLists.txt 233B

common.cpp 3KB

common.h 2KB

quantize

CMakeLists.txt 156B

main.cpp 2KB

main

CMakeLists.txt 254B

main.cpp 2KB

server

CMakeLists.txt 255B

httplib.h 295KB

json.hpp 887KB

server.cpp 5KB

dr_wav.h 236KB

convert.py 12KB

encodec.cpp

.gitmodules 94B

requirements.txt 36B

.gitignore 160B

README.md 5KB

bark.cpp 75KB

# bark.cpp

![bark.cpp](./assets/banner.png)

[![Actions Status](https://github.com/PABannier/bark.cpp/actions/workflows/build.yml/badge.svg)](https://github.com/PABannier/bark.cpp/actions)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)

[Roadmap](https://github.com/users/PABannier/projects/1) / [encodec.cpp](https://github.com/PABannier/encodec.cpp) / [ggml](https://github.com/ggerganov/ggml)

Inference of [SunoAI's bark model](https://github.com/suno-ai/bark) in pure C/C++.

## Description

With `bark.cpp`, our goal is to bring **real-time realistic multilingual** text-to-speech generation to the community.

- [x] Plain C/C++ implementation without dependencies
- [x] AVX, AVX2 and AVX512 for x86 architectures
- [x] CPU and GPU compatible backends
- [x] Mixed F16 / F32 precision
- [x] 4-bit, 5-bit and 8-bit integer quantization
- [x] Metal and CUDA backends

**Models supported**

- [x] [Bark Small](https://huggingface.co/suno/bark-small)
- [x] [Bark Large](https://huggingface.co/suno/bark)

**Models we want to implement! Please open a PR :)**

- [ ] [AudioCraft](https://audiocraft.metademolab.com/) ([#62](https://github.com/PABannier/bark.cpp/issues/62))
- [ ] [AudioLDM2](https://audioldm.github.io/audioldm2/) ([#82](https://github.com/PABannier/bark.cpp/issues/82))
- [ ] [Piper](https://github.com/rhasspy/piper) ([#135](https://github.com/PABannier/bark.cpp/issues/135))

Demo on [Google Colab](https://colab.research.google.com/drive/1JVtJ6CDwxtKfFmEd8J4FGY2lzdL0d0jT?usp=sharing) ([#95](https://github.com/PABannier/bark.cpp/issues/95))

---

Here is a typical run using `bark.cpp`:

```text
make -j && ./main -p "This is an audio generated by bark.cpp"

    __               __
   / /_  ____ ______/ /__        _________  ____
  / __ \/ __ `/ ___/ //_/       / ___/ __ \/ __ \
 / /_/ / /_/ / /  / ,<    _    / /__/ /_/ / /_/ /
/_.___/\__,_/_/  /_/|_|  (_)   \___/ .___/ .___/
                                  /_/   /_/

bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222

Generating semantic tokens: [========> ] (17%)

bark_print_statistics: sample time = 10.98 ms / 138 tokens
bark_print_statistics: predict time = 614.96 ms / 4.46 ms per token
bark_print_statistics: total time = 633.54 ms

Generating coarse tokens: [==================================================>] (100%)

bark_print_statistics: sample time = 3.75 ms / 410 tokens
bark_print_statistics: predict time = 3263.17 ms / 7.96 ms per token
bark_print_statistics: total time = 3274.00 ms

Generating fine tokens: [==================================================>] (100%)

bark_print_statistics: sample time = 38.82 ms / 6144 tokens
bark_print_statistics: predict time = 4729.86 ms / 0.77 ms per token
bark_print_statistics: total time = 4772.92 ms

write_wav_on_disk: Number of frames written = 65600.

main: load time = 324.14 ms
main: eval time = 8806.57 ms
main: total time = 9131.68 ms
```

Here are typical audio pieces generated by `bark.cpp`:

https://github.com/PABannier/bark.cpp/assets/12958149/f9f240fd-975f-4d69-9bb3-b295a61daaff

https://github.com/PABannier/bark.cpp/assets/12958149/c0caadfd-bed9-4a48-8c17-3215963facc1

## Usage

Here are the steps to use `bark.cpp`.

### Get the code

```bash
git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive
```

### Build

To build `bark.cpp`, you must use `CMake`:

```bash
mkdir build
cd build
cmake ..
cmake --build . --config Release
```

### Prepare data & Run

```bash
# Install Python dependencies
python3 -m pip install -r requirements.txt

# Download the Bark checkpoints and vocabulary
python3 download_weights.py --out-dir ./models --models bark-small bark

# Convert the model to ggml format
python3 convert.py --dir-model ./models/bark-small --use-f16

# Run the inference
./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4
```

### (Optional) Quantize weights

Weights can be quantized using one of the following strategies: `q4_0`, `q4_1`, `q5_0`, `q5_1`, `q8_0`.

Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.

```bash
./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
```

### Seminal papers

- Bark
    - [Text Prompted Generative Audio](https://github.com/suno-ai/bark)
- Encodec
    - [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438)
- GPT-3
    - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

### Contributing

`bark.cpp` is a continuous endeavour that relies on community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be:

- bug report: you may encounter a bug while using `bark.cpp`. Don't hesitate to report it in the issue section.
- feature request: you want to add a new model or support a new platform. You can use the issue section to make suggestions.
- pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation. You can submit a pull request and a reviewer will reach out to you.

### Coding guidelines

- Avoid adding third-party dependencies, extra files, extra headers, etc.
- Always consider cross-compatibility with other operating systems and architectures.
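
To make the optional weight quantization step more concrete, here is a minimal sketch of block-wise 4-bit quantization in C++. It is an illustration only, not the code used by `bark.cpp` or ggml: the names `BlockQ4`, `encode_q4`, `quantize_q4` and `dequantize_q4` are hypothetical, the block size of 32 is an assumption, and the per-block scale is kept as a plain `float` for simplicity. The actual `q4_0` format may differ in its scale computation and storage layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is not the ggml or bark.cpp API.
constexpr std::size_t kBlockSize = 32;  // assumed number of weights per block

struct BlockQ4 {
    float scale;                           // one scale per block (float for simplicity)
    std::uint8_t nibbles[kBlockSize / 2];  // two 4-bit codes packed per byte
};

// Map one weight to an unsigned 4-bit code in [1, 15] (signed code in [-7, 7] plus 8).
static std::uint8_t encode_q4(float v, float inv_scale) {
    int q = static_cast<int>(std::lround(v * inv_scale));
    q = std::max(-7, std::min(7, q));
    return static_cast<std::uint8_t>(q + 8);
}

// Quantize n floats; n must be a multiple of kBlockSize.
std::vector<BlockQ4> quantize_q4(const float* x, std::size_t n) {
    assert(n % kBlockSize == 0);
    std::vector<BlockQ4> out(n / kBlockSize);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* src = x + b * kBlockSize;
        // The largest magnitude in the block determines the scale.
        float amax = 0.0f;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            amax = std::max(amax, std::fabs(src[i]));
        }
        const float scale = amax / 7.0f;
        const float inv_scale = scale > 0.0f ? 1.0f / scale : 0.0f;
        out[b].scale = scale;
        for (std::size_t i = 0; i < kBlockSize; i += 2) {
            const std::uint8_t lo = encode_q4(src[i], inv_scale);
            const std::uint8_t hi = encode_q4(src[i + 1], inv_scale);
            out[b].nibbles[i / 2] = static_cast<std::uint8_t>(lo | (hi << 4));
        }
    }
    return out;
}

// Reconstruct approximate floats from the packed blocks.
std::vector<float> dequantize_q4(const std::vector<BlockQ4>& blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * kBlockSize);
    for (const BlockQ4& blk : blocks) {
        for (std::size_t i = 0; i < kBlockSize / 2; ++i) {
            const int lo = (blk.nibbles[i] & 0x0F) - 8;
            const int hi = (blk.nibbles[i] >> 4) - 8;
            out.push_back(static_cast<float>(lo) * blk.scale);
            out.push_back(static_cast<float>(hi) * blk.scale);
        }
    }
    return out;
}
```

In this sketch each block of 32 weights occupies 4 bytes of scale plus 16 bytes of packed codes, compared with 128 bytes in F32, and every reconstructed weight is within half a quantization step (`0.5 * scale`) of the original, up to floating-point rounding. That trade of precision for size is why the README keeps the codec model unquantized and applies quantization only to the GPT models.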
