# barkify
Barkify: an unofficial repo for training Bark, a text-prompted generative audio model by suno-ai.
Bark has two GPT-style models, which makes it compatible with prompting and other tricks from NLP. Bark achieves great real-world TTS results, but the original repo doesn't provide a training recipe. We wanted to run experiments on this model and train it ourselves, so here we release our basic training code, which may serve as a training guide for the open-source community.
## Process dataset
We run our experiments on LJSpeech. Follow the instructions in `process.ipynb`. <br>
For Chinese, we tested on speech from a well-known streamer named `峰哥亡命天涯`. The result is acceptable but worse than our other TTS repo.
For English, we tested on the LibriTTS dataset. It works fine, and the basic items in our roadmap have been verified.
## Training
Stage 1 is text-to-semantic and stage 2 is semantic-to-acoustic. <br>
Configure the parameters in `configs/barkify.yaml`. We use one A100 to train our models (both S1 and S2).
```
# training stage 1 or 2
python trainer.py start_path=/path/to/your/work_env stage=1 name=<dataset>
python trainer.py start_path=/path/to/your/work_env stage=2 name=<dataset>
```
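If you prefer to launch both stages from a script, the commands above can be wrapped in a small helper. This is only a minimal sketch: `build_command` and `train_all` are hypothetical names that are not part of this repo, the dataset name `ljspeech` is a placeholder, and the only arguments used are the three shown above (`start_path`, `stage`, `name`). Running the stages one after the other is a convenience here, not a requirement stated by this README.

```python
import shlex
import subprocess
import sys
from typing import List


def build_command(work_env: str, stage: int, dataset: str) -> List[str]:
    """Build the trainer.py command line for one stage (hypothetical helper)."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 (text-to-semantic) or 2 (semantic-to-acoustic)")
    return [
        sys.executable, "trainer.py",
        f"start_path={work_env}",
        f"stage={stage}",
        f"name={dataset}",
    ]


def train_all(work_env: str, dataset: str, dry_run: bool = True) -> List[List[str]]:
    """Print the commands for stage 1 and stage 2, and run them unless dry_run is set."""
    commands = [build_command(work_env, stage, dataset) for stage in (1, 2)]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first stage that exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: only prints the two commands.
    train_all("/path/to/your/work_env", "ljspeech", dry_run=True)
```

Call `train_all(..., dry_run=False)` from the repo root to actually start training.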
## Inference
Open `infer.ipynb` and follow the instructions to run inference with your model.
## Roadmap
We have already completed the following items and will release our code soon.
- [x] Build basic training code for a Bark-like generative model
- [x] Test the single-speaker scenario
- [x] Test the multi-speaker scenario
- [x] Test speaker semantic prompting
- [x] Test speech/audio acoustic prompting
- [x] Test variable-length data (as we use a fixed length now)
The following items are pretty data-hungry or rely on massive GPU resources. <br>
So we are open to any sponsors or collaborators to help finish them. <br>
You can contact us by QQ: 3284494602 or email us at 3284494602@qq.com
- [ ] Long-form generation (which may be longer than 1 min)
- [ ] Support for more languages (especially ZH)
- [ ] Paralanguage modeling in the text input
- [ ] Speaker generation by text prompts
- [ ] Emotion/timbre/rhythm control by text/acoustic prompts
- [ ] Adding/removing background noise (which might be important for downstream tasks)
## Appreciation
- [bark](https://github.com/suno-ai/bark/) is a transformer-based text-to-audio model.
- [Vall-E](https://github.com/lifeiteng/vall-e) is an unofficial PyTorch implementation of VALL-E.