# Meta-TTS: Meta-Learning for Few-shot Speaker-Adaptive Text-to-Speech
This repository is the official implementation of ["Meta-TTS: Meta-Learning for Few-shot Speaker-Adaptive Text-to-Speech"](https://arxiv.org/abs/2111.04040v1).
<!-- Optional: include a graphic explaining your approach/main result, bibtex entry, link to demos, blog posts and tutorials -->
| multi-task learning | meta learning |
| --- | --- |
| ![](evaluation/images/meta-TTS-multi-task.png) | ![](evaluation/images/meta-TTS-meta-task.png) |
### Meta-TTS
![image](evaluation/images/meta-FastSpeech2.png)
## Requirements
This is how I built my environment; yours does not need to match exactly:
- Sign up for [Comet.ml](https://www.comet.ml/), find your workspace and API key via [www.comet.ml/api/my/settings](https://www.comet.ml/api/my/settings), and fill them in `config/comet.py`. The Comet logger is used throughout the train/val/test stages.
- Check my training logs [here](https://www.comet.ml/b02901071/meta-tts/view/Zvh3Lz3Wvy2AiWcinD06TaS0G).
- [Optional] Install [pyenv](https://github.com/pyenv/pyenv.git) to control the Python version, and switch to Python 3.8.6.
```bash
# After downloading and installing pyenv:
pyenv install 3.8.6
pyenv local 3.8.6
```
- [Optional] Install [pyenv-virtualenv](https://github.com/pyenv/pyenv-virtualenv.git) as a pyenv plugin for a clean virtual environment.
```bash
# After installing pyenv-virtualenv:
pyenv virtualenv meta-tts
pyenv activate meta-tts
```
- Install requirements:
```bash
pip install -r requirements.txt
```
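The Comet settings mentioned above go in `config/comet.py`. A minimal sketch of what such a file might contain is below; the exact variable names in the repo may differ, so treat these field names as assumptions:

```python
# config/comet.py -- hypothetical sketch; check the repo's actual file for the real field names.
COMET_CONFIG = {
    "api_key": "YOUR_API_KEY",      # from https://www.comet.ml/api/my/settings
    "workspace": "your-workspace",  # your Comet workspace name
    "project_name": "meta-tts",     # also appears as <project_name> in the checkpoint path
}
```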
## Preprocessing
First, download [LibriTTS](https://www.openslr.org/60/) and [VCTK](https://datashare.ed.ac.uk/handle/10283/3443), update the paths in `config/LibriTTS/preprocess.yaml` and `config/VCTK/preprocess.yaml`, then run
```bash
python3 prepare_align.py config/LibriTTS/preprocess.yaml
python3 prepare_align.py config/VCTK/preprocess.yaml
```
to prepare the data for alignment.
The alignments for LibriTTS are provided [here](https://github.com/kan-bayashi/LibriTTSLabel.git), and
the alignments for VCTK [here](https://drive.google.com/file/d/1ScLIiyIgLRIZ03DqCmrZ8F75miC77o8g/view?usp=sharing).
You have to unzip the files into `preprocessed_data/LibriTTS/TextGrid/` and
`preprocessed_data/VCTK/TextGrid/`.
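If you prefer to script the extraction step above, a small helper like the following (a hypothetical convenience, not part of the repo) unzips a downloaded alignment archive into the expected `TextGrid` folder:

```python
import zipfile
from pathlib import Path

def extract_alignments(zip_path: str, out_dir: str) -> None:
    """Unzip a downloaded alignment archive into the given TextGrid directory."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)

# e.g. extract_alignments("LibriTTSLabel.zip", "preprocessed_data/LibriTTS/TextGrid/")
```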
Then run the preprocessing script:
```bash
python3 preprocess.py config/LibriTTS/preprocess.yaml
# Copy the stats from LibriTTS to VCTK so pitch/energy normalization uses the same shift and bias.
cp preprocessed_data/LibriTTS/stats.json preprocessed_data/VCTK/
python3 preprocess.py config/VCTK/preprocess.yaml
```
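After the `cp` above, you can sanity-check that both corpora now share the same normalization statistics. A small helper (hypothetical, not part of the repo) could be:

```python
import json

def stats_match(src_path: str, dst_path: str) -> bool:
    """Return True if two stats.json files contain identical normalization stats."""
    with open(src_path) as f_src, open(dst_path) as f_dst:
        return json.load(f_src) == json.load(f_dst)

# e.g. stats_match("preprocessed_data/LibriTTS/stats.json",
#                  "preprocessed_data/VCTK/stats.json")  # should be True after the cp
```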
## Training
To train the models in the paper, run this command:
```bash
python3 main.py -s train \
-p config/preprocess/<corpus>.yaml \
-m config/model/base.yaml \
-t config/train/base.yaml config/train/<corpus>.yaml \
-a config/algorithm/<algorithm>.yaml
```
To reproduce the paper, use 8 V100 GPUs for the meta models and 1 V100 GPU for the
baseline models; otherwise you may need to tune the gradient accumulation step
(`grad_acc_step`) in `config/train/base.yaml` to get the correct meta batch size.
Note that each GPU has its own random seed, so even with the same meta batch
size, a different number of GPUs is equivalent to a different random seed.
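Assuming the usual relation that the effective meta batch size equals (number of GPUs) × (per-GPU batch size) × `grad_acc_step`, you can solve for the accumulation step when changing GPU counts. A hypothetical helper:

```python
def required_grad_acc_step(target_meta_batch: int, n_gpus: int, per_gpu_batch: int) -> int:
    """Solve for grad_acc_step, assuming
    meta batch = n_gpus * per_gpu_batch * grad_acc_step (hypothetical helper)."""
    step, remainder = divmod(target_meta_batch, n_gpus * per_gpu_batch)
    if remainder:
        raise ValueError("target meta batch not divisible by n_gpus * per_gpu_batch")
    return step

# e.g. with a target meta batch of 8 tasks per update:
# 8 GPUs -> required_grad_acc_step(8, 8, 1) == 1
# 1 GPU  -> required_grad_acc_step(8, 1, 1) == 8
```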
After training, you can find your checkpoints under
`output/ckpt/<corpus>/<project_name>/<experiment_key>/checkpoints/`, where the
project name is set in `config/comet.py`.
To run inference with the models, run:
```bash
python3 main.py -s test \
-p config/preprocess/<corpus>.yaml \
-m config/model/base.yaml \
-t config/train/base.yaml config/train/<corpus>.yaml \
-a config/algorithm/<algorithm>.yaml \
-e <experiment_key> -c <checkpoint_file_name>
```
The results will be under
`output/result/<corpus>/<experiment_key>/<algorithm>/`.
## Evaluation
> **Note:** The evaluation code is not well-refactored yet.
Run `cd evaluation/` and see [README.md](evaluation/README.md).
## Pre-trained Models
> **Note:** The checkpoints were trained with an older version of the code and
> might not be compatible with the current version. We will fix this in the future.
Since our code uses the Comet logger, you might need to create a dummy
experiment by running:
```python
from comet_ml import Experiment

# Creating a dummy experiment generates an experiment key,
# which names the checkpoint directory below.
experiment = Experiment()
```
then put the checkpoint files under
`output/ckpt/LibriTTS/<project_name>/<experiment_key>/checkpoints/`.
You can download pretrained models [here](https://drive.google.com/drive/folders/1Av7afSMcHX6pp2_ZmpHqfJNx6ONM7N8d?usp=sharing).
## Results
| Corpus | LibriTTS | VCTK |
| --- | --- | --- |
| Speaker Similarity | ![](evaluation/images/LibriTTS/errorbar_plot_encoder.png) | ![](evaluation/images/VCTK/errorbar_plot_encoder.png) |
| Speaker Verification | ![](evaluation/images/LibriTTS/eer_encoder.png)<br>![](evaluation/images/LibriTTS/det_encoder.png) | ![](evaluation/images/VCTK/eer_encoder.png)<br>![](evaluation/images/VCTK/det_encoder.png) |
| Synthesized Speech Detection | ![](evaluation/images/LibriTTS/auc_encoder.png)<br>![](evaluation/images/LibriTTS/roc_encoder.png) | ![](evaluation/images/VCTK/auc_encoder.png)<br>![](evaluation/images/VCTK/roc_encoder.png) |
<!--## Contributing-->
<!-- Pick a licence and describe how to contribute to your code repository. -->