# Meta-TTS: Meta-Learning for Few-shot Speaker-Adaptive Text-to-Speech
This repository is the official implementation of ["Meta-TTS: Meta-Learning for Few-shot Speaker-Adaptive Text-to-Speech"](https://arxiv.org/abs/2111.04040v1).
<!-- Optional: include a graphic explaining your approach/main result, bibtex entry, link to demos, blog posts and tutorials -->
| multi-task learning | meta learning |
| --- | --- |
| ![](evaluation/images/meta-TTS-multi-task.png) | ![](evaluation/images/meta-TTS-meta-task.png) |
### Meta-TTS
![image](evaluation/images/meta-FastSpeech2.png)
## Requirements
This is how I built my environment; yours does not need to match exactly:
- Sign up for [Comet.ml](https://www.comet.ml/), find your workspace and API key via [www.comet.ml/api/my/settings](https://www.comet.ml/api/my/settings), and fill them in `config/comet.py`. The Comet logger is used throughout the train/val/test stages.
- Check my training logs [here](https://www.comet.ml/b02901071/meta-tts/view/Zvh3Lz3Wvy2AiWcinD06TaS0G).
- [Optional] Install [pyenv](https://github.com/pyenv/pyenv.git) to control the Python version, and switch to Python 3.8.6.
```bash
# After downloading and installing pyenv:
pyenv install 3.8.6
pyenv local 3.8.6
```
- [Optional] Install [pyenv-virtualenv](https://github.com/pyenv/pyenv-virtualenv.git) as a pyenv plugin for a clean virtual environment.
```bash
# After installing pyenv-virtualenv:
pyenv virtualenv meta-tts
pyenv activate meta-tts
```
- Install requirements:
```bash
pip install -r requirements.txt
```
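The Comet settings mentioned above go in `config/comet.py`. A minimal sketch of what such a file might contain is below; the exact variable names in the repo may differ, so treat these field names as assumptions:

```python
# config/comet.py -- hypothetical sketch; check the repo's actual file for the real field names.
COMET_CONFIG = {
    "api_key": "YOUR_API_KEY",      # from https://www.comet.ml/api/my/settings
    "workspace": "your-workspace",  # your Comet workspace name
    "project_name": "meta-tts",     # also appears as <project_name> in the checkpoint path
}
```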
## Preprocessing
First, download [LibriTTS](https://www.openslr.org/60/) and [VCTK](https://datashare.ed.ac.uk/handle/10283/3443), update the paths in `config/LibriTTS/preprocess.yaml` and `config/VCTK/preprocess.yaml`, then run
```bash
python3 prepare_align.py config/LibriTTS/preprocess.yaml
python3 prepare_align.py config/VCTK/preprocess.yaml
```
to prepare the data for alignment.
The alignments for LibriTTS are provided [here](https://github.com/kan-bayashi/LibriTTSLabel.git), and
the alignments for VCTK [here](https://drive.google.com/file/d/1ScLIiyIgLRIZ03DqCmrZ8F75miC77o8g/view?usp=sharing).
You have to unzip the files into `preprocessed_data/LibriTTS/TextGrid/` and
`preprocessed_data/VCTK/TextGrid/`.
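If you prefer to script the extraction step above, a small helper like the following (a hypothetical convenience, not part of the repo) unzips a downloaded alignment archive into the expected `TextGrid` folder:

```python
import zipfile
from pathlib import Path

def extract_alignments(zip_path: str, out_dir: str) -> None:
    """Unzip a downloaded alignment archive into the given TextGrid directory."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)

# e.g. extract_alignments("LibriTTSLabel.zip", "preprocessed_data/LibriTTS/TextGrid/")
```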
Then run the preprocessing script:
```bash
python3 preprocess.py config/LibriTTS/preprocess.yaml
# Copy the stats from LibriTTS to VCTK so pitch/energy normalization uses the same shift and bias.
cp preprocessed_data/LibriTTS/stats.json preprocessed_data/VCTK/
python3 preprocess.py config/VCTK/preprocess.yaml
```
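After the `cp` above, you can sanity-check that both corpora now share the same normalization statistics. A small helper (hypothetical, not part of the repo) could be:

```python
import json

def stats_match(src_path: str, dst_path: str) -> bool:
    """Return True if two stats.json files contain identical normalization stats."""
    with open(src_path) as f_src, open(dst_path) as f_dst:
        return json.load(f_src) == json.load(f_dst)

# e.g. stats_match("preprocessed_data/LibriTTS/stats.json",
#                  "preprocessed_data/VCTK/stats.json")  # should be True after the cp
```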
## Training
To train the models in the paper, run this command:
```bash
python3 main.py -s train \
-p config/preprocess/<corpus>.yaml \
-m config/model/base.yaml \
-t config/train/base.yaml config/train/<corpus>.yaml \
-a config/algorithm/<algorithm>.yaml
```
To reproduce the paper, use 8 V100 GPUs for the meta models and 1 V100 GPU for the
baseline models; otherwise you may need to tune the gradient accumulation step
(`grad_acc_step`) in `config/train/base.yaml` to get the correct meta batch size.
Note that each GPU has its own random seed, so even with the same meta batch
size, a different number of GPUs is equivalent to a different random seed.
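Assuming the usual relation that the effective meta batch size equals (number of GPUs) × (per-GPU batch size) × `grad_acc_step`, you can solve for the accumulation step when changing GPU counts. A hypothetical helper:

```python
def required_grad_acc_step(target_meta_batch: int, n_gpus: int, per_gpu_batch: int) -> int:
    """Solve for grad_acc_step, assuming
    meta batch = n_gpus * per_gpu_batch * grad_acc_step (hypothetical helper)."""
    step, remainder = divmod(target_meta_batch, n_gpus * per_gpu_batch)
    if remainder:
        raise ValueError("target meta batch not divisible by n_gpus * per_gpu_batch")
    return step

# e.g. with a target meta batch of 8 tasks per update:
# 8 GPUs -> required_grad_acc_step(8, 8, 1) == 1
# 1 GPU  -> required_grad_acc_step(8, 1, 1) == 8
```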
After training, you can find your checkpoints under
`output/ckpt/<corpus>/<project_name>/<experiment_key>/checkpoints/`, where the
project name is set in `config/comet.py`.
To run inference with the models, run:
```bash
python3 main.py -s test \
-p config/preprocess/<corpus>.yaml \
-m config/model/base.yaml \
-t config/train/base.yaml config/train/<corpus>.yaml \
-a config/algorithm/<algorithm>.yaml \
-e <experiment_key> -c <checkpoint_file_name>
```
The results will be under
`output/result/<corpus>/<experiment_key>/<algorithm>/`.
## Evaluation
> **Note:** The evaluation code is not well-refactored yet.
Run `cd evaluation/` and see [README.md](evaluation/README.md).
## Pre-trained Models
> **Note:** The checkpoints were trained with an older version of the code and
> might not be compatible with the current version. We will fix this in the future.
Since our code uses the Comet logger, you might need to create a dummy
experiment by running:
```python
from comet_ml import Experiment

# Creating a dummy experiment generates an experiment key,
# which names the checkpoint directory below.
experiment = Experiment()
```
then put the checkpoint files under
`output/ckpt/LibriTTS/<project_name>/<experiment_key>/checkpoints/`.
You can download pretrained models [here](https://drive.google.com/drive/folders/1Av7afSMcHX6pp2_ZmpHqfJNx6ONM7N8d?usp=sharing).
## Results
| Corpus | LibriTTS | VCTK |
| --- | --- | --- |
| Speaker Similarity | ![](evaluation/images/LibriTTS/errorbar_plot_encoder.png) | ![](evaluation/images/VCTK/errorbar_plot_encoder.png) |
| Speaker Verification | ![](evaluation/images/LibriTTS/eer_encoder.png)<br>![](evaluation/images/LibriTTS/det_encoder.png) | ![](evaluation/images/VCTK/eer_encoder.png)<br>![](evaluation/images/VCTK/det_encoder.png) |
| Synthesized Speech Detection | ![](evaluation/images/LibriTTS/auc_encoder.png)<br>![](evaluation/images/LibriTTS/roc_encoder.png) | ![](evaluation/images/VCTK/auc_encoder.png)<br>![](evaluation/images/VCTK/roc_encoder.png) |
<!--## Contributing-->
<!-- Pick a licence and describe how to contribute to your code repository. -->