![alt text](assets/banner.jpg)
# Deepvoice3_pytorch
[![PyPI](https://img.shields.io/pypi/v/deepvoice3_pytorch.svg)](https://pypi.python.org/pypi/deepvoice3_pytorch)
[![Build Status](https://travis-ci.org/r9y9/deepvoice3_pytorch.svg?branch=master)](https://travis-ci.org/r9y9/deepvoice3_pytorch)
[![Build status](https://ci.appveyor.com/api/projects/status/8eurjakfaofbr24k?svg=true)](https://ci.appveyor.com/project/r9y9/deepvoice3-pytorch)
PyTorch implementation of convolutional network-based text-to-speech synthesis models:
1. [arXiv:1710.07654](https://arxiv.org/abs/1710.07654): Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning.
2. [arXiv:1710.08969](https://arxiv.org/abs/1710.08969): Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention.
Audio samples are available at https://r9y9.github.io/deepvoice3_pytorch/.
## Online TTS demo
Notebooks that can be executed on https://colab.research.google.com are available:
- [DeepVoice3: Multi-speaker text-to-speech demo](https://colab.research.google.com/github/r9y9/Colaboratory/blob/master/DeepVoice3_multi_speaker_TTS_en_demo.ipynb)
- [DeepVoice3: Single-speaker text-to-speech demo](https://colab.research.google.com/github/r9y9/Colaboratory/blob/master/DeepVoice3_single_speaker_TTS_en_demo.ipynb)
## Highlights
- Convolutional sequence-to-sequence model with attention for text-to-speech synthesis
- Multi-speaker and single-speaker versions of DeepVoice3
- Audio samples and pre-trained models
- Preprocessor for [LJSpeech (en)](https://keithito.com/LJ-Speech-Dataset/), [JSUT (jp)](https://sites.google.com/site/shinnosuketakamichi/publication/jsut) and [VCTK](http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html) datasets, as well as [carpedm20/multi-speaker-tacotron-tensorflow](https://github.com/carpedm20/multi-Speaker-tacotron-tensorflow) compatible custom dataset (in JSON format)
- Language-dependent frontend text processor for English and Japanese
### Samples
- [Ja Step000380000 Predicted](https://soundcloud.com/user-623907374/ja-step000380000-predicted)
- [Ja Step000370000 Predicted](https://soundcloud.com/user-623907374/ja-step000370000-predicted)
- [Ko_single Step000410000 Predicted](https://soundcloud.com/user-623907374/ko-step000410000-predicted)
- [Ko_single Step000400000 Predicted](https://soundcloud.com/user-623907374/ko-step000400000-predicted)
- [Ko_multi Step001680000 Predicted](https://soundcloud.com/user-623907374/step001680000-predicted)
- [Ko_multi Step001700000 Predicted](https://soundcloud.com/user-623907374/step001700000-predicted)
## Pretrained models
**NOTE**: pretrained models are not compatible with the master branch. To be updated soon.
| URL | Model | Data | Hyper parameters | Git commit | Steps |
|-----|------------|----------|--------------------------------------------------|----------------------|--------|
| [link](https://www.dropbox.com/s/5ucl9remrwy5oeg/20180505_deepvoice3_checkpoint_step000640000.pth?dl=0) | DeepVoice3 | LJSpeech | [link](https://www.dropbox.com/s/0ck82unm0bo0rxd/20180505_deepvoice3_ljspeech.json?dl=0) | [abf0a21](https://github.com/r9y9/deepvoice3_pytorch/tree/abf0a21f83aeb451b918f867bc23378f1e2e608b)| 640k |
| [link](https://www.dropbox.com/s/1y8bt6bnggbzzlp/20171129_nyanko_checkpoint_step000585000.pth?dl=0) | Nyanko | LJSpeech | `builder=nyanko,preset=nyanko_ljspeech` | [ba59dc7](https://github.com/r9y9/deepvoice3_pytorch/tree/ba59dc75374ca3189281f6028201c15066830116) | 585k |
| [link](https://www.dropbox.com/s/uzmtzgcedyu531k/20171222_deepvoice3_vctk108_checkpoint_step000300000.pth?dl=0) | Multi-speaker DeepVoice3 | VCTK | `builder=deepvoice3_multispeaker,preset=deepvoice3_vctk` | [0421749](https://github.com/r9y9/deepvoice3_pytorch/tree/0421749af908905d181f089f06956fddd0982d47) | 300k + 300k |
To use pre-trained models, it's highly recommended that you check out the **specific git commit** noted above, i.e.:
```
git checkout ${commit_hash}
```
Then follow the "Synthesize from a checkpoint" section in the README of that specific git commit. Please note that the latest development version of the repository may not work with these checkpoints.
For example, you could try:
```
# pretrained model (20180505_deepvoice3_checkpoint_step000640000.pth)
# hparams (20180505_deepvoice3_ljspeech.json)
git checkout 4357976
python synthesis.py --preset=20180505_deepvoice3_ljspeech.json \
20180505_deepvoice3_checkpoint_step000640000.pth \
sentences.txt \
output_dir
```
## Notes on hyper parameters
- Default hyper parameters, used during preprocessing/training/synthesis stages, are tuned for English TTS using the LJSpeech dataset. You will have to change some of the parameters if you want to try other datasets. See `hparams.py` for details.
- `builder` specifies which model you want to use. `deepvoice3`, `deepvoice3_multispeaker` [1] and `nyanko` [2] are supported.
- Hyper parameters described in the DeepVoice3 paper for the single-speaker model didn't work for the LJSpeech dataset, so I changed a few things: added dilated convolutions, more channels, more layers and a guided attention loss, etc. See the code for details. The changes also apply to the multi-speaker model.
- Multiple attention layers are hard to learn. Empirically, one or two (first and last) attention layers seem to be enough.
- With guided attention (see https://arxiv.org/abs/1710.08969), alignments become monotonic more quickly and reliably if we use multiple attention layers. With guided attention, I can confirm that five attention layers become monotonic, though I couldn't get speech quality improvements.
- Binary divergence (described in https://arxiv.org/abs/1710.08969) seems to stabilize training, particularly for deep (> 10 layers) networks.
- Adam with step learning-rate decay works. However, for deeper networks, I find Adam plus the Noam learning-rate scheduler more stable.
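For reference, here is a minimal NumPy sketch of the guided attention penalty and the Noam learning-rate schedule mentioned above. It illustrates the ideas rather than reproducing the repository's exact code (see `train.py` and `lrschedule.py` for the actual implementations; the function names below are illustrative):
```
import numpy as np

def guided_attention_matrix(N, T, g=0.2):
    # Penalty matrix W from arXiv:1710.08969: W is near 0 close to the
    # diagonal, so roughly monotonic alignments incur little loss. The
    # guided attention loss is then mean(A * W) for an attention matrix A
    # of shape (N, T), where N = text length and T = number of frames.
    n = np.arange(N).reshape(-1, 1) / N
    t = np.arange(T).reshape(1, -1) / T
    return 1.0 - np.exp(-((n - t) ** 2) / (2 * g * g))

def noam_learning_rate(init_lr, step, warmup_steps=4000):
    # Noam scheduler: linear warmup followed by inverse-square-root decay.
    step = max(step, 1)
    return init_lr * warmup_steps ** 0.5 * min(
        step * warmup_steps ** -1.5, step ** -0.5)
```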
## Requirements
- Python 3
- CUDA >= 8.0
- PyTorch >= v0.4.0
- TensorFlow >= v1.3
- [nnmnkwii](https://github.com/r9y9/nnmnkwii) >= v0.0.11
- [MeCab](http://taku910.github.io/mecab/) (Japanese only)
## Installation
Please install packages listed above first, and then
```
git clone https://github.com/r9y9/deepvoice3_pytorch && cd deepvoice3_pytorch
pip install -e ".[bin]"
```
## Getting started
### Preset parameters
There are many hyper parameters to be tuned, depending on the model and data you are working on. For typical datasets and models, parameters known to work well (**presets**) are provided in the repository. See the `presets` directory for details. Notice that
1. `preprocess.py`
2. `train.py`
3. `synthesis.py`
accept the optional `--preset=<json>` parameter, which specifies where to load preset parameters from. If you are going to use preset parameters, you must use the same `--preset=<json>` throughout preprocessing, training and evaluation, e.g.,
```
python preprocess.py --preset=presets/deepvoice3_ljspeech.json ljspeech ~/data/LJSpeech-1.0 ./data/ljspeech
python train.py --preset=presets/deepvoice3_ljspeech.json --data-root=./data/ljspeech
```
instead of
```
python preprocess.py ljspeech ~/data/LJSpeech-1.0 ./data/ljspeech
# warning! training may use hyper parameters different from those used at the preprocessing stage
python train.py --preset=presets/deepvoice3_ljspeech.json --data-root=./data/ljspeech
```
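A preset file is just a JSON object whose keys override the corresponding defaults defined in `hparams.py`. A hypothetical minimal fragment for illustration (the keys shown are examples; see the files under the `presets` directory for real, complete presets):
```
{
  "builder": "deepvoice3",
  "n_speakers": 1,
  "batch_size": 16
}
```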
### 0. Download dataset
- LJSpeech (en): https://keithito.com/LJ-Speech-Dataset/
- VCTK (en): http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html
- JSUT (jp): https://sites.google.com/site/shinnosuketakamichi/publication/jsut
- NIKL (ko) (**a Korean cellphone number is needed for access**): http://www.korean.go.kr/front/board/boardStandardView.do?board_id=4&mn_id=17&b_seq=464
### 1. Preprocessing
Usage:
```
python preprocess.py ${dataset_name} ${dataset_path} ${out_dir} --preset=<json>
```
Supported `${dataset_name}`s are:
- `ljspeech` (en, single speaker)
- `vctk` (en, multi-speaker)
- `jsut` (jp, single speaker)
- `nikl_m` (ko, multi-speaker)
- `nikl_s` (ko, single speaker)
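For example, assuming the LJSpeech dataset has been downloaded to `~/data/LJSpeech-1.0`, a typical invocation would be:
```
python preprocess.py ljspeech ~/data/LJSpeech-1.0/ ./data/ljspeech \
    --preset=presets/deepvoice3_ljspeech.json
```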