Competition materials and source code: 14th China Graduate Electronics Design Contest, Huawei topic, Speech Synthesis.zip

68 files in total

py: 25

pyc: 17

png: 6


Tags: graduation project, course project, artificial intelligence, project development, resource materials

15 views · uploaded 2024-02-07 17:23:38 · 34.53 MB ZIP


14th China Graduate Electronics Design Contest, Huawei topic, Speech Synthesis.zip (68 files)

资料总结

Tacotron+WaveRNN

preprocess.py 2KB

assets

training_viz.gif 8.18MB

WaveRNN.png 13KB

wavernn_alt_model_hrz2.png 195KB

tacotron_wavernn.png 198KB

LICENSE.txt 1KB

quick_start.py 4KB

hparams.py 4KB

utils

display.py 3KB

distribution.py 5KB

dataset.py 6KB

files.py 199B

paths.py 2KB

text

__init__.py 2KB

LICENSE 1KB

numbers.py 2KB

cleaners.py 2KB

cmudict.py 2KB

symbols.py 720B

recipes.py 336B

__pycache__

symbols.cpython-37.pyc 576B

cleaners.cpython-37.pyc 2KB

numbers.cpython-37.pyc 2KB

recipes.cpython-37.pyc 524B

cmudict.cpython-37.pyc 2KB

__init__.cpython-37.pyc 3KB

__pycache__

display.cpython-37.pyc 3KB

paths.cpython-37.pyc 1KB

files.cpython-37.pyc 379B

distribution.cpython-37.pyc 3KB

dsp.cpython-37.pyc 4KB

__init__.cpython-37.pyc 132B

dsp.py 2KB

gen_tacotron.py 5KB

sentences.txt 442B

train_tacotron.py 5KB

quick_start

quick_start1.wav 160KB

quick_start.wav 225KB

quick_start.wav.png 12KB

quick_start1.wav.png 12KB

train_wavernn.py 4KB

requirements.txt 62B

windowsGUI.py 4KB

models

fatchord_version.py 14KB

tacotron.py 16KB

deepmind_version.py 7KB

__pycache__

tacotron.cpython-37.pyc 13KB

fatchord_version.cpython-37.pyc 13KB

PlayAudio_RNN.py 338B

__pycache__

PlayAudio_RNN.cpython-37.pyc 553B

APP_RNN.cpython-37.pyc 2KB

hparams.cpython-37.pyc 2KB

README.md 3KB

gen_wavernn.py 4KB

电脑端界面.jpg 296KB

第十四届研电赛企业命题获奖名单7-23.pdf 155KB

build_csv.py 661B

Tacotron.zip 14.63MB

手机APP1.jpg 335KB

手机APP3.jpg 118KB

作品简介.docx 750KB

APP.png 51KB

第十四届中国研究生电赛_奔跑吧小白.docx 6.79MB

门型展架海报.jpg 5.55MB

手机APP2.jpg 677KB

第十四届中国研究生电赛_奔跑吧小白.pdf 1.96MB

README.md 891B

决赛-华为命题-南京航空航天大学.gif 157KB

# WaveRNN

##### (Update: Vanilla Tacotron One TTS system just implemented - more coming soon!)

![Tacotron with WaveRNN diagrams](assets/tacotron_wavernn.png)

PyTorch implementation of DeepMind's WaveRNN model from [Efficient Neural Audio Synthesis](https://arxiv.org/abs/1802.08435v1)

# Installation

Ensure you have:

* Python >= 3.6
* [PyTorch 1 with CUDA](https://pytorch.org/)

Then install the rest with pip:

> pip install -r requirements.txt

# How to Use

### Quick Start

If you want to use the TTS functionality immediately, you can simply run:

> python quick_start.py

This will generate every sentence in the default sentences.txt file and write the output to a new 'quick_start' folder, where you can play back the wav files and take a look at the attention plots.

You can also use that script to generate custom TTS sentences and/or use '-u' to generate unbatched (better audio quality):

> python quick_start.py -u --input_text "What will happen if I run this command?"

### Training your own Models

![Attention and Mel Training GIF](assets/training_viz.gif)

Download the [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) Dataset.

Edit **hparams.py**, point **wav_path** to your dataset and run:

> python preprocess.py

or use preprocess.py --path to point directly to the dataset.

___

Here's my recommendation on what order to run things:

1 - Train Tacotron with:

> python train_tacotron.py

2 - You can let that finish training, or at any point you can use:

> python train_tacotron.py --force_gta

This will force Tacotron to create a GTA (ground-truth-aligned) dataset even if it hasn't finished training.

3 - Train WaveRNN with:

> python train_wavernn.py --gta

NB: You can always just run train_wavernn.py without --gta if you're not interested in TTS.

4 - Generate sentences with both models using:

> python gen_tacotron.py

This will generate the default sentences. If you want to generate custom sentences, you can use:

> python gen_tacotron.py --input_text "this is whatever you want it to be"

And finally, you can always use --help on any of those scripts to see what options are available :)

# Samples

[Can be found here.](https://fatchord.github.io/model_outputs/)

# Pretrained Models

Currently there are two pretrained models available in the /pretrained/ folder. Both are trained on LJSpeech:

* WaveRNN (Mixture of Logistics output) trained to 800k steps
* Tacotron trained to 180k steps

____

### References

* [Efficient Neural Audio Synthesis](https://arxiv.org/abs/1802.08435v1)
* [Tacotron: Towards End-to-End Speech Synthesis](https://arxiv.org/abs/1703.10135)
* [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)

### Acknowledgements

* [https://github.com/keithito/tacotron](https://github.com/keithito/tacotron)
* [https://github.com/r9y9/wavenet_vocoder](https://github.com/r9y9/wavenet_vocoder)
* Special thanks to GitHub users [G-Wang](https://github.com/G-Wang), [geneing](https://github.com/geneing) & [erogol](https://github.com/erogol)
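
### Sketch: scripting the recommended run order

The four-step order recommended under "Training your own Models" could be scripted roughly as follows. This is only a minimal sketch, not part of the repository: `build_pipeline` and `run_pipeline` are hypothetical helper names, and the only script names and flags used are the ones the README itself mentions (`preprocess.py --path`, `train_tacotron.py --force_gta`, `train_wavernn.py --gta`, `gen_tacotron.py --input_text`). It assumes the commands are launched from the repository root.

```python
"""Sketch: assemble the README's recommended command order as argv lists."""
import shlex
import subprocess


def build_pipeline(dataset_path=None, force_gta=False, use_gta=True,
                   input_text=None, python="python"):
    """Return the commands in the order the README recommends.

    All parameter names here are illustrative assumptions; only the
    script names and flags come from the README.
    """
    steps = []

    # Preprocessing: pass --path only when the dataset is not set in hparams.py.
    preprocess = [python, "preprocess.py"]
    if dataset_path is not None:
        preprocess += ["--path", dataset_path]
    steps.append(preprocess)

    # Step 1: train Tacotron.
    steps.append([python, "train_tacotron.py"])

    # Step 2 (optional): force creation of the GTA dataset, e.g. when
    # Tacotron training was stopped before it finished.
    if force_gta:
        steps.append([python, "train_tacotron.py", "--force_gta"])

    # Step 3: train WaveRNN; the README says --gta can be dropped
    # when TTS is not the goal.
    wavernn = [python, "train_wavernn.py"]
    if use_gta:
        wavernn.append("--gta")
    steps.append(wavernn)

    # Step 4: generate sentences with both models.
    generate = [python, "gen_tacotron.py"]
    if input_text is not None:
        generate += ["--input_text", input_text]
    steps.append(generate)

    return steps


def run_pipeline(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_pipeline(input_text="hello world"))` only prints the commands; passing `dry_run=False` would actually launch them one after another, which for the training steps can take a very long time.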
