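Deep-Learning-Based Chinese Speech Recognition System in Python (基于Python的深度学习的中文语音识别系统.zip): deep learning speech recognition, Python speech recognition resource, CSDN Library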

30 files in total

py: 8

txt: 8

ipynb: 3


Python

Deep learning

Chinese speech recognition

Speech recognition system

Course project

Rated 5 stars (above 95% of resources) · 183 views · uploaded 2022-06-14 14:12:43 · 108.4MB ZIP


基于Python的深度学习的中文语音识别系统.zip (30 files)

deepspeechrecognition

设计报告.docx 39KB

tutorial

self-attention_tutorial.ipynb 417KB

data

zh.tsv 23.69MB

CNN+CTC_tutorial.ipynb 409KB

CBHG_tutorail.ipynb 121KB

utils.py 9KB

data

stcmd.txt 13MB

prime.txt 12.04MB

aishell_train.txt 19.08MB

aishell_dev.txt 2.24MB

thchs_test.txt 711KB

thchs_train.txt 2.79MB

aishell_test.txt 1.14MB

thchs_dev.txt 254KB

test.py 3KB

model_language

cbhg.py 0B

transformer.py 13KB

train.py 4KB

LICENSE 1KB

logs_lm

checkpoint 73B

model_20.index 6KB

model_20.data-00000-of-00001 81.63MB

model_20.meta 1.5MB

.gitignore 19B

logs_am

model.h5 6.79MB

README.md 24KB

.gitattributes 101B

model_speech

fsmn.py 0B

gru_ctc.py 3KB

cnn_ctc.py 3KB

# Deep-Learning-Based Chinese Speech Recognition System

**Note**: I plan to refresh this project soon. TensorFlow now ships Keras as a core component, so the code may be rewritten for TensorFlow 2. Suggestions are welcome in the issues.

## 1. Introduction

This system implements acoustic modeling and language modeling for speech recognition on top of deep learning frameworks. The acoustic models include CNN-CTC, GRU-CTC, and CNN-RNN-CTC; the language models include [transformer](https://jalammar.github.io/illustrated-transformer/) and [CBHG](https://github.com/crownpku/Somiao-Pinyin). The data comes from four datasets: stc, primewords, Aishell, and thchs30.

The project ships with a small pre-trained speech recognition system. Download the project, download the [thchs dataset](http://www.openslr.org/resources/18/data_thchs30.tgz) and extract it into `data`, then run `test.py`. If nothing goes wrong, it performs recognition and prints results like the following (the labels are 文本结果: text result, 原文结果: reference result, 原文汉字: reference characters, 识别结果: recognition result):

```
the 0 th example.
文本结果： lv4 shi4 yang2 chun1 yan1 jing3 da4 kuai4 wen2 zhang1 de di3 se4 si4 yue4 de lin2 luan2 geng4 shi4 lv4 de2 xian1 huo2 xiu4 mei4 shi1 yi4 ang4 ran2
原文结果： lv4 shi4 yang2 chun1 yan1 jing3 da4 kuai4 wen2 zhang1 de di3 se4 si4 yue4 de lin2 luan2 geng4 shi4 lv4 de2 xian1 huo2 xiu4 mei4 shi1 yi4 ang4 ran2
原文汉字：绿是阳春烟景大块文章的底色四月的林峦更是绿得鲜活秀媚诗意盎然
识别结果：绿是阳春烟景大块文章的底色四月的林峦更是绿得鲜活秀媚诗意盎然
```

To build your own model, delete the existing models, reconfigure the parameters, and retrain. The full procedure is described at the end of this page.

## 2. Acoustic model

The acoustic model is trained with CTC. The implementations (CNN-CTC, GRU-CTC, FSMN, and others) live in `model_speech` and are written with Keras.

- Paper: [http://www.infocomm-journal.com/dxkx/CN/article/openArticlePDFabs.jsp?id=166970](http://www.infocomm-journal.com/dxkx/CN/article/openArticlePDFabs.jsp?id=166970)
- tutorial: [https://blog.csdn.net/chinatelecom08/article/details/85013535](https://blog.csdn.net/chinatelecom08/article/details/85013535)

## 3. Language model

A language model based on the self-attention architecture has been added in `model_language\transformer.py`. This architecture has been shown to have stronger language modeling capacity than other frameworks.

- Paper: [https://arxiv.org/abs/1706.03762](https://arxiv.org/abs/1706.03762)
- tutorial: [https://blog.csdn.net/chinatelecom08/article/details/85051817](https://blog.csdn.net/chinatelecom08/article/details/85051817)

A language model based on the CBHG architecture is in `model_language\cbhg.py`. CBHG was previously used in Google's speech synthesis work and is ported to this project as a neural-network language model.

- Background: [https://github.com/crownpku/Somiao-Pinyin](https://github.com/crownpku/Somiao-Pinyin)
- tutorial: [https://blog.csdn.net/chinatelecom08/article/details/85048019](https://blog.csdn.net/chinatelecom08/article/details/85048019)

## 4. Datasets

The data comprises four datasets (stc, primewords, Aishell, and thchs30), about 430 hours in total. Related link: [http://www.openslr.org/resources.php](http://www.openslr.org/resources.php)

| Name       | train  |   dev | test |
| ---------- | :----: | ----: | ---: |
| aishell    | 120098 | 14326 | 7176 |
| primewords | 40783  |  5046 | 5073 |
| thchs-30   | 10000  |   893 | 2495 |
| st-cmd     | 10000  |   600 | 2000 |

The labels are collected under the `data` directory. primewords and st-cmd are not yet split into training and test sets. To use all the datasets, extract them into one common directory and point `data_path` in `utils.py` at it. The data-related parameters are defined in `utils.py`:

- data_type: train, test, dev
- data_path: path to the extracted data
- thchs30, aishell, prime, stcmd: whether to use each dataset
- batch_size: batch size
- data_length: a small value I use to check results quickly in my own experiments; set it to None for normal use
- shuffle: whether to shuffle the training order; set it to True for normal training

```py
import tensorflow as tf  # TensorFlow 1.x: tf.contrib was removed in TensorFlow 2


def data_hparams():
    params = tf.contrib.training.HParams(
        # vocab
        data_type='train',
        data_path='data/',
        thchs30=True,
        aishell=True,
        prime=False,
        stcmd=False,
        batch_size=1,
        data_length=None,
        shuffle=False)
    return params
```

## 5. Configuration

Use `train.py` to train the models. The acoustic model can be cnn-ctc or gru-ctc; only the import path needs to change:

`from model_speech.cnn_ctc import Am, am_hparams`

`from model_speech.gru_ctc import Am, am_hparams`

The language model can be transformer or cbhg:

`from model_language.transformer import Lm, lm_hparams`

`from model_language.cbhg import Lm, lm_hparams`

### Model recognition

Use `test.py` to check recognition quality. The model selection must match the one used for training.

# A simple example

# 1. Acoustic model training

The `train.py` file:

```python
import os
import tensorflow as tf
from utils import get_data, data_hparams

# Prepare the data needed for training
data_args = data_hparams()
data_args.data_length = 10
train_data = get_data(data_args)

# 1. Acoustic model training -----------------------------------
from model_speech.cnn_ctc import Am, am_hparams
am_args = am_hparams()
am_args.vocab_size = len(train_data.am_vocab)
am = Am(am_args)

# Resume from existing weights if a checkpoint is present
if os.path.exists('logs_am/model.h5'):
    print('load acoustic model...')
    am.ctc_model.load_weights('logs_am/model.h5')

epochs = 10
batch_num = len(train_data.wav_lst) // train_data.batch_size

for k in range(epochs):
    print('this is the', k+1, 'th epochs training !!!')
    #shuffle(shuffle_list)
    # Create a fresh batch generator for each epoch
    batch = train_data.get_am_batch()
    am.ctc_model.fit_generator(batch, steps_per_epoch=batch_num, epochs=1)

am.ctc_model.save_weights('logs_am/model.h5')
```

```
get source list...
load thchs_train.txt data...
100%|████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 236865.96it/s]
load aishell_train.txt data...
100%|██████████████████████████████████████████████████████████████████████| 120098/120098 [00:00<00:00, 260863.15it/s]
make am vocab...
100%|████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 9986.44it/s]
make lm pinyin vocab...
100%|████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 9946.18it/s]
make lm hanzi vocab...
100%|████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 9950.90it/s]
Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
the_inputs (InputLayer)      (None, None, 200, 1)      0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, None, 200, 32)     320
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 200, 32)     128
_________________________________________________________________
conv2d_2 (Conv2D)            (None, None, 200, 32)     9248
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 200, 32)     128
___________
```
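On the TensorFlow 2 migration mentioned in the note at the top: `tf.contrib` was removed in TensorFlow 2, so the `data_hparams` function from section 4 cannot run there unchanged. Below is a minimal sketch of one possible replacement using the standard library's `types.SimpleNamespace`. It assumes the rest of the code only reads and writes these settings as attributes (as `train.py` does with `data_args.data_length = 10`). Whether `utils.py` relies on other `HParams` methods is not shown in this README, so treat this as an illustration rather than the project's actual migration.

```python
from types import SimpleNamespace


def data_hparams():
    """Return the default data settings as a mutable namespace.

    Hypothetical stand-in for tf.contrib.training.HParams; the field names
    and defaults are copied from the data_hparams shown in section 4.
    """
    return SimpleNamespace(
        data_type='train',   # one of: train, test, dev
        data_path='data/',   # where the datasets were extracted
        thchs30=True,        # dataset switches
        aishell=True,
        prime=False,
        stcmd=False,
        batch_size=1,
        data_length=None,    # None means no limit for normal use
        shuffle=False,
    )


data_args = data_hparams()
data_args.data_length = 10  # same override as in the train.py example
print(data_args.data_type, data_args.data_length)
```

Each call returns a fresh namespace, so overriding a field on one instance does not affect the defaults seen by later callers.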


大嘴巴子Pro

2022-06-26

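I'm fed up with people like you. This is plainly source code from GitHub, and you charge 40 just for reposting it?? That desperate for money? And fsmn and cbhg inside are empty files. Shameless.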
zwd1112

2023-02-21

The content matches the description. A great resource with a lot worth learning from. Recommended!
星野各一半

2024-01-14

The resource is well summarized, detailed, and very useful. I learned a lot.
XXXUUXXX

2024-10-17

Found a great resource. Time to start studying it so we can all improve together. Recommended!
xidbswk

2024-04-10

This resource is excellent and solved the exact problem I was facing. Hard not to support a resource like this.