# ASRT-SR-tensorflow2.0
Deep-learning-based speech recognition on the THCHS30 dataset (see [ASRT_SpeechRecognition](https://github.com/nl8590687/ASRT_SpeechRecognition) and [AI柠檬](https://blog.ailemon.me/2018/08/29/asrt-a-chinese-speech-recognition-system/)).
# Loading the data
Generate the CSV files that list the audio paths and their labels:

    python3 prepare_data.py

**train.csv**

| | wav_filename | CHS_transcript | Pinyin_transcript |
|---|---|---|---|
| 1 | ... /ASRT_SR_tensorflow2.0/data/data_thchs30/train/A2_0.wav | 绿是阳春烟景大块文章的底色四月的林峦更是绿得鲜活秀媚诗意盎然 | lv4 shi4 yang2 chun1 yan1 jing3 da4 kuai4 wen2 zhang1 de5 di3 se4 si4 yue4 de5 lin2 luan2 geng4 shi4 lv4 de5 xian1 huo2 xiu4 mei4 shi1 yi4 ang4 ran2 |
| 2 | ... /ASRT_SR_tensorflow2.0/data/data_thchs30/train/A2_15.wav | 柳宗夏现年六十岁五十年代进入韩外务部工作一九九四年十二月任外交安保首席秘书 | liu3 zong1 xia4 xian4 nian2 liu4 shi2 sui4 wu3 shi2 nian2 dai4 jin4 ru4 han2 wai4 wu4 bu4 gong1 zuo4 yi1 jiu3 jiu3 si4 nian2 shi2 er4 yue4 ren4 wai4 jiao1 an1 bao3 shou3 xi2 mi4 shu1 |
| 3 | ... /ASRT_SR_tensorflow2.0/data/data_thchs30/train/A2_26.wav | 等从晕晕乎乎中醒转来时间已经丢下一串长长的脚印消逝得无影无踪 | deng3 cong2 yun1 yun5 hu1 hu1 zhong1 xing3 zhuan3 lai2 shi2 jian1 yi3 jing1 diu1 xia4 yi2 chuan4 chang2 chang2 de5 jiao3 yin4 xiao1 shi4 de5 wu2 ying3 wu2 zong1 |

**dev.csv**
...
**test.csv**
...
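
Each row holds a wav path, the Chinese transcript, and the space-separated pinyin transcript. A minimal sketch of splitting one such row follows. It assumes the three columns are tab-separated, which is not stated here, and `parse_row` and `Utterance` are hypothetical names rather than part of the repository. The sample path is shortened for illustration.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    wav_filename: str
    chs_transcript: str
    pinyin: List[str]


def parse_row(line: str, delimiter: str = "\t") -> Utterance:
    """Split one row into path, Chinese transcript, and pinyin syllables.

    Assumes exactly three delimiter-separated columns (an assumption).
    """
    wav_filename, chs, pinyin = line.rstrip("\n").split(delimiter)
    return Utterance(wav_filename, chs, pinyin.split())


# Shortened, illustrative row in the train.csv column order.
row = "data/A2_0.wav\t绿是阳春\tlv4 shi4 yang2 chun1"
utt = parse_row(row)
print(utt.wav_filename, len(utt.pinyin))
```

Keeping the pinyin as a list of syllables makes it straightforward to map each syllable to an integer label later.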
`generate_data()` in SpeechDataset.py yields batches of features and labels with shapes (B, 1600, 200, 1) and (B, 64), where B is the batch size.
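
To illustrate how variable-length utterances can end up in those fixed shapes, here is a hedged sketch of zero-padding with NumPy. It is not the repository's `generate_data()`. The helper `pad_batch` and the padding value `pad_id` are assumptions made for the example; only the sizes 1600, 200, and 64 come from the shapes above.

```python
import numpy as np

MAX_FRAMES = 1600   # time axis of the feature tensor
NUM_BINS = 200      # frequency bins per frame
MAX_LABEL_LEN = 64  # label slots per utterance


def pad_batch(features, labels, pad_id=0):
    """Zero-pad variable-length features and labels into fixed-shape arrays.

    features: list of arrays shaped (frames_i, NUM_BINS)
    labels:   list of integer label sequences
    pad_id:   hypothetical filler for unused label slots
    """
    batch = len(features)
    x = np.zeros((batch, MAX_FRAMES, NUM_BINS, 1), dtype=np.float32)
    y = np.full((batch, MAX_LABEL_LEN), pad_id, dtype=np.int64)
    for i, (feat, lab) in enumerate(zip(features, labels)):
        if feat.shape[0] > MAX_FRAMES or len(lab) > MAX_LABEL_LEN:
            raise ValueError("utterance exceeds the fixed batch shape")
        x[i, : feat.shape[0], :, 0] = feat
        y[i, : len(lab)] = lab
    return x, y


# Two dummy utterances of different lengths.
feats = [np.ones((900, NUM_BINS)), np.ones((1200, NUM_BINS))]
labs = [[3, 7, 7, 1], [5, 2]]
x, y = pad_batch(feats, labs)
print(x.shape, y.shape)
```

A real pipeline would usually also return the true feature and label lengths so that the loss can ignore the padded positions.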
# Feature extraction
Extract spectrogram features from each wav file (e.g. A2_33.wav):
![image](https://github.com/Mitomzhou/ASRT_SR_tensorflow2.0/blob/master/image/spectrogram.png)
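
As a rough illustration of how a spectrogram like the one above can be computed, the sketch below frames the signal, applies a Hamming window, and takes the log magnitude of the FFT. The 16 kHz sample rate, 25 ms window, and 10 ms shift are assumptions chosen so that each frame yields 200 frequency bins; the repository's own feature code may differ.

```python
import numpy as np


def spectrogram(signal, sample_rate=16000, window_ms=25, shift_ms=10):
    """Log-magnitude spectrogram: one row per frame, one column per bin."""
    window_len = sample_rate * window_ms // 1000   # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000         # 160 samples at 16 kHz
    if len(signal) < window_len:
        raise ValueError("signal shorter than one analysis window")
    num_frames = 1 + (len(signal) - window_len) // shift
    window = np.hamming(window_len)
    frames = np.stack([
        signal[i * shift: i * shift + window_len] * window
        for i in range(num_frames)
    ])
    # Keep the first half of the FFT bins; the rest mirrors it for real input.
    magnitude = np.abs(np.fft.fft(frames, axis=1))[:, : window_len // 2]
    return np.log(magnitude + 1.0)


# One second of a synthetic 440 Hz tone stands in for a wav file.
t = np.arange(16000) / 16000.0
feat = spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(feat.shape)
```

With these settings each bin spans 40 Hz, so the 440 Hz test tone concentrates its energy in bin 11.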
# Network architecture
![image](https://github.com/Mitomzhou/ASRT_SR_tensorflow2.0/blob/master/image/frame.png)
# Training the acoustic model
Run either of the following:

    python3 SpeechModel.py
    bash run.sh

The training flow is:

    root = '/raid/BH/mitom/data/data_thchs30'
    # Preprocess the data: write the data-label pairs to CSV files
    SUBSETS = ["train", "dev", "test"]
    for SUBSET in SUBSETS:
        processor(root, SUBSET, True, root)
    # Initialize the model
    model = SpeechModel(root)
    # Train
    model.train_speech(epoch=70, step_per_epochs=640, batch_size=16, path='model_speech/train/', modelfile='weight.ckpt')

Training log for the 70-epoch run:

    2020-04-16 21:44:39
    epoch: 0
    Train for 640 steps
    1/640 [..............................] - ETA: 2:58:20 - loss: 725.1706
    2/640 [..............................] - ETA: 1:30:23 - loss: 660.8561
    3/640 [..............................] - ETA: 1:01:02 - loss: 539.7244
    4/640 [..............................] - ETA: 46:13 - loss: 493.3954
    5/640 [..............................] - ETA: 37:25 - loss: 453.6105
    6/640 [..............................] - ETA: 31:28 - loss: 424.4420
    ...
    ...
    634/640 [============================>.] - ETA: 2s - loss: 36.1764
    635/640 [============================>.] - ETA: 1s - loss: 36.1792
    636/640 [============================>.] - ETA: 1s - loss: 36.1763
    637/640 [============================>.] - ETA: 1s - loss: 36.1718
    638/640 [============================>.] - ETA: 0s - loss: 36.1719
    639/640 [============================>.] - ETA: 0s - loss: 36.1677
    640/640 [==============================] - 255s 398ms/step - loss: 36.1670
    save model file successful, model_speech/train/weight.ckpt
    ***train: error_word_rate: 0.1655813953488372***
    ***test: error_word_rate: 0.3559650824442289***
    2020-04-17 02:46:03

# Testing the acoustic model

    # Recognize a single wav file
    model = SpeechModel('')
    model.load_model('model_speech/train/weight.ckpt')
    model.recognize_speech("./data/A2_0.wav")

Predicted pinyin (pred):

    ['lv4', 'shi4', 'yang2', 'chun1', 'yan1', 'jing3', 'da4', 'kuai4', 'wen2', 'rang4', 'de5', 'di3', 'se4', 'si4', 'yu4', 'de5', 'ling2', 'luan2', 'ge4', 'shi4', 'lv4', 'de5', 'xian1', 'huo2', 'xiu4', 'wu3', 'mei4', 'shi1', 'yi4', 'ang4', 'ran2']

Reference (label):

    lv4 shi4 yang2 chun1 yan1 jing3 da4 kuai4 wen2 zhang1 de5 di3 se4 si4 yue4 de5 lin2 luan2 geng4 shi4 lv4 de5 xian1 huo2 xiu4 mei4 shi1 yi4 ang4 ran2
    绿 是 阳春 烟 景 大块 文章 的 底色 四月 的 林 峦 更是 绿 得 鲜活 秀媚 诗意 盎然

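
One common way to score a prediction like this is the edit (Levenshtein) distance between the predicted and reference syllable sequences, divided by the reference length. The sketch below applies that to the pred and label shown above. How the repository actually computes `error_word_rate` is not shown here, so treat this as an illustration under that assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


label = ("lv4 shi4 yang2 chun1 yan1 jing3 da4 kuai4 wen2 zhang1 de5 di3 se4 "
         "si4 yue4 de5 lin2 luan2 geng4 shi4 lv4 de5 xian1 huo2 xiu4 mei4 "
         "shi1 yi4 ang4 ran2").split()
pred = ['lv4', 'shi4', 'yang2', 'chun1', 'yan1', 'jing3', 'da4', 'kuai4',
        'wen2', 'rang4', 'de5', 'di3', 'se4', 'si4', 'yu4', 'de5', 'ling2',
        'luan2', 'ge4', 'shi4', 'lv4', 'de5', 'xian1', 'huo2', 'xiu4', 'wu3',
        'mei4', 'shi1', 'yi4', 'ang4', 'ran2']

errors = edit_distance(label, pred)
rate = errors / len(label)
print(errors, len(label), round(rate, 4))
```

For this utterance the prediction has four substituted syllables and one inserted syllable, giving 5 edits over 30 reference syllables.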
# Language model

    lm = LanguageModel('model_language/')
    lm.load_model()
    # Sample inputs; only the last assignment is passed to the model
    list_pinyin = ['kao3', 'yan2', 'ying1', 'yu3']
    list_pinyin = ['hong2', 'lei2', 'ba1', 'lei2']
    list_pinyin = ['zhong1', 'guo2', 'ren2', 'min2']
    list_pinyin = ['cai4', 'zuo4', 'hao3', 'le5',
                   'yi4', 'wan3', 'qing1', 'zheng1', 'wu3', 'chang1', 'yu2',
                   'yi4', 'wan3', 'fan1', 'qie2', 'chao3', 'ji1', 'dan4',
                   'yi4', 'wan3', 'zha4', 'cai4', 'gan1', 'zi4', 'chao3', 'rou4', 'si1']
    result = lm.speech2text(list_pinyin)
    print(result)

Prediction (first line) and reference (second line):

    菜做好了一碗清正武昌鱼一晚翻茄炒鸡蛋一晚咋采干字炒肉丝
    菜做好了一碗清蒸武昌鱼一碗蕃茄炒鸡蛋一碗榨菜干子炒肉丝

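
To show the general idea behind converting pinyin to characters, here is a toy sketch that picks the character sequence with the best bigram score using a Viterbi-style search. It is not the repository's `LanguageModel`. The tables `PINYIN_TO_CHARS`, `UNIGRAM`, and `BIGRAM` and their counts are invented for the example, and the add-one smoothing is an assumption.

```python
from typing import Dict, List

# Hypothetical toy tables; a real model would load far larger ones from files.
PINYIN_TO_CHARS: Dict[str, str] = {
    "zhong1": "中钟",
    "guo2": "国",
    "ren2": "人仁",
    "min2": "民",
}
UNIGRAM: Dict[str, int] = {"中": 50, "钟": 5, "国": 40, "人": 60, "仁": 3, "民": 20}
BIGRAM: Dict[str, int] = {"中国": 30, "国人": 10, "人民": 25}


def pinyin_to_text(pinyin: List[str]) -> str:
    """Return the candidate sentence with the highest bigram score."""
    if not pinyin:
        return ""
    # best maps the last character of a partial sentence to (score, sentence).
    best = {c: (float(UNIGRAM.get(c, 1)), c) for c in PINYIN_TO_CHARS[pinyin[0]]}
    for syllable in pinyin[1:]:
        nxt = {}
        for c in PINYIN_TO_CHARS[syllable]:
            # Add-one smoothed estimate of how likely c follows each previous character.
            nxt[c] = max(
                (score * (BIGRAM.get(prev + c, 0) + 1) / (UNIGRAM.get(prev, 0) + 1),
                 text + c)
                for prev, (score, text) in best.items()
            )
        best = nxt
    return max(best.values())[1]


print(pinyin_to_text(["zhong1", "guo2", "ren2", "min2"]))
```

Because many characters share one pinyin syllable, errors such as 一晚 for 一碗 in the output above arise when the statistics favor the wrong homophone.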
# Testing
Run the end-to-end test, which chains the acoustic model and the language model:

    python3 test.py

    from SpeechModel import SpeechModel
    from LanguageModel import LanguageModel
    # Acoustic model: wav file to pinyin
    model = SpeechModel('')
    model.load_model('model_speech/train/weight.ckpt')
    result_pinyin = model.recognize_speech("./data/data_thchs30/data/A2_3.wav")
    # Language model: pinyin to Chinese text
    lm = LanguageModel('model_language/')
    lm.load_model()
    result_chs = lm.speech2text(result_pinyin)
    print(result_chs)

Output:

    ['cai4', 'zuo4', 'hao3', 'le5', 'yi4', 'wan3', 'qing1', 'zheng1', 'wu3', 'chang1', 'yu2', 'yi4', 'wan3', 'fan1', 'qie2', 'chou3', 'ji1', 'dai4', 'yi4', 'wan3', 'zha4', 'cai4', 'gan1', 'zi4', 'chao3', 'lu4', 'si1']
    菜做好了一碗清正武昌鱼一晚翻茄丑期待一晚咋采干字炒绿思

# Summary
The accuracy of both the acoustic model and the language model still needs improvement.