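Flask Web-Based Chinese Automatic Speech Recognition Demo System: Source Code and Project Description (Includes Speech Recognition, Speech Synthesis, and Voiceprint-Based Speaker Recognition), .zip resource on CSDN 文库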

430 files in total

jpg: 102

js: 54

png: 45

Tags: flask, speech recognition

Uploaded 2024-02-28 10:36:01, 116.96MB ZIP, 62 views

Archive contents (430 files):

index.html.bak 12KB

baidu_aip.py.bak 1KB

speech_model251_e_0_step_120000.model.base 5.66MB

speech_model251_e_0_step_135500.model.base 5.66MB

speech_model251_e_0_step_68000.model.base 5.66MB

config 92B

app.v2.css 201KB

bootstrap.css 179KB

bootstrap.min.css 152KB

bootstrap.css 120KB

main.css 71KB

main.css 68KB

animate.min.css 52KB

font-awesome.min.css 30KB

bootstrap-grid.css 18KB

gw-product.css 15KB

layer.css 14KB

jquery.DonutWidget.min.css 13KB

bootstrap-slider.min.css 10KB

bootstrap-select.min.css 10KB

pages.css 9KB

gw-header.css 8KB

linearicons.css 8KB

ttsdemo.css 8KB

magnific-popup.css 7KB

asrdemo.css 7KB

layer.css 5KB

owl.carousel.css 4KB

nice-select.css 4KB

bootstrap-reboot.css 4KB

toast.css 815B

jquerysctipttop.css 736B

说话人识别实践.docx 400KB

语音合成实践.docx 271KB

语音识别实践.docx 125KB

.DS_Store 10KB

.DS_Store 6KB

fontawesome-webfont.eot 162KB

Linearicons-Free.eot 55KB

loading-0.gif 6KB

loading-2.gif 2KB

loading-1.gif 701B

zhi.gmm 11KB

李航航.gmm 11KB

test.gmm 11KB

jingkun.gmm 11KB

liu.gmm 11KB

hang.gmm 11KB

speech_model251_e_0_step_120000.h5 16.93MB

speech_model251_e_0_step_68000.h5 16.93MB

speech_model251_e_0_step_135500.h5 16.93MB

speech_model251_e_0_step_68000.base.h5 5.68MB

speech_model251_e_0_step_135500.base.h5 5.68MB

speech_model251_e_0_step_120000.base.h5 5.68MB

index.html 17KB

index.html 13KB

blog-home-banner.jpg 1.82MB

g7.jpg 154KB

g1.jpg 122KB

g2.jpg 114KB

about.jpg 87KB

feature-img1.jpg 85KB

g4.jpg 83KB

g3.jpg 80KB

g6.jpg 73KB

feature-img2.jpg 71KB

# Speaker-Recognition

A simple speaker recognition application in Python using Mel-frequency cepstral coefficients (MFCCs) and Gaussian mixture models (GMMs). The MFCCs of each sample are extracted and fitted to a GMM.

We took 4 samples of 2 seconds each from 9 people. The samples were recorded in normal surroundings, so all of them contain some noise. For each person, the first three samples are used for training and the fourth is used for testing. GMMs for these 9 people have already been created and are stored in the /gmm_models directory. The corresponding samples are in the /samples directory.

The accuracy of our implementation is 95%-96% as tested on the given samples. Accuracy still depends on the quality of the samples provided and the size of the training set.

## Running instructions

This application runs on Python 3.4 (Windows 10). The Python modules used are python_speech_features, PyAudio, sklearn, SciPy, and NumPy.

### Step 1: Open a command prompt

Open a command prompt and go to the project's directory.

### Step 2: Registration

First, register a user by providing samples of the user's voice. Type: `python register.py`

This runs register.py, which asks for a username. Once the name is entered, the script starts recording. It asks for 3 samples of 2 seconds each. For convenience, we ask the user to say 'up' the first time, then 'down', and then 'left' (although you can say anything, since the application is text-independent. So just sing along for 6 seconds xD). Once the 3 samples are taken, the script trains on them, then creates the Gaussian mixture model and dumps it into the gmm_models directory.

### Step 3: Testing

Once the .gmm file is created, you can test your voice. Type: `python speakerrecog.py`

This script records the user's voice for 2 seconds, so say something for 2 seconds. The script then outputs the result as: detected as - "username"
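The register-then-identify flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual register.py or speakerrecog.py: it assumes one diagonal-covariance GMM per speaker (scikit-learn's `GaussianMixture`) and picks the speaker whose model gives the highest average log-likelihood. To stay self-contained it uses synthetic arrays in place of real MFCC frames. The helper names (`train_speaker_model`, `identify_speaker`, `fake_mfcc`), the speaker names, and the choice of 4 mixture components are all hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def train_speaker_model(features, n_components=4, seed=0):
    """Fit one diagonal-covariance GMM to a (n_frames, n_coeffs) feature array.

    n_components=4 is an assumed value, not taken from the project.
    """
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="diag",
        max_iter=200,
        random_state=seed,
    )
    gmm.fit(features)
    return gmm


def identify_speaker(models, features):
    """Score the features against every model; return (best_name, all_scores).

    GaussianMixture.score returns the mean per-frame log-likelihood.
    """
    scores = {name: model.score(features) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic stand-in for MFCC frames (assumption): in the real application
# these arrays would come from python_speech_features applied to recordings.
rng = np.random.default_rng(0)


def fake_mfcc(center, n_frames=200, n_coeffs=13):
    """Random frames clustered around `center`, mimicking one speaker."""
    return center + rng.normal(size=(n_frames, n_coeffs))


# "Registration": train one model per (hypothetical) speaker.
models = {
    "alice": train_speaker_model(fake_mfcc(0.0, n_frames=600)),
    "bob": train_speaker_model(fake_mfcc(3.0, n_frames=600)),
}

# "Testing": score a fresh sample drawn from bob's distribution.
detected, scores = identify_speaker(models, fake_mfcc(3.0))
print('detected as - "%s"' % detected)
```

Comparing mean per-frame log-likelihoods keeps the decision independent of the exact number of frames in the test sample, which is convenient when recordings are only nominally 2 seconds long.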
