(Source code) Real-time speech recognition system based on the sherpa-ncnn framework.zip

283 files in total

h: 40

cc: 38

sh: 23


Uploaded 2024-11-13 18:59:15 · 2.59 MB · ZIP


(Source code) Real-time speech recognition system based on the sherpa-ncnn framework.zip (283 files)

gradlew.bat 3KB

decode-file-c-api.c 5KB

generate-int8-scale-table.cc 34KB

sherpa-ncnn-ffmpeg.cc 29KB

jni.cc 14KB

zipformer-model.cc 14KB

resample.cc 13KB

conv-emformer-model.cc 10KB

lstm-model.cc 9KB

alsa.cc 9KB

model.cc 8KB

recognizer.cc 7KB

modified-beam-search-decoder.cc 6KB

sherpa-ncnn-microphone.cc 6KB

c-api.cc 6KB

features.cc 5KB

sherpa-ncnn-alsa.cc 5KB

wave-reader.cc 5KB

sherpa-ncnn.cc 4KB

endpoint.cc 4KB

test-resample.cc 4KB

recognizer.cc 3KB

tensorasstrided.cc 3KB

stream.cc 3KB

model.cc 3KB

symbol-table.cc 3KB

greedy-search-decoder.cc 3KB

poolingmodulenoproj.cc 3KB

stack.cc 3KB

simpleupsample.cc 3KB

hypothesis.cc 3KB

endpoint.cc 3KB

meta-data.cc 2KB

features.cc 1KB

sherpa-ncnn.cc 1KB

decoder.cc 1KB

microphone.cc 1KB

stream.cc 1KB

decoder.cc 1KB

display.cc 1KB

CPPLINT.cfg 519B

CPPLINT.cfg 44B

ios.toolchain.cmake 41KB

ncnn.cmake 5KB

portaudio.cmake 3KB

kaldi-native-fbank.cmake 2KB

pybind11.cmake 1KB

arm-linux-gnueabihf.toolchain.cmake 690B

aarch64-linux-gnu.toolchain.cmake 634B

riscv64-linux-gnu.toolchain.cmake 630B

RealtimeSpeechRecognitionDlg.cpp 15KB

RealtimeSpeechRecognition.cpp 3KB

pch.cpp 684B

sherpa-ncnn.cs 8KB

Program.cs 8KB

WaveReader.cs 7KB

DecodeFile.cs 4KB

microphone.csproj 475B

decode-file.csproj 455B

font-awesome.css 39KB

font-awesome.min.css 30KB

style.css 3KB

fontawesome-webfont.eot 162KB

RealtimeSpeechRecognition.vcxproj.filters 2KB

on.flac 43KB

.gitignore 7KB

.gitignore 250B

.gitignore 98B

.gitignore 46B

.gitignore 38B

.gitignore 28B

.gitignore 19B

.gitignore 13B

.gitignore 10B

.gitignore 6B

.gitkeep 0B

main.go 9KB

main.go 8KB

sherpa_ncnn.go 6KB

build.gradle 1KB

settings.gradle 343B

build.gradle 301B

gradlew 6KB

c-api.h 9KB

resample.h 7KB

model.h 5KB

lstm-model.h 5KB

zipformer-model.h 4KB

conv-emformer-model.h 4KB

hypothesis.h 4KB

recognizer.h 3KB

math.h 3KB


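## About AudioSer

AudioSer is a deep-learning speech recognition API service. It converts uploaded .wav audio files to text and returns the result to the client. It supports recognition of multiple languages and accents, handles large numbers of concurrent requests, and uses a cache to avoid reprocessing identical files.

### Technical details

The API uses the sherpa_ncnn library as its deep learning framework. Recurrent neural network (RNN) and long short-term memory (LSTM) models are used to model the acoustic features and process the speech signal sequence, turning speech into text. I use Flask as the web framework and interact with clients through a RESTful API to get the best performance out of it.

### Directory structure

```text
AudioSer
├───model
│   ├───decoder_jit_trace-pnnx.ncnn.bin
│   ├───...
│   └───tokens.txt
├───cache
│   ├───log
│   └───voice
├───sox
│   └───ffmpeg.exe
├───static
│   ├───css
│   ├───...
│   └───src
├───templates
│   └───index.html
├───AudioSer.py
├───requirements.txt
├───README.md
└───config.py
```

### Usage

Install the dependencies:

```shell
pip install -r requirements.txt
```

Start the service:

```shell
python AudioSer.py
```

<table style="width:100%">
<tr>
<th>AudioSer web</th>
</tr>
<tr>
<td><img src="/python-api-examples/AudioSer/web.png" alt="VITS at training" height="400"></td>
</tr>
</table>

Once the service is running, open the web interface to try it out:

```text
http://127.0.0.1:5620
```

### API call

Send an HTTP POST request to the server. The audio is submitted as a byte stream; only WAV format is supported.

```http
POST http://127.0.0.1:5620/voice
Content-Type: audio/wav
file:1.wav
```

### curl

```shell
curl -F "file=@E:\Desktop\1.wav" http://127.0.0.1:5620/voice
```

### Python

```python
import requests

url = 'http://127.0.0.1:5620/voice'

# Open the audio in binary mode; the context manager closes it after the upload.
with open('E:/Desktop/1.wav', 'rb') as file:
    # Multipart form field "file"; '2.wav' is the filename reported to the server.
    files = {'file': ('2.wav', file)}
    response = requests.post(url, files=files).json()

print(response)
```

Example response: the server returns a piece of JSON text.

```json
{ "status": 200, "message": "helloworld" }
```

```json
{ "status": 200, "message": "你好世界" }
```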
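The introduction says that a cache avoids reprocessing identical files, but the README does not show how AudioSer does this. The following is a minimal sketch of one common approach, under the assumption that results are keyed by a SHA-256 digest of the uploaded bytes. `TranscriptCache` and the `transcribe` callable are hypothetical names for illustration and are not taken from `AudioSer.py`.

```python
import hashlib
from typing import Callable, Dict


class TranscriptCache:
    """Cache transcripts keyed by the SHA-256 digest of the audio bytes."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        # `transcribe` is a hypothetical stand-in for the real recognizer call.
        self._transcribe = transcribe
        self._results: Dict[str, str] = {}

    def get(self, audio: bytes) -> str:
        """Return the cached transcript, running the recognizer only on a miss."""
        key = hashlib.sha256(audio).hexdigest()
        if key not in self._results:
            self._results[key] = self._transcribe(audio)
        return self._results[key]


if __name__ == "__main__":
    def fake_transcribe(audio: bytes) -> str:
        # Placeholder recognizer used only for this demonstration.
        print("recognizer called")
        return f"<transcript of {len(audio)} bytes>"

    cache = TranscriptCache(fake_transcribe)
    cache.get(b"same audio")  # miss: the recognizer runs
    cache.get(b"same audio")  # hit: served from the in-memory dictionary
```

Hashing the content rather than the filename means that two uploads with different names but identical bytes share one entry. An in-memory dictionary is lost on restart; persisting results on disk (for example under the `cache` directory shown in the tree) would be a separate design decision.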
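Because the endpoint accepts only WAV audio, a client or server may want to reject other payloads before running recognition. The sketch below uses only the standard-library `wave` module to check that a byte string parses as WAV and to read its basic properties. The helper names `wav_info` and `make_silence` are hypothetical, and the 16 kHz rate in the demo is an arbitrary choice, not a documented requirement of the service.

```python
import io
import wave
from typing import Optional, Tuple


def wav_info(data: bytes) -> Optional[Tuple[int, int, float]]:
    """Return (channels, sample_rate, duration_seconds), or None if not valid WAV."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            rate = wav.getframerate()
            if rate <= 0:
                return None
            return wav.getnchannels(), rate, wav.getnframes() / rate
    except (wave.Error, EOFError):
        # wave.Error: bad or unsupported header; EOFError: truncated or empty input.
        return None


def make_silence(seconds: float = 0.5, rate: int = 16000) -> bytes:
    """Build a mono 16-bit PCM WAV of silence in memory (demo data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(wav_info(make_silence()))
    print(wav_info(b"not a wav file"))
```

Note that the `wave` module reads uncompressed PCM only, so a compressed WAV variant would also be reported as invalid by this check.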
