### Download the model and place it in the models directory
- Link: https://github.com/mymagicpower/AIAS/releases/download/apps/voiceprint.zip
### Voiceprint Recognition
A voiceprint is the spectrum of a speech signal, as displayed by an electroacoustic instrument, that carries the speaker's vocal information. Human speech production is a complex physiological and physical process involving the brain's language centers and the vocal organs. The size and shape of the organs used in speaking (tongue, teeth, larynx, lungs, and nasal cavity) vary greatly from person to person, so no two people have the same voiceprint.

Voiceprint recognition (VPR), also known as speaker recognition, comes in two forms: speaker identification and speaker verification. Identification determines which of several people spoke a given speech segment, which makes it a "multiple-choice" problem. Verification confirms whether a given speech segment was spoken by one specified person, which makes it a "one-to-one" yes/no decision. Different applications call for different techniques: identification may be used to narrow the pool of suspects in a criminal investigation, while verification is required for bank transactions. In both cases, the speaker's voiceprint must first be modeled, which is the "training" or "learning" process.
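Once voiceprints can be compared, the two tasks differ only in the decision made on top of the similarity scores. The following is a minimal sketch, assuming a similarity score between the query clip and each enrolled speaker has already been computed. The speaker names, scores, and the 0.7 threshold are hypothetical and are not values taken from the SDK.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Speaker identification ("multiple-choice"): return the enrolled
    speaker whose voiceprint is most similar to the query."""
    if not scores:
        raise ValueError("no enrolled speakers to choose from")
    return max(scores, key=scores.get)


def verify(score: float, threshold: float) -> bool:
    """Speaker verification ("one-to-one"): accept the claimed identity
    only if the similarity to that single speaker reaches the threshold."""
    return score >= threshold


# Hypothetical scores between one query clip and three enrolled speakers.
scores = {"alice": 0.91, "bob": 0.02, "carol": 0.35}
print(identify(scores))            # alice
print(verify(scores["bob"], 0.7))  # False (hypothetical threshold)
```

In practice the verification threshold has to be tuned on held-out data for the target trade-off between false accepts and false rejects.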
The SDK implements voiceprint recognition with a PaddlePaddle-based model trained on a Chinese speech corpus containing more than 1,130,000 utterances from 3,242 speakers.
### SDK features
- Voiceprint feature vector extraction
- Voiceprint similarity calculation
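The README does not say which similarity measure the SDK uses. Cosine similarity is a common choice for comparing speaker embeddings, so the sketch below assumes it. It compares two feature vectors by the angle between them, ignoring their lengths. The toy 2-dimensional vectors stand in for the SDK's 512-dimensional features.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two voiceprint feature vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Because the vectors are normalized inside the function, a clip recorded at a different volume yields the same score as long as the embedding direction is unchanged.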
### Running Example - VoiceprintExample
After a successful run, the console should show output similar to the following:
```text
...
# Audio files a_1.wav and a_2.wav are from the same person
[INFO ] - input audio: src/test/resources/a_1.wav
[INFO ] - input audio: src/test/resources/a_2.wav
[INFO ] - input audio: src/test/resources/b_1.wav
# Voiceprint 512-dimensional feature vector
[INFO ] - a_1.wav feature: [-0.24602059, 0.20456463, -0.306607, ..., 0.016211584, 0.108457334]
[INFO ] - a_2.wav feature: [-0.115257666, 0.18287876, -0.45560476, ..., 0.15607461, 0.12677354]
[INFO ] - b_1.wav feature: [-0.009925389, -0.02331138, 0.18817122, ..., 0.058160514, -0.041663148]
# Similarity calculation
[INFO ] - a_1.wav, a_2.wav similarity: 0.9165065
[INFO ] - a_1.wav, b_1.wav similarity: 0.024052326
```
### Open-source algorithm
#### 1. Open-source algorithm used by the SDK
- [VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)
#### 2. How to export the model?
- [how_to_create_paddlepaddle_model](http://docs.djl.ai/docs/paddlepaddle/how_to_create_paddlepaddle_model_zh.html)
- Export the model:
  - `export_model.py`
```python
import argparse
import ast
import functools
import os

import paddle
from paddle.metric import accuracy
from paddle.static import InputSpec

from utils.resnet import resnet34
from utils.metrics import ArcNet
from utils.utility import add_arguments, print_arguments

parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
add_arg('gpus', str, '0', 'GPU IDs used for training, comma-separated, e.g. 0,1')
add_arg('batch_size', int, 32, 'Batch size for training')
add_arg('num_workers', int, 4, 'Number of data-loading workers')
add_arg('num_epoch', int, 50, 'Number of training epochs')
add_arg('num_classes', int, 3242, 'Number of classes (speakers)')
add_arg('learning_rate', float, 1e-3, 'Initial learning rate')
add_arg('input_shape', str, '(None, 1, 257, 257)', 'Model input shape')
add_arg('train_list_path', str, 'dataset/train_list.txt', 'Path to the training data list')
add_arg('test_list_path', str, 'dataset/test_list.txt', 'Path to the test data list')
add_arg('save_model', str, 'models/', 'Directory in which to save the model')
add_arg('resume', str, None, 'Checkpoint directory to resume from; if None, no checkpoint is restored')
add_arg('pretrained_model', str, None, 'Path to a pre-trained model; if None, no pre-trained model is used')
args = parser.parse_args()


# Evaluate the model: mean top-1 accuracy over the test set
@paddle.no_grad()
def test(model, metric_fc, test_loader):
    model.eval()
    accuracies = []
    for spec_mag, label in test_loader():
        feature = model(spec_mag)
        output = metric_fc(feature, label)
        label = paddle.reshape(label, shape=(-1, 1))
        acc = accuracy(input=output, label=label)
        accuracies.append(acc.item())
    model.train()
    return float(sum(accuracies) / len(accuracies))


# Save the inference (prediction) model as a static graph
def save_model(args, model):
    # literal_eval parses '(None, 1, 257, 257)' without executing arbitrary code
    input_shape = ast.literal_eval(args.input_shape)
    infer_dir = os.path.join(args.save_model, 'infer')
    os.makedirs(infer_dir, exist_ok=True)
    paddle.jit.save(layer=model,
                    path=os.path.join(infer_dir, 'model'),
                    input_spec=[InputSpec(shape=list(input_shape), dtype='float32')])


if __name__ == '__main__':
    print_arguments(args)
    model = resnet34()
    # Load trained weights before exporting. Assumption: the directory passed
    # via --resume contains model.pdparams as written by the training script;
    # adjust the file name if your checkpoint differs.
    if args.resume is not None:
        model.set_state_dict(paddle.load(os.path.join(args.resume, 'model.pdparams')))
    model.eval()
    save_model(args, model)
```
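The export script imports `ArcNet` from `utils.metrics`, and `test()` feeds the embedding and the label through it, but only the ResNet embedding network is passed to `paddle.jit.save`. The README does not describe `ArcNet`. The sketch below assumes it is an ArcFace-style additive angular margin head used only during training over the 3,242 speaker classes. It computes the scaled logit for a single class. The scale `s=64.0` and margin `m=0.5` are hypothetical defaults, not values confirmed from the repository.

```python
import math


def arcface_logit(cos_theta: float, is_target: bool,
                  s: float = 64.0, m: float = 0.5) -> float:
    """Scaled logit for one class under an additive angular margin.

    cos_theta is the cosine between the normalized embedding and the
    normalized class weight vector.
    """
    # Clamp to the valid domain of acos to guard against rounding error.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    if is_target:
        # Widen the angle to the true class by m radians, which lowers its
        # logit and forces the network to pull same-speaker embeddings closer.
        theta = math.acos(cos_theta)
        return s * math.cos(theta + m)
    # Non-target classes keep the plain scaled cosine.
    return s * cos_theta


# The margin lowers the true-class logit compared with no margin.
print(arcface_logit(0.8, True) < arcface_logit(0.8, False))  # True
```

This naive form omits the safeguard many implementations add for the case where `theta + m` exceeds pi. Because the head is discarded at export time, inference uses only the embeddings, which are then compared with a similarity measure.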