AI accelerator suite: a collection of more than 100 projects, including SDKs, platform engines, and scenario kits - CSDN Library

848 files in total

java: 334

md: 136

xml: 102

118 views, uploaded 2024-01-12 11:04:23, 43.76 MB ZIP

Archive contents: AI accelerator suite, a collection of more than 100 projects, including SDKs, platform engines, and scenario kits (848 files)

gradlew.bat 2KB

mvnw.cmd 6KB

.env.development 93B

sougou.dict 983KB

user.dict 85B

libwebrtcvadwrapper.dll 30KB

libfvad.dll 11KB

.editorconfig 243B

.eslintignore 34B

.gitignore 439B

build.gradle 2KB

build.gradle 1KB

settings.gradle 39B

gradlew 5KB

index.html 620B

favicon.ico 17KB

voiceprint_sdk.iml 26KB

asr_sdk.iml 25KB

ocr-sdk.iml 23KB

sv2tts_waveglow_sdk.iml 22KB

tacotron2_sdk.iml 22KB

camera_facemask_sdk.iml 22KB

rtsp_facemask_sdk.iml 22KB

mp4_facemask_sdk.iml 22KB

camera-facemask-sdk.iml 22KB

rtsp-facemask-sdk.iml 22KB

mp4-facemask-sdk.iml 22KB

sv2tts_speakencoder_sdk.iml 22KB

tacotron_stft_sdk.iml 21KB

first_order_sdk.iml 20KB

platform-train.iml 14KB

flink_sentence_encoder_sdk.iml 8KB

word_encoder_cn_sdk.iml 6KB

sentence-encoder-sdk.iml 6KB

npy_npz_sdk.iml 4KB

dishes_sdk.iml 4KB

senta_textcnn_sdk.iml 4KB

pedestrian_sdk.iml 4KB

vehicle_sdk.iml 4KB

kafka_sentiment_analysis_sdk.iml 4KB

animal_sdk.iml 4KB

depth_estimation_sdk.iml 4KB

semantic_simnet_bow_sdk.iml 3KB

translation_zh_en_sdk.iml 3KB

porn_detection_sdk.iml 3KB

senta_bilstm_sdk.iml 3KB

lac_sdk.iml 3KB

fasttext_sdk.iml 3KB

mask_sdk.iml 3KB

crowd_sdk.iml 3KB

sentence_encoder_en_sdk.iml 3KB

instance_segmentation_sdk.iml 3KB

sentiment_analysis_sdk.iml 3KB

reflective_vest_sdk.iml 3KB

fire_smoke_sdk.iml 3KB

porn-detection-sdk.iml 3KB

translation-zh-en-sdk.iml 3KB

senta-bilstm-sdk.iml 3KB

lac-sdk.iml 3KB

semantic-simnet-bow-sdk.iml 3KB

depth-estimation-sdk.iml 3KB

animal-sdk.iml 3KB

dish-sdk.iml 3KB

smart_construction_sdk.iml 3KB

vehicle-sdk.iml 3KB

sentiment-analysis-sdk.iml 3KB

instance-segmentation-sdk.iml 3KB

reflective-vest-sdk.iml 3KB

ndarray_lessons.iml 3KB

sentencepiece_sdk.iml 3KB

ph_sdk.iml 2KB

librosa_sdk.iml 2KB

jieba_sdk.iml 2KB

jieba_lib.iml 2KB

test.iml 475B

main.iml 424B

pedestrian-sdk.iml 190B

biggan-sdk.iml 190B

jlibrosa-1.1.8-SNAPSHOT.jar 2.41MB

jieba-lib-0.1.0.jar 2.09MB

live2d_android.jar 101KB

gradle-wrapper.jar 53KB

jitsi-webrtcvadwrapper-1.0-SNAPSHOT.jar 51KB

maven-wrapper.jar 47KB

aias-fire-smoke-lib-0.1.0.jar 24KB

aias-ph-lib-0.1.0.jar 23KB

aias-sv2tts-speakencoder-lib-0.1.0.jar 15KB

aias-tacotron-lib-0.1.0.jar 6KB

JiebaSegmenterTest.java 53KB

CameraConnectionFragment.java 23KB

OCRDetectionTranslator.java 19KB

PhonemeUtils.java 15KB

PhonemeUtils.java 14KB

FileUtil.java 12KB

### Download the model and place it in the models directory
- Link: https://github.com/mymagicpower/AIAS/releases/download/apps/voiceprint.zip

### Voiceprint Recognition
A voiceprint is the sound-wave spectrum, as displayed by an electroacoustic instrument, that carries speech information. Producing speech is a complex physiological and physical process involving the brain's language center and the vocal organs. The size and shape of the organs used in speaking (tongue, teeth, larynx, lungs, nasal cavity) vary greatly from person to person, so the voiceprint spectra of any two people differ.

Voiceprint recognition (VPR), also known as speaker recognition, covers two tasks: speaker identification and speaker verification. Identification determines which of several people spoke a given speech segment, a "multiple-choice" problem. Verification confirms whether a given speech segment was spoken by a specified person, a "one-to-one discrimination" problem. Different tasks and applications call for different techniques: identification may be needed to narrow the scope of a criminal investigation, while verification is required for bank transactions. In both cases the speaker's voiceprint must first be modeled, which is the "training" or "learning" process.

The SDK implements a voiceprint recognition model based on PaddlePaddle. The model uses a Chinese speech corpus dataset containing voice data from 3,242 people and more than 1,130,000 speech samples.

### SDK contains functions
- Voiceprint feature vector extraction
- Voiceprint similarity calculation

### Running Example - VoiceprintExample
After a successful run, the command line should show the following output:
```text
...
# Audio files a_1.wav and a_2.wav are from the same person
[INFO ] - input audio: src/test/resources/a_1.wav
[INFO ] - input audio: src/test/resources/a_2.wav
[INFO ] - input audio: src/test/resources/b_1.wav

# Voiceprint 512-dimensional feature vector
[INFO ] - a_1.wav feature: [-0.24602059, 0.20456463, -0.306607, ..., 0.016211584, 0.108457334]
[INFO ] - a_2.wav feature: [-0.115257666, 0.18287876, -0.45560476, ..., 0.15607461, 0.12677354]
[INFO ] - b_1.wav feature: [-0.009925389, -0.02331138, 0.18817122, ..., 0.058160514, -0.041663148]

# Similarity calculation
[INFO ] - a_1.wav, a_2.wav similarity: 0.9165065
[INFO ] - a_1.wav, b_1.wav similarity: 0.024052326
```

### Open source algorithm
#### 1. Open source algorithm used by the SDK
- [VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)

#### 2. How to export the model?
- [how_to_create_paddlepaddle_model](http://docs.djl.ai/docs/paddlepaddle/how_to_create_paddlepaddle_model_zh.html)
- Export model
- export_model.py
```python
import argparse
import ast
import functools
import os
import shutil
import time
from datetime import datetime, timedelta

import paddle
import paddle.distributed as dist
from paddle.io import DataLoader
from paddle.metric import accuracy
from paddle.static import InputSpec
from visualdl import LogWriter

from utils.resnet import resnet34
from utils.metrics import ArcNet
from utils.reader import CustomDataset
from utils.utility import add_arguments, print_arguments

parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
add_arg('gpus',             str,   '0',                      'GPU number used for training, separated by English commas, such as: 0,1')
add_arg('batch_size',       int,   32,                       'Batch size for training')
add_arg('num_workers',      int,   4,                        'Number of threads for reading data')
add_arg('num_epoch',        int,   50,                       'Number of training rounds')
add_arg('num_classes',      int,   3242,                     'Number of classification categories')
add_arg('learning_rate',    float, 1e-3,                     'Size of the initial learning rate')
add_arg('input_shape',      str,   '(None, 1, 257, 257)',    'Data input shape')
add_arg('train_list_path',  str,   'dataset/train_list.txt', 'Data list path for training data')
add_arg('test_list_path',   str,   'dataset/test_list.txt',  'Data list path for test data')
add_arg('save_model',       str,   'models/',                'Path to save the model')
add_arg('resume',           str,   None,                     'Resume training, if None, no restored model is used')
add_arg('pretrained_model', str,   None,                     'Path to the pre-trained model, if None, no pre-trained model is used')
args = parser.parse_args()


# Evaluate the model
@paddle.no_grad()
def test(model, metric_fc, test_loader):
    model.eval()
    accuracies = []
    for batch_id, (spec_mag, label) in enumerate(test_loader()):
        feature = model(spec_mag)
        output = metric_fc(feature, label)
        label = paddle.reshape(label, shape=(-1, 1))
        acc = accuracy(input=output, label=label)
        accuracies.append(acc.numpy()[0])
    model.train()
    return float(sum(accuracies) / len(accuracies))


# Save the model
def save_model(args, model):
    # Parse the shape string, e.g. '(None, 1, 257, 257)', without using eval()
    input_shape = ast.literal_eval(args.input_shape)
    # Save the prediction (inference) model
    infer_dir = os.path.join(args.save_model, 'infer')
    os.makedirs(infer_dir, exist_ok=True)
    paddle.jit.save(layer=model,
                    path=os.path.join(infer_dir, 'model'),
                    input_spec=[InputSpec(shape=list(input_shape), dtype='float32')])


if __name__ == '__main__':
    # The original snippet called save_model(args) without a model.
    # Assumption: resnet34() takes no required arguments and the checkpoint
    # file is named model.pdparams; adjust both to match your training setup.
    model = resnet34()
    if args.resume is not None:
        model.set_state_dict(paddle.load(os.path.join(args.resume, 'model.pdparams')))
    save_model(args, model)
```
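The similarity values in the running example, and the two tasks described earlier (verification and identification), can be sketched in a few lines of Python. This is a minimal sketch, not the SDK's implementation: it assumes the similarity is a cosine similarity between feature vectors (the README does not name the measure), and the function names and the 0.7 threshold are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def verify(probe: Sequence[float], enrolled: Sequence[float],
           threshold: float = 0.7) -> bool:
    """Speaker verification: one-to-one check against an enrolled voiceprint.

    The default threshold is a hypothetical value; tune it on your own data.
    """
    return cosine_similarity(probe, enrolled) >= threshold


def identify(probe: Sequence[float],
             gallery: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Speaker identification: pick the best-matching enrolled speaker."""
    if not gallery:
        raise ValueError("gallery must contain at least one speaker")
    scores = {name: cosine_similarity(probe, feat) for name, feat in gallery.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

With real data, `probe` and the gallery entries would be the 512-dimensional vectors produced by the feature extraction step. Verification compares a score against a fixed threshold, while identification ranks all enrolled speakers and returns the top match.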
