Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification

Li Zhang¹, Qing Wang¹, Kong Aik Lee², Lei Xie¹*, Haizhou Li³†

¹Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi'an, China
²Institute for Infocomm Research, A*STAR, Singapore
³Department of Electrical and Computer Engineering, National University of Singapore, Singapore

lizhang.aslp.npu@gmail.com, lxie@nwpu.edu.cn
Abstract
In far-field speaker verification, the performance of speaker embeddings is susceptible to degradation when there is a mismatch between the conditions of enrollment and test speech. To solve this problem, we propose feature-level and instance-level transfer learning in the teacher-student framework to learn a domain-invariant embedding space. For feature-level knowledge transfer, we develop a contrastive loss to transfer knowledge from the teacher model to the student model, which not only decreases the intra-class distance but also enlarges the inter-class distance. Moreover, we propose an instance-level pairwise distance transfer method that forces the student model to preserve the pairwise instance distances of the well-optimized embedding space of the teacher model. On the FFSVC 2020 evaluation set, our EER on the Full-eval trials is relatively reduced by 13.9% compared with the fusion-system result on the Partial-eval trials of Task 2. On Task 1, compared with the winner's DenseNet result on the Partial-eval trials, our minDCF on the Full-eval trials is relatively reduced by 6.3%. On Task 3, the EER and minDCF of our proposed method on the Full-eval trials are very close to those of the fusion system on the Partial-eval trials. Our results also outperform other competitive domain adaptation methods.
Index Terms: far-field speaker verification, teacher-student,
domain-invariant, transfer learning
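The relative reductions quoted in the abstract follow the standard (baseline − ours) / baseline convention. A minimal sketch, with hypothetical values chosen only to illustrate the arithmetic (the absolute EERs are not given here):

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Relative reduction of an error metric such as EER or minDCF."""
    return (baseline - ours) / baseline

# Hypothetical values for illustration only (not from the paper):
baseline_eer, our_eer = 5.0, 4.305
print(round(100 * relative_reduction(baseline_eer, our_eer), 1))  # prints 13.9
```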
1. Introduction
Speaker verification (SV) is the task of deciding whether to accept or reject a test utterance according to the enrollment utterances [1]. In recent years, most deep-learning-based speaker verification methods have achieved superior recognition performance under controlled conditions, i.e., close-talking scenarios with little interference and little mismatch. However, their performance drops significantly when speech is collected in the wild, such as in far-field noisy scenarios or when a mismatch exists. In far-field scenarios, it is common for users to enroll their voice via close-talking mobile phones and then authenticate in a complex far-field daily home environment. Hence, most SV systems in smart speakers and various voice-enabled IoT gadgets need to deal with the domain mismatch between enrollment and test utterances.
In recent studies, the solutions to domain adaptation in SV tasks can be divided into three categories. The first is data augmentation, which lets the SV model 'see' more acoustic environment variation and thereby obtain more robust
speaker embeddings.

* Corresponding author.
† This work was supported by the Science and Engineering Research Council, Agency for Science, Technology and Research, Singapore, through the National Robotics Program under Grant No. 192 25 00054.

In the far-field speaker verification challenge
2020 (FFSVC 2020) [2], many systems [2, 3, 4, 5] considered data augmentation as a solution for domain adaptation to improve system performance. The second method is to apply adversarial learning to make the distributions of the source and target domains more similar [6, 7, 8, 9, 10, 11]. The third method is to adopt a teacher-student (T/S) model for knowledge transfer learning [12, 13, 14, 15]. The T/S model was first introduced to reduce model size by distilling knowledge from a well-trained large teacher model into a small student model [16]. Moreover, the T/S model can deal with domain mismatch by transferring accurate knowledge from the teacher model to the student model [17], which makes the student model robust in different mismatch scenarios [18]. Besides knowledge transfer with the Kullback-Leibler (KL) divergence in the T/S model, minimizing the distance between corresponding embeddings extracted from the teacher and student models can also decrease the mismatch between the teacher and student [19]. Liang et al. [20] and Chen et al. [21] proposed invariant representation learning with cosine-based consistency embedding training. Jung et al. [22] proposed a cosine-based T/S method to improve short-utterance verification performance with the help of long utterances.
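The KL-based and cosine-based T/S objectives referenced above can be sketched as follows. This is an illustrative NumPy sketch under our own function names, not the code of any cited system:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over class logits."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_distill_loss(teacher_logits, student_logits):
    """KL(teacher || student) averaged over the batch, as in T/S transfer:
    the student's posteriors are pulled toward the teacher's soft labels."""
    p = softmax(teacher_logits)   # teacher posteriors (soft labels)
    q = softmax(student_logits)   # student posteriors
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

def cosine_consistency_loss(teacher_emb, student_emb):
    """1 - cosine similarity between paired teacher/student embeddings,
    as used for embedding-layer consistency training."""
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=-1, keepdims=True)
    s = student_emb / np.linalg.norm(student_emb, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(t * s, axis=-1)))
```

Both losses reach zero exactly when the student reproduces the teacher's outputs, which is the sense in which they transfer knowledge rather than hard labels.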
However, all previous T/S methods in speech processing considered only classification-accuracy guidance from the teacher model and embedding-layer mapping between the teacher and student models. The embedding-layer mapping aims to reduce the distance between same-class embeddings extracted from the teacher and student models, but ignores enlarging the distance between different classes, which is vital as well. Moreover, the methods mentioned above mainly focus on the mismatch between the training and test sets, whereas we deal with the mismatch between enrollment and test utterances, which is extremely common in far-field speaker verification.
In this paper, we propose multi-level transfer learning from near-field to far-field to solve the mismatch between enrollment and test utterances. In the proposed method, we make good use of the domain-invariant knowledge from close-talking data to guide our student model as it learns from far-field data. Inspired by the contrastive loss in self-supervised learning [23, 24, 25], we develop a contrastive loss to increase the distance between different classes in the T/S model. Besides the feature-level knowledge transfer in the embedding layer, we propose an instance-level pairwise distance transfer method that forces the student model to preserve pairwise instance distances calculated from the well-optimized embedding space of the teacher model. Experimental results with the proposed method on the FFSVC 2020 evaluation trials show that our method achieves significant improvements over several competitive methods.
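The two transfer objectives above can be sketched as follows. This is an illustrative NumPy sketch with our own function names, using an InfoNCE-style form for the contrastive term; it is not the paper's exact formulation:

```python
import numpy as np

def contrastive_ts_loss(student_emb, teacher_emb, labels, tau=0.1):
    """Feature-level transfer: each student embedding is pulled toward
    teacher embeddings of the same speaker and pushed away from teacher
    embeddings of other speakers (InfoNCE-style sketch)."""
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    sim = s @ t.T / tau                          # (B, B) scaled similarities
    logsumexp = np.log(np.exp(sim).sum(axis=1))  # per-row normalizer
    losses = []
    for i in range(len(labels)):
        pos = np.where(labels == labels[i])[0]   # same-speaker teacher anchors
        losses.append(np.mean(logsumexp[i] - sim[i, pos]))
    return float(np.mean(losses))

def pairwise_distance_transfer(student_emb, teacher_emb):
    """Instance-level transfer: match the student's pairwise distance
    matrix to the teacher's (MSE between the two matrices)."""
    def pdist(x):
        diff = x[:, None, :] - x[None, :, :]
        return np.sqrt((diff ** 2).sum(-1))
    d_t, d_s = pdist(teacher_emb), pdist(student_emb)
    return float(np.mean((d_s - d_t) ** 2))
```

The contrastive term handles both directions at once (small intra-class distance, large inter-class distance), while the pairwise term preserves the geometry of the teacher's embedding space rather than any single embedding.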
arXiv:2106.09320v1 [cs.SD] 17 Jun 2021