Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification

Li Zhang¹, Qing Wang¹, Kong Aik Lee², Lei Xie¹*, Haizhou Li³†

¹Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi'an, China
²Institute for Infocomm Research, A*STAR, Singapore
³Department of Electrical and Computer Engineering, National University of Singapore, Singapore

lizhang.aslp.npu@gmail.com, lxie@nwpu.edu.cn
Abstract
In far-field speaker verification, the performance of speaker embeddings is susceptible to degradation when there is a mismatch between the conditions of enrollment and test speech. To solve this problem, we propose feature-level and instance-level transfer learning in the teacher-student framework to learn a domain-invariant embedding space. For feature-level knowledge transfer, we develop a contrastive loss to transfer knowledge from the teacher model to the student model, which not only decreases the intra-class distance but also enlarges the inter-class distance. Moreover, we propose an instance-level pairwise distance transfer method that forces the student model to preserve the pairwise instance distances of the well-optimized embedding space of the teacher model. On the FFSVC 2020 evaluation set, our EER on the Full-eval trials is relatively reduced by 13.9% compared with the fusion-system result on the Partial-eval trials of Task 2. On Task 1, compared with the winner's DenseNet result on the Partial-eval trials, our minDCF on the Full-eval trials is relatively reduced by 6.3%. On Task 3, the EER and minDCF of our proposed method on the Full-eval trials are very close to those of the fusion system on the Partial-eval trials. Our results also outperform other competitive domain adaptation methods.
Index Terms: far-field speaker verification, teacher-student,
domain-invariant, transfer learning
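The relative reductions quoted in the abstract follow the standard (baseline − ours) / baseline convention. A minimal sketch, with hypothetical values chosen only to illustrate the arithmetic (the absolute EERs are not given here):

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Relative reduction of an error metric such as EER or minDCF."""
    return (baseline - ours) / baseline

# Hypothetical values for illustration only (not from the paper):
baseline_eer, our_eer = 5.0, 4.305
print(round(100 * relative_reduction(baseline_eer, our_eer), 1))  # prints 13.9
```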
1. Introduction
Speaker verification (SV) is the task of deciding whether to accept or reject a test utterance according to the enrollment utterances [1]. In recent years, most deep-learning-based speaker verification methods have achieved superior recognition performance under controlled conditions, i.e., close-talking scenarios with little interference and little mismatch. However, their performance drops significantly when speech is collected in the wild, such as in far-field noisy scenarios or when a mismatch exists. In far-field scenarios, it is common for users to enroll their voice via close-talking mobile phones and then authenticate in a complex far-field daily home environment. Hence, most SV systems in smart speakers and various voice-enabled IoT gadgets need to deal with the domain mismatch between enrollment and test utterances.
In recent studies, the solutions to domain adaptation in SV tasks can be divided into three categories. The first is data augmentation, which lets the SV model 'see' more acoustic environment variation and thereby obtain more robust
speaker embeddings.

* Corresponding author.
† This work was supported by the Science and Engineering Research Council, Agency for Science, Technology and Research, Singapore, through the National Robotics Program under Grant No. 192 25 00054.

In the far-field speaker verification challenge
2020 (FFSVC 2020) [2], many systems [2, 3, 4, 5] considered data augmentation as a solution for domain adaptation to improve system performance. The second method is to apply adversarial learning to make the distributions of the source and target domains more similar [6, 7, 8, 9, 10, 11]. The third method is to adopt a teacher-student (T/S) model for knowledge transfer learning [12, 13, 14, 15]. The T/S model was first introduced to reduce model size by distilling knowledge from a well-trained large teacher model into a small student model [16]. Moreover, the T/S model can deal with domain mismatch by transferring accurate knowledge from the teacher model to the student model [17], which makes the student model robust in different mismatch scenarios [18]. Besides knowledge transfer with the Kullback-Leibler (KL) divergence in the T/S model, minimizing the distance between corresponding embeddings extracted from the teacher and student models can also decrease the mismatch between the teacher and student [19]. Liang et al. [20] and Chen et al. [21] proposed invariant representation learning with cosine-based consistency embedding training. Jung et al. [22] proposed a cosine-based T/S method to improve short-utterance verification performance with the help of long utterances.
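The KL-based and cosine-based T/S objectives referenced above can be sketched as follows. This is an illustrative NumPy sketch under our own function names, not the code of any cited system:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over class logits."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_distill_loss(teacher_logits, student_logits):
    """KL(teacher || student) averaged over the batch, as in T/S transfer:
    the student's posteriors are pulled toward the teacher's soft labels."""
    p = softmax(teacher_logits)   # teacher posteriors (soft labels)
    q = softmax(student_logits)   # student posteriors
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

def cosine_consistency_loss(teacher_emb, student_emb):
    """1 - cosine similarity between paired teacher/student embeddings,
    as used for embedding-layer consistency training."""
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=-1, keepdims=True)
    s = student_emb / np.linalg.norm(student_emb, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(t * s, axis=-1)))
```

Both losses reach zero exactly when the student reproduces the teacher's outputs, which is the sense in which they transfer knowledge rather than hard labels.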
However, all previous T/S methods in speech processing considered only classification-accuracy guidance from the teacher model and embedding-layer mapping between the teacher and student models. The embedding-layer mapping aims to reduce the distance between same-class embeddings extracted from the teacher and student models, but ignores enlarging the distance between different classes, which is vital as well. Moreover, the methods mentioned above mainly focus on the mismatch between the training and test sets, whereas we deal with the mismatch between enrollment and test utterances, which is extremely common in far-field speaker verification.
In this paper, we propose multi-level transfer learning from near-field to far-field to solve the mismatch between enrollment and test utterances. In the proposed method, we make good use of the domain-invariant knowledge from close-talking data to guide our student model as it learns from far-field data. Inspired by the contrastive loss in self-supervised learning [23, 24, 25], we develop a contrastive loss to increase the distance between different classes in the T/S model. Besides the feature-level knowledge transfer in the embedding layer, we propose an instance-level pairwise distance transfer method that forces the student model to preserve pairwise instance distances calculated from the well-optimized embedding space of the teacher model. Experimental results with the proposed method on the FFSVC 2020 evaluation trials show that our method achieves significant improvements over several competitive methods.
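The two transfer objectives above can be sketched as follows. This is an illustrative NumPy sketch with our own function names, using an InfoNCE-style form for the contrastive term; it is not the paper's exact formulation:

```python
import numpy as np

def contrastive_ts_loss(student_emb, teacher_emb, labels, tau=0.1):
    """Feature-level transfer: each student embedding is pulled toward
    teacher embeddings of the same speaker and pushed away from teacher
    embeddings of other speakers (InfoNCE-style sketch)."""
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    sim = s @ t.T / tau                          # (B, B) scaled similarities
    logsumexp = np.log(np.exp(sim).sum(axis=1))  # per-row normalizer
    losses = []
    for i in range(len(labels)):
        pos = np.where(labels == labels[i])[0]   # same-speaker teacher anchors
        losses.append(np.mean(logsumexp[i] - sim[i, pos]))
    return float(np.mean(losses))

def pairwise_distance_transfer(student_emb, teacher_emb):
    """Instance-level transfer: match the student's pairwise distance
    matrix to the teacher's (MSE between the two matrices)."""
    def pdist(x):
        diff = x[:, None, :] - x[None, :, :]
        return np.sqrt((diff ** 2).sum(-1))
    d_t, d_s = pdist(teacher_emb), pdist(student_emb)
    return float(np.mean((d_s - d_t) ** 2))
```

The contrastive term handles both directions at once (small intra-class distance, large inter-class distance), while the pairwise term preserves the geometry of the teacher's embedding space rather than any single embedding.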
arXiv:2106.09320v1 [cs.SD] 17 Jun 2021