A PAIRWISE ALGORITHM FOR PITCH ESTIMATION AND SPEECH SEPARATION USING
DEEP STACKING NETWORK
Hui Zhang¹, Xueliang Zhang¹, Shuai Nie², Guanglai Gao¹, Wenju Liu²
¹ Computer Science Department, Inner Mongolia University, Hohhot, China, 010021
² National Laboratory of Pattern Recognition (NLPR), Institute of Automation, University of Chinese
Academy of Sciences, Beijing, China, 100190
alzhu.san@163.com, cszxl@imu.edu.cn, nss90221@gmail.com, csggl@imu.edu.cn, lwj@nlpr.ia.ac.cn
ABSTRACT
Pitch information is an important cue for speech separation.
However, pitch estimation in noisy conditions is as challenging
as speech separation itself. In this paper, we propose a
supervised learning architecture that tackles these two problems
together in a concise way. The proposed algorithm is based on the
deep stacking network (DSN), which builds a deep architecture by
stacking simple processing modules. In the training stage, the
ideal binary mask is used as the target. The input vector of each
module comprises the outputs of the lower module and frame-level
features, which consist of spectral and pitch-based features. In
the testing stage, each module provides an estimated binary mask,
which is used to re-estimate the pitch. The pitch-based features
are then updated and passed to the next module. This procedure is
applied iteratively within the DSN, and the final separation
result is taken from the last module. Systematic evaluations show
that the proposed approach produces high-quality estimated binary
masks and outperforms recent systems in generalization.
Index Terms— Speech separation, Pitch estimation,
Computational auditory scene analysis, Supervised learning
1. INTRODUCTION
In realistic environments, noise usually degrades speech
intelligibility for hearing-impaired listeners and the performance
of automatic speech recognition (ASR) systems. Speech
separation aims to remove noise by separating the target speech
from background interference. It is helpful for both hearing-aid
wearers and ASR systems [1, 2]. Computational auditory
scene analysis (CASA) is a promising approach to the
speech separation problem [3].
This research was supported in part by the China National
Natural Science Foundation (No.61365006, No.61263037, No.61305027,
No.91120303, No.61273267, No.61403370, and No.90820011).
CASA defines the goal of speech separation as computing
an ideal binary mask (IBM) [4], which is useful for
improving speech intelligibility [5] and the performance
of speech/speaker recognition [6, 7]. The IBM is a
time-frequency (T-F) mask, which can be computed from the premixed
target and interference. Specifically, in a T-F unit, if the
signal-to-noise ratio (SNR) is greater than a local SNR
criterion (LC), the corresponding mask element in the IBM
is set to 1 (target-dominant). Otherwise, the mask element is
set to 0 (interference-dominant).
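The IBM rule above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the inputs are taken to be per-unit energies of the premixed target and interference on some T-F grid (the T-F decomposition itself is not specified here), and the function name `ideal_binary_mask`, the `lc_db` parameter, and the small `eps` guard are hypothetical choices for this sketch, not names from the paper.

```python
import numpy as np

def ideal_binary_mask(target_energy, interference_energy, lc_db=0.0, eps=1e-12):
    """Return a 0/1 mask that is 1 where the local SNR (in dB) exceeds lc_db.

    Both inputs are arrays of T-F unit energies with the same shape
    (assumed layout: channels x frames). eps only guards against log(0).
    """
    target_energy = np.asarray(target_energy, dtype=float)
    interference_energy = np.asarray(interference_energy, dtype=float)
    snr_db = 10.0 * np.log10((target_energy + eps) / (interference_energy + eps))
    # Strictly greater than LC -> target-dominant (1), otherwise 0.
    return (snr_db > lc_db).astype(np.uint8)

# Toy energies: 2 frequency channels x 3 time frames (made-up numbers).
target = np.array([[4.0, 1.0, 0.0],
                   [0.5, 9.0, 2.0]])
noise = np.array([[1.0, 1.0, 1.0],
                  [2.0, 1.0, 2.0]])

mask_0db = ideal_binary_mask(target, noise, lc_db=0.0)
mask_m5db = ideal_binary_mask(target, noise, lc_db=-5.0)
print(mask_0db)
print(mask_m5db)
```

Lowering the LC (here from 0 dB to -5 dB) labels more units as target-dominant, which is why the criterion has to be fixed before masks are compared.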
When the IBM is adopted as the computational goal of
CASA, speech separation can naturally be formulated
as a binary classification problem [5]. From the viewpoint
of classification, feature selection is important. Many
features have been investigated, including
pitch-based features [8], amplitude modulation spectrum
(AMS) [9], relative spectral transform and perceptual linear
prediction (RASTA-PLP), Mel-frequency cepstral coefficients
(MFCC), and Gammatone frequency cepstral coefficients
(GFCC) [10]. Wang et al. [10] suggest that pitch-based
features generalize well in speech separation.
Pitch-based features are derived from pitch. However,
extracting pitch from noisy speech is itself a difficult task,
especially in low-SNR conditions. Generally speaking, on the one
hand, if the target voice is separated from the background,
the pitch can be obtained easily. On the other hand, speech
separation performance improves if the pitch estimate is
accurate. Since the two tasks can benefit from each other,
speech separation and pitch extraction in noisy conditions are
considered to be a “chicken-and-egg” problem.
In this paper, we propose a supervised learning system to
deal with this “chicken-and-egg” problem more concisely.
• Pitch extraction and speech separation are boosted
alternately. (Section 2.1)
• Frame-level features are adopted, which consist of
spectral features and pitch-based features. (Section
2.2)
• We use a deep stacking network (DSN) to implement our
idea of working on the two problems (pitch extraction
and speech separation) alternately. (Section 2.3)
• Systematic evaluations show that the proposed approach
produces high-quality estimated binary masks and
outperforms recent systems in unmatched noisy
conditions. (Section 3)
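The alternating scheme summarized in the list above might be organized roughly as in the following sketch. It shows only the control flow suggested by the abstract (each module sees the frame-level features plus the lower module's output, emits a binary mask, and the pitch-based features are refreshed before the next module). Everything concrete here is an assumption made for illustration: the function `separate_with_stacked_modules`, the 0.5 threshold, the random toy modules, and the `toy_update` stand-in for "re-estimate pitch from the mask and recompute pitch-based features" are all hypothetical and are not the authors' implementation.

```python
import numpy as np

def separate_with_stacked_modules(spectral, pitch_feats, modules, update_pitch_feats):
    """Alternate mask estimation and pitch-feature updates across modules.

    spectral:           (frames, d_spec) frame-level spectral features
    pitch_feats:        (frames, d_pitch) initial pitch-based features
    modules:            list of callables f(inputs) -> (frames, channels) scores in [0, 1]
    update_pitch_feats: callable g(mask) -> (frames, d_pitch); hypothetical stand-in
                        for pitch re-estimation followed by feature extraction
    """
    prev_out = None
    mask = None
    for module in modules:
        parts = [spectral, pitch_feats]
        if prev_out is not None:
            parts.append(prev_out)                 # output of the lower module
        inputs = np.concatenate(parts, axis=1)
        prev_out = module(inputs)
        mask = (prev_out > 0.5).astype(np.uint8)   # estimated binary mask (assumed threshold)
        pitch_feats = update_pitch_feats(mask)     # refreshed features for the next module
    return mask                                    # result of the last module

# Toy setup with made-up sizes and random weights, for shape checking only.
rng = np.random.default_rng(0)
frames, d_spec, d_pitch, channels = 5, 4, 2, 3

def make_toy_module(in_dim):
    w = rng.standard_normal((in_dim, channels))
    return lambda x: 1.0 / (1.0 + np.exp(-(x @ w)))  # sigmoid of a linear map

modules = [make_toy_module(d_spec + d_pitch)]
modules += [make_toy_module(d_spec + d_pitch + channels) for _ in range(2)]

def toy_update(mask):
    # Placeholder feature: fraction of target-dominant channels per frame.
    return np.repeat(mask.mean(axis=1, keepdims=True), d_pitch, axis=1)

final_mask = separate_with_stacked_modules(
    rng.standard_normal((frames, d_spec)),
    np.zeros((frames, d_pitch)),
    modules,
    toy_update,
)
print(final_mask)
```

The first module has no lower output to consume, so its input is narrower than that of the later modules; how the paper handles this case is not stated in this section.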
978-1-4673-6997-8/15/$31.00 ©2015 IEEE    ICASSP 2015