
A PAIRWISE ALGORITHM FOR PITCH ESTIMATION AND SPEECH SEPARATION USING DEEP STACKING NETWORK

Hui Zhang, Xueliang Zhang, Shuai Nie, Guanglai Gao, Wenju Liu

Computer Science Department, Inner Mongolia University, Hohhot, China, 010021

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, University of Chinese Academy of Sciences, Beijing, China, 100190

alzhu.san@163.com, cszxl@imu.edu.cn, nss90221@gmail.com, csggl@imu.edu.cn, lwj@nlpr.ia.ac.cn

ABSTRACT

Pitch information is an important cue for speech separation. However, pitch estimation in noisy conditions is as challenging as speech separation itself. In this paper, we propose a supervised learning architecture that combines these two problems in a concise framework. The proposed algorithm is based on the deep stacking network (DSN), which builds a deep architecture by stacking simple processing modules. In the training stage, the ideal binary mask is used as the target. The input vector of each module includes the outputs of the lower module and frame-level features, which consist of spectral and pitch-based features. In the testing stage, each module produces an estimated binary mask that is used to re-estimate the pitch, and the updated pitch-based features are then passed to the next module. This procedure is embedded iteratively in the DSN, and the final separation result is obtained from the last module. Systematic evaluations show that the proposed approach produces high-quality estimated binary masks and outperforms recent systems in terms of generalization.
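The testing-stage procedure described above can be sketched as a loop over stacked modules in which each module's mask feeds a pitch re-estimate for the module above it. This is a minimal sketch under stated assumptions, not the authors' implementation: the module structure (one sigmoid hidden layer with sigmoid outputs), the 0.5 threshold, the zero vector standing in for the "lower module output" of the bottom module, and the callables `estimate_pitch` and `pitch_features` are all hypothetical, and the weights are random rather than trained on IBM targets.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class StackModule:
    """Hypothetical DSN module: one sigmoid hidden layer and a sigmoid output
    layer giving a soft mask value for every frequency channel of a frame."""

    def __init__(self, n_in, n_hidden, n_channels, rng):
        self.W = 0.1 * rng.standard_normal((n_in, n_hidden))
        self.U = 0.1 * rng.standard_normal((n_hidden, n_channels))

    def forward(self, x):
        return sigmoid(sigmoid(x @ self.W) @ self.U)


def separate(modules, spectral, estimate_pitch, pitch_features, n_channels):
    """Test-stage loop: each module emits a binary mask, the mask is used to
    re-estimate pitch, and refreshed pitch features feed the next module."""
    lower_out = np.zeros((spectral.shape[0], n_channels))  # bottom module: no lower output
    pitch = estimate_pitch(None)  # assumed: initial pitch taken from the unmasked mixture
    mask = None
    for module in modules:
        x = np.concatenate([spectral, pitch_features(pitch), lower_out], axis=1)
        soft = module.forward(x)
        mask = (soft > 0.5).astype(np.int8)  # binary mask estimated by this module
        pitch = estimate_pitch(mask)         # re-estimate pitch guided by the mask
        lower_out = soft                     # becomes the "lower module output" above
    return mask  # the final separation result comes from the last module


# Usage with stand-in data and stub pitch helpers (all hypothetical).
rng = np.random.default_rng(0)
n_frames, n_spec, n_pitch, n_channels = 50, 32, 16, 16
spectral = rng.standard_normal((n_frames, n_spec))
n_in = n_spec + n_pitch + n_channels
modules = [StackModule(n_in, 64, n_channels, rng) for _ in range(3)]


def estimate_pitch(mask):  # stub: constant 150 Hz for every frame
    return np.full(n_frames, 150.0)


def pitch_features(pitch):  # stub: scaled pitch broadcast to n_pitch dimensions
    return np.tile(pitch[:, None] / 400.0, (1, n_pitch))


mask = separate(modules, spectral, estimate_pitch, pitch_features, n_channels)
print(mask.shape)  # (50, 16)
```

Here each module sees only the output of the module directly below it, as the abstract describes; some DSN variants instead concatenate the outputs of all lower modules with the original input.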

Index Terms: Speech separation, Pitch estimation, Computational auditory scene analysis, Supervised learning

1. INTRODUCTION

In realistic environments, noise usually degrades the speech intelligibility of hearing-impaired listeners or the performance of automatic speech recognition (ASR) systems. Speech separation aims to remove noise by separating the target speech from background interference, which benefits both hearing aid wearers and ASR systems [1, 2]. Computational auditory scene analysis (CASA) is a promising approach to the speech separation problem [3].

CASA defines the goal of speech separation as computing an ideal binary mask (IBM) [4], which is useful for improving speech intelligibility [5] and the performance of speech and speaker recognition [6, 7]. The IBM is a time-frequency (T-F) mask that can be computed from the premixed target and interference. Specifically, if the signal-to-noise ratio (SNR) within a T-F unit is greater than a local SNR criterion (LC), the corresponding element of the IBM is set to 1 (target-dominant); otherwise, it is set to 0 (interference-dominant).

This research was supported in part by the National Natural Science Foundation of China (No. 61365006, No. 61263037, No. 61305027, No. 91120303, No. 61273267, No. 61403370, and No. 90820011).
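The IBM definition translates directly into code: IBM(t, f) = 1 if 10 log10(S(t, f) / N(t, f)) > LC, and 0 otherwise, where S and N are the target and interference energies in T-F unit (t, f). This is a minimal sketch that assumes those premixed energies are already available per T-F unit (for example from a gammatone filterbank or an STFT, not shown); LC = 0 dB is an illustrative default, not a value taken from the paper.

```python
import numpy as np


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0, eps=1e-12):
    """Return the IBM: 1 where the local SNR (dB) exceeds LC, else 0.

    target_energy, noise_energy: arrays of shape (frames, channels) holding the
    premixed target and interference energy of each T-F unit.
    """
    local_snr_db = 10.0 * np.log10((target_energy + eps) / (noise_energy + eps))
    return (local_snr_db > lc_db).astype(np.int8)


target = np.array([[4.0, 1.0], [0.5, 2.0]])
noise = np.array([[1.0, 1.0], [1.0, 0.1]])
ibm = ideal_binary_mask(target, noise, lc_db=0.0)
print(ibm.tolist())  # [[1, 0], [0, 1]]
```

The unit with equal target and noise energy (local SNR of exactly 0 dB) is labeled 0, because the criterion requires the SNR to be strictly greater than LC.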

When the IBM is adopted as the computational goal of CASA, speech separation can naturally be formulated as a binary classification problem [5]. From the classification viewpoint, feature selection is important, and many features have been investigated, including pitch-based features [8], the amplitude modulation spectrum (AMS) [9], relative spectral transform and perceptual linear prediction (RASTA-PLP), Mel-frequency cepstral coefficients (MFCC), and Gammatone frequency cepstral coefficients (GFCC) [10]. Wang et al. [10] suggest that pitch-based features generalize well in speech separation.
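To make the classification view concrete, the sketch below labels each T-F unit of one frequency channel with its IBM value and trains a binary classifier on per-unit features. It is a minimal sketch only: it swaps in plain logistic regression for the paper's DSN, and it uses synthetic two-dimensional features in place of real pitch-based or spectral features; all names are hypothetical.

```python
import numpy as np


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Gradient-descent logistic regression for the T-F units of one channel."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # P(unit is target-dominant)
        err = p - labels                               # cross-entropy gradient w.r.t. logits
        w -= lr * features.T @ err / n
        b -= lr * err.mean()
    return w, b


def predict_units(features, w, b):
    """1 = target-dominant, 0 = interference-dominant (logit > 0 means p > 0.5)."""
    return ((features @ w + b) > 0.0).astype(np.int8)


# Synthetic stand-in data: 400 T-F units with 2-D features; y plays the role of IBM labels.
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w, b = train_logistic(X[:300], y[:300])
accuracy = float(np.mean(predict_units(X[300:], w, b) == y[300:]))
print(f"held-out unit accuracy: {accuracy:.2f}")
```

With real data, the quality of the features, rather than the classifier, largely decides how well the estimated mask matches the IBM, which is why feature choice receives so much attention in this line of work.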

Pitch-based features are derived from pitch. However, extracting pitch from noisy speech is also a difficult task, especially in low-SNR conditions. On the one hand, if the target voice is separated from the background, its pitch can be obtained easily. On the other hand, speech separation performance improves when pitch estimation is accurate. Because each task can benefit from the other, speech separation and pitch extraction in noisy conditions are considered a “chicken-and-egg” problem.
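The claim that pitch is easy to obtain once the target is separated can be illustrated with a basic autocorrelation pitch estimator applied to a clean harmonic frame. This is a generic textbook estimator chosen for illustration, not the paper's pitch tracker; the 70 to 400 Hz search range and the 40 ms frame length are assumptions.

```python
import numpy as np


def estimate_pitch_autocorr(frame, fs, fmin=70.0, fmax=400.0):
    """Estimate F0 (Hz) of one frame as the lag of the autocorrelation peak
    inside the plausible pitch-period range [fs/fmax, fs/fmin] samples."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / best_lag


fs = 16000
t = np.arange(int(0.04 * fs)) / fs                                  # one 40 ms frame
clean = sum(np.cos(2 * np.pi * 200 * h * t) for h in range(1, 6))  # 200 Hz harmonic tone
print(estimate_pitch_autocorr(clean, fs))                          # 200.0
```

On a noisy mixture the autocorrelation peak can move to a lag belonging to the interference, or to a multiple of the true period, which is why a separated (masked) signal makes pitch estimation easier.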

In this paper, we propose a supervised learning system that handles this “chicken-and-egg” problem more concisely. Its main points are as follows:

• Pitch extraction and speech separation are boosted alternately. (Section 2.1)
• Frame-level features consisting of spectral and pitch-based features are adopted. (Section 2.2)
• We use a deep stacking network (DSN) to implement the idea of working on the two problems (pitch extraction and speech separation) alternately. (Section 2.3)
• Systematic evaluations show that the proposed approach produces high-quality estimated binary masks and outperforms recent systems in unmatched noisy conditions. (Section 3)
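The frame-level feature vector mentioned in the second point combines a spectral part and a pitch-based part. Section 2.2 defines the actual features; in this minimal sketch, as an assumption, the spectral part is the log energy of each subband and the pitch-based part is each subband's normalized autocorrelation at the lag of the estimated pitch, a cue commonly used in CASA to indicate whether a channel is dominated by a periodic target. The subband signals in `channel_frames` are assumed to come from a filterbank that is not shown, and all function names are hypothetical.

```python
import numpy as np


def normalized_autocorr_at_lag(x, lag):
    """Normalized autocorrelation of one subband frame at the given lag (in samples)."""
    a, b = x[:-lag], x[lag:]
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b)) + 1e-12
    return float(np.sum(a * b) / denom)


def frame_features(channel_frames, pitch_hz, fs):
    """channel_frames: (n_channels, frame_len) subband signals for one frame.
    Returns [log energies of all channels, pitch-lag autocorrelations of all channels]."""
    spectral = np.log(np.sum(channel_frames ** 2, axis=1) + 1e-12)
    lag = int(round(fs / pitch_hz))  # pitch period in samples
    pitch_based = np.array([normalized_autocorr_at_lag(ch, lag) for ch in channel_frames])
    return np.concatenate([spectral, pitch_based])


fs, n = 16000, 640
t = np.arange(n) / fs
rng = np.random.default_rng(2)
channels = np.stack([np.cos(2 * np.pi * 200 * t),   # periodic at the 200 Hz pitch
                     rng.standard_normal(n)])       # aperiodic noise channel
feats = frame_features(channels, pitch_hz=200.0, fs=fs)
print(feats.shape)  # (4,)
```

The periodic channel scores close to 1 at the pitch lag while the noise channel scores near 0, so an accurate pitch estimate directly sharpens this part of the feature vector; an inaccurate one blurs it.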
