Speex noise-reduction literature resource - CSDN Library

Uploaded 2016-05-31 17:05:35 · 387KB PDF

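### Overview of Speex Noise-Reduction Techniques

#### 1. Introduction and Background

This paper presents a speech enhancement technique for non-stationary noise environments. It combines the optimally modified log-spectral amplitude (OM-LSA) speech estimator with the minima controlled recursive averaging (MCRA) noise estimation approach. By handling non-stationary noise effectively, the method achieves robust speech enhancement.

#### 2. Key Techniques

##### 1. OM-LSA speech estimator

- **Definition**: OM-LSA improves speech clarity in non-stationary noise. It computes an optimal spectral gain function that minimizes the mean-square error of the log-spectra.
- **How it works**: The spectral gain is a weighted geometric mean of the hypothetical gains associated with speech presence uncertainty. Each gain is weighted by its probability conditional on the observed signal. This approach recovers speech masked by noise more accurately and avoids the musical residual noise of conventional methods.
- **Advantages**:
  - **Accuracy**: minimizing the mean-square error of the log-spectra improves speech quality.
  - **Adaptability**: the method suits a wide range of non-stationary noise environments.
  - **Preservation of weak speech**: weak speech components are retained, which improves overall intelligibility.

##### 2. MCRA noise estimation

- **Definition**: MCRA estimates the noise by recursively averaging past spectral power values. The smoothing parameter is adjusted by the speech presence probability in each subband.
- **How it works**: The approach uses two distinct speech presence probability functions. One is used for estimating the speech, and the other controls the adaptation of the noise spectrum. The first is based on the time-frequency distribution of the a priori SNR. The second is determined by the ratio between the local energy of the noisy signal and its minimum within a specified time window.
- **Advantages**:
  - **Robustness**: the estimator adapts well to changing noise conditions.
  - **Fewer errors**: accurate speech presence probability estimates reduce errors in the noise estimate.
  - **Dynamic adjustment**: the smoothing parameter is updated frame by frame, which keeps the estimate accurate.

#### 3. Performance Evaluation

The authors validated OM-LSA and MCRA with objective and subjective evaluations under various environmental conditions. In these tests, both estimators substantially improved speech quality and suppressed noise effectively. They also retained weak speech components and avoided musical residual noise.

#### 4. Application Prospects

- **Mobile communications**: better call quality, especially in noisy surroundings.
- **Voice assistants**: more accurate speech recognition of commands in difficult acoustic conditions.
- **Conferencing systems**: clearer speech transmission in remote meetings.
- **Hearing aids**: helping people with hearing loss follow conversations.

#### 5. Conclusion

The OM-LSA speech estimator and the MCRA noise estimator provide an efficient solution for speech enhancement in non-stationary noise. They optimize both the spectral gain function and the noise estimation process, and so substantially improve speech quality across a wide range of difficult noise conditions. Beyond their theoretical contribution, the two techniques have broad practical applications. More products and services built on them can be expected as the technology matures.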


Signal Processing 81 (2001) 2403–2418

www.elsevier.com/locate/sigpro

Speech enhancement for non-stationary noise environments

Israel Cohen∗, Baruch Berdugo

Lamar Signal Processing Ltd., P.O.Box 573, Yokneam Ilit 20692, Israel

Received 18 February 2001; received in revised form 26 June 2001

Abstract

In this paper, we present an optimally-modified log-spectral amplitude (OM-LSA) speech estimator and a minima controlled recursive averaging (MCRA) noise estimation approach for robust speech enhancement. The spectral gain function, which minimizes the mean-square error of the log-spectra, is obtained as a weighted geometric mean of the hypothetical gains associated with the speech presence uncertainty. The noise estimate is given by averaging past spectral power values, using a smoothing parameter that is adjusted by the speech presence probability in subbands. We introduce two distinct speech presence probability functions, one for estimating the speech and one for controlling the adaptation of the noise spectrum. The former is based on the time–frequency distribution of the a priori signal-to-noise ratio. The latter is determined by the ratio between the local energy of the noisy signal and its minimum within a specified time window. Objective and subjective evaluation under various environmental conditions confirm the superiority of the OM-LSA and MCRA estimators. Excellent noise suppression is achieved, while retaining weak speech components and avoiding the musical residual noise phenomena.

1. Introduction

A practical speech enhancement system generally consists of two major components: the estimation of the noise power spectrum, and the estimation of speech. The estimation of noise, when only one microphone source is provided, is based on the assumption of a slowly varying noise environment. In particular, the noise spectrum remains virtually stationary during speech activity. The estimation of speech is based on the assumed statistical model, distortion measure, and the estimated noise.

A commonly used approach for estimating the noise power spectrum is to average the noisy signal over sections which do not contain speech. A soft-decision speech pause detection is either implemented on a frame-by-frame basis [12,22] or estimated independently for individual subbands using an a posteriori signal-to-noise ratio (SNR) [11,13]. However, the detection reliability severely deteriorates for weak speech components and low input SNR. Additionally, the amount of presumable non-speech sections in the signal may not be sufficient, which restricts the tracking capability of the noise estimator in non-stationary environments. Alternatively, the noise can be estimated from histograms in the power spectral domain [11,18,24]. Unfortunately, such methods are computationally expensive.

∗ Corresponding author. Tel.: +972-4-993-7066; fax: +972-4-993-7064. E-mail address: icohen@lamar.co.il (I. Cohen).
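The first approach in this paragraph averages the noisy power only over speech-free sections. It can be sketched per frequency bin as follows. This is a minimal illustration, not one of the cited detectors. It makes a hard per-bin decision using a hypothetical a posteriori SNR threshold and a hypothetical averaging factor, whereas the cited methods use soft decisions.

```python
def update_noise_on_pauses(noise_psd, frame_power, threshold=2.0, alpha=0.9):
    """Average the noisy power into the noise estimate only in speech-free bins.

    noise_psd:   current noise power estimate, one value per frequency bin
    frame_power: |Y(k, l)|^2 of the current frame, one value per bin
    threshold:   hypothetical a posteriori SNR threshold for declaring a pause
    alpha:       hypothetical recursive-averaging factor
    """
    updated = []
    for lam_d, power in zip(noise_psd, frame_power):
        gamma = power / max(lam_d, 1e-12)  # a posteriori SNR of this bin
        if gamma < threshold:
            # Bin judged to contain noise only: fold it into the running average.
            lam_d = alpha * lam_d + (1.0 - alpha) * power
        updated.append(lam_d)
    return updated
```

For example, with `update_noise_on_pauses([1.0, 1.0], [1.5, 10.0])` the first bin (a posteriori SNR 1.5) is updated toward 1.5. The second bin (SNR 10) is treated as speech and keeps its previous estimate. The sketch also shows the weakness noted above: a bin that never falls below the threshold is never updated.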

Martin [14,15] has proposed an algorithm for noise estimation based on minimum statistics. The noise estimate is obtained as the minimum values of a smoothed power estimate of the noisy signal, multiplied by a factor that compensates for the bias.
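Here is a minimal single-bin sketch of the minimum-statistics idea. It smooths the noisy periodogram, takes the minimum over a sliding window of recent frames, and scales that minimum by a bias-compensation factor. The fixed smoothing parameter, window length, and bias factor are hypothetical placeholders. Martin's algorithm [14,15] may choose and derive these quantities differently.

```python
from collections import deque


def min_statistics_noise(frames, alpha=0.85, window=8, bias=1.5):
    """Minimum-statistics noise tracking sketch for one frequency bin.

    frames: noisy periodogram values |Y(k, l)|^2 of one bin, frame by frame
    alpha:  fixed smoothing parameter (hypothetical value)
    window: number of recent smoothed values searched for the minimum (hypothetical)
    bias:   constant bias-compensation factor (hypothetical)
    Returns one noise estimate per input frame.
    """
    smoothed = None
    history = deque(maxlen=window)  # keeps only the most recent smoothed powers
    estimates = []
    for power in frames:
        # First-order recursive smoothing of the noisy power.
        smoothed = power if smoothed is None else alpha * smoothed + (1.0 - alpha) * power
        history.append(smoothed)
        # The minimum over the window underestimates the mean noise power,
        # so it is scaled up by the bias factor.
        estimates.append(bias * min(history))
    return estimates
```

A short speech-like spike such as `min_statistics_noise([1.0, 1.0, 1.0, 100.0, 1.0])` leaves the estimate at `bias * 1.0` because the window minimum ignores it. The same mechanism makes the estimate lag a genuine rise in noise level by up to `window` frames.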

PII: S0165-1684(01)00128-1


Nomenclature

- $A$: spectral speech amplitude
- $b$: smoothing window for computing $S$
- $C_{ij}$: cost for deciding $H_i$ when $H_j$
- $D$: short-time Fourier transform of the noise signal
- $d$: noise signal
- $G$: spectral gain function
- $G_{H_1}$: conditional gain function
- $G_{\min}$: spectral gain floor
- $H_0$: speech absence hypothesis for speech estimation
- $H_1$: speech presence hypothesis for speech estimation
- $H'_0$: speech absence hypothesis for noise estimation
- $H'_1$: speech presence hypothesis for noise estimation
- $h$: analysis window
- $\tilde{h}$: synthesis window
- $h_{\text{local}}$, $h_{\text{global}}$: local and global smoothing windows
- $I$: indicator function for hypothesis testing
- $k$: frequency bin (subband) index
- $L$: number of frames used for finding $S_{\text{tmp}}$
- $\ell$: time frame index
- $\mathcal{L}$: set of frames that contain speech
- $M$: framing step
- $N$: size of the analysis window
- $n$: discrete time index
- $P_{\text{local}}$, $P_{\text{global}}$: local and global likelihood of speech
- $P_{\text{frame}}$: frame likelihood of speech
- $p$: speech presence probability for speech estimation
- $p'$: speech presence probability for noise estimation
- $q$: a priori probability for speech absence
- $q_{\max}$: upper threshold for $q$
- $S$: local energy of the noisy signal
- $S_f$: frequency average of the noisy signal's energy
- $S_{\min}$: local minimum of $S$
- $S_{\text{tmp}}$: temporary minimum of $S$
- $S_r$: ratio between the local energy and the local minimum
- $w$: the length of $b$ is $2w+1$
- $X$: short-time Fourier transform of the speech signal
- $x$: speech signal

- $Y$: short-time Fourier transform of the noisy signal
- $y$: noisy signal
- $\alpha$: weighting factor for the a priori SNR estimation
- $\alpha_d$: smoothing parameter for estimating the noise spectrum
- $\tilde{\alpha}_d$: time-varying smoothing parameter
- $\alpha_p$: smoothing parameter for computing $p'$
- $\alpha_s$: smoothing parameter for computing $S$
- $\beta$: smoothing parameter for computing $\zeta$
- $\gamma$: a posteriori SNR
- $\delta$: threshold value of $S_r$ for hypothesis testing
- $\zeta$: recursive average of the a priori SNR
- $\zeta_{\text{frame}}$: frame average of the a priori SNR
- $\zeta_{\text{local}}$, $\zeta_{\text{global}}$: local and global averages of the a priori SNR
- $\zeta_{\min}$, $\zeta_{\max}$: empirical constants
- $\zeta_{p\min}$, $\zeta_{p\max}$: empirical constants
- $\zeta_{\text{peak}}$: confined peak value of $\zeta_{\text{frame}}$
- $\Lambda$: generalized likelihood ratio
- $\lambda$: designates either "local" or "global"
- $\lambda_d$: variance of $D$
- $\lambda_x$: variance of $X$ given speech is present
- $\mu$: transition function from speech to noise
- $\eta$: a priori SNR estimate assuming speech is present
- $\xi$: a priori SNR
- $\hat{\xi}$: a priori SNR estimate under speech presence uncertainty

Abbreviations

- LSA: log-spectral amplitude
- MCRA: minima controlled recursive averaging
- MM-LSA: multiplicatively-modified log-spectral amplitude
- OM-LSA: optimally modified log-spectral amplitude
- PDF: probability density function
- SNR: signal-to-noise ratio
- STFT: short-time Fourier transform
- STSA: short-time spectral amplitude


However, this noise estimate is sensitive to outliers [24], generally biased [16], and its variance is about twice as large as the variance of a conventional noise estimator [15]. Additionally, this method occasionally attenuates low energy phonemes [15]. To overcome these limitations, the smoothing parameter and the bias compensation factor are made time and frequency dependent, and are estimated for each spectral component and each time frame [16]. In [6], a computationally more efficient minimum tracking scheme is presented. Its main drawbacks are the very slow update rate of the noise estimate in case of a sudden rise in the noise energy level, and its tendency to cancel the signal [19].

Considering the speech estimation, Ephraim and Malah [8] derived a log-spectral amplitude (LSA) estimator, which minimizes the mean-square error of the log-spectra, based on a Gaussian statistical model. This estimator proved very efficient in reducing musical residual noise phenomena [6,12,17]. However, the speech spectrum is estimated under the speech presence hypothesis. In contrast to other estimators, whose performance improves by utilizing the speech presence probability [7,10,18,23,25], it was believed that modification of the LSA estimator under speech presence uncertainty is “unworthy” [8]. Malah et al. [13] have recently proposed a multiplicatively modified LSA (MM-LSA) estimator. Accordingly, the spectral gain is multiplied by the conditional speech presence probability, which is estimated for each frequency bin and each frame. Unfortunately, the multiplicative modifier is not optimal [13]. Moreover, their estimate for the a priori SNR interacts with the estimated a priori speech absence probability [17]. This adversely affects the total gain for noise-only bins, and results in an unnaturally structured residual noise.¹

Kim and Chang [12] proposed to use a small fixed a priori speech absence probability $q$ ($q = 0.0625$) and a multiplicative modifier, which is based on the global conditional speech absence probability in each frame. This modifier is applied to the a priori and a posteriori SNRs. Not only is such a modification inconsistent with the statistical model, but it is also insignificant due to the small value of $q$ and the influence of a few noise-only bins on the global speech absence probability.

¹ Applying a uniform attenuation factor to frames that do not contain speech eliminates the noise structuring in such frames [13]. Yet, in speech-plus-noise frames the noise structuring persists.

In this paper, we present an optimally modified LSA (OM-LSA) speech estimator and a minima controlled recursive averaging (MCRA) noise estimation approach for robust speech enhancement. The optimal spectral gain function is obtained as a weighted geometric mean of the hypothetical gains associated with the speech presence uncertainty. The exponential weight of each hypothetical gain is its corresponding probability, conditional on the observed signal. The noise spectrum is estimated by recursively averaging past spectral power values, using a smoothing parameter that is adjusted by the speech presence probability in subbands.
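With two hypotheses, the weighted geometric mean takes the form $G = G_{H_1}^{\,p}\, G_{\min}^{\,1-p}$. Here $p$ is the conditional speech presence probability, and $G_{\min}$ is the gain floor, assumed here to be the gain applied under speech absence. The sketch below assumes the Ephraim–Malah LSA gain [8] for $G_{H_1}$, that is $G_{H_1} = \frac{\xi}{1+\xi}\exp\big(\tfrac{1}{2}\int_v^\infty \frac{e^{-t}}{t}\,dt\big)$ with $v = \gamma\xi/(1+\xi)$. It uses a hypothetical floor $G_{\min} = 0.1$. The exponential integral comes from a textbook series / continued-fraction routine, not a library call. For contrast, the sketch also includes a multiplicative modification (the gain multiplied by $p$) in the spirit of MM-LSA [13].

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exp1(x):
    """Exponential integral E1(x) = integral_x^inf exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("exp1 expects x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) + sum_{k>=1} (-1)^(k+1) x^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 60):
            term *= -x / k  # term is now (-x)^k / k!
            delta = -term / k
            total += delta
            if abs(delta) < 1e-17:
                break
        return total
    # Continued fraction evaluated with the modified Lentz method (x > 1).
    b = x + 1.0
    c = 1.0e300
    d = 1.0 / b
    h = d
    for i in range(1, 300):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        step = c * d
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def lsa_gain(xi, gamma):
    """LSA gain under the speech presence hypothesis H1 (xi, gamma > 0)."""
    v = gamma * xi / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(v))


def om_lsa_gain(xi, gamma, p, g_min=0.1):
    """Weighted geometric mean G_H1^p * G_min^(1 - p); g_min is hypothetical."""
    return lsa_gain(xi, gamma) ** p * g_min ** (1.0 - p)


def mm_lsa_gain(xi, gamma, p):
    """Multiplicative modification for comparison: p * G_H1."""
    return p * lsa_gain(xi, gamma)
```

As $p \to 1$ both versions reduce to the LSA gain. As $p \to 0$, the geometric-mean gain tends to the floor $G_{\min}$, while the multiplicative version tends to zero. For intermediate $p$, `om_lsa_gain` lies between $G_{\min}$ and $G_{H_1}$ whenever $G_{H_1} > G_{\min}$.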

We introduce two distinct speech presence probability functions, one for estimating the speech and one for controlling the adaptation of the noise spectrum. The former is based on the time–frequency distribution of the a priori SNR. The latter is determined by the ratio between the local energy of the noisy signal and its minimum within a specified time window. The probability functions are estimated for each frame and each subband via a soft-decision approach, which exploits the strong correlation of speech presence in neighboring frequency bins of consecutive frames.
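The noise-adaptation side can be sketched for one frequency bin using quantities from the Nomenclature. These are the local energy $S$ and its running minimum $S_{\min}$, with a temporary minimum $S_{\text{tmp}}$ restarted every $L$ frames. The ratio $S_r = S/S_{\min}$ is compared with the threshold $\delta$. The result drives the smoothed probability $p'$ and the time-varying smoothing parameter $\tilde{\alpha}_d = \alpha_d + (1-\alpha_d)\,p'$. This excerpt does not give the update equations, so the code is an assumed reconstruction, not the paper's algorithm. It omits the frequency smoothing with the window $b$, and its default parameter values are illustrative assumptions.

```python
def mcra_noise(frames, alpha_s=0.8, alpha_p=0.2, alpha_d=0.95, delta=5.0, L=125):
    """Minima controlled recursive averaging (MCRA) sketch for one frequency bin.

    frames: noisy periodogram values |Y(k, l)|^2 of one bin k, for l = 0, 1, ...
    Returns the noise estimate lambda_d after each frame.
    Frequency smoothing with the window b is omitted for brevity.
    """
    S = S_min = S_tmp = lam_d = None
    p_prime = 0.0  # speech presence probability used for noise adaptation
    estimates = []
    for frame_idx, power in enumerate(frames):
        if S is None:
            # Initialise every tracker from the first frame.
            S = S_min = S_tmp = lam_d = power
        else:
            S = alpha_s * S + (1.0 - alpha_s) * power  # local energy
            S_min = min(S_min, S)
            S_tmp = min(S_tmp, S)
        if (frame_idx + 1) % L == 0:
            # Every L frames, restart the minimum search from the temporary minimum.
            S_min = min(S_tmp, S)
            S_tmp = S
        S_r = S / max(S_min, 1e-12)  # local energy relative to its local minimum
        indicator = 1.0 if S_r > delta else 0.0  # hard speech-presence decision
        p_prime = alpha_p * p_prime + (1.0 - alpha_p) * indicator
        alpha_tilde = alpha_d + (1.0 - alpha_d) * p_prime  # time-varying smoothing
        lam_d = alpha_tilde * lam_d + (1.0 - alpha_tilde) * power
        estimates.append(lam_d)
    return estimates
```

With a constant input the estimate stays constant. A rise in the noise floor smaller than the factor $\delta$ keeps $p'$ at zero, so the estimate tracks it at the rate set by $\alpha_d$. A strong burst drives $p'$ toward 1 and almost freezes the update. This shows how the speech presence probability controls the smoothing.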

Objective and subjective evaluation of the OM-LSA and MCRA estimators is performed under various environmental conditions. We show that these estimators are superior, particularly for low input SNRs and non-stationary noise. The MCRA noise estimate is unbiased, computationally efficient, robust with respect to the input SNR and type of underlying additive noise, and characterized by the ability to quickly follow abrupt changes in the noise spectrum. Its performance is close to the theoretical limit. The OM-LSA estimator demonstrates excellent noise suppression, while retaining weak speech components and avoiding the musical residual noise phenomena.

The paper is organized as follows. In Section 2, we derive the OM-LSA speech estimator and its corresponding speech presence probability function. In Section 3, we discuss the problem of the a priori SNR estimation under speech presence
