ENHANCEMENT OF SPEECH CORRUPTED BY ACOUSTIC NOISE*
M. Berouti, R. Schwartz, and J. Makhoul
Bolt Beranek and Newman Inc.
Cambridge, Mass.
ABSTRACT
This paper describes a method for enhancing speech corrupted by broadband noise. The method is based on the spectral noise subtraction method. The original method entails subtracting an estimate of the noise power spectrum from the speech power spectrum, setting negative differences to zero, recombining the new power spectrum with the original phase, and then reconstructing the time waveform. While this method reduces the broadband noise, it also usually introduces an annoying "musical noise". We have devised a method that eliminates this "musical noise" while further reducing the background noise. The method consists in subtracting an overestimate of the noise power spectrum, and preventing the resultant spectral components from going below a preset minimum level (spectral floor). The method can automatically adapt to a wide range of signal-to-noise ratios, as long as a reasonable estimate of the noise spectrum can be obtained. Extensive listening tests were performed to determine the quality and intelligibility of speech enhanced by our method. Listeners unanimously preferred the quality of the processed speech. Also, for an input signal-to-noise ratio of 5 dB, there was no loss of intelligibility associated with the enhancement technique.
1. INTRODUCTION
We report on our work to enhance the quality of speech degraded by additive white noise. Our goal is to improve the listenability of the speech signal by decreasing the background noise, without affecting the intelligibility of the speech. The noise is at such levels that the speech is essentially unintelligible out of context. We use the average segmental signal-to-noise ratio (SNR) to measure the noise level of the noise-corrupted speech signal. We found that sentences with a SNR in the range -5 to +5 dB have an intelligibility score in the range 20 to 80%. There is strong correlation between the intelligibility of a sentence and the SNR, but intelligibility also depends on the speaker, on context, and on the phonetic content.
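The text names the average segmental SNR as its noise measure but does not define it here. The sketch below assumes one common definition: the SNR in dB is computed frame by frame from the clean speech and the added noise, and the per-frame values are averaged. The function name, the non-overlapping framing, and the rule for skipping undefined frames are assumptions made for illustration.

```python
import math

def segmental_snr_db(clean, noise, frame_len):
    """Average of per-frame SNRs in dB (assumed definition, not taken from the paper)."""
    snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        n = noise[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        noise_energy = sum(x * x for x in n)
        if signal_energy == 0.0 or noise_energy == 0.0:
            continue  # the ratio is undefined for this frame, so skip it
        snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    if not snrs:
        raise ValueError("no frame with nonzero signal and noise energy")
    return sum(snrs) / len(snrs)

# Two frames: the first at about +20 dB, the second at about 0 dB.
clean = [1.0, -1.0, 1.0, -1.0, 0.1, -0.1, 0.1, -0.1]
noise = [0.1, 0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1]
print(segmental_snr_db(clean, noise, frame_len=4))
```

Because the average is taken over dB values, a quiet frame counts as much as a loud one, which is the usual reason for preferring a segmental measure over a single whole-utterance ratio.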
After an initial investigation of several methods of speech enhancement, we concluded that the method of spectral noise subtraction is more effective than others. In this paper we discuss our implementation of that method, which differs from that reported by others in two major ways: first, we subtract a factor ($\alpha$) times the noise spectrum, where $\alpha$ is a number greater than unity and varies from frame to frame. Second, we prevent the spectral components of the processed signal from going below a certain lower bound which we call the spectral floor. We express the spectral floor as a fraction, $\beta$, of the original noise power spectrum $P_n(w)$.
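One plausible reading of these two modifications, applied to a single frame, is sketched below. The subtraction factor is written `alpha` and the spectral floor fraction `beta`; the function name and the per-bin list representation are assumptions. How the factor is chosen from frame to frame is not covered in this part of the text, so it is simply passed in.

```python
def oversubtract_with_floor(noisy_power, noise_power, alpha, beta):
    """Subtract alpha * noise power in each bin, never going below beta * noise power."""
    modified = []
    for ps, pn in zip(noisy_power, noise_power):
        floor = beta * pn            # spectral floor for this bin
        difference = ps - alpha * pn  # oversubtraction when alpha > 1
        modified.append(max(difference, floor))
    return modified

# Three bins with unit noise power, alpha = 2 and beta = 0.1.
print(oversubtract_with_floor([10.0, 2.0, 1.0], [1.0, 1.0, 1.0], alpha=2.0, beta=0.1))
# [8.0, 0.1, 0.1]
```

In the example, the two weak bins land on the floor instead of being zeroed, so the output spectrum has no isolated gaps between surviving components.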
2. BASIC METHOD
The basic principle of spectral noise subtraction appears in the literature in various implementations [1_1]. Basically, most methods of speech enhancement have in common the assumption that the power spectrum of a signal corrupted by uncorrelated noise is equal to the sum of the signal spectrum and the noise spectrum. The preceding statement is true only in the statistical sense. However, taking this assumption as a reasonable approximation for short-term (25 ms) spectra, its application leads to a simple noise subtraction method.
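The sense in which the additivity assumption holds only statistically can be made explicit with a short derivation. The symbols $X(w)$, $S(w)$ and $N(w)$ for the short-term spectra of the noisy speech, the clean speech and the noise are notation assumed here for illustration.

```latex
\begin{align*}
X(w) &= S(w) + N(w) \\
|X(w)|^2 &= |S(w)|^2 + |N(w)|^2 + 2\,\mathrm{Re}\{S(w)\,N^*(w)\} \\
E\{|X(w)|^2\} &= E\{|S(w)|^2\} + E\{|N(w)|^2\}
  \quad \text{when } E\{S(w)\,N^*(w)\} = 0
\end{align*}
```

The cross term vanishes only on average. In any single short-term frame it is generally nonzero and can take either sign, so the power spectra add only approximately.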
Initially, the method we implemented consisted in computing the power spectrum of each windowed segment of speech and subtracting from it an estimate of the noise power spectrum. The estimate of the noise is formed during periods of "silence". The original phase of the DFT of the input signal is retained for resynthesis. Thus, the enhancement algorithm consists of a straightforward implementation of the following relationship:
let $D(w) = P_s(w) - P_n(w)$

$$P(w) = \begin{cases} D(w), & \text{if } D(w) > 0 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where $P(w)$ is the modified signal spectrum, $P_s(w)$ is the spectrum of the input noise-corrupted speech, and $P_n(w)$ is the smoothed estimate of the noise spectrum.
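Relationship (1) can be written directly as a per-bin operation. The sketch below assumes the power spectra are given as plain lists with one value per frequency bin; the function name is hypothetical.

```python
def basic_spectral_subtraction(noisy_power, noise_power):
    """Relationship (1): keep D(w) = Ps(w) - Pn(w) where it is positive, else 0."""
    modified = []
    for ps, pn in zip(noisy_power, noise_power):
        d = ps - pn
        modified.append(d if d > 0 else 0.0)
    return modified

print(basic_spectral_subtraction([4.0, 1.5, 0.5], [1.0, 1.0, 1.0]))
# [3.0, 0.5, 0.0]
```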
$P_n(w)$ is obtained by a two-step process: First we average the noise spectra from several frames of "silence". Second, we smooth in frequency this average noise spectrum. For the specific case of white noise, the smoothed estimate of the noise spectrum is flat.
The enhanced speech signal is obtained from both $P(w)$ and the original phase by an inverse Fourier transform:

$$s'(t) = F^{-1}\left\{ \sqrt{P(w)}\, e^{j\theta(w)} \right\} \tag{2}$$

where $\theta(w)$ is the phase function of the DFT of the input speech.
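The two-step noise estimate described above (averaging over frames of "silence", then smoothing in frequency) can be sketched as follows. The text does not say which smoother is used, so a centered moving average that is truncated at the band edges is assumed here; the function name and the `smooth_width` parameter are hypothetical.

```python
def estimate_noise_spectrum(silence_power_frames, smooth_width=3):
    """Average power spectra over silent frames, then smooth across frequency bins."""
    num_frames = len(silence_power_frames)
    num_bins = len(silence_power_frames[0])

    # Step 1: average the noise power spectra bin by bin over all silent frames.
    averaged = [
        sum(frame[k] for frame in silence_power_frames) / num_frames
        for k in range(num_bins)
    ]

    # Step 2: smooth in frequency with a moving average (assumed smoother).
    half = smooth_width // 2
    smoothed = []
    for k in range(num_bins):
        lo = max(0, k - half)
        hi = min(num_bins, k + half + 1)
        smoothed.append(sum(averaged[lo:hi]) / (hi - lo))
    return smoothed

# Two frames whose bin-wise average is already flat, as expected for white noise.
frames = [[1.0, 3.0, 1.0, 3.0], [3.0, 1.0, 3.0, 1.0]]
print(estimate_noise_spectrum(frames))
# [2.0, 2.0, 2.0, 2.0]
```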
Since the assumption of uncorrelated signal and noise is not strictly valid for short-term spectra, some of the components of the difference $D(w)$ may be negative. These negative values are set to zero in the processed spectrum, $P(w)$, as shown in (1).
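Putting relationships (1) and (2) together for one frame gives the sketch below: take the DFT, subtract the noise power per bin, zero the negative differences, reattach the original phase, and invert. It uses a direct O(N^2) DFT from the standard library so that it is self-contained. Windowing and the way successive frames are recombined are not described in this part of the text and are left out; all function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Direct inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def enhance_frame(frame, noise_power):
    """Apply (1) to the frame's power spectrum, then resynthesize with (2)."""
    out_spectrum = []
    for xk, pn in zip(dft(frame), noise_power):
        d = abs(xk) ** 2 - pn       # D(w) = Ps(w) - Pn(w)
        p = d if d > 0 else 0.0     # relationship (1): negative differences become 0
        theta = cmath.phase(xk)     # original phase of the input DFT
        out_spectrum.append(cmath.rect(math.sqrt(p), theta))
    # For a real frame and a noise estimate that is the same in bins k and N-k,
    # the spectrum stays conjugate-symmetric, so the imaginary part is round-off only.
    return [v.real for v in idft(out_spectrum)]

frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.0]
unchanged = enhance_frame(frame, [0.0] * 8)   # zero noise estimate: frame is recovered
silenced = enhance_frame(frame, [1e6] * 8)    # huge noise estimate: every bin is zeroed
```

With a zero noise estimate the frame comes back unchanged up to round-off, which is a quick check that the magnitude and phase are being recombined correctly.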
* An earlier version of this paper was presented at the ARPA Network Speech Compression (NSC) Group meeting, Cambridge, MA, May 1978, in a special session on speech enhancement.
CH1379-7/79/0000-0208$00.75 © 1979 IEEE