Fundamental Frequency Estimation in Speech Signals with Variable Rate Particle Filters (CSDN Library resource: 682 KB PDF, uploaded 2021-02-07)

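### Fundamental Frequency Estimation in Speech Signals with Variable Rate Particle Filters

#### Introduction
This paper studies the estimation of the fundamental frequency of speech signals with variable rate particle filters (VRPF). The topic matters to both the research community and industry. Fundamental frequency estimation, usually called "pitch" estimation, is a core task in speech signal processing because it identifies the periodic patterns in a speech signal. The particle filter is a powerful Bayesian inference method for tracking parameters in nonlinear state-space models.

#### Background and Motivation
Many robust pitch estimation algorithms for speech have been proposed over the past decades. Most rely on time-domain or frequency-domain techniques, such as autocorrelation methods, the Average Magnitude Difference Function (AMDF), and frequency-domain methods based on the short-time spectrum. RAPT (Robust Algorithm for Pitch Tracking) and YIN are both widely recognized for their robustness. Under extreme noise, however, these conventional methods rarely give satisfactory pitch tracking results. An example of extreme noise is a signal-to-noise ratio (SNR) as low as −5 dB to −10 dB.

#### Methodology
The paper models speech with a time-varying source-filter model and uses variable rate particle filters to estimate pitch periods. It also implements a Rao–Blackwellised variable rate particle filter (RBVRPF). Both methods are evaluated against YIN, a state-of-the-art pitch estimation algorithm.

- **Time-varying source-filter model**: the speech signal is split into a source (excitation) part and a vocal-tract filter part. The filter is a time-varying autoregressive (AR) model whose coefficients change from one pitch period to the next, which captures the temporal dynamics of speech.
- **Variable rate particle filter**: a particle filter for state variables that arrive at unknown times relative to the observation process. Pitch periods are not synchronized with the sample clock, so the VRPF lets period boundaries occur at a random rate relative to the observed samples.
- **Rao–Blackwellised variable rate particle filter**: the RBVRPF refines the VRPF by marginalizing a subset of the state variables analytically instead of sampling them (Rao–Blackwellisation). This reduces the dimension of the sampled space and typically lowers the variance of the estimates.

#### Experimental Results and Analysis
Several experiments were run to evaluate the proposed methods. The paper reports that in simulations across various SNR conditions, the VRPF and RBVRPF estimate pitch more accurately than YIN. This holds even under strong background noise, the regime (SNR around −5 dB to −10 dB) where conventional methods struggle.

#### Conclusion
The paper presents a new approach to fundamental frequency estimation in speech: pitch period estimation with variable rate particle filters. It combines a time-varying source-filter model with Rao–Blackwellisation. Compared with conventional pitch estimators such as YIN, the method is reported to be more robust and accurate on speech corrupted by strong noise. Future work could investigate how to deploy these algorithms more efficiently in practical applications.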


890 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 24, NO. 5, MAY 2016

Fundamental Frequency Estimation in Speech Signals With Variable Rate Particle Filters

Geliang Zhang and Simon Godsill, Member, IEEE

Abstract—Fundamental frequency estimation, known as pitch estimation in speech signals, is of interest both to the research community and to industry. Meanwhile, the particle filter is known to be a powerful Bayesian inference method for tracking dynamic parameters in nonlinear state-space models. In this paper, we propose a speech model based on a time-varying source-filter structure and use variable rate particle filters (VRPF) to develop methods for estimating pitch periods in speech signals. A Rao–Blackwellised variable rate particle filter (RBVRPF) is also implemented. The proposed VRPF and RBVRPF are compared with a state-of-the-art pitch estimation algorithm, the YIN algorithm. Simulation results show that VRPF and RBVRPF give more accurate pitch estimates, even under strong background noise conditions.

Index Terms—variable rate particle filters, pitch estimation, Rao–Blackwellisation, source-filter model.

I. INTRODUCTION

Robust pitch estimation algorithms for speech signals have wide application and have therefore been proposed in many papers. Previous algorithms are mainly based on time-domain and frequency-domain techniques. Examples include the robust algorithm for pitch tracking (RAPT) and YIN, a fundamental frequency estimator for speech and music [1] [2] [3]. Most time-domain algorithms are based on autocorrelation methods and the average magnitude difference function (AMDF) method, which can be used to estimate the periods of speech signals [4]. Recently, a robust frequency-domain algorithm for pitch estimation has also been proposed [5]. Some researchers have proposed a statistical method that chooses peaks from the short-time spectrum of speech signals [6]. Methods proposed to estimate glottal waves can also be used to extract pitch periods [7] [8]. However, very few of these methods give satisfactory pitch tracking results under strong noise, for example, when the Signal-to-Noise Ratio (SNR) is as poor as −5 dB to −10 dB.

Particle filters have been used widely in tracking applications since their development in recent decades [9] [10] [11]. However, little work has been done to apply the particle filter to pitch tracking. Recently, it has been shown that the particle filter approach can track the pitch period using a quasi-periodic speech signal model [12]. In this paper, we propose another particle filter approach to the pitch tracking problem, based on a source-filter speech signal model, which can track the pitch period under very noisy conditions.

Manuscript received May 07, 2015; revised August 04, 2015 and December 29, 2015; accepted February 02, 2016. Date of publication February 18, 2016; date of current version March 23, 2016. This work was supported in part by the Cambridge Commonwealth, in part by the European and International Trust, and in part by the Natural Science Foundation of China under Grant 61463035. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Sin-Horng Chen.

The authors are with the Signal Processing and Communication Laboratory, Engineering Department, University of Cambridge, Cambridge CB2 1PZ, U.K. (e-mail: gz246@cam.ac.uk; sjg30@cam.ac.uk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TASLP.2016.2531285

One source-filter model that can capture the pitch period of speech signals is the time-varying autoregressive (AR) model driven by a source signal. Voiced speech is nearly periodic, so the driving source should itself be a nearly periodic signal. This paper proposes a promising driving source model, together with an accompanying speech waveform model. Experiments test the proposed algorithm under various SNR conditions. They show that it tracks pitch periods more successfully than state-of-the-art algorithms under noisy conditions.

The paper is organised as follows. Section 2 describes the source-filter speech model. Section 3 describes in detail how variable rate particle filters are applied to the problem. Section 4 derives an initialization step for the particle filter using a joint optimization approach. Section 5 details the Rao–Blackwellisation approach applied to the variable rate particle filter. Section 6 gives experimental results for the proposed methods and compares them with the YIN algorithm. Finally, Section 7 draws conclusions.

II. SPEECH MODEL

A. A Time-Varying AR Source-Filter Model

It is known that in human speech the fundamental period has a lower bound $T_{low}$ and an upper bound $T_{upp}$. The $n$-th period $T_n$ is thus modeled as

$$T_n = T_{n-1} + \tau_n, \qquad T_{low} \le T_n \le T_{upp}. \quad (1)$$

The speech signal $s_t$ at time $t$ in the current period $n$ is represented as the output of an $M$-th order time-varying AR model driven by a periodic source input,

$$s_t = \sum_{p=1}^{M} a_{n,p}\, s_{t-p} + V_t, \quad (2)$$

where $V_t$ denotes the input source to the AR model. $V_t$ can be modeled either as a near-periodic signal or as a glottal pulse sequence. The current period $n$ is defined by $t \in [P_n, P_{n+1})$, where $P_n$ is the time at which the $n$-th period starts, i.e. $P_n = \sum_{i=1}^{n-1} T_i$.


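The AR coefficients $a_{n,p}$ are assumed to change randomly between periods but to remain fixed within each period,

$$a_{n,p} = a_{n-1,p} + \tau_{a,p}. \quad (3)$$

The period increment in (1) is drawn as $\tau_n \sim \mathcal{U}(\tau_{min}, \tau_{max})$. Throughout the paper, $\mathcal{U}$ refers to the uniform distribution and $\mathcal{N}$ denotes the Gaussian distribution. The bounds are $\tau_{min} = \max[-\tau, T_{low} - T_{n-1}]$ and $\tau_{max} = \min[\tau, T_{upp} - T_{n-1}]$, where $\tau$ is a fixed hyperparameter. These bounds keep each step within $\pm\tau$ and keep $T_n$ inside $[T_{low}, T_{upp}]$. The AR increment $\tau_{a,p}$ can be sampled from $\mathcal{N}(0, \sigma_{a,p})$.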
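The period and AR-coefficient dynamics in (1) and (3) can be simulated directly. The sketch below assumes periods measured in samples and treats $\sigma_{a,p}$ as a variance (the text does not say whether it is a variance or a standard deviation). The function names are hypothetical.

```python
import random

def sample_next_period(T_prev, T_low, T_upp, tau, rng=random):
    """Draw T_n = T_{n-1} + tau_n with tau_n ~ U(tau_min, tau_max), as in (1)."""
    tau_min = max(-tau, T_low - T_prev)  # keeps T_n >= T_low and the step >= -tau
    tau_max = min(tau, T_upp - T_prev)   # keeps T_n <= T_upp and the step <= +tau
    return T_prev + rng.uniform(tau_min, tau_max)

def sample_next_ar(a_prev, sigma_a, rng=random):
    """Random-walk update a_{n,p} = a_{n-1,p} + tau_{a,p}, as in (3).

    Assumption: sigma_a[p] is a variance, so the Gaussian standard deviation is its square root.
    """
    return [a + rng.gauss(0.0, s ** 0.5) for a, s in zip(a_prev, sigma_a)]
```

For example, with an assumed 16 kHz sampling rate, $T_{low} = 40$ and $T_{upp} = 267$ samples correspond roughly to a 60 to 400 Hz pitch range. Calling `sample_next_period` once per period then produces a bounded, slowly wandering pitch track.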

Finally, the voiced speech signal $s_t$ is observed in Gaussian noise:

$$y_t = s_t + G_t, \quad (4)$$

where $G_t$ is sampled from $\mathcal{N}(0, \sigma_G)$. The values of hyperparameters such as $\sigma_G$ and $\tau$ reflect how much the parameters of the speech model vary. They are given in the experiment section. We use $a_n$ to denote $a_{n,1:M}$.
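Equations (2) and (4) define a generative model that is easy to simulate once the periods, the per-period AR coefficients and a source are fixed. The sketch below is a minimal illustration, not the authors' code. It assumes zero initial conditions ($s_t = 0$ for $t < 0$), a caller-supplied source function `source(t, n)` standing in for $V_t$, and $\sigma_G$ treated as a variance. All function names are hypothetical.

```python
import random

def period_starts(periods):
    """P_n = sum of the earlier periods (0-based here): the start sample of each period."""
    starts, acc = [], 0.0
    for T in periods:
        starts.append(acc)
        acc += T
    return starts

def synthesize(periods, ar_coeffs, source, sigma_G, rng=random):
    """Generate s_t with the time-varying AR model (2) and y_t = s_t + G_t as in (4).

    periods[n] is T_n in samples; ar_coeffs[n] = [a_{n,1}, ..., a_{n,M}] applies while
    t lies in [P_n, P_{n+1}); source(t, n) returns V_t; sigma_G is treated as a variance.
    """
    starts = period_starts(periods)
    total = int(starts[-1] + periods[-1])
    s, y = [], []
    n = 0
    for t in range(total):
        # Move to the period that contains sample t.
        while n + 1 < len(starts) and t >= starts[n + 1]:
            n += 1
        a = ar_coeffs[n]
        # a[p] multiplies s_{t-(p+1)}; earlier samples are taken as zero.
        s_t = sum(a[p] * s[t - 1 - p] for p in range(len(a)) if t - 1 - p >= 0) + source(t, n)
        s.append(s_t)
        y.append(s_t + rng.gauss(0.0, sigma_G ** 0.5))
    return s, y
```

Because the coefficients switch only at period boundaries, the filter stays fixed within each period and changes between periods, as the text assumes for $a_{n,p}$.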

The characteristics of the speech signal model are largely determined by the input source $V_t$. $V_t$ can be modeled with different quasi-periodic models, each leading to different performance. A particular source model is proposed here and described in the following subsection.

B. Input Sources Modeled as Almost Periodic Signals

Because the voiced speech signal is almost periodic, with

time-varying period, it is suggested that the input source can be

modeled as an almost periodic signal itself. Such an approach

has previously been used to model spectroscopy signals and

music signals [13] [14]. Here it is proposed that a similar

method can be used to model the input source to the speech

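production model as well,

$$V_t = \sum_{k=0}^{K} \left[ A_{n,k}\cos(k w_n t) + B_{n,k}\sin(k w_n t) \right] + W_t, \quad (5)$$

$$A_{n,k} = A_{n-1,k} + \epsilon_{A,k}, \quad (6)$$

$$B_{n,k} = B_{n-1,k} + \epsilon_{B,k}, \quad (7)$$

$$W_t \sim \mathcal{N}(0, \tau_W). \quad (8)$$

Here $\epsilon_{A,k}$ and $\epsilon_{B,k}$ can be sampled from the Gaussian distributions $\mathcal{N}(0, \sigma_{A,k})$ and $\mathcal{N}(0, \sigma_{B,k})$.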

In equation (5), $K+1$ denotes the number of harmonic components (cosine and sine waves) used in the input source model. The variable $w_n$ is the fundamental frequency of the current speech signal and is inversely proportional to the current pitch period $T_n$. For the harmonics in (5) to repeat every $T_n$ samples, $w_n = 2\pi/T_n$ in radians per sample. We assume that $A_{n,k}$ and $B_{n,k}$ change slowly enough to be treated as fixed within each pitch period. To simplify the notation, we use $A_n$ and $B_n$ to denote $A_{n,0:K}$ and $B_{n,0:K}$.
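The harmonic source (5) to (8) can be evaluated sample by sample, with the amplitudes updated once per period through (6) and (7). The sketch below assumes $w_n = 2\pi/T_n$ in radians per sample and treats $\tau_W$, $\sigma_{A,k}$ and $\sigma_{B,k}$ as variances. The helper names are hypothetical.

```python
import math
import random

def harmonic_source(t, w_n, A_n, B_n, tau_W, rng=random):
    """V_t from (5): sum over k = 0..K of A_{n,k} cos(k w_n t) + B_{n,k} sin(k w_n t), plus W_t.

    A_n and B_n hold the K+1 amplitudes for k = 0..K; tau_W is treated as the variance in (8).
    """
    v = sum(A * math.cos(k * w_n * t) + B * math.sin(k * w_n * t)
            for k, (A, B) in enumerate(zip(A_n, B_n)))
    return v + rng.gauss(0.0, math.sqrt(tau_W))

def next_amplitudes(A_prev, B_prev, sigma_A, sigma_B, rng=random):
    """Random-walk updates (6) and (7), applied once at the start of each pitch period."""
    A = [a + rng.gauss(0.0, math.sqrt(s)) for a, s in zip(A_prev, sigma_A)]
    B = [b + rng.gauss(0.0, math.sqrt(s)) for b, s in zip(B_prev, sigma_B)]
    return A, B
```

With $\tau_W = 0$ and fixed amplitudes, the source repeats exactly every $T_n$ samples, which is the near-periodic behaviour the model is built to capture. The noise term $W_t$ and the per-period random walk make the source only approximately periodic.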

III. IMPLEMENTATION OF VARIABLE RATE PARTICLE FILTER

We can use Bayesian filtering to recursively estimate the hidden states $x_{1:t}$ from the observations $y_{1:t}$ [10], [15], using the following prediction and update equations:

$$p(x_{1:t} \mid y_{1:t-1}) = p(x_{1:t-1} \mid y_{1:t-1})\, p(x_t \mid x_{1:t-1}), \quad (9)$$

$$p(x_{1:t} \mid y_{1:t}) \propto p(y_t \mid x_{1:t}, y_{1:t-1})\, p(x_{1:t} \mid y_{1:t-1}). \quad (10)$$

Thus, given the initial prior $p(x_1)$, we can use (9) and (10) to compute the posterior distribution $p(x_{1:t} \mid y_{1:t})$ and its marginal distributions whenever a new observation $y_t$ is received [10].

The variable rate particle filter approach uses a set of random, weighted "particles" $x^{(i)}_{1:t}$ to approximate the posterior distribution of the unknown state sequence $x_{1:t}$ given the noisy data $y_{1:t}$:

$$p(x_{1:t} \mid y_{1:t}) \approx \sum_{i=1}^{N} w^{(i)}_t\, \delta\!\left(x_{1:t} - x^{(i)}_{1:t}\right), \quad (11)$$

where $N$ denotes the total number of particles and $w^{(i)}_t$ are the normalized importance weights.

The speech signal model is analytically intractable, and the period $T$ is asynchronous with the sample time $t$. For both reasons, we adopt the variable rate particle filter. Unlike the standard particle filter, the variable rate particle filter (VRPF) applies to cases where the state variables arrive at unknown times relative to the observation process [16]. This makes the VRPF suitable for this speech signal model, in which pitch periods arrive at a random rate relative to the observed speech samples.
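The "variable rate" idea can be illustrated with a single particle. Its period-level state ($T_n$ and the start time $P_n$) is updated only when the sample clock crosses the end of the current period, so the period states arrive at a random rate relative to the samples. This is a hypothetical sketch, not the paper's algorithm. It reuses the bounded uniform step from (1), and the class and function names are invented for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class PeriodTrack:
    """Hypothetical per-particle record of the current pitch period."""
    start: float   # P_n: start time of the current period, in samples
    length: float  # T_n: length of the current period, in samples
    n: int = 1     # index of the current period

def advance(track, t, T_low, T_upp, tau, rng=random):
    """Bring a particle's period state up to sample t; return how many periods began.

    A new period (and a new T_n) is drawn only when t reaches P_n + T_n. When the
    filter steps one sample at a time and T_low > 1, this happens 0 or 1 times per sample.
    """
    new_periods = 0
    while t >= track.start + track.length:
        track.start += track.length  # P_{n+1} = P_n + T_n
        lo = max(-tau, T_low - track.length)
        hi = min(tau, T_upp - track.length)
        track.length += rng.uniform(lo, hi)  # T_{n+1} = T_n + tau_{n+1}, as in (1)
        track.n += 1
        new_periods += 1
    return new_periods
```

In a full filter, each particle would carry such a record, and the per-period parameters ($a_n$, $A_n$, $B_n$) would be redrawn whenever `advance` reports a new period.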

In the VRPF, at time $t$, the unknown parameters in the problem are $s_{1:t}$, $T_{1:n}$, $A_{1:n}$, $B_{1:n}$ and $a_{1:n,1:M}$. The hyperparameters $(\sigma_G, \sigma_{a,p}, \tau, \sigma_{A,k}, \sigma_{B,k})$ are assumed fixed. The hidden state vector $x_{1:t}$ is thus defined as

$$x_{1:t} = [s_{1:t}, T_{1:n}, a_{1:n,1:M}, A_{1:n}, B_{1:n}]. \quad (12)$$
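The state vector (12) mixes quantities that grow with every sample ($s_{1:t}$) and quantities that grow with every period ($T_{1:n}$, $a_{1:n,1:M}$, $A_{1:n}$, $B_{1:n}$). One way to store it per particle is sketched below. The container layout and names are assumptions for illustration only, not the authors' data structure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ParticleState:
    """Hypothetical storage for one particle's hidden state x_{1:t} from (12)."""
    s: List[float] = field(default_factory=list)        # s_{1:t}: one entry per sample
    T: List[float] = field(default_factory=list)        # T_{1:n}: one entry per period
    a: List[List[float]] = field(default_factory=list)  # a_{1:n,1:M}: M AR coefficients per period
    A: List[List[float]] = field(default_factory=list)  # A_{1:n}: cosine amplitudes per period
    B: List[List[float]] = field(default_factory=list)  # B_{1:n}: sine amplitudes per period

    def start_period(self, T_n, a_n, A_n, B_n):
        """Append the per-period parameters when a new pitch period begins."""
        self.T.append(T_n)
        self.a.append(list(a_n))
        self.A.append(list(A_n))
        self.B.append(list(B_n))

    def add_sample(self, s_t):
        """Append the clean-signal value for the newest sample."""
        self.s.append(s_t)

    @property
    def n(self):
        """Number of periods started so far."""
        return len(self.T)
```

Keeping per-sample and per-period histories in separate lists reflects the two time scales: `add_sample` is called at every $t$, while `start_period` is called only at the (random) period boundaries.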

The variable rate particle filter algorithm used here can be summarized as follows.

Note that we do not recommend deciding whether to resample every time a new signal sample becomes available, because doing so decreases the robustness of the algorithm. Instead, the decision is made only after a fixed number of samples has been processed, equal to a predetermined window length. This window is usually about 32, 64 or 128 ms long, depending on the context.
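The windowed resampling rule can be sketched as follows. The text specifies only that the decision is taken once per window. The effective-sample-size threshold and systematic resampling used here are common choices assumed for illustration, and the function names are hypothetical. At an assumed 16 kHz sampling rate, a 32 ms window is 512 samples.

```python
import random

def systematic_resample(weights, rng=random):
    """Return N ancestor indices drawn by systematic resampling from normalized weights."""
    N = len(weights)
    u0 = rng.random() / N
    indices, cumulative, i = [], weights[0], 0
    for j in range(N):
        u = u0 + j / N
        # Advance to the first particle whose cumulative weight reaches u.
        while u > cumulative and i < N - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices

def maybe_resample(t, weights, window_len, ess_fraction=0.5, rng=random):
    """Decide on resampling only at window boundaries (t % window_len == 0).

    Assumption: resample when ESS < ess_fraction * N. Returns ancestor indices, or None.
    """
    if t % window_len != 0:
        return None
    ess = 1.0 / sum(w * w for w in weights)
    if ess >= ess_fraction * len(weights):
        return None
    return systematic_resample(weights, rng)
```

Between window boundaries the weights simply accumulate likelihood terms, so a single noisy sample cannot trigger a resampling step on its own.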

IV. INITIALIZATION OF PARTICLE FILTERS

A. Motivation

To apply the particle filter to estimating the pitch period of a speech signal with the time-varying AR model, all model parameters except the fixed ones must be estimated. However, if many parameters must be tracked simultaneously without any prior knowledge of their initial values, the state vector must be estimated in a high-dimensional space. That requires exponentially growing computation and number of particles to
