890 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 24, NO. 5, MAY 2016
Fundamental Frequency Estimation in Speech
Signals With Variable Rate Particle Filters
Geliang Zhang and Simon Godsill, Member, IEEE
Abstract—Fundamental frequency estimation, known as pitch
estimation in speech signals, is of interest both to the research
community and to industry. Meanwhile, the particle filter is known
to be a powerful Bayesian inference method to track dynamic
parameters in nonlinear state-space models. In this paper, we
propose a time-varying source-filter speech
model, and use variable rate particle filters (VRPF) to develop
methods for estimation of pitch periods in speech signals. A
Rao–Blackwellised variable rate particle filter (RBVRPF) is also
implemented. The proposed VRPF and RBVRPF are compared
with a state-of-the-art pitch estimation algorithm, the YIN algo-
rithm. Simulation results show that more accurate estimation of
pitch can be obtained by VRPF and RBVRPF even under strong
background noise conditions.
Index Terms—variable rate particle filters, pitch estimation,
Rao–Blackwellisation, source-filter model.
I. INTRODUCTION
Robust pitch estimation algorithms for speech signals
have wide application and thus have been proposed in
many papers. Previous algorithms are mainly based on time
domain and frequency domain techniques; for example, the
robust algorithm for pitch tracking (RAPT) and YIN,
a fundamental frequency estimator for speech and music [1]
[2] [3]. Most time domain algorithms are based on autocorre-
lation methods and the average magnitude difference function
(AMDF) method, which can be used to estimate the periods of
speech signals [4]. Recently a robust frequency domain algo-
rithm for pitch estimation has also been proposed [5]. Some
researchers have proposed a statistical method which chooses
peaks from the short-time spectrum of speech signals [6]. Other
methods, which were proposed to estimate glottal waves,
can also be used to extract pitch periods [7] [8].
However, very few of them can give satisfactory pitch tracking
results under strong noise conditions, for example, when the
Signal-to-Noise Ratio (SNR) is as poor as −5 dB to −10 dB.
Particle filters have been used widely in tracking applica-
tions since their development in recent decades [9] [10] [11].
Manuscript received May 07, 2015; revised August 04, 2015 and December
29, 2015; accepted February 02, 2016. Date of publication February 18, 2016;
date of current version March 23, 2016. This work was supported in part by the
Cambridge Commonwealth, in part by the European and International Trust,
and in part by the Natural Science Foundation of China under Grant 61463035.
The associate editor coordinating the review of this manuscript and approving
it for publication was Dr. Sin-Horng Chen.
The authors are with the Signal Processing and Communication Laboratory,
Engineering Department, University of Cambridge, Cambridge CB2 1PZ, U.K.
(e-mail: gz246@cam.ac.uk; sjg30@cam.ac.uk).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASLP.2016.2531285
However, little work has been done to apply the particle filter
to pitch tracking. Recently it has been shown that the particle
filter approach can be used to track pitch period, using a quasi-
periodic speech signal model [12]. In this paper, we propose
another particle filter approach to address the pitch tracking
problem using a source-filter speech signal model, which is
capable of tracking pitch period under very noisy conditions.
A possible source-filter model that can be used to capture
the pitch period of speech signals is the time-varying autore-
gressive (AR) model driven by some source signals. Because
of the near-periodic properties of voiced speech signals, the
driving sources should themselves be near-periodic signals. A
promising driving source model is proposed in this paper,
with an accompanying speech waveform model. Experiments
have been carried out to test the performance of the proposed
algorithm in various SNR conditions, showing that the pro-
posed method can track pitch periods more successfully than
state-of-the-art algorithms under noisy conditions.
The paper is organised as follows. Section 2 describes the
source-filter speech model. In Section 3, a detailed description
of the application of variable rate particle filters to the problem
is presented. An initialization step for the particle filter using a
joint optimization approach is derived in Section 4. Section 5
describes the details of the Rao-Blackwellisation approach to
the previous variable rate particle filter. Section 6 gives experimental
results for the proposed methods and compares them with
the YIN algorithm. Finally, conclusions are drawn in Section 7.
II. SPEECH MODEL
A. A Time-Varying AR Source-Filter Model
It is known that in human speech the fundamental period has
a lower bound $T_{\mathrm{low}}$ and an upper bound $T_{\mathrm{upp}}$. The $n$-th period
$T_n$ is thus modeled as
$$T_n = T_{n-1} + \tau_n, \qquad T_{\mathrm{low}} < T_n < T_{\mathrm{upp}}. \tag{1}$$
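As an illustration of the constrained random walk in (1), the following Python sketch draws a sequence of pitch periods. This section does not specify the distribution of the increment $\tau_n$, so a zero-mean Gaussian increment, redrawn whenever the proposal leaves the bounds, is assumed here purely for illustration. The function name `simulate_periods`, the step size, and the numerical bounds (periods in samples, loosely corresponding to 80 Hz to 400 Hz at an assumed 16 kHz sampling rate) are hypothetical.

```python
import random


def simulate_periods(n_periods, t_init, t_low, t_upp, step_std, seed=0):
    """Draw periods from T_n = T_{n-1} + tau_n with T_low < T_n < T_upp.

    Assumption (not from the paper): tau_n ~ N(0, step_std^2), redrawn
    until the bound constraint holds.
    """
    if not (t_low < t_init < t_upp):
        raise ValueError("initial period must lie strictly inside the bounds")
    rng = random.Random(seed)
    periods = [t_init]
    while len(periods) < n_periods:
        candidate = periods[-1] + rng.gauss(0.0, step_std)
        if t_low < candidate < t_upp:  # keep only in-range proposals
            periods.append(candidate)
    return periods


# Hypothetical values: periods measured in samples.
periods = simulate_periods(
    n_periods=50, t_init=100.0, t_low=40.0, t_upp=200.0, step_std=2.0
)
```

Redrawing out-of-range proposals is only one way to respect the bounds; truncated or reflected increments would serve the same purpose.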
The speech signal at time $t$ in the current period $n_t$ is represented
as a periodic source input to an $M$-th order time-varying
AR model,
$$s_t = \sum_{p=1}^{M} a_{n_t}^{p} s_{(t-p)} + V_t, \tag{2}$$
where $V_t$ denotes the input source to the AR model and can
be modeled as either near-periodic signals or glottal pulse
sequences, while the current period $n_t$ is $n : t \in [P_n, P_{n+1}]$.
$P_n$ is the time when the $n$-th period starts, i.e. $P_n = \sum_{i=1}^{n-1} T_i$.
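The pieces of (2) can be put together in a small synthesis sketch. Period start times are cumulative sums of the periods, the current period index is found by locating $t$ among them, and the signal is generated by the AR recursion with one coefficient set per period. Several simplifications are assumed for illustration and are not taken from the paper: the source $V_t$ is a unit pulse at each period start, periods are integers, the interval is treated as half-open so that each $t$ belongs to exactly one period, and samples before $t = 0$ are zero. All function names and numerical values are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def period_starts(periods):
    """P_n = sum_{i=1}^{n-1} T_i, so the first period starts at 0."""
    return [0] + list(accumulate(periods))[:-1]


def current_period(t, starts):
    """Return the 1-based index n_t such that P_n <= t < P_{n+1}."""
    return bisect_right(starts, t)


def synthesize(periods, ar_per_period, n_samples):
    """Generate s_t = sum_p a^p_{n_t} s_{t-p} + V_t.

    Assumption: V_t is a unit pulse at every period start, zero elsewhere.
    ar_per_period[n - 1] holds the M coefficients used during period n.
    """
    starts = period_starts(periods)
    pulse_times = set(starts)
    s = []
    for t in range(n_samples):
        coeffs = ar_per_period[current_period(t, starts) - 1]
        v_t = 1.0 if t in pulse_times else 0.0
        # Samples before t = 0 are taken as zero.
        ar_part = sum(
            a * s[t - p] for p, a in enumerate(coeffs, start=1) if t - p >= 0
        )
        s.append(ar_part + v_t)
    return s


# Hypothetical example: three periods, second-order stable AR filters.
periods = [80, 82, 85]
ar = [[1.2, -0.6], [1.1, -0.5], [1.0, -0.4]]
signal = synthesize(periods, ar, n_samples=sum(periods))
```

Switching the coefficient set at each period boundary mirrors the dependence of the AR coefficients on $n_t$ in (2).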
2329-9290 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.