Manuscript received July 11, 2008. 0098-3063/08/$20.00 © 2008 IEEE
Robust On-line Beat Tracking with Kalman Filtering
and Probabilistic Data Association (KF-PDA)
Yu Shiu, Student Member, IEEE, Namgook Cho, Student Member, IEEE, Pei-Chen Chang
and C.-C. Jay Kuo, Fellow, IEEE
Abstract — A Kalman filtering (KF) approach to on-line
musical beat tracking with probabilistic data association
(PDA) is investigated in this work. We first formulate the beat
tracking process as a linear dynamic system of beat
progression, and then apply the Kalman filtering algorithm to
the dynamic system in estimating the time-varying tempo and
beat locations. Musical beat tracking using traditional
Kalman filtering is, however, not reliable in the presence of
tempo fluctuations and expressive timing deviations. To
address this problem, we adopt data association techniques to
assign probability masses to all possible beat interpretations,
and then locate the true beat according to the weighting. Two
methods are proposed. The first (PDA-I) weights each
candidate observation by its distance to the predicted
beat location, while the second (PDA-II) considers not
only the distance but also the onset intensity in weight
selection. Superior performance of the proposed beat tracking
algorithm is demonstrated with simulation results on the
Music Information Retrieval Evaluation Exchange (MIREX)
2006 beat tracking competition practice dataset and the
Billboard Top-10 database.¹
Index Terms — Musical signal processing, on-line beat
tracking, Kalman filter, probabilistic data association, music
information retrieval.
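The formulation above, a linear dynamic system for beat progression with PDA-weighted fusion of onset observations, can be sketched roughly as follows. This is a minimal illustration under assumed settings: the two-dimensional state (beat time and beat period), the noise values, the gate width, and the function name `kf_pda_step` are all hypothetical choices for exposition, not the paper's actual parameters.

```python
import math

def kf_pda_step(x, P, onsets, intensities=None, q=1e-4, r=4e-4, gate=0.1):
    """One predict/update cycle of a Kalman beat tracker with
    probabilistic data association (illustrative sketch only).

    State x = [beat_time, beat_period]; the next beat is predicted at
    beat_time + beat_period. Candidate onsets inside the gate are fused
    with weights proportional to their Gaussian likelihood (PDA-I), or
    to likelihood times onset intensity (PDA-II, when intensities are
    supplied). All noise values here are assumptions.
    """
    # --- Predict with linear dynamics A = [[1, 1], [0, 1]] ---
    t_pred = x[0] + x[1]
    p_pred = x[1]
    # P_pred = A P A^T + Q, written out for the 2x2 case
    p00 = P[0][0] + P[0][1] + P[1][0] + P[1][1] + q
    p01 = P[0][1] + P[1][1]
    p10 = P[1][0] + P[1][1]
    p11 = P[1][1] + q
    # Innovation covariance for observation z = beat time (H = [1, 0])
    S = p00 + r
    # --- Probabilistic data association over gated onset candidates ---
    cands = [(z, i) for i, z in enumerate(onsets) if abs(z - t_pred) < gate]
    if cands:
        weights = []
        for z, i in cands:
            w = math.exp(-0.5 * (z - t_pred) ** 2 / S)   # PDA-I term
            if intensities is not None:                  # PDA-II term
                w *= intensities[i]
            weights.append(w)
        total = sum(weights)
        # Combined innovation: weighted sum of candidate residuals
        nu = sum(w * (z - t_pred) for w, (z, _) in zip(weights, cands)) / total
        # Kalman gain K = P_pred H^T / S
        k0, k1 = p00 / S, p10 / S
        t_new, p_new = t_pred + k0 * nu, p_pred + k1 * nu
        # Covariance update P = (I - K H) P_pred (PDA spread term omitted)
        P_new = [[(1 - k0) * p00, (1 - k0) * p01],
                 [p10 - k1 * p00, p11 - k1 * p01]]
    else:
        # No onset in the gate (e.g., a rest note): coast on the prediction
        t_new, p_new, P_new = t_pred, p_pred, [[p00, p01], [p10, p11]]
    return [t_new, p_new], P_new
```

Each call predicts the next beat, gates nearby onsets, and fuses them through a weighted innovation; when no onset falls in the gate, the tracker coasts on its prediction, which is how rest notes are survived.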
I. INTRODUCTION
When listening to music, most people, even without musical
training, can grasp its speed and follow along by foot-
tapping or hand-clapping on the beats. However, the same
is not true for electronic devices. Automatic beat tracking has
been an active area of research for more than twenty years.
The beat is a fundamental unit of the temporal structure of
music, especially in Western music, and beat tracking is an
essential task in many musical applications such as musical
analysis, synchronization, editing of musical sounds, and
human-computer improvisation. This work presents an on-line
(or causal) musical beat tracking system, where beat
estimation at a given time depends only on past and present
data.
Beat tracking is defined as estimating the possibly time-
varying tempo and the time location of each beat, where the
beat corresponds to foot-tapping instants and the tempo to the
beat rate [1]. Our research goal is to estimate the set of beat
¹ Part of this work was presented at ICCE 2008, Las Vegas, NV, USA.
The authors are with the Department of Electrical Engineering and the
Signal and Image Processing Institute, University of Southern California,
Los Angeles, CA 90089-2564 USA (e-mails: atoultaro@gmail.com,
namgookc@usc.edu, peichenc@usc.edu, and cckuo@sipi.usc.edu).
locations from musical audio signals sequentially. Ideally,
when beat pulses are strong and the duration between adjacent
beats is perceptually clear, automatic beat tracking can be
done easily. In practice, however, its performance degrades
significantly for several reasons. The first stems from rest
notes and missed-beat syncopation: rest notes hide beat-
tracking cues, while syncopation places the onset pulse not on
the expected beat location but at a small shift from it. The
second is variability in human
performance. Even if a performer attempts to keep the
duration between two consecutive beats constant throughout
the whole music piece, the actual duration tends to vary along
time. The last one is that some music pieces have time-varying
tempo and, consequently, a time-varying beat period. The
performance of beat tracking algorithms is often less robust
on classical music than on music containing drum
sounds [1], [2].
Early work on automatic beat tracking was done by
researchers in the fields of music perception and computer
science [3]. More recently, Brown [4] used the autocorrelation
function to examine the pulses in musical scores. Scheirer [5]
applied a bank of comb filters to a musical signal at different
fixed frequencies and searched for the filter that gives the
strongest response for tempo estimation. Afterwards, the beat
location was calculated by examining the phase of the filtered
signal. Goto [2] developed an on-line beat tracking system
that can process music with or without drum sounds. The
system recognizes the hierarchical beat structure using three
kinds of musical knowledge: onset times, chord changes, and
drum patterns. A probabilistic generative model for tempo
tracking was examined by Cemgil et al. [6],[7]. A Kalman
filtering process was used to track beats in [6], followed by a
tempogram representation that assigns probability masses to
all possible beat candidates, while Monte Carlo methods were
exploited to infer a hidden tempo variable in [7]. Hainsworth
and Macleod [8] used particle
filters to associate onsets from an audio signal to a time-
varying tempo process so as to determine the beat locations.
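Regarding the autocorrelation approach to tempo estimation mentioned above [4], the idea can be sketched on an onset-strength envelope as follows. The function name, frame rate, and BPM bounds here are illustrative assumptions, not details of the cited system, which operated on musical scores.

```python
def tempo_from_autocorr(onset_env, frame_rate, bpm_lo=60, bpm_hi=180):
    """Estimate a global tempo by autocorrelating an onset-strength
    envelope and picking the strongest lag within a plausible
    beat-period range (a rough illustration of the autocorrelation
    idea, not any published system's exact procedure)."""
    n = len(onset_env)
    lag_lo = max(1, int(frame_rate * 60.0 / bpm_hi))  # shortest period
    lag_hi = int(frame_rate * 60.0 / bpm_lo)          # longest period
    best_lag, best_score = lag_lo, float("-inf")
    for lag in range(lag_lo, min(lag_hi, n - 1) + 1):
        # Unnormalized autocorrelation of the envelope at this lag
        score = sum(onset_env[i] * onset_env[i - lag] for i in range(lag, n))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frame_rate / best_lag               # tempo in BPM
```

A periodic envelope scores highest at lags matching its pulse period, so the strongest in-range lag maps directly to a tempo estimate; comb-filterbank methods such as [5] pursue the same periodicity with a bank of resonators instead.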
Most earlier work on beat tracking used symbolic or
musical instrument digital interface (MIDI) data, e.g.,
[4],[6],[7]. Audio signals have been examined more recently,
e.g., [2],[5],[8]. In addition, most previous beat tracking
systems adopt a non-causal method that allows the use of
future data and backward decoding, which is not suitable for
real-time implementation in consumer electronic applications.
In this work, we present a method that extracts beat
locations from acoustic musical signals, not limited to any
particular music type, including both classical music and