Proc. of the 14th Int. Conference on Digital Audio Effects (DAFx-11), Paris, France, September 19-23, 2011
CONSTRUCTING AN INVERTIBLE CONSTANT-Q TRANSFORM WITH
NONSTATIONARY GABOR FRAMES
Gino Angelo Velasco∗†, Nicki Holighaus∗, Monika Dörfler∗, Thomas Grill♯
∗ NuHAG, Faculty of Mathematics, University of Vienna, Austria
† Institute of Mathematics, University of the Philippines, Diliman, Quezon City, Philippines
♯ Austrian Research Institute for Artificial Intelligence (OFAI), Vienna, Austria
{gino.velasco,nicki.holighaus,monika.doerfler}@univie.ac.at, thomas.grill@ofai.at
ABSTRACT
An efficient and perfectly invertible signal transform featuring a
constant-Q frequency resolution is presented. The proposed ap-
proach is based on the idea of the recently introduced nonstation-
ary Gabor frames. Exploiting the properties of the operator corre-
sponding to a family of analysis atoms, this approach overcomes
the problems of the classical implementations of constant-Q transforms,
in particular, computational intensity and lack of invertibility.
Perfect reconstruction is guaranteed by using an easy-to-calculate
dual system in the synthesis step, and computation time is kept
low by applying FFT-based processing. The proposed method is
applied to real-life signals and evaluated in comparison to a related
approach, recently introduced specifically for audio signals.
1. INTRODUCTION
Many traditional signal transforms impose a regular spacing of fre-
quency bins. In particular, Fourier transform based methods such
as the short-time Fourier transform (STFT) lead to a frequency
resolution that does not depend on frequency, but is constant over
the whole frequency range. In contrast, the constant-Q transform
(CQT), originally introduced by J. Brown [1, 2], features a frequency
resolution that depends on the center frequencies of the windows used
for each bin, and the center frequencies of the frequency bins are
spaced geometrically rather than linearly. In this sense, the principal
idea of the CQT is reminiscent of wavelet transforms, compare [3]: the
Q-factor, i.e. the ratio of the center frequency to the bandwidth, is
constant over all bins, and thus the frequency resolution is better for
low frequencies, whereas time resolution improves
with increasing frequency. However, the transform proposed in the
original paper [1] is not invertible and does not rely on any concept
of (orthonormal) bases. In fact, the number of bins used per octave
is much higher than most traditional wavelet techniques would allow
for. Furthermore, the computational efficiency of the original
transform and its improved versions [4] may be insufficient.
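The geometric spacing and the constant Q-factor described above can be illustrated with a minimal Python sketch. The function names, the parameter values (55 Hz to 7040 Hz, 12 bins per octave) and the choice of the bandwidth as the distance to the next higher center frequency are assumptions made for this example only.

```python
import math

def cq_center_frequencies(f_min, f_max, bins_per_octave):
    """Geometrically spaced center frequencies f_k = f_min * 2**(k / B) up to f_max."""
    n_bins = int(math.floor(bins_per_octave * math.log2(f_max / f_min) + 1e-9)) + 1
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def q_factor(bins_per_octave):
    """Q = f_k / bandwidth_k, assuming bandwidth_k = f_{k+1} - f_k (illustrative choice)."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

# Hypothetical parameters: 7 octaves starting at 55 Hz, 12 bins per octave.
freqs = cq_center_frequencies(55.0, 7040.0, 12)
Q = q_factor(12)
bandwidths = [f / Q for f in freqs]  # bandwidth grows in proportion to the center frequency

print(len(freqs), round(Q, 3))
print([round(b, 2) for b in bandwidths[:3]], "...", round(bandwidths[-1], 2))
```

Under these assumptions the bandwidth of the lowest bin is a few Hz while that of the highest bin is several hundred Hz, which is the trade-off between frequency and time resolution mentioned in the text.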
CQTs rely on perception-based considerations, which is one
of the reasons for their importance in the processing of speech and
music signals. In these fields, the lack of invertibility of existing
CQTs has become an important issue: for applications
such as masking of certain signal components or transposition of
an entire signal or, again, some isolated signal components, the
unbiased reconstruction from analysis coefficients is crucial. An
interesting and promising approach to music processing with CQT
was recently suggested in [5], also cf. references therein.
This work was supported by the Vienna Science, Research and Technology
Fund (WWTF) project Audio-Miner (MA09-024) and Austrian
Science Fund (FWF) projects LOCATIF (T384-N13) and SISE (S10602-N13).
In the present contribution, we take a different point of view
and consider both the implementation and inversion of a constant-
Q transform in the context of the nonstationary Gabor transform
(NSGT). The classical Gabor transform [6, 7] may be understood as
a sampled STFT or sliding window transform. The generalization
to NSGT was introduced in [8, 9] and allows for windows with
flexible, adaptive bandwidths. Figure 1 shows examples of spec-
trograms of the same signal obtained from the classical sampled
STFT (Gabor transform) and the proposed constant-Q nonstation-
ary Gabor transform (CQ-NSGT).
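To make the notion of a sampled STFT concrete, the following NumPy sketch computes Gabor coefficients with a fixed window, a fixed hop size and a fixed number of frequency channels. The function name, the Hann window and all parameter values are assumptions chosen for illustration. Note that the frequency bins are spaced linearly, fs / n_freq apart, at all frequencies.

```python
import numpy as np

def gabor_transform(signal, window, hop, n_freq):
    """Sampled STFT: one FFT of the windowed signal per time shift of `hop` samples."""
    win_len = len(window)
    assert n_freq >= win_len, "fewer channels than window samples would truncate the frame"
    n_frames = 1 + (len(signal) - win_len) // hop
    coeffs = np.empty((n_frames, n_freq), dtype=complex)
    for n in range(n_frames):
        segment = signal[n * hop : n * hop + win_len] * window
        coeffs[n] = np.fft.fft(segment, n_freq)  # zero-pads if n_freq > win_len
    return coeffs

# Hypothetical test signal: a 1000 Hz tone sampled at 8000 Hz.
fs = 8000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
coeffs = gabor_transform(tone, np.hanning(256), hop=64, n_freq=256)

bin_spacing_hz = fs / 256                     # identical for every channel
peak_bins = np.argmax(np.abs(coeffs[:, :129]), axis=1)
print(coeffs.shape, bin_spacing_hz, peak_bins[0] * bin_spacing_hz)
```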
If the analysis windows are chosen appropriately, both analysis
and reconstruction are realized efficiently with FFT-based methods.
The original motivation for the introduction of NSGT was
the desire to adapt both window size and sampling density in time,
in order to resolve transient signal components more accurately.
Here, we apply the same idea in frequency: we use windows with
adaptive, compact bandwidth and choose the time-shift parameters
dependent on the bandwidth of each window. The construction of
the atoms, i.e. the shifted versions of the basic window functions
used in the transform, is done directly in the frequency domain,
see Sections 2.2 and 3.1. This approach allows for efficient implementation
using the FFT, as explained in Section 2.3. To exploit
the efficiency of the FFT, the signal of interest must be transformed
into the frequency domain. For long real-life signals (e.g. signals
longer than 10 seconds at a sampling rate of 44100 Hz), processing
is therefore done on consecutive time-slices, which is a natural
processing step in real-time signal analysis¹. The resolution of the
proposed CQ-NSGT is identical to that of the CQT, and perfect reconstruction
is assured by relying on concepts from frame theory,
which will be discussed next.
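The idea outlined in this paragraph can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the windows are Hann-shaped and built directly on the FFT bins, each channel has as many coefficients as its window has frequency bins (so the time sampling follows the bandwidth), and two filler windows below f_min and above f_max are added so that every bin is covered. All function names and parameter values are hypothetical, and the phase convention of the coefficients (relative to the lower band edge) is a simplification.

```python
import numpy as np

def cq_windows(n, fs, f_min, f_max, bins_per_octave):
    """Frequency-domain windows on the rfft bins 0..n//2 as (bin_indices, values) pairs."""
    n_cq = int(np.floor(bins_per_octave * np.log2(f_max / f_min) + 1e-9)) + 1
    centers = f_min * 2.0 ** (np.arange(n_cq) / bins_per_octave) * n / fs  # in bins
    half_widths = centers * (2.0 ** (1.0 / bins_per_octave) - 1.0)         # constant Q
    bins = np.arange(n // 2 + 1)
    windows, power = [], np.zeros(bins.size)
    for c, h in zip(centers, half_widths):
        idx = bins[np.abs(bins - c) < h]               # compact support in frequency
        g = 0.5 + 0.5 * np.cos(np.pi * (idx - c) / h)  # Hann shape
        windows.append((idx, g))
        power[idx] += g ** 2
    # Assumed filler windows below f_min and above f_max so that all bins are covered.
    filler = np.sqrt(np.clip(1.0 - power, 0.0, 1.0))
    for mask in (bins < centers[0], bins > centers[-1]):
        idx = bins[mask & (filler > 0)]
        windows.append((idx, filler[idx]))
    return windows

def nsgt_analysis(x, windows):
    """One coefficient vector per window; its length equals the window's width in bins."""
    spectrum = np.fft.rfft(x)
    return [np.fft.ifft(spectrum[idx] * g) for idx, g in windows]

def nsgt_synthesis(coeffs, windows, n):
    """Inversion with dual windows g / diag, where diag is the sum of squared windows."""
    diag = np.zeros(n // 2 + 1)
    for idx, g in windows:
        diag[idx] += g ** 2        # diagonal frame operator (NumPy fft/ifft scaling)
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    for c, (idx, g) in zip(coeffs, windows):
        spectrum[idx] += np.fft.fft(c) * g / diag[idx]
    return np.fft.irfft(spectrum, n)

# Hypothetical example: white noise, 5 octaves from 100 Hz at 12 bins per octave.
n, fs = 4096, 8000
x = np.random.default_rng(0).standard_normal(n)
windows = cq_windows(n, fs, 100.0, 3200.0, 12)
coeffs = nsgt_analysis(x, windows)
x_rec = nsgt_synthesis(coeffs, windows, n)
print(len(windows), len(coeffs[0]), len(coeffs[60]), np.max(np.abs(x - x_rec)))
```

In this sketch, reconstruction is exact up to rounding because every frequency bin is covered by at least one window, so the sum of squared windows never vanishes. Low-frequency channels hold only a handful of coefficients, whereas high-frequency channels hold many more.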
2. NONSTATIONARY GABOR FRAMES
Frames were first mentioned in [10], also see [11, 12]. Frames are
a generalization of (orthonormal) bases and allow for redundancy
and thus for much more flexibility in the design of the signal representation.
Consequently, frames may be tailored to a specific application
¹ If the time-slicing is done using smooth windows with a judiciously
chosen amount of zero-padding, no undesired artifacts are to be expected
after modification of the analysis coefficients. Mathematical details and
error estimates will be given elsewhere.