NSGT.zip_cqt resource - CSDN Library (CSDN文库)

7 files in total: 4 PDF, 3 ZIP

53 views · uploaded 2022-09-24 20:04:14 · 9.01 MB ZIP

NSGT.zip (7 files)

NSGT

CONSTRUCTING AN INVERTIBLE CONSTANT-Q TRANSFORM WITH.pdf 3.8MB

dogrhove12_amsart.pdf 935KB

NSGToolbox_V010.zip 548KB

CQT_toolbox_2013.zip 1.71MB

cqt_toolbox.zip 13KB

smc2010.pdf 466KB

schoerkhuber-aes-2014.pdf 2.32MB

Proc. of the 14th Int. Conference on Digital Audio Effects (DAFx-11), Paris, France, September 19-23, 2011

CONSTRUCTING AN INVERTIBLE CONSTANT-Q TRANSFORM WITH NONSTATIONARY GABOR FRAMES

Gino Angelo Velasco ∗†, Nicki Holighaus ∗, Monika Dörfler ∗, Thomas Grill ♯

∗ NuHAG, Faculty of Mathematics, University of Vienna, Austria

† Institute of Mathematics, University of the Philippines, Diliman, Quezon City, Philippines

♯ Austrian Research Institute for Artificial Intelligence (OFAI), Vienna, Austria

{gino.velasco,nicki.holighaus,monika.doerfler}@univie.ac.at, thomas.grill@ofai.at

ABSTRACT

An efficient and perfectly invertible signal transform featuring a constant-Q frequency resolution is presented. The proposed approach is based on the idea of the recently introduced nonstationary Gabor frames. Exploiting the properties of the operator corresponding to a family of analysis atoms, this approach overcomes the problems of the classical implementations of constant-Q transforms, in particular, computational intensity and lack of invertibility. Perfect reconstruction is guaranteed by using an easy to calculate dual system in the synthesis step and computation time is kept low by applying FFT-based processing. The proposed method is applied to real-life signals and evaluated in comparison to a related approach, recently introduced specifically for audio signals.

1. INTRODUCTION

Many traditional signal transforms impose a regular spacing of frequency bins. In particular, Fourier transform based methods such as the short-time Fourier transform (STFT) lead to a frequency resolution that does not depend on frequency, but is constant over the whole frequency range. In contrast, the constant-Q transform (CQT), originally introduced by J. Brown [1, 2], features a frequency resolution dependent on the center frequencies of the windows used for each bin, and the center frequencies of the frequency bins are not linearly, but geometrically spaced. In this sense, the principal idea of CQT is reminiscent of wavelet transforms, compare [3]: the Q-factor, i.e. the ratio of the center frequency to bandwidth, is constant over all bins and thus the frequency resolution is better for low frequencies whereas time resolution improves with increasing frequency. However, the transform proposed in the original paper [1] is not invertible and does not rely on any concept of (orthonormal) bases. In fact, the number of bins used per octave is much higher than most traditional wavelet techniques would allow for. Furthermore, the computational efficiency of the original transform and its improved versions, [4], may be insufficient.

CQTs rely on perception-based considerations, which is one of the reasons for their importance in the processing of speech and music signals. In these fields, the lack of invertibility of existing CQTs has become an important issue: for important applications such as masking of certain signal components or transposition of an entire signal or, again, some isolated signal components, the unbiased reconstruction from analysis coefficients is crucial. An interesting and promising approach to music processing with CQT was recently suggested in [5], also cf. references therein.

This work was supported by the Vienna Science, Research and Technology Fund (WWTF) project Audio-Miner (MA09-024) and Austrian Science Fund (FWF) projects LOCATIF (T384-N13) and SISE (S10602-N13).

In the present contribution, we take a different point of view and consider both the implementation and inversion of a constant-Q transform in the context of the nonstationary Gabor transform (NSGT). Classical Gabor transform [6, 7] may be understood as a sampled STFT or sliding window transform. The generalization to NSGT was introduced in [8, 9] and allows for windows with flexible, adaptive bandwidths. Figure 1 shows examples of spectrograms of the same signal obtained from the classical sampled STFT (Gabor transform) and the proposed constant-Q nonstationary Gabor transform (CQ-NSGT).

If the analysis windows are chosen appropriately, both analysis and reconstruction are realized efficiently with FFT-based methods. The original motivation for the introduction of NSGT was the desire to adapt both window size and sampling density in time, in order to resolve transient signal components more accurately. Here, we apply the same idea in frequency: we use windows with adaptive, compact bandwidth and choose the time-shift parameters dependent on the bandwidth of each window. The construction of the atoms, i.e. the shifted versions of the basic window functions used in the transform, is done directly in the frequency domain, see Sections 2.2 and 3.1. This approach allows for efficient implementation using the FFT, as explained in Section 2.3. To exploit the efficiency of FFT, the signal of interest must be transformed into the frequency domain. For long real-life signals (e.g. signals longer than 10 seconds at a sampling rate of 44100 Hz), processing is therefore done on consecutive time-slices, which is a natural processing step in real-time signal analysis. The resolution of the proposed CQ-NSGT is identical to that of the CQT and perfect reconstruction is assured by relying on concepts from frame theory, which will be discussed next.

2. NONSTATIONARY GABOR FRAMES

Frames were first mentioned in [10], also see [11, 12]. Frames are a generalization of (orthonormal) bases and allow for redundancy and thus for much more flexibility in design of the signal representation. Thus, frames may be tailored to a specific application or certain requirements such as a constant-Q frequency resolution. Loosely speaking, we wish to expand, or represent, a given signal of interest as a linear combination of some building blocks or atoms $\varphi_{n,k}$, with $(n, k) \in \mathbb{Z} \times \mathbb{Z}$, which are the members of our frame:

$$f = \sum_{n,k} c_{n,k}\, \varphi_{n,k} \qquad (1)$$

for some coefficients $c_{n,k}$. The double indexes $(n, k)$ allude to the fact that each atom has a certain location and concentration in time and frequency, compare Figure 2. Frame theory now allows us to determine under which conditions an expansion (1) is possible and how coefficients leading to stable, perfect reconstruction may be determined.

Footnote to Section 1 (processing of long signals on consecutive time-slices): If the time-slicing is done using smooth windows with a judiciously chosen amount of zero-padding, no undesired artifacts after modification of the analysis coefficients have to be expected. Mathematical details and error estimates will be given elsewhere.

[Figure 1 image: two panels titled "Kafziel − dB-scaled regular Gabor transform" and "Kafziel − dB-scaled CQ-NSGT"; horizontal axis: time (seconds), 0 to 12; vertical axis: frequency (Hz), 200 to 22050.]

Figure 1: Representations of a musical piece for violin and piano using the classical sampled STFT (Gabor transform) and the CQ-NSGT, respectively. A Hann window of length 1024 samples with a hop-size of 512 samples was used for the Gabor transform, while a minimum frequency of $\xi_{\min} = 50$ Hz at 48 bins per octave was used for the CQ-NSGT.

We introduce the concept of frames for a Hilbert space $\mathcal{H}$. In a continuous setting, one may think of $\mathcal{H} = L^2(\mathbb{R})$, whereas we will choose $\mathcal{H} = \mathbb{C}^L$, $L$ being the signal length, for describing the implementation.

2.1. Frames

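Consider a collection of atoms $\varphi_{n,k} \in \mathcal{H}$ with $(n, k) \in \mathbb{Z} \times \mathbb{Z}$. Here, $n$ may be thought of as a time index and $k$ as an index related to frequency. We then define the frame operator $S$ by

$$Sf = \sum_{n,k} \langle f, \varphi_{n,k} \rangle\, \varphi_{n,k}$$

for all $f \in \mathcal{H}$. Note that, if the set of functions $\{\varphi_{n,k},\ (n, k) \in \mathbb{Z} \times \mathbb{Z}\}$ is an orthonormal basis, then $S$ is the identity operator. If $S$ is invertible on $\mathcal{H}$, then the collection $\{\varphi_{n,k}\}$, $(n, k) \in \mathbb{Z} \times \mathbb{Z}$, is a frame. In this case, we may define a dual frame by

$$\tilde{\varphi}_{n,k} = S^{-1} \varphi_{n,k}.$$

Then, reconstruction from the coefficients $c_{n,k} = \langle f, \varphi_{n,k} \rangle$ is possible:

$$f = S^{-1} S f = \sum_{n,k} \langle f, \varphi_{n,k} \rangle\, S^{-1} \varphi_{n,k} = \sum_{n,k} c_{n,k}\, \tilde{\varphi}_{n,k}.$$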
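To make these definitions concrete, the following is a minimal numerical sketch in the finite-dimensional case $\mathcal{H} = \mathbb{C}^L$. The random atoms, the sizes `L` and `N`, and the single index `j` standing in for the pair $(n, k)$ are illustrative assumptions and do not come from the paper. The frame operator is inverted here with a generic linear solve, which is the costly step that the painless construction of Section 2.2 avoids.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 8, 20  # signal length and number of atoms (N > L gives redundancy)

# Hypothetical frame: row j of Phi is the atom phi_j in C^L.
Phi = rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))

# Frame operator as an L x L matrix: (S f)[t] = sum_j <f, phi_j> phi_j[t],
# with the inner product <f, g> = sum_t f[t] * conj(g[t]).
S = Phi.T @ Phi.conj()

# Dual atoms S^{-1} phi_j, stored as the columns of an L x N matrix.
Phi_dual = np.linalg.solve(S, Phi.T)

f = rng.standard_normal(L) + 1j * rng.standard_normal(L)
c = Phi.conj() @ f      # analysis coefficients c_j = <f, phi_j>
f_rec = Phi_dual @ c    # synthesis: sum_j c_j * S^{-1} phi_j

# For an orthonormal basis the frame operator reduces to the identity.
U, _ = np.linalg.qr(rng.standard_normal((L, L)))
S_onb = U.T @ U.conj()

print(np.allclose(f_rec, f), np.allclose(S_onb, np.eye(L)))
```

Any family of atoms that spans $\mathbb{C}^L$ would serve equally well in this sketch; the random choice only keeps the example short.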

2.2. The Case of Painless Nonstationarity

In a general setting, the inversion of the operator $S$ poses a problem in numerical realization of frame analysis. However, it was shown in [13] that under certain conditions, usually fulfilled in practical applications, $S$ is diagonal. This situation of painless non-orthogonal expansions can now be generalized to allowing for adaptive resolution. Adaptive time-resolution was described in [8, 9], and here we turn to adaptivity in frequency in the same manner.

In the sequel, let $T_x$ denote a time-shift by $x$, $M_\omega$ denote a frequency shift (or modulation) by $\omega$ and $\mathcal{F}f = \hat{f}$ the Fourier transform of $f$. Let $\varphi_k$, $k \in \mathbb{Z}$, be band-limited windows, well-localized in time, whose Fourier transforms $\psi_k = \hat{\varphi}_k$ are centered around possibly irregularly (or, e.g. geometrically) spaced frequency points $\xi_k$.

Then, we choose frequency dependent time-shift parameters (hop-sizes) $a_k$ as follows: if the support of $\hat{\varphi}_k$ is contained in an interval of length $|I_k|$, then we choose $a_k$ such that

$$a_k \le \frac{1}{|I_k|} \quad \text{for all } k.$$

In other words, the time-sampling points have to be chosen dense enough to guarantee this condition. Finally, we obtain the frame members by setting

$$\varphi_{n,k} = T_{n a_k} \varphi_k.$$

Under these conditions on the windows $\varphi_k$ and the hop-sizes $a_k$, the frame operator is diagonal in the Fourier domain: since, by unitarity of the Fourier transform [14] and the Walnut representation of the frame operator [15], we have

$$\langle Sf, f \rangle = \sum_{n,k} |\langle f, T_{n a_k} \varphi_k \rangle|^2 = \sum_{n,k} |\langle \hat{f}, M_{-n a_k} \hat{\varphi}_k \rangle|^2 = \Big\langle \sum_k \frac{1}{a_k} |\hat{\varphi}_k|^2 \hat{f},\ \hat{f} \Big\rangle,$$

the frame operator assumes the following form:

$$Sf = \mathcal{F}^{-1} \Big( \sum_k \frac{1}{a_k} |\hat{\varphi}_k|^2\, \hat{f} \Big). \qquad (2)$$
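A short sketch of why the support condition removes all cross terms may help at this point. It is the standard Fourier-series (Parseval) argument in the continuous setting, written out here as an illustration rather than quoted from the paper. The interval $J_k$ is notation introduced only for this sketch: any interval of length exactly $1/a_k$ that contains the support of $\hat{\varphi}_k$, which exists because $a_k \le 1/|I_k|$.

```latex
% For fixed k, the functions \sqrt{a_k}\, e^{2\pi i n a_k \xi}, n \in \mathbb{Z},
% form an orthonormal basis of L^2(J_k), where |J_k| = 1/a_k.
% Apply Parseval to F_k = \hat{f}\, \overline{\hat{\varphi}_k}, which vanishes outside J_k:
\begin{align*}
\sum_{n} \bigl| \langle \hat{f}, M_{-n a_k} \hat{\varphi}_k \rangle \bigr|^2
  &= \sum_{n} \Bigl| \int_{J_k} \hat{f}(\xi)\, \overline{\hat{\varphi}_k(\xi)}\,
       e^{2\pi i n a_k \xi} \, d\xi \Bigr|^2 \\
  &= \frac{1}{a_k} \int_{J_k} |\hat{f}(\xi)|^2\, |\hat{\varphi}_k(\xi)|^2 \, d\xi
   = \Bigl\langle \frac{1}{a_k} |\hat{\varphi}_k|^2 \hat{f},\ \hat{f} \Bigr\rangle .
\end{align*}
% Summing over k gives the last expression in the chain for <Sf, f>.
```

If $a_k$ were larger than $1/|I_k|$, the exponentials would no longer be orthogonal over the support of $\hat{\varphi}_k$ and aliasing terms would appear, so the operator would in general not be diagonal.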

See [16, 13, 17] for detailed proofs of the diagonality of the frame operator in the described setting. From (2), it follows immediately that the frame operator is invertible whenever there exist real numbers $A$ and $B$ such that the inequalities

$$0 < A \le \sum_k \frac{1}{a_k} |\hat{\varphi}_k|^2 \le B < \infty \qquad (3)$$

hold almost everywhere. In this case, the dual frame is given by the elements

$$\tilde{\varphi}_{n,k} = T_{n a_k}\, \mathcal{F}^{-1} \Big( \frac{\hat{\varphi}_k}{\sum_l \frac{1}{a_l} |\hat{\varphi}_l|^2} \Big).$$

2.3. Realization in the Frequency domain

Based on the implementation of nonstationary Gabor frames performing adaptivity in the time domain [9], the above framework permits a fast realization by considering the Fourier transform of the input signal. The transform coefficients $c_{n,k} = \langle f, \varphi_{n,k} \rangle$ take the form

$$c_{n,k} = \langle f, T_{n a_k} \varphi_k \rangle = \langle \hat{f}, M_{-n a_k} \hat{\varphi}_k \rangle$$

and can be calculated, for each $k$, with an inverse FFT (IFFT) of length determined by the support of $\psi_k = \hat{\varphi}_k$. Similarly, reconstruction is realized by applying the dual windows $\hat{\gamma}_k = \hat{\varphi}_k \big/ \sum_l \frac{1}{a_l} |\hat{\varphi}_l|^2$ in a simple overlap-add process:

$$\hat{f} = \sum_{n,k} \langle \hat{f}, M_{-n a_k} \hat{\varphi}_k \rangle\, M_{-n a_k} \hat{\gamma}_k. \qquad (4)$$
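The analysis and overlap-add synthesis steps can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' toolbox code: the function names `nsgt_analysis` and `nsgt_synthesis`, the test windows, and the restriction that each number of time points $N_k = L/a_k$ divides $L$ are all choices made for this example. For clarity each window is stored at full length $L$ and the product is folded to length $N_k$ before the IFFT, whereas an efficient implementation would touch only the bins in the support.

```python
import numpy as np

def hann(t):
    """Hann window supported on (-1/2, 1/2)."""
    return np.where(np.abs(t) < 0.5, 0.5 + 0.5 * np.cos(2 * np.pi * t), 0.0)

def nsgt_analysis(f, windows_hat, n_shifts):
    """Return c[k][n] = <f, T_{n a_k} phi_k>, where a_k = L / n_shifts[k]."""
    L = len(f)
    f_hat = np.fft.fft(f)
    coeffs = []
    for g_hat, N in zip(windows_hat, n_shifts):
        prod = f_hat * np.conj(g_hat)
        # Fold to length N; alias-free if the support of g_hat spans <= N bins.
        folded = prod.reshape(L // N, N).sum(axis=0)
        coeffs.append(np.fft.ifft(folded) * N / L)
    return coeffs

def nsgt_synthesis(coeffs, windows_hat, n_shifts, L):
    """Invert nsgt_analysis using dual windows gamma_hat_k = phi_hat_k / diag."""
    # Diagonal of the frame operator: sum_k (1 / a_k) |phi_hat_k|^2.
    diag = sum(np.abs(g) ** 2 * N / L for g, N in zip(windows_hat, n_shifts))
    f_hat = np.zeros(L, dtype=complex)
    for c, g_hat, N in zip(coeffs, windows_hat, n_shifts):
        # Overlap-add: periodize the length-N FFT, weight by the dual window.
        f_hat += np.tile(np.fft.fft(c), L // N) * (g_hat / diag)
    return np.fft.ifft(f_hat)

# Hypothetical test system on L = 128 bins: Hann bumps in frequency whose
# widths grow with the distance of the centre bin from frequency 0.
L = 128
centers = [0, 8, 16, 32, 64, 96, 112, 120]   # centre bins
widths = [16, 16, 24, 48, 64, 48, 24, 16]    # support lengths in bins
n_shifts = [16, 16, 32, 64, 64, 64, 32, 16]  # N_k >= widths[k], N_k divides L
bins = np.arange(L)
windows_hat = [hann(((bins - c0 + L // 2) % L - L // 2) / w)
               for c0, w in zip(centers, widths)]

rng = np.random.default_rng(1)
f = rng.standard_normal(L)
coeffs = nsgt_analysis(f, windows_hat, n_shifts)
f_rec = nsgt_synthesis(coeffs, windows_hat, n_shifts, L)

# Cross-check one coefficient against the time-domain definition.
k, n = 3, 5
phi_k = np.fft.ifft(windows_hat[k])
direct = np.vdot(np.roll(phi_k, n * L // n_shifts[k]), f)
print(np.allclose(f_rec, f), np.isclose(coeffs[k][n], direct))
```

Each channel produces only `n_shifts[k]` coefficients, so narrow windows are sampled coarsely in time and wide windows densely, which is the frequency-dependent hop size described above.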

3. THE CQ-NSGT PARAMETERS: WINDOWS AND LATTICES

We will now describe in detail the parameters involved in the design of a nonstationary Gabor transform with constant-Q frequency resolution.

The CQT in [1] depends on the following parameters: the window functions, the number of frequency bins per octave, the minimum and maximum frequencies. These parameters determine the Q-factor, which is, as mentioned before, the ratio of the center frequency to the bandwidth. Here, the Q-factor is desired to be constant for all the relevant bins.

Let $B$ and $\xi_{\min}$ denote the number of frequency bins per octave and the desired minimum frequency, respectively. For the proposed CQ-NSGT, we consider band-limited window functions $\varphi_k \in \mathbb{C}^L$, $k = 1, \ldots, K$, with center frequencies $\xi_k$ (in Hz) satisfying $\xi_k = \xi_{\min} 2^{\frac{k-1}{B}}$, as in the classical CQT. The maximum frequency $\xi_{\max}$ is restricted to be less than the Nyquist frequency $\xi_s/2$, where $\xi_s$ denotes the sampling frequency. Further, we require the existence of an index $K$ such that $\xi_{\max} \le \xi_K < \xi_s/2$. We may set $K = \lceil B \log_2(\xi_{\max}/\xi_{\min}) + 1 \rceil$, with $\lceil z \rceil$ denoting the smallest integer greater than or equal to $z$.

Note that, since the frequency spacing in the CQT is geometric, no 0-frequency is present and some high frequency content might not be represented. In the CQ-NSGT, however, there is freedom to use additional center frequencies, at negligible computational cost, to guarantee perfect reconstruction.

In our current implementation, tailored to (real) audio signals, we consider some symmetry in the frequency domain, and take the following values for the frequency-centers $\xi_k$:

$$\xi_k = \begin{cases} 0, & k = 0 \\ \xi_{\min} 2^{\frac{k-1}{B}}, & k = 1, \ldots, K \\ \xi_s/2, & k = K + 1 \\ \xi_s - \xi_{2K+2-k}, & k = K + 2, \ldots, 2K + 1. \end{cases}$$
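The index bookkeeping in this definition is easy to get wrong, so a small sketch may be useful. The function name `center_frequencies` and the parameter values in the usage lines are assumptions chosen for illustration; only the formulas for $K$ and $\xi_k$ come from the text.

```python
import math

def center_frequencies(xi_min, xi_max, bins_per_octave, xi_s):
    """Return K and the list xi[0..2K+1] of centre frequencies in Hz."""
    B = bins_per_octave
    K = math.ceil(B * math.log2(xi_max / xi_min) + 1)
    xi = [0.0]                                                    # k = 0
    xi += [xi_min * 2 ** ((k - 1) / B) for k in range(1, K + 1)]  # k = 1..K
    xi.append(xi_s / 2)                                           # k = K + 1 (Nyquist)
    # k = K+2..2K+1: mirror images of xi_K, ..., xi_1 about the Nyquist frequency.
    xi += [xi_s - xi[2 * K + 2 - k] for k in range(K + 2, 2 * K + 2)]
    return K, xi

# Hypothetical parameters: 12 bins per octave from 55 Hz up to at least 7 kHz.
xi_min, xi_max, xi_s = 55.0, 7000.0, 44100.0
K, xi = center_frequencies(xi_min, xi_max, 12, xi_s)

# The construction requires xi_max <= xi_K < xi_s / 2.
print(K, len(xi), xi_max <= xi[K] < xi_s / 2)
```

The mirrored entries correspond to the negative frequencies of a real signal, which is why the list has $2K + 2$ entries in total.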

The bandwidth $\Omega_k$ (the support of the window in frequency) of $\varphi_k$ is set to be $\Omega_k = \xi_{k+1} - \xi_{k-1}$, for $k = 2, \ldots, K - 1$, which leads to a constant Q-factor $Q = (2^{\frac{1}{B}} - 2^{-\frac{1}{B}})^{-1}$. To obtain the same Q-factor on the relevant frequency bins, $\Omega_1$ and $\Omega_K$ are therefore set to be $\xi_1/Q$ and $\xi_K/Q$, respectively. Finally, we let $\Omega_0 = 2\xi_{\min}$ and $\Omega_{K+1} = \xi_s - 2\xi_K$. In summary, we have the following values for $\Omega_k$:

$$\Omega_k = \begin{cases} 2\xi_{\min}, & k = 0 \\ \xi_k/Q, & k = 1, \ldots, K \\ \xi_s - 2\xi_K, & k = K + 1 \\ \xi_{2K+2-k}/Q, & k = K + 2, \ldots, 2K + 1. \end{cases}$$
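As a quick consistency check, the bandwidths can be tabulated and compared with the defining relation $\Omega_k = \xi_{k+1} - \xi_{k-1}$ on the interior bins. The helper name `cq_bandwidths` and the parameter values are assumptions for this sketch.

```python
import math

def cq_bandwidths(xi, K, bins_per_octave, xi_s):
    """Return Q and the bandwidths omega[0..2K+1] in Hz."""
    B = bins_per_octave
    Q = 1.0 / (2 ** (1 / B) - 2 ** (-1 / B))
    omega = [2 * xi[1]]                                # k = 0: 2 * xi_min
    omega += [xi[k] / Q for k in range(1, K + 1)]      # k = 1..K
    omega.append(xi_s - 2 * xi[K])                     # k = K + 1
    omega += [xi[2 * K + 2 - k] / Q for k in range(K + 2, 2 * K + 2)]
    return Q, omega

# Hypothetical parameters: 12 bins per octave, xi_min = 55 Hz, K = 85 bins.
xi_min, B, K, xi_s = 55.0, 12, 85, 44100.0
xi = [0.0] + [xi_min * 2 ** ((k - 1) / B) for k in range(1, K + 1)] + [xi_s / 2]
xi += [xi_s - xi[2 * K + 2 - k] for k in range(K + 2, 2 * K + 2)]
Q, omega = cq_bandwidths(xi, K, B, xi_s)

# On interior bins, xi_k / Q agrees with xi_{k+1} - xi_{k-1}.
interior_ok = all(math.isclose(omega[k], xi[k + 1] - xi[k - 1], rel_tol=1e-9)
                  for k in range(2, K))
print(Q, interior_ok)
```

The agreement follows from $\xi_{k+1} - \xi_{k-1} = \xi_k (2^{1/B} - 2^{-1/B})$, so dividing the center frequency by $Q$ extends the same rule to $k = 1$ and $k = K$, where one neighbour is missing.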

3.1. Window Choice: Satisfying the Frame Conditions

We now give the details on the windows $\varphi_k$ to be used such that (3) and hence the frame property is fulfilled.

We use a Hann window $\hat{h}$ that is zero outside $[-1/2, 1/2]$, i.e. a standard Hann window centered at 0 with support of length 1. We obtain the atoms $\varphi_k$ by translation and dilation of $\hat{h}$:

$$\hat{\varphi}_k[j] = \hat{h}\big( (j \xi_s / L - \xi_k) / \Omega_k \big), \quad k = 1, \ldots, K, K + 2, \ldots, 2K + 1, \quad j = 0, \ldots, L - 1.$$

For the windows corresponding to the 0 and Nyquist frequencies, we use a plateau-like function $\hat{g}$, e.g. a Tukey window. We obtain $\varphi_0$ and $\varphi_{K+1}$ by setting

$$\hat{\varphi}_k[j] = \hat{g}\big( (j \xi_s / L - \xi_k) / \Omega_k \big), \quad k = 0, K + 1.$$

Now, for the time-shift parameters $a_k$ of the constructed windows, we require $a_k \le \xi_s / \Omega_k$ in order to satisfy (3). The $\varphi_{n,k}$ are then given by their Fourier transforms as:

$$\widehat{\varphi_{n,k}} = M_{-n a_k} \hat{\varphi}_k, \quad n = 0, \ldots, \Big\lceil \frac{L}{a_k} \Big\rceil - 1.$$

Figure 2 illustrates the time-frequency sampling grid of the set-up with the sampling points taken geometrically over frequency and linearly over time. Given these parameters, the coefficients of the CQ-NSGT are of the form $c_{n,k} = \langle f, \varphi_{n,k} \rangle = \langle \hat{f}, \widehat{\varphi_{n,k}} \rangle$, $f \in \mathbb{C}^L$. We note that the time-shift parameters can also be fixed to have the same value $a = \min_k \{a_k\}$ and the coefficients obtained from the CQ-NSGT can be put in a matrix of size $\lceil L/a \rceil \times 2(K + 1)$.

From the given support condition, the system $\{\hat{\varphi}_k\}_k$ has an overlap factor of around 1/2. This implies that for the case where $a_k = \xi_s / \Omega_k$, the redundancy of the system is approximately 2.

By construction, the sum $\sum_{m=0}^{2K+1} \frac{1}{a_m} |\hat{\varphi}_m|^2$ is finite and bounded away from 0. From Sections 2.2 and 2.3, the frame operator is invertible and perfect reconstruction of the signal is obtained from the coefficients $c_{n,k}$ by applying (4).
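The boundedness claim can be checked numerically for a concrete parameter set. The sketch below builds the Hann and plateau windows on a discrete frequency grid and evaluates the diagonal $\sum_m \frac{1}{a_m} |\hat{\varphi}_m|^2$ with the largest admissible hop sizes $a_k = \xi_s / \Omega_k$. All parameter values, the taper ratio `alpha = 0.5` of the plateau window, and the use of a circular frequency distance (so that the window at frequency 0 wraps around) are assumptions made for this illustration and are not taken from the paper.

```python
import numpy as np

def hann(t):
    """Hann window supported on (-1/2, 1/2)."""
    return np.where(np.abs(t) < 0.5, 0.5 + 0.5 * np.cos(2 * np.pi * t), 0.0)

def tukey(t, alpha=0.5):
    """Plateau window on (-1/2, 1/2): flat centre, cosine-tapered edges."""
    t = np.abs(t)
    edge = (t - (1 - alpha) / 2) / (alpha / 2)   # 0 at plateau edge, 1 at |t| = 1/2
    taper = 0.5 * (1 + np.cos(np.pi * np.clip(edge, 0.0, 1.0)))
    return np.where(t < 0.5, taper, 0.0)

# Hypothetical parameters: 1 s of signal at 8 kHz, 12 bins per octave.
xi_s, L = 8000.0, 8000
xi_min, xi_max, B = 200.0, 3000.0, 12
K = int(np.ceil(B * np.log2(xi_max / xi_min) + 1))
Q = 1.0 / (2 ** (1 / B) - 2 ** (-1 / B))

k = np.arange(1, K + 1)
xi_pos = xi_min * 2.0 ** ((k - 1) / B)            # xi_1, ..., xi_K
xi = np.concatenate(([0.0], xi_pos, [xi_s / 2], xi_s - xi_pos[::-1]))
omega = np.concatenate(([2 * xi_min], xi_pos / Q,
                        [xi_s - 2 * xi_pos[-1]], xi_pos[::-1] / Q))

freqs = np.arange(L) * xi_s / L                   # frequency of bin j in Hz
diag = np.zeros(L)
for idx, (xc, om) in enumerate(zip(xi, omega)):
    d = (freqs - xc + xi_s / 2) % xi_s - xi_s / 2  # circular distance in Hz
    window = tukey(d / om) if idx in (0, K + 1) else hann(d / om)
    a_k = xi_s / om                                # largest admissible hop size
    diag += window ** 2 / a_k

A_bound, B_bound = diag.min(), diag.max()
redundancy = omega.sum() / xi_s                    # sum_k (L / a_k) / L
print(A_bound > 0, np.isfinite(B_bound), redundancy)
```

In this setup the lower bound is positive but may be quite small where the plateau window at frequency 0 tapers off before the first Hann window begins, so the ratio of the two bounds depends noticeably on the chosen plateau shape.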
