Acoustic source localization in strong reverberant environments by parametric Bayesian dictionary learning (CSDN Wenku resource)


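Sound source localization plays an important role in acoustics, speech signal processing, robot navigation and wireless communications. It is particularly challenging in reverberant environments because of multipath effects, and conventional localization methods tend to perform poorly under strong reverberation, which motivates more efficient and accurate algorithms.

The starting point of the paper is sparse representation, which has become an important tool for source localization because it can accurately describe the multipath effects of a reverberant room. A dictionary is built to approximate the measured signal, and each atom of the dictionary represents one specific source-to-microphone multipath channel. Because only a few sources are active in the enclosure, the signal is sparse over this dictionary, which improves localization accuracy and robustness.

Parametric Bayesian dictionary learning extends this idea with a parameterized dictionary. The dictionary is constructed by discretizing the inner space of the enclosure and is parameterized by the unknown energy reflective ratio, so it can adapt to the acoustic environment. Source localization is thereby reformulated as a joint sparse signal recovery and parametric dictionary learning problem. The problem is modeled in a sparse Bayesian framework and solved with the variational Bayesian expectation maximization (VBEM) technique, an iterative algorithm that alternates between updating an approximate posterior over the hidden variables and updating the model parameters to increase a lower bound on the marginal likelihood of the observations. This requires no laborious parameter tuning and provides statistical information. The algorithm also exploits joint sparsity across frequencies to improve dictionary learning.

According to the numerical simulations reported in the paper, the method achieves high localization accuracy, low sidelobe levels and high robustness for multiple sources in strong reverberation, with lower computational complexity than the state-of-the-art methods it is compared against. Overall, parametric Bayesian dictionary learning combines the strengths of sparse representation and Bayesian inference: it improves localization performance while simplifying the choice of model parameters and reducing the dependence on prior knowledge.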


Signal Processing 143 (2018) 232–240

Contents lists available at ScienceDirect

Signal Processing

journal homepage: www.elsevier.com/locate/sigpro

Acoustic source localization in strong reverberant environment by parametric Bayesian dictionary learning

Lu Wang, Yanshan Liu, Lifan Zhao (b, ∗), Qiang Wang, Xiangyang Zeng, Kean Chen

School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, 710072, China

School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore

Article info

Article history:

Received 4 May 2017

Revised 29 August 2017

Accepted 1 September 2017

Available online 14 September 2017

Keywords:

Source localization

Sparse Bayesian method

Parametric dictionary learning

Reverberant environment

Abstract

Sparse representation techniques have become increasingly promising for localizing sound sources in reverberant environments, where the multipath channel effects can be accurately characterized by the image model. In this paper, a dictionary is constructed by discretizing the inner space of the enclosure, and it is parameterized by the unknown energy reflective ratio. More specifically, each atom of the dictionary characterizes a specific source-to-microphone multipath channel. Source localization can then be reformulated as a joint sparse signal recovery and parametric dictionary learning problem. A sparse Bayesian framework is used for modeling, and its solution is obtained by the variational Bayesian expectation maximization technique. Moreover, the joint sparsity in the frequency domain is exploited to improve the dictionary learning performance. A notable advantage of this approach is that no laborious parameter tuning is required and statistical information is provided. Numerical simulation results show that, compared with other state-of-the-art methods, the proposed algorithm achieves high source localization accuracy, low sidelobes and high robustness for multiple sources with low computational complexity in strong reverberant environments.

1. Introduction

Sound source localization from microphone array measurements is one of the key techniques for video-conferencing, surveillance, source separation, etc. Conventional source localization methods can generally be divided into three categories [1–4]: maximization of the Steered Response Power (SRP) of a beamformer, high-resolution spectral estimation, and time-difference of arrival (TDOA). These methods perform rather well in a free field, but their performance degrades in reverberant environments due to the distortions caused by reflections of sound waves from the surrounding walls. Although these methods can be modified to reduce their vulnerability to reverberation, it remains challenging to prevent performance losses in the presence of a strong reverberant field.

More recently, sparse representation has become an increasingly promising technique in acoustic source localization due to its capability of achieving higher resolution when only a few sources are present in the enclosure of interest [5–14]. By discretizing the enclosure into grids, source localization can be recast as a support recovery problem in the context of sparse signal recovery [5–14]. Localization algorithms differ from each other in how the reverberant effects are characterized and in the specific sparse recovery method adopted. In [5,6], it is concluded that the reverberant effects can be properly exploited to enhance source localization by assuming a precisely known room impulse response (RIR) between any pair of source and sensor. In [7–11], the reverberation and the ambient noise are treated together as a residual or distortion to be accounted for in a weak reverberant field. In that case, reverberation can be modeled as spatially colored noise [9–11]. However, the performance of those algorithms decreases as the reverberation intensity increases. In [12], the signal measured by the microphone array is decomposed into a diffuse field, containing a series of plane waves corresponding to the reverberant part, and a source field attributed to sparse monopole sources. Sources can then be localized by simultaneously identifying both fields with a structured sparsity constraint [12]. In [13,14], the point source-to-microphone impulse responses are estimated based on the image model [15], and source localization is formulated as a joint sparse recovery problem, which can be conveniently solved by various off-the-shelf model-based sparse signal recovery algorithms, such as iterative hard thresholding and $\ell_1$-norm minimization [16–18].

∗ Corresponding author.

E-mail addresses: wanglu@nwpu.edu.cn (L. Wang), liuysnwpu@mail.nwpu.edu.cn (Y. Liu), zhao0145@e.ntu.edu.sg (L. Zhao), wqiang0212@mail.nwpu.edu.cn (Q. Wang), zenggxy@nwpu.edu.cn (X. Zeng), kachen@nwpu.edu.cn (K. Chen).

http://dx.doi.org/10.1016/j.sigpro.2017.09.005


However, to localize all source images, a sparse approximation of the spatial spectrum of the virtual sources has to be considered in a hugely expanded free space. In other words, a large over-complete dictionary must be constructed to allow for sparse representation, which ultimately results in a problem with catastrophic dimensionality [13,14]. If the number of sources is known a priori, the high-dimensional sparse recovery problem can be solved effectively by greedy algorithms. In practice, however, the number of sources is usually unknown, so the huge dimensionality becomes problematic and induces high computational complexity. Moreover, the accuracy of greedy algorithms is often very limited, which is another drawback of the conventional approaches.

Since the reverberant field is modeled in [13] by a superposition of the projections associated with the source images, a stronger reverberant field requires higher-order source images in the image model technique. The discretized enclosure is then accordingly expanded into a huge free space. Our work is motivated by this major drawback of [13,14], namely the induced huge dimensionality. To reduce the size of the problem, we discretize only the inner planar area of the enclosure into grids and construct the corresponding dictionary by calculating the images of the microphone array rather than those of the potential sound sources. In this way, the multipath effect can be characterized by a weighted superposition of the media Green's functions, with the weights being the reflective energy ratios of different orders. Since the reflective energy ratio is generally unknown, the problem can be formulated as a sparse signal recovery and parametric dictionary learning problem, which is a more elegant way of solving the huge dimensionality issue. A sparse Bayesian method is proposed to automatically localize the sources and estimate the unknown parameter of the dictionary, facilitated by the variational Bayesian Expectation Maximization (VBEM) technique [19–21]. To the best of our knowledge, this work is the first to introduce a parametric dictionary to model an unknown reverberant field in a statistical way. The joint sparsity in frequency is exploited to further improve the localization and dictionary learning performance. Numerical simulation results demonstrate that the proposed method achieves high resolution, low computational complexity, low sidelobes and high robustness for multiple sources.

The rest of the paper is organized as follows. In Section 2, the sparse signal model is formulated and the corresponding dictionaries are constructed. Source localization under a strong reverberant environment is formulated as a parametric Bayesian dictionary learning problem in Section 3. In Section 4, numerical simulation results are presented to demonstrate the effectiveness of the proposed method. Section 5 concludes the paper.

2. Sparse signal model

Suppose the sources are located on a two-dimensional plane in a rectangular room with finite impedance walls, and the measurements obtained with a linear microphone array of M sensors are transformed into the spatial-spectral domain. The point source-to-microphone impulse responses of the room, including the multipath effect, can be calculated based on the image model [13], where each reflected wave is treated as a signal coming from a virtual source with a power equal to the reflective energy ratio of the wall. The image method is an example of simplified ray-based modeling of room reverberation in which specular reflections are considered. Such a simplification is justifiable when diffraction and its interference effects found in wave propagation are insignificant. This is the case, for example, when the wavelength of the sound is small compared to the dimensions of the reflecting surfaces in the room and large compared to any structural details or surface texture, which generally holds for ordinary rooms.

Fig. 1. Illustration of the equivalence. Without loss of generality, two perpendicular walls (marked in red) are considered. The inner plane containing the actual sources is divided into grids. The figure shows the source, the microphone array, source images 1–3 and microphone array images 1–3.

By discretizing the inner planar area of the enclosure into N grids, the projection of a source located at cell n onto microphone m can be characterized by the media Green's function

$$p_f(m, n) = x_f \sum_{\gamma=0}^{R} \frac{\beta_\gamma}{4\pi \lVert r_m - s_{n,\gamma} \rVert} \exp\left( -j 2\pi f \frac{\lVert r_m - s_{n,\gamma} \rVert}{c} \right), \qquad (1)$$

where $r_m$ is the position of microphone m; $s_{n,\gamma}$ represents the location of the γth virtual source corresponding to the actual source located at cell n, with the reflective energy ratio of $\beta_\gamma$; R is the number of source images; c is the speed of sound; and $x_f$ is the source amplitude at frequency f.
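As a rough numerical illustration of Eq. (1), the sketch below sums free-space Green's functions over a source and its images, each weighted by a reflective energy ratio. It is only a minimal sketch under stated assumptions: the helper names `first_order_images` and `multipath_projection`, the room size, the microphone position and the weight values are all hypothetical, and only first-order images of a rectangular room are generated, whereas the paper allows a general number of images R.

```python
import numpy as np

def first_order_images(s, room):
    """Return the point s and its four first-order mirror images.

    Assumes a rectangular room [0, Lx] x [0, Ly]; row 0 is the actual source.
    """
    x, y = s
    Lx, Ly = room
    return np.array([
        [x, y],            # actual source (gamma = 0)
        [-x, y],           # mirrored across the wall x = 0
        [2 * Lx - x, y],   # mirrored across the wall x = Lx
        [x, -y],           # mirrored across the wall y = 0
        [x, 2 * Ly - y],   # mirrored across the wall y = Ly
    ])

def multipath_projection(r_m, images, beta, f, c=343.0, x_f=1.0):
    """Weighted sum of free-space Green's functions over the source images."""
    dist = np.linalg.norm(images - r_m, axis=1)            # image-to-microphone distances
    green = np.exp(-2j * np.pi * f * dist / c) / (4 * np.pi * dist)
    return x_f * np.sum(beta * green)

# Hypothetical setup: 4 m x 5 m room, one source, one microphone, 500 Hz.
room = (4.0, 5.0)
source = np.array([1.0, 2.0])
mic = np.array([3.0, 2.0])
images = first_order_images(source, room)
beta = np.array([1.0, 0.6, 0.6, 0.6, 0.6])  # assumed weights, not values from the paper
p = multipath_projection(mic, images, beta, f=500.0)
print(abs(p))
```

Setting `beta` to a single entry of 1 and passing only the actual source position reduces the sum to the plain free-space propagation model.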

2.1. Dictionary constructed in [13]

In [13], the N-cell grid of the room is expanded into an $\tilde{N}$-cell free space to contain all the active actual and virtual sources. Subsequently, a free-space propagation model, i.e., Eq. (1) with R = 0, is considered for the projection between the $\tilde{N}$ potential source locations and the M microphone positions. Consequently, a dictionary $D_f$ of size $M \times \tilde{N}$ can be constructed with its element $d_f(m, n)$ given by

$$d_f(m, n) = \frac{1}{4\pi \lVert r_m - \tilde{s}_n \rVert} \exp\left( -j 2\pi f \frac{\lVert r_m - \tilde{s}_n \rVert}{c} \right),$$

where $n = 1, 2, \cdots, \tilde{N}$. The set $\{\tilde{s}_n\}_{n=1}^{\tilde{N}}$ contains all sources and their image sources in a large expanded free space. If each source has R images, $\tilde{N}$ should be equal to $(R + 1)N$. A stronger reverberant field generally requires a larger R. Even a moderate reverberant strength in practice can result in an $\tilde{N}$ much larger than N, ultimately leading to computationally expensive sparse recovery procedures. It should be noted that a large $\tilde{N}$ is likely to result in an unsolvable sparse recovery problem, as will be demonstrated later in Section 4.
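To make the size argument concrete, the following sketch builds a free-space dictionary on the inner grid only and then reports how many columns the expanded free-space dictionary would need when every cell carries R images. The grid spacing, array geometry, frequency and the value R = 8 are hypothetical choices for illustration, and `free_space_dictionary` and `expanded_grid_size` are assumed helper names rather than anything defined in the paper or in [13].

```python
import numpy as np

def free_space_dictionary(mics, points, f, c=343.0):
    """M x P matrix of free-space Green's functions (unit amplitude, no reflections)."""
    # Pairwise distances between M microphones and P candidate points.
    dist = np.linalg.norm(mics[:, None, :] - points[None, :, :], axis=2)
    return np.exp(-2j * np.pi * f * dist / c) / (4 * np.pi * dist)

def expanded_grid_size(n_cells, n_images):
    """Number of columns when each of the n_cells carries n_images images."""
    return (n_images + 1) * n_cells

# Hypothetical 4 m x 5 m room discretized into 0.25 m cells (cell centers).
xs = 0.125 + 0.25 * np.arange(16)
ys = 0.125 + 0.25 * np.arange(20)
grid = np.array([(x, y) for x in xs for y in ys])            # N = 320 cells

# Hypothetical linear array of M = 8 microphones close to the wall y = 0.
mics = np.column_stack([np.linspace(1.0, 2.4, 8), np.full(8, 0.05)])

D = free_space_dictionary(mics, grid, f=500.0)
print("inner-grid dictionary:", D.shape)
print("expanded columns for R = 8:", expanded_grid_size(len(grid), 8))
```

The number of rows stays at M while the number of columns grows linearly with R, so the recovery problem becomes increasingly underdetermined as the reverberation gets stronger.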

2.2. Proposed parametric dictionary construction

In this paper, to restrict the problem size to N, a parameterized dictionary of size M × N is constructed merely on the inner grids of the enclosure. Using the equality $\lVert r_m - s_{n,\gamma} \rVert = \lVert s_n - r_{m,\gamma} \rVert$, the projections in Eq. (1) can be equivalently written as

$$p_f(m, n) = x_f \sum_{\gamma=0}^{R} \frac{\beta_\gamma}{4\pi \lVert s_n - r_{m,\gamma} \rVert} \exp\left( -j 2\pi f \frac{\lVert s_n - r_{m,\gamma} \rVert}{c} \right), \qquad (2)$$

where $s_n$ is the location of the actual source at cell n and $r_{m,\gamma}$ is the γth image of microphone m. The equivalence is illustrated in Fig. 1. Notably, the distance between the γth
