Swift sky localization of gravitational waves
using deep learning seeded importance sampling

Alex Kolmus,^{1,*} Grégory Baltus,^2 Justin Janquart,^{3,4} Twan van Laarhoven,^1
Sarah Caudill,^{3,4} and Tom Heskes^1

^1 Institute for Computing and Information Sciences (ICIS),
Radboud University Nijmegen, Toernooiveld 212, 6525 EC Nijmegen, The Netherlands
^2 STAR Institut, Bâtiment B5, Université de Liège, Sart Tilman B4000 Liège, Belgium
^3 Nikhef, Science Park 105, 1098 XG Amsterdam, The Netherlands
^4 Institute for Gravitational and Subatomic Physics (GRASP),
Utrecht University, Princetonplein 1, 3584 CC Utrecht, The Netherlands

(Dated: November 2, 2021)
arXiv:2111.00833v1 [gr-qc] 1 Nov 2021

* alex.kolmus@ru.nl
Fast, highly accurate, and reliable inference of the sky origin of gravitational waves would enable
real-time multi-messenger astronomy. Current Bayesian inference methodologies, although highly
accurate and reliable, are slow. Deep learning models have shown themselves to be accurate and
extremely fast for inference tasks on gravitational waves, but their output is inherently questionable
due to the blackbox nature of neural networks. In this work, we join Bayesian inference and deep
learning by applying importance sampling on an approximate posterior generated by a multi-headed
convolutional neural network. The neural network parametrizes Von Mises-Fisher and Gaussian
distributions for the sky coordinates and two masses for given simulated gravitational wave injections
in the LIGO and Virgo detectors. Within a few minutes, we generate skymaps for unseen
gravitational-wave events that closely resemble those produced by Bayesian inference. Furthermore,
we can detect poor predictions from the neural network and quickly flag them.
I. INTRODUCTION
Gravitational waves (GWs) have immensely advanced
our understanding of physics and astronomy since 2015
[1–4]. These GWs are observed by the Hanford (H)
and Livingston (L) interferometers of the Laser Inter-
ferometer Gravitational Wave Observatory (LIGO) [5]
and the Advanced Virgo (V) interferometer [6]. The
collaboration between these three detectors has enabled
triple-detector observations of GWs [2], making it pos-
sible to do proper sky localisation of their astrophysical
sources. This additional detector changes the sky distri-
bution from a broad band to a more narrow distribution
[2].
Better early sky localisation capabilities would allow
for real-time multi-messenger astronomy (MMA), observ-
ing astrophysical events through multiple channels - elec-
tromagnetic transients, cosmic rays, neutrinos - only sec-
onds after the GW is detected. MMA is limited to GWs
originating from binary neutron star (BNS) and neutron
star-black hole mergers. According to current literature,
it is unlikely that binary black holes (BBHs) emit an elec-
tromagnetic counterpart during their merger [7, 8]. Cur-
rently, astrophysicists try to collect the non-GW chan-
nels in the weeks after the event. A notable example
is GW170817 [9, 10]. This process takes an enormous
amount of effort, while the obtained data quality is of-
ten sub-optimal. Having all channels observed for the
full duration of the event would be a major leap forward.
Real-time MMA would enable a plethora of new science,
e.g. unravelling the nucleosynthesis of heavy elements
using r- and s-processes, more accurate and novel tests
of general relativity, and a deeper understanding of the
cosmological evolution [11–13]. As mentioned above, real-
time MMA relies on the generation of a skymap, which
imposes two requirements on the methodology used to obtain
one. First, it needs to be swift in order to allow observa-
tories to turn towards an event’s origin, preferably only
seconds after its observation. Second, the skymap needs
to be as accurate as possible since telescopes have a lim-
ited area they can observe. Below we present current
approaches in generating skymaps for GW events.
Most GW software libraries [14, 15] use Bayesian infer-
ence methods - in particular Markov chain Monte Carlo
(MCMC) and nested sampling [16] - to construct the pos-
terior over all GW parameters. These methods asymp-
totically approach the true distribution given a sufficient
number of samples [17]. Although theoretically optimal,
a chain with around 10^6 to 10^8 samples is required [14]
to closely approximate the true posterior distribution for
a GW event. Even when using Bilby [18], a modern
Bayesian inference library made for GW astronomy,
inference for a single BBH event takes hours [19];
BNS events take even longer. Bayesian
inference is the most accurate method available for GW
posterior estimation, but its run-time is prohibitively
long when it comes to MMA.
To overcome the speed limitations of the Bayesian ap-
proaches, Singer and Price developed BAYESTAR in
2016 [20], an algorithm that can output a robust skymap
for a GW event within a minute. BAYESTAR realizes
this speedup in two ways. First, it exploits the infor-
mation provided by the matched filtering pipeline used
in the detection of GWs. The inner product between
time strain and matched filters contains nearly all of
the information regarding arrival times, amplitudes and
phases, which are critical for skymap estimation. Sec-
ond, Singer and Price derive a likelihood function that
is semi-independent from the mass estimation and does
not rely on direct computation of GW waveforms, allow-
ing for massive speedups and parallelization. Although
BAYESTAR is fast, its predictions tend to be broader
and less precise than those made by Bilby [21].
Deep learning (DL) algorithms have shown themselves
to be exceptionally quick and powerful when handling
high-dimensional data [22, 23]. Therefore, they are an
interesting alternative to the Bayesian methods. Several
papers have proposed methods to estimate the GW pos-
terior, including the skymap, using DL algorithms. Ex-
amples of such algorithms are Delaunoy et al. [24] and
Green and Gair [25]. Delaunoy et al. [24] use a convolu-
tional neural network (CNN) to model the likelihood-
to-evidence ratio when given a strain-parameter pair.
By evaluating a large number of parameter options in
parallel, they can generate confidence intervals within a
minute. The reported confidence intervals are slightly
wider than those made by Bilby. A completely different
approach was taken by Green and Gair [25]. They show-
case complete 15-parameter inference for GW150914 us-
ing normalizing flows. They apply a sequence of invert-
ible functions to transform an elementary distribution
into a complex distribution [26] which, in this case, is
a BBH posterior. Within a single second, their method
is able to generate 5,000 independent posterior samples
that are in agreement with the reference posterior [27].
A Kolmogorov-Smirnov test confirms that these samples
closely resemble samples drawn from the exact
posterior. Both DL methods are fast
and seem to be accurate for the 100 - 1000 simulated
GW events they have been evaluated on. However, these
methods have a few issues: (1) they are both suscepti-
ble to changes in the power spectral density (PSD) and
signal-to-noise ratio (SNR), (2) both are close in perfor-
mance to Bilby but do not match it, (3) they can act unpre-
dictably outside of the trained strain-parameter pairs
and, even within this space, they can act unpredictably
due to the blackbox nature of neural networks (NNs).
Issues (1) and (2) have been addressed for the normaliz-
ing flow algorithm in a recent paper by Dax et al. [28];
however, the robustness guarantees remain behind those
of traditional Bayesian inference.
Our method tries to bridge the gap between Bayesian
inference and DL methods, allowing for fast inference
while still guaranteeing optimal accuracy. It is to be
noted that combining Bayesian inference and DL meth-
ods has recently gained traction in the GW community,
see for example reference [29]. The goal of our algorithm
is to restrict the parameter space such that, via sam-
pling, one can quickly obtain an accurate skymap. We
use a multi-headed CNN to parameterize an independent
sky and mass distribution for a given BBH event. The
model is trained on simulated precessing quasi-circular
BBH signals resembling the ones observed by the HLV
detectors. The parameterized sky and mass distribu-
tions are Gaussian-like and are assumed to approximate
the sky and mass distributions generated by Bayesian
inference. Using the parameterized sky and mass dis-
tributions, we construct a proposal posterior in which
all other BBH parameters are uniformly distributed. By
using importance sampling we can then sample from the
exact reference posterior. This implies that we effectively
match the performance of Bayesian inference in a short
time span, without exploring the entire parameter space.
We stress that this work is a proof of concept to show
the promises of combining NNs and Bayesian inference.
More flexible DL models and BNS events will be consid-
ered in future studies.
This paper is organised as follows. Section 2 discusses
the model architecture and importance sampling scheme.
Section 3 details the performed experiments, including
the model training. Section 4 covers the results of these
experiments and subsequently assesses the performance
of the model and importance sampling scheme by com-
paring it with skymaps generated using Bilby for a non-
spinning BBH system. Conclusions and future endeav-
ours are specified in Section 5.
II. METHODOLOGY
Our inference setup is a two-step method. In the ini-
tial step we infer simple distributions for the sky local-
ization and the masses of the BBH by using a neural
network. Subsequently, we apply importance sampling
to these simple distributions to compute a more accu-
rate posterior. The first subsection describes the role
and implementation of importance sampling. The sec-
ond subsection discusses the neural network setup and
our method for distribution estimation.
A. Importance sampling
High-dimensional distributions in which the majority
of the probability density is confined to a small volume of
the entire space are hard to sample from, which results
in long run times to get proper estimates when using
MCMC methods. A well-known method to cope with this
problem is importance sampling. By using a proposal dis-
tribution q that covers this high probability density re-
gion of the complex distribution p one can quickly obtain
useful samples. There are two requirements when using
importance sampling. First, the desired distribution p
needs to be known up to the normalization constant Z:
p(λ) = θ(λ)/Z. Second, the proposal distribution q needs
to be non-zero for all λ where p is non-zero. Importance
sampling can be understood as compensating for the dif-
ference between the distributions p and q by assigning an
importance weight w(λ) to each sample λ,

    w(λ) = θ(λ) / q(λ),                                    (1)
where the fraction is the likelihood ratio between the
unnormalized p and q. The distribution created by
the reweighted samples will converge to the p distribution
given enough samples [30].
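The reweighting scheme above can be sketched in a few lines of numpy. The toy target and proposal below are our own placeholders chosen for illustration, not the GW posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target density theta(x), proportional to p(x):
# a narrow Gaussian centered at 2 (toy stand-in for the posterior).
def log_theta(x):
    return -0.5 * ((x - 2.0) / 0.1) ** 2

# Broad Gaussian proposal q covering the target's high-density region.
mu_q, sigma_q = 2.0, 1.0
samples = rng.normal(mu_q, sigma_q, size=50_000)
log_q = -0.5 * ((samples - mu_q) / sigma_q) ** 2 - np.log(sigma_q * np.sqrt(2 * np.pi))

# Importance weights w = theta / q (Eq. 1), normalized afterwards.
log_w = log_theta(samples) - log_q
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Reweighted samples approximate expectations under p.
mean_p = np.sum(w * samples)   # close to 2.0, the target mean
print(mean_p)
```

Note that only a fraction of the proposal samples carry appreciable weight; the narrower the target relative to the proposal, the fewer effective samples remain.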
Generating accurate posteriors for GW observations
using MCMC is very time consuming, and thus impor-
tance sampling is an interesting alternative. Importance
sampling requires us to have a viable proposal distribu-
tion. Published posteriors for known gravitational waves
show that the probability density in the posterior is rel-
atively well confined for both the sky location and the
two masses [31]. A Von Mises-Fisher (VMF) and a multi-
variate Gaussian (MVG) distribution are good first-order
approximations of the sky and mass distributions respec-
tively, and thus suitable to use as a proposal distribution
for importance sampling. We propose to construct this
proposal distribution by assuming a uniform distribution
over all non-spinning BBH parameters, except for the
sky angles, which are represented by a VMF distribution,
and the masses, which are represented by an MVG distri-
bution. Assuming that the sky angles, masses, and the
remaining BBH parameters are independent, our
proposal distribution becomes the product of these two
distributions. In the next subsection we discuss how we
create this proposal distribution using a neural network.
Importance sampling demands a likelihood function for
the proposal distribution and the desired distribution. In
the previous paragraph we discussed how to create the
proposal distribution; we now focus on the desired
distribution p. For the likelihood function of
the GW posterior p(s|λ) we take the definition given by
Canizares et al. [32]:
    p(s|λ) ∝ θ(s|λ) = exp( −⟨s − h(λ) | s − h(λ)⟩ / 2 ),        (2)
where s is the observed strain and h(λ) is the GW template
defined by the parameters λ. The inner product is weighted
by the PSD of the detector’s noise. In practice we use
the likelihood implementation provided by Bilby named
GravitationalWaveTransient.
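For illustration, a schematic numpy version of the noise-weighted inner product and the log-likelihood of Eq. (2) might look as follows. The frequency grid, flat PSD, and "template" are placeholder toys of our own, not Bilby's actual implementation:

```python
import numpy as np

def inner_product(a, b, psd, df):
    """Noise-weighted inner product <a|b> = 4 df Re sum conj(a) b / S_n(f)."""
    return 4.0 * df * np.real(np.sum(np.conj(a) * b / psd))

def log_likelihood(s, h, psd, df):
    """log theta(s|lambda) = -<s - h | s - h> / 2  (Eq. 2, up to a constant)."""
    r = s - h
    return -0.5 * inner_product(r, r, psd, df)

# Toy frequency-domain setup (placeholder values).
df = 1.0 / 8.0                               # resolution for an 8 s segment
f = np.arange(20.0, 512.0, df)               # analysis band in Hz
psd = np.full_like(f, 1e-46)                 # flat one-sided PSD (toy)
h = 1e-23 * np.exp(-2j * np.pi * f * 0.01)   # stand-in "template"
s = h.copy()                                 # noise-free strain = template

# A perfectly matching template gives zero residual, maximizing Eq. (2).
print(log_likelihood(s, h, psd, df))
```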
We now have all the parts needed to discuss how we uti-
lize importance sampling for a given strain s. A trained
neural network parameterizes the proposal distribution
q for the given strain. We draw n samples from the
proposal distribution; these represent possible GW
parameter configurations. For each sample we calculate
the logarithm of the importance weight,
log w(λ) = log θ(s|λ) − log q(λ) + C, (3)
instead of the importance weight w(λ) itself to prevent
numeric under- and overflow. The constant C is added to
set the highest log w(λ) to zero, preventing very large
negative values from underflowing to zero when we
exponentiate. Since we normalize the weights
afterwards the correct importance weights are still ob-
tained. The reweighted samples represent the desired
distribution p.
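The shift by C followed by normalization amounts to the standard log-sum-exp stabilization; a minimal sketch (the function name is ours):

```python
import numpy as np

def normalize_log_weights(log_w):
    """Turn log-importance-weights into normalized weights without underflow.

    Subtracting the maximum (the constant C in Eq. 3) puts the largest
    exponent at zero; the overall shift cancels in the normalization.
    """
    shifted = log_w - np.max(log_w)
    w = np.exp(shifted)
    return w / w.sum()

# Extremely negative log-weights would all underflow to zero without C.
log_w = np.array([-10_050.0, -10_001.0, -10_000.0])
w = normalize_log_weights(log_w)
print(w)   # the -10_050 entry is negligible; the other two dominate
```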
If the proposal distribution does not cover the true
distribution well enough, the importance samples will be
dominated by only one or a few weights if we restrict
the run-time. We can use this as a gauge to check if the
skymap produced by the neural network and importance
sampling is to be trusted.
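One common way to quantify such weight degeneracy is the Kish effective sample size; the paper does not specify its exact diagnostic, so the sketch below is only an illustrative assumption:

```python
import numpy as np

def effective_sample_size(w):
    """Kish effective sample size of normalized importance weights.

    ESS = 1 / sum(w_i^2); it collapses towards 1 when a handful of
    weights dominate, flagging a proposal that misses the posterior mass.
    """
    w = np.asarray(w)
    return 1.0 / np.sum(w ** 2)

uniform = np.full(1000, 1e-3)                      # well-matched proposal
degenerate = np.zeros(1000); degenerate[0] = 1.0   # one weight dominates

print(effective_sample_size(uniform))      # close to 1000
print(effective_sample_size(degenerate))   # 1.0
```

A low ESS relative to the number of drawn samples is exactly the "single to a few weights" regime described above, and can serve as an automatic trust flag for the produced skymap.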
B. Model
Previous work done by George et al. [33] shows that
convolutional neural networks (CNNs) are able to extract
the masses from a BBH event just as well as the matched-
filtering pipeline currently in use. Furthermore, work done by Fan
et al. [34] indicates that 1D CNNs are able to locate GW
origins. We therefore chose to use a 1D CNN to model
both the distribution across the sky for the origin of the
GWs and a multivariate normal distribution for the two
masses of the BBH system.
The network architecture of this 1D CNN is presented
in Figure 1 and consists of four parts: a convolutional
feature extractor and three neural network heads. These
heads are used to specify the two distributions. The fol-
lowing properties were tested or tuned for optimal per-
formance: number of convolutional layers, kernel size,
dilation, batch normalization, and dropout. The model
shown in Figure 1 produced the best result on a valida-
tion set.
The convolutional feature extractor generates a set of
features that characterize a given GW. This set of fea-
tures is passed on to the neural heads. Each head is
specialized to model a specific GW parameter. The first
head determines the sky distribution, the second head
the masses, and the third head the uncertainty over the
two masses. Below we will elaborate on each of these
heads and how they characterize these distributions.
The first head specifies the distribution of the GW ori-
gin. Since the sky is described by the surface of a 3D
sphere, a 2D Gaussian distribution is an ill fit. A suitable
alternative is the Von Mises-Fisher (VMF) distribution
[35] which is the equivalent of a Gaussian distribution on
the surface of a sphere. The probability density function
and the associated negative log-likelihood (NLL) of the
VMF distribution are

    p(x|µ, κ) = κ / (4π sinh(κ)) exp(κ x^T µ),                                (4)

    NLL_VMF(x, µ, κ) = −log(κ) + log(1 − exp(−2κ)) + κ + log(2π) − κ x^T µ,   (5)
where x and µ are normalized vectors in R^3, with
the former being the true direction and the latter being
the predicted direction. κ is the concentration parame-
ter, which determines the width of the distribution. It
plays the same role as the inverse of the variance for
a Gaussian distribution. We use this distribution by
letting the first head output a three-dimensional vector
D = (D_x, D_y, D_z). The norm of D specifies the con-
centration parameter κ, and its projection onto the unit
sphere gives the mean µ: κ = |D| and µ = D/|D|. These
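A minimal numpy sketch of this parametrization and of the NLL in Eq. (5); the helper names are ours, not from the authors' code, and log sinh(κ) is expanded as κ + log(1 − e^{−2κ}) − log 2 for numerical stability at large κ:

```python
import numpy as np

def vmf_params(D):
    """Map the head output D = (Dx, Dy, Dz) to (kappa, mu):
    kappa = |D| and mu = D / |D|."""
    D = np.asarray(D, dtype=float)
    kappa = np.linalg.norm(D)
    return kappa, D / kappa

def vmf_nll(x, mu, kappa):
    """Negative log-likelihood of the VMF distribution (Eq. 5)."""
    return (-np.log(kappa) + np.log1p(-np.exp(-2.0 * kappa))
            + kappa + np.log(2.0 * np.pi) - kappa * np.dot(x, mu))

# Example head output pointing along +z with moderate concentration.
kappa, mu = vmf_params([0.0, 0.0, 5.0])   # kappa = 5.0, mu = (0, 0, 1)
```

As a sanity check, the NLL is smallest when the true direction x coincides with the predicted mean µ and grows as x rotates away from it.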