JHEP05(2019)036
Published for SISSA by Springer
Received: December 6, 2018
Revised: February 18, 2019
Accepted: April 18, 2019
Published: May 7, 2019
Variational autoencoders for new physics mining at
the Large Hadron Collider
Olmo Cerri,^a Thong Q. Nguyen,^a Maurizio Pierini,^b Maria Spiropulu^a and Jean-Roch Vlimant^a
^a California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, U.S.A.
^b CERN, Espl. des Particules 1, 1217 Meyrin, Switzerland
E-mail: olmo@caltech.edu, thong@caltech.edu, Maurizio.Pierini@cern.ch,
smaria@caltech.edu, jvlimant@caltech.edu
Abstract: Using variational autoencoders trained on known physics processes, we de-
velop a one-sided threshold test to isolate previously unseen processes as outlier events.
Since the autoencoder training does not depend on any specific new physics signature, the
proposed procedure does not make specific assumptions on the nature of new physics. An
event selection based on this algorithm would be complementary to classic LHC searches,
typically based on model-dependent hypothesis testing. Such an algorithm would deliver a
list of anomalous events that the experimental collaborations could further scrutinize and
even release as a catalog, similarly to what is typically done in other scientific domains.
Event topologies repeating in this dataset could inspire new-physics model building and
new experimental searches. Running in the trigger system of the LHC experiments, such
an application could identify anomalous events that would be otherwise lost, extending the
scientific reach of the LHC.
Keywords: Beyond Standard Model, Hadron-Hadron scattering (experiments)
ArXiv ePrint: 1811.10276
Open Access, © The Authors. Article funded by SCOAP^3.
https://doi.org/10.1007/JHEP05(2019)036
Contents
1 Introduction 1
2 Related work 3
3 Data samples 3
4 Model description 9
4.1 Autoencoders 9
4.2 Supervised classifiers 14
5 Results with VAE 16
6 How to deploy a VAE for BSM detection 21
7 Conclusions and outlook 23
A Comparison with auto-encoder 25
1 Introduction
One of the main motivations behind the construction of the CERN Large Hadron Collider
(LHC) is the exploration of the high-energy frontier in search for new physics phenomena.
New physics could answer some of the standing fundamental questions in particle physics,
e.g., the nature of dark matter or the origin of electroweak symmetry breaking. In LHC
experiments, searches for physics beyond the Standard Model (BSM) are typically carried
out as fully-supervised data analyses: assuming a new physics scenario of some kind, a
search is structured as a hypothesis test, based on a profiled-likelihood ratio [1]. These
searches are said to be model dependent, since they depend on considering a specific new
physics model.
Assuming that one is testing the right model, this approach is very effective in dis-
covering a signal, as demonstrated by the discovery of the Standard Model (SM) Higgs
boson [2, 3] at the LHC. On the other hand, given the (so far) negative outcome of many
BSM searches at particle-physics experiments, it is possible that a future BSM model, if
any, is not among those typically tested. The problem is more profound if analyzed in
the context of the LHC big-data problem: at the LHC, 40 million proton-beam collisions
are produced every second, but only ∼1000 collision events/sec can be stored by the AT-
LAS and CMS experiments, due to limited bandwidth, processing, and storage resources.
It is possible to imagine BSM scenarios that would escape detection, simply because the
corresponding new physics events would be rejected by a typical set of online selection
algorithms.
Establishing alternative search methodologies with reduced model dependence is an
important aspect of future LHC runs. Traditionally, this issue was addressed with so-
called model-independent searches, performed at the Tevatron [4, 5], at HERA [6], and at
the LHC [7, 8], as discussed in section 2.
In this paper, we propose to address this need by deploying an unsupervised algorithm
in the online selection system (trigger) of the LHC experiments.¹ This algorithm would be
trained on known SM processes and could identify BSM events as anomalies.
The selected events could be stored in a special stream, scrutinized by experts (e.g., to
exclude the occurrence of detector malfunctions that could explain the anomalies), and even
released outside the experimental collaborations, in the form of an open-access catalog. The
final goal of this application is to identify anomalous event topologies and inspire future
supervised searches on data collected afterwards.
As an example, we consider the case of a typical single-lepton data stream, selected
by a hardware-based Level-1 (L1) trigger system. In normal conditions, the L1 trigger
is the first of a two-step selection stage. After a coarse (and often local) reconstruction
and loose selection at L1, events are fully reconstructed in the High Level Trigger (HLT),
where a much tighter selection is applied. The selection is usually done having in mind
specific signal topologies, e.g., specific BSM models. In this study, we imagine replacing
this model-dependent selection with a variational autoencoder (VAE) [11, 12] looking for
anomalous events in the incoming single-lepton stream. The VAE is trained to compress
the input event representation into a lower-dimension latent space and then decompress it,
returning the shape parameters describing the probability density function (pdf) of each
input quantity given a point in the compressed space. In addition, a VAE allows a stochastic
modeling of the latent space, a feature which is missing in a simple AE architecture. The
highlighted procedure is not specific to the considered single-lepton stream and could be
easily extended to other data streams.
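As a concrete illustration, the compress-sample-decompress loop just described can be sketched in a few lines of numpy. This is a toy with untrained, randomly initialized weights; the layer sizes and variable names are our own choices, not the architecture of section 4:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_forward(x, W_enc, W_mu, W_logvar, W_dec):
    """Encode an event's feature vector, sample a latent point with the
    reparameterization trick, and decode it back to per-feature Gaussian
    shape parameters (mean and log-variance)."""
    h = np.tanh(W_enc @ x)                      # encoder hidden layer
    z_mu, z_logvar = W_mu @ h, W_logvar @ h     # latent-space pdf parameters
    eps = rng.standard_normal(z_mu.shape)
    z = z_mu + np.exp(0.5 * z_logvar) * eps     # reparameterization trick
    out = W_dec @ z                             # linear decoder
    mu, logvar = out[: x.size], out[x.size :]   # pdf parameters per input
    return mu, logvar, z_mu, z_logvar

def vae_loss(x, mu, logvar, z_mu, z_logvar):
    """Reconstruction term (Gaussian negative log-likelihood of the input
    under the decoded pdf) plus KL divergence of the latent pdf from N(0,1)."""
    nll = 0.5 * np.sum(logvar + (x - mu) ** 2 / np.exp(logvar))
    kl = -0.5 * np.sum(1.0 + z_logvar - z_mu**2 - np.exp(z_logvar))
    return nll + kl

# Toy dimensions: 4 input features, 8 hidden units, 2 latent dimensions.
n_in, n_hid, n_lat = 4, 8, 2
W_enc = rng.standard_normal((n_hid, n_in))
W_mu = rng.standard_normal((n_lat, n_hid))
W_logvar = rng.standard_normal((n_lat, n_hid))
W_dec = rng.standard_normal((2 * n_in, n_lat))

x = rng.standard_normal(n_in)                   # stand-in for one event's HLFs
mu, logvar, z_mu, z_logvar = vae_forward(x, W_enc, W_mu, W_logvar, W_dec)
loss = vae_loss(x, mu, logvar, z_mu, z_logvar)
```

At selection time only the scalar loss matters: an event whose loss exceeds a tuned threshold would be flagged as anomalous.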
The distribution of the VAE’s reconstruction loss on a validation sample is used to
define a threshold, corresponding to a desired acceptance rate for SM events. All the
events with loss larger than the threshold are considered as potential anomalies and could
be stored in a low-rate anomalous-event data stream. In this work, we set the threshold
such that ∼ 1000 SM events would be collected every month under typical LHC operation
conditions. In particular, we took as a reference 8 months of data taking per year, with an
integrated luminosity of ∼40 fb⁻¹. Assuming an LHC duty cycle of 2/3, this corresponds
to an average instantaneous luminosity of ∼2.9 × 10³³ cm⁻² s⁻¹.
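The quoted running conditions can be checked, and the threshold choice sketched, with a few lines of arithmetic. The SM loss distribution below is a placeholder, and 30-day months are an assumption of this sketch:

```python
import numpy as np

# Back-of-envelope check: average instantaneous luminosity implied by
# ~40 fb^-1 collected over 8 months with a 2/3 LHC duty cycle.
FB_INV_TO_CM2 = 1e39                        # 1 fb^-1 = 1e39 cm^-2
live_time = 8 * 30 * 86400 * (2 / 3)        # seconds of beam per year (30-day months)
inst_lumi = 40 * FB_INV_TO_CM2 / live_time  # ~2.9e33 cm^-2 s^-1, as quoted

# Threshold tuning on a stand-in SM loss sample: choose the cut so that a
# target fraction of SM events (the anomaly-stream budget) exceeds it.
rng = np.random.default_rng(1)
sm_losses = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # placeholder pdf
target_fraction = 1e-3                      # hypothetical SM acceptance
threshold = np.quantile(sm_losses, 1.0 - target_fraction)
accepted = float(np.mean(sm_losses > threshold))
```

The same quantile recipe applies whatever the actual loss distribution looks like, which is why the threshold can be tuned on data alone.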
We then evaluate the BSM production cross section that would correspond to a signal
excess of 100 BSM events selected per month, as well as the one that would give a signal
yield ∼ 1/3 of the SM yield. For this, we consider a set of low-mass BSM resonances,
decaying to one or more leptons and light enough to be challenging for the currently
employed LHC trigger algorithms.
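The yield arithmetic behind this statement is N = σ × L_int × ε, solved for σ. A hedged sketch, with a purely hypothetical selection efficiency that is not a number from this paper:

```python
# Yield arithmetic N = sigma * L_int * efficiency, solved for sigma.
monthly_lumi_fb = 40 / 8        # ~5 fb^-1 collected per month
target_events = 100             # desired BSM events selected per month
efficiency = 0.05               # hypothetical trigger x selection efficiency
sigma_fb = target_events / (monthly_lumi_fb * efficiency)  # required cross section [fb]
```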
¹ A description of the ATLAS and CMS trigger systems can be found in ref. [9] and ref. [10], respectively.
In this study, we take the data-taking strategy of these two experiments as a reference. On the other hand,
the proposed strategy could be adapted to other use cases.
This paper is structured as follows: we discuss related works in section 2. Section 3
gives a brief description of the dataset used. Section 4 describes the VAE model used in
the study, as well as a set of fully-supervised classifiers used for performance comparison.
Results are discussed in section 5. In section 6 we discuss how such a procedure could
be deployed in a typical LHC experiment while relying exclusively on data. Conclusions
are given in section 7. Appendix A provides a brief comparison between VAEs and plain
autoencoders (AEs).
2 Related work
Model-independent searches for new physics have been performed at the Tevatron [4, 5],
HERA [6], and the LHC [7, 8]. These searches are based on the comparison of a large set
of binned distributions to the prediction from Monte Carlo (MC) simulations, in search for
bins exhibiting a deviation larger than some predefined threshold. While the effectiveness
of this strategy in establishing a discovery has been a matter of discussion, a recent study by
the ATLAS collaboration [8] rephrased this model-independent search strategy into a tool
to identify interesting excesses, on which traditional analysis techniques could be performed
on independent datasets (e.g., the data collected after running the model-independent
analysis). This change of scope has the advantage of reducing the trial factor (i.e., the
so-called look-elsewhere effect [13, 14]), which would otherwise wash out the significance of
an observed excess.
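Under the simplifying assumption of independent bins (real analyses use the methods of refs. [13, 14] instead), the size of the trial factor can be illustrated directly:

```python
# Look-elsewhere illustration: probability that at least one of N
# independent bins fluctuates past a given local p-value.
p_local = 2.87e-7               # one-sided 5-sigma local p-value
n_bins = 10_000                 # hypothetical number of inspected bins
p_global = 1.0 - (1.0 - p_local) ** n_bins   # ~2.9e-3: far less significant globally
```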
Our strategy is similar to what is proposed in ref. [8], with two substantial differences:
(i) we aim to process also those events that could be discarded by the online selection, by
running the algorithm as part of the trigger process; (ii) we do so exploiting deep-learning-
based anomaly detection techniques.
Applying deep learning at the trigger level has been proposed in ref. [15]. Recent
works [16–19] have investigated the use of machine-learning techniques to set up new
strategies for BSM searches with minimal or no assumptions on the specific new-physics scenario
under investigation. In this work, we use VAEs [11, 12] based on high-level features as a
baseline. Previously, autoencoders have been used in collider physics for detector moni-
toring [20, 21] and event generation [22]. Autoencoders have also been explored to define
a jet tagger that would identify new physics events with anomalous jets [23, 24], with a
strategy similar to what we apply to the full event in this work.
Anomaly detection has been a traditional use case for one-class machine learning meth-
ods, such as one-class Support Vector Machine [25] or Isolation Forest [26, 27]. A review
of proposed methods can be found in ref. [28]. Variational methods have been shown to
be effective for novelty detection, as for instance is discussed in ref. [29]. In particular,
VAEs [11] have been proposed as an effective method for anomaly detection [12].
3 Data samples
The dataset used for this study is a refined version of the high-level-feature (HLF) dataset
used in ref. [15]. Proton-proton collisions are generated using the PYTHIA8 event-generation
library [30], fixing the center-of-mass energy to the LHC Run-II value (13 TeV) and the
average number of overlapping collisions per beam crossing (pileup) to 20. These beam
conditions loosely correspond to the LHC operating conditions in 2016.
Events generated by PYTHIA8 are processed with the DELPHES library [31], to emulate
detector efficiency and resolution effects. We take as a benchmark detector description the
upgraded design of the CMS detector, foreseen for the High-Luminosity LHC phase [32].
In particular, we use the CMS HL-LHC detector card distributed with DELPHES. We run
the DELPHES particle-flow (PF) algorithm, which combines the information from different
detector components to derive a list of reconstructed particles, the so-called PF candi-
dates. For each particle, the algorithm returns the measured energy and flight direction.
Each particle is associated to one of three classes: charged particles, photons, and neutral
hadrons. In addition, lists of reconstructed electrons and muons are given.
Many SM processes would contribute to the considered single-lepton dataset. For sim-
plicity, we restrict the list of relevant SM processes to the four with the highest production
cross sections, namely:
• Inclusive W production, with W → ℓν (ℓ = e, µ, τ).
• Inclusive Z production, with Z → ℓℓ (ℓ = e, µ, τ).
• tt̄ production.
• QCD multijet production.²
These samples are mixed to provide a SM cocktail dataset, which is then used to train
autoencoder models and to tune the threshold requirement that defines what we consider
an anomaly. The cocktail is built scaling down the high-statistics samples (tt̄, W, and
Z) to the lowest-statistics one (QCD, whose generation is the most computationally
expensive), according to their production cross-section values (estimated at leading order
with PYTHIA) and selection efficiencies, shown in table 1.
Events are filtered at generation requiring an electron, muon, or tau lepton with
p_T > 22 GeV. Once detector effects are taken into account through the DELPHES simulation,
events are further selected requiring the presence of one reconstructed lepton (electron or
muon) with transverse momentum p_T > 23 GeV and a loose isolation requirement Iso < 0.45.
If more than one reconstructed lepton is present, the highest-p_T one is considered.
The isolation for the considered lepton ℓ is computed as:

Iso = \frac{\sum_{p \neq \ell} p_T^p}{p_T^\ell} ,   (3.1)

where the index p runs over all the photons, charged particles, and neutral hadrons within
a cone of size ∆R = √(∆η² + ∆φ²) < 0.3 from ℓ.³
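Eq. (3.1) and the ∆R cone can be sketched in a few lines; the lepton and candidate kinematics below are hypothetical:

```python
import numpy as np

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance Delta R = sqrt(deta^2 + dphi^2), with the phi
    difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi
    return float(np.hypot(eta1 - eta2, dphi))

def isolation(lep_pt, lep_eta, lep_phi, particles, cone=0.3):
    """Eq. (3.1): scalar pT sum of PF candidates within Delta R < cone of
    the lepton, divided by the lepton pT. `particles` is a list of
    (pt, eta, phi) tuples, assumed not to contain the lepton itself
    (the p != l condition of the sum)."""
    pt_sum = sum(pt for pt, eta, phi in particles
                 if delta_r(lep_eta, lep_phi, eta, phi) < cone)
    return pt_sum / lep_pt

# Hypothetical lepton at (eta, phi) = (0, 0) with pT = 30 GeV; only the
# first candidate falls inside the 0.3 cone.
iso = isolation(30.0, 0.0, 0.0, [(5.0, 0.1, 0.1), (3.0, 2.0, 2.0)])
```

With these numbers iso = 5/30 ≈ 0.17, so the hypothetical lepton would pass the Iso < 0.45 requirement.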
² To speed up the generation process for QCD events, we require √ŝ > 10 GeV, the fraction of QCD events with √ŝ < 10 GeV and producing a lepton within acceptance being negligible but computationally expensive.
³ As common for collider physics, we use a Cartesian coordinate system with the z axis oriented along the beam axis, the x axis on the horizontal plane, and the y axis oriented upward. The x and y axes define the transverse plane, while the z axis identifies the longitudinal direction. The azimuth angle φ is computed from the x axis. The polar angle θ is used to compute the pseudorapidity η = −log(tan(θ/2)). We fix units such that c = ℏ = 1.
– 4 –
剩余28页未读,继续阅读
资源评论
weixin_38748555
- 粉丝: 6
- 资源: 934
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功