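Summary: This paper introduces CEAM, a new model based on masked graph attention networks, to address the problem of integrating cybersecurity vulnerability data across multiple different vulnerability repositories. The work focuses on the challenges posed by inconsistencies among vulnerability records from different sources, and improves on conventional methods by introducing asymmetric masked aggregation and a partitioned attention mechanism. Experimental results show that CEAM significantly outperforms existing state-of-the-art entity alignment methods, achieving strong precision, recall, and F1 scores on the vulnerability information integration task. Intended audience: security researchers, cybersecurity experts, and practitioners in related fields. Use cases and goals: automated vulnerability merging in vulnerability management, where more accurate identification of identical vulnerability information helps vulnerability databases supplement missing records and correct erroneous ones. Other notes: the proposed method was validated on collections of vulnerability reports published by government agencies such as NVD and ICS-CERT.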
Vulnerability Intelligence Alignment via Masked Graph Attention Networks
Yue Qin
Indiana University Bloomington
Bloomington, Indiana, USA
qinyue@iu.edu
Yue Xiao
Indiana University Bloomington
Bloomington, Indiana, USA
xiaoyue@iu.edu
Xiaojing Liao
Indiana University Bloomington
Bloomington, Indiana, USA
xliao@indiana.edu
ABSTRACT
Cybersecurity vulnerability information is often sourced from multiple channels, such as government vulnerability repositories, individually maintained vulnerability-gathering platforms, or vulnerability-disclosure email lists and forums. Integrating vulnerability information from different channels enables comprehensive threat assessment and quick deployment to various security mechanisms. However, automatic integration of vulnerability information, especially for records lacking decisive identifiers (e.g., a CVE-ID), is hindered by the limitations of today's entity alignment techniques.
In our study, we annotate and release the first cybersecurity-domain vulnerability alignment dataset, and highlight the unique characteristics of security entities, including inconsistent artifacts (e.g., impact and affected version) for the same vulnerability across different vulnerability repositories. Based on these characteristics, we propose an entity alignment model, CEAM, for integrating vulnerability information from multiple sources. CEAM equips graph neural network-based entity alignment techniques with two application-driven mechanisms: asymmetric masked aggregation and partitioned attention. These mechanisms use an asymmetric mask to selectively aggregate vulnerability artifacts when learning semantic embeddings of vulnerabilities, while ensuring that the artifacts critical to vulnerability identification always receive greater weight. Experimental results on vulnerability alignment datasets demonstrate that CEAM significantly outperforms state-of-the-art entity alignment methods.
CCS CONCEPTS
• Computing methodologies → Natural language processing;
• Security and privacy;
KEYWORDS
Entity Alignment; Vulnerability Intelligence; Knowledge Graph
Alignment; Graph Attention Networks; Vulnerability Repository
Inconsistency
ACM Reference Format:
Yue Qin, Yue Xiao, and Xiaojing Liao. 2023. Vulnerability Intelligence Alignment via Masked Graph Attention Networks. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS ’23), November 26–30, 2023, Copenhagen, Denmark. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3576915.3616686

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CCS ’23, November 26–30, 2023, Copenhagen, Denmark
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0050-7/23/11. . .$15.00
https://doi.org/10.1145/3576915.3616686
1 INTRODUCTION
There is a wealth of cybersecurity vulnerability information available through various channels: public vulnerability databases (e.g., the National Vulnerability Database [71]), individually maintained vulnerability-gathering platforms (e.g., SecurityFocus [14]), security advisories of different vendors (e.g., Palo Alto Networks Security Advisories [41], Android Security Bulletins [30]), vulnerability disclosure email lists [15, 17–19] and forums [42], and many others. These vulnerability repositories contain a range of vulnerability artifacts, such as vulnerability type, affected device information, and vulnerability severity score. This contextual information is essential for security practitioners to prioritize vulnerability remediation and prevent attacks.
However, the vulnerability artifacts of the same vulnerability typically vary across different vulnerability repositories. For example, the Linux kernel evolves quickly and has a large number of versions and derivatives such as Android, Ubuntu, Red Hat, and various IoT systems. Yet for Linux kernel vulnerabilities, the NVD often records only the artifacts associated with a limited set of kernel versions and derivatives, whereas the artifacts (e.g., affected device and version) recorded in mobile device vendors' security advisories for the same vulnerabilities can supplement those in the NVD [25]. Consider an organization with both Red Hat and Android devices: without comprehensive vulnerability artifacts from both the NVD and the Android security advisory, the organization cannot respond fully and promptly once a Linux kernel vulnerability is found. Hence, integrating vulnerability information from different channels is essential for an organization to gain comprehensive and credible vulnerability information associated with different devices and OS derivatives, identify early signs of cybersecurity risk, and effectively contain the threat with proper means.
Challenges in vulnerability alignment. However, it is non-trivial to link the same vulnerabilities across different sources, especially newly reported vulnerabilities without unique identifiers (e.g., CVE-ID) or those from vulnerability reports in less-structured formats. For instance, news reports [4, 7] indicate that over 42% of vulnerabilities listed in VulnDB [5] do not have CVE-IDs. As a result, vulnerable software products may remain unmaintained and untracked, making it challenging for security engineers to detect and fix vulnerabilities that might impact their products. IT professionals have also expressed concerns [10, 20] about the lack of CVE-IDs, which hinders them from
cross-referencing vulnerabilities documented by other repositories. Moreover, widely used scanning tools rely on vulnerability databases indexed by CVE-IDs to determine whether products contain vulnerable software [6]. Additionally, as reported in [54], vulnerabilities in IoT vulnerability disclosure forums are typically released before obtaining CVE-IDs. Despite the lack of CVE-IDs, these forums often contain valuable vulnerability artifacts, such as proofs of concept (PoC), which are crucial for efficient vulnerability assessment and patch generation. However, multiple commonly used public vulnerability repositories, such as the National Vulnerability Database (NVD), do not provide such detailed information. Consequently, organizations have expended significant human effort to establish links between these IoT vulnerabilities and those in the NVD.
The critical aspect of vulnerability information integration and management is linking security entities (i.e., vulnerabilities) that refer to the same real-world entity across various data sources. This is a form of the entity alignment (EA) problem [79] that arises in vulnerability repositories when the repositories are represented as vulnerability knowledge graphs (KGs) programmatically constructed from structured or semi-structured vulnerability reports. However, to the best of our knowledge, no previous research has explored cybersecurity entity alignment techniques tailored to the application of vulnerability artifact integration.
CEAM: design and implementation. Traditional entity alignment methods, which rely on hand-crafted rules, are often less effective at aligning security entities across different KGs due to variations in their structures and textual features [92]. Recent work [59] uses Knowledge Graph Embedding (KGE) models trained to measure triple plausibility (e.g., TransE [46]) to embed equivalent entities into a unified vector space based on a few seed alignments. However, such methods are not suitable for security entity alignment, as KGE models cannot generate embeddings for new entities added to the KG after training. In the context of vulnerability information, timely updates are crucial, and it is impractical to retrain the KG embedding model on the entire augmented graph each time new security entities, such as vulnerabilities, are discovered. Mao et al. [67] treat entity alignment as an assignment problem between two isomorphic graphs, solved by reordering the entity node indices. However, adapting such techniques to security entity alignment is challenging because of the significant differences in graph topology between security KGs constructed from different repositories, as observed in our measurements in §3.4.
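To make the retraining problem concrete, here is a minimal sketch of a TransE-style plausibility score, assuming embeddings are stored in lookup tables keyed by entity and relation name. The class name `TransE`, its parameters, and the toy entity names are hypothetical, and the vectors are random rather than trained. Because each embedding is a row created at training time, an entity added afterwards has nothing to look up.

```python
import math
import random


class TransE:
    """Toy TransE scorer: score(h, r, t) = -||e_h + e_r - e_t||_2 (higher = more plausible)."""

    def __init__(self, entities, relations, dim=8, seed=0):
        rng = random.Random(seed)
        # Embeddings live in lookup tables filled at training time (random and untrained here).
        self.ent = {e: [rng.gauss(0, 1) for _ in range(dim)] for e in entities}
        self.rel = {r: [rng.gauss(0, 1) for _ in range(dim)] for r in relations}

    def score(self, head, relation, tail):
        # Raises KeyError for any entity not seen at training time: the model is transductive.
        h, r, t = self.ent[head], self.rel[relation], self.ent[tail]
        return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


model = TransE(entities=["VULN-A", "VENDOR-X"], relations=["hasVendor"])
print(model.score("VULN-A", "hasVendor", "VENDOR-X"))
try:
    model.score("VULN-NEW", "hasVendor", "VENDOR-X")  # entity added after "training"
except KeyError as err:
    print("no embedding for", err)
```

Supporting a new vulnerability would require adding a row and re-optimizing the tables, which is the retraining cost described above; GNN encoders instead compute an embedding from a node's neighborhood at inference time.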
In recent years, Graph Neural Networks (GNNs) have shown great success in open-domain entity alignment tasks [68, 70, 88]. This mechanism recursively propagates information among neighbors to learn structure-aware entity representations. However, their core assumption, that identical entities have similar attributes and neighbors and vice versa, does not hold for cross-platform security entities. In our study, we observe that the same vulnerability shows inconsistent attributes (vulnerability artifacts, e.g., impact and affected version) in different vulnerability repositories (see §3.4). The main reason is that a vulnerability is sometimes assessed under different execution environments, such as operating systems, architectures, configurations, and organization policies, which can lead to different vulnerability artifacts. In addition, different repositories provide vulnerability artifacts at different granularities (e.g., levels of detail), especially for vulnerabilities disclosed in mailing lists and forums. We also observe that different vulnerabilities can share a considerable number of identical artifacts, leading to false positives in alignment (see §3.4).
Given these observations, we propose an entity alignment model, CEAM, tailored to the cybersecurity domain. It equips a GNN-based entity alignment model with two application-driven designs, asymmetric masked aggregation and partitioned attention, to address the above challenges. We first use an asymmetric mask to selectively aggregate attribute information when learning semantic embeddings for security entities. The mask yields similar representations for the same vulnerability with inconsistent artifacts in different repositories by scaling down part of the representation of the inconsistent relation (e.g., has_product). This alleviates false negatives caused by inconsistencies between positive pairs. Further, we use GNNs to update entity embeddings with structural information based on graph topology, where the partitioned attention mechanism ensures that the artifacts critical to vulnerability identification always receive greater weight during propagation, which traditional neural networks cannot guarantee. Finally, we use a two-layer MLP (Multilayer Perceptron) to decide whether two entities are identical based on the discrepancy between their entity embeddings learned by the GNNs.
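The final decision step can be pictured with the sketch below. It is not CEAM's implementation: it assumes the "discrepancy" between two embeddings is their element-wise absolute difference, uses ReLU and a sigmoid output, and starts from random, untrained weights. The names `PairMLP`, `prob_same`, `is_match`, `hidden`, and `threshold` are hypothetical.

```python
import math
import random


class PairMLP:
    """Two-layer MLP mapping the discrepancy |h_i - h_j| to P(same entity)."""

    def __init__(self, dim, hidden=16, seed=0):
        rng = random.Random(seed)
        # Untrained weights; in practice they would be learned from labeled entity pairs.
        self.W1 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def prob_same(self, h_i, h_j):
        x = [abs(a - b) for a, b in zip(h_i, h_j)]  # assumed discrepancy feature
        # Layer 1: affine transform followed by ReLU.
        z = [
            max(0.0, sum(x[d] * self.W1[d][u] for d in range(len(x))) + self.b1[u])
            for u in range(len(self.b1))
        ]
        # Layer 2: affine transform to a single logit, squashed by a sigmoid.
        logit = sum(zu * wu for zu, wu in zip(z, self.w2)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))

    def is_match(self, h_i, h_j, threshold=0.5):
        return self.prob_same(h_i, h_j) >= threshold


mlp = PairMLP(dim=4)
h = [0.3, -1.2, 0.8, 2.0]
print(mlp.prob_same(h, h))  # → 0.5 (untrained: only the zero biases contribute)
```

Feeding |h_i − h_j| rather than the concatenation of both embeddings makes the score symmetric in the pair order, which suits an "are these the same?" decision.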
We have implemented CEAM and evaluated it on two annotated entity alignment datasets. CEAM achieves a precision of 73.4%, a recall of 91.7%, and an F1 score of 81.5%, outperforming the state-of-the-art entity alignment models CG-MuAlign [95], PARIS [79], and PRASE [75]. In particular, our experiments show that the two proposed mechanisms, asymmetric masked aggregation and partitioned attention, collectively improve alignment quality by 10.3% in F1 score on average.
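As a quick consistency check on the reported numbers: F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). Plugging in P = 0.734 and R = 0.917 reproduces the reported 81.5%. The helper name `f1_score` is ours.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.734, 0.917), 3))  # → 0.815
```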
Contributions. The contributions of this paper are as follows:
• We proposed an entity alignment model, named CEAM, tailored for the application of vulnerability inconsistency identification across different vulnerability repositories.
• We released the first annotated datasets for cybersecurity-domain entity alignment, and unveiled their characteristics that challenge the assumptions made in traditional entity alignment tasks [45, 88, 95]. These characteristics include inconsistencies in vulnerability artifacts for identical vulnerabilities across different repositories, as well as similarities in artifacts for different vulnerabilities.
• We discussed two potential applications of CEAM: (1) supplementing vulnerability artifacts across repositories and (2) debunking erroneous vulnerability artifacts.
• Our code, datasets, and full-version paper with Appendix are available at [40].
2 BACKGROUND
2.1 Vulnerability Artifacts
In Table 1, we present examples of common vulnerability artifacts used in our study. Specifically, CVE is the common identifier of cybersecurity vulnerabilities. Weakness characterizes the category of the vulnerability, and CWE_ID is the identifier of a Weakness. Product and Vendor are the name and the provider of the affected products (e.g., software, hardware, devices). Version is short for affected
versions. Impact is the consequence of exploiting the vulnerability. Discoverer is the name of the person or organization who reported the vulnerability. Note that relations exist between vulnerability artifacts. For example, the relation between a Vulnerability entity and a Discoverer is hasDiscoverer, which can be denoted as a triple $(h, r, t)$, where the relation $r$ is determined by the types of the head entity $h$ and the tail entity $t$ and takes the form hasTail, i.e., "has" followed by the tail entity's type.
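A minimal sketch of this naming convention, building $(h, r, t)$ triples from one vulnerability record. The record layout, field names, and values are hypothetical placeholders chosen to mirror the artifact types in Table 1, not data from any repository; the helper names `make_triple` and `record_to_triples` are ours.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def make_triple(head: str, tail: str, tail_type: str) -> Triple:
    """Relation name = 'has' + <tail entity type>, e.g. hasDiscoverer."""
    return Triple(head, "has" + tail_type, tail)


def record_to_triples(vuln_id: str, artifacts: dict) -> list:
    """Turn {artifact type: value or list of values} into triples headed by the vulnerability."""
    triples = []
    for tail_type, values in artifacts.items():
        if isinstance(values, str):
            values = [values]  # single-valued artifact
        for value in values:
            triples.append(make_triple(vuln_id, value, tail_type))
    return triples


# Hypothetical record; values are placeholders.
record = {"Discoverer": "Researcher A", "Vendor": "Vendor X", "Version": ["1.0", "1.1"]}
for t in record_to_triples("VULN-0001", record):
    print(t)
```

Multi-valued artifacts such as affected versions yield one triple per value, so the vulnerability node gets one edge per artifact value.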
CVSS and CVSS metrics. The Common Vulnerability Scoring System (CVSS) is an open and widely adopted vulnerability severity scoring standard [34], which captures various kinds of critical vulnerability artifacts associated with vulnerability severity. More specifically, in CVSSv3.1, vulnerability artifacts consist of eight dimensions in the base metrics: Attack Vector (AV), Attack Complexity (AC), Privileges Required (PR), User Interaction (UI), Scope (S), Confidentiality (C), Integrity (I), and Availability (A). For instance, the AV artifact reflects the context in which vulnerability exploitation is possible (e.g., remote exploitation), and the AC artifact describes the conditions beyond the attacker's control that must exist in order to exploit the vulnerability.
There also exist artifacts, namely temporal metrics, that measure the current state of exploit techniques or code availability, i.e., Exploit Code Maturity (E); the existence of any patches or workarounds, i.e., Remediation Level (RL); or the confidence in the description of a vulnerability, i.e., Report Confidence (RC). A CVSS metric value (e.g., AV:N) is a pair of a CVSS metric and its value, here an Attack Vector of Network. The CVSS score artifact is calculated from the corresponding CVSS vector, which aggregates CVSS metric values. For example, a CVSSv3.1 base score of 7.5 is calculated from the CVSS vector (AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H). The formulas for the base and temporal CVSS scores can be found in [35]. Note that while the current version of CVSS is v3.1, many vulnerability reports still use v2 and v3.0. Both v3.1 and v3.0 share the same CVSS metrics and score calculation formula; v3.1, however, provides additional assessment guidance through an updated CVSS specification document.
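The 7.5 base score in the example above can be reproduced from the CVSS v3.1 base equations. The sketch below is our own transcription of the published formulas (metric weights, the Impact and Exploitability sub-scores, and the v3.1 Roundup function); it covers base metrics only and ignores temporal metrics. The function names are ours.

```python
import math

# CVSS v3.1 base-metric weights.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weights rise when Scope is Changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """v3.1 Roundup: smallest one-decimal number >= value, guarded against float error."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def parse_vector(vector: str) -> dict:
    """'(AV:N/AC:L/...)' -> {'AV': 'N', 'AC': 'L', ...}: each item is a metric:value pair."""
    return dict(item.split(":") for item in vector.strip("()").split("/"))


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("(AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H)"))  # → 7.5
```

For the example vector, Exploitability ≈ 3.887 and Impact = 6.42 × 0.56 ≈ 3.595; their sum ≈ 7.48 rounds up to 7.5.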
Vulnerability profiling standard. According to CNA [33], to request an identifier for a newly found vulnerability, certain information is required, such as the vulnerability type, vendor, and affected equipment. Additional information, such as impact, attack patterns, and discoverer, is also encouraged. In our research, we refer to these required and encouraged pieces of information as profiling artifacts (e.g., vendor, affected equipment, weakness). Other vulnerability artifacts, such as CVSS score and CVSS vector, are considered non-profiling artifacts. In our proposed system, the profiling artifacts are mandatorily emphasized by a partitioned attention mechanism during the aggregation of neighborhood information to update the representation of a target entity (§5.3).
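To make the profiling/non-profiling split concrete, here is a minimal sketch that groups a vulnerability's neighboring artifacts by relation type. The relation names and the exact membership of each set are hypothetical, chosen to follow the CNA-based description above. How the partitioned attention then weights the groups is defined in §5.3 and is not reproduced here.

```python
# Hypothetical relation names. Required/encouraged CVE-request information -> profiling;
# CVSS artifacts -> non-profiling (per the split described above).
PROFILING = {"hasVendor", "hasProduct", "hasVersion", "hasWeakness", "hasImpact", "hasDiscoverer"}
NON_PROFILING = {"hasCVSSScore", "hasCVSSVector"}


def partition_neighbors(triples):
    """Group one entity's (head, relation, tail) triples by artifact partition."""
    groups = {"profiling": [], "non_profiling": [], "other": []}
    for _head, relation, tail in triples:
        if relation in PROFILING:
            groups["profiling"].append((relation, tail))
        elif relation in NON_PROFILING:
            groups["non_profiling"].append((relation, tail))
        else:
            groups["other"].append((relation, tail))  # relation types not mapped above
    return groups


neighbors = [
    ("VULN-0001", "hasVendor", "Vendor X"),
    ("VULN-0001", "hasCVSSScore", "7.5"),
    ("VULN-0001", "hasWeakness", "CWE-79"),
]
print(partition_neighbors(neighbors))
```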
2.2 GNN-based Entity Alignment
Most GNN-based entity alignment methods [45, 68, 95] follow this framework: (1) a GNN that learns node representations from the graph structure and (2) a margin-based loss that ranks the distances between entity pairs. The loss function

$$
\mathcal{L} = \sum_{(i,j)\in\mathcal{P}} \; \sum_{(i',j')\in\mathcal{P}^{-}} \max\big(d(h_i, h_j) - d(h_{i'}, h_{j'}) + \gamma,\; 0\big)
$$
Figure 1: KG schema. Blue: entities; Yellow: literal artifacts;
Purple: intermediate nodes.
aims to bring equivalent entities $(i, j)$ close to each other while maximizing the distance between negative pairs $(i', j')$, with $\gamma$ as the margin. Here $h_i$ is the embedding of entity $i$ updated by a GNN layer of the form:
$$
h_i^{l+1} \leftarrow \sigma\Big(\mathrm{Aggregate}\big[\, W^{l} \cdot h_k^{l},\; \forall k \in \{i\} \cup \mathcal{N}_i \,\big]\Big)
$$
where $\mathcal{N}_i$ is the set of neighboring nodes of node $i$, $W^l$ is the transformation matrix in layer $l$, and $\sigma$ is a non-linear activation function. Instead of a single $W^l$, a relation-aware GNN learns a specific transformation matrix $W^l_r$ for each relation $r$. GNN variants implement Aggregate with different operations, such as normalized mean pooling [45] and weighted summation [83]. In this paper, we propose masked aggregation and partitioned attention within the Aggregate operation to infuse security domain knowledge into the learning of entity embeddings.
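The two framework components can be sketched in plain Python as follows. This is a generic illustration of the background formulas, not CEAM. It assumes normalized mean pooling as Aggregate, ReLU as σ, Euclidean distance as d, and relation-specific matrices $W^l_r$. It also uses a separate matrix `W_self` for the node's own term, which is an assumption on our part. All function names are hypothetical.

```python
import math


def matvec(v, M):
    """Row vector v (length d_in) times matrix M (d_in x d_out)."""
    return [sum(v[a] * M[a][b] for a in range(len(v))) for b in range(len(M[0]))]


def gnn_layer(H, edges, W_rel, W_self):
    """One relation-aware layer: mean over {i} ∪ N_i of transformed embeddings, then ReLU.

    H: list of n embeddings h^l (each a list of floats)
    edges: (i, r, k) triples meaning node k is a neighbor of node i via relation r
    W_rel: relation name -> matrix W^l_r; W_self: matrix for the k = i term (assumption)
    """
    sums = [matvec(h, W_self) for h in H]  # self term, k = i
    counts = [1] * len(H)
    for i, r, k in edges:
        msg = matvec(H[k], W_rel[r])  # relation-specific transform of neighbor k
        sums[i] = [s + m for s, m in zip(sums[i], msg)]
        counts[i] += 1
    return [[max(0.0, s / c) for s in row] for row, c in zip(sums, counts)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def margin_loss(H, pos_pairs, neg_pairs, gamma=1.0):
    """Sum over (i,j) in P and (i',j') in P^- of max(d(h_i,h_j) - d(h_i',h_j') + gamma, 0)."""
    return sum(
        max(euclidean(H[i], H[j]) - euclidean(H[i2], H[j2]) + gamma, 0.0)
        for i, j in pos_pairs
        for i2, j2 in neg_pairs
    )


I2 = [[1.0, 0.0], [0.0, 1.0]]
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
H1 = gnn_layer(H0, [(0, "hasVendor", 2), (1, "hasVendor", 2)], {"hasVendor": I2}, I2)
print(H1)  # → [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0], [2.0, 0.0]]
print(margin_loss(H1, pos_pairs=[(0, 1)], neg_pairs=[(0, 3)]))
```

In the toy run, nodes 0 and 1 share the neighbor node 2, so one layer pulls their embeddings together. The loss stays positive until the positive pair is at least γ closer than the negative pair.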
3 VULNERABILITY KNOWLEDGE BASE
In our study, we annotated and released, for the first time, three vulnerability knowledge graphs (KGs) based on two governmental vulnerability repositories, i.e., the National Vulnerability Database (NVD) [71] and ICS-CERT Advisories (ICS-CERT) [16], and one security information portal, i.e., SecurityFocus (SF) [14]. Based on these vulnerability knowledge graphs, we also generated and released two cybersecurity-domain entity alignment (EA) datasets by linking entities from ICS-CERT and SecurityFocus to NVD. The annotated dataset is available at [40]. Note that the crawled data are used for informational purposes only, in accordance with all repositories' terms of service. Below we explain the annotation process of the vulnerability KGs (§3.1, §3.2) and the EA dataset (§3.3). A quantitative study in §3.4 shows the particularities of the data and demonstrates the challenges of aligning security entities.
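For records that do carry a CVE-ID, a cross-repository link can be found by exact lookup. The hard cases are the records that cannot be matched this way. The following minimal sketch separates the two groups. It is not the authors' annotation procedure, and the record fields (`id`, `text`), the index format, and the function name `link_by_cve` are hypothetical. The regular expression follows the CVE-ID syntax of a four-digit year and a sequence number of four or more digits.

```python
import re

CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def link_by_cve(source_records, nvd_index):
    """Return (matched pairs, ids of records that need entity alignment).

    source_records: iterable of dicts with hypothetical keys 'id' and 'text'
    nvd_index: dict mapping CVE-ID -> NVD entry id
    """
    matched, unmatched = [], []
    for rec in source_records:
        cves = set(CVE_PATTERN.findall(rec["text"]))  # de-duplicate repeated mentions
        hits = [(rec["id"], nvd_index[c]) for c in sorted(cves) if c in nvd_index]
        if hits:
            matched.extend(hits)
        else:
            unmatched.append(rec["id"])  # no usable CVE-ID: exact lookup cannot link it
    return matched, unmatched


records = [
    {"id": "advisory-1", "text": "Fixes CVE-2021-1234 (see also CVE-2021-1234)."},
    {"id": "forum-2", "text": "PoC posted; identifier not yet assigned."},
]
print(link_by_cve(records, {"CVE-2021-1234": "nvd-42"}))
```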
3.1 KG schema
We design the vulnerability KG schema by summarizing the common artifacts and relations provided by vulnerability repositories. Table 1 illustrates the common artifacts provided by the three vulnerability repositories investigated in our study, i.e., NVD, ICS-CERT, and SF. We selected these repositories considering their popularity and their vulnerability report formats (both structured and semi-structured). We also include one vulnerability repository with a special focus (i.e., ICS-CERT, which focuses on vulnerabilities in Industrial Control Systems).
Figure 1 illustrates a general schema (i.e., entity types and relations) of the proposed security KGs. These vulnerability artifacts can be interlinked by the concepts in the following standard security databases: Common Vulnerabilities and Exposures (CVE) for
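Contrastive learning?
This method learns node representations with a GNN and uses a margin-based loss to optimize the distances between positive and negative entity pairs, so that representations of the same entity move closer together while those of different entities move farther apart.
Entity embedding vectors
Euclidean distance or cosine similarity
γ: a preset margin value that ensures the distance of a positive pair is at least γ smaller than that of a negative pair.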