LogAnomaly: Unsupervised Detection of Sequential and
Quantitative Anomalies in Unstructured Logs
Abstract: Recording runtime status via logs is common for almost every computer system, and
detecting anomalies in logs is crucial for identifying system malfunctions in a timely manner.
However, manually detecting anomalies in logs is time-consuming, error-prone, and
infeasible. Existing automatic log anomaly detection approaches, which use the indexes rather
than the semantics of log templates, tend to cause false alarms. In this work, we propose
LogAnomaly, a framework to model a log stream as a natural language sequence.
Empowered by template2vec, a novel, simple yet effective method to extract the semantic
information hidden in log templates, LogAnomaly can detect both sequential and
quantitative log anomalies simultaneously, which has not been done by any previous work.
Moreover, LogAnomaly can avoid the false alarms caused by the newly appearing log
templates between periodic model retrainings. Our evaluation on two public production
log datasets shows that LogAnomaly outperforms existing log-based anomaly detection
methods.
1 Introduction
Today’s large-scale services are becoming increasingly agile and complicated. A
single service anomaly can impact millions of users’ experience [Bu et al., 2018; Zhang et
al., 2015; Ma et al., 2018]. Accurate and timely anomaly detection can help operators
quickly mitigate losses [Zhang et al., 2018b], which is crucial for these services. Large-
scale services usually generate logs, which describe a vast range of events observed by
them, to record system states at runtime. Logs are one of the most valuable data sources
for anomaly detection [Satpathi et al., 2018; Lin et al., 2016; Du et al., 2017; Khatuya et
al., 2018; Nandi et al., 2016; He et al., 2018; Meng et al., 2018; Zhang et al., 2018a].
A large-scale service and its underlying machines are often implemented and maintained by
hundreds of developers and operators. A developer or operator usually has incomplete
information about the overall system and tends to judge anomalous logs from a local
perspective, which is error-prone. In addition, manual detection of anomalous logs is
becoming infeasible due to the explosion of logs. Keyword matching (e.g., “fail”) and
regular expressions, which detect single anomalous logs based on explicit keywords or
structural features, miss a large portion of log anomalies. These
anomalies can only be inferred from their log sequences, which contain multiple logs
violating regular rules. For example, the first four logs in Figure 1 show two normal link
flaps. If we apply keyword matching to detect log anomalies, both L1 and L2, which contain
the keyword “down”, will trigger false alarms. However, this is actually a normal event
because the switch recovers automatically and quickly, as demonstrated in L4. Consequently,
an automatic anomaly detection method based on log sequences is needed.
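
The false-alarm behaviour of keyword matching can be illustrated with a minimal Python sketch. The log lines below are hypothetical (Figure 1 is not reproduced in this text), and `unrecovered_downs` is an illustrative toy rule that looks at the whole window of logs; it is not the LogAnomaly method.

```python
from typing import Dict, List

# Hypothetical switch logs, invented for illustration only.
LOGS: List[str] = [
    "Interface eth0 changed state to down",         # L1
    "Vlan-interface vlan10 changed state to down",  # L2
    "Interface eth0 changed state to up",           # L3
    "Vlan-interface vlan10 changed state to up",    # L4
]

SEPARATOR = " changed state to "


def keyword_alarms(logs: List[str], keyword: str = "down") -> List[int]:
    """Return the 1-based positions of logs that contain the keyword."""
    return [i for i, line in enumerate(logs, start=1) if keyword in line]


def unrecovered_downs(logs: List[str]) -> List[str]:
    """Return entities whose last observed state in the window is 'down'."""
    last_state: Dict[str, str] = {}
    for line in logs:
        entity, sep, state = line.rpartition(SEPARATOR)
        if not sep:
            continue  # Line does not follow the assumed state-change format.
        last_state[entity] = state
    return [entity for entity, state in last_state.items() if state == "down"]
```

On these four logs, `keyword_alarms(LOGS)` flags positions 1 and 2, while `unrecovered_downs(LOGS)` returns an empty list because both entities come back up within the window. Judging a window of logs rather than single lines is what removes the false alarms here.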
A sequential anomaly occurs if a log sequence deviates from normal patterns of program
flows. Meanwhile, program execution has some constant linear relationships, which can
be captured by the quantitative relationships of logs and should always hold true under
different workloads. We say a quantitative anomaly occurs if these relationships are broken
for a collection of logs. Existing automatic anomalous log sequence detection approaches
can be broadly classified into two categories: log message counter-based approaches (e.g.,
PCA [Xu et al., 2009], Invariant Mining [Lou et al., 2010], LogClustering [Lin et al., 2016]),
which capture quantitative anomalies, and deep learning based approaches (e.g., DeepLog [Du
et al., 2017]), which learn sequential patterns from log sequences. These methods all take log
template indexes as input, which can often induce false alarms. For example, as shown in
Figure 1, words with underlines are variables, and the remaining parts are templates, each
of which is usually indexed by a numerical identifier. Suppose that the above methods have
been trained based on the normal log sequence, i.e., L1 to L4. When a system generates L6
(the templates of L2 and L6 are very similar but different, and L1/L3/L4 have the same
templates as L5/L7/L8), the above methods will mistakenly think that the log sequence of
L5 to L8 is anomalous, based on the observation that L2 and L6 have different template
indexes. In summary, the anomalous log sequence detection problem faces the following three
challenges.
1. Valuable information could be lost if only log template indexes are used, because they
cannot reveal the semantic relations of logs. For example, some templates are similar
in semantics but have different template indexes, and ignoring this similarity can induce
false alarms.
2. Services can generate new log templates between two adjacent periodic re-trainings,
and existing approaches cannot address this problem. For instance, manual feedback
on a large number of new log templates (as is done by [Du et al., 2017]) is infeasible in
practice.
3. Existing methods cannot detect sequential and quantitative anomalies simultaneously.
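
The first two challenges can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the templates and their indexes are invented, and plain token overlap (Jaccard similarity) stands in as a crude proxy for semantic similarity. It is not template2vec, whose details are not given in this section.

```python
import re
from typing import Dict, Optional, Set, Tuple

# Hypothetical templates and indexes, invented for illustration only.
KNOWN_TEMPLATES: Dict[int, str] = {
    1: "Interface * changed state to down",
    2: "Vlan-interface * changed state to down",
    3: "Interface * changed state to up",
    4: "Vlan-interface * changed state to up",
}

# A hypothetical template that first appears after training.
NEW_TEMPLATE = "Vlan-interface * changed link state to down"


def tokens(template: str) -> Set[str]:
    """Lowercase alphabetic tokens; the '*' variable placeholder is dropped."""
    return set(re.findall(r"[a-z]+", template.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two templates, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def index_lookup(template: str) -> Optional[int]:
    """Index-only view: a template is known only if it matches exactly."""
    for idx, known in KNOWN_TEMPLATES.items():
        if known == template:
            return idx
    return None


def nearest_template(
    template: str, threshold: float = 0.6
) -> Optional[Tuple[int, float]]:
    """Return (index, similarity) of the closest known template, if close enough."""
    best_idx, best_sim = max(
        ((idx, jaccard(template, known)) for idx, known in KNOWN_TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    return (best_idx, best_sim) if best_sim >= threshold else None
```

With these assumed inputs, `index_lookup(NEW_TEMPLATE)` returns `None`, so an index-based detector would see an unknown event, while `nearest_template(NEW_TEMPLATE)` maps it to template 2 with a similarity of 6/7. The threshold of 0.6 is arbitrary, and token overlap ignores word meaning, so this only shows why similarity between templates carries information that indexes discard.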
We propose LogAnomaly, a unified data-driven deep-learning framework for anomaly
detection on unstructured log streams. The core idea of LogAnomaly is that most system
logs are semi-structured texts “printed” by certain procedures of systems, so the intuitions
and methods of natural language processing can be applied or improved for log anomaly
detection. LogAnomaly tackles the above challenges as follows.
1. Inspired by word embedding, we design a simple yet effective template representation