[Free] Translation of the paper "LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs"


Uploaded 2023-09-06 18:46:52, 840 KB DOCX


LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs

Abstract: Recording runtime status via logs is common for almost all computer systems, and detecting anomalies in logs is crucial for identifying system malfunctions in a timely manner. However, manually detecting anomalies in logs is time-consuming, error-prone, and infeasible. Existing automatic log anomaly detection approaches, which use the indexes rather than the semantics of log templates, tend to cause false alarms. In this work, we propose LogAnomaly, a framework to model a log stream as a natural language sequence. Empowered by template2vec, a novel, simple yet effective method to extract the semantic information hidden in log templates, LogAnomaly can detect both sequential and quantitative log anomalies simultaneously, which has not been done by any previous work. Moreover, LogAnomaly can avoid the false alarms caused by the newly appearing log templates between periodic model retrainings. Our evaluation on two public production log datasets shows that LogAnomaly outperforms existing log-based anomaly detection methods.

1 Introduction

Today’s large-scale services are becoming increasingly agile and complicated. A single service anomaly can impact millions of users’ experience [Bu et al., 2018; Zhang et al., 2015; Ma et al., 2018]. Accurate and timely anomaly detection can help operators quickly mitigate losses [Zhang et al., 2018b], which is crucial for these services. Large-scale services usually generate logs, which describe a vast range of events observed by them, to record system states at runtime. Logs are one of the most valuable data sources for anomaly detection [Satpathi et al., 2018; Lin et al., 2016; Du et al., 2017; Khatuya et al., 2018; Nandi et al., 2016; He et al., 2018; Meng et al., 2018; Zhang et al., 2018a].

A large-scale service and its underlying machines are often implemented and maintained by hundreds of developers and operators. Usually a developer or operator has incomplete information about the overall system and tends to judge anomalous logs from a local perspective, which is error-prone. In addition, manual detection of anomalous logs is becoming infeasible due to the explosion of logs. Keyword matching (e.g., “fail”) and regular expressions, which detect single anomalous logs based on explicit keywords or structural features, leave a large portion of log anomalies undetected. These anomalies can only be inferred from their log sequences, which contain multiple logs violating regular rules. For example, the first four logs in Figure 1 show two normal link flaps. If we apply keyword matching to detect log anomalies, both L1 and L2, which contain


A sequential anomaly occurs if a log sequence deviates from the normal patterns of program flow. Meanwhile, program execution has some constant linear relationships, which can be captured by the quantitative relationships of logs and should always hold true under different workloads. We say a quantitative anomaly occurs if these relationships are broken for a collection of logs. Existing automatic anomalous log sequence detection approaches can be broadly classified into two categories: log message counter-based approaches (e.g., PCA [Xu et al., 2009], Invariant Mining [Lou et al., 2010], LogClustering [Lin et al., 2016]), which capture quantitative anomalies, and deep learning based approaches (e.g., DeepLog [Du et al., 2017]), which learn sequential patterns from log sequences. These methods all take log template indexes as input, which can often induce false alarms. For example, as shown in Figure 1, words with underlines are variables, and the remaining parts are templates, each of which is usually indexed by a numerical identifier. Suppose that the above methods have been trained on a normal log sequence. When the system then generates a new log sequence in which one log's template is very similar to, but different from, that of a log in the training sequence, while its other logs have the same templates as logs in the training sequence, the above methods will mistakenly think that the new log sequence is anomalous, based on the observation that the two similar templates have different template indexes. Above all, the anomalous log sequence detection problem faces the following three challenges.
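The idea of a quantitative anomaly can be illustrated with a minimal sketch. It assumes a hypothetical invariant, namely that every window of logs contains as many "file_opened" events as "file_closed" events. The template names, the invariant, and the helper `broken_invariants` are illustrative assumptions and are not taken from the paper or from any of the cited methods.

```python
from collections import Counter

# Hypothetical invariant: count(lhs) == count(rhs) in every window of logs.
INVARIANTS = [("file_opened", "file_closed")]

def broken_invariants(window, invariants=INVARIANTS):
    """Return the invariants that do not hold for a window of template names."""
    counts = Counter(window)
    return [(lhs, rhs) for lhs, rhs in invariants if counts[lhs] != counts[rhs]]

# Two hypothetical windows of log templates.
normal = ["file_opened", "read", "file_closed", "file_opened", "file_closed"]
faulty = ["file_opened", "read", "file_opened", "file_closed"]

print(broken_invariants(normal))  # []
print(broken_invariants(faulty))  # [('file_opened', 'file_closed')]
```

The check only looks at how often each template occurs, not at the order of the logs, which is why counter-based approaches of this kind do not capture sequential anomalies.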

1. Valuable information could be lost if only log template indexes are used, because they cannot reveal the semantic relations of logs. For example, some templates are similar in semantics but different in template index, and ignoring this similarity can induce false alarms.

2. Services can generate new log templates between two adjacent periodic re-trainings, and existing approaches cannot address this problem. For instance, manual feedback on a large number of new log templates (as is done by [Du et al., 2017]) is infeasible in practice.

3. Existing methods cannot detect sequential and quantitative anomalies simultaneously.
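The first challenge can be made concrete with a small sketch. The two templates below are hypothetical, and the word-level Jaccard overlap is only a stand-in for a semantic similarity measure; it is not the template2vec method. The point is that an index comparison reports the templates as simply different, while even a crude text comparison shows they are close.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two template strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

# Hypothetical templates keyed by numerical index; "*" marks a variable.
templates = {
    1: "Interface * changed state to down",
    2: "Vlan-interface * changed state to down",
}

# An index-based method only sees that 1 != 2.
same_index = 1 == 2

# A text-based comparison sees 5 shared words out of 7 distinct words.
similarity = jaccard(templates[1], templates[2])

print(same_index)           # False
print(f"{similarity:.2f}")  # 0.71
```

A detector that consumes only the indexes has no way to use this closeness, so a sequence that swaps one template for a near-identical one looks as unfamiliar as a sequence containing an unrelated template.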

We propose LogAnomaly, a unified data-driven deep-learning framework for anomaly detection on unstructured log streams. The core idea of LogAnomaly is that most system logs are semi-structured texts “printed” by certain procedures of systems, and the intuitions and methods in natural language processing can be applied or improved for log anomaly detection. LogAnomaly tackles the above challenges as follows.

1. Inspired by word embedding, we design a simple yet effective template representation


Uploader: ProgrammerMonkey
