Logram: Efficient Log Parsing Using n-Gram Dictionaries
Abstract—Software systems usually record important runtime information in their logs. Logs
help practitioners understand system runtime behaviors and diagnose field failures. As logs
are usually very large in size, automated log analysis is needed to assist practitioners in their
software operation and maintenance efforts. Typically, the first step of automated log analysis
is log parsing, i.e., converting unstructured raw logs into structured data. However, log parsing
is challenging, because logs are produced by static templates in the source code (i.e., logging
statements) yet the templates are usually inaccessible when parsing logs. Prior work proposed
automated log parsing approaches that have achieved high accuracy. However, as the volume
of logs grows rapidly in the era of cloud computing, efficiency becomes a major concern in log
parsing. In this work, we propose an automated log parsing approach, Logram, which leverages
n-gram dictionaries to achieve efficient log parsing. We evaluated Logram on 16 public log
datasets and compared Logram with five state-of-the-art log parsing approaches. We found
that Logram achieves a higher parsing accuracy than the best existing approaches (i.e., at least
10% higher, on average) and also outperforms these approaches in efficiency (i.e., 1.8 to 5.1
times faster than the second-fastest approaches in terms of end-to-end parsing time).
Furthermore, we deployed Logram on Spark and we found that Logram scales out efficiently
with the number of Spark nodes (e.g., with near-linear scalability for some logs) without
sacrificing parsing accuracy. In addition, we demonstrated that Logram can support effective
online parsing of logs, achieving similar parsing results and efficiency to the offline mode.
Index Terms—Log parsing, Log analysis, N-gram
1 INTRODUCTION
Modern software systems usually record valuable runtime information (e.g., important
events and variable values) in logs. Logs play an important role for practitioners to
understand the runtime behaviors of software systems and to diagnose system failures [1],
[2]. However, since logs are often very large in size (e.g., tens or hundreds of gigabytes)
[3], [4], prior research has proposed automated approaches to analyze logs. These
automated approaches help practitioners with various software maintenance and operation
activities, such as anomaly detection [5], [6], [7], [8], [9], failure diagnosis [10], [11],
performance diagnosis and improvement [12], [13], and system comprehension [10], [14].
Recently, the fast-emerging AIOps (Artificial Intelligence for IT Operations) solutions also
depend heavily on automated analysis of operation logs [15], [16], [17], [18], [19].
Logs are generated by logging statements in the source code. As shown in Figure 1, a
logging statement is composed of log level (i.e., info), static text (i.e., “Found block” and
“locally”), and dynamic variables (i.e., “$blockId”). During system runtime, the logging
statement generates raw log messages, each of which is a line of unstructured text that
contains the static text and the values of the dynamic variables (e.g., “rdd_42_20”) that are
specified in the logging statement. The log message also contains information such as the
timestamp (e.g., “17/06/09 20:11:11”) of when the event happened. In other words, logging
statements define the templates for the log messages that are generated at runtime.
Automated log analysis usually has difficulties analyzing and processing the unstructured
logs due to their dynamic nature [5], [10]. Instead, a log parsing step is needed to convert
the unstructured logs into a structured format before the analysis. The goal of log parsing
is to extract the static template, dynamic variables, and the header information (i.e.,
timestamp, log level, and logger name) from a raw log message to a structured format.
Such structured information is then used as input for automated log analysis. He et al. [20]
found that the results of log parsing are critical to the success of log analysis tasks.
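The header extraction described above can be sketched with a small regular expression over the Spark-style log line from Figure 1. This is a minimal illustration, not the paper's parser; the pattern and field names are assumptions for this one log format.

```python
import re

# Hypothetical raw log line in the Spark format shown in Figure 1.
raw = "17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally"

# A minimal header pattern: timestamp, log level, logger name, then free-text content.
HEADER = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+): "
    r"(?P<content>.*)$"
)

def parse_header(line):
    """Split a raw log message into header fields and unstructured content."""
    m = HEADER.match(line)
    return m.groupdict() if m else None

parsed = parse_header(raw)
```

The `content` field is what a log parser then splits into the static template and dynamic variables.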
In practice, practitioners usually write ad hoc log parsing scripts that depend heavily on
specially-designed regular expressions [21], [22], [23]. As modern software systems
usually contain large numbers of log templates which are constantly evolving [24], [25],
[26], practitioners need to invest significant effort to develop and maintain
such regular expressions. In order to ease the pain of developing and maintaining ad hoc
log parsing scripts, prior work proposed various approaches for automated log parsing [21].
For example, Drain [22] uses a fixed-depth tree to parse logs. Each layer of the tree defines
a rule for grouping log messages (e.g., log message length, preceding tokens, and token
similarity). At the end, log messages with the same templates are clustered into the same
groups. Zhu et al. [21] proposed a benchmark and thoroughly compared prior approaches
for automated log parsing.
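The length-then-similarity grouping idea behind Drain can be sketched as below. This is a simplified illustration of the clustering rules described above, not Drain's actual fixed-depth-tree implementation; the threshold and function names are assumptions.

```python
from collections import defaultdict

def token_similarity(a, b):
    """Fraction of positions where two equal-length token lists agree."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / len(a)

def group_messages(messages, threshold=0.5):
    """Group log messages first by token count, then by positional token similarity."""
    groups_by_len = defaultdict(list)  # token count -> list of groups
    for msg in messages:
        tokens = msg.split()
        placed = False
        for group in groups_by_len[len(tokens)]:
            # Compare against the group's first member as its representative.
            if token_similarity(group[0], tokens) >= threshold:
                group.append(tokens)
                placed = True
                break
        if not placed:
            groups_by_len[len(tokens)].append([tokens])
    return [g for groups in groups_by_len.values() for g in groups]

groups = group_messages([
    "Found block rdd_42_20 locally",
    "Found block rdd_42_24 locally",
    "Connection reset by peer",
])
```

Messages produced by the same template end up in the same group, from which the shared template can be derived.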
Despite the existence of prior log parsers, as the size of logs grows rapidly [1], [2], [27]
and the need for low-latency log analysis increases [19], [28], efficiency becomes an
important concern for log parsing. In this work, we propose Logram, an automated log
parsing approach that leverages n-gram dictionaries to achieve efficient log parsing. In
short, Logram uses dictionaries to store the frequencies of n-grams in logs and leverages the
n-gram dictionaries to extract the static templates and dynamic variables in logs. Our
intuition is that frequent n-grams are more likely to represent the static templates while rare
n-grams are more likely to be dynamic variables. The n-gram dictionaries can be
constructed and queried efficiently, i.e., with a complexity of O(n) and O(1), respectively.
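The core intuition can be sketched as follows: build a frequency dictionary of n-grams in one pass, then classify each token by the frequency of the n-grams that contain it. This is an illustrative sketch of the idea, not the paper's full algorithm; the choice of 2-grams and the frequency threshold are assumptions.

```python
from collections import Counter

def build_dictionary(messages, n=2):
    """Build an n-gram frequency dictionary in a single pass over the tokens."""
    counts = Counter()
    for msg in messages:
        tokens = msg.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def classify_tokens(message, counts, n=2, threshold=2):
    """Label each token static or dynamic via constant-time dictionary lookups."""
    tokens = message.split()
    labels = []
    for i in range(len(tokens)):
        # All n-grams that contain token i.
        grams = [tuple(tokens[j:j + n])
                 for j in range(max(0, i - n + 1), min(i + 1, len(tokens) - n + 1))]
        freq = max((counts[g] for g in grams), default=0)
        labels.append("static" if freq >= threshold else "dynamic")
    return list(zip(tokens, labels))

msgs = [
    "Found block rdd_42_20 locally",
    "Found block rdd_42_24 locally",
    "Found block rdd_42_11 locally",
]
dictionary = build_dictionary(msgs)
labeled = dict(classify_tokens(msgs[0], dictionary))
```

Here the frequent 2-gram ("Found", "block") marks those tokens as static text, while the block IDs only appear in rare 2-grams and are flagged as dynamic variables.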
We evaluated Logram on 16 log datasets [21] and compared Logram with five state-of-the-
art log parsing approaches. We found that Logram achieves a higher accuracy compared
with the best existing approaches (i.e., at least 10% higher on average), and that Logram
outperforms these best existing approaches in efficiency, achieving a parsing speed that is
1.8 to 5.1 times faster than the second-fastest approaches. Furthermore, as the n-gram
dictionaries can be constructed in parallel and aggregated efficiently, we demonstrated that
Logram can achieve high scalability when deployed on a multi-core environment (e.g., a
Spark cluster), without sacrificing any parsing accuracy. Finally, we demonstrated that
Logram can support effective online parsing, i.e., by updating the n-gram dictionaries
continuously when new logs are added in a streaming manner.
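The reason the dictionary parallelizes and supports online updates can be sketched simply: per-partition n-gram counters merge by addition, so partitioning order does not matter, and streaming logs just keep updating the merged counter. The partition setup below is illustrative, not the paper's Spark deployment.

```python
from collections import Counter

def count_ngrams(messages, n=2):
    """Count n-grams over the tokens of a batch of log messages."""
    counts = Counter()
    for msg in messages:
        tokens = msg.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Two "partitions" of a log, processed independently (e.g., on two Spark executors).
part_a = count_ngrams(["Found block rdd_42_20 locally"])
part_b = count_ngrams(["Found block rdd_42_24 locally"])

# Aggregation is plain counter addition, so it is associative and order-independent.
merged = part_a + part_b

# Online mode: keep updating the merged dictionary as new messages stream in.
merged.update(count_ngrams(["Found block rdd_42_11 locally"]))
```

After the streamed update, the merged dictionary reflects all three messages exactly as if they had been processed in one batch.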