Tools and Benchmarks for Automated Log Parsing
Abstract—Logs are imperative in the development and maintenance process of many
software systems. They record detailed runtime information that allows developers and
support engineers to monitor their systems and dissect anomalous behaviors and errors.
The increasing scale and complexity of modern software systems, however, make the
volume of logs explode. In many cases, the traditional way of manual log inspection
becomes impractical. Many recent studies, as well as industrial tools, resort to powerful
text search and machine learning-based analytics solutions. Due to the unstructured nature
of logs, a first crucial step is to parse log messages into structured data for subsequent
analysis. In recent years, automated log parsing has been widely studied in both academia
and industry, producing a series of log parsers based on different techniques. To better understand
the characteristics of these log parsers, in this paper, we present a comprehensive
evaluation study on automated log parsing and further release the tools and benchmarks
for easy reuse. More specifically, we evaluate 13 log parsers on a total of 16 log datasets
spanning distributed systems, supercomputers, operating systems, mobile systems, server
applications, and standalone software. We report the benchmarking results in terms of
accuracy, robustness, and efficiency, which are of practical importance when deploying
automated log parsing in production. We also share the success stories and lessons learned
in an industrial application at Huawei. We believe that our work could serve as the basis
and provide valuable guidance to future research and deployment of automated log parsing.
Index Terms—Log management, log parsing, log analysis, anomaly detection, AIOps
1. INTRODUCTION
Logs play an important role in the development and maintenance of software systems. It is
a common practice to record detailed system runtime information into logs, allowing
developers and support engineers to understand system behaviours and track down
problems that may arise. The rich information and the pervasiveness of logs enable a wide
variety of system management and diagnostic tasks, such as analyzing usage statistics [1],
ensuring application security [2], identifying performance anomalies [3], [4], and
diagnosing errors and crashes [5], [6].
Despite the tremendous value buried in logs, how to analyze them effectively is still a great
challenge [7]. First, modern software systems routinely generate tons of logs (e.g.,
gigabytes of data per hour for a commercial cloud application [8]). The huge volume of
logs makes it impractical to manually inspect log messages for key diagnostic information,
even with the aid of search and grep utilities. Second, log messages are inherently
unstructured, because developers usually record system events using free text for
convenience and flexibility [9]. This further increases the difficulty in automated analysis
of log data. Many recent studies (e.g., [10]–[12]), as well as industrial solutions (e.g.,
Splunk [13], ELK [14], Logentries [15]), have evolved to provide powerful text search and
machine learning-based analytics capabilities. To enable such log analysis, the first and
foremost step is log parsing [9], a process to parse free-text raw log messages into a stream
of structured events.
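As a concrete illustration of this structuring step, the sketch below splits a raw log line into its header fields and free-text message content. The "date time pid level component: content" layout is modeled on the HDFS example discussed with Fig. 1; real header formats depend on the logging framework's configuration, so the regex here is an assumption rather than a general rule.

```python
import re

# Split a raw log line into header fields plus free-text message content.
# The header layout below is modeled on the HDFS example; other logging
# frameworks produce different header formats.
HEADER = re.compile(
    r"(?P<date>\d{6}) (?P<time>\d{6}) (?P<pid>\d+) (?P<level>[A-Z]+) "
    r"(?P<component>[\w.$]+): (?P<content>.*)"
)

line = ("081109 203518 143 INFO dfs.DataNode$DataXceiver: "
        "Received block blk_-562725280853087685 of size 67108864 from /10.251.91.84")

fields = HEADER.match(line).groupdict()
print(fields["level"])    # INFO
print(fields["content"])  # Received block blk_-562725280853087685 of size ...
```

Because the header is produced by the framework, this part of parsing is mechanical; the hard part, addressed next, is structuring the free-text content itself.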
As illustrated by the example in Fig. 1, each log message is printed by a logging statement and
records a specific system event with its message header and message content. The message
header is determined by the logging framework and thus can be relatively easily extracted,
such as timestamp, verbosity level (e.g., ERROR/INFO/DEBUG), and component. In contrast, it
is often difficult to structure the free-text message content written by developers, since it is
a composition of constant strings and variable values. The constant part reveals the event
template of a log message and remains the same for every event occurrence. The variable part
carries dynamic runtime information (i.e., parameters) of interest, which may vary among
different event occurrences. The goal of log parsing is to convert each log message into a
specific event template (e.g., “Received block <*> of size <*> from /<*>”) associated with key
parameters (e.g., [“blk_-562725280853087685”, “67108864”, “10.251.91.84”]). Here, “<*>”
denotes the position of each parameter.
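The template-plus-parameters output can be made concrete with a small sketch. Given the event template above, the snippet turns it into a regular expression and recovers the parameter list from the message content. The mechanics here are illustrative only: a real log parser must first derive the template itself from the data.

```python
import re

# Message content and its event template, taken from the example above.
message = "Received block blk_-562725280853087685 of size 67108864 from /10.251.91.84"
template = "Received block <*> of size <*> from /<*>"

# Build a regex from the template: escape the constant parts, then turn each
# "<*>" placeholder into a non-greedy capture group.
pattern = re.escape(template).replace(re.escape("<*>"), "(.*?)") + "$"

match = re.match(pattern, message)
parameters = list(match.groups())
print(parameters)  # ['blk_-562725280853087685', '67108864', '10.251.91.84']
```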
The traditional way of log parsing relies on handcrafted regular expressions or grok
patterns [16] to extract event templates and key parameters. Although straightforward,
manually writing ad-hoc rules to parse a huge volume of logs is time-consuming
and error-prone (e.g., over 76K templates in our Android dataset). In particular, logging
code in modern software systems usually updates frequently (up to thousands of log
statements every month [17]), incurring the unavoidable cost of regularly revising these
handcrafted parsing rules. To reduce the manual efforts in log parsing, some studies [18],
[19] have explored static analysis techniques to extract event templates from source
code directly. While it is a viable approach in some cases, source code is not always
accessible in practice (e.g., when using third-party components). Meanwhile, non-trivial
efforts are required to build such a static analysis tool for software systems developed
across different programming languages.
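The brittleness of the traditional approach is easy to see in code. A handcrafted rule set is essentially a list of regex-to-template mappings, and any message whose logging statement has changed, or was never covered, silently falls through. The rules below are hypothetical examples written for illustration, not taken from any real system.

```python
import re

# Handcrafted parsing rules: each regex maps a known message shape to its
# event template. Every change to the logging code requires revising this list.
RULES = [
    (re.compile(r"Received block (blk_-?\d+) of size (\d+) from /([\d.]+)"),
     "Received block <*> of size <*> from /<*>"),
    (re.compile(r"PacketResponder (\d+) for block (blk_-?\d+) terminating"),
     "PacketResponder <*> for block <*> terminating"),
]

def parse(message):
    for pattern, template in RULES:
        m = pattern.match(message)
        if m:
            return template, list(m.groups())
    return None, []  # unmatched: a new or revised logging statement

template, params = parse("PacketResponder 1 for block blk_38865049064139660 terminating")
print(template)  # PacketResponder <*> for block <*> terminating
print(params)    # ['1', 'blk_38865049064139660']
```

The automated approaches surveyed next aim to learn such templates directly from the log data, removing the need to maintain rule lists like this one by hand.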
To achieve the goal of automated log parsing, many data-driven approaches have been
proposed from both academia and industry, including frequent pattern mining (SLCT [20],
and its extension LogCluster [21]), iterative partitioning (IPLoM [22]), hierarchical
clustering (LKE [23]), longest common subsequence computation (Spell [24]), parsing tree
(Drain [25]), etc. In contrast to handcrafted rules and source code-based parsing, these
approaches are capable of learning patterns from log data and automatically generating
common event templates. In our previous work [9], we have conducted an evaluation study
of four representative log parsers and made the first step towards reproducible research and