LogRank +: A Novel Approach to Support Business Process Event Log Sampling
Abstract. Massive amounts of business process event logs are collected and stored by
modern information systems. Numerous process discovery approaches have been
proposed to extract descriptive process models from such event logs in the past decades.
To improve process discovery efficiency, event log sampling techniques have been
proposed. A sample log is a carefully selected subset of the original log that can be
processed at lower computational cost. However, existing sampling techniques have
difficulties, e.g., low efficiency, in handling large-scale event logs. To tackle this challenge, we propose a
novel ranking-based event log sampling approach, denoted as LogRank +, to support
efficient sampling. In addition, we introduce a framework to evaluate the effectiveness
of different sampling techniques by quantifying the sampling efficiency and the quality
of sample logs. The proposed sampling approach has been implemented in the open-
source process mining toolkit ProM. Experimental evaluation with both synthetic and
real-life event logs demonstrates that the proposed sampling approach effectively
improves event log sampling efficiency while ensuring high quality of the obtained
sample logs from a process discovery perspective.
Keywords: Event logs · Efficient sampling · Process discovery · Effectiveness
evaluation
1 Introduction
Process mining [2,14,25] aims at extracting process-oriented insights from business
process event logs that are readily available from modern information systems. Process
discovery, as one of the most fundamental tasks of process mining, makes it possible to
uncover descriptive process models from event logs. Various process discovery approaches
that take an event log as input and produce a process model, e.g., Alpha Miner [3],
Heuristic Miner [23], and Inductive Miner [12], have been proposed in the past two decades.
However, existing process discovery approaches either cannot handle large-scale event
logs properly or process them inefficiently.
To improve discovery efficiency, one effective strategy is to re-implement existing
discovery approaches using MapReduce to make them scalable to large-scale event logs.
Evermann presents the MapReduce implementations of the Alpha Miner and Heuristic
Miner in [11]. However, re-implementation is extremely time-consuming
and requires developers to have extensive knowledge of the underlying discovery
approach. Moreover, each re-implementation is tailored to one specific
approach and cannot be generalized to others. Rather than re-implementing existing
discovery approaches, event log sampling techniques provide an alternative means to
improve discovery efficiency. Consider, for example, the LogRank-based sampling
technique in [18] and [19], which uses a graph-based ranking model to extract a sample
log from an arbitrary input event log. The sample log is much smaller and can be
processed more efficiently than the original log. Although sampling techniques
facilitate efficient process discovery, the sampling itself is sometimes time-consuming
when handling large-scale event logs. To tackle this challenge, this paper proposes a
novel ranking-based event log sampling approach, denoted as LogRank +, to support
efficient log sampling. In addition, we introduce a framework to evaluate the
effectiveness of different sampling techniques from a process discovery perspective.
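To make the idea of ranking-based sampling concrete, the following is a minimal Python sketch. It is not the LogRank or LogRank + algorithm, whose details are given in [18], [19] and the later sections of the paper. It assumes, purely for illustration, that traces are lists of activity names, that similarity between two trace variants is the Jaccard overlap of their directly-follows pairs, and that variants are ranked by a PageRank-style iteration. The names `activity_pairs`, `rank_and_sample`, `ratio`, `damping`, and `iterations` are hypothetical.

```python
from collections import Counter


def activity_pairs(trace):
    """Set of directly-follows pairs occurring in one trace."""
    return set(zip(trace, trace[1:]))


def rank_and_sample(log, ratio, damping=0.85, iterations=30):
    """Return the highest-ranked trace variants, keeping `ratio` of all variants.

    Assumption: the similarity measure and the PageRank-style scoring are
    illustrative choices, not the scoring defined by LogRank or LogRank +.
    """
    variants = list(Counter(tuple(t) for t in log))  # distinct traces
    n = len(variants)
    if n == 0:
        return []
    pairs = [activity_pairs(v) for v in variants]

    # Weighted similarity graph between variants (Jaccard overlap).
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = pairs[i] | pairs[j]
            if i != j and union:
                sim[i][j] = len(pairs[i] & pairs[j]) / len(union)

    # Power iteration: a variant scores highly if similar variants score highly.
    out_weight = [sum(row) for row in sim]
    score = [1.0 / n] * n
    for _ in range(iterations):
        score = [
            (1 - damping) / n
            + damping * sum(
                score[j] * sim[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]

    keep = max(1, round(ratio * n))
    ranked = sorted(range(n), key=lambda i: score[i], reverse=True)
    return [list(variants[i]) for i in ranked[:keep]]


log = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "e"],
    ["x", "y"],
]
sample = rank_and_sample(log, ratio=0.5)
```

In this toy log the variant `["x", "y"]` shares no behavior with any other variant, so it receives only the baseline score and is left out of the sample. The pairwise similarity computation is quadratic in the number of variants, which hints at why the sampling step itself can become expensive on large logs.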
The rest of this paper is organized as follows. Section 2 presents a brief review of the
related work. Section 3 defines some preliminaries. Section 4 presents the research
questions and introduces an overview of our approach. Section 5 introduces the
LogRank +-based sampling technique and a framework to evaluate the effectiveness of
a sampling technique. Section 6 presents tool support. Section 7 presents experimental
evaluation. Finally, Sect. 8 concludes the paper.
2 Related Work
Process mining aims to discover, monitor and improve real business processes by
extracting knowledge from event logs [2]. As one of the most challenging process
mining tasks, process discovery has received a lot of attention in the past years. In
general, existing process discovery approaches can be categorized into two types. One
type discovers a process model that can guarantee 100% fitness against the input log,
i.e., all traces from the input log can be replayed by the discovered model. Inductive
Miner [12] is a typical approach of this type. The other type discovers process models
that do not guarantee 100% fitness, e.g., Heuristic Miner [23]. These
approaches typically treat traces not covered by the discovered process model as
exceptional behavior or noise, and therefore exclude them during discovery.
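The distinction between the two types can be illustrated with a small Python sketch. It assumes a deliberately simplified model representation (allowed start activities, end activities, and directly-follows steps) and defines fitness as the fraction of traces the model can replay. Process mining tools usually compute fitness on Petri nets via token replay or alignments, so this is only a trace-level stand-in, and all function names are hypothetical.

```python
def directly_follows_model(traces):
    """Build a toy model: allowed start activities, end activities, and steps."""
    starts = {t[0] for t in traces if t}
    ends = {t[-1] for t in traces if t}
    steps = {pair for t in traces for pair in zip(t, t[1:])}
    return starts, ends, steps


def can_replay(trace, model):
    """True if every step of the trace is permitted by the toy model."""
    starts, ends, steps = model
    if not trace:
        return False
    return (
        trace[0] in starts
        and trace[-1] in ends
        and all(pair in steps for pair in zip(trace, trace[1:]))
    )


def trace_fitness(log, model):
    """Fraction of traces in `log` that the model replays (1.0 means 100%)."""
    if not log:
        return 1.0
    return sum(can_replay(t, model) for t in log) / len(log)


log = [["a", "b", "c"]] * 3 + [["a", "c"]]

# First type: the model is built from every trace, so all traces replay.
full_model = directly_follows_model(log)

# Second type: the infrequent trace is treated as noise and left out.
filtered_model = directly_follows_model([["a", "b", "c"]])

full_fitness = trace_fitness(log, full_model)
filtered_fitness = trace_fitness(log, filtered_model)
```

Here `full_fitness` is 1.0, while `filtered_fitness` is 0.75 because the single `["a", "c"]` trace cannot be replayed by the model that excluded it.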
With the growing availability of event logs from current information systems, large-
scale event logs have posed new performance challenges for existing process discovery
approaches. The main reason is that, due to hardware limitations such as I/O and
memory, it is no longer feasible for most discovery approaches to process an entire
large data set on a single machine. In such scenarios, the discovery of process models
from large-scale event logs has to resort to distributed platforms. For example,
the well-known MapReduce framework [8] has been used to implement
existing process discovery algorithms, e.g., [11]. These implementations rely on
constructing the log abstractions, such as directly-follows graphs, required by current
discovery approaches, and compute them in parallel using
one or several MapReduce jobs. Although these approaches have shown that they can
efficiently speed up standalone algorithms in the presence of large-scale event logs,
their designs still follow a conventional way, i.e., applying computation to all traces in
a log.
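As a rough illustration of how a directly-follows graph can be computed in a map/reduce style, consider the Python sketch below. It runs in a single process and is not the implementation from [11]; it only mirrors the shape of the computation, assuming traces are lists of activity names. The function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_trace(trace):
    """Map step: count every directly-follows pair occurring in one trace."""
    return Counter(zip(trace, trace[1:]))


def reduce_counts(left, right):
    """Reduce step: merge two partial results by summing the count per pair."""
    return left + right


def directly_follows_graph(log):
    """Edge-weighted directly-follows graph of the whole log."""
    return reduce(reduce_counts, map(map_trace, log), Counter())


log = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "b"]]
dfg = directly_follows_graph(log)
```

Because each trace is mapped independently and the merge is associative, the map step could be distributed across workers. Note, however, that every trace is still visited once, which is the "all traces" cost that sampling aims to avoid.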
Rather than re-implementing existing discovery approaches, event log sampling
techniques provide an alternative means to handle large-scale event logs. For example,
the LogRank-based sampling technique in [18] and [19] is capable of sampling a
large-scale event log to a smaller size that can be efficiently processed by existing discovery