【免费】论文《流程日志中丢失事件的启发式恢复》翻译资源-CSDN文库

流程挖掘

事件日志

需积分: 0 154 浏览量 2024-02-26 00:06:23 上传评论收藏 580KB DOCX 举报

资源推荐

资源详情

资源评论

Heuristic Recovery of Missing Events in Process Logs

Abstract—Event logs are of paramount significance for process mining and complex

event processing. Yet, the quality of event logs remains a serious problem. Missing

events of logs are usually caused by omitting manual recording, system failures, and

hybrid storage of executions of different processes. It has been proved that the problem

of minimum recovery based on a priori process specification is NP-hard. State-of-the-

art approach is still lacking in efficiency because of the large search space. To address

this issue, in this paper, we leverage the technique of process decomposition and present

heuristics to efficiently prune the unqualified sub-processes that fail to generate the

minimum recovery. We employ real-world processes and their incomplete sequences

to evaluate our heuristic approach. The experimental results demonstrate that our

approach achieves high accuracy as the state-of-the-art approach does, but it is more

efficient.

Keywords ： missing events; heuristic recovery; Petri nets; process decomposition;

trace replaying

流程日志中丢失事件的启发式恢复

摘要：事件日志对于流程挖掘和复杂事件处理具有重要意义。然而，事件日志的质量仍然

是一个严重的问题。日志丢失事件通常是由于忽略了手动记录、系统故障和不同流程的执

行的混合存储而导致的。证明了基于先验流程规范的最小回收问题是 NP-hard 问题。由于

搜索空间大，最先进的方法仍然缺乏效率。为了解决这个问题，在这篇论文中，我们利用

了流程分解的技术，并提出了启发式的方法来有效地剪枝不合格的（即不能产生）最小恢

复的子流程。我们使用真实世界的流程及其不完全序列来评估我们的启发式方法。实验结

果表明，该方法与现有的方法相比具有较高的精度，但效率更高。

关键词：缺失事件;启发式恢复;Petri 网;流程分解;轨迹重演

1. INTRODUCTION

Contemporarily, business processes of enterprises are event-driven [1], [2]. For one

thing, the executions of business processes are triggered and piloted by events. For

another, business processes continuously generate large volume of event data when

they are in execution. Accordingly, event logs play an increasingly crucial role in

today’s companies. However, the quality of event logs still remains a big problem,

which seriously affects the analysis results of many business intelligence functionalities,

e.g., complex event processing [3], provenance analysis [4], and process mining [5], to

name a few.

Missing events are the major reasons for the low-quality event logs [6], [7]. In

practice, there are several scenarios which could cause event missing in process logs,

such as omitting manual recording, temporary system failures, and hybrid storage of

executions of different processes [6], [7], [8], [9]. We are not surprised that it is hard to

recover the missing events without any predefined knowledge. In this paper, we study

the problem of automatic recovery of missing events in the light of a priori process

specification. Without losing generality, we employ a special kind of Petri nets [10],

that is, workflow nets [11], to represent process specifications.

For an incomplete event sequence, there can be multiple recoveries because of

parallel routings (i.e., concurrencies) and alternative routings (i.e., choices) in the

process specification. If the specification involves iterative routings (i.e., loops), the

number of recoveries is even infinite. Following the minimum change principle in

improving data quality [12], [13], we focus on finding the optimal recovery which is

minimally different from the original incomplete sequence. This is a rational choice,

because the probability of missing more events is much smaller than that of missing

less events. Unfortunately, it is an NP-hard problem to find the minimum recovery of

an incomplete sequence in a predefined process specification [7]. On the other hand,

the process system continuously produces event sequences in real-time. Hence, the

generated sequences can be regarded as a kind of big data. For these reasons, it is of

great importance to efficiently recover the incomplete sequences.

In order to fulfill the efficient recovery, the state-of-the-art approach (i.e., the

branching approach [7]) first maps a process specification modeled as a Petri net into a

homomorphic specification (called branching net) which does not contain OR-joins (i.e.,

merges of different alternative braches), and then utilizes advanced indexing and pruning

techniques to reduce the search space. In the spite the fact that the branching approach is

much more efficient than the exhaustive search, there is still much room for the

improvement.

In this paper, we present a more efficient approach for the minimum recovery.

Considering the fact that a process specification may be composed by several

independent subprocesses, the minimum recovery of an incomplete sequence can only

be in one of the sub-processes. With this in mind, we propose heuristics to identify the

sub-process that can generate the minimum recovery, while other sub-processes that

fail to generate the minimum recovery are pruned without further investigation. In this

way, the search space is significantly reduced. According to the qualified sub-process,

we propose a linear-time algorithm of trace replaying to find the minimum recovery.

This algorithm is promising in practice. For one thing, it avoids enumerating all possible

event interleavings to find the minimum recovery. For another, it further decreases the

time cost by combining the conformance checking [5] and subsequent sequence

recovery into one single step.

To evaluate our approach, we perform experiments by applying our approach and

the state-of-the art approach (i.e., the branching approach [7]) to a data set which

consists of 36 real-world process specifications and their incomplete event sequences.

The experimental results demonstrate that our approach achieves high accuracy for the

recovery, and is more efficient than the branching approach.

In a nutshell, the key contribution of our work is threefold:

 We present a linear time algorithm which is based on trace replaying to

generate the minimum recovery of an incomplete event sequence from a

qualified sub-process with no choices.

 We present a heuristics recovery approach for process specifications with

choices and loops. More specifically, we first decompose the process into

several subprocesses, and then present heuristics to prune unqualified sub-

processes that fails to generate the minimum recovery, thus significantly

accelerating the recovery process.

 We evaluate our approach by employing a real-world data set to compare our

approach with the state-of-the-art approach. The experimental results show

the effectiveness and efficiency of our approach.

The remainder of this paper is organized as follows. Section II introduces the

background of the research problem. Section III proposes our sequence recovery

approach. Section IV shows our experimental results. Section V reviews related

work. Section VI concludes the paper and envisions the future work.

1 引言

同时，企业的业务流程是事件驱动的[1]、[2]。首先，业务流程的执行是由

事件触发和引导的。另一方面，业务流程在执行时不断地生成大量事件数据。

因此，事件日志在当今的公司中扮演着越来越重要的角色。然而，事件日志的

质量仍然是一个大问题，这严重影响了许多业务智能功能的分析结果，例如复

杂事件处理[3]、来源分析[4]和流程挖掘[5]等。

事件缺失是导致事件日志质量低下的主要原因[6]、[7]。在实践中，有几种

情况可能会导致流程日志中的事件丢失，例如忽略手动记录，临时系统故障以

及不同进程执行的混合存储[6]，[7]，[8]，[9]。我们并不感到惊讶，这是很难恢

复丢失的事件没有任何预定义的知识。在本文中，我们依据一个先验的流程规

范研究自动恢复丢失的事件问题。在不失一般性的情况下，我们采用一种特殊

的 Petri 网[10]，即工作流网[11]来表示流程规范。

对于不完整的事件序列，由于流程规范中存在并行路由（即，并发）和替

代路由（即，选择），会存在多种修复序列。如果规范涉及迭代路由（即，循

环），恢复的数量甚至是无限的。遵循改善数据质量的最小变化原则[12]，[13]，

我们专注于寻找与原始不完整序列差异最小的最佳恢复。这是一个理性的选择，

因为丢失更多事件的概率远小于错过更少事件的概率。不幸的是，从预定义流

程规范中找到不完整序列的最小恢复是 NP-hard 问题[7]。另一方面，流程系统

连续地实时产生事件序列。因此，生成的序列可以被视为一种大数据。因此，

有效地恢复不完整的序列是非常重要的。

为了实现有效的恢复，最先进的方法(即分支方法[7])首先将流程规范成的

Petri 网映射为不包含 OR-joins(即不同可选分支的合并)的同态规范(称为分支网)，

然后利用先进的索引和剪枝技术来减少搜索空间。尽管分支搜索法比穷举搜索

法高效得多，但仍有很大的改进空间。

在本文中，我们提出了一种更有效的最小恢复方法。考虑到流程规范可能

由几个独立的子流程组成，不完整序列的最小恢复只能在其中一个子流程中。

考虑到这一点，我们提出了识别能够生成最小恢复的子流程的启发式方法，而

其他不能生成最小恢复的子流程则在没有进一步研究的情况下被剪枝。这样，

搜索空间就大大减少了。根据限定子流程，提出了一种轨迹重演的线性时间算

法来求最小恢复。该算法具有较好的应用前景。首先，它避免列举所有可能的

事件交错来寻找最小恢复。另一方面，通过将合规检查[5]和后续的序列恢复合

并到一个步骤中，进一步降低了时间成本。

为了评估我们的方法，我们通过将我们的方法和最新的方法(即分支方法[7])

应用于一个数据集合来执行实验，该数据集由 36 个真实世界的流程规范及其不

完整的事件序列组成。实验结果表明，该方法具有较高的恢复精度，且比分支

方法更有效。

简而言之，我们工作的主要贡献有三个方面:

 我们提出了一种基于轨迹重演的线性时间算法，从无选择的限定子流

程中产生不完整事件序列的最小恢复。

 我们提出了一种启发式恢复方法，用于带有选择和循环的流程规范。

更具体地说，我们首先将流程分解为几个子流程，然后提出启发式方

法来删除未生成最小恢复的不合格子流程，从而显著加快恢复流程。

 我们通过使用真实世界的数据集合来比较我们的方法和最先进的方法

来评估我们的方法。实验结果表明了该方法的有效性和效率。

这篇论文的其余部分组织如下。第二部分介绍了研究问题的背景。第三部

分提出了序列恢复方法。第四部分是我们的实验结果。第五节审查有关工作。

第六部分对论文进行了总结，并对今后的工作进行了展望。

II. RELATED WORK

Although data recovery is intensively studied in the field of data mining [14],

recovering process logs is a novel research topic. It is notable that log recovery is highly

relevant to conformance checking of business processes, so we first review related work

on conformance checking.

As a kind of process mining technique, conformance checking focuses on

calculating the consistency degree (ranging from 0 to 1.0) between a process and its

event log [15], [16], [17], [18]. It has two major applications. First, it is used to

determine to what extent an event sequence conforms to a predefined process. Second,

剩余32页未读，继续阅读

评论收藏

内容反馈

ProgrammerMonkey

粉丝: 43
资源: 37

论文《流程日志中丢失事件的启发式恢复》翻译

大学毕业论文 针对二进制病毒的启发式扫描技术

论文研究-启发式路径下节约里程的订单分批算法.pdf

人工智能启发式图搜索算法

论文研究 - 语言理解中的启发式

论文研究-两阶段启发式算法求解带时间窗的多中心车辆路径问题.pdf

论文研究-基于因果图启发式的并行概率规划求解.pdf

论文研究-基于启发式规则的网页主题信息精确定位方法.pdf

《【大学生实习日志】试论大学生野外实习中的启发式教学教育论文》.doc

论文研究-航空公司机队的鲁棒性规划启发式算法.pdf

论文研究-一种脑启发式的边缘检测模型.pdf

启发式分割算法.rar_启发式 分割_启发式分割_启发式分割算法_突变_突变点

启发式扫描技术

生产排程之启发式算法

论文研究 - 基于高效启发式的两阶段转运问题方法

边缘计算环境中基于启发式算法的深度神经网络卸载策略.zip

论文研究 - 求解机器调度问题的Job Shop启发式算法

论文研究-多启发式规则融合粒子群算法的受限项目调度.pdf

启发式搜索

迷宫搜索问题-利用启发式搜索 c++

相关实用应用程序（Windows可用）

免费可用的ChatGPT网页版.zip

ChatGPT使用总结：150个ChatGPT提示词模板（完整版）

chromedriver-win64.zip

全国计算机二级WPSoffice精选350道选择题题库（含答案）.pdf

哈尔滨工业大学-ChatGPT调研报告-2023.3.6-94页.pdf

李飞飞自传 我看见的世界 The World I see

农村公交与异构无人机协同配送优化

4个亲测好用的ChatGPT4渠道

2023泛娱乐社交出海手册-ZEGO即构科技

最新资源

大学毕业论文针对二进制病毒的启发式扫描技术

启发式分割算法.rar_启发式分割_启发式分割_启发式分割算法_突变_突变点

李飞飞自传我看见的世界 The World I see