Outlier Detection Techniques for Process Mining Applications

Abstract. Classical outlier detection approaches can hardly fit process mining applications, since in these settings anomalies emerge not only as deviations from the sequence of events most often registered in the log, but also as deviations from the behavior prescribed by some (possibly unknown) process model. These issues are addressed in the paper via an approach for singling out anomalous evolutions within a set of process traces, which takes into account both the statistical properties of the log and the constraints associated with the process model. The approach combines the discovery of frequent execution patterns with a cluster-based anomaly detection procedure; notably, this procedure is suited to dealing with categorical data and is hence interesting in its own right, given that outlier detection has mainly been studied on numerical domains in the literature. All the algorithms presented in the paper have been implemented and integrated into a system prototype that has been thoroughly tested to assess its scalability and effectiveness.

1 Introduction

Several efforts have recently been made in the scientific community and in industry to exploit data mining techniques for the analysis of process logs [12], and to extract high-quality knowledge on the actual behavior of business processes (see, e.g., [6,3]). In a typical process mining scenario, a set of traces (recording the sequence of activities performed during several enactments) is given, and the aim is to derive a model that explains all the episodes recorded in them. Eventually, the “mined” model is used to (re)design a detailed process schema capable of supporting forthcoming enactments. As an example, the event log (over activities a, b, ..., o) shown on the right side of Figure 1 might be given as input, and the goal would be to derive a model like the one shown on the left side, representing a simplified process schema in an intuitive notation where precedence relationships are depicted as directed arrows between activities (e.g., b must be executed after a and concurrently with c).
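To make the notation concrete, the schema can be approximated as a set of pairwise precedence constraints against which a trace is checked. This is a minimal sketch under stated assumptions: the name `respects_precedences` and the constraint set are hypothetical, only the a → b arc comes from the text (a → c is assumed), and pairwise precedences cannot express choice (XOR/OR) constructs that a real schema may contain.

```python
from typing import Iterable

# Hypothetical fragment of the Figure 1 schema: each pair (x, y) reads
# "y may only be executed after x". Only a -> b is stated in the text;
# a -> c is an assumption. With no arc between b and c, the two are
# concurrent and may appear in either order.
PRECEDENCES: set[tuple[str, str]] = {("a", "b"), ("a", "c")}


def respects_precedences(trace: Iterable[str],
                         precedences: set[tuple[str, str]]) -> bool:
    """Return True if every activity of the trace that has a required
    predecessor is executed only after that predecessor has occurred."""
    seen: set[str] = set()
    for activity in trace:
        for before, after in precedences:
            if after == activity and before not in seen:
                return False
        seen.add(activity)
    return True


print(respects_precedences("abc", PRECEDENCES))  # True
print(respects_precedences("acb", PRECEDENCES))  # True: b and c are concurrent
print(respects_precedences("bac", PRECEDENCES))  # False: b precedes a
```

Both "abc" and "acb" pass, which is exactly why, under concurrency, several distinct sequences can describe one and the same enactment.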

In the paper, this peculiar aspect of process mining is investigated, and the problem of singling out exceptional individuals (usually referred to as outliers in the literature) from a set of traces is addressed.

Outlier detection has already found important applications in bioinformatics [1], fraud detection [5], and intrusion detection [9], to cite just a few. However, when these approaches are adapted to process mining applications, novel challenges come into play:

(C1) On the one hand, looking only at the sequencing of the events may be misleading in some cases. Indeed, real processes usually allow for a high degree of concurrency, and tend to produce many traces that differ only in the ordering of parallel tasks. Consequently, merely applying existing outlier detection approaches for sequential data to process logs may yield many false positives, as a notable fraction of task sequences might have very low frequency in the log. As an example, in Figure 1, each of the traces in {s1, ..., s5} occurs rarely in the log, yet none of them should be classified as anomalous: they correspond to different interleavings of the same enactment, which occurs in 10 of the 40 traces.
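The false-positive effect described in (C1) can be reproduced on a toy log by contrasting two frequency counts: one over exact sequences and one over the sets of executed activities. This is a hypothetical sketch: the toy log and the 0.25 threshold are invented, and abstracting a trace to its activity set is a deliberately crude stand-in for the concurrency-aware patterns the paper introduces (it discards ordering and repetitions entirely).

```python
from collections import Counter


def sequence_support(log):
    """Relative frequency of each exact activity sequence in the log."""
    counts = Counter(tuple(trace) for trace in log)
    return {seq: n / len(log) for seq, n in counts.items()}


def activity_set_support(log):
    """Relative frequency of each set of executed activities, i.e. ignoring
    the order in which concurrent activities were interleaved (and any
    repetitions, which this simplification does not handle)."""
    counts = Counter(frozenset(trace) for trace in log)
    return {acts: n / len(log) for acts, n in counts.items()}


# Toy log invented for illustration (NOT the log of Figure 1): one behaviour
# {a, b, c, d} recorded under four interleavings, plus a genuinely rare {a, e}.
toy_log = ["abcd", "abdc", "acbd", "adcb"] * 2 + ["ae"] * 2
threshold = 0.25  # hypothetical minimum support

rare_sequences = [s for s, f in sequence_support(toy_log).items() if f < threshold]
rare_sets = [s for s, f in activity_set_support(toy_log).items() if f < threshold]

print(len(rare_sequences))             # 5: every sequence looks rare
print([sorted(s) for s in rare_sets])  # [['a', 'e']]: only the real outlier
```

Counted sequence by sequence, every trace falls below the threshold; counted by behaviour, the four interleavings pool into one frequent group and only {a, e} remains rare.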

(C2) On the other hand, checking only compliance with an ideal schema may lead to false negatives, as a trace might well be supported by the model yet represent a behavior that deviates from the one observed in the majority of the traces. As an example, in Figure 1, traces … and … correspond to the same behavior, in which all the activities have been executed. Even though this behavior is admitted by the process model on the left, it is anomalous, since it characterizes only 3 of the 40 traces.
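The arithmetic behind (C2) is simple: a behaviour admitted by the model can still be rare. The sketch below contrasts a compliance-only check with one that also requires a minimum support. The function names and the 10% threshold are hypothetical, and a fixed threshold is a simplification standing in for the paper's cluster-based criterion, not a description of it.

```python
def compliance_only_outlier(compliant: bool) -> bool:
    """A detector that only checks conformance to the schema."""
    return not compliant


def is_outlier(compliant: bool, occurrences: int, log_size: int,
               min_support: float = 0.1) -> bool:
    """Flag a behaviour if the schema rejects it OR if it characterizes too
    small a fraction of the log. min_support is a hypothetical threshold."""
    support = occurrences / log_size
    return (not compliant) or support < min_support


# Counts taken from the running example: the behaviour discussed in (C2) is
# admitted by the schema but occurs in 3 of 40 traces (support 0.075), while
# the behaviour behind s1, ..., s5 in (C1) occurs in 10 of 40 (support 0.25).
print(compliance_only_outlier(True))  # False: the false negative of (C2)
print(is_outlier(True, 3, 40))        # True: 0.075 < 0.1
print(is_outlier(True, 10, 40))       # False: 0.25 >= 0.1
```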

Facing (C1) and (C2) is complicated by the fact that the process model underlying a given set of traces is generally unknown and must be inferred from the data itself. E.g., in our running example, a preliminary question is how we can recognize the abnormality of s9, ..., s14 without any a priori knowledge about the model of the given process.

Addressing this question, and subsequently (C1) and (C2), is precisely the aim of the paper, where an outlier detection technique tailored for process mining applications is discussed. In a nutshell, rather than extracting a model that accurately describes all possible execution paths of the process (and hence the anomalies as well), the idea is to capture the “normal” behavior of the process by simpler (partial) models consisting of frequent structural patterns. More precisely, outliers are found by a two-step approach:

• First, we mine the patterns of execution that are likely to characterize the behavior of a given log. Specifically, we specialize earlier frequent pattern mining approaches to the context of process logs by (i) defining a notion of pattern that effectively characterizes concurrent processes by accounting for typical routing constructs, and (ii) presenting an algorithm for identifying such patterns.

• Second, we use a cluster-based outlier detection approach: it computes a clustering of the log (where the similarity measure roughly accounts for how many patterns jointly characterize the execution of the traces) and identifies as outliers those individuals that hardly belong to any of the computed clusters or that belong to clusters whose size is clearly smaller than the average cluster size.
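The two steps can be sketched end to end in a few dozen lines. This is an illustrative approximation under explicit assumptions, not the paper's algorithm: in step 1, frequent activity itemsets of size one or two stand in for the paper's routing-aware patterns; in step 2, a single-pass leader clustering with Jaccard similarity over each trace's set of contained patterns stands in for the paper's clustering procedure, and a cluster counts as clearly smaller when it holds fewer than half the average number of traces per cluster. All function names, thresholds, and the toy log are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(log, min_support=0.2, max_size=2):
    """Step 1 (stand-in): activity itemsets of size 1..max_size that occur in
    at least min_support of the traces. Itemsets ignore ordering, so
    different interleavings of concurrent activities yield the same patterns."""
    counts = Counter()
    for trace in log:
        activities = sorted(set(trace))
        for k in range(1, max_size + 1):
            counts.update(frozenset(c) for c in combinations(activities, k))
    return {p for p, n in counts.items() if n / len(log) >= min_support}


def jaccard(x, y):
    """Share of patterns two traces have in common (1.0 if both are empty)."""
    return len(x & y) / len(x | y) if x | y else 1.0


def leader_clusters(profiles, min_similarity=0.6):
    """Step 2a (stand-in): put each trace in the first cluster whose leader is
    similar enough; otherwise the trace starts a new cluster."""
    leaders, members = [], []
    for i, profile in enumerate(profiles):
        for c, leader in enumerate(leaders):
            if jaccard(profile, leader) >= min_similarity:
                members[c].append(i)
                break
        else:  # no sufficiently similar cluster: open a new one
            leaders.append(profile)
            members.append([i])
    return members


def detect_outliers(log, min_support=0.2, min_similarity=0.6, size_ratio=0.5):
    """Step 2b: return the indices of traces that fall in clusters holding
    fewer than size_ratio times the average number of traces per cluster."""
    patterns = frequent_patterns(log, min_support)
    profiles = [frozenset(p for p in patterns if p <= set(t)) for t in log]
    clusters = leader_clusters(profiles, min_similarity)
    average = len(log) / len(clusters)
    return sorted(i for c in clusters if len(c) < size_ratio * average for i in c)


# Toy log (invented): 12 interleavings of {a, b, c, d}, 6 of {a, b, e},
# and one trace "axy" sharing almost no frequent pattern with the others.
toy_log = ["abcd", "abdc", "acbd", "adcb"] * 3 + ["abe", "aeb"] * 3 + ["axy"]
print(detect_outliers(toy_log))  # [18]
```

Only "axy" (index 18) ends up in a singleton cluster and is flagged; the 6-trace {a, b, e} group stays close enough to the average size to be kept. Leader clustering is order-dependent, which is acceptable for a sketch but is one reason a more careful clustering procedure would be preferred in practice.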
