Extraction of Missing Tendency Using Decision Tree Learning in Business
Process Event Log
Abstract: In recent years, process mining has attracted attention as an
effective method for improving business operations by analyzing event logs that
record what is done in business processes. An event log may contain missing
data due to technical or human error, and missing data make the analysis
results inadequate. Traditional methods mainly fill in missing values with
predicted values, but accurate completion is not always possible. In this
paper, we propose a method that uses decision tree learning to understand the
tendency of missing values in the event log without supplementing them. We
conducted experiments using data from an incident management system and
confirmed the effectiveness of our method.
Keywords: process mining; business process management; data quality; data
management
1. Introduction
Information systems support the efficient execution and management of business
processes. If the results of business process execution are recorded as an event
log, the log can be used to improve the business process. The analysis of event
logs is called process mining [1] and has attracted much attention in recent
years. For example, process discovery [1] is a technique that takes as input an
event log recording who executed which activity at what time and automatically
generates a business process model that reproduces the recorded behavior.
Visualizing business processes with such models helps in understanding the
current situation. In addition, process mining techniques can be used to check
whether a process conforms to the organization’s rules, analyze process
performance, and suggest process improvements. The availability of event logs
allows for evidence-based analysis and increases the opportunities for business
process improvement [2].
Many process mining algorithms assume that the input event log is of high
quality, that is, that it has no missing values. However, for technical reasons
(deviations can occur even in automatic logging systems because of machine
breakdowns, system bugs, and resource constraints [3]) or human reasons (human
error), the event log may contain missing values [4]. A certain level of error
in event logs is often unavoidable, particularly when event logs are built by
integrating several heterogeneous data sources or where manual logging is
involved [5]. In addition, data failures also occur as a consequence of allowing
behavior to be adapted during the execution of a process instance [6].
In data analysis, this phenomenon is called “garbage in, garbage out” [7]:
analyzing poor-quality data yields only meaningless results [8,9]. Mans et al.
have also shown that the quality of the event log is an important success
factor for process mining projects [5]. Therefore, it is necessary to
pre-process the event log before analyzing the data.
In the area of process mining, research on the pre-processing of event logs has
been conducted. Sim et al. proposed a likelihood-based multiple imputation
method to complete the missing values of events in the event log [10]. Conforti
et al. proposed a method for repairing the timestamps at which events in the
event log were executed [11]. With these methods, a missing value in the event
log can be complemented with a predicted value. However, even when a repair
method is used, there is no guarantee that the repaired parts match what was
actually executed, so the repair itself may be erroneous. Therefore, it is
desirable to prevent missing elements at the time of data acquisition.
References [12,13] state that it is important to know how to systematically
identify the root causes of data quality problems in event logs. The quality of
data can be improved by (i) improving the way in which data are captured while
they are being generated and (ii) improving the data after they have been
acquired [8]. The above studies [10,11] belong to (ii), while our study belongs
to (i). By combining methods from both perspectives, data quality is expected
to improve further.
In this paper, we do not repair the missing values; instead, we propose a method
to understand the tendency of the missing events in the event log. By applying
decision tree learning to the information around the missing points in the
event log, we can identify the tendency of the missing points from the branches
of the constructed tree. Because the purpose is to support users of process
mining, an advantage of our method is that it is simple and easy for users to
understand. Without our method, a person would have to inspect the data closely
to see where missing values occur, and the inspection would be ad hoc. In
contrast, our method expresses the tendency of missing-value occurrence in the
event log as a decision tree, so we believe it allows the cause of the missing
values to be analyzed efficiently. We conducted experiments using real business
process data published by Volvo IT Belgium and confirmed the effectiveness of
our method.
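To make the idea of reading a missing tendency from tree branches concrete, the
following is a minimal sketch, not the implementation evaluated in this paper.
It assumes that each training row describes the context around one position in
a trace (the feature names prev, next, and group, and all values, are
hypothetical) together with a label saying whether the event at that position
was missing. The tree is built with a small ID3-style information-gain
procedure written for illustration only.

```python
import math
from collections import Counter

# Hypothetical rows: context around one position in a trace, plus a label
# that is True when the event at that position was missing.
ROWS = [
    ({"prev": "Accepted", "next": "Completed", "group": "G1"}, False),
    ({"prev": "Accepted", "next": "Queued", "group": "G2"}, False),
    ({"prev": "Queued", "next": "Accepted", "group": "G1"}, False),
    ({"prev": "Queued", "next": "Completed", "group": "G2"}, True),
    ({"prev": "Accepted", "next": "Accepted", "group": "G2"}, False),
    ({"prev": "Queued", "next": "Accepted", "group": "G2"}, True),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_feature(rows, features):
    """Return the feature whose split yields the largest information gain."""
    base = entropy([label for _, label in rows])

    def gain(feature):
        remainder = 0.0
        for value in {ctx[feature] for ctx, _ in rows}:
            subset = [label for ctx, label in rows if ctx[feature] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return base - remainder

    return max(features, key=gain)


def build_tree(rows, features):
    """Build a tree: a leaf is a bool, an inner node is (feature, branches)."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    feature = best_feature(rows, features)
    rest = [f for f in features if f != feature]
    branches = {}
    for value in sorted({ctx[feature] for ctx, _ in rows}):
        subset = [(ctx, label) for ctx, label in rows if ctx[feature] == value]
        branches[value] = build_tree(subset, rest)
    return (feature, branches)


def rules(tree, path=()):
    """Yield one (conditions, label) pair per leaf of the tree."""
    if not isinstance(tree, tuple):
        yield path, tree
        return
    feature, branches = tree
    for value, subtree in branches.items():
        yield from rules(subtree, path + ((feature, value),))


TREE = build_tree(ROWS, ["prev", "next", "group"])
# Branches that end in a "missing" leaf describe the missing tendency.
MISSING_RULES = [conds for conds, label in rules(TREE) if label]
print(MISSING_RULES)
```

In this toy data the only branch ending in a missing leaf is the one where the
preceding activity is Queued and the group is G2, which is the kind of
human-readable tendency the tree is meant to expose. In practice a library
implementation of decision tree learning would normally replace the hand-written
builder.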
This paper is organized as follows. Section 2 explains the preliminary knowledge
for this paper. Section 3 explains the proposed method. Section 4 evaluates the
proposed method. Section 5 describes related work. Section 6 summarizes this
paper.
2. Background Knowledge
Section 2.1 describes event logs. Section 2.2 describes data quality in process
mining.
2.1. Event Logs
An event log is recorded by information systems as a result of the events
executed in business processes. Executed events are recorded per business
process instance (a case, from its start to its end). A trace is a division of
the executed events into business process instances, and a trace $T_i$ of
length $n$ is represented as an ordered set of events $e$ in the following
manner. Here, $i$ represents the identifier of the trace in the event log.
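As an informal illustration of these notions, and not the formal definition used
in this paper, the sketch below divides a small event log into traces. The
Event fields, the case identifiers, and all values are hypothetical; timestamps
are assumed to be present and written as ISO 8601 text, so that sorting the
strings orders the events in time. A value of None marks a missing attribute.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Event(NamedTuple):
    case_id: str              # identifier i of the trace the event belongs to
    activity: Optional[str]   # None marks a missing value
    resource: Optional[str]   # None marks a missing value
    timestamp: str            # ISO 8601 text, so string order is time order


# Hypothetical event log: events are stored in no particular order.
LOG = [
    Event("case-2", "Accepted", "Bob", "2012-04-02T08:00"),
    Event("case-1", "Queued", None, "2012-04-01T09:30"),
    Event("case-1", "Accepted", "Ann", "2012-04-01T09:00"),
    Event("case-2", "Completed", "Bob", "2012-04-02T08:45"),
    Event("case-1", "Completed", "Ann", "2012-04-01T11:00"),
]


def to_traces(log):
    """Divide the events by case and order each trace by timestamp."""
    traces = defaultdict(list)
    for event in log:
        traces[event.case_id].append(event)
    return {case: sorted(events, key=lambda e: e.timestamp)
            for case, events in traces.items()}


TRACES = to_traces(LOG)
# Length n of each trace, keyed by trace identifier.
LENGTHS = {case: len(trace) for case, trace in TRACES.items()}
print(LENGTHS)
```

Each value in TRACES is one ordered sequence of events for a single case, which
is the unit that later analysis steps operate on.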