Multi-Relational Data Mining, using UML for ILP¹

Arno J. Knobbe¹, Arno Siebes², Hendrik Blockeel³, Daniël van der Wallen⁴

¹ Kiminkii, Postbus 171, 3990 DD Houten, The Netherlands, a.knobbe@kiminkii.com
² CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands, arno@cwi.nl
³ K.U.Leuven, Dept. of Computer Science, Celestijnenlaan 200A, B-3001 Heverlee, Belgium, hendrik.blockeel@cs.kuleuven.ac.be
⁴ Dantec B.V., Pontanusstraat 21, 1093 RX Amsterdam, The Netherlands, daniel@dantec.nl

Abstract.
Although there is a growing need for multi-relational data
mining solutions in KDD, the use of obvious candidates from the field
of Inductive Logic Programming (ILP) has been limited. In our view
this is mainly due to the variation in ILP engines, especially with
respect to input specification, as well as the limited attention paid to
relational database issues. In this paper we describe an approach which
uses UML as the common specification language for a large range of
ILP engines. Having such a common language will enable a wide range
of users, including non-experts, to model problems and apply different
engines without any extra effort. The process involves transformation
of UML into a language called CDBL, that is then translated to a
variety of input formats for different engines.
1 Introduction
A central problem in the specification of a multi-relational data mining problem is the
definition of a model of the data. Such a model directly determines the type of
patterns that will be considered, and thus the direction of the search. Such
specifications are usually referred to as declarative or language bias in ILP [14]. Most
current systems use logic-based formalisms to specify the language bias (e.g., Progol,
S-CART, Claudien, ICL, Tilde, Warmr [13, 12, 6, 7, 3, 8]). Although most of these
formalisms are quite similar and make use of the same concepts (e.g., types and
modes), there are still differences between the formalisms that make the sharing of the
language bias specification between engines a non-trivial task. The main reasons for
this are:
• the different formalisms each have their own syntax; the user needs to be familiar
  with all of them.
• many formalisms contain certain constructs, the semantics of which, sometimes
  in a subtle way, reflect behavioral characteristics of the inductive algorithm.

¹ First published at PKDD 2000.

The use of different ILP-systems would be simplified significantly if a common
declarative bias language were available. Such a language should have the following
characteristics:
• The common language should be usable for a large range of ILP systems, which
  means that it should be easy to translate a bias specification from the common
  language to the native language of the ILP system.
• It should be easy to learn. This means it should make use of concepts most users
  are familiar with. In the ideal case, the whole language itself is a language that
  the intended users are familiar with already.
• The bias should not just serve as a necessary prerequisite for running the
  induction algorithm, but should also be usable as a shared piece of information or
  documentation about a problem within a team of analysts with varying levels of
  technical expertise.
• It should be easy to judge the complexity of a problem from a single glance at the
  declarative bias. A graphical representation would be desirable.
In this paper we propose the use of the Unified Modeling Language (UML) [2, 15, 16,
17] as the language of choice for specifying declarative bias of such nature. Over the
past few years UML has proven itself as a versatile tool for modeling a large range of
applications in various domains. For ILP, Class Diagrams are of particular interest because of
their usefulness in database modeling. Our discussion will be based on these
diagrams.
Why do we wish to use UML to express bias? First of all, as UML is an intuitive
visual language, essentially consisting of annotated graphs, we can easily write down
the declarative bias for a particular domain or judge the complexity of a given data
model [9]. Another reason for using UML is its widespread use in database (as well as
object oriented) modelling. UML has effectively become a standard with thorough
support in many commercial tools. Some tools allow the reverse engineering of a data
model from a given relational database, directly using the table specifications and
foreign key relations. If we can come up with a process of using UML in ILP
algorithms, we would then have practically automated the analysis process of a
relational database. Finally, UML may serve as a common means of stating
declarative bias languages used in the different ILP engines.
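
The reverse-engineering step described above, reading table specifications and foreign key relations to obtain a data model, can be illustrated with a minimal sketch. It assumes an SQLite database and a hypothetical molecule/atom schema; neither comes from this paper, and the function name `reverse_engineer` is our own. Commercial UML tools perform this step in their own way; the sketch only shows that the information needed for classes and associations is already present in the database catalog.

```python
import sqlite3


def reverse_engineer(conn):
    """Read tables and foreign keys from the SQLite catalog.

    Returns (classes, associations):
      classes      maps table name -> list of (column name, declared type)
      associations is a list of (child table, child column,
                                 parent table, parent column)
    """
    classes = {}
    associations = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        classes[table] = [(col[1], col[2]) for col in columns]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            associations.append((table, fk[3], fk[2], fk[4]))
    return classes, associations


# Hypothetical example schema (an assumption, not taken from the paper).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE molecule (id INTEGER PRIMARY KEY, mutagenic TEXT);
    CREATE TABLE atom (
        id INTEGER PRIMARY KEY,
        molecule_id INTEGER REFERENCES molecule(id),
        element TEXT
    );
""")
classes, associations = reverse_engineer(conn)
for child, child_col, parent, parent_col in associations:
    print(f"{child}.{child_col} -> {parent}.{parent_col}")
```

Each table becomes a candidate class and each foreign key a candidate association, which is the raw material a Class Diagram needs.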
Although it is clear that UML is a good candidate for specifying first-order
declarative bias, it may not be immediately clear how the different engines will actually be
making use of the UML declarations. Its use in our previously published Multi-
Relational Data Mining framework [10, 11] is straightforward, as this framework and
the related engine Fiji2 have been designed around the use of UML from the outset.
To translate UML bias declarations to logic-based bias languages, we use an
intermediate textual representation, called Common Declarative Bias Language
(CDBL). CDBL is essentially a set of Prolog predicates, which can be easily
processed by the different translation procedures. Translation procedures for the
popular engines Tilde, Warmr and Progol are currently available. The whole process
of connecting an ILP engine to a relational database now becomes a series of
translation steps as is illustrated by the diagram in figure 1. We have implemented and
embedded each of these steps into a single GUI.
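
The architecture just described, one intermediate representation feeding several engine-specific translators, can be sketched as follows. Everything in this sketch is an assumption made for illustration: the tuple-based intermediate form is not CDBL syntax, and the two output formats belong to imaginary engines rather than to Tilde, Warmr or Progol. The point is only that supporting a new engine means adding one translator, while the user's specification stays unchanged.

```python
# Hypothetical intermediate form: plain tuples. This is NOT CDBL syntax.
BIAS = [
    ("table", "molecule"),
    ("table", "atom"),
    ("link", "molecule", "atom"),
]


def to_engine_a(bias):
    """Made-up input format for an imaginary engine A."""
    return "\n".join(f"{kind}: {' -> '.join(args)}" for kind, *args in bias)


def to_engine_b(bias):
    """Made-up input format for an imaginary engine B."""
    return "\n".join(" ".join(item).upper() for item in bias)


# One translator per supported engine; the names are placeholders.
TRANSLATORS = {"engine_a": to_engine_a, "engine_b": to_engine_b}


def translate(bias, engine):
    """Translate the shared specification into one engine's input text."""
    if engine not in TRANSLATORS:
        raise ValueError(f"no translator registered for {engine!r}")
    return TRANSLATORS[engine](bias)


outputs = {name: translate(BIAS, name) for name in TRANSLATORS}
for name, text in outputs.items():
    print(f"--- {name} ---")
    print(text)
```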
The investigation of UML as a common declarative bias language for non-experts
was motivated by the efforts involved in the Esprit IV project Aladin. This project
aims at bringing ILP capabilities to a wider, commercial audience by embedding a
range of ILP algorithms into the commercial Data Mining tool, Clementine.
The outline of this paper is as follows. A section describing UML and its potential
as first order declarative bias follows this introduction. We then give a short overview
of the syntax of the Common Declarative Bias Language. In "Translating CDBL" we
give an algorithm for translating CDBL to ILP input. Next we analyze the usefulness
of UML as a declarative bias language compared to other approaches in "Comparing
UML to traditional languages". This section is followed by a "Conclusion".
2 UML
From the large set of modelling tools provided by UML we will focus on the richest
and most commonly used one: Class Diagrams [16]. These diagrams model exactly
the concepts relevant for ILP, namely tables and the relations between them. In fact,
when we write UML in this paper we are referring specifically to Class Diagrams.
There are two specific concepts within the Class Diagrams that we will be focusing
on. The first is the concept of class. A class is a description of a set of objects that
share the same features, relationships, and semantics. In a Class Diagram, a class is
represented as a rectangle. Typically, a class represents some tangible entity in the
problem domain, and maps to a table in the database.
The second concept is that of association. An association is a structural
relationship that specifies that objects of one class are connected to objects of another.
An important aspect of an association is its multiplicity. This specifies how many
Figure 1 The complete process of using UML with existing engines.
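
The two Class Diagram concepts introduced in this section, class and association, can be captured in a small data structure. The sketch below is our own illustration, not the representation used by any engine in this paper. The type names, the multiplicity strings ("1", "1..n") and the molecule/atom example are assumptions. It follows the standard UML reading in which the multiplicity at one end of an association says how many objects of that class may be connected to a single object at the other end.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    """A class: a named set of attributes, typically one database table."""
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Association:
    """A structural relationship between two classes, with multiplicities."""
    name: str
    source: str
    target: str
    source_multiplicity: str  # e.g. "1"
    target_multiplicity: str  # e.g. "1..n"


@dataclass
class ClassDiagram:
    classes: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add_class(self, name, attributes):
        self.classes[name] = UmlClass(name, list(attributes))

    def associate(self, name, source, target, source_mult, target_mult):
        # Both ends must refer to classes that are already in the diagram.
        for end in (source, target):
            if end not in self.classes:
                raise ValueError(f"unknown class {end!r}")
        self.associations.append(
            Association(name, source, target, source_mult, target_mult))

    def neighbours(self, class_name):
        """Classes reachable from class_name over a single association."""
        found = []
        for assoc in self.associations:
            if assoc.source == class_name:
                found.append(assoc.target)
            elif assoc.target == class_name:
                found.append(assoc.source)
        return found


# Hypothetical example: each molecule has one or more atoms.
diagram = ClassDiagram()
diagram.add_class("molecule", ["id", "mutagenic"])
diagram.add_class("atom", ["id", "molecule_id", "element"])
diagram.associate("consists_of", "molecule", "atom", "1", "1..n")
print(diagram.neighbours("molecule"))
```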