
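The title and description both refer to "multi-relational data mining using UML for ILP", a technique that combines the Unified Modeling Language (UML) with Inductive Logic Programming (ILP) to address problems in multi-relational data mining. Multi-relational data mining is a branch of data mining that focuses on data sets containing several entities and the complex relationships between them. It is becoming increasingly important in the knowledge discovery process (KDD), especially where relational databases are involved.

### Multi-relational data mining

The goal of multi-relational data mining is to extract useful information or patterns from data sets that contain several interrelated entities. Unlike traditional single-table data mining, it has to handle the relationships between entities, which increases both the complexity of the data and the difficulty of mining it. In practice it can be applied in areas such as social network analysis, bioinformatics and enterprise relationship management.

### Using UML in ILP

Inductive Logic Programming (ILP) is a machine learning approach that is particularly suited to structured data such as relational databases. However, the differences between ILP engines, especially in their input specifications, have limited its wider use in multi-relational data mining. UML, as a general-purpose modeling language, is introduced here as a bridge between the different ILP engines. With UML, non-expert users can define a problem and apply different engines without extra effort.

### From UML to CDBL

The process described in the paper transforms a UML model into a language called CDBL, which is then translated into the input formats required by the various ILP engines. CDBL (Common Declarative Bias Language) serves as an intermediate language that simplifies translation between engines, so that one UML model can be used with several ILP systems.

### Challenges

Most current ILP systems, such as Progol, S-CART and Claudien, use logic-based formalisms to specify the language bias, but the differences between these formalisms make sharing a language bias specification between engines a non-trivial task. The main reasons are:

1. **Syntactic differences**: each formalism has its own syntax, and the user needs to be familiar with all of them.
2. **Semantic differences**: many formalisms contain constructs whose semantics subtly reflect the behavior of a particular engine, so specifications that look similar on the surface may still be incompatible.

### Conclusion

Multi-relational data mining using UML for ILP offers a potential way to overcome the differences between ILP engines. With UML and CDBL as intermediaries, data mining with ILP becomes accessible to a wider audience, in particular to users without a deep background in ILP.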


Multi-Relational Data Mining, using UML for ILP

Arno J. Knobbe, Arno Siebes, Hendrik Blockeel, Daniël van der Wallen

Kiminkii, Postbus 171, 3990 DD Houten, The Netherlands, a.knobbe@kiminkii.com

CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands, arno@cwi.nl

K.U.Leuven, Dept. of Computer Science, Celestijnenlaan 200A, B-3001 Heverlee, Belgium, hendrik.blockeel@cs.kuleuven.ac.be

Dantec B.V., Pontanusstraat 21, 1093 RX Amsterdam, The Netherlands, daniel@dantec.nl

Abstract. Although there is a growing need for multi-relational data mining solutions in KDD, the use of obvious candidates from the field of Inductive Logic Programming (ILP) has been limited. In our view this is mainly due to the variation in ILP engines, especially with respect to input specification, as well as the limited attention paid to relational database issues. In this paper we describe an approach which uses UML as the common specification language for a large range of ILP engines. Having such a common language will enable a wide range of users, including non-experts, to model problems and apply different engines without any extra effort. The process involves transformation of UML into a language called CDBL, which is then translated to a variety of input formats for different engines.

1 Introduction

A central problem in the specification of a multi-relational data mining problem is the definition of a model of the data. Such a model directly determines the type of patterns that will be considered, and thus the direction of the search. Such specifications are usually referred to as declarative or language bias in ILP [14]. Most current systems use logic-based formalisms to specify the language bias (e.g., Progol, S-CART, Claudien, ICL, Tilde, Warmr [13, 12, 6, 7, 3, 8]). Although most of these formalisms are quite similar and make use of the same concepts (e.g., types and modes), there are still differences between the formalisms that make the sharing of the language bias specification between engines a non-trivial task. The main reasons for this are:

• the different formalisms each have their own syntax; the user needs to be familiar with all of them

• many formalisms contain certain constructs, the semantics of which, sometimes in a subtle way, reflect behavioral characteristics of the inductive algorithm.

First published at PKDD 2000.

The use of different ILP systems would be simplified significantly if a common declarative bias language were available. Such a language should have the following characteristics:

• The common language should be usable for a large range of ILP systems, which means that it should be easy to translate a bias specification from the common language to the native language of the ILP system.

• It should be easy to learn. This means it should make use of concepts most users are familiar with. In the ideal case, the whole language itself is a language that the intended users are familiar with already.

• The bias should not just serve as a necessary prerequisite for running the induction algorithm, but should also be usable as a shared piece of information or documentation about a problem within a team of analysts with varying levels of technical expertise.

• It should be easy to judge the complexity of a problem from a single glance at the declarative bias. A graphical representation would be desirable.

In this paper we propose the use of the Unified Modeling Language (UML) [2, 15, 16, 17] as the language of choice for specifying declarative bias of this nature. Over the past few years UML has proven itself as a versatile tool for modeling a large range of applications in various domains. For ILP, the Class Diagrams, with their usefulness in database modeling, are of particular interest. Our discussion will be based on these diagrams.

Why do we wish to use UML to express bias? First of all, as UML is an intuitive visual language, essentially consisting of annotated graphs, we can easily write down the declarative bias for a particular domain or judge the complexity of a given data model [9]. Another reason for using UML is its widespread use in database (as well as object-oriented) modeling. UML has effectively become a standard with thorough support in many commercial tools. Some tools allow the reverse engineering of a data model from a given relational database, directly using the table specifications and foreign key relations. If we can come up with a process for using UML in ILP algorithms, we will have practically automated the analysis process of a relational database. Finally, UML may serve as a common means of stating the declarative bias used in the different ILP engines.
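The reverse-engineering step mentioned above can be illustrated with a small sketch. It is not the tooling used by the authors: it assumes a SQLite database and a hypothetical `customer`/`purchase` schema, and reads table specifications and foreign key relations from SQLite's catalog through the standard `sqlite3` module.

```python
import sqlite3


def reverse_engineer(conn):
    """Return (tables, foreign_keys) read from SQLite's catalog.

    tables:       {table_name: [(column_name, declared_type), ...]}
    foreign_keys: [(child_table, child_column, parent_table, parent_column), ...]
    """
    tables = {}
    foreign_keys = []
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for name in names:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        tables[name] = [(col[1], col[2]) for col in columns]
        # PRAGMA foreign_key_list rows:
        # (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{name}")'):
            foreign_keys.append((name, fk[3], fk[2], fk[4]))
    return tables, foreign_keys


# Hypothetical example schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        amount REAL
    );
""")
tables, foreign_keys = reverse_engineer(conn)
```

Each entry in `tables` corresponds to a candidate class and each entry in `foreign_keys` to a candidate association of a Class Diagram.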

Although it is clear that UML is a good candidate for specifying first-order declarative bias, it may not be directly clear how the different engines will actually make use of the UML declarations. Its use in our previously published Multi-Relational Data Mining framework [10, 11] is straightforward, as this framework and the related engine Fiji2 have been designed around the use of UML from the outset. To translate UML bias declarations to logic-based bias languages, we use an intermediate textual representation, called Common Declarative Bias Language (CDBL). CDBL is essentially a set of Prolog predicates, which can be easily processed by the different translation procedures. Translation procedures for the popular engines Tilde, Warmr and Progol are currently available. The whole process of connecting an ILP engine to a relational database now becomes a series of translation steps, as illustrated by the diagram in Figure 1. We have implemented and embedded each of these steps into a single GUI.
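The idea of an intermediate textual representation made of Prolog predicates can be sketched as follows. The predicate names `table/2` and `association/4`, the classes `Table` and `Association`, and the example model are all hypothetical and chosen for illustration; they are not the actual CDBL syntax, which this excerpt does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    attributes: tuple  # attribute names, written as lower-case Prolog atoms


@dataclass(frozen=True)
class Association:
    name: str
    source: str
    target: str
    multiplicity: str  # e.g. "0..n"


def to_prolog_facts(tables, associations):
    """Render a data model as Prolog-style facts.

    The predicates are hypothetical stand-ins for an intermediate
    bias language; they are NOT the real CDBL predicates.
    """
    lines = []
    for t in tables:
        attrs = ", ".join(t.attributes)
        lines.append(f"table({t.name}, [{attrs}]).")
    for a in associations:
        lines.append(
            f"association({a.name}, {a.source}, {a.target}, '{a.multiplicity}').")
    return "\n".join(lines)


# Hypothetical example model.
model_tables = [
    Table("customer", ("id", "name")),
    Table("purchase", ("id", "customer_id", "amount")),
]
model_associations = [Association("places", "customer", "purchase", "0..n")]

facts = to_prolog_facts(model_tables, model_associations)
print(facts)
```

A per-engine translation procedure would then read such facts and emit the engine's native bias declarations; that second step is engine-specific and is left out of this sketch.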

The investigation of UML as a common declarative bias language for non-experts was motivated by the efforts involved in the Esprit IV project Aladin. This project aims at bringing ILP capabilities to a wider, commercial audience by embedding a range of ILP algorithms into the commercial Data Mining tool, Clementine.

The outline of this paper is as follows. A section describing UML and its potential as first-order declarative bias follows this introduction. We then give a short overview of the syntax of the Common Declarative Bias Language. In Translating CDBL we give an algorithm for translating CDBL to ILP input. Next we analyze the usefulness of UML as a declarative bias language compared to other approaches in Comparing UML to traditional languages. This section is followed by a Conclusion.

2 UML

From the large set of modeling tools provided by UML we will focus on the richest and most commonly used one: Class Diagrams [16]. These diagrams model exactly the concepts relevant for ILP, namely tables and the relations between them. In fact, when we write UML in this paper we are referring specifically to Class Diagrams.

There are two specific concepts within the Class Diagrams that we will be focusing on. The first is the concept of class. A class is a description of a set of objects that share the same features, relationships, and semantics. In a Class Diagram, a class is represented as a rectangle. Typically, a class represents some tangible entity in the problem domain, and maps to a table in the database.

The second concept is that of association. An association is a structural relationship that specifies that objects of one class are connected to objects of another. An important aspect of an association is its multiplicity. This specifies how many

Figure 1 The complete process of using UML with existing engines.
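As a small illustration of multiplicity in the standard UML sense (how many objects of one class may be linked to a single object of the other), the sketch below derives the multiplicity observed in the data for a foreign key relation. The function name and the example keys are assumptions made for illustration; a declared multiplicity in a Class Diagram is a modeling decision and need not coincide with what one particular data set happens to contain.

```python
from collections import Counter


def observed_multiplicity(parent_keys, child_foreign_keys):
    """Return (min, max) number of child rows linked to a single parent row.

    parent_keys:        primary key values of the parent table
    child_foreign_keys: foreign key values found in the child table
    """
    if not parent_keys:
        raise ValueError("parent table is empty; multiplicity is undefined")
    counts = Counter(child_foreign_keys)
    per_parent = [counts.get(key, 0) for key in parent_keys]
    return min(per_parent), max(per_parent)


# Hypothetical data: customer 1 has two purchases, customer 2 has none,
# customer 3 has one.
customer_ids = [1, 2, 3]
purchase_customer_ids = [1, 1, 3]

low, high = observed_multiplicity(customer_ids, purchase_customer_ids)
print(f"{low}..{high}")
```

For this data the association from customer to purchase is observed as `0..2`; a modeler would typically generalize this to `0..n` in the diagram.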
