Research on Data Mining Techniques and Association Rule Mining Algorithms

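Abstract (translated from the Chinese)

Data mining is a technology devoted to analyzing and understanding data and to revealing the knowledge hidden within it, and it has become one of the important goals of future information technology applications. After more than a decade of effort, data mining has produced many new concepts and methods. In the last few years in particular, some basic concepts and methods have become clearer, and research is moving in a deeper direction. Like other new technologies, data mining must pass through the stages of concept proposal, concept acceptance, broad research and exploration, gradual application, and large-scale application. Judging from the current situation, most scholars believe that data mining research is still in the stage of broad research and exploration, and that innovation is urgently needed in basic theory, application models, system architecture, mining algorithms, and mining languages. Association rule mining is a productive and fairly active research branch of data mining, and what remains for researchers are the deeper problems. For large databases, association rule mining needs to improve in mining efficiency, usability, and accuracy. It is therefore necessary to explore new mining theories and models, to use user constraints and similar means to focus the mining target, to improve some traditional algorithms, and to study new and more effective algorithms. In view of the current state and development trends of data mining technology and association rule mining research, and with the support of various funds, we chose this topic and carried out the related work.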

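The research in this paper mainly covers the architecture of data mining application systems and the theory and algorithms of association rule mining. On architecture, we designed a prototype architecture for a data mining application system and systematically analyzed the basic process of knowledge discovery and the functions of each system component. Because different source data types, application goals, and mining strategies place different requirements on the functional components of a data mining system, this work starts from the basic process of knowledge discovery and examines the main functional components a system should have and how they relate to one another. On the theory of association rule mining, we present for the first time the lattice space of sets of item sequences and discuss the basic operators on this space. Based on this lattice space and its operations, we build a model and algorithms for association rule mining. On algorithms, we designed ISS-DM, an association rule mining algorithm based on the theory of operations on sets of item sequences; TISS-DM, an association rule mining algorithm under temporal constraints; and PISS-DM, an association rule mining algorithm under data partitioning. ISS-DM is built on the rigorous lattice theory of sets of item sequences and its operations; it is an efficient algorithm that scans the database once and uses no candidate sets. We ran comparative experiments between ISS-DM and Apriori, a currently widely cited algorithm. The results show that the execution time of ISS-DM is better than that of Apriori overall, and that as the data volume grows, the execution time of ISS-DM increases less than that of Apriori. To improve adaptability to mining large data sets, temporal constraints are applied in the preprocessing stage of mining, turning ISS-DM into TISS-DM. This part of the work also formalizes temporal intervals, the data mining space under temporal constraints, and operations on temporal intervals, which form the theoretical basis of TISS-DM. PISS-DM is another improvement of ISS-DM. It was proposed and designed for the situation where mining a large data set places high demands on system resources such as memory and CPU, and it uses data partitioning to reduce resource usage. This paper solves problems such as the conversion between local frequent sets of item sequences and global frequent sets of item sequences under data partitioning; PISS-DM is an algorithm that scans the database twice.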
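For context on the Apriori baseline used in the comparison experiments, the following is a minimal Python sketch of the classic level-wise Apriori procedure. It is not ISS-DM, whose details are not given in this abstract. The function name `apriori`, the absolute `min_support` count, and the sample transactions in the usage note are assumptions made for illustration. The sketch makes visible the two costs that ISS-DM is said to avoid: candidate set generation and one database scan per itemset size.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset.
        prev = set(frequent)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # One further full database scan per level to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Calling `apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}], 3)` on this hypothetical data scans the transaction list once per level, which is the repeated I/O that a single-scan method aims to remove.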

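In summary, on the basis of analyzing and classifying existing data mining research results and prototype systems, this paper studies the architecture of data mining application systems and the theoretical models and algorithms of association rule mining. With respect to the lattice of sets of item sequences and its operations, the mining space under temporal constraints, and related topics, it has relatively good
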

Abstract

Data mining is a technique that aims to analyze and understand large source data and to reveal the knowledge hidden in it. It has been viewed as an important evolution in information processing. It has attracted growing attention from researchers and businesses because of the wide availability of huge amounts of data and the pressing need to turn such data into valuable information. Over the past decade or more, many concepts and techniques for data mining have been proposed, and some of them have been examined in greater depth in the last few years. Data mining integrates techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and visualization. Essentially, data mining is a high-level analysis technology with a strong orientation toward business benefit. Unlike OLTP applications, data mining should provide in-depth data analysis and support for business decisions. Like other new techniques, however, data mining must develop gradually, from concept creation and acceptance of its importance, through wide discussion and a few attempts at use, to large-scale application. Most experts consider it to be in the phase of wide discussion today. It still needs theoretical study and exploration of algorithms. Although some results have been achieved, more theoretical problems remain under ongoing research. In addition, data mining arises from real applications and must be combined with specific business logic to solve specific problems, because different business fields have different mining needs and targets. Successful data mining systems are excellent combinations of data mining techniques and business logic, rather than tools designed to make data mining application development convenient.

Association rule mining is an important branch of data mining. It has produced many valuable results, but a good many more challenging problems remain open. For large databases, research on improving mining performance and precision is necessary, so much of today's work on association rule mining concerns new mining theories, new algorithms, and improvements to existing methods.

In this paper, the main research involves the application architecture of data mining, the mining theory for association rules, and the design of new efficient algorithms. The paper analyzes the basic processing phases of data mining, or KDD, and gives the components of a data mining application system and their functions. In the theoretical research, we first define the Set of Item Sequences and give some operators on this algebraic lattice. Applying these theoretical results, we design an algorithm for mining association rules called ISS-DM, which is efficient: it makes one pass over the database and does not generate or store large candidate sets. For mining large-scale databases, it is a smart strategy to use constraints to improve data quality and reduce data volume. This paper introduces the problem of data mining based on temporal constraints. We create two new operators on the temporal interval space and design an algorithm called TISS-DM that takes advantage of these operators. TISS-DM may be seen as an improvement on ISS-DM that can process larger databases. In fact, recent research has paid more attention to reducing the number of passes over the database (I/O cost), memory usage, and CPU overhead. This paper also gives an algorithm called PISS-DM, which employs a data partitioning technique and makes only two passes over the database. Experimental results showed that these algorithms have higher mining efficiency in execution time, memory usage, and CPU utilization than most current ones, such as Apriori.
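The two-pass idea behind data partitioning can be sketched as follows. This is a minimal Python illustration of the general partition approach, which relies on the fact that an itemset frequent in the whole database must be frequent in at least one partition. It is not the author's PISS-DM, which operates on sets of item sequences and is not specified in this abstract. The names `local_frequent` and `partition_mine`, the relative threshold `min_support_ratio`, and the brute-force local miner are all assumptions of this sketch.

```python
from itertools import combinations
from math import ceil


def local_frequent(partition, min_count):
    """Brute-force frequent itemsets of one partition held in memory.

    Enumerates every subset of every transaction, so it only suits short
    transactions; a real system would plug in a proper miner here.
    """
    counts = {}
    for t in partition:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, c in counts.items() if c >= min_count}


def partition_mine(transactions, min_support_ratio, n_partitions):
    """Return {itemset: global_count} using two passes over the data."""
    transactions = [frozenset(t) for t in transactions]
    if not transactions:
        return {}
    size = ceil(len(transactions) / n_partitions)
    parts = [transactions[i:i + size]
             for i in range(0, len(transactions), size)]

    # Pass 1: mine each partition separately; the union of the locally
    # frequent itemsets is a superset of the globally frequent ones.
    candidates = set()
    for part in parts:
        local_min = ceil(min_support_ratio * len(part))
        candidates |= local_frequent(part, local_min)

    # Pass 2: count every candidate over the whole database and keep
    # those that reach the global threshold.
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    global_min = ceil(min_support_ratio * len(transactions))
    return {s: c for s, c in counts.items() if c >= global_min}
```

Only one partition needs to be in memory during the first pass, which is where the reduction in resource usage comes from; the second pass turns local results into global ones. The thresholds here are computed with floating-point arithmetic, so a ratio such as 0.5 that is exactly representable is the safest choice when trying the sketch.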

In conclusion, this paper analyzes application
