Research on Data Mining Techniques and Association Rule Mining Algorithms (数据挖掘技术与关联规则挖掘算法研究.doc)
Uploaded 2023-09-08 14:03:23 · 617 KB DOC · 102 pages
Abstract (translated from the Chinese)
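Data mining is a technology devoted to analyzing and understanding data and to revealing the knowledge hidden within it, and it has become one of the important goals of future information technology applications. After more than a decade of effort, data mining has produced many new concepts and methods. In the last few years in particular, some basic concepts and methods have become clearer, and research is moving in a deeper direction. Like other new technologies, data mining must pass through the stages of concept proposal, concept acceptance, broad research and exploration, gradual application, and large-scale application. Judging from the current situation, most scholars believe that data mining research is still in the stage of broad research and exploration, and that it urgently needs innovation in basic theory, application patterns, system architecture, mining algorithms, and mining languages. Association rule mining is a productive and fairly active research branch of data mining, and what remains for researchers are the deeper problems. For large databases, association rule mining needs to improve in mining efficiency, usability, and precision. It is therefore necessary to explore new mining theories and models, to use user constraints and similar means to focus the mining target, to improve some traditional algorithms, and to study new and more effective algorithms. In view of the current state and trends of data mining technology and association rule mining research, and with the support of various funds, we chose this topic and carried out the related work.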
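The research in this thesis mainly covers the architecture of data mining application systems and the theory and algorithms of association rule mining. On the architecture side, we designed a prototype architecture for a data mining application system and systematically analyzed the basic process of knowledge discovery and the function of each system component. Because different source data types, application goals, and mining strategies place different requirements on the functional components of a data mining system, this work starts from the basic process of knowledge discovery and examines the main functional components a system should have and how they relate to one another. On the theory side, we present for the first time the lattice space of sets of item sequences and discuss the basic operators on this space. Based on this lattice space and its operations, we built an association rule mining model and algorithms. On the algorithm side, we designed ISS-DM, an association rule mining algorithm based on the theory of operations on sets of item sequences; TISS-DM, an association rule mining algorithm under temporal constraints; and PISS-DM, an association rule mining algorithm under data partitioning. ISS-DM is built on the rigorous lattice theory of sets of item sequences and its operations, and it is an efficient algorithm that scans the database once and does not use candidate sets. We chose Apriori, a frequently cited algorithm, for a comparative experiment against ISS-DM. The results show that the execution time of ISS-DM is better than that of Apriori overall, and that as the data volume grows, the execution time of ISS-DM also increases more slowly than that of Apriori. To improve adaptability to large data sets, temporal constraints are applied in the preprocessing stage of mining, turning ISS-DM into TISS-DM. This part of the work also formalizes temporal intervals, the data mining space under temporal constraints, and operations on temporal intervals, which form the theoretical basis of TISS-DM. The other improvement of ISS-DM is PISS-DM. It was proposed and designed for the case where mining a large data set places high demands on system resources such as memory and CPU, and it uses data partitioning to reduce resource usage. This thesis solves problems such as the conversion between local and global frequent sets of item sequences under data partitioning, and PISS-DM is an algorithm that scans the database twice.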
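The relationship between local and global frequent sets under data partitioning can be illustrated with a minimal sketch. It is not PISS-DM: the abstract gives no details of the algorithm or of the item-sequence-set lattice, so the code below assumes plain unordered itemsets and uses the well-known generic partitioning idea (an itemset that is globally frequent must be locally frequent in at least one partition). The function names `local_frequent` and `partitioned_frequent`, and the parameters `min_ratio`, `n_parts`, and `max_size`, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_frequent(partition, min_ratio, max_size):
    """Brute-force local mining: count every itemset of up to max_size items."""
    counts = Counter()
    for txn in partition:
        items = sorted(set(txn))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_ratio * len(partition)
    return {itemset for itemset, n in counts.items() if n >= threshold}


def partitioned_frequent(transactions, min_ratio, n_parts=2, max_size=3):
    """Two passes over the data: local mining per partition, then a global recount.

    Illustrative assumption only; this is not the thesis's PISS-DM.
    """
    if not transactions:
        return {}
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Pass 1: the union of locally frequent itemsets forms the global candidates.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_ratio, max_size)

    # Pass 2: count each candidate against the full data set.
    counts = Counter()
    for txn in transactions:
        items = set(txn)
        for cand in candidates:
            if items.issuperset(cand):
                counts[cand] += 1
    threshold = min_ratio * len(transactions)
    return {c: n for c, n in counts.items() if n >= threshold}
```

Because each partition is mined on its own, only one partition needs to be held in memory during the first pass, which is the resource saving the paragraph describes. The second pass is required because a locally frequent itemset is only a candidate and may fall below the global threshold.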
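In summary, building on an analysis and classification of existing data mining research results and prototype systems, this thesis studies the architecture of data mining application systems and the theoretical models and algorithms of association rule mining. The work on the lattice of sets of item sequences and its operations, and on the mining space under temporal constraints, has good theoretical value, and the algorithms designed here have potential for application in terms of mining efficiency and usability on large databases.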
Keywords: data mining, knowledge discovery, association rules, set of item sequences, temporal constraint, data partitioning.
Abstract
Data mining is a technique that aims to analyze and understand large source data and to reveal the knowledge hidden in it. It has been viewed as an important evolution in information processing. It has attracted growing attention from researchers and businesses because of the wide availability of huge amounts of data and the imminent need to turn such data into valuable information. Over the past decade or more, the concepts and techniques of data mining have been put forward, and in the last few years some of them have been discussed in greater depth. Data mining integrates techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and visualization. Essentially, data mining is a high-level analysis technology with a strong orientation toward business profit. Unlike OLTP applications, data mining should provide in-depth data analysis and support for business decisions. Like other new techniques, however, data mining must develop gradually from concept creation through acceptance of its importance, wide discussion, and early trial use to large-scale application. Most experts consider it to be in the phase of wide discussion today. It still needs theoretical study and algorithm exploration. Although some results have been achieved, more theoretical problems remain under ongoing research. In addition, data mining arises from real applications and must be combined with specific business logic to solve specific problems, because different business fields have different mining needs and targets. Successful data mining systems are strong combinations of data mining techniques and business logic, rather than tools designed merely to make data mining application development convenient.
Association rule mining is an important branch of data mining. It has produced many valuable results, but a good number of more challenging problems remain to be addressed. For large databases, research on improving mining performance and precision is necessary, so much of today's work on association rule mining focuses on new mining theories, new algorithms, and improvements to existing methods.
The main research in this paper covers the application architecture of data mining, mining theories for association rules, and the design of new efficient algorithms. This paper analyzes the basic processing phases of data mining, or KDD, and gives the components of a data mining application system and their functions. In the theoretical research, we define the Set of Item Sequences for the first time and give some operators on this algebraic lattice. Applying these theoretical results, we design an algorithm for mining association rules called ISS-DM, which is efficient because it makes one pass over the database and neither generates nor stores large candidate sets. For mining large-scale databases, it is a sound strategy to use constraints to improve data quality and reduce data volume. This paper introduces the problem of data mining based on temporal constraints. We create two new operators on the temporal interval space and, making use of these operators, design an algorithm called TISS-DM. TISS-DM may be seen as an improvement of ISS-DM that can process larger databases. Recent research has paid more attention to reducing the number of passes over the database (I/O cost), memory usage, and CPU overhead. This paper also gives an algorithm called PISS-DM, which employs a data partitioning technique and makes only two passes over the database. Experimental results showed that these algorithms have higher mining efficiency in execution time, memory usage, and CPU utilization than most current algorithms such as Apriori.
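For context on the baseline named above, here is a minimal sketch of the classic Apriori procedure. It is a textbook-style illustration and not the implementation used in the paper's experiments; the function name `apriori` and the parameter `min_count` (an absolute support threshold) are assumptions. It shows the two costs the paragraph refers to: candidate itemsets are generated and held at every level, and the database is scanned once per level.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    txns = [frozenset(t) for t in transactions]

    # Level 1: count single items in one scan.
    counts = {}
    for t in txns:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # One further database scan per level to count surviving candidates.
        counts = {c: sum(1 for t in txns if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent
```

The number of scans grows with the size of the longest frequent itemset, which is why a one-pass or two-pass design is attractive for large databases.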
In conclusion, this paper analyzes the application architecture of data mining systems, creates new theoretical mining models, and designs a series of new algorithms based on these theories.
Key words: Data mining, KDD (Knowledge Discovery in Databases), Association rules, Set of item sequences, Temporal constraint, Data partitioning.