Introduction to Data Mining and Knowledge Discovery
Third Edition
by Two Crows Corporation
Introduction to Data Mining and Knowledge Discovery, Third Edition
ISBN: 1-892095-02-5
© 1999 by Two Crows Corporation. No portion of this document may be reproduced without express permission.
For permission, please contact: Two Crows Corporation, 10500 Falls Road, Potomac, MD 20854 (U.S.A.).
Phone: 1-301-983-9550; fax: 1-301-983-3554. Web address: www.twocrows.com
You’re ready to move ahead in data mining...
but where do you begin?
Data Mining ’99: Technology Report is an essential guide. Before your staff spends hours gathering
information from vendors — and before you hire a consultant — save money by using this report to
focus on the products and approaches best suited for your organization’s needs.
Data Mining ’99: Technology Report contains a clear, non-technical overview of data mining
techniques and their role in knowledge discovery, PLUS detailed vendor specifications and feature
descriptions for over two dozen data mining products (check our website for the complete list). It also
comes with CD-ROMs that contain selected product demos and vendor-provided case histories. The
Technology Report is an extraordinary value for only $695 (additional copies $495 each).
Data Mining ’99 is the newest report from Two Crows Corporation. The previous edition (Data
Mining: Products, Applications & Technologies) sold out its printing, with purchasers around the
world in banking, insurance, telecom, retailing, government, consulting, academia and information
systems.
We ship Data Mining ’99: Technology Report by air express (FREE within the United States). We’re
confident you’ll find this the most useful, comprehensive overview of data mining available
anywhere. Contact us now to order your copy of Data Mining ’99: Technology Report.
• We accept MasterCard, VISA and American Express.
• Orders from Maryland, please add 5% sales tax.
• Academic purchasers: ask about special pricing.
Two Crows Corporation
10500 Falls Road
Potomac, MD 20854
(301) 983-9550
www.twocrows.com
TABLE OF CONTENTS
Introduction
  Data mining: In brief
  Data mining: What it can’t do
  Data mining and data warehousing
  Data mining and OLAP
  Data mining, machine learning and statistics
  Data mining and hardware/software trends
  Data mining applications
  Successful data mining
Data Description for Data Mining
  Summaries and visualization
  Clustering
  Link analysis
Predictive Data Mining
  A hierarchy of choices
  Some terminology
  Classification
  Regression
  Time series
Data Mining Models and Algorithms
  Neural networks
  Decision trees
  Multivariate Adaptive Regression Splines (MARS)
  Rule induction
  K-nearest neighbor and memory-based reasoning (MBR)
  Logistic regression
  Discriminant analysis
  Generalized Additive Models (GAM)
  Boosting
  Genetic algorithms
The Data Mining Process
  Process Models
  The Two Crows Process Model
Selecting Data Mining Products
  Categories
  Basic capabilities
Summary
RELATED READINGS
Data Mining ’99: Technology Report, Two Crows Corporation, 1999
M. Berry and G. Linoff, Data Mining Techniques, John Wiley, 1997
William S. Cleveland, The Elements of Graphing Data, revised, Hobart Press, 1994
Howard Wainer, Visual Revelations, Copernicus, 1997
R. Kennedy, Lee, Reed, and Van Roy, Solving Pattern Recognition Problems,
Prentice-Hall, 1998
U. Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, Advances in Knowledge
Discovery and Data Mining, MIT Press, 1996
Dorian Pyle, Data Preparation for Data Mining, Morgan Kaufmann, 1999
C. Westphal and T. Blaxton, Data Mining Solutions, John Wiley, 1998
Vasant Dhar and Roger Stein, Seven Methods for Transforming Corporate Data into
Business Intelligence, Prentice Hall, 1997
Breiman, Friedman, Olshen, and Stone, Classification and Regression Trees,
Wadsworth, 1984
J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1992
INTRODUCTION
Data mining: In brief
Databases today can range in size into the terabytes — more than 1,000,000,000,000 bytes of data.
Within these masses of data lies hidden information of strategic importance. But when there are so
many trees, how do you draw meaningful conclusions about the forest?
The newest answer is data mining, which is being used both to increase revenues and to reduce costs.
The potential returns are enormous. Innovative organizations worldwide are already using data
mining to locate and appeal to higher-value customers, to reconfigure their product offerings to
increase sales, and to minimize losses due to error or fraud.
Data mining is a process that uses a variety of data analysis tools to discover patterns and
relationships in data that may be used to make valid predictions.
The first and simplest analytical step in data mining is to describe the data — summarize its statistical
attributes (such as means and standard deviations), visually review it using charts and graphs, and
look for potentially meaningful links among variables (such as values that often occur together). As
emphasized in the section on THE DATA MINING PROCESS, collecting, exploring and selecting the right
data are critically important.
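As a rough illustration of this descriptive step, the sketch below summarizes one numeric attribute and counts which items tend to occur together. The customer records, field names and helper functions are hypothetical, invented only for this example; a real project would use its own data and tools.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, stdev

# Hypothetical customer records, invented for illustration only.
customers = [
    {"income": 52000, "purchases": ["magazine", "book"]},
    {"income": 61000, "purchases": ["magazine", "book", "cd"]},
    {"income": 48000, "purchases": ["cd"]},
    {"income": 75000, "purchases": ["magazine", "book"]},
]

def summarize(values):
    """Return the mean and sample standard deviation of a numeric attribute."""
    return mean(values), stdev(values)

def co_occurrences(records):
    """Count how often each pair of items appears together in one record."""
    pairs = Counter()
    for record in records:
        for pair in combinations(sorted(set(record["purchases"])), 2):
            pairs[pair] += 1
    return pairs

income_mean, income_sd = summarize([c["income"] for c in customers])
pair_counts = co_occurrences(customers)
print(f"mean income: {income_mean:.0f}, standard deviation: {income_sd:.0f}")
print("most frequent pair:", pair_counts.most_common(1))
```

Counting pairs this way is only a first look at values that often occur together; it says nothing yet about whether the link is useful for prediction.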
But data description alone cannot provide an action plan. You must build a predictive model based
on patterns determined from known results, then test that model on results outside the original
sample. A good model should never be confused with reality (you know a road map isn’t a perfect
representation of the actual road), but it can be a useful guide to understanding your business.
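To make "test that model on results outside the original sample" concrete, here is a minimal sketch. It assumes synthetic data and a deliberately simple one-cutoff model; the income figures, response probabilities and function names are all assumptions made for illustration, not a method this paper prescribes.

```python
import random

def make_customers(n, rng):
    """Generate hypothetical (income, responded) pairs; higher incomes respond more often."""
    data = []
    for _ in range(n):
        income = rng.uniform(20000, 100000)
        responded = rng.random() < (0.8 if income > 60000 else 0.2)
        data.append((income, responded))
    return data

def accuracy(threshold, data):
    """Share of records where the rule 'income > threshold' matches the actual response."""
    hits = sum((income > threshold) == responded for income, responded in data)
    return hits / len(data)

def fit_threshold(train):
    """Pick the income cutoff that best separates responders in the training sample."""
    candidates = sorted(income for income, _ in train)
    return max(candidates, key=lambda t: accuracy(t, train))

rng = random.Random(0)                    # fixed seed so the sketch is repeatable
data = make_customers(400, rng)
train, holdout = data[:300], data[300:]   # hold back records the model never sees

threshold = fit_threshold(train)
train_acc = accuracy(threshold, train)
holdout_acc = accuracy(threshold, holdout)
print(f"cutoff: {threshold:.0f}")
print(f"accuracy on training sample: {train_acc:.2f}")
print(f"accuracy on held-out sample: {holdout_acc:.2f}")
```

The figure to trust is the held-out accuracy: the cutoff was tuned to the training sample, so its score there tends to flatter the model.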
The final step is to empirically verify the model. For example, from a database of customers who
have already responded to a particular offer, you’ve built a model predicting which prospects are
likeliest to respond to the same offer. Can you rely on this prediction? Send a mailing to a portion of
the new list and see what results you get.
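One way to read the results of such a test mailing is to compare the observed response rate, with a margin for sampling error, against the rate the model led you to expect. The sketch below uses the standard normal approximation for a proportion; the mailing size, reply count and predicted rate are hypothetical figures chosen for the example.

```python
import math

def response_rate_interval(responders, mailed, z=1.96):
    """Observed response rate with an approximate 95% interval (normal approximation)."""
    rate = responders / mailed
    margin = z * math.sqrt(rate * (1 - rate) / mailed)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)

# Hypothetical figures: 1,000 pieces mailed to model-selected prospects, 30 replies,
# compared with a 2% response rate assumed to be the model's prediction.
predicted_rate = 0.02
rate, low, high = response_rate_interval(responders=30, mailed=1000)
consistent = low <= predicted_rate <= high
print(f"observed rate {rate:.3f}, interval {low:.3f} to {high:.3f}")
print("prediction consistent with test mailing:", consistent)
```

The normal approximation is adequate for a mailing of this size; with very small mailings or very low response rates, an exact binomial interval would be the safer choice.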
Data mining: What it can’t do
Data mining is a tool, not a magic wand. It won’t sit in your database watching what happens and
send you e-mail to get your attention when it sees an interesting pattern. It doesn’t eliminate the need
to know your business, to understand your data, or to understand analytical methods. Data mining
assists business analysts with finding patterns and relationships in the data — it does not tell you the
value of the patterns to the organization. Furthermore, the patterns uncovered by data mining must be
verified in the real world.
Remember that the predictive relationships found via data mining are not necessarily causes of an
action or behavior. For example, data mining might determine that males with incomes between
$50,000 and $65,000 who subscribe to certain magazines are likely purchasers of a product you want
to sell. While you can take advantage of this pattern, say by aiming your marketing at people who fit
the pattern, you should not assume that any of these factors cause them to buy your product.