Bing Liu
Web Data Mining
Exploring Hyperlinks,
Contents, and Usage Data
With 177 Figures
Bing Liu
Department of Computer Science
University of Illinois at Chicago
851 S. Morgan Street
Chicago, IL 60607-7053
USA
liub@cs.uic.edu
Library of Congress Control Number: 2006937132
ACM Computing Classification (1998): H.2, H.3, I.2, I.5, E.5
ISBN-10 3-540-37881-2 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-37881-5 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication
of this publication or parts thereof is permitted only under the provisions of the German Copyright
Law of September 9, 1965, in its current version, and permission for use must always be obtained from
Springer. Violations are liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2007
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Cover Design: KünkelLopka, Heidelberg
Typesetting: by the Author
Production: LE-TeX Jelonek, Schmidt & Vöckler GbR, Leipzig
Printed on acid-free paper 45/3100/YL 5 4 3 2 1 0
To my parents, my wife Yue and children Shelley and Kate
Preface
The rapid growth of the Web in the last decade has made it the largest
publicly accessible data source in the world. Web mining aims to discover
useful information or knowledge from Web hyperlinks, page contents, and
usage logs. Based on the primary kinds of data used in the mining process,
Web mining tasks can be categorized into three main types: Web structure
mining, Web content mining and Web usage mining. Web structure mining
discovers knowledge from hyperlinks, which represent the structure of
the Web. Web content mining extracts useful information/knowledge from
Web page contents. Web usage mining mines user access patterns from
usage logs, which record clicks made by every user.
The goal of this book is to present these tasks and their core mining
algorithms. The book is intended to be a text with comprehensive coverage,
yet for each topic sufficient detail is given so that readers can
gain a reasonably complete knowledge of its algorithms or techniques
without referring to any external materials. Four of the chapters, on
structured data extraction, information integration, opinion mining, and
Web usage mining, make this book unique. These topics are not covered by
existing books, yet they are essential to Web data mining. Traditional Web
mining topics such as search, crawling and resource discovery, and link
analysis are also covered in detail in this book.
Although the book is entitled Web Data Mining, it also includes the
main topics of data mining and information retrieval since Web mining
uses their algorithms and techniques extensively. The data mining part
mainly consists of chapters on association rules and sequential patterns,
supervised learning (or classification), and unsupervised learning (or
clustering), which are the three most important data mining tasks. The
advanced topic of partially (semi-) supervised learning is included as well.
For information retrieval, the core topics that are crucial to Web mining
are described. This book is thus naturally divided into two parts. The
first part, which consists of Chaps. 2–5, covers data mining foundations.
The second part, which contains Chaps. 6–12, covers Web-specific mining.
Two main principles have guided the writing of this book. First, the
basic content of the book should be accessible to undergraduate students,
yet there should be sufficient in-depth material for graduate students who
plan to pursue Ph.D. degrees in Web data mining or related areas. Few
assumptions are made in the book regarding the prerequisite knowledge of
readers. A reader with a basic understanding of algorithms and probability
concepts should have no problem with this book. Second, the book should
examine Web mining technology from a practical point of view. This is
important because most Web mining tasks have immediate real-world
applications. In the past few years, I was fortunate to have worked directly
or indirectly with many researchers and engineers in several search engine
and e-commerce companies, and also traditional companies that are interested
in exploiting the information on the Web in their businesses. During the
process, I gained practical experience and first-hand knowledge of
real-world problems. I try to pass those non-confidential pieces of
information and knowledge along in the book. The book, thus, should have a
good balance of theory and practice. I hope that it will not only be a
learning text for students, but also a valuable source of
information/knowledge and even ideas for Web mining researchers and
practitioners.
Acknowledgements
Many researchers have assisted me technically in writing this book.
Without their help, this book might never have become reality. My deepest
thanks go to Filippo Menczer and Bamshad Mobasher, who were kind enough
to help write two essential chapters of the book. They are both
experts in their respective fields. Filippo wrote the chapter on Web crawling
and Bamshad wrote the chapter on Web usage mining. I am also very
grateful to Wee Sun Lee, who helped a great deal in the writing of Chap. 5
on partially supervised learning.
Jian Pei helped with the writing of the PrefixSpan algorithm in Chap. 2,
and checked the MS-PS algorithm. Eduard Dragut assisted with the writing
of the last section of Chap. 10 and also read the chapter many times.
Yuanlin Zhang gave many great suggestions on Chap. 9. I am indebted to
all of them.
Many other researchers also assisted in various ways. Yang Dai and
Rudy Setiono helped with Support Vector Machines (SVM). Chris Ding
helped with link analysis. Clement Yu and ChengXiang Zhai read Chap. 6,
and Amy Langville read Chap. 7. Kevin C.-C. Chang, Ji-Rong Wen and
Clement Yu helped with many aspects of Chap. 10. Justin Zobel helped
clarify some issues related to index compression, and Ion Muslea helped
clarify some issues on wrapper induction. Divy Agrawal, Yunbo Cao,
Edward Fox, Hang Li, Xiaoli Li, Zhaohui Tan, Dell Zhang and Zijian
Zheng helped check various chapters or sections. I am very grateful.