Information Retrieval: Table of Contents
Information Retrieval: Data Structures & Algorithms
edited by William B. Frakes and Ricardo Baeza-Yates
FOREWORD
PREFACE
CHAPTER 1: INTRODUCTION TO INFORMATION STORAGE AND RETRIEVAL SYSTEMS
CHAPTER 2: INTRODUCTION TO DATA STRUCTURES AND ALGORITHMS RELATED TO INFORMATION RETRIEVAL
CHAPTER 3: INVERTED FILES
CHAPTER 4: SIGNATURE FILES
CHAPTER 5: NEW INDICES FOR TEXT: PAT TREES AND PAT ARRAYS
CHAPTER 6: FILE ORGANIZATIONS FOR OPTICAL DISKS
CHAPTER 7: LEXICAL ANALYSIS AND STOPLISTS
CHAPTER 8: STEMMING ALGORITHMS
CHAPTER 9: THESAURUS CONSTRUCTION
CHAPTER 10: STRING SEARCHING ALGORITHMS
CHAPTER 11: RELEVANCE FEEDBACK AND OTHER QUERY MODIFICATION TECHNIQUES
CHAPTER 12: BOOLEAN OPERATIONS
CHAPTER 13: HASHING ALGORITHMS
CHAPTER 14: RANKING ALGORITHMS
CHAPTER 15: EXTENDED BOOLEAN MODELS
CHAPTER 16: CLUSTERING ALGORITHMS
CHAPTER 17: SPECIAL-PURPOSE HARDWARE FOR INFORMATION RETRIEVAL
CHAPTER 18: PARALLEL INFORMATION RETRIEVAL ALGORITHMS
Information Retrieval: FOREWORD
FOREWORD
Udi Manber
Department of Computer Science, University of Arizona
In the not-so-long ago past, information retrieval meant going to the town's library and asking the
librarian for help. The librarian usually knew all the books in his possession, and could give one a
definite, although often negative, answer. As the number of books grew--and with them the number of
libraries and librarians--it became impossible for one person or any group of persons to possess so much
information. Tools for information retrieval had to be devised. The most important of these tools is the
index--a collection of terms with pointers to places where information about them can be found. The
terms can be subject matters, author names, call numbers, etc., but the structure of the index is
essentially the same. Indexes are usually placed at the end of a book or, in another form, implemented as
card catalogs in a library. The Sumerian literary catalogue, of c. 2000 B.C., is probably the first list of
books ever written. Book indexes appeared in primitive form in the 16th century, and by the 18th
century some were similar to today's indexes. Given the incredible technology advances in the last 200
years, it is quite surprising that today, for the vast majority of people, an index, or a hierarchy of
indexes, is still the only available tool for information retrieval! Furthermore, at least from my
experience, many book indexes are not of high quality. Writing a good index is still more a matter of
experience and art than a precise science.
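The index described here, a collection of terms with pointers to the places where information about them can be found, maps naturally onto a small data structure. The following is a minimal C sketch, assuming fixed-size arrays and page numbers standing in for the "pointers"; the names `book_index`, `index_add`, and `index_find` and the capacity limits are hypothetical, chosen for illustration rather than taken from the book.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical capacity limits for this sketch. */
#define MAX_TERMS    128
#define MAX_POSTINGS 32
#define MAX_TERM_LEN 32

/* One index entry: a term plus "pointers" (here, page numbers)
   to the places where information about the term can be found. */
struct index_entry {
    char term[MAX_TERM_LEN];
    int  pages[MAX_POSTINGS];
    int  npages;
};

/* The whole index. The caller must zero-initialize it
   (static storage or `= {0}`) before the first index_add. */
struct book_index {
    struct index_entry entries[MAX_TERMS];
    int nentries;
};

/* Linear scan for a term; returns NULL if the term is not indexed. */
struct index_entry *index_find(struct book_index *ix, const char *term)
{
    for (int i = 0; i < ix->nentries; i++)
        if (strcmp(ix->entries[i].term, term) == 0)
            return &ix->entries[i];
    return NULL;
}

/* Record that `term` is discussed on `page`. Duplicate pages are
   stored once. Returns 0 on success, -1 if a fixed limit is exceeded. */
int index_add(struct book_index *ix, const char *term, int page)
{
    assert(term != NULL);
    if (strlen(term) >= MAX_TERM_LEN)
        return -1;

    struct index_entry *e = index_find(ix, term);
    if (e == NULL) {                      /* first occurrence of the term */
        if (ix->nentries == MAX_TERMS)
            return -1;
        e = &ix->entries[ix->nentries++];
        strcpy(e->term, term);            /* safe: length checked above */
        e->npages = 0;
    }

    for (int i = 0; i < e->npages; i++)   /* skip a page already listed */
        if (e->pages[i] == page)
            return 0;

    if (e->npages == MAX_POSTINGS)
        return -1;
    e->pages[e->npages++] = page;
    return 0;
}
```

For example, after `index_add(&ix, "catalog", 12)` and `index_add(&ix, "catalog", 40)`, `index_find(&ix, "catalog")` returns an entry whose `pages` array holds 12 and 40. The linear scan keeps the sketch short; for large term sets, a sorted or hashed term table is the usual choice, the kind of structure covered under inverted files and hashing in the table of contents.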
Why do most people still use 18th-century technology today? It is not because there are no other
methods or no new technology. I believe that the main reason is simple: Indexes work. They are
extremely simple and effective to use for small to medium-sized collections of data. As President Reagan was fond of
saying, "if it ain't broke, don't fix it." We read books in essentially the same way we did in the 18th
century, we walk the same way (most people don't use small wheels, for example, for walking, although
it is technologically feasible), and some people argue that we teach our students in the same way. There
is a great comfort in not having to learn something new to perform an old task. However, with the
information explosion just upon us, "it" is about to be broken. We not only have an immensely greater
amount of information from which to retrieve, we also have much more complicated needs. Faster
computers, larger capacity high-speed data storage devices, and higher bandwidth networks will all
come along, but they will not be enough. We will need better techniques for storing, accessing,
querying, and manipulating information.
It is doubtful that in our lifetime most people will read books from, say, a notebook computer, that
people will have rockets attached to their backs, or that teaching will take a radical new form (I dare not
even venture what form). It is likely, however, that information will be retrieved in many new ways, by
many more people, and on a grander scale.
I exaggerated, of course, when I said that we are still using ancient technology for information retrieval.
The basic concept of indexes--searching by keywords--may be the same, but the implementation is a
world apart from the Sumerian clay tablets. And information retrieval of today, aided by computers, is
not limited to search by keywords. Numerous techniques have been developed in the last 30 years, many
of which are described in this book. There are efficient data structures to store indexes, sophisticated
query algorithms to search quickly, data compression methods, and special hardware, to name just a few
areas of extraordinary advances. Considerable progress has been made for even seemingly elementary
problems, such as how to find a given pattern in a large text with or without preprocessing the text.
Although most people do not yet enjoy the power of computerized search, and those who do cry for
better and more powerful methods, we expect major changes in the next 10 years or even sooner. The
wonderful mix of issues presented in this collection, from theory to practice, from software to hardware,
is sure to be of great help to anyone with interest in information retrieval.
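One classical answer to the problem of finding a pattern in a large text without preprocessing the text is the Knuth-Morris-Pratt algorithm: it preprocesses only the pattern into a "failure" table, then scans the text once without ever moving backward. The following is a minimal C sketch of that technique; the function name `kmp_search` and its return conventions (match offset, -1 for no match, -2 if memory allocation fails) are assumptions for illustration, not code from the book.

```c
#include <stdlib.h>
#include <string.h>

/* Knuth-Morris-Pratt search. Returns the offset of the first occurrence
   of `pat` in `text`, -1 if there is none, or -2 if allocation fails.
   An empty pattern matches at offset 0. */
long kmp_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0)
        return 0;

    /* fail[i] = length of the longest proper prefix of pat[0..i]
       that is also a suffix of pat[0..i]. */
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return -2;
    fail[0] = 0;
    size_t k = 0;
    for (size_t i = 1; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }

    /* Single pass over the text; q counts pattern characters matched. */
    long result = -1;
    size_t q = 0;
    for (size_t i = 0; i < n; i++) {
        while (q > 0 && text[i] != pat[q])
            q = fail[q - 1];          /* fall back within the pattern only */
        if (text[i] == pat[q])
            q++;
        if (q == m) {
            result = (long)(i + 1 - m);
            break;
        }
    }

    free(fail);
    return result;
}
```

For example, `kmp_search("abxabcabcaby", "abcaby")` returns 6. Because the text pointer never backs up, the scan takes time proportional to the text length plus the pattern length; approaches that instead preprocess the text itself, such as the PAT trees and arrays listed in the table of contents, trade index-building cost for faster repeated queries.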
An editorial in the Australian Library Journal in 1974 states that "the history of cataloging is exceptional
in that it is endlessly repetitive. Each generation rethinks and reformulates the same basic problems,
reframing them in new contexts and restating them in new terminology." The history of computerized
cataloging is still too young to be in a cycle, and the problems it faces may be old in origin but new in
scale and complexity. Information retrieval, as is evident from this book, has grown into a broad area of
study. I dare to predict that it will prosper. Oliver Wendell Holmes wrote in 1872 that "It is the province
of knowledge to speak and it is the privilege of wisdom to listen." Maybe, just maybe, we will also be
able to say in the future that it is the province of knowledge to write and it is the privilege of wisdom to
query.
Information Retrieval: PREFACE
PREFACE
Text is the primary way that human knowledge is stored, and after speech, the primary way it is
transmitted. Techniques for storing and searching for textual documents are nearly as old as written
language itself. Computing, however, has changed the ways text is stored, searched, and retrieved. In
traditional library indexing, for example, documents could only be accessed by a small number of index
terms such as title, author, and a few subject headings. With automated systems, the number of indexing
terms that can be used for an item is virtually limitless.
The subfield of computer science that deals with the automated storage and retrieval of documents is
called information retrieval (IR). Automated IR systems were originally developed to help manage the
huge scientific literature that has developed since the 1940s, and this is still the most common use of IR
systems. IR systems are in widespread use in university, corporate, and public libraries. IR techniques
have also been found useful, however, in such disparate areas as office automation and software
engineering. Indeed, any field that relies on documents to do its work could potentially benefit from IR
techniques.
IR shares concerns with many other computer subdisciplines, such as artificial intelligence, multimedia
systems, parallel computing, and human factors. Yet, in our observation, IR is not widely known in the
computer science community. It is often confused with database management systems (DBMS)--a field with which it shares concerns
and yet from which it is distinct. We hope that this book will make IR techniques more widely known
and used.
Data structures and algorithms are fundamental to computer science. Yet, despite a large IR literature,
the basic data structures and algorithms of IR have never been collected in a book. This is the need that
we are attempting to fill. In discussing IR data structures and algorithms, we attempt to be evaluative as
well as descriptive. We discuss relevant empirical studies that have compared the algorithms and data
structures, and some of the most important algorithms are presented in detail, including implementations
in C.
Our primary audience is software engineers building systems with text processing components. Students
of computer science, information science, library science, and other disciplines who are interested in text
retrieval technology should also find the book useful. Finally, we hope that information retrieval
researchers will use the book as a basis for future research.
Bill Frakes
Ricardo Baeza-Yates
ACKNOWLEDGEMENTS