Information Retrieval: Table of Contents

Information Retrieval: Data Structures & Algorithms

edited by William B. Frakes and Ricardo Baeza-Yates

FOREWORD

PREFACE

CHAPTER 1: INTRODUCTION TO INFORMATION STORAGE AND RETRIEVAL SYSTEMS

CHAPTER 2: INTRODUCTION TO DATA STRUCTURES AND ALGORITHMS RELATED TO INFORMATION RETRIEVAL

CHAPTER 3: INVERTED FILES

CHAPTER 4: SIGNATURE FILES

CHAPTER 5: NEW INDICES FOR TEXT: PAT TREES AND PAT ARRAYS

CHAPTER 6: FILE ORGANIZATIONS FOR OPTICAL DISKS

CHAPTER 7: LEXICAL ANALYSIS AND STOPLISTS

CHAPTER 8: STEMMING ALGORITHMS

CHAPTER 9: THESAURUS CONSTRUCTION

CHAPTER 10: STRING SEARCHING ALGORITHMS

CHAPTER 11: RELEVANCE FEEDBACK AND OTHER QUERY MODIFICATION TECHNIQUES

CHAPTER 12: BOOLEAN OPERATIONS

CHAPTER 13: HASHING ALGORITHMS

Information Retrieval: FOREWORD

FOREWORD

Udi Manber

Department of Computer Science, University of Arizona

In the not-so-distant past, information retrieval meant going to the town's library and asking the librarian for help. The librarian usually knew all the books in his possession, and could give one a definite, although often negative, answer. As the number of books grew--and with them the number of libraries and librarians--it became impossible for one person or any group of persons to possess so much information. Tools for information retrieval had to be devised. The most important of these tools is the index--a collection of terms with pointers to places where information about them can be found. The terms can be subject matters, author names, call numbers, etc., but the structure of the index is essentially the same. Indexes are usually placed at the end of a book or, in another form, implemented as card catalogs in a library. The Sumerian literary catalogue, of c. 2000 B.C., is probably the first list of books ever written. Book indexes appeared in a primitive form in the 16th century, and by the 18th century some were similar to today's indexes. Given the incredible technological advances of the last 200 years, it is quite surprising that today, for the vast majority of people, an index, or a hierarchy of indexes, is still the only available tool for information retrieval! Furthermore, at least in my experience, many book indexes are not of high quality. Writing a good index is still more a matter of experience and art than a precise science.

Why do most people still use 18th century technology today? It is not because there are no other methods or no new technology. I believe that the main reason is simple: Indexes work. They are extremely simple and effective to use for small to medium-size data. As President Reagan was fond of saying, "If it ain't broke, don't fix it." We read books in essentially the same way we did in the 18th century, we walk the same way (most people don't use small wheels, for example, for walking, although it is technologically feasible), and some people argue that we teach our students in the same way. There is a great comfort in not having to learn something new to perform an old task. However, with the information explosion just upon us, "it" is about to be broken. We not only have an immensely greater amount of information from which to retrieve, we also have much more complicated needs. Faster computers, larger capacity high-speed data storage devices, and higher bandwidth networks will all come along, but they will not be enough. We will need better techniques for storing, accessing, querying, and manipulating information.

It is doubtful that in our lifetime most people will read books, say, from a notebook computer, that people will have rockets attached to their backs, or that teaching will take a radical new form (I dare not even venture what form), but it is likely that information will be retrieved in many new ways, by many more people, and on a grander scale.

I exaggerated, of course, when I said that we are still using ancient technology for information retrieval. The basic concept of indexes--searching by keywords--may be the same, but the implementation is a world apart from the Sumerian clay tablets. And information retrieval of today, aided by computers, is not limited to search by keywords. Numerous techniques have been developed in the last 30 years, many of which are described in this book. There are efficient data structures to store indexes, sophisticated query algorithms to search quickly, data compression methods, and special hardware, to name just a few areas of extraordinary advances. Considerable progress has been made for even seemingly elementary problems, such as how to find a given pattern in a large text with or without preprocessing the text. Although most people do not yet enjoy the power of computerized search, and those who do cry for better and more powerful methods, we expect major changes in the next 10 years or even sooner. The wonderful mix of issues presented in this collection, from theory to practice, from software to hardware, is sure to be of great help to anyone with interest in information retrieval.

An editorial in the Australian Library Journal in 1974 states that "the history of cataloging is exceptional in that it is endlessly repetitive. Each generation rethinks and reformulates the same basic problems, reframing them in new contexts and restating them in new terminology." The history of computerized cataloging is still too young to be in a cycle, and the problems it faces may be old in origin but new in scale and complexity. Information retrieval, as is evident from this book, has grown into a broad area of study. I dare to predict that it will prosper. Oliver Wendell Holmes wrote in 1872 that "It is the province of knowledge to speak and it is the privilege of wisdom to listen." Maybe, just maybe, we will also be able to say in the future that it is the province of knowledge to write and it is the privilege of wisdom to query.

Information Retrieval: PREFACE

PREFACE

Text is the primary way that human knowledge is stored, and after speech, the primary way it is transmitted. Techniques for storing and searching for textual documents are nearly as old as written language itself. Computing, however, has changed the ways text is stored, searched, and retrieved. In traditional library indexing, for example, documents could only be accessed by a small number of index terms such as title, author, and a few subject headings. With automated systems, the number of indexing terms that can be used for an item is virtually limitless.

The subfield of computer science that deals with the automated storage and retrieval of documents is called information retrieval (IR). Automated IR systems were originally developed to help manage the huge scientific literature that has developed since the 1940s, and this is still the most common use of IR systems. IR systems are in widespread use in university, corporate, and public libraries. IR techniques have also been found useful, however, in such disparate areas as office automation and software engineering. Indeed, any field that relies on documents to do its work could potentially benefit from IR techniques.

IR shares concerns with many other computer subdisciplines, such as artificial intelligence, multimedia systems, parallel computing, and human factors. Yet, in our observation, IR is not widely known in the computer science community. It is often confused with DBMS--a field with which it shares concerns and yet from which it is distinct. We hope that this book will make IR techniques more widely known and used.

Data structures and algorithms are fundamental to computer science. Yet, despite a large IR literature, the basic data structures and algorithms of IR have never been collected in a book. This is the need that we are attempting to fill. In discussing IR data structures and algorithms, we attempt to be evaluative as well as descriptive. We discuss relevant empirical studies that have compared the algorithms and data structures, and some of the most important algorithms are presented in detail, including implementations in C.

Our primary audience is software engineers building systems with text processing components. Students of computer science, information science, library science, and other disciplines who are interested in text retrieval technology should also find the book useful. Finally, we hope that information retrieval researchers will use the book as a basis for future research.

Bill Frakes

Ricardo Baeza-Yates

ACKNOWLEDGEMENTS
