This book is designed for a two-semester sequence in computer science,
beginning with what is typically known as Data Structures (CS-2) and con-
tinuing with advanced data structures and algorithm analysis.
The content of the CS-2 course has been evolving for some time.
Although there is some general con
sensus concerning topic coverage, con-
siderable disagreement still exists over the details. One uniformly accepted
topic is principles of software development, most notably the concepts of
encapsulation and information hiding. Algorithmically, all CS-2 courses
tend to include an introduction to running-time analysis, recursion, basic
sorting algorithms, and elementary data structures. An advanced course is
offered at many universities that covers topics in data structures, algorithms,
and running-time analysis at a higher level. The material in this text has been
designed for use in both levels of courses, thus eliminating the need to pur-
chase a second textbook.
Although the most passionate debates in CS-2 revolve around the choice
of a programming language, other fundamental choices need to be made,
whether to introduce object-oriented design or object-based design
the level of mathematical rigor,
the appropriate balance between the implementation of data struc-
tures and their use, and
programming details related to the language chosen.
My goal in writing this text was to provide a practical introduction to
data structures and algorithms from the viewpoint of abstract thinking and
problem solving. I tried to cover all of the important details concerning the
data structures, their analyses, and their C++ implementations, while staying away from data structures that are theoretically interesting but not widely
used. It is impossible to cover in a single course all the different data struc-
tures, including their uses and the analysis, described in this text. So, I
designed the textbook to allow instructors flexibility in topic coverage. The
instructor will need to decide on an appropriate balance between practice
and theory and then choose those topics that best fit the course. As I discuss
later in this Preface, I organized the text to minimize dependencies among
the various chapters.
A Unique Approach
My basic premise is that software development tools in all languages come
with large libraries, and many data structures are part of these libraries. I
envision an eventual shift in emphasis of data structures courses from imple-
mentation to use. In this book I take a unique approach by separating the
data structures into their specification and subsequent implementation and
take advantage of an already existing data structures library, the Standard
Template Library (STL).
A subset of the STL suitable for most applications is discussed in a sin-
gle chapter (Chapter 7) in Part 11. Part 11 also covers basic analysis tech-
niques, recursion, and sorting. Part I11 contains a host of applications that
use the STL's data structures. Implementation of the STL is not shown until
Part IV, once the data structures have already been used. Because the STL is
part of C++ (older compilers can use the textbook's STL code instead-see
Code Availability, xxix), students can design large projects early on, using
existing software components.
Despite the central use of the STL in this text, it is neither a book on the
STL nor a primer on implementing the STL specifically; it remains a book
that emphasizes data structures and basic problem-solving techniques. Of
course, the general techniques used in the design of data structures are appli-
cable to the implementation of the STL, so several chapters in Part IV
include STL implementations. However, instructors can choose the simpler
implementations in Part IV that do not discuss the STL protocol. Chapter 7,
which presents the STL, is essential to understanding the code in Part 111. I
attempted to use only the basic parts of the STL.
Many instructors will prefer a more traditional approach in which each
data structure is defined, implemented, and then used. Because there is no
dependency between material in Parts I11 and IV, a traditional course can
easily be taught from this book. Prerequisites
Students using this book should have knowledge of either an object-oriented
or procedural programming language. Knowledge of basic features, includ-
ing primitive data types, operators, control structures, functions (methods),
and input and output (but not necessarily arrays and classes) is assumed.
Students who have taken a first course using C++ or Java may find the
first two chapters "light" reading in some places. However, other parts are
definitely "heavy" with C++ details that may not have been covered in intro-
Students who have had a first course in another language should begin at
Chapter 1 and proceed slowly. They also should consult Appendix A which
discusses some language issues that are somewhat C++ specific. If a student
would like also to use a C++ reference book, some recommendations are
given in Chapter 1, pages 38-39.
Knowledge of discrete math is helpful but is not an absolute prerequi-
site. Several mathematical proofs are presented, but the more complex
proofs are preceded by a brief math review. Chapters 8 and 19-24 require
some degree of mathematical sophistication. The instructor may easily elect
to skip mathematical aspects of the proofs by presenting only the results. All
proofs in the text are clearly marked and are separate from the body of the
Summary of Changes in the Second Edition
1. Much of Part I was rewritten. In Chapter 1, primitive arrays are no
longer presented (a discussion of them was moved to Appendix D);
vectors are used instead, and push-back is introduced. Pointers
appear later in this edition than in the first edition. In Chapter 2,
material was significantly rearranged and simplified. Chapter 3 has
additional material on templates. In Chapter 4, the discussion on
inheritance was rewritten to simplify the initial presentation. The
end of the chapter contains the more esoteric C++ details that are
important for advanced uses.
2. An additional chapter on design patterns was added in Part I. Sev-
eral object-based patterns, including Functor, Wrapper, and Iterator,
are described, and patterns that make use of inheritance. including
Observer, are discussed.
3. The Data Structures chapter in Part I1 was rewritten with the STL in
mind. Both generic interfaces (as in the first edition) and STL inter-
faces are illustrated in the revised Chapter 7. 4. The code in Part I11 is based on the STL. In several places, the code
is more object-oriented than before. The Huffman coding example
is completely coded.
5. In Part IV, generic data structures were rewritten to be much sim-
pler and cleaner. Additionally, as appropriate, a simplified STL
implementation is illustrated at the end of the chapters in Part IV.
lmplemented components include vector, 1 is t, stack, queue,
set, map, priority-queue, and various function objects and
Using C++ presents both advantages and disadvantages. The C++ class
allows the separation of interface and implementation, as well as the hid-
ing of internal details of the implementation. It cleanly supports the notion
of abstraction. The advantage of C++ is that it is widely used in industry.
Students perceive that the material they are learning is practical and will
help them find employment, which provides motivation to persevere
through the course. One disadvantage of C++ is that it is far from a perfect
language pedagogically, especially in a second course, and thus additional
care needs to be expended to avoid bad programming practices. A second
disadvantage is that C++ is still not a stable language, so the various com-
pilers behave differently.
It might have been preferable to write the book in a language-indepen-
dent fashion, concentrating only on general principles such as the theory of
the data structures and referring to C++ code only in passing, but that is
impossible. C++ code is complex, and students will need to see complete
examples to understand some of its finer points. As mentioned earlier, a brief
review of parts of C++ is provided in Appendix A. Part I of the book
describes some of C++'s more advanced features relevant to data structures.
Several parts of the language stand out as requiring special consider-
ation: templates, inheritance, exceptions, namespaces and other recent C++
additions, and the Standard Library. 1 approached this material in the follow-
Templates: Templates are used extensively. Some instructors may
have reservations with this approach because it complicates the code,
but I included them because they are fundamental concepts in any
sophisticated C++ program.
I~zheritance: I use inheritance relatively sparingly because it adds
complications, and data structures are not a strong application area -- --
for it, This edition contains less use of inheritance than in the previ-
ous edition. However, there is a chapter on inheritance, and part of the
design patterns chapter touches on inheritance-based patterns. For the
most part, instructors who are eager to avoid inheritance can do so,
and those who want to discuss inheritance will find sufficient material
in the text.
Exceptions: Exception semantics have been standardized and
exceptions seem to work on many compilers. However, exceptions
in C++ involve ugly code, significant complications (e.g., if used in
conjunction with templates), and probably require discussing inher-
itance. So I use them sparingly in this text. A brief discussion of
exceptions is provided, and in some places exceptions are thrown in
code when warranted. However, I generally do not attempt to catch
exceptions in any Part I11 code (most of the Standard Library does
not attempt to throw exceptions).
Namespaces: Namespaces, which are a recent addition to C++, do not
work correctly on a large variety of compilers. I do not attempt to use
namespaces and I import the entire std namespace when necessary
(even though not great style, it works on the largest number of com-
pilers). Appendix A discusses the namespace issues.
Recent language additions: The boo1 data type is used throughout.
The new static-cas t operator is used in preference to the old-style
cast. Finally, I use explicit when appropriate. For the most part,
other additions are not used (e.g., I generally avoid using typename).
Standard Library: As previously mentioned, the STL is used through-
out, and a safe version (that does extra bounds checking) is available
online (and implemented in Part IV). We also use the string class
and the newer istringstream class that are part of the standard
In this text I introduce C++ and object-oriented programming (particularly
abstraction) in Part I. I discuss arrays, pointers and some other C++ topics
and then go on to discuss the syntax and use of classes, templates, and inher-
itance. The material in these chapters was substantially rewritten. New to
this edition is an entire chapter on design patterns.
In Part I1 I discuss Big-Oh and algorithmic paradigms, including recur-
sion and randomization. An entire chapter is devoted to sorting, and a sepa-
rate chapter contains a description of basic data structures. I use the STL in
presenting the interfaces and running times of the data structures. At this Preface
point in the text, the instructor may take several approaches to present the
remaining material, including the following two.
I. Discuss the corresponding implementations (either the STL ver-
sions or the simpler versions) in Part IV as each data structure is
described. The instructor can ask students to extend the classes in
various ways, as suggested in the exercises.
2. Show how the STL class is used and cover implementation at a later
point in the course. The case studies in Part 111 can be used to sup-
port this approach. As complete implementations are available on
every modern C++ compiler (or on the Internet for older compil-
ers). the instructor can use the STL in programming projects.
Details on using this approach are given shortly.
Part V describes advanced data structures such as splay trees, pairing
heaps, and the disjoint set data structure, which can be covered if time per-
mits or, more likely, in a follow-up course.
Chapter-by-Chapter Text Organization
Part I consists of ti ve chapters that describe some advanced features of C++
used throughout the text. Chapter I describes arrays, strings, pointers, refer-
ences, and structures. Chapter 2 begins the discussion of object-oriented pro-
gramming by describing the class mechanism in C++. Chapter 3 continues
this discussion by examining templates, and Chapter 4 illustrates the use of
inheritance. Several components, including strings and vectors, are written
in these chapters. Chapter 5 discusses some basic design patterns, focusing
mostly on object-based patterns such as function objects, wrappers and
adapters, iterators, and pairs. Some of these patterns (most notably the wrap-
per pattern) are used later in the text.
Part IT focuses on the basic algorithms and building blocks. In Chapter 6
a complete discussion of time complexity and Big-Oh notation is provided,
and binary search is also discussed and analyzed. Chapter 7 is crucial
because it covers the STL and argues intuitively what the running time of the
supported operations should be for each data structure. (The implementation
of these data structures. in both STL-style and a simplified version, is not
provided until Part IV. The STL is available on recent compilers.) Chapter 8
describes recursion by ti rst introducing the notion of proof by induction. It
also discusses divide-and-conquer, dynamic programming, and backtrack-
ing. A section describes several recursive numerical algorithms that are used
to implement the RSA cryptosystem. For many students, the material in the second half of Chapter 8 is more suitable for a follow-up course. Chapter 9
describes. codes, and analyzes several basic sorting algorithms, including
the insertion sort, Shellsort, mergesort, and quicksort, as well as indirect
sorting. It also proves the classic lower bound for sorting and discusses the
related problems of selection. Finally, Chapter 10 is a short chapter that dis-
cusses random numbers, including their generation and use in randomized
Part 111 provides several case studies, and each chapter is organized
around a general theme. Chapter I I illustrates several important techniques
by examining games. Chapter 12 discusses the use of stacks in computer
languages by examining an algorithm to check for balanced symbols and the
classic operator precedence parsing algorithm. Complete implementations
with code are provided for both algorithms. Chapter 13 discusses the basic
utilities of file compression and cross-reference generation, and provides a
complete implementation of both. Chapter 14 broadly examines simulation
by looking at one problem that can be viewed as a simulation and then at the
more classic event-driven simulation. Finally, Chapter 15 illustrates how
data structures are used to implement several shortest path algorithms effi-
ciently for graphs.
Part IV presents the data structure implementations. Implementations
that use simple protocols (insert, find, remove variations) are provided.
In some cases, STL implementations that tend to use more complicated C++
syntax are presented. Some mathematics is used in this part, especially in
Chapters 19-2 1, and can be skipped at the discretion of the instructor. Chap-
ter 16 provides implementations for both stacks and queues. First these data
structures are implemented using an expanding array; then they are imple-
mented using linked lists. The STL versions are discussed at the end of the
chapter. General linked lists are described in Chapter 17. Singly linked lists
are illustrated with a simple protocol, and the more complex STL version
that uses doubly linked lists is provided at the end of the chapter. Chapter 18
describes trees and illustrates the basic traversal schemes. Chapter 19 is a
detailed chapter that provides several implementations of binary search
trees. Initially, the basic binary search tree is shown, and then a binary
search tree that supports order statistics is derived. AVL trees are discussed
but not implemented; however, the more practical red-black trees and AA-
trees are implemented. Then the STL set and map are implemented.
Finally, the B-tree is examined. Chapter 20 discusses hash tables and imple-
ments the quadratic probing scheme, after examination of a simpler alterna-
tive. Chapter 21 describes the binary heap and examines heapsort and
external sorting. The STL pr iority-queue is implemented in this chapter.
Part Chapter V contains material suitable for use in a more advanced
course or for general reference. The algorithms are accessible even at the first-year level; however, for completeness sophisticated mathematical anal-
yses were included that are almost certainly beyond the reach of a first-year
student. Chapter 22 describes the splay tree, which is a binary search tree
that seems to perform extremely well in practice and is also competitive with
the binary heap in some applications that require priority queues. Chapter 23
describes priority queues that support merging operations and provides an
implementation of the pairing heap. Finally, Chapter 24 examines the classic
disjoint set data structure.
The appendices contain additional C++ reference material. Appendix A
describes tricky C++ issues, including some unusual operators, 110, and
recent language changes. Appendix B lists the operators and their prece-
dence. Appendix C summarizes some C++ libraries. Appendix D describes
primitive arrays and strings for those who want details of what is going on
under the hood of the vector and string classes.
Generally speaking, most chapters are independent of each other. However,
the following are some of the notable dependencies.
Part I: The first three chapters should be covered in their entirety first. I
recommend a brief discussion of inheritance in Chapter 4. Some instruc-
tors will want to cover all of inheritance, but it is possible to get by with
just the basics of inheritance and avoid some of the more difficult C++
issues that inheritance involves. Some of the object-based patterns (e.g.,
wrappers and function objects) in Chapter 5 can be discussed shortly
after templates, or later in the course as the need arises. Some of these
patterns are used in the chapter on sorting and in Part IV.
Chapter 6 (Algorithm Analysis): This chapter should be covered prior
to Chapters 7 and 9. Recursion (Chapter 8) can be covered prior to
this chapter, but the instructor will have to gloss over some details
about avoiding inefficient recursion.
Chapter 7 (STL): This chapter can be covered prior to, or in conjunc-
tion with, material in Part 111 or IV.
Chapter 8 (Recursion): The material in Sections 8.1-8.3 should be
covered prior to discussing recursive sorting algorithms, trees, the tic-
tac-toe case study, and shortest-path algorithms. Material such as the
RSA cryptosystem, dynamic programming, and backtracking (unless
tic-tac-toe is discussed) is otherwise optional.
Chapter 9 (Sorting): This chapter should follow Chapters 6 and 8.
However, it is possible to cover Shellsort without Chapters 6 and 8. Shellsort is not recursive (hence there is no need for Chapter 8), and a
rigorous analysis of its running time is too complex and is not cov-
ered in the book (hence there is little need for Chapter 6).
Chapters 16 and 17 (Stacks/Queues/Lists): These chapters may be
covered in either order. However, I prefer to cover Chapter 16 first,
because I believe that it presents a simpler example of linked lists.
Chapters 18 and 19 (TreesBearch trees): These chapters can be cov-
ered in either order or simultaneously.
The other chapters have little or no dependencies:
Chapter 10 (Randomization): The material on random numbers can
be covered at any point as needed.
Part III (Case Studies): Chapters 11-15 can be covered in conjunction
with, or after, the STL (in Chapter 7), and in roughly any order. There
are a few references to earlier chapters. These include Section 1 1.2 (tic-
tac-toe), which references a discussion in Section 8.7, and Section 13.2
(cross-reference generation), which references similar lexical analysis
code in Section 12.1 (balanced symbol checking).
CIzapters 20 and 21 (Hashing/Priority Queues): These chapters can
be covered at any point.
Part V (Advanced Data Structures): The material in Chapters 22-24
is self-contained and is typically covered in a follow-up course.
I have attempted to provide mathematical rigor for use in CS-2 courses that
emphasize theory and for follow-up courses that require more analysis.
However, this material stands out from the main text in the form of separate
theorems and, in some cases, separate sections (or subsections). Thus it can
be skipped by instructors in courses that deemphasize theory.
In all cases, the proof of a theorem is not necessary to the understanding
of the theorem's meaning. This is another illustration of the separation of an
interface (the theorem statement) from its implementation (the proof). Some
inherently mathematical material, such as Section 8.4 (Numerical Applica-
tions of Recursion), can be skipped without affecting comprehension of the
rest of the chapter.