没有合适的资源?快使用搜索试试~ 我知道了~
In computer science, a red–black tree is a specialised binary
需积分: 0 1 下载量 80 浏览量
2023-12-15
09:26:08
上传
评论
收藏 1.85MB PDF 举报
温馨提示
试读
27页
In computer science, a red–black tree is a specialised binary search tree data structure noted for fast storage and retrieval of ordered information, and a guarantee that operations will complete within a known time. Compared to other self-balancing binary search trees, the nodes in a red-black tree hold an extra bit called "color" representing "red" and "black" which is used when reorganising the tree to ensure that it is always approximately balanced.
资源推荐
资源详情
资源评论
Red–black tree
Type Tree
Invented 1978
Invented
by
Leonidas J. Guibas
and Robert Sedgewick
Complexities in big O notation
Space complexity
Space
Time complexity
Function Amortized Worst Case
Search
[1] [1]
Insert
[2] [1]
Delete
[2] [1]
Red–black tree
In computer science, a red–black tree is a specialised binary
search tree data structure noted for fast storage and retrieval of
ordered information, and a guarantee that operations will complete
within a known time. Compared to other self-balancing binary
search trees, the nodes in a red-black tree hold an extra bit called
"color" representing "red" and "black" which is used when re-
organising the tree to ensure that it is always approximately
balanced.
[3]
When the tree is modified, the new tree is rearranged and
"repainted" to restore the coloring properties that constrain how
unbalanced the tree can become in the worst case. The properties
are designed such that this rearranging and recoloring can be
performed efficiently.
The (re-)balancing is not perfect, but guarantees searching in Big
O time of , where is the number of entries (or keys) in
the tree. The insert and delete operations, along with the tree
rearrangement and recoloring, are also performed in
time.
[4]
Tracking the color of each node requires only one bit of information per node because there are only two
colors. The tree does not contain any other data specific to it being a red–black tree, so its memory footprint
is almost identical to that of a classic (uncolored) binary search tree. In some cases, the added bit of
information can be stored at no added memory cost.
In 1972,
Rudolf Bayer
[5]
invented a data structure that was a special order-4 case of a
B-tree
. These trees
maintained all paths from root to leaf with the same number of nodes, creating perfectly balanced trees.
However, they were not binary search trees. Bayer called them a "symmetric binary B-tree" in his paper
and later they became popular as
2–3–4 trees
or even 2–4 trees.
[6]
In a 1978 paper, "A Dichromatic Framework for Balanced Trees",
[7]
Leonidas J. Guibas
and
Robert
Sedgewick
derived the red–black tree from the symmetric binary B-tree.
[8]
The color "red" was chosen
because it was the best-looking color produced by the color laser printer available to the authors while
working at
Xerox PARC
.
[9]
Another response from Guibas states that it was because of the red and black
pens available to them to draw the trees.
[10]
In 1993, Arne Andersson introduced the idea of a right leaning tree to simplify insert and delete
operations.
[11]
In 1999, Chris Okasaki showed how to make the insert operation purely functional. Its balance function
needed to take care of only 4 unbalanced cases and one default balanced case.
[12]
History
Example of a red–black tree
Figure 1: ... with explicit NIL leaves
Figure 2: ... with implicit left and right docking points
The original algorithm used 8 unbalanced cases, but Cormen et al. (2001) reduced that to 6 unbalanced
cases.
[3]
Sedgewick showed that the insert operation can be implemented in just 46 lines of
Java
code.
[13][14]
In 2008, Sedgewick proposed the
left-leaning red–black tree
, leveraging Andersson’s idea that
simplified the insert and delete operations. Sedgewick originally allowed nodes whose two children are red,
making his trees more like 2–3–4 trees, but later this restriction was added, making new trees more like 2–3
trees. Sedgewick implemented the insert algorithm in just 33 lines, significantly shortening his original 46
lines of code.
[15][16]
A red–black tree is a special type of binary search
tree, used in computer science to organize pieces of
comparable data, such as text fragments or numbers
(as e. g. the numbers in figures 1 and 2). The nodes
carrying keys and/or data are frequently called
"internal nodes", but to make this very specific they
are also called non-NIL nodes in this article.
The leaf nodes of red–black trees ( NIL in figure 1)
do not contain keys or data. These "leaves" need
not be explicit individuals in computer memory: a
NULL pointer can—as in all binary tree data
structures— encode the fact that there is no child
node at this position in the (parent) node.
Nevertheless, by their position in the tree, these
objects are in relation to other nodes that is relevant
to the RB-structure, it may have parent, sibling
(i. .e., the other child of the parent), uncle, even
nephew node; and may be child—but never parent
—of another node. It is not really necessary to
attribute a "color" to these end-of-path objects,
because the condition "is NIL or BLACK" is implied by the condition "is NIL" (see also this remark).
Figure 2 shows the conceptually same red–black tree without these NIL leaves. To arrive at the same notion
of a path, one must notice that e. g., 3 paths run through the node 1, namely a path through 1
left
plus 2
added paths through 1
right
, namely the paths through 6
left
and 6
right
. This way, these ends of the paths are
also docking points for new nodes to be inserted, fully equivalent to the NIL leaves of figure 1.
Instead, to save a marginal amount of execution time, these (possibly many) NIL leaves may be
implemented as pointers to one unique (and black) sentinel node (instead of pointers of value NULL).
As a conclusion, the fact that a child does not exist (is not a true node, does not contain data) can in all
occurrences be specified by the very same NULL pointer or as the very same pointer to a sentinel node.
Throughout this article, either choice is called NIL node and has the constant value NIL.
The black depth of a node is defined as the number of black nodes from the root to that node (i. .e. the
number of black ancestors). The black height of a red–black tree is the number of black nodes in any path
from the root to the leaves, which, by requirement 4, is constant (alternatively, it could be defined as the
Terminology
black depth of any leaf node).
[17]: 154–165
The black height of a node is the black height of the subtree
rooted by it. In this article, the black height of a NIL node shall be set to 0, because its subtree is empty as
suggested by figure 2, and its tree height is also 0.
In addition to the requirements imposed on a binary search tree the following must be satisfied by a
red–black tree:
[18]
1. Every node is either red or black.
2. All NIL nodes (figure 1) are considered black.
3. A red node does not have a red child.
4. Every path from a given node to any of its descendant NIL nodes goes through the same
number of black nodes.
5. (Conclusion) If a node N has exactly one child, it must be a red child, because if it were
black, its NIL descendants would sit at a different black depth than N's NIL child, violating
requirement 4.
Some authors, e. g. Cormen & al.,
[18]
claim "the root is black" as fifth requirement; but not Mehlhorn &
Sanders
[17]
or Sedgewick & Wayne.
[16]: 432–447
Since the root can always be changed from red to black,
this rule has little effect on analysis. This article also omits it, because it slightly disturbs the recursive
algorithms and proofs.
As an example, every perfect binary tree that consists only of black nodes is a red–black tree.
The read-only operations, such as search or tree traversal, do not affect any of the requirements. In contrast,
the modifying operations insert and delete easily maintain requirements 1 and 2, but with respect to the
other requirements some extra effort must be made, to avoid introducing a violation of requirement 3, called
a red-violation, or of requirement 4, called a black-violation.
The requirements enforce a critical property of red–black trees: the path from the root to the farthest leaf is
no more than twice as long as the path from the root to the nearest leaf. The result is that the tree is height-
balanced. Since operations such as inserting, deleting, and finding values require worst-case time
proportional to the height of the tree, this upper bound on the height allows red–black trees to be efficient
in the worst case, namely logarithmic in the number of entries, i. e. , which is not the case
for ordinary binary search trees. For a mathematical proof see section Proof of bounds.
Red–black trees, like all binary search trees, allow quite efficient sequential access (e. g. in-order traversal,
that is: in the order Left–Root–Right) of their elements. But they support also asymptotically optimal direct
access via a traversal from root to leaf, resulting in search time.
A red–black tree is similar in structure to a
B-tree
of order
[19]
4, where each node can contain between 1
and 3 values and (accordingly) between 2 and 4 child pointers. In such a B-tree, each node will contain
only one value matching the value in a black node of the red–black tree, with an optional value before
and/or after it in the same node, both matching an equivalent red node of the red–black tree.
One way to see this equivalence is to "move up" the red nodes in a graphical representation of the red–
black tree, so that they align horizontally with their parent black node, by creating together a horizontal
cluster. In the B-tree, or in the modified graphical representation of the red–black tree, all leaf nodes are at
Properties
Analogy to B-trees of order 4
Figure 3: The same red–black tree as in the example above,
now as a B-tree
the same depth.
The red–black tree is then structurally
equivalent to a B-tree of order 4, with a
minimum fill factor of 33% of values per
cluster with a maximum capacity of 3
values.
This B-tree type is still more general than
a red–black tree though, as it allows
ambiguity in a red–black tree conversion
—multiple red–black trees can be
produced from an equivalent B-tree of order 4 (see figure 3). If a B-tree cluster contains only 1 value, it is
the minimum, black, and has two child pointers. If a cluster contains 3 values, then the central value will be
black and each value stored on its sides will be red. If the cluster contains two values, however, either one
can become the black node in the red–black tree (and the other one will be red).
So the order-4 B-tree does not maintain which of the values contained in each cluster is the root black tree
for the whole cluster and the parent of the other values in the same cluster. Despite this, the operations on
red–black trees are more economical in time because you don't have to maintain the vector of values.
[20]
It
may be costly if values are stored directly in each node rather than being stored by reference. B-tree nodes,
however, are more economical in space because you don't need to store the color attribute for each node.
Instead, you have to know which slot in the cluster vector is used. If values are stored by reference, e. g.
objects, null references can be used and so the cluster can be represented by a vector containing 3 slots for
value pointers plus 4 slots for child references in the tree. In that case, the B-tree can be more compact in
memory, improving data locality.
The same analogy can be made with B-trees with larger orders that can be structurally equivalent to a
colored binary tree: you just need more colors. Suppose that you add blue, then the blue–red–black tree
defined like red–black trees but with the additional constraint that no two successive nodes in the hierarchy
will be blue and all blue nodes will be children of a red node, then it becomes equivalent to a B-tree whose
clusters will have at most 7 values in the following colors: blue, red, blue, black, blue, red, blue (For each
cluster, there will be at most 1 black node, 2 red nodes, and 4 blue nodes).
For moderate volumes of values, insertions and deletions in a colored binary tree are faster compared to B-
trees because colored trees don't attempt to maximise the fill factor of each horizontal cluster of nodes (only
the minimum fill factor is guaranteed in colored binary trees, limiting the number of splits or junctions of
clusters). B-trees will be faster for performing rotations (because rotations will frequently occur within the
same cluster rather than with multiple separate nodes in a colored binary tree). For storing large volumes,
however, B-trees will be much faster as they will be more compact by grouping several children in the
same cluster where they can be accessed locally.
All optimizations possible in B-trees to increase the average fill factors of clusters are possible in the
equivalent multicolored binary tree. Notably, maximizing the average fill factor in a structurally equivalent
B-tree is the same as reducing the total height of the multicolored tree, by increasing the number of non-
black nodes. The worst case occurs when all nodes in a colored binary tree are black, the best case occurs
when only a third of them are black (and the other two thirds are red nodes).
Applications and related data structures
Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only
does this make them valuable in time-sensitive applications such as real-time applications, but it makes them
valuable building blocks in other data structures that provide worst-case guarantees; for example, many data
structures used in computational geometry can be based on red–black trees, and the Completely Fair
Scheduler
used in current
Linux
kernels and
epoll
system call implementation
[21]
uses red–black trees.
The AVL tree is another structure supporting search, insertion, and removal. AVL trees can be
colored red–black, thus are a subset of RB trees. Worst-case height is 0.720 times the worst-case height of
RB trees, so AVL trees are more rigidly balanced. The performance measurements of Ben Pfaff with
realistic test cases in 79 runs find AVL to RB ratios between 0.677 and 1.077, median at 0.947, and
geometric mean 0.910.
[22]
WAVL trees
have a performance in between those two.
Red–black trees are also particularly valuable in functional programming, where they are one of the most
common persistent data structures, used to construct associative arrays and sets that can retain previous
versions after mutations. The persistent version of red–black trees requires space for each
insertion or deletion, in addition to time.
For every 2–4 tree, there are corresponding red–black trees with data elements in the same order. The
insertion and deletion operations on 2–4 trees are also equivalent to color-flipping and rotations in red–
black trees. This makes 2–4 trees an important tool for understanding the logic behind red–black trees, and
this is why many introductory algorithm texts introduce 2–4 trees just before red–black trees, even though
2–4 trees are not often used in practice.
In 2008, Sedgewick introduced a simpler version of the red–black tree called the left-leaning red–black
tree
[23]
by eliminating a previously unspecified degree of freedom in the implementation. The LLRB
maintains an additional invariant that all red links must lean left except during inserts and deletes. Red–
black trees can be made isometric to either
2–3 trees
,
[24]
or 2–4 trees,
[23]
for any sequence of operations.
The 2–4 tree isometry was described in 1978 by Sedgewick.
[7]
With 2–4 trees, the isometry is resolved by
a "color flip," corresponding to a split, in which the red color of two children nodes leaves the children and
moves to the parent node.
The original description of the tango tree, a type of tree optimised for fast searches, specifically uses red–
black trees as part of its data structure.
[25]
As of Java 8, the HashMap has been modified such that instead of using a LinkedList to store different
elements with colliding hashcodes, a red–black tree is used. This results in the improvement of time
complexity of searching such an element from to where is the number of elements
with colliding hashcodes.
[26]
The read-only operations, such as search or tree traversal, on a red–black tree require no modification from
those used for binary search trees, because every red–black tree is a special case of a simple binary search
tree. However, the immediate result of an insertion or removal may violate the properties of a red–black
tree, the restoration of which is called rebalancing so that red–black trees become self-balancing. It requires
in the worst case a small number, in Big O notation, where is the number of objects in the tree,
on average or amortized , a constant number,
[27]: 310
[17]: 158
of color changes (which are very quick
in practice); and no more than three
tree rotations
[28]
(two for insertion).
If the example implementation below is not suitable, other implementations with explanations may be found
in Ben Pfaff’s
[29]
annotated C library
GNU libavl (http://adtinfo.org/)
(v2.0.3 as of June 2019).
Operations
剩余26页未读,继续阅读
资源评论
老虎爱代码
- 粉丝: 623
- 资源: 9
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功