【免费】Incomputerscience,ared–blacktreeisaspecialisedbinary资源-CSDN文库

需积分: 0 80 浏览量 2023-12-15 09:26:08 上传评论收藏 1.85MB PDF 举报

资源推荐

资源详情

资源评论

Red–black tree
Type Tree
Invented 1978
Invented
by
Leonidas J. Guibas
and Robert Sedgewick
Complexities in big O notation
Space complexity
Space
Time complexity
Function Amortized Worst Case
Search
[1] [1]
Insert
[2] [1]
Delete
[2] [1]
Red–black tree
In  computer  science,  a  red–black  tree  is  a  specialised  binary
search  tree  data  structure  noted  for  fast  storage  and  retrieval  of
ordered information, and a guarantee that operations will complete
within  a  known  time.  Compared  to  other  self-balancing  binary
search trees, the nodes in a red-black tree hold an extra bit called
"color"  representing  "red"  and  "black"  which  is  used  when  re-
organising  the  tree  to  ensure  that  it  is  always  approximately
balanced. 
[3]
When  the  tree  is  modified,  the  new  tree  is  rearranged  and
"repainted"  to  restore  the  coloring  properties  that  constrain  how
unbalanced the tree can become in the worst case. The properties
are  designed  such  that  this  rearranging  and  recoloring  can  be
performed efficiently.
The (re-)balancing is  not  perfect, but guarantees searching in Big
O time of  , where   is the number of entries (or keys) in
the  tree.  The  insert  and  delete  operations,  along  with  the  tree
rearrangement  and  recoloring,  are  also  performed  in 
time.
[4]
Tracking the color of each node requires only one bit of information per node because there are only two
colors. The tree does not contain any other data specific to it being a red–black tree, so its memory footprint
is  almost  identical  to  that  of  a  classic  (uncolored)  binary  search  tree.  In  some  cases,  the  added  bit  of
information can be stored at no added memory cost.
In 1972, 
Rudolf Bayer
[5]
 invented a data structure that was a special order-4 case of a 
B-tree
. These trees
maintained all  paths  from  root  to leaf with  the  same  number of nodes,  creating perfectly  balanced trees.
However, they were not binary search trees. Bayer called them a "symmetric binary B-tree" in his paper
and later they became popular as 
2–3–4 trees
 or even 2–4 trees.
[6]
In  a  1978  paper,  "A  Dichromatic  Framework  for  Balanced  Trees",
[7]
 
Leonidas  J.  Guibas
  and 
Robert
Sedgewick
  derived  the  red–black  tree from  the  symmetric  binary  B-tree.
[8]
  The  color  "red"  was chosen
because  it  was  the  best-looking  color  produced  by  the  color  laser  printer  available  to  the  authors  while
working at 
Xerox PARC
.
[9]
 Another response from Guibas states that it was because of the red and black
pens available to them to draw the trees.
[10]
In  1993,  Arne  Andersson  introduced  the  idea  of  a  right  leaning  tree  to  simplify  insert  and  delete
operations.
[11]
In 1999, Chris Okasaki showed  how to make the insert  operation purely functional. Its balance function
needed to take care of only 4 unbalanced cases and one default balanced case.
[12]
History

Example of a red–black tree
Figure 1: ... with explicit NIL leaves
Figure 2: ... with implicit left and right docking points
The  original algorithm  used 8 unbalanced cases, but  Cormen et  al.  (2001)  reduced  that to  6  unbalanced
cases.
[3]
  Sedgewick  showed  that  the  insert  operation  can  be  implemented  in  just  46  lines  of 
Java
code.
[13][14]
 In 2008, Sedgewick proposed the 
left-leaning red–black tree
, leveraging Andersson’s idea that
simplified the insert and delete operations. Sedgewick originally allowed nodes whose two children are red,
making his trees more like 2–3–4 trees, but later this restriction was added, making new trees more like 2–3
trees. Sedgewick implemented the insert algorithm in just 33 lines, significantly shortening his original 46
lines of code.
[15][16]
A red–black tree is a special type of binary search
tree, used in computer science to organize pieces of
comparable data, such as text fragments or numbers
(as e. g. the numbers in figures 1 and 2). The nodes
carrying  keys  and/or  data  are  frequently  called
"internal nodes", but to make this very specific they
are also called non-NIL nodes in this article.
The leaf nodes of red–black trees ( NIL  in figure 1)
do  not  contain  keys  or  data.  These  "leaves"  need
not  be  explicit  individuals  in  computer  memory: a
NULL  pointer  can—as  in  all  binary  tree  data
structures—  encode  the  fact  that  there  is  no  child
node  at  this  position  in  the  (parent)  node.
Nevertheless,  by  their  position  in  the  tree,  these
objects are in relation to other nodes that is relevant
to  the  RB-structure,  it  may  have  parent,  sibling
(i.  .e.,  the  other  child  of  the  parent),  uncle,  even
nephew node; and may be child—but never parent
—of  another  node.  It  is  not  really  necessary  to
attribute  a  "color"  to  these  end-of-path  objects,
because the condition "is NIL or BLACK" is implied by the condition "is NIL" (see also this remark).
Figure 2 shows the conceptually same red–black tree without these NIL leaves. To arrive at the same notion
of a path, one must notice that e. g., 3 paths run through the node 1, namely a path through 1
left
 plus 2
added paths through 1
right
, namely the paths through 6
left
 and 6
right
. This way, these ends of the paths are
also docking points for new nodes to be inserted, fully equivalent to the NIL leaves of figure 1.
Instead,  to  save  a  marginal  amount  of  execution  time,  these  (possibly  many)  NIL  leaves  may  be
implemented as pointers to one unique (and black) sentinel node (instead of pointers of value NULL).
As a conclusion, the fact that a child does not exist (is not a true node, does not contain data) can in all
occurrences be specified by the very same NULL pointer or as the very same pointer to a sentinel node.
Throughout this article, either choice is called NIL node and has the constant value NIL.
The black depth of a node is defined as the number of black nodes from the root to that node (i. .e. the
number of black ancestors). The black height of a red–black tree is the number of black nodes in any path
from the root to the leaves, which, by requirement 4, is constant (alternatively, it could be defined as the
Terminology

black depth of any  leaf node).
[17]: 154–165 
 The  black  height of  a node is the black height  of the  subtree
rooted by it. In this article, the black height of a NIL node shall be set to 0, because its subtree is empty as
suggested by figure 2, and its tree height is also 0.
In  addition  to  the  requirements  imposed  on  a  binary  search  tree  the  following  must  be  satisfied  by  a
red–black tree:
[18]
1. Every node is either red or black.
2. All NIL nodes (figure 1) are considered black.
3. A red node does not have a red child.
4. Every path from a given node to any of its descendant NIL nodes goes through the same
number of black nodes.
5. (Conclusion) If a node N has exactly one child, it must be a red child, because if it were
black, its NIL descendants would sit at a different black depth than N's NIL child, violating
requirement 4.
Some authors, e. g. Cormen & al.,
[18]
 claim "the root is black" as fifth requirement; but not Mehlhorn &
Sanders
[17]
 or Sedgewick & Wayne.
[16]: 432–447 
 Since the root can always be changed from red to black,
this  rule  has  little  effect  on  analysis.  This  article  also  omits  it,  because  it  slightly  disturbs  the  recursive
algorithms and proofs.
As an example, every perfect binary tree that consists only of black nodes is a red–black tree.
The read-only operations, such as search or tree traversal, do not affect any of the requirements. In contrast,
the  modifying  operations insert and  delete  easily  maintain  requirements  1  and  2,  but  with respect  to  the
other requirements some extra effort must be made, to avoid introducing a violation of requirement 3, called
a red-violation, or of requirement 4, called a black-violation.
The requirements enforce a critical property of red–black trees: the path from the root to the farthest leaf is
no more than twice as long as the path from the root to the nearest leaf. The result is that the tree is height-
balanced.  Since  operations  such  as  inserting,  deleting,  and  finding  values  require  worst-case  time
proportional to the height   of the tree, this upper bound on the height allows red–black trees to be efficient
in the worst case, namely logarithmic in the number   of entries, i. e.  , which is not the case
for ordinary binary search trees. For a mathematical proof see section Proof of bounds.
Red–black trees, like all binary search trees, allow quite efficient sequential access (e. g. in-order traversal,
that is: in the order Left–Root–Right) of their elements. But they support also asymptotically optimal direct
access via a traversal from root to leaf, resulting in   search time.
A red–black tree is similar in structure to a 
B-tree
 of order
[19]
 4, where each node can contain between 1
and 3 values and (accordingly) between 2 and 4  child pointers. In such a B-tree, each node will contain
only  one  value  matching the  value  in a  black  node  of  the  red–black  tree, with an optional  value  before
and/or after it in the same node, both matching an equivalent red node of the red–black tree.
One way to see this equivalence is to "move up" the red nodes in a graphical representation of the red–
black  tree, so that  they  align horizontally  with  their  parent black  node,  by  creating together a horizontal
cluster. In the B-tree, or in the modified graphical representation of the red–black tree, all leaf nodes are at
Properties
Analogy to B-trees of order 4

Figure 3: The same red–black tree as in the example above,

now as a B-tree

the same depth.

The red–black tree is then structurally

equivalent to a B-tree of order 4, with a

minimum fill factor of 33% of values per

cluster with a maximum capacity of 3

values.

This B-tree type is still more general than

a red–black tree though, as it allows

ambiguity in a red–black tree conversion

—multiple red–black trees can be

produced from an equivalent B-tree of order 4 (see figure 3). If a B-tree cluster contains only 1 value, it is

the minimum, black, and has two child pointers. If a cluster contains 3 values, then the central value will be

black and each value stored on its sides will be red. If the cluster contains two values, however, either one

can become the black node in the red–black tree (and the other one will be red).

So the order-4 B-tree does not maintain which of the values contained in each cluster is the root black tree

for the whole cluster and the parent of the other values in the same cluster. Despite this, the operations on

red–black trees are more economical in time because you don't have to maintain the vector of values.

[20]

may be costly if values are stored directly in each node rather than being stored by reference. B-tree nodes,

however, are more economical in space because you don't need to store the color attribute for each node.

Instead, you have to know which slot in the cluster vector is used. If values are stored by reference, e. g.

objects, null references can be used and so the cluster can be represented by a vector containing 3 slots for

value pointers plus 4 slots for child references in the tree. In that case, the B-tree can be more compact in

memory, improving data locality.

The same analogy can be made with B-trees with larger orders that can be structurally equivalent to a

colored binary tree: you just need more colors. Suppose that you add blue, then the blue–red–black tree

defined like red–black trees but with the additional constraint that no two successive nodes in the hierarchy

will be blue and all blue nodes will be children of a red node, then it becomes equivalent to a B-tree whose

clusters will have at most 7 values in the following colors: blue, red, blue, black, blue, red, blue (For each

cluster, there will be at most 1 black node, 2 red nodes, and 4 blue nodes).

For moderate volumes of values, insertions and deletions in a colored binary tree are faster compared to B-

trees because colored trees don't attempt to maximise the fill factor of each horizontal cluster of nodes (only

the minimum fill factor is guaranteed in colored binary trees, limiting the number of splits or junctions of

clusters). B-trees will be faster for performing rotations (because rotations will frequently occur within the

same cluster rather than with multiple separate nodes in a colored binary tree). For storing large volumes,

however, B-trees will be much faster as they will be more compact by grouping several children in the

same cluster where they can be accessed locally.

All optimizations possible in B-trees to increase the average fill factors of clusters are possible in the

equivalent multicolored binary tree. Notably, maximizing the average fill factor in a structurally equivalent

B-tree is the same as reducing the total height of the multicolored tree, by increasing the number of non-

black nodes. The worst case occurs when all nodes in a colored binary tree are black, the best case occurs

when only a third of them are black (and the other two thirds are red nodes).

Applications and related data structures

Red–black  trees  offer  worst-case  guarantees  for  insertion  time,  deletion  time,  and  search  time.  Not  only
does this make them valuable in time-sensitive applications such as real-time applications, but it makes them
valuable building blocks in other data structures that provide worst-case guarantees; for example, many data
structures  used  in  computational  geometry  can  be  based  on  red–black  trees,  and  the  Completely  Fair
Scheduler
 used in current 
Linux
 kernels and 
epoll
 system call implementation
[21]
 uses red–black trees.
The AVL tree is another structure supporting   search, insertion, and removal. AVL  trees can be
colored red–black, thus are a subset of RB trees. Worst-case height is 0.720 times the worst-case height of
RB  trees,  so  AVL  trees  are  more  rigidly  balanced.  The  performance  measurements  of  Ben  Pfaff  with
realistic  test  cases  in  79  runs  find  AVL  to  RB  ratios  between  0.677  and  1.077,  median  at  0.947,  and
geometric mean 0.910.
[22]
 
WAVL trees
 have a performance in between those two.
Red–black trees are also particularly valuable in functional programming, where they are one of the most
common  persistent  data  structures,  used  to  construct  associative  arrays  and  sets  that  can  retain  previous
versions  after  mutations.  The  persistent  version  of  red–black  trees  requires    space  for  each
insertion or deletion, in addition to time.
For  every  2–4  tree,  there  are  corresponding  red–black  trees  with  data  elements  in  the  same  order.  The
insertion  and  deletion  operations  on  2–4  trees  are  also  equivalent  to  color-flipping  and  rotations  in  red–
black trees. This makes 2–4 trees an important tool for understanding the logic behind red–black trees, and
this is why many introductory algorithm texts introduce 2–4 trees just before red–black trees, even though
2–4 trees are not often used in practice.
In 2008,  Sedgewick  introduced  a  simpler  version  of  the  red–black  tree  called  the  left-leaning  red–black
tree
[23]
  by  eliminating  a  previously  unspecified  degree  of  freedom  in  the  implementation.  The  LLRB
maintains  an  additional  invariant  that all red  links must  lean  left  except  during  inserts  and  deletes.  Red–
black trees can be made isometric to either 
2–3 trees
,
[24]
 or 2–4 trees,
[23]
 for any sequence of operations.
The 2–4 tree isometry was described in 1978 by Sedgewick.
[7]
 With 2–4 trees, the isometry is resolved by
a "color flip," corresponding to a split, in which the red color of two children nodes leaves the children and
moves to the parent node.
The original description of the tango tree, a type of tree optimised for fast searches, specifically uses red–
black trees as part of its data structure.
[25]
As of Java  8,  the HashMap  has been  modified such that  instead  of using a  LinkedList to store different
elements  with  colliding  hashcodes,  a  red–black  tree  is  used.  This  results  in  the  improvement  of  time
complexity of  searching  such an element from   to   where   is the  number of  elements
with colliding hashcodes.
[26]
The read-only operations, such as search or tree traversal, on a red–black tree require no modification from
those used for binary search trees, because every red–black tree is a special case of a simple binary search
tree.  However,  the immediate result of an insertion  or removal may violate the properties of a red–black
tree, the restoration of which is called rebalancing so that red–black trees become self-balancing. It requires
in the worst case a small number,   in Big O notation, where   is the number of objects in the tree,
on average or amortized  , a constant number,
[27]: 310 
 
[17]: 158 
 of color changes (which are very quick
in practice); and no more than three 
tree rotations
[28]
 (two for insertion).
If the example implementation below is not suitable, other implementations with explanations may be found
in Ben Pfaff’s
[29]
 annotated C library 
GNU libavl (http://adtinfo.org/)
 (v2.0.3 as of June 2019).
Operations