PrefixB-trees-1977-计算机科学资源-CSDN文库

116 浏览量 2021-04-22 18:00:42 上传评论收藏 1.07MB PDF 举报

资源推荐

资源详情

资源评论

Prefix B-Trees

RUDOLF BAYER and KARL UNTERAUER

Technische Universitiit Miinchen

Two modifications of B-trees are described, simple prefix B-trees and prefix B-trees. Both

store only parts of keys, namely prefixes, in the index part of a B*-tree. In simple prefix B-

trees those prefixes are selected carefully to minimize their length. In prefix B-trees the pre-

fixes need not he fully stored, but are reconstructed as the tree is searched. Prefix B-trees are

designed to combine some of the advantages of B-trees, digital search trees, and key compres-

sion techniques while reducing the processing overhead of compression techniques.

Key Words and Phrases: B-trees, key compression, multiway search trees

CR Categories: 3.73, 3.74

1. SIMPLE PREFIX B-TREES

We assume that the reader is familiar with B-trees [4, 81 and with a variation of

B-trees, called B*-trees [8, lo]. In B*-trees the records of a file together with the

keys identifying them are only stored in leaf nodes of the tree structure. We call

the nonleaves branch nodes or branch pages. Leaves can be linked to their neigh-

bors to allow sequential processing of the leaves without using the branch nodes of

the B*-tree.

We call the part

of a B*-tree consisting only of the branch nodes the B*-index

and the ordered set of leaves the B*-file. In the B*-index some keys, which appear

again in the B*-file with their associated records, are repeated without their records.

The following observation is important for the rest of this paper: The keys stored

in the B*-index are only used to direct the search algorithm and to determine in

which subtree of a given branch node a key and its associated record will be found,

if they are in the tree at all.

It is now a fairly obvious observation that we need not necessarily use the keys

the

B*-file

to construct the B*-index. Instead we can use other strings, con-

structed to have desired properties, for building up the equivalent of the B*-index

of a B*-tree.

To give a simple example, we assume that a leaf is already full and contains

keys

publish, but not for profit, all or part of this material is granted provided that ACM’s copy-

right notice is given and that reference is made to the publication, to its date of issue, and

to the fact that reprinting privileges were granted by permission of the Association for Corn:

puting Machinery.

Most of the research reported in this paper was performed while the first author was visiting

at IBM&search Laboratory, San Jose, CA 95193. The research of the second author was

sup-

ported by the Sonderforschungsbereich 49-Elektronische Rechenanlagen und Informa-

tionsverarbeitung-of the Deutsche Forschungsgemeinschaft.

Authors’ address: Institut fiir Informatik

der Technischen Universitiit Munchen, Arcisstrasse

21, D-3999 Mtinchen 2, West Germany.

ACM Transactions on Database Systems, Vol. 2, No. 1, March 1977, Pages 11-26.

R. Bayer and K. Unterauer

Bigbird, Burt, Cookiemonster, Ernie, Snuffleopogus

In order to insert the key “Grouch” with its record, we must split this leaf into

two as follows:

Bigbird, Burt, Cookiemonster

Ernie, Grouch, Snuffleopogus

Instead of storing the key “Ernie” in the index, it obviously suffices to use one of

the one-letter strings “D”, “E” for the same purpose. In general we can select

any string s with the property

Cookiemonster < s 5 Ernie

(1)

and store it in the index part to separate the two nodes. We call such a string s

a separator (between Cookiemonster and Ernie). It seems prudent to choose one

of the shortest separators.

Note. If the keys are words over some alphabet and the ordering of the keys is

the alphabetic order, then the following property, called the pre$x property, holds:

Let x and y be any two keys such that x < y. Then there is a unique prefix g of

y such that (a) g is a separator between x and y, and (b) no other separator

between x and y is shorter than g. For the rest of this paper, we assume that the

prefix property holds.

The technique of moving a shortest separator to the father node when a node is

being split can be used only for splitting leaves, not branch nodes. When a branch

node is being split, one of the separators on that node must be moved up one

level in the tree.

As mentioned before, a B*-tree can be considered as consisting of a B*-index and

a B*-file. The B*-index itself is just a conventional B-tree of a subset of the keys

in the B*-file together with the maintenance algorithms for B-trees described in [4].

DeJinition. A simple prefix B-tree is a B*-tree in which the B*-index is replaced

by a B-tree of (variable length) separators.

Note. Since a key in a B*-index is also a separator, although not necessarily a

shortest possible separator, the class of simple prefix B-trees contains the class of

B*-trees.

Except for the slight complication of always having variable length separators,

the search algorithm for simple prefix B-trees is exactly the same as for B*-trees.

Split interval. The performance bottleneck of our trees is the number of ac-

cesses to the backup store needed for INSERT, DELETE, and RETRIEVE

operations. This number is essentially determined by the height of the tree since

the pages along the retrieval path for some key x from the root to some leaf are

always needed for those three operations.

Performance can therefore be improved by making the trees as flat as possible,

which can be achieved by making the branching degree of the nodes, especially

in the upper parts of the tree (i.e. near the root), as high as possible. This branch-

ing degree is determined by the number of (separator, pointer) pairs that can be

stored on a fixed size page. Pointers are generally rather short and have a fixed

ACM Transactions on Database Systems, Vol. 2, No. 1, March 1977.

Prefix B-Trees

size of a few bytes. Keys, however, tend to be rather long. The branching degree

can therefore be increased by decreasing the length of the separators. This essen-

tially is the rationale for storing only shortest separators instead of full keys in

the index part of a simple prefix B-tree.

This idea can now be carried one step further if we do not insist on splitting a

leaf precisely in the middle. Instead we could allow a certain split interval around

the middle (or the median key) of a splitting leaf within which a split point (be-

tween two adjacent keys xi-l and xi) should be chosen so as to minimize the length

of the shortest separator si . Note that because of the prefix property the resulting

separator si can still be chosen as a prefix of the key zi . The size of the split inter-

val is determined by a parameter UZ .

gz is simply the number of separators (or

bytes) around the middle of the page which we consider for choosing a suitable

split point.

The same idea can be applied to splitting branch nodes. A certain interval of

size

around the middle of a page is considered for choosing a split point such

that a shortest separator within this interval is moved to the father page.

Effect of ~1 and Ub . An increase of uz should decrease the average length of the

separators in the index part of the tree, thereby reducing the number of nodes in

the index part. An increase of Ug should favor the shorter separators in the index to

be located near the root, thereby increasing the branching degree of nodes near the

root, where a high branching degree is most beneficial.

Increasing both (rl and C7b causes two effects working against each other:

(a) It tends to decrease the height of the index part for the reasons just de-

scribed.

(b) The storage utilization decreases (there can now be pages less than half

full), which requires more pages in the file part and more but shorter entries in

the index part of a tree.

We have not analyzed the influence of uz or

on the performance of the trees.

We expect such an analysis to be quite involved and difficult. We are quite con-

fident, however, that small split intervals improve performance considerably.

Sets of keys that arise in practical applications are often far from random, and

clusters of similar keys differing only in the last few letters (e.g. plural forms) are

quite common.

As an example, consider the key sequence

“On, Part, Problem, Problems, Solution, Solutions”

arising in the tree of Figure 1. Splitting this sequence in the rr-ddle between the

third and fourth key would yield “Problems” as the shortest separator. Allowing a

split point to be chosen one key to the left or to the right yields “Pr” or “S’ as

separators. The split point between “Problems” and ‘Solution” yields the shortest

separator “S”,

which is a prefix of the full key “Solution” and appears in the tree

of Figure 1.

2. ALGORITHMS FOR SIMPLE PREFIX B-TREES

2.1 Search Algorithm

This algorithm is the same as for B*-trees with variable length keys.

ACM Transactions on Datsbsse Systems, Vol. 2, No. 1, March 1977.

剩余15页未读，继续阅读

评论收藏

内容反馈

weixin_38740397

粉丝: 6
资源: 854

Prefix B-trees - 1977-计算机科学

最新资源

Prefix B-trees - 1977-计算机科学

hbase-prefix-tree-1.1.3-API文档-中文版.zip

hbase-prefix-tree-1.4.3-API文档-中文版.zip

hbase-prefix-tree-1.2.12-API文档-中文版.zip

hbase-prefix-tree-1.2.12-API文档-中英对照版.zip

hbase-prefix-tree-1.1.3-API文档-中英对照版.zip

prefix-list.txt

Efficient in-memory indexing with Generalized Prefix trees.pptx

ndarray-prefix-sum:ndarray的前缀求和面积表计算

Android-Android-Prefix-Input为EditText添加prefix前缀模式

postcss-prefix-selector:使用选择器为所有CSS规则添加前缀

tmux-prefix-highlight:按下tmux前缀键时突出显示的插件

html-webpack-chunk-prefix-plugin

android-ndk-r18b-windows-x86_64中toolchains中相关的包

Parallel Scans and Prefix Sums - Slides (2013)-计算机科学

Data Compression Techniques - Lecture 5 - Adaptive Prefix-Free Coding - University of Helsinky - Slides (2015)-计算机科学

前端开源库-log-prefix

Python库 | prefix_list_agent-0.1.0rc3-py2.py3-none-any.whl

面试指南-Lucene-ES篇-课件

hbase-prefix-tree-0.98.4-hadoop2.jar

前端开源库-generic-common-prefix

配置route-map详解

前端开源库-loglevel-plugin-prefix

merkle-prefix-trie:使用Merkle前缀尝试（MPT）的经过身份验证的词典和经过身份验证的集的Java实现

prefix-launcher-开源

Prefix-Tuning Optimizing Continuous Prompts for Generation.pdf

路由策略中的IP-Prefix详解.docx

code_prefix.code-snippets

Python库 | si_prefix-0.4.post10-py2-none-any.whl

gulp-css-prefix:通过 css-prefix 管道文件

最新资源