LINEAR HASHING : A NEW TOOL FOR FILE AND TABLE ADDRESSING.
Witold Litwin
I. N. Ft. I. A.
78 150 Le Chesnay, France.
Abstract.
Linear hashing is a hashing in which the
address space may grow or shrink dynamically. A
file or a table may then support ally number of
insertions or deletions without access or memory
load performance deterioration. A record in the
file is,
in general, found in pale access, while
the load may stay practically constant up to 90 %.
A record in a table is found in a mean of 1.7
accesses, while the load is constantly 80 %. No
other algorithms attaining such a performance are
known.
1. B.
The most fundamental data structures are files
and tables of records identified by a primary key.
Hashing and trees (B-tree, binary tree,.. ) are
the basic addressing techniques for those files
and tables, thousands of publications dealed with
this subject.
If a file or a table is almost
static, hashing allows a record to be found in
general in one acoess. A tree always requires
several accesses. However, when the file or the
table is,
as usually, dynamic, then a tree still
works reasonably well, while the performance Of
hashing may become very bad. It may even become
necessary to rehash all records into a new file.
We have shown, however, in /LIT77/ that hashing
may be a tool for dynamic files,
if the hashing
function is dynamically modified in the course of
insertions or deletions, We have called this new
type of hashing yj,&& hashitlp. (VH), in contrast
to the well known hashing with a static function,
which we will refer to as qJ.a&&&
Through an
algorithm called VHO /LIT77/, we have shown that a
record in a dynamic file may typically be found in
two accessses, while the load stays close to 70 %
/LIT77a/. Another algorithm, called VHl /LIT78/, /
LIT78a/, has shown that a record may even be found
typically in one a&ass, while the load during
insertions oscillates between 45 % and 90 % and is
67.5
% on average. It showed also that the average
load during insertions may be always greater than
63
% and almost always greater than
85
5, if we
accept that the average successful search requires
1.6
accesses /LIT79a/. Finally, a gensralisation
of VHl, called VH2, has shown that for a similar
load, the average successful search requires very
close to one access /LIT79a/. Two other algorithms
similar to VHO have been proposed, Dynamic Hashing
(DH) /LAR78/ and Extendible Hashing (EH) /FAG78/.
Since trees typically lead to more than
3 or 4
accesses per search and to a load close to 70 %
/KND74/, /COW/g/, all these VH algorithms offer
better access performance for similar or higher
load factors.
VHO, DH and EH require at least two accesses
per
search because the data structure which
represents the dynamically oreated hashing
functions must be on the disk. VHl and VH2 are
faster, because the functions are represented by a
bit table which, depending on file parameters,
needs 1 kbyte of core (main storage), per file for
7 000 to more than 1 500 000 records. In this
paper,
we present the algorithm which
goes
further,
only a few bytes of core suffice now for
a file of any size. For RD~ number of insertions
or deletions, the load of a file may therefore be
high and a reaord may be found, in general, in one
access.
No other algorithms attaining such a
performance are known.
The algorithm is called Linear Virtual Hashing
or Linaar &&~g in short (LH). The choice of
file parameters may lead to a mean number of
accesses per successful search not greater
than
1.03,
while the load stays close to
60
%. It may
also lead to a load staying equal to 90 % while
the successful search requires 1.35 accesses in
C”1534-7/80/0000-0212$00.75 0 1980 IEEE
212
评论0
最新资源