线性散列表(linearhash)_线性散列表资源-CSDN文库

linear

hash

需积分: 48 190 浏览量 2010-10-31 23:00:30 上传评论收藏 1.05MB PDF 举报

资源详情

资源评论

LINEAR HASHING : A NEW TOOL FOR FILE AND TABLE ADDRESSING.

Witold Litwin

I. N. Ft. I. A.

78 150 Le Chesnay, France.

Abstract.

Linear hashing is a hashing in which the

address space may grow or shrink dynamically. A

file or a table may then support ally number of

insertions or deletions without access or memory

load performance deterioration. A record in the

file is,

in general, found in pale access, while

the load may stay practically constant up to 90 %.

A record in a table is found in a mean of 1.7

accesses, while the load is constantly 80 %. No

other algorithms attaining such a performance are

known.

1. B.

The most fundamental data structures are files

and tables of records identified by a primary key.

Hashing and trees (B-tree, binary tree,.. ) are

the basic addressing techniques for those files

and tables, thousands of publications dealed with

this subject.

If a file or a table is almost

static, hashing allows a record to be found in

general in one acoess. A tree always requires

several accesses. However, when the file or the

table is,

as usually, dynamic, then a tree still

works reasonably well, while the performance Of

hashing may become very bad. It may even become

necessary to rehash all records into a new file.

We have shown, however, in /LIT77/ that hashing

may be a tool for dynamic files,

if the hashing

function is dynamically modified in the course of

insertions or deletions, We have called this new

type of hashing yj,&& hashitlp. (VH), in contrast

to the well known hashing with a static function,

which we will refer to as qJ.a&&&

Through an

algorithm called VHO /LIT77/, we have shown that a

record in a dynamic file may typically be found in

two accessses, while the load stays close to 70 %

/LIT77a/. Another algorithm, called VHl /LIT78/, /

LIT78a/, has shown that a record may even be found

typically in one a&ass, while the load during

insertions oscillates between 45 % and 90 % and is

67.5

% on average. It showed also that the average

load during insertions may be always greater than

% and almost always greater than

5, if we

accept that the average successful search requires

1.6

accesses /LIT79a/. Finally, a gensralisation

of VHl, called VH2, has shown that for a similar

load, the average successful search requires very

close to one access /LIT79a/. Two other algorithms

similar to VHO have been proposed, Dynamic Hashing

(DH) /LAR78/ and Extendible Hashing (EH) /FAG78/.

Since trees typically lead to more than

3 or 4

accesses per search and to a load close to 70 %

/KND74/, /COW/g/, all these VH algorithms offer

better access performance for similar or higher

load factors.

VHO, DH and EH require at least two accesses

per

search because the data structure which

represents the dynamically oreated hashing

functions must be on the disk. VHl and VH2 are

faster, because the functions are represented by a

bit table which, depending on file parameters,

needs 1 kbyte of core (main storage), per file for

7 000 to more than 1 500 000 records. In this

paper,

we present the algorithm which

goes

further,

only a few bytes of core suffice now for

a file of any size. For RD~ number of insertions

or deletions, the load of a file may therefore be

high and a reaord may be found, in general, in one

access.

No other algorithms attaining such a

performance are known.

The algorithm is called Linear Virtual Hashing

or Linaar &&~g in short (LH). The choice of

file parameters may lead to a mean number of

accesses per successful search not greater

than

1.03,

while the load stays close to

%. It may

also lead to a load staying equal to 90 % while

the successful search requires 1.35 accesses in

C”1534-7/80/0000-0212$00.75 0 1980 IEEE

212

the average.

Even if the buffer in core may

contain only one

record, a

search in the file

needs 1.7 accesses in the average while the load

remains at

$. This property makes LH probably

the best performing tool for dynamic tables as

well.

The next section describes the principles of

LH. We first snow the basic schema for nashing. We

then

discuss

the computing

of the physical

adaresses of buckets, when tne storage for tnem is

allocated in a non-contiguous manner. Finally, we

present some variants of the basic schema.

Section 3 shows performance of the Linear

Hashing. First, we show access and memory load

performance of

the basic schema. Next,

the

performance are analysed for a variant with a,

so-called, load control.

Section 4 concludes the paper. We sum up the

advantages

which Linear Hashing brings, we show

some application areas and, finally, we

indicate

directions for further research.

-I

-OF

THE LINEAR .

2.1.

&sic schem&

We recall that hashing is a technique which

addresses

records provided with an identifier

called B&y or, simply, key. The key, let

it be c,

is usually a non-negative integer and, in

a work on the addressing by primary key,

we may

disregard

the rest of the record. A simple

pseudo-random function, let it be h, called a

m function, assigns to c the memory cell

identified by the value h(c). The &&i,ng Pu

w c /--> c mod M ;

M = 2,3,.. ; is an

exemple of a hashing function. The cells are

called buckets and

may

contain b records,

b = 1,2,.. .

The record is

inserted into the

bucket h(c),

called m for c, unless the

bucket is already full. The search for c always

starts with the access to the bucket h(c).

If the bucket is full when c should be stored,

speak

about a w. An algorithm called

. .

sm w methoq (CRM) is then applied

which, typically, stores c in a bucket m such that

md h(c).

c then becomes an

overflow record and

thl; bucket m is called overflow bucket for c. If

(i) overflow buckets

are

not primary for any

c,(U) each of them is devoted to only one h(c)

and (iii) a new overflow bucket

for an h(c) is

chained to the existing ones,

then we have a

u chaininn CRM. If,

in particular,

the

capacity b'

of an overflow bucket is b' = 1, we

call this CAM

seoarate chaininn /KNU74/.

A search for an

overflow record requires at

least two accesses. If all collisions are resolved

only

by overflow record

creations, as it

was

assumed until

recently /LIT77/, then access

performance must rapidly deteriorate when primary

buckets become full. If the insertion of c leads

to a collision and no records already stored in

the bucket h(c) should become overflow records,

then c may be stored in its primary bucket only if

a new

hashing function is chosen. The new

function,

let it be h', should

assign new

addresses to some of the records hashed with h on

h(c) and the file should be. reorganized in

consequence.

If h = h’ for all other records, the

reorganizing needs to move only a few records

and

so may be performed dynamically. The new function

is then called by us w

created-

function or, shortly, a s&&j.~g function.

The modification to the hashing function and to

the file is called au, h(c) is, under the

circumstances, -address . The idea in VH in

general

and so,

in particular,

in LH is to use

splits in order to avoid the accumulation of

overflow records. Splits are typically performed

during some insertions. All splits result from the

application of -functions. For LH, as well as

for VHO and for VHl, the basic Split functions are

defined as follows /LIT79/, /LITBO/ :

- Let C be

the

key

space. Let

: c --> (0, l,.., N-11 be the function that is

used to load the file. The functions h

,s h2,..,

hi,..

are called split functions for ho if they

obey the following requirements :

(1)

: C --> (0, l,.., 2’N-11

(2)

For any c either :

hi(c) = hi,,(c)

or :

hi(c)

= h. (c) + 2

i-l

l-l

(2.1)

assume that,

typically, each

hi ;

i = O,l,.. ;

hashes randomly. This means that the

probability that c is mapped by h. to a given

address is l/2iN. This also meansithat (2.1) and

(2.2) are equiprobable events.

Fig. 1 illustrates the use of split functions.

The file is created with h0 : c --> c mod N, where

N = 100. The bucket capacity is b = 5 records. For

split functions we choose tne hashing by division,

namely we put :

/-->

mod

25.4.

213

This choice respects (2) since, obviously, for any

non-negative integers k, L, either :

k mod 2L

= k mod L

or :

k mod 2L

= k mod L + L.

We assume that a collision occurs during the

insertion of c

= 4900. Instead of simply storing c

as an overflow record, we

change ho

to the

following h :

h(c) = h,(c)

h(c)

= ho(c)

if ho(c) = 0

otherwise.

We then reorganize the file. We thus have applied

h, as the split function and we have performed the

split for the address 0. The hashing function h

results from the spLit and is

a dynamic hashing

function.

ho 8 h

4 900

‘a)

$J;-%J--#iJ

0 1 53

100

Fig.

1. The use of a split for a collision reso-

t;;ion. (a) -

a collision occurs for the bucket 0.

- the collision is resolved without creating

an overflow record and the address space is

extended.

Fig. 1.b shows the new state of the file. Since

h = ho for each address except 0, no records other

than these hashed to 0 have been moved.

Since

hzh

for the records which h0 hashed to the

addre& 0, for approximatively half the number of

these

records the address has been changed. It

followed from (2) that all these records had the

same new address which, under the circumstances,

had to be 100. A new bucket has therefore been

appended to the last bucket of the file to which

all the records have been moved &l pne m.

Since for all these records the bucket 100 is

henceforward the primary bucket, they are all

accessible in m access. In particular, this is

also the case of the new record, i. e,, 4 900. On

the other hand, the records which remained in the

bucket 0 continue to be accessible in one access.

In contrast to what could be done if a classical

hashing was used,

the split has resolved the

collision ulthout creatina a

JELW.&X record and

without--deterioration.

Let us now assume that the file addressed with

h0 undergoes a

sequence of insertions which did

not yet lead to overflow records. Furthermore, let

assume that a split is performed iff a

collision occurs. The natural idea would be to

split the bucket which undergoes the collision,

this was

implicit for all algorithms for VH.

However, split

addresses must then be random and

this must lead to dynamic hashing functions using

tables.

Dynamic hashing functions which do not

need tables may be

obtained only if the split

addresses

are chosen in

a predefined order. To

perform splits in some predefined order, instead

of split the bucket which undergoes the collision,

is the main new idea in the linear hashing.

Let m be the address of a collision. Let n be

the address of a split to be performed in the

course

of the resolution of this collision. Since

the values of m are random while these of n are

predefined, usually

n f m. If so, we assume that

the new record is stored as an overflow record

from the bucket m through a classical CRM, bucket

chaining for instance. Next, we assume that n is

given by a pointer which thus indicate the bucket

to be split.

For the first N collisions, the

buckets are

pointed in the

linear order

0,1,2,..,

N-l

and all splits use h,

(2.2) implies

then,

that the file becomes progr&sively larger,

including one

after another the buckets N+l,

N+2 ,...,2N-1.

A record to be inserted undergoes a

split usually not when it leads to the collision,

but with some delay. The delay corresponds to the

number of buckets which has to

be pointed while

the pointer travels up, from the address indicated

in the moment of the collision, to the address of

this collision.

With this mechanics,

no matter what is the

address,

let it be ml, of the first collision, LH

performs the first split using h, and for the

address 0. The records from the bucket 0 are then

randomly distributed between the bucket

0 and a

new bucket N, while, unless m,=O, an overflow

record is created for the bucket m,.

The second

collision,

no matter what is its address, let

say m2,

leads to an analogous result, except that,

214

剩余11页未读，继续阅读

评论收藏

内容反馈

线性散列表(linear hash)

评论0

最新资源

线性散列表(linear hash)

评论0

最新资源

相关推荐

散列表 （哈希表，线性探测再散列）

LinearProbingHash:使用哈希表的线性探测实现

Linear Hash Table (C++ Implementation)

linear_hash:用C编写的线性探测哈希表实现（c99）

算法导论（第2版）参考答案

算法导论-第三版(中文).rar

算法导论第三版英文原版

算法导论（原书第二版）

算法导论（英文版）

算法导论Introduction to Algorithms

算法导论Introduction.to.Algorithms,.Second.Edition.part2（英文）

算法导论Introduction.to.Algorithms,.Second.Edition.part1（英文版）

算法导论-麻省理工（中文）

算法导论（中文 第二版）答案

python大作业 含爬虫、数据可视化、地图、报告、及源码（整和为一个文件）（2014-2020全国各地区原油加工量）.rar

仿真电路以及操作方法

【纯干货啊】华为IPD流程管理(完整版).pptx

可编程语言标准IEC61131-3中文版.pdf

OFDM完整仿真过程与教程.zip

信号与系统——保研复习资料.pdf

Landsat_WRS2.zip

最全的Visio形状/图形库

AxureRP9项目原型50套、案例20个、元件库1套.zip

北理工+成电+东南——通信/信号保研面试真题.pdf

数字信号处理——保研复习资料.pdf

风电和储能并网Simulink模型

使用STM32F103C8T6+L298N+MG513P30电机使用外部中断法和输入捕获法进行编码器测速

COMSOL各个模块中文使用手册及教程，入门必备

FMEA第五版（中文版）

散列表（哈希表，线性探测再散列）

算法导论（中文第二版）答案

python大作业含爬虫、数据可视化、地图、报告、及源码（整和为一个文件）（2014-2020全国各地区原油加工量）.rar