Post on 30-May-2018
8/14/2019 Cormen Algo-lec7
1/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.1
Introduction to Algorithms
6.046J/18.401J
LECTURE 7Hashing I
Direct-access tables
Resolving collisions bychaining
Choosing hash functions
Open addressing
Prof. Charles E. Leiserson
8/14/2019 Cormen Algo-lec7
2/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.2
Symbol-table problem
Symbol table Sholding n records:
recordxkey[x]
key[x]
Otherfieldscontaining
satellite data
Operations on S:
INSERT(S,x)
DELETE(S,x) SEARCH(S, k)
How should the data structure Sbe organized?
8/14/2019 Cormen Algo-lec7
3/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.3
Direct-access table
IDEA: Suppose that the keys are drawn fromthe set U {0, 1, , m1}, and keys aredistinct. Set up an array T[0 . . m1]:
T[k] =x ifx Kand key[x] = k,
NIL otherwise.Then, operations take (1) time.
Problem: The range of keys can be large: 64-bit numbers (which represent
18,446,744,073,709,551,616 different keys),
character strings (even larger!).
8/14/2019 Cormen Algo-lec7
4/26October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.4
Hash functions
Solution: Use a hash function h to map theuniverse Uof all keys into
{0, 1, , m1}:
U
S
k1
k2 k3
k4k5
0
m1
h(k1)
h(k4)
h(k2)
h(k3)
T
= h(k5)
As each key is inserted, h maps it to a slot ofT.When a record to be inserted maps to an alreadyoccupied slot in T, a collision occurs.
8/14/2019 Cormen Algo-lec7
5/26October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.5
Resolving collisions by
chaining Link records in the same slot into a list.
h(49) = h(86) = h(52) = i
T
4949 8686 5252
Worst case:
Every key
hashes to thesame slot.
Access time =
(n) if|S| = n
i
8/14/2019 Cormen Algo-lec7
6/26
8/14/2019 Cormen Algo-lec7
7/26October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.7
Search cost
The expected time for an unsuccessfulsearch for a record with a given key is
= (1 + ).
8/14/2019 Cormen Algo-lec7
8/26
8/14/2019 Cormen Algo-lec7
9/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.9
Search cost
The expected time for an unsuccessfulsearch for a record with a given key is
= (1 + ).
apply hash functionand access slot
searchthe list
Expected search time = (1) if = O(1),
or equivalently, ifn = O(m).
8/14/2019 Cormen Algo-lec7
10/26
8/14/2019 Cormen Algo-lec7
11/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.11
Choosing a hash function
The assumption of simple uniform hashing
is hard to guarantee, but several commontechniques tend to work well in practice aslong as their deficiencies can be avoided.
Desirata:
A good hash function should distribute the
keys uniformly into the slots of the table. Regularity in the key distribution should
not affect this uniformity.
8/14/2019 Cormen Algo-lec7
12/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.12
Division method
Assume all keys are integers, and define
h(k) = kmod m.
Deficiency: Dont pick an m that has a smalldivisord. A preponderance of keys that arecongruent modulo
dcan adversely affect
uniformity.
h(k)
Extreme deficiency: Ifm = 2r, then the hash
doesnt even depend on all the bits ofk:
Ifk= 10110001110110102 and r= 6, then
h(k) = 0110102 .
8/14/2019 Cormen Algo-lec7
13/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.13
Division method (continued)
h(k) = kmod m.
Pickm to be a prime not too close to a powerof2 or10 and not otherwise used prominentlyin the computing environment.
Annoyance: Sometimes, making the table size a prime is
inconvenient.But, this method is popular, although the nextmethod well see is usually superior.
8/14/2019 Cormen Algo-lec7
14/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.14
Multiplication method
Assume that all keys are integers, m = 2r, and ourcomputer has w-bit words. Define
h(k) = (Akmod 2w) rsh (wr),
where rsh is the bitwise right-shift operator and
A is an odd integer in the range 2w1
8/14/2019 Cormen Algo-lec7
15/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.15
Multiplication method
example
4
0
35
2617
Modular wheel
h(k) = (Akmod 2w) rsh (wr)
Suppose that m = 8 = 23
and that our computerhas w = 7-bit words:
1 0 1 1 0 0 1 1 1 0 1 0 1 11 0 0 1 0 1 0 0 1 1 0 0 1 1
= A= k
h(k) A ..2A
..
3A
..
R l i lli i b
8/14/2019 Cormen Algo-lec7
16/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.16
Resolving collisions by open
addressingNo storage is used outside of the hash table itself.
Insertion systematically probes the table until anempty slot is found.
The hash function depends on both the key and
probe number:h : U {0, 1, , m1} {0, 1, , m1}.
The probe sequence h(k,0), h(k,1), , h(k,m1)should be a permutation of{0, 1, , m1}. The table may fill up, and deletion is difficult (but
not impossible).
8/14/2019 Cormen Algo-lec7
17/26
8/14/2019 Cormen Algo-lec7
18/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.18
Example of open addressing
Insert key k= 496:
0. Probe h(496,0)586133
204
481
T0
collision1. Probe h(496,1) 586
m1
8/14/2019 Cormen Algo-lec7
19/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.19
Example of open addressing
Insert key k= 496:
0. Probe h(496,0)586133
204
481
T0
1. Probe h(496,1)
496
2. Probe h(496,2)
insertion
m1
8/14/2019 Cormen Algo-lec7
20/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.20
Example of open addressing
Search for key k= 496:
0. Probe h(496,0)586133
204
481
T0
m1
1. Probe h(496,1)
496
2. Probe h(496,2)
Search uses the same probesequence, terminating suc-cessfully if it finds the key
and unsuccessfully if it encounters an empty slot.
8/14/2019 Cormen Algo-lec7
21/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.21
Probing strategies
Linear probing:
Given an ordinary hash function h(k), linearprobing uses the hash function
h(k,i) = (h(k) + i) mod m.This method, though simple, suffers fromprimaryclustering, where long runs of occupied slots build
up, increasing the average search time. Moreover,the long runs of occupied slots tend to get longer.
8/14/2019 Cormen Algo-lec7
22/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.22
Probing strategies
Double hashing
Given two ordinary hash functions h1(k) and h2(k),double hashing uses the hash function
h(k,i) = (h1(k) + i
h2(k)) mod m.This method generally produces excellent results,
but h2(k) must be relatively prime to m. One way
is to make m a power of2 and design h2(k) toproduce only odd numbers.
8/14/2019 Cormen Algo-lec7
23/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.23
Analysis of open addressing
We make the assumption ofuniform hashing:
Each key is equally likely to have any one ofthe m!permutations as its probe sequence.
Theorem. Given an open-addressed hashtable with load factor = n/m < 1, theexpected number of probes in an unsuccessfulsearch is at most 1/(1).
8/14/2019 Cormen Algo-lec7
24/26
October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.24
Proof of the theorem
Proof. At least one probe is always necessary.
With probability n/m, the first probe hits anoccupied slot, and a second probe is necessary.
With probability (n1)/(m1), the second probe
hits an occupied slot, and a third probe isnecessary.
With probability (n2)/(m2), the third probehits an occupied slot, etc.
=