Chapter 11. Hashing

Page 1: Chapter 11. Hashing

File Structure

Chapter 11. Hashing

Page 2: Chapter 11. Hashing

- 2 - File Structures - Chapter 11 -

Contents

Introduction

A Simple Hashing Algorithm

Hashing Functions and Record Distributions

How Much Extra Memory Should Be Used?

Collision Resolution by Progressive Overflow

Storing More Than One Record per Address: Buckets

Making Deletions

Other Collision Resolution Techniques

Patterns of Record Access

Page 3: Chapter 11. Hashing

1. Introduction

O-notation
  • O(1)
  • O(N) : sequential searching
  • O(log2 N)
  • O(logk N) : B-Tree (k : leaf node size)

What is Hashing?
  • a = h(K)
  • h (hash function), K (key), a (home address)

Example
  • K = BASS
  • h = (first char * second char) mod 1000
  • a = h(K) = (66 * 65) mod 1000 = 4,290 mod 1000 = 290
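
The BASS example can be checked with a few lines of Python. This is only a sketch of the toy function on the slide; the name `simple_hash` is hypothetical, and it assumes the key has at least two characters and that their ASCII codes are the numbers being multiplied.

```python
def simple_hash(key: str, modulus: int = 1000) -> int:
    """Toy hash from the slide: (first char code * second char code) mod 1000."""
    first, second = ord(key[0]), ord(key[1])
    return (first * second) % modulus


print(simple_hash("BASS"))  # (66 * 65) mod 1000 = 290
```

Because only the first two characters are used, any two keys that share (or swap) those characters land on the same home address.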

Page 4: Chapter 11. Hashing

Introduction

Collision
  • Example
      LOWELL  => a = (76 * 79) mod 1000 = 6,004 mod 1000 = 4
      OLIVIER => a = (79 * 76) mod 1000 = 6,004 mod 1000 = 4

Several ways to reduce the number of collisions
  1. Spread out the records: good hashing algorithms
  2. Use extra memory
  3. Put more than one record at a single address: buckets
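
As a rough illustration of how synonyms arise, the sketch below groups keys by the address produced by the toy function h = (first char * second char) mod 1000. The helper names `toy_hash` and `find_synonyms` are hypothetical and not part of the slides.

```python
from collections import defaultdict


def toy_hash(key: str) -> int:
    """(first char code * second char code) mod 1000, as in the slide example."""
    return (ord(key[0]) * ord(key[1])) % 1000


def find_synonyms(keys):
    """Return {address: [keys]} for every address shared by two or more keys."""
    by_address = defaultdict(list)
    for key in keys:
        by_address[toy_hash(key)].append(key)
    return {addr: ks for addr, ks in by_address.items() if len(ks) > 1}


print(find_synonyms(["LOWELL", "OLIVIER", "BASS"]))  # {4: ['LOWELL', 'OLIVIER']}
```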

Page 5: Chapter 11. Hashing

2. A Simple Hashing Algorithm

3 Steps
  1. Represent the key in numerical form
  2. Fold and add
  3. Divide by a prime number and use the remainder as the address

Example
Step 1. Represent the Key in Numerical Form

  LOWELL = 76 79 87 69 76 76 32 32 32 32 32 32
           L  O  W  E  L  L  (six blanks)

Page 6: Chapter 11. Hashing

A Simple Hashing Algorithm

Example (continued)
Step 2. Fold and Add

  76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32

  7679 + 8769 + 7676 + 3232 + 3232 = 30588
  (30588 + 3232 = 33820, which exceeds the 2-byte maximum value of 32767, so each intermediate sum is reduced mod 19937)

  7679 + 8769  = 16448  =>  16448 mod 19937 = 16448
  16448 + 7676 = 24124  =>  24124 mod 19937 = 4187
  4187 + 3232  = 7419   =>  7419 mod 19937 = 7419
  7419 + 3232  = 10651  =>  10651 mod 19937 = 10651
  10651 + 3232 = 13883  =>  13883 mod 19937 = 13883

Step 3. Divide by the Size of the Address Space
  a = s mod n (n : # of addresses in file)
  a = 13883 mod 100 = 83
  a = 13883 mod 101 = 46
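
The three steps can be sketched in Python as below. This is a minimal illustration, not a definitive implementation: the function name `fold_and_add_hash` and its parameters are hypothetical, the key is assumed to be padded with blanks to 12 characters, and pairing the codes as `code * 100 + code` only matches the slide when every character code has two digits (uppercase letters and blanks).

```python
def fold_and_add_hash(key: str, n_addresses: int,
                      width: int = 12, fold_modulus: int = 19937) -> int:
    """Fold-and-add hash following the three steps on the slide."""
    # Step 1: numeric form; pad with blanks (code 32) to a fixed width.
    padded = key.ljust(width)[:width]
    total = 0
    # Step 2: fold two characters at a time and add, keeping the running
    # sum small by reducing it mod 19937 after every addition.
    for i in range(0, width, 2):
        pair = ord(padded[i]) * 100 + ord(padded[i + 1])
        total = (total + pair) % fold_modulus
    # Step 3: divide by the size of the address space, keep the remainder.
    return total % n_addresses


print(fold_and_add_hash("LOWELL", 100))  # 83
print(fold_and_add_hash("LOWELL", 101))  # 46
```

Choosing a prime such as 101 for the address-space size, as Step 3 of the outline suggests, tends to spread remainders more evenly than a round number like 100.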

Page 7: Chapter 11. Hashing

3. Hashing Functions and Record Distributions

Distributing Records among Addresses

<Figure 11.3> Different distributions of seven records (A-G) among ten addresses (1-10). (a) Uniform distribution (Best) (b) Worst case (c) Random distribution (Acceptable)

Page 8: Chapter 11. Hashing

Hashing Functions and Record Distributions

Some Other Hashing Methods
  • Better than random
      - Examine keys for a pattern (e.g., resident registration numbers)
      - Divide the key by a prime number
  • Random
      - Square the key and take the middle: 453^2 = 205209
      - Radix transformation
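
A minimal sketch of the "square the key and take the middle" method follows. The slide does not say how many middle digits to keep; two digits (an address range of 0-99) is an assumption here, and `mid_square` is a hypothetical name.

```python
def mid_square(key: int, digits: int = 2) -> int:
    """Square the key and return its middle `digits` digits as the address."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


print(mid_square(453))  # 453^2 = 205209, middle two digits -> 52
```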

Page 9: Chapter 11. Hashing

4. How Much Extra Memory Should Be Used?

Packing Density

$$\text{packing density} = \frac{\#\text{ of records}}{\#\text{ of spaces}} = \frac{r}{N}$$

Example
  r = 75 records
  N = 100 addresses

$$\frac{r}{N} = \frac{75}{100} = 0.75 = 75\%$$

Page 10: Chapter 11. Hashing

How Much Extra Memory Should Be Used?

Predicting Collisions for Different Packing Densities

  Packing density (%)   Synonyms (%)
  10                    4.8
  40                    17.6
  70                    28.1
  90                    34.1
  100                   36.8

<Table 11.2> Effect of packing density on the proportion of records not stored at their home addresses
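
The slides do not show how the table values are derived. One common way to predict them, assumed here, is to treat the hash function as random so that the number of records per address follows a Poisson distribution with mean d = r/N. A fraction e^(-d) of the addresses is then expected to stay empty, N(1 - e^(-d)) records sit at their home addresses, and the rest overflow. The function name `synonym_percent` is hypothetical.

```python
import math


def synonym_percent(packing_density: float) -> float:
    """Expected % of records not stored at their home address (one record per address).

    Assumes a random (Poisson) distribution of records over addresses.
    """
    d = packing_density
    # overflow records / all records = (d - (1 - e^-d)) / d
    return 100.0 * (d - 1.0 + math.exp(-d)) / d


for density in (0.10, 0.40, 0.70, 0.90, 1.00):
    print(f"{density:.0%}: {synonym_percent(density):.1f}%")
```

Under that assumption the printed values agree with Table 11.2 to one decimal place.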

Page 11: Chapter 11. Hashing

5. Collision Resolution by Progressive Overflow

Progressive Overflow
  • Open addressing
  • Linear probing

  Address   Record
  0
  1
  2         Rosen     <- Novak's home address
  3         Jasper    <- York's home address
  4         York

  Key     h(K) address
  York    3
  Novak   2
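
Progressive overflow can be sketched as follows. The home addresses are passed in directly because the slide does not give the hash function, and wrapping around from the last address to address 0 is an assumption; `insert` and `search` are hypothetical names.

```python
from typing import List, Optional


def insert(table: List[Optional[str]], key: str, home: int) -> int:
    """Store key in the first free slot at or after its home address."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n          # wrap around at the end of the file
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("file is full")


def search(table: List[Optional[str]], key: str, home: int) -> Optional[int]:
    """Probe from the home address until the key or an empty slot is found."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:           # an empty slot ends the search
            return None
        if table[addr] == key:
            return addr
    return None


table = [None, None, "Rosen", "Jasper", None]
print(insert(table, "York", 3))    # address 3 is busy, so York goes to 4
print(insert(table, "Novak", 2))   # 2, 3 and 4 are busy, so Novak wraps to 0
```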

Page 12: Chapter 11. Hashing

Collision Resolution by Progressive Overflow

Search Length

  Key     Home address   # of accesses (search length)
  Adams   0              1
  Bates   1              1
  Cole    1              2
  Dean    2              2
  Evans   0              5

  Address   Record
  0         Adams
  1         Bates
  2         Cole
  3         Dean
  4         Evans
  5

Page 13: Chapter 11. Hashing

Collision Resolution by Progressive Overflow

Search Length (continued)

$$\text{average search length} = \frac{\text{total search length}}{\text{total number of records}}$$

Example

$$\text{average search length} = \frac{1 + 1 + 2 + 2 + 5}{5} = 2.2$$

<Figure 11.7> Average search length versus packing density in a hashed file
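
The search lengths in the example can be reproduced by loading the five keys with progressive overflow and counting the probes each one needs. This is a sketch under the assumption of a six-address file (0-5) with the home addresses from the search-length table; `search_lengths` is a hypothetical helper.

```python
def search_lengths(records, n_addresses):
    """Load (key, home) pairs by progressive overflow; return {key: search length}."""
    table = [None] * n_addresses
    lengths = {}
    for key, home in records:
        for step in range(n_addresses):
            addr = (home + step) % n_addresses
            if table[addr] is None:
                table[addr] = key
                lengths[key] = step + 1   # accesses needed to reach this slot
                break
        else:
            raise RuntimeError("file is full")
    return lengths


records = [("Adams", 0), ("Bates", 1), ("Cole", 1), ("Dean", 2), ("Evans", 0)]
lengths = search_lengths(records, 6)
average = sum(lengths.values()) / len(lengths)
print(lengths)   # Adams 1, Bates 1, Cole 2, Dean 2, Evans 5
print(average)   # 11 / 5 = 2.2
```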

Page 14: Chapter 11. Hashing

6. Storing More Than One Record per Address: Buckets

Buckets

  Key     Home address
  Green   0
  Hall    0
  Jenks   2
  King    3
  Land    3
  Marx    3
  Nutt    3

  Bucket address   Bucket contents
  0                Green, Hall
  1
  2                Jenks
  3                King, Land, Marx
  4                Nutt (overflow from bucket 3)
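
The bucket layout above can be reproduced with a short sketch. The bucket size of 3 is inferred from the figure (bucket 3 holds three keys before Nutt overflows), and overflow into the next bucket with wrap-around is an assumption; `load_buckets` is a hypothetical name.

```python
def load_buckets(records, n_buckets, bucket_size):
    """Place (key, home) pairs into buckets, overflowing to the next bucket when full."""
    buckets = [[] for _ in range(n_buckets)]
    for key, home in records:
        for step in range(n_buckets):
            addr = (home + step) % n_buckets
            if len(buckets[addr]) < bucket_size:
                buckets[addr].append(key)
                break
        else:
            raise RuntimeError("file is full")
    return buckets


records = [("Green", 0), ("Hall", 0), ("Jenks", 2), ("King", 3),
           ("Land", 3), ("Marx", 3), ("Nutt", 3)]
for address, bucket in enumerate(load_buckets(records, 5, 3)):
    print(address, bucket)
```

Only Nutt ends up away from its home bucket, so six of the seven records can be fetched with a single bucket access.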

Page 15: Chapter 11. Hashing

Storing More Than One Record per Address: Buckets

Effects of Buckets on Performance

$$\text{packing density} = \frac{r}{bN}$$

r : # of records, N : # of addresses, b : # of records in a bucket

                                  File without buckets   File with buckets
  # of records                    r = 750                r = 750
  # of addresses                  N = 1000               N = 500
  Bucket size                     b = 1                  b = 2
  Packing density                 0.75                   0.75
  Ratio of records to addresses   r/N = 0.75             r/N = 1.5

Page 16: Chapter 11. Hashing

Storing More Than One Record per Address: Buckets

<Table 11.4> Synonyms causing collisions as a percent of records for different packing densities and different bucket sizes

  Packing density   Bucket size
                    1       2       5       10
  20 %              9.4     2.2     0.1     0.0
  50 %              21.3    10.4    2.5     0.4
  80 %              31.2    20.4    10.3    5.3
  100 %             36.8    27.1    17.6    12.5
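
The slides do not give the formula behind Table 11.4. A sketch that comes close to it, assuming records are spread at random so that the number of records hashing to one address is Poisson with mean r/N = (packing density) x b, is shown below. The name `overflow_percent` is hypothetical.

```python
import math


def overflow_percent(packing_density: float, bucket_size: int) -> float:
    """Expected % of records that do not fit in their home bucket (Poisson model)."""
    lam = packing_density * bucket_size      # r / N: mean records per address

    def p(x: int) -> float:
        """Probability that exactly x records hash to a given address."""
        return lam ** x * math.exp(-lam) / math.factorial(x)

    # Expected records kept at home per address: x if x < b, otherwise b.
    p_below = [p(x) for x in range(bucket_size)]
    stored = sum(x * px for x, px in enumerate(p_below))
    stored += bucket_size * (1.0 - sum(p_below))
    return 100.0 * (lam - stored) / lam


for density in (0.2, 0.5, 0.8, 1.0):
    row = [f"{overflow_percent(density, b):5.1f}" for b in (1, 2, 5, 10)]
    print(f"{density:.0%}", *row)
```

Under this model the results stay within about 0.1 percentage point of the table; small differences in the last digit may come from rounding in the source.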

Page 17: Chapter 11. Hashing

7. Making Deletions

Initial state

  Key      Home address   Actual address
  Adams    0              0
  Jones    1              1
  Morris   1              2
  Smith    0              3

  Address   Record
  0         Adams
  1         Jones
  2         Morris
  3         Smith

Page 18: Chapter 11. Hashing

Making Deletions

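(1) Tombstones for Handling Deletions

  Before deletion          After deletion of Morris
  Address   Record         Address   Record
  0         Adams          0         Adams
  1         Jones          1         Jones
  2         Morris         2         ###
  3         Smith          3         Smith

If address 2 were simply left empty after the deletion, a search for Smith (home address 0) would stop at the empty slot: "Smith cannot be found."

### : tombstone. This mark indicates that a record once lived there but no longer does.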

Page 19: Chapter 11. Hashing

Making Deletions

(2) Implications of Tombstones for Insertions
  • Inserting "Smith"

(3) Effects of Deletions and Additions on Performance
  • Solution to problem of deteriorating average search length: reorganization
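
A minimal sketch of tombstone handling with progressive overflow is given below. It assumes a six-address file with wrap-around, uses "###" as the tombstone marker from the slide, and passes home addresses in directly; the key "Nash" in the usage lines is made up for illustration. Note how `insert` searches the whole probe sequence first, so that inserting "Smith" again does not drop a duplicate into the tombstone slot.

```python
TOMBSTONE = "###"


def search(table, key, home):
    """Probe past tombstones; only a truly empty slot ends the search."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            return None
        if table[addr] == key:
            return addr
    return None


def delete(table, key, home):
    """Replace the record with a tombstone so later searches keep probing."""
    addr = search(table, key, home)
    if addr is None:
        return False
    table[addr] = TOMBSTONE
    return True


def insert(table, key, home):
    """Reject duplicates first, then reuse the first empty or tombstone slot."""
    if search(table, key, home) is not None:
        raise ValueError(f"{key} is already in the file")
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None or table[addr] == TOMBSTONE:
            table[addr] = key
            return addr
    raise RuntimeError("file is full")


table = ["Adams", "Jones", "Morris", "Smith", None, None]
delete(table, "Morris", 1)
print(table)                      # slot 2 now holds the tombstone
print(search(table, "Smith", 0))  # 3: the search continues past the tombstone
print(insert(table, "Nash", 1))   # 2: the tombstone slot is reused
```

Tombstones that are never reused still lengthen searches, which is why the slide lists reorganization as the remedy.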

Page 20: Chapter 11. Hashing

8. Other Collision Resolution Techniques

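(1) Double Hashing
  • A second hashing function is applied to the key when a collision occurs
  • Its result is an increment c that is added to the address repeatedly until a free address is found
  • Drawback: seek time overhead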
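
Double hashing can be sketched as follows. The slides do not specify either hash function, so both are assumptions made for illustration: the key is reduced to a number by summing its character codes, the home address is that number mod N, and the increment is c = 1 + (number mod (N - 1)). With a prime N, every increment from 1 to N - 1 visits all addresses.

```python
def key_value(key: str) -> int:
    """Hypothetical numeric form of the key: the sum of its character codes."""
    return sum(ord(ch) for ch in key)


def probe_sequence(key: str, n: int):
    """Addresses tried for a key: home, home + c, home + 2c, ... (mod n)."""
    value = key_value(key)
    home = value % n               # first hash function
    c = 1 + value % (n - 1)        # second hash function: increment, never 0
    return [(home + i * c) % n for i in range(n)]


def insert(table, key):
    """Store key at the first free address in its probe sequence."""
    for addr in probe_sequence(key, len(table)):
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("file is full")


table = [None] * 11                # 11 is prime
print(insert(table, "LOWELL"))
print(insert(table, "OLIVIER"))
```

Because consecutive probes are c addresses apart rather than adjacent, overflow records are scattered across the file, which is the source of the extra seek time.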

Page 21: Chapter 11. Hashing

Other Collision Resolution Techniques

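(2) Chained Progressive Overflow

  Key     Home address   Actual address   Search length (1)   Search length (2)
  Adams   0              0                1                   1
  Bates   1              1                1                   1
  Cole    0              2                3                   2
  Dean    1              3                3                   2
  Evans   4              4                1                   1
  Flint   0              5                6                   3

  Search length (1): progressive overflow without chaining
  Search length (2): chained progressive overflow

  Without chaining         With chaining
  Address   Record         Address   Record   Address of next synonym
  0         Adams          0         Adams    2
  1         Bates          1         Bates    3
  2         Cole           2         Cole     5
  3         Dean           3         Dean     -1
  4         Evans          4         Evans    -1
  5         Flint          5         Flint    -1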
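
The chained search lengths can be checked with a short sketch that follows the next-synonym links from each key's home address. The file layout and links are taken from the figure; the function name is hypothetical, and the sketch assumes the record stored at each home address actually belongs to that address's chain.

```python
def chained_search_length(file, key, home):
    """Count accesses needed to find key by following synonym links from home."""
    addr = home
    accesses = 0
    while addr != -1:                 # -1 marks the end of a chain
        record_key, next_addr = file[addr]
        accesses += 1
        if record_key == key:
            return accesses
        addr = next_addr
    return None                       # key is not in the chain


# (record, address of next synonym) for addresses 0..5
file = [("Adams", 2), ("Bates", 3), ("Cole", 5),
        ("Dean", -1), ("Evans", -1), ("Flint", -1)]
homes = {"Adams": 0, "Bates": 1, "Cole": 0, "Dean": 1, "Evans": 4, "Flint": 0}

lengths = [chained_search_length(file, k, h) for k, h in homes.items()]
print(lengths)                        # [1, 1, 2, 2, 1, 3]
print(sum(lengths) / len(lengths))    # average with chaining
```

Without chaining the same six records need 1 + 1 + 3 + 3 + 1 + 6 = 15 accesses in total, against 10 with the links.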

Page 22: Chapter 11. Hashing

Other Collision Resolution Techniques

(3) Chaining with a Separate Overflow Area

  Primary data area
  Home address   Record   Link to overflow area
  0              Adams    0
  1              Bates    1
  2
  3
  4              Evans    -1

  Overflow area
  Address   Record   Link
  0         Cole     2
  1         Dean     -1
  2         Flint    -1
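
A minimal sketch of chaining with a separate overflow area follows, assuming that the first key hashed to an address stays in the primary area and every later synonym is appended to the overflow area and linked to the end of the chain. The function name and the list-based layout are hypothetical.

```python
def insert(primary, overflow, key, home):
    """Put key at its home address if free, else append it to the overflow chain."""
    if primary[home] is None:
        primary[home] = [key, -1]         # [record, link into overflow area]
        return
    node = primary[home]
    while node[1] != -1:                  # walk to the last synonym in the chain
        node = overflow[node[1]]
    overflow.append([key, -1])
    node[1] = len(overflow) - 1


primary = [None] * 5
overflow = []
for key, home in [("Adams", 0), ("Bates", 1), ("Cole", 0),
                  ("Dean", 1), ("Evans", 4), ("Flint", 0)]:
    insert(primary, overflow, key, home)

print(primary)    # Adams -> overflow 0, Bates -> overflow 1, Evans -> -1
print(overflow)   # Cole -> 2, Dean -> -1, Flint -> -1
```

Since overflow records never occupy primary addresses, a key's home address can only ever hold one of its own synonyms.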

Page 23: Chapter 11. Hashing

Other Collision Resolution Techniques

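(4) Scatter Tables: Indexing Revisited

[Figure: a hash index with addresses 0-4 pointing into a data file that holds the records Adams, Bates, Coles, Deans, Evans and Flint; each record carries a link to its next synonym, and -1 marks the end of a chain.]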
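
The scatter-table idea can be sketched as a small index of pointers plus an append-only data file. The home addresses reuse those of the chained progressive overflow example (Adams 0, Bates 1, Cole 0, Dean 1, Evans 4, Flint 0); the function names and the choice to link new records at the end of a chain are assumptions.

```python
def insert(index, data, key, home):
    """Append the record to the data file and link it into its home chain."""
    data.append([key, -1])                # [record, position of next synonym]
    new_pos = len(data) - 1
    if index[home] == -1:                 # first record for this address
        index[home] = new_pos
        return
    pos = index[home]
    while data[pos][1] != -1:             # find the end of the chain
        pos = data[pos][1]
    data[pos][1] = new_pos


def find(index, data, key, home):
    """Follow the chain that starts at index[home]; return the record position."""
    pos = index[home]
    while pos != -1:
        if data[pos][0] == key:
            return pos
        pos = data[pos][1]
    return None


index = [-1] * 5                          # the hashed index holds only pointers
data = []
for key, home in [("Adams", 0), ("Bates", 1), ("Cole", 0),
                  ("Dean", 1), ("Evans", 4), ("Flint", 0)]:
    insert(index, data, key, home)

print(index)                              # [0, 1, -1, -1, 4]
print(find(index, data, "Flint", 0))      # 5
```

Because the index contains no records, the data file can be organized in any way, for example as a simple entry-sequenced file.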

Page 24: Chapter 11. Hashing

Patterns of Record Access

A small percentage of the records in a file account for a large percentage of the accesses.

80 / 20 Rule: 80% of the accesses are performed on 20% of the records.