CpSc 3220 File and Database Processing

Hashing

Exercise – Build a B+-Tree

• Construct an order-4 B+-tree for the following set of key values: (2, 3, 5, 7, 11, 17, 9, 6, 29, and 4)
• Assume the tree is initially empty and values are added in ascending order.
• Now delete keys 2, 5, and 17

Objectives

• Survey Hashing Concepts
• Investigate Hashing Algorithms
• Study Collision Reduction
• Analyze Performance
• Investigate File Deterioration
• Look at Patterns of Access

Schematic View of Hash File

[Figure: schematic of a hash file in which Key is passed through hash to locate the space holding Record for Key]

Basic Hashing Concepts

• A hash file contains a fixed number of record spaces
• Each record space is of a fixed size
• A hash function determines the address of a record space for a given key
• A hash function may give same address for two different records
• A single address for different keys is called a collision.
• Different keys that give identical addresses are called synonyms.
• A hash function that gives no collisions is called a perfect hash function.

Objectives for a Hash File Package

• Keep collisions ‘low’
– Spread out (distribute) records over address space
– Use extra memory (increase address space)
– Put more than one record per address
• Handle collisions efficiently

Outline for a Simple Hashing Algorithm

1. Put Key in numerical form
2. Fold and Add to reduce numerical form to ‘integer’ size
3. Divide by the size of the address space and use remainder as RRN address (offset) of Key

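Simple Hash Function (when Key is an alphanumeric string)

#include <string>

// FILE_SIZE is the number of record spaces in the hash file, defined elsewhere.
int Hash(std::string key)
{
    int sum = 0;
    if (key.length() % 2 == 1)
        key += ' ';   // pad with a blank to make the length even
    // Fold the key two characters at a time, keeping the running sum in range.
    for (std::size_t j = 0; j < key.length(); j += 2)
        sum = (sum + 256 * (unsigned char)key[j] + (unsigned char)key[j + 1]) % FILE_SIZE;
    return sum;
}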

Hash Function Distribution

• Uniform (Perfect)
• Random
• Worse than random

We will look at random distributions

Predicting Record Distribution

If r records are distributed randomly into N spaces, the probability that a given address will have exactly x records assigned to it is

$$p(x) = \frac{r!}{(r-x)!\,x!} \left(1 - \frac{1}{N}\right)^{r-x} \left(\frac{1}{N}\right)^{x}$$

p(0) – probability that an address is not used
p(1) – probability that an address holds exactly one record (no collision occurs)
p(2) – probability that 1 collision occurs
etc.

Difficult to compute for large values of r and N.
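One way around the large factorials is to work with logarithms. The sketch below evaluates the formula above through std::lgamma; the function name binomialProbability is made up for this illustration, and it assumes N >= 1 and r >= 0.

```cpp
#include <cassert>
#include <cmath>

// Probability that a given address receives exactly x of r records
// when the records are distributed randomly over N addresses.
// Evaluated in log space so that r! never has to be formed directly.
double binomialProbability(int r, int N, int x) {
    assert(N >= 1 && r >= 0);
    if (x < 0 || x > r) return 0.0;
    if (N == 1) return x == r ? 1.0 : 0.0;  // avoids log(0) below
    // log of r! / ((r-x)! x!)
    double logChoose = std::lgamma(r + 1.0) - std::lgamma(x + 1.0)
                     - std::lgamma(r - x + 1.0);
    double logP = logChoose
                + x * std::log(1.0 / N)
                + (r - x) * std::log(1.0 - 1.0 / N);
    return std::exp(logP);
}
```

Summing binomialProbability(r, N, x) over x = 0..r should give 1, which is a quick sanity check on the formula.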

Poisson’s Function

For large values of r and N, p(x) can be approximated by this function

$$p(x) = \frac{(r/N)^{x}\, e^{-(r/N)}}{x!}$$

The value r/N is the ratio of the number of records to the number of address spaces. If only one record is placed in each space it is a measure of the percent of storage space that will be used (the packing density).
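The Poisson function is easy to evaluate directly. The sketch below is one possible way to use it, assuming one record per address; the function names are made up for this illustration. It relies on the fact that every non-empty address keeps exactly one record at home, so the expected number of overflow records is r minus the expected number of occupied addresses.

```cpp
#include <cassert>
#include <cmath>

// Poisson estimate of the probability that an address is assigned
// exactly x records when r records are hashed into N addresses.
double poissonProbability(double r, double N, int x) {
    assert(N > 0 && x >= 0);
    double ratio = r / N;
    return std::pow(ratio, x) * std::exp(-ratio) / std::tgamma(x + 1.0);
}

// Expected number of addresses that receive no record at all.
double expectedEmptyAddresses(double r, double N) {
    return N * poissonProbability(r, N, 0);
}

// Expected number of records that cannot be stored at their home
// address, assuming each address holds a single record.
double expectedOverflowRecords(double r, double N) {
    double occupied = N * (1.0 - poissonProbability(r, N, 0));
    return r - occupied;
}
```

For r = N the estimate is N/e overflow records, i.e. a little over a third of the file.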

From Page 484 of File Structures by Folk, Zoellick, and Riccardi

Collision Resolution Using Progressive Overflow (Linear Probing)

[Figure: hash file with spaces holding Record for Key0, Record for Key1, and Record for Key2; Key3 is passed through hash]

H_i = (hash(key) + i) mod TableSize, for i = 0, 1, 2, ...

ASL (average search length) = (total number of probes) / (number of records)
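A minimal in-memory sketch of progressive overflow and the ASL calculation follows. It is an illustration only: the ProbingTable type is hypothetical, keys are non-negative integers, hash(key) is taken to be key mod TableSize, and -1 marks an unused space.

```cpp
#include <cassert>
#include <vector>

constexpr int EMPTY = -1;  // marks a record space that has never been used

struct ProbingTable {
    std::vector<int> slots;

    explicit ProbingTable(std::size_t size) : slots(size, EMPTY) {
        assert(size > 0);
    }

    // i-th address tried for key: (hash(key) + i) mod TableSize
    std::size_t address(int key, std::size_t i) const {
        return (static_cast<std::size_t>(key) + i) % slots.size();
    }

    // Store key in the first free space at or after its home address.
    // Returns false when every space is already taken.
    bool insert(int key) {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            std::size_t a = address(key, i);
            if (slots[a] == EMPTY) { slots[a] = key; return true; }
        }
        return false;
    }

    // Number of probes needed to reach key, or 0 if key is not stored.
    std::size_t probesToFind(int key) const {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            std::size_t a = address(key, i);
            if (slots[a] == EMPTY) return 0;   // an unused space ends the search
            if (slots[a] == key) return i + 1;
        }
        return 0;
    }
};

// ASL = (total number of probes) / (number of records)
double averageSearchLength(const ProbingTable& table, const std::vector<int>& keys) {
    assert(!keys.empty());
    std::size_t total = 0;
    for (int k : keys) total += table.probesToFind(k);
    return static_cast<double>(total) / keys.size();
}
```

With a table of 7 spaces, keys 10, 17, and 24 are synonyms (home address 3), so they take 1, 2, and 3 probes and the ASL is 2.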

Address Spaces Can Hold More Than One Record

Packing Density = r/(bN), where b is the number of records each address can hold

Address Density = r/N
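The Poisson estimate extends to addresses that hold b records each. The sketch below assumes the Poisson approximation applies and uses made-up function names; an address assigned x records stores min(x, b) of them and overflows the rest.

```cpp
#include <cassert>
#include <cmath>

double packingDensity(double r, double N, int b) {
    assert(N > 0 && b >= 1);
    return r / (b * N);
}

double addressDensity(double r, double N) {
    assert(N > 0);
    return r / N;
}

// Expected number of records that do not fit at their home address
// when each of N addresses can hold b records.
double expectedOverflowWithBuckets(double r, double N, int b) {
    assert(N > 0 && b >= 1);
    double ratio = r / N;
    double p = std::exp(-ratio);     // Poisson p(0)
    double cumulative = p;           // p(0) + ... + p(x)
    double storedPerAddress = 0.0;   // expected records kept at one address
    for (int x = 1; x <= b; ++x) {
        p *= ratio / x;              // p(x) obtained from p(x-1)
        cumulative += p;
        storedPerAddress += x * p;
    }
    // Addresses assigned more than b records keep exactly b of them.
    storedPerAddress += b * (1.0 - cumulative);
    return N * (ratio - storedPerAddress);
}
```

For example, r = 750, N = 500, and b = 2 give a packing density of 0.75 and an address density of 1.5, and the estimate comes to about 140 overflow records.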

Implementation Issues

• Loading a Hash File
• Deletions
– Tombstones
– Performance Effects
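The reason for tombstones can be seen in a small sketch of deletion under linear probing. This is an illustration with assumptions: the TombstoneTable type is hypothetical, keys are distinct non-negative integers, and the home address is key mod table size.

```cpp
#include <cassert>
#include <vector>

constexpr int EMPTY = -1;      // never used: a search may stop here
constexpr int TOMBSTONE = -2;  // record deleted: a search must keep probing

struct TombstoneTable {
    std::vector<int> slots;

    explicit TombstoneTable(std::size_t size) : slots(size, EMPTY) {
        assert(size > 0);
    }

    std::size_t address(int key, std::size_t i) const {
        return (static_cast<std::size_t>(key) + i) % slots.size();
    }

    // Assumes key is not already stored. Tombstoned spaces are reused.
    bool insert(int key) {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            int& s = slots[address(key, i)];
            if (s == EMPTY || s == TOMBSTONE) { s = key; return true; }
        }
        return false;
    }

    // Returns the address holding key, or -1 if key is not stored.
    int find(int key) const {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            std::size_t a = address(key, i);
            if (slots[a] == EMPTY) return -1;  // only a never-used space ends the search
            if (slots[a] == key) return static_cast<int>(a);
        }
        return -1;
    }

    // Mark the space as deleted instead of emptying it, so records
    // stored past it by progressive overflow stay reachable.
    bool remove(int key) {
        int a = find(key);
        if (a < 0) return false;
        slots[a] = TOMBSTONE;
        return true;
    }
};
```

The performance effect is visible in find: a tombstone still costs a probe, so a file with many deletions searches like a fuller file until the tombstones are reused or the file is reloaded.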

Other Collision Resolution Techniques

• Quadratic Hashing
– H(i) = (hash(key) + i^2) mod TS
• Double Hashing
– H(i) = (hash(key) + f(i)) mod TS where f(i) = i*hash2(key)
– Note that hash2(key) must never be zero
• Separate Overflow Area
• Chained Overflow with Separate Overflow Area
• Scatter Tables
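The two probe formulas above can be sketched as small functions. The names are made up for this illustration, and hash2 is only one possible choice of second hash: it always returns a value between 1 and TS - 1, so it is never zero.

```cpp
#include <cassert>
#include <cstddef>

// i-th address tried under quadratic hashing: (home + i^2) mod TS
std::size_t quadraticProbe(std::size_t home, std::size_t i, std::size_t tableSize) {
    assert(tableSize > 0);
    return (home + i * i) % tableSize;
}

// A hypothetical second hash function for double hashing.
// The result lies in 1..(tableSize - 1), so it is never zero.
std::size_t hash2(std::size_t key, std::size_t tableSize) {
    assert(tableSize >= 2);
    return 1 + key % (tableSize - 1);
}

// i-th address tried under double hashing: (home + i * step) mod TS,
// where step is hash2(key).
std::size_t doubleHashProbe(std::size_t home, std::size_t step,
                            std::size_t i, std::size_t tableSize) {
    assert(tableSize > 0 && step != 0);
    return (home + i * step) % tableSize;
}
```

If hash2(key) were zero, every probe would return to the home address and the search would never move on, which is why the slide rules it out. A prime TS is a common choice because every step size then visits all addresses.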

Patterns of Record Access

• 20 percent of records account for 80 percent of activity

• Most active records must be in home address or performance deteriorates

Summary

• Hashing provides O(1) direct access performance.
• If the hash function gives collisions, ASL may increase.
• Collisions can be reduced by:
– Spreading out records (choosing a better hash function)
– Using extra memory
– Using buckets
• Poisson Distribution allows us to analyze hash file performance
• Better overflow handling can reduce ASL
• Record Deletion requires special handling
• Consider record access patterns
• Hashing does not provide efficient sequential access
• Hashing requires that we fix file size in advance
