CSC 213 – Large Scale Programming Lecture 37: External Caching & (a,b)-Trees


Today’s Goal

Look at advanced tree structures
  Part of most databases, operating systems
  Anywhere there is a lot of data to be held
Already examined related (2,4) trees
  Now look at more general definition
  Also examine why we should care

Lies My Professor Told Me

Big-Oh notation not always accurate
  For example, treats all memory accesses equally
But many different memories inside machine
  Organized in a pyramid
  Higher == faster
  Lower == cheaper (cheaper also means more memory available)

[Figure: memory pyramid, top to bottom: register, L1 cache, L2 cache, main memory (RAM), hard drive]

Hierarchy In Perspective

Suppose the processor needs a beverage
  Registers -- Drink from the mug in its hand
  L1 Cache -- Get from a case in the fridge
  L2 Cache -- Get from tapped barrel in the cellar
  Main memory -- Purchase at corner Wilson Farms
  Hard drive -- Drive to closest brewery & buy vat
  Network -- Go to Germany & buy Bavaria


Waiting Is a Pain

Not All Accesses Are Equal

Want to limit accesses to the lower, slower levels
  Easy when we are only using a few objects
  Difficult when working with non-trivial data sets
Two common approaches to avoid the wait
  Caching -- hold data from hard drive in RAM
    Usually stores most recently or frequently used data
  Locality -- organize data to limit amount used
    Match data layout to internal storage to improve cache effectiveness
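The caching idea can be sketched in a few lines. This is only an illustration, assuming a least-recently-used eviction policy; the class name LRUCache and the load_from_slow_storage callback are hypothetical names, standing in for whatever slow level (such as the hard drive) sits below the cache.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps the most recently used items; evicts the least recently used."""

    def __init__(self, capacity, load_from_slow_storage):
        self.capacity = capacity
        self.load = load_from_slow_storage   # hypothetical slow lookup, e.g. a disk read
        self.items = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            return self.items[key]
        self.misses += 1                     # had to go to the slow level
        value = self.load(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used item
        return value


cache = LRUCache(capacity=2, load_from_slow_storage=lambda k: k * k)
results = [cache.get(k) for k in (3, 3, 4, 3, 5, 4)]
```

Repeated requests for 3 are served from the cache, while 4 is evicted to make room for 5 and has to be loaded again. A cache only pays off when accesses repeat like this.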

Virtual Memory

“Extends” RAM by using space on hard drive
  Big win if we rarely access the material on disk
  Incredibly slow if always stuck driving to brewery
Works by dividing memory into pages
  Each page is a constant size (usually 4096 bytes)
  Operating system handles memory at page level
    Limits overhead and maximizes efficiency
  Evicts unused pages to the hard drive for storage
  Reloads a page when it is next accessed
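To make the page idea concrete, here is a small sketch of the arithmetic, assuming 4096-byte pages and plain integer byte addresses. The helper names page_of and pages_touched and the two address patterns are made up for illustration.

```python
PAGE_SIZE = 4096  # bytes, the usual size named above


def page_of(address):
    """Return (page number, offset within the page) for a byte address."""
    return divmod(address, PAGE_SIZE)


def pages_touched(addresses):
    """Count the distinct pages that a sequence of accesses lands on."""
    return len({address // PAGE_SIZE for address in addresses})


nearby = [8192 + 16 * i for i in range(100)]      # 100 small objects packed together
scattered = [4096 * 37 * i for i in range(100)]   # 100 objects spread far apart
```

Here pages_touched(nearby) is 1 while pages_touched(scattered) is 100. Same number of accesses, but the scattered layout may need up to 100 pages reloaded from disk.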

Problems with Binary Trees

Good way to organize information
  Provides consistent O(log n) processing times (when balanced)
Organization is very bad for locality, however
  Nodes contain only 1 piece of data
  Must then jump to one of its two children
  Nodes can get randomly spread over heap
  Good torture test for roommate’s computer
(2,4) trees provide some improvement
  Still have at most 3 elements & 4 children
  Does not use anything like 4096 bytes in a page
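A rough count shows why node width matters. The sketch below assumes the worst case where every node visited sits on a different page, and it only counts the levels needed to tell n items apart. The fan-out of 256 is an assumed figure for illustration, not a number from the lecture.

```python
def levels_needed(n, children_per_node):
    """Smallest height h such that children_per_node ** h >= n."""
    height, reach = 0, 1
    while reach < n:
        reach *= children_per_node
        height += 1
    return height


n = 1_000_000
binary_levels = levels_needed(n, 2)    # binary tree: one small node per level
wide_levels = levels_needed(n, 256)    # 256 children per node is an assumption
```

For a million items the binary tree needs 20 levels and the wide tree needs 3. If each level costs a page load, that is 20 trips to the brewery versus 3.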

(a, b) Trees to the Rescue!

Real-world solution to avoid killing disks by paging
  Used by Linux & MacOS to track files & directories
  Organization used by MySQL & other databases
  Found in many other places where paging occurs
(2,4) trees are one example of these
  Can also create others, just follow the rules
    All leaves are found at same level of the tree
    All internal nodes but root have at least a children
    All internal nodes have at most b children
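The three rules can be checked mechanically. The sketch below is one possible way to do it, not the course's implementation. The Node class is hypothetical, and it assumes that a node holding k keys counts as having k + 1 children, even at the bottom level where those children are empty placeholders.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list marks a bottom-level node


def leaf_depths(node, depth=0):
    """Collect the set of depths at which bottom-level nodes occur."""
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def follows_rules(root, a, b):
    """Check the three (a,b) tree rules listed above."""
    if len(leaf_depths(root)) != 1:           # rule 1: all leaves on one level
        return False
    stack = [(root, True)]
    while stack:
        node, is_root = stack.pop()
        child_count = len(node.keys) + 1      # k keys separate k + 1 children
        if node.children and len(node.children) != child_count:
            return False
        if child_count > b:                   # rule 3: at most b children
            return False
        if not is_root and child_count < a:   # rule 2: at least a children
            return False
        stack.extend((child, False) for child in node.children)
    return True


good = Node([15, 24], [Node([12]), Node([18]), Node([27, 32, 35])])
overfull = Node([15, 24], [Node([12]), Node([18]), Node([27, 30, 32, 35])])
```

With a == 2 and b == 4, follows_rules accepts good and rejects overfull, whose last node would need 5 children.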

Improving Locality

For (2,4) trees, a == 2 and b == 4
  Process of splitting and merging nodes still holds
  We only vary the number of children in Node
Minimize paging using good size for a & b
  Store all of a Node’s elements in an additional dictionary
  Make sure full Node, including dictionary and child references, fills a page
  Limit number of nearly empty pages by selecting reasonable value for a
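One way to pick the sizes is to work backwards from the page size. The byte sizes below are assumptions chosen for illustration; real sizes depend on the language and data. The bound a <= (b + 1) / 2 is the usual textbook condition on (a,b) trees and is not stated on the slide. It ensures that splitting an overflowing node leaves two nodes with at least a children each.

```python
PAGE_SIZE = 4096      # bytes per page, as above
ENTRY_SIZE = 16       # assumed bytes per stored entry
REFERENCE_SIZE = 8    # assumed bytes per child reference


def node_bytes(b):
    """Bytes used by a full node: b child references and b - 1 entries."""
    return b * REFERENCE_SIZE + (b - 1) * ENTRY_SIZE


def largest_b(page_size=PAGE_SIZE):
    """Largest b for which a full node still fits in one page."""
    return (page_size + ENTRY_SIZE) // (REFERENCE_SIZE + ENTRY_SIZE)


def largest_a(b):
    """Largest a allowed by the condition a <= (b + 1) / 2."""
    return (b + 1) // 2


b = largest_b()
a = largest_a(b)
```

Under these assumed sizes b comes out as 171 and a as 86, so a full node uses 4088 of the 4096 bytes and even the emptiest legal node fills roughly half a page.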

Insertion

Always insert data into a leaf node
Once inserted, check for overflow!
  Trying to make node larger than allowed
Example: insert(30)

[Figure: tree with root 15 24 and leaves 12, 18, and 27 32 35; after insert(30) the last leaf holds 27 30 32 35. The labels 1 2 3 4 5 also appear in the figure.]

Split In Case Of Overflow

Split overflowing Node into 2 new Nodes
  Promote median element to the parent Node
  Divide remaining elements into the two new Nodes
This may cause parent Node to overflow
  So must repeat the process until we hit the root
  If the root node overflows, we create a new root!

[Figure: tree diagram with node labels 15 24 32; 12, 18, 35; 27 28 30; 27 28 29 30]
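The split step on its own is small. The sketch below works on a plain sorted list of keys; the function name split is hypothetical. With an even number of keys either middle key could serve as the median. This version takes the upper one, which appears to match the insert(30) example where 32 moves up to join 15 24.

```python
def split(keys):
    """Split an overflowing node's sorted keys into (left, promoted, right)."""
    middle = len(keys) // 2           # with 4 keys this picks the third one
    return keys[:middle], keys[middle], keys[middle + 1:]


left, promoted, right = split([27, 30, 32, 35])
```

Here left is [27, 30], promoted is 32, and right is [35]. The promoted key goes into the parent between the references to the two new nodes.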

Parent Overflow

Example: insert(29)

[Figure: root 15 24 29 32, now overflowing, with leaves 12, 18, 27 28, 30, and 35]

Parent Overflow

Example: insert(29)

[Figure: after splitting the root, new root 29 with children 15 24 and 32; leaves 12, 18, 27 28, 30, and 35]
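The whole insertion, including overflow that climbs to the root, might look like the sketch below. It is a minimal in-memory version under several assumptions: the Node class and function names are hypothetical, keys are assumed distinct, nothing is written to disk, and the upper-middle key is promoted on a split.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list marks a leaf


def insert(root, key, b=4):
    """Insert key and return the (possibly new) root of the tree."""
    overflow = _insert(root, key, b)
    if overflow is None:
        return root
    promoted, right = overflow
    return Node([promoted], [root, right])   # root overflowed: create a new root


def _insert(node, key, b):
    """Insert below node; return (promoted key, new right node) after a split."""
    index = bisect.bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(index, key)          # always insert into a leaf
    else:
        overflow = _insert(node.children[index], key, b)
        if overflow is None:
            return None
        promoted, right = overflow
        node.keys.insert(index, promoted)     # median moves up into this node
        node.children.insert(index + 1, right)
    if len(node.keys) <= b - 1:               # b children means at most b - 1 keys
        return None
    middle = len(node.keys) // 2
    promoted = node.keys[middle]
    right = Node(node.keys[middle + 1:], node.children[middle + 1:])
    node.keys = node.keys[:middle]
    node.children = node.children[:middle + 1]
    return promoted, right


root = Node([15, 24, 32],
            [Node([12]), Node([18]), Node([27, 28, 30]), Node([35])])
root = insert(root, 29)
```

Starting from the tree in the insert(29) example, the leaf splits and promotes 29, the old root overflows in turn, and the result is a new root holding 29 with children 15 24 and 32.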

Underflow and Fusion

Deleting Entry may cause underflow
  Two possible solutions depending on situation
Example: remove(15)

[Figure: tree with root 9 14 and leaves 2 5 7, 10, and 15]

Case 1: Transfer

Has adjacent sibling with elements to spare
  Steal closest Entry from parent & sibling’s child
  Parent takes sibling’s closest Entry
  We’re done
Example: remove(10)

[Figure: before, root 4 9 with leaves 2, 6 8, and 10; after the transfer, root 4 8 with leaves 2, 6, and 9]
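A transfer can be sketched as a rotation through the parent. The code below is an illustration under assumptions: the Node class and the transfer function are hypothetical names, and the underflowing child is identified by its position in the parent's child list.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list marks a leaf


def transfer(parent, index, a=2):
    """Refill the underflowing child at `index` from an adjacent sibling.

    Returns True if a sibling had a key to spare, False otherwise.
    """
    node = parent.children[index]
    minimum = a - 1                                   # fewest keys a node may hold
    if index > 0 and len(parent.children[index - 1].keys) > minimum:
        sibling = parent.children[index - 1]          # borrow from the left
        node.keys.insert(0, parent.keys[index - 1])   # parent's entry moves down
        parent.keys[index - 1] = sibling.keys.pop()   # sibling's closest moves up
        if sibling.children:
            node.children.insert(0, sibling.children.pop())
        return True
    if (index + 1 < len(parent.children)
            and len(parent.children[index + 1].keys) > minimum):
        sibling = parent.children[index + 1]          # borrow from the right
        node.keys.append(parent.keys[index])
        parent.keys[index] = sibling.keys.pop(0)
        if sibling.children:
            node.children.append(sibling.children.pop(0))
        return True
    return False


# State right after remove(10) emptied the last leaf.
parent = Node([4, 9], [Node([2]), Node([6, 8]), Node([])])
done = transfer(parent, 2)
```

After the call the parent holds 4 8 and the leaves hold 2, 6, and 9: the 9 came down from the parent and the 8 went up from the sibling. When no sibling has a spare key the function returns False, which is the signal to use fusion instead.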

Case 2: Fusion

Emptied node has siblings of minimum size
  Merge node & sibling into one
  Steal Entry from parent that was between siblings
  May propagate underflow to parent!
Example: remove(15)

[Figure: before, root 9 14 with leaves 2 5 7, 10, and 15; after the fusion, root 9 with leaves 2 5 7 and 10 14]
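Fusion can be sketched the same way. Again the Node class and the fuse function are hypothetical names, and this version only handles one level: it reports whether the parent is now too small so the caller can repeat the repair one level up.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list marks a leaf


def fuse(parent, index, a=2):
    """Merge the underflowing child at `index` with an adjacent sibling.

    Returns True if the parent now holds fewer than a - 1 keys.
    """
    left = index - 1 if index > 0 else index    # left node of the pair to merge
    left_node = parent.children[left]
    right_node = parent.children[left + 1]
    separator = parent.keys.pop(left)           # entry that sat between the siblings
    left_node.keys += [separator] + right_node.keys
    left_node.children += right_node.children
    del parent.children[left + 1]
    return len(parent.keys) < a - 1


# State right after remove(15) emptied the last leaf.
parent = Node([9, 14], [Node([2, 5, 7]), Node([10]), Node([])])
parent_underflows = fuse(parent, 2)
```

After the call the parent holds 9 and the leaves hold 2 5 7 and 10 14, and parent_underflows is False. If the parent had held only one key, it would be left empty and the underflow would move up a level.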

For Next Lecture

Look at most popular version of (a,b)-Tree
  How a BTree is implemented
  Ways of reading and writing these trees to disk