39
I/O-efficient Algorithms and Data Structures Lars Arge Professor and center director Aarhus University ALMADA summer school August 1, 2013

I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Embed Size (px)

DESCRIPTION

In many modern applications that deal with massive data sets, communication between internal and external memory, and not actual computation time, is the bottleneck in the computation. This is due to the huge difference in access time of fast internal memory and slower external memory such as disks. In order to amortize this time over a large amount of data, disks typically read or write large blocks of contiguous data at once. This means that it is important to design algorithms with a high degree of locality in their disk access pattern, that is, algorithms where data accessed close in time is also stored close on disk. Such algorithms take advantage of block transfers by amortizing the large access time over a large number of accesses. In the area of I/O-efficient algorithms the main goal is to develop algorithms that minimize the number of block transfers (I/Os) used to solve a given problem. In these four lectures, we will cover fundamental I/O-efficient algorithms and data structures, such as sorting algorithms, search trees and priority queues. We will also discuss geometric problems, as well as geometric data structures for various range searching problems.

Citation preview

Page 1: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

I/O-efficient Algorithms and Data Structures

Lars Arge

Professor and center director

Aarhus University

ALMADA summer schoolAugust 1, 2013

Page 2: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

• Pervasive use of computers and sensors• Increased ability to acquire/store/process data

→ Massive data collected everywhere• Society increasingly “data driven”

→ Access/process data anywhere any time

Nature/Science special issues• 2/06, 9/08, 2/11• Scientific data size growing exponentially,

while quality and availability improving• Paradigm shift: Science will be about mining data

Massive Data

Obviously not only in sciences:• Economist 02/10:

• From 150 Billion Gigabytes five years ago

to 1200 Billion today• Managing data deluge difficult; doing so

will transform business/public life

I/O-efficient Algorithms and Data Structures

Lars Arge 2

Page 3: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 3

Random Access Machine Model

• Standard theoretical model of computation:– Infinite memory– Uniform access cost

• Simple model crucial for success of computer industry

R

A

M

I/O-efficient Algorithms and Data Structures

Page 4: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 4

Hierarchical Memory

• Modern machines have complicated memory hierarchy– Levels get larger and slower further away from CPU– Data moved between levels using large blocks

L

1

L

2

R

A

M

I/O-efficient Algorithms and Data Structures

Page 5: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 5

Slow I/O

– Disk systems try to amortize large access time transferring large contiguous blocks of data

• Important to store/access data to take advantage of blocks (locality)

• Disk access is 106 times slower than main memory access

track

magnetic surface

read/write armread/write head

“The difference in speed between modern CPU and

disk technologies is analogous to the difference

in speed in sharpening a pencil using a sharpener on

one’s desk or by taking an airplane to the other side of

the world and using a sharpener on someone else’s

desk.” (D. Comer)

I/O-efficient Algorithms and Data Structures

Page 6: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 6

Scalability Problems• Most programs developed in RAM-model

– Run on large datasets because

OS moves blocks as needed

• Moderns OS utilizes sophisticated paging and prefetching strategies– But if program makes scattered accesses even good OS cannot

take advantage of block access

Scalability problems!

data size

runn

ing

tim

e

I/O-efficient Algorithms and Data Structures

Page 7: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 7

N = # of items in the problem instance

B = # of items per disk block

M = # of items that fit in main memory

T = # of items in output

I/O: Move block between memory and disk

We assume (for convenience) that M >B2

D

P

M

Block I/O

External Memory Model

I/O-efficient Algorithms and Data Structures

Page 8: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 8

Fundamental Bounds Internal External

• Scanning: N• Sorting: N log N• Permuting • Searching:

• Note:– Linear I/O: O(N/B)– Permuting not linear– Permuting and sorting bounds are equal in all practical cases– B factor VERY important: – Cannot sort optimally with search tree

NBlog

BN

BN

BMlog

BN

NBN

BN

BN

BM log

}log,min{BN

BN

BMNN

N2log

I/O-efficient Algorithms and Data Structures

Page 9: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Scalability Problems: Block Access Matters• Example: Traversing linked list (List ranking)

– Array size N = 10 elements– Disk block size B = 2 elements– Main memory size M = 4 elements (2 blocks)

• Large difference between N and N/B large since block size is large– Example: N = 256 x 106, B = 8000 , 1ms disk access time

N I/Os take 256 x 103 sec = 4266 min = 71 hr

N/B I/Os take 256/8 sec = 32 sec

Algorithm 2: N/B=5 I/OsAlgorithm 1: N=10 I/Os

1 5 2 6 73 4 108 9 1 2 10 9 85 4 76 3

9Lars Arge

I/O-efficient Algorithms and Data Structures

Page 10: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Outline• Today:

– Sorting upper bound (merge- and distribution-sort)– Sorting (permuting) lower bound– Cache-oblivious sorting (if time permits)– Batched geometric problems: Distribution sweeping

• Monday:– Searching (B-trees)– Batched searching (Buffer-trees)– I/O-efficient priority queues– I/O-efficient list ranking

Lars Arge 10

I/O-efficient Algorithms and Data Structures

Page 11: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 11

Queues and Stacks• Queue:

– Maintain push and pop blocks in main memory

O(1/B) Push/Pop operations amortized

• Stack:– Maintain push/pop blocks in main memory

O(1/B) Push/Pop operations amortized

Push Pop

I/O-efficient Algorithms and Data Structures

Page 12: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 12

Sorting• <M/B sorted lists (queues) can be merged in O(N/B) I/Os

M/B blocks in main memory

• Unsorted list (queue) can be distributed using <M/B split elements in O(N/B) I/Os

I/O-efficient Algorithms and Data Structures

Page 13: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 13

Sorting• Merge sort:

– Create N/M memory sized sorted lists– Repeatedly merge lists together Θ(M/B) at a time

phases using I/Os each I/Os)( BNO)(log

MN

BMO )log(

BN

BN

BMO

)(MN

)/(BM

MN

))/(( 2BM

MN

1

I/O-efficient Algorithms and Data Structures

Page 14: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 14

Sorting• Distribution sort (multiway quicksort):

– Compute Θ(M/B) split elements– Distribute unsorted list into Θ(M/B) unsorted lists of equal size– Recursively split lists until fit in memory

I/Os if split elements in O(N/B) I/Os

Can compute split elements in O(N/B) I/Os

phases I/Os

)log(BN

BN

BMO

BM

)(log)(logMN

MN

BM

BM OO )log(

BN

BN

BMO

I/O-efficient Algorithms and Data Structures

Page 15: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 15

Permuting Lower BoundPermuting N elements according to a given permutation takes

I/Os in “indivisibility” model

• Indivisibility model: Move of elements only allowed operation• Note:

– We can allow copies (and destruction of elements)– Bound also a lower bound on sorting

• Proof:– View memory and disk as array of N tracks of B elements– Assume all I/Os track aligned (assumption can be removed)

})log,(min{BN

BN

BMN

I/O-efficient Algorithms and Data Structures

M D

Page 16: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 16

Permuting Lower Bound– Array contains permutation of N elements at all times– We will count how many permutations can be

reached (produced) with t I/Os– Input:

* Choose track: N possibilities* Rearrange ≤ B element in track and place among ≤ M-B

elements in memory:–

possibilities if “fresh” track

– otherwise

at most permutations after t inputs– Output: Choose track: N possibilities

BN

B

M BN t )!())((

)(!B

MB )(

B

M

I/O-efficient Algorithms and Data Structures

M D

Page 17: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 17

Permuting Lower Bound– Permutation algorithm needs to be able to produce N! permutations

(using Stirlings formula and )– If we have– If we have and thus

!)!())(( NBN BN

B

M t

)!log())log((log)!log( NNtBB

M

BN

NNBNtBN BM log)log(loglog

BM

BN

BN

Nt

loglog

log

xxx log!log BMB

B

M log)log(

BMBN loglog B

NBMB

NB

N

BMBN

t /log2

loglog

BMBN loglog NB

NNNNNt NB

N

NBN

21

21

loglog

21

log2

log

})log,(min{ BN

BN

BMNt

I/O-efficient Algorithms and Data Structures

Page 18: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 18

Sorting lower bound• Similar argument but assuming comparison model in internal memory

– Initially N! possible orderings– Count how may possible after t I/Os

Sorting N elements takes I/Os in comparison model

!)!())(( NBN BN

B

M t

)!log())log((log)!log( NNtBB

M

BN

NNBNtBN BM log)log(loglog

BM

BN

BN

Nt

loglog

log

)log(BN

BN

BM

I/O-efficient Algorithms and Data Structures

Page 19: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 19

Summary/Conclusion: Sorting• External merge or distribution sort takes I/Os

– Merge-sort based on M/B-way merging– Distribution sort based on -way distribution

and (complicated) split elements finding– Key is linear I/O M/B-way merging/distribution

• Optimal in comparison model

• lower bound in stronger indivisibility model– Holds even for permuting

)log(BN

BN

BMO

})log,(min{BN

BN

BMN

BM

I/O-efficient Algorithms and Data Structures

Page 20: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Outline• Today:

– Sorting upper bound (merge- and distribution-sort)– Sorting (permuting) lower bound– Cache-oblivious sorting (if time permits)– Batched geometric problems: Distribution sweeping

• Monday:– Searching (B-trees)– Batched searching (Buffer-trees)– I/O-efficient priority queues– I/O-efficient list ranking

Lars Arge 20

I/O-efficient Algorithms and Data Structures

Page 21: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

21Lars Arge

Cache-Oblivious Model

• N, B, and M as in I/O-model– Assumed that M>B2 (tall cache assumption)

• M and B not used in algorithm description• Block transfers by optimal paging strategy

Analyze in two-level modelEfficient on one level,

efficient on all levels (of fully associative hierarchy using LRU)

I/O-efficient Algorithms and Data Structures

Page 22: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

22Lars Arge

• levels• Distribution in I/Os

algorithms

I/O-model Distribution Sort

BM

1

)(BNO

)log(BN

BN

BMO

N

I/O-efficient Algorithms and Data Structures

BM

)(log)(logMN

MN

BM

BM OO

Page 23: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

23Lars Arge

Cache-Oblivious Distribution Sort• General idea: way distribution

N

N

N N N N N N

• Distribution performed in accesses after– breaking the N elements into subarrays of size – sorting each subarray recursively

)(BNO

N N

Recurrence for number

of accesses used to sort

)log()(BN

BN

BMONT

MNO

MNONTNNT

BN

BN

if)(

if)()(2)(

I/O-efficient Algorithms and Data Structures

Page 24: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

24Lars Arge

Cache-Oblivious Distribution Sort• Distribution in accesses (assuming buckets known):)(

BNO

NN

– Divide subarrays into two equal sized groups A1 and A2

– Divide buckets into two equal-sized groups B1 and B2

– Recursively:

1. Distribute (relevant) elements from A1 into B1

2. Distribute (relevant) elements from A2 into B1

3. Distribute elements from A1 into B2

4. Distribute elements from A2 into B2

1 23 4

I/O-efficient Algorithms and Data Structures

Page 25: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

25Lars Arge

Cache-Oblivious Distribution SortAnalysis of distribution:

– Consider recursive subproblems on B subarrays* such problems* Each subproblem solved in

accesses accesses

2

log4BNB

N

BB movedelements

)()( 2 BN

BN

BN OBO

N

2N

22

N

B

BNlog

NB

I/O-efficient Algorithms and Data Structures

Page 26: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

26Lars Arge

Batched Geometric Problems• Example: Orthogonal line segment intersection

– Given set of axis-parallel line segments, report all intersections

• In internal memory many problems is solved using sweeping

I/O-efficient Algorithms and Data Structures

Page 27: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

27Lars Arge

Plane Sweeping

• Sweep plane top-down while maintaining search tree T on vertical segments crossing sweep line (by x-coordinates)– Top endpoint of vertical segment: Insert in T– Bottom endpoint of vertical segment: Delete from T– Horizontal segment: Perform range query with x-interval on T

I/O-efficient Algorithms and Data Structures

Page 28: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

28Lars Arge

Plane Sweeping

• In internal memory algorithm runs in optimal O(Nlog N+T) time• In external memory algorithm performs badly (>N I/Os) if |T|>M

• Solution: Distribution sweeping– Combination of distribution and plane sweeping

I/O-efficient Algorithms and Data Structures

Page 29: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

29Lars Arge

Distribution Sweeping

• Divide plane into M/B-1 slabs with O(N/(M/B)) endpoints each• Sweep plane top-down while reporting intersections between

– part of horizontal segment spanning slab(s) and vertical segments• Distribute data to M/B-1 slabs

– vertical segments and non-spanning parts of horizontal segments• Recourse in each slab

I/O-efficient Algorithms and Data Structures

Page 30: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

30Lars Arge

Distribution Sweeping

• Sweep performed in O(N/B+T’/B) I/Os I/Os• Maintain active list of vertical segments for each slab (<B in memory)

– Top endpoint of vertical segment: Insert in active list– Horizontal segment: Scan through all relevant active lists

* Removing “expired” vertical segments* Reporting intersections with “non-expired” vertical segments

)log(BT

BN

BN

BMO

I/O-efficient Algorithms and Data Structures

Page 31: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

31Lars Arge

Distribution Sweeping• Other example: Rectangle intersection

– Given set of axis-parallel rectangles, report all intersections.

I/O-efficient Algorithms and Data Structures

Page 32: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

32Lars Arge

Distribution Sweeping

• Divide plane into M/B-1 slabs with O(N/(M/B)) endpoints each• Sweep plane top-down while reporting intersections between

– part of rectangles spanning slab(s) and other rectangles• Distribute data to M/B-1 slabs

– Non-spanning parts of rectangles • Recurse in each slab

I/O-efficient Algorithms and Data Structures

Page 33: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

33Lars Arge

Distribution Sweeping

• Seems hard to perform sweep in O(N/B+T’/B) I/Os• Solution: Multislabs (contigious set of slabs)

– Reduce fanout of distribution to – Recursion height still– Room for block from each multislab (activlist) in memory

)( BM

)(logBN

BMO

I/O-efficient Algorithms and Data Structures

Page 34: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

34Lars Arge

Distribution Sweeping

• Sweep while maintaining rectangle active list for each multisslab– Top side of spanning rectangle: Insert in active multislab list– Each rectangle: Scan through all relevant multislab lists

* Removing “expired” rectangles* Reporting intersections with “non-expired” rectangles

I/Os)log(BT

BN

BN

BMO

I/O-efficient Algorithms and Data Structures

Page 35: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

35Lars Arge

HomeworkDesign an I/O algorithm that given N rectangles in the

plane computes the measure (area) of the their union. Make sure to

argue for correctness and I/O-complexity of your algorithm.

Hint: First find for each input y-coordinate y the length of the

intersection between the rectangles and a horizontal line through the y.

Use distribution sweeping with a combine step to do so.

)log(BN

BN

BMO

I/O-efficient Algorithms and Data Structures

Page 36: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Outline• Today:

– Sorting upper bound (merge- and distribution-sort)– Sorting (permuting) lower bound– Cache-oblivious sorting (if time permits)– Batched geometric problems: Distribution sweeping

• Monday:– Searching (B-trees)– Batched searching (Buffer-trees)– I/O-efficient priority queues– I/O-efficient list ranking

Lars Arge 36

I/O-efficient Algorithms and Data Structures

Page 37: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Lars Arge 37

References• Input/Output Complexity of Sorting and Related Problems

A. Aggarwal and J.S. Vitter. CACM 31(9), 1988

• External-Memory Computational Geometry. M.T. Goodrich, J-J. Tsay, D.E. Vengroff, and J.S. Vitter. Proc. FOCS'93

• Cache-Oblivious Algorithms. M. Frigo, C. Leiserson, H. Prokop, and S. Ramachandran. Proc. FOCS '99.

I/O-efficient Algorithms and Data Structures

Page 38: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

• High level objectives:– Advance algorithmic knowledge in “massive data” processing area– Train researchers in world-leading international environment– Be catalyst for multidisciplinary/industry collaboration

• Building on:– Strong international team– Vibrant international environment – Focus areas (among others):

* I/O-efficient algorithms* Streaming algorithms* Cache-oblivious algorithms* Algorithm engineering

Center for Massive Data Algorithmics

We are hiring!!

Lars Arge 38

I/O-efficient Algorithms and Data Structures

Page 39: I/O Efficient Algorithms and Data Structures 1+2 (Lecture by Lars Arge)

Thanks

[email protected]

www.madalgo.au.dk