Upload
anton-konushin
View
1.027
Download
3
Tags:
Embed Size (px)
DESCRIPTION
In many modern applications that deal with massive data sets, communication between internal and external memory, and not actual computation time, is the bottleneck in the computation. This is due to the huge difference in access time of fast internal memory and slower external memory such as disks. In order to amortize this time over a large amount of data, disks typically read or write large blocks of contiguous data at once. This means that it is important to design algorithms with a high degree of locality in their disk access pattern, that is, algorithms where data accessed close in time is also stored close on disk. Such algorithms take advantage of block transfers by amortizing the large access time over a large number of accesses. In the area of I/O-efficient algorithms the main goal is to develop algorithms that minimize the number of block transfers (I/Os) used to solve a given problem. In these four lectures, we will cover fundamental I/O-efficient algorithms and data structures, such as sorting algorithms, search trees and priority queues. We will also discuss geometric problems, as well as geometric data structures for various range searching problems.
Citation preview
I/O-efficient Algorithms and Data Structures
Lars Arge
Professor and center director
Aarhus University
ALMADA summer schoolAugust 1, 2013
• Pervasive use of computers and sensors• Increased ability to acquire/store/process data
→ Massive data collected everywhere• Society increasingly “data driven”
→ Access/process data anywhere any time
Nature/Science special issues• 2/06, 9/08, 2/11• Scientific data size growing exponentially,
while quality and availability improving• Paradigm shift: Science will be about mining data
Massive Data
Obviously not only in sciences:• Economist 02/10:
• From 150 Billion Gigabytes five years ago
to 1200 Billion today• Managing data deluge difficult; doing so
will transform business/public life
I/O-efficient Algorithms and Data Structures
Lars Arge 2
Lars Arge 3
Random Access Machine Model
• Standard theoretical model of computation:– Infinite memory– Uniform access cost
• Simple model crucial for success of computer industry
R
A
M
I/O-efficient Algorithms and Data Structures
Lars Arge 4
Hierarchical Memory
• Modern machines have complicated memory hierarchy– Levels get larger and slower further away from CPU– Data moved between levels using large blocks
L
1
L
2
R
A
M
I/O-efficient Algorithms and Data Structures
Lars Arge 5
Slow I/O
– Disk systems try to amortize large access time transferring large contiguous blocks of data
• Important to store/access data to take advantage of blocks (locality)
• Disk access is 106 times slower than main memory access
track
magnetic surface
read/write armread/write head
“The difference in speed between modern CPU and
disk technologies is analogous to the difference
in speed in sharpening a pencil using a sharpener on
one’s desk or by taking an airplane to the other side of
the world and using a sharpener on someone else’s
desk.” (D. Comer)
I/O-efficient Algorithms and Data Structures
Lars Arge 6
Scalability Problems• Most programs developed in RAM-model
– Run on large datasets because
OS moves blocks as needed
• Moderns OS utilizes sophisticated paging and prefetching strategies– But if program makes scattered accesses even good OS cannot
take advantage of block access
Scalability problems!
data size
runn
ing
tim
e
I/O-efficient Algorithms and Data Structures
Lars Arge 7
N = # of items in the problem instance
B = # of items per disk block
M = # of items that fit in main memory
T = # of items in output
I/O: Move block between memory and disk
We assume (for convenience) that M >B2
D
P
M
Block I/O
External Memory Model
I/O-efficient Algorithms and Data Structures
Lars Arge 8
Fundamental Bounds Internal External
• Scanning: N• Sorting: N log N• Permuting • Searching:
• Note:– Linear I/O: O(N/B)– Permuting not linear– Permuting and sorting bounds are equal in all practical cases– B factor VERY important: – Cannot sort optimally with search tree
NBlog
BN
BN
BMlog
BN
NBN
BN
BN
BM log
}log,min{BN
BN
BMNN
N2log
I/O-efficient Algorithms and Data Structures
Scalability Problems: Block Access Matters• Example: Traversing linked list (List ranking)
– Array size N = 10 elements– Disk block size B = 2 elements– Main memory size M = 4 elements (2 blocks)
• Large difference between N and N/B large since block size is large– Example: N = 256 x 106, B = 8000 , 1ms disk access time
N I/Os take 256 x 103 sec = 4266 min = 71 hr
N/B I/Os take 256/8 sec = 32 sec
Algorithm 2: N/B=5 I/OsAlgorithm 1: N=10 I/Os
1 5 2 6 73 4 108 9 1 2 10 9 85 4 76 3
9Lars Arge
I/O-efficient Algorithms and Data Structures
Outline• Today:
– Sorting upper bound (merge- and distribution-sort)– Sorting (permuting) lower bound– Cache-oblivious sorting (if time permits)– Batched geometric problems: Distribution sweeping
• Monday:– Searching (B-trees)– Batched searching (Buffer-trees)– I/O-efficient priority queues– I/O-efficient list ranking
Lars Arge 10
I/O-efficient Algorithms and Data Structures
Lars Arge 11
Queues and Stacks• Queue:
– Maintain push and pop blocks in main memory
O(1/B) Push/Pop operations amortized
• Stack:– Maintain push/pop blocks in main memory
O(1/B) Push/Pop operations amortized
Push Pop
I/O-efficient Algorithms and Data Structures
Lars Arge 12
Sorting• <M/B sorted lists (queues) can be merged in O(N/B) I/Os
M/B blocks in main memory
• Unsorted list (queue) can be distributed using <M/B split elements in O(N/B) I/Os
I/O-efficient Algorithms and Data Structures
Lars Arge 13
Sorting• Merge sort:
– Create N/M memory sized sorted lists– Repeatedly merge lists together Θ(M/B) at a time
phases using I/Os each I/Os)( BNO)(log
MN
BMO )log(
BN
BN
BMO
)(MN
)/(BM
MN
))/(( 2BM
MN
1
I/O-efficient Algorithms and Data Structures
Lars Arge 14
Sorting• Distribution sort (multiway quicksort):
– Compute Θ(M/B) split elements– Distribute unsorted list into Θ(M/B) unsorted lists of equal size– Recursively split lists until fit in memory
I/Os if split elements in O(N/B) I/Os
Can compute split elements in O(N/B) I/Os
phases I/Os
)log(BN
BN
BMO
BM
)(log)(logMN
MN
BM
BM OO )log(
BN
BN
BMO
I/O-efficient Algorithms and Data Structures
Lars Arge 15
Permuting Lower BoundPermuting N elements according to a given permutation takes
I/Os in “indivisibility” model
• Indivisibility model: Move of elements only allowed operation• Note:
– We can allow copies (and destruction of elements)– Bound also a lower bound on sorting
• Proof:– View memory and disk as array of N tracks of B elements– Assume all I/Os track aligned (assumption can be removed)
})log,(min{BN
BN
BMN
I/O-efficient Algorithms and Data Structures
M D
Lars Arge 16
Permuting Lower Bound– Array contains permutation of N elements at all times– We will count how many permutations can be
reached (produced) with t I/Os– Input:
* Choose track: N possibilities* Rearrange ≤ B element in track and place among ≤ M-B
elements in memory:–
possibilities if “fresh” track
– otherwise
at most permutations after t inputs– Output: Choose track: N possibilities
BN
B
M BN t )!())((
)(!B
MB )(
B
M
I/O-efficient Algorithms and Data Structures
M D
Lars Arge 17
Permuting Lower Bound– Permutation algorithm needs to be able to produce N! permutations
(using Stirlings formula and )– If we have– If we have and thus
!)!())(( NBN BN
B
M t
)!log())log((log)!log( NNtBB
M
BN
NNBNtBN BM log)log(loglog
BM
BN
BN
Nt
loglog
log
xxx log!log BMB
B
M log)log(
BMBN loglog B
NBMB
NB
N
BMBN
t /log2
loglog
BMBN loglog NB
NNNNNt NB
N
NBN
21
21
loglog
21
log2
log
})log,(min{ BN
BN
BMNt
I/O-efficient Algorithms and Data Structures
Lars Arge 18
Sorting lower bound• Similar argument but assuming comparison model in internal memory
– Initially N! possible orderings– Count how may possible after t I/Os
Sorting N elements takes I/Os in comparison model
!)!())(( NBN BN
B
M t
)!log())log((log)!log( NNtBB
M
BN
NNBNtBN BM log)log(loglog
BM
BN
BN
Nt
loglog
log
)log(BN
BN
BM
I/O-efficient Algorithms and Data Structures
Lars Arge 19
Summary/Conclusion: Sorting• External merge or distribution sort takes I/Os
– Merge-sort based on M/B-way merging– Distribution sort based on -way distribution
and (complicated) split elements finding– Key is linear I/O M/B-way merging/distribution
• Optimal in comparison model
• lower bound in stronger indivisibility model– Holds even for permuting
)log(BN
BN
BMO
})log,(min{BN
BN
BMN
BM
I/O-efficient Algorithms and Data Structures
Outline• Today:
– Sorting upper bound (merge- and distribution-sort)– Sorting (permuting) lower bound– Cache-oblivious sorting (if time permits)– Batched geometric problems: Distribution sweeping
• Monday:– Searching (B-trees)– Batched searching (Buffer-trees)– I/O-efficient priority queues– I/O-efficient list ranking
Lars Arge 20
I/O-efficient Algorithms and Data Structures
21Lars Arge
Cache-Oblivious Model
• N, B, and M as in I/O-model– Assumed that M>B2 (tall cache assumption)
• M and B not used in algorithm description• Block transfers by optimal paging strategy
Analyze in two-level modelEfficient on one level,
efficient on all levels (of fully associative hierarchy using LRU)
I/O-efficient Algorithms and Data Structures
22Lars Arge
• levels• Distribution in I/Os
algorithms
I/O-model Distribution Sort
BM
1
)(BNO
)log(BN
BN
BMO
N
I/O-efficient Algorithms and Data Structures
BM
)(log)(logMN
MN
BM
BM OO
23Lars Arge
Cache-Oblivious Distribution Sort• General idea: way distribution
N
N
N N N N N N
• Distribution performed in accesses after– breaking the N elements into subarrays of size – sorting each subarray recursively
)(BNO
N N
Recurrence for number
of accesses used to sort
)log()(BN
BN
BMONT
MNO
MNONTNNT
BN
BN
if)(
if)()(2)(
I/O-efficient Algorithms and Data Structures
24Lars Arge
Cache-Oblivious Distribution Sort• Distribution in accesses (assuming buckets known):)(
BNO
NN
– Divide subarrays into two equal sized groups A1 and A2
– Divide buckets into two equal-sized groups B1 and B2
– Recursively:
1. Distribute (relevant) elements from A1 into B1
2. Distribute (relevant) elements from A2 into B1
3. Distribute elements from A1 into B2
4. Distribute elements from A2 into B2
1 23 4
I/O-efficient Algorithms and Data Structures
25Lars Arge
Cache-Oblivious Distribution SortAnalysis of distribution:
– Consider recursive subproblems on B subarrays* such problems* Each subproblem solved in
accesses accesses
2
log4BNB
N
BB movedelements
)()( 2 BN
BN
BN OBO
N
2N
22
N
B
BNlog
NB
I/O-efficient Algorithms and Data Structures
26Lars Arge
Batched Geometric Problems• Example: Orthogonal line segment intersection
– Given set of axis-parallel line segments, report all intersections
• In internal memory many problems is solved using sweeping
I/O-efficient Algorithms and Data Structures
27Lars Arge
Plane Sweeping
• Sweep plane top-down while maintaining search tree T on vertical segments crossing sweep line (by x-coordinates)– Top endpoint of vertical segment: Insert in T– Bottom endpoint of vertical segment: Delete from T– Horizontal segment: Perform range query with x-interval on T
I/O-efficient Algorithms and Data Structures
28Lars Arge
Plane Sweeping
• In internal memory algorithm runs in optimal O(Nlog N+T) time• In external memory algorithm performs badly (>N I/Os) if |T|>M
• Solution: Distribution sweeping– Combination of distribution and plane sweeping
I/O-efficient Algorithms and Data Structures
29Lars Arge
Distribution Sweeping
• Divide plane into M/B-1 slabs with O(N/(M/B)) endpoints each• Sweep plane top-down while reporting intersections between
– part of horizontal segment spanning slab(s) and vertical segments• Distribute data to M/B-1 slabs
– vertical segments and non-spanning parts of horizontal segments• Recourse in each slab
I/O-efficient Algorithms and Data Structures
30Lars Arge
Distribution Sweeping
• Sweep performed in O(N/B+T’/B) I/Os I/Os• Maintain active list of vertical segments for each slab (<B in memory)
– Top endpoint of vertical segment: Insert in active list– Horizontal segment: Scan through all relevant active lists
* Removing “expired” vertical segments* Reporting intersections with “non-expired” vertical segments
)log(BT
BN
BN
BMO
I/O-efficient Algorithms and Data Structures
31Lars Arge
Distribution Sweeping• Other example: Rectangle intersection
– Given set of axis-parallel rectangles, report all intersections.
I/O-efficient Algorithms and Data Structures
32Lars Arge
Distribution Sweeping
• Divide plane into M/B-1 slabs with O(N/(M/B)) endpoints each• Sweep plane top-down while reporting intersections between
– part of rectangles spanning slab(s) and other rectangles• Distribute data to M/B-1 slabs
– Non-spanning parts of rectangles • Recurse in each slab
I/O-efficient Algorithms and Data Structures
33Lars Arge
Distribution Sweeping
• Seems hard to perform sweep in O(N/B+T’/B) I/Os• Solution: Multislabs (contigious set of slabs)
– Reduce fanout of distribution to – Recursion height still– Room for block from each multislab (activlist) in memory
)( BM
)(logBN
BMO
I/O-efficient Algorithms and Data Structures
34Lars Arge
Distribution Sweeping
• Sweep while maintaining rectangle active list for each multisslab– Top side of spanning rectangle: Insert in active multislab list– Each rectangle: Scan through all relevant multislab lists
* Removing “expired” rectangles* Reporting intersections with “non-expired” rectangles
I/Os)log(BT
BN
BN
BMO
I/O-efficient Algorithms and Data Structures
35Lars Arge
HomeworkDesign an I/O algorithm that given N rectangles in the
plane computes the measure (area) of the their union. Make sure to
argue for correctness and I/O-complexity of your algorithm.
Hint: First find for each input y-coordinate y the length of the
intersection between the rectangles and a horizontal line through the y.
Use distribution sweeping with a combine step to do so.
)log(BN
BN
BMO
I/O-efficient Algorithms and Data Structures
Outline• Today:
– Sorting upper bound (merge- and distribution-sort)– Sorting (permuting) lower bound– Cache-oblivious sorting (if time permits)– Batched geometric problems: Distribution sweeping
• Monday:– Searching (B-trees)– Batched searching (Buffer-trees)– I/O-efficient priority queues– I/O-efficient list ranking
Lars Arge 36
I/O-efficient Algorithms and Data Structures
Lars Arge 37
References• Input/Output Complexity of Sorting and Related Problems
A. Aggarwal and J.S. Vitter. CACM 31(9), 1988
• External-Memory Computational Geometry. M.T. Goodrich, J-J. Tsay, D.E. Vengroff, and J.S. Vitter. Proc. FOCS'93
• Cache-Oblivious Algorithms. M. Frigo, C. Leiserson, H. Prokop, and S. Ramachandran. Proc. FOCS '99.
I/O-efficient Algorithms and Data Structures
• High level objectives:– Advance algorithmic knowledge in “massive data” processing area– Train researchers in world-leading international environment– Be catalyst for multidisciplinary/industry collaboration
• Building on:– Strong international team– Vibrant international environment – Focus areas (among others):
* I/O-efficient algorithms* Streaming algorithms* Cache-oblivious algorithms* Algorithm engineering
Center for Massive Data Algorithmics
We are hiring!!
Lars Arge 38
I/O-efficient Algorithms and Data Structures