Binary Merge-Sort
Merge-Sort(A,i,j)01 if (i < j) then02 m = (i+j)/2; 03 Merge-Sort(A,i,m);04 Merge-Sort(A,m+1,j);05 Merge(A,i,m,j)
Merge-Sort(A,i,j)01 if (i < j) then02 m = (i+j)/2; 03 Merge-Sort(A,i,m);04 Merge-Sort(A,m+1,j);05 Merge(A,i,m,j)
DivideConquer
Combine
1 2 8 10 7 9 13 19
1 2 7
Merge is linear in the
#items to be merged
Few key observations
Items = (short) strings = atomic...
On english wikipedia, about 109 tokens to sort (n log n) memory accesses (I/Os ??)
[5ms] * n log2 n ≈ 3 years
In practice it is a “faster”, why?
Recursion
10 2
10 2
5 1
5 1
13 19
13 19
9 7
9 7
15 4
15 4
8 3
8 3
12 17
12 17
6 11
6 11
10 2 5 1 13 19 9 7 15 4 8 3 12 17 6 11
10 2 5 1 13 19 9 7 15 4 8 3 12 17 6 11
10 2 5 1 13 19 9 7 15 4 8 3 12 17 6 11
log2 N
Implicit Caching…
10 2
2 10
5 1
1 5
13 19
13 19
9 7
7 9
15 4
4 15
8 3
3 8
12 17
12 17
6 11
6 11
1 2 5 10 7 9 13 19 3 4 8 15 6 11 12 17
1 2 5 7 9 10 13 19 3 4 6 8 11 12 15 17
1 2 3 4 5 6 7 8 9 10 11 12 13 15 17 19
log2 N
MN/M runs, each sorted in internal memory (no I/Os)
2 passes (one Read/one Write) = 2 * (N/B)
I/Os
— I/O-cost for binary merge-sort is ≈ 2 (N/B) log2 (N/M)
Log
2 (
N/M
)
2 passes
(R/W)
2 passes
(R/W)
B
A key inefficiency
1 2 4 7 9 10 13 19 3 5 6 8 11 12 15 17
B
After few steps, every run is longer than B !!!
B
We are using only 3 pagesBut memory contains M/B pages ≈ 230/215 = 215
B
OutputBuffer Disk
1, 2, 3
1, 2, 3
OutputRun
4, ...
Multi-way Merge-Sort
Sort N items with main-memory M and disk-pages B:
Pass 1: Produce (N/M) sorted runs. Pass i: merge X = M/B-1 runs logX N/M passes
Main memory buffers of B items
Pg for run1
Pg for run X
Out Pg
DiskDisk
Pg for run 2
. . . . . .
. . .
How it works
1 2 5 10 7 9 13 19
1 2 5 7….
1 2 3 4 5 6 7 8 9 10 11 12 13 15 17 19
MN/M runs, each sorted in internal memory = 2 (N/B) I/Os
2 passes (one Read/one Write) = 2 * (N/B)
I/Os
— I/O-cost for X-way merge is ≈ 2 (N/B) I/Os per level
Log
X (
N/M
)
M
X
X
Cost of Multi-way Merge-Sort
Number of passes = logX N/M logM/B (N/M)
Total I/O-cost is ( (N/B) logM/B N/M ) I/Os
Large fan-out (M/B) decreases #passes
In practice
M/B ≈ 105 #passes =1 few mins
Tuning dependson disk features
Compression would decrease the cost of a pass!