Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
CS 5633 – Spring 2012
Order StatisticsCarola Wenk
Slides courtesy of Charles Leiserson with small
2/9/12 CS 5633 Analysis of Algorithms 1
ychanges by Carola Wenk
Order statisticsOrder statisticsSelect the ith smallest of n elements (the element with rank i).• i = 1: minimum;;• i = n: maximum;• i = ⎣(n+1)/2⎦ or ⎡(n+1)/2⎤: median.( ) ( )
Naive algorithm: Sort and index ith element.W t i ti Θ( l + 1)Worst-case running time = Θ(n log n + 1)
= Θ(n log n),i t h t ( t i k t)
2/9/12 CS 5633 Analysis of Algorithms 2
using merge sort or heapsort (not quicksort).
Randomized divide-and-l ithconquer algorithm
RAND-SELECT(A, p, q, i) i-th smallest of A[ p . . q] ( p q ) [ p q]if p = q then return A[p]r ← RAND-PARTITION(A, p, q)k ← + 1 k k(A[ ])k ← r – p + 1 k = rank(A[r])if i = k then return A[r]if i < kif i < k
then return RAND-SELECT(A, p, r – 1, i )else return RAND-SELECT(A, r + 1, q, i – k )
≤ A[r] ≥ A[r]k
2/9/12 CS 5633 Analysis of Algorithms 3
rp q
ExampleExample
Select the i = 7th smallest:
i = 76 10 13 5 8 3 2 11
Select the i = 7th smallest:
pivot
P i ik = 42 5 3 6 8 13 10 11
Partition:
Select the 7 – 4 = 3rd smallest recursively.
2/9/12 CS 5633 Analysis of Algorithms 4
Select the 7 4 3rd smallest recursively.
Intuition for analysisIntuition for analysis(All our analyses today assume that all elements
Lucky:
( y yare distinct.)
for RAND-PARTITIONLucky:
101log 9/10 == nnCASE 3
T(n) = T(9n/10) + dn= Θ(n) CASE 3 Θ(n)
Unlucky:T(n) = T(n – 1) + dn arithmetic seriesT(n) T(n 1) + dn
= Θ(n2)arithmetic series
Worse than sorting!2/9/12 CS 5633 Analysis of Algorithms 5
Worse than sorting!
Analysis of expected timeAnalysis of expected time
The analysis follows that of randomized
Let T(n) = the random variable for the running
The analysis follows that of randomized quicksort, but it’s a little different.Let T(n) = the random variable for the running time of RAND-SELECT on an input of size n, assuming random numbers are independent.assuming random numbers are independent.For k = 0, 1, …, n–1, define the indicator random variablerandom variable
Xk = 1 if PARTITION generates a k : n–k–1 split,0 otherwise
2/9/12 CS 5633 Analysis of Algorithms 6
k 0 otherwise.
Analysis (continued)Analysis (continued)To obtain an upper bound, assume that the i th element
T(max{0, n–1}) + dn if 0 : n–1 split,
ppalways falls in the larger side of the partition:
T(n) =T(max{1, n–2}) + dn if 1 : n–2 split,
M( { 1 0}) d if 1 0 liT(max{n–1, 0}) + dn if n–1 : 0 split,
( )∑−
+−−=1
})1(max{n
dnknkTX ( )∑=
+−−=0
})1,(max{k
k dnknkTX.
( )∑−
+≤1
)(2n
k dnkTX
2/9/12 CS 5633 Analysis of Algorithms 7
( )⎣ ⎦∑
= 2/)(
nkk
Calculating expectationCalculating expectation
( )⎥⎤
⎢⎡
+= ∑−1
)(2)]([n
k dnkTXEnTE
Take expectations of both sides.
( )⎣ ⎦
⎥⎦
⎢⎣
+∑= 2/
)(2)]([nk
k dnkTXEnTE
Take expectations of both sides.
2/9/12 CS 5633 Analysis of Algorithms 8
Calculating expectationCalculating expectation
( )∑−
⎥⎤
⎢⎡
+=1
)(2)]([n
k dnkTXEnTE ( )⎣ ⎦
( )[ ]∑
∑−
=
+=
⎥⎦
⎢⎣
+
1
2/
)(2
)(2)]([
n
nkk
dnkTXE
dnkTXEnTE
Linearity of expectation.
( )[ ]⎣ ⎦∑
=
+=2/
)(2nk
k dnkTXE
y p
2/9/12 CS 5633 Analysis of Algorithms 9
Calculating expectationCalculating expectation
( )∑−
⎥⎤
⎢⎡
+=1
)(2)]([n
k dnkTXEnTE ( )⎣ ⎦
( )[ ]∑
∑−
=
+=
⎥⎦
⎢⎣
+
1
2/
)(2
)(2)]([
n
nkk
dnkTXE
dnkTXEnTE
( )[ ]⎣ ⎦
[ ] [ ]∑
∑−
=
+⋅=
+=
1
2/
)(2
)(2
n
nkk
dnkTEXE
dnkTXE
Independence of Xk from other random
[ ] [ ]⎣ ⎦∑
=
+⋅=2/
)(2nk
k dnkTEXE
p kchoices.
2/9/12 CS 5633 Analysis of Algorithms 10
Calculating expectationCalculating expectation
( )∑−
⎥⎤
⎢⎡
+=1
)(2)]([n
k dnkTXEnTE ( )⎣ ⎦
( )[ ]∑
∑−
=
+=
⎥⎦
⎢⎣
+
1
2/
)(2
)(2)]([
n
nkk
dnkTXE
dnkTXEnTE
( )[ ]⎣ ⎦
[ ] [ ]∑
∑−
=
+⋅=
+=
1
2/
)(2
)(2
n
nkk
dnkTEXE
dnkTXE
[ ] [ ]⎣ ⎦
[ ] ∑∑
∑−−
=
+=
+⋅=
11
2/
2)(2
)(2
nn
nkk
dnkTE
dnkTEXE
Linearity of expectation; E[Xk] = 1/n .
[ ]⎣ ⎦ ⎣ ⎦
∑∑==
+=2/2/
)(nknk
dnn
kTEn
2/9/12 CS 5633 Analysis of Algorithms 11
Linearity of expectation; E[Xk] 1/n .
Calculating expectationCalculating expectation
( )dnkTXEnTEn
k ⎥⎤
⎢⎡
+= ∑−1
)(2)]([ ( )⎣ ⎦
( )[ ]dnkTXE
dnkTXEnTE
n
nkk
+=
⎥⎦
⎢⎣
+
∑
∑−
=
1
2/
)(2
)(2)]([
( )[ ]⎣ ⎦
[ ] [ ]dnkTEXE
dnkTXE
n
nkk
+⋅=
+=
∑
∑−
=
1
2/
)(2
)(2
[ ] [ ]⎣ ⎦
[ ] dnkTE
dnkTEXE
nn
nkk
+=
+⋅=
∑∑
∑−−
=
11
2/
2)(2
)(2
[ ]⎣ ⎦ ⎣ ⎦
[ ] dnkTE
dnn
kTEn
n
nknk
+=
+=
∑
∑∑−
==
1
2/2/
)(2
)(
2/9/12 CS 5633 Analysis of Algorithms 12
[ ]⎣ ⎦
dnkTEn nk
+= ∑= 2/
)(
Hairy recurrenceHairy recurrence(But not quite as hairy as the quicksort one.)
[ ]⎣ ⎦
dnkTEn
nTEn
k+= ∑
−1
2/)(2)]([
⎣ ⎦n nk = 2/
Prove: E[T(n)] ≤ cn for constant c > 0 .• The constant c can be chosen large enough
so that E[T(n)] ≤ cn for the base cases.
Use fact: 21
83nk
n∑−
≤ (exercise).
2/9/12 CS 5633 Analysis of Algorithms 13
⎣ ⎦2/8
nk∑
=( )
Substitution methodSubstitution method[ ] dncknTE
n
+≤ ∑−12)([ ]
⎣ ⎦n nk∑
= 2/
Substitute inductive hypothesis.
2/9/12 CS 5633 Analysis of Algorithms 14
Substitution methodSubstitution method[ ] dncknTE
n
+≤ ∑−12)([ ]
⎣ ⎦
dnnc
n nk
+⎟⎞
⎜⎛≤
∑=
2
2/
32 dnnn
+⎟⎠
⎜⎝
≤8
Use fact.
2/9/12 CS 5633 Analysis of Algorithms 15
Substitution methodSubstitution method[ ] +≤ ∑
−
dncknTEn2)(
1
[ ]⎣ ⎦
+⎟⎞
⎜⎛≤
∑=
dnnc
n nk
32 2
2/
⎟⎞
⎜⎛
+⎟⎠
⎜⎝
≤
dcn
dnnn 8
⎟⎠⎞
⎜⎝⎛ −−= dncncn
4
Express as desired – residual.
2/9/12 CS 5633 Analysis of Algorithms 16
Substitution methodSubstitution method[ ] dncknTE
n
+≤ ∑−2)(1
[ ]⎣ ⎦
dnnc
n nk
+⎟⎞
⎜⎛≤
∑=
32 2
2/
dcn
dnnn
⎟⎞
⎜⎛
+⎟⎠
⎜⎝
≤8
cn
dncncn
≤
⎟⎠⎞
⎜⎝⎛ −−=
4cn≤
if c ≥ 4d .,
2/9/12 CS 5633 Analysis of Algorithms 17
Summary of randomized d i i l iorder-statistic selection
• Works fast: linear expected time• Works fast: linear expected time.• Excellent algorithm in practice.• But the worst case is very bad: Θ(n2)• But, the worst case is very bad: Θ(n ).
Q. Is there an algorithm that runs in linear i i h ?time in the worst case?
A. Yes, due to Blum, Floyd, Pratt, Rivest, d T j [1973]
IDEA: Generate a good pivot recursively.
and Tarjan [1973].
2/9/12 CS 5633 Analysis of Algorithms 18
g p y
Worst-case linear-time order i istatistics
SELECT(i, n)1. Divide the n elements into groups of 5. Find
the median of each 5-element group by rote.2 Recursively SELECT the median x of the ⎣n/5⎦2. Recursively SELECT the median x of the ⎣n/5⎦
group medians to be the pivot.3. Partition around the pivot x. Let k = rank(x).
if i = k then return xelseif i < k
then recursively SELECT the ith
p ( )4.
Same as RANDthen recursively SELECT the ith
smallest element in the lower partelse recursively SELECT the (i–k)th
RAND-SELECT
2/9/12 CS 5633 Analysis of Algorithms 19
smallest element in the upper part
Choosing the pivotChoosing the pivot
2/9/12 CS 5633 Analysis of Algorithms 20
Choosing the pivotChoosing the pivot
1 Divide the n elements into groups of 51. Divide the n elements into groups of 5.
2/9/12 CS 5633 Analysis of Algorithms 21
Choosing the pivotChoosing the pivot
lesser1 Divide the n elements into groups of 5 Find lesser1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2/9/12 CS 5633 Analysis of Algorithms 22
greater
Choosing the pivotChoosing the pivot
x
lesser1 Divide the n elements into groups of 5 Find lesser1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⎣n/5⎦
2/9/12 CS 5633 Analysis of Algorithms 23
greatery
group medians to be the pivot.
Developing the recurrenceDeveloping the recurrenceSELECT(i, n)T(n) ( , )1. Divide the n elements into groups of 5. Find
the median of each 5-element group by rote.2 R i l S h di f h ⎣ /5⎦
( )
Θ(n)2. Recursively SELECT the median x of the ⎣n/5⎦
group medians to be the pivot.3 Partition around the pivot x Let k = rank(x)
T(n/5)Θ(n)
if i = k then return xelseif i < k
h i l S h i h
3. Partition around the pivot x. Let k rank(x).4.
Θ(n)
then recursively SELECT the ith smallest element in the lower part
else recursively SELECT the (i–k)th T( )?
2/9/12 CS 5633 Analysis of Algorithms 24
y ( )smallest element in the upper part
Analysis (Assume all elements are distinct )Analysis (Assume all elements are distinct.)
x
lesserAt least half the group medians are ≤ x which lesserAt least half the group medians are ≤ x, which is at least ⎣ ⎣n/5⎦ /2⎦ = ⎣n/10⎦ group medians.
2/9/12 CS 5633 Analysis of Algorithms 25
greater
Analysis (Assume all elements are distinct )Analysis (Assume all elements are distinct.)
x
lesserAt least half the group medians are ≤ x which lesserAt least half the group medians are ≤ x, which is at least ⎣ ⎣n/5⎦ /2⎦ = ⎣n/10⎦ group medians.• Therefore, at least 3 ⎣n/10⎦ elements are ≤ x.
2/9/12 CS 5633 Analysis of Algorithms 26
greater
Analysis (Assume all elements are distinct )Analysis (Assume all elements are distinct.)
x
lesserAt least half the group medians are ≤ x which lesserAt least half the group medians are ≤ x, which is at least ⎣ ⎣n/5⎦ /2⎦ = ⎣n/10⎦ group medians.• Therefore, at least 3 ⎣n/10⎦ elements are ≤ x.
2/9/12 CS 5633 Analysis of Algorithms 27
greater• Similarly, at least 3 ⎣n/10⎦ elements are ≥ x.
Analysis (Assume all elements are distinct )Analysis (Assume all elements are distinct.)
Need “at most” for worst-case runtime
• At least 3 ⎣n/10⎦ elements are ≤ x⇒ at most n-3 ⎣n/10⎦ elements are ≥ x⇒ at most n 3 ⎣n/10⎦ elements are ≥ x
• At least 3 ⎣n/10⎦ elements are ≥ x⇒ at most n-3 ⎣n/10⎦ elements are ≤ x⇒ at most n 3 ⎣n/10⎦ elements are ≤ x
• The recursive call to SELECT in Step 4 is executed recursively on n-3 ⎣n/10⎦ elementsexecuted recursively on n-3 ⎣n/10⎦ elements.
2/9/12 CS 5633 Analysis of Algorithms 28
Analysis (Assume all elements are distinct )Analysis (Assume all elements are distinct.)
• Use fact that ⎣a/b⎦ ≥ ((a-(b-1))/b (page 51)• n 3 ⎣n/10⎦ ≤ n 3·(n 9)/10 = (10n 3n +27)/10• n-3 ⎣n/10⎦ ≤ n-3·(n-9)/10 = (10n -3n +27)/10
≤ 7n/10 + 3Th i ll t SELECT i St 4 i• The recursive call to SELECT in Step 4 is executed recursively on at most 7n/10+3elementselements.
2/9/12 CS 5633 Analysis of Algorithms 29
Developing the recurrenceDeveloping the recurrenceSELECT(i, n)T(n) ( , )1. Divide the n elements into groups of 5. Find
the median of each 5-element group by rote.2 R i l S h di f h ⎣ /5⎦
( )
Θ(n)2. Recursively SELECT the median x of the ⎣n/5⎦
group medians to be the pivot.3 Partition around the pivot x Let k = rank(x)
T(n/5)Θ(n)
if i = k then return xelseif i < k
h i l S h i h
3. Partition around the pivot x. Let k rank(x).4.
Θ(n)
then recursively SELECT the ith smallest element in the lower part
else recursively SELECT the (i–k)th T(7n/10
+3)
2/9/12 CS 5633 Analysis of Algorithms 30
y ( )smallest element in the upper part
Solving the recurrenceSolving the recurrencednnTnTnT +⎟
⎠⎞
⎜⎝⎛ ++⎟
⎠⎞
⎜⎝⎛= 3
107
51)(
for Θ(n)
⎠⎝⎠⎝ 105)(
)337()31()( +−++−≤ dnncncnTSubstitution:
3109
)10
()5
()(
+−≤ dnccnT(n) ≤ c(n - 3)
101)3(
10
+−−= dncnncTechnical trick. This shows that T(n)∈ O(n)
if c is chosen large enough e g c=10d)3(
10−≤ nc ,
2/9/12 CS 5633 Analysis of Algorithms 31
if c is chosen large enough, e.g., c=10d
ConclusionsConclusions• Since the work at each level of recursion is
basically a constant fraction (9/10) smaller, the work per level is a geometric series dominated by the linear work at the root.
• In practice, this algorithm runs slowly, because the constant in front of n is large.
• The randomized algorithm is far more gpractical.
Exercise: Try to divide into groups of 3 or 72/9/12 CS 5633 Analysis of Algorithms 32
Exercise: Try to divide into groups of 3 or 7.