36
Sorting Alexandre David B2-206

Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

Sorting

Alexandre DavidB2-206

Page 2: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 2

The ProblemInput: a sequence of n numbers ⟨a1…an⟩.Output: a permutation (re-ordering) ⟨a1’…an’⟩ of the input sequence s.t. a1’≤a2’…≤an’.Numbers to sort are also called keys.

Page 3: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 3

Sorting AlgorithmsBubble sort.Selection sort.Insertion sort.Quick sort.Merge sort.Heap sort.

Application: priority queues.

n2

nlg(n)

Page 4: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 4

Bubble SortPseudo code in problem 2-2 p38.

Array indexed from 1 to n.

Running time = n-1+n-2+…+1=n(n-1)/2 = Θ(n2).

for i = 1 to n-1 dofor j = n downto i+1 do

if a[j] < a[j-1] then swap(a[j],a[j-1])

Page 5: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 5

Bubble Sort

for i = 1 to n-1 dofor j = n downto i+1 do

if a[j] < a[j-1] then swap(a[j],a[j-1])

i1 n-1

ni+1j

Page 6: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 6

Bubble Sort - CorrectnessLoop invariant: Sub-array a[1..i] is sorted before the loop on i.

Initialization: true before 1st iteration.Maintenance: If it is true before an iteration, it remains true before the next iteration.Termination: The loop terminates and when it does, the invariant gives us a useful property.

Page 7: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 7

Selection SortCorresponds to exercise 2.2-2 p27.

Loop invariant: Sub-array a[1…i] is sorted before the loop on i.Running time: n-1+n-2+…+1 = n(n-1)/2= Θ(n2).

for i = 1 to n-1 dofor j = i+1 to n do

if a[i] > a[j] then swap(a[i],a[j])

Page 8: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 8

Selection Sort

for i = 1 to n-1 dofor j = i+1 to n do

if a[i] > a[j] then swap(a[i],a[j])

i1 n-1

ni+1j

Page 9: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 9

Insertion SortIdea: Like when you play cards.Loop invariant: Sub-array a[1…j-1] is sorted before the loop on j.Running time: Θ(n2)?

for j = 2 to n dokey = a[j]i = j-1while i > 0 and a[i] > key doa[i+1] = a[i]i = i-1

donea[i+1] = key

done

Page 10: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 10

Insertion Sort

for j = 2 to n dokey = a[j]i = j-1while i > 0 and a[i] > key doa[i+1] = a[i]i = i-1

donea[i+1] = key

done

j2 n

keyi

j-1

j

keyi?

sorted

Page 11: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 11

Example

8 2 4 9 3 62 8 4 9 3 62 4 8 9 3 62 4 8 9 3 62 3 4 8 9 62 3 4 6 8 9

Page 12: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 12

Analysis of Insertion SortCount executions of loops:

Test n times, execute n-1 times (outer loop).Test tj times, execute tj-1 times (inner loop).

Analysis:Best case: tj=1, T(n)=Ω(n).Worst case: tj=j, T(n)=O(n2).Average case: tj=j/2, E[T(n)]=O(n2).

∑+j

jtcnc 21

Can be good!

Page 13: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 13

Divide-and-ConquerDivide-and-conquer approach.

Recursive algorithms.Divide main problem into sub-problems.Conquer the sub-problems (solve recursively).Combine the solutions of the sub-problems.

Previous approaches were incremental.

Page 14: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 14

Merge SortMerge sort:

Divide array n into 2 arrays of size n/2.Sort 2 smaller arrays with merge sort.Combine the smaller arrays.

Careful: Running time comes fromnumber of divide (= recursive calls)+ time of combining solutions.

Page 15: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 15

Merge SortKey of the algorithm: Merge the sorted sub-arrays A[p…q] and A[q+1…r] linearly in time.Description (book) uses copies and sentinel values (to simplify tests).

merge_sort(a,p,r):if p < r thenq = (p+r)/2merge_sort(a, p, q)merge_sort(a, q+1, r)merge(a, p, q, r)

fi

Page 16: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 16

Merge Sort

merge_sort(a,p,r):if p < r thenq = (p+r)/2merge_sort(a, p, q)merge_sort(a, q+1, r)merge(a, p, q, r)

fip rq

sortedsorted

merge

sorted

Page 17: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 17

Key: Merging 2 Sorted Arrays

20 1213 117 92 1

20 1213 117 9

20 1213 11

20 1213 11

9

20 1213 117 92

20 1213

Time = Θ(n) to merge a totalof n elements (linear time).

Page 18: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 18

Analyzing Merge Sort

T(n)Θ(1)

2T(n/2)

Θ(n)

(sloppy)

merge_sort(a,p,r):if p < r thenq = (p+r)/2merge_sort(a, p, q)merge_sort(a, q+1, r)merge(a, p, q, r)

fi

Page 19: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 19

Analyzing Merge SortRunning time T(n) given as a recurrence. In general, for divide-and-conquer algorithms:

T(n)=Θ(1) if n≤c (c small enough) otherwiseT(n)=aT(n/b)+D(n)+C(n).

Easy to solve!Merge-sort: c=1, D(n)=1, C(n)=Θ(n).

Master method: a=2, b=2 ⇒ T(n)=Θ(n lgn).

(divide-conquer &combine)

BUT: Algorithm is not in-place.

Page 20: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 20

Quick SortDivide-and-conquer algorithm.

Worst case Θ(n2).Average (robust) case Θ(n lgn).Very efficient in practice.

In-place algorithm with small constants (complexity). Fastest sort in practice except for “almost-sorted” inputs.

Page 21: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 21

Quick SortDivide: Partition A[p..r] into A’[p..q-1] and A’[q+1..r] s.t. a’p..q-1≤a’q≤a’q+1..r.

Conquer: Sort A’[p..q-1] and A’[q+1..r] by recursive calls to quick-sort.Combine: A[p..r] is sorted. Nothing to do!Key: How to partition. We can choose a’qarbitrarily.

Page 22: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 22

Divide-and-Conquer

1)Divide: Choose pivot element.

x

Partition the array.

x≤x ≥x

2)Conquer: Recursive calls.3)Combine: Trivial.

KEY: Linear time partitioning sub-routine.

Page 23: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 23

Quick Sort Partitioning

Equivalent to selection-sort for worst case.

partition(A,p,r)x = A[r]i = p-1for j = p to r-1 doif A[j] ≤ x then

i = i+1swap(A[i],A[j])

fidoneswap(A[i+1],A[r])return i+1

pivot

Page 24: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 24

VariantsHoare’s partitioning more efficient.

Problem 7-1 p160.Fewer swaps in practice.

Randomized quick-sort:Choose the pivot randomly.No worst-case input but unlucky run may take Θ(n2).More robust.

Combine quick-sort with insertion sort.

Page 25: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 25

partition(A,p,r)x = A[r]i = p-1for j = p to r-1 doif A[j] ≤ x then

i = i+1swap(A[i],A[j])

fidoneswap(A[i+1],A[r])return i+1

2 8 7 1 3 5 6 4

2 8 7 1 3 5 6 4

2 8 7 1 3 5 6 4

2 8 7 1 3 5 6 4

2 1 7 8 3 5 6 4

2 1 3 8 7 5 6 4

2 1 3 8 7 5 6 4

2 1 3 8 7 5 6 4

2 1 3 4 7 5 6 8

i rp,jx

i j

j

j

j

j

j

i

i

i

i

i

i

i≤x >x ? x

Page 26: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 26

Worst Case of Quick SortInput sorted or reverse order.Partition around min or max element.Always no element for one side.

)()()1(

)()1()1()()1()0(

2nnnT

nnTnnTT

Θ=

Θ+−=Θ+−+Θ=Θ+−+=)(nT

Page 27: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 27

Worst Case Recursion Tree

cn

c(n-1)

c(n-2)

Θ(1)

T(0)

T(0)

T(0) …

T(n)=T(0)+T(n-1)+cn

nΘ(n2)

Page 28: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 28

Best Case AnalysisIntuition

If we are lucky, the partition splits the array evenly.Like merge-sort:What if the split is always 1/10 : 9/10?

Solution?

)lg()()2/(2

nnnnT

Θ=Θ+=)(nT

?

)()10/9()10/()( nnTnTnT Θ++=

Page 29: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 29

Recursion Tree 1/10 : 9/10cn

cn/10 9cn/10

cn/100 9cn/100 9cn/100 81cn/100

Θ(1) Θ(1)

…… … … … … … …

log10nlog9/10n

Θ(n) leaves

cnlog10n+Θ(n) ≤ T(n) ≤ cnlog10/9n+Θ(n) Θ(nlgn) lucky

Page 30: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 30

Performance of Quick SortDepends on the input (balanced array or not) and the choice of the pivot.Worst-case (sorted, reversed, equal elements): Θ(n2).Best-case: (balanced): T(n)=Θ(n lgn).Average case?

Remark: 1/10 : 9/10 still in n lgn.Average is Θ(n lgn).Analysis: Randomized quick-sort…

Page 31: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 31

Randomized Quick SortPick up a random pivot for the partitioning.Running time unpredictable.No assumption on the input distribution.No specific input gives the worst case.Used for analysis of quick sort for the average case.

Page 32: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 32

Randomized Quick Sort AnalysisT(n)=random variable for the running time of randomized quick sort (input of size n), assuming independent random numbers.For k=0…n-1, define the indicator random variable

E[Xk]=Pr{Xk=1}=1/n since all splits are equally likely, assuming distinct elements.

⎩⎨⎧

=01

kXif partition generatesa k : n-k-1 splitotherwise

Problem 7-2

Page 33: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 33

T(n) with Indicator Variable

( )∑−

=

Θ+−−+=

⎪⎪⎩

⎪⎪⎨

−Θ++−

−Θ+−+−Θ+−+

=

1

0)()1()()(

.0:1)()0()1(

,2:1)()2()1(,1:0)()1()0(

)(

n

kk nknTkTXnT

splitnifnTnT

splitnifnnTTsplitnifnnTT

nTM

Page 34: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 34

Expectation

)]([ nTE ( )

( )[ ]

)()]([2

)(1)]1([1)]([1

)]()1()([][

)()1()(

)()1()(

1

0

1

0

1

0

1

0

1

0

1

0

1

0

nkTEn

nn

knTEn

kTEn

nknTkTEXE

nknTkTXE

nknTkTXE

n

k

n

k

n

k

n

k

n

kk

n

kk

n

kk

Θ+=

Θ+−−+=

Θ+−−+⋅=

Θ+−−+=

⎥⎦

⎤⎢⎣

⎡Θ+−−+=

∑∑∑

=

=

=

=

=

=

=

Tough recurrence!

Page 35: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 35

Solving the Recurrence

Prove E[T(n)]≤an lgn for a constant a>0.Choose a large enough so that an lgndominates E[T(n)] for sufficiently small n≥2.Use fact:

)()]([2)]([1

2nkTE

nnTE

n

kΘ+= ∑

=

(k=0 and k=1 can be absorbed in Θ(n))

∑−

=

−≤1

2

22

81lg

21lg

n

knnnkk (exercise)

Page 36: Sorting - Aalborg Universitetpeople.cs.aau.dk/~adavid/teaching/AA/AA1-06/05-Sorting.pdf · Sorting Alexandre David B2-206. 31-10-2006 AA1 2 The Problem Input: a sequence of nnumbers

31-10-2006 AA1 36

Substitution Method…

)]([ nTE

nan

nannan

nnnnna

nkakn

n

k

lg

)(4

lg

)(81lg

212

)(lg2

22

1

2

⎟⎠⎞

⎜⎝⎛ Θ−−=

Θ+⎟⎠⎞

⎜⎝⎛ −=

Θ+≤ ∑−

=

with a large enough

E[T(n)]=O(n lgn)

(use substitution)

(use fact)