Chapter 16 Greedy Algorithms

CS6363: Design and Analysis of Computer Algorithms Prof. Sergey Bereg


1 Activity selection problem

We are given \(n\) activities \(a_1, \dots, a_n\), where the \(i\)-th activity \(a_i = [s_i, f_i)\) starts at time \(s_i\) and finishes at time \(f_i\). They require the same resource (for example, one lecture hall). Two activities \(a_i\) and \(a_j\) are compatible if \([s_i, f_i) \cap [s_j, f_j) = \emptyset\). The activity-selection problem (ASP) is to select a maximum-size subset of mutually compatible activities.
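For half-open intervals the compatibility test reduces to comparing endpoints: two activities are compatible exactly when one of them finishes no later than the other starts. A minimal Python sketch, assuming each activity is a (start, finish) pair with start < finish; the function name `compatible` is only illustrative:

```python
def compatible(a, b):
    """Return True if the half-open intervals a = [s1, f1) and b = [s2, f2) are disjoint."""
    (s1, f1), (s2, f2) = a, b
    # Disjoint exactly when one interval ends before (or when) the other begins.
    return f1 <= s2 or f2 <= s1


print(compatible((1, 4), (4, 7)))  # True: [1, 4) and [4, 7) only touch
print(compatible((1, 4), (3, 5)))  # False: they overlap on [3, 4)
```

Because the intervals are half-open, an activity may start at the exact moment another one finishes.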

1.1 Activity Selection with Dynamic Programming

We want to find the optimal substructure of an optimal selection. If \(a_i\) is selected then there are two subproblems:

a problem with the activities \(a_k\) such that \(f_k \le s_i\), and a problem with the activities \(a_k\) such that \(s_k \ge f_i\).

In general, we have a subproblem of the following type:

\[ S_{ij} = \{ a_k \mid f_i \le s_k < f_k \le s_j \} \]

We add two fictitious activities \(a_0 = [0, 0)\) and \(a_{n+1} = [\infty, \infty)\). Suppose that the activities are sorted by finish time:

\[ f_0 \le f_1 \le f_2 \le \dots \le f_{n+1} \]

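Let \(c[i, j]\) be the solution of ASP for \(S_{ij}\) (the maximum size of a set of mutually compatible activities in \(S_{ij}\)). Then

\[
c[i, j] =
\begin{cases}
0 & \text{if } S_{ij} = \emptyset, \\
\max\limits_{i < k < j,\ a_k \in S_{ij}} \{\, c[i, k] + c[k, j] + 1 \,\} & \text{if } S_{ij} \ne \emptyset.
\end{cases}
\]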
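The recurrence for c[i, j] can be evaluated bottom-up in order of increasing j - i. The following Python sketch is one possible way to do this in cubic time; it assumes that activities are (start, finish) pairs with 0 <= start < finish, and the function name and the sample instance are made up for illustration:

```python
import math


def max_compatible_dp(activities):
    """Size of a largest set of mutually compatible activities, via the c[i, j] table."""
    acts = sorted(activities, key=lambda a: a[1])            # sort by finish time
    acts = [(0, 0)] + acts + [(math.inf, math.inf)]          # fictitious a_0 and a_{n+1}
    m = len(acts)
    c = [[0] * m for _ in range(m)]
    for length in range(2, m):                                # length = j - i
        for i in range(m - length):
            j = i + length
            best = 0
            for k in range(i + 1, j):
                s_k, f_k = acts[k]
                # a_k belongs to S_ij iff f_i <= s_k < f_k <= s_j
                if acts[i][1] <= s_k and f_k <= acts[j][0]:
                    best = max(best, c[i][k] + c[k][j] + 1)
            c[i][j] = best
    return c[0][m - 1]


# Hypothetical instance, not taken from the notes.
sample = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11)]
print(max_compatible_dp(sample))  # 3, for example (1, 4), (5, 7), (8, 11)
```

The table has O(n^2) entries and each one takes O(n) time, which is far more work than the greedy approach needs.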

1.2 Activity Selection with Greedy Approach

The first call is RECURSIVE-ACTIVITY-SELECTOR(s, f, 0). The running time is \(O(n)\) (assuming that the activities are sorted by finish time). We will prove that the algorithm finds an optimal schedule.
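The pseudocode of RECURSIVE-ACTIVITY-SELECTOR is not reproduced in the text, so the following Python sketch is only a guess at its shape, based on the usual textbook formulation. It assumes that the third argument k is the index of the last selected activity, that index 0 holds the fictitious activity with f[0] = 0, and that n is recovered from the array length:

```python
def recursive_activity_selector(s, f, k):
    """Return the indices of the activities selected after activity k.

    s and f are 1-indexed: s[0], f[0] describe the fictitious activity a_0,
    and f[1..n] is sorted in nondecreasing order.
    """
    n = len(s) - 1
    m = k + 1
    # Skip the activities that start before a_k finishes.
    while m <= n and s[m] < f[k]:
        m += 1
    if m <= n:
        return [m] + recursive_activity_selector(s, f, m)
    return []


# Hypothetical instance; position 0 is the fictitious activity a_0 = [0, 0).
s = [0, 1, 3, 0, 5, 3, 5, 6, 8]
f = [0, 4, 5, 6, 7, 8, 9, 10, 11]
print(recursive_activity_selector(s, f, 0))  # [1, 4, 8]
```

Each activity is examined exactly once over all recursive calls, which is consistent with the linear running time stated above.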

Theorem. Consider any nonempty subproblem \(S_{ij}\), and let \(a_m\) be the activity in \(S_{ij}\) with the earliest finish time:

\[ f_m = \min \{ f_k \mid a_k \in S_{ij} \}. \]

Then

1. Activity \(a_m\) is used in some optimal solution for \(S_{ij}\).

2. The subproblem \(S_{im}\) is empty, so choosing \(a_m\) leaves only one subproblem \(S_{mj}\).

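Proof. 2. Suppose that there is \(a_k \in S_{im}\). Then \(f_i \le s_k < f_k \le s_m < f_m\), so \(a_k \in S_{ij}\) and \(f_k < f_m\). This contradicts the choice of \(a_m\).

1. Let \(A_{ij}\) be an optimal solution for \(S_{ij}\). We sort the activities of \(A_{ij}\) by finish time. Let \(a_k\) be the first activity in \(A_{ij}\). If \(a_k = a_m\) then we are done. Otherwise construct \(A'_{ij} = (A_{ij} \setminus \{a_k\}) \cup \{a_m\}\). The activities in \(A'_{ij}\) are compatible because \(f_m \le f_k\), and \(|A'_{ij}| = |A_{ij}|\), so \(A'_{ij}\) is an optimal solution that uses \(a_m\).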

The recursive algorithm can be converted to an iterative one:

GREEDY-ACTIVITY-SELECTOR(s, f)
// Input: s[1..n] is the array of start times, f[1..n] is the array of finish times.
// Activities are sorted by finish time.
n = length(s)
A = {a_1}
i = 1
for m = 2 to n
    if s[m] ≥ f[i] then
        A = A ∪ {a_m}
        i = m
return A
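The iterative procedure translates almost line by line into Python. This is a sketch rather than a reference implementation: it uses 0-based indices instead of the 1-based arrays of the pseudocode, returns indices instead of a set of activities, and the sample data is made up:

```python
def greedy_activity_selector(s, f):
    """Return 0-based indices of a maximum set of mutually compatible activities.

    Assumes f is sorted in nondecreasing order and len(s) == len(f).
    """
    if not s:
        return []
    selected = [0]          # the activity with the earliest finish time is always taken
    i = 0                   # index of the most recently selected activity
    for m in range(1, len(s)):
        if s[m] >= f[i]:    # a_m starts after a_i finishes
            selected.append(m)
            i = m
    return selected


s = [1, 3, 0, 5, 3, 5, 6, 8]
f = [4, 5, 6, 7, 8, 9, 10, 11]
print(greedy_activity_selector(s, f))  # [0, 3, 7]
```

The single pass over the sorted activities does constant work per activity.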

The running time of GREEDY-ACTIVITY-SELECTOR is \(O(n)\).

Greedy Strategy

1. Cast the optimization problem as one in which we make a choice and are left with one subproblem to solve.

2. Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe.

3. Show that if we combine the greedy choice and an optimal solution to the subproblem, we arrive at an optimal solution to the original problem.

2 Knapsack Problem

There are \(n\) items; the \(i\)-th item is worth \(v_i\) dollars and weighs \(w_i\) pounds, where \(v_i\) and \(w_i\) are integers. Select items to put in a knapsack so that the total weight is at most \(W\) and the total value is maximized.

0-1 knapsack problem: each item must either be taken or left behind.

Fractional knapsack problem: fractions of items are allowed.

Consider an example with \(W = 50\) and 3 items:

i                    1      2      3
w_i (pounds)        10     20     30
v_i                $60   $100   $120
value per pound     $6     $5     $4


Greedy choice: take an item with maximum value per pound.

For the 0-1 knapsack problem, the greedy algorithm takes item 1 and then item 2; item 3 no longer fits. Total value $160. But the optimal solution is to take items 2 and 3. Total value $220. So the greedy strategy doesn't work for the 0-1 knapsack problem.

The fractional knapsack problem can be solved by the greedy algorithm. In the above example, it takes item 1, item 2 and 2/3 of item 3. Total weight 50. Total value $240, which is optimal.
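The three totals in this example can be checked with a short Python sketch. The function names are illustrative, and the dynamic program used for the optimal 0-1 value is a standard technique that is not described in these notes; it is included only to confirm the value $220:

```python
from fractions import Fraction


def by_value_per_pound(weights, values):
    """Item indices sorted by decreasing value per pound."""
    return sorted(range(len(weights)),
                  key=lambda i: Fraction(values[i], weights[i]), reverse=True)


def greedy_01(weights, values, capacity):
    """Greedy choice applied to the 0-1 problem: take whole items while they fit."""
    total, remaining = 0, capacity
    for i in by_value_per_pound(weights, values):
        if weights[i] <= remaining:
            total += values[i]
            remaining -= weights[i]
    return total


def greedy_fractional(weights, values, capacity):
    """Greedy choice for the fractional problem: the last item may be taken partially."""
    total, remaining = Fraction(0), capacity
    for i in by_value_per_pound(weights, values):
        take = min(weights[i], remaining)
        total += Fraction(values[i] * take, weights[i])
        remaining -= take
    return total


def optimal_01(weights, values, capacity):
    """Optimal 0-1 value by dynamic programming over capacities (integer weights)."""
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        for cap in range(capacity, w - 1, -1):
            best[cap] = max(best[cap], best[cap - w] + v)
    return best[capacity]


w, v, W = [10, 20, 30], [60, 100, 120], 50
print(greedy_01(w, v, W), optimal_01(w, v, W), greedy_fractional(w, v, W))  # 160 220 240
```

Exact fractions are used so that the value of the partially taken item is not affected by rounding.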

3 Huffman code

We want to compress a file by encoding characters with binary codes. One idea is the fixed-length code. Example: a = 00, b = 01, c = 11. Then acb = 001101. Decode 010011 = ?

Variable-length code. Example: a = 0, b = 10, c = 11. Then acb = 01110. Decode 010100 = ?

A binary sequence can be decoded unambiguously if no codeword is a prefix of another codeword. We call such a code a prefix code. It can be represented by a binary tree.

[Figure: binary tree of a prefix code with leaves a, b, c, d, e and edges labeled 0 and 1.]

a = 10, b = 011, c = 11, d = 00, e = 010

Encode ace.

Decode 0100010.
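Encoding with a prefix code is a table lookup, and decoding can read the bits from left to right and emit a character as soon as the bits read so far form a codeword. A minimal Python sketch for the code above; the names `CODE`, `encode` and `decode` are illustrative:

```python
CODE = {"a": "10", "b": "011", "c": "11", "d": "00", "e": "010"}


def encode(text, code=CODE):
    """Concatenate the codewords of the characters of text."""
    return "".join(code[ch] for ch in text)


def decode(bits, code=CODE):
    """Decode a bit string; correct only if code is a prefix code."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:          # no codeword is a prefix of another,
            out.append(inverse[current])  # so the first match is the only one
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)


message = "bead"
assert decode(encode(message)) == message
```

The two exercises above can be checked by calling `encode("ace")` and `decode("0100010")`.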

Optimal code problem. Given an alphabet \(C\) of \(n\) characters and the frequency of each character, find an optimal prefix code, that is, one for which the compressed file has minimum length.

Huffman's algorithm constructs an optimal tree \(T\). The characters are the leaves of \(T\).

Greedy choice: select two vertices \(x\) and \(y\) of lowest frequency and replace them by a vertex \(z\) so that \(x\) and \(y\) are the children of \(z\) and the frequency of \(z\) is the sum of the frequencies of \(x\) and \(y\).

[Figure: the vertices x and y before the merge, and as the children of the new vertex z after it.]

Example:

character    a    b    c    d    e    f
frequency   45   13   12   16    9    5

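[Figure: construction of the Huffman tree for this example, panels 1) to 5,6). The vertices e and f are merged into a vertex of frequency 14, b and c into a vertex of frequency 25, and d and the vertex of frequency 14 into a vertex of frequency 30; the root of the final tree has frequency 100.]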


Huffman's algorithm:

HUFFMAN(C)
1  n = |C|
2  Q = C                            // Q is the min-priority queue
3  for i = 1 to n - 1
4      allocate a new node z
5      left[z] = x = EXTRACT-MIN(Q)
6      right[z] = y = EXTRACT-MIN(Q)
7      f[z] = f[x] + f[y]
8      INSERT(Q, z)
9  return EXTRACT-MIN(Q)            // return the root of the tree

Using a binary min-heap, the initialization in line 2 takes \(O(n)\) time. The loop in lines 3-8 runs \(n - 1\) times, and every EXTRACT-MIN() and INSERT() takes \(O(\lg n)\) time. The total time is \(O(n \lg n)\).
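HUFFMAN can be sketched in Python with the standard `heapq` module as the min-priority queue. The representation is an assumption made for this sketch: leaves are the characters themselves (strings), an internal node is a pair (left, right), and left edges are labeled 0 and right edges 1. The alphabet is assumed to contain at least two characters:

```python
import heapq
from itertools import count


def huffman(freq):
    """Build a Huffman tree for freq, a dict mapping character -> frequency."""
    order = count()  # tie-breaker, so the heap never has to compare two nodes
    heap = [(f, next(order), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                          # line 2: O(n)
    for _step in range(len(freq) - 1):           # lines 3-8: n - 1 merges
        fx, _ox, x = heapq.heappop(heap)         # EXTRACT-MIN
        fy, _oy, y = heapq.heappop(heap)         # EXTRACT-MIN
        heapq.heappush(heap, (fx + fy, next(order), (x, y)))  # INSERT z
    return heapq.heappop(heap)[2]                # the root of the tree


def codewords(tree, prefix=""):
    """Read the codewords off the tree: left edge = 0, right edge = 1."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = codewords(huffman(freq))
print(sum(freq[ch] * len(code[ch]) for ch in freq))  # 224 bits in total
```

On the example frequencies the merges create vertices of frequency 14, 25, 30, 55 and 100, and the resulting codeword lengths are 1 for a, 3 for b, c, d and 4 for e, f.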

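Theorem. Huffman's algorithm produces an optimal code.

Proof. Let \(B(T)\) be the cost of a binary tree \(T\) (the length of the encoded text in bits), i.e.

\[ B(T) = \sum_{c \in C} f(c)\, d_T(c), \]

where \(f(c)\) is the frequency of a character \(c\) and \(d_T(c)\) is its depth in \(T\).

Part 1. If \(T\) is an optimal tree then it is full, i.e. a vertex \(u\) cannot be the only child of its parent \(v\). Otherwise \(u\) could be merged with \(v\), which decreases the depth of every leaf below \(u\) and therefore the cost.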

[Figure: a vertex u that is the only child of its parent v is merged with v (u = v).]

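Part 2. Let \(a\) and \(b\) be two least frequent characters from \(C\). Let \(T\) be an optimal tree with two siblings \(a'\) and \(b'\) of maximum depth (they exist by Part 1). Make a tree \(T'\) by exchanging \(a\) and \(a'\). Since \(d_{T'}(a) = d_T(a')\) and \(d_{T'}(a') = d_T(a)\),

\[
\begin{aligned}
B(T) - B(T') &= f(a)\,(d_T(a) - d_T(a')) + f(a')\,(d_T(a') - d_T(a)) \\
             &= (f(a') - f(a))\,(d_T(a') - d_T(a)) \ge 0,
\end{aligned}
\]

because \(f(a) \le f(a')\) and \(d_T(a) \le d_T(a')\). So \(T'\) is not worse than \(T\). In the same way we can replace \(b'\) by \(b\).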

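[Figure: the tree T with siblings a' and b' of maximum depth, the tree T' obtained by exchanging a and a', and the tree obtained after also exchanging b and b', in which a and b are siblings.]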

Part 3. By Part 2, we want to find a tree \(T\) minimizing \(B(T)\) with the constraint that \(a\) and \(b\) are siblings in \(T\). Let \(T'\) be the binary tree \(T - \{a, b\}\), in which the parent of \(a\) and \(b\) becomes a new leaf \(x\) with \(f(x) = f(a) + f(b)\). Minimizing \(B(T)\) is the same as minimizing \(B(T')\), since \(B(T) = B(T') + (f(a) + f(b))\) and \(f(a) + f(b)\) is a constant.

[Figure: the tree T with sibling leaves a and b, and the tree T' in which they are replaced by the leaf x.]
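The identity used in Part 3 can be checked numerically on the six-character example, where the two least frequent characters are e and f (they play the role of a and b in the proof). In this Python sketch the trees are written as nested pairs with the characters as leaves; the particular left/right arrangement is an assumption and does not affect the cost:

```python
def depths(tree, depth=0):
    """Map every leaf (a string) of a nested-pair tree to its depth."""
    if isinstance(tree, str):
        return {tree: depth}
    result = {}
    for child in tree:
        result.update(depths(child, depth + 1))
    return result


def cost(tree, freq):
    """B(T): the sum of f(c) * d_T(c) over all leaves c."""
    return sum(freq[c] * d for c, d in depths(tree).items())


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
T = ("a", (("c", "b"), (("f", "e"), "d")))

# T' is T with the siblings e and f replaced by a new leaf x, f(x) = f(e) + f(f).
freq_reduced = {"a": 45, "b": 13, "c": 12, "d": 16, "x": freq["e"] + freq["f"]}
T_reduced = ("a", (("c", "b"), ("x", "d")))

print(cost(T, freq), cost(T_reduced, freq_reduced))  # 224 210
assert cost(T, freq) == cost(T_reduced, freq_reduced) + freq["e"] + freq["f"]
```

The difference of 14 is exactly f(e) + f(f): each of the two removed leaves was one level deeper than the leaf x that replaces them.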
