Lecture 2: Greedy Algorithms II Shang-Hua Teng


Lecture 2: Greedy Algorithms II

Shang-Hua Teng


Optimization Problems

• A problem that may have many feasible solutions.

• Each solution has a value

• In a maximization problem, we wish to find a solution that maximizes the value

• In a minimization problem, we wish to find a solution that minimizes the value


Data Compression

• Suppose we have a 1,000,000,000-character (1G) data file that we wish to include in an email.

• Suppose the file contains only the 26 letters {a,…,z}.

• Suppose each letter α in {a,…,z} occurs with frequency f_α.

• Suppose we encode each letter by a binary code

• If we use a fixed-length code, we need 5 bits for each character

• The resulting message length is 5(f_a + f_b + … + f_z)

• Can we do better?
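The 5-bit figure is the smallest whole number of bits that can distinguish 26 letters, since 2^4 = 16 < 26 ≤ 32 = 2^5. A minimal Python check of that arithmetic, with illustrative variable names that are not part of the lecture:

```python
alphabet_size = 26

# Smallest b with 2**b >= alphabet_size, computed in exact integer arithmetic.
bits_per_char = (alphabet_size - 1).bit_length()

file_chars = 1_000_000_000  # the 1G-character file from the slide
total_bits = bits_per_char * file_chars
```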


Huffman Codes

• Most character code systems (ASCII, fixed-width Unicode encodings such as UTF-32) use fixed-length encoding

• If frequency data is available and there is a wide variety of frequencies, variable-length encoding can save 20% to 90% of the space

• Which characters should get shorter codes, and which should get longer ones?


Data Compression: A Smaller Example

• Suppose the file only has 6 letters {a,b,c,d,e,f} with frequencies

                    a     b     c     d     e     f
  Frequency        .45   .13   .12   .16   .09   .05
  Fixed length     000   001   010   011   100   101
  Variable length  0     101   100   111   1101  1100

• Fixed length: 3G = 3,000,000,000 bits

• Variable length: (.45·1 + .13·3 + .12·3 + .16·3 + .09·4 + .05·4) G = 2.24G bits
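The arithmetic on this slide can be checked with a short Python sketch. Frequencies are written as integer percentages so the sums stay exact; the dictionary and function names are illustrative assumptions, not part of the lecture.

```python
# Frequencies in percent (a: .45 -> 45, ...), so all arithmetic is exact.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

fixed = {"a": "000", "b": "001", "c": "010",
         "d": "011", "e": "100", "f": "101"}
variable = {"a": "0", "b": "101", "c": "100",
            "d": "111", "e": "1101", "f": "1100"}


def bits_per_100_chars(code, freq):
    """Total bits needed to encode 100 characters drawn with these frequencies."""
    return sum(freq[ch] * len(code[ch]) for ch in freq)


fixed_cost = bits_per_100_chars(fixed, freq)        # 300, i.e. 3 bits per character
variable_cost = bits_per_100_chars(variable, freq)  # 224, i.e. 2.24 bits per character
```

Scaling 2.24 bits per character up to the 1G-character file gives the 2.24G bits quoted above, a saving of roughly 25% over the fixed-length code.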


How to decode?

• At first it is not obvious how a variable-length message can be decoded, but decoding is possible if we use prefix codes


Prefix Codes

• No encoding of a character can be the prefix of the longer encoding of another character. For example, we could not encode t as 01 and x as 01101, since 01 is a prefix of 01101

• By using a binary tree representation we will generate prefix codes, provided all letters are leaves
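The prefix condition is easy to test mechanically. The following is a minimal sketch in Python; the function name `is_prefix_free` is an illustrative assumption, and the quadratic pairwise check is chosen for clarity rather than speed.

```python
from itertools import permutations


def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of a different codeword in the list."""
    return not any(longer.startswith(shorter)
                   for shorter, longer in permutations(codewords, 2))


bad = is_prefix_free(["01", "01101"])  # t = 01, x = 01101 from the slide
good = is_prefix_free(["0", "101", "100", "111", "1101", "1100"])
```

Here `bad` is False because 01 is a prefix of 01101, while `good` is True for the six-letter variable-length code used in this lecture.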


Prefix codes

• A message can be decoded uniquely.

• Follow the tree from the root until you reach a leaf, output its letter, and then repeat from the root!

• Draw a few more trees and produce their codes!


Some Properties

• Prefix codes allow easy decoding

– Given a: 0, b: 101, c: 100, d: 111, e: 1101, f: 1100

– Decode 001011101 going left to right: 0|01011101 → a|0|1011101 → a|a|101|1101 → a|a|b|1101 → a|a|b|e

• An optimal code is always represented by a full binary tree (a tree where every internal node has two children)

• A full binary tree with |C| leaves has |C| − 1 internal nodes

• The number of bits to encode a file is

\[ B(T) = \sum_{c \in C} f(c)\, d_T(c) \]

where f(c) is the frequency of c and d_T(c) is the depth of c in the tree T, which equals the code length of c
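The left-to-right decoding in the example above can be sketched in Python as follows. This version uses a lookup table of codewords instead of walking a tree, which is an illustrative simplification; it relies on the code being a prefix code, so the first codeword that matches is the only one that can.

```python
code = {"a": "0", "b": "101", "c": "100",
        "d": "111", "e": "1101", "f": "1100"}


def decode(bits, code):
    """Decode a bit string produced by a prefix code, scanning left to right."""
    letter_of = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in letter_of:   # reached a leaf: emit the letter and restart
            out.append(letter_of[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


decoded = decode("001011101", code)  # "aabe", matching a|a|b|e on the slide
```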


Optimal Prefix Coding Problem

• Input: a set of n letters (c_1,…, c_n) with frequencies (f_1,…, f_n).

• Construct a full binary tree T to define a prefix code that minimizes the average code length

\[ \mathrm{Average}(T) = \sum_{i=1}^{n} f_i \cdot \mathrm{length}_T(c_i) \]


Greedy Algorithms

• Many optimization problems can be solved more quickly using a greedy approach

– The basic principle is that locally optimal decisions may be used to build an optimal solution

– But the greedy approach does not lead to an optimal solution for every problem

– The key is knowing which problems will work with this approach and which will not

• We will study

– The problem of generating Huffman codes


Greedy algorithms

• A greedy algorithm always makes the choice that looks best at the moment

– My everyday examples:

• Driving in Los Angeles, or even Boston for that matter

• Playing cards

• Investing in stocks

• Choosing a university

– The hope: a locally optimal choice will lead to a globally optimal solution

– For some problems, it works

• Greedy algorithms tend to be easier to code


David Huffman’s idea

• A term paper at MIT

• Build the tree (code) bottom-up in a greedy fashion

• Origami aficionado


Building the Encoding Tree

[Figure sequence, 5 slides: the encoding tree built step by step]

The Algorithm

• An appropriate data structure is a binary min-heap

• Each heap operation costs O(lg n), and the loop runs n − 1 times (two extractions and one insertion per iteration), so the complexity is O(n lg n)

• The encoding is NOT unique: other encodings may work just as well, but none will work better
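The bottom-up greedy construction with a binary min-heap can be sketched in Python using the standard `heapq` module. This is a minimal illustration rather than the lecture's own pseudocode: the function name, the tuple-based tree representation, and the tie-breaking counter are all assumptions made for the sketch, and the alphabet is assumed to be non-empty.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Build a Huffman code for a non-empty dict {letter: frequency}."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)

    # Greedy step, repeated n - 1 times: merge the two least frequent subtrees.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    _, _, root = heap[0]
    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a letter
            code[node] = prefix or "0"   # a one-letter alphabet still needs a bit

    walk(root, "")
    return code


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = huffman_code(freq)
cost = sum(freq[ch] * len(code[ch]) for ch in freq)  # 224
```

On these frequencies the sketch yields code lengths 1, 3, 3, 3, 4, 4 for a through f and a cost of 224 bits per 100 characters. With different tie-breaking or a different left/right convention the codewords themselves could differ, which is the non-uniqueness noted above, but the cost would not improve.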


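Correctness of Huffman’s Algorithm

Lemma A: Let x and y be the two characters with the lowest frequencies in C. Then there is an optimal tree in which x and y appear as sibling leaves of maximum depth.

Take an optimal tree T and exchange x and y, one swap at a time, with two sibling leaves of maximum depth. Since each swap does not increase the cost, the resulting tree T’’ is also an optimal tree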


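Proof of Lemma A

• Let a and b be sibling leaves of maximum depth in an optimal tree T. Without loss of generality, assume f[a] ≤ f[b] and f[x] ≤ f[y]

• Let T’ be T with x and a exchanged, and let T’’ be T’ with y and b exchanged. The cost difference between T and T’ is

\[
\begin{aligned}
B(T) - B(T') &= \sum_{c \in C} f(c)\, d_T(c) - \sum_{c \in C} f(c)\, d_{T'}(c) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_{T'}(x) - f[a]\, d_{T'}(a) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_T(a) - f[a]\, d_T(x) \\
&= (f[a] - f[x])\,(d_T(a) - d_T(x)) \\
&\ge 0
\end{aligned}
\]

since x has the lowest frequency, so f[a] − f[x] ≥ 0, and a has maximum depth, so d_T(a) − d_T(x) ≥ 0. The same argument gives B(T’) − B(T’’) ≥ 0.

• B(T’’) ≤ B(T), but T is optimal, so B(T) ≤ B(T’’) ⇒ B(T’’) = B(T)

• Therefore T’’ is an optimal tree in which x and y appear as sibling leaves of maximum depth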


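Correctness of Huffman’s Algorithm

Lemma B: Let x and y be the two characters with the lowest frequencies in C, and let C’ = (C − {x, y}) ∪ {z} with f[z] = f[x] + f[y]. If T’ is an optimal tree for C’, then the tree T obtained from T’ by replacing the leaf z with an internal node whose children are x and y is an optimal tree for C.

• Observation: B(T) = B(T’) + f[x] + f[y] ⇒ B(T’) = B(T) − f[x] − f[y]

– For each c ∈ C − {x, y}: d_T(c) = d_T’(c) ⇒ f[c]d_T(c) = f[c]d_T’(c)

– d_T(x) = d_T(y) = d_T’(z) + 1

– f[x]d_T(x) + f[y]d_T(y) = (f[x] + f[y])(d_T’(z) + 1) = f[z]d_T’(z) + (f[x] + f[y])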


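B(T’) = B(T) − f[x] − f[y]

Example, with the leaves of frequency 5 and 9 merged into a single leaf z: 14

B(T) = 45*1 + 12*3 + 13*3 + 5*4 + 9*4 + 16*3 = 224

B(T’) = 45*1 + 12*3 + 13*3 + (5+9)*3 + 16*3 = 210 = B(T) − 5 − 9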


Proof of Lemma B

• Prove by contradiction.

• Suppose that T does not represent an optimal prefix code for C. Then there exists a tree T’’ such that B(T’’) < B(T).

• Without loss of generality, by Lemma A, T’’ has x and y as siblings. Let T’’’ be the tree T’’ with the common parent of x and y replaced by a leaf z with frequency f[z] = f[x] + f[y]. Then

• B(T’’’) = B(T’’) − f[x] − f[y] < B(T) − f[x] − f[y] = B(T’)

– T’’’ is better than T’, contradicting the assumption that T’ is an optimal prefix code for C’


How Did I Learn about Huffman Codes?

• I was taking an Information Theory class at USC from Professor Irving Reed (of Reed-Solomon codes)

• I was TAing for CSCI 303

• I taught a lecture on “Huffman Code” for Professor Miller

• I wrote a paper


Stable Matching (Marriage) Problem -- Continued

Shang-Hua Teng


Preference lists (most preferred first)

  Univ. 1: 3,5,2,1,4        Appl. 1: 3,2,5,1,4
  Univ. 2: 5,2,1,4,3        Appl. 2: 1,2,5,3,4
  Univ. 3: 4,3,5,1,2        Appl. 3: 4,3,2,1,5
  Univ. 4: 1,2,3,4,5        Appl. 4: 1,3,4,2,5
  Univ. 5: 2,3,4,1,5        Appl. 5: 1,2,4,5,3


Stability: “Mutually Cheating Hearts”

• Suppose we pair off all the universities and candidates. Now suppose that some university and some candidate prefer each other to the partners they are matched to.

• They are called a pair of MCHs, or a blocking pair
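The definition of a blocking pair translates directly into a check. The sketch below is written in Python on a small hypothetical two-university instance; the names `U1`, `U2`, `A`, `B` and the function `blocking_pairs` are illustrative and do not come from the lecture.

```python
def blocking_pairs(matching, univ_prefs, cand_prefs):
    """List all (university, candidate) pairs that prefer each other to their partners.

    matching maps each university to the candidate it is paired with.
    """
    univ_of = {c: u for u, c in matching.items()}
    pairs = []
    for u, prefs in univ_prefs.items():
        # Only candidates that u ranks strictly above its current partner matter.
        for c in prefs[:prefs.index(matching[u])]:
            ranking = cand_prefs[c]
            if ranking.index(u) < ranking.index(univ_of[c]):
                pairs.append((u, c))
    return pairs


# Hypothetical instance: everyone agrees that U1 and A are the best.
univ_prefs = {"U1": ["A", "B"], "U2": ["A", "B"]}
cand_prefs = {"A": ["U1", "U2"], "B": ["U1", "U2"]}

unstable = blocking_pairs({"U1": "B", "U2": "A"}, univ_prefs, cand_prefs)
stable = blocking_pairs({"U1": "A", "U2": "B"}, univ_prefs, cand_prefs)
```

In the first pairing U1 and A prefer each other to their assigned partners, so they form a pair of MCHs; the second pairing has no such pair and is therefore stable.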


Gale-Shapley Algorithm

• On each day that some university still has an opening, do the following:

– Morning

• Each university with an opening makes an offer to the best candidate to whom it has not yet made an offer

– Afternoon (for those candidates with at least one offer)

• To today’s best suitor: “Maybe, but for now I will keep your offer”

• To any others: “Not me, but good luck”

– Evening

• Each rejected university is ready to offer to the next candidate on its list

The day WILL COME that no university is rejected

Each candidate accepts the last university to whom she/he said “maybe”
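A minimal Python sketch of the procedure follows. It assumes equally many universities and candidates, each with a complete preference list, and it processes offers one at a time from a queue rather than in daily rounds, which is known to produce the same pairing. The instance and all names are hypothetical.

```python
from collections import deque


def gale_shapley(univ_prefs, cand_prefs):
    """University-proposing Gale-Shapley; returns {university: candidate}."""
    # rank[c][u] = position of university u on candidate c's list (0 = best).
    rank = {c: {u: i for i, u in enumerate(prefs)}
            for c, prefs in cand_prefs.items()}
    next_choice = {u: 0 for u in univ_prefs}  # next candidate each univ. will try
    held = {}                                 # candidate -> university on "maybe"
    free = deque(univ_prefs)                  # universities with an opening

    while free:
        u = free.popleft()
        c = univ_prefs[u][next_choice[u]]     # best candidate not yet offered to
        next_choice[u] += 1
        current = held.get(c)
        if current is None:
            held[c] = u                       # "maybe"
        elif rank[c][u] < rank[c][current]:
            held[c] = u                       # trade up, release the old offer
            free.append(current)
        else:
            free.append(u)                    # "not me, but good luck"
    return {u: c for c, u in held.items()}


univ_prefs = {"U1": ["A", "B", "C"], "U2": ["A", "C", "B"], "U3": ["B", "A", "C"]}
cand_prefs = {"A": ["U2", "U1", "U3"], "B": ["U1", "U3", "U2"], "C": ["U1", "U2", "U3"]}
pairing = gale_shapley(univ_prefs, cand_prefs)
```

On this instance A rejects U1 in favour of U2, B then trades U3 for U1, and U3 works down its list to C, giving the pairing U1-B, U2-A, U3-C.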


Improvement Lemma: If a candidate has an offer, then she/he will always have an offer from now on.

Corollary: Each candidate will accept her/his absolute best offer received during the process. [Advantage to candidate?]

Corollary: No University can be rejected by all the candidates

Corollary: A matching is achieved by Gale-Shapley


Theorem: The pairing produced by Gale-Shapley is stable.

– Proof by contradiction: Suppose USC and Adleman are a pair of MCHs.

– This means USC prefers Adleman to the person it is matched to, say, Lee.

– Thus, USC offered to Adleman before it offered to Lee.

– Adleman must have rejected USC for another university he preferred.

– By the Improvement Lemma, he must like his current university, say, UCLA, more than USC, so he does not prefer USC to his match.

Contradiction!


Each candidate will accept her/his absolute best offer received during the process. [Power to reject: Advantage to candidate?]

But each university has the power to determine when and to whom it makes an offer [Power to choose: Advantage to university]


Opinion Poll

Who is better off in the Gale-Shapley Algorithm, the universities or the candidates?


• How should we define what we mean when we say “the optimal candidate for USC”?

Flawed Attempt: “The candidate at the top of USC’s list”


The Optimal Match

A university’s optimal match is the highest-ranked candidate for whom there is some stable pairing in which the university and that candidate are matched.

That candidate is the best candidate the university can conceivably be matched to in a stable world.

Presumably, that candidate might be better than the candidate it gets matched to in the stable pairing output by Gale-Shapley.


The Pessimal Match

A university’s pessimal match is the lowest-ranked candidate for whom there is some stable pairing in which the university and that candidate are matched.

That candidate is the lowest-ranked candidate the university can conceivably be matched to in a stable world.


Dilemmas: power to reject or power to choose

•A pairing is university-optimal if every university gets its optimal candidate. This is the best of all possible stable worlds for every university simultaneously.

•A pairing is university-pessimal if every university gets its pessimal candidate. This is the worst of all possible stable worlds for every university simultaneously.


Dilemmas: power to reject or power to choose

•A pairing is candidate-optimal if every candidate gets her/his optimal job. This is the best of all possible stable worlds for every candidate simultaneously.

•A pairing is candidate-pessimal if every candidate gets her/his pessimal job. This is the worst of all possible stable worlds for every candidate simultaneously.


The Mathematical Truth is!

The Gale-Shapley Algorithm always produces a university-optimal and candidate-pessimal pairing.
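On a tiny instance the optimal and pessimal matches can be found by brute force, by listing every stable pairing. The Python sketch below does this for a hypothetical two-by-two instance; all names are illustrative, and exhaustive enumeration is only feasible for very small inputs.

```python
from itertools import permutations


def is_stable(matching, univ_prefs, cand_prefs):
    """True if the pairing {university: candidate} has no blocking pair."""
    univ_of = {c: u for u, c in matching.items()}
    for u, prefs in univ_prefs.items():
        for c in prefs[:prefs.index(matching[u])]:
            if cand_prefs[c].index(u) < cand_prefs[c].index(univ_of[c]):
                return False
    return True


def stable_pairings(univ_prefs, cand_prefs):
    """Enumerate every stable pairing by trying all assignments."""
    univs = list(univ_prefs)
    return [m for perm in permutations(cand_prefs)
            for m in [dict(zip(univs, perm))]
            if is_stable(m, univ_prefs, cand_prefs)]


# Hypothetical instance with two stable pairings.
univ_prefs = {"U1": ["A", "B"], "U2": ["B", "A"]}
cand_prefs = {"A": ["U2", "U1"], "B": ["U1", "U2"]}
stable = stable_pairings(univ_prefs, cand_prefs)

# Optimal match of each university: its best partner over all stable pairings.
univ_optimal = {u: min((m[u] for m in stable), key=univ_prefs[u].index)
                for u in univ_prefs}

# Pessimal match of each candidate: her/his worst partner over all stable pairings.
cand_pessimal = {}
for c in cand_prefs:
    partners = [u for m in stable for u, matched in m.items() if matched == c]
    cand_pessimal[c] = max(partners, key=cand_prefs[c].index)
```

Here both U1-A, U2-B and U1-B, U2-A are stable. The first gives every university its optimal match and every candidate her/his pessimal match, and it is also what university-proposing Gale-Shapley outputs on this instance, since each university's first offer goes to a different candidate and is never rejected.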


Theorem: Gale-Shapley produces a university-optimal pairing

– Suppose not: i.e. that some univ. gets rejected by its optimal match during GS.

– In particular, let’s say UCLA is the first univ. to be rejected by its optimal match, Adleman. Let’s say Adleman said “maybe” to USC, whom he prefers.

– Since UCLA is the first univ. to be rejected by its optimal match, USC has not yet been rejected by its own optimal match, so USC must like Adleman at least as much as USC’s optimal match.


We are assuming that Adleman is UCLA’s optimal match, that Adleman likes USC more than UCLA, and that USC likes Adleman at least as much as its optimal match.

• We now show that any pairing S in which UCLA hires Adleman cannot be stable (for a contradiction).

• Suppose S is stable:

– USC likes Adleman more than its partner in S

• USC likes Adleman at least as much as USC’s best match, but USC is not matched to Adleman in S

– Adleman likes USC more than his university UCLA in S

So USC and Adleman are a blocking pair in S. Contradiction!


• We’ve shown that any pairing in which UCLA hires Adleman cannot be stable.

– Thus, Adleman cannot be UCLA’s optimal match (since he can never work there in a stable world).

– So no university ever gets rejected by its optimal match in Gale-Shapley, and thus Gale-Shapley is university-optimal.


Theorem: The Gale-Shapley pairing is candidate-pessimal.