Upload
truongque
View
214
Download
0
Embed Size (px)
Citation preview
CSCE 411 Design and Analysis of
Algorithms Set 4: Transform and Conquer Slides by Prof. Jennifer Welch
Spring 2014
CSCE 411, Spring 2014: Set 4 1
General Idea of Transform & Conquer 1. Transform the original problem instance into
a different problem instance 2. Solve the new instance 3. Transform the solution of the new instance
into the solution for the original instance
CSCE 411, Spring 2014: Set 4 2
Varieties of Transform & Conquer [Levitin] n Transform to a simpler or more convenient
instance of the same problem n “instance simplification”
n Transform to a different representation of the same instance
n “representation change”
n Transform to an instance of a different problem with a known solution
n “problem reduction” CSCE 411, Spring 2014: Set 4 3
Instance Simplification: Presorting n Sort the input data first n This simplifies several problems:
n checking whether a particular element in an array is unique
n computing the median and mode (value that occurs most often) of an array of numbers
n searching for a particular element n once array is sorted, we can use the decrease &
conquer binary search algorithm n used in several convex hull algorithms
CSCE 411, Spring 2014: Set 4 4
Instance Simplification: Solving System of Equations n A system of n linear equations in n unknowns:
n a11x1 + a12x2 + … + a1nxn = b1 n … n an1x1 + an2x2 + … + annxn = bn
n Cast as a matrix problem: n Ax = b, where A is n x n matrix, x and b are n-vectors
n To solve for all the x’s, solve Ax = b for x
CSCE 411, Spring 2014: Set 4 5
Motivation for Solving Systems of Linear Equations n http://aix1.uottawa.ca/~jkhoury/system.html
n geometry n networks n heat distribution n chemistry n economics n linear programming n games
CSCE 411, Spring 2014: Set 4 6
Solving System of Equations n One way to solve Ax = b for x:
n compute A−1
n multiply both sides by A−1
n A−1Ax = A−1b n x = A−1b
n Drawback is that computing matrix inverses suffers from numerical instability in practice
n Try another approach…
CSCE 411, Spring 2014: Set 4 7
LUP Decomposition n If A is triangular, solving Ax = b for x is easy and fast using
successive substitutions (how fast?) n Transform this problem into one involving only triangular
matrices n instance simplification!
n Find n n x n matrix L with all 1’s on diagonal and all 0’s above the
diagonal (“unit lower-triangular”) n n x n matrix U with all 0’s below the diagonal (“upper-triangular”) n n x n matrix P of 0’s and 1’s with exactly one 1 in each row and
each column (“permutation matrix”)
such that PA = LU
CSCE 411, Spring 2014: Set 4 8
Using LUP Decomposition n We want to solve Ax = b. n Assume we have L, U and P with desired properties so that PA = LU n Multiply both sides of Ax = b by P to obtain PAx = Pb
n Since P is a permutation matrix, Pb is easy to compute and is just a reordering of the vector b, call it b’
n Substitute LU for PA to obtain LUx = b’ n Let y be the vector (as of yet unknown) that equals Ux;
rewrite as Ly = b’ n although U is known, x is not yet known
n Solve Ly = b’ for y n since L is triangular, this is easy
n Now that y is known, solve y = Ux for x n since U is triangular, this is easy
CSCE 411, Spring 2014: Set 4 9
Solving Ax = b with LUP Decomp. n Assuming the L, U, and P are given,
pseudocode is on p. 817 of [CLRS] n Running time is Θ(n2) n Example: <board>
n Calculating L, U and P is more involved and takes Θ(n3) time. (See [CLRS].)
CSCE 411, Spring 2014: Set 4 10
Instance Simplification: Balanced Binary Search Trees n Transform an unbalanced binary search tree
into a balanced binary search tree n Benefit is guaranteed O(log n) time for
searching, inserting and deleting as opposed to possibility of Θ(n) time
n Examples: n AVL trees n red-black trees n splay trees
CSCE 411, Spring 2014: Set 4 11
Representation Change: Balanced Search Trees n Convert a basic binary search tree into a
search tree that is more than binary: n a node can have more than two children n a node can store more than one data item
n Can get improved performance (w.r.t. constant factors)
n Examples: n 2-3 trees n B-trees
CSCE 411, Spring 2014: Set 4 12
B-Trees: Motivation n Designed for very large data sets that cannot all fit
in main memory at a time n Instead, data is stored on disk n Fact 1: Disk access is orders of magnitude slower
than main memory access n Typically a disk access is needed for each node
encountered during operations on a search tree n For a balanced binary search tree, this would be about
c log2 n, where c is a small constant and n is number of items
CSCE 411, Spring 2014: Set 4 13
B-Trees: Motivation n Can we reduce the time? n Even if not asymptotically, what about reducing the
constants? n Constants do matter
n Reduce the height by having a bushier tree n have more than two children at each node n store more than two keys in each node
n Fact 2: Each disk access returns a fixed amount of information (a page). n Size is determined by hardware and operating system n Typically 512 to 4096 bytes
n Let size of tree node be page size CSCE 411, Spring 2014: Set 4 14
B-Tree Applications n Keeping index information for large amounts
of data stored on disk n databases n file systems
CSCE 411, Spring 2014: Set 4 15
B-Tree Definition n B-tree with minimum degree t is a rooted tree such that
1. each node has between t−1 and 2t−1 keys, in increasing order (root can have fewer keys)
2. each non-leaf node has one more child than it has keys 3. all keys in a node’s i-th subtree lie between the node’s
(i−1)st key and its i-th key 4. all leaves have the same depth
n Points 1-3 are generalization of binary search trees to larger branching factor
n Point 4 controls the height
CSCE 411, Spring 2014: Set 4 16
B-Tree Example
n B-tree with minimum degree 2 1. each node has between 1 and 3 keys, in sorted order 2. each non-leaf node has 2 to 4 children, one more than number
of keys 3. keys are in proper subtrees 4. all leaves have depth 1
CSCE 411, Spring 2014: Set 4 17
26 41
13 17 28 30 31 45 50
B-Tree Height n Theorem: Any n-key B-tree with minimum
degree t has height h ≤ logt((n+1)/2). n Height is still O(log n) but logarithm base is t
instead of 2 n savings in constant factor of log2t, which is
substantial since t is generally very large n Remember: log2x = (log2t)*(logtx)
n Proof: Calculate minimum number of keys in a B-tree of height h and solve for h.
CSCE 411, Spring 2014: Set 4 18
Searching in a B-Tree n Straightforward generalization of searching in a
binary search tree n to search for k, start at root:
1. find largest i such that k ≤ ith key in current node 2. if k = ith key then return “found” 3. elseif current node is a leaf then return “not found” 4. else recurse on root of ith subtree
CSCE 411, Spring 2014: Set 4 19
Running Time of B-Tree Search n CPU time:
n Line 1 takes O(t) (or O(log2 t) if using binary search) n Number of recursive calls is O(height) = O(logt n) n Total is O(t logt n)
n Number of disk accesses: n each recursive call requires at most one disk
access, to get the next node n O(logt n) (the height)
CSCE 411, Spring 2014: Set 4 20
B-Tree Insert n To insert a new key, need to
n obey bounds on branching factor / maximum number of keys per node
n keep all leaves at the same depth
n Do some examples on a B-tree with minimum degree 2 n each node has 1, 2, or 3 keys n each node has 2, 3, or 4 children CSCE 411, Spring 2014: Set 4 21
B-Tree Insert Examples
CSCE 411, Spring 2014: Set 4 22
F T
A D H L Q U Z
F T
C D H L Q U Z A
insert C
B-Tree Insert Examples
CSCE 411, Spring 2014: Set 4 23
F T
C D H L Q U Z A
insert M
F L
C D H M Q U Z A
T
M goes in a full node; split the node in two; promote the median L; insert M
B-Tree Insert Examples
CSCE 411, Spring 2014: Set 4 24
insert B
F L
C D H M Q U Z A
T
B goes in full leaf, so split leaf and promote median C. C goes in full root, so split root and promote median L to make a new root (only way height increases). But this is a 2-pass algorithm => twice as many disk accesses. To avoid 2 passes, search phase always recurses down to a non-full node...
B-Tree Insert with One Pass
CSCE 411, Spring 2014: Set 4 25
F L
C D H M Q U Z A
T
To insert B, start at root to find proper place; proactively split root since it is full
F
L
C D H M Q U Z A
T
B-Tree Insert with One Pass
CSCE 411, Spring 2014: Set 4 26
Recurse to node containing F; since not full no need to split.
F
L
C D H M Q U Z A
T
Recurse to left-most leaf, where B belongs. Since it is full, split it, promote the median C to the parent, and insert B.
B-Tree Insert with One Pass
CSCE 411, Spring 2014: Set 4 27
Final result of inserting B.
L
F C
D H M Q U Z
T
B A
Splitting a B-Tree Node n split(x,i,y) n input:
n non-full node x n full node y which is ith child of x
n result: n split y into two equal size nodes with t−1 keys each n insert the median key of the old y into x
CSCE 411, Spring 2014: Set 4 28
Splitting a B-Tree Node
CSCE 411, Spring 2014: Set 4 29
x:
< 2t−1 keys
2t−1 keys α m β y:
i ... ... x:
≤ 2t−1 keys
t−1 keys α β y:
i ... ... m
t−1 keys
i+1
B-Tree Insert Algorithm n if root r is full (2t−1 keys) then
n allocate a new node s n make s the new root n make r the first child of s n split(s,1,r) n insert-non-full(s,k)
n else insert-non-full(r,k)
CSCE 411, Spring 2014: Set 4 30
B-Tree Insert Algorithm (cont’d) n procedure insert-non-full(x,k):
n if x is a leaf then n insert k in sorted order
n else n find node y that is root of subtree where k belongs n if y is full then split it n call insert-non-full recursively on correct child of x
(y if no split, 1st half of y if split and k < median of y, 2nd half of y if split and k > median of y)
CSCE 411, Spring 2014: Set 4 31
Running Time of B-Tree Insert n Same as search:
n O(t logt n) CPU time n O(logt n) disk access
n Practice (Homework?): insert F, S, Q, K, C, L, H, T, V, W into a B-tree with minimum degree t = 3
CSCE 411, Spring 2014: Set 4 32
Deleting from a B-Tree n Pitfalls:
n Be careful that a node does not end up with too few keys
n When deleting from a non-leaf node, need to rearrange the children (remember, number of children must be one greater than the number of keys)
CSCE 411, Spring 2014: Set 4 33
B-Tree Delete Algorithm
delete(x,k): // called initially with x = root
1. if k is in x and x is a leaf then delete k from x // we will ensure that x has ≥ t keys
2. if k is in x and x is not a leaf then
CSCE 411, Spring 2014: Set 4 34
k x
y z ...
...
B-Tree Delete Algorithm (cont’d) 2(a) if y has ≥ t keys then
find k’ = pred(k) // in y’s subtree delete(y,k’) // recursive call
replace k with k’ in x
CSCE 411, Spring 2014: Set 4 35
k x
y z
k’
k’ x
y z
B-Tree Delete Algorithm (cont’d) 2(b) else if z has ≥ t keys then
find k’ = succ(k) // in z’s subtree delete(z,k’) // recursive call
replace k with k’ in x
CSCE 411, Spring 2014: Set 4 36
k x
y z
k’
k’ x
y z
B-Tree Delete Algorithm (cont’d) 2(c) else // both y and z have < t keys
merge y, k, z into a new node w delete(w,k) // recursive call
CSCE 411, Spring 2014: Set 4 37
k x
y z k
x
y z w
B-Tree Delete Algorithm (cont’d) 3. if k is not in (internal) node x then
let y be root of x’s subtree where k belongs 3(a) if y has < t keys but has a neighboring sibling z
with ≥ t keys then y borrows a key from z via x // note moving subtrees
CSCE 411, Spring 2014: Set 4 38
25 45
10 20 22 30
x
z y
22 45
10 20 25 30
x
y z
B-Tree Delete Algorithm (cont’d) 3. if k is not in (internal) node x then
let y be root of x’s subtree where k belongs 3(b) if y has < t keys and has no neighboring sibling z
with ≥ t keys then merge y with sibling z, using intermediate key in x
CSCE 411, Spring 2014: Set 4 39
whether (a), (b) or neither was done, call delete(y,k)
25 45
20 30
x
z y
45
20 25 30
x
y
Behavior of B-Tree Delete n As long as k has not yet been found, we continue in a
single downward pass, with no backtracking. n If k is found in an internal node, we may have to find pred
or succ of k, call it k’, delete k’ from its old place, and then go back to where k is and replace k with k’.
n However, finding and deleting k’ can be done in a single downward pass, since k’ will be in a leaf (basic property of search trees).
n O(logt n) disk access n O(t logt n) CPU time
CSCE 411, Spring 2014: Set 4 40
Problem Reduction: Computing Least Common Multiple n lcm(m,n) is the smallest integer that is
divisible by both m and n n Ex: lcm(11,5) = 55 and lcm(24,60) = 120
n One algorithm for finding lcm: multiply all common factors of m and n, all factors of m not in n, and all factors of n not in m n Ex: 24 = 2*2*2*3, 60 = 2*2*3*5,
lcm(24,60) = (2*2*3)*2*5 = 120
n But how to find prime factors of m and n? CSCE 411, Spring 2014: Set 4 41
Reduce Least Common Multiple to Greatest Common Denominator n Try another approach. n gcd(m,n) is product of all common factors of m and n n So gcd(m,n)*lcm(m,n) includes every factor in both gcd
and lcm twice, every factor in m but not n exactly once, and every factor in n but not m exactly once
n Thus gcd(m,n)*lcm(m,n) = m*n. n I.e., lcm(m,n) = m*n/gcd(m,n) n So if we can solve gcd, we can solve lcm n And we can solve gcd with Euclid’s algorithm
CSCE 411, Spring 2014: Set 4 42
Problem Reduction: Computing Number of Paths in a Graph n How many paths of length 3 are there in this
graph between b and d?
CSCE 411, Spring 2014: Set 4 43
a b
c d
Computing Number of Paths in a Graph n Claim: Adjacency matrix A to the k-th power
gives number of paths of length (exactly) k between all pairs
n Reduce problem of computing number of paths to problem of multiplying matrices!
CSCE 411, Spring 2014: Set 4 44
Proof of Claim n Basis: A1 = A gives all paths of length 1 n Induction: Suppose Ak gives all paths of length k. Show for
Ak+1 = AkA. n (i,j) entry of Ak+1 is sum, over all vertices h, of (i,h) entry of
Ak times (h,j) entry of A:
CSCE 411, Spring 2014: Set 4 45
i j h
all paths from i to h with length k
path from h to j with length 1
Computing Number of Paths of length k n We have to compute Ak. n Do k-1 matrix multiplications
n brute force or Strassen’s n O(kn3) or O(kn2.8…) running time
n Or, do successive doubling (A2, A4, A8, A16,…) n about log2k multiplications n O(n3log k) or O(n2.8…log k) running time
CSCE 411, Spring 2014: Set 4 46
Problem Reduction Tool: Linear Programming n Many problems related to finding an optimal
solution for something can be reduced to an instance of the linear programming problem:
n optimize a linear function of several variables subject to constraints n each constraint is a linear equation or linear
inequality
CSCE 411, Spring 2014: Set 4 47
Linear Program Example n An organization wants to invest $100 million in stocks,
bonds, and cash. n Assume interest rates are:
n stocks: 10% n bonds: 7% n cash: 3%
n Institutional restrictions: n amount in stock cannot be more than a third of amount in bonds n amount in cash must be at least a quarter of the amount in stocks
and bonds
n How should money manager invest to maximize return?
CSCE 411, Spring 2014: Set 4 48
Mathematical Formulation of the Example n x = amount in stocks (in millions of dollars) n y = amount in bonds n z = amount in cash maximize (.10)*x + (.70)*y + (.03)*z subject to
x+y+z = 100 x ≤ y/3 z ≥ (x+y)/4 x ≥ 0, y ≥ 0, z ≥ 0
CSCE 411, Spring 2014: Set 4 49
General Linear Program maximize (or minimize) c1x1 + … + cnxn
subject to a11x1 + … + a1nxn ≤ (or ≥ or =) b1
a21x1 + … + a2nxn ≤ (or ≥ or =) b2
… am1x1 + … + amnxn ≤ (or ≥ or =) bm x1 ≥ 0, …, xn ≥ 0
CSCE 411, Spring 2014: Set 4 50
Linear Programs with 2 Variables maximize x1 + x2
subject to 4x1 – x2 ≤ 8 2x1 + x2 ≤ 10 5x1 – 2x2 ≥ –2 x1, x2 ≥ 0
CSCE 411, Spring 2014: Set 4 51
feasible region
objective function
x1 = 2, x2 = 6 is optimal solution
(not drawn to scale)
x1 x1 ≥ 0
x2
x2 ≥
0
Solving a Linear Program n Given a linear program, there are 3 possibilities:
n the feasible region is empty n the feasible region and the optimal value are unbounded n the feasible region is bounded and there is an optimal value
n Three ways to solve a linear program: n simplex method: travel around the feasible region from corner to
corner until finding optimal n worst-case exponential time, average case is polynomial time
n ellipsoid method: a divide-and-conquer approach n polynomial worst-case, but slow in practice
n interior point methods n polynomial worst-case, reasonable in practice
CSCE 411, Spring 2014: Set 4 52
most common in practice
Use of Linear Programming n Later we will study algorithms to solve linear
programs. n Now we’ll give some examples of converting
other problems into linear programs.
CSCE 411, Spring 2014: Set 4 53
Reducing a Problem to a Linear Program n What unknowns are involved?
n These will be the variables x1, x2,…
n What quantity is to be minimized or maximized? How to express this quantity in terms of the variables? n This will be the objective function
n What are the constraints on the problem and how to state them w.r.t. the variables? n Constraints must be linear
CSCE 411, Spring 2014: Set 4 54
Reducing a Problem to a Linear Program: Example n A tailor can sew pants and shirts. n It takes him 2.5 hours to sew a pair of pants and 3.5 hours
to sew a shirt. n A pair of pants uses 3 yards of fabric and a shirt uses 2
yards of fabric. n The tailor has 40 hours available for sewing and has 50
yards of fabric. n He makes a profit of $10 per pair of pants and $15 per
shirt. n How many pants and how many shirts should he sew to
maximize his profit? CSCE 411, Spring 2014: Set 4 55
Reducing a Problem to a Linear Program: Example Solution n Variables:
n x1 = number of pants to sew n x2 = number of shirts to sew
n Objective function: n maximize 10*x1 + 15*x2
n Constraints: n time: (2.5)*x1 + (3.5)*x2 ≤ 40 n fabric: 3*x1 + 2*x2 ≤ 50 n nonnegativity: x1 ≥ 0, x2 ≥ 0
CSCE 411, Spring 2014: Set 4 56
Knapsack Problem as a Linear Program n Suppose thief can steal part of an object
n “fractional” knapsack problem
n For each item j, 1 ≤ j ≤ n, n vj is value of (entire) item j n wj is weight of (entire) item j n xj is fraction of item j that is taken
n maximize v1x1 + … + vnxn
n subject to n w1x1 + … wnxn ≤ W (knapsack limit) n 0 ≤ xj ≤ 1, for j = 1,…,n
CSCE 411, Spring 2014: Set 4 57
A Shortest Path Problem as a Linear Program n What is the shortest path distance from s to t
in weighted directed graph G = (V,E,w)? n For each v in V, let dv be a variable modeling
the distance from s to v. maximize dt
subject to dv ≤ du + w(u,v) for each (u,v) in E ds = 0
CSCE 411, Spring 2014: Set 4 58