CSE 308-408 · Bioinformatics: Issues and Algorithms · Lopresti · Fall 2007 · Lecture 3
Bioinformatics: Issues and Algorithms
CSE 308-408 • Fall 2007 • Lecture 3
Introduction to Algorithms
Administrative issues
• Homework #1 has been posted on Blackboard and is due next Tuesday (Sept. 11) at 5:00 pm. Submit your work online using the Blackboard Assignment function.
CSE Department Distinguished Seminar Series
Topic: “Architecture of Product Lines”
Speaker: Dr. David M. Weiss, Avaya Laboratories
Location: Packard Lab 466
Date: Thurs., Sept. 6, 4:00 pm – 5:00 pm
Reception @ 3:30 pm in Packard Lobby
Algorithms
Questions to answer (starting today):
• What is an algorithm?
• What is the difference between an algorithm and a program?
• What makes one algorithm better than another?
Skills to develop (over the course of the semester):
• Reading and understanding the description of an algorithm.
• Translating a textual description into working program code.
• Writing about an algorithm so others can understand it. *
* Even if you've studied algorithms before, this one is probably new to you. In the context of this course, it's a question of some importance to your final grade.
Algorithms
What is an algorithm?
We may have some vague ideas about answers to these questions, but for our purposes we need to be rigorous.
An algorithm is a sequence of well-defined operations that solve a particular formal problem of interest.
Does that shed light on the matter? Or raise more questions?
• What is a well-defined operation?
• What does solve mean?
• What exactly is a formal problem?
Algorithms
Another viewpoint: an algorithm is a "black box" that transforms inputs (a problem statement) into outputs (a solution to the problem).
[Figure: universe of possible problems of a given type → specific problem instance → Algorithm → solution to problem instance]
Note that algorithm must be ready to solve any instance!
Operations
What is a well-defined operation?
• Goal is to specify algorithm so that it can be run on any computer, without limiting it to a specific architecture.
• Hence, operations should be abstract, but must reflect what can be done on real-world machines.
• Often called “pseudo-code” (like a programming language).
Some examples of typical operations:
Assignment
Format: a ← b
Effect: sets the variable a to the value of b. *
* Note: for convenience, we'll sometimes write '=' instead of '←'.
More operations
Arithmetic
Format: a + b, a – b, a * b, a / b, a^b
Effect: addition, subtraction, multiplication, division, and exponentiation of numbers.
Conditional
Format: if A is true
            B
        else
            C
Effect: If statement A is true, executes operation B, otherwise executes operation C.
More operations
We are allowed to create more complex functions by combining simpler operations:
Example: MAX(a,b)
    if a > b
        return a
    else
        return b
Effect: Returns the maximum of a and b.
Example: DIST(x1, y1, x2, y2)
    dx ← (x2 – x1)^2
    dy ← (y2 – y1)^2
    return SQRT(dx + dy)
Effect: Returns Euclidean distance between points (x1,y1) and (x2,y2).
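For comparison, here is one way MAX and DIST might look in a real language. This is only a sketch in Python (not the Perl used later in the course); the names max_of and dist are our own, and math.sqrt stands in for SQRT.

```python
import math

def max_of(a, b):
    """Return the larger of a and b (MAX in the pseudo-code)."""
    if a > b:
        return a
    else:
        return b

def dist(x1, y1, x2, y2):
    """Return the Euclidean distance between (x1, y1) and (x2, y2)."""
    dx = (x2 - x1) ** 2   # squared horizontal difference
    dy = (y2 - y1) ** 2   # squared vertical difference
    return math.sqrt(dx + dy)

print(max_of(3, 7))       # 7
print(dist(0, 0, 3, 4))   # 5.0
```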
A simple algorithm
Is our MAX function an algorithm?
Example: MAX(a,b)
    if a > b
        return a
    else
        return b
Hmmm ... we said an algorithm was a sequence of well-defined operations that solve a particular formal problem of interest.
• Are the operations well-defined? YES.
• Is the problem formally specified? YES.
• Does MAX always solve the problem? YES.
So MAX is an algorithm!
How about this?
Let's try making MAX a little simpler ...
Example: MAX2(a,b)
    return a
• Are the operations well-defined? YES.
• Is the problem formally specified? YES.
• Does MAX2 always solve the problem? NO. This version only works about half the time on average.
While this is a silly example, it highlights an important point. We generally insist that our algorithms work all of the time.
Techniques that provide no guarantees are called heuristics.
Brief digression ...
Soon, we'll be learning the Perl programming language.
Why are we using pseudo-code? Why not use Perl?
• Generality ⇒ don't tie algorithm to a specific language.
• It's traditional to use pseudo-code for describing algorithms. (I.e., you'll see this when you read about algorithms.)
• A real runnable program requires a few more details that, for now, will only cloud the discussion.
More operations
FOR loops
Format: for i ← a to b
            B
Effect: Sets i to a and executes operation B. Repeats for i = a + 1, a + 2, ..., b – 1, b.
WHILE loops
Format: while A is true
            B
Effect: Checks condition A. If true, then executes operation B. Checks A again; if true, executes B again. Repeats until A is not true.
More operations
Array access
Format: a_i - or - a[i]
Effect: Returns ith item of array a = (a_1, ..., a_i, ..., a_n).
Note: it's sometimes convenient to assume that first item in array is stored at index 0 instead of index 1. In that case, we'll write (a_0, ..., a_i, ..., a_{n-1}) or maybe a[0], a[1], ..., a[n-1].
Example: if a = (1, 8, 2, 5, 7) then a_2 = 8, a_5 = 7, etc.
Arrays can be n-dimensional. E.g., if
    a = 1 8 2 5 7
        6 2 9 4 5
        3 7 0 9 8
then a_{2,3} = 9, etc.
Another simple algorithm
Consider the (computational) problem of making a copy of a given DNA sequence.
Formal problem statement:
String Duplication Problem: Given a string of letters, return a copy.
Input: A string s = (s_1, s_2, ..., s_n) of length n, as an array of characters.
Output: A string representing a copy of s.
Algorithm that will solve all instances of problem:
STRINGCOPY(s,n)
    for i ← 1 to n
        t_i = s_i
    return t
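As a rough illustration, STRINGCOPY could be written in Python as below. This is a sketch under our own assumptions: the string is held as a Python list of characters, indices run from 0 rather than 1, and the name string_copy is ours.

```python
def string_copy(s, n):
    """Return a new list holding the first n characters of s, copied one by one."""
    t = [None] * n            # allocate the output array
    for i in range(n):        # pseudo-code runs i from 1 to n; Python uses 0 to n-1
        t[i] = s[i]
    return t

dna = list("GATTACA")
copy = string_copy(dna, len(dna))
print("".join(copy))
```

The result is a separate array, so changing the copy leaves the original sequence untouched.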
A slightly harder problem
Consider making change in a financial transaction optimally.
What does “optimally” mean?
• Least annoying way (from customer's perspective).
What does “least annoying” mean?
• Fewest number of coins.
• (Could also be lightest weight, most “useful” coins, etc.)
Say you go to McDonald's for dinner and your bill totals $19.23. You hand the cashier a $20 bill. Your change could be:
• 3 quarters + 2 pennies
• 77 pennies
• etc.
The change problem
United States Change Problem: Convert some amount of money into the fewest number of U.S. coins.
Input: An amount of money, M, in cents.
Output: Smallest number of quarters q, dimes d, nickels n, and pennies p whose value adds up to M. (I.e., 25q + 10d + 5n + p = M and q + d + n + p is as small as possible.)
General approach to the solution (conventional wisdom):
• Provide as much as possible using largest denomination.
• Make up remainder as much as possible using next largest.
• Etc.
The change problem
A solution to the US change problem:
USCHANGE(M)
    while M > 0
        c ← Largest coin no bigger than M
        Give coin with denomination c to customer
        M ← M - c
This works, doesn't it? (Do you see any potential problems?)
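To make the loop concrete, here is a minimal Python sketch of USCHANGE. It assumes M is a non-negative whole number of cents; the names us_change and US_COINS are ours, and "giving a coin to the customer" is modeled by appending it to a list.

```python
US_COINS = (25, 10, 5, 1)   # quarter, dime, nickel, penny (in cents)

def us_change(m):
    """Greedy change-making: repeatedly hand over the largest coin that fits."""
    given = []
    while m > 0:
        c = max(coin for coin in US_COINS if coin <= m)  # largest coin no bigger than m
        given.append(c)                                  # give coin c to the customer
        m = m - c
    return given

print(us_change(77))   # [25, 25, 25, 1, 1]
```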
Let's get a bit more general
Change Problem: Convert some amount of money M into given denominations, using the smallest number of coins.
Input: An amount of money, M, and an array of d denominations c = (c_1, c_2, ..., c_d), in decreasing order of value.
Output: A list of d integers i_1, i_2, ..., i_d such that c_1 * i_1 + c_2 * i_2 + ... + c_d * i_d = M and i_1 + i_2 + ... + i_d is as small as possible.
Looks good! Now can't we just adapt our existing algorithm?
The change problem
Slightly more detailed solution:
USCHANGE(M)
    r ← M
    q ← r / 25
    r ← r – 25 * q
    d ← r / 10
    r ← r – 10 * d
    n ← r / 5
    r ← r – 5 * n
    p ← r
    return (q, d, n, p)
Note: division is assumed to return an integer result.
This works, but it seems awfully specific, doesn't it?
A more general change problem
Adapting our previous algorithm:
BETTERCHANGE(M,c,d)
    r ← M
    for k ← 1 to d
        i_k ← r / c_k
        r ← r – c_k * i_k
    return (i_1, i_2, ..., i_d)
Is this algorithm correct?
In other words, does it work for all possible inputs?
Is this algorithm correct?
BETTERCHANGE(M,c,d)
    r ← M
    for k ← 1 to d
        i_k ← r / c_k
        r ← r – c_k * i_k
    return (i_1, i_2, ..., i_d)
No. This algorithm will not work for all cases!
Consider making change for 40 cents:
• Today we would use one quarter, one dime, and one nickel.
• In the past, however, the US had a 20 cent coin ...
• ... in this case, optimal change would be two such coins.
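The 40-cent counterexample can be checked directly. The following Python sketch of BETTERCHANGE assumes integer amounts and denominations listed in decreasing order; the name better_change is ours, and the parameter d is dropped because Python lists carry their own length.

```python
def better_change(m, c):
    """Greedy change for amount m using denominations c (largest first)."""
    r = m
    counts = []
    for ck in c:
        ik = r // ck          # integer division: how many coins of this denomination
        counts.append(ik)
        r = r - ck * ik       # remainder still to be paid
    return counts

with_20_cent_coin = [25, 20, 10, 5, 1]
greedy = better_change(40, with_20_cent_coin)
print(greedy, sum(greedy))    # [1, 0, 1, 1, 0] 3
```

The greedy answer uses three coins (25 + 10 + 5), while two 20-cent coins would do, so the output is valid change but not the smallest number of coins.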
Fibonacci
Leonardo of Pisa, better known as Fibonacci, has been called the "greatest European mathematician of the Middle Ages." In one of Fibonacci's books, he introduces a problem for his readers to use to practice their arithmetic:
A pair of rabbits is put in a field. If rabbits take a month to become mature and then produce a new pair every month after that, how many pairs will there be in twelve months' time?
He assumes the rabbits do not escape and none die.
Our problem: compute how many rabbit pairs there are after n months.
Fibonacci
Let's look at what happens ...
[Figure: rabbit pairs month by month: 1 pair, 1 pair, 2 pairs, 3 pairs, 5 pairs. Legend: same rabbits, new babies.]
A Fibonacci algorithm
How do we compute number of rabbit pairs after n months?
Note that this number equals:
• # of rabbit pairs at n – 1 months ... plus ...
• # of mature rabbit pairs at n – 1 months (they'll have babies)
But this last value is same as # of rabbit pairs at n – 2 months.
Example: FIBONACCI(n)
    F_1 ← 1
    F_2 ← 1
    for i ← 3 to n
        F_i ← F_{i-1} + F_{i-2}
    return F_n
Effect: Computes the nth Fibonacci number
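A short Python sketch of the iterative FIBONACCI algorithm follows. It assumes n ≥ 1; the name fibonacci and the padding entry at index 0 (so that list positions match the 1-based F_i) are our own choices.

```python
def fibonacci(n):
    """Return the nth Fibonacci number, with F_1 = F_2 = 1 (assumes n >= 1)."""
    f = [0, 1, 1]                      # f[0] is padding so that f[i] corresponds to F_i
    for i in range(3, n + 1):
        f.append(f[i - 1] + f[i - 2])  # F_i = F_{i-1} + F_{i-2}
    return f[n]

print([fibonacci(n) for n in range(1, 13)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
```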
Recursion
FIBONACCI is an example of a simple iterative algorithm.
Another way to view this is that computing FIBONACCI for n months can be done by calling FIBONACCI for n – 1 months and for n – 2 months and summing the two values together.
This technique is known as recursion.
Example: RECURSIVEFIBONACCI(n)
    if n = 1 or n = 2
        return 1
    else
        a ← RECURSIVEFIBONACCI(n – 1)
        b ← RECURSIVEFIBONACCI(n – 2)
        return a + b
Recursion
Recursion is a powerful and useful concept. It doesn't always lead to efficient algorithms, however. Consider calls to RECURSIVEFIBONACCI:
[Figure: call tree for RECURSIVEFIBONACCI(n)]
Level 0: n
Level 1: n-1, n-2
Level 2: n-2, n-3, n-3, n-4
Level 3: n-3, n-4, n-4, n-5, n-4, n-5, n-5, n-6
Many redundant calls, much wasted computation.
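One way to see the wasted work is to count the calls. The sketch below instruments RECURSIVEFIBONACCI in Python; the one-element list used as a call counter and the helper name count_calls are our own additions for illustration, not part of the algorithm.

```python
def recursive_fibonacci(n, counter):
    """Recursive Fibonacci; counter[0] is incremented once per call."""
    counter[0] += 1
    if n == 1 or n == 2:
        return 1
    a = recursive_fibonacci(n - 1, counter)
    b = recursive_fibonacci(n - 2, counter)
    return a + b

def count_calls(n):
    """Return (F_n, total number of calls made to compute it)."""
    counter = [0]
    value = recursive_fibonacci(n, counter)
    return value, counter[0]

print(count_calls(10))   # (55, 109)
print(count_calls(20))   # (6765, 13529)
```

The iterative version needs only n – 2 additions, whereas here the number of calls grows about as fast as the Fibonacci numbers themselves.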
Towers of Hanoi
Consider now the following problem ...
Move the three disks from blue peg to red peg subject to:
• May only move one disk at a time.
• May never place a larger disk over a smaller one.
Towers of Hanoi Problem
Input: An integer n.
Output: A sequence of moves that will solve the n-disk Towers of Hanoi puzzle.
Towers of Hanoi
How can we solve this?
Reduce it to an easier problem ... just one disk.
Just move the disk!
Towers of Hanoi
How to apply this?
(1) Solve n-1 disk problem here
(2) Move large disk here
(3) Solve n-1 disk problem here
Towers of Hanoi
Expressing this as a recursive algorithm ...
HANOI(n, fromPeg, toPeg)
    if n = 1
        output “Move disk from peg fromPeg to peg toPeg”
        return
    unusedPeg ← 6 – fromPeg – toPeg
    HANOI(n – 1, fromPeg, unusedPeg)
    output “Move disk from peg fromPeg to peg toPeg”
    HANOI(n – 1, unusedPeg, toPeg)
    return
This simple algorithm solves any Towers of Hanoi problem, but not necessarily quickly ...
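The following Python sketch mirrors HANOI. It assumes the pegs are numbered 1, 2, and 3 (which is what makes 6 – fromPeg – toPeg give the unused peg); collecting the moves in a list instead of printing them, and the name hanoi, are our own choices.

```python
def hanoi(n, from_peg, to_peg, moves=None):
    """Return the list of (from, to) moves that transfers n disks between pegs 1..3."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((from_peg, to_peg))
        return moves
    unused_peg = 6 - from_peg - to_peg       # pegs are numbered 1, 2, 3
    hanoi(n - 1, from_peg, unused_peg, moves)  # (1) clear the n-1 smaller disks
    moves.append((from_peg, to_peg))           # (2) move the largest disk
    hanoi(n - 1, unused_peg, to_peg, moves)    # (3) put the n-1 disks back on top
    return moves

for src, dst in hanoi(3, 1, 3):
    print(f"Move disk from peg {src} to peg {dst}")
```

For three disks this prints seven moves.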
Efficiency
How can we analyze the efficiency of this algorithm?
By counting the number of basic operations performed.
Let T(n) represent the number of disk moves for HANOI(n). Then:
    T(n) = 2 * T(n – 1) + 1
    T(1) = 1
Add 1 to both sides: T(n) + 1 = 2 * (T(n – 1) + 1)
Let U(n) = T(n) + 1, then: U(n) = 2 * U(n – 1), with U(1) = T(1) + 1 = 2
So: U(n) = 2 * U(n – 1) = 2 * 2 * U(n – 2) = ... = 2^(n – 1) * U(1) = 2^n
So T(n) = 2^n – 1 and HANOI(n) is an exponential algorithm.
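As a sanity check on the closed form, the recurrence can be evaluated directly and compared against 2^n – 1. This is a small Python sketch; the function name hanoi_moves is ours, and checking a finite range of n is of course a spot check rather than a proof.

```python
def hanoi_moves(n):
    """Evaluate T(n) from the recurrence T(1) = 1, T(n) = 2 * T(n - 1) + 1."""
    t = 1                      # T(1)
    for _ in range(2, n + 1):
        t = 2 * t + 1          # T(k) = 2 * T(k - 1) + 1
    return t

for n in range(1, 10):
    print(n, hanoi_moves(n), 2 ** n - 1)
```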
Efficiency
HANOI takes exponential time. What does this mean?
n    T(n) = 2^n – 1
1    1
2    3
3    7
4    15
5    31
6    63
7    127
8    255
9    511
Hmm ... not so good.
Big-O notation
We need a way to express runtime of algorithm without worrying about nitty gritty details of a specific implementation.
T(n) = 2^n – 1
Example of a “nitty gritty” detail: this 1 doesn't really contribute anything.
Instead, we write that HANOI has a runtime of O(2^n), which is read as “order two to the n.”
More formally: we write that a function f(n) is O(g(n)) if f(n) grows no faster than g(n).
In other words, there exist constants c and x0 such that for all values of x ≥ x0, we have f(x) ≤ c * g(x).
Big-O and Big-Ω
Big-O notation is a worst-case view of the world. It tells us that a function grows no faster than some other function, but not whether it grows much slower.
For example, the function f(n) = 2n + 2 is O(n). But it's also:
• O(n^2)
• O(n^3), etc.
We say that big-O notation provides an upper bound on the growth of a function, but the bound is not necessarily tight.
Big-omega notation (Ω) provides the corresponding lower bound on the growth of a function. I.e., a function f(n) is Ω(g(n)) if f(n) grows no slower than g(n).
In general, we care mostly about upper bounds.
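The big-O definition can be explored numerically by trying candidate constants. The Python sketch below tests f(x) ≤ c * g(x) over a finite range only, so it can suggest that a pair (c, x0) works but cannot prove it; the helper name holds_on_range and the particular constants are our own choices.

```python
def holds_on_range(f, g, c, x0, upto=1000):
    """Check f(x) <= c * g(x) for every integer x with x0 <= x <= upto."""
    return all(f(x) <= c * g(x) for x in range(x0, upto + 1))

def f(n):
    return 2 * n + 2          # the example function f(n) = 2n + 2

print(holds_on_range(f, lambda n: n, c=4, x0=1))       # True: 2n + 2 <= 4n once n >= 1
print(holds_on_range(f, lambda n: n, c=2, x0=1))       # False: 2n + 2 > 2n for every n
print(holds_on_range(f, lambda n: n ** 2, c=1, x0=3))  # True: the O(n^2) bound also holds
```

The second line shows that a poor choice of c fails even though f(n) is O(n); the definition only asks that some pair of constants exists.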
Growth rates of various functions
In the long run, constants and low-order terms don't matter:
[Figure: growth-rate curves; vertical axis: runtime]
Eventually O(n^2) beats O(n^3). In the end, O(log n) beats them both.
We always express runtime of algorithm relative to its highest-order term.
Tractable vs. intractable problems
When is a problem “solved”? Are some harder than others?
• So far, we've discussed the time complexity of algorithms. But we haven't discussed the complexity of problems. The difference is key.
Before going further, we need a notion of “hard” and “easy”.
• A problem is considered solved (“easy”) if we know an efficient algorithm. Efficient here means polynomial time.
• A problem is unsolved if we know no such algorithm.
• A problem is hard if no such algorithm exists.
• Note that “unsolved” and “hard” aren't the same thing.
Relativized complexity
Say we don't know a good algorithm for a problem. Two cases:
(1) No such algorithm exists.
(2) It exists, but we're just not smart enough to find it.
We'd like to prove (1), but often that's impossible. If we can't prove (1), then we're forced to admit (2) may be the case.
Idea: prove that if we could solve our problem, that would also provide a solution to another, well-known problem that other people have been working on for a long time without success.
If a lot of smart people have been working for a long time, this increases our confidence no such algorithm exists (and, hence, we look a little less stupid ourselves).
NP-completeness
A class of problems that smart people have been working on for a long time with little success are the NP-complete problems.
These are problems which might have an efficient solution (a polynomial time algorithm), but no one has been able to find it.
All NP-complete problems are related: solve one, solve them all!
To prove a problem is hard, reduce an NP-complete problem to it:
[Figure: known NP-complete problem → efficient transformation → our new problem → posited solution]
(Not covered in this course.)
Taxonomy of algorithm design techniques
Your IBA book is organized in terms of general methodologies:
Exhaustive search: enumerate all possible solutions and look for best one.
Branch and bound: eliminate search paths that obviously won't be productive.
Dynamic programming: build up solutions for larger problems from smaller ones.
Divide-and-conquer: break problem into pieces which are easier to solve, then combine.
Machine learning: collect statistics over time and use these to solve current problem.
Randomized algorithms: used to overcome certain kinds of worst-case scenarios.
Hints on reading, writing, and talking about algorithms
• Realize that you may have to read the description several times before you truly understand the algorithm.
• Strive to understand the algorithm deeply. Try to learn it well enough that you could actually implement it.
• Understand (and be able to explain) the model of the problem that the algorithm is designed to solve.
• Know whether the algorithm is exact or a heuristic. If the latter, figure out the kinds of cases where it breaks.
• Understand the efficiency of the algorithm so that you are able to express it in convincing terms.
• Use examples to illustrate your presentation. Choose your examples carefully to point out various important issues.
Wrap-up
Remember:
• Come to class having done the readings.
• Check Blackboard regularly for updates.
• If enrolled in CSE 408, let me know which lecture topic you wish to scribe by Friday, Sept. 7. Send me several choices, keeping in mind that our schedule may shift somewhat if we fall behind for some reason.
Readings for next time:
• BB&P Chapters 3-4 (introduction to Perl).