CSE 326 Data Structures: Complexity

1

CSE 326 Data Structures: Complexity

Lecture 2: Wednesday, Jan 8, 2003

2

Overview of the Quarter• Complexity and analysis of algorithms• List-like data structures• Search trees• Priority queues• Hash tables• Sorting• Disjoint sets• Graph algorithms• Algorithm design• Advanced topics

Midtermaround here

3

Complexity and analysis of algorithms

• Weiss Chapters 1 and 2

• Additional material– Cormen,Leiserson,Rivest – on reserve at Eng.

library– Graphical analysis– Amortized analysis

4

Program Analysis

• Correctness– Testing– Proofs of correctness

• Efficiency– Asymptotic complexity - how running times

scales as function of size of input

5

Proving Programs Correct• Often takes the form of an inductive proof• Example: summing an array

int sum(int v[], int n)

{

if (n==0) return 0;

else return v[n-1]+sum(v,n-1);

}


{

if (n==0) return 0;


}

What are the parts of an inductive proof?

6

Inductive Proof of Correctnessint sum(int v[], int n)

{

if (n==0) return 0;


}


{

if (n==0) return 0;


}

Theorem: sum(v,n) correctly returns sum of 1st n elements of array v for any n.

Basis Step: Program is correct for n=0; returns 0.

Inductive Hypothesis (n=k): Assume sum(v,k) returns sum of first k elements of v.

Inductive Step (n=k+1): sum(v,k+1) returns v[k]+sum(v,k), which is the same of the first k+1 elements of v.

7

Inductive Proof of Correctness

int find(int x, int v[], int n){ int left = -1; int right = n;

while (left+1 < right) { int m = (left+right) / 2; if (x == v[m]) return m; if (x < v[m]) right = m; else left = m; } return –1; /* not found */}

int find(int x, int v[], int n){ int left = -1; int right = n;

while (left+1 < right) { int m = (left+right) / 2; if (x == v[m]) return m; if (x < v[m]) right = m; else left = m; } return –1; /* not found */}

Binary search in a sortedarray:

v[0]v[1] ... v[n-1]

Given x,find m s.t. x=v[m]

Exercise 1: proof that it is correct

Exercise 2: compute m, k s.t.v[m-1] < x = v[m] = v[m+1] = ... = v[k] < v[k+1]

8

Proof by Contradiction

• Assume negation of goal, show this leads to a contradiction

• Example: there is no program that solves the “halting problem”– Determines if any other program runs

forever or not

Alan Turing, 1937

9

• Does NonConformist(NonConformist) halt?

• Yes? That means HALT(NonConformist) = “never

halts”

• No? That means HALT(NonConformist) = “halts”

Program NonConformist (Program P)If ( HALT(P) = “never halts” ) Then

Halt Else

Do While (1 > 0)Print “Hello!”

End WhileEnd If

End Program

Contradiction!

10

Defining Efficiency

• Asymptotic Complexity - how running time scales as function of size of input

• Two problems:– What is the “input size” ?– How do we express the running time ? (The

Big-O notation)

11

Input Size

• Usually: length (in characters) of input

• Sometimes: value of input (if it is a number)

• Which inputs?– Worst case: tells us how good an algorithm works– Best case: tells us how bad an algorithm works– Average case: useful in practice, but there are

technical problems here (next)

12

Input Size

Average Case Analysis• Assume inputs are randomly distributed according

to some “realistic” distribution • Compute expected running time

• Drawbacks– Often hard to define realistic random distributions– Usually hard to perform math

Inputs( )

( , ) Prob ( ) RunTime( )x n

E T n x x

13

Input Size

• Recall the function: find(x, v, n)• Input size: n (the length of the array)• T(n) = “running time for size n”• But T(n) needs clarification:

– Worst case T(n): it runs in at most T(n) time for any x,v– Best case T(n): it takes at least T(n) time for any x,v– Average case T(n): average time over all v and x

14

Input Size

Amortized Analysis• Instead of a single input, consider a sequence of

inputs:– This is interesting when the running time on some input

depends on the result of processing previous inputs

• Worst case analysis over the sequence of inputs• Determine average running time on this sequence• Will illustrate in the next lecture

15

Definition of Order Notation• Upper bound: T(n) = O(f(n)) Big-O

Exist constants c and n’ such that

T(n) c f(n) for all n n’

• Lower bound: T(n) = (g(n)) OmegaExist constants c and n’ such that

T(n) c g(n) for all n n’

• Tight bound: T(n) = (f(n)) ThetaWhen both hold:

T(n) = O(f(n))

T(n) = (f(n))

Other notations: o(f), (f) - see Cormen et al.

16

Example: Upper Bound

2

2

2 2

2

2

2

Proof: Must find , such that for all ,

100

Let's try setting 2. Then

100 2

100

100

So we

Claim

can

: 10

set 100 and reverse the steps abov .

(

e

0 )

c n n n

n n cn

c

n n n

n

n n O

n

n

n

n

17

Using a Different Pair of Constants2

2 2

2 2

2


100


100 100

100 101

Claim: 100 ( )

(divide both sides by

100 100

1

So we can set 1 and reve

n)

rs

c n n n

n n cn

c

n n n

n n

n

n

n

n n O n

e the steps above.

18

Example: Lower Bound2

2

2 2

2 2

2


100


100

0

So we can set 0 and reverse the steps above.

Thus we can also conc

Claim: 100 ( )

100lude

c n n n

n n cn

c

n n n

n

n

n n n

n n

2( )n

19

Conventions of Order Notation2 2

2 2

Order notation is not symmetric: write

but never

The expression ( ( )) ( ( )) is equivalent to

( ) ( ( ))

The expression ( ( )) ( ( )) is equivalent to

( ) ( (

( )

2

))

T

(

he right

2

)

-h

O f n O g n

f n O g n

f n g n

f n

n n

g n

O n n n

O n

2

2 2

2 318

and side is a "cruder" version of the l

( ) ( )

18 ( ) (

(2

log ) ( )

f

)

e t:nn O n O n

n n n

O

n n

20

Which Function Dominates?f(n) =

n3 + 2n2

n0.1

n + 100n0.1

5n5

n-152n/100

82log n

g(n) =

100n2 + 1000

log n

2n + 10 log n

n!

1000n15

3n7 + 7n

Question to class: is f = O(g) ? Is g = O(f) ?

21

Race If(n)= n3+2n2 g(n)=100n2+1000vs.

22

Race IIn0.1 log nvs.

23

Race IIIn + 100n0.1 2n + 10 log nvs.

24

Race IV5n5 n!vs.

25

Race Vn-152n/100 1000n15vs.

26

Race VI82log(n) 3n7 + 7nvs.

27

• Eliminate low order terms

• Eliminate constant coefficients

3 2 28

3 28

3 28

3 28 8

3 3 28 8

3 28

38

38

38

3

16 log (10 ) 100

16 log (10 )

log (10 )

log (10) log ( )

log (10) log ( )

log ( )

2 log ( )

log ( )

log (2) log( )

log( )

n n n

n n

n n

n n

n n n

n n

n n

n n

n n

n n

3 2 2 3816 log (10 ) 100 ( log( ))n n n O n n

28

Common Namesconstant: O(1)logarithmic: O(log n) linear: O(n)log-linear: O(n log n)quadratic: O(n2)exponential: O(cn) (c is a constant > 1)hyperexponential: (a tower of n

exponentials

Other names:superlinear: O(nc) (c is a constant > 1)polynomial: O(nc) (c is a constant > 0)

)O(22. . .22

Fastest Growth

Slowest Growth

29

Sums and Recurrences

Often the function f(n) is not explicit but expressed as:

• A sum, or

• A recurrence

Need to obtain analytical formula first

30

Sums

)(6

)12)(1(...21)( 3

1

2222 nOnnn

innfn

i

)(2

)1(...21)( 2

1

nOnn

innfn

i

)()12()12(...531)( 2

1

2 nOninnfn

i

(?)...21)( 333 Onnf

(??))34()34(...951)(1

44444 Oinnfn

i

31

More Sums

)3(13

1333...331)(

1

12 n

n

i

nin Onf

))(ln()1ln()ln(11

111

...3

1

2

1

1

1)(

11

nOndxxin

nfn

i

n

)1(1

1

11

11

11...

3

1

2

1

1

1)(

11 222222

On

dxxin

nfn

i

n

Sometimes sums are easiest computed with integrals:

32

Recurrences• f(n) = 2f(n-1) + 1, f(0) = T• Telescoping

f(n)+1 = 2(f(n-1)+1) f(n-1)+1 = 2(f(n-2)+1) 2 f(n-2)+1 = 2(f(n-3)+1) 22

. . . . . f(1) + 1 = 2(f(0) + 1) 2n-1

f(n)+1 = 2n(f(0)+1) = 2n(T+1)

f(n) = 2n(T+1) - 1

33

Recurrences

• Fibonacci: f(n) = f(n-1)+f(n-2), f(0)=f(1)=1 try f(n) = A cn What is c ? A cn = A cn-1 + A cn-2

c2 – c – 1 = 0

nnn

OBAnf

c

2

51

2

51

2

51)(

2

51

2

4112,1

Constants A, B can be determined from f(0), f(1) – not interesting for us for the Big O notation

34

Recurrences

• f(n) = f(n/2) + 1, f(1) = T

• Telescoping:f(n) = f(n/2) + 1f(n/2) = f(n/4) + 1. . .f(2) = f(1) + 1 = T + 1

f(n) = T + log n = O(log n)

Documents

CSE 326 Data Structures: Complexity