Today, we will cover
◦ Typing
◦ Induction and Recursion
◦ Asymptotic Complexity
◦ Data Structures
◦ Abstract Data Types and Implementing ADTs
◦ Searching and Sorting
◦ Graphs

For GUIs, you are fine if you can do the practice problems (just do it!)

Do not worry about
◦ Threads and concurrency
◦ Recurrences
◦ Java virtual machine
◦ How to balance trees (AVL trees)
    But do know the difference between a balanced and an unbalanced tree
◦ Software engineering (sort of)
    Don’t break every known rule of software engineering when asked to write code
    We may use a design pattern on the final, but you won’t have to memorize them

Primitive Types
◦ boolean, int, double, etc.
◦ Test equality with == and !=
◦ Compare with <, <=, >, and >=

void f(int x) {
    x--;  // changes only f's local copy of x
}

int x = 10;
f(x);
// x == 10



Reference types
◦ Actual object is stored elsewhere
◦ Variable contains a reference to the object
◦ == tests equality for the reference
◦ equals() tests equality for the object
    Two different references (!=) may exist to two objects with the same value (equals())
◦ Can compare objects of type T with compareTo() if the Comparable<T> interface is implemented

void f(ArrayList<Integer> l) {
    l.add(2);                      // mutates the caller's list
    l = new ArrayList<Integer>();  // rebinds only the local reference
}

ArrayList<Integer> l = new ArrayList<Integer>();
l.add(1);
f(l);
// l contains 1, 2
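As a small illustration of the difference between == and equals(), here is a sketch using String objects. The class name ReferenceDemo is made up for this example; the method f is the one from the slide.

```java
import java.util.ArrayList;

class ReferenceDemo {
    // Same f as above: it can mutate the shared object, but rebinding
    // its parameter has no effect on the caller's variable.
    static void f(ArrayList<Integer> l) {
        l.add(2);
        l = new ArrayList<Integer>();
    }

    public static void main(String[] args) {
        String a = new String("cs");
        String b = new String("cs");
        System.out.println(a == b);          // false: two distinct objects
        System.out.println(a.equals(b));     // true: same characters
        System.out.println(a.compareTo(b));  // 0: String implements Comparable<String>

        ArrayList<Integer> l = new ArrayList<Integer>();
        l.add(1);
        f(l);
        System.out.println(l);               // [1, 2]
    }
}
```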






We know that type B can implement/extend A
◦ B is a subtype of A; A is a supertype of B

The real type of the object is its dynamic type
◦ This type is known only at run-time

Any object can act like the supertype of its dynamic type
◦ But it cannot act like a subtype of its dynamic type

Variables and function arguments of type A can also accept any subtype of A
◦ Type A is a supertype of the dynamic type

The static type is the type a variable or expression has in the code when it is compiled
◦ Strictly speaking, objects do not have a static type; variables and expressions do
◦ The dynamic type of an object might be a subtype of the static type

Upcasts are always safe
◦ Always cast to a supertype of the dynamic type

Downcasts may not be safe
◦ Can downcast to a supertype of the dynamic type
◦ Can downcast to the dynamic type itself
◦ Cannot downcast to a subtype of the dynamic type

If B extends A, and B and A both have function foo, which foo gets called?
◦ Answer depends on the dynamic type
    If the dynamic type is B, B’s foo will be called even if foo is invoked inside a function of A

Exception: static functions
◦ Static functions are not associated with any object
◦ Thus, they are not dispatched on a dynamic type; the call is resolved at compile time
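The dispatch and casting rules above can be tried out with a sketch like the following. A, B, and foo follow the slide's naming; callFoo, name, and DispatchDemo are invented for the example.

```java
class A {
    String foo() { return "A.foo"; }
    // Calls foo() on this object; which foo runs depends on the dynamic type.
    String callFoo() { return foo(); }
    static String name() { return "A"; }
}

class B extends A {
    @Override
    String foo() { return "B.foo"; }
    static String name() { return "B"; }  // hides A.name(), does not override it
}

class DispatchDemo {
    public static void main(String[] args) {
        A x = new B();                    // static type A, dynamic type B (upcast)
        System.out.println(x.foo());      // B.foo
        System.out.println(x.callFoo());  // B.foo, even though callFoo is defined in A
        System.out.println(A.name());     // A: static calls are resolved at compile time

        B ok = (B) x;                     // downcast to the dynamic type succeeds
        A plain = new A();
        try {
            B bad = (B) plain;            // dynamic type A is not a B
        } catch (ClassCastException e) {
            System.out.println("cannot downcast to a subtype of the dynamic type");
        }
    }
}
```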

Recursion
◦ Basic examples
    Factorial: n! = n(n-1)!
    Combinations
    Pascal’s triangle
◦ Recursive structure
    Tree (tree t = root with right/left subtree)
    Depth first search
◦ Don’t forget base case (in proof, in your code)
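A minimal sketch of the first two basic examples, with the base cases marked. The class and method names are chosen for illustration only.

```java
class RecursionDemo {
    // n! = n * (n-1)!, with base case 0! = 1.
    static long factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        if (n == 0) return 1;            // base case
        return n * factorial(n - 1);     // recursive case
    }

    // Combinations via Pascal's triangle: C(n, k) = C(n-1, k-1) + C(n-1, k).
    static long choose(int n, int k) {
        if (k < 0 || k > n) return 0;
        if (k == 0 || k == n) return 1;  // base cases: the edges of the triangle
        return choose(n - 1, k - 1) + choose(n - 1, k);
    }
}
```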

Induction
◦ Can do induction on previous recursive problems
    Algorithm correctness proof (DFS)
    Math equation proof

Prelim 2 questions

Step 1
◦ Base case

Step 2
◦ Suppose n is the variable you’re going to do induction on. Suppose the equation holds when n = k
◦ Strong induction: suppose it holds for all n <= k

Step 3
◦ Prove that when n = k+1, the equation still holds, by making use of the assumptions in Step 2.
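As a worked instance of the three steps, here is the standard sum formula. The claim is chosen only for illustration and is not taken from a particular exam question.

```latex
\textbf{Claim.} For all $n \ge 1$: $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.

\textbf{Step 1 (base case).} For $n = 1$: $\sum_{i=1}^{1} i = 1 = \frac{1 \cdot 2}{2}$.

\textbf{Step 2 (inductive hypothesis).} Suppose $\sum_{i=1}^{k} i = \frac{k(k+1)}{2}$ for some $k \ge 1$.

\textbf{Step 3 (inductive step).}
\[
\sum_{i=1}^{k+1} i = \frac{k(k+1)}{2} + (k+1) = \frac{(k+1)(k+2)}{2}.
\]
```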

f(n) is O(g(n)) if ∃ (c, n0) such that ∀ n ≥ n0, f(n) ≤ c⋅g(n)
◦ ∃ - there exists; ∀ - for all
◦ (c, n0) is called the witness pair

Once you have a correct witness pair, you can probably use induction to prove it is correct

f(n) is O(g(n)) means that the function f(n) is roughly less than or equal to g(n)

Big-O notation is a model for running time
◦ Models usually, but not always, work in real life

Meaning of n0
◦ We can compare one integer to another
◦ How can we tell if one function is less than or equal to another?
◦ Answer is which function grows faster
    One function could also start ahead of the other and grow at the same rate, staying ahead
◦ 60-mph car with no headstart will eventually overtake a 40-mph car with a headstart
◦ At what time does the faster car/function take over? n0

Meaning of c
◦ Suppose we cannot get a precise integer value
    897 is less than or equal to 899, but maybe due to some errors the real numbers were 892 and 884
    E.g.: ballot counts in Minnesota recount
◦ Idea: Compare order of magnitude
    Compare numbers by the number of digits
    897 and 899 have the same number of digits
    Difference between 42 and 482 is far bigger
    Gives us some room for error

Meaning of c
◦ What is the difference between n^3 + 1 and n^3?
◦ What about n^3, n^3 + 2n^2, and 2n^3?
◦ We can be off by a constant factor, c
◦ If f(n) is only twice as large as g(n), setting c to 2 or greater makes c⋅g(n) at least as large as f(n)
◦ A constant factor cannot account for the difference between n and n^2, log n and n, 2^n and 3^n
◦ There are three common types of growth
    Logarithmic, polynomial, and exponential growth
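To make the witness pair concrete, here is one possible choice for two of the functions mentioned above. The pair (3, 1) is just one valid witness; larger values of c or n0 work as well.

```latex
\[
f(n) = n^3 + 2n^2, \quad g(n) = n^3, \quad (c, n_0) = (3, 1)
\]
\[
\forall n \ge 1:\ 2n^2 \le 2n^3 \ \Rightarrow\ f(n) = n^3 + 2n^2 \le 3n^3 = c \cdot g(n)
\]
```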

Linked Lists
◦ Singly-linked/doubly-linked
◦ Sorted/unsorted
◦ Add, delete elements

Arrays
◦ Sorted/unsorted
◦ Add, delete elements

Search Tree
◦ Balanced and unbalanced

Search for an element in array/list/tree
◦ Sorted arrays and balanced search trees: O(log n)
◦ Linked lists (sorted/unsorted): O(n)
◦ Other unsorted/unbalanced structures: O(n)
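The O(log n) bound for sorted arrays comes from binary search, which halves the search range at each step. A minimal sketch (class and method names are illustrative):

```java
class SearchDemo {
    // Returns an index of target in the sorted array a, or -1 if it is absent.
    static int binarySearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // avoids overflow of (lo + hi)
            if (a[mid] == target) return mid;
            if (a[mid] < target) lo = mid + 1;  // target can only be in the right half
            else hi = mid - 1;                  // target can only be in the left half
        }
        return -1;
    }
}
```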

Trees
◦ Traversal
◦ Search
    In a balanced search tree, similar to binary search in an array: O(log n)

Heap
◦ Min/max heap: heap order invariant
    Every node smaller/larger than its immediate children
◦ Add an element (see lecture notes): O(log n)
◦ Delete an element (see lecture notes): O(log n)
◦ Implemented with either a binary tree or an array

Motivation
◦ Sort n numbers between 0 and 2n – 1
◦ Instead of sorting abstract comparable objects, we are sorting integers within a certain range
    General lower bound of Ω(n log n) for comparison sorts may not apply
◦ Can be done in O(n) time with counting sort
    Create an array of size 2n
    The ith entry counts all the numbers equal to i
    For each number, increment the correct entry
◦ Can also find a given number in O(1) time
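The counting sort steps above can be sketched as follows. The parameter range stands for the size of the count array (2n in the slide's setting); the names are illustrative.

```java
class CountingSortDemo {
    // Sorts values known to lie in [0, range). Runs in O(n + range) time.
    static int[] countingSort(int[] values, int range) {
        int[] count = new int[range];
        for (int v : values) count[v]++;          // the ith entry counts values equal to i
        int[] sorted = new int[values.length];
        int next = 0;
        for (int i = 0; i < range; i++)
            for (int c = 0; c < count[i]; c++) sorted[next++] = i;
        return sorted;
    }
}
```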

Cannot do this with arbitrary data types
◦ The integer type alone can have over 4 billion possible values; no array should be that big

For a hashtable, create an array of size m
◦ Hash function maps each object to an array index between 0 and m – 1 (in O(1) time)
    Hash function makes sorting impossible, but we can still look up an element in O(1) time
◦ Quality of hash function is based on how many elements map to the same index in the hashtable
    Need to expect O(1) collisions

Dealing with collisions
◦ In counting sort, one array entry contains only elements of the same value
◦ The hash function can map different objects to the same index of the hashtable

Chaining
◦ Each entry of the hashtable is a linked list

Linear Probing
◦ If h(x) is taken, try h(x) + 1, h(x) + 2, h(x) + 3, …
◦ Quadratic probing: h(x) + 1, h(x) + 4, h(x) + 9, …
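A minimal sketch of chaining, assuming a fixed table size m and String elements. ChainedHashSet is a made-up class for illustration and leaves out resizing; it is not how java.util.HashSet is implemented.

```java
import java.util.LinkedList;

class ChainedHashSet {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int m) {
        buckets = new LinkedList[m];
        for (int i = 0; i < m; i++) buckets[i] = new LinkedList<String>();
    }

    // Maps the hash code to an index between 0 and m - 1.
    private int index(String s) {
        return Math.floorMod(s.hashCode(), buckets.length);
    }

    // Returns false if s was already present.
    boolean add(String s) {
        LinkedList<String> chain = buckets[index(s)];
        if (chain.contains(s)) return false;
        chain.add(s);                  // colliding elements share one linked list
        return true;
    }

    boolean contains(String s) {
        return buckets[index(s)].contains(s);
    }
}
```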

Table Size
◦ If too large, we waste space
◦ If too small, everything collides with each other
    Probing falls apart if number of items (n) is almost the size of the hashtable (m)
◦ Typically have a load factor 0 < λ ≤ 1
    Resize table when n/m exceeds λ
◦ Resizing changes m; we have to reinsert everything with a new hash function

Table Size
◦ What if we double the size every time we exceed our load factor?
    Must double the number of items to exceed the load factor again
    Worst case is when we just doubled the hashtable
    Consider all prior times we doubled the table
        n + n/2 + n/4 + n/8 + … < 2n
◦ With table doubling, we can insert n items in O(n) total time, i.e. O(1) amortized time per insertion
    Some individual operations take O(n) time
◦ This also works for growing an ArrayList
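The doubling argument can be checked numerically. The sketch below models an array that starts at capacity 1 and doubles when full (a simplification; the real ArrayList uses a different initial capacity and growth factor) and counts how many element copies the resizes cause.

```java
class DoublingDemo {
    // Counts element copies made while appending n items to an array
    // that starts at capacity 1 and doubles whenever it is full.
    static long copiesForAppends(int n) {
        int capacity = 1, size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {   // full: copy everything into a table twice as big
                copies += size;
                capacity *= 2;
            }
            size++;
        }
        return copies;
    }
}
```

For example, copiesForAppends(8) is 1 + 2 + 4 = 7, which stays below 2n = 16.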

Java, hashCode() and equals()
◦ Java uses hashCode() in its hash function
    hashCode() assigns each item an integer value
    Java has a special formula to map this integer to some number between 0 and m – 1
◦ If one object equals() another, they should have the same hashCode()
    Cannot insert an object with one hashCode() and then look the same object up with a different hashCode()
    If you override equals(), you must also override hashCode() to preserve this property

Java, hashCode() and equals()
◦ Different objects can have the same hashCode()
    If this happens too often, we have too many collisions
    Only equals() can determine if they are equal
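A sketch of overriding both methods together. Point is a hypothetical class; Objects.hash is one convenient way to combine fields, not the only acceptable one.

```java
import java.util.Objects;

class Point {
    final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);   // equal points produce equal hash codes
    }
}
```

With both overrides in place, a HashSet<Point> that contains new Point(1, 2) reports that it contains another new Point(1, 2), even though the two references are not ==.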

Lists
Stacks
◦ LIFO
Queues
◦ FIFO
Sets
Dictionaries (Maps)
Priority Queues
Java API
◦ E.g.: ArrayList is an ADT list backed by an array

Priority Queue
◦ Implement as List (sorted/unsorted): O(n)
◦ Implement as heap
    PeekMin: look at heap root, O(1)
    ExtractMin: heap “delete” op, O(log n)
    Insert: heap “add” op, O(log n)
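In the Java API, java.util.PriorityQueue is a heap-backed priority queue: peek() corresponds to PeekMin, poll() to ExtractMin, and add() to Insert. The sketch below uses it to sort, which is heap sort in miniature; the class and method names are illustrative.

```java
import java.util.PriorityQueue;

class PriorityQueueDemo {
    // Insert everything, then extract the minimum n times.
    static int[] sortWithHeap(int[] values) {
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(); // min-heap by default
        for (int v : values) heap.add(v);                 // Insert: O(log n) each
        int[] out = new int[values.length];
        for (int i = 0; i < out.length; i++) out[i] = heap.poll(); // ExtractMin: O(log n) each
        return out;
    }
}
```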

Insertion Sort
Selection Sort
Merge Sort
Quick Sort
Heap Sort
◦ Best/worst case
    Average case for quicksort
◦ Asymptotic complexity

Inheritance/Interfaces
Abstract classes
Meaning of static

A graph has vertices
A graph has edges between two vertices
n – number of vertices; m – number of edges
Directed vs. undirected graph
◦ Directed edges can only be traversed one way
◦ Undirected edges can be traversed both ways
Weighted vs. unweighted graph
◦ Edges could have weights/costs assigned to them

What makes a graph special?
◦ Cycles!!!

What is a graph without a cycle?
◦ Undirected graphs
    Trees
◦ Directed graphs
    Directed acyclic graph (DAG)

Topological sort is for directed graphs
Indegree: number of edges entering a vertex
Outdegree: number of edges leaving a vertex
Topological sort algorithm
◦ Delete a vertex with an indegree of 0
    Delete its outgoing edges, too
◦ Repeat until no vertices have an indegree of 0






What is the only thing a topological sort cannot delete?
◦ Cycles!!!

If a graph is a DAG, a topological sort will delete the entire graph

If a topological sort deletes the entire graph, the graph is a DAG
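The deletion procedure can be sketched without actually removing anything, by tracking indegrees instead. The adjacency-list representation and all names here are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class TopoSortDemo {
    // adj.get(u) lists the vertices v with an edge u -> v; vertices are 0..n-1.
    // Returns the deletion order; it has fewer than n entries iff the graph has a cycle.
    static List<Integer> topologicalOrder(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indegree = new int[n];
        for (List<Integer> out : adj)
            for (int v : out) indegree[v]++;

        ArrayDeque<Integer> ready = new ArrayDeque<Integer>();
        for (int v = 0; v < n; v++)
            if (indegree[v] == 0) ready.add(v);

        List<Integer> order = new ArrayList<Integer>();
        while (!ready.isEmpty()) {
            int u = ready.remove();
            order.add(u);                      // "delete" u
            for (int v : adj.get(u))           // "delete" its outgoing edges
                if (--indegree[v] == 0) ready.add(v);
        }
        return order;
    }
}
```

If the returned list contains every vertex, the graph is a DAG; otherwise the leftover vertices lie on a cycle or can only be reached through one.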

Works on directed and undirected graphs
You have a start vertex which you visit first
You want to visit all vertices reachable from the start vertex
◦ For directed graphs, depending on your start vertex, some vertices may not be reachable
You can traverse an edge from an already visited vertex to another vertex

Why is choosing any path on a graph risky?
◦ Cycles!!!
◦ Could traverse a cycle forever

Need to keep track of vertices already visited
◦ No cycles if you do not visit a vertex twice

Might also help to keep track of all unvisited vertices you can visit from a visited vertex

Add the start vertex to the collection of vertices to visit

Pick a vertex from the collection to visit
◦ If you have already visited it, do nothing
◦ If you have not visited it:
    Visit that vertex
    Follow its edges to neighboring vertices
    Add unvisited neighboring vertices to the collection to visit
    (You may add the same unvisited vertex twice)

Repeat until there are no more vertices to visit

Running time analysis
◦ Visit each vertex only once
◦ When you visit a vertex, you traverse its edges
    You traverse all edges once on a directed graph
    Twice on an undirected graph
◦ At worst, you add a new vertex to the collection to visit for each edge (collection has size of O(m))
◦ Lower bound is Ω(n + m)
    Actual result depends on the cost to add/delete vertices to/from the collection of vertices to visit

Depth-first search and breadth-first search are two graph searching algorithms

DFS pushes vertices to visit onto a stack
◦ Examines a vertex by popping it off the stack

BFS uses a queue instead

Both have O(n + m) running time
◦ Push/enqueue and pop/dequeue have O(1) time
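Since DFS and BFS differ only in the collection of vertices to visit, both can be sketched with one method. The adjacency-list representation, the useStack flag, and the names are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class GraphSearchDemo {
    // Generic search from start. With useStack = true the collection of vertices
    // to visit behaves as a stack (DFS); otherwise it behaves as a queue (BFS).
    static List<Integer> search(List<List<Integer>> adj, int start, boolean useStack) {
        boolean[] visited = new boolean[adj.size()];
        ArrayDeque<Integer> toVisit = new ArrayDeque<Integer>();
        List<Integer> order = new ArrayList<Integer>();
        toVisit.add(start);
        while (!toVisit.isEmpty()) {
            int u = useStack ? toVisit.removeLast() : toVisit.removeFirst();
            if (visited[u]) continue;          // already visited: do nothing
            visited[u] = true;
            order.add(u);
            for (int v : adj.get(u))
                if (!visited[v]) toVisit.addLast(v);  // may add the same vertex twice
        }
        return order;
    }
}
```

On the graph with edges 0→1, 0→2, 1→3, 2→3, the BFS order from 0 is [0, 1, 2, 3], while this stack-based DFS visits [0, 2, 3, 1].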



















MSTs apply to undirected graphs
Take only some of the edges in the graph
◦ Spanning – all vertices connected together
◦ Tree – connected, with no cycles

For all spanning trees, m = n – 1
◦ On an unweighted graph, every spanning tree is an MST

Need to find MST for a weighted graph

A connected component has a path between all vertices in that component.

Idea: find two unconnected components; connect them

Pick the smallest edge between two unconnected components
◦ This is a greedy strategy, but it somehow works

Start with a graph with no edges
◦ n connected components, n trees

Add edges between unconnected components
◦ Forms a bigger tree

What if you add an edge between two vertices in the same component?
◦ Cycles!!!

Kruskal’s algorithm
◦ Process edges from least to greatest
◦ Either an edge connects two different components or it connects a component to itself
    Add an edge only in the former case
◦ Picks smallest edge between two components
◦ O(m log m) time to sort the edges
    Also need the union-find structure to keep track of components, but it does not change the running time
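A compact sketch of Kruskal's algorithm with a simple union-find. The edge format {u, v, weight} and all names are assumptions for the example; this union-find uses path halving only, which is enough for illustration.

```java
import java.util.Arrays;

class KruskalDemo {
    // Finds the representative of v's component, shortening the path as it goes.
    static int find(int[] parent, int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }

    // edges[i] = {u, v, weight}; vertices are 0..n-1.
    // Returns the total weight of the MST (of a spanning forest if disconnected).
    static long mstWeight(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int v = 0; v < n; v++) parent[v] = v;    // n components, n trees
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2]));  // least to greatest
        long total = 0;
        for (int[] e : sorted) {
            int ru = find(parent, e[0]), rv = find(parent, e[1]);
            if (ru != rv) {            // edge joins two different components
                parent[ru] = rv;
                total += e[2];
            }                          // otherwise it would create a cycle: skip it
        }
        return total;
    }
}
```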

















Prim’s algorithm
◦ Graph search algorithm, builds up a spanning tree from one root vertex
◦ Like BFS, but it uses a priority queue
    Priority is the weight of the edge to the vertex
    Also need to keep track of which edge we used
◦ Always picks smallest edge to an unvisited vertex
◦ Size of heap is O(m); running time is O(m log m)
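A sketch of Prim's algorithm in the same style as the generic graph search, with a priority queue as the collection. To stay short it returns only the total weight rather than recording which edges were used; the adjacency format and the helper undirected are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class PrimDemo {
    // adj.get(u) holds {v, weight} pairs; each undirected edge appears in both lists.
    // Returns the total MST weight of the component containing start.
    static long mstWeight(List<List<int[]>> adj, int start) {
        boolean[] visited = new boolean[adj.size()];
        // Heap entries are {vertex, weight of the edge used to reach it}.
        PriorityQueue<int[]> heap =
            new PriorityQueue<int[]>((a, b) -> Integer.compare(a[1], b[1]));
        heap.add(new int[] {start, 0});
        long total = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();           // smallest edge to a candidate vertex
            int u = top[0];
            if (visited[u]) continue;
            visited[u] = true;
            total += top[1];
            for (int[] e : adj.get(u))
                if (!visited[e[0]]) heap.add(new int[] {e[0], e[1]});
        }
        return total;
    }

    // Hypothetical helper: builds adjacency lists from {u, v, weight} triples.
    static List<List<int[]>> undirected(int n, int[][] edges) {
        List<List<int[]>> adj = new ArrayList<List<int[]>>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<int[]>());
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});
            adj.get(e[1]).add(new int[] {e[0], e[2]});
        }
        return adj;
    }
}
```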




























Works on directed and undirected graphs
What is the shortest path from one vertex (the source) to another (the sink)?
◦ (Hint: the answer is not cycles)

If edges have non-negative weights, we can use Dijkstra’s algorithm

Dijkstra’s algorithm is a graph search algorithm

If it visits a vertex, it knows the shortest path to that vertex

It will eventually hit the sink vertex and know the shortest path to it

Requires non-negative edge weights to work

Dijkstra’s algorithm is similar to Prim’s
◦ Uses a priority queue
◦ Also has O(m log m) time

Difference lies in the priority
◦ Priority is the length of the shortest path to a visited vertex + cost of edge to an unvisited vertex
◦ We know the shortest path to every visited vertex

On unweighted graphs, BFS gives us the same result as Dijkstra’s algorithm
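A sketch of Dijkstra's algorithm as a priority-queue graph search. It computes distances from the source to every vertex rather than stopping at the sink; the adjacency format ({v, weight} pairs) and the names are assumptions for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class DijkstraDemo {
    // adj.get(u) holds {v, weight} pairs with weight >= 0.
    // Returns dist[v] = shortest path length from source, or Long.MAX_VALUE if unreachable.
    static long[] shortestPaths(List<List<int[]>> adj, int source) {
        long[] dist = new long[adj.size()];
        Arrays.fill(dist, Long.MAX_VALUE);
        boolean[] visited = new boolean[adj.size()];
        // Heap entries are {vertex, path length to a visited vertex + edge cost}.
        PriorityQueue<long[]> heap =
            new PriorityQueue<long[]>((a, b) -> Long.compare(a[1], b[1]));
        heap.add(new long[] {source, 0});
        while (!heap.isEmpty()) {
            long[] top = heap.poll();
            int u = (int) top[0];
            if (visited[u]) continue;
            visited[u] = true;
            dist[u] = top[1];              // first visit fixes the shortest distance
            for (int[] e : adj.get(u))
                if (!visited[e[0]]) heap.add(new long[] {e[0], top[1] + e[1]});
        }
        return dist;
    }
}
```

On the directed graph with edges 0→1 (4), 0→2 (1), 2→1 (2), 1→3 (5), the distances from vertex 0 are 0, 3, 1, 8.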



























