Topics Will be covered Important Programming Concepts Types, Recursion/Induction, Asymptotic Complexity The Toolbox (Data Structures) Arrays, Linked Lists,

Topics Will be covered Important Programming Concepts Types, Recursion/Induction, Asymptotic Complexity The Toolbox (Data Structures) Arrays, Linked Lists, Trees (BSTs, Heaps), Hashtables Practical Things Searching, Sorting, Abstract Data Types Graphs Threads and Concurrency

Topics Sort of Important Dont have a panic attack over these topics! GUIS Just do the problems on old prelims/finals Software engineering Dont write horrible code on coding questions But they are both important in the real world!

Topics Will Not Be Covered Lectures 24 and above Java Virtual Machine Distributed Systems and Cloud Computing Balancing Trees (e.g.: AVL trees) But do know what a balanced and unbalanced tree is Recurrences Network Flow

Primitive Types boolean, int, double, etc Test equality with == and != Compare with, and >= Typing

void f(int x) { x--; } int x = 10; f(x); // x == 10 Pass by Value f 10 f9f9 main 10

Reference types Actual object is stored elsewhere Variable contains a reference to the object == tests equality for the reference equals() tests equality for the object x == y implies that x.equals(y) x.equals(y) does not imply x == y How do we compare objects of type T? Implement the Comparable interface Typing

void f(ArrayList l) { l.add(2); l = new ArrayList (); } ArrayList l = new ArrayList (); l.add(1); f(l); // l contains 1, 2 Pass by Reference {} flfl main l {1} {1,2}

We know that type B can implement/extend A B is a subtype of A; A is a supertype of B The real type of the object is its dynamic type This type is known only at run-time Object can act like a supertype of its dynamic type Cannot act like a subtype of its dynamic type Variables/arguments of type A accept any subtype of A A is a supertype of the static type Static type is a supertype of the dynamic type Typing

Static type is compile-time, determined by code Dynamic type might be a subtype of the static type Casting temporarily changes the static type Upcasts are always safe Always cast to a supertype of the dynamic type Downcasts may not be safe Can downcast to a supertype of the dynamic type Cannot downcast to a subtype of the dynamic type Typing

If B extends A, and B and A both have function foo() Which foo() gets called? Answer depends on the dynamic type If the dynamic type is B, Bs foo will() be called Even if foo() is invoked inside a function of A Exception: static functions Static functions are not associated with any object Thus, they do not have any type Typing

Induction and Recursion Recursion Basic examples Factorial : n! = n(n-1)! Combinations Pascals triangle Recursive structure Tree (tree t = root with right/left subtree) Depth first search Dont forget base case (in proof, in your code)

Induction and Recursion Induction Can do induction on previous recursive problems Algorithm correctness proof (DFS) Math equation proof

Induction and Recursion Step 1 Base case Step 2 suppose n is the variable youre going to do induction on. Suppose the equation holds when n=k Strong induction: suppose it holds for all n

Binary Search Trees Used to sort data inside a tree For every node with value x: Every node in the left subtree has a value < x Every node in the right subtree has a value > x Binary search tree is not guaranteed to be balanced If it is balanced, we can find a node in O(log n) time

Binary Search Trees

Completely unbalanced, but still a BST

Not a Binary Search Tree 8 is in the left subtree of 5

Not a Binary Search Tree 3 is the left child of 2

Adding to a Binary Search Tree Adding 4 to the BST Start at root (5) 4 < 5 Go left to 2 4 > 2 Go right to 3 4 > 3 Add 4 as right child of 3

Tree Traversals Converts the tree into a list Works on any binary tree Do not need a binary search tree Traverse node and its left and right subtrees Subtrees are traversed recursively Preorder: node, left subtree, right subtree Inorder: left subtree, node, right subtree Produces a sorted list for binary search trees Postorder: left subtree, rightsubtree, node

Tree Traversals Inorder Traversal In(5) In(2), 5, In(7) 1, 2, In(3), 5, In(7) 1, 2, 3, 4, 5, In(7) 1, 2, 3, 4, 5, 6, 7, 9

Binary Heaps Weaker condition on each node A node is smaller than its immediate children Dont care about entire subtree Dont care if right child is smaller than left child Smallest node guaranteed to be at the root No guarantees beyond that Guarantee also holds for each subtree Heaps grow top-to-bottom, left-to-right Shrink in opposite direction

Binary Heaps

Adding to a heap Find lowest unfilled level of tree Find leftmost empty spot, add node there New node could be smaller than its parent Swap it up if its smaller If swapped, could still be smaller than its new parent Keep swapping up until the parent is smaller

Binary Heaps Removing from a heap Take out element from the top of the heap Find rightmost element on lowest level Make this the new root New root could be larger than one/both children Swap it with the smallest child Keep swapping down

Binary Heaps Can represent a heap as an array Root element is at index 0 For an element at index i Parent is at index (i 1)/2 Children are at indices 2i + 1, 2i + 2 n-element heap takes up first n spots in the array New elements grow heap by 1 Removing an element shrinks heap by 1

Motivation Sort n numbers between 0 and 2n 1 Sorting integers within a certain range More specific than comparable objects with unlimited range General lower bound of O(n log n) may not apply Can be done in O(n) time with counting sort Create an array of size 2n The i th entry counts all the numbers equal to i For each number, increment the correct entry Can also find a given number in O(1) time Hashtables

Cannot do this with arbitrary data types Integer alone can have over 4 billion possible values For a hashtable, create an array of size m Hash function maps each object to an array index between 0 and m 1 (in O(1) time) Hash function makes sorting impossible Quality of hash function is based on how many elements map to same index in the hashtable Need to expect O(1) collisions Hashtables

Dealing with collisions In counting sort, one array entry contains only element of the same value The hash function can map different objects to the same index of the hashtable Chaining Each entry of the hashtable is a linked list Linear Probing If h(x) is taken, try h(x) + 1, h(x) + 2, h(x) + 3, Quadratic probing: h(x) + 1, h(x) + 4, h(x) + 9, Hashtables

Table Size If too large, we waste space If too small, everything collides with each other Probing falls apart if number of items (n) is almost the size of the hashtable (m) Typically have a load factor 0 < 1 Resize table when n/m exceeds Resizing changes m Have to reinsert everything into new hashtable Hashtables

Table doubling Double the size every time we exceed our load factor Worst case is when we just doubled the hashtable Consider all prior times we doubled the table n + n/2 + n/4 + n/8 + < 2n Insert n items in O(n) time Average O(1) time to insert one item Some operations take O(n) time This also works for growing an ArrayList Hashtables

Java, hashCode(), and equals() hashCode() assigns an object an integer value Java maps this integer to a number between 0 and m 1 If x.equals(y), x and y should have the same hashCode() Insert object with one hashCode() Wont find it if you look it up with a different hashCode() If you override equals(), you must also override hashCode() Different objects can have the same hashCode() If this happens too often, we have too many collisions Only equals() can determine if they are equal Hashtables

Searching Unsorted lists Element could be anywhere, O(n) search time Sorted lists binary search Try middle element Search left half if too large, right half if too small Each step cuts search space in half, O(log n) steps Binary search requires random access Searching a sorted linked list is still O(n)

Sorting Many ways to sort Same high-level goal, same result Whats the difference? Algorithm, data structures used dictates running time Also dictate space usages Each algorithm has its own flavor Once again, assume random access to the list

Sorting Swap operation: swap(x, i, j) temp = x[i] x[i] = x[j] x[j] = temp Many sorts are a fancy set of swap instructions Modifies the array in place, very space-efficient Not space efficient to copy a large array

Insertion Sort for (i = 0n-1) Take element at index i, swap it back one spot until you hit the beginning of the list previous element is smaller than this one Ex.: 4, 7, 8, 6, 2, 9, 1, 5 swap 6 back twice After k iterations, first k elements relatively sorted 4 iterations: 4, 6, 7, 8, 2, 9, 1, 5 O(n 2 ) time, in-place sort O(n) for sorted lists, good for nearly sorted lists

Selection Sort for (i = 0n-1) Scan array from index i to the end Swap smallest element into index i Ex.: 1, 3, 5, 8, 9, 6, 7 swap 6, 8 After k iterations, first k elements absolutely sorted 4 iterations: 1, 3, 5, 6, 9, 8, 7 O(n 2 ) time, in-place sort Also O(n 2 ) for sorted lists

Merge Sort Copy left half, right half into two smaller arrays Recursively run merge sort each half Base case: 1 or 2 element array Merge two sorted halves back into original array Ex. (1, 3, 6, 9), (2, 5, 7, 8) (1, 2, 3, 5, 6, 7, 8, 9) Running Time: O(n log n) Merge takes O(n) time Split the list in half about log(n) times Also uses O(n) extra space!

Quick Sort Randomly pick a pivot Partition into two (unequal) halves Left partition smaller than pivot, right partition larger Recursively run quick sort on both partitions Expected Running Time: O(n log n) Partition takes O(n) time Pivot could be smallest, largest element (n partitions) Expected to split list O(log n) times In-place sort: partition can be done in-place

Heap Sort Use a max-heap (represented as an array) Can move items up/down the heap with swaps for (i = 0n-1) Add element at index i to the heap First pass builds the heaps for (i = n-10) Remove largest element, put it in spot i O(n log n), in-place sort

Abstract Data Types Lists Stacks LIFO Queues FIFO Sets Dictionaries (Maps) Priority Queues Java API E.g.: ArrayList is an ADT list backed by an array

Abstract Data Types Priority Queue Implemented as heap peek() - look at heap root : O(1) poll() - heap delete op : O(log n) add() - heap add op : O(log n)

A graph has vertices A graph has edges between two vertices n number of vertices; m number of edges Directed vs. undirected graph Directed edges can only be traversed one way Undirected edges can be traversed both way Weighted vs. unweighted graph Edges could have weights/costs assigned to them What is a graph?

What makes a graph special? Cycles!!! What is a graph without a cycle? Undirected graphs Trees Directed graphs Directed acyclic graph (DAG) What is a graph?

Topological sort is for directed graphs Indegree: number of edges entering a vertex Outdegree: number of edges leaving a vertex Topological sort algorithm Delete a vertex with an indegree of 0 Delete its outgoing edges, too Repeat until no vertices have an indegree of 0 Topological Sort

A B D EC ABEDC

What cant topological sort cannot delete? Cycles!!! Every node in a cycle has an indegree of 1 Need to delete another node in the cycle first A graph is DAG iff a topological sort deletes it iff - if and only if Topological Sort

Works on directed and undirected graphs Components set of vertices connected to edges Component starts as a single vertex No edges are selected yet Travel edges to add vertices to this component From a vertex in the component to a new vertex Graph Searching

Implementation details Why is choosing a random path risky? Cycles!!! Could traverse a cycle forever Need to keep track of vertices in your component No cycles if you do not visit a vertex twice Graph search algorithms need: A set of vertices in the component (visited) A collection of vertices to visit (toVisit) Graph Searching

Add the start vertex to toVisit Pick a vertex from toVisit If the vertex is in visited, do nothing Could add same vertex to toVisit twice before visiting it If the vertex is not in visited: Add vertex (and the edge to it) to visited Follow its edges to neighboring vertices Add each neighboring vertices to toVisit if it is not in visited Repeat until there are no more vertices to visit Graph Searching

Running time analysis Can check if vertex is in visited in O(1) time Use an array or HashSet Each edge added to toVisit once in a directed graph Algorithm has at most m iterations Running time determined by ADT used for toVisit Stack, queue: O(1) to add, delete; O(m) total Priority queue: O(log m) to add, delete; O(m log m) total Graph Searching

Depth-first search Uses a stack Goes as deep as it can before taking a new path Breadth-first search Uses a queue Graph Searching

Graph Searching: DFS A B D EC -A A-BB-C B-EE-D ABEDC

Graph Searching: BFS A B D EC -A E-D A-B C-D B-C B-E ABEDC

MSTs apply to undirected graphs Take only some of the edges in the graph Spanning all vertices connected together Tree no cycles connected For all spanning trees, m = n 1 All unweighted spanning trees are MSTs Need to find MST for a weighted graph Minimum Spanning Tree

Initially, we have no edges selected We have n 1-vertex components Find edge between two components Dont add edge between vertices in same component Creates a cycle Pick smallest edge between two components This is a greedy strategy, but it somehow works If edge weights are distinct, only one possible MST Doesnt matter which algorithm you choose Minimum Spanning Tree

Kruskals algorithm Process edges from least to greatest Each edge either Connects two vertices in two different components (take it) Connects two vertices in the same component (throw it out) O(m log m) running time O(m log m) to sort the edges Need union-find to track components and merge them Does not alter running time Minimum Spanning Trees

A B C D E G F 5 7 2 8 3 10 12 4 1 9

Prims algorithm Graph search algorithm Like BFS, but it uses a priority queue Priority is the edge weight Size of heap is O(m); running time is O(m log m) Minimum Spanning Trees

A B C D E G F 5 7 2 8 3 10 12 4 1 9 -A 0 A-C 2 A-B 5 C-B 4 C-D 3 D-E 8 D-F 10 B-E 7 D-G 12 E-G 9 G-F 1

For directed and undirected graphs Find shortest path between two vertices Start vertex is the source End vertex is the sink Use Dijkstras algorithm Assumes positive edge weights Graph search algorithm using a priority queue Weight is now entire path length Cost of path to node + cost of edge to next vertex Shortest Path Algorithm

A B C D E G F 5 7 2 8 3 10 15 4 1 9 -A 0 A-B 5 A-C 2 C-B 6 C-D 5 B-E 12 D-E 13 D-F 15 D-G 20 F-G 16 E-G 21

Motivation for Threads Splitting the Work Multi-core processors can multi-task Cant multi-task with a single thread! Operating system simulates multi-tasking Obvious: Multi-tasking is faster Less Obvious: Not all threads are active Waiting for a movie to download from DC++ Multi-task: Study for CS 2110 while you wait! Operating system runs another thread while one waits

Thread Basics Code Set of instructions, stored as data A paper with instructions does not do anything Code by itself does not run or execute Thread Runs/executes code Multiple people can follow the same instructions Multiple threads can run the same code

Thread Basics Making the Code Write a class implementing the Runnable interface Or use an anonymous inner class on the fly Code to execute goes in the run() method Running a Thread Thread thread = new Thread(Runnable target); Multiple threads could be constructed with the same target Start a thread: thread.start(); Wait until a thread finishes: thread.join();

Concurrency Two or more active threads Hard to predict each threads speed Nondeterministic behavior Two active threads, one data structure Both perform writes, or one reads while another writes No problems if both read data Updates are not atomic! E.g.: One heap operation (such as add) takes multiple steps Wait until update completes to read/write again Data in a bad state until this happens

Concurrency Avoiding race conditions (write/write, read/write) Imagine a room with a key Must acquire the key before entering the room Enter room, do many things, leave room, drop off key Mutual exclusion, or mutexes Only one thread can acquire a mutex Others threads wait until mutex is released

Synchronization Could use Javas Semaphore class with a count of 1 Primitive, many corner cases like exceptions Better: synchronized (x) { /* body*/ } Thread 1 acquires a lock on object x is a reference type; lock acquired on its memory location Thread 2 must wait if it wants a lock on x Also waits if it wants a lock on y if x == y Synchronized functions: synchronized (this) Only one synchronized function running per object

Synchronization Only synchronize when you have to Get key, enter room, chat with BFF on phone for 1 hour People waiting for key are very angry! Deadlock via bugs Thread 1 never releases lock on A, Thread 2 waiting for A Deadlock via circular waiting Thread 1 has lock on A, waiting for lock on B Thread 2 has lock on B, waiting for lock on A Solution: always acquire locks in same order

Documents

Topics Will be covered Important Programming Concepts Types, Recursion/Induction, Asymptotic Complexity The Toolbox (Data Structures) Arrays, Linked Lists,