A brief introduction to various data structures and algorithms in short note form. Based on the CSA1017 course at the University of Malta.
CSA 1017 Data Structures and Algorithms 1
Lecturer: John Abela
Algorithms and Complexity
Algorithm: a set of rules for solving a problem in a finite number of steps
Numbers:
- Natural: ℕ (1, 2, 3, …)
- Integers: ℤ (including negatives and 0)
- Rational: ℚ (including fractions)
- Real: ℝ (including non-terminating, non-recurring decimals, e.g. π)
- Complex: ℂ (including roots of negative numbers)
Set: a collection of objects (not necessarily numbers)
U -> Universal Set
|S| -> Cardinality -> The number of objects in set S
In order to have a mathematical structure, one needs a set and an operator or a relation
Operator: takes one or more elements from a set, applies an operation, and gives something back
- Example: addition (+), subtraction (−), concatenation, union (∪), intersection (∩), etc.
- Unary Operator: works on one object
- Binary Operator: works on two objects
- Properties of operators:
  - Commutative: Consider S with operator c: ∀x, y ∈ S, x c y = y c x
  - Associative: Consider S with operator a: ∀x, y, z ∈ S, (x a y) a z = x a (y a z)
  - Distributive: Consider S with operators d1, d2: ∀x, y, z ∈ S, x d1 (y d2 z) = (x d1 y) d2 (x d1 z)
Closed Set: when the operator always gives back an object from the same set, therefore:
∀a, b ∈ S, a ∘ b ∈ S. Example: the natural numbers are closed under addition
Identity: when an operator works on an object and the identity, the result is the object itself:
Consider S with operator ∘: ∃i ∈ S st ∀a ∈ S, a ∘ i = a. If the operator is addition, the identity is 0; if the operator is multiplication, the identity is 1
Relation: acts upon elements from a set and gives true or false: ∀a, b ∈ S, a R b -> T/F
Properties of relations:
- Reflexive: Consider S with relation r: ∀x ∈ S, x r x -> T
- Symmetric: Consider S with relation s: ∀x, y ∈ S, if x s y -> T then y s x -> T
- Transitive: Consider S with relation t: ∀x, y, z ∈ S, if x t y -> T and y t z -> T then x t z -> T
Equivalence: a relation that is reflexive, symmetric and transitive. An equivalence relation partitions the set: every element is part of one (and only one) partition, and the union of all the partitions equals the original set
Function:
[Diagram: a function f maps each element a in the domain to an element x in the co-domain]
Notations:
f: ℕ -> ℕ, f(x) = x²
f: a ↦ a·a
Cartesian Product: the set S×S (or S²) holds all possible pairs from set S (similarly, S³ holds all triples)
Defining Sets: Example: to define the set of even numbers:
E = { a | a mod 2 = 0 }, where | means "such that"
Inductive Definition: Example: to define the natural numbers:
- Basis: 0
- Operation: +1, or succ()
- Closure: nothing else is a natural number
Function Properties:
- One-to-One: every y in the co-domain has at most one x in the domain such that f(x) = y
- Onto: for every y in the co-domain there is an x in the domain such that f(x) = y. Therefore, all elements in the co-domain are used
Types of Functions:
- Linear Function: Example: f(x) = 2x
- Polynomial Function: Example: f(x) = x²
- Logarithmic Function: Example: f(x) = log₂ x
- Exponential Function: Example: f(x) = 2ˣ
Intractable problems: problems that can be solved, but not fast enough for the solution to be useful [all exponential-time algorithms]
Travelling Salesman Problem (TSP): given a list of cities and their pair-wise distances, the task is to find the shortest possible tour that visits each city exactly once
[To obtain an optimal result, one has to use an exponential algorithm (generate all possible permutations and compare results). For large inputs this would take billions of years to execute, so we use an approximation algorithm instead: generate random tours and keep the shortest one found. This solution may, or may not, be the shortest tour.]
Space Complexity: the amount of space required by an algorithm to execute, expressed as a function of
the size of the input to the problem
Time Complexity: the amount of time taken by an algorithm to run, expressed as a function of the size
of the input to the problem (cannot be less than space complexity)
- Bubble sort n items: O(n²)
- Quick sort n items: O(n·log₂ n)
- Sequential search through n unsorted items: O(n)
- Binary search through n sorted items: O(log₂ n)
Towers of Hanoi: move the disks to a different pole, one at a time, without ever placing a larger disk on a smaller disk (Time Complexity: O(2ⁿ))
Q: Why do we work out the time complexity?
A: To make sure that the algorithm is not intractable (ie. not exponential)
Consider, for input n:

    for i = 1 to 100
        writeln("Students are dumb");         // input invariant
    for i = 1 to n
        writeln("Lecturers are nice people"); // input variant

[When you work out the time complexity of a program, ignore all code fragments that are input invariant and add up the time complexities of the code fragments which are input variant. This means that for the above program, the time complexity is O(n)]
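The rule can be checked by counting operations directly; a small sketch (the function name is my own):

```python
def count_ops(n):
    """Count the write operations performed by the two loops above."""
    ops = 0
    for i in range(100):   # input invariant: always 100 operations
        ops += 1
    for i in range(n):     # input variant: n operations
        ops += 1
    return ops

# The constant 100 is dwarfed by n as n grows, so the growth rate is O(n).
print(count_ops(10))    # 110
print(count_ops(1000))  # 1100
```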
Running Time Function (RTF): a function that from some point onwards becomes (and stays) positive. A function f is an RTF if:
f : ℤ⁺ -> ℝ
∃ n₀ ∈ ℤ⁺ st f(n) > 0
∀ n > n₀
Examples:
- f(n) = 2n + 1, n₀ = 0
- f(n) = n², n₀ = 0
- f(n) = n² − 16, n₀ = 4
- f(n) = sin n + 10, n₀ = 0
f(n) = sin n + 1 is not an RTF, since sin n + 1 can reach 0, which is not strictly positive
Theta Relation: consider two RTFs f(n) and g(n).
f(n) is Θ(g(n)) if:
∃ c₁ and c₂ ∈ ℝ⁺ and n₀ ∈ ℤ⁺
st c₁·g(n) ≤ f(n) ≤ c₂·g(n)
∀ n > n₀
Examples:
- f(n) = n², g(n) = 3n²
  c₁·g(n) ≤ f(n) ≤ c₂·g(n)
  (1/3)·3n² ≤ n² ≤ 1·3n² => n² is Θ(3n²)
- f(n) = n³, g(n) = n²
  c₁·g(n) ≤ f(n) ≤ c₂·g(n)
  c₁·n² ≤ n³ ≤ c₂·n² => n³ is not Θ(n²)
  You cannot find a c₂ such that c₂·n² is always greater than n³,
  since once n exceeds c₂, c₂·n² will be smaller than f(n) = n³
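The two examples can be checked numerically; a sketch (the function name is my own, and a finite check is only evidence, not a proof):

```python
def is_theta_witness(f, g, c1, c2, n0, n_max=1000):
    """Check that c1*g(n) <= f(n) <= c2*g(n) for all n0 < n <= n_max."""
    return all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0 + 1, n_max + 1))

# n^2 is Theta(3n^2): the constants c1 = 1/3, c2 = 1 work.
print(is_theta_witness(lambda n: n**2, lambda n: 3 * n**2, 1/3, 1, 0))  # True

# n^3 is not Theta(n^2): even a huge c2 fails once n grows past it.
print(is_theta_witness(lambda n: n**3, lambda n: n**2, 1, 100, 0))      # False
```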
Note: if the largest exponent of two functions is the same, you will always be able to find c₁ and c₂; coefficients and lower-order terms are ignored
Example: n² is Θ(4n² + 3n + 1), and both have complexity O(n²)
Θ is reflexive: f(n) is Θ(f(n))
proof:
we need c₁·f(n) ≤ f(n) ≤ c₂·f(n)
let c₁ = c₂ = 1
f(n) ≤ f(n) ≤ f(n)
Θ is symmetric: if f(n) is Θ(g(n)) then g(n) is Θ(f(n))
proof:
if f(n) is Θ(g(n))
=> c₁·g(n) ≤ f(n) ≤ c₂·g(n)
for g(n) is Θ(f(n)) we need
c₃·f(n) ≤ g(n) ≤ c₄·f(n)
consider f(n) ≤ c₂·g(n):
(1/c₂)·f(n) ≤ g(n) => c₃ = 1/c₂
consider c₁·g(n) ≤ f(n):
g(n) ≤ (1/c₁)·f(n) => c₄ = 1/c₁
therefore (1/c₂)·f(n) ≤ g(n) ≤ (1/c₁)·f(n)
Note that n₀ remains the same
Θ is transitive: if f(n) is Θ(g(n)) and g(n) is Θ(h(n)) then f(n) is Θ(h(n))
proof:
if f(n) is Θ(g(n))
=> c₁·g(n) ≤ f(n) ≤ c₂·g(n)
if g(n) is Θ(h(n))
=> c₃·h(n) ≤ g(n) ≤ c₄·h(n)
for f(n) is Θ(h(n)) we need
c₅·h(n) ≤ f(n) ≤ c₆·h(n)
consider c₁·g(n) ≤ f(n) and c₃·h(n) ≤ g(n):
c₃·c₁·h(n) ≤ c₁·g(n) ≤ f(n) => c₅ = c₃·c₁
consider f(n) ≤ c₂·g(n) and g(n) ≤ c₄·h(n):
f(n) ≤ c₂·g(n) ≤ c₄·c₂·h(n) => c₆ = c₄·c₂
therefore c₃·c₁·h(n) ≤ f(n) ≤ c₄·c₂·h(n)
note that n₀ is the maximum of the two
Properties of the Theta relation:
1. For any c > 0, c·f(n) is Θ(f(n))
2. If f₁(n) is Θ(g(n)) and f₂(n) is Θ(g(n)), then (f₁ + f₂)(n) is Θ(g(n))
3. If f₁(n) is Θ(g₁(n)) and f₂(n) is Θ(g₂(n)), then (f₁·f₂)(n) is Θ((g₁·g₂)(n))
Big O Relation: consider two RTFs f(n) and g(n).
f(n) is O(g(n)) if:
∃ c ∈ ℝ⁺ and n₀ ∈ ℤ⁺ st f(n) ≤ c·g(n)
∀ n > n₀. If f(n) is O(g(n)) then the growth rate of f is not larger than that of g
Examples:
- n³ is O(n²) -> False
- n² is O(n³) -> True
Omega Relation: consider two RTFs f(n) and g(n).
f(n) is Ω(g(n)) if:
∃ c ∈ ℝ⁺ and n₀ ∈ ℤ⁺ st f(n) ≥ c·g(n)
∀ n > n₀. If f(n) is Ω(g(n)) then the growth rate of f is not smaller than that of g
Properties of Big O and Omega notations:
1. If f(n) is Ω(g(n)) then g(n) is O(f(n))
2. If f(n) is Ω(g(n)) and f(n) is O(g(n)) then f(n) is Θ(g(n))
Notes:
- A logarithmic function is O(n) but not Ω(n)
[Logarithmic growth is less than linear]
- An exponential function is not O(n), but it is Ω(n)
[Exponential growth is greater than linear]

Non-Computable Problems: problems for which, provably, no one can write an algorithm that finds the solution.
Example:
- Halting Problem: one cannot write an algorithm that takes another program as input and determines whether that program will terminate or not
- Busy Beaver
Computable Problems:
Example:
- Polynomial: problems for which the solution is polynomial
- Exponential: problems with exponential solutions
- Unknown Algorithms: problems for which no standard algorithm exists. Example: an algorithm to predict next week's Super 5 lottery numbers
A problem is in NP if:
1. It is a problem where the answer is yes or no [decision problem]
2. It can be presented to an oracle, and the answer is returned in O(1)
3. The answer is verifiable in polynomial time [polynomial verifiability]
Example: TSP
1. Decision Problem: Is there a tour of length k or less?
2. Oracle: the answer is a list of cities
3. Polynomial Verifiability: compute the distance between the cities in the answer and check that the result is ≤ k
Example: Map Coloring Problem [given a map of countries, find the least number of colors needed so that no two adjacent countries have the same color]
1. Decision Problem: Can the map be colored with k colors or less?
2. Oracle: the answer is a list of countries and their colors
3. Polynomial Verifiability:
   a. Place a node in the center of each country, label it according to its color, and connect the nodes of adjacent countries with edges
   b. Check that no edge has the same label at both ends
NP-Completeness (NPC):
Let Π be in NP.
Π is in NPC if all problems in NP can be Turing reduced to Π.
Turing Reduction (≤T): take the input of P′ (a very hard problem) and change it into the input of P in polynomial time
Example of Turing reduction:
Vertex Cover: given a graph, a vertex covers an edge if it touches it. Choose the smallest number of vertices that cover all the edges
Set Cover: given a set and a list of its subsets, find the smallest number of subsets whose union is equal to the original set
Change the input of the vertex cover into the input of the set cover:
- Original set: all edges
- Subsets: one subset per vertex, containing the edges that it covers
Therefore, Vertex Cover ≤T Set Cover
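The input transformation above can be sketched in a few lines (names are my own):

```python
def vertex_cover_to_set_cover(vertices, edges):
    """Turn a Vertex Cover instance into a Set Cover instance.

    The universe is the set of edges; each vertex becomes the subset
    of edges it touches. A smallest set cover then picks exactly the
    vertices of a smallest vertex cover.
    """
    universe = set(edges)
    subsets = {v: {e for e in edges if v in e} for v in vertices}
    return universe, subsets

# Triangle a-b-c plus a pendant edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
universe, subsets = vertex_cover_to_set_cover("abcd", edges)
print(subsets["c"] == {("a", "c"), ("b", "c"), ("c", "d")})  # True
```

The transformation itself runs in polynomial time (O(|V|·|E|)), which is what the reduction requires.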
Reducing TSP to shell sort (note: this is not a Turing reduction!):
Create a list of all possible tours and their lengths (still exponential) and sort it using shell sort
TSP is in NPC (ie. all problems in NP can be reduced to it)
[P ⊆ NP] P = NP?: can every problem in NP be solved in polynomial time?
If one NPC problem is solved in polynomial time, all NPC problems can be solved in polynomial time!
Data Structures
Graph: a collection of vertices that could be connected together by edges,
G = (V, E)
V is a finite set of vertices
E is a subset of VxV
Example:
[Diagram: vertices a, b, c, d, e, f, with edges between a-b, b-c and c-d]
V = {a, b, c, d, e, f}
E = {ab, bc, cd}
Path: an ordered sequence of vertices V₁…Vₙ
st (Vᵢ, Vᵢ₊₁) ∈ E ∀ 1 ≤ i < n
Directed Edge: an edge that can only be traversed in one direction: A -> B
Cycle: a path V₁…Vₙ where V₁ = Vₙ
Note: a graph can either be cyclic or acyclic
Connected Graph:
if ∀a, b ∈ V, ∃ a path from a to b
Snake Relation (~): consider V with relation ~:
∀a, b ∈ V, a~b if ∃ a path from a to b. This relation partitions the graph into connected components
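The snake relation can be computed directly: two vertices are related exactly when a path exists between them. A small sketch using breadth-first search (names are my own):

```python
from collections import deque

def components(vertices, edges):
    """Partition the vertices of an undirected graph into connected components."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, parts = set(), []
    for start in vertices:
        if start in seen:
            continue
        part, queue = set(), deque([start])
        while queue:                 # BFS flood-fill from start
            v = queue.popleft()
            if v in part:
                continue
            part.add(v)
            queue.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts

# Edges a-b, b-c, c-d leave e and f isolated:
# three components {a, b, c, d}, {e}, {f}.
print(components("abcdef", [("a", "b"), ("b", "c"), ("c", "d")]))
```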
Linked List: a logical list, ideal for storing items in order, but not for searching
Trees:
- The topmost node is called the root
- The root can have children
- Children can have children
- Nodes without children are called leaves
- A tree is an acyclic graph
- There is exactly one path from the root to every other node
To define the descendants of x:
- Take the sub-tree of which x is the root
- All nodes except x
To define the ancestors of y:
- Take the path from y to the root
- All nodes except y
Height: the length of the longest path from the root to a leaf
Note: height is important because we usually search a tree to find a node, and the longest path is the longest time a search can take. Therefore, the time complexity for searching is the height.
Restrictions on Trees:
- Structure Condition: restriction on the number of children per node
- Order Condition: restriction on the order of values in nodes
- Balance Condition: restriction on the balance of the heights of the sub-trees

                    Structure   Order   Balance
Unrestricted Tree       -         -        -
Binary Tree             ✓         -        -
BST                     ✓         ✓        -
AVL                     ✓         ✓        ✓
Unrestricted Tree: a tree that has no restrictions whatsoever
- Best Case Height: 1
- Worst Case Height: n − 1
Q: How do you store an unrestricted tree in a data structure?
A: First Child / Next Sibling
Example:
[Diagram: the example tree itself was lost in extraction; the table below encodes it]

Node   Data   First Child   Next Sibling
1      C      5             -
2      E      -             6
3      A      -             4
4      D      2             -
5      B      -             3
6      F      -             -
7      -      -             -
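The table can be turned into a working representation; a sketch that rebuilds the tree the table encodes (C is the root; its children are B, A, D, and D's children are E, F):

```python
class Node:
    """First Child / Next Sibling representation of a general tree."""
    def __init__(self, data):
        self.data = data
        self.first_child = None   # leftmost child
        self.next_sibling = None  # next child of the same parent

def children(node):
    """Iterate over a node's children by following the sibling chain."""
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling

# Rebuild the tree from the table above.
c, e, a, d, b, f = (Node(x) for x in "CEADBF")
c.first_child = b        # node 1 (C): first child is node 5 (B)
b.next_sibling = a       # node 5 (B): next sibling is node 3 (A)
a.next_sibling = d       # node 3 (A): next sibling is node 4 (D)
d.first_child = e        # node 4 (D): first child is node 2 (E)
e.next_sibling = f       # node 2 (E): next sibling is node 6 (F)

print([ch.data for ch in children(c)])  # ['B', 'A', 'D']
print([ch.data for ch in children(d)])  # ['E', 'F']
```

Each node needs only two pointers, no matter how many children it has; that is what makes this representation suitable for unrestricted trees.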
Binary Tree: every node has at most two children
- Best Case Height: ⌊log₂ n⌋
- Worst Case Height: n − 1
Notes:
- Level l contains at most 2ˡ nodes
- The height of a complete binary tree with n nodes is ⌈log₂ (n+1)⌉ − 1
- Proof:
  Level 0: 1 node
  Level 1: 2 nodes
  Level 2: 4 nodes
  Level 3: 8 nodes
  Total nodes: 15, and ⌈log₂ 16⌉ − 1 = 3
Storing a Binary Tree in an Array:
[Diagram: binary tree with root d, internal nodes a and e, and leaves b, c, f, g; the array below stores it]

Index   Left   Data   Right
1       2      d      5
2       4      a      7
3       -      b      -
4       -      c      -
5       3      e      6
6       -      f      -
7       -      g      -
Binary Search Tree (BST): for any node, the values in the left sub-tree must be less than (or equal to) it
and the values in the right sub-tree must be greater than (or equal to) it
Searching a BST:
- Best Case Time Complexity: O(1)
- Worst Case Time Complexity: O(height)
Efficient Searching:
[Diagram: balanced BST with root 17, children 12 and 27, and leaves 7, 15 and 30]
If you insert nodes in sorted order, you get an unbalanced (degenerate) tree
Deleting an element from a BST:
1. Find the element to delete
2. Choose the left or right subtree
3. Find the rightmost element (of the left subtree) or the leftmost element (of the right subtree)
4. Put it in place of the deleted element
5. If necessary, repeat
Searching: start at the root and visit as few nodes as possible until you find the desired value, or until you reach a leaf (Time Complexity: O(height))
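BST insertion and search can be sketched as follows (a minimal sketch; no balancing is attempted, so the worst case stays O(n)):

```python
class BST:
    """Minimal binary search tree: smaller keys go left, larger go right."""
    def __init__(self, key=None):
        self.key, self.left, self.right = key, None, None

    def insert(self, key):
        if self.key is None:            # empty tree
            self.key = key
        elif key < self.key:
            if self.left is None:
                self.left = BST(key)
            else:
                self.left.insert(key)
        else:
            if self.right is None:
                self.right = BST(key)
            else:
                self.right.insert(key)

    def search(self, key):
        """Follow one root-to-leaf path: O(height) comparisons."""
        if self.key is None:
            return False
        if key == self.key:
            return True
        child = self.left if key < self.key else self.right
        return child.search(key) if child else False

t = BST()
for k in [17, 12, 27, 7, 15, 30]:   # the balanced example above
    t.insert(k)
print(t.search(15), t.search(99))   # True False
```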
Traversal: a way of visiting all nodes (Time Complexity: O(n))
- In-order Traversal: Left-Root-Right

    inorder(T)
        if T = null then return
        inorder(T.L)
        write(T.root)
        inorder(T.R)

- Pre-order Traversal: Root-Left-Right

    preorder(T)
        if T = null then return
        write(T.root)
        preorder(T.L)
        preorder(T.R)

- Post-order Traversal: Left-Right-Root

    postorder(T)
        if T = null then return
        postorder(T.L)
        postorder(T.R)
        write(T.root)
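The three traversals in runnable form (a sketch, collecting keys into lists instead of writing them):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(t):
    """Left-Root-Right."""
    return inorder(t.left) + [t.key] + inorder(t.right) if t else []

def preorder(t):
    """Root-Left-Right."""
    return [t.key] + preorder(t.left) + preorder(t.right) if t else []

def postorder(t):
    """Left-Right-Root."""
    return postorder(t.left) + postorder(t.right) + [t.key] if t else []

# A BST holding 1..5: in-order traversal yields the keys in sorted order.
t = Node(4, Node(2, Node(1), Node(3)), Node(5))
print(inorder(t))    # [1, 2, 3, 4, 5]
print(preorder(t))   # [4, 2, 1, 3, 5]
print(postorder(t))  # [1, 3, 2, 5, 4]
```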
Expression Tree: a binary tree where all leaves store operands and all non-leaves store operators. Every node has 0 or 2 children.
- Post-order Traversal: gives the postfix expression
- In-order Traversal (with brackets): gives the infix expression

    inorder(T)
        if T = null then return
        if T is not a leaf then write("(")
        inorder(T.L)
        write(T.root)
        inorder(T.R)
        if T is not a leaf then write(")")
Consider the following expression tree:
[Diagram: root *, whose left child is + (with children 3 and 4) and whose right child is 5]
In-order -> Infix: ((3+4)*5)
Post-order -> Postfix: 34+5*
Constructing an expression tree from a postfix expression:
- Read the postfix expression from left to right
- When an operand is found, push it onto the stack as a one-node tree
- When an operator is found, pop two trees from the stack, join them with that operator, and push the result back onto the stack
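The steps above can be sketched as follows, evaluating the resulting tree for good measure (a sketch limited to single-digit operands; names are my own):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def build(postfix):
    """Build an expression tree from a postfix string of single-digit operands."""
    stack = []
    for token in postfix:
        if token.isdigit():
            stack.append(Node(token))               # operand: one-node tree
        else:
            right, left = stack.pop(), stack.pop()  # top of stack is the right operand
            stack.append(Node(token, left, right))  # operator joins the two trees
    return stack.pop()

def evaluate(t):
    if t.left is None:                              # leaf: an operand
        return int(t.value)
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    return ops[t.value](evaluate(t.left), evaluate(t.right))

tree = build("34+5*")    # the example above: ((3+4)*5)
print(evaluate(tree))    # 35
```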
Adelson-Velskii and Landis (AVL) Tree: a BST with a balance condition: for any node, the heights of the left sub-tree and the right sub-tree differ by at most 1. Example:
[Diagram: AVL tree with root 15 and nodes 12, 30, 17, 20, 40, 18, 25]
If one adds the number 23, it would violate the AVL condition
Rebalancing in constant time:
1. Add the new node (Example: 23)
2. Consider all nodes on the path from the new node to the root and recompute the AVL height for every node. Find the first node that violates the AVL condition (in this case 30); go back towards the new node, note the first two directions taken (in this case Left-Right), and use them to pick the template (Note: finding this node takes logarithmic time)
Template selection: the first two directions from the violating node towards the new node determine the case:
- L-L -> Template 1
- L-R -> Template 2
- R-L -> Template 3
- R-R -> Template 4
4 Rotations: 4 rebalancing algorithms, each taking constant time O(1). Templates 1 and 4 are single rotations, while 2 and 3 are double rotations.
Template 1 (single rotation):
[Diagram lost in extraction: nodes X, Y, Z with subtrees A and B; a single rotation restores balance while preserving the left-to-right order]
Template 4 (single rotation):
[Diagram lost in extraction: the mirror image of Template 1]
Template 2 (double rotation):
[Diagram lost in extraction: subtrees W, X, Y, Z; two rotations restore balance while preserving the left-to-right order]
Template 3 (double rotation):
[Diagram lost in extraction: the mirror image of Template 2]
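A single rotation can be sketched as follows (my own minimal node type; this is the classic right rotation used for the L-L case, not necessarily the exact lettering of the lost diagrams):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(t):
    """Height of a tree; a single node has height 0, an empty tree -1."""
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def rotate_right(y):
    """Single right rotation: the left child x becomes the new root.

    x's right subtree moves across to become y's left subtree, which
    preserves the BST order: x.left < x < (moved subtree) < y < y.right.
    """
    x = y.left
    y.left = x.right
    x.right = y
    return x

# Degenerate L-L chain 3 -> 2 -> 1 (height 2) becomes balanced (height 1).
chain = Node(3, Node(2, Node(1)))
balanced = rotate_right(chain)
print(balanced.key, height(balanced))  # 2 1
```

Only a constant number of pointers change, which is why each template runs in O(1).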
B-Trees: nodes can have many children
- Advantage: faster (the tree is much shallower)
- Disadvantage: complex coding
- A B-tree of order M:
  - The root has between 2 and M children
  - All non-root nodes have 2 to M children
  - All non-leaf nodes have up to M − 1 data pointers
  - All data is in the leaves
  - Leaves hold up to M data items
[Diagram lost in extraction: an example B-tree whose internal nodes hold keys such as 10, 17, 20, 30, 40 and 51, with the data in the leaves]
Abstract Data Type (ADT): we define how it behaves, not how to implement it; ADTs are therefore completely independent of the implementation (e.g. stacks and queues)
Priority Queue: a queue where each element is given a priority key, and the first element out is the one with the highest priority
- Insertion & Retrieval:
1. Insertion O(1) and Retrieval O(n): both value and priority reside in an array, in the order they were entered. For retrieval, one has to search the whole array for the element with the highest priority
2. Insertion O(n) and Retrieval O(1): for every element added, the array is shifted to keep it sorted by priority. To remove an element, the last element of the array is removed
3. Most efficient method (using a heap)
Heap: a binary tree (not a BST) that helps us implement a priority queue with O(log₂ n) insertion and removal. It is filled in level by level, from left to right, so all leaves are at depth l or l − 1. For every node n, the value of n is ≥ the values of its children
[Diagram: max-heap with root 17; 17's children are 11 and 6; 11's children are 7 and 1; 6's children are 3 and 2; 7's left child is 5]
Cascade Up: once an element is added (at the next free leaf position), check the value of its parent. If the value of the parent is less than that of the new element, swap them, and repeat until the heap condition holds
Cascade Down: since only the root can be deleted, a technique called cascade down is used:
1. Remove the root
2. Put the last element added in place of the deleted root
3. Swap this node with the largest of its children (unless both children are smaller)
4. Repeat step 3 until the heap condition holds
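Both cascades can be sketched over an array, using the 1-based indexing described below (names are my own):

```python
class MaxHeap:
    """Array-backed max-heap; index 1 is the root (slot 0 is unused)."""
    def __init__(self):
        self.a = [None]

    def insert(self, value):
        """Append at the next free leaf, then cascade up."""
        self.a.append(value)
        i = len(self.a) - 1
        while i > 1 and self.a[i // 2] < self.a[i]:   # parent smaller: swap up
            self.a[i // 2], self.a[i] = self.a[i], self.a[i // 2]
            i //= 2

    def extract_max(self):
        """Remove the root, move the last element up, cascade down."""
        root, last = self.a[1], self.a.pop()
        if len(self.a) > 1:
            self.a[1] = last
            i = 1
            while 2 * i < len(self.a):
                c = 2 * i                              # left child
                if c + 1 < len(self.a) and self.a[c + 1] > self.a[c]:
                    c += 1                             # right child is larger
                if self.a[i] >= self.a[c]:
                    break                              # both children smaller
                self.a[i], self.a[c] = self.a[c], self.a[i]
                i = c
        return root

h = MaxHeap()
for v in [7, 17, 3, 11, 6]:
    h.insert(v)
print([h.extract_max() for _ in range(5)])  # [17, 11, 7, 6, 3]
```

Each cascade walks at most one root-to-leaf path, so both operations are O(log₂ n).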
Implementing a heap using an array:
- The children of node i are at 2i and 2i + 1
- The parent of node i is at ⌊i/2⌋

Value: 17 11  6  7  1  3  2  5
Index:  1  2  3  4  5  6  7  8
Sorting using a heap:
1. Build a heap
2. Put the root in an array (ie. the largest number)
3. Restore the heap
4. Go to step 2
- Time Complexity:
  - n log₂ n to build the heap from the unsorted list
  - n log₂ n to build the sorted list from the heap
  - Total: 2n log₂ n = O(n log₂ n)
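The four steps translate into heap sort, here with a cascade-down helper and 0-based indexing for brevity (a sketch):

```python
def sift_down(a, i, n):
    """Cascade down within a[0:n]: push a[i] below its larger children."""
    while 2 * i + 1 < n:
        c = 2 * i + 1                       # left child
        if c + 1 < n and a[c + 1] > a[c]:
            c += 1                          # right child is larger
        if a[i] >= a[c]:
            break
        a[i], a[c] = a[c], a[i]
        i = c

def heap_sort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):     # step 1: build the heap in place
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):         # steps 2-4, repeated
        a[0], a[end] = a[end], a[0]         # move the largest item to the end
        sift_down(a, 0, end)                # restore the heap on a[0:end]
    return a

print(heap_sort([7, 17, 3, 11, 6, 1]))  # [1, 3, 6, 7, 11, 17]
```

Sorting in place like this avoids the extra output array, at the cost of the sorted order being built from the back.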
Sorting Algorithms
Iterative Sorts:
- Bubble Sort
- Shell Sort
- Insertion Sort
Bubble Sort: compares each element with the adjacent element and swaps them if necessary; repeat until the array is sorted. With every pass, the largest remaining number is put at the end of the list
- Time Complexity: n(n − 1) = n² − n = O(n²)
- Space Complexity: n
- Pseudo code: [most optimised]

    k = 1
    repeat
        flag = false
        for i = 1 to n-k
            if a[i] > a[i+1] then swap and set flag = true
        k = k + 1
    until flag = false
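The pseudo code translates directly (a sketch; 0-based indexing):

```python
def bubble_sort(a):
    """Each pass bubbles the largest remaining item to the end;
    stop as soon as a full pass makes no swaps."""
    k = 1
    while True:
        swapped = False
        for i in range(len(a) - k):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        k += 1              # the last k items are already in place
        if not swapped:
            break
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```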
Shell Sort: similar to bubble sort, but instead of comparing a[i] with a[i+1], it compares a[i] with a[i+k], where k starts at n/2 and is halved after every iteration. This quickly moves large numbers towards the end of the list
- Time Complexity: O(n²)
- Space Complexity: n
- Pseudo code:

    k = n/2
    repeat
        flag = false
        for i = 1 to n-k
            if a[i] > a[i+k] then swap and set flag = true
        if k > 1 then k = k/2 and flag = true
    until k = 1 and flag = false
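A direct translation of the variant in these notes (as described it is really a diminishing-gap bubble sort, closer to comb sort than to classic Shell sort, which uses gapped insertion; the final gap-1 phase repeats until no swaps remain, which guarantees a sorted array):

```python
def shell_sort(a):
    """Gap starts at n/2 and halves each round; finish with gap-1 passes."""
    n = len(a)
    k = max(n // 2, 1)
    while True:
        swapped = False
        for i in range(n - k):
            if a[i] > a[i + k]:        # compare across the current gap
                a[i], a[i + k] = a[i + k], a[i]
                swapped = True
        if k > 1:
            k //= 2                    # shrink the gap and keep going
        elif not swapped:
            break                      # gap is 1 and a full pass made no swaps
    return a

print(shell_sort([8, 3, 7, 1, 5, 2]))  # [1, 2, 3, 5, 7, 8]
```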
Insertion Sort: divides the array into a sorted and an unsorted region. It repeatedly takes the first item of the unsorted region and places it in the right position in the sorted region
- Time Complexity: n(n − 1) = n² − n = O(n²)
- Space Complexity: n
- Pseudo code:

    for x = 2 to n
        for i = x downto 2
            if a[i] < a[i-1] then swap
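A direct translation (a sketch; the inner loop also stops early once the item is in place, a standard optimisation of the pseudo code above):

```python
def insertion_sort(a):
    """Grow the sorted region a[0:x] one item at a time."""
    for x in range(1, len(a)):
        i = x
        while i > 0 and a[i] < a[i - 1]:   # walk the new item left
            a[i], a[i - 1] = a[i - 1], a[i]
            i -= 1
    return a

print(insertion_sort([9, 4, 6, 2]))  # [2, 4, 6, 9]
```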
Recursive Sorts:
- Quick Sort (most popular sort because of its space complexity)
- Merge Sort (fastest sort)
Quick Sort: partitions the array around a pivot (ideally near the median) and rearranges the array such that all items before the pivot are less than the pivot, and all items after the pivot are greater than the pivot; each partition is then sorted recursively
- Time Complexity:
  - Best: O(n log₂ n)
  - Worst: O(n²)
- Space Complexity: n
- Pseudo code: [the notes are cut off here; a standard completion]

    quicksort(L)
        if |L| ≤ 1 then return L
        choose a pivot p from L
        partition L into L1 (items < p) and L2 (items > p)
        return quicksort(L1) + p + quicksort(L2)
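A sketch of the partition-and-recurse idea (this copies sub-lists for clarity; the in-place version is what gives quick sort its small space complexity):

```python
def quick_sort(L):
    """Partition around a pivot, then sort each side recursively."""
    if len(L) <= 1:
        return L
    pivot = L[len(L) // 2]                    # ideally near the median
    left = [x for x in L if x < pivot]
    mid = [x for x in L if x == pivot]
    right = [x for x in L if x > pivot]
    return quick_sort(left) + mid + quick_sort(right)

print(quick_sort([3, 9, 1, 7, 3, 5]))  # [1, 3, 3, 5, 7, 9]
```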
Merge Sort: splits the array into two halves, recursively sorts both halves, and then merges the two sorted halves
- Time Complexity: O(n log₂ n)
- Space Complexity: 2n
- Pseudo code: [the notes are cut off here; a standard completion]

    mergesort(L)
        if |L| ≤ 1 then return L
        split L into halves L1 and L2
        return merge(mergesort(L1), mergesort(L2))
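A sketch of split-sort-merge; the merge step is what costs the extra space:

```python
def merge(xs, ys):
    """Merge two sorted lists into one sorted list in linear time."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    return out + xs[i:] + ys[j:]   # append whichever half has items left

def merge_sort(L):
    if len(L) <= 1:
        return L
    mid = len(L) // 2
    return merge(merge_sort(L[:mid]), merge_sort(L[mid:]))

print(merge_sort([6, 2, 9, 1, 5]))  # [1, 2, 5, 6, 9]
```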