
Data Structures and Algorithms


A brief introduction to various data structures and algorithms in short note form. Perfect for studying. Based on the CSA1017 course at the University of Malta.



    CSA 1017 Data Structures and Algorithms 1

    Lecturer: John Abela

    Algorithms and Complexity

    Algorithm: a set of rules for solving a problem in a finite number of steps

    Numbers:

    - Natural: (1, 2, 3, …)

    - Integers: (including negatives and 0)

    - Rational: (including fractions)

    - Real: (including non-terminating and non-recurring decimals, e.g. π)

    - Complex: (including roots of negative numbers)

    Set: a collection of objects (not necessarily numbers)

    U -> Universal Set

    |S| -> Cardinality -> The number of objects in set S

    In order to have a mathematical structure, one needs a set and an operator or a relation

    <set name>, <operator or relation symbol>

    Operator: takes one or more elements from a set, applies an operation, and gives something else

    - Example: addition (+), subtraction (−), concatenation (∘), union (∪), intersection (∩), etc.

    - Unary Operator: works on one object

    - Binary Operator: works on two objects

    - Properties of operators:

    - Commutative: Consider ⟨S, c⟩: ∀ x, y ∈ S, x c y = y c x

    - Associative: Consider ⟨S, a⟩: ∀ x, y, z ∈ S, (x a y) a z = x a (y a z)

    - Distributive: Consider ⟨S, d1, d2⟩: ∀ x, y, z ∈ S, x d1 (y d2 z) = (x d1 y) d2 (x d1 z)


    Closed Set: when the operator gives back an object from the same set, therefore:

    ∀ a, b ∈ S, a ∘ b ∈ S. Example: the natural numbers are closed under addition

    Identity: when an operator works on an object and the identity element, the result is the object itself

    Consider ⟨S, ∘⟩: ∃ i ∈ S st ∀ a ∈ S, a ∘ i = a. If the operator is addition, the identity is 0; if the operator is multiplication, the identity is 1

    Relation: acts upon elements from a set and gives true or false: ∀ a, b ∈ S, a r b → T/F

    Properties of relations:

    - Reflexive: Consider ⟨S, r⟩: ∀ x ∈ S, x r x → T
    - Symmetric: Consider ⟨S, s⟩: ∀ x, y ∈ S, if x s y → T then y s x → T
    - Transitive: Consider ⟨S, t⟩: ∀ x, y, z ∈ S, if x t y → T and y t z → T, then x t z → T

    Equivalence: a relation that is reflexive, symmetric and transitive. With equivalence, the set becomes

    partitioned and every element is part of one (and only one) partition. Also, the union of all the

    partitions equals the original set

    Function:

    [diagram: a function f maps an element a in the domain to an element x in the co-domain]


    Notations:

    f(x) = x²

    f: ℝ → ℝ

    f: a ↦ a*a

    Cartesian Product: the set S×S (or S²) holds all possible pairs from set S (similarly, S³ holds all triples)

    Defining Sets: Example: to define a set of even numbers

    E = { a | a mod 2 = 0 }   (the bar | means 'such that')
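    The same definition can be written as a Python set comprehension (a small illustration, not part of the original notes; the range bound is arbitrary so the set stays finite):

        # E = { a | a mod 2 = 0 }, restricted here to 0..19
        E = {a for a in range(20) if a % 2 == 0}
        print(E)   # the even numbers 0, 2, ..., 18 (printed in arbitrary set order)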

    Inductive Definition: Example: to define natural numbers

    - Basis 0

    - Operation +1 or succ()

    - Closure

    Function Properties:

    - One-to-One: every y in the co-domain has at most one x in the domain such that f(x) = y

    - Onto: for every y in the co-domain there is an x in the domain such that f(x) = y. Therefore, all

    elements in the co-domain are used

    Types of Functions:

    Linear Function: Example: f(x) = 2x

    Polynomial Function: Example: f(x) = x²

    Logarithmic Function: Example: f(x)= log2 x

    Exponential Function: Example: f(x) = 2^x

    Intractable problems: problems that can be solved but not fast enough for the solution to be useful

    [all exponential functions]


    Travelling Salesman Problem (TSP): given a list of cities and their pair-wise distances, the task is to find

    the shortest possible tour that visits each city exactly once

    [In this case, to obtain an optimal result, one has to use an exponential algorithm (generate all possible permutations and compare results). However, this would take billions of years to execute, so we use an approximation instead: an algorithm that generates random sequences and keeps the shortest result found. This solution may, or may not, be the shortest distance.]
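    A Python sketch of that approximation idea (function and parameter names are illustrative, not from the notes): it tries a fixed number of random tours and keeps the shortest one found, which may or may not be the optimal tour.

        import random

        def approximate_tsp(cities, dist, tries=10_000):
            """Random-restart approximation: generate random tours, keep the best seen.
            `dist` is assumed to be a function dist(a, b) giving the distance between two cities."""
            best_tour, best_len = None, float('inf')
            for _ in range(tries):
                tour = cities[:]
                random.shuffle(tour)                      # a random permutation of the cities
                length = sum(dist(tour[i], tour[(i + 1) % len(tour)])
                             for i in range(len(tour)))   # total length, returning to the start
                if length < best_len:
                    best_tour, best_len = tour, length
            return best_tour, best_len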

    Space Complexity: the amount of space required by an algorithm to execute, expressed as a function of

    the size of the input to the problem

    Time Complexity: the amount of time taken by an algorithm to run, expressed as a function of the size

    of the input to the problem (cannot be less than space complexity)

    - Bubble sort n items: O(n²)
    - Quick sort n items: O(n log2 n)
    - Sequential search through n unsorted items: O(n)
    - Binary search through n sorted items: O(log2 n)

    Towers of Hanoi: move the disks to a different pole without ever placing a larger disk on a smaller disk (Time Complexity: O(2ⁿ))
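    A recursive Python sketch of the puzzle (a standard formulation, not taken verbatim from the lecture): moving n disks takes 2ⁿ − 1 moves, which is where the O(2ⁿ) time complexity comes from.

        def hanoi(n, source, target, spare):
            """Print the moves that shift n disks from source to target."""
            if n == 0:
                return
            hanoi(n - 1, source, spare, target)                # move n-1 disks out of the way
            print(f"move disk {n} from {source} to {target}")  # move the largest disk
            hanoi(n - 1, spare, target, source)                # put the n-1 disks back on top

        hanoi(3, 'A', 'C', 'B')   # prints 2^3 - 1 = 7 moves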

    Q: Why do we work out the time complexity?

    A: To make sure that the algorithm is not intractable (ie. not exponential)

    Consider a program with input n:

    for i = 1 to 100                              [input invariant]
        writeln('Students are dumb');

    for i = 1 to n                                [input variant]
        writeln('Lecturers are nice people');

    [When you work out the time complexity of a program, ignore all code fragments that are input invariant and add up the time complexities of the code fragments which are input variant. This means that for the above program, the time complexity is O(n).]
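    A runnable Python version of the same program (a sketch; only the loop bounds matter for the analysis): the first loop always does a constant amount of work, while the second grows with n, so the whole thing is O(n).

        def demo(n):
            # Input invariant: always 100 iterations, independent of n
            for _ in range(100):
                print("Students are dumb")
            # Input variant: the number of iterations grows with n, so this dominates
            for _ in range(n):
                print("Lecturers are nice people")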


    Running Time Function (RTF): a function that from some point onwards becomes positive. A function f is an RTF if:

    f: ℤ⁺ → ℝ

    st ∃ n0 ∈ ℤ⁺ and f(n) > 0

    ∀ n > n0

    Examples:

    - f(n) = 2n + 1, n0 = 0

    - f(n) = n², n0 = 0

    - f(n) = n² − 16, n0 = 4

    - f(n) = sin n + 10, n0 = 0

    sin n + 1 is not an RTF since it can reach 0, and 0 is not strictly positive

    Theta Relation: consider two RTFs f(n) and g(n)

    f(n) is Θ(g(n)) if:

    ∃ c1, c2 ∈ ℝ⁺ and n0 ∈ ℤ⁺

    st c1·g(n) ≤ f(n) ≤ c2·g(n)

    ∀ n > n0

    Examples:

    - f(n) = n², g(n) = 3n²

      c1·g(n) ≤ f(n) ≤ c2·g(n)
      (1/9)·3n² ≤ n² ≤ 1·3n²   (take c1 = 1/9 and c2 = 1)
      ⇒ n² is Θ(3n²)

    - f(n) = n³, g(n) = n²

      c1·n² ≤ n³ ≤ c2·n² cannot be satisfied for all large n
      ⇒ n³ is not Θ(n²)

      You cannot find a c2 such that c2·n² stays greater than n³, since once n exceeds c2, c2·n² will be smaller than f(n)


    Note: if the largest exponent of two functions is the same, you will be able to find c1 and c2. Coefficients and lower-order terms are ignored

    Example: n² is Θ(4n² + 3n + 1) and their complexity is O(n²)
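    A small Python check of the first worked example above (an illustration only; a finite test cannot prove the Θ bound, it just shows the constants c1 = 1/9 and c2 = 1 working over a sample range):

        def theta_bounds_hold(f, g, c1, c2, n0, n_max=1000):
            """Check c1*g(n) <= f(n) <= c2*g(n) for n0 < n <= n_max."""
            return all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0 + 1, n_max + 1))

        print(theta_bounds_hold(lambda n: n**2, lambda n: 3 * n**2, 1/9, 1, 0))   # True
        print(theta_bounds_hold(lambda n: n**3, lambda n: n**2, 1/9, 1, 0))       # False: n^3 is not Theta(n^2)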

    Θ is reflexive: f(n) is Θ(f(n))

    proof:

    we need c1·f(n) ≤ f(n) ≤ c2·f(n)

    let c1 = c2 = 1

    f(n) ≤ f(n) ≤ f(n)

    Θ is symmetric: if f(n) is Θ(g(n)) then g(n) is Θ(f(n))

    proof:

    if f(n) is Θ(g(n))

    ⇒ c1·g(n) ≤ f(n) ≤ c2·g(n)

    for g(n) is Θ(f(n)) we need

    c3·f(n) ≤ g(n) ≤ c4·f(n)

    consider f(n) ≤ c2·g(n):

    (1/c2)·f(n) ≤ g(n)  ⇒ c3 = 1/c2

    consider c1·g(n) ≤ f(n): g(n) ≤ (1/c1)·f(n)  ⇒ c4 = 1/c1

    (1/c2)·f(n) ≤ g(n) ≤ (1/c1)·f(n)

    Note that n0 remains the same


    Θ is transitive: if f(n) is Θ(g(n)) and g(n) is Θ(h(n)) then f(n) is Θ(h(n))

    proof:

    if f(n) is Θ(g(n))

    ⇒ c1·g(n) ≤ f(n) ≤ c2·g(n)

    if g(n) is Θ(h(n))

    ⇒ c3·h(n) ≤ g(n) ≤ c4·h(n)

    for f(n) is Θ(h(n)) we need

    c5·h(n) ≤ f(n) ≤ c6·h(n)

    consider c1·g(n) ≤ f(n) and c3·h(n) ≤ g(n):

    c3·c1·h(n) ≤ c1·g(n) ≤ f(n)  ⇒ c5 = c3·c1

    consider f(n) ≤ c2·g(n) and g(n) ≤ c4·h(n):

    f(n) ≤ c2·g(n) ≤ c4·c2·h(n)  ⇒ c6 = c4·c2

    c3·c1·h(n) ≤ f(n) ≤ c4·c2·h(n)

    note that n0 is the maximum of the two

    Properties for Theta relation:

    1. For any c > 0, c·f(n) is Θ(f(n))

    2. If f1(n) is Θ(g(n)) and f2(n) is Θ(g(n)), then (f1 + f2)(n) is Θ(g(n))

    3. If f1(n) is Θ(g1(n)) and f2(n) is Θ(g2(n)), then (f1 · f2)(n) is Θ((g1 · g2)(n))

    Big O Relation: consider two RTFs f(n) and g(n)

    f(n) is O(g(n)) if:

    ∃ c ∈ ℝ⁺ and n0 ∈ ℤ⁺ st f(n) ≤ c·g(n)

    ∀ n > n0. If f(n) is O(g(n)) then the growth rate of f is not larger than that of g


    Examples:

    - n³ is O(n²) → False
    - n² is O(n³) → True

    Omega Relation: consider two RTFs f(n) and g(n)

    f(n) is Ω(g(n)) if:

    ∃ c ∈ ℝ⁺ and n0 ∈ ℤ⁺ st f(n) ≥ c·g(n)

    ∀ n > n0. If f(n) is Ω(g(n)) then the growth rate of f is not smaller than that of g

    Properties of Big O and Omega notations:

    1. If f(n) is Ω(g(n)) then g(n) is O(f(n))

    2. If f(n) is Ω(g(n)) and f(n) is O(g(n)) then f(n) is Θ(g(n))

    Notes:

    - <Logarithmic function> is O(n)

    - <Logarithmic function> is not Ω(n)

    [Logarithmic growth is less than linear]

    - <Exponential function> is not O(n)

    - <Exponential function> is Ω(n)

    [Exponential growth is greater than linear]

    Non-Computable Problems: problems for which, provably, no one can write an algorithm to find their solution.

    Example:

    - Halting: one cannot write an algorithm that takes another program as input and determines whether that program will terminate or not

    - Busy Beaver


    Computable Problems:

    Example:

    - Polynomial: problems for which the solution is polynomial

    - Exponential: problems with exponential solutions

    - Unknown Algorithms: problems for which no standard algorithm exists. Example: an algorithm

    to predict the Super 5 of next week

    A problem is in NP if:

    1. It is a problem where the answer is yes or no [decision problem]

    2. It is presented to the oracle and the answer is returned in O(1)

    3. Answer is verifiable in polynomial time [polynomial verifiability]

    Example: TSP

    1. Decision Problem: Is there a tour of length k or less?

    2. Oracle: answer is a list of cities

    3. Polynomial Verifiability: compute the distances between the cities in the answer and check that the result is ≤ k (a sketch follows below)
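    A minimal Python sketch of that verification step (the function name and the distance-table format are assumptions, not from the notes): checking a proposed tour against the bound k takes only linear time.

        def verify_tour(tour, dist, k):
            """`dist` is assumed to map ordered city pairs (a, b) to their distance."""
            cities = {c for pair in dist for c in pair}
            if len(tour) != len(cities) or set(tour) != cities:
                return False                      # not a tour: a city is missing or repeated
            legs = (dist[(tour[i], tour[(i + 1) % len(tour)])] for i in range(len(tour)))
            return sum(legs) <= k                 # is the total length within the bound?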

    Example: Map Coloring Problem [Given a map of countries, find the least number of colors needed so that two adjacent countries do not have the same color]

    1. Decision Problem: Can the map be colored with k colors or less?

    2. Oracle: answer is a list of countries and colors

    3. Polynomial Verifiability:

    a. Place a node in the center of each country, label it with the country's color, and connect the nodes of adjacent countries with edges

    b. Check that no edge has the same color at both ends

    NP-Completeness (NPC):

    Let P be in NP

    P is in NPC if all problems in NP can be Turing reduced to P

    Turing Reduced (≤T): take the input of P′ (a very hard problem) and change it to the input of P in polynomial time


    Example of Turing reduction:

    Vertex Cover: given a graph, a vertex covers an edge if it touches it. Choose the smallest number of

    vertices that cover all the edges

    Set Cover: given a set and a list of its subsets, find the smallest number of subsets whose union is equal to the original set

    Change the input of the vertex cover into an input of the set cover:

    - Original set: all edges

    - Subsets: one subset for each vertex, containing the edges that it covers

    Therefore, Vertex Cover ≤T Set Cover (sketched below)
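    A small Python sketch of this input transformation (names are illustrative, not from the notes); it runs in polynomial time in the size of the graph:

        def vertex_cover_to_set_cover(vertices, edges):
            """Universe = all edges; one subset per vertex = the edges that vertex covers."""
            universe = {frozenset(e) for e in edges}
            subsets = {v: {frozenset(e) for e in edges if v in e} for v in vertices}
            return universe, subsets

        # Example: the edges ab, bc, cd from the graph example later in these notes
        universe, subsets = vertex_cover_to_set_cover(['a', 'b', 'c', 'd'],
                                                      [('a', 'b'), ('b', 'c'), ('c', 'd')])
        # Any choice of subsets covering `universe` corresponds to a vertex cover of the graph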

    Reducing TSP to shell sort (note: not a Turing reduction!):

    Create a list of all possible tours and their lengths (still exponential) and sort it using shell sort

    TSP is in NPC (ie. all problems in NP can be reduced to it)

    [P ⊆ NP] P = NP?: can an exponential problem be reduced to a polynomial solution?

    If one NPC is reduced to polynomial time, all NPCs can be reduced to polynomial time!


    Data Structures

    Graph: a collection of vertices that could be connected together by edges,

    G = (V, E)

    V is a finite set of vertices

    E is a subset of V×V

    Example:

    [diagram: six vertices a to f; a-b, b-c and c-d are joined by edges, while e and f are isolated]

    V = {a, b, c, d, e, f}

    E = {ab, bc, cd}

    Path: an ordered sequence of vertices V1 … Vn

    st (Vi, Vi+1) ∈ E ∀ 1 ≤ i < n

    Directed Edge: an edge that can only be traversed in one direction

    A → B

    Cycle: a path V1 … Vn where V1 = Vn

    Note: a graph can either be cyclic or acyclic

    Connected Graph:

    If ∀ a, b ∈ V ∃ a path from a to b

    Snake Relation (~): Consider ⟨V, ~⟩:

    ∀ a, b ∈ V, a~b if ∃ a path from a to b. This relation partitions the graph into components


    Linked List: a logical list, ideal for listing items in order but not for searching

    Trees:

    - The topmost node (the one with no parent) is called the root

    - The root can have children

    - Children can have children

    - Nodes without children are called leaves

    - Acyclic graph

    - There is exactly one path from the root to every other node

    To define the descendants of x:

    - Take the sub-tree of which x is the root

    - All nodes except x

    To define the ancestors of y:

    - Take the path from y to the root

    - All nodes except y

    Height: the length of the longest path from the root to a leaf in a given tree

    Note: height is important because we usually search a tree to find a node, and the longest path is the longest time a search can take. Therefore, the time complexity for searching is the height.

    Restriction on Trees:

    - Structure Conditions: restriction on the number of children per node

    - Order Condition: restriction on the order of values in nodes

    - Balance Condition: restriction on the balance of the heights of the tree

                          Structure   Order   Balance
    Unrestricted Tree         -         -        -
    Binary Tree               ✓         -        -
    BST                       ✓         ✓        -
    AVL                       ✓         ✓        ✓


    Unrestricted Tree: a tree that has no type of restriction whatsoever

    - Best Case Height : 1

    - Worst Case Height: n − 1

    Q: How do you store an unrestricted tree in a data structure?

    A: First Child / Next Sibling

    Example:

    Index   Data   First Child   Next Sibling
      1      C          5             -
      2      E          -             6
      3      A          -             4
      4      D          2             -
      5      B          -             3
      6      F          -             -
      7      -          -             -
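    A minimal Python sketch of the first child / next sibling representation (class and field names are assumptions): every node stores two links regardless of how many children it actually has.

        class FCNSNode:
            """First Child / Next Sibling node for an unrestricted tree."""
            def __init__(self, data):
                self.data = data
                self.first_child = None    # leftmost child, or None
                self.next_sibling = None   # next child of the same parent, or None

        def children(node):
            """Walk the sibling chain to visit all children of a node."""
            child = node.first_child
            while child is not None:
                yield child
                child = child.next_sibling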

    Binary Tree: every node has at most two children

    - Best Case Height : log2 n

    - Worst Case Height: n − 1

    Notes:

    - A full level l holds 2^l nodes, so the level is the log2 of the number of nodes in that level

    - The height of a full binary tree with n nodes is [log2 (n+1)] − 1

    - Proof:

    Level 0: 1 node

    Level 1: 2 nodes

    Level 2: 4 nodes

    Level 3: 8 nodes

    Total nodes: 15, so the height is log2(16) − 1 = 3


    Storing a Binary Tree in an Array:

    [diagram: the binary tree encoded by the table below]

    Index   Left   Data   Right
      1       2      d      5
      2       4      a      7
      3       -      b      -
      4       -      c      -
      5       3      e      6
      6       -      f      -
      7       -      g      -

    Binary Search Tree (BST): for any node, the values in the left sub-tree must be less than (or equal to) it

    and the values in the right sub-tree must be greater than (or equal to) it

    Searching a BST:

    - Best Case Time Complexity : 1

    - Worst Case Time Complexity: height

    Efficient Searching:

    [diagram: an example BST used to illustrate efficient searching]

    If you enter nodes in sorted (e.g. alphabetical) order, you get an unbalanced tree
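    A small Python BST sketch (names are illustrative, not from the notes): insert and search each follow a single root-to-leaf path, so their cost is bounded by the height of the tree.

        class BSTNode:
            def __init__(self, value):
                self.value = value
                self.left = None
                self.right = None

        def insert(node, value):
            """Smaller-or-equal values go left, larger values go right."""
            if node is None:
                return BSTNode(value)
            if value <= node.value:
                node.left = insert(node.left, value)
            else:
                node.right = insert(node.right, value)
            return node

        def search(node, value):
            """Return True if value is in the tree; walks one path, so O(height)."""
            while node is not None:
                if value == node.value:
                    return True
                node = node.left if value < node.value else node.right
            return False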

    Deleting an element from a sorted binary tree:

    1. Find element to delete

    2. Choose its left or its right subtree

    3. Find the rightmost element of the left subtree (or the leftmost element of the right subtree)

    4. Put it in place of the deleted element

    5. If necessary, repeat


    Searching: start at the root and visit as few nodes as possible until you find the desired value, or until you reach a leaf (Time Complexity: height)

    Traversal: a way of visiting all nodes (Time Complexity: O(n) )

    - In-order Traversal: Left-Root-Right

    inorder(T)
        inorder(T.L)
        write(root)
        inorder(T.R)

    - Pre-order Traversal: Root-Left-Right

    preorder(T)
        write(root)
        preorder(T.L)
        preorder(T.R)

    - Post-order Traversal: Left-Right-Root

    postorder(T)
        postorder(T.L)
        postorder(T.R)
        write(root)

    Expression Tree: a binary tree where all leaves store operands and all non-leaves store operators. Every

    node has 0 or 2 children.

    - Post-order Traversal: gets postfix expression

    - In-order Traversal: gets infix expression

    inorder(T)
        write('(')
        inorder(T.L)
        write(root)
        inorder(T.R)
        write(')')


    Consider the following expression tree:

    [diagram: * at the root, with left child + and right child 5; the + node has children 3 and 4]

    Inorder → Infix: ((3+4)*5)

    Postorder → Postfix: 34+5*

    Construct an Expression tree from a postfix expression:

    - Start reading postfix expression from left to right

    - When an operand is found, push it into the stack as a one-node tree

    - When an operator is found, pop two trees from the stack, join them under that operator, and push the result back onto the stack (a sketch follows below)
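    A Python sketch of that construction (class and variable names are assumptions, and only the four basic operators are handled): operands become one-node trees, and each operator pops two trees and becomes their parent.

        class ExprNode:
            def __init__(self, token, left=None, right=None):
                self.token, self.left, self.right = token, left, right

        def build_expression_tree(postfix):
            """Build an expression tree from a postfix token list, e.g. list('34+5*')."""
            stack = []
            for token in postfix:
                if token in '+-*/':
                    right = stack.pop()            # the second operand was pushed last
                    left = stack.pop()
                    stack.append(ExprNode(token, left, right))
                else:
                    stack.append(ExprNode(token))  # operand: push a one-node tree
            return stack.pop()                     # the remaining tree is the whole expression

        def to_infix(node):
            """Parenthesised in-order traversal, as in the notes."""
            if node.left is None:
                return node.token
            return '(' + to_infix(node.left) + node.token + to_infix(node.right) + ')'

        print(to_infix(build_expression_tree(list('34+5*'))))   # ((3+4)*5)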

    Adelson-Velskii-Landis (AVL): a BST with a balance condition: for any node, the heights of the left sub-tree and of the right sub-tree differ by at most 1. Example:

    [diagram: an AVL tree with 17 at the root; its children are 15 and 30; 15 has left child 12; 30 has children 20 and 40; 20 has children 18 and 25]

    If one adds the number 23, it would violate the AVL condition

    Rebalancing in constant time:

    1. Add new node (Example: 23)

    2. Consider all nodes on the path from the new node to the root and recompute the AVL height for every node. Find the first node that violates the AVL condition (in this case 30), go back to the new node, note the first two directions taken from that node towards the new node (in this case Left-Right), and use them in the template (Note: searching for this node takes logarithmic time)


    Template:

    [table: the first two directions (LL, LR, RL, RR) select one of the four rotation templates 1 to 4]

    4 Rotations: four rebalancing algorithms, each using constant time O(1). Templates 1 and 4 are single rotations, while 2 and 3 are double rotations.
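    A Python sketch of the two single rotations (names are assumptions; nodes are assumed to carry left/right links as in the earlier BST sketch): each one is a constant number of pointer changes, which is why a rebalancing step is O(1) once the violating node has been found. A double rotation is simply two single rotations applied in sequence.

        def rotate_right(y):
            """Single rotation: promote y's left child x; x's right subtree moves under y."""
            x = y.left
            y.left = x.right
            x.right = y
            return x              # x is the new root of this subtree

        def rotate_left(x):
            """The mirror-image single rotation."""
            y = x.right
            x.right = y.left
            y.left = x
            return y

        # A double rotation, e.g. for a Left-Right insertion such as adding 23 above:
        # node.left = rotate_left(node.left); node = rotate_right(node)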

    Template 1:

    [diagram: before/after trees for a single rotation, with nodes A and B and subtrees X, Y, Z]

    Template 4:

    [diagram: before/after trees for the mirror-image single rotation, with nodes A and B and subtrees X, Y, Z]

    Template 2:

    [diagram: before/after trees for a double rotation, with nodes A and B and subtrees W, X, Y, Z]

    Template 3:

    [diagram: before/after trees for the mirror-image double rotation, with nodes A and B and subtrees W, X, Y, Z]

    B-Trees: nodes can have many children

    - Advantage: faster
    - Disadvantage: complex coding
    - Of order M
    - The root has between 2 and M children
    - All other non-leaf nodes have between ⌈M/2⌉ and M children
    - All non-leaf nodes have at most M − 1 keys (data pointers)
    - All data is in the leaves
    - Leaves hold up to M data items

    [diagram: an example B-tree; keys such as 10, 17, 20, 30, 40 and 51 appear in the internal nodes, and the data items are stored in the leaves]


    Abstract Data Type (ADT): we define how they work and not how to implement them, therefore they

    are completely independent of the implementation (stacks & queues)

    Priority Queue: a queue where each element is given a priority key, and the first element that goes out

    is the one with the highest priority

    - Insertion & Retrieval:

    1. Insertion O(1) and Retrieval O(n): both value and priority reside in an array, in the order they were entered. For retrieval, one has to search the whole array for the element with the highest priority

    2. Insertion O(n) and Retrieval O(1): for every element added, elements are shifted so that the array stays sorted according to priority. Therefore, to remove an element, the last element of the array is removed

    3. Most efficient method (using a heap)

    Heap: a binary tree (not a BST) that helps us implement a priority queue with O(log2 n) insertion and removal. It is filled in level by level, from left to right. All leaves are at level l or l − 1. For every node n, the value of n is ≥ the values of its children

    [diagram: a heap with 17 at the root; 17 has children 11 and 6; 11 has children 7 and 1; 6 has children 3 and 2; 7 has left child 5]

    Cascade Up: once an element is added, check the value of its parent. If the value of the parent is less than that of the new element, swap them, and repeat up the tree

    Cascade Down: since only the root can be deleted, a technique called cascade down is used:

    1. Remove the root

    2. Put the last element added instead of the deleted root

    3. Swap the node with the largest of its children (unless both are smaller)

    4. Repeat until the heap property is restored


    Implementing a heap using an array:

    - Children of i are at 2i and 2i + 1

    - The parent of i is at ⌊i/2⌋

    Index:   1   2   3   4   5   6   7   8
    Value:  17  11   6   7   1   3   2   5
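    A Python sketch of cascade up and cascade down on such an array (index 0 is left unused so that the 2i / 2i + 1 arithmetic from the notes carries over unchanged; names are assumptions):

        def cascade_up(heap, i):
            """After inserting at position i, swap upwards while the parent is smaller."""
            while i > 1 and heap[i // 2] < heap[i]:
                heap[i // 2], heap[i] = heap[i], heap[i // 2]
                i //= 2

        def cascade_down(heap, i, n):
            """After replacing the root, swap downwards with the larger child."""
            while 2 * i <= n:
                child = 2 * i
                if child + 1 <= n and heap[child + 1] > heap[child]:
                    child += 1                    # pick the larger of the two children
                if heap[i] >= heap[child]:
                    break                         # heap property restored
                heap[i], heap[child] = heap[child], heap[i]
                i = child

        heap = [None, 17, 11, 6, 7, 1, 3, 2, 5]   # the example above, with index 0 unused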

    Sorting using a heap:

    1. Build a heap

    2. Put root in an array (ie. largest number)

    3. Restore heap

    4. Go to step 2

    - Time Complexity:

    o n log2 n to build the heap from the unsorted list
    o n log2 n to build the sorted list from the heap
    o Total: 2n log2 n
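    For comparison, a compact Python version built on the standard library's heapq module (a min-heap, so the result comes out in ascending order); this is a sketch of the same two-phase idea rather than the lecture's exact algorithm:

        import heapq

        def heap_sort(items):
            heap = list(items)
            heapq.heapify(heap)   # phase 1: build a heap from the unsorted list
            # phase 2: repeatedly remove the smallest remaining element (the root)
            return [heapq.heappop(heap) for _ in range(len(heap))]

        print(heap_sort([17, 11, 6, 7, 1, 3, 2, 5]))   # [1, 2, 3, 5, 6, 7, 11, 17]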


    Sorting Algorithms

    Iterative Sorts:

    - Bubble Sort

    - Shell Sort

    - Insertion Sort

    Bubble Sort: compares each element with the adjacent element and swaps them if necessary. Repeat until the array is sorted. With every pass, the largest remaining number is put at the end of the list

    - Time Complexity: n(n − 1) = n² − n = O(n²)

    - Space Complexity: n

    - Pseudo code: [most optimized]

      flag : boolean
      k = 1
      repeat
          flag = false
          for i = 1 to n − k
              if a[i] > a[i+1] then swap them and set flag = true
          k = k + 1
      until flag = false

    Shell Sort: similar to bubble sort, but instead of comparing a[i] with a[i + 1], it compares a[i] with a[i + k], where k starts at n/2 and is halved after every pass. Each pass still moves larger numbers towards the end of the list

    - Time Complexity: O(n²)

    - Space Complexity: n

    - Pseudo code:

      flag : boolean
      k = n / 2
      repeat
          flag = false
          for i = 1 to n − k
              if a[i] > a[i+k] then swap them and set flag = true
          if k > 1 then k = k / 2 and flag = true
      until k = 1 and flag = false


    Insertion Sort: divides an array into a sorted and unsorted region. It then gets the first item of the

    unsorted region and places it in the right place in the sorted region

    - Time Complexity: n(n − 1) = n² − n = O(n²)

    - Space Complexity: n

    - Pseudo code:

      for x = 2 to n
          for i = x downto 2
              if a[i] < a[i−1] then swap them

    Recursive Sorts:

    - Quick Sort (most popular sort because of its space complexity)

    - Merge Sort (fastest sort)

    Quick Sort: partitions the array around a pivot (ideally at the middle) and rearranges the array such that

    all items before the pivot are less than the pivot, and all items after the pivot are greater than the pivot

    - Time Complexity:

    o Best: O(n log2 n)
    o Worst: O(n²)

    - Space Complexity: n

    - Pseudo code: if |L| ≤ 1 then return L; otherwise partition around the pivot and recursively quick sort both partitions (a sketch follows below)
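    A short Python sketch of quick sort (pivot choice and names are illustrative, not from the notes; this clarity-first version copies sub-lists instead of partitioning in place, so it does not have the in-place space behaviour mentioned above):

        def quick_sort(lst):
            if len(lst) <= 1:
                return lst                      # base case: already sorted
            pivot = lst[len(lst) // 2]          # pick a middle element as the pivot
            smaller = [x for x in lst if x < pivot]
            equal = [x for x in lst if x == pivot]
            larger = [x for x in lst if x > pivot]
            return quick_sort(smaller) + equal + quick_sort(larger)

        print(quick_sort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]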


    Merge Sort: splits the array into two halves, recursively sorts both halves and then merges the sorted halves

    - Time Complexity: O(n log2 n)

    - Space Complexity: 2n

    - Pseudo code: if |L| ≤ 1 then return L; otherwise split L into two halves, recursively merge sort each half, and merge the two sorted halves (a sketch follows below)
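    A short Python sketch of merge sort (names are illustrative, not from the notes):

        def merge_sort(lst):
            if len(lst) <= 1:
                return lst                             # base case
            mid = len(lst) // 2
            left = merge_sort(lst[:mid])               # recursively sort each half
            right = merge_sort(lst[mid:])
            merged, i, j = [], 0, 0
            while i < len(left) and j < len(right):    # merge the two sorted halves
                if left[i] <= right[j]:
                    merged.append(left[i]); i += 1
                else:
                    merged.append(right[j]); j += 1
            return merged + left[i:] + right[j:]       # append whatever remains

        print(merge_sort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]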