DAA Lecture 3


  • 8/3/2019 DAA Lecture 3

    1/63

    Design and Analysis of Algorithms
    Graduate Course Number: CSC5011
    Fall Semester 2011

    Lecture 3: Searching Tactics

    Dr. Md. Shamim Akhter

    Assistant Professor, Computer Science Department

    American International University Bangladesh

    Email: [email protected]


    Searching Concept (1/3)

    Searching is a common problem in computer science.

    It involves storing and maintaining large datasets, and then searching the data for particular values.

    Data storage and retrieval are key to many industry applications.

    Search algorithms are necessary for storing and retrieving data efficiently.


    Searching Concept (2/3)

    For instance, a program that checks the spelling of words searches for them in a dictionary, which is just an ordered list of words.

    Problems of this kind are called searching problems.


    Searching Concept (3/3)

    There are many searching algorithms.

    The natural searching method is linear search (also called sequential search or exhaustive search).
        Very simple, but takes a long time on large lists.

    A binary search repeatedly subdivides a sorted list to locate an item much faster than linear search.

    Like a binary search, an interpolation search repeatedly subdivides the list to locate an item, but it picks the dividing point by estimating where the item should lie.


    Linear / Sequential Search

    A special case of brute-force search.

    This is a very simple algorithm. It uses a loop to step sequentially through an array.

    It compares each element with the value being searched for and stops when that value is found or the end of the array is reached.
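    The loop described above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the name `linear_search` and the 1-based return value (with 0 meaning "not found", mirroring the `loc` convention of the lecture's pseudocode) are assumptions made for the example.

```python
def linear_search(a, x):
    """Return the 1-based position of x in list a, or 0 if x is absent."""
    for i, value in enumerate(a, start=1):
        if value == x:  # stop at the first match
            return i
    return 0  # reached the end of the list without a match


numlist = [17, 23, 5, 11, 2, 29, 3]
print(linear_search(numlist, 11))  # 4
print(linear_search(numlist, 7))   # 0
```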


    Linear Search (2/8)

    Sub LinearSearch(x: Int, a[]: Int, loc: Int)
        i = 1
        While (i <= n And x <> a[i])
            i = i + 1
        End While
        If i <= n Then loc = i Else loc = 0
    End Sub


    Linear Search (3/8)

    Array numlist contains

    17 23 5 11 2 29 3

    Searching for the value 11, linear search examines 17, 23, 5, and 11 -> Found

    Searching for the value 7, linear search examines 17, 23, 5, 11, 2, 29, and 3 -> Not Found


    Linear Search (4/8)

    The advantage is its simplicity.
        It is easy to understand
        Easy to implement
        Does not require the array to be in order

    The disadvantage is its inefficiency.
        If there are 20,000 items in the array and what you are looking for is in the 19,999th element, you need to search through almost the entire list.


    Linear Search (5/8)

    Whenever the number of entries doubles, so does the running time, roughly.

    If a machine does 1 million comparisons per second, it takes about 4,000 seconds (roughly 67 minutes) for 4 billion comparisons.


    Linear Search (6/8)


    Linear Search (7/8)

    Use a Sentinel to Improve the Performance

    Sub LinearSearch2(x: Int, a[]: Int, loc: Int)
        a[n+1] = x    ' sentinel: guarantees the loop stops
        i = 1
        While (x <> a[i])
            i = i + 1
        End While
        If i <= n Then loc = i Else loc = 0
    End Sub
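    A sentinel version can be sketched in Python as below. The idea is to append the searched value to the end of the list so the loop needs only one comparison per step and cannot run past the end. The function name `sentinel_search` and the choice to restore the list afterwards are assumptions made for this sketch.

```python
def sentinel_search(a, x):
    """Linear search with a sentinel; returns a 1-based position or 0."""
    n = len(a)
    a.append(x)        # sentinel: guarantees the loop terminates
    i = 0
    while a[i] != x:   # a single comparison per iteration, no bounds test
        i += 1
    a.pop()            # restore the caller's list
    return i + 1 if i < n else 0  # a hit at index n is only the sentinel


numlist = [17, 23, 5, 11, 2, 29, 3]
print(sentinel_search(numlist, 11))  # 4
print(sentinel_search(numlist, 7))   # 0
```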


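    Linear Search (8/8)

    Apply Linear Search to Sorted Lists

    Sub LinearSearch3(x: Int, a[]: Int, loc: Int)
        i = 1
        ' Stop early once a[i] >= x; the bound check keeps i inside the array
        While (i <= n And x > a[i])
            i = i + 1
        End While
        If (i <= n And a[i] = x) Then loc = i Else loc = 0
    End Sub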


    Binary Search (1/9)

    Can We Search More Efficiently?

    Yes, provided the list is in some kind of order, for example alphabetical order with respect to the names.

    If this is the case, we use a divide and conquer strategy to find an item quickly.

    This strategy is what one would use in a number guessing game, for example.


    Binary Search (2/9)

    I'm Thinking of a Number between 1 and 1000. Guess it!

    Is it 500? Nope, too low. Is it 750? Nope, too high. Is it 625? etc.

    This strategy guarantees a correct guess in no more than ten guesses!
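    The ten-guess bound can be checked by brute force. The sketch below plays the midpoint strategy against every possible secret number from 1 to 1000 and records the worst case; the function name `guesses_needed` is an assumption made for the example.

```python
def guesses_needed(secret, low=1, high=1000):
    """Count the guesses made by always guessing the midpoint of the range."""
    count = 0
    while True:
        guess = (low + high) // 2
        count += 1
        if guess == secret:
            return count
        if guess > secret:   # too high: discard the upper part
            high = guess - 1
        else:                # too low: discard the lower part
            low = guess + 1


worst = max(guesses_needed(s) for s in range(1, 1001))
print(worst)  # 10
```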


    Binary Search (3/9)

    Apply This Strategy to Searching

    The resulting algorithm is called the Binary Search algorithm.

    We check the middle key in our list.
        If it is beyond what we are looking for (too high), we look only at the 1st half of the list.
        If it's not far enough in (too low), we look at the 2nd half.

    Then iterate!


    Binary Search (4/9)

    1. Divide a sorted array into three sections:
        the middle element
        the elements on one side of the middle element
        the elements on the other side of the middle element

    2. If the middle element is the correct value, done. Otherwise, go to step 1, using only the half of the array that may contain the correct value.


    Binary Search (5/9)

    3. Continue steps 1 and 2 until either the value is found or there are no more elements to examine.


    Binary Search (6/9)

    Binary Search Example

    Array numlist2 contains

    2 3 5 11 17 23 29

    Searching for the value 11, binary search examines 11 and stops. Found.

    Searching for the value 7, binary search examines 11, 3, 5, and stops. Not Found.


    Binary Search (7/9)

    Algorithm for Binary Search

    Sub BinarySearch(x: Int, a[]: Int, loc: Int)
        i = 1: j = n
        While (i < j)
            m = (i + j) \ 2    ' integer division
            If x > a[m] Then i = m + 1 Else j = m
        End While
        If x = a[i] Then loc = i Else loc = 0
    End Sub
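    A runnable Python sketch of the same scheme is given below. It assumes 0-based list indexing internally and returns a 1-based position (0 for "not found") to match the `loc` convention; the function name `binary_search` is an assumption made for the example.

```python
def binary_search(a, x):
    """Return the 1-based position of x in sorted list a, or 0 if absent."""
    if not a:
        return 0
    i, j = 0, len(a) - 1   # 0-based bounds of the remaining range
    while i < j:
        m = (i + j) // 2
        if x > a[m]:
            i = m + 1      # x can only be to the right of m
        else:
            j = m          # x is at m or to its left
    return i + 1 if a[i] == x else 0


numlist2 = [2, 3, 5, 11, 17, 23, 29]
print(binary_search(numlist2, 11))  # 4
print(binary_search(numlist2, 7))   # 0
```

    Note that this variant postpones the equality test until the range has shrunk to a single element, so it can examine more elements than a version that stops as soon as the middle element matches.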


    Binary Search (8/9)

    The worst-case number of comparisons grows by only 1 comparison every time the list size is doubled.

    Only about 32 comparisons would be needed on a list of 4 billion items using Binary Search.

    Sequential Search could need 4 billion comparisons, which would take more than an hour at 1 million comparisons per second!


    Binary Search (9/9)

    Benefit
        Much more efficient than linear search.
        For an array of N elements, performs at most about log2 N comparisons.

    Disadvantage
        Requires that array elements be sorted.


    Interpolation Search (1/9)

    Binary search is a great improvement over linear search.
        It eliminates large portions of the list without actually examining all the eliminated values.

    If the values are fairly evenly distributed, interpolation can be used to eliminate more values at each step.


    Interpolation Search (2/9)

    Interpolation is the process of using known values to guess the position of an unknown value.
        Here, the indexes of known values in the list are used to guess what index the target value should have.

    Interpolation search selects the dividing point by interpolation using the following code:

    m = l + (x - a[l]) * (r - l) / (a[r] - a[l])


    Interpolation Search (3/9)

    Compare x to a[m]
        If x = a[m]: Found.
        If x < a[m]: set r = m - 1
        If x > a[m]: set l = m + 1

    If the search is still not finished, continue searching with the new l and r.

    Stop searching when Found, or when x < a[l] or x > a[r].


    Interpolation Search (4/9)

    Example: Find the key x = 32 in the list

    index:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    value:  1  4  7  9  9 12 13 17 19 21 24 32 36 44 45 54 55 63 66 70

    1: l=1, r=20 -> m = 1+(32-1)*(20-1)/(70-1) ≈ 9.5, rounded to 10
       a[10]=21 < 32=x -> l=11

    2: l=11, r=20 -> m = 11+(32-24)*(20-11)/(70-24) ≈ 12.6, rounded to 13
       a[13]=36 > 32=x -> r=12

    3: l=11, r=12 -> m = 11+(32-24)*(12-11)/(32-24) = 12
       a[12]=32=x -> Found at m = 12


    Interpolation Search (5/9)

    Example: Find the key x = 30 in the list

    index:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    value:  1  4  7  9  9 12 13 17 19 21 24 32 36 44 45 54 55 63 66 70

    1: l=1, r=20 -> m = 1+(30-1)*(20-1)/(70-1) ≈ 9.0, rounded to 9
       a[9]=19 < 30=x -> l=10

    2: l=10, r=20 -> m = 10+(30-21)*(20-10)/(70-21) ≈ 11.8, rounded to 12
       a[12]=32 > 30=x -> r=11

    3: l=10, r=11 -> m = 10+(30-21)*(11-10)/(24-21) = 13
       m=13 > 11=r: Not Found


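    Interpolation Search (6/9)

    Private Sub Interpolation(a[]: Int, x: Int, n: Int, Found: Boolean)
        l = 1: r = n
        Found = False
        Do While (r > l)
            m = l + ((x - a[l]) / (a[r] - a[l])) * (r - l)
            Verify and Decide What to do next
        Loop
        ' A single remaining candidate (l = r) has not been compared yet
        If (Not Found) And (l = r) Then Found = (a[l] = x)
    End Sub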


    Interpolation Search (7/9)

    Verify and Decide what to do next

    If (m < l) Or (m > r) Then
        Found = False    ' x lies outside a[l..r]
        Exit Do
    ElseIf (a[m] = x) Then
        Found = True
        Exit Do
    ElseIf (a[m] < x) Then
        l = m + 1
    Else
        r = m - 1
    End If
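    The whole procedure can be sketched in Python as follows. This is an illustrative variant rather than the lecture's exact routine: it assumes integer keys, uses integer (floor) division where the worked examples round to the nearest index (so probe positions may differ from them), and guards explicitly against a zero denominator. The name `interpolation_search` is an assumption made for the example.

```python
def interpolation_search(a, x):
    """Return the 1-based position of x in sorted list a, or 0 if absent."""
    l, r = 0, len(a) - 1
    # Continue only while x can still lie inside a[l..r].
    while l <= r and a[l] <= x <= a[r]:
        if a[l] == a[r]:
            # All remaining keys are equal, and a[l] <= x <= a[r] forces x == a[l].
            return l + 1
        # Estimate the position of x by linear interpolation.
        m = l + (x - a[l]) * (r - l) // (a[r] - a[l])
        if a[m] == x:
            return m + 1
        if a[m] < x:
            l = m + 1
        else:
            r = m - 1
    return 0


values = [1, 4, 7, 9, 9, 12, 13, 17, 19, 21,
          24, 32, 36, 44, 45, 54, 55, 63, 66, 70]
print(interpolation_search(values, 32))  # 12
print(interpolation_search(values, 30))  # 0
```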


    Interpolation Search (8/9)

    Binary search is very fast (O(log n)), but on evenly distributed values interpolation search is much faster on average (O(log log n)).

    For n = 2^32 (four billion items):
        Binary search took 32 steps of verification.
        Interpolation search took only 5 steps of verification.


    Interpolation Search (9/9)

    Interpolation search performance time is nearly constant for a large range of n.

    The savings would matter even more if the data had been stored on a hard disk or other relatively slow device.


    Binary Search Tree (BST)

    It's a binary tree!

    For each node in a BST:
        every key in its left subtree is smaller than it; and
        every key in its right subtree is greater than it.


    Search Operation

    Search operation takes time O(h), where h is the height of a BST
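    A minimal, unbalanced BST with search and insert might look like the sketch below. The class and function names (`Node`, `insert`, `search`) are assumptions made for the example, and duplicate keys are simply ignored. The search follows a single root-to-leaf path, which is where the O(h) bound comes from.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the BST rooted at root; return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # equal keys fall through: duplicates are ignored


def search(root, key):
    """Walk one root-to-leaf path, so the cost is O(h)."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node is not None


root = None
for k in [11, 3, 23, 2, 5, 17, 29]:
    root = insert(root, k)
print(search(root, 17))  # True
print(search(root, 7))   # False
```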


    Operation Insert


    Worst Case


    Performance

    Depends on the shape of the tree

    Best Case: perfectly balanced tree, about log2 N nodes from root to leaf

    Worst Case: N nodes in a search path

    Average Case: about 1.39 log2 N comparisons for N keys inserted in random order
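    The best and worst cases can be reproduced with a small experiment. The sketch below builds an unbalanced BST from a given insertion order and reports the number of nodes on the longest root-to-leaf path. The helper names (`bst_height`, `median_first`) and the list-based node layout are assumptions made for the example.

```python
def bst_height(keys):
    """Insert keys in the given order; return the node count of the longest path."""
    root = None  # each node is a list: [key, left, right]
    height = 0
    for key in keys:
        depth = 1
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                depth += 1
                side = 1 if key < node[0] else 2
                if node[side] is None:
                    node[side] = [key, None, None]
                    break
                node = node[side]
        height = max(height, depth)
    return height


def median_first(lo, hi):
    """Order lo..hi-1 so that every subtree receives its median key first."""
    if lo >= hi:
        return []
    mid = (lo + hi) // 2
    return [mid] + median_first(lo, mid) + median_first(mid + 1, hi)


n = 1023
print(bst_height(range(n)))            # 1023: sorted input degenerates into a chain
print(bst_height(median_first(0, n)))  # 10: perfectly balanced tree
```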


    Balanced Tree

    Tree structures support various basic dynamic set operations in time proportional to the height of the tree
        e.g.: Search, Predecessor, Successor, Minimum, Maximum, Insert, Delete

    Ideally, a tree will be balanced and the height will be log n, where n is the number of nodes in the tree

    Balancing ensures that the height of the tree is as small as possible and therefore provides the best running time


    Balanced BST

    BST worst case: O(N)

    Needs to be balanced

    Approach:
        Recursive and linear time
        However, insertion cost becomes quadratic with frequent rebalancing

    Is there a type of BST which guarantees that every insert and search will be logarithmic?


    Top-Down 2-3-4 Trees

    Nodes store 1, 2, or 3 keys and have 2, 3, or 4 children, respectively

    All leaves have the same depth


    2-3-4 Tree Nodes

    Introduction of nodes with more than 1 key, and more than 2 children

    2-Node: 1 key, 2 links (same as a binary node)

    3-Node: 2 keys, 3 links

    4-Node: 3 keys, 4 links


    Why 2-3-4? (1/2)

    Why not minimize height by maximizing children in a d-tree?

    Let each node have d children so that we get O(log_d N) search time! Right?

    That means if d = N^(1/2), we get a height of 2


    Why 2-3-4? (2/2)

    However, searching out the correct child on each level requires O(log N^(1/2)) by binary search

    2 log N^(1/2) = O(log N), which is not as good as we had hoped for!

    2-3-4 trees will guarantee O(log N) height using only 2, 3, or 4 children per node
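    Written out, the argument is that the total search cost is the number of levels times the cost of choosing a child within one node. The base-2 logarithm for the within-node binary search is an assumption made here for concreteness:

```latex
\[
\underbrace{\log_{d} N}_{\text{levels}} \;\times\;
\underbrace{O(\log_2 d)}_{\text{binary search inside a node}}
\qquad\text{and with } d = N^{1/2}:\qquad
2 \cdot \log_2 N^{1/2} \;=\; 2 \cdot \tfrac{1}{2}\log_2 N \;=\; \log_2 N \;=\; O(\log N).
\]
```

    So a very wide node only moves the work from walking down the tree to searching inside each node; the total stays logarithmic.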


    Insertion into 2-3-4 Trees (1/3)

    Insert the new key at the lowest internal node reached in the search
        2-node becomes 3-node
        3-node becomes 4-node

    What about a 4-node?
        We can't insert another key!


    Insertion into 2-3-4 Trees (2/3)

    On our way down the tree, whenever we reach a 4-node, we break it up into two 2-nodes, and move the middle element up into the parent node


    Insertion into 2-3-4 Trees (3/3)

    Now we can perform the insertion using one of the previous two cases

    Since we follow this method from the root down to the leaf, it is called top-down insertion


    Splitting the Tree

    As we travel down the tree, if we encounter any 4-node we will break it up into 2-nodes.

    This guarantees that we will never have the problem of inserting the middle element of a former 4-node into its parent 4-node.
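    Top-down insertion with splitting on the way down can be sketched as below. This is one possible implementation under stated assumptions, not the lecture's code: the class `Node234`, the helpers `split_child` and `insert`, and the list-based node layout are all hypothetical names chosen for the example, and duplicate keys get no special handling.

```python
import bisect


class Node234:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # 1 to 3 sorted keys
        self.children = children or []  # empty for a leaf, else len(keys) + 1

    def is_leaf(self):
        return not self.children

    def is_four_node(self):
        return len(self.keys) == 3


def split_child(parent, idx):
    """Split the 4-node parent.children[idx]; its middle key moves up."""
    node = parent.children[idx]
    left = Node234(node.keys[:1], node.children[:2])
    right = Node234(node.keys[2:], node.children[2:])
    parent.keys.insert(idx, node.keys[1])
    parent.children[idx:idx + 1] = [left, right]


def insert(root, key):
    """Top-down insertion; returns the (possibly new) root."""
    if root.is_four_node():             # splitting the root adds one level
        root = Node234(children=[root])
        split_child(root, 0)
    node = root
    while not node.is_leaf():
        idx = bisect.bisect_left(node.keys, key)
        if node.children[idx].is_four_node():
            split_child(node, idx)      # node itself is never a 4-node here
            idx = bisect.bisect_left(node.keys, key)
        node = node.children[idx]
    bisect.insort(node.keys, key)       # the leaf is a 2- or 3-node: there is room
    return root


root = Node234()
for k in [10, 20, 30, 40, 50, 60, 70]:
    root = insert(root, k)
print(root.keys)                         # [20, 40]
print([c.keys for c in root.children])   # [[10], [30], [50, 60, 70]]
```

    Because every 4-node met on the way down is split before the search moves into it, the parent of any split always has room for the key that moves up.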


    Splitting the Tree


    Splitting the Tree


    Time Complexity of Insertion in 2-3-4 Trees

    Time complexity:
        A search visits O(log N) nodes
        An insertion requires O(log N) node splits
        Each node split takes constant time

    Operations Search and Insert each take time O(log N)


    Beyond 2-3-4 Trees

    What do we know about 2-3-4 Trees?

    Balanced

    O(log N) search time

    Different node structures

    Can we get 2-3-4 tree advantages in a binary tree format???

    Welcome to the world of Red-Black Trees!!!


    Best of both methods

    Search as in a BST
    Insert as in a 2-3-4 search tree


    Red-Black Tree

    A red-black tree is a binary search tree with the following properties:
        edges are colored red or black
        no two consecutive red edges on any root-leaf path
        same number of black edges on any root-leaf path (= black height of the tree)
        edges connecting leaves are black
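    These properties can be checked mechanically. In the sketch below each node stores the color of its incoming edge, `None` children stand for the leaves (whose incoming edges count as black), and the root is given the color black by convention since it has no incoming edge. The names `RBNode` and `black_height` are assumptions made for the example; the BST ordering itself is not checked.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color  # color of the edge from the parent into this node
        self.left = left
        self.right = right


def black_height(node, parent_red=False):
    """Return the number of black edges on every path from the edge into
    `node` down to a leaf; raise ValueError if a property is violated."""
    if node is None:
        return 1  # the edge into a leaf is black
    red = node.color == RED
    if red and parent_red:
        raise ValueError("two consecutive red edges at key %r" % node.key)
    left = black_height(node.left, red)
    right = black_height(node.right, red)
    if left != right:
        raise ValueError("unequal black heights below key %r" % node.key)
    return left + (0 if red else 1)


good = RBNode(20, BLACK, RBNode(10, RED), RBNode(30, RED))
bad = RBNode(20, BLACK, RBNode(10, RED, RBNode(5, RED)))

black_height(good)  # passes silently
try:
    black_height(bad)
except ValueError as err:
    print("violation:", err)
```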


    Red-Black Tree


    2-3-4 Tree Evolution

    How 2-3-4 trees relate to red-black trees


    Insertion into Red-Black Tree

    1. Perform a standard search to find the leaf where the key should be added

    2. Replace the leaf with an internal node with the new key

    3. Color the incoming edge of the new node red

    4. Add two new leaves, and color their incoming edges black


    Insertion into Red-Black Tree

    If the parent had an incoming red edge, we now have two consecutive red edges!

    We must reorganize the tree to remove that violation.

    What must be done depends on the sibling of the parent.


    Insertion - Plain and Simple


    Right Left Rotation


    Restructuring

    Case 2: Incoming edge of p is red, and its sibling is black


    Similar to a right rotation, we can do a left rotation...


    Double Rotation

    What if the new node is between its parent and grandparent in the inorder sequence?

    We must perform a double rotation (which is no more difficult than a single one)

    This would be called a left-right double rotation
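    The single and double rotations can be sketched on a plain binary tree as below. Edge colors are left out to keep the example short, and the names (`rotate_right`, `rotate_left`, `rotate_left_right`) as well as the letters g, p, n for grandparent, parent, and new node are assumptions made for the sketch. Here "left-right" is taken to mean a left rotation at p followed by a right rotation at g.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_right(g):
    """Lift g's left child p above g; returns the new subtree root p."""
    p = g.left
    g.left = p.right
    p.right = g
    return p


def rotate_left(g):
    """Mirror image: lift g's right child above g."""
    p = g.right
    g.right = p.left
    p.left = g
    return p


def rotate_left_right(g):
    """Double rotation for a new node n lying between p and g in inorder."""
    g.left = rotate_left(g.left)  # first lift n above p
    return rotate_right(g)        # then lift n above g


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# g = 30, p = 10, n = 20: n lies between p and g in the inorder sequence.
g = Node(30, left=Node(10, right=Node(20)))
top = rotate_left_right(g)
print(top.key, top.left.key, top.right.key)  # 20 10 30
print(inorder(top))                          # [10, 20, 30]
```

    The inorder sequence is the same before and after, which is why rotations are safe to apply to a search tree.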


    Last of the Rotations

    And this would be called a right-left double rotation


    Bottom-Up Rebalancing

    Case 3: Incoming edge of p is red and its sibling is also red

    We call this a promotion

    Note how the black depth remains unchanged for all of the descendants of g

    This process will continue upward beyond g if necessary: rename g as n and repeat.


    Summary of Insertion

    If two red edges are present, we do either
        a restructuring (with a simple or double rotation) and stop, or
        a promotion and continue

    A restructuring takes constant time and is performed at most once. It reorganizes an off-balanced section of the tree.

    Promotions may continue up the tree and are executed O(log N) times.

    The time complexity of an insertion is O(log N).