ACKNOWLEDGEMENTS: http://algs4.cs.princeton.edu
Lecture 16: Strings
String Sorts, Tries, Substring Search: KMP, BM, RK
String Sorts
key-indexed counting, LSD radix sort, MSD radix sort
Review: sorting algorithms
Lower bound. ~N lg N compares required by any compare-based algorithm.
Key-indexed counting assumptions
Assumption. Keys are integers between 0 and R - 1.
Implication. Can use key as an array index.
Applications.
  Sort string by first letter.
  Sort class roster by section.
  Sort phone numbers by area code.
  Subroutine in a sorting algorithm.
Key-indexed counting demo
Goal. Sort an array a[] of N integers between 0 and R - 1.
  Count frequencies of each letter using key as index.
  Compute frequency cumulates which specify destinations.
  Access cumulates using key as index to move items.
  Copy back into original array.
Key-indexed counting analysis
Proposition. Key-indexed counting takes time proportional to N + R.
Proposition. Key-indexed counting uses extra space proportional to N + R.
Stable? Yes.
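The four steps of key-indexed counting (count, cumulate, move, copy back) can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `key_indexed_counting` and the `key` parameter are assumptions introduced here so that whole records, not just bare integers, can be sorted.

```python
def key_indexed_counting(items, R, key=lambda item: item):
    """Stably sort items in place, where key(item) is an integer in 0..R-1.

    The name and the key parameter are illustrative assumptions.
    """
    n = len(items)
    count = [0] * (R + 1)

    # Step 1: count frequencies, offset by one slot so that the
    # cumulates computed next become starting positions.
    for item in items:
        count[key(item) + 1] += 1

    # Step 2: compute cumulates; count[r] is now the first destination
    # index for items whose key is r.
    for r in range(R):
        count[r + 1] += count[r]

    # Step 3: move items into an auxiliary array, using the key as index.
    # Scanning left to right keeps equal keys in their original order.
    aux = [None] * n
    for item in items:
        k = key(item)
        aux[count[k]] = item
        count[k] += 1

    # Step 4: copy back into the original array.
    items[:] = aux
    return items
```

For example, sorting a class roster of `(name, section)` pairs with `key=lambda rec: rec[1]` groups students by section while keeping names within a section in their original order, which is the stability property stated above.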
Least-significant-digit-first string sort
LSD string (radix) sort.
  Consider characters from right to left.
  Stably sort using dth character as the key (using key-indexed counting).
LSD string sort: correctness proof
Proposition. LSD sorts fixed-length strings in ascending order.
Pf. [ by induction on i ] After pass i, strings are sorted by last i characters.
  If two strings differ on sort key, key-indexed sort puts them in proper relative order.
  If two strings agree on sort key, stability keeps them in proper relative order.
Proposition. LSD sort is stable.
Pf. Key-indexed counting is stable.
Summary: sorting algorithms
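The LSD procedure described above (one stable key-indexed counting pass per character position, from right to left) might look like this in Python. It is a sketch under stated assumptions: the name `lsd_sort`, the fixed width parameter `w`, and the default alphabet size `R = 256` (extended ASCII) are choices made for this illustration.

```python
def lsd_sort(strings, w, R=256):
    """Return a sorted copy of strings, all of which must have length w.

    Assumes every character code is below R (R = 256 is an assumed default).
    """
    a = list(strings)
    n = len(a)

    # Consider characters from right to left.
    for d in range(w - 1, -1, -1):
        # Key-indexed counting with the d-th character as the key.
        count = [0] * (R + 1)
        for s in a:
            count[ord(s[d]) + 1] += 1
        for r in range(R):
            count[r + 1] += count[r]

        aux = [None] * n
        for s in a:
            c = ord(s[d])
            aux[count[c]] = s
            count[c] += 1

        # The stably sorted pass becomes the input of the next pass.
        a = aux
    return a
```

Because each pass is stable, strings that agree on the current character keep the order established by the earlier passes, which is exactly the inductive step of the correctness proof.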
Reverse LSD
  Consider characters from left to right.
  Stably sort using dth character as the key (using key-indexed counting).
Most-significant-digit-first string sort
MSD string (radix) sort.
  Partition array into R pieces according to first character (use key-indexed counting).
  Recursively sort all strings that start with each character (key-indexed counts delineate subarrays to sort).
MSD string sort example
Variable-length strings
Treat strings as if they had an extra char at end (smaller than any char).
C strings. Have extra char \0 at end => no extra work needed.
MSD string sort problem
Observation 1. Much too slow for small subarrays.
  Each function call needs its own count[] array.
  ASCII (256 counts): 100x slower than copy pass for N = 2.
  Unicode (65536 counts): 32000x slower for N = 2.
Observation 2. Huge number of small subarrays because of recursion.
Summary: sorting algorithms
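A possible Python sketch of MSD string sort ties the pieces above together: partitioning by key-indexed counting, the "extra char at end" convention for variable-length strings (modelled here by returning -1 past the end of a string), and recursion on each subarray. The cutoff to insertion sort for small subarrays is a common remedy for Observation 1 and is an assumption of this sketch, as are the names `msd_sort`, `char_at`, `CUTOFF`, and the alphabet size `R = 256`.

```python
R = 256       # assumed alphabet size (extended ASCII)
CUTOFF = 15   # assumed threshold below which insertion sort takes over


def char_at(s, d):
    """Code of s[d], or -1 past the end (the extra char smaller than any char)."""
    return ord(s[d]) if d < len(s) else -1


def insertion_sort(a, lo, hi, d):
    """Sort a[lo..hi], comparing only the suffixes that start at position d."""
    for i in range(lo + 1, hi + 1):
        j = i
        while j > lo and a[j][d:] < a[j - 1][d:]:
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1


def _sort(a, aux, lo, hi, d):
    # Small subarray: avoid allocating a count[] array of size R + 2.
    if hi <= lo + CUTOFF:
        insertion_sort(a, lo, hi, d)
        return

    # Key-indexed counting on the d-th character; keys are shifted by one
    # so that the end-of-string value -1 maps to bucket 0.
    count = [0] * (R + 2)
    for i in range(lo, hi + 1):
        count[char_at(a[i], d) + 2] += 1
    for r in range(R + 1):
        count[r + 1] += count[r]
    for i in range(lo, hi + 1):
        c = char_at(a[i], d) + 1
        aux[count[c]] = a[i]
        count[c] += 1
    for i in range(lo, hi + 1):
        a[i] = aux[i - lo]

    # After the move, count[r]..count[r + 1] - 1 delineates the strings whose
    # d-th character has code r; sort each such subarray on the next character.
    for r in range(R):
        _sort(a, aux, lo + count[r], lo + count[r + 1] - 1, d + 1)


def msd_sort(a):
    """Sort a list of variable-length strings in place and return it."""
    aux = [None] * len(a)
    _sort(a, aux, 0, len(a) - 1, 0)
    return a
```

Strings that end at position d land in bucket 0 and are identical up to their full length, so they need no further sorting; only the R character buckets are recursed into.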
Tries
Retrieve, DFA simulation, Trie, Radix Tree, Suffix Trie (Tree)
Alphabet
Word: {i, a, in, at, an, inn, int, ate, age, adv, ant}
Letters on the path: prefix of the word
Common prefix: common ancestor
Leaf node: longest prefix
Construct
  Node
  Find word
  Add word
  Is Leaf? (Is End?)
  Edge.
Edge of next letter?
  Exist: jump to next Node.
  Not exist: return false.
Is end of the word
Add word
  Not exist: add new Node, jump to it.
  Mark.
Example
  Find ant
  Find and
  Add and
Other version
Child-Brother Tree
Double Array Trie
Binary Tree
Double Array Trie
Ternary search tries
How to save edge?
  Array
  List
  BST
Analysis
Time complexity
  Add: length of string
  Find: length of string
Space complexity
  Total length of string
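The find and add procedures outlined above can be sketched as a small Python class. This is an illustration only: the class and method names are assumptions, and a `dict` per node stands in for the array / list / BST choices listed under "How to save edge?".

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # edge letter -> child node (dict is an assumed edge store)
        self.is_end = False  # the "Is End?" mark


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, word):
        """Add word: follow existing edges, create missing nodes, then mark the end."""
        node = self.root
        for letter in word:
            if letter not in node.children:
                # Edge does not exist: add a new node and jump to it.
                node.children[letter] = TrieNode()
            node = node.children[letter]
        node.is_end = True

    def find(self, word):
        """Find word: fail on a missing edge, otherwise check the end mark."""
        node = self.root
        for letter in word:
            if letter not in node.children:
                return False
            node = node.children[letter]
        return node.is_end
```

With the word set from the slide loaded, `find("ant")` succeeds, `find("and")` fails because the node reached is not marked as the end of a word's path (there is no `d` edge below `an`), and after `add("and")` the same lookup succeeds. Both operations visit one node per letter, matching the "length of string" time bound.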
Radix Tree
Internal node has at least two children.
Leaf Node =2 oSoS TSST (4). S1S2 S1#S2$#$(#)
Suffix Tree: construct
Time complexity
Construction: a trie, O(n^2)
Esko Ukkonen Algorithm
Nodes = 0; --i) if (suff[i] == i + 1) for (; j < m i; ++j) if (bmGs[j] == m) bmGs[j] = m i; for (i = 0; i =m)
Output: s --> T[s+1, s+2, .. s+m] (s 1:
Hash will help.
Detailed Procedure (For convenience, numbers only.)
Detailed Procedure
Optimization:
14152 ≡ 10*(31415 - 3*10000) + 2 (mod 13)
(Dynamic Programming)
Pseudocode
RABIN-KARP-MATCHER(T, P, d, q)
    n = T.length
    m = P.length
    h = d^(m-1) mod q
    p = 0
    t[0] = 0
    for i = 1 to m
        p = (d*p + P[i]) mod q
        t[0] = (d*t[0] + T[i]) mod q
    for s = 0 to n-m
        if p == t[s]
            if P[1..m] == T[s+1..s+m]
                print (Find P with shift: + s)
        if s < n-m
            t[s+1] = (d*(t[s] - T[s+1]*h) + T[s+m+1]) mod q
Space Complexity: O(n-m)
Time Complexity: O(m*(n-m))
Analysis
When q increases
pros
  Reduce the chance of conflicts
  Decrease the time of confirmation
cons
  (May) Increase the space requirement
  (May) Increase the time of Mod operation
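The pseudocode above translates fairly directly into Python. The version below is a sketch rather than a definitive implementation: the name `rabin_karp`, the 0-based shifts, the use of `ord()` as the digit value of a character, and the defaults `d = 256` and `q = 101` are all assumptions made here. It also keeps a single rolling value `t` instead of the array `t[0..n-m]`, so its extra space is constant rather than O(n-m).

```python
def rabin_karp(text, pattern, d=256, q=101):
    """Return every 0-based shift s at which pattern occurs in text.

    d is the radix and q the modulus; the defaults are illustrative only.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    h = pow(d, m - 1, q)  # d^(m-1) mod q, the weight of the leading character
    p = 0                 # hash of the pattern
    t = 0                 # rolling hash of the current text window

    # Preprocessing: hash the pattern and the first window.
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q

    shifts = []
    for s in range(n - m + 1):
        # Equal hashes may be a conflict, so confirm by direct comparison.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Drop the leading character, shift by one radix position,
            # and append the next character. Python's % keeps the result
            # non-negative for a positive q.
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts
```

A larger `q` makes hash conflicts rarer, so fewer windows reach the direct comparison; that is the trade-off listed in the analysis above.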