Upload
william-edwards
View
214
Download
1
Embed Size (px)
Citation preview
Computers can process text as well as numbers◦ Example: a news agency might want to find all
the articles on Hurricane Katrina as part of the tenth anniversary of this disaster. They would have to search for the words “hurricane and “Katrina”
String
How do we represent text such as:◦ Words◦ Sentences◦ Characters
In programming we call these sequences Strings.◦ A sequence is a collection or composition of elements in a
specific order◦ For strings, the elements are characters
This includes: Punctuation Spaces Numeric characters
◦ The order is a spelling of a word or structure of a sentence. Not always true.
String
A string can be:◦ Empty◦ Non-empty
Must start with a character and be followed by a string
Definition 1 A string is one of the following:◦ An empty string◦ A non-empty string, which has the following parts:
A head, which is a single character, followed by, A tail, which is a string
Strings
When processing strings we must be able to check for:◦ Empty string
def strIsEmpty(str) ◦ The head
def strHead(str)◦ The tail
def strTail(str)◦ To construct strings from other strings
def strConcat(str1, str2)◦ We will implemeant these later….
String Opertations
A Python string is:◦ A sequence of characters◦ The sequence can be explicitly written between
two quotation marks Python allows single or double quotes
◦ Examples: ‘Hello World’ “Hello World” ‘’ (the empty string is a valid string) ‘a’ “abc”
Python Strings
Like numbers, string can be assigned to variables◦ str = ‘Hello World’
You can asscess parts of strings via indexing◦ str[n] means give me the nth-1 character of the
string str◦ str[0] == ‘H’◦ str[1] == ‘e’◦ str[2] == ‘l’◦ str[5] == ‘ ‘
Python Strings
Another way to access parts of a string is via slicing◦ str[m:n] means give me the part of the string from
character m up to but not including n◦ Both m and n are optional
If m is omitted then it starts from the beginning of the string
If n is omitted it ends at the end of the string◦ Examples:
Str = ‘Hello World’ Str[1:4] == ‘ell’ Str[:5] == ‘Hello’ Str[1:] == ‘ello World’
Python Strings
You can concatenate string, or put two string together◦ The plus-sign (+)
Examples◦ ‘Hello’ + ‘World’ == ‘Hello World’◦ ‘a’ + ‘ab’ == ‘aab’◦ ‘ab’ + ‘c’ == ‘abc’◦ ‘a’ + ‘b’ + ‘c’ == ‘abc’
Python Strings
Python strings are immutable◦ Means parts of strings cannot be changed using the
assignment operator◦ If we assign a new value to a string it replaces the old value
Basically a new string Example:
◦ Str = ‘abc’◦ Str[0] == ‘a’◦ Str[0] = ‘z’
We get: ◦ “Traceback (most recent call last):
File "<stdin >", line 1, in <module >TypeError: ’str’ object does not support item assignment“
Python Strings
def strIsEmpty(str): if str == ‘’: return True else: return False
def strHead(str): return str[:1]
def strTail(str): return str[1:]
def strConcat(str1, str2): return str1 + str2
Python String Operations
Computing length of strings◦ Python has a function to do this◦ But we will write our own version so we can:
Learn about strings And how to process them
Think about this: lengthRec(‘abc’) == 3
Python String Operations
Lets break down string length as a recursive function◦ ‘’: the empty string, length is 0◦ ‘a’: string with one character, length = 1
Head length = 1 tail length is empty string, length = 0
◦ ‘ab’: string with two character, length = 2 Head length = 1 tail is string with one character, length = 1
◦ ‘abc’: string with three characters. Length = 3 Head length = 1 tail is a string with two characters, length = 2
Notice the pattern????
Python String Operations
The pattern leads us to: def lengthRec(str): if str == ‘’: return 0 else: return 1 + lengthRec(strTail(str))
Python String Operations
Computing the reversal of a string◦ reverseRec(‘abc’) == ‘cba’
Lets solve for this like any other recursive function
Python String Operations
String reversal cases:◦ ‘’: emptystring reversed is the empty string◦ ‘a’: a single character reversed is the same string◦ ‘ab’: two character string
‘b’ + ‘a’ == ‘ba’ strTail(‘ab’) + strHead(‘ab’)
◦ ‘abc’: three character string ‘c’ + ‘b’ + ‘a’ == ‘cba’ strTail(strTail(‘abc’)) + strHead(strTail(‘abc’)) + strHead(‘abc’) strTail(‘bc’) + strHead(‘bc’) + strHead(‘abc’)
Reversal of string ‘bc’ + strHead(‘abc’)
Python String Operations
From this we get: def reverseRec(str): if str == ‘’: return ‘’ else: return reverseRec(strTail(str) + strHead(str)
Python String Operations
Accumulative Length◦ Recall the recursive form we did earlier:
def lengthRec(str): if str == ‘’: return 0 else: return 1 + lengthRec(strTail(str))
◦ How can we change this to use an accumulator variable?? Notice we are returning zero for the empty case and
1 + the recursive call for the other case
Accumulative Recursion
Accumulative Length◦ We can add the accumulator variable and add 1
for the recursive call and return it for the empty string: def lengthAccum(str, ac):
if str == ‘’: return ac else: return lengthAccum(strTail(str), ac + 1)
Basically: if we have a head, add one to the acummulator and
check the tail Else return the accumulator variable since we have
nothing else to count
Accumulative Recursion
Accumulative Reverse◦ Recall the recursive form we did earlier:
def reverseRec(str): if str == ‘’: return ‘’ else: return reverseRec(strTail(str) + strHead(str)
◦ How can we change this to use an accumulator variable?? Notice we are returning the empty string for the
empty case and the recursive call + the strHead for the other case
Accumulative Recursion
Accumulative Reverse◦ We can add the accumulator variable and add the
strHead to the accumulator for each recursive call: def reverseAccum(str, ac):
if str == ‘’: return ac else: return reverseAccum(strTail(str), strHead(str) +ac)
Accumulative Recursion
We often can replace recursion with iteration◦ This requires the use of a new type of Python
statement◦ The for loop
String Iteration
With the for loop when can convert our recursive string operations to iterative forms◦ Recall the accumulative form:
def lengthAccum(str, ac): if str == ‘’: return ac else: return lengthAccum(strTail(str), ac + 1)
◦basically we ran the function over and over again adding one to ac until we hit the empty string…
How could we make that into a for loop???
String Iteration
With a for loop we can avoid recursion def lenghtIter(str): ac = 0 for ch in str: ac = 1 + ac return ac
String Iteration
This can work for reverse too!◦ def reverseAccum(str, ac):
if str == ‘’: return ac else: return reverseAccum(strTail(str), strHead(str) +ac)
Becomes:◦ def reverseIter(str):
ac = ‘’ for ch in str: ac = hd + ac return ac
String Iteration
Some times we want to access parts of a string by index◦ This means iterating over the range of values 0,
1, 2, …, len(str) -1 len(str) is a built in Python function for string length
◦ To do this we have a special for loop in Python for i in range(0, len(str)) This says for all character in str starting at index 0 to
the last index of the string, do x It does not have to be the whole string for I in range(2, 5)… this will do all i’s 2, 3, 4
Index Values and Range
Say we do not want to type in a long string every time…◦ For instance we want to find and remove all of the
instances of a word in a report… How can we do that without using an input
statement and entering the entire text manually?
Files
Python can read files!◦ Lets look at a basic function to hide a word in a
string◦ def hide(textFileString, hiddenWord):
for currentWord in textFileString: if currentWord == hiddenWord: print(‘---’) else: print currentWord
Files
How do we do this from a text file and not a string?◦ To make this problem simple we will assume only one
word per line in the file Do the file reads, the spaces are shown on purpose as _:
word1__word2___word3__word4_word5word6
word7
Files
How can we read this file?◦ It is actually pretty easy in Python◦ for line in open(‘text1.txt’):
print(line) ◦ This give us:
word1◦
__word2
◦ ___word3
◦ __word4
◦ _word5
◦ word6
◦ word7
Files
Notice the spaces are still there and it
appears to have more space between lines!
The extra space between lines is due to:◦ The print function adds a new line◦ The original new line from the file is still in the string◦ If we were to make a single concatenated string we would see the
original file contents◦ str = ‘’
for line in open(‘text1.txt’): str = str + lineprint(str) word1
__word2___word3__word4_word5word6
word7
Files
We can make printing better◦ print(str, end=‘’)
Print will not generate newlines◦ We still have an issue for our problem
Newlines are still there from the file◦ for line in open(‘text1.txt’):
print(line == ‘word1’)◦ We would see false for all line
Even thought word1 looks like it exists Due to the new line from the file!
Files
We can use a Python feature called strip◦ This removes all whitespace from a string
Whitespace: Newlines Spaces
◦ We call it a bit different that other string functions str.strip() It returns the ‘stripped’ string
Files
Using strip we get for line in open(‘text1.txt’):
print(line.strip() == ‘word1’) Which results in:
True False False False False False False False
Files
We orginally made this function to hide a word…◦ But it can be used to find a word too◦ We look for, or search for, the word to replace it◦ We look for all occurrences of the word…
Sometimes we only want to find the first time a word happens
Files
Linear Search◦ The process of visiting each element in a
sequence in order stopping when either: we find what we are looking for We have visited every element, and not found a
match◦ Often we wish to characterize how long an
algorithm takes Measuring in seconds often depends on the
hardware, operating system, etc◦ We want to avoid such details
How do we do this?
Files
We avoid this by focusing on the size of the input, N…◦ And characterize the time spend as a function of
N◦ This function is called the time complexity
One way to measure performance is to count the number of operations or statements the code makes
Files
For the hide function we:◦ Strip the current word◦ Compare it to the hidden word◦ And then prints the results
Since this is a loop, this occurs for each word◦ Even thought the operations can take different
times individually, as the number of words grow this time difference are very small They are considered a constant amount
Files
For a linear search we can see that different size of N can lead to different behaviors and times…◦ Say the element we are looking for is the first
element… it’s very fast….◦ What if it is the last element?
We must then look at every element If n is 10 then the slow case is no problem.. What if it was 10 billion?
Files
Typically we are interested in how bad it can get…◦ Or the Worst-case analysis
Consider the worst-case analysis of the linear search…◦ For each element we spend some constant time◦ Plus some fixed time to startup and end the
search◦ If processing of an element takes constant time k
Then search all N elements should take k * N Similarly the startup and end time is some constant c So the time to run the linear search is (k * N) + c
Files
So the time to run the linear search is (k * N) + c◦ What are k and c?
Most of the time we simply do not care….◦ It is often good enough to know that the time
complexity is a linear function with a non-zero slope! The ,mathematical way to ignore this is to say the
time complexity of linear search is O(N) It is pronounced “Order N” This is known as “Big-O” notation
Files
“Big-O” notation◦ Makes it easy to compare algorithms…◦ Constant time, like comparing two characters,
O(1)◦ There are some algorithms that are O(N2) and
O(N3)◦ We prefer O(1) to O(N), O(N) to O(N2)…◦ There are possibly time complexities in between
these…
Files