LING/C SC/PSYC 438/538
Lecture 5
9/8
Sandiway Fong
Administrivia
• Homework 1 (from lecture 3) – was due last night (at midnight)
Today’s Topics
• Review
– Homework 1
– We’ll go through it in class today
• Chapter 2 of JM
– Section 2.1 on regular expressions
– (which you’ve already read…)
Safari Book available online
(Thanks! Don Merson)
• UA Library has been given access to the full Safari Books Online service.
• This allows you to read a vast number of technical books via your browser.
• However, it is currently only a trial.
http://proquest.safaribooksonline.com.ezproxy1.library.arizona.edu/
Homework Review
• Question 1: 438 and 538 (7 points)
– Given
– @sentence1 = (I, saw, the, the, cat, on, the, mat);
– @sentence2 = (the, cat, sat, on, the, mat);
– Write a simple Perl program which detects repeated words (many spell checker/grammar programs have this capability)
– It should print a message stating the repeated word and its position if one exists
– e.g. word 3 “the” is repeated in the case of sentence1
– No repeated words found in the case of sentence2
– note: output multiple messages if there are multiple repeated words
– Hint: use a loop
– Submit your Perl code and show examples of your program working
Homework Review
• Thinking algorithmically…
w1 w2 w3 w4 w5
Compare w1 with w2
Compare w2 with w3
Compare w3 with w4
Compare w4 with w5
Array indices start from 0…
Homework Review
• Turning an algorithm into Perl code:
Compare w1 with w(1+1)
Compare w2 with w(2+1)
Compare w(n-2) with w(n-2+1)
Compare w(n-1) with wn
“for” loop implementation
array @words: words[0], words[1], …, words[n-1]
for ($i=0; $i<$#words; $i++) {
  compare word indexed by $i to word indexed by $i+1
  if same string, print message
}
Array indices end at $#words…
Homework Review
• First iteration (there are many ways to do this…)
– (the basic for-loop)
my @sentence1 = qw(I saw the the cat on the mat);
my @sentence2 = qw(the cat sat on the mat);
my @words = @sentence1;
for (my $i=0; $i<$#words; $i++) {
  if ($words[$i] eq $words[$i+1]) {
    print "word $i \"$words[$i]\" is repeated\n";
  }
}
Homework Review
• 2nd iteration
– (setting a flag when a repeated word is found)
– (condition the output based on the value of the flag)
my $flag = 0;
for (my $i=0; $i<$#words; $i++) {
  if ($words[$i] eq $words[$i+1]) {
    print "word $i \"$words[$i]\" is repeated\n";
    $flag = 1;
  }
}
print "No words repeated\n" unless $flag;
Homework Review
• 3rd iteration
– (encapsulating the loop in a subroutine)
sub check_repeated {
  my @words = @_;
  my $flag = 0;
  for (my $i=0; $i<$#words; $i++) {
    if ($words[$i] eq $words[$i+1]) {
      print "word $i \"$words[$i]\" is repeated\n";
      $flag = 1;
    }
  }
  print "No words repeated\n" unless $flag;
}
print "@sentence1\n";
check_repeated(@sentence1);
print "@sentence2\n";
check_repeated(@sentence2);
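A possible variant of the 3rd iteration, sketched on the assumption that the caller wants the positions back instead of printed messages. The name repeated_positions is hypothetical and not part of the homework.

```perl
use strict;
use warnings;

# Hypothetical helper: return every index $i where $words[$i] eq $words[$i+1].
sub repeated_positions {
    my @words = @_;
    my @hits;
    for my $i (0 .. $#words - 1) {
        push @hits, $i if $words[$i] eq $words[$i + 1];
    }
    return @hits;
}

my @sentence1 = qw(I saw the the cat on the mat);
my @sentence2 = qw(the cat sat on the mat);

for my $sentence (\@sentence1, \@sentence2) {
    my @hits = repeated_positions(@$sentence);
    print "@$sentence\n";
    if (@hits) {
        print "word $_ \"$sentence->[$_]\" is repeated\n" for @hits;
    } else {
        print "No words repeated\n";
    }
}
```

Separating detection from printing means the flag is no longer needed: an empty list plays that role.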
Homework Review
• Question 2: 438 and 538 (3 points)
– Describe what it would take to stop a repeated word program from flagging legitimate examples of repeated words in a sentence
– (No spell checker/grammar program that I know has this capability)
– Examples of legitimately repeated words:
• I wish that that question had an answer
• Because he had had too many beers already, he skipped the Friday office happy hour
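One crude approximation, sketched only to show why the question is hard: skip words on a hand-made allowlist. The allowlist contents and the name suspicious_repeats are assumptions. A real answer would likely need syntactic analysis, since "that that" can also be a genuine typo.

```perl
use strict;
use warnings;

# Assumption: a tiny hand-made allowlist of words that may legitimately double.
my %may_repeat = map { $_ => 1 } qw(that had);

# Hypothetical helper: indices of adjacent repeats not covered by the allowlist.
sub suspicious_repeats {
    my @words = @_;
    my @hits;
    for my $i (0 .. $#words - 1) {
        my $word = lc($words[$i]);
        next unless $word eq lc($words[$i + 1]);
        push @hits, $i unless $may_repeat{$word};
    }
    return @hits;
}

my @legit = suspicious_repeats(qw(I wish that that question had an answer));
my @typo  = suspicious_repeats(qw(I saw the the cat on the mat));
print "legitimate example: ", scalar(@legit), " flagged\n";
print "typo example: ", scalar(@typo), " flagged\n";
```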
Homework Review
• Question 3: 538 (10 points), (438 extra credit)
– Write a simple Perl program that outputs word frequencies for a sentence
– E.g. given
– @sentence1 = (I, saw, the, cat, on, the, mat, by, the, saw, table);
– output a summary that looks something like:
– the occurs 3 times
– saw occurs twice
– I, cat, mat, on, by, table occur once only
– Hint: build a hash keyed by word with value frequency
– Submit your Perl code and show examples of your program working
Homework Review
• Thinking algorithmically…
w0 w1 w2 w3 w4 w5
foreach $word (@sentence)
hash data structure = “labeled medicine cabinet”
Homework Review
• Sample answer
@sentence = qw(the cat sat on the mat that the cat likes most);
%freq = ();
foreach $word (@sentence) {
  if (exists $freq{$word}) {
    $freq{$word}++;
  } else {
    $freq{$word} = 1;
  }
}
foreach $word (keys %freq) {
  print "$word occurs $freq{$word} time(s)\n";
}
perl e2.prl
on occurs 1 time(s)
the occurs 3 time(s)
cat occurs 2 time(s)
most occurs 1 time(s)
sat occurs 1 time(s)
likes occurs 1 time(s)
that occurs 1 time(s)
mat occurs 1 time(s)
Further simplifications to the code are possible but the basic logic remains
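One such simplification, as a sketch: the exists test can be dropped because ++ on a missing hash entry starts counting from zero. The name word_freq and the sorted output order are additions for illustration, not part of the sample answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash reference mapping each word to its count.
sub word_freq {
    my %freq;
    $freq{$_}++ for @_;    # a missing entry is created and incremented to 1
    return \%freq;
}

my @sentence = qw(the cat sat on the mat that the cat likes most);
my $freq = word_freq(@sentence);

# Sort by descending count, then alphabetically, so the output order is stable.
for my $word (sort { $freq->{$b} <=> $freq->{$a} or $a cmp $b } keys %$freq) {
    print "$word occurs $freq->{$word} time(s)\n";
}
```

The sort is there because keys returns hash entries in no particular order, so an unsorted listing can differ from run to run.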
Chapter 2: JM
• Today
– using your Perl skills on
– Section 2.1 Regular Expressions
– Online tutorials
• http://perldoc.perl.org/perlrequick.html
• http://perldoc.perl.org/perlretut.html
Pattern Matching
JM, Chapter 2, pg 17
Merriam-Webster online
Chapter 2: JM
• Perl regular expression (re) matching:
– $a =~ /foo/
– /…/ contains a regular expression
– will evaluate to true/false depending on what’s contained in $a
• Perl regular expression (re) match and substitute:
– $a =~ s/foo/bar/
– s/…match…/…substitute…/ contains two expressions
– will modify $a by looking for the first occurrence of match and replacing that with substitute
– s/…match…/…substitute…/g global match and substitute (every occurrence is replaced)
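A small sketch of the three forms above. The test string is made up, and $text is used in place of $a only because $a doubles as a special variable for sort.

```perl
use strict;
use warnings;

my $text = "foo and foo again";    # made-up test string

# Match: true because $text contains "foo"; $text itself is left unchanged.
print "matched\n" if $text =~ /foo/;

# Substitute: only the first "foo" is replaced (on a copy of $text).
(my $once = $text) =~ s/foo/bar/;
print "$once\n";    # bar and foo again

# Global substitute: every "foo" is replaced.
(my $all = $text) =~ s/foo/bar/g;
print "$all\n";     # bar and bar again
```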
Chapter 2: JM
• Most useful with code for reading in a file line-by-line:
open($txtfile,$ARGV[0]) or die "$ARGV[0] not found!\n";
while ($line = <$txtfile>) {
  # do RE stuff with $line
}
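A self-contained sketch of the same loop with one concrete piece of "RE stuff". Assumptions: an in-memory string stands in for the file named in $ARGV[0], and count_matching_lines is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines read from a filehandle that contain "the".
sub count_matching_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /the/;
    }
    return $count;
}

# Assumption: opening a reference to a string gives an in-memory "file".
my $sample = "the cat sat\non a mat\nby the table\n";
open(my $txtfile, '<', \$sample) or die "cannot open in-memory file: $!\n";
my $n = count_matching_lines($txtfile);
close($txtfile);
print "$n line(s) contain \"the\"\n";
```

To read a real file, replace \$sample with $ARGV[0] in the open call.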
Sheeptalk
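Assuming this slide refers to the sheep language of JM Section 2.1 (baa!, baaa!, baaaa!, …), its pattern /baa+!/ can be tried out as below. The ^ and $ anchors and the name is_sheeptalk are additions here, used to test a whole string rather than search inside one.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the whole string is "b", two or more "a"s, then "!".
sub is_sheeptalk {
    my ($s) = @_;
    return $s =~ /^baa+!$/ ? 1 : 0;
}

for my $s ("baa!", "baaaa!", "ba!", "baa") {
    printf "%-8s %s\n", $s, is_sheeptalk($s) ? "sheeptalk" : "not sheeptalk";
}
```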
Chapter 2: JM
• Precedence of operators
– Example: Column 1 Column 2 Column 3 …
– /Column [0-9]+ */
– /(Column [0-9]+ *)*/
– /house(cat(s|)|)/
• Perl:
– In a regular expression, the text matched within a pair of parentheses is stored in $1 (and $2 and so on)
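A sketch of how grouping changes what the Column patterns match, and of what ends up in $1. The variable names are illustrative, and $& (the whole matched text) is used only for display.

```perl
use strict;
use warnings;

my $header = "Column 1 Column 2 Column 3";    # example string from the slide

# Without parentheses, the * applies only to the space before it,
# so the pattern matches a single "Column N" plus trailing spaces.
$header =~ /Column [0-9]+ */;
my $ungrouped = $&;

# With parentheses, the outer * repeats the whole group.
$header =~ /(Column [0-9]+ *)*/;
my ($grouped, $last_group) = ($&, $1);

print "ungrouped match: '$ungrouped'\n";
print "grouped match:   '$grouped'\n";
print "\$1 after match:  '$last_group'\n";
```

When a parenthesized group is repeated, $1 holds the text from its last repetition, not all of them.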
• Precedence Hierarchy:
Chapter 2: JM
A shortcut: list context for matching
http://perldoc.perl.org/perlretut.html
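A sketch of the shortcut, in the style of the time-of-day example in perlretut: in list context a successful match returns the captured groups directly. The sample strings and variable names are made up.

```perl
use strict;
use warnings;

my $time = "10:45:30";    # made-up example value

# In list context, a successful match returns ($1, $2, $3).
my ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
print "hours=$hours minutes=$minutes seconds=$seconds\n";

# A failed match returns the empty list, so nothing is captured.
my $captured = () = ("no time here" =~ /(\d\d):(\d\d):(\d\d)/);
print "captures on failure: $captured\n";
```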
Chapter 2: JM
• s/([0-9]+)/<\1>/
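A sketch of this substitution in Perl. The sample string is made up. $1 is used on the replacement side, where Perl also accepts \1 but warns about it when warnings are enabled.

```perl
use strict;
use warnings;

my $line = "the 35 boxes and 7 crates";    # made-up example string

# Wrap the first integer in angle brackets (on a copy of $line).
(my $first_only = $line) =~ s/([0-9]+)/<$1>/;

# With /g, wrap every integer.
(my $every = $line) =~ s/([0-9]+)/<$1>/g;

print "$first_only\n";    # the <35> boxes and 7 crates
print "$every\n";         # the <35> boxes and <7> crates
```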