LING/C SC/PSYC 438/538 Lecture 5 9/8 Sandiway Fong

Administrivia

• Homework 1 (from lecture 3) – was due last night (at midnight)

Today’s Topics

• Review
– Homework 1
– We’ll go through it in class today

• Chapter 2 of JM
– Section 2.1 on regular expressions
– (which you’ve already read…)

Safari Book available online

(Thanks! Don Merson)

• UA Library has been given access to the full Safari Books Online service.

• This allows you to read a vast number of technical books via your browser.

• However, it is currently only a trial.

http://proquest.safaribooksonline.com.ezproxy1.library.arizona.edu/

Homework Review

• Question 1: 438 and 538 (7 points)
– Given
– @sentence1 = (I, saw, the, the, cat, on, the, mat);
– @sentence2 = (the, cat, sat, on, the, mat);
– Write a simple Perl program which detects repeated words (many spell checker/grammar programs have this capability)
– It should print a message stating the repeated word and its position if one exists
– e.g. word 3 “the” is repeated in the case of sentence1
– No repeated words found in the case of sentence2
– note: output multiple messages if there are multiple repeated words
– Hint: use a loop
– Submit your Perl code and show examples of your program working

Homework Review

• Thinking algorithmically…

w1 w2 w3 w4 w5

Compare w1 with w2

Compare w2 with w3

Compare w3 with w4

Compare w4 with w5

Array indices start from 0…

Homework Review

• Turning an algorithm into Perl code:

Compare w(1) with w(1+1)

Compare w(2) with w(2+1)

Compare w(n-2) with w(n-2+1)

Compare w(n-1) with w(n)

“for” loop implementation

words[0], words[1], …, words[n-1]

for ($i=0; $i<$#words; $i++) {
    # compare word indexed by $i to word indexed by $i+1
    # if same string, print message
}

array @words

Array indices end at $#words…

Homework Review

• First iteration (there are many ways to do this…)
– (the basic for-loop)

my @sentence1 = qw(I saw the the cat on the mat);
my @sentence2 = qw(the cat sat on the mat);

my @words = @sentence1;

# compare each word with its right-hand neighbor
for (my $i = 0; $i < $#words; $i++) {
    if ($words[$i] eq $words[$i+1]) {
        print "word $i \"$words[$i]\" is repeated\n";
    }
}

Homework Review

• 2nd iteration
– (setting a flag when a repeated word is found)
– (condition the output based on the value of the flag)

my $flag = 0;    # becomes 1 once any repeated word is seen

for (my $i = 0; $i < $#words; $i++) {
    if ($words[$i] eq $words[$i+1]) {
        print "word $i \"$words[$i]\" is repeated\n";
        $flag = 1;
    }
}

print "No words repeated\n" unless $flag;

Homework Review

• 3rd iteration
– (encapsulating the loop in a subroutine)

sub check_repeated {
    my @words = @_;    # the words of the sentence to check
    my $flag = 0;

    for (my $i = 0; $i < $#words; $i++) {
        if ($words[$i] eq $words[$i+1]) {
            print "word $i \"$words[$i]\" is repeated\n";
            $flag = 1;
        }
    }
    print "No words repeated\n" unless $flag;
}

print "@sentence1\n";
check_repeated(@sentence1);

print "@sentence2\n";
check_repeated(@sentence2);

Homework Review

• Question 2: 438 and 538 (3 points)
– Describe what it would take to stop a repeated word program from flagging legitimate examples of repeated words in a sentence
– (No spell checker/grammar program that I know has this capability)
– Examples of legitimately repeated words:
  • I wish that that question had an answer
  • Because he had had too many beers already, he skipped the Friday office happy hour

Homework Review

• Question 3: 538 (10 points), (438 extra credit)
– Write a simple Perl program that outputs word frequencies for a sentence
– E.g. given
– @sentence1 = (I, saw, the, cat, on, the, mat, by, the, saw, table);
– output a summary that looks something like:
– the occurs 3 times
– saw occurs twice
– I, cat, mat, on, by, table occur once only
– Hint: build a hash keyed by word with value frequency
– Submit your Perl code and show examples of your program working

Homework Review

• Thinking algorithmically…

w1 w2 w3 w4 w5

foreach $word (@sentence)

hash data structure = “labeled medicine cabinet”

Homework Review

• Sample answer

@sentence = qw(the cat sat on the mat that the cat likes most);

%freq = ();

# first pass: count each word
foreach $word (@sentence) {
    if (exists $freq{$word}) {
        $freq{$word}++;
    } else {
        $freq{$word} = 1;
    }
}

# second pass: report the counts (hash key order is not fixed)
foreach $word (keys %freq) {
    print "$word occurs $freq{$word} time(s)\n";
}

perl e2.prl
on occurs 1 time(s)
the occurs 3 time(s)
cat occurs 2 time(s)
most occurs 1 time(s)
sat occurs 1 time(s)
likes occurs 1 time(s)
that occurs 1 time(s)
mat occurs 1 time(s)

Further simplifications to the code are possible but the basic logic remains
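One possible simplification, offered as a sketch rather than as the official answer: Perl creates a missing hash entry on first use, so the exists test can be dropped. The subroutine name word_freq and the sorted output order are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Count how often each word occurs; returns a hash keyed by word.
sub word_freq {
    my @words = @_;
    my %freq;
    $freq{$_}++ for @words;    # a missing key starts out undefined, and ++ makes it 1
    return %freq;
}

my @sentence = qw(the cat sat on the mat that the cat likes most);
my %freq = word_freq(@sentence);

# Sort by descending frequency, then alphabetically, so the output order is stable.
foreach my $word (sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq) {
    print "$word occurs $freq{$word} time(s)\n";
}
```

The counting logic is the same as in the sample answer; only the bookkeeping is shorter.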

Chapter 2: JM

• Today
– using your Perl skills on
– Section 2.1 Regular Expressions
– Online tutorials
  • http://perldoc.perl.org/perlrequick.html
  • http://perldoc.perl.org/perlretut.html

Pattern Matching

JM, Chapter 2, pg 17

Merriam-Webster online

Chapter 2: JM

• Perl regular expression (re) matching:
– $a =~ /foo/
– /…/ contains a regular expression
– will evaluate to true/false depending on what’s contained in $a

• Perl regular expression (re) match and substitute:
– $a =~ s/foo/bar/
– s/…match… /…substitute… / contains two expressions
– will modify $a by looking for a single occurrence of match and replacing that with substitute

– s/…match… /…substitute… /g global match and substitute
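The difference between matching, a single substitution, and a global substitution can be sketched as follows. The example string and the variable names $text, $first and $all are made up for illustration; the slides use $a, which is avoided here because Perl also uses $a inside sort.

```perl
use strict;
use warnings;

my $text = "foo and more foo";    # hypothetical example string

# Match: true if the pattern occurs anywhere in $text
print "matched\n" if $text =~ /foo/;

# Substitute: without /g only the first occurrence is replaced
(my $first = $text) =~ s/foo/bar/;
print "$first\n";    # bar and more foo

# With /g every occurrence is replaced
(my $all = $text) =~ s/foo/bar/g;
print "$all\n";      # bar and more bar
```

Copying into a new variable before substituting leaves $text unchanged, which makes the two results easy to compare.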

Chapter 2: JM

• Most useful with code for reading in a file line-by-line:

open(my $txtfile, '<', $ARGV[0]) or die "$ARGV[0] not found!\n";
while (my $line = <$txtfile>) {
    # do RE stuff with $line
}
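As a minimal sketch of what "RE stuff" inside the loop might look like, the subroutine below collects every line that matches a pattern. The name matching_lines, the sample text and the pattern are assumptions for illustration, and an in-memory filehandle stands in for a real file so the example runs as-is.

```perl
use strict;
use warnings;

# Return the lines read from filehandle $fh that match the regular expression $re.
sub matching_lines {
    my ($fh, $re) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;                          # strip the trailing newline
        push @hits, $line if $line =~ $re;
    }
    return @hits;
}

# Opening a reference to a string gives a filehandle that reads from that string.
my $sample = "the cat sat\non the mat\nI saw the the cat\n";
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!\n";
print "$_\n" for matching_lines($fh, qr/\bcat\b/);
close($fh);
```

With a real file, the open call would take a filename such as $ARGV[0] in place of \$sample.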

Chapter 2: JM

Sheeptalk

Chapter 2: JM

• Precedence of operators
– Example: Column 1 Column 2 Column 3 …
– /Column [0-9]+ */
– /(Column [0-9]+ *)*/
– /house(cat(s|)|)/

• Perl:
– In a regular expression, the text matched within a pair of parentheses is stored in $1 (and $2 and so on)
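The patterns above can be tried out with a short sketch. The header string, the extra outer group, the ^ and $ anchors, and the name house_groups are additions made for this illustration and do not appear on the slide.

```perl
use strict;
use warnings;

my $header = "Column 1 Column 2 Column 3";

# The + and * bind only to the item just before them, so this matches a single column.
my $single = $header =~ /(Column [0-9]+ *)/ ? $1 : '';
print "single column: '$single'\n";

# Parentheses group the whole unit so the outer * can repeat it.
# The added outer group $1 holds the full match; $2 holds the last repetition.
my ($all, $last) = ('', '');
if ($header =~ /^((Column [0-9]+ *)*)/) {
    ($all, $last) = ($1, $2);
}
print "all columns: '$all', last repetition: '$last'\n";

# Nested groups are numbered by their opening parenthesis, left to right.
sub house_groups {
    my ($word) = @_;
    return $word =~ /^house(cat(s|)|)$/ ? ($1, $2) : ();
}
my ($outer, $inner) = house_groups("housecats");
print "\$1 = '$outer', \$2 = '$inner'\n";
```

A group inside a repetition keeps only the text from its final pass, which is why $2 in the second pattern ends up holding the last column.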

• Precedence Hierarchy:

Chapter 2: JM

A shortcut: list context for matching

http://perldoc.perl.org/perlretut.html
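In list context, a match returns the captured groups directly, so they can be assigned to variables in one step without going through $1, $2 and so on. A minimal sketch, with made-up example strings and variable names:

```perl
use strict;
use warnings;

# A match in list context returns the list of captured groups.
my $time = "10:45:30";
my ($hours, $minutes, $seconds) = $time =~ /(\d+):(\d+):(\d+)/;
print "hours=$hours minutes=$minutes seconds=$seconds\n";

# With /g, list context returns the captures from every match in the string.
my @numbers = "Column 1 Column 2 Column 3" =~ /([0-9]+)/g;
print "numbers: @numbers\n";    # numbers: 1 2 3
```

If the match fails, the returned list is empty and the variables on the left are left undefined.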

Chapter 2: JM

• s/([0-9]+)/<\1>/
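The substitution above puts angle brackets around the first run of digits. In Perl the captured text is usually written as $1 on the replacement side; \1 is also accepted there, but Perl warns about it when warnings are enabled. A sketch with made-up input strings:

```perl
use strict;
use warnings;

# Wrap the first number in angle brackets.
my $text = "the 35 boxes";
(my $tagged = $text) =~ s/([0-9]+)/<$1>/;
print "$tagged\n";        # the <35> boxes

# With /g every number is wrapped.
my $many = "35 boxes and 12 crates";
(my $tagged_all = $many) =~ s/([0-9]+)/<$1>/g;
print "$tagged_all\n";    # <35> boxes and <12> crates
```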
