CSE 341 S. Tanimoto

Regular Expressions: Theory and Perl Implementation

Outline:

1. Theoretical Definitions and Examples
2. Acceptance by Finite Automata
3. Perl’s Syntax
4. Other pattern matching functionality in Perl
5. Program Example

Alphabets and Sets of Strings

An alphabet Σ = {a1, a2, ..., an} is a set of characters.

A string over Σ is a sequence of zero or more elements of Σ.

Example. If Σ = {0, 1, 2} then 2201 is a string over Σ.

No matter what Σ is, the empty string ε is a string over Σ.

A set of strings over Σ is a set of zero or more strings, each of which is a string over Σ.

Example. If Σ = {0, 1, 2} then {ε, 111, 121, 0} is a set of strings over Σ.
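
Whether a given string is a string over an alphabet can be checked mechanically. A minimal Perl sketch, where the sub name is_string_over is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if every character of $string belongs
# to the alphabet, else 0. The empty string passes for any alphabet.
sub is_string_over {
    my ($string, @alphabet) = @_;
    my %in_alphabet = map { $_ => 1 } @alphabet;
    for my $char (split //, $string) {
        return 0 unless $in_alphabet{$char};
    }
    return 1;
}

print is_string_over("2201", 0, 1, 2), "\n";   # 1
print is_string_over("",     0, 1, 2), "\n";   # 1 (the empty string)
print is_string_over("2301", 0, 1, 2), "\n";   # 0 (3 is not in the alphabet)
```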

A Recursive Definition for Regular Expressions

A regular expression for an alphabet Σ is a certain kind of pattern that describes a set of strings over Σ.

Any character c in Σ is a regular expression representing {c}.

If E, E1 and E2 are regular expressions over Σ then so are
E1 E2 -- representing the set concatenation of E1 and E2.
E1 | E2 -- representing alternation of E1 and E2.
( E ) -- representing E grouped with parentheses.
E+ -- representing one or more instances of E concatenated.
E* -- representing zero or more instances of E concatenated.
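
The recursive definition can be mirrored by small constructor subs that build a Perl pattern string from smaller ones. This is only a sketch; the names lit, concat, alt, plus, star and matches are invented here and are not part of Perl.

```perl
use strict;
use warnings;

# Each constructor returns a pattern string; (?: ) groups without capturing.
sub lit    { return quotemeta $_[0] }                    # a single character c
sub concat { return join '', map { "(?:$_)" } @_ }       # E1 E2
sub alt    { return '(?:' . join('|', @_) . ')' }        # E1 | E2
sub plus   { return "(?:$_[0])+" }                       # E+
sub star   { return "(?:$_[0])*" }                       # E*

# True (1) if the whole string is described by the expression.
sub matches {
    my ($expr, $string) = @_;
    return $string =~ /\A(?:$expr)\z/ ? 1 : 0;
}

my $ab_star = concat(lit('a'), star(lit('b')));          # ab*
print matches($ab_star, 'abbb'), "\n";                   # 1
print matches($ab_star, 'ba'),   "\n";                   # 0
```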

Regular Expression Examples

Let Σ = {a, b}.

a = {a}
ab = {ab}
a | b = {a, b}
a+ = {a, aa, aaa, ... }

ab* represents the set of strings having a single a followed by zero or more occurrences of b.
That is, it’s {a, ab, abb, abbb, ... }

For the next two examples, let Σ = {a, b, c, d}.

a (b | c) = {ab, ac}
(a | b) (c | d) = {ac, ad, bc, bd}

aa* = a+ = {a, aa, aaa, ... }
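
These sets can be checked by brute force: generate every string over the alphabet up to some length and keep the ones the expression describes. A sketch with made-up sub names (strings_up_to, language_up_to); the patterns are written without spaces, because a space inside a Perl pattern is a literal character unless the x modifier is used.

```perl
use strict;
use warnings;

# All strings over @alphabet of length 0 .. $max_len, shortest first.
sub strings_up_to {
    my ($max_len, @alphabet) = @_;
    my @all      = ('');
    my @previous = ('');
    for (1 .. $max_len) {
        my @next;
        for my $prefix (@previous) {
            push @next, map { $prefix . $_ } @alphabet;
        }
        push @all, @next;
        @previous = @next;
    }
    return @all;
}

# The strings of length <= $max_len that the expression describes.
sub language_up_to {
    my ($regex, $max_len, @alphabet) = @_;
    return grep { /\A(?:$regex)\z/ } strings_up_to($max_len, @alphabet);
}

print join(',', language_up_to('ab*', 3, 'a', 'b')), "\n";              # a,ab,abb
print join(',', language_up_to('(a|b)(c|d)', 2, qw(a b c d))), "\n";    # ac,ad,bc,bd
```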

Extended Regular Expressions

Let letters = a | b | c | d
Let digits = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Let identifiers = letters ( letters | digits )*

Thus we can use a name to represent a set of strings and use that name in a regular expression.
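
In Perl the same naming idea can be sketched with qr//, which compiles a pattern that can then be interpolated into a larger one. The variable names follow the definitions above; is_identifier is a hypothetical helper.

```perl
use strict;
use warnings;

my $letters     = qr/[a-d]/;                          # a | b | c | d
my $digits      = qr/[0-9]/;                          # 0 | 1 | ... | 9
my $identifiers = qr/$letters(?:$letters|$digits)*/;  # letters ( letters | digits )*

# True (1) if the whole string is an identifier in the sense defined above.
sub is_identifier {
    my ($string) = @_;
    return $string =~ /\A$identifiers\z/ ? 1 : 0;
}

print is_identifier('ab12'), "\n";   # 1
print is_identifier('1ab'),  "\n";   # 0 (starts with a digit)
```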

Finite Automaton

[Figure: a finite automaton with transitions labeled a, b, and a; the start state and the accepting state are marked.]

Corresponding regular expression: ab*a

Example: process the string abba

Now try abbb

The automaton has a finite number of states, but the number of strings it accepts is not necessarily finite.
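
Processing a string with such an automaton can be sketched as a table lookup per character. The state numbers 0, 1, 2 and the transition table are assumptions reconstructed from the expression ab*a, not taken from the original figure.

```perl
use strict;
use warnings;

# Assumed automaton for ab*a: 0 is the start state, 2 the accepting state.
my %delta = (
    0 => { a => 1 },            # the first a
    1 => { b => 1, a => 2 },    # loop on b, leave on the final a
    2 => {},                    # no transitions out of the accepting state
);

# Return 1 if the automaton ends in the accepting state, else 0.
sub accepts {
    my ($input) = @_;
    my $state = 0;
    for my $char (split //, $input) {
        my $next = $delta{$state}{$char};
        return 0 unless defined $next;   # no transition: reject
        $state = $next;
    }
    return $state == 2 ? 1 : 0;
}

print accepts('abba'), "\n";   # 1
print accepts('abbb'), "\n";   # 0 (ends in state 1, which is not accepting)
```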

Equivalence of Finite Automata and Regular Expressions

[Figure: small automata paired with the regular expressions they correspond to, including ab, a | b, and a*.]

Regular Expressions in Perl

In Perl, regular expressions are used to specify patterns for pattern matching.

$sentence = "Winter weather has arrived.";

if ($sentence =~ /weather/) {
    print "Never bet on the weather.";
}

# $string =~ /Pattern/

The result of this kind of pattern matching is a true or false value.
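
A short sketch of that true-or-false result, also showing the negated operator !~ and the i modifier for case-insensitive matching (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $sentence = "Winter weather has arrived.";

my $found   = ($sentence =~ /weather/);    # true: the text occurs
my $missing = ($sentence !~ /summer/);     # true: !~ negates the match
my $exact   = ($sentence =~ /Weather/);    # false: matching is case-sensitive
my $anycase = ($sentence =~ /Weather/i);   # true: i ignores case

print $found   ? "yes" : "no", "\n";   # yes
print $missing ? "yes" : "no", "\n";   # yes
print $exact   ? "yes" : "no", "\n";   # no
print $anycase ? "yes" : "no", "\n";   # yes
```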

A Perl Regular Expression for Identifier

$identifier = "[a-z][a-z0-9]*";

$sentence = "012,cse341 341,ABC]*";

if ($sentence =~ /$identifier/) {
    print "Seems to be an identifier here.";
}

$ident2 = "[a-zA-Z][a-zA-Z0-9]*";
$reservedWord = "begin|end";
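
To see which part of the sentence actually matched, the pattern can be wrapped in capturing parentheses. The sketch below uses hypothetical helpers first_match and is_reserved; the second one shows why an alternation such as begin|end should be grouped before anchors are added around it.

```perl
use strict;
use warnings;

my $identifier   = "[a-z][a-z0-9]*";
my $ident2       = "[a-zA-Z][a-zA-Z0-9]*";
my $reservedWord = "begin|end";

# Return the leftmost substring matched by $pattern, or undef if none.
sub first_match {
    my ($pattern, $text) = @_;
    return $text =~ /($pattern)/ ? $1 : undef;
}

# Whole-string test. Without the (?: ) group, /^begin|end$/ would mean
# "starts with begin" OR "ends with end".
sub is_reserved {
    my ($word) = @_;
    return $word =~ /\A(?:$reservedWord)\z/ ? 1 : 0;
}

print first_match($identifier, "012,cse341 341,ABC]*"), "\n";   # cse341
print first_match($ident2,     "341,ABC]*"), "\n";              # ABC
print is_reserved("begin"), is_reserved("beginning"), "\n";     # 10
```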

Specifying Patterns

/Pattern/ # Literal text;
          # true if it occurs anywhere in the string.

/^Pattern/ # Must occur at the beginning.
"Pattern recognition is alive" =~ /^Pattern/

/Pattern$/ # Must occur at the end.
"The end" =~ /end$/

\s whitespace                    \S non-whitespace
\w a word char. [a-zA-Z_0-9]     \W a non-word char.
\d a digit                       \D a non-digit
\b word boundary                 \B not word boundary
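
A quick way to get a feel for the character classes is to count how often each one matches in a sample string. A sketch, with count_matches as a made-up helper and an arbitrary sample text:

```perl
use strict;
use warnings;

# Count the non-overlapping matches of $regex in $text.
# The "= () =" idiom forces list context and then counts the results.
sub count_matches {
    my ($regex, $text) = @_;
    my $count = () = $text =~ /$regex/g;
    return $count;
}

my $text = "CSE 341 meets at 9:30";

print count_matches(qr/\d/, $text), "\n";   # 6  digits
print count_matches(qr/\s/, $text), "\n";   # 4  whitespace characters
print count_matches(qr/\w/, $text), "\n";   # 16 word characters
print "starts with CSE\n" if $text =~ /^CSE/;
print "ends with 9:30\n"  if $text =~ /9:30$/;
```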

Specifying Patterns (Cont.)

$test = "You have new mail -- 11-13-00";
if ($test =~ /^You\s.+\d+-\d+-\d+/ ) {
    print "The mail has arrived.";
}

# The same test with the x modifier, which allows whitespace inside the pattern:
if ($test =~ m( ^ You \s .+ \d+ - \d+ - \d+ )x ) {
    print "The mail has arrived.";
}
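
With the x modifier a pattern can also be spread over several lines and commented piece by piece. A sketch using qr{ }x; the names $mail_pattern and has_dated_mail are illustrative only.

```perl
use strict;
use warnings;

# Under x, whitespace in the pattern is ignored and # starts a comment.
my $mail_pattern = qr{
    ^ You \s            # the line starts with "You" and a whitespace character
    .+                  # anything, at least one character
    \d+ - \d+ - \d+     # three runs of digits separated by hyphens
}x;

sub has_dated_mail {
    my ($text) = @_;
    return $text =~ $mail_pattern ? 1 : 0;
}

print has_dated_mail("You have new mail -- 11-13-00"), "\n";   # 1
print has_dated_mail("You have no mail"), "\n";                # 0
```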

Extracting Information

$test = "You have new mail -- 11-13-00";

if ($test =~ /^You\s.+?(\d+)-(\d+)-(\d+)/ ) {
    print "The mail has arrived on ";
    print "day $2 of month $1 in year $3.\n";
}

# Parentheses in the pattern establish
# variables $1, $2, $3, etc. to hold
# corresponding matched fragments.
# The non-greedy .+? is needed here: a greedy .+ would swallow
# the first digit of the month, leaving $1 equal to "1".
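
In list context a match returns the captured fragments directly, which avoids relying on $1, $2, $3. The sketch below (hypothetical subs parse_date, greedy_month, lazy_month) also compares a greedy .+ with a non-greedy .+? in front of the first group.

```perl
use strict;
use warnings;

# Return (month, day, year) from the first digits-digits-digits run.
sub parse_date {
    my ($text) = @_;
    my ($month, $day, $year) = $text =~ /(\d+)-(\d+)-(\d+)/;
    return ($month, $day, $year);
}

# Greedy .+ takes as much as it can, so $1 gets only the last digit.
sub greedy_month {
    my ($text) = @_;
    return $text =~ /^You\s.+(\d+)-(\d+)-(\d+)/ ? $1 : undef;
}

# Non-greedy .+? stops as early as possible, so $1 gets the whole number.
sub lazy_month {
    my ($text) = @_;
    return $text =~ /^You\s.+?(\d+)-(\d+)-(\d+)/ ? $1 : undef;
}

my $test = "You have new mail -- 11-13-00";
print join('/', parse_date($test)), "\n";   # 11/13/00
print greedy_month($test), "\n";            # 1
print lazy_month($test), "\n";              # 11
```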

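Search and Replace

$sntc = "We surfed the waves the whole day.";
$sntc =~ s/surfed/sailed/;
print $sntc;
# We sailed the waves the whole day.

$sntc =~ s/the//g;
print $sntc;
# We sailed  waves  whole day.
# (each removed "the" leaves its two surrounding spaces behind)

# g makes the replacement “global”: every occurrence is replaced, not just the first.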
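
The substitution operator also returns the number of replacements it made. A sketch, with strip_the as a made-up name, that removes the word "the", tidies the doubled spaces left behind, and reports the count:

```perl
use strict;
use warnings;

# Remove every standalone "the" and collapse the leftover runs of spaces.
# Returns the cleaned text and the number of words removed.
sub strip_the {
    my ($text) = @_;
    my $count = ($text =~ s/\bthe\b//g) || 0;   # s///g yields the count, or false
    $text =~ s/ {2,}/ /g;                       # two or more spaces become one
    return ($text, $count);
}

my ($cleaned, $removed) = strip_the("We sailed the waves the whole day.");
print "$cleaned\n";   # We sailed waves whole day.
print "$removed\n";   # 2
```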

Interpolation of Variables in Replacements

$exclamation = "yeah";

$sntc = "We had fun.";
$sntc =~ s/\w+/$exclamation/g;
print $sntc;

# yeah yeah yeah.

# A replacement, like a pattern, can contain a Perl variable.
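
When a variable is interpolated into the pattern side, any regular expression characters in it keep their special meaning unless it is wrapped in \Q...\E. A sketch contrasting the two (replace_pattern and replace_literal are illustrative names):

```perl
use strict;
use warnings;

# $target is treated as a regular expression.
sub replace_pattern {
    my ($text, $target, $replacement) = @_;
    $text =~ s/$target/$replacement/g;
    return $text;
}

# \Q...\E quotes $target so that it is matched as literal text.
sub replace_literal {
    my ($text, $target, $replacement) = @_;
    $text =~ s/\Q$target\E/$replacement/g;
    return $text;
}

print replace_pattern("a.b axb", "a.b", "X"), "\n";   # X X   (. matches any character)
print replace_literal("a.b axb", "a.b", "X"), "\n";   # X axb (. matches only a dot)
```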

Example of (Crude) Lexical Analysis

$ident = "[a-zA-Z][a-zA-Z0-9]*";
$int = "[\-]?[0-9]+";
$op = "[\-\+\*\/\=]|mod";

$exp = "begin x = 5; print sqrt(x); end";
$exp =~ s/$ident/ID/g;
$exp =~ s/$int/N/g;
$exp =~ s/$op/OP/g;

print $exp;
# ID ID OP N; ID ID(ID); ID
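
The substitution approach is crude because each pass sees the output of the previous one (for instance, mod is already turned into ID before the operator pass runs). A slightly less crude sketch scans left to right with \G and the gc modifiers, trying the token classes in priority order. The token names KW, ID, N, OP and the sub tokenize are assumptions for illustration, and a leading minus sign is simply treated as an operator here.

```perl
use strict;
use warnings;

sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    # \G anchors each attempt at the current position; c keeps that
    # position when an attempt fails so the next alternative can try.
    while (pos($src) < length($src)) {
        if    ($src =~ /\G\s+/gc)                  { next }
        elsif ($src =~ /\G(?:begin|end)\b/gc)      { push @tokens, 'KW' }
        elsif ($src =~ /\Gmod\b/gc)                { push @tokens, 'OP' }
        elsif ($src =~ /\G[a-zA-Z][a-zA-Z0-9]*/gc) { push @tokens, 'ID' }
        elsif ($src =~ /\G[0-9]+/gc)               { push @tokens, 'N' }
        elsif ($src =~ /\G[-+*\/=]/gc)             { push @tokens, 'OP' }
        elsif ($src =~ /\G(.)/gcs)                 { push @tokens, $1 }   # punctuation as-is
    }
    return @tokens;
}

print join(' ', tokenize("begin x = 5; print sqrt(x); end")), "\n";
# KW ID OP N ; ID ID ( ID ) ; KW
```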

Processing Assignment Submissions Using Forms and Files

1. Form file

2. Perl script to process data from form.

3. Perl script to “compile” data into an index page.

The HTML Form

<html><head>
<title>Submission for CSE 341 Miniproject Topic Proposals</title>
</head><body>

<h1>CSE 341 Miniproject Topic Proposal Submission Form</h1>
Write a topic-proposal web page, and then
fill out this form and submit it by
Thursday, February 24 at 5:00 PM.
(The web page should follow these
<a href="http://www.cs.washington.edu/education/courses/341/00wi/MP-topic-proposal-guidelines.html">guidelines</a>.)

<br>
<form method=post action="http://cubist.cs.washington.edu/~tanimoto/341-student/process-topic-proposal.pl">

The HTML Form (2 of 2)

<br>Possible name of project:
<input type=text name=projectname value="" size=40>

<br>Name of possible partner (optional):
<input type=text name=partner value="">

<br>URL of a web page that describes your proposal:
<input type=text name=proposalurl value="" size=40>

<br>If you plan to submit another topic proposal because
you are very uncertain about whether to stick with this
one, check this box:
<input type=checkbox name=uncertain value="No">

<br><input type=submit name=submit value="Submit">
</form>
</body></html>

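Perl Script to Process Data From Form

#! /usr/bin/perl
# Process the miniproject topic proposal form inputs
# S. Tanimoto, 20 Feb 2000

use CGI qw/:standard/;
use strict;

print header;

my $projectname = param("projectname");
my $uncertain = param("uncertain");
my $partner = param("partner");
my $proposal_url = param("proposalurl");
my $student_username = $ENV{"REMOTE_USER"};
my $now = localtime();

# Delete every character outside a small whitelist, so that stray
# characters (in particular ";", the field separator in the data
# file) cannot reach the output line.
$projectname =~ s/[^a-zA-Z0-9\-\~]//g;
$partner =~ s/[^a-zA-Z0-9\-\~]//g;
# The URL must also keep ".", "/", ":" and "_"; without them
# it would no longer be a usable link.
$proposal_url =~ s/[^a-zA-Z0-9\-\~\.\/\:_]//g;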

Perl Script to Process the Data (2 of 2)

my $output_line = "STUDENT_USERNAME=$student_username; " .
                  "PROPOSAL_URL=$proposal_url; " .
                  "PROJECT_NAME=$projectname; " .
                  "PARTNER=$partner; " .
                  "UNCERTAIN=$uncertain; " .
                  "DATE=$now; ";

if (! (open(OUT, ">>MP-topic-proposal-data.txt"))) {
    print("Error: could not open topic file for output.");
    print("Please notify instructor and/or try again later.");
    print end_html;
    exit 0;
}

print OUT $output_line, "\n";
close OUT;

print h1("Your miniproject topic proposal has been received. Thanks!");

print end_html;
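
The s/[^...]//g lines in this script are whitelist filters: they delete every character that is not explicitly allowed. The idea can be sketched as one reusable sub; keep_only and the sample inputs are hypothetical, and the example URL is a placeholder.

```perl
use strict;
use warnings;

# Delete every character of $value that is not in the character set
# $allowed (written as the inside of a [...] class). An undefined value,
# such as a form field that was not submitted, becomes the empty string.
sub keep_only {
    my ($value, $allowed) = @_;
    $value = '' unless defined $value;
    $value =~ s/[^$allowed]//g;
    return $value;
}

my $name_chars = 'a-zA-Z0-9\-~';         # as used for names in the script
my $url_chars  = 'a-zA-Z0-9\-~./:_';     # additionally keeps . / : _

print keep_only("My <b>Project</b>; rm -rf", $name_chars), "\n";
# MybProjectbrm-rf
print keep_only("http://www.example.edu/~me/p.html; x", $url_chars), "\n";
# http://www.example.edu/~me/p.htmlx
```

Because ";" is never in the allowed set, a submitted value cannot introduce an extra field separator into the data file.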

Perl Script to “Compile” the Data

#!/usr/bin/perl
# make-MP-index-of-proposed-topics.pl

use strict;
use CGI qw/:standard/;

open(INFILE, "<MP-topic-proposal-data-sorted.txt") ||
    die("Could not open the file MP-topic-proposal-data-sorted.txt.\n");

print<<"EOT";
<html><head><title>CSE 341 MP Topic Proposal Index</title></head>
<body>

<h1>CSE 341 MP Topic Proposal Index</h1>
EOT
print "<table><tr><td>Student username</td><td>Proposal Page</td><td>Partner</td><td>Certainty</td><td>When</td></tr>\n";

my $projectname;
my $uncertain;
my $partner;
my $proposal_url;
my $student_username;
my $date;

Perl Script to “Compile” the Data (2 of 3)

while (<INFILE>) {
    if ( /STUDENT_USERNAME=([^\;]+);\s/) { $student_username = $1; } else { $student_username = ""; }
    if ( /PROJECT_NAME=([^\;]+);\s/) { $projectname = $1; } else { $projectname = ""; }
    if ( /PROPOSAL_URL=([^\;]+);\s/) { $proposal_url = $1; } else { $proposal_url = ""; }
    if ( /PARTNER=([^\;]+);\s/) { $partner = $1; } else { $partner = ""; }
    if ( /UNCERTAIN=([^\;]+);\s/) { $uncertain = $1; } else { $uncertain = ""; }
    if ( /DATE=([^\;]+);/) { $date = $1; } else { $date = ""; }

    # Prepend a scheme when the stored URL lacks one.
    if ($proposal_url =~ /http/ ) {} else { $proposal_url = "http://" . $proposal_url; }

    # The form's checkbox sends "No" only when the student checks it
    # (that is, is uncertain); an unchecked box sends nothing.
    if ($uncertain eq "No") { $uncertain = "Uncertain"; } else { $uncertain = ""; }

Perl Script to “Compile” the Data (3 of 3)

my $link = "<a href=\"$proposal_url\">$projectname</a>";

print "<tr><td>$student_username</td><td>$link</td><td>$partner</td><td>$uncertain</td><td>$date</td></tr>\n";

}

print "</table>\n";

print "</body></html>\n";
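
The six field-by-field matches in the while loop can also be written as a single global match that collects every KEY=value; pair of a record into a hash. This is an alternative sketch, not the script's own method; parse_record and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Collect every KEY=value; pair of one data line into a hash.
# Unlike [^;]+ in the script, [^;]* also accepts empty values.
sub parse_record {
    my ($line) = @_;
    my %field;
    while ($line =~ /([A-Z_]+)=([^;]*);/g) {
        $field{$1} = $2;
    }
    return %field;
}

my %record = parse_record(
    "STUDENT_USERNAME=jdoe; PROPOSAL_URL=www.example.edu/~jdoe; " .
    "PROJECT_NAME=Chess; PARTNER=; UNCERTAIN=; DATE=Mon Feb 21 10:00:00 2000; "
);

print "$record{STUDENT_USERNAME}\n";   # jdoe
print "$record{DATE}\n";               # Mon Feb 21 10:00:00 2000
```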