Karthik Sangaiah
Developed by Larry Wall
◦ “There’s more than one way to do it”
◦ “Easy things should be easy and hard things should be possible”
Main purpose of Perl was for text manipulation
Regular Expressions fundamental to text processing
A regex is a string that describes a pattern
The simplest regex is a word
A regex consisting of a word matches any string that contains that word
Ex:
◦ "Hello World" =~ /World/
The “=~” operator produces TRUE if the regex matches the string
Ex:
◦ if ("Sample Words" =~ /Sample/) {
      print "It matches\n";
  } else {
      print "It doesn't match\n";
  }
The “!~” operator produces TRUE if the regex does NOT match the string
Ex:
◦ if ("Sample Words" !~ /Sample/) {
      print "It doesn't match\n";
  } else {
      print "It matches\n";
  }
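The two binding operators can be tried side by side. This is a minimal sketch; the helper names `describe_match` and `lacks_sample` are hypothetical and do not come from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether $text contains the word "Sample".
sub describe_match {
    my ($text) = @_;
    return $text =~ /Sample/ ? "It matches" : "It doesn't match";
}

# Hypothetical helper: returns 1 when $text does NOT contain "Sample".
sub lacks_sample {
    my ($text) = @_;
    return $text !~ /Sample/ ? 1 : 0;
}

print "Sample Words: ", describe_match("Sample Words"), "\n";   # It matches
print "Other Words: ",  describe_match("Other Words"),  "\n";   # It doesn't match
print "lacks_sample: ", lacks_sample("Other Words"),    "\n";   # 1
```

For any given string and regex, “!~” gives the logical opposite of “=~”.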
A variable can be used as a regex
Ex:
◦ $temp = "ls";
  "ls -l" =~ /$temp/;
If using the default variable “$_”:
◦ “$_ =~” can be omitted
Ex:
◦ $_ = "ls -l";
  if (/ls/) {
      print "It matches\n";
  } else {
      print "It doesn't match\n";
  }
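A short sketch of both ideas together: a pattern held in a variable, and matching against “$_” implicitly inside a loop. The list `@commands` and the result names are made up for illustration.

```perl
use strict;
use warnings;

my $temp = "ls";

# The pattern is taken from $temp at match time.
my $matched_var = ("ls -l" =~ /$temp/) ? 1 : 0;

# Inside a for loop, each element is aliased to $_,
# so a bare /.../ is tested against the current element.
my @commands = ("ls -l", "pwd", "ls");   # hypothetical sample data
my @hits;
for (@commands) {
    push @hits, $_ if /$temp/;
}

print "variable pattern matched: $matched_var\n";   # 1
print "matched: @hits\n";                           # matched: ls -l ls
```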
Regexes in Perl are mostly treated as double-quoted strings
Values of variables in the regex are substituted in before the regex is evaluated for matching
Ex:
◦ $foo = 'vision';
  'television' =~ /tele$foo/;
The default “/ /” delimiters can be changed to arbitrary delimiters by writing the match operator explicitly, as “=~ m”
Ex:
◦ "Sample Text" =~ m!Text!;
◦ "Sample Text" =~ m{Text};
◦ "Sample Text" =~ m"Text";
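As a small illustration of interpolation and alternative delimiters, the sketch below checks that `/tele$foo/` behaves like `/television/`, and that `m{...}` avoids backslashing slashes in a path. The variable names holding the results are assumptions for this example only.

```perl
use strict;
use warnings;

my $foo = 'vision';

# $foo is substituted first, so the pattern becomes /television/.
my $interpolated = ('television' =~ /tele$foo/) ? 1 : 0;

# With m{...} as the delimiter, "/" inside the pattern needs no backslash.
my $path         = "/usr/bin/perl";
my $with_braces  = ($path =~ m{/usr/bin})   ? 1 : 0;
my $with_slashes = ($path =~ /\/usr\/bin/)  ? 1 : 0;

print "$interpolated $with_braces $with_slashes\n";   # 1 1 1
```

Choosing a delimiter that does not occur in the pattern usually keeps the regex easier to read.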
Metacharacters are reserved for use in regex notation
◦ { }, [ ], ( ), ^, $, ., |, *, +, ?, \
A “\” is needed before a metacharacter to match it literally in the regex
Ex:
◦ "5*2=10" =~ /5\*2/;
◦ "/usr/bin/perl" =~ /\/usr\/bin\/perl/;
“/” also needs to be backslashed if it’s used as the delimiter
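The effect of the backslash can be checked with a string that contains no literal “*”. The sketch also uses Perl's built-in `quotemeta` function, which is not mentioned in the slides, as one way to backslash metacharacters automatically; treat it as an optional aside.

```perl
use strict;
use warnings;

my $expr = "5*2=10";

# Unescaped, "*" is a metacharacter: /5*2/ means "zero or more 5s, then 2",
# so it matches "52" even though "52" contains no "*".
my $unescaped = ("52" =~ /5*2/)   ? 1 : 0;   # 1

# Escaped, "\*" matches only a literal "*".
my $escaped   = ($expr =~ /5\*2/) ? 1 : 0;   # 1
my $no_star   = ("52" =~ /5\*2/)  ? 1 : 0;   # 0

# quotemeta() backslashes the non-word characters of a string,
# so the whole string can be used as a literal pattern.
my $literal = quotemeta($expr);
my $quoted  = ("answer: 5*2=10" =~ /$literal/) ? 1 : 0;   # 1

print "$unescaped $escaped $no_star $quoted\n";   # 1 1 0 1
```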
“^” matches at the beginning of the string
“$” matches at the end of the string, or before a newline at the end of the string
Ex:
◦ "television" =~ /^tele/;
◦ "television" =~ /vision$/;
When both “^” and “$” are used, the regex has to match at both the beginning and the end of the string (i.e. match the whole string)
Ex:
◦ "vision" =~ /^vision$/;
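The anchors can be compared on a few words. In this sketch the word list and the helper `anchor_report` are hypothetical; the last check shows “$” matching before a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: summarizes how the three anchored patterns treat $word.
sub anchor_report {
    my ($word) = @_;
    my $starts = ($word =~ /^tele/)    ? "yes" : "no";
    my $ends   = ($word =~ /vision$/)  ? "yes" : "no";
    my $whole  = ($word =~ /^vision$/) ? "yes" : "no";
    return sprintf("starts=%s ends=%s whole=%s", $starts, $ends, $whole);
}

for my $word ("television", "vision", "visionary") {
    printf "%-10s %s\n", $word, anchor_report($word);
}
# television starts=yes ends=yes whole=no
# vision     starts=no ends=yes whole=yes
# visionary  starts=no ends=no whole=no

# "$" also matches just before a newline at the very end of the string.
my $before_newline = ("vision\n" =~ /^vision$/) ? 1 : 0;   # 1
print "$before_newline\n";
```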
A character class allows a set of possible characters, rather than a single character, to match
Character classes are denoted by […], with the set of characters to be matched inside
Ex:
◦ /[btc]all/; #matches ball, tall, or call
◦ /word[0123456789]/; #matches word0…word9
Special characters in a character class are handled with a backslash as well
Special characters within a character class:
◦ “-”, “]”, “\”, “^”, “$” (“.” is an ordinary character inside a class)
Ex:
◦ /[\$c]w/; #matches $w or cw
◦ $x = 'btc';
◦ /[$x]all/; #matches ball, tall, or call
◦ /[\$x]all/; #matches $all or xall
◦ /[\\$x]all/; #matches \all, ball, tall, or call
Special Char. “-” used as range operator
Ex:
◦ /word[0-9]/; #matches word0…word9
◦ /word[0-9a-z]/; #matches word0…word9, or worda…wordz
Special Char. “^” in first position of character class denotes a negated character class
Ex:
◦ /[^0-9]/; #matches a non-numeric character
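Plain sets, ranges, and negated classes can be compared by filtering one word list through each kind of pattern. The list `@words` is made-up sample data, and the patterns are anchored with “^” and “$” so that only whole words are accepted.

```perl
use strict;
use warnings;

my @words = qw(ball tall call fall word7 wordq word_);   # hypothetical sample data

my @set     = grep { /^[btc]all$/ }     @words;   # b, t, or c followed by "all"
my @ranged  = grep { /^word[0-9a-z]$/ } @words;   # "word" + a digit or lowercase letter
my @negated = grep { /^word[^0-9]$/ }   @words;   # "word" + any one non-digit

print "@set\n";       # ball tall call
print "@ranged\n";    # word7 wordq
print "@negated\n";   # wordq word_
```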
Common character class abbreviations:
◦ \d – digit, [0-9]
◦ \s – whitespace character, [\ \t\r\n\f]
◦ \w – word character (alphanumeric or _), [0-9a-zA-Z_]
◦ \D – negated \d
◦ \S – negated \s
◦ \W – negated \w
◦ . – any character but “\n”
The \d, \s, \w, \D, \S, and \W abbreviations can be used inside and outside character classes
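A few quick checks of the abbreviations, both on their own and inside a bracketed class. The test strings are arbitrary examples chosen for this sketch.

```perl
use strict;
use warnings;

# \d outside a class: four digits, dash, two digits, dash, two digits.
my $date_like   = ("2010-09-27" =~ /^\d\d\d\d-\d\d-\d\d$/) ? 1 : 0;   # 1

# \s matches the space in the command line.
my $has_space   = ("ls -l" =~ /\s/) ? 1 : 0;                          # 1

# \W finds nothing here: letters, digits, and "_" are all word characters.
my $has_nonword = ("temp_1" =~ /\W/) ? 1 : 0;                         # 0

# Abbreviations inside a class: a digit OR whitespace, followed by "7".
my $in_class    = ("x = 7" =~ /[\d\s]7/) ? 1 : 0;                     # 1

# "." does not match a newline by default.
my $dot_newline = ("a\nb" =~ /a.b/) ? 1 : 0;                          # 0

print "$date_like $has_space $has_nonword $in_class $dot_newline\n";  # 1 1 0 1 0
```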
“\b” matches a boundary between a word character and a non-word character
Ex:
◦ $x = "Exam1 Question from Sample Exam";
◦ $x =~ /Exam/; #matches Exam in Exam1
◦ $x =~ /\bExam/; #matches Exam in Exam1 (boundary at start of string)
◦ $x =~ /\bExam\b/; #matches Exam at end of string
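To see which occurrence each pattern picks, the sketch below reports the offset where the match starts. It relies on Perl's special array `@-` (where `$-[0]` is the start offset of the last successful match) and on `qr//` to pass patterns around; neither appears in the slides, and the helper `match_offset` is hypothetical.

```perl
use strict;
use warnings;

my $x = "Exam1 Question from Sample Exam";

# Hypothetical helper: offset where $re first matches in $text, or -1 if none.
sub match_offset {
    my ($text, $re) = @_;
    return $text =~ $re ? $-[0] : -1;
}

print "Exam:       ", match_offset($x, qr/Exam/),     "\n";   # 0
print "\\bExam:     ", match_offset($x, qr/\bExam/),   "\n";   # 0
print "\\bExam\\b:   ", match_offset($x, qr/\bExam\b/), "\n";   # 27
print "\\bxam:      ", match_offset($x, qr/\bxam/),    "\n";   # -1
```

Offset 27 is the final word: in “Exam1” the “m” is followed by the word character “1”, so there is no boundary after it.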
Often, we want to match against lines and ignore newline characters
Sometimes we need to keep track of newlines
//s – Single-line matching
//m – Multi-line matching
These modifiers affect two aspects of how the regex is interpreted:
◦ How the ‘.’ character class is defined
◦ Where the anchors, ^ and $, are able to match
No modifier (//) – Default
◦ . matches all characters but \n
◦ ^ matches at beginning of string
◦ $ matches at end of string or before a newline at the end of string
String as single long line (//s)
◦ . matches any character
◦ ^ matches at beginning of string
◦ $ matches at end of string or before a newline at the end of string
String as multiple lines (//m)
◦ . matches all characters but \n
◦ ^ matches at beginning of any line within the string
◦ $ matches at end of any line within the string
String as single long line, but detect multiple lines (//sm)
◦ . matches any character
◦ ^ matches at beginning of any line within the string
◦ $ matches at end of any line within the string
Ex:
◦ $x = "You will know how to use Perl\nFor text processing\n";
◦ $x =~ /^For/; # No match, "For" not at start of string
◦ $x =~ /^For/s; # No match, "For" not at start of string
◦ $x =~ /^For/m; # match, "For" at start of second line
◦ $x =~ /^For/sm; # match, "For" at start of second line
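The same string can be used to check all three things the modifiers change: where “^” matches, where “$” matches, and whether “.” crosses a newline. The result variable names are assumptions for this sketch.

```perl
use strict;
use warnings;

my $x = "You will know how to use Perl\nFor text processing\n";

# "^": only //m lets it match after the embedded newline.
my $caret_none = ($x =~ /^For/)   ? 1 : 0;   # 0
my $caret_s    = ($x =~ /^For/s)  ? 1 : 0;   # 0
my $caret_m    = ($x =~ /^For/m)  ? 1 : 0;   # 1
my $caret_sm   = ($x =~ /^For/sm) ? 1 : 0;   # 1

# "$": without //m it matches only at the end (or before a final newline).
my $end_none   = ($x =~ /Perl$/)       ? 1 : 0;   # 0
my $end_m      = ($x =~ /Perl$/m)      ? 1 : 0;   # 1
my $end_last   = ($x =~ /processing$/) ? 1 : 0;   # 1

# ".": only //s lets it match the newline between "Perl" and "For".
my $dot_none   = ($x =~ /Perl.For/)  ? 1 : 0;   # 0
my $dot_s      = ($x =~ /Perl.For/s) ? 1 : 0;   # 1

print "caret: $caret_none $caret_s $caret_m $caret_sm\n";   # caret: 0 0 1 1
print "end:   $end_none $end_m $end_last\n";                # end:   0 1 1
print "dot:   $dot_none $dot_s\n";                          # dot:   0 1
```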
Alternation metacharacter “|”
◦ Used to match different possible words or character strings
◦ Word 1 or word 2 -> /word1|word2/;
Perl tries to match the regex at the earliest possible point in the string
At a given position, the alternatives are tried left to right and the first one that matches is used
Ex:
◦ "shoes and strings" =~ /shoes|strings|and/; #matches "shoes"
◦ "shoes" =~ /s|sh|sho|shoes/; #matches "s"
◦ "shoes" =~ /shoes|sho|s/; #matches "shoes"
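The matched text itself can be inspected to confirm these rules. The sketch uses Perl's special variable `$&` (the text of the last successful match), which is not covered in the slides; the helper `matched_text` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text that $re matched in $text, or undef.
sub matched_text {
    my ($text, $re) = @_;
    return $text =~ $re ? $& : undef;
}

# Earliest position in the string wins, whatever the order of alternatives.
print matched_text("shoes and strings", qr/shoes|strings|and/), "\n";   # shoes
print matched_text("shoes and strings", qr/strings|and|shoes/), "\n";   # shoes

# At the same position, the leftmost alternative that matches wins.
print matched_text("shoes", qr/s|sh|sho|shoes/), "\n";   # s
print matched_text("shoes", qr/shoes|sho|s/),    "\n";   # shoes
```

Putting longer alternatives first is one way to avoid an early, shorter alternative taking the match.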
Perl Resource 5: Perl Regular Expressions Tutorial
◦ http://www.cs.drexel.edu/~knowak/cs265_fall_2010/perlretut_2007.pdf
Perl History
◦ http://www.xmluk.org/perl-cgi-history-information.htm
Perl Special Variables
◦ http://www.kichwa.com/quik_ref/spec_variables.html