Topics of Discussion
What Is a Regular Expression?
Expression Modifiers and Simple Statements
Regular Expression Operators
Regular Expression Metacharacters
Unicode
Chapter 7: Regular Expressions – Pattern Matching
7.1 What Is a Regular Expression?
7.2 Expression Modifiers and Simple Statements
7.3 Regular Expression Operators
7.1 What Is a Regular Expression?
A regular expression is a sequence or pattern of characters that is matched against a string of text when performing searches and replacements.
Example: 7.1
/abc/
?abc?
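A minimal sketch of the /abc/ pattern in use. The sample strings are assumptions made for illustration, not taken from Example 7.1. Note that Perl 5.22 and later require the match-once form to be written m?abc? rather than ?abc?.

```perl
use strict;
use warnings;

my $text = "xyzabcdef";    # hypothetical sample string

# /abc/ succeeds when a, b, c appear consecutively anywhere in the string.
my $found   = ($text    =~ /abc/) ? 1 : 0;
my $missing = ("a-b-c"  =~ /abc/) ? 1 : 0;

print "found=$found missing=$missing\n";
```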
7.2 Expression Modifiers and Simple Statements
Conditional Modifiers
The if Modifier
Format: Expression2 if Expression1;
Examples: 7.2, 7.3, 7.4
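The if modifier format can be sketched as follows. The variable names and sample value are hypothetical.

```perl
use strict;
use warnings;

my $name     = "Tom Savage";    # hypothetical sample value
my $greeting = "";

# Expression2 if Expression1: the assignment runs only when the match succeeds.
$greeting = "Hello, Tom" if $name =~ /Tom/;

print "$greeting\n";
```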
The DATA Filehandle
Format: __DATA__
The actual data is stored here
Examples: 7.5, 7.6
The unless Modifier
Format: Expression2 unless Expression1;
Examples: 7.7, 7.8
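A short sketch of the unless modifier, assuming a hypothetical list of lines where those starting with # should be skipped.

```perl
use strict;
use warnings;

my @kept;
for my $line ("alpha", "# comment", "beta") {    # hypothetical input lines
    # Expression2 unless Expression1: push runs only when the match fails.
    push @kept, $line unless $line =~ /^#/;
}

print "@kept\n";
```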
Looping Modifiers
The while Modifier
Format: Expression2 while Expression1;
Examples: 7.9
The until Modifier
Example: 7.10
The foreach Modifier
Example: 7.11
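The three looping modifiers can be compared side by side. This is an illustrative sketch with made-up values, not a reproduction of Examples 7.9 to 7.11.

```perl
use strict;
use warnings;

my $x = 1;
$x *= 2 while $x < 100;        # repeats while the condition is true

my $y = 10;
$y-- until $y <= 3;            # repeats until the condition becomes true

my @words = qw(perl regex match);
my @upper;
push @upper, uc($_) foreach @words;    # $_ is set to each element in turn

print "x=$x y=$y upper=@upper\n";
```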
7.3 Regular Expression Operators
The m Operator and Matching
Format: /Regular Expression/
m#Regular Expression#
m(Regular Expression)
Table 7.1
Examples: 7.12, 7.13, 7.14, 7.15, 7.16
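The three delimiter forms are interchangeable. A small sketch, assuming a hypothetical path string, shows why alternate delimiters help when the pattern itself contains slashes.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";    # hypothetical sample value

my $with_slashes = ($path =~ /\/usr\/local/) ? 1 : 0;   # slashes must be escaped
my $with_hash    = ($path =~ m#/usr/local#)  ? 1 : 0;   # no escaping needed
my $with_parens  = ($path =~ m(/usr/local))  ? 1 : 0;   # paired delimiters

print "$with_slashes $with_hash $with_parens\n";
```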
The g Modifier - Global Match
Format: m/search pattern/g
Example: 7.17
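In list context, a match with the g modifier returns every occurrence. A minimal sketch with an assumed sample sentence:

```perl
use strict;
use warnings;

my $text = "Tom and Tommy met Tom";    # hypothetical sample string

# With /g in list context, every non-overlapping match is returned.
my @all   = $text =~ /Tom/g;
my $count = scalar @all;

print "matched $count times\n";
```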
The i Modifier - Case Insensitivity
Format: m/search pattern/i
Example: 7.18
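A brief sketch of the i modifier on a made-up string:

```perl
use strict;
use warnings;

my $text = "I love PERL";    # hypothetical sample string

my $sensitive   = ($text =~ /perl/)  ? 1 : 0;   # case must match exactly
my $insensitive = ($text =~ /perl/i) ? 1 : 0;   # case is ignored

print "sensitive=$sensitive insensitive=$insensitive\n";
```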
Special Scalars for Saving Patterns
Example: 7.19
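Assuming the special scalars meant here are $&, $` and $' (the matched text, the text before it, and the text after it), a sketch might look like this. The sample string is hypothetical.

```perl
use strict;
use warnings;

my $text = "I love regular expressions";    # hypothetical sample string
my ($before, $matched, $after);

if ($text =~ /love/) {
    # $` holds what precedes the match, $& the match itself, $' what follows.
    ($before, $matched, $after) = ($`, $&, $');
}

print "[$before][$matched][$after]\n";
```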
The x Modifier - The Expressive Modifier
Example: 7.20
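The x modifier lets a pattern contain whitespace and comments, which are ignored when matching. A sketch with a hypothetical date string:

```perl
use strict;
use warnings;

my $date = "2024-05-17";    # hypothetical sample value

# Whitespace and # comments inside the pattern are ignored under /x.
my ($year, $month, $day) = $date =~ /
    (\d{4})   # four-digit year
    -
    (\d{2})   # two-digit month
    -
    (\d{2})   # two-digit day
/x;

print "$year $month $day\n";
```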
The s Operator and Substitution
Format: s/old/new/;
s/old/new/i;
s/old/new/g;
Table 7.2
Examples: 7.21, 7.22, 7.23
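The substitution formats can be compared on one made-up string. Each variable below is a copy of the original, so the effects stay independent.

```perl
use strict;
use warnings;

my $line = "old old OLD";    # hypothetical sample string

(my $first  = $line) =~ s/old/new/;     # first match only
(my $nocase = $line) =~ s/OLD/new/i;    # first match, ignoring case
(my $all    = $line) =~ s/old/new/g;    # every exact-case match
(my $all_ci = $line) =~ s/old/new/gi;   # every match, ignoring case

print "$first | $nocase | $all | $all_ci\n";
```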
Changing the Substitution Delimiters
Examples: 7.24, 7.25
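As with matching, the substitution delimiters can be changed to avoid escaping slashes. A sketch with a hypothetical path:

```perl
use strict;
use warnings;

my $path = "/usr/bin/perl";    # hypothetical sample value

# Using # as the delimiter means the slashes need no backslashes.
(my $moved = $path) =~ s#/usr/bin#/usr/local/bin#;

print "$moved\n";
```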
The g Modifier - Global Substitution
Examples: 7.26, 7.27
The i Modifier - Case Insensitivity
Format: s/search pattern/replacement string/i;
Examples: 7.28, 7.29
The e Modifier-Evaluating An Expression
Format: s/search pattern/replacement string/e;
Examples: 7.30, 7.31, 7.32, 7.33
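With the e modifier, the replacement side is evaluated as a Perl expression. A sketch that doubles every number in a made-up string:

```perl
use strict;
use warnings;

my $text = "5 apples and 12 pears";    # hypothetical sample string

# /e evaluates "$1 * 2" as code; /g applies it to every number.
(my $doubled = $text) =~ s/(\d+)/$1 * 2/ge;

print "$doubled\n";
```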
Pattern Binding Operators
Format: variable =~ /Expression/
variable !~ /Expression/
variable =~ s/old/new/
Table 7.3
Examples: 7.34, 7.35, 7.36, 7.37, 7.38, 7.39
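The binding operators attach a pattern to a specific variable rather than to $_. A sketch with assumed values:

```perl
use strict;
use warnings;

my $name = "Tommy";    # hypothetical sample value

my $has_tom = ($name =~ /Tom/) ? 1 : 0;   # =~ is true when the pattern matches
my $no_bob  = ($name !~ /Bob/) ? 1 : 0;   # !~ is true when it does not match

$name =~ s/Tom/Jim/;                      # substitution bound to $name

print "$has_tom $no_bob $name\n";
```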
Chapter 8: Getting Control – Regular Expression Metacharacters
8.1 Regular Expression Metacharacters
8.2 Unicode
8.1 Regular Expression Metacharacters
Regular expression metacharacters are characters that do not represent themselves. They are endowed with special powers to allow you to control the search pattern in some way.
Metacharacters lose their special meaning if preceded by a backslash (\).
Metasymbols: [0-9] = \d
Example: 8.1
/^a...c/
Table 8.1
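Assuming the pattern in Example 8.1 is ^a followed by three dots and a c (an a at the start of the string, any three characters, then c), a sketch of it and of backslash escaping might look like this. The test strings are hypothetical.

```perl
use strict;
use warnings;

# ^ anchors to the start; each dot matches any single character except newline.
my @hits = grep { /^a...c/ } ("abxyc", "a123cde", "xabbbc", "abc");

# A backslash removes the dot's special meaning.
my $loose  = ("3x14" =~ /^3.14$/)  ? 1 : 0;   # dot matches the x
my $strict = ("3x14" =~ /^3\.14$/) ? 1 : 0;   # \. matches only a literal dot

print "@hits | loose=$loose strict=$strict\n";
```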
Metacharacters for Single Characters
Table 8.2
Example: 8.2
The s Modifier - The Dot Metacharacter and the Newline
Example: 8.3
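By default the dot does not match a newline; the s modifier changes that. A minimal sketch with a made-up two-line string:

```perl
use strict;
use warnings;

my $text = "first\nsecond";    # hypothetical sample string

my $without_s = ($text =~ /first.second/)  ? 1 : 0;   # dot cannot match \n
my $with_s    = ($text =~ /first.second/s) ? 1 : 0;   # under /s it can

print "without=$without_s with=$with_s\n";
```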
The Character Class
A character class represents one character from a set of characters.
Examples: 8.4, 8.5, 8.6, 8.7, 8.8, 8.9
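A sketch of character classes, including a negated class, on hypothetical three-character codes:

```perl
use strict;
use warnings;

# [A-Z] one uppercase letter, [0-9] one digit, [^0-9] one character that is NOT a digit.
my @codes = grep { /^[A-Z][0-9][^0-9]$/ } qw(A1x B22 c3d Z9Z);

print "@codes\n";
```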
The POSIX Character Class
POSIX (the Portable Operating System Interface) is an industry standard used to ensure that programs are portable across operating systems.
Table 8.3
Example: 8.11
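POSIX classes are written inside a bracketed character class, as in [[:alpha:]]. A sketch with assumed sample data:

```perl
use strict;
use warnings;

# [[:alpha:]] matches one alphabetic character; the outer brackets are required.
my @words = grep { /^[[:alpha:]]+$/ } ("hello", "abc123", "Perl", "a b");

# [[:digit:]] matches one digit.
my ($digits) = "order 4521 shipped" =~ /([[:digit:]]+)/;

print "@words | $digits\n";
```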
Whitespace Metacharacters
Table 8.4
Examples: 8.12, 8.13, 8.14
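The whitespace metasymbols \s (any whitespace) and \S (any non-whitespace) can be sketched on a hypothetical key/value line:

```perl
use strict;
use warnings;

my $line = "  name\t=  value \n";    # hypothetical input with mixed whitespace

# Strip leading and trailing whitespace (spaces, tabs, newlines).
(my $trimmed = $line) =~ s/^\s+|\s+$//g;

# \S+ captures a run of non-whitespace; \s* skips optional whitespace around =.
my ($key, $value) = $trimmed =~ /^(\S+)\s*=\s*(\S+)$/;

print "key=$key value=$value\n";
```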
Metacharacters to Repeat Pattern Matches
Quantifier – One or more characters
The Greed Factor: the asterisk (*) matches zero or more of the preceding character, and it takes as many as it can.
$_ = "ab123456783445554437AB";
s/ab[0-9]*/X/;
Result: XAB
Table 8.5
Examples: 8.15, 8.16, 8.17, 8.18, 8.19, 8.20, 8.21, 8.22
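The greedy substitution above can be run as follows. The string is the one from the text; holding it in lexical variables and the extra zero-digit case are choices made for this sketch.

```perl
use strict;
use warnings;

my $str = "ab123456783445554437AB";

# [0-9]* is greedy: it swallows every digit that follows "ab".
(my $greedy = $str) =~ s/ab[0-9]*/X/;

# Zero digits is also acceptable to *, so "ab" alone is replaced here.
(my $zero = "abAB") =~ s/ab[0-9]*/X/;

print "$greedy $zero\n";
```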
Metacharacters That Turn Off Greediness
By placing a question mark after a greedy quantifier, the greed is turned off and the search ends after the first match rather than the last one.
Table 8.6
Example: 8.24
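A sketch contrasting greedy and minimal matching. The markup string is a made-up example; the digit string is the one used for the greedy asterisk.

```perl
use strict;
use warnings;

my $str = "ab123456783445554437AB";

# *? takes as few digits as possible, which here is none.
(my $lazy = $str) =~ s/ab[0-9]*?/X/;

my $html = "<b>bold</b> and <i>italic</i>";    # hypothetical sample string
my ($greedy)  = $html =~ /(<.+>)/;     # runs to the LAST >
my ($minimal) = $html =~ /(<.+?>)/;    # stops at the FIRST >

print "$lazy\n$greedy\n$minimal\n";
```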
Anchoring Metacharacters
Zero-width assertions – Anchors correspond to positions, not actual characters.
Table 8.7
Examples: 8.25, 8.26, 8.27, 8.28
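Because anchors match positions rather than characters, they narrow where a pattern may be found. A sketch using ^, $ and \b on hypothetical strings:

```perl
use strict;
use warnings;

my @lines = ("Perl is fun", "I like Perl", "Perlish things", "superPerl");

my @at_start = grep { /^Perl/ }     @lines;   # Perl at the beginning
my @at_end   = grep { /Perl$/ }     @lines;   # Perl at the end
my @as_word  = grep { /\bPerl\b/ }  @lines;   # Perl as a whole word

print join(", ", @at_start), "\n";
print join(", ", @at_end), "\n";
print join(", ", @as_word), "\n";
```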
The m Modifier
The m modifier is used to control the behavior of the $ and ^ anchor metacharacters.
Example: 8.29
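With the m modifier, ^ and $ match at the start and end of each line inside the string rather than only at the ends of the whole string. A sketch with a made-up three-line string:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree";    # hypothetical multi-line string

my @single = $text =~ /^(\w+)$/g;     # no single line spans the whole string
my @multi  = $text =~ /^(\w+)$/mg;    # each embedded line matches under /m

print scalar(@single), " vs ", scalar(@multi), "\n";
```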
Alternation
Alternation allows the regular expression to contain alternative patterns to be matched.
Example: 8.30
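Alternatives are separated by a vertical bar. A minimal sketch on hypothetical strings:

```perl
use strict;
use warnings;

# The pattern succeeds if any one of the alternatives is found.
my @pets = grep { /cat|dog|bird/ } ("my cat", "hot dog", "goldfish", "bird song");

print join(", ", @pets), "\n";
```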
Grouping or Clustering
The process of grouping characters together is called clustering.
Examples: 8.31, 8.32, 8.33, 8.34
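Parentheses limit how far an alternation or quantifier reaches. A sketch with assumed names shows the difference grouping makes:

```perl
use strict;
use warnings;

# Grouped: the alternation covers only the surname.
my $grouped   = ("John Smith" =~ /^Steve (Blenheim|Smith)$/) ? 1 : 0;

# Ungrouped: reads as "^Steve Blenheim" OR "Smith$", so John Smith matches.
my $ungrouped = ("John Smith" =~ /^Steve Blenheim|Smith$/)   ? 1 : 0;

# A quantifier after a group repeats the whole group.
my $repeated  = ("hahaha" =~ /^(ha)+$/) ? 1 : 0;

print "$grouped $ungrouped $repeated\n";
```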
Remembering or Capturing
Subpattern: if the regular expression pattern is enclosed in parentheses, the subpattern is saved in special numbered scalar variables, and these variables can be used later in the program.
Examples: 8.35, 8.36, 8.37, 8.38, 8.39, 8.40, 8.42
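The numbered variables $1, $2 and so on hold what each pair of parentheses matched. A sketch that swaps a hypothetical "Last, First" name:

```perl
use strict;
use warnings;

my $name = "Blenheim, Steve";    # hypothetical sample value
my $swapped;

# $1 is the first parenthesized subpattern, $2 the second.
if ($name =~ /^(\w+),\s*(\w+)$/) {
    $swapped = "$2 $1";
}

# The same variables are available in the replacement side of s///.
(my $via_subst = $name) =~ s/(\w+),\s*(\w+)/$2 $1/;

print "$swapped | $via_subst\n";
```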
Turning Off Capturing
The ?: metacharacter can be used to suppress the capturing of the subpattern.
Example: 8.43
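A sketch comparing a capturing group with a (?:...) group on a made-up string:

```perl
use strict;
use warnings;

my $text = "grey cat";    # hypothetical sample string

my @capturing     = $text =~ /(gray|grey) (\w+)/;     # both groups are saved
my @non_capturing = $text =~ /(?:gray|grey) (\w+)/;   # only the second is saved

print scalar(@capturing), " vs ", scalar(@non_capturing), "\n";
```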
Metacharacters That Look Ahead and Behind
Look ahead in the string for a pattern: (?=pattern)
Look behind in the string for a pattern: (?<=pattern)
Table 8.8
Examples: 8.44, 8.45, 8.46, 8.47
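Look-ahead and look-behind check the surrounding text without making it part of the match. A sketch on a hypothetical string (single-quoted so that the dollar sign is literal):

```perl
use strict;
use warnings;

my $text = 'cost: $45, weight: 30kg';    # hypothetical sample string

my ($dollars) = $text =~ /(?<=\$)(\d+)/;   # digits preceded by a dollar sign
my ($kilos)   = $text =~ /(\d+)(?=kg)/;    # digits followed by kg

# The look-ahead text is not consumed, so "kg" survives the substitution.
(my $masked = $text) =~ s/\d+(?=kg)/N/;

print "$dollars $kilos $masked\n";
```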
The tr or y Function
The tr function translates characters, in a one-on-one correspondence, from the characters in the search string to the characters in the replacement string.
Table 8.9
Example 8.48
Example 8.49 (tr Delete Option)
Example 8.50 (tr Complement Option)
Example 8.51 (tr Squeeze Option)
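The tr function and its delete (d), complement (c) and squeeze (s) options can be sketched as follows. All sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;

(my $upper    = "hello world") =~ tr/a-z/A-Z/;   # one-to-one translation
(my $deleted  = "a1b2c3")      =~ tr/0-9//d;     # d: delete the listed characters
(my $comp     = "a1b2c3")      =~ tr/0-9/*/c;    # c: translate everything NOT listed
(my $squeezed = "aaabbbccc")   =~ tr/a-z//s;     # s: squeeze repeated characters

# With an empty replacement and no options, tr just counts matches.
my $text  = "hello world";
my $count = ($text =~ tr/l//);

print "$upper $deleted $comp $squeezed $count\n";
```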
8.2 Unicode
The Unicode standard is an effort to solve the problem of representing every language's characters in one character set, with encodings called UTF-8 and UTF-16.
Unicode has the capacity to encompass all the world’s written languages.
Perl and Unicode
Perl 5.6 supports UTF-8 Unicode.
The utf8 pragma turns on the Unicode settings and the bytes pragma turns them off.
Table 8.10
Example 8.52
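A sketch of the difference between character and byte semantics. It assumes a modern Perl, where use utf8 declares that the source file itself is UTF-8 encoded, and a source file saved as UTF-8; the sample word is hypothetical.

```perl
use strict;
use warnings;
use utf8;    # the source code below contains UTF-8 literals

my $word = "日本語";    # hypothetical sample: three characters

my $chars = length $word;                       # counts characters
my $bytes = do { use bytes; length $word };     # counts bytes inside this block

# \w understands Unicode letters when the string holds characters.
my $is_word = ($word =~ /^\w+$/) ? 1 : 0;

print "chars=$chars bytes=$bytes is_word=$is_word\n";
```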
Summary
What Is a Regular Expression?
Expression Modifiers and Simple Statements
Regular Expression Operators
Regular Expression Metacharacters
Unicode
Mid-term Exam Date: Next week
Exam Time: 11:00 AM - 11:30 AM
Contents: Chapter 1- Chapter 8.1.2
No books, no notes, no computer