Programming Perl in UNIX Course Number : CIT 370 Week 4 Prof. Daniel Chen

Programming Perl in UNIX

Course Number : CIT 370

Week 4

Prof. Daniel Chen

Introduction

Review and Overviews

Chapters 7 and 8

Summary

Lab

Mid-term Exam

Next Week (Week 5)

Topics of Discussion

What Is a Regular Expression?

Expression Modifiers and Simple Statements

Regular Expression Operators

Regular Expression Metacharacters

Unicode

Chapter 7: Regular Expressions – Pattern Matching

7.1 What Is a Regular Expression?

7.2 Expression Modifiers and Simple Statements

7.3 Regular Expression Operators

7.1 What Is a Regular Expression?

A regular expression is really just a sequence or pattern of characters that is matched against a string of text when performing searches and replacements.

Example: 7.1

/abc/

?abc? (the match-once form; Perl 5.22 and later require the explicit m?abc?)
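As a minimal sketch of the definition above (the sample strings are made up for illustration and are not from the textbook), the pattern /abc/ succeeds whenever the characters a, b, c appear together, in that order, anywhere in the string:

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only to illustrate the match.
my @lines = ("xyzabc123", "a-b-c", "abc");

# grep keeps each element for which the match against $_ succeeds.
my @hits = grep { /abc/ } @lines;

print "$_\n" for @hits;   # prints xyzabc123, then abc
```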

7.2 Expression Modifiers and Simple Statements

Conditional Modifiers

The if Modifier

Format: Expression2 if Expression1;

Examples: 7.2, 7.3, 7.4
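A short sketch of the if modifier, assuming a made-up list of names: the statement on the left runs only when the expression on the right is true.

```perl
use strict;
use warnings;

my @names = ("Tom Savage", "Betty Boop", "Tommy Tucker");   # made-up data
my @found;

# Expression2 if Expression1: the push runs only when the match succeeds.
for (@names) {
    push @found, $_ if /^Tom/;
}

print "$_\n" for @found;   # Tom Savage, then Tommy Tucker
```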

The DATA Filehandle

Format:

__DATA__

The actual data is stored here

Examples: 7.5, 7.6

7.2 Expression Modifiers and Simple Statements

The unless Modifier

Format: Expression2 unless Expression1;

Examples: 7.7, 7.8
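The unless modifier is the mirror image of if. In this sketch (same hypothetical name list as one might use for the if modifier), the statement runs only when the match fails:

```perl
use strict;
use warnings;

my @names = ("Tom Savage", "Betty Boop", "Tommy Tucker");   # made-up data
my @others;

# Expression2 unless Expression1: the push runs only when the match fails.
for (@names) {
    push @others, $_ unless /^Tom/;
}

print "@others\n";   # Betty Boop
```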

Looping Modifiers

The while Modifier

Format: Expression2 while Expression1;

Examples: 7.9

The until Modifier

Format: Expression2 until Expression1;

Example: 7.10

The foreach Modifier

Example: 7.11
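The three looping modifiers can be compared side by side. This is an illustrative sketch with made-up values, not one of the textbook examples:

```perl
use strict;
use warnings;

my $x = 1;
$x *= 2 while $x < 100;        # repeats while the condition is true

my $n = 10;
$n-- until $n <= 3;            # repeats until the condition becomes true

my @colors = ("red", "green", "blue");
my $joined = "";
$joined .= uc($_) . " " foreach @colors;   # $_ is set to each element in turn

print "$x $n $joined\n";       # 128 3 RED GREEN BLUE
```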

7.3 Regular Expression Operators

The m Operator and Matching

Format: /Regular Expression/

m#Regular Expression#

m(regular expression)

Table 7.1

Examples: 7.12, 7.13, 7.14, 7.15, 7.16
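One reason to choose a different delimiter is a pattern that itself contains slashes. A small sketch, assuming a hypothetical path string:

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";   # hypothetical sample string

# Default delimiters: every / inside the pattern must be escaped.
my $slashes = ($path =~ /\/usr\/local/) ? 1 : 0;

# With the m operator, other delimiters avoid the escaping.
my $hashes  = ($path =~ m#/usr/local#)  ? 1 : 0;
my $parens  = ($path =~ m(/usr/local))  ? 1 : 0;

print "$slashes $hashes $parens\n";   # 1 1 1
```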

7.3 Regular Expression Operators

The g Modifier-Global Match

Format: m/search pattern/g

Example: 7.17
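A sketch of the global match in list context, using a made-up sentence. With g, the match returns every occurrence rather than stopping at the first:

```perl
use strict;
use warnings;

my $text = "Tom, Tommy, and Tomas";   # made-up sample

# In list context, /g returns all of the matched substrings.
my @all = $text =~ /Tom\w*/g;

# Assigning through an empty list to a scalar counts the matches.
my $count = () = $text =~ /Tom/g;

print "$count matches: @all\n";   # 3 matches: Tom Tommy Tomas
```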

The i Modifier-Case Insensitivity

Format: m/search pattern/i

Example: 7.18
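The effect of the i modifier is easiest to see by running the same pattern with and without it. The word list here is made up:

```perl
use strict;
use warnings;

my @words = ("Perl", "PERL", "pearl", "perl");   # made-up data

my @insensitive = grep { /perl/i } @words;   # upper and lower case both match
my @sensitive   = grep { /perl/  } @words;   # only the exact lowercase spelling

print "@insensitive\n";   # Perl PERL perl
print "@sensitive\n";     # perl
```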

Special Scalars for Saving Patterns

Example: 7.19
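After a successful match, Perl saves the matched text and its surroundings in three special scalars. A minimal sketch with a made-up string:

```perl
use strict;
use warnings;

my $str = "I love Perl programming";   # made-up sample string
my ($before, $matched, $after) = ("", "", "");

if ($str =~ /Perl/) {
    # $` holds the text before the match, $& the match itself,
    # and $' the text after the match.
    ($before, $matched, $after) = ($`, $&, $');
}

print "[$before][$matched][$after]\n";   # [I love ][Perl][ programming]
```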

The x Modifier-Extended Patterns (whitespace and comments allowed inside the pattern)

Example: 7.20
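With the x modifier, unescaped whitespace inside the pattern is ignored and # starts a comment, so a long pattern can be laid out over several lines. A sketch assuming a hypothetical date string:

```perl
use strict;
use warnings;

my $date = "2024-03-15";   # hypothetical input

# Under /x the layout and the comments are not part of the pattern.
my ($year, $month, $day) = $date =~ /
    (\d{4})  -     # four digit year, then a literal hyphen
    (\d{2})  -     # two digit month, then a literal hyphen
    (\d{2})        # two digit day
/x;

print "$year $month $day\n";   # 2024 03 15
```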

7.3 Regular Expression Operators

The s Operator and Substitution

Format: s/old/new/;

s/old/new/i;

s/old/new/g;

Table 7-2

Examples: 7.21, 7.22, 7.23
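The three formats above differ only in their modifiers. This sketch applies each to a copy of the same made-up string so the results can be compared:

```perl
use strict;
use warnings;

my $line = "Tom likes Tom's car";   # made-up sample

# (my $copy = $line) =~ s/.../.../ edits the copy and leaves $line alone.
(my $first  = $line) =~ s/Tom/Bob/;     # first occurrence only
(my $all    = $line) =~ s/Tom/Bob/g;    # every occurrence
(my $nocase = $line) =~ s/tom/Bob/gi;   # every occurrence, ignoring case

# The s operator returns the number of substitutions it made.
my $copy  = $line;
my $count = ($copy =~ s/Tom/Bob/g);

print "$first\n$all\n$nocase\n$count\n";
# Bob likes Tom's car
# Bob likes Bob's car
# Bob likes Bob's car
# 2
```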

Changing the Substitution Delimiters

Example: 7.24, 7.25
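As with matching, alternative delimiters keep a substitution readable when the text contains slashes. A sketch with a hypothetical path:

```perl
use strict;
use warnings;

my $path = "/usr/bin/perl";   # hypothetical sample string

(my $escaped = $path) =~ s/\/usr\/bin/\/opt\/bin/;   # default delimiters
(my $hashed  = $path) =~ s#/usr/bin#/opt/bin#;       # one character, used three times
(my $braced  = $path) =~ s{/usr/bin}{/opt/bin};      # paired delimiters

print "$escaped $hashed $braced\n";   # /opt/bin/perl three times
```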

The g Modifier-Global Substitution

Examples: 7.26, 7.27

7.3 Regular Expression Operators

The i Modifier-Case Insensitivity

Format: s/search pattern/replacement string/i;

Examples: 7.28, 7.29

The e Modifier-Evaluating An Expression

Format: s/search pattern/replacement string/e;

Examples: 7.30, 7.31, 7.32, 7.33
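With the e modifier the replacement side is evaluated as Perl code rather than treated as a string. A sketch that doubles every number in a made-up price list:

```perl
use strict;
use warnings;

my $prices = "apple 3, pear 5";   # made-up data

# /e evaluates $1 * 2 as an expression; /g applies it to every number.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

print "$doubled\n";   # apple 6, pear 10
```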

Pattern Binding Operators

Format: variable =~ /Expression/

variable !~ /Expression/

variable =~ s/old/new/

Table 7.3

Examples: 7.34, 7.35, 7.36, 7.37, 7.38, 7.39
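The binding operators apply a match or substitution to a named variable instead of $_. A minimal sketch, with a made-up name:

```perl
use strict;
use warnings;

my $name = "Steve Blenheim";   # made-up data

my $has_steve = ($name =~ /Steve/) ? "yes" : "no";   # =~ is true when the pattern matches
my $no_tom    = ($name !~ /Tom/)   ? "yes" : "no";   # !~ is true when it does not

$name =~ s/Steve/Stephen/;     # substitution bound to $name, not $_

print "$has_steve $no_tom $name\n";   # yes yes Stephen Blenheim
```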

Chapter 8: Getting Control – Regular Expression Metacharacters

8.1 Regular Expression Metacharacters

8.2 Unicode

8.1 Regular Expression Metacharacters

Regular expression metacharacters are characters that do not represent themselves. They are endowed with special powers to allow you to control the search pattern in some way.

Metacharacters lose their special meaning if preceded by a backslash (\).

Metasymbols – shorthand for a character class; for example, \d stands for the digit class [0-9].

Example: 8.1

/^a...c/
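The sketch below assumes the pattern in Example 8.1 is a caret, the letter a, three dots, and the letter c. The caret anchors the match to the start of the string and each dot stands for any single character; the word lists are made up.

```perl
use strict;
use warnings;

my @words = ("abxyc", "a123c", "abc", "xabbbc", "abbbcd");   # made-up data

# Starts with a, then any three characters, then c.
my @hits = grep { /^a...c/ } @words;

# A backslash turns the dot back into an ordinary period.
my @literal = grep { /a\.c/ } ("a.c", "abc");

print "@hits\n";      # abxyc a123c abbbcd
print "@literal\n";   # a.c
```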

Table 8.1

8.1 Regular Expression Metacharacters

Metacharacters for Single Characters

Table 8.2

Example: 8.2

The s Modifier-The Dot Metacharacter and the Newline

Example: 8.3
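By default the dot matches any character except a newline; the s modifier lifts that restriction. A two-line sketch with a made-up string:

```perl
use strict;
use warnings;

my $text = "first\nsecond";   # made-up string containing a newline

my $without = ($text =~ /first.second/)  ? 1 : 0;   # dot refuses to match \n
my $with    = ($text =~ /first.second/s) ? 1 : 0;   # with /s the dot matches \n too

print "$without $with\n";   # 0 1
```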

The Character Class

A character class represents one character from a set of characters.

Examples: 8.4, 8.5, 8.6, 8.7, 8.8, 8.9
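A sketch of bracketed character classes, including a range and a negated class, using made-up two-character codes:

```perl
use strict;
use warnings;

my @codes = ("a1", "b7", "c9", "A1", "a_");   # made-up data

# One character from a-c followed by one digit.
my @letter_digit = grep { /^[a-c][0-9]$/ } @codes;

# A leading ^ inside the brackets negates the class: a, then one non-digit.
my @a_non_digit  = grep { /^a[^0-9]$/ } @codes;

print "@letter_digit\n";   # a1 b7 c9
print "@a_non_digit\n";    # a_
```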

8.1 Regular Expression Metacharacters

The POSIX Character Class

POSIX (the Portable Operating System Interface) is an industry standard used to ensure that programs are portable across operating systems.

Table 8.3

Example 8.11
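POSIX classes are written inside a bracketed character class, which is why the brackets appear doubled. A sketch over a made-up token list:

```perl
use strict;
use warnings;

my @tokens = ("abc", "123", "a1b2", "hi there");   # made-up data

my @alpha = grep { /^[[:alpha:]]+$/ } @tokens;   # letters only
my @digit = grep { /^[[:digit:]]+$/ } @tokens;   # digits only
my @alnum = grep { /^[[:alnum:]]+$/ } @tokens;   # letters and digits
my @space = grep { /[[:space:]]/ }    @tokens;   # contains whitespace

print "@alpha | @digit | @alnum | @space\n";
# abc | 123 | abc 123 a1b2 | hi there
```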

Whitespace Metacharacters

Table 8.4

Examples: 8.12, 8.13, 8.14
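The \s metasymbol matches a space, tab, or newline, which makes it handy for tidying input. A sketch, assuming a made-up input line:

```perl
use strict;
use warnings;

my $line = "  Name:\tTom   Savage \n";   # made-up input with mixed whitespace

(my $trimmed  = $line)    =~ s/^\s+|\s+$//g;   # strip leading and trailing whitespace
(my $squeezed = $trimmed) =~ s/\s+/ /g;        # collapse each run to one space
my @fields = split /\s+/, $squeezed;           # split on whitespace

print "[$squeezed]\n";           # [Name: Tom Savage]
print scalar(@fields), "\n";     # 3
```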

8.1 Regular Expression Metacharacters

Metacharacters to Repeat Pattern Matches

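Quantifier – specifies how many times the preceding character, character class, or group may repeat.

The Greed Factor – the asterisk (*) matches zero or more of the preceding character and, being greedy, takes as many characters as it can.

$_ = "ab123456783445554437AB";

s/ab[0-9]*/X/;

Result: XAB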

Table 8.5

Example 8.15, 8.16, 8.17, 8.18, 8.19, 8.20, 8.21, 8.22
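The slide's string can be used to compare several quantifiers. This sketch reuses that string; the extra patterns are illustrative additions rather than textbook examples:

```perl
use strict;
use warnings;

my $s = "ab123456783445554437AB";   # the string from the slide

(my $star  = $s)   =~ s/ab[0-9]*/X/;    # * is greedy: ab plus every digit that follows
(my $plus  = $s)   =~ s/[0-9]+/X/;      # + needs at least one digit
(my $three = $s)   =~ s/[0-9]{3}/X/;    # {3} takes exactly three digits
(my $zero  = "AB") =~ s/[0-9]*/X/;      # * may match nothing, so X lands at the start

print "$star $plus $three $zero\n";
# XAB abXAB abX456783445554437AB XAB
```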

Metacharacters That Turn Off Greediness

By placing a question mark after a greedy quantifier, the greed is turned off: the quantifier matches as few characters as possible, so the search ends at the first possible match rather than the last one.

Table 8.6

Examples: 8.24
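The difference shows up clearly on text with repeated delimiters. A sketch with a made-up markup string:

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";   # made-up sample

my ($greedy) = $html =~ /(<.+>)/;    # runs to the last > in the string
my ($lazy)   = $html =~ /(<.+?>)/;   # stops at the first >

print "$greedy\n";   # <b>bold</b> and <i>italic</i>
print "$lazy\n";     # <b>
```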

8.1 Regular Expression Metacharacters

Anchoring Metacharacters

Zero-width assertions – Anchors correspond to positions, not actual characters.

Table 8.7

Example 8.25, 8.26, 8.27, 8.28
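Because anchors match positions rather than characters, they narrow where a pattern may succeed without consuming any text. A sketch over made-up lines, using ^, $, and the word boundary \b:

```perl
use strict;
use warnings;

my @lines = ("Perl is fun", "I like Perl", "Perlish", "superl");   # made-up data

my @at_start = grep { /^Perl/ }     @lines;   # Perl at the beginning of the string
my @at_end   = grep { /Perl$/ }     @lines;   # Perl at the end of the string
my @as_word  = grep { /\bPerl\b/ }  @lines;   # Perl as a whole word

print join(", ", @at_start), "\n";   # Perl is fun, Perlish
print join(", ", @at_end),   "\n";   # I like Perl
print join(", ", @as_word),  "\n";   # Perl is fun, I like Perl
```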

The m Modifier

The m modifier is used to control the behavior of the $ and ^ anchor metacharacters.

Examples: 8.29
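Without the m modifier, ^ and $ refer to the whole string; with it, they also match at the start and end of each line inside the string. A sketch with a made-up multi-line string:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree";   # made-up string holding three lines

my @without = $text =~ /^(\w+)$/g;    # no line is the entire string, so nothing matches
my @with    = $text =~ /^(\w+)$/mg;   # each line is matched separately

print scalar(@without), "\n";   # 0
print "@with\n";                # one two three
```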

Alternation

Alternation allows the regular expression to contain alternative patterns to be matched, separated by the vertical bar (|).

Example 8.30
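A minimal alternation sketch, with a made-up list of sentences. The match succeeds if any one of the alternatives is found:

```perl
use strict;
use warnings;

my @pets = ("I have a cat", "dogs bark", "a bird sings", "goldfish");   # made-up data

# Any one of the three alternatives is enough for a match.
my @hits = grep { /cat|dog|bird/ } @pets;

print scalar(@hits), "\n";   # 3
```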

8.1 Regular Expression Metacharacters

Grouping or Clustering

The process of grouping characters together is called clustering.

Example 8.31, 8.32, 8.33, 8.34
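Grouping changes what a quantifier applies to. In this sketch (made-up words), the + repeats the whole group in one pattern and only the last character in the other:

```perl
use strict;
use warnings;

my @words = ("ha", "hahaha", "haaa");   # made-up data

my @grouped   = grep { /^(ha)+$/ } @words;   # + repeats the pair "ha"
my @ungrouped = grep { /^ha+$/ }   @words;   # + repeats only the "a"

print "@grouped\n";     # ha hahaha
print "@ungrouped\n";   # ha haaa
```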

Remembering or Capturing

If a subpattern of the regular expression is enclosed in parentheses, the text it matches is saved in special numbered scalar variables ($1, $2, and so on), and these variables can be used later in the program.

Example 8.35, 8.36, 8.37, 8.38, 8.39, 8.40, 8.42
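Captured subpatterns are available both after a match and inside the replacement side of a substitution. A sketch with a made-up "last, first" entry:

```perl
use strict;
use warnings;

my $entry = "Savage, Tom";   # made-up data
my ($last, $first) = ("", "");

# Each pair of parentheses saves its match in $1, $2, ... in order.
if ($entry =~ /(\w+), (\w+)/) {
    ($last, $first) = ($1, $2);
}

# The same variables can be used in the replacement string.
(my $swapped = $entry) =~ s/(\w+), (\w+)/$2 $1/;

print "$first $last\n";   # Tom Savage
print "$swapped\n";       # Tom Savage
```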

8.1 Regular Expression Metacharacters

Turning Off Capturing

The ?: metacharacter, placed just inside the opening parenthesis as (?:pattern), groups a subpattern without capturing it.

Example 8.43
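Comparing the list returned by a match with and without ?: shows which groups are remembered. A sketch using a made-up string:

```perl
use strict;
use warnings;

my $s = "gray 42";   # made-up data

my @capturing     = $s =~ /(gr(a|e)y) (\d+)/;     # inner group is saved as well
my @non_capturing = $s =~ /(gr(?:a|e)y) (\d+)/;   # inner group only clusters

print "@capturing\n";       # gray a 42
print "@non_capturing\n";   # gray 42
```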

Metacharacters That Look Ahead and Behind

Look ahead in the string for a pattern (?=pattern)

Look behind in the string for a pattern (?<=pattern)

Table 8.8

Example 8.44, 8.45, 8.46, 8.47
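Lookahead and lookbehind test what surrounds a match without including it in the result. A sketch with a made-up string (single-quoted so the dollar signs are literal):

```perl
use strict;
use warnings;

my $text = 'price: $100, cost: $250, id: 300';   # made-up data

# Lookbehind: digits that are immediately preceded by a dollar sign.
my @dollars = $text =~ /(?<=\$)\d+/g;

# Lookahead: words that are immediately followed by a colon.
my @labels  = $text =~ /\w+(?=:)/g;

print "@dollars\n";   # 100 250
print "@labels\n";    # price cost id
```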

8.1 Regular Expression Metacharacters

The tr or y Function

The tr function translates characters, in a one-to-one correspondence, from the characters in the search string to the characters in the replacement string.

Table 8.9

Example 8.48

Example 8.49 (tr Delete Option)

Example 8.50 (tr Complement Option)

Example 8.51 (tr Squeeze Option)
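The basic translation and the three options listed above can be sketched together. The strings are made up, and each operation works on its own copy:

```perl
use strict;
use warnings;

my $s = "hello world";   # made-up data

(my $upper = $s) =~ tr/a-z/A-Z/;     # one-to-one translation to upper case
my $vowels = ($s =~ tr/aeiou//);     # empty replacement: count only, no change
(my $deleted = $s) =~ tr/l//d;       # d option: delete every l
(my $masked  = $s) =~ tr/a-z/*/c;    # c option: translate everything NOT in a-z
(my $squeezed = "aaabbbccc") =~ tr/a-z//s;   # s option: squeeze repeated characters

print "$upper $vowels $deleted $masked $squeezed\n";
# HELLO WORLD 3 heo word hello*world abc
```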

8.2 Unicode

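The Unicode standard is an effort to solve the problem that single-byte character sets such as ASCII cannot represent most of the world's characters. It defines one universal character set, with encodings such as UTF-8 and UTF-16.

Unicode has the capacity to encompass all the world's written languages.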

Perl and Unicode

Perl 5.6 and later support UTF-8 Unicode.

The utf8 pragma turns on the Unicode settings, and the bytes pragma turns them off.

Table 8.10

Example 8.52
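The difference between character and byte semantics can be seen with length. This sketch writes the characters as \x{...} escapes so the source file stays plain ASCII (an assumption made for portability; the textbook example may differ), and it uses the bytes pragma only inside one block:

```perl
use strict;
use warnings;

my $smiley = "\x{263A}";           # WHITE SMILING FACE, written as an escape
my $chars  = length($smiley);      # character semantics: one character

my $bytes;
{
    use bytes;                     # byte semantics inside this block only
    $bytes = length($smiley);      # size of Perl's internal UTF-8 form
}

my $omega    = "\x{3A9}";          # GREEK CAPITAL LETTER OMEGA
my $is_greek = ($omega =~ /\p{Greek}/)    ? 1 : 0;   # Unicode property match
my $is_ascii = ($omega =~ /^[A-Za-z]$/)   ? 1 : 0;   # not an ASCII letter

print "$chars $bytes $is_greek $is_ascii\n";   # 1 3 1 0
```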

Summary

What Is a Regular Expression?

Expression Modifiers and Simple Statements

Regular Expression Operators

Regular Expression Metacharacters

Unicode

Lab

Examples 7.1 – 7.39 (P 163 – 195)

Examples 8.1 – 8.52 (P 197 – 248)

Homework 4

Mid-term Exam

Date: Next week

Exam Time: 11:00 AM - 11:30 AM

Contents: Chapter 1 – Chapter 8.1.2

No books, no notes, no computer

Next Week

Reading assignment (Textbook chapter 8.1.3 and Chapter 9)

Mid-term Exam (Chapter 1 – Chapter 8.1.2)