$address =~ m/(\d* .*)\n(.*?, ([A-Z]{2}) (\d{5})-?(\d{0,5})/


Introduction to Regular Expressions


• It’s all about patterns

• Character Classes match any text of a certain type

• Repetition operators specify a recurring pattern

• Search flags change how the RegEx operates

• In this presentation…

• green denotes a character class

• yellow denotes a repetition quantifier

• orange denotes a search flag or other symbol

• My examples use Perl syntax


Introduction to Regular Expressions

• Basic syntax

• All RegEx statements must begin and end with /

• /something/

• Escaping reserved characters is crucial

• /(i.e. / is invalid because ( must be closed

• However, /\(i\.e\. / is valid for finding ‘(i.e. ’

• Reserved characters include:

• . * ? + ^ $ ( ) [ ] { } / \ |

• Also some characters have special meanings based on their position in the statement


Regular Expression Matching

• Text Matching

• A RegEx can match plain text

• ex. if ($name =~ /Dan/) { print "match"; }

• But this will match Dan, Danny, Daniel, etc…

• Full Text Matching with Anchors

• Might want to match a whole line (or string)

• ex. if ($name =~ /^Dan$/) { print "match"; }

• This will only match Dan

• ^ anchors to the front of the line

• $ anchors to the end of the line
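A minimal Perl sketch of the difference between the unanchored and the anchored pattern. The helper names `contains_dan` and `is_dan` are made up for this illustration and are not part of the original slides.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns 1 on a match and 0 otherwise.
sub contains_dan { my ($name) = @_; return $name =~ /Dan/   ? 1 : 0; }
sub is_dan       { my ($name) = @_; return $name =~ /^Dan$/ ? 1 : 0; }

for my $name ('Dan', 'Danny', 'Daniel') {
    printf "%-6s contains=%d exact=%d\n",
        $name, contains_dan($name), is_dan($name);
}
```

All three names contain Dan, but only the first passes the anchored test. One Perl detail worth knowing: $ also matches just before a trailing newline, so "Dan\n" passes as well; \z is the stricter end-of-string anchor.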


Regular Expression Matching

• Order of results

• The search will begin at the start of the string

• This can be altered, don’t ask yet

• Every character is important

• Any plain text in the expression is treated literally

• Nothing is neglected (close doesn’t count)

• / s/ is not the same as /s/ (the space is matched literally)

• Far easier to write than to debug!


Regular Expression Char Classes

• Allows specification of only certain allowable chars

• [dofZ] matches only the letters d, o, f, and Z

• If you have a string ‘dog’ then /[dofZ]/ would match ‘d’ only even though ‘o’ is also in the class

• So this expression can be stated “match one of either d, o, f, or Z.”

• [A-Za-z] matches any letter

• [a-fA-F0-9] matches any hexadecimal character

• [^*$/\\] matches anything BUT *, $, /, or \

• The ^ in the front of the char class specifies ‘not’

• In a char class, you only need to escape \ ] - ^ (and - and ^ only where they would otherwise be read as a range or a negation)
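The character classes above can be tried out with a small Perl sketch. The helper `first_match` is a hypothetical name introduced here, and the sample strings are invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first substring matched by $re, or undef.
sub first_match {
    my ($text, $re) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

# One character from the set: the search stops at the first hit.
print first_match('dog', qr/[dofZ]/), "\n";

# A run of hexadecimal characters; the x and y around it are skipped.
print first_match('xBEEFy', qr/[a-fA-F0-9]+/), "\n";

# Negated class: the first character that is not *, $, / or \.
# The $ is escaped here only to keep Perl from treating it as a variable.
print first_match('$/*a', qr{[^*\$/\\]}), "\n";
```

The three calls print d, BEEF and a. Note that the first one stops at d even though o is also in the class, because a class matches exactly one character.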


Regular Expression Char Classes

• Special character classes match specific characters

• \d matches a single digit

• \w matches a word character (A-Z, a-z, 0-9, _)

• \b matches a word boundary /\bword\b/

• \s matches a whitespace character (spc, tab, newln)

• . wildcard matches everything except newlines

• Use very carefully, you could get anything!

• To match “anything but…” capitalize the char class

• e.g. \D matches anything that isn’t a digit


Regular Expression Char Classes

• Character Class Examples

• $bodyPart =~ /e\w\w/;

• Matches ear, eye, etc

• $thing = '1, 2, 3 strikes!'; $thing =~ /\s\d/;

• Matches ‘ 2’

• $thing = '1, 2, 3 strikes!'; $thing =~ /[\s\d]/;

• Matches ‘1’

• Not always useful to match single characters

• $phone =~ /\d\d\d-\d\d\d-\d\d\d\d/;

• There’s a better way…
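To see what these examples actually match, here is a short Perl sketch that captures the matched text. The variable names and the helper `looks_like_phone` are assumptions made for this illustration, as is the sample phone number.

```perl
use strict;
use warnings;

my $thing = '1, 2, 3 strikes!';

# In list context a match returns its captured groups.
my ($space_then_digit) = $thing =~ /(\s\d)/;    # whitespace followed by a digit
my ($space_or_digit)   = $thing =~ /([\s\d])/;  # one character: whitespace or digit

print "[$space_then_digit]\n";   # [ 2]
print "[$space_or_digit]\n";     # [1]

# Hypothetical check for the NNN-NNN-NNNN layout, one \d per digit.
sub looks_like_phone {
    my ($phone) = @_;
    return $phone =~ /\d\d\d-\d\d\d-\d\d\d\d/ ? 1 : 0;
}

print looks_like_phone('555-867-5309'), "\n";   # 1
```

The brackets in the output make the leading space of the first result visible.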


Regular Expression Repetition

• Repetition allows for flexibility

• Range of occurrences

• $weight =~ /\d{2,3}/;

• Matches any weight from 10 to 999

• $name =~ /\w{5,}/;

• Matches any name with 5 or more word characters

• if ($SSN !~ /^\d{9}$/) { print "Invalid SSN!"; }

• The anchored pattern matches exactly 9 digits, so anything else is reported as invalid
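A small Perl sketch of how the counted quantifiers behave, assuming an SSN written as nine digits with no separators. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical validators; each returns 1 on success and 0 otherwise.
sub has_long_word      { my ($s) = @_; return $s =~ /\w{5,}/  ? 1 : 0; }  # 5 or more
sub has_nine_digit_run { my ($s) = @_; return $s =~ /\d{9}/   ? 1 : 0; }  # unanchored
sub is_ssn             { my ($s) = @_; return $s =~ /^\d{9}$/ ? 1 : 0; }  # exactly 9

print has_long_word('Danny'), "\n";            # 1: exactly five characters is enough
print has_nine_digit_run('1234567890'), "\n";  # 1: nine of the ten digits match
print is_ssn('1234567890'), "\n";              # 0: the anchors reject the tenth digit
print is_ssn('123456789'), "\n";               # 1
```

The point of the comparison is that {9} alone only demands nine digits somewhere in the string; the ^ and $ anchors are what make it mean "exactly nine".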


Regular Expression Repetition

• General Quantifiers

• Some more special characters

• $favoriteNumber =~ /\d*/;

• Matches any size number or no number at all

• $firstName =~ /\w+/;

• Matches one or more word characters

• $middleInitial =~ /\w?/;

• Matches one or zero word characters


Regular Expression Repetition

• Greedy vs Nongreedy matching

• Greedy matching gets the longest results possible

• Nongreedy matching gets the shortest possible

• Let’s say $robot = 'The12thRobotIs2ndInLine'

• $robot =~ /\w*\d+/; (greedy)

• Matches The12thRobotIs2

• Maximizes the length of \w

• $robot =~ /\w*?\d+/; (nongreedy)

• Matches The12

• Minimizes the length of \w
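The robot example can be run as is. This sketch only adds capture groups so the matched text can be printed; the variable names are chosen for the illustration.

```perl
use strict;
use warnings;

my $robot = 'The12thRobotIs2ndInLine';

# Greedy: \w* takes as much as it can, then backs off until \d+ can match.
my ($greedy) = $robot =~ /(\w*\d+)/;

# Nongreedy: \w*? takes as little as it can before \d+ matches.
my ($lazy) = $robot =~ /(\w*?\d+)/;

print "greedy: $greedy\n";   # greedy: The12thRobotIs2
print "lazy:   $lazy\n";     # lazy:   The12
```

Both searches start at the same position; the only difference is how much the first quantifier is willing to give up.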


Regular Expression Repetition

• Greedy vs Nongreedy matching

• Suppose $txt = 'something is so cool';

• $txt =~ /something/;

• Matches ‘something’

• $txt =~ /so(mething)?/;

• Matches ‘something’; a repeated (global) search would then find the second ‘so’

• $txt =~ /so(mething)??/;

• Matches only the ‘so’ at the start of ‘something’, then the second ‘so’ on a repeated search

• Doesn’t really make sense to do this


Regular Expression Real Life Examples

• Using what you’ve learned so far, you can…

• Validate a standard 8.3 file name

• $path =~ /^\w{1,8}\.[A-Za-z0-9]{2,3}$/

• Account for poorly spelled user input

• $answer =~ /^ban{1,2}an{1,2}a$/

• $iansLastName =~ /^P[ae]t{1,2}ers[oe]n$/

• $iansFirstName =~ /^E?[Ii]?[aeo]?n$/

• Matches Ian, Ean, Eian, Eon, Ien, Ein

• At least everyone gets the n right…
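Wrapped in subroutines, the validators above might look like the following Perl sketch. The subroutine names and the test inputs are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the patterns from the slide.
sub is_8_3_name { my ($p) = @_; return $p =~ /^\w{1,8}\.[A-Za-z0-9]{2,3}$/ ? 1 : 0; }
sub is_banana   { my ($a) = @_; return $a =~ /^ban{1,2}an{1,2}a$/          ? 1 : 0; }
sub is_ian      { my ($n) = @_; return $n =~ /^E?[Ii]?[aeo]?n$/            ? 1 : 0; }

print is_8_3_name('readme.txt'), "\n";         # 1
print is_8_3_name('averylongname.txt'), "\n";  # 0: more than 8 characters before the dot
print is_banana('bannana'), "\n";              # 1: a doubled n is tolerated
print is_ian('Eian'), "\n";                    # 1
print is_ian('Iann'), "\n";                    # 0
```

Because every part of the first-name pattern except the final n is optional, it also accepts a bare "n", which is worth keeping in mind before using a pattern like this for real input.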


Alternation

• Alternation allows multiple possibilities

• Let $story = 'He went to get his mother';

• $story =~ /^(He|She)\b.*?\b(his|her)\b.*? (mother|father|brother|sister|dog)/;

• Also matches ‘She punched her fat brother’

• Make sure the grouping is correct!

• $ans =~ /^(true|false)$/

• Matches only ‘true’ or ‘false’

• $ans =~ /^true|false$/ (same as /(^true|false$)/)

• Matches ‘true never’ or ‘not really false’
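The grouping pitfall is easy to reproduce. In this Perl sketch the names `strict_bool` and `loose_bool` are hypothetical; the four inputs come from the slide.

```perl
use strict;
use warnings;

# Grouped: the anchors apply to both alternatives.
sub strict_bool { my ($ans) = @_; return $ans =~ /^(true|false)$/ ? 1 : 0; }

# Ungrouped: reads as "starts with true" OR "ends with false".
sub loose_bool  { my ($ans) = @_; return $ans =~ /^true|false$/   ? 1 : 0; }

for my $ans ('true', 'false', 'true never', 'not really false') {
    printf "%-17s strict=%d loose=%d\n",
        $ans, strict_bool($ans), loose_bool($ans);
}
```

The loose version accepts all four inputs, while the strict version accepts only the first two.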


Grouping for Backreferences

• Backreferences

• With all these wildcards and possible matches, we usually need to know what the expression finally ended up matching.

• Backreferences let you see what was matched

• Can be used after the expression has evaluated or even inside the expression itself

• Handled very differently in different languages

• Numbered from left to right, starting at 1


Grouping for Backreferences

• Perl backreferences

• Used inside the expression

• $txt =~ /\b(\w+)\s+\1\b/

• Finds any duplicated word, must use \1 here

• Used after the expression

• $class =~ /(.+?)-(\d+)/

• The text before the hyphen is stored in the Perl variable $1 (not \1) and the number after it goes in $2

• print "I am in class $1, section $2";
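Both uses of backreferences fit in one short Perl sketch. The subroutine names, the sample sentence and the class string 'CS101-3' are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Inside the expression: \1 must repeat whatever group 1 matched.
sub duplicated_word {
    my ($txt) = @_;
    return $txt =~ /\b(\w+)\s+\1\b/ ? $1 : undef;
}

# After the expression: the groups are available as $1 and $2.
sub parse_class {
    my ($class) = @_;
    return $class =~ /(.+?)-(\d+)/ ? ($1, $2) : ();
}

print duplicated_word('This is is a test'), "\n";   # is

my ($name, $section) = parse_class('CS101-3');
print "I am in class $name, section $section\n";    # I am in class CS101, section 3
```

Copying $1 and $2 into named variables right away, as done here, is a common habit because the next successful match overwrites them.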


Grouping for Backreferences

• Java backreferences

• Annoying but still useful

• Pattern p = Pattern.compile("(.+?)-(\\d+)");

Matcher m = p.matcher(mySchedule);

if (m.find()) { // group() throws IllegalStateException unless a match was found

System.out.println("I am in class " + m.group(1) +

", section " + m.group(2));

}

• Ugly, but usually better than the alternative

• m.group() returns the entire string matched


Grouping for Backreferences

• Javascript backreferences

• Used inside the expression

• Supported: \1 works inside the pattern, as in Perl

• Used after the expression

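• /(.+?)-(\d+)/.test(course); // class is a reserved word, so the variable is named course here

• alert(RegExp.$1); // legacy static property; the array returned by exec() or match() is the portable route

• str = str.replace(/(\S+)\s+(\S+)/, "$2 $1");

• RegExp exposes counterparts to several of Perl’s special backreference variables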


Grouping for Backreferences

• PHP/Python backreferences

• Allows the use of specifically named backreferences

• Groups also maintain their numbers

• .NET backreferences

• Allows named backreferences

• Named groups are numbered after all the unnamed groups, so accessing them by number gives surprising results

• Check the web for info on how to use backreferences in these and other languages.


Grouping without Backreferences

• Sometimes you just need to make a group

• If important groups must be backreferenced, disable backreferencing for any unimportant groups

• $sentence =~ /(?:He|She) likes (\w+)\./;

• I don’t care if it’s a he or she

• All I want to know is what he/she likes

• Therefore I use (?:) to forgo the backreference

• $1 will contain that thing that he/she likes
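A Perl sketch of the non-capturing group in action. The subroutine name `liked_thing` and the sample sentences are made up for the example.

```perl
use strict;
use warnings;

# (?:He|She) groups the alternatives without creating a backreference,
# so the thing being liked is group 1.
sub liked_thing {
    my ($sentence) = @_;
    return $sentence =~ /(?:He|She) likes (\w+)\./ ? $1 : undef;
}

print liked_thing('She likes regex.'), "\n";   # regex
print liked_thing('He likes Perl.'), "\n";     # Perl
```

With plain parentheses around He|She the same text would land in $2 instead, and every later group would shift by one whenever another group is added in front of it.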


Matching Modes

• Matching has different functional modes

• Modes can be set by flags outside the expression (only in some languages & implementations)

• $name =~ /[a-z]+/i;

• i turns off case sensitivity

• $xml =~ /title="([\w ]*)".*keywords="([\w ]*)"/s;

• s enables . to match newlines

• $report =~ /^\s*Name:[\s\S]*?The End\.\s*$/m;

• m lets ^ and $ match at the start and end of each line, not only of the whole string
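The effect of each flag is easiest to see side by side. This Perl sketch uses an invented two-line string; the variable names are chosen for the illustration.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /s the dot stops at the newline; with /s it runs through it.
my ($dot_default) = $text =~ /(first.*)/;
my ($dot_s)       = $text =~ /(first.*)/s;

# Without /m ^ matches only at the start of the string;
# with /m it also matches right after each newline.
my $caret_default = $text =~ /^second/  ? 1 : 0;
my $caret_m       = $text =~ /^second/m ? 1 : 0;

# /i ignores case, so an all-lowercase class accepts capitals.
my $case_i = 'DAN' =~ /^[a-z]+$/i ? 1 : 0;

print length($dot_default), " vs ", length($dot_s), " characters\n";  # 10 vs 22 characters
print "caret: $caret_default without /m, $caret_m with /m\n";
print "case-insensitive match: $case_i\n";
```

Note that s and m are independent: s changes what the dot matches, m changes where ^ and $ match.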


Matching Modes

• Matching has different functional modes

• Modes can be set by flags inside the expression (except in Javascript and Ruby)

• $password =~ /^[a-z](?i)[a-jp-xz0-9]{4,11}$/;

• If an insane web site specifies that your password must begin with a lowercase letter followed by 4 to 11 upper/lower alphanumeric characters excluding k through o and y.

• $element =~ /^(?i)[A-Z](?-i)[a-z]?$/;

• (?i) makes the first letter case insensitive (if they type o, but meant O, we still know they mean oxygen). (?-i) makes sure the second letter is lowercase, otherwise it’s 2 elements


Regular Expression Replacing

• Replacements simplify complex data modification

• Generally the first part of a replace command is the regular expression and the second part is what to replace the matched text with

• Usually a backreference variable can be used in the replacement text to refer to a group matched in the expression

• The RegEx engine continues searching at the point in the string following the replacement

• Replacements use all the same syntax, but have several unique features and are implemented very differently in various languages.


Regular Expression Replacing

• Perl replacement syntax

• $phone =~ s/\D//;

• Removes the first non-digit character in a phone #

• Note that leaving the replacement blank deletes

• $html =~ s/^(\s*)/$1\t/;

• Adds a tab to a line of HTML using backreferences

• $sample =~ s/[abc]/[ABC]/;

• Might not do what is expected: the first a, b, or c is replaced with the literal text [ABC]

• The second part is NOT a regular expression, it’s a string
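The three substitutions can be checked with a short Perl sketch. The subroutine names and sample inputs are hypothetical; each subroutine works on a copy so the result can be returned.

```perl
use strict;
use warnings;

# Each helper applies one substitution from the slide to a copy of its input.
sub drop_first_nondigit { my ($s) = @_; $s =~ s/\D//;            return $s; }
sub indent_once         { my ($s) = @_; $s =~ s/^(\s*)/$1\t/;    return $s; }
sub literal_replacement { my ($s) = @_; $s =~ s/[abc]/[ABC]/;    return $s; }

print drop_first_nondigit('(555) 867-5309'), "\n";   # 555) 867-5309
print literal_replacement('cab'), "\n";              # [ABC]ab

# The tab is added after the existing leading whitespace, which $1 preserves.
print indent_once('  <p>') eq "  \t<p>" ? "indented\n" : "unchanged\n";
```

The second result shows why the replacement side surprises people: the brackets are copied into the output verbatim because only the left side is a pattern.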


Regular Expression Replacing

• Java replacement syntax (sucks)

• Pattern p = Pattern.compile("\\\\\\\\server(\\d)");

• netPath = p.matcher(netPath).replaceAll("\\\\\\\\workstation$1");

• Yes, you actually have to use 8 \’s to make \\, in the replacement as well as in the pattern

• Any \ in the expression needs to be doubled

• Matcher parses the replacement for $1 (and treats \ there as an escape character too)

• This has the same effect but is slightly faster, when the pattern is reused, than

• netPath = netPath.replaceAll("\\\\\\\\server(\\d)",

"\\\\\\\\workstation$1");

• No, you can’t use .replace() here: it replaces literal text, not a regular expression


Replacement Modes

• Replacements can be performed singly or globally

• The examples I have been using replace only single occurrences of patterns

• Use the g flag to force the expression to scan the entire string

• $phone =~ s/\D//g;

• Removes all non-digits in the phone number

• $myGarage =~ s/Jeep|Cougar/Boeing/g;

• Gives me jets in exchange for cars

• Don’t use it if it’s not necessary
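A Perl sketch contrasting a single replacement with a global one. The helper name `upgrade_garage` and the sample strings are invented for the illustration.

```perl
use strict;
use warnings;

my $once = '(555) 867-5309';
my $all  = '(555) 867-5309';

$once =~ s/\D//;                  # stops after the first non-digit
my $removed = ($all =~ s/\D//g);  # s///g returns the number of substitutions

print "$once\n";                                      # 555) 867-5309
print "$removed characters removed, leaving $all\n";  # 4 characters removed, leaving 5558675309

# Hypothetical helper: swap every car for a jet.
sub upgrade_garage {
    my ($garage) = @_;
    $garage =~ s/Jeep|Cougar/Boeing/g;
    return $garage;
}

print upgrade_garage('Jeep, Cougar, bicycle'), "\n";  # Boeing, Boeing, bicycle
```

The return value of s///g is a handy way to confirm how much was changed without comparing the strings by hand.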


Combining Replace and Match Modes

• Combining modes is easy

• To combine modes, just append the flags

• $alphabet =~ s/Q//gi;

• Get rid of the pesky letter Q (and q too)

• $response =~ /(?im)"([aeiou].*?)"(?-m)(.*)/;

• This example sucks. Point is you can combine modes inside the statement, too.


References for Learning More

• Tutorials for other programming languages

• http://www.regular-expressions.info/

• In-depth syntax

• http://kobesearch.cpan.org/htdocs/perl/perlreref.html

• Code Search (ex: ‘ip address regex’)

• http://www.google.com/codesearch