Perl
Regular expression: string manipulation
substr function
• string = substr(string2, start pos, length)
  – start pos counts from 0; returns length characters of string2 beginning at start pos
  – string2 is not changed
  – $str2 = "Hi There";
  – $str = substr($str2, 3, 2);
    – $str = "Th"; # characters at positions 3 and 4 (the 4th and 5th characters)
• substr(string, start pos, length) = string2;
  – replaces length characters of string, beginning at start pos, with string2
  – $str2 = "Hi There"; $str = "hi";
  – substr($str2, 3, 3) = $str; # insert and replace: "The" becomes "hi"
    – $str2 = "Hi hire";
  – substr($str2, 3, 0) = $str; # a length of 0 inserts only
    – $str2 = "Hi hihire";
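A minimal runnable sketch of the two substr forms on this slide, using the slide's own string. The 4-argument call at the end is standard Perl and is shown only as an alternative to the lvalue form; the names $copy and $alt are illustrative.

```perl
use strict;
use warnings;

my $str2 = "Hi There";

# Read: 2 characters starting at 0-based position 3; $str2 is unchanged.
my $str = substr($str2, 3, 2);
print "$str\n";              # Th

# Write (lvalue substr): replace 3 characters at position 3 with "hi".
my $copy = $str2;
substr($copy, 3, 3) = "hi";
print "$copy\n";             # Hi hire

# A length of 0 inserts without removing anything.
substr($copy, 3, 0) = "hi";
print "$copy\n";             # Hi hihire

# The 4-argument form performs the same replacement without an lvalue.
my $alt = $str2;
substr($alt, 3, 3, "hi");
print "$alt\n";              # Hi hire
```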
index and rindex
• index string, substring [, position]
  – returns the 0-based position of the first occurrence of substring in string, else -1
  – with position, the search starts at that position
• rindex string, substring [, position]
  – returns the position of the last occurrence of substring, else -1
  – with position, returns the right-most occurrence that starts at or before that position
• $pos = index $str, $str2;
  – returns the position where $str2 is first found in $str
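A small sketch checking the index and rindex behaviour described above on the string used in the next example; the search substring "here" is an illustrative choice, not from the slides.

```perl
use strict;
use warnings;

my $str = "There there Jim";

# "here" is an illustrative search string.
my $first  = index($str, "here");      # first occurrence
my $next   = index($str, "here", 2);   # search starts at position 2
my $last   = rindex($str, "here");     # last occurrence
my $before = rindex($str, "here", 6);  # last occurrence starting at or before 6
my $none   = index($str, "Fred");      # not found

print join(" ", $first, $next, $last, $before, $none), "\n";   # 1 7 7 1 -1
```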
example of substr and index
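• $str = "There there Jim";
• $sstr = "Jim";
• $replace = "Fred";
• substr($str, (index $str, $sstr), 3) = $replace;
  – replaces Jim with Fred in $str (3 is the length of $sstr)
  – $str = "There there Fred";
  – if $sstr were not found, index would return -1 and substr would edit the end of $str instead
• The substitution operator is an easier way to do this.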
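A sketch of the same replacement with a guard for the not-found case, followed by the s/// version. Using length $sstr instead of a hard-coded 3, and the \Q...\E quoting covered under Metasymbols (2), are suggested choices rather than part of the original slide.

```perl
use strict;
use warnings;

my $str     = "There there Jim";
my $sstr    = "Jim";
my $replace = "Fred";

# index returns -1 when $sstr is absent; substr at -1 would edit the end instead.
my $pos = index($str, $sstr);
if ($pos >= 0) {
    substr($str, $pos, length $sstr) = $replace;
}

# The same edit with s///; \Q...\E makes any metacharacters in $sstr literal.
my $str2 = "There there Jim";
$str2 =~ s/\Q$sstr\E/$replace/;

print "$str\n$str2\n";
```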
grep
• LIST = grep EXPR, LIST
• LIST = grep BLOCK LIST
• like map, each element is aliased to $_ in turn; unlike map, grep returns the elements for which BLOCK or EXPR is true, not the values that BLOCK or EXPR returns.
• @new = grep /[a-zA-Z]/, @lines; # keeps lines containing a letter
• NOTE: because $_ is an alias, altering $_ alters the original list
  – @list = qw(barney fred dino wilma);
  – @greplist = grep {s/^[bfd]//} @list;
  – @greplist = ("arney", "red", "ino")
  – @list = ("arney", "red", "ino", "wilma")
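A sketch contrasting grep with map and showing the aliasing pitfall from the NOTE. The @lines values are made up for illustration, and the copy-then-substitute idiom at the end is one common way (an assumption, not from the slides) to avoid modifying the original list.

```perl
use strict;
use warnings;

my @lines = ("abc", "123", "x1", "");    # illustrative sample data

# grep keeps the elements for which the block is true.
my @new = grep { /[a-zA-Z]/ } @lines;

# map, by contrast, collects whatever the block returns.
my @lengths = map { length($_) } @lines;

# $_ is an alias, so s/// inside grep also edits @list.
my @list = qw(barney fred dino wilma);
my @greplist = grep { s/^[bfd]// } @list;

# Copy each element first to leave the source list untouched.
my @orig = qw(barney fred dino wilma);
my @stripped = map { (my $t = $_) =~ s/^[bfd]//; $t } @orig;

print "@new | @lengths | @greplist | @list | @orig | @stripped\n";
```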
s/// Operator (Substitution)
• $str =~ s/pattern to match/replacement/;
  – finds the first match and replaces it
• $str =~ s/pattern to match/replacement/g;
  – finds all matches and replaces each of them
• Simple substitution
• $str = "3 dogs bit 1 dog";
• $str =~ s/dog/cat/;
  – $str = "3 cats bit 1 dog";
• $str =~ s/dog/cat/g;
  – $str = "3 cats bit 1 cat";
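A minimal sketch of the two substitutions above. Copying into a new variable with (my $x = $str) =~ s/// is a choice made here so both results can be compared side by side; the count line relies on s/// returning the number of substitutions it made.

```perl
use strict;
use warnings;

my $str = "3 dogs bit 1 dog";

# (my $x = $str) =~ s/// copies $str first, so $str itself stays unchanged.
(my $once = $str) =~ s/dog/cat/;       # first match only
(my $all  = $str) =~ s/dog/cat/g;      # every match

# s/// returns the number of substitutions made (false if none).
my $count = (my $tmp = $str) =~ s/dog/cat/g;

print "$once\n$all\n$count\n";
```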
s/// Operator (Substitution) (2)
• s/pattern//;
  – removes the first match of the pattern (add /g to remove every match)
• $str = "abad";
• $str =~ s/a//g;
  – $str = "bd";
• From the substr and index slide:
  – $str =~ s/$sstr/$replace/;
  – OR $str =~ s/Jim/Fred/;
case insensitive substitution
• /i ignores case
• $str = "Dog, dog, dOg";
• $str =~ s/DOG/cat/ig;
  – $str = "cat, cat, cat";
• $str = "Dog, dog, dOg";
• $str =~ s/DOG/cAt/ig;
  – $str = "cAt, cAt, cAt";
  – The replacement string is inserted exactly as written; /i affects only the pattern.
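A quick check of /i on the slide's string; the last substitution (without /i) is added only for contrast.

```perl
use strict;
use warnings;

my $str = "Dog, dog, dOg";

(my $lower = $str) =~ s/DOG/cat/ig;   # /i: any case of "dog" matches
(my $mixed = $str) =~ s/DOG/cAt/ig;   # replacement is inserted exactly as written
(my $exact = $str) =~ s/DOG/cat/g;    # without /i, "DOG" never matches here

print "$lower\n$mixed\n$exact\n";
```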
examples
• $str = "fred xxx barney"; # each substitution below starts from this value
  – $str =~ s/x/boom/;
    – $str = "fred boomxx barney";
  – $str =~ s/x/boom/g;
    – $str = "fred boomboomboom barney";
  – $str =~ s/x+/boom/;
    – $str = "fred boom barney";
alternation and group matching
• | allows alternation (or'd matching)
• $str = "Wilma Flintstone";
• $str =~ s/Fred|Wilma|Pebbles/Dino/g;
  – $str = "Dino Flintstone";
  – replaces every instance of Fred, Wilma, or Pebbles with Dino
• $str = "1st time winner";
• $str =~ s/(1st|2nd|3rd) time/Last place/;
  – $1 holds the group match, "1st"; the entire match is "1st time"
  – $str = "Last place winner";
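A sketch of alternation and capture groups. The extra outer group in the match is an illustrative way to get the entire match as $1 without relying on $&.

```perl
use strict;
use warnings;

my $name = "Wilma Flintstone";
$name =~ s/Fred|Wilma|Pebbles/Dino/g;

my $str = "1st time winner";
my ($whole, $place);
# An extra outer group captures the entire match without using $&.
if ($str =~ /((1st|2nd|3rd) time)/) {
    ($whole, $place) = ($1, $2);
}
$str =~ s/(1st|2nd|3rd) time/Last place/;

print "$name\n$whole | $place\n$str\n";
```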
single character substitution
• Using []
• $str =~ s/[abc]/d/; # replace the first a, b, or c with d
• $str =~ s/[Fred]/x/g; # [Fred] is the class of F, r, e, d
  – If $str was "Fred", after it would be "xxxx"
• $str =~ s/[^aeiouAEIOU]/_/g;
  – replaces every non-vowel character (spaces and digits included) with an _
• Common mistake:
• $str =~ s/[a-z]/[A-Z]/g;
  – intended to replace each lower case letter with its upper case letter, but the replacement side is literal text, not a pattern
  – if $str = "hi", the result is "[A-Z][A-Z]"
  – NOTE: $str = uc $str; # upper-cases a string
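A sketch of the character-class examples and of the common mistake, followed by three working ways to upper-case a string. The sample "hi there" is illustrative; tr/// is covered later in these slides.

```perl
use strict;
use warnings;

(my $fred   = "Fred")     =~ s/[Fred]/x/g;          # each of F, r, e, d
(my $vowels = "hi there") =~ s/[^aeiouAEIOU]/_/g;   # illustrative sample; space is a non-vowel

# The common mistake: the replacement side is literal text.
(my $wrong = "hi") =~ s/[a-z]/[A-Z]/g;

# Three ways that do upper-case the string.
my $up1 = uc "hi";
(my $up2 = "hi") =~ tr/a-z/A-Z/;
(my $up3 = "hi") =~ s/([a-z])/\U$1/g;

print "$fred $vowels $wrong $up1 $up2 $up3\n";
```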
matching quantifiers
• $str =~ s/a{3}/b/;
  – the first instance of aaa is replaced with b
• $str = "aaaaa"; # each substitution below starts from this value
• $str =~ s/a{3,}/b/; # max (greedy) matching
  – $str = "b"
• $str =~ s/a{3,}?/b/; # min (lazy) matching
  – $str = "baa"; # only 3 a's are substituted, the minimum match
• $str =~ s/(a{3,}?)(a*)/b/;
  – $str = "b"; $1 = "aaa"; $2 = "aa";
• $str =~ s/(a{3,})(a*)/b/;
  – $str = "b"; $1 = "aaaaa"; $2 = "";
• $str =~ s/(a{3,}?)(a*?)/b/; # min match on both
  – $str = "baa"; $1 = "aaa"; $2 = "";
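A sketch reproducing the greedy and lazy results on this slide, each starting from a fresh "aaaaa" so the captures can be read back after every substitution.

```perl
use strict;
use warnings;

# Each substitution starts from a fresh "aaaaa".
my $greedy = "aaaaa";
$greedy =~ s/a{3,}/b/;                  # takes all five a's

my $lazy = "aaaaa";
$lazy =~ s/a{3,}?/b/;                   # takes only three

my $mix = "aaaaa";
$mix =~ s/(a{3,}?)(a*)/b/;              # lazy then greedy
my ($mix1, $mix2) = ($1, $2);

my $both = "aaaaa";
$both =~ s/(a{3,}?)(a*?)/b/;            # lazy then lazy
my ($both1, $both2) = ($1, $2);

print "$greedy $lazy $mix [$mix1|$mix2] $both [$both1|$both2]\n";
```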
matching quantifiers (2)
• $str = "aaaaab"; # each substitution below starts from this value
• $str =~ s/a{3,}?b/c/;
  – $str = "c". Why? The match starts at the first a, and the only way to reach the b from there is through all five a's, so even the lazy quantifier uses them all.
• + means 1 or more and ? means 0 or 1 time (both max match)
• $str =~ s/(a+)(b?)/c/;
  – $str = "c"; $1 = "aaaaa"; $2 = "b";
• $str =~ s/(a+?)(b??)/c/; # min match
  – $str = "caaaab"; $1 = "a"; $2 = "";
matching quantifiers (3)
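• Example: perl doesn't always do what you think.
• $str = "ddogg";
• $str =~ s/d.*g/cat/;
  – $str = "cat"; # max match, makes sense
• $str = "ddogg";
• $str =~ s/d.*?g/cat/;
  – $str = "catg"; # min match, but not the shortest possible match ("dog")
  – the leftmost match wins: a match starting at position 0 ("ddog") exists, so the shorter one starting at position 1 is never tried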
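A sketch showing why the lazy match still starts at the first d: the engine prefers the leftmost match, and $-[0] reports where it began. The d[^dg]*g pattern at the end is an illustrative alternative, not from the slide, that forces the match to start at the second d.

```perl
use strict;
use warnings;

my $greedy = "ddogg";
$greedy =~ s/d.*g/cat/;       # "ddogg" matched

my $lazy = "ddogg";
$lazy =~ s/d.*?g/cat/;        # "ddog" matched, starting at position 0

# @- holds match start offsets; $-[0] is where the whole match began.
my $start = ("ddogg" =~ /d.*?g/) ? $-[0] : -1;

# Illustrative alternative: excluding d from the middle forces the match to begin at the last d.
my $tight = "ddogg";
$tight =~ s/d[^dg]*g/cat/;    # "dog" matched, starting at position 1

print "$greedy $lazy $start $tight\n";   # cat catg 0 dcatg
```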
matching quantifiers (4)
• More Examples (with the $_ variable)
• $_ = "a xxx c xxxxx c xxx d"; # each substitution below starts from this value
• s/x{1,}/d/g; produces "a d c d c d d"
• s/x{1,}?/d/g; produces "a ddd c ddddd c ddd d"
• s/x{1,2}/d/g; produces "a dd c ddd c dd d"
• s/x{1,3}/d/g; produces "a d c dd c d d"
• s/x{2,2}/d/g; produces "a dx c ddx c dx d"
  – or s/x{2}/d/g;
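A sketch that applies each quantifier on this slide to a fresh copy of the same $_ value in a loop; building the patterns from strings is an illustrative choice.

```perl
use strict;
use warnings;

my $orig = "a xxx c xxxxx c xxx d";
my @patterns = ('x{1,}', 'x{1,}?', 'x{1,2}', 'x{1,3}', 'x{2}');
my @results;

for my $pat (@patterns) {
    local $_ = $orig;          # fresh copy in $_ for each pattern
    s/$pat/d/g;                # s/// with no =~ works on $_
    push @results, $_;
    print "s/$pat/d/g produces \"$_\"\n";
}
```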
Anchoring
• $str = "Fred Flintstone Fred"; # each substitution below starts from this value
• $str =~ s/Fred/Wilma/g;
  – replaces all instances of Fred with Wilma
• $str =~ s/Fred$/Wilma/g;
  – only the last instance: "Fred Flintstone Wilma", even with the /g flag
• $str =~ s/^Fred/Wilma/g;
  – only the first instance: "Wilma Flintstone Fred", even with the /g flag
• $str = "abcd";
• $str =~ s/^[abc]+/d/;
  – $str = "dd";
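A sketch of the anchoring examples, each applied to a copy of the original string so the results do not depend on each other.

```perl
use strict;
use warnings;

my $str = "Fred Flintstone Fred";

(my $all   = $str)   =~ s/Fred/Wilma/g;
(my $end   = $str)   =~ s/Fred$/Wilma/g;    # $ anchors to the end of the string
(my $start = $str)   =~ s/^Fred/Wilma/g;    # ^ anchors to the start of the string
(my $cls   = "abcd") =~ s/^[abc]+/d/;       # the leading run "abc" becomes "d"

print "$all\n$end\n$start\n$cls\n";
```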
Parentheses as memory
• s/a(.)b(.)c\2d\1/a mess/;
  – "adbecedd" is converted to "a mess"
  – "adbecdde" is not converted
• s/a(.*)b\1c/a mess/;
  – "addbddc" changes to "a mess"
  – "adddbddc" is not changed
• To keep the captured text, use $1 .. $9 in the replacement (\1 .. \9 also work there, but with warnings enabled perl reports "\1 better written as $1")
• s/a(.*)b\1c/What is this: $1/;
  – "addbddc" is converted to "What is this: dd"
  – again $1 = "dd"
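A sketch of backreferences: \1 and \2 inside the pattern must repeat what their groups captured, while $1 carries the captured text into the replacement.

```perl
use strict;
use warnings;

# \1 and \2 inside the pattern must repeat what the groups captured.
my $m1 = "adbecedd"; $m1 =~ s/a(.)b(.)c\2d\1/a mess/;
my $m2 = "adbecdde"; $m2 =~ s/a(.)b(.)c\2d\1/a mess/;   # no match, unchanged
my $m3 = "adddbddc"; $m3 =~ s/a(.*)b\1c/a mess/;        # no match, unchanged

# In the replacement, $1 refers to the captured text.
my $m4 = "addbddc";  $m4 =~ s/a(.*)b\1c/What is this: $1/;

print "$m1\n$m2\n$m3\n$m4\n";
```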
metasymbols
• a very common substitution
  – s/\s+/ /g; # replace each run of whitespace with a single space
    – " a b\t c" changes to " a b c"
• remove word character duplicates
  – $str = "11aabbdccaa";
  – $str =~ s/(\w)\1/$1/g;
    – $str = "1abdca"
• remove any duplicated character
  – $str = "11 ,,aa";
  – $str =~ s/(.)\1/$1/g;
    – $str = "1 ,a"
  – both collapse adjacent pairs only ("aaa" becomes "aa"); use \1+ to collapse a whole run
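A sketch of the whitespace and duplicate-removal substitutions, plus two illustrative inputs ("aaab", "aaabbb") showing that (\w)\1 collapses pairs while (\w)\1+ collapses whole runs. The extra spacing in the whitespace sample is an assumption.

```perl
use strict;
use warnings;

(my $pairs = "11aabbdccaa") =~ s/(\w)\1/$1/g;   # word characters only
(my $any   = "11 ,,aa")     =~ s/(.)\1/$1/g;    # any character
(my $three = "aaab")        =~ s/(\w)\1/$1/g;   # illustrative: only one pair collapsed
(my $runs  = "aaabbb")      =~ s/(\w)\1+/$1/g;  # illustrative: \1+ collapses a whole run
(my $ws    = " a   b\t  c") =~ s/\s+/ /g;       # assumed spacing; each run -> one space

print "$pairs|$any|$three|$runs|$ws\n";
```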
Metasymbols (2)
• \U upper-cases until \E, and \L lower-cases until \E
• Example
• s/a(.*)b\1c/What is this: \U$1\E/;
  – "addbddc" is converted to "What is this: DD"
• s/a(.*)b\1c/What is this: \L$1\E/;
  – "addbddc" is converted to "What is this: dd"
• \Q ... \E quotes regex metacharacters in between, so they match literally (e.g. s/\Q$str\E//)
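A sketch of \U, \L, and \Q...\E. The "aDDbDDc" input, the "1.5+" search string, and the "cost:" samples are made up to show case conversion and metacharacters being neutralised.

```perl
use strict;
use warnings;

(my $up  = "addbddc") =~ s/a(.*)b\1c/What is this: \U$1\E/;
(my $low = "aDDbDDc") =~ s/a(.*)b\1c/What is this: \L$1\E/;   # illustrative input

# Illustrative samples: without \Q, "." and "+" in $lit act as regex metacharacters.
my $lit    = "1.5+";
my $raw    = ("cost: 1x5tax"  =~ /$lit/)     ? 1 : 0;   # matches "1x5"
my $quoted = ("cost: 1x5tax"  =~ /\Q$lit\E/) ? 1 : 0;   # needs the literal "1.5+"
my $exact  = ("cost: 1.5+tax" =~ /\Q$lit\E/) ? 1 : 0;

print "$up\n$low\n$raw $quoted $exact\n";
```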
Exercise 10
• What is the outcome of the following substitutions? Use $_ = "ad dog cd"
1. s/dog//;
2. while (/ /) { s/ / /g;}
3. s/(\w+)\s+(\w+)/$2 $1/g;
4. s/(.+)d/Dd/g;
5. s/(.+?)d/Dd/g;
6. s/(\S+)/=\1=/g;
7. Write a substitution to change each vowel to an X.
s/// flags
• like the match operator:
  – /m lets ^ and $ match next to an embedded \n
  – /s lets . match a newline
  – /x ignores whitespace in the pattern and permits comments
• for s///:
  – /g replaces globally, i.e. all occurrences (m// also accepts /g, where it finds successive matches)
  – /e (s/// only) evaluates the right side as an expression
    – in other words, perl runs the right side as perl code and uses its return value as the replacement
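A sketch of the /m, /s, and /x flags. The sample strings and the date-reordering pattern are illustrative, not from the slides; the match count uses the () = ... list-assignment idiom.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";    # illustrative sample

# Count matches: without /m, ^ matches only at the very start of the string.
my $no_m   = () = $text =~ /^\w+/g;
my $with_m = () = $text =~ /^\w+/mg;

# Without /s, . does not match a newline.
my $no_s   = ("a\nb" =~ /a.b/)  ? 1 : 0;
my $with_s = ("a\nb" =~ /a.b/s) ? 1 : 0;

# /x: whitespace in the pattern is ignored and # starts a comment (illustrative date).
(my $date = "2024-05-17") =~ s/
    (\d{4}) - (\d{2}) - (\d{2})   # year, month, day
/$3.$2.$1/x;

print "$no_m $with_m $no_s $with_s $date\n";   # 1 2 0 1 17.05.2024
```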
/e flag
• s/(\d+)/sprintf("%#x",$1)/ge;
  – converts all numbers to hex
  – "2581" would be converted to "0xa15"
• Returning to the leap year example, with the ternary operator:
    s/(\d+)/$1 % 4   ? "$1 (not a leap year)" :
            $1 % 100 ? "$1 (a leap year)" :
            $1 % 400 ? "$1 (not a leap year)" :
                       "$1 (a leap year)"/gxe;
  – "2000" is changed to "2000 (a leap year)"
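A sketch running both /e examples. The list of years is illustrative; 1900 and 2024 exercise the century and ordinary branches of the rule.

```perl
use strict;
use warnings;

(my $hex = "2581") =~ s/(\d+)/sprintf("%#x", $1)/ge;

# The leap-year rule from the slide, applied to several illustrative years at once.
(my $years = "1900 2000 2023 2024") =~ s/(\d+)/
    $1 % 4   ? "$1 (not a leap year)" :
    $1 % 100 ? "$1 (a leap year)"     :
    $1 % 400 ? "$1 (not a leap year)" :
               "$1 (a leap year)"
/gex;

print "$hex\n$years\n";
```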
tr/// Operator (Transliteration)
• same as in sed; y/// can be used instead of tr///
• DOES NOT use pattern matching; instead it scans character by character and replaces each occurrence of a character in the search list with the corresponding character in the replacement list
• tr/SEARCHLIST/REPLACEMENTLIST/cds;
• Example:
  – $str = "AABBCCDDEE";
  – $str =~ tr/ABC/XYZ/;
    – $str = "XXYYZZDDEE";
  – $str =~ tr/DE/!/; # if the replacement list is too short, its last character is reused as many times as needed
    – $str = "XXYYZZ!!!!";
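A sketch of the basic tr/// behaviour from this slide, plus an illustrative "a.b.c" sample showing that . is an ordinary character to tr.

```perl
use strict;
use warnings;

my $str = "AABBCCDDEE";

(my $t1 = $str) =~ tr/ABC/XYZ/;    # A->X, B->Y, C->Z
(my $t2 = $t1)  =~ tr/DE/!/;       # short list: "!" reused for E
(my $t3 = $str) =~ y/ABC/XYZ/;     # y/// is the same operator
(my $t4 = "a.b.c") =~ tr/./-/;     # illustrative: "." is a plain character, not "any character"

print "$t1\n$t2\n$t3\n$t4\n";
```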
tr/// Operator (Transliteration) (2)
• Duplicates in the search list are ignored (the first mapping wins)
  – $str = "AABBCCDDEE";
  – $str =~ tr/AAB/xyz/;
    – $str = "xxzzCCDDEE";
• /c means characters not in the search list
  – $str = "AABBCCDDEE";
  – $str =~ tr/ABC/x/c;
    – $str = "AABBCCxxxx";
tr/// Operator (Transliteration) (3)
• /d deletes characters that are found but have no corresponding replacement
  – this changes tr: if the replacement list is short, the extra characters are removed instead of being replaced with the last replacement character
  – $str = "AABBCCDDEE";
  – $str =~ tr/ABC/xy/d;
    – $str = "xxyyDDEE";
  – $str =~ tr/DE//d;
    – $str = "xxyy";
tr/// Operator (Transliteration) (4)
• /s squeezes runs of the same replaced character down to one
  – $str = "AABBCCDDEE";
  – $str =~ tr/ABC/xyz/s;
    – $str = "xyzDDEE";
• tr/// returns the number of characters found (replaced or deleted).
• $count = ($str =~ tr/ABC/xyz/); # starting from $str = "AABBCCDDEE"
  – $count = 6; $str = "xxyyzzDDEE";
• $count = ($str =~ tr/ABC//); # starting from $str = "AABBCCDDEE"
  – $count = 6; $str = "AABBCCDDEE";
    – No replacement list (and no /d), so tr just counts the characters and makes no replacements. Note that s/[ABC]//g would have removed them.
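A sketch collecting the duplicate, /c, /d, /s, and counting examples from the preceding slides, each applied to a copy of "AABBCCDDEE". The last line shows the s/// contrast mentioned in the note.

```perl
use strict;
use warnings;

my $str = "AABBCCDDEE";

(my $dup = $str) =~ tr/AAB/xyz/;    # the second A (mapped to y) is ignored
(my $c   = $str) =~ tr/ABC/x/c;     # every character NOT in ABC becomes x
(my $d   = $str) =~ tr/ABC/xy/d;    # C has no partner, so it is deleted
(my $s   = $str) =~ tr/ABC/xyz/s;   # runs of the same result squeezed to one

my $count = ($str =~ tr/ABC//);     # counts A, B and C; $str is untouched
(my $gone = $str) =~ s/[ABC]//g;    # s/// with an empty replacement deletes

print "$dup $c $d $s $count $str $gone\n";
```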
More tr/// Examples
• $str = "AABBCCDDEE";
• $str =~ tr/D//d; # delete found characters
  – $str = "AABBCCEE";
• $str = "AABBCCDDEE";
• $str =~ tr/ABD/xy/ds;
  – deletes D, replaces A with x and B with y, and squeezes the repeated replacements
  – $str = "xyCCEE";
• $str =~ tr/a-zA-Z//dc;
  – removes any non-letters from $str
• $str =~ tr/A-Za-z/N-ZA-Mn-za-m/;
  – rotates each letter by 13 places (ROT13), a simple obfuscation rather than real encryption; applying it twice restores the original
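A sketch of the remaining tr/// examples. The "Hello, World!" sample is illustrative; running ROT13 a second time restores the original because the shift is exactly half the alphabet.

```perl
use strict;
use warnings;

(my $abd = "AABBCCDDEE") =~ tr/ABD/xy/ds;     # delete D, A->x, B->y, squeeze

my $msg = "Hello, World!";                    # illustrative sample
(my $letters = $msg) =~ tr/a-zA-Z//cd;        # keep letters only
(my $rot     = $msg) =~ tr/A-Za-z/N-ZA-Mn-za-m/;
(my $back    = $rot) =~ tr/A-Za-z/N-ZA-Mn-za-m/;   # ROT13 twice = original

print "$abd\n$letters\n$rot\n$back\n";
```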
Exercise 11
• What is the outcome of the following transliterations? Use $_ = "fred and barney"
1. tr/abcde/ABCDE/;
2. tr/a-z/ABCDE/d;
3. $count = tr/a-z/A-Z/;
4. tr/a-z/_/c;
5. tr/a-m/X/s;
6. tr/aeiou/X/cs;
7. $count = tr/aeiou//c;
• Change the letters b, d, and r to X and count the number of changes.
Q&A