Appendix A: Regular Expressions
It’s All Greek to Me
Regular Expressions
• A pattern that matches a set of one or more strings
• May be a simple string, or contain wildcard characters or modifiers
• Used by programs such as vim, grep, awk, and sed
• Not the same as shell expansion
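
The difference from shell expansion can be seen in a minimal sketch, assuming a POSIX shell and grep; the word bar is just an invented sample. The same text, f*, is read one way by the shell and another way as a regular expression.

```shell
word='bar'

# Shell expansion rules: f* means "f followed by anything", so bar does not match.
case "$word" in
  f*) glob_result='match' ;;
  *)  glob_result='no match' ;;
esac

# Regular expression rules: f* means "zero or more f's", which is found
# (as zero f's) in any line at all, including bar.
regex_count=$(printf '%s\n' "$word" | grep -c 'f*')

echo "shell pattern f*: $glob_result; regex f*: matched $regex_count line(s)"
```
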
Components
• Characters
  – Literals
  – Special characters
• Delimiters
  – Mark the beginning and end of a regular expression
  – Usually /
  – ' with grep (but not really: the single quotes are shell quoting, not part of the regular expression)
Simple Strings
• Contain no special characters
• Match only that exact string, wherever it appears in a line
• Ex: /foo/ matches:
  – foo
  – tomfoolery
  – bar.foo.com
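
As a quick check of the example above, here is a small sketch assuming a POSIX shell and grep. The fourth sample line, fo o, is an invented non-match added for contrast.

```shell
# Four sample lines; the last one has a space inside, so it does not contain "foo".
count=$(printf '%s\n' 'foo' 'tomfoolery' 'bar.foo.com' 'fo o' | grep -c 'foo')
echo "$count"   # prints 3
```
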
Special Characters
• Can match multiple strings
• Represent zero or more characters
• Always match the longest possible string (we’ll see examples in a bit)
Periods
• Match any single character
• Ex: /.ing/ matches:
  – I was talking
  – bling
  – he called ingred (the space before "ing" counts as a character)
• Ex: /spar.ing/ matches:
  – sparring
  – sparking
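
The period examples can be tried with grep. This is a sketch assuming a POSIX shell; the extra lines ing, sparing, and sparkling are invented non-matches that show the period must consume exactly one character.

```shell
# /.ing/ needs one character (any character, even a space) in front of "ing".
dot_ing=$(printf '%s\n' 'bling' 'he called ingred' 'ing' | grep -c '.ing')
echo "$dot_ing"   # prints 2: the bare line "ing" has nothing before it

# /spar.ing/ needs exactly one character between "spar" and "ing".
spar=$(printf '%s\n' 'sparring' 'sparking' 'sparing' 'sparkling' | grep -c 'spar.ing')
echo "$spar"      # prints 2: sparring and sparking
```
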
Brackets
• Define a character class
• Match any one character in the class
• If a caret (^) is the first character in the class, the class matches any character not listed
• Other special characters lose their special meaning inside the class
Brackets cont’d
• Ex. /[jJ]ustin/ matches justin and Justin
• Ex. /[A-Za-z]/ matches any single letter
• Ex. /[0-9]/ matches any single digit
• Ex. /[^a-z]/ matches any single character that is not a lowercase letter
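
A short sketch of character classes with grep, assuming a POSIX shell. The sample lines are invented, and the locale is pinned to C as a precaution, since ranges such as a-z can behave differently in other locales.

```shell
# Pin the locale so ranges such as a-z mean plain ASCII ranges.
LC_ALL=C
export LC_ALL

# [jJ] accepts either case of the first letter; dustin is rejected.
names=$(printf '%s\n' 'justin' 'Justin' 'dustin' | grep -c '[jJ]ustin')
echo "$names"       # prints 2

# [^a-z] needs at least one character that is not a lowercase letter.
not_lower=$(printf '%s\n' 'abc' 'abC' 'ab1' 'ab c' | grep -c '[^a-z]')
echo "$not_lower"   # prints 3: every line except abc

# Inside a class, the period is an ordinary character.
dots=$(printf '%s\n' 'a.b' 'axb' | grep -c 'a[.]b')
echo "$dots"        # prints 1
```
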
Asterisks
• Match zero or more occurrences of the previous character
• So /.*/ matches any number of characters
• Ex. /t.*ing/ matches:
  – thing
  – this is really annoying
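
The "zero or more" rule is easiest to see with a pattern where zero occurrences is a real possibility. A sketch assuming a POSIX shell and grep; ab*c and the sample lines are invented for illustration.

```shell
# b* accepts no b at all (ac), one b (abc), or several (abbbc), but not a d.
star=$(printf '%s\n' 'ac' 'abc' 'abbbc' 'adc' | grep -c 'ab*c')
echo "$star"   # prints 3

# t.*ing: a t, then anything, then "ing"; tin has no "ing" to finish the match.
ting=$(printf '%s\n' 'thing' 'this is really annoying' 'tin' | grep -c 't.*ing')
echo "$ting"   # prints 2
```
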
Plus Signs and Question Marks
• Very similar to asterisks; both depend on the previous character
• + matches one or more occurrences (not zero)
• ? matches zero or one occurrence (no more)
• Ex. /2+4?/ matches one or more 2’s followed by zero or one 4
  – 22224 and 2 match in full
  – 4 does not match at all; in 244 only the leading 24 matches, not the whole string
• Part of the class of extended R.E.
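
Whether a string such as 244 counts as a match depends on whether the pattern may match part of the line or must cover all of it. A sketch assuming a POSIX shell and grep, using -E for extended syntax and -x to require a whole-line match:

```shell
# Anywhere in the line: 244 is reported because it contains 24.
anywhere=$(printf '%s\n' '22224' '2' '4' '244' | grep -E -c '2+4?')
echo "$anywhere"   # prints 3

# Whole line only (-x): 244 is rejected because of the second 4.
whole=$(printf '%s\n' '22224' '2' '4' '244' | grep -E -x -c '2+4?')
echo "$whole"      # prints 2
```
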
Carets & Dollar Signs
• If a regular expression starts with a ^, the match must begin at the beginning of a line
• If a regular expression ends with a $, the match must end at the end of a line
• ^ and $ are referred to as anchors
• Ex. /^T.*T$/ matches any line that starts with one T and ends with another
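
The effect of the anchors shows up when the same pattern is run with and without them. A sketch assuming a POSIX shell and grep; the four sample lines are invented.

```shell
# Anchored: the line must start with T and end with another T.
anchored=$(printf '%s\n' 'TesT' 'Test' 'a TesT' 'T' | grep -c '^T.*T$')
echo "$anchored"     # prints 1: only TesT

# Unanchored: two T's anywhere in the line are enough.
unanchored=$(printf '%s\n' 'TesT' 'Test' 'a TesT' 'T' | grep -c 'T.*T')
echo "$unanchored"   # prints 2: TesT and "a TesT"
```
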
Quoting Special Characters
• If you want to use a special character literally, put a backslash in front of it
• Ex. /and\/or/ matches and/or
• Ex. /\\/ matches \
• Ex. /\**/ matches any number of asterisks
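
These quoting rules can be sketched with sed and grep, assuming a POSIX shell. sed is used where the / delimiter matters, since grep takes its pattern without delimiters; the sample lines are invented.

```shell
# The slash must be escaped only because / delimits the pattern in sed.
slash=$(printf '%s\n' 'and/or' 'and or' | sed -n '/and\/or/p')
echo "$slash"       # prints and/or

# \\ matches one literal backslash.
backslash=$(printf '%s\n' 'a\b' 'ab' | grep -c '\\')
echo "$backslash"   # prints 1

# \. matches a literal period, not "any character".
dot=$(printf '%s\n' 'bar.foo' 'barXfoo' | grep -c 'bar\.foo')
echo "$dot"         # prints 1

# \** is a literal asterisk repeated zero or more times.
stars=$(printf '%s\n' 'a***b' | sed 's/\**b/!/')
echo "$stars"       # prints a!
```
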
Longest Match
• Regular expressions match the longest string possible in a line
• Ex. I (Justin) like coffee (lots).
• /(.*)/ (the parentheses are literal characters here, as in basic R.E.)
  – Matches (Justin) like coffee (lots)
• /([^)]*)/
  – Matches (Justin)
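
To see exactly which text each pattern claims, the match can be replaced with a marker. A sketch assuming a POSIX shell and sed, whose basic syntax treats ( and ) as literal characters; the [MATCH] marker is an arbitrary choice.

```shell
line='I (Justin) like coffee (lots).'

# .* runs on to the last closing parenthesis in the line.
greedy=$(printf '%s\n' "$line" | sed 's/(.*)/[MATCH]/')
echo "$greedy"    # prints I [MATCH].

# [^)]* cannot cross a closing parenthesis, so the match stops at the first one.
bounded=$(printf '%s\n' "$line" | sed 's/([^)]*)/[MATCH]/')
echo "$bounded"   # prints I [MATCH] like coffee (lots).
```
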
Boolean OR
• You can pattern match for either of two distinct strings using OR (the pipe)
• Ex. /CAT|DOG/
  – Matches either CAT or DOG
• Simpler expressions can be written using just a character class
  – E.g. /a[bc]/ instead of /ab|ac/
• Also part of extended R.E.
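
A sketch of alternation with grep -E, assuming a POSIX shell; the sample lines are invented. It also checks that the character-class form selects the same lines as the longer OR form.

```shell
# Either alternative is enough; HOTDOG contains DOG.
either=$(printf '%s\n' 'CAT' 'HOTDOG' 'BIRD' | grep -E -c 'CAT|DOG')
echo "$either"   # prints 2

# ab|ac and a[bc] select the same lines.
alt=$(printf '%s\n' 'ab' 'ac' 'ad' | grep -E -c 'ab|ac')
class=$(printf '%s\n' 'ab' 'ac' 'ad' | grep -E -c 'a[bc]')
echo "$alt $class"   # prints 2 2
```
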
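Grouping
• You can apply special characters to groups of characters in parentheses
• Also called bracketing
• A group matches the same strings as the unbracketed expression
• But modifiers can then apply to the whole group
• Ex. /(duck)*|(goose)/ in extended R.E.
  – In vim, which escapes these operators: /\(duck\)*\|\(goose\)/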
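
The difference a group makes can be sketched with grep -E, assuming a POSIX shell. This variant uses + instead of * so that an empty line would not count as a match, and -x so that the whole line must match; the sample lines are invented.

```shell
samples() { printf '%s\n' 'duck' 'duckduck' 'goose' 'duckgoose' 'duckkk'; }

# With a group, + repeats the whole word "duck".
grouped=$(samples | grep -E -x '(duck)+|goose')
echo "$grouped"     # prints duck, duckduck, goose (one per line)

# Without a group, + repeats only the final k.
ungrouped=$(samples | grep -E -x 'duck+|goose')
echo "$ungrouped"   # prints duck, goose, duckkk (one per line)
```
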
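Using with vim
• Use regular expressions for searching and substituting
• Searching:
  – /string (forward) or ?string (backward)
• Substituting:
  – :[address]s/string/replace/[g]
  – string can be an R.E.; replace is mostly literal text
  – trailing g : global; replace all occurrences in the line, not just the first
  – to substitute on all lines, use the address %
  – a leading g belongs to the separate :g command (:g/pattern/command), which runs a command on every line matching pattern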
Using with vim cont’d
• [address]
  – n : line number
  – n+x or n-x : the line x lines after or before line n
  – n1,n2 : from line n1 to n2
  – . : alias for current line
  – $ : alias for last line in work buffer
  – % : alias for entire work buffer
vim examples
• /^if(
• /end\.$
• :%s/[Jj]ustin/Mr\. Awesome/g
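
vim commands cannot be run from a script as easily, so this sketch uses sed as a stand-in. That is an assumption made for illustration: sed shares the s/string/replace/g form and the n1,n2 line addresses, but not vim's % or . aliases. The sample text and the X replacement are invented.

```shell
# Without the trailing g, only the first occurrence in the line is replaced.
first_only=$(printf '%s\n' 'justin and Justin' | sed 's/[Jj]ustin/X/')
echo "$first_only"   # prints X and Justin

# With the trailing g, every occurrence in the line is replaced.
every=$(printf '%s\n' 'justin and Justin' | sed 's/[Jj]ustin/X/g')
echo "$every"        # prints X and X

# The address 2,3 limits the substitution to lines 2 through 3.
ranged=$(printf '%s\n' 'justin' 'justin' 'justin' | sed '2,3s/justin/X/')
echo "$ranged"       # prints justin, X, X (one per line)
```
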
Using with vim cont’d
• Ampersand (&)
  – Alias for matched string when substituting
  – Ex: :s/[A-Z][0-9]/_&_/
• Quoted digit (\n)
  – Refers to the nth quoted part, \( ... \), of an R.E. with multiple quoted parts
  – Can be used to rearrange columns
  – Ex: :s/\([^,]*\), \(.*\)/\2 \1/
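
Both substitutions above also work unchanged in sed, which makes them easy to try outside the editor. A sketch assuming a POSIX shell and sed; the inputs "seat A1" and "Doe, Jane" are invented sample data.

```shell
# Pin the locale so the range A-Z means plain ASCII uppercase.
LC_ALL=C
export LC_ALL

# & stands for whatever the pattern matched (here A1).
tagged=$(printf '%s\n' 'seat A1' | sed 's/[A-Z][0-9]/_&_/')
echo "$tagged"    # prints seat _A1_

# \1 is the text before the comma, \2 the text after ", "; the replacement swaps them.
swapped=$(printf '%s\n' 'Doe, Jane' | sed 's/\([^,]*\), \(.*\)/\2 \1/')
echo "$swapped"   # prints Jane Doe
```
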
Using with grep
• To take advantage of extended regular expressions, use egrep or grep -E instead
• Use single quotes around the pattern so the shell does not interpret it
• Ex:
  – egrep '^T.*T$' myfile
  – Lists all lines in myfile that begin & end with T
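
Why the -E matters can be sketched by running one pattern in both modes. This assumes a POSIX shell, grep, and the widely available mktemp utility; the temporary file and its three lines are invented for the demonstration.

```shell
myfile=$(mktemp)
printf '%s\n' 'aaa' 'a+' 'b' > "$myfile"

# Basic syntax: + is an ordinary character, so only the line "a+" matches.
basic=$(grep -c 'a+' "$myfile")
echo "$basic"      # prints 1

# Extended syntax: + means "one or more", so aaa and a+ both match.
extended=$(grep -E -c 'a+' "$myfile")
echo "$extended"   # prints 2

rm -f "$myfile"
```
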