Regular Expressions: Theory and Practice
Jeff Schoolcraft
MDCFUG 12/13/2005

Who am I?
Jeff Schoolcraft
Senior Architect / Operations Manager at RGII Technologies.
Speaker at Usergroups
President WinProTeam Vienna Usergroup (.NET)
TDD Evangelist
Tool guy

What can you expect?
“The gist” in 60 seconds or less.
Theory
Practical Usage
Best Practices
Hands On
A sermon
Q & A

The Gist
Regular Expressions (regex) describe patterns in strings and are often used for data validation, searching and text transformations.

Theory

Basics
A regular expression, often called a pattern, is an expression that describes a set of strings without actually listing its elements.

Say what?
The set of strings {“dog”, “bog”, “fog”} can be described by this regular expression (regex):
[bdf]og

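As a quick check of the slide's example, here is a minimal Python sketch using the standard `re` module. The word list is made up for illustration; `fullmatch` is used so that the whole string has to match the pattern, not just a piece of it.

```python
import re

# The pattern from the slide: one of b, d or f, followed by "og".
pattern = re.compile(r"[bdf]og")

# Hypothetical candidate strings, chosen for this illustration only.
words = ["dog", "bog", "fog", "log", "dogs"]

# Keep only the strings that belong to the set the pattern describes.
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```

"log" is rejected because "l" is not in the character class, and "dogs" is rejected because `fullmatch` does not allow trailing characters.
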
Formal Language Theory
A regular language is any language whose strings can all be described by a single regular expression.

Formal Language Theory (cont’d)
Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. Given a finite alphabet Σ, the following constants are defined:
(empty set) ∅ denoting the set ∅
(empty string) ε denoting the set {ε}
(literal character) a in Σ denoting the set {a}
and the following operations:
(concatenation) RS denoting the set { αβ | α in R and β in S }. For example, {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.
(alternation) R|S denoting the set union of R and S.
(Kleene star) R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", ... }.
http://en.wikipedia.org/wiki/Regular_expression

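The three operations can be tried out on small finite sets. The sketch below models each language as a Python `set` of strings; the function names `concat`, `alternate` and `kleene_star` are hypothetical helpers introduced here, not part of any library. Because R* is infinite whenever R contains a non-empty string, `kleene_star` only builds the finite slice made from at most `max_parts` pieces.

```python
from itertools import product


def concat(r, s):
    """RS: every string of r followed by every string of s."""
    return {a + b for a, b in product(r, s)}


def alternate(r, s):
    """R|S: the set union of r and s."""
    return r | s


def kleene_star(r, max_parts):
    """Finite slice of R*: strings built from 0..max_parts strings of r."""
    result = {""}   # zero pieces gives the empty string (epsilon)
    layer = {""}
    for _ in range(max_parts):
        layer = concat(layer, r)  # strings made of exactly one more piece
        result |= layer
    return result


print(sorted(concat({"ab", "c"}, {"d", "ef"})))
print(sorted(alternate({"ab"}, {"c"})))
print(sorted(kleene_star({"ab", "c"}, 2), key=lambda s: (len(s), s)))
```

With `max_parts=2` the last call yields the empty string plus "ab", "c", "abab", "abc", "cab" and "cc", which are the first elements listed in the Kleene star example above.
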
DEMO
All binary numbers
All binary numbers that start and end in 1
All binary numbers that have 00, any other bits, followed by 111 and any other bits.

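The slides do not record the patterns written during the demo, so the following is only one possible set of answers, sketched in Python. Two interpretations are assumptions made here: the single string "1" counts as starting and ending in 1, and the third exercise allows arbitrary bits before the 00 as well as after the 111.

```python
import re

# All binary numbers: one or more bits.
ALL_BINARY = re.compile(r"[01]+")

# Starts and ends in 1 (assumption: a lone "1" qualifies).
STARTS_AND_ENDS_IN_1 = re.compile(r"1(?:[01]*1)?")

# Contains 00, then any bits, then 111, then any bits
# (assumption: leading bits before the 00 are allowed too).
HAS_00_THEN_111 = re.compile(r"[01]*00[01]*111[01]*")


def is_match(pattern, text):
    """True when the whole string belongs to the pattern's language."""
    return pattern.fullmatch(text) is not None


print(is_match(ALL_BINARY, "0101"))
print(is_match(STARTS_AND_ENDS_IN_1, "1001"))
print(is_match(HAS_00_THEN_111, "1001110"))
```

In the notation of the theory slides, `[01]` is the alternation 0|1 and `[01]*` is its Kleene star.
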
Practical Usage
In order of popularity
Searching
  Much nicer than * or %
String Manipulation
  Parsing
  Replacement
Validation
  Input validation
  Database check constraints

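The three uses listed above might look like this in Python's `re` module. The log line, the date format and the ZIP-code rule are all invented for illustration and do not come from the talk.

```python
import re

# Hypothetical input text for the searching and replacement examples.
log = "2005-12-13 ERROR disk full; 2005-12-14 INFO ok"

# Searching: find every ISO-style date in the text.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", log)

# Replacement: rewrite YYYY-MM-DD as MM/DD/YYYY using capture groups.
us_style = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", log)

# Validation: accept a 5-digit ZIP code with an optional 4-digit suffix.
ZIP_CODE = re.compile(r"\d{5}(?:-\d{4})?")


def is_zip(text):
    return ZIP_CODE.fullmatch(text) is not None


print(dates)
print(us_style)
print(is_zip("22180"), is_zip("2218"))
```
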
Best Practices
The most important thing to remember:
Regular Expressions are greedy
Make the most explicit match possible
Just because some implementations allow the lazy quantifier *?, don’t fall back on that.

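A small Python sketch of what greedy matching does, and of the explicit alternative. The sample markup is hypothetical; the point is only to compare the three patterns on the same input.

```python
import re

# Hypothetical input: pull out each tag.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string, so there is one huge match.
greedy = re.findall(r"<.+>", text)

# Explicit: a tag body is "anything that is not >", so each match stops
# at the first closing bracket by construction.
explicit = re.findall(r"<[^>]+>", text)

# Lazy: gives the same result here, but relies on the engine supporting
# the +? / *? quantifiers.
lazy = re.findall(r"<.+?>", text)

print(greedy)
print(explicit)
print(lazy)
```

The explicit pattern says exactly what a tag may contain, so it works in engines without lazy quantifiers and does not depend on backtracking to find the right end point.
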
Hands On

A Sermon
Some people develop a religious fascination with new tools & technologies (design patterns, regex, whatever).
Use the tools that make the most sense for your problem/solution.

Email Validation with REGEX?
Are you kidding me?
See email.regex
Multi-tiered approach: regex to test format, some code to test validity of email address.

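The multi-tiered idea could be sketched as follows in Python. This is not the `email.regex` file from the talk, which is not reproduced in the slides: the pattern `ROUGH_EMAIL`, the helper names and the specific length and label checks are all assumptions chosen for illustration.

```python
import re

# Tier 1: a deliberately loose format test (hypothetical pattern):
# something, an @, something, a dot, something; no spaces or extra @.
ROUGH_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(address):
    return ROUGH_EMAIL.fullmatch(address) is not None


def passes_code_checks(address):
    """Tier 2: rules that are easier to read as code than as regex."""
    local, _, domain = address.rpartition("@")
    if len(local) > 64 or len(address) > 254:
        return False
    # Every dot-separated domain label must be non-empty and must not
    # start or end with a hyphen.
    return all(
        label and not label.startswith("-") and not label.endswith("-")
        for label in domain.split(".")
    )


def validate(address):
    return looks_like_email(address) and passes_code_checks(address)


print(validate("jeff@example.com"))
print(validate("a@b..com"))
```

Neither tier can tell whether a mailbox actually exists; in practice that usually takes a further step outside the regex, such as sending a confirmation message.
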
Questions?

Further Resources
Mastering Regular Expressions
http://www.oreilly.com/catalog/regex/
A website, 1000’s on Google.
http://www.regular-expressions.info/
Me
http://thequeue.net/blog/
http://regexadvice.com/blogs/jschoolcraft/
[email protected]@thequeue.net

Tools
The Regulator (http://regex.osherove.com/)
Expresso (http://www.ultrapico.com/Expresso.htm)