Regular Expressions: Theory and Practice
Jeff Schoolcraft
MDCFUG 12/13/2005

Who am I?

- Jeff Schoolcraft
  - Senior Architect / Operations Manager at RGII Technologies
- Speaker at Usergroups
- President, WinProTeam Vienna Usergroup (.NET)
- TDD Evangelist
- Tool guy

What can you expect?

- “The gist” in 60 seconds or less
- Theory
- Practical Usage
- Best Practices
- Hands On
- A sermon
- Q & A

The Gist

Regular Expressions (regex) describe patterns in strings and are often used for data validation, searching and text transformations.

Theory

- Basics: A regular expression, often called a pattern, is an expression that describes a set of strings without actually listing its elements.
- Say what? The set of strings {“dog”, “bog”, “fog”} can be described by this regular expression (regex): [bdf]og
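One quick way to convince yourself that [bdf]og describes exactly that set is to test candidate strings against it. The sketch below assumes Python's re module (the talk itself isn't tied to a language) and uses fullmatch, so a string counts only if the whole string matches; the helper name in_language is hypothetical.

```python
import re
import string

# [bdf] is a character class: exactly one of b, d or f. "og" is literal text.
PATTERN = re.compile(r"[bdf]og")

def in_language(s: str) -> bool:
    """True if the whole string s is in the set the pattern describes."""
    # fullmatch anchors at both ends, so "dogs" or "hotdog" are rejected.
    return PATTERN.fullmatch(s) is not None

# Try every lowercase letter in front of "og" and keep the ones that match.
members = {c + "og" for c in string.ascii_lowercase if in_language(c + "og")}
print(sorted(members))       # ['bog', 'dog', 'fog']
print(in_language("dogs"))   # False
```

The pattern only ever matches three-character strings, so checking every possible first letter is enough to see that the set is {“dog”, “bog”, “fog”} and nothing else.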

Formal Language Theory

A regular language is any language (a set of strings) that can be described exactly by a regular expression.

Formal Language Theory (cont’d)

Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. Given a finite alphabet Σ, the following constants are defined:
- (empty set) ∅ denoting the set ∅
- (empty string) ε denoting the set {ε}
- (literal character) a in Σ denoting the set {a}

and the following operations:
- (concatenation) RS denoting the set { αβ | α in R and β in S }. For example, {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.
- (alternation) R|S denoting the set union of R and S.
- (Kleene star) R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", ... }.

http://en.wikipedia.org/wiki/Regular_expression
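Because these operators are defined on sets, they can be tried out directly on small finite sets. The Python sketch below reproduces the concatenation and Kleene star examples; the function names concat, union and kleene_star are hypothetical. R* is infinite whenever R contains a non-empty string, so kleene_star is only an approximation under an assumed length cap: it returns the members of R* up to max_len characters.

```python
from itertools import product

# The constants over an alphabet Σ, modelled as Python sets of strings.
EMPTY_SET = set()        # ∅
EMPTY_STRING = {""}      # {ε}

def literal(a: str) -> set[str]:
    return {a}           # {a}

def concat(r: set[str], s: set[str]) -> set[str]:
    # RS = { αβ | α in R and β in S }
    return {alpha + beta for alpha, beta in product(r, s)}

def union(r: set[str], s: set[str]) -> set[str]:
    # R|S is the set union of R and S
    return r | s

def kleene_star(r: set[str], max_len: int) -> set[str]:
    # R* = {ε} plus every concatenation of one or more strings from R.
    # Only strings of length <= max_len are generated, so the loop terminates.
    result = {""}
    frontier = {""}
    while frontier:
        frontier = {w + x for w in frontier for x in r
                    if len(w + x) <= max_len} - result
        result |= frontier
    return result

print(sorted(concat({"ab", "c"}, {"d", "ef"})))
# ['abd', 'abef', 'cd', 'cef']
print(sorted(kleene_star({"ab", "c"}, 3), key=lambda w: (len(w), w)))
# ['', 'c', 'ab', 'cc', 'abc', 'cab', 'ccc']
```

The constants behave as the definitions predict: concatenating with {ε} leaves a set unchanged, while concatenating with ∅ always gives ∅.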

DEMO

- All binary numbers
- All binary numbers that start and end in 1
- All binary numbers that have 00, any other bits, followed by 111 and any other bits
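One possible way to write the three demo patterns, sketched with Python's re module. The third bullet is open to more than one reading; this sketch assumes it means "contains 00, with 111 somewhere after it", which is an interpretation, not the pattern shown in the talk. fullmatch makes each pattern apply to the whole string.

```python
import re

# 1. All binary numbers: one or more 0/1 digits.
ALL_BINARY = re.compile(r"[01]+")

# 2. Start and end in 1. The optional group lets the one-digit "1" match too.
START_END_1 = re.compile(r"1(?:[01]*1)?")

# 3. (Assumed reading) any bits, 00, any bits, 111, any bits.
HAS_00_THEN_111 = re.compile(r"[01]*00[01]*111[01]*")

def matches(pattern: re.Pattern, s: str) -> bool:
    # fullmatch: the pattern must account for every character of s.
    return pattern.fullmatch(s) is not None

for s in ["1", "10", "101", "0110", "00111", "1001011101", "11100", "102"]:
    print(s, matches(ALL_BINARY, s), matches(START_END_1, s),
          matches(HAS_00_THEN_111, s))
```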

Practical Usage

In order of popularity:
- Searching
  - Much nicer than * or % wildcards
- String Manipulation
  - Parsing
  - Replacement
- Validation
  - Input validation
  - Database check constraints
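Searching, parsing, replacement and input validation each map onto a single call in most regex libraries. A minimal sketch using Python's re module; the three-line log and the ZIP-code rule are hypothetical sample data, not from the talk.

```python
import re

# Hypothetical sample data: each line has a date, a level and a message.
LOG = ("2005-12-13 ERROR disk full\n"
       "2005-12-13 INFO backup done\n"
       "2005-12-14 ERROR timeout")

# Searching: every line that mentions ERROR (MULTILINE makes ^ and $ match per line).
errors = re.findall(r"^.*ERROR.*$", LOG, flags=re.MULTILINE)

# Parsing: named capture groups split each line into fields.
LINE = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)$",
                  re.MULTILINE)
entries = [m.groupdict() for m in LINE.finditer(LOG)]

# Replacement: rewrite YYYY-MM-DD dates as MM/DD/YYYY using backreferences.
us_dates = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", LOG)

# Validation: the whole input must match, so use fullmatch.
def is_zip(s: str) -> bool:
    # US ZIP or ZIP+4, e.g. "20850" or "20850-1234" (hypothetical field rule).
    return re.fullmatch(r"\d{5}(?:-\d{4})?", s) is not None

print(errors)
print(entries[0])
print(us_dates.splitlines()[0])
print(is_zip("20850"), is_zip("2085"))
```

Note the difference between searching (findall/finditer look for matches anywhere) and validating (fullmatch rejects input with anything extra around the match).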

Best Practices

The most important thing to remember:
- Regular expressions are greedy: by default, quantifiers such as * and + match as much text as they can.
- Make the most explicit match possible.
- Just because some implementations allow the lazy quantifier *?, don't fall back on it.
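The difference shows up as soon as a pattern can match in more than one way. In this Python sketch (the sample strings are made up), .* runs to the last > it can find, an explicit character class can never cross a >, and the lazy *? only changes which match is preferred, so it can still run across a tag when the rest of the pattern demands it.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs everything up to the LAST '>' in the string.
print(re.findall(r"<.*>", html))      # ['<b>bold</b> and <i>italic</i>']

# Explicit: [^>]* can never run past a '>', so each tag matches separately.
print(re.findall(r"<[^>]*>", html))   # ['<b>', '</b>', '<i>', '</i>']

# Lazy: gives the same answer here...
print(re.findall(r"<.*?>", html))     # ['<b>', '</b>', '<i>', '</i>']

# ...but it is only a preference. When the rest of the pattern (a trailing
# '!') requires it, *? still expands across '>' characters.
text = "<b>x</b>!"
print(re.findall(r"<.*?>!", text))    # ['<b>x</b>!']
print(re.findall(r"<[^>]*>!", text))  # ['</b>!']
```

The explicit class states what a tag may contain, which is why it keeps working when the surrounding pattern changes.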

Hands On

A Sermon

Some people develop a religious fascination with new tools & technologies (design patterns, regex, whatever).

Use the tools that make the most sense for your problem/solution.

Email Validation with REGEX?

- Are you kidding me? See email.regex.
- Multi-tiered approach: a regex to test the format, some code to test the validity of the email address.
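The slide doesn't say what each tier checks, so the split below is an assumption: a deliberately loose regex for the overall shape, then plain code for rules that are clumsy to express in a pattern. The names FORMAT, format_ok, structure_ok and is_plausible_email are hypothetical; the 64/255 length limits come from RFC 5321 and the 63-character label limit from RFC 1035. Neither tier proves that the mailbox exists; that needs something like a confirmation email.

```python
import re

# Tier 1 (regex): rough shape only, local@domain.tld with no spaces and one '@'.
# This is NOT a full RFC 5322 grammar; it is meant to be permissive.
FORMAT = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")

def format_ok(addr: str) -> bool:
    return FORMAT.fullmatch(addr) is not None

# Tier 2 (code): rules that are awkward to express in one pattern.
def structure_ok(addr: str) -> bool:
    local, _, domain = addr.rpartition("@")
    if len(local) > 64 or len(domain) > 255:   # RFC 5321 size limits
        return False
    for label in domain.split("."):
        # No empty labels (e.g. "a..com"), max 63 chars, no leading/trailing '-'.
        if not 0 < len(label) <= 63 or label[0] == "-" or label[-1] == "-":
            return False
    return True

def is_plausible_email(addr: str) -> bool:
    # Cheap regex first; the code checks only run on addresses that pass it.
    return format_ok(addr) and structure_ok(addr)

for addr in ["jeff@example.com", "not an email", "a@b..com", "x@-bad.com"]:
    print(addr, is_plausible_email(addr))
```

"a@b..com" is a good illustration of the tiers working together: the loose regex accepts it, and the second tier rejects the empty domain label.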

Questions?

Further Resources

- Mastering Regular Expressions: http://www.oreilly.com/catalog/regex/
- A website, one of thousands on Google: http://www.regular-expressions.info/
- Me:
  - http://thequeue.net/blog/
  - http://regexadvice.com/blogs/jschoolcraft/
  - [email protected]

Tools

- The Regulator (http://regex.osherove.com/)
- Expresso (http://www.ultrapico.com/Expresso.htm)