
Satisfy Your Technical Curiosity

Regular Expressions

Roy Osherove

www.iserializable.com

Methodology & Team System Expert

Sela Group

www.Sela.co.il

The hidden power language


Tools

http://tools.osherove.com

www.ISerializable.com


The Log File


Developer Problem: Make this log file useful

Old log file entries from a *nix system
Converted to and from various formats
Searched by users
Format may change
Search fields can be added, removed or renamed at runtime

Date CPUs|ram|cpu HH:mm:ss action user domain.machine
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1
26/05/1998 1|00256|x86 11:02:45 [Search] Darth Cydot.Uk.Gerry2k
26/05/98 1|00256|x86 11:12:49 [Update] Darth Cydot.Uk.Gerry2k
27/05/98 1|00512|x86 15:34:30 [Search] Anakin Anterl.Anita1
12/08/1998 2|01024|x86 10:14:53 [Search] Obi Monaco.Huarez
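One possible way to pull the fields out of a log line like the ones above is a single pattern with named groups. This is only a sketch: it uses Python's `re` module as a stand-in for the .NET classes, and the group names and the `parse_line` helper are made up for illustration rather than taken from the talk.

```python
import re

# Hypothetical pattern for the sample log lines; the group names are
# illustrative and not taken from the talk.
LOG_LINE = re.compile(
    r"(?P<date>\d{2}/\d{2}/(?:\d{4}|\d{2}))\s+"      # 25/05/1998 or 25/05/98
    r"(?P<cpus>\d+)\|(?P<ram>\d+)\|(?P<cpu>\w+)\s+"  # 1|00512|x86
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"                # 21:49:12
    r"\[(?P<action>\w+)\]\s+"                        # [Search]
    r"(?P<user>\w+)\s+"                              # Anakin
    r"(?P<domain>[\w.]+)\.(?P<machine>\w+)$"         # Antler.Anita1
)

def parse_line(line):
    """Return the named fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

entry = parse_line("26/05/98 1|00256|x86 11:12:49 [Update] Darth Cydot.Uk.Gerry2k")
print(entry["user"], entry["domain"], entry["machine"])  # Darth Cydot.Uk Gerry2k
```

Because every field has a name, a renamed or reordered field only means editing the pattern, not the code that reads `entry`.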


About 15 minutes later…

Done.

About 45 minutes later…

Home early.


You can be home early too!

Regex is easier than you think


What are Regular Expressions?

A language to describe a language using “patterns”
Think SQL or XPath, but for text
Popularized by *nix tools and shell scripting, and later by Perl
Many variations and frameworks exist. Only one for .NET (for now)
Used in most languages


Common Regex Uses

Text Validation
Phones, emails, addresses or any format requirement

Text Manipulation
Transform text

Text Parsing
Find in files, site scraping, data collection


What .NET brings to the table

Full object model
Extended syntax
Optimization techniques in the framework


.NET Regular Expressions

Show up in several places:

In the classes of the System.Text.RegularExpressions namespace
Via the RegularExpressionValidator validator control (for ASP.NET)
Sprinkled in dozens of other places:
    Browser capabilities filter
    In the WSDL <match> tag
    And many more


Key Classes within System.Text.RegularExpressions

Regex
Contains the pattern and matching options

Important methods:
    IsMatch() returns boolean
    Replace() returns a string
    Split() returns a string array
    …

Main Use: Validation, Splitting, Replacing text
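The three operations can be tried out quickly in any regex engine. The sketch below uses Python's `re` module as a stand-in for the .NET `Regex` class (an assumption made for illustration; method names and some syntax details differ between the two), and the sample text is made up.

```python
import re

# Python's re module stands in for the .NET Regex class here; the sample
# pattern and text are made up for illustration.
digits = re.compile(r"\d+")
text = "order 66, squad 501"

is_match = digits.search(text) is not None   # roughly what Regex.IsMatch() answers
replaced = digits.sub("#", text)             # roughly what Regex.Replace() returns
parts = re.split(r",\s*", text)              # roughly what Regex.Split() returns

print(is_match)   # True
print(replaced)   # order #, squad #
print(parts)      # ['order 66', 'squad 501']
```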


The Process

[Diagram: Pattern, Input text and Options feed into Regex, which produces Matches, Splits, or Replaced text]


Validation


Syntax

Match exact text as written in the pattern

‘a’ will match all ‘a’ in the text.

Except for special symbols:


Enclosing Alternatives with []

The square brackets allow you to specify a list of alternate values. Used in conjunction with the - operator, you can even specify character ranges.

[Cc]          Capital or lowercase c
[A-Z]         Any capital letter A through Z
[A-Za-z]      Any capital or lowercase letter
[0-9]         Any digit 0 through 9
[A-Za-z0-9]   Any letter or digit
[0-9.+&=%-]   Any digit or special char listed

Notice: no escape needed inside the brackets, as long as a literal - comes first or last so it is not read as a range
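A short sketch of character classes in action, using Python's `re` module as a stand-in for the .NET engine. The helper name `matches_fully` and the sample strings are made up. One detail worth seeing for yourself: a hyphen between two characters is read as a range, so a literal hyphen is safest at the end of the class.

```python
import re

def matches_fully(pattern, text):
    """True when the whole text is matched by the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches_fully(r"[Cc]at", "cat"))                 # True
print(matches_fully(r"[A-Z]", "q"))                    # False
print(matches_fully(r"[A-Za-z0-9]", "7"))              # True
print(matches_fully(r"[0-9.+&=%-]+", "3.5+2-1=%"))     # True

# In Python, a hyphen placed between '+' and '&' is parsed as a reversed
# range and rejected when the pattern is compiled.
try:
    re.compile(r"[0-9.+-&=%]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
print(reversed_range_ok)                               # False
```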


Controlling Expression Frequency with {}

The {} operators allow you to control the frequency of the preceding expression. The expression takes one of these two forms:

{Occurrences} [A-Za-z]{3}

{MinOccurrences,MaxOccurrences} [A-Za-z]{1,3}
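The two forms can be checked against a few made-up strings. This sketch uses Python's `re.fullmatch`, so each pattern has to cover the whole string; the helper name `whole` is hypothetical.

```python
import re

def whole(pattern, text):
    """True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

exactly_three = {s: whole(r"[A-Za-z]{3}", s) for s in ["ab", "abc", "abcd"]}
one_to_three = {s: whole(r"[A-Za-z]{1,3}", s) for s in ["a", "abc", "abcd"]}

print(exactly_three)  # {'ab': False, 'abc': True, 'abcd': False}
print(one_to_three)   # {'a': True, 'abc': True, 'abcd': False}
```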


Basic Frequency Operators

?   0 or 1
*   0 or more
+   1 or more

So, 3+ will match 3, 33, 3333 but not 45, 678.
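The same claims can be verified with a few lines of Python (used here as a stand-in for the .NET engine). The `colou?r` and `ab*` patterns are extra, made-up examples for the other two operators.

```python
import re

def whole(pattern, text):
    """True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# + means one or more of the preceding expression
threes = {s: whole(r"3+", s) for s in ["3", "33", "3333", "45", "678"]}
print(threes)  # {'3': True, '33': True, '3333': True, '45': False, '678': False}

# ? means zero or one, * means zero or more
print(whole(r"colou?r", "color"), whole(r"colou?r", "colour"))  # True True
print(whole(r"ab*", "a"), whole(r"ab*", "abbb"))                # True True
```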


Wildcard Operator: .

. matches any non-newline character
Unless single-line mode (RegexOptions.Singleline) has been turned on for the pattern

Examples:

A.$ would match a capital A followed by exactly one character at the end of the text.
Will not match Abc

A.+ would match a capital A followed by one or more non-newline characters

\.htm.? would match ".htm" followed by an optional non-newline character

Backslash == escape for characters that have reserved meanings in regular expressions
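A quick sketch of the wildcard examples, again with Python's `re` module standing in for .NET. The sample strings are made up. Python's `re.DOTALL` flag is the counterpart of .NET's `RegexOptions.Singleline`: it is the switch that lets `.` match a newline as well.

```python
import re

print(re.search(r"A.$", "A1") is not None)        # True
print(re.search(r"A.$", "Abc") is not None)       # False: two characters follow the A
print(re.search(r"A.+", "Abc").group())           # Abc
print(re.search(r"\.htm.?", "index.html").group())  # .html
print(re.search(r"\.htm.?", "page.htm").group())    # .htm

# By default the dot stops at a newline; DOTALL lifts that restriction.
dot_default = re.search(r"A.B", "A\nB") is not None
dot_all = re.search(r"A.B", "A\nB", re.DOTALL) is not None
print(dot_default, dot_all)                       # False True
```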


Convenience Expressions

\d   Any digit
\D   Any non-digit (it must still match one character, just not a digit)
\s   Any whitespace character (such as a space or tab)
\S   Any character other than a whitespace character
\w   Any letter, digit, or underscore
\W   Any character other than a letter, digit, or underscore

Many more: Unicode, hex values, negative lookarounds…
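A few made-up one-liners show the shorthand classes at work. Python's `re` module is used as a stand-in here; the shorthands behave the same way for these ASCII samples.

```python
import re

numbers = re.findall(r"\d+", "21:49:12")            # runs of digits
words = re.findall(r"\w+", "user_1 logged-in")      # \w includes the underscore
fields = re.split(r"\s+", "a  b\tc")                # split on any whitespace run
only_digits = re.sub(r"\D", "", "(555) 123-4567")   # drop every non-digit

print(numbers)      # ['21', '49', '12']
print(words)        # ['user_1', 'logged', 'in']
print(fields)       # ['a', 'b', 'c']
print(only_digits)  # 5551234567
```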


Quick Quiz!

[A-Za-z]{3}
3 capital or lowercase letters
Abc, abc, aBC, 1bc

[A-Z][a-z]{2,4}
A capital letter followed by at least 2 but not more than 4 lowercase letters
Abc, Acbde, abcde, ABcde

\w{3,8}\.\w{3}
3 to 8 alphanumeric characters, followed by a dot and 3 alphanumerics
Filename.txt, d0main.com, 1234.567, 34.456
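The quiz can be graded mechanically. The sketch below assumes each candidate must match in full (Python's `re.fullmatch`); the `QUIZ` table and `grade` helper are made up for illustration.

```python
import re

QUIZ = {
    r"[A-Za-z]{3}": ["Abc", "abc", "aBC", "1bc"],
    r"[A-Z][a-z]{2,4}": ["Abc", "Acbde", "abcde", "ABcde"],
    r"\w{3,8}\.\w{3}": ["Filename.txt", "d0main.com", "1234.567", "34.456"],
}

def grade(quiz):
    """Map each pattern to the candidates it matches in full."""
    return {
        pattern: [c for c in candidates if re.fullmatch(pattern, c)]
        for pattern, candidates in quiz.items()
    }

answers = grade(QUIZ)
for pattern, matched in answers.items():
    print(pattern, "->", matched)
```

The whole-string assumption matters: an unanchored search for `[A-Z][a-z]{2,4}` would also accept "ABcde", because the substring "Bcde" fits the pattern. Anchoring with ^ and $ avoids that surprise.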


Splitting and Manipulating


The Spammer


Key Classes within System.Text.RegularExpressions (2)

MatchCollection - Match
MatchCollection stores all the matches found

GroupCollection - Group
CaptureCollection - Capture

Regex.Match() returns Match
Regex.Matches() returns MatchCollection
…

Main Use: Parsing, searching, collecting data
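The difference between asking for one match and asking for all of them can be sketched as follows. Python's `re` module is a stand-in for the .NET classes here: `search` plays the role of `Regex.Match()` and `finditer` the role of `Regex.Matches()`. The two log lines come from the sample log file; the pattern is an illustrative choice.

```python
import re

log = (
    "25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1\n"
    "25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1\n"
)
time_pattern = re.compile(r"\d{2}:\d{2}:\d{2}")

first = time_pattern.search(log)                   # first match only
all_matches = list(time_pattern.finditer(log))     # every match, in order

print(first.group())                       # 21:49:12
print([m.group() for m in all_matches])    # ['21:49:12', '21:51:15']
```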


Simple parsing

Parsing for emails


Grouping (the coolest part)


Grouping (pay attention!)

Groups give us object models

HTML [email protected]

Create a capture hierarchy and use it in code

[\w\.\-]+@[\w\.\-]+\.\w{2,5}

(?<userName>[\w\.\-]+)@(?<domain>[\w\.\-]+\.\w{2,5})
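Here is a minimal sketch of reading the named groups back in code. It uses Python's `re` module as a stand-in, and Python spells named groups `(?P<name>...)` where .NET uses `(?<name>...)`. The addresses are made-up examples.

```python
import re

# Same pattern as above, with Python's (?P<name>...) spelling for named groups.
EMAIL = re.compile(r"(?P<userName>[\w\.\-]+)@(?P<domain>[\w\.\-]+\.\w{2,5})")

text = "Contact luke@example.com or leia@rebels.example.org for details."

found = [(m.group("userName"), m.group("domain")) for m in EMAIL.finditer(text)]
for user_name, domain in found:
    print(user_name, domain)
# luke example.com
# leia rebels.example.org
```

Each match carries its own groups, so the calling code never has to split the address on "@" by hand.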


Grouping Emails & The Regulator


Regulazy

Build simple expressions by example
No syntax knowledge needed
Free
Tools.osherove.com


When not to use Regex

When it’s easier and more readable to do it otherwise
Not just because it’s “cool”
Hard to read
Steep learning curve
Hard to maintain

“Sometimes, when confronted with a problem, you might decide to solve it with Regular Expressions for the wrong reasons. Now you’ve got two problems.”
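A small, made-up illustration of the "easier otherwise" case: checking a file extension. Both lines below give the same answer for this input, but the plain string method needs no escaping and no explanation. Python is used as a stand-in; .NET strings offer a comparable EndsWith method.

```python
import re

filename = "report.txt"

with_regex = re.search(r"\.txt$", filename) is not None   # works, but needs decoding
with_string_method = filename.endswith(".txt")            # says what it means

print(with_regex, with_string_method)  # True True
```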


Summary

Amazing parsing flexibility
Good skill to have anywhere
Can save you time and nerves
With power comes responsibility
Weigh the pros and cons before using


Resources

The Regulator: tools.osherove.com
Regulazy: tools.osherove.com
Regexlib.com: Regex archive (http://www.regexlib.com) + Cheat Sheet
http://www.regular-expressions.info

Roy Osherove: [email protected]: www.iserializable.com


Thank you!

Questions?

Roy Osherove: [email protected]: www.iserializable.com
