Beneath the Surface: Regular Expressions in Ruby

DESCRIPTION

Many of us approach regular expressions with a certain fear and trepidation, using them only when absolutely necessary. We can get by when we need to use them, but we hesitate to dive any deeper into their cryptic world. Ruby has so much more to offer us. This talk showcases the incredible power of Ruby and the Oniguruma regex library Ruby runs on. It takes you on a journey beneath the surface, exploring the beauty, elegance, and power of regular expressions. You will discover the flexible, dynamic, and eloquent ways to harness this beauty and power in your own code.

Citation preview

Page 1: Beneath the Surface: Regular Expressions in Ruby

Photo By Mr. Christopher Thomas, Creative Commons Attribution-ShareAlike 2.0 Generic License

Beneath the Surface

Embracing the True Power of Regular Expressions in Ruby

@nellshamrell

Page 2: Beneath the Surface: Regular Expressions in Ruby

^4[0-9]{12}(?:[0-9]{3})?$

Source: regular-expressions.info
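
Read piece by piece, the pattern above checks for a number that starts with 4 and has either 13 or 16 digits in total (the shape of a Visa card number). The sketch below is not from the slides: it rewrites the same pattern in Ruby's free-spacing mode (`/x`) so each piece can carry a comment. The constant `VISA_NUMBER` and the helper `visa_like?` are hypothetical names used only for illustration.

```ruby
# The slide's pattern, spread over several lines with the /x flag.
# In /x mode the engine ignores whitespace and "#" comments inside the pattern.
VISA_NUMBER = /
  ^              # start of line
  4              # a literal 4
  [0-9]{12}      # exactly twelve more digits (13 so far)
  (?:[0-9]{3})?  # optionally three more digits (16 total), not captured
  $              # end of line
/x

# Hypothetical helper: true when the whole string has the expected shape.
def visa_like?(digits)
  !VISA_NUMBER.match(digits).nil?
end

puts visa_like?("4111111111111111")  # 16 digits starting with 4
puts visa_like?("4222222222222")     # 13 digits starting with 4
puts visa_like?("5111111111111111")  # starts with 5
```

This only checks the shape of the string; it says nothing about whether a card number is real.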

Page 3: Beneath the Surface: Regular Expressions in Ruby

We fear what we do not understand

Page 5: Beneath the Surface: Regular Expressions in Ruby

Regular Expressions

+ Ruby

Photo By Shayan, Creative Commons Attribution-ShareAlike 2.0 Generic License

Page 6: Beneath the Surface: Regular Expressions in Ruby

Regex Matching in Ruby

Ruby Methods

Onigmo

Page 7: Beneath the Surface: Regular Expressions in Ruby

Onigmo

Page 8: Beneath the Surface: Regular Expressions in Ruby

Oniguruma

Fork

Onigmo

Page 9: Beneath the Surface: Regular Expressions in Ruby

Onigmo

Reads Regex

Page 10: Beneath the Surface: Regular Expressions in Ruby

Onigmo

Reads Regex

Parses Into

Abstract Syntax Tree

Page 11: Beneath the Surface: Regular Expressions in Ruby

Onigmo

Reads Regex

Parses Into

Abstract Syntax Tree

Compiles Into

Series of Instructions
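
Ruby does not expose Onigmo's syntax tree or compiled instructions, so the pipeline above cannot be inspected directly from Ruby code. What can be observed is that parsing happens when the Regexp object is created, before any string is matched. A minimal sketch, assuming a hypothetical helper named `compile_pattern`:

```ruby
# Build a Regexp at runtime. If the engine cannot parse the pattern,
# Regexp.new raises RegexpError before any matching happens.
def compile_pattern(source)
  Regexp.new(source)
rescue RegexpError => e
  warn "could not parse #{source.inspect}: #{e.message}"
  nil
end

good = compile_pattern("Y(olk|oda)")
bad  = compile_pattern("Y(olk|oda")   # unbalanced parenthesis

puts good.inspect
puts bad.inspect
```

The second call returns nil because the pattern never gets past the parsing step.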

Page 13: Beneath the Surface: Regular Expressions in Ruby

A Finite State Machine Shows How Something Works

Page 14: Beneath the Surface: Regular Expressions in Ruby

Annie the Dog

Page 15: Beneath the Surface: Regular Expressions in Ruby

In the House

Out of House

Annie the Dog

Page 16: Beneath the Surface: Regular Expressions in Ruby

In the House

Out of House

Annie the Dog

Door

Page 17: Beneath the Surface: Regular Expressions in Ruby

In the House

Out of House

Annie the Dog

Door

Door
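
The Annie example can be written down as a tiny state machine. This is a sketch for illustration only: the state names, the `TRANSITIONS` table, and the `next_state` helper are assumptions, not code from the talk.

```ruby
# A two-state machine for Annie: going through the door flips her state.
TRANSITIONS = {
  in_the_house: { door: :out_of_house },
  out_of_house: { door: :in_the_house }
}.freeze

# Returns the next state, or raises KeyError if the event
# is not allowed in the current state.
def next_state(state, event)
  TRANSITIONS.fetch(state).fetch(event)
end

state = :in_the_house
state = next_state(state, :door)
puts state
state = next_state(state, :door)
puts state
```

The machine is "finite" because the table lists every state and every allowed transition up front.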

Page 18: Beneath the Surface: Regular Expressions in Ruby

Finite State Machine

Page 21: Beneath the Surface: Regular Expressions in Ruby

Multiple States

Page 22: Beneath the Surface: Regular Expressions in Ruby

/force/

Page 23: Beneath the Surface: Regular Expressions in Ruby

re = /force/
string = "Use the force"
re.match(string)

Page 24: Beneath the Surface: Regular Expressions in Ruby

[State diagram: f, o, r, c, e]

/force/

“Use the force”

Path Doesn’t Match

Page 25: Beneath the Surface: Regular Expressions in Ruby

Still Doesn’t Match

Page 26: Beneath the Surface: Regular Expressions in Ruby

Path Matches!

(Fast Forward)

Page 31: Beneath the Surface: Regular Expressions in Ruby

We Have A Match!

Page 32: Beneath the Surface: Regular Expressions in Ruby

re = /force/
string = "Use the force"
re.match(string)
=> #<MatchData "force">
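
The walk through the string can be imitated in plain Ruby. This is a simplified sketch of the idea for a literal pattern, not how Onigmo is implemented; `first_match_index` is a hypothetical helper.

```ruby
# Try the literal pattern at each start position, left to right, and return
# the index of the first position where every character lines up.
def first_match_index(pattern, string)
  (0..string.length - pattern.length).each do |start|
    return start if string[start, pattern.length] == pattern
  end
  nil
end

string = "Use the force"
puts first_match_index("force", string)

# The real engine reports the same position through MatchData.
m = /force/.match(string)
puts m[0]
puts m.begin(0)
puts m.pre_match.inspect
```

`MatchData#begin(0)` gives the offset where the whole match starts, and `pre_match` returns everything the engine stepped past before the path matched.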

Page 34: Beneath the Surface: Regular Expressions in Ruby

/Y(olk|oda)/

Pipe

Page 35: Beneath the Surface: Regular Expressions in Ruby

re = /Y(olk|oda)/
string = "Yoda"
re.match(string)

Page 36: Beneath the Surface: Regular Expressions in Ruby

[State diagram: Y, then a branch to either o, l, k or o, d, a]

/Y(olk|oda)/

“Yoda”

Page 37: Beneath the Surface: Regular Expressions in Ruby

Which To Choose?

Page 38: Beneath the Surface: Regular Expressions in Ruby

Saves To Backtrack Stack

Page 39: Beneath the Surface: Regular Expressions in Ruby

Uh Oh, No Match

Page 40: Beneath the Surface: Regular Expressions in Ruby

Backtracks To Here

Page 43: Beneath the Surface: Regular Expressions in Ruby

We Have A Match!

Page 44: Beneath the Surface: Regular Expressions in Ruby

re = /Y(olk|oda)/
string = "Yoda"
re.match(string)
=> #<MatchData "Yoda" 1:"oda">
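
The backtrack stack can be imitated with an ordinary array. The sketch below is much coarser than the real engine, which works one character at a time and can start a match anywhere in the string; here whole alternatives are tried only at the start of the string. `match_prefix_with_alternatives` is a hypothetical helper.

```ruby
# Sketch only: match a fixed prefix followed by one of several alternatives,
# keeping untried alternatives on an explicit stack.
def match_prefix_with_alternatives(prefix, alternatives, string)
  return nil unless string.start_with?(prefix)

  rest = string[prefix.length..-1]
  backtrack_stack = alternatives.reverse   # first alternative sits on top

  until backtrack_stack.empty?
    candidate = backtrack_stack.pop        # take the next saved choice
    return prefix + candidate if rest.start_with?(candidate)
    # No match on this path: loop again, i.e. backtrack to the next choice.
  end
  nil
end

puts match_prefix_with_alternatives("Y", %w[olk oda], "Yoda").inspect

# The real engine reports the same overall match, plus the captured group.
m = /Y(olk|oda)/.match("Yoda")
puts m[0].inspect
puts m[1].inspect
```

The parentheses in the pattern also capture, which is why the MatchData carries the alternative that finally matched in group 1.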

Page 46: Beneath the Surface: Regular Expressions in Ruby

/No+/

Plus Quantifier

Page 47: Beneath the Surface: Regular Expressions in Ruby

re = /No+/
string = "Noooo"
re.match(string)

Page 48: Beneath the Surface: Regular Expressions in Ruby

[State diagram: N, then o with a loop back to o]

/No+/

“Noooo”

Page 50: Beneath the Surface: Regular Expressions in Ruby

Return Match? Or Keep Looping?

Page 51: Beneath the Surface: Regular Expressions in Ruby

Greedy Quantifier

Keeps Looping

Page 52: Beneath the Surface: Regular Expressions in Ruby

Greedy quantifiers match as much as possible

Page 53: Beneath the Surface: Regular Expressions in Ruby

Greedy quantifiers use maximum effort for maximum return

Page 56: Beneath the Surface: Regular Expressions in Ruby

We Have A Match!

Page 57: Beneath the Surface: Regular Expressions in Ruby

re = /No+/
string = "Noooo"
re.match(string)
=> #<MatchData "Noooo">
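
A few more inputs make the greedy behavior easier to see. The extra strings are illustrative additions, not from the slides.

```ruby
re = /No+/

# + means "one or more", and by default it is greedy:
# it keeps consuming "o" characters for as long as it can.
puts re.match("Noooo")[0]

# It stops only when the next character is not an "o".
long_no = "N" + ("o" * 7) + " way"
puts re.match(long_no)[0]

# At least one "o" is required, so a bare "N" does not match.
puts re.match("N").inspect
```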

Page 58: Beneath the Surface: Regular Expressions in Ruby

Lazy Quantifiers

Page 59: Beneath the Surface: Regular Expressions in Ruby

Lazy quantifiers match as little as possible

Page 60: Beneath the Surface: Regular Expressions in Ruby

Lazy quantifiers use minimum effort for minimum return

Page 61: Beneath the Surface: Regular Expressions in Ruby

/No+?/

Makes Quantifier Lazy

Page 62: Beneath the Surface: Regular Expressions in Ruby

re = /No+?/
string = "Noooo"
re.match(string)

Page 63: Beneath the Surface: Regular Expressions in Ruby

[State diagram: N, then o with a loop back to o]

/No+?/

“Noooo”

Page 65: Beneath the Surface: Regular Expressions in Ruby

Return Match? Or Keep Looping?

Page 66: Beneath the Surface: Regular Expressions in Ruby

We Have A Match!

Page 67: Beneath the Surface: Regular Expressions in Ruby

re = /No+?/
string = "Noooo"
re.match(string)
=> #<MatchData "No">
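
Putting the greedy and lazy forms side by side shows the difference on the same input. The third pattern, `/No+?!/`, is an illustrative addition: it shows that a lazy quantifier still takes more characters when the rest of the pattern cannot match otherwise.

```ruby
string = "Noooo"

greedy = /No+/.match(string)[0]    # takes every "o" it can
lazy   = /No+?/.match(string)[0]   # stops after the first "o"

puts greedy
puts lazy

# A lazy quantifier still takes more when the rest of the pattern needs it:
# here the "!" can only match after all four "o" characters are consumed.
puts(/No+?!/.match("Noooo!")[0])
```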

Page 68: Beneath the Surface: Regular Expressions in Ruby

Greedy quantifiers are greedy but reasonable

Page 69: Beneath the Surface: Regular Expressions in Ruby

/.*moon/

Star Quantifier

Page 70: Beneath the Surface: Regular Expressions in Ruby

re = /.*moon/
string = "That's no moon"
re.match(string)

Page 71: Beneath the Surface: Regular Expressions in Ruby

[State diagram: . with a loop back to itself, then m, o, o, n]

/.*moon/

“That’s no moon”

Page 73: Beneath the Surface: Regular Expressions in Ruby

Loops

Page 74: Beneath the Surface: Regular Expressions in Ruby

Which To Match?

(Fast Forward)

Page 75: Beneath the Surface: Regular Expressions in Ruby

Keeps Looping

Page 78: Beneath the Surface: Regular Expressions in Ruby

No More Characters?

Page 79: Beneath the Surface: Regular Expressions in Ruby

Backtrack or Fail?

Page 80: Beneath the Surface: Regular Expressions in Ruby

Backtracks

Page 83: Beneath the Surface: Regular Expressions in Ruby

Backtracks

Huzzah!

Page 87: Beneath the Surface: Regular Expressions in Ruby

We Have A Match!

Page 88: Beneath the Surface: Regular Expressions in Ruby

re = /.*moon/
string = "That's no moon"
re.match(string)
=> #<MatchData "That's no moon">
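
Capture groups make the result of the backtracking visible. The groups are an addition for illustration; the slide's pattern has none.

```ruby
string = "That's no moon"

# .* first runs to the end of the string, then gives characters back
# one at a time until "moon" can match.
m = /(.*)(moon)/.match(string)

puts m[0].inspect  # the whole match
puts m[1].inspect  # what .* kept after backtracking
puts m[2].inspect  # what was given back so "moon" could match
```

The star is greedy, but it gives up just enough of the string for the overall match to succeed.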

Page 89: Beneath the Surface: Regular Expressions in Ruby

Backtracking = Slow

Page 90: Beneath the Surface: Regular Expressions in Ruby

/No+w+/

Page 91: Beneath the Surface: Regular Expressions in Ruby

re = /No+w+/
string = "Noooo"
re.match(string)

Page 92: Beneath the Surface: Regular Expressions in Ruby

[State diagram: N, then o with a loop back to o, then w with a loop back to w]

/No+w+/

“Noooo”

Page 94: Beneath the Surface: Regular Expressions in Ruby

Loops

Page 97: Beneath the Surface: Regular Expressions in Ruby

Uh Oh

Page 98: Beneath the Surface: Regular Expressions in Ruby

Uh Oh

Backtrack or Fail?

Page 99: Beneath the Surface: Regular Expressions in Ruby

Backtracks

Page 102: Beneath the Surface: Regular Expressions in Ruby

Match FAILS
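
The cost of that failure can be counted with a small simulation. This is a sketch of the idea only, not Onigmo's algorithm: it looks at a match starting at the first character and counts how many times "w" is tried. `simulate_no_w` is a hypothetical helper.

```ruby
# Sketch only: mimic how a backtracking engine handles /No+w+/ when the
# match starts at the first character. Returns [matched, attempts], where
# attempts counts how many times the engine tried to match "w".
def simulate_no_w(string)
  return [false, 0] unless string.start_with?("N")

  o_count = string[1..-1][/\Ao*/].length   # greedy o+ takes every "o" it can
  attempts = 0

  o_count.downto(1) do |kept|              # give back one "o" per failure
    attempts += 1
    rest = string[(1 + kept)..-1]
    return [true, attempts] if rest.start_with?("w")
  end
  [false, attempts]
end

puts simulate_no_w("Noooo").inspect
puts simulate_no_w("Noooow").inspect

# The real engine agrees on the outcome.
puts(/No+w+/.match("Noooo").inspect)
puts(/No+w+/.match("Noooow")[0])
```

In this sketch the failing input costs one attempt per "o", because each "o" is handed back in turn before the engine gives up.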

Page 103: Beneath the Surface: Regular Expressions in Ruby

Possessive Quantifiers

Page 104: Beneath the Surface: Regular Expressions in Ruby

Possessive quantifiers do not backtrack

Page 105: Beneath the Surface: Regular Expressions in Ruby

Makes Quantifier Possessive

/No++w+/

Page 106: Beneath the Surface: Regular Expressions in Ruby

[State diagram: N, then o with a loop back to o, then w with a loop back to w]

/No++w+/

“Noooo”

Page 108: Beneath the Surface: Regular Expressions in Ruby

Loops

Page 112: Beneath the Surface: Regular Expressions in Ruby

Loops

Uh Oh

Backtrack or Fail?

Page 113: Beneath the Surface: Regular Expressions in Ruby

Match FAILS

Page 114: Beneath the Surface: Regular Expressions in Ruby

Possessive quantifiers fail faster by controlling backtracking
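
Because a possessive quantifier never gives characters back, it can also change whether a pattern matches at all. The patterns `/No+o/` and `/No++o/` below are illustrative additions that make the difference visible; the last two lines use the slide's own patterns.

```ruby
string = "Noooo"

# Greedy: o+ takes all four "o"s, then gives one back so the final "o" can match.
puts(/No+o/.match(string).inspect)

# Possessive: o++ takes all four "o"s and never gives any back,
# so nothing is left for the final "o" and the match fails.
puts(/No++o/.match(string).inspect)

# The slide's pattern fails either way; the possessive form skips the retries.
puts(/No+w+/.match(string).inspect)
puts(/No++w+/.match(string).inspect)
```

A possessive quantifier is therefore a good fit only when giving characters back could never help the match.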

Page 118: Beneath the Surface: Regular Expressions in Ruby

snake_case to CamelCase

Page 119: Beneath the Surface: Regular Expressions in Ruby

Find first letter of string and capitalize it

snake_case to CamelCase

Page 120: Beneath the Surface: Regular Expressions in Ruby

Find first letter of string and capitalize it

Find any character that follows an underscore and capitalize it

snake_case to CamelCase

Page 121: Beneath the Surface: Regular Expressions in Ruby

Find first letter of string and capitalize it

Find any character that follows an underscore and capitalize it

Remove underscores

snake_case to CamelCase

Page 122: Beneath the Surface: Regular Expressions in Ruby

Find first letter of string and capitalize it

snake_case to CamelCase

Page 123: Beneath the Surface: Regular Expressions in Ruby

case_converter_spec.rb

before(:each) do
  @case_converter = CaseConverter.new
end

it "capitalizes the first letter" do
  result = @case_converter.upcase_chars("method")
  result.should == "Method"
end

Page 126: Beneath the Surface: Regular Expressions in Ruby

/^/

Anchors Match To Beginning Of String (in Ruby, ^ matches at the start of every line; \A matches only at the start of the string)

Page 127: Beneath the Surface: Regular Expressions in Ruby

/^\w/

Matches Any Word Character

Page 128: Beneath the Surface: Regular Expressions in Ruby

case_converter.rb

def upcase_chars(string)
  re = /^\w/
  string.gsub(re){|char| char.upcase}
end

Page 130: Beneath the Surface: Regular Expressions in Ruby

Spec Passes!

Page 131: Beneath the Surface: Regular Expressions in Ruby

case_converter_spec.rb

it "capitalizes the first letter" do
  result = @case_converter.upcase_chars("_method")
  result.should == "_Method"
end

Page 133: Beneath the Surface: Regular Expressions in Ruby

Spec Fails!

Page 134: Beneath the Surface: Regular Expressions in Ruby

Spec Failure:

Expected: "_Method"
Got: "_method"

Page 135: Beneath the Surface: Regular Expressions in Ruby

Problem: Matches Letters AND Underscores

/^\w/
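
The failure is easy to reproduce outside the spec. A short sketch using only the pattern and inputs from the slides:

```ruby
# \w covers letters, digits, and the underscore.
puts "_".match(/\w/).inspect

# So /^\w/ grabs the leading underscore, and upcasing it changes nothing.
puts "_method".gsub(/^\w/) { |char| char.upcase }

# A string that starts with a letter still works as expected.
puts "method".gsub(/^\w/) { |char| char.upcase }
```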

Page 136: Beneath the Surface: Regular Expressions in Ruby

/^[a-z]/

Matches Only Lowercase Letters

Page 137: Beneath the Surface: Regular Expressions in Ruby

/^[^a-z][a-z]/

Matches everything BUT lowercase letters

Page 138: Beneath the Surface: Regular Expressions in Ruby

/^[^a-z]?[a-z]/

Makes Character Class Optional

Page 139: Beneath the Surface: Regular Expressions in Ruby

case_converter.rb

def upcase_chars(string)
  re = /^[^a-z]?[a-z]/
  string.gsub(re){|char| char.upcase}
end

Page 140: Beneath the Surface: Regular Expressions in Ruby

Spec Passes!

Page 141: Beneath the Surface: Regular Expressions in Ruby

Find any character that follows an underscore and capitalize it

snake_case to CamelCase

Page 142: Beneath the Surface: Regular Expressions in Ruby

case_converter_spec.rb

it "capitalizes letters after an underscore" do
  result = @case_converter.upcase_chars("some_method")
  result.should == "Some_Method"
end

Page 144: Beneath the Surface: Regular Expressions in Ruby

/^[^a-z]?[a-z]/

Page 145: Beneath the Surface: Regular Expressions in Ruby

/^[^a-z]?[a-z]|[a-z]/

Pipe For Alternation

Page 146: Beneath the Surface: Regular Expressions in Ruby

/^[^a-z]?[a-z]|(?<=_)[a-z]/

Look Behind

Page 147: Beneath the Surface: Regular Expressions in Ruby

case_converter.rb

def upcase_chars(string)
  re = /^[^a-z]?[a-z]|(?<=_)[a-z]/
  string.gsub(re){|char| char.upcase}
end

Page 148: Beneath the Surface: Regular Expressions in Ruby

Spec Passes!
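
The lookbehind is what keeps the underscore out of the match. A short sketch that isolates that piece of the pattern; the variable name `after_underscore` is an assumption for illustration.

```ruby
# (?<=_) is a lookbehind: it requires an underscore just before the
# current position but does not include that underscore in the match.
after_underscore = /(?<=_)[a-z]/

puts "some_method".scan(after_underscore).inspect
puts "some_method".gsub(after_underscore) { |char| char.upcase }

# Without the lookbehind, the underscore is part of the match.
puts "some_method".scan(/_[a-z]/).inspect
```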

Page 149: Beneath the Surface: Regular Expressions in Ruby

Remove underscores

snake_case to CamelCase

Page 150: Beneath the Surface: Regular Expressions in Ruby

case_converter_spec.rb

it "removes underscores" do
  result = @case_converter.rmv_underscores("some_method")
  result.should == "somemethod"
end

Page 153: Beneath the Surface: Regular Expressions in Ruby

/_/

Matches An Underscore

Page 154: Beneath the Surface: Regular Expressions in Ruby

case_converter.rb

def rmv_underscores(string)
  re = /_/
  string.gsub(re, "")
end

Page 156: Beneath the Surface: Regular Expressions in Ruby

Spec Passes!

Page 157: Beneath the Surface: Regular Expressions in Ruby

Combine results of two methods

snake_case to CamelCase

Page 158: Beneath the Surface: Regular Expressions in Ruby

case_converter_spec.rb

it "converts snake_case to CamelCase" do
  result = @case_converter.snake_to_camel("some_method")
  result.should == "SomeMethod"
end

Page 161: Beneath the Surface: Regular Expressions in Ruby

case_converter.rb

def snake_to_camel(string)
  upcase_chars(string)
end

Page 162: Beneath the Surface: Regular Expressions in Ruby

case_converter.rb

def snake_to_camel(string)
  rmv_underscores(upcase_chars(string))
end

Page 163: Beneath the Surface: Regular Expressions in Ruby

Spec Passes!
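
Assembled from the slides, the whole converter fits in one small class. This is a reconstruction under assumptions: the slides show the three methods separately and never show the class body, and the `puts` calls stand in for the specs so the sketch runs without RSpec.

```ruby
# A plain-Ruby version of the CaseConverter built up in the slides.
class CaseConverter
  # Upcase the first lowercase letter of the string (allowing one leading
  # non-letter such as "_") and any lowercase letter that follows an underscore.
  def upcase_chars(string)
    re = /^[^a-z]?[a-z]|(?<=_)[a-z]/
    string.gsub(re) { |chars| chars.upcase }
  end

  # Delete every underscore.
  def rmv_underscores(string)
    string.gsub(/_/, "")
  end

  # Capitalize first, then strip underscores; the lookbehind in
  # upcase_chars needs the underscores to still be present.
  def snake_to_camel(string)
    rmv_underscores(upcase_chars(string))
  end
end

converter = CaseConverter.new
puts converter.upcase_chars("_method")
puts converter.upcase_chars("some_method")
puts converter.rmv_underscores("some_method")
puts converter.snake_to_camel("some_method")
```

The order of the two calls in `snake_to_camel` matters: removing the underscores first would leave nothing for the lookbehind to find.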

Page 166: Beneath the Surface: Regular Expressions in Ruby

Develop regular expressions in small pieces
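
One way to keep the pieces small in code is to name each one and combine them afterwards. The sketch below uses `Regexp.union`, which joins patterns with alternation; the constant names are assumptions for illustration, and the talk itself writes the combined pattern by hand.

```ruby
# Each piece can be tested on its own before the pieces are combined.
FIRST_LETTER     = /^[^a-z]?[a-z]/   # first lowercase letter, optional leading non-letter
AFTER_UNDERSCORE = /(?<=_)[a-z]/     # lowercase letter right after an underscore

# Regexp.union builds a pattern that matches either piece.
UPCASE_TARGETS = Regexp.union(FIRST_LETTER, AFTER_UNDERSCORE)

puts "some_method".gsub(FIRST_LETTER) { |chars| chars.upcase }
puts "some_method".gsub(AFTER_UNDERSCORE) { |chars| chars.upcase }
puts "some_method".gsub(UPCASE_TARGETS) { |chars| chars.upcase }
```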

Page 168: Beneath the Surface: Regular Expressions in Ruby

If you write code, you can write regular expressions

Page 169: Beneath the Surface: Regular Expressions in Ruby

Move beyond the fear