Copyright © 2008-2015 Curt Hill Regular Expressions Providing a Search Pattern



Regular Expressions

Providing a Search Pattern


What are they?
• A special text pattern for describing a search pattern
• This text pattern allows special sequences to have special meaning
• Any other characters may just appear in the searched string


Specials
• The special characters include
  – [ ]\^*$.?+(){}
  – The braces may be literal or special depending on their usage
• Any other character just matches itself
• Thus
    Hello
  as a pattern just matches the obvious string
• Since many of these characters are valuable in strings, the escape is used to match them


Escape
• The backslash character is the escape
• Thus to look for an asterisk (a special) in a string it must be escaped: \*
  – This allows a search to find the asterisk
• The C family uses some of the same escape sequences:
  – \n newline or linefeed
  – \t tab
  – \r carriage return
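
A minimal JavaScript sketch of the escape, borrowing the slash notation and the string search method that the later slides introduce. The sample strings and variable names are made up for illustration.

```javascript
// \* means "a literal asterisk" rather than "repeat the previous item".
const starPattern = /\*/;
const starIndex = "a*b".search(starPattern);
console.log(starIndex);   // 1

// \t inside a pattern matches a tab character.
const tabPattern = /\t/;
const tabIndex = "col1\tcol2".search(tabPattern);
console.log(tabIndex);    // 4
```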


Coded escapes
• An x and two hexadecimal digits may also follow the backslash
• Thus
    \x4E
  gives the character with hexadecimal value 4E (an N in ASCII)
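
A short sketch of the coded escape in JavaScript. The test strings are illustrative only.

```javascript
// \x4E is the character with hexadecimal code 4E, an uppercase N.
const hexN = /\x4E/;
const hexResults = [
  hexN.test("North"),   // true: contains an uppercase N
  hexN.test("north"),   // false: lowercase n has code 6E
];
console.log(hexResults);
```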


Positioning
• There are two specials that force a position
• ^ matches the beginning of the line
• $ matches the end of the line
• Both of these match a position rather than a character
• Without these a pattern could match anywhere within a string


Positioning examples
• The pattern:
    ^Hi
  will match any line that starts with the two characters H and i
• The pattern:
    ,$
  will match any line that ends with a comma
• The pattern:
    ^Hello$
  will match only a line that has Hello as its only content
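
The three patterns above can be tried in JavaScript as a rough sketch. Each test string here is a single line, so the beginning and end of the line coincide with the beginning and end of the string. The names are illustrative.

```javascript
const startsWithHi = /^Hi/;
const endsWithComma = /,$/;
const onlyHello = /^Hello$/;

const positionChecks = {
  hiThere: startsWithHi.test("Hi there"),      // true
  sayHi: startsWithHi.test("Say Hi"),          // false: Hi is not at the start
  comma: endsWithComma.test("one, two,"),      // true
  hello: onlyHello.test("Hello"),              // true
  helloWorld: onlyHello.test("Hello world"),   // false: extra content after Hello
};
console.log(positionChecks);
```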


Wildcards
• The dot will match any one character
  – Except end of line control characters
• Thus
    A.B
  could match ABB, ACB, A.B or any other three-character sequence starting with A and ending with B
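
A small sketch of the dot in JavaScript, including the end-of-line exception. The strings are chosen only for illustration.

```javascript
const dotPattern = /A.B/;

const dotChecks = {
  acb: dotPattern.test("ACB"),       // true
  literalDot: dotPattern.test("A.B"), // true: the dot also matches a dot
  tooShort: dotPattern.test("AB"),   // false: the dot needs one character
  newline: dotPattern.test("A\nB"),  // false: the dot skips end-of-line characters
};
console.log(dotChecks);
```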


Repetition
• It is often desirable to repeat a pattern a fixed number of times
• This is done by following the pattern with a set of braces with an integer inside
• Thus
    abbbc
  is the same as
    ab{3}c
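
As a sketch, the brace count can be checked in JavaScript. The ^ and $ anchors are added here so that the whole string must match, which makes the count visible.

```javascript
const threeBs = /^ab{3}c$/;

const braceChecks = {
  exact: threeBs.test("abbbc"),     // true: exactly three b characters
  tooFew: threeBs.test("abbc"),     // false: only two
  tooMany: threeBs.test("abbbbc"),  // false: four
};
console.log(braceChecks);
```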


Repetition
• There are three repetition characters which are more general
• Closure is the *
  – It represents zero or more repetitions of the previous item
• The + represents one or more repetitions of the previous item
• The ? represents zero or one occurrence of the previous item


Examples
• ~* matches any number (including zero) of successive tildes
• \-* matches zero or more dashes
• .+ matches one or more of any character
• hats? matches either hat or hats
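
A quick JavaScript sketch of these repetition characters. The patterns are anchored with ^ and $ (an addition for this illustration) so that each test covers the whole string.

```javascript
const tildes = /^~*$/;      // zero or more tildes
const anyChars = /^.+$/;    // one or more of any character
const hatPattern = /^hats?$/; // hat, optionally followed by s

const repetitionChecks = {
  noTildes: tildes.test(""),          // true: zero repetitions are allowed
  threeTildes: tildes.test("~~~"),    // true
  emptyPlus: anyChars.test(""),       // false: + needs at least one character
  onePlus: anyChars.test("x"),        // true
  hat: hatPattern.test("hat"),        // true
  hats: hatPattern.test("hats"),      // true
  hatss: hatPattern.test("hatss"),    // false: ? allows at most one s
};
console.log(repetitionChecks);
```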


Grouping
• So far the repetitions could only be applied to a single character
• What is needed next is some type of grouping
• This is provided by parentheses
• Enclosing a pattern in parentheses makes it a group
• This group can then be followed by a repetition character


Examples
• (\*-)* will match
  – *-
  – *-*-
  – *-*-*- etc
  – The asterisk inside the group is escaped so that it is taken literally
• The * is greedy: it will try to match as many of these as possible
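
A sketch of a repeated group in JavaScript. The asterisk inside the parentheses is written as \* because an unescaped * is a special. The match method used for the greedy check is described later in the deck; element 0 of its result is the matched text.

```javascript
const starDash = /^(\*-)*$/;

const groupChecks = {
  one: starDash.test("*-"),        // true
  three: starDash.test("*-*-*-"),  // true
  none: starDash.test(""),         // true: zero repetitions of the group
  partial: starDash.test("*-*"),   // false: the last group is incomplete
};
console.log(groupChecks);

// Greedy: the * takes as many repetitions of the group as it can.
const greedyMatch = "*-*-*-x".match(/(\*-)*/)[0];
console.log(greedyMatch);          // *-*-*-
```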


More interesting patterns
• A number is pretty easy to understand from our perspective but not so easy to describe
  – Except in regular expressions
• An integer is a string of digits
  – Possibly preceded by a plus or minus
• So how is this done?
• With sets and repetition


A set
• A pair of brackets may be filled with characters
• This will match any one of them
• Thus the digits could be done with:
    [0123456789]
• An integer could then be:
    [-+]?[0123456789]+
• Any single vowel is:
    [aeiouAEIOU]
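
A sketch of the integer and vowel sets in JavaScript. The integer pattern is written with no space between its two parts, since a space inside a pattern is matched literally, and it is anchored here so that the whole string must be an integer.

```javascript
const integerPattern = /^[-+]?[0123456789]+$/;
const vowelPattern = /[aeiouAEIOU]/;

const setChecks = {
  plain: integerPattern.test("42"),      // true
  negative: integerPattern.test("-7"),   // true
  positive: integerPattern.test("+15"),  // true
  decimal: integerPattern.test("4.2"),   // false: the dot is not in the set
  spaced: integerPattern.test("- 7"),    // false: the space is not in the set
  ohio: vowelPattern.test("Ohio"),       // true
  rhythm: vowelPattern.test("rhythm"),   // false: no vowel from the set
};
console.log(setChecks);
```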


Ranges in sets
• The letters are somewhat more than we want to type
• The range is handled by a dash:
    [0-9]
  is the same as
    [0123456789]
• The letters are then:
    [a-zA-Z]
• If you want a dash in a set, place it first
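
A sketch of ranges in JavaScript, including a set with a leading dash. The phone-number-like string is only an example.

```javascript
const lettersOnly = /^[a-zA-Z]+$/;
const digitsAndDashes = /^[-0-9]+$/;   // the dash comes first, so it is a literal dash

const rangeChecks = {
  word: lettersOnly.test("Regex"),             // true
  wordWithDigits: lettersOnly.test("Regex101"), // false: digits are not letters
  dashed: digitsAndDashes.test("555-1234"),    // true
  spaced: digitsAndDashes.test("555 1234"),    // false: the space is not in the set
};
console.log(rangeChecks);
```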


Complement or Negation
• You may place a caret ^ at the beginning of a set to ask for any character but those present
• Thus
    [^0-9]
  is any character but a digit
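
A brief sketch of a negated set in JavaScript, using it to detect and locate the first non-digit. The input strings are illustrative.

```javascript
const nonDigit = /[^0-9]/;

const allDigitsHasNonDigit = nonDigit.test("12345");  // false: every character is a digit
const mixedHasNonDigit = nonDigit.test("123a5");      // true: the a is not a digit
const firstNonDigitAt = "123a5".search(nonDigit);     // 3
console.log(allDigitsHasNonDigit, mixedHasNonDigit, firstNonDigitAt);
```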


Shortcut sets
• Several classes are so commonly used that a shortcut exists
• This is an escaped character
• \d is a digit [0-9]
• \D is not a digit [^0-9]
• \w is an alphanumeric or underscore [a-zA-Z0-9_]
• \W is not an alphanumeric or underscore [^a-zA-Z0-9_]
• \s is whitespace [ \r\n\t\f\v]
  – \f is formfeed, \v is vertical tab
• \S is not whitespace [^ \r\n\t\f\v]
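
The shortcut sets can be exercised with a sketch like the following. The sample strings are arbitrary.

```javascript
const shortcutChecks = {
  allDigits: /^\d+$/.test("2015"),          // true
  hasNonDigit: /\D/.test("2015"),           // false
  wordChars: /^\w+$/.test("snake_case_1"),  // true: letters, digits and underscore
  twoWords: /^\w+$/.test("two words"),      // false: the space is not a \w character
  hasSpace: /\s/.test("two words"),         // true
  noSpace: /^\S+$/.test("two words"),       // false
};
console.log(shortcutChecks);
```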


Specials
• In some sense the right parenthesis, right bracket and dash are ambiguous as specials
• In certain contexts they are regular characters and in others they are specials
• The rights are only special if there is a leading left
• Dash is only special in a set and following another character


Alternation
• A set provides intuitive alternation
• The match process may choose any character within the set to use
• This alternation applies only to a number of single characters
• There is also an alternation character
  – The vertical bar |
• This allows either simple or complicated patterns to alternate


Alternation
• Thus:
    A|E|I|O|U
  is equivalent to
    [AEIOU]
• However, more interesting alternations are possible and useful
  – (abc)|(123) will match either of the two strings
  – ([-+]?\d+)|(\w+) will match any string of characters that looks like a number or word
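
A sketch of alternation in JavaScript. The number alternative is written here as [-+]?\d+ (an optional sign followed by one or more digits). The match method is covered later in the deck; element 0 of its result is the matched text.

```javascript
const abcOr123 = /^((abc)|(123))$/;
const numberOrWord = /([-+]?\d+)|(\w+)/;

const alternationChecks = {
  abc: abcOr123.test("abc"),      // true
  digits: abcOr123.test("123"),   // true
  mixed: abcOr123.test("abc123"), // false: only one alternative may be used
};
console.log(alternationChecks);

const signedNumber = "-42".match(numberOrWord)[0];   // "-42" via the number alternative
const plainWord = "hello".match(numberOrWord)[0];    // "hello" via the word alternative
console.log(signedNumber, plainWord);
```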


How to use in JavaScript?
• There are two ways that deserve some attention
• Strings have search and replace methods
  – Easiest
  – Will deal with these first
• The RegExp object
  – Most versatile and most complicated


String search
• The search method takes a RegExp pattern and returns an integer position
• The result is the index if found and -1 if the pattern has not been found
• If the pattern is a string it is converted into a RegExp
  – You cannot always use the other features of the RegExp object
  – It is a powerful feature anyhow


One little glitch
• Since the escape is the \ for both strings and regular expressions we have a little problem
• To code the pattern
    \.
  for a literal dot, we would have to code:
    "\\."
• Since this is awkward we do something else


Regular Expression Pattern
• JavaScript has an alternative form for regular expression patterns
• Instead of enclosing the pattern in quotes, where the escape sequence must be dealt with, it uses the forward slash as the delimiter
• Thus:
    /\./
  is a valid regular expression pattern equivalent to "\\."
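
A sketch comparing the quoted form with the slash form. The file name is a made-up example. The last line shows what happens when the dot is left unescaped.

```javascript
const fromString = new RegExp("\\.");   // the string "\\." holds the two characters \.
const fromLiteral = /\./;

const samePattern = fromString.source === fromLiteral.source;  // true
const dotViaString = "file.txt".search("\\.");   // 4
const dotViaLiteral = "file.txt".search(/\./);   // 4
const unescapedDot = "file.txt".search(".");     // 0: a bare dot matches any character
console.log(samePattern, dotViaString, dotViaLiteral, unescapedDot);
```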


Slash Notation
• This notation looks funny but avoids the doubling of the escape character
• It may be assigned to variables:
    var s = /\$\d*/
• Doing so makes s a RegExp object


Pattern Modifiers
• There are several pattern modifiers
  – Lower case letters that follow the slash pattern notation
• An i means ignore case on whole pattern
  – /[A-Z]*/i will match any string of letters, upper or lower case
• Others are possible as well
  – m and g
• These are also known as flags
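
A sketch of two of the flags in JavaScript. The i flag ignores case. The m flag is the multiline flag, which lets ^ and $ match at line breaks inside the string as well as at its ends. The strings are illustrative.

```javascript
const flagChecks = {
  caseSensitive: /^[A-Z]+$/.test("Hello"),    // false: lowercase letters are outside A-Z
  ignoreCase: /^[A-Z]+$/i.test("Hello"),      // true: i ignores case
  singleLine: /^b/.test("a\nb"),              // false: ^ only matches at the start of the string
  multiLine: /^b/m.test("a\nb"),              // true: with m, ^ also matches after a newline
};
console.log(flagChecks);
```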


Search example
• Consider
    s = "2314 Misc $23.85 in stock";
    // A pattern for money
    numpat = /\$\d*\.\d*/;
    pos = s.search(numpat);
    document.write("<P>position is ", pos);
• The result displayed is 10


String Replace
• A search is not the only thing available
  – There is also a replace
• Takes two parameters
  – The search pattern
  – The replacement string
• Returns the new string
• Only the first match will be replaced (unless the g flag is used)


Example:
• This code
    s = "Welcome to VCSU. VCSU is cool.";
    t = s.replace(/VCSU/, "Valley City State");
    document.write("<P> ", t);
• Will provide the following output
    Welcome to Valley City State. VCSU is cool.
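
For comparison, a sketch of the same replacement with and without the g flag. It uses console.log rather than document.write so it can run outside a browser. The variable names are illustrative.

```javascript
const welcome = "Welcome to VCSU. VCSU is cool.";

// Without g only the first match is replaced.
const firstOnly = welcome.replace(/VCSU/, "Valley City State");
// With g every match is replaced.
const everyMatch = welcome.replace(/VCSU/g, "Valley City State");

console.log(firstOnly);   // Welcome to Valley City State. VCSU is cool.
console.log(everyMatch);  // Welcome to Valley City State. Valley City State is cool.
```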


Match
• The match method is somewhat more complicated and will not be considered seriously here
• It is similar to search
• Depending on the flags it returns an array describing the first match or, with g, an array of all the matched strings
  – Returns null if nothing matches
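
A short sketch of what the string match method returns in JavaScript. Without the g flag the result is an array whose element 0 is the matched text, with an index property giving its position. With g it is an array of every matched string. The sample text is illustrative.

```javascript
const text = "the answers are 239 and 512";

const firstMatch = text.match(/[0-9]+/);    // first match only
const allMatches = text.match(/[0-9]+/g);   // every match
const noMatch = text.match(/xyz/);          // null when nothing matches

console.log(firstMatch[0], firstMatch.index);  // 239 16
console.log(allMatches);                       // [ '239', '512' ]
console.log(noMatch);                          // null
```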


RegExp object
• Clearly there is more that could be learned from the pattern match
• We would like to know
  – What actual string was matched
  – What was the last position of the matched string
  – Among many others
• This will also help us to modify how things are done


Constructor
• Just assigning a pattern to a variable does construction:
    re = /\d*/i;
• You may also use a regular constructor
  – The first parameter is the pattern
  – The second the modifiers
  – re = new RegExp("Hello", "i")


exec Method
• The exec method returns the characters that matched
• The parameter is the string
• Example:
    re = /[0-9]+/;
    s = re.exec("answers 239 and 512");
• Returns an array whose first element is the matched string 239
• Does the search thing but produces the matched text instead of a number
  – Returns null for failure
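
A sketch of inspecting the exec result in JavaScript. The result is an array: element 0 holds the matched text and the index property holds where it started. The variable names are illustrative.

```javascript
const digits = /[0-9]+/;

const execResult = digits.exec("answers 239 and 512");
console.log(execResult[0]);     // 239
console.log(execResult.index);  // 8

const execFailure = digits.exec("no numbers here");
console.log(execFailure);       // null
```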


Global searching
• You may set the global searching modifier with the g suffix
• Each search will set the lastIndex property to where the matched text ended
  – First location not matched
• A subsequent search will start at this location
• If the object does not have global set, the lastIndex will not be changed


Example
• Consider:
    re = /[0-9]+/g;
    str = "the answers are 239 and 512";
    s = re.exec(str);
    t = re.exec(str);
• The s will hold the 239 and the t the 512
• More serious manipulations could use lastIndex to do more complicated things
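
One possible manipulation, sketched under the assumption that every match is wanted: call exec in a loop until it returns null, recording lastIndex after each match. The names found and nextMatch are illustrative.

```javascript
const numberPattern = /[0-9]+/g;
const str = "the answers are 239 and 512";

const found = [];
let nextMatch;
// With the g flag, each exec call resumes at lastIndex and stops with null.
while ((nextMatch = numberPattern.exec(str)) !== null) {
  found.push({
    text: nextMatch[0],               // the matched characters
    start: nextMatch.index,           // where the match began
    next: numberPattern.lastIndex,    // first location not matched
  });
}
console.log(found);
// [ { text: '239', start: 16, next: 19 }, { text: '512', start: 24, next: 27 } ]
```

After the final failed exec, lastIndex is reset to 0, so the same pattern object can be reused on another string.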
