Copyright © 2008-2015 Curt Hill
Regular Expressions
Providing a Search Pattern
What are they?
• A special text pattern for describing a search pattern
• This text pattern allows special sequences to have special meaning
• Any other characters may just appear in the searched string
Specials
• The special characters include
– [ ]\^*$.?+(){}
– The braces may be literal or special depending on their usage
• Any other character just matches itself
• Thus Hello as a pattern just matches the obvious string
• Since many of these characters are valuable in strings the escape is used to match them
Escape
• The backslash character is the escape
• Thus to look for an asterisk (a special) in a string it must be escaped: \*
– This allows a search to find the asterisk
• The C family uses some of the same escape sequences:
– \n newline or linefeed
– \t tab
– \r carriage return
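A minimal sketch of the escape in use. The sample strings and variable names are made up for illustration, and console.log stands in for document.write so the snippet can also run outside a browser.

```javascript
// An escaped asterisk is matched literally instead of acting as a special.
const star = /\*/;
const text = "2 * 3 = 6";
const starPos = text.search(star);          // index of the asterisk

// \t inside a pattern matches a tab character.
const tabPos = "name\tvalue".search(/\t/);  // index of the tab

console.log(starPos, tabPos);
```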
Coded escapes
• An x and two hexadecimal digits may also follow the backslash
• Thus \x4E gives the ASCII character with hexadecimal value 4E (an N in ASCII)
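A small sketch of the coded escape, assuming nothing beyond the \x4E example above; the test strings are illustrative.

```javascript
// \x4E in a pattern stands for the character with hexadecimal code 4E.
const hexN = /\x4E/;
const foundUpper = hexN.test("North");   // contains an upper case N
const foundLower = hexN.test("north");   // only a lower case n

// The same escape also works inside an ordinary string.
const sameChar = "\x4E" === "N";

console.log(foundUpper, foundLower, sameChar);
```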
Positioning
• There are two specials that force a position
• ^ matches the beginning of the line
• $ matches the end of the line
• Both of these match a position rather than a character
• Without these a pattern could match anywhere within a string
Positioning examples
• The pattern: ^Hi will match any line that starts with the two characters H and i
• The pattern: ,$ will match any line that ends with a comma
• The pattern: ^Hello$ will match only a line that has Hello as its only content
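The three positioning patterns can be tried with the test method of a RegExp, which returns true or false. This is only a sketch; the sample lines are invented.

```javascript
const startsWithHi = /^Hi/;
const endsWithComma = /,$/;
const onlyHello = /^Hello$/;

const results = [
  startsWithHi.test("Hi there"),    // Hi is at the start
  startsWithHi.test("Oh, Hi"),      // Hi is present but not at the start
  endsWithComma.test("one, two,"),  // the last character is a comma
  onlyHello.test("Hello"),          // nothing but Hello
  onlyHello.test("Hello world"),    // extra content after Hello
];

console.log(results);
```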
Wildcards
• The dot will match any one character
– Except end of line control characters
• Thus A.B could match ABB, ACB, A.B or any other three character sequence starting with A and ending with B
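A quick sketch of the dot, including the end of line exception; the inputs are illustrative.

```javascript
const dot = /A.B/;

const matchesLetter = dot.test("ACB");    // any single character fits
const matchesDot = dot.test("A.B");       // a real dot fits too
const tooShort = dot.test("AB");          // nothing between A and B
const acrossLines = dot.test("A\nB");     // a newline is not matched by the dot

console.log(matchesLetter, matchesDot, tooShort, acrossLines);
```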
Repetition
• It is often desirable to repeat a pattern a fixed number of times
• This is done by following the pattern with a set of braces with an integer inside
• Thus abbbc is the same as ab{3}c
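A sketch of the counted repetition. The ^ and $ anchors are added here, as an assumption of this example, so that only a whole string with exactly three b characters is accepted.

```javascript
const counted = /^ab{3}c$/;

const exactlyThree = counted.test("abbbc");   // three b characters
const onlyTwo = counted.test("abbc");         // one too few
const four = counted.test("abbbbc");          // one too many

console.log(exactlyThree, onlyTwo, four);
```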
Repetition
• There are three repetition characters which are more general
• Closure is the *
– It represents zero or more repetitions of the previous item
• The + represents one or more repetitions of the previous item
• The ? represents zero or one occurrence of the previous item
Examples
• ~* matches any number (including zero) of successive tildes
• \-* matches zero or more dashes
• .+ matches one or more of any character
• hats? matches either hat or hats
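The repetition characters can be checked the same way. In this sketch the ^ and $ anchors are an addition of the example, so that each pattern has to account for the whole input.

```javascript
const tildes = /^~*$/;
const dashes = /^\-*$/;
const anything = /^.+$/;
const hats = /^hats?$/;

const checks = [
  tildes.test(""),        // zero tildes is allowed by *
  tildes.test("~~~"),     // several tildes
  dashes.test("----"),    // several dashes
  anything.test(""),      // + needs at least one character
  hats.test("hat"),       // the s is optional
  hats.test("hats"),      // one s
  hats.test("hatss"),     // two s characters is one too many
];

console.log(checks);
```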
Grouping
• So far the repetitions could only be applied to a single character
• What is needed next is some type of grouping
• This is provided by parentheses
• Enclosing a pattern in parentheses makes it a group
• This group can then be followed by a repetition character
Examples
• (\*-)* will match
– *-
– *-*-
– *-*-*- etc
• The asterisk inside the group must be escaped so that it is a literal
• The * is greedy: it will try to match as many of these as is possible
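A sketch of a repeated group and of greediness. It uses exec, which returns an array with the matched text in element 0; the asterisk inside the group is escaped so that it is a literal, and the input strings are invented.

```javascript
const group = /(\*-)*/;

// Greedy: the group is repeated as often as the input allows.
const m = group.exec("*-*-*-x");
const wholeMatch = m[0];       // all three repetitions
const lastGroup = m[1];        // text of the final repetition of the group

// Zero repetitions is also a match, so the result can be empty.
const emptyMatch = group.exec("x")[0];

console.log(wholeMatch, lastGroup, JSON.stringify(emptyMatch));
```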
More interesting patterns
• A number is pretty easy to understand from our perspective but not so easy to describe
– Except in regular expressions
• An integer is a string of digits
– Possibly preceded by a plus or minus
• So how is this done?
• With sets and repetition
A set
• A pair of brackets may be filled with characters
• This will match any one of them
• Thus the digits could be done with: [0123456789]
• An integer could then be: [-+]?[0123456789]+
• Any single vowel is: [aeiouAEIOU]
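A sketch of the integer and vowel sets. The integer pattern is written without a space between the optional sign and the digits, and the ^ and $ anchors are added by this example so the whole input must be an integer.

```javascript
const integer = /^[-+]?[0123456789]+$/;
const vowel = /[aeiouAEIOU]/;

const integerChecks = [
  integer.test("-42"),   // leading minus
  integer.test("+7"),    // leading plus
  integer.test("4.2"),   // the dot is not a digit
  integer.test(""),      // at least one digit is required
];

const hasVowel = vowel.test("Echo");
const noVowel = vowel.test("rhythm");

console.log(integerChecks, hasVowel, noVowel);
```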
Ranges in sets
• The letters are somewhat more than we want to type
• The range is handled by a dash: [0-9] is the same as [0123456789]
• The letters are then: [a-zA-Z]
• If you want a dash in a set place it first
Complement or Negation
• You may place a caret ^ at the beginning of a set to ask for any character but those present
• Thus [^0-9] is any character but a digit
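Ranges, a leading dash, and a negated set can be sketched together. The phone-number style input is only an illustration.

```javascript
const letters = /^[a-zA-Z]+$/;     // one or more letters, nothing else
const nonDigit = /[^0-9]/;         // any single character that is not a digit
const dashOrDigit = /^[-0-9]+$/;   // the dash comes first, so it is a literal

const checks = [
  letters.test("Hello"),           // letters only
  letters.test("Hello1"),          // the 1 is outside the ranges
  nonDigit.test("12345"),          // every character is a digit
  nonDigit.test("12a45"),          // the a is not a digit
  dashOrDigit.test("555-1234"),    // digits and a literal dash
];

console.log(checks);
```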
Shortcut sets
• Several classes are so commonly used that a shortcut exists
• This is an escaped character
• \d is a digit [0-9]
• \D is not a digit [^0-9]
• \w is an alphanumeric [a-zA-Z0-9_]
• \W is not an alphanumeric [^a-zA-Z0-9_]
• \s is whitespace [ \r\n\t\f\v]
– \f is formfeed, \v is vertical tab
• \S is not whitespace [^ \r\n\t\f\v]
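A sketch of the shortcut sets in use. It borrows two string methods, split and replace, that accept a pattern; the inputs are invented, and the g modifier on the last pattern makes replace act on every match.

```javascript
const allDigits = /^\d+$/.test("2015");             // digits only
const allWordChars = /^\w+$/.test("snake_case9");   // letters, digits, underscore
const anyNonWord = /\W/.test("snake_case9");        // no other characters present

// Split on runs of whitespace (spaces and a tab here).
const words = "a1_b  c\td".split(/\s+/);

// Remove every character that is not a digit.
const digitsOnly = "tel: 555".replace(/\D/g, "");

console.log(allDigits, allWordChars, anyNonWord, words, digitsOnly);
```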
Specials
• In some sense the right parenthesis, right bracket and dash are ambiguous as specials
• In certain contexts they are regular characters and in others they are specials
• The rights are only special if there is a leading left
– JavaScript still reports an unmatched right parenthesis as an error, so escape it: \)
• Dash is only special in a set and following another character
Alternation
• A set provides intuitive alternation
• The match process may choose any character within the set to use
• This alternation only applies among single characters
• There is also an alternation character
– The vertical bar |
• This allows either simple or complicated patterns to alternate
Alternation
• Thus: A|E|I|O|U is equivalent to [AEIOU]
• However, more interesting alternations are possible and useful
– (abc)|(123) will match either of the two strings
– ([-+]?\d+)|(\w+) will match any string of characters that looks like a number or word
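A sketch of alternation between groups. Two things here are assumptions of the example: the outer parentheses with ^ and $, added so the whole input must match one alternative, and the number alternative being written as [-+]?\d+ so that a sign may appear only once, at the front.

```javascript
const either = /^((abc)|(123))$/;
const numberOrWord = /^(([-+]?\d+)|(\w+))$/;

const eitherChecks = [
  either.test("abc"),      // first alternative
  either.test("123"),      // second alternative
  either.test("abc123"),   // both together is neither one
];

// exec shows which alternative matched: group 2 is the number, group 3 the word.
const asNumber = numberOrWord.exec("-42");
const asWord = numberOrWord.exec("hello");

console.log(eitherChecks, asNumber[2], asWord[3]);
```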
How to use in JavaScript?
• There are two ways that deserve some attention
• Strings have search and replace methods
– Easiest
– Will deal with these first
• The RegExp object
– Most versatile and most complicated
String search
• The search method takes a RegExp pattern and returns an integer position
• The result is the index if found and -1 if the pattern has not been found
• If the pattern is a string it is converted into a RegExp
– You cannot always use the other features of the RegExp object
– It is a powerful feature anyhow
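A sketch of search with both a pattern and a plain string. The sample text is invented; the last line shows that a string argument is treated as a pattern, so a lone dot acts as the wildcard.

```javascript
const text = "price: 3.50";

const byPattern = text.search(/\d/);     // index of the first digit
const byString = text.search("3.5");     // the string is converted to a RegExp
const missing = text.search(/z/);        // no z in the text
const dotAsString = text.search(".");    // the dot matches the very first character

console.log(byPattern, byString, missing, dotAsString);
```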
One little glitch
• Since the escape is the \ for both strings and regular expressions we have a little problem
• To code the pattern \. for a literal dot, we would have to code: "\\."
• Since this is awkward we do something else
Regular Expression Pattern
• JavaScript has an alternative form for regular expression patterns
• Instead of enclosing the pattern in quotes, where the escape sequence must be dealt with, it uses the forward slash as the delimiter
• Thus: /\./ is a valid regular expression pattern equivalent to "\\."
Slash Notation
• This notation looks funny but avoids the doubling of the escape character
• It may be assigned to variables:
    var s = /\$\d*/
• Doing so makes s a RegExp object
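The difference between the two notations can be sketched by comparing the patterns they produce. The variable names are illustrative; source is the RegExp property that holds the pattern text.

```javascript
const literalForm = /\./;                  // slash notation, one backslash
const stringForm = new RegExp("\\.");      // string notation, doubled backslash
const samePattern = literalForm.source === stringForm.source;

// With a single backslash the string escape swallows it, leaving a bare dot,
// which is the wildcard rather than a literal dot.
const unescaped = new RegExp("\.");
const wildcardHit = unescaped.test("abc");
const literalHit = literalForm.test("abc");

// A pattern stored in a variable is a RegExp object.
const money = /\$\d*/;
const isRegExp = money instanceof RegExp;

console.log(samePattern, wildcardHit, literalHit, isRegExp);
```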
Pattern Modifiers
• There are several pattern modifiers
– Lower case letters that follow the slash pattern notation
• An i means ignore case on whole pattern
– /[A-Z]*/i will match any string of letters in either case
• Others are possible as well
– m and g
• These are also known as flags
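A sketch of the i and m modifiers. The anchored patterns and sample strings are assumptions of this example; m makes ^ and $ match at line breaks inside the string as well as at its ends.

```javascript
const upperOnly = /^[A-Z]*$/;
const anyCase = /^[A-Z]*$/i;

const strict = upperOnly.test("Hello");     // lower case letters are rejected
const relaxed = anyCase.test("Hello");      // i ignores case
const flagSet = anyCase.ignoreCase;         // the modifier is visible as a property

// Without m, ^ only matches at the very start of the string.
const singleLine = /^b/.test("a\nb");
const multiLine = /^b/m.test("a\nb");

console.log(strict, relaxed, flagSet, singleLine, multiLine);
```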
Search example
• Consider
    s = "2314 Misc $23.85 in stock";
    // A pattern for money
    numpat = /\$\d*\.\d*/;
    pos = s.search(numpat);
    document.write("<P>position is ", pos);
• The result displayed is 10, the index of the dollar sign
String Replace
• A search is not the only thing available
– There is also a replace
• Takes two parameters
– The search pattern
– The replacement string
• Returns the new string
• Only the first match is replaced unless the g modifier is set
Example:
• This code
    s = "Welcome to VCSU. VCSU is cool.";
    t = s.replace(/VCSU/, "Valley City State");
    document.write("<P> ", t);
• Will provide the following output
    Welcome to Valley City State. VCSU is cool.
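For comparison, a sketch of the same replacement with and without the g modifier. console.log is used instead of document.write so the snippet also runs outside a browser.

```javascript
const s = "Welcome to VCSU. VCSU is cool.";

// Without g only the first occurrence is replaced.
const first = s.replace(/VCSU/, "Valley City State");

// With g every occurrence is replaced.
const all = s.replace(/VCSU/g, "Valley City State");

console.log(first);
console.log(all);
```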
Match
• The match method is somewhat more complicated and will not be considered seriously here
• It is similar to search
• It returns the matched text rather than a position
– Without the g modifier: an array holding the first match and its groups
– With the g modifier: an array of strings containing all matches
– null if nothing matches
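A brief sketch of what match returns in the three cases: with the g modifier, without it, and when nothing matches. The sample string is illustrative.

```javascript
const str = "the answers are 239 and 512";

// Without g: an array with the first match; its index property gives the position.
const firstOnly = str.match(/[0-9]+/);

// With g: an array of every matching substring.
const every = str.match(/[0-9]+/g);

// No match at all gives null.
const none = str.match(/xyz/);

console.log(firstOnly[0], firstOnly.index, every, none);
```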
RegExp object
• Clearly there is more that could be learned from the pattern match
• We would like to know
– What actual string was matched
– What was the last position of the matched string
– Among many others
• This will also help us to modify how things are done
Constructor
• Just assigning a pattern to a variable does construction:
    re = /\d*/i;
• You may also use a regular constructor
– The first parameter is the pattern
– The second the modifiers
– re = new RegExp("Hello", "i")
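A sketch of both ways to construct a RegExp. The constructor is given the pattern as a string here; the variable word is a made-up example of a pattern assembled at run time, which is where the constructor is most useful.

```javascript
const fromLiteral = /\d*/i;
const fromConstructor = new RegExp("Hello", "i");

const bothAreRegExp =
  fromLiteral instanceof RegExp && fromConstructor instanceof RegExp;
const caseIgnored = fromConstructor.test("Say HELLO");

// A pattern built from a string that is only known at run time.
const word = "cool";
const endsWithWord = new RegExp(word + "$");
const endsMatch = endsWithWord.test("VCSU is cool");

console.log(bothAreRegExp, caseIgnored, endsMatch);
```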
exec Method
• The exec method returns the characters that matched
• The parameter is the string
• Example:
    re = /[0-9]+/;
    s = re.exec("answers 239 and 512");
• Returns an array whose first element, s[0], is the string 239
• Does the search thing but produces the matched text instead of a number
– Returns null for failure
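A sketch of what exec hands back: an array with the matched text in element 0 and the match position in its index property, or null when nothing matches. The second input string is invented.

```javascript
const re = /[0-9]+/;

const result = re.exec("answers 239 and 512");
const matchedText = result[0];       // the characters that matched
const matchedAt = result.index;      // where the match starts

const failure = re.exec("no digits here");

console.log(matchedText, matchedAt, failure);
```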
Global searching
• You may set the global searching modifier with the g suffix
• Each search with exec will set the lastIndex property to where the matched text ended
– First location not matched
• A subsequent search will start at this location
• If the object does not have global set, the lastIndex will not be changed
Example
• Consider:
    re = /[0-9]+/g;
    str = "the answers are 239 and 512";
    s = re.exec(str);
    t = re.exec(str);
• s[0] will hold the 239 and t[0] the 512
• More serious manipulations could use lastIndex to do more complicated things
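One possible use of lastIndex, sketched as a loop that calls exec until it returns null and records where each match starts and ends. The found array and its field names are hypothetical, chosen for this example.

```javascript
const re = /[0-9]+/g;
const str = "the answers are 239 and 512";

const found = [];
let m;
// With g set, each exec call resumes at re.lastIndex.
while ((m = re.exec(str)) !== null) {
  found.push({ text: m[0], start: m.index, end: re.lastIndex });
}

// After the failed final search, lastIndex is reset to 0.
const resetIndex = re.lastIndex;

console.log(found, resetIndex);
```

Because lastIndex lives on the RegExp object, reusing the same global pattern on a different string without finishing the loop would start that search partway in; setting re.lastIndex = 0 first avoids the surprise.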