CSC1018F: Regular Expressions Diving into Python Ch. 7 Number Systems


Page 1: CSC1018F: Regular Expressions

CSC1018F: Regular Expressions

Diving into Python Ch. 7
Number Systems

Page 2: CSC1018F: Regular Expressions

Lecture Outline

Recap of OO Python [week 3]

Regular Expressions
  Standard
  Verbose

Number Systems
  Binary, decimal, hexadecimal

Page 3: CSC1018F: Regular Expressions

Recap of OO Python

Object Orientation:
  Module importing
  Defining, initializing and instantiating classes
  Class attributes
  Class methods

Exceptions

File Handling:
  Opening, reading, writing and closing

Page 4: CSC1018F: Regular Expressions

Intro to Regular Expressions

Regular expressions are a powerful means of parsing text to identify complex patterns of characters.
Standard string methods (find, replace, split) can be insufficient in complex cases.
But regular expressions can be complicated and difficult to read, so avoid them if string methods will do the job.
Read regular expressions from left to right.
Usage:

  import re  # regular expression functionality is in the re module
  re.sub(regexpr, repstr, inputstr)  # typical search & replace

Page 5: CSC1018F: Regular Expressions

Format of Regular Expressions

Syntax:
  $ - end of string marker
  ^ - start of string marker
  \b - word boundary marker (to avoid backslash escapes use a raw string - r"stringcontents")
  ? - makes the preceding single character optional
  (A|B|C) - indicates mutually exclusive options A, B and C

Examples:
  re.sub(r"\bROAD$", "RD.", addr)
    addr: 60 BROAD ROAD -> 60 BROAD RD.
  re.search(r"^(a|b|c) -", question)
    question: a - how are you? -> <SRE_Match object …>
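The two examples on this slide can be tried side by side with a plain string method. The following is a minimal sketch; the variable names naive, fixed and choice are illustrative and not part of the lecture.

```python
import re

addr = "60 BROAD ROAD"

# str.replace rewrites every occurrence, including the "ROAD" inside "BROAD".
naive = addr.replace("ROAD", "RD.")

# \b demands a word boundary before ROAD and $ anchors it to the end of the
# string, so only the final, stand-alone ROAD is replaced.
fixed = re.sub(r"\bROAD$", "RD.", addr)

# ^ anchors the alternation (a|b|c) to the start of the string.
question = "a - how are you?"
match = re.search(r"^(a|b|c) -", question)
choice = match.group(1) if match else None

print(naive)   # 60 BRD. RD.
print(fixed)   # 60 BROAD RD.
print(choice)  # a
```

This is the kind of case where the string method is not enough: replace has no notion of word boundaries or of the end of the string.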

Page 6: CSC1018F: Regular Expressions

Further Syntax

P{n,m} syntax:
  Deals with repeating patterns
  Read as: pattern P appears at least n times but no more than m times
  Write no space after the comma, otherwise the braces are matched literally

More syntax:
  \d - any numeric digit
  \D - any character except a numeric digit
  + - 1 or more
  * - 0 or more
  ( ) - to indicate groups

Examples:
  >>> phPat = re.compile(r"^(\d{3})\D*(\d{7})$")
  >>> phPat.search("021 6504058").groups()
  ('021', '6504058')
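To see what the phone pattern accepts and rejects, it can be wrapped in a small helper. This is a sketch only; split_phone is a hypothetical name introduced here, not something from the lecture.

```python
import re

# Same pattern as on the slide: three digits, any run of non-digits, seven digits.
ph_pat = re.compile(r"^(\d{3})\D*(\d{7})$")

def split_phone(text):
    """Return (area_code, number) if text matches the pattern, else None."""
    m = ph_pat.search(text)
    return m.groups() if m else None

print(split_phone("021 6504058"))   # ('021', '6504058')
print(split_phone("0216504058"))    # ('021', '6504058')
print(split_phone("021-650-4058"))  # None: the seven digits must be contiguous
```

The last call shows a limit of the pattern as written: \D* allows a separator only between the two groups, not inside the seven-digit group.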

Page 7: CSC1018F: Regular Expressions

Verbose Regular Expressions

So far we have seen only compact regular expressions.
To aid readability we would like to include comments and spaces.
Use re.VERBOSE as the last argument to re functions:
  Whitespace is ignored
  Comments (# commentstr) are ignored

Example:
  pattern = """
      ^    # beginning of string
      $    # end of string
      """

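As a fuller illustration, the compact phone pattern r"^(\d{3})\D*(\d{7})$" from the previous slide can be spelled out in verbose form. This is a sketch; the name phone_verbose and the wording of the comments are assumptions made for illustration.

```python
import re

# Verbose form of r"^(\d{3})\D*(\d{7})$": whitespace and # comments inside
# the pattern are ignored when re.VERBOSE is passed.
phone_verbose = r"""
    ^           # beginning of string
    (\d{3})     # area code: exactly three digits
    \D*         # optional separator: any run of non-digits
    (\d{7})     # number: exactly seven digits
    $           # end of string
"""

m = re.search(phone_verbose, "021 6504058", re.VERBOSE)
print(m.groups())  # ('021', '6504058')

# Without re.VERBOSE the spaces and comments would be taken literally,
# so the same pattern string no longer matches.
print(re.search(phone_verbose, "021 6504058"))  # None
```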
Page 8: CSC1018F: Regular Expressions

Case Study

Counting 1-10 in Roman numerals
  Additive and subtractive combination of I (=1), V (=5), X (=10)
  Can have at most 3 of a particular numeral in a row

  >>> roman = r"^(I?X|IV|V?I{0,3})$"
  >>> re.search(roman, "X")
  <_sre.SRE_Match object at 0x1e55be0>
  >>> re.search(roman, "VIII")
  <_sre.SRE_Match object at 0x1e55ba0>
  >>> re.search(roman, "")
  <_sre.SRE_Match object at 0x1e55ce0>
  >>> re.search(roman, "IIII") is None
  True
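The pattern can be checked against all ten numerals at once. The sketch below uses illustrative lists (numerals, bad) that are not from the lecture; it also makes the empty-string match explicit, since V?I{0,3} is satisfied by zero characters.

```python
import re

roman = r"^(I?X|IV|V?I{0,3})$"

# The numerals for 1 to 10, in order.
numerals = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
all_match = all(re.search(roman, n) is not None for n in numerals)

# Malformed or out-of-range strings that should be rejected.
bad = ["IIII", "VV", "IIX", "XI"]
rejected = [s for s in bad if re.search(roman, s) is None]

# Every part of V?I{0,3} is optional, so the empty string also matches.
matches_empty = re.search(roman, "") is not None

print(all_match)      # True
print(rejected)       # ['IIII', 'VV', 'IIX', 'XI']
print(matches_empty)  # True
```

If the empty string should be rejected, one option is to test for it separately before applying the pattern.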

Page 9: CSC1018F: Regular Expressions

Number Systems

Decimal (base 10)
  Digits (0-9)
  Each place represents a power of ten
  172 = 2*10^0 + 7*10^1 + 1*10^2 = 172

Binary (base 2)
  Digits (0,1)
  Each place represents a power of two
  10011 = 1*2^0 + 1*2^1 + 0*2^2 + 0*2^3 + 1*2^4 = 19

Hexadecimal (base 16)
  Digits (0-9, A-F)
  A-F represent 10-15
  Each place represents a power of sixteen
  E.g., F7A = 10*16^0 + 7*16^1 + 15*16^2 = 3962
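The place-value sums on this slide can be reproduced in a few lines. This is a minimal sketch; place_value is a hypothetical helper name, and Python's built-in int(string, base) is used as a cross-check.

```python
def place_value(digits, base):
    """Sum digit * base**position, counting positions from the right."""
    total = 0
    for position, ch in enumerate(reversed(digits)):
        # int(ch, base) turns a single digit character into its value,
        # e.g. int("F", 16) is 15.
        total += int(ch, base) * base ** position
    return total

print(place_value("172", 10))   # 172
print(place_value("10011", 2))  # 19
print(place_value("F7A", 16))   # 3962

# The built-in conversion gives the same answers.
print(int("10011", 2), int("F7A", 16))  # 19 3962
```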

Page 10: CSC1018F: Regular Expressions

Conversion

Decimal to others
  Repeatedly divide the number by the base and populate places from right to left with the remainders
  E.g. Dec2Bin: 50 / 2 [% = 0] = 25 / 2 [% = 1] = 12 / 2 [% = 0] = 6 / 2 [% = 0] = 3 / 2 [% = 1] = 1 / 2 [% = 1] = 0, giving 110010
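The repeated-division procedure can be sketched as a short function. The names to_base and DIGITS are illustrative, and the sketch assumes a non-negative integer and a base between 2 and 16.

```python
DIGITS = "0123456789ABCDEF"

def to_base(number, base):
    """Convert a non-negative integer to a digit string in the given base (2-16)."""
    if number == 0:
        return "0"
    places = []
    while number > 0:
        # divmod returns the quotient and the remainder of the division.
        number, remainder = divmod(number, base)
        places.append(DIGITS[remainder])
    # Remainders come out right to left, so reverse them for reading order.
    return "".join(reversed(places))

print(to_base(50, 2))     # 110010
print(to_base(3962, 16))  # F7A
```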

Bin2Hex:
  Collect binary digits into groups of four and convert
  E.g., 111000011111 = 1110 0001 1111 = E1F

Hex2Bin:
  Hexadecimal digits convert into groups of four binary digits
  E.g., A7C = 1010 0111 1100 = 101001111100

Hex is used because:
  It is easy to convert to and from binary
  It offers a more compact representation
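The grouping rule translates directly into code. The following is a sketch with hypothetical helper names (bin_to_hex, hex_to_bin); it pads the binary string on the left with zeros so that its length is a multiple of four, a detail the slide leaves implicit.

```python
def bin_to_hex(bits):
    """Group binary digits into fours (from the right) and convert each group."""
    # Pad on the left so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)

def hex_to_bin(hex_digits):
    """Expand each hexadecimal digit into exactly four binary digits."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)

print(bin_to_hex("111000011111"))  # E1F
print(hex_to_bin("A7C"))           # 101001111100
print(bin_to_hex("110010"))        # 32 (padded to 0011 0010)
```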

Page 11: CSC1018F: Regular Expressions

Revision Exercise

Create a function which will take a date string in any one of the following formats:

  dd/mm/yyyy or dd/mm/yy
  Other separators (e.g., '\', ' ', '-') are also allowed
  Single figure entries may have the form x or 0x, e.g. 3/4/5 or 03/04/05
  dd month yy or yyyy, where month may be written in full (December) or abbreviated (Dec. or Dec)

And return it in the format: dd month(in full) yyyy, e.g. 13 March 2006

Implement this using regular expressions and also implement range checking on dates