Specification of Tokens Lecture 2 Section 3.3 Robb T. Koether Hampden-Sydney College Fri, Jan 16, 2015 Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 1 / 24


Specification of Tokens

Lecture 2

Section 3.3

Robb T. Koether

Hampden-Sydney College

Fri, Jan 16, 2015


1 Alphabets and Languages

2 Operations on Languages

3 Regular Expressions

4 Extensions of Regular Languages

5 Assignment


Alphabets and Strings

Definition (Alphabet)
An alphabet is a finite set of symbols. Traditionally, we denote an alphabet by the letter Σ.

Definition (String)
A string is a finite sequence of symbols.

Definition (Empty String)
The empty string, denoted ε, is the string that contains no symbols. The empty string has length 0.


Alphabets and Strings

Example (Alphabets and Strings)
The traditional alphabet is Σ = {A, B, C, ..., Z}.
The binary alphabet is Σ = {0, 1}.
For C programs, the alphabet is the set of ASCII characters.
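
As a rough illustration of these definitions, the following Python sketch models an alphabet as a finite set of one-character symbols and checks whether a string uses only symbols from it. The names BINARY, UPPERCASE, EPSILON, and is_string_over are made up for this example and do not come from the lecture.

```python
import string

# Alphabets are finite sets of symbols (here: one-character Python strings).
BINARY = {"0", "1"}
UPPERCASE = set(string.ascii_uppercase)   # {A, B, C, ..., Z}

EPSILON = ""  # the empty string: it contains no symbols

def is_string_over(s, alphabet):
    """Return True if every symbol of s belongs to the alphabet."""
    return all(symbol in alphabet for symbol in s)

print(is_string_over("0110", BINARY))    # True
print(is_string_over("0120", BINARY))    # False: 2 is not a binary symbol
print(is_string_over(EPSILON, BINARY))   # True: ε is a string over any alphabet
print(len(EPSILON))                      # 0
```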


Languages

Definition (Language)
A language is a set of (finite) strings over a given alphabet.

A language can be (and usually is) infinite.
The set of all even integers over the alphabet Σ = {0, 1, ..., 9} is a language.
The set of all C programs is a language.
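
Since an infinite language cannot be listed in full, one common way to work with it is through a membership test. The sketch below does this for the even integers. It assumes that a member is a nonempty digit string whose last digit is even and that leading zeros are allowed; the lecture does not settle either point, and the function name is hypothetical.

```python
DIGITS = set("0123456789")

def is_even_integer(s):
    """Membership test for the language of even integers over DIGITS.

    Assumption for this sketch: a member is a nonempty string of digits
    ending in an even digit; leading zeros are accepted.
    """
    return len(s) > 0 and all(c in DIGITS for c in s) and s[-1] in "02468"

print(is_even_integer("1234"))   # True
print(is_even_integer("1235"))   # False
print(is_even_integer(""))       # False: the empty string is not an integer
print(is_even_integer("12a4"))   # False: a is not in the alphabet
```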


Operations on Languages

Definition (Operations on Languages)
Let L, L_1, and L_2 be languages.

Union: L_1 ∪ L_2 = {x | x ∈ L_1 or x ∈ L_2}.

Concatenation: L_1 L_2 = {xy | x ∈ L_1 and y ∈ L_2}.

Repeated concatenation: L^n = L L ··· L (n copies of L).

Kleene closure: L∗ = {ε} ∪ L ∪ L^2 ∪ L^3 ∪ ···.
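
For finite languages, these operations can be sketched directly with Python sets. The function names below are illustrative. Two assumptions are worth flagging: L^0 is taken to be {ε}, the usual convention, and because the Kleene closure is normally infinite, kleene only returns the strings up to a chosen length bound.

```python
def union(l1, l2):
    """L1 ∪ L2."""
    return l1 | l2

def concat(l1, l2):
    """L1 L2: every string of L1 followed by every string of L2."""
    return {x + y for x in l1 for y in l2}

def power(l, n):
    """L^n: n copies of L concatenated; L^0 is taken to be {""}."""
    result = {""}
    for _ in range(n):
        result = concat(result, l)
    return result

def kleene(l, max_len):
    """The strings of L* whose length is at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from L,
        # keeping only those that are short enough and not seen before.
        frontier = {w for w in concat(frontier, l) if len(w) <= max_len} - result
        result |= frontier
    return result

L = {"a", "bb"}
print(sorted(power(L, 2)))    # ['aa', 'abb', 'bba', 'bbbb']
print(sorted(kleene(L, 2)))   # ['', 'a', 'aa', 'bb']
```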


Operations on Languages

Example (Operations on Languages)
Let L_1 = {1, 3, 5, 7, 9} and L_2 = {0, 2, 4, 6, 8}.

Describe L_1 ∪ L_2.
Describe L_1 L_2.
Describe (L_1 ∪ L_2)^3.
Describe (L_1 ∪ L_2)∗ L_2.
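
One way to explore these four languages is to build the finite ones explicitly and to write a membership test for the infinite one. The sketch below is only an aid for checking a description, not the intended answer; the variable and function names are invented here.

```python
from itertools import product

L1 = {"1", "3", "5", "7", "9"}
L2 = {"0", "2", "4", "6", "8"}

digits = L1 | L2                                          # L_1 ∪ L_2
odd_even = {x + y for x in L1 for y in L2}                # L_1 L_2
three = {"".join(t) for t in product(digits, repeat=3)}   # (L_1 ∪ L_2)^3

def in_star_then_L2(s):
    """Membership in (L_1 ∪ L_2)* L_2: any digit string, then one even digit."""
    return len(s) >= 1 and all(c in digits for c in s) and s[-1] in L2

print(len(digits))      # 10: every decimal digit
print(len(odd_even))    # 25: an odd digit followed by an even digit
print(len(three))       # 1000: every string of exactly three digits
print(in_star_then_L2("1234"), in_star_then_L2("1235"))   # True False
```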


Regular Expressions

A regular expression can be used to describe a language.
Regular expressions may be defined in two parts:
The basic part.
The recursive part.


Regular Expressions

The basic part:
ε represents the language {ε}.
a represents the language {a} for every a ∈ Σ.
Call these languages L(ε) and L(a), respectively.


Regular Expressions

The recursive part: Let r and s denote regular expressions.
r | s represents the language L(r) ∪ L(s).
rs represents the language L(r)L(s).
r∗ represents the language L(r)∗.
(r) represents the language L(r).


Regular Expressions

In other words,
L(r | s) = L(r) ∪ L(s).
L(rs) = L(r)L(s).
L(r∗) = L(r)∗.
L((r)) = L(r).
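
The two-part definition translates almost line by line into a recursive function. The sketch below assumes a made-up encoding of regular expressions as nested tuples and, because L(r) may be infinite, computes only the strings of L(r) up to a length bound. It is meant to mirror the equations above, not to show how a scanner generator actually works.

```python
def lang(r, max_len):
    """Return the strings of L(r) with length <= max_len.

    r is a nested tuple (a hypothetical encoding for this sketch):
      ("eps",)        the expression ε
      ("sym", a)      a single symbol a
      ("alt", r, s)   r | s
      ("cat", r, s)   rs
      ("star", r)     r*
    Parentheses need no node of their own, since L((r)) = L(r).
    """
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "cat":
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":
        base = lang(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            # Append one more factor from L(r) to the newest strings.
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind!r}")

# (a | b)* a, restricted to strings of length at most 2
expr = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(sorted(lang(expr, 2)))   # ['a', 'aa', 'ba']
```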


Example

Example (Identifiers)
Identifiers in C++ can be represented by a regular expression (simplified here to letters and digits, leaving out the underscore).

r = A | B | ··· | Z | a | b | ··· | z
s = 0 | 1 | ··· | 9
t = r(r | s)∗


Regular Expressions

Definition (Regular definition)
A regular definition of a regular expression is a “grammar” of the form

d_1 → r_1
d_2 → r_2
...
d_n → r_n

where each r_i is a regular expression over Σ ∪ {d_1, d_2, ..., d_{i−1}}.


Example

Example (Identifiers)
We may now describe C++ identifiers as follows.

letter → A | B | ··· | Z | a | b | ··· | z
digit → 0 | 1 | ··· | 9
id → letter (letter | digit)∗
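
Because each name refers only to names defined before it, a regular definition can be expanded into a single ordinary regular expression by substituting the definitions in order. The sketch below does this with Python's re module. The expand helper and the {name} placeholder convention are assumptions made for the example, and Python's [A-Za-z] and [0-9] notation stands in for the long alternations.

```python
import re

# Each body may refer only to names that appear earlier in the list.
definitions = [
    ("letter", "[A-Za-z]"),
    ("digit",  "[0-9]"),
    ("id",     "{letter}({letter}|{digit})*"),
]

def expand(defs):
    """Expand a regular definition into plain regex strings, in order."""
    expanded = {}
    for name, body in defs:
        # str.format raises KeyError if body uses a name not yet defined.
        expanded[name] = "(?:" + body.format(**expanded) + ")"
    return expanded

patterns = expand(definitions)
id_re = re.compile(patterns["id"])

print(bool(id_re.fullmatch("x1")))      # True
print(bool(id_re.fullmatch("count")))   # True
print(bool(id_re.fullmatch("1x")))      # False: must start with a letter
```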


Regular Expressions

Note that this definition does not allow recursively defined tokens.
In other words, d_i cannot be defined in terms of d_i, not even indirectly.
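
The ordering rule is what rules out recursion: if every right-hand side mentions only earlier names, no chain of references can lead back to its starting point. A minimal checker might look like the sketch below. The function name and the word-based tokenization are assumptions of this example, and it presumes that defined names are plain words that do not clash with alphabet symbols.

```python
import re

def check_regular_definition(defs):
    """Raise ValueError if a body uses a defined name that does not come earlier."""
    all_names = {name for name, _ in defs}
    seen = set()
    for name, body in defs:
        # Words in the body that are defined names (anything else is a symbol).
        used = set(re.findall(r"[A-Za-z_]+", body)) & all_names
        bad = used - seen
        if bad:
            raise ValueError(
                f"{name} refers to names not defined earlier: {sorted(bad)}")
        seen.add(name)

ok = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "letter (letter | digit)*"),
]
check_regular_definition(ok)   # passes silently

recursive = [
    ("digit", "[0-9]"),
    ("expr", "digit | ( expr )"),   # expr is defined in terms of itself
]
try:
    check_regular_definition(recursive)
except ValueError as err:
    print(err)
```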


Extensions of Regular Languages

Definition
We add the following symbols to our regular expressions.

One or more instances: r+ = r r∗.
Zero or one instance: r? = r | ε.
Character class: [a_1 a_2 ··· a_n] = a_1 | a_2 | ··· | a_n.
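
These identities can be spot-checked with Python's re module, whose +, ?, and [...] operators follow the same conventions. The helper below, a hypothetical name, compares two patterns on every string over a small alphabet up to a length bound. Agreement on a bounded sample is evidence, not a proof of equivalence.

```python
import re
from itertools import product

def same_language(p, q, alphabet="ab", max_len=4):
    """Compare two patterns on every string over alphabet up to max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            s = "".join(t)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True

print(same_language(r"a+", r"aa*"))       # True:  r+ = r r*
print(same_language(r"a?", r"(?:a|)"))    # True:  r? = r | ε
print(same_language(r"[ab]", r"a|b"))     # True:  [ab] = a | b
print(same_language(r"a+", r"a*"))        # False: only a* accepts ε
```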


Extensions of Regular Languages

Example (Identifiers)
Identifiers can be described as

letter → [A-Za-z]

digit → [0-9]

id → letter (letter | digit)∗


Extensions of Regular Languages

Example (Floating-point Numbers)
Floating-point numbers can be described as

digit → [0-9]

digits → digit+

number → digits (. digits)? (E [+-]? digits)?
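
The same definition can be transcribed into Python's re syntax to see which strings it accepts. This is a direct transcription under one adjustment: the dot is escaped, since an unescaped dot in Python matches any character. The names DIGIT, DIGITS, and NUMBER are chosen here to echo the definition.

```python
import re

DIGIT = r"[0-9]"
DIGITS = rf"{DIGIT}+"
# number → digits (. digits)? (E [+-]? digits)?
NUMBER = re.compile(rf"{DIGITS}(\.{DIGITS})?(E[+-]?{DIGITS})?")

for text in ["3.14", "6.02E23", "1E-9", "42", ".5", "5.", "1e5"]:
    print(text, bool(NUMBER.fullmatch(text)))
```

As written, the definition also accepts plain integers such as 42, since both optional parts may be absent. It rejects .5 and 5. because digits are required on both sides of the point, and it rejects 1e5 because only an uppercase E is listed.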


Assignment

Read Section 3.3.
Exercises 2, 3.
