Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Specification of TokensLecture 2
Section 3.3
Robb T. Koether
Hampden-Sydney College
Fri, Jan 16, 2015
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 1 / 24
1 Alphabets and Languages
2 Operations on Languages
3 Regular Expressions
4 Extensions of Regular Languages
5 Assignment
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 2 / 24
Outline
1 Alphabets and Languages
2 Operations on Languages
3 Regular Expressions
4 Extensions of Regular Languages
5 Assignment
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 3 / 24
Alphabets and Strings
Definition (Alphabet)An alphabet is a finite set of symbols. Traditionally, we denote analphabet by the letter Σ.
Definition (String)A string is a finite sequence of symbols.
Definition (Empty String)The empty string, denoted ε, is the string that contains no symbols.The empty string has length 0.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 4 / 24
Alphabets and Strings
Example (Alphabets and Strings)Examples:
The traditional alphabet is
Σ = {A, B, C, . . . , Z}.
The binary alphabet is Σ = {0, 1}.For C programs, the alphabet is the set of ASCII characters.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 5 / 24
Languages
Definition (Language)A language is a set of (finite) strings over a given alphabet.
A language can be (and usually is) infinite.The set of all even integers over the alphabet Σ = {0, 1, . . . , 9} isa language.The set of all C programs is a language.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 6 / 24
Outline
1 Alphabets and Languages
2 Operations on Languages
3 Regular Expressions
4 Extensions of Regular Languages
5 Assignment
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 7 / 24
Operations on Languages
Definition (Language)Let L, L1, and L2 be languages.
Union:L1 ∪ L2 = {x | x ∈ L1 or x ∈ L2}.
Concatenation:
L1L2 = {xy | x ∈ L1 and y ∈ L2}.
Repeated concatenation:
Ln = LLL · · · L n copies of L.
Kleene closure:
L∗ = {ε} ∪ L ∪ L2 ∪ L3 · · · .
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 8 / 24
Operations on Languages
Example (Language)Let
L1 = {1, 3, 5, 7, 9}
andL2 = {0, 2, 4, 6, 8}.
Describe L1 ∪ L2.Describe L1L2.Describe (L1 ∪ L2)3.Describe (L1 ∪ L2)∗L2.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 9 / 24
Outline
1 Alphabets and Languages
2 Operations on Languages
3 Regular Expressions
4 Extensions of Regular Languages
5 Assignment
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 10 / 24
Regular Expressions
A regular expression can be used to describe a language.Regular expressions may be defined in two parts.
The basic part.The recursive part.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 11 / 24
Regular Expressions
The basic part:ε represents the language {ε}.a represents the language {a} for every a ∈ Σ.Call these languages L(ε) and L(a), respectively.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 12 / 24
Regular Expressions
The recursive part: Let r and s denote regular expressions.r | s represents the language L(r) ∪ L(s).rs represents the language L(r)L(s).r∗ represents the language L(r)∗.(r) represents the language L(r).
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 13 / 24
Regular Expressions
In other wordsL(r | s) = L(r) ∪ L(s).L(rs) = L(r)L(s).L(r∗) = L(r)∗.L((r)) = L(r).
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 14 / 24
Example
Example (Identifiers)Identifiers in C++ can be represented by a regular expression.
r = A | B | · · · | Z | a | b | · · · | zs = 0 | 1 | · · · | 9t = r(r | s)∗
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 15 / 24
Regular Expressions
Definition (Regular definition)A regular definition of a regular expression is a “grammar” of the form
d1 → r1
d2 → r2
...dn → rn
where each ri is a regular expression over Σ ∪ {d1, d2, . . . , di−1}.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 16 / 24
Example
Example (Identifiers)We may now describe C++ identifiers as follows.
letter → A | B | · · · | Z | a | b | · · · | zdigit → 0 | 1 | · · · | 9
id → letter (letter | digit)∗
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 17 / 24
Regular Expressions
Note that this definition does not allow recursively defined tokens.In other words, di cannot be defined in terms of di , not evenindirectly.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 18 / 24
Outline
1 Alphabets and Languages
2 Operations on Languages
3 Regular Expressions
4 Extensions of Regular Languages
5 Assignment
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 19 / 24
Extensions of Regular Languages
DefinitionWe add the following symbols to our regular expressions.
One or more instances: r+ = r r∗.Zero or one instance: r? = r | ε.Character class: [a1a2 · · · an] = a1 | a2 | · · · | an.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 20 / 24
Extensions of Regular Languages
Example (Identifiers)Identifiers can be described as
letter → [A-Za-z]
digit → [0-9]
id → letter (letter | digit)∗
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 21 / 24
Extensions of Regular Languages
Example (Floating-point Numbers)Floating-point numbers can be described as
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 22 / 24
Outline
1 Alphabets and Languages
2 Operations on Languages
3 Regular Expressions
4 Extensions of Regular Languages
5 Assignment
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 23 / 24
Assignment
AssignmentRead Section 3.3.Exercises 2, 3.
Robb T. Koether (Hampden-Sydney College) Specification of Tokens Fri, Jan 16, 2015 24 / 24