Syntax Analysis
CSE 340 – Principles of Programming Languages
Fall 2015
Adam Doupé
Arizona State University
http://adamdoupe.com
2Adam Doupé, Principles of Programming Languages
Syntax Analysis
• The goal of syntax analysis is to transform the sequence of tokens from the lexer into something useful
• However, we need a way to specify and check whether the sequence of tokens is valid
  – NUM PLUS NUM
  – DECIMAL DOT NUM
  – ID DOT ID
  – DOT DOT DOT NUM ID DOT ID
Using Regular Expressions
PROGRAM = STATEMENT*
STATEMENT = EXPRESSION | IF_STMT | WHILE_STMT | …
OP = + | - | * | /
EXPRESSION = (NUM | ID | DECIMAL) OP (NUM | ID | DECIMAL)
5 + 10
foo - bar
1 + 2 + 3
Using Regular Expressions
• Regular expressions are not sufficient to capture all programming constructs
  – We will not go into the details in this class, but the reason is that regular languages (the set of all languages that can be described by regular expressions) cannot express languages with properties that we care about
• How would we write a regular expression for matching parentheses?
  – L(R) = {𝜺, (), (()), ((())), …}
  – Regular expressions (as we have defined them in this class) have no concept of counting (to ensure balanced parentheses), therefore it is impossible to create R
Context-Free Grammars
• Syntax for context-free grammars
  – Each row is called a production
    • Non-terminals on the left
    • Right arrow
    • Non-terminals and terminals on the right
  – Non-terminals will start with an upper case letter in our examples, terminals will be lowercase and are tokens
  – S will typically be the starting non-terminal
• Example for matching parentheses
S → 𝜺
S → ( S )
Can also write more succinctly by combining production rules with the same starting non-terminal
S → ( S ) | 𝜺
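To make the grammar concrete, here is a minimal sketch of a recognizer for S → ( S ) | 𝜺, written as one recursive function per non-terminal. The function names and the choice to read characters straight from a string (instead of tokens from a lexer) are assumptions made for this illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: consumes one S starting at s[*pos], following
   S -> ( S ) | epsilon. Returns false when a ')' is missing. */
static bool parse_S(const char *s, size_t *pos) {
    if (s[*pos] == '(') {              /* choose S -> ( S ) */
        (*pos)++;                      /* consume '(' */
        if (!parse_S(s, pos)) return false;
        if (s[*pos] != ')') return false;
        (*pos)++;                      /* consume ')' */
    }
    /* otherwise choose S -> epsilon and consume nothing */
    return true;
}

/* True when the whole string can be derived from S. */
bool matches_nested_parens(const char *s) {
    size_t pos = 0;
    return parse_S(s, &pos) && s[pos] == '\0';
}
```

The recursion depth plays the role of the counter that regular expressions lack. Note that this grammar only describes nested parentheses, so a string such as ()() is rejected.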
CFG Example
S → ( S ) | 𝜺
Derivations of the CFG
S ⇒ 𝜺
S ⇒ ( S ) ⇒ ( 𝜺 ) ⇒ ()
S ⇒ ( S ) ⇒ ( ( S ) ) ⇒ ( ( 𝜺 ) ) ⇒ (())
CFG Example
Exp→ Exp + Exp
Exp→ Exp * Exp
Exp→ NUM
Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3
Leftmost Derivation
• Always expand the leftmost nonterminal
Exp→ Exp + Exp
Exp→ Exp * Exp
Exp→ NUM
Is this a leftmost derivation?
Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3
Exp ⇒ Exp * Exp ⇒ Exp + Exp * Exp ⇒ 1 + Exp * Exp ⇒ 1 + 2 * Exp ⇒ 1 + 2 * 3
Rightmost Derivation
• Always expand the rightmost nonterminal
Exp→ Exp + Exp
Exp→ Exp * Exp
Exp→ NUM
Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3
Parse Tree
• We can also represent derivations using a parse tree
  – May sound familiar
[Figure: compiler pipeline diagram with the labels Source, Bytes, Lexer, Tokens, Parser, and Parse Tree]
Parse Tree
Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3

          Exp
        /  |  \
     Exp   *   Exp
    / | \       |
 Exp  +  Exp    3
  |       |
  1       2
Parsing
• Derivations and parse trees can show how to generate strings that are in the language described by the grammar
• However, we need to turn a sequence of tokens into a parse tree
• Parsing is the process of determining the derivation or parse tree from a sequence of tokens
• Two major parsing problems:
  – Ambiguous grammars
  – Efficient parsing
Ambiguous Grammars
Exp→ Exp + Exp
Exp→ Exp * Exp
Exp→ NUM
How to parse 1 + 2 * 3?
Exp ⇒ Exp * Exp ⇒ Exp + Exp * Exp ⇒ 1 + Exp * Exp ⇒ 1 + 2 * Exp ⇒ 1 + 2 * 3
Exp ⇒ Exp + Exp ⇒ 1 + Exp ⇒ 1 + Exp * Exp ⇒ 1 + 2 * Exp ⇒ 1 + 2 * 3
Ambiguous Grammars
1 + 2 * 3
          Exp
        /  |  \
     Exp   *   Exp
    / | \       |
 Exp  +  Exp    3
  |       |
  1       2

      Exp
    /  |  \
 Exp   +   Exp
  |       / | \
  1    Exp  *  Exp
        |       |
        2       3
Ambiguous Grammars
• A grammar is ambiguous if there exist two different leftmost derivations, or two different rightmost derivations, or two different parse trees for some string in the language of the grammar
• Is English ambiguous?
  – I saw a man on a hill with a telescope.
• Ambiguity is not desirable in a programming language
  – Unlike in English, we don't want the compiler to read your mind and try to infer what you meant
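To see why the two parse trees for 1 + 2 * 3 matter, the sketch below builds both trees by hand and evaluates them bottom-up. The Node struct and the function names are hypothetical and exist only for this illustration; nothing here comes from the course code.

```c
#include <stddef.h>

/* A parse-tree node: either a NUM leaf (op == 0) or an operator node. */
struct Node {
    char op;                     /* '+', '*', or 0 for a NUM leaf */
    int value;                   /* used only when op == 0 */
    const struct Node *left;
    const struct Node *right;
};

/* Evaluate a tree bottom-up; the tree shape decides the grouping. */
int eval(const struct Node *n) {
    if (n->op == 0) return n->value;
    int l = eval(n->left);
    int r = eval(n->right);
    return n->op == '+' ? l + r : l * r;
}

/* Tree with * at the root, i.e. (1 + 2) * 3. */
int eval_mul_at_root(void) {
    struct Node one = {0, 1, NULL, NULL};
    struct Node two = {0, 2, NULL, NULL};
    struct Node three = {0, 3, NULL, NULL};
    struct Node sum = {'+', 0, &one, &two};
    struct Node root = {'*', 0, &sum, &three};
    return eval(&root);
}

/* Tree with + at the root, i.e. 1 + (2 * 3). */
int eval_add_at_root(void) {
    struct Node one = {0, 1, NULL, NULL};
    struct Node two = {0, 2, NULL, NULL};
    struct Node three = {0, 3, NULL, NULL};
    struct Node product = {'*', 0, &two, &three};
    struct Node root = {'+', 0, &one, &product};
    return eval(&root);
}
```

The same token sequence yields 9 with the first tree and 7 with the second, which is why a compiler cannot be left to pick a tree arbitrarily.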
Parsing Approaches
• Various ways to turn strings into parse trees
  – Bottom-up parsing, where you start from the terminals and work your way up
  – Top-down parsing, where you start from the starting non-terminal and work your way down
• In this class, we will focus exclusively on top-down parsing
Top-Down Parsing
S → A | B | C
A → a
B → Bb | b
C → Cc | 𝜺
parse_S() {
    t_type = getToken()
    if (t_type == a) {
        ungetToken()
        parse_A()
        check_eof()
    }
    else if (t_type == b) {
        ungetToken()
        parse_B()
        check_eof()
    }
    else if (t_type == c) {
        ungetToken()
        parse_C()
        check_eof()
    }
    else if (t_type == eof) {
        // do EOF stuff
    }
    else {
        syntax_error()
    }
}
Predictive Recursive Descent Parsers
• Predictive recursive descent parsers are efficient top-down parsers
  – Efficient because they only look at the next token, no backtracking/guessing
• To determine if a language allows a predictive recursive descent parser, we need to define the following functions
• FIRST(α), where α is a sequence of grammar symbols (non-terminals, terminals, and 𝜺)
  – FIRST(α) returns the set of terminals and 𝜺 that begin strings derived from α
• FOLLOW(A), where A is a non-terminal
  – FOLLOW(A) returns the set of terminals and $ (end of file) that can appear immediately after the non-terminal A
FIRST() Example
S → A | B | C
A → a
B → Bb | b
C → Cc | 𝜺
FIRST(S) = { a, b, c, 𝜺 }
FIRST(A) = { a }
FIRST(B) = { b }
FIRST(C) = { 𝜺, c }
Calculating FIRST(α)
First, start out with empty FIRST() sets for all non-terminals in the grammar
Then, apply the following rules until the FIRST() sets do not change:
1. FIRST(x) = { x } if x is a terminal
2. FIRST(𝜺) = { 𝜺 }
3. If A → Bα is a production rule, then add FIRST(B) – { 𝜺 } to FIRST(A)
4. If A → B_0 B_1 B_2 … B_i B_{i+1} … B_k and 𝜺 ∈ FIRST(B_0) and 𝜺 ∈ FIRST(B_1) and 𝜺 ∈ FIRST(B_2) and … and 𝜺 ∈ FIRST(B_i), then add FIRST(B_{i+1}) – { 𝜺 } to FIRST(A)
5. If A → B_0 B_1 B_2 … B_k and 𝜺 ∈ FIRST(B_0) and 𝜺 ∈ FIRST(B_1) and 𝜺 ∈ FIRST(B_2) and … and 𝜺 ∈ FIRST(B_k), then add 𝜺 to FIRST(A)
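The five rules can be sketched as a fixed-point loop. The following is one possible encoding, not the course's reference implementation: it assumes single-letter symbols (uppercase for non-terminals, lowercase for terminals), stores each set as a bitmask, and writes productions as strings such as "A=CD" with an empty right-hand side standing for 𝜺. The grammar used is S → ABCD, A → CD | aA, B → b, C → cC | 𝜺, D → dD | 𝜺, and every identifier is an assumption of this sketch.

```c
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

#define EPS  (1u << 26)                    /* bit reserved for epsilon */
#define T(c) (1u << ((c) - 'a'))           /* bit for terminal c */

/* An empty right-hand side stands for epsilon. */
static const char *const PRODUCTIONS[] = {
    "S=ABCD", "A=CD", "A=aA", "B=b", "C=cC", "C=", "D=dD", "D=",
};
enum { NUM_PRODUCTIONS = sizeof PRODUCTIONS / sizeof PRODUCTIONS[0] };

/* first_sets[X - 'A'] holds FIRST(X) for non-terminal X. */
static unsigned first_sets[26];

/* FIRST of a single grammar symbol (rule 1 covers terminals). */
static unsigned first_of_symbol(char x) {
    return isupper((unsigned char)x) ? first_sets[x - 'A'] : T(x);
}

/* FIRST of a sequence of symbols; covers rules 2 to 5. */
unsigned first_of_sequence(const char *alpha) {
    unsigned result = 0;
    for (; *alpha; alpha++) {
        unsigned f = first_of_symbol(*alpha);
        result |= f & ~EPS;                /* rules 3 and 4 */
        if (!(f & EPS)) return result;     /* this symbol cannot vanish */
    }
    return result | EPS;                   /* rules 2 and 5 */
}

/* Apply the rules to every production until no FIRST set changes. */
void compute_first_sets(void) {
    memset(first_sets, 0, sizeof first_sets);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < NUM_PRODUCTIONS; p++) {
            unsigned *target = &first_sets[PRODUCTIONS[p][0] - 'A'];
            unsigned updated = *target | first_of_sequence(PRODUCTIONS[p] + 2);
            if (updated != *target) { *target = updated; changed = true; }
        }
    }
}

unsigned first_of(char nonterminal) { return first_sets[nonterminal - 'A']; }
```

Because this sketch updates the sets in place, the intermediate passes may differ from a pass-by-pass table computed by hand, but the loop stops at the same final sets.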
Calculating FIRST Sets
S → ABCD
A → CD | aA
B → b
C → cC | 𝜺
D → dD | 𝜺
INITIAL → pass 1 → pass 2 → pass 3 → pass 4
FIRST(S): {} → {} → { a } → { a, c, d, b } → { a, c, d, b }
FIRST(A): {} → { a } → { a, c, d, 𝜺 } → { a, c, d, 𝜺 } → { a, c, d, 𝜺 }
FIRST(B): {} → { b } → { b } → { b } → { b }
FIRST(C): {} → { c, 𝜺 } → { c, 𝜺 } → { c, 𝜺 } → { c, 𝜺 }
FIRST(D): {} → { d, 𝜺 } → { d, 𝜺 } → { d, 𝜺 } → { d, 𝜺 }
FOLLOW() Example
FOLLOW(A), where A is a non-terminal, returns the set of terminals and $ (end of file) that can appear immediately after the non-terminal A
S → A | B | C
A → a
B → Bb | b
C → Cc | 𝜺
FOLLOW(S) = { $ }
FOLLOW(A) = { $ }
FOLLOW(B) = { b, $ }
FOLLOW(C) = { c, $ }
Calculating FOLLOW(A)
First, calculate FIRST sets.
Then, initialize empty FOLLOW sets for all non-terminals in the grammar
Finally, apply the following rules until the FOLLOW sets do not change:
1. If S is the starting symbol of the grammar, then add $ to FOLLOW(S)
2. If B → αA, then add FOLLOW(B) to FOLLOW(A)
3. If B → αA C_0 C_1 C_2 … C_k and 𝜺 ∈ FIRST(C_0) and 𝜺 ∈ FIRST(C_1) and 𝜺 ∈ FIRST(C_2) and … and 𝜺 ∈ FIRST(C_k), then add FOLLOW(B) to FOLLOW(A)
4. If B → αA C_0 C_1 C_2 … C_k, then add FIRST(C_0) – { 𝜺 } to FOLLOW(A)
5. If B → αA C_0 C_1 C_2 … C_i C_{i+1} … C_k and 𝜺 ∈ FIRST(C_0) and 𝜺 ∈ FIRST(C_1) and 𝜺 ∈ FIRST(C_2) and … and 𝜺 ∈ FIRST(C_i), then add FIRST(C_{i+1}) – { 𝜺 } to FOLLOW(A)
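The FOLLOW rules can be sketched the same way. This example assumes the grammar S → ABCD, A → CD | aA, B → b, C → cC | 𝜺, D → dD | 𝜺, hard-codes its FIRST sets rather than recomputing them, and uses an encoding (single-letter symbols, bitmask sets, productions as strings) that is purely an assumption of this sketch.

```c
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

#define EPS  (1u << 26)                    /* bit reserved for epsilon */
#define END  (1u << 27)                    /* bit reserved for $ */
#define T(c) (1u << ((c) - 'a'))           /* bit for terminal c */

/* An empty right-hand side stands for epsilon. */
static const char *const PRODUCTIONS[] = {
    "S=ABCD", "A=CD", "A=aA", "B=b", "C=cC", "C=", "D=dD", "D=",
};
enum { NUM_PRODUCTIONS = sizeof PRODUCTIONS / sizeof PRODUCTIONS[0] };

/* FIRST sets of this grammar, hard-coded to keep the sketch short. */
static unsigned first_of_symbol(char x) {
    switch (x) {
    case 'S': return T('a') | T('b') | T('c') | T('d');
    case 'A': return T('a') | T('c') | T('d') | EPS;
    case 'B': return T('b');
    case 'C': return T('c') | EPS;
    case 'D': return T('d') | EPS;
    default:  return T(x);                 /* FIRST(x) = { x } for a terminal */
    }
}

static unsigned follow_sets[26];           /* indexed by non-terminal letter */

void compute_follow_sets(char start) {
    memset(follow_sets, 0, sizeof follow_sets);
    follow_sets[start - 'A'] = END;        /* rule 1 */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < NUM_PRODUCTIONS; p++) {
            char lhs = PRODUCTIONS[p][0];
            for (const char *a = PRODUCTIONS[p] + 2; *a; a++) {
                if (!isupper((unsigned char)*a)) continue;  /* terminals have no FOLLOW */
                unsigned add = 0;
                const char *c = a + 1;
                /* rules 4 and 5: FIRST of each later symbol while the prefix can vanish */
                for (; *c; c++) {
                    unsigned f = first_of_symbol(*c);
                    add |= f & ~EPS;
                    if (!(f & EPS)) break;
                }
                /* rules 2 and 3: nothing follows, or everything that follows can vanish */
                if (*c == '\0') add |= follow_sets[lhs - 'A'];
                unsigned *target = &follow_sets[*a - 'A'];
                if ((*target | add) != *target) { *target |= add; changed = true; }
            }
        }
    }
}

unsigned follow_of(char nonterminal) { return follow_sets[nonterminal - 'A']; }
```

Calling compute_follow_sets('S') and then follow_of('B') yields the bits for $, c, and d, which can be checked by applying the rules to S → ABCD by hand.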
Calculating FOLLOW Sets
S → ABCD
A → CD | aA
B → b
C → cC | 𝜺
D → dD | 𝜺
FIRST(S) = { a, c, d, b }
FIRST(A) = { a, c, d, 𝜺 }
FIRST(B) = { b }
FIRST(C) = { c, 𝜺 }
FIRST(D) = { d, 𝜺 }
INITIAL → pass 1 → pass 2
FOLLOW(S): {} → { $ } → { $ }
FOLLOW(A): {} → { b } → { b }
FOLLOW(B): {} → { $, c, d } → { $, c, d }
FOLLOW(C): {} → { $, d, b } → { $, d, b }
FOLLOW(D): {} → { $, b } → { $, b }
Predictive Recursive Descent Parsers
• At each parsing step, there is only one grammar rule that can be chosen, and there is no need for backtracking
• The conditions for a predictive parser are both of the following
  – If A → α and A → β, then FIRST(α) ∩ FIRST(β) = ∅
  – If 𝜺 ∈ FIRST(A), then FIRST(A) ∩ FOLLOW(A) = ∅
Creating a Predictive Recursive Descent Parser
• Create a CFG
• Calculate FIRST and FOLLOW sets
• Prove that the CFG allows a Predictive Recursive Descent Parser
• Write the predictive recursive descent parser using the FIRST and FOLLOW sets
Email Addresses
• How to parse/validate email addresses?
  – name @ domain.tld
• Turns out, it is not so simple
  – "cse 340"@example.com
  – customer/[email protected]
  – "Abc@def"@example.com
  – "Abc\@def"@example.com
  – "Abc\"@example.com"@example.com
  – test "example @hello" <[email protected]>
• In fact, a company called Mailgun, which provides email services as an API, released an open-source tool to validate email addresses, based on their experience with real-world email
  – How did they implement their parser?
  – A recursive descent parser
  – https://github.com/mailgun/flanker
Email Address CFG
quoted-string
atom
dot-atom
whitespace
Address → Name-addr-rfc | Name-addr-lax | Addr-spec
Name-addr-rfc → Display-name-rfc Angle-addr-rfc | Angle-addr-rfc
Display-name-rfc → Word Display-name-rfc-list | whitespace Word Display-name-rfc-list
Display-name-rfc-list → whitespace Word Display-name-rfc-list | epsilon
Angle-addr-rfc → < Addr-spec > | whitespace < Addr-spec > | whitespace < Addr-spec > whitespace | < Addr-spec > whitespace
Name-addr-lax → Display-name-lax Angle-addr-lax | Angle-addr-lax
Display-name-lax → whitespace Word Display-name-lax-list whitespace | Word Display-name-lax-list whitespace
Display-name-lax-list → whitespace Word Display-name-lax-list | epsilon
Angle-addr-lax → Addr-spec | Addr-spec whitespace
Addr-spec → Local-part @ Domain | whitespace Local-part @ Domain | whitespace Local-part @ Domain whitespace | Local-part @ Domain whitespace
Local-part → dot-atom | quoted-string
Domain → dot-atom
Word → atom | quoted-string
CFG taken from https://github.com/mailgun/flanker
Simplified Email Address CFG
quoted-string (q-s)
atom
dot-atom (d-a)
quoted-string-at (q-s-a)
dot-atom-at (d-a-a)
Address → Name-addr | Addr-spec
Name-addr → Display-name Angle-addr | Angle-addr
Display-name → Word Display-name-list
Display-name-list → Word Display-name-list | 𝜺
Angle-addr → < Addr-spec >
Addr-spec → d-a-a Domain | q-s-a Domain
Domain → d-a
Word → atom | q-s
FIRST: INITIAL → pass 1 → pass 2 → pass 3 → pass 4 → pass 5
Address: {} → {} → { d-a-a, q-s-a } → { d-a-a, q-s-a, < } → { d-a-a, q-s-a, <, atom, q-s } → { d-a-a, q-s-a, <, atom, q-s }
Name-addr: {} → {} → { < } → { <, atom, q-s } → { <, atom, q-s } → { <, atom, q-s }
Display-name: {} → {} → { atom, q-s } → { atom, q-s } → { atom, q-s } → { atom, q-s }
Display-name-list: {} → { 𝜺 } → { 𝜺, atom, q-s } → { 𝜺, atom, q-s } → { 𝜺, atom, q-s } → { 𝜺, atom, q-s }
Angle-addr: {} → { < } → { < } → { < } → { < } → { < }
Addr-spec: {} → { d-a-a, q-s-a } → { d-a-a, q-s-a } → { d-a-a, q-s-a } → { d-a-a, q-s-a } → { d-a-a, q-s-a }
Domain: {} → { d-a } → { d-a } → { d-a } → { d-a } → { d-a }
Word: {} → { atom, q-s } → { atom, q-s } → { atom, q-s } → { atom, q-s } → { atom, q-s }
FOLLOW: INITIAL → pass 1 → pass 2
Address: {} → { $ } → { $ }
Name-addr: {} → { $ } → { $ }
Display-name: {} → { < } → { < }
Display-name-list: {} → { < } → { < }
Angle-addr: {} → { $ } → { $ }
Addr-spec: {} → { $, > } → { $, > }
Domain: {} → { $, > } → { $, > }
Word: {} → { atom, q-s, < } → { atom, q-s, < }
FIRST(Address) = { d-a-a, q-s-a, <, atom, q-s }
FIRST(Name-addr) = { <, atom, q-s }
FIRST(Display-name) = { atom, q-s }
FIRST(Display-name-list) = { 𝜺, atom, q-s }
FIRST(Angle-addr) = { < }
FIRST(Addr-spec) = { d-a-a, q-s-a }
FIRST(Domain) = { d-a }
FIRST(Word) = { atom, q-s }
FIRST(Name-addr) ∩ FIRST(Addr-spec)
FIRST(Display-name Angle-addr) ∩ FIRST(Angle-addr)
FIRST(Word Display-name-list) ∩ FIRST(𝜺)
FIRST(d-a-a Domain) ∩ FIRST(q-s-a Domain)
FIRST(atom) ∩ FIRST(q-s)
FIRST(Display-name-list) ∩ FOLLOW(Display-name-list)
FIRST(Address) = { d-a-a, q-s-a, <, atom, q-s }
FIRST(Name-addr) = { <, atom, q-s }
FIRST(Display-name) = { atom, q-s }
FIRST(Display-name-list) = { 𝜺, atom, q-s }
FIRST(Angle-addr) = { < }
FIRST(Addr-spec) = { d-a-a, q-s-a }
FIRST(Domain) = { d-a }
FIRST(Word) = { atom, q-s }
FOLLOW(Address) = { $ }
FOLLOW(Name-addr) = { $ }
FOLLOW(Display-name) = { < }
FOLLOW(Display-name-list) = { < }
FOLLOW(Angle-addr) = { $ }
FOLLOW(Addr-spec) = { $, > }
FOLLOW(Domain) = { $, > }
FOLLOW(Word) = { atom, q-s, < }
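The six intersections listed above can be checked mechanically. The sketch below copies the FIRST and FOLLOW sets of the simplified email grammar into bitmasks and tests each pair for disjointness. The token constant names are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>

/* One bit per token of the simplified email grammar (illustrative names). */
enum {
    TOK_ATOM = 1 << 0, TOK_QS = 1 << 1, TOK_DAA = 1 << 2, TOK_QSA = 1 << 3,
    TOK_DA = 1 << 4, TOK_LT = 1 << 5, TOK_GT = 1 << 6, TOK_END = 1 << 7,
    EPSILON = 1 << 8
};

/* Two sets are disjoint when they share no bit. */
bool disjoint(unsigned x, unsigned y) { return (x & y) == 0; }

bool email_grammar_is_predictive(void) {
    /* FIRST and FOLLOW sets of the simplified email grammar. */
    unsigned first_name_addr = TOK_LT | TOK_ATOM | TOK_QS;
    unsigned first_addr_spec = TOK_DAA | TOK_QSA;
    unsigned first_display_name = TOK_ATOM | TOK_QS;
    unsigned first_angle_addr = TOK_LT;
    unsigned first_word = TOK_ATOM | TOK_QS;
    unsigned first_display_name_list = EPSILON | TOK_ATOM | TOK_QS;
    unsigned follow_display_name_list = TOK_LT;

    return disjoint(first_name_addr, first_addr_spec)        /* Address */
        && disjoint(first_display_name, first_angle_addr)    /* Name-addr */
        && disjoint(first_word, EPSILON)                     /* Display-name-list */
        && disjoint(TOK_DAA, TOK_QSA)                        /* Addr-spec */
        && disjoint(TOK_ATOM, TOK_QS)                        /* Word */
        /* epsilon is in FIRST(Display-name-list), so also compare with FOLLOW */
        && disjoint(first_display_name_list, follow_display_name_list);
}
```

Every intersection is empty, so the function returns true and the grammar satisfies both conditions for a predictive parser.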
parse_Address() {
    t_type = getToken();
    // Check FIRST(Name-addr)
    if (t_type == < || t_type == atom || t_type == q-s) {
        ungetToken();
        parse_Name-addr();
        printf("Address -> Name-addr");
    }
    // Check FIRST(Addr-spec)
    else if (t_type == d-a-a || t_type == q-s-a) {
        ungetToken();
        parse_Addr-spec();
        printf("Address -> Addr-spec");
    }
    else {
        syntax_error();
    }
}
Address → Name-addr | Addr-spec
FIRST(Address) = { d-a-a, q-s-a, <, atom, q-s }
FIRST(Name-addr) = { <, atom, q-s }
FIRST(Addr-spec) = { d-a-a, q-s-a }
FOLLOW(Address) = { $ }
FOLLOW(Name-addr) = { $ }
FOLLOW(Addr-spec) = { $, > }
parse_Name-addr() {
    t_type = getToken();
    // Check FIRST(Display-name Angle-addr)
    if (t_type == atom || t_type == q-s) {
        ungetToken();
        parse_Display-name();
        parse_Angle-addr();
        printf("Name-addr -> Display-name Angle-addr");
    }
    // Check FIRST(Angle-addr)
    else if (t_type == <) {
        ungetToken();
        parse_Angle-addr();
        printf("Name-addr -> Angle-addr");
    }
    else {
        syntax_error();
    }
}
Name-addr → Display-name Angle-addr | Angle-addr
FIRST(Name-addr) = { <, atom, q-s }
FIRST(Display-name) = { atom, q-s }
FIRST(Angle-addr) = { < }
FOLLOW(Name-addr) = { $ }
FOLLOW(Display-name) = { < }
FOLLOW(Angle-addr) = { $ }
parse_Display-name() {
    t_type = getToken();
    // Check FIRST(Word Display-name-list)
    if (t_type == atom || t_type == q-s) {
        ungetToken();
        parse_Word();
        parse_Display-name-list();
        printf("Display-name -> Word Display-name-list");
    }
    else {
        syntax_error();
    }
}
Display-name → Word Display-name-list
FIRST(Display-name) = { atom, q-s }
FIRST(Display-name-list) = { 𝜺, atom, q-s }
FIRST(Word) = { atom, q-s }
FOLLOW(Display-name) = { < }
FOLLOW(Display-name-list) = { < }
FOLLOW(Word) = { atom, q-s, < }
parse_Display-name-list() {
    t_type = getToken();
    // Check FIRST(Word Display-name-list)
    if (t_type == atom || t_type == q-s) {
        ungetToken();
        parse_Word();
        parse_Display-name-list();
        printf("Display-name-list -> Word Display-name-list");
    }
    // Check FOLLOW(Display-name-list)
    else if (t_type == <) {
        ungetToken();
        printf("Display-name-list -> 𝜺");
    }
    else {
        syntax_error();
    }
}
Display-name-list → Word Display-name-list | 𝜺
FIRST(Display-name-list) = { 𝜺, atom, q-s }
FIRST(Word) = { atom, q-s }
FOLLOW(Display-name-list) = { < }
FOLLOW(Word) = { atom, q-s, < }
parse_Angle-addr() {
    t_type = getToken();
    // Check FIRST(< Addr-spec >)
    if (t_type == <) {
        // ungetToken()? Not needed: the terminal < has already been matched
        parse_Addr-spec();
        t_type = getToken();
        if (t_type != >) {
            syntax_error();
        }
        printf("Angle-addr -> < Addr-spec >");
    }
    else {
        syntax_error();
    }
}
Angle-addr → < Addr-spec >
FIRST(Angle-addr) = { < }
FIRST(Addr-spec) = { d-a-a, q-s-a }
FOLLOW(Angle-addr) = { $ }
FOLLOW(Addr-spec) = { $, > }
parse_Addr-spec() {
    t_type = getToken();
    // Check FIRST(d-a-a Domain)
    if (t_type == d-a-a) {
        // ungetToken()? Not needed: the terminal d-a-a has already been matched
        parse_Domain();
        printf("Addr-spec -> d-a-a Domain");
    }
    // Check FIRST(q-s-a Domain)
    else if (t_type == q-s-a) {
        parse_Domain();
        printf("Addr-spec -> q-s-a Domain");
    }
    else {
        syntax_error();
    }
}
Addr-spec → d-a-a Domain | q-s-a Domain
FIRST(Addr-spec) = { d-a-a, q-s-a }
FIRST(Domain) = { d-a }
FOLLOW(Addr-spec) = { $, > }
FOLLOW(Domain) = { $, > }
parse_Domain() {
    t_type = getToken();
    // Check FIRST(d-a)
    if (t_type == d-a) {
        printf("Domain -> d-a");
    }
    else {
        syntax_error();
    }
}
Domain → d-a
FIRST(Domain) = { d-a }
FOLLOW(Domain) = { $, > }
parse_Word() {
    t_type = getToken();
    // Check FIRST(atom)
    if (t_type == atom) {
        printf("Word -> atom");
    }
    // Check FIRST(q-s)
    else if (t_type == q-s) {
        printf("Word -> q-s");
    }
    else {
        syntax_error();
    }
}
Word → atom | q-s
FIRST(Word) = { atom, q-s }
FOLLOW(Word) = { atom, q-s, < }
Predictive Recursive Descent Parsers
• For every non-terminal A in the grammar, create a function called parse_A
• For each production rule A → α (where α is a sequence of terminals and non-terminals), if getToken() ∈ FIRST(α) then choose the production rule A → α
  – For every terminal and non-terminal a in α, if a is a non-terminal call parse_a, if a is a terminal check that getToken() == a
  – If 𝜺 ∈ FIRST(α), then check that getToken() ∈ FOLLOW(A), then choose the production A → 𝜺
• If getToken() ∉ FIRST(A), then syntax_error(), unless 𝜺 ∈ FIRST(A), in which case getToken() ∉ FOLLOW(A) is a syntax_error()
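Putting the recipe together, here is a compact, compilable sketch of a predictive recursive descent parser for the simplified email grammar. It is an illustration under stated assumptions rather than the course's or flanker's implementation: the Token enum, the END-terminated token array, and the peek/expect helpers that stand in for getToken()/ungetToken() are all hypothetical, and the functions return a bool where the slide pseudocode calls syntax_error().

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ATOM, QS, DAA, QSA, DA, LT, GT, END } Token;

/* Hypothetical token cursor standing in for getToken()/ungetToken(). */
static const Token *cursor;
static Token peek(void) { return *cursor; }

/* Consume the next token if it equals t; never advance past END. */
static bool expect(Token t) {
    if (*cursor != t) return false;
    if (t != END) cursor++;
    return true;
}

static bool parse_domain(void) { return expect(DA); }          /* Domain -> d-a */

static bool parse_addr_spec(void) {                            /* d-a-a Domain | q-s-a Domain */
    if (peek() == DAA || peek() == QSA) { cursor++; return parse_domain(); }
    return false;
}

static bool parse_angle_addr(void) {                           /* < Addr-spec > */
    return expect(LT) && parse_addr_spec() && expect(GT);
}

static bool parse_word(void) { return expect(ATOM) || expect(QS); }

static bool parse_display_name_list(void) {
    if (peek() == ATOM || peek() == QS)          /* FIRST(Word Display-name-list) */
        return parse_word() && parse_display_name_list();
    return peek() == LT;                         /* epsilon only if next token is in FOLLOW */
}

static bool parse_display_name(void) {
    return parse_word() && parse_display_name_list();
}

static bool parse_name_addr(void) {
    if (peek() == ATOM || peek() == QS)          /* FIRST(Display-name Angle-addr) */
        return parse_display_name() && parse_angle_addr();
    if (peek() == LT) return parse_angle_addr(); /* FIRST(Angle-addr) */
    return false;
}

static bool parse_address(void) {
    if (peek() == LT || peek() == ATOM || peek() == QS) return parse_name_addr();
    if (peek() == DAA || peek() == QSA) return parse_addr_spec();
    return false;
}

/* tokens must be terminated by END. */
bool is_valid_address(const Token *tokens) {
    cursor = tokens;
    return parse_address() && expect(END);
}
```

Each function inspects only the next token before committing to a production, so no backtracking is needed. For example, the token sequence ATOM QS LT QSA DA GT END is accepted as a Name-addr, while LT DAA DA END is rejected because the closing > is missing.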