Lexical and Syntax Analysis (of Programming Languages)
Top-Down Parsing
[Figure: string of characters (easy for humans to write and understand) → lexemes identified → string of tokens → data structure (easy for programs to transform)]
PART 1: SYNTAX OF LANGUAGES
• Context-Free Grammars
• Derivations
• Parse Trees
• Ambiguity
• Precedence and Associativity
Syntax
The syntax is a set of rules defining valid strings of a language, often specified by a context-free grammar.
For example, a grammar E for arithmetic expressions:
  e → x | y | e + e | e – e | e * e | ( e )
Context-free grammars
Have four components:
1. A set of terminal symbols.
2. A set of non-terminal symbols.
3. A set of productions (or rules) of the form:
     n → X1 ⋯ Xn
   where n is a non-terminal and X1 ⋯ Xn is any sequence of terminals, non-terminals, and 𝜀.
4. The start symbol (one of the non-terminals).
Notation
Non-terminals are underlined.
Rather than writing
  e → x
  e → e + e
we may write:
  e → x | e + e
(Also, symbols → and ::= will be used interchangeably.)
Why context-free?
[Figure: nested classes of grammars: Regular ⊂ Context-Free ⊂ Context-Sensitive ⊂ Unrestricted]
Context-free grammars offer a nice balance between expressive power and efficiency of parsing.
Derivations
A derivation is a proof that some string conforms to a grammar.
For example:
  e ⇒ e + e
    ⇒ x + e
    ⇒ x + ( e )
    ⇒ x + ( e * e )
    ⇒ x + ( y * e )
    ⇒ x + ( y * x )
Derivations
Leftmost derivation: always expand the leftmost non-terminal when applying the grammar rules.
Rightmost derivation: always expand the rightmost non-terminal, e.g.
  e ⇒ e + e
    ⇒ e + ( e )
    ⇒ e + ( x )
    ⇒ x + ( x )
Parse tree: motivation
Like a derivation: a proof that a given input is valid according to the grammar. But a parse tree:
• is more concise: we don't write out the sentence every time a non-terminal is expanded;
• abstracts over the order in which rules are applied.
Parse tree: intuition
If non-terminal n has a production
  n → X Y Z
where X, Y, and Z are terminals or non-terminals, then a parse tree may have an interior node labelled n with three children labelled X, Y, and Z.
      n
    / | \
   X  Y  Z
Parse tree: definition
A parse tree is a tree in which:
• the root is labelled by the start symbol;
• each leaf is labelled by a terminal symbol, or 𝜀;
• each interior node is labelled by a non-terminal;
• if n is a non-terminal labelling an interior node whose children are X1, X2, ⋯, Xn then there must exist a production n → X1 X2 ⋯ Xn.
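The definition above maps naturally onto a small tree data structure. The following is a minimal sketch in C, not the representation used in the lecture: the names Node, MAX_KIDS and yield are illustrative assumptions. The function yield concatenates the leaf labels from left to right, which recovers the string that the tree is a proof for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node type: the label is a grammar symbol such as "e" or "+".
   A node with nkids == 0 is a leaf (a terminal). MAX_KIDS is 3 because no
   production of grammar E has more than three symbols on its right side. */
#define MAX_KIDS 3

typedef struct Node {
    const char *label;
    int nkids;
    struct Node *kids[MAX_KIDS];
} Node;

/* Append the labels of the leaves of n, left to right, to buf.
   The caller must pass a string buffer that is large enough. */
void yield(const Node *n, char *buf) {
    assert(n != NULL && buf != NULL);
    if (n->nkids == 0) {
        strcat(buf, n->label);
        return;
    }
    for (int i = 0; i < n->nkids; i++)
        yield(n->kids[i], buf);
}
```

Building the tree with root e and children e, +, e (where the right child expands to e * e) and calling yield on it with an empty buffer gives back the string x+y*x.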
Example 1
Example input string:
  x + y * x
Resulting parse tree according to grammar E:
        e
      / | \
     e  +  e
     |   / | \
     x  e  *  e
        |     |
        y     x
Example 2
The following is not a parse tree according to grammar E.
        e
      / | \
     x  +  e
         / | \
        e  *  e
        |     |
        y     x
Why? Because e → x + e is not a production in grammar E.
Syntax Analysis
[Figure: string of symbols → syntax analysis → parse tree]
A parse tree is:
1. A proof that a given input is valid according to the grammar;
2. A structure-rich representation of the input that can be stored in a data structure that is convenient to process.
(Syntax analysis may also report that the input string is invalid.)
Ambiguity
If there exists more than one parse tree for some string then the grammar is ambiguous. For example, the string x+y*x has two parse trees:
Tree 1:
        e
      / | \
     e  +  e
     |   / | \
     x  e  *  e
        |     |
        y     x
Tree 2:
           e
         / | \
        e  *  e
      / | \   |
     e  +  e  x
     |     |
     x     y
Operator precedence
Different parse trees often have different meanings, so we usually want unambiguous grammars.
Conventionally, * has a higher precedence (binds tighter) than +, so there is only one interpretation of x+y*x, namely x+(y*x).
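To see that the two parse trees of x+y*x really do mean different things, one can evaluate both. The sketch below is an illustration only: the Expr type and the eval function are assumptions made for this example, and leaves are encoded by the variable name in the op field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression tree: op is 'x' or 'y' for a leaf,
   or '+' or '*' for an interior node with subtrees l and r. */
typedef struct Expr {
    char op;
    const struct Expr *l, *r;
} Expr;

/* Evaluate tree t for the given values of the variables x and y. */
int eval(const Expr *t, int x, int y) {
    assert(t != NULL);
    switch (t->op) {
    case 'x': return x;
    case 'y': return y;
    case '+': return eval(t->l, x, y) + eval(t->r, x, y);
    case '*': return eval(t->l, x, y) * eval(t->r, x, y);
    default:  assert(!"unknown operator"); return 0;
    }
}
```

With x = 2 and y = 3, the tree for x+(y*x) evaluates to 8 while the tree for (x+y)*x evaluates to 10, so the choice of parse tree changes the result.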
Operator associativity
Even with operator precedence rules, ambiguity remains, e.g. x-x-x.
Binary operators are either:
• left-associative;
• right-associative;
• non-associative.
Conventionally, - is left-associative, so there is only one interpretation of x-x-x, namely (x-x)-x.
Exercise 1
Recall grammar E:
  e → x | y | e + e | e – e | e * e | ( e )
Give an unambiguous grammar for expressions, using the following rules of associativity and precedence.
Let all operators be left-associative, and let * bind tighter than + and –.
Answer: step-by-step
Given a non-terminal e which involves operators at n levels of precedence:
Step 1: introduce n+1 new non-terminals, e0 ⋯ en.
Let op denote an operator with precedence i.
Step 2: replace each production
  e → e op e
with
  e_i → e_i op e_{i+1} | e_{i+1}
if op is left-associative, or
  e_i → e_{i+1} op e_i | e_{i+1}
if op is right-associative.
Construct the precedence table:
  Operator   Precedence
  +, -       0
  *          1
Grammar E after step 2 becomes:
  e0 → e0 + e1 | e0 – e1 | e1
  e1 → e1 * e2 | e2
  e  → ( e ) | x | y
Step 3: replace each production
  e → ⋯
with
  en → ⋯
After step 3:
  e0 → e0 + e1 | e0 – e1 | e1
  e1 → e1 * e2 | e2
  e2 → ( e ) | x | y
Step 4: replace all occurrences of e0 with e.
After step 4:
  e  → e + e1 | e – e1 | e1
  e1 → e1 * e2 | e2
  e2 → ( e ) | x | y
Exercise 2
Consider the following ambiguous grammar for logical propositions.
  p → 0        (Zero)
    | 1        (One)
    | ~ p      (Negation)
    | p + p    (Disjunction)
    | p * p    (Conjunction)
Now let + and * be right-associative and let the operators in increasing order of binding strength be: +, *, ~.
Give an unambiguous grammar for logical propositions.
Exercise 3
Which of the following grammars are ambiguous?
  s → if b then s | if b then s else s | skip
  e → + e e | – e e | x
  b → 0 b 1 | 0 1
Summary of Part 1
• The syntax of a language is often specified by a context-free grammar.
• Derivations and parse trees are proofs that a string is accepted by a grammar.
• Unambiguous grammars can be constructed using rules of precedence and associativity.
PART 2: TOP-DOWN PARSING
• Recursive-Descent
• Backtracking
• Left-Factoring
• Predictive Parsing
• Left-Recursion Removal
• First and Follow Sets
• Parsing tables and LL(1)
Top-down parsing
Top-down: begin with the start symbol and expand non-terminals, succeeding when the input string is matched.
A good strategy for writing parsers:
1. Implement a syntax checker to accept or refute input strings.
2. Modify the checker to construct a parse tree (straightforward).
RECURSIVE DESCENT
A popular top-down parsing technique.
Recursive descent
A recursive descent parser consists of a set of functions, one for each non-terminal.
The function for non-terminal n returns true if some prefix of the input string can be derived from n, and false otherwise.
Consuming the input
We assume a global variable next points to the input string.
  char* next;
Consume c from input if possible.
  int eat(char c) {
    if (*next == c) {
      next++;
      return 1;
    }
    return 0;
  }
Recursive descent
For each non-terminal N, introduce:
  int N() {
    char* save = next;
    for each N → X1 X2 ⋯ Xn
      if (parser(X1) && parser(X2) && ⋯ && parser(Xn)) return 1;
      else next = save;   /* backtrack */
    return 0;
  }
Let parser(X) denote
  X() if X is a non-terminal
  eat(X) if X is a terminal
Exercise 4
Consider the following grammar G with start symbol e.
  e → ( e + e ) | ( e * e ) | v
  v → x | y
Using recursive descent, write a syntax checker for grammar G.
Answer (part 1)
  int e() {
    char* save = next;
    if (eat('(') && e() && eat('+') && e() && eat(')')) return 1;
    else next = save;
    if (eat('(') && e() && eat('*') && e() && eat(')')) return 1;
    else next = save;
    if (v()) return 1;
    else next = save;
    return 0;
  }
Answer (part 2)
  int v() {
    char* save = next;
    if (eat('x')) return 1;
    else next = save;
    if (eat('y')) return 1;
    else next = save;
    return 0;
  }
Exercise 5
How many function calls are made by the recursive descent parser to parse the following strings?
(x*x)
((x*x)*x)
(((x*x)*x)*x)
(See animation of backtracking.)
Answer
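  Input string     Length   Calls
  (x*x)            5        21
  ((x*x)*x)        9        53
  (((x*x)*x)*x)    13       117
Each extra level of nesting roughly doubles the number of calls (53 = 2 × 21 + 11 and 117 = 2 × 53 + 11), because the left operand is parsed once for the + alternative and again for the * alternative. For inputs of this shape the number of calls therefore grows exponentially in the length of the input string.
Lesson: backtracking is expensive!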
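The counts in the table can be reproduced by instrumenting the checker from Exercise 4. The following sketch assumes that every call to e, v and eat counts as one function call; the counter calls and the wrapper count_calls are additions made for this illustration and are not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>

static const char *next;   /* current position in the input */
static int calls;          /* calls to e, v and eat so far (added for counting) */

static int e(void);

static int eat(char c) {
    calls++;
    if (*next == c) { next++; return 1; }
    return 0;
}

/* v → x | y */
static int v(void) {
    calls++;
    const char *save = next;
    if (eat('x')) return 1; else next = save;
    if (eat('y')) return 1; else next = save;
    return 0;
}

/* e → ( e + e ) | ( e * e ) | v */
static int e(void) {
    calls++;
    const char *save = next;
    if (eat('(') && e() && eat('+') && e() && eat(')')) return 1;
    else next = save;
    if (eat('(') && e() && eat('*') && e() && eat(')')) return 1;
    else next = save;
    if (v()) return 1; else next = save;
    return 0;
}

/* Hypothetical helper: parse input with e and return the number of calls
   made, or -1 if e does not accept the whole string. */
int count_calls(const char *input) {
    assert(input != NULL);
    next = input;
    calls = 0;
    if (!e() || *next != '\0') return -1;
    return calls;
}
```

Under that counting convention, count_calls returns 21, 53 and 117 for the three strings of Exercise 5.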
LEFT FACTORING
Reducing backtracking!
Left factoring
When two productions for a non-terminal share a common prefix, expensive backtracking can be avoided by left-factoring the grammar.
Idea: introduce a new non-terminal that accepts each of the different suffixes.
Example 3
Left-factoring grammar G by introducing non-terminal r:
  e → ( e r | v          ("( e" is the common prefix)
  r → + e ) | * e )      (the different suffixes)
  v → x | y
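A recursive descent checker for the left-factored grammar might look as follows. This is a sketch that follows the same template as the answer to Exercise 4; the wrapper accepts, which also requires that the whole input is consumed, is an assumption added for this example.

```c
#include <assert.h>
#include <stddef.h>

static const char *next;   /* current position in the input */

static int e(void);

static int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

/* v → x | y */
static int v(void) {
    const char *save = next;
    if (eat('x')) return 1; else next = save;
    if (eat('y')) return 1; else next = save;
    return 0;
}

/* r → + e ) | * e )   (the two different suffixes) */
static int r(void) {
    const char *save = next;
    if (eat('+') && e() && eat(')')) return 1; else next = save;
    if (eat('*') && e() && eat(')')) return 1; else next = save;
    return 0;
}

/* e → ( e r | v   (the common prefix "( e" is now parsed only once) */
static int e(void) {
    const char *save = next;
    if (eat('(') && e() && r()) return 1; else next = save;
    if (v()) return 1; else next = save;
    return 0;
}

/* Hypothetical helper: 1 if the whole of input conforms to the grammar. */
int accepts(const char *input) {
    assert(input != NULL);
    next = input;
    return e() && *next == '\0';
}
```

The left operand after an opening parenthesis is parsed a single time, and only the operator and what follows it are retried in r.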
Exercise 6
How many function calls are made by the recursive descent parser (after left-factoring) to parse the following strings?
(x*x)
((x*x)*x)
(((x*x)*x)*x)
Answer
  Input string     Length   Calls
  (x*x)            5        13
  ((x*x)*x)        9        22
  (((x*x)*x)*x)    13       31
Number of calls is now linear in the length of the input string.
Lesson: left-factoring a grammar reduces backtracking.
PREDICTIVE PARSING
Eliminating backtracking!
Predictive parsing
Idea: know which production of a non-terminal to choose based solely on the next input symbol.
Advantage: very efficient since it eliminates all backtracking.
Disadvantage: not all grammars can be parsed in this way. (But many useful ones can.)
Running example
The following grammar H will be used as a running example to demonstrate predictive parsing.
  e → e + e | e * e | ( e ) | x | y
Example string: x+y*(y+x)
Removing ambiguity
Since + and * are left-associative and * binds tighter than +, we can derive an unambiguous variant of H.
  e → e + t | t
  t → t * f | f
  f → ( e ) | x | y
Left recursion
Problem: left-recursive grammars cause recursive descent parsers to loop forever.
  int e() {
    char* save = next;
    if (e() && eat('+') && t()) return 1;   /* call to self without consuming any input */
    next = save;
    if (t()) return 1;
    next = save;
    return 0;
  }
Eliminating left recursion
Let 𝛼 denote any sequence of grammar symbols.
  Rule 1:  n → 𝛼     ⟹   n → 𝛼 n'      (where 𝛼 does not begin with n)
  Rule 2:  n → n 𝛼   ⟹   n' → 𝛼 n'
  Rule 3:  introduce the new production   n' → 𝜀
Example 4
Running example, after eliminating left-recursion.
  e  → t e'
  e' → + t e' | 𝜀
  t  → f t'
  t' → * f t' | 𝜀
  f  → ( e ) | x | y
First and follow sets
Predictive parsers are built using the first and follow sets of each non-terminal in a grammar.
The first set of a non-terminal n is the set of symbols that can begin a string derived from n.
The follow set of a non-terminal n is the set of symbols that can immediately follow n in any step of a derivation.
Definition of first sets
Let 𝛼 denote any sequence of grammar symbols.
If 𝛼 can derive a string beginning with terminal a then a ∊ first(𝛼).
If 𝛼 can derive 𝜀 then 𝜀 ∊ first(𝛼).
Computing first sets
If a is a terminal then a ∊ first(a).
If there exists a production
n → X1 X2 ⋯ Xn
and ∃i · a ∊ first(Xi)
and ∀j < i · 𝜀 ∊ first(Xj)
then a ∊ first(n).
If n → 𝜀 then 𝜀 ∊ first(n).
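The rules above can be applied repeatedly until no set changes. The sketch below does this for the running example after left-recursion removal. Several things are assumptions of this illustration: each symbol is encoded as one character (E, A, T, B, F stand for e, e', t, t', f), '#' stands for 𝜀, and sets are stored as short strings. It also uses the usual generalisation of the last rule: if every symbol on a right-hand side can derive 𝜀, then 𝜀 is in the first set of the left-hand side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Productions as "N:rhs"; an empty rhs means N → 𝜀. */
static const char *prods[] = {
    "E:TA", "A:+TA", "A:", "T:FB", "B:*FB", "B:", "F:(E)", "F:x", "F:y"
};
enum { NPRODS = sizeof prods / sizeof prods[0] };

/* first[c] holds the members of first(c) for non-terminal c. */
static char first[128][16];

static int is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

/* Add c to set; return 1 if the set changed. */
static int add(char *set, char c) {
    if (strchr(set, c)) return 0;
    size_t n = strlen(set);
    assert(n + 1 < 16);
    set[n] = c;
    set[n + 1] = '\0';
    return 1;
}

void compute_first(void) {
    int changed = 1;
    while (changed) {                       /* iterate to a fixed point */
        changed = 0;
        for (int p = 0; p < NPRODS; p++) {
            char n = prods[p][0];
            int nullable = 1;               /* do all symbols so far derive 𝜀? */
            for (const char *x = prods[p] + 2; *x && nullable; x++) {
                if (!is_nonterminal(*x)) {  /* first(a) = { a } */
                    changed |= add(first[(int)n], *x);
                    nullable = 0;
                } else {
                    for (const char *a = first[(int)*x]; *a; a++)
                        if (*a != '#') changed |= add(first[(int)n], *a);
                    if (!strchr(first[(int)*x], '#')) nullable = 0;
                }
            }
            if (nullable) changed |= add(first[(int)n], '#');
        }
    }
}
```

After calling compute_first, first['E'], first['T'] and first['F'] each contain (, x and y, while first['A'] contains + and # and first['B'] contains * and #.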
Exercise 7
What are the first sets for each non-terminal in the following grammar?
  e  → t e'
  e' → + t e' | 𝜀
  t  → f t'
  t' → * f t' | 𝜀
  f  → ( e ) | x | y
Answer
  first( f )  = { '(', 'x', 'y' }
  first( t' ) = { '*', 𝜀 }
  first( t )  = { '(', 'x', 'y' }
  first( e' ) = { '+', 𝜀 }
  first( e )  = { '(', 'x', 'y' }
Definition of follow sets
Let 𝛼 and 𝛽 denote any sequence of grammar symbols.
Terminal a ∊ follow(n) if the start symbol of the grammar can derive a string of grammar symbols in which a immediately follows n.
The set follow(n) never contains 𝜀.
End markers
In predictive parsing, it is useful to mark the end of the input string with a $ symbol.
If the start symbol can derive a string of grammar symbols in which n is the rightmost symbol then $ is in follow(n).
Computing follow sets
If s is the start symbol of the grammar then $ ∊ follow(s).
If n → 𝛼 x 𝛽 then everything in first(𝛽) except 𝜀 is in follow(x).
If n → 𝛼 x
or n → 𝛼 x 𝛽 and 𝜀 ∊ first(𝛽)
then everything in follow(n) is in follow(x).
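These rules can also be run to a fixed point. The sketch below computes the follow sets of the running example after left-recursion removal. It relies on assumptions made for this illustration: symbols are single characters (E, A, T, B, F stand for e, e', t, t', f), '#' stands for 𝜀, and the first sets are hard-coded in the helper first_of (they are those of the grammar, with 𝜀 written as '#').

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *prods[] = {
    "E:TA", "A:+TA", "A:", "T:FB", "B:*FB", "B:", "F:(E)", "F:x", "F:y"
};
enum { NPRODS = sizeof prods / sizeof prods[0] };

static char follow[128][16];   /* follow[c] holds the members of follow(c) */

static int is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

/* Hard-coded first sets of the non-terminals ('#' is 𝜀). */
static const char *first_of(char c) {
    switch (c) {
    case 'E': case 'T': case 'F': return "(xy";
    case 'A': return "+#";
    case 'B': return "*#";
    default:  return "";
    }
}

static int add(char *set, char c) {       /* returns 1 if set changed */
    if (strchr(set, c)) return 0;
    size_t n = strlen(set);
    assert(n + 1 < 16);
    set[n] = c;
    set[n + 1] = '\0';
    return 1;
}

void compute_follow(void) {
    int changed;
    add(follow['E'], '$');                /* $ ∊ follow(start symbol) */
    do {
        changed = 0;
        for (int p = 0; p < NPRODS; p++) {
            char n = prods[p][0];
            for (const char *x = prods[p] + 2; *x; x++) {
                if (!is_nonterminal(*x)) continue;
                int nullable = 1;         /* can the symbols after x derive 𝜀? */
                for (const char *b = x + 1; *b && nullable; b++) {
                    if (!is_nonterminal(*b)) {
                        changed |= add(follow[(int)*x], *b);
                        nullable = 0;
                    } else {
                        for (const char *a = first_of(*b); *a; a++)
                            if (*a != '#') changed |= add(follow[(int)*x], *a);
                        if (!strchr(first_of(*b), '#')) nullable = 0;
                    }
                }
                if (nullable)             /* x is last, or the rest derives 𝜀 */
                    for (const char *a = follow[(int)n]; *a; a++)
                        changed |= add(follow[(int)*x], *a);
            }
        }
    } while (changed);
}
```

After calling compute_follow, follow['E'] and follow['A'] contain $ and ), follow['T'] and follow['B'] contain +, $ and ), and follow['F'] contains *, +, $ and ).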
Exercise 8
What are the follow sets for each non-terminal in the following grammar?
  e  → t e'
  e' → + t e' | 𝜀
  t  → f t'
  t' → * f t' | 𝜀
  f  → ( e ) | x | y
Answer
  follow( e' ) = { $, ')' }
  follow( e )  = { $, ')' }
  follow( t' ) = { '+', $, ')' }
  follow( t )  = { '+', $, ')' }
  follow( f )  = { '*', '+', ')', $ }
Predictive parsing table
For each non-terminal n, a parse table T defines which production of n should be chosen, based on the next input symbol.
  for each production n → 𝛼
    for each terminal a ∊ first(𝛼)
      add n → 𝛼 to T[n, a]
    if 𝜀 ∊ first(𝛼) then
      for each b ∊ follow(n)
        add n → 𝛼 to T[n, b]
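As a sketch of how this construction could be coded, the block below fills a table for the running example after left-recursion removal. The encoding is an assumption of this illustration: symbols are single characters (E, A, T, B, F stand for e, e', t, t', f), '#' stands for 𝜀, the first and follow sets are hard-coded in first_of and follow_of, and a table cell stores the index of a production in prods, or -1 when empty. The counter conflicts records how many cells would receive a second, different entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *prods[] = {
    "E:TA", "A:+TA", "A:", "T:FB", "B:*FB", "B:", "F:(E)", "F:x", "F:y"
};
enum { NPRODS = sizeof prods / sizeof prods[0] };

static int table[128][128];   /* table[n][a]: production index, or -1 */
static int conflicts;         /* cells that were given two different entries */

static int is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

static const char *first_of(char c) {     /* '#' is 𝜀 */
    switch (c) {
    case 'E': case 'T': case 'F': return "(xy";
    case 'A': return "+#";
    case 'B': return "*#";
    default:  return "";
    }
}

static const char *follow_of(char c) {
    switch (c) {
    case 'E': case 'A': return "$)";
    case 'T': case 'B': return "+$)";
    case 'F': return "*+$)";
    default:  return "";
    }
}

static void enter(char n, char a, int p) {
    int *cell = &table[(int)n][(int)a];
    if (*cell != -1 && *cell != p) conflicts++;
    *cell = p;
}

void build_table(void) {
    memset(table, -1, sizeof table);      /* all bytes 0xFF, so every cell is -1 */
    conflicts = 0;
    for (int p = 0; p < NPRODS; p++) {
        char n = prods[p][0];
        int nullable = 1;                 /* can the scanned part of alpha derive 𝜀? */
        for (const char *x = prods[p] + 2; *x && nullable; x++) {
            if (!is_nonterminal(*x)) {    /* a terminal begins alpha */
                enter(n, *x, p);
                nullable = 0;
            } else {
                for (const char *a = first_of(*x); *a; a++)
                    if (*a != '#') enter(n, *a, p);
                if (!strchr(first_of(*x), '#')) nullable = 0;
            }
        }
        if (nullable)                     /* 𝜀 ∊ first(alpha): use follow(n) */
            for (const char *b = follow_of(n); *b; b++)
                enter(n, *b, p);
    }
}
```

For this grammar build_table leaves conflicts at 0, so every cell holds at most one production.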
Exercise 9
Construct a predictive parsing table for the following grammar.
  e  → t e'
  e' → + t e' | 𝜀
  t  → f t'
  t' → * f t' | 𝜀
  f  → ( e ) | x | y
LL(1) grammars
If each cell in the parse table contains at most one entry then a non-backtracking parser can be constructed and the grammar is said to be LL(1).
First L: left-to-right scanning of the input.
Second L: a leftmost derivation is constructed.
The (1): using one input symbol of look-ahead to decide which grammar production to choose.
Exercise 10
Write a syntax checker for the grammar of Exercise 9, utilising the predictive parsing table.
  int e() {
    ...
  }
It should return a non-zero value if some prefix of the string pointed to by next conforms to the grammar, otherwise it should return zero.
Answer (part 1)
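  int e1(); int t(); int t1(); int f();   /* forward declarations; e1 and t1 stand for e' and t' */

  /* e → t e' */
  int e() {
    if (*next == 'x') return t() && e1();
    if (*next == 'y') return t() && e1();
    if (*next == '(') return t() && e1();
    return 0;
  }

  /* e' → + t e' | 𝜀 */
  int e1() {
    if (*next == '+') return eat('+') && t() && e1();
    if (*next == ')') return 1;
    if (*next == '\0') return 1;
    return 0;
  }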
Answer (part 2)
  /* t → f t' */
  int t() {
    if (*next == 'x') return f() && t1();
    if (*next == 'y') return f() && t1();
    if (*next == '(') return f() && t1();
    return 0;
  }

  /* t' → * f t' | 𝜀 */
  int t1() {
    if (*next == '+') return 1;
    if (*next == '*') return eat('*') && f() && t1();
    if (*next == ')') return 1;
    if (*next == '\0') return 1;
    return 0;
  }
Answer (part 3)
  /* f → ( e ) | x | y */
  int f() {
    if (*next == 'x') return eat('x');
    if (*next == 'y') return eat('y');
    if (*next == '(') return eat('(') && e() && eat(')');
    return 0;
  }
(Notice how backtracking is not required.)
Predictive parsing algorithm
Let s be a stack, initially containing the end marker $ with the start symbol of the grammar on top of it, and let next point to the input string (terminated by $).
  while (top(s) != $)
    if (top(s) is a terminal) {
      if (top(s) == *next) { pop(s); next++; }
      else error();
    }
    else if (T[top(s), *next] == X → Y1 ⋯ Yn) {
      pop(s);
      push(s, Yn ⋯ Y1)   /* Y1 on top */
    }
    else error();        /* empty table cell */
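The algorithm can be sketched in C for the running example as follows. This is an illustration under stated assumptions rather than a definitive implementation: symbols are single characters (E, A, T, B, F stand for e, e', t, t', f), the parsing table is hard-coded in the helper rhs_for to match the choices made in the answer to Exercise 10, the end of a C string plays the role of $, and the stack is a fixed-size array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table lookup: the right-hand side to push for non-terminal n
   and look-ahead a, or NULL if the table cell is empty. "" means 𝜀. */
static const char *rhs_for(char n, char a) {
    switch (n) {
    case 'E': return strchr("xy(", a) ? "TA" : NULL;
    case 'A': return a == '+' ? "+TA" : strchr(")$", a) ? "" : NULL;
    case 'T': return strchr("xy(", a) ? "FB" : NULL;
    case 'B': return a == '*' ? "*FB" : strchr("+)$", a) ? "" : NULL;
    case 'F': return a == '(' ? "(E)" : a == 'x' ? "x" : a == 'y' ? "y" : NULL;
    default:  return NULL;
    }
}

/* Return 1 if the whole of input conforms to the grammar, 0 otherwise. */
int parse(const char *input) {
    assert(input != NULL);
    char stack[256];
    size_t top = 0;
    const char *next = input;
    stack[top++] = '$';                       /* bottom-of-stack marker */
    stack[top++] = 'E';                       /* start symbol */
    while (stack[top - 1] != '$') {
        char s = stack[top - 1];
        char a = *next ? *next : '$';         /* end of string acts as $ */
        if (!(s >= 'A' && s <= 'Z')) {        /* terminal on top of the stack */
            if (s != a) return 0;
            top--;
            next++;
        } else {                              /* non-terminal: consult the table */
            const char *rhs = rhs_for(s, a);
            if (rhs == NULL) return 0;
            top--;
            size_t len = strlen(rhs);
            if (top + len > sizeof stack) return 0;   /* nesting too deep */
            for (size_t i = len; i > 0; i--)  /* push reversed: Y1 ends on top */
                stack[top++] = rhs[i - 1];
        }
    }
    return *next == '\0';
}
```

For instance, parse accepts x+x*y and x+y*(y+x) and rejects x+ and (x.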
Exercise 11
Give the steps that a predictive parser takes to parse the following input.
  x + x * y
For each step (loop iteration), show the input stream, the stack, and the parser action.
Acknowledgements
Plus Stanford University lecture notes by Maggie Johnson and Julie Zelenski.
APPENDIX
Chomsky hierarchy
  Grammar             Valid productions
  Unrestricted        𝛼 → 𝛽
  Context-Sensitive   𝛼 x γ → 𝛼 𝛽 γ
  Context-Free        x → 𝛽
  Regular             x → t,  x → t z,  x → 𝜀
Let t range over terminals, x and z over non-terminals, and 𝛼, 𝛽 and γ over sequences of terminals, non-terminals, and 𝜀.
Backus-Naur Form
BNF is a standard ASCII notation for specification of context-free grammars whose terminals are ASCII characters. For example:
  <exp> ::= <exp> "+" <exp>
          | <exp> "-" <exp>
          | <var>
  <var> ::= "x" | "y"
The BNF notation can itself be specified in BNF.