thomasine-stanley
1
Syntax Analysis
Introduction to parsers
Context-free grammars
Push-down automata
Top-down parsing
LL grammars and parsers
Bottom-up parsing
LR grammars and parsers
Bison/Yacc - parser generators
Error Handling: Detection & Recovery
2
Introduction to parsers
[Figure: source → Lexical Analyzer → (token / next token) → Parser → syntax tree → Semantic Analyzer → code; the Symbol Table is shared by the phases, and the Parser is specified by a CFG]
3
Context Free Grammar
CFG & Terminology
Rewrite vs. Reduce
Derivation
Language and CFL
Equivalence & CNF
Parsing vs. Derivation
lm/rm derivation & parse tree
Ambiguity & resolution
Expressive power
Derivation is the reverse of parsing. If we know how sentences are derived, we may find a parsing method in the reversed direction.
4
CFG: An Example
Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: expr, op
Productions:
expr → expr op expr
expr → ‘(’ expr ‘)’
expr → ‘-’ expr
expr → id
op → ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: expr
5
Notational Conventions in CFG
• a, b, c, …, [+-0-9], id: symbols in Σ
• A, B, C, …, S, expr, stmt: symbols in N
• U, V, W, …, X, Y, Z: grammar symbols in (Σ ∪ N)
• α, β, γ, … denote strings in (Σ ∪ N)*
• u, v, w, … denote strings in Σ*
• A → α1 | α2 | … | αk is an abbreviation of A → α1, A → α2, …, A → αk
• Alternatives: α1, α2, …, αk at RHS
7
Context-Free Grammars
A set of terminals: basic symbols from which sentences are formed
A set of nonterminals: syntactic variables denoting sets of strings
A set of productions: rules specifying how the terminals and nonterminals can be combined to form sentences
The start symbol: a distinguished nonterminal denoting the language
8
CFG: Components
Specification for Structures & Constituency
• CFG: formal specification of structure (parse trees)
– G = {Σ, N, P, S}
– Σ: terminal symbols
– N: non-terminal symbols
– P: production rules
– S: start symbol
9
CFG: Components
• Σ: terminal symbols
– the input symbols of the language
• programming language: tokens (reserved words, variables, operators, …)
• natural languages: words or parts of speech
– pre-terminal: parts of speech (when words are regarded as terminals)
• N: non-terminal symbols– groups of terminals and/or other non-terminals
• S: start symbol: the largest constituent of a parse tree
10
CFG: Components
• P: production (re-writing) rules
– form: A → β (A: non-terminal, β: string of terminals and non-terminals)
– meaning: A re-writes to β (“consists of”, “derived into”), or β is reduced to A
– start with “S-productions” (S → β)
11
Derivations
A derivation step is an application of a production as a rewriting rule
E ⇒ - E
A sequence of derivation steps
E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
is called a derivation of “- ( id )” from E
The symbol ⇒* denotes “derives in zero or more steps”; the symbol ⇒+ denotes “derives in one or more steps”
12
CFG: Accepted Languages
• Context-Free Language
– Language accepted by a CFG
• L(G) = { w | S ⇒+ w } (strings of terminals that can be derived from the start symbol)
– Proof of acceptance: by induction
• On the number of derivation steps
• On the length of input string
13
Context-Free Languages
A context-free language L(G) is the language defined by a context-free grammar G
A string of terminals w is in L(G) if and only if S ⇒+ w; w is called a sentence of G
If S ⇒* α, where α may contain nonterminals, then we call α a sentential form of G
E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
G1 is equivalent to G2 if L(G1) = L(G2)
14
CFG: Equivalence
• Chomsky Normal Form (CNF) (Chomsky, 1963):
– ε-free, and
– Every production rule is in either of the following forms:
• A → A1 A2 [two non-terminals: A1, A2], or
• A → a [a terminal: a]
– i.e., two non-terminals or one terminal at the RHS
• Properties:
– Generates a binary parse tree
– Good simplification for some algorithms
• e.g., grammar training with the inside-outside algorithm (Baker 1979)
– Good tool for theoretical proofs
• e.g., time complexity
15
CFG: Equivalence
• Every CFG can be converted into a weakly equivalent grammar in CNF
– equivalence: L(G1) = L(G2)
• strongly equivalent: both grammars assign the same phrase structure to each sentence (except for renaming non-terminals)
• weakly equivalent: the languages are the same, but the phrase structures assigned to a sentence may differ
– e.g., A → B C D == {A → B X, X → C D}
16
CFG: An Example
Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: E, op
Productions:
E → E op E …[R1]
E → ‘(’ E ‘)’ …[R2]
E → ‘-’ E …[R3]
E → id …[R4]
op → ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: E
17
Left- & Right-most Derivations
Each derivation step needs to choose
– a nonterminal to rewrite
– an alternative to apply
A leftmost derivation always chooses the leftmost nonterminal to rewrite
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
A rightmost (canonical) derivation always chooses the rightmost nonterminal to rewrite
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
18
Left- & Right-most Derivations
Representation of leftmost/rightmost derivations:
Use the sequence of productions (or production numbers) to represent a derivation sequence.
Example:
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
=> [3], [2], [1], [4], [4] (~ R3, R2, R1, R4, R4)
Advantage: a compact representation for a parse tree (data compression)
Each parse tree has a unique leftmost/rightmost derivation
[Figure: parse tree with nodes labeled by the applied rules R3, R2, R1]
19
Parse Trees
A parse tree is a graphical representation for a derivation that filters out the order of choosing nonterminals for rewriting
[Figure: partial parse tree with NP and PP nodes over “the girl in the park”]
20
Context Free Grammar (CFG): Specification for Structures & Constituency
• Parse Tree: graphical representation of structure
– Root node (S): a sentential-level structure
– Internal nodes: constituents of the sentence
– Arcs: relationship between parent nodes and their children (constituents)
– Terminal nodes: surface forms of the input symbols (e.g., words)
• Bracketed notation: alternative representation
• e.g., [I saw [the [girl [in [the park]]]]]
21
Parse Tree: “I saw the girl in the park”
[Figure: 1st parse tree, with nodes S, VP, NP, PP over the tag sequence pron v det n p det n]
22
Parse Tree: “I saw the girl in the park”
[Figure: 2nd parse tree, with nodes S, VP, NP, PP over the tag sequence pron v det n p det n]
23
LM & RM: An Example
[Figure: parse tree for - ( id + id ), rooted at E]
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
24
Parse Trees & Derivations
Many derivations may correspond to the same parse tree, but every parse tree has associated with it a unique leftmost and a unique rightmost derivation
25
Ambiguous Grammar
A grammar is ambiguous if it produces more than one parse tree for some sentence, i.e., more than one leftmost (or more than one rightmost) derivation
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
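One way to make the ambiguity concrete is to count parse trees. The sketch below assumes the simplified grammar E → E + E | E * E | id used in the two derivations; the function name `count_parse_trees` is hypothetical and not part of the slides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


if __name__ == "__main__":
    print(count_parse_trees("id + id * id".split()))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous.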
26
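Ambiguous Grammar
[Figure: two parse trees for id + id * id; in the first the root is E + E with id * id grouped on the right, in the second the root is E * E with id + id grouped on the left]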
27
Resolving Ambiguity
Use disambiguating rules to throw away undesirable parse trees
Rewrite grammars by incorporating disambiguating rules into unambiguous grammars
28
An Example
The dangling-else grammar
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Two parse trees for: if E1 then if E2 then S1 else S2
29
An Example
[Figure: the two parse trees for “if E then if E then S else S”; in one the else attaches to the outer if, in the other to the inner if]
Preferred parse: closest then
30
Disambiguating Rules
Rule: match each else with the closest previous unmatched then
Remove undesired state transitions in the pushdown automaton (parser)
shift/reduce conflict on “else”
1st parse: reduce
2nd parse: shift
31
Grammar Rewriting
stmt → m_stmt ; with only paired then-else
     | unm_stmt
m_stmt → if expr then m_stmt else m_stmt
       | other
unm_stmt → if expr then stmt
         | if expr then m_stmt else unm_stmt
So… the statement between then and else cannot have an unmatched then
want this then-else pair matched
32
RE vs. CFG
Every language described by a RE can also be described by a CFG
Example: (a|b)*abb
A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ε
(1) Right branching
(2) Starts with a terminal symbol
33
RE vs. CFG
Regular Grammar:
• Right branching
• Starts with a terminal symbol
[Figure: right-branching derivation tree for (a|b)* abb, expanding A0 repeatedly with a or b, then a A1, b A2, b A3]
34
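RE vs. CFG
RE: (a | b)*abb
[Figure: NFA with start state 0, states 1, 2, and final state 3; state 0 loops on a and b, then a, b, b lead from 0 to 3; states 0..3 correspond to A0..A3]
A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ε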
35
RE vs. CFG
a DFA for (a | b)*abb
[Figure: DFA with start state 0, states 1, 2, and final state 3; states 0..3 correspond to A0..A3]
A0 → b A0 | a A1
A1 → a A1 | b A2
A2 → a A1 | b A3
A3 → a A1 | b A0 | ε
36
CFG: Expressive Power (cont.)
• Writing a CFG for a FSA (RE)
– define a non-terminal Ni for a state with state number i
– start symbol S = N0 (assuming that state 0 is the initial state)
– for each transition δ(i,a)=j (from state i to state j on input symbol a), add a new production Ni → a Nj to P (if a == ε, add Ni → Nj)
– for each final state i, add a new production Ni → ε to P
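The construction above can be sketched in a few lines. This is a minimal illustration, assuming a DFA given as a dictionary of transitions (so the ε-transition case Ni → Nj is not handled); the helper names `fsa_to_grammar` and `derives` are hypothetical. The transition table is read off the DFA grammar for (a | b)*abb shown earlier.

```python
def fsa_to_grammar(transitions, finals):
    """Build right-linear productions from a DFA.

    transitions: dict mapping (state, symbol) -> next state
    finals: set of final states
    Returns a list of (lhs, rhs) pairs; rhs is a tuple, () meaning epsilon.
    """
    productions = []
    for (i, a), j in sorted(transitions.items()):
        productions.append((f"N{i}", (a, f"N{j}")))   # Ni -> a Nj
    for i in sorted(finals):
        productions.append((f"N{i}", ()))             # Ni -> epsilon
    return productions


def derives(productions, start, word):
    """Check whether `word` is derivable, tracking reachable nonterminals."""
    current = {start}
    for ch in word:
        current = {rhs[1] for lhs, rhs in productions
                   if lhs in current and rhs and rhs[0] == ch}
    return any(lhs in current and rhs == () for lhs, rhs in productions)


# DFA for (a | b)*abb, states 0..3, final state 3.
DFA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
       (2, "a"): 1, (2, "b"): 3, (3, "a"): 1, (3, "b"): 0}
GRAMMAR = fsa_to_grammar(DFA, {3})

if __name__ == "__main__":
    for lhs, rhs in GRAMMAR:
        print(lhs, "->", " ".join(rhs) or "ε")
```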
38
CFG: Expressive Power
• CFG vs. Regular Expression (R.E.)
– Every R.E. can be recognized by a FSA
– Every FSA can be represented by a CFG with production rules of the form: A → a B | ε
– (known as a “Regular Grammar”)
• Therefore, L(RE) ⊂ L(CFG)
39
CFG: Expressive Power (cont.)
• Chomsky Hierarchy:
– R.E.: Regular set (recognized by FSAs)
– CFG: Context-free (Pushdown automata)
– CSG: Context-sensitive (Linear bounded automata)
– Unrestricted: Recursively enumerable (Turing Machine)
40
Push-Down Automata
[Figure: a finite automaton connected to an input, an output, and a stack]
41
RE vs. CFG
Why use REs for lexical syntax?
– do not need a notation as powerful as CFGs
– are more concise and easier to understand than CFGs
– More efficient lexical analyzers can be constructed from REs than from CFGs
– Provide a way for modularizing the front end into two manageable-sized components
42
CFG vs. Finite-State Machine
• Inappropriateness of FSA
– Constituents: only terminals
– Recursion: do not allow A => … B … => … A …
• RTN (Recursive Transition Network)– FSA with augmentation of recursion
– arc: terminal or non-terminal
– if arc is non-terminal: call to a sub-transition network & return upon traversal
43
Nonregular Constructs
REs can denote only a fixed number of repetitions or an unspecified number of repetitions of one given construct
E.g. a*b*
A nonregular construct:
– L = {a^n b^n | n ≥ 1}
44
Non-Context-Free Constructs
CFGs can denote only a fixed number of repetitions or an unspecified number of repetitions of one or two (paired) given constructs
E.g. a^n b^n
Some non-context-free constructs:
– L1 = {wcw | w is in (a | b)*}
• declaration/use of identifiers
– L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
• #formal arguments/#actual arguments
– L3 = {a^n b^n c^n | n ≥ 0}
• e.g., b: backspace, c: underscore
45
Context-Free Constructs
FA (RE) cannot keep counts
CFGs can keep count of two items but not three
Similar context-free constructs:
– L’1 = {w c w^R | w is in (a | b)*, R: reverse order}
– L’2 = {a^n b^m c^m d^n | n ≥ 1 and m ≥ 1}
– L’’2 = {a^n b^n c^m d^m | n ≥ 1 and m ≥ 1}
– L’3 = {a^n b^n | n ≥ 1}
46
CFG Parsers
47
Types of CFG Parsers
Universal: can parse any CFG
– CYK, Earley
– CYK: exhaustively matches sub-ranges of input tokens against grammar rules, from smaller ranges to larger ranges
– Earley: exhaustively enumerates possible expectations from left to right, according to the current input token and the grammar
Non-universal: not all CFGs can be parsed (e.g., recursive descent parser)
A universal parser (one that handles all grammars) is NOT always efficient
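The "smaller ranges to larger ranges" idea behind CYK can be sketched as a recognizer. This is a minimal illustration that assumes the grammar is already in Chomsky Normal Form; the function name `cyk` and the sample grammar for a^n b^n (S → A B | A X, X → S B, A → a, B → b) are assumptions made for the example, not taken from the slides.

```python
def cyk(word, unary, binary, start="S"):
    """CYK recognizer for a grammar in Chomsky Normal Form.

    unary:  list of (A, a) for rules A -> a
    binary: list of (A, (B, C)) for rules A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # a CNF grammar as used here is epsilon-free
    # table[i][l] holds the nonterminals that derive word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {A for A, a in unary if a == ch}
    # Fill in ranges of increasing length from shorter sub-ranges.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for A, (B, C) in binary:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]


UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]

if __name__ == "__main__":
    for w in ["ab", "aabb", "aab", "abab"]:
        print(w, cyk(w, UNARY, BINARY))
```

The three nested loops over length, start position, and split point give the cubic running time in the input length, which is why universal parsers are not always the efficient choice.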
48
Types of CFG Parsers
Practical Parsers: [“what is a good parser?”]
Simple: simple program structure
– Left-to-right (or right-to-left) scan
– middle-out or island-driven is often not preferred
– Top-down or bottom-up matching
Efficient: efficient for good/bad inputs
– Parse normal syntax quickly
– Detect errors immediately on next token
Deterministic: no alternative choices during parsing given next token
– Small lookahead buffer (also contributes to efficiency)
49
Types of CFG Parsers
Top Down:
Matching from start symbol down to terminal tokens
Bottom Up:
Matching input tokens with reducible rules from terminal up to start symbol
50
Efficient CFG Parsers
Top Down: LL Parsers
Matching from start symbol down to terminal tokens, left-to-right, according to a leftmost derivation sequence
Bottom Up: LR Parsers
Matching input tokens with reducible rules, left-to-right, from terminal up to start symbol, in a reverse order of rightmost derivation sequence
51
Efficient CFG Parsers
Efficient & Deterministic Parsing – only possible for some subclasses of grammars with special parsing algorithms
Top Down:
Parsing LL Grammars with LL Parsers
Bottom Up:
Parsing LR Grammars with LR Parsers
LR grammar is a larger class of grammars than LL
52
Parsing Table Construction for Efficient Parsers
Parsing Table:
A pre-computed table (according to the grammar), indicating the appropriate action(s) to take in any predefined state when some input token(s) is/are under examination
Lookahead symbol(s): the input symbol(s) under examination for determining next action(s)
Columns (lookahead): id + * num
State-0 action-1 action-3
State-1 action-2 action-5
State-2 action-4
Good parsers do not change their code when the grammar is revised: they are table driven.
53
Parsing Table Construction for Efficient Parsers
Parsing Table Construction:
Decide a pre-defined number of lookaheads to use for predicting next state
Define and enumerate all the unique states for the parsing method
Decide the actions to take in all states with all possible lookahead(s)
54
Parsing Table Construction for Efficient Parsers
X-Parser: you can invent any parser and call it the X-Parser
But its parsing algorithm may not handle all grammars deterministically, and thus efficiently.
X-Grammar:
Any grammar whose parsing table for the X-parsing method/X-Parser has no conflicting actions in any state
Non-X Grammar: has more than one action to take in some state
55
Parsing Table Construction for Efficient Parsers
k: the number of lookahead symbols used by a parser to determine the next action
– A larger number of lookahead symbols tends to make conflicting actions less likely
– But may result in a much larger table that grows exponentially with the number of lookaheads
– Does not guarantee conflict-free parsing for some grammars (inherently ambiguous) even with infinite lookaheads
X(k) Parser:
X Parser that uses k lookahead symbols to determine the next action
X(k) Grammar:
any grammar deterministically parsable with an X(k) Parser
56
Types of Grammars Capable of Efficient Parsing
LL(k) Grammars
Grammars that can be deterministically parsed using an LL(k) parsing algorithm
e.g., LL(1) grammar
LR(k) Grammars
Grammars that can be deterministically parsed using an LR(k) parsing algorithm
e.g., SLR(1) grammar, LR(1) grammar, LALR(1) grammar
57
Top-Down CFG Parsers
Recursive Descent Parser
vs.
Non-Recursive LL(1) Parser
58
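Top-Down Parsing
Construct a parse tree from the root to the leaves using leftmost derivation
S → c A B
A → a b | a
B → d
input: cad
[Figure: successive parse trees: S expanded to c A B; A expanded to a b (which fails to match the input); A expanded to a and B expanded to d, matching cad]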
59
Predictive Parsing
A top-down parsing without backtracking
– there is only one alternative production to choose at each derivation step
stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end
60
LL(k) Parsing
The first L stands for scanning the input from left to right
The second L stands for producing a leftmost derivation
The k stands for the number of input symbols for lookahead used to choose alternative productions at each derivation step
61
LL(1) Parsing
Use one input symbol of lookahead
Same as recursive-descent parsing
But: non-recursive predictive parsing
62
Recursive Descent Parsing (more)
The parser consists of a set of (possibly recursive) procedures
Each procedure is associated with a nonterminal of the grammar
The calling sequence of procedures in processing the input implicitly defines a parse tree for the input
63
An Example
type → simple | id | array [ simple ] of type
simple → integer | char | num dotdot num
64
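An Example
[Figure: parse tree for the input below; type expands to array [ simple ] of type, simple expands to num dotdot num, and the inner type expands to simple, then integer]
array [ num dotdot num ] of integer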
65
An Example
procedure type;
begin
  if lookahead is in { integer, char, num } then
    simple
  else if lookahead = id then
    match(id)
  else if lookahead = array then begin
    match(array); match('['); simple; match(']'); match(of); type
  end
  else error
end;
66
An Example
procedure match(t : token);
begin
  if lookahead = t then
    lookahead := nexttoken
  else error
end;
67
An Example
procedure simple;
begin
  if lookahead = integer then
    match(integer)
  else if lookahead = char then
    match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else error
end;
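The three procedures translate almost line for line into a runnable program. The following is a sketch in Python rather than the slides' Pascal-style pseudocode; the class name `TypeParser`, the `accepts` helper, and the `$` end marker are assumptions added for the example.

```python
class ParseError(Exception):
    """Raised when the lookahead token cannot be matched."""


class TypeParser:
    def __init__(self, tokens):
        # "$" is an assumed end-of-input marker, not part of the grammar.
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead == expected:
            self.pos += 1
        else:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")

    def parse_type(self):
        # type -> simple | id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.parse_simple()
        elif self.lookahead == "id":
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.parse_simple()
            self.match("]")
            self.match("of")
            self.parse_type()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def parse_simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead in ("integer", "char"):
            self.match(self.lookahead)
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")


def accepts(tokens):
    """Return True if the whole token list is derivable from `type`."""
    parser = TypeParser(tokens)
    try:
        parser.parse_type()
        parser.match("$")
    except ParseError:
        return False
    return True


if __name__ == "__main__":
    print(accepts("array [ num dotdot num ] of integer".split()))
```

Each method corresponds to one nonterminal, so the call sequence during a successful parse traces out the parse tree implicitly.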
68
LL(k) Constraint: Left Recursion
A grammar is left recursive if it has a nonterminal A such that A ⇒+ A α
A → A α | β
A → β R
R → α R | ε
[Figure: left-branching tree for A → A α | β next to the right-branching tree for A → β R, R → α R | ε; both derive β α*]
69
Direct/Immediate Left Recursion
A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
is equivalent to …
A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε
(β1 | β2 | ... | βn) (α1 | α2 | ... | αm)*
A → A αi | βj (i=1,m ; j=1,n)
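The transformation for immediate left recursion is mechanical, as the sketch below suggests. It assumes productions are given as tuples of symbols with `()` standing for ε, and that no alternative is the cycle A → A; the function name and the primed naming scheme for the new nonterminal are assumptions for illustration.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    alternatives: list of tuples of symbols; () denotes epsilon.
    Returns a dict mapping each nonterminal to its new alternatives.
    """
    # alpha parts: what follows the leading A in left-recursive alternatives
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # beta parts: alternatives that do not start with A
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: list(alternatives)}  # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in betas],                 # A  -> b A'
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],   # A' -> a A' | eps
    }


if __name__ == "__main__":
    print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```

Applied to E → E + T | T this yields E → T E' and E' → + T E' | ε, the same shape as the rewritten expression grammar in these slides.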
70
An Example
E → E + T | T
T → T * F | F
F → ( E ) | id
becomes
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
71
Indirect Left Recursion
G0:
S → A a | b
A → A c | S d | ε
Problem: Indirect Left-Recursion: S ⇒ A a ⇒ S d a
Solution-Step1: Indirect to Direct Left-Recursion:
A → A c | A a d | b d | ε
Solution-Step2: Direct Left-Recursion to Right-Recursion:
S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε
• Scan rules top-down
• Do not start with symbols defined earlier (=> substitute them if any)
• Resolve direct recursion
72
Indirect Left Recursion
Input. Grammar G with no cycles or ε-productions.
Output. An equivalent grammar with no left recursion.
1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     // Step1: Substitute 1st-symbols of Ai which are previous Aj’s
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ ( j < i )
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
       where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions;
     end
     eliminate direct left recursion among Ai-productions // Step2
   end
73
Left Factoring
Two alternatives of a nonterminal A have a nontrivial common prefix if α ≠ ε, and
A → α β1 | α β2
Left factoring rewrites them as
A → α A'
A' → β1 | β2
74
An Example
S → i E t S | i E t S e S | a
E → b
becomes
S → i E t S S' | a
S' → e S | ε
E → b
76
Top-Down Parsing: as Stack Matching
Construct a parse tree from the root to the leaves using leftmost derivation
S → c A B
A → a b | a
B → d
input: cad
[Figure: successive parse trees: S expanded to c A B; A expanded to a b (which fails to match the input); A expanded to a and B expanded to d, matching cad]
77
Nonrecursive Predictive Parsing – General State
[Figure: the parsing program (parser/driver) reads the input buffer a b c … x y z, consults the parsing table, keeps a stack with X on top, and produces the output]
Predictive: pre-computed parsing actions, M[X,a] = {X -> Y1 Y2 … Yk}
Non-Recursive: “Stack + Driver Program” (instead of recursive procedures)
78
Nonrecursive Predictive Parsing – Expand Non-terminal
[Figure: same parser configuration; X on top of the stack has been replaced by Y1 Y2 … Yk, with Y1 on top, according to M[X,a] = {X -> Y1 Y2 … Yk}]
Predictive: pre-computed parsing actions, M[X,a] = {X -> Y1 Y2 … Yk}
Non-Recursive: “Stack + Driver Program” (instead of recursive procedures)
79
Nonrecursive Predictive Parsing – Match Terminal
[Figure: same parser configuration; the stack holds Y1 Y2 … Yk and the top symbol Y1 = a matches the current input symbol a]
Predictive: pre-computed parsing actions, M[X,a] = {X -> Y1 Y2 … Yk}
Non-Recursive: “Stack + Driver Program” (instead of recursive procedures)
80
Nonrecursive Predictive Parsing - Error Recovery
[Figure: same parser configuration; stack symbols are annotated “=a” and “=c” against the input buffer a b c … x y z]
Predictive: pre-computed parsing actions, M[X,a] = {X -> Y1 Y2 … Yk}
Non-Recursive: “Stack + Driver Program” (instead of recursive procedures)
81
83
Stack Operations
Match
– when the top stack symbol is a terminal and it matches the input symbol, pop the top stack symbol and advance the input pointer
Expand
– when the top stack symbol is a nonterminal, replace this symbol by the right hand side of one of its productions
• Leftmost RHS symbol at Top-of-Stack
84
An Example
type → simple | id | array [ simple ] of type
simple → integer | char | num dotdot num
85
An Example
(E = expand, M = match; the top of the stack is the rightmost symbol)
Action  Stack                        Input
E       type                         array [ num dotdot num ] of integer
M       type of ] simple [ array     array [ num dotdot num ] of integer
M       type of ] simple [           [ num dotdot num ] of integer
E       type of ] simple             num dotdot num ] of integer
M       type of ] num dotdot num     num dotdot num ] of integer
M       type of ] num dotdot         dotdot num ] of integer
M       type of ] num                num ] of integer
M       type of ]                    ] of integer
M       type of                      of integer
E       type                         integer
E       simple                       integer
M       integer                      integer
86
Parsing program
push $S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$; // try to match S$ with w$
repeat
  let X be the top stack symbol and a the symbol pointed to by ip;
  if X is a terminal or $ then
    if X = a then pop X from the stack and advance ip
    else error // or error_recovery()
  else // X is a nonterminal
    if M[X, a] = X → Y1 Y2 ... Yk then
      pop X from the stack and push Yk ... Y2 Y1 onto the stack
    else error // or error_recovery()
until X = $
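The driver loop can be sketched as a short runnable program. The table below is hand-written for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id used in these slides; the names `TABLE`, `ll1_parse`, and `ParseError` are assumptions for the example, and error recovery is reduced to raising an exception.

```python
class ParseError(Exception):
    """Raised on an empty table entry or a terminal mismatch."""


EPS = ()  # right-hand side of an epsilon-production
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}


def ll1_parse(tokens, table=TABLE, start="E"):
    """Return the productions applied, in leftmost-derivation order."""
    stack = ["$", start]            # top of the stack is the end of the list
    tokens = list(tokens) + ["$"]   # w$
    ip = 0
    output = []
    while stack:
        top = stack.pop()
        a = tokens[ip]
        if top in NONTERMINALS:
            rhs = table.get((top, a))
            if rhs is None:
                raise ParseError(f"no entry for M[{top}, {a}]")
            output.append((top, rhs))
            stack.extend(reversed(rhs))   # push Yk ... Y1, leaving Y1 on top
        elif top == a:
            ip += 1                       # match a terminal (or the final $)
        else:
            raise ParseError(f"expected {top!r}, found {a!r}")
    return output


if __name__ == "__main__":
    for lhs, rhs in ll1_parse("id + id * id".split()):
        print(lhs, "->", " ".join(rhs) or "ε")
```

Because the grammar lives entirely in `TABLE`, revising the grammar means regenerating the table while the driver stays unchanged.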
87
Parser Driven by a Parsing Table: Non-recursive Descent
X() { // WITHOUT ε-production: X→ε
  if (LA=‘a’) then
    Y1(); Y2(); …Yk();
  else if (LA=‘b’)
    Z1(); Z2(); …; Zm();
  else ERROR(); // no X→ε
  // else RETURN; if X→ε exists
} // Recursive descent procedure for matching X
Parsing table (columns a b c d):
M[X, a] = X → Y1 Y2 … Yk
M[X, b] = X → Z1 Z2 … Zm
[rows for Y1 and Z1 hold their own productions]
‘a’ in FirstSet( Y1 Y2 … Yk )
‘b’ in FirstSet( Z1 Z2 … Zm )
88
Parser Driven by a Parsing Table: Non-recursive Descent
X() { // WITH ε-production: X→ε
  if (LA=‘a’) then
    Y1(); Y2(); …Yk();
  else if (LA=‘b’)
    Z1(); Z2(); …; Zm();
  // else ERROR(); // no X→ε
  else if (LA=??) RETURN; // if X→ε exists
} // Recursive descent procedure for matching X
Parsing table (columns a b c d):
M[X, a] = X → Y1 Y2 … Yk
M[X, b] = X → Z1 Z2 … Zm
M[X, d] = X → ε
[rows for Y1 and Z1 hold their own productions]
‘a’ in FirstSet( Y1 Y2 … Yk )
‘b’ in FirstSet( Z1 Z2 … Zm )
‘d’ in FollowSet(X) (S =>* … X d …)
89
First Sets: Predictive Parsing
The first set of a string α is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in the first set of α.
– ε is used simply to flag whether α can be null when computing First Sets
– It is not for matching any real input when parsing
FIRST(α) = {a | α ⇒* a β} ∪ {ε, if α ⇒* ε}
FIRST(α) includes {ε}: means that α ⇒* ε
90
Compute First Sets
If X is terminal, then FIRST(X) is {X}
If X is nonterminal and X → ε is a production, then add ε to FIRST(X)
If X is nonterminal and X → Y1 Y2 ... Yk is a production, then add a to FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1).
If ε is in FIRST(Yj) for all j, then add ε to FIRST(X)
91
Follow Sets: Matching Empty
What to do with matching null: A → ε?
– TD Recursive Descent Parsing: “assumes” success
– LL: more predictive => Follow Set of ‘A’
The follow set of a nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form, namely,
S ⇒* α A a β
a is in the follow set of A.
92
Compute Follow Sets
Initialization: Place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.
If there is a production A → α B β, then everything in FIRST(β) except for ε is placed in FOLLOW(B)
– ε is not considered a visible input to follow any symbol
If there is a production A → α B, or A → α B β where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B)
– S ⇒* … A a … implies S ⇒* … α B a …
– YES: “every symbol that can follow A will also follow B”
– NO!: “every symbol that can follow B will also follow A”
93
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
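The FIRST and FOLLOW rules are fixed-point computations: apply them repeatedly until no set grows. The sketch below is one way to do that, assuming a grammar encoded as a dictionary from nonterminal to a list of alternatives (tuples of symbols, `()` for ε); the function names are hypothetical. Run on the grammar above, it reproduces the sets listed in this example.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a symbol string, given FIRST of every single symbol."""
    result = set()
    for X in symbols:
        result |= first[X] - {EPS}
        if EPS not in first[X]:
            return result
    result.add(EPS)  # every symbol was nullable (or the string is empty)
    return result


def compute_first_follow(grammar, start):
    nonterminals = set(grammar)
    terminals = {s for alts in grammar.values() for alt in alts
                 for s in alt} - nonterminals
    first = {t: {t} for t in terminals}
    first.update({A: set() for A in nonterminals})
    changed = True
    while changed:                      # iterate FIRST to a fixed point
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    follow = {A: set() for A in nonterminals}
    follow[start].add("$")              # $ follows the start symbol
    changed = True
    while changed:                      # iterate FOLLOW to a fixed point
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                for i, B in enumerate(alt):
                    if B not in nonterminals:
                        continue
                    rest = first_of_string(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:     # beta is nullable: add FOLLOW(A)
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return first, follow


GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}

if __name__ == "__main__":
    first, follow = compute_first_follow(GRAMMAR, "E")
    for A in GRAMMAR:
        print(A, sorted(first[A]), sorted(follow[A]))
```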
94
Constructing Parsing Table
Input. Grammar G.
Output. Parsing Table M.
Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α) [A ⇒* ε], add A → α to M[A, b] for each terminal b [including ‘$’] in FOLLOW(A).
   - If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
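Steps 1 to 3 can be sketched directly, with table cells holding lists so that multiply-defined entries are visible. The example assumes the FIRST and FOLLOW sets are supplied by hand; the sets below for the left-factored dangling-else grammar (S → i E t S S' | a, S' → e S | ε, E → b) were worked out manually for this illustration, and the function names are hypothetical.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a symbol string; symbols missing from `first` are terminals."""
    result = set()
    for X in symbols:
        fx = first.get(X, {X})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_ll1_table(grammar, first, follow):
    """Return M as a dict: (nonterminal, terminal) -> list of right-hand sides."""
    table = {}
    for A, alts in grammar.items():           # step 1: every production A -> alpha
        for alt in alts:
            f = first_of_string(alt, first)
            targets = f - {EPS}               # step 2: terminals in FIRST(alpha)
            if EPS in f:
                targets |= follow[A]          # step 3: FOLLOW(A), including $
            for a in targets:
                table.setdefault((A, a), []).append(alt)
    return table                              # step 4: missing keys mean error


def conflicts(table):
    """Cells holding more than one production (the grammar is then not LL(1))."""
    return {cell: alts for cell, alts in table.items() if len(alts) > 1}


GRAMMAR = {
    "S": [("i", "E", "t", "S", "S'"), ("a",)],
    "S'": [("e", "S"), ()],
    "E": [("b",)],
}
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}

if __name__ == "__main__":
    table = build_ll1_table(GRAMMAR, FIRST, FOLLOW)
    print(conflicts(table))
```

The single conflicting cell is M[S', e], which holds both S' → e S and S' → ε: the dangling-else ambiguity shows up as a multiply-defined entry.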
95
LL(1) Parsing Table Construction
A() { // WITH/WITHOUT ε-productions: A → α (α ⇒* ε)
  if (LA=‘a’ in First(Y1 Y2… Yk)) then
    Y1(); Y2(); …Yk();
  else if (LA=‘b’ in Follow(A) & ε in First(Z1 Z2 ... Zm))
    Z1(); Z2(); …; Zm(); // Nullable
  else ERROR();
} // Recursive version of LL(1) parser
When to apply A → α? (including A → ε)
Row A of the parsing table:
– a in First(α): A → α
– b in Follow(A): A → α (if α ⇒* ε)
– c not in First(α) or Follow(A): error
96
An Example
      id        +           *           (         )        $
E     E → TE'                           E → TE'
E'              E' → +TE'                         E' → ε   E' → ε
T     T → FT'                           T → FT'
T'              T' → ε      T' → *FT'             T' → ε   T' → ε
F     F → id                            F → (E)
97
An Example
Stack       Input            Output
$E          id + id * id$
$E'T        id + id * id$    E → TE'
$E'T'F      id + id * id$    T → FT'
$E'T'id     id + id * id$    F → id
$E'T'       + id * id$
$E'         + id * id$       T' → ε
$E'T+       + id * id$       E' → +TE'
$E'T        id * id$
$E'T'F      id * id$         T → FT'
$E'T'id     id * id$         F → id
$E'T'       * id$
$E'T'F*     * id$            T' → *FT'
$E'T'F      id$
$E'T'id     id$              F → id
$E'T'       $
$E'         $                T' → ε
$           $                E' → ε
98
LL(1) Grammars
A grammar is an LL(1) grammar if its predictive parsing table has no multiply-defined entries
99
A Counter Example
S → i E t S S' | a
S' → e S | ε
E → b
M[S, a] = S → a
M[S, i] = S → i E t S S'
M[S', e] = S' → ε and S' → e S (multiply-defined entry)
M[S', $] = S' → ε
M[E, b] = E → b
e ∈ FOLLOW(S’)
e ∈ FIRST(e S)
Disambiguation: matching closest “then”
100
LL(1) Grammars or Not ??
A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
– For no terminal a do both α and β derive strings beginning with a.
• or… M[A, first(α)&first(β)] entries will have conflicting actions
– At most one of α and β can derive the empty string
• or… M[A, follow(A)] entries have conflicting actions
– If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
• or… M[A, first(α)&follow(A)] entries have conflicting actions
101
Non-LL(1) Grammar: Ambiguous According to LL(1) Parsing Table Construction
When will A → α and A → β appear in the same table cell ??
– a in First(α) & First(β): both A → α and A → β are entered
– b in Follow(A): both A → α (α ⇒* ε) and A → β (β ⇒* ε) are entered
– a in First(α) & Follow(A): A → α (α does not derive ε, but α ⇒* a …) and A → β (β ⇒* ε) are both entered
e.g., S' → e S | ε
e.g., X → X a | b
102
LL(1) Grammars or Not??
If G is left-recursive or ambiguous, then M will have at least one multiply-defined entry => non-LL(1)
E.g., X → X a | b
=> FIRST(X) = {b} (and, of course, FIRST(b) = {b})
=> M[X,b] includes both {X → X a} and {X → b}
i.e., ambiguous G and G with left-recursive productions cannot be LL(1).
No LL(1) grammar can be ambiguous
103
Error Recovery for LL Parsers
104
Syntactic Errors
• Empty entries in a parsing table:
– A syntactic error is encountered when the lookahead symbol corresponding to this entry is in the input buffer
– Error recovery information can be encoded in such entries to take appropriate actions upon error
• Error Detection:
– (1) Stacktop = x && x != input (a)
– (2) Stacktop = A && M[A, a] = empty (error)
105
Error Recovery Strategies
Panic mode: skip tokens until a token in a set of synchronizing tokens appears
– INS (insertion) type of errors
– sync at delimiters, keywords, …, that have clear functions
Phrase Level Recovery
– local INS (insertion), DEL (deletion), SUB (substitution) types of errors
Error Production
– define error patterns (“error productions”) in grammar
Global Correction [Grammar Correction]
– minimum distance correction
106
Error Recovery – Panic Mode
Panic mode: skip tokens until a token in a set of synchronizing tokens appears
Commonly used synchronizing tokens:
– SUB(A,ip): use FOLLOW(A) as sync set for A (pop A)
– use the FIRST set of a higher construct as sync set for a lower construct
– INS(ip): use FIRST(A) as sync set for A
– *ip = ε: use the production deriving ε as the default
– DEL(ip): if a terminal on stack cannot be matched, pop the terminal
107
Error Recovery – Panic Mode
[Figure: stack and input configurations before and after each action]
Action      Stack     Input
SUB(A,ip)   … A       *ip … Follow(A) …
INS(ip)     … A       *ip … First(A) …
DEL(ip)     … x       *ip …
108
Error Recovery Actions Using Follow & First Sets to Sync
Expanding non-terminal A:
– M[A,a] = error (blank): skip “a” in input = delete all such “a” (until sync with a sync symbol, b) /* panic */
– M[A,b] = sync (at FOLLOW(A)): pop “A” from stack = “b” is a sync symbol following A
– M[A,b] = A → α (== sync at FIRST(A)): expand A as α (same as normal parsing action)
Matching terminal “x”:
– (*sp=“x”) != “a”: pop(x) from stack = missing input token “x”
109
An Example
      id        +           *           (         )        $
E     E → TE'                           E → TE'   sync     sync
E'              E' → +TE'                         E' → ε   E' → ε
T     T → FT'   sync                    T → FT'   sync     sync
T'              T' → ε      T' → *FT'             T' → ε   T' → ε
F     F → id    sync        sync        F → (E)   sync     sync
FOLLOW(F)={+,*,),$}
FOLLOW(E)=FOLLOW(E’)={),$}
FIRST(X) is used to expand non-ε productions or sync (on errors)
FOLLOW(X) is used to expand ε-productions or sync (on errors)
110
An Example
Stack       Input            Output
$E          ) id * + id$     error, skip )
$E          id * + id$       id is in FIRST(E)
$E'T        id * + id$       E → TE'
$E'T'F      id * + id$       T → FT'
$E'T'id     id * + id$       F → id
$E'T'       * + id$
$E'T'F*     * + id$          T' → *FT'
$E'T'F      + id$            error, M[F,+] = synch / FOLLOW(F)
$E'T'       + id$            F popped
$E'         + id$            T' → ε
$E'T+       + id$            E' → +TE'
$E'T        id$
$E'T'F      id$              T → FT'
$E'T'id     id$              F → id
$E'T'       $
$E'         $                T' → ε
$           $                E' → ε
111
Parse Tree - Error Recovered
[Figure: parse tree built after recovery; the leading “)” is skipped and the missing operand after “*” is represented by an unexpanded F]
) id * + id => id * F + id