Upload
everett-bell
View
214
Download
1
Embed Size (px)
Citation preview
CH4.1
CSE244
Sections 4.1-4.4: Syntax AnalysisSections 4.1-4.4: Syntax Analysis
Aggelos KiayiasComputer Science & Engineering Department
The University of Connecticut371 Fairfield Road
Storrs, CT [email protected]
http://www.cse.uconn.edu/~akiayias
CH4.2
CSE244
Syntax Analysis - ParsingSyntax Analysis - Parsing
An overview of parsing : An overview of parsing : Functions & ResponsibilitiesFunctions & Responsibilities
Context Free GrammarsContext Free Grammars Concepts & TerminologyConcepts & Terminology
Writing and Designing GrammarsWriting and Designing Grammars Resolving Grammar Problems / DifficultiesResolving Grammar Problems / Difficulties Top-Down Parsing Top-Down Parsing
Recursive Descent & Predictive LLRecursive Descent & Predictive LL Bottom-Up ParsingBottom-Up Parsing
LR & LALRLR & LALR Concluding Remarks/Looking AheadConcluding Remarks/Looking Ahead
CH4.3
CSE244
An Overview of ParsingAn Overview of Parsing
Why are Grammars to formally describe Languages Important ?
1. Precise, easy-to-understand representations
2. Compiler-writing tools can take grammar and generate a compiler
3. allow language to be evolved (new statements, changes to statements, etc.) Languages are not static, but are constantly upgraded to add new features or fix “old” ones
ADA ADA9x, C++ Adds: Templates, exceptions,
How do grammars relate to parsing process ?
CH4.4
CSE244
Parsing During CompilationParsing During Compilation
interm repres
errors
lexical analyzer
parserrest of
front end
symbol table
source program
parse treeget next
token
token
regular expressions
• also technically part or parsing
• includes augmenting info on tokens in source, type checking, semantic analysis
• uses a grammar to check structure of tokens• produces a parse tree• syntactic errors and recovery• recognize correct syntax• report errors
CH4.5
CSE244
Parsing ResponsibilitiesParsing Responsibilities
Syntax Error Identification / Handling
Recall typical error types:
Lexical : Misspellings
Syntactic : Omission, wrong order of tokens
Semantic : Incompatible types
Logical : Infinite loop / recursive call
Majority of error processing occurs during syntax analysis
NOTE: Not all errors are identifiable !! Which ones?
CH4.6
CSE244
Key Issues – Error ProcessingKey Issues – Error Processing
• Detecting errors
• Finding position at which they occur
• Clear / accurate presentation
• Recover (pass over) to continue and find later
errors
• Don’t impact compilation of “correct”
programs
CH4.7
CSE244
What are some Typical Errors ?What are some Typical Errors ?
#include<stdio.h>
int f1(int v)
{ int i,j=0;
for (i=1;i<5;i++)
{ j=v+f2(i) }
return j; }
int f2(int u)
{ int j;
j=u+f1(u*u)
return j; }
int main()
{ int i,j=0;
for (i=1;i<10;i++)
{ j=j+i*I; printf(“%d\n”,i);
printf("%d\n",f1(j));
return 0;
}
Which are “easy” to recover from? Which are “hard” ?
As reported by MS VC++
'f2' undefined; assuming extern returning intsyntax error : missing ';' before ‘}‘syntax error : missing ';' before ‘return‘fatal error : unexpected end of file found
CH4.8
CSE244
Error Recovery StrategiesError Recovery Strategies
Panic Mode– Discard tokens until a “synchro” token is found ( end, “;”, “}”, etc. ) -- Decision of designer -- Problems: skip input miss declaration – causing more errors miss errors in skipped material -- Advantages: simple suited to 1 error per statementPhrase Level – Local correction on input -- “,” ”;” – Delete “,” – insert “;” -- Also decision of designer -- Not suited to all situations -- Used in conjunction with panic mode to allow less input to be skipped
CH4.9
CSE244
Error Recovery Strategies – (2)Error Recovery Strategies – (2)
Error Productions: -- Augment grammar with rules -- Augment grammar used for parser construction / generation -- example: add a rule for := in C assignment statements Report error but continue compile -- Self correction + diagnostic messages Global Correction: -- Adding / deleting / replacing symbols is chancy – may do many changes ! -- Algorithms available to minimize changes costly - key issues
CH4.10
CSE244
Motivating GrammarsMotivating Grammars
• Regular Expressions
Basis of lexical analysis
Represent regular languages
• Context Free Grammars
Basis of parsing
Represent language constructs
Characterize context free languages
Reg. Lang. CFLs
EXAMPLE: anbn , n 1 : Is it regular ?
CH4.11
CSE244
Context Free Grammars : Context Free Grammars : Concepts & TerminologyConcepts & Terminology
Definition: A Context Free Grammar, CFG, is described by T, NT, S, PR, where:
T: Terminals / tokens of the language
NT: Non-terminals to denote sets of strings generated by the grammar & in the language
S: Start symbol, SNT, which defines all strings of the language
PR: Production rules to indicate how T and NT are combined to generate valid strings of the language.
PR: NT (T | NT)*
Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model
CH4.12
CSE244
Context Free Grammars : A First LookContext Free Grammars : A First Look
assign_stmt id := expr ;
expr expr operator term
expr term
term id
term real
term integer
operator +
operator -
What do “blue” symbols represent?
Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-term into a sequence of terminals / tokens.
Simply stated: Grammars / production rules allow us to “rewrite” and “identify” correct syntax.
CH4.13
CSE244
DerivationDerivation
Let’s derive: id := id + real – integer ;
assign_stmt assign_stmt id := expr ;
id := expr ; expr expr operator term
id := expr operator term; expr expr operator term
id := expr operator term operator term; expr term
id := term operator term operator term; term id
id := id operator term operator term; operator +
id := id + term operator term; term real
id := id + real operator term; operator -
id := id + real - term; term integer
id := id + real - integer;
using production:
CH4.14
CSE244
Example GrammarExample Grammar
expr expr op expr
expr ( expr )
expr - expr
expr id
op +
op -
op *
op /
op
Black : NT
Blue : T
expr : S
9 Production rules
To simplify / standardize notation, we offer a synopsis of terminology.
CH4.15
CSE244
Example Grammar - TerminologyExample Grammar - Terminology
Terminals: a,b,c,+,-,punc,0,1,…,9, blue strings
Non Terminals: A,B,C,S, black strings
T or NT: X,Y,Z
Strings of Terminals: u,v,…,z in T*
Strings of T / NT: , in ( T NT)*
Alternatives of production rules:
A 1; A 2; …; A k; A 1 | 2 | … | 1
First NT on LHS of 1st production rule is designated as start symbol !
E E A E | ( E ) | -E | id
A + | - | * | / |
CH4.16
CSE244
Grammar ConceptsGrammar Concepts
A step in a derivation is zero or one action that replaces a NT with the RHS of a production rule.
EXAMPLE: E -E (the means “derives” in one step) using the production rule: E -E
EXAMPLE: E E A E E * E E * ( E )
DEFINITION: derives in one step
derives in one step
derives in zero steps
+
*
EXAMPLES: A if A is a production rule
1 2 … n 1 n ; for all
If and then
* *
**
CH4.17
CSE244
How does this relate to Languages?How does this relate to Languages?
Let G be a CFG with start symbol S. Then S W (where W has no non-terminals) represents the language generated by G, denoted L(G). So WL(G) S W.
+
+
W : is a sentence of G
When S (and may have NTs) it is called a sentential form of G.
*
EXAMPLE: id * id is a sentence
Here’s the derivation:
E E A E E * E id * E id * id
E id * idSentential forms
CH4.18
CSE244
lm
Other Derivation ConceptsOther Derivation Concepts
Leftmost: Replace the leftmost non-terminal symbol
E E A E id A E id * E id * id
Rightmost: Replace the rightmost non-terminal symbol
E E A E E A id E * id id * idrm
lmlmlmlm
rmrmrm
Important Notes: A
If A , what’s true about ?
If A , what’s true about ?rm
Derivations: Actions to parse input can be represented pictorially in a parse tree.
CH4.19
CSE244
Examples of LM / RM DerivationsExamples of LM / RM Derivations
E E A E | ( E ) | -E | id
A + | - | * | / |
A leftmost derivation of : id + id * id
A rightmost derivation of : id + id * id
CH4.20
CSE244
Derivations & Parse TreeDerivations & Parse Tree
E
EE A
E
EE A
*
E
EE A
id *
E
EE A
id id*
E * E
E E A E
id * E
id * id
CH4.21
CSE244
Parse Trees and DerivationsParse Trees and Derivations
Consider the expression grammar:
E E+E | E*E | (E) | -E | id
Leftmost derivations of id + id * id
E E + E E + E id + E
E
EE +
id
E
EE *
id + E id + E * E
E
EE +
id
E
EE +
CH4.22
CSE244
Parse Tree & Derivations - continuedParse Tree & Derivations - continued
E
EE *
id + E * E id + id * E
E
EE +
id
id
id + id * E id + id * id E
EE *
E
EE +
id
idid
CH4.23
CSE244
Alternative Parse Tree & DerivationAlternative Parse Tree & Derivation
E E * E
E + E * E
id + E * E
id + id * E
id + id * id
E
E E+
E
E E*
id
id id
WHAT’S THE ISSUE HERE ?
Two distinct leftmost derivations!
CH4.24
CSE244
Resolving Grammar Problems/DifficultiesResolving Grammar Problems/Difficulties
Regular Expressions : Basis of Lexical Analysis
Reg. Expr. generate/represent regular languages
Reg. Languages smallest, most well defined class
of languages
Context Free Grammars: Basis of Parsing
CFGs represent context free languages
CFLs contain more powerful languages
Reg. Lang. CFLsanbn – CFL that’s not regular.
CH4.25
CSE244
Resolving Problems/Difficulties – (2)Resolving Problems/Difficulties – (2)
Since Reg. Lang. Context Free Lang., it is possible to go from reg. expr. to CFGs via NFA.
start0 3b21 ba
a
b
Recall: (a | b)*abb
CH4.26
CSE244
Resolving Problems/Difficulties – (3)Resolving Problems/Difficulties – (3)
Construct CFG as follows:
1. Each State I has non-terminal Ai : A0, A1, A2, A3
2. If then Ai a Aj
3. If then Ai bAj
4. If I is an accepting state, Ai : A3
5. If I is a starting state, Ai is the start symbol : A0
i ja
i jb
: A0 aA0, A0 aA1
: A0 bA0, A1 bA2
: A2 bA3
T={a,b}, NT={A0, A1, A2, A3}, S = A0
PR ={ A0 aA0 | aA1 | bA0 ; A1 bA2 ; A2 bA3 ; A3 }
start 0 3b21 ba
a
b
CH4.27
CSE244
How Does This CFG Derive Strings ?How Does This CFG Derive Strings ?
start0 3b21 ba
a
b
vs. A0 aA0, A0 aA1
A0 bA0, A1 bA2
A2 bA3, A3
How is abaabb derived in each ?
CH4.28
CSE244
Regular Expressions vs. CFGsRegular Expressions vs. CFGs
Regular expressions for lexical syntax
1. CFGs are overkill, lexical rules are quite simple and straightforward
2. REs – concise / easy to understand
3. More efficient lexical analyzer can be constructed
4. RE for lexical analysis and CFGs for parsing promotes modularity, low coupling & high cohesion.
CFGs : Match tokens “(“ “)”, begin / end, if-then-else, whiles, proc/func calls, …
Intended for structural associations between tokens !
Are tokens in correct order ?
CH4.29
CSE244
Resolving Grammar Difficulties : Resolving Grammar Difficulties : MotivationMotivation
The structure of a grammar affects the compiler designrecall “syntax-directed” translation
• Different parsing approaches have different needs
Top-Down vs. Bottom-Up
redesigning a grammar may assist in producing better parsing methods.
- ambiguity- -moves- cycles- left recursion- left factoring
Grammar Problems
CH4.30
CSE244
Resolving Problems: Ambiguous Resolving Problems: Ambiguous GrammarsGrammars
Consider the following grammar segment:stmt if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
What’s problem here ?
Let’s consider a simple parse tree:
stmt
stmt
stmtexpr
exprE1
E2S3
S1
S2
then
then
else
else
if
ifstmt stmt
Else must match to previous then. Structure indicates parse subtree for expression.
CH4.31
CSE244
Example : What Happens with this string?Example : What Happens with this string?
If E1 then if E2 then S1 else S2
How is this parsed ?
if E1 then
if E2 then
S1
else
S2
if E1 then
if E2 then
S1
else
S2
vs.
What’s the issue here ?
CH4.32
CSE244
Parse Trees for ExampleParse Trees for Example
Form 1:
stmt
stmt
stmtexpr
E1 S2
then elseif
expr
E2S1
thenifstmt
stmt
expr
E1
thenif stmt
expr
E2S2S1
then elseif
stmt stmt
Form 2:
What’s the issue here ?
CH4.33
CSE244
Removing AmbiguityRemoving Ambiguity
Take Original Grammar:
stmt if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
Or to write more simply:
S i E t S | i E t S e S | sE a
The problem string: i a t i a t s e s
CH4.34
CSE244
Revise to remove ambiguity:
stmt matched_stmt | unmatched_stmt
matched_stmt if expr then matched_stmt else matched_stmt | other
unmatched_stmt if expr then stmt
| if expr then matched_stmt else unmatched_stmt
S M | UM i E t M e M | sU i E t S | i E t M e U E a
S i E t S | i E t S e S | sE a
i a t i a t s e sTry the above on
CH4.35
CSE244
Resolving Difficulties : Left RecursionResolving Difficulties : Left Recursion
A left recursive grammar has rules that support the derivation : A A, for some .+
Top-Down parsing can’t reconcile this type of grammar, since it could consistently make choice which wouldn’t allow termination.
A A A A … etc. A A |
Take left recursive grammar:
A A | To the following:
A A’
A’ A’ |
CH4.36
CSE244
Why is Left Recursion a Problem ?Why is Left Recursion a Problem ?
Consider: E E + T | T T T * F | F F ( E ) | id
Derive : id + id + id
E E + T
How can left recursion be removed ?
E E + T | T What does this generate?
E E + T T + T
E E + T E + T + T T + T + T
How does this build strings ?
What does each string have to start with ?
CH4.37
CSE244
Resolving Difficulties : Left Recursion (2)Resolving Difficulties : Left Recursion (2)
Informal Discussion:
Take all productions for A and order as:
A A1 | A2 | … | Am | 1 | 2 | … | n
Where no i begins with A.
Now apply concepts of previous slide:
A 1A’ | 2A’ | … | nA’
A’ 1A’ | 2A’ | … | m A’ | For our example: E E + T | T
T T * F | F
F ( E ) | id
E TE’ E’ + TE’ | T FT’
T’ * FT’ | F ( E ) | id
CH4.38
CSE244
Resolving Difficulties : Left Recursion (3)Resolving Difficulties : Left Recursion (3)
Problem: If left recursion is two-or-more levels deep, this isn’t enough
S Aa | b
A Ac | Sd | S Aa Sda
Algorithm:Input: Grammar G with ordered Non-Terminals A1, ..., An
Output: An equivalent grammar with no left recursion
1. Arrange the non-terminals in some order A1=start NT,A2,…An
2. for i := 1 to n do begin
for j := 1 to i – 1 do begin
replace each production of the form Ai Aj
by the productions Ai 1 | 2 | … | k
where Aj 1|2|…|k are all current Aj productions; end
eliminate the immediate left recursion among Ai productions end
CH4.39
CSE244
Using the AlgorithmUsing the Algorithm
Apply the algorithm to: A1 A2a | b|
A2 A2c | A1d
i = 1
For A1 there is no left recursion
i = 2
for j=1 to 1 do
Take productions: A2 A1 and replace with
A2 1 | 2 | … | k |
where A1 1 | 2 | … | k are A1 productions
in our case A2 A1d becomes A2 A2ad | bd | dWhat’s left: A1 A2a | b |
A2 A2 c | A2 ad | bd | dAre we done ?
CH4.40
CSE244
Using the Algorithm (2)Using the Algorithm (2)
No ! We must still remove A2 left recursion !
A1 A2a | b |
A2 A2 c | A2 ad | bd | d
Recall:
A A1 | A2 | … | Am | 1 | 2 | … | n
A 1A’ | 2A’ | … | nA’
A’ 1A’ | 2A’ | … | m A’ |
Apply to above case. What do you get ?
CH4.41
CSE244
Removing Difficulties : Removing Difficulties : -Moves-Moves
Transformation: In order to remove A find all rules of the form B uAv and add the rule B uv to the grammar G.
Why does this work ?
E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
Examples:
A1 A2 a | b
A2 bd A2’ | A2’
A2’ c A2’ | bd A2’ |
CH4.42
CSE244
Removing Difficulties : CyclesRemoving Difficulties : Cycles
How would cycles be removed ?
Make sure every production is addingsome terminal(s) (except a single -production in the start NT)…
e.g.
S SS | ( S ) |
Has a cycle: S SS S
S Transform to:
S S ( S ) | ( S ) |
Transformation: substitute A uBv by the productions A u1v| ... | unv assuming that all the productions for B are B 1| ... | n
CH4.43
CSE244
Removing Difficulties : Left FactoringRemoving Difficulties : Left Factoring
Problem : Uncertain which of 2 rules to choose:
stmt if expr then stmt else stmt
| if expr then stmt
When do you know which one is valid ?
What’s the general form of stmt ?
A 1 | 2 : if expr then stmt
1: else stmt 2 :
Transform to:
A A’
A’ 1 | 2
EXAMPLE:
stmt if expr then stmt rest
rest else stmt |
CH4.44
CSE244
Resolving Grammar ProblemsResolving Grammar Problems
Note: Not all aspects of a programming language can be represented by context free grammars / languages.
Examples:
1. Declaring ID before its use
2. Valid typing within expressions
3. Parameters in definition vs. in call
These features are call context-sensitive and define yet another language class, CSL.
Reg. Lang. CFLs CSLs
CH4.45
CSE244
Context-Sensitive Languages - ExamplesContext-Sensitive Languages - Examples
Examples:
L1 = { wcw | w is in (a | b)* } : Declare before use
L2 = { an bm cn dm | n 1, m 1 }
an bm : formal parameter
cn dm : actual parameter
What does it mean when L is context-sensitive ?
CH4.46
CSE244
How do you show a Language is a CFL?How do you show a Language is a CFL?
L3 = { w c wR | w is in (a | b)* }
L4 = { an bm cm dn | n 1, m 1 }
L5 = { an bn cm dm | n 1, m 1 }
L6 = { an bn | n 1 }
CH4.47
CSE244
SolutionsSolutions
L3 = { w c wR | w is in (a | b)* }
L4 = { an bm cm dn | n 1, m 1 }
L5 = { an bn cm dm | n 1, m 1 }
L6 = { an bn | n 1 }
S a S a | b S b | c
S a S d | a A d
A b A c | bc
S XY
X a X b | ab
Y c Y d | cd
S a S b | ab
CH4.48
CSE244
Example of CS GrammarExample of CS Grammar
L2 = { an bm cn dm | n 1, m 1 }
S a A c HA a A c | B B b B D | b D D c c DD D H D H dc D H c d
CH4.49
CSE244
Top-Down ParsingTop-Down Parsing
• Identify a leftmost derivation for an input string
• Why ?
• By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed of developing a parse tree in a left-to-right fashion that is consistent with scanning the input.
• A aBc adDc adec (scan a, scan d, scan e, scan c - accept!)
• Recursive-descent parsing concepts
•Predictive parsing
• Recursive / Brute force technique
• non-recursive / table driven
• Error recovery
• Implementation
CH4.50
CSE244
Top-Down ParsingTop-Down Parsing
From Grammar to Parser, take IFrom Grammar to Parser, take I
CH4.51
CSE244
Recursive Descent ParsingRecursive Descent Parsing
• General category of Parsing Top-Down
• Choose production rule based on input symbol
• May require backtracking to correct a wrong choice.
• Example: S c A d A ab | a
input: cad
cad S
c dA
cadS
c dA
a b
cadS
c dA
a b Problem: backtrack
cadS
c dA
a
cadS
c dA
a
CH4.52
CSE244
Top-Down ParsingTop-Down Parsing
From Grammar to Parser, take IIFrom Grammar to Parser, take II
CH4.53
CSE244
Predictive ParsingPredictive Parsing
•Backtracking is bad!
•To eliminate backtracking, what must we do/be sure of for grammar?• no left recursion• apply left factoring
• (frequently) when grammar satisfies above conditions:current input symbol in conjunction with current non-terminal uniquely determines the production that needs to be applied.
• Utilize transition diagrams:
For each non-terminal of the grammar do following:
1. Create an initial and final state
2. If A X1X2…Xn is a production, add path with edges X1, X2, … , Xn
• Once transition diagrams have been developed, apply a straightforward technique to algorithmicize transition diagrams with procedure and possible recursion.
CH4.54
CSE244
Transition DiagramsTransition Diagrams
• Unlike lexical equivalents, each edge represents a token•Transition implies: if token, match input else call proc• Recall earlier grammar and its associated transition diagrams
E TE’ E’ + TE’ |
T FT’ T’ * FT’ |
F ( E ) | id
0 21T E’
E:
3 6+
4T
E’: 5E’
7 98F T’
T:
10 13*
11F
T’: 12T’
14 17(
15E
F: 16)
id
How are transition diagrams used ?
Are -moves a problem ?
Can we simplify transition diagrams ?
Why is simplification critical ?
CH4.55
CSE244
How are Transition Diagrams Used ?How are Transition Diagrams Used ?
main(){ TD_E();}
TD_T(){ TD_F(); TD_T’();}
TD_E(){ TD_T(); TD_E’();}
TD_E’(){ token = get_token(); if token = ‘+’ then { TD_T(); TD_E’(); }}
TD_F(){ token = get_token(); if token = ‘(’ then { TD_E(); match(‘)’); } else if token.value <> id then {error + EXIT} else ...}
TD_E’(){ token = get_token(); if token = ‘*’ then { TD_F(); TD_T’(); }}
What happened to -moves?
… “else unget()and terminate”
NOTE: not all error conditions have been represented.
CH4.56
CSE244
How can Transition Diagrams be How can Transition Diagrams be Simplified ?Simplified ?
6E’
E’: 53+
4T
CH4.57
CSE244
How can Transition Diagrams be How can Transition Diagrams be Simplified ? (2)Simplified ? (2)
6E’
E’: 53+
4T
E’: 53+
4T
6
CH4.58
CSE244
How can Transition Diagrams be How can Transition Diagrams be Simplified ? (3)Simplified ? (3)
6E’
E’: 53+
4T
E’: 53+
4T
6
E’: 3+
4
T
6
CH4.59
CSE244
How can Transition Diagrams be How can Transition Diagrams be Simplified ? (4)Simplified ? (4)
6E’
E’: 53+
4T
E’: 53+
4T
6
E’: 3+
4
T
6
21E’T
0E:
CH4.60
CSE244
How can Transition Diagrams be How can Transition Diagrams be Simplified ? (5)Simplified ? (5)
6E’
E’: 53+
4T
E’: 53+
4T
6
E’: 3+
4
T
6
21E’T
0E:
T0E: 3
+4
T
6
CH4.61
CSE244
Additional Transition Diagram Additional Transition Diagram SimplificationsSimplifications
• Similar steps for T and T’
• Simplified Transition diagrams:*
F7T: 10
13
T’: 10*
11
F
13
14 17(
15E
F: 16)
id
Why is simplification important ?
How does code change?
CH4.62
CSE244
Top-Down ParsingTop-Down Parsing
From Grammar to Parser, take IIIFrom Grammar to Parser, take III
CH4.63
CSE244
Motivating Table-Driven ParsingMotivating Table-Driven Parsing
1. Left to right scan input
2. Find leftmost derivation
Grammar: E TE’
E’ +TE’ | T id
Input : id + id $
Derivation: E
Processing Stack:
Terminator
CH4.64
CSE244
Non-Recursive / Table DrivenNon-Recursive / Table Driven
Empty stack symbol
a + b $
Y
X
$
Z
Input
Predictive Parsing Program
Stack Output
Parsing Table M[A,a]
(String + terminator)
NT + T symbols of CFG What actions parser
should take based on stack / input
General parser behavior: X : top of stack a : current input
1. When X=a = $ halt, accept, success
2. When X=a $ , POP X off stack, advance input, go to 1.
3. When X is a non-terminal, examine M[X,a]
if it is an error call recovery routineif M[X,a] = {X UVW}, POP X, PUSH W,V,UDO NOT expend any input
CH4.65
CSE244
Algorithm for Non-Recursive ParsingAlgorithm for Non-Recursive Parsing
Set ip to point to the first symbol of w$;
repeat
let X be the top stack symbol and a the symbol pointed to by ip;
if X is terminal or $ then
if X=a then
pop X from the stack and advance ip
else error()
else /* X is a non-terminal */
if M[X,a] = XY1Y2…Yk then begin
pop X from stack;
push Yk, Yk-1, … , Y1 onto stack, with Y1 on top
output the production XY1Y2…Yk
end
else error()
until X=$ /* stack is empty */
Input pointer
May also execute other code based on the production used
CH4.66
CSE244
ExampleExample
E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
Our well-worn example !
Table M
Non-terminal
INPUT SYMBOL
id + * ( ) $
E
E’
T
T’
F
ETE’
TFT’
Fid
E’+TE’
T’ T’*FT’
F(E)
TFT’
ETE’
T’
E’ E’
T’
CH4.67
CSE244
Trace of ExampleTrace of Example
STACK INPUT OUTPUT
CH4.68
CSE244
Trace of ExampleTrace of Example
Expend Input
$E
$E’T$E’T’F$E’T’id$E’T’$E’$E’T+$E’T$E’T’F$E’T’id$E’T’$E’T’F*$E’T’F$E’T’id$E’T’$E’$
id + id * id$
id + id * id$id + id * id$id + id * id$
+ id * id$+ id * id$+ id * id$
id * id$id * id$id * id$
* id$* id$
id$id$
$$$
E TE’T FT’F id
T’ E’ +TE’
T FT’F id
T’ *FT’
F id
T’ E’
STACK INPUT OUTPUT
CH4.69
CSE244
Leftmost Derivation for the ExampleLeftmost Derivation for the Example
The leftmost derivation for the example is as follows:
E TE’ FT’E’ id T’E’ id E’ id + TE’ id + FT’E’
id + id T’E’ id + id * FT’E’ id + id * id T’E’
id + id * id E’ id + id * id
CH4.70
CSE244
What’s the Missing Puzzle Piece ?What’s the Missing Puzzle Piece ?
Constructing the Parsing Table M !
1st : Calculate First & Follow for Grammar
2nd: Apply Construction Algorithm for Parsing Table ( We’ll see this shortly )
Basic Tools:
First: Let be a string of grammar symbols. First() is the set that includes every terminal that appears leftmost in or in any string originating from . NOTE: If , then is First( ).
Follow: Let A be a non-terminal. Follow(A) is the set of terminals a that can appear directly to the right of A in some sentential form. (S Aa, for some and ). NOTE: If S A, then $ is Follow(A).
*
*
*
CH4.71
CSE244
Motivation Behind First & FollowMotivation Behind First & Follow
First:
Follow:
Is used to help find the appropriate reduction to follow given the top-of-the-stack non-terminal and the current input symbol.
Example: If A , and a is in First(), then when a=input, replace A with (in the stack).
( a is one of first symbols of , so when A is on the stack and a is input, POP A and PUSH .
Is used when First has a conflict, to resolve choices, or when First gives no suggestion. When or , then what follows A dictates the next choice to be made.
Example: If A , and b is in Follow(A ), then when and b is an input character, then we expand A with , which will eventually expand to , of which b follows!
( : i.e., First( ) contains .)
*
*
*
CH4.72
CSE244
An example.An example.
$S abbd$
STACK INPUT OUTPUT
S a B C d B CB | | S aC b
CH4.73
CSE244
Computing First(X) : Computing First(X) : All Grammar SymbolsAll Grammar Symbols
1. If X is a terminal, First(X) = {X}
2. If X is a production rule, add to First(X)
3. If X is a non-terminal, and X Y1Y2…Yk is a production rule
Place First(Y1) in First(X)
if Y1 , Place First(Y2) in First(X)
if Y2 , Place First(Y3) in First(X)
…
if Yk-1 , Place First(Yk) in First(X)
NOTE: As soon as Yi , Stop.
Repeat above steps until no more elements are added to any First( ) set.
Checking “Yj ?” essentially amounts to checking whether belongs to First(Yj)
*
*
*
*
*
CH4.74
CSE244
Computing First(X) : Computing First(X) : All Grammar Symbols - continuedAll Grammar Symbols - continued
Informally, suppose we want to compute
First(X1 X2 … Xn ) = First (X1) “+”
First(X2) if is in First(X1) “+”
First(X3) if is in First(X2) “+”
…
First(Xn) if is in First(Xn-1)
Note 1: Only add to First(X1 X2 … Xn) if is in First(Xi) for all i
Note 2: For First(X1), if X1 Z1 Z2 … Zm , then we need to compute First(Z1 Z2 … Zm) !
CH4.75
CSE244
Example 1Example 1
Given the production rules:
S i E t SS’ | a
S’ eS |
E b
CH4.76
CSE244
Example 1Example 1
Given the production rules:
S i E t SS’ | a
S’ eS |
E b
Verify that
First(S) = { i, a }
First(S’) = { e, }
First(E) = { b }
CH4.77
CSE244
Example 2Example 2
Computing First for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
CH4.78
CSE244
Example 2Example 2
Computing First for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
First(E)
First(TE’)
First(T)
First(T) “+” First(E’)
First(F)
First((E)) “+” First(id)
First(F) “+” First(T’)
“(“ and “id”
Not First(E’) since T
Not First(T’) since F
*
*
Overall: First(E) = { ( , id } = First(F)
First(E’) = { + , } First(T’) = { * , }
First(T) First(F) = { ( , id }
CH4.79
CSE244
Computing Follow(A) : Computing Follow(A) : All Non-TerminalsAll Non-Terminals
1. Place $ in Follow(S), where S is the start symbol and $ signals end of input
2. If there is a production A B, then everything in First() is in Follow(B) except for .
3. If A B is a production, or A B and (First() contains ), then everything in Follow(A) is in Follow(B)
(Whatever followed A must follow B, since nothing follows B from the production rule)
*
We’ll calculate Follow for two grammars.
CH4.80
CSE244
The Algorithm for Follow – pseudocodeThe Algorithm for Follow – pseudocode
1. Initialize Follow(X) for all non-terminals X1. Initialize Follow(X) for all non-terminals Xto empty set. Place $ in Follow(S), where S is the start to empty set. Place $ in Follow(S), where S is the start
NT.NT.2. Repeat the following step until no modifications are 2. Repeat the following step until no modifications are
made to any Follow-setmade to any Follow-setFor any production For any production X X1 X2 … Xm
For j=1 to m,For j=1 to m,
if if Xj is a non-terminal then:is a non-terminal then:
Follow(Follow(Xj)=Follow(Follow(Xj)(First(Xj+1,…,Xm)-{});
If If First(Xj+1,…,Xm) contains or Xj+1,…,Xm=
then Follow(then Follow(Xj)=Follow(Follow(Xj) Follow(Follow(X);
CH4.81
CSE244
Computing Follow : 1Computing Follow : 1stst Example Example
S i E t SS’ | a
S’ eS |
E b
First(S) = { i, a }
First(S’) = { e, }
First(E) = { b }
Recall:
CH4.82
CSE244
Computing Follow : 1Computing Follow : 1stst Example Example
S i E t SS’ | a
S’ eS |
E b
First(S) = { i, a }
First(S’) = { e, }
First(E) = { b }
Recall:
Follow(S) – Contains $, since S is start symbol
Since S i E t SS’ , put in First(S’) – not
Since S’ , Put in Follow(S)
Since S’ eS, put in Follow(S’) So…. Follow(S) = { e, $ }
Follow(S’) = Follow(S) HOW?
Follow(E) = { t }
*
CH4.83
CSE244
Example 2Example 2
Compute Follow for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
CH4.84
CSE244
Example 2Example 2
Compute Follow for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
FollowE $ )E’ $ )T + $ )T’ + $ )F + * $ )
FirstE ( idE’ +
T ( idT’ *
F ( id
CH4.85
CSE244
Constructing Parsing TableConstructing Parsing Table
Algorithm:
Table has one row per non-terminal / one column per terminal (incl. $ )
1. Repeat Steps 2 & 3 for each rule A
2. Terminal a in First()? Add A to M[A, a ]
3.1 in First()? Add A to M[A, b ] for all terminals b in Follow(A).
3.2 in First() and $ in Follow(A)? Add A to M[A, $ ]
4. All undefined entries are errors.
CH4.86
CSE244
Constructing Parsing Table – Example 1Constructing Parsing Table – Example 1
S i E t SS’ | a
S’ eS |
E b
First(S) = { i, a }
First(S’) = { e, }
First(E) = { b }
Follow(S) = { e, $ }
Follow(S’) = { e, $ }
Follow(E) = { t }
CH4.87
CSE244
Constructing Parsing Table – Example 1Constructing Parsing Table – Example 1
S i E t SS’ | a
S’ eS |
E b
First(S) = { i, a }
First(S’) = { e, }
First(E) = { b }
Follow(S) = { e, $ }
Follow(S’) = { e, $ }
Follow(E) = { t }
S i E t SS’ S a E b
First(i E t SS’)={i} First(a) = {a} First(b) = {b}
S’ eS S First(eS) = {e} First() = {} Follow(S’) = { e, $ }
INPUT SYMBOL
a $tie
S
S’
E
bNon-terminal
S a S iEtSS’
S
E b
S’ S’ eS
CH4.88
CSE244
Constructing Parsing Table – Example 2Constructing Parsing Table – Example 2
E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
First(E,F,T) = { (, id }
First(E’) = { +, }First(T’) = { *, }
Follow(E,E’) = { ), $}
Follow(F) = { *, +, ), $ }Follow(T,T’) = { +, ) , $}
CH4.89
CSE244
Constructing Parsing Table – Example 2Constructing Parsing Table – Example 2
E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
First(E,F,T) = { (, id }
First(E’) = { +, }First(T’) = { *, }
Follow(E,E’) = { ), $}
Follow(F) = { *, +, ), $ }Follow(T,T’) = { +, ) , $}
Expression Example: E TE’ : First(TE’) = First(T) = { (, id }
M[E, ( ] : E TE’
M[E, id ] : E TE’
(by rule 2) E’ +TE’ : First(+TE’) = + : M[E’, +] : E’ +TE’
(by rule 3) E’ : in First( ) T’ : in First( )
by rule 2
M[E’, )] : E’ (3.1) M[T’, +] : T’ (3.1)
M[E’, $] : E’ (3.2) M[T’, )] : T’ (3.1)
(Due to Follow(E’) M[T’, $] : T’ (3.2)
CH4.90
CSE244
LL(1) GrammarsLL(1) Grammars
L : Scan input from Left to Right
L : Construct a Leftmost Derivation
1 : Use “1” input symbol as lookahead in conjunction with stack to decide on the parsing action
LL(1) grammars == they have no multiply-defined entries in the parsing table.
Properties of LL(1) grammars:
• Grammar can’t be ambiguous or left recursive• Grammar is LL(1) when A 1. & do not derive strings starting with the same terminal a 2. Either or can derive , but not both.
Note: It may not be possible for a grammar to be manipulated into an LL(1) grammar
CH4.91
CSE244
Error RecoveryError Recovery
a + b $
Y
X
$
Z
Input
Predictive Parsing Program
Stack Output
Parsing Table M[A,a]
When Do Errors Occur? Recall Predictive Parser Function:
1. If X is a terminal and it doesn’t match input.
2. If M[ X, Input ] is empty – No allowable actions
Consider two recovery techniques:
A. Panic Mode
B. Phrase-level Recovery
CH4.92
CSE244
Panic-Mode RecoveryPanic-Mode Recovery
Assume a non-terminal on the top of the stack.Assume a non-terminal on the top of the stack. Idea: skip symbols on the input until a token in a selected Idea: skip symbols on the input until a token in a selected
set of set of synchronizing synchronizing tokens is found.tokens is found. The choice for a synchronizing set is important.The choice for a synchronizing set is important.
some ideas: define the synchronizing set of A to be FOLLOW(A).
then skip input until a token in FOLLOW(A) appears and then pop A from the stack. Resume parsing...
add symbols of FIRST(A) into synchronizing set. In this case we skip input and once we find a token in FIRST(A) we resume parsing from A.
Productions that lead to if available might be used. If a terminal appears on top of the stack and does not match If a terminal appears on top of the stack and does not match
to the input == pop it and and continue parsing (issuing an to the input == pop it and and continue parsing (issuing an error message saying that the terminal was inserted).error message saying that the terminal was inserted).
CH4.93
CSE244
Panic Mode Recovery, IIPanic Mode Recovery, II
General Approach: Modify the empty cells of the Parsing Table.
1. if M[A,a] = {empty} and a belongs to Follow(A) then we set M[A,a] = “synch”
Error-recovery Strategy :
If A=top-of-the-stack and a=current-input,
1. If A is NT and M[A,a] = {empty} then skip a from the input.
2. If A is NT and M[A,a] = {synch} then pop A.
3. If A is a terminal and A!=a then pop token (essentially inserting it).
CH4.94
CSE244
Revised Parsing Table / ExampleRevised Parsing Table / Example
Non-terminal
INPUT SYMBOL
id + * ( ) $
E
E’
T
T’
F
ETE’
TFT’
Fid
E’+TE’
T’ T’*FT’
F(E)
TFT’
ETE’
T’
E’ E’
T’
From Follow sets. Pop top of stack NT
“synch” action
Skip input symbol
CH4.95
CSE244
Revised Parsing Table / Example(2)Revised Parsing Table / Example(2)
$E$E$E’T$E’T’F$E’T’id$E’T’$E’T’F*$E’T’F$E’T’$E’$E’T+$E’T$E’T’F$E’T’id$E’T’$E’$
+ id * + id$
id * + id$ id * + id$ id * + id$
id * + id$ * + id$ * + id$
+ id$ + id$
+ id$ + id$
id$ id$ id$
$ $ $
STACK INPUT Remark
error, M[F,+] = synchF has been popped
error, skip +
PossibleError Msg:“Misplaced +I am skipping it”
PossibleError Msg:“Missing Term”
CH4.96
CSE244
Writing Error MessagesWriting Error Messages
Keep input counter(s)Keep input counter(s) Recall: every non-terminal symbolizes an abstract language Recall: every non-terminal symbolizes an abstract language
construct.construct. Examples of Error-messages for our usual grammarExamples of Error-messages for our usual grammar
E = means expression. top-of-stack is E, input is +
“Error at location i, expressions cannot start with a ‘+’” or “error at location i, invalid expression”
Similarly for E, * E’= expression ending.
Top-of-stack is E’, input is * or id“Error: expression starting at j is badly formed at location i”
Requires: every time you pop an ‘E’ remember the location
CH4.97
CSE244
Writing Error-Messages, IIWriting Error-Messages, II
Messages for Synch Errors. Messages for Synch Errors. Top-of-stack is F input is +
“error at location i, expected summation/multiplication term missing”
Top-of-stack is E input is ) “error at location i, expected expression missing”
CH4.98
CSE244
Writing Error Messages, IIIWriting Error Messages, III
When the top-of-the stack is a terminal that does When the top-of-the stack is a terminal that does not match…not match… E.g. top-of-stack is id and the input is +
“error at location i: identifier expected” Top-of-stack is ) and the input is terminal other
than ) Every time you match an ‘(‘
push the location of ‘(‘ to a “left parenthesis” stack.– this can also be done with the symbol stack.
When the mismatch is discovered look at the left parenthesis stack to recover the location of the parenthesis.
“error at location i: left parenthesis at location m has no closing right parenthesis”– E.g. consider ( id * + (id id) $
CH4.99
CSE244
Incorporating Error-Messages to the TableIncorporating Error-Messages to the Table
Empty parsing table entries can now fill with the Empty parsing table entries can now fill with the appropriate error-reporting techniques.appropriate error-reporting techniques.
CH4.100
CSE244
Phrase-Level RecoveryPhrase-Level Recovery
• Fill in blanks entries of parsing table with error handling routines that do not only report errors but may also:
• change/ insert / delete / symbols into the stack and / or input stream• + issue error message
• Problems:
• Modifying stack has to be done with care, so as to not create possibility of derivations that aren’t in language• infinite loops must be avoided
• Essentially extends panic mode to have more complete error handling
CH4.101
CSE244
How Would You Implement TD ParserHow Would You Implement TD Parser
• Stack – Easy to handle. Write ADT to manipulate its contents
• Input Stream – Responsibility of lexical analyzer
• Key Issue – How is parsing table implemented ?One approach: Assign unique IDS
Non-terminal
INPUT SYMBOL
id + * ( ) $
E
E’
T
T’
F
ETE’
TFT’
Fid
E’+TE’
T’ T’*FT’
F(E)
TFT’
ETE’
T’
E’ E’
T’
synch
synch synch
synch
synch
synch synch
synch
synch
All rules have unique IDs Ditto for synch
actions
Also for blanks which handle errors
CH4.102
CSE244
Revised Parsing Table:Revised Parsing Table:
Non-terminal
INPUT SYMBOL
id + * ( ) $
E
E’
T
T’
F
4
6
1
3
6
1
2 3
4
6
8 7 17161514
1312
10
11
9
24
23
222120
18 19
255
1 ETE’2 E’+TE’3 E’4 TFT’5 T’*FT’6 T’7 F(E)8 Fid
9 – 17 : Sync
Actions
18 – 25 : Error
Handlers
CH4.103
CSE244
Revised Parsing Table: (2)Revised Parsing Table: (2)
Each # ( or set of #s) corresponds to a procedure that:
• Uses Stack ADT
• Gets Tokens
• Prints Error Messages
• Prints Diagnostic Messages
• Handles Errors
CH4.104
CSE244
How is Parser Constructed ?How is Parser Constructed ?
One large CASE statement:state = M[ top(s), current_token ]switch (state) { case 1: proc_E_TE’( ) ; break ; … case 8: proc_F_id( ) ; break ; case 9: proc_sync_9( ) ; break ; … case 17: proc_sync_17( ) ; break ; case 18: … case 25: }
Procs to handle errors
Some error handlers may be similar
Combine put in another switch
Some sync actions may be same
CH4.105
CSE244
Final Comments – Top-Down ParsingFinal Comments – Top-Down Parsing
So far,
• We’ve examined grammars and language theory and its relationship to parsing
• Key concepts: Rewriting grammar into an acceptable form
• Examined Top-Down parsing:
Brute Force : Transition diagrams & recursion
Elegant : Table driven
• We’ve identified its shortcomings:
Not all grammars can be made LL(1) !
• Bottom-Up Parsing - Future