winifred-byrd
Syntax Analysis

Introduction to parsers
Context-free grammars
Push-down automata
Top-down parsing
Bottom-up parsing
Bison - a parser generator

Introduction to parsers
[Figure: source → Lexical Analyzer → (token) → Parser → (syntax tree) → Semantic Analyzer → code. The Parser asks the Lexical Analyzer for the next token; a Symbol Table is shown alongside these phases.]

Context-Free Grammars
A context-free grammar consists of four components:
A set of terminals: basic symbols from which sentences are formed
A set of nonterminals: syntactic categories denoting sets of sentences
A set of productions: rules specifying how the terminals and nonterminals can be combined to form sentences
The start symbol: a distinguished nonterminal denoting the language

An Example
Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: expr, op
Productions:
  expr → expr op expr
  expr → ‘(’ expr ‘)’
  expr → ‘-’ expr
  expr → id
  op → ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: expr

Derivations
A derivation step is an application of a production as a rewriting rule:
  E ⇒ - E
A sequence of derivation steps
  E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
is called a derivation of “- ( id )” from E.
The symbol ⇒* denotes “derives in zero or more steps”; the symbol ⇒+ denotes “derives in one or more steps”:
  E ⇒* - ( id )      E ⇒+ - ( id )

Context-Free Languages
A context-free language L(G) is the language defined by a context-free grammar G
A string of terminals w is in L(G) if and only if S ⇒+ w; w is called a sentence of G
If S ⇒* α, where α may contain nonterminals, then we call α a sentential form of G
  E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
G1 is equivalent to G2 if L(G1) = L(G2)

Left- & Right-most Derivations
Each derivation step needs to choose
– a nonterminal to rewrite
– a production to apply
A leftmost derivation always chooses the leftmost nonterminal to rewrite
  E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
A rightmost derivation always chooses the rightmost nonterminal to rewrite
  E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )

Parse Trees
A parse tree is a graphical representation for a derivation that filters out the order of choosing nonterminals for rewriting
Many derivations may correspond to the same parse tree, but every parse tree has associated with it a unique leftmost and a unique rightmost derivation
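
An Example
[Figure: parse tree for - ( id + id ). The root E has children - and E; that E has children (, E and ); the inner E has children E, + and E; each of the two lowest E nodes derives id.]
  E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
  E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )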

Ambiguous Grammar
A grammar is ambiguous if it produces more than one parse tree for some sentence
  E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
  E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
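
The ambiguity can be checked mechanically by counting parse trees. The sketch below is only an illustration and assumes the reduced grammar E → E + E | E * E | id; the function name `count_parse_trees` is hypothetical and not part of the slides.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways E can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

Any sentence for which the count exceeds one witnesses the ambiguity; here the two trees correspond to the two derivations above.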
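
Ambiguous Grammar
[Figure: the two parse trees for id + id * id. In the first, the root expands to E + E and the right operand derives id * id; in the second, the root expands to E * E and the left operand derives id + id.]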

Resolving Ambiguity
Use disambiguating rules to throw away undesirable parse trees
Rewrite grammars by incorporating disambiguating rules into grammars

An Example
The dangling-else grammar
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
Two parse trees for
  if E1 then if E2 then S1 else S2

An Example
[Figure: the two parse trees for if E1 then if E2 then S1 else S2. In one tree the else is attached to the inner if; in the other it is attached to the outer if.]

Disambiguating Rules
Rule: match each else with the closest previous unmatched then
Remove undesired state transitions in the pushdown automaton

Grammar Rewriting
  stmt → m_stmt | unm_stmt
  m_stmt → if expr then m_stmt else m_stmt | other
  unm_stmt → if expr then stmt | if expr then m_stmt else unm_stmt

RE vs. CFG
Every language described by a RE can also be described by a CFG
Why use REs for lexical syntax?
– Lexical rules do not need a notation as powerful as CFGs
– REs are more concise and easier to understand than CFGs
– More efficient lexical analyzers can be constructed from REs than from CFGs
– REs provide a way of modularizing the front end into two manageable-sized components
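
Push-Down Automata
[Figure: a push-down automaton. A Finite Automaton reads the Input, produces the Output, and uses a Stack; $ marks the end of the input and the bottom of the stack.]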
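
An Example
  S’ → S $
  S → a S b
  S → ε
[Figure: state diagram of a push-down automaton for this grammar, with states 0 to 3. Edge labels such as (a, $), (a, a), (b, a) and ($, $) pair the current input symbol with the top stack symbol.]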

Nonregular Constructs
REs can denote only a fixed number of repetitions or an unspecified number of repetitions of one given construct:
  a^n, a*
A nonregular construct:
– L = {a^n b^n | n ≥ 0}
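
A stack is what lets a recognizer count matching a's and b's. The following is a minimal sketch of a stack-based recognizer for this language; the name `accepts_anbn` is an illustrative assumption, not something from the slides.

```python
def accepts_anbn(s):
    """Return True iff s is in {a^n b^n | n >= 0}, using an explicit stack."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False      # an 'a' after a 'b' is out of order
            stack.append("a")     # remember one unmatched 'a'
        elif ch == "b":
            seen_b = True
            if not stack:
                return False      # more b's than a's
            stack.pop()           # match this 'b' against an earlier 'a'
        else:
            return False          # symbol outside the alphabet
    return not stack              # every 'a' must have been matched

print(accepts_anbn("aaabbb"), accepts_anbn("aab"))
```

A finite automaton has no such unbounded memory, which is why no RE describes this language.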

Non-Context-Free Constructs
CFGs can denote only a fixed number of repetitions or an unspecified number of repetitions of one or two given constructs
Some non-context-free constructs:
– L1 = {wcw | w is in (a | b)*}
– L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
– L3 = {a^n b^n c^n | n ≥ 0}
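
Words of Encouragement
The way of great learning lies in manifesting luminous virtue, in renewing the people, and in resting in the highest good.
-- The Great Learning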
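
Top-Down Parsing
Construct a parse tree from the root to the leaves using leftmost derivation
  1. S → c A B        input: cad
  2. A → a b
  3. A → a
  4. B → d
[Figure: the sequence of partial parse trees. Production 1 expands S into c A B; production 2 expands A into a b, which does not match the input, so the parser backtracks; production 3 expands A into a; production 4 expands B into d.]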

Predictive Parsing
A top-down parsing method without backtracking
– there is only one alternative production to choose at each derivation step
  stmt → if expr then stmt else stmt
       | while expr do stmt
       | begin stmt_list end

LL(k) Parsing
The first L stands for scanning the input from left to right
The second L stands for producing a leftmost derivation
The k stands for the number of lookahead input symbols used to choose alternative productions at each derivation step

LL(1) Parsing
Use one input symbol of lookahead
Recursive-descent parsing
Nonrecursive predictive parsing

An Example
LL(1): S → a b e | c d e
LL(2): S → a b e | a d e

Recursive Descent Parsing
The parser consists of a set of (possibly recursive) procedures
Each procedure is associated with a nonterminal of the grammar and is responsible for deriving the productions of that nonterminal
Each procedure should be able to choose a unique production to derive based on the current token

An Example
  type → simple | id | array [ simple ] of type
  simple → integer | char | num dotdot num
Tokens that can begin simple: {integer, char, num}

Recursive Descent Parsing
• For each terminal in the production, the terminal is matched with the current token
• For each nonterminal in the production, the procedure associated with the nonterminal is called
• The sequence of matchings and procedure calls in processing the input implicitly defines a parse tree for the input
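
An Example
[Figure: parse tree for the input array [ num dotdot num ] of integer. type expands to array [ simple ] of type; simple expands to num dotdot num; the inner type expands to simple, which expands to integer.]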

An Example
procedure match(t : terminal);
begin
  if lookahead = t then
    lookahead := nexttoken
  else
    error
end;

An Example
procedure type;
begin
  if lookahead is in { integer, char, num } then
    simple
  else if lookahead = id then
    match(id)
  else if lookahead = array then begin
    match(array); match('['); simple;
    match(']'); match(of); type
  end
  else
    error
end;

An Example
procedure simple;
begin
  if lookahead = integer then
    match(integer)
  else if lookahead = char then
    match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else
    error
end;
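
The three procedures translate almost line for line into a runnable form. The sketch below assumes the input is already a list of token names; the `Parser` class, `ParseError` and the `"$"` end marker are illustrative choices, not part of the slides.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Consume the current token if it is the expected terminal.
        if self.lookahead == t:
            self.pos += 1
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")

    def type_(self):
        # type -> simple | id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "id":
            self.match("id")
        elif self.lookahead == "array":
            self.match("array"); self.match("["); self.simple()
            self.match("]"); self.match("of"); self.type_()
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in type")

    def simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead in ("integer", "char"):
            self.match(self.lookahead)
        elif self.lookahead == "num":
            self.match("num"); self.match("dotdot"); self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in simple")

def parse(tokens):
    """Return True if tokens form a type; raise ParseError otherwise."""
    parser = Parser(tokens)
    parser.type_()
    parser.match("$")          # the whole input must be consumed
    return True

print(parse("array [ num dotdot num ] of integer".split()))
```

Each method decides which production to use from the single lookahead token, so no backtracking is needed.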

First Sets
The first set of a string α is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in the first set of α.

First Sets
If X is terminal, then FIRST(X) is {X}
If X is nonterminal and X → ε is a production, then add ε to FIRST(X)
If X is nonterminal and X → Y1 Y2 ... Yk is a production, then add a to FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1). If ε is in FIRST(Yj) for all j, then add ε to FIRST(X)

An Example
  E → T E'
  E' → + T E' | ε
  T → F T'
  T' → * F T' | ε
  F → ( E ) | id

FIRST(F) = { (, id }
FIRST(T') = { *, ε }, FIRST(T) = { (, id }
FIRST(E') = { +, ε }, FIRST(E) = { (, id }

Follow Sets
The follow set of a nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form; that is, if
  S ⇒* α A a β
then a is in the follow set of A.

Follow Sets
Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker
If there is a production A → α B β, then everything in FIRST(β) except for ε is placed in FOLLOW(B)
If there is a production A → α B, or A → α B β where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)

An Example
  E → T E'
  E' → + T E' | ε
  T → F T'
  T' → * F T' | ε
  F → ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }, FIRST(T') = { *, ε }
FOLLOW(E) = { ), $ }, FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
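
The FIRST and FOLLOW rules can be applied repeatedly until nothing changes. The sketch below is one possible fixed-point formulation for the example grammar; the dictionary encoding of the grammar (an empty list standing for ε) and all function names are assumptions made for illustration.

```python
EPS = "ε"

# Each nonterminal maps to its list of right-hand sides; [] stands for ε.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)   # every symbol can vanish (or the string is empty)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate until a fixed point
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")              # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue        # only nonterminals have FOLLOW sets
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}                # rule 2
                    if EPS in rest:
                        new |= follow[head]           # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, "E")
for nt in GRAMMAR:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```

The printed sets can be compared against the ones listed on the slide.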

Nonrecursive Predictive Parsing
[Figure: model of a nonrecursive predictive parser. A Parsing driver reads the Input, consults the Parsing table, manipulates the Stack, and produces the Output.]

Stack Operations
Match
– when the top stack symbol is a terminal and it matches the input token, pop the terminal and advance the input pointer
Expand
– when the top stack symbol is a nonterminal, replace this symbol by the right hand side of one of its productions (pop the nonterminal and push the right hand side of a production in reverse order)

An Example
  type → simple | id | array [ simple ] of type
  simple → integer | char | num dotdot num

An Example
(E = expand, M = match; the top of the stack is on the right)
Action  Stack                       Input
E       type                        array [ num dotdot num ] of integer
M       type of ] simple [ array    array [ num dotdot num ] of integer
M       type of ] simple [          [ num dotdot num ] of integer
E       type of ] simple            num dotdot num ] of integer
M       type of ] num dotdot num    num dotdot num ] of integer
M       type of ] num dotdot        dotdot num ] of integer
M       type of ] num               num ] of integer
M       type of ]                   ] of integer
M       type of                     of integer
E       type                        integer
E       simple                      integer
M       integer                     integer

Parsing Driver
push $S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$;
repeat
  let X be the top stack symbol and a the symbol pointed to by ip;
  if X is a terminal or $ then
    if X = a then
      pop X from the stack and advance ip
    else error
  else /* X is a nonterminal */
    if M[X, a] = X → Y1 Y2 ... Yk then
      pop X from the stack and push Yk ... Y2 Y1 onto the stack
    else error
until X = $ and a = $
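
The driver can be sketched in a few lines once a parsing table is available. The example below hard-codes the LL(1) table of the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id; the names `TABLE`, `NONTERMINALS` and `ll1_parse` are illustrative assumptions.

```python
# M[X, a] -> right-hand side to push; an empty list is the ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order; raise ValueError on error."""
    stack = ["$", start]                 # top of the stack is the last element
    tokens = list(tokens) + ["$"]
    ip = 0
    output = []
    while stack:
        X, a = stack[-1], tokens[ip]
        if X not in NONTERMINALS:        # X is a terminal or $
            if X != a:
                raise ValueError(f"expected {X!r}, found {a!r}")
            stack.pop()                  # match
            ip += 1
        else:
            body = TABLE.get((X, a))
            if body is None:
                raise ValueError(f"no entry M[{X}, {a}]")
            stack.pop()                  # expand: push the body in reverse
            stack.extend(reversed(body))
            output.append((X, body))
    return output

for head, body in ll1_parse(["id", "+", "id", "*", "id"]):
    print(head, "→", " ".join(body) or "ε")
```

The sequence of productions printed is a leftmost derivation of the input.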

Constructing Parsing Table
Input. Grammar G.
Output. Parsing Table M.
Method.
1. For each production A → α, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each symbol b in FOLLOW(A).
4. Make each undefined entry of M be error.
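
The method above can be sketched as follows, assuming the FIRST and FOLLOW sets of the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id are already known. The data layout and the function names are illustrative assumptions; missing dictionary keys play the role of error entries.

```python
EPS = "ε"
PRODUCTIONS = [
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_string(symbols):
    """FIRST of a right-hand side, from the FIRST sets of its symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table():
    """Map (nonterminal, terminal) to the list of applicable bodies."""
    table = {}
    for head, body in PRODUCTIONS:              # step 1
        first = first_of_string(body)
        targets = first - {EPS}                 # step 2
        if EPS in first:
            targets |= FOLLOW[head]             # step 3
        for a in targets:
            table.setdefault((head, a), []).append(body)
    return table                                # step 4: absent keys are errors

table = build_table()
conflicts = {key for key, bodies in table.items() if len(bodies) > 1}
print(len(table), "entries;", len(conflicts), "multiply-defined")
```

Keeping a list per entry makes multiply-defined entries visible, which is the test for whether a grammar is LL(1).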

An Example
     id          +           *           (           )           $
E    E → TE'                             E → TE'
E'               E' → +TE'                           E' → ε      E' → ε
T    T → FT'                             T → FT'
T'               T' → ε      T' → *FT'               T' → ε      T' → ε
F    F → id                              F → (E)

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }, FIRST(T') = { *, ε }
FOLLOW(E) = { ), $ }, FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }

An Example
Stack       Input              Output
$E          id + id * id$
$E'T        id + id * id$      E → TE'
$E'T'F      id + id * id$      T → FT'
$E'T'id     id + id * id$      F → id
$E'T'       + id * id$
$E'         + id * id$         T' → ε
$E'T+       + id * id$         E' → +TE'
$E'T        id * id$
$E'T'F      id * id$           T → FT'
$E'T'id     id * id$           F → id
$E'T'       * id$
$E'T'F*     * id$              T' → *FT'
$E'T'F      id$
$E'T'id     id$                F → id
$E'T'       $
$E'         $                  T' → ε
$           $                  E' → ε

LL(1) Grammars
A grammar is an LL(1) grammar if its LL(1) parsing table has no multiply-defined entries

A Counter Example
  S → i E t S S' | a        FOLLOW(S) = {$, e}
  S' → e S | ε              FOLLOW(S') = {$, e}
  E → b                     FOLLOW(E) = {t}

     a         b         e           i                 t     $
S    S → a                           S → i E t S S'
S'                       S' → e S                            S' → ε
                         S' → ε
E              E → b

LL(1) Grammars
A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
– For no terminal a do both α and β derive strings beginning with a.
– At most one of α and β can derive the empty string.
– If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
FIRST(α) ∩ FIRST(β) = ∅
FIRST(α) ∩ FOLLOW(A) = ∅
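
Left Recursion
A grammar is left recursive if it has a nonterminal A such that A ⇒+ A α
  A → A α | β          A → β R
                       R → α R | ε
[Figure: parse trees for the same string under the two grammars. With the left-recursive grammar the chain of A nodes grows down to the left; with the rewritten grammar the chain of R nodes grows down to the right.]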

Direct Left Recursion
  A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
is replaced by
  A → β1 A' | β2 A' | ... | βn A'
  A' → α1 A' | α2 A' | ... | αm A' | ε
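
The transformation is mechanical, as the following sketch suggests. It assumes productions are given as lists of symbols with an empty list for ε, and that the primed name (here `head + "'"`) is not already in use; the function name is hypothetical.

```python
def eliminate_direct_left_recursion(head, bodies):
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn without left recursion.

    `bodies` is a list of right-hand sides, each a list of symbols;
    the empty list stands for ε. Returns a dict of the new productions.
    """
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # the αi
    betas = [b for b in bodies if not b or b[0] != head]     # the βj
    if not alphas:
        return {head: bodies}            # nothing to do
    new = head + "'"                     # assumed to be a fresh nonterminal
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],     # [] is ε
    }

print(eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applied to E → E + T | T this yields E → T E' and E' → + T E' | ε.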

An Example
  E → E + T | T
  T → T * F | F
  F → ( E ) | id
becomes
  E → T E'
  E' → + T E' | ε
  T → F T'
  T' → * F T' | ε
  F → ( E ) | id
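
Indirect Left Recursion
  S → A a | b
  A → A c | S d | ε
A is indirectly left recursive through S:
  S ⇒ A a ⇒ S d a
Substituting the S-productions into A → S d gives
  A → A c | A a d | b d | ε
and eliminating the direct left recursion gives
  S → A a | b
  A → b d A' | A'
  A' → c A' | a d A' | ε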

Indirect Left Recursion
Input. Grammar G with no cycles (derivations of the form A ⇒+ A) or ε-productions (productions of the form A → ε).
Output. An equivalent grammar with no left recursion.
1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
       where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions;
     end
     eliminate direct left recursion among Ai-productions
   end

Left Factoring
Two alternatives of a nonterminal A have a nontrivial common prefix α if α ≠ ε and
  A → α β1 | α β2
Left factoring rewrites them as
  A → α A'
  A' → β1 | β2

An Example
  S → i E t S | i E t S e S | a
  E → b
becomes
  S → i E t S S' | a
  S' → e S | ε
  E → b

Error Recovery
Panic mode: skip tokens until a token in a set of synchronizing tokens appears
1. If a terminal on stack cannot be matched, pop the terminal
2. Use FOLLOW(A) as sync set for A (pop A)
3. Use the first set of a higher construct as sync set for A
4. Use FIRST(A) as sync set for A
5. Use the production deriving ε as the default for A

An Example
  E → T E'
  E' → + T E' | ε
  T → F T'
  T' → * F T' | ε
  F → ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }

An Example
     id          +           *           (           )           $
E    E → TE'                             E → TE'     sync2       sync2
E'               E' → +TE'                           E' → ε      E' → ε
T    T → FT'     sync2                   T → FT'     sync2       sync2
T'               T' → ε      T' → *FT'               T' → ε      T' → ε
F    F → id      sync2       sync2       F → (E)     sync2       sync2

An Example
Stack       Input              Output
$E          ) id * + id$       error, skip )
$E          id * + id$
$E'T        id * + id$         E → TE'
$E'T'F      id * + id$         T → FT'
$E'T'id     id * + id$         F → id
$E'T'       * + id$
$E'T'F*     * + id$            T' → *FT'
$E'T'F      + id$              error
$E'T'       + id$              F has been popped
$E'         + id$
$E'T+       + id$              E' → +TE'
$E'T        id$
$E'T'F      id$                T → FT'
$E'T'id     id$                F → id
$E'T'       $
$E'         $                  T' → ε
$           $                  E' → ε
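
Words of Encouragement
Fan Chi asked about benevolence. The Master said: "Love others."
The Master said: "People's faults reflect the kind of person they are. Observe a person's faults and you will know their benevolence." -- The Analects
The purpose of life is to seek happiness. -- The Dalai Lama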

Bottom-Up Parsing
Construct a parse tree from the leaves to the root using rightmost derivation in reverse
  S → a A B e          input: abbcde
  A → A b c | b
  B → d
[Figure: the parse tree is built upward from the leaves a b b c d e: first A over the first b, then A over A b c, then B over d, and finally S over a A B e.]
  abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S

Handles
A handle of a right-sentential form γ consists of
– a production A → β
– a position of γ where β can be replaced by A to produce the previous right-sentential form in a rightmost derivation of γ
  abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
  A → b      A → A b c      B → d      S → a A B e
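
Handle Pruning
  α β w ⇐rm α A w ⇐*rm S
[Figure: parse tree with root S in which the interior node A derives β, with α to its left and w to its right.]
The string w to the right of the handle contains only terminals
A is the bottommost leftmost interior node with all its children in the tree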
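
An Example
[Figure: handle pruning on the parse tree of abbcde. Starting from the full tree, the handles b, A b c, d and a A B e are pruned in turn, leaving the frontiers aAbcde, aAde, aABe and finally S.]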
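
Shift-Reduce Parsing
[Figure: model of a shift-reduce parser. A Parsing driver reads the Input, consults the Parsing table, produces the Output, and maintains a Stack whose top holds the Handle; $ marks the bottom of the stack and the end of the input.]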

Stack Operations
Shift: shift the next input symbol onto the top of the stack
Reduce: replace the handle at the top of the stack with the corresponding nonterminal
Accept: announce successful completion of the parsing
Error: call an error recovery routine

An Example
(S = shift, R = reduce, A = accept)
Action  Stack          Input
S       $              a b b c d e $
S       $ a            b b c d e $
R       $ a b          b c d e $
S       $ a A          b c d e $
S       $ a A b        c d e $
R       $ a A b c      d e $
S       $ a A          d e $
R       $ a A d        e $
S       $ a A B        e $
R       $ a A B e      $
A       $ S            $

Shift/Reduce Conflict
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other

Stack                              Input
$ - - - if expr then stmt          else - - - $

Shift: continue toward if expr then stmt else stmt
Reduce: by stmt → if expr then stmt

Reduce/Reduce Conflict
  stmt → id ( para_list )
  stmt → expr := expr
  para_list → para_list , para
  para_list → para
  para → id
  expr → id ( expr_list )
  expr → id
  expr_list → expr_list , expr
  expr_list → expr

Stack                        Input
$ - - - id ( id              , id ) - - - $
$ - - - procid ( id          , id ) - - - $

LR(k) Parsing
The L stands for scanning the input from left to right
The R stands for constructing a rightmost derivation in reverse
The k stands for the number of lookahead input symbols used to make parsing decisions

LR Parsing
The LR parsing algorithm
Constructing SLR(1) parsing tables
Constructing LR(1) parsing tables
Constructing LALR(1) parsing tables
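
Model of an LR Parser
[Figure: model of an LR parser. A Parsing driver reads the Input, produces the Output, and consults a Parsing table with Action and Goto parts. The Stack holds S0 ... Xm-1 Sm-1 Xm Sm with state Sm on top; $ marks the bottom of the stack and the end of the input.]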

An Example
  (1) E → E + T      (2) E → T
  (3) T → T * F      (4) T → F
  (5) F → ( E )      (6) F → id

       Action                        Goto
State  id   +    *    (    )    $    E    T    F
0      s5             s4             1    2    3
1           s6                  acc
2           r2   s7        r2   r2
3           r4   r4        r4   r4
4      s5             s4             8    2    3
5           r6   r6        r6   r6
6      s5             s4                  9    3
7      s5             s4                       10
8           s6             s11
9           r1   s7        r1   r1
10          r3   r3        r3   r3
11          r5   r5        r5   r5

An Example
Action  Stack                      Input
s5      $0                         id + id * id $
r6      $0 id5                     + id * id $
r4      $0 F3                      + id * id $
r2      $0 T2                      + id * id $
s6      $0 E1                      + id * id $
s5      $0 E1 +6                   id * id $
r6      $0 E1 +6 id5               * id $
r4      $0 E1 +6 F3                * id $
s7      $0 E1 +6 T9                * id $
s5      $0 E1 +6 T9 *7             id $
r6      $0 E1 +6 T9 *7 id5         $
r3      $0 E1 +6 T9 *7 F10         $
r1      $0 E1 +6 T9                $
acc     $0 E1                      $

LR Parsing Driver
push $s0 onto the stack, where s0 is the initial state
set ip to point to the first symbol of w$;
repeat
  let s be the top state on the stack and a the symbol pointed to by ip;
  if action[s, a] == shift s’ then
    push a and s’ onto the stack and advance ip
  else if action[s, a] == reduce A → β then
    pop 2 * |β| symbols off the stack;
    s’ = goto[top(), A];
    push A and s’ onto the stack
  else if action[s, a] == accept then
    return
  else
    error
until false
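
A runnable sketch of the driver follows, using the action and goto tables of the six-production expression grammar (E → E + T | T, T → T * F | F, F → ( E ) | id). It keeps only states on the stack, so a reduction pops |β| entries rather than 2 * |β|; that simplification, and the names `ACTION`, `GOTO` and `lr_parse`, are assumptions made for this illustration.

```python
# Production number -> (head, length of the right-hand side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def lr_parse(tokens):
    """Return the production numbers used in reductions, in order."""
    stack = [0]                       # states only; symbols are implied
    tokens = list(tokens) + ["$"]
    ip = 0
    reductions = []
    while True:
        act = ACTION[stack[-1]].get(tokens[ip])
        if act is None:
            raise ValueError(f"syntax error at {tokens[ip]!r}")
        if act == "acc":
            return reductions
        if act[0] == "s":             # shift: push the new state
            stack.append(int(act[1:]))
            ip += 1
        else:                         # reduce by production n
            n = int(act[1:])
            head, length = PRODUCTIONS[n]
            del stack[len(stack) - length:]
            stack.append(GOTO[stack[-1]][head])
            reductions.append(n)

print(lr_parse(["id", "+", "id", "*", "id"]))
```

Read backwards, the list of reductions is a rightmost derivation of the input.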

LR(0) Items
• An LR(0) item of a grammar G is a production of G with a dot at some position of the right-hand side, A → α • β
• The production A → X Y Z yields the following four LR(0) items
  A → • X Y Z, A → X • Y Z, A → X Y • Z, A → X Y Z •
• An LR(0) item represents a state in an NPDA indicating how much of a production we have seen at a given point in the parsing process

From CFG to NPDA
• The state A → α • B β will go to the state B → • γ via an edge of the empty string
• The state A → α • a β will go to the state A → α a • β via an edge of terminal a (a shifting)
• The state A → α • will cause a reduction on seeing a terminal in FOLLOW(A)
• The state A → α • B β will go to the state A → α B • β via an edge of nonterminal B (after a reduction)

An Example
  1. E’ → E
  2. E → E + T
  3. E → T
  4. T → T * F
  5. T → F
  6. F → ( E )
  7. F → id
Augmented grammar: Easier to identify the accepting state

An Example
[Figure: NPDA for the augmented grammar, with one state per LR(0) item (states 0 to 19, from E’ → • E to F → ( E ) •). ε edges lead from an item with the dot before a nonterminal to the initial items of that nonterminal; edges labeled with a grammar symbol move the dot past that symbol.]

From NPDA to DPDA
• There are two functions performed on sets of LR(0) items (DPDA states)
• The function closure(I) adds more items to I when there is a dot to the left of a nonterminal (corresponding to ε edges)
• The function goto(I, X) moves the dot past the symbol X in all items in I in which the dot immediately precedes X (corresponding to non-ε edges)

The Closure Function
function closure(I);
begin
  J := I;
  repeat
    for each item A → α • B β in J and each production B → γ of G
        such that B → • γ is not in J do
      J = J ∪ { B → • γ }
  until no more items can be added to J;
  return J
end

An Example
  1. E’ → E
  2. E → E + T
  3. E → T
  4. T → T * F
  5. T → F
  6. F → ( E )
  7. F → id
s0 = E’ → • E,
I0 = closure({s0}) = { E’ → • E, E → • E + T, E → • T, T → • T * F, T → • F, F → • ( E ), F → • id }

The Goto Function
function goto(I, X);
begin
  set J to the empty set
  for any item A → α • X β in I do
    add A → α X • β to J
  return closure(J)
end

An Example
I0 = { E’ → • E, E → • E + T, E → • T, T → • T * F, T → • F, F → • ( E ), F → • id }
goto(I0, E) = closure({ E’ → E •, E → E • + T }) = { E’ → E •, E → E • + T }

Subset Construction
function items(G’);
begin
  C := {closure({S’ → • S})}
  repeat
    for each set of items I in C and each symbol X do
      J := goto(I, X)
      if J is not empty and not in C then
        C = C ∪ { J }
  until no more sets of items can be added to C
  return C
end
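
The closure, goto and items functions can be sketched together for the augmented expression grammar E’ → E, E → E + T | T, T → T * F | F, F → ( E ) | id. Representing an item as a pair (production index, dot position) is an encoding chosen for this illustration, not something the slides prescribe.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = sorted({sym for _, body in GRAMMAR for sym in body})

def closure(items):
    """Add B -> • γ for every item with the dot before nonterminal B."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for k, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (k, 0) not in result:
                    result.add((k, 0))
                    work.append((k, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot past `symbol` wherever it immediately follows the dot."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def items():
    """Build the collection of sets of LR(0) items."""
    collection = [closure({(0, 0)})]
    for state in collection:          # the list grows while we iterate
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection

states = items()
print(len(states), "sets of items")
```

The order in which states are discovered here depends on the symbol ordering, so the numbering need not match I0 to I11, but the collection is the same.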

An Example
  1. E’ → E
  2. E → E + T
  3. E → T
  4. T → T * F
  5. T → F
  6. F → ( E )
  7. F → id

I0: E’ → • E, E → • E + T, E → • T, T → • T * F, T → • F, F → • ( E ), F → • id
goto(I0, E) = I1: E’ → E •, E → E • + T
goto(I0, T) = I2: E → T •, T → T • * F
goto(I0, F) = I3: T → F •
goto(I0, ‘(’) = I4: F → ( • E ), E → • E + T, E → • T, T → • T * F, T → • F, F → • ( E ), F → • id
goto(I0, id) = I5: F → id •
goto(I1, ‘+’) = I6: E → E + • T, T → • T * F, T → • F, F → • ( E ), F → • id
goto(I2, ‘*’) = I7: T → T * • F, F → • ( E ), F → • id
goto(I4, E) = I8: F → ( E • ), E → E • + T
goto(I6, T) = I9: E → E + T •, T → T • * F
goto(I7, F) = I10: T → T * F •
goto(I8, ‘)’) = I11: F → ( E ) •

An Example
[Figure: the DFA of sets of LR(0) items for the augmented grammar, with states 0 to 11 and edges labeled by the grammar symbols E, T, F, id, (, ), + and *.]

SLR(1) Parsing Table Generation
procedure SLR(G’);
begin
  for each state I in items(G’) do begin
    if A → α • a β in I and goto(I, a) = J for a terminal a then
      action[I, a] = “shift J”
    if A → α • in I and A ≠ S’ then
      action[I, a] = “reduce A → α” for all a in Follow(A)
    if S’ → S • in I then
      action[I, $] = “accept”
    if A → α • X β in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] = J
  end
  all other entries in action and goto are made error
end

An Example
State  +    *    (    )    id   $    E    T    F
0                s4        s5        1    2    3
1      s6                       a
2      r3   s7        r3        r3
3      r5   r5        r5        r5
4                s4        s5        8    2    3
5      r7   r7        r7        r7
6                s4        s5             9    3
7                s4        s5                  10
8      s6             s11
9      r2   s7        r2        r2
10     r4   r4        r4        r4
11     r6   r6        r6        r6
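
Words of Encouragement
The Master said: "Only the benevolent can truly like others and truly dislike others."
The Master said: "If one's will is set on benevolence, one will be free of evil." -- The Analects
The Master said: "The determined scholar and the benevolent person will not seek to live at the expense of benevolence; they will even give up their lives to fulfil it."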

LR(1) Items
• An LR(1) item of a grammar G is a pair, ( A → α • β, a ), of an LR(0) item A → α • β and a lookahead symbol a
• The lookahead has no effect in an LR(1) item of the form ( A → α • β, a ), where β is not ε
• An LR(1) item of the form ( A → α •, a ) calls for a reduction by A → α only if the next input symbol is a

The Closure Function
function closure(I);
begin
  J := I;
  repeat
    for each item (A → α • B β, a) in J and each production B → γ of G
        and each b ∈ FIRST(β a) such that (B → • γ, b) is not in J do
      J = J ∪ { (B → • γ, b) }
  until no more items can be added to J;
  return J
end

The Goto Function
function goto(I, X);
begin
  set J to the empty set
  for any item (A → α • X β, a) in I do
    add (A → α X • β, a) to J
  return closure(J)
end

Subset Construction
function items(G’);
begin
  C := {closure({(S’ → • S, $)})}
  repeat
    for each set of items I in C and each symbol X do
      J := goto(I, X)
      if J is not empty and not in C then
        C = C ∪ { J }
  until no more sets of items can be added to C
  return C
end

An Example
  1. S’ → S
  2. S → C C
  3. C → c C
  4. C → d

An Example
I0: closure({(S’ → • S, $)}) = (S’ → • S, $) (S → • C C, $) (C → • c C, c/d) (C → • d, c/d)
I1: goto(I0, S) = (S’ → S •, $)
I2: goto(I0, C) = (S → C • C, $) (C → • c C, $) (C → • d, $)
I3: goto(I0, c) = (C → c • C, c/d) (C → • c C, c/d) (C → • d, c/d)
I4: goto(I0, d) = (C → d •, c/d)
I5: goto(I2, C) = (S → C C •, $)

An Example
I6: goto(I2, c) = (C → c • C, $) (C → • c C, $) (C → • d, $)
I7: goto(I2, d) = (C → d •, $)
I8: goto(I3, C) = (C → c C •, c/d)
    goto(I3, c) = I3
    goto(I3, d) = I4
I9: goto(I6, C) = (C → c C •, $)
    goto(I6, c) = I6
    goto(I6, d) = I7

LR(1) Parsing Table Generation
procedure LR(G’);
begin
  for each state I in items(G’) do begin
    if (A → α • a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] = “shift J”
    if (A → α •, a) in I and A ≠ S’ then
      action[I, a] = “reduce A → α”
    if (S’ → S •, $) in I then
      action[I, $] = “accept”
    if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] = J
  end
  all other entries in action and goto are made error
end

An Example
State  c    d    $    S    C
0      s3   s4        1    2
1                a
2      s6   s7             5
3      s3   s4             8
4      r4   r4
5                r2
6      s6   s7             9
7                r4
8      r3   r3
9                r3

The Core of LR(1) Items
• The core of a set of LR(1) items is the set of their first components (i.e., LR(0) items)
• The core of the set of LR(1) items
  { (C → c • C, c/d), (C → • c C, c/d), (C → • d, c/d) }
  is { C → c • C, C → • c C, C → • d }
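
Merging Cores
I3: { (C → c • C, c/d) (C → • c C, c/d) (C → • d, c/d) }
I4: { (C → d •, c/d) }
I8: { (C → c C •, c/d) }
I6: { (C → c • C, $) (C → • c C, $) (C → • d, $) }
I7: { (C → d •, $) }
I9: { (C → c C •, $) }
I3 and I6, I4 and I7, and I8 and I9 have the same cores, so each pair is merged into one state (36, 47 and 89).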

LALR(1) Parsing Table Generation
procedure LALR(G’);
begin
  for each state I in mergeCore(items(G’)) do begin
    if (A → α • a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] = “shift J”
    if (A → α •, a) in I and A ≠ S’ then
      action[I, a] = “reduce A → α”
    if (S’ → S •, $) in I then
      action[I, $] = “accept”
    if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] = J
  end
  all other entries in action and goto are made error
end

An Example
State  c    d    $    S    C
0      s36  s47       1    2
1                a
2      s36  s47            5
36     s36  s47            89
47     r4   r4   r4
5                r2
89     r3   r3   r3

LR Grammars
• A grammar is SLR(1) iff its SLR(1) parsing table has no multiply-defined entries
• A grammar is LR(1) iff its LR(1) parsing table has no multiply-defined entries
• A grammar is LALR(1) iff its LALR(1) parsing table has no multiply-defined entries
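
Hierarchy of Grammar Classes
[Figure: diagram of grammar classes. Grammars are divided into Unambiguous Grammars and Ambiguous Grammars; nested regions labeled LL(k), LR(k), LR(1), LALR(1), LL(1) and SLR(1) lie within the unambiguous grammars.]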

Hierarchy of Grammar Classes
• Why LL(k) ⊂ LR(k)?
• Why SLR(k) ⊂ LALR(k) ⊂ LR(k)?

LL(k) vs. LR(k)
• For a grammar to be LL(k), we must be able to recognize the use of a production by seeing only the first k symbols of what its right-hand side derives
• For a grammar to be LR(k), we must be able to recognize the use of a production by having seen all of what is derived from its right-hand side with k more symbols of lookahead

LALR(k) vs. LR(k)
• The merge of the sets of LR(1) items having the same core does not introduce shift/reduce conflicts
• Suppose there is a shift/reduce conflict on lookahead a in the merged set because of
  1. (A → α •, a)
  2. (B → β • a γ, b)
• Then some set of items has item (A → α •, a), and since the cores of all sets merged are the same, it must have an item (B → β • a γ, c) for some c
• But then this set has the same shift/reduce conflict on a

LALR(k) vs. LR(k)
• The merge of the sets of LR(1) items having the same core may introduce reduce/reduce conflicts
• As an example, consider the grammar
  1. S’ → S
  2. S → a A d | a B e | b A e | b B d
  3. A → c
  4. B → c
  that generates acd, ace, bce, bcd
• The set {(A → c •, d), (B → c •, e)} is valid for acx
• The set {(A → c •, e), (B → c •, d)} is valid for bcx
• But the union {(A → c •, d/e), (B → c •, d/e)} generates a reduce/reduce conflict

SLR(k) vs. LALR(k)
  1. S’ → S
  2. S → L = R
  3. S → R
  4. L → * R
  5. L → id
  6. R → L

SLR(k) vs. LALR(k)
I0: closure({S’ → • S}) = S’ → • S, S → • L = R, S → • R, L → • * R, L → • id, R → • L
I1: goto(I0, S) = S’ → S •
I2: goto(I0, L) = S → L • = R, R → L •
I3: goto(I0, R) = S → R •
I4: goto(I0, *) = L → * • R, R → • L, L → • * R, L → • id
I5: goto(I0, id) = L → id •
FOLLOW(R) = {=, $}
Because = is in FOLLOW(R), the SLR(1) table has a shift/reduce conflict in I2 on =.

SLR(k) vs. LALR(k)
I6: goto(I2, =) = S → L = • R, R → • L, L → • * R, L → • id
I7: goto(I4, R) = L → * R •
I8: goto(I4, L) = R → L •
I9: goto(I6, R) = S → L = R •

SLR(k) vs. LALR(k)
I0: closure({(S’ → • S, $)}) = (S’ → • S, $) (S → • L = R, $) (S → • R, $) (L → • * R, =/$) (L → • id, =/$) (R → • L, $)
I1: goto(I0, S) = (S’ → S •, $)
I2: goto(I0, L) = (S → L • = R, $) (R → L •, $)
I3: goto(I0, R) = (S → R •, $)
I4: goto(I0, *) = (L → * • R, =/$) (R → • L, =/$) (L → • * R, =/$) (L → • id, =/$)
I5: goto(I0, id) = (L → id •, =/$)

SLR(k) vs. LALR(k)
I6: goto(I2, =) = (S → L = • R, $) (R → • L, $) (L → • * R, $) (L → • id, $)
I7: goto(I4, R) = (L → * R •, =/$)
I8: goto(I4, L) = (R → L •, =/$)
I9: goto(I6, R) = (S → L = R •, $)
I10: goto(I6, L) = (R → L •, $)
I11: goto(I6, *) = (L → * • R, $) (R → • L, $) (L → • * R, $) (L → • id, $)
I12: goto(I6, id) = (L → id •, $)
I13: goto(I11, R) = (L → * R •, $)
(I11 and I12 have the same cores as I4 and I5.)

Bison – A Parser Generator
[Figure: lang.y → Bison compiler → lang.tab.c (and lang.tab.h with the -d option); lang.tab.c → C compiler → a.out; tokens → a.out → syntax tree.]
A language for specifying parsers and semantic analyzers

Bison Programs
%{
C declarations
%}
Bison declarations
%%
Grammar rules
%%
Additional C code

An Example
  line → expr ‘\n’
  expr → expr ‘+’ term | term
  term → term ‘*’ factor | factor
  factor → ‘(’ expr ‘)’ | DIGIT

An Example
%token DIGIT
%start line
%%
line   : expr '\n'        {printf("line: expr \\n\n");}
       ;
expr   : expr '+' term    {printf("expr: expr + term\n");}
       | term             {printf("expr: term\n");}
       ;
term   : term '*' factor  {printf("term: term * factor\n");}
       | factor           {printf("term: factor\n");}
       ;
factor : '(' expr ')'     {printf("factor: ( expr )\n");}
       | DIGIT            {printf("factor: DIGIT\n");}
       ;

Functions and Variables
• yyparse(): the parser function
• yylex(): the lexical analyzer function. Bison recognizes any non-positive value as indicating the end of the input
• yylval: the attribute value of a token. Its default type is int, and can be declared to be multiple types in the first section using
  %union {
    int ival;
    double dval;
  }

Conflict Resolutions
• A reduce/reduce conflict is resolved by choosing the production listed first
• A shift/reduce conflict is resolved in favor of shift
• Bison also provides a mechanism for assigning precedences and associativities to terminals

Precedence and Associativity
• The precedence and associativity of operators are declared simultaneously
  %nonassoc '<'     /* lowest */
  %left '+' '-'
  %right '^'        /* highest */
• The precedence of a rule is determined by the precedence of its rightmost terminal
• The precedence of a rule can be modified by adding %prec <terminal> to its right end

An Example
%{
#include <stdio.h>
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%

An Example
line : expr '\n'
     ;
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '-' expr %prec UMINUS
     | '(' expr ')'
     | NUMBER
     ;

Error Recovery
• Error recovery is performed via error productions
• An error production is a production containing the predefined terminal error
• After adding an error production,
    A → α B β | α error β
  on encountering an error in the middle of B, the parser pops symbols from its stack until α, shifts error, and skips input tokens until a token in FIRST(β)

Error Recovery
• The parser can report a syntax error by calling the user provided function yyerror(char *)
• The parser will suppress the report of another error message for 3 tokens
• You can resume error report immediately by using the macro yyerrok
• Error productions are used for major nonterminals

An Example
line : expr '\n'
     | error '\n'   {yyerror("reenter last line:"); yyerrok;}
     ;
expr : expr '+' expr
     | expr '*' expr
     | '-' expr %prec UMINUS
     | '(' expr ')'
     | NUMBER
     ;
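
Words of Encouragement
The Master said: "It is benevolence that makes a neighbourhood beautiful. How can one be called wise who chooses not to dwell in benevolence?"
The Master said: "If one hears the Way in the morning, one may die content in the evening." -- The Analects
The Master said: "Those without benevolence cannot long endure hardship, nor long enjoy comfort. The benevolent rest in benevolence; the wise profit from benevolence."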