1
Week 3
• Questions / Concerns
• What’s due:
  • Lab1b due Friday at midnight
  • Lab1b check-off next week (schedule will be announced on Monday)
  • Homework #2 due next Monday (Draw a parse tree)
  • Homework #3 due next Wednesday (Define grammar for your language)
  • Homework #4 due next Thursday (Grammar modifications)
• Top-down parser
• Grammar modifications
2
Structure of Compilers
[Figure: phases of a compiler. Skeletal source program → preprocessor → modified source program → lexical analyzer (scanner) → tokens → syntax analysis (parser) → syntactic structure → semantic analysis → intermediate representation → optimizer → code generator → target machine code. The phases share a symbol table.]
3
Parser
• Choose a type of parser
  • Top-down parser
  • Bottom-up parser
• Choose a parsing technique
  • Recursive descent
  • Table-driven parser (LL(1) or LR(1))
• Generate a grammar for your language
• Modify the grammar to fit the particular parsing technique
  • Remove lambda productions
  • Remove unit productions
  • Remove left recursion
  • Left factor the grammar
4
Parser
• A parser is just a matching tool
  • It matches a list of tokens against the grammar rules to determine whether they form legal constructs/statements.
  • Yes/No machine
• Context-free
  • It doesn’t care about context (types); it only cares about syntax.
  • If it looks like an assignment statement, then it is an assignment statement.
int x;
x = "Hello";
• Both statements are syntactically legal, so the parser accepts them; the type mismatch is left to semantic analysis.
5
Grammar #1
S -> aaSc | B
B -> bbbB | λ
Generate a parse tree for the input string
aaaabbbcc
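To check a hand-drawn tree, it can help to see how a top-down parser builds one. Below is a minimal recursive-descent sketch for Grammar #1. It assumes each character of the input is one token and represents a parse tree as a `(nonterminal, children)` tuple; `Grammar1Parser`, `accepts`, `leaves`, and `show` are illustrative names, not part of the course material. One token of lookahead is enough here: `a` selects `S -> aaSc`, `b` selects `B -> bbbB`, and anything else selects the remaining rule.

```python
# Recursive-descent parser for Grammar #1 (a sketch):
#   S -> a a S c | B
#   B -> b b b B | λ
# Assumption: every character of the input is one token.
# A parse tree is a tuple (nonterminal, children); leaves are token strings.


class ParseError(Exception):
    pass


class Grammar1Parser:
    def __init__(self, text):
        self.tokens = list(text)
        self.pos = 0

    def peek(self):
        """Current lookahead token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        """Consume one token, which must equal `expected`."""
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r} at {self.pos}, got {self.peek()!r}")
        self.pos += 1
        return expected

    def parse_S(self):
        if self.peek() == "a":  # lookahead 'a' selects S -> a a S c
            return ("S", [self.match("a"), self.match("a"), self.parse_S(), self.match("c")])
        return ("S", [self.parse_B()])  # anything else selects S -> B

    def parse_B(self):
        if self.peek() == "b":  # lookahead 'b' selects B -> b b b B
            return ("B", [self.match("b"), self.match("b"), self.match("b"), self.parse_B()])
        return ("B", ["λ"])  # otherwise B -> λ consumes nothing

    def parse(self):
        tree = self.parse_S()
        if self.peek() is not None:
            raise ParseError(f"unexpected token {self.peek()!r} at {self.pos}")
        return tree


def accepts(text):
    """Yes/no answer: is `text` in the language of Grammar #1?"""
    try:
        Grammar1Parser(text).parse()
        return True
    except ParseError:
        return False


def leaves(tree):
    """Concatenate the leaves of a parse tree, skipping λ."""
    if isinstance(tree, str):
        return "" if tree == "λ" else tree
    _, children = tree
    return "".join(leaves(child) for child in children)


def show(tree, depth=0):
    """Print one node per line, children indented under their parent."""
    if isinstance(tree, str):
        print("  " * depth + tree)
        return
    label, children = tree
    print("  " * depth + label)
    for child in children:
        show(child, depth + 1)


if __name__ == "__main__":
    show(Grammar1Parser("aaaabbbcc").parse())
```

Running it prints the tree for aaaabbbcc with one node per line; `accepts("aab")` returns False because `B -> bbbB` needs three b's.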
6
Grammar #2
S -> E
E -> E + E
E -> E * E
E -> a | b | c
Generate a parse tree for the input string
a + b * c
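Grammar #2 allows more than one parse tree for a + b * c, because either operator can end up at the root. The sketch below (a hypothetical helper `parse_trees`, with tokens assumed to be separated by spaces) tries every operator position as the root of `E -> E + E` or `E -> E * E` and returns each complete tree in bracketed form.

```python
from functools import lru_cache

# Grammar #2:  S -> E ;  E -> E + E | E * E | a | b | c
# Returns every parse tree for a token list, each written in bracketed form.


def parse_trees(tokens):
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees_for_E(i, j):
        """All parse trees that derive tokens[i:j] from E."""
        results = []
        if j - i == 1 and tokens[i] in ("a", "b", "c"):
            results.append(f"E({tokens[i]})")  # E -> a | b | c
        for k in range(i + 1, j - 1):  # try every operator as the root of E -> E op E
            if tokens[k] in ("+", "*"):
                for left in trees_for_E(i, k):
                    for right in trees_for_E(k + 1, j):
                        results.append(f"E({left} {tokens[k]} {right})")
        return tuple(results)

    return [f"S({tree})" for tree in trees_for_E(0, len(tokens))]  # S -> E


if __name__ == "__main__":
    for tree in parse_trees("a + b * c".split()):
        print(tree)
```

For `a + b * c` it returns exactly two trees, one with `+` at the root and one with `*` at the root.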
7
Grammar #3
• Lua Grammar
8
Grammar
• Two formats
  • Context-Free Grammar
  • Extended Backus-Naur Form
Lua Example
laststat ::= return [explist] | break
Laststat -> return LaststatOptional | break
LaststatOptional -> Explist | λ
varlist ::= var {`,´ var}
Varlist -> Var Varlist2
Varlist2 -> `,´ Var Varlist2 | λ
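The `{ }` repetition in EBNF and the right-recursive helper rule in the CFG describe the same lists. As a sketch, assuming each `var` is a single identifier token (real Lua allows more complex vars), the EBNF form maps naturally to a loop and the CFG form to a recursive function; both return the same names. The function names are illustrative.

```python
# varlist ::= var {',' var}            EBNF: { } means "zero or more times"
# Varlist  -> Var Varlist2             CFG version of the same rule
# Varlist2 -> , Var Varlist2 | λ
# Simplifying assumption: each Var is a single identifier token.


def parse_var(tokens, pos):
    """Var: accept one identifier token and return (name, next position)."""
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        raise SyntaxError(f"expected identifier at position {pos}")
    return tokens[pos], pos + 1


def varlist_ebnf(tokens, pos=0):
    """EBNF form: the { } repetition becomes a while loop."""
    name, pos = parse_var(tokens, pos)
    names = [name]
    while pos < len(tokens) and tokens[pos] == ",":
        name, pos = parse_var(tokens, pos + 1)
        names.append(name)
    return names, pos


def varlist_cfg(tokens, pos=0):
    """CFG form: Varlist -> Var Varlist2."""
    name, pos = parse_var(tokens, pos)
    rest, pos = varlist2(tokens, pos)
    return [name] + rest, pos


def varlist2(tokens, pos):
    """Varlist2 -> , Var Varlist2 | λ  (a ',' lookahead selects the first rule)."""
    if pos < len(tokens) and tokens[pos] == ",":
        name, pos = parse_var(tokens, pos + 1)
        rest, pos = varlist2(tokens, pos)
        return [name] + rest, pos
    return [], pos  # Varlist2 -> λ consumes nothing


if __name__ == "__main__":
    tokens = ["x", ",", "y", ",", "z"]
    print(varlist_ebnf(tokens))
    print(varlist_cfg(tokens))
```

The λ alternative is what lets the recursive version stop: when the next token is not a comma, `varlist2` returns without consuming anything, just as the loop exits.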
9
Grammar
• Two formats
  • Context-Free Grammar
  • Extended Backus-Naur Form
Mini C example
EBNF:
Program = Definition { Definition }
Definition = Data_definition | Function_definition
Function_definition = ['int'] Function_header Function_body
Context-free grammar:
Program -> Definition MoreDefinitions
MoreDefinitions -> Definition MoreDefinitions | λ
Definition -> Data_definition | Function_definition
Function_definition -> OptionalType Function_header Function_body
OptionalType -> 'int' | λ
10
Top-down parser
• Start with the start symbol of the grammar.
• Grab an input token and select a production rule.
• Use a “stack” to store the production rule.
• Try to parse that rule by matching input tokens.
• Keep going until all of the input tokens have been processed.
• If the rule is not the right one, put all the tokens back and try a different rule (backtracking).
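A minimal sketch of the backtracking strategy above, assuming a grammar stored as a dict from each nonterminal to a list of alternatives (each a list of symbols, with [] standing for λ). The list of pending symbols plays the role of the stack; when an alternative fails, the generator simply moves on to the next one from the same input position, which is the "put the tokens back" step. The grammar encoding and function names are assumptions for illustration. On a left-recursive rule such as A -> Aa this parser never terminates normally (Python eventually raises `RecursionError`).

```python
# Backtracking top-down parser (a sketch).
# Grammar format (an assumption for this example): a dict mapping each nonterminal
# to a list of alternatives; each alternative is a list of symbols, and [] is λ.
# Any symbol that is not a key of the dict is a terminal.


def parse_from(grammar, symbols, tokens, pos):
    """Yield every input position reachable after matching `symbols` at `pos`.

    `symbols` acts as the parser's stack of pending grammar symbols. When one
    alternative fails, the loop moves on to the next alternative starting from
    the same `pos`, which is the "put the tokens back" (backtracking) step.
    Caution: a left-recursive rule such as A -> A a never consumes input before
    recursing, so Python eventually raises RecursionError.
    """
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in grammar:  # nonterminal: expand it with each alternative in turn
        for alternative in grammar[first]:
            yield from parse_from(grammar, alternative + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == first:  # terminal: must match the token
        yield from parse_from(grammar, rest, tokens, pos + 1)


def accepts(grammar, start, tokens):
    """Yes/no: can some sequence of rule choices consume exactly all tokens?"""
    return any(end == len(tokens) for end in parse_from(grammar, [start], tokens, 0))


if __name__ == "__main__":
    # Simplified, hypothetical version of the Stat rules from the Lua grammar.
    stat = {
        "Stat": [["local", "function", "Name", "Funcbody"], ["local", "Namelist"]],
        "Funcbody": [["(", ")", "end"]],
        "Namelist": [["Name"]],
    }
    # The first alternative expects "function" but sees "Name",
    # so the parser backtracks and tries the second alternative.
    print(accepts(stat, "Stat", ["local", "Name"]))
```

The cost of this approach is that a wrong choice is only discovered after some tokens have been matched, which is exactly what the grammar modifications on the following slides try to avoid.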
11
Top-down Parser
• Ideal grammar:
  • Unique rule for each type of token
    • One-token lookahead
12
One-token lookahead
Stat -> local function Name Funcbody | local Namelist LocalOptional
• Ideally, one token should be enough to pick a unique rule so we don’t have to backtrack. Here both rules start with the same token, “local”, so one token is not enough.
• If we combine these two rules into one rule by factoring out their common part, we eliminate the need for backtracking.
13
One-token lookahead
Stat -> local function Name Funcbody | local Namelist LocalOptional
• Left factor the grammar:
Stat -> local Morelocal
Morelocal -> function Name Funcbody | Namelist LocalOptional
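One round of this factoring step can be sketched in Python, assuming alternatives are stored as lists of symbols with [] for λ. The hypothetical helper `left_factor` groups alternatives by their first symbol, pulls out the longest common prefix of the first group that shares one, and moves the differing tails into a new nonterminal whose name the caller supplies. A full pass would repeat this until no two alternatives of any nonterminal start with the same symbol.

```python
# One round of left factoring (a sketch).
# Alternatives are lists of symbols; [] stands for a λ-production.


def common_prefix(alternatives):
    """Longest sequence of symbols that every alternative starts with."""
    prefix = []
    for column in zip(*alternatives):
        if all(symbol == column[0] for symbol in column):
            prefix.append(column[0])
        else:
            break
    return prefix


def left_factor(name, alternatives, new_name):
    """Factor the first group of alternatives that share a first symbol.

    Returns {nonterminal: alternatives}. The shared prefix stays in `name`
    followed by `new_name`; the remaining tails become the rules of `new_name`
    (an empty tail becomes a λ-production). If no two alternatives share a
    first symbol, the rules come back unchanged.
    """
    groups = {}
    for alt in alternatives:
        if alt:  # λ has no first symbol, so it never needs factoring
            groups.setdefault(alt[0], []).append(alt)
    for group in groups.values():
        if len(group) > 1:
            prefix = common_prefix(group)
            untouched = [alt for alt in alternatives if alt not in group]
            tails = [alt[len(prefix):] for alt in group]
            return {name: [prefix + [new_name]] + untouched, new_name: tails}
    return {name: alternatives}


if __name__ == "__main__":
    stat = [["local", "function", "Name", "Funcbody"],
            ["local", "Namelist", "LocalOptional"]]
    print(left_factor("Stat", stat, "Morelocal"))
```

For the Stat rules this yields `Stat -> local Morelocal` and `Morelocal -> function Name Funcbody | Namelist LocalOptional`. Given S -> abS | aaA | a and the name X, it yields S -> aX and X -> bS | aA | λ.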
14
Top-down Parser
• Ideal grammar:
  • Unique rule for each type of token
    • One-token lookahead
  • Minimize unit productions
    • A unit production doesn’t parse any tokens immediately; it requires another production.
    • It’s hard to tell which tokens match a unit production, so there are more chances for backtracking.
15
Minimize Unit Productions
S -> aaSc
S -> B
B -> bbbB
B -> λ
Parse tree:
S
  B
    b b b B
16
Minimize Unit Productions
Exp -> nil |
false |
true |
Number |
String |
`...´ |
Functioncall |
Prefixexp |
Tableconstructor |
Exp Binop Exp |
Unop Exp
17
Remove Unit Productions
Before:
S -> aaSc
S -> B
B -> bbbB
B -> λ
After:
S -> aaSc
S -> bbbB
S -> λ
B -> bbbB
B -> λ
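The transformation above can be sketched as: for each nonterminal, follow its unit productions to every nonterminal it can reach, and copy their non-unit rules in place of the unit ones. The grammar encoding (a dict of alternatives, [] for λ) and the helper names are assumptions for this example.

```python
# Removing unit productions (a sketch).
# Grammar format (an assumption): dict from nonterminal to a list of alternatives,
# each alternative a list of symbols, with [] for λ.


def is_unit(grammar, alt):
    """A unit production has a right-hand side that is exactly one nonterminal."""
    return len(alt) == 1 and alt[0] in grammar


def unit_closure(grammar, start):
    """Nonterminals reachable from `start` using unit productions only."""
    seen, stack = [start], [start]
    while stack:
        current = stack.pop()
        for alt in grammar[current]:
            if is_unit(grammar, alt) and alt[0] not in seen:
                seen.append(alt[0])
                stack.append(alt[0])
    return seen


def remove_unit_productions(grammar):
    """Replace each A -> B by the non-unit rules of every B that A reaches."""
    result = {}
    for nonterminal in grammar:
        new_alts = []
        for reachable in unit_closure(grammar, nonterminal):
            for alt in grammar[reachable]:
                if not is_unit(grammar, alt) and alt not in new_alts:
                    new_alts.append(alt)
        result[nonterminal] = new_alts
    return result


if __name__ == "__main__":
    grammar1 = {"S": [["a", "a", "S", "c"], ["B"]], "B": [["b", "b", "b", "B"], []]}
    print(remove_unit_productions(grammar1))
```

Following the closure, rather than a single step, also handles chains such as A -> B, B -> C, where A should receive C's rules.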
18
Minimize Unit Productions
Exp -> nil |
false |
true |
Number |
String |
`...´ |
Functioncall |
Prefixexp |
Tableconstructor |
Exp Binop Exp |
Unop Exp
Exp -> nil |
false |
true |
Number |
String |
`...´ |
Functioncall |
Prefixexp |
{ Fieldlistoptional } |
Exp Binop Exp |
Unop Exp
19
Minimize Unit Productions
Exp -> nil |
false |
true |
Number |
String |
`...´ |
Functioncall |
Prefixexp |
Tableconstructor |
Exp Binop Exp |
Unop Exp
Exp -> nil |
false |
true |
Number |
String |
`...´ |
Prefixexp Args |
Prefixexp `:´ Name Args |
Prefixexp |
{ Fieldlistoptional } |
Exp Binop Exp |
Unop Exp
20
Minimize Unit Productions
Exp -> nil |
false |
true |
Number |
String |
`...´ |
Functioncall |
Prefixexp |
Tableconstructor |
Exp Binop Exp |
Unop Exp
Exp -> nil |
false |
true |
Number |
String |
`...´ |
Prefixexp Args |
Prefixexp `:´ Name Args |
Prefixexp |
{ Fieldlistoptional } |
Exp Binop Exp |
Unop Exp
(More left factoring needed: three alternatives now start with Prefixexp.)
21
Top-down Parser
• Ideal grammar:
  • Unique rule for each type of token
    • One-token lookahead
  • Minimize unit productions
    • A unit production doesn’t parse any tokens immediately; it requires another production.
    • It’s hard to tell which tokens match a unit production, so there are more chances for backtracking.
  • Lambda productions are okay, but we have to process them accordingly.
    • Removing lambdas always adds more rules.
    • It’s not possible to remove all lambda productions and still keep a unique rule for each token.
  • Remove left recursion in the grammar.
22
Grammar (left recursive vs. right recursive)
Right Recursion
A -> aA
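A -> λ
Left Recursion
A -> Aa
A -> λ
[Figure: parse trees for aaa. Right recursion: A -> a A applied three times down the right side, ending with A -> λ. Left recursion: A -> A a applied three times down the left side, ending with A -> λ.]
Only non-recursive rule is λ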
Same grammar?
23
Grammar (left recursive vs. right recursive)
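Right Recursion
A -> aA
A -> λ
Left Recursion
A -> Aa
A -> λ
[Figure: the same two parse trees for aaa, growing down the right side for right recursion and down the left side for left recursion.]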
Which one works for top down?
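One way to explore the question is to write the most direct recursive-descent function for each grammar and see which one returns. In this sketch (function names are illustrative), the right-recursive function consumes an `a` before every recursive call, while the left-recursive one calls itself again at the same input position; Python reports that as a `RecursionError`.

```python
# Naive recursive descent for the two grammars (a sketch):
#   right recursion:  A -> a A | λ
#   left recursion:   A -> A a | λ
# Each function parses an A starting at `pos` and returns the position after it.


def parse_right(tokens, pos=0):
    if pos < len(tokens) and tokens[pos] == "a":  # A -> a A: consume 'a', then recurse
        return parse_right(tokens, pos + 1)
    return pos  # A -> λ


def parse_left(tokens, pos=0):
    # Trying A -> A a first means parsing another A at the SAME position before
    # any token is consumed, so this call repeats without ever making progress.
    end = parse_left(tokens, pos)
    if end < len(tokens) and tokens[end] == "a":
        return end + 1
    return end


def finishes(parser, tokens):
    """True if `parser` returns normally instead of hitting Python's recursion limit."""
    try:
        parser(tokens)
        return True
    except RecursionError:
        return False


if __name__ == "__main__":
    print(finishes(parse_right, list("aaa")))  # the right-recursive version terminates
    print(finishes(parse_left, list("aaa")))   # the left-recursive version does not
```

The difference is whether a token is consumed before the recursive call: only then is each call guaranteed to make progress through the input.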
24
Grammar (left recursive vs. right recursive)
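Right Recursion
A -> aA
A -> b
Left Recursion
A -> Aa
A -> b
[Figure: parse trees. Right recursion: A -> a A applied three times, ending with A -> b, which derives aaab. Left recursion: A -> A a applied three times, ending with A -> b, which derives baaa.]
Non-recursive rules are not only λ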
Same grammar?
25
Remove Left Recursion in the Grammar
• Example:
A -> Aa
A -> b
• Step 1: Make all left-recursive rules right recursive, but give them a new nonterminal.
A -> Aa becomes X -> aX
• Step 2: Add a lambda production to the new nonterminal: X -> λ
• Step 3: Identify all non-recursive rules.
A -> b
• Step 4: Append the new nonterminal to the end of all non-recursive rules.
A -> bX
(A rule of the form A -> A… is a left-recursive rule.)
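Steps 1-4 translate directly into a small function. This sketch assumes the same list-of-symbols encoding ([] for λ), takes the new nonterminal's name from the caller, and only handles immediate left recursion (a rule that starts with its own left-hand side); indirect left recursion through other nonterminals needs a more general algorithm. The function name is illustrative.

```python
# Removing immediate left recursion, following Steps 1-4 (a sketch).
# Alternatives are lists of symbols; [] stands for λ. Only direct left recursion
# (A -> A ...) is handled; recursion through other nonterminals is not.


def remove_left_recursion(name, alternatives, new_name):
    """Rewrite A -> A alpha | beta as A -> beta X and X -> alpha X | λ."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    non_recursive = [alt for alt in alternatives if not (alt and alt[0] == name)]
    if not recursive:
        return {name: alternatives}  # nothing to do
    return {
        # Steps 1 and 2: right-recursive rules for the new nonterminal, plus X -> λ.
        new_name: [alpha + [new_name] for alpha in recursive] + [[]],
        # Steps 3 and 4: append the new nonterminal to every non-recursive rule.
        name: [beta + [new_name] for beta in non_recursive],
    }


if __name__ == "__main__":
    print(remove_left_recursion("A", [["A", "a"], ["b"]], "X"))
```

Given the new names X and PARAMLIST2, the same call reproduces the S -> Sab and PARAMLIST examples that follow.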
26
Grammar (left recursive vs. right recursive)
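Without Left Recursion
A -> bX
X -> aX | λ
Left Recursion
A -> Aa
A -> b
[Figure: parse trees for baaa. Without left recursion: A -> b X, then X -> a X applied three times, ending with X -> λ. With left recursion: A -> A a applied three times, ending with A -> b.]
Non-recursive rules are not only λ
Same grammar?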
27
Remove Left Recursion
Before:
S -> Sab
S -> c
S -> d
After:
X -> abX
X -> λ
S -> cX
S -> dX
28
Remove Left Recursion
Before:
PARAMLIST -> IDLIST : TYPE | PARAMLIST ; IDLIST : TYPE
After:
PARAMLIST2 -> ; IDLIST : TYPE PARAMLIST2
PARAMLIST2 -> λ
PARAMLIST -> IDLIST : TYPE PARAMLIST2
29
Remove Unit Production Example
Before:
S -> abSc
S -> A
S -> AB
A -> aA
A -> λ
B -> bbB
B -> λ
After:
S -> abSc
S -> aA
S -> λ
S -> AB
A -> aA
A -> λ
B -> bbB
B -> λ
30
Remove Unit Production Example
Before:
TERM -> FACTOR
FACTOR ->
id
| id ( EXPR_LIST )
| num
| ( EXPRESSION )
| not FACTOR
After:
TERM ->
id
| id ( EXPR_LIST )
| num
| ( EXPRESSION )
| not FACTOR
FACTOR ->
id
| id ( EXPR_LIST )
| num
| ( EXPRESSION )
| not FACTOR
31
Left Factor Example
Before:
S -> abS
S -> aaA
S -> a
A -> bA
A -> λ
After:
S -> aX
X -> bS
X -> aA
X -> λ
A -> bA
A -> λ
32
Left Factor Example
Before:
EXPRESSION ->
SIMPLE_EXPR
| SIMPLE_EXPR relop SIMPLE_EXPR
After:
EXPRESSION -> SIMPLE_EXPR RestOfExp
RestOfExp -> λ | relop SIMPLE_EXPR
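After left factoring, a recursive-descent parser parses SIMPLE_EXPR once and then uses a single token of lookahead to choose between the two RestOfExp rules. This is a sketch under two assumptions: SIMPLE_EXPR is reduced to one identifier or number token, and relop is one of a hypothetical set of operators (`RELOPS`); the class name is illustrative.

```python
# Recursive descent for the left-factored rules (a sketch):
#   EXPRESSION -> SIMPLE_EXPR RestOfExp
#   RestOfExp  -> relop SIMPLE_EXPR | λ
# Assumptions: SIMPLE_EXPR is a single identifier or number token, and RELOPS
# is a hypothetical set of relational operators.

RELOPS = {"=", "<>", "<", "<=", ">", ">="}


class ExpressionParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Current lookahead token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def simple_expr(self):
        tok = self.peek()
        if tok is None or not (tok.isidentifier() or tok.isdigit()):
            raise SyntaxError(f"expected operand at position {self.pos}")
        self.pos += 1
        return tok

    def expression(self):
        left = self.simple_expr()  # the common prefix is parsed exactly once
        return self.rest_of_exp(left)

    def rest_of_exp(self, left):
        if self.peek() in RELOPS:  # one token of lookahead picks the rule
            op = self.peek()
            self.pos += 1
            return (op, left, self.simple_expr())
        return left  # RestOfExp -> λ consumes nothing


if __name__ == "__main__":
    print(ExpressionParser(["x", "<", "10"]).expression())
    print(ExpressionParser(["x"]).expression())
```

Without the factoring, the parser would have to guess between the two EXPRESSION rules before seeing whether a relop follows the first operand.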
33
In-Class Exercise #5
• Remove unit productions:
S -> abS | bSa | A | d
A -> c | dA
• Left factor this grammar:
FACTOR -> id | id ( EXPR_LIST ) | num | ( EXPRESSION ) | not FACTOR
• Remove left recursion:
SIMPLE_EXPR -> TERM
| SIGN TERM
| SIMPLE_EXPR addop TERM