CMPUT 680 - Compiler Design and Optimization

CMPUT680 - Winter 2006

Topic2: Parsing and Lexical Analysis

José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680

Reading List

Appel, Chapters 2, 3, 4, and 5

Aho, Sethi, Ullman, Chapters 2, 3, 4, and 5

Some Important Basic Definitions

lexical: of or relating to the morphemes of a language.

morpheme: a meaningful linguistic unit that cannot be divided into smaller meaningful parts.

lexical analysis: the task concerned with breaking an input into its smallest meaningful units, called tokens.

Some Important Basic Definitions

syntax: the way in which words are put together to form phrases, clauses, or sentences. The rules governing the formation of statements in a programming language.

syntax analysis: the task concerned with fitting a sequence of tokens into a specified syntax.

parsing: to break a sentence down into its component parts of speech, with an explanation of the form, function, and syntactical relationship of each part.

Some Important Basic Definitions

parsing = lexical analysis + syntax analysis

semantic analysis: the task concerned with calculating the program’s meaning.

Regular Expressions

Symbol: a
A regular expression formed by a.

Alternation: M | N
A regular expression formed by M or N.

Concatenation: M • N
A regular expression formed by M followed by N.

Epsilon: ε
The empty string.

Repetition: M*
A regular expression formed by zero or more repetitions of M.

Building a Recognizer for a Language

General approach:

1. Build a deterministic finite automaton (DFA) from the regular expression E.

2. Execute the DFA to determine whether an input string belongs to L(E).

Note: The DFA construction is done automatically by a tool such as lex.
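
Step 2 can be illustrated with a small table-driven simulation. The sketch below is in Python and hard-codes a DFA for (a|b)*abb; the transition table, the state numbering, and the function name `accepts` are assumptions made for illustration, not the output of lex.

```python
# Transition table of a DFA for (a|b)*abb. States are 0..3 and state 3 accepts.
# The table is an assumed hand-written one; a generator such as lex would
# produce an equivalent table automatically.
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START, ACCEPTING = 0, {3}


def accepts(text):
    """Run the DFA over text and report whether it stops in an accepting state."""
    state = START
    for ch in text:
        if ch not in DFA[state]:
            return False  # symbol outside the alphabet {a, b}
        state = DFA[state][ch]
    return state in ACCEPTING


print(accepts("abb"))      # True
print(accepts("babaabb"))  # True
print(accepts("abab"))     # False
```

Because each state-symbol pair has exactly one successor, the loop does constant work per input character.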

Finite Automata

A nondeterministic finite automaton A = {S, Σ, s0, F, move} consists of:

1. A set of states S.
2. A set of input symbols Σ (the input symbol alphabet).
3. A state s0 that is distinguished as the start state.
4. A set of states F distinguished as the accepting states.
5. A transition function move that maps state-symbol pairs into sets of states.

In a deterministic finite automaton (DFA), the function move maps each state-symbol pair into a unique state.

Finite Automata

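A Deterministic Finite Automaton (DFA):

[Figure: transition diagram with start state 0 and states 1, 2, 3; edges are labeled a and b]

A Nondeterministic Finite Automaton (NFA):

[Figure: transition diagram with start state 0 and states 1, 2, 3; edges are labeled a and b]

What languages are accepted by these automata?

b*abb

(a|b)*abb

(Aho,Sethi,Ullman, pp. 114)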

Another NFA

[Figure: transition diagram of an NFA with ε-transitions leaving the start state and edges labeled a and b]

An ε-transition is taken without consuming any character from the input.

What does the NFA above accept?

aa*|bb*

(Aho,Sethi,Ullman, pp. 116)

Constructing NFA

How do we define an NFA that accepts a regular expression?

It is very simple. Remember that a regular expression is formed by the use of alternation, concatenation, and repetition.

Thus all we need to do is to know how to build the NFA for a single symbol, and how to compose NFAs.

Composing NFAs with Alternation

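The NFA for a symbol a is:

[Figure: start state i with a single transition on a to the accepting state f]

Given two NFAs N(s) and N(t), the NFA N(s|t) is:

[Figure: a new start state i with ε-transitions into N(s) and N(t), and ε-transitions from N(s) and N(t) into a new accepting state f]

(Aho,Sethi,Ullman, pp. 122)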

Composing NFAs with Concatenation

Given two NFAs N(s) and N(t), the NFA N(st) is:

[Figure: N(s) followed by N(t), with start state i at the entry of N(s) and accepting state f at the exit of N(t)]

(Aho,Sethi,Ullman, pp. 123)

Composing NFAs with Repetition

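The NFA N(s*) is:

[Figure: a new start state i and a new accepting state f around N(s), with ε-transitions that allow N(s) to be skipped or repeated]

(Aho,Sethi,Ullman, pp. 123)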

Properties of the NFA

Following these construction rules, we obtain an NFA N(r) with these properties:

N(r) has at most twice as many states as the number of symbols and operators in r;

N(r) has exactly one start state and one accepting state;

Each state of N(r) has at most one outgoing transition on a symbol of the alphabet or at most two outgoing ε-transitions.

(Aho,Sethi,Ullman, pp. 124)

How to Parse a Regular Expression?

Given a DFA, we can generate an automaton that recognizes the longest substring of an input that is a valid token.

Using the three simple rules presented, it is easy to generate an NFA to recognize a regular expression.

Given a regular expression, how do we generate an automaton to recognize tokens?

Create an NFA and convert it to a DFA.

Regular expression notation: An Example

a           An ordinary character stands for itself.
ε           The empty string.
            Another way to write the empty string.
M | N       Alternation, choosing from M or N.
M N         Concatenation, an M followed by an N.
M*          Repetition (zero or more times).
M+          Repetition (one or more times).
M?          Optional, zero or one occurrence of M.
[a-zA-Z]    Character set alternation.
.           Stands for any single character except newline.
"a.+*"      Quotation, a string in quotes stands for itself literally.

(Appel, pp. 20)

Regular expressions for some tokens

if                                      {return IF;}
[a-z][a-z0-9]*                          {return ID;}
[0-9]+                                  {return NUM;}
([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)     {return REAL;}
("--"[a-z]*"\n")|(" "|"\n"|"\t")+       {/* do nothing */}
.                                       {error();}

(Appel, pp. 20)
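
The effect of these rules can be approximated with Python's `re` module. The sketch below is not how lex works internally (lex compiles all rules into one DFA); it tries every rule at the current position and keeps the longest match, breaking ties in favor of the earlier rule. The names `RULES`, `SKIP`, `ERROR`, and `tokenize` are assumptions made for illustration.

```python
import re

# Token rules in priority order. On a tie in match length the earlier rule wins,
# which is why "if" is reported as IF and not as ID.
RULES = [
    ("IF", r"if"),
    ("ID", r"[a-z][a-z0-9]*"),
    ("NUM", r"[0-9]+"),
    ("REAL", r"[0-9]+\.[0-9]*|[0-9]*\.[0-9]+"),
    ("SKIP", r"--[a-z]*\n|[ \n\t]+"),  # comments and white space: do nothing
    ("ERROR", r"."),                   # any other single character
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Return (token, lexeme) pairs using the longest-match rule."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "SKIP":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("if x1 42 3.14 iffy"))
```

In this sketch `iffy` comes out as a single ID because the ID rule matches four characters while the IF rule matches only two.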

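Building Finite Automata for Lexical Tokens

The NFA for a symbol i is:

[Figure: start state 1 with a transition on i to state 2]

The NFA for a symbol f is:

[Figure: start state 1 with a transition on f to state 2]

The NFA for the regular expression if is:

[Figure: start state 1, a transition on i to state 2, and a transition on f to state 3, which is labeled IF]

if {return IF;}

(Appel, pp. 21)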

Building Finite Automata for Lexical Tokens

[Figure: start state 1 with a transition on a-z to state 2, labeled ID, which loops on a-z and 0-9]

[a-z][a-z0-9]* {return ID;}

(Appel, pp. 21)

Building Finite Automata for Lexical Tokens

[Figure: start state 1 with a transition on 0-9 to state 2, labeled NUM, which loops on 0-9]

[0-9]+ {return NUM;}

(Appel, pp. 21)

Building Finite Automata for Lexical Tokens

[Figure: automaton for REAL with start state 1 and states 2, 3, 4, and 5; edges are labeled 0-9 and .]

([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+) {return REAL;}

(Appel, pp. 21)

Building Finite Automata for Lexical Tokens

[Figure: automaton for comments and white space with start state 1 and states 2, 3, 4, and 5; edges are labeled -, a-z, \n, \t, and blank]

("--"[a-z]*"\n")|(" "|"\n"|"\t")+ {/* do nothing */}

(Appel, pp. 21)

Building Finite Automata for Lexical Tokens

[Figure: the finite automata for IF, ID, NUM, REAL, white space, and error drawn side by side; edge labels include i, f, a-z, 0-9, ., -, \n, blank, and any but \n]

(Appel, pp. 21)

Conversion of NFA into DFA

[Figure: NFA with states 1 to 15 that combines the automata for IF, ID, NUM, and error; edges are labeled i, f, a-z, 0-9, and any character]

What states can be reached from state 1 without consuming a character?

{1,4,9,14} form the ε-closure of state 1.

What are all the state closures in this NFA?

closure(1) = {1,4,9,14}
closure(5) = {5,6,8}
closure(8) = {6,8}
closure(7) = {6,7,8}
closure(10) = {10,11,13}
closure(13) = {11,13}
closure(12) = {11,12,13}

(Appel, pp. 27)

Conversion of NFA into DFA

Given a set of NFA states T, ε-closure(T) is the set of states that are reachable through ε-transitions from any state s ∈ T.

Given a set of NFA states T, move(T, a) is the set of states that are reachable on input a from any state s ∈ T.

(Aho,Sethi,Ullman, pp. 118)
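
Both operations can be sketched in a few lines of Python. The edge list below is an assumed encoding of the 15-state NFA used in this example, read off from the closures and moves stated on the slides; `EPS`, `ANY`, `EDGES`, `eps_closure`, and `move` are names chosen for illustration, and `ANY` is only a small stand-in for "any character".

```python
import string

EPS = None  # label used for ε-edges (a representation choice)
LETTERS, DIGITS = set(string.ascii_lowercase), set(string.digits)
ANY = LETTERS | DIGITS | set("+-*/ ")  # assumed sample of "any character"

# Each edge is (source, label, target); a label is EPS or a set of characters.
EDGES = [
    (1, EPS, 4), (1, EPS, 9), (1, EPS, 14),
    (1, {"i"}, 2), (2, {"f"}, 3),
    (4, LETTERS, 5), (5, EPS, 8), (8, EPS, 6),
    (6, LETTERS | DIGITS, 7), (7, EPS, 8),
    (9, DIGITS, 10), (10, EPS, 13), (13, EPS, 11),
    (11, DIGITS, 12), (12, EPS, 13),
    (14, ANY, 15),
]


def eps_closure(states):
    """All states reachable from `states` using only ε-edges."""
    closure, work = set(states), list(states)
    while work:
        state = work.pop()
        for source, label, target in EDGES:
            if source == state and label is EPS and target not in closure:
                closure.add(target)
                work.append(target)
    return closure


def move(states, ch):
    """All states reachable from `states` on the single input character ch."""
    return {target for source, label, target in EDGES
            if source in states and label is not EPS and ch in label}


start = eps_closure({1})
print(sorted(start))                          # [1, 4, 9, 14]
print(sorted(eps_closure(move(start, "i"))))  # [2, 5, 6, 8, 15]
```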

Problem Statement for Conversion of NFA into DFA

Given an NFA, find the DFA with the minimum number of states that has the same behavior as the NFA for all inputs.

If the initial state in the NFA is s0, then the set of states in the DFA, Dstates, is initialized with a state representing ε-closure(s0).

(Aho,Sethi,Ullman, pp. 118)

Conversion of NFA into DFA

[Figure: the NFA with states 1 to 15 (edges labeled i, f, a-z, 0-9, and any character), shown next to the DFA under construction]

Dstates = {1-4-9-14}

Now we need to compute:

move(1-4-9-14, a-h) = {5,15}
ε-closure({5,15}) = {5,6,8,15}

move(1-4-9-14, i) = {2,5,15}
ε-closure({2,5,15}) = {2,5,6,8,15}

move(1-4-9-14, j-z) = {5,15}
ε-closure({5,15}) = {5,6,8,15}

move(1-4-9-14, 0-9) = {10,15}
ε-closure({10,15}) = {10,11,13,15}

move(1-4-9-14, other) = {15}
ε-closure({15}) = {15}

The DFA now has these transitions out of 1-4-9-14:

a-h      → 5-6-8-15
i        → 2-5-6-8-15
j-z      → 5-6-8-15
0-9      → 10-11-13-15
other    → 15

The analysis for 1-4-9-14 is complete. We mark it and pick another state in the DFA to analyse.

(Appel, pp. 27)

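The corresponding DFA

[Figure: transition diagram of the DFA, summarized in the table below]

STATE          TOKEN    TRANSITIONS
1-4-9-14                a-h → 5-6-8-15; i → 2-5-6-8-15; j-z → 5-6-8-15; 0-9 → 10-11-13-15; other → 15
2-5-6-8-15     ID       f → 3-6-7-8; a-e, g-z, 0-9 → 6-7-8
3-6-7-8        IF       a-z, 0-9 → 6-7-8
5-6-8-15       ID       a-z, 0-9 → 6-7-8
6-7-8          ID       a-z, 0-9 → 6-7-8
10-11-13-15    NUM      0-9 → 11-12-13
11-12-13       NUM      0-9 → 11-12-13
15             error

(Appel, pp. 29)

See pp. 118 of Aho-Sethi-Ullman and pp. 29 of Appel.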
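
The whole conversion can be sketched as a worklist loop in Python. The NFA encoding, the sample alphabet (lowercase letters, digits, and a few "other" characters), and all names below are assumptions made for illustration; the loop itself is the usual subset construction, and it does not minimize the resulting DFA.

```python
import string

EPS = None  # label used for ε-edges
LETTERS, DIGITS = set(string.ascii_lowercase), set(string.digits)
ALPHABET = LETTERS | DIGITS | set("+-*/ ")  # assumed sample alphabet

# Assumed encoding of the 15-state NFA: (source, label, target).
EDGES = [
    (1, EPS, 4), (1, EPS, 9), (1, EPS, 14),
    (1, {"i"}, 2), (2, {"f"}, 3),
    (4, LETTERS, 5), (5, EPS, 8), (8, EPS, 6),
    (6, LETTERS | DIGITS, 7), (7, EPS, 8),
    (9, DIGITS, 10), (10, EPS, 13), (13, EPS, 11),
    (11, DIGITS, 12), (12, EPS, 13),
    (14, ALPHABET, 15),
]


def eps_closure(states):
    closure, work = set(states), list(states)
    while work:
        state = work.pop()
        for source, label, target in EDGES:
            if source == state and label is EPS and target not in closure:
                closure.add(target)
                work.append(target)
    return closure


def move(states, ch):
    return {target for source, label, target in EDGES
            if source in states and label is not EPS and ch in label}


def subset_construction(start=1):
    """Return the DFA states (frozensets of NFA states) and the transitions."""
    start_state = frozenset(eps_closure({start}))
    dstates, work, dtran = {start_state}, [start_state], {}
    while work:                       # unmarked DFA states
        current = work.pop()
        for ch in sorted(ALPHABET):
            target = frozenset(eps_closure(move(current, ch)))
            if not target:
                continue              # no transition on ch
            dtran[(current, ch)] = target
            if target not in dstates:
                dstates.add(target)
                work.append(target)
    return dstates, dtran


def name(state):
    return "-".join(str(s) for s in sorted(state))


dstates, dtran = subset_construction()
print(sorted(name(s) for s in dstates))
```

Under these assumptions the loop discovers eight DFA states, the same ones that appear in the table above.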

Lexical Analyzer and Parser

[Figure: the lexical analyzer reads the source program one character at a time (get next char / next char); the syntax analyzer requests tokens from the lexical analyzer (get next token / next token); both use the symbol table, which contains a record for each identifier]

token: smallest meaningful sequence of characters of interest in source program

(Aho,Sethi,Ullman, pp. 160)

Definition of Context-Free Grammars

A context-free grammar G = (T, N, S, P) consists of:

1. T, a set of terminals (scanner tokens).
2. N, a set of nonterminals (syntactic variables generated by productions).
3. S, a designated start nonterminal.
4. P, a set of productions. Each production has the form A ::= α, where A is a nonterminal and α is a sentential form, i.e., a string of zero or more grammar symbols (terminals/nonterminals).

Syntax Analysis

Syntax Analysis Problem Statement: To find a derivation sequence in a grammar G for the input token stream (or say that none exists).

Parse trees

A parse tree is a graphical representation of a derivation sequence of a sentential form.

Tree nodes represent symbols of the grammar (nonterminals or terminals) and tree edges represent derivation steps.

Derivation

Given the following grammar:

E → E + E | E * E | ( E ) | - E | id

Is the string -(id + id) a sentence in this grammar?

Yes, because there is the following derivation:

E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)

where ⇒ reads "derives in one step".

(Aho,Sethi,Ullman, pp. 168)

Derivation

E → E + E | E * E | ( E ) | - E | id

Let's examine this derivation:

E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)

[Figure: the parse tree growing step by step, from the root E down to the leaves - ( id + id )]

This is a top-down derivation because we start building the parse tree at the top.

(Aho,Sethi,Ullman, pp. 170)

Another Derivation Example

E → E + E | E * E | ( E ) | - E | id

Find a derivation for the expression: id + id * id

[Figure: two parse trees for id + id * id built step by step, one with + at the root and one with * at the root]

Which derivation tree is correct?

(Aho,Sethi,Ullman, pp. 171)

Another Derivation Example

E → E + E | E * E | ( E ) | - E | id

Find a derivation for the expression: id + id * id

[Figure: the two parse trees for id + id * id, one with + at the root and one with * at the root]

According to the grammar, both are correct.

A grammar that produces more than one parse tree for some input sentence is said to be an ambiguous grammar.

(Aho,Sethi,Ullman, pp. 171)

Left Recursion

Consider the grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

A top-down parser might loop forever when parsing an expression using this grammar.

[Figure: a parse tree in which the leftmost E is expanded again and again into E + T]

(Aho,Sethi,Ullman, pp. 176)

Left Recursion

Consider the grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

A grammar that has at least one production of the form A → Aα is a left-recursive grammar.

Top-down parsers do not work with left-recursive grammars.

Left recursion can often be eliminated by rewriting the grammar.

(Aho,Sethi,Ullman, pp. 176)

Left Recursion

This left-recursive grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

can be rewritten to eliminate the immediate left recursion:

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id

(Aho,Sethi,Ullman, pp. 176)
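
The rewriting can be mechanized for immediate left recursion. The Python sketch below assumes a grammar stored as a dictionary from nonterminal to a list of alternatives, with the empty list standing for ε; the function name and this representation are illustrative choices, and indirect left recursion is not handled.

```python
def eliminate_immediate_left_recursion(grammar):
    """Rewrite A -> A a1 | ... | b1 | ...  as
    A -> b1 A' | ...   and   A' -> a1 A' | ... | ε  (ε is the empty list)."""
    result = {}
    for head, alternatives in grammar.items():
        recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
        others = [alt for alt in alternatives if not alt or alt[0] != head]
        if not recursive:
            result[head] = alternatives      # nothing to rewrite
            continue
        new_head = head + "'"
        result[head] = [alt + [new_head] for alt in others]
        result[new_head] = [alt + [new_head] for alt in recursive] + [[]]
    return result


GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}

for head, alternatives in eliminate_immediate_left_recursion(GRAMMAR).items():
    print(head, "→", " | ".join(" ".join(alt) or "ε" for alt in alternatives))
```

For the expression grammar this prints the same five rules as the rewritten grammar above, with `E'` and `T'` as the new nonterminals.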

Predictive Parsing

Consider the grammar:

stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end

A parser for this grammar can be written with the following simple structure:

switch (gettoken()) {
    case if:    ….  break;
    case while: ….  break;
    case begin: ….  break;
    default:    reject input;
}

Based only on the first token, the parser knows which rule to use to derive a statement.

Therefore this is called a predictive parser.

(Aho,Sethi,Ullman, pp. 183)

Left Factoring

The following grammar:

stmt → if expr then stmt else stmt
     | if expr then stmt

cannot be parsed by a predictive parser that looks one element ahead.

But the grammar can be rewritten:

stmt → if expr then stmt stmt’
stmt’ → else stmt | ε

where ε is the empty string.

Rewriting a grammar to eliminate multiple productions starting with the same token is called left factoring.

(Aho,Sethi,Ullman, pp. 178)

A Predictive Parser

Grammar:

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id

Parsing Table:

              INPUT SYMBOL
NONTERMINAL   id          +             *             (           )          $
E             E → TE’                                 E → TE’
E’                        E’ → +TE’                               E’ → ε     E’ → ε
T             T → FT’                                 T → FT’
T’                        T’ → ε        T’ → *FT’                 T’ → ε     T’ → ε
F             F → id                                  F → (E)

(Aho,Sethi,Ullman, pp. 188)

A Predictive Parser

[Figure: the predictive parsing program reads the INPUT buffer, keeps a STACK of grammar symbols on top of the end marker $, consults the PARSING TABLE for this grammar, and builds the parse tree as OUTPUT]

INPUT: id + id * id $

The stack initially holds the start symbol E on top of $. The parser then proceeds as follows (top of the stack on the left):

STACK           ACTION
E $             apply E → TE’
T E’ $          apply T → FT’
F T’ E’ $       apply F → id
id T’ E’ $      Top(Stack) = input: pop stack, advance input
T’ E’ $         apply T’ → ε
E’ $

Action when Top(Stack) = input ≠ $: pop stack, advance input.

The predictive parser proceeds in this fashion, emitting the following productions:

E’ → +TE’
T → FT’
F → id
T’ → *FT’
F → id
T’ → ε
E’ → ε

When Top(Stack) = input = $, the parser halts and accepts the input string.

(Aho,Sethi,Ullman, pp. 186, 188)

LL(k) Parser

This parser parses from left to right and produces a leftmost derivation. It looks 1 symbol ahead to choose its next action. Therefore, it is known as an LL(1) parser.

An LL(k) parser looks k symbols ahead to decide its action.
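
A table-driven LL(1) parser for the expression grammar can be sketched in Python as follows. The dictionary encoding of the parsing table, the use of `E'` and `T'` for the primed nonterminals, and the function name `parse` are assumptions made for illustration.

```python
# LL(1) parsing table: (nonterminal, lookahead) -> production body.
# An empty list stands for ε; a missing entry is a syntax error.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse(tokens):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = tokens + ["$"]
    stack, pos, output = ["$", "E"], 0, []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == look == "$":
            return output                      # halt and accept
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"unexpected {look!r} while expanding {top}")
            stack.pop()
            stack.extend(reversed(body))       # leftmost symbol ends on top
            output.append(f"{top} -> {' '.join(body) or 'ε'}")
        elif top == look:
            stack.pop()                        # match a terminal
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")


for production in parse(["id", "+", "id", "*", "id"]):
    print(production)
```

The single token `look` is the only lookahead the loop ever consults, which is what makes this an LL(1) parser.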

The Parsing Table

Given this grammar:

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id

Parsing Table:

              INPUT SYMBOL
NONTERMINAL   id          +             *             (           )          $
E             E → TE’                                 E → TE’
E’                        E’ → +TE’                               E’ → ε     E’ → ε
T             T → FT’                                 T → FT’
T’                        T’ → ε        T’ → *FT’                 T’ → ε     T’ → ε
F             F → id                                  F → (E)

How is this parsing table built?

FIRST and FOLLOW

We need to build a FIRST set and a FOLLOW set for each symbol in the grammar.

The elements of FIRST and FOLLOW are terminal symbols.

FIRST(α) is the set of terminal symbols that can begin any string derived from α.

FOLLOW(α) is the set of terminal symbols that can follow α:

t ∈ FOLLOW(α) ⇔ ∃ derivation containing αt

(Aho,Sethi,Ullman, pp. 189)

Rules to Create FIRST

GRAMMAR:

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id

FIRST rules:

1. If X is a terminal, FIRST(X) = {X}.
2. If X → ε, then ε ∈ FIRST(X).
3. If X → Y1Y2 ••• Yk and Y1 ••• Yi-1 ⇒* ε and a ∈ FIRST(Yi), then a ∈ FIRST(X).

SETS:

FIRST(id) = {id}
FIRST(*) = {*}
FIRST(+) = {+}
FIRST(() = {(}
FIRST()) = {)}

FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}

FIRST(F) = {(, id}
FIRST(T) = FIRST(F) = {(, id}
FIRST(E) = FIRST(T) = {(, id}

(Aho,Sethi,Ullman, pp. 189)

Rules to Create FOLLOW

GRAMMAR:

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id

FIRST SETS:

FIRST(F) = {(, id}
FIRST(T) = {(, id}
FIRST(E) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}

FOLLOW rules (A and B are nonterminals, α and β are strings of grammar symbols):

1. If S is the start symbol, then $ ∈ FOLLOW(S).
2. If A → αBβ, and a ∈ FIRST(β) and a ≠ ε, then a ∈ FOLLOW(B).
3. If A → αB and a ∈ FOLLOW(A), then a ∈ FOLLOW(B).
3a. If A → αBβ and β ⇒* ε and a ∈ FOLLOW(A), then a ∈ FOLLOW(B).

SETS:

FOLLOW(E) = {), $}
FOLLOW(E’) = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}

(Aho,Sethi,Ullman, pp. 189)
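
The FIRST and FOLLOW rules can be applied mechanically by iterating until no set changes. The Python sketch below is one such fixed-point computation for the expression grammar; the grammar encoding (empty list for ε, `E'` and `T'` for the primed nonterminals) and the function names are assumptions made for illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is the ε-production
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
START = "E"


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                    # every symbol can derive ε
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for head, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")                                # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    beta_first = first_of_string(alt[i + 1:], first)
                    new = beta_first - {EPS}              # rule 2
                    if EPS in beta_first:
                        new |= follow[head]               # rules 3 and 3a
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```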

Rules to Build Parsing Table

GRAMMAR:

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id

FIRST SETS:

FIRST(F) = {(, id}
FIRST(T) = {(, id}
FIRST(E) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}

FOLLOW SETS:

FOLLOW(E) = {), $}
FOLLOW(E’) = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}

Rules:

1. If A → α: if a ∈ FIRST(α), add A → α to M[A, a].
2. If A → α: if ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b ∈ FOLLOW(A).
3. If A → α: if ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $].

PARSING TABLE:

              INPUT SYMBOL
NONTERMINAL   id          +             *             (           )          $
E             E → TE’                                 E → TE’
E’                        E’ → +TE’                               E’ → ε     E’ → ε
T             T → FT’                                 T → FT’
T’                        T’ → ε        T’ → *FT’                 T’ → ε     T’ → ε
F             F → id                                  F → (E)

(Aho,Sethi,Ullman, pp. 190)
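
The three rules translate almost directly into code. The Python sketch below takes the FIRST and FOLLOW sets for this grammar as given and fills the table M; the data layout and the names `PRODUCTIONS`, `first_of_body`, and `build_table` are assumptions made for illustration. Because $ is stored as a member of the FOLLOW sets, rules 2 and 3 collapse into one step here.

```python
EPS = "ε"
PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),      # [] is the ε-production
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS}, "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_body(body):
    """FIRST(α) for a production body α; a terminal is its own FIRST set."""
    result = set()
    for sym in body:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def build_table():
    table = {}
    for head, body in PRODUCTIONS:
        body_first = first_of_body(body)
        targets = body_first - {EPS}           # rule 1
        if EPS in body_first:
            targets |= FOLLOW[head]            # rules 2 and 3
        for terminal in targets:
            if (head, terminal) in table:
                raise ValueError(f"conflict at M[{head}, {terminal}]")
            table[(head, terminal)] = body
    return table


M = build_table()
for (head, terminal), body in sorted(M.items()):
    print(f"M[{head}, {terminal}] = {head} → {' '.join(body) or EPS}")
```

A conflict, meaning two productions landing in the same cell, would indicate that the grammar is not LL(1); the sketch raises an error in that case.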

Bottom-Up and Top-Down Parsers

Top-down parsers: start constructing the parse tree at the top (root) of the tree and move down towards the leaves. Easy to implement by hand, but work only with restricted grammars.
Example: predictive parsers.

Bottom-up parsers: build the nodes on the bottom of the parse tree first. Suitable for automatic parser generation; handle a larger class of grammars.
Examples: shift-reduce parsers (or LR(k) parsers).

(Aho,Sethi,Ullman, pp. 195)

Bottom-Up Parser

A bottom-up parser, or a shift-reduce parser, begins at the leaves and works up to the top of the tree.

The reduction steps trace a rightmost derivation in reverse.

Consider the grammar:

S → aABe
A → Abc | b
B → d

We want to parse the input string abbcde.

(Aho,Sethi,Ullman, pp. 195)

Bottom-Up Parser Example

Production
S → aABe
A → Abc | b
B → d

The bottom-up parsing program rewrites the INPUT one reduction at a time. The OUTPUT is the parse tree, which grows from the leaves toward the root:

INPUT: a b b c d e $
INPUT: a A b c d e $    (reduce b to A, using A → b)
INPUT: a A d e $        (reduce A b c to A, using A → Abc)
INPUT: a A B e $        (reduce d to B, using B → d)
INPUT: S $              (reduce a A B e to S, using S → aABe)

In a A b c d e, the remaining b also matches A → b. We are not reducing there in this example. A parser would reduce, get stuck and then backtrack!

This parser is known as an LR Parser because it scans the input from Left to right, and it constructs a Rightmost derivation in reverse order.

(Aho,Sethi,Ullman, pp. 195)
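The reduce-and-backtrack search described above can be sketched in a few lines of Python. This is only an illustration for the grammar S → aABe, A → Abc | b, B → d. The function name `reduce_to_start` and the string encoding of sentential forms are assumptions made for the sketch, not part of the course material.

```python
# Grammar from the slide: S -> aABe, A -> Abc | b, B -> d
PRODUCTIONS = [
    ("S", "aABe"),
    ("A", "Abc"),
    ("A", "b"),
    ("B", "d"),
]


def reduce_to_start(form, start="S", trail=None):
    """Return the sentential forms from `form` up to `start`, or None."""
    trail = (trail or []) + [form]
    if form == start:
        return trail
    # Try every production at every position, leftmost position first.
    for pos in range(len(form)):
        for lhs, rhs in PRODUCTIONS:
            if form.startswith(rhs, pos):
                reduced = form[:pos] + lhs + form[pos + len(rhs):]
                result = reduce_to_start(reduced, start, trail)
                if result is not None:
                    return result  # this reduction eventually reached S
                # Dead end: backtrack and try the next candidate.
    return None


steps = reduce_to_start("abbcde")
print(" => ".join(steps))
# prints: abbcde => aAbcde => aAde => aABe => S
```

Every call rescans all productions at all positions, which is the inefficiency that the table-driven LR parser avoids.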


Bottom-Up Parser Example

Scanning the productions for matches with handles in the input string, together with backtracking, makes the method used in the previous example very inefficient.

Can we do better?


LR Parser Example

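[Figure: model of an LR parser. The LR parsing program reads the Input, keeps a Stack, consults the action and goto tables, and produces the Output.]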

(Aho,Sethi,Ullman, pp. 217)


LR Parser Example

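The following grammar:

(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id

can be parsed with this action and goto table (si = shift and go to state i, rj = reduce by production j, acc = accept, - = no entry):

        action                              goto
State   id    +     *     (     )     $     E    T    F
0       s5    -     -     s4    -     -     1    2    3
1       -     s6    -     -     -     acc   -    -    -
2       -     r2    s7    -     r2    r2    -    -    -
3       -     r4    r4    -     r4    r4    -    -    -
4       s5    -     -     s4    -     -     8    2    3
5       -     r6    r6    -     r6    r6    -    -    -
6       s5    -     -     s4    -     -     -    9    3
7       s5    -     -     s4    -     -     -    -    10
8       -     s6    -     -     s11   -     -    -    -
9       -     r1    s7    -     r1    r1    -    -    -
10      -     r3    r3    -     r3    r3    -    -    -
11      -     r5    r5    -     r5    r5    -    -    -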

(Aho,Sethi,Ullman, pp. 219)


LR Parser Example

INPUT: id * id + id $

The LR parsing program runs with the GRAMMAR and the action and goto table shown above. The STACK is written bottom first and alternates states with grammar symbols. For a reduction, the parser pops the right-hand side of the production, exposing the state below it, and then pushes the left-hand side together with the goto state.

Step  STACK                INPUT             Action
1     0                    id * id + id $    shift 5
2     0 id 5               * id + id $       reduce by (6) F → id
3     0 F 3                * id + id $       reduce by (4) T → F
4     0 T 2                * id + id $       shift 7
5     0 T 2 * 7            id + id $         shift 5
6     0 T 2 * 7 id 5       + id $            reduce by (6) F → id
7     0 T 2 * 7 F 10       + id $            reduce by (3) T → T * F
8     0 T 2                + id $            reduce by (2) E → T
9     0 E 1                + id $            shift 6
10    0 E 1 + 6            id $              shift 5
11    0 E 1 + 6 id 5       $                 reduce by (6) F → id
12    0 E 1 + 6 F 3        $                 reduce by (4) T → F
13    0 E 1 + 6 T 9        $                 reduce by (1) E → E + T
14    0 E 1                $                 accept

OUTPUT: the parse tree, built bottom-up with one node per reduction (children indented under their parent):

E
  E
    T
      T
        F
          id
      *
      F
        id
  +
  T
    F
      id

(Aho,Sethi,Ullman, pp. 220)
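The table-driven loop that the LR parsing program executes can be sketched in Python as follows. The dictionaries encode the action and goto table and the six numbered productions of this example. The names `ACTION`, `GOTO`, `PRODUCTIONS` and `lr_parse` are assumptions made for this sketch, and the stack holds states only because the grammar symbols are implied by them.

```python
# Productions numbered as on the slide: (left-hand side, length of right-hand side)
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}

GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the production numbers used, in the order they are reduced."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of states; state 0 is the start state
    pos = 0
    output = []
    while True:
        state, tok = stack[-1], tokens[pos]
        act = ACTION[state].get(tok)
        if act is None:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if act == "acc":
            return output
        if act[0] == "s":
            # Shift: push the new state and consume the token.
            stack.append(int(act[1:]))
            pos += 1
        else:
            # Reduce: pop one state per right-hand-side symbol, then goto.
            prod = int(act[1:])
            lhs, length = PRODUCTIONS[prod]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            output.append(prod)


print(lr_parse(["id", "*", "id", "+", "id"]))
# prints: [6, 4, 6, 3, 2, 6, 4, 1]
```

The printed sequence lists the reductions in the order the parser performs them. Read backwards, it is a rightmost derivation of id * id + id.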


Constructing Parsing Tables

All LR parsers use the same parsing program that we demonstrated in the previous slides. What differentiates the LR parsers are the action and the goto tables:

Simple LR (SLR): succeeds for the fewest grammars, but is the easiest to implement. (See AhoSethiUllman pp. 221-230).

Canonical LR: succeeds for the most grammars, but is the hardest to implement. It splits states when necessary to prevent reductions that would get the parser stuck. (See AhoSethiUllman pp. 230-236).

Lookahead LR (LALR): succeeds for most common syntactic constructions used in programming languages, but produces LR tables much smaller than canonical LR. (See AhoSethiUllman pp. 236-247).

(Aho,Sethi,Ullman, pp. 221)


Using Lex

Lex source program lex.l → Lex compiler → lex.yy.c

lex.yy.c → C compiler → a.out

Input stream → a.out → sequence of tokens

(Aho-Sethi-Ullman, pp. 258)
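To make the last stage concrete, here is a rough Python sketch of what the generated scanner does: it turns an input stream into a sequence of tokens by matching regular expressions. The token names and patterns in `TOKEN_SPEC` are assumptions chosen for illustration and are not taken from any Lex specification in the course. Note also that Python's `re` alternation picks the first alternative that matches, whereas Lex picks the longest match.

```python
import re

# Hypothetical token classes, listed in priority order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token class, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("position := initial + rate * 60"))
```

Each named group plays the role of one pattern-action rule in a Lex source program, with the action reduced to "return this token class".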


Parsing Action Conflicts

If the grammar specified is ambiguous, yacc will report parsing action conflicts.

These conflicts can be reduce/reduce conflicts or shift/reduce conflicts.

Yacc has rules to resolve such conflicts automatically (see AhoSethiUllman, pp. 262-264), but the resulting parser might not have the behavior intended by the grammar writer.

Whenever you see a conflict report, rerun yacc with the -v flag, examine the y.output file, and rewrite your grammar to eliminate the conflicts.

(Aho-Sethi-Ullman, pp. 262)


Three-Address Statements

A popular form of intermediate code used in optimizing compilers is three-address statements (or variations, such as quadruples).

Source statement:

x = a + b * c + d

Three-address statements with temporaries t1 and t2:

t1 = b * c
t2 = a + t1
x = t2 + d

(Aho-Sethi-Ullman, pp. 466)
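The translation of x = a + b * c + d into three-address statements can be sketched with a post-order walk over the expression tree. The sketch below borrows Python's own `ast` module as a stand-in parser, which is an assumption made for brevity. The function name `three_address` is hypothetical, and only names, constants and the four binary operators are handled.

```python
import ast

# Map AST operator node types to their source symbols.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address(statement):
    """Translate 'x = <expr>' into a list of three-address statements."""
    assign = ast.parse(statement).body[0]
    target = assign.targets[0].id
    code = []
    counter = 0

    def operand(node):
        """Emit code for `node` and return the name that holds its value."""
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        # Binary operation: evaluate both operands, then a fresh temporary.
        left, right = operand(node.left), operand(node.right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
        return temp

    value = assign.value
    if isinstance(value, ast.BinOp):
        # The outermost operation writes straight into the target variable.
        left, right = operand(value.left), operand(value.right)
        code.append(f"{target} = {left} {OPS[type(value.op)]} {right}")
    else:
        code.append(f"{target} = {operand(value)}")
    return code


for line in three_address("x = a + b * c + d"):
    print(line)
# prints:
# t1 = b * c
# t2 = a + t1
# x = t2 + d
```

Each emitted statement has at most one operator and three addresses, which is what makes the form convenient for later optimization passes.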


Intermediate Code Generation

Reading List:

Aho-Sethi-Ullman: Chapter 8.1 ~ 8.3, Chapter 8.7


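Front End of a Compiler

[Figure: the Front End consists of the Lexical Analyzer (Scanner) + Syntax Analyzer (Parser) + Semantic Analyzer, which produce an Abstract Syntax Tree with attributes, followed by the Intermediate-code Generator, which produces Non-optimized Intermediate Code. The front end also reports Error Messages.]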


Component-Based Approach to Building Compilers

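Source program in Language-1 → Language-1 Front End → Non-optimized Intermediate Code

Source program in Language-2 → Language-2 Front End → Non-optimized Intermediate Code

Non-optimized Intermediate Code → Intermediate-code Optimizer → Optimized Intermediate Code

Optimized Intermediate Code → Target-1 Code Generator → Target-1 machine code

Optimized Intermediate Code → Target-2 Code Generator → Target-2 machine code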


Advantages of Using an Intermediate Language

1. Retargeting - Build a compiler for a new machine by attaching a new code generator to an existing front-end.

2. Optimization - reuse intermediate code optimizers in compilers for different languages and different machines.

Note: the terms “intermediate code”, “intermediate language”, and “intermediate representation” are all used interchangeably.

The Phases of a Compiler

position := initial + rate * 60

lexical analyzer

id1 := id2 + id3 * 60

syntax analyzer (tree shown with children indented under their parent)

:=
  id1
  +
    id2
    *
      id3
      60

semantic analyzer

:=
  id1
  +
    id2
    *
      id3
      inttoreal
        60

intermediate code generator

temp1 := inttoreal (60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3

code optimizer

temp1 := id3 * 60.0
id1 := id2 + temp1

code generator

MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
