CH4.1 CSE244 Sections 4.1-4.4: Syntax Analysis Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road

CH4.1

CSE244

Sections 4.1-4.4: Syntax AnalysisSections 4.1-4.4: Syntax Analysis

Aggelos KiayiasComputer Science & Engineering Department

The University of Connecticut371 Fairfield Road

Storrs, CT [email protected]

http://www.cse.uconn.edu/~akiayias

CH4.2

CSE244

Syntax Analysis - ParsingSyntax Analysis - Parsing

An overview of parsing : An overview of parsing : Functions & ResponsibilitiesFunctions & Responsibilities

Context Free GrammarsContext Free Grammars Concepts & TerminologyConcepts & Terminology

Writing and Designing GrammarsWriting and Designing Grammars Resolving Grammar Problems / DifficultiesResolving Grammar Problems / Difficulties Top-Down Parsing Top-Down Parsing

Recursive Descent & Predictive LLRecursive Descent & Predictive LL Bottom-Up ParsingBottom-Up Parsing

LR & LALRLR & LALR Concluding Remarks/Looking AheadConcluding Remarks/Looking Ahead

CH4.3

CSE244

An Overview of ParsingAn Overview of Parsing

Why are Grammars to formally describe Languages Important ?

1. Precise, easy-to-understand representations

2. Compiler-writing tools can take grammar and generate a compiler

3. allow language to be evolved (new statements, changes to statements, etc.) Languages are not static, but are constantly upgraded to add new features or fix “old” ones

ADA ADA9x, C++ Adds: Templates, exceptions,

How do grammars relate to parsing process ?

CH4.4

CSE244

Parsing During CompilationParsing During Compilation

interm repres

errors

lexical analyzer

parserrest of

front end

symbol table

source program

parse treeget next

token

token

regular expressions

• also technically part or parsing

• includes augmenting info on tokens in source, type checking, semantic analysis

• uses a grammar to check structure of tokens• produces a parse tree• syntactic errors and recovery• recognize correct syntax• report errors

CH4.5

CSE244

Parsing ResponsibilitiesParsing Responsibilities

Syntax Error Identification / Handling

Recall typical error types:

Lexical : Misspellings

Syntactic : Omission, wrong order of tokens

Semantic : Incompatible types

Logical : Infinite loop / recursive call

Majority of error processing occurs during syntax analysis

NOTE: Not all errors are identifiable !! Which ones?

CH4.6

CSE244

Key Issues – Error ProcessingKey Issues – Error Processing

• Detecting errors

• Finding position at which they occur

• Clear / accurate presentation

• Recover (pass over) to continue and find later

errors

• Don’t impact compilation of “correct”

programs

CH4.7

CSE244

What are some Typical Errors ?What are some Typical Errors ?

#include<stdio.h>

int f1(int v)

{ int i,j=0;

for (i=1;i<5;i++)

{ j=v+f2(i) }

return j; }

int f2(int u)

{ int j;

j=u+f1(u*u)

return j; }

int main()

{ int i,j=0;

for (i=1;i<10;i++)

{ j=j+i*I; printf(“%d\n”,i);

printf("%d\n",f1(j));

return 0;

}

Which are “easy” to recover from? Which are “hard” ?

As reported by MS VC++

'f2' undefined; assuming extern returning intsyntax error : missing ';' before ‘}‘syntax error : missing ';' before ‘return‘fatal error : unexpected end of file found

CH4.8

CSE244

Error Recovery StrategiesError Recovery Strategies

Panic Mode– Discard tokens until a “synchro” token is found ( end, “;”, “}”, etc. ) -- Decision of designer -- Problems: skip input miss declaration – causing more errors miss errors in skipped material -- Advantages: simple suited to 1 error per statementPhrase Level – Local correction on input -- “,” ”;” – Delete “,” – insert “;” -- Also decision of designer -- Not suited to all situations -- Used in conjunction with panic mode to allow less input to be skipped

CH4.9

CSE244

Error Recovery Strategies – (2)Error Recovery Strategies – (2)

Error Productions: -- Augment grammar with rules -- Augment grammar used for parser construction / generation -- example: add a rule for := in C assignment statements Report error but continue compile -- Self correction + diagnostic messages Global Correction: -- Adding / deleting / replacing symbols is chancy – may do many changes ! -- Algorithms available to minimize changes costly - key issues

CH4.10

CSE244

Motivating GrammarsMotivating Grammars

• Regular Expressions

Basis of lexical analysis

Represent regular languages

• Context Free Grammars

Basis of parsing

Represent language constructs

Characterize context free languages

Reg. Lang. CFLs

EXAMPLE: anbn , n 1 : Is it regular ?

CH4.11

CSE244

Context Free Grammars : Context Free Grammars : Concepts & TerminologyConcepts & Terminology

Definition: A Context Free Grammar, CFG, is described by T, NT, S, PR, where:

T: Terminals / tokens of the language

NT: Non-terminals to denote sets of strings generated by the grammar & in the language

S: Start symbol, SNT, which defines all strings of the language

PR: Production rules to indicate how T and NT are combined to generate valid strings of the language.

PR: NT (T | NT)*

Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model

CH4.12

CSE244

Context Free Grammars : A First LookContext Free Grammars : A First Look

assign_stmt id := expr ;

expr expr operator term

expr term

term id

term real

term integer

operator +

operator -

What do “blue” symbols represent?

Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-term into a sequence of terminals / tokens.

Simply stated: Grammars / production rules allow us to “rewrite” and “identify” correct syntax.

CH4.13

CSE244

DerivationDerivation

Let’s derive: id := id + real – integer ;

assign_stmt assign_stmt id := expr ;

id := expr ; expr expr operator term

id := expr operator term; expr expr operator term

id := expr operator term operator term; expr term

id := term operator term operator term; term id

id := id operator term operator term; operator +

id := id + term operator term; term real

id := id + real operator term; operator -

id := id + real - term; term integer

id := id + real - integer;

using production:

CH4.14

CSE244

Example GrammarExample Grammar

expr expr op expr

expr ( expr )

expr - expr

expr id

op +

op -

op *

op /

op

Black : NT

Blue : T

expr : S

9 Production rules

To simplify / standardize notation, we offer a synopsis of terminology.

CH4.15

CSE244

Example Grammar - TerminologyExample Grammar - Terminology

Terminals: a,b,c,+,-,punc,0,1,…,9, blue strings

Non Terminals: A,B,C,S, black strings

T or NT: X,Y,Z

Strings of Terminals: u,v,…,z in T*

Strings of T / NT: , in ( T NT)*

Alternatives of production rules:

A 1; A 2; …; A k; A 1 | 2 | … | 1

First NT on LHS of 1st production rule is designated as start symbol !

E E A E | ( E ) | -E | id

A + | - | * | / |

CH4.16

CSE244

Grammar ConceptsGrammar Concepts

A step in a derivation is zero or one action that replaces a NT with the RHS of a production rule.

EXAMPLE: E -E (the means “derives” in one step) using the production rule: E -E

EXAMPLE: E E A E E * E E * ( E )

DEFINITION: derives in one step

derives in one step

derives in zero steps

+

*

EXAMPLES: A if A is a production rule

1 2 … n 1 n ; for all

If and then

* *

**

CH4.17

CSE244

How does this relate to Languages?How does this relate to Languages?

Let G be a CFG with start symbol S. Then S W (where W has no non-terminals) represents the language generated by G, denoted L(G). So WL(G) S W.

+

+

W : is a sentence of G

When S (and may have NTs) it is called a sentential form of G.

*

EXAMPLE: id * id is a sentence

Here’s the derivation:

E E A E E * E id * E id * id

E id * idSentential forms

CH4.18

CSE244

lm

Other Derivation ConceptsOther Derivation Concepts

Leftmost: Replace the leftmost non-terminal symbol

E E A E id A E id * E id * id

Rightmost: Replace the rightmost non-terminal symbol

E E A E E A id E * id id * idrm

lmlmlmlm

rmrmrm

Important Notes: A

If A , what’s true about ?

If A , what’s true about ?rm

Derivations: Actions to parse input can be represented pictorially in a parse tree.

CH4.19

CSE244

Examples of LM / RM DerivationsExamples of LM / RM Derivations

E E A E | ( E ) | -E | id

A + | - | * | / |

A leftmost derivation of : id + id * id

A rightmost derivation of : id + id * id

CH4.20

CSE244

Derivations & Parse TreeDerivations & Parse Tree

E

EE A

E

EE A

*

E

EE A

id *

E

EE A

id id*

E * E

E E A E

id * E

id * id

CH4.21

CSE244

Parse Trees and DerivationsParse Trees and Derivations

Consider the expression grammar:

E E+E | E*E | (E) | -E | id

Leftmost derivations of id + id * id

E E + E E + E id + E

E

EE +

id

E

EE *

id + E id + E * E

E

EE +

id

E

EE +

CH4.22

CSE244

Parse Tree & Derivations - continuedParse Tree & Derivations - continued

E

EE *

id + E * E id + id * E

E

EE +

id

id

id + id * E id + id * id E

EE *

E

EE +

id

idid

CH4.23

CSE244

Alternative Parse Tree & DerivationAlternative Parse Tree & Derivation

E E * E

E + E * E

id + E * E

id + id * E

id + id * id

E

E E+

E

E E*

id

id id

WHAT’S THE ISSUE HERE ?

Two distinct leftmost derivations!

CH4.24

CSE244

Resolving Grammar Problems/DifficultiesResolving Grammar Problems/Difficulties

Regular Expressions : Basis of Lexical Analysis

Reg. Expr. generate/represent regular languages

Reg. Languages smallest, most well defined class

of languages

Context Free Grammars: Basis of Parsing

CFGs represent context free languages

CFLs contain more powerful languages

Reg. Lang. CFLsanbn – CFL that’s not regular.

CH4.25

CSE244

Resolving Problems/Difficulties – (2)Resolving Problems/Difficulties – (2)

Since Reg. Lang. Context Free Lang., it is possible to go from reg. expr. to CFGs via NFA.

start0 3b21 ba

a

b

Recall: (a | b)*abb

CH4.26

CSE244

Resolving Problems/Difficulties – (3)Resolving Problems/Difficulties – (3)

Construct CFG as follows:

1. Each State I has non-terminal Ai : A0, A1, A2, A3

2. If then Ai a Aj

3. If then Ai bAj

4. If I is an accepting state, Ai : A3

5. If I is a starting state, Ai is the start symbol : A0

i ja

i jb

: A0 aA0, A0 aA1

: A0 bA0, A1 bA2

: A2 bA3

T={a,b}, NT={A0, A1, A2, A3}, S = A0

PR ={ A0 aA0 | aA1 | bA0 ; A1 bA2 ; A2 bA3 ; A3 }

start 0 3b21 ba

a

b

CH4.27

CSE244

How Does This CFG Derive Strings ?How Does This CFG Derive Strings ?

start0 3b21 ba

a

b

vs. A0 aA0, A0 aA1

A0 bA0, A1 bA2

A2 bA3, A3

How is abaabb derived in each ?

CH4.28

CSE244

Regular Expressions vs. CFGsRegular Expressions vs. CFGs

Regular expressions for lexical syntax

1. CFGs are overkill, lexical rules are quite simple and straightforward

2. REs – concise / easy to understand

3. More efficient lexical analyzer can be constructed

4. RE for lexical analysis and CFGs for parsing promotes modularity, low coupling & high cohesion.

CFGs : Match tokens “(“ “)”, begin / end, if-then-else, whiles, proc/func calls, …

Intended for structural associations between tokens !

Are tokens in correct order ?

CH4.29

CSE244

Resolving Grammar Difficulties : Resolving Grammar Difficulties : MotivationMotivation

The structure of a grammar affects the compiler designrecall “syntax-directed” translation

• Different parsing approaches have different needs

Top-Down vs. Bottom-Up

redesigning a grammar may assist in producing better parsing methods.

- ambiguity- -moves- cycles- left recursion- left factoring

Grammar Problems

CH4.30

CSE244

Resolving Problems: Ambiguous Resolving Problems: Ambiguous GrammarsGrammars

Consider the following grammar segment:stmt if expr then stmt

| if expr then stmt else stmt

| other (any other statement)

What’s problem here ?

Let’s consider a simple parse tree:

stmt

stmt

stmtexpr

exprE1

E2S3

S1

S2

then

then

else

else

if

ifstmt stmt

Else must match to previous then. Structure indicates parse subtree for expression.

CH4.31

CSE244

Example : What Happens with this string?Example : What Happens with this string?

If E1 then if E2 then S1 else S2

How is this parsed ?

if E1 then

if E2 then

S1

else

S2

if E1 then

if E2 then

S1

else

S2

vs.

What’s the issue here ?

CH4.32

CSE244

Parse Trees for ExampleParse Trees for Example

Form 1:

stmt

stmt

stmtexpr

E1 S2

then elseif

expr

E2S1

thenifstmt

stmt

expr

E1

thenif stmt

expr

E2S2S1

then elseif

stmt stmt

Form 2:

What’s the issue here ?

CH4.33

CSE244

Removing AmbiguityRemoving Ambiguity

Take Original Grammar:

stmt if expr then stmt

| if expr then stmt else stmt

| other (any other statement)

Or to write more simply:

S i E t S | i E t S e S | sE a

The problem string: i a t i a t s e s

CH4.34

CSE244

Revise to remove ambiguity:

stmt matched_stmt | unmatched_stmt

matched_stmt if expr then matched_stmt else matched_stmt | other

unmatched_stmt if expr then stmt

| if expr then matched_stmt else unmatched_stmt

S M | UM i E t M e M | sU i E t S | i E t M e U E a

S i E t S | i E t S e S | sE a

i a t i a t s e sTry the above on

CH4.35

CSE244

Resolving Difficulties : Left RecursionResolving Difficulties : Left Recursion

A left recursive grammar has rules that support the derivation : A A, for some .+

Top-Down parsing can’t reconcile this type of grammar, since it could consistently make choice which wouldn’t allow termination.

A A A A … etc. A A |

Take left recursive grammar:

A A | To the following:

A A’

A’ A’ |

CH4.36

CSE244

Why is Left Recursion a Problem ?Why is Left Recursion a Problem ?

Consider: E E + T | T T T * F | F F ( E ) | id

Derive : id + id + id

E E + T

How can left recursion be removed ?

E E + T | T What does this generate?

E E + T T + T

E E + T E + T + T T + T + T

How does this build strings ?

What does each string have to start with ?

CH4.37

CSE244

Resolving Difficulties : Left Recursion (2)Resolving Difficulties : Left Recursion (2)

Informal Discussion:

Take all productions for A and order as:

A A1 | A2 | … | Am | 1 | 2 | … | n

Where no i begins with A.

Now apply concepts of previous slide:

A 1A’ | 2A’ | … | nA’

A’ 1A’ | 2A’ | … | m A’ | For our example: E E + T | T

T T * F | F

F ( E ) | id

E TE’ E’ + TE’ | T FT’

T’ * FT’ | F ( E ) | id

CH4.38

CSE244

Resolving Difficulties : Left Recursion (3)Resolving Difficulties : Left Recursion (3)

Problem: If left recursion is two-or-more levels deep, this isn’t enough

S Aa | b

A Ac | Sd | S Aa Sda

Algorithm:Input: Grammar G with ordered Non-Terminals A1, ..., An

Output: An equivalent grammar with no left recursion

1. Arrange the non-terminals in some order A1=start NT,A2,…An

2. for i := 1 to n do begin

for j := 1 to i – 1 do begin

replace each production of the form Ai Aj

by the productions Ai 1 | 2 | … | k

where Aj 1|2|…|k are all current Aj productions; end

eliminate the immediate left recursion among Ai productions end

CH4.39

CSE244

Using the AlgorithmUsing the Algorithm

Apply the algorithm to: A1 A2a | b|

A2 A2c | A1d

i = 1

For A1 there is no left recursion

i = 2

for j=1 to 1 do

Take productions: A2 A1 and replace with

A2 1 | 2 | … | k |

where A1 1 | 2 | … | k are A1 productions

in our case A2 A1d becomes A2 A2ad | bd | dWhat’s left: A1 A2a | b |

A2 A2 c | A2 ad | bd | dAre we done ?

CH4.40

CSE244

Using the Algorithm (2)Using the Algorithm (2)

No ! We must still remove A2 left recursion !

A1 A2a | b |

A2 A2 c | A2 ad | bd | d

Recall:

A A1 | A2 | … | Am | 1 | 2 | … | n

A 1A’ | 2A’ | … | nA’

A’ 1A’ | 2A’ | … | m A’ |

Apply to above case. What do you get ?

CH4.41

CSE244

Removing Difficulties : Removing Difficulties : -Moves-Moves

Transformation: In order to remove A find all rules of the form B uAv and add the rule B uv to the grammar G.

Why does this work ?

E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id

Examples:

A1 A2 a | b

A2 bd A2’ | A2’

A2’ c A2’ | bd A2’ |

CH4.42

CSE244

Removing Difficulties : CyclesRemoving Difficulties : Cycles

How would cycles be removed ?

Make sure every production is addingsome terminal(s) (except a single -production in the start NT)…

e.g.

S SS | ( S ) |

Has a cycle: S SS S

S Transform to:

S S ( S ) | ( S ) |

Transformation: substitute A uBv by the productions A u1v| ... | unv assuming that all the productions for B are B 1| ... | n

CH4.43

CSE244

Removing Difficulties : Left FactoringRemoving Difficulties : Left Factoring

Problem : Uncertain which of 2 rules to choose:

stmt if expr then stmt else stmt

| if expr then stmt

When do you know which one is valid ?

What’s the general form of stmt ?

A 1 | 2 : if expr then stmt

1: else stmt 2 :

Transform to:

A A’

A’ 1 | 2

EXAMPLE:

stmt if expr then stmt rest

rest else stmt |

CH4.44

CSE244

Resolving Grammar ProblemsResolving Grammar Problems

Note: Not all aspects of a programming language can be represented by context free grammars / languages.

Examples:

1. Declaring ID before its use

2. Valid typing within expressions

3. Parameters in definition vs. in call

These features are call context-sensitive and define yet another language class, CSL.

Reg. Lang. CFLs CSLs

CH4.45

CSE244

Context-Sensitive Languages - ExamplesContext-Sensitive Languages - Examples

Examples:

L1 = { wcw | w is in (a | b)* } : Declare before use

L2 = { an bm cn dm | n 1, m 1 }

an bm : formal parameter

cn dm : actual parameter

What does it mean when L is context-sensitive ?

CH4.46

CSE244

How do you show a Language is a CFL?How do you show a Language is a CFL?

L3 = { w c wR | w is in (a | b)* }

L4 = { an bm cm dn | n 1, m 1 }

L5 = { an bn cm dm | n 1, m 1 }

L6 = { an bn | n 1 }

CH4.47

CSE244

SolutionsSolutions

L3 = { w c wR | w is in (a | b)* }

L4 = { an bm cm dn | n 1, m 1 }

L5 = { an bn cm dm | n 1, m 1 }

L6 = { an bn | n 1 }

S a S a | b S b | c

S a S d | a A d

A b A c | bc

S XY

X a X b | ab

Y c Y d | cd

S a S b | ab

CH4.48

CSE244

Example of CS GrammarExample of CS Grammar

L2 = { an bm cn dm | n 1, m 1 }

S a A c HA a A c | B B b B D | b D D c c DD D H D H dc D H c d

CH4.49

CSE244

Top-Down ParsingTop-Down Parsing

• Identify a leftmost derivation for an input string

• Why ?

• By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed of developing a parse tree in a left-to-right fashion that is consistent with scanning the input.

• A aBc adDc adec (scan a, scan d, scan e, scan c - accept!)

• Recursive-descent parsing concepts

•Predictive parsing

• Recursive / Brute force technique

• non-recursive / table driven

• Error recovery

• Implementation

CH4.50

CSE244


From Grammar to Parser, take IFrom Grammar to Parser, take I

CH4.51

CSE244

Recursive Descent ParsingRecursive Descent Parsing

• General category of Parsing Top-Down

• Choose production rule based on input symbol

• May require backtracking to correct a wrong choice.

• Example: S c A d A ab | a

input: cad

cad S

c dA

cadS

c dA

a b

cadS

c dA

a b Problem: backtrack

cadS

c dA

a

cadS

c dA

a

CH4.52

CSE244


From Grammar to Parser, take IIFrom Grammar to Parser, take II

CH4.53

CSE244

Predictive ParsingPredictive Parsing

•Backtracking is bad!

•To eliminate backtracking, what must we do/be sure of for grammar?• no left recursion• apply left factoring

• (frequently) when grammar satisfies above conditions:current input symbol in conjunction with current non-terminal uniquely determines the production that needs to be applied.

• Utilize transition diagrams:

For each non-terminal of the grammar do following:

1. Create an initial and final state

2. If A X1X2…Xn is a production, add path with edges X1, X2, … , Xn

• Once transition diagrams have been developed, apply a straightforward technique to algorithmicize transition diagrams with procedure and possible recursion.

CH4.54

CSE244

Transition DiagramsTransition Diagrams

• Unlike lexical equivalents, each edge represents a token•Transition implies: if token, match input else call proc• Recall earlier grammar and its associated transition diagrams

E TE’ E’ + TE’ |

T FT’ T’ * FT’ |

F ( E ) | id

0 21T E’

E:

3 6+

4T

E’: 5E’

7 98F T’

T:

10 13*

11F

T’: 12T’

14 17(

15E

F: 16)

id

How are transition diagrams used ?

Are -moves a problem ?

Can we simplify transition diagrams ?

Why is simplification critical ?

CH4.55

CSE244

How are Transition Diagrams Used ?How are Transition Diagrams Used ?

main(){ TD_E();}

TD_T(){ TD_F(); TD_T’();}

TD_E(){ TD_T(); TD_E’();}

TD_E’(){ token = get_token(); if token = ‘+’ then { TD_T(); TD_E’(); }}

TD_F(){ token = get_token(); if token = ‘(’ then { TD_E(); match(‘)’); } else if token.value <> id then {error + EXIT} else ...}

TD_E’(){ token = get_token(); if token = ‘*’ then { TD_F(); TD_T’(); }}

What happened to -moves?

… “else unget()and terminate”

NOTE: not all error conditions have been represented.

CH4.56

CSE244

How can Transition Diagrams be How can Transition Diagrams be Simplified ?Simplified ?

6E’

E’: 53+

4T

CH4.57

CSE244

How can Transition Diagrams be How can Transition Diagrams be Simplified ? (2)Simplified ? (2)

6E’

E’: 53+

4T

E’: 53+

4T

6

CH4.58

CSE244


6E’

E’: 53+

4T

E’: 53+

4T

6

E’: 3+

4

T

6

CH4.59

CSE244


6E’

E’: 53+

4T

E’: 53+

4T

6

E’: 3+

4

T

6

21E’T

0E:

CH4.60

CSE244


6E’

E’: 53+

4T

E’: 53+

4T

6

E’: 3+

4

T

6

21E’T

0E:

T0E: 3

+4

T

6

CH4.61

CSE244

Additional Transition Diagram Additional Transition Diagram SimplificationsSimplifications

• Similar steps for T and T’

• Simplified Transition diagrams:*

F7T: 10

13

T’: 10*

11

F

13

14 17(

15E

F: 16)

id

Why is simplification important ?

How does code change?

CH4.62

CSE244


From Grammar to Parser, take IIIFrom Grammar to Parser, take III

CH4.63

CSE244

Motivating Table-Driven ParsingMotivating Table-Driven Parsing

1. Left to right scan input

2. Find leftmost derivation

Grammar: E TE’

E’ +TE’ | T id

Input : id + id $

Derivation: E

Processing Stack:

Terminator

CH4.64

CSE244

Non-Recursive / Table DrivenNon-Recursive / Table Driven

Empty stack symbol

a + b $

Y

X

$

Z

Input

Predictive Parsing Program

Stack Output

Parsing Table M[A,a]

(String + terminator)

NT + T symbols of CFG What actions parser

should take based on stack / input

General parser behavior: X : top of stack a : current input

1. When X=a = $ halt, accept, success

2. When X=a $ , POP X off stack, advance input, go to 1.

3. When X is a non-terminal, examine M[X,a]

if it is an error call recovery routineif M[X,a] = {X UVW}, POP X, PUSH W,V,UDO NOT expend any input

CH4.65

CSE244

Algorithm for Non-Recursive ParsingAlgorithm for Non-Recursive Parsing

Set ip to point to the first symbol of w$;

repeat

let X be the top stack symbol and a the symbol pointed to by ip;

if X is terminal or $ then

if X=a then

pop X from the stack and advance ip

else error()

else /* X is a non-terminal */

if M[X,a] = XY1Y2…Yk then begin

pop X from stack;

push Yk, Yk-1, … , Y1 onto stack, with Y1 on top

output the production XY1Y2…Yk

end

else error()

until X=$ /* stack is empty */

Input pointer

May also execute other code based on the production used

CH4.66

CSE244

ExampleExample


Our well-worn example !

Table M

Non-terminal

INPUT SYMBOL

id + * ( ) $

E

E’

T

T’

F

ETE’

TFT’

Fid

E’+TE’

T’ T’*FT’

F(E)

TFT’

ETE’

T’

E’ E’

T’

CH4.67

CSE244

Trace of ExampleTrace of Example

STACK INPUT OUTPUT

CH4.68

CSE244

Trace of ExampleTrace of Example

Expend Input

$E

$E’T$E’T’F$E’T’id$E’T’$E’$E’T+$E’T$E’T’F$E’T’id$E’T’$E’T’F*$E’T’F$E’T’id$E’T’$E’$

id + id * id$

id + id * id$id + id * id$id + id * id$

+ id * id$+ id * id$+ id * id$

id * id$id * id$id * id$

* id$* id$

id$id$

$$$

E TE’T FT’F id

T’ E’ +TE’

T FT’F id

T’ *FT’

F id

T’ E’

STACK INPUT OUTPUT

CH4.69

CSE244

Leftmost Derivation for the ExampleLeftmost Derivation for the Example

The leftmost derivation for the example is as follows:

E TE’ FT’E’ id T’E’ id E’ id + TE’ id + FT’E’

id + id T’E’ id + id * FT’E’ id + id * id T’E’

id + id * id E’ id + id * id

CH4.70

CSE244

What’s the Missing Puzzle Piece ?What’s the Missing Puzzle Piece ?

Constructing the Parsing Table M !

1st : Calculate First & Follow for Grammar

2nd: Apply Construction Algorithm for Parsing Table ( We’ll see this shortly )

Basic Tools:

First: Let be a string of grammar symbols. First() is the set that includes every terminal that appears leftmost in or in any string originating from . NOTE: If , then is First( ).

Follow: Let A be a non-terminal. Follow(A) is the set of terminals a that can appear directly to the right of A in some sentential form. (S Aa, for some and ). NOTE: If S A, then $ is Follow(A).

*

*

*

CH4.71

CSE244

Motivation Behind First & FollowMotivation Behind First & Follow

First:

Follow:

Is used to help find the appropriate reduction to follow given the top-of-the-stack non-terminal and the current input symbol.

Example: If A , and a is in First(), then when a=input, replace A with (in the stack).

( a is one of first symbols of , so when A is on the stack and a is input, POP A and PUSH .

Is used when First has a conflict, to resolve choices, or when First gives no suggestion. When or , then what follows A dictates the next choice to be made.

Example: If A , and b is in Follow(A ), then when and b is an input character, then we expand A with , which will eventually expand to , of which b follows!

( : i.e., First( ) contains .)

*

*

*

CH4.72

CSE244

An example.An example.

$S abbd$

STACK INPUT OUTPUT

S a B C d B CB | | S aC b

CH4.73

CSE244

Computing First(X) : Computing First(X) : All Grammar SymbolsAll Grammar Symbols

1. If X is a terminal, First(X) = {X}

2. If X is a production rule, add to First(X)

3. If X is a non-terminal, and X Y1Y2…Yk is a production rule

Place First(Y1) in First(X)

if Y1 , Place First(Y2) in First(X)

if Y2 , Place First(Y3) in First(X)

…

if Yk-1 , Place First(Yk) in First(X)

NOTE: As soon as Yi , Stop.

Repeat above steps until no more elements are added to any First( ) set.

Checking “Yj ?” essentially amounts to checking whether belongs to First(Yj)

*

*

*

*

*

CH4.74

CSE244

Computing First(X) : Computing First(X) : All Grammar Symbols - continuedAll Grammar Symbols - continued

Informally, suppose we want to compute

First(X1 X2 … Xn ) = First (X1) “+”

First(X2) if is in First(X1) “+”

First(X3) if is in First(X2) “+”

…

First(Xn) if is in First(Xn-1)

Note 1: Only add to First(X1 X2 … Xn) if is in First(Xi) for all i

Note 2: For First(X1), if X1 Z1 Z2 … Zm , then we need to compute First(Z1 Z2 … Zm) !

CH4.75

CSE244

Example 1Example 1

Given the production rules:

S i E t SS’ | a

S’ eS |

E b

CH4.76

CSE244

Example 1Example 1

Given the production rules:

S i E t SS’ | a

S’ eS |

E b

Verify that

First(S) = { i, a }

First(S’) = { e, }

First(E) = { b }

CH4.77

CSE244

Example 2Example 2

Computing First for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id

CH4.78

CSE244

Example 2Example 2

Computing First for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id

First(E)

First(TE’)

First(T)

First(T) “+” First(E’)

First(F)

First((E)) “+” First(id)

First(F) “+” First(T’)

“(“ and “id”

Not First(E’) since T

Not First(T’) since F

*

*

Overall: First(E) = { ( , id } = First(F)

First(E’) = { + , } First(T’) = { * , }

First(T) First(F) = { ( , id }

CH4.79

CSE244

Computing Follow(A) : Computing Follow(A) : All Non-TerminalsAll Non-Terminals

1. Place $ in Follow(S), where S is the start symbol and $ signals end of input

2. If there is a production A B, then everything in First() is in Follow(B) except for .

3. If A B is a production, or A B and (First() contains ), then everything in Follow(A) is in Follow(B)

(Whatever followed A must follow B, since nothing follows B from the production rule)

*

We’ll calculate Follow for two grammars.

CH4.80

CSE244

The Algorithm for Follow – pseudocodeThe Algorithm for Follow – pseudocode

1. Initialize Follow(X) for all non-terminals X1. Initialize Follow(X) for all non-terminals Xto empty set. Place $ in Follow(S), where S is the start to empty set. Place $ in Follow(S), where S is the start

NT.NT.2. Repeat the following step until no modifications are 2. Repeat the following step until no modifications are

made to any Follow-setmade to any Follow-setFor any production For any production X X1 X2 … Xm

For j=1 to m,For j=1 to m,

if if Xj is a non-terminal then:is a non-terminal then:

Follow(Follow(Xj)=Follow(Follow(Xj)(First(Xj+1,…,Xm)-{});

If If First(Xj+1,…,Xm) contains or Xj+1,…,Xm=

then Follow(then Follow(Xj)=Follow(Follow(Xj) Follow(Follow(X);

CH4.81

CSE244

Computing Follow : 1Computing Follow : 1stst Example Example

S i E t SS’ | a

S’ eS |

E b

First(S) = { i, a }

First(S’) = { e, }

First(E) = { b }

Recall:

CH4.82

CSE244

Computing Follow : 1Computing Follow : 1stst Example Example

S i E t SS’ | a

S’ eS |

E b

First(S) = { i, a }

First(S’) = { e, }

First(E) = { b }

Recall:

Follow(S) – Contains $, since S is start symbol

Since S i E t SS’ , put in First(S’) – not

Since S’ , Put in Follow(S)

Since S’ eS, put in Follow(S’) So…. Follow(S) = { e, $ }

Follow(S’) = Follow(S) HOW?

Follow(E) = { t }

*

CH4.83

CSE244

Example 2Example 2

Compute Follow for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id

CH4.84

CSE244

Example 2Example 2

Compute Follow for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id

FollowE $ )E’ $ )T + $ )T’ + $ )F + * $ )

FirstE ( idE’ +

T ( idT’ *

F ( id

CH4.85

CSE244

Constructing Parsing TableConstructing Parsing Table

Algorithm:

Table has one row per non-terminal / one column per terminal (incl. $ )

1. Repeat Steps 2 & 3 for each rule A

2. Terminal a in First()? Add A to M[A, a ]

3.1 in First()? Add A to M[A, b ] for all terminals b in Follow(A).

3.2 in First() and $ in Follow(A)? Add A to M[A, $ ]

4. All undefined entries are errors.

CH4.86

CSE244

Constructing Parsing Table – Example 1Constructing Parsing Table – Example 1

S i E t SS’ | a

S’ eS |

E b

First(S) = { i, a }

First(S’) = { e, }

First(E) = { b }

Follow(S) = { e, $ }

Follow(S’) = { e, $ }

Follow(E) = { t }

CH4.87

CSE244


S i E t SS’ | a

S’ eS |

E b

First(S) = { i, a }

First(S’) = { e, }

First(E) = { b }

Follow(S) = { e, $ }

Follow(S’) = { e, $ }

Follow(E) = { t }

S i E t SS’ S a E b

First(i E t SS’)={i} First(a) = {a} First(b) = {b}

S’ eS S First(eS) = {e} First() = {} Follow(S’) = { e, $ }

INPUT SYMBOL

a $tie

S

S’

E

bNon-terminal

S a S iEtSS’

S

E b

S’ S’ eS

CH4.88

CSE244



First(E,F,T) = { (, id }

First(E’) = { +, }First(T’) = { *, }

Follow(E,E’) = { ), $}

Follow(F) = { *, +, ), $ }Follow(T,T’) = { +, ) , $}

CH4.89

CSE244



First(E,F,T) = { (, id }

First(E’) = { +, }First(T’) = { *, }

Follow(E,E’) = { ), $}

Follow(F) = { *, +, ), $ }Follow(T,T’) = { +, ) , $}

Expression Example: E TE’ : First(TE’) = First(T) = { (, id }

M[E, ( ] : E TE’

M[E, id ] : E TE’

(by rule 2) E’ +TE’ : First(+TE’) = + : M[E’, +] : E’ +TE’

(by rule 3) E’ : in First( ) T’ : in First( )

by rule 2

M[E’, )] : E’ (3.1) M[T’, +] : T’ (3.1)

M[E’, $] : E’ (3.2) M[T’, )] : T’ (3.1)

(Due to Follow(E’) M[T’, $] : T’ (3.2)

CH4.90

CSE244

LL(1) GrammarsLL(1) Grammars

L : Scan input from Left to Right

L : Construct a Leftmost Derivation

1 : Use “1” input symbol as lookahead in conjunction with stack to decide on the parsing action

LL(1) grammars == they have no multiply-defined entries in the parsing table.

Properties of LL(1) grammars:

• Grammar can’t be ambiguous or left recursive• Grammar is LL(1) when A 1. & do not derive strings starting with the same terminal a 2. Either or can derive , but not both.

Note: It may not be possible for a grammar to be manipulated into an LL(1) grammar

CH4.91

CSE244

Error RecoveryError Recovery

a + b $

Y

X

$

Z

Input

Predictive Parsing Program

Stack Output

Parsing Table M[A,a]

When Do Errors Occur? Recall Predictive Parser Function:

1. If X is a terminal and it doesn’t match input.

2. If M[ X, Input ] is empty – No allowable actions

Consider two recovery techniques:

A. Panic Mode

B. Phrase-level Recovery

CH4.92

CSE244

Panic-Mode RecoveryPanic-Mode Recovery

Assume a non-terminal on the top of the stack.Assume a non-terminal on the top of the stack. Idea: skip symbols on the input until a token in a selected Idea: skip symbols on the input until a token in a selected

set of set of synchronizing synchronizing tokens is found.tokens is found. The choice for a synchronizing set is important.The choice for a synchronizing set is important.

some ideas: define the synchronizing set of A to be FOLLOW(A).

then skip input until a token in FOLLOW(A) appears and then pop A from the stack. Resume parsing...

add symbols of FIRST(A) into synchronizing set. In this case we skip input and once we find a token in FIRST(A) we resume parsing from A.

Productions that lead to if available might be used. If a terminal appears on top of the stack and does not match If a terminal appears on top of the stack and does not match

to the input == pop it and and continue parsing (issuing an to the input == pop it and and continue parsing (issuing an error message saying that the terminal was inserted).error message saying that the terminal was inserted).

CH4.93

CSE244

Panic Mode Recovery, IIPanic Mode Recovery, II

General Approach: Modify the empty cells of the Parsing Table.

1. if M[A,a] = {empty} and a belongs to Follow(A) then we set M[A,a] = “synch”

Error-recovery Strategy :

If A=top-of-the-stack and a=current-input,

1. If A is NT and M[A,a] = {empty} then skip a from the input.

2. If A is NT and M[A,a] = {synch} then pop A.

3. If A is a terminal and A!=a then pop token (essentially inserting it).

CH4.94

CSE244

Revised Parsing Table / ExampleRevised Parsing Table / Example

Non-terminal

INPUT SYMBOL

id + * ( ) $

E

E’

T

T’

F

ETE’

TFT’

Fid

E’+TE’

T’ T’*FT’

F(E)

TFT’

ETE’

T’

E’ E’

T’

From Follow sets. Pop top of stack NT

“synch” action

Skip input symbol

CH4.95

CSE244

Revised Parsing Table / Example(2)Revised Parsing Table / Example(2)

$E$E$E’T$E’T’F$E’T’id$E’T’$E’T’F*$E’T’F$E’T’$E’$E’T+$E’T$E’T’F$E’T’id$E’T’$E’$

+ id * + id$

id * + id$ id * + id$ id * + id$

id * + id$ * + id$ * + id$

+ id$ + id$

+ id$ + id$

id$ id$ id$

$ $ $

STACK INPUT Remark

error, M[F,+] = synchF has been popped

error, skip +

PossibleError Msg:“Misplaced +I am skipping it”

PossibleError Msg:“Missing Term”

CH4.96

CSE244

Writing Error MessagesWriting Error Messages

Keep input counter(s)Keep input counter(s) Recall: every non-terminal symbolizes an abstract language Recall: every non-terminal symbolizes an abstract language

construct.construct. Examples of Error-messages for our usual grammarExamples of Error-messages for our usual grammar

E = means expression. top-of-stack is E, input is +

“Error at location i, expressions cannot start with a ‘+’” or “error at location i, invalid expression”

Similarly for E, * E’= expression ending.

Top-of-stack is E’, input is * or id“Error: expression starting at j is badly formed at location i”

Requires: every time you pop an ‘E’ remember the location

CH4.97

CSE244

Writing Error-Messages, IIWriting Error-Messages, II

Messages for Synch Errors. Messages for Synch Errors. Top-of-stack is F input is +

“error at location i, expected summation/multiplication term missing”

Top-of-stack is E input is ) “error at location i, expected expression missing”

CH4.98

CSE244

Writing Error Messages, IIIWriting Error Messages, III

When the top-of-the stack is a terminal that does When the top-of-the stack is a terminal that does not match…not match… E.g. top-of-stack is id and the input is +

“error at location i: identifier expected” Top-of-stack is ) and the input is terminal other

than ) Every time you match an ‘(‘

push the location of ‘(‘ to a “left parenthesis” stack.– this can also be done with the symbol stack.

When the mismatch is discovered look at the left parenthesis stack to recover the location of the parenthesis.

“error at location i: left parenthesis at location m has no closing right parenthesis”– E.g. consider ( id * + (id id) $

CH4.99

CSE244

Incorporating Error-Messages to the TableIncorporating Error-Messages to the Table

Empty parsing table entries can now fill with the Empty parsing table entries can now fill with the appropriate error-reporting techniques.appropriate error-reporting techniques.

CH4.100

CSE244

Phrase-Level RecoveryPhrase-Level Recovery

• Fill in blanks entries of parsing table with error handling routines that do not only report errors but may also:

• change/ insert / delete / symbols into the stack and / or input stream• + issue error message

• Problems:

• Modifying stack has to be done with care, so as to not create possibility of derivations that aren’t in language• infinite loops must be avoided

• Essentially extends panic mode to have more complete error handling

CH4.101

CSE244

How Would You Implement TD ParserHow Would You Implement TD Parser

• Stack – Easy to handle. Write ADT to manipulate its contents

• Input Stream – Responsibility of lexical analyzer

• Key Issue – How is parsing table implemented ?One approach: Assign unique IDS

Non-terminal

INPUT SYMBOL

id + * ( ) $

E

E’

T

T’

F

ETE’

TFT’

Fid

E’+TE’

T’ T’*FT’

F(E)

TFT’

ETE’

T’

E’ E’

T’

synch

synch synch

synch

synch

synch synch

synch

synch

All rules have unique IDs Ditto for synch

actions

Also for blanks which handle errors

CH4.102

CSE244

Revised Parsing Table:Revised Parsing Table:

Non-terminal

INPUT SYMBOL

id + * ( ) $

E

E’

T

T’

F

4

6

1

3

6

1

2 3

4

6

8 7 17161514

1312

10

11

9

24

23

222120

18 19

255

1 ETE’2 E’+TE’3 E’4 TFT’5 T’*FT’6 T’7 F(E)8 Fid

9 – 17 : Sync

Actions

18 – 25 : Error

Handlers

CH4.103

CSE244

Revised Parsing Table: (2)Revised Parsing Table: (2)

Each # ( or set of #s) corresponds to a procedure that:

• Uses Stack ADT

• Gets Tokens

• Prints Error Messages

• Prints Diagnostic Messages

• Handles Errors

CH4.104

CSE244

How is Parser Constructed ?How is Parser Constructed ?

One large CASE statement:state = M[ top(s), current_token ]switch (state) { case 1: proc_E_TE’( ) ; break ; … case 8: proc_F_id( ) ; break ; case 9: proc_sync_9( ) ; break ; … case 17: proc_sync_17( ) ; break ; case 18: … case 25: }

Procs to handle errors

Some error handlers may be similar

Combine put in another switch

Some sync actions may be same

CH4.105

CSE244

Final Comments – Top-Down ParsingFinal Comments – Top-Down Parsing

So far,

• We’ve examined grammars and language theory and its relationship to parsing

• Key concepts: Rewriting grammar into an acceptable form

• Examined Top-Down parsing:

Brute Force : Transition diagrams & recursion

Elegant : Table driven

• We’ve identified its shortcomings:

Not all grammars can be made LL(1) !

• Bottom-Up Parsing - Future

Documents

CH4.1 CSE244 Sections 4.1-4.4: Syntax Analysis Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road