Chapter 4-2 Chang Chi-Chung 2007.06.07. Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation) LR(0), SLR, Canonical LR = LR(1), LALR Other

Chapter 4-2

Chang Chi-Chung

2007.06.07

Bottom-Up Parsing

LR methods (Left-to-right, Rightmost derivation) LR(0), SLR, Canonical LR = LR(1), LALR

Other special cases: Shift-reduce parsing Operator-precedence parsing

Operator-Precedence Parsing

Special case of shift-reduce parsing See textbook section 4.6

Bottom-Up Parsing

E → T → T * F → T * id → F * id → id * id

rightmost derivation

reduction

E → E + T | T T → T * F | F F → ( E ) | id

Handle Pruning Handle

A handle is a substring of grammar symbols in aright-sentential form that matches a right-hand side of a production

Right Sentential Form Handle Reducing Production

id1 * id2 id1 F → id

F * id2 F T → F

T * id2 id2 F → id

T * F T * F E → T * F

E → E + T | T T → T * F | F F → ( E ) | id

Handle Pruning A rightmost derivation in reverse can be obtai

ned by “handle pruning”S = γ0 →rm γ1 → rm γ2 →rm …. →rm γn-1 →rm γn =ω

Handle definition S →*rm αAω →*rm αβω

S

A

α ωβ

A handle A →β in the parse tree for αβω

Example: Handle

Handle

GrammarS a A B eA A b c | bB d

NOT a handle, becausefurther reductions will fail

(result is not a sentential form)

a b b c d ea A b c d ea A A e… ?

a b b c d ea A b c d ea A d ea A B eS

Shift-Reduce Parsing Shift-Reduce Parsing is a form of bottom-up

parsing A stack holds grammar symbols An input buffer holds the rest of the string to

parsed. Shift-Reduce parser action

shift reduce accept error

Shift-Reduce Parsing Shift

Shift the next input symbol onto the top of the stack. Reduce

The right end of the string to be reduced must be at the top of the stack.

Locate the left end of the string within the stack and decide with what nonterminal to replace the string.

Accept Announce successful completion of parsing

Error Discover a syntax error and call recovery routine

Shift-Reduce Parsing

Stack Input Action

$ id1 * id2 $ shift

$id1 * id2 $ reduce by F → id

$F * id2 $ reduce by T → F

$T * id2 $ shift

$T * id2 $ shift

$T * id2 $ reduce by F → id

$T * F $ reduce by T → T * F

$T $ reduce by E → T

$E $ accept

E → E + T | T T → T * F | F F → ( E ) | id

Example: Shift-Reduce ParsingGrammar:S a A B eA A b c | bB d

Shift-reduce correspondsto a rightmost derivation:S rm a A B e rm a A d e rm a A b c d e rm a b b c d e

Reducing a sentence:a b b c d ea A b c d ea A d ea A B eS

S

a b b c d e

A

A

B

a b b c d e

A

A

B

a b b c d e

A

A

a b b c d e

A

These matchproduction’s

right-hand sides

Conflicts During Shift-Reduce Parsing Conflicts Type

shift-reduce reduce-reduce

Shift-reduce and reduce-reduce conflicts are caused by The limitations of the LR parsing method (even

when the grammar is unambiguous) Ambiguity of the grammar

Shift-Reduce Conflict

Stack$$id$E$E+$E+id$E+E$E+E*$E+E*id$E+E*E$E+E$E

Inputid+id*id$

+id*id$+id*id$

id*id$*id$*id$

id$$$$$

Actionshiftreduce E idshiftshiftreduce E idshift (or reduce?)shiftreduce E idreduce E E * Ereduce E E + Eaccept

GrammarE E + EE E * EE ( E )E id

Find handlesto be reduced

How toresolve

conflicts?

Shift-Reduce Conflict

Stack$…$…if E then S

Input…$

else…$

Action…shift or reduce?

Ambiguous grammar:S if E then S | if E then S else S | other

Resolve in favorof shift, so else

matches closest if

Shift else to if E then S or Reduce if E then S

Reduce-Reduce Conflict

Stack$$a

Inputaa$

a$

Actionshiftreduce A a or B a ?

GrammarC A BA aB a

Resolve in favorof reduce A a,

otherwise we’re stuck!動彈不得

LR LR parser are table-driven

Much like the nonrecursive LL parsers. The reasons of the using the LR parsing

An LR-parsering method is the most general nonbacktracking shift-reduce parsing method known, yet it can be implemented as efficiently as other.

LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written.

An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.

The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive or LL methods.

LR(0) An LR parser makes shift-reduce decisions b

y maintaining states to keep track. An item of a grammar G is a production of G

with a dot at some position of the body. Example

A → X Y Z

A → ． X Y Z

A → X ． Y Z

A → X Y ． Z

A → X Y Z ．

A → X ． Y Z

stack next derivations with input strings

items

Note that production A has one item [A •]

LR(0) Canonical LR(0) Collection

One collection of sets of LR(0) items Provide the basis for constructing a DFA that is us

ed to make parsing decisions. LR(0) automation

The canonical LR(0) collection for a grammar Augmented the grammar

If G is a grammar with start symbol S, then G’ is the augmented grammar for G with new start symbol S’ and new production S’ → S

Closure function Goto function

Use of the LR(0) Automaton

Function Closure

If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I.

Create closure(I) by the two rules: add every item in I to closure(I) If A→α ． Bβ is in closure(I) and B →γ is a production, then

add the item B → ． γ to closure(I). Apply this rule untill no more new items can be added to closure(I).

Divide all the sets of items into two classes Kernel items

initial item S’ → ． S, and all items whose dots are not at the left end. Nonkernel items

All items with their dots at the left end, except for S’ → ． S

Example The grammar G

E’ → E E → E + T | T T → T * F | F F → ( E ) | id

Let I = { E’ → ． E } , then closure(I) = {

E’ → ． E E → ． E + T E → ． T T → ． T * F T → ． F F → ． ( E ) F → ． id }

Exercise The grammar G

E’ → E

E → E + T | T

T → T * F | F

F → ( E ) | id Let I = { E → E + ． T }

Function Goto Function Goto(I, X)

I is a set of items X is a grammar symbol Goto(I, X) is defined to be the closure of the set of

all items [A α X‧β] such that [A α‧ Xβ] is in I. Goto function is used to define the transitions in th

e LR(0) automation for a grammar.

Example

I = {

E’ → E ． E → E ． + T } Goto (I, +) = {

E → E + ． T

T → ． T * F

T → ． F

F → ． (E)

F → ． id

}

The grammar G E’ → E E → E + T | T T → T * F | F F → ( E ) | id

Constructing the LR(0) Collection 1. The grammar is augmented with a new start

symbol S’ and production S’S2. Initially, set C = closure({[S’•S]})

(this is the start state of the DFA)3. For each set of items I C and each gramm

ar symbol X (NT) such thatGOTO(I, X) C and goto(I, X) , add the set of items GOTO(I, X) to C

4. Repeat 3 until no more sets can be added to C

Example: The Parse of id * idLine STACK SYMBOLS INPUT ACTION

(1) 0 $ id * id $ shift to 5

(2) 0 5 $ id * id $ reduce by F → id

(3) 0 3 $ F * id $ reduce by T → F

(4) 0 2 $ T * id $ shift to 7

(5) 0 2 7 $ T * id $ shift to 5

(6) 0 2 7 5 $ T * id $ reduce by F → id

(7) 0 2 7 10 $ T * F $ reduce by T → T * F

(8) 0 2 $ T $ reduce by E → T

(9) 0 1 $ E $ accept

Model of an LR Parser

Structure of the LR Parsing Table Parsing Table consists of two parts:

A parsing-action function ACTION A goto function GOTO

The Action function, Action[i, a], have one of four forms: Shift j, where j is a state.

The action taken by the parser shifts input a to the stack, but use state j to represent a.

Reduce A→β. The action of the parser reduces β on the top of the stack to h

ead A. Accept

The parser accepts the input and finishes parsing. Error

Structure of the LR Parsing Table(1) The GOTO Function, GOTO[Ii, A], defined on

sets of items. If GOTO[Ii, A] = Ij, then GOTO also maps a state i

and a nonterminal A to state j.

LR-Parser ConfigurationsConfiguration ( = LR parser state):

($s0 s1 s2 … sm, ai ai+1 … an $)

stack input

($ X1 X2 … Xm, ai ai+1 … an $)

If action[sm, ai] = shift s then push s (ai), and advance input (s0 s1 s2 … sm, ai ai+1 … an $) (s0 s1 s2 … sm s, ai+1 … an $)

If action[sm, ai] = reduce A and goto[sm-r, A] = s with r = || then pop r symbols, and push s ( push A )

( (s0 s1 s2 … sm, ai ai+1 … an $) (s0 s1 s2 … sm-r s, ai ai+1 … an $)

If action[sm, ai] = accept then stopIf action[sm, ai] = error then attempt recovery

Example LR Parse Table

Grammar:1. E E + T2. E T3. T T * F4. T F5. F ( E )6. F id

s5 s4

s6 acc

r2 s7 r2 r2

r4 r4 r4 r4

s5 s4

r6 r6 r6 r6

s5 s4

s5 s4

s6 s11

r1 s7 r1 r1

r3 r3 r3 r3

r5 r5 r5 r5

id + * ( ) $

0

1

2

3

4

5

6

7

8

9

10

11

E T F

1 2 3

8 2 3

9 3

10shift & goto 5

reduce byproduction #1

action gotostate

Line STACK SYMBOLS INPUT ACTION

(1) 0 id * id + id $ shift 5

(2) 0 5 id * id + id $ reduce 6 goto 3

(3) 0 3 F * id + id $ reduce 4 goto 2

(4) 0 2 T * id + id $ shift 7

(5) 0 2 7 T * id + id $ shift 5

(6) 0 2 7 5 T * id + id $ reduce 6 goto 10

(7) 0 2 7 10 T * F + id $ reduce 3 goto 2

(8) 0 2 T + id $ reduce 2 goto 1

(9) 0 1 E + id $ shift 6

(10)

0 1 6 E + id $ shift 5

(11)

0 1 6 5 E + id $ reduce 6 goto 3

(12)

0 1 6 3 E + F $ reduce 4 goto 9

(13)

0 1 6 9 E + T $ reduce 1 goto 1

(14)

0 1 $ E $ accept

Grammar0. S E1. E E + T2. E T3. T T * F4. T F5. F ( E )6. F id

Example

SLR Grammars SLR (Simple LR): a simple extension of LR(0)

shift-reduce parsing SLR eliminates some conflicts by populating

the parsing table with reductions A on symbols in FOLLOW(A)

S EE id + EE id

State I0:S •E E •id + EE •id

State I2:E id•+ EE id•

goto(I0,id) goto(I3,+)

FOLLOW(E)={$}thus reduce on $

Shift on +

SLR Parsing Table

Reductions do not fill entire rows Otherwise the same as LR(0)

s2

acc

s3 r3

s2

r2

id + $

0

1

2

3

4

E

1

4

1. S E2. E id + E3. E id

FOLLOW(E)={$}thus reduce on $

Shift on +

Constructing SLR Parsing Tables Augment the grammar with S’ S Construct the set C={I0, I1, …, In} of LR(0) items State i is constructed from Ii

If [A•a] Ii and goto(Ii, a)=Ij then set action[i, a]=shift j

If [A•] Ii then set action[i,a]=reduce A for all a FOLLOW(A) (apply only if AS’)

If [S’S•] is in Ii then set action[i,$]=accept

If goto(Ii, A)=Ij then set goto[i, A]=j set goto table Repeat 3-4 until no more entries added The initial state i is the Ii holding item [S’•S]

Example SLR Grammar and LR(0) ItemsAugmentedgrammar:0. C’ C1. C A B2. A a3. B a

State I0:C’ •C C •A BA •a

State I1:C’ C•

State I2:C A•BB •a

State I3:A a•

State I4:C A B•

State I5:B a•

goto(I0,C)

goto(I0,a)

goto(I0,A)

goto(I2,a)

goto(I2,B)

I0 = closure({[C’ •C]})I1 = goto(I0,C) = closure({[C’ C•]})…

start

final

Example SLR Parsing Table

s3

acc

s5

r2

r1

r3

a $

0

1

2

3

4

5

C A B

1 2

4

State I0:C’ •C C •A BA •a

State I1:C’ C•

State I2:C A•BB •a

State I3:A a•

State I4:C A B•

State I5:B a•

1

2

4

5

3

0start

a

A

CB

a

Grammar:0. C’ C1. C A B2. A a3. B a

SLR and Ambiguity Every SLR grammar is unambiguous, but not every

unambiguous grammar is SLR, maybe LR(1) Consider for example the unambiguous grammar

S L = R | RL * R | id

R L FOLLOW(R) = {=, $}

I0:S’ •S S •L=RS •RL •*R L •idR •L

I1:S’ S•

I2:S L•=RR L•

I3:S R•

I4:L *•R R •LL •*R L •id

I5:L id•

I6:S L=•RR •L L •*R L •id

I7:L *R•

I8:R L•

I9:S L=R•

action[2,=]=s6action[2,=]=r5 no

Has no SLRparsing table

LR(1) Grammars SLR too simple LR(1) parsing uses lookahead to avoid unnec

essary conflicts in parsing table LR(1) item = LR(0) item + lookahead

LR(0) item[A•]

LR(1) item [A•, a]

SLR Versus LR(1) Split the SLR states by adding LR(1) lookahead Unambiguous grammar

S L = R | RL * R | idR L

I2:S L•=RR L•

action[2,=]=s6

Should not reduce, because noright-sentential form begins with R=

split

R L•S L•=R

LR(1) Items An LR(1) item

[A•, a]contains a lookahead terminal a, meaning already on top of the stack, expect to see a

For items of the form[A•, a]

the lookahead a is used to reduce A only if the next input is a

For items of the form [A•, a]

with the lookahead has no effect

The Closure Operation for LR(1) Items Start with closure(I) = I If [A•B, a] closure(I) then

for each production B in the grammar

and each terminal b FIRST(a)

add the item [B•, b] to I

if not already in I Repeat 2 until no new items can be added

The Goto Operation for LR(1) Items For each item [A•X, a] I, add the set o

f items closure({[AX•, a]}) to goto(I,X) if not already there

Repeat step 1 until no more items can be added to goto(I,X)

Example Let I = { (S’ → •S, $) } I0 = closure(I) = { S’ → •S, $ S → • C C, $ C → •c C, c/d C → •d, c/d } goto(I0, S) = closure( {S’ → S •, $ } )

= {S’ → S •, $ } = I1

The grammar G S’ → S S → C C C → c C | d

Exercise

Let I = { (S → C •C, $) }

I2 = closure(I) = ?

I3 = goto(I2, c) = ?

The grammar G S’ → S S → C C C → c C | d

Construction of the sets of LR(1) Items Augment the grammar with a new start symbo

l S’ and production S’S Initially, set C = closure({[S’•S, $]})

(this is the start state of the DFA) For each set of items I C and each grammar

symbol X (NT) such that goto(I, X) C and goto(I, X) , add the set of items goto(I, X) to C

Repeat 3 until no more sets can be added to C

LR(1) Automation

Construction of the Canonical LR(1) Parsing Tables Augment the grammar with S’S Construct the set C={I0,I1,…,In} of LR(1) items State i of the parser is constructed from Ii

If [A•a, b] Ii and goto(Ii,a)=Ij then set action[i,a]=shift j If [A•, a] Ii then set action[i,a]=reduce A (apply only

if AS’) If [S’S•, $] is in Ii then set action[i,$]=accept

If goto(Ii,A)=Ij then set goto[i,A]=j Repeat 3 until no more entries added The initial state i is the Ii holding item [S’•S,$]

Example The grammar G S’ → S S → C C C → c C | dstate

ACTION GOTO

c d $ S C

0 s3 s4 1 2

1 acc

2 s6 s7 5

3 s3 s4 8

4 r3 r3

5 r1

6 s6 s7 9

7 r3

8 r2 r2

9 r2

Example Grammar and LR(1) Items Unambiguous LR(1) grammar:

S L = R | RL * R | idR L

Augment with S’ S LR(1) items (next slide)

[S’ •S, $] goto(I0,S)=I1 [S •L=R, $] goto(I0,L)=I2 [S •R, $] goto(I0,R)=I3 [L •*R, =/$] goto(I0,*)=I4 [L •id, =/$] goto(I0,id)=I5 [R •L, $] goto(I0,L)=I2

[S’ S•, $]

[S L•=R, $] goto(I0,=)=I6 [R L•, $]

[S R•, $]

[L *•R, =/$] goto(I4,R)=I7 [R •L, =/$] goto(I4,L)=I8 [L •*R, =/$] goto(I4,*)=I4 [L •id, =/$] goto(I4,id)=I5

[L id•, =/$]

[S L=•R, $] goto(I6,R)=I9 [R •L, $] goto(I6,L)=I10

[L •*R, $] goto(I6,*)=I11 [L •id, $] goto(I6,id)=I12

[L *R•, =/$]

[R L•, =/$]

[S L=R•, $]

[R L•, $]

[L *•R, $] goto(I11,R)=I13

[R •L, $] goto(I11,L)=I10

[L •*R, $] goto(I11,*)=I11

[L •id, $] goto(I11,id)=I12

[L id•, $]

[L *R•, $]

I0:

I1:

I2:

I3:

I4:

I5:

I6:

I7:

I8:

I9:

I10:

I12:

I11:

I13:

Example LR(1) Parsing Table

s5 s4

acc

s6 r6

r3

s5 s4

r5 r5

s12 s11

r4 r4

r6 r6

r2

r6

s12 s11

r5

r4

id * = $

0

1

2

3

4

5

6

7

8

9

10

11

12

13

S L R

1 2 3

8 7

10 4

10 13

Grammar:1. S’ S2. S L = R3. S R4. L * R5. L id6. R L

LALR(1) Grammars LR(1) parsing tables have many states LALR(1) parsing (Look-Ahead LR) combines LR(1)

states to reduce table size Less powerful than LR(1)

Will not introduce shift-reduce conflicts, because shifts do not use lookaheads

May introduce reduce-reduce conflicts, but seldom do so for grammars of programming languages

SLR and LALR tables for a grammar always have the same number of states, and less than LR(1) tables. Like C, SLR and LALR >100, LR(1) > 1000

Constructing LALR Parsing Tables Two ways

Construction of the LALR parsing table from the sets of LR(1) items. Union the states Requires much space and time

Construction of the LALR parsing table from the sets of LR(0) items Efficient Use in practice.

Example

stateACTION GOTO

c d $ S C

0 s36 s47 1 2

1 acc

2 s36 s47 5

36 s36 s47 89

47 r3 r3 r3

5 r1

89 r2 r2 r2

stateACTION GOTO

c d $ S C

0 s3 s4 1 2

1 acc

2 s6 s7 5

3 s3 s4 8

4 r3 r3

5 r1

6 s6 s7 9

7 r3

8 r2 r2

9 r2LALR(1)

LR(1)

Constructing LALR(1) Parsing Tables Construct sets of LR(1) items

Combine LR(1) sets with sets of items that share the same first part

[L *•R, =][R •L, =] [L •*R, =] [L •id, =]

[L *•R, $][R •L, $][L •*R, $][L •id, $]

I4:

I11:

[L *•R, =/$][R •L, =/$][L •*R, =/$][L •id, =/$]

Shorthandfor two items

in the same set

Example LALR(1) Grammar

Unambiguous LR(1) grammar:S L = R | RL * R | idR L

Augment with S’ S LALR(1) items (next slide)

[S’ •S, $] goto(I0,S)=I1 [S •L=R, $] goto(I0,L)=I2 [S •R, $] goto(I0,R)=I3 [L •*R, =/$] goto(I0,*)=I4 [L •id, =/$] goto(I0,id)=I5 [R •L, $] goto(I0,L)=I2

[S’ S•, $]

[S L•=R, $] goto(I0,=)=I6 [R L•, $]

[S R•, $]

[L *•R, =/$] goto(I4,R)=I7 [R •L, =/$] goto(I4,L)=I9 [L •*R, =/$] goto(I4,*)=I4 [L •id, =/$] goto(I4,id)=I5

[L id•, =/$]

[S L=•R, $] goto(I6,R)=I8 [R •L, $] goto(I6,L)=I9

[L •*R, $] goto(I6,*)=I4 [L •id, $] goto(I6,id)=I5

[L *R•, =/$]

[S L=R•, $]

[R L•, =/$]

I0:

I1:

I2:

I3:

I4:

I5:

I6:

I7:

I8:

I9:

Shorthandfor two items

[R L•, =][R L•, $]

Example LALR(1) Parsing Table

s5 s4

acc

s6 r6

r3

s5 s4

r5 r5

s5 s4

r4 r4

r2

r6 r6

id * = $

0

1

2

3

4

5

6

7

8

9

S L R

1 2 3

9 7

9 8

Grammar:1. S’ S2. S L = R3. S R4. L * R5. L id6. R L

LL, SLR, LR, LALR Summary LL parse tables computed using FIRST/FOLLOW

Nonterminals terminals productions Computed using FIRST/FOLLOW

LR parsing tables computed using closure/goto LR states terminals shift/reduce actions LR states nonterminals goto state transitions

A grammar is LL(1) if its LL(1) parse table has no conflicts SLR if its SLR parse table has no conflicts LR(1) if its LR(1) parse table has no conflicts LALR(1) if its LALR(1) parse table has no conflicts

LL, SLR, LR, LALR Grammars

LL(1)

LR(1)

LR(0)

SLR

LALR(1)

YACC

Yacc or Bisoncompiler

yaccspecification

yacc.y

y.tab.c

inputstream

Ccompiler

a.outoutputstream

y.tab.c

a.out

Documents

Chapter 4-2 Chang Chi-Chung 2007.06.07. Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation) LR(0), SLR, Canonical LR = LR(1), LALR Other