1 CIS 461 Compiler Design & Construction Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #12 Parsing 4


1

CIS 461 Compiler Design & Construction

Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and

Linda Torczon

Lecture-Module #12: Parsing 4

2

Parsing Techniques

Top-down parsers (LL(1), recursive descent)

• Start at the root of the parse tree from the start symbol and grow toward leaves (similar to a derivation)

• Pick a production and try to match the input

• A bad “pick” may require backtracking

• Some grammars are backtrack-free (predictive parsing)

Bottom-up parsers (LR(1), operator precedence)

• Start at the leaves and grow toward root

• We can think of the process as reducing the input string to the start symbol

• At each reduction step a particular substring matching the right-side of a production is replaced by the symbol on the left-side of the production

• Bottom-up parsers handle a large class of grammars

3

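[Figure: Top-down Parsing vs. Bottom-up Parsing. Both use a left-to-right scan of the input string with a lookahead. Top-down parsing follows a left-most derivation: the tree grows from the start symbol S, and the fringe of the parse tree is matched against the lookahead. Bottom-up parsing follows a right-most derivation in reverse: the upper fringe of the parse tree is built up toward S.]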

4

Handle-pruning, Bottom-up Parsers

The process of discovering a handle & reducing it to the appropriate left-hand side is called handle pruning

Handle pruning forms the basis for a bottom-up parsing method

To construct a rightmost derivation

S = γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn-1 ⇒ γn = w

Apply the following simple algorithm

for i ← n to 1 by -1

    Find the handle < Ai → βi , ki > in γi

    Replace βi with Ai to generate γi-1

5

Example

The expression grammar

1  S      → Expr
2  Expr   → Expr + Term
3         | Expr – Term
4         | Term
5  Term   → Term * Factor
6         | Term / Factor
7         | Factor
8  Factor → num
9         | id

Handles for rightmost derivation of input string:

x – 2 * y

Sentential Form                 Handle (Prod’n , Pos’n)
S                               —
Expr                            1,1
Expr – Term                     3,3
Expr – Term * Factor            5,5
Expr – Term * <id,y>            9,5
Expr – Factor * <id,y>          7,3
Expr – <num,2> * <id,y>         8,3
Term – <num,2> * <id,y>         4,1
Factor – <num,2> * <id,y>       7,1
<id,x> – <num,2> * <id,y>       9,1
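The handle table can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: it assumes the position in each handle is the 1-based index of the handle's right end, and it writes tokens by category (id, num) with an ASCII "-" for the minus sign. The names GRAMMAR, HANDLES and prune are hypothetical.

```python
# Expression grammar from the slide: production number -> (lhs, rhs).
GRAMMAR = {
    1: ("S", ["Expr"]),
    2: ("Expr", ["Expr", "+", "Term"]),
    3: ("Expr", ["Expr", "-", "Term"]),
    4: ("Expr", ["Term"]),
    5: ("Term", ["Term", "*", "Factor"]),
    6: ("Term", ["Term", "/", "Factor"]),
    7: ("Term", ["Factor"]),
    8: ("Factor", ["num"]),
    9: ("Factor", ["id"]),
}

# Handles read from the bottom of the table upward: (production, position).
# Assumption: position is the 1-based index of the handle's right end.
HANDLES = [(9, 1), (7, 1), (4, 1), (8, 3), (7, 3), (9, 5), (5, 5), (3, 3), (1, 1)]


def prune(form, handles):
    """Apply each handle in turn and return every sentential form produced."""
    form = list(form)
    trace = [" ".join(form)]
    for prod, pos in handles:
        lhs, rhs = GRAMMAR[prod]
        start = pos - len(rhs)
        # The handle must match the right-hand side exactly.
        assert form[start:pos] == rhs, (prod, pos, form)
        form[start:pos] = [lhs]  # replace the handle with the left-hand side
        trace.append(" ".join(form))
    return trace


TRACE = prune(["id", "-", "num", "*", "id"], HANDLES)
for line in TRACE:
    print(line)
```

Each printed line corresponds to one row of the table, read bottom to top, ending with S.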

6

Handle-pruning, Bottom-up Parsers

One implementation technique is the shift-reduce parser

push $
lookahead = get_next_token( )
repeat until (top of stack == start symbol and lookahead == $)
    if the top of the stack is a handle A → β
        then /* reduce β to A */
            pop |β| symbols off the stack
            push A onto the stack
    else if (lookahead ≠ $)
        then /* shift */
            push lookahead
            lookahead = get_next_token( )
    else /* error */
        report a syntax error

How do errors show up?

• failure to find a handle

• hitting $ and needing to shift (final else clause)

Either generates an error

7

Example, Corresponding Parse Tree

[Figure: parse tree for x – 2 * y, with root S, interior nodes Expr, Term and Fact., and leaves <id,x>, –, <num,2>, * and <id,y>.]

1. Shift until top-of-stack is the right end of a handle
2. Pop the left end of the handle & reduce

5 shifts + 9 reduces + 1 accept

8

Shift-reduce Parsing

Shift-reduce parsers are easily built and easily understood

A shift-reduce parser has just four actions

• Shift — next word is shifted onto the stack

• Reduce — right end of handle is at top of stack

Locate left end of handle within the stack

Pop handle off stack & push appropriate lhs

• Accept — stop parsing & report success

• Error — call an error reporting/recovery routine

Accept & Error are simple

Shift is just a push and a call to the scanner

Reduce takes |rhs| pops & 1 push

If handle-finding requires state, put it in the stack

Handle finding is key

• handle is on stack

• finite set of handles

use a DFA!

9

LR Parsers

• LR(k) parsers are table-driven, bottom-up, shift-reduce parsers that use a limited right context (k-token lookahead) for handle recognition

• LR(k): Left-to-right scan of the input, Rightmost derivation in reverse with k-token lookahead

A grammar is LR(k) if, given a rightmost derivation

S = γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn-1 ⇒ γn = sentence

We can

1. isolate the handle of each right-sentential form γi , and

2. determine the production by which to reduce,

by scanning γi from left to right, going at most k symbols beyond the right end of the handle of γi

10

LR Parsers

A table-driven LR parser looks like

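[Figure: source code feeds the Scanner, which feeds the Table-driven Parser; the parser uses a Stack and the ACTION & GOTO Tables and produces IR. A Parser Generator builds the tables from the grammar.]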

11

LR Shift-Reduce Parsers

push($);    // $ is the end-of-file symbol
push(s0);   // s0 is the start state of the DFA that recognizes handles
lookahead = get_next_token();
repeat forever
    s = top_of_stack();
    if ( ACTION[s,lookahead] == reduce A → β ) then
        pop 2*|β| symbols;
        s = top_of_stack();
        push(A);
        push(GOTO[s,A]);
    else if ( ACTION[s,lookahead] == shift si ) then
        push(lookahead);
        push(si);
        lookahead = get_next_token();
    else if ( ACTION[s,lookahead] == accept and lookahead == $ ) then
        return success;
    else error();

The skeleton parser

• uses ACTION & GOTO

• does |words| shifts

• does |derivation| reductions

• does 1 accept

12

LR Parsers (parse tables)

To make a parser for L(G), we need a set of tables

The grammar

1  S → Z
2  Z → Z z
3    | z

The tables

ACTION
State   $          z
0       —          shift 2
1       accept     shift 3
2       reduce 3   reduce 3
3       reduce 2   reduce 2

GOTO
State   Z
0       1
1
2
3

13

Example Parses

The string “z”

Stack               Input    Action
$ s0                z $      shift 2
$ s0 z s2           $        reduce 3
$ s0 Z s1           $        accept

The string “zz”

Stack               Input    Action
$ s0                z z $    shift 2
$ s0 z s2           z $      reduce 3
$ s0 Z s1           z $      shift 3
$ s0 Z s1 z s3      $        reduce 2
$ s0 Z s1           $        accept
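The two traces can be reproduced by running the skeleton parser on the ACTION and GOTO tables for this grammar. The following is a minimal Python sketch, not part of the original slides: the dictionary encoding of the tables and the names PRODUCTIONS and parse are assumptions made for illustration.

```python
# Production number -> (lhs, length of rhs), for 1 S -> Z, 2 Z -> Z z, 3 Z -> z.
PRODUCTIONS = {1: ("S", 1), 2: ("Z", 2), 3: ("Z", 1)}

# ACTION[state, lookahead]; missing entries are errors.
ACTION = {
    (0, "z"): ("shift", 2),
    (1, "$"): ("accept", None),
    (1, "z"): ("shift", 3),
    (2, "$"): ("reduce", 3),
    (2, "z"): ("reduce", 3),
    (3, "$"): ("reduce", 2),
    (3, "z"): ("reduce", 2),
}

# GOTO[state, nonterminal].
GOTO = {(0, "Z"): 1}


def parse(tokens):
    """Run the skeleton LR parser; return the list of actions taken."""
    tokens = list(tokens) + ["$"]  # $ marks end of input
    stack = ["$", 0]               # symbols and states alternate on the stack
    pos = 0
    actions = []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, lookahead), ("error", None))
        if kind == "shift":
            stack += [lookahead, arg]
            pos += 1
            actions.append(f"shift {arg}")
        elif kind == "reduce":
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[len(stack) - 2 * rhs_len:]  # pop 2*|rhs| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            actions.append(f"reduce {arg}")
        elif kind == "accept":
            actions.append("accept")
            return actions
        else:
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")


print(parse(["z"]))
print(parse(["z", "z"]))
```

The returned action lists match the Action columns of the two traces. An empty input raises SyntaxError because ACTION[0, $] has no entry.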

14

LR Parsers

How does this LR stuff work?

• Unambiguous grammar ⇒ unique rightmost derivation

• Keep upper fringe on a stack
– All active handles include TOS
– Shift inputs until TOS is right end of a handle

• Language of handles is regular
– Build a handle-recognizing DFA
– ACTION & GOTO tables encode the DFA

• To match subterms, recurse and leave DFA’s state on stack

• Final states of the DFA correspond to reduce actions
– New state is GOTO[state at TOS , lhs]
– For Z, this takes the DFA to S1

[Figure: Control DFA for the simple example. S0 goes to S2 on z and to S1 on Z; S1 goes to S3 on z. S2 and S3 are each marked as a reduce action.]

15

Building LR Parsers

How do we generate the ACTION and GOTO tables?

• Use the grammar to build a model of the handle-recognizing DFA

• Use the DFA model to build ACTION & GOTO tables

• If construction succeeds, the grammar is LR

How do we build the handle-recognizing DFA?

• Encode the set of productions that can be used as handles in the DFA state: Use LR(k) items

• Use two functions goto( s, X ) and closure( s )

– goto() is analogous to move() in the NFA to DFA conversion

– closure() is analogous to ε-closure

• Build up the states and transition functions of the DFA

• Use this information to fill in the ACTION and GOTO tables

16

LR(k) items

An LR(k) item is a pair [A , B], where

A is a production with a • at some position in the rhs

B is a lookahead string of length ≤ k (terminal symbols or $)

Examples: [α → • β γ δ , a], [α → β • γ δ , a], [α → β γ • δ , a], & [α → β γ δ • , a]

The • in an item indicates the position of the top of the stack

• LR(0) items [ α → β • γ δ ] (no lookahead symbol)

• LR(1) items [ α → β • γ δ , a ] (one token lookahead)

• LR(2) items [ α → β • γ δ , a b ] (two token lookahead) ...

17

LR(k) items

The • in an item indicates the position of the top of the stack

[α → • β γ , a] means that the input seen so far is consistent with the use of α → β γ immediately after the symbol on top of the stack

[α → β • γ , a] means that the input seen so far is consistent with the use of α → β γ at this point in the parse, and that the parser has already recognized β.

[α → β γ • , a] means that the parser has seen β γ, and that a lookahead a is consistent with reducing to α (for LR(k) parsers a is a string of terminal symbols of length k)

The table construction algorithm uses items to represent valid configurations of an LR(1) parser

18

LR(1) Items

The production α → β γ δ, with lookahead a, generates 4 items

[α → • β γ δ , a], [α → β • γ δ , a], [α → β γ • δ , a], & [α → β γ δ • , a]

The set of LR(1) items for a grammar is finite
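Finiteness is easy to see by enumeration: each production contributes one item per dot position per lookahead. The following is a minimal Python sketch, not part of the original slides: it uses the small grammar S → Z, Z → Z z | z from the parse-table example, with lookaheads $ and z, and the names GRAMMAR, LOOKAHEADS and lr1_items are hypothetical.

```python
from itertools import product

# Productions of the small example grammar as (lhs, rhs) pairs.
GRAMMAR = [("S", ("Z",)), ("Z", ("Z", "z")), ("Z", ("z",))]
LOOKAHEADS = ("$", "z")  # the terminal z plus the end-of-input marker


def lr1_items(grammar, lookaheads):
    """List every LR(1) item: one per production, dot position and lookahead."""
    items = []
    for (lhs, rhs), lookahead in product(grammar, lookaheads):
        for dot in range(len(rhs) + 1):  # a rhs of length n has n + 1 dot positions
            body = " ".join(rhs[:dot] + ("•",) + rhs[dot:])
            items.append(f"[{lhs} → {body}, {lookahead}]")
    return items


ITEMS = lr1_items(GRAMMAR, LOOKAHEADS)
for item in ITEMS:
    print(item)
print(len(ITEMS), "items")
```

The grammar has 2 + 3 + 2 = 7 dot positions and 2 lookaheads, so the enumeration yields 14 items. Only some of them appear in the states the construction actually builds.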

What’s the point of all these lookahead symbols?

• Carry them along to choose correct reduction

• Lookaheads are bookkeeping, unless item has • at right end

– Has no direct use in [α → β • γ , a]

– In [α → β • , a], a lookahead of a implies a reduction by α → β

– For { [α → β • , a], [γ → δ • η , b] }

lookahead = a ⇒ reduce to α;

lookahead ∈ FIRST(η) ⇒ shift

Limited right context is enough to pick the actions

19

Back to Finding Handles

Parser in a state where the stack (the fringe) was

Expr – Term

With lookahead of *

How did it choose to expand Term rather than reduce to Expr?

• Lookahead symbol is the key

• With lookahead of + or –, parser should reduce to Expr

• With lookahead of * or /, parser should shift

• Parser uses lookahead to decide

• All this context from the grammar is encoded in the handle-recognizing mechanism

20

Back to x - 2 * y

1. Shift until TOS is the right end of a handle
2. Find the left end of the handle & reduce

[Figure: the parse of x – 2 * y, annotated with the points marked “shift here” and “reduce here”.]

21

LR(1) Table Construction

High-level overview

1 Build the handle-recognizing DFA (aka Canonical Collection of sets of LR(1) items), C = { I0 , I1 , ... , In }

a Introduce a new start symbol S’ which has only one production

S’ → S

b Initial state, I0 should include

• [S’ → • S , $], along with any equivalent items

• Derive equivalent items as closure( I0 )

c Repeatedly compute, for each Ik , and each grammar symbol X, goto(Ik , X)

• If the set is not already in the collection, add it

• Record all the transitions created by goto( )

This eventually reaches a fixed point

2 Fill in the ACTION and GOTO tables using the DFA

The canonical collection completely encodes the transition diagram for the handle-finding DFA

22

Computing Closures

closure(I) adds all the items implied by items already in I

• Any item [α → β • γ δ , a] implies [γ → • η , x] for each production with γ on the lhs, and x ∈ FIRST(δ a)

• Since β γ δ is valid, any way to derive β γ δ is valid, too

The algorithm

Closure( I )
    while ( I is still changing )
        for each item [α → β • γ δ , a] ∈ I
            for each production γ → η ∈ P
                for each terminal b ∈ FIRST(δ a)
                    if [γ → • η , b] ∉ I
                        then add [γ → • η , b] to I

Fixpoint computation

23

Example Grammar

1  S → Z
2  Z → Z z
3    | z

Initial step builds the item [S → • Z , $] and takes its closure( )

Closure( [S → • Z , $] )

Item                From
[S → • Z , $]       Original item
[Z → • Z z , $]     1, δ a is $
[Z → • z , $]       1, δ a is $
[Z → • Z z , z]     2, δ a is z $
[Z → • z , z]       2, δ a is z $

So, initial state s0 is { [S → • Z , $], [Z → • Z z , $], [Z → • z , $], [Z → • Z z , z], [Z → • z , z] }
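The closure computation for this grammar can be checked with a short program. The following is a minimal Python sketch, not part of the original slides. It represents an item as a tuple (production number, dot position, lookahead), and it assumes the grammar has no ε-productions, so FIRST of a string is just FIRST of its first symbol. The names first and closure are hypothetical helpers.

```python
# Example grammar: production number -> (lhs, rhs). No epsilon productions.
GRAMMAR = {1: ("S", ("Z",)), 2: ("Z", ("Z", "z")), 3: ("Z", ("z",))}
NONTERMINALS = {"S", "Z"}


def first(symbol, seen=()):
    """FIRST of a single symbol, assuming no epsilon productions."""
    if symbol not in NONTERMINALS:
        return {symbol}  # a terminal (or $) starts only itself
    seen = seen + (symbol,)  # guards against left recursion such as Z -> Z z
    result = set()
    for lhs, rhs in GRAMMAR.values():
        if lhs == symbol and rhs[0] not in seen:
            result |= first(rhs[0], seen)
    return result


def closure(items):
    """Fixpoint: add every item implied by an item already in the set."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for prod, dot, lookahead in list(items):
            rhs = GRAMMAR[prod][1]
            if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
                continue  # dot at the end, or in front of a terminal
            rest = rhs[dot + 1:] + (lookahead,)  # the string written as "delta a"
            for number, (lhs, _) in GRAMMAR.items():
                if lhs != rhs[dot]:
                    continue
                for b in first(rest[0]):
                    if (number, 0, b) not in items:
                        items.add((number, 0, b))
                        changed = True
    return frozenset(items)


S0 = closure({(1, 0, "$")})
print(sorted(S0))
```

The result contains the five items listed in the table: productions 2 and 3 with the dot at position 0, each under lookaheads $ and z, plus the original item.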

24

Computing Gotos

goto(I , x) computes the state that the parser would reach if it recognized an x while in state I

• goto( { [α → β • x δ , a] }, x ) produces [α → β x • δ , a]

• It also includes closure( [α → β x • δ , a] ) to fill out the state

The algorithm

Goto( I, x )
    new = Ø
    for each [α → β • x δ , a] ∈ I
        new = new ∪ [α → β x • δ , a]
    return closure(new)

• Not a fixpoint method
• Uses closure

25

Example Grammar

s0 is { [S → • Z , $], [Z → • Z z , $], [Z → • z , $], [Z → • Z z , z], [Z → • z , z] }

goto( S0 , z )

• Loop produces

Item              From
[Z → z • , $]     Item 3 in s0
[Z → z • , z]     Item 5 in s0

• Closure adds nothing since • is at end of rhs in each item

In the construction, this produces s2

{ [Z → z • , {$ , z}] }

New, but obvious, notation for two distinct items

[Z → z • , $] and [Z → z • , z]
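Putting closure and goto together gives the whole construction for this grammar. The following is a minimal Python sketch, not part of the original slides. Items are tuples (production number, dot position, lookahead), the grammar is assumed to have no ε-productions, and production 1 is treated as the goal production instead of adding S’. The table-filling rules used are the standard ones (shift on terminal transitions, reduce on completed items, accept on the completed goal item), which the slides do not spell out. All function names are hypothetical; closure is repeated here so the block stands alone.

```python
GRAMMAR = {1: ("S", ("Z",)), 2: ("Z", ("Z", "z")), 3: ("Z", ("z",))}
NONTERMINALS = {"S", "Z"}
SYMBOLS = ("Z", "z")  # exploration order; chosen to reproduce the slide numbering


def first(symbol, seen=()):
    """FIRST of one symbol, assuming no epsilon productions."""
    if symbol not in NONTERMINALS:
        return {symbol}
    seen = seen + (symbol,)
    return {t for lhs, rhs in GRAMMAR.values()
            if lhs == symbol and rhs[0] not in seen
            for t in first(rhs[0], seen)}


def closure(items):
    items, changed = set(items), True
    while changed:
        changed = False
        for prod, dot, la in list(items):
            rhs = GRAMMAR[prod][1]
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                rest = rhs[dot + 1:] + (la,)
                new = {(n, 0, b) for n, (lhs, _) in GRAMMAR.items()
                       if lhs == rhs[dot] for b in first(rest[0])}
                if not new <= items:
                    items |= new
                    changed = True
    return frozenset(items)


def goto(items, x):
    """Move the dot over x in every item that allows it, then close the set."""
    moved = {(p, d + 1, la) for p, d, la in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x}
    return closure(moved)


def canonical_collection():
    states, transitions, i = [closure({(1, 0, "$")})], {}, 0
    while i < len(states):  # states grows until a fixed point is reached
        for x in SYMBOLS:
            target = goto(states[i], x)
            if target:
                if target not in states:
                    states.append(target)
                transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions


def fill_tables(states, transitions):
    action, goto_table = {}, {}

    def put(key, value):
        if action.setdefault(key, value) != value:
            raise ValueError(f"conflict at {key}: grammar is not LR(1)")

    for (i, x), j in transitions.items():
        if x in NONTERMINALS:
            goto_table[(i, x)] = j
        else:
            put((i, x), ("shift", j))
    for i, state in enumerate(states):
        for prod, dot, la in state:
            if dot == len(GRAMMAR[prod][1]):  # completed item
                put((i, la), ("accept", None) if prod == 1 else ("reduce", prod))
    return action, goto_table


STATES, TRANSITIONS = canonical_collection()
ACTION, GOTO = fill_tables(STATES, TRANSITIONS)
print(len(STATES), "states;", TRANSITIONS)
```

With this exploration order the sketch finds four states, and the resulting ACTION and GOTO entries agree with the parse tables shown for this grammar. A different order would give the same DFA with different state numbers.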