Transcript
Page 1: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

1

CS 1622 Lecture 6 1

CS1622

Lecture 6Introduction to Parsing

CS 1622 Lecture 6 2

2

The Front End

Parser

• Checks the stream of words and their parts of speech (producedby the scanner) for grammatical correctness

• Determines if the input is syntactically well formed

• Guides checking at deeper levels than syntax

• Builds an IR representation of the code

Think of this as the mathematics of diagramming sentences

S o u r c e

c o d eS c a n n e r

I RP a r s e r

E r r o r s

tokens

CS 1622 Lecture 6 3

The Functionality of theParser Input: sequence of tokens from lexer Output: parse tree of the program

parse tree is generated if the input is alegal program

Instead of parse tree, some parsersproduce directly:

abstract syntax tree (AST) + symbol table, or intermediate code, or object code

Page 2: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

2

CS 1622 Lecture 6 4

Comparison with LexicalAnalysis

Parse treeString oftokens

Parser

String oftokens

String ofcharacters

Scanner

OutputInputPhase

CS 1622 Lecture 6 5

Example

E

E

E E

E+

id*

idid

• The program:x * y + zInput to parser:

ID TIMES ID PLUS IDwe’ll write tokens as follows:

id * id + id

• Output of parser:the parse tree

CS 1622 Lecture 6 6

What must parser do? Not all strings of tokens are valid programs

parser must distinguish between valid andinvalid strings of tokens

Parser must expose program structure(associativity and precedence)

• parser must return the parse treeWe need:

A language for describing valid strings of tokens A method for distinguishing valid from invalid

strings of tokens (and for building the parse tree)

Page 3: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

3

CS 1622 Lecture 6 7

Parser

lexicalanalyzer

symboltable

parsersourcetoken

get nexttoken

parsetree

rest offrontend

IR

Parsing = determining whether a string of tokenscan be generated by a grammar

CS 1622 Lecture 6 8

Grammars Precise, easy-to understand description of

syntax Context-free grammars -> efficient parsers

(automatically!) Help in translation and error detection

Eg. Attribute grammars Easier language evolution

Can add new constructs systematically

CS 1622 Lecture 6 9

Syntax Errors Many errors are syntactic or exposed

by parsing eg. Unbalanced ()

Error handling goals: Report errors quickly & accurately Recover quickly (continue parsing after

error) Little overhead on parse time

Page 4: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

4

CS 1622 Lecture 6 10

Error Recovery Panic mode

Discard tokens until synchronization token found (often ‘;’)

Phrase level Local correction: replace a token by another and continue

Error productions Encode commonly expected errors in grammar

Global correction Find closest input string that is in L(G)

Too costly in practice

CS 1622 Lecture 6 11

Context-free Grammars Precise and easy way to specify the

syntactical structure of a programminglanguage

Efficient recognition methods exist Natural specification of many

“recursive” constructs: expr -> expr + expr | term

CS 1622 Lecture 6 12

Context-free GrammarDefinition Terminals T

Symbols which form strings of L(G), G a CFG (= tokens in thescanner), e.g. if, else, id

Nonterminals N Syntactic variables denoting sets of strings of L(G) Impose hierarchical structure (e.g., precedence rules)

Start symbol S (∈ N) Denotes the set of strings of L(G)

Productions P Rules that determine how strings are formed N -> (N|T)*

Page 5: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

5

CS 1622 Lecture 6 13

Why are regular expressionsnot enough?

What programs are generated by?digit+ ( ( “+” | “-” | “*” | “/” ) digit+ )*

no structure! Generates a list rather than a tree

What important properties this regularexpression fails to express?

CS 1622 Lecture 6 14

Why are regular expressionsnot enough?

Write an automaton that accepts strings “a”, “(a)”, “((a))”, and “(((a)))”

cannot do - regular expressions cannot count.

“a”, “(a)”, “((a))”, “(((a)))”, … “(ka)k”

CS 1622 Lecture 6 15

Example: ExpressionGrammarexpr -> expr op exprexpr -> (expr)expr -> - exprexpr -> idop -> +op -> -op -> *op -> /op -> ^

Terminals: {id, +, -, *, /, ^}

Nonterminals {expr, op,}

Start symbol Expr

Page 6: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

6

CS 1622 Lecture 6 16

Notational Conventions Terminals

a,b,c.. +,-,.. ‘,’.’;’ etc 0..9 expr or <expr>

Nonterminals A, B, C .. S start symbol (if present)

or first nonterminal inproduction list

Terminal strings u,v,..

Grammar symbol strings α,β

Productions A -> α

CS 1622 Lecture 6 17

Shorthands & DerivationsE -> E + E | E * E |

(E) | - E | <id> E => - E “E derives -

E” => derives in 1 step =>* derive in n (0..)

steps

CS 1622 Lecture 6 18

More Definitions L(G) language generated by G = set of strings

derived from S S =>+ w : w sentence of G (w string of terminals) S =>+ α : α sentential form of G (string can contain nonterminals) G and G’ are equivalent :⇔ L(G) = L(G’) A language generated by a grammar (of the form

shown) is called a context-free language

Page 7: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

7

CS 1622 Lecture 6 19

ExampleG = ({-,*,(,),<id>}, {E},

E, {E -> E + E, E->E * E , E -> (E) , E-> - E, E -> <id>})

Sentence: -(<id> + <id>)Derivation:E => -E => -(E) =>

-(E+E)=>-(<id>+E) => -(<id> + <id>)• Leftmost derivation i.e.

always replace leftmostnonterminal

• Rightmost derivationanalogously

• Left /right sentential form

CS 1622 Lecture 6 20

Parse Trees

E

- E

( E )

E + E

<id> <id>

E => -E =>-(E) =>-(E+E)=>-(<id>+E) => -(<id> +<id>)

Parse tree =graphical representation

of a derivationignoring replacement

order

CS 1622 Lecture 6 21

Expressive Power CFGs are more powerful than REs

Can express matching () with CFGs Can express most properties desired for programming

languages CFGs cannot express:

Identifiers declared before used L = {wcw|w is in (a|b)*} Parameter checking (#formals = #actuals)

L ={anbmcndm|n ≥ 1, m ≥ 1}

Page 8: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

8

CS 1622 Lecture 6 22

Parsing = determining whether a string of

tokens can be generated by a grammar Two classes based on order in which

parse tree is constructed: Top-down parsing

Start construction at root of parse tree

Bottom-up parsing Start at leaves and proceed to root

CS 1622 Lecture 6 23

Derivations and Parse TreesA derivation is a sequence of productionsA derivation can be drawn as a tree

Start symbol is the tree’s rootFor a production N -> X1…Xn add

X1 … Xn as children to node N

S!!!LLL

1 nXYY!L

X

1 nYYL

CS 1622 Lecture 6 24

Derivation Example S -> E + E | E * E E -> id | (E) Derivation for: id * id + id Derivation for: id + id + id

See board

Page 9: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

9

CS 1622 Lecture 6 25

Notes on Derivations A parse tree has

Terminals at the leaves Non-terminals at the interior nodes

An in-order traversal of the leaves is theoriginal input

The parse tree shows the association ofoperations, the input string does not

CS 1622 Lecture 6 26

Terminals Terminals are called because there are no

rules for replacing them

Once generated, terminals are permanent

Terminals are the tokens of the languagerepresented by the grammar

CS 1622 Lecture 6 27

Left-most and Right-mostDerivations The example is a left-

most derivation At each step, replace the

left-most non-terminal

There is an equivalentnotion of a right-mostderivation

EE*Eid *Eid *E+E id *id+Eid*id+ id

®

®

®

®

®

Page 10: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

10

CS 1622 Lecture 6 28

Right-most Derivation inDetail (1)

EEE+EE+ idE * E + idE * id + idid * id + id

®

®*®*®*®*

CS 1622 Lecture 6 29

Derivations and Parse Trees Note that right-most and left-most

derivations have the same parse tree

The difference is the order in whichbranches are added

CS 1622 Lecture 6 30

Summary of Derivations We are not just interested in whether

s e L(G) We need a parse tree for s

A derivation defines a parse tree But one parse tree may have many derivations

Left-most and right-most derivations areimportant in parser implementation

Page 11: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

11

CS 1622 Lecture 6 31

Ambiguity (Cont.)This string has two parse trees

E

E

E E

E*

id +

idid

E

E

E E

E+

id*

idid

CS 1622 Lecture 6 32

Parse treesQuestion 1:

for each of the two parse trees, find thecorresponding left-most derivation

Question 2: for each of the two parse trees, find the

corresponding right-most derivation

CS 1622 Lecture 6 33

Ambiguity (Cont.) A grammar is ambiguous if for some string

(the following three conditions are equivalent)

it has more than one parse tree if there is more than one right-most derivation if there is more than one left-most derivation

Ambiguity is BAD Leaves meaning of some programs ill-defined

Page 12: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

12

CS 1622 Lecture 6 34

Dealing with Ambiguity There are several ways to handle ambiguity

Most direct method is to rewrite grammarunambiguousl

Enforces precedence of * over + Separate (external of grammar) conflict-resolution

rules E.g., precedence or associativity rules (e.g. via parser

tool declarations) Same idea we saw with scanning: instead of

complicated RE’s use symbol table to recognizekeywords

'''EEE | EEid E' | id | (E)!+!"

CS 1622 Lecture 6 35

Expression Grammars(precedence) Rewrite the grammar

use a different nonterminal for each precedence level start with the lowest precedence (MINUS)

E E - E | E / E | ( E ) | id

rewrite to

E E - E | TT T / T | FF id | ( E )

CS 1622 Lecture 6 36

Exampleparse tree for id – id / id

E E - E | TT T / T | FF id | ( E )

E

E

F F

T

-

id

/

idid

T

FT T

E

Page 13: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

13

CS 1622 Lecture 6 37

More than one parse tree? Attempt to construct a parse tree for id-

id/id that shows the wrong precedence.

Question: Why do you fail to construct this parse

tree?

CS 1622 Lecture 6 38

Associativity The grammar captures operator precedence,

but it is still ambiguous! fails to express that both subtraction and division

are left associative;

e.g., 5-3-2 is equivalent to: ((5-3)-2) and not to:(5-(3-2)).

CS 1622 Lecture 6 39

Recursion A grammar is recursive in nonterminal X if:

X + … X … recall that + means “in one or more steps, X derives a

sequence of symbols that includes an X”

A grammar is left recursive in X if: X + X …

in one or more steps, X derives a sequence of symbolsthat starts with an X

A grammar is right recursive in X if: X + … X

in one or more steps, X derives a sequence of symbolsthat ends with an X

Page 14: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture06.pdf · CS 1622 Lecture 6 3 ... intermediate code, or object code. 2 CS 1622 Lecture 6 4 Comparison

14

CS 1622 Lecture 6 40

How to fix associativity The grammar given above is both left and

right recursive in nonterminals exp and term try at home: write the derivation steps that show

this. To correctly expresses operator associativity:

For left associativity, use left recursion. For right associativity, use right recursion.

Here's the correct grammar:E E – T | TT T / F | FF id | ( E )


Recommended