Parsing Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 3


Parsing

Prepared by

Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida

Programming Language Principles
Lecture 3


Context-Free Grammars

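• Definition: A context-free grammar (CFG) is a quadruple G = (Φ, Σ, P, S), where Φ is the set of nonterminals, Σ is the set of terminals, P is the set of productions, and S ∈ Φ is the start symbol. All productions are of the form A → α, for A ∈ Φ and α ∈ (Φ ∪ Σ)*.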

• Re-writing using grammar rules:

– βAγ => βαγ if A → α is a production (derivation).
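A minimal sketch of the definition above, in Python. The class name `Grammar`, the helper `derive_step`, and the sample grammar E → E + E | i are assumptions chosen for illustration; they are not part of the lecture. The helper performs one re-writing step: it replaces a single nonterminal occurrence with the right-hand side of one of its productions.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    nonterminals: frozenset   # the set of nonterminals
    terminals: frozenset      # the set of terminals
    productions: tuple        # pairs (A, alpha); alpha is a tuple of symbols
    start: str                # the start symbol S

def derive_step(g, sentential, position, production):
    """Rewrite the nonterminal at `position` using `production` (A -> alpha)."""
    lhs, rhs = production
    if production not in g.productions:
        raise ValueError("not a production of the grammar")
    if sentential[position] != lhs:
        raise ValueError("symbol at position is not the production's left-hand side")
    # beta A gamma  =>  beta alpha gamma
    return sentential[:position] + rhs + sentential[position + 1:]

# Illustrative grammar (an assumption of this sketch): E -> E + E | i
G = Grammar(
    nonterminals=frozenset({"E"}),
    terminals=frozenset({"+", "i"}),
    productions=(("E", ("E", "+", "E")), ("E", ("i",))),
    start="E",
)

step1 = derive_step(G, ("E",), 0, ("E", ("E", "+", "E")))
step2 = derive_step(G, step1, 0, ("E", ("i",)))
```

Here `step1` is the sentential form E + E and `step2` is i + E. Tuples are used so that sentential forms stay immutable between steps.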


String Derivations

• Left-most derivation: At each step, the left-most nonterminal is re-written.

• Right-most derivation: At each step, the right-most nonterminal is re-written.
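The two derivation orders can be sketched as follows. The function `derive`, the production numbering, and the grammar E → E + E | i are hypothetical names and choices made for this example only. The same sequence of production choices is applied twice, once always rewriting the left-most nonterminal and once always rewriting the right-most one.

```python
# Productions of an illustrative grammar E -> E + E | i, numbered for reference.
PRODUCTIONS = {
    1: ("E", ["E", "+", "E"]),
    2: ("E", ["i"]),
}
NONTERMINALS = {"E"}

def derive(start, choices, leftmost=True):
    """Apply the productions numbered in `choices`, always rewriting the
    left-most (or right-most) nonterminal. Returns every sentential form."""
    form = [start]
    history = [" ".join(form)]
    for number in choices:
        lhs, rhs = PRODUCTIONS[number]
        positions = [k for k, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"production {number} does not apply at position {pos}")
        form = form[:pos] + rhs + form[pos + 1:]
        history.append(" ".join(form))
    return history

left = derive("E", [1, 2, 2], leftmost=True)
right = derive("E", [1, 2, 2], leftmost=False)
```

Both runs end in the same sentence, i + i, but pass through different intermediate forms: `left` goes through i + E, while `right` goes through E + i.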


Derivation Trees

Derivation trees: Describe re-writes, independently of the order (left-most or right-most).

• Each tree branch matches a production rule in the grammar.


Derivation Trees

Notes:
1) Leaves are terminals.
2) Bottom contour is the sentence.
3) Left recursion causes left branching.
4) Right recursion causes right branching.


Goal of Parsing

• Examine input string, determine whether it's legal.

• Equivalent to building derivation tree.

• Added benefit: tree embodies syntactic structure of input.

• Therefore, tree should be unique.


Ambiguous Grammars

• Definition: A CFG is ambiguous if there exist two different right-most (or left-most, but not both) derivations for some sentence z.

• (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.


Ambiguous Grammars

Classic ambiguities:

– Simultaneous left/right recursion:

    E → E + E
      → i

– Dangling else problem:

    S → if E then S
      → if E then S else S
      →
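The ambiguity of E → E + E | i can be checked mechanically by counting derivation trees. The sketch below is one possible way to do that; the function name `count_trees` is an assumption of this example. It tries every + as the top-level operator and multiplies the counts for the two sides.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count derivation trees of `tokens` under E -> E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E can derive tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0
        total = 0
        # Try every '+' as the top-level operator of E -> E + E.
        for k in range(lo + 1, hi - 1):
            if tokens[k] == "+":
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

one = count_trees("i+i")
two = count_trees("i+i+i")
```

The sentence i+i has a single tree, but i+i+i has two (the first + or the second + can sit at the root), so the grammar is ambiguous by the second definition.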


Operator Precedence and Associativity

• Let’s build a CFG for expressions consisting of:

– elementary identifier i.
– + and - (binary ops) have lowest precedence, and are left associative.
– * and / (binary ops) have middle precedence, and are right associative.
– + and - (unary ops) have highest precedence, and are right associative.


Corresponding Grammar for Expressions

E → E + T        E consists of T's,
  → E - T        separated by -'s and +'s
  → T            (lowest precedence).
T → F * T        T consists of F's,
  → F / T        separated by *'s and /'s
  → F            (next precedence).
F → - F          F consists of a single P,
  → + F          preceded by +'s and -'s
  → P            (next precedence).
P → '(' E ')'    P consists of a parenthesized E,
  → i            or a single i (highest precedence).


Operator Precedence and Associativity

• Operator precedence:
– The lower in the grammar, the higher the precedence.

• Operator associativity:
– Tie breaker for precedence.
– Left recursion in the grammar means
    • left associativity of the operator,
    • left branching in the tree.
– Right recursion in the grammar means
    • right associativity of the operator,
    • right branching in the tree.
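The link between recursion direction and tree shape can be illustrated with a small sketch. The helpers `left_assoc`, `right_assoc`, and `evaluate`, and the use of integer operands with subtraction, are assumptions made for this example. Trees are nested tuples of the form (operator, left, right).

```python
def left_assoc(operands, op):
    """Tree shape produced by a left-recursive rule such as E -> E op T."""
    tree = operands[0]
    for operand in operands[1:]:
        tree = (op, tree, operand)      # grows down the left branch
    return tree

def right_assoc(operands, op):
    """Tree shape produced by a right-recursive rule such as T -> F op T."""
    tree = operands[-1]
    for operand in reversed(operands[:-1]):
        tree = (op, operand, tree)      # grows down the right branch
    return tree

def evaluate(tree):
    """Evaluate a tree whose only operator is binary '-'."""
    if isinstance(tree, int):
        return tree
    _, left, right = tree
    return evaluate(left) - evaluate(right)

left_tree = left_assoc([8, 3, 2], "-")     # (8 - 3) - 2
right_tree = right_assoc([8, 3, 2], "-")   # 8 - (3 - 2)
```

For the same operands, the left-branching tree evaluates to 3 and the right-branching tree to 7, which is why associativity has to be fixed by the grammar.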


Building Derivation Trees

Sample Input : - + i - i * ( i + i ) / i + i

(Human) derivation tree construction:

• Bottom-up.

• On each pass, scan entire expression, process operators with highest precedence (parentheses are highest).

• Lowest precedence operators are last, at the top of tree.


Abstract Syntax Trees

• AST is a condensed version of the derivation tree.

• No noise (intermediate nodes).

• String-to-tree transduction grammar:
– rules of the form A → ω => 's'.
    • Build 's' tree node, with one child per tree from each nonterminal in ω.


Example

E → E + T        => +
  → E - T        => -
  → T
T → F * T        => *
  → F / T        => /
  → F
F → - F          => neg
  → + F          => +
  → P
P → '(' E ')'
  → i            => i
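One way to realize this transduction grammar is a hand-written recursive-descent parser; the lecture does not prescribe that technique at this point, so treat the following as a sketch under that assumption. The function names (`parse`, `parse_E`, and so on) and the nested-tuple tree representation are hypothetical. The left-recursive E rules are implemented as a loop, which still yields left-branching trees; the right-recursive T and F rules are implemented by direct recursion.

```python
def parse(text):
    """Build an AST (nested tuples) for the expression grammar above."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_E():
        # E -> E + T | E - T | T : left recursion, implemented as a loop
        node = parse_T()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            node = (op, node, parse_T())
        return node

    def parse_T():
        # T -> F * T | F / T | F : right recursion
        left = parse_F()
        if peek() in ("*", "/"):
            op = peek()
            advance()
            return (op, left, parse_T())
        return left

    def parse_F():
        # F -> - F => neg | + F => + | P
        if peek() == "-":
            advance()
            return ("neg", parse_F())
        if peek() == "+":
            advance()
            return ("+", parse_F())
        return parse_P()

    def parse_P():
        # P -> '(' E ')' (no node of its own) | i => i
        if peek() == "(":
            advance()
            node = parse_E()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if peek() == "i":
            advance()
            return "i"
        raise SyntaxError(f"unexpected token {peek()!r}")

    tree = parse_E()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

ast = parse("- + i - i * ( i + i ) / i + i")
```

Rules without a `=> 's'` part (E → T, T → F, F → P, and the parenthesized P) return the child tree unchanged, which is what removes the intermediate nodes.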


Sample Input : - + i - i * ( i + i ) / i + i


String-to-Tree Transduction

• We transduce from vocabulary of input symbols, to vocabulary of tree node names.

• Could eliminate construction of unary + node, anticipating semantics.

F → - F          => neg
  → + F          // no more unary + node
  → P


The Game of Syntactic Dominoes

• The grammar:

E → E+T        T → P*T        P → (E)
  → T            → P            → i

• The playing pieces: An arbitrary supply of each piece (one per grammar rule).

• The game board:
– Start domino at the top.
– Bottom dominoes are the "input."


The Game of Syntactic Dominoes

• Game rules:
– Add game pieces to the board.
– Match the flat parts and the symbols.
– Lines are infinitely elastic.

• Object of the game:
– Connect start domino with the input dominoes.
– Leave no unmatched flat parts.


Parsing Strategies

• Same as for the game of syntactic dominoes.

– “Top-down” parsing: start at the start symbol, work toward the input string.

– “Bottom-up” parsing: start at the input string, work towards the goal symbol.

• In either strategy, can process the input left-to-right or right-to-left.


Top-Down Parsing

• Attempt a left-most derivation, by predicting the re-write that will match the remaining input.

• Use a string (a stack, really) from which the input can be derived.


Top-Down Parsing

Start with S on the stack. At every step, two alternatives:

1) α (the stack) begins with a terminal t. Match t against the first input symbol.

2) α begins with a nonterminal A. Consult an OPF (Omniscient Parsing Function) to determine which production for A would lead to a match with the first symbol of the input.

The OPF does the “predicting” in such a predictive parser.


Classical Top-Down Parsing Algorithm

Push (Stack, S);
while not Empty (Stack) do
    if Top(Stack) is a terminal
        then if Top(Stack) = Head(input)
                 then input := tail(input)
                      Pop(Stack)
                 else error (Stack, input)
        else P := OPF (Stack, input)
             Push (Pop(Stack), RHS(P))
od
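The algorithm above can be sketched in Python as follows. Several things here are assumptions of the sketch rather than part of the lecture: the OPF is approximated by a lookup table keyed on (stack top, first input symbol), the grammar is the small one S → A, A → bAd | (empty), and `$` is used as an end-of-input marker. The names `OPF`, `TERMINALS`, and `parse` are hypothetical.

```python
END = "$"  # assumed end-of-input marker

# Hypothetical OPF for the grammar S -> A, A -> b A d | (empty):
# maps (stack top, first input symbol) to the right-hand side to predict.
OPF = {
    ("S", "b"): ["A"],
    ("S", END): ["A"],
    ("A", "b"): ["b", "A", "d"],
    ("A", "d"): [],
    ("A", END): [],
}
TERMINALS = {"b", "d"}

def parse(text, start="S"):
    """Return the list of predicted productions, or raise SyntaxError."""
    tokens = list(text) + [END]
    stack = [start]              # top of the stack is the end of the list
    pos = 0
    trace = []
    while stack:
        top = stack[-1]
        head = tokens[pos]
        if top in TERMINALS:
            if top != head:
                raise SyntaxError(f"expected {top!r}, found {head!r}")
            stack.pop()          # match: consume the terminal
            pos += 1
        else:
            if (top, head) not in OPF:
                raise SyntaxError(f"no production for {top!r} on {head!r}")
            rhs = OPF[(top, head)]
            stack.pop()
            stack.extend(reversed(rhs))   # leftmost RHS symbol ends up on top
            trace.append((top, rhs))
    if tokens[pos] != END:
        raise SyntaxError("input remains after the stack emptied")
    return trace

trace = parse("bbdd")
```

The returned trace lists the productions in the order they were predicted, which is the order of a left-most derivation of the input.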


Top-Down Parsing

• Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1).

• We must define OPF (A,t), where A is the top element of the stack, and t is the first symbol on the input.

• Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).


LL(1) Grammars

Definition: A CFG G is LL(1) (Left-to-right, Left-most, one-symbol lookahead) iff for all nonterminals A, and for all productions A → α, A → β with α ≠ β,

Select (A → α) ∩ Select (A → β) = ∅

• Previous example: Grammar is not LL(1).

• More later on why, and what to do about it.


Example:

Production       Select set
S → A            {b, ⊥}
A → bAd          {b}
  → ε            {d, ⊥}

Disjoint!

Grammar is LL(1)!

Parse table (⊥ marks the end of the input):

        d            b              ⊥
S                    S → A          S → A
A       A → ε        A → bAd        A → ε

(At most) one production per entry.
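The select sets and the disjointness test for this example can be computed with a short fixed-point iteration. This is a sketch under assumptions: it uses the usual First/Follow construction (the lecture has not yet said how Select is computed), `$` stands in for the end-of-input marker, and the names `select_sets`, `first_of`, `is_ll1`, and `TABLE` are hypothetical.

```python
END = "$"       # assumed end-of-input marker
GRAMMAR = [     # (left-hand side, right-hand side); () is the empty string
    ("S", ("A",)),
    ("A", ("b", "A", "d")),
    ("A", ()),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_of(symbols, first, nullable):
    """First set of a symbol string, plus whether the string can vanish."""
    result = set()
    for sym in symbols:
        if sym not in NONTERMINALS:
            result.add(sym)
            return result, False
        result |= first[sym]
        if sym not in nullable:
            return result, False
    return result, True

def select_sets(grammar, start="S"):
    nullable = set()
    first = {a: set() for a in NONTERMINALS}
    follow = {a: set() for a in NONTERMINALS}
    follow[start].add(END)
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for lhs, rhs in grammar:
            f, vanishes = first_of(rhs, first, nullable)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            if vanishes and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            for k, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    f, vanishes = first_of(rhs[k + 1:], first, nullable)
                    new = f | (follow[lhs] if vanishes else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    select = {}
    for lhs, rhs in grammar:
        f, vanishes = first_of(rhs, first, nullable)
        select[(lhs, rhs)] = f | (follow[lhs] if vanishes else set())
    return select

def is_ll1(select):
    """True iff Select sets of productions sharing a left-hand side are disjoint."""
    seen = {}
    for (lhs, _), terminals in select.items():
        if seen.setdefault(lhs, set()) & terminals:
            return False
        seen[lhs] |= terminals
    return True

SELECT = select_sets(GRAMMAR)
# One table entry per (nonterminal, lookahead) pair in a select set.
TABLE = {(lhs, t): rhs for (lhs, rhs), ts in SELECT.items() for t in ts}
```

Because the select sets are disjoint, building `TABLE` never overwrites an entry, which is the "(at most) one production per entry" property.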
