CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1


CMSC 330: Organization of Programming Languages

Context Free Grammars and Parsing

CMSC 330 - Spring 2013 1


Recall: Architecture of Compilers, Interpreters

[Figure: compiler / interpreter architecture, showing Source, Front End (Parser, Static Analyzer), Intermediate Representation, and Back End]


Front End

• Responsible for turning source code (sequence of symbols) into a representation of program structure

• Parser generates parse trees, which the static analyzer processes

[Figure: front end, showing Source, Parser, Parse Tree, Static Analyzer, and Intermediate Representation]


Parser Architecture

• Scanner / lexer converts sequences of symbols into tokens (keywords, variable names, operators, numbers, etc.)

• Parser (yes, a parser contains a parser!) converts sequences of tokens into a parse tree; implements a context-free grammar
  ◦ We can't do everything as a regular expression – not powerful enough

[Figure: parser, showing Source, Scanner, Token Stream, Parser, and Parse Tree]
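
The scanner stage can be sketched in a few lines of Python. This is only an illustration: the token classes (`NUMBER`, `IDENT`, `OP`) and the function name `scan` are assumptions made for this example, not part of the course material.

```python
import re

# Hypothetical token classes for illustration; a real language defines its own.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=(){};]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Convert a string of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":          # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Each token class here is a regular expression, which is why scanning can be done without a full grammar; the nesting structure is left to the parser.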


Context Free Grammar (CFG)

• A way of describing sets of strings (= languages)
  ◦ The notation [[S]] denotes the language of strings defined by S

• Example grammar: S → 0S | 1S | ε
  ◦ String s' ∊ [[S]] iff s' = ε, or ∃s ∊ [[S]] such that s' = 0s or s' = 1s

• Grammar is same as regular expression (0|1)*
  ◦ Generates / accepts the same set of strings
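
As a rough illustration, the recursive definition of [[S]] can be transcribed almost word for word into Python and compared with the regular expression (0|1)*. The function name `in_language` is an assumption made for this sketch.

```python
import re

def in_language(s):
    """True iff s is in [[S]] for S -> 0S | 1S | ε, following the definition directly."""
    if s == "":                      # s' = ε
        return True
    if s[0] in "01":                 # s' = 0s or s' = 1s ...
        return in_language(s[1:])    # ... for some s in [[S]]
    return False

def matches_regex(s):
    """True iff s matches the regular expression (0|1)*."""
    return re.fullmatch(r"(0|1)*", s) is not None
```

On any input string the two functions should agree, which is what it means for the grammar and the regular expression to describe the same language.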


Context-Free Grammars (CFGs)

• But CFGs can do what regexps cannot
  ◦ S → ( S ) | ε // represents balanced pairs of ( )'s

• In fact, CFGs subsume REs, DFAs, NFAs
  ◦ There is a CFG that generates any regular language
  ◦ But REs are a better notation for regular languages

• CFGs can specify programming language syntax
  ◦ CFGs (mostly) describe the parsing process
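
A minimal Python sketch of a recognizer for S → ( S ) | ε, assuming the input is a plain string of parenthesis characters; the function name `nested` is made up for this illustration. Note that this particular grammar generates only nested pairs such as (()); a string like ()() is balanced but is not derivable from it.

```python
def nested(s):
    """True iff s is generated by S -> ( S ) | ε, i.e. n '(' followed by n ')'."""
    if s == "":                                        # S -> ε
        return True
    if len(s) >= 2 and s[0] == "(" and s[-1] == ")":   # S -> ( S )
        return nested(s[1:-1])
    return False
```

The recursion mirrors the production: each call strips one matching outer pair, which requires unbounded counting that no regular expression can express.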


Formal Definition: Context-Free Grammar

• A CFG G is a 4-tuple (Σ, N, P, S)
  ◦ Σ – alphabet (finite set of symbols, or terminals)
    ▪ Often written in lowercase
  ◦ N – a finite, nonempty set of nonterminal symbols
    ▪ Often written in uppercase
    ▪ It must be that N ∩ Σ = ∅
  ◦ P – a set of productions of the form N → (Σ|N)*
    ▪ Informally: the nonterminal can be replaced by the string of zero or more terminals / nonterminals to the right of the →
    ▪ Can think of productions as rewriting rules (more later)
  ◦ S ∊ N – the start symbol
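
One possible way to write the 4-tuple down as data, sketched in Python. The class name `CFG`, its field names, and the choice of an empty tuple to stand for ε are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    terminals: frozenset      # Σ
    nonterminals: frozenset   # N
    productions: tuple        # P, as (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str                # S

    def __post_init__(self):
        # Check the side conditions from the formal definition.
        if not self.nonterminals:
            raise ValueError("N must be nonempty")
        if self.terminals & self.nonterminals:
            raise ValueError("N and Σ must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("S must be a member of N")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"{lhs!r} is not a nonterminal")
            if any(sym not in symbols for sym in rhs):
                raise ValueError(f"unknown symbol in {rhs!r}")

# S -> 0S | 1S | ε  (the empty tuple stands for ε)
BINARY = CFG(
    terminals=frozenset({"0", "1"}),
    nonterminals=frozenset({"S"}),
    productions=(("S", ("0", "S")), ("S", ("1", "S")), ("S", ())),
    start="S",
)
```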


Backus-Naur Form

• Context-free grammar production rules are also called Backus-Naur Form or BNF
  ◦ A production like A → B c D is written in BNF as <A> ::= <B> c <D>
    (nonterminals written with angle brackets and ::= instead of →)
  ◦ Often used to describe language syntax

• BNF was designed by
  ◦ John Backus
    ▪ Chair of the Algol committee in the early 1960s
  ◦ Peter Naur
    ▪ Secretary of the committee, who used this notation to describe Algol in 1962


Generating strings

• We can think of a grammar as generating strings by rewriting

• Example grammar: S → 0S | 1S | ε
    S ⇒ 0S      // using S → 0S
      ⇒ 01S     // using S → 1S
      ⇒ 011S    // using S → 1S
      ⇒ 011     // using S → ε
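
The rewriting steps can be reproduced mechanically. The sketch below is specific to the grammar S → 0S | 1S | ε, and the function name `derive` is an assumption for this example; it returns each intermediate string of the derivation.

```python
def derive(target):
    """Return the successive strings of a derivation of target from S -> 0S | 1S | ε."""
    steps = ["S"]
    prefix = ""
    for ch in target:
        if ch not in "01":
            raise ValueError(f"{ch!r} is not in the alphabet")
        prefix += ch
        steps.append(prefix + "S")   # apply S -> 0S or S -> 1S
    steps.append(prefix)             # apply S -> ε
    return steps
```

Joining the result of `derive("011")` with " ⇒ " gives the same chain of rewrites as the worked example.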


Accepting strings (informally)

• Determining if s ∈ [[S]] is called acceptance: the goal is to find a rewriting from S that yields s
  ◦ 011 ∈ [[S]] according to the previous rewriting
  ◦ A rewriting is some sequence of productions (rewrites) applied starting at the start symbol

• Terminology
  ◦ Such a sequence of rewrites is a derivation or parse
  ◦ Discovering the derivation is called parsing


Derivations

• Notation
    ⇒ indicates a derivation of one step
    ⇒+ indicates a derivation of one or more steps
    ⇒* indicates a derivation of zero or more steps

• Example
  ◦ S → 0S | 1S | ε

• For the string 010
  ◦ S ⇒ 0S ⇒ 01S ⇒ 010S ⇒ 010
  ◦ S ⇒+ 010
  ◦ S ⇒* S


Language Generated by Grammar

• L(G) (a.k.a. [[G]]), the language defined by G, is

    [[G]] = L(G) = { ω ∊ Σ* | S ⇒+ ω }

  ◦ S is the start symbol of the grammar
  ◦ Σ is the alphabet for that grammar

• In other words
  ◦ All strings over Σ that can be derived from the start symbol via one or more productions


Practice

• Try to make a grammar which accepts
  ◦ 0*|1*
      S → A | B
      A → 0A | ε
      B → 1B | ε
  ◦ 0^n 1^n where n ≥ 0
      S → 0S1 | ε
  ◦ 0^n 1^m where m ≤ n
      S → 0S1 | 0S | ε

• Give some example strings from this language
  ◦ S → 0 | 1S
    ▪ 0, 10, 110, 1110, 11110, …
  ◦ What language is it?
    ▪ 1*0


Example: Arithmetic Expressions

• E → a | b | c | E+E | E-E | E*E | (E)
  ◦ An expression E is either a letter a, b, or c
  ◦ Or an E followed by + followed by an E
  ◦ etc…

• This describes (or generates) a set of strings
  ◦ {a, b, c, a+b, a+a, a*c, a-(b*a), c*(b+a), …}

• Example strings not in the language
  ◦ d, c(a), a+, b**c, etc.


Formal Description of Example

• Formally, the grammar we just showed is
  ◦ Σ = { +, -, *, (, ), a, b, c } // terminals
  ◦ N = { E } // nonterminals
  ◦ P = { E → a, E → b, E → c, E → E-E, E → E+E, E → E*E, E → (E) } // productions
  ◦ S = E // start symbol


(Non-)Uniqueness of Grammars

• Different grammars can generate the same set of strings (language)

• The following grammar generates the same set of strings as the previous grammar
    E → E+T | E-T | T
    T → T*P | P
    P → (E) | a | b | c


Notational Shortcuts

• A production is of the form
  ◦ left-hand side (LHS) → right-hand side (RHS)

• If not specified
  ◦ Assume LHS of first listed production is the start symbol

• Productions with the same LHS
  ◦ Are usually combined with |

• If a production has an empty RHS
  ◦ It means the RHS is ε

    S → ABC     // S is start symbol
    A → aA
      | b       // A → b
      |         // A → ε


Practice

• Given the grammar
    S → aS | T
    T → bT | U
    U → cU | ε

  ◦ Provide derivations for the following strings
    ▪ b: S ⇒ T ⇒ bT ⇒ bU ⇒ b
    ▪ ac: S ⇒ aS ⇒ aT ⇒ aU ⇒ acU ⇒ ac
    ▪ bbc: S ⇒ T ⇒ bT ⇒ bbT ⇒ bbU ⇒ bbcU ⇒ bbc

  ◦ Does the grammar generate the following?
    ▪ S ⇒+ ccc: Yes
    ▪ S ⇒+ bS: No
    ▪ S ⇒+ bab: No
    ▪ S ⇒+ Ta: No


Practice

• Given the grammar
    S → aS | T
    T → bT | U
    U → cU | ε

  ◦ Name language accepted by grammar
    ▪ a*b*c*

  ◦ Give a different grammar accepting the same language
      S → ABC
      A → aA | ε // a*
      B → bB | ε // b*
      C → cC | ε // c*


Parse Trees

• Parse tree shows how a string is produced by a grammar
  ◦ Root node is the start symbol
  ◦ Every internal node is a nonterminal
  ◦ Children of an internal node
    ▪ Are symbols on RHS of production applied to nonterminal
  ◦ Every leaf node is a terminal or ε

• Reading the leaves left to right
  ◦ Shows the string corresponding to the tree
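
A small sketch of a parse tree as a data structure, with a helper that reads the leaves left to right. The class `Node` and the function `leaves` are names invented for this illustration; the tree built at the end is one parse tree for the string ac under S → aS | T, T → bT | U, U → cU | ε.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str                                    # nonterminal, terminal, or "ε"
    children: list = field(default_factory=list)   # empty for leaves

def leaves(node):
    """Read the leaves left to right, dropping ε, to recover the string."""
    if not node.children:
        return "" if node.symbol == "ε" else node.symbol
    return "".join(leaves(child) for child in node.children)

# S -> aS, S -> T, T -> U, U -> cU, U -> ε
tree = Node("S", [
    Node("a"),
    Node("S", [
        Node("T", [
            Node("U", [
                Node("c"),
                Node("U", [Node("ε")]),
            ]),
        ]),
    ]),
])
```

Every internal node's children spell out the right-hand side of the production applied at that node, so `leaves(tree)` recovers the derived string.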


Parse Tree Example

S → aS | T
T → bT | U
U → cU | ε

S ⇒ aS ⇒ aT ⇒ aU ⇒ acU ⇒ ac


Parse Trees for Expressions

• A parse tree shows the structure of an expression as it corresponds to a grammar

    E → a | b | c | d | E+E | E-E | E*E | (E)

[Figure: parse trees for a, a*c, and c*(b+d)]


Practice

E → a | b | c | d | E+E | E-E | E*E | (E)

Make a parse tree for…
• a*b
• a+(b-c)
• d*(d+b)-a
• (a+b)*(c-d)
• a+(b-c)*d


Ambiguity

• A grammar is ambiguous if some string has multiple leftmost derivations
  ◦ Equivalent to multiple parse trees
  ◦ Can be hard to determine

1.  S → aS | T
    T → bT | U
    U → cU | ε
    Ambiguous? No

2.  S → T | T
    T → Tx | Tx | x | x
    Ambiguous? Yes

3.  S → SS | () | (S)
    Ambiguous? ?


Ambiguity (cont.)

• Example
  ◦ Grammar: S → SS | () | (S)
  ◦ String: ()()()
  ◦ 2 distinct (leftmost) derivations (and parse trees)
    ▪ S ⇒ SS ⇒ SSS ⇒ ()SS ⇒ ()()S ⇒ ()()()
    ▪ S ⇒ SS ⇒ ()S ⇒ ()SS ⇒ ()()S ⇒ ()()()
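
Ambiguity of this grammar on a given string can be checked by counting parse trees. The following is a sketch written specifically for S → SS | () | (S); the function name `count_trees` is an assumption. It tries every production on every substring and memoizes the results.

```python
from functools import lru_cache

def count_trees(w):
    """Count the parse trees of w under S -> SS | () | (S)."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving the substring w[i:j] from S.
        total = 0
        if w[i:j] == "()":                                   # S -> ()
            total += 1
        if j - i > 2 and w[i] == "(" and w[j - 1] == ")":    # S -> (S)
            total += count(i + 1, j - 1)
        for k in range(i + 1, j):                            # S -> SS, split at k
            total += count(i, k) * count(k, j)
        return total
    return count(0, len(w)) if w else 0
```

A count above 1 means the string witnesses ambiguity; a count of 0 means the string is not in the language.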


CFGs for Programming Languages

• Recall that our goal is to describe programming languages with CFGs

• We had the following example which describes limited arithmetic expressions
    E → a | b | c | E+E | E-E | E*E | (E)

• What's wrong with using this grammar?
  ◦ It's ambiguous!


Example: a-b-c

E ⇒ E-E ⇒ a-E ⇒ a-E-E ⇒ a-b-E ⇒ a-b-c
Corresponds to a-(b-c)

E ⇒ E-E ⇒ E-E-E ⇒ a-E-E ⇒ a-b-E ⇒ a-b-c
Corresponds to (a-b)-c


Example: a-b*c

E ⇒ E-E ⇒ a-E ⇒ a-E*E ⇒ a-b*E ⇒ a-b*c
Corresponds to a-(b*c)

E ⇒ E*E ⇒ E-E*E ⇒ a-E*E ⇒ a-b*E ⇒ a-b*c
Corresponds to (a-b)*c


Another Example: If-Then-Else

<stmt> → <assignment> | <if-stmt> | ...
<if-stmt> → if (<expr>) <stmt>
          | if (<expr>) <stmt> else <stmt>

(Note < >'s are used to denote nonterminals)

• Consider the following program fragment
    if (x > y)
    if (x < z)
    a = 1;
    else a = 2;
  (Note: Ignore newlines)


Parse Tree #1

• Else belongs to inner if


Parse Tree #2

• Else belongs to outer if


Dealing With Ambiguous Grammars

• Ambiguity is bad
  ◦ Syntax is correct
  ◦ But semantics differ depending on choice
    ▪ Different associativity: (a-b)-c vs. a-(b-c)
    ▪ Different precedence: (a-b)*c vs. a-(b*c)
    ▪ Different control flow: if (if else) vs. if (if) else

• Two approaches
  ◦ Rewrite grammar
  ◦ Use special parsing rules
    ▪ Depending on parsing method (learn in CMSC 430)


Fixing the Expression Grammar

• Require right operand to not be a bare expression
    E → E+T | E-T | E*T | T
    T → a | b | c | (E)

• Corresponds to left-associativity

• Now only one parse tree for a-b-c
  ◦ Find derivation


What if We Wanted Right-Associativity?

• Left-recursive productions
  ◦ Used for left-associative operators
  ◦ Example
      E → E+T | E-T | E*T | T
      T → a | b | c | (E)

• Right-recursive productions
  ◦ Used for right-associative operators
  ◦ Example
      E → T+E | T-E | T*E | T
      T → a | b | c | (E)


Parse Tree Shape

• The kind of recursion determines the shape of the parse tree

[Figure: parse tree shapes for left recursion and right recursion]


A Different Problem

• How about the string a+b*c ?
    E → E+T | E-T | E*T | T
    T → a | b | c | (E)

• Doesn't have correct precedence for *
  ◦ When a nonterminal has productions for several operators, they effectively have the same precedence


Final Expression Grammar

E → E+T | E-T | T      // lowest precedence operators
T → T*P | P            // higher precedence
P → a | b | c | (E)    // highest precedence (parentheses)

• Practice
  ◦ Construct the tree and the leftmost and rightmost derivations for
      a+b*c   a*(b+c)   a*b+c   a-b-c
  ◦ See what happens if you change the last set of productions to P → a | b | c | E | (E)
  ◦ See what happens if you change the first set of productions to E → E+T | E-T | T | P
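
To see the precedence and associativity that the final grammar encodes, here is a hand-written Python sketch of a parser for it. Two assumptions are made for this example: the left-recursive productions are replaced by loops (E becomes T followed by any number of +T or -T, and T becomes P followed by any number of *P), which accepts the same strings and keeps left associativity; and the result is a nested tuple rather than a full parse tree. The name `parse_expr` is invented.

```python
def parse_expr(text):
    """Parse text with E -> E+T | E-T | T, T -> T*P | P, P -> a | b | c | (E).

    Returns a nested tuple such as ('+', 'a', ('*', 'b', 'c')).
    """
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_p():                        # P -> a | b | c | (E)
        tok = peek()
        if tok is not None and tok in "abc":
            advance()
            return tok
        if tok == "(":
            advance()
            inner = parse_e()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        raise SyntaxError(f"unexpected {tok!r} at position {pos}")

    def parse_t():                        # T -> T*P | P, written as P (* P)*
        left = parse_p()
        while peek() == "*":
            advance()
            left = ("*", left, parse_p())
        return left

    def parse_e():                        # E -> E+T | E-T | T, written as T ((+|-) T)*
        left = parse_t()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            left = (op, left, parse_t())
        return left

    tree = parse_e()
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return tree
```

Because `*` is handled one level deeper than `+` and `-`, it binds tighter, and because each loop folds its result into `left`, operators at the same level group to the left.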


Tips for Designing Grammars

1.  Use recursive productions to generate an arbitrary number of symbols

    A → xA | ε // Zero or more x's
    A → yA | y // One or more y's

2.  Use separate non-terminals to generate disjoint parts of a language, and then combine in a production

    { a*b* } // a's followed by b's
    S → AB
    A → aA | ε // Zero or more a's
    B → bB | ε // Zero or more b's


Tips for Designing Grammars (cont.)

3.  To generate languages with matching, balanced, or related numbers of symbols, write productions which generate strings from the middle

    { a^n b^n | n ≥ 0 } // N a's followed by N b's
    S → aSb | ε
    Example derivation: S ⇒ aSb ⇒ aaSbb ⇒ aabb

    { a^n b^2n | n ≥ 0 } // N a's followed by 2N b's
    S → aSbb | ε
    Example derivation: S ⇒ aSbb ⇒ aaSbbbb ⇒ aabbbb
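
The "generate from the middle" idea for S → aSb | ε can be sketched in Python as follows. The function names `generate` and `accepts` are assumptions for this illustration: the first grows the string outward around S, the second peels one a and one b off the ends at each step.

```python
def generate(n):
    """Derive a^n b^n by applying S -> aSb n times, then S -> ε."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "aSb")   # grow outward from the middle
    return form.replace("S", "")          # finish with S -> ε

def accepts(s):
    """True iff s is in { a^n b^n | n >= 0 }."""
    while s.startswith("a") and s.endswith("b"):
        s = s[1:-1]                       # undo one application of S -> aSb
    return s == ""
```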


Tips for Designing Grammars (cont.)

4.  For a language that is the union of other languages, use separate nonterminals for each part of the union and then combine

    { a^n (b^m | c^m) | m > n ≥ 0 }
    Can be rewritten as
    { a^n b^m | m > n ≥ 0 } ∪ { a^n c^m | m > n ≥ 0 }

    S → T | V
    T → aTb | U
    U → Ub | b
    V → aVc | W
    W → Wc | c


Recall: Steps of Compilation


Implementing Parsers

• Many efficient techniques for parsing
  ◦ I.e., turning strings into parse trees
  ◦ Examples
    ▪ LL(k), SLR(k), LR(k), LALR(k)…
    ▪ Take CMSC 430 for more details

• One simple technique: recursive descent parsing
  ◦ This is a top-down parsing algorithm
  ◦ Other algorithms are bottom-up


Top-Down Parsing

E → id = n | { L }
L → E ; L | ε
(Assume: id is variable name, n is integer)

Show parse tree for { x = 3 ; { y = 4 ; } ; }

[Figure: parse tree for { x = 3 ; { y = 4 ; } ; }]


Example: Recursive Descent Parsing

• Goal
  ◦ Determine if we can produce the string to be parsed from the grammar's start symbol

• Approach
  ◦ Construct a parse tree: starting with start symbol at root, recursively replace nonterminal with RHS of production

• Key question: which production to use?
  ◦ There can be many productions for each nonterminal
  ◦ We could try them all, backtracking if we are unsuccessful
  ◦ But this is slow!

• Answer: lookahead
  ◦ Keep track of next token on the input string to be processed
  ◦ Use this to guide selection of production
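
The lookahead idea can be sketched for the grammar E → id = n | { L }, L → E ; L | ε. This is a minimal recognizer, not the course's implementation: it assumes tokens are separated by spaces (as in { x = 3 ; { y = 4 ; } ; }), it uses Python's `str.isidentifier` and `str.isdigit` as stand-ins for id and n, and the names `parse`, `parse_e`, and `parse_l` are invented.

```python
def parse(source):
    """Recursive descent recognizer for E -> id = n | { L },  L -> E ; L | ε."""
    tokens = source.split()          # assumption: tokens are separated by spaces
    pos = 0

    def lookahead():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if lookahead() != tok:
            raise SyntaxError(f"expected {tok!r}, found {lookahead()!r}")
        pos += 1

    def parse_e():
        nonlocal pos
        tok = lookahead()
        if tok == "{":                                   # lookahead selects E -> { L }
            expect("{")
            parse_l()
            expect("}")
        elif tok is not None and tok.isidentifier():     # lookahead selects E -> id = n
            pos += 1
            expect("=")
            if lookahead() is None or not lookahead().isdigit():
                raise SyntaxError("expected an integer")
            pos += 1
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def parse_l():
        tok = lookahead()
        if tok == "{" or (tok is not None and tok.isidentifier()):
            parse_e()                                    # L -> E ; L
            expect(";")
            parse_l()
        # otherwise L -> ε: consume nothing

    parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return True
```

There is one function per nonterminal, and one token of lookahead is enough to pick a production here, so no backtracking is needed.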


Bottom-up Parsing

E → id = n | { L }
L → E ; L | ε

Show parse tree for { x = 3 ; { y = 4 ; } ; }

Note that final trees constructed are same as for top-down; only order in which nodes are added to tree is different

[Figure: parse tree for { x = 3 ; { y = 4 ; } ; }]


Example: Shift-reduce parsing

• Replaces RHS of production with LHS (nonterminal)

• Example grammar
  ◦ S → aA, A → Bc, B → b

• Example parse
  ◦ abc ⇒ aBc ⇒ aA ⇒ S
  ◦ Derivation happens in reverse

• Something to look forward to in CMSC 430

• Complicated to use; requires tool support
  ◦ Bison, yacc produce shift-reduce parsers from CFGs
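
For this tiny grammar the shift-reduce idea can be imitated with a stack. The sketch below is deliberately naive and is an assumption-laden illustration, not how Bison or yacc work: it reduces greedily whenever the top of the stack matches a right-hand side, whereas generated parsers consult tables to decide between shifting and reducing. The names `RULES` and `shift_reduce` are invented.

```python
# Productions S -> aA, A -> Bc, B -> b, stored as (rhs, lhs) for reduction.
RULES = [(("a", "A"), "S"), (("B", "c"), "A"), (("b",), "B")]

def shift_reduce(tokens):
    """Naive shift-reduce recognizer; returns (accepted, trace of stack contents)."""
    stack, trace = [], []
    for tok in tokens:
        stack.append(tok)                            # shift the next token
        trace.append(("shift", tuple(stack)))
        reduced = True
        while reduced:                               # reduce while some RHS is on top
            reduced = False
            for rhs, lhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]        # replace RHS with LHS
                    trace.append(("reduce", tuple(stack)))
                    reduced = True
                    break
    return stack == ["S"], trace
```

On the input abc the reduce steps leave a B, then a A, then S on the stack, matching the reverse derivation abc ⇒ aBc ⇒ aA ⇒ S.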


Tradeoffs

• Recursive descent parsers
  ◦ Easy to write
    ▪ The formal definition is a little clunky, but if you follow the code then it's almost what you might have done if you weren't told about grammars formally
  ◦ Fast
    ▪ Can be implemented with a simple table

• Shift-reduce parsers handle more grammars
  ◦ But are slower

• Most languages use hacked parsers (!)
  ◦ Strange combination of the two


What’s Wrong With Parse Trees?

• Parse trees contain too much information
  ◦ Example
    ▪ Parentheses
    ▪ Extra nonterminals for precedence
  ◦ This extra stuff is needed for parsing

• But when we want to reason about languages
  ◦ Extra information gets in the way (too much detail)


Abstract Syntax Trees (ASTs)

• An abstract syntax tree is a more compact, abstract representation of a parse tree, with only the essential parts

[Figure: a parse tree shown next to the corresponding AST]


Abstract Syntax Trees (cont.)

• Intuitively, ASTs correspond to the data structure you'd use to represent strings in the language
  ◦ Note that grammars describe trees
  ◦ So do OCaml datatypes (which we'll see later)
  ◦ E → a | b | c | E+E | E-E | E*E | (E)
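
The slides point to OCaml datatypes for this; as a stand-in, here is a sketch of the same idea using Python dataclasses. The class names `Var` and `BinOp` and the function `evaluate` are assumptions made for this example. There is one constructor per kind of expression, and the (E) production needs no node of its own because grouping is captured by the tree shape.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str              # a, b, or c

@dataclass(frozen=True)
class BinOp:
    op: str                # '+', '-', or '*'
    left: object           # Var or BinOp
    right: object          # Var or BinOp

def evaluate(node, env):
    """Evaluate an AST given a dict mapping variable names to numbers."""
    if isinstance(node, Var):
        return env[node.name]
    left, right = evaluate(node.left, env), evaluate(node.right, env)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator {node.op!r}")

# The two readings of a-b*c are different trees with different meanings.
a_minus_bc = BinOp("-", Var("a"), BinOp("*", Var("b"), Var("c")))   # a-(b*c)
ab_times_c = BinOp("*", BinOp("-", Var("a"), Var("b")), Var("c"))   # (a-b)*c
```

Evaluating both trees with the same variable values generally gives different results, which is a concrete way to see why an ambiguous grammar is a problem.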


The Compilation Process
