19
(2.1) Grammars Definitions Grammars Backus-Naur Form Derivation terminology trees Grammars and ambiguity Simple example Grammar hierarchies Syntax graphs Recursive descent parsing

(2.1) Grammars Definitions Grammars Backus-Naur Form Derivation – terminology – trees Grammars and ambiguity Simple example Grammar hierarchies

Embed Size (px)

Citation preview

Page 1: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.1)Grammars

Definitions Grammars Backus-Naur Form Derivation

– terminology– trees

Grammars and ambiguity Simple example Grammar hierarchies Syntax graphs Recursive descent parsing

Page 2: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.2)Definitions

Syntax– the form or structure of the expressions, statements, and program units

Semantics– the meaning of the expressions, statements, and program units

Sentence–a string of characters over some alphabet

Language–a set of sentences

Lexeme– the lowest level syntactic unit of a language» :=, {, while

Token–a category of lexemes (e.g., identifier)

Page 3: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.3)Grammars

Can serve as “generators” or “recognizers”– recognizers used in compilers–we’ll study grammars as generators

Contain 4 components– terminal symbols

»atomic components of statements in the language•appear in source programs

» identifiers, operators, punctuation, keywords–nonterminal symbols

» intermediate elements in producing terminal symbols

»never appear in source program– start (or goal) symbol

»a special nonterminal which is the starting symbol for producing statements

Page 4: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.4)Grammars (continued)

4 components (continued)–productions

»rules for transforming nonterminal symbols into terminals or other nonterminals

»“nonterminal” ::= terminals and/or nonterminals

»each has lefthand side (LHS) and righthand side (RHS)

»every nonterminal must appear on LHS of at least one production

Page 5: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.5)Grammars (continued)

4 categories of grammars– regular

»good for identifiers, parameter lists, subscripts

– context free»LHS of production is single non-terminal

– context sensitive– recursively enumerable

enough for PLs

Page 6: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.6)Backus-Naur Form (BNF)

Used to describe syntax of PL; first used for Algol-60

Nonterminals are enclosed in <...>–<expression>, <identifier>

Alternatives indicated by |–<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Options (0 or 1 occurrences) indicated by [...]–<stmt> ::= if <cond> then <stmt> [ else <stmt>]»note recursion

Repetition (0 or more occurrences) indicated by {...}–<unsigned> ::= <digit> {<digit>}

Derivation– repeated application of rules, starting with start symbol and ending with sentence

Page 7: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.7)BNF (continued)

Example grammar and derivation

<program> -> <stmts><stmts> -> <stmt> | <stmt> ; <stmts><stmt> -> <var> = <expr><var> -> a | b | c | d<expr> -> <term> + <term> | <term> - <term><term> -> <var> | const

<program> => <stmts> => <stmt>

=> <var> = <expr> => a = <expr>

=> a = <term> + <term> => a = <var> + <term>

=> a = b + <term> => a = b + const

Page 8: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.8)Derivation Terminology

Every string of symbols in the derivation is a sentential form

A sentence is a sentential form that has only terminal symbols

A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded– similarly for rightmost derivation

A derivation may be neither leftmost nor rightmost

Page 9: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.9)Derivation Trees

A derivation tree is the tree resulting from applying productions to rewrite start symbol– a parse tree is the same tree starting with terminals and

building back to the start symbol

<program>

<stmts>

<stmt>

<var> = <expr>

a <term> + <term>

<var> const

b

Page 10: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.10)Grammars and Ambiguity

A grammar is ambiguous iff it generates a sentential form that has two or more distinct parse trees

An ambiguous expression grammar:– <expr> -> <expr> <op> <expr> | const– <op> -> / | -

<expr> <expr>

<expr> <op> <expr> <expr> <op> <expr>

<expr> <op> <expr> <expr> <op> <expr>

const - const / const const - const / const

Page 11: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.11)Grammars and Ambiguity (continued)

We must have unambiguous grammars so compiler can produce correct code– because parse tree provides precedence and

associativity of operators Left recursive grammars produce left associativity Right recursive grammars produce right associativity An unambiguous expression grammar:

– <expr> -> <expr> - <term> | <term>– <term> -> <term> / const | const

<expr>

<expr> - <term>

<term> <term> / const

const const

Page 12: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.12)Grammars and Ambiguity (continued)

One famous ambiguity is “dangling else”–<stmt> ::= if <cond> then <stmt> [else <stmt>]

This can derive

if X > 9

then if B = 4

then X := 5

else X := 0

Page 13: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.13)Grammars and Ambiguity (continued)

Can solve syntactically by adding nonterminals & prod–<stmt> ::= <matched> | <unmatched>–<matched> ::= if <cond> then <matched> else <matched>

–<unmatched> ::= if <cond> then <stmt> | if <cond> then <matched> else <unmatched>

Can also solve semantically–“elses are associated with immediately preceding unmatched then”

Page 14: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.14)Grammar Hierarchies

BNF (and equivalent notations such as syntax graphs) can describe context free grammars–nonterminals appear alone on the LHS of productions

But there is a whole hierarchy of grammar types– recursively enumerable

»context sensitive•context free

– regular Context free grammars can describe

the essential features of all current PLs Regular grammars are good for

identifiers, parameter lists, etc.

Page 15: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.15)Simple Grammar Example

Consider following unambiguous grammar for expressions–<expr> ::= [<expr> <addop>] <term>–<term> ::= [<term> <mulop>] <factor>–<factor> ::= (<expr>) | <digit>–<addop> ::= + | -–<mulop> ::= * | /–<digit> ::= 0 | ... | 9

This grammar is left recursive and generates expressions that are left associative

Changing <factor> production produces right associative exponentiation–<factor> ::= <expon> [ ** <factor> ]

Page 16: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.16)Syntax Graphs

Are equivalent to CFGs– put the terminals in circles or ellipses and put the

nonterminals in rectangles; – connect with lines with arrowheads

Terminals in circles Non-terminals in rectangles Lines and arrows indicate how constructs are

built

type_identifier

identifier( )

,

..constant constant

Page 17: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.17)Recursive Descent Parsing

Parsing is the process of tracing or constructing a parse tree for a given input string

Parsers usually do not analyze lexemes –done by a lexical analyzer, which is called by the parser

Page 18: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.18)Recursive Descent Parsing (continued)

A recursive descent parser traces out a parse tree in top-down order– top-down parser

Each nonterminal in the grammar has a subprogram associated with it– the subprogram parses all sentential forms that the nonterminal can generate

The recursive descent parsing subprograms are built directly from the grammar rules

Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars

Page 19: (2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies

(2.19)Recursive Descent Parsing (continued)

Example

For the grammar: <term> -> <factor> {(* | /) <factor>}

void term () { factor (); /* parse the first factor*/ while (next_token == ast_code || next_token == slash_code) { lexical (); /* get next token */ factor (); /* parse the next factor */ } }