CS 3360 1 Describing Syntax CS 3360 Spring 2012 Sec 3.1-3.4 Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)


Outline

- Introduction
- Formal description of syntax
- Backus-Naur Form (BNF)
- Attribute grammars (probably next time)


Introduction

Who must use language definitions?
- Implementers
- Programmers (the users of the language)

Syntax - the form or structure of the expressions, statements, and program units

Semantics - the meaning of the expressions, statements, and program units


Introduction (cont.)

Example

Syntax of the Java while statement

while (<boolean-expr>) <statement>

Semantics?


Describing Syntax – Vocabulary

- A sentence is a string of characters over some alphabet
- A language is a set of sentences
- A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, while)
- A token is a category of lexemes (e.g., identifier)


Example

index = 2 * count + 17;

Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
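The lexeme/token split above can be sketched as a small tokenizer. This is only an illustration: the token names come from the slide, but the regular expressions and the function name tokenize are assumptions made for this sketch.

```python
import re

# Token categories from the table above; the patterns are assumptions
# made for this sketch, not part of the lecture.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token) pairs for the given source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1          # whitespace separates lexemes but is not one
            continue
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        # lastgroup is the name of the alternative that matched
        pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<8}{token}")
```

Each distinct lexeme (index, count) is reported under the same token category (identifier), which is the distinction the vocabulary slide draws.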


Describing Syntax

Formal approaches to describing syntax:
- Recognizers (once you have code)
  - Can tell whether a given string is in a language or not
  - Used in compilers, where the recognizer is called a parser
- Generators (in order to build code)
  - Generate the sentences of a language
  - Used to describe the syntax of a language


Formal Methods of Describing Syntax

Context-Free Grammars (CFG; see automata course)

Developed by Noam Chomsky in the mid-1950s

Language generators, meant to describe the syntax of natural languages

Define a class of languages called context-free languages


Formal Methods of Describing Syntax

Backus-Naur Form
- Invented by John Backus to describe Algol 58
- Extended by Peter Naur to describe Algol 60
- BNF is equivalent to context-free grammars
- A metalanguage is a language used to describe another language.
- In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols)


Backus-Naur Form

<while_stmt> -> while ( <logic_expr> ) <stmt>

This is a rule (also called a production rule); it describes the structure of a while statement


Backus-Naur Form

A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and non-terminal symbols

A grammar is a finite non-empty set of rules

An abstraction (or non-terminal symbol) can have more than one RHS

<stmt> -> <single_stmt>
        | { <stmt_list> }


Backus-Naur Form

Syntactic lists are described using recursion

<ident_list> -> ident
              | ident , <ident_list>

Example sentences:

ident

ident , ident

ident , ident , ident
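The recursion in the <ident_list> rule maps directly onto a recursive function. The following is a minimal recognizer sketch; the function name is_ident_list and the choice to represent input as a list of token strings are assumptions for illustration.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> ident | ident , <ident_list>."""
    if not tokens or tokens[0] != "ident":
        return False
    if len(tokens) == 1:
        return True   # first alternative: a single ident
    # second alternative: ident , <ident_list>
    return tokens[1] == "," and is_ident_list(tokens[2:])

print(is_ident_list(["ident", ",", "ident", ",", "ident"]))
print(is_ident_list(["ident", ","]))
```

The recursive call mirrors the appearance of <ident_list> on the right-hand side of its own rule.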


Example

A grammar for a small language:

<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | 5

Sample program:

a = b + 5


Exercise

Define a grammar to generate all sentences of the form:

subject verb object .

where subject is “i” or “we”, verb is “love” or “like”, and object is “exercises” or “programming”.


Exercise

Define the syntax of Java Boolean expressions consisting of:
- Constants: false and true
- Operators: !, &&, and ||


Derivation

A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)

Example:

<ident_list> -> ident | ident , <ident_list>

<ident_list> => ident , <ident_list>
             => ident , ident , <ident_list>
             => ident , ident , ident


Another Example

Derivation of a = b + 5:

<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + 5

Grammar:

<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | 5


Derivation

Every string of symbols in the derivation is a sentential form

A sentence is a sentential form that has only terminal symbols

A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded

A derivation may be neither leftmost nor rightmost


Exercise

Derive a = b + 5 by using a rightmost derivation.

<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | 5


Parse Tree

A hierarchical representation of a derivation

Parse tree for a = b + 5 (children are indented under their parent):

<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          5


Ambiguity of Grammars

A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees.


An Ambiguous Expression Grammar

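<expr> -> <expr> <op> <expr> | 5
<op> -> / | -

Two distinct parse trees for 5 - 5 / 5 (children are indented under their parent):

Tree 1, grouping (5 - 5) / 5:

<expr>
  <expr>
    <expr>
      5
    <op>
      -
    <expr>
      5
  <op>
    /
  <expr>
    5

Tree 2, grouping 5 - (5 / 5):

<expr>
  <expr>
    5
  <op>
    -
  <expr>
    <expr>
      5
    <op>
      /
    <expr>
      5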
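One way to see the ambiguity concretely is to count the parse trees the grammar admits for a sentence. The sketch below does this for the rule <expr> -> <expr> <op> <expr> | 5 by trying every operator position as the top-level <op>; the function name count_parse_trees is an assumption for illustration.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for <expr> -> <expr> <op> <expr> | 5, <op> -> / | -."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] == "5" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("/", "-"):   # tokens[k] plays the role of <op>
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("5 - 5 / 5".split()))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous.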


An Unambiguous Expression Grammar

If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity

<expr> -> <expr> - <term> | <term>
<term> -> <term> / 5 | 5

Parse tree for 5 - 5 / 5 (children are indented under their parent):

<expr>
  <expr>
    <term>
      5
  -
  <term>
    <term>
      5
    /
    5


Exercise

Prove or disprove the ambiguity of the following grammar:

<stmt> -> <if-stmt>
<if-stmt> -> if <expr> then <stmt> | if <expr> then <stmt> else <stmt>


Operator Precedence

Derivation:

<expr> => <expr> - <term>
       => <term> - <term>
       => 5 - <term>
       => 5 - <term> / 5
       => 5 - 5 / 5

Grammar:

<expr> -> <expr> - <term> | <term>
<term> -> <term> / 5 | 5


Operator Associativity

Can we describe operator associativity correctly?

A = A + B + C

(A + B) + C or A + (B + C)?

Does it matter?
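One case where the grouping does matter is floating-point addition, which is not associative. The values below are chosen for this sketch (they are not from the lecture) so that the two groupings give visibly different results.

```python
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c    # left-associative grouping
right = a + (b + c)   # right-associative grouping: the 1.0 is lost when added to -1e16

print(left, right)
print(left == right)
```

With exact integer arithmetic the two groupings of + agree, so whether associativity matters depends on the operator and the operand types.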


Operator Associativity

Operator associativity can also be indicated by a grammar

<expr> -> <expr> + <expr> | 5 (ambiguous)
<expr> -> <expr> + 5 | 5 (unambiguous)

Parse tree for 5 + 5 + 5 under the unambiguous grammar (children are indented under their parent):

<expr>
  <expr>
    <expr>
      5
    +
    5
  +
  5


Left vs. Right Recursion

A rule is left recursive if its LHS also appears at the beginning (left end) of its RHS.

A rule is right recursive if its LHS also appears at the right end of its RHS.

<factor> -> <expr> ** <factor> | <expr>
<expr> -> c

Example: c ** c ** c interpreted as c ** (c ** c)
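The right-recursive <factor> rule can be turned into a recursive function that builds the tree directly. This is a sketch only: the name parse_factor and the nested-tuple tree representation are assumptions for illustration.

```python
def parse_factor(tokens, pos=0):
    """Parse <factor> -> <expr> ** <factor> | <expr>, where <expr> -> c.

    Returns (tree, next_pos); the tree is a nested tuple.
    """
    if pos >= len(tokens) or tokens[pos] != "c":
        raise SyntaxError(f"expected 'c' at position {pos}")
    left = "c"
    pos += 1
    if pos < len(tokens) and tokens[pos] == "**":
        # Right recursion: the rest of the input becomes the right operand.
        right, pos = parse_factor(tokens, pos + 1)
        return ("**", left, right), pos
    return left, pos

tree, end = parse_factor("c ** c ** c".split())
print(tree)
```

Because the recursive call handles everything to the right of the first **, the later operator ends up deeper in the tree, which is what makes ** right associative here.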


Exercise

Define a BNF grammar for expressions consisting of +, *, and ** (exponentiation). The operator ** has precedence over *, and * has precedence over +. Both + and * are left associative, while ** is right associative.

Using the above grammar, draw a parse tree for the sentence:

7 + 6 + 5 * 4 * 3 ** 2 ** 1

Exercise to do in groups at the end of lecture


Extended BNF (EBNF)

Extended BNF (just abbreviations):
- Optional parts are placed in brackets ([ ])
    <meth_call> -> ident ( [<expr_list>] )
- Put alternative parts of RHSs in parentheses and separate them with vertical bars
    <term> -> <term> (+ | -) const
- Put repetitions (0 or more) in braces ({ })
    <ident> -> letter {letter | digit}


Example

BNF:

<expr> -> <expr> + <term> | <expr> - <term> | <term>
<term> -> <term> * <factor> | <term> / <factor> | <factor>

EBNF:

<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}
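The EBNF form translates naturally into code: each { ... } repetition becomes a loop. The sketch below evaluates expressions this way. It assumes, for illustration only, that <factor> is a single integer literal and that tokens are separated by spaces; the name evaluate is likewise hypothetical.

```python
def evaluate(text):
    """Evaluate an expression using the EBNF rules for <expr> and <term>.

    Assumption for this sketch: <factor> is a single integer literal.
    """
    tokens = text.split()
    pos = 0

    def factor():
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    def term():
        # <term> -> <factor> {(* | /) <factor>}
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():
        # <expr> -> <term> {(+ | -) <term>}
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("8 - 3 - 2"))
print(evaluate("2 + 3 * 4"))
```

Accumulating the result inside each loop applies operators from left to right, matching the left associativity that the left-recursive BNF rules express.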


Exercise / Homework

Write BNF rules for the following EBNF rules:
1. <meth_call> -> <ident> “(” [<expr_list>] “)”
2. <term> -> <term> (+ | -) const
3. <ident> -> letter {letter | digit}

Due on Tuesday at the start of the session!


Outline

- Introduction
- Describing syntax formally
- Backus-Naur Form (BNF)
- Attribute grammars


Attribute Grammars

CFGs cannot describe all of the syntax of programming languages

Attribute grammars are additions to CFGs that carry some semantic information along through parse trees

Primary value of attribute grammars:
- Static semantics specification
- Compiler design (static semantics checking)


Basic Idea

Add attributes, attribute computation functions, and predicates to CFGs
- Attributes
  - Associated with grammar symbols
  - Can have values assigned to them
- Attribute computation functions
  - Associated with grammar rules
  - Specify how to compute attribute values
  - Are often called semantic functions
- Predicate functions
  - Associated with grammar rules
  - State some of the syntax and static semantic rules of the language


Example

BNF

<meth_def> -> meth <meth_name> <meth_body> end <meth_name>
<meth_name> -> <identifier>
<meth_body> -> …

AG

1. Syntax rule: <meth_def> -> meth <meth_name>[1] <meth_body> end <meth_name>[2]
   Predicate: <meth_name>[1].string == <meth_name>[2].string

2. Syntax rule: <meth_name> -> <identifier>
   Semantic rule: <meth_name>.string <- <identifier>.string
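The predicate in rule 1 amounts to comparing two strings once the syntax has been matched. The sketch below illustrates that check. It makes several assumptions not in the lecture: the input is already split into tokens, everything between the first name and end is treated as the body, and the function name check_meth_def is hypothetical.

```python
def check_meth_def(tokens):
    """Check: meth <meth_name>[1] <meth_body> end <meth_name>[2].

    Assumption for this sketch: everything between the first name and
    'end' is treated as the method body and is not inspected.
    """
    if len(tokens) < 4 or tokens[0] != "meth" or tokens[-2] != "end":
        raise SyntaxError("not of the form: meth <name> <body> end <name>")
    name1 = tokens[1]      # <meth_name>[1].string
    name2 = tokens[-1]     # <meth_name>[2].string
    return name1 == name2  # the predicate from rule 1

print(check_meth_def("meth area body end area".split()))
print(check_meth_def("meth area body end volume".split()))
```

The second input is syntactically well formed under the BNF; only the predicate rejects it, which is the kind of constraint the CFG alone does not express.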


Attribute Grammars Defined

An attribute grammar is a CFG with the following additions:
- A set of attributes A(X) for each grammar symbol X
  - A(X) consists of two disjoint sets S(X) and I(X)
  - S(X): synthesized attributes
  - I(X): inherited attributes
- Each rule has a set of functions that define certain attributes of the non-terminals in the rule
- Each rule has a (possibly empty) set of predicates to check for attribute consistency


Attribute Functions

Let X0 -> X1 ... Xn be a rule
- Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes
- Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes. Often of the form: I(Xj) = f(A(X0), ... , A(Xj-1))
- Initially, there are intrinsic attributes on the leaves. Intrinsic attributes are synthesized attributes whose values are determined outside the parse tree.


Example - Type Checking Rules

BNF

<assign> -> <var> = <expr>
<expr> -> <var> | <var> + <var>
<var> -> A | B | C

Rule
- A variable is either int or float.
- If the two operands of + have the same type, the type of the expression is that of the operands; otherwise, it is float.
- The type of the left side of an assignment must match the type of the right side.

Attributes
- actual_type: synthesized for <var> and <expr>
- expected_type: inherited for <expr>
- string: intrinsic for <var>


Example – Attribute Grammar

1. Syntax rule: <assign> -> <var> = <expr>
   Semantic rule: <expr>.expected_type <- <var>.actual_type

2. Syntax rule: <expr> -> <var>[1] + <var>[2]
   Semantic rule: <expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float
   Predicate: <expr>.actual_type == <expr>.expected_type

3. Syntax rule: <expr> -> <var>
   Semantic rule: <expr>.actual_type <- <var>.actual_type
   Predicate: <expr>.actual_type == <expr>.expected_type

4. Syntax rule: <var> -> A | B | C
   Semantic rule: <var>.actual_type <- lookup(<var>.string)
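The four rules above can be sketched as a single function that computes the attributes for one assignment and evaluates the predicate. This is an illustration under stated assumptions: the symbol table passed in as symbol_types, the representation of types as strings, and the function name check_assign are all hypothetical; lookup is the only name taken from the rules.

```python
def check_assign(target, operands, symbol_types):
    """Decorate <assign> -> <var> = <expr> and evaluate the predicate.

    target: name on the left side; operands: one or two names on the right.
    symbol_types: hypothetical symbol table consulted by lookup().
    """
    def lookup(name):                      # rule 4
        return symbol_types[name]

    expected_type = lookup(target)         # rule 1: inherited from <var>
    if len(operands) == 2:                 # rule 2: <var>[1] + <var>[2]
        t1, t2 = lookup(operands[0]), lookup(operands[1])
        actual_type = "int" if t1 == "int" and t2 == "int" else "float"
    else:                                  # rule 3: <var>
        actual_type = lookup(operands[0])
    return {
        "expected_type": expected_type,
        "actual_type": actual_type,
        "predicate": actual_type == expected_type,
    }

# A = A + B with A float, B int, then with A int, B float.
print(check_assign("A", ["A", "B"], {"A": "float", "B": "int"}))
print(check_assign("A", ["A", "B"], {"A": "int", "B": "float"}))
```

In the first call the predicate holds; in the second, the expression's actual type is float while the expected type is int, so the predicate fails.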


Example – Parse Tree

A = A + B

Parse tree (children are indented under their parent):

<assign>
  <var>
    A
  =
  <expr>
    <var>[1]
      A
    +
    <var>[2]
      B


Example – Flow of Attributes

A = A + B

Attributes attached to each node:

<assign>
  <var>: actual_type
    A
  =
  <expr>: expected_type, actual_type
    <var>[1]: actual_type
      A
    +
    <var>[2]: actual_type
      B

<expr>.expected_type <- <var>.actual_type
<expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float


Example – Calculating Attributes

A = A + B, where A is float and B is int

<assign>
  <var>: actual_type = float
    A
  =
  <expr>: expected_type = float, actual_type = float
    <var>[1]: actual_type = float
      A
    +
    <var>[2]: actual_type = int
      B

<expr>.expected_type <- <var>.actual_type
<expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float

Here <expr>.actual_type equals <expr>.expected_type, so the predicate holds.


Example – Calculating Attributes

A = A + B, where A is int and B is float

<assign>
  <var>: actual_type = int
    A
  =
  <expr>: expected_type = int, actual_type = float
    <var>[1]: actual_type = int
      A
    +
    <var>[2]: actual_type = float
      B

<expr>.expected_type <- <var>.actual_type
<expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float

Here <expr>.actual_type (float) differs from <expr>.expected_type (int), so the predicate fails.


Attribute Grammars

How are attribute values computed?
- If all attributes were inherited, the tree could be decorated in top-down order.
- If all attributes were synthesized, the tree could be decorated in bottom-up order.
- In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used.


Group Exercise: homework due Tuesday, February 7 at the start of class

BNF

<cond_expr> -> <expr> ? <expr> : <expr>
<expr> -> <var> | <expr> + <expr>
<var> -> id

Rule
- id's type can be bool, int, or float.
- Operands of + must be numeric and of the same type.
- The type of + is the type of its operands.
- The first operand of ?: must be of type bool, and the second and third must be of the same type.
- The type of ?: is the type of its second and third operands.

Given the above BNF and rule:
1. Define an attribute grammar
2. Draw a decorated parse tree for “id ? id : id + id” assuming that the first id is of type bool and the rest are of type int.