Specifying Languages CS 480/680 – Comparative Languages

Specifying a Language

Informal methods
• Textbooks, tutorials, etc.

Formal definitions
• Needed for exactness: compiler writers, etc.
• Like technical specifications for design

Syntax – what expressions are legal?
Semantics – what should they do?

Context Free Grammars

Definition: A context-free grammar (CFG) is a 4-tuple, G = (V, Σ, R, S)
• V = variables, non-terminal symbols
• Σ = terminal symbols (alphabet)
• R = production rules
• S = start symbol, S ∈ V

V, Σ, and R are all finite sets
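
The 4-tuple definition maps naturally onto a small data structure. The following is a minimal sketch, not part of the original slides: the class name `CFG`, its field names, and the example grammar S → aSb | ε are all hypothetical, and the check that V and Σ are disjoint is the usual textbook convention rather than something the slide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar G = (V, Sigma, R, S)."""
    variables: frozenset   # V: non-terminal symbols
    terminals: frozenset   # Sigma: terminal symbols (alphabet)
    rules: tuple           # R: (left-hand variable, right-hand symbol tuple) pairs
    start: str             # S: start symbol

    def __post_init__(self):
        # S must be a variable.
        if self.start not in self.variables:
            raise ValueError("start symbol must be in V")
        # Assumed convention: V and Sigma share no symbols.
        if self.variables & self.terminals:
            raise ValueError("V and Sigma must be disjoint")
        for lhs, rhs in self.rules:
            # Context-free: every left-hand side is a single variable.
            if lhs not in self.variables:
                raise ValueError(f"left-hand side {lhs!r} is not in V")
            for symbol in rhs:
                if symbol not in self.variables | self.terminals:
                    raise ValueError(f"unknown symbol {symbol!r}")


# Hypothetical example grammar: S -> aSb | (empty string).
example = CFG(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    rules=(("S", ("a", "S", "b")), ("S", ())),
    start="S",
)
print(example.start, len(example.rules))
```

Representing each right-hand side as a tuple of symbols lets the empty tuple stand for ε without a special marker.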

A Context Free Grammar

V = {A, B}
Σ = {a, b}
R = { A → aAa
      A → B
      B → bBb
      B → A
      B → ε }
S = A

Example derivation (rule applied at each step shown on the right):
A ⇒ aAa           A → aAa
  ⇒ aaAaa         A → aAa
  ⇒ aaBaa         A → B
  ⇒ aabBbaa       B → bBb
  ⇒ aabbBbbaa     B → bBb
  ⇒ aabbbbaa      B → ε

What language does this grammar specify?
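
The derivation on this slide can be replayed mechanically, one rule application per step. This is a rough sketch under two assumptions: that the last rule used is B → ε (the symbol was lost in the slide text), and that each variable is a single uppercase letter. The helper name `apply_rule` is made up for illustration.

```python
def apply_rule(form, lhs, rhs):
    """Replace the leftmost occurrence of variable `lhs` in `form` with `rhs`."""
    index = form.find(lhs)
    if index < 0:
        raise ValueError(f"{lhs!r} does not occur in {form!r}")
    return form[:index] + rhs + form[index + 1:]


# Rules used by the slide's derivation; "" stands for the empty string (epsilon).
steps = [("A", "aAa"), ("A", "aAa"), ("A", "B"),
         ("B", "bBb"), ("B", "bBb"), ("B", "")]

form = "A"            # start symbol S = A
history = [form]
for lhs, rhs in steps:
    form = apply_rule(form, lhs, rhs)
    history.append(form)

print(" => ".join(history))
```

Each intermediate string in `history` is a sentential form; only the last one, which contains no variables, belongs to the language.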

Another Example CFG

V = {A}
Σ = {a, b}
R = { A → aAa
      A → bAb
      A → a
      A → b
      A → ε }
S = A

What language does this grammar specify?
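
One way to explore a question like this is to enumerate the short strings a grammar generates and look for the pattern. The sketch below assumes the rules read A → aAa | bAb | a | b | ε (the arrows and ε are missing from the slide text). The function name `generate` and the convention that uppercase letters are variables are assumptions made for illustration.

```python
from collections import deque


def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from `start`.

    `rules` maps each variable (an uppercase letter) to a list of right-hand
    sides; lowercase letters are terminals and "" is the empty string.
    """
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost variable, if any.
        index = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if index is None:
            results.add(form)
            continue
        for rhs in rules[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            # Prune forms that already contain too many terminals.
            terminals = sum(1 for ch in new_form if ch.islower())
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda s: (len(s), s))


# Assumed reading of the slide's rules: A -> aAa | bAb | a | b | (empty).
example_rules = {"A": ["aAa", "bAb", "a", "b", ""]}
print(generate(example_rules, "A", 3))
```

The search terminates here because every sentential form holds at most one variable. A grammar whose forms can accumulate variables that derive ε would need a stronger pruning rule.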

More examples

Write a CFG for the following languages:
• “All strings consisting of one or more a’s, followed by twice as many b’s.”
• “Strings with more a’s than b’s.”

There is an entire class devoted to formal specifications of languages: CS 466/666 – Introduction to Formal Languages

A CFG for Integer Arithmetic Expressions

V = {<num>, <digit>, <op>, <expr>}
Σ = {(, ), 0…9, +, –, ×, ÷}
R = { <expr>  → <num> | <expr> <op> <expr> | (<expr>)
      <num>   → <digit><num> | <digit>
      <digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      <op>    → + | – | × | ÷ }
S = <expr>

Derivation of an Expression

<expr> ⇒ <expr> <op> <expr>
       ⇒ (<expr>) <op> <expr>
       ⇒ (<expr>) + <expr>
       ⇒ (<expr> <op> <expr>) + <expr>
       ⇒ (<expr> × <expr>) + <expr>
       ⇒ (<num> × <expr>) + <expr>
       ⇒ (<digit><num> × <expr>) + <expr>
       ⇒ (<digit><digit> × <num>) + <expr>
       ⇒ (7<digit> × <num>) + <expr>
       ⇒ (73 × <num>) + <expr>
       ⇒ (73 × <digit>) + <expr>
       ⇒ (73 × 4) + <expr>
       ⇒ (73 × 4) + <num>
       ⇒ (73 × 4) + <digit>
       ⇒ (73 × 4) + 9

Parse Trees

The derivation of an expression can also be expressed as a tree
This parse tree can help to resolve the interpretation of an expression
A compiler reads in the source code, and produces a parse tree before generating code.

Example Parse Tree

A simple CFG: E → E – E | 0 | 1

E ⇒ E – E ⇒ E – E – E ⇒ 1 – E – E ⇒ 1 – 0 – E ⇒ 1 – 0 – 1

[Figure: two parse trees for 1 – 0 – 1. In one, the left operand of the root is itself E – E, giving (1 – 0) – 1; in the other, the right operand is E – E, giving 1 – (0 – 1).]

Since there are two parse trees for this expression, the grammar is ambiguous. (Note: the order of substitution is not the issue.)
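
To see why the two parse trees matter, the sketch below enumerates every parse of 1 – 0 – 1 and evaluates each one. It assumes the grammar is E → E – E | 0 | 1, with an ASCII hyphen standing in for the minus sign. The function names `parses` and `evaluate` and the tuple encoding of trees are illustrative choices, not taken from the slides.

```python
def parses(tokens):
    """Return every parse of `tokens` under E -> E - E | 0 | 1.

    A parse is either an int leaf or a tuple (left, right) meaning left - right.
    """
    if len(tokens) == 1:
        return [int(tokens[0])] if tokens[0] in ("0", "1") else []
    trees = []
    # Try each '-' as the root operator.
    for i, tok in enumerate(tokens):
        if tok == "-":
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((left, right))
    return trees


def evaluate(tree):
    """Compute the value of a parse tree."""
    if isinstance(tree, int):
        return tree
    left, right = tree
    return evaluate(left) - evaluate(right)


trees = parses(["1", "-", "0", "-", "1"])
for tree in trees:
    print(tree, "=", evaluate(tree))
```

The two trees evaluate to different numbers, so in this case the syntactic ambiguity changes the meaning of the expression.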

Ambiguity

If some expression has two (or more) parse trees, the grammar is syntactically ambiguous

Programming languages should be specified by unambiguous grammars
• Otherwise it is difficult to determine the semantics of a syntactically correct statement
• a = b + c * d;
• Conventions (like operator precedence) can be used to clarify syntactically ambiguous grammars
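
As an illustration of precedence settling the grouping, Python's standard `ast` module can show the tree Python's own parser builds for b + c * d. This reflects Python's grammar only; the language on the slide is unspecified, so treat it as one example of the convention rather than a statement about that language.

```python
import ast

# Parse the right-hand side of the slide's statement a = b + c * d.
tree = ast.parse("b + c * d", mode="eval").body

# The root is the addition; the multiplication is its right operand.
print(type(tree.op).__name__)        # Add
print(type(tree.right.op).__name__)  # Mult
print(ast.dump(tree.right))
```

Because multiplication binds tighter, the parser never considers the grouping (b + c) * d.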

Disambiguating a grammar

We can disambiguate our simple grammar by adding explicit parentheses:
E → (E – E) | 0 | 1

E ⇒ (E – E) ⇒ ((E – E) – E) ⇒ ((1 – E) – E) ⇒ ((1 – 0) – E) ⇒ ((1 – 0) – 1)

In general, you can remove ambiguity in a grammar by imposing state in the derivation.
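
With explicit parentheses, a parser never has to choose between groupings. The following is a small recursive-descent sketch, assuming the grammar E → (E – E) | 0 | 1 with an ASCII hyphen for the minus sign. The function names and the tuple encoding of trees are hypothetical.

```python
def parse(text):
    """Parse `text` under E -> (E - E) | 0 | 1 and return its unique tree.

    Leaves are ints; an inner node is a tuple (left, right) for left - right.
    """
    tokens = text.replace(" ", "")
    tree, pos = parse_e(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected input at position {pos}")
    return tree


def expect(tokens, pos, ch):
    """Check that `ch` is at `pos` and return the next position."""
    if pos >= len(tokens) or tokens[pos] != ch:
        raise SyntaxError(f"expected {ch!r} at position {pos}")
    return pos + 1


def parse_e(tokens, pos):
    """Parse one E starting at `pos`; return (tree, next position)."""
    if pos < len(tokens) and tokens[pos] in "01":
        return int(tokens[pos]), pos + 1
    pos = expect(tokens, pos, "(")
    left, pos = parse_e(tokens, pos)
    pos = expect(tokens, pos, "-")
    right, pos = parse_e(tokens, pos)
    pos = expect(tokens, pos, ")")
    return (left, right), pos


print(parse("((1 - 0) - 1)"))
print(parse("(1 - (0 - 1))"))
```

The unparenthesized string `1 - 0 - 1` is no longer in the language, so `parse` rejects it with a `SyntaxError`.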

An ambiguous grammar

S → aSb | aSbb | ε
Language: L = {aⁿbᵐ | 0 ≤ n ≤ m ≤ 2n}
• The number of b’s is between the number of a’s and twice the number of a’s

aabbb can be generated two ways

Disambiguating:
• Step 1: Produce all a’s with matching b’s
• Step 2: Produce all extra b’s.

S → aSb | A | ε
A → aAbb | abb
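
The claim that aabbb has two derivations under the first grammar, and only one after disambiguation, can be checked by counting parse trees. This is a sketch under an assumed reading of the rules (the arrows and ε are missing from the slide text): S → aSb | aSbb | ε for the ambiguous grammar, and S → aSb | A | ε, A → aAbb | abb for the rewritten one. The name `count_parses` is hypothetical.

```python
from functools import lru_cache

# Assumed reading of the slide (the arrows and epsilon were lost):
#   ambiguous:    S -> aSb | aSbb | (empty)
#   unambiguous:  S -> aSb | A | (empty),  A -> aAbb | abb
AMBIGUOUS = {"S": ["aSb", "aSbb", ""]}
UNAMBIGUOUS = {"S": ["aSb", "A", ""], "A": ["aAbb", "abb"]}


def count_parses(rules, start, target):
    """Count the parse trees of `target` from `start`."""

    @lru_cache(maxsize=None)
    def count_symbol(symbol, text):
        # A terminal matches only itself, in exactly one way.
        if symbol not in rules:
            return 1 if text == symbol else 0
        return sum(count_seq(rhs, text) for rhs in rules[symbol])

    @lru_cache(maxsize=None)
    def count_seq(symbols, text):
        # Number of ways the symbol sequence derives exactly `text`.
        if not symbols:
            return 1 if text == "" else 0
        head, rest = symbols[0], symbols[1:]
        total = 0
        for i in range(len(text) + 1):
            ways_head = count_symbol(head, text[:i])
            if ways_head:  # skip the recursive call when the head cannot match
                total += ways_head * count_seq(rest, text[i:])
        return total

    return count_symbol(start, target)


print(count_parses(AMBIGUOUS, "S", "aabbb"))
print(count_parses(UNAMBIGUOUS, "S", "aabbb"))
```

This counter relies on every recursive rule starting with a terminal, which holds for both grammars here. A left-recursive grammar would make it loop.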

BNF

Backus-Naur Form
A standard notation for CFG’s, often used in specifying languages
• Non-terminals (variables) are enclosed in <>: <expression>, <number>
• <empty> = ε
• → is the production symbol (also written ::=)
• | is used for “or”

BNF Example

<real-number>  → <integer-part> . <fraction>
<integer-part> → <digit> | <integer-part> <digit>
<fraction>     → <digit> | <digit><fraction>
<digit>        → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Can we generate the number “.7” from this grammar?
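
Questions like this can be tested with a recognizer. The sketch below assumes <digit> stands for the ten decimal digits, so that <integer-part> and <fraction> each mean "one or more digits". Under that assumption the grammar describes a regular language, so a regular expression is enough. The names `REAL_NUMBER` and `is_real_number` are made up for illustration.

```python
import re

# <integer-part> is one or more digits, and so is <fraction>, so
# <real-number> -> <integer-part> . <fraction> matches this pattern.
REAL_NUMBER = re.compile(r"[0-9]+\.[0-9]+")


def is_real_number(text):
    """Return True if `text` can be generated from <real-number>."""
    return REAL_NUMBER.fullmatch(text) is not None


for candidate in ["3.14", "0.7", ".7", "7."]:
    print(repr(candidate), is_real_number(candidate))
```

Using `fullmatch` rather than `search` matters here: the whole string must be derivable, not just some substring of it.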

Extended BNF

Makes some constructs easier to specify
No more powerful than BNF
Rules:
• { } = “zero or more”
• [ ] = “optional” or, equivalently, “zero or one”
• | = “or”
• ( ) are used for grouping

Arithmetic Expressions

BNF:
<expression> ::= <expression> + <term>
               | <expression> – <term>
               | <term>
<term>       ::= <term> * <factor>
               | <term> / <factor>
               | <factor>
<factor>     ::= number | name | (<expression>)

Extended BNF:
<expression> ::= <term> { (+ | –) <term> }
<term>       ::= <factor> { (* | /) <factor> }
<factor>     ::= ‘(’ <expression> ‘)’ | number | name
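
The Extended BNF form translates almost line for line into a recursive-descent parser: each rule becomes a method, and each { } repetition becomes a `while` loop. The following is a minimal sketch, not the course's implementation. The tokenizer, the `Parser` class, and the nested-tuple tree format are assumptions for illustration, and an ASCII hyphen stands in for the minus sign.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Split `text` into numbers, names, operators and parentheses."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Builds a nested-tuple tree, one method per EBNF rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        # <expression> ::= <term> { (+ | -) <term> }
        tree = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            tree = (op, tree, self.term())
        return tree

    def term(self):
        # <term> ::= <factor> { (* | /) <factor> }
        tree = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            tree = (op, tree, self.factor())
        return tree

    def factor(self):
        # <factor> ::= '(' <expression> ')' | number | name
        token = self.take()
        if token == "(":
            tree = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if token is None or token in "+-*/)":
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token) if token.isdigit() else token


def parse(text):
    """Parse a whole expression and return its tree."""
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))
print(parse("a - b - c"))
```

Because each loop folds new operands into the tree built so far, operators of equal precedence group to the left, matching the left-recursive BNF rules. Keeping `term` below `expression` gives * and / higher precedence than + and –.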