1 Syntax and Semantics
• The Purpose of Syntax
• Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Derivations and Parse Trees
• Sebesta Chapter 3
2 What are Syntax and Semantics
• Syntax and semantics define a PL
• Syntax
– form or structure of program units
• expressions, statements, declarations, etc.
• Semantics
– meaning of program units
• expressions, statements, declarations, etc.
• Why do we need language definitions?
– to design a language
– to implement a compiler/interpreter
– to write a program (use the language)
3 Syntax Elements
• A sentence is a string of characters over some alphabet
• A language is a set of sentences
• A lexeme is the lowest-level syntactic unit of a language
– e.g., *, public, totalCount
• A token is a category of lexemes
– e.g., identifier
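The lexeme/token distinction can be illustrated with a small scanner. This is a minimal sketch, not a real lexer: the token names (KEYWORD, IDENTIFIER, OPERATOR, INT_LITERAL) and the keyword set are assumptions chosen for the example.

```python
import re

# Hypothetical token categories; names and keyword set are assumptions
# made for illustration, not taken from any particular language.
KEYWORDS = {"public", "while", "if"}
TOKEN_SPEC = [
    ("INT_LITERAL", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[*+\-/=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs for the lexemes in text."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                      # whitespace is not a lexeme
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"              # reserved words get their own token
        pairs.append((kind, lexeme))
    return pairs

print(tokenize("public totalCount * 2"))
```

Each pair couples a lexeme (the actual characters) with its token (the category), so `totalCount` and any other name share the token IDENTIFIER.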
4 Describing Syntax
• Recognizers
– read an input string in the alphabet of the language (a sentence) and decide whether it belongs to the language
• used in compilers – see Chapter 4 for details
• Generators
– produce sentences in a language
• a sentence is syntactically correct if it can be generated by the generator
5 Backus-Naur Form (BNF)
• BNF is a meta-language
– i.e. a language used to describe another language
– invented by John Backus to describe ALGOL 58
– used by Peter Naur to describe ALGOL 60
• BNF is equivalent to context-free grammars
• A BNF grammar is defined by
– a set of terminal symbols
– a set of nonterminal symbols
– a set of rules
– a start symbol (one of the nonterminal symbols)
6 BNF Elements
• terminal symbols
– are the lexemes of the target PL
• e.g., while, (, )
• nonterminal symbols
– represent classes of syntactic structures
• they act like syntactic variables
• e.g., <statement>
• rules
– define how a nonterminal symbol can be developed into a sequence of nonterminal and terminal symbols
• e.g., <while_stmt> → while ( <logic_expr> ) <stmt>
7 BNF Rules
• A rule has
– a left-hand side (LHS)
– then →
– a right-hand side (RHS)
• There can be several rules for one LHS
<stmt> → <assignment>
<stmt> → begin <stmt_list> end
• Syntactic lists are described using recursion
<ident_list> → ident
<ident_list> → ident , <ident_list>
• A grammar is a finite nonempty set of rules
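The two recursive <ident_list> rules can be turned into code almost literally. The sketch below pairs a recognizer with a generator for that one nonterminal; it assumes the input is already a list of tokens in which every identifier appears as the placeholder string "ident", and the function names are made up for the example.

```python
def is_ident_list(tokens):
    """Recognizer for
         <ident_list> -> ident
         <ident_list> -> ident , <ident_list>
    """
    if not tokens or tokens[0] != "ident":
        return False
    if len(tokens) == 1:
        return True                       # first rule: a single ident
    # second rule: ident , <ident_list> (the recursion mirrors the rule)
    return tokens[1] == "," and is_ident_list(tokens[2:])

def generate_ident_lists(max_idents):
    """Generator: yield every sentence with at most max_idents identifiers."""
    sentence = ["ident"]
    for _ in range(max_idents):
        yield list(sentence)
        sentence += [",", "ident"]

for sentence in generate_ident_lists(3):
    print(" ".join(sentence), is_ident_list(sentence))
```

Every sentence the generator produces is accepted by the recognizer, which is the sense in which both describe the same language.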
8 EBNF
• Extended BNF (EBNF)
– is most often used
– avoids having numerous rules for the same LHS
• Extra meta-symbols (in addition to →)
– […]
• enclosed symbols are optional (0 or 1 times)
– e.g., <if_stmt> → if ( <exp> ) <stmt> [ else <stmt> ]
– {…}
• enclosed symbols can be repeated (0 to n times)
– e.g., <ident_list> → ident { , ident }
– …|…
• choice of one of the symbol sequences separated by |
– e.g., <stmt> → <assignment> | begin <stmt_list> end
– (…)
• groups enclosed symbols
9 BNF vs. EBNF
BNF
<expr> → <expr> + <term>
<expr> → <expr> - <term>
<expr> → <term>
<term> → <term> * <factor>
<term> → <term> / <factor>
<term> → <factor>
<factor> → <exp> ** <factor>
<factor> → <exp>
<exp> → ( <expr> )
<exp> → id
EBNF
<expr> → <term> { ( + | - ) <term> }
<term> → <factor> { ( * | / ) <factor> }
<factor> → <exp> [ ** <factor> ]
<exp> → ( <expr> ) | id
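One practical reason to prefer the EBNF form is that its meta-symbols map directly onto control flow: { … } becomes a loop and [ … ] becomes an if. The following is a rough sketch of a recognizer written that way, assuming the input is a list of token strings in which every identifier is the literal token "id". It is one possible hand-written approach, not the only way to implement the grammar.

```python
def recognize(tokens):
    """Return True if tokens (a list of strings) form an <expr>."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {peek()!r}")
        pos += 1

    def expr():      # <expr> -> <term> { ( + | - ) <term> }
        term()
        while peek() in ("+", "-"):
            expect(peek())
            term()

    def term():      # <term> -> <factor> { ( * | / ) <factor> }
        factor()
        while peek() in ("*", "/"):
            expect(peek())
            factor()

    def factor():    # <factor> -> <exp> [ ** <factor> ]
        exp()
        if peek() == "**":
            expect("**")
            factor()

    def exp():       # <exp> -> ( <expr> ) | id
        if peek() == "(":
            expect("(")
            expr()
            expect(")")
        else:
            expect("id")

    try:
        expr()
    except SyntaxError:
        return False
    return pos == len(tokens)        # all input must be consumed

print(recognize("id + id * id".split()))
print(recognize("id + * id".split()))
```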
10 Augmented EBNF
• another meta-symbol
– = (equal) instead of →
• meta-symbols for repetitions
– + means one or more times
– * means zero or more times
– e.g., <ident> = <letter>+ ( <letter> | <digit> )*
• rules can use iteration instead of recursion
– e.g.:
• <stmt_list> → <stmt> | <stmt> ; <stmt_list>
– can be formulated as
• <stmt_list> = <stmt> ( ; <stmt> )*
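The + and * meta-symbols have the same meaning as in regular expressions, so the <ident> rule can be checked with Python's re module, and the iterative <stmt_list> rule becomes a plain loop. The sketch assumes that <letter> means an ASCII letter and <digit> means 0-9 (the slides do not define them), and it uses the placeholder token "stmt" for any statement.

```python
import re

# <ident> = <letter>+ ( <letter> | <digit> )*
# Assumption: <letter> is A-Z or a-z, <digit> is 0-9.
IDENT = re.compile(r"[A-Za-z]+[A-Za-z0-9]*")

def is_ident(text):
    return IDENT.fullmatch(text) is not None

def is_stmt_list(tokens):
    """<stmt_list> = <stmt> ( ; <stmt> )*   (iteration instead of recursion)"""
    pos = 0
    if pos >= len(tokens) or tokens[pos] != "stmt":
        return False
    pos += 1
    while pos < len(tokens) and tokens[pos] == ";":   # the ( ; <stmt> )* part
        pos += 1
        if pos >= len(tokens) or tokens[pos] != "stmt":
            return False
        pos += 1
    return pos == len(tokens)

print(is_ident("totalCount"), is_ident("1x"))
print(is_stmt_list(["stmt", ";", "stmt"]), is_stmt_list(["stmt", ";"]))
```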
11 Context-Free Grammar
• Context-Free Grammars (CFG)
– defined by Noam Chomsky
– meant to describe the syntax of natural languages
• Context-Free Grammar G = (S, T, N, P)
– S = start symbol
– T = set of terminal symbols (lexemes and tokens)
– N = set of non-terminal symbols (abstractions)
– P = production rules (definition of a LHS abstraction using RHS)
• A sentence
– a sequence of terminal symbols
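The four components of G can be written down as ordinary data. The sketch below is one possible encoding, assuming productions are stored as a dictionary from each nonterminal to its list of right-hand sides; the helper name make_cfg is hypothetical. The terminals are then simply the symbols that never appear as a LHS.

```python
def make_cfg(start, productions):
    """Build G = (S, T, N, P) from a start symbol and a production table.

    productions maps each nonterminal to a list of right-hand sides,
    each right-hand side being a list of symbols.
    """
    nonterminals = set(productions)
    if start not in nonterminals:
        raise ValueError("the start symbol must be a nonterminal")
    terminals = {
        symbol
        for alternatives in productions.values()
        for rhs in alternatives
        for symbol in rhs
    } - nonterminals
    return start, terminals, nonterminals, productions

# The <ident_list> rules from the BNF Rules slide.
S, T, N, P = make_cfg(
    "<ident_list>",
    {"<ident_list>": [["ident"], ["ident", ",", "<ident_list>"]]},
)
print(S, sorted(T), sorted(N))
```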
12 A Small Language in EBNF
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expr>
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
<var> → a | b | c
13 Derivation
• A derivation is
– a repeated application of rules
• starting with the start symbol
• substitution of a nonterminal LHS by the RHS of a rule
• ending with a sentence (all terminal symbols)
• Every string of symbols in the derivation is
– a sentential form
• A sentence is
– a sentential form with only terminal symbols
14 Derivation Types
• A leftmost derivation
– the leftmost nonterminal in each sentential form is expanded first
• A rightmost derivation
– the rightmost nonterminal is expanded first
• A mixed derivation
– an arbitrary nonterminal is expanded
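The difference between the derivation types is only which nonterminal gets replaced at each step. The sketch below makes that explicit for the <expr> part of the small language, starting from <expr> rather than <program> to keep the output short. The function name derive and the encoding of each step as an index into the list of alternatives are assumptions made for the example.

```python
GRAMMAR = {
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
    "<var>":  [["a"], ["b"], ["c"]],
}

def derive(start, choices, leftmost=True):
    """Apply one rule per entry of choices and return all sentential forms.

    Each choice is the index of the alternative used for the nonterminal
    being expanded (the leftmost or the rightmost one in the current form).
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

left = derive("<expr>", [0, 0, 1, 1], leftmost=True)
right = derive("<expr>", [0, 1, 0, 1], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in the same sentence, b + const; only the order of the intermediate sentential forms differs.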
15 Derivation Example
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expr>
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
<var> → a | b | c
<program> => begin <stmt_list> end
=> begin <stmt> end
=> begin <var> = <expr> end
=> begin a = <expr> end
=> begin a = <term> + <term> end
=> begin a = <var> + <term> end
=> begin a = b + <term> end
=> begin a = b + const end
16 Questions
In the preceding slide:
1. Is the derivation a leftmost or a rightmost derivation?
2. State the "opposite" derivation.
• I.e. if it is a leftmost derivation, give the rightmost one, or vice versa
3. What are the terminal symbols of the language, what are the nonterminal symbols, and what is the start symbol?
4. Change a rule so that begin a = - b + const end is a legal sentence
17 Parse Tree
• A parse tree is
– a hierarchical representation of a derivation
Parse tree of begin a = b + const end (children are indented under their parent):
<program>
  begin
  <stmt_list>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
  end
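A parse tree is easy to hold in a program as nested nodes. The sketch below encodes the tree for begin a = b + const end with a hypothetical node helper (a label plus a list of children) and reads the sentence back off the leaves from left to right.

```python
def node(label, *children):
    """A tree node: (label, list of child nodes). Leaves have no children."""
    return (label, list(children))

tree = node("<program>",
    node("begin"),
    node("<stmt_list>",
        node("<stmt>",
            node("<var>", node("a")),
            node("="),
            node("<expr>",
                node("<term>", node("<var>", node("b"))),
                node("+"),
                node("<term>", node("const"))))),
    node("end"))

def frontier(t):
    """Return the leaf labels of t from left to right."""
    label, children = t
    if not children:
        return [label]
    return [leaf for child in children for leaf in frontier(child)]

print(" ".join(frontier(tree)))   # begin a = b + const end
```

Interior nodes are nonterminals and leaves are terminals, so the leaves read in order give back the derived sentence.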
18 Simple Assignment Language
EBNF Grammar
<assign> → <id> = <expr>
<expr> → <id> + <expr>
| <id> * <expr>
| ( <expr> )
| <id>
<id> → a | b | c
Parse tree of the sentence:
a = b * (a + c)
<assign>
  <id>
    a
  =
  <expr>
    <id>
      b
    *
    <expr>
      (
      <expr>
        <id>
          a
        +
        <expr>
          <id>
            c
      )
19 Ambiguous Grammars
• A grammar is ambiguous
– if and only if it generates a sentential form that has two or more distinct parse trees
– e.g.
<assign> → <id> = <expr>
<expr> → <expr> + <expr>
| <expr> * <expr>
| ( <expr> )
| <id>
<id> → a | b | c
20 Two Distinct Parse Trees
add-first parse tree of a = b + c * d
<assign>
  <id>
    a
  =
  <expr>
    <expr>
      <expr>
        <id>
          b
      +
      <expr>
        <id>
          c
    *
    <expr>
      <id>
        d
multiply-first parse tree of a = b + c * d
<assign>
  <id>
    a
  =
  <expr>
    <expr>
      <id>
        b
    +
    <expr>
      <expr>
        <id>
          c
      *
      <expr>
        <id>
          d
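Ambiguity can be checked mechanically for a given sentence by counting its parse trees. The sketch below does this for the <expr> rules of the ambiguous grammar, trying every operator as the root of the tree. It assumes the input is a list of token strings and treats any token other than +, *, ( and ) as an <id>. The function name is made up for the example.

```python
from functools import lru_cache

OPERATORS = ("+", "*")
PUNCTUATION = {"+", "*", "(", ")"}

def count_parse_trees(tokens):
    """Count the parse trees of tokens as an <expr> in
         <expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        """Number of parse trees deriving tokens[lo:hi] from <expr>."""
        if hi - lo == 1:
            return 0 if tokens[lo] in PUNCTUATION else 1    # <expr> -> <id>
        total = 0
        # <expr> -> ( <expr> )
        if hi - lo >= 3 and tokens[lo] == "(" and tokens[hi - 1] == ")":
            total += count(lo + 1, hi - 1)
        # <expr> -> <expr> op <expr>, with the operator at position mid
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in OPERATORS:
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("b + c * d".split()))   # 2
```

A count above 1 for any sentence is enough to show that the grammar is ambiguous; adding parentheses, as in ( b + c ) * d, brings the count back to 1.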
21 An Unambiguous Expression Grammar
• The same language can be defined with an unambiguous grammar!
<assign> → <id> = <expr>
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
<id> → a | b | c
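The layering of <expr>, <term> and <factor> is what forces * to bind tighter than +. The sketch below parses the <expr> part of this grammar and returns a simplified tree of nested tuples (operator, left, right) rather than the full parse tree. Because a recursive-descent parser cannot follow a left-recursive rule directly, the left recursion is rewritten as a loop here, which is one common workaround and an assumption of this example. Any token other than +, *, ( and ) is treated as an <id>.

```python
def parse_expr(tokens):
    """Parse a list of token strings with
         <expr>   -> <expr> + <term> | <term>
         <term>   -> <term> * <factor> | <factor>
         <factor> -> ( <expr> ) | <id>
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        left = term()
        while peek() == "+":          # left recursion rewritten as a loop
            advance()
            left = ("+", left, term())
        return left

    def term():
        left = factor()
        while peek() == "*":
            advance()
            left = ("*", left, factor())
        return left

    def factor():
        if peek() == "(":
            advance()
            inner = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        token = advance()
        if token is None or token in ("+", "*", ")"):
            raise SyntaxError(f"expected an identifier, found {token!r}")
        return token

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_expr("b + c * d".split()))
```

For b + c * d the only result is the multiply-first structure, and a + b + c groups to the left, matching the left-recursive <expr> rule.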