SYNTAX
Outline
  Programming Language Specification
  Lexical Structure of PLs
  Syntactic Structure of PLs
  Context-Free Grammar / BNF
  Parse Trees
  Abstract Syntax Trees
  Ambiguous Grammar
  Associativity and Precedence
  EBNFs and Syntax Diagrams
Nandigam
Programming Language Specification
PLs require precise definitions (i.e., no ambiguity) of:
  Language form (syntax)
  Language meaning (semantics)
Consequently, PLs are specified using formal notation:
  Formal syntax
    Tokens
    Grammar
  Formal semantics
    Operational
    Denotational
    Axiomatic
Lexical Structure of PLs
Lexical Structure of PLs (cont.)
Main task of the scanner: identify tokens
  Tokens are the basic building blocks of programs
  E.g. keywords, identifiers, numbers, punctuation marks
Lexeme: an instance of a token
  One can think of programs as strings of lexemes rather than of characters
  A token of a language is a category of its lexemes (or instances)
Some tokens have many possible lexemes
  E.g. keyword, identifier, number
In some cases, a token has only a single possible lexeme
  E.g. equal_sign, plus_op, mult_op
Lexical Structure of PLs (cont.)
Consider the following Java statement:

  index = 2 * count + 17 ;

The lexemes and tokens of this statement are:

  Lexeme   Token
  index    identifier
  =        equal_sign
  2        int_literal
  *        mult_op
  count    identifier
  +        plus_op
  17       int_literal
  ;        semicolon
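The lexeme/token pairing for this statement can be reproduced with a small scanner. The following is only a sketch in Python, not the scanner of any particular compiler: the names TOKEN_SPEC, MASTER and scan are made up for illustration, and the token set covers just the tokens that appear in the statement.

```python
import re

# Hypothetical token table: (token name, regular expression for its lexemes).
TOKEN_SPEC = [
    ("int_literal", r"[0-9]+"),
    ("identifier", r"[a-zA-Z][a-zA-Z0-9_]*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]

# One alternation with a named group per token.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return a list of (lexeme, token) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.group(), match.lastgroup))
        pos = match.end()
    return tokens


for lexeme, token in scan("index = 2 * count + 17 ;"):
    print(f"{lexeme:8}{token}")
```

Each lexeme is matched by exactly one named group, and the group name plays the role of the token.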
Lexical Structure of PLs (cont.)
Tokens in a programming language are described formally by regular expressions.
Regular expressions are descriptions of patterns of characters.
Regular expression operations
  Basic operations
    Concatenation (item sequencing)
    Choice or selection       |
    Repetition                *
    Grouping                  ( )
  Additional operations
    One or more repetitions   +
    Range of characters       [ - ]
    Optional                  ?
    Any character             .
Lexical Structure of PLs (cont.)
Regular expression examples
  (a|b)*c
    Strings that match include ababaac, aac, bbc, c, and babc
  [0-9]+
    Integer constants with one or more digits
  [0-9]+(\.[0-9]+)?
    Floating-point literals (the fractional part is optional, so plain integers match as well)
  [a-zA-Z][a-zA-Z0-9_]*
    Identifiers
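These patterns can be tried out directly. The sketch below uses Python's re module, whose syntax for these four patterns matches the notation on the slide. The dictionary PATTERNS and the helper matches are illustrative names, and re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# The four example patterns, keyed by illustrative names.
PATTERNS = {
    "abc_strings": r"(a|b)*c",
    "integer": r"[0-9]+",
    "float": r"[0-9]+(\.[0-9]+)?",
    "identifier": r"[a-zA-Z][a-zA-Z0-9_]*",
}


def matches(name, text):
    """True if the entire text is matched by the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None


for sample in ["ababaac", "aac", "bbc", "c", "babc"]:
    print(sample, matches("abc_strings", sample))
print("2count", matches("identifier", "2count"))
```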
Lexical Structure of PLs (cont.)
Scanner generators: lex, flex
ANTLR: Another Tool for Language Recognition
These programs can be used to generate a program (i.e., a scanner) that can extract tokens from a stream of characters.
Many PLs provide good support for regular expressions: Java, C#, Perl, Ruby, …
Support for regular expressions in Java
  java.util.regex package
  split() method of String class
Syntactic Structure of PLs
Specifying the form of a programming language
  Tokens: described by regular expressions
  Syntax (the organization of tokens): described by context-free grammars (CFGs)
Context-Free Grammar
Context-free grammars (CFGs) are used to describe the syntax of PLs.
  Proposed by Noam Chomsky, a noted linguist
BNF (Backus-Naur Form) is a notation for describing syntax.
  Proposed by John Backus and Peter Naur
CFG and BNF are nearly identical and are used interchangeably.
BNF is a metalanguage for programming languages.
  A metalanguage is a language that is used to describe another language.
Context-Free Grammar (cont.)
A CFG or BNF description consists of a series of rules or productions.
Productions are made up of:
  Nonterminals: structures that are broken down into further structures
  Terminals: things that cannot be broken down
  Metasymbols: symbols that are part of CFG/BNF
    These are not actual symbols in the language being described
    Sometimes, a metasymbol is also an actual symbol in a language
One of the nonterminals is designated as the start symbol.
  The start symbol stands for the entire structure being defined.
Context-Free Grammar (cont.)
CFG/BNF Example (Figure 4.2, page 83)
  (1) sentence → noun-phrase verb-phrase .
  (2) noun-phrase → article noun
  (3) article → a | the
  (4) noun → girl | dog
  (5) verb-phrase → verb noun-phrase
  (6) verb → sees | pets
Context-Free Grammar (cont.)
The language of a CFG is the set of strings of terminals that can be generated from the start symbol by a derivation:
  sentence ⇒ noun-phrase verb-phrase .       (rule 1)
           ⇒ article noun verb-phrase .      (rule 2)
           ⇒ the noun verb-phrase .          (rule 3)
           ⇒ the girl verb-phrase .          (rule 4)
           ⇒ the girl verb noun-phrase .     (rule 5)
           ⇒ the girl sees noun-phrase .     (rule 6)
           ⇒ the girl sees article noun .    (rule 2)
           ⇒ the girl sees a noun .          (rule 3)
           ⇒ the girl sees a dog .           (rule 4)
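The derivation of "the girl sees a dog ." can be replayed mechanically. The sketch below encodes the six rules of the example grammar as a Python dictionary and always rewrites the leftmost nonterminal. GRAMMAR, leftmost_derivation and the list of alternative indices are illustrative choices, not part of the slides.

```python
# Each nonterminal maps to its list of alternatives (right-hand sides).
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def leftmost_derivation(start, choices):
    """Apply one alternative per step to the leftmost nonterminal.

    choices[i] is the index of the alternative used at step i.
    Returns every sentential form, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has a rule.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps


# Alternatives picked so that the result is "the girl sees a dog ."
for line in leftmost_derivation("sentence", [0, 0, 1, 0, 0, 0, 0, 0, 1]):
    print(line)
```

Choosing different alternative indices yields other sentences of the same language, such as "a dog pets the girl .".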
Context-Free Grammar (cont.)
Derivation – Generating sentences of the language through a sequence of applications of rules (or productions), beginning with a special nonterminal called the start symbol.
Leftmost derivation – The replaced nonterminal is always the leftmost nonterminal.
Rightmost derivation – The replaced nonterminal is always the rightmost nonterminal.
A derivation may be neither leftmost nor rightmost.
Derivation order has no effect on the language generated by a grammar.
Context-Free Grammar (cont.)
A grammar for a small language
  <program> → begin <stmt_list> end
  <stmt_list> → <stmt> | <stmt> ; <stmt_list>
  <stmt> → <var> := <expr>
  <expr> → <var> + <var> | <var> - <var> | <var>
  <var> → A | B | C
Derive the following program:
  begin A := B + C ; B := C end
Is the language defined by this grammar finite or infinite?
Context-Free Grammar (cont.)
Left recursive rule – A BNF rule is left recursive if the left-hand side (LHS) appears at the beginning of its right-hand side (RHS).
Right recursive rule – A BNF rule is right recursive if the LHS appears at the right end of the RHS.
Examples:
  number → number digit | digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  expr → expr + expr | expr * expr | ( expr ) | number
Uses of recursion in BNF:
  to show repetition
  to describe complex structures
Parse Trees
A parse tree is a graphical representation of the hierarchical syntactic structure of sentences.
  It describes graphically the replacement process in a derivation.
A parse tree is labeled by nonterminals at interior nodes and terminals at leaves.
A parse tree better expresses the structure inherent in a derivation.
[Figure: parse tree for the number 234, and parse tree for an expression that multiplies the parenthesized sum of 2 and 3 by 4]
Parse Trees (cont.)
Problem 1:
  <assign> → <id> := <expr>
  <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
  <id> → A | B | C
Show a leftmost derivation and a parse tree for each of the following statements:
  A := A + ( B * C )
  A := B + C + A
  A := A * ( B + C )
  A := B * ( C * ( A + B ) )
Parse Trees (cont.)
Problem 2:
Describe, in English, the language defined by the following grammar:
  <S> → <A> <B> <C>
  <A> → a <A> | a
  <B> → b <B> | b
  <C> → c <C> | c
Problem 3:
Consider the following grammar:
  <S> → <A> a <B> b
  <A> → <A> b | b
  <B> → a <B> | a
Which of the following sentences are in the language generated by this grammar?
  baab
  bbbab
  bbaaaaa
  bbaab
Parse Trees (cont.)
Problem 4:
Consider the following grammar:
  <S> → a <S> c <B>
  <S> → <A> | b
  <A> → c <A> | c
  <B> → d | <A>
Which of the following sentences are in the language generated by the grammar?
  abcd
  acccbd
  acccbcc
  acd
  accc
Abstract Syntax Trees
Parse trees are still too detailed in their structure, since every step in a derivation is expressed as nodes.
An abstract syntax tree (AST, or just syntax tree) condenses a parse tree to its essential structure.
  An AST is more compact than the corresponding parse tree.
Language designers and translator writers are most interested in abstract syntax.
A programmer is most interested in concrete syntax.
Examples on the next two slides…
Abstract Syntax Trees (cont.)
[Figure: parse tree for the number 234 (left) and the corresponding AST (right)]
Abstract Syntax Trees (cont.)
[Figure: parse tree for an expression that multiplies the parenthesized sum of 2 and 3 by 4 (left) and the corresponding AST with * at the root and + as a subtree (right)]
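An AST of this kind is typically what a translator stores and walks. As a rough sketch, the AST for the parenthesized sum of 2 and 3 multiplied by 4 can be written as a small Python data structure. The class names Num and BinOp and the function evaluate are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf node: a number."""
    value: int


@dataclass
class BinOp:
    """Interior node: an operator with two operand subtrees."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]


def evaluate(node):
    """Walk the AST bottom-up and compute its value."""
    if isinstance(node, Num):
        return node.value
    left = evaluate(node.left)
    right = evaluate(node.right)
    if node.op == "+":
        return left + right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator {node.op!r}")


# No expr, number, digit or parenthesis nodes: the tree shape carries the grouping.
ast = BinOp("*", BinOp("+", Num(2), Num(3)), Num(4))
print(evaluate(ast))
```

The parentheses of the concrete syntax do not appear anywhere in the tree. Their only job was to determine which node becomes the child of which.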
Ambiguous Grammars
A grammar is ambiguous if it is possible to construct two or more distinct parse trees for the same string.
Example:
  Grammar:
    expr → expr + expr | expr * expr | ( expr ) | NUMBER
  Expression: 2 + 3 * 4
  Parse trees: ambiguity in operator precedence
[Figure: two distinct parse trees for 2 + 3 * 4, one with + at the root and one with * at the root]
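The practical consequence of the ambiguity is that the two trees give different values. The sketch below enumerates every tree that a grammar of the form expr → expr op expr | NUMBER allows for a flat token list by trying each operator as the root. It ignores the parenthesized alternative, and parse_trees and value are illustrative names only.

```python
def parse_trees(tokens):
    """All trees for a list such as [2, "+", 3, "*", 4].

    Numbers sit at even positions and operators at odd positions.
    A tree is either a number or a tuple (operator, left, right).
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # pick each operator as the root
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def value(tree):
    """Evaluate a tree produced by parse_trees."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = value(left), value(right)
    return {"+": a + b, "-": a - b, "*": a * b}[op]


for tree in parse_trees([2, "+", 3, "*", 4]):
    print(tree, "=", value(tree))
```

For 2 + 3 * 4 the function returns two trees, one evaluating to 14 and one to 20, so the grammar alone does not determine the meaning of the expression.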
Ambiguous Grammars (cont.)
Another example:
  Grammar:
    expr → expr + expr | expr - expr | ( expr ) | NUMBER
  Expression: 2 - 3 - 4
  Parse trees: ambiguity in operator associativity
[Figure: two distinct parse trees for 2 - 3 - 4, one grouping it as ( 2 - 3 ) - 4 and one as 2 - ( 3 - 4 )]
Ambiguous Grammars (cont.)
Ways to resolve ambiguities in a grammar
  Revise the grammar (the desired approach)
  Provide a disambiguating rule (semantic help)
Revising a grammar to address precedence and associativity ambiguities
  Do not write rules that allow a parse tree to grow on both the left and right sides
  Use left recursive rules for left-associative operators
  Use right recursive rules for right-associative operators
  Add new rules that establish a "precedence cascade" between rules to specify precedence
  Make sure operators with higher precedence appear lower in the cascade of rules
Revised grammar
  expr → expr + term | term
  term → term * factor | factor
  factor → ( expr ) | NUMBER
Ambiguous Grammars (cont.)
Problem 1:
  <expr> → <expr> + <expr> | <expr> - <expr> | <expr> * <expr> | <expr> / <expr> | ( <expr> ) | NUMBER
  NUMBER = [0-9]+
Show that this grammar is ambiguous by constructing two distinct parse trees for each of the following expressions:
  30 + 5 + 2
  30 - 5 - 2
  30 * 5 * 2
  30 / 5 / 2
  30 + 5 * 2
Ambiguous Grammars (cont.)
Revised unambiguous grammar
  <expr> → <expr> + <term> | <expr> - <term> | <term>
  <term> → <term> * <factor> | <term> / <factor> | <factor>
  <factor> → ( <expr> ) | NUMBER
  NUMBER = [0-9]+
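To see the single tree this grammar assigns to an expression, one can sketch a small hand-written parser in Python. A left recursive rule cannot be coded as a directly recursive function call, since it would recurse forever. The usual substitute, used here, is a loop that keeps extending the tree to the left, which produces the same left-associative shape. The function names and the tuple representation (operator, left, right) are assumptions made for illustration, and the simple tokenizer silently skips characters it does not recognize.

```python
import re


def parse(text):
    """Return the tree for text under the revised expr/term/factor grammar."""
    tokens = re.findall(r"[0-9]+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():  # <expr> → <expr> + <term> | <expr> - <term> | <term>
        tree = term()
        while peek() in ("+", "-"):
            op = advance()
            tree = (op, tree, term())  # the old tree becomes the left child
        return tree

    def term():  # <term> → <term> * <factor> | <term> / <factor> | <factor>
        tree = factor()
        while peek() in ("*", "/"):
            op = advance()
            tree = (op, tree, factor())
        return tree

    def factor():  # <factor> → ( <expr> ) | NUMBER
        token = peek()
        if token == "(":
            advance()
            tree = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if token is not None and token.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("30 - 5 - 2"))
print(parse("30 + 5 * 2"))
```

Subtraction groups to the left, and the multiplication ends up lower in the tree than the addition, as the precedence cascade intends.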
Ambiguous Grammars (cont.)
Problem 2:
Show that the following grammar is ambiguous:
<S> → <A><A> → <A> + <A> | <id><id> → a | b | c
Ambiguous Grammars (cont.)
Are there other alternatives to resolving ambiguities?
  Yes, but they change the language!
Fully-parenthesized expressions:
  expr → ( expr + expr ) | ( expr - expr ) | NUMBER
Prefix expressions:
  expr → + expr expr | - expr expr | NUMBER
Extended BNF
EBNF adds new metasymbols (or operations) to BNF to enhance readability and writability.
  These extensions do not enhance the descriptive power of BNF.
  EBNF facilitates the development of parsing tools based on an approach called Recursive-Descent Parsing.
New metasymbols added to EBNF:
  { }     zero or more repetitions
  [ ]     optional parts
  ( | )   multiple-choice
Extended BNF (cont.)
Examples:
  BNF:  <number> → <number> <digit> | <digit>
  EBNF: <number> → <digit> {<digit>}

  BNF:  <expr> → <expr> + <term> | <term>
  EBNF: <expr> → <term> {+ <term>}

  BNF:  <expr> → <term> ^ <expr> | <term>
  EBNF: <expr> → <term> [^ <expr>]

  BNF:  <selection> → if <logic-expr> then <statement>
                    | if <logic-expr> then <statement> else <statement>
  EBNF: <selection> → if <logic-expr> then <statement> [else <statement>]

  BNF:  <for-stmt> → for <var> := <expr> to <expr> do <statement>
                   | for <var> := <expr> downto <expr> do <statement>
  EBNF: <for-stmt> → for <var> := <expr> (to | downto) <expr> do <statement>
Extended BNF (cont.)
More examples:
  BNF:
    <expr> → <expr> + <term> | <term>
    <term> → <term> * <power> | <term> / <power> | <term> % <power> | <power>
    <power> → <factor> ^ <power> | <factor>
    <factor> → (<expr>) | NUMBER
    NUMBER = [0-9]+
  EBNF:
    <expr> → <term> {+ <term>}
    <term> → <power> { * <power> | / <power> | % <power> }
    <power> → <factor> [^ <power>]
    <factor> → (<expr>) | NUMBER
    NUMBER = [0-9]+
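The EBNF form maps almost line for line onto a recursive-descent parser: each nonterminal becomes a function, { } becomes a while loop, and [ ] becomes an if. The Python sketch below evaluates an expression while parsing it. The function names are illustrative, treating / as integer (floor) division is an assumption since the slides do not define the operators' meaning, and the tokenizer silently skips unrecognized characters.

```python
import re


def evaluate(text):
    """Parse and evaluate text according to the EBNF grammar for <expr>."""
    tokens = re.findall(r"[0-9]+|[+*/%^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():  # <expr> → <term> {+ <term>}
        value = term()
        while peek() == "+":  # { } : zero or more repetitions
            take()
            value += term()
        return value

    def term():  # <term> → <power> { * <power> | / <power> | % <power> }
        value = power()
        while peek() in ("*", "/", "%"):
            op = take()
            right = power()
            if op == "*":
                value *= right
            elif op == "/":
                value //= right  # assumption: integer division
            else:
                value %= right
        return value

    def power():  # <power> → <factor> [^ <power>]
        base = factor()
        if peek() == "^":  # [ ] : optional part, recursion makes ^ right-associative
            take()
            return base ** power()
        return base

    def factor():  # <factor> → (<expr>) | NUMBER
        token = peek()
        if token == "(":
            take()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            take()
            return value
        if token is not None and token.isdigit():
            return int(take())
        raise SyntaxError(f"unexpected token {token!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("2 + 3 * 4"))
print(evaluate("2 ^ 3 ^ 2"))
```

The loops in expr and term accumulate from left to right, so +, *, / and % come out left-associative, while the recursive call in power groups 2 ^ 3 ^ 2 as 2 ^ ( 3 ^ 2 ).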
Syntax Diagrams
A graphical representation for a grammar rule
An alternative to EBNF
  Circles or ovals for terminals
  Squares or rectangles for nonterminals
  Terminals and nonterminals are connected with lines and arrows
Visually appealing but takes up space
Rarely seen any more: EBNF is much more compact
[Figure: syntax diagram for if-statement: if ( expression ) statement, with an optional else statement branch]