Defining Program Syntax
Advanced Programming Languages
Syntax And Semantics
- Programming language syntax: how programs look, their form and structure
  - Syntax is defined using a kind of formal grammar
- Programming language semantics: what programs do, their behavior and meaning
  - Semantics is harder to define
Outline
- Grammar and parse tree examples
- BNF and parse tree definitions
- Constructing grammars
- Phrase structure and lexical structure
- Other grammar forms
An English Grammar
- A sentence is a noun phrase, a verb, and a noun phrase.
- A noun phrase is an article and a noun.
- A verb is...
- An article is...
- A noun is...
<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat
How The Grammar Works
- The grammar is a set of rules that say how to build a tree: a parse tree
- You put <S> at the root of the tree
- The grammar's rules say how children can be added at any point in the tree
- For instance, the rule
    <S> ::= <NP> <V> <NP>
  says you can add nodes <NP>, <V>, and <NP>, in that order, as children of <S>
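The tree-building rule can be sketched in Python. This is only an illustration under stated assumptions: the dictionary encoding of the grammar, the nested-tuple encoding of trees, and the name `is_valid_tree` are hypothetical choices, not something the slides prescribe.

```python
# Each non-terminal maps to its list of right-hand sides (assumed encoding).
GRAMMAR = {
    "<S>": [["<NP>", "<V>", "<NP>"]],
    "<NP>": [["<A>", "<N>"]],
    "<V>": [["loves"], ["hates"], ["eats"]],
    "<A>": [["a"], ["the"]],
    "<N>": [["dog"], ["cat"], ["rat"]],
}

def is_valid_tree(tree):
    """A tree is a token (str) or a tuple (non_terminal, child, child, ...)."""
    if isinstance(tree, str):
        # A bare string is a leaf; it must be a token, not a non-terminal.
        return tree not in GRAMMAR
    label, *children = tree
    # The labels of the children, in order, must match one right-hand side.
    child_labels = [c if isinstance(c, str) else c[0] for c in children]
    if child_labels not in GRAMMAR.get(label, []):
        return False
    return all(is_valid_tree(c) for c in children)

tree = ("<S>",
        ("<NP>", ("<A>", "the"), ("<N>", "dog")),
        ("<V>", "loves"),
        ("<NP>", ("<A>", "the"), ("<N>", "cat")))
print(is_valid_tree(tree))
```

Every interior node is checked against the rules for its own label, which mirrors the idea that a rule only says which children a node may have.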
A Parse Tree
[Figure: parse tree for the sentence "the dog loves the cat"]
A Programming Language Grammar
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
- An expression can be the sum of two expressions, or the product of two expressions, or a parenthesized subexpression
- Or it can be one of the variables a, b or c
A Parse Tree
[Figure: parse tree for the expression ((a+b)*c)]
<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat
[Figure: the grammar above, with callouts labeling the tokens, the non-terminal symbols, the start symbol, and a production]
BNF Grammar Definition
- A BNF grammar consists of four parts:
  - The set of tokens
  - The set of non-terminal symbols
  - The start symbol
  - The set of productions
Definition, Continued
- The tokens are the smallest units of syntax
  - Strings of one or more characters of program text
  - They are atomic: not treated as being composed from smaller parts
- The non-terminal symbols stand for larger pieces of syntax
  - They are strings enclosed in angle brackets, as in <NP>
  - They are not strings that occur literally in program text
  - The grammar says how they can be expanded into strings of tokens
- The start symbol is the particular non-terminal that forms the root of any parse tree for the grammar
Definition, Continued
- The productions are the tree-building rules
- Each one has a left-hand side, the separator ::=, and a right-hand side
  - The left-hand side is a single non-terminal
  - The right-hand side is a sequence of one or more things, each of which can be either a token or a non-terminal
- A production gives one possible way of building a parse tree: it permits the non-terminal symbol on the left-hand side to have the things on the right-hand side, in order, as its children in a parse tree
Alternatives
- When there is more than one production with the same left-hand side, an abbreviated form can be used
- The BNF grammar can give the left-hand side, the separator ::=, and then a list of possible right-hand sides separated by the special symbol |
Example
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
- Note that there are six productions in this grammar. It is equivalent to this one:
<exp> ::= <exp> + <exp>
<exp> ::= <exp> * <exp>
<exp> ::= ( <exp> )
<exp> ::= a
<exp> ::= b
<exp> ::= c
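The equivalence between the abbreviated form and the separate productions can be checked mechanically. The following is a minimal sketch, assuming rules are written as plain strings and that | never appears as a token; the helper name `expand_alternatives` is made up for this illustration.

```python
def expand_alternatives(rule):
    """Split 'lhs ::= rhs1 | rhs2 | ...' into one production per alternative."""
    lhs, rhs = rule.split("::=")
    return [f"{lhs.strip()} ::= {alt.strip()}" for alt in rhs.split("|")]

rule = "<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c"
for production in expand_alternatives(rule):
    print(production)
```

A grammar that uses | as a token would need quoting, which this sketch does not attempt.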
Empty
- The special nonterminal <empty> is for places where you want the grammar to generate nothing
- For example, this grammar defines a typical if-then construct with an optional else part:
<if-stmt> ::= if <expr> then <stmt> <else-part>
<else-part> ::= else <stmt> | <empty>
Parse Trees
- To build a parse tree, put the start symbol at the root
- Add children to every non-terminal, following any one of the productions for that non-terminal in the grammar
- Done when all the leaves are tokens
- Read off leaves from left to right: that is the string derived by the tree
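Reading off the leaves is a short recursive walk. The sketch below assumes a parse tree is encoded as nested tuples of the form (non-terminal, child, ...), with plain strings for tokens; that encoding and the function name `leaves` are illustrative choices only.

```python
def leaves(tree):
    """Return the tokens at the leaves of a parse tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    result = []
    for child in children:
        result.extend(leaves(child))
    return result

# A parse tree for (a+b) under the expression grammar.
tree = ("<exp>", "(",
        ("<exp>", ("<exp>", "a"), "+", ("<exp>", "b")),
        ")")
print("".join(leaves(tree)))  # prints (a+b)
```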
Practice
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
- Show a parse tree for each of these strings:
  - a+b
  - a*b+c
  - (a+b)
  - (a+(b))
Compiler Note
- What we just did is parsing: trying to find a parse tree for a given string
- That's what compilers do for every program you try to compile: try to build a parse tree for your program, using the grammar for whatever language you used
- Take a course in compiler construction to learn about algorithms for doing this efficiently
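To make the idea of parsing concrete, here is a deliberately naive recognizer for the expression grammar. It is not one of the efficient algorithms a compiler course would teach: it simply tries every way of splitting the token sequence among the symbols of each right-hand side, with memoization. The names `has_parse_tree`, `derives`, and `matches` are assumptions of this sketch, and it treats every character as one token with no white space.

```python
from functools import lru_cache

GRAMMAR = {
    "<exp>": [
        ["<exp>", "+", "<exp>"],
        ["<exp>", "*", "<exp>"],
        ["(", "<exp>", ")"],
        ["a"], ["b"], ["c"],
    ],
}

def has_parse_tree(text, start="<exp>"):
    tokens = tuple(text)  # every token in this grammar is one character

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        """Can `symbol` derive exactly tokens[i:j]?"""
        if symbol not in GRAMMAR:  # a token matches only itself
            return j == i + 1 and tokens[i] == symbol
        return any(matches(tuple(rhs), i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def matches(rhs, i, j):
        """Can the symbol sequence `rhs` derive exactly tokens[i:j]?"""
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        # Every symbol needs at least one token, so leave room for the rest.
        return any(derives(rhs[0], i, k) and matches(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return len(tokens) > 0 and derives(start, 0, len(tokens))

for s in ["a+b", "a*b+c", "(a+b)", "(a+(b))", "a+"]:
    print(s, has_parse_tree(s))
```

The search terminates because this grammar has no empty right-hand sides, so every recursive call works on a strictly shorter span.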
Language Definition
- We use grammars to define the syntax of programming languages
- The language defined by a grammar is the set of all strings that can be derived by some parse tree for the grammar
- As in the previous example, that set is often infinite (though grammars are finite)
- Constructing grammars is a little like programming...
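Although the language is infinite, any finite slice of it can be enumerated. The following sketch lists every string of the expression grammar up to a given number of tokens by expanding the leftmost non-terminal breadth-first. The function name `language_up_to` and the pruning rule are assumptions of this example; the pruning is only valid because no production here uses an empty right-hand side.

```python
from collections import deque

GRAMMAR = {
    "<exp>": [
        ["<exp>", "+", "<exp>"],
        ["<exp>", "*", "<exp>"],
        ["(", "<exp>", ")"],
        ["a"], ["b"], ["c"],
    ],
}

def language_up_to(max_tokens, start="<exp>"):
    """Collect every derivable string with at most `max_tokens` tokens."""
    strings = set()
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if any.
        index = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if index is None:
            strings.add("".join(form))  # all tokens: a string of the language
            continue
        for rhs in GRAMMAR[form[index]]:
            new_form = form[:index] + tuple(rhs) + form[index + 1:]
            # No production here shrinks a form, so over-long forms are dropped.
            if len(new_form) <= max_tokens and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return strings

print(sorted(language_up_to(3)))
```

Raising the bound keeps producing new strings, which is one way to see that the set has no largest member.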
Constructing Grammars
- Most important trick: divide and conquer
- Example: the language of Java declarations: a type name, a list of variables separated by commas, and a semicolon
- Each variable can be followed by an initializer:
    float a;
    boolean a,b,c;
    int a=1, b, c=1+2;
Example, Continued
- Easy if we postpone defining the comma-separated list of variables with initializers:
    <var-dec> ::= <type-name> <declarator-list> ;
- Primitive type names are easy enough too:
    <type-name> ::= boolean | byte | short | int | long | char | float | double
- (Note: skipping constructed types: class names, interface names, and array types)
Example, Continued
- That leaves the comma-separated list of variables with initializers
- Again, postpone defining variables with initializers, and just do the comma-separated list part:
    <declarator-list> ::= <declarator> | <declarator> , <declarator-list>
Example, Continued
- That leaves the variables with initializers:
    <declarator> ::= <variable-name> | <variable-name> = <expr>
- For full Java, we would need to allow pairs of square brackets after the variable name
- There is also a syntax for array initializers
- And definitions for <variable-name> and <expr>
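The divide-and-conquer structure of the declaration grammar carries over directly to code: one function per piece. The sketch below is a hypothetical recognizer, not anything the slides define. Because the slides leave the variable name and the initializer expression undefined, it assumes a variable name is an identifier that is not a type name, and an expression is a sequence of names or integer constants joined by +.

```python
import re

TYPE_NAMES = {"boolean", "byte", "short", "int", "long", "char", "float", "double"}
NAME = re.compile(r"[A-Za-z_]\w*")

def is_variable_name(token):  # assumed definition of a variable name
    return NAME.fullmatch(token) is not None and token not in TYPE_NAMES

def is_operand(token):        # assumed: a name or an integer constant
    return token.isdigit() or is_variable_name(token)

class Parser:
    def __init__(self, text):
        # \S catches any other non-blank character so it can be rejected later.
        self.tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, ok):
        """Consume one token if the predicate `ok` accepts it, else fail."""
        token = self.peek()
        if token is None or not ok(token):
            raise SyntaxError(f"unexpected token {token!r}")
        self.pos += 1

    def var_dec(self):          # type name, declarator list, semicolon
        self.expect(lambda t: t in TYPE_NAMES)
        self.declarator_list()
        self.expect(lambda t: t == ";")
        if self.peek() is not None:
            raise SyntaxError("text after the semicolon")

    def declarator_list(self):  # one declarator, or declarator , list
        self.declarator()
        if self.peek() == ",":
            self.pos += 1
            self.declarator_list()

    def declarator(self):       # variable name, optionally = expression
        self.expect(is_variable_name)
        if self.peek() == "=":
            self.pos += 1
            self.expr()

    def expr(self):             # assumed: operands joined by +
        self.expect(is_operand)
        while self.peek() == "+":
            self.pos += 1
            self.expect(is_operand)

def is_var_dec(text):
    try:
        Parser(text).var_dec()
        return True
    except SyntaxError:
        return False

for line in ["float a;", "boolean a,b,c;", "int a=1, b, c=1+2;", "int a b;"]:
    print(line, is_var_dec(line))
```

Each method corresponds to one of the pieces that the example postponed and then filled in, so extending the grammar (for instance with array brackets) would mean extending one method.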
Where Do Tokens Come From?
- Tokens are pieces of program text that we do not choose to think of as being built from smaller pieces
- Identifiers (count), keywords (if), operators (==), constants (123.4), etc.
- Programs stored in files are just sequences of characters
- How is such a file divided into a sequence of tokens?
Lexical Structure And Phrase Structure
- Grammars so far have defined phrase structure: how a program is built from a sequence of tokens
- We also need to define lexical structure: how a text file is divided into tokens
One Grammar For Both
- You could do it all with one grammar by using characters as the only tokens
- Not done in practice: things like white space and comments would make the grammar too messy to be readable
<if-stmt> ::= if <white-space> <expr> <white-space> then <white-space> <stmt> <white-space> <else-part>
<else-part> ::= else <white-space> <stmt> | <empty>
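Separate Grammars
- Usually there are two separate grammars
  - One says how to construct a sequence of tokens from a file of characters
  - One says how to construct a parse tree from a sequence of tokens
<program-file> ::= <end-of-file> | <element> <program-file>
<element> ::= <token> | <one-white-space> | <comment>
<one-white-space> ::= <space> | <tab> | <end-of-line>
<token> ::= <identifier> | <operator> | <constant> | ...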
Separate Compiler Passes
- The scanner reads the input file and divides it into tokens according to the first grammar
- The scanner discards white space and comments
- The parser constructs a parse tree (or at least goes through the motions; more about this later) from the token stream according to the second grammar
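A scanner pass can be sketched with regular expressions from Python's `re` module. Everything specific here is an assumption made for illustration: the token kinds, the `//` comment syntax, the small operator set, and the keyword list are not taken from the slides.

```python
import re

TOKEN_SPEC = [
    ("comment",    r"//[^\n]*"),        # assumed comment syntax: // to end of line
    ("whitespace", r"\s+"),
    ("constant",   r"\d+(?:\.\d+)?"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator",   r"==|[-+*/=<>();]"),
]
KEYWORDS = {"if", "then", "else"}       # assumed keyword set
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Divide program text into (kind, lexeme) tokens, dropping blanks and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = match.lastgroup
        lexeme = match.group()
        pos = match.end()
        if kind in ("comment", "whitespace"):
            continue                    # the scanner discards these
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, lexeme))
    return tokens

print(scan("if count == 123.4 then x = 1 // reset"))
```

The parser would then work on the returned list and never see a blank or a comment, which is what keeps the phrase-structure grammar readable.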
Historical Note #1
- Early languages sometimes did not separate lexical structure from phrase structure
  - Early Fortran and Algol dialects allowed spaces anywhere, even in the middle of a keyword
  - Other languages like PL/I allow keywords to be used as identifiers
- This makes them harder to scan and parse
- It also reduces readability
Historical Note #2
- Some languages have a fixed-format lexical structure: column positions are significant
  - One statement per line (i.e. per card)
  - First few columns for statement label
  - Etc.
- Early dialects of Fortran, Cobol, and Basic
- Almost all modern languages are free-format: column positions are ignored
Other Grammar Forms
- BNF variations
- EBNF variations
- Syntax diagrams
BNF Variations
- Some use → or = instead of ::=
- Some leave out the angle brackets and use a distinct typeface for tokens
- Some allow single quotes around tokens, for example to distinguish '|' as a token from | as a meta-symbol
EBNF Variations
- Additional syntax to simplify some grammar chores:
  - {x} to mean zero or more repetitions of x
  - [x] to mean x is optional (i.e. x | <empty>)
  - () for grouping
  - | anywhere to mean a choice among alternatives
  - Quotes around tokens, if necessary, to distinguish from all these meta-symbols
EBNF Examples
<stmt-list> ::= {<stmt> ;}
<if-stmt> ::= if <expr> then <stmt> [else <stmt>]
<thing-list> ::= { (<stmt> | <declaration>) ;}
- Anything that extends BNF this way is called an Extended BNF: EBNF
- There are many variations
Syntax Diagrams
- Syntax diagrams (railroad diagrams)
- Start with an EBNF grammar
- A simple production is just a chain of boxes (for nonterminals) and ovals (for terminals):
    <if-stmt> ::= if <expr> then <stmt> else <stmt>
[Figure: syntax diagram for if-stmt, a chain of if, expr, then, stmt, else, stmt]
Bypasses
- Square-bracket pieces from the EBNF get paths that bypass them
    <if-stmt> ::= if <expr> then <stmt> [else <stmt>]
[Figure: syntax diagram for if-stmt with a path that bypasses else and the second stmt]
Branching
- Use branching for multiple productions
    <exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
Loops
- Use loops for EBNF curly brackets
    <exp> ::= <addend> {+ <addend>}
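The same correspondence between curly brackets and loops shows up in a hand-written recognizer. The sketch below handles an expression made of one addend followed by zero or more "+ addend" repetitions. The slides do not define the addend, so this example assumes it is one of the tokens a, b, or c; the function name `parse_exp` is likewise hypothetical.

```python
def parse_exp(tokens):
    """Recognize: addend, then zero or more '+ addend'. Returns the addends."""
    ADDENDS = {"a", "b", "c"}  # assumption: the addend is not defined in the slides
    pos = 0

    def addend():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] not in ADDENDS:
            raise SyntaxError(f"expected an addend at token {pos}")
        pos += 1
        return tokens[pos - 1]

    addends = [addend()]
    while pos < len(tokens) and tokens[pos] == "+":  # the {...} becomes a loop
        pos += 1
        addends.append(addend())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return addends

print(parse_exp(list("a+b+c")))  # prints ['a', 'b', 'c']
```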
Syntax Diagrams, Pro and Con
- Easier for people to read casually
- Harder to read precisely: what will the parse tree look like?
- Harder to make machine readable (for automatic parser-generators)
Formal Context-Free Grammars
- In the study of formal languages and automata, grammars are expressed in yet another notation:
    S → aSb | X
    X → cX | ε
- These are called context-free grammars
- Other kinds of grammars are also studied: regular grammars (weaker), context-sensitive grammars (stronger), etc.
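The small grammar in this notation (S produces aSb or X; X produces cX or the empty string) generates strings with some number of a's, then any number of c's, then the same number of b's. A minimal sketch of a recognizer, with one function per non-terminal, follows; the function names are chosen for this illustration.

```python
def derives_S(s):
    """S -> aSb | X"""
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b" and derives_S(s[1:-1]):
        return True
    return derives_X(s)

def derives_X(s):
    """X -> cX | (empty string)"""
    if s == "":
        return True
    return s[0] == "c" and derives_X(s[1:])

for s in ["", "ccc", "ab", "aacbb", "aab", "acbc"]:
    print(repr(s), derives_S(s))
```

Matching the a's against the b's is what the recursion on S does, which hints at why a weaker kind of grammar could not describe this language.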
Many Other Variations
- BNF and EBNF ideas are widely used
- Exact notation differs, in spite of occasional efforts to get uniformity
- But as long as you understand the ideas, differences in notation are easy to pick up
Example
WhileStatement:
    while ( Expression ) Statement
DoStatement:
    do Statement while ( Expression ) ;
ForStatement:
    for ( ForInit_opt ; Expression_opt ; ForUpdate_opt ) Statement
[from The Java Language Specification, James Gosling et al.]
Conclusion
- We use grammars to define programming language syntax, both lexical structure and phrase structure
- Connection between theory and practice
  - Two grammars, two compiler passes
  - Parser-generators can write code for those two passes automatically from grammars
Conclusion, Continued
- Multiple audiences for a grammar
  - Novices want to find out what legal programs look like
  - Experts (advanced users and language system implementers) want an exact, detailed definition
  - Tools (parser and scanner generators) want an exact, detailed definition in a particular, machine-readable form