Upload
wesley-miles
View
16
Download
2
Embed Size (px)
DESCRIPTION
Programming Languages. An Introduction to Grammars Spring 2004. Context Free Grammars. One or more non terminal symbols Lexically distinguished, e.g. upper case Terminal symbols are actual characters in the language Or they can be tokens in practice - PowerPoint PPT Presentation
Citation preview
Programming Programming LanguagesLanguages
An Introduction to GrammarsAn Introduction to Grammars
Spring 2004Spring 2004
Context Free GrammarsContext Free Grammars
One or more non terminal symbolsOne or more non terminal symbolsLexically distinguished, e.g. upper caseLexically distinguished, e.g. upper case
Terminal symbols are actual Terminal symbols are actual characters in the languagecharacters in the languageOr they can be tokens in practiceOr they can be tokens in practice
One non-terminal is the distinguished One non-terminal is the distinguished start symbol.start symbol.
Grammar RulesGrammar Rules
Non-terminal ::= sequenceNon-terminal ::= sequenceWhere sequence can be non-terminals Where sequence can be non-terminals
or terminalsor terminalsAt least some rules must have ONLY At least some rules must have ONLY
terminals on the right sideterminals on the right side
Example of GrammarExample of Grammar
S ::= (S) S ::= (S) S ::= <S>S ::= <S>S ::= emptyS ::= emptyThis is the language D2, the language This is the language D2, the language
of two kinds of balanced parensof two kinds of balanced parensE.g. ((<<>>))E.g. ((<<>>))
Well not quite D2, since that should Well not quite D2, since that should allow things like (())<>allow things like (())<>
Example, continuedExample, continued
So add the ruleSo add the ruleS ::= SSS ::= SS
And that is indeed D2And that is indeed D2But this is ambiguousBut this is ambiguous
()<>() can be parsed two ways()<>() can be parsed two ways()<> is an S and () is an S()<> is an S and () is an S() is an S and <>() is an S() is an S and <>() is an S
Nothing wrong with ambiguous Nothing wrong with ambiguous grammarsgrammars
BNF (Backus Naur/Normal BNF (Backus Naur/Normal Form)Form)
Properly attributed to Sanskrit Properly attributed to Sanskrit scholarsscholars
An extension of CFG withAn extension of CFG withOptional constructs in []Optional constructs in []Sequences {} = 0 or moreSequences {} = 0 or moreAlternation |Alternation |
All these are just short handsAll these are just short hands
BNF ShorthandsBNF Shorthands
IF ::= if EXPR then STM [else STM] fiIF ::= if EXPR then STM [else STM] fiIF ::= if EXPR then STM fiIF ::= if EXPR then STM fiIF ::= if EXPR then STM else STM fiIF ::= if EXPR then STM else STM fi
STM ::= IF | WHILESTM ::= IF | WHILESTM ::= IFSTM ::= IFSTM ::= WHILESTM ::= WHILE
STMSEQ ::= STM {;STM}STMSEQ ::= STM {;STM}STMSEQ ::= STMSTMSEQ ::= STMSTMSEQ ::= STM ; STMSEQSTMSEQ ::= STM ; STMSEQ
Programming Language Programming Language SyntaxSyntax
Expressed as a CFG where the grammar Expressed as a CFG where the grammar is closely related to the semanticsis closely related to the semantics
For exampleFor exampleEXPR ::= PRIMARY {OP | PRIMARY}EXPR ::= PRIMARY {OP | PRIMARY}OP ::= + | *OP ::= + | *
Not good, better isNot good, better isEXPR ::= TERM | EXPR + TERMEXPR ::= TERM | EXPR + TERMTERM ::= PRIMARY | TERM * PRIMARYTERM ::= PRIMARY | TERM * PRIMARY
This implies associativity and This implies associativity and precedenceprecedence
PL Syntax ContinuedPL Syntax Continued
No point in using BNF for tokens, No point in using BNF for tokens, since no semantics involvedsince no semantics involvedID ::= LETTER | LETTER IDID ::= LETTER | LETTER ID
Is actively confusing since the BC of Is actively confusing since the BC of ABC is not an identifier, and anyway ABC is not an identifier, and anyway there is no tree structure herethere is no tree structure here
Better to regard ID as a terminal Better to regard ID as a terminal symbol. In other words grammar is a symbol. In other words grammar is a grammar of tokens, not charactersgrammar of tokens, not characters
Grammars and TreesGrammars and Trees
A Grammar with a starting symbol A Grammar with a starting symbol naturally indicates a tree naturally indicates a tree representation of the programrepresentation of the program
Non terminal on left is root of tree Non terminal on left is root of tree nodenode
Right hand side are descendentsRight hand side are descendentsLeaves read left to right are the Leaves read left to right are the
terminals that give the tokens of the terminals that give the tokens of the programprogram
The Parsing ProblemThe Parsing Problem
Given a grammar of tokensGiven a grammar of tokensAnd a sequence of tokensAnd a sequence of tokensConstruct the corresponding parse Construct the corresponding parse
treetreeGiving good error messagesGiving good error messages
General ParsingGeneral Parsing
Not known to be easier than matrix Not known to be easier than matrix multiplicationmultiplicationCubic, or more properly n**2.71.. Cubic, or more properly n**2.71..
(whatever that unlikely constant is)(whatever that unlikely constant is)In practice almost always linearIn practice almost always linearIn any case not a significant amount of In any case not a significant amount of
timetimeHardest part by far is to give good Hardest part by far is to give good
messagesmessages
Regular GrammarsRegular Grammars
Also called type 3Also called type 3Only one non-terminal on right sideOnly one non-terminal on right sideRight side is eitherRight side is either
TerminalTerminalTerminal Non-terminalTerminal Non-terminal
Correspond to regular expressionsCorrespond to regular expressionsUsed to express grammar of tokensUsed to express grammar of tokens
Real_Literal ::=Real_Literal ::= {digit}* . {digit}* [E[+|-]integer_literal] {digit}* . {digit}* [E[+|-]integer_literal]
Context Sensitive GrammarsContext Sensitive Grammars
Also called type 1Also called type 1Left side can contain multiple Left side can contain multiple
symbolssymbolsJUNK digit => digit JUNKJUNK digit => digit JUNKBut rhs cannot be shorter than left sideBut rhs cannot be shorter than left side
Can express powerful rulesCan express powerful rulesBut hard to parseBut hard to parse
Type 0 GrammarsType 0 Grammars
No restrictions on left/right sidesNo restrictions on left/right sidesArbitrary mixture of terminals/non-Arbitrary mixture of terminals/non-
terminals on either sideterminals on either sideVery powerful (Turing powerful)Very powerful (Turing powerful) Impossible to parse Impossible to parse
In Practice …In Practice …
Context free grammars used to Context free grammars used to express syntax.express syntax.
Other rules (formal english, VDL) Other rules (formal english, VDL) used to express static semanticsused to express static semantics
Possible to do more in grammarsPossible to do more in grammarsE.g. Algol-68 W-grammarsE.g. Algol-68 W-grammarsBut too difficult to deal withBut too difficult to deal with