Programming Languages: An Introduction to Grammars, Spring 2004

Page 1: Programming Languages

Programming Languages

An Introduction to Grammars

Spring 2004

Page 2: Programming Languages

Context Free Grammars

One or more non-terminal symbols
  Lexically distinguished, e.g. upper case
Terminal symbols are actual characters in the language
  Or they can be tokens in practice
One non-terminal is the distinguished start symbol

Page 3: Programming Languages

Grammar Rules

Non-terminal ::= sequence
  Where sequence can be non-terminals or terminals
At least some rules must have ONLY terminals on the right side
  Otherwise no derivation could ever terminate

Page 4: Programming Languages

Example of Grammar

S ::= (S)
S ::= <S>
S ::= empty
This is the language D2, the language of two kinds of balanced parens
  E.g. ((<<>>))
Well, not quite D2, since that should allow things like (())<>

Page 5: Programming Languages

Example, continued

So add the rule
  S ::= SS
And that is indeed D2
But this is ambiguous
  ()<>() can be parsed two ways
    ()<> is an S and () is an S
    () is an S and <>() is an S
Nothing wrong with ambiguous grammars
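A recognizer for D2 can be sketched by recursive descent. This is only an illustration: it assumes an equivalent unambiguous grammar, S ::= (S)S | <S>S | empty, which is not on the slides, and the function name is hypothetical.

```python
def is_d2(text: str) -> bool:
    """Recognize D2 using the assumed grammar S ::= (S)S | <S>S | empty."""
    closers = {"(": ")", "<": ">"}

    def parse_s(pos: int) -> int:
        # Return the position just past the longest S starting at pos,
        # or -1 if an opened bracket is never closed correctly.
        while pos < len(text) and text[pos] in closers:
            close = closers[text[pos]]
            pos = parse_s(pos + 1)  # the inner S
            if pos < 0 or pos >= len(text) or text[pos] != close:
                return -1
            pos += 1  # consume the closer, then loop for the trailing S
        return pos

    # The whole input must be consumed by a single S.
    return parse_s(0) == len(text)
```

Both slide examples, ((<<>>)) and (())<>, are accepted, while a crossed string such as (<)> is rejected.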

Page 6: Programming Languages

BNF (Backus Naur/Normal Form)

Properly attributed to Sanskrit scholars
An extension of CFG with
  Optional constructs in []
  Sequences {} = 0 or more
  Alternation |
All these are just shorthands

Page 7: Programming Languages

BNF Shorthands

IF ::= if EXPR then STM [else STM] fi
  IF ::= if EXPR then STM fi
  IF ::= if EXPR then STM else STM fi
STM ::= IF | WHILE
  STM ::= IF
  STM ::= WHILE
STMSEQ ::= STM {;STM}
  STMSEQ ::= STM
  STMSEQ ::= STM ; STMSEQ
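The expansion of the [] shorthand into plain rules can be mechanized. A minimal sketch, assuming a rule is written as a Python list of symbols with each optional group as a nested list (this representation and the name expand_optional are illustrative choices, not part of BNF):

```python
from itertools import product

def expand_optional(lhs, rhs):
    """Expand every [..] group in rhs into plain CFG rules.

    rhs is a list of symbols; an optional group is a nested list.
    Returns a list of (lhs, symbols) pairs, one per combination.
    """
    choices = []
    for item in rhs:
        if isinstance(item, list):
            choices.append([[], item])   # group absent, or group present
        else:
            choices.append([[item]])     # ordinary symbol: always present
    rules = []
    for combo in product(*choices):
        flat = [sym for part in combo for sym in part]
        rules.append((lhs, flat))
    return rules

if_rules = expand_optional(
    "IF", ["if", "EXPR", "then", "STM", ["else", "STM"], "fi"]
)
for lhs, symbols in if_rules:
    print(lhs, "::=", " ".join(symbols))
```

For the IF rule this yields the two plain rules listed above; a rule with k optional groups expands into 2**k plain rules, which is why the shorthand is worth having.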

Page 8: Programming Languages

Programming Language Syntax

Expressed as a CFG where the grammar is closely related to the semantics
For example
  EXPR ::= PRIMARY {OP PRIMARY}
  OP ::= + | *
Not good, better is
  EXPR ::= TERM | EXPR + TERM
  TERM ::= PRIMARY | TERM * PRIMARY
This implies associativity and precedence
  Both operators associate to the left, and * binds tighter than +
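How the EXPR/TERM grammar fixes associativity and precedence can be seen in a small parser. A sketch under stated assumptions: tokens arrive as a list of strings, any token other than + or * counts as a PRIMARY, and the left-recursive rules are implemented as loops (a common recursive-descent workaround, not something the slides prescribe). Trees are nested tuples (operator, left, right).

```python
def parse_expr(tokens):
    """Parse tokens with EXPR ::= TERM | EXPR + TERM and
    TERM ::= PRIMARY | TERM * PRIMARY, returning a nested tuple tree."""
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("+", "*"):
            raise SyntaxError(f"expected a primary at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def term():
        nonlocal pos
        node = primary()
        # TERM ::= TERM * PRIMARY: each * wraps the tree built so far,
        # which is what makes * left associative.
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, primary())
        return node

    def expr():
        nonlocal pos
        node = term()
        # EXPR ::= EXPR + TERM: operands of + are whole TERMs,
        # so * groups before + does.
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at {pos}")
    return tree
```

With this sketch, a + b * c parses as ("+", "a", ("*", "b", "c")) and a + b + c as ("+", ("+", "a", "b"), "c").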

Page 9: Programming Languages

PL Syntax Continued

No point in using BNF for tokens, since no semantics involved
  ID ::= LETTER | LETTER ID
Is actively confusing since the BC of ABC is not an identifier, and anyway there is no tree structure here
Better to regard ID as a terminal symbol. In other words, the grammar is a grammar of tokens, not characters
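Treating ID as a terminal means a separate scanner turns characters into tokens before the grammar is applied. A minimal sketch using Python's re module; the token classes NUM and OP and the operator set are assumptions added for illustration, while ID follows the letters-only rule above.

```python
import re

# One alternative per token class; leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:(?P<ID>[A-Za-z]+)|(?P<NUM>\d+)|(?P<OP>[+*()]))")

def tokenize(text):
    """Split text into (kind, spelling) pairs; ID is one indivisible token."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize input at position {pos}")
        # lastgroup names the alternative that matched.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens
```

Here ABC comes out as the single token ("ID", "ABC"), so the parser never sees BC as a candidate identifier.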

Page 10: Programming Languages

Grammars and Trees

A grammar with a starting symbol naturally indicates a tree representation of the program
Non-terminal on left is root of tree node
Right hand side are descendants
Leaves read left to right are the terminals that give the tokens of the program
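The relation between a parse tree and the token sequence can be made concrete. A sketch assuming a node is a pair (non-terminal, list of children) and a leaf is a plain string; the example tree is hand-built for a + b * c under the EXPR/TERM grammar from the earlier slide.

```python
def leaves(tree):
    """Return the terminals of a parse tree, read left to right."""
    if isinstance(tree, str):
        return [tree]            # a leaf is a terminal (token)
    _nonterminal, children = tree
    result = []
    for child in children:
        result.extend(leaves(child))
    return result

# Hand-built parse tree for: a + b * c
tree = ("EXPR", [
    ("EXPR", [("TERM", [("PRIMARY", ["a"])])]),
    "+",
    ("TERM", [
        ("TERM", [("PRIMARY", ["b"])]),
        "*",
        ("PRIMARY", ["c"]),
    ]),
])

print(leaves(tree))
```

Reading the leaves recovers the original tokens; the interior nodes carry the structure (here, that b * c is grouped before the addition).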

Page 11: Programming Languages

The Parsing Problem

Given a grammar of tokens
And a sequence of tokens
Construct the corresponding parse tree
Giving good error messages

Page 12: Programming Languages

General Parsing

Not known to be easier than matrix multiplication
  Cubic with the standard algorithms, or more properly about n**2.81 via Strassen-style matrix multiplication (lower still with faster multiplication algorithms)
In practice almost always linear
In any case not a significant amount of time
Hardest part by far is to give good messages
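The cubic bound shows up directly in the CYK algorithm, which the slides do not name and which is used here only as a standard illustration of general context-free recognition. A minimal sketch, assuming a hand-written Chomsky-normal-form grammar for non-empty D2 strings; the helper non-terminals A, B, LP, RP, LT, GT are hypothetical names.

```python
# Assumed CNF grammar for non-empty D2 strings (not from the slides).
UNARY = [("LP", "("), ("RP", ")"), ("LT", "<"), ("GT", ">")]
BINARY = [
    ("S", ("S", "S")),
    ("S", ("LP", "A")), ("S", ("LP", "RP")), ("A", ("S", "RP")),
    ("S", ("LT", "B")), ("S", ("LT", "GT")), ("B", ("S", "GT")),
]

def cyk(tokens, unary=UNARY, binary=BINARY, start="S"):
    """CYK recognition: True if start derives the token sequence."""
    n = len(tokens)
    if n == 0:
        return False  # this CNF grammar has no empty rule
    # table[i][length] = non-terminals deriving tokens[i:i+length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][1] = {lhs for lhs, t in unary if t == tok}
    # Three nested loops over the input length give the cubic cost.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, (b, c) in binary:
                    if b in left and c in right:
                        table[i][length].add(lhs)
    return start in table[0][n]
```

The table has O(n**2) cells and each cell tries O(n) split points, so the work grows as n**3 for a fixed grammar. Practical parsers avoid this by restricting the grammar so that one left-to-right pass suffices.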

Page 13: Programming Languages

Regular Grammars

Also called type 3
At most one non-terminal on right side
Right side is either
  Terminal
  Terminal Non-terminal
Correspond to regular expressions
Used to express grammar of tokens
  Real_Literal ::= {digit}* . {digit}* [E[+|-]integer_literal]
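Since regular grammars correspond to regular expressions, the Real_Literal rule can be transcribed almost symbol for symbol. A sketch using Python's re module, assuming integer_literal means one or more digits (the slide does not define it); the names REAL_LITERAL and is_real_literal are illustrative.

```python
import re

# {digit}* . {digit}* [E[+|-]integer_literal]
REAL_LITERAL = re.compile(r"\d*\.\d*(?:E[+-]?\d+)?")

def is_real_literal(text):
    """True if the whole of text matches the Real_Literal rule."""
    return REAL_LITERAL.fullmatch(text) is not None
```

Taken literally, the rule also accepts a lone "." because both digit sequences may be empty; a real language definition would normally require at least one digit.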

Page 14: Programming Languages

Context Sensitive Grammars

Also called type 1
Left side can contain multiple symbols
  JUNK digit => digit JUNK
  But rhs cannot be shorter than left side
Can express powerful rules
But hard to parse
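The effect of a rule with two symbols on the left can be seen by applying it as a rewriting step. A small sketch, assuming the sentential form is a list of symbol strings and that digit stands for any single decimal digit; both function names are hypothetical.

```python
def rewrite_once(symbols):
    """Apply JUNK digit => digit JUNK at the leftmost match, else None."""
    for i in range(len(symbols) - 1):
        if symbols[i] == "JUNK" and symbols[i + 1].isdigit():
            # Same length on both sides: the rule never shrinks the form.
            return symbols[:i] + [symbols[i + 1], "JUNK"] + symbols[i + 2:]
    return None

def rewrite_all(symbols):
    """Return every sentential form reached until the rule no longer applies."""
    steps = [symbols]
    while True:
        nxt = rewrite_once(steps[-1])
        if nxt is None:
            return steps
        steps.append(nxt)

for form in rewrite_all(["JUNK", "1", "2"]):
    print(" ".join(form))
```

Each step moves JUNK one place to the right past a digit, something a context-free rule cannot express because its left side is a single non-terminal.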

Page 15: Programming Languages

Type 0 Grammars

No restrictions on left/right sides
Arbitrary mixture of terminals/non-terminals on either side
Very powerful (Turing powerful)
Impossible to parse in general (membership is undecidable)

Page 16: Programming Languages

In Practice …

Context free grammars used to express syntax
Other rules (formal English, VDL) used to express static semantics
Possible to do more in grammars
  E.g. Algol-68 W-grammars
  But too difficult to deal with