Lecture 5: Syntax Analysis
(Section 2.2)
CSCI 431 Programming Languages
Fall 2002
A modification of slides developed by Felix Hernandez-Campos at UNC Chapel Hill
Review: Compilation/Interpretation
[Diagram labels: Source Code; Compiler or Interpreter (Translation, Execution); Target Code; Interpretation]
Review: Syntax Analysis
[Diagram labels: Source Code; Compiler or Interpreter (Translation, Execution); Target Code]
• Specifying the form of a programming language
  – Tokens
    » Regular Expressions (also F.A.s & Reg. Grammars)
  – Syntax
    » Context-Free Grammars (also P.D.A.s)
Syntax Analysis
• Syntax:
  – Webster’s definition: 1 a : the way in which linguistic elements (as words) are put together to form constituents (as phrases or clauses)
• The syntax of a programming language
  – Describes its form
    » Organization of tokens
    » Context Free Grammars (CFGs)
  – Must be recognizable by compilers and interpreters
    » Parsing
    » LL and LR parsers
Context Free Grammars
• CFGs
  – Add recursion to regular expressions
    » Nested constructions
  – Notation
      expression → identifier | number | - expression | ( expression ) | expression operator expression
      operator → + | - | * | /
    » Terminal symbols
    » Non-terminal symbols
    » Production rule (i.e. substitution rule)
      non-terminal symbol → terminal and non-terminal symbols
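A minimal sketch of the expression grammar written down as plain data, which makes the terminal/non-terminal distinction concrete. The dictionary layout and the names `GRAMMAR`, `nonterminals`, and `terminals` are assumptions made for illustration, not part of the slides.

```python
# Each non-terminal maps to its list of alternatives; each alternative is a
# list of symbols. This layout is an assumption chosen for the sketch.
GRAMMAR = {
    "expression": [
        ["identifier"],
        ["number"],
        ["-", "expression"],
        ["(", "expression", ")"],
        ["expression", "operator", "expression"],
    ],
    "operator": [["+"], ["-"], ["*"], ["/"]],
}

# Non-terminals are exactly the symbols that appear on a left-hand side.
nonterminals = set(GRAMMAR)

# Terminals are every other symbol that appears on some right-hand side.
terminals = {
    symbol
    for alternatives in GRAMMAR.values()
    for alternative in alternatives
    for symbol in alternative
    if symbol not in nonterminals
}

print(sorted(nonterminals))
print(sorted(terminals))
```

Here `identifier` and `number` count as terminals because the scanner delivers them as single tokens.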
ParsingParsing
• Parsing an arbitrary Context Free GrammarParsing an arbitrary Context Free Grammar– O(nO(n33))– Too slow for large programsToo slow for large programs
• Linear-time parsingLinear-time parsing– LL parsers (a ‘Left-to-right, Left-most’ derivation)LL parsers (a ‘Left-to-right, Left-most’ derivation)
» Recognize LL grammarRecognize LL grammar» Use a top-down strategyUse a top-down strategy
– LR parsers (a ‘Left-to-right, Right-most’ derivation)LR parsers (a ‘Left-to-right, Right-most’ derivation)» Recognize LR grammarRecognize LR grammar» Use a bottom-up strategyUse a bottom-up strategy
Parsing example
• Example: comma-separated list of identifiers
  – CFG
      id_list → id id_list_tail
      id_list_tail → , id id_list_tail
      id_list_tail → ;
  – Parsing
      A, B, C;
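Before any parser sees A, B, C; a scanner reduces the characters to tokens using regular expressions. The sketch below uses Python's `re` module; the function name `tokenize`, the token name `id` for any identifier, and the identifier pattern itself are assumptions made for illustration.

```python
import re

# One identifier or one punctuation mark, preceded by optional whitespace.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<punct>[,;]))")


def tokenize(text):
    """Turn source text into a list of token names ('id', ',', ';')."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        # Every identifier becomes the token 'id'; punctuation stands for itself.
        tokens.append("id" if m.group("id") else m.group("punct"))
        pos = m.end()
    return tokens


print(tokenize("A, B, C;"))
```

The parser only needs the token kinds, so the spellings A, B, and C are dropped here; a real scanner would keep them alongside each token.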
Top-down derivation of A, B, C;
[Figure labels: CFG; Left-to-right, Left-most derivation; LL(1) parsing]
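The left-most derivation can be sketched in code: at each step the left-most non-terminal is replaced by the production predicted from one token of lookahead. The sketch assumes the id_list grammar with the second production read as `id_list_tail -> , id id_list_tail`; the names `PREDICT` and `leftmost_derivation` are hypothetical.

```python
# (non-terminal, lookahead token) -> right-hand side to substitute.
# "id" stands for any identifier token.
PREDICT = {
    ("id_list", "id"): ["id", "id_list_tail"],
    ("id_list_tail", ","): [",", "id", "id_list_tail"],
    ("id_list_tail", ";"): [";"],
}
NONTERMINALS = {"id_list", "id_list_tail"}


def leftmost_derivation(tokens):
    """Return the sentential forms of a left-most derivation of tokens."""
    form = ["id_list"]
    steps = [list(form)]
    pos = 0  # symbols before pos are terminals already matched against input
    while pos < len(form):
        symbol = form[pos]
        lookahead = tokens[pos] if pos < len(tokens) else None
        if symbol in NONTERMINALS:
            rhs = PREDICT.get((symbol, lookahead))
            if rhs is None:
                raise SyntaxError(f"unexpected {lookahead!r} while expanding {symbol}")
            form[pos:pos + 1] = rhs  # expand the left-most non-terminal
            steps.append(list(form))
        elif symbol == lookahead:
            pos += 1                 # terminal agrees with the input
        else:
            raise SyntaxError(f"expected {symbol!r}, found {lookahead!r}")
    if pos != len(tokens):
        raise SyntaxError("extra input after the list")
    return steps


for step in leftmost_derivation(["id", ",", "id", ",", "id", ";"]):
    print(" ".join(step))
```

Each printed line is one sentential form, starting from id_list and ending with the token sequence itself.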
Bottom-up parsing of A, B, C;
[Figure labels: CFG; Left-to-right, Right-most derivation; LR parsing (a shift-reduce parser)]
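A rough sketch of the shift-reduce idea on the same input. This is not a real LR parser: an LR parser chooses between shifting and reducing with a state machine and lookahead, whereas this sketch simply reduces whenever the top of the stack matches a right-hand side, preferring the longest match. That shortcut happens to suffice for this small grammar. The names `PRODUCTIONS` and `shift_reduce` are hypothetical, and the second production is assumed to be `id_list_tail -> , id id_list_tail`.

```python
# Productions as (left-hand side, right-hand side) pairs.
PRODUCTIONS = [
    ("id_list", ["id", "id_list_tail"]),
    ("id_list_tail", [",", "id", "id_list_tail"]),
    ("id_list_tail", [";"]),
]


def shift_reduce(tokens, productions=PRODUCTIONS, start="id_list"):
    """Return the (action, stack) trace, or raise SyntaxError."""
    stack, trace = [], []
    remaining = list(tokens)
    while True:
        # Right-hand sides that match the symbols on top of the stack.
        matches = [
            (lhs, rhs) for lhs, rhs in productions
            if len(rhs) <= len(stack) and stack[-len(rhs):] == rhs
        ]
        if matches:
            lhs, rhs = max(matches, key=lambda m: len(m[1]))
            del stack[-len(rhs):]   # pop the right-hand side ...
            stack.append(lhs)       # ... and push its left-hand side
            trace.append((f"reduce {lhs}", list(stack)))
        elif remaining:
            stack.append(remaining.pop(0))
            trace.append(("shift", list(stack)))
        else:
            break
    if stack != [start]:
        raise SyntaxError(f"could not reduce the input to {start}: {stack}")
    return trace


for action, stack in shift_reduce(["id", ",", "id", ",", "id", ";"]):
    print(f"{action:22} {' '.join(stack)}")
```

With this right-recursive grammar the trace shows every token being shifted before the first reduction, so the stack grows with the length of the list.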
LR Parsing vs. LL Parsing
• LL
  – A ‘top-down’ or ‘predictive’ parser
  – Predict needed productions based on the current left-most non-terminal in the tree and the current input token
  – The top-of-stack contains the left-most non-terminal
  – The stack contains a record of what the parser expects to see
• LR
  – A ‘bottom-up’ or shift-reduce parser
  – Shifts tokens onto the stack until it recognizes a right-hand side, then reduces those tokens to their left-hand side
  – The stack contains a record of what the parser has already seen
An appropriate LR Grammar
    id_list → id_list_prefix ;
    id_list_prefix → id_list_prefix , id
    id_list_prefix → id
This grammar can’t be parsed top-down!
Problems for LL grammars:
- left recursion, example above
- common prefixes, example:
    stmt → id := expr | id ( arg_list )
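Both problems can be spotted mechanically. The sketch below is a simplified check, assuming it is enough to look at the first symbol of each alternative: it finds immediate left recursion and alternatives that start with the same symbol, but it does not compute full FIRST sets or detect indirect left recursion. The function name `ll_problems` and the grammar layout are hypothetical.

```python
def ll_problems(grammar):
    """Report immediate left recursion and common prefixes.

    grammar maps each non-terminal to a list of alternatives (symbol lists).
    """
    problems = []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            # A production whose right-hand side starts with its own
            # left-hand side would make a top-down parser loop forever.
            if rhs and rhs[0] == lhs:
                problems.append(f"left recursion: {lhs} -> {' '.join(rhs)}")
        first_symbols = [rhs[0] for rhs in alternatives if rhs]
        for symbol in sorted(set(first_symbols)):
            # Two alternatives with the same first symbol cannot be told
            # apart with a single token of lookahead.
            if first_symbols.count(symbol) > 1:
                problems.append(f"common prefix '{symbol}' in {lhs}")
    return problems


LR_GRAMMAR = {
    "id_list": [["id_list_prefix", ";"]],
    "id_list_prefix": [["id_list_prefix", ",", "id"], ["id"]],
}
STMT_GRAMMAR = {
    "stmt": [["id", ":=", "expr"], ["id", "(", "arg_list", ")"]],
}

print(ll_problems(LR_GRAMMAR))
print(ll_problems(STMT_GRAMMAR))
```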
Hierarchy of Linear Parsers
• Basic containment relationship
  – Every LL grammar is also an LR grammar, so LR parsers handle a strictly larger class of grammars
  – Only a subset of all the CFGs can be parsed by LR parsers, and a smaller subset still by LL parsers
[Diagram: LL parsing contained in LR parsing, contained in CFGs]
Bigger Picture
• Chomsky Hierarchy of Grammars
  – Regular Grammar
  – Context Free Grammar
  – Context Sensitive Grammar
  – Unrestricted Grammar
Implementation of an LL Parser
• Two options:
  – A recursive descent parser (section 2.2.3)
    » For LL grammars only
  – Parse table and a driver (section 2.2.5)
    » LR parsers covered in section 2.2.6
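The second option, a parse table plus a driver, can be sketched as follows for the id_list grammar. The table was filled in by hand for this sketch; the names `PARSE_TABLE`, `ll1_parse`, and the end marker `$$` are assumptions, and the second production is assumed to be `id_list_tail -> , id id_list_tail`.

```python
# (non-terminal on top of the stack, lookahead token) -> right-hand side.
PARSE_TABLE = {
    ("id_list", "id"): ["id", "id_list_tail"],
    ("id_list_tail", ","): [",", "id", "id_list_tail"],
    ("id_list_tail", ";"): [";"],
}
NONTERMINALS = {"id_list", "id_list_tail"}
END = "$$"  # end-of-input marker (name is an assumption)


def ll1_parse(tokens, start="id_list"):
    """Driver loop: return True if tokens form a valid id_list."""
    stack = [END, start]        # the top of the stack is the end of the list
    stream = list(tokens) + [END]
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = stream[pos]
        if top in NONTERMINALS:
            rhs = PARSE_TABLE.get((top, lookahead))
            if rhs is None:
                return False    # no production predicted: syntax error
            # Push in reverse so the first symbol of rhs ends up on top.
            stack.extend(reversed(rhs))
        elif top == lookahead:
            pos += 1            # the expected terminal is there: consume it
        else:
            return False        # the expected terminal is missing
    return pos == len(stream)


print(ll1_parse(["id", ",", "id", ",", "id", ";"]))
print(ll1_parse(["id", ",", ";"]))
```

The stack holds exactly what the parser still expects to see, and the driver itself is grammar-independent: changing the language means changing only the table.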
Recursive Descent Parser Example
• Outline of recursive parser
  – This parser only verifies syntax
  – match is the interface to the scanner: it checks the current token and asks for the next one
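A minimal sketch of such a parser for the id_list grammar, with one method per non-terminal. Like the outline, it only verifies syntax. The class name `Parser`, the end marker `$$`, and the use of a pre-scanned token list in place of a live scanner are assumptions made for illustration; the second production is assumed to be `id_list_tail -> , id id_list_tail`.

```python
class Parser:
    """Recursive descent recognizer for the id_list grammar."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$$"]  # "$$" marks the end of input
        self.pos = 0

    @property
    def input_token(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Check the current token, then advance to the next one.
        if self.input_token != expected:
            raise SyntaxError(
                f"expected {expected!r}, found {self.input_token!r}")
        self.pos += 1

    def id_list(self):
        # id_list -> id id_list_tail
        self.match("id")
        self.id_list_tail()

    def id_list_tail(self):
        if self.input_token == ",":
            # id_list_tail -> , id id_list_tail
            self.match(",")
            self.match("id")
            self.id_list_tail()
        elif self.input_token == ";":
            # id_list_tail -> ;
            self.match(";")
        else:
            raise SyntaxError(f"unexpected {self.input_token!r}")

    def parse(self):
        self.id_list()
        self.match("$$")  # nothing may follow the list


Parser(["id", ",", "id", ",", "id", ";"]).parse()  # returns without error
```

Each branch in `id_list_tail` is chosen by looking only at the current token, which is what makes the grammar suitable for this style of parser.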
Recursive Descent Parser Example
A program that generates recursive descent parsers: JavaCC
Semantic Analysis
[Diagram labels: Source Code; Compiler or Interpreter (Translation, Execution); Target Code]
• Specifying the meaning of a programming language
  – Attribute Grammars