Compiler 2nd Phase: Syntax Analysis (Unit-2: Syntax Analysis)


Introduction
- Syntax analyzer = parser.
- It checks the syntax of the language.
- It generates either a parse tree or a syntax error.
- e.g. a = b + 10
  [Figure: parse tree for a = b + 10, with = at the root and children a and +; the + node has children b and 10]

Role of Parser
[Figure: the source program enters the lexical analyzer; the parser sends a demand for tokens to the lexical analyzer, which supplies tokens in return; the parser generates the parse tree; the error handler and the symbol table are attached to these phases]

Parsers (cont.)
We categorize the parsers into two groups:
- Top-down parser: the parse tree is created top to bottom, starting from the root.
- Bottom-up parser: the parse tree is created bottom to top, starting from the leaves.

Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).

Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars:
- LL for top-down parsing
- LR for bottom-up parsing

Basic issues in compiler
Two main issues:
1. Specification of syntax
2. Representation of the input after parsing

The most critical issue is the parsing algorithm.

Context-Free Grammars
The inherently recursive structures of a programming language are defined by a context-free grammar.

In a context-free grammar, we have:
- A finite set of terminals (in our case, this will be the set of tokens).
- A finite set of non-terminals (syntactic variables).
- A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string).
- A start symbol (one of the non-terminal symbols).

Example:
E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id

Derivations
E ⇒ E+E

- E+E derives from E: we can replace E by E+E.
- To be able to do this, we must have the production rule E → E+E in our grammar.

E ⇒ E+E ⇒ id+E ⇒ id+id

A sequence of replacements of non-terminal symbols is called a derivation; the sequence above is a derivation of id+id from E.

In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.

α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn)

⇒  : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
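The derivation step αAβ ⇒ αγβ can be tried out in a few lines of Python. This is only a minimal sketch: the dictionary layout of the grammar and the names GRAMMAR, derive_step and derive_sequence are assumptions made for illustration, not part of the slides.

```python
# Hypothetical representation: each non-terminal maps to its list of
# right-hand sides, and every right-hand side is a list of symbols.
GRAMMAR = {
    "E": [
        ["E", "+", "E"], ["E", "-", "E"], ["E", "*", "E"], ["E", "/", "E"],
        ["-", "E"], ["(", "E", ")"], ["id"],
    ],
}


def derive_step(form, position, rhs, grammar=GRAMMAR):
    """Rewrite the non-terminal at `position` of the sentential form by `rhs`."""
    symbol = form[position]
    if symbol not in grammar:
        raise ValueError(f"{symbol!r} is not a non-terminal")
    if rhs not in grammar[symbol]:
        raise ValueError(f"no production {symbol} -> {' '.join(rhs)}")
    # alpha A beta  =>  alpha gamma beta
    return form[:position] + rhs + form[position + 1:]


def derive_sequence(start, steps, grammar=GRAMMAR):
    """Apply a list of (position, rhs) steps and return every sentential form."""
    forms = [[start]]
    for position, rhs in steps:
        forms.append(derive_step(forms[-1], position, rhs, grammar))
    return forms


forms = derive_sequence("E", [(0, ["E", "+", "E"]), (0, ["id"]), (2, ["id"])])
print(" => ".join("".join(form) for form in forms))  # E => E+E => id+E => id+id
```

Each step is rejected unless the grammar really contains the production being applied, which mirrors the condition stated for a derivation step.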

CFG - Terminology
- L(G) is the language of G (the language generated by G), which is a set of sentences.
- A sentence of L(G) is a string of terminal symbols of G.
- If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G.
- If G is a context-free grammar, L(G) is a context-free language.
- Two grammars are equivalent if they produce the same language.
- For S ⇒* α:
  - If α contains non-terminals, it is called a sentential form of G.
  - If α does not contain non-terminals, it is called a sentence of G.

Derivation Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

- At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement.
- If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation.
- If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

Left-Most and Right-Most Derivations
Left-most derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

Right-most derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

- We will see that top-down parsers try to find the left-most derivation of the given source program.
- We will see that bottom-up parsers try to find the right-most derivation of the given source program in reverse order.
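The difference between the two derivation orders can be sketched as follows. The function name derive, the NONTERMINALS set and the list-of-symbols encoding are assumptions of this sketch, and it does not check that each right-hand side really belongs to the grammar.

```python
NONTERMINALS = {"E"}  # assumption: E is the only non-terminal of the example


def derive(start, rhs_choices, leftmost=True):
    """Apply the right-hand sides in order, always rewriting the left-most
    (or, if leftmost is False, the right-most) non-terminal."""
    form = [start]
    forms = ["".join(form)]
    for rhs in rhs_choices:
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + rhs + form[i + 1:]
        forms.append("".join(form))
    return forms


# The same sequence of productions, applied in the two orders.
choices = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
lm = derive("E", choices, leftmost=True)
rm = derive("E", choices, leftmost=False)
print(" => ".join(lm))  # E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
print(" => ".join(rm))  # E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
```

Both runs end in the same sentence; only the order in which the two E symbols inside the parentheses are replaced differs.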

Parse Tree
- Inner nodes of a parse tree are non-terminal symbols.
- The leaves of a parse tree are terminal symbols.
- A parse tree can be seen as a graphical representation of a derivation.

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
[Figure: the parse tree for -(id+id), grown one derivation step at a time]

Ambiguity
A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar.

E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: the two parse trees for id+id*id, one with + at the root and one with * at the root]

Ambiguity (cont.)
- For most parsers, the grammar must be unambiguous.
- Unambiguous grammar: a unique parse tree is selected for each sentence.
- We should eliminate the ambiguity in the grammar during the design phase of the compiler.
- An unambiguous grammar should be written to eliminate the ambiguity.
- We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) and disambiguate the grammar to restrict it to this choice.

Ambiguity (cont.)
stmt → if expr then stmt
     | if expr then stmt else stmt
     | otherstmts

if E1 then if E2 then S1 else S2

[Figure: two parse trees for this statement.
Tree 1: the root stmt is "if expr then stmt else stmt" with expr = E1 and else-branch S2; its then-branch is "if expr then stmt" with E2 and S1.
Tree 2: the root stmt is "if expr then stmt" with expr = E1; its then-branch is "if expr then stmt else stmt" with E2, S1 and S2.]
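One way to observe ambiguity is to count the parse trees of a sentence. The sketch below does this for the restricted grammar E → E+E | E*E | id (parentheses and the other operators are left out, which is an assumption of this sketch); the function name count_parse_trees is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways in which E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A result greater than 1 for any sentence shows that the grammar is ambiguous; a count of 1 for one sentence does not prove the opposite.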

Ambiguity (cont.)
- We prefer the second parse tree (else matches with the closest if).
- So, we have to disambiguate our grammar to reflect this choice.
- The unambiguous grammar will be:

stmt → matchedstmt | unmatchedstmt

matchedstmt → if expr then matchedstmt else matchedstmt
            | otherstmts

unmatchedstmt → if expr then stmt
              | if expr then matchedstmt else unmatchedstmt

Ambiguity -- Operator Precedence
Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules.

E → E+E | E*E | E^E | id | (E)

Disambiguate the grammar using the precedence, from highest to lowest:
  ^ (right to left)
  * (left to right)
  + (left to right)

E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)

Left Recursion
- A grammar is left-recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α.
- Top-down parsing techniques cannot handle left-recursive grammars.
- So, we have to convert our left-recursive grammar into an equivalent grammar which is not left-recursive.
- The left-recursion may appear in a single step of the derivation (immediate left-recursion), or may appear in more than one step of the derivation.

Immediate Left-Recursion
A → Aα | β        where β does not start with A

Eliminate immediate left recursion:
A → βA'
A' → αA' | ε      an equivalent grammar

In general,
A → Aα1 | ... | Aαm | β1 | ... | βn      where β1 ... βn do not start with A

Eliminate immediate left recursion:
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε      an equivalent grammar

Immediate Left-Recursion -- Example
E → E+T | T
T → T*F | F
F → id | (E)

Eliminate immediate left recursion:
E → T E'
E' → +T E' | ε
T → F T'
T' → *F T' | ε
F → id | (E)

Left-Recursion -- Problem
- A grammar may not be immediately left-recursive and yet still be left-recursive.
- By just eliminating the immediate left-recursion, we may not get a grammar which is not left-recursive.

S → Aa | b
A → Sc | d
This grammar is not immediately left-recursive, but it is still left-recursive.

S ⇒ Aa ⇒ Sca   or
A ⇒ Sc ⇒ Aac   causes a left-recursion

So, we have to eliminate all left-recursions from our grammar.

Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production Ai → Ajγ
        by Ai → α1γ | ... | αkγ
        where Aj → α1 | ... | αk
      }
    - eliminate immediate left-recursions among the Ai productions
  }
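The algorithm above can be sketched in Python. The encoding is an assumption of this sketch: a grammar is a dictionary from non-terminal to a list of alternatives, each alternative is a list of symbols, and ε is the empty list. The function names are hypothetical, and the sketch assumes the input grammar has no cycles and no ε-productions.

```python
def eliminate_immediate(nt, alts):
    """Turn A -> A a1 | ... | b1 | ... into A -> b A' and A' -> a A' | epsilon."""
    recursive = [alt[1:] for alt in alts if alt[:1] == [nt]]  # the alpha parts
    others = [alt for alt in alts if alt[:1] != [nt]]         # the beta parts
    if not recursive:
        return {nt: alts}
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


def eliminate_left_recursion(grammar, order):
    """Process the non-terminals in `order` (A1 ... An) as in the algorithm."""
    grammar = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    result = {}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in grammar[ai]:
                if alt[:1] == [aj]:
                    # Replace Ai -> Aj gamma by Ai -> alpha gamma for each Aj -> alpha.
                    expanded.extend(alpha + alt[1:] for alpha in grammar[aj])
                else:
                    expanded.append(alt)
            grammar[ai] = expanded
        split = eliminate_immediate(ai, grammar[ai])
        grammar.update(split)
        result.update(split)
    return result


def show(grammar):
    """Render each non-terminal's alternatives as text, writing epsilon for []."""
    return {nt: " | ".join(" ".join(alt) or "ε" for alt in alts)
            for nt, alts in grammar.items()}


example = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["f"]]}
for nt, rhs in show(eliminate_left_recursion(example, ["S", "A"])).items():
    print(nt, "->", rhs)
```

Passing a different order, for example ["A", "S"], gives a different but equivalent grammar, because the order decides which non-terminals are substituted into which.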

Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: S, A

for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.

for A:
- Replace A → Sd with A → Aad | bd
  So, we will have A → Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
  A → bdA' | fA'
  A' → cA' | adA' | ε

So, the resulting equivalent grammar which is not left-recursive is:
S → Aa | b
A → bdA' | fA'
A' → cA' | adA' | ε

Eliminate Left-Recursion -- Example2
S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: A, S

for A:
- we do not enter the inner loop.
- Eliminate the immediate left-recursion in A
  A → SdA' | fA'
  A' → cA' | ε

for S:
- Replace S → Aa with S → SdA'a | fA'a
  So, we will have S → SdA'a | fA'a | b
- Eliminate the immediate left-recursion in S
  S → fA'aS' | bS'
  S' → dA'aS' | ε

So, the resulting equivalent grammar which is not left-recursive is:
S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε

Left-Factoring
A predictive parser (a top-down parser without backtracking) insists that the grammar must be left-factored.

grammar → a new equivalent grammar suitable for predictive parsing

stmt → if expr then stmt else stmt
     | if expr then stmt

When we see if, we cannot know which production rule to choose to rewrite stmt in the derivation.

Left-Factoring (cont.)
In general,

A → αβ1 | αβ2      where α is non-empty and the first symbols of β1 and β2 (if they have one) are different.

When processing α, we cannot know whether to expand A to αβ1 or A to αβ2.

But if we rewrite the grammar as follows:
A → αA'
A' → β1 | β2
we can immediately expand A to αA'.

Left-Factoring -- Algorithm
For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, say

A → αβ1 | ... | αβn | γ1 | ... | γm

convert it into

A → αA' | γ1 | ... | γm
A' → β1 | ... | βn
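The conversion can be sketched in Python as below. The encoding is an assumption of this sketch: alternatives are lists of symbols, ε is the empty list, and new non-terminals are named by appending primes. The names common_prefix, fresh_name and left_factor are hypothetical. Alternatives are grouped by their first symbol, and the longest prefix shared by the whole group is factored out; newly created non-terminals are factored again until no two alternatives start with the same symbol.

```python
from collections import defaultdict


def common_prefix(alternatives):
    """Longest sequence of symbols shared at the start of all alternatives."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def fresh_name(base, grammar):
    """Append primes to `base` until the name is unused in the grammar."""
    name = base + "'"
    while name in grammar:
        name += "'"
    return name


def left_factor(grammar):
    grammar = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    work = list(grammar)
    while work:
        nt = work.pop(0)
        groups = defaultdict(list)
        for alt in grammar[nt]:
            groups[alt[0] if alt else None].append(alt)
        new_alts = []
        for first, group in groups.items():
            if first is None or len(group) == 1:
                new_alts += group          # nothing to factor for this symbol
                continue
            alpha = common_prefix(group)
            new_nt = fresh_name(nt, grammar)
            grammar[new_nt] = [alt[len(alpha):] for alt in group]
            new_alts.append(alpha + [new_nt])
            work.append(new_nt)            # the new rules may need factoring too
        grammar[nt] = new_alts
    return grammar


g1 = left_factor({"A": [list("abB"), list("aB"), list("cdg"), list("cdeB"), list("cdfB")]})
for nt, alts in g1.items():
    print(nt, "->", " | ".join("".join(alt) or "ε" for alt in alts))
```

For the grammar A → abB | aB | cdg | cdeB | cdfB this prints A -> aA' | cdA'', A' -> bB | B and A'' -> g | eB | fB.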

Left-Factoring -- Example1
A → abB | aB | cdg | cdeB | cdfB

A → aA' | cdg | cdeB | cdfB
A' → bB | B

A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB

Left-Factoring -- Example2
A → ad | a | ab | abc | b

A → aA' | b
A' → d | ε | b | bc

A → aA' | b
A' → d | ε | bA''
A'' → ε | c

Parsing Techniques
[Figure: types of parser]
- Top-down parser: recursive descent, predictive
- Bottom-up parser: shift-reduce, operator precedence, LR parser (canonical, LALR)

Top-down parsing
- The parse tree is generated from top to bottom.
- The derivation ends when the required input string has been derived.
- The left-most derivation (LMD) matches this requirement.

Problems with Top-down parsing
- Backtracking
- Left recursion
- Left factoring
- Ambiguity

Recursive Descent parser
- It uses a collection of recursive procedures to parse the given input string.
- The R.H.S. of each production rule is converted directly into program code, symbol by symbol.
- The CFG is used to build the recursive routines.
- Each variable (non-terminal) becomes a procedure whose body is the R.H.S. of the corresponding production.

Basic steps for RD parser
1. If the current symbol of the R.H.S. is a variable (non-terminal), then a call to the procedure corresponding to that variable is made.
2. If the current symbol is a terminal, then it is matched with the lookahead from the input. The lookahead pointer is advanced when the symbol matches.
3. If a production rule has many alternatives, then all these alternatives are combined into a single procedure body.
4. The parser is activated by the procedure corresponding to the start symbol.
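The four steps can be illustrated with a small recursive descent parser for the left-recursion-free expression grammar E → T E', E' → +T E' | ε, T → F T', T' → *F T' | ε, F → id | (E). This is a minimal sketch: the class name, the method names, the token list format and the use of "$" as an end marker are assumptions made for illustration, and the parser only accepts or rejects the input without building a tree.

```python
class RecursiveDescentParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        """Step 2: compare a terminal with the lookahead and advance on success."""
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        """Step 4: start from the procedure of the start symbol."""
        self.E()
        self.match("$")

    def E(self):            # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | epsilon
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):            # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | epsilon
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F -> id | ( E ); step 3: both alternatives in one body
        if self.lookahead == "id":
            self.match("id")
        elif self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r}")


def accepts(tokens):
    """Return True if the token list is a sentence of the grammar."""
    try:
        RecursiveDescentParser(tokens).parse()
        return True
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))  # True
print(accepts(["id", "+"]))                   # False
```

Because the grammar has no left recursion and each alternative is chosen from a single lookahead token, no backtracking is needed here.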

Predictive Parsing -- LL(1) Parser
- Predictive parsing uses a table-driven parser.
- It is a top-down parser.
- It is also known as an LL(1) parser.

[Figure: a non-recursive predictive parser connected to the input buffer, the stack, the parsing table and the output]

LL(1) Parser
input buffer
- the string to be parsed. Its end is marked with a special symbol $.
output
- a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.
stack
- contains the grammar symbols.
- at the bottom of the stack, there is a special end marker symbol $.
- initially the stack contains only the symbol $ and the starting symbol S ($S is the initial stack).
- when the stack is emptied (i.e. only $ is left in the stack), the parsing is completed.
parsing table
- a two-dimensional array M[A,a].
- each row is a non-terminal symbol.
- each column is a terminal symbol or the special symbol $.
- each entry holds a production rule.

Construction of Predictive LL(1) Parser
Following steps:
1. Compute the FIRST and FOLLOW functions.
2. Construct the predictive parsing table using the FIRST and FOLLOW functions.
3. Parse the input string with the help of the predictive parsing table.

Terms required for the predictive parser
(1) FIRST function:
FIRST(α) is the set of terminals that appear as the first symbol of the strings derived from α.
Following are the rules:
- If a is a terminal symbol, FIRST(a) = {a}
- If there is a rule X → ε, then ε is in FIRST(X)
- For the rule A → X1 X2 ... Xk, FIRST(A) = FIRST(X1) + FIRST(X2) + ... + FIRST(Xk) where k Xj