Compiler Construction
Parsing
Rina Zviel-Girshin and Ohad Shacham
School of Computer Science
Tel-Aviv University

Administration
Please use the forum for questions
  https://forums.cs.tau.ac.il/viewforum.php?f=70
Don’t compile in the submission directory
Check whether your group appears in the list
  Send me an email if you can’t find a team
  Send me your team if you found one and didn’t send an email
  Please send Name, Id, nova Id, and leader

Complementary Class
November 26 – Schreiber 07
  9:00 – 10:00
  13:00 – 14:00
November 27 – Schreiber 07
  10:00 – 11:00
Does anyone plan to come on Friday?

IC compiler
[Figure: the compiler pipeline. IC program (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, symbol table, etc. → Intermediate Representation (IR) → Code Generation → x86 executable (exe)]

Parsing
Input:
  Sequence of tokens
  A context free grammar
Output:
  Abstract Syntax Tree
Decide whether program satisfies syntactic structure

Parsing
Context Free Grammars (CFG)
  Captures program structure (hierarchy)
  Employ formal theory results
  Automatically create “efficient” parsers
Grammar:
  S → if E then S else S
  S → print E
  E → num
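
From text to abstract syntax
program text: 5 + (7 * x)
  → Lexical Analyzer →
token stream: num + ( num * id )
  → Parser → valid / syntax error
Grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
[Figure: parse tree for the token stream. The root E expands to E + E; the left E is num(5), the right E is ( E ), whose inner E expands to E * E over num(7) and id(x).]
[Figure: abstract syntax tree. The root + has children Num(5) and *; the * node has children Num(7) and id(x).]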
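The first stage on this slide, turning program text into a token stream, can be sketched by hand in a few lines of Java. This is only an illustration: the class name `TokenizerDemo` and the token spellings `num(...)` and `id(...)` are assumptions chosen to match the figure, not part of the course code, which generates its scanner from a specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written lexer for the expression example on this slide.
class TokenizerDemo {
    // group 1: number, group 2: identifier, group 3: one-character operator
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(?:(\\d+)|([A-Za-z_]\\w*)|([+*()]))");

    static List<String> tokenize(String text) {
        String input = text.trim();
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("illegal input near offset " + pos);
            }
            if (m.group(1) != null) {
                tokens.add("num(" + m.group(1) + ")");
            } else if (m.group(2) != null) {
                tokens.add("id(" + m.group(2) + ")");
            } else {
                tokens.add(m.group(3));
            }
            pos = m.end();   // continue right after the token just matched
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [num(5), +, (, num(7), *, id(x), )]
        System.out.println(tokenize("5 + (7 * x)"));
    }
}
```

The parser only ever sees the token kinds (num, id, +, ...); the attached values (5, 7, x) are carried along so that they can later appear in the abstract syntax tree.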

From text to abstract syntax
token stream: num + ( num * id )
  → Parser → valid / syntax error
[Figure: the same grammar, parse tree and abstract syntax tree as on the previous slide, shown without the token values.]
Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run

Parsing terminology
Symbols (Hebrew: סימנים):
  terminals (tokens): + * ( ) id num
  non-terminals: E
Grammar rules (Hebrew: חוקי דקדוק):
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
Derivation (Hebrew: גזירה):
  E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Parse tree (Hebrew: עץ גזירה):
[Figure: the root E expands to E + E; the left E is 1, the right E expands to E * E over 2 and 3.]
Convention: the non-terminal appearing in the first grammar rule is defined to be the initial non-terminal
Each grammar rule is called a production; each step in a derivation applies one production
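
Ambiguity
Grammar rules:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
Leftmost derivation:
  E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Parse tree:
[Figure: the root E expands to E + E; the left E is 1, the right E expands to E * E over 2 and 3.]
Rightmost derivation:
  E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
Parse tree:
[Figure: the root E expands to E * E; the left E expands to E + E over 1 and 2, the right E is 3.]
Definition: a grammar is ambiguous (Hebrew: רב-משמעי) if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations). The two derivations above qualify because they build different trees, not merely because they expand non-terminals in a different order.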
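One way to see why ambiguity matters is to enumerate every parse tree the ambiguous grammar allows and evaluate each one. The following is a minimal sketch, assuming a hypothetical class `AmbiguityDemo` and restricting the grammar to E → E + E | E * E | num; it is not part of the course code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: evaluates every parse tree of a token list under the
// ambiguous grammar E -> E + E | E * E | num.
class AmbiguityDemo {

    // Returns one value per distinct parse tree of tokens[lo..hi).
    // Tokens are assumed to alternate: number, operator, number, ...
    static List<Integer> allValues(String[] tokens, int lo, int hi) {
        List<Integer> results = new ArrayList<>();
        if (hi - lo == 1) {                       // E -> num
            results.add(Integer.parseInt(tokens[lo]));
            return results;
        }
        // Pick each operator in turn as the root of the tree.
        for (int i = lo + 1; i < hi; i += 2) {
            for (int left : allValues(tokens, lo, i)) {
                for (int right : allValues(tokens, i + 1, hi)) {
                    results.add(tokens[i].equals("+") ? left + right : left * right);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String[] tokens = {"1", "+", "2", "*", "3"};
        // prints [7, 9]: one value for 1 + (2 * 3), one for (1 + 2) * 3
        System.out.println(allValues(tokens, 0, tokens.length));
    }
}
```

Two trees for the same input mean two possible meanings, which is why the grammar is rewritten or disambiguated before it is handed to a parser generator.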

Grammar rewriting
Ambiguous grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
Unambiguous grammar:
  E → E + T
  E → T
  T → T * F
  T → F
  F → id
  F → num
  F → ( E )
Derivation:
  E ⇒ E + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3
Parse tree:
[Figure: the root E expands to E + T. The left E derives T, then F, then 1. The right T expands to T * F, where the inner T derives F, then 2, and the F derives 3.]
Note the difference between a language and a grammar:
  A grammar represents a language.
  A language can be represented by many grammars.
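The layering of E, T and F is what encodes precedence: * is handled one level deeper than +, so it binds tighter. A small hand-written evaluator makes this visible. This is a sketch under stated assumptions: the class name `LayeredGrammarDemo` is hypothetical, the left-recursive rules E → E + T and T → T * F are replaced by loops (a standard trick for hand-written top-down code), and F → id is omitted because identifiers have no value here.

```java
// Hypothetical evaluator for E -> E + T | T, T -> T * F | F, F -> num | ( E ).
class LayeredGrammarDemo {
    private final String[] tokens;
    private int pos = 0;

    LayeredGrammarDemo(String... tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.length ? tokens[pos] : "$"; }

    // E -> T { + T }   (loop instead of left recursion, still left-associative)
    int parseE() {
        int value = parseT();
        while (peek().equals("+")) { pos++; value += parseT(); }
        return value;
    }

    // T -> F { * F }
    int parseT() {
        int value = parseF();
        while (peek().equals("*")) { pos++; value *= parseF(); }
        return value;
    }

    // F -> num | ( E )
    int parseF() {
        if (peek().equals("(")) {
            pos++;
            int value = parseE();
            if (!peek().equals(")")) throw new IllegalStateException("expected )");
            pos++;
            return value;
        }
        if (pos >= tokens.length) throw new IllegalStateException("unexpected end of input");
        return Integer.parseInt(tokens[pos++]);
    }

    static int eval(String... tokens) {
        LayeredGrammarDemo p = new LayeredGrammarDemo(tokens);
        int value = p.parseE();
        if (p.pos != tokens.length) throw new IllegalStateException("unexpected token " + p.peek());
        return value;
    }

    public static void main(String[] args) {
        System.out.println(eval("1", "+", "2", "*", "3"));            // prints 7
        System.out.println(eval("(", "1", "+", "2", ")", "*", "3"));  // prints 9
    }
}
```

Unlike the ambiguous grammar, there is exactly one way to read 1 + 2 * 3 here, and parentheses are the only way to get the other grouping.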

Parsing methods – Top Down
Starts with the start symbol
Tries to transform it to the input
Grammar:
  S → if E then S else S
  S → begin S L
  S → print E
  L → end
  L → ; S L
  E → num
Input: if 5 then print 8 else …
  Token : rule                          Sentential form
                                        S
  if    : S → if E then S else S        if E then S else S
  5     : E → num                       if 5 then S else S
  print : S → print E                   if 5 then print E else S
  …
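The table above can be turned into code almost mechanically: one method per non-terminal, and the next token decides which rule to apply. The sketch below assumes a hypothetical class `TopDownDemo` that works on an array of token strings; it only checks syntax and builds no tree.

```java
// Hypothetical predictive (recursive-descent) recognizer for the grammar
// S -> if E then S else S | begin S L | print E,  L -> end | ; S L,  E -> num.
class TopDownDemo {
    private final String[] tokens;
    private int pos = 0;

    TopDownDemo(String... tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.length ? tokens[pos] : "$"; }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("expected " + t + " but found " + peek());
        }
        pos++;
    }

    void parseS() {
        switch (peek()) {                 // the next token selects the rule
            case "if":
                expect("if"); parseE(); expect("then"); parseS(); expect("else"); parseS();
                break;
            case "begin":
                expect("begin"); parseS(); parseL();
                break;
            case "print":
                expect("print"); parseE();
                break;
            default:
                throw new IllegalStateException("unexpected token " + peek());
        }
    }

    void parseL() {
        switch (peek()) {
            case "end":
                expect("end");
                break;
            case ";":
                expect(";"); parseS(); parseL();
                break;
            default:
                throw new IllegalStateException("unexpected token " + peek());
        }
    }

    void parseE() {                       // E -> num
        if (!peek().matches("[0-9]+")) {
            throw new IllegalStateException("expected num but found " + peek());
        }
        pos++;
    }

    // True if the whole token sequence derives from S.
    static boolean accepts(String... tokens) {
        TopDownDemo p = new TopDownDemo(tokens);
        try {
            p.parseS();
            return p.pos == tokens.length;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("if", "5", "then", "print", "8", "else", "print", "9")); // true
        System.out.println(accepts("if", "5", "print", "8"));                                // false
    }
}
```

This works here because every rule of a non-terminal starts with a different token, so a single token of lookahead is enough to choose.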

Parsing methods – Bottom Up
Starts with the input
Attempts to rewrite it to the start symbol
Widely used in practice
  LR(0), SLR(1), LR(1), LALR(1)
We will focus only on the theory of LR(0)
JavaCup implements LALR(1)
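
Bottom Up – parsing
Grammar:
  E → E + ( E )
  E → i
Input: 1 + (2) + (3)
Reductions, from the input up to the start symbol:
  1 + (2) + (3)
  E + (2) + (3)
  E + (E) + (3)
  E + (3)
  E + (E)
  E
[Figure: the parse tree grows upward from the leaves 1 + ( 2 ) + ( 3 ); each reduction adds one E node above the symbols it replaces.]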
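The reduction sequence on this slide can be reproduced with a very small shift-reduce loop. This is a sketch, not an LR(0) parser: the hypothetical class `ShiftReduceDemo` has no state machine or parse table and simply inspects the top of the stack, which happens to be sufficient for this particular grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical shift-reduce recognizer for E -> E + ( E ) | i,
// where any sequence of digits counts as the token i.
class ShiftReduceDemo {

    // Applies one rule to the top of the stack if its right-hand side is there.
    private static boolean reduce(List<String> stack) {
        int n = stack.size();
        if (n >= 1 && stack.get(n - 1).matches("[0-9]+")) {          // E -> i
            stack.set(n - 1, "E");
            return true;
        }
        if (n >= 5 && String.join(" ", stack.subList(n - 5, n)).equals("E + ( E )")) {
            stack.subList(n - 5, n).clear();                          // E -> E + ( E )
            stack.add("E");
            return true;
        }
        return false;
    }

    // Returns the sentential form (stack followed by unread input) after each reduction.
    static List<String> reductions(String... tokens) {
        List<String> stack = new ArrayList<>();
        List<String> forms = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            stack.add(tokens[i]);                                     // shift
            while (reduce(stack)) {                                   // reduce while possible
                List<String> form = new ArrayList<>(stack);
                form.addAll(Arrays.asList(tokens).subList(i + 1, tokens.length));
                forms.add(String.join(" ", form));
            }
        }
        return forms;
    }

    // The input is accepted if the last reduction leaves only the start symbol.
    static boolean accepts(String... tokens) {
        List<String> forms = reductions(tokens);
        return !forms.isEmpty() && forms.get(forms.size() - 1).equals("E");
    }

    public static void main(String[] args) {
        for (String form : reductions("1", "+", "(", "2", ")", "+", "(", "3", ")")) {
            System.out.println(form);
        }
    }
}
```

Running `main` prints the five forms from `E + ( 2 ) + ( 3 )` down to `E`, matching the sequence on the slide.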

Bottom Up - problems
Ambiguity
  E = E + E
  E = i
1 + 2 + 3 -> (1 + 2) + 3 ?
1 + 2 + 3 -> 1 + (2 + 3) ?

Cup
Constructor of Useful Parsers
Automatic LALR(1) parser generator
  Input: cup spec file
  Output: Syntax analyzer in Java
[Figure: Parser spec → JavaCup → .java → javac → Parser; at run time, tokens → Parser → AST]

Expression calculator

  terminal Integer NUMBER;
  terminal PLUS, MINUS, MULT, DIV;
  terminal LPAREN, RPAREN;
  non terminal Integer expr;

  expr ::= expr PLUS expr
         | expr MINUS expr
         | expr MULT expr
         | expr DIV expr
         | MINUS expr
         | LPAREN expr RPAREN
         | NUMBER
         ;

Is 2+3+4+5 a valid expression?
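
Ambiguities
[Figure: two parse trees for a * b + c, one grouping (a * b) + c with + at the root, the other grouping a * (b + c) with * at the root.]
[Figure: two parse trees for a + b + c, one grouping (a + b) + c, the other grouping a + (b + c).]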

Expression calculator

  terminal Integer NUMBER;
  terminal PLUS,MINUS,MULT,DIV;
  terminal LPAREN, RPAREN;
  terminal UMINUS;
  non terminal Integer expr;

  precedence left PLUS, MINUS;
  precedence left DIV, MULT;
  precedence left UMINUS;

  expr ::= expr PLUS expr
         | expr MINUS expr
         | expr MULT expr
         | expr DIV expr
         | MINUS expr %prec UMINUS
         | LPAREN expr RPAREN
         | NUMBER
         ;

Increasing precedence: the precedence declarations are listed from lowest to highest
Contextual precedence: %prec UMINUS

Disambiguation
Each terminal is assigned a precedence
  By default all terminals have lowest precedence
  User can assign his own precedence
CUP assigns each production a precedence
  Precedence of last terminal in production
    expr MINUS expr
  User specified contextual precedence
    MINUS expr %prec UMINUS

Disambiguation
On a shift/reduce conflict, CUP resolves the ambiguity by comparing the precedence of the terminal with the precedence of the production, and decides whether to shift or reduce
In case of equal precedences, left/right help resolve conflicts
  left means reduce
  right means shift
More information on precedence declarations in CUP’s manual
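The decision rule described on these two slides fits in a few lines. The sketch below is an illustration only: `ConflictResolver`, its enums and the integer precedence levels are hypothetical names, not CUP's internal API. The `NONASSOC` case reflects CUP's `precedence nonassoc` declaration, for which equal precedence is reported as an error.

```java
// Hypothetical model of precedence-based shift/reduce conflict resolution.
class ConflictResolver {
    enum Assoc { LEFT, RIGHT, NONASSOC }
    enum Action { SHIFT, REDUCE, ERROR }

    // productionPrec: precedence of the production that could be reduced
    // terminalPrec / terminalAssoc: precedence and associativity of the lookahead terminal
    static Action resolve(int productionPrec, int terminalPrec, Assoc terminalAssoc) {
        if (productionPrec > terminalPrec) return Action.REDUCE;  // finish the tighter operator first
        if (productionPrec < terminalPrec) return Action.SHIFT;   // the lookahead binds tighter
        switch (terminalAssoc) {                                  // equal precedence
            case LEFT:  return Action.REDUCE;
            case RIGHT: return Action.SHIFT;
            default:    return Action.ERROR;
        }
    }

    public static void main(String[] args) {
        int plus = 1, mult = 2;   // assumed levels: PLUS declared before MULT
        System.out.println(resolve(plus, mult, Assoc.LEFT));  // a + b . * c  -> SHIFT
        System.out.println(resolve(mult, plus, Assoc.LEFT));  // a * b . + c  -> REDUCE
        System.out.println(resolve(plus, plus, Assoc.LEFT));  // a + b . + c  -> REDUCE
    }
}
```

Reducing on equal precedence groups the operands already on the stack, which is exactly left associativity; shifting postpones the grouping and yields right associativity.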
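
Resolving ambiguity
a + b + c
  precedence left PLUS
  [Figure: the two candidate trees, (a + b) + c and a + (b + c); left associativity selects (a + b) + c.]

Resolving ambiguity
a * b + c
  precedence left PLUS
  precedence left MULT
  [Figure: the two candidate trees, (a * b) + c and a * (b + c); the higher precedence of MULT selects (a * b) + c.]

Resolving ambiguity
a + b * c
  precedence left PLUS
  precedence left MULT
  [Figure: the two candidate trees, a + (b * c) and (a + b) * c; the higher precedence of MULT selects a + (b * c).]

Resolving ambiguity
- a * b
  precedence left PLUS
  precedence left MULT
  MINUS expr %prec UMINUS
  [Figure: the two candidate trees, -(a * b) and (-a) * b; the contextual precedence UMINUS selects (-a) * b.]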

Resolving ambiguity

  terminal Integer NUMBER;
  terminal PLUS,MINUS,MULT,DIV;
  terminal LPAREN, RPAREN;
  terminal UMINUS;

  precedence left PLUS, MINUS;
  precedence left DIV, MULT;
  precedence left UMINUS;

  expr ::= expr PLUS expr
         | expr MINUS expr
         | expr MULT expr
         | expr DIV expr
         | MINUS expr %prec UMINUS
         | LPAREN expr RPAREN
         | NUMBER
         ;

The rule MINUS expr has the precedence of UMINUS
UMINUS is never returned by the scanner (used only to define precedence)

More CUP directives
precedence nonassoc NEQ
  Non-associative operators: < > == != etc.
  1<2<3 identified as an error
  6 == 7 == 8 == 9
start non-terminal
  Specifies start non-terminal other than first non-terminal
  Can change to test parts of grammar
Getting internal representation
  Command line options:
    -dump_grammar
    -dump_states
    -dump_tables
    -dump

Scanner integration

  import java_cup.runtime.*;
  %%
  %cup
  %eofval{
    return new Symbol(sym.EOF);
  %eofval}
  NUMBER=[0-9]+
  %%
  <YYINITIAL>"+" { return new Symbol(sym.PLUS); }
  <YYINITIAL>"-" { return new Symbol(sym.MINUS); }
  <YYINITIAL>"*" { return new Symbol(sym.MULT); }
  <YYINITIAL>"/" { return new Symbol(sym.DIV); }
  <YYINITIAL>"(" { return new Symbol(sym.LPAREN); }
  <YYINITIAL>")" { return new Symbol(sym.RPAREN); }
  <YYINITIAL>{NUMBER} {
    return new Symbol(sym.NUMBER, new Integer(yytext()));
  }
  <YYINITIAL>\n { }
  <YYINITIAL>. { }

Parser gets terminals from the scanner
The sym class is generated from the token declarations in the .cup file

Assigning meaning
So far, only validation
Add Java code implementing semantic actions

  expr ::= expr PLUS expr
         | expr MINUS expr
         | expr MULT expr
         | expr DIV expr
         | MINUS expr %prec UMINUS
         | LPAREN expr RPAREN
         | NUMBER
         ;

Assigning meaning
Symbol labels used to name variables
RESULT names the left-hand side symbol

  expr ::= expr:e1 PLUS expr:e2
           {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
         | expr:e1 MINUS expr:e2
           {: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
         | expr:e1 MULT expr:e2
           {: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
         | expr:e1 DIV expr:e2
           {: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
         | MINUS expr:e1
           {: RESULT = new Integer(0 - e1.intValue()); :} %prec UMINUS
         | LPAREN expr:e1 RPAREN
           {: RESULT = e1; :}
         | NUMBER:n
           {: RESULT = n; :}
         ;

Building an AST
More useful representation of syntax tree
  Less clutter
  Actual level of detail depends on your design
Basis for semantic analysis
Later annotated with various information
  Type information
  Computed values
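
Parse tree vs. AST
[Figure: parse tree for 1 + (2) + (3). Every leaf 1, +, (, 2, ), +, (, 3, ) appears, with an expr node for each reduction.]
[Figure: AST for the same input. The root + has a left child + (with children 1 and 2) and a right child 3; parentheses and expr nodes are gone.]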

AST construction
AST nodes constructed during parsing
  Stored in push-down stack
Bottom-up parser
  Grammar rules annotated with actions for AST construction
  When node is constructed all children available (already constructed)
  Node (RESULT) pushed on stack

AST construction
Input: 1 + (2) + (3)
Reductions:
  1 + (2) + (3)
  expr + (2) + (3)
  expr + (expr) + (3)
  expr + (3)
  expr + (expr)
  expr
[Figure: the AST built along the way. The root is a plus node whose e1 is another plus node (e1 = int_const with val = 1, e2 = int_const with val = 2) and whose e2 is int_const with val = 3.]

  expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :}
         | LPAREN expr:e RPAREN {: RESULT = e; :}
         | INT_CONST:i {: RESULT = new int_const(…, i); :}
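The node classes that such actions instantiate might look as follows. This is a sketch under assumptions: the class names `AstDemo`, `Expr`, `Plus` and `IntConst` are hypothetical stand-ins for the slide's `plus` and `int_const`, and the extra constructor argument elided on the slide is left out entirely. The `sample()` method builds by hand the tree the parser actions would build for 1 + (2) + (3).

```java
// Hypothetical AST node classes mirroring the plus / int_const nodes on the slide.
class AstDemo {
    static abstract class Expr {
        abstract int eval();
    }

    static final class IntConst extends Expr {
        final int val;
        IntConst(int val) { this.val = val; }
        @Override int eval() { return val; }
        @Override public String toString() { return "int_const(" + val + ")"; }
    }

    static final class Plus extends Expr {
        final Expr e1, e2;
        Plus(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
        @Override int eval() { return e1.eval() + e2.eval(); }
        @Override public String toString() { return "plus(" + e1 + ", " + e2 + ")"; }
    }

    // Children are created before their parent, just as in a bottom-up parse:
    // both operands already exist when the plus node is constructed.
    static Expr sample() {
        Expr one = new IntConst(1);
        Expr two = new IntConst(2);
        Expr inner = new Plus(one, two);
        Expr three = new IntConst(3);
        return new Plus(inner, three);
    }

    public static void main(String[] args) {
        Expr root = sample();
        System.out.println(root);         // plus(plus(int_const(1), int_const(2)), int_const(3))
        System.out.println(root.eval());  // 6
    }
}
```

Note that the parentheses from the input leave no trace in the tree; the rule for LPAREN expr RPAREN simply passes the inner node up as RESULT.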

Designing an AST

  terminal Integer NUMBER;
  terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI;
  terminal UMINUS;
  non terminal Integer expr;
  non terminal expr_list, expr_part;
  precedence left PLUS, MINUS;
  precedence left DIV, MULT;
  precedence left UMINUS;

  expr_list ::= expr_list expr_part
              | expr_part
              ;
  expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI
              ;
  expr ::= expr PLUS expr
         | expr MINUS expr
         | expr MULT expr
         | expr DIV expr
         | MINUS expr %prec UMINUS
         | LPAREN expr RPAREN
         | NUMBER
         ;

PA2
Write parser for IC
Write parser for libic.sig
Check syntax
  Emit either “Parsed [file] successfully!”
  or “Syntax error in [file]: [details]”
-print-ast option
  Prints one AST node per line

PA2 – step 1
Understand IC grammar in the manual
  Don’t touch the keyboard before understanding spec
Write a debug JavaCup spec for IC grammar
  A spec with “debug actions”: print-out debug messages to understand what’s going on
Try “debug grammar” on a number of test cases
Keep a copy of “debug grammar” spec around
Optional: perform error recovery
  Use JavaCup error token

PA2 – next week
Flesh out AST class hierarchy
  Don’t touch the keyboard before you understand the hierarchy
  Keep in mind that this is the basis for later stages
  Web-site contains an AST adapted with permission from Tovi Almozlino
Change CUP actions to construct AST nodes

Partial example of main

  import java.io.*;
  import IC.Lexer.Lexer;
  import IC.Parser.*;
  import IC.AST.*;

  public class Compiler {
      // "throws Exception" is added here so the example compiles:
      // new FileReader(...) and parser.parse() declare checked exceptions
      public static void main(String[] args) throws Exception {
          try {
              FileReader txtFile = new FileReader(args[0]);
              Lexer scanner = new Lexer(txtFile);
              Parser parser = new Parser(scanner);
              // parser.parse() returns Symbol, we use its value
              ProgAST root = (ProgAST) parser.parse().value;
              System.out.println("Parsed " + args[0] + " successfully!");
          } catch (SyntaxError e) {
              System.out.print("Syntax error in " + args[0] + ": " + e);
          }

          if (libraryFileSpecified) {
              ...
              try {
                  FileReader libicFile = new FileReader(libPath);
                  Lexer scanner = new Lexer(libicFile);
                  LibraryParser parser = new LibraryParser(scanner);
                  ClassAST root = (ClassAST) parser.parse().value;
                  System.out.println("Parsed " + libPath + " successfully!");
              } catch (SyntaxError e) {
                  System.out.print("Syntax error in " + libPath + ": " + e);
              }
          }
          ...