
Compiler Construction Parsing Rina Zviel-Girshin and Ohad Shacham School of Computer Science Tel-Aviv University



Page 2: Administration

Please use the forum for questions: https://forums.cs.tau.ac.il/viewforum.php?f=70

Don’t compile in the submission directory.

Check whether your group appears in the list.
  • Send me an email if you can’t find a team.
  • Send me your team if you found one and didn’t send an email.
  • Please send name, ID, nova ID, and team leader.

Page 3: Complementary Class

November 26 – Schreiber 07
  • 9:00 – 10:00
  • 13:00 – 14:00

November 27 – Schreiber 07
  • 10:00 – 11:00

Does anyone plan to come on Friday?

Page 4: IC compiler

[Figure: compiler pipeline. IC program (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Intermediate Representation (IR) → Code Generation → x86 executable (exe)]

Page 5: Parsing

Input:
  • Sequence of tokens
  • A context-free grammar

Output:
  • Abstract syntax tree

Decide whether the program satisfies the syntactic structure.

Page 6: Parsing

Context-Free Grammars (CFG)
  • Capture program structure (hierarchy)
  • Employ formal theory results
  • Automatically create “efficient” parsers

Grammar:
S → if E then S else S
S → print E
E → num

Page 7: From text to abstract syntax

Program text: 5 + (7 * x)
Lexical Analyzer → token stream: num + ( num * id )
Parser → valid / syntax error

Grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )

[Figure: parse tree for num(5) + ( num(7) * id(x) ): the root E expands to E + E, and the right E expands through ( E ) to E * E. Next to it, the abstract syntax tree: + with children Num(5) and *, where * has children Num(7) and id(x).]

Page 8: From text to abstract syntax

Token stream: num + ( num * id )
Parser → valid / syntax error

Grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )

[Figure: parse tree for num + ( num * id ) and the corresponding abstract syntax tree: + with children num and *, where * has children num and x.]

Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run.

Page 9: Parsing terminology

Symbols:
  • terminals (tokens): + * ( ) id num
  • non-terminals: E

Grammar rules:
E → id
E → num
E → E + E
E → E * E
E → ( E )

Derivation:
E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3

Parse tree:
[Figure: root E with children E, +, E. The left E derives 1; the right E has children E, *, E, which derive 2 and 3.]

Convention: the non-terminal appearing in the first derivation rule is defined to be the initial non-terminal.

Each step in a derivation applies one grammar rule (a production).

Page 10: Ambiguity

Grammar rules:
E → id
E → num
E → E + E
E → E * E
E → ( E )

Leftmost derivation:
E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3

Parse tree:
[Figure: root E with children E, +, E. The left E derives 1; the right E has children E, *, E, which derive 2 and 3.]

Rightmost derivation:
E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3

Parse tree:
[Figure: root E with children E, *, E. The left E has children E, +, E, which derive 1 and 2; the right E derives 3.]

Definition: a grammar is ambiguous if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations).
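
The definition can be checked mechanically on small inputs. The following is a minimal sketch, not part of the original slides, that counts the parse trees of a token string under a reduced version of the ambiguous grammar (only E → E + E, E → E * E and E → num; parentheses and id are left out). The class name AmbiguityDemo and the whitespace-separated token format are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Counts the distinct parse trees of a token list under the ambiguous grammar
//   E -> E + E | E * E | num
// (parentheses and id are omitted to keep the sketch short).
class AmbiguityDemo {

    // Number of parse trees that derive tokens[from..to) from E.
    static long countTrees(List<String> tokens, int from, int to) {
        if (to - from == 1) {
            // A single token can only be derived by E -> num.
            return isNumber(tokens.get(from)) ? 1 : 0;
        }
        long total = 0;
        // Try every operator position as the root production E -> E op E.
        for (int k = from + 1; k < to - 1; k++) {
            String t = tokens.get(k);
            if (t.equals("+") || t.equals("*")) {
                total += countTrees(tokens, from, k) * countTrees(tokens, k + 1, to);
            }
        }
        return total;
    }

    static boolean isNumber(String token) {
        return !token.isEmpty() && token.chars().allMatch(Character::isDigit);
    }

    // Convenience entry point: tokens are separated by whitespace.
    static long countTrees(String input) {
        List<String> tokens = Arrays.asList(input.trim().split("\\s+"));
        return countTrees(tokens, 0, tokens.size());
    }

    public static void main(String[] args) {
        System.out.println(countTrees("1 + 2 * 3"));      // two trees
        System.out.println(countTrees("1 + 2 + 3 + 4"));  // five trees
    }
}
```

A count above 1 for any input is enough to show that the grammar is ambiguous; a count of 0 means the string is not in the language of this reduced grammar.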

Page 11: Grammar rewriting

Ambiguous grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )

Unambiguous grammar:
E → E + T
E → T
T → T * F
T → F
F → id
F → num
F → ( E )

Derivation (the step marked ⇒* abbreviates E ⇒ T ⇒ F ⇒ 1):
E ⇒ E + T ⇒* 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3

Parse tree:
[Figure: root E with children E, +, T. The left E derives 1 through T and F; the right T has children T, *, F, where T derives 2 through F and F derives 3.]

Note the difference between a language and a grammar: a grammar represents a language, and a language can be represented by many grammars.
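
To see how the layered E/T/F grammar fixes precedence and associativity, here is a minimal hand-written evaluator that follows its structure. It is a sketch under stated assumptions: the class name LayeredGrammarEval is made up, the F → id alternative is omitted, whitespace is simply dropped, and the left-recursive rules E → E + T and T → T * F are written as loops, since a recursive-descent routine cannot call itself on a left-recursive rule.

```java
// Evaluates expressions following the layering of the unambiguous grammar
//   E -> E + T | T      T -> T * F | F      F -> num | ( E )
// Assumption: left recursion is expressed as loops (E -> T { + T }, T -> F { * F }).
class LayeredGrammarEval {
    private final String src;
    private int pos = 0;

    LayeredGrammarEval(String src) {
        this.src = src.replaceAll("\\s+", ""); // whitespace is dropped, not tokenized
    }

    static int eval(String text) {
        LayeredGrammarEval p = new LayeredGrammarEval(text);
        int value = p.parseE();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return value;
    }

    // E -> T { + T }: lowest precedence, left-associative
    private int parseE() {
        int value = parseT();
        while (peek() == '+') {
            pos++;
            value += parseT();
        }
        return value;
    }

    // T -> F { * F }: binds tighter than +
    private int parseT() {
        int value = parseF();
        while (peek() == '*') {
            pos++;
            value *= parseF();
        }
        return value;
    }

    // F -> num | ( E )
    private int parseF() {
        if (peek() == '(') {
            pos++;
            int value = parseE();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ')' at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("expected a number at position " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 * 3"));   // multiplication is grouped first
        System.out.println(eval("(1 + 2) * 3")); // parentheses override the layering
    }
}
```

Because * is only reachable through T, a product is always completed before it becomes an operand of +, which is exactly what the single parse tree on the slide expresses.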

Page 12: Parsing methods – Top Down

  • Starts with the start symbol
  • Tries to transform it to the input

Grammar:
S → if E then S else S
S → begin S L
S → print E
L → end
L → ; S L
E → num

Input: if 5 then print 8 else…

Token    Rule                      Sentential form
                                   S
if       S → if E then S else S    if E then S else S
5        E → num                   if 5 then S else S
print    S → print E               if 5 then print E else S
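
The table above can be turned into code almost line by line: each non-terminal becomes a method, and the next token selects the rule. The following is a minimal sketch of such a predictive recognizer for the slide's grammar. The class name TopDownRecognizer and the whitespace-separated token format are assumptions made for the example; it only answers valid / syntax error and builds no tree.

```java
import java.util.Arrays;
import java.util.List;

// Predictive (top-down) recognizer for the grammar
//   S -> if E then S else S | begin S L | print E
//   L -> end | ; S L
//   E -> num
// Assumption: tokens are separated by whitespace.
class TopDownRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    TopDownRecognizer(String input) {
        this.tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    // Returns true when the whole input can be derived from S.
    static boolean accepts(String input) {
        TopDownRecognizer p = new TopDownRecognizer(input);
        try {
            p.parseS();
            return p.pos == p.tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<eof>";
    }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("expected " + t + " but found " + peek());
        }
        pos++;
    }

    // The next token decides which S rule to expand.
    private void parseS() {
        switch (peek()) {
            case "if":
                expect("if"); parseE(); expect("then"); parseS(); expect("else"); parseS();
                break;
            case "begin":
                expect("begin"); parseS(); parseL();
                break;
            case "print":
                expect("print"); parseE();
                break;
            default:
                throw new IllegalStateException("no rule for S on token " + peek());
        }
    }

    private void parseL() {
        switch (peek()) {
            case "end":
                expect("end");
                break;
            case ";":
                expect(";"); parseS(); parseL();
                break;
            default:
                throw new IllegalStateException("no rule for L on token " + peek());
        }
    }

    // E -> num
    private void parseE() {
        String t = peek();
        if (t.isEmpty() || !t.chars().allMatch(Character::isDigit)) {
            throw new IllegalStateException("expected a number but found " + t);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(accepts("if 5 then print 8 else print 9"));
        System.out.println(accepts("if 5 print 8"));
    }
}
```

This works without backtracking because every rule of this grammar starts with a distinct token.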

Page 13: Parsing methods – Bottom Up

  • Starts with the input
  • Attempts to rewrite it to the start symbol

Widely used in practice
  • LR(0), SLR(1), LR(1), LALR(1)

We will focus only on the theory of LR(0).

JavaCup implements LALR(1).

Page 14: Bottom Up – parsing

Grammar:
E → E + (E)
E → i

Input: 1 + (2) + (3)

Reduction steps (each line rewrites the previous one using one rule):
1 + (2) + (3)
E + (2) + (3)
E + (E) + (3)
E + (3)
E + (E)
E

[Figure: the parse tree built bottom-up over the tokens 1 + ( 2 ) + ( 3 ).]

Page 15: Bottom Up – problems

Ambiguity

E → E + E
E → i

1 + 2 + 3 -> (1 + 2) + 3 ?
1 + 2 + 3 -> 1 + (2 + 3) ?

Page 16: Cup

Constructor of Useful Parsers
  • Automatic LALR(1) parser generator
  • Input: cup spec file
  • Output: syntax analyzer in Java

[Figure: Parser spec → JavaCup → .java → javac → Parser. The Parser consumes tokens and produces an AST.]

Page 17: Expression calculator

terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV;
terminal LPAREN, RPAREN;

non terminal Integer expr;

expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr
       | LPAREN expr RPAREN
       | NUMBER
       ;

Is 2+3+4+5 a valid expression?

Page 18: Ambiguities

a * b + c can be parsed as (a * b) + c or as a * (b + c).

a + b + c can be parsed as (a + b) + c or as a + (b + c).

[Figure: the two candidate trees for each expression.]

Page 19: Expression calculator

terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV;
terminal LPAREN, RPAREN;
terminal UMINUS;
non terminal Integer expr;

precedence left PLUS, MINUS;
precedence left DIV, MULT;
precedence left UMINUS;

expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr %prec UMINUS
       | LPAREN expr RPAREN
       | NUMBER
       ;

Increasing precedence: each precedence declaration binds tighter than the ones listed before it.

Contextual precedence: %prec UMINUS gives the MINUS expr production the precedence of UMINUS.

Page 20: Disambiguation

Each terminal is assigned a precedence
  • By default, all terminals have the lowest precedence
  • The user can assign their own precedence

CUP assigns each production a precedence
  • The precedence of the last terminal in the production, e.g. expr MINUS expr
  • Or a user-specified contextual precedence, e.g. MINUS expr %prec UMINUS

Page 21: Disambiguation

On a shift/reduce conflict, CUP resolves the ambiguity by comparing the precedence of the terminal with the precedence of the production, and decides whether to shift or reduce. If the production has the higher precedence, the parser reduces; if the terminal has the higher precedence, it shifts.

In case of equal precedences, left/right help resolve conflicts:
  • left means reduce
  • right means shift

More information on precedence declarations in CUP’s manual.
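
The shift/reduce decision described above can be tried out on a toy evaluator. The sketch below is not CUP's implementation; it is a small operator-precedence loop that applies the same rule whenever the stack holds "E op E" and the next token is another operator. The class name PrecedenceDemo, the declare method and the whitespace-separated token format are assumptions made for the example, and the input is assumed to be a well-formed sequence of numbers and binary operators.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy shift/reduce evaluator driven by a precedence table.
// Assumption: input is well-formed, e.g. "8 - 3 - 2", with + - * / only.
class PrecedenceDemo {
    enum Assoc { LEFT, RIGHT }

    private final Map<String, Integer> level = new HashMap<>();
    private final Map<String, Assoc> assoc = new HashMap<>();
    private int nextLevel = 1;

    // Mirrors one "precedence left/right ..." line; later calls bind tighter.
    PrecedenceDemo declare(Assoc a, String... ops) {
        for (String op : ops) {
            level.put(op, nextLevel);
            assoc.put(op, a);
        }
        nextLevel++;
        return this;
    }

    int eval(String input) {
        Deque<Integer> values = new ArrayDeque<>();
        Deque<String> ops = new ArrayDeque<>();
        for (String tok : input.trim().split("\\s+")) {
            if (level.containsKey(tok)) {
                // Conflict point: the stack holds "E op E" and tok is the lookahead.
                while (!ops.isEmpty() && shouldReduce(ops.peek(), tok)) {
                    reduce(values, ops);
                }
                ops.push(tok); // shift the operator
            } else {
                values.push(Integer.parseInt(tok)); // shift a number
            }
        }
        while (!ops.isEmpty()) {
            reduce(values, ops); // end of input: only reductions remain
        }
        return values.pop();
    }

    // Production precedence is that of its last terminal (the operator on the stack).
    private boolean shouldReduce(String onStack, String lookahead) {
        int production = level.get(onStack);
        int terminal = level.get(lookahead);
        if (production != terminal) {
            return production > terminal;
        }
        return assoc.get(lookahead) == Assoc.LEFT; // equal: left reduces, right shifts
    }

    private static void reduce(Deque<Integer> values, Deque<String> ops) {
        int right = values.pop();
        int left = values.pop();
        switch (ops.pop()) {
            case "+": values.push(left + right); break;
            case "-": values.push(left - right); break;
            case "*": values.push(left * right); break;
            case "/": values.push(left / right); break;
            default: throw new IllegalStateException("unknown operator");
        }
    }

    public static void main(String[] args) {
        PrecedenceDemo left = new PrecedenceDemo()
                .declare(Assoc.LEFT, "+", "-")
                .declare(Assoc.LEFT, "*", "/");
        System.out.println(left.eval("8 - 3 - 2")); // grouped as (8 - 3) - 2
        System.out.println(left.eval("2 + 3 * 4")); // * is shifted before + is reduced
    }
}
```

Declaring the first level with Assoc.RIGHT instead makes the same input "8 - 3 - 2" group as 8 - (3 - 2), which shows how a single associativity keyword changes the resulting tree.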

Page 22: Resolving ambiguity

a + b + c

[Figure: the two candidate trees, (a + b) + c and a + (b + c).]

precedence left PLUS

With PLUS declared left-associative, the parser reduces on the conflict and builds (a + b) + c.

Page 23: Resolving ambiguity

a * b + c

[Figure: the two candidate trees, (a * b) + c and a * (b + c).]

precedence left PLUS
precedence left MULT

MULT has higher precedence than PLUS, so the parser reduces a * b before shifting + and builds (a * b) + c.

Page 24: Resolving ambiguity

a + b * c

[Figure: the two candidate trees, (a + b) * c and a + (b * c).]

precedence left PLUS
precedence left MULT

MULT has higher precedence than PLUS, so the parser shifts * instead of reducing a + b and builds a + (b * c).

Page 25: Resolving ambiguity

- a * b

[Figure: the two candidate trees, -(a * b) and (-a) * b.]

precedence left PLUS
precedence left MULT
MINUS expr %prec UMINUS

With UMINUS declared at the highest precedence, as in the expression calculator spec, the parser reduces - a before shifting * and builds (-a) * b.

Page 26: Resolving ambiguity

terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV;
terminal LPAREN, RPAREN;
terminal UMINUS;

precedence left PLUS, MINUS;
precedence left DIV, MULT;
precedence left UMINUS;

expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr %prec UMINUS
       | LPAREN expr RPAREN
       | NUMBER
       ;

The MINUS expr rule has the precedence of UMINUS.

UMINUS is never returned by the scanner (it is used only to define precedence).

Page 27: More CUP directives

precedence nonassoc NEQ
  • Non-associative operators: < > == != etc.
  • 1<2<3 is identified as an error (likewise 6 == 7 == 8 == 9)

start with non-terminal
  • Specifies a start non-terminal other than the first non-terminal
  • Can be changed to test parts of the grammar

Getting the internal representation
  • Command line options: -dump_grammar, -dump_states, -dump_tables, -dump

Page 28: Scanner integration

import java_cup.runtime.*;
%%
%cup
%eofval{
  return new Symbol(sym.EOF);
%eofval}
NUMBER=[0-9]+
%%
<YYINITIAL>"+" { return new Symbol(sym.PLUS); }
<YYINITIAL>"-" { return new Symbol(sym.MINUS); }
<YYINITIAL>"*" { return new Symbol(sym.MULT); }
<YYINITIAL>"/" { return new Symbol(sym.DIV); }
<YYINITIAL>"(" { return new Symbol(sym.LPAREN); }
<YYINITIAL>")" { return new Symbol(sym.RPAREN); }
<YYINITIAL>{NUMBER} {
  return new Symbol(sym.NUMBER, new Integer(yytext()));
}
<YYINITIAL>\n { }
<YYINITIAL>. { }

The parser gets terminals from the scanner.

The sym class is generated from the token declarations in the .cup file.
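
For readers who have not used a scanner generator, the following hand-written Java sketch shows roughly what the rules of this scanner spec do: it turns a character string into the token kinds that the parser expects. It is an illustration only, not the code that the generator emits; the class MiniScanner, the Kind enum and the Token class are hypothetical stand-ins for the generated scanner, the generated sym constants and java_cup.runtime.Symbol.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for the generated scanner (illustration only).
class MiniScanner {
    // Hypothetical counterpart of the constants in the generated sym class.
    enum Kind { PLUS, MINUS, MULT, DIV, LPAREN, RPAREN, NUMBER, EOF }

    // Hypothetical counterpart of java_cup.runtime.Symbol: a kind plus an optional value.
    static final class Token {
        final Kind kind;
        final Integer value; // only set for NUMBER

        Token(Kind kind, Integer value) {
            this.kind = kind;
            this.value = value;
        }

        @Override
        public String toString() {
            return value == null ? kind.toString() : kind + "(" + value + ")";
        }
    }

    static List<Token> scan(String text) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                // NUMBER = [0-9]+ : take the longest run of digits
                int start = i;
                while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
                    i++;
                }
                out.add(new Token(Kind.NUMBER, Integer.valueOf(text.substring(start, i))));
                continue;
            }
            switch (c) {
                case '+': out.add(new Token(Kind.PLUS, null)); break;
                case '-': out.add(new Token(Kind.MINUS, null)); break;
                case '*': out.add(new Token(Kind.MULT, null)); break;
                case '/': out.add(new Token(Kind.DIV, null)); break;
                case '(': out.add(new Token(Kind.LPAREN, null)); break;
                case ')': out.add(new Token(Kind.RPAREN, null)); break;
                default: break; // like the "\n" and "." rules: skip anything else
            }
            i++;
        }
        out.add(new Token(Kind.EOF, null)); // like the %eofval block
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan("12+(3*4)"));
    }
}
```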

Page 29: Assigning meaning

So far, only validation.
Add Java code implementing semantic actions.

expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr %prec UMINUS
       | LPAREN expr RPAREN
       | NUMBER
       ;

Page 30: Assigning meaning

  • Symbol labels are used to name variables
  • RESULT names the left-hand side symbol

expr ::= expr:e1 PLUS expr:e2
         {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
       | expr:e1 MINUS expr:e2
         {: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
       | expr:e1 MULT expr:e2
         {: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
       | expr:e1 DIV expr:e2
         {: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
       | MINUS expr:e1
         {: RESULT = new Integer(0 - e1.intValue()); :} %prec UMINUS
       | LPAREN expr:e1 RPAREN
         {: RESULT = e1; :}
       | NUMBER:n
         {: RESULT = n; :}
       ;

Page 31: Building an AST

More useful representation of the syntax tree
  • Less clutter
  • Actual level of detail depends on your design

Basis for semantic analysis
  • Later annotated with various information
  • Type information
  • Computed values

Page 32: Parse tree vs. AST

[Figure: on one side, the parse tree for 1 + ( 2 ) + ( 3 ), with an expr node for every reduction and leaves for the parentheses. On the other side, the AST: + with children + (over 1 and 2) and 3.]

Page 33: AST construction

AST nodes are constructed during parsing
  • Stored in a push-down stack

Bottom-up parser
  • Grammar rules are annotated with actions for AST construction
  • When a node is constructed, all its children are available (already constructed)
  • The node (RESULT) is pushed on the stack

Page 34: AST construction

Input: 1 + (2) + (3)

Reduction steps:
1 + (2) + (3)
expr + (2) + (3)
expr + (expr) + (3)
expr + (3)
expr + (expr)
expr

Resulting AST:
plus
  e1: plus
        e1: int_const (val = 1)
        e2: int_const (val = 2)
  e2: int_const (val = 3)

expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :}
       | LPAREN expr:e RPAREN {: RESULT = e; :}
       | INT_CONST:i {: RESULT = new int_const(…, i); :}
       ;
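
The way the actions above assemble the tree on a stack can be imitated in plain Java. The following is a minimal sketch, not the code CUP generates: the classes Expr, IntConst and Plus are hypothetical counterparts of the slide's int_const and plus nodes (with a one-argument IntConst constructor, since the slide elides the other argument), and the explicit stack stands in for the parser's value stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Imitates how a bottom-up parser's actions build an AST on a stack.
class AstStackDemo {
    // Hypothetical node classes, named after the slide's int_const and plus.
    static abstract class Expr {
        abstract int eval();
    }

    static final class IntConst extends Expr {
        final int val;
        IntConst(int val) { this.val = val; }
        @Override int eval() { return val; }
        @Override public String toString() { return "int_const(" + val + ")"; }
    }

    static final class Plus extends Expr {
        final Expr e1;
        final Expr e2;
        Plus(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
        @Override int eval() { return e1.eval() + e2.eval(); }
        @Override public String toString() { return "plus(" + e1 + ", " + e2 + ")"; }
    }

    // Stand-in for the parser's value stack.
    private final Deque<Expr> stack = new ArrayDeque<>();

    // Corresponds to reducing expr ::= INT_CONST:i
    void reduceIntConst(int i) {
        stack.push(new IntConst(i));
    }

    // Corresponds to reducing expr ::= expr:e1 PLUS expr:e2.
    // Both children were built by earlier reductions and sit on the stack.
    void reducePlus() {
        Expr e2 = stack.pop();
        Expr e1 = stack.pop();
        stack.push(new Plus(e1, e2)); // RESULT is pushed back
    }

    Expr result() {
        return stack.peek();
    }

    // Replays the reductions for 1 + (2) + (3) in bottom-up order.
    static Expr buildExample() {
        AstStackDemo p = new AstStackDemo();
        p.reduceIntConst(1);
        p.reduceIntConst(2); // "( expr )" then passes the same node through
        p.reducePlus();
        p.reduceIntConst(3);
        p.reducePlus();
        return p.result();
    }

    public static void main(String[] args) {
        Expr root = buildExample();
        System.out.println(root);
        System.out.println(root.eval());
    }
}
```

The parenthesis rule has no node of its own, which is the "less clutter" point: the AST keeps the operators and constants and drops tokens that only guided the parse.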

Page 35: Designing an AST

terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV, LPAREN, RPAREN, SEMI;
terminal UMINUS;
non terminal Integer expr;
non terminal expr_list, expr_part;

precedence left PLUS, MINUS;
precedence left DIV, MULT;
precedence left UMINUS;

expr_list ::= expr_list expr_part
            | expr_part
            ;

expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI
            ;

expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr %prec UMINUS
       | LPAREN expr RPAREN
       | NUMBER
       ;

Page 36: PA2

  • Write a parser for IC
  • Write a parser for libic.sig
  • Check syntax
  • Emit either “Parsed [file] successfully!” or “Syntax error in [file]: [details]”
  • -print-ast option: prints one AST node per line
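
One common way to print one AST node per line is a pre-order walk that indents each node by its depth. The sketch below illustrates that idea only; the generic Node class, the labels and the two-space indentation are assumptions made for the example, and the exact output format required by the assignment is whatever its specification says.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Prints a tree one node per line, indented by depth (illustrative format).
class AstPrinterDemo {
    // Hypothetical generic node: a label plus any number of children.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Pre-order walk: a node's line comes before the lines of its children.
    static List<String> lines(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, 0, out);
        return out;
    }

    private static void collect(Node node, int depth, List<String> out) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  "); // two spaces per nesting level
        }
        out.add(sb.append(node.label).toString());
        for (Node child : node.children) {
            collect(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Node ast = new Node("plus",
                new Node("plus", new Node("int_const 1"), new Node("int_const 2")),
                new Node("int_const 3"));
        for (String line : lines(ast)) {
            System.out.println(line);
        }
    }
}
```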

Page 37: PA2 – step 1

Understand the IC grammar in the manual
  • Don’t touch the keyboard before understanding the spec

Write a debug JavaCup spec for the IC grammar
  • A spec with “debug actions”: print out debug messages to understand what’s going on

Try the “debug grammar” on a number of test cases

Keep a copy of the “debug grammar” spec around

Optional: perform error recovery
  • Use the JavaCup error token

Page 38: PA2 – next week

Flesh out the AST class hierarchy
  • Don’t touch the keyboard before you understand the hierarchy
  • Keep in mind that this is the basis for later stages
  • The web site contains an AST adapted with permission from Tovi Almozlino

Change the CUP actions to construct AST nodes

Page 39: Partial example of main

import java.io.*;
import IC.Lexer.Lexer;
import IC.Parser.*;
import IC.AST.*;

public class Compiler {
    public static void main(String[] args) {
        try {
            FileReader txtFile = new FileReader(args[0]);
            Lexer scanner = new Lexer(txtFile);
            Parser parser = new Parser(scanner);
            // parser.parse() returns Symbol, we use its value
            ProgAST root = (ProgAST) parser.parse().value;
            System.out.println("Parsed " + args[0] + " successfully!");
        } catch (SyntaxError e) {
            System.out.print("Syntax error in " + args[0] + ": " + e);
        }

        if (libraryFileSpecified) {
            ...
            try {
                FileReader libicFile = new FileReader(libPath);
                Lexer scanner = new Lexer(libicFile);
                LibraryParser parser = new LibraryParser(scanner);
                // the library signature file has its own parser and root node type
                ClassAST root = (ClassAST) parser.parse().value;
                System.out.println("Parsed " + libPath + " successfully!");
            } catch (SyntaxError e) {
                System.out.print("Syntax error in " + libPath + ": " + e);
            }
        }
        ...