Compiler Construction
Parsing I
Ran Shaham and Ohad Shacham
School of Computer Science
Tel-Aviv University


Administration

• Forum: https://forums.cs.tau.ac.il/viewforum.php?f=64

• Project teams
  – Send me an email if you can’t find a team
  – Send me your team if you found one and didn’t send an email
  – Check the Excel file on the website

• First PA is at: http://www.cs.tau.ac.il/research/ohad.shacham/wcc08/pa/pa1/pa1.pdf


Programming Assignment 1

• Implement a scanner for IC
• class Token
  – At least: line, id, value
  – Should extend java_cup.runtime.Symbol
  – Numeric token ids in sym.java (will later be generated by JavaCup)
• class Compiler
  – Testbed: calls the scanner to print the list of tokens
• [StateList] <<EOF>> { return appropriate symbol }


Programming Assignment 1 (cont.)

• class LexicalError
  – Caught by Compiler
• Assume:
  – Class identifiers start with a capital letter
  – Other identifiers start with a lowercase letter


sym.java

public class sym {
    public static final int EOF = 0;
    public static final int ID = 1;
    ...
}

• Defines symbol constant ids
• Used to communicate between the parser and the scanner
• The actual values don’t matter, but each token needs a unique value
• Will be generated by CUP in PA2


Token class

import java_cup.runtime.Symbol;

public class Token extends Symbol {
    public int getId() {...}
    public Object getValue() {...}
    public int getLine() {...}
    ...
}


JFlex directives to use

%cup (integrate with CUP)
%line (count lines)
%type Token (pass type Token)
%class Lexer (generated scanner class)


%cup

%cup sets the following:
• %implements java_cup.runtime.Scanner: the Lexer class implements java_cup.runtime.Scanner
• %function next_token: the method that returns the next token
• %type java_cup.runtime.Symbol: the type of the returned token


Structure

[Figure: JFlex generates Lexer.java from IC.lex; javac compiles Lexer.java together with sym.java, Token.java, LexicalError.java and Compiler.java into the lexical analyzer, which reads test.ic and outputs tokens.]


Directions

• Download Java
• Download JFlex
• Download JavaCup
• Put JFlex and JavaCup in the classpath
• Eclipse
  – Use ant build.xml
  – Import jflex and javacup
• Apache Ant


Directions (cont.)

• Use the skeleton from the website
• Read the assignment
• Use the forum


Tools

• Ant
  – Make environment
  – A build.xml is included in the skeleton
  – Download from: http://ant.apache.org
  – Use:
    ant – to compile
    ant scanner – to run JFlex


Tools

• JFlex
  – Lexical analyzer generator
  – Download from: http://jflex.de/
  – Manual: http://jflex.de/manual.pdf
  – Add $MyJFlex/lib/JFlex.jar to your classpath
  – Use:
    java JFlex.Main IC.lex
    ant scanner – for ant users


Tools

• Cup
  – Parser generator
  – Download from: http://www2.cs.tum.edu/projects/cup/
  – Manual: http://www2.cs.tum.edu/projects/cup/manual.html
  – Put java-cup-11a.jar and java-cup-11a-runtime.jar in your classpath
  – Use:
    java -jar java-cup-11a.jar <your file.cup>
    ant libparser – for ant users


IC compiler

[Figure: IC Program (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Intermediate Representation (IR) → Code Generation → x86 executable (exe)]


Parsing

• Input: sequence of tokens
• Output: abstract syntax tree
• Decide whether the program satisfies the syntactic structure


Parsing errors

• Error detection
  – Report the most relevant error message
  – Correct line number
  – Current vs. expected token
• Error recovery
  – Recover and continue to the next error
  – Heuristics for good recovery, to avoid many spurious errors
  – Search for a semicolon and ignore the statement
  – Ignore the next n errors
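A minimal sketch of the “search for a semicolon and ignore the statement” heuristic (panic-mode recovery), assuming a hypothetical toy language whose only statement is id = num ;. The class name PanicMode is illustrative, and token positions stand in for line numbers. Each error reports the current vs. expected token, then the parser discards input up to and including the next ';' and continues.

```java
import java.util.ArrayList;
import java.util.List;

// Panic-mode error recovery for a hypothetical toy language:
//   program → stmt*        stmt → id = num ;
// On a syntax error the parser records the current vs. expected token,
// skips input up to and including the next ';', and resumes parsing.
final class PanicMode {
    private static final String[] STMT = {"id", "=", "num", ";"};

    static List<String> parse(String[] tokens) {
        List<String> errors = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.length) {
            boolean ok = true;
            for (String want : STMT) {
                String got = pos < tokens.length ? tokens[pos] : "EOF";
                if (!got.equals(want)) {
                    errors.add("token " + pos + ": expected " + want + " but found " + got);
                    ok = false;
                    break;
                }
                pos++;
            }
            if (!ok) {
                // Recovery: discard tokens until the end of the current statement.
                while (pos < tokens.length && !tokens[pos].equals(";")) pos++;
                pos++; // also skip the ';' itself
            }
        }
        return errors;
    }

    // Three statements; the second one is missing its '='.
    static List<String> example() {
        return parse(new String[] {"id", "=", "num", ";", "id", "num", ";", "id", "=", "num", ";"});
    }

    public static void main(String[] args) {
        example().forEach(System.out::println);
    }
}
```

Only one error is reported here: after the broken second statement the parser resynchronizes at its ';', so the third statement parses cleanly instead of producing spurious errors.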


Parsing

• Context Free Grammars (CFG)
  – Capture program structure (hierarchy)
  – Employ formal theory results
  – Automatically create “efficient” parsers

Grammar:
S → if E then S else S
S → print E
E → num


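From text to abstract syntax

Program text: 5 + (7 * x)
Lexical analyzer → token stream: num + ( num * id )
Parser, with the grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )
Output: a parse tree and an abstract syntax tree if the syntax is valid, otherwise a syntax error.

[Figure: the parse tree of num(5) + ( num(7) * id(x) ), and the abstract syntax tree with + at the root, Num(5) as its left child, and a * node over Num(7) and id(x) as its right child.]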
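A minimal sketch of AST node classes for this expression grammar, assuming Java 16+ records; the names Expr, Num, Id and BinOp are hypothetical. It builds the abstract syntax tree of 5 + (7 * x): the parentheses and the chain of E nodes from the parse tree disappear, and only operators and operands remain.

```java
// Hypothetical AST node classes for E → id | num | E + E | E * E | ( E ).
// Parentheses and the non-terminal E shape the parse tree but do not
// appear in the AST.
final class AstDemo {
    interface Expr {}

    record Num(int value) implements Expr {
        public String toString() { return "Num(" + value + ")"; }
    }

    record Id(String name) implements Expr {
        public String toString() { return "id(" + name + ")"; }
    }

    // An operator node with two sub-expressions.
    record BinOp(char op, Expr left, Expr right) implements Expr {
        public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    // The AST for the program text 5 + (7 * x).
    static Expr example() {
        return new BinOp('+', new Num(5), new BinOp('*', new Num(7), new Id("x")));
    }

    public static void main(String[] args) {
        System.out.println(example()); // (Num(5) + (Num(7) * id(x)))
    }
}
```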


From text to abstract syntax (cont.)

[Figure: the same pipeline, showing the token stream num + ( num * id ), the parse tree, and the abstract syntax tree with leaves num, num and x; invalid input yields a syntax error.]

Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run.


Parsing terminology

• Symbols: terminals (tokens) + * ( ) id num; non-terminals: E
• Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
• Parse tree: [E at the root with children E, +, E; the left E derives 1; the right E has children E, *, E, which derive 2 and 3]
• Grammar rules:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
• Convention: the non-terminal appearing in the first rule is the initial (start) non-terminal
• Each step in a derivation applies one production (grammar rule)
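A small sketch that replays the derivation of 1 + 2 * 3 mechanically: each step replaces the leftmost E with the right-hand side of one production. The class and method names are hypothetical, and numerals such as "1" stand for num tokens to keep the output readable.

```java
import java.util.ArrayList;
import java.util.List;

// Replays a derivation over the grammar E → id | num | E + E | E * E | ( E ).
// Each step replaces the leftmost occurrence of the non-terminal E with the
// right-hand side of one production, so the result is a leftmost derivation.
final class LeftmostDerivation {
    static List<String> derive(List<List<String>> rightHandSides) {
        List<String> form = new ArrayList<>(List.of("E"));   // start with the start symbol
        List<String> steps = new ArrayList<>();
        steps.add(String.join(" ", form));
        for (List<String> rhs : rightHandSides) {
            int i = form.indexOf("E");                         // leftmost non-terminal
            if (i < 0) throw new IllegalStateException("no non-terminal left to expand");
            form.remove(i);                                    // remove by index
            form.addAll(i, rhs);                               // splice in the right-hand side
            steps.add(String.join(" ", form));
        }
        return steps;
    }

    // The derivation of 1 + 2 * 3 from the slide.
    static List<String> example() {
        return derive(List.of(
                List.of("E", "+", "E"),   // E → E + E
                List.of("1"),             // E → num
                List.of("E", "*", "E"),   // E → E * E
                List.of("2"),             // E → num
                List.of("3")));           // E → num
    }

    public static void main(String[] args) {
        System.out.println(String.join(" => ", example()));
        // E => E + E => 1 + E => 1 + E * E => 1 + 2 * E => 1 + 2 * 3
    }
}
```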


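Ambiguity

Grammar rules:
E → id
E → num
E → E + E
E → E * E
E → ( E )

• Leftmost derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
  Parse tree: [E → E + E at the root; the left E derives 1; the right E → E * E derives 2 * 3]
• Rightmost derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
  Parse tree: [E → E * E at the root; the left E → E + E derives 1 + 2; the right E derives 3]

Definition: a grammar is ambiguous if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations).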
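One way to see the ambiguity concretely is to count parse trees. The sketch below, with a hypothetical class name, handles expressions without parentheses under E → E + E | E * E | num: any operator in a span can be the root of that span's tree, so the counts combine as in a CYK-style dynamic program.

```java
// Counts the parse trees that the ambiguous grammar
//   E → E + E | E * E | num
// assigns to an expression such as "1 + 2 * 3" (no parentheses,
// tokens separated by spaces). For a span i..j of tokens:
//   trees(i, j) = sum over operators k in the span of trees(i, k-1) * trees(k+1, j)
final class AmbiguityCount {
    static long countParseTrees(String input) {
        String[] tok = input.trim().split("\\s+");       // num op num op ... num
        int n = tok.length;
        long[][] trees = new long[n][n];
        for (int i = 0; i < n; i += 2) trees[i][i] = 1;   // a single num: one tree (E → num)
        for (int len = 3; len <= n; len += 2) {
            for (int i = 0; i + len - 1 < n; i += 2) {
                int j = i + len - 1;
                for (int k = i + 1; k < j; k += 2) {      // k ranges over operator positions
                    trees[i][j] += trees[i][k - 1] * trees[k + 1][j];
                }
            }
        }
        return trees[0][n - 1];
    }

    public static void main(String[] args) {
        System.out.println(countParseTrees("1 + 2 * 3"));      // 2
        System.out.println(countParseTrees("1 + 2 + 3 + 4"));  // 5
    }
}
```

A count greater than 1 for any input is exactly the definition of ambiguity; 1 + 2 * 3 gets the two trees shown above.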


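Grammar rewriting

Ambiguous grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )

Unambiguous grammar:
E → E + T
E → T
T → T * F
T → F
F → id
F → num
F → ( E )

Derivation: E ⇒ E + T ⇒* 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3 (⇒* abbreviates the steps E ⇒ T ⇒ F ⇒ 1)

Parse tree: [E → E + T at the root; the left E derives 1 through T and F; the right T → T * F, where T derives 2 through F and F derives 3]

Note the difference between a language and a grammar: a grammar represents a language, and a language can be represented by many grammars.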


Parsing methods – Top Down

• LL(k)
  – “L”: left-to-right scan of the input
  – “L”: leftmost derivation
  – “k”: predict based on k look-ahead tokens
• Predict a production for a non-terminal based on the next k tokens


Parsing methods – Bottom Up

• LR(0), SLR(1), LR(1), LALR(1)
  – “L”: left-to-right scan of the input
  – “R”: rightmost derivation (constructed in reverse)
• Decide which production to apply (reduce) based on a right-hand side and the look-ahead


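Top Down – parsing

Grammar:
E → T + E
E → i
T → i

Derivation of 1 + 2 + 3: E ⇒ T + E ⇒ 1 + E ⇒ 1 + T + E ⇒ 1 + 2 + E ⇒ 1 + 2 + 3

[Figure: the parse tree grows from the root E downward: E → T + E with T deriving 1, the inner E → T + E with T deriving 2, and the last E deriving 3.]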


Top Down – parsing

• Starts with the start symbol
• Tries to transform it into the input
• Also called predictive parsing
• LL(1) example

Grammar:
S → if E then S else S
S → begin S L
S → print E
L → end
L → ; S L
E → num

Input: if 5 then print 8 else…

Token : rule ⇒ sentential form
if : S → if E then S else S ⇒ if E then S else S
5 : E → num ⇒ if 5 then S else S
print : S → print E ⇒ if 5 then print E else S
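The trace above can be produced by a recursive-descent parser with one method per non-terminal, where the current token alone selects the production. This is a sketch under stated assumptions: tokens are separated by spaces, any integer literal counts as num, and the class name PredictiveParser is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Predictive (recursive-descent) parser for the LL(1) grammar
//   S → if E then S else S | begin S L | print E
//   L → end | ; S L
//   E → num
// One method per non-terminal; the current token alone selects the production.
final class PredictiveParser {
    private final String[] toks;
    private int pos = 0;
    private final List<String> log = new ArrayList<>();   // "token : rule" lines

    private PredictiveParser(String input) { toks = input.trim().split("\\s+"); }

    private String peek() { return pos < toks.length ? toks[pos] : "EOF"; }

    private void eat(String expected) {
        if (!peek().equals(expected))
            throw new IllegalArgumentException("expected " + expected + " but found " + peek());
        pos++;
    }

    private void S() {
        String t = peek();
        switch (t) {
            case "if" -> {
                log.add(t + " : S -> if E then S else S");
                eat("if"); E(); eat("then"); S(); eat("else"); S();
            }
            case "begin" -> {
                log.add(t + " : S -> begin S L");
                eat("begin"); S(); L();
            }
            case "print" -> {
                log.add(t + " : S -> print E");
                eat("print"); E();
            }
            default -> throw new IllegalArgumentException("no S-rule starts with " + t);
        }
    }

    private void L() {
        String t = peek();
        switch (t) {
            case "end" -> { log.add(t + " : L -> end"); eat("end"); }
            case ";" -> { log.add(t + " : L -> ; S L"); eat(";"); S(); L(); }
            default -> throw new IllegalArgumentException("no L-rule starts with " + t);
        }
    }

    private void E() {
        String t = peek();
        if (pos >= toks.length || !t.matches("\\d+"))
            throw new IllegalArgumentException("expected num but found " + t);
        log.add(t + " : E -> num");
        pos++;
    }

    static boolean accepts(String input) {
        PredictiveParser p = new PredictiveParser(input);
        try {
            p.S();
            return p.pos == p.toks.length;   // the whole input must be consumed
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    static List<String> trace(String input) {
        PredictiveParser p = new PredictiveParser(input);
        try { p.S(); } catch (IllegalArgumentException e) { p.log.add("error: " + e.getMessage()); }
        return p.log;
    }

    public static void main(String[] args) {
        trace("if 5 then print 8 else print 9").forEach(System.out::println);
    }
}
```

No backtracking is needed because the alternatives of S and of L start with distinct tokens, which is what makes this grammar LL(1).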


Top Down - problems

• Left recursion
  A → A a
  A → a
• Non-termination: expanding A with A → A a does not consume any input, so a top-down parser can keep expanding forever:
  A ⇒ A a ⇒ A a a ⇒ A a a a ⇒ A a a a a ⇒ …
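A sketch of what goes wrong in code: a naive recursive-descent method for A that tries A → A a first calls itself before consuming any token. The class name and the depth limit are illustrative assumptions; the limit only turns the eventual stack overflow into a readable error.

```java
// Naive recursive descent for the left-recursive grammar A → A a | a.
// The method for A tries A → A a first, which calls A again before consuming
// any input: the position never advances, so the recursion cannot terminate.
final class LeftRecursionDemo {
    private static final int MAX_DEPTH = 1000;   // illustrative guard against stack overflow
    private final String input;
    private int pos = 0;
    private int depth = 0;

    private LeftRecursionDemo(String input) { this.input = input; }

    private void parseA() {
        depth++;
        if (depth > MAX_DEPTH)
            throw new IllegalStateException("no progress: A -> A a expanded "
                    + MAX_DEPTH + " times at position " + pos);
        parseA();      // A → A a: left-recursive call, pos is unchanged
        expect('a');   // never reached
    }

    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c)
            throw new IllegalStateException("expected " + c + " at position " + pos);
        pos++;
    }

    static String run(String input) {
        try {
            new LeftRecursionDemo(input).parseA();
            return "parsed";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run("aaa"));
    }
}
```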


Top Down - problems

• Two rules cannot start with the same token
  – Can be solved by backtracking
  – Goal: reduce the number of backtracks

E → T + E
E → T

[Figure: two candidate parse trees for E, one with children T + E and one with the single child T.]
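A sketch of the backtracking approach for E → T + E | T, assuming T → num (the slide does not define T) and space-separated tokens; the class name is hypothetical. The parser tries E → T + E first; if that fails it rewinds the input position and tries E → T, re-parsing T. This simple rewind suffices for this grammar but is not a general backtracking search.

```java
// Backtracking recursive descent for E → T + E | T, assuming T → num.
// Both alternatives start with T, so one look-ahead token cannot choose
// between them. Each rewind of the input position is counted.
final class BacktrackingParser {
    private final String[] toks;
    private int pos = 0;
    private int backtracks = 0;

    private BacktrackingParser(String input) { toks = input.trim().split("\\s+"); }

    private boolean match(String t) {
        if (pos < toks.length && toks[pos].equals(t)) { pos++; return true; }
        return false;
    }

    private boolean T() {   // T → num (assumed)
        if (pos < toks.length && toks[pos].matches("\\d+")) { pos++; return true; }
        return false;
    }

    private boolean E() {
        int saved = pos;
        if (T() && match("+") && E()) return true;   // try E → T + E
        pos = saved;                                   // rewind the input
        backtracks++;
        return T();                                    // try E → T
    }

    static boolean accepts(String input) {
        BacktrackingParser p = new BacktrackingParser(input);
        return p.E() && p.pos == p.toks.length;
    }

    static int countBacktracks(String input) {
        BacktrackingParser p = new BacktrackingParser(input);
        p.E();
        return p.backtracks;
    }

    public static void main(String[] args) {
        System.out.println(accepts("1 + 2 + 3") + " after "
                + countBacktracks("1 + 2 + 3") + " backtrack(s)");
    }
}
```

The last T is parsed twice; left factoring the grammar into E → T E’ removes this repeated work.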


Top Down – solution

Two ways:
• Eliminate left recursion
• Perform left factoring


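Top Down – solution

Step I: left recursion removal

Original grammar:
E → E + T
E → T
T → T * F
T → F
F → id
F → (E)

The left-recursive rules are rewritten as right-recursive ones:
E → T + E (replaces E → E + T)
T → F * T (replaces T → T * F)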
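For comparison, a sketch of the standard textbook transformation for immediate left recursion, which introduces a new non-terminal A’ instead of reversing the rule as the step above does. The class name is hypothetical and alternatives are written as space-separated symbol strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Textbook removal of immediate left recursion:
//   A → A α1 | ... | A αm | β1 | ... | βn
// becomes
//   A  → β1 A' | ... | βn A'
//   A' → α1 A' | ... | αm A' | ε
final class LeftRecursionRemoval {
    static final String EPSILON = "\u03b5";

    static List<String> remove(String a, List<String> alternatives) {
        String aPrime = a + "'";
        List<String> alphas = new ArrayList<>();   // tails of the left-recursive alternatives
        List<String> betas = new ArrayList<>();    // alternatives that do not start with A
        for (String alt : alternatives) {
            String[] syms = alt.trim().split("\\s+");
            if (syms[0].equals(a)) alphas.add(String.join(" ", Arrays.copyOfRange(syms, 1, syms.length)));
            else betas.add(alt.trim());
        }
        List<String> rules = new ArrayList<>();
        if (alphas.isEmpty()) {                    // no left recursion: keep the rules as they are
            for (String b : betas) rules.add(a + " -> " + b);
            return rules;
        }
        for (String b : betas) rules.add(a + " -> " + b + " " + aPrime);
        for (String al : alphas) rules.add(aPrime + " -> " + (al.isEmpty() ? "" : al + " ") + aPrime);
        rules.add(aPrime + " -> " + EPSILON);
        return rules;
    }

    static List<String> example() {
        return remove("E", List.of("E + T", "T"));
    }

    public static void main(String[] args) {
        example().forEach(System.out::println);                     // E -> T E', E' -> + T E', E' -> ε
        remove("T", List.of("T * F", "F")).forEach(System.out::println);
    }
}
```

Unlike the rewriting on the slide, this form keeps the operands in their original order, so a later pass can still build left-associative trees.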


Top Down – solution

Step II: left factoring

Grammar after step I:
E → T + E
E → T
T → F * T
T → F
F → id
F → (E)

Left-factored grammar:
E → T E’
E’ → + E
E’ → ε
T → F T’
T’ → * T
T’ → ε
F → id
F → (E)
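The left-factored grammar can be parsed by recursive descent with one token of look-ahead. A sketch, assuming that any token starting with a letter is an id; the class name is hypothetical, and the parser returns a fully parenthesised rendering of the tree it builds.

```java
// Recursive-descent parser for the left-factored grammar
//   E → T E'   E' → + E | ε   T → F T'   T' → * T | ε   F → id | ( E )
// One method per non-terminal; one look-ahead token selects each production.
final class LL1ExprParser {
    private final String[] toks;
    private int pos = 0;

    private LL1ExprParser(String input) {
        // Surround operators and parentheses with spaces, then split into tokens.
        toks = input.replaceAll("([()+*])", " $1 ").trim().split("\\s+");
    }

    private String peek() { return pos < toks.length ? toks[pos] : "EOF"; }

    private void eat(String t) {
        if (!peek().equals(t)) throw new IllegalArgumentException("expected " + t + " but found " + peek());
        pos++;
    }

    private String E() { return EPrime(T()); }          // E → T E'

    private String EPrime(String left) {                 // E' → + E | ε
        if (peek().equals("+")) {
            eat("+");
            return "(" + left + " + " + E() + ")";
        }
        return left;                                     // ε
    }

    private String T() { return TPrime(F()); }           // T → F T'

    private String TPrime(String left) {                 // T' → * T | ε
        if (peek().equals("*")) {
            eat("*");
            return "(" + left + " * " + T() + ")";
        }
        return left;                                     // ε
    }

    private String F() {                                 // F → id | ( E )
        if (peek().equals("(")) {
            eat("(");
            String inner = E();
            eat(")");
            return inner;
        }
        if (pos >= toks.length || !toks[pos].matches("[A-Za-z]\\w*"))
            throw new IllegalArgumentException("expected id or ( but found " + peek());
        return toks[pos++];
    }

    static String parse(String input) {
        LL1ExprParser p = new LL1ExprParser(input);
        String tree = p.E();
        if (p.pos != p.toks.length) throw new IllegalArgumentException("unexpected " + p.peek());
        return tree;
    }

    static boolean accepts(String input) {
        try { parse(input); return true; } catch (IllegalArgumentException e) { return false; }
    }

    public static void main(String[] args) {
        System.out.println(parse("a + b * c"));    // (a + (b * c))
        System.out.println(parse("(a + b) * c"));  // ((a + b) * c)
    }
}
```

Precedence comes out right (* binds tighter than +), but because the grammar is right-recursive, a + b + c is grouped as (a + (b + c)).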


Top Down – left factoring

• A non-terminal with two rules starting with the same prefix

Grammar:
S → if E then S else S
S → if E then S

Left-factored grammar:
S → if E then S X
X → ε
X → else S
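The factoring step above can be sketched as code: find the longest common prefix of two alternatives and move the differing tails into a fresh non-terminal. The class name is hypothetical, alternatives are space-separated symbol strings, and the caller supplies the fresh name (X here).

```java
import java.util.Arrays;
import java.util.List;

// Left factoring for a non-terminal with two alternatives that share a prefix:
//   A → α β1 | α β2   becomes   A → α X,   X → β1 | β2
// where α is the longest common prefix and an empty β is written ε.
final class LeftFactoring {
    static final String EPSILON = "\u03b5";

    static List<String> factor(String a, String alt1, String alt2, String fresh) {
        String[] s1 = alt1.trim().split("\\s+");
        String[] s2 = alt2.trim().split("\\s+");
        int k = 0;
        while (k < s1.length && k < s2.length && s1[k].equals(s2[k])) k++;   // common prefix length
        if (k == 0) {                                                        // nothing to factor
            return List.of(a + " -> " + String.join(" ", s1), a + " -> " + String.join(" ", s2));
        }
        String prefix = String.join(" ", Arrays.copyOfRange(s1, 0, k));
        return List.of(
                a + " -> " + prefix + " " + fresh,
                fresh + " -> " + rest(s1, k),
                fresh + " -> " + rest(s2, k));
    }

    // The symbols after the shared prefix, or ε if nothing is left.
    private static String rest(String[] syms, int k) {
        return k == syms.length ? EPSILON : String.join(" ", Arrays.copyOfRange(syms, k, syms.length));
    }

    static List<String> example() {
        return factor("S", "if E then S else S", "if E then S", "X");
    }

    public static void main(String[] args) {
        example().forEach(System.out::println);
    }
}
```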


Bottom Up – parsing

• No problem with left recursion
• Widely used in practice
• LR(0), SLR(1), LR(1), LALR(1)
  – We will focus only on the theory of LR(0)
  – JavaCup implements LALR(1)
• Starts with the input
• Attempts to rewrite it into the start symbol


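Bottom Up – parsing

Grammar:
E → E + (E)
E → i

Rewriting 1 + (2) + (3) into the start symbol, one reduction at a time:
1 + (2) + (3)
E + (2) + (3)
E + (E) + (3)
E + (3)
E + (E)
E

[Figure: the parse tree is built from the leaves 1, 2 and 3 up to the root E.]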
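The reduction sequence above can be reproduced by a small shift-reduce loop. This is a simplified sketch, not a table-driven LR(0) parser: after every shift it greedily reduces while the top of the stack matches a right-hand side, which happens to work for this particular grammar. The class name is hypothetical and integer literals stand for i.

```java
import java.util.ArrayList;
import java.util.List;

// Shift-reduce sketch for the grammar E → E + ( E ) | i.
// After every shift, the parser reduces while the top of the stack matches a
// right-hand side, and records the sentential form (stack + unread input).
final class ShiftReduce {
    private static final List<String> HANDLE = List.of("E", "+", "(", "E", ")");

    static List<String> parse(String input) {
        String[] toks = input.replaceAll("([()+])", " $1 ").trim().split("\\s+");
        List<String> stack = new ArrayList<>();
        List<String> forms = new ArrayList<>();          // sentential forms after each reduction
        for (int next = 0; next < toks.length; next++) {
            stack.add(toks[next].matches("\\d+") ? "i" : toks[next]);   // shift
            boolean reduced = true;
            while (reduced) {
                reduced = false;
                int n = stack.size();
                if (stack.get(n - 1).equals("i")) {                          // reduce E → i
                    stack.set(n - 1, "E");
                    reduced = true;
                } else if (n >= 5 && stack.subList(n - 5, n).equals(HANDLE)) { // reduce E → E + ( E )
                    stack.subList(n - 5, n).clear();
                    stack.add("E");
                    reduced = true;
                }
                if (reduced) forms.add(form(stack, toks, next + 1));
            }
        }
        forms.add(stack.equals(List.of("E")) ? "accept" : "error");
        return forms;
    }

    // Stack contents followed by the unread input.
    private static String form(List<String> stack, String[] toks, int from) {
        List<String> all = new ArrayList<>(stack);
        all.addAll(List.of(toks).subList(from, toks.length));
        return String.join(" ", all);
    }

    public static void main(String[] args) {
        parse("1 + (2) + (3)").forEach(System.out::println);
    }
}
```

Because the grammar is left-recursive, the stack never grows deeper than one pending E + ( E ), which is why bottom-up parsing has no trouble with left recursion.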


Bottom Up - problems

• Ambiguity

E → E + E
E → i

1 + 2 + 3 → (1 + 2) + 3 ????
1 + 2 + 3 → 1 + (2 + 3) ????
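A sketch of where the two readings come from in a shift-reduce parser: with E + E on the stack and + as the next token, the parser can either reduce or shift (a shift/reduce conflict). The class name and the preferReduce flag are hypothetical; the input is assumed to be well formed, with space-separated tokens and integers standing for i. Parser generators let the grammar author fix this choice; JavaCup, for instance, supports precedence and associativity declarations.

```java
import java.util.ArrayList;
import java.util.List;

// Shift-reduce sketch for the ambiguous grammar E → E + E | i.
// Resolving the shift/reduce conflict by reducing groups to the left;
// resolving it by shifting groups to the right.
// Stack entries hold the subtree built so far.
final class ConflictDemo {
    static String parse(String input, boolean preferReduce) {
        List<String> stack = new ArrayList<>();
        for (String t : input.trim().split("\\s+")) {
            // Conflict point: E + E on the stack, next token is +.
            if (t.equals("+") && stack.size() >= 3 && preferReduce) reduceTop(stack);
            stack.add(t);                       // shift (an integer is treated as E → i at once)
        }
        while (stack.size() >= 3) reduceTop(stack);  // end of input: reduce what is left
        return stack.get(0);
    }

    // E → E + E: replace the top three stack entries with one parenthesised subtree.
    private static void reduceTop(List<String> stack) {
        int n = stack.size();
        String right = stack.remove(n - 1);
        stack.remove(n - 2);                    // the + token
        String left = stack.remove(n - 3);
        stack.add("(" + left + " + " + right + ")");
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 + 3", true));   // ((1 + 2) + 3)
        System.out.println(parse("1 + 2 + 3", false));  // (1 + (2 + 3))
    }
}
```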


Summary

• Do PA1
• Use the forum

Next week
• Cup
• LR(0)