41
Bottom-up Parsing Saumya Debray CSc 453 Oct 20, 2016

CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Bottom-upParsing

SaumyaDebrayCSc 453Oct20,2016

Page 2: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

BASICCONCEPTS

Page 3: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

CSc453:SyntaxAnalysis 3

Parsing:Top-downvs.Bottom-up

S

parsetree

inputstring

top-downparsing:• startswiththestart

symbol• identifiesaderivation

sequencethatproducestheinputstring

→ growstheparsetreefromthetopdown

Page 4: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

CSc453:SyntaxAnalysis 4

Parsing:Top-downvs.Bottom-up

bottom-upparsing:• startswiththeinput

tokens• identifiesa“backwards

derivationsequence”thatproducesthestartsymbol

→ assemblestheparsetreefromthebottomup

S

parsetree

inputstring

Page 5: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Whyuseabottom-upparser?

+ Canhandlesomegrammarsthatrecursive-descentparserscannot– don’tneedtorewritethegrammar

+ Createdusingparsergenerators(e.g.,bison)– speedsupparsercreation

‒ Provideslesscontrolovertheparsingprocessthanrecursive-descentparsers– e.g.,errormessagesarelesshelpful

Page 6: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Doingderivationsbackwards

a edcbb

Grammar:S → aABeA → AbcA → bB → d

Input:abbcde

Page 7: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Doingderivationsbackwards

a edcbb

A

Grammar:S → aABeA → AbcA → bB → d

Input:abbcde

Page 8: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Doingderivationsbackwards

a edcbb

Grammar:S → aABeA → AbcA → bB → d

Input:abbcde A

A

Page 9: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Doingderivationsbackwards

a edcbb

A

A B

Grammar:S → aABeA → AbcA → bB → d

Input:abbcde

Page 10: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Doingderivationsbackwards

a edcbb

A

A B

Grammar:S → aABeA → AbcA → bB → d

Input:abbcde

S

Page 11: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Doingderivationsbackwards

a edcbb

A

A B

Grammar:S → aABeA → AbcA → bB → d

Input:abbcde

SNoticethatwedidn’treducethis‘b’usingtheruleA® b.Ifwehad,theparserwouldhavegottenstuck.

Page 12: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

SHIFT-REDUCEPARSING

Page 13: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Shift-reduceparsing

• Aninstanceofbottom-upparsing• Basicidea:repeat

1. inthestringbeingprocessed,findasubstringα suchthatA→α isaproduction

2. replaceα byA(i.e.,reverseaderivationstep)

untilwegetthestartsymbol.

• Technicalissues:Figuringout1. whichsubstringtoreplace;and2. whichproductiontoreducewith.

encodedintheparsetable

Page 14: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

LRparsing*

parsetable

$

...

parserstack

$

driver

input

*LRparsing:aparticulartypeofshift-reduceparsing

Page 15: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

LRparsing

parsetable

$

...

$

driver

input

prettymuchthesameforallLRparsers

parserstack

Page 16: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

LRparsing

parsetable

$

...

$

driver

input

Producedbytheparsergenerator

Encodesthestructureofthegrammar

prettymuchthesameforallLRparsers

parserstack

Page 17: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

LRparsing• DataStructures:

– astack,itsbottommarkedby‘$’.Initiallyempty.– theinputstring,itsrightendmarkedby‘$’

• Actions:repeat1. Shift some(³ 0)symbolsfromtheinputstringontothe

stack,untilparsetablesaystoreduce.2. Reduce b totheLHSoftheappropriateproduction.until readytoaccept.

– Acceptance:wheninputisemptyandstackcontainsonlythestartsymbol.

Page 18: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Example

Grammar :S → aABeA → Abc | bB → d

Stack (→) Input Action*$ abbcde$ shift$a bbcde$ shift$ab bcde$ reduce: A → b$aA bcde$ shift$aAb cde$ shift$aAbc de$ reduce: A → Abc$aA de$ shift$aAd e$ reduce: B → d$aAB e$ shift$aABe $ reduce: S → aABe$S $ accept

*fromparsetable

Page 19: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

PARSERGENERATORS

Page 20: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Parsergenerators

• ConstructingLRparsersbyhandispainful– largeno.ofparserstates

• Parsergenerators(e.g.,yacc,bison)takeaspecificationofagrammarandwriteouttheCcodefortheparser– convenient– debuggingcanbetedious

Page 21: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Parsergenerators

• Takesaspecificationforacontext-freegrammar

• Producescodeforaparser

Input:asetofgrammarrulesandactions

Output:Ccodeimplementingaparser:

yacc

(orbison)

Page 22: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Usingparsergenerators

flex yacc

yylex() yyparse()

lexicalrules grammarrules

inputtokens parsed

input

lex.yy.c y.tab.c

“yacc -d”Þ y.tab.h

describesstates,transitionsofparser(usefulfordebugging)

y.output

“yacc -v”

Page 23: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Usingparsergenerators

flex yacc

yylex() yyparse()

lexicalrules grammarrules

inputtokens parsed

input

lex.yy.c y.tab.c

“yacc -d”Þ y.tab.h

describesstates,transitionsofparser(usefulfordebugging)

y.output

“yacc -v”

specifications

Page 24: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Usingparsergenerators

flex yacc

yylex() yyparse()

lexicalrules grammarrules

inputtokens parsed

input

lex.yy.c y.tab.c

“yacc -d”Þ y.tab.h

describesstates,transitionsofparser(usefulfordebugging)

y.output

“yacc -v”

tools

Page 25: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Usingparsergenerators

flex yacc

yylex() yyparse()

lexicalrules grammarrules

inputtokens parsed

input

lex.yy.c y.tab.c

“yacc -d”Þ y.tab.h

describesstates,transitionsofparser(usefulfordebugging)

y.output

“yacc -v”

generatedCcode

Page 26: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Usingyacc/bison

Structureofinputfile:

definitions(optional)%%rules(optional)%%usercode(optional)

Formatofrules:Grammarrules:

A® B1 B2

A® C1 C2 C3

Yacc rules:A:B1 B2

|C1 C2 C3

Youcanembedsemanticactions(Ccode)anywhereontheRHSofarule

Page 27: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Example:parsingexpressions

associativity+

precedence

Calculator:

Page 28: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Example:parsingexpressionsASTBuilder:

Page 29: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

SOMENON-STANDARDPARSERS

Page 30: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

1.Facultyteachingevaluations(circa2012)

students

TCE

instructorsummary

data

CSDept staff

Excelspreadsheet

export

HTML

simpleHTMLparser

(yacc-based)

webpages

evaluation

Page 31: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

1.Facultyteachingevaluationsscanner(inputtoflex) parser(inputtoyacc)

Page 32: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

1.Facultyteachingevaluationsgenerated output

Page 33: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

2.ComputingFIRST,FOLLOWsetsAtooltocomputeFIRSTandFOLLOWsetsforanarbitrarygrammar[CSc 453]

gff:

grammarinyacc format

parser(yacc-based)

FIRST,FOLLOWsetsFIRST,

FOLLOWcomputation

Page 34: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

2.ComputingFIRST,FOLLOWsets

yacc specificationfortheyacc-formatinputgrammar:

Page 35: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

3.ParsingIntelx86manuals

• WantCcodethatmodelsthebehaviorofindividualx86instructions– foranalyzingexecutablesforasecurityproject

• Toomanytodomanually– xed reports1503differentinstructions

Parsethex86InstructionReferenceManual

Page 36: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

3.ParsingIntelx86manuals

parser(yacc-based)

x86referencemanuals

x86instructionmodels(Ccode)

libraries,analysisdriver

Ccompiler,linker

securityanalysisapplication

Page 37: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

3.ParsingIntelx86manuals

wanttoparsethis

Page 38: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

3.ParsingIntelx86manuals

It’salot harderthanIhadthought

Page 39: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

3.ParsingIntelx86manuals

Theinputtoyacc lookslikethis:

Page 40: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

3.ParsingIntelx86manuals

Currentstatus:workingonit

Page 41: CSC453 guest lecture - 2016-10-20 · CSc 453: Syntax Analysis 3 Parsing: Top-down vs. Bottom-up S parse tree input string top-down parsing: • starts with the start symbol • identifies

Summary

• Bottom-upparserscanbeanalternativetorecursive-descent– painfultowritebyhand– muchmoreconvenientviaparsergenerators

• Theyconstructtheparsetreebottom-up– startatthetokens– workbyrepeatedlyundoingderivationsteps

• Parsersalsofindnon-standardapplications