Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Bottom-upParsing
SaumyaDebrayCSc 453Oct20,2016
BASICCONCEPTS
CSc453:SyntaxAnalysis 3
Parsing:Top-downvs.Bottom-up
S
parsetree
inputstring
top-downparsing:• startswiththestart
symbol• identifiesaderivation
sequencethatproducestheinputstring
→ growstheparsetreefromthetopdown
CSc453:SyntaxAnalysis 4
Parsing:Top-downvs.Bottom-up
bottom-upparsing:• startswiththeinput
tokens• identifiesa“backwards
derivationsequence”thatproducesthestartsymbol
→ assemblestheparsetreefromthebottomup
S
parsetree
inputstring
Whyuseabottom-upparser?
+ Canhandlesomegrammarsthatrecursive-descentparserscannot– don’tneedtorewritethegrammar
+ Createdusingparsergenerators(e.g.,bison)– speedsupparsercreation
‒ Provideslesscontrolovertheparsingprocessthanrecursive-descentparsers– e.g.,errormessagesarelesshelpful
Doingderivationsbackwards
a edcbb
Grammar:S → aABeA → AbcA → bB → d
Input:abbcde
Doingderivationsbackwards
a edcbb
A
Grammar:S → aABeA → AbcA → bB → d
Input:abbcde
Doingderivationsbackwards
a edcbb
Grammar:S → aABeA → AbcA → bB → d
Input:abbcde A
A
Doingderivationsbackwards
a edcbb
A
A B
Grammar:S → aABeA → AbcA → bB → d
Input:abbcde
Doingderivationsbackwards
a edcbb
A
A B
Grammar:S → aABeA → AbcA → bB → d
Input:abbcde
S
Doingderivationsbackwards
a edcbb
A
A B
Grammar:S → aABeA → AbcA → bB → d
Input:abbcde
SNoticethatwedidn’treducethis‘b’usingtheruleA® b.Ifwehad,theparserwouldhavegottenstuck.
SHIFT-REDUCEPARSING
Shift-reduceparsing
• Aninstanceofbottom-upparsing• Basicidea:repeat
1. inthestringbeingprocessed,findasubstringα suchthatA→α isaproduction
2. replaceα byA(i.e.,reverseaderivationstep)
untilwegetthestartsymbol.
• Technicalissues:Figuringout1. whichsubstringtoreplace;and2. whichproductiontoreducewith.
encodedintheparsetable
LRparsing*
parsetable
$
...
parserstack
$
driver
input
*LRparsing:aparticulartypeofshift-reduceparsing
LRparsing
parsetable
$
...
$
driver
input
prettymuchthesameforallLRparsers
parserstack
LRparsing
parsetable
$
...
$
driver
input
Producedbytheparsergenerator
Encodesthestructureofthegrammar
prettymuchthesameforallLRparsers
parserstack
LRparsing• DataStructures:
– astack,itsbottommarkedby‘$’.Initiallyempty.– theinputstring,itsrightendmarkedby‘$’
• Actions:repeat1. Shift some(³ 0)symbolsfromtheinputstringontothe
stack,untilparsetablesaystoreduce.2. Reduce b totheLHSoftheappropriateproduction.until readytoaccept.
– Acceptance:wheninputisemptyandstackcontainsonlythestartsymbol.
Example
Grammar :S → aABeA → Abc | bB → d
Stack (→) Input Action*$ abbcde$ shift$a bbcde$ shift$ab bcde$ reduce: A → b$aA bcde$ shift$aAb cde$ shift$aAbc de$ reduce: A → Abc$aA de$ shift$aAd e$ reduce: B → d$aAB e$ shift$aABe $ reduce: S → aABe$S $ accept
*fromparsetable
PARSERGENERATORS
Parsergenerators
• ConstructingLRparsersbyhandispainful– largeno.ofparserstates
• Parsergenerators(e.g.,yacc,bison)takeaspecificationofagrammarandwriteouttheCcodefortheparser– convenient– debuggingcanbetedious
Parsergenerators
• Takesaspecificationforacontext-freegrammar
• Producescodeforaparser
Input:asetofgrammarrulesandactions
Output:Ccodeimplementingaparser:
yacc
(orbison)
Usingparsergenerators
flex yacc
yylex() yyparse()
lexicalrules grammarrules
inputtokens parsed
input
lex.yy.c y.tab.c
“yacc -d”Þ y.tab.h
describesstates,transitionsofparser(usefulfordebugging)
y.output
“yacc -v”
Usingparsergenerators
flex yacc
yylex() yyparse()
lexicalrules grammarrules
inputtokens parsed
input
lex.yy.c y.tab.c
“yacc -d”Þ y.tab.h
describesstates,transitionsofparser(usefulfordebugging)
y.output
“yacc -v”
specifications
Usingparsergenerators
flex yacc
yylex() yyparse()
lexicalrules grammarrules
inputtokens parsed
input
lex.yy.c y.tab.c
“yacc -d”Þ y.tab.h
describesstates,transitionsofparser(usefulfordebugging)
y.output
“yacc -v”
tools
Usingparsergenerators
flex yacc
yylex() yyparse()
lexicalrules grammarrules
inputtokens parsed
input
lex.yy.c y.tab.c
“yacc -d”Þ y.tab.h
describesstates,transitionsofparser(usefulfordebugging)
y.output
“yacc -v”
generatedCcode
Usingyacc/bison
Structureofinputfile:
definitions(optional)%%rules(optional)%%usercode(optional)
Formatofrules:Grammarrules:
A® B1 B2
A® C1 C2 C3
Yacc rules:A:B1 B2
|C1 C2 C3
Youcanembedsemanticactions(Ccode)anywhereontheRHSofarule
Example:parsingexpressions
associativity+
precedence
Calculator:
Example:parsingexpressionsASTBuilder:
SOMENON-STANDARDPARSERS
1.Facultyteachingevaluations(circa2012)
students
TCE
instructorsummary
data
CSDept staff
Excelspreadsheet
export
HTML
simpleHTMLparser
(yacc-based)
webpages
evaluation
1.Facultyteachingevaluationsscanner(inputtoflex) parser(inputtoyacc)
1.Facultyteachingevaluationsgenerated output
2.ComputingFIRST,FOLLOWsetsAtooltocomputeFIRSTandFOLLOWsetsforanarbitrarygrammar[CSc 453]
gff:
grammarinyacc format
parser(yacc-based)
FIRST,FOLLOWsetsFIRST,
FOLLOWcomputation
2.ComputingFIRST,FOLLOWsets
yacc specificationfortheyacc-formatinputgrammar:
3.ParsingIntelx86manuals
• WantCcodethatmodelsthebehaviorofindividualx86instructions– foranalyzingexecutablesforasecurityproject
• Toomanytodomanually– xed reports1503differentinstructions
Parsethex86InstructionReferenceManual
3.ParsingIntelx86manuals
parser(yacc-based)
x86referencemanuals
x86instructionmodels(Ccode)
libraries,analysisdriver
Ccompiler,linker
securityanalysisapplication
3.ParsingIntelx86manuals
wanttoparsethis
3.ParsingIntelx86manuals
It’salot harderthanIhadthought
3.ParsingIntelx86manuals
Theinputtoyacc lookslikethis:
3.ParsingIntelx86manuals
Currentstatus:workingonit
Summary
• Bottom-upparserscanbeanalternativetorecursive-descent– painfultowritebyhand– muchmoreconvenientviaparsergenerators
• Theyconstructtheparsetreebottom-up– startatthetokens– workbyrepeatedlyundoingderivationsteps
• Parsersalsofindnon-standardapplications