7/24/2019 04 Syntax Analysis S16
1/45
Syntax Analysis
CSE 340 Principles of Programming Languages
Spring 2016
Adam Doupé
Arizona State University
http://adamdoupe.com
Adam Doupé, Principles of Programming Languages
Syntax Analysis
• The goal of syntax analysis is to transform the sequence of tokens from the lexer into something useful
• However, we need a way to specify and check if the sequence of tokens is valid
  - NUM PLUS NUM
  - DECIMAL DOT NUM ID DOT ID
  - DOT DOT DOT NUM ID DOT ID
Using Regular Expressions
PROGRAM = STATEMENT*
STATEMENT = EXPRESSION | IF_STMT | ...
OP = + | - | * | /
EXPRESSION = (NUM | ID | DECIMAL) OP (NUM | ID | DECIMAL)
5 + 10
foo - bar
1 + 2 + 3
Context-Free Grammars
• Syntax for context-free grammars
  - Each row is called a production
    • Non-terminals on the left
    • Right arrow
    • Non-terminals and terminals on the right
  - Non-terminals will start with an upper case in our examples, terminals will be lowercase and are tokens
  - S will typically be the starting non-terminal
• Example for matching parentheses
  S → ε
  S → ( S )
  - Can also write more succinctly by combining production rules with the same starting non-terminals
  S → ( S ) | ε
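As a rough illustration of what this grammar accepts, the following C sketch recognizes the strings derivable from S → ( S ) | ε. The names `parse_S` and `matches_S` are made up for this example, and reading a plain character string instead of a token stream is an assumption.

```c
#include <stddef.h>

/* Parses S -> ( S ) | epsilon starting at s[*pos] and advances *pos
   past whatever it matched. Returns 1 on success, 0 on a syntax error. */
static int parse_S(const char *s, size_t *pos) {
    if (s[*pos] == '(') {              /* S -> ( S ) */
        (*pos)++;
        if (!parse_S(s, pos)) return 0;
        if (s[*pos] != ')') return 0;  /* the matching ')' is required */
        (*pos)++;
        return 1;
    }
    return 1;                          /* S -> epsilon: match nothing */
}

/* Returns 1 if the whole string is derivable from S, 0 otherwise. */
int matches_S(const char *s) {
    size_t pos = 0;
    return parse_S(s, &pos) && s[pos] == '\0';
}
```

Note that this grammar only generates nested parentheses, so under this sketch `"(())"` is accepted while `"()()"` is rejected.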
CFG Example
S → ( S ) | ε
Derivations of the CFG
S ⇒ ε
S ⇒ ( S ) ⇒ ( ε ) ⇒ ( )
S ⇒ ( S ) ⇒ ( ( S ) ) ⇒ ( ( ε ) ) ⇒ ( ( ) )
CFG Example
Exp → Exp + Exp
Exp → Exp * Exp
Exp → NUM
Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3
Leftmost Derivation
• Always expand the leftmost nonterminal
Exp → Exp + Exp
Exp → Exp * Exp
Exp → NUM
• Is this a leftmost derivation?
Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3
• No, its second step expands the rightmost Exp. A leftmost derivation of the same string:
Exp ⇒ Exp * Exp ⇒ Exp + Exp * Exp ⇒ 1 + Exp * Exp ⇒ 1 + 2 * Exp ⇒ 1 + 2 * 3
Rightmost Derivation
• Always expand the rightmost nonterminal
Exp → Exp + Exp
Exp → Exp * Exp
Exp → NUM
Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3
Parse Tree
Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3

          Exp
       /   |   \
    Exp    *    Exp
  /  |  \        |
Exp  +  Exp      3
 |       |
 1       2
Ambiguous Grammars
Exp → Exp + Exp
Exp → Exp * Exp
Exp → NUM
How to parse 1 + 2 * 3?
Exp ⇒ Exp * Exp ⇒ Exp + Exp * Exp ⇒ 1 + Exp * Exp ⇒ 1 + 2 * Exp ⇒ 1 + 2 * 3
Exp ⇒ Exp + Exp ⇒ 1 + Exp ⇒ 1 + Exp * Exp ⇒ 1 + 2 * Exp ⇒ 1 + 2 * 3
Ambiguous Grammars
1 + 2 * 3

          Exp
       /   |   \
    Exp    *    Exp
  /  |  \        |
Exp  +  Exp      3
 |       |
 1       2

      Exp
   /   |   \
Exp    +    Exp
 |        /  |  \
 1     Exp   *   Exp
        |         |
        2         3
Ambiguous Grammars
• A grammar is ambiguous if there exist two different leftmost derivations, or two different rightmost derivations, or two different parse trees for some string generated by the grammar
• Is English ambiguous?
  - I saw a man on a hill with a telescope.
• Ambiguity is not desirable in a programming language
  - Unlike in English, we don't want the compiler to read your mind and try to infer what you meant
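To make the cost of ambiguity concrete, the sketch below builds the two parse trees for 1 + 2 * 3 by hand and evaluates each one. The `Node` type, `eval`, and the tree variable names are hypothetical and only serve this illustration.

```c
#include <stddef.h>

/* A parse-tree node for Exp: a NUM leaf or an operator with two sub-trees. */
typedef struct Node {
    char op;                          /* '+', '*', or 0 for a NUM leaf */
    int value;                        /* used only when op == 0 */
    const struct Node *left, *right;  /* used only when op != 0 */
} Node;

/* Evaluates a tree bottom-up, following its shape exactly. */
int eval(const Node *n) {
    if (n->op == 0) return n->value;
    int l = eval(n->left), r = eval(n->right);
    return n->op == '+' ? l + r : l * r;
}

static const Node one   = {0, 1, NULL, NULL};
static const Node two   = {0, 2, NULL, NULL};
static const Node three = {0, 3, NULL, NULL};

/* Tree whose root is Exp * Exp, which groups the input as (1 + 2) * 3. */
static const Node sum12 = {'+', 0, &one, &two};
const Node tree_mul_root = {'*', 0, &sum12, &three};

/* Tree whose root is Exp + Exp, which groups the input as 1 + (2 * 3). */
static const Node prod23 = {'*', 0, &two, &three};
const Node tree_add_root = {'+', 0, &one, &prod23};
```

Under these assumptions `eval(&tree_mul_root)` gives 9 and `eval(&tree_add_root)` gives 7, so the same token sequence has two different meanings depending on which tree the parser builds.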
Parsing Approaches
• Various ways to turn strings into parse trees
  - Bottom-up parsing, where you start from the terminals and work your way up
  - Top-down parsing, where you start from the starting non-terminal and work your way down
• In this class, we will focus exclusively on top-down parsing
Top-Down Parsing
S → A | B | C
A → a
B → Bb | b
C → Cc | ε

parse_S() {
  t_type = getToken()
  if (t_type == a) {
    ungetToken()
    parse_A()
    check_eof()
  }
  else if (t_type == b) {
    ungetToken()
    parse_B()
    check_eof()
  }
  else if (t_type == c) {
    ungetToken()
    parse_C()
    check_eof()
  }
  else if (t_type == eof) {
    // do EOF stuff
  }
  else {
    syntax_error()
  }
}
Predictive Recursive Descent Parsers
• Predictive recursive descent parsers are efficient top-down parsers
  - Efficient because they only look at the next token, no backtracking/guessing
• To determine if a language allows a predictive recursive descent parser, we need to define the following functions
• FIRST(α), where α is a sequence of grammar symbols (non-terminals, terminals, and ε)
  - FIRST(α) returns the set of terminals and ε that begin strings derived from α
• FOLLOW(A), where A is a non-terminal
  - FOLLOW(A) returns the set of terminals (and the end-of-input marker) that can appear immediately after A
FIRST() Example
S → A | B | C
A → a
B → Bb | b
C → Cc | ε
FIRST(S) = { a, b, c, ε }
FIRST(A) = { a }
FIRST(B) = { b }
FIRST(C) = { ε, c }
Calculating FIRST(α)
First, start out with empty FIRST() sets for all non-terminals in the grammar
Then, apply the following rules until the FIRST() sets do not change:
1. FIRST(x) = { x } if x is a terminal
2. FIRST(ε) = { ε }
3. If A → Bα is a production rule, then add FIRST(B) – { ε } to FIRST(A)
4. If A → B0B1B2…BiBi+1…Bk and ε ∈ FIRST(B0) and ε ∈ FIRST(B1) and ε ∈ FIRST(B2) and … and ε ∈ FIRST(Bi), then add FIRST(Bi+1) – { ε } to FIRST(A)
5. If A → B0B1B2…Bk and ε ∈ FIRST(B0) and ε ∈ FIRST(B1) and ε ∈ FIRST(B2) and … and ε ∈ FIRST(Bk), then add ε to FIRST(A)
Calculating FIRST Sets
S → ABCD
A → CD | aA
B → b
C → cC | ε
D → dD | ε

            INITIAL  Pass 1    Pass 2           Pass 3           Pass 4
FIRST(S)    { }      { }       { a }            { a, c, d, b }   { a, c, d, b }
FIRST(A)    { }      { a }     { a, c, d, ε }   { a, c, d, ε }   { a, c, d, ε }
FIRST(B)    { }      { b }     { b }            { b }            { b }
FIRST(C)    { }      { c, ε }  { c, ε }         { c, ε }         { c, ε }
FIRST(D)    { }      { d, ε }  { d, ε }         { d, ε }         { d, ε }
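The fixed-point calculation of FIRST sets can be sketched in C roughly as follows. The `"A:rhs"` string encoding of production rules, the bitmask representation of sets, and all function and macro names are assumptions made for this example, not something the course prescribes.

```c
#include <ctype.h>

#define EPS (1u << 26)               /* bit that stands for epsilon */
#define BIT(c) (1u << ((c) - 'a'))   /* bit for a terminal 'a'..'z' */

/* Hypothetical encoding: "A:rhs", uppercase letters are non-terminals,
   lowercase letters are terminals, and an empty rhs means epsilon. */
static const char *RULES[] = {
    "S:ABCD", "A:CD", "A:aA", "B:b", "C:cC", "C:", "D:dD", "D:"
};
enum { NRULES = sizeof RULES / sizeof RULES[0] };

/* FIRST of a sequence of symbols, given the current FIRST table.
   This applies rules 1 through 5 to one right-hand side. */
unsigned first_of_sequence(const char *seq, const unsigned first[26]) {
    unsigned result = 0;
    for (; *seq; seq++) {
        unsigned f = isupper((unsigned char)*seq) ? first[*seq - 'A']
                                                  : BIT(*seq);
        result |= f & ~EPS;              /* add FIRST(Bi) - { epsilon } */
        if (!(f & EPS)) return result;   /* Bi cannot vanish: stop here */
    }
    return result | EPS;                 /* every symbol can derive epsilon */
}

/* Starts from empty sets and re-applies every rule until nothing changes. */
void compute_first(unsigned first[26]) {
    int changed = 1;
    for (int i = 0; i < 26; i++) first[i] = 0;
    while (changed) {
        changed = 0;
        for (int r = 0; r < NRULES; r++) {
            unsigned *set = &first[RULES[r][0] - 'A'];
            unsigned updated = *set | first_of_sequence(RULES[r] + 2, first);
            if (updated != *set) { *set = updated; changed = 1; }
        }
    }
}
```

This sketch updates the sets in place within a pass, so its intermediate passes may differ from a table that updates all sets at once, but the sets it stops at are the same: for the grammar above, FIRST(S) = { a, b, c, d } and FIRST(A) = { a, c, d, ε }.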
FOLLOW()
Calculating FOLLOW()
Calculating FOLLOW Sets
S → ABCD
A → CD | aA
B → b
C → cC | ε
D → dD | ε
FIRST(S) = { a, c, d, b }
FIRST(A) = { a, c, d, ε }
FIRST(B) = { b }
FIRST(C) = { c, ε }
FIRST(D) = { d, ε }
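A possible way to compute FOLLOW sets for this grammar is sketched below. It uses the standard textbook rules, which is an assumption here: the start symbol's FOLLOW set contains the end-of-input marker, and for every rule B → αAβ, FIRST(β) – { ε } is added to FOLLOW(A), plus FOLLOW(B) when β can derive ε. The rule encoding, bitmask sets, and all names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define EPS (1u << 26)               /* epsilon, stored only in FIRST sets */
#define END (1u << 27)               /* end-of-input marker, only in FOLLOW */
#define BIT(c) (1u << ((c) - 'a'))   /* bit for a terminal 'a'..'z' */

/* Hypothetical encoding: "A:rhs", uppercase = non-terminal,
   lowercase = terminal, empty rhs = epsilon. */
static const char *RULES[] = {
    "S:ABCD", "A:CD", "A:aA", "B:b", "C:cC", "C:", "D:dD", "D:"
};
enum { NRULES = sizeof RULES / sizeof RULES[0] };

/* FIRST of a symbol sequence under the current FIRST table. */
static unsigned first_of_sequence(const char *seq, const unsigned first[26]) {
    unsigned result = 0;
    for (; *seq; seq++) {
        unsigned f = isupper((unsigned char)*seq) ? first[*seq - 'A']
                                                  : BIT(*seq);
        result |= f & ~EPS;
        if (!(f & EPS)) return result;
    }
    return result | EPS;
}

/* Fixed-point computation of FIRST for every non-terminal. */
static void compute_first(unsigned first[26]) {
    memset(first, 0, 26 * sizeof first[0]);
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int r = 0; r < NRULES; r++) {
            unsigned *set = &first[RULES[r][0] - 'A'];
            unsigned updated = *set | first_of_sequence(RULES[r] + 2, first);
            if (updated != *set) { *set = updated; changed = 1; }
        }
    }
}

/* Fixed-point computation of FOLLOW; `start` is the start symbol. */
void compute_follow(char start, unsigned follow[26]) {
    unsigned first[26];
    compute_first(first);
    memset(follow, 0, 26 * sizeof follow[0]);
    follow[start - 'A'] = END;
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int r = 0; r < NRULES; r++) {
            unsigned follow_lhs = follow[RULES[r][0] - 'A'];
            for (const char *p = RULES[r] + 2; *p; p++) {
                if (!isupper((unsigned char)*p)) continue;
                unsigned rest = first_of_sequence(p + 1, first);
                unsigned add = rest & ~EPS;          /* FIRST(beta) - {eps} */
                if (rest & EPS) add |= follow_lhs;   /* beta can vanish */
                unsigned *set = &follow[*p - 'A'];
                if ((*set | add) != *set) { *set |= add; changed = 1; }
            }
        }
    }
}
```

With these rules and `compute_follow('S', follow)`, the sketch yields FOLLOW(S) = { $ }, FOLLOW(A) = { b }, FOLLOW(B) = { c, d, $ }, FOLLOW(C) = { b, d, $ }, and FOLLOW(D) = { b, $ }, where $ is the end-of-input marker.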
Predictive Recursive Descent Parsers
• At each parsing step, there is only one grammar rule that can be chosen, and there is no need for backtracking
• The conditions for a predictive parser are both of the following
  - If A → α and A → β, then FIRST(α) ∩ FIRST(β) = ∅
  - If ε ∈ FIRST(A), then FIRST(A) ∩ FOLLOW(A) = ∅
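The two conditions can be checked mechanically once the FIRST and FOLLOW sets are known. The sketch below assumes sets are stored as bitmasks (one bit per terminal, plus a bit for ε); the names `alternatives_disjoint` and `nullable_ok` are invented for this example.

```c
#define EPS (1u << 26)               /* bit that stands for epsilon */
#define BIT(c) (1u << ((c) - 'a'))   /* bit for a terminal 'a'..'z' */

/* Condition 1: the FIRST sets of the n alternatives of one non-terminal
   are pairwise disjoint. first_of_alt[i] is FIRST of the i-th alternative. */
int alternatives_disjoint(const unsigned *first_of_alt, int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (first_of_alt[i] & first_of_alt[j]) return 0;
    return 1;
}

/* Condition 2: if epsilon is in FIRST(A), then FIRST(A) and FOLLOW(A)
   have no terminal in common. */
int nullable_ok(unsigned first_A, unsigned follow_A) {
    if (!(first_A & EPS)) return 1;          /* condition does not apply */
    return ((first_A & ~EPS) & follow_A) == 0;
}
```

For S → A | B | C from the earlier FIRST() example, the alternatives have FIRST sets { a }, { b }, and { c, ε }, so the first condition holds for S. For the left-recursive rule B → Bb | b both alternatives start with b, so the first condition fails for B.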
Creating a Predictive Recursive Descent Parser
• Create a CFG
• Calculate FIRST and FOLLOW sets
• Prove that the CFG allows a Predictive Recursive Descent Parser
Email Addresses
• How to parse/validate email addresses?
  - name @ domain.tld
• Turns out, it is not so simple
  - "cse 340"@example.com
  - customer/department=shipping@example.com
  - "Abc@def"@example.com
  - "Abc\@def"@example.com
  - "Abc\"@example.com"@example.com
  - test "example @hello" <test@example.com>
• In fact, a company called Mailgun, which provides email services as an API, released an open-source tool to validate email addresses, based on their experience with real-world email
  - How did they implement their parser?
    • A recursive descent parser
  - https://github.com/mailgun/flanker
Email Address CFG
quoted-string
atom
dot-atom
whitespace
Address → Name-addr-rfc | Name-addr-lax | Addr-spec
Name-addr-rfc → Display-name-rfc Angle-addr-rfc | Angle-addr-rfc
Display-name-rfc → ...
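Simplified Email Address CFG
quoted-string (q-s)
atom
dot-atom (d-a)
quoted-string-at (q-s-a)
dot-atom-at (d-a-a)
Address → Name-addr | Addr-spec
Name-addr → Display-name Angle-addr | Angle-addr
Display-name → Word Display-name-list
Display-name-list → Word Display-name-list | ε
Angle-addr → < Addr-spec >
Addr-spec → d-a-a Domain | q-s-a Domain
Domain → d-a
Word → atom | q-s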
parse_Address() {
  t_type = getToken();
  // Check FIRST(Name-addr)
  if (t_type == < || t_type == atom || t_type == q-s) {
    ungetToken();
    parse_Name-addr();
    printf("Address -> Name-addr");
  }
  // Check FIRST(Addr-spec)
  else if (t_type == d-a-a || t_type == q-s-a) {
    ungetToken();
    parse_Addr-spec();
    printf("Address -> Addr-spec");
  }
  else {
    syntax_error();
  }
}

Address → Name-addr | Addr-spec
FIRST(Address) = { d-a-a, q-s-a, <, atom, q-s }
FIRST(Name-addr) = { <, atom, q-s }
FIRST(Addr-spec) = { d-a-a, q-s-a }
parse_Name-addr() {
  t_type = getToken();
  // Check FIRST(Display-name Angle-addr)
  if (t_type == atom || t_type == q-s) {
    ungetToken();
    parse_Display-name();
    parse_Angle-addr();
    printf("Name-addr -> Display-name Angle-addr");
  }
  // Check FIRST(Angle-addr)
  else if (t_type == <) {
    ungetToken();
    parse_Angle-addr();
    printf("Name-addr -> Angle-addr");
  }
  else {
    syntax_error();
  }
}

Name-addr → Display-name Angle-addr | Angle-addr
FIRST(Name-addr) = { <, atom, q-s }
FIRST(Display-name) = { atom, q-s }
FIRST(Angle-addr) = { < }
parse_Display-name() {
  t_type = getToken();
  // Check FIRST(Word Display-name-list)
  if (t_type == atom || t_type == q-s) {
    ungetToken();
    parse_Word();
    parse_Display-name-list();
    printf("Display-name -> Word Display-name-list");
  }
  else {
    syntax_error();
  }
}

Display-name → Word Display-name-list
parse_Display-name-list() {
  t_type = getToken();
  // Check FIRST(Word Display-name-list)
  if (t_type == atom || t_type == q-s) {
    ungetToken();
    parse_Word();
    parse_Display-name-list();
    printf("Display-name-list -> Word Display-name-list");
  }
  // Check FOLLOW(Display-name-list)
  else if (t_type == <) {
    ungetToken();
    printf("Display-name-list -> ε");
  }
  else {
    syntax_error();
  }
}

Display-name-list → Word Display-name-list | ε
parse_Angle-addr() {
  t_type = getToken();
  // Check FIRST(< Addr-spec >)
  if (t_type == <) {
    // ungetToken();
    parse_Addr-spec();
    t_type = getToken();
    if (t_type != >) {
      syntax_error();
    }
    printf("Angle-addr -> < Addr-spec >");
  }
  else {
    syntax_error();
  }
}

Angle-addr → < Addr-spec >
FIRST(Angle-addr) = { < }
FIRST(Addr-spec) = { d-a-a, q-s-a }
parse_Addr-spec() {
  t_type = getToken();
  // Check FIRST(d-a-a Domain)
  if (t_type == d-a-a) {
    // ungetToken();
    parse_Domain();
    printf("Addr-spec -> d-a-a Domain");
  }
  // Check FIRST(q-s-a Domain)
  else if (t_type == q-s-a) {
    parse_Domain();
    printf("Addr-spec -> q-s-a Domain");
  }
  else {
    syntax_error();
  }
}

Addr-spec → d-a-a Domain | q-s-a Domain
FIRST(Addr-spec) = { d-a-a, q-s-a }
FIRST(Domain) = { d-a }
parse_Domain() {
  t_type = getToken();
  // Check FIRST(d-a)
  if (t_type == d-a) {
    printf("Domain -> d-a");
  }
  else {
    syntax_error();
  }
}

Domain → d-a
FIRST(Domain) = { d-a }
parse_Word() {
  t_type = getToken();
  // Check FIRST(atom)
  if (t_type == atom) {
    printf("Word -> atom");
  }
  // Check FIRST(q-s)
  else if (t_type == q-s) {
    printf("Word -> q-s");
  }
  else {
    syntax_error();
  }
}
Predictive Recursive Descent Parsers
• For every non-terminal A in the grammar, create a function called parse_A
• For each production rule A → α (where α is a sequence of terminals and non-terminals), if getToken() ∈ FIRST(α) then choose the production rule A → α
  - For every terminal and non-terminal a in α, if a is a non-terminal call parse_a, if a is a terminal check that getToken() == a
  - If ε ∈ FIRST(α), then check that getToken() ∈ FOLLOW(A)
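Putting this recipe together, a compilable C sketch of the simplified email-address parser might look like the following. The `Token` enum, the token-array input, the `parse_email` wrapper, and returning 0 instead of calling syntax_error() are all assumptions made so the example is self-contained; hyphens in the grammar names become underscores because C identifiers cannot contain hyphens.

```c
#include <stddef.h>

/* Hypothetical token kinds for <, >, atom, q-s, d-a, d-a-a, q-s-a, end. */
typedef enum { LT, GT, ATOM, QS, DA, DAA, QSA, END } Token;

static const Token *toks;   /* token stream, must be terminated by END */
static size_t pos;          /* index of the next unread token */

static Token getToken(void)   { return toks[pos++]; }
static void  ungetToken(void) { pos--; }

static int parse_Domain(void) {             /* Domain -> d-a */
    return getToken() == DA;
}

static int parse_Addr_spec(void) {          /* Addr-spec -> d-a-a Domain | q-s-a Domain */
    Token t = getToken();
    if (t == DAA || t == QSA) return parse_Domain();
    return 0;
}

static int parse_Angle_addr(void) {         /* Angle-addr -> < Addr-spec > */
    if (getToken() != LT) return 0;
    if (!parse_Addr_spec()) return 0;
    return getToken() == GT;
}

static int parse_Word(void) {               /* Word -> atom | q-s */
    Token t = getToken();
    return t == ATOM || t == QS;
}

static int parse_Display_name_list(void) {  /* -> Word Display-name-list | epsilon */
    Token t = getToken();
    ungetToken();
    if (t == ATOM || t == QS)               /* FIRST(Word Display-name-list) */
        return parse_Word() && parse_Display_name_list();
    return t == LT;                         /* epsilon: next token must be in FOLLOW */
}

static int parse_Display_name(void) {       /* Display-name -> Word Display-name-list */
    return parse_Word() && parse_Display_name_list();
}

static int parse_Name_addr(void) {          /* Name-addr -> Display-name Angle-addr | Angle-addr */
    Token t = getToken();
    ungetToken();
    if (t == ATOM || t == QS) return parse_Display_name() && parse_Angle_addr();
    if (t == LT) return parse_Angle_addr();
    return 0;
}

static int parse_Address(void) {            /* Address -> Name-addr | Addr-spec */
    Token t = getToken();
    ungetToken();
    if (t == LT || t == ATOM || t == QS) return parse_Name_addr();
    if (t == DAA || t == QSA) return parse_Addr_spec();
    return 0;
}

/* Returns 1 if the whole token stream is one Address, 0 on a syntax error. */
int parse_email(const Token *tokens) {
    toks = tokens;
    pos = 0;
    return parse_Address() && getToken() == END;
}
```

Each function looks at one token to pick a production and never backtracks. Returning a status code rather than aborting is only a design choice that makes the sketch easy to test; a real parser would more likely report the error position.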