22
CSC 3130: Automata theory and formal languages Parsers for programming languages MELJUN P. CORTES, MELJUN P. CORTES, MBA,MPA,BSCS,ACS MBA,MPA,BSCS,ACS MELJUN CORTES MELJUN CORTES

MELJUN CORTES automata12

Embed Size (px)

Citation preview

CSC 3130: Automata theory and formal languages

Parsers for programming languages

MELJUN P. CORTES, MELJUN P. CORTES, MBA,MPA,BSCS,ACSMBA,MPA,BSCS,ACS

MELJUN CORTESMELJUN CORTES

CFG of the java programming languageIdentifier:

IDENTIFIER

QualifiedIdentifier:Identifier { . Identifier }

Literal:IntegerLiteral FloatingPointLiteral CharacterLiteral StringLiteral BooleanLiteralNullLiteral

Expression: Expression1 [AssignmentOperator Expression1]]

AssignmentOperator: = += -= *= /= &= |=

from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996

Parsing java programs

class Point2d { /* The X and Y coordinates of the point--instance variables */ private double x; private double y; private boolean debug; // A trick to help with debugging

public Point2d (double px, double py) { // Constructorx = px;y = py;

debug = false; // turn off debugging }

public Point2d () { // Default constructorthis (0.0, 0.0); // Invokes 2 parameter Point2D constructor

} // Note that a this() invocation must be the BEGINNING of // statement body of constructor

public Point2d (Point2d pt) { // Another consructorx = pt.getX();y = pt.getY();

}

}

Simple java program: about 1000 symbols

Parsing algorithms

• How long would it take to parse this?

• Can we parse faster?

• No! CYK is the fastest known general-purposeparsing algorithm

exhaustive algorithm about 1080 years(longer than life of universe)

CYK algorithm about 1 week!

Another way of thinking

Scientist: Find an algorithm thatcan parse strings inany grammar

Engineer: Design your grammar so it has a very fastparsing algorithm

An example

S → Tc(1)

T → TA(2) | A(3)

A → aTb(4) | ab(5)

input: abaabbc

Stack Input

εaabATTaTaaTaabTaATaTTaTbTATTcS

abaabbcbaabbcaabbcaabbcaabbcabbcbbcbcbcbccccεε

Action

shiftshiftreduce (5)reduce (3)shiftshiftshiftreduce (5)reduce (3)shiftreduce (4)reduce (2)shiftreduce (1) aa bb

A

a b

A

c

TT

T

A

S

Items

S → •TcS → T•cS → Tc•

T → •TAT → T•AT → TA•

T → •AT → A•

A → •aTbA → a•TbA → aT•bA → aTb•

A → •abA → a•bA → ab•

S → Tc(1) T → A(3)T → TA(2) A → aTb(4) A → ab(5)

Stack Input

εaabATTa

abaabbcbaabbcaabbcaabbcaabbcabbc

Action

shiftshiftreduce (5)reduce (3)shiftshift

••••••

Idea of parsing algorithm:Try to match complete items to top of stack

Some terminology

S → Tc(1)

T → TA(2) | A(3)

A → aTb(4) | ab(5)

input: abaabbc

Stack Input

εaabATTaTaaTaabTaATaTTaTbTATTcS

abaabbcbaabbcaabbcaabbcaabbcabbcbbcbcbcbccccεε

Action

shiftshiftreduce (5)reduce (3)shift shiftshiftreduce (5)reduce (3)shiftreduce (4)reduce (2)shiftreduce (1)

handle

valid items:a•Tb, a•b

valid items:T•a, T•c, aT•b

Outline of LR(0) parsing algorithm

• As the string is being read, it is pushed on a stack

• Algorithm keeps track of all valid items

• Algorithm can perform two actions:

no complete itemis viable

shift reduce

there is one valid item,and it is complete

Running the algorithm

Stack Input

S

S

SRSR

εa

aa

aabaAaAbA

aabbabb

bb

bbεε

A Valid Items

A → •aAb A → •ab A → a•Ab A → a•bA → •aAb A → •ab A → a•Ab A → a•bA → •aAb A → •abA → ab•A → aA•bA → aAb•

A → aAb | ab A ⇒ aAb ⇒ aabb

Running the algorithm

Stack Input

S

S

SRSR

εa

aa

aabaAaAbA

aabbabb

bb

bbεε

A Valid Items

A → •aAb A → •ab A → a•Ab A → a•bA → •aAb A → •ab A → a•Ab A → a•bA → •aAb A → •abA → ab•A → aA•bA → aAb•

A → aAb | ab A ⇒ aAb ⇒ aabb

How to update viable items

• Initial set of valid items

• Updating valid items on “shift b”

– After these updates, for every valid item A → α•Cβ andproduction C → •δ, we also add

as a valid item

S → •α for every production S → α

A → α•bβ A → αb•βis updated to

A → α•Xβ disappears if X ≠ b

C → •δa, b: terminalsA, B: variablesX, Y: mixed symbolsα, β: mixed strings

notation

How to update viable items

• Updating valid items on “reduce β to B”– First, we backtrack to viable items before reduce

– Then, we apply same rules as for “shift B” (as if B were a terminal)

A → α•Bβ A → αB•βis updated to

A → α•Xβ disappears if X ≠ B

C → •δ is added for every valid item A → α•Cβ and production C → •δ

Viable item updates by εNFA

• States of εNFA will be items (plus a start state q0)

• For every item S → •α we have a transition

• For every item A → α•Xβ we have a transition

• For every item A → α•Cβ and production C → •δ

S → •αq0ε

A → αX•βXA → α•Xβ

C → •δεA → α•Cβ

Example

A → aAb | ab

A → •aAb A → a•Ab A → aA•b

A → aAb•

A → •ab A → a•b A → ab•

q0

ε

ε

ε

ε

a

a b

b

A

Convert εNFA to DFA

A → •aAbA→ •ab

A → a•AbA → a•bA → •aAbA → •ab

A → aA•b

A → aAb•

A → ab•

a

b

bAa

1

2

3

4

5

states correspond to sets of valid itemstransitions are labeled by variables / terminals

die

Attempt at parsing with DFA

Stack Input

S

S

SR

εa

aa

aabaA

aabbabb

bb

bb

A DFA state

A → •aAb A → •ab A → a•Ab A → a•bA → •aAb A → •ab A → a•Ab A → a•bA → •aAb A → •abA → ab•A → aA•b

A → aAb | ab A ⇒ aAb ⇒ aabb

12

2

3?

Remember the state in stack!

Stack Input

S

S

SRSR

11a2

1a2a2

1a2a2b31a2A41a2A4b51A

aabbabb

bb

bbεε

A DFA state

A → •aAb A → •ab A → a•Ab A → a•bA → •aAb A → •ab A → a•Ab A → a•bA → •aAb A → •abA → ab•A → aA•bA → aAb•

A → aAb | ab A ⇒ aAb ⇒ aabb

12

2

345

LR(0) grammars and deterministic PDAs

• The parsing procedure can be implemented by adeterministic pushdown automaton

• A PDA is deterministic if in every state there is atmost one possible transition – for every input symbol and pop symbol, including ε

• Example: PDA for w#wR is deterministic, but PDA forwwR is not

LR(0) grammars and deterministic PDAs

• Not every PDA can be made deterministic

• Since PDAs are equivalent to CFLs, LR(0) parsing algorithm must fail for some CFLs!

• When does LR(0) parsing algorithm fail?

Outline of LR(0) parsing algorithm

• Algorithm can perform two actions:

• What if:

no complete itemis valid

there is one valid item,and it is complete

shift (S) reduce (R)

some valid itemscomplete, some not

more than one validcomplete item

S / R conflict R / R conflict

context-free grammarsparse using CYK algorithm (slow)

LR(∞) grammars

Hierarchy of context-free grammars

LR(1) grammars

LR(0) grammarsparse using LR(0) algorithm

javaperl

python…

to be continued…