Compilation 0368-3133
Lecture 2: Lexical Analysis
Syntax Analysis (1): CFLs, CFGs, PDAs
Noam Rinetzky
What is a Compiler?
[Figure: a compiler translates source text (txt) in the source language into executable code (exe) in the target language]
Compiler vs. Interpreter
Toy compiler
The Real Anatomy of a Compiler
[Figure: compiler pipeline. Source text (txt) → Process text input → characters → Lexical Analysis → tokens → Syntax Analysis → AST → Sem. Analysis → Annotated AST → Intermediate code generation → IR → Intermediate code optimization → IR → Code generation → Symbolic Instructions (SI) → Target code optimization → SI → Machine code generation → MI → Write executable output → Executable code (exe)]
Lexical Analysis
Lexical analyzers are also known as "scanners"
Lexical Analysis: from Text to Tokens
[Figure: pipeline Lexical Analysis → Syntax Analysis → Sem. Analysis → Inter. Rep. → Code Gen.]
Source text (txt): x = b*b – 4*a*c
Token Stream: <ID,"x"> <EQ> <ID,"b"> <MULT> <ID,"b"> <MINUS> <INT,4> <MULT> <ID,"a"> <MULT> <ID,"c">
What does Lexical Analysis do?
• Scan the input
• Partition the text into a stream of tokens
  – Numbers
  – Identifiers
  – Keywords
  – Punctuation
• A token is
  – a "word" in the source language
  – "meaningful" to the syntactic analysis
• Tokens are usually represented as (kind, value)
• Defined using regular expressions*
What does Lexical Analysis do?
• Language: fully parenthesized expressions
Context free language:
  Expr → Num | LP Expr Op Expr RP
Regular languages:
  Num → Dig | Dig Num
  Dig → ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
  LP → ‘(’
  RP → ‘)’
  Op → ‘+’ | ‘*’
Input: ( ( 23 + 7 ) * 19 )
Tokens: LP LP Num Op Num RP Op Num RP
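A hand-written scanner for this small language might look roughly like the sketch below. It is only an illustration: the names `TokenKind`, `Token`, `next_token` and `tokenize` are hypothetical, blanks are assumed to be the only separators, and each token is represented as a (kind, value) pair.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds for the regular part of the grammar (names are illustrative). */
typedef enum { TOK_LP, TOK_RP, TOK_OP, TOK_NUM, TOK_EOF, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    int value;   /* Num: numeric value; Op and error: the character; else 0 */
} Token;

/* Reads one token starting at *p and advances *p past it. */
Token next_token(const char **p) {
    Token t = { TOK_EOF, 0 };
    while (**p == ' ') (*p)++;                 /* skip blanks */
    unsigned char c = (unsigned char)**p;
    if (c == '\0') return t;                   /* end of input */
    if (isdigit(c)) {                          /* Num -> Dig | Dig Num */
        t.kind = TOK_NUM;
        while (isdigit((unsigned char)**p)) {
            t.value = t.value * 10 + (**p - '0');
            (*p)++;
        }
        return t;
    }
    (*p)++;                                    /* single-character tokens */
    switch (c) {
    case '(': t.kind = TOK_LP; break;
    case ')': t.kind = TOK_RP; break;
    case '+': case '*': t.kind = TOK_OP; t.value = c; break;
    default:  t.kind = TOK_ERROR; t.value = c; break;
    }
    return t;
}

/* Fills out[0..max) with the tokens of src; returns the count, excluding EOF. */
size_t tokenize(const char *src, Token *out, size_t max) {
    assert(src != NULL && out != NULL);
    size_t n = 0;
    while (n < max) {
        Token t = next_token(&src);
        if (t.kind == TOK_EOF) break;
        out[n++] = t;
    }
    return n;
}
```

Calling `tokenize("( ( 23 + 7 ) * 19 )", toks, 16)` on a `Token toks[16]` array yields the nine tokens listed above, with the two-character lexeme 23 folded into a single Num token.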
From scanning to parsing
[Figure: program text "((23 + 7) * x)" → Lexical Analyzer → token stream LP LP Num OP Num RP OP Id RP → Parser (Grammar: Expr → ... | Id, Id → ‘a’ | ... | ‘z’) → either "syntax error" or, if valid, an Abstract Syntax Tree: Op(*) over Op(+) and Id(x), with Op(+) over Num(23) and Num(7)]
Some basic terminology
• Lexeme (aka symbol) - a series of letters separated from the rest of the program according to a convention (space, semicolon, comma, etc.)
• Pattern - a rule specifying a set of strings. Example: "an identifier is a string that starts with a letter and continues with letters and digits"
  – (Usually) a regular expression
• Token - a pair of (pattern, attributes)
Example
void match0(char *s) /* find a zero */
{
  if (!strncmp(s, "0.0", 3))
    return 0. ;
}
Token stream:
VOID ID(match0) LPAREN CHAR DEREF ID(s) RPAREN
LBRACE
IF LPAREN NOT ID(strncmp) LPAREN ID(s) COMMA STRING(0.0) COMMA NUM(3) RPAREN RPAREN
RETURN REAL(0.0) SEMI
RBRACE
EOF
Regular languages
• Formal languages
  – Σ = finite set of letters
  – Word = sequence of letters
  – Language = set of words
• Regular languages defined equivalently by
  – Regular expressions
  – Finite-state automata
Example: Integers w/o Leading Zeros
Digit = 1|2|…|9
Digit0 = 0|Digit
Pos = Digit Digit0*
Integer = 0 | Pos | -Pos
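The definitions above translate almost line by line into a hand-coded recognizer. The following is a minimal sketch, with the hypothetical helper names `is_digit`, `is_digit0` and `is_integer` mirroring Digit, Digit0 and Integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Digit = 1|2|...|9 */
static bool is_digit(char c)  { return c >= '1' && c <= '9'; }

/* Digit0 = 0|Digit */
static bool is_digit0(char c) { return c == '0' || is_digit(c); }

/* True iff s matches Integer = 0 | Pos | -Pos, where Pos = Digit Digit0*. */
bool is_integer(const char *s) {
    assert(s != NULL);
    if (s[0] == '0') return s[1] == '\0';   /* only the word "0" may start with 0 */
    if (*s == '-') s++;                     /* optional sign in front of Pos */
    if (!is_digit(*s)) return false;        /* Pos starts with 1..9 */
    s++;
    while (is_digit0(*s)) s++;              /* Digit0* */
    return *s == '\0';                      /* the whole word must be consumed */
}
```

With this sketch, "0", "42" and "-7" are accepted, while "007", "-0" and the empty word are rejected.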
Challenge: Ambiguity
• If = if
• Id = Letter (Letter | Digit)*
• "if" is a valid identifier… what should it be?
• "iffy" is also a valid identifier
• Solution
  – Longest matching token
  – Break ties using order of definitions…
    • Keywords should appear before identifiers
Building a Scanner – Take I
• Input: String
• Output: Sequence of tokens
Token nextToken() {
  int c;                      /* int, so that EOF can be represented */
loop:
  c = getchar();
  switch (c) {
    case ' ': goto loop;      /* skip blanks */
    case ';': return SemiColumn;
    case '+':
      c = getchar();          /* look one character ahead */
      switch (c) {
        case '+': return PlusPlus;
        case '=': return PlusEqual;
        default: ungetc(c, stdin); return Plus;
      }
    case '<': …
    case 'w': …
  }
}
There must be a better way!
A better way
• Automatically generate a scanner
• Define tokens using regular expressions
• Use finite-state automata for detection
Reg-exp vs. automata
• Regular expressions are declarative
  – Good for humans
  – Not "executable"
• Automata are operative
  – Define an algorithm for deciding whether a given word is in a regular language
  – Not a natural notation for humans
Overview
• Define tokens using regular expressions
• Construct a nondeterministic finite-state automaton (NFA) from regular expression
• Determinize the NFA into a deterministic finite-state automaton (DFA)
• DFA can be directly used to identify tokens
Automata theory: a bird's-eye view
Deterministic Automata (DFA)
• M = (Σ, Q, δ, q0, F)
  – Σ - alphabet
  – Q – finite set of states
  – q0 ∈ Q – initial state
  – F ⊆ Q – final states
  – δ : Q × Σ → Q - transition function
• For a word w, M reaches some state x
  – M accepts w if x ∈ F
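The five components of M map directly onto a small table-driven program. The sketch below is an assumption-laden illustration: the type `Dfa`, the function `dfa_accepts`, and the concrete automaton `CBA` (which accepts exactly the word cba over Σ = {a, b, c}) are hypothetical, and a missing transition is encoded as `STUCK` rather than by a total δ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_STATES  4
#define NUM_LETTERS 3            /* Sigma = {a, b, c} */
#define STUCK (-1)               /* encodes a missing transition */

typedef struct {
    int  delta[NUM_STATES][NUM_LETTERS]; /* delta[q][letter]: next state or STUCK */
    int  q0;                             /* initial state */
    bool final[NUM_STATES];              /* final[q] is true iff q is in F */
} Dfa;

/* Hypothetical DFA that accepts exactly the word "cba". */
static const Dfa CBA = {
    /*             a      b      c   */
    { /* q0 */ { STUCK, STUCK, 1     },
      /* q1 */ { STUCK, 2,     STUCK },
      /* q2 */ { 3,     STUCK, STUCK },
      /* q3 */ { STUCK, STUCK, STUCK } },
    0,
    { false, false, false, true }
};

/* Reads w left to right; accepts iff the state reached at the end is final. */
bool dfa_accepts(const Dfa *m, const char *w) {
    assert(m != NULL && w != NULL);
    int q = m->q0;
    for (; *w != '\0'; w++) {
        if (*w < 'a' || *w > 'c') return false;   /* letter outside Sigma */
        q = m->delta[q][*w - 'a'];
        if (q == STUCK) return false;             /* stuck state: reject */
    }
    return m->final[q];
}
```

For example, `dfa_accepts(&CBA, "cba")` holds, while `dfa_accepts(&CBA, "cbb")` fails because the second b has no transition.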
DFA in pictures
• An automaton is defined by states and transitions
[Figure: state diagram with a start state, an accepting state, and transitions labeled a; b,c; a,b; c; a,b,c]
Accepting Words
• Words are read left-to-right
• Missing transition = non-acceptance
  – "Stuck state"
[Figure, four animation steps: an automaton with transitions labeled a, b, c reads the word cba one letter at a time and accepts it]
Rejecting Words
• Words are read left-to-right
• Missing transition means non-acceptance
[Figure, two animation steps: the same automaton reads the word cbb and rejects it]
Non-deterministic Automata (NFA)
• M = (Σ, Q, δ, q0, F)
  – Σ - alphabet
  – Q – finite set of states
  – q0 ∈ Q – initial state
  – F ⊆ Q – final states
  – δ : Q × (Σ ∪ {ε}) → 2^Q - transition function
    • DFA: δ : Q × Σ → Q
• For a word w, M can reach a number of states X
  – M accepts w if X ∩ F ≠ {}
  – Possible: X = {}
• Possible ε-transitions
NFA
• Allow multiple transitions from given state labeled by same letter
[Figure: NFA with a start state and transitions labeled a, a, b, c, c, b]
Accepting words
• Maintain set of states
• Accept word if reached an accepting state
[Figure, four animation steps: the NFA reads the word cba while tracking the set of current states]
NFA+ε automata
• ε transitions can "fire" without reading the input
[Figure: NFA with a start state, transitions labeled a, b, c, and one ε transition]
NFA+ε run example
• Now ε transition can non-deterministically take place
• Word accepted
[Figure, six animation steps: the NFA+ε reads the word cba and accepts it]
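The "maintain set of states" idea, together with ε-transitions, can be sketched with bit sets. Everything named here is an assumption made for illustration: the types `StateSet` and `Nfa`, the functions `eps_closure` and `nfa_accepts`, and the example automaton `ENDS_IN_BA` (words over {a, b, c} that end in ba), which is not the automaton drawn on the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_STATES  4
#define N_LETTERS 3                       /* Sigma = {a, b, c} */

typedef uint32_t StateSet;                /* bit q is set iff state q is in the set */
#define BIT(q) ((StateSet)1u << (q))

typedef struct {
    StateSet delta[N_STATES][N_LETTERS];  /* delta[q][x]: states reachable on letter x */
    StateSet eps[N_STATES];               /* eps[q]: states reachable by one epsilon move */
    int      q0;                          /* initial state */
    StateSet final;                       /* the set F */
} Nfa;

/* Hypothetical NFA for "words over {a,b,c} that end in ba". */
static const Nfa ENDS_IN_BA = {
    /*            a       b                 c      */
    { /* 0 */ { BIT(0), BIT(0) | BIT(1), BIT(0) },
      /* 1 */ { BIT(2), 0,               0      },
      /* 2 */ { 0,      0,               0      },
      /* 3 */ { 0,      0,               0      } },
    { 0, 0, BIT(3), 0 },                  /* one epsilon move: 2 -> 3 */
    0,
    BIT(3)
};

/* Adds states reachable through epsilon moves until the set stops growing. */
static StateSet eps_closure(const Nfa *m, StateSet s) {
    StateSet prev;
    do {
        prev = s;
        for (int q = 0; q < N_STATES; q++)
            if (s & BIT(q)) s |= m->eps[q];
    } while (s != prev);
    return s;
}

/* Tracks the set X of reachable states; accepts iff X intersects F. */
bool nfa_accepts(const Nfa *m, const char *w) {
    assert(m != NULL && w != NULL);
    StateSet cur = eps_closure(m, BIT(m->q0));
    for (; *w != '\0'; w++) {
        if (*w < 'a' || *w > 'c') return false;   /* letter outside Sigma */
        StateSet next = 0;
        for (int q = 0; q < N_STATES; q++)
            if (cur & BIT(q)) next |= m->delta[q][*w - 'a'];
        cur = eps_closure(m, next);               /* may become the empty set */
    }
    return (cur & m->final) != 0;
}
```

On the word cba the tracked sets are {0}, {0}, {0,1}, {0,2,3}, so the word is accepted; on cbb the final set is {0,1} and the word is rejected.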
From regular expressions to NFA
• Step 1: assign expression names and obtain pure regular expressions R1…Rm
• Step 2: construct an NFA Mi for each regular expression Ri
• Step 3: combine all Mi into a single NFA
• Ambiguity resolution: prefer longest accepting word
From reg. exp. to automata
• Theorem: there is an algorithm to build an NFA+ε automaton for any regular expression
• Proof: by induction on the structure of the regular expression
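Basic constructs
• R = ε
• R = φ
• R = a
[Figure: one small automaton per basic construct, each with a start state; the automaton for R = a has a single transition labeled a]
Composition
• R = R1 | R2
[Figure: M1 and M2 side by side, joined by four ε-transitions]
• R = R1R2
[Figure: M1 followed by M2, joined by three ε-transitions]
Repetition
• R = R1*
[Figure: M1 wrapped by four ε-transitions]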
Naïve approach
• Try each automaton separately
• Given a word w:
  – Try M1(w)
  – Try M2(w)
  – …
  – Try Mn(w)
• Requires resetting after every attempt
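Actually, we combine automata
[Figure: combined NFA for the patterns a, abb, a*b+, abab. A new start state 0 has ε-transitions to states 1, 3, 7 and 9]
• a: 1 –a→ 2 (accepting)
• abb: 3 –a→ 4 –b→ 5 –b→ 6 (accepting)
• a*b+: 7 –a→ 7, 7 –b→ 8, 8 –b→ 8 (8 accepting)
• abab: 9 –a→ 10 –b→ 11 –a→ 12 –b→ 13 (accepting)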
Corresponding DFA
[Figure: DFA obtained by determinizing the combined NFA for a, abb, a*b+, abab. Its states are sets of NFA states: (0 1 3 7 9), (2 4 7 10), (7), (8), (5 8 11), (6 8), (12), (13); transitions are labeled a and b; accepting states are labeled with their pattern: a, a*b+, abb, abab]
Scanning with DFA
• Run until stuck
  – Remember last accepting state
• Go back to accepting state
• Return token
Ambiguity resolution
• Longest word
• Tie-breaker based on order of rules when words have same length
Examples
[Figure: the combined NFA and the corresponding DFA for a, abb, a*b+, abab, as above]
abaa: gets stuck after aba in state 12, backs up to state (5 8 11); pattern is a*b+, token is ab
Tokens: <a*b+, ab> <a,a> <a,a>
abba: stops after second b in (6 8); token is abb because it comes first in spec
Tokens: <abb, abb> <a,a>
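The two runs above can be reproduced with a small longest-match loop. This is a sketch under assumptions: the transition table was derived here by applying the subset construction to the combined NFA (it is not copied from the figure), and the names `next_state`, `pattern`, `Match` and `longest_match` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEAD = -1, N_DFA = 8 };

/* DFA states, named after the NFA-state sets they stand for. */
enum { S0139,    /* (0 1 3 7 9), start */
       S24710,   /* (2 4 7 10) */
       S7, S8,
       S5811,    /* (5 8 11) */
       S68,      /* (6 8) */
       S12, S13 };

/* next_state[s][0] is the move on 'a', next_state[s][1] the move on 'b'. */
static const int next_state[N_DFA][2] = {
    /* (0 1 3 7 9) */ { S24710, S8    },
    /* (2 4 7 10)  */ { S7,     S5811 },
    /* (7)         */ { S7,     S8    },
    /* (8)         */ { DEAD,   S8    },
    /* (5 8 11)    */ { S12,    S68   },
    /* (6 8)       */ { DEAD,   S8    },
    /* (12)        */ { DEAD,   S13   },
    /* (13)        */ { DEAD,   DEAD  },
};

/* Pattern reported by each accepting state; NULL means not accepting.
   (6 8) reports abb rather than a*b+ because abb comes first in the spec. */
static const char *const pattern[N_DFA] = {
    NULL, "a", NULL, "a*b+", "a*b+", "abb", NULL, "abab"
};

typedef struct { const char *pattern; size_t length; } Match;

/* Longest match at the start of s; length 0 means that no pattern matches. */
Match longest_match(const char *s) {
    assert(s != NULL);
    Match last = { NULL, 0 };
    int state = S0139;
    for (size_t i = 0; s[i] == 'a' || s[i] == 'b'; i++) {
        state = next_state[state][s[i] - 'a'];
        if (state == DEAD) break;                 /* stuck: stop reading */
        if (pattern[state] != NULL) {             /* remember last accepting state */
            last.pattern = pattern[state];
            last.length = i + 1;
        }
    }
    return last;   /* the caller resumes scanning right after the match */
}
```

On abaa the first call returns (a*b+, 2); resuming at the remaining aa returns (a, 1) twice. On abba the first call returns (abb, 3), followed by (a, 1).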
Summary of Construction
• Describe tokens as regular expressions
  – Decide attributes (values) to save for each token
• Regular expressions turned into a DFA
  – Also, records which attributes (values) to keep
• Lexical analyzer simulates the run of an automaton with the given transition table on any input string
A Few Remarks
• Turning an NFA to a DFA is expensive, but
  – Exponential in the worst case
  – In practice, works fine
• The construction is done once per-language
  – At compiler construction time
  – Not at compilation time
Implementation
Implementation by Example
if { return IF; }
[a-z][a-z0-9]* { return ID; }
[0-9]+ { return NUM; }
[0-9]"."[0-9]+|[0-9]*"."[0-9]+ { return REAL; }
(\-\-[a-z]*\n)|(" "|\n|\t) { ; }
. { error(); }
Example lexemes:
  IF: if
  ID: xy, i, zs98
  NUM: 3, 32, 032
  REAL: 0.55, 33.1
  w.s.: --comm\n, \n, \t, " "
[Figure: DFA for this specification; its final states are labeled IF, ID, NUM, REAL, w.s. and error]
int edges[][256] = {
               /* …, 0, 1, 2, 3, ..., -, e, f, g, h, i, j, ... */
/* state 0 */  {0, …, 0, 0, …, 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0},
/* state 1 */  {13, …, 7, 7, 7, 7, …, 9, 4, 4, 4, 4, 2, 4, …, 13, 13},
/* state 2 */  {0, …, 4, 4, 4, 4, ..., 0, 4, 3, 4, 4, 4, 4, …, 0, 0},
/* state 3 */  {0, …, 4, 4, 4, 4, …, 0, 4, 4, 4, 4, 4, 4, …, 0, 0},
/* state 4 */  {0, …, 4, 4, 4, 4, …, 0, 4, 4, 4, 4, 4, 4, …, 0, 0},
/* state 5 */  {0, …, 6, 6, 6, 6, …, 0, 0, 0, 0, 0, 0, 0, …, 0, 0},
/* state 6 */  {0, …, 6, 6, 6, 6, …, 0, 0, 0, 0, 0, 0, 0, ..., 0, 0},
/* state 7 */
/* state … */  ...
/* state 13 */ {0, …, 0, 0, 0, 0, …, 0, 0, 0, 0, 0, 0, 0, …, 0, 0}
};
Pseudo Code for Scanner
char* input = … ;
Token nextToken() {
  lastFinal = 0;
  currentState = 1;
  inputPositionAtLastFinal = input;
  currentPosition = input;
  while (not(isDead(currentState))) {
    nextState = edges[currentState][*currentPosition];
    if (isFinal(nextState)) {
      lastFinal = nextState;
      inputPositionAtLastFinal = currentPosition;
    }
    currentState = nextState;
    advance currentPosition;
  }
  input = inputPositionAtLastFinal + 1;
  return action[lastFinal];
}
Example
Input: "if  --not-a-com" (2 blanks between "if" and "--not-a-com")

final  state  input
0      1      if  --not-a-com
2      2      if  --not-a-com
3      3      if  --not-a-com
3      0      if  --not-a-com
return IF

final  state  input
0      1      --not-a-com
12     12     --not-a-com
12     12     --not-a-com
12     0      --not-a-com
found whitespace

final  state  input
0      1      --not-a-com
9      9      --not-a-com
9      10     --not-a-com
9      10     --not-a-com
9      10     --not-a-com
9      0      --not-a-com
error

final  state  input
0      1      -not-a-com
9      9      -not-a-com
9      0      -not-a-com
9      0      -not-a-com
9      0      -not-a-com
error
Concluding remarks
• Efficient scanner
• Minimization
• Error handling
• Automatic creation of lexical analyzers
Efficient Scanners
• Efficient state representation
• Input buffering
• Using switch and gotos instead of tables
Minimization
• Create a non-deterministic automaton (NDFA) from every regular expression
• Merge all the automata using epsilon moves (like the | construction)
• Construct a deterministic finite automaton (DFA)
  – State priority
• Minimize the automaton
  – separate accepting states by token kinds
Example
if { return IF; }
[a-z][a-z0-9]* { return ID; }
[0-9]+ { return NUM; }
[Figure, four animation steps: automata for this specification, with final states labeled ID, IF, NUM and error]
Modern Compiler Implementation in ML, Andrew Appel, (c)1998, Figures 2.7, 2.8
Error Handling
• Many errors cannot be identified at this stage
• Example: "fi (a==f(x))". Should "fi" be "if"? Or is it a routine name?
  – We will discover this later in the analysis
  – At this point, we just create an identifier token
• Sometimes the lexeme does not match any pattern
  – Easiest: eliminate letters until the beginning of a legitimate lexeme
  – Alternatives: eliminate/add/replace one letter, replace order of two adjacent letters, etc.
• Goal: allow the compilation to continue
• Problem: errors that spread all over
Automatically generated scanners
• Use of Program-Generating Tools
  – Specification ⇒ Part of compiler
  – Compiler-Compiler
[Figure: regular expressions → JFlex → scanner; input program → scanner → stream of tokens]
Use of Program-Generating Tools
• Input: regular expressions and actions
  – Action = Java code
• Output: a scanner program that
  – Produces a stream of tokens
  – Invokes actions when pattern is matched
Line Counting Example
• Create a program that counts the number of lines in a given input text file
Creating a Scanner using Flex
%option noyywrap
        int num_lines = 0;
%%
\n      ++num_lines;
.       ;
%%
int main(void) {
  yylex();
  printf("# of lines = %d\n", num_lines);
  return 0;
}
[Figure: automaton for this scanner, with states labeled initial, newline and other, and transitions on \n and ^\n]
JFlex Spec File
User code: Copied directly to Java file
  – Possible source of javac errors down the road
%%
JFlex directives: macros, state names
  DIGIT= [0-9]
  LETTER= [a-zA-Z]
  YYINITIAL
%%
Lexical analysis rules:
  – Optional state, regular expression, action
  – How to break input to tokens
  – Action when token matched
  {LETTER} ({LETTER}|{DIGIT})*
Creating a Scanner using JFlex
import java_cup.runtime.*;
%%
%cup
%{
  private int lineCounter = 0;
%}
%eofval{
  System.out.println("line number=" + lineCounter);
  return new Symbol(sym.EOF);
%eofval}
NEWLINE=\n
%%
{NEWLINE} { lineCounter++; }
[^\n] { }
Catching errors
• What if input doesn't match any token definition?
• Trick: Add a "catch-all" rule that matches any character and reports an error
  – Add after all other rules
A JFlex specification of C Scanner
import java_cup.runtime.*;
%%
%cup
%{
  private int lineCounter = 0;
%}
Letter= [a-zA-Z_]
Digit= [0-9]
%%
"\t" { }
"\n" { lineCounter++; }
";" { return new Symbol(sym.SemiColumn); }
"++" { return new Symbol(sym.PlusPlus); }
"+=" { return new Symbol(sym.PlusEq); }
"+" { return new Symbol(sym.Plus); }
"while" { return new Symbol(sym.While); }
{Letter}({Letter}|{Digit})* { return new Symbol(sym.Id, yytext()); }
"<=" { return new Symbol(sym.LessOrEqual); }
"<" { return new Symbol(sym.LessThan); }
Missing
• Creating a lexical analysis by hand
• Table compression
• Symbol Tables
• Nested Comments
• Handling Macros
Lexical Analysis: What
• Input: program text (file)
• Output: sequence of tokens
Lexical Analysis: How
• Define tokens using regular expressions
• Construct a nondeterministic finite-state automaton (NFA) from regular expression
• Determinize the NFA into a deterministic finite-state automaton (DFA)
• DFA can be directly used to identify tokens
Lexical Analysis: Why
• Read input file
• Identify language keywords and standard identifiers
• Handle include files and macros
• Count line numbers
• Remove whitespaces
• Report illegal symbols
• [Produce symbol table]
Syntax Analysis (1)
Context Free Languages
Context Free Grammars
Pushdown Automata
The Real Anatomy of a Compiler
[Figure: the same compiler pipeline as before, from source text (txt) through Lexical Analysis and Syntax Analysis to executable code (exe)]
Frontend: Scanning & Parsing
[Figure: program text "((23 + 7) * x)" → Lexical Analyzer → token stream LP LP Num OP Num RP OP Id RP → Parser (Grammar: E → ... | Id, Id → ‘a’ | ... | ‘z’) → either "syntax error" or, if valid, an Abstract Syntax Tree: Op(*) over Op(+) and Id(x), with Op(+) over Num(23) and Num(7)]
Parsing
• Construct a structured representation of the input
• Challenges
  – How do you describe the programming language?
  – How do you check validity of an input?
    • Is a sequence of tokens a valid program in the language?
  – How do you construct the structured representation?
  – Where do you report an error?
Some foundations
Context free languages (CFLs)
• L01 = { 0^n 1^n | n > 0 }
• Lpalindrome = { pp' | p ∊ Σ*, p' = reverse(p) }
• Lpalindrome# = { p#p' | p ∊ Σ*, p' = reverse(p), # ∉ Σ }
Context free grammars (CFG)
G = (V,T,P,S)
• V – non terminals (syntactic variables)
• T – terminals (tokens)
• P – derivation rules
  – Each rule of the form V → (T ∪ V)*
• S – start symbol
What can CFGs do?
• Recognize CFLs
• S → 0T1
• T → 0T1 | ε
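One way to see that this grammar recognizes { 0^n 1^n | n > 0 } is to turn each rule into a function. The sketch below is an illustration only; the function names `parse_S`, `parse_T` and `derives` are hypothetical, and the choice between T → 0T1 and T → ε is made by looking at the next input letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* T -> 0 T 1 | epsilon: consumes a (possibly empty) block of the form 0^k 1^k. */
static bool parse_T(const char **p) {
    if (**p != '0') return true;      /* T -> epsilon */
    (*p)++;                           /* match '0' */
    if (!parse_T(p)) return false;
    if (**p != '1') return false;
    (*p)++;                           /* match '1' */
    return true;
}

/* S -> 0 T 1 */
static bool parse_S(const char **p) {
    if (**p != '0') return false;
    (*p)++;                           /* match '0' */
    if (!parse_T(p)) return false;
    if (**p != '1') return false;
    (*p)++;                           /* match '1' */
    return true;
}

/* True iff S derives the whole word w. */
bool derives(const char *w) {
    assert(w != NULL);
    return parse_S(&w) && *w == '\0';
}
```

Here `derives("000111")` holds, while `derives("")`, `derives("0101")` and `derives("00111")` do not.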
Recognizing CFLs
• Context Free Grammars (CFG)
• Nondeterministic push down automata (PDA)
(CFG ~ PDA in language-defining power)
Pushdown Automata (PDA)
• Nondeterministic PDAs define all CFLs
• Deterministic PDAs model parsers.
  – Most programming languages have a deterministic PDA
  – Efficient implementation
Intuition: PDA
• An ε-NFA with the additional power to manipulate one stack
[Figure: a control unit (ε-NFA) reads the input "if (oops) then stat:= blah else abort" at the position marked Current and manipulates a stack holding X, Y, IF, $ (Top marks the top of the stack)]
• PDA moves are determined by:
  – The current state (of its "ε-NFA")
  – The current input symbol (or ε)
  – The current symbol on top of its stack
• Moves:
  – Change state
  – Replace the top symbol by 0…n symbols
    • 0 symbols = "pop" ("reduce")
    • more than 0 symbols = sequence of "pushes" ("shift")
• Nondeterministic choice of next move
PDA Formalism
• PDA = (Q, Σ, Γ, δ, q0, $, F):
  – Q: finite set of states
  – Σ: Input symbols alphabet
  – Γ: stack symbols alphabet
  – δ: transition function
  – q0: start state
  – $: start symbol
  – F: set of final states
The Transition Function
• δ(q, a, X) = { (p1, σ1), … , (pk, σk) }
  – Input: triplet
    • A state q ∊ Q
    • An input symbol a ∊ Σ or ε
    • A stack symbol X ∊ Γ
  – Output: set of 0 … k actions of the form (p, σ)
    • A state p ∊ Q
    • σ a sequence X1⋯Xn ∊ Γ* of stack symbols
Actions of the PDA
• Say (p, σ) ∊ δ(q, a, X)
  – If the PDA is in state q and X is the top symbol and a is at the front of the input
  – Then it can
    • Change the state to p.
    • Remove a from the front of the input (but a may be ε).
    • Replace X on the top of the stack by σ.
Example: Deterministic PDA
• Design a PDA to accept {0^n 1^n | n ≥ 1}.
• The states:
  – q = We have not seen 1 so far (start state)
  – p = we have seen at least one 1 and no 0s since
  – f = final state; accept.
Example: Stack Symbols
• $ = start symbol.
  – Also marks the bottom of the stack,
  – Indicates when we have counted the same number of 1's as 0's.
• X = "counter"
  – used to count the number of 0s we saw
Example: Transitions
• δ(q, 0, $) = {(q, X$)}.
• δ(q, 0, X) = {(q, XX)}.
  – These two rules cause one X to be pushed onto the stack for each 0 read from the input.
• δ(q, 1, X) = {(p, ε)}.
  – When we see a 1, go to state p and pop one X.
• δ(p, 1, X) = {(p, ε)}.
  – Pop one X per 1.
• δ(p, ε, $) = {(f, $)}.
  – Accept at bottom.
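The five transitions can be simulated with an explicit stack. This is a minimal sketch, not a general PDA interpreter: the names `State` and `pda_accepts` are hypothetical, the transitions are hard-coded, and acceptance is taken to mean reaching f with the whole input consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum { STATE_Q, STATE_P, STATE_F } State;   /* q, p, f from the example */

/* Runs the example PDA on w; the stack holds '$' at the bottom and 'X' counters. */
bool pda_accepts(const char *w) {
    assert(w != NULL);
    /* At most one push per input letter, plus the start symbol. */
    char *stack = malloc(strlen(w) + 1);
    if (stack == NULL) return false;
    size_t top = 0;
    stack[top] = '$';                     /* stack[top] is the top symbol */
    State state = STATE_Q;
    bool stuck = false;

    while (*w != '\0' && !stuck) {
        char a = *w, X = stack[top];
        if (state == STATE_Q && a == '0' && (X == '$' || X == 'X')) {
            stack[++top] = 'X';           /* delta(q,0,$) and delta(q,0,X): push X */
        } else if ((state == STATE_Q || state == STATE_P) && a == '1' && X == 'X') {
            top--;                        /* delta(q,1,X) and delta(p,1,X): pop X */
            state = STATE_P;
        } else {
            stuck = true;                 /* no transition is defined */
        }
        if (!stuck) w++;                  /* consume the input letter */
    }
    /* delta(p, epsilon, $) = {(f, $)}: move to f without reading input. */
    if (!stuck && state == STATE_P && stack[top] == '$') state = STATE_F;
    free(stack);
    return !stuck && state == STATE_F;
}
```

On 000111 the simulated stack grows to XXX$ while reading the 0s, shrinks back to $ while reading the 1s, and the final ε-move reaches f.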
Actions of the Example PDA
Input: 0 0 0 1 1 1

state  remaining input  stack (top first)
q      0 0 0 1 1 1      $
q      0 0 1 1 1        X $
q      0 1 1 1          X X $
q      1 1 1            X X X $
p      1 1              X X $
p      1                X $
p      (none)           $
f      (none)           $
Example: Non Deterministic PDA
• A PDA that accepts palindromes
  – L = { pp' ∊ Σ* | p' = reverse(p) }
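A nondeterministic PDA for this language has to guess where p ends and p' begins. A deterministic program can imitate that guess by trying every possible midpoint, as in the sketch below. The names `run_with_guess` and `accepts_pp_reverse` are hypothetical, and the stack is modeled implicitly by indexing back into the already-read prefix rather than by an explicit array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One guess: the PDA pushed w[0..mid) and now pops one symbol per remaining
   letter, requiring each popped symbol to equal the letter being read. */
static bool run_with_guess(const char *w, size_t len, size_t mid) {
    if (len - mid != mid) return false;          /* stack would not empty exactly */
    for (size_t i = 0; i < mid; i++)
        if (w[mid + i] != w[mid - 1 - i])        /* top of stack is w[mid-1-i] */
            return false;
    return true;
}

/* Accepts iff some guess of the midpoint leads to acceptance. */
bool accepts_pp_reverse(const char *w) {
    assert(w != NULL);
    size_t len = strlen(w);
    for (size_t mid = 0; mid <= len; mid++)      /* try every nondeterministic choice */
        if (run_with_guess(w, len, mid)) return true;
    return false;
}
```

Under this sketch abba and the empty word are accepted, while aba is rejected because it is not of the form pp'.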