UNIT - II
Grammar Formalism:
Chomsky hierarchy of languages
Context free grammar
Derivation trees and sentential forms
Rightmost and leftmost derivation of strings
Ambiguity in context free grammars
Minimization of Context Free Grammars
Chomsky normal form
Greibach normal form
Pushdown Automata:
Pushdown automata
Definition
Model
Acceptance of CFL
Acceptance by final state
Acceptance by empty stack and its equivalence
Equivalence of CFL and PDA
Grammar:
A grammar is a 4-tuple: G = (T, N, P, S)
T -> set of terminals
N -> set of non-terminals
S -> starting symbol
P -> production rules of the form
α → β where α, β ∈ (N ∪ T)*
Depending on the production rules, grammars are classified into 4 types:
i) Unrestricted or Type 0 Grammar: the production rules are of the form
α → β where α, β ∈ (N ∪ T)* and α contains at least one non-terminal
ii) Context Sensitive or Type 1 Grammar: the production rules are of the form
α → β where α, β ∈ (N ∪ T)* and |α| ≤ |β|
iii) Context Free or Type 2 Grammar: the production rules are of the form
A → α where α ∈ (N ∪ T)* and A ∈ N
iv) Regular or Type 3 Grammar: the left side is a single non-terminal and the
right side contains at most one non-terminal. Ex: A → a, A → Ba, A → aB
The Chomsky Hierarchy

Type | Language             | Grammar                      | Automaton
0    | Partially Computable | Unrestricted                 | DTM / NTM
1    | Context Sensitive    | Context Sensitive            | Linearly Bounded Automaton
2    | Context Free         | Context Free                 | NPDA
3    | Regular              | Right-regular / left-regular | DFA, NFA

Each level strictly contains the next: Type 0 ⊃ Type 1 ⊃ Type 2 ⊃ Type 3.
Context-Free Grammar

The syntax of a programming language is specified using a Context Free Grammar (CFG).
A CFG can be defined as G = (T, N, P, S)
where T -> set of terminals
N -> set of non-terminals
S -> starting symbol
P -> production rules of the form
A → α where A ∈ N, α ∈ (N ∪ T)*

Notational Conventions:
1. Terminals:
   i) Lowercase letters early in the alphabet, such as a, b, c
   ii) Digits and special characters, such as +, -, {, (
2. Non-terminals:
   i) Uppercase letters early in the alphabet, like A, B, C
   ii) Lowercase italic names, such as exp, stmt, …
3. Uppercase letters late in the alphabet (X, Y, Z) are used to represent grammar symbols, i.e.
   either terminals or non-terminals.
4. Lowercase Greek letters (α, β, γ) are used to represent strings of grammar symbols.
Context-Free Languages

Given a context-free grammar G = (T, N, P, S), the language generated or derived
from G is the set: L(G) = { w ∈ T* : S ⇒* w }. A language L is context-free if there
is a context-free grammar G = (T, N, P, S) such that L is generated from G.
• Context-free grammars are more expressive than finite automata: if a language L is
accepted by a finite automaton, then L can be generated by a context-free grammar.
• The converse is NOT true.
• Derivation
– Based on the grammar, derivations can be made.
– The purpose of a grammar is to derive strings in the language defined by the grammar.
– α ⇒ β : β can be derived from α in one step
– α ⇒+ β : β derived from α in one or more steps
– α ⇒* β : β derived from α in any number of steps (including zero)
– ⇒lm : leftmost derivation
• Always substitute the leftmost non-terminal
– ⇒rm : rightmost derivation
• Always substitute the rightmost non-terminal
• Example CFG:
G = ({S}, {0, 1}, P, S)
P:
(1) S → 0S1
(2) S → ε
(or, written compactly, S → 0S1 | ε)
• Example Derivations:
S ⇒ ε          (2)

S ⇒ 0S1        (1)
  ⇒ 01         (2)

S ⇒ 0S1        (1)
  ⇒ 00S11      (1)
  ⇒ 000S111    (1)
  ⇒ 000111     (2)
• Note that G "generates" the language {0^k 1^k | k ≥ 0}
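The derivations above can be reproduced mechanically. The sketch below (an illustration, not part of the slides) expands S → 0S1 | ε breadth-first and collects every terminal string up to a length bound; each sentential form contains at most one S, so a single `replace` suffices:

```python
# Sketch: enumerate the short strings derived from S -> 0S1 | epsilon
# and confirm they are exactly {0^k 1^k}.
from collections import deque

def derive(max_len):
    """Breadth-first expansion of sentential forms of S -> 0S1 | ''."""
    results = set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if "S" not in form:
            results.add(form)
            continue
        if len(form) - 1 > max_len:   # even after dropping S, too long
            continue
        # apply both productions to the (single) occurrence of S
        queue.append(form.replace("S", "0S1", 1))
        queue.append(form.replace("S", "", 1))
    return {w for w in results if len(w) <= max_len}

print(derive(6) == {"", "01", "0011", "000111"})  # True
```

Each application of rule (1) adds one 0 and one 1, so the counts always match, which is exactly why L(G) = {0^k 1^k | k ≥ 0}.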
• Example CFG:
G = ({S, A, B, C}, {a, b, c}, P, S)
P:
(1) S → ABC
(2) A → aA   (3) A → ε   (i.e. A → aA | ε)
(4) B → bB   (5) B → ε   (i.e. B → bB | ε)
(6) C → cC   (7) C → ε   (i.e. C → cC | ε)
• Example Derivations:
S ⇒ ABC      (1)
  ⇒ BC       (3)
  ⇒ C        (5)
  ⇒ ε        (7)

S ⇒ ABC      (1)
  ⇒ aABC     (2)
  ⇒ aaABC    (2)
  ⇒ aaBC     (3)
  ⇒ aabBC    (4)
  ⇒ aabC     (5)
  ⇒ aabcC    (6)
  ⇒ aabc     (7)
• Note that G generates the language a*b*c*
Sentential Form

A sentential form α may contain both terminals and non-terminals, and may be empty.
A sentence of G is a sentential form containing no non-terminals.
The language generated by a grammar is its set of sentences:
L(G) – the language generated by G
A string of terminals w is in L(G) iff w is a sentence of G, i.e. S ⇒* w.
The following CFG is for simple arithmetic expressions:
E → E op E | ( E ) | id
op → + | - | * | / | %
From the above production rules: T = { (, ), id, +, -, *, /, % }, N = { E, op }, and S = E.
Derivation (Parse) Tree of a Context-Free Grammar
► Represents the derivation using an ordered rooted tree.
► The root represents the starting symbol.
► Internal vertices represent the non-terminal symbols that arise in the production.
► Leaves represent the terminal symbols.
► If the production A → w arises in the derivation, where w is a word, the vertex that represents
A has as children vertices that represent each symbol of w, in order from left to right.

• Example: Let G = ({S, A}, {a, b}, S, {S → aA, S → b, A → aa}). What is L(G)?
• Draw a tree of all possible derivations.
– We have: S ⇒ aA ⇒ aaa
– and S ⇒ b.
• Answer: L = {aaa, b}.

      S          S
     / \         |
    a   A        b
       / \
      a   a

(Examples of a derivation tree, also called a parse tree or sentence diagram.)
Leftmost, Rightmost Derivations

- A leftmost derivation of a sentential form is one in which rules transforming the leftmost non-terminal are always applied.
- A rightmost derivation of a sentential form is one in which rules transforming the rightmost non-terminal are always applied.

Example grammar:
S → A | AB
A → ε | a | Ab | AA
B → b | bc | Bc | bB

Sample derivations of aabb:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb

        S
       / \
      A   B
     / \  / \
    A   A b  B
    |   |    |
    a   a    b

These two derivations are special.
The 1st derivation is leftmost: it always picks the leftmost variable.
The 2nd derivation is rightmost: it always picks the rightmost variable.
Both correspond to the same parse tree above.
Ambiguity in Context-Free Grammars

A context-free grammar G is ambiguous if some string w ∈ L(G) has two or more derivation trees.

The grammar E → E+E | E*E | (E) | a is ambiguous: the string a+a*a has two leftmost derivations.

E ⇒ E+E ⇒ a+E ⇒ a+E*E ⇒ a+a*E ⇒ a+a*a

E ⇒ E*E ⇒ E+E*E ⇒ a+E*E ⇒ a+a*E ⇒ a+a*a

      E                    E
    / | \                / | \
   E  +  E              E  *  E
   |   / | \          / | \   |
   a  E  *  E        E  +  E  a
      |     |        |     |
      a     a        a     a

Take a = 2: the first tree evaluates 2+(2*2) = 6, the second (2+2)*2 = 8, so the two trees give different meanings.
Rewrite Ambiguous Grammar

• Try to use a single recursive non-terminal in each rule
– When the left symbol appears more than once on the right side
– Use additional symbols to substitute them and allow only one
• Force only one expansion
– Example grammar
• E → E + E | E – E | E * E | E / E | (E) | id
• It is ambiguous
– Change to
• E → T + E | T – E | T * E | T / E | (E) | T
• T → id
• Parse: id * id – id
– E ⇒ T * E ⇒ T * T – E ⇒ T * T – T ⇒ … ⇒ id * id – id

        E
      / | \
     T  *  E
     |   / | \
    id  T  –  E
        |     |
       id     T
              |
             id
• Build the desired precedence into the grammar
– Example
• E → E + E | E * E | (E) | id
• Ambiguous
• Desired precedence: * executes before +
– Change to
E → E + T | T
T → T * F | F
F → (E) | id
– Parse: id + id * id

        E
      / | \
     E  +  T
     |   / | \
     T  T  *  F
     |  |     |
     F  F    id
     |  |
    id id
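The precedence-encoding rewrite can be seen in running code. The sketch below (an illustration, with single-character tokens and digits standing in for id, both assumptions made here) evaluates the unambiguous grammar E → E + T | T, T → T * F | F, F → (E) | id, replacing the left recursion with iteration:

```python
# Sketch: a tiny evaluator for the unambiguous precedence grammar.
# Each loop level corresponds to one non-terminal, so precedence is
# forced by the grammar's structure rather than by side rules.

def parse(expr):
    tokens = list(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        assert peek() == tok, "expected " + tok
        pos += 1

    def parse_E():            # E -> T { + T }
        value = parse_T()
        while peek() == "+":
            eat("+"); value += parse_T()
        return value

    def parse_T():            # T -> F { * F }
        value = parse_F()
        while peek() == "*":
            eat("*"); value *= parse_F()
        return value

    def parse_F():            # F -> ( E ) | digit
        if peek() == "(":
            eat("("); value = parse_E(); eat(")")
            return value
        value = int(peek()); eat(peek())
        return value

    return parse_E()

# '*' binds tighter than '+', so 2+3*4 parses as 2+(3*4)
print(parse("2+3*4"))    # 14
print(parse("(2+3)*4"))  # 20
```

Because * is handled one level deeper than +, the grammar itself guarantees that every string has exactly one parse tree.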
Minimization of Context Free Grammars
Three ways to simplify/clean a CFG
1. Eliminate useless symbols (clean)
2. Eliminate ε-productions (simplify)
3. Eliminate unit productions (simplify)
Eliminating useless symbols
A symbol X is reachable if there exists: S ⇒* αXβ
A symbol X is generating if there exists: X ⇒* w,
• for some w ∈ T*
For a symbol X to be "useful", it has to be both reachable and generating
• S ⇒* αXβ ⇒* w', for some w' ∈ T*

1. First, eliminate all symbols that are not generating
2. Next, eliminate all symbols that are not reachable

Example:
• S → AB | a
• A → b

1. A and S are generating
2. B is not generating (and therefore B is useless)
3. ==> Eliminate B (i.e., remove all productions that involve B):
   1. S → a
   2. A → b
4. Now, A is not reachable and therefore is useless
5. Simplified G:
   1. S → a
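The two passes are fixed-point computations, sketched below on the example grammar. Productions are encoded as (head, body) pairs of one-character symbols; that encoding is an assumption made for this illustration, not the slides' notation:

```python
# Sketch of the two elimination passes on S -> AB | a, A -> b.

def generating(prods, terminals):
    """Symbols that derive some terminal string (plus the terminals)."""
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            if head not in gen and all(s in gen for s in body):
                gen.add(head)
                changed = True
    return gen

def reachable(prods, start):
    """Symbols reachable from the start symbol."""
    reach = {start}
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            if head in reach:
                for s in body:
                    if s not in reach:
                        reach.add(s)
                        changed = True
    return reach

prods = [("S", "AB"), ("S", "a"), ("A", "b")]
gen = generating(prods, {"a", "b"})
print("B" in gen)          # False: B never derives a terminal string
# keep only productions over generating symbols, then test reachability
kept = [(h, b) for h, b in prods if h in gen and all(s in gen for s in b)]
print("A" in reachable(kept, "S"))  # False: A is no longer reachable
```

Running the passes in this order matters: eliminating non-generating B first is what makes A unreachable.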
Eliminating ε-productions

Theorem: If G = (V, T, P, S) is a CFG for a language L, then L − {ε} has a CFG without ε-productions.

Definition: A is "nullable" if A ⇒* ε. If A is nullable, then any production of the form B → CAD can be simulated by: B → CD | CAD

• Let L be the language represented by the following CFG G:
i.   S → AB
ii.  A → aAA | ε
iii. B → bBB | ε

Goal: To construct G1, which is the grammar for L − {ε}
• Nullable symbols: {A, B}
• G1 can be constructed from G as follows:
– B → b | bB | bB | bBB
• ==> B → b | bB | bBB
– Similarly, A → a | aA | aAA
– Similarly, S → A | B | AB
• Note: L(G) = L(G1) ∪ {ε}

G1:
• S → A | B | AB
• A → a | aA | aAA
• B → b | bB | bBB
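The nullable-set computation and the body expansion above can be sketched in a few lines. As before, productions are (head, body) pairs over one-character symbols, with "" as the ε-body (an encoding chosen for this illustration):

```python
# Sketch: compute nullable symbols, then expand every body by choosing,
# for each nullable occurrence, whether to keep it or drop it.
from itertools import product

def nullable(prods):
    null = set()
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            if head not in null and all(s in null for s in body):
                null.add(head)
                changed = True
    return null

def eliminate_eps(prods):
    null = nullable(prods)
    new = set()
    for head, body in prods:
        options = [([s, ""] if s in null else [s]) for s in body]
        for choice in product(*options):
            b = "".join(choice)
            if b:                      # skip the new epsilon-bodies
                new.add((head, b))
    return new

prods = [("S", "AB"), ("A", "aAA"), ("A", ""), ("B", "bBB"), ("B", "")]
print(sorted(nullable(prods)))        # ['A', 'B', 'S']
for head, body in sorted(eliminate_eps(prods)):
    print(head, "->", body)
```

The output reproduces G1 above: S → A | AB | B, A → a | aA | aAA, B → b | bB | bBB, with duplicates (like the two copies of bB) merged by the set.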
Eliminating Unit Productions

• A unit production is one whose right side consists of exactly one variable.
• These productions can be eliminated.
• Key idea: If A ⇒* B by a series of unit productions, and B → α is a non-unit production, then add the production A → α.
• Then, drop all unit productions.
Example:
S → Aa
A → a | B
B → A | bb

Since A ⇒* B and B ⇒* A by unit productions, copy each variable's non-unit bodies to the other:
S → Aa
A → a | bb
B → a | bb

After dropping the unit productions, B is unreachable and can also be removed:
S → Aa
A → a | bb
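The key idea translates directly: first compute the unit pairs A ⇒* B as a fixed point, then copy non-unit bodies along them. A sketch (same (head, body) encoding as the earlier illustrations):

```python
# Sketch: eliminate unit productions from S -> Aa, A -> a | B, B -> A | bb.

def eliminate_units(prods, nonterminals):
    # unit_pairs[A] = every B with A =>* B via unit productions
    unit_pairs = {A: {A} for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            if body in nonterminals:            # a unit production
                for A in nonterminals:
                    if head in unit_pairs[A] and body not in unit_pairs[A]:
                        unit_pairs[A].add(body)
                        changed = True
    # copy every non-unit body of B up to each A with A =>* B
    new = set()
    for A in nonterminals:
        for head, body in prods:
            if head in unit_pairs[A] and body not in nonterminals:
                new.add((A, body))
    return new

prods = [("S", "Aa"), ("A", "a"), ("A", "B"), ("B", "A"), ("B", "bb")]
for head, body in sorted(eliminate_units(prods, {"S", "A", "B"})):
    print(head, "->", body)
```

The result is S → Aa, A → a | bb, B → a | bb, with no unit productions left; a reachability pass (as in the useless-symbol section) would then discard B.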
Chomsky normal form

• A method of simplifying a CFG.
Definition: A context-free grammar is in Chomsky normal form if every rule is of one of the following forms:
A → BC
A → a
where a is any terminal, A is any variable, and B and C are any variables other than the start variable.
The rule S → ε is permitted, where S is the start variable.
Any context-free language is generated by a context-free grammar in Chomsky normal form.

Convert any CFG to one in Chomsky normal form by removing or replacing all rules in the wrong form:
1. Add a new start symbol
2. Eliminate ε-rules of the form A → ε
3. Eliminate unit rules of the form A → B
4. Convert remaining rules into proper form
Convert a CFG to Chomsky normal form

1. Add a new start symbol
- Create the new rule S0 → S, where S is the start symbol and S0 is not used in the CFG.
2. Eliminate all ε-rules A → ε, where A is not the start variable
- For each rule with an occurrence of A on the right-hand side, add a new rule with the A deleted:
  R → uAv becomes R → uAv | uv
  R → uAvAw becomes R → uAvAw | uvAw | uAvw | uvw
- If we have R → A, replace it with R → ε unless we had already removed R → ε.
3. Eliminate all unit rules of the form A → B
- For each rule B → u, add a new rule A → u, where u is a string of terminals and variables, unless this rule had already been removed.
- Repeat until all unit rules have been replaced.
4. Convert remaining rules into proper form
- Replace each rule A → u1u2…uk, where k ≥ 3 and each ui is a variable or a terminal, with the k−1 rules
  A → u1A1, A1 → u2A2, …, Ak−2 → uk−1uk
- Strictly, any terminal ui left in a two-symbol body is also replaced by a fresh variable Ui with the rule Ui → ui.
Example

Convert the following grammar into Chomsky Normal Form.
S → S1 | S2
S1 → S1b | Ab
A → aAb | ab | ε
S2 → S2a | Ba
B → bBa | ba | ε

Step 1: Add a new start symbol
S0 → S
S → S1 | S2
S1 → S1b | Ab
A → aAb | ab | ε
S2 → S2a | Ba
B → bBa | ba | ε

Step 2: Eliminate ε-rules
S0 → S
S → S1 | S2
S1 → S1b | Ab | b
A → aAb | ab
S2 → S2a | Ba | a
B → bBa | ba

Step 3: Eliminate all unit rules
S0 → S1b | Ab | b | S2a | Ba | a
S → S1b | Ab | b | S2a | Ba | a
S1 → S1b | Ab | b
A → aAb | ab
S2 → S2a | Ba | a
B → bBa | ba

Step 4: Convert remaining rules to proper form
S0 → S1b | Ab | b | S2a | Ba | a
S → S1b | Ab | b | S2a | Ba | a
S1 → S1b | Ab | b
A → aA1 | ab
A1 → Ab
S2 → S2a | Ba | a
B → bB1 | ba
B1 → Ba
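Step 4's replacement of a long body by a chain of binary rules can be sketched as a small helper; the fresh-variable naming (A1, A2, …) is an assumption made here for illustration:

```python
# Sketch of step 4: break a body u1 u2 ... uk (k >= 3) into k-1 binary
# rules, introducing a fresh variable at each split.

def binarize(head, body, fresh):
    """body is a list of symbols; fresh() yields an unused variable name."""
    rules = []
    while len(body) > 2:
        new_var = fresh()
        rules.append((head, [body[0], new_var]))   # A -> u1 A1, etc.
        head, body = new_var, body[1:]
    rules.append((head, body))                     # final binary (or unary) rule
    return rules

counter = iter(range(1, 100))
fresh = lambda: "A" + str(next(counter))
for h, b in binarize("A", ["a", "A", "b"], fresh):
    print(h, "->", " ".join(b))
# A -> a A1
# A1 -> A b
```

This reproduces the A → aA1, A1 → Ab step of the example above.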
Pushdown Automaton (PDA)

• A Pushdown Automaton is a nondeterministic finite state automaton (NFA) that permits ε-transitions and a stack.
• A PDA P is a seven-tuple (Q, ∑, Γ, δ, q0, Z0, F):
– Q: finite set of states
– ∑: input alphabet
– Γ: stack symbols
– δ: transition function
– q0: start state
– Z0: initial stack-top symbol
– F: final/accepting states

δ : Q × (∑ ∪ {ε}) × Γ → finite subsets of Q × Γ*
(old state, input symbol, stack top) ⇒ (new state(s), new stack top(s))

A Graphical Notation for PDAs

1. The nodes correspond to the states of the PDA.
2. An arrow labeled Start indicates the unique start state.
3. Doubly circled states are accepting states.
4. Edges correspond to transitions in the PDA as follows:
An edge labeled (a, X)/Y from state q to state p means that δ(q, a, X) contains the pair (p, Y), perhaps among other pairs.

        a, X / Y
  qi ------------> qj        δ(qi, a, X) = {(qj, Y)}

On the edge label, a is the next input symbol, X is the current stack top, and Y is the string that replaces X on the stack; qi is the current state and qj the next state.
Example

Let Lwwr = {wwR | w is in (0+1)*}
• CFG for Lwwr: S → 0S0 | 1S1 | ε
• PDA for Lwwr:
• P := (Q, ∑, Γ, δ, q0, Z0, F)
     = ({q0, q1, q2}, {0,1}, {0,1,Z0}, δ, q0, Z0, {q2})

1. δ(q0, 0, Z0) = {(q0, 0Z0)}    (push the first symbol on the stack)
2. δ(q0, 1, Z0) = {(q0, 1Z0)}
3. δ(q0, 0, 0) = {(q0, 00)}      (grow the stack by pushing new symbols
4. δ(q0, 0, 1) = {(q0, 01)}       on top of old: the w-part)
5. δ(q0, 1, 0) = {(q0, 10)}
6. δ(q0, 1, 1) = {(q0, 11)}
7. δ(q0, ε, 0) = {(q1, 0)}       (switch to popping mode: the boundary
8. δ(q0, ε, 1) = {(q1, 1)}        between w and wR)
9. δ(q0, ε, Z0) = {(q1, Z0)}
10. δ(q1, 0, 0) = {(q1, ε)}      (shrink the stack by popping matching
11. δ(q1, 1, 1) = {(q1, ε)}       symbols: the wR-part)
12. δ(q1, ε, Z0) = {(q2, Z0)}    (enter the acceptance state)
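The nondeterministic machine above can be simulated by exploring every configuration (state, input position, stack). The sketch below (an illustration, not part of the slides) encodes δ as a dictionary and uses "Z" to stand in for Z0:

```python
# Sketch: simulate the nondeterministic PDA for {w wR} by exhaustive
# search over configurations. delta maps (state, input-or-'', stack-top)
# to a set of (new-state, push-string) moves; the stack is a string, top first.

def accepts(delta, start, Z0, finals, w):
    seen, frontier = set(), {(start, 0, Z0)}
    while frontier:
        state, i, stack = frontier.pop()
        if i == len(w) and state in finals:
            return True                     # accept by final state
        if (state, i, stack) in seen or not stack:
            continue
        seen.add((state, i, stack))
        top, rest = stack[0], stack[1:]
        for q, push in delta.get((state, "", top), ()):    # epsilon moves
            frontier.add((q, i, push + rest))
        if i < len(w):
            for q, push in delta.get((state, w[i], top), ()):
                frontier.add((q, i + 1, push + rest))
    return False

Z = "Z"   # stands for Z0
delta = {("q0", a, X): {("q0", a + X)} for a in "01" for X in ("0", "1", Z)}
for X in ("0", "1", Z):
    delta[("q0", "", X)] = {("q1", X)}     # guess the midpoint
delta[("q1", "0", "0")] = {("q1", "")}     # pop on match
delta[("q1", "1", "1")] = {("q1", "")}
delta[("q1", "", Z)] = {("q2", Z)}         # enter the accepting state

print(accepts(delta, "q0", Z, {"q2"}, "0110"))  # True
print(accepts(delta, "q0", Z, {"q2"}, "011"))   # False
```

The search terminates because the stack can only grow while input is consumed, so the set of reachable configurations is finite.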
PDA for Lwwr: Transition Diagram

∑ = {0, 1}   Γ = {Z0, 0, 1}   Q = {q0, q1, q2}   (q2 is the accepting state)

q0 → q0 : 0, Z0/0Z0   1, Z0/1Z0   0, 0/00   0, 1/01   1, 0/10   1, 1/11   (grow stack)
q0 → q1 : ε, Z0/Z0   ε, 0/0   ε, 1/1   (switch to popping mode)
q1 → q1 : 0, 0/ε   1, 1/ε   (pop stack for matching symbols)
q1 → q2 : ε, Z0/Z0   (go to acceptance)

Because the switch to popping mode is guessed with an ε-move, this is a non-deterministic PDA.
PDA for the Language of Balanced Parentheses

∑ = { (, ) }   Γ = { Z0, ( }   Q = {q0, q1, q2}   (q2 is the accepting state)

q0 → q0 : (, Z0/(Z0   (, (/((   (grow stack)
q0 → q1 : ), (/ε   ε, Z0/Z0   (switch to popping mode)
q1 → q1 : ), (/ε   (pop stack for matching symbols)
q1 → q0 : (, (/((   (, Z0/(Z0   (to allow adjacent blocks of nested parentheses)
q1 → q2 : ε, Z0/Z0   (go to acceptance, by final state, when you see the stack-bottom symbol)
• PDAs that accept by final state:
– For a PDA P, the language accepted by P by final state, denoted L(P), is:
• {w | (q0, w, Z0) ⊢* (q, ε, α)}, s.t. q ∈ F and α ∈ Γ*
(Checklist: input exhausted? in a final state?)
• PDAs that accept by empty stack:
– For a PDA P, the language accepted by P by empty stack, denoted N(P), is:
• {w | (q0, w, Z0) ⊢* (q, ε, ε)}, for any q ∈ Q
(Checklist: input exhausted? is the stack empty?)

There are two types of PDAs that one can design: those that accept by final state and those that accept by empty stack.
Q) Does a PDA that accepts by empty stack need any final state specified in the design? (No: F plays no role and can be left empty.)
PF (accept by final state):
start → q0; q1 is the accepting state
q0 → q0 : (, Z0/(Z0   (, (/((   ), (/ε
q0 → q1 : ε, Z0/Z0

PN (accept by empty stack):
start → q0 (single state; no accepting state needed)
q0 → q0 : (, Z0/(Z0   (, (/((   ), (/ε   ε, Z0/ε
• A language is L(P1) for some PDA P1 if and only if it is N(P2) for some PDA P2.
Equivalence of Acceptance by Final State and Empty Stack
Final State ⇒ Empty Stack

Given P1 = (Q, ∑, Γ, δ, q0, Z0, F), construct P2:

1. Introduce a new start state p0 and a new bottom-of-stack marker X0.

2. First move of P2: replace X0 by Z0X0 and go to state q0. The
presence of X0 prevents P2 from "accidentally" emptying its stack and
accepting when P1 did not accept.

3. Then, P2 simulates P1, i.e., give P2 all the transitions of P1.

4. Introduce a new state r that keeps popping the stack of P2 until it is
empty.

5. If (the simulated) P1 is in an accepting state, give P2 the additional
choice of going to state r on ε input, and thus emptying its stack
without reading any more input.
PF ==> PN construction

• Main idea:
– Whenever PF reaches a final state, just make an ε-transition into a
new end state, clear out the stack, and accept.
– What if PF is designed such that it clears the stack midway without
entering a final state?
To address this, add a new bottom-of-stack symbol X0 (not in Γ of PF).

PN = (Q ∪ {p0, pe}, ∑, Γ ∪ {X0}, δN, p0, X0)

New start state p0 : ε, X0/Z0X0 → q0   (run PF's simulation on top of X0)
Every final state of PF : ε, any/ε → pe
pe → pe : ε, any/ε   (keep popping until the stack is empty)
Empty Stack ⇒ Final State

Given PN = (Q, ∑, Γ, δ, q0, Z0), construct PF:
1. Introduce a new start state p0 and a new bottom-of-stack marker X0.
2. First move of PF: replace X0 by Z0X0 and go to state q0.
3. Introduce a new state pf for PF; it is the only accepting state.
4. PF simulates PN.
5. If (the simulated) PN ever exposes X0, PF knows that PN has emptied its stack and accepted, so PF goes to state pf on ε input.

PF = (QN ∪ {p0, pf}, ∑, Γ ∪ {X0}, δF, p0, X0, {pf})

New start state p0 : ε, X0/Z0X0 → q0
Every state of PN : ε, X0/X0 → pf   (X0 on top means PN's stack is empty)
Example: Matching parentheses "(" ")"

Accept by empty stack:
PN = ({q0}, {(, )}, {Z0, Z1}, δN, q0, Z0)
δN(q0, (, Z0) = {(q0, Z1Z0)}
δN(q0, (, Z1) = {(q0, Z1Z1)}
δN(q0, ), Z1) = {(q0, ε)}
δN(q0, ε, Z0) = {(q0, ε)}

Accept by final state:
PF = ({p0, q0, pf}, {(, )}, {X0, Z0, Z1}, δF, p0, X0, {pf})
δF(p0, ε, X0) = {(q0, Z0X0)}
δF(q0, (, Z0) = {(q0, Z1Z0)}
δF(q0, (, Z1) = {(q0, Z1Z1)}
δF(q0, ), Z1) = {(q0, ε)}
δF(q0, ε, Z0) = {(q0, ε)}
δF(q0, ε, X0) = {(pf, X0)}
Equivalence between CFGs and PDAs

• Converting CFGs to PDAs
– Easier to use the PDA version that accepts by empty stack
• Given a context-free grammar G = (V, T, P, S), construct a pushdown automaton M
– Need to specify states, input and stack symbols, and the transition function
• M = (Q, ∑, Γ, δ, q0, Z0), where
– Q contains a single state, q0
– ∑ = T
– Γ = V ∪ T
– Z0 = S
– Note: no need for F (final states) since we are accepting by empty stack
• The transition function is based on the variables, productions, and terminals of the grammar:
– δ(q0, ε, A) contains (q0, w) whenever A → w
– δ(q0, a, a) = {(q0, ε)} for each a in T
• Easier and more intuitive if the grammar is in GNF:
– δ(q0, a, A) contains (q0, B1B2…Bn) for each production A → aB1B2…Bn

Every leftmost derivation can be simulated in the PDA as follows:
1. Put S on the stack.
2. Change the variable on top of the stack in accordance with the next production.
3. Read input to get to the next variable on the stack.
4. If the stack is empty, accept. Else, go to step 2.

On the other hand, every accepting computation must have gone through the steps above and so corresponds to a leftmost derivation in G.
This shows that the PDA constructed accepts the same language as the original grammar.
Example

Design the PDA for the following grammar:
S → a | aS | bSS | SSb | SbS

PDA A = ({q}, {a, b}, {S, a, b}, δ, q, S), where δ is defined as:
δ(q, ε, S) = {(q, a), (q, aS), (q, bSS), (q, SSb), (q, SbS)}
δ(q, a, a) = {(q, ε)}
δ(q, b, b) = {(q, ε)}

Processing of baa:

state | input | stack | move
q     | baa   | S     | δ(q, ε, S) ∋ (q, bSS)   — generate bSS
q     | baa   | bSS   | δ(q, b, b) = (q, ε)     — match b
q     | aa    | SS    | δ(q, ε, S) ∋ (q, a)     — generate a
q     | aa    | aS    | δ(q, a, a) = (q, ε)     — match a
q     | a     | S     | δ(q, ε, S) ∋ (q, a)     — generate a
q     | a     | a     | δ(q, a, a) = (q, ε)     — match a
q     | -     | -     | accept (empty stack)

The corresponding parse tree:

      S
    / | \
   b  S  S
      |  |
      a  a

Each leaf, read left to right, is matched against the input b a a in turn.
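The trace above can be replayed by simulating this single-state PDA. The sketch below (an illustration, not part of the slides) explores all moves; it prunes any stack longer than the remaining input, which is sound here because every stack symbol, including S (whose shortest yield is "a"), must eventually consume at least one input symbol:

```python
# Sketch: simulate the one-state, empty-stack PDA built from
# S -> a | aS | bSS | SSb | SbS.

GRAMMAR = {"S": ["a", "aS", "bSS", "SSb", "SbS"]}
TERMINALS = set("ab")

def accepted(w):
    # configurations: (position in input, stack as a string, top first)
    frontier, seen = [(0, "S")], set()
    while frontier:
        i, stack = frontier.pop()
        if not stack:
            if i == len(w):
                return True                  # input consumed, stack empty
            continue
        if (i, stack) in seen or len(stack) > len(w) - i:
            continue                         # prune hopeless stacks
        seen.add((i, stack))
        top, rest = stack[0], stack[1:]
        if top in TERMINALS:
            if i < len(w) and w[i] == top:
                frontier.append((i + 1, rest))        # match a terminal
        else:
            for body in GRAMMAR[top]:
                frontier.append((i, body + rest))     # expand S

    return False

print(accepted("baa"))  # True
print(accepted("ba"))   # False
```

Each "expand S" step is one δ(q, ε, S) move and each "match" is one δ(q, a, a) or δ(q, b, b) move, so the search enumerates exactly the leftmost derivations of the grammar.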
• From PDAs to Grammars

Let P = (Q, ∑, Γ, δ, q0, Z0) be a PDA. Then there is a context-free grammar G such that L(G) = N(P).

Construct G = (V, T, P, S) where the set of nonterminals consists of:
• the special symbol S as the start symbol;
• all symbols of the form [pXq], where p and q are states in Q and X is a stack symbol in Γ.

The productions of G are as follows.
(a) For all states p, G has the production S → [q0 Z0 p].
(b) Let δ(q, a, X) contain the pair (r, Y1Y2…Yk), where
– a is either a symbol in ∑ or a = ε;
– k can be any number, including 0, in which case the pair is (r, ε).
Then for all lists of states r1, r2, …, rk, G has the production
[q X rk] → a [r Y1 r1][r1 Y2 r2]…[rk−1 Yk rk].
Convert the following PDA to a Context-Free Grammar.

The PDA (Fig. 6.5) has a single state q and accepts by empty stack; on input i it pushes a Z, and on input e it pops one Z:
δN(q, i, Z) = {(q, ZZ)}
δN(q, e, Z) = {(q, ε)}

The nonterminals include only two symbols, S and [qZq]. Productions:
1. S → [qZq]             (for the start symbol S)
2. [qZq] → i[qZq][qZq]   (from (q, ZZ) ∈ δN(q, i, Z))
3. [qZq] → e             (from (q, ε) ∈ δN(q, e, Z))

If we replace [qZq] by a simple symbol A, then the productions become
1. S → A   2. A → iAA   3. A → e
Obviously, these productions can be simplified to
1. S → iSS   2. S → e
And the grammar may be written simply as
G = ({S}, {i, e}, {S → iSS | e}, S)
Assignment - 2

1. Explain in detail Chomsky's Hierarchy, with a neat diagram.
2. Define the language for the following Context Free Grammars.
   (a) S → 0S1 | 01
   (b) S → aSa | bSb | ε
3. Construct the leftmost parse tree and rightmost parse tree for the following grammar and the given string; if the grammar is ambiguous, write an equivalent unambiguous grammar.
   R → R + R | RR | (R) | R* | a | b      String: (ab+ba)*
4. Minimize the following Context Free Grammar.
   S → ABC | BaB
   A → Aa | BaC | aaa
   B → bBb | a | D
   C → CA | AC
   D → ε
5. Convert the following Context Free Grammar to Chomsky Normal Form.
   S → bA | aB
   A → bAA | aS | a
   B → aBB | bS | b
6. Convert the following Context Free Grammar to Greibach Normal Form.
   S → XA | BB
   B → b | SB
   X → b
   A → a
7. Compare Finite Automata and Pushdown Automata in detail, with examples and diagrams.
8. Design a PDA whose language is { w | w contains balanced parentheses }.
9. Consider the grammar S → abScB | λ, B → bB | b. What language does it generate?
10. Design a PDA for binary strings that start and end with the same symbol and have the same number of 0s as 1s.
11. Convert the PDA for the language { wwR | w ∈ {0, 1}* } into a CFG.
12. Construct the PDA for the following CFG: G = ({S, T}, {a, b}, {S → aTb | b, T → Ta | ε}, S).