Push-DownAutomata and Context-Free Languagesevink/education/2it70/PDF2016/2it… · associated with push-down automata is often called the class of context-free languages. This class

Chapter 3

Push-Down Automata

and Context-Free Languages

In the previous chapter, we studied finite automata, modeling computers without mem-ory. In the next chapter, we study a general model of computers with memory. In thecurrent chapter, we study an interesting class that is in between: a class of automata withmemory in the form of a stack, so-called push-down automata. The class of languagesassociated with push-down automata is often called the class of context-free languages.This class is important in the study of programming languages, in particular for parsingbased on context-free grammars.

3.1 Push-down automata

A rough sketch of the architecture of a DFA or NFA is given in Figure 3.1: the controlunit labeled ‘Automaton’ processes the input that is provided, and produces a yes/noanswer for the input string processed so far. Yes, if the current state of the control unitis a final state; no, if the current state of the control unit is a non-final state. The only‘memory’ that is available are the states of the control unit.

Automatoninput yes/no

Figure 3.1: Architecture of finite automaton

We next consider an abstract machine model that does have memory, viz. memory in theform of a stack. The stack can be accessed only at the top: something can be added ontop of the stack via a push operation, or something can be removed from the top of thestack via a pop operation. Items under the top of the stack cannot be inspected directly.For this reason, the machine model is generally referred to as a push-down automaton,PDA for short. We assume that we have means to check if the stack is empty, i.e. toobserve that there is no item on (top of) the stack. See Figure 3.2 for the augmented

1

2 CHAPTER 3. PDA AND CONTEXT-FREE LANGUAGES

architecture of a push-down automaton.

Automatoninput yes/no

Stack

Figure 3.2: Architecture of a push-down automaton

Example 3.1. Let us look at an example of a push-down automaton. In Figure 3.3, weindicate the initial state q0 as usual. Initially, the stack is assumed to be empty, signaledby the special symbol ∅. In state q0, we can input a symbol a, thereby replacing theempty stack by the stack containing a single data element 1 while control switches fromstate q0 to state q1. In the state q1, we can either input a symbol a again, replacing thetop element 1 by twice the item 1 indicated by 11 (thus effectively adding an item 1 ontop), and remain in state q1, or alternatively input the symbol b, popping the top item,which is an item 1, by putting nothing in its place as indicated by the string ε, and goingto state q2. There, we can read in a b repeatedly, popping 1’s, but we must terminateand go to final state q3 if (and only if) the stack has become empty.

We see that this push-down automaton can read any number of a’s (collecting exactlyso many 1’s on the stack), followed by the input of the same number of b’s followed bytermination. As we shall see, the language accepted by the PDA is { anbn | n > 1 }, anon-regular language.

0 1 2 3a[∅/1] b[1/ε] τ [∅/ε]

a[1/11] b[1/ε]

Figure 3.3: Example push-down automaton.

From the example we see that in a PDA we allow internal steps τ , that consume noinput symbol. We even allow the stack to be modified during such steps. We also seethat the transitions need not to be specified by a function. E.g., a b-transition is missingfor state q0. Like for NFA, a relation is fine. This also allows to have multiple transitionsfor given a state, input or τ , and stack.

Definition 3.2 (Push-down automaton). A push-down automaton (PDA) is a septupleP = (Q, Σ, ∆, ∅, →, q0, F ) where

3.1. PUSH-DOWN AUTOMATA 3

• Q is a finite set of states,

• Σ is a finite input alphabet with τ /∈ Σ,

• ∆ is a finite data alphabet or stack alphabet,

• ∅ /∈ ∆ a special symbol denoting the empty stack,

• → ⊆ Q× Στ ×∆∅ ×∆∗ ×Q, where Στ = Σ ∪ {τ} and ∆∅ = ∆ ∪ {∅}, is a finiteset of transitions or steps,

• q0 ∈ Q is the initial state,

• F ⊆ Q is the set of final states.

We use x and y for strings in ∆∗ denoting the content of the stack. For non-empty stringsthe leftmost symbol indicates the topmost element of the stack. If (q, a, d, x, q′) ∈ →

with d ∈ ∆, we write qa[d/x]−−−−→ q′, and this means that the machine, when it is in state q

and d is the top element of the stack, can consume the input symbol a, replace the topelement d by the string x and thereby move to state q′. Likewise, if (q, a,∅, x, q′) ∈ →

we write qa[∅/x]−−−−→ q′ meaning that the machine, when it is in state q and the stack is

empty, can consume the input symbol a, put the string x on the stack and thereby move

to state q′. In steps qτ [d/x]−−−−→ q′ and q

τ [∅/x]−−−−→ q′, no input symbol is consumed, only the

stack is modified (if x 6= d and x 6= ε, respectively).Note the occurrence of ∆∅ vs. ∆∗ for the transition relation →. As a special case,

for a transition of the form qa[∅/ε]−−−−→ q′, the symbol a is read from input, but the stack

remains empty.

Example 3.3. Consider again Figure 3.3. The figure depicts a push-down automatonP = (Q, Σ, ∆, ∅, →, q0, {q3} ) with Q = {q0, q1, q2, q3}, Σ = {a, b}, and ∆ = {1} as setof states, input alphabet and stack alphabet, respectively, transitions

q0a[∅/1]−−−−→ q1, q1

a[1/11]−−−−→ q1, q1

b[1/ε]−−−−→ q2, q2

b[1/ε]−−−−→ q2, q2

τ [∅/ε]−−−−→ q3

initial state q0, and the singleton {q3} as the set of final states.

Also for PDA we introduce the notion of a configuration for the description of computa-tions. Apart from the current state and the string left on input, we need to keep trackof the stack.

Definition 3.4 (Configuration). Let P = (Q, Σ, ∆, ∅, →, q0, F ) be a push-down au-tomaton. A configuration or instantaneous description (ID) of P is a triple (q, w, x) ∈Q× Σ∗ ×∆∗. The relation ⊢P ⊆ (Q× Σ∗ ×∆∗ )× (Q× Σ∗ ×∆∗ ) is given as follows.

(i) (q, aw, dy) ⊢P (p, w, xy) if qa[d/x]−−−−→ p


(ii) (q, w, dy) ⊢P (p, w, xy) if qτ [d/x]−−−−→ p

(iii) (q, aw, ε) ⊢P (p, w, x) if qa[∅/x]−−−−→ p

(iv) (q, w, ε) ⊢P (p, w, x) if qτ [∅/x]−−−−→ p.

The ‘derives’ relation ⊢∗P is the reflexive and transitive closure of the relation ⊢P .

In the definition we see, in line with the convention for strings x and y indicating stackcontent made above, that if x = d1 · · · dn ∈ ∆∗ then the symbol d1 will be on top of thestack. The symbol at position i from the top will be di, for 1 6 i 6 n, and the symbol dnwill be at the bottom of the stack.

Example 3.5. For the push-down automaton of Figure 3.3 we have, e.g.,

(q0, aabb, ε) ⊢P (q1, abb, 1) ⊢P (q1, bb, 11) ⊢P (q2, b, 1) ⊢P (q2, ε, ε) ⊢P (q3, ε, ε)

Thus (q0, aabb, ε) ⊢∗P (q3, ε, ε) and also (q0, aabb, ε) ⊢∗

P (q2, b, 1) and (q1, abb, 1) ⊢∗P

(q2, ε, ε).

For DFA and NFA we know that a derivation is independent of the input that is nottouched, see Lemma ??b and Lemma ??. For PDA we can include part of the stack in thisconsideration. The stack items that are not inspected do not influence the computation.We first consider a one-step derivation, and next generalize to multi-step derivations.Regarding the untouched part of the stack y it is important that it remain covered, sincethe top element of the stack can be inspected by the PDA. Therefore, in the lemma, thestrings x and xi above y need to be non-empty.

Lemma 3.6. Let P = (Q, Σ, ∆, ∅, →, q0, F ) be a push-down automaton.

(a) For w,w′, v ∈ Σ∗, q, q′ ∈ Q, x, x′, y ∈ ∆∗ with x 6= ε, it holds that

(q, wv, xy) ⊢P (q′, w′v, x′y) ⇐⇒ (q, w, x) ⊢P (q′, w′, x′)

(b) For n > 0, wi ∈ Σ∗, qi ∈ Q, xi ∈ ∆∗ with xi 6= ε, for 1 6 i < n, v ∈ Σ∗ and y ∈ ∆∗,it holds that

(q0, w0v, x0y) ⊢P (q1, w1v, x1y) ⊢P . . . ⊢P (qn, wnv, xny)

⇐⇒

(q0, w0, x0) ⊢P (q1, w1, x1) ⊢P . . . ⊢P (qn, wn, xn)

Proof. (a) We exploit Definition 3.4. If (q, wv, xy) ⊢P (q′, w′v, x′y) with x 6= ε then

either we have (i) qa[d/x]−−−−→ q′ and w = aw′, x = dx′′, x′ = xx′′ for suitable x and x′′ (by

clause (i) of Definition 3.4), or we have (ii) qτ [d/x]−−−−→ q′ and w = w′, x = dx′′, x′ = xx′′ for


suitable x and x′′ (by clause (ii) of Definition 3.4). So, either (q, w, x) = (q, aw′, dx′′) ⊢P

(q′, w′, xx′′) = (q′, w′, x′) or (q, w, x) = (q, w, dx′′) ⊢P (q′, w, xx′′) = (q′, w′, x′).

Reversely, if (q, w, x) ⊢P (q′, w′, x′) with x 6= ε, then either (i) qa[d/x]−−−−→ q′ and

w = aw′, x = dx′′, x′ = xx′′ for suitable x and x′′, or we have (ii) qτ [d/x]−−−−→ q′ and

w = w′, x = dx′′, x′ = xx′′ for suitable x and x′′. It follows that wv = aw′v, xy = dx′′y,x′y = xx′′y, or wv = w′v, xy = dx′′y, x′y = xx′′y. Thus (q, wv, xy) ⊢P (q′, w′v, x′y).

(b) By induction on n > 1 using part (a).

With the notion of an instantaneous description in place and the derivation relation ⊢P

given, we can define the language accepted by a PDA.

Definition 3.7. Let P = (Q,Σ,∆,∅,→, q0, F ) be a push-down automaton. The lan-guage L(P ), called the language accepted by the push-down automaton P , is given by

{ w ∈ Σ∗ | ∃q ∈ F ∃x ∈ ∆∗ : (q0, w, ε) ⊢∗P (q, ε, x) }

Note that we require the input string w to be processed completely, and we require q tobe a final state of P , i.e. q ∈ F , but the string x can be any stack content.

Example 3.8. Consider once more the push-down automaton P of Figure 3.3. Weverify L(P ) = { anbn | n > 1 } by means of a so-called invariant table.

state q input w stack x

q0 ε ε

q1 an 1n 1 6 n

q2 anbm 1n−m 1 6 m 6 n

q3 anbn ε 1 6 n

The first column of the table lists the states in Q. The second column lists the input bywhich the state is reached (starting from the initial state with empty stack). The thirdcolumn lists the contents of the stack after the state is reached while reading the inputof the second column. The last column provides further constraints and relationships ofindices involved. If a triple (q, w, x) is listed, then it holds that (q0, w, ε) ⊢∗

P (q, ε, x).

For the PDA P of Figure 3.3 the first row is obvious: Starting in state q0 (firstcolumn) with empty stack ε (third column) the PDA immediately leaves q0. So, P willonly be in q0 when no input is given, i.e., when the input equals ε (second column). Thesecond row captures that P is in state q1 after reading an, for n > 0. The first a is readon the transition from q0 to q1, therefore n > 0 when in state q1. Possibly more a’s areread on execution of the transition from q1 to itself. In q1 the stack contains as manysymbols 1 as symbols a have been read so far; the index n is the same for the inputcolumn displaying an and the stack column displaying qn. Upon reading a symbol bin state q1, hence with the stack non-empty, the PDA moves from state q1 to state q2.More b’s can be read, but no more than the number of symbols 1 on the stack. For


each b read, one symbol 1 is popped off the stack. Thus, if the string bm is read, thestring 1m is taken from the stack leaving 1n−m on the stack (where n−m > 0. Whenthe stack has become empty, the transition from q2 to q3 becomes enabled. There nofurther input can be read. As the stack was empty upon leaving q2, all symbols 1 musthave been popped off. Thus, as many b’s have been read as there were symbols a before.

Now q3 is the only final state of P . From the table we see that the input read toreach q3 starting from the initial state q0 with empty stack is of the form anbn for somen > 1. Thus (q0, w, ε) ⊢∗

P (q3, ε, x) iff w = anbn with n > 1. This proves that L(P ) isthe language as claimed.

Example 3.9. Let us construct a push-down automaton for the non-regular languageL = { wwR | w ∈ {a, b}∗ }. The input alphabet is {a, b}. It is convenient to use a and bas stack symbols as well. So ∆ = {a, b}. In the initial state, a string is read and puton the stack. At some point the PDA guesses to be half-way the input, right after thestring w and before the string wR. Therefore, the PDA can switch non-deterministicallyto the second state, where the stack items will be compared with the input one by one.Note that the stack will be read in reversed order now as a stack is last-in first-out.Termination takes place when the stack is empty again. The above is implemented bythe push-down automaton P given in Figure 3.4. In this figure we have d ∈ ∆, so d iseither a or b.

The invariant table belonging to P is as follows. Recall, a triple (q, w, x) is listed inthe invariance table iff it holds that (q0, w, ε) ⊢∗

P (q, ε, x).


q0 w wR w ∈ {a, b}∗

q1 wv u w ∈ {a, b}∗, vu = wR

q2 wv ε w ∈ {a, b}∗, v = wR

In q0 the string w that is read, is stored on the stack in reversed order because of thelast-in first-out regime of the stack. After the non-deterministic switch from state q0 tostate q1 which does not affect the input nor the stack, in q1 a symbol can only be readif it matches the symbol on top of the stack. So the extra input v, which equals whatis already popped off the stack, and the remainder of the stack u together form wR.Since q1 is left only if the stack is empty, it must be the case in q2 that u = ε, hencev = wR. Therefore, only words of the form wwR are accepted. As any string can beread initially, including the empty string ε, we conclude that P indeed accepts L.

Exercises for Section 3.1

Exercise 3.1.1. Consider the following PDA.


0 1 2

a[∅/a]a[a/aa]a[b/ab]

b[∅/b]b[a/ba]b[b/bb]

τ [∅/ε]

τ [a/a]τ [b/b]

τ [∅/ε]

b[b/ε]

a[a/ε]

Figure 3.4: A PDA accepting { wwR | w ∈ {a, b}∗ }

0 1

a[∅/1]a[1/11]

b[1/ε]

b[1/ε]

τ [1/ε]

Compute all maximal derivation sequences, starting from the initial state q0, for thefollowing inputs:

(a) ab;

(b) aab;

(c) aaabbb.

A maximal derivation sequence of a PDA P for a string w is a sequence

(q0, w0, x0) ⊢P (q1, w1, x1) ⊢P . . . (qn−1, wn−1, xn−1) ⊢P (qn, wn, xn) 0P

where q0, q1, . . . , qn−1, qn are states of P with q0 its initial state, w0, w1, . . . , wn−1, wn

strings over the input alphabet of P with w0 equal to w, and x0, x1, . . . , xn−1, xn stringsover the stack alphabet of P with x0 equal to ε, the empty stack.

Exercise 3.1.2. Construct a push-down automaton and give an invariant table for thefollowing languages over the input alphabet Σ = {a, b, c}. Also provide an invarianttable for these PDA.

(a) L1 = { anbmcn+m | n,m > 0 };

(b) L2 = { an+mbncm | n,m > 0 };

(c) L3 = { anbn+mcm | n,m > 0 };


Exercise 3.1.3. Give a push-down automaton for each of the following languages. Alsoprovide an invariant table for these PDA.

(a) L4 = { anb2n | n > 0 };

(b) L5 = { anbm | m > n > 1 };

(c) L6 = { anbm | 2n = 3m+ 1 };

(d) L7 = { anbm | m,n > 0, m 6= n }.

Exercise 3.1.4. (a) Give a push-down automaton for the language

L8 = { w ∈ {a, b}∗ | #a(w) 6= #b(w) }

(b) Give a push-down automaton for the language

L9 = { w ∈ {a, b, c}∗ | #a(w) 6= #b(w) ∨#b(w) 6= #c(w) }

3.2 Context-free grammars

Consider again the language L = { anbn | n > 0 }. Clearly ab ∈ L; take n = 1. Now, foran arbitrary element w ∈ L, the string awb is also an element of L: if w = anbn for somen > 0, then awb = an+1bn+1 and n+1 > 0. Thus, writing the symbol S to represent anelement of L, we can write

S → ab and S → aSb (3.1)

We may read S → ab as “S produces ab”, and S → aSb as “S produces aSb”. The ex-pressions S → ab and S → aSb are called production rules, or more precisely productionrules for S.

Production rules are typically used in a setting with so-called productions or deriva-tions. If uSv is a string, say with u, v ∈ {S, a, b}∗, then we have the productionsuSv ⇒ uabv and uSv ⇒ uaSbv, based on the production rules S → ab and S → aSb,respectively. Since the applicability of the production rules does not depend on the con-text comprised by the strings u and v, but only the symbol S, the production scheme isreferred to as context-free.

The production rules of Equation (3.1) can be applied to strings containing S re-peatedly, yielding sequences of zero, one or more productions or derivations. E.g., wehave

S ⇒ abS ⇒ aSb ⇒ aabbS ⇒ aSb ⇒ aaSbb ⇒ aaabbb

Such sequences are called production sequences or derivation sequences. We write, e.g.,S ⇒∗ ab, S ⇒∗ aabb, S ⇒∗ aaabbb production for strings in L, but also aSb ⇒∗ aaSbband aSb ⇒∗ aSb of strings containing S.

3.2. CONTEXT-FREE GRAMMARS 9

Definition 3.10 (Context-free grammar). A context-free grammar (CFG) is a four-tuple G = (V, T, R, S) where

1. V is a non-empty finite set of variables or non-terminals,

2. T is a finite set of terminals, disjoint from V ,

3. R ⊆ V ×(V ∪ T )∗ is a finite set of production rules, and

4. S ∈ V is the start symbol.

We use CFG as shorthand for the notion of a context-free grammar. As we shall see,the set of terminals T corresponds to the input alphabet Σ of a PDA.

If G is a context-free grammar with set of production rules R, we suggestively writeA → α, read “A produces α”, if (A,α) ∈ R. Sometimes we write A →G α to stressthat the production rule belongs to the grammar G. The production rule is called aproduction rule for the non-terminal A. If A → α1, A → α2, . . ., A → αn are all theproduction rules in R for A, we also write A → α1 | α2 | . . . | αn.

Example 3.11. Define G = (V, T, R, S) with V = {S}, T = {a, b} and R consisting of

S → ab and S → aSb

Then G is a context-free grammar with non-terminal S, terminals a and b and the twoproduction rules above. The start symbol of G is S.

Example 3.12. In view of the shorthand notation given above, the following describesseven production rules:

S → ABC

A → ε | aA

B → ε | Bb

C → c | cC

The rules implicitly define a context-free grammar G = (V, T, R, S). The followingconventions apply: (i) Commonly, capital letters indicate non-terminals. Thus, for G wehave V = { S, A, B, C }. Note that the elements of V are precisely the symbols thatoccur at the left-hand side of the rules above (strictly speaking this is not required).(ii) Usually, lower-case letters indicate terminals. Thus for G we have T = { a, b, c }.Note that the elements of T are exactly the letters that are not in V and occur atthe right-hand side of the rules (again, strictly speaking this is not required). Note,ε indicates the empty string, and hence is not in T , nor in V . (iii) Obviously, the setof production rules R consists of the rules listed. (iv) A customary choice for the startsymbol is S which indeed is an element of the set of non-terminals V here.

Definition 3.13 (Production, production sequence, language of a CFG). Let G =(V, T, R, S) be a context-free grammar.


• Let A → α ∈ R be a production rule of G. Thus A ∈ V and α ∈ (V ∪ T )∗. Letγ = β1Aβ2 be a string in which A occurs. Put γ′ = β1αβ2. We say that from thestring γ the production rule A → α produces the string γ′, notation γ ⇒G γ′.

• A production sequence or derivation is a sequence (γi)ni=0 such that γi−1 ⇒G γi,

for 1 6 i 6 n. Often we write

γ0 ⇒G γ1 ⇒G . . . ⇒G γn−1 ⇒G γn

The length of this production sequence is n, the number of productions. In caseγ = γ0 and γ′ = γn we also write γ ⇒∗

G γ′.

• Let A ∈ V be a variable of G. The language LG(A) generated by G from A isgiven by

LG(A) = { w ∈ T ∗ | A ⇒∗G w }

• The language L(G), the language generated by the CFG G, consists of all stringsof terminals that can be produced from the start symbol S, i.e.

L(G) = LG(S)

• A language L is called context-free, if there exists a context-free grammar G suchthat L = L(G).

Note, with respect to the grammar G, we have L(G) = LG(S) = { w ∈ T ∗ | S ⇒∗G w }.

When the grammar G is clear it may be dropped as a subscript.

Example 3.14. Consider again the grammar G given in Example 3.11. As eludedto at the beginning of this section, S ⇒G ab and aaSbb ⇒G aaabbb are productionsof G, but also SS ⇒G SaSb is a production of G. Example production sequences areS ⇒G ab and S ⇒ aSb ⇒ aaSbb ⇒ aaabbb, thus S ⇒∗

G ab, and S ⇒∗G aaabbb. Also

S ⇒ aSb ⇒ aaSbb and SS ⇒G SaSb ⇒G abaSb are production sequences of G. Weclaim L(G) = { anbn | n > 1 }. This can be shown by proving for L = { anbn | n > 1 }two set inclusions: (i) the inclusion L(G) ⊆ L by induction on the length of a productionsequence, and (ii) the inclusion L ⊆ L(G) by induction on the parameter n.

Proof of the claim (L(G) ⊆ L) We show by induction on n, if S ⇒nG w and w ∈ T ∗

then w ∈ L. Basis: n = 0. Then S = w while S /∈ T ∗, hence there is nothing to show.Induction step, n + 1: Suppose S ⇒n+1

G w. We have either S ⇒G ab ⇒nG w, in case

the production rule S → ab was used, or S ⇒G aSb ⇒nG w, in case the production

rule S → aSb was used. In the first case clearly w ∈ L. As to the second case, byLemma 3.15c below it follows that w = avb for some string v ∈ T ∗ and S ⇒n

G v.By induction hypothesis v ∈ L, i.e. v = ambm for some suitable m > 1. Thereforew = avb = am+1bm+1 and w ∈ L too.

(L ⊆ L(G)) We show by induction on n, S ⇒∗G anbn, for n > 1. Basis, n = 1:

Clear, S ⇒G ab based on the production rule S → ab. Induction hypothesis, n+1: Wehave anbn ∈ L. By induction hypothesis it follows that S ⇒∗

G anbn. By Lemma 3.15bwe obtain aSb ⇒∗

G aanbnb. Therefore S ⇒G aSb ⇒∗G an+1bn+1, as was to be shown.


In the example above we called to Lemma 3.15 below to split and combine productionsequences. This technical lemma summarizes the context independence of the machineryintroduced and is used in many situations.

Lemma 3.15. Let G = (V, T, R, S) be a context-free grammar.

(a) Let x, x′, y, y′ ∈ (V ∪ T )∗. If x ⇒nG x′ and y ⇒m

G y′ then xy ⇒n+mG x′y′.

(b) Let k > 1, X1 . . . , Xk ∈ V ∪T , n1, . . . , nk > 0, and x1, . . . , xk ∈ (V ∪T )∗. IfX1 ⇒n1

G x1, . . ., Xk ⇒nk

G xk, thenX1 · · ·Xk ⇒nG x1 · · ·xk where n = n1+· · ·+nk.

(c) Let X1 . . . , Xk ∈ V ∪T and x ∈ (V ∪T )∗. If X1 · · ·Xk ⇒nG x then there exist

n1, . . . , nk > 0, and x1, . . . , xk ∈ (V ∪T )∗ such that n = n1+ · · ·+nk, X1 ⇒n1

G x1,. . ., Xk ⇒nk

G xk, and x = x1 . . . xk.

Proof. (a) By induction n+m, if x ⇒nG x′ and y ⇒m

G y′ then xy ⇒n+mG x′y′. Basis,

n+m = 0: Trivial. We have x = x′, y = y′ and xy ⇒0G xy. Induction step, n+m+1:

If x ⇒G x ⇒nG x′ and y ⇒m

G y′ we have x = x1Ax2, x = x1xx2 for a production ruleA → x of G. By induction hypothesis xy ⇒n+m

G x′y′. We also have xy = x1Ax2y ⇒G

x1xx2y = xy. Thus xy ⇒n+m+1G x′y′. If y ⇒G y ⇒m

G y′, then by similar reasoning itfollows that xy ⇒G xy and xy ⇒n+m

G x′y′. From this it follows that xy ⇒n+m+1G x′y′,

as was to be shown.

(b) By induction on k, using part (a).

(c) By induction on n, if X1 · · ·Xk ⇒nG x then X1 ⇒n1

G x1, . . ., Xk ⇒nk

G xk, andx = x1 · · ·xk for suitable n1, . . . , nk, and x1, . . . , xk. Basis, n = 0: If X1 · · ·Xk ⇒0

G x,then X1 · · ·Xk = x. Choose x1 = X1, . . ., xk = Xk and n1, . . . , nk = 0. Then clearlyn1 + · · · + nk = 0 = n, Xi ⇒∗

G xi, for 1 6 i 6 k, and x1 · · ·xk = X1 · · ·Xk = x.Induction step, n+ 1: Suppose

X1 · · ·Xk ⇒G X1 · · ·Xi−1Y1 · · ·YℓXi+1 · · ·Xk ⇒nG x

for some i, 1 6 i 6 k, such that Xi ⇒G Y1 · · ·Yℓ. By induction hypothesis exist indicesn1, . . . , ni−1, m1, . . . ,mℓ, ni+1, . . . , nk, and strings x1, . . . , xi−1, y1, . . . , yℓ, xi+1, . . . , xksuch that Xj ⇒

nj

G xj for 1 6 j < i or i < j 6 k, Yh ⇒mh

G yh for 1 6 h 6 ℓ, andn = n1+ · · ·+ni−1+m1+ · · ·+mℓ+ni+1+ · · ·+nk, and x = x1 · · ·xi−1y1 · · · yℓxi+1 · · ·xk.Put m = m1+· · ·+mℓ, ni = m+1, and xi = y1 · · · yℓ. Then we have n+1 = n1+· · ·+nk,and

Xi ⇒G Y1 · · ·Yℓ ⇒mG y1 · · · yℓ = xi

Thus, by part (b), Xi ⇒ni

G xi. Thus Xj ⇒nj

G xj for 1 6 j 6 k, including i, andx = x1 · · ·xi−1y1 · · · yℓxi+1 · · ·xk = x1 · · ·xk, as was to be shown.

Example 3.16. The so-called parentheses language L( ) ⊆ {( , )}∗ is the language ofstrings of balanced parentheses: for a string of parentheses w = b1 · · · bn ∈ L( ) it holdsthat for an arbitrary prefix v 4 w, say v = b1 · · · bm, 0 6 m 6 n, it holds that #((v) >#)(v), and for w it holds that #((w) = #)(w). Thus, the i-th right parenthesis occurs


only after the i-th left parenthesis, and for the left parenthesis with number i there is aright parenthesis with number i.

We have ()(()) ∈ L( ) by inspection of the string (1)1(2(3)2)3 where we number theleft and right parentheses. But w = ())( and v = (()( are not in L( ) as the annotatedversions (1)1)2(2 and (1(2)1(3 show; for w we have that )2 occurs before (2, while for vwe have that there are 3 left parentheses but only 1 right parenthesis. Note that theempty string ε meets the requirements, although it has no parenthesis at all.

A possible grammar G for L( ) is given by the rules

S → ε, S → SS, S → (S)

With respect to G we have the production sequence

S ⇒G SS ⇒G S(S) ⇒G S((S)) ⇒G S(()) ⇒G (S)(()) ⇒G ()(())

for the string ()(()) ∈ L( ), but also the production sequence

S ⇒G SS ⇒G (S)S ⇒G ()S ⇒G ()(S) ⇒G ()((S)) ⇒G ()(())

as well as

S ⇒G SS ⇒G (S)S ⇒G (S)(S) ⇒G ()(S) ⇒G ()((S)) ⇒G ()(())

There are several more derivations for ()(()) with respect to G.

Definition 3.17. Let G = (V, T, R, S) be a context-free grammar. A productionγ ⇒G γ′ is called a leftmost production if γ′ is obtained from γ by application of aproduction rule of G on the leftmost variable occurring in γ, i.e., γ = wAβ, A → α a

rule of G, γ′ = wαβ for some w ∈ T ∗, A ∈ V , and α, β ∈ (V ∪T )∗. Notation, γℓ⇒G γ′. A

leftmost derivation of G is a production sequence (γi)ni=0 of G for which every production

is leftmost. Notation, γℓ⇒∗

G γ′.

Similarly one can define the notion of a rightmost production and a rightmost derivationfor a CFG G.

We often write γℓ⇒∗ γ′ rather than γ

ℓ⇒∗

G γ′, if the grammar G is clear from thecontext. The notion of a leftmost derivation is useful to select a particular productionsequence from the many that may produce a string. However, it is not generally thecase that there exists precisely one leftmost derivation sequence from a string γ to astring γ′. Although, in a leftmost derivation there is no freedom to select the variablethat is used for the production, there may be freedom to select the production rule touse. Consider, for example the grammar

S → AB, A → aa | aac, B → bb | cbb


which allows four complete leftmost derivations starting from S, viz.

Sℓ⇒ AB

ℓ⇒ aaB

ℓ⇒ aabb

Sℓ⇒ AB

ℓ⇒ aaB

ℓ⇒ aacbb

Sℓ⇒ AB

ℓ⇒ aacB

ℓ⇒ aacbb

Sℓ⇒ AB

ℓ⇒ aacB

ℓ⇒ aaccbb

Note, there are two leftmost derivations of the string aacbb.

In the next section we will show that if there is a production sequence from γ to γ′

for some grammar G, then there is also a leftmost derivation from γ to γ′ for G, i.e. if

γ ⇒∗G γ′ then γ

ℓ⇒∗

G γ′.

Theorem 3.18. If L is a regular language, then L is also context-free.

Proof. Let D = (Q, Σ, δ, q0, F ) be a deterministic automaton that accepts L. Definethe grammar G = (Q, Σ, R, q0) where R is given by

R = { q → aq′ | δ(q, a) = q′ } ∪ { q → ε | q ∈ F }

We claim that L = L(G).

Suppose w ∈ L and w = a1a2 . . . an. Let (q0, a1a2 · · · an) ⊢D (q1, a2 · · · an) ⊢D

· · · ⊢D (qn−1, an) ⊢D (qn, ε) with qn ∈ F be an accepting transition sequence of Dfor w. Then we have a production sequence

γ0 ⇒G γ1 ⇒G · · · ⇒G γn+1

of G for w with γ0 = q0, γi = a1 · · · aiqi for 0 6 i 6 n, and γn+1 = a1 · · · an.Since δ(qi−1, ai) = qi we have qi−1 → aiqi, and hence γi−1 = a1 · · · ai−1qi−1 ⇒G

qa1 · · · ai−1aiqi = γi for 1 6 i 6 n and γn = a1 · · · anqn ⇒G a1 · · · an = γn+1. Thusw ∈ L(G) and L ⊆ L(G).

Suppose w ∈ L(G). Say γ0 ⇒G γ1 ⇒G · · · ⇒G γn+1 for some n > 0 with γ0 = q0 andγn+1 = w. Then there exist a1, . . . , an ∈ Σ and q0, . . . , qn ∈ Q such that γi = a1 · · · aiqi,for 0 6 i 6 n and γn+1 = w = a1 · · · an. Moreover, δ(qi−1, ai) = qi, for 1 6 i 6 n, andqn ∈ F . Therefore we have

(q0, a1a2 · · · an) ⊢D (q1, a2 · · · an) ⊢D · ⊢D (qn−1, an) ⊢D (qn, ε) and qn ∈ F

It follows that w ∈ L and L(G) ⊆ L.

As we shall see later, the language L( ) of Example 3.16 is not regular. So, the reverseof Theorem 3.18 does not hold.



Exercise 3.2.1. (Hopcroft, Motwani & Ullman 2001) Consider the context-free gram-mar G given by the production rules

S → XbYX → ε | aXY → ε | aY | bY

that generates the language of the regular expression a∗b(a + b)∗. Give leftmost andrightmost derivations for the following strings:

(a) aabab;

(b) baab;

(c) aaabb.

Exercise 3.2.2. Consider the context-free grammar G given by the production rules

S → A | BA → ε | aAB → ε | bB

(a) Put LA = { an | n > 0 }. Prove that LG(A) = LA.

(b) Put L = { an | n > 0 } ∪ { bn | n > 0 }. Prove that L(G) = L.

Exercise 3.2.3. Give a context-free grammar for each of the following languages andprove them correct.

(a) L1 = { anbm | n,m > 0, n 6= m };

(b) L2 = { anbmcℓ | n,m, ℓ > 0, n 6= m ∨ m 6= ℓ };

Exercise 3.2.4. Give a construction, based on the number of operators, that shows thatthe language of every regular expression can be generated by a context-free grammar.

Exercise 3.2.5. Consider the context-free grammar G given by the production rules

S → aS | Sb | a | b

Put X = { anSbm, anbm | n,m > 0, n+m > 1 }.

3.3. CHOMSKY NORMAL FORM 15

(a) Prove S ⇒kG x implies x ∈ X , for x ∈ {S, a, b}∗.

(b) Prove that no string w ∈ L(G) has a substring ba.

Exercise 3.2.6. Put L = { w ∈ {a, b}∗ | #a(w) = #b(w) }.

(a) (Hopcroft, Motwani & Ullman 2001) Consider the context-free grammar G givenby the production rules

S → aSbS | bSaS | ε

Prove that L(G) = L.

(b) Give an alternative CFG G′ with four production rules such that L(G′) = L.

3.3 Chomsky normal form

In the previous section several examples of context-free grammars have been given. Inthis section we aim to come up with a specific format of a context-free grammar, calledChomsky normal form, that is a convenient starting point for algorithmic purposes.Subsequently, we will discuss (i) how to eliminate ε-productions A → ε, (ii) how toeliminate so-called unit rules A → B, and (iii) how to identify useless variables, allwithout affecting the language generated by the grammar. These procedures togetheryield a compact context-free grammar which is then easily converted to Chomsky normalform.

Elimination of ε-productions

The CNFG = ({S,A}, {a, b}, {S → A | a | b, A → ε}, S) generates the language L(G) ={ε, a, b}. However, simply deleting the ε-production A → ε from the set of productionrules yields the grammar G′ = ({S}, {a, b}, {S → a | b}, S), which generates a differentlanguage, L(G′) = {a, b}. So, in order to eliminate ε-productions propertly, we need togo about it slightly more cautiously. In this respect, the relevant concept is the definitionof a nullable variable.

Definition 3.19. A variable A of a context-free grammarG is called nullable if A ⇒∗G ε.

The set of nullable variables of G is denoted by Null(G).

Example 3.20. Consider the CNF G = ({S,A,B,C,D}, {a, b, c}, R, S) with produc-tions

S → AB|c, A → ε, B → ε, C → ε|c, D → ab

Clearly, the variables A, B and C are nullable. Also the start symbol S is nullable sinceS ⇒2

G ε. The variable D is not nullable. So, Null(G) = {S,A,B,C}.


Lemma 3.21. Let G = (V, T, R, S) be a CFG. Define the subsets Nulli ⊆ V by

Null0=∅

Nulli+1=Nulli ∪ {A | A → X1 · · ·Xℓ ∈ R, ∀j, 1 6 j 6 ℓ : Xj ∈ Nulli }

for i > 0, and put Null = ∪∞i=0 Nulli. Then it holds that Null = Null(G).

Proof. (Null ⊆ Null(G)) If suffices to prove that Nulli ⊆ Null(G) by induction on i. Ba-sis, i=0: Clear, since Null0 = ∅. Induction step, i+1: Suppose A ∈ Nulli+1\Nulli. Thenthere is a rule A → X1 · · ·Xℓ in R with X1, . . . , Xℓ ∈ Nulli. By induction hypothesis,X1, . . . , Xℓ ∈ (G). Thus, Xj ⇒∗

G ε for 1 6 j 6 ℓ. Therefore, A ⇒G X1 · · ·Xℓ ⇒∗G ε.

Hence A ∈ Null(G).(Null(G) ⊆ Null) It suffices to prove that A ⇒k

G ε implies A ∈ Nullk by inductionon k. Basis, k=0: Trivial, this situation does not occur. Induction step, k+1: Suppose

A →G X1 · · ·Xℓ ⇒∗G ε. Then Xj ∈ V and Xj ⇒

kjG ε for some kj > 0, for 1 6 j 6 ℓ,

such that k1 + · · · + kℓ = k. By induction hypothesis, Xi ∈ Nullki , hence Xi ∈ Nullk.Since A → X1 · · ·Xℓ is a rule of G, it follows that A ∈ Nullk+1.

Lemma 3.22. Let G = (V, T, R, S) be a CFG. Suppose B → ε is a rule in R. Definethe set of rules R′ by

R = {A → α1 · · ·αn | A → X1 · · ·Xn ∈ R,

∀i, 1 6 i 6 n : αi = Xi ∨ (Xi ∈ Null(G) ∧ αi = ε) }

and the CFG G′ by G′ = (V, T, R′, S) where R′ = R\{A → ε | A ∈ V }. Then it holdsthat L(G)\{ε} = L(G′)\{ε}.

Proof. First we prove the following claim, for arbitrary A ∈ V and w ∈ T ∗, w 6= ε.

A ⇒∗G w iff A ⇒∗

G′ w (3.2)

Proof of the claim: (⇒) By induction on n, for w 6= ε, if A ⇒nG w then A ⇒∗

G′ w.Basis, n = 0: Trivial. This situation doesn’t occur. Induction step, n + 1: SupposeA →G X1 · · ·Xk, Xi ⇒ni

G wi for some ni > 0 and wi ∈ T ∗, for 1 6 i 6 k, such thatn1 + · · ·nk = n and w1 · · ·wk = w (see Lemma 3.15c). Put, for 1 6 i 6 k, αi = ε ifwi = ε and αi = Xi if wi 6= ε. Note, if wi = ε then Xi ∈ Null(G). Therefore, we havethat the rule A →G X1 · · ·Xk of G induces a rule A →G′ α1 · · ·αk in R. Moveover,not all wi are equal to ε, since w1 · · ·wk = w 6= ε. Thus, also α1 · · ·αk 6= ε. Hence,A →G′ α1 · · ·αk is a rule of G′. By induction hypothesis, we have Xi ⇒∗

G′ wi, for1 6 i 6 k such that wi 6= ε. Therefore we have αi ⇒∗

G′ wi, for 1 6 i 6 k, whereαi ⇒∗

G′ wi is the empty production sequence ε ⇒∗G′ ε if wi = ε. By Lemma 3.15b we

obtain α1 · · ·αk ⇒∗G′ w1 · · ·wk. Thus,

A ⇒∗G′ α1 · · ·αk ⇒∗

G′ w1 · · ·wk = w

which completes the induction step.


(⇐) By induction on n, for w 6= ε, if A ⇒nG′ w then A ⇒∗

G w. Basis, n = 0: Trivial.This situation doesn’t occur. Induction step, n+ 1: Suppose A →G′ Y1 · · ·Yℓ for somevariables Y1, . . . , Yℓ ∈ V such that Yi ⇒ni

G′ wi for some ni > 0, wi ∈ T ∗, for 1 6 i 6 ℓ,with n1 + · · ·nℓ = n and w1 · · ·wℓ = w (see Lemma 3.15c). Note, wi 6= ε, for 1 6 i 6 ℓ,since G′ has no ε-productions.

By definition of R, there exist, for some k > 0, variables X1, . . . , Xk in V and amonotone injection f : {1, .., ℓ} → {1, .., k} such that A →G X1 · · ·Xk and (i) Yi =Xf(i), for 1 6 i 6 ℓ, and (ii) Xj ∈ Null(G) if j /∈ Img(f) = { f(i) | 1 6 i 6 ℓ }. Inthe first case, we have Xf(i) ⇒ni

G′ wi 6= ε. Thus Xf(i) ⇒∗G wi by induction hypothesis.

In the second case, we have Xj ⇒∗G ε since Xj ∈ Null(G). Therefore, we can pick

w′1, . . . , w

′k ∈ T ∗ such that Xj ⇒∗

G w′j , for 1 6 j 6 k, and w′

1 · · ·w′k = w1 · · ·wℓ = w:

put w′j = wi if f(i) = j ∈ {1, .., ℓ}, put w′

j = ε if j /∈ Img(f). It follows that

A →G X1 · · ·Xk →G w′1 · · ·w

′k = w1 · · ·wℓ = w

by Lemma 3.15b, which proves the claim.Now, let S◦ be a fresh variable. Put V ◦ = V ∪ {S◦}. Define R◦ = R′ ∪ {S◦ → S} ∪

{ S◦ → ε | S ∈ Null(G) }. Finally, define G◦ by G◦ = (V ◦, T, R◦, S◦). We showthat L(G) = L(G◦). We distinguish four cases: (i) Suppose w ∈ L(G), w 6= ε. ThenS ⇒∗

G w, and S ⇒∗G′ w by the claim. Since R′ ⊆ R◦ and S◦ ⇒G◦ S, it follows that

S◦ ⇒∗G◦ w and w ∈ L(G◦). (ii) Suppose w ∈ L(G◦), w 6= ε. Then S◦ ⇒∗

G◦ w and, bydefinition of R◦, S ⇒∗

G′ w. By the claim it follows that S ⇒∗G s. Hence, w ∈ L(G).

(iii) If ε ∈ L(G), then S ⇒∗G ε. Thus, S ∈ Null(G), S◦ ⇒G◦ ε and ε ∈ L(G◦). (iv) If

ε ∈ L(G◦), then S◦ ⇒∗G◦ ε. Assume S◦ ⇒G◦ S ⇒∗

G◦ ε. Then we have S ⇒∗G′ ε. But

the set of rules R′ of G′ has no ε-productions, which yields a contradiction. It followsthat S →G◦ ε, thus, by definition of R◦, S ∈ Null(G). We conclude S ⇒∗

G ε andε ∈ L(G).

Example 3.23. Consider the CNF G = ({S,A,B,C}, {a, b}, R, S) where

S → ABC, A → aA | ε, B → bB | ε, C → c | ε,

The set of production rules R, obtained from R according to the lemma, is given by

S → ABC | AB | AC | BC | A | B | C | εA → aA | a | εB → bB | b | εC → c | ε

which yields the following set of productions with non-empty righthand-side

S → ABC | AB | AC | BC | A | B | CA → aA | aB → bB | bC → c

The latter CFG has no ε-productions, so its language does not contain ε, although ε wasin the language of the original grammar.


Eliminating unit productions

The next transformation on context-free grammars we discuss is the elimination of so-called unit productions. These are productions of the form A → B, for two variables Aand B.

Definition 3.24. A unit production of a context-free grammar G = (V, T, R, S) is aproduction A → B ∈ R where A,B ∈ V .

The idea is to combine a sequence of unit productions A0 → A1, A1 → A2, . . ., Am−1 →Am with an other production Am → α, where α is not a variable.

Lemma 3.25. Let G = (V, T, R, S) be a CFG. Define the set of productions R′ by

R′ = {A → α | ∃m > 0∃A0, . . . , Am ∈ V : A = A0,

Ai−1 → Ai ∈ R for 1 6 i 6 m, Am → α ∈ R }\{A → B | A,B ∈ V }

Define the CFG G′ by G′ = (V, T, R′, S). Then it holds that L(G) = L(G′).

Proof. We first prove the following claim:

A ⇒∗G w =⇒ A ⇒∗

G′ w

for A ∈ V and w ∈ T ∗.(⇒) We show, if A ⇒n

G w then A ⇒∗G′ w, for A ∈ V and w ∈ T ∗, by induction

on n. Basis, n=0: Trivial. This situation doesn’t occur. Induction step, n+1: SupposeA ⇒n+1

G w. By focussing on the first non-unit production in the derivation of w from Awe obtain derivation by

A = A0 ⇒G A1 · · · ⇒G Am ⇒G X1 · · ·Xk ⇒ℓG w

for suitable m > 0, A0, . . . , Am ∈ V such that and m + 1 + ell = n + 1. Then it holdsthat A → X1 · · ·Xk ∈ R′. Moreover, we can find ℓ1, . . . , ℓk > 0 and w1, . . . , wk ∈ T ∗

such that Xi ⇒ℓiG wi for 1 6 i 6 k, ℓ1 + · · · + ℓk, and w1 · · ·wk = w. We have either

Xi ∈ T and Xi = wi, or Xi ∈ V and Xi ⇒∗G′ wi by induction hypothesis. Therefore,

A ⇒∗G′ X1 · · ·Xk ⇒∗

G′ w1 · · ·wk = w

which proves the claim. From the claim we obtain, if S ⇒∗G w then S ⇒∗

G′ w. HenceL(G) ⊆ L(G′).

(⇐) We show, if A ⇒nG′ w then A ⇒∗

G w, for A ∈ V and w ∈ T ∗, by inductionon n. Basis, n=0: Trivial. This situation doesn’t occur. Induction step, n+1: SupposeA ⇒n+1

G′ w. Then we have A ⇒G′ X1 · · ·Xk ⇒nG′ w. Thus for suitable n1, · · · , nk > 0

and w1, . . . , wk ∈ T ∗ we have Xi ⇒ni

G′ wi, for 1 6 i 6 k, n1 + · · · + nk = n, andw1 · · ·wk = w. By definition of R′ we can find A0, · · · , Am ∈ V for some m > 0 suchthat A = A0 ⇒G A1 · · · ⇒G Am ⇒G X1 · · ·Xk. Also, if Xi ∈ T then Xi ⇒∗

G wi

since Xi = wi, and if Xi ∈ V then Xi ⇒G wi by induction hypothesis. We conclude,A ⇒∗

G X1 · · ·Xk ⇒∗G w1 · wk = w by Lemma 3.15c, which proofs the claim. From the

claim we get, if S ⇒nG′ w for some w ∈ T ∗ then S ⇒∗

G w. Hence L(G′) ⊆ L(G).


Note that a production A → α ∈ R, for α /∈ V , are also in R′: take m = 0 in thedefinition of R′. Also note, a sequence of unit production alone, A0 → A1, . . ., Am → Bfor variables A0, . . . , Am and B, does not lead to a production in R′.

From the lemma we derive, for every context-free grammar G there exists an equiv-alent context-free grammar G′ without unit productions. Moreover, if G doesn’t haveε-productions for variables other than the start symbol, then G′ don’t need to have suchproductions either.

Example 3.26. A CFG for arithmetic expressions may have the following productions:

E → T | E + TT → F | T ∗ FF → I | (E)I → a | b | c

Elimination of unit productions as prescribed by Lemma 3.25 yields

E → T ∗ F | (E) | a | b | c | E + TT → (E) | a | b | c | T ∗ FF → a | b | c | (E)I → a | b | c

which has no unit productions indeed.

Identifying useless variables

The final transformation on context-free grammars we discuss concerns the identificationand elilmination of so-called useless symbols. A symbol is useless if either (i) it does notgenerate any terminal string, so doesn’t contribute to the language of the grammar, or(ii) it is never produced by the start symbol, so not encountered in the production of aword of the language.

Definition 3.27. Let G = (V, T, R, S) be a CNF. Let X ∈ V ∪ T be a symbol of G.

(a) X is generating if X ⇒∗G w for some w ∈ T ∗.

(b) X is called reachable if S ⇒∗G αXβ for some strings α, β ∈ (V ∪ T )∗.

(c) X is called useful if X is both generating and reachable.

Note that a terminal a ∈ T of a grammar G = (V, T, R, S) is always a generatingsymbol, since we always have a ⇒∗

G a and a ∈ T ∗. Similarly trivial, the start symbol Sof G is reachable, since S ⇒ αSβ for α, β = ε.

We first focus on finding generating symbols. We construct the set Gen containingthese symbols by iteration.


Lemma 3.28. Let G = (V, T, R, S) be a CNF. Define the sets Geni ⊆ V ∪T , for i > 0,by Gen0 = T and Geni+1 = Geni ∪ { A ∈ V | ∃A → α ∈ R : α ∈ Gen∗

i }. PutGen =

⋃∞i=0 Geni. Then it holds that

X ∈ Gen ⇐⇒ X is generating

for all symbols X ∈ V ∪ T .

Proof. (⇒) We prove by induction on i, if X ∈ Geni then X is generating. Basis, i = 0:Clear, each terminal a ∈ T = Gen0 is generating.

Induction step, i + 1: Suppose X ∈ Geni+1\Geni. Pick a rule A → α ∈ R suchthat X = A and α ∈ Gen∗

i . Say, α = X1 · · ·Xn for n > 0, X1, . . . , Xn ∈ V ∪ T . SinceX1, . . . , Xn ∈ Geni we have that X1, . . . , Xn are generating by induction hypothesis.Pick w1, . . . , wn ∈ T ∗ such that Xk ⇒∗

G wk for 1 6 k 6 n. Then, by Lemma ??,

X ⇒G X1 · · ·Xn ⇒∗G w1 · · ·wn

Because w1 · · ·wn ∈ T ∗ it follows that the symbol X is generating, as was to be shown.(⇐) If X is generating, we can find i > 0 and w ∈ T ∗ such that X ⇒i

G w. Weprove X ∈ Geni by induction on i. Basis, i = 0: We have X = w ∈ T ∗, hence X ∈ T .Therefore, X ∈ Gen0.

Induction step, i + 1: Since X ⇒i+1G we can find a rule A → X1 · · ·Xn ∈ R, where

n > 0, X1, . . . , Xn ∈ V ∪T , such that A = X and Xk ⇒kiG wk for 0 6 ki 6 i and wk ∈ T ∗

for 1 6 k 6 n, again with appeal to Lemma ??. By induction hypothesis, Xk ∈ Geni for1 6 k 6 n. Because of rule A → X1 · · ·Xn of G and A = X it follows that X ∈ Geni+1,as was to be shown.

Note, the set of symbols Gen satisfies Gen = Gen ∪{A ∈ V | ∃A → α ∈ R : α ∈ Gen∗ }.We use Gen(G) to denote the set of generating symbols of a CFG G.

Theorem 3.29. Let G = (V, T, R, S) be a CNF such that L(G) 6= ∅. Let the setGen ⊆ V ∪ T be as given by Lemma 3.28. Define the CNF G′ = (V ′, T, R′, S) byV ′ = V ∩ Gen and R′ = { A→α ∈ R | A ∈ Gen, α ∈ Gen∗ }. Then it holds thatL(G) = L(G′), and G′ has generating symbols only.

Proof. Since L(G) 6= ∅ it holds that S ⇒∗G w for some w ∈ T ∗. Hence, S is generating,

and S ∈ V ′ = V ∩Gen.Since R′ ⊆ R we have L(G′) ⊆ L(G). To show the reverse, L(G) ⊆ L(G′), we claim:

if X ⇒kG w then X ⇒∗

G′ w (3.3)

for all X ′ ∈ V ′ ∪ T , k > 0, and w ∈ T ∗. Proof of the claim Induction on k. Basis, k = 0:Clear, if X ⇒0

G w then X = w and X ⇒0G′ w by definition of ⇒0

G′ .Induction step, k+1: Suppose X ⇒G X1 · · ·Xn ⇒k

G w for a rule X→X1 · · ·Xn ∈ R,where X1, . . . , Xn ∈ V ∪ T , n > 0. We can find w1, . . . , wn ∈ T ∗, 0 6 k1, . . . , kn 6 k suchthat Xi ⇒

kiG wi for 1 6 i 6 n, and w1 · · ·wn = w. By induction hypothesis, Xi ⇒

∗G′ wi,


1 6 i 6 n. Note, X1 · · ·Xn ∈ (V ∪ T )∗ ∩Gen. Since X ⇒k+1G w ∈ T ∗ we have that X ∈

V ∩Gen, thus X→X1 · · ·Xn ∈ R′. It follows that X ⇒G′ X1 · · ·Xn ⇒∗G′ w1 · · ·wn = w.

This proves the claim.It holds that S ∈ V ′. From the claim, instantiating X by S, we obtain, if S ⇒∗

G wthen S ⇒∗

G′ w for w ∈ T ∗, Thus, if w ∈ L(G) then w ∈ L(G′), i.e. L(G) ⊆ L(G′).

With the help of Lemma 3.28 and Theorem 3.29 we can construct a CFG with generatingvariables only for every CFG that has a non-empty language.

Example 3.30. Put G = ({S,A,B,C}, {a, b}, R, S) where the set of productions R isgiven by

S→AB | AC, A→BC | a, C→ε

Then we haveGen0= {a, b}Gen1= {a, b} ∪ {A,C} = {A,C, a, b}Gen2= {A,C, a, b} ∪ {S,A,C} = {S,A,C, a, b}

It follows that Gen = {S,A,C, a, b}. The construction of the CNF G′ = (V ′, T, R′, S),as given by Theorem 3.29 yields V ′ = {S,A,C} and R′ = {S→AC, A → a, C → ε}.We have that L(G) = L(G′) = {a}.

We proceed by defining a procedure to eliminate non-reachable symbols from a CNF. Ifthe starting CNF is non-empty and has generating symbols only, the resulting CNF hasonly symbols that are both generating and reachable, i.e. symbols that are useful.

Lemma 3.31. Let G = (V ′, T, R′, S) be a CNF. Define the sets Reachi ⊆ V ′ ∪ T ,for i > 0, by Reach0 = {S} and

Reachi+1=Reachi ∪ {X ∈ V ′ ∪ T |

∃A ∈ Reachi ∃α, β ∈ (V ′ ∪ T )∗ : A→αXβ ∈ R′ }

Put Reach =⋃∞

i=0 Reachi. Then it holds that

X ∈ Reach ⇐⇒ X is reachable

for all symbols X ∈ V ′ ∪ T .

Proof. (⇒) By induction on i, if X ∈ Reachi then X is reachable. Basis, i = 0: SinceS ⇒∗

G′ αSβ for α, β = ε we have that the start symbol S is reachable. Inductionstep, i + 1: Suppose X ∈ Reachi+1\Reachi. Pick A→γ ∈ R′ such that A ∈ Reachi

and γ = αXβ for some α, β ∈ (V ′ ∪ T )∗. By induction hypothesis, the variable A isreachable, i.e. S ⇒∗

G′ α′Xβ′ for suitable α′, β′ ∈ (V ′∪T )∗. Then also S ⇒∗G′ α′αXββ′,

thus X is reachable, as was to be shown.(⇐) If a symbol X ∈ V ′ ∪ T is reachable, we have S ⇒k

G′ αXβ for suitable stringsα, β ∈ (V ′∪T )∗ and some k > 0. We prove, if S ⇒k

G′ αXβ then X ∈ Reach by inductionon k. Basis, k = 0: We have X = S and X ∈ Reach0 ⊆ Reach by definition.


Induction step, k + 1: Suppose S ⇒kG′ γ ⇒G′ αXβ. Then we can find A ∈ V ′,

α′, α′′, β′, β′′ ∈ (V ′∪T )∗ such that γ = α′Aβ′, A→α′′Xβ′′ ∈ R′ and α = α′α′′, β = β′β′′.Note, the symbol A is reachable. Therefore, by induction hypothesis, we have A ∈ Reach,say A ∈ Reachi for some i > 0. Since A→α′′Xβ′′ ∈ R′, it follows that X ∈ Reachi+1

and hence X ∈ Reach.

Now we have a means to identify reachable symbols, we are able to transform a CFGwith generating symbols into a CFG without useless symbols.

Theorem 3.32. Let G′ = (V ′, T, R′, S) be a CNF with generating symbols only suchthat L(G′) 6= ∅. Let the set Reach ⊆ V ∪ T be as given by Lemma 3.31. Define theCNF G′′ = (V ′′, T, R′′, S) by V ′′ = V ′∩Reach and R′′ = {A→γ ∈ R′ | A ∈ Reach, γ ∈Reach∗ }. Then it holds that L(G′) = L(G′′), and G′ has useful symbols only.

Proof. Since R′′ ⊆ R′ we have L(G′′) ⊆ L(G′). In order to show L(G′) ⊆ L(G′′) we firstprove the following claim:

if X ⇒nG′ w then X ⇒∗

G′′ w (3.4)

for all X ∈ Reach(G′) n > 0, and w ∈ T ∗.Proof of the claim Induction on n. Basis, n = 0: We have X = w thus X ⇒∗

G′′ w.Induction step, k + 1: Suppose X ⇒n+1

G′ w. Pick symbols X1, . . . , Xk ∈ V ′′ ∪ T ,w1, . . . , wk ∈ T ∗, n1, . . . , nk > 0 such that X → X1 · · ·Xk ∈ R′, Xi ⇒ni

G′ wi for1 6 i 6 k, n1 + · · · + nk = n and w1 · · ·wk = w. Since X ∈ Reach(G′) we haveX1, . . . , Xk ∈ Reach(G′), thus X → X1 · · ·Xk ∈ R′′ and Xi ⇒∗

G′′ wi for 1 6 i 6 k byinduction on n. We conclude X ⇒∗

G′′ , as was to be shown.Since S ∈ Reach(G′) we obtain L(G′) ⊆ L(G′′) by instantiating Equation (3.4) for S.

Let A ∈ V ′′ be a variable of G′′. Then A ∈ Gen(G′) by construction. Thus A ⇒∗G′ w

for some w ∈ T ∗. By Equation (3.4) also A ⇒∗G′′ w. Hence A ∈ Gen(G′′). It is left to

the reader to verify S ⇒∗G′′ αAβ implies A ∈ Reach(G′′), for A ∈ V ′′, α, β ∈ (V ∪ T )∗.

Having this, we see for A ∈ V ′′ that both A ∈ Gen(G′′) and A ∈ Reach(G′′). Hence,each variable of G′′ is useful.

Example 3.33. Consider the CFG G = ({S,A,B,C}, {a, b, c, d}, R, S) where R is theset of productions given by

S → A | d, A → AB, B → ε, C → S | c

Then we have

Reach0= {S}Reach1= {S} ∪ {A, d} = {S,A, d}Reach2= {S,A, d} ∪ {A, d,A,B} = {S,A,B, d}Reach3= {S,A,B, d} ∪ {A, d,A,B} = {S,A,B, d}

Thus Reach(G) = {S,A,B, d}. Restriction to reachable symbols, or rather reachablevariables, yields the CFG G′ = ({S,A,B}, {a, b, c, d}, R′, S) where the set of produc-tion R′ is given by

S → A | d, A → AB, B → ε


Since L(G′) = {d} we must have L(G) = {d} too, by the theorem.

With the use of Lemma 3.28 up to Theorem 3.32 we can devise a procedure to con-struct from a CFG G, with non-empty language, an equivalent CFG G′′ with usefulvariables only. Hence deleting productions that do not contribute to the production ofany sequence in the language of G. Starting from G = (V, T, R, S) with L(G) 6= ∅, theprodure is as follows:

1. Compute Gen(G).

2. Put G′ = (V ′, T, R′ S) with set of variables V ′ = V ∩Gen(G) and set of produc-tions R′ = {A → γ ∈ R | γ ∈ Gen(G)∗ }.

3. Compute Reach(G′).

4. Put G′′ = (V ′′, T, R′′ S) with set of variables V ′′ = V ′ ∩ Reach(G′) and set ofproductions R′′ = {A → γ ∈ R′ | A ∈ Reach(G′) }.

Combination of the results above yields that L(G′′) = L(G) and G′′ has useful symbolsonly.

The order in which we eliminate non-generating symbols and non-reachable variablesis important. We need to consider generating symbols first and reachable variables next.Consider the CFG G = ({S,A,B,C}, {a, b, c}, R, S) with set of productions R given by

S → AB | c, A → a, C → c

Doing things in the right order, we have Gen(G) = {S,A,C, a, b, c}. Thus, G′ =({S,A,C}, {a, b, c}, R′, S) with set of productions R′ given by

S → c, A → a, C → c

Since Reach(G′) = {S} we obtainG′′ = ({S}, {a, b, c}, R′′, S) with set of productions R′′

consisting of one production

S → c

However, doing things in the reversed order, we have Reach(G) = {S,A,B, c}. Hence,G′ = ({S,A,B}, {a, b, c}, R′, S) with set of productions R′ given by

S → AB | c, A → a

Next, Gen(G′) = {S,A, a, b, c}. Therefore now G′′ = ({S,A}, {a, b, c}, R′′, S) with setof productions R′ given by

S → c, A → a

a grammar which still contains a useless symbol, viz. the variable A.


Chomsky normal form

So far we have seen transformations to eliminate ε-productions, unit productions, anduseless symbols from a context-free grammar. We push it a little further though, bybringing a CFG into a normal form, viz. Chomsky normal form.

Definition 3.34. A context-free grammar G = (V, T, R, S) is in Chomsky normal formif for all rules A → α ∈ R either (i) there exist variables B,C ∈ V such that α = BC,or (ii) there exists a terminal a ∈ T such that α = a, or (iii) A = S and α = ε.

The constructions we have seen above help to focus on the production of strings in thelanguage of a CFG. As a next step in bringing a grammar in Chomsky normal form wedescribe how to adapt a CFG such that the righthand-side of a production is either aterminal or a string of two or more variables. For this we need to introduce additionalvariables and production. However, we gain that we can assume that the CFG is in astandard form, which comes in handy in various situations.

Theorem 3.35. Let G = (V, T, R, S) be a CNF such that γ ∈ T or |γ | > 2, for allA → γ ∈ R. Put U = { a ∈ T | ∃A → γ ∈ R : |γ | > 2 ∧ a ∈ γ }. Let W = {Ba | a ∈ U }be a fresh set of variables, i.e. W ∩ V = ∅ and W ∩ T = ∅. Define the CNF G′ =(V ′, T, R′, S) by V ′ = V ∪W and

R′ = {A → γ | γ ∈ T } ∪ {Ba → a | a ∈ U } ∪

{A → Y1 · · ·Yk | ∃A → X1 · · ·Xk ∈ R ∃i, 1 6 i 6 k :

(∃a ∈ U : Xi = a ∧ Yi = Ba) ∨ (Xi ∈ V ∧ Yi = Xi) }

Then it holds that L(G) = L(G′).

Proof. (L(G) ⊆ L(G′)) We first prove, by induction on n, if A ⇒nG w then A ⇒∗

G′ w, forall w ∈ T ∗. Basis, n = 0: Trivial. Induction step, n+1: Suppose A ⇒G X1 · · ·Xk ⇒n

G

w,Xi ⇒ni

G wi, n1+· · ·+nk = n, and w1 · · ·wk = w. IfX1 · · ·Xk ∈ T , then w = X1 · · ·Xk

and A ⇒G′ w ∈ R′. If X1 · · ·Xk /∈ T , put Yi = Xi if Xi ∈ V , and Yi = Ba if Xi = a forsome a ∈ T . Then we have A ⇒G′ Y1 · · ·Yk ⇒∗

G′ X1 · · ·Xn. By induction hypothesis,Xi ⇒∗

G′ wi, for 1 6 i 6 k. Thus A ⇒∗G′ w, as was to be shown.

(L(G′) ⊆ L(G)) Note, each righthand-side of R′ is either a terminal or a string oftwo are more variables. We first prove

A ⇒nG′ w implies A ⇒∗

G w (3.5)

for A ∈ V , n > 0, w ∈ T ∗, by induction on n. Basis, n = 0: Trivial. Induction step,n+1: Suppose A ⇒G′ Y1 · · ·Yk ⇒n

G′ w, Yi ⇒ni

G′ wi for 1 6 i 6 k and n1+ · · ·+nk = n,w1 · · ·wk = w, for k > 2, Y1, . . . , Yk ∈ V ∪W , n1, . . . , nk > 0, and w1, . . . , wk ∈ T ∗. IfYi ∈ V we put Xi = Yi; if Yi ∈ W we put Xi = wi ∈ T . We have that A → X1 . . . Xk

is a production rule of G. Moreover, Xi ⇒∗G wi: if Xi ∈ V then this is by induction

hypothesis; if Xi = wi then is always the case. We conclude A ⇒G X1 · · ·Xk ⇒∗G

w1 · · ·wk = w, which proves the induction step.If we take S for A in Equation (3.5), we see L(G′) ⊆ L(G), as was to be shown.


Theorem 3.36. Let G = (V, T, R, S) be a CNF such that γ ∈ T or γ ∈ V ∗ and |γ | > 2,for all A → γ ∈ R. Put

R1 = {A → γ | |γ | = 1 }, R2 = {A → γ | |γ | = 2 }, R3 = {A → γ | |γ | > 3 }

Choose, for each production rule r : Ar → Br1 · · ·B

rk in R3, a fresh set of variables

{C r1 , . . . , C

rk−2} such that Vr ∩ V = ∅ and Vr ∩ Vr′ = ∅ for two different rules r, r′ ∈ R3.

Put

V ′=V ∪⋃{ Vr | r ∈ R3 }

R′3= {Ar → C r

1Brk , C

r1 → C r

2Brk−1, . . . , C

rk−3 → C r

k−2Br3 , C

rk−2 → Br

1Br2 }

and R′ = R1 ∪R2 ∪R′3. Define G′ = (V ′, T, R′, S). Then it holds that L(G) = L(G′).

Proof. (L(G) ⊆ L(G′)) We first prove, by induction on n,

A ⇒nG w implies A ⇒∗

G′ w (3.6)

for A ∈ V , n > 0, and w ∈ T ∗.Basis, n = 0: Trivial. Induction step, n + 1: Suppose A ⇒n+1

G w. If the firstproduction is of the form A → a ∈ R1 for some a ∈ T , then k = 0, w = a, and alsoA ⇒∗

G w. If the first production is of the form A → B1B2 ∈ R2 then B1 ⇒n1

G w1,B2 ⇒n2

G w2 for suitable n1, n2 > 0, w1, w2 ∈ T ∗ such that n1 + n2 = n and w1w2 = w.Then we have A → B1B2 ∈ R′ and B1 ⇒∗

G′ w1, B2 ⇒∗G

′ w2 by induction hypothesis.Hence, A ⇒∗

G′ B1B2 ⇒∗G′ w1w2 = w. Finally, if the first production is of the form

A → B1 · · ·Bk ∈ R3 for k > 3, then exist n1, . . . , nk > 0, w1, . . . , wk ∈ T ∗ such thatBi ⇒ni

G wi for 1 6 i 6 k, n1 + · · · + nk = n, and w1 · · ·wk = w. Moreover, byconstruction of R′

3, we have A ⇒∗G′ B1 · · ·Bk. By induction hypothesis, Bi ⇒∗

G′ wi

for 1 6 i 6 k. Hence A ⇒∗G′ B1 · · ·Bk ⇒∗

G′ w1 · · ·wk = w. Taking S for A inEquation (3.6) shows L(G) ⊆ L(G′).

(L(G′) ⊆ L(G)) We first prove, by induction on n,

A ⇒nG′ w implies A ⇒∗

G w (3.7)

for A ∈ V , n > 0, and w ∈ T ∗.Basis, n = 0: In this case there is nothing to prove. Induction step, n + 1: If

the first production is of the form A → a ∈ R1, then w = a and also A ⇒∗G′ w. If

the first production is of the form A → B1B2 ∈ R2, then B1 ⇒n1

G w1, B2 ⇒n2

G w2

for some n1, n2 > 0, w1, w2 ∈ T ∗ such that n1 + n2 = n and w1w2 = w. We haveA → B1B2 ∈ R, and Bi ⇒∗

G wi, i = 1, 2, by induction hypothesis. Therefore, A ⇒G

B1B2 ⇒∗G w1w2 = w. If the first production is of the form A → C1Bk ∈ R′

3, thenexists a production A → B1 · · ·Bk ∈ R3 from which A → C1Bk is derived. Moreover,without loss of generality, we can assume that the derivation A ⇒n

G′ w is left-most.Thus A ⇒k−1

G′ B1 · · ·Bk ⇒n−k+1G′ w by construction. Also Bi ⇒ni

G′ wi for 1 6 i 6 k,for suitable n1, . . . , nk > 0, w1, . . . , wk ∈ T ∗ such that n1 + · · · + nk = n − k + 1 andw1 · · ·wk = w. By induction hypothesis Bi ⇒∗

G wi for 1 6 i 6 k. We concludeA ⇒G B1 · · ·Bk ⇒∗

G w1 · · ·wk = w. Again by taking S for A, now in Equation (3.7),we see L(G′) ⊆ L(G).


Example 3.37. Consider the CNF given by

S → ASB | ε A → aAS | a | ε B → bS | A | bb

Elimination of ε-productions yields the grammar

S → ASB | SB | AS | S | ε A → aAS | aS | a B → bS | bb

Eliminiation of unit rules yields the grammar

S → ASB | SB | AS | ε A → aAS | aS | a B → bS | bb

Introduction of auxilliary variables yields the grammar

S → X1B | SB | AS | ε X1 → SBA → XaX2 | XaS | a Xa → a X2 → ASB → XbS | bb Xb → b


Exercise 3.3.1. Consider the CNF G = ({S,A,B,C,D,E}, {a, b}, R, S) where the setof production rules R is given by

S → A | C, A → BB, B → A | ε, C → AD, D → bC, E → ε

Give an equivalent CNF G′ = (V ′, {a, b}, R′, S) with useful symbols only.

Exercise 3.3.2. Consider the CNF G = ({S,A,B,C}, {a, b, c}, R, S) where the set ofproduction rules R is given by

S → A, A → BC | a, B → ε | b, C → ε | c

Give an equivalent CNF G′ = (V ′, {a, b, c}, R′, S) without ε-productions for variablesdifferent from S and without unit rules.

Exercise 3.3.3. Consider the CNF G = ({S,A,B}, {a, b}, R, S) where the set of pro-duction rules R is given by

S → AAA | B, A → aA | B, B → ε

Give an equivalent CNF G′ = (V ′, {a, b, c}, R′, S) which is in Chomsky normal form.

3.4. PARSE TREES 27

S

S S

( S )

ε

( S )

( S )

ε

Figure 3.5: A parse tree representing three (and more) production sequences

3.4 Parse trees

In the previous section, in Example 3.16, we saw various production sequences for thesame string. Basically, the particular production sequences are not very different. It isthe order in which the various rules are applied that varies.

If we visualize the three derivations given in Example 3.16

S ⇒G SS ⇒G S(S) ⇒G S((S)) ⇒G S(()) ⇒G (S)(()) ⇒G ()(())S ⇒G SS ⇒G (S)S ⇒G ()S ⇒G ()(S) ⇒G ()((S)) ⇒G ()(())S ⇒G SS ⇒G (S)S ⇒G (S)(S) ⇒G ()(S) ⇒G ()((S)) ⇒G ()(())

as a rooted tree, we obtain in all three cases the picture given in Figure 3.5. Theend result is the same, although the tree is built up differently. In fact, the order ofindependent productions is abstracted away. Only the causal order remains. Note, aninternal node, i.e. a node that is not a leave, together with its children corresponds to aproduction rule. A leaf labeled ε has no siblings.

We formalize the above idea in the notion of parse tree for a context-free grammar.

Definition 3.38. Let G = (V, T, R, S ) be a context-free grammar. The collectionPTG of parse trees of G, a set of rooted node-labeled trees, is given as follows.

• A single root tree [X] with root labeled X is a parse tree for G if X is a variableor terminal of G.

• A two-node tree [A → ε], with root labeled A and leaf labeled ε is a parse treefor G if A ∈ V and A → ε is a production rule of G.

• If PT 1, PT 2, . . ., PT k are k parse trees for G with roots X1, X2, . . ., Xk,respectively, and A → X1X2 · · ·Xk is a production rule of G, then the tree[A → PT 1,PT 2, . . . ,PT k] with root labeled A and subtrees of the root PT1,PT 2, . . ., PT k is a parse tree for G.


X A

ε

A

PT 1 PT 2 PT 3. . .

Figure 3.6: Parse tree formation

The yield function yield : PTG → (V ∪ T )∗ with respect to G is given by

• yield([X]) = X for X ∈ V ∪T ,

• yield([A → ε]) = ε for A ∈ V ,

• yield([A → PT 1,PT 2, . . . ,PT k]) = yield(PT 1) · yield(PT 2) · . . . · yield(PT k) forA ∈ V , PT 1, . . . ,PT k ∈ PTG.

A parse tree PT for G is called complete if yield(PT ) ∈ T ∗.

For a context-free grammar G , we use PTG(A) to denote the set of all complete parsetrees of G with root labeled A.

S

A

A

B

Ba a b

ε ε

S

S

S

a

a

b

b

Figure 3.7: Complete parse tree yield aab; incomplete parse tree, yield aaSbb.

Next we relate derivations of a context-free grammar and its parse trees.

Theorem 3.39. Let G = (V, T, R, S ) be a context-free grammar. For A ∈ V andw ∈ T ∗, if A ⇒∗

G w then w = yield(PT ) for some complete parse tree PT of G withroot A.

Proof. We prove, for A ∈ V and w ∈ T ∗,

A ⇒nG w implies ∃PT ∈ PTG(A) : w = yield(PT )

3.4. PARSE TREES 29

by induction on n.Basis, n = 0: Since A /∈ T ∗ there is nothing to prove. Induction step, n+1: Suppose

A ⇒G X1 · · ·Xk ⇒nG w for a production rule A → X1 · · ·Xk of G. By Lemma 3.15 we

can find strings w1, . . . , wk ∈ T ∗ such that Xi ⇒∗G wi, for 1 6 i 6 k, and w1 · · ·wk = w.

If Xi ∈ T , for 1 6 i 6 k, we have Xi = wi and we put PT i = [Xi]. If Xi ∈ V , for1 6 i 6 k, we obtain from the induction hypothesis a parse tree PT i of G with root Xi

and yield wi. Consider the parse tree PT = [A → PT 1, . . . ,PT k] obtained from thejuxaposition of the parse trees PT 1, . . . ,PT k and putting the variable A on top. SinceA → X1 · · ·Xk ∈ R this is indeed a parse tree of G, it has root A and is such that

yield(PT ) = yield(PT 1) · · · yield(PT k) = w1 · · ·wk = w

This was to be shown.

Corollary 3.40. For a context-free grammarG with start symbol S it holds that L(G) ={ yield(PT ) | PT ∈ PTG(S) }.

Proof. Direct from the definition of L(G) and the theorem.

We next show that if a string of terminals is the yield of a parse tree, then the string canbe obtained by a leftmost derivation. For the proof of this result we need the followinglemma.

Lemma 3.41. Let G be a context-free grammar.

• Xℓ⇒∗

G α implies wXβℓ⇒∗

G wαβ for all X ∈ (V ∪T ), α, β ∈ (V ∪T )∗, and w ∈ T ∗.

• wαℓ⇒∗

G wβ implies αℓ⇒∗

G β for all w ∈ T ∗, and α, β ∈ (V ∪ T )∗.

Proof. Left as an exercise.

We now turn to the announced result that relates parse trees and leftmost derivations.In its proof we make use of the height of a parse tree. This is the usual height of a tree,i.e. the maximal number of edges of a path from the root to a leaf.

Theorem 3.42. Let G be a context-free grammar. Suppose G has a parse tree PT with

root labeled A and yield w ∈ T ∗. Then there exists a left-most derivation Aℓ⇒∗

G w.

Proof. By induction on the height of the parse tree PT .Basis, n = 1: Then PT has root labeled A and either one leaf labeled ε and w = ε,

or k leaves for some k > 1 labeled a1, . . ., ak and w = a1 · · · ak. In both cases, A → w

is a production rule of G. Since A → w we also have Aℓ⇒∗

G w.Induction step, n+ 1: Then PT has the form [A → PT 1, . . . ,PT k], for some k > 1,

each PT i a parse tree of height at most n say with root labeledXi and yield wi, 1 6 i 6 k,and A → X1 · · ·Xk a production rule of G. Moreover, w = w1 · · ·wk.

We claim the existence of a left-most derivation Aℓ⇒∗

G w1 · · ·wiXi+1 · · ·Xk for 0 6

i 6 k. We prove this claim by induction on i. Basis, i=0: Since G has a production rule


A → X1 · · ·Xk, we also have Aℓ⇒∗

G X1 · · ·Xk, and hence we are done. Induction step,

i + 1: By induction hypothesis for i we have Aℓ⇒∗

G w1 · · ·wiXi+1 · · ·Xk. By induction

hypothesis for n we have Xi+1ℓ⇒∗

G wi+1. With the help of Lemma 3.41 we obtain that

Aℓ⇒∗

G w1 · · ·wiXi+1Xi+2 · · ·Xkℓ⇒∗

G w1 · · ·wiwi+1Xi+2 · · ·Xk

This proves the claim.

By choosing i = k in the claim, we obtain Aℓ⇒∗

G w1 · · ·wk, i.e., Aℓ⇒∗

G w, as was tobe shown.

We summarize the results of this section as the following theorem, which states that fora context-free grammar, the set of strings obtained by arbitrary derivations, the set ofstrings that arise as the yield of a parse tree, and the set of strings obtained by leftmostderivations are all the same set.

Theorem 3.43. Let G = (V, T, R, S ) be a context-free grammar. Then it holds that

L(G) = { yield(PT ) ∈ T ∗ | PT ∈ PTG(S) } = { w ∈ T ∗ | Sℓ⇒∗

G w }

Proof. From Corollary 3.40 we obtain

L(G) ⊆ { yield(PT ) ∈ T ∗ | PT ∈ PTG(S) }

By Theorem 3.42 we have

{ yield(PT ) ∈ T ∗ | PT ∈ PTG(S) } ⊆ { w ∈ T ∗ | Sℓ⇒∗

G w }

By definition of L(G)

{ w ∈ T ∗ | Sℓ⇒∗

G w } ⊆ L(G)

It follows that all set inclusions are equalities, as was to be shown.


Exercise 3.4.1. Consider again the the grammar of Exercise 3.2.1 with production rules

S → XbYX → ε | aXY → ε | aY | bY

Provide parse trees for this grammar with yield aabab, baab, and aaabb.

A context-free grammar G is called ambiguous if there exist two different completeparse trees PT 1 and PT 2 of G such that yield(PT 1) = yield(PT 2). Otherwise G iscalled unambiguous.

3.5. THE CLASS OF CONTEXT-FREE LANGUAGES 31

Exercise 3.4.2. (a) Show that the grammar G given by the production rules

S → AB | aaB A → a | Aa B → b

is ambiguous.

(b) Provide an unambiguous grammar G′ that generates the same language as G.Argue why G′ is unambiguous and why L(G′) = L(G).

Exercise 3.4.3. (a) Show that the grammar G given by the production rules

S → ε | aSbS | bSaS

is ambiguous.

(b) Provide an unambiguous grammar G′ that generates the same language as G.Argue why G′ is unambiguous and why L(G′) = L(G).

3.5 The class of context-free languages

In this section we establish that a language is generated by a context-free grammarprecisely when it is accepted by a push-down automaton. We introduce the notion ofacceptance by empty stack for a PDA and obtain the main result in three steps: (i) fromCFG to PDA, (ii) from PDA to PDA accepting on empty stack, (iii) from PDA acceptingon empty stack to CFG.

Theorem 3.44. Let G = (V, T, R, S ) be a context-free grammar. Then there exists apush-down automaton P such that L(P ) = L(G).

Proof. Define P = ( {q0, q1, q2}, T, V ∪T , ∅,→, q0, {q2} ) with

(1) q0τ [∅/S]−−−−→ q1

(2) q1a[a/ε]−−−−→ q1 for all a ∈ T (matching step)

(3) q1τ [∅/ε]−−−−→ q2

(4) q1τ [A/α]−−−−→ q1 for all A → α ∈ R (production step)

We will prove L(P ) = L(G). For this we need to claim, for all γ ∈ (V ∪T )∗ and w ∈ T ∗,

γℓ⇒∗

G w iff (q1, w, γ) ⊢∗P (q1, ε, ε) (3.8)

Proof of the claim: (⇒) We show by induction on n, γℓ⇒n

G w implies (q1, w, γ) ⊢∗P

(q1, ε, ε). Basis, n = 0: Then we have γ = w. By clause (2) in the definition of P we


have (q1, w, γ) = (q1, w, w) ⊢∗P (q1, ε, ε). Induction step, n+1: Suppose γ

ℓ⇒G γ′

ℓ⇒n

G w.Then γ = vAβ for suitable v ∈ T ∗, A ∈ V , and β ∈ (V ∪T )∗. Then we have γ′ = vαβ for

some A →G α ∈ R. Since γ′ℓ⇒n

G w we have that w = vu and αβℓ⇒n

G u, see Lemma 3.41.Therefore

(q1, w, γ) = (q1, vu, vAβ)

⊢∗P (q1, u, Aβ) by clause (2)

⊢P (q1, u, αβ) by clause (4), A →G α

⊢∗P (q1, ε, ε) by induction hypothesis, αβ

ℓ⇒n

G u

as was to be shown.(⇐) By induction on n, if (q1, w, γ) ⊢n

P (q1, ε, ε) then γℓ⇒∗

G w. Basis, n = 0: Then

we have w = ε and γ = ε. Clearly εℓ⇒∗

G ε.Induction step, n+ 1: Suppose

(q1, w, γ) ⊢P (q1, w′, γ′) ⊢n

P (q1, ε, ε)

We distinguish two cases. Case I, γ = Aβ: We then have (q1, w, γ) ⊢P (q1, w′, γ′) by

clause (4) for a rule A →G α. Thus w = w′ and γ′ = αβ. By induction hypothesis

γ′ℓ⇒∗

G w′, i.e. αβℓ⇒∗

G w. Thus γ = Aβℓ⇒G αβ

ℓ⇒∗

G w. Case II, γ = aβ. Wethen have (q1, w, γ) ⊢P (q1, w

′, γ′) by clause (2). Thus w = aw′ and γ = aγ′. Since

(q1, w′, γ′) ⊢n

P (q1, ε, ε) we have γ′ℓ⇒∗

G w′ by induction hypothesis. Using Lemma 3.41

it follows that aγ′ℓ⇒∗

G aw′, i.e. γℓ⇒∗

G w.We finally prove L(P ) = L(G). Suppose w ∈ L(P ). Thus (q0, w, ε) ⊢∗

P (q2, ε, β) forsome string β ∈ (V ∪ T )∗. By definition of P it follows that

(q0, w, ε) ⊢P (q1, w, S) ⊢∗P (q1, ε, ε) ⊢P (q2, ε, ε)

Since (q1, w, S) ⊢∗P (q1, ε, ε) we have by claim (3.8) S

ℓ⇒∗

G w. Thus w ∈ L(G). Reversely,

suppose w ∈ L(G). Thus Sℓ⇒∗

G w. Again by claim (3.8) we obtain (q1, w, S) ⊢∗P

(q1, ε, ε). Adding the derivation steps (q0, w, ε) ⊢P (q1, w, S) and (q1, ε, ε) ⊢P (q2, ε, ε)we see (q0, w, ε) ⊢∗

P (q2, ε, ε). Thus w ∈ L(P ).

Next we introduce the idea of a PDA accepting on empty stack.

Definition 3.45 (Push-down automaton accepting on empty stack). A push-down au-tomaton accepting on empty stack, also called an empty-stack automaton, is a six-tupleP = (Q, Σ, ∆, D, →P , q0 ) where

1. Q is a finite set of states,

2. Σ is a finite input alphabet,

3. ∆ is a finite data alphabet or stack alphabet,

4. D ∈ ∆ is the stack bottom symbol,


5. →P ⊆ Q× Στ ×∆×∆∗ ×Q, where Στ = Σ ∪ {τ}, is a finite set of transitions orsteps,

6. q0 ∈ Q is the initial state.

Notation and the notion of a configuration extend from a PDA as given by Definition 3.2to a PDA accepting on empty stack. Note, a PDA accepting on empty stack has notransition if the stack is empty. The idea is that, when the stacks becomes empty thePDA blocks and accepts the string read so far. As a consequence we need to adapt thestack in the start situation. Instead of an empty stack, a PDA accepting on empty stackstarts with a stack consisting of the stack bottom symbol D only. No final states needto be identified. Honoring its name, this variant of PDAs accepts on the empty stack.Therefore we have

Definition 3.46. Let P = (Q, Σ, ∆, D, →P , q0 ) be a PDA accepting on empty stack.The language accepted by P on empty stack, notation N (P ), is given by

{ w ∈ Σ∗ | ∃q ∈ Q : (q0, w,D) ⊢∗P (q, ε, ε) }

A PDA accepting on empty stack may terminate in any state. Acceptance or non-acceptance is decided by the stack being empty or not.

0 1 2a[D/1D] b[1/ε]

a[1/11]

b[1/ε]τ [D/ε]

Figure 3.8: Example push-down automaton accepting on empty stack

Example 3.47. Figure 3.8 displays a push-down automaton P accepting on emptystack. We have P = (Q,Σ,∆, D,→P , q0) with Q = {q0, q1, q2}, Σ = {a, b}, ∆ = {1, D}now including the special symbol D that is the stack bottom, and a transition relationgiven by

q0a[D/1D]−−−−−→P q1, q1

a[1/11]−−−−→P q1, q1

b[1/ε]−−−−→P q2, q2

b[1/ε]−−−−→P q2, q2

τ [D/ε]−−−−→P q2

Compared to the standard PDA given in Figure 3.3 there is no final state q3. In fact,there is no final state at all. In the transitions of Figure 3.8 there is no mentioning ofthe empty stack using ∅. Testing for empty stack can now be done by testing for thenew stack symbol D on top (or rather at the bottom) of the stack.

Theorem 3.48. For every push-down automaton P , accepting on final state, exists apush-down automaton P ′ accepting on empty stack such that L(P ) = N (P ′).


Proof. Suppose P = (Q, Σ, ∆, ∅, →P , q0, F ). Define P ′ by

P ′ = (Q ∪ {qend}, Σ, ∆ ∪ {D}, D, →P ′ , q0 )

with a new state qend , a new stack symbol D, and with a transition relation →P ′ givenby

qa[d/x]−−−−→P ′ q′ if q

a[d/x]−−−−→P q′

qa[D/xD]−−−−−→P ′ q′ if q

a[∅/x]−−−−→P q′

qτ [d/x]−−−−→P ′ q′ if q

τ [d/x]−−−−→P q′

qτ [D/xD]−−−−−−→P ′ q′ if q

τ [∅/x]−−−−→P q′

qτ [d/d]−−−−→P ′ qend for q ∈ F and d ∈ ∆ ∪ {D}

qendτ [d/ε]−−−−→P ′ qend for d ∈ ∆

qendτ [D/ε]−−−−→P ′ qend

It holds that (q, w, x) ⊢P (q′, w′, x′) iff (q, w, xD) ⊢P ′ (q′, w′, x′D) for w ∈ Σ∗, q, q′ ∈ Q,and x, x′ ∈ ∆∗. From this it follows that (q0, w, ε) ⊢∗

P (q, ε, x) for some q ∈ F iff(q0, w,D) ⊢∗

P ′ (q, ε, xD) for some q ∈ F iff (q0, w,D) ⊢∗P ′ (qend , ε, ε), since qend can

only be reached in P ′ via a final state of P . We conclude w ∈ L(P ) iff w ∈ N (P ′), andL(P ) = N (P ′) as was to be shown.

The above theorem connects the two types of push-down automata in one direction. Thenext result connects a push-down automaton accepting on empty stack with a context-free grammar.

Theorem 3.49. Let P = (Q, ∆, Σ, →, q0, D) be a push-down automaton accepting onempty stack. There exists a context-free grammar G such that L(G) = N (P ).

Proof. We first define the grammar G. Its terminal alphabet T is the input alphabet Σof P . The variable alphabet V contains a start symbol S and variables of the form [q dq′ ]where q and q′ are states in Q and d is a stack symbol in ∆. Thus

V = {S} ∪ { [q dq′′ ] | q, q′′ ∈ Q, d ∈ ∆ }

The intuition for a variable [q dq′′ ] is that P ′ in state q, for any stack content x, can reachby processing the symbol d on top of the stack dx the state q′′ with stack content x. So,the net change of state is from q to q′′, the net change of the stack is that d is popped.

The rules of G are of three types

• S → [q0Dq′ ] for all q′ ∈ Q

• [q dq′′ ] → a[p0 d1 p1 ][p1 d2 p2 ] · · · [pk−1 dk pk ] if qa[d/d1···dk]−−−−−−−→ q′

with p0, . . . , pk ∈ Q such that p0 = q′ and pk = q′′


• [q dq′′ ] → [p0 d1 p1 ][p1 d2 p2 ] · · · [pk−1 dk pk ] if qτ [d/d1···dk]−−−−−−−→ q′

with p0, . . . , pk ∈ Q such that p0 = q′ and pk = q′′

In order to show the equality of L(G) and N (P ) we prove

[q dq′ ] ⇒∗G w iff (q, w, d) ⊢∗

P (q′, ε, ε) (3.9)

(⇒) We prove, by induction on n, that [q dq′ ] ⇒nG w implies (q, w, d) ⊢∗

P (q′, ε, ε).Basis, n = 1: If [q dq′ ] ⇒G w then [q dq′ ] → w is a rule of G. So, w = a for

some a ∈ Σ and qa[d/ε]−−−−→ q′ or w = ε and q

τ [d/ε]−−−−→ q′. Thus (q, a, d) ⊢P (q′, ε, ε) or

(q, ε, d) ⊢P (q′, ε, ε) and hence (q, a, d) ⊢∗P (q′, ε, ε) or (q, ε, d) ⊢∗

P (q′, ε, ε).Induction step, n+ 1: We have either have

[q dq′ ] ⇒G a[p0 d1 p1 ] · · · [pk−1 dk pk ] ⇒nG w

for some rule [q dq′ ] → a[p0 d1 p1 ] · · · [pk−1 dk pk ] of G, or

[q dq′ ] ⇒G [p0 d1 p1 ] · · · [pk−1 dk pk ] ⇒nG w

for some rule [q dq′ ] → [p0 d1 p1 ] · · · [pk−1 dk pk ] of G. We consider the first case, leavingthe second case to the reader.

By Lemma 3.15 we have w = aw1 . . . wk where [pi−1 di pi ] ⇒∗G wi, 1 6 i 6 k. Thus,

by induction hypothesis, (pi−1, wi, di) ⊢∗P (pi, ε, ε). By Lemma 3.6 we obtain

(pi−1, wiwi+1 · · ·wk, didi+1 · · · dk) ⊢∗P (pi, wi+1 · · ·wk, di+1 · · · dk)

for 1 6 i 6 k. By putting these k derivations together we obtain that

(p0, w1 · · ·wk, d1 · · · dk) ⊢∗P (pk, ε, ε)

Since (q, aw, d) = (q, aw1 · · ·wk, d) ⊢P (p0, w1 · · ·wk, d1 · · · dk) we conclude (q, aw, d)⊢∗P (q′, ε, ε).(⇐) We prove, by induction on n, (q, w, d) ⊢n

P (q′, ε, ε) implies [q dq′ ] ⇒∗G w. Basis,

n = 1: If (q, w, d) ⊢P (q′, ε, ε), we have w = a for some transition qa[d/ε]−−−−→ q′ or w = ε

for some transition qτ [d/ε]−−−−→ q′. Thus G has the production rule [q dq′ ] → a or [q dq′ ] → ε

and [q dq′ ] ⇒∗G w. Induction step, n+ 1: Suppose

(q, w, d) ⊢P (r, v, d1 · · · dk) ⊢nP (q′, ε, ε)

based on the transition qa[d/d1···dk]−−−−−−−→ r. As all of d1, . . . , dk are successively popped of

the stack in this derivation, we can find states r0, . . . rk ∈ Q, strings v0, . . . , vk ∈ Σ∗ suchthat r0 = r, v0 = v, and

(r0, v0, d1 · · · dk) ⊢∗P (r1, v1, d2 · · · dk) ⊢∗

P · · ·

⊢∗P (rk−1, vk−1, dk) ⊢∗

P (rk, vk, ε) ⊢∗P (q′, ε, ε)


Note, since P terminates on empty stack, rk = q′, vk = ε. By Lemma 3.6 we obtain

(r0, v0\v1, d1) ⊢∗P (r1, ε, ε), (r1, v1\v2, d2) ⊢∗

P (r2, ε, ε),

. . . , (rk−1, vk−1\vk, dk) ⊢∗P (rk, ε, ε)

If x < y then x\y is a prefix of x such that x = (y\x)y. Since the length of thesederivations does not exceed n, it follows from the induction hypothesis that

[r0 d1 r1 ] ⇒∗G v0\v1, [r1 d2 r2 ] ⇒∗

G v1\v2 . . . , [rk−1 dk rk ] ⇒∗G vk−1\vk

Hence, by Lemma 3.6,

[r0 d1 r1 ][r1 d2 r2 ] · · · [rk−1 dk rk ] ⇒∗G (v0\v1)(v1\v2) · · · (vk−1\vk)

Since (v0\v1)(v1\v2) · · · (vk−1\vk) = v0\vk = v\ε = v we conclude

[q dq′ ] ⇒∗G a[r0 d1 r1 ][r1 d2 r2 ] · · · [rk−1 dk rk ] ⇒∗

G av = w

The case where(q, w, d) ⊢P (r, w, d1 · · · dk) ⊢n

P (q′, ε, ε)

based on the transition qτ [d/d1···dk]−−−−−−−→ r is left to the reader. This proves the claim.

With the property of Equation (3.9) in place, we finish the proof as follows: w ∈ L(G)iff S ⇒∗

G w iff [q0Dq′ ] ⇒∗G w for some q′ ∈ Q iff (q0, w,D) ⊢P (q′, ε, ε) for some q′ ∈ Q

iff w ∈ N (P ).

Combination of the three theorems above yields that context-free grammars and push-down automata characterize the same languages.

Corollary 3.50. Let L be a language. The following three statements are equivalent.

(i) L = L(P ) for a push-down automaton P accepting on final state;

(ii) L = N (P ′) for a push-down automaton P ′ accepting on empty stack;

(iii) L = L(G) for a context-free grammar G.

Proof. [(i) ⇒ (ii)] This is Theorem 3.48. [(ii) ⇒ (iii)] This is Theorem 3.49. [(iii) ⇒ (i)]This is Theorem 3.44.

3.6 Properties of the class of context-free languages

In this section we establish a number of positive and negative results for the class ofcontext-free languages. We start off with closure properties that hold for regular lan-guages and show that these hold for context-free languages too.

We will make use of the following observation: Suppose G1 and G2 are two context-free grammars for two languages L1 and L2, respectively. Say Gi = (Vi, Σ, Ri, Si ) for1 6 i 6 2. Then we can assume that G1 and G2 have different variables, i.e. V1∩V2 = ∅,applying a renaming of variables if needed.

3.6. PROPERTIES OF CONTEXT-FREE LANGUAGES 37

Theorem 3.51. Let L1, L2 and L be context-free languages over an alphabet Σ.

(a) The language L1 ∪ L2, the union of L1 and L2, is also a context-free language.

(b) The language L1 · L2 = { w1w2 | w1 ∈ L1, w2 ∈ L2 }, the concatenation of L1

and L2, is also a context-free language.

(c) The language L∗ = { w1 · · ·wn | n > 0, w1, . . . , wn ∈ L }, the Kleene closure of L,is also a context-free language.

Proof. (a) Suppose Li = L(Gi), Gi = (Vi, Σ, Ri, Si) for 1 6 i 6 2. We can assumeV1 ∩ V2 = ∅. Consider the context-free grammar G′ = (V ′, Σ, R′, S′ ) where

V ′ = {S′} ∪ V1 ∪ V2 and R′ = { S′ → S1, S′ → S2 } ∪R1 ∪R2

for a fresh variable S′ not in V1 or V2. We claim L(G′) = L(G1) ∪ L(G2).If w ∈ L(G′), then S′ ⇒∗

G′ w. By definition of G′, either it holds that

S′ ⇒G′ S1 ⇒∗G′ w or S′ ⇒G′ S2 ⇒∗

G′ w

In both cases, the derivation Si ⇒∗G′ w of G′ is also a derivation Si ⇒∗

Giw of Gi, since

G1 and G2 have disjoint sets of variables. Thus w ∈ L(G1) or w ∈ L(G2), and hencew ∈ L(G1) ∪ L(G2). We conclude L(G′) ⊆ L(G1) ∪ L(G2).

If w ∈ L(G1), then S1 ⇒∗G1

w. By definition of G′, we then also have S′ ⇒G′

S1 ⇒∗G′ w. Therefore w ∈ L(G′). It follows that L(G1) ⊆ L(G′). Likewise we can derive

L(G2) ⊆ L(G′). We conclude L(G1) ∪ L(G2) ⊆ L(G′). Hence L(G1) ∪ L(G2) = L(G′).

(b) Consider the context-free grammar G′ = (V ′, Σ, R′, S′) where

V ′ = {S′} ∪ V1 ∪ V2 and R′ = { S′ → S1S2 } ∪R1 ∪R2

for a fresh variable S′ not in V1 or V2. We claim L(G′) = L(G1) · L(G2). If w ∈ L(G′),then S′ ⇒∗

G′ w. By definition of G′, it holds that

S′ ⇒G′ S1S2 ⇒∗G′ w

By Lemma 3.15, we can find two words w1, w2 ∈ Σ∗ such that

S1 ⇒∗G′ w1, S2 ⇒∗

G′ w2, and w1w2 = w

Since V1 ∩ V2 = ∅, we then also have S1 ⇒∗G1

w1 and S2 ⇒∗G2

w2. Thus w1 ∈ L(G1)and w2 ∈ L(G2). Hence w = w1w2 ∈ L(G1) · L(G2) and L(G′) ⊆ L(G1) · L(G2).

If w ∈ L(G1) · L(G2), we can write w = w1w2 for some w1 ∈ L(G1) and w2 ∈ L(G2).Thus S1 ⇒∗

G1w1 and S2 ⇒∗

G2w2 By definition of G′ we then also have S1 ⇒∗

G′ w1

and S2 ⇒∗G′ w2. Therefore, by Lemma 3.15, it follows that S1S2 ⇒∗

G′ w1w2. Hence,

S ⇒G′ S1S2 ⇒∗G′ w1w2 = w

and thus w ∈ L(G′). We conclude that L(G1) · L(G′) ⊆ L(G′).


(c) Consider the context-free grammar G′ = (V ′, Σ, R′, S′) where

V ′ = {S′} ∪ V and R′ = { S′ → ε, S′ → SS′ } ∪R

for a fresh variable S′ not in V . We claim L(G′) = L(G)∗. If w ∈ L(G′), then S′ ⇒∗G′

w. By Theorem 3.43 we can assume S′ ℓ⇒∗

G′ w. By definition of G′, assuming k + 1production sequences for S′, we then have

S′ ℓ⇒G′ SS′ ℓ

⇒∗G

w1S′ ℓ⇒G′ w1SS

′ ℓ⇒∗

G · · ·ℓ⇒∗

G

w1 · · ·wk−1S′ ℓ⇒G′ w1 · · ·wk−1SS

′ ℓ⇒∗

G

w1 · · ·wk−1wkS′ ℓ⇒G′ w1 · · ·wk−1wk = w

for suitable strings w1, . . . , wk ∈ Σ∗. By definition of G′ and Lemma 3.15, we then alsohave S ⇒∗

G wi for 1 6 i 6 k. Thus w1, . . . , wk ∈ L(G). Hence w = w1 · · ·wk ∈ L(G)∗

and L(G′) ⊆ L(G)∗.If w ∈ L(G)∗, then w = w1 · · ·wk for suitable k > 0 and w1, . . . , wk ∈ L(G). We

have S ⇒∗G wi for 1 6 i 6 k. Therefore, by Lemma 3.15, we have

w1 · · ·wi−1S′ ⇒G′ w1 · · ·wi−1SS

′ ⇒∗G′ w1 · · ·wi−1wiS

′

for 1 6 i 6 k. Combining these derivations with w1 · · ·wkS′ ⇒G′ w1 · · ·wk, and using

w = w1 · · ·wk this yields S′ ⇒∗G′ w. Thus w ∈ L(G′) and hence L(G)∗ ⊆ L(G′).

At the end of the section, see Theorem 3.59, we will show that the intersection of twocontext-free languages need not to be context-free again. So, the class of context-freelanguages is not closed under intersection. However, a restricted closure property forintersection does hold.

Theorem 3.52. For a context-free language L1 and a regular language L2, it holds thatthe intersection L1 ∩ L2 is a context-free language.

Proof. Let P = (Q1, Σ, ∆, ∅, →P , q10, F1) be a push-down automaton accepting L1,and let D = (Q2, Σ, δ, q

20, F2) be a deterministic finite automaton accepting L2. Note

automaton D has no τ -transitions. We construct a ‘product’ push-down automaton

P ′ = (Q1 ×Q2, Σ, ∆, ∅, →P ′ , (q10, q20), F1 × F2)

where the transition relation →P ′ is defined by

(q1, q2)a[d/x]−−−−→P ′ (q′1, q

′2) iff q1

a[d/x]−−−−→P q′1 and δ(q2, a) = q′2

(q1, q2)τ [d/x]−−−−→P ′ (q′1, q2) iff q1

τ [d/x]−−−−→P q′1

for q1, q′1 ∈ Q1, q2, q

′2 ∈ Q2, a ∈ Σ, d ∈ ∆∅, and x ∈ ∆∗. Thus, either the PDA

component and the DFA component simultaneously process input and update their


state, or the PDA component makes a τ -move, while the DFA component doesn’t move.We will show L(P ′) = L(P ) ∩ L(D). We first claim

((q1, q2), w, z) ⊢∗P ′ ((q1, q2), ε, z

′) iff

(q1, w, z) ⊢∗P (q1, ε, z

′) and (q2, w) ⊢∗D (q2, ε)

(3.10)

for states q1, q1 ∈ Q1, q2, q2 ∈ Q2, w ∈ Σ∗ and z, z′ ∈ ∆∗.By induction on n, if ((q1, q2), w, z) ⊢n

P ′ ((q1, q2), ε, z′), then (q1, w, z) ⊢∗

P (q1, ε, z′)

and (q2, w) ⊢∗D (q2, ε). Basis, n = 0: Then we have w = ε, q1 = q1, q2 = q2, and

z = z′. Therefore (q1, w, z) ⊢∗P ′ (q1, ε, z

′) and (q2, w) ⊢∗D (q2, ε). Induction step, n+ 1:

Suppose((q1, q2), w, z) ⊢P ′ ((q1, q2), w, z) ⊢n

P ′ ((q1, q2), ε, z′)

Then either

(i) input is processed: w = aw, z = dy, z = xy and (q1, q2)a[d/x]−−−−→P ′ (q1, q2) because

q1a[d/x]−−−−→P q1 and δ(q2, a) = q2, or

(ii) a τ -move is made: w = w, z = dy, z = xy, q2 = q2, and (q1, q2)τ [d/x]−−−−→P ′ (q1, q2)

because q1τ [d/x]−−−−→P q1.

Thus either (i) w = aw, (q1, w, z) ⊢P (q1, w, z) and q2a−→D q2, or (ii) w = w, q2 = q2,

and (q1, w, z) ⊢P (q1, w, z). By induction hypothesis, (q1, w, z) ⊢∗P ′ (q1, ε, z

′) and(q2, w) ⊢∗

D (q2, ε). Therefore, in both cases (q1, w, z) ⊢∗P ′ (q1, ε, z

′) and (q2, w) ⊢∗D

(q2, ε).By induction on n, if (q1, w, z) ⊢n

P (q1, ε, z′) and (q2, w) ⊢∗

D (q2, ε), then it holdsthat ((q1, q2), w, z) ⊢∗

P ′ ((q1, q2), ε, z′). Note, the index n for the derivation in P and the

superscript ∗ for the derivation in D. Basis, n = 0: Then we have w = ε, q1 = q1, z = z′,and also q2 = q2, since D has no ε-transitions. Thus ((q1, q2), w, z) ⊢∗

P ′ ((q1, q2), ε, z′).

Induction step, n + 1: Suppose (q1, w, z) ⊢P ′ (q1, w, z) ⊢nP ′ (q1, ε, z

′). We dis-

tinguish two cases. Case (i): w = aw, z = dy, q1a[d/x]−−−−→P q1, and z = xy. Since

w = aw and D is deterministic, we have (q2, w) ⊢D (q2, w) ⊢∗D (q2, ε) for a unique

state q2 ∈ Q2. Therefore, by construction of P ′, (q1, q2)a[d/x]−−−−→P ′ (q1, q2). From

(q1, w, z) ⊢nP ′ (q1, ε, z

′) and (q2, w) ⊢∗D (q2, ε) and the induction hypothesis it fol-

lows that ((q1, q2), w, z) ⊢∗P ′ ((q1, q2), ε, z

′). Therefore ((q1, q2), w, z) ⊢∗P ′ ((q1, q2), ε, z

′)

in this case. Case (ii): w = w, z = dy, q1τ [d/x]−−−−→P q1, and z = xy. By construction

of P ′ we have (q1, q2)τ [d/x]−−−−→P ′ (q1, q2). Thus, ((q1, q2), w, z) ⊢P ′ ((q1, q2), w, z). From

(q1, w, z) ⊢nP ′ (q1, ε, z

′), (q2, w) ⊢∗D (q2, ε), and the induction hypothesis it follows that

((q1, q2), w, z) ⊢∗P ′ ((q1, q2), ε, z

′). Therefore, ((q1, q2), w, z) ⊢∗P ′ ((q1, q2), ε, z

′) also inthis case, which concludes the induction step.

To show L(P ′) = L(P ) ∩ L(D) we reason as follows: Suppose w ∈ L1 ∩ L2. Thus,w ∈ L1, hence (q10, w, ε) ⊢∗

P (q1, ε, z) for some q1 ∈ F1 and z ∈ ∆∗, and w ∈ L1,hence (q20, w) ⊢∗

D (q2, ε) for some q2 ∈ F2. By the claim (3.10), implication right to


left, it follows that ((q10, q20), w, ε) ⊢∗

P ′ ((q1, q2), ε, z) while (q1, q2) ∈ F1 × F2. Thereforew ∈ L(P ′).

Suppose w ∈ L(P ′). Then ((q10, q20), w, ε) ⊢∗

P ′ ((q1, q2), ε, z) for some states q1 ∈ F1,q2 ∈ F2 and z ∈ ∆∗. By the claim (3.10), implicatin left to right, it follows that(q10, w, ε) ⊢∗

P (q1, ε, z) and (q20, w) ⊢∗D (q2, ε). Thus w ∈ L(P ) and w ∈ L(D), hence

w ∈ L1 ∩ L2.

Example 3.53. Consider the language

L = { w ∈ {a, b}∗ | #a(w) = #b(w), w has no substring abba }

It holds that L is a context-free language. We will use Theorem 3.52 to verify this.

Choose L1 = { w ∈ {a, b}∗ | #a(w) = #b(w) } as context-free language and L2 ={ w ∈ {a, b}∗ | w has no substring abba } as regular language. L2 is regular as it is thecomplement of the language {w ∈ {a, b}∗ | w has a substring abba}, i.e. the complementof the language of the regular expression (a + b)∗abba(a + b)∗. Clearly, L = L1 ∩ L2,hence L is context-free being the intersection of a context-free language and a regularlanguage.

We next consider a counterpart ot the Pumping Lemma for regular languages, Theo-rem ??. Also Theorem 3.54 can be used to prove negative results. Here, regarding alanguage not being context-free.

Theorem 3.54 (Pumping Lemma for context-free languages). Let L be a context-freelanguage over an alphabet Σ. There exists a bound m > 0 such that each w ∈ L with|w| > m can be written as w = uvxyz with u, v, x, y, z ∈ Σ∗, vy 6= ε, |vxy| 6 m, and forall k > 0: uvkxykz ∈ L.

Proof. Let G = (V, Σ, R, S) be a context-free grammar that generates L. Let κ =max{ s | A → X1 · · ·Xs ∈ R }. Put m = κ#V+1.

Pick w ∈ L with |w| > m. Let PT be a parse tree for w, with root labeled S, and aminimal number of nodes.

If the height of PT is h, then |w| 6 κh. Since |w| > m = κ#V+1 we have h > #V +1.So PT has a path from root to a leave of length h. This path has h+ 1 nodes of whichh nodes are labeled with a variable. Since h > #V + 1 there is a variable A that occursmore than once on the path. See Figure 3.9.

Let x be the yield of the parse tree PT low with the lowest occurrence of A on thepath as its root. Let v and y be such that vxy is the yield of the parse tree PT high withthe second to lowest occurrence of A on the path as its root. Let u and z be such thatuvxyz is the yield of PT .

If we replace in PT the subtree PT high by PT low , we obtain a parse tree PT ′ withroot S and yield uxz. Hence uxz ∈ L. If we replace in PT the subtree PT low by PT high ,we obtain a parse tree with root S and yield uv2xw2z. Hence uv2xw2z ∈ L. If we iteratethis k times we get a parse tree with root S and yield uvkxwkz. Hence uvkxwkz ∈ L.


To prove that vy 6= ε we argue that if vy = ε then v = ε and y = ε. Thus the parsetree PT ′ above would be a parse tree with root S, yield uxz = uvzyz = w, but with asmaller number of nodes. This contradicts the minimality of PT . So vy 6= ε.

Finally, to establish that |vxy| 6 m we argue that we can arrange that the heightof parse tree PT high is at most #V + 1 by taking the root node of PT high the lowestnode for which there is a node labeled with the same variable down the path. It followsthat PT high has at most κ#V+1 = m leaves. Since vxy is the yield of PT high , we have|vxy| 6 m.

u v x y z

PTlow

PThigh

PT

S

A

A

Figure 3.9: Repeating variable A in minimal parse tree PT with yield uvxyz.

Example 3.55. The language L = { anbncn | n > 0 } is not context-free. We apply thePumping Lemma. Choose m > 0 arbitrarily. Put w = ambmcm ∈ L. Then |w| > m.Suppose we u, v, x, y, z are such that w = uvxyz, vy 6= ε. Since |vy| 6 |vxy| 6 m, thestring vy contains only a’s, only a’s and b’s, only b’s, only b’s and c’s, or only c’s. Butthen w′ = uv2xy2z has only more a’s, only more a’s and b’s, only more b’s, only more b’sand c’s, or only more c’s, respectively. So, the number of a’s, b’s and c’s in w′ is not inbalance and w′ /∈ L. We conclude that there is no m > 0 that fulfills the requirementsof the Pumping Lemma. Hence L is not context-free.

Example 3.56. The language L = { an | n is prime } is not context-free. We applythe Pumping Lemma to show this: Choose m > 0 arbitrarily. Let p be a prime numberp > m. Put w = ap ∈ L. Then |w| > m. Suppose we u, v, x, y, z are such thatw = uvxyz, vy 6= ε and uvnxynz ∈ L for each n > 0. Put q = |vy| and r = |uxz|. Noteq > 0. We have that nq + r is prime, for each n > 0. Consider n = 2q + r + 2. It holdsthat nq+ r = (2q+ r+2)q+ r = 2q2+ rq+2q+ r = (q+1)(2q+ r). Since both q+1 > 1and 2q+r > 1, it follows that r+nq is not prime for this choice of n. Contradiction. So,no m > 0 meets the requirements of the Pumping Lemma. Hence L is not a context-freelanguage.


Example 3.57. Although {wwR | w ∈ {a, b}∗ } is a context-free language, the languageL = {ww | w ∈ {a, b}∗ } is not context-free. To see this, we apply the Pumping Lemma.Let m > 0 be arbitrary. Consider the string ambmambm ∈ L. Pick u, v, x, y, z such thatambmambm = uvxyz, vy 6= ε, and |vxy| 6 m.

Since |vxy| 6 m and |vy| > 0, leaving out v and y from ambmambm results in astring uxz of the form

aℓbmambm, ambℓambm, ambmaℓbm, or ambmambℓ

where 0 6 ℓ < m, or

aℓbkambm, ambℓakbm, or ambmaℓbk

where 0 6 k, ℓ, and k < m or ℓ < m. But, these possibilities cannot equal ww for somestring w ∈ {a, b}∗. This is because for ww to match, w must start with an a and end ina b. Therefore, both the first group of a’s and b’s equals w and the second group of a’sand b’s equals w. However, because of the restrictions on ℓ and k, in all cases this doesnot hold.

So no choice of u, v, x, y, z can be right. Hence, no m exists that meets the require-ments of the Pumping Lemma. So, L is not a context-free language.

Example 3.58. The language L = { w ∈ {a, b, c}∗ | #a(w) = #b(w) = #c(w) } is notcontext-free. Suppose L is context-free. Then, by Lemma 3.52, L∩ a∗b∗c∗ is context-freetoo. But L∩ a∗b∗c∗ = {anbncn | n > 0} and is not context-free as shown in Example 3.55above.

Theorem 3.59. The class of context-free languages is not closed under intersection andcomplement.

Proof. The languages L1 = { anbncm | n,m > 0 } and L2 = { anbmcm | n,m > 0 }are both context-free. However, L1 ∩ L2 = { anbncn | n > 0 } which is not context-freeaccording to Example 3.55.

The union of two context-free languages is context-free again, according to Theo-rem 3.51. If the complement LC of a context-free language L would be context-free ingeneral, then also

L1 ∩ L2 = (LC1 ∪ LC

2 )C

would be context-free. But it isn’t, as was just shown.

Theorem 3.60. Let L ⊆ Σ∗ be a CFL represented by a CFG G = (V, Σ, R, S) inChomsky normal form. Let w ∈ Σ∗ be a string over Σ. Then it can be decided if w ∈ Lor not.

Proof. Let w = a1 · · · an ∈ Σ∗ where n > 0. If n = 0, then we have w ∈ L iff S → ε ∈ R.Suppose n > 0. Define the sets Vi,k ⊆ V , for 1 6 i 6 k 6 n as follows

Vi,i= {A ∈ V | A → ai ∈ R } for 1 6 i 6 n

Vi,k =⋃{A ∈ V | ∃A → BC ∈ R ∃j, i 6 j < k : B ∈ Vi,j ∧ C ∈ Vj+1,k }

for 1 6 i < k 6 n


The intuition for Vi,k is that each variable in Vi,k can produce, in one or more steps, thesubstring ai · · · ak of w. With V1,n defined as above, decide w ∈ L if S ∈ V1,n; decidew /∈ L if S /∈ V1,n.

To establish the correctness of the above decision procedure, we prove

A ⇒∗G ai · · · ak ⇐⇒ A ∈ Vi,k

for A ∈ V and 1 6 i 6 k 6 n, by induction on k − i. Basis, k − i = 0: Direct fromthe definition of Vi,i, for 1 6 i 6 n. Induction step, k − i > 0: (⇒) If A ⇒∗

G ai · · · akthen we must have A ⇒G BC ⇒∗

G ai · · · ak for some B,C ∈ V , since G is in Chomskynormal form. Thus we can find j, i 6 j < k, such that B ⇒∗

G ai · · · aj and C ⇒∗G

aj+1 · · · ak. Note j − i < k − i and k − (j + 1) < k − i. By induction hypothesis wefind B ∈ Vi,j , C ∈ Vj+1,k. Therefore, A ∈ Vi,k by definition of Vi,k. (⇐) If A ∈ Vi,k,where k > i, we can choose a production A → BC ∈ R and an index j, i 6 j < ksuch that B ∈ Vi,j , C ∈ Vj+1,k. Note again j − i < k − i and k − (j + 1) < k − i.Thus, by induction hypothesis, B ⇒∗

G ai · · · aj , C ⇒∗G aj+1 · · · ak. Hence, we have

A ⇒G BC ⇒∗G ai · · · ajaj+1 · · · ak = ai · · · ak, which concludes the induction step.

Example 3.61 (HMU Example 7.34). Consider the CFG G with variables S, A, B, C,terminals 0, 1, start symbol S, and production rules

S → AB | BC A → BA | 0 B → CC | 1 C → AB | 0

Note that G is in Chomsky normal form. Consider the string 10010. We verify, usingthe algorithm described above, whether 10010 ∈ L(G).

We set up a table of 5 columns, one for each position of 10010, and six rows, thebottom row will initially store the letters of the string. The cell on the i-th row and k-thcolumn, for 1 6 i, k 6 5, will hold the set Vi,i+k−1, as defined by the algorithm, providedi 6 i+k−1 6 5. Otherwise the cell is left empty. We put the ith letter of 10010 at thecell in column i of row 6. See Figure 3.10 on the left for a symbolic representation.

The actual values of the sets Vik inductively built up from row 5 going up to row 1.We put a variable in V1,1 and V4,4 if it can directly produce 1. Thus, V1,1 = V4,4 = {B}.Likewise V2,2, V3,3, and V5,5 are set to {A,C}, since both A and C have a productionrule with righthand-side 0. This describes row 5 of the table at the right of Figure 3.10.

Now, by definition, V1,2 contains variables that have a righthand-side yielding avariable in V1,1 and in V2,2, respectively. Since S → BC and B ∈ V1,1, C ∈ V2,2, weput S in V1,2. Because A has the production BA, B ∈ V1,1, and A ∈ V2,2, we also putA in V1,2. The variable B is not put into V1,2; for its production B → CC we haveC /∈ V1,1. Also C si not added to V1,2. (It would, if it had a production C → BA.) Thesets V2,3, V3,4, and V4,5 are established similarly.

To see if S has to be put in the set V1,3, we check (i) for the production S → ABif A ∈ V1,1 = {B} and B ∈ V2,3 = {B}, or A ∈ V1,2 = {S,A} and B ∈ V3,3 = {A,C},and (ii), for the production S → BC if B ∈ V1,1 = {B} and C ∈ V2,3 = {B}, orB ∈ V1,2 = {S,A} and C ∈ V3,3 = {A,C}. Since none of the case hold S is not addedto V1,3. Also the other variable are not added to V1,3. Hence, it remains empty.


V1,5

V1,4 V2,5

V1,3 V2,4 V3,5

V1,2 V2,3 V3,4 V4,5

V1,1 V2,2 V3,3 V4,4 V5,5

a1 a2 a3 a4 a5

{S,A,C}

∅ {S,A,C}

∅ {B} {B}

{S,A} {B} {S,C} {S,A}

{B} {A} {A} {B} {A}

1 0 0 1 0

Figure 3.10: Checking membership for grammar in Chomsky normal form

We can see that B ∈ V2,4. For, B → CC, C ∈ V2,2 = {A,C}, and C ∈ {S,C}.However, the other variables to not meet such a requirement. Therefore, V2,4 = {B}. Theother cells are filled likewise combining production rules and combinations of previouslyestablished sets of variables.


Exercise 3.6.1. Show that the class of context-free languages is closed under reversal,i.e. if L is a context-free language then so is LR = { wR | w ∈ L }.

Exercise 3.6.2. Show that the class of context-free languages is not closed under setdifference, i.e. if L1 and L2 are context-free languages, then L1\L2 = {w ∈ L1 | w /∈ L2}is not context-free in general.

Exercise 3.6.3. Show that the language L1 = { an2

| n > 0 } is not context-free.

Exercise 3.6.4. Show that the language L2 = {wwRw | w ∈ {a, b}∗} is not context-free.

Exercise 3.6.5. Show that the language L3 = {0n102n103n | n > 0} is not context-free.

Exercise 3.6.6. Show that the language L4 = { anbℓcm | n, ℓ > m } is not context-free.


Answers to exercises of Chapter 3

Answers for exercises of Section 3.1

Answer to Exercise 3.1.1

(a) One maximal derivation for (q0, ab, ε):(q0, ab, ε) ⊢P (q0, b, 1) ⊢P (q1, ε, ε) 0P

(b) One maximal derivation for (q0, abb, ε):(q0, abb, ε) ⊢P (q0, bb, 1) ⊢P (q1, b, ε) 0P

(c) One maximal derivation for (q0, aab, ε):(q0, aab, ε) ⊢P (q0, ab, 1) ⊢P (q0, b, 11) ⊢P (q1, ε, 1) ⊢P (q0, ε, ε) 0P

(d) Three maximal derivations for (q0, aaabbb, ε):

(q0, aaabbb, ε) ⊢P (q0, aabbb, 1) ⊢P (q0, abbb, 11) ⊢P (q0, bbb, 111) ⊢P

(q1, bb, 11) ⊢P (q1, b, 1) ⊢P (q1, ε, ε) 0P


(q1, bb, 11) ⊢P (q1, b, 1) ⊢P (q0, b, ε) 0P


(q1, bb, 11) ⊢P (q0, bb, 1) ⊢P (q1, b, ε) 0P


(a)

0 1 2 3

a[∅/1]a[1/11]

τ [1/1]

τ [∅/ε]

b[∅/1]b[1/11]

τ [1/1]

τ [∅/ε]

c[1/ε]

τ [∅/ε]


q0 an 1n n > 0

q1 anbm 1n+m n,m > 0

q2 anbmcℓ 1n+m−ℓ 0 6 ℓ 6 n+m

q3 anbmcℓ ε n+m−ℓ = 0

(b)

0 1 2 3

a[∅/1]a[1/11]

τ [1/1]

τ [∅/ε]

b[1/ε]

τ [1/1]

τ [∅/ε]

c[1/ε]

τ [∅/ε]


q0 aℓ 1ℓ ℓ > 0

q1 aℓbn 1ℓ−n 0 6 n 6 ℓ

q2 aℓbncm 1ℓ−n−m 0 6 m 6 ℓ−n

q3 aℓbncm ε ℓ−n−m = 0


(c)

0 1 2 3

a[∅/1]a[1/11]

τ [1/1]

τ [∅/ε]

b[1/ε]b[∅/2]b[2/22]

τ [1/1]

τ [∅/ε]

c[2/ε]

τ [∅/ε]


q0 an 1n n > 0

q1 anbℓ 1n−ℓ 0 6 ℓ 6 n

anbn+k 2k k > 0

q2 anbn+kcm 2k−m 0 6 m 6 k

q3 anbn+kcm ε k = m


(a)

0 1

2

3

a[∅/1]a[1/11]

τ [∅/ε]τ [1/1]

b[1/1] b[1/ε]

τ [∅/ε]


q0 an 1n n > 0

q1 anbm 1n+m n,m > 0

q2 anbmcℓ 1n+m−ℓ 0 6 ℓ 6 n+m

q3 anbmcℓ ε n+m−ℓ = 0

(b)

0 1 2

a[∅/1]a[1/11]

τ [∅/ε]τ [1/1]

b[1/ε]

τ [∅/ε]

b[∅/ε] state q input w stack x

q0 an 1n n > 0

q1 anbm 1n+m 0 6 m 6 n

q2 anbm+ℓ ε n = m, ℓ > 0

(c)

0

1

2

34

5

a[∅/1]a[1/11]

a[1/11]

τ [1/1]

b[1/ε]

b[1/ε]

b[1/ε]τ [∅/ε]


q0 a2n 12n n > 0

q1 a2n+1 12n+1 n > 0

q2 a2nb3m 12n−3m 0 6 3m 6 2n

q3 a2nb3m+1 12n−(3m+1) 0 6 3m+1 6 2n

q4 a2nb3m+2 12n−(3m+2) 0 6 3m+2 6 2n

q5 a2nb3m+1 ε 2n = 3m+1


(d)

0 1

2

3

a[∅/1]a[1/11]

τ [∅/ε]τ [1/1]

b[1/ε] τ [∅/ε]

τ [1/1]

b[∅/ε]

b[∅/ε]


q0 an 1n n > 0

q1 anbm 1n−m 0 6 m 6 n

q2 anbm+ℓ ε n−m = 0, ℓ > 0

q3 anbm+ℓ+1 ε n−m = 0, ℓ > 0

anbm 1n−m n−m > 0


(a)

0 1

a[∅/1] b[∅/2]a[1/11] b[2/22]a[2/ε] b[1/ε]

τ [1/1]τ [2/2]


q0 w 1n #a(w)−#b(w) = n, n > 0

w 2m #b(w)−#a(w) = m, m > 0

q1 w 1n #a(w)−#b(w) = n, n > 0

w 2m #b(w)−#a(w) = m, m > 0

A computation arrives at the only accepting state q1, precisely when either #a(w) >#b(w) or #b(w) > #a(w) for input w. Thus, #a(w) 6= #b(w). Hence, the languageaccepted by the PDA is L8.


(b)

0 1 234τ [∅/ε]τ [∅/ε]

a[∅/1] b[∅/2] c[∅/ε]a[1/11] b[2/22] c[1/1]a[2/ε] b[1/ε] c[2/2]

τ [1/1]τ [2/2]

a[∅/ε] b[∅/2] c[∅/3]a[2/2] b[2/22] c[3/33]a[3/3] b[3/ε] c[2/ε]

τ [1/1]τ [2/2]


q0 w ε

q1 w 1n #a(w)−#b(w) = n, n > 0

w 2m #b(w)−#a(w) = m, m > 0

q2 w 1n #a(w)−#b(w) = n, n > 0

w 2m #b(w)−#a(w) = m, m > 0

q3 w 2m #b(w)−#c(w) = m, m > 0

w 3ℓ #c(w)−#b(w) = ℓ, ℓ > 0

q4 w 2m #b(w)−#c(w) = m, m > 0

w 3ℓ #c(w)−#b(w) = ℓ, ℓ > 0

The PDA is a non-deterministic combination of two PDA similar to the one ofitem (a). The left part, with accepting state q4 accepts L1

9 = { w ∈ {a, b, c}∗ |#b(w) 6= #c(w) }. The right part, with accepting state q2, accepts L2

9 = { w ∈{a, b, c}∗ | #a(w) 6= #b(w) }. Thus, the PDA itself accepts L1

9 ∪ L29 = L9.

Answers for exercises of Section 3.2


a Sℓ⇒G XbY

ℓ⇒G aXbY

ℓ⇒G aaXbY

ℓ⇒G aabY

ℓ⇒G aabaY

ℓ⇒G aababY

ℓ⇒G aabab;

Sr⇒G XbY

r⇒G XbaY

r⇒G XbabY

r⇒G Xbab

r⇒G aXbab

r⇒G aaXbab

r⇒G aabab

(b) Sℓ⇒G XbY

ℓ⇒G bY

ℓ⇒G baY

ℓ⇒G baaY

ℓ⇒G baabY

ℓ⇒G baab;

Sr⇒G XbY

r⇒G XbaY

r⇒G XbaaY

r⇒G XbaabY

r⇒G Xbaab

r⇒G baab

(c) Sℓ⇒G XbY

ℓ⇒G aXbY

ℓ⇒G aaXbY

ℓ⇒G aaaXbY

ℓ⇒G aaabY

ℓ⇒G aaabbY

ℓ⇒G

aaabb; Sr⇒G XbY

r⇒G XbbY

r⇒G Xbb

r⇒G aXbb

r⇒G aaXbb

r⇒G aaaXbb

r⇒G

aaabb



(a) (LG(A) ⊆ LA) We prove A ⇒nG w implies w ∈ LA, for all w ∈ {a, b}∗, by induction

on n. Basis, n = 0: There is nothing to show, S /∈ {a, b}∗. Induction step, k+1: IfA ⇒n+1

G w, then A ⇒G ε ⇒nG w or A ⇒G aA ⇒n

G w. Thus w = ε or w = aw′

where A ⇒nG w′ (by Lemma 3.15). In the first case we are done, ε ∈ LA. In the

second case we have w′ ∈ LA, i.e. w′ = am for some m > 0. But hen it holds that

w = aw′ = aam = am+1.

(LA ⊆ LG(A)) By induction on n, an ∈ LG(A). Basis, n = 0: We have A ⇒G ε.Induction step, n + 1: By induction hypothesis an ∈ LG(A), i.e. A ⇒∗

G an.Therefore, by Lemma 3.15, aA ⇒∗

G aan. Thus, since A →G aA, we obtainA ⇒G aA ⇒∗

G aan = an+1.

(b) Similar to part (a) one shows LG(B) = { bmkern1mun | n > 0 }. Note, L =LG(A) ∪ LG(B). We have, for w ∈ {a, b}∗, w ∈ L(G) iff S ⇒∗

G w iff A ⇒∗G w or

B ⇒∗G w iff w ∈ LG(A) or w ∈ LG(B) iff w ∈ LG(A) ∪ LG(B) iff w ∈ L. Thus

L(G) = L.


(a) Consider CFG G1 given by

S → aSb | A | B, A → a | aA, B → b | bB

We have LG1(A) = { an | n > 1 } and LG1

(B) = { bn | n > 1 } with a proofsimilar to that of Exercise 3.2.2. Thus v ∈ LG1

(A) iff v = an for some n > 1, andu ∈ LG1

(B) iff u = bn for some n > 1.

(L(G1) ⊆ L) Suppose S ⇒∗G1

w for w ∈ {a, b}∗. Then S ⇒nG1

anSbn ⇒G1

anAbn ⇒∗G1

w for some n > 0, or S ⇒nG1

anSbn ⇒G1anBbn ⇒∗

G1w for

some n > 0. In the first case we have w = anvbn and A ⇒∗G1

v. In the secondcase we have w = anubn and B ⇒∗

G1u. So, w = anvbn and v = am for some

n > 0, m > 1, or w = anubn and u = bm for some n > 0, m > 1. Hencew = an+mbn or w = anbn+m for some n > 0, m > 1. In either case, w ∈ L1.

(L1 ⊆ L(G1)) If w ∈ L1 then w = an+mbn or w = anbn+m for some n > 0,m > 1. Suppose w = an+mbn where n > 0, m > 1. We have S ⇒n

G1anSbn ⇒G1

anAbn ⇒m−1G1

anam−1Abn ⇒G1anam−1abn = an+mbn, thus an+mbn ∈ L(G1).

Similarly, if w = anbn+m for some n > 0, m > 1, then w ∈ L(G1).

(b) Define the CFG G2 by

S → DC | AEA → ε | aA, B → ε | bB, C → ε | cCD → aDb | aA | bB, E → bEc | bB | cC

One can showLG2

(A)= { an | n > 0 }LG2

(C)= { cn | n > 0 }LG2

(D)= { anbm | n,m > 0, n 6= m }LG2

(E)= { bmcℓ | m, ℓ > 0, m 6= ℓ }


by proofs similar to the proofs given for Exercise 3.2.2 and for part (a). We have

L(G) = LG2(D)LG2

(C) ∪ LG2(A)LG2

(E)

= { anbmcℓ | n,m, ℓ > 0, n 6= m } ∪ { anbmcℓ | n,m, ℓ > 0, m 6= ℓ }

= { anbmcℓ | n,m, ℓ > 0, n 6= m ∨m 6= ℓ }

Answer to Exercise 3.2.4 A CFG for 0 has no production rules; a CFG for 1 has oneproduction rule, viz. S → ε; a CFG for the regular expression a, for a ∈ Σ, has oneproduction rule, viz. S → a.

Suppose G1 = (V1, Σ, R1, S1) and G2 = (V2, Σ, R2, S2) are CFG such that L(G1) =L(r1) and L(G2) = L(r2) for two regular expressions r1 and r2. Assume V1∩V2 = ∅ andthe variable S is fresh, i.e. S /∈ V1∪V2. PutG1+2 = ({S}∪V1∪V2, Σ, {S → S1, S → S2}∪R1 ∪R2, S). Then L(G1+2) = L(r1 + r2).

Let G1, G2, r1, r2 and S be as above. Put G1·2 = ({S} ∪ V1 ∪ V2, Σ, {S → S1S2} ∪R1 ∪R2, S). Then L(G1·2) = L(r1 · r2).

Suppose G = (V, Σ, R, S) is a CFG such that L(G) = L(r) for a regular expression r.Let S′ be a fresh variable, i.e. S′ /∈ V . Put G∗ = ({′}∪V, Σ, {S′ → ε, S′ → SS′}∪R, S′).Then L(G∗) = L(r∗).Answer to Exercise 3.2.5 We prove, if S ⇒k

G x then x ∈ X by induction on k. Basis,k = 0: Then we have x = S and S ∈ X . Induction step, k + 1: if S ⇒G aS ⇒k

G x,then x = ax′ and x′ ∈ X by induction hypothesis. If follows that x ∈ X . Similarly, ifS ⇒ Sb ⇒k

G x then x = x′b and x′ ∈ X by induction hypothesis, and it follows thatx ∈ X . If S ⇒G a ⇒k

G x then x = a ∈ X ; if S ⇒G b ⇒kG x then x = b and x ∈ X .

If S ⇒∗G w and w ∈ {a, b} we have w = anbm for some n,m > 0 with n +m > 1. So,

w has no substring ba.


(a) An inductive argument shows S ⇒∗G x implies #a(x) = #b(x) for x ∈ {S, a, b}∗.

Thus, L(G) ⊆ L. For the reverse, we prove w ∈ L implies w ∈ L(G) by inductionon |w|. Basis, |w| = 0: We have w = ε and S ⇒G ε. Induction step, |w| > 0:Suppose w starts with a. Then we can find strings u, v ∈ {a, b}∗ such that w =aubv, #a(u) = #b(u) and #a(v) = #b(v). Note, |u|, |v| < |w|. Thus, by inductionhypothesis, S ⇒∗

G u, S ⇒∗G v. Therefore, S ⇒∗

G aSbS ⇒∗G aubS ⇒∗

G aubv =w. If w starts with b a similar argument applies.

(b) The CFG G′ given by S → ε | SS | aSb | bSa also generates L.

Answers to exercises of Section 3.3

Answer to Exercise 3.3.1 The generating symbols of G are S,A,B,E of which E is notreachable. The production rules that remain are

S → A A → BB B → A | ε


Answer to Exercise 3.3.2 Elimination of unit rules gives the grammar

S → BC | a A → BC | a B → ε | b C → ε | c

Expanding RHS to eliminate ε-productions for non-start symbols gives

S → BC | B | C | a | ε A → BC | B | C B → b C → c

Answer to Exercise 3.3.3 Elimination of unit rules and ε-productions gives the grammar

S → AAA | AA | aA | a | ε A → aA | a

Using auxilliary variables we obtain the CFG in CNF below.

S → AX1 | AA | XaA | a X1 → AA Xa → a A → XaA | a



S

X b Y

a X a Y

a X b Y

ε ε

aababS

X b Y

a X b Y

ε a Y

ε

baab

S

X b Y

a X b Y

a X ε

a X

ε

aaabb

Answer to Exercise 3.4.2 (a) The following two complete parse trees are different butboth have yield aaab:

S

A B

a a a B

b

S

A B

A a b

A a

a


(b) The variable B can only yield the string b. Thus the grammar

S → Ab | aab A → a | Aa

has a grammar that is equialent to the language L(G) of G. The string aab can alsobe produced using the other three production rules, hence we can leave the productionS → aab out without changing the associated language. We obtain the grammar G′

with production rulesS → Ab A → a | Aa

Each sentential form of G′ has at most one variable. Therefore, G′ is an unambigiousgrammar.


(a) The following two complete parse trees are different but both have yield abab:

S

a S b S

b S a S ε

ε ε

S

a S b S

ε a S b S

ε ε

(b) The following CFG generates strings with an equal number of a’s and b’s. Theintuition is that S generates strings with as many a’s as b’s, A generates strings withone more a than b’s, and B generates strings with one more b than a’s.

S → aB | bA | ε A → aS | bAA B → bS | aBB

When producing left-most a string w from the start symbol S, at any moment exactlyone product rule can be applied when w should be yielded. Sentential forms are of theshape wx with w ∈ {a, b}∗, x ∈ {S,A,B}∗.


Answer to Exercise 3.6.1 Let G = (V, Σ, R, S) be a CFG such that L(G) = L.Construct the CFG G′ = (V, Σ, R′, S) by putting R′ = {A → αR | A → α ∈ R }. Thenone may prove

A ⇒∗G w ⇐⇒ A ⇒∗

G′ wR

From this it follows that L(G′) = LR, and that LR is a context-free language.We first prove the following claim:

A ⇒nG w implies A ⇒∗

G′ wR (3.11)

for A ∈ V , w ∈ Σ∗, by induction on n. Basis, n = 0: Trivial. Induction step, n+1: IfA ⇒n+1

G w, we can find k > 0, X1, . . . , Xk ∈ V ∪ Σ, w1, . . . , wk ∈ Σ∗, and n1, . . . , nk >


0 such that A →G X1 · · ·Xk, Xi ⇒ni

G wi for 1 6 i 6 k, n1 + · · · + nk = n, andw1 · · ·wk = w. We have A →G′ Xk · · ·X1 by construction of G′, and Xi ⇒∗

G′ wi for1 6 i 6 k, by induction hypothesis. It follows that A ⇒∗

G′ wRk · · ·wR

1 . Since wR =(w1 · · ·wk)

R = wRk · · ·wR

1 , we have A ⇒∗G′ wR. This proves half of the claim (3.11).

The other implication is similar.From Equation (3.11) we derive S ⇒∗

G w iff S ⇒∗G′ wR. Therefore L(G′) = L(G)R.

Hence, LR = L(G′) and LR is a context-free language.

Answer to Exercise 3.6.2 Consider L1 = { anbncm | n,m > 0 } and L2 = { anbmcℓ |n,m, ℓ > 0, m 6= ℓ }.

Answer to Exercise 3.6.3 Let m > 0 be arbitrary. Consider the string w = a2m

∈ L.Pick strings u, v, x, y, z ∈ Σ∗ such that w = uvxyz, |vxy| 6 m, and vy 6= ε. We havevy = aℓ for some ℓ, 1 6 ℓ 6 m. Thus 2m

2

= |w| < |uv2xy2z| = 2m+ℓ 6 2m+m < 2m+1.Hence, uv2xy2z /∈ L. It follows by the Pumping Lemma for context-free languages thatL is not context-free.

Answer to Exercise 3.6.4 Letm > 0 be arbitrary. Consider the string w = bambbambbamb ∈L. Pick strings u, v, x, y, z ∈ Σ∗ such that w = uvxyz, |vxy| 6 m, and vy 6= ε. We havevy = aℓ for some ℓ, 1 6 ℓ 6 m. Otherwise would pumping of v and y change the numberof b’s unbalancedly. But if vy consists of a’s only, the number of a’s in one of the blockscan be made unbalanced. It follows by the Pumping Lemma for context-free languagesthat L is not context-free.

Answer to Exercise 3.6.5 Letm > 0 be arbitrary. Consider the string w = 0m102m103m ∈L. Pick strings u, v, x, y, z ∈ Σ∗ such that w = uvxyz, |vxy| 6 m, and vy 6= ε. We havevy = 0ℓ for some ℓ, 1 6 ℓ 6 m. Otherwise would pumping of v and y change the numberof 1’s. But if vy consists of 0’s only, it is a substring of the prefix 0m, of the string 102m1,or of the suffix 03m. In all these cases, pumping of v and y disturbs the required patternfor being a member of L. It follows by the Pumping Lemma for context-free languagesthat L is not context-free.

Answer to Exercise 3.6.6 Letm > 0 be arbitrary. Consider the string w = ambmcm ∈ L.Pick strings u, v, x, y, z ∈ Σ∗ such that w = uvxyz, |vxy| 6 m, and vy 6= ε. If vy containsno c’s, then uxz does not meet the requirement of L. The number of a’s or the numberof b’s, or both, is stricly less than the number of c’s. If vy contains a c’s, then vxy isa substring of bmcm. Thus pumping of v and y gives a string strictly less a’s than c’s,also failing the requirement of L. It follows by the Pumping Lemma for context-freelanguages that L is not context-free.

Documents

Push-DownAutomata and Context-Free Languagesevink/education/2it70/PDF2016/2it… · associated with push-down automata is often called the class of context-free languages. This class