CS:4330 Theory of Computation, Spring 2018
Context-Free Languages: Context-Free Grammars
Haniel Barbosa
Readings for this lecture
Chapter 2 of [Sipser 1996], 3rd edition. Section 2.1.
Context-Free Grammars (CFG)
- There are languages, such as {0^n 1^n | n ≥ 0}, that cannot be described by finite automata (or regular expressions)
- Context-free grammars provide a more powerful mechanism for language specification
- Context-free grammars can describe features that have a recursive structure, which makes them useful where finite automata fall short
Formal definition of a CFG
A context-free grammar is a 4-tuple (V, Σ, R, S) in which:
- V is a finite set of symbols called the variables or nonterminals
- Σ is a finite set of symbols, disjoint from V, called terminals
- R is a finite set of rules of the form lhs → rhs, in which lhs ∈ V and rhs ∈ (V ∪ Σ)*
- S ∈ V is the start nonterminal
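The 4-tuple can be written down directly as a data structure. The following is a minimal sketch, not a standard API: the class name `CFG`, its field names, and the encoding of a rule as a pair (lhs, tuple of rhs symbols) are assumptions made for illustration. The sample grammar has the rules A → 0A1 | B and B → #.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """Hypothetical container for a grammar (V, Σ, R, S)."""
    variables: frozenset   # V
    terminals: frozenset   # Σ
    rules: tuple           # R: pairs (lhs, rhs), rhs a tuple of symbols
    start: str             # S

    def __post_init__(self):
        # Σ must be disjoint from V, and S must belong to V.
        if self.variables & self.terminals:
            raise ValueError("V and Σ must be disjoint")
        if self.start not in self.variables:
            raise ValueError("the start symbol must be a variable")
        symbols = self.variables | self.terminals
        for lhs, rhs in self.rules:
            # lhs ∈ V and rhs ∈ (V ∪ Σ)*; the empty tuple encodes ε.
            if lhs not in self.variables:
                raise ValueError(f"lhs {lhs!r} is not a variable")
            if any(sym not in symbols for sym in rhs):
                raise ValueError(f"rhs {rhs!r} uses an unknown symbol")


# Sample grammar with rules A → 0A1 | B and B → #, start symbol A.
G1 = CFG(
    variables=frozenset({"A", "B"}),
    terminals=frozenset({"0", "1", "#"}),
    rules=(("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))),
    start="A",
)
```

Constructing a `CFG` whose start symbol is not in V, or whose V and Σ overlap, raises `ValueError`, mirroring the side conditions of the definition.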
Example
- CFG G1 has the following rules:
    A → 0A1
    A → B
    B → #
- Nonterminals of G1 are {A, B}, and A is the start symbol
- Terminals of G1 are {0, 1, #}
Language specification
A grammar specifies a language by generating each string of the language in the following manner:
1. Write down the start variable; it is the lhs of the first rule, unless specified otherwise.
2. Find a variable that is written down and a rule whose lhs is that variable. Replace the written-down variable with the rhs of that rule.
3. Repeat step 2 until no variables remain in the string thus generated.
Note
The sequence of substitutions used to obtain a string using a CFG is called a derivation and may be represented by a tree called a derivation tree or parse tree.
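The three steps can be sketched as a small random generator. This is an illustrative sketch under assumptions: the function name `derive`, the dictionary encoding of the rules, and the `max_steps` safety bound are not part of the lecture, and each symbol is assumed to be a single character. The grammar used is G1 (A → 0A1 | B, B → #).

```python
import random

# Rules of G1: variable -> list of right-hand sides.
# Keys of the dict are the variables; every other character is a terminal.
G1_RULES = {"A": ["0A1", "B"], "B": ["#"]}


def derive(rules, start, rng, max_steps=1000):
    """Return the sentential forms of one randomly chosen derivation."""
    form = start                          # step 1: write down the start variable
    steps = [form]
    for _ in range(max_steps):
        positions = [i for i, sym in enumerate(form) if sym in rules]
        if not positions:                 # step 3: no variables remain
            return steps
        i = rng.choice(positions)         # step 2: pick a written-down variable
        rhs = rng.choice(rules[form[i]])  # and a rule whose lhs is that variable
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    raise RuntimeError("derivation did not finish within max_steps")


steps = derive(G1_RULES, "A", random.Random(0))
print(" => ".join(steps))
```

The list `steps` is exactly a derivation in the sense of the note above: each entry is obtained from the previous one by one rule application, and the last entry contains only terminals.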
Example derivation tree
The derivation tree of the string 000#111 using CFG G1 is:
    A
    ├─ 0
    ├─ A
    │  ├─ 0
    │  ├─ A
    │  │  ├─ 0
    │  │  ├─ A
    │  │  │  └─ B
    │  │  │     └─ #
    │  │  └─ 1
    │  └─ 1
    └─ 1
Note
- All strings of terminals generated in this way constitute the language specified by the grammar
- We write L(G) for the language generated by the grammar G. Thus, L(G1) = {0^n # 1^n | n ≥ 0}
- A language generated by a context-free grammar (CFG) is called a context-free language (CFL)
CFG G2
The CFG G2 specifies a fragment of English:
    〈SENTENCE〉 → 〈NOUN-PHRASE〉〈VERB-PHRASE〉
    〈NOUN-PHRASE〉 → 〈CP-NOUN〉 | 〈CP-NOUN〉〈PREP-PHRASE〉
    〈VERB-PHRASE〉 → 〈CP-VERB〉 | 〈CP-VERB〉〈PREP-PHRASE〉
    〈PREP-PHRASE〉 → 〈PREP〉〈CP-NOUN〉
    〈CP-NOUN〉 → 〈ARTICLE〉〈NOUN〉
    〈CP-VERB〉 → 〈VERB〉 | 〈VERB〉〈NOUN-PHRASE〉
    〈ARTICLE〉 → a | the
    〈NOUN〉 → boy | girl | flower
    〈VERB〉 → touches | likes | sees
    〈PREP〉 → with
Example derivation with G2
A derivation of the sentence "a boy sees" with G2:
    〈SENTENCE〉 ⇒ 〈NOUN-PHRASE〉〈VERB-PHRASE〉
               ⇒ 〈CP-NOUN〉〈VERB-PHRASE〉
               ⇒ 〈ARTICLE〉〈NOUN〉〈VERB-PHRASE〉
               ⇒ a 〈NOUN〉〈VERB-PHRASE〉
               ⇒ a boy 〈VERB-PHRASE〉
               ⇒ a boy 〈CP-VERB〉
               ⇒ a boy 〈VERB〉
               ⇒ a boy sees
Direct derivation
- If u, v, w ∈ (V ∪ Σ)* (i.e., are strings of variables and terminals) and A → w ∈ R (i.e., is a rule of the grammar), then we say that uAv yields uwv, written
    uAv ⇒ uwv
- We may also say that uwv is directly derived from uAv using the rule A → w
Derivation
- We write u ⇒* v if u = v or if a sequence u1, …, uk ∈ (V ∪ Σ)* exists, for k ≥ 0, such that u ⇒ u1 ⇒ ··· ⇒ uk ⇒ v
- We may also say that u, u1, …, uk, v is a derivation of v from u
Language specified by G
If G = (V, Σ, R, S) is a CFG, then the language specified by G (or the language of G) is the CFL
    L(G) = {w ∈ Σ* | S ⇒* w}
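The definition of L(G) suggests a brute-force way to list the short strings of a language: search through sentential forms reachable from S and keep those that contain only terminals. The sketch below is an illustration under assumptions: `language_up_to` is a hypothetical helper, symbols are single characters, and only the leftmost variable is expanded (the order of replacement does not change which strings are generated). The search is guaranteed to stop only for grammars in which repeated rule application eventually adds terminals, which holds for G1.

```python
from collections import deque

# G1: A → 0A1 | B, B → #
G1_RULES = {"A": ["0A1", "B"], "B": ["#"]}


def language_up_to(rules, start, max_len):
    """Return {w in L(G) : len(w) <= max_len} by breadth-first search."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Index of the leftmost variable, or None if the form is all terminals.
        i = next((k for k, sym in enumerate(form) if sym in rules), None)
        if i is None:
            words.add(form)
            continue
        for rhs in rules[form[i]]:
            new_form = form[:i] + rhs + form[i + 1:]
            n_terminals = sum(1 for sym in new_form if sym not in rules)
            # Terminals are never removed, so forms with too many are dead ends.
            if n_terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return words


print(sorted(language_up_to(G1_RULES, "A", 7), key=len))
```

For G1 and `max_len=7` the result consists of the strings 0^n # 1^n with n ≤ 3, which agrees with L(G1) = {0^n # 1^n | n ≥ 0}.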
More examples of CFGs
- Consider the grammar
    G3 = ({S}, {a, b}, {S → aSb | SS | ε}, S)
- L(G3) contains strings such as
    abab, aaabbb, aababb
Note
If one thinks of a and b as the symbols ‘(’ and ‘)’, then L(G3) is the language of all strings of properly nested parentheses
Important application
Context-free grammars are used as a basis for compiler design and implementation
- Context-free grammars are used as specification mechanisms for programming languages
- Designers of compilers use such grammars to implement a compiler’s components, such as scanners, parsers, code generators, and code synthesizers
- The implementation of almost any programming language is preceded by a context-free grammar that specifies it
Example
- Consider the grammar G4 = ({E, T, F}, {a, +, ∗, (, )}, R, E) in which R is:
    E → E + T | T
    T → T ∗ F | F
    F → (E) | a
- L(G4) is the language of arithmetic expressions
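To connect G4 with parsing, here is a small recursive-descent recognizer that also builds a syntax tree. It is a sketch under assumptions, not part of the lecture: the left-recursive rules E → E + T and T → T ∗ F are read as the loops T (+ T)* and F (∗ F)*, the terminal ∗ is typed as the ASCII character `*`, and the tree is encoded as nested tuples.

```python
def parse(text):
    """Parse text with G4; return a nested-tuple tree or raise ValueError."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def expr():                 # E -> E + T | T, read as T (+ T)*
        tree = term()
        while peek() == "+":
            eat("+")
            tree = ("+", tree, term())
        return tree

    def term():                 # T -> T * F | F, read as F (* F)*
        tree = factor()
        while peek() == "*":
            eat("*")
            tree = ("*", tree, factor())
        return tree

    def factor():               # F -> ( E ) | a
        if peek() == "(":
            eat("(")
            tree = expr()
            eat(")")
            return tree
        eat("a")
        return "a"

    tree = expr()
    if pos != len(text):
        raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    return tree


print(parse("a+a*a"))
print(parse("(a+a)*a"))
```

Because T sits below E in the grammar, the product in a+a*a ends up nested inside the sum, so ∗ binds tighter than + without any extra bookkeeping.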
Application: program synthesis
Consider the grammar GS = ({E, B}, {0, 1, x, y, +, ite, ≤, ∧, ¬, ‘,’, (, )}, R, E) in which R is:
    E → 0 | 1 | x | y | (E + E) | ite(B, E, E)
    B → (¬B) | (B ∧ B) | (E ≤ E)
Which program generated by this grammar solves the following problem?
    prog(x, y) ≥ x ∧ prog(x, y) ≥ y
A solution is prog(x, y) = ite(x ≤ y, y, x), i.e., the max function.
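One simple way to find such a program is to enumerate the terms of GS by increasing nesting depth and test each against the specification. The sketch below illustrates that idea under assumptions that are not part of the lecture: terms are encoded as nested tuples, the specification is only checked on a finite grid of integer points (so a returned term is a candidate, not a proof), and the names `exprs`, `bools`, `evaluate`, and `synthesize` are hypothetical.

```python
from itertools import product

ATOMS = ("0", "1", "x", "y")


def exprs(depth):
    """Yield E-terms of nesting depth at most `depth`."""
    yield from ATOMS
    if depth > 0:
        sub = list(exprs(depth - 1))
        for a, b in product(sub, repeat=2):
            yield ("+", a, b)                      # E -> (E + E)
        for cond in bools(depth - 1):
            for a, b in product(sub, repeat=2):
                yield ("ite", cond, a, b)          # E -> ite(B, E, E)


def bools(depth):
    """Yield B-terms built from E-terms of depth at most `depth`."""
    sub = list(exprs(depth))
    for a, b in product(sub, repeat=2):
        yield ("<=", a, b)                         # B -> (E <= E)
    if depth > 0:
        inner = list(bools(depth - 1))
        for c in inner:
            yield ("not", c)                       # B -> (not B)
        for c, d in product(inner, repeat=2):
            yield ("and", c, d)                    # B -> (B and B)


def evaluate(term, x, y):
    if isinstance(term, str):
        return {"0": 0, "1": 1, "x": x, "y": y}[term]
    op, *args = term
    vals = [evaluate(arg, x, y) for arg in args]
    if op == "+":
        return vals[0] + vals[1]
    if op == "ite":
        return vals[1] if vals[0] else vals[2]
    if op == "<=":
        return vals[0] <= vals[1]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    raise ValueError(f"unknown operator {op!r}")


# Finite test grid (an assumption): the spec is checked only on these points.
POINTS = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]


def meets_spec(term):
    return all(evaluate(term, x, y) >= max(x, y) for x, y in POINTS)


def synthesize(max_depth=1):
    for depth in range(max_depth + 1):
        for candidate in exprs(depth):
            if meets_spec(candidate):
                return candidate
    return None


print(synthesize())
```

No atom and no sum of atoms passes the check, so the search has to reach an ite term; the term it returns agrees with max on every grid point. Larger depths grow very quickly, which is why practical synthesizers prune the search space instead of enumerating it naively.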
Designing CFGs
- As with the design of automata, the design of CFGs requires creativity
- CFGs are even trickier to construct than finite automata since “we are more accustomed to programming a machine than to specify programming languages.”
Design Techniques
- Many CFLs are unions of simpler CFLs. Hence the suggestion is to construct smaller, simpler grammars first and then join them into a larger grammar
- The mechanism of grammar combination consists of putting all their rules together and adding the new rule
    S → S1 | ··· | Sk
  where the nonterminals Si, for 1 ≤ i ≤ k, are the start variables of the individual grammars and S is a new start variable
Example Grammar Design
Design a grammar for the language {0^n 1^n | n ≥ 0} ∪ {1^n 0^n | n ≥ 0}
1. Construct the grammar S1 → 0 S1 1 | ε for the language
    {0^n 1^n | n ≥ 0}
2. Construct the grammar S2 → 1 S2 0 | ε for the language
    {1^n 0^n | n ≥ 0}
3. Put them together, adding the rule S → S1 | S2, to obtain
    S → S1 | S2
    S1 → 0 S1 1 | ε
    S2 → 1 S2 0 | ε
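The combination step is mechanical enough to write as a function. The following is a sketch under assumptions: a grammar is encoded as a pair (start variable, rules dictionary), each right-hand side is a list of symbols with the empty list standing for ε, and the function name `union` is hypothetical. The component grammars must use different variable names, as in the example.

```python
def union(grammars, new_start="S"):
    """Combine grammars given as (start, rules) pairs into one grammar.

    Assumes the grammars use pairwise disjoint variable names and that
    none of them already uses `new_start`.
    """
    # New rule S -> S1 | ... | Sk over the old start variables.
    rules = {new_start: [[start] for start, _ in grammars]}
    for _, sub_rules in grammars:
        for lhs, alternatives in sub_rules.items():
            if lhs in rules:
                raise ValueError(f"variable {lhs!r} is used twice")
            rules[lhs] = [list(rhs) for rhs in alternatives]
    return new_start, rules


G_01 = ("S1", {"S1": [["0", "S1", "1"], []]})   # {0^n 1^n | n >= 0}
G_10 = ("S2", {"S2": [["1", "S2", "0"], []]})   # {1^n 0^n | n >= 0}

start, rules = union([G_01, G_10])
for lhs, alternatives in rules.items():
    shown = " | ".join(" ".join(rhs) or "ε" for rhs in alternatives)
    print(lhs, "->", shown)
```

The check for repeated variable names matters: if two component grammars shared a variable, their rules would mix and the combined grammar could generate strings outside the union.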
Second Design Technique
- Constructing a CFG for a regular language is easy if one can first construct a DFA for the language
- Conversion procedure:
  1. Make a variable Ri for each state qi of the DFA
  2. Add the rule Ri → a Rj to the CFG if δ(qi, a) = qj is a transition in the DFA
  3. Add the rule Ri → ε if qi is an accept state of the DFA
  4. If q0 is the start state of the DFA, make R0 the start variable of the CFG
Theorem
Every regular language is context-free.
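The conversion procedure translates almost line by line into code. This is a sketch under assumptions: the DFA is passed as plain Python collections, the transition function is a dictionary keyed by (state, symbol), the variable for state q is named `R_q`, and the sample DFA (binary strings with an even number of 1s) is an illustration that does not come from the lecture.

```python
def dfa_to_cfg(states, alphabet, delta, start, accepting):
    """Return (start_variable, rules) of a CFG for the DFA's language.

    Each right-hand side is a list of symbols; [] stands for ε.
    """
    var = {q: f"R_{q}" for q in states}            # 1. one variable per state
    rules = {var[q]: [] for q in states}
    for q in states:
        for a in alphabet:
            # 2. R_i -> a R_j whenever delta(q_i, a) = q_j
            rules[var[q]].append([a, var[delta[(q, a)]]])
        if q in accepting:
            rules[var[q]].append([])               # 3. R_i -> ε for accept states
    return var[start], rules                       # 4. start state gives start variable


# Sample DFA (an assumption): strings over {0, 1} with an even number of 1s.
states = ["even", "odd"]
alphabet = ["0", "1"]
delta = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}

start_var, rules = dfa_to_cfg(states, alphabet, delta, "even", {"even"})
for lhs, alternatives in rules.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

Every rule produced this way has at most one variable, and it sits at the right end of the rhs, so a derivation tracks a single run of the DFA on the input.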
Third Design Technique
- Certain CFLs contain strings with two related substrings, such as 0^n and 1^n in {0^n 1^n | n ≥ 0}
- Example of relationship: to recognize such a language, a machine would need to remember an unbounded amount of information about one of the substrings
- A CFG that handles this situation uses a rule of the form R → uRv, which generates strings wherein the portion containing u’s corresponds to the portion containing v’s
Example Application
Consider the CFG G = ({S, B}, {a, b}, {S → aSB | B | ε, B → bB | b}, S)
- The following are derivations with G:
    S ⇒ aSB ⇒ aaSBB ⇒ aaSbBB,
    S ⇒ aSB ⇒ aaSBB ⇒ aaSBbB,
    S ⇒ aSB ⇒ aaSBB ⇒ aaSBb,
    S ⇒ aSB ⇒ aaSBB ⇒ aaBB
  which shows that derivations in this grammar can be quite complex
- When rewriting the string aaSBB we can consider further derivations of each of its symbols in isolation
- Derivations from B are B ⇒ bB ⇒ bbB ⇒* b^(k−1) B ⇒ b^k, k ≥ 1
- Therefore S ⇒ aSB ⇒* aSb^k, k ≥ 1
- Hence, L(G) = {a^n b^m | n ≤ m}
Ambiguity
- If a CFG generates the same string in several different ways, we say that the string is derived ambiguously in that grammar
- If a CFG generates some string ambiguously, we say that the grammar is ambiguous
Example
The grammar G5, whose rules are
    E → E + E | E ∗ E | (E) | a
generates some arithmetic expressions ambiguously
Ambiguous expressions
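Two different derivation trees for a + a ∗ a

Tree 1 (the sum is grouped first):
    E
    ├─ E
    │  ├─ E
    │  │  └─ a
    │  ├─ +
    │  └─ E
    │     └─ a
    ├─ ∗
    └─ E
       └─ a

Tree 2 (the product is grouped first):
    E
    ├─ E
    │  └─ a
    ├─ +
    └─ E
       ├─ E
       │  └─ a
       ├─ ∗
       └─ E
          └─ a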
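The number of derivation trees of a string in G5 can be counted directly from the four rules. The sketch below is an illustration, with assumptions: `count_trees` is a hypothetical helper, the terminal ∗ is typed as `*`, and the count relies on each tree being determined by its root rule and, for E → E + E or E → E ∗ E, by the position of the root operator.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(s):
    """Number of G5 derivation trees with root E whose yield is s."""
    total = 0
    if s == "a":                                        # E -> a
        total += 1
    if len(s) >= 3 and s[0] == "(" and s[-1] == ")":    # E -> (E)
        total += count_trees(s[1:-1])
    for k, ch in enumerate(s):                          # E -> E+E | E*E
        if ch in "+*":
            # The operator at position k is the root; count both sides.
            total += count_trees(s[:k]) * count_trees(s[k + 1:])
    return total


for text in ["a", "a+a*a", "(a+a)*a", "a+a+a+a"]:
    print(text, count_trees(text))
```

A count of 2 for a+a*a matches the two trees shown above, while the parenthesized string (a+a)*a has a single tree. A count of 0 means the string is not in L(G5).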
Note
- The grammar G5 does not capture the usual precedence relations, so it may group + before ∗ or vice versa
- In contrast, the grammar G4 generates the same language, but every generated string has a unique derivation tree. Recall G4 = ({E, T, F}, {a, +, ∗, (, )}, R, E) in which R is:
    E → E + T | T
    T → T ∗ F | F
    F → (E) | a
- Hence, G5 is ambiguous and G4 is not, i.e., G4 is unambiguous
Note
- When a grammar generates a string ambiguously, it means that the string has at least two different derivation trees
- However, two different derivations may produce the same derivation tree, because they may differ only in the order in which they replace nonterminals, not in the rules they use
- To concentrate on the structure of the derivations we need to fix the order of rule application
Fixing rule application order
Definition (Leftmost derivation)
A derivation of a string w in a grammar G is a leftmost derivation if at every step the leftmost nonterminal is replaced.
Definition (Rightmost derivation)
A derivation of a string w in a grammar G is a rightmost derivation if at every step the rightmost nonterminal is replaced.
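Once the order is fixed, each derivation tree corresponds to exactly one leftmost derivation, so an ambiguous string has more than one leftmost derivation. The sketch below replays two such derivations of a+a*a in G5. It is an illustration under assumptions: `leftmost_derivation` is a hypothetical helper, symbols are single characters, and ∗ is typed as `*`.

```python
G5 = {"E": ["E+E", "E*E", "(E)", "a"]}


def leftmost_derivation(rules, start, choices):
    """Apply the rules in `choices`, always to the leftmost variable."""
    form, steps = start, [start]
    for lhs, rhs in choices:
        i = next((k for k, sym in enumerate(form) if sym in rules), None)
        if i is None or form[i] != lhs or rhs not in rules[lhs]:
            raise ValueError(f"cannot apply {lhs} -> {rhs} leftmost in {form!r}")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps


# Root rule E -> E+E: the product is grouped first.
d1 = leftmost_derivation(
    G5, "E",
    [("E", "E+E"), ("E", "a"), ("E", "E*E"), ("E", "a"), ("E", "a")],
)
# Root rule E -> E*E: the sum is grouped first.
d2 = leftmost_derivation(
    G5, "E",
    [("E", "E*E"), ("E", "E+E"), ("E", "a"), ("E", "a"), ("E", "a")],
)
print(" => ".join(d1))
print(" => ".join(d2))
```

Both runs end in the same string but pass through different sentential forms, which is another way of seeing that G5 is ambiguous.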
Inherent Ambiguity
- Some CFLs have both ambiguous and unambiguous grammars.
- Some CFLs, however, can be generated only by ambiguous grammars.
- A CFL that can be generated only by ambiguous grammars is called inherently ambiguous.
Example of an inherently ambiguous language
    {0^i 1^j 2^k | i = j ∨ j = k}
Chomsky Normal Form
- It is often convenient to simplify CFGs so we can reason about them
- One of the simplest and most useful simplified forms of CFGs is called the Chomsky normal form
- Another normal form, usually used in algebraic specifications, is the Greibach normal form
Definition
A context-free grammar G is in Chomsky normal form if every rule is of the form
    A → BC
    A → a
where a is a terminal, A, B, C are nonterminals, and B, C may not be the start variable
Note
The rule S → ε, where S is the start variable, is also permitted
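The definition can be checked rule by rule. The following sketch is an illustration under assumptions: `is_cnf` is a hypothetical helper, the grammar is a dictionary from variables to lists of right-hand sides, each rhs is a list of symbols with [] standing for ε, and every variable is assumed to appear as a key of the dictionary.

```python
def is_cnf(rules, start):
    """Return True if every rule has one of the Chomsky normal form shapes."""
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if len(rhs) == 0:
                if lhs != start:              # only S -> ε is permitted
                    return False
            elif len(rhs) == 1:
                if rhs[0] in rules:           # A -> a needs a terminal
                    return False
            elif len(rhs) == 2:
                # A -> BC: two variables, neither of them the start variable
                if any(sym not in rules or sym == start for sym in rhs):
                    return False
            else:
                return False                  # rhs longer than two symbols
    return True


not_cnf = {
    "S": [["A", "S", "A"], ["a", "B"]],
    "A": [["B"], ["S"]],
    "B": [["b"], []],
}
cnf = {
    "S0": [["A", "A1"], ["U", "B"], ["a"], ["S", "A"], ["A", "S"]],
    "S": [["A", "A1"], ["U", "B"], ["a"], ["S", "A"], ["A", "S"]],
    "A": [["b"], ["A", "A1"], ["U", "B"], ["a"], ["S", "A"], ["A", "S"]],
    "A1": [["S", "A"]],
    "U": [["a"]],
    "B": [["b"]],
}
print(is_cnf(not_cnf, "S"), is_cnf(cnf, "S0"))
```

The first grammar fails on several counts: a rhs of length three, a terminal inside a two-symbol rhs, unit rules, and an ε-rule for a variable other than the start variable.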
Chomsky Normal Form characterizes CFLs
Theorem
Any context-free language is generated by a context-free grammar in Chomsky normal form
Proof idea
- Show that any grammar G can be converted into Chomsky normal form
- The conversion procedure has several stages, in which rules that violate the Chomsky normal form conditions are replaced with equivalent ones that satisfy them
- Order of transformations:
  1. add a new start variable
  2. eliminate all ε-rules
  3. eliminate unit rules
  4. convert the remaining rules
- Check that the obtained grammar defines the same language as the initial one
Conversion: 1 - introduce new start symbol
Add a new start symbol S0 and the rule S0 → S, where S was the original start symbol
Note
This change guarantees that the start symbol does not occur on the rhs of any rule
Conversion: 2 - eliminate ε-rules
Repeat
1. Eliminate an ε-rule A → ε, where A is not the start symbol
2. For each occurrence of A on the rhs of a rule, add a new rule with that occurrence of A deleted
   Example: To delete A → ε, replace B → uAv by B → uAv | uv; replace R → uAvAw by R → uAvAw | uvAw | uAvw | uvw
3. Replace the rule B → A (if it is present) by B → A | ε, unless the rule B → ε has been previously eliminated
until all such ε-rules are eliminated.
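The same result can be computed in one pass by first finding the nullable variables (those that derive ε) and then adding, for each rule, every variant with some nullable occurrences deleted. Note that this is the nullable-set formulation, a swapped-in equivalent of the repeat-until loop above, not the loop itself. The function name `eliminate_epsilon` and the encoding (rhs as tuples, () for ε) are assumptions, and the start symbol is assumed not to occur on any rhs.

```python
from itertools import product


def eliminate_epsilon(rules, start):
    """Return rules without ε-rules, except possibly start -> ε."""
    # 1. Nullable variables: A is nullable if some rhs of A is all nullable.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in rules.items():
            if lhs not in nullable and any(
                all(sym in nullable for sym in rhs) for rhs in alternatives
            ):
                nullable.add(lhs)
                changed = True
    # 2. For each rule, keep or drop every nullable occurrence independently.
    new_rules = {}
    for lhs, alternatives in rules.items():
        variants = set()
        for rhs in alternatives:
            options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for pick in product(*options):
                variant = tuple(sym for part in pick for sym in part)
                if variant or lhs == start:      # drop A -> ε unless A is start
                    variants.add(variant)
        new_rules[lhs] = variants
    return new_rules


grammar = {
    "S0": [("S",)],
    "S": [("A", "S", "A"), ("a", "B")],
    "A": [("B",), ("S",)],
    "B": [("b",), ()],
}
for lhs, alternatives in eliminate_epsilon(grammar, "S0").items():
    print(lhs, "->", " | ".join(sorted("".join(rhs) for rhs in alternatives)))
```

For this grammar the nullable variables are A and B, so S → ASA contributes the variants ASA, SA, AS, and S, and S → aB contributes aB and a.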
Conversion: 3 - remove unit rules
Repeat
1. Remove a unit rule A → B
2. For each rule B → u that appears, add the rule A → u, unless it was a unit rule previously removed
until all unit rules are eliminated.
Note
u is a string of variables and terminals
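An equivalent way to carry out this step is to compute, for each variable A, the set of variables reachable from A through unit rules alone, and then give A every non-unit rule of those variables. This unit-closure formulation is a swapped-in alternative to the repeat-until loop, not the loop itself. The function name `eliminate_unit_rules` and the encoding (rhs as tuples) are assumptions.

```python
def eliminate_unit_rules(rules):
    """Return an equivalent rule set without unit rules A -> B."""
    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in rules

    new_rules = {}
    for a in rules:
        # Variables reachable from `a` by following unit rules only.
        reach, stack = {a}, [a]
        while stack:
            b = stack.pop()
            for rhs in rules[b]:
                if is_unit(rhs) and rhs[0] not in reach:
                    reach.add(rhs[0])
                    stack.append(rhs[0])
        # `a` inherits every non-unit rule of every reachable variable.
        new_rules[a] = {
            rhs for b in reach for rhs in rules[b] if not is_unit(rhs)
        }
    return new_rules


grammar = {
    "S0": {("S",)},
    "S": {("A", "S", "A"), ("a", "B"), ("a",), ("S", "A"), ("A", "S"), ("S",)},
    "A": {("B",), ("S",)},
    "B": {("b",)},
}
for lhs, alternatives in eliminate_unit_rules(grammar).items():
    print(lhs, "->", " | ".join(sorted("".join(rhs) for rhs in alternatives)))
```

Here A reaches both B and S through unit rules, so A ends up with b together with all five non-unit rules of S.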
Conversion: 4 - Convert all remaining rules
Repeat
1. Replace a rule A → u_1 u_2 ··· u_k, k ≥ 3, where each u_i, 1 ≤ i ≤ k, is a variable or terminal, by:
    A → u_1 A_1, A_1 → u_2 A_2, …, A_{k−2} → u_{k−1} u_k
   where A_1, …, A_{k−2} are new variables
2. If k ≥ 2, replace any terminal u_i with a new variable U_i and add the rule U_i → u_i
until no rules of the form A → u_1 u_2 ··· u_k with k ≥ 3 remain.
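This last step can be sketched as follows. The code is an illustration under assumptions: the function name `convert_long_rules` is hypothetical, new variables are named `U_a` (for terminal a) and `X1`, `X2`, … (for rhs tails) and are assumed not to clash with existing names, and a tail that occurs in several rules shares one new variable instead of getting a fresh one each time.

```python
def convert_long_rules(rules):
    """Shorten every rhs to at most two symbols, with no terminal in a pair."""
    out = {lhs: set() for lhs in rules}
    proxy = {}    # terminal -> new variable deriving exactly that terminal
    tails = {}    # tuple of variables -> new variable deriving that tuple

    def var_for_terminal(t):
        if t not in proxy:
            proxy[t] = f"U_{t}"
            out[proxy[t]] = {(t,)}               # U_a -> a
        return proxy[t]

    def var_for_tail(symbols):
        if symbols not in tails:
            tails[symbols] = f"X{len(tails) + 1}"
            out[tails[symbols]] = set()
            add(tails[symbols], symbols)         # may shorten the tail further
        return tails[symbols]

    def add(lhs, rhs):
        if len(rhs) >= 2:
            # Keys of `out` are exactly the variables known so far.
            rhs = tuple(s if s in out else var_for_terminal(s) for s in rhs)
        if len(rhs) > 2:
            rhs = (rhs[0], var_for_tail(rhs[1:]))
        out[lhs].add(rhs)

    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            add(lhs, rhs)
    return out


s_rules = {("A", "S", "A"), ("a", "B"), ("a",), ("S", "A"), ("A", "S")}
grammar = {"S0": s_rules, "S": s_rules, "A": s_rules | {("b",)}, "B": {("b",)}}
for lhs, alternatives in convert_long_rules(grammar).items():
    print(lhs, "->", " | ".join(sorted(" ".join(rhs) for rhs in alternatives)))
```

On this grammar the only long rhs is ASA, which becomes A X1 with X1 → SA, and the only terminal inside a pair is the a of aB, which becomes U_a B with U_a → a.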
Example CFG conversion
Consider the grammar
    S → ASA | aB
    A → B | S
    B → b | ε
After the first step of transformation we get
    S0 → S
    S → ASA | aB
    A → B | S
    B → b | ε
Removing ε rules
Removing B → ε:
    S0 → S
    S → ASA | aB | a
    A → B | S | ε
    B → b
Removing A → ε:
    S0 → S
    S → ASA | aB | a | SA | AS | S
    A → B | S
    B → b
Removing unit rule
Removing S → S:
    S0 → S
    S → ASA | aB | a | SA | AS
    A → B | S
    B → b
Removing S0 → S:
    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → B | S
    B → b
More unit rules
Removing A → B:
    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → S | b
    B → b
Removing A → S:
    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → b | ASA | aB | a | SA | AS
    B → b
Converting the remaining rules
    S0 → A A1 | U B | a | S A | A S
    S → A A1 | U B | a | S A | A S
    A → b | A A1 | U B | a | S A | A S
    A1 → S A
    U → a
    B → b