Chapter 4 Normal Forms for CFGs


4.5 Chomsky Normal Form

Defn 4.4.1 A CFG G = (V, Σ, P, S) is in Chomsky normal form if each rule in G has one of the following forms:

i) A → BC

ii) A → a

iii) S → λ

where A, B, C ∈ V, B, C ∈ V − {S}, and a ∈ Σ.

Chomsky normal form is a simplified normal form that restricts the length & composition of the R.H.S. of each rule in a CFG.

The derivation tree for a string generated by a CFG in Chomsky normal form is a binary tree.


4.5 Chomsky Normal Form

Theorem 4.4.2 Let G = (V, Σ, P, S) be a CFG. There is an algorithm to construct a grammar G’ = (V’, Σ’, P’, S’) in Chomsky normal form that is equivalent to G.

Proof (sketch): λ-rules and chain rules are assumed to have been removed first (Sections 4.2 and 4.3).

(i) For each rule A → w, where |w| > 1, replace each terminal a in w by a distinct variable Y & create the new rule Y → a.

(ii) For each modified rule X → w’, w’ is either a terminal or a string in V+. Rules in the latter form with more than two variables must be broken into a sequence of rules, each of whose R.H.S. consists of two variables.
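Steps (i) and (ii) of the proof sketch can be illustrated with a minimal Python sketch. It assumes a representation that is not in the slides: a grammar is a dict mapping each variable to a list of right-hand sides, each a tuple of symbols, with the empty tuple standing for λ. The fresh variable names `Y_a` and `T1`, `T2`, ... are hypothetical and are assumed not to clash with existing variables. λ-rules and chain rules are assumed to be gone already.

```python
def to_cnf(rules):
    """Apply steps (i) and (ii) to a grammar without chain rules or
    lambda-rules (other than possibly S -> lambda)."""
    variables = set(rules)
    new_rules = {A: [] for A in rules}
    term_var = {}   # terminal a -> name of the variable Y_a with rule Y_a -> a
    counter = 0     # numbers the hypothetical fresh variables T1, T2, ...

    for A, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) <= 1:
                # A -> a or S -> lambda: already in an allowed form
                new_rules[A].append(rhs)
                continue
            # Step (i): replace each terminal in a long right-hand side
            body = []
            for sym in rhs:
                if sym in variables:
                    body.append(sym)
                else:
                    if sym not in term_var:
                        term_var[sym] = "Y_" + sym
                        new_rules[term_var[sym]] = [(sym,)]
                    body.append(term_var[sym])
            # Step (ii): break a string of variables into binary rules
            head = A
            while len(body) > 2:
                counter += 1
                fresh = "T" + str(counter)
                new_rules.setdefault(head, []).append((body[0], fresh))
                head = fresh
                body = body[1:]
            new_rules.setdefault(head, []).append(tuple(body))
    return new_rules


# Sample grammar (an assumption for illustration): S -> aAB | a, A -> a, B -> b
sample = {"S": [("a", "A", "B"), ("a",)], "A": [("a",)], "B": [("b",)]}
cnf = to_cnf(sample)
print(cnf["S"])   # [('Y_a', 'T1'), ('a',)]
```

Each long rule of length n becomes n − 1 binary rules, so the grammar grows only linearly in the total length of the right-hand sides.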

Example 4.4.1

One application of CFGs that are in Chomsky normal form: constructing binary search trees to accomplish “optimal” time & space search complexity for parsing an input string.


4.1 Grammar Transformations

Lemma 4.1.1 Let G = (V, Σ, P, S) be a CFG. There is a CFG G’ = (V’, Σ, P’, S’) that satisfies

i) L(G) = L(G’)

ii) Rules in P’ are of the form

A → w

where A ∈ V’ and w ∈ ((V – {S’}) ∪ Σ)*.

Proof. If S is a recursive variable, then construct G’ by creating a new start symbol S’ & adding S’ → S to P’, i.e.,

G’ = (V ∪ {S’}, Σ, P ∪ {S’ → S}, S’).

If S ⇒* u in G, then S’ ⇒ S ⇒* u in G’, where u ∈ Σ*.

Example 4.1.1 Assume that P in G includes

S → aS | AB | AC,

then P’ in G’ should include S’ → S, S → aS | AB | AC.


4.2 Elimination of λ-rules

Nullable variables are variables that can derive λ.

A grammar w/o nullable variables is called a noncontracting grammar, since the application of a rule cannot decrease the length of the sentential form.

It is desirable to avoid the generation of (nullable) variables that are subsequently removed by λ-rules.

The removal of nullable variables from a grammar guarantees that, during the derivation of a terminal string, each variable generates terminal symbol(s).

The derivation of a terminal string in a grammar G is more cost-effective if G is noncontracting than if G is contracting.


4.2 Elimination of λ-rules

Algorithm 4.2.1 Construction of the Set of Nullable Variables

Input: A CFG G = (V, Σ, P, S)

Output: Set of Nullable Variables

1. NULL := { A | A → λ ∈ P }

2. Repeat

   2.1. PREV := NULL

   2.2. For each variable A ∈ V do

           If there is a rule A → w with w ∈ PREV* then

              NULL := NULL ∪ { A }

   Until NULL = PREV.

(Note: w ∈ PREV* indicates that the right-hand side w of A → w consists entirely of nullable variables)
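The fixed-point iteration of Algorithm 4.2.1 can be sketched in Python as follows. The grammar representation is an assumption made for illustration: a dict from each variable to a list of right-hand sides, each a tuple of symbols, with the empty tuple standing for λ. The function name `nullable_variables` is likewise hypothetical.

```python
def nullable_variables(rules):
    """Return the set of variables that can derive lambda."""
    # Step 1: variables with an explicit rule A -> lambda
    null = {A for A, rhss in rules.items() if () in rhss}
    # Step 2: repeat until no new nullable variable is found
    while True:
        prev = set(null)
        for A, rhss in rules.items():
            # A is nullable if some right-hand side lies entirely in PREV*
            if any(all(sym in prev for sym in rhs) for rhs in rhss):
                null.add(A)
        if null == prev:
            return null


# Sample grammar: S -> ACA, A -> aAa | B | C, B -> bB | b, C -> cC | lambda
sample = {
    "S": [("A", "C", "A")],
    "A": [("a", "A", "a"), ("B",), ("C",)],
    "B": [("b", "B"), ("b",)],
    "C": [("c", "C"), ()],
}
print(sorted(nullable_variables(sample)))   # ['A', 'C', 'S']
```

The loop runs at most |V| + 1 times, since every pass except the last adds at least one variable to NULL.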


4.2 Elimination of λ-rules

Example 4.2.1 Given the following CFG

S → ACA
A → aAa | B | C
B → bB | b
C → cC | λ

Using Algorithm 4.2.1, the set of nullable variables can be computed:

Iteration   NULL           PREV

0           { C }

1           { A, C }       { C }

2           { S, A, C }    { A, C }

3           { S, A, C }    { S, A, C }


4.2 Elimination of λ-rules

An essentially noncontracting grammar G

● includes S → λ, if λ ∈ L(G)

● excludes any other λ-rules

● yields only noncontracting derivations, with the exception of S ⇒ λ.

Theorem 4.2.3 Let G = (V, Σ, P, S) be a CFG. There is a CFG GL = (VL, Σ, PL, SL), w/o λ-rules other than possibly SL → λ, that satisfies

● L(GL) = L(G).

● SL is not a recursive variable.

● A → λ ∈ PL iff λ ∈ L(G) & A = SL.


4.2 Elimination of λ-rules

Constructing GL from G.

(i) SL is not a recursive variable. Any recursive start variable can be made nonrecursive according to Lemma 4.1.1.

(ii) If λ ∈ L(G), then SL → λ ∈ PL.

(iii) Delete all the λ-rules in G, except SL → λ, from PL as follows:

If A → w ∈ P, w = w1A1w2A2 … wkAkwk+1, and A1, …, Ak are nullable variables, then create up to 2^k − 1 new rules in PL, i.e., take the set of 2^k possible combinations with the inclusion and exclusion of A1, A2, …, Ak (the original rule is one of them) and discard any combination whose right-hand side is λ.

Example 4.2.2 { S, A, C } are nullable variables of G, where

G:  S → ACA              GL:  S → ACA | AC | CA | AA | A | C | λ

    A → aAa | B | C           A → aAa | aa | B | C

    B → bB | b                B → bB | b

    C → cC | λ                C → cC | c
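Step (iii) of the construction can be sketched in Python under a few stated assumptions: a grammar is a dict from each variable to a list of right-hand sides (tuples of symbols, the empty tuple standing for λ), the set of nullable variables has already been computed, and the start variable is already nonrecursive. The function name `eliminate_lambda_rules` is hypothetical.

```python
from itertools import product


def eliminate_lambda_rules(rules, nullable, start):
    """Build the rules of GL from those of G, given the nullable variables."""
    new_rules = {}
    for A, rhss in rules.items():
        out = []
        for rhs in rhss:
            # A nullable symbol may be kept or dropped; any other symbol is kept
            choices = [((sym,), ()) if sym in nullable else ((sym,),)
                       for sym in rhs]
            for combo in product(*choices):
                candidate = tuple(s for part in combo for s in part)
                # Discard lambda right-hand sides and duplicates
                if candidate and candidate not in out:
                    out.append(candidate)
        new_rules[A] = out
    # Step (ii): lambda is in L(G) exactly when the start variable is nullable
    if start in nullable:
        new_rules[start].append(())
    return new_rules


# Grammar G of Example 4.2.2, with nullable variables {S, A, C}
g = {
    "S": [("A", "C", "A")],
    "A": [("a", "A", "a"), ("B",), ("C",)],
    "B": [("b", "B"), ("b",)],
    "C": [("c", "C"), ()],
}
gl = eliminate_lambda_rules(g, {"S", "A", "C"}, "S")
print(gl["C"])   # [('c', 'C'), ('c',)]
```

On this grammar the S rules come out as ACA, AC, AA, A, CA, C and λ, which is the same set as in the GL column above, listed in a different order.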


4.2 Elimination of λ-rules

Example 4.2.3 Let G be the grammar

S → ABC

A → aA | λ

B → bB | λ

C → cC | λ

G generates a*b*c*. The nullable variables of G are S, A, B, and C.

The equivalent grammar of G w/o λ-rules (other than S → λ) is GL, where

S → ABC | AB | AC | BC | A | B | C | λ

A → aA | a

B → bB | b

C → cC | c


4.3 Elimination of Chain rules

Definition. A rule of the form A → B, where A, B ∈ V, which simply renames a variable in a derivation, is a chain rule.

The removal of chain rules increases the number of rules in the grammar, but reduces the length of derivations.

Example. Consider the set of rules P

A → aA | a | B
B → bB | b | C

The chain rule A → B is eliminated by (i) adding A → w for every rule B → w, and (ii) deleting A → B.

Hence, P is modified as

A → aA | a | bB | b | C
B → bB | b | C

However, another chain rule A → C was created.


4.3 Elimination of Chain rules

Algorithm 4.3.1. Construction of the Set CHAIN(A)

Input: A CFG G = (V, Σ, P, S) and a variable A

Output: The set CHAIN(A) of variables derivable from A using only chain rules

1. CHAIN(A) := { A }

2. PREV := ∅

3. Repeat

   3.1. NEW := CHAIN(A) – PREV

   3.2. PREV := CHAIN(A)

   3.3. For each variable B ∈ NEW do

           For each rule B → C do

              CHAIN(A) := CHAIN(A) ∪ { C }

   Until CHAIN(A) = PREV.
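Algorithm 4.3.1 translates almost line for line into Python. The sketch below assumes a grammar stored as a dict from each variable to a list of right-hand sides (tuples of symbols), so a chain rule is a right-hand side consisting of a single variable. The function name `chain` is an illustrative choice.

```python
def chain(rules, A):
    """Return CHAIN(A): the variables derivable from A by chain rules only."""
    variables = set(rules)
    chain_set = {A}
    prev = set()
    while chain_set != prev:
        new = chain_set - prev        # variables not yet expanded
        prev = set(chain_set)
        for B in new:
            for rhs in rules.get(B, []):
                # B -> C is a chain rule when the right-hand side is one variable
                if len(rhs) == 1 and rhs[0] in variables:
                    chain_set.add(rhs[0])
    return chain_set


# Sample rules: A -> aA | a | B, B -> bB | b | C, C -> c
sample = {
    "A": [("a", "A"), ("a",), ("B",)],
    "B": [("b", "B"), ("b",), ("C",)],
    "C": [("c",)],
}
print(sorted(chain(sample, "A")))   # ['A', 'B', 'C']
```

Because only the variables in NEW are expanded on each pass, every variable's rules are scanned at most once.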


4.3 Elimination of Chain rules

Theorem 4.3.3 Let G = (V, Σ, P, S) be a CFG. There is a CFG GC = (V, Σ, PC, S) that satisfies

1) L(GC) = L(G).

2) GC has no chain rules.

Proof. Using P & CHAIN(A), we compute the A rules in GC. The rule A → w is in PC if there exist a variable B & a string w such that

i) B ∈ CHAIN(A)

ii) B → w ∈ P

iii) w ∉ V (ensures that PC does not contain chain rules).

Let w ∈ L(G) & let A ⇒* B be a maximal sequence of chain rules used to derive w in G. The derivation of w in G has the form

S ⇒* uAv ⇒* uBv ⇒ upv ⇒* w    (in G),

and it can be replaced by

S ⇒* uAv ⇒ upv ⇒* w    (in GC),

where B → p ∈ P is not a chain rule.


4.3 Elimination of Chain rules

Example 4.3.1. Given the following CFG

S → ACA | CA | AA | AC | A | C | λ
A → aAa | aa | B | C
B → bB | b
C → cC | c

Using Algorithm 4.3.1, we generate

CHAIN(S) = { S, A, C, B }
CHAIN(A) = { A, C, B }
CHAIN(B) = { B }
CHAIN(C) = { C }

Using these CHAIN sets, we generate GC, where PC of GC is

S → ACA | CA | AA | AC | aAa | aa | bB | b | cC | c | λ
A → aAa | aa | bB | b | cC | c
B → bB | b
C → cC | c
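The construction of PC from the CHAIN sets can be sketched in Python as below. The grammar representation (dict from variable to a list of tuples, empty tuple for λ) and both function names are assumptions for illustration. The helper computes the same set as CHAIN(A) but uses a plain worklist instead of the NEW/PREV bookkeeping of Algorithm 4.3.1.

```python
def reachable_by_chain_rules(rules, A):
    """Same set as CHAIN(A), computed with a simple worklist."""
    variables = set(rules)
    seen = {A}
    todo = [A]
    while todo:
        B = todo.pop()
        for rhs in rules[B]:
            if len(rhs) == 1 and rhs[0] in variables and rhs[0] not in seen:
                seen.add(rhs[0])
                todo.append(rhs[0])
    return seen


def eliminate_chain_rules(rules):
    """A -> w is kept iff B -> w is a non-chain rule for some B in CHAIN(A)."""
    variables = set(rules)
    new_rules = {}
    for A in rules:
        out = []
        # sorted() only fixes the listing order of the resulting rules
        for B in sorted(reachable_by_chain_rules(rules, A)):
            for rhs in rules[B]:
                is_chain = len(rhs) == 1 and rhs[0] in variables
                if not is_chain and rhs not in out:
                    out.append(rhs)
        new_rules[A] = out
    return new_rules


# Grammar of Example 4.3.1 (the empty tuple is the rule S -> lambda)
g = {
    "S": [("A", "C", "A"), ("C", "A"), ("A", "A"), ("A", "C"), ("A",), ("C",), ()],
    "A": [("a", "A", "a"), ("a", "a"), ("B",), ("C",)],
    "B": [("b", "B"), ("b",)],
    "C": [("c", "C"), ("c",)],
}
gc = eliminate_chain_rules(g)
print(len(gc["S"]), len(gc["A"]))   # 11 6
```

The counts agree with the S and A rules of PC listed above: eleven alternatives for S (including λ) and six for A.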


4.7 Removal of Direct Left Recursion

Recursion is necessary to generate strings of arbitrary length.

Left recursion causes problems, not recursion in general.

Directly left recursive rules (e.g. A → Aa) introduce the possibility of unending computations, since repeated applications of them fail to generate a prefix that can terminate the parse.

To avoid the possibility of a non-terminating parse, directly left recursive rules must be removed.

Example.

Direct left-recursive rules        No direct left-recursive rules

G1: A → Aa | b                     G1’: A → bZ | b
                                        Z → aZ | a

G2: A → Aa | Ab | b | c            G2’: A → bZ | cZ | b | c
                                        Z → aZ | bZ | a | b

G3: A → AB | BA | a                G3’: A → BAZ | aZ | BA | a
    B → b | c                           Z → BZ | B
                                        B → b | c


4.7 Removal of Direct Left Recursion

The removal of direct left recursion on a variable requires the addition of a new variable Z, which introduces a set of directly right recursive rules.

To remove direct left recursion on variable A, the A rules are divided into

directly left recursive rules: A → Au1 | Au2 | … | Auj

non-directly left recursive rules: A → v1 | v2 | … | vk, where A is not the first symbol in vi (1 ≤ i ≤ k)

Strategy: build a string in a left-to-right manner by (i) applying a non-recursive rule first, followed by (ii) constructing the remaining symbols by right recursion.

A → v1 | … | vk | v1Z | … | vkZ

Z → u1 | … | uj | u1Z | … | ujZ

Example 4.5.1 Given the rules A → Aa | Aab | bb | b

The A rules are: A → bb | b | bbZ | bZ

The Z rules are: Z → a | ab | aZ | abZ
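The transformation above can be sketched in Python. The sketch assumes a grammar stored as a dict from each variable to a list of right-hand sides (tuples of symbols), that the name passed as Z is fresh, and that the grammar has no rule A → A. The function name `remove_direct_left_recursion` is hypothetical.

```python
def remove_direct_left_recursion(rules, A, Z):
    """Replace the directly left recursive A rules using the new variable Z."""
    # u1..uj: tails of the rules A -> A u_i
    rec = [rhs[1:] for rhs in rules[A] if rhs and rhs[0] == A]
    # v1..vk: right-hand sides that do not begin with A
    non = [rhs for rhs in rules[A] if not rhs or rhs[0] != A]
    if not rec:
        return dict(rules)            # nothing to remove
    new_rules = dict(rules)
    new_rules[A] = non + [v + (Z,) for v in non]   # A -> v_i | v_i Z
    new_rules[Z] = rec + [u + (Z,) for u in rec]   # Z -> u_i | u_i Z
    return new_rules


# Rules of Example 4.5.1: A -> Aa | Aab | bb | b
g = {"A": [("A", "a"), ("A", "a", "b"), ("b", "b"), ("b",)]}
out = remove_direct_left_recursion(g, "A", "Z")
print(out["A"])   # [('b', 'b'), ('b',), ('b', 'b', 'Z'), ('b', 'Z')]
print(out["Z"])   # [('a',), ('a', 'b'), ('a', 'Z'), ('a', 'b', 'Z')]
```

The output reproduces the A rules and Z rules of Example 4.5.1. Only direct left recursion is handled; indirect left recursion (e.g. A → Bu, B → Av) is outside the scope of this sketch.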