Unit 7
Context-free Languages
Reading: Sipser, ch. 2.1
Hopcroft et al., ch. 5
Grammar
• Another method of describing languages.
• More powerful than NFA.
• Can describe recursive structures.
• A basic tool in compilation theory.
• The computation is performed by creating a string (parsing):
– We start with the start variable and rewrite it according to the grammar rules until we have the final output.
• The language of the grammar consists of all possible outputs.
The Origin of Grammar
• The origin of the name grammar for this
computational model is in natural languages,
where grammar is a collection of rules.
• This collection defines what is legal in the
language and what is not.
Computational Model for Grammars
• The computational model is a collection of
substitution rules over an alphabet Σ and a
set of variables V.
• Every grammar has a start symbol also called
a start variable (usually denoted by S).
• From the start variable we derive a word
using the substitution rules.
Notation
• We use the notation “→” in grammar rules.
• It means :
– can be replaced by
– constructs
– produces
Example of a grammar
• Σ = {a,b,c}
• The following grammar generates all strings over Σ.
S → aS (add a)
S → bS (add b)
S → cS (add c)
S → ε (delete S)
Grammar
• A collection of substitution rules of the form: A → α
• The symbol A is a variable.
• The string α consists of variables and terminals (symbols of Σ).
• The variable S is the start variable.
Derivation of a word
1. Write down the start variable.
2. Find a variable A that is written down and a rule A → α.
3. Replace the variable A with the string α.
4. Repeat steps 2 and 3 until no variables remain.
Notation
• We use the notation “⇒” to represent an actual derivation: α ⇒ β
• It means: the string β was derived from α using a substitution rule.
Production of w=aacb
1. S → aS
2. S → bS
3. S → cS
4. S → ε
These rules can be written S → aS | bS | cS | ε
• How can the word w=aacb be produced?
S ⇒(1) aS ⇒(1) aaS ⇒(3) aacS ⇒(2) aacbS ⇒(4) aacb
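The production of w=aacb can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the names RULES and derive are hypothetical, and the empty Python string "" stands for ε.

```python
# Rules numbered as on the slide; "" stands for the empty string ε.
RULES = {1: ("S", "aS"), 2: ("S", "bS"), 3: ("S", "cS"), 4: ("S", "")}

def derive(start, rule_numbers):
    """Apply the numbered rules in order, always rewriting the leftmost
    occurrence of the rule's variable, and return every sentential form."""
    forms = [start]
    current = start
    for number in rule_numbers:
        variable, replacement = RULES[number]
        if variable not in current:
            raise ValueError(f"rule {number} does not apply to {current!r}")
        current = current.replace(variable, replacement, 1)
        forms.append(current)
    return forms

steps = derive("S", [1, 1, 3, 2, 4])
print(" => ".join(form or "ε" for form in steps))
# prints: S => aS => aaS => aacS => aacbS => aacb
```

Each entry of steps is one sentential form, so the list is the derivation written out in full.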
Parsing
• What we did is called parsing a word w according to a given grammar.
• To parse a word or a sentence means to break it into parts that conform to a given grammar.
• We can represent the same production sequence by a parse tree or a derivation tree.
• Each node in the tree is either a variable or a terminal.
• A terminal node is a leaf.
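Parse Tree of w=aacb
[Figure: parse tree of w=aacb. Each S node has two children, a terminal and a new S; the terminals are a, a, c, b from top to bottom, and the last S derives ε.]
Or a step-by-step derivation:
[Figure: the same tree built one rule at a time, following S ⇒ aS ⇒ aaS ⇒ aacS ⇒ aacbS ⇒ aacb.]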
Another example
expr → expr + term | term
term → term × factor | factor
factor → ( expr ) | a
• What are the terminals?
• Parse the string: a + a × a (is it unique?)
• Parse the string: (a + a) × a
Solution: in class
Context-Free Grammar (CFG): Formal Definition
A context-free grammar (CFG) G is a 4-tuple
G = (V, Σ, S, R), where
1. V is a finite set called the variables.
2. Σ is a finite set, disjoint from V, called the terminals.
3. S ∈ V is the start symbol.
4. R is a finite set of production rules, with each rule being a variable and a string of variables and terminals: A → α, A ∈ V and α ∈ (V ∪ Σ)*
Hebrew term: דקדוק חסר הקשר (context-free grammar)
Derivation in CFG
• Let α, β and γ be strings of variables and terminals.
• If A → γ is a rule in the grammar, we say that αAβ derives αγβ, written αAβ ⇒ αγβ.
• We write x ⇒* y if x = y or if there exists a sequence x1, x2, ..., xk, k ≥ 0, such that x ⇒ x1 ⇒ x2 ⇒ ... ⇒ xk ⇒ y.
⇒ means derives in one step
⇒+ means derives in one or more steps
⇒* means derives in zero or more steps
The language of CFG
• The language of the grammar is
L(G) = {w ∈ Σ* | S ⇒* w}
• The language generated by a CFG is called a context-free language (CFL). Hebrew term: שפה חסרת הקשר (context-free language)
Derivation steps
• In fact,
L(G) = {w ∈ Σ* | S ⇒+ w}
• Because a derivation with zero steps produces only S.
• S is not a string in Σ*, so it can't belong to L(G).
Common Notations
• Terminals: lower case, beginning of the alphabet (a, b, c).
• Variables: upper case, beginning of the alphabet (A, B, C).
• String of terminals: lower case, end of the alphabet (u, v, w).
• Mixed strings (terminals + variables): lower case, Greek letters (α, β, γ).
• Terminal or variable: upper case, end of the alphabet (X, Y, Z).
Examples over Σ={0,1}
• Construct a grammar for the following language: L = {0,00,1}
• G = (V={S}, Σ={0,1}, S, R)
• R: S → 0
S → 00
S → 1
• Alternatively
S → 0 | 00 | 1
When a variable has several production rules, they can all be written in one line. Different rules are separated by the symbol ‘|’.
Examples over Σ={0,1}
• Construct a grammar for the following language: L = {0^n1^n | n ≥ 0}
• G = (V={S}, Σ={0,1}, S, R) where R:
S → 0S1
S → ε
Alternatively
S → 0S1 | ε
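A grammar such as S → 0S1 | ε can be checked by listing the short words it generates. The following is a minimal sketch, not part of the original slides: the function name generate and the dictionary encoding of the rules are assumptions, variables are single characters, and "" stands for ε. The pruning only guarantees termination when every recursive rule adds terminals, which holds for the example grammars in this unit.

```python
from collections import deque

def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.
    rules maps a variable (one character) to a list of right-hand sides;
    "" stands for ε. Leftmost derivations are explored breadth-first."""
    words = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is a word.
        pos = next((i for i, ch in enumerate(form) if ch in rules), None)
        if pos is None:
            words.add(form)
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Prune forms that already carry too many terminals.
            terminals = sum(1 for ch in new if ch not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))

print(generate({"S": ["0S1", ""]}, "S", 6))
# prints: ['', '01', '0011', '000111']
```

Replacing the ε alternative by 01 gives the words of {0^n1^n | n ≥ 1} instead.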
Examples over Σ={0,1}
• Construct a grammar for the following language: L = {0^n1^n | n ≥ 1}
• G = (V={S}, Σ={0,1}, S, R) where R:
S → 0S1 | 01
Examples over Σ={0,1}
• Construct a grammar for the following language: L = 0*1⁺
• G = (V={S,B}, Σ={0,1}, S, R) where R:
S → 0S | 1B
B → 1B | ε
What about 0*1* ?
Examples over Σ={0,1}
• Construct a grammar for the following language: L = {0^(2i+1) | i ≥ 0}
• G = (V={S}, Σ={0,1}, S, R) where R:
S → 0 | 00S
Examples over Σ={0,1}
• Construct a grammar for the following language: L = {0^(i+1)1^i | i ≥ 0}
• G = (V={S}, Σ={0,1}, S, R) where R:
S → 0 | 0S1
Examples over Σ={0,1}
• Construct a grammar for the following language: L = {w | w ∈ Σ* and |w| mod 2 = 1}
• G = (V={S}, Σ={0,1}, S, R) where R:
S → 0 | 1 | 1S1 | 0S0 | 1S0 | 0S1
Let's parse: 011100101
Examples over Σ={0,1}
• Construct a grammar for the following language: L = {0^n1^n | n ≥ 1} ∪ {1^n0^n | n ≥ 0}
• G = (V={S,A,B}, Σ={0,1}, S, R) where R:
S → A | B
A → 0A1 | 01
B → 1B0 | ε
Exercise
Construct grammars for the following languages over Σ={0,1}, where #1(w) and #0(w) denote the number of 1s and 0s in w:
1. L1 = {w | #1(w) is even}
2. L2 = {w | #1(w) is odd}
3. L3 = {w | #1(w) = #0(w)}
4. L4 = {0^n10^m10^(n+m) | n,m ≥ 0}
Solution: In class
From a Grammar to a CFL
• Give a description of L(G) for the following grammar:
S → 0S0 | 1
• L(G) = {0^n10^n | n ≥ 0}
From a Grammar to a CFL
• Give a description of L(G) for the following grammar:
S → 0S0 | 1S1 | ε
• L(G) = {the even-length palindromes over Σ={0,1}}
or
• L(G) = {ww^R | w ∈ Σ*}
From a Grammar to a CFL
• Give a description of L(G) for the following grammar:
S → 0A | 0B
A → 1S
B → 1
• L(G) = {(01)^n | n ≥ 1}
• Simpler version: S → 01S | 01
From a Grammar to a CFL
• Give a description of L(G) for the following grammar:
S → 0S11 | 0
• L(G) = {0^(n+1)1^(2n) | n ≥ 0}
From a Grammar to a CFL
• Give a description of L(G) for the following grammar:
S → E | NE
N → D | DN
D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
E → 0 | 2 | 4 | 6
• L(G) = {w | w represents an even octal number}
From a Grammar to a CFL
• Give a description of L(G) for the following grammar:
S → N.N | -N.N
N → D | DN
D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• L(G) = {w | w represents a decimal rational number (that has a finite representation)}
Exercise
Give a description of L(G) for each of the following grammars over Σ={a,b,$}:
G1: S → aSb | A
A → Aa | ε
G2: S → aSb | SS | ε
G3: S → aSa | bSb | aS | bS | $
Solution: In class
Exercise
Give a description of L(G) for the following grammar over Σ={0,1,...,9,+,*}:
E → E+E | E*E | T
T → 0 | 1 | 2 | .. | 9
• Let's parse the string 3+4*5
– E ⇒ E+E ⇒ T+E ⇒ 3+E ⇒ 3+E*E ⇒* 3+4*5
– E ⇒ E*E ⇒ E*T ⇒ E*5 ⇒ E+E*5 ⇒* 3+4*5
Exercise (cont.)
E → E+E | E*E | T
T → 0 | 1 | 2 | .. | 9
• The string 3+4*5 can be produced in several ways:
[Figure: two parse trees for 3+4*5. In the first, the root applies E → E+E, with 3 on the left and the subtree for 4*5 on the right. In the second, the root applies E → E*E, with the subtree for 3+4 on the left and 5 on the right.]
Exercise (cont.)
• So if we use this grammar to define a programming language, then we will have several computations for 3+4*5.
• There is no precedence of ‘*’ over ‘+’.
• This language will be impossible to use because the user won't know which computation the compiler uses.
• Two possible results: 35 or 23.
Ambiguity
• The ability of a grammar to generate the same string in several ways is called ambiguity.
• A grammar is ambiguous if there exists a string w that has at least two different parse trees.
• Sometimes it is possible to find, for an ambiguous grammar, an unambiguous one defining the same language.
• Some CFLs are inherently ambiguous
– Example: {a^i b^j c^k | i=j or j=k}
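The ambiguity of E → E+E | E*E | T on the string 3+4*5 can be made concrete by evaluating every parse tree. The following is a minimal sketch, not part of the original slides: the function name parse_values is hypothetical, and the input is assumed to be a well-formed list of single digits alternating with + or *.

```python
def parse_values(tokens):
    """Return the value of every parse tree of tokens under
    E -> E+E | E*E | T, T -> digit, one list entry per tree."""
    if len(tokens) == 1:
        return [int(tokens[0])]          # E -> T -> digit
    values = []
    # Try every operator position as the rule applied at the root.
    for i in range(1, len(tokens), 2):
        for left in parse_values(tokens[:i]):
            for right in parse_values(tokens[i + 1:]):
                if tokens[i] == "+":
                    values.append(left + right)
                else:
                    values.append(left * right)
    return values

print(parse_values(list("3+4*5")))
# prints: [23, 35]
```

Two list entries mean two parse trees, which is exactly the condition for ambiguity stated above.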
Finite Languages
Theorem: Any finite language can be generated by a CFG.
Proof:
• Let L = {w_i | 1 ≤ i ≤ n and w_i ∈ Σ*} be a finite language over Σ.
• We construct the following grammar:
S → w_1
..
S → w_n
Regular Languages
• Question: Can the regular languages be generated by CFGs?
• Answer: in the following.
The Regular Grammar
A grammar is called regular if each production has one of the following forms:
A → w
or
A → wB
where w ∈ Σ* and A,B ∈ V.
Regular grammar example
Example
S → 012
S → 0A
A → 0A
A → 0
The Regular Grammar
• Theorem: The set of languages that have a regular grammar is the set of regular languages.
• Proof: Soon
• Idea: Given an NFA we will create an equivalent regular grammar. Given a regular grammar we will build an equivalent NFA.
The Regular Grammar
• Conclusion: The regular languages are a proper subset of the context-free languages.
Examples of regular grammars
Construct a regular grammar for the following regular expressions:
L1 = 0*
Regular grammar:
S → 0S | ε
L2 = (0+1)⁺
Regular grammar:
S → 0S | 1S | 0 | 1
L3 = 0*+1*
Regular grammar:
S → A | B
A → 0A | ε
B → 1B | ε
L4 = (01)⁺
Regular grammar:
S → 01S | 01
From NFA to Regular Grammars
Lemma: A regular grammar can be constructed for any NFA.
The basic idea (no proof):
Translation of the transition function of an automaton to rules in a regular grammar.
Algorithm: from NFA to RG
1. Rename all states of the NFA to a set of capital letters.
2. Name the start state of the NFA S.
3. Translate each transition
δ(A,a)=B into the rule A → aB
and
δ(A,ε)=B into the rule A → B.
4. Add the rule A → ε for each accepting state A in the NFA.
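The NFA-to-RG translation can be sketched in a few lines. This is an illustrative sketch, not part of the original slides: the function name nfa_to_regular_grammar and the triple encoding of transitions are assumptions, states are assumed to be already renamed to capital letters with start state S, and "" stands for ε. The transitions in the usage lines are read off the grammar S → 0S | 0A | 1A | ε, A → 0A | 1S used in this unit's example.

```python
def nfa_to_regular_grammar(transitions, accepting):
    """transitions: iterable of (state, symbol, next_state), where symbol ""
    is an ε-move and states are named by capital letters (start = "S").
    Returns a dict mapping each variable to a sorted list of right-hand sides."""
    rules = {}
    for state, symbol, target in transitions:
        # δ(A, a) = B becomes A -> aB; δ(A, ε) = B becomes A -> B.
        rules.setdefault(state, set()).add(symbol + target)
        rules.setdefault(target, set())
    for state in accepting:
        rules.setdefault(state, set()).add("")   # A -> ε for accepting A
    return {var: sorted(bodies) for var, bodies in rules.items()}

transitions = [("S", "0", "S"), ("S", "0", "A"), ("S", "1", "A"),
               ("A", "0", "A"), ("A", "1", "S")]
grammar = nfa_to_regular_grammar(transitions, accepting={"S"})
print(grammar)
# prints: {'S': ['', '0A', '0S', '1A'], 'A': ['0A', '1S']}
```

The empty string in the list for S is the rule S → ε contributed by the accepting start state.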
Example:
Denote q0 by S and q1 by A
The regular grammar is:
S → 0S | 0A | 1A | ε
A → 0A | 1S
[Figure: the NFA with states q0 and q1, shown again with the states renamed S and A. Edge labels: 0,1 / 1 / 0 / 0.]
From RG to NFA
Lemma: An NFA can be constructed for any regular grammar G.
The basic idea (no proof):
Construction of an NFA that accepts the language of the given regular grammar.
Algorithm: from RG to NFA
1. Transform all rules of the grammar to be in a simple regular form:
A → c or
A → cB
where c ∈ Σ ∪ {ε} and A,B ∈ V
(this can be done by adding variables).
Algorithm (cont.)
2. The start state of the NFA is the grammar's start symbol S.
3. For each rule:
If A → cB, construct a state transition from A to B labeled c.
If A → B, construct a state transition from A to B labeled ε.
Algorithm (cont.)
4. For each rule A → c, c ∈ Σ ∪ {ε}, add a new accepting state F and construct a state transition from A to the new state F labeled c.
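The RG-to-NFA construction, followed by a small NFA simulation, can be sketched as below. This is an illustrative sketch, not part of the original slides: the function names grammar_to_nfa and accepts are hypothetical, the grammar is assumed to be already in simple regular form with single capital letters as variables, "" stands for ε, and the name F is assumed not to be a variable of the grammar. The usage lines take the simple-form grammar S → 0S | 1B, B → 1A, A → 1A | 0 from this unit's example.

```python
def grammar_to_nfa(rules, start="S"):
    """rules: dict variable -> list of right-hand sides in simple form
    ("c", "cB", "B" or ""). Returns (transitions, start, accepting),
    using one fresh accepting state named "F"."""
    transitions = set()
    for var, bodies in rules.items():
        for body in bodies:
            if body and body[-1].isupper():
                # A -> cB (or A -> B): move from A to B on c (or on ε).
                transitions.add((var, body[:-1], body[-1]))
            else:
                # A -> c (or A -> ε): move from A to the new state F.
                transitions.add((var, body, "F"))
    return transitions, start, {"F"}

def accepts(nfa, word):
    """Simulate the NFA on word, following ε-moves between symbols."""
    transitions, start, accepting = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for p, sym, r in transitions:
                if p == q and sym == "" and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for ch in word:
        step = {r for p, sym, r in transitions if p in current and sym == ch}
        current = closure(step)
    return bool(current & accepting)

nfa = grammar_to_nfa({"S": ["0S", "1B"], "B": ["1A"], "A": ["1A", "0"]})
print(accepts(nfa, "00110"), accepts(nfa, "10"))
# prints: True False
```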
Example:
Input:
S → 0S | 11A
A → 1A | 0
New grammar:
S → 0S | 1B
B → 1A
A → 1A | 0
Resulting NFA:
[Figure: the resulting NFA with states S, B, A and the accepting state F. Edge labels: 0 / 1 / 1 / 1 / 0.]
Today's Topics:
• Operations over Grammars:
– Union
– Concatenation
– Kleene Star
• Simplified Grammars
• Chomsky Normal Forms
• Chomsky Hierarchy
Operations over Grammars
We can perform the following operations over grammars:
1. Union
2. Concatenation
3. Kleene star
Corollary: The context-free languages are closed under the above operations.
Note: CFLs are not closed under complement or under intersection (proof: in class).
(Are regular grammars closed under complement / intersection?)
Union
Given two languages and their grammars:
• L1 with G1 = (V1, Σ1, S1, R1) and
• L2 with G2 = (V2, Σ2, S2, R2)
such that V1 ∩ V2 = ∅,
we construct their union by merging their grammars:
G = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2, S, R1 ∪ R2 ∪ {S → S1 | S2})
Proof idea: The rule S → S1 | S2 enables a string w to be derived either from S1 or from S2.
Concatenation
Given two languages and their grammars:
• L1 with G1 = (V1, Σ1, S1, R1) and
• L2 with G2 = (V2, Σ2, S2, R2)
such that V1 ∩ V2 = ∅,
we construct their concatenation:
Gcon = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2, S, R1 ∪ R2 ∪ {S → S1S2})
Proof idea: The rule S → S1S2 enables the creation of a string w=uv where u can be derived from S1 and v from S2.
Kleene star
Given a language L1 and its grammar:
G = (V1, Σ1, S1, R1),
then G* is the grammar for L1*:
G* = (V1 ∪ {S}, Σ1, S, R1 ∪ {S → S1S | ε})
Proof idea:
• The rule S → S1S means that a word w in L(G*) is built of two parts w=uv such that u is derived from S1 and v is derived from S.
• The rule S → ε ends the derivation from S, or derives the empty string.
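The three constructions only add one start rule each, which a short sketch makes visible. This is an illustrative sketch, not part of the original slides: a grammar is encoded here as a pair (rules, start) where rules maps a variable to a list of bodies and each body is a list of symbols; the function names are hypothetical, the variable sets are assumed disjoint, and the new start name "S" is assumed unused. The helper words lists short generated strings and relies on every recursive rule adding terminals.

```python
def union(g1, g2, new_start="S"):
    (rules1, s1), (rules2, s2) = g1, g2
    return {**rules1, **rules2, new_start: [[s1], [s2]]}, new_start      # S -> S1 | S2

def concatenation(g1, g2, new_start="S"):
    (rules1, s1), (rules2, s2) = g1, g2
    return {**rules1, **rules2, new_start: [[s1, s2]]}, new_start        # S -> S1 S2

def star(g1, new_start="S"):
    rules1, s1 = g1
    return {**rules1, new_start: [[s1, new_start], []]}, new_start       # S -> S1 S | ε

def words(grammar, max_len):
    """All generated terminal strings of length <= max_len (leftmost, breadth-first)."""
    rules, start = grammar
    found, seen, queue = set(), {(start,)}, [(start,)]
    while queue:
        form = queue.pop(0)
        pos = next((i for i, sym in enumerate(form) if sym in rules), None)
        if pos is None:
            found.add("".join(form))
            continue
        for body in rules[form[pos]]:
            new = form[:pos] + tuple(body) + form[pos + 1:]
            # Prune forms that already carry more than max_len terminals.
            if sum(sym not in rules for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

g1 = ({"A": [["0", "A", "1"], ["0", "1"]]}, "A")   # {0^n1^n | n >= 1}
g2 = ({"B": [["1", "B"], ["1"]]}, "B")             # one or more 1s
print(sorted(words(union(g1, g2), 3)))
# prints: ['01', '1', '11', '111']
```

Keeping the two rule sets untouched and adding a single new start variable is what makes the proof ideas above so short.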
Simplified Grammars
• Every context-free grammar can be rewritten in a simplified form.
• A simplified form of the grammar is a grammar that
– doesn't have ε rules
and
– doesn't have unit rules.
• An ε rule is a rule of the form: A → ε.
• A unit rule is a rule of the form: A → B.
Step 1: Removing ε rules
• A context-free language that does not contain ε can be written without ε rules.
• If ε ∈ L then remove ε from L.
• Build a simplified form CFG without ε rules.
• Add a rule S' → S | ε.
Algorithm for removing ε rules
1. Find an ε rule A → ε (A ≠ S) and remove it from R.
2. For each rule in R of the form B → αAβ where α,β ∈ (V ∪ Σ)*, add to R the rule B → αβ.
– Note: We do so for each occurrence of A, e.g. for B → αAβAγ we add B → αβAγ | αAβγ | αβγ.
– Note: For a rule B → A, we add a new rule B → ε unless this rule has already been removed through this process.
3. Repeat from step 1 until we eliminate all ε rules.
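The removal of ε rules can be sketched directly from the three steps. This is an illustrative sketch, not part of the original slides: the function name remove_epsilon_rules is hypothetical, variables are assumed to be single capital letters, bodies are strings, and "" stands for ε. The usage lines run it on the grammar S → aAB, A → aA | B, B → bB | ε.

```python
from itertools import combinations

def remove_epsilon_rules(rules, start="S"):
    """rules: dict variable -> set of right-hand-side strings ("" is ε).
    Returns a new dict without ε rules (an ε rule of start is left alone)."""
    rules = {var: set(bodies) for var, bodies in rules.items()}
    removed = set()                 # variables whose ε rule was already deleted
    while True:
        # Step 1: find an ε rule A -> ε with A different from the start symbol.
        target = next((v for v in rules if v != start and "" in rules[v]), None)
        if target is None:
            return rules
        rules[target].discard("")
        removed.add(target)
        # Step 2: for every rule containing A, add the variants with A dropped.
        for var in rules:
            for body in list(rules[var]):
                positions = [i for i, sym in enumerate(body) if sym == target]
                for k in range(1, len(positions) + 1):
                    for drop in combinations(positions, k):
                        new = "".join(s for i, s in enumerate(body) if i not in drop)
                        if new == "" and var in removed:
                            continue   # this ε rule was eliminated earlier
                        rules[var].add(new)

result = remove_epsilon_rules({"S": {"aAB"}, "A": {"aA", "B"}, "B": {"bB", ""}})
print({var: sorted(bodies) for var, bodies in result.items()})
# prints: {'S': ['a', 'aA', 'aAB', 'aB'], 'A': ['B', 'a', 'aA'], 'B': ['b', 'bB']}
```

The set named removed implements the second note of step 2: an ε rule that was deleted once is never added again.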
Example 1:
S → aBBAC
A → ε
B → ε
Removing A → ε
S → aBBAC | aBBC
B → ε
Removing B → ε
S → aBBAC | aBBC
S → aBAC | aAC | aBC | aC
Example 2:
S → aAB
A → aA | B
B → bB | ε
Removing B → ε
S → aAB | aA ; A → aA | B | ε ; B → bB | b
Removing A → ε
S → aAB | aA | aB | a
A → aA | B | a
B → bB | b
Example 3:
S → aA
A → ε
Removing A → ε
S → aA | a
• It is obvious that the first rule can't be used to derive any word, so it can be deleted.
• The minimized grammar is: S → a
Example 4:
S → aBaC
B → bB | C
C → cC | ε
Delete C → ε
S → aBaC | aBa
B → bB | C | ε
C → cC | c
Example 4 (cont.):
Delete B → ε
S → aBaC | aBa | aaC | aa
B → bB | C | b
C → cC | c
Step 2: Removing Unit Rules
A context-free grammar that contains unit rules can be rewritten without unit rules.
Algorithm for removing unit rules:
1. For each unit rule A → B, remove this rule from R and add all productions of B to A:
For each B → α in R add the rule A → α
2. Repeat step 1 until all unit rules are removed.
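The unit-rule algorithm can be sketched as a loop that removes one unit rule at a time. This is an illustrative sketch, not part of the original slides: the function name remove_unit_rules is hypothetical, bodies are strings, and a unit rule is recognized as a body that is exactly one variable name. Remembering the unit rules already removed is an added safeguard so that cycles such as S → A, A → S cannot loop forever. The usage lines run it on S → A | b, A → B | b, B → bB | a.

```python
def remove_unit_rules(rules):
    """rules: dict variable -> set of right-hand-side strings.
    Returns a new dict without unit rules (unreachable variables are kept)."""
    rules = {var: set(bodies) for var, bodies in rules.items()}
    done = set()                    # unit rules (A, B) already eliminated
    while True:
        unit = next(((a, b) for a in rules for b in rules[a] if b in rules), None)
        if unit is None:
            return rules
        a, b = unit
        rules[a].discard(b)
        done.add((a, b))
        for body in list(rules[b]):
            # Copy B's productions to A, skipping A -> A and unit rules
            # that were removed earlier.
            if body != a and (a, body) not in done:
                rules[a].add(body)

result = remove_unit_rules({"S": {"A", "b"}, "A": {"B", "b"}, "B": {"bB", "a"}})
print({var: sorted(bodies) for var, bodies in result.items()})
# prints: {'S': ['a', 'b', 'bB'], 'A': ['a', 'b', 'bB'], 'B': ['a', 'bB']}
```

The sketch does not delete variables that become unreachable from S; that cleanup is a separate step.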
Example:
S → A | b
A → B | b
B → bB | a
First we eliminate the unit rule A → B:
S → A | b
A → bB | a | b
B → bB | a
Example (cont.):
Next we eliminate the unit rule S → A:
S → bB | a | b
A → bB | a | b
B → bB | a
Example (cont.):
• A is not reachable from S and can be removed. The resulting grammar is:
S → bB | a | b
B → bB | a
Chomsky Normal Form
• Any context-free grammar can be written in a special form called Chomsky Normal Form (CNF).
Definition
A CFG is in CNF if every rule is of the form:
A → BC or
A → a
where a ∈ Σ, A,B,C ∈ V, and B,C ≠ S.
• If the language contains ε then the rule S → ε is allowed.
Noam Chomsky (from Wikipedia)
An American linguist, philosopher, cognitive scientist, political activist, author, and lecturer. He is an Institute Professor and professor emeritus of linguistics at the Massachusetts Institute of Technology.
Chomsky is well known in the academic and scientific community as one of the fathers of modern linguistics. Since the 1960s, he has become known more widely as a political dissident, an anarchist, and a libertarian socialist intellectual. Chomsky is often viewed as a notable figure in contemporary philosophy.
Chomsky Normal Form (cont.)
A grammar in Chomsky Normal Form has several properties and usages:
– Every derivation of a string of length n ≥ 1 takes exactly 2n-1 steps.
– The parse tree is a binary tree.
[Figure: a binary parse tree with nodes S, A, B, C, D and a leaf a.]
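The 2n-1 count can be justified by a short calculation, given here as a sketch for a string of length n ≥ 1. Each rule of the form A → BC lengthens the sentential form by one symbol, so growing from the single start variable to n variables takes n-1 such steps; each rule of the form A → a turns one variable into a terminal, so n such steps are needed.

```latex
\[
\underbrace{(n-1)}_{\text{rules } A \to BC}
\;+\;
\underbrace{n}_{\text{rules } A \to a}
\;=\; 2n-1
\]
```

For example, a string of length 3 always needs 2 + 3 = 5 derivation steps.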
Chomsky Normal Form (cont.)
Theorem: Any context-free language can be generated by a grammar in CNF.
Converting CFG to CNF
1. Add a new start symbol S' and the rule S' → S to the CFG.
2. Remove all ε-rules.
3. Remove all unit rules.
4. Convert all remaining rules into a proper form:
– 4.1 Replace each terminal a in a rule whose right-hand side has two or more symbols with a variable A_a and add a rule A_a → a to the CFG.
– 4.2 For each rule of the form A → B1B2..Bn where n > 2, replace it with the two following rules:
A → B1C and C → B2..Bn
– 4.3 Repeat step 4 until all rules have the proper form (right-hand side of length ≤ 2).
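Steps 4.1 and 4.2 can be sketched as a single pass over the rules. This is an illustrative sketch, not part of the original slides: the function name to_cnf_bodies and the new variable names ("T_0", "C1", ...) are hypothetical and assumed unused, a body is a list of symbols, a symbol counts as a variable exactly when it is a key of the dictionary, and ε rules and unit rules are assumed to be gone already. The usage lines take the grammar S' → 1 | 0B0, B → 0B0 | 1 | 0.

```python
def to_cnf_bodies(rules):
    """Apply steps 4.1 and 4.2. rules: dict variable -> list of bodies,
    each body a list of symbols. Returns a new dict of rules."""
    result = {}
    counter = 0
    for var, bodies in rules.items():
        for body in bodies:
            if len(body) >= 2:
                # 4.1: replace each terminal a by a variable T_a with T_a -> a.
                new_body = []
                for sym in body:
                    if sym in rules:
                        new_body.append(sym)
                    else:
                        result["T_" + sym] = [[sym]]
                        new_body.append("T_" + sym)
                body = new_body
            head = var
            # 4.2: A -> B1 B2 .. Bn (n > 2) becomes A -> B1 C and C -> B2 .. Bn.
            while len(body) > 2:
                counter += 1
                fresh = "C" + str(counter)
                result.setdefault(head, []).append([body[0], fresh])
                head, body = fresh, body[1:]
            result.setdefault(head, []).append(list(body))
    return result

cnf = to_cnf_bodies({"S'": [["1"], ["0", "B", "0"]],
                     "B": [["0", "B", "0"], ["1"], ["0"]]})
for var, bodies in cnf.items():
    print(var, "->", " | ".join(" ".join(body) for body in bodies))
```

Unlike a hand conversion, the sketch introduces a fresh variable for every long rule (C1 and C2 here) instead of sharing one variable between identical right-hand sides; the generated language is the same.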
Example
Write the following grammar in CNF.
S → A | 0B0
A → S | 1
B → A | 0
Example (cont.)
• Add an S' → S rule.
• There are no ε-rules.
• Unit rules are S → A, A → S, B → A, S' → S.
• We start from A → S
A → 1 (old rule)
A → A | 0B0 (new rules: the rule A → A has no meaning) so we leave A → 0B0 | 1
Example (cont.)
• Next, eliminate B → A.
B → 0 (old rule)
B → 0B0 | 1 (new rule)
together: B → 0B0 | 1 | 0
Example (cont.)
• Next, eliminate S → A.
S → 0B0 (old rule)
S → 0B0 | 1 (new rule)
together: S → 0B0 | 1
• Next, eliminate S' → S.
S' → 0B0 | 1 (new rule)
Example (cont.)
• Throw away the rules of S and A since S and A are not reachable from S'.
S' → 1 | 0B0
B → 0B0 | 1 | 0
• Write all rules in Chomsky form:
S' → 1 | S0BS0
B → S0BS0 | 1 | 0
S0 → 0
Example (cont.)
• Replace the S0BS0 right side with S0C and C → BS0
• The resulting final grammar is:
S' → 1 | S0C
B → S0C | 1 | 0
C → BS0
S0 → 0
Chomsky Hierarchy
The Chomsky hierarchy consists of 4 types of grammars:
1. Regular (type 3)
2. Context-free (type 2)
3. Context-sensitive (type 1)
4. Recursively enumerable (type 0)
Chomsky Hierarchy (cont.)
Regular grammars:
– Restricted to rules of the form:
S → a or S → aB
where a ∈ Σ and S,B ∈ V
(different from our definition, where a ∈ Σ*)
• Generates regular languages.
• Can be decided by an FA.
Chomsky Hierarchy (cont.)
Context-free grammars:
– Restricted to rules of the form:
A → γ
where A ∈ V and γ ∈ (V ∪ Σ)*
• Generates context-free languages.
• Can be decided by a pushdown automaton (PDA).
Chomsky Hierarchy (cont.)
Context-sensitive grammars:
– Restricted to rules of the form:
αAβ → αγβ
where A ∈ V and α, β, γ ∈ (V ∪ Σ)*
• Generates context-sensitive languages.
• Can be decided by a linear-bounded nondeterministic Turing machine (BTM).
Example:
L = {a^n b^n | n ≥ 1}
S → aSBC | abC
CB → BC
bB → bb
bC → b
S ⇒ aSBC
⇒ aabCBC
⇒ aabBC
⇒ aabbC
⇒ aabb
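Derivations with rules whose left-hand side is longer than one symbol can be explored by string rewriting. This is an illustrative sketch, not part of the original slides: the names RULES and derivable are hypothetical, and the length bound of twice the target length is an assumption chosen for this grammar, whose sentential forms for a^n b^n reach at most 3n symbols.

```python
from collections import deque

# The example's rules as (left-hand side, right-hand side) pairs.
RULES = [("S", "aSBC"), ("S", "abC"), ("CB", "BC"), ("bB", "bb"), ("bC", "b")]

def derivable(target, start="S", max_len=None):
    """Breadth-first search over sentential forms; a rule may rewrite any
    occurrence of its left-hand side. Forms longer than max_len are pruned."""
    if max_len is None:
        max_len = 2 * len(target)    # assumed bound, see the text above
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in RULES:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

print(derivable("aabb"), derivable("aab"))
# prints: True False
```

The search terminates because only finitely many forms fit under the length bound.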
Chomsky Hierarchy (cont.)
Recursively enumerable grammars:
– No restrictions on rules
• Generates the recursively enumerable languages (RE).
• Can be recognized (accepted) by a Turing machine.