Theory of Computation
Cocke-Younger-Kasami Algorithm
Vladimir Kulyukin
Outline
CFL Acceptance Problem
Basic Insight
Dynamic Programming Implementation
www.vkedco.blogsot.com
CYK Algorithm’s Problem
Problem: Given a CFG G = (V, T, P, S) and a string x in T*, determine whether x is in L(G).
The Cocke-Younger-Kasami (CYK) algorithm takes a CFG in Chomsky normal form (CNF) and a string x, and determines whether the start symbol S is one of the variables that derive x.
Substring Notation $x_{s,l}$
Let x be a string such that |x| = n ≥ 1.
Let $x_{s,l}$ be the substring of x of length l that starts at position s, where 1 ≤ s ≤ n and 1 ≤ l ≤ n - s + 1.
For example, if x = aabbabb, then $x_{1,3}$ = aab = x[1]x[2]x[3] and $x_{2,4}$ = abba = x[2]x[3]x[4]x[5].
In general, with 1-based array indexing and a substring of length l, the last position s at which the substring can start is n - l + 1.
For example, if |x| = 4 and l = 2, the possible values for s in $x_{s,2}$ are 1, 2, and 3 = 4 - 2 + 1.
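The notation above can be sketched in Python as follows. The helper names `substring` and `valid_starts` are hypothetical and chosen only for illustration; the sketch converts the slides' 1-based positions to Python's 0-based slices.

```python
def substring(x: str, s: int, l: int) -> str:
    """Return x_{s,l}: the substring of x of length l starting at 1-based position s."""
    n = len(x)
    if not (1 <= l <= n and 1 <= s <= n - l + 1):
        raise ValueError(f"no substring of length {l} starts at position {s}")
    # 1-based position s corresponds to 0-based index s - 1.
    return x[s - 1 : s - 1 + l]


def valid_starts(n: int, l: int) -> list[int]:
    """Return all 1-based start positions for a substring of length l in a string of length n."""
    return list(range(1, n - l + 2))


x = "aabbabb"
print(substring(x, 1, 3))   # aab
print(substring(x, 2, 4))   # abba
print(valid_starts(4, 2))   # [1, 2, 3]
```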
CYK Algorithm: Basic Insight
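[Figure: parse tree with root A and children B and C. B derives $x_{s,k}$ (positions s through s+k-1) and C derives $x_{s+k,l-k}$ (positions s+k through s+l-1); together they span $x_{s,l}$.]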
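For l ≥ 2, $A \Rightarrow^{*} x_{s,l}$ iff, for some k with 1 ≤ k < l: 1) A → BC is a production; 2) $B \Rightarrow^{*} x_{s,k}$; 3) $C \Rightarrow^{*} x_{s+k,l-k}$.
In other words, A derives $x_{s,l}$ exactly when there is a rule A → BC and some k, 1 ≤ k < l, for which B derives $x_{s,k}$ and C derives $x_{s+k,l-k}$.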
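To make the role of k concrete, here is a small sketch that lists every way of cutting $x_{s,l}$ into a left part of length k and a right part of length l - k. The function name `splits` is hypothetical; the sketch only enumerates the candidate partitions and does not yet consult any grammar.

```python
def splits(x: str, s: int, l: int):
    """Yield (k, x_{s,k}, x_{s+k,l-k}) for every k with 1 <= k < l (1-based s)."""
    for k in range(1, l):
        left = x[s - 1 : s - 1 + k]        # x_{s,k}
        right = x[s - 1 + k : s - 1 + l]   # x_{s+k,l-k}
        yield k, left, right


for k, left, right in splits("baaba", 1, 3):
    print(k, left, right)
# 1 b aa
# 2 ba a
```

A substring of length l therefore has exactly l - 1 candidate partitions, and a substring of length 1 has none.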
Table D[s, l]
CYK is a dynamic programming algorithm that, given a CNF grammar G = (V, T, P, S) and a string x over the terminal alphabet T such that |x| = n > 0, incrementally builds an n x n table D (D stands for ‘derives’).
D[s, l] is a set, possibly empty, of the variables A in V such that $A \Rightarrow^{*} x_{s,l}$.
In other words, D[s, l] records all variables of G that derive $x_{s,l}$.
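One possible in-memory layout for D, sketched under the assumption that a Python dict keyed by (s, l) is acceptable: only the cells with s ≤ n - l + 1 correspond to actual substrings, so the dict holds n(n+1)/2 entries rather than all n x n. The name `empty_table` is hypothetical.

```python
def empty_table(n: int) -> dict[tuple[int, int], set[str]]:
    """Create an empty set for every valid cell D[s, l] of a string of length n."""
    return {
        (s, l): set()
        for l in range(1, n + 1)           # substring lengths
        for s in range(1, n - l + 2)       # valid 1-based starts for that length
    }


D = empty_table(5)
print(len(D))                  # 15
print((1, 5) in D, (2, 5) in D)  # True False
```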
D[s, l] Initialization
Let G = (V, T, P, S) be a CNF grammar and x be a string such that |x| = n > 0.
Let $x_{s,l}$ be the substring of x of length l that starts at position s.
If l = 1, then, for each 1 ≤ s ≤ n, we can check whether $x_{s,1}$ can be derived directly from some variable A of G.
How? By checking whether G has a production A → $x_{s,1}$.
D[s, l] Initialization (Example)
Assume that our CNF grammar is as follows:
1. S → AB | BC
2. A → BA | a
3. B → CC | b
4. C → AB | a
Assume that the input is x = baaba. What does D[s, l] look like?
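The example grammar can be encoded for a program in many ways. The sketch below is one assumption-laden option: it parses a textual form of the grammar (with `->` standing in for the arrow) into two lookup tables, one from a terminal to the variables that produce it and one from a pair of variables to the variables that produce that pair. The names `GRAMMAR` and `parse_cnf`, and the convention that variables are single uppercase letters and terminals single lowercase letters, are choices made for this illustration only.

```python
from collections import defaultdict

GRAMMAR = """
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a
"""


def parse_cnf(text: str):
    """Split CNF productions into terminal rules (A -> a) and binary rules (A -> BC)."""
    terminal_rules = defaultdict(set)   # terminal -> set of heads
    binary_rules = defaultdict(set)     # (B, C)   -> set of heads
    for line in text.strip().splitlines():
        head, bodies = line.split("->")
        head = head.strip()
        for body in bodies.split("|"):
            body = body.strip()
            if len(body) == 1 and body.islower():
                terminal_rules[body].add(head)
            elif len(body) == 2 and body.isupper():
                binary_rules[(body[0], body[1])].add(head)
            else:
                raise ValueError(f"not a CNF production: {head} -> {body}")
    return dict(terminal_rules), dict(binary_rules)


terminal_rules, binary_rules = parse_cnf(GRAMMAR)
print({a: sorted(heads) for a, heads in sorted(terminal_rules.items())})
# {'a': ['A', 'C'], 'b': ['B']}
print({"".join(rhs): sorted(heads) for rhs, heads in sorted(binary_rules.items())})
# {'AB': ['C', 'S'], 'BA': ['A'], 'BC': ['S'], 'CC': ['B']}
```

Indexing the rules by their right-hand sides fits CYK well, because every step of the algorithm starts from a right-hand side and asks which variables produce it.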
5 x 5 D[s, l]
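(columns: start position s; rows: substring length l)
l \ s | 1      | 2      | 3      | 4      | 5
1     |        |        |        |        |
2     |        |        |        |        |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |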
Computing D[1,1]
The input is x = baaba.
The 1st symbol of the input is b.
Thus, D[1,1] = {A | A → b}, where A is in V.
There is only one production that qualifies: B → b.
So D[1,1] = {B}.
G’s Productions:
1. S → AB | BC
2. A → BA | a
3. B → CC | b
4. C → AB | a
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    |        |        |        |
2     |        |        |        |        |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
Computing D[2,1]
The input is x = baaba.
The 2nd symbol of the input is a.
We compute {A | A → a}, where A is in V.
There are two such productions: A → a and C → a.
So D[2,1] = {A, C}.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} |        |        |
2     |        |        |        |        |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
Computing D[3,1]
The input is x = baaba.
The 3rd symbol of the input is a.
We compute {A | A → a}, where A is in V.
There are two such productions: A → a and C → a.
So D[3,1] = {A, C}.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} | {A, C} |        |
2     |        |        |        |        |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
Computing D[4,1]
The input is x = baaba.
The 4th symbol of the input is b.
Thus, D[4,1] = {A | A → b}, where A is in V.
There is only one production that qualifies: B → b.
So D[4,1] = {B}.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} | {A, C} | {B}    |
2     |        |        |        |        |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
Computing D[5,1]
The input is x = baaba.
The 5th symbol of the input is a.
We compute {A | A → a}, where A is in V.
There are two such productions: A → a and C → a.
So D[5,1] = {A, C}.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} | {A, C} | {B}    | {A, C}
2     |        |        |        |        |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
Computing D[1,2]
We need k such that 1 ≤ k < 2, so k = 1, and we look for productions A → BC where B is in D[1,1] and C is in D[2,1].
Since D[1,1] = {B} and D[2,1] = {A, C}, the possibilities for the right-hand sides are {B} x {A, C} = {BA, BC}.
The rules that match these possibilities are S → BC and A → BA.
So D[1,2] = {S, A}.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} | {A, C} | {B}    | {A, C}
2     | {S, A} |        |        |        |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
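The D[1,2] step, forming the Cartesian product of two cells and matching it against right-hand sides, can be sketched like this. `BINARY_RULES` is an assumed encoding of the productions with two variables on the right, and `combine` is a hypothetical helper name.

```python
from itertools import product

# Assumed encoding: (B, C) -> set of variables A with a production A -> BC.
BINARY_RULES = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}


def combine(left: set, right: set, binary_rules: dict) -> set:
    """Return every variable A with a rule A -> BC, B in left, C in right."""
    result = set()
    for b, c in product(left, right):          # all right-hand side possibilities
        result |= binary_rules.get((b, c), set())
    return result


# D[1,2]: D[1,1] x D[2,1] = {B} x {A, C}
print(sorted(combine({"B"}, {"A", "C"}, BINARY_RULES)))   # ['A', 'S']
```

The same call with other pairs of cells covers every remaining l = 2 entry of the table.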
Computing D[2,2]
We need k such that 1 ≤ k < 2, so k = 1, and we look for rules A → BC, where B is in D[2,1] and C is in D[3,1].
Since D[2,1] = D[3,1] = {A, C}, the right-hand side possibilities are AA, AC, CA, and CC.
There is only one rule that qualifies: B → CC.
So D[2,2] = {B}.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} | {A, C} | {B}    | {A, C}
2     | {S, A} | {B}    |        |        |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
Computing D[3,2]
We look for k such that 1 ≤ k < 2, so k = 1, and for rules of the form A → BC, where B is in D[3,1] and C is in D[4,1].
D[3,1] = {A, C} and D[4,1] = {B}.
So the right-hand side (RHS) possibilities are AB and CB.
The rules whose RHSs match these possibilities are S → AB and C → AB.
So D[3,2] = {S, C}.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} | {A, C} | {B}    | {A, C}
2     | {S, A} | {B}    | {S, C} |        |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
Computing D[4,2]
We look for k such that 1 ≤ k < 2, so k = 1, and for rules of the form A → BC, where B is in D[4,1] and C is in D[5,1].
D[4,1] = {B} and D[5,1] = {A, C}.
So the RHS possibilities are BA and BC.
The rules whose RHSs match these possibilities are S → BC and A → BA.
So D[4,2] = {S, A}.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} | {A, C} | {B}    | {A, C}
2     | {S, A} | {B}    | {S, C} | {S, A} |
3     |        |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
Computing D[1,3]
We look for k such that 1 ≤ k < 3 and for rules of the form A → BC where, for k = 1, B is in D[1,1] and C is in D[2,2] or, for k = 2, B is in D[1,2] and C is in D[3,1].
For k = 1, D[1,1] = {B} and D[2,2] = {B}, so there is only one right-hand side possibility: BB.
The grammar does not have any productions whose right-hand side is BB.
For k = 2, D[1,2] = {S, A} and D[3,1] = {A, C}, so the RHS possibilities are SA, SC, AA, and AC.
The grammar does not have any productions whose RHSs are SA, SC, AA, or AC.
So D[1,3] = { }.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} | {A, C} | {B}    | {A, C}
2     | {S, A} | {B}    | {S, C} | {S, A} |
3     | { }    |        |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
Computing D[2,3]
We look for k such that 1 ≤ k < 3 and for rules of the form A → BC where, for k = 1, B is in D[2,1] and C is in D[3,2] or, for k = 2, B is in D[2,2] and C is in D[4,1].
For k = 1, D[2,1] = {A, C} and D[3,2] = {S, C}.
The RHS possibilities are AS, AC, CS, and CC.
The only rule that matches is B → CC.
For k = 2, D[2,2] = {B} and D[4,1] = {B}.
The only possibility is BB, and no rules match.
So D[2,3] = {B}.
D[s, l] So Far
l \ s | 1      | 2      | 3      | 4      | 5
1     | {B}    | {A, C} | {A, C} | {B}    | {A, C}
2     | {S, A} | {B}    | {S, C} | {S, A} |
3     | { }    | {B}    |        |        |
4     |        |        |        |        |
5     |        |        |        |        |
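For l = 3 there are two values of k, and a cell is the union of what each value contributes. A minimal sketch of that union, assuming the table is a dict keyed by (s, l) and the binary rules are a dict keyed by right-hand side pairs (both encodings, and the name `compute_cell`, are illustrative assumptions):

```python
# Assumed encoding: (B, C) -> set of variables A with a production A -> BC.
BINARY_RULES = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}

# The cells computed so far for x = baaba, keyed by (s, l).
D = {
    (1, 1): {"B"}, (2, 1): {"A", "C"}, (3, 1): {"A", "C"}, (4, 1): {"B"}, (5, 1): {"A", "C"},
    (1, 2): {"S", "A"}, (2, 2): {"B"}, (3, 2): {"S", "C"}, (4, 2): {"S", "A"},
}


def compute_cell(D: dict, s: int, l: int, binary_rules: dict) -> set:
    """Union, over every k with 1 <= k < l, of the heads matching D[s, k] x D[s+k, l-k]."""
    result = set()
    for k in range(1, l):
        for b in D[(s, k)]:
            for c in D[(s + k, l - k)]:
                result |= binary_rules.get((b, c), set())
    return result


print(compute_cell(D, 1, 3, BINARY_RULES))           # set()
print(sorted(compute_cell(D, 2, 3, BINARY_RULES)))   # ['B']
```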
Rest of D[s, l]
l \ s | 1         | 2         | 3         | 4         | 5
1     | {B}       | {A, C}    | {A, C}    | {B}       | {A, C}
2     | {S, A}    | {B}       | {S, C}    | {S, A}    |
3     | { }       | {B}       | {B}       |           |
4     | { }       | {S, A, C} |           |           |
5     | {S, A, C} |           |           |           |
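The remaining cells can be checked mechanically. The sketch below fills the whole table for x = baaba and prints it row by row; the dict encodings of the grammar and the name `cyk_table` are assumptions made for this illustration.

```python
# Assumed encodings of the example grammar.
TERMINAL_RULES = {"a": {"A", "C"}, "b": {"B"}}
BINARY_RULES = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}


def cyk_table(x: str, terminal_rules: dict, binary_rules: dict) -> dict:
    """Return the table D as a dict keyed by 1-based (s, l)."""
    n = len(x)
    # Length-1 substrings come straight from the terminal rules.
    D = {(s, 1): set(terminal_rules.get(x[s - 1], set())) for s in range(1, n + 1)}
    for l in range(2, n + 1):              # substring lengths, shortest first
        for s in range(1, n - l + 2):      # valid start positions
            D[(s, l)] = set()
            for k in range(1, l):          # partition positions
                for b in D[(s, k)]:
                    for c in D[(s + k, l - k)]:
                        D[(s, l)] |= binary_rules.get((b, c), set())
    return D


D = cyk_table("baaba", TERMINAL_RULES, BINARY_RULES)
for l in range(1, 6):
    print(f"l={l}:", [sorted(D[(s, l)]) for s in range(1, 5 - l + 2)])
# l=1: [['B'], ['A', 'C'], ['A', 'C'], ['B'], ['A', 'C']]
# l=2: [['A', 'S'], ['B'], ['C', 'S'], ['A', 'S']]
# l=3: [[], ['B'], ['B']]
# l=4: [[], ['A', 'C', 'S']]
# l=5: [['A', 'C', 'S']]
```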
Is x=baaba Accepted?
Yes, because D[1,5] contains S. This means that $S \Rightarrow^{*} x_{1,5}$. In other words, the substring of x that starts at position 1 and has length 5, which is x itself, is derivable from S.
CYK Algorithm: Pseudocode
// Inputs are a string x such that |x| ≥ 1 and a CNF grammar G with no ε-productions
CYK(String x, CNFGrammar G) {
  create an n x n table D, where n = |x|;
  for s from 1 upto n {
    D[s, 1] = {A | A → a is in G and a = x[s], i.e., a is the s-th symbol of x};
  }
  for l from 2 upto n {             // l iterates over all possible substring lengths
    for s from 1 upto n - l + 1 {   // s iterates over all possible substring starts
      D[s, l] = { };
      for k from 1 upto l - 1 {     // k iterates over all possible partition positions
        D[s, l] = D[s, l] U {A | A → BC is a production in G and B is in D[s, k] and C is in D[s+k, l-k]};
      }
    }
  }
  if ( S is in D[1, n] ) return true;
  else return false;
}
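The pseudocode translates almost line for line into Python. This is a sketch rather than a reference implementation: the grammar is assumed to be given as two dicts (terminal rules and binary rules), the function name `cyk_accepts` is hypothetical, and index 0 of the table is left unused so that D[s][l] matches the 1-based pseudocode.

```python
# Assumed encodings of the example grammar.
TERMINAL_RULES = {"a": {"A", "C"}, "b": {"B"}}
BINARY_RULES = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}


def cyk_accepts(x: str, terminal_rules: dict, binary_rules: dict, start: str = "S") -> bool:
    """Return True iff the start symbol derives x under the given CNF rules."""
    n = len(x)
    if n < 1:
        raise ValueError("this version of CYK requires |x| >= 1")
    # (n + 1) x (n + 1) so that rows and columns 1..n can be used directly.
    D = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for s in range(1, n + 1):
        D[s][1] = set(terminal_rules.get(x[s - 1], set()))
    for l in range(2, n + 1):                  # substring lengths
        for s in range(1, n - l + 2):          # substring starts
            for k in range(1, l):              # partition positions
                for b in D[s][k]:
                    for c in D[s + k][l - k]:
                        D[s][l] |= binary_rules.get((b, c), set())
    return start in D[1][n]


print(cyk_accepts("baaba", TERMINAL_RULES, BINARY_RULES))   # True
print(cyk_accepts("aa", TERMINAL_RULES, BINARY_RULES))      # False
```

Iterating over the pairs actually present in the two cells, and looking the pair up in a dict, is one way to realize the set comprehension in the pseudocode; scanning all productions for each k would work as well.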
How & Why CYK Works
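For a fixed grammar, CYK runs in $O(n^3)$ time, where |x| = n > 0.
Both k and l - k are strictly less than l, so D[s, k] and D[s+k, l-k] are already filled in when D[s, l] is computed.
If we know whether each of the two smaller derivations exists (i.e., $B \Rightarrow^{*} x_{s,k}$ and $C \Rightarrow^{*} x_{s+k,l-k}$), we can determine whether $A \Rightarrow^{*} x_{s,l}$ via a rule A → BC.
When we reach l = n, we can determine whether $S \Rightarrow^{*} x_{1,n}$.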
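The cubic bound can be seen by counting how many (l, s, k) triples the three nested loops visit. The sketch below counts them directly and compares the result with the closed form (n^3 - n) / 6; the function name `split_count` is hypothetical, and the per-triple work is treated as a grammar-dependent constant.

```python
def split_count(n: int) -> int:
    """Count the (l, s, k) triples visited by the three nested CYK loops."""
    return sum(
        1
        for l in range(2, n + 1)        # substring lengths
        for s in range(1, n - l + 2)    # substring starts
        for k in range(1, l)            # partition positions
    )


for n in (5, 10, 20):
    print(n, split_count(n), (n**3 - n) // 6)
# 5 20 20
# 10 165 165
# 20 1330 1330
```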
www.youtube.com/vkedco