Suffix and Factor Automata and Combinatorics on Words
Gabriele Fici
Workshop PRIN 2007–2009, Varese, 5 September 2011
Gabriele Fici Suffix and Factor Automata and Combinatorics on Words
The Suffix Automaton
Definition (A. Blumer et al. 85; M. Crochemore 86)
The Suffix Automaton (SA) of the word w is the minimal deterministic automaton recognizing the suffixes of w.
Example
The SA of w = aabbabb
[Figure: the SA of w = aabbabb. States 0, 1, ..., 7 lie on the path spelling a a b b a b b; the three additional states are 3′, 3′′ and 4′′.]
Algorithmic Construction
The SA allows searching for a pattern v in a text w in time and space O(|v|). Moreover:
Theorem (A. Blumer et al. 85; M. Crochemore 86)
The SA of a word w over a fixed alphabet Σ can be built in time and space O(|w|).
The SA has several applications, for example in
pattern matching
music retrieval
spam detection
search of characteristic expressions in literary works
speech recordings alignment
...
One Way to Build the SA
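Build a naive non-deterministic automaton:
w = aabbabb
[Figure: states 0, 1, ..., 7 in a row, with an edge from state i − 1 to state i labeled by the i-th letter of w (a a b b a b b).]
Determinize by subset construction:
{0, 1, 2, ..., 7} {1, 2, 5} {2} {3} {4} {5} {6} {7}
{3, 6} {4, 7}
{3, 4, 6, 7}
[Figure: the resulting deterministic automaton, whose 11 states are the subsets listed above.]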
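The determinization step can be sketched in a few lines of Python. This is only an illustration, not the linear-time construction of the theorem: it assumes that every state of the naive automaton is initial and that state |w| is the only final state, and the function name is hypothetical.

```python
from collections import deque

def suffix_automaton_by_subsets(w):
    """Determinize the naive NFA for the suffixes of w (assumed layout:
    states 0..len(w), edge i --w[i]--> i+1, every state initial)."""
    n = len(w)
    start = frozenset(range(n + 1))
    states = {start}
    edges = {}  # (subset, letter) -> subset
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for c in sorted(set(w)):
            t = frozenset(i + 1 for i in s if i < n and w[i] == c)
            if not t:
                continue  # no transition on c; the sink state is omitted
            edges[(s, c)] = t
            if t not in states:
                states.add(t)
                queue.append(t)
    return states, edges

states, edges = suffix_automaton_by_subsets("aabbabb")
for s in sorted(states, key=lambda s: (-len(s), sorted(s))):
    print(sorted(s))
print(len(states), "states,", len(edges), "edges")
```

For w = aabbabb the reachable subsets are exactly the eleven sets listed on the slide. The subset construction can take quadratic time in |w|, which is why it is only "one way" to build the SA.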
Ending Positions
We associate to each factor v of w the set of ending positions of v in w. We denote this set by Endset_w(v).
Example
w = a a b b a b b
    1 2 3 4 5 6 7
Endset_w(ba) = {5}, Endset_w(abb) = Endset_w(bb) = {4, 7}.
Define on Fact(w) the equivalence:
u ∼_SA v ⇐⇒ Endset_w(u) = Endset_w(v)
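As a rough illustration, the ending positions and the resulting equivalence classes can be computed by brute force. The helper names below are hypothetical, positions are 1-indexed as on the slide, and the empty word is assumed to end at every position 0, ..., |w|.

```python
def endsets(w):
    """Map each factor v of w to the set of its ending positions (1-indexed)."""
    n = len(w)
    table = {"": frozenset(range(n + 1))}  # assumption: the empty word ends everywhere
    for i in range(n):
        for j in range(i + 1, n + 1):
            v = w[i:j]
            table[v] = table.get(v, frozenset()) | {j}
    return table

def sa_classes(w):
    """Group the factors of w by their Endset; one group per class."""
    classes = {}
    for v, e in endsets(w).items():
        classes.setdefault(e, []).append(v)
    return classes

w = "aabbabb"
E = endsets(w)
print(sorted(E["ba"]), sorted(E["abb"]), sorted(E["bb"]))
print(len(sa_classes(w)), "classes")
```

This enumerates all O(|w|^2) factors, so it is meant for checking small examples only.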
Remark
u ∼_SA v if and only if for any z ∈ Σ* one has
uz ∈ Suff(w) ⇐⇒ vz ∈ Suff(w)
Remark
Fact(w)/∼_SA is in bijection with the set of states of the SA of w.
The Number of States of the SA
The number of states (classes) of the SA of w is therefore
|SA(w)| = |Fact(w)/∼_SA|
The bounds are well known:
|w| + 1 ≤ |SA(w)| ≤ 2|w| − 1
The upper bound is reached for w = ab^{|w|−1}, with a ≠ b.
And for the lower bound?
Problem (J. Berstel and M. Crochemore)
Characterize the language L_SA of words such that |SA(w)| = |w| + 1.
Special Factors
Definition
v is a left special factor of w if there exist letters a ≠ b such that av and bv are factors of w
v is a right special factor of w if there exist letters a ≠ b such that va and vb are factors of w
v is a bispecial factor of w if it is both left and right special
Example
w = aabbabb
ab is left special
b is right special
a and b are bispecial
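The three definitions translate directly into a brute-force check. The sketch below uses hypothetical function names and takes the alphabet to be the set of letters occurring in w.

```python
def factors(w):
    """All factors of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def left_special(w):
    F, A = factors(w), set(w)
    # v is left special if at least two distinct letters a give a factor av
    return {v for v in F if sum((a + v) in F for a in A) >= 2}

def right_special(w):
    F, A = factors(w), set(w)
    return {v for v in F if sum((v + a) in F for a in A) >= 2}

def bispecial(w):
    return left_special(w) & right_special(w)

w = "aabbabb"
print(sorted(left_special(w)))
print(sorted(right_special(w)))
print(sorted(bispecial(w)))
```

With this convention the empty word counts as special as soon as w contains two distinct letters.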
The Number of States of the SA
Theorem (G. Fici 09)
|SA(w)| = |w| + 1 + S^L_w − P_w
S^L_w = number of left special factors of w
P_w = length of the shortest prefix of w which is not left special
Example (w = aabbabb)
[Figure: the SA of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]
S^L_w = 5 since the left special factors of w are ε, a, b, ab, abb
P_w = 2 since a is left special in w but aa is not
|SA(w)| = |w| + 1 + S^L_w − P_w = 7 + 1 + 5 − 2 = 11
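The formula can be compared with a direct count of the classes of ∼_SA. The following is a small sanity check under the same conventions as the example (ε counted among the left special factors); all function names are illustrative.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def sl_and_p(w):
    """Return (S^L_w, P_w) by brute force."""
    F, A = factors(w), set(w)
    is_ls = lambda v: sum((a + v) in F for a in A) >= 2
    sl = sum(1 for v in F if is_ls(v))
    # w itself is never left special, so the search below always succeeds
    p = next(k for k in range(len(w) + 1) if not is_ls(w[:k]))
    return sl, p

def sa_size_formula(w):
    sl, p = sl_and_p(w)
    return len(w) + 1 + sl - p

def sa_size_bruteforce(w):
    """Count the distinct sets of ending positions, plus one for the empty word."""
    ends = {}
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            ends.setdefault(w[i:j], set()).add(j)
    return 1 + len({frozenset(e) for e in ends.values()})

w = "aabbabb"
print(sl_and_p(w), sa_size_formula(w), sa_size_bruteforce(w))
```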
Example
Theorem
|SA(w)| = |w| + 1 + S^L_w − P_w
Corollary (M. Sciortino and L.Q. Zamboni 07; G. Fici 09)
w ∈ L_SA if and only if every left special factor of w is a prefix of w.
If |Σ| = 2, L_SA is the set of finite prefixes of standard Sturmian words, i.e., the set of left special factors of Sturmian words.
Standard Sturmian words
A standard Sturmian word is the cutting sequence of a straight line of irrational slope starting from the origin on the discrete plane.
Lemma
A right infinite binary word w is a standard Sturmian word if and only if the left special factors of w are prefixes of w.
Binary Words
Let f_w denote the factor complexity of w, i.e., the function counting the number of distinct factors of w of each length.
A binary word is a word w such that f_w(1) = 2, i.e., having 2 distinct factors of length 1.
Lemma
Let w be a binary word. Then S^L_w = |w| − H_w.
S^L_w = number of left special factors of w
H_w = length of the shortest unrepeated prefix of w
Binary Words
For binary words we thus have the formula:
|SA(w)| = 2|w| + 1 − (H_w + P_w)
H_w = length of the shortest unrepeated prefix of w
P_w = length of the shortest prefix of w which is not left special
As a corollary, we obtain a new characterization of the set of prefixes of standard Sturmian words:
Corollary
A binary word w is a prefix of a standard Sturmian word if and only if |w| = H_w + P_w.
Example
Example (w = aabbabb)
[Figure: the SA of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]
H_w = 2 since aa occurs only once in w
P_w = 2 since a is left special in w
|SA(w)| = 2 · 7 + 1 − (2 + 2) = 11
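For binary words the two parameters H_w and P_w are easy to compute naively, which gives a quick test of the corollary. The sketch below is an illustration with hypothetical names; it assumes w is non-empty and contains both letters.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def count_occurrences(w, v):
    """Number of (possibly overlapping) occurrences of v in w."""
    return sum(w[i:i + len(v)] == v for i in range(len(w) - len(v) + 1))

def h(w):
    """H_w: length of the shortest prefix of w that occurs only once."""
    return next(k for k in range(1, len(w) + 1)
                if count_occurrences(w, w[:k]) == 1)

def p(w):
    """P_w: length of the shortest prefix of w that is not left special."""
    F, A = factors(w), set(w)
    return next(k for k in range(len(w) + 1)
                if sum((a + w[:k]) in F for a in A) < 2)

def sa_size_binary(w):
    return 2 * len(w) + 1 - (h(w) + p(w))

def is_standard_sturmian_prefix(w):
    """Criterion of the corollary: binary and |w| = H_w + P_w."""
    return len(set(w)) == 2 and len(w) == h(w) + p(w)

print(h("aabbabb"), p("aabbabb"), sa_size_binary("aabbabb"))
print(is_standard_sturmian_prefix("abaabab"))   # a prefix of the Fibonacci word
print(is_standard_sturmian_prefix("aabbabb"))
```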
The Number of Edges
What about the number of edges E_w?
The bounds on E_w are well known:
|w| ≤ E_w ≤ 3|w| − 4
For binary words we have the formula:
Lemma (G. Fici 09)
E_w = |SA(w)| + |G(w)| − 1
G(w) is the union of the sets of bispecial factors and right special prefixes of w.
Example
Example (w = aabbabb)
[Figure: the SA of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]
G(w) = BIS(w) ∪ (Pref(w) ∩ RS(w)) = {ε, a, b} ∪ {ε, a} = {ε, a, b}
E_w = |SA(w)| + |G(w)| − 1 = 11 + 3 − 1 = 13
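The edge count can be checked against a brute-force count. In the sketch below (hypothetical names), states are identified with sets of ending positions and an edge is a pair (class of v, letter c) such that vc is a factor of w.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def endset(w, v):
    """Ending positions (1-indexed) of v in w; the empty word ends at 0..|w|."""
    return frozenset(j for j in range(len(v), len(w) + 1)
                     if w[j - len(v):j] == v)

def sa_states_and_edges(w):
    F = factors(w)
    states = {endset(w, v) for v in F}
    edges = {(endset(w, v[:-1]), v[-1]) for v in F if v}
    return len(states), len(edges)

def g_set(w):
    """G(w): bispecial factors together with right special prefixes of w."""
    F, A = factors(w), set(w)
    is_ls = lambda v: sum((a + v) in F for a in A) >= 2
    is_rs = lambda v: sum((v + a) in F for a in A) >= 2
    return {v for v in F if is_rs(v) and (is_ls(v) or w.startswith(v))}

w = "aabbabb"
n_states, n_edges = sa_states_and_edges(w)
print(n_states, n_edges, sorted(g_set(w)))
print(n_edges == n_states + len(g_set(w)) - 1)
```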
The Class of LSP Words
Theorem
w ∈ L_SA if and only if the left special factors of w are prefixes of w.
Corollary
If |Σ| = 2, then w ∈ L_SA if and only if w is a prefix of a standard Sturmian word.
Corollary
If |Σ| > 2, then:
w is a prefix of a standard episturmian word ⇒ w ∈ L_SA.
w is a prefix of a standard ϑ-episturmian word ⇒ w ∈ L_SA (ϑ being any involutory antimorphism of Σ*, i.e., such that ϑ(uv) = ϑ(v)ϑ(u) and ϑ ∘ ϑ = id).
The Class of LSP Words
Definition
A right infinite word w is LSP if the left special factors of w are prefixes of w.
So, if |Σ| = 2, LSP is the class of standard Sturmian words.
Problem
Characterize the class of LSP words, over an arbitrary fixed alphabet Σ.
Example
Let φ be the morphism defined by a ↦ abc, b ↦ aab, and φ(F) the image of the Fibonacci word F under φ. For each n > 0, φ(F) has one left special factor but more than one right special factor of length n, and φ(F) is LSP.
φ(F) = abcaababcabcaababcaababcabc···
So:
The set of factors of an LSP word is not closed under reversal, in general.
Thus, the class of standard (ϑ-)episturmian words is strictly included in the class of LSP words.
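The displayed prefix of φ(F) is easy to regenerate. The sketch below assumes that F is the Fibonacci word obtained by iterating a ↦ ab, b ↦ a from the letter a; the function names are illustrative.

```python
def fibonacci_word(min_len):
    """A prefix of the Fibonacci word of length at least min_len."""
    s = "a"
    while len(s) < min_len:
        s = "".join("ab" if c == "a" else "a" for c in s)
    return s

def phi(word):
    """Apply the morphism a -> abc, b -> aab letter by letter."""
    image = {"a": "abc", "b": "aab"}
    return "".join(image[c] for c in word)

prefix = phi(fibonacci_word(9)[:9])
print(prefix)

# abc is a factor of phi(F) but its reversal cba is not: the letter c only
# occurs at the end of the block abc, and every block starts with a.
long_prefix = phi(fibonacci_word(50))
print("abc" in long_prefix, "cba" in long_prefix)
```

The last check only inspects a finite prefix, but the comment explains why cb can never occur, which is one concrete witness that the set of factors is not closed under reversal.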
The Factor Automaton
Definition (A. Blumer et al. 85; M. Crochemore 86)
The Factor Automaton (FA) of the word w is the minimal deterministic automaton recognizing the factors of w.
Example
The FA of w = aabbabb
[Figure: the FA of w = aabbabb. States 0, 1, ..., 7 lie on the path spelling a a b b a b b; the only additional state is 3′.]
Comparison Between the SA and the FA
Example (w = aabbabb)
[Figure: the SA of w (states 0 to 7, 3′, 3′′ and 4′′) drawn above the FA of w (states 0 to 7 and 3′).]
States 3 and 3′′ and states 4 and 4′′ have been identified
Future
Definition
The future of v in w is what follows, in w, the occurrences of v:
Fut_w(v) = {z ∈ Σ* : vz ∈ Fact(w)}
Example
w = abbaabab
Fut_w(ba) = {ε, a, ab, aba, abab, b}
Define on Fact(w) the equivalence:
u ∼_FA v ⇐⇒ Fut_w(u) = Fut_w(v)
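The future of a factor can be listed by brute force, which also gives a direct test of the equivalence. The helper names below are hypothetical.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def future(w, v):
    """Fut_w(v): all z such that vz is a factor of w."""
    return {u[len(v):] for u in factors(w) if u.startswith(v)}

def same_fa_class(w, u, v):
    """True when u and v have the same future in w."""
    return future(w, u) == future(w, v)

print(sorted(future("abbaabab", "ba"), key=lambda z: (len(z), z)))
# In aabbabb the factors ab and aab have different ending positions
# but the same future, so the FA identifies them.
print(same_fa_class("aabbabb", "ab", "aab"))
```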
Remark
u ∼_FA v if and only if for any z ∈ Σ* one has
uz ∈ Fact(w) ⇐⇒ vz ∈ Fact(w)
Remark
Fact(w)/∼_FA is in bijection with the set of states of the FA of w.
The Number of States of the FA
The number of states (classes) of the FA of w is therefore
|FA(w)| = |Fact(w)/∼_FA|
The bounds are well known:
|w| + 1 ≤ |FA(w)| ≤ 2|w| − 2
The upper bound is reached for w = ab^{|w|−2}c, with a ≠ b ≠ c.
And for the lower bound?
Problem (J. Berstel and M. Crochemore)
Characterize the language L_FA of words such that |FA(w)| = |w| + 1.
Inclusion
Remark
If u ∼_SA v, then u ∼_FA v.
The converse is not true:
Example
w = abcaca
Fut_w(bc) = Fut_w(c) whilst Endset_w(bc) ≠ Endset_w(c).
Clearly
|FA(w)| ≤ |SA(w)| and so L_SA ⊂ L_FA
The inclusion L_SA ⊂ L_FA is strict:
Example
w = abcc
We have |SA(w)| = 6 > |w| + 1, so w ∉ L_SA
Nevertheless |FA(w)| = 5 = |w| + 1, so w ∈ L_FA
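Both state counts can be obtained by counting equivalence classes directly, which reproduces the numbers of the two examples. This is a brute-force sketch with hypothetical names, suitable for short words only.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def endset(w, v):
    """Ending positions (1-indexed) of v in w."""
    return frozenset(j for j in range(len(v), len(w) + 1)
                     if w[j - len(v):j] == v)

def future(w, v):
    """All z such that vz is a factor of w."""
    return frozenset(u[len(v):] for u in factors(w) if u.startswith(v))

def sa_size(w):
    return len({endset(w, v) for v in factors(w)})

def fa_size(w):
    return len({future(w, v) for v in factors(w)})

for w in ("abcc", "aabbabb"):
    print(w, sa_size(w), fa_size(w))

# bc and c in abcaca: same future, different ending positions
print(future("abcaca", "bc") == future("abcaca", "c"),
      endset("abcaca", "bc") == endset("abcaca", "c"))
```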
The Number of States of the FA
Does a formula for |FA(w)| exist?
Definition (A. Blumer et al. 84)
The stem of w is the shortest non-empty prefix v of the longest repeated suffix k of w such that v appears as a prefix of k preceded by a letter b and all other occurrences of v in w are preceded by a letter a ≠ b, whenever such a prefix exists; otherwise it is undefined.
Example
stem(aabbab) = ab
stem(abacbb) is undefined
Lemma (A. Blumer et al. 84)
The SA-classes that are identified by the FA-equivalence correspond to the prefixes x of the longest repeated suffix of w such that |x| ≥ |stem(w)|.
So we can define a new parameter:
Definition
S^K_w = |stem(w)| if stem(w) is defined, and S^K_w = K_w otherwise
K_w = length of the shortest unrepeated suffix of w
This allows us to derive a formula for |FA(w)|:
Theorem
|FA(w)| = |w| + 1 + S^L_w − P_w + S^K_w − K_w
S^L_w = number of left special factors of w
P_w = length of the shortest prefix of w which is not left special
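The formula can be tried out on small words. The sketch below is one possible reading of the definition of the stem, and this reading is an assumption: "all other occurrences are preceded by a letter a ≠ b" is taken to mean that every other occurrence is preceded by one and the same letter a, so an occurrence at the very beginning of w disqualifies the candidate. All function names are hypothetical, and w is assumed non-empty.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def starts(w, v):
    """Start indices (0-based) of all occurrences of v in w."""
    return [i for i in range(len(w) - len(v) + 1) if w[i:i + len(v)] == v]

def sl_and_p(w):
    F, A = factors(w), set(w)
    is_ls = lambda v: sum((a + v) in F for a in A) >= 2
    return (sum(1 for v in F if is_ls(v)),
            next(k for k in range(len(w) + 1) if not is_ls(w[:k])))

def k_len(w):
    """K_w: length of the shortest suffix of w occurring only once."""
    n = len(w)
    return next(m for m in range(1, n + 1) if len(starts(w, w[n - m:])) == 1)

def stem(w):
    n = len(w)
    k = w[n - k_len(w) + 1:]      # longest repeated suffix of w
    pos = n - len(k)              # start of its occurrence as a suffix
    for m in range(1, len(k) + 1):
        v = k[:m]
        others = [i for i in starts(w, v) if i != pos]
        if 0 in others:
            continue              # an occurrence with no preceding letter
        before = {w[i - 1] for i in others}
        if len(before) == 1 and w[pos - 1] not in before:
            return v
    return None                   # stem undefined

def fa_size_formula(w):
    sl, p = sl_and_p(w)
    K, s = k_len(w), stem(w)
    sk = len(s) if s is not None else K
    return len(w) + 1 + sl - p + sk - K

print(stem("aabbab"), stem("abacbb"))
print(fa_size_formula("aabbabb"), fa_size_formula("abcc"))
```

For w = aabbabb this gives stem(w) = ab and K_w = 4, hence |FA(w)| = 11 + 2 − 4 = 9, in agreement with the two identified pairs of states in the comparison between the SA and the FA.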
The Language LFA
Let k = uv′ be the longest repeated suffix of w, where u is the longest prefix of k that is also a prefix of w. Then v′ is the characteristic suffix of w.
Any word w can be uniquely factorized as w = w′v′.
[Figure: the factorization w = w′v′, where k = uv′ is the longest repeated suffix of w and u is a prefix of w.]
Theorem
The word w ∈ L_FA if and only if its prefix w′ ∈ L_SA.
Examples
Example
Let w = abaacaaa. The longest repeated suffix of w is k = aa, and the longest prefix of aa which is also a prefix of w is u = a. Then v′ = a and w′ = abaacaa. We have w ∉ L_FA, and w′ ∉ L_SA.
Example
Let w = abaababbaa. The longest repeated suffix of w is k = baa. Then w′ = abaabab ∈ L_SA, so w ∈ L_FA.
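The factorization w = w′v′ and both membership tests can be reproduced naively for these two words. The sketch below uses hypothetical names, tests w′ ∈ L_SA through the left-special criterion of the corollary, and tests w ∈ L_FA by counting distinct futures.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def longest_repeated_suffix(w):
    n = len(w)
    for m in range(n - 1, -1, -1):
        s = w[n - m:]
        if sum(w[i:i + m] == s for i in range(n - m + 1)) >= 2:
            return s
    return ""

def split_characteristic(w):
    """Return (w', v') with w = w'v' and v' the characteristic suffix."""
    k = longest_repeated_suffix(w)
    u_len = 0
    while u_len < len(k) and k[u_len] == w[u_len]:
        u_len += 1                # u: longest prefix of k that is a prefix of w
    v1 = k[u_len:]
    return w[:len(w) - len(v1)], v1

def in_lsa(w):
    """Every left special factor of w is a prefix of w."""
    F, A = factors(w), set(w)
    return all(w.startswith(v) for v in F
               if sum((a + v) in F for a in A) >= 2)

def in_lfa(w):
    """|FA(w)| = |w| + 1, counting distinct futures by brute force."""
    F = factors(w)
    futures = {frozenset(u[len(v):] for u in F if u.startswith(v)) for v in F}
    return len(futures) == len(w) + 1

for w in ("abaacaaa", "abaababbaa"):
    w1, v1 = split_characteristic(w)
    print(w, w1, v1, in_lfa(w), in_lsa(w1))
```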
Remarks
Take |Σ| = 2.
Definition
A word w is trapezoidal if, for each n, it has at most n + 1 factors of length n
Definition
A word w is rich if it contains |w| + 1 palindromic factors
We have:
Proposition (A. de Luca 99)
w Sturmian ⇒ w trapezoidal
Proposition (A. de Luca, A. Glen and L.Q. Zamboni 08)
w trapezoidal ⇒ w rich
Example
Let w = abaababbaa. The longest repeated suffix of w is k = baa. Then w′ = abaabab ∈ L_SA, so w ∈ L_FA.
Remarks:
w is not balanced, since aa, bb ∈ Fact(w)
w is not trapezoidal, since it has four factors of length 2
w is not rich, since it contains only 10 = |w| palindromes: ε, a, b, aa, bb, aba, bab, abba, baab, abaaba
Conclusions and Future Work
We gave a characterization of the words in LSA and LFA.
On the agenda:
Investigate LSP words.
Apply an analogous approach to other data structures, e.g. suffix tree, suffix array, etc.
G. Fici (2009). Combinatorics of Finite Words and Suffix Automata. Proc. of the 3rd International Conference on Algebraic Informatics. LNCS 5725: 250–259
G. Fici (2010). Factor Automata and Special Factors. Proc. of the 13th Mons Theoretical Computer Science Days
G. Fici (2011). Special Factors and the Combinatorics of Suffix and Factor Automata. Theoret. Comput. Sci. 412(29): 3604–3615
Thank you!