
Suffix and Factor Automata and Combinatorics on Words

Gabriele Fici

Workshop PRIN 2007–2009, Varese, 5 September 2011

Gabriele Fici Suffix and Factor Automata and Combinatorics on Words


The Suffix Automaton

Definition (A. Blumer et al. 85; M. Crochemore 86). The Suffix Automaton (SA) of the word w is the minimal deterministic automaton recognizing the suffixes of w.

Example: the SA of w = aabbabb.

[Figure: states 0 to 7 along a spine labelled a a b b a b b, plus the additional states 3′, 3′′ and 4′′, with further transitions labelled a and b.]


Algorithmic Construction

Once built, the SA allows one to search for a pattern v in the text w in time and space O(|v|). Moreover:

Theorem (A. Blumer et al. 85; M. Crochemore 86). The SA of a word w over a fixed alphabet Σ can be built in time and space O(|w|).

The SA has several applications, for example in pattern matching, music retrieval, spam detection, the search for characteristic expressions in literary works, the alignment of speech recordings, and more.
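The linear-time bound comes from an online construction that appends one letter at a time. The Python sketch below follows the well-known suffix-link formulation of that construction; it is an illustration, not necessarily the exact presentation of the cited papers, and the names `State`, `build_suffix_automaton` and `is_factor` are hypothetical. Each letter adds one new state and at most one clone, which is what keeps the size linear. Since every path from the initial state spells a factor of w, the same transitions answer factor queries in O(|v|) steps.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    length: int = 0                            # length of the longest word in this class
    link: int = -1                             # suffix link (-1 for the initial state)
    next: dict = field(default_factory=dict)   # outgoing transitions: letter -> state


def build_suffix_automaton(w: str) -> list:
    """Online construction: read the letters of w one at a time."""
    states = [State()]
    last = 0                                   # state of the whole prefix read so far
    for c in w:
        cur = len(states)
        states.append(State(length=states[last].length + 1))
        p = last
        # Add c-transitions from the suffixes of the old prefix that lack one.
        while p != -1 and c not in states[p].next:
            states[p].next[c] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[c]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # The class of q must be split: clone it with a shorter length.
                clone = len(states)
                states.append(State(length=states[p].length + 1,
                                    link=states[q].link,
                                    next=dict(states[q].next)))
                while p != -1 and states[p].next.get(c) == q:
                    states[p].next[c] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states


def is_factor(states: list, v: str) -> bool:
    """Follow the transitions spelling v: O(|v|) steps."""
    s = 0
    for c in v:
        if c not in states[s].next:
            return False
        s = states[s].next[c]
    return True


sa = build_suffix_automaton("aabbabb")
print(len(sa), sum(len(s.next) for s in sa))   # 11 13
```

Recognizing Suff(w) itself would additionally require marking as final the states on the suffix-link path from the last state, which this sketch omits.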


One Way to Build the SA

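Build a naive non-deterministic automaton:

w = aabbabb

[Figure: a path of states 0 to 7 whose transitions read a a b b a b b; every state is initial and state 7 is final, so the automaton accepts exactly the suffixes of w.]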

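Determinize by subset construction. The reachable subsets are:

{0, 1, 2, . . . , 7}, {1, 2, 5}, {2}, {3}, {4}, {5}, {6}, {7}, {3, 6}, {4, 7}, {3, 4, 6, 7}

[Figure: the resulting deterministic automaton, whose states are the subsets above, with transitions labelled a and b.]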
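The naive construction can be sketched directly. In this Python sketch (the function name is hypothetical) the path automaton is never built explicitly: a subset of its states is a set of positions, and reading c from state i leads to state i + 1 exactly when the (i + 1)-th letter of w is c. Each reachable subset is thus the set of positions at which the word read so far can end.

```python
def determinize_suffix_nfa(w: str):
    """Subset construction on the naive NFA for Suff(w).

    NFA: states 0..|w|, a transition i -> i+1 labelled w[i] (0-based string
    index), every state initial, state |w| final.
    Returns (set of reachable subsets, number of transitions).
    """
    n = len(w)
    start = frozenset(range(n + 1))          # all states are initial
    seen = {start}
    stack = [start]
    transitions = 0
    while stack:
        subset = stack.pop()
        for c in sorted(set(w)):
            target = frozenset(i + 1 for i in subset if i < n and w[i] == c)
            if not target:                    # no transition on c from this subset
                continue
            transitions += 1
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen, transitions


states, edges = determinize_suffix_nfa("aabbabb")
for s in sorted(states, key=lambda s: (min(s), len(s))):
    print(sorted(s))
print(len(states), edges)   # 11 13
```

For aabbabb this yields the 11 subsets listed above.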


Ending Positions

We associate to each factor v of w the set of ending positions of v in w. We denote this set by $\mathrm{Endset}_w(v)$.

Example: w = aabbabb, with positions numbered 1 to 7.

$\mathrm{Endset}_w(ba) = \{5\}$, $\mathrm{Endset}_w(abb) = \mathrm{Endset}_w(bb) = \{4, 7\}$.

Define on Fact(w) the equivalence:

$u \sim_{SA} v \iff \mathrm{Endset}_w(u) = \mathrm{Endset}_w(v)$
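The definition translates directly into a brute-force computation. In this Python sketch the helper names are hypothetical, and the convention that the empty word ends at every position 0 to |w| is our assumption; it makes ε form a class of its own, matching the initial state of the SA.

```python
from collections import defaultdict


def endset(w: str, v: str) -> frozenset:
    """Ending positions of v in w, numbering the letters of w from 1.

    Convention (assumed): the empty word ends at every position 0..|w|.
    """
    m = len(v)
    return frozenset(j for j in range(m, len(w) + 1) if w[j - m:j] == v)


def sa_classes(w: str) -> dict:
    """Group the factors of w by Endset, i.e. compute Fact(w)/~SA."""
    factors = {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    classes = defaultdict(set)
    for v in factors:
        classes[endset(w, v)].add(v)
    return dict(classes)


w = "aabbabb"
print(sorted(endset(w, "ba")), sorted(endset(w, "abb")))   # [5] [4, 7]
print(len(sa_classes(w)))                                  # 11
```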


Ending Positions

$u \sim_{SA} v \iff \mathrm{Endset}_w(u) = \mathrm{Endset}_w(v)$

Remark: $u \sim_{SA} v$ if and only if, for every $z \in \Sigma^*$,

$uz \in \mathrm{Suff}(w) \iff vz \in \mathrm{Suff}(w)$

Remark: $\mathrm{Fact}(w)/\!\sim_{SA}$ is in bijection with the set of states of the SA of w.


The Number of States of the SA

The number of states (classes) of the SA of w is therefore

$|SA(w)| = |\mathrm{Fact}(w)/\!\sim_{SA}|$

The bounds are well known (the upper bound for |w| ≥ 2):

$|w| + 1 \le |SA(w)| \le 2|w| - 1$

The upper bound is reached for $w = ab^{|w|-1}$, with $a \neq b$.

What about the lower bound?

Problem (J. Berstel and M. Crochemore). Characterize the language $L_{SA}$ of words w such that $|SA(w)| = |w| + 1$.
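The bounds can be checked by brute force on small words. The following Python sketch (the function name is hypothetical) counts states as distinct Endsets, directly from the definition, and enumerates all binary words of lengths 2 to 7; it only illustrates the bounds on a small sample, it does not prove them.

```python
from itertools import product


def sa_size(w: str) -> int:
    """|SA(w)| = number of distinct Endsets over all factors of w (brute force)."""
    n = len(w)
    ends = set()
    for i in range(n + 1):
        for j in range(i, n + 1):
            m, v = j - i, w[i:j]
            ends.add(frozenset(k for k in range(m, n + 1) if w[k - m:k] == v))
    return len(ends)


for n in range(2, 8):
    sizes = [sa_size("".join(t)) for t in product("ab", repeat=n)]
    # length, minimum, maximum, size for the extremal word a b^(n-1)
    print(n, min(sizes), max(sizes), sa_size("a" + "b" * (n - 1)))
```

Each printed line should read n, n + 1, 2n - 1, 2n - 1: the minimum is attained by $a^n$ and the maximum by $ab^{n-1}$.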


Special Factors

Definition. v is a left special factor of w if there exist letters $a \neq b$ such that av and bv are factors of w.

v is a right special factor of w if there exist letters $a \neq b$ such that va and vb are factors of w.

v is a bispecial factor of w if it is both left and right special.

Example

w = aabbabb

ab is left special (aab and bab are factors).

b is right special (ba and bb are factors).

a and b are bispecial.
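The definitions translate directly into a brute-force check. In this Python sketch (helper names hypothetical) the empty word counts as a factor, so ε is special as soon as w contains two distinct letters; this is consistent with the list of left special factors of aabbabb used in the theorem below (ε, a, b, ab, abb).

```python
def factors(w: str) -> set:
    """All distinct factors of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w: str) -> set:
    """Factors v such that av and bv are factors for two distinct letters a, b."""
    fs, sigma = factors(w), set(w)
    return {v for v in fs if sum(c + v in fs for c in sigma) >= 2}


def right_special(w: str) -> set:
    """Factors v such that va and vb are factors for two distinct letters a, b."""
    fs, sigma = factors(w), set(w)
    return {v for v in fs if sum(v + c in fs for c in sigma) >= 2}


def bispecial(w: str) -> set:
    return left_special(w) & right_special(w)


w = "aabbabb"
key = lambda v: (len(v), v)
print(sorted(left_special(w), key=key))    # ['', 'a', 'b', 'ab', 'abb']
print(sorted(right_special(w), key=key))   # ['', 'a', 'b']
print(sorted(bispecial(w), key=key))       # ['', 'a', 'b']
```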


The Number of States of the SA

Theorem (G. Fici 09)

$|SA(w)| = |w| + 1 + SL_w - P_w$

$SL_w$ = number of left special factors of w

$P_w$ = length of the shortest prefix of w which is not left special

Example (w = aabbabb)

[Figure: the SA of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]

$SL_w = 5$, since the left special factors of w are ε, a, b, ab, abb.

$P_w = 2$, since ε and a are left special in w while aa is not.

$|SA(w)| = |w| + 1 + SL_w - P_w = 7 + 1 + 5 - 2 = 11$
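The theorem can be cross-checked against the brute-force state count given by the definition. A Python sketch (helper names hypothetical); it is polynomial but far from linear, so it is only meant for short words.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w):
    fs, sigma = factors(w), set(w)
    return {v for v in fs if sum(c + v in fs for c in sigma) >= 2}


def p_param(w):
    """P_w: length of the shortest prefix of w that is not left special."""
    ls = left_special(w)
    k = 0
    while w[:k] in ls:      # terminates: w itself is never left special
        k += 1
    return k


def sa_size_formula(w):
    """|SA(w)| = |w| + 1 + SL_w - P_w."""
    return len(w) + 1 + len(left_special(w)) - p_param(w)


def sa_size_bruteforce(w):
    """Number of distinct Endsets of factors of w."""
    n = len(w)
    return len({frozenset(k for k in range(len(v), n + 1) if w[k - len(v):k] == v)
                for v in factors(w)})


print(sa_size_formula("aabbabb"), sa_size_bruteforce("aabbabb"))   # 11 11

# Compare the two counts on every word over {a, b, c} of length 1..6.
for n in range(1, 7):
    for t in product("abc", repeat=n):
        u = "".join(t)
        assert sa_size_formula(u) == sa_size_bruteforce(u)
```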


Example

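Theorem

$|SA(w)| = |w| + 1 + SL_w - P_w$

The left special prefixes of w are exactly its prefixes of length less than $P_w$ (a prefix of a left special factor is left special), so $SL_w \ge P_w$, with equality exactly when every left special factor of w is a prefix of w. Hence:

Corollary (M. Sciortino and L.Q. Zamboni 07; G. Fici 09). $w \in L_{SA}$ if and only if every left special factor of w is a prefix of w.

If |Σ| = 2, $L_{SA}$ is the set of finite prefixes of standard Sturmian words, i.e., the set of left special factors of Sturmian words.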
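The corollary gives a membership test for $L_{SA}$ that needs no automaton at all. A brute-force Python sketch (helper names hypothetical), which also cross-checks the test against the definition $|SA(w)| = |w| + 1$ on all short words over three letters:

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w):
    fs, sigma = factors(w), set(w)
    return {v for v in fs if sum(c + v in fs for c in sigma) >= 2}


def in_lsa(w):
    """Corollary: w is in L_SA iff every left special factor of w is a prefix of w."""
    return all(w.startswith(v) for v in left_special(w))


def sa_size(w):
    """Brute-force |SA(w)|: number of distinct Endsets of factors."""
    n = len(w)
    return len({frozenset(k for k in range(len(v), n + 1) if w[k - len(v):k] == v)
                for v in factors(w)})


print(in_lsa("abaababa"), in_lsa("aabbabb"))   # True False

# The corollary agrees with the definition on all words over {a, b, c} of length 1..6.
assert all(in_lsa(u) == (sa_size(u) == len(u) + 1)
           for n in range(1, 7)
           for u in ("".join(t) for t in product("abc", repeat=n)))
```

Here abaababa is a prefix of the Fibonacci word, while aabbabb has the left special factor b, which is not a prefix.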


Standard Sturmian words

A standard Sturmian word is the cutting sequence of a straight line of irrational slope starting from the origin in the discrete plane.

Lemma. A right infinite aperiodic binary word w is a standard Sturmian word if and only if the left special factors of w are prefixes of w.


Binary Words

Let $f_w$ denote the factor complexity of w, i.e., the function counting the number of distinct factors of w of each length.

A binary word is a word w such that $f_w(1) = 2$, i.e., having 2 distinct factors of length 1.

Lemma

Let w be a binary word. Then $SL_w = |w| - H_w$.

$SL_w$ = number of left special factors of w

$H_w$ = length of the shortest unrepeated prefix of w
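The lemma is easy to test exhaustively on short binary words. A Python sketch (helper names hypothetical); `h_param` counts possibly overlapping occurrences, which is the natural reading of "unrepeated".

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w):
    fs, sigma = factors(w), set(w)
    return {v for v in fs if sum(c + v in fs for c in sigma) >= 2}


def h_param(w):
    """H_w: length of the shortest prefix of w that occurs only once in w."""
    k = 0
    while sum(w.startswith(w[:k], i) for i in range(len(w) - k + 1)) > 1:
        k += 1
    return k


w = "aabbabb"
print(len(left_special(w)), len(w) - h_param(w))   # 5 5

# Check SL_w = |w| - H_w on every binary word (both letters present) of length 2..8.
for n in range(2, 9):
    for t in product("ab", repeat=n):
        u = "".join(t)
        if len(set(u)) == 2:
            assert len(left_special(u)) == n - h_param(u)
```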


Binary Words

Substituting $SL_w = |w| - H_w$ into $|SA(w)| = |w| + 1 + SL_w - P_w$, for binary words we thus have the formula:

$|SA(w)| = 2|w| + 1 - (H_w + P_w)$

$H_w$ = length of the shortest unrepeated prefix of w

$P_w$ = length of the shortest prefix of w which is not left special

As a corollary, we obtain a new characterization of the set of prefixes of standard Sturmian words:

Corollary. A binary word w is a prefix of a standard Sturmian word if and only if $|w| = H_w + P_w$.
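For binary words the corollary reduces to arithmetic on two prefix parameters. A Python sketch (helper names hypothetical), tested on prefixes of the Fibonacci word; generating that word by iterating a ↦ ab, b ↦ a is our choice of example, not taken from the slides.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w):
    fs, sigma = factors(w), set(w)
    return {v for v in fs if sum(c + v in fs for c in sigma) >= 2}


def h_param(w):
    """H_w: length of the shortest prefix of w that occurs only once in w."""
    k = 0
    while sum(w.startswith(w[:k], i) for i in range(len(w) - k + 1)) > 1:
        k += 1
    return k


def p_param(w):
    """P_w: length of the shortest prefix of w that is not left special."""
    ls = left_special(w)
    k = 0
    while w[:k] in ls:
        k += 1
    return k


def fibonacci_word(n):
    """Prefix of length n of the Fibonacci word (fixed point of a -> ab, b -> a)."""
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]


def is_standard_sturmian_prefix(w):
    """Corollary (binary w): prefix of a standard Sturmian word iff |w| = H_w + P_w."""
    return len(w) == h_param(w) + p_param(w)


print(all(is_standard_sturmian_prefix(fibonacci_word(n)) for n in range(2, 40)))  # True
print(is_standard_sturmian_prefix("aabbabb"))                                    # False
```

The Fibonacci word is a standard Sturmian word, so each of its prefixes with both letters should pass; for aabbabb, $H_w + P_w = 4 \neq 7$.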


Example

Example (w = aabbabb)

[Figure: the SA of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]

$H_w = 2$, since a is repeated while aa occurs only once in w.

$P_w = 2$, since ε and a are left special in w while aa is not.

$|SA(w)| = 2 \cdot 7 + 1 - (2 + 2) = 11$


The Number of Edges

What about the number of edges $E_w$ of the SA?

The bounds on $E_w$ are well known (the upper bound for |w| ≥ 3):

$|w| \le E_w \le 3|w| - 4$

For binary words we have the formula:

Lemma (G. Fici 09)

$E_w = |SA(w)| + |G(w)| - 1$

G(w) is the union of the set of bispecial factors and the set of right special prefixes of w.
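The lemma can be checked by brute force: a transition labelled c leaves the class of u exactly when uc is a factor of w. A Python sketch (helper names hypothetical); the formula is stated for binary words only, so the check is restricted to words containing both letters.

```python
from itertools import product


def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def endset(w, v):
    m = len(v)
    return frozenset(j for j in range(m, len(w) + 1) if w[j - m:j] == v)


def left_special(w):
    fs, sigma = factors(w), set(w)
    return {v for v in fs if sum(c + v in fs for c in sigma) >= 2}


def right_special(w):
    fs, sigma = factors(w), set(w)
    return {v for v in fs if sum(v + c in fs for c in sigma) >= 2}


def sa_states_and_edges(w):
    """Brute force: states are Endset classes; a c-transition leaves [u] iff uc is a factor."""
    fs, sigma = factors(w), set(w)
    states = {endset(w, u) for u in fs}
    edges = {(endset(w, u), c) for u in fs for c in sigma if u + c in fs}
    return len(states), len(edges)


def g_set(w):
    """G(w): bispecial factors together with right special prefixes."""
    rs = right_special(w)
    prefixes = {w[:k] for k in range(len(w) + 1)}
    return (left_special(w) & rs) | (prefixes & rs)


def edges_formula(w):
    return sa_states_and_edges(w)[0] + len(g_set(w)) - 1


print(sorted(g_set("aabbabb"), key=lambda v: (len(v), v)), edges_formula("aabbabb"))
# ['', 'a', 'b'] 13

# Compare with the brute-force edge count on binary words of length 2..8.
for n in range(2, 9):
    for t in product("ab", repeat=n):
        u = "".join(t)
        if len(set(u)) == 2:
            assert edges_formula(u) == sa_states_and_edges(u)[1]
```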


Example

Example (w = aabbabb)

[Figure: the SA of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]

$G(w) = BIS(w) \cup (\mathrm{Pref}(w) \cap RS(w)) = \{\varepsilon, a, b\} \cup \{\varepsilon, a\} = \{\varepsilon, a, b\}$

$E_w = |SA(w)| + |G(w)| - 1 = 11 + 3 - 1 = 13$


The Class of LSP Words

Theorem. $w \in L_{SA}$ if and only if the left special factors of w are prefixes of w.

Corollary

If |Σ| = 2, then $w \in L_{SA}$ if and only if w is a prefix of a standard Sturmian word.

Corollary

If |Σ| > 2, then:

w is a prefix of a standard episturmian word ⇒ $w \in L_{SA}$.

w is a prefix of a standard ϑ-episturmian word ⇒ $w \in L_{SA}$ (ϑ being any involutory antimorphism of Σ*, i.e., such that ϑ(uv) = ϑ(v)ϑ(u) and ϑ ∘ ϑ = id).


The Class of LSP Words

Definition. A right infinite word w is LSP if the left special factors of w are prefixes of w.

So, if |Σ| = 2, the aperiodic LSP words are exactly the standard Sturmian words.

Problem. Characterize the class of LSP words over an arbitrary fixed alphabet Σ.


The Class of LSP Words

Example

Let φ be the morphism defined by a ↦ abc, b ↦ aab, and let φ(F) be the image of the Fibonacci word F under φ. For each n > 0, φ(F) has exactly one left special factor of length n, but for some n it has more than one right special factor of length n (a and b are both right special, and so are ab and ca); moreover, φ(F) is LSP.

φ(F) = abcaababcabcaababcaababcabc · · ·

So:

The set of factors of an LSP word is not closed under reversal, in general.

Thus, the class of standard (ϑ-)episturmian words is strictly included in the class of LSP words.
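The example can be explored numerically on a finite prefix of φ(F). This Python sketch (helper names hypothetical) generates the Fibonacci word by iterating a ↦ ab, b ↦ a (our assumption for F, consistent with the prefix of φ(F) shown above), applies φ, and lists the special factors of short lengths. Since it inspects only a finite prefix, it can miss special factors of φ(F) but never report spurious ones.

```python
def fibonacci_word(n):
    """Prefix of length n of the Fibonacci word (fixed point of a -> ab, b -> a)."""
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]


PHI = {"a": "abc", "b": "aab"}          # the morphism phi of the example


def phi(word):
    return "".join(PHI[c] for c in word)


def special_by_length(u, n):
    """Left and right special factors of length n of the finite word u."""
    fs = {u[i:i + n] for i in range(len(u) - n + 1)}
    ext = {u[i:i + n + 1] for i in range(len(u) - n)}   # factors of length n + 1
    sigma = set(u)
    left = {v for v in fs if sum(c + v in ext for c in sigma) >= 2}
    right = {v for v in fs if sum(v + c in ext for c in sigma) >= 2}
    return left, right


u = phi(fibonacci_word(300))             # a finite prefix of phi(F), 900 letters
print(u[:27])                            # abcaababcabcaababcaababcabc
for n in range(1, 6):
    left, right = special_by_length(u, n)
    print(n, sorted(left), sorted(right))
```

On this prefix one should see a single left special factor per length, always a prefix of u, and two right special factors at lengths 1 and 2.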


The Factor Automaton

Definition (A. Blumer et al. 85; M. Crochemore 86). The Factor Automaton (FA) of the word w is the minimal deterministic automaton recognizing the factors of w.

Example: the FA of w = aabbabb.

[Figure: states 0 to 7 along a spine labelled a a b b a b b, plus the additional state 3′, with further transitions labelled a and b.]


Comparison Between the SA and the FA

Example (w = aabbabb)

[Figure: the SA of w (states 0 to 7, 3′, 3′′, 4′′) next to the FA of w (states 0 to 7, 3′).]

States 3 and 3′′, and states 4 and 4′′, have been identified.


Future

Definition. The future of v in w is what follows, in w, the occurrences of v:

$\mathrm{Fut}_w(v) = \{z \in \Sigma^* : vz \in \mathrm{Fact}(w)\}$

Example: w = abbaabab

$\mathrm{Fut}_w(ba) = \{\varepsilon, a, ab, aba, abab, b\}$

Define on Fact(w) the equivalence:

$u \sim_{FA} v \iff \mathrm{Fut}_w(u) = \mathrm{Fut}_w(v)$
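Both the future and the resulting classes can be computed by brute force. In this Python sketch (helper names hypothetical) the future of v is read off after each ending position of v; the number of classes gives the number of states of the FA of w.

```python
def future(w: str, v: str) -> frozenset:
    """Fut_w(v) = {z : vz is a factor of w}, read off after each occurrence of v."""
    m, n = len(v), len(w)
    ends = [j for j in range(m, n + 1) if w[j - m:j] == v]   # ending positions of v
    return frozenset(w[e:f] for e in ends for f in range(e, n + 1))


def fa_classes(w: str) -> dict:
    """Group the factors of w by future, i.e. compute Fact(w)/~FA."""
    classes = {}
    for i in range(len(w) + 1):
        for j in range(i, len(w) + 1):
            v = w[i:j]
            classes.setdefault(future(w, v), set()).add(v)
    return classes


print(sorted(future("abbaabab", "ba"), key=lambda z: (len(z), z)))
# ['', 'a', 'b', 'ab', 'aba', 'abab']
print(len(fa_classes("aabbabb")))   # 9
```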


Future

$u \sim_{FA} v \iff \mathrm{Fut}_w(u) = \mathrm{Fut}_w(v)$

Remark: $u \sim_{FA} v$ if and only if, for every $z \in \Sigma^*$,

$uz \in \mathrm{Fact}(w) \iff vz \in \mathrm{Fact}(w)$

Remark: $\mathrm{Fact}(w)/\!\sim_{FA}$ is in bijection with the set of states of the FA of w.


The Number of States of the FA

The number of states (classes) of the FA of w is therefore

$|FA(w)| = |\mathrm{Fact}(w)/\!\sim_{FA}|$

The bounds are well known (the upper bound for |w| ≥ 3):

$|w| + 1 \le |FA(w)| \le 2|w| - 2$

The upper bound is reached for $w = ab^{|w|-2}c$, with a, b, c pairwise distinct letters.

What about the lower bound?

Problem (J. Berstel and M. Crochemore). Characterize the language $L_{FA}$ of words w such that $|FA(w)| = |w| + 1$.
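As for the SA, the bounds can be checked on small words by brute force. A Python sketch (the function name is hypothetical) that counts distinct futures and enumerates all words over {a, b, c} of lengths 3 to 6; a sanity check on a small sample, not a proof.

```python
from itertools import product


def fa_size(w: str) -> int:
    """|FA(w)| = number of distinct futures of factors of w (brute force)."""
    n = len(w)
    futures = set()
    for i in range(n + 1):
        for j in range(i, n + 1):
            v, m = w[i:j], j - i
            ends = [e for e in range(m, n + 1) if w[e - m:e] == v]
            futures.add(frozenset(w[e:f] for e in ends for f in range(e, n + 1)))
    return len(futures)


for n in range(3, 7):
    sizes = [fa_size("".join(t)) for t in product("abc", repeat=n)]
    # length, minimum, maximum, size for the extremal word a b^(n-2) c
    print(n, min(sizes), max(sizes), fa_size("a" + "b" * (n - 2) + "c"))
```

Each printed line should read n, n + 1, 2n - 2, 2n - 2.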


Inclusion

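Remark. If $u \sim_{SA} v$, then $u \sim_{FA} v$, since $\mathrm{Fut}_w(v)$ is determined by $\mathrm{Endset}_w(v)$.

The converse is not true:

Example

w = abcaca

$\mathrm{Fut}_w(bc) = \mathrm{Fut}_w(c) = \{\varepsilon, a, ac, aca\}$, whereas $\mathrm{Endset}_w(bc) = \{3\} \neq \{3, 5\} = \mathrm{Endset}_w(c)$.

Hence

$|w| + 1 \le |FA(w)| \le |SA(w)|$ for every w, and so $L_{SA} \subseteq L_{FA}$.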
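A short Python check of the example (helper names hypothetical). Note that `future` is computed from `endset` alone, which is exactly why $u \sim_{SA} v$ implies $u \sim_{FA} v$.

```python
def endset(w, v):
    """Ending positions of v in w (letters numbered from 1)."""
    m = len(v)
    return frozenset(j for j in range(m, len(w) + 1) if w[j - m:j] == v)


def future(w, v):
    """Fut_w(v), obtained from the ending positions of v only."""
    return frozenset(w[e:f] for e in endset(w, v) for f in range(e, len(w) + 1))


w = "abcaca"
print(future(w, "bc") == future(w, "c"))                 # True
print(sorted(endset(w, "bc")), sorted(endset(w, "c")))   # [3] [3, 5]
```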


Inclusion

$L_{SA} \subsetneq L_{FA}$: the inclusion is strict.

Example

w = abcc

We have $|SA(w)| = 6 > |w| + 1$, so $w \notin L_{SA}$.

Nevertheless $|FA(w)| = 5 = |w| + 1$, so $w \in L_{FA}$.
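The two counts for w = abcc can be confirmed by brute force, directly from the definitions of the two equivalences (Python sketch; helper names hypothetical):

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def endset(w, v):
    m = len(v)
    return frozenset(j for j in range(m, len(w) + 1) if w[j - m:j] == v)


def sa_size(w):
    """Number of ~SA classes: distinct ending-position sets."""
    return len({endset(w, v) for v in factors(w)})


def fa_size(w):
    """Number of ~FA classes: distinct futures."""
    return len({frozenset(w[e:f] for e in endset(w, v) for f in range(e, len(w) + 1))
                for v in factors(w)})


print(sa_size("abcc"), fa_size("abcc"))   # 6 5
```

The extra SA class comes from c and bc, which end at different positions ({3, 4} and {3}) but have the same future {ε, c}.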


The Number of States of the FA

Does a formula for |FA(w)| exist?

Definition (A. Blumer et al. 84). The stem of w is the shortest non-empty prefix v of the longest repeated suffix k of w such that v appears as a prefix of k preceded by a letter b and all other occurrences of v in w are preceded by a letter $a \neq b$, whenever such a prefix exists; otherwise it is undefined.

Example

stem(aabbab) = ab

stem(abacbb) is undefined
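A Python sketch of the stem (helper names hypothetical). Two points of the definition are read in a specific way here, and both readings are our assumption, chosen because they reproduce the two examples above: "v appears as a prefix of k" refers to the occurrence of k as a suffix of w, and "all other occurrences of v are preceded by a letter a ≠ b" means one and the same letter a for all of them (an occurrence at the very start of w, having no preceding letter, disqualifies v).

```python
def occurrences(w, v):
    """Start indices (0-based) of all occurrences of v in w."""
    return [i for i in range(len(w) - len(v) + 1) if w.startswith(v, i)]


def longest_repeated_suffix(w):
    """Longest suffix of w occurring at least twice in w (possibly empty)."""
    for k in range(len(w), 0, -1):
        if len(occurrences(w, w[-k:])) >= 2:
            return w[-k:]
    return ""


def stem(w):
    """stem(w) under the reading described above, or None if undefined."""
    k = longest_repeated_suffix(w)
    if not k:
        return None
    start = len(w) - len(k)          # where the suffix occurrence of k begins
    b = w[start - 1]                 # letter preceding that occurrence
    for m in range(1, len(k) + 1):
        v = k[:m]
        others = [i for i in occurrences(w, v) if i != start]
        preceding = {w[i - 1] if i > 0 else None for i in others}
        # All other occurrences preceded by one and the same letter a != b.
        if len(preceding) == 1 and None not in preceding and b not in preceding:
            return v
    return None


print(stem("aabbab"), stem("abacbb"))   # ab None
```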


The Number of States of the FA

Lemma (A. Blumer et al. 84). The SA-classes that are identified by the FA-equivalence correspond to the prefixes x of the longest repeated suffix of w such that $|x| \ge |\mathrm{stem}(w)|$.


The Number of States of the FA

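So we can define a new parameter:

Definition

$SK_w = |\mathrm{stem}(w)|$ if stem(w) is defined, and $SK_w = K_w$ otherwise,

where $K_w$ = length of the shortest unrepeated suffix of w.

Since $K_w$ is one more than the length of the longest repeated suffix of w, the number of classes identified in the lemma is exactly $K_w - SK_w$ (zero when the stem is undefined). This allows us to derive a formula for |FA(w)|:

Theorem

$|FA(w)| = |w| + 1 + SL_w - P_w + SK_w - K_w$

$SL_w$ = number of left special factors of w

$P_w$ = length of the shortest prefix of w which is not left special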
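The theorem can be turned into code once the stem is available. In this Python sketch (helper names hypothetical) the stem follows one particular reading of the definition, which is our assumption: the occurrence of v is the one at the start of the suffix occurrence of k, and all other occurrences of v must be preceded by one and the same letter a ≠ b. With that reading the formula reproduces the counts quoted on the previous slides.

```python
def factors(w):
    """All distinct factors of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def left_special(w):
    fs, sigma = factors(w), set(w)
    return {v for v in fs if sum(c + v in fs for c in sigma) >= 2}


def occurrences(w, v):
    """Start indices (0-based) of all occurrences of v in w."""
    return [i for i in range(len(w) - len(v) + 1) if w.startswith(v, i)]


def longest_repeated_suffix(w):
    for k in range(len(w), 0, -1):
        if len(occurrences(w, w[-k:])) >= 2:
            return w[-k:]
    return ""


def stem(w):
    """stem(w) or None, under the reading stated above (assumption)."""
    k = longest_repeated_suffix(w)
    if not k:
        return None
    start = len(w) - len(k)
    b = w[start - 1]
    for m in range(1, len(k) + 1):
        others = [i for i in occurrences(w, k[:m]) if i != start]
        preceding = {w[i - 1] if i > 0 else None for i in others}
        if len(preceding) == 1 and None not in preceding and b not in preceding:
            return k[:m]
    return None


def fa_size_formula(w):
    """|FA(w)| = |w| + 1 + SL_w - P_w + SK_w - K_w, for non-empty w."""
    ls = left_special(w)
    p = 0
    while w[:p] in ls:                            # P_w
        p += 1
    k_w = len(longest_repeated_suffix(w)) + 1     # K_w: shortest unrepeated suffix
    s = stem(w)
    sk = len(s) if s is not None else k_w         # SK_w
    return len(w) + 1 + len(ls) - p + sk - k_w


print(fa_size_formula("aabbabb"), fa_size_formula("abcc"))   # 9 5
```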


The Language LFA

Let k = uv′ be the longest repeated suffix of w, where u is the longest prefix of k that is also a prefix of w. Then v′ is the characteristic suffix of w.

Any word w can thus be uniquely factorized as w = w′v′.

[Figure: k drawn as a suffix of w, with u a prefix of both w and k, so that k = uv′ and w = w′v′.]

Theorem. The word w belongs to $L_{FA}$ if and only if its prefix w′ belongs to $L_{SA}$.
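The factorization and the resulting membership test can be sketched in Python (helper names hypothetical); $L_{SA}$ membership uses the left special characterization stated earlier, and u is computed as the longest common prefix of k and w.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def in_lsa(w):
    """Every left special factor of w is a prefix of w."""
    fs, sigma = factors(w), set(w)
    return all(w.startswith(v) for v in fs if sum(c + v in fs for c in sigma) >= 2)


def longest_repeated_suffix(w):
    for k in range(len(w), 0, -1):
        s = w[-k:]
        if sum(w.startswith(s, i) for i in range(len(w) - k + 1)) >= 2:
            return s
    return ""


def characteristic_factorization(w):
    """Return (w', v') with w = w'v' and v' the characteristic suffix of w."""
    k = longest_repeated_suffix(w)
    u_len = 0
    while u_len < len(k) and k[u_len] == w[u_len]:
        u_len += 1                  # u = longest prefix of k that is also a prefix of w
    v_prime = k[u_len:]
    return w[:len(w) - len(v_prime)], v_prime


def in_lfa(w):
    """Theorem: w is in L_FA iff its prefix w' is in L_SA."""
    return in_lsa(characteristic_factorization(w)[0])


for w in ["abaacaaa", "abaababbaa"]:
    print(w, characteristic_factorization(w), in_lfa(w))
# abaacaaa ('abaacaa', 'a') False
# abaababbaa ('abaabab', 'baa') True
```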


Examples

Example. Let w = abaacaaa. The longest repeated suffix of w is k = aa, and the longest prefix of aa which is also a prefix of w is a. Then v′ = a and w′ = abaacaa. We have $w \notin L_{FA}$ and $w' \notin L_{SA}$ (for instance, aa is left special in w′ but is not a prefix of it).

Example. Let w = abaababbaa. The longest repeated suffix of w is k = baa. Since w begins with a, u is empty, so v′ = baa and w′ = abaabab ∈ $L_{SA}$; hence $w \in L_{FA}$.


Remarks

Take |Σ| = 2.

Definition. A word w is trapezoidal if, for every n ≥ 0, it has at most n + 1 factors of length n.

Definition. A word w is rich if it contains |w| + 1 distinct palindromic factors (counting ε), the maximum possible.

We have:

Proposition (A. de Luca 99). w Sturmian ⇒ w trapezoidal

Proposition (A. de Luca, A. Glen and L.Q. Zamboni 08). w trapezoidal ⇒ w rich


Remarks

Example. Let w = abaababbaa. The longest repeated suffix of w is k = baa. Then w′ = abaabab ∈ $L_{SA}$, so $w \in L_{FA}$.

Remarks:

w is not balanced, since aa, bb ∈ Fact(w).

w is not trapezoidal, since it has four factors of length 2.

w is not rich, since it contains only 10 = |w| palindromes: ε, a, b, aa, bb, aba, bab, abba, baab, abaaba.
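The three remarks can be checked mechanically. A brute-force Python sketch (helper names hypothetical); `is_balanced` assumes the binary alphabet {a, b}, and the Fibonacci prefix in the test is our own contrasting example of a finite Sturmian word, which by the two propositions should be trapezoidal and rich.

```python
def factors(w):
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def is_balanced(w):
    """Binary balance: factors of equal length differ by at most one in their number of a's."""
    fs = factors(w)
    for n in range(1, len(w) + 1):
        counts = {v.count("a") for v in fs if len(v) == n}
        if max(counts) - min(counts) > 1:
            return False
    return True


def is_trapezoidal(w):
    """At most n + 1 distinct factors of each length n."""
    fs = factors(w)
    return all(sum(len(v) == n for v in fs) <= n + 1 for n in range(len(w) + 1))


def palindromes(w):
    return {v for v in factors(w) if v == v[::-1]}


def is_rich(w):
    return len(palindromes(w)) == len(w) + 1


w = "abaababbaa"
print(is_balanced(w), is_trapezoidal(w), is_rich(w))   # False False False
print(len(palindromes(w)))                              # 10
```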


Conclusions and Future Work

We gave a characterization of the words in $L_{SA}$ and $L_{FA}$.

On the agenda:

Investigate LSP words.

Apply an analogous approach to other data structures, e.g., suffix trees, suffix arrays, etc.


G. Fici (2009). Combinatorics of Finite Words and Suffix Automata. Proc. of the 3rd International Conference on Algebraic Informatics, LNCS 5725: 250–259.

G. Fici (2010). Factor Automata and Special Factors. Proc. of the 13th Mons Theoretical Computer Science Days.

G. Fici (2011). Special Factors and the Combinatorics of Suffix and Factor Automata. Theoret. Comput. Sci. 412(29): 3604–3615.


Thank you!
