Number of occurrences of powers in strings

Maxime Crochemore, Szilard Zsolt Fazekas, Costas S. Iliopoulos, Inuka Jayasekera

To cite this version: Maxime Crochemore, Szilard Zsolt Fazekas, Costas S. Iliopoulos, Inuka Jayasekera. Number of occurrences of powers in strings. International Journal of Foundations of Computer Science, World Scientific Publishing, 2010, 21 (4), pp.535–547. <10.1142/S0129054110007416>. <hal-00693448>

HAL Id: hal-00693448
https://hal-upec-upem.archives-ouvertes.fr/hal-00693448
Submitted on 13 Feb 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Number of occurrences of powers in strings

Maxime Crochemore1,2, Szilard Zsolt Fazekas3⋆, Costas Iliopoulos1, Inuka Jayasekera1⋆⋆

1 King's College London, U.K.
2 Université Paris-Est, France
3 Rovira i Virgili University, Tarragona, Spain

Abstract. We show a $\Theta(n \log n)$ bound on the maximal number of occurrences of primitively-rooted $k$-th powers occurring in a string of length $n$ for any integer $k$, $k \ge 2$. We also show a $\Theta(n^2)$ bound on the maximal number of primitively-rooted powers with fractional exponent $e$, $1 < e < 2$, occurring in a string of length $n$. This result holds obviously for their maximal number of occurrences. The first result contrasts with the linear number of occurrences of maximal repetitions of exponent at least 2.

1 Introduction

The subject of this paper is the evaluation of the number of powers in strings. This is one of the most fundamental topics in combinatorics on words, not only for its own combinatorial aspects, considered since the beginning of last century by the precursor A. Thue [18], but also because it is related to lossless text compression, string representation, and analysis of molecular biological sequences, to quote a few applications. These applications often require fast algorithms to locate repetitions because either the amount of data to be treated is huge or their flow is to be analysed on the fly, but their design and complexity analysis depend on the type of repetitions considered and on their bounds.

A repetition is a string composed of the concatenation of several copies of another string whose length is called a period. The exponent of a string is informally the number of copies and is defined as the ratio between the length of the string and its smallest period. This means that the repeated string, called the root, is primitive (it is not itself a nontrivial integer power). We consider two types of strings: integer powers, those having an integer exponent at least 2, and fractional powers, those having a fractional exponent between 1 and 2. For both of them we consider their maximal number in a given string as well as their maximal number of occurrences.

It is known that all the occurrences of integer powers in a string of length $n$ can be computed in time $O(n \log n)$ (see three different methods in [2], [1], and [14]). Indeed, these algorithms are optimal because the number of occurrences of squares (powers of exponent 2) can be of the order of $n \log n$ [2].

⋆ Supported by grant no. AP2004-6969 from the Spanish Ministry of Science and Education. Partially supported by grant no. MTM 63422 from the Ministry of Science and Education of Spain.

⋆⋆ Supported by a DTA Award from EPSRC.

The computation of occurrences of fractional powers with exponent at least 2 was initiated by Main [13], who restricted the question to the detection of their leftmost maximal occurrences only. Eventually the notion of runs (maximal occurrences of fractional powers with exponent at least 2), introduced by Iliopoulos et al. [8] for Fibonacci words, led to a linear-time algorithm for locating all of them on a fixed-sized alphabet. The algorithm, by Kolpakov and Kucherov [9, 10], is an extension of Main's algorithm, but their fundamental contribution is the linear number of runs in a string. They proved that the number of runs in a string of length $n$ is at most $cn$, could not provide any value for the constant $c$, but conjectured that $c = 1$. Rytter [16] proved that $c \le 5$, then $c \le 3.44$ in [17], Puglisi et al. [15] that $c \le 3.48$, Crochemore and Ilie [3] that $c \le 1.6$, and Giraud [7] that $c \le 1.5$. The best value computed so far is $c = 1.029$ [4] (see the Web page http://www.csd.uwo.ca/~ilie/runs.html). Franek et al. showed a lower bound of $0.927\ldots n$ in [6], which was improved to $0.944565n$ by Matsubara et al. in [11] and to $0.944575n$ by Puglisi and Simpson (see the Web page http://www.shino.ecei.tohoku.ac.jp/runs/). These lower bounds also point in the direction of Kolpakov and Kucherov's conjecture.

Runs capture all the repetitions in a string but without discriminating among them according to their exponent. For example, the number of runs is not easily related to the number of occurrences of squares. This is why we consider an orthogonal approach here. We count and bound the maximal number of repetitions having a fixed exponent, either an integer larger than 1 or a fractional number between 1 and 2. We also bound the number of occurrences of these repetitions.

After introducing the notations and basic definitions in the next section, Section 3 deals with fractional powers with exponent between 1 and 2. It is shown that the maximum number of primitively-rooted powers with a given exponent $e$, $1 < e < 2$, in a string can be quadratic, as can, of course, their maximum number of occurrences. In Section 4, we consider primitively-rooted integer powers and show that the maximum number of occurrences of powers of a given exponent $k$, $k \ge 2$, is $\Theta(n \log n)$. This latter result contrasts with the linear number of such powers. We also present an efficient algorithm for constructing the strings in question.

2 Preliminaries

In this section we introduce the notation and recall some basic results that will be used throughout the paper. All results stated in this section are reported from [12]. An alphabet $A$ is a finite non-empty set. We call the elements of $A$ letters. The set of all finite words over $A$ is $A^*$, which is a monoid with concatenation (juxtaposition), where the unit element is $\epsilon$, the empty word, whereas the set of non-empty words is $A^+ = A^* \setminus \{\epsilon\}$. The length of a word $w$ is denoted by $|w|$; $|\epsilon| = 0$. Without loss of generality, we can assume that our alphabet is ordered and hence we have an ordering on words. The one we will use is called the lexicographical order and is defined by the following relation:

$$x \le y \iff \bigl(x \text{ is a prefix of } y \ \text{ or } \ (x = uav \text{ and } y = ubw \text{ and } a < b)\bigr)$$

where $a, b \in A$ and $u, v, w \in A^*$.

For words $u, v, w \in A^*$ with $w = uv$, we say that $u$ is a prefix and $v$ is a suffix of $w$. For a word $w$ and an integer $n \ge 0$, the $n$-th power of $w$ is defined inductively as $w^0 = \epsilon$, $w^n = ww^{n-1}$. Extending this definition we can talk about non-integer powers too. Take $n = \frac{k}{l} > 1$ with $\gcd(k, l) = 1$. We say that a word $w$ is an $n$-power if both of the following conditions apply:

– $|w| = m \cdot k$ for some integer $m > 0$,
– $m \cdot l$ is a period of $w$.

The prefix of length $m \cdot l$ of $w$ is a root of $w$.

When $w \ne \epsilon$, $w^3$ is called a cube, with root $w$. A word $w$ is called primitive if there is no word $u$ and integer $p \ge 2$ such that $w = u^p$. We say that $w'$ is a conjugate of $w$ if there exist $u, v \in A^*$ such that $w = uv$ and $w' = vu$. A Lyndon word is a (primitive) word which is the lexicographically smallest among its conjugates.

Let $uv$ be a primitive word such that $vu$ forms a Lyndon word and $v$ is nonempty. In the cube $(uv)^3$, we call central Lyndon position the position $|uvu|$. For two non-empty words $u$ and $v$ it is known that $uv = vu$ implies $u, v \in z^+$ for some $z \in A^*$, therefore every primitive word has a unique Lyndon position.

If a word $w$ can be written as $w = uv = vz$, for some words $u, v, z \in A^+$, then we say that $w$ is bordered ($v$ is a border of $w$). If a word $w$ is bordered, then there exist $u \in A^+$, $v \in A^*$ such that $w = uvu$, that is, a bordered word $w$ always has a border of length at most half the length of $w$. Moreover, it is easy to see that a bordered word $uvu$ cannot be a Lyndon word, because then either $uuv$ (if $u < v$) or $vuu$ (if $v < u$) is lexicographically smaller than $uvu$.
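The definitions of primitivity, Lyndon conjugate and central Lyndon position can be made concrete with a small sketch. The function names `is_primitive`, `lyndon_split` and `central_lyndon_position` are illustrative choices, not notation from the paper, and the quadratic search over rotations is chosen for readability rather than efficiency.

```python
def is_primitive(w: str) -> bool:
    """True if w is non-empty and not an integer power u^p with p >= 2."""
    # A non-empty word is primitive iff it occurs in ww only as a prefix and as a suffix.
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


def lyndon_split(w: str) -> tuple[str, str]:
    """Split a primitive word w as uv (v non-empty) so that vu is its Lyndon conjugate."""
    if not is_primitive(w):
        raise ValueError("w must be primitive")
    # Index of the lexicographically smallest rotation (unique because w is primitive).
    j = min(range(len(w)), key=lambda i: w[i:] + w[:i])
    return w[:j], w[j:]


def central_lyndon_position(w: str) -> int:
    """Length |uvu| of the prefix of the cube (uv)^3 that ends at its central Lyndon position."""
    u, _ = lyndon_split(w)
    return len(w) + len(u)


u, v = lyndon_split("aba")
print(u, v, central_lyndon_position("aba"))
```

For the root `aba` the split is `u = ab`, `v = a`, so that `vu = aab` is the Lyndon conjugate and the cube `abaabaaba` is cut as `abaab.aaba`.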

3 A bound on repeats with exponent e, 1 < e < 2

In this section, we show that the maximal number of distinct repetitions with exponent $e$, $1 < e < 2$, is bounded from below by a quadratic function of $n$, i.e. it is $\Omega(n^2)$. We do this by looking at the number of such repetitions that can start at a position in words of the form $a^k b a^{\frac{k}{e-1}-1}$, where $k$ is any positive integer such that $c \mid k$, where $e = \frac{c+d}{d}$ and $\gcd(c + d, d) = 1$.

First we consider an example with $e = \frac{3}{2}$ and $k = 9$, i.e. $w = a^9ba^{17}$ (see Fig. 1). At the first position in this word, we can have 5 repetitions of exponent $\frac{3}{2}$, namely $a^9ba^5$, $a^9ba^8$, $a^9ba^{11}$, $a^9ba^{14}$ and $a^9ba^{17}$. Moving on to the second position, we have only 4 repetitions of exponent $\frac{3}{2}$, namely $a^8ba^6$, $a^8ba^9$, $a^8ba^{12}$ and $a^8ba^{15}$. In the third position also, we have the repetitions $a^7ba^7$, $a^7ba^{10}$ and $a^7ba^{13}$. However, now we have one extra repetition as we can also have $a^7ba^4$. It is clear that at every other position in the word, as we get closer to the occurrence of $b$, we have an extra repetition. The numbers of primitively-rooted repetitions of exponent $\frac{3}{2}$ at each position are 5, 4, 4, 3, 3, 2, 2, 1, 1 (see Fig. 1). The total number of repetitions can now be summed up to $((5 \cdot 6)/2) + (((5 - 1) \cdot 5)/2) = 25$. We generalise this example in the next theorem.
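The count of 25 in the example can be checked by brute force. The sketch below follows the definition of Section 2 (a factor of length $m \cdot k$ having period $m \cdot l$, with a primitive root) for $e = k/l = 3/2$; the helper names are illustrative assumptions, and the enumeration is deliberately naive.

```python
from fractions import Fraction


def is_primitive(w: str) -> bool:
    """True if w is non-empty and not an integer power u^p with p >= 2."""
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


def has_period(w: str, p: int) -> bool:
    """True if w[i] == w[i + p] wherever both positions exist."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def primitively_rooted_e_powers(s: str, e: Fraction) -> set[str]:
    """Distinct factors of s that are e-powers (Section 2 definition) with a primitive root."""
    k, l = e.numerator, e.denominator  # e = k/l in lowest terms
    found = set()
    for start in range(len(s)):
        m = 1
        while start + m * k <= len(s):
            w = s[start:start + m * k]
            # The root is the prefix of length m*l; it must be primitive.
            if has_period(w, m * l) and is_primitive(w[:m * l]):
                found.add(w)
            m += 1
    return found


word = "a" * 9 + "b" + "a" * 17
powers = primitively_rooted_e_powers(word, Fraction(3, 2))
print(len(powers))  # 25
```

Factors consisting only of the letter `a` are rejected because their roots `a^{2m}` are not primitive, so only factors containing the single `b` are counted.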

Theorem 1. The maximal number of distinct repetitions of exponent $e$, with $1 < e < 2$, in a word of length $n$ is $\Theta(n^2)$.

Proof. The upper bound is trivial because no factor of the string can be counted twice as an $e$-power for given $e$, so let us turn to proving the lower bound. We shall count the number of repetitions starting at each position in a word. For an exponent $e$, $1 < e < 2$, we consider a word $w$ formed as shown in Fig. 2. Here, $w$ is the concatenation of $a^kb$, the root of a repetition of exponent $e$, and $a^{\frac{k}{e-1}-1}$, where $k$ is any positive integer such that $c \mid k$, where $e = \frac{c+d}{d}$ and $\gcd(c+d, d) = 1$. Since $c \mid k$, the quotient $\frac{k}{e-1} = \frac{kd}{c}$ is an integer. The length of our string is $k \cdot \frac{e}{e-1}$.

Fig. 1. Repetitions of exponent 1.5 in $a^9ba^{17}$

Fig. 2. Structure of word $w$: a block $a^k$, the letter $b$, then a block $a^{\frac{k}{e-1}-1}$

For $e$-powers starting at the first position, the end positions can be $(k+1)e$, $(k+1)e + (c+d)$, $(k+1)e + 2 \cdot (c+d)$, etc. From here we get that the number of $e$-powers starting at the first position is

$$\frac{|w| - (k+1)e}{c+d} + 1 = \frac{k \cdot \frac{e}{e-1} - (k+1)e}{c+d} + 1$$

Substituting $\frac{c+d}{d}$ for $e$ in the formula above we get that the number of $e$-powers starting at the first position is:

$$k \cdot \frac{d-c}{d \cdot c} - \frac{1}{d} + 1$$

This formula proves useful because by substituting $k - i$ for $k$ and taking the integer part of the result (since we are talking about the number of occurrences) we get the number of $e$-powers starting at position $i + 1$. Now let us sum up the number of $e$-power occurrences starting at any one of the first $k$ positions:

$$\sum_{i=1}^{k} \left\lfloor i \cdot \frac{d-c}{d \cdot c} - \frac{1}{d} + 1 \right\rfloor$$

For any positive $x$ its integer part $\lfloor x \rfloor$ is greater than or equal to $x - 1$. As we are trying to give a lower bound on the number of occurrences, it is alright to subtract 1 from the formula instead of taking its integer part:

$$\sum_{i=1}^{k} \left( i \cdot \frac{d-c}{d \cdot c} - \frac{1}{d} \right) = k \cdot (k+1) \cdot \frac{d-c}{2d \cdot c} - \frac{k}{d}$$

This means that the number of $e$-powers in our string is quadratic in $k$. At the same time the length of the string, as we mentioned in the beginning, is $k \cdot \frac{e}{e-1}$, so for a given $e$, the number of $e$-powers in a string of length $n$ is $\Theta(n^2)$. It is easy to see that every occurrence of an $e$-power in this string is unique and this concludes the proof. ⊓⊔
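As a worked instance of the two sums in the proof, the following sketch evaluates them with exact rational arithmetic for the running example $e = 3/2$ (so $c = 1$, $d = 2$) and $k = 9$. The function names are hypothetical; the formulas are the ones stated in the proof.

```python
from fractions import Fraction
from math import floor


def powers_from_formula(k: int, c: int, d: int) -> int:
    """Sum over i = 1..k of floor(i*(d-c)/(d*c) - 1/d + 1), for e = (c+d)/d."""
    return sum(
        floor(Fraction(i * (d - c), d * c) - Fraction(1, d) + 1)
        for i in range(1, k + 1)
    )


def quadratic_lower_bound(k: int, c: int, d: int) -> Fraction:
    """k(k+1)(d-c)/(2dc) - k/d, obtained by replacing each floor(x) with x - 1."""
    return Fraction(k * (k + 1) * (d - c), 2 * d * c) - Fraction(k, d)


print(powers_from_formula(9, 1, 2))    # 25
print(quadratic_lower_bound(9, 1, 2))  # 18
```

The floored sum reproduces the 25 repetitions counted in $a^9ba^{17}$, while the simplified expression gives the smaller value 18, as expected of a lower bound.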

4 A bound on primitively-rooted cubes

After considering powers between 1 and 2, we look at powers greater than 2. First, we show that it is possible to construct strings of length $n$ which have $\Omega(n \log n)$ occurrences of cubes. We can extend the method to all integer powers greater than 2, and this, together with the $O(n \log n)$ upper bound implied by the number of squares (see [2]), leads us to the $\Theta(n \log n)$ bound. Finally, we will prove that the sum of all occurrences of powers at least 2 (including non-integer exponents) is quadratic.

Lemma 1. The maximal number of primitively-rooted cubes in a word of length $n$ is $\Theta(n \log n)$.

Proof. Let us suppose there are two primitively-rooted cubes $(uv)^3$ and $(xy)^3$ in $w$ such that their central Lyndon positions $uvu.vuv$ and $xyx.yxy$ are the same. First let us look at the case where the cubes have to be of different length. Without loss of generality we can assume $|uv| < |xy|$. In this case $vu$ is at the same time a prefix and suffix of $yx$. Hence, $yx$ is bordered and cannot be a Lyndon word, contradicting the assumption that $x.y$ is a Lyndon position. This proves that should there be more cubes which have their central Lyndon position identical, they all have to be of the same length. Naturally, the first and last position of a word cannot be central Lyndon to any cube and this gives us the bound $n - 2$ if we disregard cubes of the same length which have their central Lyndon positions at the same place (see Fig. 3). It is easy to see that, because of the periodicity theorem, the only string of length $n$ for which $n - 2$ different positions are central Lyndon ones to some cube is $a^n$.

Fig. 3. Cubes of word $a^{4k+3}$ (positions $1$, $k+1$, $2k+2$, $3k+3$ and $4k+3$ marked; $4k+1$ cubes)

Now take the word $a^{4k+3}$. According to our previous argument it has at most $4k + 1$ cubes. However, if we change $a$'s into $b$'s at positions $k + 1$, $2k + 2$ and $3k + 3$ we get that the number of primitively-rooted cubes in this word is $4k + 1 - 9 + (k + 1) = 5k - 7$. This is because by introducing each $b$ we lose three cubes but in the end we gain another $k + 1$ cubes of the form $(a^j b a^{k-j})^3$ with $0 \le j \le k$ (see Fig. 4). Note that these latter cubes all have their central Lyndon position after the first $b$ (assuming $a < b$).
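The effect of this first step can be checked by exhaustive counting. The sketch below counts occurrences of primitively-rooted cubes in $a^{4k+3}$ and in $a^kba^kba^kba^k$; the function names are illustrative, and the cubic-time enumeration is only meant for small $k$.

```python
def is_primitive(w: str) -> bool:
    """True if w is non-empty and not an integer power u^p with p >= 2."""
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


def cube_occurrences(s: str) -> int:
    """Number of pairs (start, root length) giving a primitively-rooted cube in s."""
    n = len(s)
    count = 0
    for start in range(n):
        p = 1
        while start + 3 * p <= n:
            root = s[start:start + p]
            if s[start:start + 3 * p] == root * 3 and is_primitive(root):
                count += 1
            p += 1
    return count


for k in (4, 8, 10):
    plain = "a" * (4 * k + 3)
    marked = "b".join(["a" * k] * 4)  # a^k b a^k b a^k b a^k
    print(k, cube_occurrences(plain), cube_occurrences(marked))
```

For $k = 8$ both words contain 33 cubes, which matches the remark in the proof that length 35 is the first at which inserting the $b$'s does not reduce the count.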

Fig. 4. Cubes of word $a^kba^kba^kba^k$ ($k + 1$ cubes added, $3 \cdot 3$ cubes removed)

We introduced three $b$'s in the previous step but of course we can repeat the procedure for the four blocks of $a$'s delimited by these $b$'s and then in turn for the new, smaller blocks of $a$'s that result and so on. In the second step, however, we need to introduce 12 $b$'s, that is, 3 for each of the 4 blocks of $a$'s, not to disrupt the cubes of length $3k + 3$. This way we lose $12 \cdot 3 = 36$ cubes and we gain $(\lfloor (k - 3)/4 \rfloor + 1) \cdot 4$ new ones. Performing the introduction of $b$'s until the number of cubes we lose in a step becomes greater than or equal to the number we gain gives us a string with the maximal possible number of cubes for its length. If $k$ equals $4j$, $4j + 1$ or $4j + 2$ for some $j$ then according to the formula above the number of cubes we gain is $4j$. Note that if $k = 4j + 3$ then the number of cubes we gain in the second step is $4j + 4 = k + 1$, i.e. the same as in the first step. However, together with the delimiting $b$'s introduced before we would get a big cube which is not primitively-rooted anymore, so we need to move the newly introduced $b$'s 1, 2 and 3 positions to the left, respectively. This gives us that in this case too the number of newly formed cubes will be $4j$. The smallest length at which introducing the $b$'s does not induce fewer cubes is 35, that is with $k = 8$. Summarizing the points above we get that for a string of length $n$ the maximum increase in the number of cubes for the $i$-th ($i > 1$) consecutive application of our procedure is:

$$\frac{n-3}{4} - 9 \cdot 4^{i-1}$$

To be able to sum these increases we have to know the number of steps performed. This is given by solving for $i$ the equation:

$$\frac{n-3}{4} = 9 \cdot 4^{i-1}$$

From here we get that the number of steps performed is $\#steps = \lfloor \log_4 \frac{n-3}{9} \rfloor$, where by $\lfloor x \rfloor$ we mean the integer part of $x$. Hence the number of cubes for length $n \ge 39$ is:

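$$n - 2 + 1 + \sum_{i=1}^{\#steps} \left( \frac{n-3}{4} - 9 \cdot 4^{i-1} \right) = n - 1 + \frac{(n-3) \lfloor \log_4 \frac{n-3}{9} \rfloor}{4} - \frac{9 \left( 1 - 4^{\lfloor \log_4 \frac{n-3}{9} \rfloor} \right)}{-3}$$

$$= n + 2 + \frac{(n-3) \lfloor \log_4 \frac{n-3}{9} \rfloor}{4} - 3 \cdot 4^{\lfloor \log_4 \frac{n-3}{9} \rfloor}$$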

The plus one after $n - 2$ comes from the first application of the insertion of $b$'s where we get $(n - 3)/4 + 1$ cubes instead of $(n - 3)/4$. For strings shorter than 39 therefore the count is one less. ⊓⊔

Since the first paragraph of the proof is valid for any integer power, we can extend the proof by giving the construction of the strings that prove the lower bound in general for a string of length $n$ and power $k$ (see Fig. 4).

The algorithm above produces strings which have $O(n \log n)$ occurrences of $k$-th powers. Note that if we perform the procedure the other way around, we only need $O(\log n)$ cycles and we can eliminate the recursion:

Theorem 2. Algorithm ConstructStrings2 (see Fig. 4) produces a string of length $n$ that has $\Omega(n \log n)$ occurrences of primitively-rooted cubes.

Algorithm ConstructStrings1 (n, k)
Input: n ≥ 0, k ≥ 0
Output: A string which proves the lower bound of the number of occurrences of integer powers.
    ℓ = n
    string = a^ℓ
    power(1, ℓ)

    Procedure: power(start, end)
        ℓ = end − start
        if ℓ < k^3 + k^2 + k
            then return
            else string[start + ⌊ℓ/(k + 1)⌋] = b
                 string[start + 2 · ⌊ℓ/(k + 1)⌋] = b
                 . . .
                 string[start + k · ⌊ℓ/(k + 1)⌋] = b
                 for i ← 0 to k
                     power(start + i · ℓ/(k + 1), start + (i + 1) · ℓ/(k + 1))

Algorithm ConstructStrings2 (n, k)
Input: n ≥ 0, k ≥ 0
Output: A string which proves the lower bound of the number of occurrences of integer powers.
    ℓ = n
    while ℓ ≥ k^3 + k^2 + 3k + 2
        do ℓ = (ℓ − k)/(k + 1)
    string = (a^(k^2+1) + b)^k + a^((k+1)·ℓ − k^3 − k)
    delimiter = b
    while length(string) ∗ (k + 1) + k < n
        do string = (string + delimiter)^k + string
           if delimiter = b
               then delimiter = a
               else delimiter = b
           (∗ changing the delimiter is needed to stay primitive ∗)
    string = string + a^(n − length(string))


Proof. Before entering the second while loop, the length of string and the number of $k$-th power occurrences in it are both $c = (k + 1) \cdot \ell + k$. Now we will show by induction on $i$ that after the $i$-th iteration of the second while loop the length of string will be $(k + 1)^i \cdot (c + 1) - 1$ and the number of occurrences of $k$-th powers will be $(k + 1)^i \cdot c + i \cdot (k + 1)^{i-1}(c + 1)$.

Note that if the length of string was $m$ and the number of $k$-th power occurrences was $p$ after the previous cycle, then concatenating $k + 1$ copies of string delimited by $k$ copies of delimiter we get $(k + 1) \cdot p + m + 1$ powers in the new string, which will have length $(k + 1) \cdot m + k$. Therefore, after the first cycle the length of string will be

$$(k + 1) \cdot c + k = (k + 1) \cdot c + (k + 1) - 1 = (k + 1)^1 \cdot (c + 1) - 1$$

At the same time the number of $k$-th powers will be

$$(k + 1) \cdot c + c + 1 = (k + 1)^1 \cdot c + 1 \cdot (k + 1)^0 \cdot (c + 1)$$

so our statement holds for $i = 1$. Now suppose it is true for some $i \ge 1$. From here we get that for $i + 1$ the length of string will be:

$$(k + 1) \cdot \left( (k + 1)^i \cdot (c + 1) - 1 \right) + k = (k + 1)^{i+1} \cdot (c + 1) - 1$$

whereas the number of $k$-th powers is:

$$(k + 1) \cdot \left( (k + 1)^i \cdot c + i \cdot (k + 1)^{i-1} \cdot (c + 1) \right) + \left( (k + 1)^i \cdot (c + 1) - 1 \right) + 1$$

$$= (k + 1)^{i+1} \cdot c + i \cdot (k + 1)^i \cdot (c + 1) + (k + 1)^i \cdot (c + 1)$$

$$= (k + 1)^{i+1} \cdot c + (i + 1) \cdot (k + 1)^i (c + 1)$$

Now let us look at the running time of the algorithm. In the first while loop we divide the actual length by $k + 1$ and we do it until it becomes smaller than $k^3 + k^2 + k$, therefore we perform $O(\log n)$ cycles. The second while loop has the same number of cycles, with one string concatenation performed in each cycle, hence substituting $\log n$ for $i$ in the formula above concludes the proof. ⊓⊔
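The induction in the proof amounts to solving the pair of recurrences $m \mapsto (k+1)m + k$ and $p \mapsto (k+1)p + m + 1$ with $m = p = c$ initially. A short numerical check of the closed forms, with hypothetical function names and arbitrary sample values of $c$ and $k$, might look as follows.

```python
def iterate(c: int, k: int, steps: int) -> tuple[int, int]:
    """Apply m -> (k+1)m + k and p -> (k+1)p + m + 1, starting from m = p = c."""
    m = p = c
    for _ in range(steps):
        # Both right-hand sides use the values from the previous cycle.
        m, p = (k + 1) * m + k, (k + 1) * p + m + 1
    return m, p


def closed_form(c: int, k: int, i: int) -> tuple[int, int]:
    """Length and number of k-th power occurrences after i cycles, as claimed in the proof."""
    length = (k + 1) ** i * (c + 1) - 1
    # The term i * (k+1)^(i-1) * (c+1) vanishes for i = 0; guard avoids a negative exponent.
    extra = i * (k + 1) ** (i - 1) * (c + 1) if i > 0 else 0
    return length, (k + 1) ** i * c + extra


print(iterate(13, 3, 4), closed_form(13, 3, 4))
```

Because the count grows like $i \cdot (k+1)^{i-1}(c+1)$ while the length grows like $(k+1)^i(c+1)$, the ratio of the two is proportional to the number of cycles $i$, which is where the logarithmic factor comes from.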

Corollary 1. In a string of length $n$ the maximal number of primitively-rooted $k$-th powers, for a given integer $k \ge 2$, is $\Theta(n \log n)$.

Proof. We know from [5] that the maximal number of occurrences of primitively-rooted squares in a word of length $n$ is $O(n \log n)$. This implies that the number of primitively-rooted greater integer powers also has an $O(n \log n)$ upper bound, while in Theorem 2 we showed the lower bound $\Omega(n \log n)$. ⊓⊔

Remark 1. The first part of the proof is directly applicable to runs, so we have that in a string of length $n$ the number of runs of length at least $3p - 1$, where $p$ is the (smallest) period of the run, is at most $n - 2$. Unfortunately we cannot apply the proof directly for runs shorter than that because we need the same string on both sides of the central Lyndon position.


We have seen that the number of $k$-th powers for a given $k$ ($\ge 2$) in a string of length $n$ is $\Theta(n \log n)$, but what happens if we sum up the occurrences of $k$-th powers for all $k \ge 2$?

Remark 2. The upper bound of the sum of all occurrences of $k$-th powers with primitive root, where $k \ge 2$, in a word $w$ with $|w| = n$ is $\frac{n \cdot (n-1)}{2}$. Moreover, the bound is sharp.

Proof. First consider the word $a^n$, for some $n > 0$. Clearly, taking any substring $a^k$, with $2 \le k \le n$, we get a $k$-th power, so the number of powers greater than or equal to two is given by the number of contiguous substrings of length at least two, that is $\frac{n \cdot (n-1)}{2}$. Now we will show that this is the upper bound. Let us suppose that any two positions $i$ and $j$ in the string delimit a $k$-th power with $k \ge 2$, just like in the example above. We need to prove that the same string cannot be considered a $k_1$-th power and a $k_2$-th power at the same time, with $k_1, k_2 \ge 2$ and $k_1 \ne k_2$. Suppose the contrary, that is, there are $1 \le m < \ell \le \frac{j-i}{2}$ so that both $m$ and $\ell$ are periods of $w[i, j]$. Since $j - i > m + \ell - \gcd(m, \ell)$, the periodicity lemma tells us that $w[i, j]$ has a period $p$ smaller than $m$ with $p \mid m$ and $p \mid \ell$, and this, in turn, means $w[i, i + \ell]$ is not primitive. ⊓⊔
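The sharpness claim of Remark 2 can be illustrated by counting, for every factor, whether it is an integer power of its primitive root. The sketch below uses the fact that a factor is such a power exactly when its smallest period divides its length at least twice; the function names are illustrative assumptions.

```python
def smallest_period(w: str) -> int:
    """Smallest p >= 1 such that w[i] == w[i + p] wherever both positions exist."""
    return next(
        p for p in range(1, len(w) + 1)
        if all(w[i] == w[i + p] for i in range(len(w) - p))
    )


def integer_power_occurrences(s: str) -> int:
    """Occurrences of factors u^k with u primitive and integer k >= 2."""
    n = len(s)
    count = 0
    for i in range(n):
        for j in range(i + 2, n + 1):  # factors of length at least two
            w = s[i:j]
            p = smallest_period(w)
            if len(w) % p == 0 and len(w) // p >= 2:
                count += 1
    return count


n = 6
print(integer_power_occurrences("a" * n), n * (n - 1) // 2)  # 15 15
print(integer_power_occurrences("abab"))                     # 1
```

Each occurrence is counted once because, as the proof shows, a factor cannot be a primitively-rooted $k_1$-th power and $k_2$-th power for two different exponents.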

Theorem 3. The number of distinct $k$-th powers, for a fixed integer $k \ge 3$, in a string of length $n$ is at most $\frac{n}{k-2}$.

Proof. We will show the upper bound by considering the last occurrences of every $k$-th power. The proof is split into two parts. We will prove the statement for cubes by considering their starting positions, while for higher exponents we will look at their root positions (see below).

Let us start with the case $k = 3$. Suppose the last occurrences of two different cubes $u^3$ and $v^3$ with $|u| < |v|$ start at the same position $i$ in the string. By a simple argument we will arrive at a contradiction by looking at the two cases shown in Figure 5.

First let us look at the case when $|u^3| \le |v^2|$. In this case there is another occurrence of $u^3$ starting at position $i + |v|$, contradicting our assumption of the previous occurrence being the last.

Now we are left to treat the case when $|v^2| < |u^3| < |v^3|$. The overlap between the two cubes in this case is at least $2 \cdot |v|$, which is greater than $|v| + |u|$, and from this, Fine and Wilf's theorem tells us it has a period of length $\gcd(|u|, |v|)$. Therefore, there exists some $w$ such that $u = w^m$ and $v = w^n$, for some integers $m < n$. It is easy to see then that in $w^{3n}$ the last occurrence of $w^{3m}$ starts at position $i + 3 \cdot (n - m) \cdot |w|$, contradicting our assumption again.

Fig. 5. Cubes $u^3$ and $v^3$ beginning at the same position $i$, in the cases $|u^3| < |v^2|$ and $|v^2| < |u^3| < |v^3|$.

We showed that there can be no two different cubes which have their last occurrence starting at the same position. This implies the bound $n$ for higher distinct powers as well. However, we can prove something stronger, as we claim in the theorem. To achieve that result, we will look at root positions. In a power $u^k$ starting at position $i$, with the smallest period of $u$ being $p$, we will call position $i + p$ the second root position, $i + 2p$ the third root position and so on. We show that for the last occurrences of two 4-th powers $u^4$ and $v^4$, with $|u| < |v|$, $u^4$ starting at position $i$ and $v^4$ starting at position $j$, the following positions cannot coincide:

1. the second root position of $u$ and the second root position of $v$:
   – if $3|u| < 2|v|$ then $u^4$ occurs at $i + |v|$, contradiction;
   – if $2|v| \le 3|u|$ then according to Fine and Wilf's theorem $v$ has period $k \cdot p$, for some $k$, and then $u^4$ occurs at $i + p$, contradiction.

2. the second root position of $u$ and the third root position of $v$:
   – if $3|u| \le |v|$ then $u^4$ occurs at $i + |v|$, contradiction;
   – if $|v| < 3|u| \le 2|v|$ then again Fine and Wilf's theorem gives us $u^4$ occurring at $i + p$, contradiction;
   – if $2|v| < 3|u|$ then, similarly as before, $u$ and $v$ are powers of the same word and hence $v^4$ occurs at $j + p$, contradiction;

3. the second root position of $v$ and the third root position of $u$:
   – if $3|u| < |v|$ then $u^4$ occurs at $i + |v|$, contradiction;
   – if $|v| \le 3|u|$ then $u^4$ occurs at $i + p$, contradiction.

4. the third root position of $u$ and the third root position of $v$:
   – if $2|u| \le |v|$ then $u^4$ occurs at $i + |v|$, contradiction;
   – if $|v| < 2|u|$ then $u^4$ occurs at $i + p$.

We can apply the same argument for 5-th powers, looking at the second, third and fourth root positions, and so on for greater powers as well, getting the desired bound.
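The bound of Theorem 3 can be compared against small inputs by enumerating distinct $k$-th powers directly. This is a minimal sketch under the assumption that roots need not be primitive (the proof treats roots of the form $w^m$); `distinct_powers` is a hypothetical helper name, and the check is an illustration on two sample strings, not a proof.

```python
def distinct_powers(s: str, k: int) -> set[str]:
    """All distinct factors of s of the form u^k with u non-empty."""
    n = len(s)
    return {
        s[i:i + k * p]
        for i in range(n)
        for p in range(1, (n - i) // k + 1)  # root lengths that still fit
        if s[i:i + k * p] == s[i:i + p] * k
    }


for s, k in (("a" * 10, 3), ("a" * 10, 4), ("aab" * 4, 3), ("aab" * 4, 4)):
    found = distinct_powers(s, k)
    print(s, k, len(found), len(s) / (k - 2))
```

In $(aab)^4$, for example, the distinct cubes are $(aab)^3$, $(aba)^3$ and $(baa)^3$, well below the bound $n/(k-2) = 12$.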

5 Conclusion

In conclusion, we have proven the following bounds on repetitions in words:

(i) The maximal number of distinct repetitions of exponent $e$, with $1 < e < 2$, in a word of length $n$ is $\Theta(n^2)$.

(ii) The maximal number of primitively-rooted $k$-th powers in a word of length $n$ is $\Omega(n \log n)$.

We have also described an $O(m \log n)$ algorithm which can be used to construct strings to illustrate these bounds. Here $O(m)$ is the time complexity of concatenating two strings of length $n$.

References

1. A. Apostolico and F. P. Preparata. Optimal off-line detection of repetitions in a string. Theoret. Comput. Sci., 22(3):297–315, 1983.

2. M. Crochemore. An optimal algorithm for computing the repetitions in a word. Inf. Process. Lett., 12(5):244–250, 1981.

3. M. Crochemore and L. Ilie. Maximal repetitions in strings. J. Comput. Syst. Sci., 2007. In press.

4. M. Crochemore, L. Ilie, and L. Tinta. Towards a solution to the "runs" conjecture. In P. Ferragina and G. M. Landau, editors, Combinatorial Pattern Matching, LNCS. Springer-Verlag, Berlin, 2008. In press.

5. M. Crochemore and W. Rytter. Squares, cubes and time-space efficient string-searching. Algorithmica, 13(5):405–425, 1995.

6. F. Franek, R. J. Simpson, and W. F. Smyth. The maximum number of runs in a string. In M. M. . K. Park, editor, Proc. 14th Australasian Workshop on Combinatorial Algorithms, pages 26–35, 2003.

7. M. Giraud. Not so many runs in strings. In C. Martin-Vide, editor, 2nd International Conference on Language and Automata Theory and Applications, 2008.

8. C. S. Iliopoulos, D. Moore, and W. F. Smyth. A characterization of the squares in a Fibonacci string. Theoret. Comput. Sci., 172(1–2):281–291, 1997.

9. R. Kolpakov and G. Kucherov. Finding maximal repetitions in a word in linear time. In Proceedings of the 40th IEEE Annual Symposium on Foundations of Computer Science, pages 596–604, New York, 1999. IEEE Computer Society Press.

10. R. Kolpakov and G. Kucherov. On maximal repetitions in words. J. Discret. Algorithms, 1(1):159–186, 2000.

11. K. Kusano, W. Matsubara, A. Ishino, H. Bannai, and A. Shinohara. New lower bounds for the maximum number of runs in a string. CoRR, abs/0804.1214, 2008.

12. M. Lothaire. Applied Combinatorics on Words. Cambridge University Press, Cambridge, UK, 2005.

13. M. G. Main. Detecting leftmost maximal periodicities. Discret. Appl. Math., 25:145–153, 1989.

14. M. G. Main and R. J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. J. Algorithms, 5(3):422–432, 1984.

15. S. J. Puglisi, J. Simpson, and W. F. Smyth. How many runs can a string contain?, 2007. Personal communication, submitted.

16. W. Rytter. The number of runs in a string: Improved analysis of the linear upper bound. In B. Durand and W. Thomas, editors, STACS, volume 3884 of Lecture Notes in Computer Science, pages 184–195. Springer, 2006.

17. W. Rytter. The number of runs in a string. Inf. Comput., 205(9):1459–1469, 2007.

18. A. Thue. Über unendliche Zeichenreihen. Norske Vid. Selsk. Skr. I Math-Nat. Kl., 7:1–22, 1906.