Advances in Applied Mathematics 32 (2004) 485–522
www.elsevier.com/locate/yaama

Uniform words ✩

Arturo Carpi a,b and Aldo de Luca c,d,∗

a Dipartimento di Matematica e Informatica dell’Università di Perugia, Via Vanvitelli 1, 06100 Perugia, Italy
b Istituto di Cibernetica “E. Caianiello” del CNR, 80078 Pozzuoli, Italy
c Dipartimento di Matematica dell’Università di Roma “La Sapienza”, piazzale Aldo Moro 2, 00185 Roma, Italy
d Centro Interdisciplinare “B. Segre”, Accademia dei Lincei, via della Lungara 10, 00100 Roma, Italy

Received 15 July 2002

Abstract

A word $w$ over the alphabet $A$ is called uniform if for any two words $u$ and $v$ of the same length, the numbers of occurrences of $u$ and $v$ in $w$ differ at most by 1. In particular, a uniform word contains as factors all the words of length $\le G_w$, where $G_w$ is the maximal length of a repeated factor of $w$. Some characterizations of uniform words are given. A lower bound for the number of uniform words of length $N$ is determined in some special cases. The main result of the paper is the proof that on each alphabet $A$ there exist uniform words of any length. Moreover, an efficient algorithm to construct for any $N$ a uniform word of length $N$ is given. Finally, we give a characterization of a uniform word of length $N$ as a minimum of two different quasi-order relations defined in $A^N$ and as a maximum of suitable entropy functionals.

© 2003 Elsevier Inc. All rights reserved.

Keywords: Uniform word; De Bruijn word; Majorization relation; Entropy

✩ The work for this paper has been supported by the Italian Ministry of Education under Project COFIN 2001 – Linguaggi Formali e Automi: teoria ed applicazioni.
∗ Corresponding author. E-mail addresses: [email protected] (A. Carpi), [email protected] (A. de Luca).
0196-8858/\$ – see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/S0196-8858(03)00057-5

1. Introduction

Words are sequences of symbols, called letters, on a finite alphabet. The study of combinatorial and structural properties of words is a subject which becomes more and more interesting both from the theoretical and applicative points of view. As regards the applications we mention here, for instance, the problems of ‘data compression’ and ‘pattern matching’, which are of great interest in computer science, and the problem of ‘sequence assembly’, which is a problem of fundamental importance in molecular biology (see, for instance, [5,6,10,12]).

In the analysis of the structure of a word, repetitions play an essential role (see, for instance, [2]). In fact, the existence of repeated factors of quite large length is unavoidable in sufficiently long words over alphabets of small size. Therefore, in combinatorics on words it is very useful for many purposes to deal with the number of occurrences $\|u\|_w$ of any word $u$ in a given word $w$. The word $u$ is a factor of $w$ if $\|u\|_w \ge 1$ and a repeated factor of $w$ if $\|u\|_w > 1$.

The point of view that we follow in this paper is to get information on the structure of a word $w$ by considering some suitable conditions on the number of occurrences in $w$ of any other word $u$. In this framework two notions are very natural and of great interest: fullness and uniformity of a word.

A word is called $n$-full if all words of length $n$ occur in it at least once. In the special case in which any word of length $n$ occurs exactly once, the word is called a de Bruijn word of order $n$. These words are of great interest in different fields and have been widely studied by many authors (cf. [8]).

A word is called $n$-uniform if the difference of the numbers of occurrences in it of any two words of length $n$ is at most 1. A word is called uniform if it is $n$-uniform for all $n \ge 0$. The structure of uniform words of given length $N$ is complex since, as will become clear in the last section of the paper, they are words of maximal ‘entropy’, so that uniform words are in a certain sense the most ‘random’ elements in the population of all words of length $N$.

We observe that the notion of uniform word is quite different from the notion of balanced word (see [14], for example). We recall that a word $w$ is balanced if the difference of the numbers of occurrences of any letter in any two factors of $w$ having the same length is at most 1. The word $abbaa$ is uniform but it is not balanced, whereas $abaab$ is balanced but it is not uniform.
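The difference between the two notions can be checked by brute force on the two example words. The following is a minimal Python sketch, not taken from the paper; the helper names `occ`, `is_n_uniform`, `is_uniform`, and `is_balanced` are illustrative assumptions, and the enumeration is exponential in the word length, so it is only meant for short words.

```python
from itertools import product

def occ(u: str, w: str) -> int:
    """Number of (possibly overlapping) occurrences of u in w."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u)

def is_n_uniform(w: str, n: int, alphabet: str) -> bool:
    """True if the occurrence counts of any two words of length n differ by at most 1."""
    counts = [occ("".join(t), w) for t in product(alphabet, repeat=n)]
    return max(counts) - min(counts) <= 1

def is_uniform(w: str, alphabet: str) -> bool:
    """For n > len(w) every count is 0, so checking n <= len(w) suffices."""
    return all(is_n_uniform(w, n, alphabet) for n in range(len(w) + 1))

def is_balanced(w: str, alphabet: str) -> bool:
    """True if, for each letter, its counts in two factors of equal length differ by at most 1."""
    for n in range(1, len(w) + 1):
        factors = [w[i:i + n] for i in range(len(w) - n + 1)]
        for x in alphabet:
            c = [f.count(x) for f in factors]
            if max(c) - min(c) > 1:
                return False
    return True

print(is_uniform("abbaa", "ab"), is_balanced("abbaa", "ab"))
print(is_uniform("abaab", "ab"), is_balanced("abaab", "ab"))
```

In $abbaa$ the factors $bb$ and $aa$ differ by two in the number of $a$'s, which breaks balance; in $abaab$ the factor $ab$ occurs twice while $bb$ never occurs, which breaks 2-uniformity.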

In Sections 3 and 4 we introduce $n$-full and $n$-uniform words. Several characterizations of $n$-uniform words are given. In particular, it is shown that a uniform word $w$ is $G_w$-full, where $G_w$ is the maximal length of a repeated factor of $w$. Moreover, it is shown that a uniform word of length $d^m + m - 1$ on a $d$-letter alphabet is a de Bruijn word of order $m$. One proves also the existence of arbitrarily long uniform words on any alphabet.

In Section 5 we consider the problem of counting uniform words. As shown in Table 1 in the case of a binary alphabet, the distribution of uniform words is quite irregular, with several points of local maximum and minimum. By using techniques of combinatorics on words, we obtain lower bounds for the number $D_U(d,N)$ of uniform words of length $N$ on a $d$-letter alphabet in the cases $N = d^m$ and $N = d^m + 1$. Moreover, we are able to compute the exact value of $D_U(d,N)$ for infinitely many values of $N$. It is also shown that if $d > 1$, the fraction $D_U(d,N)/d^N$ tends to 0 when $N$ tends to infinity.

In Section 6 we prove that on any alphabet there exists at least one uniform word of each length. The proof is quite complicated and requires several technical lemmas. Moreover, it suggests an efficient procedure to construct for any $N$ a uniform word of length $N$. One derives also a new method for constructing for any $m$ a de Bruijn word of order $m$. Moreover, we prove that $D_U(d,N)$ diverges with $N$.

In Section 7 we give a characterization of a uniform word of length $N$ as a minimum of two different quasi-order relations defined in $A^N$ and as a minimum of some functionals. These quasi-orders are introduced by using the majorization relation defined on suitable vectors containing the numbers of occurrences in any $w \in A^N$ of words of length $\le N$; the functionals are (strictly) Schur-convex functions naturally associated with the majorization orders.

In Section 8 we give an interpretation of these results in terms of ‘entropy,’ ‘repetitivity,’ and ‘recurrence.’ More precisely, we characterize uniform words of given length $N$ as the words of length $N$ which maximize the entropy as well as minimize the repetitivity. Moreover, a uniform word minimizes the recurrence even though the converse is not generally true.

We mention that a short abstract of this paper, without proofs, was presented at the International Conference “Mathematical Logic, Algebra and Set Theory” (Moscow, 2001) dedicated to the 100th anniversary of P.S. Novikov [4].

2. Preliminaries

Let $A$ be a finite non-empty set, or alphabet, and $A^*$ the set of all finite sequences of elements of $A$, including the empty sequence, denoted by $\varepsilon$. The elements of $A$ are usually called letters and those of $A^*$ words. The word $\varepsilon$ is called the empty word. We set $A^+ = A^* \setminus \{\varepsilon\}$.

A word $w \in A^+$ can be written uniquely as a sequence of letters as

\[ w = a_1 a_2 \cdots a_n, \]

with $a_i \in A$, $1 \le i \le n$, $n > 0$. The integer $n$ is called the length of $w$ and denoted by $|w|$. By definition, the length of $\varepsilon$ is equal to 0. For any $n \ge 0$ we set $A^n = \{w \in A^* \mid |w| = n\}$ and $A^{[n]} = \{w \in A^* \mid |w| \le n\}$.

A word $u$ is a factor, or subword, of a word $w$ if there exist words $r$ and $s$ such that $w = rus$. If $w = us$ for some word $s$ (respectively $w = ru$ for some word $r$), then $u$ is called a prefix (respectively a suffix) of $w$. We shall denote by $\mathrm{Fact}(w)$ the set of factors of $w$.

Let $u$ be a factor of the word $w$. A right (respectively left) extension of $u$ in $w$ is any factor of $w$ of the kind $ux$ (respectively $xu$) with $x \in A$.

Two words $u, v \in A^*$ are conjugate if there exist words $r, s \in A^*$ such that $u = rs$ and $v = sr$. If $r, s \in A^+$, then $u$ and $v$ are said to be strictly conjugate. The conjugacy relation is an equivalence relation in $A^*$.

For any word $w$, we shall denote by $[w]$ the conjugacy class of $w$. A conjugacy class is sometimes called a circular word since one can represent the conjugacy class of a word $w$ by disposing the letters of $w$ along a circle in a fixed direction; each word of the class is then obtained by reading $|w|$ consecutive letters on the circle in the fixed direction.

A word is said to be primitive if it is not strictly conjugate to itself. As is well known (see [13] for example), any word conjugate to a primitive word is primitive. Moreover, a word $w$ is primitive if and only if it cannot be written as $w = u^r$ with $u \ne \varepsilon$ and $r > 1$.

Let $w \in A^+$ and $p$ be a positive integer. The word $w$ has period $p$ if it can be written as:

\[ w = us = tu \quad \text{with } u, s, t \in A^* \text{ and } |s| = |t| = p. \]

The word $u$ satisfying the previous equation is also called a border of $w$. The minimal period of a word $w$ will be denoted by $\pi_w$. We remark that the maximal length of a border of $w$ is equal to $|w| - \pi_w$. A word $w$ is said to be unbordered if its only border is the empty word.

Let $u \in \mathrm{Fact}(w)$. Any pair $(\lambda, \mu) \in A^* \times A^*$ such that $w = \lambda u \mu$ is called an occurrence of $u$ in $w$. For any $u \in A^*$, we denote by $\|u\|_w$ the number of all distinct occurrences of $u$ in $w$, i.e.,

\[ \|u\|_w = \mathrm{Card}(\{(\lambda, \mu) \in A^* \times A^* \mid w = \lambda u \mu\}). \]

Trivially, a word $u \in A^*$ is a factor of $w$ if and only if $\|u\|_w > 0$.

We give here two useful lemmas concerning the number of occurrences of words of a fixed length in a given word.

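Lemma 2.1. Let $w$ be a word and $n$ an integer such that $0 \le n \le |w|$. One has

\[ \sum_{u \in A^n} \|u\|_w = |w| - n + 1. \]

Proof. The total number of occurrences of factors of length $n$ in $w$ is given by

\[ \sum_{u \in A^n} \|u\|_w = \mathrm{Card}(\{(\lambda, \mu) \in A^* \times A^* \mid \exists u \in A^n,\ w = \lambda u \mu\}). \]

This number equals the number of prefixes $\lambda$ of $w$ such that $0 \le |\lambda| \le |w| - n$, namely, $|w| - n + 1$. □

Lemma 2.2. Let $w = \lambda s \mu$, with $\lambda, s, \mu \in A^*$, and set $n = |s| + 1$. For all $u \in A^{[n]}$ one has

\[ \|u\|_w = \sum_{v \in uA^* \cap A^n} \|v\|_{\lambda s} + \|u\|_{s\mu}. \]

In particular, for all $u \in A^n$ one has

\[ \|u\|_w = \|u\|_{\lambda s} + \|u\|_{s\mu}. \tag{1} \]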
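Both lemmas are easy to test numerically on short words. The sketch below is an illustration only, under the assumption that words are Python strings over a small alphabet; the function names are hypothetical. It uses the fact that the words of $uA^* \cap A^n$ are exactly the words $ut$ with $|t| = n - |u|$.

```python
from itertools import product

def occ(u, w):
    """Number of occurrences of u in w (overlaps allowed; the empty word occurs len(w) + 1 times)."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u)

def words(alphabet, n):
    """All words of length n over the given alphabet."""
    return ["".join(t) for t in product(alphabet, repeat=n)]

def lemma_2_1_holds(w, alphabet):
    """Check sum_{u in A^n} ||u||_w == |w| - n + 1 for every 0 <= n <= |w|."""
    return all(sum(occ(u, w) for u in words(alphabet, n)) == len(w) - n + 1
               for n in range(len(w) + 1))

def lemma_2_2_holds(lam, s, mu, alphabet):
    """Check Lemma 2.2 for w = lam + s + mu and every u with |u| <= |s| + 1."""
    w, n = lam + s + mu, len(s) + 1
    for k in range(n + 1):
        for u in words(alphabet, k):
            # words of length n having u as a prefix are exactly u + t with |t| = n - k
            extended = sum(occ(u + t, lam + s) for t in words(alphabet, n - k))
            if occ(u, w) != extended + occ(u, s + mu):
                return False
    return True

print(lemma_2_1_holds("aababbba", "ab"))
print(lemma_2_2_holds("aab", "ab", "bba", "ab"))
```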

Proof. We prove the result by induction on the length of $\lambda$. If $|\lambda| = 0$, the result is trivial. Then let us suppose $|\lambda| > 0$. In this case one can write

\[ \lambda = x\lambda', \quad w = xw', \quad w' = \lambda' s \mu, \]

for suitable $x \in A$ and $\lambda' \in A^*$. By the induction hypothesis,

\[ \|u\|_{w'} = \sum_{v \in uA^* \cap A^n} \|v\|_{\lambda' s} + \|u\|_{s\mu}. \]

If $u$ is a prefix of $\lambda s$, then it is also a prefix of $w$ so that

\[ \|u\|_w = \|u\|_{w'} + 1 \quad \text{and} \quad \sum_{v \in uA^* \cap A^n} \|v\|_{\lambda s} = \sum_{v \in uA^* \cap A^n} \|v\|_{\lambda' s} + 1. \]

If, on the contrary, $u$ is not a prefix of $\lambda s$, then

\[ \|u\|_w = \|u\|_{w'} \quad \text{and} \quad \sum_{v \in uA^* \cap A^n} \|v\|_{\lambda s} = \sum_{v \in uA^* \cap A^n} \|v\|_{\lambda' s}. \]

In both cases one reaches the conclusion. □

A factor $u$ of a word $w$ is said to be repeated if $\|u\|_w > 1$. Any unordered pair $(\lambda_1, \mu_1)$, $(\lambda_2, \mu_2)$ of distinct occurrences of a repeated factor $u$ of $w$ is called a repetition of $u$ in $w$. Trivially, the number of all repetitions of a word $u$ in $w$ is given by

\[ \frac{\|u\|_w (\|u\|_w - 1)}{2}. \]

For instance, in the case of the word $w = aabaabbabab$, the numbers of repetitions of the words $a$, $ab$, and $aaa$ are respectively 15, 6, and 0.
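The three values in this example follow directly from the occurrence counts ($a$ occurs 6 times, $ab$ occurs 4 times, $aaa$ never occurs). A short Python sketch that reproduces them, with illustrative function names that are not from the paper:

```python
def occ(u, w):
    """Number of (possibly overlapping) occurrences of u in w."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u)

def repetitions(u, w):
    """Number of unordered pairs of distinct occurrences of u in w."""
    k = occ(u, w)
    return k * (k - 1) // 2

w = "aabaabbabab"
print([repetitions(u, w) for u in ("a", "ab", "aaa")])  # [15, 6, 0]
```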

For any $m \ge 0$ we consider the function $K_m : A^* \to \mathbb{N}$ defined by

\[ K_m(w) = \frac{1}{2} \sum_{u \in A^m} \|u\|_w (\|u\|_w - 1). \tag{2} \]

Thus, for any $w \in A^*$, $K_m(w)$ counts the total number of repetitions in $w$ of the words of length $m$. We call $K_m(w)$ the repetitivity of order $m$ of $w$. We also denote by $K(w)$ the number of all repetitions in $w$, i.e.,

\[ K(w) = \sum_{m=0}^{|w|} K_m(w). \tag{3} \]

We call $K(w)$ the total repetitivity of $w$.

For any word $w$ one can introduce the subword complexity of $w$, which is the map $\lambda_w : \mathbb{N} \to \mathbb{N}$ defined for all $n \ge 0$ by

\[ \lambda_w(n) = \mathrm{Card}(\mathrm{Fact}(w) \cap A^n). \]

In other terms, for any $n$, $\lambda_w(n)$ counts the number of (distinct) factors of $w$ of length $n$.

Let us denote by $G_w$ the maximal length of a repeated factor of a non-empty word $w$. We recall [1] that the subword complexity $\lambda_w$ of any word $w$ is non-decreasing for $0 \le n \le G_w + 1$ and is strictly decreasing for $G_w + 1 \le n \le |w|$, having in this interval

\[ \lambda_w(n + 1) = \lambda_w(n) - 1. \tag{4} \]

Thus, $\lambda_w$ reaches its maximum value at $G_w + 1$. Since $\lambda_w(|w|) = 1$, from Eq. (4) one obtains that

\[ \lambda_w(G_w + 1) = |w| - G_w. \tag{5} \]

We observe that for $0 \le n \le |w|$,

\[ \lambda_w(n) \le \sum_{u \in A^n} \|u\|_w = |w| - n + 1, \]

where the equality holds if and only if $n > G_w$.

From the behavior of $\lambda_w$ one easily derives that for $1 \le n < |w|$, $\lambda_w(n) > 1$ unless $w$ is a power of a letter.
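The shape of $\lambda_w$ described above can be observed on small examples. The following Python sketch is an illustration, not the authors' code; `complexity`, `G`, and `profile_ok` are hypothetical names. It computes $G_w$ from the observation that some factor of length $n$ is repeated exactly when $\lambda_w(n) < |w| - n + 1$.

```python
from itertools import product

def complexity(w, n):
    """Subword complexity: number of distinct factors of w of length n."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})

def G(w):
    """Maximal length of a repeated factor of a non-empty word w.

    Some factor of length n is repeated exactly when there are fewer distinct
    factors of length n than occurrence positions, i.e. complexity < |w| - n + 1.
    """
    return max(n for n in range(len(w)) if complexity(w, n) < len(w) - n + 1)

def profile_ok(w):
    """Check monotonicity up to G_w + 1, Eq. (4) beyond it, and Eq. (5)."""
    g = G(w)
    lam = [complexity(w, n) for n in range(len(w) + 1)]
    rising = all(lam[n] <= lam[n + 1] for n in range(g + 1))
    falling = all(lam[n + 1] == lam[n] - 1 for n in range(g + 1, len(w)))
    return rising and falling and lam[g + 1] == len(w) - g

def binary_words(n):
    """All words of length n over {a, b}."""
    return ["".join(t) for t in product("ab", repeat=n)]

w = "aabaabbabab"
print(G(w), [complexity(w, n) for n in range(len(w) + 1)])
print(all(profile_ok(v) for n in range(1, 9) for v in binary_words(n)))
```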

Lemma 2.3. For any non-empty word $w$ one has

\[ G_w \ge |w| - \pi_w. \]

Proof. The word $w$ has a border of maximal length equal to $|w| - \pi_w$. Since a border of $w$ is a repeated factor of $w$, the result follows. □

Let $w \in A^*$ be a word. For any $u \in \mathrm{Fact}(w)$, the quantity $\|u\|_w - 1$ represents the number of times that $u$ reoccurs in $w$. For any $n \ge 0$ we set

\[ P_n(w) = \sum_{u \in \mathrm{Fact}(w) \cap A^n} (\|u\|_w - 1). \tag{6} \]

We call $P_n(w)$ the recurrence of factors of $w$ of length $n$ or, simply, recurrence of order $n$ of $w$. Moreover, we shall consider the total recurrence of $w$, defined by

\[ P(w) = \sum_{n=0}^{|w|} P_n(w). \tag{7} \]

By Lemma 2.1, one derives that for $0 \le n \le |w|$ one has

\[ P_n(w) = |w| - n + 1 - \lambda_w(n). \tag{8} \]

By summing, one obtains

\[ P(w) = \binom{|w| + 2}{2} - \mathrm{Card}(\mathrm{Fact}(w)). \tag{9} \]
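Equation (9) follows from Eq. (8) because $\sum_{n=0}^{|w|} (|w| - n + 1) = \binom{|w|+2}{2}$. A small Python sketch can confirm both sides agree on a sample word; the function names are illustrative assumptions, and `factors` includes the empty word as in the text.

```python
from math import comb

def occ(u, w):
    """Number of occurrences of u in w (the empty word occurs len(w) + 1 times)."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u)

def factors(w):
    """Set of all factors of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def recurrence(w, n):
    """P_n(w): sum of (occurrences - 1) over the distinct factors of length n."""
    return sum(occ(u, w) - 1 for u in factors(w) if len(u) == n)

def total_recurrence(w):
    """P(w): sum of P_n(w) for 0 <= n <= |w|."""
    return sum(recurrence(w, n) for n in range(len(w) + 1))

w = "aabaabbabab"
print(total_recurrence(w), comb(len(w) + 2, 2) - len(factors(w)))
```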

3. Full and de Bruijn words

In the sequel $A$ will denote a $d$-letter alphabet. Let $w$ be a word on the alphabet $A$ and $n$ a non-negative integer. The word $w$ will be called $n$-full if

\[ A^n \subseteq \mathrm{Fact}(w). \]

We remark that if a word $w$ is $n$-full, then, trivially, $w$ is $m$-full for all $m \le n$. Moreover, if $w \in A^+$ is an $n$-full word, then

\[ n \le G_w + 1. \tag{10} \]

Indeed, if $a \in A$ and $n > 0$, then $a^n$ is a factor of $w$ and therefore $a^{n-1}$ is repeated in $w$. This implies that $G_w \ge n - 1$.

For instance, the word $w = aaababaabbba$ on the alphabet $\{a, b\}$ is 3-full and $G_w = 3$. However, it is not 4-full since, e.g., $a^4$ is not a factor of $w$.

Let $w \in A^*$ be an $m$-full word with $m = G_w + 1$. Then any word of $A^m$ occurs exactly once in $w$, since any factor of $w$ of length $m$ is unrepeated. An $m$-full word $w$ with $m = G_w + 1$ is usually called a (linear) de Bruijn word of order $m$ (cf. [7,8]).

For instance, the words $aabba$ and $aaabbbabaa$ are de Bruijn words on the alphabet $\{a, b\}$ of order 2 and 3, respectively. The word $aaccbcabba$ is a de Bruijn word on the alphabet $\{a, b, c\}$ of order 2.
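The examples of full and de Bruijn words above can be verified mechanically. The following is a minimal Python sketch with hypothetical helper names; it assumes the word is written over the given alphabet and relies on the remark that an $m$-full word with exactly $d^m$ positions for factors of length $m$ contains each of them exactly once.

```python
from itertools import product

def factors_of_length(w, n):
    """Set of distinct factors of w of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def is_full(w, n, alphabet):
    """True if every word of length n over the alphabet is a factor of w."""
    facs = factors_of_length(w, n)
    return all("".join(t) in facs for t in product(alphabet, repeat=n))

def is_de_bruijn(w, m, alphabet):
    """True if every word of length m over the alphabet occurs exactly once in w.

    If w is m-full and has exactly d^m occurrence positions for factors of
    length m, then no factor of length m can occur twice.
    """
    return is_full(w, m, alphabet) and len(w) - m + 1 == len(alphabet) ** m

print(is_full("aaababaabbba", 3, "ab"), is_full("aaababaabbba", 4, "ab"))
print(is_de_bruijn("aabba", 2, "ab"),
      is_de_bruijn("aaabbbabaa", 3, "ab"),
      is_de_bruijn("aaccbcabba", 2, "abc"))
```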

The following lemma was proved in [3]. We report the proof here for the sake of completeness.

Lemma 3.1. A word $w \in A^*$ is a de Bruijn word of order $m$ if and only if $|w| = d^m + m - 1$ and $G_w = m - 1$.

Proof. If $w \in A^*$ is a de Bruijn word of order $m$, then $G_w = m - 1$. Moreover, in view of Lemma 2.1,

\[ d^m = \sum_{u \in A^m} \|u\|_w = |w| - m + 1, \]

so that $|w| = d^m + m - 1$.

Conversely, suppose that $|w| = d^m + m - 1$ and $G_w = m - 1$. Then, by Eq. (5),

\[ \lambda_w(m) = \lambda_w(G_w + 1) = |w| - G_w = d^m. \]

Therefore, $w$ is $m$-full, with $m = G_w + 1$, i.e., $w$ is a de Bruijn word of order $m$. □

Lemma 3.2. Let $w$ be a word of length $|w| = d^m + m - 1$ with $m \ge 1$. Then $w$ is a de Bruijn word of order $m$ if and only if it is $m$-full.

Proof. If $w$ is a de Bruijn word of order $m$, then it is $m$-full. Conversely, if $w$ is an $m$-full word of length $|w| = d^m + m - 1$, then

\[ \lambda_w(m) = d^m = |w| - m + 1, \]

which implies $m > G_w$. Since $w$ is $m$-full, by Eq. (10) one has $m \le G_w + 1$. Thus, $G_w = m - 1$; by Lemma 3.1, $w$ is a de Bruijn word of order $m$. □

For any positive integers $N$ and $d$, we define

\[ G_{N,d} = \min_{v \in A^N} G_v. \]

In the sequel we shall denote $G_{N,d}$ simply by $G_N$ when no confusion arises.

Lemma 3.3. For any positive integer $N$, one has

\[ G_N = \max\{n \in \mathbb{N} \mid d^n + n \le N\}. \]

Proof. Let $w \in A^N$ be a word such that $G_w = G_N$ and set $p = \max\{n \in \mathbb{N} \mid d^n + n \le N\}$. By Lemma 2.1,

\[ \sum_{u \in A^p} \|u\|_w = N - p + 1 > d^p. \]

Therefore, $w$ has a repeated factor of length $p$, which implies $G_N = G_w \ge p$.

By definition of $p$, one has $N \le d^{p+1} + p$, so that we can consider a factor $v$ of length $N$ of a de Bruijn word of order $p + 1$. Since $v$ has no repeated factor of length $p + 1$, one has $G_N \le G_v \le p$. Hence, $G_N = p$. □

By the preceding lemma, one derives immediately the following relation:

\[ d^{G_N} + G_N - 1 < N \le d^{G_N + 1} + G_N. \tag{11} \]
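The closed form of Lemma 3.3 can be compared with the definition of $G_N$ as a minimum over all words of length $N$. The sketch below is illustrative only; `G`, `G_N_formula`, and `G_N_bruteforce` are assumed names, and the brute-force version enumerates all $d^N$ words, so it is practical only for small $N$.

```python
from itertools import product

def G(w):
    """Maximal length of a repeated factor of a non-empty word w."""
    def distinct(n):
        return len({w[i:i + n] for i in range(len(w) - n + 1)})
    return max(n for n in range(len(w)) if distinct(n) < len(w) - n + 1)

def G_N_formula(N, d):
    """max{n >= 0 : d^n + n <= N}, for a positive integer N (Lemma 3.3)."""
    n = 0
    while d ** (n + 1) + (n + 1) <= N:
        n += 1
    return n

def G_N_bruteforce(N, alphabet):
    """min of G_v over all words v of length N (exponential in N)."""
    return min(G("".join(t)) for t in product(alphabet, repeat=N))

for N in range(1, 9):
    print(N, G_N_formula(N, 2), G_N_bruteforce(N, "ab"))
```

Each printed row should show the same value twice, and the value $g$ in row $N$ should satisfy Eq. (11) with $d = 2$.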

Corollary 3.1. Let $d > 1$ and $N$ be a positive integer. If $d^m \le N \le d^m + m - 1$ with $m > 0$, then $G_N = \lfloor \log_d N \rfloor - 1 = m - 1$. If $d^m + m \le N \le d^{m+1} - 1$ with $m > 0$, then $G_N = \lfloor \log_d N \rfloor = m$. In any case, one has

\[ \lfloor \log_d N \rfloor - 1 \le G_N \le \lfloor \log_d N \rfloor. \]

Proof. One has

\[ \max\{n \in \mathbb{N} \mid d^n + n \le N\} = \begin{cases} m - 1 & \text{if } d^m \le N \le d^m + m - 1, \\ m & \text{if } d^m + m \le N \le d^{m+1} - 1. \end{cases} \]

Since $m = \lfloor \log_d N \rfloor$, the first part of the statement follows from the previous lemma. Since for any positive integer $N$ there exists an $m \ge 0$ such that either $d^m \le N \le d^m + m - 1$ or $d^m + m \le N \le d^{m+1} - 1$, the result follows. □

Proposition 3.1. Let $w \in A^N$ be a $G_w$-full word. Then $G_w = G_N$. In particular, if $d > 1$, then

\[ G_w \le \lfloor \log_d N \rfloor. \]

Proof. Since $w$ is $G_w$-full, by Eq. (5) one has

\[ d^{G_w} = \lambda_w(G_w) \le \lambda_w(G_w + 1) = N - G_w. \]

Thus, by Eq. (11) one derives

\[ d^{G_w} + G_w \le N < d^{G_N + 1} + G_N + 1 \]

and, therefore, $G_w < G_N + 1$, which implies $G_w = G_N$. The remaining part of the proof is a consequence of Corollary 3.1. □

The following proposition shows that the words $w$ which are $G_w$-full are the words which minimize the recurrence of order $n$ for all $n > 0$.

Proposition 3.2. A word $w \in A^N$ is $G_w$-full if and only if for all $n \ge 0$,

\[ P_n(w) = \min_{v \in A^N} P_n(v). \tag{12} \]

Proof. Let $w \in A^N$ be a $G_w$-full word. For any $v \in A^N$, one has $\lambda_w(n) = d^n \ge \lambda_v(n)$ for $0 \le n \le G_w$. By Eq. (8), one derives $P_n(w) \le P_n(v)$ for $0 \le n \le G_w$. For $n > G_w$ one has $P_n(w) = 0 \le P_n(v)$, so that Eq. (12) is satisfied.

Conversely, suppose that Eq. (12) is satisfied for all $n > 0$. We first prove that $G_w = G_N$. Indeed, if $v \in A^N$ is a word such that $G_v = G_N$, by Eq. (12) one has

\[ P_{G_N + 1}(w) \le P_{G_N + 1}(v) = 0, \]

so that any factor of $w$ of length $G_N + 1$ is unrepeated. This implies $G_N \ge G_w$. Since $G_w \ge G_N$, it follows that $G_w = G_N$. If $G_w = 0$ the result is trivial, so suppose that $G_w > 0$.

Let $u \in A^*$ be a de Bruijn word of order $G_w = G_N$. By Eq. (11), one has

\[ |u| = d^{G_N} + G_N - 1 < N. \]

Let $u' \in A^N$ be any word of length $N$ having $u$ as a factor. By Eq. (12), one has $P_{G_w}(u') \ge P_{G_w}(w)$ so that by Eq. (8), $\lambda_w(G_w) \ge \lambda_{u'}(G_w)$. Since $u'$ is trivially $G_w$-full, we conclude that $w$ is $G_w$-full. □

Proposition 3.3. Let $w \in A^N$, with $N = d^m + m - 1$, $m \ge 1$. The following conditions are equivalent:

(1) $w$ is a de Bruijn word (of order $m$),
(2) $w$ is $G_w$-full,
(3) $G_w = G_N$.

Proof. (1) $\Rightarrow$ (2), trivial; (2) $\Rightarrow$ (3), by Proposition 3.1.

(3) $\Rightarrow$ (1). If $G_w = G_N$, then by Lemma 3.3 one has $G_w = G_N = m - 1$ so that, by Lemma 3.1, $w$ is a de Bruijn word of order $m$. □

As is well known (see [8]), a de Bruijn word $w$ of order $m$ over a $d$-letter alphabet $A$ has a border of length $m - 1$, which will be denoted by $\beta_w$. This is the longest border of $w$ since any factor of $w$ of length $\ge m$ is unrepeated. This implies that the minimal period $\pi_w$ is equal to $d^m$. Thus, one can write $w = u\beta_w$, with $u$ a primitive word. The following lemma on de Bruijn words will be useful in the sequel.

Lemma 3.4. Let $v \in A^*$ be a de Bruijn word of order $m$ and set $w = vs$, with $s \in A^*$. For all $n \le m$ and all $u \in A^n$, one has

\[ \|u\|_w = d^{m-n} + \|u\|_{\beta_v s}. \]

In particular,

\[ \|u\|_v = d^{m-n} + \|u\|_{\beta_v}. \]

Proof. Let us write $v$ as $v = \lambda\beta_v$ so that $w = \lambda\beta_v s$ and $m = |\beta_v| + 1$. By Lemma 2.2, for any $u \in A^n$ one has

\[ \|u\|_w = \sum_{f \in uA^* \cap A^m} \|f\|_v + \|u\|_{\beta_v s}. \]

Since $v$ is a de Bruijn word of order $m$, one has

\[ \sum_{f \in uA^* \cap A^m} \|f\|_v = \mathrm{Card}(uA^* \cap A^m) = d^{m-n}, \]

from which the conclusion follows. □

It is well known (see [8]) that for any word $v \in A^m$, the number of de Bruijn words of order $m$ on the alphabet $A$ having $v$ as a prefix (or suffix) is given by

\[ D(d,m) = \bigl((d - 1)!\bigr)^{d^{m-1}} \, d^{\,d^{m-1} - m}. \tag{13} \]

Thus, the total number of de Bruijn words of order $m$ on a $d$-letter alphabet is given by

\[ d^m D(d,m). \]
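For small parameters the count $d^m D(d,m)$ can be compared with an exhaustive enumeration. The following Python sketch is an illustration, not the authors' code: `D` evaluates Eq. (13) under the assumption $d \ge 2$, $m \ge 1$, and `count_de_bruijn` (a hypothetical helper) tests every word of length $d^m + m - 1$, which is feasible only for very small $d$ and $m$.

```python
from itertools import product
from math import factorial

def D(d, m):
    """Eq. (13): number of de Bruijn words of order m with a prescribed prefix of length m (d >= 2)."""
    return factorial(d - 1) ** (d ** (m - 1)) * d ** (d ** (m - 1) - m)

def count_de_bruijn(alphabet, m):
    """Count words of length d^m + m - 1 whose d^m factors of length m are pairwise distinct."""
    d = len(alphabet)
    N = d ** m + m - 1
    return sum(1 for t in product(alphabet, repeat=N)
               if len({t[i:i + m] for i in range(d ** m)}) == d ** m)

for alphabet, m in (("ab", 2), ("ab", 3), ("abc", 2)):
    d = len(alphabet)
    print(d, m, d ** m * D(d, m), count_de_bruijn(alphabet, m))
```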

Let $w = u\beta_w$ be a de Bruijn word of order $m$ over the alphabet $A$. The conjugacy class $[u]$ of $u$ will be called a circular de Bruijn word of order $m$. This definition is motivated by the fact that by reading $d^m + m - 1$ consecutive letters on the circular word $[u]$, independently of the starting point, one obtains a de Bruijn word (cf. [8], for example). More formally, one can state the following lemma, a proof of which we report for the sake of completeness.

Lemma 3.5. Let $\alpha$ be a circular de Bruijn word of order $m$, $v$ an element of $\alpha$, and $v'$ the prefix of $v$ of length $m - 1$. Then the word $vv'$ is a de Bruijn word of order $m$.

Proof. Let $w = u\beta_w$ be a de Bruijn word of order $m$ such that $u \in \alpha$. By Lemma 3.2, $w$ is $m$-full. Since $u$ and $v$ are conjugate, $w$ is a factor of $vvv'$. Hence, $vvv'$ is $m$-full. Clearly, any factor of length $m$ of $vvv'$ is a factor of $vv'$, so that $vv'$ is $m$-full. Thus, since $|vv'| = |w|$, by Lemma 3.2 one derives that $vv'$ is a de Bruijn word of order $m$. □

One can easily realize that the number of circular de Bruijn words of order $m$ on the alphabet $A$ is equal to the number of de Bruijn words of order $m$ starting with any fixed word of $A^m$, i.e., $D(d,m)$.

4. Uniform words

Let $n$ be a non-negative integer. A word $w$ over a $d$-letter alphabet $A$ will be called $n$-uniform if for all $u, v \in A^n$ one has

\[ \bigl| \|u\|_w - \|v\|_w \bigr| \le 1. \]

For instance, on the alphabet $\{a, b\}$, the word $aaababbbb$ is 1-uniform but it is not 2-uniform, whereas the word $babbaabb$ is 2-uniform but is not 1-uniform. On a one-letter alphabet, any word is trivially $n$-uniform for all $n$.
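Listing the occurrence counts makes the two examples transparent. The following Python sketch is illustrative; `occurrence_table` and `is_n_uniform` are assumed names, and the word is assumed to contain only letters of the given alphabet.

```python
from itertools import product

def occurrence_table(w, n, alphabet):
    """Map every word of length n over the alphabet to its number of occurrences in w."""
    table = {"".join(t): 0 for t in product(alphabet, repeat=n)}
    for i in range(len(w) - n + 1):
        table[w[i:i + n]] += 1
    return table

def is_n_uniform(w, n, alphabet):
    """True if any two entries of the occurrence table differ by at most 1."""
    counts = occurrence_table(w, n, alphabet).values()
    return max(counts) - min(counts) <= 1

for w in ("aaababbbb", "babbaabb"):
    print(w, occurrence_table(w, 1, "ab"), occurrence_table(w, 2, "ab"))
```

For $aaababbbb$ the letter counts differ by one, but $bb$ occurs three times and $ba$ only once; for $babbaabb$ every word of length 2 occurs once or twice, while $b$ occurs two more times than $a$.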

We remark that any word $w \in A^*$ is trivially $n$-uniform for all $n > G_w$. Indeed, any word of length $n$ occurs at most once in $w$, so that $\bigl| \|u\|_w - \|v\|_w \bigr| \le 1$ for all $u, v \in A^n$.

We observe that if $\varphi : A^* \to A^*$ is an automorphism or an antiautomorphism of $A^*$, then for all $u, w \in A^*$ one has

\[ \|\varphi(u)\|_{\varphi(w)} = \|u\|_w. \]

From this, one easily derives that a word $w$ is $n$-uniform (respectively $n$-full) if and only if $\varphi(w)$ is $n$-uniform (respectively $n$-full).

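Lemma 4.1. A word $w \in A^N$ is $n$-uniform if and only if for all $u \in A^n$ one has

\[ \left\lfloor \frac{N - n + 1}{d^n} \right\rfloor \le \|u\|_w \le \left\lceil \frac{N - n + 1}{d^n} \right\rceil. \tag{14} \]

Proof. If $n > N$, the result is trivial since $\lceil (N - n + 1)/d^n \rceil = 0$. Thus, we assume $0 \le n \le N$. If Eq. (14) holds for all $u \in A^n$, then for all $u, v \in A^n$ one has

\[ \bigl| \|u\|_w - \|v\|_w \bigr| \le \left\lceil \frac{N - n + 1}{d^n} \right\rceil - \left\lfloor \frac{N - n + 1}{d^n} \right\rfloor \le 1, \]

so that $w$ is $n$-uniform.

Conversely, if $w$ is $n$-uniform, then by Lemma 2.1 for all $u \in A^n$ one has

\[ \left| \|u\|_w - \frac{N - n + 1}{d^n} \right| = \frac{1}{d^n} \left| d^n \|u\|_w - \sum_{v \in A^n} \|v\|_w \right| \le \frac{1}{d^n} \sum_{v \in A^n \setminus \{u\}} \bigl| \|u\|_w - \|v\|_w \bigr| \le \frac{d^n - 1}{d^n} < 1, \]

from which Eq. (14) follows. □

Proposition 4.1. If $w$ is $n$-uniform with $n \le G_w$, then $w$ is $n$-full.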

Proof. Since $n \le G_w$, there exists a repeated factor $v$ of $w$ of length $|v| = n$. As $w$ is $n$-uniform, for all $u \in A^n$ one has

\[ \|u\|_w \ge \|v\|_w - 1 \ge 1. \]

Therefore, $A^n \subseteq \mathrm{Fact}(w)$. □

The following two propositions concern some interesting relations existing between de Bruijn words of order $m$ and $n$-uniform words with $n \le m$.

Proposition 4.2. Let $v \in A^*$ be a de Bruijn word of order $m$ and $\beta_v$ be its longest border. For any $n \le m$, the word $w = vs$ is $n$-uniform if and only if $\beta_v s$ is $n$-uniform. In particular, for any $n \le m$, $v$ is $n$-uniform if and only if $\beta_v$ is $n$-uniform.

Proof. By Lemma 3.4, if $n \le m$ for all $u, u' \in A^n$ one has

\[ \bigl| \|u\|_w - \|u'\|_w \bigr| = \bigl| \|u\|_{\beta_v s} - \|u'\|_{\beta_v s} \bigr|, \]

so that the conclusion follows. □

We notice that any de Bruijn word $v$ of order $m$ is $(m - 1)$-uniform. Indeed, $\beta_v$ is trivially $(m - 1)$-uniform since $|\beta_v| = m - 1$, and the conclusion follows by Proposition 4.2.

Proposition 4.3. Let $w \in A^*$ be a word of length $N = d^m + m - 1$, with $m > 1$, and write $w = vx$ with $x \in A$. The word $w$ is a de Bruijn word of order $m$ if and only if the following conditions are satisfied:

(1) $v \in A^{m-2} x A^*$,
(2) $G_v \le m - 1$,
(3) $v$ is $(m - 1)$-uniform.

Proof. If $d = 1$ the result is trivial. Let us suppose $d > 1$. If $w = vx$ is a de Bruijn word of order $m$, then $x$ is the last letter of $\beta_w$. Since $\beta_w$ is a prefix of $v$, Condition (1) is satisfied. By Lemma 3.1 one has $G_w = m - 1$ so that Condition (2) is satisfied. For all $u \in A^{m-1}$ one has $\|u\|_v = \|u\|_w - \|u\|_{\beta_w}$. By Lemma 3.4, $\|u\|_w = d + \|u\|_{\beta_w}$, so that one obtains

\[ \|u\|_v = d, \]

and Condition (3) is satisfied.

Conversely, suppose that $v$ satisfies Conditions (1)–(3) and let $s$ be the suffix of $v$ of length $m - 1$. Since $v$ is $(m - 1)$-uniform, by Lemma 4.1 one derives $\|s\|_v = d$. Thus, $s$ has at most $d - 1$ distinct right extensions in $v$, i.e., there exists at least one letter $y$ such that $sy \notin \mathrm{Fact}(v)$. Hence, $sy$ is an unrepeated factor of $vy$ so that by Condition (2) one derives $G_{vy} \le m - 1$. In view of Corollary 3.1, one obtains $G_{vy} = G_N = m - 1$. Therefore, by Lemma 3.1, $vy$ is a de Bruijn word of order $m$. Consequently, $vy$ has a border of length $m - 1$ and by Condition (1), $y = x$. Thus, $w = vx$ is a de Bruijn word of order $m$. □

A word $w$ is said to be uniform if it is $n$-uniform for all $n \ge 0$. For instance, the word $w = aaababbba$ on the alphabet $\{a, b\}$ is uniform.

We remark that by Proposition 4.1 a uniform word $w$ of length $N$ is $G_w$-full so that, by Proposition 3.1, $G_w = G_N$.
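The characterization of Lemma 4.1 gives a direct test of uniformity: for each $n \le N$, every count must lie between the floor and the ceiling of $(N - n + 1)/d^n$. The sketch below is an illustration with hypothetical names; it takes the alphabet size `d` as a parameter, assumes the word uses at most `d` distinct letters, and treats words that do not occur as having count 0.

```python
from collections import Counter
from itertools import product

def counts(w, n):
    """Occurrence counts of the factors of length n that actually occur in w."""
    return Counter(w[i:i + n] for i in range(len(w) - n + 1))

def is_n_uniform(w, n, d):
    """Definition of n-uniformity over a d-letter alphabet."""
    c = counts(w, n)
    lo = min(c.values()) if len(c) == d ** n else 0   # an absent word has 0 occurrences
    hi = max(c.values()) if c else 0
    return hi - lo <= 1

def satisfies_eq_14(w, n, d):
    """Eq. (14) for 0 <= n <= |w|: floor <= count <= ceiling of (N - n + 1) / d^n."""
    total = len(w) - n + 1
    lo, hi = total // d ** n, -(-total // d ** n)     # floor and ceiling
    c = counts(w, n)
    present_ok = all(lo <= k <= hi for k in c.values())
    absent_ok = len(c) == d ** n or lo == 0
    return present_ok and absent_ok

w = "aaababbba"
print([satisfies_eq_14(w, n, 2) for n in range(len(w) + 1)])
```

For the uniform word $aaababbba$ every entry of the printed list should be `True`.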

Lemma 4.2. Let $w \in A^N$. The following conditions are equivalent:

(1) $w$ is uniform,
(2) $w$ is $n$-uniform for all $n \le G_N + 1$,
(3) $w$ is $n$-uniform for all $n \le G_w$.

Proof. The implication (1) $\Rightarrow$ (2) is trivial.

Now, let us prove the implication (2) $\Rightarrow$ (3). Since $w$ is $(G_N + 1)$-uniform, by Lemma 4.1 and Eq. (11) one has that for all $u \in A^{G_N + 1}$,

\[ \|u\|_w \le \left\lceil \frac{N - G_N}{d^{G_N + 1}} \right\rceil \le 1, \]

so that no word of length $G_N + 1$ is a repeated factor of $w$, which implies $G_w = G_N$. Thus, $w$ is $n$-uniform for all $n \le G_w$.

The implication (3) $\Rightarrow$ (1) follows from the fact that $w$ is $n$-uniform for all $n > G_w$. □

Optimality of the bounds given in the previous lemma is shown by the following:

Example 4.1. Consider the word $w = abbaabb$ on the binary alphabet $A = \{a, b\}$. In this case, $N = |w| = 7$ and $G_w = 3$. Moreover, by Lemma 3.3 one derives $G_N = 2$. The word $w$ is $n$-uniform for $0 \le n \le 2$ but it is not 3-uniform since $abb$ has 2 occurrences in $w$ whereas $aba$ has none.
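The quantities in Example 4.1 can be recomputed directly. In the Python sketch below (illustrative names, binary alphabet passed as its size `d = 2`), the last printed item lists the orders $n$ for which the word fails to be $n$-uniform.

```python
from collections import Counter

def counts(w, n):
    """Occurrence counts of the factors of length n that occur in w."""
    return Counter(w[i:i + n] for i in range(len(w) - n + 1))

def is_n_uniform(w, n, d):
    """n-uniformity over a d-letter alphabet; absent words count as 0 occurrences."""
    c = counts(w, n)
    lo = min(c.values()) if len(c) == d ** n else 0
    hi = max(c.values()) if c else 0
    return hi - lo <= 1

def G(w):
    """Maximal length of a repeated factor of a non-empty word w."""
    return max(n for n in range(len(w)) if max(counts(w, n).values()) > 1)

def G_N(N, d):
    """max{n >= 0 : d^n + n <= N} (Lemma 3.3)."""
    n = 0
    while d ** (n + 1) + (n + 1) <= N:
        n += 1
    return n

w = "abbaabb"
print(G(w), G_N(len(w), 2), [n for n in range(len(w) + 1) if not is_n_uniform(w, n, 2)])
# 3 2 [3]
```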

Proposition 4.4. Let $w \in A^*$ be a uniform word of length $|w| = d^m + m - 1$, with $m \ge 1$. Then $w$ is a de Bruijn word of order $m$.

Proof. If $w$ is uniform, then by Proposition 4.1, $w$ is $G_w$-full, so that by Proposition 3.3 the result follows. □

We observe that a de Bruijn word is not necessarily uniform. For instance, the de Bruijn word $aaababbbaa$ of order 3 on the alphabet $\{a, b\}$ is not uniform.

Proposition 4.5. Let $v$ be a de Bruijn word of order $m$ and $\beta_v$ be its longest border. The word $v$ is uniform if and only if $\beta_v$ is uniform.

Proof. By Lemma 3.1, $G_v = m - 1$. Hence, by Lemma 4.2, $v$ is uniform if and only if it is $n$-uniform for all $n \le m - 1$. By Proposition 4.2, this occurs if and only if $\beta_v$ is $n$-uniform for all $n \le m - 1$, that is, if and only if $\beta_v$ is uniform. □

From the previous proposition one has that on any alphabet there exist arbitrarily long uniform de Bruijn words. Indeed, any de Bruijn word of order 1 is trivially uniform. If $w$ is any uniform word, then a de Bruijn word $v$ such that $\beta_v = w$ is a longer uniform de Bruijn word.
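Proposition 4.5 can be observed on the binary de Bruijn words of order 3. The following Python sketch is an illustration under stated assumptions: `de_bruijn_words` is a hypothetical brute-force generator (exponential, small cases only), and the longest border of a de Bruijn word $v$ of order $m$ is taken as its prefix of length $m - 1$, as recalled in Section 3.

```python
from collections import Counter
from itertools import product

def is_uniform(w, d):
    """Uniformity over a d-letter alphabet; orders n > len(w) are trivially satisfied."""
    for n in range(len(w) + 1):
        c = Counter(w[i:i + n] for i in range(len(w) - n + 1))
        lo = min(c.values()) if len(c) == d ** n else 0   # absent words occur 0 times
        if max(c.values()) - lo > 1:
            return False
    return True

def de_bruijn_words(alphabet, m):
    """Yield all (linear) de Bruijn words of order m by exhaustive search."""
    d = len(alphabet)
    N = d ** m + m - 1
    for t in product(alphabet, repeat=N):
        w = "".join(t)
        if len({w[i:i + m] for i in range(d ** m)}) == d ** m:
            yield w

m = 3
all_words = list(de_bruijn_words("ab", m))
uniform = [v for v in all_words if is_uniform(v, 2)]
print(len(all_words), len(uniform))
print(all(is_uniform(v, 2) == is_uniform(v[:m - 1], 2) for v in all_words))
```

The word $aaababbbaa$ mentioned above has border $aa$, which is not uniform on $\{a, b\}$, so it is excluded from the `uniform` list.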

5. Counting uniform words

In this section we consider the problem of counting uniform words. In Table 1 we report the numbers of uniform binary words of length $\le 20$.

Table 1
Numbers of uniform binary words of length $\le 20$

Length           1    2    3    4    5    6    7    8    9   10
Uniform words    2    2    6    4    4   12   34   20   16    8

Length          11   12   13   14   15   16   17   18   19   20
Uniform words   68  100  144  314  668  360  288  128  192  400
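The first entries of Table 1 can be recomputed by exhaustive search. The sketch below is an illustration only, with assumed function names; it enumerates all $2^N$ binary words, so it is a sanity check for small $N$ rather than a counting method, and its output can be compared with the first row of the table.

```python
from collections import Counter
from itertools import product

def is_uniform(w, d):
    """Uniformity over a d-letter alphabet; orders n > len(w) are trivially satisfied."""
    for n in range(len(w) + 1):
        c = Counter(w[i:i + n] for i in range(len(w) - n + 1))
        lo = min(c.values()) if len(c) == d ** n else 0   # absent words occur 0 times
        if max(c.values()) - lo > 1:
            return False
    return True

def count_uniform(N, alphabet="ab"):
    """Number of uniform words of length N over the alphabet (exhaustive search)."""
    d = len(alphabet)
    return sum(is_uniform("".join(t), d) for t in product(alphabet, repeat=N))

print([count_uniform(N) for N in range(1, 11)])
```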

The behavior of this function is quite irregular, with several points of local maximum and minimum. By using some technical lemmas and propositions on circular de Bruijn words and the ‘critical point theorem,’ we obtain lower bounds for the number $D_U(d,N)$ of uniform words of length $N$ on a $d$-letter alphabet, in the cases $N = d^m$ and $N = d^m + 1$. Moreover, we prove a recursive formula on $D_U$ which allows one to compute $D_U(d,N)$ for infinitely many values of $N$. Finally, we show that if $d > 1$ the fraction $D_U(d,N)/d^N$ tends to 0 when $N$ tends to infinity.

Lemma 5.1. Let $w = uu'$ be a word with $u$ unbordered and $u'$ a prefix of $u$. Then there do not exist two occurrences $(\lambda_1, \lambda_2)$ and $(\lambda'_1, \lambda'_2)$ of a same non-empty factor $f$ of $w$, with

$$|\lambda_1| < |\lambda'_1| < |u| < |\lambda_1 f|. \tag{15}$$

Proof. By contradiction, suppose that there exist two occurrences $(\lambda_1, \lambda_2)$ and $(\lambda'_1, \lambda'_2)$ of a non-empty factor $f$ of $w$, satisfying Eq. (15). We can write

$$w = \lambda_1 f \lambda_2 = \lambda'_1 f \lambda'_2.$$

By Eq. (15),

$$u = \lambda_1 f_1 = \lambda'_1 f'_1, \quad u' = f_2 \lambda_2 = f'_2 \lambda'_2, \quad \text{and} \quad f = f_1 f_2 = f'_1 f'_2,$$

with $0 < |f'_1| < |f_1|$. One derives that there exists $\xi \ne \varepsilon$ such that

$$f_1 = f'_1 \xi \quad \text{and} \quad \xi f_2 = f'_2,$$

so that

$$u = \lambda_1 f'_1 \xi \quad \text{and} \quad u' = \xi f_2 \lambda'_2.$$

Since $u'$ is a prefix of $u$, $\xi$ is a border of $u$, which gives a contradiction. □

Proposition 5.1. Let $w = u\beta_w$ be a de Bruijn word with $u$ unbordered and $x$ be the first letter of $w$. Then $u$ and $ux$ are uniform words.

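Proof. Let $w = u\beta_w \in A^*$ be a de Bruijn word of order $m$. If $m = 1$ the result is trivial. Thus we suppose $m \ge 2$.

Let $f \in A^n$ with $n \le m - 1$. Then

$$\|f\|_w = \|f\|_u + \delta_f + \|f\|_{\beta_w},$$

where

$$\delta_f = \mathrm{Card}\bigl(\{(\lambda, \mu) \in A^* \times A^* \mid \lambda f \mu = w,\ |\lambda| < |u| < |\lambda f|\}\bigr).$$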


Since $u$ is unbordered, from the previous lemma one has $\delta_f \le 1$. By Lemma 3.4,

$$\|f\|_w = d^{m-n} + \|f\|_{\beta_w},$$

so that

$$\|f\|_u = d^{m-n} - \delta_f \in \{d^{m-n} - 1,\ d^{m-n}\}. \tag{16}$$

As $G_u \le G_w = m - 1$, by Lemma 4.2 it follows that $u$ is uniform.

We recall that, as $\beta_w$ is a prefix of $w$, $x$ is the first letter of $\beta_w$. Now, let $g$ be the suffix of $ux$ of length $n \le m - 1$. Then $\delta_g = 1$. Thus, by Eq. (16) one has

$$\|g\|_{ux} = \|g\|_u + 1 = d^{m-n} - \delta_g + 1 = d^{m-n},$$

and, for all $f \in A^n$ such that $f \ne g$,

$$\|f\|_{ux} = \|f\|_u \in \{d^{m-n} - 1,\ d^{m-n}\}.$$

Since $G_{ux} \le G_w = m - 1$, by Lemma 4.2 it follows that $ux$ is uniform. □

Example 5.1. Consider a de Bruijn word $w = u\beta_w$ of order $m$ ending by $ab^{m-1}$, with $a, b \in A$ and $a \ne b$. We can write $w$ as $w = vab^{m-1}$. In this case, $\beta_w = b^{m-1}$ and $u = va$. The word $u$ is unbordered since a non-empty prefix of $\beta_w$ cannot be a suffix of $va$. Therefore, by using the preceding proposition the words $u$ and $ub$ are uniform. Since there are $D(d,m)$ de Bruijn words ending by $ab^{m-1}$, one can construct, by considering all the possible choices of $a, b \in A$, $d(d-1)D(d,m)$ uniform words of length $d^m$ and of length $d^m + 1$.
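Proposition 5.1 can be checked on a concrete word. The sketch below takes the binary de Bruijn word $aaababbbaa$ of order 3, which ends by $ba^{m-1}$ (the situation of Example 5.1 with the roles of the two letters exchanged), splits it as $w = u\beta_w$ and tests $u$ and $ux$. The helper names are illustrative only.

```python
from collections import Counter

def is_uniform(w, d=2):
    """Direct check of the definition of uniform word."""
    for n in range(1, len(w) + 1):
        counts = Counter(w[i:i + n] for i in range(len(w) - n + 1))
        lowest = min(counts.values()) if len(counts) == d ** n else 0
        if max(counts.values()) - lowest > 1:
            return False
    return True

def is_unbordered(u):
    """True if no proper non-empty prefix of u is also a suffix of u."""
    return not any(u[:k] == u[-k:] for k in range(1, len(u)))

m = 3
w = "aaababbbaa"              # de Bruijn word of order 3 on {a, b}
beta = w[-(m - 1):]           # longest border, of length m - 1 (here "aa")
u = w[:len(w) - (m - 1)]      # w = u + beta
x = w[0]                      # first letter of w

print(u, is_unbordered(u), is_uniform(u), is_uniform(u + x), is_uniform(w))
```

Here $w$ itself is not uniform, while $u$ and $ux$ are.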

In order to develop more powerful counting arguments for uniform words, we need to recall some definitions and results on local periods of a word.

Let $w$ be a word over $A$. Any pair $(w_1, w_2)$ of words such that $w = w_1 w_2$ is called a point of $w$. A word $uu$, with $u \ne \varepsilon$, is called a local repetition of $w$ in the point $(w_1, w_2)$ if the following two conditions are satisfied:

$$A^* u \cap A^* w_1 \ne \emptyset \quad \text{and} \quad u A^* \cap w_2 A^* \ne \emptyset.$$

If $uu$ is the shortest local repetition in the point $(w_1, w_2)$, then the length of $u$ is called the local period of $w$ in the point $(w_1, w_2)$ and it is denoted by $p(w_1, w_2)$. If one has

$$\pi_w = p(w_1, w_2),$$

then $(w_1, w_2)$ is called a critical point of $w$.

The following important theorem (cf. [13]) relates the local periods and the minimal period of a word.

Theorem 5.1. Any non-empty word $w$ has a critical point $(w_1, w_2)$ with $w_1 \ne \varepsilon$.
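The definitions of local period and critical point translate into a short brute-force computation. In the sketch below a point $(w_1, w_2)$ is represented by the index $i = |w_1|$, and the two intersection conditions are rewritten as the requirement that $w$ has "period $p$" on the window of radius $p$ around position $i$, restricted to positions that exist; this index encoding and the function names are assumptions made for illustration.

```python
def period(w):
    """Minimal period pi_w of a non-empty word w."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[j] == w[j + p] for j in range(n - p)))

def local_period(w, i):
    """Local period p(w[:i], w[i:]): least p >= 1 such that some u of
    length p is suffix-compatible with w[:i] and prefix-compatible with w[i:]."""
    n = len(w)
    p = 1
    while True:
        lo, hi = max(0, i - p), min(i, n - p)
        # Only positions j with both j and j + p inside w are constrained.
        if all(w[j] == w[j + p] for j in range(lo, hi)):
            return p
        p += 1

def critical_points(w):
    """Indices i = |w1| of the critical points (w1, w2) of w."""
    pi = period(w)
    return [i for i in range(len(w) + 1) if local_period(w, i) == pi]

print(critical_points("aab"))
```

For $w = aab$ the minimal period is 3 and the only critical point is $(aa, b)$.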


For any $w \in A^+$ we set

$$G_{[w]} = \max_{u \in [w]} G_u.$$

Lemma 5.2. Let $w$ be a non-empty word. If one has

$$w = \lambda v \mu \quad \text{with} \quad |v| = 2G_{[w]} + 1,$$

then for any critical point $(v_1, v_2)$ of $v$ the word

$$w' = v_2 \mu \lambda v_1$$

is unbordered.

Proof. By contradiction, let $s$ be a non-empty border of $w'$. One easily verifies that $ss$ is a local repetition of $v$ in the critical point $(v_1, v_2)$. Therefore, one has $|s| \ge p(v_1, v_2) = \pi_v$. Since $v$ is a factor of $w$, one has $G_v \le G_w \le G_{[w]}$. By Lemma 2.3 and the fact that $w'$ is conjugate to $w$,

$$\pi_v \ge |v| - G_v \ge G_{[w]} + 1 \ge G_{w'} + 1.$$

Thus, $|s| > G_{w'}$ which is a contradiction, since $s$ is repeated in $w'$. □

Proposition 5.2. Let $w$ be a primitive word and set

$$r = \left\lceil \frac{|w|}{2G_{[w]} + 1} \right\rceil.$$

Then the number of unbordered conjugates of $w$ is at least $r$.

Proof. We can suppose, with no loss of generality, that $w$ is unbordered (for instance, one can replace $w$ with its Lyndon-conjugate, cf. [13]). If $r = 1$ the result is trivial. Therefore suppose $r > 1$ and factorize $w$ as

$$w = v_1 \cdots v_{r-1} w'$$

with $|v_i| = 2G_{[w]} + 1$, $i = 1, \ldots, r - 1$, and $w' \in A^+$. For any $i = 1, \ldots, r - 1$, let $(v'_i, v''_i)$ be a critical point of the word $v_i$ such that $v'_i \ne \varepsilon$. Let us set for $i = 1, \ldots, r - 1$,

$$w_i = v''_i v_{i+1} \cdots v_{r-1} w' v_1 \cdots v_{i-1} v'_i.$$

By the previous lemma, these words are unbordered. Moreover, $w, w_1, \ldots, w_{r-1}$ are pairwise strictly conjugates. Thus, since $w$ is primitive, they are pairwise distinct. □

As an example consider the word $w = aabaabbabb$. One has $G_{[w]} = 4$ and $r = \lceil |w|/(2G_{[w]} + 1) \rceil = 2$. The word $w' = bbabbaabaa$ is the only other unbordered word conjugate to $w$.
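The example above can be reproduced by enumerating the conjugates of $w$. This is a small sketch with illustrative helper names; `longest_repeat(u)` computes $G_u$ as the maximal length of a factor occurring at least twice in $u$.

```python
import math

def conjugates(w):
    """All rotations of w."""
    return [w[k:] + w[:k] for k in range(len(w))]

def is_unbordered(u):
    return not any(u[:k] == u[-k:] for k in range(1, len(u)))

def longest_repeat(u):
    """G_u: maximal length of a repeated factor of u (0 if none)."""
    for n in range(len(u) - 1, 0, -1):
        seen = set()
        for i in range(len(u) - n + 1):
            f = u[i:i + n]
            if f in seen:
                return n
            seen.add(f)
    return 0

w = "aabaabbabb"
G_class = max(longest_repeat(c) for c in conjugates(w))     # G_[w]
r = math.ceil(len(w) / (2 * G_class + 1))                   # bound of Proposition 5.2
unbordered = sorted(c for c in set(conjugates(w)) if is_unbordered(c))

print(G_class, r, unbordered)
```

In this case the bound $r$ of Proposition 5.2 is attained exactly.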


Corollary 5.1. Any circular de Bruijn word of order $m$ contains at least

$$\left\lceil \frac{d^m}{2m - 1} \right\rceil$$

uniform words.

Proof. Let $[u]$ be a circular de Bruijn word of order $m$. Then $u$ is a primitive word and $G_{[u]} \le m - 1$ since any word of $[u]$ is a factor of a de Bruijn word of order $m$. Thus, since $|u| = d^m$ one has

$$\left\lceil \frac{|u|}{2G_{[u]} + 1} \right\rceil \ge \left\lceil \frac{d^m}{2m - 1} \right\rceil = q.$$

By Proposition 5.2, there exist at least $q$ distinct unbordered conjugates of $u$

$$u_1, \ldots, u_q.$$

By Lemma 3.5, any of these words gives rise to a de Bruijn word $w_i = u_i \beta_{w_i}$, $i = 1, \ldots, q$, where, for any $i = 1, \ldots, q$, $\beta_{w_i}$ is the prefix of $u_i$ of length $m - 1$. By Proposition 5.1, the words $u_i$ are uniform. □

Corollary 5.2. Let $d > 1$ and $m \ge 1$. The numbers of uniform words of length $N = d^m$ and $N = d^m + 1$ are greater than

$$\frac{(d!)^{N/d}}{2 \log_d N} \quad \text{and} \quad \frac{(d!)^{(N-1)/d}}{2 \log_d (N - 1)},$$

respectively.

Proof. By Corollary 5.1, for any $m > 0$ any circular de Bruijn word of order $m$ contains at least $\lceil d^m/(2m - 1) \rceil$ uniform words. Since the number of distinct de Bruijn circular words of order $m$ is $D(d,m)$ and they are pairwise disjoint, there will exist at least $D(d,m)\lceil d^m/(2m - 1) \rceil$ uniform words of length $d^m$. Moreover, since these words are unbordered elements of de Bruijn circular words, by Proposition 5.1 each of them can be extended into a uniform word of length $d^m + 1$. By Eq. (13) one derives

$$D(d,m) \left\lceil \frac{d^m}{2m - 1} \right\rceil > \frac{(d!)^{d^{m-1}}}{2m}.$$

Thus, after setting $N = d^m$ and $N = d^m + 1$, respectively, the result follows. □

For any $N \ge 0$ we denote by $D_U(d,N)$ the number of uniform words of length $N$ on a $d$-letter alphabet.
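To get a feeling for how far the lower bound of Corollary 5.2 is from the true counts, one can compare it with the binary values of Table 1 at $N = 2^m$ and $N = 2^m + 1$. This is only a numerical illustration; the dictionary `TABLE_1` copies the relevant entries of Table 1 and the function name is hypothetical.

```python
import math

# Entries of Table 1 (d = 2) at lengths 2^m and 2^m + 1, for m = 1, ..., 4.
TABLE_1 = {2: 2, 3: 6, 4: 4, 5: 4, 8: 20, 9: 16, 16: 360, 17: 288}

def lower_bound(d, m):
    """(d!)^(N/d) / (2 log_d N) with N = d^m, so that log_d N = m.
    The bound for N = d^m + 1 has the same value."""
    N = d ** m
    return math.factorial(d) ** (N // d) / (2 * m)

for m in range(1, 5):
    N = 2 ** m
    print(m, lower_bound(2, m), TABLE_1[N], TABLE_1[N + 1])
```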


We inductively define the sequence of integers $f_n$ with $n > 0$ as follows:

$$f_1 = 0, \qquad f_n = d^{f_{n-1}+1} + f_{n-1}, \quad n > 1.$$

Proposition 5.3. For any $n > 1$ one has

$$D_U(d, f_n) = d\,D(d, f_{n-1} + 1)\,D_U(d, f_{n-1}) = d^{n-1} \prod_{k=1}^{n-1} D(d, f_k + 1).$$

Proof. Since $f_n = d^{f_{n-1}+1} + f_{n-1}$, by Proposition 4.4 if $w \in A^*$ is a uniform word of length $f_n$, then it is a de Bruijn word of order $f_{n-1} + 1$. Moreover, by Proposition 4.5 any de Bruijn word $v$ of order $f_{n-1} + 1$ is uniform if and only if its longest border $\beta_v$ is uniform. Since $\beta_v$ is the suffix of $v$ of length $f_{n-1}$, $D_U(d, f_n)$ equals the number of de Bruijn words of order $f_{n-1} + 1$ having a uniform suffix of length $f_{n-1}$.

Since the number of uniform words of length $f_{n-1}$ is $D_U(d, f_{n-1})$ and the number of de Bruijn words of order $f_{n-1} + 1$ ending by a fixed suffix of length $f_{n-1}$ is $d\,D(d, f_{n-1} + 1)$, for all $n > 1$ the following relation holds:

$$D_U(d, f_n) = d\,D(d, f_{n-1} + 1)\,D_U(d, f_{n-1}).$$

As $D_U(d, f_1) = D_U(d, 0) = 1$, by iteration of the preceding formula one easily derives the result. □
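The recursion of Proposition 5.3 is easy to evaluate. The sketch below assumes that $D(d,m)$, the number of circular de Bruijn words of order $m$, is given by the classical closed form $(d!)^{d^{m-1}}/d^m$; Eq. (13) is not reproduced in this section, so this formula is an assumption of the example. The function names are illustrative.

```python
import math

def f_sequence(d, count):
    """First `count` terms f_1, f_2, ... of the sequence f_n (0-indexed list)."""
    f = [0]
    while len(f) < count:
        f.append(d ** (f[-1] + 1) + f[-1])
    return f

def D(d, m):
    # Assumed closed form for the number of circular de Bruijn words of order m.
    return math.factorial(d) ** (d ** (m - 1)) // d ** m

def DU_at_f(d, n):
    """D_U(d, f_n) computed as d^(n-1) * prod_{k=1}^{n-1} D(d, f_k + 1)."""
    f = f_sequence(d, n)
    value = 1                      # D_U(d, f_1) = D_U(d, 0) = 1
    for k in range(1, n):
        value *= d * D(d, f[k - 1] + 1)
    return value

print(f_sequence(2, 3), DU_at_f(2, 2), DU_at_f(2, 3))
```

For $d = 2$ the sequence starts $0, 2, 10$, and the computed values agree with the entries of Table 1 for lengths 2 and 10.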

Let us denote by $D_U^{(1)}(d,N)$ the number of 1-uniform words of length $N$ on a $d$-letter alphabet. The following holds:

Proposition 5.4. Let $N = dq + r$ with $0 \le r < d$. Then

$$D_U^{(1)}(d,N) = \binom{d}{r} \frac{N!}{q!^{\,d}\,(q + 1)^r}.$$

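Proof. Since $N = dq + r$, by Lemma 4.1 one derives that a word $w \in A^N$ is 1-uniform if and only if there are $r$ letters $a \in A$ such that

$$\|a\|_w = \left\lceil \frac{N}{d} \right\rceil, \tag{17}$$

while for the remaining $d - r$ letters $b \in A$ one has

$$\|b\|_w = \left\lfloor \frac{N}{d} \right\rfloor. \tag{18}$$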


The $r$ letters of $A$ satisfying Eq. (17) can be chosen in $\binom{d}{r}$ different ways. For any such choice, by a classical result of combinatorial analysis, one obtains

$$\frac{N!}{q!^{\,d-r}\,(q + 1)!^{\,r}} = \frac{N!}{q!^{\,d}\,(q + 1)^r}$$

distinct words satisfying Eqs. (17) and (18), i.e., 1-uniform words. Since there are $\binom{d}{r}$ such choices, the result follows. □

By using Stirling's approximation of $\log N!$ (see, for instance, [9]) and making some algebraic manipulations, one derives,

$$\frac{1}{d^N} D_U^{(1)}(d,N) \le c\,\frac{1}{\bigl(\sqrt{2\pi N}\bigr)^{d-1}},$$

where $c$ is a suitable constant.

Since for all $N \ge 0$, $D_U(d,N) \le D_U^{(1)}(d,N)$ it follows that if $d > 1$, then $D_U(d,N)/d^N$ tends to 0 when $N$ tends to infinity.
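The closed formula of Proposition 5.4 can be compared with a direct enumeration of 1-uniform words, i.e., words in which the numbers of occurrences of any two letters differ by at most 1. This is a minimal sketch with illustrative function names; letters are represented by the integers $0, \ldots, d-1$.

```python
import math
from collections import Counter
from itertools import product

def one_uniform_formula(d, N):
    """binom(d, r) * N! / (q!^d * (q + 1)^r) with N = d*q + r, 0 <= r < d."""
    q, r = divmod(N, d)
    return math.comb(d, r) * math.factorial(N) // (math.factorial(q) ** d * (q + 1) ** r)

def one_uniform_brute(d, N):
    """Count words of length N over d letters whose letter counts differ by at most 1."""
    total = 0
    for w in product(range(d), repeat=N):
        c = Counter(w)
        counts = [c[a] for a in range(d)]   # absent letters contribute 0
        total += max(counts) - min(counts) <= 1
    return total

for N in (5, 10, 20, 40):
    # The fraction of 1-uniform binary words decreases as N grows.
    print(N, one_uniform_formula(2, N) / 2 ** N)
```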

    6. Constructing uniform words

The main result of this section is that over a $d$-letter alphabet there exist uniform words of any length. The proof, even though it is based on some rather technical lemmas and propositions, is constructive. In particular, Lemma 6.2, called Exchange Lemma, gives the key result from which, with the help of other auxiliary results, one derives an efficient procedure to construct for any $N$ a uniform word of length $N$. We also obtain a new method for constructing for any $m$ a de Bruijn word of order $m$. Finally we show that $D_U(d,N)$ tends to infinity with $N$.

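Lemma 6.1. Let $w = \lambda x s y \mu$ with $x, y \in A$, $s, \lambda, \mu \in A^*$ and set $n = |xsy|$. Then for all $u \in A^n$ one has

$$\|u\|_w = \begin{cases} \|u\|_{\lambda x s} + \|u\|_{s y \mu} + 1 & \text{if } u = xsy, \\ \|u\|_{\lambda x s} + \|u\|_{s y \mu} & \text{otherwise.} \end{cases}$$

Proof. By using Eq. (1) one has

$$\|u\|_w = \|u\|_{\lambda x s} + \|u\|_{x s y \mu} \quad \text{and} \quad \|u\|_{x s y \mu} = \|u\|_{xsy} + \|u\|_{s y \mu}.$$

From this, the result follows. □

Lemma 6.2 (Exchange Lemma). Let $w \in A^*$ be such that

$$w = \lambda_i x_i s y_i \mu_i, \quad i = 1, 2, 3, \tag{19}$$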


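with $x_i, y_i \in A$, $\lambda_i, \mu_i, s \in A^*$, $i = 1, 2, 3$, $|\lambda_1| < |\lambda_2| < |\lambda_3|$ and set $n = |s| + 2$. Then there exists a word $w'$ such that for all $u \in A^{[n]}$ one has

$$\|u\|_{w'} = \|u\|_w + \delta^u_{12} + \delta^u_{23} + \delta^u_{31} - \delta^u_{11} - \delta^u_{22} - \delta^u_{33}, \tag{20}$$

where

$$\delta^u_{ij} = \begin{cases} 1 & \text{if } u = x_i s y_j, \\ 0 & \text{otherwise.} \end{cases}$$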

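Proof. Since $|\lambda_1| < |\lambda_2| < |\lambda_3|$, by Eq. (19) there are $\alpha_1, \alpha_2 \in A^*$ such that

$$\lambda_2 x_2 s = \lambda_1 x_1 s y_1 \alpha_1, \qquad \lambda_3 x_3 s = \lambda_2 x_2 s y_2 \alpha_2. \tag{21}$$

Let us set

$$w' = \lambda_1 x_1 s y_2 \alpha_2 y_1 \alpha_1 y_3 \mu_3$$

(see Fig. 1) and verify Eq. (20). We first suppose $|u| = n$. By Eq. (21), $x_2 s$ and $x_3 s$ are suffixes of $s y_1 \alpha_1$ and $s y_2 \alpha_2$, respectively. Hence, there exist words $\lambda'_2$ and $\lambda'_3$ such that

$$\lambda_1 x_1 s y_2 \alpha_2 = \lambda'_2 x_3 s, \qquad \lambda'_2 x_3 s y_1 \alpha_1 = \lambda'_3 x_2 s. \tag{22}$$

Consequently, we can write

$$w' = \lambda'_3 x_2 s y_3 \mu_3. \tag{23}$$

As $w = \lambda_3 x_3 s y_3 \mu_3$ and $|u| = |x_3 s y_3|$, by Lemma 6.1 one has

$$\|u\|_w = \|u\|_{\lambda_3 x_3 s} + \|u\|_{s y_3 \mu_3} + \delta^u_{33}. \tag{24}$$

Still by Lemma 6.1, using Eqs. (21)–(23), one obtains

[Fig. 1. Exchange Lemma.]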


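$$\begin{aligned}
\|u\|_{\lambda_3 x_3 s} &= \|u\|_{\lambda_2 x_2 s} + \|u\|_{s y_2 \alpha_2} + \delta^u_{22}, \\
\|u\|_{\lambda_2 x_2 s} &= \|u\|_{\lambda_1 x_1 s} + \|u\|_{s y_1 \alpha_1} + \delta^u_{11}, \\
\|u\|_{w'} &= \|u\|_{\lambda'_3 x_2 s} + \|u\|_{s y_3 \mu_3} + \delta^u_{23}, \\
\|u\|_{\lambda'_3 x_2 s} &= \|u\|_{\lambda'_2 x_3 s} + \|u\|_{s y_1 \alpha_1} + \delta^u_{31}, \\
\|u\|_{\lambda'_2 x_3 s} &= \|u\|_{\lambda_1 x_1 s} + \|u\|_{s y_2 \alpha_2} + \delta^u_{12}.
\end{aligned}$$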

By the preceding equations and Eq. (24) one easily derives Eq. (20).

Now, suppose $|u| < n$ and denote by $t$ the common suffix of length $n - 1$ of $w$ and $w'$. By Lemma 2.2, applied in the case $\mu = \varepsilon$, one derives

$$\|u\|_w = \sum_{v \in uA^* \cap A^n} \|v\|_w + \|u\|_t \quad \text{and} \quad \|u\|_{w'} = \sum_{v \in uA^* \cap A^n} \|v\|_{w'} + \|u\|_t.$$

Since by the previous result for all $v \in A^n$ one has

$$\|v\|_{w'} - \|v\|_w = \delta^v_{12} + \delta^v_{23} + \delta^v_{31} - \delta^v_{11} - \delta^v_{22} - \delta^v_{33},$$

one derives

$$\|u\|_{w'} - \|u\|_w = \sum_{v \in uA^* \cap A^n} \bigl(\delta^v_{12} + \delta^v_{23} + \delta^v_{31} - \delta^v_{11} - \delta^v_{22} - \delta^v_{33}\bigr). \tag{25}$$

Notice that the sum $\sum_{v \in uA^* \cap A^n} \delta^v_{pq}$ is equal to 1 if $x_p s y_q \in uA^* \cap A^n$, i.e., $u$ is a prefix of $x_p s$, while it is equal to 0 in the opposite case. One derives

$$\sum_{v \in uA^* \cap A^n} \delta^v_{pq} = \sum_{v \in uA^* \cap A^n} \delta^v_{pq'}, \quad p, q, q' = 1, 2, 3,$$

so that by Eq. (25) one has $\|u\|_{w'} - \|u\|_w = 0$ from which Eq. (20) follows. □

Proposition 6.1. Let $w, s \in A^*$ and set $n = |s| + 2$. If $x, y, x', y'$ are letters such that $x \ne x'$, $y \ne y'$, $x'sy' \in \mathrm{Fact}(w)$ and $xsy$ is a repeated factor of $w$, then there exists a word $w'$ such that

$$\begin{aligned}
\|xsy\|_{w'} &= \|xsy\|_w - 1, & \|x'sy'\|_{w'} &= \|x'sy'\|_w - 1, \\
\|x'sy\|_{w'} &= \|x'sy\|_w + 1, & \|xsy'\|_{w'} &= \|xsy'\|_w + 1,
\end{aligned}$$

and for all $u \in A^{[n]} \setminus \{xsy, x'sy', x'sy, xsy'\}$

$$\|u\|_{w'} = \|u\|_w.$$


Proof. We apply the Exchange Lemma taking two of the pairs $(x_i, y_i)$, $i = 1, 2, 3$, equal to $(x, y)$ and the third one equal to $(x', y')$. As one verifies, in all cases one has

$$\delta^u_{12} + \delta^u_{23} + \delta^u_{31} = \begin{cases} 1 & \text{if } u = xsy \text{ or } u = x'sy \text{ or } u = xsy', \\ 0 & \text{otherwise,} \end{cases}$$

and

$$\delta^u_{11} + \delta^u_{22} + \delta^u_{33} = \begin{cases} 2 & \text{if } u = xsy, \\ 1 & \text{if } u = x'sy', \\ 0 & \text{otherwise.} \end{cases}$$

By using Eq. (20), the conclusion follows. □

Proposition 6.2. Let $w \in A^N$ be an $n$-uniform word with $n \ge 1$ such that

$$N \le d^{n+1} + n - 1.$$

Then there exists a word $w' \in A^N$ such that $G_{w'} \le n$ and for all $u \in A^{[n]}$,

$$\|u\|_{w'} = \|u\|_w.$$

Proof. The proof is obtained by induction on the value $K_{n+1}(w)$, where the function $K_{n+1}$ is the repetitivity of order $n + 1$ as defined by Eq. (2). The base of the induction is trivial since if $K_{n+1}(w) = 0$, then $w$ has no repeated factor of length $n + 1$ so that one can take $w' = w$. Let us suppose $K_{n+1}(w) > 0$. This implies that there exists at least one repeated factor $f$ of $w$ of length $n + 1$. We can write

$$f = xsy, \quad x, y \in A, \quad s \in A^{n-1}.$$

Since the number of occurrences in $w$ of the left extensions of $sy$ in $w$ is less than or equal to the number of the occurrences of $sy$ in $w$, one has

$$\sum_{z \in A} \|zsy\|_w \le \|sy\|_w.$$

Since $w$ is $n$-uniform, by Lemma 4.1 and the assumption that $N \le d^{n+1} + n - 1$,

$$\|sy\|_w \le d.$$

Since $f = xsy$ is repeated in $w$, one has

$$\sum_{z \in A} \|zsy\|_w \ge 2 + \sum_{z \in A \setminus \{x\}} \|zsy\|_w.$$


From the previous relations one derives

$$\sum_{z \in A \setminus \{x\}} \|zsy\|_w \le d - 2.$$

This implies the existence of at least one letter $x' \ne x$ for which $\|x'sy\|_w = 0$, i.e., $x'sy \notin \mathrm{Fact}(w)$. We distinguish two cases.

Case 1. $x's$ is not a suffix of $w$. Since the number of occurrences in $w$ of the right extensions of $xs$ in $w$ is less than or equal to the number of the occurrences of $xs$ in $w$, one has

$$\sum_{z \in A} \|xsz\|_w \le \|xs\|_w.$$

Moreover, since $x's$ is not a suffix of $w$, the number of occurrences in $w$ of the right extensions of $x's$ in $w$ is equal to the number of the occurrences of $x's$ in $w$, i.e.,

$$\sum_{z \in A} \|x'sz\|_w = \|x's\|_w.$$

Since $w$ is $n$-uniform, by the previous relations one derives

$$\sum_{z \in A} \bigl(\|xsz\|_w - \|x'sz\|_w\bigr) \le \|xs\|_w - \|x's\|_w \le 1.$$

As $\|xsy\|_w \ge 2$ and $\|x'sy\|_w = 0$, one obtains

$$\sum_{z \in A \setminus \{y\}} \bigl(\|xsz\|_w - \|x'sz\|_w\bigr) \le -1.$$

Thus, there exists at least one letter $y' \ne y$ such that

$$\|xsy'\|_w < \|x'sy'\|_w.$$

This implies that $\|x'sy'\|_w > 0$, i.e., $x'sy' \in \mathrm{Fact}(w)$. By Proposition 6.1, there exists a word $v \in A^N$ such that

$$\begin{aligned}
\|xsy\|_v &= \|xsy\|_w - 1, & \|x'sy'\|_v &= \|x'sy'\|_w - 1, \\
\|x'sy\|_v &= \|x'sy\|_w + 1 = 1, & \|xsy'\|_v &= \|xsy'\|_w + 1
\end{aligned}$$

and, for all $u \in A^{[n+1]} \setminus \{xsy, x'sy', x'sy, xsy'\}$,

$$\|u\|_v = \|u\|_w.$$


By the previous equations and Eq. (2), one easily derives that

$$K_{n+1}(w) - K_{n+1}(v) = \|xsy\|_w + \|x'sy'\|_w - \|xsy'\|_w - 2.$$

Since $\|xsy\|_w \ge 2$ and $\|x'sy'\|_w > \|xsy'\|_w$, one obtains $K_{n+1}(v) < K_{n+1}(w)$.

Case 2. $x's$ is a suffix of $w$. We consider the word $\alpha = wy'$ where $y' \notin A$ is an extra symbol. Since $xsy$ is repeated in $\alpha$ and $x'sy' \in \mathrm{Fact}(\alpha)$, by Proposition 6.1 there exists $\beta \in (A \cup \{y'\})^*$ such that

$$\begin{aligned}
\|xsy\|_\beta &= \|xsy\|_\alpha - 1, & \|x'sy'\|_\beta &= \|x'sy'\|_\alpha - 1, \\
\|x'sy\|_\beta &= \|x'sy\|_\alpha + 1, & \|xsy'\|_\beta &= \|xsy'\|_\alpha + 1
\end{aligned}$$

and, for all $u \in (A \cup \{y'\})^{[n+1]} \setminus \{xsy, x'sy', x'sy, xsy'\}$,

$$\|u\|_\beta = \|u\|_\alpha.$$

From this latter equation one has, in particular, $\|y'\|_\beta = \|y'\|_\alpha = 1$ and for all $z \in A \cup \{y'\}$, $\|y'z\|_\beta = \|y'z\|_\alpha = 0$. This implies that

$$\beta = vy'$$

for a suitable $v \in A^*$. It is trivial to verify that for all $u \in A^*$,

$$\|u\|_w = \|u\|_\alpha \quad \text{and} \quad \|u\|_v = \|u\|_\beta.$$

By the preceding equations one obtains

$$\|xsy\|_v = \|xsy\|_w - 1, \qquad \|x'sy\|_v = \|x'sy\|_w + 1 = 1$$

and, for all $u \in A^{[n+1]} \setminus \{xsy, x'sy\}$,

$$\|u\|_v = \|u\|_w.$$

Since $\|x'sy\|_w = 0$ and $\|x'sy\|_v = 1$, one derives $K_{n+1}(w) - K_{n+1}(v) = \|xsy\|_w - 1 \ge 1$.

In both Cases 1 and 2, we have obtained a word $v$ such that $K_{n+1}(v) < K_{n+1}(w)$ and for all $u \in A^{[n]}$, $\|u\|_v = \|u\|_w$. Since $w$ is $n$-uniform, so will be $v$. Therefore, by the induction hypothesis, there exists a word $w' \in A^N$ such that $G_{w'} \le n$ and for all $u \in A^{[n]}$, $\|u\|_{w'} = \|u\|_v = \|u\|_w$. This concludes the proof. □

We observe that Proposition 6.2 is not true in general if one supposes that $N = d^{n+1} + n$. This is shown, in the case $d = 2$, by the following example. The word $w = aabbbbabaa$ of length $10 = 2^3 + 2$ is 1-uniform and 2-uniform. However, there does not exist a word $w'$ such that $\|u\|_{w'} = \|u\|_w$ for all $u \in A^{[2]}$ and $G_{w'} \le 2$. Indeed, such a word $w'$ would be a uniform word. By Proposition 4.4, $w'$ is a de Bruijn word of order 3 and by Proposition 4.2, $\beta_{w'}$ has to be a uniform word of length 2. Hence, $\beta_{w'} \ne bb$, so that, by Lemma 3.4 one has $\|bb\|_{w'} = 2 + \|bb\|_{\beta_{w'}} = 2 \ne \|bb\|_w = 3$.

We remark that the proofs of the Exchange Lemma and of Propositions 6.1 and 6.2 are constructive, so that they furnish an algorithm which, receiving in input an $n$-uniform word $w$ on the alphabet $A$ of length $|w| \le d^{n+1} + n - 1$, produces a word $w'$ such that $G_{w'} \le n$ and $\|u\|_{w'} = \|u\|_w$ for all $u \in A^{[n]}$. In particular, $w'$ is $n$-uniform and $|w'| = |w|$. We give here an example showing how such an algorithm works. A more detailed description of such a procedure is given by the algorithm EliminateRepetitions reported in Appendix A.

Example 6.1. Consider the 3-uniform word

$$w = aaab\,[bbabaab]\,[bab]$$

on the alphabet $\{a, b\}$. This word has two repeated factors of length 4, namely $aabb$ and $bbab$. In order to 'eliminate' the repetition $aabb$, we note that $babb$ is not a factor of $w$ and $bab$ is a suffix of $w$. Thus, according to the proofs of Proposition 6.2 and of the Exchange Lemma, one has to 'exchange the order' of the two bracketed factors, obtaining the word

$$w_1 = aaabba\,[bbba]\,[ba]\,ab$$

whose only repeated factor of length 4 is $bbab$. To 'eliminate' also this repetition, observe that $abab$ is not a factor of $w_1$ and $\|abaa\|_{w_1} = 1 > \|bbaa\|_{w_1}$. This leads to 'exchange the order' of the two bracketed factors of $w_1$, obtaining the word

$$w' = w_2 = aaabbababbbaab.$$

One easily checks that, as expected, $\|u\|_w = \|u\|_{w'}$ for all $u \in A^{[3]}$ and $G_{w'} = 3$.
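The two exchanges of Example 6.1 can be replayed mechanically. In the notation of the Exchange Lemma, $w' = \lambda_1 x_1 s\,(y_2\alpha_2)(y_1\alpha_1)\,y_3\mu_3$ is obtained from $w$ by swapping the adjacent blocks $y_1\alpha_1$ and $y_2\alpha_2$. The sketch below encodes each occurrence $x_i s y_i$ by the index of $x_i$ in $w$; this encoding, the function names and the use of `$` as the extra end symbol of Case 2 are choices made for illustration. It is not the algorithm EliminateRepetitions of Appendix A, since the three occurrences are supplied by hand.

```python
from collections import Counter

def exchange(w, p1, p2, p3, s_len):
    """Swap the blocks y1*alpha1 and y2*alpha2 of the Exchange Lemma.
    p1 < p2 < p3 are the indices of x1, x2, x3 in w and s_len = |s|."""
    cut1 = p1 + 1 + s_len          # end of lambda1 x1 s
    cut2 = p2 + 1 + s_len          # end of lambda2 x2 s
    cut3 = p3 + 1 + s_len          # end of lambda3 x3 s
    return w[:cut1] + w[cut2:cut3] + w[cut1:cut2] + w[cut3:]

def factor_counts(w, max_len):
    """Occurrence counts of all factors of w of length 1..max_len."""
    return Counter(w[i:i + n] for n in range(1, max_len + 1)
                   for i in range(len(w) - n + 1))

def has_repeat(w, n):
    """True if some factor of length n occurs at least twice in w."""
    c = Counter(w[i:i + n] for i in range(len(w) - n + 1))
    return any(v > 1 for v in c.values())

w = "aaabbbabaabbab"
# Step 1 (Case 2): s = "ab"; occurrences a.ab.b (index 1), a.ab.b (index 8), b.ab.$ (index 11).
w1 = exchange(w + "$", 1, 8, 11, 2)[:-1]
# Step 2 (Case 1): s = "ba"; occurrences b.ba.b (index 3), b.ba.b (index 7), a.ba.a (index 9).
w2 = exchange(w1, 3, 7, 9, 2)

print(w1, w2, has_repeat(w2, 4))
```

The counts of all factors of length at most 3 are unchanged by both steps, while the repeated factors of length 4 disappear.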

We observe that one can use Proposition 6.2 to construct for any $m$ a de Bruijn word of order $m$ on a $d$-letter alphabet. Indeed, let $v = \beta_v u$ be a de Bruijn word of order $m - 1$. For any $i \ge 0$ the word $\beta_v u^i$ is $(m - 1)$-uniform. In fact, this is trivial for $i = 0$. Inductively, if one supposes that $\beta_v u^{i-1}$ is $(m - 1)$-uniform, then by Proposition 4.2 one derives that $v u^{i-1} = \beta_v u^i$ is $(m - 1)$-uniform. In particular, $w = \beta_v u^d$ is an $(m - 1)$-uniform word of length $d^m + m - 2$. By Proposition 6.2, starting from $w$ one can effectively construct a word $w'$ which is $(m - 1)$-uniform with $G_{w'} \le m - 1$ and $|w'| = |w|$. Thus, denoting by $x$ the $(m - 1)$-th letter of $w'$, by Proposition 4.3 the word $w'x$ is a de Bruijn word of order $m$.

Theorem 6.1. For any non-negative integer $N$ there exists a uniform word of length $N$ on a $d$-letter alphabet.


Proof. The proof is by induction on the integer $N$. The result is trivial for $N \le d$. Thus, we suppose $N > d$. Let $n$ be the unique positive integer such that

$$d^n + n - 1 \le N \le d^{n+1} + n - 1.$$

By the induction hypothesis, there exists a uniform word $t$ of length $|t| = N - d^n \ge n - 1$ on a $d$-letter alphabet $A$. Let $\lambda$ be the prefix of $t$ of length $n - 1$. Then $t = \lambda s$, with $s \in A^*$ and $|\lambda| = n - 1$. Let $v$ be a de Bruijn word of order $n$ such that $\beta_v = \lambda$ and set $w = vs$. One has

$$|w| = |v| + |s| = d^n + n - 1 + |t| - |\lambda| = d^n + |t| = N.$$

Since the word $t = \beta_v s$ is uniform, by Proposition 4.2 one has that for any $m \le n$ the word $w$ is $m$-uniform. By Proposition 6.2, there exists a word $w' \in A^N$ such that $G_{w'} \le n$ and for all $u \in A^{[n]}$, $\|u\|_{w'} = \|u\|_w$. Therefore, $w'$ is $m$-uniform for all $m \le n$. By Lemma 4.2, $w'$ is a uniform word of length $N$. □

The previous theorem furnishes an effective procedure to construct uniform words of any length on any alphabet. A description of such a procedure is given by the algorithm UniformWord reported in Appendix A. We give here an example to illustrate such a procedure.

Example 6.2. Let us consider the case $d = 2$ and $N = 14$. One has that $n = 3$. A uniform word of length $N - d^n = 6$ is, for instance, $t = aabbab$. For such a choice of $t$, one has $\lambda = aa$ and $s = bbab$. A de Bruijn word $v$ of order 3 such that $\beta_v = aa$ is, for instance, $v = aaabbbabaa$. Thus

$$w = vs = aaabbbabaabbab$$

is $m$-uniform for $m \le 3$. As shown in Example 6.1, starting from $w$ one obtains the uniform word $w' = aaabbababbbaab$ of length 14.
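Each claim of Example 6.2 can be verified directly from the definitions. The following sketch does so with two illustrative helpers, `is_n_uniform` and `is_uniform`; it checks the data of the example and does not implement the algorithm UniformWord itself.

```python
from collections import Counter

def is_n_uniform(w, n, d=2):
    """Occurrence counts of any two words of length n in w differ by at most 1."""
    counts = Counter(w[i:i + n] for i in range(len(w) - n + 1))
    if not counts:                       # n > |w|: every word occurs 0 times
        return True
    lowest = min(counts.values()) if len(counts) == d ** n else 0
    return max(counts.values()) - lowest <= 1

def is_uniform(w, d=2):
    return all(is_n_uniform(w, n, d) for n in range(1, len(w) + 1))

d, N, n = 2, 14, 3
t = "aabbab"                             # uniform word of length N - d^n
lam, s = t[:n - 1], t[n - 1:]            # t = lambda s with |lambda| = n - 1
v = "aaabbbabaa"                         # de Bruijn word of order 3 with border "aa"
w = v + s
w_prime = "aaabbababbbaab"               # result of Example 6.1

print(w, [is_n_uniform(w, m) for m in (1, 2, 3, 4)], is_uniform(w_prime))
```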

By analyzing the algorithm UniformWord one can derive that for $d > 1$ the number of uniform words of length $N$ diverges with $N$.

Indeed, let $n_1 > \cdots > n_{k_N}$ be the distinct values taken by the variable $n$ in the recursive calls of the Algorithm UniformWord. If we fix in an arbitrary way the letters $a_1, \ldots, a_{k_N} \in A$, in Step 4 of the algorithm we can choose the de Bruijn word $v$ having the prefix $\beta_v a_i$, whenever $n = n_i$. Since Algorithm EliminateRepetitions does not modify prefixes of length $n$, we will obtain a word $w$ whose $n_i$-th letter is equal to $a_i$, $1 \le i \le k_N$. In other terms, we are able to construct $d^{k_N}$ distinct uniform words of length $N$. As the number of recursive calls diverges with $N$ and the variable $n$ cannot assume the same value in more than $d$ consecutive calls, one derives that $k_N$ diverges with $N$. Thus we can state the following proposition.


Proposition 6.3. For $d > 1$ one has

$$\lim_{N \to +\infty} D_U(d,N) = +\infty.$$

    7. Uniformity and majorization

In this section we introduce in $A^N$ two different quasi-order relations by using the majorization quasi-order relation defined on suitable vectors containing the numbers of occurrences in any word $w \in A^N$ of words of length $\le N$. We characterize a uniform word of length $N$ as a minimum of these two quasi-order relations. A further characterization can be done in terms of the minimality of functionals belonging to a large class of (strictly) Schur-convex functions naturally associated with the majorization orders.

For any $m > 0$, let $\mathbb{N}^m$ be the set of all vectors

$$x = (x_1, \ldots, x_m)$$

with $x_i \in \mathbb{N}$, $i = 1, \ldots, m$. For $x \in \mathbb{N}^m$, we denote by $x_{[1]}, \ldots, x_{[m]}$ the sequence of the components of $x$ ordered in a non-increasing way, i.e.,

$$x_{[1]} \ge \cdots \ge x_{[m]}.$$

As is well known [15], one can introduce in $\mathbb{N}^m$ the quasi-order relation $\preceq$, called majorization, defined as follows: For any $x, y \in \mathbb{N}^m$ one sets $x \preceq y$ if

$$\sum_{i=1}^{k} x_{[i]} \le \sum_{i=1}^{k} y_{[i]}, \quad \text{for all } k = 1, \ldots, m - 1, \quad \text{and} \quad \sum_{i=1}^{m} x_i = \sum_{i=1}^{m} y_i.$$

If $x \preceq y$ and $y \preceq x$, one writes $x \sim y$. It is clear from the definition that $x \sim y$ if and only if for all $i = 1, \ldots, m$, $x_{[i]} = y_{[i]}$. Hence, $\sim$ is an equivalence relation in $\mathbb{N}^m$. We set $x \prec y$ if $x \preceq y$ and $x \not\sim y$.

A real valued function $\varphi : \mathbb{N}^m \to \mathbb{R}$ is said to be Schur-convex (on $\mathbb{N}^m$) if for all $x, y \in \mathbb{N}^m$ one has:

$$x \preceq y \implies \varphi(x) \le \varphi(y).$$

If, in addition, $\varphi(x) < \varphi(y)$ whenever $x \prec y$, then $\varphi$ is called strictly Schur-convex. We recall [15] that if $q : \mathbb{R}_+ \to \mathbb{R}$ is a continuous strictly convex function, then

$$Q_m(x) = \sum_{i=1}^{m} q(x_i) \tag{26}$$


is strictly Schur-convex. Examples of strictly Schur-convex functions are

$$\frac{1}{2} \sum_{j=1}^{m} x_j (x_j - 1) \quad \text{and} \quad \sum_{j=1}^{m} x_j \log x_j, \tag{27}$$

where we define $0 \log 0 = 0$.

Let $A$ be a $d$-letter alphabet and $n \ge 0$. We fix a total order of $A^n$ (for instance, the lexicographic order induced by a total order of $A$, cf. [13]). With each $w \in A^*$ we associate the vector $w^{(n)}$ with $d^n$ components in $\mathbb{N}$ defined as

$$w^{(n)} = (\|u\|_w)_{u \in A^n},$$

where the order of components of $w^{(n)}$ is induced by the order of $A^n$. Let $N$ be any positive integer and $n \ge 0$. We introduce in $A^N$ the quasi-order $\preceq_n$ defined as: For all $w, v \in A^N$,

$$w \preceq_n v \quad \text{if} \quad w^{(n)} \preceq v^{(n)}.$$

We shall write $w \sim_n v$ if $w^{(n)} \sim v^{(n)}$.

For all $w \in A^*$ and $n \ge 0$, we shall denote simply by $w^{(n)}_j$, $j = 1, \ldots, d^n$, the components of the vector $w^{(n)}$, ordered in a non-increasing way, i.e., $w^{(n)}_j = w^{(n)}_{[j]}$. By Lemma 2.1, for $0 \le n \le N$ one has

$$\sum_{j=1}^{d^n} w^{(n)}_j = N - n + 1. \tag{28}$$

Let us explicitly observe that there can be words in $A^N$ which are not comparable with respect to $\preceq_n$. For instance, in the case of the alphabet $\{a, b\}$, take the words $w = abababa$ and $v = aabbbbb$. One has $v \not\preceq_2 w$ and $w \not\preceq_2 v$. Indeed, $w^{(2)}_1 = w^{(2)}_2 = 3$, $w^{(2)}_3 = w^{(2)}_4 = 0$, and $v^{(2)}_1 = 4$, $v^{(2)}_2 = v^{(2)}_3 = 1$, $v^{(2)}_4 = 0$.
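The incomparability example above can be checked with a direct implementation of majorization on occurrence vectors. This is a minimal sketch; the function names `occurrence_vector` and `majorized_by` are illustrative.

```python
from collections import Counter
from itertools import accumulate, product

def occurrence_vector(w, n, alphabet="ab"):
    """Vector (||u||_w) for u ranging over all words of length n, in lexicographic order."""
    counts = Counter(w[i:i + n] for i in range(len(w) - n + 1))
    return [counts["".join(u)] for u in product(alphabet, repeat=n)]

def majorized_by(x, y):
    """True if x is below y in the majorization quasi-order: equal totals and
    every partial sum of the non-increasing rearrangement of x is at most
    the corresponding partial sum for y."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if sum(xs) != sum(ys):
        return False
    return all(a <= b for a, b in zip(accumulate(xs), accumulate(ys)))

w2 = occurrence_vector("abababa", 2)
v2 = occurrence_vector("aabbbbb", 2)
print(sorted(w2, reverse=True), sorted(v2, reverse=True),
      majorized_by(w2, v2), majorized_by(v2, w2))
```

The partial sums are $3, 6, 6, 6$ for $w$ and $4, 5, 6, 6$ for $v$, so neither vector is below the other.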

Let $q : \mathbb{R}_+ \to \mathbb{R}$ be a continuous strictly convex function. For any positive integer $m$, consider the function $Q_m$ defined by Eq. (26). For any $n \ge 0$, we introduce the function $H_n : A^N \to \mathbb{R}$ defined, for all $w \in A^N$, by

$$H_n(w) = Q_{d^n}\bigl(w^{(n)}\bigr).$$

Since $Q_{d^n}$ is strictly Schur-convex, one has for all $w, v \in A^N$,

$$w \preceq_n v \implies H_n(w) \le H_n(v), \tag{29}$$

whereas $H_n(w) < H_n(v)$ if $w \prec_n v$.

Proposition 7.1. Let $N \ge 1$, $n \ge 0$, and $w \in A^N$. The following conditions are equivalent:

(1) $w$ is $n$-uniform,


(2) $w$ is a minimum of the quasi-order $\preceq_n$,
(3) $H_n$ reaches its minimum in $w$.

Proof. If $n > N$, the result is trivial. Indeed, in this case, all words of $A^N$ are $n$-uniform, for all $u, v \in A^N$ one has $u \sim_n v$, and $H_n$ is constant. Let us then suppose $n \le N$.

First we prove (1) $\Rightarrow$ (2). Let $w \in A^N$ be an $n$-uniform word. By Lemma 4.1 one has

$$\left\lfloor \frac{N - n + 1}{d^n} \right\rfloor \le w^{(n)}_j \le \left\lceil \frac{N - n + 1}{d^n} \right\rceil, \quad 1 \le j \le d^n. \tag{30}$$

Let us prove that for all $v \in A^N$ one has $w \preceq_n v$. Since

$$N - n + 1 = \sum_{k=1}^{d^n} v^{(n)}_k \le d^n v^{(n)}_1,$$

one derives

$$v^{(n)}_1 \ge \frac{N - n + 1}{d^n}.$$

Let $i$ be the greatest integer such that $v^{(n)}_r \ge (N - n + 1)/d^n$, $r = 1, \ldots, i$. Thus, by Eq. (30) for $i < r \le d^n$ one has

$$v^{(n)}_r \le \left\lfloor \frac{N - n + 1}{d^n} \right\rfloor \le w^{(n)}_r. \tag{31}$$

Let $1 \le k \le d^n$. If $k \le i$, then

$$\sum_{s=1}^{k} v^{(n)}_s \ge \sum_{s=1}^{k} \left\lceil \frac{N - n + 1}{d^n} \right\rceil \ge \sum_{s=1}^{k} w^{(n)}_s.$$

If, on the contrary, $k > i$, in view of Eq. (31) one has

$$\sum_{s=1}^{d^n} v^{(n)}_s = \sum_{s=1}^{d^n} w^{(n)}_s \ge \sum_{s=1}^{k} w^{(n)}_s + \sum_{s=k+1}^{d^n} v^{(n)}_s,$$

so that, again,

$$\sum_{s=1}^{k} v^{(n)}_s \ge \sum_{s=1}^{k} w^{(n)}_s.$$

In view of Eq. (28), this implies that in all cases $w \preceq_n v$.

Now, we prove that (2) $\Rightarrow$ (3). Indeed, since $w \preceq_n v$ for all $v \in A^N$, by Eq. (29) one has $H_n(w) \le H_n(v)$, so that $H_n$ reaches its minimum in $w$.


Finally, we prove that (3) $\Rightarrow$ (1). By Theorem 6.1 there exists an $n$-uniform word $v \in A^N$. By implication (1) $\Rightarrow$ (2), the word $v$ is a minimum of $\preceq_n$. Hence, $v \preceq_n w$. If one had $v \prec_n w$, then one would derive $H_n(v) < H_n(w)$ which contradicts the minimality of $H_n(w)$. Thus, the only possibility is $v \sim_n w$ which implies that $w$ is $n$-uniform. □
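For short binary words, the equivalence of conditions (1) and (3) in Proposition 7.1 can be confirmed by exhaustive search. The sketch below takes $q(x) = x(x-1)/2$, the first functional of Eq. (27), so that all values are integers; the function names are illustrative.

```python
from collections import Counter
from itertools import product

def occurrence_vector(w, n, alphabet="ab"):
    counts = Counter(w[i:i + n] for i in range(len(w) - n + 1))
    return [counts["".join(u)] for u in product(alphabet, repeat=n)]

def is_n_uniform(w, n, alphabet="ab"):
    vec = occurrence_vector(w, n, alphabet)
    return max(vec) - min(vec) <= 1

def H_n(w, n, alphabet="ab"):
    """H_n(w) for the strictly convex choice q(x) = x(x - 1)/2."""
    return sum(x * (x - 1) // 2 for x in occurrence_vector(w, n, alphabet))

def minimisers_are_uniform(N, n, alphabet="ab"):
    """True if the words of length N minimising H_n are exactly the n-uniform ones."""
    words = ["".join(t) for t in product(alphabet, repeat=N)]
    best = min(H_n(w, n, alphabet) for w in words)
    minimisers = {w for w in words if H_n(w, n, alphabet) == best}
    uniform = {w for w in words if is_n_uniform(w, n, alphabet)}
    return minimisers == uniform

print(all(minimisers_are_uniform(N, n) for N in range(1, 8) for n in range(1, N + 1)))
```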

We define in $A^N$ the quasi-order relation $\preceq \;=\; \bigcap_{n \ge 0} \preceq_n$. Hence for all $w, v \in A^N$ one has

$$w \preceq v \quad \text{if} \quad \forall n \ge 0,\ w \preceq_n v.$$

Proposition 7.2. Let $N \ge 1$. For all $w, v \in A^N$ if $w \preceq v$, then $G_w \le G_v$.

Proof. Let us set $G_w = n$ and suppose, by contradiction, that $G_v < n$. Thus there exists a word $u \in A^n$ such that $\|u\|_w > 1$. Since all the factors of length $n$ of $v$ are unrepeated in $v$, one has that for all $s \in A^n$, $\|s\|_v \le 1$. Hence, one has $w^{(n)}_1 > v^{(n)}_1$ which contradicts the fact that $w \preceq_n v$. □

We can consider in $A^N$ a further quasi-order relation $\sqsubseteq$ defined as follows. For each $w \in A^N$ we consider the vector

$$w' = (\|u\|_w)_{u \in A^{[N]}}$$

of dimension $m = (d^{N+1} - 1)/(d - 1)$. For any $w, v \in A^N$ we set

$$w \sqsubseteq v \quad \text{if} \quad w' \preceq v',$$

where in the right-hand side of the previous equation $\preceq$ is the majorization relation in the set of vectors of dimension $m$ with components in $\mathbb{N}$. We introduce the function $H : A^N \to \mathbb{R}$ defined for all $w \in A^N$ by

$$H(w) = Q_m(w'),$$

where $Q_m$ is defined by Eq. (26). One easily verifies that for all $w \in A^N$,

$$H(w) = \sum_{n=0}^{N} H_n(w). \tag{32}$$

We remark that

$$w \sqsubseteq v \implies H(w) \le H(v), \tag{33}$$

since $Q_m$ is Schur-convex.

Lemma 7.1. Let $N \ge 1$. For all $w, v \in A^N$ if $v \preceq w$, then $v \sqsubseteq w$.


Proof. Since $v \preceq w$, for all $n \ge 0$ one has $v^{(n)} \preceq w^{(n)}$. By a classical result of the theory of majorization (see [15, Chapter 5, Proposition A.7]) one derives

$$(v^{(0)}, \ldots, v^{(N)}) \preceq (w^{(0)}, \ldots, w^{(N)})$$

which implies $v' \preceq w'$, i.e., $v \sqsubseteq w$. □

Remark 7.1. Let us observe that the converse of the above lemma is not true in general. This is shown by the following example: take $w = aaabaaaa$ and $v = aabbaabb$. One has $v \sqsubseteq w$ but $v \not\preceq w$. Indeed, as one easily checks, $v \not\preceq_4 w$.
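The example of Remark 7.1 can be checked numerically by building, for each word, the vector of the numbers of occurrences of all words of length $0, \ldots, N$ and comparing it with the vectors of a single order. This is a sketch with illustrative function names; `full_vector` concatenates the vectors $w^{(0)}, \ldots, w^{(N)}$, which has the same non-increasing rearrangement as $w'$.

```python
from collections import Counter
from itertools import accumulate, product

def occurrence_vector(w, n, alphabet="ab"):
    counts = Counter(w[i:i + n] for i in range(len(w) - n + 1))
    return [counts["".join(u)] for u in product(alphabet, repeat=n)]

def full_vector(w, alphabet="ab"):
    """Occurrences of all words of length 0..|w| (the empty word occurs |w| + 1 times)."""
    return [c for n in range(len(w) + 1) for c in occurrence_vector(w, n, alphabet)]

def majorized_by(x, y):
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if sum(xs) != sum(ys):
        return False
    return all(a <= b for a, b in zip(accumulate(xs), accumulate(ys)))

w, v = "aaabaaaa", "aabbaabb"
print(majorized_by(full_vector(v), full_vector(w)),
      majorized_by(occurrence_vector(v, 4), occurrence_vector(w, 4)))
```

The factor $aabb$ occurs twice in $v$, whereas all factors of length 4 of $w$ are distinct, which is why the comparison of order 4 fails.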

Proposition 7.3. Let $N \ge 1$ and $w \in A^N$. The following conditions are equivalent:

(1) $w$ is uniform,
(2) $w$ is a minimum of the quasi-order $\preceq$,
(3) $w$ is a minimum of the quasi-order $\sqsubseteq$,
(4) $H$ reaches its minimum in $w$.

Proof. (1) $\Rightarrow$ (2). Since $w$ is uniform, it is $n$-uniform for all $n \ge 0$. By Proposition 7.1, $w$ is a minimum of $\preceq_n$ for all $n \ge 0$ and therefore, it is a minimum for $\preceq$.

(2) $\Rightarrow$ (3), trivial by Lemma 7.1.

(3) $\Rightarrow$ (4). If $w$ is a minimum of the quasi-order $\sqsubseteq$, then by Eq. (33) it is a point of minimum of $H$.

(4) $\Rightarrow$ (1). By Theorem 6.1 there exists a uniform word $v \in A^N$. Since $v$ is $n$-uniform for all $n \ge 0$, by Proposition 7.1 one has

$$H_n(v) \le H_n(w).$$

Since, by hypothesis, $H(v) = \sum_{n=0}^{N} H_n(v) \ge \sum_{n=0}^{N} H_n(w)$, one derives that for all $n$, $0 \le n \le N$, one has $H_n(v) = H_n(w)$. By Proposition 7.1 one concludes that $w$ is $n$-uniform for all $n \ge 0$, i.e., $w$ is uniform. □

Proposition 7.4. Let $w \in A^N$ and $1 \le n < N$. The following conditions are equivalent:

(1) $w = a^N$, with $a \in A$,
(2) $w$ is a maximum of the quasi-order $\preceq_n$,
(3) $H_n$ reaches its maximum in $w$.

Proof. (1) $\Rightarrow$ (2). Let $w = a^N$, with $a \in A$. One has $w^{(n)}_1 = \|a^n\|_w = N - n + 1$ and $w^{(n)}_i = 0$ for $1 < i \le d^n$. Let $v \in A^N$. For $1 \le k \le d^n$ one has

$$\sum_{i=1}^{k} w^{(n)}_i = w^{(n)}_1 = N - n + 1 \ge \sum_{i=1}^{k} v^{(n)}_i.$$

Since for $k = d^n$ the equality holds, one has $v \preceq_n w$.


(2) $\Rightarrow$ (3). Since $v \preceq_n w$ for all $v \in A^N$, by Eq. (29) one has $H_n(v) \le H_n(w)$, so that $H_n$ reaches its maximum in $w$.

(3) $\Rightarrow$ (1). We have already shown that for any $a \in A$, $a^N$ is a maximum for $\preceq_n$. Let $w$ be a word such that $H_n(w)$ is maximum. One has $w \preceq_n a^N$ but one cannot have $w \prec_n a^N$, as this would imply $H_n(w) < H_n(a^N)$ which is a contradiction. Hence, $w \sim_n a^N$. This implies that $w$ has only one factor of length $n$, i.e., it is a power of a letter (cf. Section 2). □

Proposition 7.5. Let $w \in A^N$ with $N \ge 1$. The following conditions are equivalent:

(1) $w = a^N$, with $a \in A$,
(2) $w$ is a maximum of the quasi-order $\preceq$,
(3) $w$ is a maximum of the quasi-order $\sqsubseteq$,
(4) $H$ reaches its maximum in $w$.

Proof. (1) $\Rightarrow$ (2). From the preceding proposition, for any $w \in A^N$ one has $w \preceq_n a^N$, $n = 1, \ldots, N - 1$. Moreover, trivially, $w \sim_n a^N$ for all $n \ge N$ as well as for $n = 0$. Thus, $w \preceq a^N$.

(2) $\Rightarrow$ (3) follows from Lemma 7.1; (3) $\Rightarrow$ (4) follows from Eq. (33).

(4) $\Rightarrow$ (1). The implication is trivial if $N = 1$. Let us suppose $N > 1$ and, by contradiction, that $w$ is not the power of a letter. Let $a \in A$. By Proposition 7.4 one has $H_n(w) < H_n(a^N)$ for $1 \le n < N$. Since, moreover, $H_N(w) = H_N(a^N)$ and $H_0(w) = H_0(a^N)$, by Eq. (32) one derives $H(w) < H(a^N)$, which is a contradiction. □

    8. Entropy, repetitivity, and recurrence

    In the previous section, we have characterizedn-uniform and uniform words by a property of minimality of some functionals which belong to a large class of functions ware strictly Schur-convex functions naturally associated with majorization orders. Isection we wish to interpret these results by specifying some of these functionals. Wobtain an interesting structural information on uniformity of a word expressed in tof ‘entropy,’ ‘repetitivity,’ and ‘recurrence.’ More precisely, we show that a word of gilength is uniform if and only if it maximizes the entropy and if and only if it minimizesrepetitivity. Moreover, a uniform word minimizes the recurrence even though the convis not true in general.

    A Bernoulli distribution of order $m$ is an $m$-vector $p = (p_1, \ldots, p_m)$ of non-negative real numbers such that

    \[ \sum_{i=1}^{m} p_i = 1. \]


    By entropy of $p$ one means the functional $H(p)$ defined by

    \[ H(p) = -\sum_{i=1}^{m} p_i \log p_i. \]

    As is well known [11], if for $1 \leq i \leq m$, $p_i$ is interpreted as the probability of a certain event $E_i$ and one supposes that the events $E_i$ are disjoint (i.e., mutually exclusive), then $H(p)$ measures the average uncertainty in predicting the result of a random experiment described by $p$.

    Let $w \in A^N$. For any $n \geq 0$ one can consider the Bernoulli distribution of order $d^n$ defined by the $d^n$-vector

    \[ \left( \frac{\|u\|_w}{N-n+1} \right)_{u \in A^n} = \frac{1}{N-n+1}\, w(n). \]

    We shall set

    \[ E_n(w) = H\!\left( \frac{w(n)}{N-n+1} \right) \]

    and call $E_n(w)$ the entropy of order $n$ of $w$. For any $w \in A^N$, $E_n(w)$ represents the average uncertainty in predicting the factor which is ‘read’ in the word $w$ of length $N$ when a window of length $n$ is placed over $w$ at random.
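
    As a rough illustration of this definition, the following Python sketch computes $E_n(w)$ by counting the factors seen through a sliding window of length $n$. The function names are hypothetical, the natural logarithm is an assumption (the text leaves the base of $\log$ unspecified), and the convention $0 \log 0 = 0$ is used so that factors which do not occur can simply be skipped.

```python
from collections import Counter
from math import log


def factor_counts(w, n):
    """Map each factor u of length n to its number of occurrences ||u||_w."""
    return Counter(w[i:i + n] for i in range(len(w) - n + 1))


def entropy(p):
    """H(p) = -sum p_i log p_i, skipping zero entries (0 log 0 = 0)."""
    return -sum(x * log(x) for x in p if x > 0)


def entropy_of_order(w, n):
    """E_n(w) for 0 <= n <= len(w): entropy of the length-n window distribution."""
    windows = len(w) - n + 1  # number of window positions, N - n + 1
    return entropy(c / windows for c in factor_counts(w, n).values())
```

    For instance, `entropy_of_order("abab", 1)` is close to $\log 2$, since both letters are equally likely under the window, whereas any power of a letter gives entropy zero at every order.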

    By simple manipulations one has

    \[ E_n(w) = \log(N-n+1) - \frac{1}{N-n+1}\, H_n(w), \tag{34} \]

    where

    \[ H_n(w) = \sum_{u \in A^n} \|u\|_w \log \|u\|_w. \]
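
    The manipulation behind Eq. (34) can be spelled out as follows. This is a sketch for $0 \leq n \leq N$, where the abbreviation $M = N - n + 1$ is introduced here only for readability; it uses the fact that the occurrences of factors of length $n$ in $w$ number exactly $M$, together with the convention $0 \log 0 = 0$.

```latex
\begin{align*}
E_n(w) &= -\sum_{u \in A^n} \frac{\|u\|_w}{M} \log \frac{\|u\|_w}{M} \\
       &= \frac{\log M}{M} \sum_{u \in A^n} \|u\|_w
          - \frac{1}{M} \sum_{u \in A^n} \|u\|_w \log \|u\|_w \\
       &= \log M - \frac{1}{M}\, H_n(w),
          \qquad \text{since } \sum_{u \in A^n} \|u\|_w = M .
\end{align*}
```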

    One can also introduce the vector

    \[ \left( \frac{2\,\|u\|_w}{(N+1)(N+2)} \right)_{u \in A^{[N]}} = \frac{2}{(N+1)(N+2)}\, w', \]

    which is a Bernoulli distribution of order $m = (d^{N+1} - 1)/(d - 1)$. We set

    \[ E(w) = H\!\left( \frac{2}{(N+1)(N+2)}\, w' \right), \]

    and call $E(w)$ the entropy of $w$. The entropy $E(w)$ can be interpreted as the average uncertainty in predicting the factor of $w$ obtained by choosing at random any possible occurrence of a factor.


    One easily derives:

    \[ E(w) = \log \frac{(N+1)(N+2)}{2} - \frac{2}{(N+1)(N+2)}\, H(w), \tag{35} \]

    where

    \[ H(w) = \sum_{n=0}^{N} H_n(w). \tag{36} \]

    By Eqs. (34)–(36), one obtains the following relation:

    \[ E(w) = c + \frac{2}{(N+1)(N+2)} \sum_{n=0}^{N} (N-n+1)\, E_n(w), \tag{37} \]

    where $c$ is a constant given by

    \[ c = \log \frac{(N+1)(N+2)}{2} - \frac{2}{(N+1)(N+2)} \sum_{n=0}^{N} (N-n+1) \log(N-n+1). \]

    If $w = a^N$ with $a \in A$, one has $E_n(a^N) = 0$ for $0 \leq n \leq N$, so that by Eq. (37) one has $c = E(a^N)$.
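
    The relations (35) and (37) can be checked numerically on small words. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, natural logarithms are used, and $E_n$ is evaluated through Eq. (34). It computes $E(w)$ in three ways (from the definition, from Eq. (35), and from Eq. (37)), which should agree up to rounding.

```python
from collections import Counter
from math import log, isclose


def counts(w, n):
    """Occurrence counts ||u||_w of the factors u of length n of w."""
    return Counter(w[i:i + n] for i in range(len(w) - n + 1))


def H_n(w, n):
    """H_n(w) = sum of ||u||_w log ||u||_w over the factors of length n."""
    return sum(c * log(c) for c in counts(w, n).values())


def E_n(w, n):
    """Entropy of order n, evaluated through Eq. (34)."""
    m = len(w) - n + 1
    return log(m) - H_n(w, n) / m


def E_direct(w):
    """E(w) from the definition: entropy over all occurrences of all factors."""
    N = len(w)
    total = (N + 1) * (N + 2) // 2  # number of factor occurrences
    return -sum((c / total) * log(c / total)
                for n in range(N + 1) for c in counts(w, n).values())


def E_via_35(w):
    """E(w) through Eqs. (35) and (36)."""
    N = len(w)
    total = (N + 1) * (N + 2) / 2
    return log(total) - sum(H_n(w, n) for n in range(N + 1)) / total


def E_via_37(w):
    """E(w) through Eq. (37), with the constant c as given in the text."""
    N = len(w)
    total = (N + 1) * (N + 2) / 2
    c = log(total) - sum((N - n + 1) * log(N - n + 1) for n in range(N + 1)) / total
    return c + sum((N - n + 1) * E_n(w, n) for n in range(N + 1)) / total
```

    For a power of a letter such as `"aaaa"`, every `E_n` vanishes up to rounding, so `E_via_37` reduces to the constant $c$, in agreement with $c = E(a^N)$.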

    Now let us consider for any word $w \in A^N$ the repetitivity $K_n(w)$ of order $n$ of $w$ and the total repetitivity $K(w)$ defined by Eqs. (2) and (3), respectively.

    Since functions defined by Eq. (27) are strictly Schur-convex, from Eq. (34) and Proposition 7.1 one derives:

    Proposition 8.1. Let $N \geq 1$, $n \geq 0$, and $w \in A^N$. The following conditions are equivalent:

    (1) $w$ is $n$-uniform,
    (2) $w$ maximizes the entropy of order $n$,
    (3) $w$ minimizes the repetitivity of order $n$.

    From Eq. (35) and Proposition 7.3 one derives:

    Proposition 8.2. Let $w \in A^N$ with $N \geq 1$. The following conditions are equivalent:

    (1) $w$ is uniform,
    (2) $w$ maximizes the entropy,
    (3) $w$ minimizes the total repetitivity.

    By Propositions 7.4 and 7.5 one derives also that the entropies $E_n$, $1 \leq n < N$, and $E$ reach their minimal values in $w$ if and only if $w = a^N$ with $a \in A$. Moreover, the repetitivity of order $n$, $1 \leq n < N$, and the total repetitivity reach their maximal values in $w$ if and only if $w = a^N$ with $a \in A$.
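
    On short words these statements can be checked exhaustively. The following Python sketch is a brute-force illustration, not part of the paper's method: the function names are hypothetical and natural logarithms are assumed. By Eq. (35), maximizing $E(w)$ over $A^N$ amounts to minimizing $H(w)$; since $\exp H(w)$ is the product of $c^c$ over all occurrence counts $c$, the comparison can be done with exact integers.

```python
from collections import Counter
from itertools import product


def counts(w, n):
    """Occurrence counts of the factors of length n of w."""
    return Counter(w[i:i + n] for i in range(len(w) - n + 1))


def is_uniform(w, alphabet):
    """True if, for every length n, the occurrence counts of any two
    words of length n differ by at most 1."""
    d = len(alphabet)
    for n in range(1, len(w) + 1):
        c = counts(w, n)
        # A word of length n that never occurs has count 0.
        lowest = min(c.values()) if len(c) == d ** n else 0
        if max(c.values()) - lowest > 1:
            return False
    return True


def exp_H(w):
    """exp(H(w)) as an exact integer: product of c**c over all counts c."""
    result = 1
    for n in range(len(w) + 1):
        for c in counts(w, n).values():
            result *= c ** c
    return result


def check(N, alphabet="ab"):
    """Compare the extremal words of E on A^N with uniform words and powers."""
    words = ["".join(t) for t in product(alphabet, repeat=N)]
    value = {w: exp_H(w) for w in words}
    lowest, highest = min(value.values()), max(value.values())
    max_entropy = {w for w in words if value[w] == lowest}   # H minimal
    min_entropy = {w for w in words if value[w] == highest}  # H maximal
    uniform = {w for w in words if is_uniform(w, alphabet)}
    powers = {a * N for a in alphabet}
    return max_entropy == uniform, min_entropy == powers
```

    For $N = 4$ over $\{a, b\}$, for example, the words of maximal entropy found this way are $aabb$, $abba$, $baab$, and $bbaa$, which are exactly the uniform words of that length.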

    Now let us consider, for any word $w \in A^N$, the recurrence $P_n(w)$ of order $n$ and the total recurrence $P(w)$ as defined by Eqs. (6) and (7), respectively.

    Proposition 8.3. A word $w \in A^N$ with $N \geq 1$ is $G_w$-full if and only if

    \[ P(w) = \min_{v \in A^N} P(v). \tag{38} \]

    Proof. Let $w \in A^N$ be $G_w$-full. By Proposition 3.2, for any $v \in A^N$ and all $n \geq 0$ one has $P_n(v) \geq P_n(w)$, so that $P(v) \geq P(w)$ and Eq. (38) is satisfied.

    Conversely, suppose that Eq. (38) is satisfied. By Theorem 6.1 there exists a uniform word $v \in A^N$. By Proposition 4.1, $v$ is $G_v$-full. Hence, by Proposition 3.2, one has

    \[ P_n(w) \geq P_n(v) \]

    for all $n \geq 0$ and, by Eq. (38),

    \[ P(v) \geq P(w). \]

    This can occur if and only if, for all $n \geq 0$, $P_n(w) = P_n(v)$. By Proposition 3.2, $w$ is $G_w$-full. □

    Since a uniform word $w \in A^N$ is $G_w$-full, from the preceding proposition one derives that a uniform word minimizes the total recurrence. However, the converse is not generally true. Indeed, there exist $G_w$-full words which are not uniform (cf. Section 4).

    A further consequence of Proposition 8.3 is the following corollary, which was proved in the case $d = 2$ in [16] with a different technique.

    Corollary 8.1. Let $d > 1$. For any $N \geq 1$,

    \[ \max_{v \in A^N} \operatorname{Card}(\operatorname{Fact}(v)) = \frac{d^{G_N+1} - 1}{d - 1} + \binom{N - G_N + 1}{2}. \]

    Proof. By the preceding proposition and Eq. (9), a word $w \in A^N$ is $G_w$-full if and only if

    \[ \operatorname{Card}(\operatorname{Fact}(w)) = \max_{v \in A^N} \operatorname{Card}(\operatorname{Fact}(v)). \]

    If $w$ is $G_w$-full, then by Proposition 3.1, $G_w = G_N$. Moreover, $\lambda_w(n) = d^n$ for $0 \leq n \leq G_N$ and $\lambda_w(n) = N - n + 1$ for $G_N + 1 \leq n \leq N$. Therefore,

    \[ \operatorname{Card}(\operatorname{Fact}(w)) = \sum_{n=0}^{N} \lambda_w(n) = \frac{d^{G_N+1} - 1}{d - 1} + \binom{N - G_N + 1}{2}. \]

    Since by Theorem 6.1 for any $N \geq 1$ there exists a word $w \in A^N$ which is uniform and consequently $G_w$-full, the assertion is proved. □
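
    The value in Corollary 8.1 can be compared with an exhaustive search on short words. The sketch below is illustrative only and rests on two assumptions. First, by the description of $\lambda_w$ in the proof, the maximum equals $\sum_{n=0}^{N} \min(d^n, N-n+1)$. Second, since the definition of $G_N$ lies outside this section, the hypothetical variable `g` stands in for it and is taken as the largest $n$ with $d^n \leq N - n + 1$; the closed form has the same value whenever the two terms of the minimum coincide at the switching point. All function names are hypothetical.

```python
from itertools import product
from math import comb


def distinct_factors(w):
    """Card(Fact(w)): number of distinct factors of w, empty word included."""
    return len({w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)})


def max_by_search(N, alphabet="ab"):
    """Maximum of Card(Fact(v)) over all words v of length N (brute force)."""
    return max(distinct_factors("".join(t)) for t in product(alphabet, repeat=N))


def max_by_sum(N, d):
    """Sum over n of min(d**n, N - n + 1), following the proof."""
    return sum(min(d ** n, N - n + 1) for n in range(N + 1))


def max_by_closed_form(N, d):
    """Closed form of Corollary 8.1, with g standing in for G_N (assumption)."""
    g = max(n for n in range(N + 1) if d ** n <= N - n + 1)
    return (d ** (g + 1) - 1) // (d - 1) + comb(N - g + 1, 2)
```

    For $N = 3$ and $d = 2$ all three functions return 6, attained for instance by $aab$, whose factors are the empty word, $a$, $b$, $aa$, $ab$, and $aab$.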

    Appendix A

    Algorithm EliminateRepetitions

    Requires: an integer $n > 0$ and an $n$-uniform word $w \in A^*$ of length $|w| \leq d^{n+1} + n - 1$.
    Ensures: for all $u \in A^{[n]}$, $\|u\|_w$ remains unchanged and $G_w \leq n$.

    Step 1. search for a pair of distinct occurrences $(\lambda_i, \mu_i)$, $i = 1, 2$, of a same factor $f$ of $w$ of length $n + 1$; if such a pair is not found, then terminate;
    Step 2. write $f = xsy$, with $x, y$ letters;
    Step 3. find a letter $x' \in A$ such that $x'sy$ is not a factor of $w$;
    Step 4. if $x's$ is not a suffix of $w$, then find a letter $y'$ such that $\|x'sy'\|_w > \|xsy'\|_w$ and an occurrence $(\lambda_3, \mu_3)$ of $x'sy'$; else define $\lambda_3$ by $w = \lambda_3 x's$;
    Step 5. factorize $w = w_1 w_2 w_3 w_4$ with $\{w_1, w_1 w_2, w_1 w_2 w_3\} = \{\lambda_1 xs, \lambda_2 xs, \lambda_3 x's\}$;
    Step 6. replace $w$ by $w_1 w_3 w_2 w_4$;
    Step 7. go to Step 1.
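
    Step 1 is the only search of the algorithm that does not depend on earlier choices, and it can be sketched in a few lines of Python. This is an illustration with a hypothetical function name, not the paper's implementation: an occurrence of $f$ is represented as the pair $(\lambda, \mu)$ with $w = \lambda f \mu$, and the first repeated factor found in a left-to-right scan is returned.

```python
def find_repeated_factor(w, length):
    """Return ((l1, m1), (l2, m2), f) for two distinct occurrences of a same
    factor f of the given length in w, or None if no such pair exists."""
    first_seen = {}  # factor -> start position of its first occurrence
    for i in range(len(w) - length + 1):
        f = w[i:i + length]
        if f in first_seen:
            j = first_seen[f]
            # Each occurrence is the pair (prefix before f, suffix after f).
            return (w[:j], w[j + length:]), (w[:i], w[i + length:]), f
        first_seen[f] = i
    return None
```

    In Step 1 the function would be called with `length = n + 1`; a return value of `None` corresponds to the termination case.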

    Algorithm UniformWord

    Requires: an integer $N > 0$.
    Ensures: $w$ is a uniform word of length $N$ on the alphabet $A$.

    Step 1. if $N \leq d$, then set $w$ equal to the concatenation of $N$ distinct letters and terminate;
    Step 2. set $n = \max\{k \in \mathbb{N} \mid d^k + k - 1 \leq N\}$;
    Step 3. by recursion, produce a uniform word $w$ of length $N - d^n$;
    Step 4. produce a de Bruijn word $v$ of order $n$ such that its longest border $\beta_v$ is equal to the prefix of $w$ of length $n - 1$;
    Step 5. set $v = u\beta_v$ and replace $w$ by $uw$;
    Step 6. run algorithm EliminateRepetitions.
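
    The recursive structure of Steps 1 to 3 can be traced without constructing any word. The Python sketch below, with hypothetical function names, computes the order $n$ chosen in Step 2 and lists the successive pairs $(N, n)$ visited by the recursion until the base case $N \leq d$ of Step 1 is reached. It does not attempt Steps 4 to 6, which need a de Bruijn word with a prescribed border.

```python
def order_for_length(N, d):
    """Step 2: the largest k such that d**k + k - 1 <= N."""
    k = 0
    # k + 1 is admissible when d**(k+1) + (k+1) - 1 <= N.
    while d ** (k + 1) + k <= N:
        k += 1
    return k


def recursion_schedule(N, d):
    """Return ([(N_0, n_0), (N_1, n_1), ...], base_length) for UniformWord."""
    steps = []
    while N > d:               # Step 1 handles N <= d directly
        n = order_for_length(N, d)
        steps.append((N, n))
        N -= d ** n            # Step 3 recurses on length N - d**n
    return steps, N
```

    For $N = 20$ and $d = 2$ the schedule is $(20, 4)$, then $(4, 1)$, ending with a base word of length 2.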

    References

    [1] A. de Luca, On the combinatorics of finite words, Theoret. Comput. Sci. 218 (1999) 13–39.
    [2] A. Carpi, A. de Luca, Words and special factors, Theoret. Comput. Sci. 259 (2001) 145–182.
    [3] A. Carpi, A. de Luca, On the distribution of characteristic parameters of words, Theoret. Informatics Appl. 36 (2002) 67–96.
    [4] A. Carpi, A. de Luca, Full and uniform sequences, Proc. Steklov Inst. Math., in press.
    [5] A. Carpi, A. de Luca, S. Varricchio, Words, univalent factors, and boxes, Acta Inform. 38 (2002) 409–436.
    [6] M. Crochemore, C. Hancart, Automata for matching patterns, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, vol. 2, Springer, Berlin, 1997, pp. 399–462.
    [7] N.G. de Bruijn, A combinatorial problem, Nederl. Akad. Wetensch. Proc. 49 (1946) 758–764.
    [8] H. Fredricksen, A survey of full length nonlinear shift register cycle algorithms, SIAM Rev. 24 (1982) 195–221.
    [9] R.L. Graham, D.E. Knuth, O. Patashnik, Concrete Mathematics. A Foundation for Computer Science, 2nd edition, Addison–Wesley, Reading, MA, 1994.
    [10] J.D. Kececioglu, E.W. Myers, Combinatorial algorithms for DNA sequence assembly, Algorithmica 13 (1995) 7–51.
    [11] A.I. Khinchin, Mathematical Foundations of Information Theory, Dover, New York, 1957.
    [12] A. Lempel, J. Ziv, Compression of individual sequences via variable-rate coding, IEEE Trans. Inform. Theory 24 (1978) 530–536.
    [13] M. Lothaire, Combinatorics on Words, 2nd edition, in: Cambridge Mathematical Library, Cambridge Univ. Press, Cambridge, UK, 1997.
    [14] M. Lothaire, Algebraic Combinatorics on Words, Cambridge Univ. Press, Cambridge, UK, 2002.
    [15] A.W. Marshall, I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, NY, 1979.
    [16] J. Shallit, On the maximum number of distinct factors of a binary string, Graphs Combin. 9 (1993) 197–200.