Statistical Properties of the Entropy Function of a Random Partition
Anna Movsheva
Contents
1 Introduction
  1.1 Background
  1.2 Research Problem
  1.3 Hypothesis
    1.3.1 General properties of θ(p, x)
    1.3.2 Functions µ(p) and σ(p)
    1.3.3 The Generating Function of Moments
    1.3.4 Discussion of Conjecture 1.4
  1.4 Significance
2 Methods
3 Results
  3.1 Computation of βr and γr
  3.2 Bell Trials
4 Discussion and Conclusion
Abstract
It is well known that living organisms are open self-organizing thermodynamic systems with low entropy. An estimate for the number of subsystems with low entropy would give a rough guess about the number of self-organizing subsystems that exist in a closed system S. I study the mathematical properties of a model in which a finite set X with a probability distribution {p_x | x ∈ X} encodes the set of states of the system S. A partition X = ∐_{i=1}^l Y_i in this model represents a subsystem with the set of probabilities {p(Y_i) = ∑_{x∈Y_i} p_x}. In this paper I study the entropy function H(p, Y) = −∑_i p(Y_i) ln p(Y_i) of a random partition Y. In particular, I study the counting function Θ(p, x) = #{Y | H(p, Y) ≤ x}. Using computer simulations, I give evidence that the normalized function θ(p, x) = Θ(p, x)/Θ(p, H(p, X)) can asymptotically be approximated by the cumulative Gauss distribution (1/(σ(p)√(2π))) ∫_{−∞}^x exp(−(t − µ(p))²/(2σ(p)²)) dt. I state my findings in the form of falsifiable conjectures, some of which I partly prove. The asymptotics explain a strong correlation between µ(p), the average entropy of a random partition of X, and the entropy H(p, X). Since the quantity µ(p) is usually available in practice, I can give an estimate for H(p, X) when it is not directly computable.
1 Introduction
1.1 Background
One of the main problems of theoretical biology and theoretical physics is to reconcile the theory of evolution with statistical mechanics and thermodynamics. Ilya Prigogine was the first to make fundamental contributions to the solution of this problem. He advocated that living organisms are open self-organizing thermodynamic systems with low entropy. These open systems are part of a large closed system S. Since I am interested in open self-organizing thermodynamic systems, it is important to know the number of subsystems within S that have low entropy. In my work I studied this question from the mathematical point of view. In my simplified approach the configuration space of S was a finite set X with a probability distribution. In my interpretation a subsystem was a partition of X. I studied a function that, for a given x, counts the number of partitions of X whose entropy does not exceed x. My approach is rather general because any configuration space can be approximated by a sufficiently large but finite set.
The controversy between classical biology and physics has a long history. It revolves around the paradox that physical processes are reversible while biological processes are not. Boltzmann, in the process of working on this dilemma, laid the foundation of statistical physics. He put forward the notion of entropy, which characterizes the degree of disorder in a statistical system. The second law of thermodynamics in the formulation of Boltzmann states that the entropy of a closed system cannot decrease, which makes time in a statistical system irreversible. The solution of the problem of the irreversibility of time did not completely eliminate the contradiction. The second law of thermodynamics seems to forbid the long-term existence of organized systems, such as living organisms. Schrödinger in his book [19] (Chapter 6) pointed out that the entropy can go down in an open system, that is, a system that can exchange mass and energy with its surroundings. Prigogine in his groundbreaking works [15, 14, 16] showed that self-organization (a decrease of entropy) can be achieved dynamically. His discovery laid the foundation of non-equilibrium statistical mechanics. The most interesting self-organizing systems exist far from equilibrium and are non-static by their nature.
There is a vast literature on self-organization (see e.g.[16, 10, 9, 12] and the references therein).
Current research is focused on the detailed study of individual examples of self-organization and is very successful (see e.g. [3]). In this work I changed the perspective. My motivating problem was rather general: to estimate the total number of self-organizing subsystems in a thermodynamically closed system. Self-organizing subsystems are the most interesting specimens of the class of subsystems with low entropy. This motivates my interest in estimating the number of subsystems with low entropy. Knowing this number, the number of self-organizing subsystems can be assessed. Posed in such generality the problem looks very hard, so I made a series of simplifications that let me progress in this direction. Ashby in [1] argued that any system S can be thought of as a "machine". His idea is that the configuration space of S can be approximated by a set or an alphabet X, and the dynamics is given by a transition rule TX : X → X. A homomorphism between machines S = (X, TX) and Q = (Z, TZ) is a map ψ : X → Z such that ψTX = TZψ. Homomorphisms are useful in the analysis of complicated systems. (See [1] for details.) A submachine, according to [1], is a subset X′ ⊂ X that is invariant with respect to TX. I never use this definition in this paper. In my definition a submachine is a homomorphic image ψ : (X, TX) → (Z, TZ). For example, if a machine (X, T) consists of N non-interacting submachines (X1, T1), . . . , (XN, TN), then X = X1 × · · · × XN and T = T1 × · · · × TN. The projections ψi(x1, . . . , xN) = xi are homomorphisms of machines. This reflects the fact that the configuration space of a union of non-interacting systems is a product (not a union) of the configuration spaces of the components.
Definition 1.1. A collection of subsets Y = {Y_z | z ∈ Z} such that Y_z ∩ Y_{z′} = ∅ for z ≠ z′ and ⋃_{z∈Z} Y_z = X is a partition of the finite set X, r = #X. Let k_z be the cardinality #Y_z. In this paper I shall use the notation X = ∐_z Y_z.
Any homomorphism ψ : (X, TX) → (Z, TZ) defines a partition of X with Y_z equal to {x ∈ X | ψ(x) = z}. In fact, up to relabeling the elements of Z, the homomorphism is the same as a partition. This also explains why I am interested in counting partitions. Ashby in [1] argued that a machine (X, T) is a limiting case of a more realistic Markov process, in which the deterministic transition rule x → T(x) gets replaced by random transitions. The dynamics of the process is completely determined by the probabilities {p_{x′,x} | x, x′ ∈ X} of passing from the state x to the state x′ and the initial probability distribution {p_x | x ∈ X}. Markov processes have been studied in information theory, developed originally in [20].
Yet there is still another way to interpret the quantities that I would like to compute. A submachine can also be interpreted as a scientific device. This can be understood with the example of a hurricane on Jupiter [2]. You can analyze the hurricane in a multitude of ways: visually through the lenses of a telescope, by recording the fluctuations of winds with a probe, or by capturing the fluctuations of the magnetic field around the hurricane. Every method of analysis (device) yields statistical data, which in turn yields the respective entropy. If (X, p) is a space of states of the hurricane, then ψ : X → Z is a function whose set of values is the set of readings of the scientific device. It automatically leads to a partition of X, as was explained above. The list of known scientific methods in planetary science is enormous [13], and any new additional method contributes something to the knowledge. Yet the full understanding of the subject would only be possible if I used all possible methods (ψs). This, however, is not going to happen in planetary science in the near future. The reason is that the set of states X of the Jupiter atmosphere is colossal, which makes the set of all conceivable methods of its study (devices) even bigger.
Still, imagine that all the mentioned troubles were nonexistent. It would be interesting to count the number of scientific devices that yield statistical data about the hurricane with entropy no greater than a given value. It would also be interesting to know their average entropy. This is a dream. I did just that in my oversimplified model.
1.2 Research Problem
In the following, the set X will be {1, . . . , r}. Let p be a probability distribution on X, that is
a collection of numbers pi ≥ 0 such that∑r
i=1 pi = 1. The array p = (p1, . . . , pr) is said to be a
probability vector. The probability of Yi in the partition X =∐Yi is
p(Yi) =∑j∈Yi
pj .
Definition 1.2. Entropy of a partition Y , H(p, Y ) is calculated by the expression −∑l
i=0 p(Yi) ln p(Yi).
In this definition the function x lnx is extended to x = 0 by continuity 0 ln 0 = 0.
Here are some examples of entropies: H(p, Ymax) = −∑r
i=1 pi ln pi for Ymax = {{1}, . . . , {r}},
H(p, Ymin) = 0 for Ymin = {{1, . . . , r}}. One of the properties of the entropy function (see [6]) is
that

H(p, Y_min) ≤ H(p, Y) ≤ H(p, Y_max) for any Y ∈ P_r   (1)

where P_r denotes the set of all partitions of X = {1, . . . , r}.
It is clear from the previous discussion that Θ(p, x) = #{Y ∈ Pr|H(p, Y ) ≤ x} is identical to
the function defined in the abstract.
The Bell number B_r ([22], [17]) is the cardinality of P_r. The value Θ(p, H(p, Y_max)) = Θ(p, H(p, id)), thanks to (1), coincides with B_r. From this I conclude that

θ(p, x) = #{Y ∈ P_r | H(p, Y) ≤ x} / B_r

is the function defined in the abstract.
My main goal is to find a simple approximation to θ(p, x).
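For small r the function θ(p, x) can be computed directly by exhaustive enumeration of all set partitions. The following Python sketch is my own illustration (the paper's actual computations were done in Mathematica, and the function names here are mine):

```python
from math import log

def partitions(elements):
    """Yield every set partition of `elements` (there are Bell-number many)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        # insert `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or put it into a new singleton block
        yield [[first]] + smaller

def entropy(p, blocks):
    """H(p, Y) = -sum_i p(Y_i) ln p(Y_i), with 0 ln 0 = 0."""
    h = 0.0
    for block in blocks:
        q = sum(p[j] for j in block)
        if q > 0:
            h -= q * log(q)
    return h

def theta(p, x):
    """theta(p, x) = #{Y : H(p, Y) <= x} / B_r by exhaustive enumeration."""
    all_parts = list(partitions(list(range(len(p)))))
    return sum(entropy(p, y) <= x for y in all_parts) / len(all_parts)
```

For instance, θ((1/2, 1/2), 0) = 1/2, since only Y_min has entropy 0 among the B_2 = 2 partitions.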
1.3 Hypothesis
In this section I will formulate the conjectures that I obtained with the computer algebra system Mathematica [11].
Remark 1.3. I equipped the set Pr with the probability distribution P such that P(Y ) for Y ∈ Pr
is equal to 1/Br. The value of the function θ(p, x) is the probability that a random partition Y has
the entropy ≤ x. This explains the adjective “random” in the title of the paper.
In order to state the main result I need to set up some notation:

p[k] = (p_1, . . . , p_r, 0, . . . , 0)   (k zeros appended)   (2)

where p = (p_1, . . . , p_r) is a probability vector. From the set of moments of the entropy of a random partition

E(H^l(p, Y)) = (1/B_r) ∑_{Y∈P_r} H^l(p, Y)   (3)

I will use the first two to define the average µ(p) = E(H(p, Y)) and the standard deviation σ(p) = √(E(H(p, Y)²) − E(H(p, Y))²).
Conjecture 1.4. Let p be a probability distribution on {1, . . . , r}. Then

lim_{k→∞} ( E(H^l(p[k], Y)) − (1/(σ√(2π))) ∫_{−∞}^{∞} x^l e^{−(x−µ)²/(2σ²)} dx ) = 0

with µ = µ(p[k]), σ = σ(p[k]), for any integer l ≥ 0.
Practically this means that the cumulative normal distribution

Erf(x, µ, σ) = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(t−µ)²/(2σ²)} dt

with µ = µ(p[k]), σ = σ(p[k]) makes a good approximation to θ(p[k], x) for large k.
The initial study of the function θ(p, x) has been done with the help of Mathematica. The software can effectively compute the relevant quantities for a set X whose cardinality does not exceed ten.
1.3.1 General properties of θ(p, x)
The plots of some typical graphs are presented in Figure 1.1. These were made with the help of Mathematica.

Figure 1.1: Graphs of θ(p, x), θ(q, x).
The continuous line on the graph corresponds to θ(p, x) with
p = (0.082, 0.244, 0.221, 0.093, 0.052, 0.094, 0.079, 0.130)
The step function corresponds to q = (1/8, . . . , 1/8). Large steps are common for θ(q, x) when q has
symmetries. A symmetry of q is a permutation τ of X such that q_{τ(x)} = q_x for all x ∈ X. Indeed, if I take a symmetry and apply it to a partition, I get another partition with the same entropy. In this way I can produce many partitions with equal entropies; hence the high steps in the graph.
The effect of the operation p → p[1] (see (2)) on θ(p, x) is surprising. Here are the typical graphs:
Figure 1.2: Graphs of θ(p, x), θ(p[1], x), θ(p[2], x) for some randomly chosen p = (p1, . . . , p6).
The reader can see that the graphs have the same bending patterns. Also, the graphs lie one over the other. This led me to put forth a conjecture that has passed multiple numerical tests.
Conjecture 1.5. For any p I have
θ(p, x) ≥ θ(p[1], x)
A procedure that plots θ(p, x) is hungry for computer memory. This is why it is worthwhile to
find a function that makes a good approximation. I have already mentioned in the introduction
that Erf(x, µ(p), σ(p)) approximates θ(p, x) well. For example, if
p = (0.138, 0.124, 0.042, 0.106, 0.081, 0.131, 0.088, 0.138, 0.154), (4)
the picture below indicates a good agreement of graphs.
Figure 1.3: Erf(x, µ(p), σ(p)) (red) vs. θ(p, x) (blue), with p as in (4).
The reader will find more precise relations between Erf and θ in the following sections.
1.3.2 Functions µ(p) and σ(p)
The good agreement of the graphs Erf(x, µ(p), σ(p)) and θ(p, x) raises the question of a detailed analysis of the functions µ(p) and σ(p). It turns out that quantities more manageable than µ(p) are

β(p) = H(p, Y_max) − µ(p),   γ(p) = H(p, Y_max)/µ(p)   (5)

The inequality (1) implies that µ(p) ≤ H(p, Y_max), so β(p) ≥ 0 and γ(p) ≥ 1. Evaluation of the denominator of γ(p) with formula (3) requires intensive computing. On my slow machine I used the Monte-Carlo approximation [8]

µ(p) ≈ (1/k) ∑_{i=1}^k H(p, Y^i)

where the Y^i are independent random partitions. Below are the graphs of β(p1, p2, p3) and γ(p1, p2, p3) plotted in Mathematica. The reader can distinctly see one maximum in the center corresponding to p = (1/3, 1/3, 1/3).
Figure 1.4: The plot of β(p1, p2, 1− p1 − p2) Figure 1.5: The plot of γ(p1, p2, 1− p1 − p2)
A closer look at the plot shows that γ(p1, p2, p3) is not a concave function.
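The Monte-Carlo estimate of µ(p) needs uniform random partitions, which are not entirely trivial to sample. One convenient scheme (my Python sketch, not the paper's Mathematica code; the urn construction is the classical one due to Stam) draws the number of urns M with probability proportional to M^r/M!, drops each of the r elements into a uniform urn, and keeps the nonempty urns:

```python
import random
from math import log, factorial

def random_partition(r, rng):
    """Uniform random set partition of {0, ..., r-1} via Stam's urn scheme."""
    # weights of the urn-count distribution: P(M = m) proportional to m^r / m!
    weights, m = [], 1
    while True:
        w = m ** r / factorial(m)
        weights.append(w)
        if m > r and w < 1e-12 * max(weights):
            break  # remaining tail is negligible
        m += 1
    u = rng.random() * sum(weights)
    M, acc = len(weights), 0.0
    for i, w in enumerate(weights, start=1):
        acc += w
        if u <= acc:
            M = i
            break
    urns = {}
    for elem in range(r):  # each element picks a uniform urn
        urns.setdefault(rng.randrange(M), []).append(elem)
    return list(urns.values())

def mu_monte_carlo(p, n_samples, seed=0):
    """Monte-Carlo estimate of mu(p) = E(H(p, Y)) over uniform random partitions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        for block in random_partition(len(p), rng):
            q = sum(p[j] for j in block)
            if q > 0:
                total -= q * log(q)
    return total / n_samples
```

For p = h_3 the exact average entropy is (3(ln 3 − (2/3) ln 2) + ln 3)/5 ≈ 0.6016, and the estimate converges to it as the number of samples grows.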
In the following, h_r stands for the probability vector (1/r, . . . , 1/r).
I came up with a conjecture, which has been numerically tested for r ≤ 9:
Conjecture 1.6. The function γ(p1, . . . , pr) can be extended by continuity to all distributions p.
In this bigger domain it satisfies
1 ≤ γ(p) ≤ γ(h_r) := γ_r.   (6)
Likewise the function β satisfies
0 ≤ β(p) ≤ β(h_r) := β_r.   (7)
The reader should consult the sections below for alternative ways of computing βr and γr.
The following table contains an initial segment of the sequence of {γr}.
Table 1: Values of γr.
r    2  3      4      5      6      7      8      9      ...  100    ...  1000
γr   2  1.826  1.739  1.691  1.659  1.635  1.617  1.602  ...  1.426  ...  1.341
I see that it is a decreasing sequence. Extensive computer tests have led me to the following conjecture.
Conjecture 1.7. The sequence {γr} satisfies γr ≥ γr+1 and lim_{r→∞} γr = 1.
The limit statement is proved in Proposition 3.6.
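For small r the extremal value γ_r = γ(h_r) = ln r / µ(h_r) can be checked by exhaustive enumeration. A short Python sketch (my own re-implementation, not the paper's code) reproduces the first entries of Table 1:

```python
from math import log

def partitions(elements):
    """Yield every set partition of `elements`."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def gamma_r(r):
    """gamma_r = H(h_r, Y_max) / mu(h_r) = ln r / (average entropy over P_r)."""
    total, count = 0.0, 0
    for blocks in partitions(list(range(r))):
        # entropy of a partition of the uniform distribution h_r
        total += -sum((len(b) / r) * log(len(b) / r) for b in blocks)
        count += 1
    return log(r) / (total / count)
```

gamma_r(2) = 2 and gamma_r(3) ≈ 1.826, matching Table 1.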
Corollary 1.8. lim_{t→∞} γ(p[t]) = 1.

Proof. From Conjecture 1.6 I conclude that 1 ≤ γ(p[t]) ≤ γ_{r+t}. Since lim_{t→∞} γ_{r+t} = 1 by Conjecture 1.7, lim_{t→∞} γ(p[t]) = 1.
Table 2: Values of βr.
r    6         7         8         9         ...  100     ...
βr   0.711731  0.756053  0.793492  0.825835  ...  1.3943  ...
Conjecture 1.9. The sequence {βr} satisfies βr ≤ βr+1 and lim_{r→∞} βr = ∞.
The situation with the standard deviation σ(p) is a bit more complicated. Here is a graph of
σ(p1, p2, p3).
Figure 1.6: Three-dimensional view of the graph of standard deviation σ(p1, p2, p3) for θ(p, x).
The reader can clearly see four local maxima. The function σ(p1, p2, p3) is symmetric. The maxima correspond to the point (1/3, 1/3, 1/3) and the permutations of (1/2, 1/2, 0). This led me to think that the local maxima of σ(p1, . . . , pr) are permutations of q_{k,r} = h_k[r − k], k ≤ r. I tabulated the values of σ(q_{k,r}) for small k and r in the table below.
Table 3: Values of σ(qk,r).
k\r  3       4       5       6       7       8       9
2    0.3396  0.3268  0.314   0.3026  0.2924  0.2832  0.275
3    0.35    0.3309  0.3173  0.3074  0.2992  0.292   0.286
4    -       0.3254  0.309   0.298   0.29    0.283   0.278
5    -       -       0.302   0.289   0.28    0.273   0.267
6    -       -       -       0.283   0.272   0.265   0.258
7    -       -       -       -       0.267   0.258   0.251
8    -       -       -       -       -       0.254   0.246
9    -       -       -       -       -       -       0.242
The reader can see that the row k = 3 has the largest value in each column. It is not hard to see analytically that q_{k,r} is a critical point of σ. My computer experiments led me to the following conjecture:
Conjecture 1.10. The function σ(p) has a global maximum at q3,r.
1.3.3 The Generating Function of Moments
In order to test Conjecture 1.4 I need an effective way of computing E(H^l(p[k], Y)) for large values of k. In this section I present my computations of E(H^l(p[k], Y)) for small r, which led me to a conjectural formula for E(H^l(p[k], Y)).
The factorial generating function of the powers of the entropy can be written compactly this way:

G(p, Y, s) = ∑_{t=0}^∞ H(p, Y)^t s^t/t! = ∑_{t=0}^∞ (−∑_{i=1}^l p(Y_i) ln p(Y_i))^t s^t/t! = ∏_{i=1}^l p(Y_i)^{−p(Y_i)s}   (8)
The function G(p, Y, s) can be extended from P_r to P_{r+1} in the following way. I extend the r-dimensional probability vector p to an (r + 1)-dimensional vector p′ by adding a zero coordinate. Any partition Y = {Y1, . . . , Yl} defines a partition Y′ = {Y1, . . . , Yl, {r + 1}}. Note that G(p, Y, s) = G(p′, Y′, s).
The following generating function, after normalization, encodes all the moments of the entropy of a random partition:

J(p, s) = ∑_{Y∈P_r} G(p, Y, s),   J(p, s)/B_r = ∑_{l≥0} E(H^l(p, Y)) s^l/l!   (9)
I want to explore the effect of the substitution p → p[k] on J(p, s). I use the following notation:

A_t(p, s) = J(p[t], s).

Here are the results of my computer experiments. A probability vector with two non-zero entries extended by t zeros yields

A_t(p1, p2, −s) = B_{t+1} + (B_{t+2} − B_{t+1}) p1^{p1 s} p2^{p2 s}   (10)
The next is for three non-zero entries extended by t zeros:

A_t(p1, p2, p3, −s) = B_{t+1} + (B_{t+2} − B_{t+1}) × ((p1 + p2)^{s(p1+p2)} p3^{s p3} + (p1 + p3)^{s(p1+p3)} p2^{s p2} + (p2 + p3)^{s(p2+p3)} p1^{s p1}) + (B_{t+3} − 3B_{t+2} + 2B_{t+1}) p1^{s p1} p2^{s p2} p3^{s p3}   (11)
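Formula (10) can be spot-checked numerically. A small Python sketch (my illustration, with hypothetical helper names) evaluates A_t(p, s) = J(p[t], s) by enumerating partitions and compares it with the right-hand side of (10) for t = 0 and t = 1:

```python
def partitions(elements):
    """Yield every set partition of `elements`."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def J(p, s):
    """J(p, s) = sum over Y in P_r of prod_i p(Y_i)^(-p(Y_i) s), with 0^0 = 1."""
    total = 0.0
    for blocks in partitions(list(range(len(p)))):
        g = 1.0
        for b in blocks:
            q = sum(p[j] for j in b)
            if q > 0:
                g *= q ** (-q * s)
        total += g
    return total

p1, p2, s = 0.3, 0.7, 1.25
X = p1 ** (-p1 * s) * p2 ** (-p2 * s)
lhs_t0 = J([p1, p2], s)        # A_0(p1, p2, s): no zeros appended
lhs_t1 = J([p1, p2, 0.0], s)   # A_1(p1, p2, s): one zero appended
# formula (10) with s -> -s reads A_t(p1, p2, s) = B_{t+1} + (B_{t+2} - B_{t+1}) X
rhs_t0 = 1 + 1 * X   # B_1 + (B_2 - B_1) X
rhs_t1 = 2 + 3 * X   # B_2 + (B_3 - B_2) X
```

The agreement is exact up to floating-point rounding.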
I found A_t(p, s) for probability vectors p with five or fewer coordinates. In order to generalize the results of my computations I have to fix some notation. With the notation deg Y = deg{Y1, . . . , Yl} = l, I set J^l(p, s) := ∑_{deg Y = l} G(p, Y, s). The function A_t(p, s) can then be written as

A_t(p, s) = ∑_{l=1}^{k} L(l, t) J^l(p, s)   (12)

where the L(l, t) are some coefficients. For example, in the last line of formula (11) the coefficient L(3, t) is B_{t+3} − 3B_{t+2} + 2B_{t+1}, and the corresponding function J^3(p, −s) is p1^{s p1} p2^{s p2} p3^{s p3}. The reader can see that the coefficients of J^l(p, s) in the formulae (10) and (11) coincide. The coefficients of the Bell numbers in the formulae for L(l, t):
Bt+1
Bt+2 −Bt+1
Bt+3 − 3Bt+2 + 2Bt+1
Bt+4 − 6Bt+3 + 11Bt+2 − 6Bt+1
Bt+5 − 10Bt+4 + 35Bt+3 − 50Bt+2 + 24Bt+1
form a triangle. I took these constants 1, 1, −1, 1, −3, 2, 1, −6, 11, −6 and entered them into the Google search window. The result of the search led me to the sequence A094638, the Stirling numbers of the first kind, in the On-Line Encyclopedia of Integer Sequences (OEIS [21]).
Definition 1.11. The unsigned Stirling numbers of the first kind are denoted by [n, k]. They count the number of permutations of n elements with k disjoint cycles [22].
Table 4: Values of the function L(l, t)
l\t  1  2   3    4     5      ...
1    2  5   15   52    203    ...
2    3  10  37   151   674    ...
3    4  17  77   372   1915   ...
4    5  26  141  799   4736   ...
5    6  37  235  1540  10427  ...
...
The rows of this table are sequences A000110, A138378, A005494, A045379. OEIS provided me
with the factorial generating function for these sequences:
Conjecture 1.12.

L(l, t) = [l, l] B_{t+l} − [l, l−1] B_{t+l−1} + · · · + (−1)^{l+1} [l, 1] B_{t+1}   (13)

∑_{t=0}^∞ L(l, t) z^t/t! = e^{lz + e^z − 1}   (14)
The identity (12) holds for all values of t.
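Formula (13) can be checked directly against Table 4. Below is my own Python sketch, under the assumption that [n, k] denotes the unsigned Stirling number of the first kind as in Definition 1.11:

```python
def bell_numbers(n):
    """Return [B_0, ..., B_n] via the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n):
        new_row = [row[-1]]
        for v in row:
            new_row.append(new_row[-1] + v)
        row = new_row
        bells.append(row[0])
    return bells

def stirling1(n, k):
    """Unsigned Stirling numbers of the first kind: c(n,k) = c(n-1,k-1) + (n-1)c(n-1,k)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k < 1 or k > n:
        return 0
    return stirling1(n - 1, k - 1) + (n - 1) * stirling1(n - 1, k)

def L(l, t):
    """Conjectured formula (13): L(l,t) = sum_j (-1)^(l-j) [l, j] B_{t+j}, j = 1..l."""
    B = bell_numbers(t + l)
    return sum((-1) ** (l - j) * stirling1(l, j) * B[t + j] for j in range(1, l + 1))
```

L(1, t) reduces to B_{t+1} and L(2, t) to B_{t+2} − B_{t+1}, in agreement with the triangle above.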
1.3.4 Discussion of Conjecture 1.4
Formula (12) simplifies the computation of E(H^l(p[k], Y)). Here is a sample computation of

D(p, l, k) = E(H^l(p[k], Y)) − (1/(σ(p[k])√(2π))) ∫_{−∞}^∞ x^l e^{−(x−µ(p[k]))²/(2σ(p[k])²)} dx

for p = (0.4196, 0.1647, 0.4156).
Table 5: Values of the function D(p, l, k)
l\k  0        100      200      300      400      500
3    -0.0166  -0.0077  -0.0048  -0.0036  -0.0029  -0.0024
4    -0.0474  -0.0273  -0.0173  -0.0129  -0.0104  -0.0088
5    -0.0884  -0.0617  -0.0393  -0.0294  -0.0237  -0.0200
6    -0.1467  -0.1142  -0.0726  -0.0543  -0.0438  -0.0369
The reader can see that the functions k → D(p, l, k) have a minimum for some k after which
they increase toward zero.
1.4 Significance
There are multitudes of possible devices that can be used to study a remote system. While some devices will convey a lot of information, some devices will be inadequate. Surprisingly, the majority of devices (see Conjectures 1.6, 1.7, and 1.9) will measure an entropy very close to the actual entropy of the system. All that is asked of the device is that it defines an onto map

ψ : X → Z,   (15)

where Z is the set of readings of the device.
The cumulative Gauss distribution [4] makes a good approximation to θ(p, x). The only parameters that have to be known are the average µ and the standard deviation σ. This gives an effective way of making estimates of θ(p, x). The precise meaning of the estimates can be found in Conjecture 1.4.
My work offers a theoretical advance in the study of large complex systems through entropy analysis. The potential applications will be in sciences that deal with complex systems, like economics, genetics, biology, paleontology, and psychology. My theory explains some hidden relations between the entropies of observed processes in a system. My theory can also give insight about the object of study from incomplete information. According to my mentor, who is an expert in this field, this is an important problem to solve and a valuable contribution to science.
2 Methods
All of the conjectures were obtained with the help of Mathematica. My main theoretical tool is the theory of generating functions [22].

Definition 2.1. Let a_k, k ≥ 0, be a sequence of numbers. The generating function corresponding to a_k is the formal power series ∑_{k≥0} a_k t^k.

My knowledge of Stirling numbers (see Definition 1.11) also comes from [22]. I also used Jensen's Inequality (Theorem 3.4) [6].
3 Results
3.1 Computation of βr and γr
The main result of this section is the explicit formulae for βr (see formula (7)) and γr (see formula (6)):

βr = ω(r, 1)/(r B_r),   γr = 1/(1 − ω(r, 1)/(r B_r ln r))   (16)

where

ω(r, 1) = r! ∑_{i=0}^{r−1} B_i ln(r − i)/(i! (r − i − 1)!)   (17)
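Formulae (16) and (17) can be sanity-checked in Python (my own sketch): ω(r, 1) from (17) should agree with the brute-force sum ∑_Y λ(Y), and the resulting β_r and γ_r should reproduce the entries of Tables 1 and 2.

```python
from math import log, factorial

def bell_numbers(n):
    """Return [B_0, ..., B_n] via the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n):
        new_row = [row[-1]]
        for v in row:
            new_row.append(new_row[-1] + v)
        row = new_row
        bells.append(row[0])
    return bells

def partitions(elements):
    """Yield every set partition of `elements`."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def omega_formula(r):
    """omega(r, 1) = r! sum_{i=0}^{r-1} B_i ln(r - i) / (i! (r - i - 1)!)  -- (17)."""
    B = bell_numbers(r)
    return factorial(r) * sum(B[i] * log(r - i) / (factorial(i) * factorial(r - i - 1))
                              for i in range(r))

def omega_bruteforce(r):
    """omega(r, 1) = sum over Y in P_r of lambda(Y) = sum_i k_i ln k_i."""
    return sum(sum(len(b) * log(len(b)) for b in blocks)
               for blocks in partitions(list(range(r))))

def beta_gamma(r):
    """beta_r and gamma_r from formula (16)."""
    B_r = bell_numbers(r)[r]
    w = omega_formula(r)
    return w / (r * B_r), 1 / (1 - w / (r * B_r * log(r)))
```

For r = 6 this gives β_6 ≈ 0.711731 and γ_6 ≈ 1.659, matching the tables.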
I set some notation. The probability of Y_i is k_i/r = #Y_i/r, and the entropy of Y is H(Y) = H(h_r, Y) = −∑_{i=1}^l (k_i/r) ln(k_i/r). After some simplification H(Y) becomes

H(Y) = ln r − (1/r) λ(Y)   (18)

where

λ(Y) = λ(k1, . . . , kl) = ln(k1^{k1} k2^{k2} · · · kl^{kl}) = ∑_{i=1}^l k_i ln k_i   (19)
The average entropy is

E(H(h_r, Y)) = ln r − (∑_{Y∈P_r} λ(Y))/(r B_r)   (20)

I am interested in calculating the sums

ω(r, q) = ∑_{Y∈P_r} λ(Y)^q,   q ≥ 0   (21)
The factorial generating function of the λ(Y)^q is

Λ(Y, s) = ∑_{k=0}^∞ λ(Y)^k s^k/k! = k1^{k1 s} · · · kl^{kl s}   (22)

I will compute the factorial generating function of the quantities Λ(Y, s):

Λ(r, s) = ∑_{Y∈P_r} Λ(Y, s)   (23)
Theorem 3.1.

∑_{r=0}^∞ Λ(r, s) t^r/r! = e^{F(s,t)}   (24)

where F(s, t) = ∑_{r=1}^∞ r^{rs} t^r/r!.
Proof.

e^{F(s,t)} = ∑_{l=0}^∞ F(s, t)^l/l! = ∑_{l=0}^∞ (1/l!) (∑_{k=1}^∞ k^{ks} t^k/k!)^l
= ∑_{l=0}^∞ (1/l!) ∑_{k1≥1} (k1^{k1 s} t^{k1}/k1!) ∑_{k2≥1} (k2^{k2 s} t^{k2}/k2!) · · · ∑_{kl≥1} (kl^{kl s} t^{kl}/kl!)
= ∑_{l=0}^∞ (1/l!) ∑_{k1≥1,...,kl≥1} k1^{k1 s} k2^{k2 s} · · · kl^{kl s} t^{k1+k2+···+kl}/(k1! k2! · · · kl!)
= ∑_{l=0}^∞ (1/l!) ∑_{1≤k1≤k2≤···≤kl} (l!/(c1! c2! · · ·)) k1^{k1 s} k2^{k2 s} · · · kl^{kl s} t^{k1+k2+···+kl}/(k1! k2! · · · kl!)   (25)
The coefficient c_i is the number of k's that are equal to i. After some obvious simplifications the formula above becomes:

e^{F(s,t)} = ∑_{r=0}^∞ (t^r/r!) ∑_{k1≤k2≤···≤kl, k1+···+kl=r} (r!/(c1! c2! · · ·)) k1^{k1 s} k2^{k2 s} · · · kl^{kl s}/(k1! k2! · · · kl!)   (26)
Each partition Y determines a set of numbers k_i = #Y_i. I will refer to k1, . . . , kl as the portrait of {Y1, . . . , Yl}. Let me fix one collection of numbers k1, . . . , kl. I can always assume that the sequence is non-decreasing. Let me count the number of partitions with the given portrait k1 ≤ · · · ≤ kl. If the subsets were ordered, the number of partitions would equal (k1+k2+···+kl)!/(k1! k2! · · · kl!). In my case the subsets are unordered, and the number of such unordered partitions is (k1+k2+···+kl)!/(k1! k2! · · · kl! c1! c2! · · ·), where c_i is the number of subsets of cardinality i. The function Λ(Y, s) depends only on the portrait of Y. From this I conclude that

∑_{Y∈P_r} Λ(Y, s) = ∑_{k1≤k2≤···≤kl} (k1 + k2 + · · · + kl)! k1^{k1 s} k2^{k2 s} · · · kl^{kl s}/(k1! k2! · · · kl! c1! c2! · · ·)   (27)

which completes the proof.
Note that upon the substitution s = 0 formula (24) becomes the classical generating function

∑_{k≥0} B_k t^k/k! = e^{e^t − 1}   (28)
(see [22]).
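Theorem 3.1 can also be verified numerically: fix a value of s, truncate F(s, t) as a polynomial in t, exponentiate the series, and compare r![t^r] e^{F(s,t)} with the brute-force Λ(r, s). A Python sketch (mine, not the paper's; the recurrence n e_n = ∑ k f_k e_{n−k} is the standard way to exponentiate a power series):

```python
from math import factorial

def partitions(elements):
    """Yield every set partition of `elements`."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def series_exp(f, N):
    """Coefficients e_0..e_N of exp(sum_k f[k] t^k), assuming f[0] == 0.
    Uses E' = F'E, i.e. n e_n = sum_{k=1}^n k f_k e_{n-k}."""
    e = [1.0] + [0.0] * N
    for n in range(1, N + 1):
        e[n] = sum(k * f[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

def Lambda(r, s):
    """Lambda(r, s) = sum over Y in P_r of prod_i k_i^(k_i s)  -- (22)-(23)."""
    total = 0.0
    for blocks in partitions(list(range(r))):
        g = 1.0
        for b in blocks:
            k = len(b)
            g *= k ** (k * s)
        total += g
    return total

s, N = 0.37, 6
f = [0.0] + [r ** (r * s) / factorial(r) for r in range(1, N + 1)]  # F(s, t) truncated
e = series_exp(f, N)  # coefficients of e^{F(s,t)} in t
```

At s = 0 the theorem degenerates to the classical identity (28), so r![t^r] reduces to the Bell numbers.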
This lets me find the generating function ∑_{r=0}^∞ ω(r, 1) t^r/r!.

Proposition 3.2.

∑_{r=0}^∞ ω(r, 1) t^r/r! = e^{e^t − 1} ∑_{k=1}^∞ t^k ln k/(k − 1)!   (29)
Proof. Using equations (22), (23), and (21) I find that

∑_{r=0}^∞ (∂/∂s) Λ(r, s)|_{s=0} t^r/r! = ∑_{r=0}^∞ (t^r/r!) ∑_{Y∈P_r} λ(Y) = ∑_{r=0}^∞ ω(r, 1) t^r/r!.

Alternatively, I find the partial derivative by applying the chain rule to the right-hand side of (24): (∂/∂s)[e^{F(s,t)}]|_{s=0} = e^{F(0,t)} (∂/∂s)[F(s, t)]|_{s=0}. Note that F(s, t)|_{s=0} = e^t − 1 and (∂/∂s)[F(s, t)]|_{s=0} = ∑_{k=1}^∞ t^k k ln k/k!. From this I infer that

(∂/∂s)[e^{F(s,t)}]|_{s=0} = e^{e^t − 1} ∑_{k=1}^∞ t^k k ln k/k!.
I want to find an explicit formula for ω(r, 1). To my advantage I know that e^{e^t − 1} = ∑_{n=0}^∞ B_n t^n/n!, where B_n is the Bell number, i.e. the number of unordered partitions of a set of n elements [22]. To find ω(r, 1) I expand equation (29):

∑_{r=0}^∞ ω(r, 1) t^r/r! = (∑_{n=0}^∞ B_n t^n/n!)(∑_{k=1}^∞ ln k t^k/(k − 1)!)
= (B_0/0!)(ln 2/1!) t² + ((B_1/1!)(ln 2/1!) + (B_0/0!)(ln 3/2!)) t³ + ((B_2/2!)(ln 2/1!) + (B_1/1!)(ln 3/2!) + (B_0/0!)(ln 4/3!)) t⁴ + · · ·   (30)

Since equal power series have equal Taylor coefficients, I conclude that formula (17) is valid. Formulae (16) follow from (20), (5), (6), and (7).
Using the first and second derivatives of equation (11) at s = 0, I find σ(q_{3,r}):

σ(q_{3,r}) = ( −4B_{t+1}² ln²2/B_{t+3}² + 8B_{t+1}B_{t+2} ln²2/B_{t+3}² − 4B_{t+2}² ln²2/B_{t+3}² − 4B_{t+1} ln²2/(3B_{t+3}) + 4B_{t+2} ln²2/(3B_{t+3}) + 4B_{t+1}² ln 2 ln 3/B_{t+3}² − 4B_{t+1}B_{t+2} ln 2 ln 3/B_{t+3}² − B_{t+1}² ln²3/B_{t+3}² + B_{t+1} ln²3/B_{t+3} )^{1/2}   (31)

where t = r − 3, since q_{3,r} = h_3[r − 3].
3.2 Bell Trials
I introduce a sequence of numbers

p_i = (r − 1)! B_i/(B_r i! (r − i − 1)!),   i = 0, . . . , r − 1   (32)

The sequence p = (p_0, . . . , p_{r−1}) satisfies p_i ≥ 0 and ∑_{i=0}^{r−1} p_i = 1. This follows from the recursion formula ∑_{i=0}^{r−1} (r − 1)! B_i/(i! (r − i − 1)!) = B_r [22]. I refer to a random variable ξ with this probability distribution as Bell trials. Note that the average of ln(r − ξ) is equal to ω(r, 1)/(r B_r).
Proposition 3.3. With p_i as in (32):

1. ∑_{i=0}^{r−1} (r − i) p_i = µ_{r−1} = ((r − 1)B_{r−1} + B_r)/B_r

2. ∑_{i=0}^{r−1} (r − i)² p_i = ((r − 2)(r − 1)B_{r−2} + 3(r − 1)B_{r−1} + B_r)/B_r

Proof. I will compute the generating function of S_r(x) = ∑_{i=0}^{r} r! B_i x^{r−i+1}/(i! (r − i)!) instead. Note that S_r′(x)|_{x=1} = B_{r+1} µ_r.
∑_{r=0}^∞ S_r(x) t^r/r! = ∑_{r=0}^∞ (1/r!) ∑_{a+b=r} (a + b)! B_a t^a x^{b+1} t^b/(a! b!) = (∑_{a=0}^∞ B_a t^a/a!)(∑_{b=0}^∞ x (xt)^b/b!) = e^{e^t − 1} x e^{xt} = x e^{e^t − 1 + xt}   (33)

I factored the generating function into two series, which very conveniently simplified into exponential expressions. Now that I have found the simplified expression for the generating function, I differentiate it:

(∂/∂x)[x e^{e^t − 1 + xt}]|_{x=1} = (xt + 1) e^{e^t − 1 + xt}|_{x=1} = (t + 1) e^{e^t − 1 + t}   (34)

Note that the function ∑_{k≥1} B_k t^{k−1}/(k − 1)! (compare it with formula (28)) is equal to (e^{e^t − 1})′ = e^{e^t − 1 + t}, which implies

t e^{e^t − 1 + t} + e^{e^t − 1 + t} = ∑_{k≥2} (k − 1) B_{k−1} t^{k−1}/(k − 1)! + ∑_{k≥1} B_k t^{k−1}/(k − 1)!   (35)

and the formula for µ_{r−1} follows by comparing coefficients.
The second moment ∑_{i=0}^{r−1} (r − i)² p_i can be computed with the same method. Note that the second moment can be read off from (x S_r′(x))′|_{x=1} in the same way. The factorial generating function of these quantities is

(∂/∂x)[x(xt + 1) e^{e^t − 1 + xt}]|_{x=1} = (t²x² + 3tx + 1) e^{e^t − 1 + xt}|_{x=1} = (t² + 3t + 1) e^{e^t − 1 + t}
= ∑_{k≥3} (k − 2)(k − 1) B_{k−2} t^{k−1}/(k − 1)! + ∑_{k≥2} 3(k − 1) B_{k−1} t^{k−1}/(k − 1)! + ∑_{k≥1} B_k t^{k−1}/(k − 1)!   (36)
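Both parts of Proposition 3.3 are easy to confirm numerically; here is my own Python verification sketch:

```python
from math import factorial

def bell_numbers(n):
    """Return [B_0, ..., B_n] via the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n):
        new_row = [row[-1]]
        for v in row:
            new_row.append(new_row[-1] + v)
        row = new_row
        bells.append(row[0])
    return bells

def bell_trial_probs(r):
    """p_i = (r-1)! B_i / (B_r i! (r-i-1)!), i = 0..r-1  -- formula (32)."""
    B = bell_numbers(r)
    return [factorial(r - 1) * B[i] / (B[r] * factorial(i) * factorial(r - 1 - i))
            for i in range(r)]

r = 7
B = bell_numbers(r)
p = bell_trial_probs(r)
m1 = sum((r - i) * q for i, q in enumerate(p))       # first moment of r - xi
m2 = sum((r - i) ** 2 * q for i, q in enumerate(p))  # second moment of r - xi
```

The probabilities sum to 1 by the Bell recurrence, and m1, m2 match the closed forms of Proposition 3.3.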
Theorem 3.4 (Jensen's Inequality [6], [18]). For any concave function f : R → R and any weights q_i ≥ 0 with ∑_i q_i = 1, the inequality

∑_i f(i) q_i ≤ f(∑_i i q_i)

holds.

I want to apply this theorem to the concave function ln x:

Corollary 3.5. With p_i as in (32), there is an inequality

∑_{i=0}^{r−1} ln(r − i) p_i < ln(1 + (r − 1)B_{r−1}/B_r)
Proof. Follows from Jensen’s Inequality and Proposition 3.3.
Proposition 3.6. lim_{r→∞} γ_r = 1.

Proof. Corollary 3.5 implies

γ_r = 1/(1 − ω(r, 1)/(r B_r ln r)) < 1/(1 − ln(1 + (r − 1)B_{r−1}/B_r)/ln r)   (37)

It is easy to see that B_r ≥ 2B_{r−1}, since I always have a choice of whether or not to add r to the same part as r − 1. This implies that B_{r−1}/B_r ≤ 1/2^{r−1}. From this I conclude that

1 ≤ γ_r < 1/(1 − ln(1 + (r − 1)/2^{r−1})/ln r)

and lim γ_r = 1.
4 Discussion and Conclusion
I was not able to prove all the conjectures I made. I have made some steps (Proposition 3.6) toward the proof of Corollary 1.8 of Conjecture 1.6, namely that the ratio of the maximum entropy to the average entropy is close to one. Another fact I have found is that the difference between the maximum entropy and the average entropy of partitions slowly grows as #X increases. I conjecture that the difference has magnitude ln ln #X. I have also computed the standard deviation, formula (31), which conjecturally is the greatest value of σ(p).
My short term goal is to prove these conjectures. The more challenging goal is to add dynamics
to the system being studied and identify self-organizing subsystems among low entropy subsystems.
I am grateful for the support and training of my mentor, Dr. Rostislav Matveyev, on this
research.
References
[1] W. R. Ashby. An Introduction To Cybernetics. John Wiley and Sons Inc, 1966.
[2] R. Beebe. Jupiter the Giant Planet. Smithsonian Books, Washington, 2 edition, 1997.
[3] C. Bettstetter and C. Gershenson, editors. Self-Organizing Systems, volume 6557 of Lecture
Notes in Computer Science. Springer, 2011.
[4] W. Bryc. The Normal Distribution: Characterizations with Applications. Springer-Verlag,
1995.
[5] R. Clausius. Über die Wärmeleitung gasförmiger Körper. Annalen der Physik, 125:353–400, 1865.
[6] T.M. Cover and J.A. Thomas. Elements of information theory. Wiley, 1991.
[7] S.R. de Groot and P. Mazur. Non-Equilibrium Thermodynamics. Dover, 1984.
[8] D.P. Kroese, T. Taimre, and Z.I. Botev. Handbook of Monte Carlo Methods. John Wiley & Sons, New York, 2011.
[9] H. Haken. Synergetics: Introduction and Advanced Topics. Springer, Berlin, 2004.
[10] S.A. Kauffman. The Origins of Order. Oxford University Press, 1993.
[11] Mathematica. www.wolfram.com.
[12] H. Meinhardt. Models of biological pattern formation: from elementary steps to the organi-
zation of embryonic axes. Curr. Top. Dev. Biol., 81:1–63, 2008.
[13] D. Morrison. Exploring Planetary Worlds. W. H. Freeman, 1994.
[14] I. Prigogine. Non-Equilibrium Statistical Mechanics. Interscience Publishers, 1962.
[15] I. Prigogine. Introduction to Thermodynamics of Irreversible Processes. John Wiley and Sons,
1968.
[16] I. Prigogine and G. Nicolis. Self-Organization in Nonequilibrium Systems: From Dissipative
Structures to Order through Fluctuations. John Wiley and Sons, 1977.
[17] G.C. Rota. The number of partitions of a set. American Mathematical Monthly, 71(5):498–504,
1964.
[18] W. Rudin. Real and Complex Analysis. McGraw-Hill, 1987.
[19] E. Schrödinger. What Is Life? Cambridge University Press, 1992.
[20] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 1948.
[21] N. J. A. Sloane and others. The On-Line Encyclopedia of Integer Sequences, oeis.org.
[22] R.P. Stanley. Enumerative Combinatorics, volumes 1 and 2. Cambridge University Press, 1997.