Statistical Properties of the Entropy Function of a Random Partition
Anna Movsheva
Contents
1 Introduction
  1.1 Background
  1.2 Research Problem
  1.3 Hypothesis
    1.3.1 General properties of θ(p, x)
    1.3.2 Functions µ(p) and σ(p)
    1.3.3 The Generating Function of Moments
    1.3.4 Discussion of Conjecture 1.4
  1.4 Significance
2 Methods
3 Results
  3.1 Computation of βr and γr
  3.2 Bell Trials
4 Discussion and Conclusion
Abstract
It is well known that living organisms are open self-organizing thermodynamic systems with low entropy. An estimate for the number of subsystems with low entropy would give a rough guess about the number of self-organizing subsystems that exist in a closed system S. I study the mathematical properties of a model in which a finite set X with a probability distribution {p_x | x ∈ X} encodes the set of states of the system S. A partition X = ∐_{i=1}^l Y_i in this model represents a subsystem with the set of probabilities {p(Y_i) = ∑_{x∈Y_i} p_x}. In this paper I study the entropy function H(p, Y) = −∑_i p(Y_i) ln p(Y_i) of a random partition Y. In particular, I study the counting function Θ(p, x) = #{Y | H(p, Y) ≤ x}. Using computer simulations, I give evidence that the normalized function θ(p, x) = Θ(p, x)/Θ(p, H(p, X)) can asymptotically be approximated by the cumulative Gauss distribution (1/(σ(p)√(2π))) ∫_{−∞}^x exp(−(t − µ(p))²/(2σ(p)²)) dt. I state my findings in the form of falsifiable conjectures, some of which I partly prove. The asymptotics explain a strong correlation between µ(p), the average entropy of a random partition of X, and the entropy H(p, X). Since the quantity µ(p) is usually available in practice, I can give an estimate for H(p, X) when it is not directly computable.
1 Introduction
1.1 Background
One of the main problems of theoretical biology and theoretical physics is to reconcile the theory of evolution with statistical mechanics and thermodynamics. Ilya Prigogine was the first to make fundamental contributions to the solution of this problem. He advocated that living organisms are open self-organizing thermodynamic systems with low entropy. These open systems are part of a large closed system S. Since I am interested in open self-organizing thermodynamic systems, it is important to know the number of subsystems within S that have low entropy. In my work I studied this question from the mathematical point of view. In my simplified approach the configuration space of S was a finite set X with a probability distribution. In my interpretation a subsystem was a partition of X. I studied a function that, for a given x, counts the number of partitions of X whose entropy does not exceed x. My approach is rather general because any configuration space can be approximated by a sufficiently large but finite set.
The controversy between classical biology and physics has a long history. It revolves around the paradox that physical processes are reversible while biological processes are not. Boltzmann, in the process of working on this dilemma, laid the foundation of statistical physics. He put forward the notion of entropy, which characterizes the degree of disorder in a statistical system. The second law of thermodynamics in the formulation of Boltzmann states that the entropy of a closed system cannot decrease, which makes time in a statistical system irreversible. The solution of the problem of the irreversibility of time did not completely eliminate the contradiction. The second law of thermodynamics seems to forbid the long-term existence of organized systems, such as living organisms. Schrödinger in his book [19] (Chapter 6) pointed out that the entropy can go down in an open system, that is, a system that can exchange mass and energy with its surroundings. Prigogine in his groundbreaking works [15, 14, 16] showed that self-organization (a decrease of entropy) can be achieved dynamically. His discovery laid the foundation of non-equilibrium statistical mechanics. The most interesting self-organizing systems exist far from equilibrium and are non-static by their nature.
There is a vast literature on self-organization (see e.g.[16, 10, 9, 12] and the references therein).
Current research is focused on the detailed study of individual examples of self-organization and is very successful (see e.g. [3]). In this work I changed the perspective. My motivating problem was rather general: to estimate the total number of self-organizing subsystems in a thermodynamically closed system. Self-organizing subsystems are the most interesting specimens of the class of subsystems with low entropy. This motivates my interest in estimating the number of subsystems with low entropy. Knowing this number, the number of self-organizing subsystems can be assessed. Posed in such generality the problem looks very hard, so I made a series of simplifications that let me progress in this direction. Ashby in [1] argued that any system S can be thought of as a "machine". His idea is that the configuration space of S can be approximated by a set or an alphabet X, and the dynamics is given by a transition rule TX : X → X. A homomorphism between machines S = (X, TX) and Q = (Z, TZ) is a map ψ : X → Z such that ψTX = TZψ. Homomorphisms are useful in the analysis of complicated systems. (See [1] for details.) A submachine, according to [1], is a subset X′ ⊂ X that is invariant with respect to TX. I never use this definition in this paper. In my definition a submachine is a homomorphic image ψ : (X, TX) → (Z, TZ). For example, if a machine (X, T) consists of N non-interacting submachines (X1, T1), . . . , (XN, TN), then X = X1 × · · · × XN and T = T1 × · · · × TN. The projections ψi(x1, . . . , xN) = xi are homomorphisms of machines. This reflects the fact that the configuration space of a union of non-interacting systems is a product (not a union) of the configuration spaces of the components.
Definition 1.1. A collection of subsets Y = {Y_z | z ∈ Z} such that Y_z ∩ Y_{z′} = ∅ for z ≠ z′ and ⋃_{z∈Z} Y_z = X is a partition of the finite set X, r = #X. Let k_z be the cardinality #Y_z. In this paper I shall use the notation X = ∐_z Y_z.
Any homomorphism ψ : (X, TX) → (Z, TZ) defines a partition of X with Y_z equal to {x ∈ X | ψ(x) = z}. In fact, up to relabeling the elements of Z, the homomorphism is the same as a partition. This also explains why I am interested in counting partitions. Ashby in [1] argued that a machine (X, T) is a limiting case of a more realistic Markov process, in which the deterministic transition rule x → T(x) gets replaced by random transitions. The dynamics of the process is completely determined by the probabilities {p_{x′,x} | x, x′ ∈ X} of passing from the state x to the state x′ and the initial probability distribution {p_x | x ∈ X}. Markov processes have been studied in information theory, developed originally in [20].
Yet there is still another way to interpret the quantities that I would like to compute. A submachine can also be interpreted as a scientific device. This can be understood with the example of a hurricane on Jupiter [2]. You can analyze the hurricane in a multitude of ways: visually through the lenses of a telescope, by recording the fluctuations of winds with a probe, or by capturing the fluctuations of the magnetic field around the hurricane. Every method of analysis (device) yields statistical data, which in turn yields the respective entropy. If (X, p) is a space of states of the hurricane, then ψ : X → Z is a function whose set of values is the set of readings of the scientific device. It automatically leads to a partition of X, as was explained above. The list of known scientific methods in planetary science is enormous [13], and any new additional method contributes something to the knowledge. Yet the full understanding of the subject would only be possible if I used all possible methods (ψs). This, however, is not going to happen in planetary science in the near future. The reason is that the set of states X of the Jupiter atmosphere is colossal, which makes the set of all conceivable methods of its study (devices) even bigger.
Still, imagine that all the mentioned troubles were nonexistent. It would be interesting to count the number of scientific devices that yield statistical data about the hurricane with entropy no greater than a given value. It would also be interesting to know their average entropy. This is a dream. I did just that in my oversimplified model.
1.2 Research Problem
In the following, the set X will be {1, . . . , r}. Let p be a probability distribution on X, that is
a collection of numbers pi ≥ 0 such that∑r
i=1 pi = 1. The array p = (p1, . . . , pr) is said to be a
probability vector. The probability of Yi in the partition X =∐Yi is
p(Yi) =∑j∈Yi
pj .
Definition 1.2. Entropy of a partition Y , H(p, Y ) is calculated by the expression −∑l
i=0 p(Yi) ln p(Yi).
In this definition the function x lnx is extended to x = 0 by continuity 0 ln 0 = 0.
Here are some examples of entropies: H(p, Ymax) = −∑r
i=1 pi ln pi for Ymax = {{1}, . . . , {r}},
H(p, Ymin) = 0 for Ymin = {{1, . . . , r}}. One of the properties of the entropy function (see [6]) is
that

H(p, Y_min) ≤ H(p, Y) ≤ H(p, Y_max) for any Y ∈ P_r   (1)

where P_r denotes the set of all partitions of X = {1, . . . , r}.
It is clear from the previous discussion that Θ(p, x) = #{Y ∈ Pr|H(p, Y ) ≤ x} is identical to
the function defined in the abstract.
The Bell number B_r ([22], [17]) is the cardinality of P_r. The value Θ(p, H(p, Y_max)) = Θ(p, H(p, id)), thanks to (1), coincides with B_r. From this I conclude that

θ(p, x) = #{Y ∈ P_r | H(p, Y) ≤ x} / B_r

is the function defined in the abstract.
My main goal is to find a simple approximation to θ(p, x).
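For small r the function θ(p, x) can be computed directly by exhaustive enumeration of all set partitions. The following Python sketch is my own illustration (the paper's actual computations were done in Mathematica, and the function names here are mine):

```python
from math import log

def partitions(elements):
    """Yield every set partition of `elements` (there are Bell-number many)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        # insert `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or put it into a new singleton block
        yield [[first]] + smaller

def entropy(p, blocks):
    """H(p, Y) = -sum_i p(Y_i) ln p(Y_i), with 0 ln 0 = 0."""
    h = 0.0
    for block in blocks:
        q = sum(p[j] for j in block)
        if q > 0:
            h -= q * log(q)
    return h

def theta(p, x):
    """theta(p, x) = #{Y : H(p, Y) <= x} / B_r by exhaustive enumeration."""
    all_parts = list(partitions(list(range(len(p)))))
    return sum(entropy(p, y) <= x for y in all_parts) / len(all_parts)
```

For instance, θ((1/2, 1/2), 0) = 1/2, since only Y_min has entropy 0 among the B_2 = 2 partitions.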
1.3 Hypothesis
In this section I will formulate the conjectures that I obtained with the computer algebra system Mathematica [11].
Remark 1.3. I equipped the set Pr with the probability distribution P such that P(Y ) for Y ∈ Pr
is equal to 1/Br. The value of the function θ(p, x) is the probability that a random partition Y has
the entropy ≤ x. This explains the adjective “random” in the title of the paper.
In order to state the main result I need to set up some notation:

p[k] = (p_1, . . . , p_r, 0, . . . , 0)   (k zeros appended)   (2)

where p = (p_1, . . . , p_r) is a probability vector. From the set of moments of the entropy of a random partition

E(H^l(p, Y)) = (1/B_r) ∑_{Y∈P_r} H^l(p, Y)   (3)

I will use the first two to define the average µ(p) = E(H(p, Y)) and the standard deviation σ(p) = √(E(H(p, Y)²) − E(H(p, Y))²).
Conjecture 1.4. Let p be a probability distribution on {1, . . . , r}. Then

lim_{k→∞} ( E(H^l(p[k], Y)) − (1/(σ√(2π))) ∫_{−∞}^{∞} x^l e^{−(x−µ)²/(2σ²)} dx ) = 0

with µ = µ(p[k]), σ = σ(p[k]), for any integer l ≥ 0.
Practically this means that the cumulative normal distribution

Erf(x, µ, σ) = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(t−µ)²/(2σ²)} dt

with µ = µ(p[k]), σ = σ(p[k]) makes a good approximation to θ(p[k], x) for large k.
The initial study of the function θ(p, x) has been done with the help of Mathematica. The software can effectively compute the relevant quantities for a set X whose cardinality does not exceed ten.
1.3.1 General properties of θ(p, x)
The plots of some typical graphs are presented in Figure 1.1. These were made with the help of Mathematica.

Figure 1.1: Graphs of θ(p, x), θ(q, x).
The continuous line on the graph corresponds to θ(p, x) with
p = (0.082, 0.244, 0.221, 0.093, 0.052, 0.094, 0.079, 0.130)
The step function corresponds to q = (1/8, . . . , 1/8). Large steps are common for θ(q, x) when q has
symmetries. A symmetry of q is a permutation τ of X such that q_{τ(x)} = q_x for all x ∈ X. Indeed, if I take a symmetry and apply it to a partition, I get another partition with the same entropy. In this way I can produce many partitions with equal entropies; hence the high steps in the graph.
The effect of the operation p → p[1] (see (2)) on θ(p, x) is surprising. Here are the typical graphs:
Figure 1.2: Graphs of θ(p, x), θ(p[1], x), θ(p[2], x) for some randomly chosen p = (p1, . . . , p6).
The reader can see that the graphs have the same bending patterns. Also, the graphs lie one over the other. This led me to put forth a conjecture that has passed multiple numerical tests.
Conjecture 1.5. For any p I have
θ(p, x) ≥ θ(p[1], x)
A procedure that plots θ(p, x) is hungry for computer memory. This is why it is worthwhile to
find a function that makes a good approximation. I have already mentioned in the introduction
that Erf(x, µ(p), σ(p)) approximates θ(p, x) well. For example, if
p = (0.138, 0.124, 0.042, 0.106, 0.081, 0.131, 0.088, 0.138, 0.154), (4)
the picture below indicates a good agreement of graphs.
Figure 1.3: Erf(x, µ(p), σ(p)) (red) vs. θ(p, x) (blue), with p as in (4).
The reader will find more precise relations between Erf and θ in the following sections.
1.3.2 Functions µ(p) and σ(p)
The good agreement of the graphs Erf(x, µ(p), σ(p)) and θ(p, x) raises the question of a detailed analysis of the functions µ(p) and σ(p). It turns out that quantities more manageable than µ(p) are

β(p) = H(p, Y_max) − µ(p),   γ(p) = H(p, Y_max)/µ(p)   (5)

The inequality (1) implies that µ(p) ≤ H(p, Y_max), so β(p) ≥ 0 and γ(p) ≥ 1. Evaluation of the denominator of γ(p) with formula (3) requires intensive computing. On my slow machine I used the Monte-Carlo approximation [8]

µ(p) ≈ (1/k) ∑_{i=1}^k H(p, Y^i)

where the Y^i are independent random partitions. Below are the graphs of β(p1, p2, p3) and γ(p1, p2, p3) plotted in Mathematica. The reader can distinctly see one maximum in the center corresponding to p = (1/3, 1/3, 1/3).
Figure 1.4: The plot of β(p1, p2, 1− p1 − p2) Figure 1.5: The plot of γ(p1, p2, 1− p1 − p2)
A closer look at the plot shows that γ(p1, p2, p3) is not a concave function.
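The Monte-Carlo estimate of µ(p) needs uniform random partitions, which are not entirely trivial to sample. One convenient scheme (my Python sketch, not the paper's Mathematica code; the urn construction is the classical one due to Stam) draws the number of urns M with probability proportional to M^r/M!, drops each of the r elements into a uniform urn, and keeps the nonempty urns:

```python
import random
from math import log, factorial

def random_partition(r, rng):
    """Uniform random set partition of {0, ..., r-1} via Stam's urn scheme."""
    # weights of the urn-count distribution: P(M = m) proportional to m^r / m!
    weights, m = [], 1
    while True:
        w = m ** r / factorial(m)
        weights.append(w)
        if m > r and w < 1e-12 * max(weights):
            break  # remaining tail is negligible
        m += 1
    u = rng.random() * sum(weights)
    M, acc = len(weights), 0.0
    for i, w in enumerate(weights, start=1):
        acc += w
        if u <= acc:
            M = i
            break
    urns = {}
    for elem in range(r):  # each element picks a uniform urn
        urns.setdefault(rng.randrange(M), []).append(elem)
    return list(urns.values())

def mu_monte_carlo(p, n_samples, seed=0):
    """Monte-Carlo estimate of mu(p) = E(H(p, Y)) over uniform random partitions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        for block in random_partition(len(p), rng):
            q = sum(p[j] for j in block)
            if q > 0:
                total -= q * log(q)
    return total / n_samples
```

For p = h_3 the exact average entropy is (3(ln 3 − (2/3) ln 2) + ln 3)/5 ≈ 0.6016, and the estimate converges to it as the number of samples grows.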
In the following, h_r stands for the probability vector (1/r, . . . , 1/r).
I came up with a conjecture, which has been numerically tested for r ≤ 9:
Conjecture 1.6. The function γ(p1, . . . , pr) can be extended by continuity to all distributions p.
In this bigger domain it satisfies
1 ≤ γ(p) ≤ γ(h_r) := γ_r.   (6)
Likewise the function β satisfies
0 ≤ β(p) ≤ β(h_r) := β_r.   (7)
The reader should consult the sections below for alternative ways of computing βr and γr.
The following table contains an initial segment of the sequence of {γr}.
Table 1: Values of γr.
r    2  3      4      5      6      7      8      9      ...  100    ...  1000
γr   2  1.826  1.739  1.691  1.659  1.635  1.617  1.602  ...  1.426  ...  1.341
I see that it is a decreasing sequence. Extensive computer tests have led me to the following conjecture.
Conjecture 1.7. The sequence {γr} satisfies γr ≥ γr+1 and lim_{r→∞} γr = 1.
The limit statement is proved in Proposition 3.6.
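For small r the extremal value γ_r = γ(h_r) = ln r / µ(h_r) can be checked by exhaustive enumeration. A short Python sketch (my own re-implementation, not the paper's code) reproduces the first entries of Table 1:

```python
from math import log

def partitions(elements):
    """Yield every set partition of `elements`."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def gamma_r(r):
    """gamma_r = H(h_r, Y_max) / mu(h_r) = ln r / (average entropy over P_r)."""
    total, count = 0.0, 0
    for blocks in partitions(list(range(r))):
        # entropy of a partition of the uniform distribution h_r
        total += -sum((len(b) / r) * log(len(b) / r) for b in blocks)
        count += 1
    return log(r) / (total / count)
```

gamma_r(2) = 2 and gamma_r(3) ≈ 1.826, matching Table 1.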
Corollary 1.8. lim_{t→∞} γ(p[t]) = 1.

Proof. From Conjecture 1.6 I conclude that 1 ≤ γ(p[t]) ≤ γ_{r+t}. Since lim_{t→∞} γ_{r+t} = 1 by Conjecture 1.7, lim_{t→∞} γ(p[t]) = 1.
Table 2: Values of βr.
r    6         7         8         9         ...  100     ...
βr   0.711731  0.756053  0.793492  0.825835  ...  1.3943  ...
Conjecture 1.9. The sequence {βr} satisfies βr ≤ βr+1 and lim_{r→∞} βr = ∞.
The situation with the standard deviation σ(p) is a bit more complicated. Here is a graph of
σ(p1, p2, p3).
Figure 1.6: Three-dimensional view of the graph of standard deviation σ(p1, p2, p3) for θ(p, x).
The reader can clearly see four local maxima. The function σ(p1, p2, p3) is symmetric. The maxima correspond to the point (1/3, 1/3, 1/3) and the permutations of (1/2, 1/2, 0). This led me to think that the local maxima of σ(p1, . . . , pr) are permutations of q_{k,r} = h_k[r − k], k ≤ r. I tabulated the values of σ(q_{k,r}) for small k and r in the table below.
Table 3: Values of σ(qk,r).
k\r  3       4       5       6       7       8       9
2    0.3396  0.3268  0.314   0.3026  0.2924  0.2832  0.275
3    0.35    0.3309  0.3173  0.3074  0.2992  0.292   0.286
4    -       0.3254  0.309   0.298   0.29    0.283   0.278
5    -       -       0.302   0.289   0.28    0.273   0.267
6    -       -       -       0.283   0.272   0.265   0.258
7    -       -       -       -       0.267   0.258   0.251
8    -       -       -       -       -       0.254   0.246
9    -       -       -       -       -       -       0.242
The reader can see that the row k = 3 has the largest value in each column. It is not hard to see analytically that q_{k,r} is a critical point of σ. My computer experiments led me to the following conjecture:
Conjecture 1.10. The function σ(p) has a global maximum at q3,r.
1.3.3 The Generating Function of Moments
In order to test Conjecture 1.4 I need an effective way of computing E(H^l(p[k], Y)) for large values of k. In this section I present my computations of E(H^l(p[k], Y)) for small r, which led me to a conjectural formula for E(H^l(p[k], Y)).
The factorial generating function of the powers of the entropy can be written compactly this way:

G(p, Y, s) = ∑_{t=0}^∞ H(p, Y)^t s^t/t! = ∑_{t=0}^∞ (−∑_{i=1}^l p(Y_i) ln p(Y_i))^t s^t/t! = ∏_{i=1}^l p(Y_i)^{−p(Y_i)s}   (8)
The function G(p, Y, s) can be extended from P_r to P_{r+1} in the following way. I extend the r-dimensional probability vector p to an (r + 1)-dimensional vector p′ by adding a zero coordinate. Any partition Y = {Y1, . . . , Yl} defines a partition Y′ = {Y1, . . . , Yl, {r + 1}}. Note that G(p, Y, s) = G(p′, Y′, s).
The following generating function, after normalization, encodes all the moments of the entropy of a random partition:

J(p, s) = ∑_{Y∈P_r} G(p, Y, s),   J(p, s)/B_r = ∑_{l≥0} E(H^l(p, Y)) s^l/l!   (9)
I want to explore the effect of the substitution p → p[k] on J(p, s). I use the following notation:

A_t(p, s) = J(p[t], s).

Here are the results of my computer experiments. A probability vector with two non-zero entries extended by t zeros yields

A_t(p1, p2, −s) = B_{t+1} + (B_{t+2} − B_{t+1}) p1^{p1 s} p2^{p2 s}   (10)
The next is for three non-zero entries extended by t zeros:

A_t(p1, p2, p3, −s) = B_{t+1} + (B_{t+2} − B_{t+1}) × ((p1 + p2)^{s(p1+p2)} p3^{s p3} + (p1 + p3)^{s(p1+p3)} p2^{s p2} + (p2 + p3)^{s(p2+p3)} p1^{s p1}) + (B_{t+3} − 3B_{t+2} + 2B_{t+1}) p1^{s p1} p2^{s p2} p3^{s p3}   (11)
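Formula (10) can be spot-checked numerically. A small Python sketch (my illustration, with hypothetical helper names) evaluates A_t(p, s) = J(p[t], s) by enumerating partitions and compares it with the right-hand side of (10) for t = 0 and t = 1:

```python
def partitions(elements):
    """Yield every set partition of `elements`."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def J(p, s):
    """J(p, s) = sum over Y in P_r of prod_i p(Y_i)^(-p(Y_i) s), with 0^0 = 1."""
    total = 0.0
    for blocks in partitions(list(range(len(p)))):
        g = 1.0
        for b in blocks:
            q = sum(p[j] for j in b)
            if q > 0:
                g *= q ** (-q * s)
        total += g
    return total

p1, p2, s = 0.3, 0.7, 1.25
X = p1 ** (-p1 * s) * p2 ** (-p2 * s)
lhs_t0 = J([p1, p2], s)        # A_0(p1, p2, s): no zeros appended
lhs_t1 = J([p1, p2, 0.0], s)   # A_1(p1, p2, s): one zero appended
# formula (10) with s -> -s reads A_t(p1, p2, s) = B_{t+1} + (B_{t+2} - B_{t+1}) X
rhs_t0 = 1 + 1 * X   # B_1 + (B_2 - B_1) X
rhs_t1 = 2 + 3 * X   # B_2 + (B_3 - B_2) X
```

The agreement is exact up to floating-point rounding.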
I found A_t(p, s) for probability vectors p with five or fewer coordinates. In order to generalize the results of my computations I have to fix some notation. With the notation deg Y = deg{Y1, . . . , Yl} = l, I set J^l(p, s) := ∑_{deg Y = l} G(p, Y, s). The function A_t(p, s) can then be written as

A_t(p, s) = ∑_{l=1}^{k} L(l, t) J^l(p, s)   (12)

where the L(l, t) are some coefficients. For example, in the last line of formula (11) the coefficient L(3, t) is B_{t+3} − 3B_{t+2} + 2B_{t+1}, and the corresponding function J^3(p, −s) is p1^{s p1} p2^{s p2} p3^{s p3}. The reader can see that the coefficients of J^l(p, s) in the formulae (10) and (11) coincide. The coefficients of the Bell numbers in the formulae for L(l, t):
Bt+1
Bt+2 −Bt+1
Bt+3 − 3Bt+2 + 2Bt+1
Bt+4 − 6Bt+3 + 11Bt+2 − 6Bt+1
Bt+5 − 10Bt+4 + 35Bt+3 − 50Bt+2 + 24Bt+1
form a triangle. I took these constants 1, 1, −1, 1, −3, 2, 1, −6, 11, −6 and entered them into the Google search window. The result of the search led me to the sequence A094638, the Stirling numbers of the first kind, in the On-Line Encyclopedia of Integer Sequences (OEIS [21]).
Definition 1.11. The unsigned Stirling numbers of the first kind are denoted by [n, k]. They count the number of permutations of n elements with k disjoint cycles [22].
Table 4: Values of the function L(l, t)
l\t  1  2   3    4     5      ...
1    2  5   15   52    203    ...
2    3  10  37   151   674    ...
3    4  17  77   372   1915   ...
4    5  26  141  799   4736   ...
5    6  37  235  1540  10427  ...
...
The rows of this table are sequences A000110, A138378, A005494, A045379. OEIS provided me
with the factorial generating function for these sequences:
Conjecture 1.12.

L(l, t) = [l, l] B_{t+l} − [l, l−1] B_{t+l−1} + · · · + (−1)^{l+1} [l, 1] B_{t+1}   (13)

∑_{t=0}^∞ L(l, t) z^t/t! = e^{lz + e^z − 1}   (14)
The identity (12) holds for all values of t.
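Formula (13) can be checked directly against Table 4. Below is my own Python sketch, under the assumption that [n, k] denotes the unsigned Stirling number of the first kind as in Definition 1.11:

```python
def bell_numbers(n):
    """Return [B_0, ..., B_n] via the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n):
        new_row = [row[-1]]
        for v in row:
            new_row.append(new_row[-1] + v)
        row = new_row
        bells.append(row[0])
    return bells

def stirling1(n, k):
    """Unsigned Stirling numbers of the first kind: c(n,k) = c(n-1,k-1) + (n-1)c(n-1,k)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k < 1 or k > n:
        return 0
    return stirling1(n - 1, k - 1) + (n - 1) * stirling1(n - 1, k)

def L(l, t):
    """Conjectured formula (13): L(l,t) = sum_j (-1)^(l-j) [l, j] B_{t+j}, j = 1..l."""
    B = bell_numbers(t + l)
    return sum((-1) ** (l - j) * stirling1(l, j) * B[t + j] for j in range(1, l + 1))
```

L(1, t) reduces to B_{t+1} and L(2, t) to B_{t+2} − B_{t+1}, in agreement with the triangle above.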
1.3.4 Discussion of Conjecture 1.4
Formula (12) simplifies the computation of E(H^l(p[k], Y)). Here is a sample computation of

D(p, l, k) = E(H^l(p[k], Y)) − (1/(σ(p[k])√(2π))) ∫_{−∞}^∞ x^l e^{−(x−µ(p[k]))²/(2σ(p[k])²)} dx

for p = (0.4196, 0.1647, 0.4156).
Table 5: Values of the function D(p, l, k)
l\k  0        100      200      300      400      500
3    -0.0166  -0.0077  -0.0048  -0.0036  -0.0029  -0.0024
4    -0.0474  -0.0273  -0.0173  -0.0129  -0.0104  -0.0088
5    -0.0884  -0.0617  -0.0393  -0.0294  -0.0237  -0.0200
6    -0.1467  -0.1142  -0.0726  -0.0543  -0.0438  -0.0369
The reader can see that the functions k → D(p, l, k) have a minimum for some k after which
they increase toward zero.
1.4 Significance
There are multitudes of possible devices that can be used to study a remote system. While some devices will convey a lot of information, some devices will be inadequate. Surprisingly, the majority of devices (see Conjectures 1.6, 1.7, and 1.9) will measure an entropy very close to the actual entropy of the system. All that is asked of the device is that it defines an onto map

ψ : X → Z,   (15)

where Z is the set of readings of the device.
The cumulative Gauss distribution [4] makes a good approximation to θ(p, x). The only parameters that have to be known are the average µ and the standard deviation σ. This gives an effective way of making estimates of θ(p, x). The precise meaning of the estimates can be found in Conjecture 1.4.
My work offers a theoretical advance in the study of large complex systems through entropy analysis. The potential applications will be in sciences that deal with complex systems, like economics, genetics, biology, paleontology, and psychology. My theory explains some hidden relations between the entropies of observed processes in a system. My theory can also give insight about the object of study from incomplete information. According to my mentor, who is an expert in this field, this is an important problem to solve and a valuable contribution to science.
2 Methods
All of the conjectures were obtained with the help of Mathematica. My main theoretical tool is the theory of generating functions [22].

Definition 2.1. Let a_k, k ≥ 0, be a sequence of numbers. The generating function corresponding to a_k is the formal power series ∑_{k≥0} a_k t^k.

My knowledge of Stirling numbers (see Definition 1.11) also comes from [22]. I also used Jensen's Inequality (Theorem 3.4) [6].
3 Results
3.1 Computation of βr and γr
The main result of this section is the explicit formulae for βr (see formula (7)) and γr (see formula (6)):

βr = ω(r, 1)/(r B_r),   γr = 1/(1 − ω(r, 1)/(r B_r ln r))   (16)

where

ω(r, 1) = r! ∑_{i=0}^{r−1} B_i ln(r − i)/(i! (r − i − 1)!)   (17)
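Formulae (16) and (17) can be sanity-checked in Python (my own sketch): ω(r, 1) from (17) should agree with the brute-force sum ∑_Y λ(Y), and the resulting β_r and γ_r should reproduce the entries of Tables 1 and 2.

```python
from math import log, factorial

def bell_numbers(n):
    """Return [B_0, ..., B_n] via the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n):
        new_row = [row[-1]]
        for v in row:
            new_row.append(new_row[-1] + v)
        row = new_row
        bells.append(row[0])
    return bells

def partitions(elements):
    """Yield every set partition of `elements`."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def omega_formula(r):
    """omega(r, 1) = r! sum_{i=0}^{r-1} B_i ln(r - i) / (i! (r - i - 1)!)  -- (17)."""
    B = bell_numbers(r)
    return factorial(r) * sum(B[i] * log(r - i) / (factorial(i) * factorial(r - i - 1))
                              for i in range(r))

def omega_bruteforce(r):
    """omega(r, 1) = sum over Y in P_r of lambda(Y) = sum_i k_i ln k_i."""
    return sum(sum(len(b) * log(len(b)) for b in blocks)
               for blocks in partitions(list(range(r))))

def beta_gamma(r):
    """beta_r and gamma_r from formula (16)."""
    B_r = bell_numbers(r)[r]
    w = omega_formula(r)
    return w / (r * B_r), 1 / (1 - w / (r * B_r * log(r)))
```

For r = 6 this gives β_6 ≈ 0.711731 and γ_6 ≈ 1.659, matching the tables.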
I set some notation. The probability of Y_i is k_i/r = #Y_i/r, and the entropy of Y is H(Y) = H(h_r, Y) = −∑_{i=1}^l (k_i/r) ln(k_i/r). After some simplification H(Y) becomes

H(Y) = ln r − (1/r) λ(Y)   (18)

where

λ(Y) = λ(k1, . . . , kl) = ln(k1^{k1} k2^{k2} · · · kl^{kl}) = ∑_{i=1}^l k_i ln k_i   (19)
The average entropy is

E(H(h_r, Y)) = ln r − (∑_{Y∈P_r} λ(Y))/(r B_r)   (20)

I am interested in calculating the sums

ω(r, q) = ∑_{Y∈P_r} λ(Y)^q,   q ≥ 0   (21)
The factorial generating function of the λ(Y)^q is

Λ(Y, s) = ∑_{k=0}^∞ λ(Y)^k s^k/k! = k1^{k1 s} · · · kl^{kl s}   (22)

I will compute the factorial generating function of the quantities Λ(Y, s):

Λ(r, s) = ∑_{Y∈P_r} Λ(Y, s)   (23)
Theorem 3.1.

∑_{r=0}^∞ Λ(r, s) t^r/r! = e^{F(s,t)}   (24)

where F(s, t) = ∑_{r=1}^∞ r^{rs} t^r/r!.
Proof.

e^{F(s,t)} = ∑_{l=0}^∞ F(s, t)^l/l! = ∑_{l=0}^∞ (1/l!) (∑_{k=1}^∞ k^{ks} t^k/k!)^l
= ∑_{l=0}^∞ (1/l!) ∑_{k1≥1} (k1^{k1 s} t^{k1}/k1!) ∑_{k2≥1} (k2^{k2 s} t^{k2}/k2!) · · · ∑_{kl≥1} (kl^{kl s} t^{kl}/kl!)
= ∑_{l=0}^∞ (1/l!) ∑_{k1≥1,...,kl≥1} k1^{k1 s} k2^{k2 s} · · · kl^{kl s} t^{k1+k2+···+kl}/(k1! k2! · · · kl!)
= ∑_{l=0}^∞ (1/l!) ∑_{1≤k1≤k2≤···≤kl} (l!/(c1! c2! · · ·)) k1^{k1 s} k2^{k2 s} · · · kl^{kl s} t^{k1+k2+···+kl}/(k1! k2! · · · kl!)   (25)
The coefficient c_i is the number of k's that are equal to i. After some obvious simplifications the formula above becomes:

e^{F(s,t)} = ∑_{r=0}^∞ (t^r/r!) ∑_{k1≤k2≤···≤kl, k1+···+kl=r} (r!/(c1! c2! · · ·)) k1^{k1 s} k2^{k2 s} · · · kl^{kl s}/(k1! k2! · · · kl!)   (26)
Each partition Y determines a set of numbers k_i = #Y_i. I will refer to k1, . . . , kl as the portrait of {Y1, . . . , Yl}. Let me fix one collection of numbers k1, . . . , kl. I can always assume that the sequence is non-decreasing. Let me count the number of partitions with the given portrait k1 ≤ · · · ≤ kl. If the subsets were ordered, the number of partitions would equal (k1+k2+···+kl)!/(k1! k2! · · · kl!). In my case the subsets are unordered, and the number of such unordered partitions is (k1+k2+···+kl)!/(k1! k2! · · · kl! c1! c2! · · ·), where c_i is the number of subsets of cardinality i. The function Λ(Y, s) depends only on the portrait of Y. From this I conclude that

∑_{Y∈P_r} Λ(Y, s) = ∑_{k1≤k2≤···≤kl} (k1 + k2 + · · · + kl)! k1^{k1 s} k2^{k2 s} · · · kl^{kl s}/(k1! k2! · · · kl! c1! c2! · · ·)   (27)

which completes the proof.
Note that upon the substitution s = 0 formula (24) becomes the classical generating function

∑_{k≥0} B_k t^k/k! = e^{e^t − 1}   (28)
(see [22]).
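Theorem 3.1 can also be verified numerically: fix a value of s, truncate F(s, t) as a polynomial in t, exponentiate the series, and compare r![t^r] e^{F(s,t)} with the brute-force Λ(r, s). A Python sketch (mine, not the paper's; the recurrence n e_n = ∑ k f_k e_{n−k} is the standard way to exponentiate a power series):

```python
from math import factorial

def partitions(elements):
    """Yield every set partition of `elements`."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def series_exp(f, N):
    """Coefficients e_0..e_N of exp(sum_k f[k] t^k), assuming f[0] == 0.
    Uses E' = F'E, i.e. n e_n = sum_{k=1}^n k f_k e_{n-k}."""
    e = [1.0] + [0.0] * N
    for n in range(1, N + 1):
        e[n] = sum(k * f[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

def Lambda(r, s):
    """Lambda(r, s) = sum over Y in P_r of prod_i k_i^(k_i s)  -- (22)-(23)."""
    total = 0.0
    for blocks in partitions(list(range(r))):
        g = 1.0
        for b in blocks:
            k = len(b)
            g *= k ** (k * s)
        total += g
    return total

s, N = 0.37, 6
f = [0.0] + [r ** (r * s) / factorial(r) for r in range(1, N + 1)]  # F(s, t) truncated
e = series_exp(f, N)  # coefficients of e^{F(s,t)} in t
```

At s = 0 the theorem degenerates to the classical identity (28), so r![t^r] reduces to the Bell numbers.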
This lets me find the generating function ∑_{r=0}^∞ ω(r, 1) t^r/r!.

Proposition 3.2.

∑_{r=0}^∞ ω(r, 1) t^r/r! = e^{e^t − 1} ∑_{k=1}^∞ t^k ln k/(k − 1)!   (29)
Proof. Using equations (22), (23), and (21) I find that

∑_{r=0}^∞ (∂/∂s) Λ(r, s)|_{s=0} t^r/r! = ∑_{r=0}^∞ (t^r/r!) ∑_{Y∈P_r} λ(Y) = ∑_{r=0}^∞ ω(r, 1) t^r/r!.

Alternatively, I find the partial derivative by applying the chain rule to the right-hand side of (24): (∂/∂s)[e^{F(s,t)}]|_{s=0} = e^{F(0,t)} (∂/∂s)[F(s, t)]|_{s=0}. Note that F(s, t)|_{s=0} = e^t − 1 and (∂/∂s)[F(s, t)]|_{s=0} = ∑_{k=1}^∞ t^k k ln k/k!. From this I infer that

(∂/∂s)[e^{F(s,t)}]|_{s=0} = e^{e^t − 1} ∑_{k=1}^∞ t^k k ln k/k!.
I want to find an explicit formula for ω(r, 1). To my advantage I know that e^{e^t − 1} = ∑_{n=0}^∞ B_n t^n/n!, where B_n is the Bell number, i.e. the number of unordered partitions of a set of n elements [22]. To find ω(r, 1) I expand equation (29):

∑_{r=0}^∞ ω(r, 1) t^r/r! = (∑_{n=0}^∞ B_n t^n/n!)(∑_{k=1}^∞ ln k t^k/(k − 1)!)
= (B_0/0!)(ln 2/1!) t² + ((B_1/1!)(ln 2/1!) + (B_0/0!)(ln 3/2!)) t³ + ((B_2/2!)(ln 2/1!) + (B_1/1!)(ln 3/2!) + (B_0/0!)(ln 4/3!)) t⁴ + · · ·   (30)

Since equal power series have equal Taylor coefficients, I conclude that formula (17) is valid. Formulae (16) follow from (20), (5), (6), and (7).
Using the first and second derivatives of equation (11) at s = 0, I find σ(q_{3,r}):

σ(q_{3,r}) = ( −4B_{t+1}² ln²2/B_{t+3}² + 8B_{t+1}B_{t+2} ln²2/B_{t+3}² − 4B_{t+2}² ln²2/B_{t+3}² − 4B_{t+1} ln²2/(3B_{t+3}) + 4B_{t+2} ln²2/(3B_{t+3}) + 4B_{t+1}² ln 2 ln 3/B_{t+3}² − 4B_{t+1}B_{t+2} ln 2 ln 3/B_{t+3}² − B_{t+1}² ln²3/B_{t+3}² + B_{t+1} ln²3/B_{t+3} )^{1/2}   (31)

where t = r − 3, since q_{3,r} = h_3[r − 3].
3.2 Bell Trials
I introduce a sequence of numbers

p_i = (r − 1)! B_i/(B_r i! (r − i − 1)!),   i = 0, . . . , r − 1   (32)

The sequence p = (p_0, . . . , p_{r−1}) satisfies p_i ≥ 0 and ∑_{i=0}^{r−1} p_i = 1. This follows from the recursion formula ∑_{i=0}^{r−1} (r − 1)! B_i/(i! (r − i − 1)!) = B_r [22]. I refer to a random variable ξ with this probability distribution as Bell trials. Note that the average of ln(r − ξ) is equal to ω(r, 1)/(r B_r).
Proposition 3.3. With p_i as in (32):

1. ∑_{i=0}^{r−1} (r − i) p_i = µ_{r−1} = ((r − 1)B_{r−1} + B_r)/B_r

2. ∑_{i=0}^{r−1} (r − i)² p_i = ((r − 2)(r − 1)B_{r−2} + 3(r − 1)B_{r−1} + B_r)/B_r

Proof. I will compute the generating function of S_r(x) = ∑_{i=0}^{r} r! B_i x^{r−i+1}/(i! (r − i)!) instead. Note that S_r′(x)|_{x=1} = B_{r+1} µ_r.
∑_{r=0}^∞ S_r(x) t^r/r! = ∑_{r=0}^∞ (1/r!) ∑_{a+b=r} (a + b)! B_a t^a x^{b+1} t^b/(a! b!) = (∑_{a=0}^∞ B_a t^a/a!)(∑_{b=0}^∞ x (xt)^b/b!) = e^{e^t − 1} x e^{xt} = x e^{e^t − 1 + xt}   (33)

I factored the generating function into two series, which very conveniently simplified into exponential expressions. Now that I have found the simplified expression for the generating function, I differentiate it:

(∂/∂x)[x e^{e^t − 1 + xt}]|_{x=1} = (xt + 1) e^{e^t − 1 + xt}|_{x=1} = (t + 1) e^{e^t − 1 + t}   (34)

Note that the function ∑_{k≥1} B_k t^{k−1}/(k − 1)! (compare it with formula (28)) is equal to (e^{e^t − 1})′ = e^{e^t − 1 + t}, which implies

t e^{e^t − 1 + t} + e^{e^t − 1 + t} = ∑_{k≥2} (k − 1) B_{k−1} t^{k−1}/(k − 1)! + ∑_{k≥1} B_k t^{k−1}/(k − 1)!   (35)

and the formula for µ_{r−1} follows by comparing coefficients.
The second moment ∑_{i=0}^{r−1} (r − i)² p_i can be computed with the same method. Note that the second moment can be read off from (x S_r′(x))′|_{x=1} in the same way. The factorial generating function of these quantities is

(∂/∂x)[x(xt + 1) e^{e^t − 1 + xt}]|_{x=1} = (t²x² + 3tx + 1) e^{e^t − 1 + xt}|_{x=1} = (t² + 3t + 1) e^{e^t − 1 + t}
= ∑_{k≥3} (k − 2)(k − 1) B_{k−2} t^{k−1}/(k − 1)! + ∑_{k≥2} 3(k − 1) B_{k−1} t^{k−1}/(k − 1)! + ∑_{k≥1} B_k t^{k−1}/(k − 1)!   (36)
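Both parts of Proposition 3.3 are easy to confirm numerically; here is my own Python verification sketch:

```python
from math import factorial

def bell_numbers(n):
    """Return [B_0, ..., B_n] via the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n):
        new_row = [row[-1]]
        for v in row:
            new_row.append(new_row[-1] + v)
        row = new_row
        bells.append(row[0])
    return bells

def bell_trial_probs(r):
    """p_i = (r-1)! B_i / (B_r i! (r-i-1)!), i = 0..r-1  -- formula (32)."""
    B = bell_numbers(r)
    return [factorial(r - 1) * B[i] / (B[r] * factorial(i) * factorial(r - 1 - i))
            for i in range(r)]

r = 7
B = bell_numbers(r)
p = bell_trial_probs(r)
m1 = sum((r - i) * q for i, q in enumerate(p))       # first moment of r - xi
m2 = sum((r - i) ** 2 * q for i, q in enumerate(p))  # second moment of r - xi
```

The probabilities sum to 1 by the Bell recurrence, and m1, m2 match the closed forms of Proposition 3.3.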
Theorem 3.4 (Jensen's Inequality [6], [18]). For any concave function f : R → R and any weights q_i ≥ 0 with ∑_i q_i = 1, the inequality

∑_i f(i) q_i ≤ f(∑_i i q_i)

holds.

I want to apply this theorem to the concave function ln x:

Corollary 3.5. With p_i as in (32), there is an inequality

∑_{i=0}^{r−1} ln(r − i) p_i < ln(1 + (r − 1)B_{r−1}/B_r)
Proof. Follows from Jensen’s Inequality and Proposition 3.3.
Proposition 3.6. lim_{r→∞} γ_r = 1.

Proof. Corollary 3.5 implies

γ_r = 1/(1 − ω(r, 1)/(r B_r ln r)) < 1/(1 − ln(1 + (r − 1)B_{r−1}/B_r)/ln r)   (37)

It is easy to see that B_r ≥ 2B_{r−1}, since I always have a choice of whether or not to add r to the same part as r − 1. This implies that B_{r−1}/B_r ≤ 1/2^{r−1}. From this I conclude that

1 ≤ γ_r < 1/(1 − ln(1 + (r − 1)/2^{r−1})/ln r)

and lim γ_r = 1.
4 Discussion and Conclusion
I was not able to prove all the conjectures I made. I have made some steps (Proposition 3.6) toward the proof of Corollary 1.8 of Conjecture 1.6, namely that the ratio of the maximum entropy to the average entropy is close to one. Another fact I have found is that the difference between the maximum entropy and the average entropy of partitions slowly grows as #X increases. I conjecture that the difference has magnitude ln ln #X. I have also computed the standard deviation, formula (31), which conjecturally is the greatest value of σ(p).
My short term goal is to prove these conjectures. The more challenging goal is to add dynamics
to the system being studied and identify self-organizing subsystems among low entropy subsystems.
I am grateful for the support and training of my mentor, Dr. Rostislav Matveyev, on this
research.
References
[1] W. R. Ashby. An Introduction To Cybernetics. John Wiley and Sons Inc, 1966.
[2] R. Beebe. Jupiter the Giant Planet. Smithsonian Books, Washington, 2 edition, 1997.
[3] C. Bettstetter and C. Gershenson, editors. Self-Organizing Systems, volume 6557 of Lecture
Notes in Computer Science. Springer, 2011.
[4] W. Bryc. The Normal Distribution: Characterizations with Applications. Springer-Verlag,
1995.
[5] R. Clausius. Über die Wärmeleitung gasförmiger Körper. Annalen der Physik, 125:353–400, 1865.
[6] T.M. Cover and J.A. Thomas. Elements of information theory. Wiley, 1991.
[7] S.R. de Groot and P. Mazur. Non-Equilibrium Thermodynamics. Dover, 1984.
[8] D.P. Kroese, T. Taimre, and Z.I. Botev. Handbook of Monte Carlo Methods. John Wiley & Sons, New York, 2011.
[9] H. Haken. Synergetics: Introduction and Advanced Topics. Springer, Berlin, 2004.
[10] S.A. Kauffman. The Origins of Order. Oxford University Press, 1993.
[11] Mathematica. www.wolfram.com.
[12] H. Meinhardt. Models of biological pattern formation: from elementary steps to the organi-
zation of embryonic axes. Curr. Top. Dev. Biol., 81:1–63, 2008.
[13] D. Morrison. Exploring Planetary Worlds. W. H. Freeman, 1994.
[14] I. Prigogine. Non-Equilibrium Statistical Mechanics. Interscience Publishers, 1962.
[15] I. Prigogine. Introduction to Thermodynamics of Irreversible Processes. John Wiley and Sons,
1968.
[16] I. Prigogine and G. Nicolis. Self-Organization in Nonequilibrium Systems: From Dissipative
Structures to Order through Fluctuations. John Wiley and Sons, 1977.
[17] G.C. Rota. The number of partitions of a set. American Mathematical Monthly, 71(5):498–504,
1964.
[18] W. Rudin. Real and Complex Analysis. McGraw-Hill, 1987.
[19] E. Schrödinger. What Is Life? Cambridge University Press, 1992.
[20] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 1948.
[21] N. J. A. Sloane and others. The On-Line Encyclopedia of Integer Sequences, oeis.org.
[22] R.P. Stanley. Enumerative Combinatorics, volumes 1 and 2. Cambridge University Press, 1997.