Download pdf - Physics of uncertainty, the Gibbs paradox and ......Physics of uncertainty, the Gibbs paradox and indistinguishable particles Demetris Koutsoyiannis Department of Water Resources and

Physics of uncertainty, the Gibbs paradox and indistinguishable particles

Demetris Koutsoyiannis

Department of Water Resources and Environmental Engineering, School of Civil

Engineering, National Technical University of Athens, Greece ([email protected] –

http://www.itia.ntua.gr/dk)

Abstract. The idea that, in the microscopic world, particles are indistinguishable,

interchangeable and without identity has been central in quantum physics. The same

idea has been enrolled in statistical thermodynamics even in a classical framework of

analysis to make theoretical results agree with experience. In thermodynamics of

gases, this hypothesis is associated with several problems, logical and technical. For

this case, an alternative theoretical framework is provided, replacing the

indistinguishability hypothesis with standard probability and statistics. In this

framework, entropy is a probabilistic notion applied to thermodynamic systems and is

not extensive per se. Rather, the extensive entropy used in thermodynamics is the

difference of two probabilistic entropies. According to this simple view, no

paradoxical behaviours, such as the Gibbs paradox, appear. Such a simple

probabilistic view within a classical physical framework, in which entropy is none

other than uncertainty applicable irrespective of particle size, enables generalization of

mathematical descriptions of processes across any type and scale of systems ruled by

uncertainty.

Keywords. Uncertainty; Probability; Entropy; Gibbs paradox; Extensivity;

Indistinguishability; Maxwell-Boltzmann statistics.

1. Introduction

Φύσις κρύπτεσθαι φιλεῖ (Nature loves to hide; Heraclitus)

The idea that in the microscopic world the particles are indistinguishable, interchangeable and without

identity has been central in quantum physics and is supported by empirical evidence. Two objects are

identical whenever they have the same architecture and the same values of quantum numbers,

expressing their state. In typical quantum mechanical systems, the number of states (dimension of

Hilbert space) that describe what may happen in a finite volume is always finite (usually small) and

therefore the probability of having particles in identical states is non-zero.

The same idea has been enrolled in statistical thermodynamics even in a classical framework of

analysis [e.g. 1, 2, 3, 4] to help make theoretical results to agree with experience or perception, as well

as with pre-existing thermodynamic results. Namely, the indistinguishability hypothesis has been

central in determining the entropy in the kinetic theory of gases. In this case, the idea has been

accepted despite absence of direct experimental evidence supporting it [e.g., 5, p. 11]. It is well known

that in the kinetic theory of gases the energy and momentum are taken to be continuous variables, as in

classical physics, rather than discrete variables taking on a finite number of values as in quantum

physics. Therefore, the probability that any two particles in motion have the same velocity, momentum

and energy is zero and this can hardly justify their indistinguishability (even if the architecture of

particles is identical).

The indistinguishability hypothesis is not about whether or not we can distinguish or label particles.

It goes far beyond the fact that the particles are identical, implying that this property affects the

probabilistic behaviour of particles and the entropy of the system. Thus, the indistinguishability

hypothesis has resulted in three different probabilistic behaviours or models, typically referred to as

“statistics”, each one labelled by two names of some of the most respected physicists in history. All

2

three depart from standard probability and statistics, which we use in all sciences, including physics of

the macroscopic world. The typical probability-theoretic problem of placing N particles into M boxes

(representing particle states or locations) can serve as a cue for distinguishing the three models from

each other, as well from the standard probability model. Specifically, the number W of possible ways

of placing the particles according to the different models are [4, pp. 259-264, 319-320; 5 pp. 11, 61]:

Standard probabilistic model:

W = M N

(1)

Maxwell-Boltzmann statistics (lacking clear definition; see section 7):

WMB = W

N! =

M N

N! (2)

Bose-Einstein statistics (each of N indistinguishable bosons occupies each of M states with no

restriction on the occupation number in each state):

WBE =

N + M – 1

N =

(N + M – 1)!

N! (M – 1)! (3)

Fermi-Dirac statistics (each of N indistinguishable femions occupies each of M > N states with

the restriction that no more than one particle can occupy a specific state):

WFD =

M

N =

M!

N! (M – N)! (4)

In particular, the notion that triggered the introduction of Maxwell-Boltzmann statistics, which is

the focus of this study, was the entropy of systems of gas molecules, in which the typical model

resulted in an entropic form that was non-extensive, while classical thermodynamics demands that the

entropy be extensive. Here, an attempt is made to show that the typical probabilistic model (equation

(1)) can be reconciled with classical thermodynamics of gases without invoking indistinguishability,

and, at the same time, resolving in a natural way paradoxical results (the Gibbs paradox) associated

with indistinguishability. It is stressed that the study is focused on the inconsistencies of Maxwell-

Boltzmann statistics with respect to thermodynamics of gases and does not extend to phenomena

commonly treated by Bose-Einstein statistics and Fermi-Dirac statistics. It is also noted that the

rationale that the indistinguishability hypothesis should affect the calculation of the entropy in gases

has been disputed by van Kampen [6], Swendsen [7,8], Cheng [9], Versteegh and Dieks [10], and

Corti [11]. While this study draws conclusions similar with these studies, the approach used differs

both in the definition of entropy and its logistics. Namely:

(a) Instead of using Boltzmann’s framework, this study adopts a fully probabilistic approach

relying on Shannon’s [12] formal probabilistic framework that constitutes a generalization of

Boltzmann’s; the differences of the two are discussed below (section 2).

(b) In particular, the study derives entropy based on the probabilistic concepts of random

variables and expectation, and shows that no hypotheses differentiating a thermodynamic

system of gas from the standard probabilistic model is required to describe that system in

probabilistic entropic terms (section 3).

(c) Furthermore the study shows that the probabilistic entropy of a gas, which is none other than a

measure of the total uncertainty, is not (and cannot be) extensive; it derives the extensive

entropy used in thermodynamics as the difference of two probabilistic entropies (section 5);

and it shows that the descriptions of ideal gases by either the formal probabilistic entropy or

the extensive entropy are equivalent (section 6), except that the former is more powerful in

explaining mixing, thus removing the Gibbs paradox (section 8).

3

(d) Finally, the study offers a number of arguments, mostly not appearing in existing literature,

suggesting that the indistinguishability hypothesis results in several inconsistencies, both on

logical and technical grounds (section 7), while showing that the proposed probabilistic

framework is free of inconsistences and paradoxical results (sections 3, 7, 8) and enables its

generalization for processes across any type and scale of systems ruled by uncertainty.

2. The notion of entropy

While, historically, in classical thermodynamics, entropy was introduced as a conceptual,

deterministic and rather obscure notion, later, within statistical thermophysics, entropy has been given

a more rigid probabilistic meaning by Boltzmann and Gibbs, while Shannon used an essentially

similar, albeit more general, entropy definition to describe the information content, which he also

called entropy at von Neumann’s suggestion [2, 4, 13]. Despite the field being over a century old, the

meaning of entropy is still debated and a diversity of opinion among experts is encountered [14]. In

particular, despite having the same name, probabilistic (or information) entropy and thermodynamic

entropy are still regarded by many as two distinct notions. This conviction is encouraged by the fact

that the former is a dimensionless quantity and the latter has units (J/K), which however is only an

historical accident, related to the arbitrary introduction of temperature scales [15]. Accordingly, this

distinction has been disputed on the grounds that the “information entropy [is] shown to correspond to

thermodynamic entropy” [2], that “in principle we do not need a separate base unit for temperature,

the kelvin”, and that, in essence, thermodynamic entropy, like the information entropy, is

dimensionless [15].

In a recent book entitled “A Farewell to Entropy”, Ben-Naim [4] has attempted “not only to use the

principle of maximum entropy in predicting probability distributions [in statistical physics and

thermodynamics], but to replace altogether the concept of entropy with the more suitable concept of

information, or better yet, the missing information”. However, it may be a better idea to keep the name

entropy1 (rather than addressing a farewell and replacing it with “missing information” or other names,

including uncertainty), both in probability and in physics, also associating it with the following

necessary clarifications:

(a) Its definition (see below) relies on probability (abandoning the classical definition dS =

dQ/T, where S, Q and T denote entropy, heat and temperature, respectively).

(b) It is the basic thermodynamic quantity, which supports the definition of all other derived

thermodynamic quantities (e.g. the temperature is the inverse of the partial derivative of

entropy with respect to internal energy).

(c) It is a dimensionless quantity, thus, rendering the unit of kelvin an energy unit, a multiple of

the joule like the calorie and the Btu (i.e. 1 K = 0.138 06505 yJ = 1.3806505×10−23

J; 1 calIT

= 4.1868 J; 1Btu = 1.05506 kJ, etc.). Despite that, as well as the recognition that the

introduction of the kelvin is an historical accident [15], there is no doubt that the kelvin will

continue to be used for temperature.

(d) It is interpreted as a measure of uncertainty (leaving aside obscure interpretations like

“disorder” [4]).

(e) Its tendency to become maximum (as contrasted with other quantities like mass, momentum

1 Actually, the property (e) below makes the term entropy ideal for the notion it describes. It is reminded that the

word is ancient Greek (ἐντροπία, a feminine noun, commonly meaning a turning towards, twist—also a trick,

dodge). It springs from the preposition ἐν (in) and the verb τρέπειν (to turn or direct towards a thing, to turn

round or about, to alter, to change, to overturn). As in Greek composition the preposition ἐν- often expresses the

possession of a quality, the scientific meaning of the term ἐντροπία is the possession of the potential for change.

4

and energy, which are conserved) is the driving force of natural change. This tendency is

formalized as the principle of maximum entropy [16], which we can regard as a physical

(ontological) principle obeyed by natural systems, as well as a logical (epistemological)

principle applicable in making inference about natural systems (in an optimistic view that

our logic in making inference about natural systems could be consistent with the behaviour

of the natural systems).

Recently, Swendsen [14] formulated twelve principles about the foundations of statistical

mechanics, which led him to advocate the use of Boltzmann’s definition of the entropy in terms of the

logarithm of the probability of macroscopic states for a composite system. Here these principles are all

respected, but it is shown that a more formal use of the probability theory, including a formal

probabilistic definition of entropy may be more productive and easier to apply, thus extending

Boltzmann’s account of the number of macroscopic states.

The formal probabilistic definition of entropy is quite economic as it only needs the notion of a

random variable2 z along with the associated probability mass function p(z) or probability density

function f(z). For a discrete random variable z taking values zj with probability mass function pj :=

p(zj), j = 1,…,w, where

j = 1

w

pj = 1 (5)

the entropy is a dimensionless nonnegative quantity defined as (e.g., [5]):

Φ[z] := E[–ln p(z)] = –j = 1

w

pj ln pj (6)

where E[ ] denotes expectation. The above definition follows from the fact that Φ as defined in (6)3 is

the only function (within a multiplicative factor) that satisfies the following simple and general

postulates about uncertainty, originally set up by Shannon [12], as reformulated by Jaynes [17, p.

347]:

(a) it is possible to set up a numerical measure Φ of the “amount of uncertainty’ which is

expressed as a real number;

(b) Φ is a continuous function of pj;

(c) if all the pj are equal (pj = 1/w) then Φ should be a monotonic increasing function of w;

(d) if there is more than one way of working out the value of Φ, then we should get the same

value for every possible way.

A more formal presentation of postulates slightly different from the above, but again leading to (6), is

given by Uffink [18; theorem 1], who emphasizes the consistency character of the notion of entropy

and its maximization.

It can further be noted that in the case of equiprobability, p = pj = 1/w, the entropy becomes

Φ[z] = –ln p = ln w (7)

This corresponds to the original Boltzmann’s definition of entropy. However, when compared to (7),

(6) provides a more general definition, applicable also to cases where (due to constraints)

2 Here the so-called Dutch convention is used, according to which an underlined symbol denotes a random

variable; the same symbol not underlined represents a value of the random variable. 3 The notation of entropy by Φ was done deliberately to avoid confusion with the classical thermodynamic

entropy S, which has some differences as discussed below.

5

equiprobability is not possible. In other words, in the general case, the entropy equals the expected

value of the minus logarithm of probability.

Extension of the above definition for the case of a continuous random variable z with probability

density function f(z), where

-∞

∞

f(z) dz = 1 (8)

is possible, although not contained in Shannon’s [12] original work. This extension involves some

additional difficulties. Specifically, if we discretize the domain of z into intervals of size δz, then (6)

would give an infinite value for the entropy as δz tends to zero (the quantity –ln p = –ln (f(z) δz)) will

tend to infinity). However, if we involve a (so-called) ‘background measure’ with density h(z) and

take the ratio (f(z) δz)/ (h(z) δz) = f(z)/h(z), then the logarithm of this ratio will generally not diverge.

This allows the definition of entropy for continuous variables as [17, p. 375; 18]:

Φ[z] := E[–ln(f(z)/h(z))] = –-∞

∞

ln(f(z)/h(z)) f(z) dz (9)

Again, the entropy is a dimensionless quantity, but its value depends on the chosen background

measure and can be either a positive or negative number [5] (in contrast to the case of a discrete

variable where entropy is strictly nonnegative). The function h(z), although usually omitted in

probability texts, is necessary in order to make the quantity f(z)/h(z) dimensionless so that taking the

logarithm be physically consistent. The background measure h(z) can be any probability density,

proper4 (with integral equal to 1, like in (8)) or improper (meaning that its integral does not converge).

Typically, it can be thought of as an (improper) Lebesgue density, i.e. a constant with dimensions

[h(z)] = [f(z)] = [z–1

]. It can further be noted that in the case of uniform probability density over a finite

interval (or volume, in a vector space) Ω, i.e. f(z) = 1/Ω, assuming also a constant background measure

density, h(z) = h, the entropy becomes

Φ[z] = ln (Ωh) (10)

Again this corresponds to the original Boltzmann’s definition of entropy, with Ω denoting the volume

of the accessible phase space. Yet (9) provides a more general definition than (10), applicable in every

case.

When there is risk of ambiguity, we will call Φ[z], for either a discrete or a continuous random

variable z, probabilistic entropy. While from first glance it seems irrelevant with thermodynamic

entropy, the next session exemplifies that the latter corresponds to, and can be derived from, the

former.

3. The entropy of a monoatomic gas

For demonstration of our framework we will derive the entropy of a simple system, a monoatomic gas,

using purely probabilistic reasoning. Apparently, the result is known (see section 4), but for our

purpose it is useful to make the derivation from scratch. We consider a motionless cube with edge a

(volume V = a3) containing spherical particles (monoatomic molecules, like helium) of mass m0 in fast

motion, whose exact position and velocity we cannot observe. A particle’s state is described by 6

variables, 3 indicating its position xi and 3 indicating its velocity ui, with i = 1, 2, 3. All are represented

4 If h(z) is proper, the quantity in (9) is known as relative entropy (also as Kullback–Leibler divergence,

information divergence, information gain). Here we do not use these terms and we do not regard h(z) as a proper

density but as an improper one necessary to acquire physical consistency if z is a continuous random variable.

6

as random variables, forming the vector z = (x1, x2, x3, u1, u2, u3). Here, we adhere to classical

description making no use of quantum mechanical or even relativistic assumptions. Hence, all six

coordinates are continuous variables, the number of possible microscopic states is uncountably infinite

and the accessible volume of the phase space Ω (also known in probability theory as the certain event)

is also infinite (see below—and also notice the difference with other derivations, e.g. [7], which

assume a finite phase space). This does not entail any difficulty, because we can calculate entropy

from (9) rather than (7) or (10).

We seek to find the probability density function f(z) and, through this, the entropy of a single

particle. This can be done by application of the principle of maximum entropy. Maximization requires

first to express mathematically the constraints about z. The constraints for the particle position are:

0 ≤ xi ≤ a, i = 1, 2, 3 (11)

The velocity components do not obey inequality constraints,5 but obey equality constraints of integral

type. Specifically, conservation of momentum implies that E[m0 ui] = m0 ∫Ω ui f(z) dz = 0, where the

integration space Ω is all phase space, i.e. (0, a) for each xi and (–∞, ∞) for each ui. Thus,

E[ui] = 0, i = 1, 2, 3 (12)

Likewise, conservation of energy implies that E[m0 ||u||2/2] = (m0 /2) ∫Ω ||u||

2 f(z) dz = ε, where ε is the

energy per particle and ||u||2 = u1

2 + u2

2 + u3

2; thus, the constraint is written as

E[||u||2] = 2ε/m0 (13)

We clarify that the expectation E[u] is the macroscopic velocity of the gas. Here it is zero because the

containing box is assumed motionless. Even if it were in motion, the macroscopic velocity would be

observed and there would be no reason to include it in the uncertainty framework, which, instead,

would be formulated in terms of the difference u – E[u]. Thus, the results would be the same as in the

examined motionless case. Also, the constraints will be the same irrespective of whether a particle

follows a free path or collides with other particles.

We form the entropy of z as in (9) recognizing that the constant density h(z) in ln (f(z)/h(z)) should

have units [z–1

] = [x–3

] [u–3

] = L–6

T3. To construct this, we utilize two physical constants, the Planck

constant h (dimensions L2 M T

–1) and an elementary mass mp (dimensions M). This is usually the

particle mass mp ≡ m0, but when we have different particles, it may be more consistent to have a

common reference mass, e.g. the proton mass. We observe that the quantity (mp/h)3 has the required

dimensions L–6

T3, thereby giving the entropy as:

Φ[z] = E[–ln((h/mp)3 f(z))]= – ∫Ω ln((h/mp)

3f(z)) f(z) dz (14)

Here we note that the adoption of the above form of h(z) was made for convenience and economy. In

statistical thermodynamics texts, such as those cited above, the Planck constant is given a more

fundamental meaning for entropy, also invoking Heisenberg’s uncertainty principle. Here it is used

only for the sake of dimensional consistence. No implicit assumption of a non-zero cell size based on

Planck’s constant is made, and not any non-classical concept is used, because the mathematical

framework based on (9) is for continuous variables.

Application of the principle of maximum entropy with constraints (8), (11), (12) and (13) will give

the density function of z as:

5 One could say that each ui (as well as ||u||) should be restricted in (–c, c), where c is the light speed, but this

would be neither necessary nor consistent with the classical framework adopted here. Also, due to finite energy,

ε, one could say that each ui is restricted; however the energy constraint is about the expected value of energy

and is formulated in (13) as an equality constraint rather an inequality one.

7

f(z) = (1/a)3 (3m0 / 4πε)

3/2exp(–3m0 ||u||

2/ 4ε), 0 ≤ xi ≤ a (15)

To see this we first recall [e.g. 5] that, under a set of integral constraints of the form

E[gj(z)] = ∫Ω gj(z) f(z) dz = ηj, j = 1, …, n (16)

the resulting maximum entropy distribution is

f(z) = exp(–λ0 – j = 1

n

λj gj(z)) (17)

where λ0 and λj are constants determined such as to satisfy (8) and (16), respectively. This entails that

the logarithm of f(z) will be a second order polynomial of z. By inspection, we can readily verify that

(15) satisfies this property and also satisfies all constraints (8), (11), (12) and (13).

From (15) we directly observe that the joint distribution f(z) is a product of functions of z’s

coordinates x1, x2, x3, u1, u2, u3. This means that all six random variables are jointly independent. The

independence results from entropy maximization. We also observe a symmetry with respect to the

three velocity coordinates, resulting in uniform distribution of the expected value of energy ε into ε/3

for each direction or degree of freedom; in other words, the equipartition principle is again a result of

entropy maximization.

Combining (14) and (15), the entropy is calculated as

Φ[z] = 3

2 ln

4πe

3

mp2

h2 m0

ε V2/3

=

3

2 ln

4πe

3

mp2

h2 m0

+

3

2 ln ε + ln V (18)

where e is the base of natural logarithms.

We can easily extend these results to find the density function and the entropy of N identical

monoatomic molecules which are in motion in the same cube. Their coordinates form a vector Z =

(z1,…, zN) with 3N location coordinates and 3N velocity coordinates; this could be rearranged as Z =

(X, U), with X = ((x1, x2, x3)1, …, (x1, x2, x3)N) and U = ((u1, u2, u3)1, …, (u1, u2, u3)N). If E is the total

kinetic energy of the N molecules and ε = E/N is the energy per particle, then conservation of energy

yields

E[||U|||2] =2E/m0 = 2Nε/m0 (19)

Application of the ME principle with constraints (8), (11), (12) and (19) will give:

f(Z) = (1/a)3N

(3m0 / 4πε) 3N/2

exp(–3m0 ||U||2/ 4ε) (20)

Hence, the entropy for N particles is:

Φ[Z] = 3N

2 ln

4πe

3

mp2

h2 m0

ε V 2/3

=

3N

2 ln

4πe

3

mp2

h2 m0

+

3N

2 ln ε + N ln V (21)

An interesting property that can be observed in (21) and has important consequences is that, for

constant energy per particle ε and constant volume V, the entropy is proportional to the number of

particles N, i.e., Φ[Z] = A N, where A depends on ε, V and the constants appearing in (21), but not on

N. Now, within the fixed volume V let us consider a part V΄ < V, whose boundaries are not walls, thus

allowing particles entering and leaving V΄. In this case, the number n of particles in V΄ is not fixed and

it can be regarded as a random variable. Due to uniformity, its average will be E[n] =: N´ = N V΄ / V.

For the volume V΄, the conditional entropy for known number of particles n is, obviously, Φ[Z´|n] =

A n, where in Z´ only those coordinates that fall within V´ are counted. Since entropy is an expected

value, the unconditional entropy will be

Φ[Z´] = n = 0

∞

Φ[Z´|n] p(n) = A n = 0

∞

n p(n) = A N´ (22)

8

where p(n) is the probability mass function of n and the sum of n p(n) is by definition the average of n.

This indicates that the entropy for the partial volume V΄ is given by the same formula that provides the

entropy of the fixed volume V. In other words, the same entropic expression is valid whether the

boundaries of a certain volume have fixed walls or are free of walls, where in the latter case the

average number of particles is used. This result would not hold true if there was no proportionality

between Φ and N.

4. The Sackur-Tetrode equation

We can compare the above result to the well-known Sackur-Tetrode equation (after H. M. Tetrode and

O. Sackur, who developed it independently at about the same time in 1912), which gives the standard

(Boltzmann) entropy S of a monatomic classical ideal gas [e.g. 4] as:

S/k = 3N

2 ln

4πe

3 m0

h2 + N +

3N

2 ln ε + N ln

V

N (23)

where k is Boltzmann’s constant. Comparing (21) with (23) we observe that (apart from the

involvement of the constant k and the assumption mp = m0) there is a single but important difference:

The last term N ln V in (21) is replaced by N ln (V/N) + N in the Sackur-Tetrode equation (23).

Clearly, according to this formula there is not proportionality between Φ and N. To obtain (23), the

hypothesis has been made that the gas particles are in fact indistinguishable [e.g. 3, 4], followed by

these steps:

(a) The typical probabilistic model is replaced by the Maxwell-Boltzmann statistics, which

differs from the former by a factor N! (eqn. (2)).

(b) Accordingly, a factor ln(N!), representing the entropy of “indistinguishability” is subtracted

from the entropy in (21).

(c) Using the Stirling approximation for large N, ln(N!) ≈ N ln(N) – N, the resulting entropy for

large N takes the form (23).

5. The proposed framework

The actual purpose behind invoking the “indistinguishability” hypothesis in the Sackur-Tetrode

equation was to make it identical with the classical thermodynamic entropy. However, we will show

that the classical thermodynamic entropy can be derived from the probabilistic entropy in a different,

more natural, manner, without this hypothesis. In this manner, we will keep the entropy φ of a single

particle as given in (18), i.e. a function of the energy per particle ε = E/N and the volume V:

φ = φ(ε, V) = φ(E/N, V) = Φ[z] (24)

and the total entropy of the N particles Φ as given in (21), i.e. a function of the number of particles N,

the total energy E = ε N and the volume V:

Φ = Φ(E, V, N) = Φ[Z] (25)

While obviously Φ/φ = N, the entropy per particle φ is not (and need not be) an intensive property,

because it does depend on the system size V. Also the total entropy Φ is not an extensive property

because it does not have the property Φ(αE, αV, αN) = α Φ(E, V, N), implied by definition of

extensivity [19]. Indeed, it can be readily seen that

Φ(αE, αV, αN) – α Φ(E, V, N) = αN ln α ≠ 0 (26)

The same result is obtained using the definition of extensivity by Tsallis [20], based on the limit of

Φ[Z]/N as N → ∞, which should be finite. However, as seen from (21), when N tends to ∞, so does V,

and thus the limit of Φ[Z]/N is infinite.

9

The fact that φ and Φ are not intensive and extensive quantities, respectively, does not invalidate

them, nor does it mean that we cannot define derived quantities with these properties. Actually, we

will easily do so, but before that it is useful to make some observations on the subjectivity or

objectivity of φ and Φ. We recall that in physics many quantities are subjective in the sense that they

depend on the observer. There may also be some objective quantities that are unaltered if the

observer’s choices change. Thus, the location coordinates (x1, x2, x3) depend on the observer’s choice

of the coordinate frame and change if this frame is translated or rotated; however the distance between

two points remains constant if the frame changes. Also, the velocity depends on the relative motion of

the frame of reference (e.g. the velocity of a car whose speedometer indicates 100 km/h is zero for an

observer moving with the car, 100 km/h for an observer sitting by the road and ~107 000 km/h for a

coordinate system attached to the sun; the kinetic energy, as well as changes thereof, depend on the

reference frame, too).

Surprisingly, however, the entropy Φ[Z] of the gas in a container of a fixed volume V, as given in

(21), does not change with the change of the reference frame, provided that the kinetic energy per gas

molecule ε is defined based on the difference of velocity u from its mean E[u], i.e., ε =

E[m0(||u − E[u]||)2/2]. In this case ε is also invariant, despite that u changes with the reference frame.

The invariance extends to the entropy maximizing distribution. Therefore, despite the fact that

entropies φ and Φ are based on probabilities, they are objective quantities that can be measured and

their magnitude does not depend on the reference frame. As we have seen, though, φ and Φ depend on

the volume V. In principle, this should not be a problem as the volume is also an objective quantity.

However to assign to a system the same entropy, total or per particle, two observers must refer to the

same volume. Again, if the studied system is specified, like in the above gas container example, the

specification of the volume is not a difficult task.

However, in very large systems, such as the atmosphere as a whole, where physical quantities

change with location, specification of the volume is not direct and homogeneity (independence of

expected values from location), as tacitly assumed in our gas container example, does not hold. Still

the equations we have derived are valid but at a local scale, i.e. at a small volume V for which

homogeneity can be assumed. It is convenient to use as local objective quantity the entropy of a single

particle. However, in this case the specification of the volume V will be subjective and thus φ(ε, V)

will also be subjective. To make this objective, we observe that the quantity φ(ε, V) – ln N =: φ*(ε, v)

where v := V/N, is invariant under change of V, provided that the density of particles N/V is fairly

uniform.

This leads to the definition of two derived quantities, which we call standardized entropies and

more specifically, intensive entropy and extensive entropy respectively:

φ*(ε, v) := φ(ε, v) = φ(ε, V) – ln N =

3

2 ln

4πe

3

mp2

h2 m0

+

3

2 ln ε + ln v (27)

Φ*(E,

V,

N) := N φ

*(ε, v) = Φ(E, V, N) – N ln N =

3N

2 ln

4πe

3

mp2

h2 m0

+

3N

2 ln E

N + N ln

V

N (28)

Like the energy per particle, ε, and the volume per particle, v, the standardized entropy per particle

φ*(ε, v), is an intensive property (hence its name) as it does not depend on the size of system that an

observer, justifiably or arbitrarily, considers. Also, like the total energy, E, the volume, V, and the

number of particles, N, which are extensive properties proportional one another (a system of volume

αV, where α is any positive number contains αN particles with a total energy αE), the extensive

entropy Φ*(E, V, N) is indeed an extensive property, as it is easily seen that

Φ*(αE, αV, αN) = α Φ

*(E, V, N) (29)

With reference to the standard thermodynamic entropy S, it can be readily verified that Φ* = S/k.

10

An interpretation of standardized entropies φ*(ε, v) and Φ

*(E, V, N) is that they are not strictly

(probabilistic) entropies, but differences of entropies, taken with the aim to define quantities invariant

under change of the observer’s choices (similar to taking differences of linear coordinates to make

them invariant under translation of the frame of reference). The reference entropies (with respect to

particle location), from which these differences are taken are ln N and N ln N = ln NN for φ

* and Φ

*,

respectively. Thus, φ* or Φ

* measures how much larger the entropy Φ[z] or Φ[Z], respectively, is from

the entropy of a simplified reference system, in which only the particle location, discretized into N

bins, counts (with the number N of bins here representing a discretization of the volume V that is not a

subjective choice of an observer). Clearly, in gases (and fluids in general) there are N and NN ways of

placing one and N particles, respectively, in the N bins, so that the reference entropies are ln N and

N ln N, respectively. Notably, in solids the locations of particles are fixed (only one possible way) and

thus the reference entropy is ln 1 = 0. Thus, φ* and Φ

* in solids become identical to φ and Φ,

respectively, which agrees with the classical result for solids.

An easy perception of φ*(ε, v) is that it is identical to the probabilistic entropy of a system with a

fixed volume equal to v. Also Φ*(E, V, N), is identical to the probabilistic entropy of a system of N

particles, each of which is restricted in a volume v.

6. Equivalence of descriptions by the two entropy measures

We consider again our example with N molecules in motion in a container of volume V and entropy

per particle φ. We make an arbitrary partition of the container into two parts A and B with volumes VA

and VB, respectively, with VA + VB = V. The partition is only mental—no material separation was

made. Therefore, at any time instance any particle can be either in part A with probability π, or in part

B with probability 1 – π.

We assume that we are given the information that a particle is in part A or B. We denote the

conditional entropy, for each of the two cases as φA and φB, respectively. The unconditional entropy

(for the unpartitioned volume) can be calculated from the conditional entropies (see proof in

Appendix) as

φ = π φA + (1 – π) φB + φπ, where φπ := –π ln π – (1 – π) ln (1 – π) (30)

Substituting NA/N for π and NB/N for (1 – π), where NA and NB are the expected number of particles in

parts A and B respectively, we get (see proofs in Appendix)

Φ = ΦA + ΦB + N ln N – NA ln NA – NB ln NB (31)

Φ* = Φ

*A + Φ

*B (32)

Equations (31) and (32) offer equivalent descriptions of our system, using either probabilistic

entropies Φ or extensive entropies Φ*. Apparently, (32) is simpler and therefore the description using

Φ* is more convenient when the number of particles N matters. However, (31) offers additional

insights and helps avoid paradoxical interpretations as will be discussed below.

7. Problems in the indistinguishability hypothesis

As we have seen in section 4, we can produce an entropic quantity S/k essentially equivalent to our

extensive entropy Φ* by employing the indistiguishability hypothesis and the related Maxwell-

Boltzmann statistics. There is an essential philosophical difference between the two approaches. In the

indistiguishability approach, the property of indistiguishability is assigned to our system as a property

that characterizes the essence of the microscopic particles immanent in their existence, which is

retained whether their mathematical description is quantum mechanical or classical. In the kinetic

theory of gases the mathematical description is classical, yet the invoked indistinguishability contrasts

11

microscopic to macroscopic particles or objects, in which no such property has ever been evidenced or

hypothesized. It even contrasts microscopic particles in fluids and in solids; in the latter, according to

the standard interpretation, particles are regarded distinguishable, providing insufficient explanation

about how the indistinguishable particles of a gas or liquid acquired an identity when the fluid froze

and became solid. A common argument offered as an explanation for this is that in a solid, particles

can be distinguished by their positions. This sounds quite reasonable and undisputable. However, it is

equally reasonable to assume that the same applies to a gas, that is, the particles have different

positions x (as well as different velocities u) also in the gaseous phase. The difference is that in a solid

the positions are almost constant while in a gas they are highly variable and unknown, which implies

higher entropy. If in a gas the particle positions (and velocities) were indistinguishable (would it mean

that all particles have the same x and u?), then apparently a different statistical treatment would be

demanded from the outset, rather than an ad hoc correction in the end of the derivation.

On the contrary, our approach, which defines the extensive entropy as a difference of entropies or,

equivalently, as the entropy standardized for the average volume per particle v, does not make any

ontological consideration within the classical physical framework, and does not introduce properties

differentiating microscopic particles from macroscopic ones, neither particles in fluids from those in

solids (recall that in the latter the reference entropy related to their location is zero). It simply

introduces the extensive entropy Φ* and the intensive entropy φ

* as entropic metrics more convenient

that the standard probabilistic ones, Φ and φ, respectively, yet offering descriptions equivalent with the

latter.

At first glance these may seem just philosophical implications, but there are also technical

problems in the indistiguishability approach. These problems can better be viewed with the help of

Figure 1 by applying equation (2) with a small number of particles, N, and a small number of boxes,

M, where M represents a discretization of the volume V. For example, taking M = 3 boxes and N = 2

particles, from (2) we obtain a number of possible arrangements WMB = 4.5, which is absurd (this

number cannot be non-integer). By increasing the number of particles, i.e. N = 9 for M = 3, WMB

becomes lower than 1 (WMB = 0.054), which is even more absurd, indicating a probability greater than

1 (1/WMB = 18.44).

The question is then raised, what does WMB = M N/N! in the Maxwell-Boltzmann statistics really

represent? The answer is illustrated in Figure 1 for the case of N = 2 particles, labelled as a and b,

which are placed into M = 3 boxes. There are W = M N

= 9 possible arrangements. If we regard the

particles as indistinguishable, then the number of possible arrangements is reduced to 6, numbered

from 1 to 6 in Figure 1, where each of numbers 1, 4 and 6 appears once and each of numbers 2, 3 and

5 appear twice. The number 6 is predicted by the Bose-Einstein statistics, i.e. WBE = (N + M – 1)! /

[N! (M – 1)!)] = 4!/(2! 2!) = 6. Now, the number 1/WMB = N!/M N

= 2/9 is, clearly, the probability of

each of the cases 2, 3 and 5. In contrast, each of the cases 1, 4 and 6 has probability 1!/M N = 1/9, so

that all 6 cases indeed sum up to probability 1. We can then observe that the consideration behind

calculating these probabilities does not correspond to maximum entropy, as the possible events are not

equiprobable. Furthermore, the fact that these probabilities are calculated with a denominator M N

indicates that, in fact, the particles are not regarded indistinguishable6; if they were, then the

denominator should be 6, and if entropy was maximized for these 6 cases, then the numerator would

be 1. Thus the number WMB = M N

/N! in (2) seems like a computational trick rather than a result of

consistent analyses based on reasonable hypotheses.

6 In this vein, Papoulis [5] regards the Maxwell-Boltzmann statistics associated with distinguishable particles, in

contrast to statistical thermodynamics texts, such as those cited here, which associate it with indistinguishable

particles.

12

Figure 1 Schematic of possible ways for placing particles a and b into three boxes; the numbers 1-6 indicate

different states if the particles are regarded as indistinguishable.

Were particles indeed indistinguishable, it would be more consistent to use, instead of WMB, the

Bose-Einstein version of possible states, i.e. WBE = (N + M – 1)!/ (N! (M – 1)!). For a large number of

states this would not have any difference from a computational point of view: from the known

asymptotic relationship M b – a Γ(M + a)/Γ(M + b) → 1 as M → ∞ [21], where Γ( ) is the gamma

function, we conclude that, for large M, (M + N – 1)!/(M – 1)! ≈ M N

and thus (3) switches to (2), i.e.

WBE ≈ WMB = M N/N!.7 A second approximation is then needed, and is legitimate for large N (see point

(c) in section 3), to convert the denominator N! into NN, so that eventually the number of states

becomes Wf ≈ WBE ≈ WMB ∝ (M/N)N, which corresponds to maximum entropy (in equiprobability) of

N ln

(M/N).

In other words, the questionable hypothesis of indistinguishability required two steps of

approximation to produce the result N ln

(M/N), which appears merely as an approximation for large M

and N while it is the exact desideratum for any M and N. It is easy to recognize that N ln

(M/N), rather

than ln (M

N/N!), is the exact desideratum, because the former gives an extensive quantity while the

latter actually does not, except as an approximation (cf. the last term of (23), in which M corresponds

to V). In contrast, in the proposed approach the desideratum N ln(M/N) is an exact quantity, not

presupposing any type of approximation or hypothesis. In our approach, the quantity N ln(M/N) does

not represent the entropy, which remains ln(M N

) = N ln M as logic dictates; rather it represents the

difference of two entropies N ln M – N

ln N.

Even if we are satisfied with the derivation of the result N ln(M/N) using the indistinguishability

hypothesis, and suppose that it indeed represents entropy, additional inconsistencies emerge. Their

origin is the fact that entropy is tightly associated with a probability density function and maximum

entropy is associated with the entropy maximizing density function. If we accept the

indistinguishability hypothesis, then the density functions (15) and (20) are no longer valid. Nor is the

independence between density functions of different particles, which is represented in (20). What has

replaced (20), in order for the entropy resulting from it to be ln WBE and/or ln WMB? This is difficult to

answer in general, although in mere combinatorics problems the distributions have been calculated

[22]. However, we can provide an easy demonstration of dependences expanding our example with N

particles placed in M boxes. The probability that all N particles are in the same state i assuming

indistinguishable particles is clearly 1/WBE(N, M) = N! (M – 1)!/(N + M – 1)!. Now we add one particle

and the probability that all N + 1 particles are in state i becomes 1/WBE(N + 1, M) =

(N + 1)!(M − 1)!)/(N + M)!. The conditional probability that the added (N + 1)th particle enters state i

given that the other particles are in state i is the ratio of the two, i.e. WBE(N, M) / WBE(N + 1, M) =

(N + 1)/(N + M). Clearly, for N ≫ M this equals 1 and this means a very strong dependence, which,

7 In this vein, Ben-Naim [4] introduces the Maxwell-Boltzmann statistics as the common limit, for M >> N, of

both Bose-Einstein and Fermi-Dirac statistics.

a, b

a b

a b

a, ba, b

a ba b

a ba b

b a

a, b

a b

b ab a

a, ba, b

a ba b

b a

b a

a, b

b ab a

b ab a

a, ba, b

1

2

3

2

4

5

3

5

6

13

while it is not mathematically impossible, it is physically quite implausible, considering locations of

gas molecules in a container (although it is relevant in other cases involving interacting particles, e.g.

Bose-Einstein condensation). In contrast, in our proposed approach, this conditional probability is

M N

/M N + 1

= 1/M, i.e. independent on N, as reasonably expected for a classical gas in a container.

One may argue that for classical particles in a container the number of particles is always smaller

than the number of possible states, which is infinite, and thus the condition N ≫ M cannot hold.

However, this argument can be refuted by recalling that the entropic mathematical framework is not

necessarily (and only) about the characteristic microscopic states of the particles (also known as

elementary events in probability theory). One can define the entropy for macroscopic states, which

correspond to collections of many, even uncountably many, microscopic states. The ability to make a

description of a system in terms of a partition (a number of mutually exclusive subsets of Ω whose

union equals Ω, where each of the subsets can contain many elementary events), else known as a

coarse grained description, is essential in the entopic framework [5]. Actually, partitioning is tightly

associated with including postulate (d) in the introduction of the entropy function, as described in

section 2; mathematically, this postulate is expressed in terms of a relationship in a partition

refinement [2, p. 3; 18, theorem 1]. As an example, we can coarse grain the description of the

container of particles in motion by defining a partition with only three elements, namely {A1, A2, A3},

with A1 = {z ∈ Ω: 0 ≤ x1 < a/3}, A2 = {z ∈ Ω: a/3 ≤ x1 < 2a/3}, A3 = {z ∈ Ω: 2a/3 ≤ x1 ≤ 1}, where x and

z have been defined in section 3 and a is the length of the container (notice that this partition is defined

considering just one of the coordinates of z, namely x1). In this case, A1 represents a composite event

that a particle is found at the leftmost third of the container, irrespective of the other coordinates and

its velocity. It does not represent a unique characteristic state of a particle (an elementary event) but,

rather it represents a collection of uncountably many events or states, arbitrarily assigned as a

(composite) event by us and thus not representing a characteristic property of the particle per se. The

entropic framework should be valid for this description too, in which M = 3, just as in the example of

Figure 1. In this case, the condition N ≫ M holds true and thus the inconsistency resulting from the

indistinguishability hypothesis is very relevant.

A common justification of the indistinguishability hypothesis has been that there is no way to label

the microscopic particles in a gas and thus they cannot be distinguished from each other. This is

generally correct but irrelevant: Labelling cannot affect the statistical properties and distribution. The

example of Figure 1 can again help us to see that the argument is pointless. In the figure we have

labelled the particles as a and b, and using either the principle of insufficient reason, or the principle of

maximum entropy, we can conclude that, for example, the state where both particles are placed in the

leftmost box is 1/9. We can experimentally test this theoretical result by repeating an experiment many

times, measuring how many times both particles are found in the leftmost box. Now let us erase the

labels and repeat the experiment. Note that the event that both particles are in the leftmost box does

not depend on labels and nothing relating to it will change when we erase the labels. Is it reasonable to

expect that its probability has increased to 1/6, because we erased the labels? We can imagine that a

supporter of the indistinguishability hypothesis would insist that the setting of the above thought

experiment is inappropriate for the microscopic world, in which labelling is not possible at all. But we

can refine the thought experiment to make it more relevant to the microscopic world, assuming two

particles moving in a chamber. Each of them is an atom of each of the two stable isotopes of helium,

so that we can have the possibility to label them by mass number (a = 3He, b =

4He) if we wish. In this

case perhaps we can all agree on the expectation that both atoms will be found in the leftmost third of

the chamber in 1/9 of the time. Now let us remove labelling, or replace the 3He atom with a

4He one to

make labelling impossible. Is it reasonable to expect that now we will find both particles in the

leftmost third of the chamber in 1/6 of the time? There is no doubt that in quantum systems of

interacting particles with a finite number of allowed characteristic states, the question can have a

14

positive answer. However, as explained above, in our classical description of non-interacting

particles, the leftmost third of the chamber does not represent a unique characteristic state of the

particles (an elementary event) but, rather, it represents a collection of uncountably many states (a

composite macroscopic event). Therefore a negative answer of the question is more plausible.

Verification of the negative answer is readily provided by computer simulations of classical systems of

gas particles. As aptly observed by Swendsen [7], if indistinguishability was relevant for this case, it

would require the correction of results obtained from simulating a system of distinguishable particles,

given that computer simulations inevitably use systems of distinguishable particles. A very useful and

enlightening computer application can be found in http://stp.clarku.edu/simulations

/approachtoequilibrium/box/index.html (see also [23]). For example, by setting N = 2, and running the

program for, say, 1000 time steps, we may see that in about 250 (= 1000/4) of them, and not in 333 (=

1000/3), both particles are found in the leftmost half of the box.

8. The Gibbs paradox

We consider two identical parts of a box separated by a diaphragm, each containing N identical

particles with energy E in a volume V. The entropy in each part is Φ(E, V, N). Since independence

between the two parts can be assumed, the total entropy is 2 Φ(E, V, N). If we remove the diaphragm,

the entropy clearly becomes Φ(2E, 2V, 2N) and according to (26) with α = 2, there is an increase of

entropy equal to 2N ln 2. The same result is obtained from (31) by replacing NA and NB with N (and N

with 2N). If we reinsert the diaphragm the entropy will become again 2 Φ(E, V, N) and there will be a

decrease of entropy, –2 N ln 2. These have been regarded as contradictions to thermodynamics, known

as the Gibbs paradox [8,24].

However, according to the probabilistic definition of entropy Φ and its interpretation as a measure

of uncertainty, there is no contradiction or paradox. The increase of entropy after removal of the

diaphragm, quantifies the fact that we have greater uncertainty about the location of each particle.

Initially, we knew each particle’s location with “precision” V and after the removal we lost

information, with the “precision” becoming 2V. The increase of entropy by 2N ln 2, objectively

expresses the lost information, and reflects the physical fact that the motion of each particle has a

bigger domain. Likewise, the decrease of entropy after reinsertion of the diaphragm (which cannot be

thought of as a spontaneous natural phenomenon) reflects the gain of information and the decreased

uncertainty about the position of a particle.

One may contend that the introduction of the diaphragm gives us no information at all about which

part any particular particle might be found in. Such an argument is interesting and in fact can be fully

addressed by the proposed probabilistic approach. Specifically, if we do not know whether a particle is

in one or the other part, then the entropy of the particle is φ as given in (30) and has not been affected

by the diaphragm. However, once the diaphragm is there and once we know that at some time the

particle is in a specific part, then the entropy of the particle has been reduced to φA < φ (the leftmost

term in (30)). The information that the specific particle is in the specific part at a specific time is

important because it is transferred to subsequent times. That is, if we know the part in which a particle

is and there is a diaphragm, then the entropy is φA now and in the future. In contrast, if the diaphragm

is absent, then knowing which part the particle is in makes the entropy φA now, but soon after in the

future this information is lost and the entropy becomes φ. Thus the introduction of the diaphragm is

associated with reduction of uncertainty.

All in all, with the proposed probabilistic framework there is no paradox at all. It is important to see

that the standardized (extensive) entropy Φ* does not change at all when the diaphragm is removed or

reinserted, which indicates full consistence with classical thermodynamics. (It is recalled that the

classical thermodynamic entropy S is identical to Φ*k rather than to Φk).

15

According to the proposed approach, this situation does not change if on either side of the

diaphragm the gases are different, thus removing the asymmetry (or discontinuity) in the two cases

where in the two boxes the gases are different or the same. Again the entropy Φ after the mixing (by

removal of the diaphragm) will be by 2N ln 2 greater than the sum of the entropies in the initial state.

And again, if we consider the mixture of gases as an equivalent single gas (as is the standard practice,

for instance, in the atmospheric mixture) the entropy Φ* will not change, in comparison to the initial

state. The latter is consistent with Ben-Naim’s [4] formulation, “Mixing, as well as demixing, are

something we see but that something has no effect on the MI of the system”, where MI (Missing

Information) actually corresponds to Φ* of our approach, as Ben-Naim uses the indistinguishability

hypothesis. However, since “we see” mixing, there must be an explanation why it happens, and this is

the increased entropy Φ.

9. Summary and concluding remarks

The established idea that, in the microscopic world, particles are indistinguishable and

interchangeable, when used within a classical physical (rather than quantum mechanical) framework,

proves to be problematic on logical and technical grounds. Abandoning this hypothesis does not create

problems but, rather, it helps avoid paradoxical results, such as the Gibbs paradox, associated with a

discontinuity of entropy in mixing gases with different or same particles. More importantly, it avoids a

discontinuity in logic when moving from the microscopic to the macroscopic world, while retaining a

classical framework of description. Such discontinuity cannot stand, as demonstrated by Swendsen

[7,8] using the example of colloids whose particles are large enough to obey classical mechanics and

cannot be regarded indistinguishable. There is no reason why the definition and expression of entropy

in colloids should not be the same as in typical thermodynamic systems, such as ideal gases.

The alternative framework proposed here is the application of the typical probabilistic entropy

definition to thermodynamic systems, such as an ideal gas, treating the particles’ positions and

velocities as random variables, with each particle assigned its own vector of random variables (thus

not using the indistinguishability hypothesis). The entropy calculated in this manner is no longer

extensive. This is not a problem, though. In fact, careful consideration of the probabilistic notion of

entropy reveals that the probabilistic entropy of a system composed of several parts that interact is not

the sum of the entropies of the parts, but includes an additional term, which is the entropy of the

partitioning (related to the probability distribution of a particle being in each part). Furthermore, it is

shown that we can easily adapt the probabilistic entropy to obtain an extensive quantity, which proves

to be identical with the extensive entropy used in classical thermodynamics. This extensive quantity is

the difference of two probabilistic entropies. The rationale in introducing it is to derive a quantity that

is invariant under the observer’s selection of the reference volume for the studied system. It is also

shown that the descriptions of a system by either the typical probabilistic entropy or the extensive

entropy are equivalent, but the quantity that naturally obeys the principle of maximum entropy is the

probabilistic entropy. This provides an explanation for the spontaneous occurrence of mixing, in

which the probabilistic entropy increases while the extensive entropy remains constant.

Is there any usefulness in replacing the typical interpretation of entropy, which includes the

indistinguishability hypothesis, with the proposed framework? A possible reply is that the proposed

interpretation enables potential generalization, rendering the typical formulation of the Second Law a

special case of the more general principle of maximum entropy. Formulating the entropy in pure

probabilistic terms, associating entropy with the notion of a random variable, and interpreting entropy

as none other than uncertainty, makes the same framework applicable in any uncertain system

described by random variables. Removing the barrier between the microscopic and the macroscopic

world, whenever classical descriptions are used in both, enables generalization of mathematical

16

descriptions of processes across any type and scale of systems ruled by uncertainty. Such

generalization is strongly needed in an era dominated by a delusion of the feasibility of deterministic

descriptions of complex systems and a quest for radical reduction of uncertainty, which may be

paralleled to a quest for a perpetual motion machine of the second kind (defined e.g. in ref. [25]). For

example, such generalization on a high level of macroscopization may help explain the marginal

distributional properties of hydrometeorological processes [26], the clustering of rainfall occurrence

[27] and the emergence of Hurst-Kolmogorov dynamics in geophysical and other processes [13, 28].

Acknowledgments The anonymous reviewers’ comments on an earlier version of the paper helped me

to improve the presentation, make the arguments more coherent and avoid some mistaken statements

of the original version. Equally helpful were the comments of an anonymous reviewer of a subsequent

version of the paper, whom I also thank for spotting some errors and for calling my attention to Refs.

[6, 9, 10 and 11]. The discussions with CJ and LM, and particularly their different views, helped me a

lot. I also thank Amilcare Porporato for his interest to read the paper, his detailed corrections and his

encouraging comments. Finally, I am grateful to the Editor Jos Uffink, who despite the rejection of the

original version of the paper (submitted in Jan. 2012) approved my request to resubmit a revision and

thus gave the paper a second chance, whose outcome was positive.

Appendix: Proof of equations (30)-(32)

To show that (30) holds true (cf. also Papoulis, 1991, p. 544) we observe that the unconditional

density f(z) is related to the conditional ones f(z|A) and f(z|B) by

f(z) = π f(z|A)‚ z in A

(1 – π) f(z|B)‚ z in B (33)

Denoting ∫A g(z) dz and ∫B g(z) dz the integral of a function g(z) over the intersection of the domain of z

with the volume A and volume B, respectively, the unconditional entropy will be

φ = Φ[z] = –∫A f(z) ln(f(z)/h(z))dz – ∫B f(z) ln(f(z)/h(z))dz =

= –∫A π f(z|A) ln(π f(z|A)/h(z))dz – ∫B (1 – π) f(z|B) ln((1 – π) f(z|B)/h(z))dz =

= –π ∫A f(z|A) ln(f(z|A)/h(z))dz – π ln π ∫A f(z|A)dz

– (1 – π) ∫B f(z|B) ln(f(z|B)/h(z))dz – (1 – π) ln(1 – π) ∫B f(z|B)dz

We observe that ∫A f(z|A) ln(f(z|A)/h(z))dz = φA and ∫B f(z|B) ln(f(z|B)/h(z))dz = φB, whereas ∫A f(z|A) dz

= ∫B f(z|B) dz = 1. Evidently, then, (30) follows directly.

Further, from (30) we get

φ = (NA/N) φA + (NB/N) φB – (NA/N) ln (NA/N) – (NB/N) ln (NB/N)

Nφ = NA φA + NB φB – NA ln NA – NB ln NB + N ln N

so that (31) follows directly. In turn, from (31) we obtain

φ – ln N = (NA/N) (φA – ln NA) + (NB/N) (φB – ln NB)

Nφ* = NAφ

*A + NB φ

*B

from which (32) follows directly.

References

[1] G. H.Wannier, Statistical Physics, Dover, New York, 532 pp., 1987.

[2] H. S. Robertson, Statistical Thermophysics, Prentice Hall, Englewood Cliffs, NJ, 582 pp., 1993.

[3] K. Stowe, Thermodynamics and Statistical Mechanics (2nd edn., Cambridge Univ. Press, Cambridge, 556 pp.

17

2007.

[4] A. Ben-Naim, A Farewell to Entropy: Statistical Thermodynamics Based on Information, World Scientific Pub.

Singapore, 384 pp., 2008.

[5] A. Papoulis, Probability, Random Variables and Stochastic Processes, 3rd edn., McGraw-Hill, New York, 1991.

[6] N. G. van Kampen, The Gibbs paradox, in Essays in Theoretical Physics, ed. by W. E. Parry, 303-312,

Pergamon, New York, 1984.

[7] R. H. Swendsen, Statistical mechanics of classical systems with distinguishable particles, Journal of Statistical

Physics, 107 (5/6), 1143-1166, 2002.

[8] R. H. Swendsen, Gibbs’ paradox and the definition of entropy, Entropy, 10 (1), 15-18, 2008.

[9] C.-H. Cheng, Thermodynamics of the system of distinguishable particles, Entropy, 11, 326-333, 2009.

[10] M. A. M. Versteegh and D. Dieks, The Gibbs paradox and the distinguishability of identical particles, Am. J.

Phys., 79, 741-746, 2011.

[11] D. S. Corti, Comment on “The Gibbs paradox and the distinguishability of identical particles,” by M. A. M.

Versteegh and D. Dieks, Am. J. Phys., 80, 170-173, 2012.

[12] C. E. Shannon, The mathematical theory of communication, Bell System Technical Journal, 27 (3), 379–423,

1948.

[13] D. Koutsoyiannis, Hurst-Kolmogorov dynamics as a result of extremal entropy production, Physica A, 390 (8),

1424–1432, 2011.

[14] R. H. Swendsen, How physicists disagree on the meaning of entropy, Am. J. Phys., 79 (4), 342-348, 2011.

[15] P. Atkins, Four Laws that Drive the Universe, Oxford Univ. Press, Oxford, 131 pp., 2007.

[16] E. T. Jaynes, Information theory and statistical mechanics, Physical Review, 106 (4), 620-630, 1957.

[17] E. T. Jaynes, Probability Theory: The Logic of Science, Cambridge Univ. Press, Cambridge, 728 pp., 2003.

[18] J. Uffink, Can the maximum entropy principle be explained as a consistency requirement?, Studies In History

and Philosophy of Modern Physics, 26 (3), 223-261, 1995.

[19] E. H. Lieb & J. Yngvason, The entropy of classical thermodynamics, in Entropy, ed. by A. Greven et al.

Princeton Univ. Press, Princeton, NJ, USA, 2003.

[20] C. Tsallis, Entropy, in Encyclopedia of Complexity and Systems Science, ed. by R.A. Meyers, Springer, Berlin,

2009.

[21] M. Abramowitz & I. Stegun, Handbook of Mathematical Functions (10th ed.), 1046 pp., US Government

Printing Office, Washington DC, USA, 1972.

[22] R. K. Niven, Origins of the combinatorial basis of entropy, Bayesian Inference and Maximum Entropy Methods

in Science And Engineering: 27th International Workshop on Bayesian Inference and Maximum Entropy

Methods in Science and Engineering, doi:10.1063/1.2821255, 954, 133-142, 2007.

[23] H. Gould and J. Tobochnik, Statistical and Thermal Physics With Computer Applications, Princeton University

Press, 511 pp., 2010.

[24] E. T. Jaynes, The Gibbs paradox, Maximum entropy and Bayesian methods: proceedings of the Eleventh

International Workshop on Maximum Entropy and Bayesian Methods of Statistical Analysis (Seattle, 1991), 1-

21, Kluwer, Dordrecht, 1992.

[25] D. Kondepudi & I. Prigogine, Modern Thermodynamics, From Heat Engines to Dissipative Structures, Wiley,

Chichester, 1998.

[26] D. Koutsoyiannis, Uncertainty, entropy, scaling and hydrological stochastics, 1, Marginal distributional

properties of hydrological processes and state scaling, Hydrol. Sci. J., 50 (3), 381–404, 2005.

[27] D. Koutsoyiannis, An entropic-stochastic representation of rainfall intermittency: The origin of clustering and

persistence, Water Resour. Res., 42 (1), W01401, doi: 10.1029/2005WR004175, 2006.

[28] D. Koutsoyiannis, A hymn to entropy (Invited talk), IUGG 2011, Melbourne, International Union of Geodesy

and Geophysics, 2011 (http://itia.ntua.gr/1136/).