
PHYSICAL REVIEW A VOLUME 40, NUMBER 8 OCTOBER 15, 1989

Algorithmic randomness and physical entropy

W. H. Zurek
Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545

(Received 17 June 1988; revised manuscript received 31 March 1989)

Algorithmic randomness provides a rigorous, entropylike measure of disorder of an individual, microscopic, definite state of a physical system. It is defined by the size (in binary digits) of the shortest message specifying the microstate uniquely up to the assumed resolution. Equivalently, algorithmic randomness can be expressed as the number of bits in the smallest program for a universal computer that can reproduce the state in question (for instance, by plotting it with the assumed accuracy). In contrast to the traditional definitions of entropy, algorithmic randomness can be used to measure disorder without any recourse to probabilities. Algorithmic randomness is typically very difficult to calculate exactly but relatively easy to estimate. In large systems, probabilistic ensemble definitions of entropy (e.g., the coarse-grained entropy of Gibbs and Boltzmann's entropy H = ln W, as well as Shannon's information-theoretic entropy) provide accurate estimates of the algorithmic entropy of an individual system or its average value for an ensemble. One is thus able to rederive much of thermodynamics and statistical mechanics in a setting very different from the usual.

Physical entropy, I suggest, is a sum of (i) the missing information measured by Shannon's formula and (ii) of the algorithmic information content (algorithmic randomness) present in the available data about the system. This definition of entropy is essential in describing the operation of thermodynamic engines from the viewpoint of information gathering and using systems. These Maxwell demon-type entities are capable of acquiring and processing information and therefore can "decide," on the basis of the results of their measurements and computations, on the best strategy for extracting energy from their surroundings. From their internal point of view the outcome of each measurement is definite. The limits on the thermodynamic efficiency arise not from the ensemble considerations, but rather reflect basic laws of computation. Thus inclusion of algorithmic randomness in the definition of physical entropy allows one to formulate thermodynamics from the Maxwell demon's point of view.

I. ENTROPY AND RANDOMNESS

Until recently it was impossible to define the entropy of a single microscopic state of a system. Rather, one had to consider an equilibrium ensemble of identical systems, and calculate how many microscopic configurations have their macroscopic properties identical with those of the single microstate whose entropy was being sought. Once the number of such macroscopically indistinguishable microscopic configurations W was known, one could employ the celebrated Boltzmann formula

H = ln W ,   (1.1)

or its more general version due to Gibbs. In quantum mechanics the analog of Gibbs's formula was introduced by von Neumann:

H = -Tr rho ln rho .   (1.2)

Here rho represents the density matrix of the considered quantum system.

The unsatisfactory feature of this definition of entropy is well known and immediately apparent. When the system follows the dynamical evolution prescribed by the appropriate Hamiltonian, the quantity H defined by Eqs. (1.1) and (1.2) is constant, as the number of microscopic states remains the same (Liouville's theorem). In particular, a single initial microstate will give rise, at any time t, to just one future microstate. Yet, to prove the second law, one must argue that the number of states in which the system can be found actually increases. As the thesis of the above statement for all reversible, Hamiltonian evolutions is incorrect, one might have expected that the search for a dynamical explanation of irreversibility will lead nowhere. The success of statistical mechanics has shown that, in spite of its self-contradictory features, the probabilistic description of deterministic systems has captured an important element of truth. Boltzmann's equation and his H theorem as well as Gibbs's idea of coarse graining serve as the most prominent examples. Both of these ideas achieve their objective by similar means: They selectively discard part of the information about the investigated system, which justifies the introduction of the probabilistic description.

Yet the success is not complete. In spite of the numerous quantitative confirmations of the thermodynamic formalism, the issues of the "arrow of time" and, more generally, of the nature of entropy continue to be debated more than a century after their inception. Several additional developments have added fuel to this discussion. The realization that irreversibility, entropy, and information play a key role in the discussion of the measurement process in quantum theory is the first of them.



Development of the rigorous theory of communication, with its formal apparatus full of analogies to Boltzmann-Gibbs entropy, is the second. More recently, the development of black-hole thermodynamics as well as the discussions of the arrow of time in the cosmological context force one to reexamine the nature of entropy in a setting very different from the one for which it was originally invented.

So far all of these discussions have relied on a probabilistic formalism in which the entropy of a definite, completely known microstate is always zero, and where the membership of this microstate in an ensemble defines the set of probabilities, the density function used to calculate H. Indeed, the proof that Shannon's missing information measure,

H = -Sum_k p_k log_2 p_k ,   (1.3)

which we shall also call the Boltzmann-Gibbs-Shannon (BGS) entropy, is the unique reasonable additive measure of ignorance appears to have dissuaded many from looking for alternatives, although under some specific circumstances such alternatives have proved quite useful.

The aim of this paper is to investigate a new definition of entropy, which does not use this ensemble strategy and which can be applied to the individual microstates of the system. This entropy is based on the rigorous definition of randomness introduced independently by Solomonoff, Kolmogorov, and Chaitin and further elaborated by Kolmogorov, Chaitin, and others.

The definition of entropy as algorithmic information content capitalizes on an intuitive notion of what is a random number, or a random configuration. Random means difficult to describe or to reproduce. Consider, for example, two binary strings:

01010101010101010101,

10011010010110110010.

The first string can be described as "ten 01's." The second, more random, string has no apparent, simple description. This does not always mean that no simple description exists. For example, there exists an infinite series beginning with 10110101000001001111 that also "looks random" at first sight but has a simple description: the digits of the binary representation of sqrt(2). The algorithmic entropy of the binary string s can be defined as the size of the shortest possible description that suffices to reproduce s. One can make this definition rigorous by considering the length (expressed in the number of digits) of the shortest algorithm, e.g., a computer program for a universal Turing machine, that produces the output s. I shall describe this definition and discuss some properties of algorithmic entropy in Sec. II.
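A crude but concrete way to estimate such description lengths from above is to compress the string with an ordinary file compressor: the compressed output, plus the fixed size of the decompressor, is itself one particular description. The short Python sketch below does this with the standard zlib module; the choice of compressor and the packing of bits into bytes are incidental, and the numbers merely illustrate the trend for long strings of the two kinds.

    import os
    import zlib

    def compressed_bits(bits: str) -> int:
        # Pack the 0/1 characters into bytes and measure the zlib-compressed size.
        # The result (plus the fixed size of the decompressor) bounds the
        # description length of the string from above.
        padded = bits + "0" * (-len(bits) % 8)
        as_bytes = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
        return 8 * len(zlib.compress(as_bytes, 9))

    regular = "01" * 1000                                         # "a thousand 01's"
    typical = "".join(format(b, "08b") for b in os.urandom(250))  # 2000 coin flips

    print(len(regular), compressed_bits(regular))   # 2000 bits -> a few hundred bits
    print(len(typical), compressed_bits(typical))   # 2000 bits -> about 2000 bits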

Section III extends the definition of the algorithmic entropy of a binary string to an idealized physical system, the Boltzmann gas. I describe there (and in Appendix A) how a proper algorithmic implementation of particle indistinguishability allows one to avoid Gibbs's mixing paradox and leads to the correct expression (that is, the Sackur-Tetrode equation) for the entropy of an ideal gas. Moreover, this concrete example provides one with the

opportunity to point out some of the difficulties encountered in an attempt to define the algorithmic information content of specific physical systems. These difficulties are further considered and partially settled in Appendix B.

A comparison of algorithmic randomness, regarded as entropy, with the Boltzmann and Gibbs-Shannon ensemble entropies is presented in Sec. IV. In contrast to the -Tr rho ln rho entropies, algorithmic entropy can change even in the course of reversible dynamical evolutions. As mentioned briefly in the first part of Sec. IV and considered in more detail in Appendix C, this behavior allows one to propose an algorithmic version of the second law obeyed even by integrable, dynamically evolving systems. By contrast, in the usual equilibrium thermodynamic ensembles algorithmic and statistical approaches are likely to provide almost identical answers for the value of entropy. Therefore, even if one were to claim that it is the algorithmic information content alone that measures the disorder of the state of the system, the use of the less direct but more convenient probabilistic prescriptions for entropy could be rigorously justified.

While Secs. III and IV do contain some new applications and novel ways of looking at the algorithmic concepts, they can be regarded as an extended introduction to Sec. V, which is concerned with formulating the laws of thermodynamics from the internal viewpoint of an "information gathering and using system" (IGUS), an observerlike entity that can perform measurements, process the acquired information, and use the results to take actions aimed at optimizing the thermodynamic efficiency of the engine it controls. Section V proposes that both randomness and missing information play a role in defining physical entropy, that is, the quantity that limits the amount of the internal energy of a physical system that can be converted into useful work by an IGUS. I demonstrate there that the physical entropy should be regarded as a sum of two separate contributions, one measuring the randomness of the already known aspects of the state of the system, the other expressing the remaining ignorance of the observer about the actual state. This recognition of the dual nature of physical entropy allows one to discuss the thermodynamics of engines operated by a modern-day equivalent of a Maxwell demon (a universal Turing machine capable of performing measurements) from the point of view of such an entity, where the measurement outcomes are definite, without a reference to the statistical ensemble describing all conceivable measurement outcomes. Some of the crucial aspects of the laws of thermodynamics can be regarded, from that internal point of view, as a consequence of the laws of computation.

A discussion of the consequences of adopting the new definition of physical entropy proposed in Sec. V, including both of its complementary contributions, randomness and ignorance, is conducted in Sec. VI. There we shall also touch on the subject of extending the applicability of algorithmic entropy to quantum systems. Further implications of the new definition of entropy for the second law and for the issues arising in the context of the quantum theory of measurements will be explored in future publications.

The possible relevance of algorithmic information content to thermodynamics has been anticipated by a brief but insightful discussion in the seminal paper by Bennett. Algorithmic information has also been used to characterize information generated by a dynamically evolving chaotic system. This approach, advocated by Ford, focuses on a sequence of states describing a trajectory of a system.

The following are two notes about the manner in which this paper was written and the way in which it should be read.

(i) Algorithmic information theory is a branch of mathematics, and many of the results employed in the discussion below can be established rigorously. I have opted instead to paraphrase them in an informal manner, revealing the key ideas but, for the sake of brevity and clarity, without attempting to be rigorous. This comment applies equally well to the theorems proposed below for the first time and to the results obtained elsewhere, which are just described here. Moreover, in order to illustrate the physical significance of the algorithmic approach as early as possible, I discuss only a comfortable minimum of the results important from the viewpoint of the intended physical applications. Readers interested in a more extensive and rigorous overview are directed to Refs. 20-26.

(ii) Those looking for a quick "preview" of the conclusions of this work may find it useful to read the beginning of Sec. V A, as well as Secs. VI and VII, after the first perusal of the body of the paper. The purpose of this paper is not to advocate a complete replacement of the ensemble entropy with the algorithmic entropy (as Secs. II, III, and IV may appear to suggest to a casual reader), but rather to conclude that the physically relevant entropy should be defined from the internal viewpoint of the observer, and that it has components of both probabilistic and algorithmic origin. In a sense, the first half of the paper is an extended introduction to the algorithmic ingredient of the new definition of physical entropy. Its physical significance becomes clear only in Secs. V and VI.

II. ALGORITHMIC RANDOMNESS OF BINARY STRINGS

A natural measure of disorder or, equivalently, of the degree of randomness of a state of a system is the size of the smallest prescription required to specify it with some assumed accuracy. Ordered, regular states can be reconstructed from concise algorithms. Random states, by comparison, require many more bits of information to specify. In this very intuitive sense, the required amount of information quantifies the degree of randomness.

The strategy we shall adopt to measure the randomness of states of a physical system will involve a simple computer. A program for this computer will be considered a good description of the system if the output contains sufficient information, in some standard format, to reconstruct (for instance, to plot) the state with the required accuracy. Positions and momenta of all the particles of an ideal gas are an example of such an output. Binary strings, for instance in the form of numbers specifying these coordinates, are the key ingredient of

such prescriptions. They can also be thought of as representing a state of a one-dimensional chain of spins and therefore as a direct description of an elementary example of a physical system. The aim of this section is to discuss a measure of randomness of binary strings known as the algorithmic information content, algorithmic randomness, algorithmic entropy, or, sometimes, as the algorithmic complexity.

The algorithmic randomness K(s) of a binary string s is defined as the length, in the number of digits, of the shortest program s* that will produce the output s and halt when used as input of a universal Turing machine T:

K(s) = |s*| .   (2.1)

A computer U is universal, as is the Turing machine used in the above definition, if for any other computer C there is a prefix one can add to any program p so that the prefixed program will execute the same computation on computer U as p alone did on C. The algorithmic randomness of a typical string s is to the leading order given by its length in bits, |s|:

K(s) = |s| .   (2.2)

In other words, typical strings are random and cannot be generated from more concise programs. Small corrections to this estimate of a typical value of K(s) depend on the issues that will be considered below. When s is interpreted as a binary representation of an integer, Eq. (2.2) implies that K(s) = log_2(s). We shall restrict ourselves to binary representations in the remainder of this paper.
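The statement that typical strings are incompressible follows from a simple counting argument: there are fewer than 2^(n-c) programs shorter than n - c bits, while there are 2^n strings of length n, so for any positive integer c

#{ s : |s| = n, K(s) < n - c } / 2^n <= (2^(n-c) - 1)/2^n < 2^(-c) ;

that is, at most one string in 2^c can be compressed by c or more bits.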

While typical strings possess algorithmic information content comparable to their length, and are therefore random, there exist strings which are obviously algorithmically simple. We have already listed some examples from both categories in the Introduction. It is possible to define whole classes of strings which are either algorithmically random or simple. For instance, the reader can verify that minimal programs used to define algorithmic information content must be algorithmically random. Let us also note that the logarithm relating the information content of a typical string with the size of the integer it represents is suggestive of the "log" appearing in the probabilistic definition of entropy. To enumerate, and hence to specify, W distinct, equally probable states of a physical system one obviously needs numbers of order W. Hence the typical algorithmic entropy of such a state, crudely defined as the size of the number giving its "address," can be expected to be similar to the estimate obtained through the Boltzmann formula, Eq. (1.1). We can thus anticipate that the numerical estimates of the algorithmic randomness will be compatible with the more traditional probabilistic, ensemble calculations.

There are several points that must be clarified before algorithmic entropy can be accepted as an unambiguous measure of randomness. The first of them, the potential dependence of the size of the program on the "addressee," has already been largely bypassed by relying on a universal computer U. Use of different universal computers makes at most a difference bounded from above by a finite constant [that is, of order unity, O(1)] in the size of the minimal program.


The next difficulty arises from a somewhat counterintuitive fact: It is often possible to generate all the members of a certain set with a smaller algorithm than the minimal algorithm necessary to print out just one of them. To illustrate this fact by a simple and concrete example (we shall come back to more physically motivated examples later in the paper), we consider a program that can list all finite strings corresponding to natural numbers. A simple "counting loop," a Turing-machine version of the FORTRAN loop

      X=0
    1 PRINT X
      X=X+1
      GOTO 1

will do the job. Given sufficient time this program will generate every finite string s, starting with 1. A slightly more sophisticated program can print out all the binary strings arranged in what is known as lexicographic order,

A, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, 0000, 0001, 0010, . . . .   (2.3)

Above, A stands for the empty string. Lexicographic ordering establishes a correspondence between all the strings and the set of natural numbers. We can either allow such strings to be a part of one long output tape, perhaps separated by commas, or we could add an instruction to erase the string (number) N before the string (number) N+1 is printed. This way, instead of a tape with all the strings we could have an erasable tape with strings appearing "one at a time." (The disadvantage of this arrangement is that unless we intervene, the numbers will all disappear.)
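The same enumeration is easy to phrase in a modern language; the short Python generator below is only an illustrative stand-in for the Turing-machine loop above, and reproduces the ordering of Eq. (2.3).

    from itertools import count, product

    def lexicographic_strings():
        # Yield the empty string, then all binary strings ordered by length
        # and, within each length, by numerical value, as in Eq. (2.3).
        yield ""                      # the empty string, denoted A in the text
        for length in count(1):
            for digits in product("01", repeat=length):
                yield "".join(digits)

    gen = lexicographic_strings()
    for _ in range(16):
        print(next(gen) or "A")       # A, 0, 1, 00, 01, 10, 11, 000, ...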

We have just demonstrated how one can use a concise program to list arbitrarily many strings, each of which is supposed, typically, to have randomness given by its length, Eq. (2.2). Clearly, we are running into a contradiction. Unless one specifies in more detail the requirements a program must satisfy, we could encounter an uncomfortable upper limit on the entropy of any finite binary string, given by the size of the binary version of the program constructing the lexicographic list, Eq. (2.3).

One can dispose of the difficulty illustrated above in a few steps. To begin with, one can demand that all the programs halt after a finite number of steps. This excludes the program described above, but it does not settle the issue. One can modify the loop by forcing it to halt after some large but algorithmically simple upper limit NMAX (e.g., NMAX = 1111111111111...111). Now the apparent upper limit on the algorithmic entropy of all the numbers smaller than NMAX would be somewhat larger, but still far from reasonable. A suitably chosen NMAX can be encoded into a very compact subroutine.

The next remedy is to demand that after the computer halts, the output tape should contain nothing but the output string s. This condition obviously settles the issue raised above. It will be indispensable in the discussion of the algorithmic version of the second law in Appendix C.

Another requirement often imposed on the minimal programs is the demand that they be prefix free or self-delimiting. Self-delimiting programs carry within them information about their size: They allow the computer to "decide" when to stop reading the input tape. They are also known as "instantaneous" codes, since they can be executed alone, without any additional "prefixes" or "end markers." An additional benefit of using self-delimiting codes is the simplicity of employing them as subroutines.

What is most important from the point of view of this paper is that the use of self-delimiting codes allows one to formulate algorithmic information theory in a manner more closely analogous to the Shannon information theory. We shall find this especially useful in Sec. IV, in the course of the discussion of the relationship between ensemble entropies and algorithmic randomness, where the connection between the unique decodability of a sequence of bits regarded as a message and the self-delimiting property shall be further explored.

With the above considerations in mind we can now define the joint algorithmic entropy of two strings s, t as the size of the smallest self-delimiting program that makes U calculate both of them. Joint entropy satisfies the familiar inequality

K(s,t) <= K(s) + K(t) + O(1)   (2.4)

only when the admissible programs are self-delimiting. The price one pays for having Eq. (2.4) comes in the form of a modification of the estimate of the typical value of the algorithmic information of a single string:

K(s) = |s| + O(log_2 |s|) + O(1) = |s| + K(|s|) + O(1) .   (2.5)

The origin of this logarithmic correction to the first guess given by Eq. (2.2) is simple: The self-delimiting program used to reproduce s must contain the information about the number of digits on the input tape. To encode this, an additional log_2 |s| bits will typically be necessary.

Another formula modified by the requirement of self-delimiting programs is the relation for the conditional entropy of string s given t, K(s|t). Defined as the size of the program needed to calculate s from the input t, it is connected with the joint information through the equation

K(s,t) = K(t) + K(s|t, K(t)) + O(1) .   (2.6)

This differs somewhat from the "classical" Shannon information theory, where the conditional information satisfies K(s,t) = K(t) + K(s|t).

Mutual information is a quantity that has obvious intuitive significance. It is defined both in the algorithmic and the classical information theory through

K(s:t) = K(s) + K(t) - K(s,t) .   (2.7)


Its meaning is clear: It provides a measure of the interdependence of two strings. That is, it indicates how many more bits of information one needs to calculate s and t separately rather than jointly. Mutual information is symmetrical. Pairs of strings which have mutual information of zero are called algorithmically independent.
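The same crude compression device used above for K(s) can also be used to illustrate Eq. (2.7): the compressed sizes of s, t, and of their concatenation serve as rough stand-ins for K(s), K(t), and K(s,t). In the Python sketch below the two strings share half of their content by construction; the strings and the compressor are, of course, purely illustrative.

    import os
    import zlib

    def c_bits(data: bytes) -> int:
        # Compressed size in bits; a rough stand-in for algorithmic information.
        return 8 * len(zlib.compress(data, 9))

    s = os.urandom(4000)                    # a "typical" (incompressible) string
    t = s[:2000] + os.urandom(2000)         # shares its first half with s

    mutual = c_bits(s) + c_bits(t) - c_bits(s + t)    # analog of Eq. (2.7)
    print(mutual)   # close to the 16 000 shared bits, up to compressor overhead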

Most of the binary strings of a certain fixed length are algorithmically random [K(s) = |s|] and independent [K(s:t) = 0]. However, there are obvious examples of strings that appear random but are in fact algorithmically simple [K(s) << |s|]. Binary representations of easily calculable numbers such as sqrt(2), e, pi, etc., are algorithmically simple. Some of the algorithmically simple numbers are easy to spot. Consider, for example, a string that has a low density of randomly distributed 1's. A simple strategy designed to minimize program size is to encode the sizes of the consecutive intervals of 0's. Readers are invited to demonstrate that this strategy will, in the limit of a very long string, lead to an estimate of algorithmic entropy equal to the Shannon entropy calculated on the basis of the probabilities of 0's and 1's.
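A minimal sketch of this exercise is given below (Python). The gap lengths are written here in the simple self-delimiting Elias gamma code, which is convenient but not optimal; it already brings the cost down to the order of the Shannon entropy per symbol, and an optimal code for the geometrically distributed gaps would close the remaining gap.

    import math
    import random

    def elias_gamma_bits(n: int) -> int:
        # Length of the Elias gamma code for a positive integer n:
        # a self-delimiting code of 2*floor(log2 n) + 1 bits.
        return 2 * int(math.log2(n)) + 1

    def run_length_bits(bits: str) -> int:
        # Encode the string as the sizes of the gaps between consecutive 1's.
        total, gap = 0, 1
        for b in bits:
            if b == "1":
                total += elias_gamma_bits(gap)
                gap = 1
            else:
                gap += 1
        return total + elias_gamma_bits(gap)      # the trailing run of 0's

    p, n = 0.02, 200_000
    s = "".join("1" if random.random() < p else "0" for _ in range(n))
    shannon = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    print(run_length_bits(s) / n, shannon)   # both are a small fraction of a bit per symbol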

Chaitin has demonstrated that the impossibility of proving the randomness of a "random-looking" long string is a natural consequence of the algorithmic definition of randomness, and can be regarded as a manifestation of Gödel's theorem. The proof can be paraphrased as follows: Suppose that a short algorithm could be used to demonstrate the randomness of a string l much longer than this algorithm. One could then use such a program as a subroutine of a somewhat larger program designed to generate such an l. To accomplish this, one would include the algorithm in a loop generating proofs of randomness of integers in order of increasing proof size. If a proof of randomness of some integer considerably greater than the program were accomplished, the program would print that integer and halt. Thus having a short program prove that l was random would make it algorithmically nonrandom. This contradiction can be resolved only if we conclude that either the axioms used in the proof of randomness were inconsistent, or that the search for proofs of randomness would continue forever without any successful case in the domain of strings much longer than the program. Therefore it may be not just difficult to estimate the algorithmic entropy of a specific binary string; it will also often be impossible to distinguish random and nonrandom configurations. In Sec. III we shall show that in spite of this difficulty associated with calculating K(s) exactly, the algorithmic viewpoint can lead to useful insights into the entropy of the Boltzmann gas.

III. ALGORITHMIC ENTROPY OF A PHYSICAL SYSTEM: BOLTZMANN GAS

Consider a container with N particles of "gas" inside. These "atoms" of gas can be thought of as miniature "hard spheres" with no internal degrees of freedom, and with dimensions much smaller than the size of the container. In short, we are dealing with a "Boltzmann gas."

The discussion in this section shall focus on the task of describing the system by means of the most compact "message."

"message. " We shall imagine that the microstate of thesystem is known, and our task is to communicate it tosomeone else or to record it in some reproducible fashion.For definiteness, we assume in accord with Sec. II thatthe addressee is a certain specific Turing machine U. Asthe proof that the message was "understood" we shall ac-cept the ability of the addressee to reproduce, by somereasonable means, the state of the system up to thespecified accuracy.

To describe the microstate we adopt the following strategy. (i) We divide the volume occupied by the particles into a lattice of cells small compared with the separations of the particles, so that the probability of finding two particles in the same cell is negligibly small. (This assumption of one particle per cell is convenient, but not essential.) (ii) We specify which of the cells contain particles and which of them are empty. The size of the cells determines the resolution with which the state of the system is given. More sophisticated strategies, which do not employ fixed grids but, rather, explore all possible methods of encoding that satisfy some criterion for the accuracy of the description, can also be employed. In essence, one is dealing with an issue analogous to those encountered in the domain of coding and information theory.

Algorithmic randomness of the microstate is given by the length of the shortest computer program sufficient to "reproduce" its state with the requisite resolution. The form of the reproduction is somewhat arbitrary. For concreteness, one can imagine the following procedure, which could be regarded as a substitute for "computer graphics." Suppose that a possible output of the universal Turing machine U is a multidimensional "plotter" with a discrete resolution. A state of the system will be considered "reproduced" if the machine fills the pixels of the plot space corresponding to empty cells with 0 and the pixels corresponding to filled cells with 1.

Machines with access to several multidimensional tapes of this sort and other kinds of hardware [e.g., random-access memory (RAM) storage, etc.] are more convenient to use, and correspond more closely to real-world computers, but can perform only the very same tasks as a one-tape universal Turing machine. Using them will not influence estimates of the algorithmic entropy. This last statement is true in general (it is responsible for the importance attached to Turing machines by the theory of computation) but is particularly easy to see for the "graphical" representation of the Boltzmann gas. A primitive, but for our purposes sufficient, way to produce a plot would be to have U print out a tape with consecutive sections separated by commas and corresponding to rows in the phase space of the system, partitioned into cells and organized in some definite order.

It is important to draw the distinction between the "binary image" of the gas microstate and the program that can generate it. An image provides a direct description of the state of the system. The program generating it contains the same information, but represented in a different, typically more concise manner. There are, generally, at least several programs that can generate the same plot, the same binary image.


The length of the shortest one will define the algorithmic randomness of the gas microstate.

The use of the analogy between the actual microstate of a many-body system (e.g., a gas or a fluid) and the corresponding binary image (spin lattice) is, of course, not new. It was extensively and successfully exploited in the investigation of phase transitions. Our treatment will focus on a different aspect of binary images. Nevertheless, previous successful applications are not unrelated, and further motivate our approach.

Evaluation of the algorithmic entropy of different microscopic configurations of a gas contained in a D-dimensional cube can be carried out with the help of a special-purpose computer L following a straightforward, but not necessarily optimal, algorithm. L would first read in the key data (the number of dimensions D, the number of particles N). Subsequently, N x 2D phase-space coordinates of the individual particles would be generated in a double loop. For a typical (that is, algorithmically random) configuration individual coordinates would have to be generated by N x 2D separate "subroutines," which contain the appropriate data. The first 2D numbers can then be used to print 1 in the appropriate empty square of the tape corresponding to the position and momentum of the first particle, that is, at the location

(x", /b, „,x"'/b, , . . . , x"'/b. „p',"/b, , . . . , p" /b ) .

The process is repeated N times, after which the machine fills in the remaining blank spaces with 0's, and halts. For instance, when D = 2, a binary image (a primitive plot) could begin by printing out a string of pixels of the plot filled in with the appropriate symbols corresponding to the first row: x_1/Delta_x = 0, x_2/Delta_x = 0, p_1/Delta_p = 0, which are kept constant, and the variable p_2/Delta_p changing from its minimal to its maximal value. The row with x_1/Delta_x = x_2/Delta_x = 0, p_1/Delta_p = 1 follows, and so on, until all of the volume of the phase space of the gas in the volume of the container is mapped out. The size of the program for the special-purpose machine L can now be reported as the estimate (and, almost certainly, an overestimate) of the algorithmic randomness of the state of the gas. The additional information that needs to be supplied to a universal computer U can clearly be encoded in the form of a compact binary string.
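The job of the special-purpose machine L is easy to mimic in a few lines; the Python sketch below (with arbitrary illustrative values of N, D, and the grid resolution) builds the binary image directly from a list of phase-space coordinates, which is all the "plotter" output is required to contain.

    import numpy as np

    rng = np.random.default_rng(0)
    N, D = 100, 2                  # particles and spatial dimensions (illustrative)
    cells_x, cells_p = 64, 64      # grid resolution per position and momentum axis

    # Positions in [0, 1) and momenta in [-1, 1), standing in for a known microstate.
    x = rng.random((N, D))
    p = rng.uniform(-1.0, 1.0, (N, D))

    # The "binary image": one pixel per phase-space cell, set to 1 where a particle sits.
    image = np.zeros((cells_x,) * D + (cells_p,) * D, dtype=np.uint8)
    ix = np.floor(x * cells_x).astype(int)
    ip = np.floor((p + 1.0) / 2.0 * cells_p).astype(int)
    for k in range(N):
        image[tuple(ix[k]) + tuple(ip[k])] = 1

    print(image.size, "cells,", int(image.sum()), "occupied")   # the plot L would print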

When the configuration of the system is nonrandom (for example, when the particles are placed on a regular lattice), the size of the program can be decreased by generating their coordinates by means of a simple subroutine. Hence we can expect to shorten programs generating plots of less random configurations of the Boltzmann gas.

The measure of randomness suggested above is very different from the one based on ensembles. In particular, it bypasses the concept of probability, and can be used to define the entropy of a completely specified microscopic state of an individual dynamical system. The question therefore arises as to whether the value of the algorithmic entropy is related to the value of the statistical or thermodynamic entropy. This issue will be considered throughout the course of this paper. In Appendix A we implement the strategy outlined above to estimate the algorithmic randomness of a typical configuration of the classical Boltzmann gas and explore one of its aspects, indistinguishability, in more detail. It is demonstrated there that the algorithmic randomness of a typical microstate of the gas of N indistinguishable particles in D dimensions confined to the volume V is

K = N log_2 [ V / (N DeltaV) ] + (N D / 2) log_2 [ m k T / (Delta_p)^2 ] + O(1) .   (3.1)

Here k is the Boltzmann constant, T the temperature, while DeltaV = (Delta_x)^D and Delta_p define the resolution in the configuration and momentum halves of the phase space.

Equation (3.1) is known as the Sackur-Tetrode equation for the entropy of the ideal gas. A shortcut derivation of the Sackur-Tetrode equation would proceed as follows. There are Omega = C^N / N! distinct ways of distributing N indistinguishable particles of gas among the available C = (V/DeltaV)(sqrt(mkT)/Delta_p)^D cells in the phase space. One way to specify a certain configuration is to give its "address" among the Omega possibilities. The typical value of the address numeral is then ~Omega. Hence its binary specification requires ~log_2 Omega bits. The Sackur-Tetrode equation (3.1) follows.
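The counting behind this shortcut is easy to check numerically: the Python sketch below (the values of N and C are arbitrary) evaluates log_2 Omega = log_2(C^N/N!) via lgamma and compares it with the Stirling form N log_2(C/N) + N log_2 e, confirming that the size of the "address" grows as N log_2(C/N) to leading order.

    import math

    def log2_omega(C: float, N: int) -> float:
        # log2 of C**N / N!, computed via lgamma to avoid overflow
        return N * math.log2(C) - math.lgamma(N + 1) / math.log(2)

    def stirling_estimate(C: float, N: int) -> float:
        # Leading Stirling approximation: log2(C^N/N!) ~ N log2(C/N) + N log2(e)
        return N * math.log2(C / N) + N * math.log2(math.e)

    N, C = 10**6, 10**9            # C is the number of available cells, C >> N
    print(log2_omega(C, N))        # about 1.14e7 bits
    print(stirling_estimate(C, N)) # agrees to within ~log2(N) bits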

This second argument yields the correct answer. Moreover, it establishes an intimate connection with Boltzmann's approach to the entropy of a microcanonical ensemble of ideal gas, as it is directly equivalent to the "counting of the complexions." Furthermore, the approach based on the size of the address can be easily adopted for an arbitrary microcanonical ensemble. Yet it follows a philosophy that is quite different from the explicitly descriptive strategy introduced earlier in this section and implemented in Appendix A. Algorithmic randomness is, of course, given by the shortest program that generates the description of the state in question as an output. However, the fact that these two very different programming strategies result in an approximately identical answer is encouraging, as it implies that, in spite of undecidability, finding a reasonable estimate of the algorithmic randomness of a typical state may not be that difficult.

The value of the statistical entropy of a classical system depends on the "resolution" (the volume of the cells in the phase space corresponding to distinct microstates) as well as on the structure of the "grid" these cells form. Both of these issues, the dependence on the resolution as well as the subjectivity associated with the "coarse graining," have their algorithmic counterparts. For the purpose of this paper it would have been sufficient to adopt the attitude that both of these features of the grid are determined by the observer: The measurements performed by the observer define a certain "natural grid," which in turn specifies the value of the algorithmic randomness of the state of the system. One might, therefore, expect that the algorithmic randomness of a physical system is as highly subjective as statistical entropy. As discussed in more detail in Appendix B, this remark applies without qualifications only to the dependence on the resolution. Quantum mechanics must be invoked to fix the volume of the cells in the phase space and thus to assure that the entropy is finite.


There is, however, a strategy that removes some of the subjectivity associated with the shape of the grid: In the intuitive definition of the algorithmic randomness as the size of the message, it is natural to demand that a description of the coarse graining must be included as a part of the message. As demonstrated in more detail in Appendix B, this strategy relegates much of the subjectivity to the differences between the distinct Turing machines. This, in turn, allows such residual subjectivity to be quantified by employing algorithmic methods.

IV. ENSEMBLE ESTIMATES OF ALGORITHMIC RANDOMNESS

The purpose of this section is to investigate the relation between the algorithmic and statistical measures of disorder. Physics has traditionally employed entropy in two distinct roles: (i) as a measure of irreversibility and (ii) as an equilibrium thermodynamic potential.

The use of entropy in the formulation of the second law and the conflict between the irreversibility of thermodynamics and the reversibility of the underlying dynamics remains a leading theme of statistical mechanics. Indeed, the observation that in a closed system Tr rho ln rho is a constant of motion has been (and remains) one of the key reasons for dissatisfaction with the traditional definitions of entropy. One might be concerned that algorithmic definitions of entropy will automatically inherit this troublesome feature. In particular, one might argue that, since the time-evolved state of the system is obtained from the initial state by the action of the Hamiltonian, the randomness of all the states along the trajectory should not exceed the sum of the algorithmic information content of the initial state and of the Hamiltonian. Therefore one would be forced to conclude that all the descendant states of an algorithmically simple state must be algorithmically simple. I demonstrate in Appendix C that this conclusion is incorrect: The typical algorithmic randomness of a microstate, even if it is descended from a simple initial condition, is consistent with the Boltzmann entropy of the corresponding microcanonical ensemble.

One way of understanding this is to recognize that the labeling strategy employed in the derivation of Eq. (3.1) in Sec. III is automatically accomplished in the course of a dynamical evolution of an ergodic system: There the ordered sequence of microstates is generated by the Hamiltonian. Time, the duration of the evolution from some initial state, can be regarded as a label of a microstate. In a perfectly ergodic system all of the microstates are traversed before the system returns (having completed its Poincaré cycle P) to the initial configuration. When this initial state is chosen to be algorithmically simple, the leading contribution to the algorithmic randomness of a typical descendant is ~log_2 P, which leads one to the correct answer. As described in more detail in Appendix C, the requirement of a specific output, corresponding to a description of a single, definite microstate, at the completion of the computation is crucial in arriving at the correct algorithmic estimate of the randomness of the time-evolved state. Indeed, for reasons analogous to those discussed in connection with Eq. (2.2), a program that would list all the microstates along the trajectory

would be more concise than the program that generates a description of a definite, typical microstate.

Irreversibility is of obvious interest as a separate issue. Therefore we shall leave a more detailed investigation of the algorithmic approach to irreversibility for the sequel of this paper. By contrast, the relation between algorithmic randomness and the Boltzmann-Gibbs-Shannon entropy considered in the remainder of this section is essential for the further discussion of physical entropy and is the true focus of this section.

A result of Sec. III, the algorithmic derivation of the Sackur-Tetrode equation for the entropy of the ideal gas, was a forerunner of the conclusion we shall reach: We shall demonstrate that, in general, probabilistic considerations result in accurate estimates of the average algorithmic entropy of a member of an ensemble. In spite of this conclusion, we shall go on in the next section to suggest that the physical entropy, the true measure of the amount of energy extractable from the system, contains both algorithmic and missing-information contributions.

A. Entropy of thermodynamic ensembles

Consider a statistical ensemble E, a collection of microstates {s_k} that occur with probabilities p_k. The statistical entropy of this ensemble is given by the formula

H(E) = Sum_k p_k log_2 (1/p_k) .   (4.1)

In statistical mechanics Eq. (4.1) plays a key role in deriving thermodynamic characteristics of a system from microphysics. In this subsection we shall discuss ensembles that are defined concisely. That is, we shall assume that there exists a concise algorithm epsilon which, given sufficient time, will generate as its output a description of every relevant microstate of E and compute its probability with some (arbitrarily high, but finite) predetermined accuracy. As E may contain infinitely many microstates, we shall not demand of epsilon that it halt. Instead, we shall require that the microstates listed as the output be "weakly sorted" according to their probabilities in the following sense: For every arbitrarily small but finite delta > 0 there should exist a finite time (number of steps) N_delta after which descriptions of all the states of E with probabilities p_k > delta must be listed. The length of the smallest program epsilon* which satisfies these criteria will be regarded as the algorithmic information content of the ensemble E. The inequality

K(E) = |epsilon*| << H(E)   (4.2)

will define, for the purpose of this paper, thermodynamic ensembles. In the context of statistical mechanics, thermodynamic ensembles are determined by macroscopic (which we take to mean "simply describable") constraints, and the probabilities are easily computable functions of the microstates. This motivates our focus on thermodynamic ensembles. They will allow us to show that the concept of entropy in equilibrium thermodynamics can be based on the algorithmic foundation. However, as we shall discuss at some length in Sec. V, these are not the only ensembles possible.


The three kinds of ensembles widely used in statistical mechanics (microcanonical, canonical, and grand canonical) are clearly "thermodynamic" for sufficiently large systems in the sense of the defining Eq. (4.2).

Here we shall demonstrate that the average algorithmic entropy of a member of a thermodynamic ensemble,

<K>_E = Sum_{s_k in E} p_k K(s_k) ,   (4.3)

is closely approximated by the corresponding Shannon entropy, Eq. (4.1):

<K>_E = Sum_{s_k in E} p_k log_2 (1/p_k) .   (4.4)

In the context of the algorithmic information theory this connection between <K>_E and H(E) was first established by Kolmogorov, Levin, and Zvonkin (see Ref. 24) and was extended and strengthened, using advantages of the definition of algorithmic randomness by means of self-delimiting programs, by Chaitin (see especially Theorem 3.4, as well as 3.2 and 3.5 in Ref. 21). Bennett has pointed out the physical significance of this result for the thermodynamic case defined by Eq. (4.2). Below, I shall present a new, physically motivated proof of the key inequalities and, in Sec. V, apply it to situations involving measurements that are sufficiently detailed to invalidate Eq. (4.2).

The line of argument I shall follow to investigate the relationship between H(E), <K>_E, and K(E) is borrowed from coding theory (see, e.g., Hamming). One of the fundamental problems of coding theory is the search for efficient ways of representing symbols s_1, s_2, . . . , s_k, . . . , put out by a source of information with the respective probabilities p_1, p_2, . . . , p_k, . . . , in terms of an alphabet consisting of another set of symbols (e.g., 0, 1) for the purpose of convenient storage or transmission. We shall be particularly interested in the case when the code words s~_1, s~_2, . . . , s~_k, . . . , corresponding to the source symbols are composed from a binary alphabet.

A natural measure of the efficiency of encoding is the average length (the number of binary symbols per source state) of the encoded message,

l-bar = <|s~_k|> = Sum_k p_k l_k ,   (4.5)

where the length of the code word s~_i is denoted by

l_i = |s~_i| .   (4.6)

The encoded message should be uniquely decodable. That is, a sequence of 0's and 1's which constitutes an encoded message s~_1 s~_2 . . . s~_k . . . should correspond to a single, unique sequence of source symbols. The best way to accomplish this is to use the so-called "instantaneous" or "prefix-free" coding. In an instantaneous code no code word s~_k is the "prefix" (the first part) of any other code word. For instance, the encoding

s_1 -> 0, s_2 -> 1, s_3 -> 00, s_4 -> 11

is not uniquely decodable. An encoded message 00111 can be decoded as s_1 s_1 s_2 s_4 or s_3 s_4 s_2 or in a few more ways. By contrast,

s_1 -> 0, s_2 -> 01, s_3 -> 011, s_4 -> 111

is uniquely decodable but not instantaneous. The whole message 00111 has only one decoding (s_1 s_1 s_4). However, it must be received in toto before unique decoding is possible (e.g., the first three digits "read" s_1 s_2). This necessity of requiring the whole message to decode it is avoided by an instantaneous code,

s_1 -> 0, s_2 -> 10, s_3 -> 110, s_4 -> 111 .

With this encoding symbols can be decoded as soon as they are received.

The analogy that will be pursued in this section regards an ensemble E as a "source" of individual microstates which play the role of source symbols. These microstates occur in the ensemble with frequencies proportional to their probabilities p_k. The most efficient encoding will minimize the average length

<K>_E = Sum_k p_k K(s_k)   (4.7)

of the programs which can be used to record (or communicate) individual microstates.

The prefix condition corresponds to the requirement for the program to be self-delimiting. A self-delimiting program must carry within it the information about its size. Hence it will be able to initiate the computation without having to be prompted by some additional symbol (e.g., ","). In this sense it is, therefore, "instantaneous."

The real advantage of self-delimiting codes follows from the relation between their sizes and the probabilities that they will be generated by some random process, such as the flipping of an unbiased coin. The probability that a specific program of length l will be generated is obviously 2^(-l). The sum of all such weights, the total probability that a valid, halting program will be obtained by flipping a coin, cannot be more than 1. This in turn implies that the sizes of self-delimiting programs as well as the lengths of the encoded words l_i = |s~_i| are constrained by the Kraft inequality. For the encoding to be uniquely decodable the l_i must satisfy

Sum_i 2^(-l_i) <= 1 .   (4.8)

The converse of this theorem is also true; that is, whenever the sizes of the words do satisfy the Kraft inequality, an instantaneous code s~_i with word sizes l_i exists.

Shannon-Fano coding is a specific, efficient implementation of the coding which satisfies the Kraft inequality. The length of the words is chosen to satisfy

l_k = ceil[ log_2 (1/p_k) ] ,   (4.9)

where ceil[a] is defined as the smallest natural number equal to or greater than a. The length l_k of a word s~_k is therefore bounded from both above and below:

-log_2 p_k <= |s~_k| < -log_2 p_k + 1 .   (4.10)
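Before averaging, note that the Kraft and prefix conditions are mechanical to verify. The Python sketch below checks both for the three small codes quoted above: the first violates Eq. (4.8), the other two satisfy it, and only the last is prefix free.

    def kraft_sum(code):
        # Left-hand side of the Kraft inequality, Eq. (4.8)
        return sum(2.0 ** -len(w) for w in code)

    def is_prefix_free(code):
        # An "instantaneous" code: no code word is a prefix of another
        return not any(a != b and b.startswith(a) for a in code for b in code)

    codes = {
        "not uniquely decodable": ["0", "1", "00", "11"],
        "uniquely decodable, not instantaneous": ["0", "01", "011", "111"],
        "instantaneous": ["0", "10", "110", "111"],
    }
    for name, code in codes.items():
        print(name, kraft_sum(code), is_prefix_free(code))   # 1.5 False, 1.0 False, 1.0 True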


Taking the average over the whole ensemble one arrives at

-Sum_k p_k log_2 p_k <= Sum_k p_k l_k <= -Sum_k p_k log_2 p_k + 1 .   (4.11)

The lower bound on the average size of the message follows directly from the Kraft inequality: The concavity of the log function yields immediately that

Sum_k p_k log_2 (1/p_k) - Sum_k p_k l_k = Sum_k p_k log_2 (2^(-l_k)/p_k) <= log_2 [ Sum_k 2^(-l_k) ] <= 0 .   (4.12)

The upper bound exists by virtue of the specific (Shannon-Fano) encoding strategy.

Shannon-Fano coding is not optimal: A recursive strategy known as Huffman coding can be shown to be optimal. To implement Huffman coding one starts with the two least probable states and assigns them code words that differ in only one (last) digit. The combination of these two states is thereafter treated as a single new state. Within this redefined list of states, the two least probable are found and assigned the last yet unassigned digit. The procedure is repeated until the list of redefined states contains only two states, which are assigned the digits 0 and 1. In this way each state is assigned an instantaneous sequence of binary digits, its code word.
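A compact sketch of this recursion is given below (Python, with heapq used to extract the two least probable entries repeatedly; the probabilities are illustrative). It also verifies the bounds of Eq. (4.11) for the resulting word lengths.

    import heapq
    import math

    def huffman_lengths(probs):
        # Code-word lengths produced by the Huffman recursion described above:
        # repeatedly merge the two least probable (super)states; every member
        # of a merged state gains one more digit.
        heap = [(p, [k]) for k, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        while len(heap) > 1:
            p1, m1 = heapq.heappop(heap)
            p2, m2 = heapq.heappop(heap)
            for k in m1 + m2:
                lengths[k] += 1
            heapq.heappush(heap, (p1 + p2, m1 + m2))
        return lengths

    p = [0.45, 0.25, 0.15, 0.10, 0.05]
    l = huffman_lengths(p)
    avg = sum(pk * lk for pk, lk in zip(p, l))
    H = -sum(pk * math.log2(pk) for pk in p)
    print(l, avg, H)    # H <= average length < H + 1, as in Eq. (4.11)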

Huffman coding must, of course, satisfy the left-hand side of inequality (4.11), since it follows directly from the Kraft inequality, which in turn follows from the requirement of unique decodability. Furthermore, since it is even more concise than the Shannon-Fano coding, it must at least obey the same upper bound on the average message size. Indeed, the best upper bound in terms of the Shannon entropy of the source is, even for Huffman coding, given by the right-hand side of Eq. (4.11).

The conclusion of the above discussion is, therefore, that the average length of the messages of our instantaneous code can be made very close to the entropy of the source. Below we shall derive a version of the inequality (4.11) valid for self-delimiting programs.

Our strategy is based on the definition of the ensemble: We have assumed the existence of the concise program epsilon which, given enough time, will generate the output consisting of "weakly sorted" descriptions of microstates s_k belonging to E and will compute their respective probabilities p_k with the requisite (finite, but otherwise arbitrary) accuracy. This program can now be used as a subroutine which, in addition to (i) the ensemble description epsilon, contains also (ii) a sorting routine sigma which (a) arranges descriptions in the order of decreasing computed probabilities (so that s_i appears before s_j if and only if p_i >= p_j) and (b) for microstates that have, within errors, the same computed probabilities, adopts some other definite order (e.g., lexicographic); and (iii) a program c which uses the ordered list containing the probabilities p_k to assign each microstate s_k a definite binary "code word" s~_k using either the Shannon-Fano or the Huffman procedure.

Shannon-Fano coding can be illustrated in a particularly straightforward manner (see Fig. 1). To set up a definite correspondence between states s_k and sequences of 0's and 1's we consider the sum

$\xi_k = \sum_{j=1}^{k-1} p_j .$  (4.13a)

For definite s_k the first l_k binary digits of ξ_k constitute an instantaneous encoding of s_k. (We have, of course, assumed that the states are presorted in the order of decreasing probabilities.) The consecutive digits α_1, α_2, ..., of the code word can then be read off directly from the binary expansion of ξ_k,

$\xi_k = \alpha_1^{(k)} 2^{-1} + \alpha_2^{(k)} 2^{-2} + \cdots + \alpha_{l_k}^{(k)} 2^{-l_k} + \cdots .$  (4.13b)

It is perhaps worth emphasizing that even when the initial digits are 0 they should not be omitted in constructing the code word s̃_k ≡ α_1^{(k)} α_2^{(k)} ... α_{l_k}^{(k)}.
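The construction of Eqs. (4.13a) and (4.13b) is easy to mimic numerically. The Python sketch below (an illustration only; it assumes the probabilities are already sorted in decreasing order and uses ordinary floating-point arithmetic for the binary expansion) reads off the first l_k digits of ξ_k:

    import math

    def shannon_fano_words(probs):
        """Code word of state k = first l_k binary digits of xi_k, Eqs. (4.13)."""
        words = []
        xi = 0.0                              # Eq. (4.13a): xi_1 = 0
        for pk in probs:                      # probs sorted, p_1 >= p_2 >= ...
            lk = math.ceil(-math.log2(pk))    # Eq. (4.9)
            digits, frac = [], xi
            for _ in range(lk):               # binary expansion of xi_k, Eq. (4.13b)
                frac *= 2.0
                bit = int(frac)
                digits.append(str(bit))
                frac -= bit
            words.append("".join(digits))     # leading 0's are kept, as stressed above
            xi += pk                          # xi_{k+1} = xi_k + p_k
        return words

    print(shannon_fano_words([0.4, 0.25, 0.15, 0.1, 0.05, 0.05]))

Because the probabilities are nonincreasing and l_k = ⌈-log_2 p_k⌉, the resulting code words are prefix free, in agreement with the pruning construction of Fig. 1.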

Let us now return to the derivation of the analog of inequality (4.11) for the average algorithmic randomness of an ensemble ℰ. The complete program capable of uniquely reproducing any microstate s_k of the ensemble must, by virtue of the Kraft inequality, satisfy

$H(\mathcal{E}) \le \sum_k p_k K(s_k) .$  (4.14)

The upper bound on the average algorithmic randomness follows from the inequality

$K(s_k) \le |\tilde{s}_k| + K(\mathcal{E}) + K(\sigma, c) + O(1) .$  (4.15)

Here |s̃_k| is the length of the code word s̃_k, K(ℰ) is the length of the minimal program to generate the descriptions and probabilities of the ensemble microstates, K(σ, c) is the joint algorithmic complexity of the sorting and decoding algorithms, and O(1) is the usual constant associated with the choice of the universal computer.

For Shannon-Fano coding |s̃_k| ≤ -log_2 p_k + 1. Moreover, K(σ, c) does not depend on ℰ and can therefore be incorporated in the constant term O(1) along with the extra "+1." Multiplying the inequality K(s_k) ≤ -log_2 p_k + K(ℰ) + O(1) by p_k and summing over the whole ensemble, we arrive at

$\sum_k p_k K(s_k) \le H(\mathcal{E}) + K(\mathcal{E}) + O(1) .$  (4.16)

We have therefore established the following.
Theorem 4.1. The ensemble average of the algorithmic randomness of microstates belonging to the ensemble ℰ is bounded from below by the statistical entropy of that ensemble and from above by the sum of the Shannon entropy and the algorithmic information content of ℰ:

$H(\mathcal{E}) \le \langle K(s_k) \rangle_{\mathcal{E}} \le H(\mathcal{E}) + K(\mathcal{E}) + O(1) .$  (4.17)

For thermodynamic ensembles K(ℰ) ≪ H(ℰ), which implies that the relative difference between ⟨K(s_k)⟩_ℰ and H(ℰ) is negligible.

An important issue, omitted in the discussion above, concerns the accuracy with which the probabilities of the ensemble microstates have to be computed in order for Theorem 4.1 to apply. Problems would emerge if one were required to attain infinite accuracy in calculating the probabilities of states. Fortunately, it is not difficult to see that the inequality (4.17) will be satisfied as long as the generated probability distribution q_k is a faithful approximation of the actual distribution given by p_k. One way of phrasing this requirement is to demand that the expression

$\Delta = \sum_k p_k \log_2 \frac{p_k}{q_k}$  (4.18)

be of order unity.
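As a numerical illustration of this criterion (with arbitrary example numbers, and with Eq. (4.18) written as reconstructed above), Δ can be computed directly:

    import math

    def delta(p, q):
        """Eq. (4.18): Delta = sum_k p_k log2(p_k / q_k)."""
        return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q))

    p = [0.4, 0.25, 0.15, 0.1, 0.05, 0.05]      # actual probabilities (example)
    q = [0.38, 0.27, 0.14, 0.11, 0.06, 0.04]    # computed approximation (example)
    print(delta(p, q))   # well below one bit here, so Theorem 4.1 applies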

FIG. 1. Efficient instantaneous encoding of the states {s_i} with probabilities p(s_i) is illustrated with the help of the lexicographic tree. The tree is plotted in a coordinate system with the vertical axis corresponding to the cumulative probability and the horizontal axis labeled with the index of the "level" l of the tree (there are 2^l distinct branches at the level l). This establishes a correspondence between each l-digit-long binary sequence and a 2^{-l} section of the [0,1) interval. Determination of the code word corresponding to the state s_k begins with arranging all the states in the order of decreasing probabilities [p(s_1) ≥ p(s_2) ≥ p(s_3) ≥ ...]. Code words are assigned in this order. The code word corresponding to the state s_k with probability p(s_k) is picked out from the l_k-th level of the tree, where l_k = ⌈log_2[1/p(s_k)]⌉. The actual sequence of digits assigned to s_k corresponds to the first still available branch on the level l_k of the tree. If a certain binary string is chosen as the code word (this is indicated by "framing" it in the figure), the corresponding branch and all its "descendants" at levels greater than (to the right of) l_k are "cut off." The tree is successively "pruned," and its whole sections contained in the intervals [∑_{j=1}^{k-1} 2^{-l_j}, ∑_{j=1}^{k} 2^{-l_j}) (indicated by "ticks" on the shaded "pruning border") become unusable. This guarantees prefix-free coding. There will always be unassigned branches to encode states with the nonincreasing probabilities p(s_i) with code words of the length ⌈log_2[1/p(s_i)]⌉. This is apparent both from the figure and from the fact that 2^{-l_i} ≤ p(s_i), which in turn guarantees that the Kraft inequality, Eq. (4.8), is satisfied. In fact, the proof of the Kraft inequality utilizes the lexicographic tree in this manner. The method shown in this figure implements Shannon-Fano coding, which is not optimal, but which suffices to establish the upper bound in the inequality (4.17).


V. PHYSICAL ENTROPY = MISSING INFORMATION + KNOWN RANDOMNESS

Section IV has demonstrated that the consequences of the radical shift of the paradigm used for the definition of the entropy of thermodynamic ensembles would be, from the point of view of its quantitative estimates, barely noticeable. The average values of the algorithmic entropy and of the Gibbs-Shannon entropy of any concisely described ensemble are virtually identical. Indeed, in equilibrium all the sensible definitions of entropy must approximately coincide. Otherwise, given the excellent record of the Boltzmann-Gibbs approach, they would be experimentally ruled out.

Conceptual implications of the new definition of entropy are, on the other hand, very different from either the Boltzmann or the Gibbs ensemble definitions. Ensemble entropies bear an unmistakable formal similarity to Shannon's information-theoretic entropy. Indeed, already Maxwell and Boltzmann expressed the suspicion that entropy is a measure of ignorance. A more direct connection between missing information and entropy was established by Szilard in his discussion of Maxwell's demon.

Particularly influential works arguing in favor of the identification of entropy with missing information are due to Jaynes and Brillouin. Yet all but a few of the most fervent supporters of the information-theoretic interpretation will agree with the notion that individual states of physical systems can be either ordered or random regardless of our information about them. A broken glass, a frequent example of the increase of disorder in movies illustrating the thermodynamic arrow of time, is more random than the unbroken one not because our ignorance about it has dramatically increased when it was shattered by the hammer, but because it has become more disordered.

A. Definition of physical entropy

The intuitive notion of randomness is, we believe, formally expressed by the algorithmic definition of entropy. We have demonstrated that equilibrium systems are expected to have the same equilibrium thermodynamic properties regardless of which of the two paradigms is employed to define their entropy. Therefore we have an "objective" quantity, the algorithmic randomness, which measures the disorder in the system. This approach is consistent and well defined for the systems considered by equilibrium statistical mechanics, but it does not suffice to give a full account of the operation of idealized thermodynamic engines. For, consider gas entering an appropriate chamber in the engine. Its initial state, from the standpoint of the operation of the engine, is defined by the few relevant macroscopic properties (pressure, volume of the chamber, temperature) which can be used to determine its entropy, and which can, in turn, be employed in the calculations of the amount of work that can be extracted. The fact that the gas may be in some specific microstate does not enter into consideration at all. Indeed, the efficiency of the engine is largely determined by the fact that its design ignores all but a few

macroscopic characteristics of the microstate. The work extracted is then limited not so much by the details of the microscopic initial state as by the combination of those few, usually macroscopic, properties of that state that are "encoded" in the design of the engine, in the "algorithm" it follows. If this algorithm is unable to take advantage of all the features of the initial state of the gas, then it will consistently miss the additional opportunities to extract all of the extractable energy.

A more sophisticated engine designed to miss fewer such opportunities could be operated by an IGUS (information gathering and using system), which, following each measurement, could perform a computation in an attempt to optimize the strategy for each individual case on the basis of the acquired data d. Following the measurement, with a definite outcome at hand, the entity operating the engine could also assess its ability to extract useful work. The question therefore arises: What quantity should it employ in the calculation of the net useful energy?

Motivated by such considerations, I conjecture that the quantity relevant for this purpose is the physical entropy 𝒮. It is a sum of two contributions: (i) the algorithmic randomness K(d) given by the size of the most concise description of the already available relevant data d, and (ii) the information about the actual microstate which is still missing, in spite of the availability of d, as measured by the Shannon conditional entropy

$H_d = -\sum_k p_{k|d} \log_2 p_{k|d} ,$

where p_{k|d} is the conditional probability of the state s_k given d.

Definition. Physical entropy is the sum of the missing information and of the length of the most concise record expressing the information already at hand:

$\mathcal{S}_d = H_d + K(d) .$  (5.1)

It is apparent that 𝒮_d is a good thermodynamic potential. For, by the results of Sec. IV, 𝒮_d ≅ ∑_k p_{k|d} K(s_k|d). Therefore, for systems in thermodynamic equilibrium, the ensemble average of 𝒮_d approximates the premeasurement ensemble entropy (that is, the entropy computed in the absence of d). Below, we shall prove that the quantity defined by Eq. (5.1) enters into the formulation of thermodynamics for engines operated by entities that can acquire information through measurements and can process it in a manner analogous to Turing machines. We shall also drop the subscript d on 𝒮_d to simplify the notation.

The physical significance of the above formula is best understood in the example in which a sequence of measurements is being carried out on a system (see Fig. 2). Measurements change the conditional probabilities p_{k|d} of the microstates; as a result, H_d decreases. On the other hand, the "record tape" containing the measurement outcomes is getting more and more data about the system. If these measurements are performed on an algorithmically random member of an equilibrium ensemble, then, by virtue of the results discussed in Sec. IV, the measurement will increase the size of the record by the amount almost exactly equal to the increase of the statistical information (the decrease of Shannon's entropy) about its state. This must be so, as, according to Eq. (4.17), the ensemble average of the sum H_d + K(d) (that is, ∑_d p_d [H_d + K(d)]) is bounded from below by, and in fact is approximately equal to, the average ∑_k p_k K(s_k), which, for an equilibrium thermodynamic ensemble ℰ, is approximately given by the algorithmic entropy of its typical member. Hence

$\langle \mathcal{S}_d \rangle_{\mathcal{E}} \cong \langle K \rangle_{\mathcal{E}} \cong H(\mathcal{E}) .$

FIG. 2. Schematic illustration of the effect of data acquisition by measurements on (i) H_d, the information still missing in the presence of the data d; (ii) K(d), the algorithmic information content of the data; and (iii) the physical entropy 𝒮 = H_d + K(d), the observer's internal measure of the net work that can be extracted from the system. Both panels are plotted against the number of measurements. (a) For a typical (algorithmically random) microstate of the system the size K(d) of the minimal record will increase by the amount equal to the decrease in H_d. Hence the physical entropy 𝒮 will be constant, unchanged by the measurements. (b) For a regular microstate, the increase of the minimal size of the record describing the acquired data will be less than the decrease of the still missing information H_d. Consequently, the value of the physical entropy can really decrease as a result of the measurement, and the observer can gain the ability to extract useful energy from the system.

In the limit when the measurements are successful, and the microstate is known precisely, p_{k|d} = δ_{kk'}, the physical entropy of the system is given by the algorithmic randomness of the state in which the system is found.

Consider now a sequence of measurements of a "regular" (that is, simply describable) microstate. Before the measurements are carried out, the physical entropy is dominated by the missing information. As the measurements are carried out, the information about the state of the system is stored in the form of a concise program record. The size of such a record will be smaller in bits than the decrease of the missing information. In the end, the minimal program record, equal in the number of bits to the algorithmic entropy of the state, can be used to store the information. However, now this implies a relatively short record, as the state, by assumption, was algorithmically simple.

Physical entropy defined by Eq. (5.1) can be really decreased by a gain of information. When the state of the system is regular, the information about it can be recorded concisely; its binary image can be generated from a short program. Moreover, the length of the record need not be a monotonic function of the decrease in ignorance. Often it may become possible to compress the record only after it contains sufficient evidence of the underlying regular pattern.
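The behavior sketched in Fig. 2(b) can be caricatured numerically. In the Python sketch below (an illustration only) the length of a zlib-compressed record serves as a rough, computable upper bound on K(d), the microstate is taken to be perfectly regular, and the still-missing information H_d is simply the number of cells not yet measured:

    import zlib

    N = 4096
    microstate = "01" * (N // 2)              # a perfectly regular microstate of N cells
    for measured in (0, 256, 1024, 4096):     # number of cells measured so far
        d = microstate[:measured].encode()    # record of the outcomes accumulated so far
        H_d = N - measured                    # bits still missing (unmeasured cells,
                                              # taken as equiprobable a priori)
        K_d = 8 * len(zlib.compress(d))       # compressed length: crude bound on K(d)
        print(measured, H_d, K_d, H_d + K_d)  # last column mimics S_d = H_d + K(d)

The compressed record grows far more slowly than H_d falls, so the estimate of the physical entropy decreases as the measurements accumulate, as anticipated in the text.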

Indeed, one can claim that measurements are usually devised "in anticipation" of regularities. Approximate measurements of macroscopic properties performed by our senses are particularly adept at finding familiar patterns (see, e.g., Ref. 33 for an interesting discussion of visual perception in the computational context). Most of the information reaches our consciousness already preprocessed, with the records relatively concise compared with the number of hypothetical possibilities. Analyzing this point would take us, however, into a discussion of the theory of perception. While this is a fascinating subject, one can demonstrate how the record can be compressed by other, more rigorous and less anthropocentric means.

Information processing by means of computers can accomplish the goal of compressing the record. To demonstrate this, we consider a record r which can be generated from a more concise program r' by a universal computer (or by a special-purpose computer attached to the measuring device). We also assume that this computer can operate reversibly.

The proof that computers can process information reversibly is due to Bennett. Landauer had earlier argued that only logically irreversible operations lead to loss of information and increase of entropy. Bennett took the next step to show that reversibility can be accomplished by storing the information normally erased in such irreversible operations, so that the computation can "backtrack" from the output to the input along the same logical route. A good summary of the subject can be found in recent reviews.

To shorten the record without any thermodynamic cost one can use a reversible computer to replace r with r'. Here is one strategy of accomplishing the compression: The computer first uses r', which, we assume, it already contains, to generate one more copy of r. The newly generated r can be used to cancel the "old" but identical r. Reversible cancellation leaves one r behind. The remaining r can be finally converted reversibly into r' by running the program backwards; it now has all the "directions" to backtrack reversibly to r'.

As a result of all these reversible operations, the computer will be left with r', and the rest of the register previously occupied by r will be filled with |r| - |r'| 0's. Our discussion demonstrates an important point: If the record r' is known to encode the same information as r more economically, that is, using fewer bits, the substitution of r with r' can be, in principle, used to "compress" the information stored in the computer memory at no cost in work or entropy increase.
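The copy-and-cancel strategy can be caricatured in a few lines of Python (purely illustrative: zlib compression stands in for the concise program r', decompression for "running" it, and a bitwise XOR for the reversible cancellation of two identical registers):

    import zlib

    r = b"0110" * 1000                        # the verbose record r
    r_prime = zlib.compress(r)                # a concise "program" r' known to generate r

    copy = zlib.decompress(r_prime)           # step 1: run r' to produce one more copy of r
    cancelled = bytes(a ^ b for a, b in zip(r, copy))   # step 2: cancel it against the old r
    assert cancelled == bytes(len(r))         # the old register now holds only 0's

    # Memory now holds r' plus a blanked register: the information carried by r has
    # been compressed, and nothing that r' cannot regenerate has been discarded.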

It is important to emphasize that the above argument demonstrates the possibility of compression only when a more concise r' that is known to generate r is already at hand; then the substitution can be accomplished reversibly. However, in general, as follows from the undecidable aspects of the algorithmic information, it may be difficult if not impossible to determine whether a string r has a more concise description. Nevertheless, the ability to recognize concisely describable configurations will be of advantage in the task of decreasing physical entropy.

B. Physical entropy and the demon's version of thermodynamics

To demonstrate that the use of the term "physical entropy" for the expression defining 𝒮, Eq. (5.1), is legitimate, consider a transition taking the system from the initial state s_i to some final state s_f. We shall assume that this transition is accomplished sufficiently slowly to be thermodynamically reversible. To simplify the analysis, we shall also assume that the internal energy of the system is constant, so that the work extracted in the course of the transition is given by

$\Delta W = T \Delta S .$  (5.2)

Here T is the temperature, Boltzmann's constant k_B = 1, and ΔS is the difference between the entropy of the final and the initial state:

$\Delta S = S_f - S_i .$  (5.3)

Equation (5.2) defines, for the purpose of this argument, the quantity that should be regarded as the physical entropy. We shall demonstrate that when this transition (s_i → s_f) is controlled by an automated "demon," a computer that can distinguish between different initial states and which maintains a current record of the state of the system, the entropy S must be given by Eq. (5.1). In short, we shall show that the S which must be used in Eq. (5.2) is the physical entropy 𝒮.

This arrangement is clearly inspired by the "Szilard engine," in which the demon controls the operation, altering the course of the cycle depending on the outcome of the measurement, on the information it acquires about the new initial state of the system. The initial state of the system is a single particle of gas located in either the left or the right chamber. The demon's memory contains the corresponding one-bit record. The final state, in the sense of our analysis, corresponds to the particle being somewhere in the container. The demon must accordingly "reset" its memory to be able to initiate a new cycle. This "resetting," as argued by Bennett for the classical version of the Szilard engine, is thermodynamically costly and essential in preventing the demon from violating the second law. Indeed, the thermodynamic cost of resetting was, to some extent, anticipated by Szilard, who, nevertheless, in other parts of his paper, viewed measurement rather than erasure as the fundamental thermodynamically costly operation. Bennett's argument was based on the observation by Landauer, who, in the context of information-processing operations, pointed out that the thermodynamic cost of the removal of information about the past states of the computer is k_B T ln2 per bit. This conclusion was extended to the quantum version of the Szilard engine by the present author.
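For the one-bit Szilard cycle this bookkeeping reduces to a single cancellation, made explicit in the following fragment (an illustration in natural units, with k_B = T = 1 assumed):

    import math

    k_B, T = 1.0, 1.0
    extracted = k_B * T * math.log(2)     # work gained by exploiting the one-bit measurement
    erasure = k_B * T * math.log(2)       # Landauer cost of resetting the one-bit record
    print(extracted - erasure)            # 0.0: the demon merely breaks even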

In the general case considered here, we shall also insist that the final state of the computer memory should correctly reflect the demon's knowledge (or lack of it) about the state of the system. With this assumption in mind, we can now calculate the net work obtained by our computer-operated engine. The gain due to the change of the Gibbs-Shannon entropy is given by the usual formula in a somewhat unusual notation:

$\Delta W^{(+)} = T (H_f - H_i) .$  (5.4)

Here H_f and H_i are the Gibbs-Shannon entropies of the final and initial states, respectively. A standard textbook discussion of energy extraction stops here. However, the information-theoretic "bill" for this increase of energy has not been settled. The computer must update its memory to get rid of the no longer relevant record r_i about the initial state, and introduce the record r_f describing its knowledge about the final state. Below we shall prove that the cost of such an update for a computer operating in an environment of temperature T is no less than

$\Delta W^{(-)} = T [K(r_f) - K(r_i)] = T (|r_f^*| - |r_i^*|) .$  (5.5)

Here r_f^* and r_i^* are the minimal records, the most concise programs containing the information required to specify the ensembles describing the states s_f and s_i. Our discussion will follow a similar but more complete account of the computational aspects of energy extraction and related issues (e.g., of the limitations imposed by undecidability on the thermodynamic efficiency) given elsewhere.

To justify Eq. (5.5) we first note that the more verbose record r_f can be reversibly, i.e., without any work expenditure, substituted with r_f^* (providing, of course, that r_f^* is known). The record of the initial state r_i plus the information about the operating procedure of the engine, which we assume is encoded in the computer library, suffice to compute r_f. This last computation could also be accomplished reversibly, but generally only at the cost of being left with r_f and the "historical" record of the computation path required to assure reversibility. This would lead to an accumulation of such a historical record with each engine cycle, which would slowly fill up the memory tape. Therefore the process would not be truly cyclic, and could not be used in the discussion of the first and of the second law of thermodynamics. Hence it would not suffice for our purpose.

Unless r_f and the operating procedure of the engine determine r_i uniquely, the computation of r_f from r_i must be necessarily irreversible, as some of the information about the initial state is irreversibly erased from the memory. In short, a computation that disposes of some of the information about the initial state is logically irreversible, and hence, in accord with Landauer's remark, must be thermodynamically irreversible.

The key question relevant to our discussion is then: What is the least possible thermodynamic cost of computing r_f from r_i, in a manner which disposes of the record of the computation path, by a universal computer operating in an environment of temperature T? We begin by noting that r_i obviously contains the information required to compute r_f:

$K(r_i, r_f) = K(r_i) = |r_i^*| .$  (5.6)

Moreover, r_f does constrain r_i partially. This fact can be expressed by noting that to compute r_i one can use r_f supplemented by the conditional information. According to the algorithmic information theory,

$K(r_i, r_f) = K(r_f) + K(r_i | r_f^*) .$  (5.7a)

In the above equation we have assumed that the computation of r_i is carried out from the minimal program r_f^* rather than directly from r_f; this is a technical assumption important in avoiding small (logarithmic) corrections to the equation for the size of the conditional string. Another possibility of avoiding such corrections would involve assuming that the computation starts from r_f, but that K(r_f) is also available. For, in accord with Eq. (2.6), one can write

$K(r_f, r_i) = K(r_f) + K(r_i | r_f, K(r_f)) .$  (5.7b)

Here K(r_i | r_f, K(r_f)) is the algorithmic content of the conditional information, the size of the minimal program to compute r_i given both r_f and K(r_f). Moreover,

$K(r_i, r_f) = K(r_f, r_i) .$  (5.8)

We are now ready to show that no less than |r_i^*| - |r_f^*| bits must be erased in the calculation of r_f from r_i. To this end, we note that a reversible computation is capable of obtaining r_i from the record containing jointly r_f and r_{i|f}^*, where r_{i|f}^* stands for the conditional string relevant in the context of Eq. (5.7a). Therefore one can also employ a reversible computer in the manner outlined before to substitute r_i with r_f and r_{i|f}^*. Now, to finish the update procedure, one can irreversibly erase the bits of r_{i|f}^*. The number of bits that have to be erased is readily calculated from Eqs. (5.6)-(5.8). The length of the conditional information string is

$|r_{i|f}^*| = K(r_i | r_f^*) = K(r_i) - K(r_f) = |r_i^*| - |r_f^*| .$  (5.9)

Erasure of |r_i^*| - |r_f^*| bits of information in an environment of temperature T can be accomplished only at the expense of no less than ΔW^(-) of work, Eq. (5.5).

Note that these computations, Eqs. (5.6)-(5.9), recognize the fact that the relevant algorithmic randomness is defined with respect to the specific computer which happens to be operating the engine. Therefore computer-dependent error terms usually appearing in the discussion of the algorithmic randomness [O(1)] do not enter into consideration.

We have demonstrated the following.
Theorem 5.1. The net work gained by an engine coupled with a computerized demon, which can perform measurements and control the operation of the engine, is no more than

$\Delta W = \Delta W^{(+)} + \Delta W^{(-)} = T[(H_f + K_f) - (H_i + K_i)] = T(S_f - S_i) .$  (5.10)

This justifies our conjecture about the formula for physical entropy.

Proper accounting for the cost of the "information disposal" is essential in arriving at this formula. I have demonstrated that such accounting must be carried out in terms of the minimally sized programs capable of describing the ensembles corresponding to the initial and final states of the active medium in the engine. Again, I have assumed that the minimal programs are already available and ready to use.

Possession of minimal programs can only increase the efficiency of the engine. ΔW in Eq. (5.2) refers to the maximal extractable energy. Therefore, if one can prove that Eq. (5.5) for the minimal cost of erasure holds when the efficiency of compression is optimized, this assumption can be made without danger: The computer cannot do better than we have assumed above, and we were trying to find out the maximal attainable efficiency.

However, in this sense, the above discussion assumes that the most difficult part of the procedure, finding the minimal algorithm for generation of the record describing the initial state given the minimal program describing the final state, has been somehow accomplished. In reality, the inability to guess or derive the most concise description will be the rule. It will increase the cost of erasure (more bits will have to be erased if the compression does not attain the limit defined by the algorithmic information content) and the efficiency of the engine will decrease. I shall discuss this fundamental inefficiency of information processing mandated by undecidability and its relation to the thermodynamic efficiency of engines elsewhere in more detail (see Ref. 40). However, it is already tempting to conjecture that it provides the motivation for efficient acquisition and processing of information in systems which exist in and depend on an evolving nonequilibrium environment.

With the above caveats in mind we conclude that the absolute best a Maxwell's demon can do is given by the physical entropy, which does include the algorithmic randomness of the relevant states in its definition. This observation, expressed formally by Theorem 5.1, allows one to extend the range of applicability of suitably modified thermodynamics to engines operated by computers and by other systems capable of acquiring and processing information. The key result relevant to the discussion of the thermodynamic costs of information processing can be stated as follows. Suppose string r_2 can be computed from the program r_1 by a Turing machine T. If we demand that the computation should start with only r_1 on the tape (which assumes that r_1 is self-delimiting) and end with nothing but r_2, then the Turing machine must erase no less than K(r_1) - K(r_2) bits in the process.
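Neither K(r_1) nor K(r_2) is computable, but the flavor of this bookkeeping can be conveyed by substituting compressed lengths for the sizes of the minimal programs. The Python sketch below (an illustration under that crude substitution, with arbitrary example records) estimates how many bits the update r_1 -> r_2 must dispose of:

    import zlib

    def k_proxy(record):
        """Compressed length in bits, a computable stand-in for K(record)."""
        return 8 * len(zlib.compress(record))

    r1 = bytes(range(256)) * 64                   # verbose record of the initial state
    r2 = b"particle somewhere in the container"   # short record of the final state

    erased = k_proxy(r1) - k_proxy(r2)            # rough estimate of K(r1) - K(r2)
    print(erased)                                 # each erased bit costs at least kT ln 2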

The crucial result for statistical mechanics is the recognition that the entropy of an object can be properly defined as a sum of its statistical entropy, with the ensemble defined by the data d, and of the algorithmic information content associated with the description of that ensemble. This new definition of physical entropy will have to be reexamined both in the context of the standard issues of thermodynamics (e.g., the second law) and from the point of view of its implications for the theory of measurements in classical and especially quantum physics.

VI. DISCUSSION

In this paper we have considered three measures of entropy. The Boltzmann-Gibbs-Shannon entropy has an information-theoretic flavor and is a subjective property of the microstates in the following sense. The same microstate can be endowed with a very different value of the BGS entropy depending on the ensemble it is assigned to. This entropy is an (objective) property of an ensemble rather than of a microstate. However, ensembles are usually defined subjectively, in a manner that may depend on the state of the knowledge of the observer. Moreover, both in classical and quantum physics the system is presumably in just one of the microstates constituting the ensemble.

Algorithmic entropy is defined for a single microstate. It is difficult to calculate exactly, but typically relatively easy to estimate. In spite of the very different conceptual setting, its value is related to the number of microstates in much the same way as the Boltzmann-Gibbs-Shannon entropy. Therefore one could shift the foundations of thermodynamics and statistical mechanics from the ensemble to the algorithmic basis without endangering any of its key conclusions. Algorithmic entropy is an objective property of a microstate, except for the O(1) corrections, which are related to the size of the description (the algorithmic information content) of the universal computer used to define it.

Each of these two quantities is a satisfactory measure of disorder when applied from the outside of a complete (that is, including the Maxwell demon-type observer) system. However, physical entropy, the sum of the algorithmic randomness of the available data and of the still missing information, is necessary to discuss the process of energy extraction from the "inside," from the viewpoint of the observer.

In the experience of this author the only systems that are interested in the amounts of extractable work and capable of estimating entropy always seem to belong to the category of "observers." Therefore it is tempting to argue that the only entropy that should ever be used is the physical entropy. Indeed, even the traditional ensemble entropies are defined subject to the knowledge of the type of ensemble and its macroscopic parameters, which constitute the relevant data. From this point of view, the usual statistical entropy is the physical entropy in the limit applicable in the case of almost complete ignorance, K(d) ≪ H_d. The algorithmic entropy of a microstate is the opposite limit.

Having said all that, one must recognize that the statistical entropy is almost always an excellent approximation of the physical entropy; we are invariably in the vicinity of such a thermodynamic limit. Moreover, statistical entropy is far more convenient to calculate and to use, and there is certainly no reason to try to replace it with physical entropy in engineering applications of thermodynamics. One should, on the other hand, use the physical entropy 𝒮 whenever issues of principle are considered from the internal point of view of information gathering and using systems.

To further investigate the connection between the physical and the statistical entropy it is useful to recast the discussion of Sec. V in the ensemble language. Consider a system and a fine partition of its phase space, ξ = {c_0, c_1, ..., c_N}. The standard entropy is

$H(\xi) = -\sum_i p(c_i) \log_2 p(c_i) .$  (6.1)

Suppose now that the Maxwell demon can make a dissipationless measurement corresponding to a coarse partition β. Suppose also that ξ is a refinement of β. The demon can now operate by (a) measuring the observable associated with the partition β to determine which b ∈ β contains the microstate, (b) constraining the system to lie within b in a dissipation-free manner, and (c) extracting the work by relaxing the constraints in a slow, isothermal, reversible fashion.

Before the process begins with step (a) and after it is completed with step (c), the entropy is H(ξ). However, at the intermediate step (b), it is equal to the conditional Shannon entropy H(ξ|β). Thus the work extracted in the course of a cycle, when averaged over all the possible measurement outcomes, equals

$\Delta W = T [H(\xi) - H(\xi|\beta)] = T\, J(\xi{:}\beta) .$  (6.2)

Here J(ξ:β) is the mutual Shannon information defined by the statistical, ensemble analog of Eq. (2.7). Mutual information is a measure of the correlation between two ensembles.
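As a toy numerical example of Eq. (6.2) (with an arbitrary joint distribution; ξ has four cells and β the two cells obtained by merging them in pairs), the average work per cycle can be computed directly:

    import math

    T = 1.0   # temperature, with k_B = 1 (assumed)
    # Joint probabilities p(b, c) for coarse cell b of beta and fine cell c of xi.
    p_joint = {("b0", "c0"): 0.3, ("b0", "c1"): 0.2,
               ("b1", "c2"): 0.4, ("b1", "c3"): 0.1}

    def H(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    p_xi, p_beta = {}, {}
    for (b, c), p in p_joint.items():
        p_xi[c] = p_xi.get(c, 0.0) + p
        p_beta[b] = p_beta.get(b, 0.0) + p

    H_cond = H(p_joint) - H(p_beta)           # H(xi|beta) = H(xi, beta) - H(beta)
    work = T * (H(p_xi) - H_cond)             # Eq. (6.2): T * J(xi:beta), in bits
    print(work)                               # multiply by k_B ln 2 for energy units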

So far, the demon has gained ΔW of work at the price of having its memory cluttered by the no longer relevant historical information about a "used-up" measurement outcome. Clearly, it would be possible for the demon to operate the "engine" outlined above ad infinitum only if its memory had infinite capacity. However, in order to achieve a truly cyclic process, the memory of the demon should be returned to the initial uncluttered state. This, according to Landauer and Bennett, can be accomplished by resetting the demon's memory at a cost given by k_B T ln2 times the number of erased bits.

In order to operate at maximum efficiency, the demon has to minimize that cost. Hence it must "compress" the block of memory occupied by the results of past measurements to the absolute minimum. Such compression can be achieved at no additional cost as long as it does not alter the information content. The absolute minimum in the size of the description is achieved by the minimal program b*, the length of which defines the algorithmic entropy of the measurement outcome with respect to the universal computer U which models the operations of Maxwell's demon. As the optimal sizes of programs are related to the Shannon entropy via Eq. (4.17), the average algorithmic randomness, the typical number of memory bits required to represent the outcome of the measurement, is almost exactly equal to J(ξ:β). Thus, already on the ensemble level, one can conclude that the demon will, at best, get nowhere.

We have just described the process of energy extraction by the demon in the ensemble language. Why should one then bother introducing "physical entropy," which is clearly more complicated than the probabilistic ensemble quantities?

The answer is straightforward: Physical entropy allows the demon to give its "private and personal" account of the attempts to extract energy. Each of these individual attempts is associated with a definite outcome of the measurement. Each of them involves a computation in an attempt to represent the measurement outcome in the simplest possible way. Maximum efficiency can be ascertained only when the computation leads to a minimal description. Moreover, the demon will miss the opportunity to extract all of the energy from those outcomes for which it will be unable to produce the most concise description because of the incompleteness. Hence the incompleteness emerges as one of the reasons for inefficiency. One could, of course, ascertain optimal performance by furnishing the demon ahead of time with the complete list of all the possible outcomes arranged in the order corresponding to probability, a "Huffman code" for the given ensemble. This would, however, require enormous, possibly infinite memory as well as a prior knowledge of the ensemble. Moreover, it would necessarily be a very inflexible strategy, locked to a single, definite ensemble. In practice, living systems that attempt to extract useful energy from their surroundings are faced with a variety of unanticipated circumstances. Therefore, rather than store a ready solution for every possibility, it is advantageous to let some elementary equivalent of computation choose the strategy. Fortunately, there are usually sufficiently many opportunities to allow for a decrease of physical entropy to sustain their activity. Moreover, on longer time scales it may be of enormous advantage to modify the information-processing system. Such modifications take advantage of the remaining subjectivity of the definition of algorithmic information content, of its dependence on the computer U with respect to which it is calculated.

Consequently, while introducing physical entropy does not improve thermodynamic limits on engine efficiency, it does allow for their discussion in a very different context, which ties statistical physics with the theory of computation, and which provides a physical motivation for efficient information processing. Further implications may include a combined physico-computational insight into biological systems and their evolution. Ensemble formulations provide neither a motivation nor a language to explore such questions. Physical entropy, tailored to represent information from the viewpoint of the system that acquires and makes use of it, appears to be an indispensable tool. It forces one to adopt a hybrid, partly probabilistic and partly algorithmic, definition of entropy.

It is interesting to contemplate an extension of the approach developed in this paper to quantum theory. We shall pursue this subject in more detail elsewhere. Two brief remarks anticipating some of the conclusions of the more complete discussion are nevertheless appropriate. Let us first note that the algorithmic entropy of a quantum system can be defined in much the same way as that of the ideal Boltzmann gas. Pure states can have different algorithmic entropies depending on how "random" and "disordered" they are. The choice of the basis in the Hilbert space can be regarded as equivalent to the choice of the "grid." All simply defined choices of the basis are expected to result in a similar algorithmic entropy for the same state.

Physical entropy of a quantum system can be defined for a mixture in a manner analogous to the "classical" physical entropy defined in Sec. V. However, the discussion of the problem of measurement is now somewhat different: Measurements performed on a mixture will decrease the missing information and alter the record. Measurements performed on pure states can only alter these states (and update the records about them). This distinction, as well as other issues usually raised in the context of quantum measurements, is sufficiently complicated to warrant a separate paper.

One may inquire as to why this algorithmic aspect of physical entropy was not considered earlier. There are, of course, obvious historical reasons: Questions concerning entropy, Maxwell demons, etc., were generally considered either settled or hopeless by the second part of this century, and only rather recently was the computer-oriented point of view of information sufficiently developed to be of help in the issues of principle.

The success of the ensemble definition of entropy, combined with the fact that the ensemble and algorithmic estimates of entropy almost always coincide, is the other important reason for concealing the alternative view of entropy. This is not to claim that the role of randomness was underestimated: The influential paper of the Ehrenfests emphasized the role of randomness in the origin of irreversibility and interpreted Boltzmann's and Gibbs's definitions from that point of view. Prigogine and his co-workers have advocated the use of a formalism in which entropy is defined through probability distributions, but in a manner that has no information-theoretic connotations. These efforts, however, invariably associated randomness with probabilities. While this association is not incorrect, it bypasses the alternative, and conceptually very fruitful, algorithmic approach.

VII. CONCLUSIONS

Questions concerning physics and computation have so far been focused on the basic limitations placed by physical laws on information-processing systems. By contrast, this paper is concerned with a problem of a "complementary" nature: I have explored some of the implications that information acquisition and processing have for physics.

The definition of the physical entropy proposed here is made possible by the algorithmic definition of randomness. It is made necessary by the desire to discuss the function of the Maxwell's demon from its own perspective. Moreover, the demon's role can be successfully played by an automaton capable of (i) acquiring information through reversible measurements, (ii) processing it in a manner analogous to the universal computer, and (iii) adopting strategies that aim to optimize its behavior so that, for example, it can economically extract useful energy from the available sources.

Physical entropy, which must be used in the discussion of the laws of thermodynamics from the internal point of view of such an automated engine, is the sum of the missing information and of the thermodynamic cost of updating the memory with the record of the measurement outcomes. It provides one with a new perspective on the process of measurement. It extends the Boltzmann-Gibbs-Shannon missing-information paradigm by taking into account the cost of storage of the information about the system. It is no longer necessary for the parameters defining the ensembles of concern in the process of energy extraction to be few and "macroscopic": They can be numerous and correspond to microscopic data, but the price for this information must be taken into consideration in evaluating the efficiency of the engines controlled by computers.

APPENDIX A: INDISTINGUISHABILITY AND THE SACKUR-TETRODE EQUATION

Consider a gas of N particles distributed "at random" in volume V in D-dimensional space. To within a predetermined accuracy ΔV = Δ^D the location of each of these particles is described by D integers. When these integers are algorithmically random, the length of a program needed to encode the location of a single particle is ~ log_2(V/ΔV). In order to take care of all N particles, a program of typical length

$K_V = N \log_2 (V/\Delta V) + O(\log_2(ND)) + O(1)$  (A1)

is usually needed. Note that this formula has only a small self-delimiting correction O(log_2(ND)) rather than the correction O(log_2 K_V) of the type appearing in Eq. (2.5). This correction does not depend on the actual algorithmic randomness of the microstate, but rather solely on the particle number and dimensionality. It can therefore be incorporated as a part of the constant O(1) correction for the purpose of further discussion.

Such a small upper limit on the size of the correction comes from the recognition that the most economical way of writing a self-delimiting program is to specify once and for all the sizes of the blocks of digits describing the coordinates of the particles in the phase space. The other possibility would be to write a program in which each of the coordinates is encoded by means of a separate self-delimiting subroutine. In some cases this would save space on the input tape. However, as the reader is encouraged to verify, for a random distribution of particles the additional expenditure would be ~ D^{-1} log_2(V/ΔV).

This brief discussion outlining two different possible programs that can generate the same binary image illustrates an important point: Estimates of algorithmic entropy are quite insensitive to the details of the method of encoding. We can now consider the space on the program tape required to accomplish a similar "encoding" procedure for the momentum of an individual particle. In each degree of freedom a typical number of digits will be log_2(p/Δp). As the expected value of one component of p is (mkT)^{1/2}, a typical size of the program required to encode all of the components for one particle must be, to leading order, D log_2[(mkT)^{1/2}/Δp]. The estimate of algorithmic entropy is then

$K = K_V + K_p \cong N \log_2 \frac{V}{\Delta V} + \frac{D}{2} N \log_2 \frac{mkT}{(\Delta p)^2} + O(1) .$  (A2)

The expression for entropy, Eq. (A2), has a sensible form, but results in a counterintuitive prediction. Consider N gas particles in a container separated in the middle by a partition. Suppose that the partition is removed. Particles which were on the opposite sides of the partition in approximately equal numbers can now begin to mix. Using Eq. (A2) to calculate the difference of entropies before and after the removal of the partition results in a change of the value of the "candidate entropy," Eq. (A2):

$\Delta K_{\Delta V} = N \log_2 2 = N .$  (A3)

However, the change of entropy should be zero, as no real event occurs as a result of the removal of the partition.

The cause of the increase of entropy by ΔK_{ΔV} can be traced to the fact that we have not taken advantage of the indistinguishability of the particles in devising the program designed to encode the state of the system. Information about individual particles comes in separate blocks, each of which carries an "implicit" index (i.e., it arrives as a "number 17 input block" on a tape). In other words, the special-purpose machine L we have described at the beginning of Sec. III could print in the pixels occupied by the particles not just 1 to signify that the corresponding cell in the phase space is occupied: It could also print out a label of each particle implicitly encoded in its order of appearance on the input data tape.

To compress the program further and to take advantage of the particle indistinguishability, we can employ a modified version of L, our special-purpose computer. Instead of the coordinates of individual particles in one of the dimensions, one can supply only the differences in their locations.

Any coordinate of the phase space would do. Consider, for instance, the spatial coordinate x_1. The integers needed to describe the distribution of particles in this dimension will now be smaller. Where, before, a number of the order of the average value of x_1, ⟨x⟩ ~ L/2, was required, now a smaller number, ⟨(Δx)^2⟩^{1/2} ≅ ⟨x⟩/N, will be sufficient. This one small number has to be followed by D - 1 other numbers characterizing the other coordinates of the same particle, which have retained their previous values. The algorithmic entropy of the gas in a D-dimensional cubic box with the volume V = L^D is therefore given by

$K = N \log_2 \frac{L/N}{(\Delta V)^{1/D}} + (D-1) N \log_2 \frac{L}{(\Delta V)^{1/D}} + \frac{D}{2} N \log_2 \frac{mkT}{(\Delta p)^2} + O(1) ,$

which readily simplifies. We are therefore led to the key result of this section.


Theorem A1. The algorithmic randomness of a typical microstate of an ideal gas of indistinguishable particles is given by

$K = N \log_2 \frac{V}{N \Delta V} + \frac{D}{2} N \log_2 \frac{mkT}{(\Delta p)^2} + O(1) .$  (A4)

This is the Sackur-Tetrode equation for the entropy of the monoatomic gas. Its additive properties are entirely satisfactory. In particular, the experiment with a partition being removed now leads to ΔK = 0, in accord with the physically motivated expectations.
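A short numerical check of the mixing argument (a sketch only; the particle number and the resolution ratio are arbitrary toy values, chosen as powers of 2 so that the arithmetic is exact) contrasts the candidate entropy of Eq. (A2), which jumps by N bits when the partition is removed, with Eq. (A4), which does not change:

    import math

    N = 2.0 ** 20          # total particle number (toy value)
    ratio = 2.0 ** 40      # V / Delta V for the full container (toy value)

    def K_A2(n, vol_ratio):
        # Position part of Eq. (A2): distinguishable-particle encoding.  The momentum
        # term is common to both candidate entropies and cancels in the differences.
        return n * math.log2(vol_ratio)

    def K_A4(n, vol_ratio):
        # Position part of Eq. (A4): indistinguishable particles (Sackur-Tetrode form).
        return n * math.log2(vol_ratio / n)

    before_A2 = 2 * K_A2(N / 2, ratio / 2)    # two halves, N/2 particles in V/2 each
    before_A4 = 2 * K_A4(N / 2, ratio / 2)

    print(K_A2(N, ratio) - before_A2)         # = N: the spurious increase of Eq. (A3)
    print(K_A4(N, ratio) - before_A4)         # = 0: removing the partition changes nothing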

APPENDIX B: HOW OBJECTIVE IS ALGORITHMIC RANDOMNESS?

Algorithmic entropy is sufficiently different from the traditional definition of entropy to be regarded with suspicion. There were several points that were left out of the discussion in Sec. III in an attempt to gain a first quick, physical insight into the relation between the algorithmic randomness defined through the information content and the equilibrium entropy of an idealized physical system. We return to them now.

The first of them concerns the container in which the gas is enclosed: It is conceivable that the container will be so complex in shape that the program required to specify it will be comparable in size with the algorithmic randomness of the particle configuration inside. Yet one would like the entropy to characterize the state of the gas alone. A "brute force" solution to this problem is to insist on simple containers. A more sophisticated solution is to employ conditional information. That is, one can define the entropy of the gas alone as the number of additional bits that must be supplied, given a description of the container, to specify the state of the gas with the assumed resolution.

The second issue that was glossed over is more important. It concerns the grid that was used to discretize the system. The first, simple aspect of this issue concerns the size adopted for the individual cells. Obviously, in the limit of infinitesimally small cells, the algorithmic entropy of a typical microstate of the system diverges logarithmically. This difficulty is not new. Before the advent of the quantum theory entropy was also thought to be logarithmically divergent for the same reason. Entropy is finite only because the volume of the cells in the phase space is limited by the quantum of action. There is, however, a way of partially dealing with this problem which does not call on the quantum: One can compare different configurations of the gas on grids with the same resolution. While the absolute value of the algorithmic randomness is still undetermined, it is now possible, at least in principle, to compute differences of the algorithmic randomness of different configurations.

While the problem of the logarithmic divergence of entropy with the resolution of the grid is rather similar for both the statistical and the algorithmic approach, the issue of the shape of the grid is not. This problem is responsible for the ambiguity in defining the coarse-grained entropy S_G of Gibbs. By shifting the cells defining the coarse-grained entropy one can usually make S_G assume any value between its equilibrium value and the smallest value consistent with the fine-grained distribution. (This difficulty is particularly acute in the context of Gibbs's "proof" of the second law. If the grid defining the coarse graining is continually deformed by the same Hamiltonian evolution that generates the evolution of the fine-grained distribution, then the coarse-grained entropy will be constant. Hence, unless one restricts the possible grids in some suitable fashion, Gibbs's arguments in favor of the validity of the second law are obviously incomplete.)

A natural resolution of this ambiguity is to insist that the grid should be "simple." The algorithmic information theory allows one to give a rigorous definition of what is simple: A simple grid is concisely describable by the (specific) universal computer U. For instance, a grid with hexagonal cells may be more appropriate for some configurations, and it is still concisely describable. A simple and natural requirement, to include the prescription for the grid as a part of the program describing the state of the gas, will eliminate the advantage of using fanciful grids. In other words, when the optimal strategy of describing the state of the system includes using an unusual and complex coordinate system, it still is a perfectly legal strategy, as long as the cost of specifying the grid is counted as part of the cost of this strategy.

The difficulty of Gibbs's "coarse-graining" approach arose from the inability to measure the complexity of the different coarse-graining schemes and to account for them in units of entropy: All grids, no matter how "perverse," had to be regarded as equally valid. Algorithmic randomness settles this difficulty to some extent: It makes it possible to quantify the intuitive concept of "simplicity." In particular, this allows one to rule out the co-evolving grids used in arguments against coarse graining simply because they are too complex.

In view of the importance of the concept of coarse graining and the long-standing concern with its subjectivity, it is useful to delineate more precisely what can and what cannot be accomplished by using algorithmic prescriptions. The discussion above assumes (i) a definite universal computer and (ii) a definite metric on the phase space, which must be unaffected by the choice of the grid. Given these two assumptions one can expect to arrive at an algorithmic definition of entropy which, for a given computer, does not significantly depend on the deformations of the grid.

Algorithmic randomness is by definition equal to the size of the smallest message (program) for a given universal computer needed to reproduce the state of the system. Once the resolution is fixed by demanding, for example,

that the sum of the squares of the differences between the locations of the real microstate particles and their "images" on the plot, evaluated in some natural units (e.g., units related to the total volume of the phase space available to the system), is less than the specified tolerance, the grid can be varied in attempts to meet the criterion with the shortest message. There is just one smallest size for such a program. This minimal size is the one that gives the algorithmic entropy of the system. Such minimization includes possible variations of the grid.

At first, one might worry that this freedom to choose the grid makes K completely grid dependent. This is not the case. If this were true, one could prove that the size of the message, including both the description of the grid and the location of the system within the grid, can be made arbitrarily small for a given Turing machine U by making a judicious grid choice. The theory of information and coding proves that this must be impossible. For, if it were the case, one could violate Shannon's noiseless channel coding theorem by first encoding messages in particle configurations on the source end, then using the "variable grid trick" to make the message programs very short, and finally decoding them on the other end with the help of the computer.

Eliminating the subjectivity associated with the coarse graining does not, however, banish it altogether. Rather, it reemerges under the guise of the definition of the universal computer employed to do the plotting of microstates. In this new form it is, however, easier to deal with and quantify than in the form of coarse graining. In particular, all concisely describable Turing machines (i.e., all computers that have algorithmically short descriptions) will yield comparable estimates for algorithmic complexity. They define (approximately) "standard" algorithmic randomness. More generally, the algorithmic complexity of a machine with a long description can differ from such a standard by no more than the size of its description.

Issues of the freedom of choice of coarse graining, subjectivity of algorithmic randomness, and the resulting estimates of entropy are of course particularly important in the discussion of the second law. Therefore we shall return to them in a future paper in more detail. For the purpose of the present paper, which is mostly concerned with the definition of physical entropy by specific, individual observers, it is entirely sufficient to adopt the natural grid defined by the observer's measurements and use it to define algorithmic randomness.

APPENDIX C: SECOND LAW AND EVOLUTION OF A DISCRETE DYNAMICAL SYSTEM

The aim of this appendix is to discuss the behavior of the algorithmic randomness in the course of the dynamically reversible evolution of a discrete system. We consider reversible transformations of a finite string s:

$s_{t+1} = U s_t .$  (C1)

Equation (C1) can be regarded as a deterministic law of

Equation (C1) can be regarded as a deterministic law of motion generating the phase-space trajectory of a simple example of an elementary, idealized system. We will demonstrate how the reversible, deterministic, dynamical evolution can lead to an increase of the algorithmic randomness. We shall conclude that (i) the algorithmic randomness K(s_t) tends to increase in the course of evolution if the initial state is algorithmically simple, K(s_0) ≪ |s_0| ~ log_2 W; (ii) K(s_t) approaches the equilibrium value consistent with the Boltzmann estimate log_2 W of entropy; and (iii) it fluctuates around this equilibrium.

We shall require the transformation U to be reversible to establish a closer analogy with the Hamiltonian dynamics. The old problem of statistical mechanics, the fundamental incompatibility of reversible dynamics with the second law, can be posed only when U is reversible. That is, we assume that there should exist U^{-1} such that

s_t = U^{-1} s_{t+1} .   (C2)

We shall also require U to be concisely describable, i.e., for a typical instant t

K(s_t, s_{t+1}) - K(s_t, s_t) ≡ K(U) ≪ K(s_t, s_t) = K(s_t) + O(1) .

Ciphers used in cryptography provide an example of such transformations. Another, related example is afforded by "random-number" generators used in Monte Carlo programs. For example, the dynamical evolution of a discrete system can be described by

s_{t+1} = λ s_t + μ  (mod P) .   (C3)

The analogy between classical dynamics and transformations described by Eq. (C3) can be pursued further by noting that the n separate orbits of the quasiperiodic discrete maps can be associated with the constants of motion.

Above, the string s is treated as the integer which it represents in binary notation. The constants λ, μ, and P characterize U. It is not difficult to find examples where these numbers are easily describable.
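The reversibility requirement (C2) can be exhibited explicitly for the map (C3). In the sketch below the values of λ, μ, and P are illustrative choices of mine, not parameters taken from the text; one step of the evolution is undone with a modular inverse, which exists whenever λ and P have no common factor.

    # One step of Eq. (C3) and its explicit inverse, Eq. (C2); parameters are illustrative.
    P = 1 << 16                        # illustrative phase-space size, P = 2**16
    lam, mu = 4 * 7 + 1, 2 * 3 + 1     # odd lambda, hence coprime to a power of two

    def U(s):
        """s_{t+1} = (lambda * s_t + mu) mod P."""
        return (lam * s + mu) % P

    def U_inv(s_next):
        """s_t = lambda**(-1) * (s_{t+1} - mu) mod P."""
        return (pow(lam, -1, P) * (s_next - mu)) % P

    s0 = 12345
    assert U_inv(U(s0)) == s0          # the discrete evolution is reversible

Since U is a bijection of the phase space onto itself, no information is lost in a single step; whatever growth of K(s_t) occurs must therefore come from elsewhere, as discussed below.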

The phase space in which the string s evolves is the set of all strings that correspond to natural numbers contained in the interval [0, P]. The evolution shall be called perfectly ergodic if after P iterations the system has passed through all states of the phase space. The evolution shall be called quasiergodic of order n if only P/n states were occupied in that time. Evolutions for which n < log_2 log_2 P shall be termed ergodic.

For example, the random-number generator, Eq. (C3), is perfectly ergodic when P, λ, and μ satisfy the following conditions.

(i) The greatest common divisor of μ and P is 1.
(ii) λ ≡ 1 (mod q) for every prime factor q of P.
(iii) λ ≡ 1 (mod 4) if P is a multiple of 4.

In particular, when P = 2^k (which means that P is concisely describable), λ = 4α + 1, and μ = 2β + 1, where α and β are arbitrary natural numbers, the transformation given by Eq. (C3) will result in a perfectly ergodic random-number generator. Hence it is quite easy to construct a concisely describable, ergodic U with a very large volume of the phase space given by

(C4)
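Conditions (i)-(iii) can be verified by brute force for a modest modulus. The sketch below uses parameter values of my own choosing; it counts the length of the orbit through s_0 and confirms that a generator with λ = 4α + 1, μ = 2β + 1, and P = 2^k visits every state before returning (perfect ergodicity), while breaking condition (i) shortens the orbit.

    def orbit_length(lam, mu, P, s0=0):
        """Steps taken by s_{t+1} = (lam * s_t + mu) mod P before first returning to s0.
        For lam coprime to P the map is a bijection, so this orbit is a single cycle."""
        s, steps = s0, 0
        while True:
            s = (lam * s + mu) % P
            steps += 1
            if s == s0:
                return steps

    P = 2 ** 16
    assert orbit_length(4 * 7 + 1, 2 * 3 + 1, P) == P   # perfectly ergodic: all P states visited
    assert orbit_length(4 * 7 + 1, 2, P) < P            # even mu violates condition (i)

The brute-force check is only feasible for small k, but the construction itself is not: the same parameter pattern yields a concisely describable, perfectly ergodic U acting on an arbitrarily large phase space.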


Consider now the behavior of the algorithmic complexity of the time-dependent state of the system s_t in the course of the dynamical evolution defined by Eq. (C3). Let us first point out that this problem can be formulated in two different, but equivalent, ways: One can either (i) consider the relative complexity of the initial string s_0 and the evolved string s_t = U^t s_0, that is, K(s_t|s_0), or (ii) inquire directly about K(s_t). It is the first option that allows one to pose the problem of the apparent irreversibility of dynamical evolutions in an "observer-independent" manner. For one may now propose to measure the increase of algorithmically defined entropy by

ΔK = K(s_t|s_0) .   (C5)

Therefore the entropy will be defined relative to the initial configuration. This configuration may or may not appear "simple" to an observer. Below, we shall see that the relative entropy will typically increase, although with fluctuations, over time scales smaller than the "Poincaré recurrence time."

On the other hand, if we were to assume that the initial state, which was presumably prepared by the observer, were algorithmically simple [K(s_0) ≪ log_2 P], then, instead of the conditional randomness of Eq. (C5), one can directly estimate

ΔK ≅ K(s_t) .   (C6)

This equation will be valid whenever K(s_t) ≫ K(s_0), that is, for an overwhelming majority of the phase space.

To prove that ΔK will tend to increase on time scales less than the Poincaré recurrence time P/n, we must show that a program capable of reproducing the pair (s_0, s_0) is much less complex than the program reproducing (s_0, s_t). We can anticipate that the difference of complexity should be of the order of the typical randomness of the state s_t. Typically, it should be similar to the average complexity of an algorithmically random number of order P/n:

ΔK ≅ log_2(P/n) ≅ log_2 P .   (C7)

Above, and further on in this appendix, we assume that the evolution of the system is ergodic.

This expectation, Eq. (C7), appears to lead to a paradox. After all, the evolution from s_0 to s_t can be readily expressed as

s_t = U^t s_0 .   (C8)

Why should s_t be significantly more difficult to describe than s_0, if U is, by assumption, concisely describable? It is straightforward to estimate the complexity of the program that starts from s_0 and proceeds to generate recursively all states of the string s in its phase space as K(s_0) + K(U) + O(1). This number can clearly be made much smaller than log_2 P. How can one then expect a program that generates s_t from s_0 to be, for a typical t, as long as log_2 P?

The requirement that the program must have a unique output, emphasized already in Sec. II, provides a resolution of this apparent paradox. Apart from the information about s_0 and U, the program must contain the information on when to halt the calculation. This is given by t itself, and a typical t is of order P/n. Consequently, a binary string of length ~ log_2 t must be added to the simple program generating the whole trajectory. The increase of the algorithmic randomness, as defined by Eqs. (C5)-(C8), will then be bounded from above by

ΔK ≲ log_2 t .   (C9)

It is not difficult to see that this estimate of entropy satisfies, for t significantly smaller than the recurrence period P, not only the Boltzmann H theorem, but its generalized version

(C10)

as well. These results strongly indicate that the algorithmic complexity can be regarded as an interesting contender for the "objective" (that is, relatively observer-independent) definition of physical entropy. Moreover, they demonstrate that to give a rigorous definition of entropy one need not try to argue that dynamical evolutions transform definite, pure states into mixtures. Probability is not indispensable as the foundation of statistical mechanics.

It is of course of great interest to inquire into the relation between the algorithmic complexity and the more traditional recipes for calculating entropy. It is also of fundamental importance to consider how our idealized model of a dynamical system can be used to deduce the behavior of entropy in real physical systems.

The "equilibrium value" of entropy, set by the typical size of the program that must be used to generate s_t, is

ΔK ≅ log_2 P + log_2 log_2 P .   (C11)
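The trend summarized by Eqs. (C7), (C9), and (C11) can be made visible with a crude numerical proxy. Algorithmic randomness itself is uncomputable, so the sketch below, whose parameter values are illustrative choices of mine, uses the length of the zlib-compressed k-bit representation of s_t as a rough, compressor-dependent upper bound on K(s_t). Starting from the simple state s_0 = 1, the proxy grows with t and then levels off near log_2 P.

    import zlib

    k = 4096                           # P = 2**k: large but concisely describable
    P = 1 << k
    lam, mu = 4 * 1 + 1, 2 * 1 + 1     # full-period generator of Eq. (C3)
    s = 1                              # algorithmically simple initial state

    def randomness_proxy(x):
        """Bits in the zlib-compressed k-bit representation of x: a crude upper
        bound standing in for the (uncomputable) algorithmic randomness K(x)."""
        return 8 * len(zlib.compress(x.to_bytes(k // 8, "big"), 9))

    checkpoints = (0, 1, 10, 100, 500, 1000, 2000, 4000)
    for t in range(max(checkpoints) + 1):
        if t in checkpoints:
            print(f"t = {t:4d}   proxy for K(s_t) = {randomness_proxy(s):5d} bits   (log2 P = {k})")
        s = (lam * s + mu) % P

The proxy overshoots log_2 P slightly because of compressor overhead, and it cannot detect the rare, concisely describable states responsible for the fluctuations discussed next; it only illustrates the generic tendency.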

There is one prevailing reason for large fluctuations that approach ΔK ~ 0: recurrences, which occur on the time scale of the order of P. As in the usual discussions of classical mechanics, they are forced upon the system by the finite volume of the accessible phase space, given here directly by P. Apart from these rare occurrences, ΔK may fluctuate significantly below the equilibrium value whenever the string s_t is either intrinsically concisely describable or can be easily generated by transforming s_0. Of course, these two cases need not be discussed separately when s_0 is itself concisely describable. Small fluctuations around the equilibrium value can be discussed more naturally in the context of the relationship between complexity and traditional, probabilistically defined entropies, which we have considered in Sec. IV. However, it is easy to anticipate their origin. Clearly, for some special time t ≲ P there may be a concise algorithm V capable of generating s_t from s_0 which does not employ the transformation U. When K(V) ≪ K(U^t) ~ K(t), the algorithmic complexity of s_t relative to s_0 will be small, and a large and very unlikely fluctuation, the spontaneous departure from the state of equilibrium, will have occurred.
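The recurrences responsible for the largest of these fluctuations are easy to exhibit when the phase space is kept small. In the sketch below (again with illustrative parameters of my own), the perfectly ergodic map returns to the simple initial state after exactly P steps, at which point the conditional randomness K(s_t|s_0) falls back to O(1).

    P = 1 << 20                        # small phase space, so the recurrence is reachable
    lam, mu = 4 * 1 + 1, 2 * 1 + 1     # full-period parameters
    s0 = 1
    s, t = s0, 0
    while True:                        # iterate Eq. (C3) until the orbit closes
        s = (lam * s + mu) % P
        t += 1
        if s == s0:
            break
    print("recurrence after", t, "steps; equals P:", t == P)   # prints True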


It is important to emphasize that while the above discussion is clearly relevant for the dynamical understanding of the second law, it is only a part of the whole story. In particular, in the above example entropy increases only very slowly (it gets "half way" to its equilibrium value only after a time ~ P^{1/2}), which is inconsistent with the relaxation of systems such as the Boltzmann gas. Moreover, the system is stable in the sense that a collection of several natural states (binary strings) always leads to a similar collection with the same number of natural states at a later time. This is not the case in systems which, like the Boltzmann gas, exhibit exponential relaxation and where a single "cell" in the phase space is rapidly smeared by the dynamical evolution over many "natural" cells. Indeed, the example we have considered here can be regarded as "integrable."

I shall discuss relaxation from the algorithmic point of view in more realistic systems elsewhere in more detail. The point of this appendix was only to demonstrate that integrability of the system does not preclude the possibility that in the course of its evolution it will reach algorithmically random (disordered) configurations, even if the initial configuration were regular and the Hamiltonian responsible for the evolution of the system were simple.

ACKNOWLEDGMENTS

I would like to thank Charles Bennett, Murray Gell-Mann, James Hartle, Seth Lloyd, William Unruh, and John Wheeler for discussions and comments on the manuscript.

1. L. Boltzmann, Wien. Ber. 76, 373 (1877). See S. G. Brush, Kinetic Theory (Pergamon, Oxford, 1965), Vol. 1; (1966), Vol. 2; and (1972), Vol. 3, for translations of this and other relevant papers.
2. J. W. Gibbs, Elementary Principles in Statistical Mechanics (Yale University Press, London, 1928).
3. J. von Neumann, Mathematical Foundations of Quantum Mechanics (Princeton University Press, Princeton, 1955).
4. H. Grad, Commun. Pure Appl. Math. 14, 323 (1961).
5. K. G. Denbigh and J. S. Denbigh, Entropy in Relation to Incomplete Knowledge (Cambridge University Press, Cambridge, England, 1985).
6. P. C. W. Davies, The Physics of Time Asymmetry (University of California Press, Berkeley, 1977).
7. S. G. Brush, Ref. 1, Vols. 1-3.
8. A. Wehrl, Rev. Mod. Phys. 50, 221 (1978).
9. I. Prigogine, From Being to Becoming (Freeman, San Francisco, 1980).
10. J. Ford, Phys. Today 36 (4), 40 (1983), and references therein.
11. Quantum Theory and Measurement, edited by J. A. Wheeler and W. H. Zurek (Princeton University Press, Princeton, 1983); H. Everett, in The Many-Worlds Interpretation of Quantum Mechanics, edited by B. S. DeWitt and N. Graham (Princeton University Press, Princeton, 1973); W. H. Zurek, Information Transfer in Quantum Measurements: Irreversibility and Amplification, in Quantum Optics, Experimental Gravitation and Measurement Theory, Vol. 94 of NATO Advanced Study Institute Series, edited by P. Meystre and M. O. Scully (Plenum, New York, 1983), pp. 87-106.
12. C. E. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, Urbana, 1949).
13. A. I. Khinchin, Information Theory (Dover, New York, 1957); R. W. Hamming, Coding and Information Theory (Prentice-Hall, Englewood Cliffs, NJ, 1986).
14. S. W. Hawking, Phys. Rev. D 13, 191 (1976).
15. J. B. Hartle and S. W. Hawking, Phys. Rev. D 28, 2960 (1983).
16. R. J. Solomonoff, Inf. Control 7, 1 (1964).
17. A. N. Kolmogorov, Inf. Transmission 1, 3 (1965).
18. G. J. Chaitin, J. Assoc. Comput. Mach. 13, 547 (1966).
19. A. N. Kolmogorov, IEEE Trans. Inf. Theory 14, 662 (1968).
20. A. N. Kolmogorov, Usp. Mat. Nauk 23, 201 (1968).
21. G. J. Chaitin, J. Assoc. Comput. Mach. 22, 329 (1975), and references therein.
22. G. J. Chaitin, Sci. Am. 232 (5), 47 (1975).
23. G. J. Chaitin, IBM J. Res. Dev. 21, 350 (1977).
24. A. K. Zvonkin and L. A. Levin, Usp. Mat. Nauk 25, 602 (1970).
25. P. Martin-Löf, Inf. Control 9, 602 (1966).
26. G. J. Chaitin, Algorithmic Information Theory (Cambridge University Press, Cambridge, England, 1987), and references therein.
27. C. H. Bennett, Int. J. Theor. Phys. 21, 905 (1982).
28. P. Gács, Dokl. Akad. Nauk SSSR 218, 1265 (1974) [Sov. Phys. Dokl. 15, 1477 (1974)].
29. L. A. Levin, Dokl. Akad. Nauk SSSR 227, 1293 (1976) [Sov. Phys. Dokl. 17, 522 (1976)].
30. L. Szilard, Z. Phys. 53, 840 (1929); English translation in Behav. Sci. 9, 301 (1964); reprinted in Quantum Theory and Measurement, Ref. 11.
31. E. T. Jaynes, Phys. Rev. 106, 620 (1957); 108, 171 (1957).
32. L. Brillouin, Science and Information Theory, 2nd ed. (Academic, London, 1962).
33. D. Hofstadter, Gödel, Escher, Bach (Vintage, New York, 1980).
34. C. H. Bennett, IBM J. Res. Dev. 17, 525 (1973).
35. R. Landauer, IBM J. Res. Dev. 5, 183 (1961).
36. C. H. Bennett, IBM J. Res. Dev. 32, 16 (1988).
37. R. Landauer, Found. Phys. 16, 551 (1986).
38. C. H. Bennett, Sci. Am. 255 (11), 108 (1987).
39. W. H. Zurek, in Frontiers of Nonequilibrium Statistical Mechanics, edited by G. T. Moore and M. O. Scully (Plenum, New York, 1986), pp. 151-161.
40. W. H. Zurek, Nature (to be published); and (unpublished).
41. W. H. Zurek, Phys. Rev. D 24, 1516 (1981); 26, 1862 (1982); in Ref. 39, pp. 145-149.
42. P. Ehrenfest and T. Ehrenfest, The Conceptual Foundations of the Statistical Approach, English translation (Cornell University Press, Ithaca, 1956).
43. I. Prigogine, C. George, F. Henin, and L. Rosenfeld, Chem. Scr. 4, 5 (1974).
44. Such a "reversal" of the approach was advocated in the context of quantum measurements by J. A. Wheeler; see, e.g., IBM J. Res. Dev. 32, 4 (1988), and references therein. See Ref. 38 for comments on the relevance of computation to Wheeler's ideas.
45. J. M. Hammersley and D. C. Handscomb, Monte Carlo Methods (Wiley, New York, 1964).
46. A. G. Konheim, Cryptography, a Primer (Wiley, New York, 1981).