
1 Introduction and Preliminaries

Conceptual Outline

❚ 1.1 ❚ A deceptively simple model of the dynamics of a system is a deterministic iterative map applied to a single real variable. We characterize the dynamics by looking at its limiting behavior and the approach to this limiting behavior. Fixed points that attract or repel the dynamics, and cycles, are conventional limiting behaviors of a simple dynamic system. However, changing a parameter in a quadratic iterative map causes it to undergo a sequence of cycle doublings (bifurcations) until it reaches a regime of chaotic behavior which cannot be characterized in this way. This deterministic chaos reveals the potential importance of the influence of fine-scale details on large-scale behavior in the dynamics of systems.

❚ 1.2 ❚ A system that is subject to complex (external) influences has a dynamics that may be modeled statistically. The statistical treatment simplifies the complex unpredictable stochastic dynamics of a single system to the simple predictable dynamics of an ensemble of systems subject to all possible influences. A random walk on a line is the prototype stochastic process. Over time, the random influence causes the ensemble of walkers to spread in space and form a Gaussian distribution. When there is a bias in the random walk, the walkers have a constant velocity superimposed on the spreading of the distribution.

❚ 1.3 ❚ While the microscopic dynamics of physical systems is rapid and complex, the macroscopic behavior of many materials is simple, even static. Before we can understand how complex systems have complex behaviors, we must understand why materials can be simple. The origin of simplicity is an averaging over the fast microscopic dynamics on the time scale of macroscopic observations (the ergodic theorem) and an averaging over microscopic spatial variations. The averaging can be performed theoretically using an ensemble representation of the physical system that assumes all microscopic states are realized. Using this as an assumption, a statistical treatment of microscopic states describes the macroscopic equilibrium behavior of systems. The final part of Section 1.3 introduces concepts that play a central role in the rest of the book. It discusses the differences between equilibrium and complex systems. Equilibrium systems are divisible and satisfy the ergodic theorem. Complex systems


are composed out of interdependent parts and violate the ergodic theorem. They have many degrees of freedom whose time dependence is very slow on a microscopic scale.

❚ 1.4 ❚ To understand the separation of time scales between fast and slow degrees of freedom, a two-well system is a useful model. The description of a particle traveling in two wells can be simplified to the dynamics of a two-state (binary variable) system. The fast dynamics of the motion within a well is averaged by assuming that the system visits all states, represented as an ensemble. After taking the average, the dynamics of hopping between the wells is represented explicitly by the dynamics of a binary variable. The hopping rate depends exponentially on the ratio of the energy barrier and the temperature. When the temperature is low enough, the hopping is frozen. Even though the two wells are not in equilibrium with each other, equilibrium continues to hold within a well. The cooling of a two-state system serves as a simple model of a glass transition, where many microscopic degrees of freedom become frozen at the glass transition temperature.

❚ 1.5 ❚ Cellular automata are a general approach to modeling the dynamics of spatially distributed systems. Expanding the notion of an iterative map of a single variable, the variables that are updated are distributed on a lattice in space. The influence between variables is assumed to rely upon local interactions, and is homogeneous. Space and time are both discretized, and the variables are often simplified to include only a few possible states at each site. Various cellular automata can be designed to model key properties of physical and biological systems.

❚ 1.6 ❚ The equilibrium state of spatially distributed systems can be modeled by fields that are treated using statistical ensembles. The simplest is the Ising model, which captures the simple cooperative behavior found in magnets and many other systems. Cooperative behavior is a mechanism by which microscopic fast degrees of freedom can become slow collective degrees of freedom that violate the ergodic theorem and are visible macroscopically. Macroscopic phase transitions are the dynamics of the cooperative degrees of freedom. Cooperative behavior of many interacting elements is an important aspect of the behavior of complex systems. This should be contrasted to the two-state model (Section 1.4), where the slow dynamics occurs microscopically.

❚ 1.7 ❚ Computer simulations of models such as molecular dynamics or cellular automata provide important tools for the study of complex systems. Monte Carlo simulations enable the study of ensemble averages without necessarily describing the dynamics of a system. However, they can also be used to study random-walk dynamics. Minimization methods that use iterative progress to find a local minimum are often an important aspect of computer simulations. Simulated annealing is a method that can help find low energy states on complex energy surfaces.

❚ 1.8 ❚ We have treated systems using models without acknowledging explicitly that our objective is to describe them. All our efforts are designed to map a system onto a description of the system. For complex systems the description must be quite long, and the study of descriptions becomes essential. With this recognition, we turn to information theory. The information contained in a communication, typically a string of characters, may be defined quantitatively as the logarithm of the number of possible messages. When different messages have distinct probabilities P in an ensemble, then the information can be identified as −ln(P), and the average information is defined accordingly. Long messages can be modeled using the same concepts as a random walk, and we can use such models to estimate the information contained in human languages such as English.

❚ 1.9 ❚ In order to understand the relationship of information to systems, we must also understand what we can infer from information that is provided. The theory of logic is concerned with inference. It is directly linked to computation theory, which is concerned with the possible (deterministic) operations that can be performed on a string of characters. All operations on character strings can be constructed out of elementary logical (Boolean) operations on binary variables. Using Turing's model of computation, it is further shown that all computations can be performed by a universal Turing machine, as long as its input character string is suitably constructed. Computation theory is also related to our concern with the dynamics of physical systems because it explores the set of possible outcomes of discrete deterministic dynamic systems.

❚ 1.10 ❚ We return to issues of structure on microscopic and macroscopic scales by studying fractals, self-similar geometric objects that embody the concept of progressively increasing structure on finer and finer length scales. A general approach to the scale dependence of system properties is described by scaling theory. The renormalization group methodology enables the study of scaling properties by relating a model of a system on one scale with a model of the system on another scale. Its use is illustrated by application to the Ising model (Section 1.6), and to the bifurcation route to chaos (Section 1.1). Renormalization helps us understand the basic concept of modeling systems, and formalizes the distinction between relevant and irrelevant microscopic parameters. Relevant parameters are the microscopic parameters that can affect the macroscopic behavior. The concept of universality is the notion that a whole class of microscopic models will give rise to the same macroscopic behavior, because many parameters are irrelevant. A conceptually related computational technique, the multigrid method, is based upon representing a problem on multiple scales.

The study of complex systems begins from a set of models that capture aspects of the dynamics of simple or complex systems. These models should be sufficiently general to encompass a wide range of possibilities but have sufficient structure to capture interesting features. An exciting bonus is that even the apparently simple models discussed in this chapter introduce features that are not typically treated in the conventional science of simple systems, but are appropriate introductions to the dynamics of complex systems. Our treatment of dynamics will often consider discrete rather than continuous time. Analytic treatments are often convenient to formulate in continuous variables and differential equations; however, computer simulations are often best formulated in discrete space-time variables with well-defined intervals. Moreover, the assumption of a smooth continuum at small scales is not usually a convenient starting point for the study of complex systems. We are also generally interested not only in one example of a system but rather in a class of systems that differ from each other but share a characteristic structure. The elements of such a class of systems are collectively known as an ensemble. As we introduce and study mathematical models, we should recognize that our primary objective is to represent properties of real systems. We must therefore develop an understanding of the nature of models and modeling, and how they can pertain to either simple or complex systems.

1.1 Iterative Maps (and Chaos)

An iterative map f is a function that evolves the state of a system s in discrete time

s(t) = f(s(t − Δt))    (1.1.1)

where s(t) describes the state of the system at time t. For convenience we will generally measure time in units of Δt, which then has the value 1, and time takes integral values starting from the initial condition at t = 0.

Many of the complex systems we will consider in this text are of the form of Eq. (1.1.1), if we allow s to be a general variable of arbitrary dimension. The generality of iterative maps is discussed at the end of this section. We start by considering several examples of iterative maps where s is a single variable. We discuss briefly the binary variable case, s = ±1. Then we discuss in greater detail two types of maps with s a real variable, s ∈ ℜ: linear maps and quadratic maps. The quadratic iterative map is a simple model that can display complex dynamics. We assume that an iterative map may be started at any initial condition allowed by a specified domain of its system variable.

1.1.1 Binary iterative maps

There are only a few binary iterative maps. Question 1.1.1 is a complete enumeration of them.*

Question 1.1.1 Enumerate all possible iterative maps where the system is described by a single binary variable, s = ±1.

Solution 1.1.1 There are only four possibilities:

s(t) = 1
s(t) = −1
s(t) = s(t − 1)
s(t) = −s(t − 1)    (1.1.2)


*Questions are an integral part of the text. They are designed to promote independent thought. The reader is encouraged to read the question, contemplate or work out an answer and then read the solution provided in the text. The continuation of the text assumes that solutions to questions have been read.


It is instructive to consider these possibilities in some detail. The main reason there are so few possibilities is that the form of the iterative map we are using depends, at most, on the value of the system at the previous time. The first two examples are constants and don't even depend on the value of the system at the previous time. The third map can only be distinguished from the first two by observation of its behavior when presented with two different initial conditions.

The last of the four maps is the only map that has any sustained dynamics. It cycles between two values in perpetuity. We can think about this as representing an oscillator. ❚

Question 1.1.2

a. In what way can the map s(t) = −s(t − 1) represent a physical oscillator?

b. How can we think of the static map, s(t) = s(t − 1), as an oscillator?

c. Can we do the same for the constant maps s(t) = 1 and s(t) = −1?

Solution 1.1.2 (a) By looking at the oscillator displacement with a strobe at half-cycle intervals, our measured values can be represented by this map. (b) By looking at an oscillator with a strobe at cycle intervals. (c) You might think we could, by picking a definite starting phase of the strobe with respect to the oscillator. However, the constant map ignores the first value; the oscillator does not. ❚

1.1.2 Linear iterative maps: free motion, oscillation, decay and growth

The simplest example of an iterative map with s real, s ∈ ℜ, is a constant map:

s(t) = s0 (1.1.3)

No matter what the initial value, this system always takes the particular value s0. The constant map may seem trivial; however, it will be useful to compare the constant map with the next class of maps.

A linear iterative map with unit coefficient is a model of free motion or propagation in space:

s(t) = s(t − 1) + v (1.1.4)

At successive times the values of s are separated by v, which plays the role of the velocity.

Question 1.1.3 Consider the case of zero velocity

s(t) = s(t − 1) (1.1.5)

How is this different from the constant map?

Solution 1.1.3 The two maps differ in their dependence on the initial value. ❚


Runaway growth or decay is a multiplicative iterative map:

s(t) = gs(t − 1) (1.1.6)

We can generate the values of this iterative map at all times by using the equivalent expression

s(t) = g^t s0 = e^{t ln(g)} s0    (1.1.7)

which is exponential growth or decay. The iterative map can be thought of as a sequence of snapshots of Eq. (1.1.7) at integral time. g = 1 reduces this map to the previous case.
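As a minimal numerical check (this example is ours, not from the text), the iterated values of Eq. (1.1.6) agree with the closed form of Eq. (1.1.7) at integral times:

```python
g, s0 = 0.8, 2.0

s = s0
for t in range(1, 6):
    s = g * s                # iterate Eq. (1.1.6)
    print(t, s, g**t * s0)   # agrees with the closed form g^t * s0 of Eq. (1.1.7)
```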

Question 1.1.4 We have seen the case of free motion, and now jumped to the case of growth. What happened to accelerated motion? Usually we would consider accelerated motion as the next step after motion with a constant velocity. How can we write accelerated motion as an iterative map?

Solution 1.1.4 The description of accelerated motion requires two variables: position and velocity. The iterative map would look like:

x(t) = x(t − 1) + v(t − 1)
v(t) = v(t − 1) + a    (1.1.8)

This is a two-variable iterative map. To write this in the notation of Eq. (1.1.1) we would define s as a vector s(t) = (x(t), v(t)). ❚
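As a sketch (our own illustration; the function and variable names are arbitrary), the two-variable map of Eq. (1.1.8) can be iterated directly in a few lines of Python:

```python
def accelerated_motion(x0, v0, a, steps):
    """Iterate x(t) = x(t-1) + v(t-1), v(t) = v(t-1) + a, as in Eq. (1.1.8)."""
    x, v = x0, v0
    trajectory = [(x, v)]
    for _ in range(steps):
        x, v = x + v, v + a   # position updates with the old velocity
        trajectory.append((x, v))
    return trajectory

# With a = 1 and zero initial conditions the positions 0, 0, 1, 3, 6, 10, ...
# grow quadratically, as expected for uniform acceleration.
print(accelerated_motion(0.0, 0.0, 1.0, 5))
```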

Question 1.1.5 What happens in the rightmost exponential expression in Eq. (1.1.7) when g is negative?

Solution 1.1.5 The logarithm of a negative number results in a phase iπ (ln(g) = ln|g| + iπ for g < 0). The term iπt in the exponent alternates the sign every time step, as one would expect from Eq. (1.1.6). ❚

At this point, it is convenient to introduce two graphical methods for describing an iterative map. The first is the usual way of plotting the value of s as a function of time. This is shown in the left panels of Fig. 1.1.1. The second type of plot, shown in the right panels, has a different purpose. This is a plot of the iterative relation s(t) as a function of s(t − 1). On the same axis we also draw the line for the identity map s(t) = s(t − 1). These two plots enable us to graphically obtain the successive values of s as follows. Pick a starting value of s, which we can call s(0). Mark this value on the abscissa. Mark the point on the graph of s(t) that corresponds to the point whose abscissa is s(0), i.e., the point (s(0), s(1)). Draw a horizontal line to intersect the identity map. The intersection point is (s(1), s(1)). Draw a vertical line back to the iterative map. This is the point (s(1), s(2)). Successive values of s(t) are obtained by iterating this graphical procedure. A few examples are plotted in the right panels of Fig. 1.1.1.
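This graphical procedure (often called a cobweb plot) is easy to automate. A minimal sketch, assuming matplotlib is available (the function name and parameters are our own):

```python
import numpy as np
import matplotlib.pyplot as plt

def cobweb(f, s0, steps, s_min=0.0, s_max=1.0):
    """Plot f, the identity map, and the path of successive iterates."""
    grid = np.linspace(s_min, s_max, 400)
    plt.plot(grid, f(grid), label="s(t) = f(s(t-1))")
    plt.plot(grid, grid, "--", label="identity map")
    s = s0
    for _ in range(steps):
        s_next = f(s)
        plt.plot([s, s], [s, s_next], "k", lw=0.8)            # vertical line to the map
        plt.plot([s, s_next], [s_next, s_next], "k", lw=0.8)  # horizontal line to the identity
        s = s_next
    plt.xlabel("s(t-1)"); plt.ylabel("s(t)"); plt.legend(); plt.show()

cobweb(lambda s: 2.8 * s * (1 - s), 0.1, 20)
```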

In order to discuss the iterative maps it is helpful to recognize several features of these maps. First, intersection points of the identity map and the iterative map are the fixed points of the iterative map:

s0 = f(s0)    (1.1.9)


Fixed points, not surprisingly, play an important role in iterative maps. They help us describe the state and behavior of the system after many iterations. There are two kinds of fixed points—stable and unstable. Stable fixed points are characterized by "attracting" the result of iteration of points that are nearby.


Figure 1.1.1 The left panels show the time-dependent value of the system variable s(t) resulting from iterative maps. The first panel (a) shows the result of iterating the constant map; (b) shows the result of adding v to the previous value during each time interval; (c)–(f) show the result of multiplying by a constant g, where each figure shows the behavior for a different range of g values: (c) g > 1, (d) 0 < g < 1, (e) −1 < g < 0, and (f) g < −1. The right panels are a different way of showing graphically the results of iterations and are constructed as follows. First plot the function f(s) (solid line), where s(t) = f(s(t − 1)). This can be thought of as plotting s(t) vs. s(t − 1). Second, plot the identity map s(t) = s(t − 1) (dashed line). Mark the initial value s(0) on the horizontal axis, and the point on the graph of s(t) that corresponds to the point whose abscissa is s(0), i.e. the point (s(0), s(1)). These are shown as squares. From the point (s(0), s(1)) draw a horizontal line to intersect the identity map. The intersection point is (s(1), s(1)). Draw a vertical line back to the iterative map. This is the point (s(1), s(2)). Successive values of s(t) are obtained by iterating this graphical procedure. ❚




More precisely, there exists a neighborhood of points of s0 such that for any s in this neighborhood the sequence of points

{s, f(s), f^2(s), f^3(s), …}    (1.1.10)

converges to s0. We are using the notation f^2(s) = f(f(s)) for the second iteration, and similar notation for higher iterations. This sequence is just the time series of the iterative map for the initial condition s. Unstable fixed points have the opposite behavior, in that iteration causes the system to leave the neighborhood of s0. The two types of fixed points are also called attracting and repelling fixed points.

The family of multiplicative iterative maps in Eq. (1.1.6) all have a fixed point at s0 = 0. Graphically from the figures, or analytically from Eq. (1.1.7), we see that the fixed point is stable for |g| < 1 and is unstable for |g| > 1. There is also distinct behavior of the system depending on whether g is positive or negative. For g < 0 the iterations alternate from one side to the other of the fixed point, whether it is attracted to or repelled from the fixed point. Specifically, if s < s0 then f(s) > s0 and vice versa, or sign(s − s0) = −sign(f(s) − s0). For g > 0 the iteration does not alternate.

Question 1.1.6 Consider the iterative map

s(t) = gs(t − 1) + v    (1.1.11)

and convince yourself that v does not affect the nature of the fixed point, only shifts its position.

Question 1.1.7 Consider an arbitrary iterative map of the form Eq. (1.1.1), with a fixed point s0 (Eq. (1.1.9)). If the iterative map can be expanded in a Taylor series around s0, show that the first derivative

g = df(s)/ds |_{s=s0}    (1.1.12)

characterizes the fixed point as follows:

For |g| < 1, s0 is an attracting fixed point.

For |g| > 1, s0 is a repelling fixed point.

For g < 0, iterations alternate sides in a sufficiently small neighborhood of s0.

For g > 0, iterations remain on one side in a sufficiently small neighborhood of s0.

Extra credit: Prove the same theorem for a differentiable function (no Taylorexpansion needed) using the mean value theorem.

Solution 1.1.7 If the iterative map can be expanded in a Taylor series we write

f(s) = f(s0) + g(s − s0) + h(s − s0)^2 + …    (1.1.13)


where g is the first derivative at s0, and h is one-half of the second derivative at s0. Since s0 is a fixed point, f(s0) = s0, we can rewrite this as:

(f(s) − s0)/(s − s0) = g + h(s − s0) + …    (1.1.14)

If we did not have any higher-order terms beyond g, then by inspection each of the four conditions that we have to prove would follow from this expression without restrictions on s. For example, if |g| > 1, then taking the magnitude of both sides shows that f(s) − s0 is larger than s − s0 and the iterations take the point s away from s0. If g > 0, then this expression says that f(s) stays on the same side of s0. The other conditions follow similarly.

To generalize this argument to include the higher-order terms of the expansion, we must guarantee that whichever domain g is in (g > 1, 0 < g < 1, −1 < g < 0, or g < −1), the same is also true of the whole right side. For a Taylor expansion, by choosing a small enough neighborhood |s − s0| < ε, we can guarantee that the higher-order terms are less than any number δ we choose. We choose δ to be half of the minimum of |g − 1|, |g − 0| and |g + 1|. Then g + δ is in the same domain as g. This provides the desired guarantee and the proof is complete.

We have proven that in the vicinity of a fixed point the iterative map may be completely characterized by its first-order expansion (with the exception of the special points g = ±1, 0). ❚

Thus far we have not considered the special cases g = ±1, 0. The special cases g = 0 and g = 1 have already been treated as simpler iterative maps. When g = 0, the fixed point at s = 0 is so attractive that it is the result of any iteration. When g = 1 all points are fixed points.

The new special case g = −1 has a different significance. In this case all points alternate between positive and negative values, repeating every other iteration. Such repetition is a generalization of the fixed point. Whereas in the fixed-point case we repeat every iteration, here we repeat after every two iterations. This is called a 2-cycle, and we can immediately consider the more general case of an n-cycle. In this terminology a fixed point is a 1-cycle. One way to describe an n-cycle is to say that iterating n times gives back the same result, or equivalently, that a new iterative map which is the nth fold composition of the original map, h = f^n, has a fixed point. This description would also include fixed points of f and all points that are m-cycles, where m is a divisor of n. These are excluded from the definition of the n-cycles. While we have introduced cycles using a map where all points are 2-cycles, more general iterative maps have specific sets of points that are n-cycles. The set of points of an n-cycle is called an orbit. There are a variety of properties of fixed points and cycles that can be proven for an arbitrary map. One of these is discussed in Question 1.1.8.


Question 1.1.8 Prove that there is a fixed point between any two points of a 2-cycle if the iterating function f is continuous.

Solution 1.1.8 Let the 2-cycle be written as

s1 = f(s2)
s2 = f(s1)    (1.1.15)

Consider the function h(s) = f(s) − s. h(s1) and h(s2) have opposite signs, and therefore there must be an s0 between s1 and s2 such that h(s0) = 0—the fixed point. ❚

We can also generalize the definition of attracting and repelling fixed points to consider attracting and repelling n-cycles. Attraction and repulsion for the cycle is equivalent to the attraction and repulsion of the fixed point of f^n.

1.1.3 Quadratic iterative maps: cycles and chaos

The next iterative map we will consider describes the effect of nonlinearity (self-action):

s(t) = as(t − 1)(1 − s(t − 1)) (1.1.16)

or equivalently

f (s) = as(1 − s) (1.1.17)

This map has played a significant role in the development of the theory of dynamical systems because even though it looks quite innocent, it has a dynamical behavior that is not described in the conventional science of simple systems. Instead, Eq. (1.1.16) is the basis of significant work on chaotic behavior, and the transition of behavior from simple to chaotic. We have chosen this form of quadratic map because it simplifies somewhat the discussion. Question 1.1.11 describes the relationship between this family of quadratic maps, parameterized by a, and what might otherwise appear to be a different family of quadratic maps.

We will focus on a values in the range 4 > a > 0. For this range, any value of s in the interval s ∈ [0,1] stays within this interval. The minimum value f(s) = 0 occurs for s = 0, 1 and the maximal value occurs for s = 1/2. For all values of a there is a fixed point at s = 0, and there can be at most two fixed points, since a quadratic can only intersect a line (Eq. (1.1.9)) in two points.

Taking the first derivative of the iterative map gives

df/ds = a(1 − 2s)    (1.1.18)

At s = 0 the derivative is a, which, by Question 1.1.7, shows that s = 0 is a stable fixed point for a < 1 and an unstable fixed point for a > 1. The switching of the stability of the fixed point at s = 0 coincides with the introduction of a second fixed point in the interval [0,1] (when the slope at s = 0 is greater than one, f(s) > s for small s, and since f(1) = 0, we have that f(s1) = s1 for some s1 in [0,1] by the same construction as in Question 1.1.8). We find s1 by solving the equation

s1 = as1(1 − s1)    (1.1.19)

s1 = (a − 1)/a    (1.1.20)

Substituting this into Eq. (1.1.18) gives

df/ds |_{s=s1} = 2 − a    (1.1.21)

This shows that for 1 < a < 3 the new fixed point is stable by Question 1.1.7. Moreover, the derivative is positive for 1 < a < 2, so s1 is stable and convergence is from one side. The derivative is negative for 2 < a < 3, so s1 is stable and alternating.
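These convergence regimes are easy to verify numerically (our own check; the helper name is arbitrary):

```python
def iterate_quadratic(a, s, steps):
    """Iterate f(s) = a*s*(1 - s), Eq. (1.1.16), and return the final value."""
    for _ in range(steps):
        s = a * s * (1 - s)
    return s

for a in (0.5, 1.5, 2.8):
    s_final = iterate_quadratic(a, 0.3, 1000)
    fixed = 0.0 if a < 1 else (a - 1) / a
    print(f"a = {a}: iteration -> {s_final:.6f}, predicted fixed point {fixed:.6f}")
```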

Fig. 1.1.2(a)–(c) shows the three cases: a = 0.5, a = 1.5 and a = 2.8. For a = 0.5, starting from anywhere within [0,1] leads to convergence to s = 0. When s(0) > 0.5 the first iteration takes the system to s(1) < 0.5. The closer we start to s(0) = 1, the closer to s = 0 we get in the first jump. At s(0) = 1 the convergence to 0 occurs in the first jump. A similar behavior would be found for any value of 0 < a < 1. For a = 1.5 the behavior is more complicated. Except for the points s = 0, 1, the convergence is always to the fixed point s1 = (a − 1)/a between 0 and 1. For a = 2.8 the iterations converge to the same point; however, the convergence is alternating. Because there can be at most two fixed points for the quadratic map, one might think that this behavior would be all that would happen for 1 < a < 4. One would be wrong. The first indication that this is not the case is the instability of the fixed point at s1 starting from a = 3.

What happens for a > 3? Both of the fixed points that we have found, and the only ones that can exist for the quadratic map, are now unstable. We know that the iteration of the map has to go somewhere, and only within [0,1]. The only possibility, within our experience, is that there is an attracting n-cycle to which the fixed points are unstable. Let us then consider the map f^2(s), whose fixed points are 2-cycles of the original map. f^2(s) is shown in the right panels of Fig. 1.1.2 for increasing values of a. The fixed points of f(s) are also fixed points of f^2(s). However, we see that two additional fixed points exist for a > 3. We can also show analytically that two fixed points are introduced at exactly a = 3:

f^2(s) = a^2 s(1 − s)(1 − as(1 − s))    (1.1.22)

To find the fixed point we solve:

s = a^2 s(1 − s)(1 − as(1 − s))    (1.1.23)

We already know two solutions of this quartic equation—the fixed points of the map f. One of these, at s = 0, is obvious. Dividing by s we have a cubic equation:

a^3 s^3 − 2a^3 s^2 + a^2(1 + a)s + (1 − a^2) = 0    (1.1.24)



We can reduce the equation to a quadratic by dividing by (s − s1) as follows (we simplify the algebra by dividing by a(s − s1) = (as − (a − 1))):

a^3 s^3 − 2a^3 s^2 + a^2(1 + a)s + (1 − a^2) = (as − (a − 1))(a^2 s^2 − a(a + 1)s + (a + 1))    (1.1.25)

Now we can obtain the roots to the quadratic:

a^2 s^2 − a(a + 1)s + (a + 1) = 0    (1.1.26)

s2 = [(a + 1) ± √((a + 1)(a − 3))] / 2a    (1.1.27)

This has two solutions (as it must for a 2-cycle) for a < −1 or for a > 3. The former case is not of interest to us since we have assumed 0 < a < 4. The latter case is the two roots that are promised. Notice that for exactly a = 3 the two roots that are the new 2-cycle are the same as the fixed point we have already found, s1. The 2-cycle splits off from the fixed point at a = 3 when the fixed point becomes unstable. The two attracting points continue to separate as a increases. For a > 3 we expect that the result of iteration eventually settles down to the 2-cycle. The system state alternates between the two roots Eq. (1.1.27). This is shown in Fig. 1.1.2(d).
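As a sanity check (ours, not from the text), iteration at a = 3.2 settles onto the two roots of Eq. (1.1.27):

```python
import math

a, s = 3.2, 0.3
for _ in range(1000):                   # let the transient die out
    s = a * s * (1 - s)

cycle = sorted([s, a * s * (1 - s)])    # the two values visited by the 2-cycle
disc = math.sqrt((a + 1) * (a - 3))
roots = sorted([((a + 1) - disc) / (2 * a), ((a + 1) + disc) / (2 * a)])
print(cycle)   # ~ [0.513045, 0.799455]
print(roots)   # the same values, from Eq. (1.1.27)
```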

As we continue to increase a beyond 3, the 2-cycle will itself become unstable at a point that can be calculated, by setting

df^2/ds |_{s=s2} = −1    (1.1.28)

to be a = 1 + √6 ≈ 3.44949.



Figure 1.1.2 Plots of the result of iterating the quadratic map f(s) = as(1 − s) for different values of a. The left and center panels are similar to the left and right panels of Fig. 1.1.1. The left panels plot s(t). The center panels describe the iteration of the map f(s) on axes corresponding to s(t) and s(t − 1). The right panels are similar to the center panels but are for the function f^2(s). The different values of a are indicated on the panels and show the changes from (a) convergence to s = 0 for a = 0.5, (b) convergence to s = (a − 1)/a for a = 1.5, (c) alternating convergence to s = (a − 1)/a for a = 2.8, (d) bifurcation—convergence to a 2-cycle for a = 3.2, (e) second bifurcation—convergence to a 4-cycle for a = 3.5, (f) chaotic behavior for a = 3.8. ❚


At this value of a the 2-cycle splits into a 4-cycle (Fig. 1.1.2(e)). Each of the fixed points of f^2(s) simultaneously splits into a 2-cycle; these together form a 4-cycle for the original map.

Question 1.1.9 Show that when f has a 2-cycle, both of the fixed points of f^2 must split simultaneously.

Solution 1.1.9 The split occurs when the fixed points become unstable—the derivative of f^2 equals −1. We can show that the derivative is equal at the two fixed points of Eq. (1.1.27), which we call s2+ and s2−:

df^2/ds |_{s=s2} = df(f(s))/ds |_{s=s2} = df(s)/ds |_{s=f(s2)} · df(s)/ds |_{s=s2}    (1.1.29)

where we have made use of the chain rule. Since f(s2+) = s2− and vice versa, we have shown this expression is the same whether s2 = s2+ or s2 = s2−.

Note: This can be generalized to show that the derivative of f^k is the same at all of its k fixed points corresponding to a k-cycle of f. ❚

The process of taking an n-cycle into a 2n-cycle is called bifurcation. Bifurcation continues to replace the limiting behavior of the iterative map with progressively longer cycles of length 2^k. The bifurcations can be simulated; they occur at smaller and smaller intervals of a, and there is a limit point to the bifurcations at ac = 3.56994567. Fig. 1.1.3 shows the values that are reached by the iterative map at long times—the stable cycles—as a function of a < ac. We will discuss an algebraic treatment of the bifurcation regime in Section 1.10.
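A minimal sketch of such a simulation (our own, assuming matplotlib is available): for each value of a, discard the transient and record the values the iteration continues to visit. This reproduces the structure of Fig. 1.1.3.

```python
import numpy as np
import matplotlib.pyplot as plt

for a in np.linspace(2.8, 4.0, 800):
    s = 0.5
    for _ in range(500):          # discard the transient approach to the attractor
        s = a * s * (1 - s)
    visited = []
    for _ in range(100):          # record the long-time behavior (cycle or chaos)
        s = a * s * (1 - s)
        visited.append(s)
    plt.plot([a] * len(visited), visited, ",k")

plt.xlabel("a"); plt.ylabel("s"); plt.show()
```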

Beyond the bifurcation regime, a > ac (Fig. 1.1.2(f)), the behavior of the iterative map can no longer be described using simple cycles that attract the iterations. The behavior in this regime has been identified with chaos. Chaos has been characterized in many ways, but one property is quite generally agreed upon—the inherent lack of predictability of the system dynamics. This is often expressed more precisely by describing the sensitivity of the system's fate to the initial conditions. A possible definition is: There exists a distance d such that for any neighborhood V of any point s it is possible to find a point s′ within the neighborhood and a number of iterations k so that f^k(s′) is further than d away from f^k(s). This means that arbitrarily close to any point is a point that will be displaced a significant distance away by iteration. Qualitatively, there are two missing aspects of this definition: first, that the points that move far away must not be too unlikely (otherwise the system is essentially predictable), and second, that d is not too small (in which case the divergence of the dynamics may not be significant).

If we look at the definition of chaotic behavior, we see that the concept of scale plays an important role. A small distance between s and s′ turns into a large distance between f^k(s) and f^k(s′). Thus a fine-scale difference eventually becomes a large-scale difference. This is the essence of chaos as a model of complex system behavior. To understand it more fully, we can think about the state variable s not as one real variable, but as an infinite sequence of binary variables that form its binary representation s = 0.r1r2r3r4…. Each of these binary variables represents the state of the system—the value of some quantity we can measure about the system—on a particular length scale. The higher order bits represent the larger scales and the lower order ones represent the finer scales. Chaotic behavior implies that the values of the first few binary variables, r1r2, at a particular time are determined by the values of fine-scale variables at an earlier time. The farther back in time we look, the finer-scale variables we have to consider in order to know the present values of r1r2. Because many different variables are relevant to the behavior of the system, we say that the system has a complex behavior. We will return to these issues in Chapter 8.

The influence of fine length scales on coarse ones makes iterative maps difficult to simulate by computer. Computer representations of real numbers always have finite precision. This must be taken into account if simulations of iterative maps or chaotic complex systems are performed.
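The sensitivity can be seen concretely in a short computation (our own illustration): two initial conditions differing only in the fifteenth decimal digit separate to a distance of order one within about a hundred iterations in the chaotic regime.

```python
a = 3.8
s, s_prime = 0.5, 0.5 + 1e-15      # differ only on a very fine scale

for t in range(1, 101):
    s = a * s * (1 - s)
    s_prime = a * s_prime * (1 - s_prime)
    if t % 20 == 0:
        print(f"t = {t:3d}: |f^t(s) - f^t(s')| = {abs(s - s_prime):.3e}")

# The separation grows roughly exponentially until it saturates at order 1,
# so double-precision rounding alone eventually alters the trajectory.
```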


Figure 1.1.3 A plot of values of s visited by the quadratic map f(s) = as(1 − s) after many iterations as a function of a, including stable points, cycles and chaotic behavior. The different regimes are readily apparent. For a < 1 the stable point is s = 0. For 1 < a < 3 the stable point is at s1 = (a − 1)/a. For 3 < a < ac, with ac = 3.56994567, there is a bifurcation cascade with 2-cycles then 4-cycles, etc.; 2^k-cycles for all values of k appear in progressively narrower regions of a. Beyond 4-cycles they cannot be seen in this plot. For a > ac there is chaotic behavior. There are regions of s values that are not visited, and regions that are visited, in the long-time behavior of the quadratic map in the chaotic regime, which this figure does not fully illustrate. ❚


Another significant point about the iterative map as a model of a complex system is that there is nothing outside of the system that is influencing it. All of the information we need to describe the behavior is contained in the precise value of s. The complex behavior arises from the way the different parts of the system—the fine and coarse scales—affect each other.

Question 1.1.10 Why isn't the iterative map in the chaotic regime equivalent to picking a number at random?

Solution 1.1.10 We can still predict the behavior of the iterative map over a few iterations. It is only when we iterate long enough that the map becomes unpredictable. More specifically, the continuity of the function f(s) guarantees that for s and s′ close together, f(s) and f(s′) will also be close together. Specifically, given an ε it is possible to find a δ such that for |s − s′| < δ, |f(s) − f(s′)| < ε. For the family of functions we have been considering, we only need to set δ < ε/a, since then we have:

|f(s) − f(s′)| = a|s(1 − s) − s′(1 − s′)| = a|s − s′||1 − (s + s′)| < a|s − s′| < ε    (1.1.30)

Thus if we fix the number of cycles to be k, we can always find two points close enough so that |f^k(s′) − f^k(s)| < ε by setting |s − s′| < ε/a^k. ❚

The tuning of the parameter a leading from simple convergent behavior through cycle bifurcation to chaos has been identified as a universal description of the appearance of chaotic behavior from simple behavior of many systems. How do we take a complicated real system and map it onto a discrete time iterative map? We must define a system variable and then take snapshots of it at fixed intervals (or at least well-defined intervals). The snapshots correspond to an iterative map. Often there is a natural choice for the interval that simplifies the iterative behavior. We can then check to see if there is bifurcation and chaos in the real system when parameters that control the system behavior are varied.

One of the earliest examples of the application of iterative maps is to the study of heart attacks. Heart attacks occur in many different ways. One kind of heart attack is known as fibrillation. Fibrillation is characterized by chaotic and ineffective heart muscle contractions. It has been suggested that bifurcation may be observed in heartbeats as a period doubling (two heartbeats that are inequivalent). If correct, this may serve as a warning that the heart, due to various changes in heart tissue parameters, may be approaching fibrillation. Another system where more detailed studies have suggested that bifurcation occurs as a route to chaotic behavior is that of turbulent flows in hydrodynamic systems. A subtlety in the application of the ideas of bifurcation and chaos to physical systems is that physical systems are better modeled as having an increasing number of degrees of freedom at finer scales. This is to be contrasted with a system modeled by a single real number, which has the same number of degrees of freedom (represented by the binary variables above) at each length scale.



1.1.4 Are all dynamical systems iterative maps?

How general is the iterative map as a tool for describing the dynamics of systems? There are three apparent limitations of iterative maps that we will consider modifying later. Eq. (1.1.1):

a. describes the homogeneous evolution of a system, since f itself does not depend on time,

b. describes a system where the state of the system at time t depends only on the state of the system at time t − Δt, and

c. describes a deterministic evolution of a system.

We can, however, bypass these limitations and keep the same form of the iterative map if we are willing to let s describe not just the present state of the system but also

a. the state of the system and all other factors that might affect its evolution in time,

b. the state of the system at the present time and sufficiently many previous times, and

c. the probability that the system is in a particular state.

Taking these caveats together, all of the systems we will consider are iterative maps, which therefore appear to be quite general. Generality, however, can be quite useless, since we want to discard as much information as possible when describing a system.

Another way to argue the generality of the iterative map is through the laws of classical or quantum dynamics. If we consider s to be a variable that describes the positions and velocities of all particles in a system, all closed systems described by classical mechanics can be described as deterministic iterative maps. Quantum evolution of a closed system may also be described by an iterative map if s describes the wave function of the system. However, our intent is not necessarily to describe microscopic dynamics, but rather the dynamics of variables that we consider to be relevant in describing a system. In this case we are not always guaranteed that a deterministic iterative map is sufficient. We will discuss relevant generalizations, first to stochastic maps, in Section 1.2.

Extra Credit Question 1.1.11 Show that the system of quadratic iterative maps

s(t) = s(t − 1)^2 + k    (1.1.31)

is essentially equivalent in its dynamical properties to the iterative maps we have considered in Eq. (1.1.16).

Solution 1.1.11 Two iterative maps are equivalent in their properties if we can perform a time-independent one-to-one map of the time-dependent system states from one case to the other. We will attempt to transform the family of quadratic maps given in this problem to the one of Eq. (1.1.16) using a linear map valid at all times

s(t) = m s′(t) + b    (1.1.32)


By direct substitution this leads to:

m s′(t) + b = (m s′(t − 1) + b)^2 + k
s′(t) = m s′(t − 1)(s′(t − 1) + 2b/m) + (b^2 + k − b)/m    (1.1.33)

We must now choose the values of m and b so as to obtain the form of Eq. (1.1.16):

s′(t) = (−m) s′(t − 1)(−2b/m − s′(t − 1)) + (b^2 + k − b)/m    (1.1.34)

For a correct placement of minus signs in the parenthesis, and to eliminate the constant term, we need:

2b/m = −1    (1.1.35)

b^2 − b + k = 0    (1.1.36)

or

b = (1 ± √(1 − 4k))/2    (1.1.37)

m = −2b    (1.1.38)

giving

a = −m = 2b = 1 ± √(1 − 4k)    (1.1.39)

We see that for k < 1/4 we have two solutions. These solutions give all possible (positive and negative) values of a.

What about k > 1/4? It turns out that this case is not very interesting compared to the rich behavior for k < 1/4, since there are no finite fixed points, and therefore by Question 1.1.8 no 2-cycles (it is not hard to generalize this to n-cycles). To confirm this, verify that iterations diverge to +∞ from any initial condition.

Note: The system of equations of this question is the one extensively analyzed by Devaney in his excellent textbook A First Course in Chaotic Dynamical Systems. ❚
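A quick numerical check of this equivalence (our own illustration): pick a value of k, construct b, m and a from Eqs. (1.1.37)–(1.1.39), and verify that the linear change of variables of Eq. (1.1.32) maps one trajectory onto the other at every step.

```python
import math

k = -0.5                            # any k < 1/4 works
b = (1 + math.sqrt(1 - 4 * k)) / 2  # Eq. (1.1.37)
m = -2 * b                          # Eq. (1.1.38)
a = -m                              # Eq. (1.1.39): a = 2b = 1 + sqrt(1 - 4k)

s_prime = 0.3                       # trajectory of s'(t) = a s'(1 - s')
s = m * s_prime + b                 # corresponding trajectory of s(t) = s(t-1)^2 + k
for _ in range(10):
    s_prime = a * s_prime * (1 - s_prime)
    s = s * s + k
    assert abs(s - (m * s_prime + b)) < 1e-9   # Eq. (1.1.32) holds at every step
print("change of variables verified; a =", a)
```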

Extra Credit Question 1.1.12 You are given a problem to solve which when reduced to mathematical form looks like

s = fc(s)    (1.1.40)

where f is a complicated function that depends on a parameter c. You know that there is a solution of this equation in the vicinity of s0. To solve this equation you try to iterate it (Newton's method) and it works, since you find that f^k(s0) converges nicely to a solution. Now, however, you realize that you need to solve this problem for a slightly different value of the parameter c, and when you try to iterate the equation you can't get the value of s to converge. Instead the values start to oscillate and then behave in a completely erratic way. Suggest a solution for this problem and see if it works for the function fc(s) = cs(1 − s), c = 3.8, s0 = 0.5. A solution is given in stages (a)–(c) below.

Solution 1.1.12(a) A common resolution of this problem is to consider iterating the function:

hc(s) = αs + (1 − α)fc(s)    (1.1.41)

where we can adjust α to obtain rapid convergence. Note that solutions of

s = hc(s)    (1.1.42)

are the same as solutions of the original problem.

Question 1.1.12(b) Explain why this could work.

Solution 1.1.12(b) The derivative of this function at a fixed point can be controlled by the value of α. It is a linear interpolation between the fixed point derivative of fc and 1. If the fixed point is unstable and oscillating, the derivative of fc must be less than −1 and the interpolation should help.

We can also explain this result without appealing to our work on iterative maps by noting that if the iteration is causing us to overshoot the mark, it makes sense to mix the value s we start from with the value we get from fc(s) to get a better estimate.

Question 1.1.12(c) Explain how to pick α.

Solution 1.1.12(c) If the solution is oscillating, then it makes sense to assume that the fixed point is in between successive values, and the distance is revealed by how much further it gets each time; i.e., we assume that the iteration is essentially a linear map near the fixed point and we adjust α so that we compensate exactly for the overshoot of fc.

Using two trial iterations, a linear approximation to fc at s0 looks like:

s2 = fc(s1) ≈ g(s1 − s0) + s0
s3 = fc(s2) ≈ g(s2 − s0) + s0    (1.1.43)

Adopting the linear approximation as a definition of g we have:

g ≡ (s3 − s2)/(s2 − s1)    (1.1.44)

Set up α so that the first iteration of the modified system will take you to the desired answer:

s0 = αs1 + (1 − α)fc(s1)    (1.1.45)

or

s0 − s1 = (1 − α)(fc(s1) − s1) = (1 − α)(s2 − s1)    (1.1.46)

(1 − α) = (s0 − s1)/(s2 − s1)    (1.1.47)


To eliminate the unknown s0 we use Eq. (1.1.43) to obtain:

(s2 − s1) = g(s1 − s0) + (s0 − s1)    (1.1.48)

(s0 − s1) = (s2 − s1)/(1 − g)    (1.1.49)

or

1 − α = 1/(1 − g)    (1.1.50)

α = −g/(1 − g) = (s2 − s3)/(2s2 − s1 − s3)    (1.1.51)

It is easy to check, using the formula in terms of g, that the modified iteration has a zero derivative at s0 when we use the approximate linear forms for fc. This means we have the best convergence possible using the information from two iterations of fc. We then use the value of α to iterate to convergence. Try it! ❚
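Trying it numerically (a sketch of the scheme above in our own code): for fc(s) = cs(1 − s) with c = 3.8, plain iteration wanders chaotically, but the α obtained from Eq. (1.1.51) converges to the fixed point s = (c − 1)/c in a handful of steps.

```python
def f(s, c=3.8):
    return c * s * (1 - s)

# Two trial iterations from s1 = 0.5 give the slope estimate g of Eq. (1.1.44).
s1 = 0.5
s2, s3 = f(s1), f(f(s1))
g = (s3 - s2) / (s2 - s1)
alpha = -g / (1 - g)                     # Eq. (1.1.51)

s = s1
for _ in range(20):
    s = alpha * s + (1 - alpha) * f(s)   # iterate h_c(s), Eq. (1.1.41)
print(s, (3.8 - 1) / 3.8)                # both ~ 0.736842
```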

1.2 Stochastic Iterative Maps

Many of the systems we would like to consider are described by system variables whose value at the next time step we cannot predict with complete certainty. The uncertainty may arise from many sources, including the existence of interactions and parameters that are too complicated or not very relevant to our problem. We are then faced with describing a system in which the outcome of an iteration is probabilistic and not deterministic. Such systems are called stochastic systems. There are several ways to describe such systems mathematically. One of them is to consider the outcome of a particular update to be selected from a set of possible values. The probability of each of the possible values must be specified. This description is not really a model of a single system, because each realization of the system will do something different. Instead, this is a model of a collection of systems—an ensemble. Our task is to study the properties of this ensemble.

A stochastic system is generally described by the time evolution of random vari-ables. We begin the discussion by defining a random variable.A random variable s isdefined by its probability distribution Ps(s′), which describes the likelihood that s hasthe value s′. If s is a continuous variable,then Ps (s′)ds′ is the probability that s residesbetween s′ and s′ + ds′. Note that the subscript is the variable name rather than an in-dex. For example, s might be a binary variable that can have the value +1 or −1. Ps (1)is the probability that s = 1 and Ps (−1) is the probability that s = −1. If s is the outcomeof an unbiased coin toss, with heads called 1 and tails called −1, both of these valuesare 1/2.When no confusion can arise,the notation Ps (s′) is abbreviated to P(s), wheres may be either the variable or the value. The sum over all possible values of the prob-ability must be 1.

(1.2.1)



In the discussion of a system described by random variables, we often would like to know the average value of some quantity Q(s) that depends in a definite way on the value of the stochastic variable s. This average is given by:

$$\langle Q(s) \rangle = \sum_{s'} P_s(s')\, Q(s') \tag{1.2.2}$$

Note that the average is a linear operation.

We now consider the case of a time-dependent random variable. Rather than describing the time dependence of the variable s(t), we describe the time dependence of the probability distribution Ps(s′;t). Similar to the iterative map, we can consider the case where the outcome only depends on the value of the system variable at a previous time, and the transition probabilities do not depend explicitly on time. Such systems are called Markov chains. The transition probabilities from a state at a particular time to the next discrete time are written:

$$P_s(s'(t) \,|\, s'(t-1)) \tag{1.2.3}$$

Ps is used as the notation for the transition probability, since it is also the probability distribution of s at time t, given a particular value s′(t − 1) at the previous time. The use of a time index for the arguments illustrates the use of the transition probability. Ps(1|1) is the probability that when s = 1 at time t − 1 then s = 1 at time t. Ps(−1|1) is the probability that when s = 1 at time t − 1 then s = −1 at time t. The transition probabilities, along with the initial probability distribution of the system Ps(s′; t = 0), determine the time-dependent ensemble that we are interested in. Assuming that we don't lose systems on the way, the transition probabilities of Eq. (1.2.3) must satisfy:

$$\sum_{s''} P_s(s'' \,|\, s') = 1 \tag{1.2.4}$$

This states that no matter what the value of the system variable is at a particular time, it must reach some value at the next time.

The stochastic system described by transition probabilities can be written as an iterative map on the probability distribution P(s):

$$P_s(s';t) = \sum_{s''} P_s(s' \,|\, s'')\, P_s(s'';t-1) \tag{1.2.5}$$

It may be more intuitive to write this using the notation

$$P_s(s'(t);t) = \sum_{s'(t-1)} P_s(s'(t) \,|\, s'(t-1))\, P_s(s'(t-1);t-1) \tag{1.2.6}$$

in which case it may be sufficient, though hazardous, to write the abbreviated form

$$P(s(t)) = \sum_{s(t-1)} P(s(t) \,|\, s(t-1))\, P(s(t-1)) \tag{1.2.7}$$
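As a concrete illustration of Eq. (1.2.5) (a sketch only; the two-state system and its transition probabilities are invented for this example, and NumPy is assumed), the update is a transition matrix acting repeatedly on a vector of probabilities:

```python
# Sketch: a two-state Markov chain as an iterative map on the
# probability distribution, Eq. (1.2.5).
import numpy as np

# T[i, j] = P(s_i | s_j); each column sums to 1, as required by Eq. (1.2.4)
T = np.array([[0.9, 0.3],
              [0.1, 0.7]])

p = np.array([1.0, 0.0])      # initial ensemble: all systems in state 1
for t in range(50):
    p = T @ p                 # P(s;t) = sum_s' P(s|s') P(s';t-1)

print(p)                      # converges to the steady state [0.75, 0.25]
```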



It is important to recognize that the time evolution equation for the probability is linear. The linear evolution of this system (Eq. (1.2.5)) guarantees that superposition applies. If we start with an initial distribution P(s;0) = P1(s;0) + P2(s;0) at time t = 0, then we could find the result at time t by separately looking at the evolution of each of the probabilities P1(s;0) and P2(s;0). Explicitly, we can write P(s;t) = P1(s;t) + P2(s;t). The meaning of this equation should be well noted. The right side of the equation is the sum of the evolved probabilities P1(s;t) and P2(s;t). This linearity is a direct consequence of the independence of different members of the ensemble and says nothing about the complexity of the dynamics.

We note that ultimately we are interested in the behavior of a particular system s(t) that only has one value of s at every time t. The ensemble describes how many such systems will behave. Analytically it is easier to describe the ensemble as a whole; however, simulations may also be used to observe the behavior of a single system.

1.2.1 Random walk

Stochastic systems with only one binary variable might seem to be trivial, but we will devote quite a bit of attention to this problem. We begin by considering the simplest possible binary stochastic system. This is the system which corresponds to a coin toss. Ideally, for each toss there is equal probability of heads (s = +1) or tails (s = −1), and there is no memory from one toss to the next. The ensemble at each time is independent of time and has an equal probability of ±1:

$$P(s;t) = \frac{1}{2}\delta_{s,1} + \frac{1}{2}\delta_{s,-1} \tag{1.2.8}$$

where the discrete delta function is defined by

$$\delta_{i,j} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \tag{1.2.9}$$

Since Eq. (1.2.8) is independent of what happens at all previous times, the evolution of the state variable is given by the same expression

$$P(s' \,|\, s) = \frac{1}{2}\delta_{s',1} + \frac{1}{2}\delta_{s',-1} \tag{1.2.10}$$

We can illustrate the evaluation of the average of a function of s at time t:

$$\langle Q(s) \rangle_t = \sum_{s'=\pm 1} Q(s')\, P_s(s';t) = \sum_{s'=\pm 1} Q(s') \left( \frac{1}{2}\delta_{s',1} + \frac{1}{2}\delta_{s',-1} \right) = \frac{1}{2} \sum_{s'=\pm 1} Q(s') \tag{1.2.11}$$

For example, if we just take Q(s) to be s itself we have the average of the system variable:

$$\langle s \rangle_t = \frac{1}{2} \sum_{s'=\pm 1} s' = 0 \tag{1.2.12}$$



Question 1.2.1 Will you win more fair coin tosses if (a) you pick heads every time, or if (b) you alternate heads and tails, or if (c) you pick heads or tails at random, or if (d) you pick heads and tails by some other system? Explain why.

Solution 1.2.1 In general, we cannot predict the number of coin tosses that will be won; we can only estimate it based on the chance of winning. Assuming a fair coin means that this is the best that can be done. Any of the possibilities (a)–(c) give the same chance of winning. In none of these ways of gambling does the choice you make correlate with the result of the coin toss. The only system (d) that can help is if you have some information about what the result of the toss will be, like betting on the known result after the coin is tossed. A way to write this formally is to write the probability distribution of the choice that you are making. This choice is also a stochastic process. Calling the choice c(t), the four possibilities mentioned are:

$$\text{(a)} \qquad P(c;t) = \delta_{c,1} \tag{1.2.13}$$

$$\text{(b)} \qquad P(c;t) = \frac{1-(-1)^t}{2}\delta_{c,1} + \frac{1+(-1)^t}{2}\delta_{c,-1} = \mathrm{mod}_2(t)\,\delta_{c,1} + \mathrm{mod}_2(t+1)\,\delta_{c,-1} \tag{1.2.14}$$

$$\text{(c)} \qquad P(c;t) = \frac{1}{2}\delta_{c,1} + \frac{1}{2}\delta_{c,-1} \tag{1.2.15}$$

$$\text{(d)} \qquad P(c;t) = \delta_{c,s(t)} \tag{1.2.16}$$

It is sufficient to show that the average probability of winning is the same in each of (a)–(c) and is just 1/2. We follow through the manipulations in order to illustrate some concepts in the treatment of more than one stochastic variable. We have to sum over the probabilities of each of the possible values of the coin toss and each of the values of the choices, adding up the probability that they coincide at a particular time t:

$$\langle \delta_{c,s} \rangle = \sum_{s'} \sum_{c'} \delta_{c',s'}\, P_s(s';t)\, P_c(c';t) \tag{1.2.17}$$

This expression assumes that the values of the coin toss and the value of the choice are independent, so that the joint probability of having a particular value of s and a particular value of c is the product of the probabilities of each of the variables independently:

$$P_{s,c}(s',c';t) = P_s(s';t)\, P_c(c';t) \tag{1.2.18}$$

—i.e., the probabilities of independent variables factor. This is valid in cases (a)–(c) but not in case (d), where the probability of c occurring is explicitly a function of the value of s.

We evaluate the probability of winning in each case (a) through (c) using



$$\langle \delta_{c,s} \rangle = \sum_{s'} \sum_{c'} \delta_{c',s'} \left( \frac{1}{2}\delta_{s',1} + \frac{1}{2}\delta_{s',-1} \right) P_c(c';t) = \sum_{c'} \left( \frac{1}{2}\delta_{c',1} + \frac{1}{2}\delta_{c',-1} \right) P_c(c';t) = \frac{1}{2} P_c(1;t) + \frac{1}{2} P_c(-1;t) = \frac{1}{2} \tag{1.2.19}$$

where the last equality follows from the normalization of the probability (the sum over all possibilities must be 1, Eq. (1.2.1)) and does not depend at all on the distribution. This shows that the independence of the variables guarantees that the probability of a win is just 1/2.

For the last case (d) the trivial answer, that a win is guaranteed by this method of gambling, can be arrived at formally by evaluating

$$\langle \delta_{c,s} \rangle = \sum_{s'} \sum_{c'} \delta_{c',s'}\, P_{s,c}(s',c';t) \tag{1.2.20}$$

The value of s at time t is independent of the value of c, but the value of c depends on the value of s. The joint probability Ps,c(s′,c′;t) may be written as the product of the probability of a particular value of s = s′ times the conditional probability Pc(c′|s′;t) of a particular value of c = c′ given the assumed value of s:

$$\langle \delta_{c,s} \rangle = \sum_{s'} \sum_{c'} \delta_{c',s'}\, P_s(s';t)\, P_c(c' \,|\, s';t) = \sum_{s'} \sum_{c'} \delta_{c',s'}\, P_s(s';t)\, \delta_{c',s'} = \sum_{s'} P_s(s';t) = 1 \tag{1.2.21}$$ ❚
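The same conclusion can be reached by direct simulation (a sketch, with invented trial counts; strategies (a)–(c) win half the time, while (d) uses the known toss and always wins):

```python
# Sketch: Monte Carlo check of Solution 1.2.1.
import random

def win_fraction(strategy, trials=100_000):
    wins = 0
    for t in range(1, trials + 1):
        s = random.choice([1, -1])                    # the coin toss
        if strategy == 'a':
            c = 1                                     # always heads
        elif strategy == 'b':
            c = 1 if t % 2 else -1                    # alternate
        elif strategy == 'c':
            c = random.choice([1, -1])                # random choice
        else:
            c = s                                     # (d): bet on the result
        wins += (c == s)
    return wins / trials

for name in 'abcd':
    print(name, win_fraction(name))   # (a)-(c) near 0.5; (d) exactly 1.0
```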

The next step in our analysis of the binary stochastic system is to consider the behavior of the sum of s(t) over a particular number of time steps. This sum is the difference between the total number of heads and the total number of tails. It is equivalent to asking how much you will win or lose if you gamble an equal amount of money on each coin toss after a certain number of bets. This problem is known as a random walk, and we will define it as a consideration of the state variable

$$d(t) = \sum_{t'=1}^{t} s(t') \tag{1.2.22}$$

The way to write the evolution of the state variable is:

$$P(d' \,|\, d) = \frac{1}{2}\delta_{d',d+1} + \frac{1}{2}\delta_{d',d-1} \tag{1.2.23}$$

Thus a random walk considers a state variable d that can take integer values d ∈ {…, −1, 0, 1, …}. At every time step, d(t) can only move to a value one higher or one lower than where it is. We assume that the probability of a step to the right (higher) is equal to that of a step to the left (lower).



For convenience, we assume (with no loss of generality) that the system starts at position d(0) = 0. This is built into Eq. (1.2.22). Because of the symmetry of the system under a shift of the origin, this is equivalent to considering any other starting point. Once we solve for the probability distribution of d at time t, because of superposition we can also find the result of evolving any initial probability distribution P(d;t = 0).

We can picture the random walk as that of a drunk who has difficulty consistently moving forward. Our model of this walk assumes that the drunk is equally likely to take a step forward or backward. Starting at position 0, he moves to either +1 or −1. Let's say it was +1. Next he moves to +2 or back to 0. Let's say it was 0. Next to +1 or −1. Let's say it was +1. Next to +2 or 0. Let's say +2. Next to +3 or +1. Let's say +1. And so on.

What is the value of the system variable d(t) at time t? This is equivalent to asking how far the walk has progressed after t steps. Of course there is no way to know how far a particular system goes without watching it. The average distance over the ensemble of systems is the average over all possible values of s(t). This average is given by applying Eq. (1.2.2) or Eq. (1.2.11) to all of the variables s(t):

$$\langle d(t) \rangle = \sum_{s(1)=\pm 1} \frac{1}{2} \sum_{s(2)=\pm 1} \frac{1}{2} \cdots \sum_{s(t)=\pm 1} \frac{1}{2}\; d(t) = \sum_{t'=1}^{t} \langle s(t') \rangle = 0 \tag{1.2.24}$$

The average is written out explicitly in the first expression using Eq. (1.2.11). The second expression can be arrived at either directly or from the linearity of the average. The final answer is clear, since it is equally likely for the walker to move to the right as to the left.

We can also ask what is a typical distance traveled by a particular walker. By typical distance we mean how far from the starting point. This can either be defined by the average absolute value of the distance, or, as is more commonly accepted, the root mean square (RMS) distance:

$$\sigma(t) = \sqrt{\langle d(t)^2 \rangle} \tag{1.2.25}$$

$$\langle d(t)^2 \rangle = \left\langle \left( \sum_{t'=1}^{t} s(t') \right)^2 \right\rangle = \sum_{t',t''=1}^{t} \langle s(t')\, s(t'') \rangle \tag{1.2.26}$$

To evaluate the average of the product of two steps, we treat differently the case in which they are the same step and the case in which they are different steps. When the two steps are the same one we use s(t) = ±1 to obtain:

$$\langle s(t)^2 \rangle = \langle 1 \rangle = 1 \tag{1.2.27}$$

which follows from the normalization of the probability (or is obvious). To evaluate the average of the product of two steps at different times we need the joint probability of s(t) and s(t′). This is the probability that each of them will take a particular



value. Because we have assumed that the steps are independent, the joint probability is the product of the probabilities for each one separately:

$$P(s(t), s(t')) = P(s(t))\, P(s(t')) \qquad t \neq t' \tag{1.2.28}$$

so that, for example, there is a 1/4 chance that s(t) = +1 and s(t′) = −1. The independence of the two steps leads the average of the product of the two steps to factor:

$$\langle s(t)\, s(t') \rangle = \sum_{s(t),\,s(t')} P(s(t), s(t'))\, s(t)\, s(t') = \sum_{s(t),\,s(t')} P(s(t))\, P(s(t'))\, s(t)\, s(t') = \langle s(t) \rangle \langle s(t') \rangle = 0 \qquad t \neq t' \tag{1.2.29}$$

This is zero, since either of the averages is zero. We have the combined result:

$$\langle s(t)\, s(t') \rangle = \delta_{t,t'} \tag{1.2.30}$$

and finally:

$$\langle d(t)^2 \rangle = \sum_{t',t''=1}^{t} \langle s(t')\, s(t'') \rangle = \sum_{t',t''=1}^{t} \delta_{t',t''} = \sum_{t'=1}^{t} 1 = t \tag{1.2.31}$$

This gives the classic and important result that a random walk travels a typical distance that grows as the square root of the number of steps taken: σ(t) = √t.
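This result is easy to check by sampling an ensemble numerically (a sketch; the ensemble size and time are arbitrary choices):

```python
# Sketch: ensemble of unbiased random walks; compare with <d(t)> = 0
# and sigma(t) = sqrt(t), Eqs. (1.2.24) and (1.2.31).
import random, math

t, walkers = 100, 10_000
d = [sum(random.choice([1, -1]) for _ in range(t)) for _ in range(walkers)]

mean = sum(d) / walkers
rms = math.sqrt(sum(x * x for x in d) / walkers)
print(mean, rms, math.sqrt(t))   # mean near 0, rms near 10
```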

We can now consider more completely the probability distribution of the position of the walker at time t. The probability distribution at t = 0 may be written:

$$P(d;0) = \delta_{d,0} \tag{1.2.32}$$

After the first time step the probability distribution changes to

$$P(d;1) = \frac{1}{2}\delta_{d,1} + \frac{1}{2}\delta_{d,-1} \tag{1.2.33}$$

This results from the definition d(1) = s(1). After the second step, d(2) = s(1) + s(2), it is:

$$P(d;2) = \frac{1}{4}\delta_{d,2} + \frac{1}{2}\delta_{d,0} + \frac{1}{4}\delta_{d,-2} \tag{1.2.34}$$

More generally, it is not difficult to see that the probabilities are given by normalized binomial coefficients, since the number of ones chosen out of t steps is equivalent to the number of powers of x in (1 + x)^t. To reach a position d after t steps we must take (t + d)/2 steps to the right and (t − d)/2 steps to the left. The sum of these is the number of steps t and their difference is d. Since each choice has 1/2 probability we have:



$$P(d,t) = \frac{1}{2^t} \binom{t}{(d+t)/2}\, \delta^{\rm odd/even}_{t,d} = \frac{1}{2^t} \frac{t!}{[(d+t)/2]!\,[(t-d)/2]!}\, \delta^{\rm odd/even}_{t,d}\,, \qquad \delta^{\rm odd/even}_{t,d} = \frac{1 + (-1)^{t+d}}{2} \tag{1.2.35}$$

where the unusual delta function imposes the condition that d takes only odd or only even values depending on whether t is odd or even.
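A sketch of Eq. (1.2.35) in code (Python's math.comb supplies the binomial coefficient) reproduces the distributions of Eqs. (1.2.32)–(1.2.34):

```python
# Sketch: the exact random-walk distribution, Eq. (1.2.35).
from math import comb

def P(d, t):
    # zero unless d and t have the same parity and |d| <= t
    if (t + d) % 2 or abs(d) > t:
        return 0.0
    return comb(t, (t + d) // 2) / 2**t

print([P(d, 2) for d in (-2, 0, 2)])   # [0.25, 0.5, 0.25], as in Eq. (1.2.34)
```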

Let us now consider what happens after a long time. The probability distribution spreads out, and a single step is a small distance compared to the typical distance traveled. We can consider d and t to be continuous variables where both conditions d, t >> 1 are satisfied. Moreover, we can also consider d << t, because the chance that all steps will be taken in one direction becomes very small. This enables us to use Stirling's approximation to the factorial

$$x! \sim \sqrt{2\pi x}\, e^{-x} x^x \qquad \ln(x!) \sim x(\ln x - 1) + \ln\!\left( \sqrt{2\pi x} \right) \tag{1.2.36}$$

For large t it also makes sense not to restrict d to be either odd or even. In order to allow both, we, in effect, interpolate and then take only half of the probability we have in Eq. (1.2.35). This leads to the expression:

$$P(d,t) = \sqrt{\frac{t}{2\pi (t-d)(t+d)}}\; \frac{t^t e^{-t}}{2^t\, [(d+t)/2]^{(d+t)/2}\, [(t-d)/2]^{(t-d)/2}\, e^{-(d+t)/2 - (t-d)/2}} = \frac{\left( 2\pi t (1-x^2) \right)^{-1/2}}{(1+x)^{(1+x)t/2}\, (1-x)^{(1-x)t/2}} \tag{1.2.37}$$

where we have defined x = d/t. To approximate this expression it is easier to consider it in logarithmic form:

$$\begin{aligned} \ln(P(d,t)) &= -(t/2)\left[ (1+x)\ln(1+x) + (1-x)\ln(1-x) \right] - (1/2)\ln\!\left( 2\pi t (1-x^2) \right) \\ &\approx -(t/2)\left[ (1+x)(x - x^2/2 + \cdots) + (1-x)(-x - x^2/2 + \cdots) \right] - (1/2)\ln(2\pi t + \cdots) \\ &= -tx^2/2 - \ln\!\left( \sqrt{2\pi t} \right) \end{aligned} \tag{1.2.38}$$

or exponentiating:

$$P(d,t) = \frac{1}{\sqrt{2\pi t}}\, e^{-d^2/2t} = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-d^2/2\sigma^2} \tag{1.2.39}$$


The prefactor of the exponential, 1/√(2πt), originates from the factor √(2πx) in Eq. (1.2.36). It is independent of d and takes care of the normalization of the probability. The result is a Gaussian distribution. Questions 1.2.2–1.2.5 investigate higher-order corrections to the Gaussian distribution.

Question 1.2.2 In order to obtain a correction to the Gaussian distribution we must add a correction term to Stirling's approximation:

$$x! \sim \sqrt{2\pi x}\, e^{-x} x^x \left( 1 + \frac{1}{12x} + \cdots \right) \qquad \ln(x!) \sim x(\ln x - 1) + \ln\!\left( \sqrt{2\pi x} \right) + \ln\!\left( 1 + \frac{1}{12x} + \cdots \right) \tag{1.2.40}$$

Using this expression, find the first correction term to Eq. (1.2.37).

Solution 1.2.2 The correction term in Stirling's approximation contributes a factor to Eq. (1.2.37) which is (for convenience we write here c = 1/12):

$$\frac{(1 + c/t)}{(1 + 2c/(t+d))(1 + 2c/(t-d))} = \left( 1 - \frac{3c}{t} + \cdots \right) = \left( 1 - \frac{1}{4t} + \cdots \right) \tag{1.2.41}$$

where we have only kept the largest correction term, neglecting d compared to t. Note that the correction term vanishes as t becomes large. ❚

Question 1.2.3 Keeping additional terms of the expansion in Eq. (1.2.38), and the result of Question 1.2.2, find the first-order correction terms to the Gaussian distribution.

Solution 1.2.3 Correction terms in Eq. (1.2.38) arise from several places. We want to keep all terms that are of order 1/t. To do this we must keep in mind that a typical distance traveled is d ∼ √t, so that x ∼ 1/√t. The next terms are obtained from:

$$\begin{aligned} \ln(P(d,t)) &= -(t/2)\left[ (1+x)\ln(1+x) + (1-x)\ln(1-x) \right] - (1/2)\ln\!\left( 2\pi t(1-x^2) \right) + \ln(1 - 1/4t) \\ &\approx -(t/2)\left[ (1+x)\left( x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \tfrac{1}{4}x^4 \cdots \right) + (1-x)\left( -x - \tfrac{1}{2}x^2 - \tfrac{1}{3}x^3 - \tfrac{1}{4}x^4 \cdots \right) \right] \\ &\qquad - \ln\!\left( \sqrt{2\pi t} \right) - (1/2)\ln(1-x^2) + \ln(1 - 1/4t) \\ &\approx -(t/2)\left[ x^2 + \tfrac{1}{6}x^4 + \cdots \right] - \ln\!\left( \sqrt{2\pi t} \right) + (x^2/2 + \cdots) + (-1/4t + \cdots) \\ &= -tx^2/2 - \ln\!\left( \sqrt{2\pi t} \right) - tx^4/12 + x^2/2 - 1/4t \end{aligned} \tag{1.2.42}$$

This gives us a distribution:


$$P(d,t) = \frac{1}{\sqrt{2\pi t}}\, e^{-d^2/2t}\, e^{-d^4/12t^3 + d^2/2t^2 - 1/4t} \tag{1.2.43}$$ ❚

Question 1.2.4 What is the size of the additional factor? Estimate the size of this term as t becomes large.

Solution 1.2.4 The typical value of the variable d is its root mean square value σ = √t. At this value the additional term gives a factor

$$e^{1/6t} \tag{1.2.44}$$

which approaches 1 as time increases. ❚

Question 1.2.5 What is the fractional error that we will make if we neglect this term after one hundred steps? After ten thousand steps?

Solution 1.2.5 After one hundred time steps the walker has traveled a typical distance of ten steps. We generally approximate the probability of arriving at this distance using Eq. (1.2.39). The fractional error in the probability of arriving at this distance, according to Eq. (1.2.44), is 1 − e^{1/6t} ≈ −1/6t ≈ −0.00167. So already at a distance of ten steps the error is less than 0.2%.

It is much less likely for the walker to arrive at the distance 2σ = 20. The ratio of the probability to arrive at 20 compared to 10 is e^{−2}/e^{−0.5} ∼ 0.22. If we want to know the error of this smaller-probability case we would write (1 − e^{−16/12t + 4/2t − 1/4t}) = (1 − e^{5/12t}) ≈ −0.0042, which is a larger but still small error.

After ten thousand steps the errors are smaller than the errors at one hundred steps by a factor of one hundred. ❚
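These estimates can be verified against the exact binomial probability (a sketch comparing Eq. (1.2.35) with Eq. (1.2.39) at t = 100 and d = 10; the factor of 2 accounts for the parity restriction that Eq. (1.2.39) interpolates away):

```python
# Sketch: fractional error of the Gaussian approximation at d = sigma = 10,
# t = 100; the ratio should be close to e^(1/6t) of Eq. (1.2.44).
from math import comb, sqrt, pi, exp

t, d = 100, 10
exact = comb(t, (t + d) // 2) / 2**t                   # Eq. (1.2.35)
gauss = 2 / sqrt(2 * pi * t) * exp(-d**2 / (2 * t))    # Eq. (1.2.39) x 2
print(exact / gauss, exp(1 / (6 * t)))                 # both ~ 1.00167
```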

1.2.2 Generalized random walk and the central limit theorem

We can generalize the random walk by allowing a variety of steps from the current location of the walker to sites nearby, not only to the adjacent sites and not only to integer locations. If we restrict ourselves to steps that on average are balanced left and right and are not too long ranged, we can show that all such systems have the same behavior as the simplest random walk at long enough times (and characteristically the required times are not even very long). This is the content of the central limit theorem. It says that summing any set of independent random variables eventually leads to a Gaussian distribution of probabilities, which is the same distribution as the one we arrived at for the random walk. The reason that the same distribution arises is that successive iteration of the probability update equation, Eq. (1.2.7), smoothes out the distribution, and the only relevant information that survives is the width of the distribution, which is given by σ(t). The proof given below makes use of a Fourier transform and can be skipped by readers who are not well acquainted with transforms. In the next section we will also include a bias in the random walk. For long times this can be described as



an average motion superimposed on the unbiased random walk. We start with the unbiased random walk.

Each step of the random walk is described by the state variable s(t) at time t. The probability of a particular step size is an unspecified function that is independent of time:

$$P(s;t) = f(s) \tag{1.2.45}$$

We treat the case of integer values of s. The continuum case is Question 1.2.6. The absence of bias in the random walk is described by setting the average displacement in a single step to zero:

$$\langle s \rangle = \sum_s s f(s) = 0 \tag{1.2.46}$$

The statement above that each step is not too long ranged is mathematically just that the mean square displacement in a single step has a well-defined value σ0² (i.e., is not infinite):

$$\langle s^2 \rangle = \sum_s s^2 f(s) = \sigma_0^2 \tag{1.2.47}$$

Eqs. (1.2.45)–(1.2.47) hold at all times.

We can still evaluate the average of d(t) and the RMS value of d(t) directly using the linearity of the average:

$$\langle d(t) \rangle = \left\langle \sum_{t'=1}^{t} s(t') \right\rangle = t \langle s \rangle = 0 \tag{1.2.48}$$

$$\langle d(t)^2 \rangle = \left\langle \left( \sum_{t'=1}^{t} s(t') \right)^2 \right\rangle = \sum_{t',t''=1}^{t} \langle s(t')\, s(t'') \rangle \tag{1.2.49}$$

Since s(t′) and s(t″) are independent for t′ ≠ t″, as in Eq. (1.2.29), the average factors:

$$\langle s(t')\, s(t'') \rangle = \langle s(t') \rangle \langle s(t'') \rangle = 0 \qquad t' \neq t'' \tag{1.2.50}$$

Thus, all terms t′ ≠ t″ are zero by Eq. (1.2.46). We have:

$$\langle d(t)^2 \rangle = \sum_{t'=1}^{t} \langle s(t')^2 \rangle = t \sigma_0^2 \tag{1.2.51}$$

This means that the typical value of d(t) is σ0√t.

To obtain the full distribution of the random walk state variable d(t) we have to sum the stochastic variables s(t). Since d(t) = d(t − 1) + s(t), the probability of transition from d(t − 1) to d(t) is f(d(t) − d(t − 1)), or:



$$P(d' \,|\, d) = f(d' - d) \tag{1.2.52}$$

We can now write the time evolution equation and iterate it t times to get P(d;t).

$$P(d;t) = \sum_{d'} P(d \,|\, d')\, P(d';t-1) = \sum_{d'} f(d - d')\, P(d';t-1) \tag{1.2.53}$$

This is a convolution, so the most convenient way to effect a t-fold iteration is in Fourier space. The Fourier representation of the probability and transition functions for integral d is:

$$\tilde{f}(k) \equiv \sum_s e^{-iks} f(s) \qquad \tilde{P}(k;t) \equiv \sum_d e^{-ikd} P(d;t) \tag{1.2.54}$$

We use a Fourier series because of the restriction to integer values of d. Once we solve the problem using the Fourier representation, the probability distribution is recovered from the inverse formula:

$$P(d;t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} dk\, e^{ikd}\, \tilde{P}(k;t) \tag{1.2.55}$$

which is proved

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} dk\, e^{ikd}\, \tilde{P}(k;t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} dk\, e^{ikd} \sum_{d'} e^{-ikd'} P(d';t) = \frac{1}{2\pi} \sum_{d'} P(d';t) \int_{-\pi}^{\pi} dk\, e^{ik(d-d')} = \sum_{d'} P(d';t)\, \delta_{d,d'} = P(d;t) \tag{1.2.56}$$

using the expression:

$$\delta_{d,d'} = \frac{1}{2\pi} \int_{-\pi}^{\pi} dk\, e^{ik(d-d')} \tag{1.2.57}$$

Applying Eq. (1.2.54) to Eq. (1.2.53):

$$\tilde{P}(k;t) = \sum_d e^{-ikd} \sum_{d'} f(d - d')\, P(d';t-1) = \sum_{d'} \sum_d e^{-ik(d-d')}\, e^{-ikd'}\, f(d - d')\, P(d';t-1) = \sum_{d''} e^{-ikd''} f(d'') \sum_{d'} e^{-ikd'} P(d';t-1) = \tilde{f}(k)\, \tilde{P}(k;t-1) \tag{1.2.58}$$



we can iterate the equation to obtain:

$$\tilde{P}(k;t) = \tilde{f}(k)\, \tilde{P}(k;t-1) = \tilde{f}(k)^t \tag{1.2.59}$$

where we use the definition d(1) = s(1) that ensures that P(d;1) = P(s;1) = f(d).

For large t the walker has traveled a large distance, so we are interested in variations of the probability P(d;t) over large distances. Thus, in Fourier space we are concerned with small values of k. To simplify Eq. (1.2.59) for large t we expand f̃(k) near k = 0. From Eq. (1.2.54) we can directly evaluate the derivatives of f̃(k) at k = 0 in terms of averages:

$$\left. \frac{d^n \tilde{f}(k)}{dk^n} \right|_{k=0} = \sum_s (-is)^n f(s) = (-i)^n \langle s^n \rangle \tag{1.2.60}$$

We can use this expression to evaluate the terms of a Taylor expansion of f̃(k):

$$\tilde{f}(k) = \tilde{f}(0) + \left. \frac{\partial \tilde{f}(k)}{\partial k} \right|_{k=0} k + \frac{1}{2} \left. \frac{\partial^2 \tilde{f}(k)}{\partial k^2} \right|_{k=0} k^2 + \cdots \tag{1.2.61}$$

$$\tilde{f}(k) = \langle 1 \rangle - i \langle s \rangle k - \frac{1}{2} \langle s^2 \rangle k^2 + \cdots \tag{1.2.62}$$

Using the normalization of the probability (⟨1⟩ = 1), and Eqs. (1.2.46) and (1.2.47), gives us:

$$\tilde{P}(k;t) = \left( 1 - \frac{1}{2} \sigma_0^2 k^2 + \cdots \right)^t \tag{1.2.63}$$

We must now remember that a typical value of d(t), from its RMS value, is σ0√t. By the properties of the Fourier transform, this implies that a typical value of k that we must consider in Eq. (1.2.63) varies with time as 1/√t. The next term in the expansion, cubic in k, would give rise to a term that is smaller by this factor, and therefore becomes unimportant at long times. If we write k = q/√t, then it becomes clearer how to write Eq. (1.2.63) using a limiting expression for large t:

$$\tilde{P}(k;t) = \left( 1 - \frac{1}{2} \frac{\sigma_0^2 q^2}{t} + \cdots \right)^t \sim e^{-\sigma_0^2 q^2 / 2} = e^{-t \sigma_0^2 k^2 / 2} \tag{1.2.64}$$

This Gaussian, when Fourier transformed back to an expression in d, gives us a Gaussian as follows:

$$P(d;t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} dk\, e^{ikd}\, e^{-t\sigma_0^2 k^2 / 2} \cong \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, e^{ikd}\, e^{-t\sigma_0^2 k^2 / 2} \tag{1.2.65}$$



We have extended the integral because the decaying exponential becomes narrow as t increases. The integral is performed by completing the square in the exponent, giving:

$$P(d;t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, e^{-(t\sigma_0^2 k^2 - 2ikd - d^2/t\sigma_0^2)/2}\, e^{-d^2/2t\sigma_0^2} = \frac{1}{\sqrt{2\pi t \sigma_0^2}}\, e^{-d^2/2t\sigma_0^2} \tag{1.2.66}$$

or equivalently:

$$P(d;t) = \frac{1}{\sqrt{2\pi \sigma(t)^2}}\, e^{-d^2/2\sigma(t)^2} \tag{1.2.67}$$

which is the same as Eq. (1.2.39).
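The convergence asserted by the central limit theorem can also be watched numerically by iterating the convolution of Eq. (1.2.53) directly (a sketch assuming NumPy; the step distribution f below is an arbitrary unbiased choice):

```python
# Sketch: iterate P(d;t) = sum_d' f(d-d') P(d';t-1) and compare the result
# with the Gaussian of Eq. (1.2.67).
import numpy as np

steps = np.arange(-2, 3)                  # single-step values -2..+2
f = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # unbiased: <s> = 0
sigma0_sq = np.sum(steps**2 * f)          # sigma_0^2 of Eq. (1.2.47)

t = 50
P = np.zeros(4 * t + 1)
P[len(P) // 2] = 1.0                      # P(d;0) = delta_{d,0}
for _ in range(t):
    P = np.convolve(P, f, mode='same')    # Eq. (1.2.53)

d = np.arange(len(P)) - len(P) // 2
gauss = np.exp(-d**2 / (2 * t * sigma0_sq)) / np.sqrt(2 * np.pi * t * sigma0_sq)
print(np.abs(P - gauss).max())            # small deviation: ~Gaussian
```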

Question 1.2.6 Prove the central limit theorem when s takes a continuum of values.

Solution 1.2.6 The proof follows the same course as the integer-valued case. We must define the appropriate averages and the transform. The average of s is still zero, and the mean square displacement is defined similarly:

$$\langle s \rangle = \int ds\, s f(s) = 0 \tag{1.2.46′}$$

$$\langle s^2 \rangle = \int ds\, s^2 f(s) = \sigma_0^2 \tag{1.2.47′}$$

To avoid problems of notation we substitute the variable x for the state variable d:

$$\langle x(t) \rangle = \left\langle \sum_{t'=1}^{t} s(t') \right\rangle = t \langle s \rangle = 0 \tag{1.2.48′}$$

Skipping steps that are the same we find:

$$\langle x(t)^2 \rangle = \left\langle \left( \sum_{t'=1}^{t} s(t') \right)^2 \right\rangle = \sum_{t'=1}^{t} \langle s(t')^2 \rangle = t \sigma_0^2 \tag{1.2.51′}$$

since s(t′) and s(t″) are still independent for t′ ≠ t″. Eq. (1.2.53) is also essentially unchanged:

$$P(x;t) = \int dx'\, f(x - x')\, P(x';t-1) \tag{1.2.53′}$$

The transform and inverse transform must now be defined using

$$\tilde{f}(k) \equiv \int ds\, e^{-iks} f(s) \qquad \tilde{P}(k;t) \equiv \int dx\, e^{-ikx} P(x;t) \tag{1.2.54′}$$



$$P(x;t) = \frac{1}{2\pi} \int dk\, e^{ikx}\, \tilde{P}(k;t) \tag{1.2.55′}$$

The latter is proved using the properties of the Dirac (continuum) delta function:

$$\delta(x - x') = \frac{1}{2\pi} \int dk\, e^{ik(x-x')} \qquad \int dx'\, \delta(x - x')\, g(x') = g(x) \tag{1.2.56′}$$

where the latter equation holds for an arbitrary function g(x). The remainder of the derivation carries forward unchanged. ❚

1.2.3 Biased random walk

We now return to the simple random walk with binary steps of ±1. The model we consider is a random walk that is biased in one direction. Each time a step is taken there is a probability P+ for a step of +1 that is different from the probability P− for a step of −1, or:

$$P(s;t) = P_+ \delta_{s,1} + P_- \delta_{s,-1} \tag{1.2.68}$$

$$P(d' \,|\, d) = P_+ \delta_{d',d+1} + P_- \delta_{d',d-1} \tag{1.2.69}$$

where

$$P_+ + P_- = 1 \tag{1.2.70}$$

What is the average distance traveled in time t?

$$\langle d(t) \rangle = \sum_{t'=1}^{t} \langle s(t') \rangle = \sum_{t'=1}^{t} (P_+ - P_-) = t (P_+ - P_-) \tag{1.2.71}$$

This equation justifies defining the mean velocity as

$$v = P_+ - P_- \tag{1.2.72}$$

Since we already have an average displacement, it doesn't make sense to also ask for a typical displacement, as we did with the unbiased random walk—the typical displacement is the average one. However, we can ask about the spread of the displacements around the average displacement:

$$\sigma(t)^2 = \langle (d(t) - \langle d(t) \rangle)^2 \rangle = \langle d(t)^2 \rangle - 2\langle d(t) \rangle^2 + \langle d(t) \rangle^2 = \langle d(t)^2 \rangle - \langle d(t) \rangle^2 \tag{1.2.73}$$

This is called the standard deviation, and it reduces to the RMS distance in the unbiased case. For many purposes σ(t) plays the same role in the biased random walk as in the unbiased random walk. From Eq. (1.2.71) and Eq. (1.2.72) the second term is (vt)². The first term is:



$$\langle d(t)^2 \rangle = \left\langle \left( \sum_{t'=1}^{t} s(t') \right)^2 \right\rangle = \sum_{t',t''=1}^{t} \langle s(t')\, s(t'') \rangle = \sum_{t',t''=1}^{t} \left[ \delta_{t',t''} + (1 - \delta_{t',t''})(P_+^2 + P_-^2 - 2 P_+ P_-) \right] = t + t(t-1) v^2 = t^2 v^2 + t(1 - v^2) \tag{1.2.74}$$

Substituting in Eq. (1.2.73):

$$\sigma(t)^2 = t(1 - v^2) \tag{1.2.75}$$

It is interesting to consider this expression in the two limits v = 1 and v = 0. For v = 1 the walk is deterministic, P+ = 1 and P− = 0, and there is no element of chance; the walker always walks to the right. This is equivalent to the iterative map Eq. (1.1.4). Our result Eq. (1.2.75) is that σ = 0, as it must be for a deterministic system. However, for smaller velocities, the spreading of the systems increases until at v = 0 we recover the case of the unbiased random walk.
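A quick numerical check of Eqs. (1.2.71) and (1.2.75) (a sketch; the value of v and the ensemble size are arbitrary):

```python
# Sketch: biased random walk; compare the ensemble mean and standard
# deviation with <d(t)> = vt and sigma(t)^2 = t(1 - v^2).
import random, math

v, t, walkers = 0.2, 400, 10_000
p_plus = (1 + v) / 2                      # P+ - P- = v with P+ + P- = 1
d = [sum(1 if random.random() < p_plus else -1 for _ in range(t))
     for _ in range(walkers)]

mean = sum(d) / walkers
var = sum((x - mean) ** 2 for x in d) / walkers
print(mean, v * t)                                   # both near 80
print(math.sqrt(var), math.sqrt(t * (1 - v**2)))     # both near 19.6
```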

The complete probability distribution is given by:

$$P(d;t) = P_+^{(d+t)/2}\, P_-^{(t-d)/2} \binom{t}{(d+t)/2}\, \delta^{\rm odd/even}_{t,d} \tag{1.2.76}$$

For large t the distribution can be found as we did for the unbiased random walk. The work is left to Question 1.2.7.

Question 1.2.7 Find the long time (continuum) distribution for the biased random walk.

Solution 1.2.7 We use the Stirling approximation as before and take the logarithm of the probability. In addition to the expression from the first line of Eq. (1.2.38) we have an additional factor due to the coefficient of Eq. (1.2.76) which appears in place of the factor of 1/2^t. We again define x = d/t, and divide by 2 to allow both odd and even integers. We obtain the expression:

$$\ln(P(d,t)) = (t/2)\left[ (1+x)\ln 2P_+ + (1-x)\ln 2P_- \right] - (t/2)\left[ (1+x)\ln(1+x) + (1-x)\ln(1-x) \right] - (1/2)\ln\!\left( 2\pi t(1-x^2) \right) \tag{1.2.77}$$

It makes the most sense to expand this around the mean of x, ⟨x⟩ = v. To simplify the notation we can use Eq. (1.2.70) and Eq. (1.2.72) to write:

$$P_+ = (1+v)/2 \qquad P_- = (1-v)/2 \tag{1.2.78}$$

With these substitutions we have:

$$\ln(P(d,t)) = (t/2)\left[ (1+x)\ln(1+v) + (1-x)\ln(1-v) \right] - (t/2)\left[ (1+x)\ln(1+x) + (1-x)\ln(1-x) \right] - (1/2)\ln\!\left( 2\pi t(1-x^2) \right) \tag{1.2.79}$$



We expand the first two terms in a Taylor expansion around the mean of x and expand the third term inside the logarithm. The first term of Eq. (1.2.79) has only a constant and linear term in a Taylor expansion. These cancel the constant and the first derivative of the Taylor expansion of the second term of Eq. (1.2.79) at x = v. Higher derivatives arise only from the second term:

$$\begin{aligned} \ln(P(d,t)) &= -(t/2)\left[ \frac{(x-v)^2}{1-v^2} + \frac{2v}{3(1-v^2)^2}(x-v)^3 + \cdots \right] - (1/2)\ln\!\left( 2\pi t\left[ (1-v^2) - 2v(x-v) + \cdots \right] \right) \\ &= -\left[ \frac{(d-vt)^2}{2\sigma(t)^2} + \frac{v\,(d-vt)^3}{3\sigma(t)^4} + \cdots \right] - (1/2)\ln\!\left( 2\pi\left( \sigma(t)^2 - 2v(d-vt) + \cdots \right) \right) \end{aligned} \tag{1.2.80}$$

In the last line we have restored d and used Eq. (1.2.75). Keeping only the first terms in both expansions gives us:

$$P(d;t) = \frac{1}{\sqrt{2\pi \sigma(t)^2}}\, e^{-(d-vt)^2 / 2\sigma(t)^2} \tag{1.2.81}$$

which is a Gaussian distribution around the mean we obtained before. This implies that aside from the constant velocity, and a slightly modified standard deviation, the distribution remains unchanged.

The second terms in both expansions in Eq. (1.2.80) become small in the limit of large t, as long as we are not interested in the tail of the distribution. Values of (d − vt) relevant to the main part of the distribution are given by the standard deviation, σ(t). The second terms in Eq. (1.2.80) are thus reduced by a factor of σ(t) compared to the first terms in the series. Since σ(t) grows as the square root of the time, they become insignificant for long times. The convergence is slower, however, than in the unbiased random walk (Questions 1.2.2–1.2.5). ❚

Question 1.2.8 You are a manager of a casino and are told by the owner that you have a cash flow problem. In order to survive, you have to make sure that nine out of ten working days you have a profit. Assume that the only game in your casino is a roulette wheel. Bets are limited to only red or black with a 2:1 payoff. The roulette wheel has an equal number of red numbers and black numbers and one green number (the house always wins on green). Assume that people make a fixed number of 10⁶ total $1 bets on the roulette wheel in each day.

a. What is the maximum number of red numbers on the roulette wheel that will still allow you to achieve your objective?

b. With this number of red numbers, how much money do you make on average in each day?



Solution 1.2.8 The casino wins $1 for every wrong bet and loses $1 for every right bet. The results of bets at the casino are equivalent to a random walk with a bias given by:

$$P_+ = \frac{N_{\rm red} + 1}{N_{\rm red} + N_{\rm black} + 1} \tag{1.2.82}$$

$$P_- = \frac{N_{\rm black}}{N_{\rm red} + N_{\rm black} + 1} \tag{1.2.83}$$

where, as the manager, we consider the wins of the casino to be positive. The color subscripts can be used interchangeably, since the number of red and black is equal. The velocity of the random walk is given by:

$$v = \frac{1}{2 N_{\rm red} + 1} \tag{1.2.84}$$

To calculate the probability that the casino will lose on a particular day we must sum the probability that the random walk after 10⁶ steps will result in a negative number. We approximate the sum by an integral over the distribution of Eq. (1.2.81). To avoid problems of notation we replace d with y:

$$P_{\rm loss} = \int_{-\infty}^{0} dy\, P(y;t = 10^6) = \frac{1}{\sqrt{2\pi\sigma(t)^2}} \int_{-\infty}^{0} dy\, e^{-(y - vt)^2 / 2\sigma(t)^2} = \frac{1}{\sqrt{2\pi\sigma(t)^2}} \int_{-\infty}^{-vt} dy'\, e^{-y'^2 / 2\sigma(t)^2} = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{-z_0} dz\, e^{-z^2} = \frac{1}{2}\left( 1 - \mathrm{erf}(z_0) \right) \tag{1.2.85}$$

$$z = \frac{y'}{\sqrt{2\sigma(t)^2}} \qquad z_0 = \frac{vt}{\sqrt{2\sigma(t)^2}} = \frac{vt}{\sqrt{2t(1-v^2)}} \tag{1.2.86}$$

We have written the probability of loss in a day in terms of the error function erf(x)—the integral of a Gaussian defined by

$$\mathrm{erf}(z_0) \equiv \frac{2}{\sqrt{\pi}} \int_0^{z_0} dz\, e^{-z^2} \tag{1.2.87}$$

Since

$$\mathrm{erf}(\infty) = 1 \tag{1.2.88}$$

we have the expression

$$\left( 1 - \mathrm{erf}(z_0) \right) \equiv \frac{2}{\sqrt{\pi}} \int_{z_0}^{\infty} dz\, e^{-z^2} \tag{1.2.89}$$

which is also known as the complementary error function erfc(z0).



To obtain the desired constraint on the number of red numbers, or equivalently on the velocity, we invert Eq. (1.2.85) to find a value of v that gives the desired Ploss = 0.1, or erf(z0) = 0.8. Looking up the error function or using iterative guessing on an appropriate computer gives z0 = 0.9062. Inverting Eq. (1.2.86) gives:

$$v = \left( \frac{t}{2 z_0^2} + 1 \right)^{-1/2} \approx \sqrt{2/t}\; z_0 \tag{1.2.90}$$

The approximation holds because t is large. The numerical result is v = 0.0013. This gives us the desired number of each color (inverting Eq. (1.2.84)) of Nred = 371. Of course the result is a very large number, and the problem of winning nine out of ten days is a very conservative problem for a casino. Even if we insist on winning ninety-nine out of one hundred days we would have erf(z0) = 0.98, z0 = 1.645, v = 0.0018 and Nred = 275. The profits per day in each case are given by vt, which is approximately $1,300 and $1,800 respectively. Of course this is much less than for bets on a more realistic roulette wheel. Eventually, as we reduce the chance of the casino losing and z0 becomes larger, we might become concerned that we are describing the properties of the tail of the distribution when we calculate the fraction of days the casino might lose, and Eq. (1.2.85) will not be very accurate. However, it is not difficult to see that casinos do not have cash flow problems. ❚
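The inversion in this solution is easy to reproduce numerically (a sketch; math.erf is assumed, and instead of inverting analytically we simply scan Nred; because of the approximations in Eq. (1.2.85), the threshold found this way is only as accurate as the Gaussian treatment itself):

```python
# Sketch: daily loss probability of the casino, Eq. (1.2.85), as a
# function of the number of red numbers on the wheel.
from math import sqrt, erf

def p_loss(n_red, t=10**6):
    v = 1 / (2 * n_red + 1)                 # Eq. (1.2.84)
    z0 = v * t / sqrt(2 * t * (1 - v**2))   # Eq. (1.2.86)
    return (1 - erf(z0)) / 2                # Eq. (1.2.85)

# scan for the largest number of red numbers with P_loss <= 0.1
n = 1
while p_loss(n + 1) <= 0.1:
    n += 1
print(n, p_loss(n))
```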

In order to generalize the proof of the central limit theorem to the case of a biased random walk, we can treat the continuum case most simply by considering the shifted system variable ∆x (using d → x for the continuum case):

$$\Delta x = x - \langle x \rangle_t = x - t\langle s \rangle = x - vt \tag{1.2.91}$$

Only x is a stochastic variable on the right side; v and t are numbers. Since iterations of this variable would satisfy the conditions for the generalized random walk, the generalization of the Gaussian distribution to Eq. (1.2.81) is proved. The discrete case is more difficult to prove because we cannot shift the variable d by arbitrary amounts and continue to consider it as discrete. We can argue the discrete case to be valid on the basis of the result for the continuum case, but a separate proof can be constructed as well.

1.2.4 Master equation approach

The Master equation is an alternative approach to stochastic systems, an alternative to Eq. (1.2.5), that is usually applied when time is continuous. We develop it starting from the discrete time case. We can rewrite Eq. (1.2.5) in the form of a difference equation for a particular probability P(s). Beginning from:

$$P(s;t) = P(s;t-1) + \sum_{s'} P(s \,|\, s')\, P(s';t-1) - P(s;t-1) \tag{1.2.92}$$



we extract the term where the system remains in the same state:

$$P(s;t) = P(s;t-1) + \sum_{s' \neq s} P(s \,|\, s')\, P(s';t-1) + P(s \,|\, s)\, P(s;t-1) - P(s;t-1) \tag{1.2.93}$$

We use the normalization of probability to write it in terms of the transitions away from this site:

$$P(s;t) = P(s;t-1) + \sum_{s' \neq s} P(s \,|\, s')\, P(s';t-1) + \left( 1 - \sum_{s' \neq s} P(s' \,|\, s) \right) P(s;t-1) - P(s;t-1) \tag{1.2.94}$$

Canceling the terms in the bracket that refer only to the probability P(s;t − 1), we write this as a difference equation. On the right appear only the probabilities at different values of the state variable (s′ ≠ s):

$$P(s;t) - P(s;t-1) = \sum_{s' \neq s} \left( P(s \,|\, s')\, P(s';t-1) - P(s' \,|\, s)\, P(s;t-1) \right) \tag{1.2.95}$$

To write the continuum form we reintroduce the time difference between steps ∆t:

$$\frac{P(s;t) - P(s;t - \Delta t)}{\Delta t} = \sum_{s' \neq s} \left( \frac{P(s \,|\, s')}{\Delta t}\, P(s';t - \Delta t) - \frac{P(s' \,|\, s)}{\Delta t}\, P(s;t - \Delta t) \right) \tag{1.2.96}$$

When the limit of ∆t → 0 is meaningful, it is possible to make the change to the equation

$$\dot{P}(s;t) = \sum_{s' \neq s} \left( R(s \,|\, s')\, P(s';t) - R(s' \,|\, s)\, P(s;t) \right) \tag{1.2.97}$$

where the ratio P(s|s′)/∆t has been replaced by the rate of transition R(s|s′). Eq. (1.2.97) is called the Master equation, and we can consider Eq. (1.2.95) as the discrete time analog.

The Master equation has a simple interpretation: the rate of change of the probability of a particular state is the total rate at which probability is being added into that state from all other states, minus the total rate at which probability is leaving the state. Probability is acting like a fluid that is flowing to or from a particular state and is being conserved, as it must be. Eq. (1.2.97) is very much like the continuity equation of fluid flow, where the density of the fluid at a particular place changes according to how much is flowing to that location or from it. We will construct and use the Master equation approach to discuss the problem of relaxation in activated processes in Section 1.4.
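As a sketch of how Eq. (1.2.97) is used in practice (the two-state system and its rates are invented for illustration), the Master equation can be integrated with small time steps:

```python
# Sketch: Euler integration of the Master equation, Eq. (1.2.97), for a
# two-state system with rates R(2|1) = 0.3 and R(1|2) = 0.1.
r21, r12 = 0.3, 0.1        # rate out of state 1, rate out of state 2
p1, p2 = 1.0, 0.0          # initial probabilities
dt = 0.01
for _ in range(5000):
    flow = r12 * p2 - r21 * p1   # net probability flowing into state 1
    p1 += dt * flow
    p2 -= dt * flow              # probability is conserved
print(p1, p2)   # steady state p1/p2 = r12/r21, i.e. about (0.25, 0.75)
```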



1.3 Thermodynamics and Statistical Mechanics

The field of thermodynamics is easiest to understand in the context of Newtonian mechanics. Newtonian mechanics describes the effect of forces on objects. Thermodynamics describes the effect of heat transfer on objects. When heat is transferred, the temperature of an object changes. Temperature and heat are also intimately related to energy. A hot gas in a piston has a high pressure, and it can do mechanical work by applying a force to a piston. By Newtonian mechanics the work is directly related to a transfer of energy. The laws of Newtonian mechanics are simplest to describe using the abstract concept of a point object with mass but no internal structure. The analogous abstraction for thermodynamic laws are materials that are in equilibrium and (even better) are homogeneous. It turns out that even the description of the equilibrium properties of materials is so rich and varied that this is still a primary focus of active research today.

Statistical mechanics begins as an effort to explain the laws of thermodynamics by considering the microscopic application of Newton's laws. Microscopically, the temperature of a gas is found to be related to the kinetic motion of the gas molecules. Heat transfer is the transfer of Newtonian energy from one object to another. The statistical treatment of the many particles of a material, with a key set of assumptions, reveals that thermodynamic laws are a natural consequence of many microscopic particles interacting with each other. Our studies of complex systems will lead us to discuss the properties of systems composed of many interacting parts. The concepts and tools of statistical mechanics will play an important role in these studies, as will the laws of thermodynamics that emerge from them. Thermodynamics also begins to teach us how to think about systems interacting with each other.

1.3.1 Thermodynamics

Thermodynamics describes macroscopic pieces of material in equilibrium in terms of macroscopic parameters. Thermodynamics was developed as a result of experience/experiment and, like Newton's laws, is to be understood as a set of self-consistent definitions and equations. As with Newtonian mechanics, where in its simplest form objects are point particles and friction is ignored, the discussion assumes an idealization that is directly experienced only in special circumstances. However, the fundamental laws, once understood, can be widely applied. The central quantities that are to be defined and related are the energy U, temperature T, entropy S, pressure P, the mass (which we write as the number of particles) N, and volume V. For magnets, the quantities should include the magnetization M, and the magnetic field H. Other macroscopic quantities that are relevant may be added as necessary within the framework developed by thermodynamics. Like Newtonian mechanics, a key aspect of thermodynamics is to understand how systems can be acted upon or can act upon each other. In addition to the quantities that describe the state of a system, there are two quantities that describe actions that may be made on a system to change its state: work and heat transfer.


The equations that relate the macroscopic quantities are known as the zeroth, first and second laws of thermodynamics. Much of the difficulty in understanding thermodynamics arises from the way the entropy appears as an essential but counterintuitive quantity. It is more easily understood in the context of a statistical treatment included below. A second source of difficulty is that even a seemingly simple material system, such as a piece of metal in a room, is actually quite complicated thermodynamically. Under usual circumstances the metal is not in equilibrium but is emitting a vapor of its own atoms. A thermodynamic treatment of the metal requires consideration not only of the metal but also the vapor and even the air that applies a pressure upon the metal. It is therefore generally simplest to consider the thermodynamics of a gas confined in a closed (and inert) chamber as a model thermodynamic system. We will discuss this example in detail in Question 1.3.1. The translational motion of the whole system, treated by Newtonian mechanics, is ignored.

We begin by defining the concept of equilibrium. A system left in isolation for a long enough time achieves a macroscopic state that does not vary in time. The system in an unchanging state is said to be in equilibrium. Thermodynamics also relies upon a particular type of equilibrium known as thermal equilibrium. Two systems can be brought together in such a way that they interact only by transferring heat from one to the other. The systems are said to be in thermal contact. An example would be two gases separated by a fixed but thermally conducting wall. After a long enough time the system composed of the combination of the two original systems will be in equilibrium. We say that the two systems are in thermal equilibrium with each other. We can generalize the definition of thermal equilibrium to include systems that are not in contact. We say that any two systems are in thermal equilibrium with each other if they do not change their (macroscopic) state when they are brought into thermal contact. Thermal equilibrium does not imply that the system is homogeneous; for example, the two gases may be at different pressures.

The zeroth law of thermodynamics states that if two systems are in thermal equilibrium with a third they are in thermal equilibrium with each other. This is not obvious without experience with macroscopic objects. The zeroth law implies that the interaction that occurs during thermal contact is not specific to the materials, it is in some sense weak, and it matters not how many or how big are the systems that are in contact. It enables us to define the temperature T as a quantity which is the same for all systems in thermal equilibrium. A more specific definition of the temperature must wait till the second law of thermodynamics. We also define the concept of a thermal reservoir as a very large system such that any system that we are interested in, when brought into contact with the thermal reservoir, will change its state by transferring heat to or from the reservoir until it is in equilibrium with the reservoir, but the transfer of heat will not affect the temperature of the reservoir.

Quite basic to the formulation and assumptions of thermodynamics is that the macroscopic state of an isolated system in equilibrium is completely defined by a specification of three parameters: energy, mass and volume (U,N,V). For magnets we must add the magnetization M; we will leave this case for later. The confinement of


the system to a volume V is understood to result from some form of containment. The state of a system can be characterized by the force per unit area—the pressure P—exerted by the system on the container or by the container on the system, which are the same. Since in equilibrium a system is uniquely described by the three quantities (U,N,V), these determine all the other quantities, such as the pressure P and temperature T. Strictly speaking, temperature and pressure are only defined for a system in equilibrium, while the quantities (U,N,V) have meaning both in and out of equilibrium.

It is assumed that for a homogeneous material, changing the size of the system by adding more material in equilibrium at the same pressure and temperature changes the mass, number of particles N, volume V and energy U, in direct proportion to each other. Equivalently, it is assumed that cutting the system into smaller parts results in each subpart retaining the same properties in proportion to each other (see Figs. 1.3.1 and 1.3.2). This means that these quantities are additive for different parts of a system whether isolated or in thermal contact or full equilibrium:

$$\begin{aligned} N &= \sum_\alpha N_\alpha \\ V &= \sum_\alpha V_\alpha \\ U &= \sum_\alpha U_\alpha \end{aligned} \tag{1.3.1}$$

where α indexes the parts of the system. This would not be true if the parts of the system were strongly interacting in such a way that the energy depended on the relative location of the parts. Properties such as (U,N,V) that are proportional to the size of the system are called extensive quantities. Intensive quantities are properties that do not change with the size of the system at a given pressure and temperature. The ratio of two extensive quantities is an intensive quantity. Examples are the particle density N/V and the energy density U/V. The assumption of the existence of extensive and intensive quantities is also far from trivial, and corresponds to the intuition that for a macroscopic object, the local properties of the system do not depend on the size of the system. Thus a material may be cut into two parts, or a small part may be separated from a large part, without affecting its local properties.

The simplest thermodynamic systems are homogeneous ones, like a gas in an inert container. However, we can also use Eq. (1.3.1) for an inhomogeneous system. For example, a sealed container with water inside will reach a state where both water and vapor are in equilibrium with each other. The use of intensive quantities and the proportionality of extensive quantities to each other applies only within a single phase—a single homogeneous part of the system, either water or vapor. However, the additivity of extensive quantities in Eq. (1.3.1) still applies to the whole system. A homogeneous as well as a heterogeneous system may contain different chemical species. In this case the quantity N is replaced by the number of each chemical species Ni, and the first line of Eq. (1.3.1) may be replaced by a similar equation for each species.



T he rmod yn a mi c s a nd s t a t is t i ca l m echa n i c s 61

# 29412 Cust: AddisonWesley Au: Bar-Yam Pg. No. 61Title: Dynamics Complex Systems Short / Normal / Long

[Figure 1.3.1 shows a system with mass N, volume V and energy U cut into two parts, one with mass αN, volume αV, energy αU and the other with mass (1−α)N, volume (1−α)V, energy (1−α)U.]

Figure 1.3.1 Thermodynamics considers macroscopic materials. A basic assumption is that cutting a system into two parts will not affect the local properties of the material and that the energy U, mass (or number of particles) N and the volume V will be divided in the same proportion. The process of separation is assumed to leave the materials under the same conditions of pressure and temperature. ❚

Figure 1.3.2 The assumption that the local properties of a system are unaffected by subdivision applies also to the case where a small part of a much larger system is removed. The local properties, both of the small system and of the large system, are assumed to remain unchanged. Even though the small system is much smaller than the original system, the small system is understood to be a macroscopic piece of material. Thus it retains the same local properties it had as part of the larger system. ❚

The first law of thermodynamics describes how the energy of a system may change. The energy of an isolated system is conserved. There are two macroscopic processes that can change the energy of a system when the number of particles is fixed.


The first is work, in the sense of applying a force over a distance, such as driving a piston that compresses a gas. The second is heat transfer. This may be written as:

dU = q + w (1.3.2)

where q is the heat transfer into the system, w is the work done on the system and U is the internal energy of the system. The differential d signifies the incremental change in the quantity U as a result of the incremental process of heat transfer and work. The work performed on a gas (or other system) is the force times the distance applied Fdx, where we write F as the magnitude of the force and dx as an incremental distance. Since the force is the pressure times the area F = PA, the work is equal to the pressure times the volume change or:

w = −PAdx = −PdV (1.3.3)

The negative sign arises because positive work on the system, increasing the system's energy, occurs when the volume change is negative. Pressure is defined to be positive.
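For a reversible path, the total work is the integral of Eq. (1.3.3) along the path. The following is a minimal numerical sketch (in Python, not from the original text) for a slow isothermal compression; it assumes the ideal gas equation of state P = NkT/V, derived below as Eq. (1.3.64), and the particle number and volumes are illustrative values:

```python
# Minimal sketch: reversible isothermal compression work, w = -integral of P dV.
# Assumes the ideal gas law P = N k T / V (derived later as Eq. (1.3.64)).
import math

k = 1.380649e-23          # Boltzmann constant (J/K)
N = 1e22                  # number of particles (illustrative)
T = 300.0                 # temperature (K)
V1, V2 = 2.0e-3, 1.0e-3   # initial and final volumes (m^3): a compression

steps = 100000
w = 0.0
for i in range(steps):
    V = V1 + (V2 - V1) * (i + 0.5) / steps   # midpoint of each volume step
    P = N * k * T / V
    w += -P * (V2 - V1) / steps              # -P dV, dV < 0 so w > 0

w_exact = N * k * T * math.log(V1 / V2)      # closed form for the isothermal path
print(f"numerical w = {w:.6e} J, exact w = {w_exact:.6e} J")
```

The numerical sum approaches the closed form NkT ln(V1/V2) as the step size shrinks; both are positive, since compressing the gas does positive work on it.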

If two systems act upon each other, then the energy transferred consists of both the work and heat transfer. Each of these are separately equal in magnitude and opposite in sign:

dU1 = q21 + w21
dU2 = q12 + w12
q12 = −q21
w12 = −w21    (1.3.4)

where q21 is the heat transfer from system 2 to system 1, and w21 is the work performed by system 2 on system 1. q12 and w12 are similarly defined. The last line of Eq. (1.3.4) follows from Newton's third law. The other equations follow from setting dU = 0 (Eq. (1.3.2)) for the total system, composed of both of the systems acting upon each other.

The second law of thermodynamics given in the following few paragraphs describes a few key aspects of the relationship of the equilibrium state with nonequilibrium states. The statement of the second law is essentially a definition and description of properties of the entropy. Entropy enables us to describe the process of approach to equilibrium. In the natural course of events, any system in isolation will change its state toward equilibrium. A system which is not in equilibrium must therefore undergo an irreversible process leading to equilibrium. The process is irreversible because the reverse process would take us away from equilibrium, which is impossible for a macroscopic system. Reversible change can occur if the state of a system in equilibrium is changed by transfer of heat or by work in such a way (slowly) that it always remains in equilibrium.

For every macroscopic state of a system (not necessarily in equilibrium) there exists a quantity S called the entropy of the system. The change in S is positive for any natural process (change toward equilibrium) of an isolated system

dS ≥ 0    (1.3.5)


For an isolated system, equality holds only in equilibrium when no change occurs. The converse is also true—any possible change that increases S is a natural process. Therefore, for an isolated system S achieves its maximum value for the equilibrium state.

The second property of the entropy describes how it is affected by the processes of work and heat transfer during reversible processes. The entropy is affected only by heat transfer and not by work. If we only perform work and do not transfer heat the entropy is constant. Such processes where q = 0 are called adiabatic processes. For adiabatic processes dS = 0.

The third property of the entropy is that it is extensive:

S = Σ_α S_α    (1.3.6)

Since in equilibrium the state of the system is defined by the macroscopic quantities (U,N,V), S is a function of them—S = S(U,N,V)—in equilibrium. The fourth property of the entropy is that if we keep the size of the system constant by fixing both the number of particles N and the volume V, then the change in entropy S with increasing energy U is always positive:

(∂S/∂U)_{N,V} > 0    (1.3.7)

where the subscripts denote the (values of the) constant quantities. Because of this we can also invert the function S = S(U,N,V) to obtain the energy U in terms of S, N and V: U = U(S,N,V).

Finally, we mention that the zero of the entropy is arbitrary in classical treatments. The zero of entropy does attain significance in statistical treatments that include quantum effects.

Having described the properties of the entropy for a single system, we can now reconsider the problem of two interacting systems. Since the entropy describes the process of equilibration, we consider the process by which two systems equilibrate thermally. According to the zeroth law, when the two systems are in equilibrium they are at the same temperature. The two systems are assumed to be isolated from any other influence, so that together they form an isolated system with energy Ut and entropy St. Each of the subsystems is itself in equilibrium, but they are at different temperatures initially, and therefore heat is transferred to achieve equilibrium. The heat transfer is assumed to be performed in a reversible fashion—slowly. The two subsystems are also assumed to have a fixed number of particles N1, N2 and volume V1, V2. No work is done, only heat is transferred. The energies of the two systems U1 and U2 and entropies S1 and S2 are not fixed.

The transfer of heat results in a transfer of energy between the two systems according to Eq. (1.3.4). Since the total energy

Ut = U1 + U2 (1.3.8)


is conserved, we have

dUt = dU1 + dU2 = 0 (1.3.9)

We will consider the processes of equilibration twice. The first time we will identify the equilibrium condition and the second time we will describe the equilibration. At equilibrium the entropy of the whole system is maximized. Variation of the entropy with respect to any internal parameter will give zero at equilibrium. We can consider the change in the entropy of the system as a function of how much of the energy is allocated to the first system:

dSt/dU1 = dS1/dU1 + dS2/dU1 = 0    (1.3.10)

in equilibrium. Since the total energy is fixed, using Eq. (1.3.9) we have:

dSt/dU1 = dS1/dU1 − dS2/dU2 = 0    (1.3.11)

or

dS1/dU1 = dS2/dU2    (1.3.12)

in equilibrium. By the definition of the temperature, any function of the derivative of the entropy with respect to energy could be used as the temperature. It is conventional to define the temperature T using:

1/T = (dS/dU)_{N,V}    (1.3.13)

This definition corresponds to the Kelvin temperature scale. The units of temperature also define the units of the entropy. This definition has the advantage that heat always flows from the system at higher temperature to the system at lower temperature.

To prove this last statement, consider a natural small transfer of heat from one system to the other. The transfer must result in the two systems raising their collective entropy:

dSt = dS1 + dS2 ≥ 0    (1.3.14)

We rewrite the change in entropy of each system in terms of the change in energy. We recall that N and V are fixed for each of the two systems and the entropy is a function only of the three macroscopic parameters (U,N,V). The change in S for each system may be written as:

dS1 = (∂S/∂U)_{N1,V1} dU1
dS2 = (∂S/∂U)_{N2,V2} dU2    (1.3.15)


to arrive at:

(∂S/∂U)_{N1,V1} dU1 + (∂S/∂U)_{N2,V2} dU2 ≥ 0    (1.3.16)

or using Eq. (1.3.9) and the definition of the temperature (Eq. (1.3.13)) we have:

(1/T1 − 1/T2) dU1 ≥ 0    (1.3.17)

or:

(T2 − T1) dU1 ≥ 0    (1.3.18)

This implies that a natural process of heat transfer results in the energy of the first system increasing (dU1 > 0) if the temperature of the second system is greater than the first ((T2 − T1) > 0), or conversely, if the temperature of the second system is less than the temperature of the first.
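A minimal numerical check of this argument, in Python, assuming for each subsystem the ideal-gas-like form S(U) = (3/2)Nk ln U + const, so that 1/T = dS/dU = (3/2)Nk/U; all values are illustrative:

```python
# Minimal sketch: entropy increase fixes the direction of heat flow.
# Assumes S_i(U) = (3/2) N_i k ln U + const, so 1/T_i = (3/2) N_i k / U_i.
k = 1.0                       # Boltzmann constant in arbitrary units
N1, N2 = 100.0, 100.0
U1, U2 = 50.0, 150.0          # subsystem energies; system 2 is hotter here

def temperature(N, U):
    return U / (1.5 * N * k)  # inverting 1/T = (3/2) N k / U

dU1 = 1e-6                    # small energy transfer into system 1
dSt = (1.5 * N1 * k / U1) * dU1 + (1.5 * N2 * k / U2) * (-dU1)
T1, T2 = temperature(N1, U1), temperature(N2, U2)
print(f"T1 = {T1:.4f}, T2 = {T2:.4f}")
print(f"dSt for heat flowing 2 -> 1: {dSt:.3e}")  # positive exactly when T2 > T1
assert (dSt > 0) == (T2 > T1)                     # Eq. (1.3.18): (T2 - T1) dU1 >= 0
```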

Using the definition of temperature, we can also rewrite the expression for the change in the energy of a system due to heat transfer or work, Eq. (1.3.2). The new expression is restricted to reversible processes. As in Eq. (1.3.2), N is still fixed. Considering only reversible processes means we consider only equilibrium states of the system, so we can write the energy as a function of the entropy U = U(S,N,V). Since a reversible process changes the entropy and volume while keeping this function valid, we can write the change in energy for a reversible process as

dU = (∂U/∂S)_{N,V} dS + (∂U/∂V)_{N,S} dV = TdS + (∂U/∂V)_{N,S} dV    (1.3.19)

The first term reflects the effect of a change in entropy and the second reflects the change in volume. The change in entropy is related to heat transfer but not to work. If work is done and no heat is transferred, then the first term is zero. Comparing the second term to Eq. (1.3.2) we find

P = −(∂U/∂V)_{N,S}    (1.3.20)

and the incremental change in energy for a reversible process can be written:

dU = TdS − PdV    (1.3.21)

This relationship enables us to make direct experimental measurements of entropy changes. The work done on a system, in a reversible or irreversible process, changes the energy of the system by a known amount. This energy can then be extracted in a reversible process in the form of heat. When the system returns to its original state, we


can quantify the amount of heat transferred as a form of energy. Measured heat transfer can then be related to entropy changes using q = TdS.

Our treatment of the fundamentals of thermodynamics was brief and does not contain the many applications necessary for a detailed understanding. The properties of S that we have described are sufficient to provide a systematic treatment of the thermodynamics of macroscopic bodies. However, the entropy is more understandable from a microscopic (statistical) description of matter. In the next section we introduce the statistical treatment that enables contact between a microscopic picture and the macroscopic thermodynamic treatment of matter. We will use it to give microscopic meaning to the entropy and temperature. Once we have developed the microscopic picture we will discuss two applications. The first application, the ideal gas, is discussed in Section 1.3.3. The discussion of the second application, the Ising model of magnetic systems, is postponed to Section 1.6.

1.3.2 The macroscopic state from microscopic statistics

In order to develop a microscopic understanding of the macroscopic properties of matter we must begin by restating the nature of the systems that thermodynamics describes. Even when developing a microscopic picture, the thermodynamic assumptions are relied upon as guides. Macroscopic systems are assumed to have an extremely large number N of individual particles (e.g., at a scale of 10^23) in a volume V. Because the size of these systems is so large, they are typically investigated by considering the limit of N → ∞ and V → ∞, while the density n = N/V remains constant. This is called the thermodynamic limit. Various properties of the system are separated into extensive and intensive quantities. Extensive quantities are proportional to the size of the system. Intensive quantities are independent of the size of the system. This reflects the intuition that local properties of a macroscopic object do not depend on the size of the system. As in Figs. 1.3.1 and 1.3.2, the system may be cut into two parts, or a small part may be separated from a large part without affecting its local properties.

The total energy U of an isolated system in equilibrium, along with the number of particles N and volume V, defines the macroscopic state (macrostate) of an isolated system in equilibrium. Microscopically, the energy of the system E is given in classical mechanics in terms of the complete specification of the individual particle positions, momenta and interaction potentials. Together these define the microscopic state (microstate) of the system. The microstate is defined differently in quantum mechanics but similar considerations apply. When we describe the system microscopically we use the notation E rather than U to describe the energy. The reason for this difference is that macroscopically the energy U has some degree of fuzziness in its definition, though the degree of fuzziness will not enter into our considerations. Moreover, U may also be used to describe the energy of a system that is in thermal equilibrium with another system. However, thinking microscopically, the energy of such a system is not well defined, since thermal contact allows the exchange of energy between the two systems. We should also distinguish between the microscopic and macroscopic concepts of the number of particles and the volume, but since we will not make use of this distinction, we will not do so.


There are many possible microstates that correspond to a particular macrostate of the system specified only by U,N,V. We now make a key assumption of statistical mechanics—that all of the possible microstates of the system occur with equal probability. The number of these microstates Ω(U,N,V), which by definition depends on the macroscopic parameters, turns out to be central to statistical mechanics and is directly related to the entropy. Thus it determines many of the thermodynamic properties of the system, and can be discussed even though we are not always able to obtain it explicitly.

We consider again the problem of interacting systems. As before, we consider two systems (Fig. 1.3.3) that are in equilibrium separately, with state variables (U1,N1,V1) and (U2,N2,V2). The systems have a number of microstates Ω1(U1,N1,V1) and Ω2(U2,N2,V2) respectively. It is not necessary that the two systems be formed of the same material or have the same functional form of Ω(U,N,V), so the function Ω is also labeled by the system index. The two systems interact in a limited way, so that they can exchange only energy. The number of particles and volume of each system remains fixed. Conservation of energy requires that the total energy Ut = U1 + U2 remains fixed, but energy may be transferred from one system to the other. As before, our objective is to identify when energy transfer stops and equilibrium is reached.

Consider the number of microstates of the whole system Ωt. This number is a function not only of the total energy of the system but also of how the energy is allocated between the systems. So, we write Ωt(U1,U2), and we assume that at any time the energy of each of the two systems is well defined. Moreover, the interaction between the two systems is sufficiently weak so that the number of states of each system


[Figure 1.3.3 shows two systems, one with U1,N1,V1, Ω1(U1,N1,V1), S1(U1,N1,V1) and one with U2,N2,V2, Ω2(U2,N2,V2), S2(U2,N2,V2), which together form a whole system with Ut,Nt,Vt, Ωt(Ut,Nt,Vt), St(Ut,Nt,Vt).]

Figure 1.3.3 Illustration of a system formed out of two parts. The text discusses this system when energy is transferred from one part to the other. The transfer of energy on a microscopic scale is equivalent to the transfer of heat on a macroscopic scale, since the two systems are not allowed to change their number of particles or their volume. ❚


may be counted independently. Then the total number of microstates is the product of the number of microstates of each of the two systems separately.

Ωt(U1,U2) = Ω1(U1) Ω2(U2)    (1.3.22)

where we have dropped the arguments N and V, since they are fixed throughout this discussion. When energy is transferred, the number of microstates of each of the two systems is changed. When will the transfer of energy stop? Left on its own, the system will evolve until it reaches the most probable separation of energy. Since any particular state is equally likely, the most probable separation of energy is the separation that gives rise to the greatest possible number of states. When the number of particles is large, the greatest number of states corresponding to a particular energy separation is much larger than the number of states corresponding to any other possible separation. Thus any other possibility is completely negligible. No matter when we look at the system, it will be in a state with the most likely separation of the energy. For a macroscopic system, it is impossible for a spontaneous transfer of energy to occur that moves the system away from equilibrium.

The last paragraph implies that the transfer of energy from one system to the other stops when Ωt reaches its maximum value. Since Ut = U1 + U2 we can find the maximum value of the number of microstates using:

∂Ωt(U1, Ut − U1)/∂U1 = 0 = [∂Ω1(U1)/∂U1] Ω2(Ut − U1) + Ω1(U1) [∂Ω2(Ut − U1)/∂U1]
0 = [∂Ω1(U1)/∂U1] Ω2(U2) − Ω1(U1) [∂Ω2(U2)/∂U2]    (1.3.23)

or

[1/Ω1(U1)] ∂Ω1(U1)/∂U1 = [1/Ω2(U2)] ∂Ω2(U2)/∂U2
∂ ln Ω1(U1)/∂U1 = ∂ ln Ω2(U2)/∂U2    (1.3.24)

The equivalence of these quantities is analogous to the equivalence of the temperature of the two systems in equilibrium. Since the derivatives in the last equation are performed at constant N and V, it appears, by analogy to Eq. (1.3.12), that we can identify the entropy as:

S = k ln(Ω(E,N,V))    (1.3.25)

The constant k, known as the Boltzmann constant, is needed to ensure correspondence of the microscopic counting of states with the macroscopic units of the entropy, as defined by the relationship of Eq. (1.3.13), once the units of temperature and energy are defined.

The entropy as defined by Eq. (1.3.25) can be shown to satisfy all of the properties of the thermodynamic entropy in the last section. We have argued that an isolated


system evolves its macrostate in such a way that it maximizes the number of microstates that correspond to the macrostate. By Eq. (1.3.25), this is the same as the first property of the entropy in Eq. (1.3.5), the maximization of the entropy in equilibrium.

Interestingly, demonstrating the second property of the entropy, that it does not change during an adiabatic process, requires further formal developments relating entropy to information that will be discussed in Sections 1.7 and 1.8. We will connect the two discussions and thus be able to demonstrate the second property of the entropy in Chapter 8 (Section 8.3.2).

The extensive property of the entropy follows from Eq. (1.3.22). This also means that the number of states at a particular energy grows exponentially with the size of the system. More properly, we can say that the experimental observation that the entropy is extensive suggests that the interaction between macroscopic materials, or parts of a single macroscopic material, is such that the microstates of each part of the system may be enumerated independently.

The number of microstates can be shown by simple examples to increase with the energy of the system. This corresponds to Eq. (1.3.7). There are also examples where this can be violated, though this will not enter into our discussions.
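A simple counting example of this kind can be run directly. The sketch below (a toy model, not from the text) takes each system to be N harmonic oscillators sharing an integer number U of energy quanta, for which a standard "stars and bars" count gives Ω(U,N) = C(U+N−1, N−1); it verifies that Ω grows with U and that the most probable division of energy between two such systems satisfies the equal-derivative condition of Eq. (1.3.24):

```python
# Minimal sketch: count microstates of N oscillators sharing U energy quanta.
# Omega(U, N) = C(U + N - 1, N - 1), a standard combinatorial ("stars and bars") count.
from math import comb, log

def ln_omega(U, N):
    return log(comb(U + N - 1, N - 1))

N1, N2, Ut = 40, 80, 300

# ln Omega grows with energy (compare Eq. (1.3.7)):
assert ln_omega(100, N1) < ln_omega(101, N1)

# The most probable division of Ut maximizes Omega_1 * Omega_2 (Eq. (1.3.22)):
best_U1 = max(range(1, Ut), key=lambda U1: ln_omega(U1, N1) + ln_omega(Ut - U1, N2))
print("most probable U1:", best_U1)   # close to Ut * N1/(N1 + N2) = 100

# At the maximum, d(ln Omega_1)/dU1 = d(ln Omega_2)/dU2 (Eq. (1.3.24)):
d1 = ln_omega(best_U1 + 1, N1) - ln_omega(best_U1, N1)
d2 = ln_omega(Ut - best_U1 + 1, N2) - ln_omega(Ut - best_U1, N2)
print(f"d ln Omega_1/dU1 = {d1:.4f}, d ln Omega_2/dU2 = {d2:.4f}")
```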

We consider next a second example of interacting systems that enables us to evaluate the meaning of a system in equilibrium with a reservoir at a temperature T. We consider a small part of a much larger system (Fig. 1.3.4). No assumption is necessary regarding the size of the small system; it may be either microscopic or macroscopic. Because of the contact of the small system with the large system, its energy is not


[Figure 1.3.4 shows a small system with E({x,p}),N,V (macroscopically U,N,V) inside a much larger system with Ut,Nt,Vt at temperature T.]

Figure 1.3.4 In order to understand temperature we consider a closed system composed of a large and small system, or equivalently a small system which is part of a much larger system. The larger system serves as a thermal reservoir transferring energy to and from the small system without affecting its own temperature. A microscopic description of this process in terms of a single microscopic state of the small system leads to the Boltzmann probability. An analysis in terms of the macroscopic state of the small system leads to the principle of minimization of the free energy to obtain the equilibrium state of a system at a fixed temperature. This principle replaces the principle of maximization of the entropy, which only applies for a closed system. ❚


always the same. Energy will be transferred back and forth between the small and large systems. The essential assumption is that the contact between the large and small system does not affect any other aspect of the description of the small system. This means that the small system is in some sense independent of the large system, despite the energy transfer. This is true if the small system is itself macroscopic, but it may also be valid for certain microscopic systems. We also assume that the small system and the large system have fixed numbers of particles and volumes.

Our objective is to consider the probability that a particular microstate of the small system will be realized. A microstate is identified by all of the microscopic parameters necessary to completely define this state. We use the notation {x, p} to denote these coordinates. The probability that this particular state will be realized is given by the fraction of states of the whole system for which the small system attains this state. Because there is only one such state for the small system, the probability that this state will be realized is given by (proportional to) a count of the number of states of the rest of the system. Since the large system is macroscopic, we can count this number by using the macroscopic expression for the number of states of the large system:

P({x, p}) ∝ ΩR(Ut − E({x, p}), Nt − N, Vt − V)    (1.3.26)

where E({x,p}), N, V are the energy, number of particles and volume of the microscopic system respectively. E({x,p}) is a function of the microscopic parameters {x,p}. Ut, Nt, Vt are the energy, number of particles and volume of the whole system, including both the small and large systems. ΩR is the number of states of the large subsystem (reservoir). Since the number of states generally grows faster than linearly as a function of the energy, we use a Taylor expansion of its logarithm (or equivalently a Taylor expansion of the entropy) to find

ln ΩR(Ut − E({x,p}), Nt − N, Vt − V)
= ln ΩR(Ut, Nt − N, Vt − V) + [∂ ln ΩR(Ut, Nt − N, Vt − V)/∂Et]_{Nt,Vt} (−E({x,p}))
= ln ΩR(Ut, Nt − N, Vt − V) + (1/kT)(−E({x,p}))    (1.3.27)

where we have not expanded in the number of particles and the volume because they are unchanging. We take only the first term in the expansion, because the size of the small system is assumed to be much smaller than the size of the whole system. Exponentiating gives the relative probability of this particular microscopic state.

ΩR(Ut − E({x,p}), Nt − N, Vt − V) = ΩR(Ut, Nt − N, Vt − V) e^{−E({x,p})/kT}    (1.3.28)

The probability of this particular state must be normalized so that the sum over all states is one. Since we are normalizing the probability anyway, the constant coefficient does not affect the result. This gives us the Boltzmann probability distribution:


P({x, p}) = (1/Z) e^{−E({x,p})/kT}
Z = Σ_{x,p} e^{−E({x,p})/kT}    (1.3.29)

Eq. (1.3.29) is independent of the states of the large system and depends only on the microscopic description of the states of the small system. It is this expression which generally provides the most convenient starting point for a connection between the microscopic description of a system and macroscopic thermodynamics. It identifies the probability that a particular microscopic state will be realized when the system has a well-defined temperature T. In this way it also provides a microscopic meaning to the macroscopic temperature T. It is emphasized that Eq. (1.3.29) describes both microscopic and macroscopic systems in equilibrium at a temperature T.
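A minimal sketch of Eq. (1.3.29) for a hypothetical system with three discrete microstate energies (the values are arbitrary):

```python
# Minimal sketch: Boltzmann probabilities for a hypothetical discrete system.
import math

kT = 1.0                      # temperature in energy units
energies = [0.0, 1.0, 2.0]    # three illustrative microstate energies

weights = [math.exp(-E / kT) for E in energies]
Z = sum(weights)                      # the normalization Z of Eq. (1.3.29)
probs = [w / Z for w in weights]

for E, p in zip(energies, probs):
    print(f"E = {E:.1f}: P = {p:.4f}")
print("sum of probabilities:", sum(probs))   # normalized to 1
# Lower-energy microstates are exponentially more probable at fixed T.
```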

The probability of occurrence of a particular state should be related to the description of a system in terms of an ensemble. We have found by Eq. (1.3.29) that a system in thermal equilibrium at a temperature T is represented by an ensemble that is formed by taking each of the states in proportion to its Boltzmann probability. This ensemble is known as the canonical ensemble. The canonical ensemble should be contrasted with the assumption that each state has equal probability for isolated systems at a particular energy. The ensemble of fixed energy and equal a priori probability is known as the microcanonical ensemble. The canonical ensemble is both easier to discuss analytically and easier to connect with the physical world. It will be generally assumed in what follows.

We can use the Boltzmann probability and the definition of the canonical ensemble to obtain all of the thermodynamic quantities. The macroscopic energy is given by the average over the microscopic energy using:

U = (1/Z) Σ_{x,p} E({x, p}) e^{−E({x,p})/kT}    (1.3.30)

For a macroscopic system, the average value of the energy will always be observed in any specific measurement, despite the Boltzmann probability that allows all energies. This is because the number of states of the system rises rapidly with the energy. This rapid growth and the exponential decrease of the probability with the energy results in a sharp peak in the probability distribution as a function of energy. The sharp peak in the probability distribution means that the probability of any other energy is negligible. This is discussed below in Question 1.3.1.

For an isolated macroscopic system, we were able to identify the equilibrium state from among other states of the system using the principle of the maximization of the entropy. There is a similar procedure for a macroscopic system in contact with a thermal reservoir at a fixed temperature T. The important point to recognize is that when we had a closed system, the energy was fixed. Now, however, the objective becomes to identify the energy at equilibrium. Of course, the energy is given by the average in


Eq. (1.3.30). However, to generalize the concept of maximizing the entropy, it is simplest to reconsider the problem of the system in contact with the reservoir when the small system is also macroscopic.

Instead of considering the probability of a particular microstate of well-defined energy E, we consider the probability of a macroscopic state of the system with an energy U. In this case, we find the equilibrium state of the system by maximizing the number of states of the whole system, or alternatively of the entropy:

ln Ω(U,N,V) + ln ΩR(Ut − U, Nt − N, Vt − V)
= S(U,N,V)/k + SR(Ut − U, Nt − N, Vt − V)/k
= S(U,N,V)/k + SR(Ut, Nt − N, Vt − V)/k + (1/kT)(−U)    (1.3.31)

To find the equilibrium state, we must maximize this expression for the entropy of the whole system. We can again ignore the constant second term. This leaves us with quantities that are only characterizing the small system we are interested in, and the temperature of the reservoir. Thus we can find the equilibrium state by maximizing the quantity

S − U/T    (1.3.32)

It is conventional to rewrite this and, rather than maximizing the function in Eq. (1.3.32), to minimize the function known as the free energy:

F = U − TS (1.3.33)

This suggests a simple physical significance of the process of change toward equilibrium. At a fixed temperature, the system seeks to minimize its energy and maximize its entropy at the same time. The relative importance of the entropy compared to the energy is set by the temperature. For high temperature, the entropy becomes more dominant, and the energy rises in order to increase the entropy. At low temperature, the energy becomes more dominant, and the energy is lowered at the expense of the entropy. This is the precise statement of the observation that "everything flows downhill." The energy-entropy competition is a balance that is rightly considered as one of the most basic of physical phenomena.
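The competition can be made concrete with a toy sketch: taking S(U) = (3/2)Nk ln U (an ideal-gas-like entropy, with constants dropped), the minimum of F(U) = U − TS(U) falls at U = (3/2)NkT and moves to higher energy as T grows. The numbers below are illustrative:

```python
# Minimal sketch: minimize F(U) = U - T S(U) for a toy entropy S(U) = (3/2) N k ln U.
# Setting dF/dU = 1 - (3/2) N k T / U = 0 predicts U* = (3/2) N k T.
import math

k, N = 1.0, 10.0

def free_energy(U, T):
    S = 1.5 * N * k * math.log(U)     # toy entropy; additive constants dropped
    return U - T * S

for T in (0.5, 1.0, 2.0):
    Us = [0.01 * i for i in range(1, 10001)]       # crude scan over energies
    U_star = min(Us, key=lambda U: free_energy(U, T))
    print(f"T = {T}: numerical U* = {U_star:.2f}, "
          f"predicted (3/2)NkT = {1.5 * N * k * T:.2f}")
```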

We can obtain a microscopic expression for the free energy by an exercise that begins from a microscopic expression for the entropy:

S = k ln(Ω) = k ln[ Σ_{x,p} δ_{E({x,p}),U} ]    (1.3.34)

The summation is over all microscopic states. The delta function is 1 only when E({x,p}) = U. Thus the sum counts all of the microscopic states with energy U. Strictly speaking, the δ function is assumed to be slightly "fuzzy," so that it gives 1 when E({x,p}) differs from U by a small amount on a macroscopic scale, but by a large amount in terms of the differences between energies of microstates. We can then write


S = k ln(Ω) = k ln[ Σ_{x,p} δ_{E({x,p}),U} e^{−E({x,p})/kT} e^{U/kT} ]
= U/T + k ln[ Σ_{x,p} δ_{E({x,p}),U} e^{−E({x,p})/kT} ]    (1.3.35)

Let us compare the sum in the logarithm with the expression for Z in Eq. (1.3.29). We will argue that they are the same. This discussion hinges on the rapid increase in the number of states as the energy increases. Because of this rapid growth, the value of Z in Eq. (1.3.29) actually comes from only a narrow region of energy. We know from the expression for the energy average, Eq. (1.3.30), that this narrow region of energy must be at the energy U. This implies that for all intents and purposes the quantity in the brackets of Eq. (1.3.35) is equivalent to Z. This argument leads to the expression:

S = U/T + k ln Z    (1.3.36)

Comparing with Eq. (1.3.33) we have

F = −kT ln Z    (1.3.37)

Since the Boltzmann probability is a convenient starting point, this expression for the free energy is often simpler to evaluate than the expression for the entropy, Eq. (1.3.34). A calculation of the free energy using Eq. (1.3.37) provides contact between microscopic models and the macroscopic behavior of thermodynamic systems. The Boltzmann normalization Z, which is directly related to the free energy, is also known as the partition function. We can obtain other thermodynamic quantities directly from the free energy. For example, we rewrite the expression for the energy Eq. (1.3.30) as:

U = (1/Z) Σ_{x,p} E({x, p}) e^{−βE({x,p})} = −∂ ln(Z)/∂β = ∂(βF)/∂β    (1.3.38)

where we use the notation β = 1/kT. The entropy can be obtained using this expression for the energy and Eq. (1.3.33) or (1.3.36).
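A minimal numerical check of Eq. (1.3.38) for a hypothetical discrete spectrum: the direct Boltzmann average of the energy agrees with the finite-difference derivative −∂ ln Z/∂β:

```python
# Minimal sketch: check U = -d(ln Z)/d(beta) by finite differences
# for a hypothetical discrete energy spectrum.
import math

energies = [0.0, 0.7, 1.3, 2.1]   # illustrative microstate energies

def lnZ(beta):
    return math.log(sum(math.exp(-beta * E) for E in energies))

beta = 1.0
Z = math.exp(lnZ(beta))
# direct Boltzmann average, Eq. (1.3.30):
U_avg = sum(E * math.exp(-beta * E) for E in energies) / Z
# numerical derivative, Eq. (1.3.38):
h = 1e-6
U_deriv = -(lnZ(beta + h) - lnZ(beta - h)) / (2 * h)
print(f"<E> = {U_avg:.6f}, -d lnZ/d beta = {U_deriv:.6f}")   # should agree
```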

Question 1.3.1 Consider the possibility that the macroscopic energy of a system in contact with a thermal reservoir will deviate from its typical value U. To do this expand the probability distribution of macroscopic energies of a system in contact with a reservoir around this value. How large are the deviations that occur?

Solution 1.3.1 We considered Eq. (1.3.31) in order to optimize the entropy and find the typical value of the energy U. We now consider it again to find the distribution of probabilities of values of the energy around the value U, similar to the way we discussed the distribution of microscopic states {x, p} in Eq. (1.3.27). To do this we distinguish between the observed value of the


energy U′ and U. Note that we consider U′ to be a macroscopic energy, though the same derivation could be used to obtain the distribution of microscopic energies. The probability of U′ is given by:

P(U′) ∝ Ω(U′, N, V) ΩR(Ut − U′, Nt − N, Vt − V) = e^{S(U′)/k + SR(Ut − U′)/k}    (1.3.39)

In the latter form we ignore the fixed arguments N and V. We expand the logarithm of this expression around the expected value of energy U:

S(U′)/k + SR(Ut − U′)/k
= S(U)/k + SR(Ut − U)/k + (1/2k)(d²S(U)/dU²)(U − U′)² + (1/2k)(d²SR(Ut − U)/dUt²)(U − U′)²    (1.3.40)

where we have kept terms to second order. The first-order terms, which are of the form (1/kT)(U′ − U), have opposite signs and therefore cancel. This implies that the probability is a maximum at the expected energy U. The second derivative of the entropy can be evaluated using:

d²S(U)/dU² = (d/dU)(1/T) = −(1/T²)(1/(dU/dT)) = −1/(T²CV)    (1.3.41)

where CV is known as the specific heat at constant volume. For our purposes, its only relevant property is that it is an extensive quantity. We can obtain a similar expression for the reservoir and define the reservoir specific heat CVR. Thus the probability is:

P(U′) ∝ e^{−(1/2kT²)(1/CV + 1/CVR)(U − U′)²} ≈ e^{−(1/2kT²)(1/CV)(U − U′)²}    (1.3.42)

where we have left out the (constant) terms that do not depend on U′. Because CV and CVR are extensive quantities and the reservoir is much bigger than the small system, we can neglect 1/CVR compared to 1/CV. The result is a Gaussian distribution (Eq. (1.2.39)) with a standard deviation

σ = T√(kCV)    (1.3.43)

This describes the characteristic deviation of the energy U′ from the average or typical energy U. However, since CV is extensive, the square root means that the deviation is proportional to √N. Note that the result is consistent with a random walk of N steps. So for a large system of N ∼ 10^23 particles, the possible deviation in the energy is smaller than the energy by a factor of (we are neglecting everything but the N dependence) 10^12—i.e., it is undetectable. Thus the energy of a thermodynamic system is very well defined. ❚
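The √N suppression can be tabulated directly. A minimal sketch, assuming the ideal gas results U = (3/2)NkT and CV = (3/2)Nk derived in Section 1.3.3:

```python
# Minimal sketch: relative energy fluctuations sigma/U for an ideal gas.
# Uses U = (3/2) N k T and C_V = (3/2) N k, so sigma/U = sqrt(2/(3N)).
import math

k, T = 1.380649e-23, 300.0
for N in (1e3, 1e9, 1e23):
    U = 1.5 * N * k * T
    CV = 1.5 * N * k
    sigma = T * math.sqrt(k * CV)          # Eq. (1.3.43)
    print(f"N = {N:.0e}: sigma/U = {sigma / U:.2e}")
# For N ~ 1e23 the ratio is ~1e-12: fluctuations are macroscopically undetectable.
```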

1.3.3 Kinetic theory of gases and pressure

In the previous section, we described the microscopic analog of temperature and entropy. We assumed that the microscopic analog of energy was understood, and we


developed the concept of free energy and its microscopic analog. One quantity that we have not discussed microscopically is the pressure. Pressure is a Newtonian concept—the force per unit area. For various reasons, it is helpful for us to consider the microscopic origin of pressure for the example of a simplified model of a gas called an ideal gas. In Question 1.3.2 we use the ideal gas as an example of the thermodynamic and statistical analysis of materials.

An ideal gas is composed of indistinguishable point particles with a mass m but with no internal structure or size. The interaction between the particles is neglected, so that the energy is just their kinetic energy. The particles do interact with the walls of the container in which the gas is confined. This interaction is simply that of reflection—when the particle is incident on a wall, the component of its velocity perpendicular to the wall is reversed. Energy is conserved. This is in accordance with the expectation from Newton's laws for the collision of a small mass with a much larger mass object.

To obtain an expression for the pressure, we must suffer with some notational hazards, as the pressure P, probability of a particular velocity P(v) and momentum of a particular particle pi are all designated by the letter P but with different case, arguments or subscripts. A bold letter F is used briefly for the force, and otherwise F is used for the free energy. We rely largely upon context to distinguish them. Since the objective of using an established notation is to make contact with known concepts, this situation is sometimes preferable to introducing a new notation.

Because of the absence of collisions between different particles of the gas, there is no communication between them, and each of the particles bounces around the container on its own course. The pressure on the container walls is given by the force per unit area exerted on the walls, as illustrated in Fig. 1.3.5. The force is given by the action of the wall on the gas that is needed to reverse the momenta of the incident particles between t and t + ∆t:

P = |F|/A = (1/A∆t) Σ_i m∆vi    (1.3.44)

where |F| is the magnitude of the force on the wall. The latter expression relates the pressure to the change in the momenta of incident particles per unit area of the wall. A is a small but still macroscopic area, so that this part of the wall is flat. Microscopic roughness of the surface is neglected. The change in velocity ∆vi of the particles during the time ∆t is zero for particles that are not incident on the wall. Particles that hit the wall between t and t + ∆t are moving in the direction of the wall at time t and are near enough to the wall to reach it during ∆t. Faster particles can reach the wall from farther away, but only the velocity perpendicular to the wall matters. Denoting this velocity component as v⊥, the maximum distance is v⊥∆t (see Fig. 1.3.5).

If the particles have velocity only perpendicular to the wall and no velocity parallel to the wall, then we could count the incident particles as those in a volume Av⊥∆t. We can use the same expression even when particles have a velocity parallel to the surface, because the parallel velocity takes particles out of and into this volume equally.


Another way to say this is that for a particular parallel velocity we count the particles in a sheared box with the same height and base and therefore the same volume. The total number of particles in the volume, (N/V)Av⊥∆t, is the volume times the density (N/V).

Within the volume Av⊥∆t, the number of particles that have the velocity v⊥ is given by the number of particles in this volume times the probability P(v⊥) that a particle has its perpendicular velocity component equal to v⊥. Thus the number of


Figure 1.3.5 Illustration of a gas of ideal particles in a container near one of the walls. Particles incident on the wall are reflected, reversing their velocity perpendicular to the wall, and not affecting the other components of their velocity. The wall experiences a pressure due to the collisions and applies the same pressure to the gas. To calculate the pressure we must count the number of particles in a unit of time ∆t with a particular perpendicular velocity v⊥ that hit an area A. This is equivalent to counting the number of particles with the velocity v⊥ in the box shown with one of its sides of length ∆tv⊥. Particles with velocity v⊥ will hit the wall if and only if they are in the box. The same volume of particles applies if the particles also have a velocity parallel to the surface, since this just skews the box, as shown, leaving its height and base area the same. ❚


particles incident on the wall with a particular velocity perpendicular to the wall v⊥ is given by

(N/V) A P(v⊥) v⊥ ∆t    (1.3.45)

The total change in momentum is found by multiplying this by the change in momentum of a single particle reflected by the collision, 2mv⊥, and integrating over all velocities.

Σ_i m∆vi = (N/V) A∆t ∫_0^∞ dv⊥ P(v⊥) v⊥ (2mv⊥)    (1.3.46)

Divide this by A∆t to obtain the change in momentum per unit time per unit area, which is the pressure (Eq. (1.3.44)),

P = (N/V) ∫_0^∞ dv⊥ P(v⊥) v⊥ (2mv⊥)    (1.3.47)

We rewrite this in terms of the average squared velocity perpendicular to the surface

P = (N/V) m 2 ∫_0^∞ dv⊥ P(v⊥) v⊥² = (N/V) m ∫_{−∞}^{∞} dv⊥ P(v⊥) v⊥² = (N/V) m ⟨v⊥²⟩    (1.3.48)

where the equal probability of having positive and negative velocities enables us to extend the integral to −∞ while eliminating the factor of two. We can rewrite Eq. (1.3.48) in terms of the average square magnitude of the total velocity. There are three components of the velocity (two parallel to the surface). The squares of the velocity components add to give the total velocity squared and the averages are equal:

⟨v²⟩ = ⟨v⊥² + v2² + v3²⟩ = 3⟨v⊥²⟩    (1.3.49)

where v is the magnitude of the particle velocity. The pressure is:

P = (N/V) m (1/3) ⟨v²⟩    (1.3.50)

Note that the wall does not influence the probability of having a particular velocity nearby. Eq. (1.3.50) is a microscopic expression for the pressure, which we can calculate using the Boltzmann probability from Eq. (1.3.29). We do this as part of Question 1.3.2.
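As a preview, Eq. (1.3.50) can be checked by sampling velocities. The minimal sketch below assumes the result obtained in Question 1.3.2 (via Eq. (1.3.68)) that the Boltzmann probability makes each velocity component Gaussian with variance kT/m; units are arbitrary:

```python
# Minimal sketch: Monte Carlo check of P = (N/V) m <v^2>/3 against P = N k T / V.
# Assumes each velocity component is Gaussian with variance kT/m (Boltzmann weight).
import random, math

k, T, m = 1.0, 2.0, 3.0        # constants in arbitrary units
N_over_V = 5.0                 # number density
samples = 200000

sigma_v = math.sqrt(k * T / m)            # std. dev. of each velocity component
v2_sum = 0.0
for _ in range(samples):
    vx, vy, vz = (random.gauss(0.0, sigma_v) for _ in range(3))
    v2_sum += vx * vx + vy * vy + vz * vz

v2_avg = v2_sum / samples                  # estimate of <v^2> = 3 kT/m
P_micro = N_over_V * m * v2_avg / 3.0      # Eq. (1.3.50)
P_ideal = N_over_V * k * T                 # Eq. (1.3.64)
print(f"P from <v^2>: {P_micro:.4f}, ideal gas NkT/V: {P_ideal:.4f}")
```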

Question 1.3.2 Develop the statistical description of the ideal gas by obtaining expressions for the thermodynamic quantities Z, F, U, S and P, in terms of N, V, and T. For hints read the first three paragraphs of the solution.

Solution 1.3.2 The primary task of statistics is counting. To treat the ideal gas we must count the number of microscopic states to obtain the entropy,


or sum over the Boltzmann probability to obtain Z and the free energy. The ideal gas presents us with two difficulties. The first is that each particle has a continuum of possible locations. The second is that we must treat the particles as microscopically indistinguishable. To solve the first problem, we have to set some interval of position at which we will call a particle here different from a particle there. Moreover, since a particle at any location may have many different velocities, we must also choose a difference of velocities that will be considered as distinct. We define the interval of position to be ∆x and the interval of momentum to be ∆p. In each spatial dimension, the positions between x and x + ∆x correspond to a single state, and the momenta between p and p + ∆p correspond to a single state. Thus we consider as one state of the system a particle which has position and momenta in a six-dimensional box of a size ∆x³∆p³. The size of this box enters only as a constant in classical statistical mechanics, and we will not be concerned with its value. Quantum mechanics identifies it with ∆x³∆p³ = h³, where h is Planck's constant, and for convenience we adopt this notation for the unit volume for counting.

There is a subtle but important choice that we have made. We have chosen to make the counting intervals have a fixed width ∆p in the momentum. From classical mechanics, it is not entirely clear that we should make the intervals of fixed width in the momentum or, for example, make them fixed in the energy ∆E. In the latter case we would count a single state between E and E + ∆E. Since the energy is proportional to the square of the momentum, this would give a different counting. Quantum mechanics provides an unambiguous answer that the momentum intervals are fixed.

To solve the problem of the indistinguishability of the particles, we must remember every time we count the number of states of the system to divide by the number of possible ways there are to interchange the particles, which is N!.

The energy of the ideal gas is given by the kinetic energy of all of the particles:

E({x, p}) = Σ_{i=1}^N (1/2) m vi² = Σ_{i=1}^N pi²/2m    (1.3.51)

where the velocity and momentum of a particle are three-dimensional vectors with magnitude vi and pi respectively. We start by calculating the partition function (Boltzmann normalization) Z from Eq. (1.3.29)

Z = (1/N!) Σ_{x,p} e^{−Σ_i pi²/2mkT} = (1/N!) ∫ ∏_{i=1}^N (d³xi d³pi/h³) e^{−Σ_i pi²/2mkT}    (1.3.52)

where the integral is to be evaluated over all possible locations of each of the N particles of the system. We have also included the correction to


overcounting, N!. Since the particles do not see each other, the energy is a sum over each particle energy. The integrals separate and we have:

Z = (1/N!) [ (1/h³) ∫ e^{−p²/2mkT} d³x d³p ]^N    (1.3.53)

The position integral gives the volume V, immediately giving the dependence of Z on this macroscopic quantity. The integral over momentum can be evaluated giving:

∫ e^{−p²/2mkT} d³p = 4π ∫_0^∞ p² dp e^{−p²/2mkT} = 4π (2mkT)^{3/2} ∫_0^∞ y² dy e^{−y²}
= 4π (2mkT)^{3/2} [−∂/∂a]_{a=1} ∫_0^∞ dy e^{−ay²} = 4π (2mkT)^{3/2} [−∂/∂a]_{a=1} (1/2)√(π/a) = (2πmkT)^{3/2}    (1.3.54)

and we have that

Z(V,T,N) = (V^N/N!) (2πmkT/h²)^{3N/2}    (1.3.55)

We could have simplified the integration by recognizing that each component of the momentum px, py and pz can be integrated separately, giving 3N independent one-dimensional integrals and leading more succinctly to the result. The result can also be written in terms of a natural length λ(T) that depends on temperature (and mass):

λ(T) = (h²/2πmkT)^{1/2}    (1.3.56)

Z(V,T,N) = V^N/(N! λ(T)^{3N})    (1.3.57)

From the partition function we obtain the free energy, making use of Stirling's approximation (Eq. (1.2.36)):

F = kTN(ln N − 1) − kTN ln(V/λ(T)³)    (1.3.58)

where we have neglected terms that grow less than linearly with N. Terms that vary as ln(N) vanish on a macroscopic scale. In this form it might appear that we have a problem, since the N ln(N) term from Stirling's approximation to the factorial does not scale proportional to the size of the system, and F is an extensive quantity. However, we must also note the N ln(V) term, which we can combine with the N ln(N) term so that the extensive nature is apparent:

F = kTN[ln(Nλ(T)³/V) − 1]    (1.3.59)


It is interesting that the factor of N!, and thus the indistinguishability of particles, is necessary for the free energy to be extensive. If the particles were distinguishable, then cutting the system in two would result in a different counting, since we would lose the states corresponding to particles switching from one part to the other. If we combined the two systems back together, there would be an effect due to the mixing of the distinguishable particles (Question 1.3.3).
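This role of the N! can be checked numerically. A minimal sketch comparing F for (N,V) and (2N,2V), with and without the 1/N! factor, using ln Z from Eq. (1.3.57) and Stirling's approximation; the value of λ(T)³ is an arbitrary constant:

```python
# Minimal sketch: the 1/N! factor makes F = -kT ln Z extensive.
# Uses ln Z = N ln(V / lam3) - ln N!  (from Eq. (1.3.57)), with Stirling's
# approximation ln N! ~ N ln N - N. All values are illustrative.
import math

kT, lam3 = 1.0, 1.0e-3      # temperature and lambda(T)^3 in arbitrary units

def F(N, V, distinguishable=False):
    lnZ = N * math.log(V / lam3)
    if not distinguishable:
        lnZ -= N * math.log(N) - N      # the -ln N! correction
    return -kT * lnZ

N, V = 1000.0, 1.0
for flag, label in ((False, "with 1/N!"), (True, "without 1/N!")):
    ratio = F(2 * N, 2 * V, flag) / F(N, V, flag)
    print(f"{label}: F(2N,2V)/F(N,V) = {ratio:.4f}")   # exactly 2 only with 1/N!
```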

The energy may be obtained from Eq. (1.3.38) (any of the forms) as:

U = (3/2) NkT    (1.3.60)

which provides an example of the equipartition theorem, which says that each degree of freedom (position-momentum pair) of the system carries kT/2 of energy in equilibrium. Each of the three spatial coordinates of each particle is one degree of freedom.

The expression for the entropy (S = (U − F)/T)

S = kN[ln(V/(Nλ(T)³)) + 5/2]    (1.3.61)

shows that the entropy per particle S/N grows logarithmically with the volume per particle V/N. Using the expression for U, it may be written in a form S(U,N,V).

Finally, the pressure may be obtained from Eq. (1.3.20), but we must be careful to keep N and S constant rather than T. We have

P = −(∂U/∂V)_{N,S} = −(3/2) Nk (∂T/∂V)_{N,S}    (1.3.62)

Taking the same derivative of the entropy Eq. (1.3.61) gives us (the derivative of S with S fixed is zero):

0 = −1/V − (3/2)(1/T)(∂T/∂V)_{N,S}    (1.3.63)

Substituting, we obtain the ideal gas equation of state:

PV = NkT (1.3.64)

which we can also obtain from the microscopic expression for the pressure—Eq. (1.3.50). We describe two ways to do this. One way to obtain the pressure from the microscopic expression is to evaluate first the average of the energy

U = ⟨E({x, p})⟩ = Σ_{i=1}^N (1/2) m ⟨vi²⟩ = N (1/2) m ⟨v²⟩    (1.3.65)

This may be substituted into Eq. (1.3.60) to obtain


(1/2) m ⟨v²⟩ = (3/2) kT    (1.3.66)

which may be substituted directly into Eq. (1.3.50). Another way is to obtain the average squared velocity directly. In averaging the velocity, it doesn't matter which particle we choose. We choose the first particle:

⟨v1²⟩ = 3⟨v1⊥²⟩ = [ (1/N!) ∫ ∏_{i=1}^N (d³xi d³pi/h³) 3v1⊥² e^{−Σ_i pi²/2mkT} ] / [ (1/N!) ∫ ∏_{i=1}^N (d³xi d³pi/h³) e^{−Σ_i pi²/2mkT} ]    (1.3.67)

where we have further chosen to average over only one of the components of the velocity of this particle and multiply by three. The denominator is the normalization constant Z. Note that the factor 1/N!, due to the indistinguishability of particles, appears in the numerator in any ensemble average as well as in the denominator, and cancels. It does not affect the Boltzmann probability when issues of distinguishability are not involved.

There are 6N integrals in the numerator and in the denominator of Eq. (1.3.67). All integrals factor into one-dimensional integrals. Each integral in the numerator is the same as the corresponding one in the denominator, except for the one that involves the particular component of the velocity we are interested in. We cancel all other integrals and obtain:

⟨v1²⟩ = 3⟨v1⊥²⟩ = 3 [ ∫ v1⊥² e^{−p1⊥²/2mkT} dp1⊥ ] / [ ∫ e^{−p1⊥²/2mkT} dp1⊥ ]
= 3 (2kT/m) [ ∫ y² e^{−y²} dy ] / [ ∫ e^{−y²} dy ] = 3 (2kT/m) (1/2) = 3kT/m    (1.3.68)

The integral is performed by the same technique as used in Eq. (1.3.54). The result is the same as by the other methods. ❚

Question 1.3.3 An insulated box is divided into two compartments by a partition. The two compartments contain two different ideal gases at the same pressure P and temperature T. The first gas has N1 particles and the second has N2 particles. The partition is punctured. Calculate the resulting change in thermodynamic parameters (N, V, U, P, S, T, F). What changes in the analysis if the two gases are the same, i.e., if they are composed of the same type of molecules?

Solution 1.3.3 By additivity the extensive properties of the whole system before the puncture are (Eq. (1.3.59)–Eq. (1.3.61)):



U_0 = U_1 + U_2 = \frac{3}{2}(N_1+N_2)kT
V_0 = V_1 + V_2   (1.3.69)
S_0 = kN_1[\ln(V_1/(N_1\lambda(T)^3)) + 5/2] + kN_2[\ln(V_2/(N_2\lambda(T)^3)) + 5/2]
F_0 = kTN_1[\ln(N_1\lambda(T)^3/V_1) - 1] + kTN_2[\ln(N_2\lambda(T)^3/V_2) - 1]

The pressure is intrinsic, so before the puncture it is (Eq. (1.3.64)):

P_0 = N_1 kT/V_1 = N_2 kT/V_2   (1.3.70)

After the puncture, the total energy remains the same, because the whole system is isolated. Because the two gases do not interact with each other even when they are mixed, their properties continue to add after the puncture. However, each gas now occupies the whole volume, V1 + V2. The expression for the energy as a function of temperature remains the same, so the temperature is also unchanged. The pressure in the container is now additive: it is the sum of the pressures of the two gases:

P = N_1 kT/(V_1 + V_2) + N_2 kT/(V_1 + V_2) = P_0   (1.3.71)

i.e., the pressure is unchanged as well. The only changes are in the entropy and the free energy. Because the two gases do not interact with each other, as with the other quantities, we can write the total entropy as a sum over the entropy of each gas separately:

S = kN_1[\ln((V_1+V_2)/(N_1\lambda(T)^3)) + 5/2] + kN_2[\ln((V_1+V_2)/(N_2\lambda(T)^3)) + 5/2]   (1.3.72)
  = S_0 + (N_1+N_2)k\ln(V_1+V_2) - N_1k\ln(V_1) - N_2k\ln(V_2)

If we simplify to the case V1 = V2, we have S = S0 + (N1 + N2)k ln(2). Since the energy is unchanged, by the relationship of free energy and entropy (Eq. (1.3.33)) we have:

F = F0 − T(S − S0) (1.3.73)

If the two gases are composed of the same molecule, there is no change in thermodynamic parameters as a result of a puncture. Mathematically, the difference is that we replace Eq. (1.3.72) with:

S = k(N_1+N_2)[\ln((V_1+V_2)/((N_1+N_2)\lambda(T)^3)) + 5/2] = S_0   (1.3.74)

where this is equal to the original entropy because of the relationship N1/V1 = N2/V2 from Eq. (1.3.70). This example illustrates the effect of indistinguishability. The entropy increases after the puncture when the gases are different, but not when they are the same. ❚
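The entropy bookkeeping of this solution is easy to verify numerically. A minimal sketch (an added illustration; it assumes units with k = 1 and sets λ(T)³ = 1, a constant that cancels in all differences):

import numpy as np

k, lam3 = 1.0, 1.0                  # Boltzmann constant and lambda(T)^3, both set to 1

def S(N, V):                        # entropy of one ideal gas, Eq. (1.3.61)
    return k * N * (np.log(V / (N * lam3)) + 2.5)

N1, V1 = 2.0, 1.0
N2, V2 = 2.0, 1.0                   # equal densities: same P and T, Eq. (1.3.70)

S0 = S(N1, V1) + S(N2, V2)                    # before the puncture
S_diff = S(N1, V1 + V2) + S(N2, V1 + V2)      # different gases, Eq. (1.3.72)
S_same = S(N1 + N2, V1 + V2)                  # identical gases, Eq. (1.3.74)

print(S_diff - S0, (N1 + N2) * k * np.log(2))  # equal: the entropy of mixing
print(S_same - S0)                              # zero: no change at all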

Question 1.3.4 An ideal gas is in one compartment of a two-compartment sealed and thermally insulated box. The compartment it is in has a volume V1. It has an energy U0 and a number of particles N0. The second



compartment has volume V2 and is empty. Write expressions for the changes in all thermodynamic parameters (N, V, U, P, S, T, F) if

a. the barrier between the two compartments is punctured and the gas expands to fill the box.

b. the barrier is moved slowly, like a piston, expanding the gas to fill the box.

Solution 1.3.4 Recognizing what is conserved simplifies the solution of this type of problem.

a. The energy U and the number of particles N are conserved. Since the volume change is given to us explicitly, the expressions for T (Eq. (1.3.60)), F (Eq. (1.3.59)), S (Eq. (1.3.61)), and P (Eq. (1.3.64)) in terms of these quantities can be used.

N = N_0
U = U_0
V = V_1 + V_2   (1.3.75)
T = T_0
F = kTN[\ln(N\lambda(T)^3/(V_1+V_2)) - 1] = F_0 + kTN\ln(V_1/(V_1+V_2))
S = kN[\ln((V_1+V_2)/(N\lambda(T)^3)) + 5/2] = S_0 + kN\ln((V_1+V_2)/V_1)
P = NkT/V = NkT/(V_1+V_2) = P_0 V_1/(V_1+V_2)

b. The process is reversible and no heat is transferred, thus it is adiabatic—the entropy is conserved. The number of particles is also conserved:

N = N_0
S = S_0   (1.3.76)

Our main task is to calculate the effect of the work done by the gas pressure on the piston. This causes the energy of the gas to decrease, and the temperature decreases as well. One way to find the change in temperature is to use the conservation of entropy, and Eq. (1.3.61), to obtain that V/\lambda(T)^3 is a constant and therefore:

T \propto V^{-2/3}   (1.3.77)

Thus the temperature is given by:

T = T_0\left(\frac{V_1+V_2}{V_1}\right)^{-2/3}   (1.3.78)

Since the temperature and energy are proportional to each other (Eq. (1.3.60)), similarly:

U = U_0\left(\frac{V_1+V_2}{V_1}\right)^{-2/3}   (1.3.79)



The free-energy expression in Eq. (1.3.59) changes only through the temperature prefactor:

F = kTN[\ln(N\lambda(T)^3/V) - 1] = F_0\,\frac{T}{T_0} = F_0\left(\frac{V_1+V_2}{V_1}\right)^{-2/3}   (1.3.80)

Finally, the pressure (Eq. (1.3.64)):

P = NkT/V = P_0\,\frac{TV_0}{T_0V} = P_0\left(\frac{V_1+V_2}{V_1}\right)^{-5/3}   (1.3.81) ❚
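The two processes of this Question can be summarized by their scaling factors. A minimal numerical sketch (an added illustration, with the arbitrary choice V1 = V2):

V1, V2 = 1.0, 1.0
r = (V1 + V2) / V1                  # expansion ratio

# (a) puncture: U and T unchanged, P drops as 1/V (Eq. (1.3.75))
print("puncture:  T/T0 = 1,  U/U0 = 1,  P/P0 =", 1 / r)

# (b) slow piston: S fixed, so T ~ V^(-2/3) and P ~ V^(-5/3) (Eqs. (1.3.77)-(1.3.81))
print("adiabatic: T/T0 =", r**(-2 / 3), " U/U0 =", r**(-2 / 3), " P/P0 =", r**(-5 / 3))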

The ideal gas illustrates the significance of the Boltzmann distribution. Consider a single particle. We can treat it either as part of the large system or as a subsystem in its own right. In the ideal gas, without any interactions, its energy would not change. Thus the particle would not be described by the Boltzmann probability in Eq. (1.3.29). However, we can allow the ideal gas model to include a weak or infrequent interaction (collision) between particles which changes the particle's energy. Over a long time compared to the time between collisions, the particle will explore all possible positions in space and all possible momenta. The probability of its being at a particular position and momentum (in a region d^3x d^3p) is given by the Boltzmann distribution:

P(x,p)\,d^3x\,d^3p = \frac{e^{-p^2/2mkT}\, d^3p\, d^3x/h^3}{\int e^{-p^2/2mkT}\, d^3p\, d^3x/h^3}   (1.3.82)

Instead of considering the trajectory of this particular particle and the effects of the (unspecified) collisions, we can think of an ensemble that represents this particular particle in contact with a thermal reservoir. The ensemble would be composed of many different particles in different boxes. There is no need to have more than one particle in the system. We do need to have some mechanism for energy to be transferred to and from the particle instead of collisions with other particles. This could happen as a result of the collisions with the walls of the box if the vibrations of the walls give energy to the particle or absorb energy from the particle. If the wall is at the temperature T, this would also give rise to the same Boltzmann distribution for the particle. The probability of a particular particle in a particular box being in a particular location with a particular momentum would be given by the same Boltzmann probability.

Using the Boltzmann probability distribution for the velocity, we could calculate the average squared velocity of the particle as:



\langle v^2\rangle = 3\langle v_\perp^2\rangle = \frac{3\int v_\perp^2\, e^{-p^2/2mkT}\, d^3p\, d^3x/h^3}{\int e^{-p^2/2mkT}\, d^3p\, d^3x/h^3} = \frac{3\int v_\perp^2\, e^{-p_\perp^2/2mkT}\, dp_\perp}{\int e^{-p_\perp^2/2mkT}\, dp_\perp} = \frac{3kT}{m}   (1.3.83)

which is the same result as we obtained for the ideal gas in the last part of Question 1.3.2. We could even consider one coordinate of one particle as a separate system and arrive at the same conclusion. Our description of systems is actually a description of coordinates.

There are differences when we consider the particle to be a member of an ensemble and as one particle of a gas. In the ensemble, we do not need to consider the distinguishability of particles. This does not affect any of the properties of a single particle.

This discussion shows that the ideal gas model may be viewed as quite close to the basic concept of an ensemble. Generalize the physical particle in three dimensions to a point with coordinates that describe a complete system. These coordinates change in time as the system evolves according to the rules of its dynamics. The ensemble represents this system in the same way as the ideal gas is the ensemble of the particle. The lack of interaction between the different members of the ensemble, and the existence of a transfer of energy to and from each of the systems to generate the Boltzmann probability, is the same in each of the cases. This analogy is helpful when thinking about the nature of the ensemble.

1.3.4 Phase transitions—first and second order

In the previous section we constructed some of the underpinnings of thermodynamics and their connection with microscopic descriptions of materials using statistical mechanics. One of the central conclusions was that by minimizing the free energy we can find the equilibrium state of a material that has a fixed number of particles, volume and temperature. Once the free energy is minimized to obtain the equilibrium state of the material, the energy, entropy and pressure are uniquely determined. The free energy is also a function of the temperature, the volume and the number of particles.

One of the important properties of materials is that they can change their properties suddenly when the temperature is changed by a small amount. Examples of this are the transition of a solid to a liquid, or a liquid to a gas. Such a change is known as a phase transition. Each well-defined state of the material is considered a particular phase. Let us consider the process of minimizing the free energy as we vary the temperature. Each of the properties of the material will, in general, change smoothly as the temperature is varied. However, special circumstances might occur when the minimization of the free energy at one temperature results in a very different set of



properties of the material from this minimization at a slightly different temperature. This is illustrated in a series of frames in Fig. 1.3.6, which shows a schematic of the free energy as a function of a macroscopic parameter at several temperatures.

The temperature at which the jump in properties of the material occurs is called the critical or transition temperature, Tc. In general, all of the properties of the material except for the free energy jump discontinuously at Tc. This kind of phase transition is known as a first-order phase transition. One of the properties of a first-order phase transition is that the two phases can coexist at the transition temperature, so that part of the material is in one phase and part in the other. An example is ice floating in water. If we start from a temperature below the transition temperature—with ice—and add heat to the system gradually, the temperature will rise until we reach the transition temperature. Then the temperature will stay fixed as the material converts from one phase to the other—from ice to water. Once the whole system is converted to the higher temperature phase, the temperature will start to increase again.
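The discontinuous switch of the minimum can be made concrete with a toy free energy. The quartic form below is an illustrative choice, not taken from the text; minimizing it on a grid shows the equilibrium value of the macroscopic parameter jumping as T crosses Tc, as in Fig. 1.3.6.

import numpy as np

x = np.linspace(-2.0, 2.0, 4001)    # grid of values of the macroscopic parameter
Tc, c = 1.0, 0.5                    # illustrative transition temperature and coupling

for T in [0.8, 0.9, 1.0, 1.1, 1.2]:
    F = x**4 - 2 * x**2 + c * (T - Tc) * x   # two wells whose relative depth shifts with T
    print(T, x[np.argmin(F)])       # the minimizing value jumps from ~+1 to ~-1 near Tc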


Figure 1.3.6 Each of the curves represents the variation of the free energy of a system as a function of macroscopic parameters. The different curves are for different temperatures, from Tc − 2∆T through Tc + 2∆T. As the temperature is varied, the minimum of the free energy all of a sudden switches from one set of macroscopic parameters to another. This is a first-order phase transition like the melting of ice to form water, or the boiling of water to form steam. Below the ice-to-water phase transition the macroscopic parameters that describe ice are the minimum of the free energy, while above the phase transition the macroscopic parameters that describe water are the minimum of the free energy. ❚


The temperature Tc at which a transition occurs depends on the number of particles and the volume of the system. Alternatively, it may be considered a function of the pressure. We can draw a phase-transition diagram (Fig. 1.3.7) that shows the transition temperature as a function of pressure. Each region of such a diagram corresponds to a particular phase.

There is another kind of phase transition, known as a second-order phase transition, where the energy and the pressure do not change discontinuously at the phase-transition point. Instead, they change continuously, but they are nonanalytic at the transition temperature. A common way that this can occur is illustrated in Fig. 1.3.8. In this case the single minimum of the free energy breaks into two minima as a function of temperature. The temperature at which the two minima appear is the transition temperature. Such a second-order transition is often coupled to the existence of first-order transitions. Below the second-order transition temperature, when the two minima exist, the variation of the pressure can change the relative energy of the two minima and cause a first-order transition to occur. The first-order transition occurs at a particular pressure Pc(T) for each temperature below the second-order transition temperature. This gives rise to a line of first-order phase transitions. Above the second-order transition temperature, there is only one minimum, so that there are


Figure 1.3.7 Schematic phase diagram of H2O (pressure P versus temperature T) showing three phases—ice, water and steam—with a line of first-order transitions ending at a second-order transition point. Each of the regions shows the domain of pressures and temperatures at which a pure phase is in equilibrium. The lines show phase transition temperatures, Tc(P), or phase transition pressures, Pc(T). The different ways of crossing lines have different names. Ice to water: melting; ice to steam: sublimation; water to steam: boiling; water to ice: freezing; steam to water: condensation; steam to ice: condensation to frost. The transition line from water to steam ends at a point of high pressure and temperature where the two become indistinguishable. At this high pressure steam is compressed till it has a density approaching that of water, and at this high temperature water molecules are energetic like a vapor. This special point is a second-order phase transition point (see Fig. 1.3.8). ❚



Figure 1.3.8 Similar to Fig. 1.3.6, each of the curves represents the variation of the free energy of a system as a function of macroscopic parameters; the curves run from Tc + 2∆T down to Tc − 3∆T. In this case, however, the phase transition occurs when two minima emerge from one. This is a second-order phase transition. Below the temperature at which the second-order phase transition occurs, varying the pressure can give rise to a first-order phase transition by changing the relative energies of the two minima (see Figs. 1.3.6 and 1.3.7). ❚

also no first-order transitions. Thus, the second-order transition point occurs as the end of a line of first-order transitions. A second-order transition is found at the end of the liquid-to-vapor phase line of water in Fig. 1.3.7.


The properties of second-order phase transitions have been extensively studied because of interesting phenomena that are associated with them. Unlike a first-order phase transition, there is no coexistence of two phases at the phase transition, because there is only one phase at that point. Instead, there exist large fluctuations in the local properties of the material at the phase transition. A suggestion of why this occurs can be seen from Fig. 1.3.8, where the free energy is seen to be very flat at the phase transition. This results in large excursions (fluctuations) of all the properties of the system except the free energy. These excursions, however, are not coherent over the whole material. Instead, they occur at every length scale from the microscopic on up. The closer a material is to the phase transition, the longer are the length scales that are affected. As the temperature is varied so that the system moves away from the transition temperature, the fluctuations disappear, first on the longest length scales and then on shorter and shorter length scales. Because at the phase transition itself even the macroscopic length scales are affected, thermodynamics itself had to be carefully rethought in the vicinity of second-order phase transitions. The methodology that has been developed, the renormalization group, is an important tool in the investigation of phase transitions. We will discuss it in Section 1.10. We note that, to be consistent with Question 1.3.1, the specific heat CV must diverge at a second-order phase transition, where energy fluctuations can be large.
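The continuous emergence of two minima can be made concrete with a standard illustrative free energy (an added example, not taken from the text). Suppose the free energy depends on a macroscopic parameter m as

F(m;T) = F_0 + a\,(T-T_c)\,m^2 + b\,m^4, \qquad a, b > 0

For T > T_c the only minimum is at m = 0. For T < T_c, setting \partial F/\partial m = 0 gives two minima at

m = \pm\sqrt{a\,(T_c-T)/2b}

which grow continuously from zero as the temperature is lowered: F and its first derivatives are continuous at T_c, but the behavior is nonanalytic there, as described above.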

1.3.5 Use of thermodynamics and statistical mechanics in describing the real world

How do we generalize the notions of thermodynamics that we have just described to apply to more realistic situations? The assumptions of thermodynamics—that systems are in equilibrium and that dividing them into parts leads to unchanged local properties—do not generally apply. The assumptions of thermodynamics break down even for simple materials, but they are more radically violated when we consider biological organisms like trees or people. We still are able to measure their temperature. How do we extend thermodynamics to apply to these systems?

We can start by considering a system quite close to the thermodynamic ideal—a pure piece of material that is not in equilibrium. For example, a glass of water in a room. We generally have no trouble placing a thermometer in the glass and measuring the temperature of the water. We know it is not in equilibrium, because if we wait it will evaporate to become a vapor spread out throughout the room (even if we simplify by considering the room closed). Moreover, if we wait longer (a few hundred years to a few tens of thousands of years), the glass itself will flow and cover the table or flow down to the floor, and at least part of it will also sublime to a vapor. The table will undergo its own processes of deterioration. These effects will occur even in an idealized closed room without considerations of various external influences or traffic through the room. There is one essential concept that allows us to continue to apply thermodynamic principles to these materials, and measure the temperature of the water, glass or table, and generally to discover that they are at the same (or close to the same) temperature. The concept is the separation of time scales. This concept is as basic as the other principles of thermodynamics. It plays an essential role in discussions


of the dynamics of physical systems and in particular of the dynamics of complex systems. The separation of time scales assumes that our observations of systems have a limited time resolution and are performed over a limited time. The processes that occur in a material are then separated into fast processes that are much faster than the time resolution of our observation, slow processes that occur on longer time scales than the duration of observation, and dynamic processes that occur on the time scale of our observation. Macroscopic averages are assumed to be averages over the fast processes. Thermodynamics allows us to deal with the slow and the fast processes but only in very limited ways with the dynamic processes. The dynamic processes are dealt with separately by Newtonian mechanics.

Slow processes establish the framework in which thermodynamics can be applied. In formal terms, the ensemble that we use in thermodynamics assumes that all the parameters of the system described by slow processes are fixed. To describe a system using statistical mechanics, we consider all of the slowly varying parameters of the system to be fixed and assume that equilibrium applies to all of the fast processes. Specifically, we assume that all possible arrangements of the fast coordinates exist in the ensemble with a probability given by the Boltzmann probability. Generally, though not always, it is the microscopic processes that are fast. To justify this we can consider that an atom in a solid vibrates at a rate of 10^10–10^12 times per second, and a gas molecule at room temperature travels five hundred meters per second. These are, however, only a couple of select examples.

Sometimes we may still choose to perform our analysis by averaging over many possible values of the slow coordinates. When we do this we have two kinds of ensembles—the ensemble of the fast coordinates and the ensemble of the different values of the slow coordinates. These ensembles are called the annealed and quenched ensembles. For example, say we have a glass of water in which there is an ice cube. There are fast processes that correspond to the motion of the water molecules and the vibrations of the ice molecules, and there are also slow processes corresponding to the movement of the ice in the water. Let's say we want to determine the average amount of ice. If we perform several measurements that determine the coordinates and size of the ice, we may want to average the size we find over all the measurements even though they are measurements corresponding to different locations of the ice. In contrast, if we wanted to measure the motion of the ice, averaging the measurements of location would be absurd.

Closely related to the discussion of fast coordinates is the ergodic theorem. The ergodic theorem states that a measurement performed on a system by averaging a property over a long time is the same as taking the average over the ensemble of the fast coordinates. This theorem is used to relate experimental measurements that are assumed to occur over long times to theoretically obtained averages over ensembles. The ergodic theorem is not a theorem in the sense that it has been proven in general, but rather a statement of a property that applies to some macroscopic systems and is known not to apply to others. The objective is to identify when it applies. When it does not apply, the solution is to identify which quantities may be averaged and which may


not, often by separating fast and slow coordinates or equivalently by identifying quantities conserved by the fast dynamics of the system.
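A toy illustration of the statement (an added sketch; the two-state Markov process and its rates are assumptions, not a system from the text): the time average over one long trajectory can be compared with the stationary ensemble average.

import numpy as np

rng = np.random.default_rng(1)
p_up, p_down = 0.02, 0.05           # assumed per-step transition probabilities
steps = 200_000

s, total = 0, 0                     # a single trajectory of a two-state system
for _ in range(steps):
    if s == 0 and rng.random() < p_up:
        s = 1
    elif s == 1 and rng.random() < p_down:
        s = 0
    total += s

print(total / steps)                # time average over one long trajectory
print(p_up / (p_up + p_down))       # stationary ensemble average, ~0.286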

Experimental measurements also generally average properties over large regions of space compared to microscopic lengths. It is this spatial averaging rather than time averaging that often enables the ensemble average to stand for experimental measurements when the microscopic processes are not fast compared to the measurement time. For example, materials are often formed of microscopic grains and have many dislocations. The grain boundaries and dislocations do move, but they often change very slowly over time. When experiments are sensitive to their properties, they often average over the effects of many grains and dislocations because they do not have sufficient resolution to see a single grain boundary or dislocation.

In order to determine what is the relevant ensemble for a particular experiment, both the effect of time and space averaging must be considered. Technically, this requires an understanding of the correlation in space and time of the properties of an individual system. More conceptually, measurements that are made for particular quantities are in effect made over many independent systems both in space and in time, and therefore correspond to an ensemble average. The existence of correlation is the opposite of independence. The key question (as in the case of the ideal gas) becomes what is the interval of space and time that corresponds to an independent system. These quantities are known as the correlation length and the correlation time. If we are able to describe theoretically the ensemble over a correlation length and correlation time, then by appropriate averaging we can describe the measurement.

In summary, the program of use of thermodynamics in the real world is to use the separation of the different time scales to apply equilibrium concepts to the fast degrees of freedom and discuss their influence on the dynamic degrees of freedom while keeping fixed the slow degrees of freedom. The use of ensembles simplifies consideration of these systems by systematizing the use of equilibrium concepts to the fast degrees of freedom.

1.3.6 From thermodynamics to complex systems

Our objective in this book is to consider the dynamics of complex systems. While, as discussed in the previous section, we will use the principles of thermodynamics to help us in this analysis, another important reason to review thermodynamics is to recognize what complex systems are not. Thermodynamics describes macroscopic systems without structure or dynamics. The task of thermodynamics is to relate the very few macroscopic parameters to each other. It suggests that these are the only relevant parameters in the description of these systems. Materials and complex systems are both formed out of many interacting parts. The ideal gas example described a material where the interaction between the particles was weak. However, thermodynamics also describes solids, where the interaction is strong. Having decided that complex systems are not described fully by thermodynamics, we must ask, Where do the assumptions of thermodynamics break down? There are several ways the assumptions may break down, and each one is significant and plays a role in our investigation of


complex systems. Since we have not yet examined particular examples of complex systems, this discussion must be quite abstract. However, it will be useful as we study complex systems to refer back to this discussion. The abstract statements will have concrete realizations when we construct models of complex systems.

The assumptions of thermodynamics separate into space-related and time-related assumptions. The first we discuss is the divisibility of a macroscopic material. Fig. 1.3.2 (page 61) illustrates the property of divisibility. In this process, a small part of a system is separated from a large part of the system without affecting the local properties of the material. This is inherent in the use of extensive and intensive quantities. Such divisibility is not true of systems typically considered to be complex systems. Consider, for example, a person as a complex system that cannot be separated and continue to have the same properties. In words, we would say that complex systems are formed out of not only interacting, but also interdependent parts. Since both thermodynamic and complex systems are formed out of interacting parts, it is the concept of interdependency that must distinguish them. We will dedicate a few paragraphs to defining a sense in which "interdependent" can have a more precise meaning.

We must first address a simple way in which a system may have a nonextensive energy and still not be a complex system. If we look closely at the properties of a material, say a piece of metal or a cup of water, we discover that its surface is different from the bulk. By separating the material into pieces, the surface area of the material is changed. For macroscopic materials, this generally does not affect the bulk properties of the material. A characteristic way to identify surface properties, such as the surface energy, is through their dependence on particle number. The surface energy scales as N^{2/3}, in contrast to the extensive bulk energy that is linear in N. This kind of correction can be incorporated directly in a slightly more detailed treatment of thermodynamics, where every macroscopic parameter has a surface term. The presence of such surface terms is not sufficient to identify a material as a complex system. For this reason, we are careful to identify complex systems by requiring that the scenario of Fig. 1.3.2 is violated by changes in the local (i.e., everywhere including the bulk) properties of the system, rather than just the surface.

It may be asked whether the notion of "local properties" is sufficiently well defined as we are using it. In principle, it is not. For now, we adopt this notion from thermodynamics. When only a few properties, like the energy and entropy, are relevant, "affect locally" is a precise concept. Later we would like to replace the use of local thermodynamic properties with a more general concept—the behavior of the system.

How is the scenario of Fig. 1.3.2 violated for a complex system? We can find that the local properties of the small part are affected without affecting the local properties of the large part. Or we can find that the local properties of the large part are affected as well. The distinction between these two ways of affecting the system is important, because it can enable us to distinguish between different kinds of complex systems. It will be helpful to name them for later reference. We call the first category of systems complex materials, the second category we call complex organisms.


Why don’t we also include the possibility that the large part is affected but not thesmall part? At this point it makes sense to consider generic subdivision rather thanspecial subdivision. By generic subdivision, we mean the ensemble of possible subdi-visions rather than a particular one.Once we are considering complex systems,the ef-fect of removal of part of a system may depend on which part is removed. However,when we are trying to understand whether or not we have a complex system, we canlimit ourselves to considering the generic effects of removing a part of the system. Forthis reason we do not consider the possibility that subdivision affects the large systemand not the small. This might be possible for the removal of a particular small part,but it would be surprising to discover a system where this is generically true.

Two examples may help to illustrate the different classes of complex systems. At least superficially, plants are complex materials, while animals are complex organisms. The reason that plants are complex materials is that the cutting of parts of a plant, such as leaves, a branch, or a root, typically does not affect the local properties of the rest of the plant, but does affect the excised part. For animals this is not generically the case. However, it would be better to argue that plants are in an intermediate category, where some divisions, such as cutting out a lateral section of a tree trunk, affect both small and large parts, while others affect only the smaller part. For animals, essentially all divisions affect both small and large parts. We believe that complex organisms play a special role in the study of complex system behavior. The essential quality of a complex organism is that its properties are tied to the existence of all of its parts.

How large is the small part we are talking about? Loss of a few cells from the skin of an animal will not generally affect it. As the size of the removed portion is decreased, it may be expected that the influence on the local properties of the larger system will be reduced. This leads to the concept of a robust complex system. Qualitatively, the larger the part that can be removed from a complex system without affecting its local properties, the more robust the system is. We see that a complex material is the limiting case of a highly robust complex system.

The flip side of subdivision of a system is aggregation. For thermodynamic systems, subdivision and aggregation are the same, but for complex systems they are quite different. One of the questions that will concern us is what happens when we place a few or many complex systems together. Generally we expect that the individual complex systems will interact with each other. However, one of the points we can make at this time is that just placing together many complex systems, trees or people, does not make a larger complex system by the criteria of subdivision. Thus, a collection of complex systems may result in a system that behaves as a thermodynamic system under subdivision—separating it into parts does not affect the behavior of the parts.

The topic of bringing together many pieces or subdividing into many parts is also quite distinct from the topic of subdivision by removal of a single part. This brings us to a second assumption we will discuss. Thermodynamic systems are assumed to be composed of a very large number of particles. What about complex systems? We know that the number of molecules in a cup of water is not greater than the number of molecules


in a human being. And yet, we understand that this is not quite the right point. We should not be counting the number of water molecules in the person; instead we might count the number of cells, which is much smaller. Thus appears the problem of counting the number of components of a system. In the context of correlations in materials, this was briefly discussed at the end of the last section. Let us assume for the moment that we know how to count the number of components. It seems clear that systems with only a few components should not be treated by thermodynamics. One of the interesting questions we will discuss is whether in the limit of a very large number of components we will always have a thermodynamic system. Stated in a simpler way from the point of view of the study of complex systems, the question becomes how large is too large or how many is too many. From the thermodynamic perspective the question is, Under what circumstances do we end up with the thermodynamic limit?

We now switch to a discussion of time-related assumptions. One of the basic assumptions of thermodynamics is the ergodic theorem that enables the description of a single system using an ensemble. When the ergodic theorem breaks down, as discussed in the previous section, additional fixed or quenched variables become important. This is the same as saying that there are significant differences between different examples of the macroscopic system we are interested in. This is a necessary condition for the existence of a complex system. The alternative would be that all realizations of the system would be the same, which does not coincide with intuitive notions of complexity. We will discuss several examples of the breaking of the ergodic theorem later. The simplest example is a magnet. The orientation of the magnet is an additional parameter that must be specified, and therefore the ergodic theorem is violated for this system. Any system that breaks symmetry violates the ergodic theorem. However, we do not accept a magnet as a complex system. Therefore we can assume that the breaking of ergodicity is a necessary but not sufficient condition for complexity. All of the systems we will discuss break ergodicity, and therefore it is always necessary to specify which coordinates of the complex system are fixed and which are to be assumed to be so rapidly varying that they can be assigned equilibrium Boltzmann probabilities.

A special case of the breaking of the ergodic theorem, but one that strikes even more deeply at the assumptions of thermodynamics, is a violation of the separation of time scales. If there are dynamical processes that occur on every time scale, then it becomes impossible to treat the system using the conventional separation of scales into fast, slow and dynamic processes. As we will discuss in Section 1.10, the techniques of renormalization that are used in phase transitions to deal with the existence of many spatial scales may also be used to describe systems changing on many time scales.

Finally, inherent in thermodynamics, the concept of equilibrium and the ergodic theorem is the assumption that the initial condition of the system does not matter. For a complex system, the initial condition of the system does matter over the time scales relevant to our observation. This brings us back to the concept of correlation time. The correlation time describes the length of time over which the initial conditions are relevant to the dynamics. This means that our observation of a complex system must be shorter than a correlation time. The spatial analog, the correlation length, describes


the effects of surfaces on the system. The discussion of the effects of subdivision also implies that the system must be smaller than a correlation length. This means that complex systems change their internal structure—adapt—to conditions at their boundaries. Thus, a suggestive though incomplete summary of our discussion of complexity in the context of thermodynamics is that a complex system is contained within a single correlation distance and correlation time.

1.4 Activated Processes (and Glasses)

In the last section we saw figures (Figs. 1.3.6 and 1.3.8) showing the free energy as a function of a macroscopic parameter with two minima. In this section we analyze a single-particle system that has a potential energy with a similar shape (Fig. 1.4.1). The particle is in equilibrium with a thermal reservoir. If the average energy is lower than the energy of the barrier between the two wells, then the particle generally resides for a time in one well and then switches to the other. At very low temperatures, in equilibrium, it will be more and more likely to be in the lower well and less likely to be in the higher well. We use this model to think about a system with two possible states, where one state is higher in energy than the other. If we start the system in the higher energy state, the system will relax to the lower energy state. Because the process of relaxation is enabled or accelerated by energy from the thermal reservoir, we say that it is activated.


Figure 1.4.1 Illustration of the potential energy V(x) of a system that has two local minimum energy configurations, x1 and x−1, separated by a barrier at xB with energy EB. When the temperature is lower than the energy barriers EB − E−1 and EB − E1, the system may be considered as a two-state system with transitions between the states. The relative probability of the two states varies with temperature and the relative energy of the bottoms of the two wells. The rate of transition also varies with temperature. When the system is cooled systematically, the two-state system is a simple model of a glass (Fig. 1.4.2). At low temperatures the system cannot move from one well to the other, but is in equilibrium within a single well. ❚


1.4.1 Two-state systems

It might seem that a system with only two different states would be easy to analyze. Eventually we will reach a simple problem. However, building the simple model will require us to identify some questions and approximations relevant to our understanding of the application of this model to physical systems (e.g., the problem of protein folding found in Chapter 4). Rather than jumping to the simple two-state problem (Eq. (1.4.40) below), we begin from a particle in a double-well potential. The kinetics and thermodynamics in this system give some additional content to the thermodynamic discussion of the previous section and introduce new concepts.

We consider Fig. 1.4.1 as describing the potential energy V(x) experienced by a classical particle in one dimension. The region to the right of xB is called the right well and to the left is called the left well. A classical trajectory of the particle with conserved energy would consist of the particle bouncing back and forth within the potential well between two points that are the solutions of the equation V(x) = E, where E is the total energy of the particle. The kinetic energy at any time is given by

E(x,p) - V(x) = \frac{1}{2}mv^2   (1.4.1)

which determines the magnitude of the velocity at any position but not the direction. The velocity switches direction every bounce. When the energy is larger than EB, there is only one distinct trajectory at each energy. For energies larger than E1 but smaller than EB, there are two possible trajectories, one in the right well—to the right of xB—and one in the left well. Below E1, which is the minimum energy of the right well, there is again only one trajectory possible, in the left well. Below E−1 there are no possible locations for the particle.

If we consider this system in isolation, there is no possibility that the particle will change from one trajectory to another. Our first objective is to enable the particle to be in contact with some other system (or coordinate) with which it can transfer energy and momentum. For example, we could imagine that the particle is one of many moving in the double well—like the ideal gas. Sometimes there are collisions that change the energy and direction of the motion. The same effect would be found for many other ways we could imagine the particle interacting with other systems. The main approximation, however, is that the interaction of the particle with the rest of the universe occurs only over short times. Most of the time it acts as if it were by itself in the potential well. The particle follows a trajectory and has an energy that is the sum of its kinetic and potential energies (Eq. (1.4.1)). There is no need to describe the energy associated with the interaction with the other systems. All of the other particles of the gas (or whatever picture we imagine) form the thermal reservoir, which has a well-defined temperature T.

We can increase the rate of collisions between the system and the reservoir without changing our description. Then the particle does not go very far before it forgets the direction it was traveling in and the energy that it had. But as long as the collisions themselves occur over a short time compared to the time between collisions, any time we look at the particle, it has a well-defined energy and momentum. From moment



to moment, the kinetic energy and momentum change unpredictably. Still, the position of the particle must change continuously in time. This scenario is known as diffusive motion. The different times are related by:

collision (interaction) time << time between collisions << transit time

where the transit time is the time between bounces from the walls of the potential well if there were no collisions—the period of oscillation of a particle in the well. The particle undergoes a kind of random walk, with its direction and velocity changing randomly from moment to moment. We will assume this scenario in our treatment of this system.
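This scenario is easy to realize numerically. The sketch below (all parameter values are illustrative; overdamped dynamics is one simple way to produce random changes of direction and energy) follows a walker in a tilted double well V(x) = (x² − 1)² + εx and compares the fraction of time it spends at x > 0 with the weight of that region under the position distribution e^{−V(x)/kT} (Eq. (1.4.3) below).

import numpy as np

rng = np.random.default_rng(2)
kT, eps, dt, steps = 0.4, 0.3, 1e-3, 1_000_000

V = lambda x: (x**2 - 1)**2 + eps * x      # wells near x = -1 and x = +1
dV = lambda x: 4 * x * (x**2 - 1) + eps

x, right = -1.0, 0
kicks = rng.normal(size=steps) * np.sqrt(2 * kT * dt)
for i in range(steps):
    x += -dV(x) * dt + kicks[i]            # drift toward lower V plus a random kick
    right += x > 0.0

xs = np.linspace(-3.0, 3.0, 10_001)
w = np.exp(-V(xs) / kT)
print(right / steps, w[xs > 0].sum() / w.sum())   # comparable fractions, ~0.2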

When the particle is in contact with a thermal reservoir, the laws of thermodynamics apply. The Boltzmann probability gives the probability that the particle is found at position x with momentum p:

P(x,p) = e^{-E(x,p)/kT}/Z
Z = \sum_{x,p} e^{-E(x,p)/kT} = \frac{1}{h}\int dx\, dp\; e^{-E(x,p)/kT}   (1.4.2)

Formally, this expression describes a large number of independent systems that make up a canonical ensemble. The ensemble of systems provides a formally precise way of describing probabilities as the number of systems in the ensemble with a particular value of the position and momentum. As in the previous section, Z guarantees that the sum over all probabilities is 1. The factor of h is not relevant in what follows, but for completeness we keep it and associate it with the momentum integral, so that Σp → ∫dp/h.

If we are interested in the position of the particle, and are not interested in its momentum, we can simplify this expression by integrating over all values of the momentum. Since the energy separates into kinetic and potential energy:

P(x) = \frac{e^{-V(x)/kT}\int (dp/h)\, e^{-p^2/2mkT}}{\int dx\, e^{-V(x)/kT}\int (dp/h)\, e^{-p^2/2mkT}} = \frac{e^{-V(x)/kT}}{\int dx\, e^{-V(x)/kT}}   (1.4.3)

The resulting expression looks similar to our original expression. Its meaning is somewhat different, however, because V(x) is only the potential energy of the system. Since the kinetic energy contributes equivalently to the probability at every location, V(x) determines the probability at every x. An expression of the form e^{-E/kT} is known as the Boltzmann factor of E. Thus Eq. (1.4.3) says that the probability P(x) is proportional to the Boltzmann factor of V(x). We will use this same trick to describe the probability of being to the right or being to the left of xB in terms of the minimum energy of each well.

To simplify to a two-state system, we must define a variable that specifies only which of the two wells the particle is in. So we label the system by s = ±1, where s = +1 if x > xB and s = −1 if x < xB for a particular realization of the system at a particular time, or:



s = sign(x − xB) (1.4.4)

Probabilistically, the case x = xB never happens and therefore does not have to be accounted for.

We can calculate the probability P(s) of the system having the value s = +1 using:

P(1) = \frac{\int_{x_B} dx\, e^{-V(x)/kT}}{\int dx\, e^{-V(x)/kT}}   (1.4.5)

The largest contribution to this probability occurs when V(x) is smallest. We assume that kT is small compared to EB; then the value of the integral is dominated by the region immediately in the vicinity of the minimum energy. Describing this as a two-state system is only meaningful when this is true. We simplify the integral by expanding it in the vicinity of the minimum energy and keeping only the quadratic term:

V(x) = E_1 + \frac{1}{2}m\omega_1^2(x-x_1)^2 + \ldots = E_1 + \frac{1}{2}k_1(x-x_1)^2 + \ldots   (1.4.6)

where

k_1 = m\omega_1^2 = \left.\frac{d^2V(x)}{dx^2}\right|_{x_1}   (1.4.7)

is the effective spring constant and ω1 is the frequency of small oscillations in the right well. We can now write Eq. (1.4.5) in the form

P(1) = \frac{e^{-E_1/kT}\int_{x_B} dx\, e^{-k_1(x-x_1)^2/2kT}}{\int dx\, e^{-V(x)/kT}}   (1.4.8)

Because the integrand in the numerator falls rapidly away from the point x = x1, we could extend the lower limit to −∞. Similarly, the probability of being in the left well is:

P(-1) = \frac{e^{-E_{-1}/kT}\int_{-\infty}^{x_B} dx\, e^{-k_{-1}(x-x_{-1})^2/2kT}}{\int dx\, e^{-V(x)/kT}}   (1.4.9)

Here the upper limit of the integral could be extended to ∞. It is simplest to assume that k1 = k−1. This assumption, that the shapes of the wells are the same, does not significantly affect most of the discussion (Questions 1.4.1–1.4.2). The two probabilities are proportional to a new constant times the Boltzmann factor e^{-E/kT} of the energy at the bottom of the well. This can be seen even without performing the integrals in Eq. (1.4.8) and Eq. (1.4.9). We redefine Z for the two-state representation:



P(1) = \frac{e^{-E_1/kT}}{Z_s}   (1.4.10)

P(-1) = \frac{e^{-E_{-1}/kT}}{Z_s}   (1.4.11)

The new normalization Zs can be obtained from:

P(1) + P(-1) = 1   (1.4.12)

giving

Z_s = e^{-E_1/kT} + e^{-E_{-1}/kT}   (1.4.13)

which is different from the value in Eq. (1.4.2). We arrive at the desired two-state result:

P(1) = \frac{e^{-E_1/kT}}{e^{-E_1/kT} + e^{-E_{-1}/kT}} = \frac{1}{1 + e^{(E_1-E_{-1})/kT}} = f(E_1 - E_{-1})   (1.4.14)

where f is the Fermi probability or Fermi function:

f(x) = \frac{1}{1 + e^{x/kT}}   (1.4.15)

For readers who were introduced to the Fermi function in quantum statistics: it is not unique to that field; it occurs any time there are exactly two different possibilities. Similarly,

P(-1) = \frac{e^{-E_{-1}/kT}}{e^{-E_1/kT} + e^{-E_{-1}/kT}} = \frac{1}{1 + e^{(E_{-1}-E_1)/kT}} = f(E_{-1} - E_1)   (1.4.16)

which is consistent with Eq. (1.4.12) above since

f(x) + f(-x) = 1   (1.4.17)
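A quick numerical check of the two-state reduction (the potential below is an illustrative construction, not one from the text): when kT is small compared with the barrier and the wells have equal shapes, the exact weight ratio of Eq. (1.4.5) is reproduced by f(E1 − E−1) of Eq. (1.4.14).

import numpy as np

kT, dE = 0.1, 0.25
# two equal-curvature wells at x = -1 and x = +1; the right well is higher by dE
V = lambda x: np.where(x > 0, (x - 1)**2 + dE, (x + 1)**2)

xs = np.linspace(-4.0, 4.0, 400_001)
w = np.exp(-V(xs) / kT)
exact = w[xs > 0].sum() / w.sum()   # Eq. (1.4.5) with xB = 0

approx = 1 / (1 + np.exp(dE / kT))  # f(E1 - E_-1), Eq. (1.4.14)
print(exact, approx)                # the two values agree to a fraction of a percent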

Question 1.4.1 Discuss how k1 ≠ k−1 would affect the results for the two-state system in equilibrium. Obtain expressions for the probabilities in each of the wells.

Solution 1.4.1 Extending the integrals to ±∞, as described in the text after Eq. (1.4.8) and Eq. (1.4.9), we obtain:

P(1) = \frac{e^{-E_1/kT}\sqrt{2\pi kT/k_1}}{\int dx\, e^{-V(x)/kT}}   (1.4.18)

P(-1) = \frac{e^{-E_{-1}/kT}\sqrt{2\pi kT/k_{-1}}}{\int dx\, e^{-V(x)/kT}}   (1.4.19)



Because of the approximate extension of the integrals, we are no longer guaranteed that the sum of these probabilities is 1. However, within the accuracy of the approximation, we can reimpose the normalization condition. Before we do so, we choose to rewrite k_1 = m\omega_1^2 = m(2\pi\nu_1)^2, where ν1 is the natural frequency of the well. We then ignore all common factors in the two probabilities and write

P(1) = \frac{\nu_1^{-1}\, e^{-E_1/kT}}{Z_s'}   (1.4.20)

P(-1) = \frac{\nu_{-1}^{-1}\, e^{-E_{-1}/kT}}{Z_s'}   (1.4.21)

Z_s' = \nu_1^{-1}\, e^{-E_1/kT} + \nu_{-1}^{-1}\, e^{-E_{-1}/kT}   (1.4.22)

Or we can write, as in Eq. (1.4.14)

P(1) = \frac{1}{1 + (\nu_1/\nu_{-1})\, e^{(E_1-E_{-1})/kT}}   (1.4.23)

and similarly for P(−1). ❚
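The frequency correction of this Solution can also be checked numerically. In the sketch below (an illustrative potential with equal well-bottom energies but unequal spring constants) the stiffer well has the higher natural frequency and therefore, by Eq. (1.4.23), the smaller probability.

import numpy as np

kT = 0.08
# stiff right well (k1 = 8) and soft left well (k-1 = 2), equal bottom energies
V = lambda x: np.where(x > 0, 4 * (x - 1)**2, (x + 1)**2)

xs = np.linspace(-4.0, 4.0, 400_001)
w = np.exp(-V(xs) / kT)
exact = w[xs > 0].sum() / w.sum()

ratio = np.sqrt(8.0 / 2.0)          # nu_1/nu_-1 = sqrt(k1/k-1); common factors cancel
approx = 1 / (1 + ratio * np.exp(0.0 / kT))   # Eq. (1.4.23) with E1 = E_-1
print(exact, approx)                # both ~1/3: the stiffer well holds less probability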

Question 1.4.2 Redefine the energies E1 and E−1 to include the effect of the difference between k1 and k−1 so that the probability P(1) (Eq. (1.4.23)) can be written like Eq. (1.4.14) with the new energies. How is the result related to the concept of free energy and entropy?

Solution 1.4.2 We define the new energy of the right well as

F_1 = E_1 + kT\ln(\nu_1)   (1.4.24)

This definition can be seen to recover Eq. (1.4.23) from the form of Eq. (1.4.14) as

P(1) = f(F_1 - F_{-1})   (1.4.25)

Eq. (1.4.24) is very reminiscent of the definition of the free energy, Eq. (1.3.33), if we use the expression for the entropy:

S_1 = -k\ln(\nu_1)   (1.4.26)

Note that if we consider the temperature dependence, Eq. (1.4.25) is not identical in its behavior with Eq. (1.4.14). The free energy, F1, depends on T, while the energy at the bottom of the well, E1, does not. ❚

In Question 1.4.2, Eq. (1.4.24), we have defined what might be interpreted as a free energy of the right well. In Section 1.3 we defined only the free energy of the system as a whole. The new free energy is for part of the ensemble rather than the whole ensemble. We can do this quite generally. Start by identifying a certain subset of all



possible states of a system. For example, s = 1 in Eq. (1.4.4). Then we define the free energy using the expression:

F_s(1) = -kT\ln\left(\sum_{\{x,p\}} \delta_{s,1}\, e^{-E(\{x,p\})/kT}\right) = -kT\ln(Z_1)   (1.4.27)

This is similar to the usual expression for the free energy in terms of the partition function Z, but the sum is only over the subset of states. When there is no ambiguity, we often drop the subscript and write this as F(1). From this definition we see that the probability of being in the subset of states is proportional to the Boltzmann factor of the free energy

P(1) \propto e^{-F_s(1)/kT}   (1.4.28)

If we have several different subsets that account for all possibilities, then we can normalize Eq. (1.4.28) to find the probability itself. If we do this for the left and right wells, we immediately arrive at the expression for the probabilities in Eq. (1.4.14) and Eq. (1.4.16), with E1 and E−1 replaced by Fs(1) and Fs(−1) respectively. From Eq. (1.4.28) we see that for a collection of states, the free energy plays the same role as the energy in the Boltzmann probability.

We note that Eq. (1.4.24) is not the same as Eq. (1.4.27). However, as long as the relative energy is the same, F1 − F−1 = Fs(1) − Fs(−1), the normalized probability is unchanged. When k1 = k−1, the entropic part of the free energy is the same for both wells. Then direct use of the energy instead of the free energy is valid, as in Eq. (1.4.14). We can evaluate the free energy of Eq. (1.4.27), including the momentum integral:

$$Z_1 = \int_{x_B} dx \int \frac{dp}{h}\, e^{-E(x,p)/kT} = \int_{x_B} dx\, e^{-V(x)/kT} \int \frac{dp}{h}\, e^{-p^2/2mkT}$$
$$\approx e^{-E_1/kT} \int dx\, e^{-k_1 (x - x_1)^2/2kT}\, \frac{\sqrt{2\pi mkT}}{h} \approx e^{-E_1/kT}\, \sqrt{\frac{m}{k_1}}\, \frac{2\pi kT}{h} = e^{-E_1/kT}\, \frac{kT}{h\nu_1} \tag{1.4.29}$$

$$F_s(1) = E_1 + kT \ln(h\nu_1/kT) \tag{1.4.30}$$

where we have used the definition of the well oscillation frequency above Eq. (1.4.20) to simplify the expression. A similar expression holds for Z−1. The result would be exact for a pure harmonic well.
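These expressions are easy to check numerically. A minimal sketch (Python, with hypothetical parameter values; energies and kT in the same units) evaluates Eqs. (1.4.20)–(1.4.23) and confirms that the two forms of P(1) agree:

```python
import math

# Hypothetical parameters: well energies (eV), natural frequencies (1/sec), kT (eV).
E1, Em1 = 0.1, 0.0
nu1, num1 = 2.0e12, 1.0e12
kT = 0.025

# Eq. (1.4.20): unnormalized weights of the two wells.
Zs = math.exp(-E1 / kT) / nu1 + math.exp(-Em1 / kT) / num1
# Eqs. (1.4.21) and (1.4.22): normalized probabilities.
P1 = math.exp(-E1 / kT) / nu1 / Zs
Pm1 = math.exp(-Em1 / kT) / num1 / Zs
# Eq. (1.4.23): the same P(1) in closed form.
P1_alt = 1.0 / (1.0 + (nu1 / num1) * math.exp((E1 - Em1) / kT))

print(P1, Pm1, P1 + Pm1)   # the probabilities sum to 1
print(abs(P1 - P1_alt))    # Eq. (1.4.23) agrees with Eq. (1.4.21)
```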

The new definition of the free energy of a set of states can also be used to understand the treatment of macroscopic systems, specifically to explain why the energy is determined by minimizing the free energy. Partition the possible microstates by the value of the energy, as in Eq. (1.3.35). Define the free energy as a function of the energy analogous to Eq. (1.4.27)

$$F(U) = -kT \ln\Bigl(\sum_{\{x,p\}} \delta_{E(\{x,p\}),U}\; e^{-E(\{x,p\})/kT}\Bigr) \tag{1.4.31}$$



Since the relative probability of each value of the energy is given by

$$P(U) \propto e^{-F(U)/kT} \tag{1.4.32}$$

the most likely energy is given by the lowest free energy. For a macroscopic system, the most likely value is so much more likely than any other value that it is observed in any measurement. This can immediately be generalized. The minimization of the free energy gives not only the value of the energy but the value of any macroscopic parameter.

1.4.2 Relaxation of a two-state system

To investigate the kinetics of the two-state system, we assume an ensemble of systems that is not an equilibrium ensemble. Instead, the ensemble is characterized by a time-dependent probability of occupying the two wells:

$$P(1) \to P(1;t), \qquad P(-1) \to P(-1;t) \tag{1.4.33}$$

Normalization continues to hold at every time:

$$P(1;t) + P(-1;t) = 1 \tag{1.4.34}$$

For example, we might consider starting a system in the upper well and see how the system evolves in time. Or we might consider starting a system in the lower well and see how the system evolves in time. We answer the question using the time-evolving probabilities that describe an ensemble of systems with the same starting condition. To achieve this objective, we construct a differential equation describing the rate of change of the probability of being in a particular well in terms of the rate at which systems move from one well to the other. This is just the Master equation approach from Section 1.2.4.

The systems that make transitions from the left to the right well are the ones that cross the point x = xB. More precisely, the rate at which transitions occur is the probability current per unit time of systems at xB, moving toward the right. Similar to Eq. (1.3.47) used to obtain the pressure of an ideal gas on a wall, the number of particles crossing xB is the probability of systems at xB with velocity v, times their velocity:

$$J(1|-1) = \int_0^\infty \frac{dp}{h}\, v\, P(x_B, p; t) \tag{1.4.35}$$

where J(1|−1) is the number of systems per unit time moving from the left to the right. There is a hidden assumption in Eq. (1.4.35). We have adopted a notation that treats all systems on the left together. When we are considering transitions, this is only valid if a system that crosses x = xB from right to left makes it down into the well on the left, and thus does not immediately cross back over to the side it came from.

We further assume that in each well the systems are in equilibrium, even when the two wells are not in equilibrium with each other. This means that the probability of being in a particular location in the right well is given by:



$$P(x,p;t) = P(1;t)\, \frac{e^{-E(x,p)/kT}}{Z_1}, \qquad Z_1 = \int_{x_B} dx\, dp\; e^{-E(x,p)/kT} \tag{1.4.36}$$

In equilibrium, this statement is true because then P(1) = Z1/Z. Eq. (1.4.36) presumes that the rate of collisions between the particle and the thermal reservoir is faster than both the rate at which the system goes from one well to the other and the frequency of oscillation in a well.

In order to evaluate the transition rate Eq. (1.4.35), we need the probability at xB. We assume that the systems that cross xB moving from the left well to the right well (i.e., moving to the right) are in equilibrium with systems in the left well from where they came. Systems that are moving from the right well to the left have the equilibrium distribution characteristic of the right well. With these assumptions, the rate at which systems hop from the left to the right is given by:

$$J(1|-1) = \int_0^\infty \frac{dp}{h}\, \frac{p}{m}\, P(-1;t)\, e^{-(E_B + p^2/2m)/kT}/Z_{-1} = P(-1;t)\, e^{-E_B/kT}\, (kT/h)/Z_{-1} \tag{1.4.37}$$

We find using Eq. (1.4.29) that the current of systems can be written in terms of a transition rate per system:

$$J(1|-1) = R(1|-1)\, P(-1;t), \qquad R(1|-1) = \nu_{-1}\, e^{-(E_B - E_{-1})/kT} \tag{1.4.38}$$

Similarly, the current and rate at which systems hop from the right to the left are given by:

$$J(-1|1) = R(-1|1)\, P(1;t), \qquad R(-1|1) = \nu_1\, e^{-(E_B - E_1)/kT} \tag{1.4.39}$$

When k1 = k−1 then ν1 = ν−1. We continue to deal with this case for simplicity and define ν = ν1 = ν−1. The expressions for the rate of transition suggest the interpretation that the frequency ν is the rate of attempt to cross the barrier. The probability of crossing in each attempt is given by the Boltzmann factor, which gives the likelihood that the energy exceeds the barrier. While this interpretation is appealing, and is often given, it is misleading. It is better to consider the frequency as describing the width of the well in which the particle wanders. The wider the well is, the less likely is a barrier crossing. This interpretation survives better when more general cases are considered.

The transition rates enable us to construct the time variation of the probability of occupying each of the wells. This gives us the coupled equations for the two probabilities:

$$\dot{P}(1;t) = R(1|-1)\, P(-1;t) - R(-1|1)\, P(1;t)$$
$$\dot{P}(-1;t) = R(-1|1)\, P(1;t) - R(1|-1)\, P(-1;t) \tag{1.4.40}$$



These are the Master equations (Eq. (1.2.86)) for the two-state system. We have arrived at these equations by introducing a set of assumptions for treating the kinetics of a single particle. The equations are much more general, since they say only that there is a rate of transition between one state of the system and the other. It is the correspondence between the two-state system and the moving particle that we have established in Eqs. (1.4.38) and (1.4.39). This correspondence is approximate. Eq. (1.4.40) does not rely upon the relationship between EB and the rate at which systems move from one well to the other. However, it does rely upon the assumption that we need to know only which well the system is in to specify its rate of transition to the other well. On average this is always true, but it would not be a good description of the system, for example, if energy is conserved and the key question determining the kinetics is whether the particle has more or less energy than the barrier EB.

We can solve the coupled equations in Eq. (1.4.40) directly. Both equations are not necessary, given the normalization constraint Eq. (1.4.34). Substituting P(−1;t) = 1 − P(1;t) we have the equation

$$\dot{P}(1;t) = R(1|-1) - P(1;t)\,\bigl(R(1|-1) + R(-1|1)\bigr) \tag{1.4.41}$$

We can rewrite this in terms of the equilibrium value of the probability. By definition this is the value at which the time derivative vanishes.

$$P(1;\infty) = \frac{R(1|-1)}{R(1|-1) + R(-1|1)} = f(E_1 - E_{-1}) \tag{1.4.42}$$

where the right-hand side follows from Eq. (1.4.38) and Eq. (1.4.39) and is consistent with Eq. (1.4.13), as it must be. Using this expression, Eq. (1.4.41) becomes

$$\dot{P}(1;t) = \bigl(P(1;\infty) - P(1;t)\bigr)/\tau \tag{1.4.43}$$

where we have defined an additional quantity

$$1/\tau = R(1|-1) + R(-1|1) = \nu\bigl(e^{-(E_B - E_1)/kT} + e^{-(E_B - E_{-1})/kT}\bigr) \tag{1.4.44}$$

The solution of Eq. (1.4.43) is

$$P(1;t) = \bigl(P(1;0) - P(1;\infty)\bigr)\, e^{-t/\tau} + P(1;\infty) \tag{1.4.45}$$

This solution describes a decaying exponential that changes the probability from the starting value to the equilibrium value. This explains the definition of τ, called the relaxation time. Since it is inversely related to the sum of the rates of transition between the wells, it is a typical time taken by a system to hop between the wells. The relaxation time does not depend on the starting probability. We note that the solution of Eq. (1.4.41) does not depend on the explicit form of P(1;∞) or τ. The definitions implied by the first equal signs in Eq. (1.4.42) and Eq. (1.4.44) are sufficient. Also, as can be quickly checked, we can replace the index 1 with the index −1 without changing anything else in Eq. (1.4.45). The other equations are valid (by symmetry) after the substitution 1 ↔ −1.
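A minimal numerical check of this solution, assuming arbitrary illustrative rates, integrates Eq. (1.4.41) directly and compares it with Eq. (1.4.45):

```python
import math

R_to_1 = 0.2     # R(1|-1), hypothetical rate into well 1 (1/sec)
R_from_1 = 0.8   # R(-1|1), hypothetical rate out of well 1 (1/sec)

P_inf = R_to_1 / (R_to_1 + R_from_1)   # Eq. (1.4.42)
tau = 1.0 / (R_to_1 + R_from_1)        # Eq. (1.4.44)

P, dt = 1.0, 0.001                     # start the whole ensemble in well 1
for step in range(1, 5001):
    P += dt * (R_to_1 - P * (R_to_1 + R_from_1))   # Euler step of Eq. (1.4.41)
    if step % 1000 == 0:
        t = step * dt
        exact = (1.0 - P_inf) * math.exp(-t / tau) + P_inf   # Eq. (1.4.45)
        print(f"t={t:.1f}  numerical={P:.6f}  analytic={exact:.6f}")
```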



There are several intuitive relationships between the equilibrium probabilities and the transition rates that may be written down. The first is that the ratio of the equilibrium probabilities is the ratio of the transition rates:

$$P_1(\infty)/P_{-1}(\infty) = R(1|-1)/R(-1|1) \tag{1.4.46}$$

The second is that the equilibrium probability divided by the relaxation time is the rate of transition:

$$P_1(\infty)/\tau = R(1|-1) \tag{1.4.47}$$

Question 1.4.3 Eq. (1.4.45) implies that the relaxation time of the system depends largely on the smaller of the two energy barriers EB − E1 and EB − E−1. For Fig. 1.4.1 the smaller barrier is EB − E1. Since the relaxation time is independent of the starting probability, this barrier controls the rate of relaxation whether we start the system from the lower well or the upper well. Why does the barrier EB − E1 control the relaxation rate when we start from the lower well?

Solution 1.4.3 Even though the rate of transition from the lower well to the upper well is controlled by EB − E−1, the fraction of the ensemble that must make the transition in order to reach equilibrium depends on E1. The higher it is, the fewer systems must make the transition from s = −1 to s = 1. Taking this into consideration implies that the time to reach equilibrium depends on EB − E1 rather than EB − E−1. ❚

1.4.3 Glass transition

Glasses are materials that when cooled from the liquid do not undergo a conventional transition to a solid. Instead their viscosity increases, and in the vicinity of a particular temperature it becomes so large that on a reasonable time scale they can be treated as solids. However, on long enough time scales, they flow as liquids. We will model the glass transition using a two-state system by considering what happens as we cool down the two-state system. At high enough temperatures, the system hops back and forth between the two minima with rates given by Eqs. (1.4.38) and (1.4.39). ν is a microscopic quantity; it might be a vibration rate in the material. Even if the barriers are higher than the temperature, EB − E±1 >> kT, the system will still be able to hop back and forth quite rapidly from a macroscopic perspective.

As the system is cooled down, the hopping back and forth slows down. At some point the time between hops will become longer than the time we are observing the system. Systems in the higher well will stay there. Systems in the lower well will stay there. This means that the population in each well becomes fixed. Even when we continue to cool the system down, there will be no change, and the ensemble will no longer be in equilibrium. Within each well the system will continue to have a probability distribution for its energy given by the Boltzmann probability, but the relative



populations of the two wells will no longer be described by the equilibrium Boltzmann probability.

To gain a feeling for the numbers, a typical atomic vibration rate is 10¹²/sec. For a barrier of 1 eV, at twice room temperature, kT ≈ 0.05 eV (600°K), the transition rate would be of order 10³/sec. This is quite slow from a microscopic perspective, but at room temperature it would be only of order 10⁻⁶/sec, roughly one transition every ten days.
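These estimates follow directly from the Arrhenius form of the rates in Eqs. (1.4.38) and (1.4.39); a short computation reproduces them:

```python
import math

nu = 1.0e12        # attempt frequency (1/sec)
barrier = 1.0      # E_B - E (eV)
for kT in (0.05, 0.025):   # twice room temperature and room temperature (eV)
    rate = nu * math.exp(-barrier / kT)
    print(f"kT = {kT} eV: rate ~ {rate:.1e}/sec, mean hop time ~ {1/rate:.1e} sec")
```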

The rate at which we cool the system down plays an essential role. If we cool faster, then the temperature at which transitions stop is higher. If we cool at a slower rate, then the temperature at which the transitions stop is lower. This is found to be the case for glass transitions, where the cooling rate determines the departure point from the equilibrium trajectory of the system, and the eventual properties of the glass are also determined by the cooling rate. Rapid cooling is called quenching. If we raise the temperature and lower it slowly, the procedure is called annealing.

Using the model two-state system we can simulate what would happen if we perform an experiment of cooling a system that becomes a glass. Fig. 1.4.2 shows the probability of being in the upper well as a function of the temperature as the system is cooled down. The curves depart from the equilibrium curve in the vicinity of a transition temperature we might call a freezing transition, because the kinetics become frozen. The glass transition is not a transition like a first- or second-order transition (Section 1.3.4) because it is a transition of the kinetics rather than of the equilibrium structure of the system. Below the freezing transition, the relative probability of the system being in the upper well is given approximately by the equilibrium probability at the transition.

The freezing transition of the relative population of the upper state and the lower state is only a simple model of the glass transition; however, it is also more widely applicable. The freezing does not depend on cooperative effects of many particles. To find examples, a natural place to look is the dynamics of individual atoms in solids. Potential energies with two wells occur for impurities, defects and even bulk atoms in a solid. Impurities may have two different local configurations that differ in energy and are separated by a barrier. This is a direct analog of our model two-state system. When the temperature is lowered, the relative population of the two configurations becomes frozen. If we raise the temperature, the system can equilibrate again.

It is also possible to artificially cause impurity configurations to have unequal energies. One way is to apply uniaxial stress to a crystal—squeezing it along one axis. If an impurity resides in a bond between two bulk atoms, applying stress will raise the energy of impurities in bonds oriented with the stress axis compared to bonds perpendicular to the stress axis. If we start at a relatively high temperature, apply stress and then cool down the material, we can freeze unequal populations of the impurity. If we have a way of measuring relaxation, then by raising the temperature gradually and observing when the defects begin to equilibrate we can discover the barrier to relaxation. This is one of the few methods available to study the kinetics of impurity reorientation in solids.

The two-state system provides us with an example of how a simple system may not be able to equilibrate over experimental time scales. It also shows how an


equilibrium ensemble can be used to treat relative probabilities within a subset of states. Because the motion within a particular well is fast, the relative probabilities of different positions or momenta within a well may be described using the Boltzmann probability. At the same time, the relative probability of finding a system in each of the two wells depends on the initial conditions and the history of the system—what temperature the system experienced and for how long. At sufficiently low temperatures, this relative probability may be treated as fixed. Systems that are in the higher well may be assumed to stay there. At intermediate temperatures, a treatment of the dynamics of the transition between the two wells can (and must) be included. This manifests a violation of the ergodic theorem due to the divergence of the time scale for equilibration between the two wells. Thus we have identified many of the features that are necessary in describing nonequilibrium systems: divergent time scales, violation of the ergodic theorem, frozen and dynamic coordinates. We have illustrated a method for treating systems where there is a separation of long time scales and short time scales.

[Figure 1.4.2 plot: P(1;t) versus T from 100°K to 600°K, with P(1) ranging from 0.00 to 0.12; the equilibrium curve P(1;∞) is dashed, and solid curves are labeled by cooling rates from 200°K/sec down to 0.4°K/sec.]

Figure 1.4.2 Plot of the fraction of the systems in the higher energy well as a function of temperature. The equilibrium value is shown with the dashed line. The solid lines show what happens when the system is cooled from a high temperature at a particular cooling rate. The example given uses E1 − E−1 = 0.1 eV and EB − E−1 = 1.0 eV. Both wells have oscillation frequencies of ν = 10¹²/sec. The fastest cooling rate is 200°K/sec and each successive curve is cooled at a rate that is half as fast, with the slowest rate being 0.4°K/sec. For every cooling rate the system stops making transitions between the wells at a particular temperature that is analogous to a glass transition in this system. Below this temperature the probability becomes essentially fixed. ❚

Question 1.4.4 Write a program that can generate the time dependence of the two-state system for a specified time history. Reproduce Fig. 1.4.2. For an additional “experiment,” try the following quenching and annealing sequence:

a. Starting from a high enough temperature to be in equilibrium, cool the system at a rate of 10°K/sec down to T = 0.

b. Heat the system up to temperature Ta and keep it there for one second.

c. Cool the system back down to T = 0 at a rate of 100°K/sec.

Plot the results as a function of Ta. Describe and explain them in words. ❚
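One possible sketch of such a program is given below; it is not a unique solution. It advances Eq. (1.4.43) using the exact one-step relaxation of Eq. (1.4.45) at the instantaneous temperature (this stays stable when the rates are large), with the parameter values quoted in the caption of Fig. 1.4.2; the time step is an arbitrary choice.

```python
import math

nu = 1.0e12                    # well oscillation frequency (1/sec)
E1, Em1, EB = 0.1, 0.0, 1.0    # energies (eV), as in the caption of Fig. 1.4.2
kB = 8.617e-5                  # Boltzmann constant (eV/K)

def rates(T):
    """Transition rates of Eqs. (1.4.38) and (1.4.39) at temperature T (K)."""
    kT = kB * T
    return nu * math.exp(-(EB - Em1) / kT), nu * math.exp(-(EB - E1) / kT)

def evolve(P, T, duration, ramp=0.0, dt=1e-3):
    """Advance P(1;t) for `duration` seconds while T decreases at `ramp` K/sec,
    applying the exact one-step relaxation of Eq. (1.4.45) at fixed T."""
    for _ in range(int(duration / dt)):
        R_in, R_out = rates(T)          # R(1|-1) and R(-1|1)
        total = R_in + R_out
        if total > 0.0:                 # if both rates underflow, P is frozen
            P_inf = R_in / total        # Eq. (1.4.42)
            P = P_inf + (P - P_inf) * math.exp(-dt * total)
        T = max(T - ramp * dt, 1e-3)    # keep T positive
    return P

T0 = 600.0
R_in, R_out = rates(T0)
P0 = R_in / (R_in + R_out)              # equilibrium starting point at T0
for rate in (200.0, 100.0, 50.0):       # cooling rates (K/sec)
    print(rate, "K/sec -> frozen P(1) =", evolve(P0, T0, T0 / rate, ramp=rate))
```

The quench-and-anneal sequence of parts a–c can be built by chaining evolve calls, using a negative ramp for heating and ramp=0.0 for the one-second hold at Ta.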

1.4.4 Diffusion

In this section we briefly consider a multiwell system. An example is illustrated in Fig. 1.4.3, where the potential well depths and barriers vary from site to site. A simpler case is found in Fig. 1.4.4, where all the well depths and barriers are the same. A concrete example would be an interstitial impurity in an ideal crystal. The impurity lives in a periodic energy that repeats every integral multiple of an elementary length a.

We can apply the same analysis from the previous section to describe what happens to a system that begins from a particular well at x = 0. Over time, the system makes transitions left and right at random, in a manner that is reminiscent of a random walk. We will see in a moment that the connection with the random walk is valid but requires some additional discussion.


[Figure 1.4.3 sketch: V(x) versus x, with wells at positions −1, 0, 1, 2 of depths E−1, E0, E1, E2, separated by barriers EB(0|−1), EB(1|0), EB(2|1), EB(3|2).]

Figure 1.4.3 Illustration of a multiple-well system with barrier heights and well depths that vary from site to site. We focus on the uniform system in Fig. 1.4.4. ❚


The probability of the system being in a particular well is changed by probability currents into the well and out from the well. Systems can move to or from the well immediately to their right and immediately to their left. The Master equation for the ith well in Fig. 1.4.3 is:

$$\dot{P}(i;t) = R(i|i-1)\, P(i-1;t) + R(i|i+1)\, P(i+1;t) - \bigl(R(i+1|i) + R(i-1|i)\bigr)\, P(i;t) \tag{1.4.48}$$

$$R(i+1|i) = \nu_i\, e^{-(E_B(i+1|i) - E_i)/kT}, \qquad R(i-1|i) = \nu_i\, e^{-(E_B(i|i-1) - E_i)/kT} \tag{1.4.49}$$

where Ei is the depth of the ith well and EB(i + 1|i) is the barrier to its right. For the periodic system of Fig. 1.4.4 (νi → ν, EB(i + 1|i) → EB) this simplifies to:

$$\dot{P}(i;t) = R\,\bigl(P(i-1;t) + P(i+1;t) - 2P(i;t)\bigr) \tag{1.4.50}$$

$$R = \nu\, e^{-(E_B - E_0)/kT} \tag{1.4.51}$$

Since we are already describing a continuum differential equation in time, it is convenient to consider long times and write a continuum equation in space as well. Allowing a change in notation we write

$$P(i;t) \to P(x_i;t) \tag{1.4.52}$$

Introducing the elementary distance between wells a we can rewrite Eq. (1.4.50) using:

$$\frac{P(i-1;t) + P(i+1;t) - 2P(i;t)}{a^2} \to \frac{P(x_i - a;t) + P(x_i + a;t) - 2P(x_i;t)}{a^2} \to \frac{\partial^2}{\partial x^2} P(x;t) \tag{1.4.53}$$



[Figure 1.4.4 sketch: a periodic potential V(x) with equal well depths E0 and equal barriers EB at integer positions.]

Figure 1.4.4 When the barrier heights and well depths are the same, as illustrated, the long time behavior of this system is described by the diffusion equation. The evolution of the system is controlled by hopping events from one well to the other. The net effect over long times is the same as for the random walk discussed in Section 1.2. ❚


where the last expression assumes a is small on the scale of interest. Thus the continuum version of Eq. (1.4.50) is the conventional diffusion equation:

$$\dot{P}(x;t) = D\, \frac{\partial^2}{\partial x^2} P(x;t) \tag{1.4.54}$$

The diffusion constant D is given by:

$$D = a^2 R = a^2 \nu\, e^{-(E_B - E_0)/kT} \tag{1.4.55}$$

The solution of the diffusion equation, Eq. (1.4.54), depends on the initial conditions that are chosen. If we consider an ensemble of a system that starts in one well and spreads out over time, the solution can be checked by substitution to be the Gaussian distribution found for the random walk in Section 1.2:

$$P(x,t) = \frac{1}{\sqrt{4\pi Dt}}\, e^{-x^2/4Dt} = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}, \qquad \sigma = \sqrt{2Dt} \tag{1.4.56}$$

We see that motion in a set of uniform wells after a long time reduces to that of a random walk.
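This correspondence can be tested by a direct stochastic simulation of the hopping process. The sketch below assumes exponentially distributed waiting times between hops (the uncorrelated-event picture discussed next) and compares the measured variance of an ensemble of walkers with 2Dt:

```python
import random
import statistics

R = 1.0          # hopping rate to each neighboring well (1/sec), hypothetical
a = 1.0          # well spacing
T_total = 50.0   # observation time (sec)

positions = []
for _ in range(10000):                    # ensemble of independent systems
    t, x = 0.0, 0
    while True:
        t += random.expovariate(2.0 * R)  # waiting time to the next hop (either side)
        if t > T_total:
            break
        x += random.choice((-1, 1))       # unbiased hop
    positions.append(a * x)

D = a * a * R                              # Eq. (1.4.55) with R the one-sided rate
print("measured variance:  ", statistics.pvariance(positions))
print("2Dt of Eq. (1.4.56):", 2.0 * D * T_total)
```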

How does the similarity to the random walk arise? This might appear to be a natural result, since we showed that the Gaussian distribution is quite general using the central limit theorem. The scenario here, however, is quite different. The central limit theorem was proven in Section 1.2.2 for the case of a distribution of probabilities of steps taken at specific time intervals. Here we have a time continuum. Hopping events may happen at any time. Consider the case where we start from a particular well. Our differential equation describes a system that might hop to the next well at any time. A hop is an event, and we might concern ourselves with the distribution of such events in time. We have assumed that these events are uncorrelated. There are unphysical consequences of this assumption. For example, no matter how small an interval of time we choose, the particle has some probability of traveling arbitrarily far away. This is not necessarily a correct microscopic picture, but it is the continuum model we have developed.

There is a procedure to convert the event-controlled hopping motion between wells into a random walk that takes steps with a certain probability at specific time intervals. We must select a time interval. For this time interval, we evaluate the total probability that hops move a system from its original position to all possible positions of the system. This would give us the function f(s) in Eq. (1.2.34). As long as the mean square displacement is finite, the central limit theorem continues to apply to the probability distribution after a long enough time. The generality of the conclusion also implies that the result is more widely applicable than the assumptions indicate. However, there is a counter example in Question 1.4.5.

Question 1.4.5 Discuss the case of a particle that is not in contact with a thermal reservoir moving in the multiple well system (energy is conserved).



Solution 1.4.5 If the energy of the system is lower than EB, the system stays in a single well bouncing back and forth. A model that describes how transitions occur between wells would just say there are none.

For the case where the energy is larger than EB, the system will move with a periodically varying velocity in one direction. There is a problem in selecting an ensemble to describe it. If we choose the ensemble with only one system moving in one direction, then it is described as a deterministic walk. This description is consistent with the motion of the system. However, we might also think to describe the system using an ensemble consisting of particles with the same energy. In this case it would be one particle moving to the right and one moving to the left. Taking an interval of time to be the time needed to move to the next well, we would find a transition probability of 1/2 to move to the right and the same to the left. This would lead to a conventional random walk and will give us an incorrect result for all later times.

This example illustrates the need for an assumption that has not yet been explicitly mentioned. The ensemble must describe systems that can make transitions to each other. Since the energy-conserving systems cannot switch directions, the ensemble cannot include both directions. It is enough, however, for there to be a small nonzero probability for the system to switch directions for the central limit theorem to apply. This means that over long enough times, the distribution will be Gaussian. Over short times, however, the probability distribution from the random walk model and an almost ballistic system would not be very similar. ❚

We can generalize the multiple well picture to describe a biased random walk. The potential we would use is a “washboard potential,” illustrated in Fig. 1.4.5. The Master equation is:

$$\dot{P}(i;t) = R_+\, P(i-1;t) + R_-\, P(i+1;t) - (R_+ + R_-)\, P(i;t) \tag{1.4.57}$$

$$R_+ = \nu\, e^{-\Delta E_+/kT}, \qquad R_- = \nu\, e^{-\Delta E_-/kT} \tag{1.4.58}$$

To obtain the continuum limit, replace i → x: P(i + 1;t) → P(x + a,t), and P(i − 1;t) → P(x − a,t), and expand in a Taylor series to second order in a to obtain:

$$\dot{P}(x;t) = -v\, \frac{\partial}{\partial x} P(x;t) + D\, \frac{\partial^2}{\partial x^2} P(x;t) \tag{1.4.59}$$

$$v = a(R_+ - R_-), \qquad D = a^2 (R_+ + R_-)/2 \tag{1.4.60}$$



The solution is a moving Gaussian:

$$P(x,t) = \frac{1}{\sqrt{4\pi Dt}}\, e^{-(x-vt)^2/4Dt} = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-vt)^2/2\sigma^2}, \qquad \sigma = \sqrt{2Dt} \tag{1.4.61}$$

Since the description of diffusive motion always allows the system to stay where it is, there is a limit to the degree of bias that can occur in the random walk. For this limit set R− = 0. Then D = av/2 and the spreading of the probability is given by σ = √(avt). This shows that unlike the biased random walk in Section 1.2, diffusive motion on a washboard with a given spacing a cannot describe ballistic or deterministic motion in a single direction.
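For a numerical feel for the washboard case, the drift and spread of Eq. (1.4.61) follow directly from the rates (the barrier values below, measured in units of kT, are illustrative):

```python
import math

a, nu = 1.0, 1.0
dE_plus, dE_minus = 0.5, 2.0        # barriers in units of kT (illustrative)
R_plus = nu * math.exp(-dE_plus)    # Eq. (1.4.58) with kT = 1
R_minus = nu * math.exp(-dE_minus)

v = a * (R_plus - R_minus)               # Eq. (1.4.60)
D = a * a * (R_plus + R_minus) / 2.0
t = 100.0
print("drift v*t =", v * t, " spread sigma =", math.sqrt(2.0 * D * t))

# Fully biased limit R- = 0: D = a*v/2, so sigma = sqrt(a*v*t) never vanishes.
v0 = a * R_plus
print("R- = 0 limit: sigma =", math.sqrt(a * v0 * t))
```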

1.5 Cellular Automata

The first four sections of this chapter were dedicated to systems in which the existence of many parameters (degrees of freedom) describing the system is hidden in one way or another. In this section we begin to describe systems where many degrees of freedom are explicitly represented. Cellular automata (CA) form a general class of models of dynamical systems which are appealingly simple and yet capture a rich variety of behavior. This has made them a favorite tool for studying the generic behavior of and modeling complex dynamical systems. Historically CA are also intimately related to the development of concepts of computers and computation. This connection continues to be a theme often found in discussions of CA. Moreover, despite the wide differences between CA and conventional computer architectures, CA are convenient for




[Figure 1.4.5 sketch: a washboard potential V(x) over positions −1, 0, 1, 2, with barrier ΔE+ for hops in one direction and ΔE− for hops in the other.]

Figure 1.4.5 The biased random walk is also found in a multiple-well system when the illustrated washboard potential is used. The velocity of the system is given by the difference in hopping rates to the right and to the left. ❚


computer simulations in general and parallel computer simulations in particular. Thus CA have gained importance with the increasing use of simulations in the development of our understanding of complex systems and their behavior.

1.5.1 Deterministic cellular automata

The concept of cellular automata begins from the concept of space and the locality of influence. We assume that the system we would like to represent is distributed in space, and that nearby regions of space have more to do with each other than regions far apart. The idea that regions nearby have greater influence upon each other is often associated with a limit (such as the speed of light) to how fast information about what is happening in one place can move to another place.*

Once we have a system spread out in space, we mark off the space into cells. We then use a set of variables to describe what is happening at a given instant of time in a particular cell.

$$s(i,j,k;t) = s(x_i, y_j, z_k; t) \tag{1.5.1}$$

where i, j, k are integers (i, j, k ∈ Z), and this notation is for a three-dimensional space (3-d). We can also describe automata in one or two dimensions (1-d or 2-d) or higher than three dimensions. The time dependence of the cell variables is given by an iterative rule:

$$s(i,j,k;t) = R\bigl(\{s(i'-i,\, j'-j,\, k'-k;\, t-1)\}_{i',j',k' \in Z}\bigr) \tag{1.5.2}$$

where the rule R is shown as a function of the values of all the variables at the previous time, at positions relative to that of the cell s(i, j, k; t − 1). The rule is assumed to be the same everywhere in the space—there is no space index on the rule. Differences between what is happening at different locations in the space are due only to the values of the variables, not the update rule. The rule is also homogeneous in time; i.e., the rule is the same at different times.

The locality of the rule shows up in the form of the rule. It is assumed to give the value of a particular cell variable at the next time only in terms of the values of cells in the vicinity of the cell at the previous time. The set of these cells is known as its neighborhood. For example, the rule might depend only on the values of twenty-seven cells in a cube centered on the location of the cell itself. The indices of these cells are obtained by independently incrementing or decrementing once, or leaving the same, each of the indices:

$$s(i,j,k;t) = R\bigl(s(i \pm 1,0,\; j \pm 1,0,\; k \pm 1,0;\; t-1)\bigr) \tag{1.5.3}$$


*These assumptions are both reasonable and valid for many systems. However, there are systems where this is not the most natural set of assumptions. For example, when there are widely divergent speeds of propagation of different quantities (e.g., light and sound) it may be convenient to represent one as instantaneous (light) and the other as propagating (sound). On a fundamental level, Einstein, Podolsky and Rosen carefully formulated the simple assumptions of local influence and found that quantum mechanics violates these simple assumptions. A complete understanding of the nature of their paradox has yet to be reached.


where the informal notation i ± 1,0 is the set {i − 1, i, i + 1}. In this case there are a total of twenty-seven cells upon which the update rule R(s) depends. The neighborhood could be smaller or larger than this example.

CA can be usefully simplified to the point where each cell is a single binary variable. As usual, the binary variable may use the notation {0,1}, {−1,1}, {ON,OFF} or {↑,↓}. The terminology is often suggested by the system to be described. Two 1-d examples are given in Question 1.5.1 and Fig. 1.5.1. For these 1-d cases we can show the time evolution of a CA in a single figure, where the time axis runs vertically down the page and the horizontal axis is the space axis. Each figure is a CA space-time diagram that illustrates a particular history.

In these examples, a finite space is used rather than an infinite space. We can define various boundary conditions at the edges. The most common is to use a periodic boundary condition where the space wraps around to itself. The one-dimensional examples can be described as circles. A two-dimensional example would be a torus and a three-dimensional example would be a generalized torus. Periodic boundary conditions are convenient, because there is no special position in the space. Some care must be taken in considering the boundary conditions even in this case, because there are rules where the behavior depends on the size of the space. Another standard kind of boundary condition arises from setting all of the values of the variables outside the finite space of interest to a particular value such as 0.

Question 1.5.1 Fill in the evolution of the two rules of Fig. 1.5.1. The first CA (Fig. 1.5.1(a)) is the majority rule that sets a cell to the majority of the three cells consisting of itself and its two neighbors in the previous time. This can be written using s(i;t) = ±1 as:

s(i ;t + 1) = sign(s(i − 1;t) + s(i ;t) + s(i + 1;t)) (1.5.4)

In the figure {−1, +1} are represented by {↑, ↓} respectively. The second CA (Fig. 1.5.1(b)), called the mod2 rule, is obtained by setting the ith cell to be OFF if the number of ON squares in the neighborhood is even, and ON if this number is odd. To write this in a simple form use s(i;t) = {0, 1}. Then:

s(i ;t + 1) = mod2 (s(i − 1;t) + s(i ; t) + s(i + 1;t)) (1.5.5)

Solution 1.5.1 Notes:

1. The first rule (a) becomes trivial almost immediately, since it achieves a fixed state after only two updates. Many CA, as well as many physical systems on a macroscopic scale, behave this way.

2. Be careful about the boundary conditions when updating the rules, particularly for rule (b).

3. The second rule (b) goes through a sequence of states very different from each other. Surprisingly, it will recover the initial configuration after eight updates. ❚
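The two rules can also be simulated in a few lines. The sketch below uses the {0,1} encoding for both rules (for the majority rule this is equivalent to Eq. (1.5.4)) and prints a space-time diagram with time running down the page; the 16-cell initial condition is an arbitrary choice:

```python
def evolve(row, rule, steps):
    """Print a space-time diagram (time running down the page) of a 1-d CA
    with a three-cell neighborhood and periodic boundary conditions."""
    n = len(row)
    for _ in range(steps + 1):
        print("".join("#" if c else "." for c in row))
        row = [rule(row[i - 1], row[i], row[(i + 1) % n]) for i in range(n)]

majority = lambda a, b, c: 1 if a + b + c >= 2 else 0   # Eq. (1.5.4) in {0,1} form
mod2 = lambda a, b, c: (a + b + c) % 2                  # Eq. (1.5.5)

row0 = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]  # arbitrary 16-cell start
evolve(row0, majority, 9)
print()
evolve(row0, mod2, 9)   # the initial row reappears after eight updates
```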



[Figure 1.5.1: two 16-cell space-time grids, (a) and (b), each showing the rule icon, the initial row, the first time step, and rows t = 0 through 9 to be filled in.]

Figure 1.5.1 Two examples of one-dimensional (1-d) cellular automata. The top row in each case gives the initial conditions. The value of a cell at a particular time is given by a rule that depends on the values of the cells in its neighborhood at the previous time. For these rules the neighborhood consists of three cells: the cell itself and the two cells on either side. The first time step is shown below the initial conditions for (a) the majority rule, where each cell is equal to the value of the majority of the cells in its neighborhood at the previous time and (b) the mod2 rule which sums the value of the cells in the neighborhood modulo two to obtain the value of the cell in the next time. The rules are written in Question 1.5.1. The rest of the time steps are to be filled in as part of this question. ❚


Question 1.5.2 The evolution of the mod2 rule is periodic in time. After eight updates, the initial state of the system is recovered in Fig. 1.5.1(b). Because the state of the system at a particular time determines uniquely the state at every succeeding time, this is an 8-cycle that will repeat itself. There are sixteen cells in the space shown in Fig. 1.5.1(b). Is the number of cells connected with the length of the cycle? Try a space that has eight cells (Fig. 1.5.2(a)).

Solution 1.5.2 For a space with eight cells, the maximum length of a cycle is four. We could also use an initial condition that has a space periodicity of four in a space with eight cells (Fig. 1.5.2(b)). Then the cycle length would only be two. From these examples we see that the mod2 rule returns to the initial value after a time that depends upon the size of the space. More precisely, it depends on the periodicity of the initial conditions. The time periodicity (cycle length) for these examples is simply related to the space periodicity. ❚

Question 1.5.3 Look at the mod2 rule in a space with six cells (Fig. 1.5.2(c)) and in a space with five cells (Fig. 1.5.2(d)). What can you conclude from these trials?

Solution 1.5.3 The mod2 rule can behave quite differently depending on the periodicity of the space it is in. The examples in Question 1.5.1 and 1.5.2 considered only spaces with a periodicity given by 2^k for some k. The new examples in this question show that the evolution of the rule may lead to a fixed point much like the majority rule. More than one initial condition leads to the same fixed point: both the example shown and the fixed point itself do. Systematic analyses of the cycles and fixed points (cycles of period one) for this and other rules of this type, and various boundary conditions, have been performed. ❚
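Such a systematic analysis can be started by brute force: iterate the rule from a seed and record when a state recurs. A sketch along the lines of Questions 1.5.2 and 1.5.3:

```python
def mod2_step(state):
    n = len(state)
    return tuple((state[i - 1] + state[i] + state[(i + 1) % n]) % 2
                 for i in range(n))

def cycle_length(state):
    """Iterate until some state repeats; return the length of the cycle
    (any transient before the cycle is entered is discarded)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = mod2_step(state)
        t += 1
    return t - seen[state]

for n in (5, 6, 8, 16):
    seed = tuple([1] + [0] * (n - 1))   # a single ON cell
    print(f"{n:2d} cells: cycle length {cycle_length(seed)}")
```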

The choice of initial conditions is an important aspect of the operation of many CA. Computer investigations of CA often begin by assuming a “seed” consisting of a single cell with the value +1 (a single ON cell) and all the rest −1 (OFF). Alternatively, the initial conditions may be chosen to be random: s(i, j, k;0) = ±1 with equal probability. The behavior of the system with a particular initial condition may be assumed to be generic, or some quantity may be averaged over different choices of initial conditions.

Like the iterative maps we considered in Section 1.1, the CA dynamics may be described in terms of cycles and attractors. As long as we consider only binary variables and a finite space, the dynamics must repeat itself after no more than a number of steps equal to the number of possible states of the system. This number grows exponentially with the size of the space. There are 2^N states of the system when there are a total of N cells. For 100 cells the length of the longest possible cycle would be of order 10³⁰. To consider such a long time for a small space may seem an unusual model of space-time. For most analogies of CA with physical systems, this model of space-time is not the most appropriate. We might restrict the notion of cycles to apply only when their length does not grow exponentially with the size of the system.


Rules can be distinguished from each other and classified according to a variety of features they may possess. For example, some rules are reversible and others are not. Any reversible rule takes each state onto a unique successor. Otherwise it would be impossible to construct a single-valued inverse mapping. Even when a rule is reversible, it is not guaranteed that the inverse rule is itself a CA, since it may not depend only on the local values of the variables. An example is given in Question 1.5.5.


[Figure 1.5.2: four space-time grids, (a) through (d), each showing the rule icon, the initial row, the first time step, and rows t = 0 through 9 to be filled in.]

Figure 1.5.2 Four additional examples for the mod2 rule that have different initial conditions with specific periodicity: (a) is periodic in 8 cells, (b) is periodic in 4 cells, though it is shown embedded in a space of periodicity 8, (c) is periodic in 6 cells, (d) is periodic in 5 cells. By filling in the spaces it is possible to learn about the effect of different periodicities on the iterative properties of the mod2 rule. In particular, the length of the repeat time (cycle length) depends on the spatial periodicity. The cycle length may also depend on the specific initial conditions. ❚


Question 1.5.4 Which if any of the two rules in Fig. 1.5.1 is reversible?

Solution 1.5.4 The majority rule is not reversible, because locally we cannot identify in the next time step the difference between sequences that contain (11111) and (11011), since both result in a middle three of (111).

A discussion of the mod2 rule is more involved, since we must take into consideration the size of the space. In the examples of Questions 1.5.1–1.5.3 we see that in the space of six cells the rule is not reversible. In this case several initial conditions lead to the same result. The other examples all appear to be reversible, since each initial condition is part of a cycle that can be run backward to invert the rule. It turns out to be possible to construct explicitly the inverse of the mod2 rule. This is done in Question 1.5.5. ❚

Extra Credit Question 1.5.5 Find the inverse of the mod2 rule, when this is possible. This question involves some careful algebraic manipulation and may be skipped.

Solution 1.5.5 To find the inverse of the mod2 rule, it is useful to recall that equality modulo 2 satisfies simple addition properties including:

s1 = s2 ⇒ s1 + s = s2 + s mod2 (1.5.6)

as well as the special property:

2s = 0 mod2 (1.5.7)

Together these imply that variables may be moved from one side of the equality to the other:

s1 + s = s2 ⇒ s1 = s2 + s mod2 (1.5.8)

Our task is to find the value of all s(i;t) from the values of s(j;t + 1) that are assumed known. Using Eq. (1.5.8), the mod2 update rule (Eq. (1.5.5))

s(i;t + 1) = (s(i − 1;t) + s(i;t) + s(i + 1;t)) mod2 (1.5.9)

can be rewritten to give us the value of a cell in a layer in terms of the next layer and its own neighbors:

s(i − 1;t) = s(i ;t + 1) + s(i;t) + s(i + 1;t ) mod2 (1.5.10)

Substitute the same equation for the second term on the right (using one higher index) to obtain

s(i − 1;t) = s(i;t + 1) + [s(i + 1;t + 1) + s(i + 1;t) + s(i + 2;t)] + s(i + 1;t) mod2 (1.5.11)

The last term cancels against the middle term of the bracket, and we have:

s(i − 1;t) = s(i;t + 1) + s(i + 1;t + 1) + s(i + 2;t) mod2 (1.5.12)

It is convenient to rewrite this with one higher index:

s(i;t) = s(i + 1;t + 1) + s(i + 2;t + 1) + s(i + 3;t) mod2 (1.5.13)


Interestingly, this is actually the solution we have been looking for, though some discussion is necessary to show this. On the right side of the equation appear three cell values. Two of them are from the time t + 1, and one from the time t that we are trying to reconstruct. Since the two cell values from t + 1 are assumed known, we must know only s(i + 3;t) in order to obtain s(i;t). We can iterate this expression and see that instead we need to know s(i + 6;t) as follows:

s(i;t) = s(i + 1;t + 1) + s(i + 2;t + 1) + s(i + 4;t + 1) + s(i + 5;t + 1) + s(i + 6;t) mod2 (1.5.14)

There are two possible cases that we must deal with at this point. The first is that the number of cells is divisible by three, and the second is that it is not. If the number of cells N is divisible by three, then after iterating Eq. (1.5.13) a total of N/3 times we will have an expression that looks like

s(i;t) = s(i + 1;t + 1) + s(i + 2;t + 1)
+ s(i + 4;t + 1) + s(i + 5;t + 1)
+ . . .
+ s(i + N − 2;t + 1) + s(i + N − 1;t + 1) + s(i;t) mod2 (1.5.15)

where we have used the property of the periodic boundary conditions to set s(i + N;t) = s(i;t). We can cancel this value from both sides of the equation. What is left is an equation that states that the sum over particular values of the cell variables at time t + 1 must be zero.

0 = s(i + 1;t + 1) + s(i + 2;t + 1)
+ s(i + 4;t + 1) + s(i + 5;t + 1)
+ . . .
+ s(i + N − 2;t + 1) + s(i + N − 1;t + 1) mod2 (1.5.16)

This means that any set of cell values that is the result of the mod2 rule update must satisfy this condition. Consequently, not all possible sets of cell values can be a result of mod2 updates. Thus the rule is not one-to-one and is not invertible when N is divisible by 3.

When N is not divisible by three, this problem does not arise, because we must go around the cell ring three times before we get back to s(i;t). In this case, the analogous equation to Eq. (1.5.16) would have every cell value appearing exactly twice on the right of the equation. This is because each cell appears in two out of the three travels around the ring. Since the cell values all appear twice, they cancel, and the equation is the tautology 0 = 0. Thus in this case there is no restriction on the result of the mod2 rule.

We almost have a full procedure for reconstructing s(i;t). Choose the value of one particular cell variable, say s(1;t) = 0. From Eq. (1.5.13), obtain in sequence each of the cell variables s(N − 2;t), s(N − 5;t), . . . By going


around the ring three times we can find uniquely all of the values. We now have to decide whether our original choice was correct. This can be done by directly applying the mod2 rule to find the value of, say, s(1;t + 1). If we obtain the right value, then we have the right choice; if the wrong value, then all we have to do is switch all of the cell values to their opposites. How do we know this is correct?

There was only one other possible choice, namely s(1;t) = 1. If we were to choose this case we would find that each cell value was the opposite, or one’s complement, 1 − s(i;t), of the value we found. This can be seen from Eq. (1.5.13). Moreover, the mod2 rule preserves complementation, which means that if we complement all of the values of s(i;t) we will find the complements of the values of s(i;t + 1). The proof is direct:

1 − s(i;t + 1) = 1 − (s(i − 1;t) + s(i;t) + s(i + 1;t))
= (1 − s(i − 1;t)) + (1 − s(i;t)) + (1 − s(i + 1;t)) − 2 mod2 (1.5.17)
= (1 − s(i − 1;t)) + (1 − s(i;t)) + (1 − s(i + 1;t))

Thus we can find the unique predecessor for the cell values s(i;t + 1). With some care it is possible to write down a fully algebraic expression for the value of s(i;t) by implementing this procedure algebraically. The result for N = 3k + 1 is:

$$s(i;t) = s(i;t+1) + \sum_{j=1}^{(N-1)/3} \bigl(s(i+3j-2;\,t+1) + s(i+3j;\,t+1)\bigr) \quad \text{mod 2} \tag{1.5.18}$$

A similar result for N = 3k + 2 can also be found.

Note that the inverse of the mod2 rule is not a CA because it is not a local rule. ❚

One of the interesting ways to classify CA—introduced by Wolfram—separates them into four classes depending on the nature of their limiting behavior. This scheme is particularly interesting for us, since it begins to identify the concept of complex behavior, which we will address more fully in a later chapter. The notion of complex behavior in a spatially distributed system is at least in part distinct from the concept of chaotic behavior that we have discussed previously. Specifically, the classification scheme is:

Class-one CA: evolve to a fixed homogeneous state

Class-two CA: evolve to fixed inhomogeneous states or cycles

Class-three CA: evolve to chaotic or aperiodic behavior

Class-four CA: evolve to complex localized structures

One example of each class is given in Fig. 1.5.3. It is assumed that the length of the cycles in class-two automata does not grow as the size of the space increases. This classification scheme has not yet found a firm foundation in analytical work and is supported largely by observation of simulations of various CA.



Figure 1.5.3 Illustration of four CA update rules with random initial conditions that are in a periodic space with a period of 100 cells. The initial conditions are shown at the top and time proceeds downward. Each is updated for 100 steps. ON cells are indicated as filled squares. OFF cells are not shown. Each of the rules gives the value of a cell in terms of a neighborhood of five cells at the previous time. The neighborhood consists of the cell itself and the two cells to the left and to the right. The rules are known as “totalistic” rules since they depend only on the sum of the variables in the neighborhood. Using the notation si = 0,1, the rules may be represented using σi(t) = si−2(t − 1) + si−1(t − 1) + si(t − 1) + si+1(t − 1) + si+2(t − 1) by specifying the values of σi(t) for which si(t) is ON. These are (a) only σi(t) = 2, (b) only σi(t) = 3, (c) σi(t) = 1 and 2, and (d) σi(t) = 2 and 4. See paper 1.3 in Wolfram’s collection of articles on CA. ❚


It has been suggested that class-four automata have properties that enable them to be used as computers. Or, more precisely, to simulate a computer by setting the initial conditions to a set of data representing both the program and the input to the program. The result of the computation is to be obtained by looking some time later at the state of the system. A criterion that is clearly necessary for an automaton to be able to act as a computer is that the result of the dynamics is sensitive to the initial conditions. We will discuss the topic of computation further in Section 1.8.

The flip side of the use of a CA as a model of computation is to design a computer that will simulate CA with high efficiency. Such machines have been built, and are called cellular automaton machines (CAMs).

1.5.2 2-d cellular automata

Two- and three-dimensional CA provide more opportunities for contact with physical systems. We illustrate by describing an example of a 2-d CA that might serve as a simple model of droplet growth during condensation. The rule, illustrated in part pictorially in Fig. 1.5.4, may be described by saying that a particular cell with four or


Figure 1.5.4 Illustration of a 2-d CA that may be thought of as a simple model of droplet condensation. The rule sets a cell to be ON (condensed) if four or more of its neighbors are condensed in the previous time, and OFF (uncondensed) otherwise. There are a total of 2^9 = 512 possible initial configurations; of these only 10 are shown. The ones on the left have 4 or more cells condensed and the ones on the right have fewer than 4 condensed. This rule is explained further in Fig. 1.5.6 and simulated in Fig. 1.5.5. ❚


more "condensed" neighbors at time t is condensed at time t + 1. Neighbors are counted in the 3 × 3 square region surrounding the cell, including the cell itself.
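A minimal Python sketch of this condensation rule (ours; the grid size, initial density and step count are illustrative choices):

```python
import random

L = 32   # illustrative size of the periodic L x L space

def condense_step(grid):
    # A cell is ON (condensed) at t+1 iff 4 or more of the 9 cells in
    # its 3 x 3 neighborhood (including itself) are ON at t.
    new = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            total = sum(grid[(i + di) % L][(j + dj) % L]
                        for di in (-1, 0, 1) for dj in (-1, 0, 1))
            new[i][j] = 1 if total >= 4 else 0
    return new

# random initial condition: each site ON with probability 1/4
grid = [[1 if random.random() < 0.25 else 0 for _ in range(L)] for _ in range(L)]
for _ in range(60):
    grid = condense_step(grid)
```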

Fig. 1.5.5 shows a simulation of this rule starting from a random initial configuration of approximately 25% condensed (ON) and 75% uncondensed (OFF) cells. Over the first few updates, the random arrangement of dots resolves into droplets: isolated condensed cells disappear and regions of higher density become the droplets. Then over a longer time, the droplets grow and reach a stable configuration.

The characteristics of this rule may be understood by considering the properties of boundaries between condensed and uncondensed regions, as shown in Fig. 1.5.6. Boundaries that are vertical, horizontal or at a 45° diagonal are stable. Other boundaries will move, increasing the size of the condensed region. Moreover, a concave corner of stable edges is not stable; it will grow to increase the condensed region. On the other hand, a convex corner is stable. This means that convex droplets are stable when they are formed of the stable edges.

It can be shown that for this size space, the 25% initial filling is a transition density, where sometimes the result will fill the space and sometimes it will not. For higher densities, the system almost always reaches an end point where the whole space is condensed. For lower densities, the system almost always reaches a stable set of droplets.

This example illustrates an important point about the dynamics of many systems, which is the existence of phase transitions in the kinetics of the system. Such phase transitions are similar in some ways to the thermodynamic phase transitions that describe the equilibrium state of a system changing from, for example, a solid to a liquid. The kinetic phase transitions may arise from the choice of initial conditions, as they did in this example. Alternatively, the phase transition may occur when we consider the behavior of a class of CA as a function of a parameter. The parameter gradually changes the local kinetics of the system; however, measures of its behavior may change abruptly at a particular value. Such transitions are also common in CA when the outcome of a particular update is not deterministic but stochastic, as discussed in Section 1.5.4.

1.5.3 Conway's Game of Life

One of the most popular CA is known as Conway's Game of Life. Conceptually, it is designed to capture in a simple way the reproduction and death of biological organisms. It is based on a model where, locally, if there are too few organisms or too many organisms the organisms will disappear. On the other hand, if the number of organisms is just right, they will multiply. Quite surprisingly, the model takes on a life of its own with a rich dynamical behavior that is best understood by direct observation.

The specific rule is defined in terms of the 3 × 3 neighborhood that was used in the last section. The rule, illustrated in Fig. 1.5.7, specifies that when there are fewer than three or more than four ON (populated) cells in the neighborhood, the central cell will be OFF (unpopulated) at the next time. If there are three ON cells, the central cell will be ON at the next time. If there are four ON cells, then the central cell will keep its previous state—ON if it was ON and OFF if it was OFF.
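In the 3 × 3-sum form just described, the rule can be sketched in Python as follows (a minimal illustration; the helper name and periodic boundaries are our choices):

```python
def life_step(grid, L):
    # Fewer than 3 or more than 4 ON cells in the 3 x 3 neighborhood
    # (including the center) -> OFF; exactly 3 -> ON; exactly 4 -> the
    # central cell keeps its previous value.  Periodic boundaries.
    new = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            total = sum(grid[(i + di) % L][(j + dj) % L]
                        for di in (-1, 0, 1) for dj in (-1, 0, 1))
            if total == 3:
                new[i][j] = 1
            elif total == 4:
                new[i][j] = grid[i][j]
    return new
```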


Figure 1.5.5 Simulation of the condensation CA described in Fig. 1.5.4. The initial conditions are chosen by randomly setting each site ON with a probability of 1 in 4. The initial few steps result in isolated ON sites disappearing and small ragged droplets of ON sites forming in higher-density regions. The droplets grow and smooth their boundaries until at the sixtieth frame a static arrangement of convex droplets is reached. The first few steps are shown on the first page. Every tenth step is shown on the second page up to the sixtieth.


Figure 1.5.5 Continued. The initial occupation probability of 1 in 4 is near a phase transition in the kinetics of this model for a space of this size. For slightly higher densities the final configuration consists of a droplet covering the whole space. For slightly lower densities the final configuration is of isolated droplets. At a probability of 1 in 4 either may occur depending on the specific initial state. ❚


Figure 1.5.6 The droplet condensation model of Fig. 1.5.4 may be understood by noting that certain boundaries between condensed and uncondensed regions are stable. A completely stable shape is illustrated in the upper left. It is composed of boundaries that are horizontal, vertical or diagonal at 45°. A boundary that is at a different angle, such as shown on the upper right, will move, causing the droplet to grow. On a longer length scale a stable shape (droplet) is illustrated in the bottom figure. A simulation of this rule starting from a random initial condition is shown in Fig. 1.5.5. ❚


Fig. 1.5.8 shows a simulation of the rule starting from the same initial conditions used for the condensation rule in the last section. Three sequential frames are shown, then after 100 steps an additional three frames are shown. Frames are also shown after 200 and 300 steps. After this amount of time the rule still has dynamic activity from frame to frame in some regions of the system, while others are apparently static or undergo simple cyclic behavior. An example of cyclic behavior may be seen in several places where there are horizontal bars of three ON cells that switch every time step between horizontal and vertical. There are many more complex local structures that repeat cyclically with much longer repeat cycles. Moreover, there are special structures called gliders that translate in space as they cycle through a set of configurations. The simplest glider is shown in Fig. 1.5.9, along with a structure called a glider gun, which creates them periodically.

We can make a connection between Conway's Game of Life and the quadratic iterative map considered in Section 1.1. The rich behavior of the iterative map was found because, for low values of the variable, the iteration would increase its value, while for


Figure 1.5.7 The CA rule Conway's Game of Life is illustrated for a few cases. When there are fewer than three or more than four neighbors in the 3 × 3 region the central cell is OFF in the next step. When there are three neighbors the central cell is ON in the next step. When there are four neighbors the central cell retains its current value in the next step. This rule was designed to capture some ideas about biological organism reproduction and death, where too few organisms would lead to disappearance because of lack of reproduction, and too many would lead to overpopulation and death due to exhaustion of resources. The rule is simulated in Figs. 1.5.8 and 1.5.9. ❚


[Figure 1.5.8 panels: steps 1, 2, 3 and 101, 102, 103]

Figure 1.5.8 Simulation of Conway's Game of Life starting from the same initial conditions as used in Fig. 1.5.5 for the condensation rule, where 1 in 4 cells are ON. Unlike the condensation rule there remains an active step-by-step evolution of the population of ON cells for many cycles. Illustrated are the three initial steps, and three successive steps each starting at steps 100, 200 and 300.


[Figure 1.5.8 panels: steps 201, 202, 203 and 301, 302, 303]

Figure 1.5.8 Continued. After the initial activity that occurs everywhere, the pattern of activity consists of regions that are active and regions that are static or have short cyclical activity. However, the active regions move over time around the whole space, leading to changes everywhere. Eventually, after a longer time than illustrated here, the whole space becomes either static or has short cyclical activity. The time taken to relax to this state increases with the size of the space. ❚


[Figure 1.5.9 panels: steps 1 through 6]

Figure 1.5.9 Special initial conditions simulated using Conway's Game of Life result in structures of ON cells called gliders that travel in space while progressing cyclically through a set of configurations. Several of the simplest type of gliders are shown moving toward the lower right. The more complex set of ON cells on the left, bounded by a 2 × 2 square of ON cells on top and bottom, is a glider gun. The glider gun cycles through 30 configurations during which a single glider is emitted. The stream of gliders moving to the lower right resulted from the activity of the glider gun. ❚


high values the iteration would decrease its value. Conway's Game of Life and other CA that exhibit interesting behavior also contain similar nonlinear feedback. Moreover, the spatial arrangement and coupling of the cells gives rise to a variety of new behaviors.

1.5.4 Stochastic cellular automata

In addition to the deterministic automaton of Eq. (1.5.3), we can define a stochastic automaton by the probabilities of transition from one state of the system to another:

P({s(i, j, k; t)}|{s(i, j, k; t − 1)}) (1.5.19)

This general stochastic rule for the 2^N states of the system may be simplified. We have assumed for the deterministic rule that the rule for updating one cell may be performed independently of others. The analog for the stochastic rule is that the update probabilities for each of the cells are independent. If this is the case, then the total probability may be written as the product of probabilities of each cell value. Moreover, if the rule is local, the probability for the update of a particular cell will depend only on the values of the cell variables in the neighborhood of the cell we are considering.

P({s(i, j, k; t)} | {s(i, j, k; t − 1)}) = ∏_{i,j,k} P0(s(i, j, k; t) | N(i, j, k; t − 1)) (1.5.20)

where we have used the notation N(i, j, k; t) to indicate the values of the cell variables in the neighborhood of (i, j, k). For example, we might consider modifying the droplet condensation model so that a cell value is set to be ON with a certain probability (depending on the number of ON neighbors) and OFF otherwise.
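As a hedged sketch, such a probabilistic variant of the condensation rule might look as follows in Python, where p_on is a hypothetical table of ten probabilities, one for each possible neighborhood sum:

```python
import random

def stochastic_step(grid, L, p_on):
    # p_on[n] is the probability that a cell becomes ON when n of the
    # 9 cells in its 3 x 3 neighborhood were ON at the previous step.
    new = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            n = sum(grid[(i + di) % L][(j + dj) % L]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1))
            new[i][j] = 1 if random.random() < p_on[n] else 0
    return new
```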

Stochastic automata can be thought of as modeling the effects of noise and more specifically the ensemble of a dynamic system that is subject to thermal noise. There is another way to make the analogy between the dynamics of a CA and a thermodynamic system that is exact—if we consider not the space of the automaton but the d + 1 dimensional space-time. Consider the ensemble of all possible histories of the CA. If we have a three-dimensional space, then the histories are a set of variables with four indices {s(i, j, k, t)}. The probability of a particular set of these variables occurring (the probability of this history) is given by

P({s(i, j, k, t)}) = ∏_t ∏_{i,j,k} P0(s(i, j, k; t) | N(i, j, k; t − 1)) P({s(i, j, k; 0)}) (1.5.21)

This expression is the product of the probabilities of each update occurring in the history. The factor P({s(i, j, k; 0)}) is the probability of a particular initial state in the ensemble we are considering. If we consider only one starting configuration, its probability would be one and the others zero.

We can relate the probability in Eq. (1.5.21) to thermodynamics using the Boltzmann probability. We simply set it to the expression for the Boltzmann probability at a particular temperature T.

P({s(i, j, k, t)}) = e^(−E({s(i, j, k, t)})/kT) (1.5.22)

There is no need to include the normalization constant Z because the probabilities are automatically normalized. What we have done is to define the energy of the particular state as:

E({s(i, j, k, t)}) = −kT ln(P({s(i, j, k, t)})) (1.5.23)



This expression shows that any d dimensional automaton can be related to a d + 1 dimensional system described by equilibrium Boltzmann probabilities. The ensemble of the d + 1 dimensional system is the set of time histories of the automaton.

There is an important cautionary note about the conclusion reached in the last paragraph. While it is true that time histories are directly related to the ensemble of a thermodynamic system, there is a hidden danger in this analogy. These are not typical thermodynamic systems, and therefore our intuition about how they should behave is not trustworthy. For example, the time direction may be very different from any of the space directions. For the d + 1 dimensional thermodynamic system, this means that one of the directions must be singled out. This kind of asymmetry does occur in thermodynamic systems, but it is not standard. Another example of the difference between thermodynamic systems and CA is in their sensitivity to boundary conditions. We have seen that many CA are quite sensitive to their initial conditions. While we have shown this for deterministic automata, it continues to be true for many stochastic automata as well. The analog of the initial conditions in a d + 1 dimensional thermodynamic system is the surface or boundary conditions. Thermodynamic systems are typically insensitive to their boundary conditions. However, the relationship in Eq. (1.5.23) suggests that at least some thermodynamic systems are quite sensitive to their boundary conditions. An interesting use of this analogy is to attempt to discover special thermodynamic systems whose behavior mimics the interesting behavior of CA.

1.5.5 CA generalizations

There are a variety of generalizations of the simplest version of CA which are useful in developing models of particular systems. In this section we briefly describe a few of them as illustrated in Fig. 1.5.10.

It is often convenient to consider more than one variable at a particular site. One way to think about this is as multiple spaces (planes in 2-d, lines in 1-d) that are coupled to each other. We could think about each space as a different physical quantity. For example, one might represent a magnetic field and the other an electric field. Another possibility is that we might use one space as a thermal reservoir. The system we are actually interested in might be simulated in one space and the thermal reservoir in another. By considering various combinations of multiple spaces representing a physical system, the nature of the physical system can become quite rich in its structure.

We can also consider the update rule to be a compound rule formed of a sequence of steps. Each of the steps updates the cells. The whole rule consists of cycling through the set of individual step rules. For example, our update rule might consist of two different steps. The first one is performed on every odd step and the second is performed on every even step. We could reduce this to the previous single update step case by looking at the composite of the first and second steps. This is the same as looking at only every even state of the system. We could also reduce this to a multiple space rule, where both the odd and even states are combined together to be a single step.


However, it may be more convenient at times to think about the system as performing a cycle of update steps.

Finally, we can allow the state of the system at a particular time to depend on the state of the system at several previous times, not just on the state of the system at the previous time. A rule might depend on the most recent state of the system and the previous one as well. Such a rule is also equivalent to a rule with multiple spaces, by considering both the present state of the system and its predecessor as two spaces. One use of considering rules that depend on more than one time is to enable systematic construction of reversible deterministic rules from nonreversible rules. Let the original (not necessarily invertible) rule be R(N(i, j, k; t)). A new invertible rule can be written using the form

s(i, j, k; t) = mod2(R(N(i, j, k; t − 1)) + s(i, j, k; t − 2)) (1.5.24)

The inverse of the update rule is immediately constructed using the properties of addition modulo 2 (Eq. (1.5.8)) as:

s(i, j, k; t − 2) = mod2(R(N(i, j, k; t − 1)) + s(i, j, k; t)) (1.5.25)
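A minimal Python sketch of this construction for a 1-d rule with 0/1 cell values (the example rule R is an arbitrary choice of ours; only the mod-2 structure matters):

```python
def R(left, center, right):
    # an arbitrary example rule on a three-cell neighborhood
    return (left + center + right) % 2

def reversible_step(prev, prev2):
    # Eq. (1.5.24): s(i; t) = mod2( R(N(i; t-1)) + s(i; t-2) ).
    # Applying the same function to the two most recent states run
    # backward recovers s(i; t-2), which is Eq. (1.5.25).
    N = len(prev)
    return [(R(prev[(i - 1) % N], prev[i], prev[(i + 1) % N]) + prev2[i]) % 2
            for i in range(N)]
```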

1.5.6 Conserved quantities and Margolus dynamics

Standard CA are not well suited to the description of systems with constraints or conservation laws. For example, if we want to conserve the number of ON cells we must establish a rule where turning OFF one cell (switching it from ON to OFF) is tied to turning ON another cell. The standard rule considers each cell separately when an update is performed. This makes it difficult to guarantee that when this particular cell is turned OFF then another one will be turned ON. There are many examples of physical systems where the conservation of quantities such as number of particles, energy and momentum are central to their behavior.

A systematic way to construct CA that describe systems with conserved quantities has been developed. Rules of this kind are known as partitioned CA or Margolus rules (Fig. 1.5.11). These rules separate the space into nonoverlapping partitions (also known as neighborhoods). The new value of each cell in a partition is given in terms of the previous values of the cells in the same partition. This is different from the conventional automaton, since the local rule has more than one output as well as more than one input. Such a rule is not sufficient in itself to describe the system update, since there is no communication in a single update between different partitions. The complete rule must specify how the partitions are shifted after each update with respect to the underlying space. This shifting is an essential part of the dynamical rule that restores the cellular symmetry of the space.

The convenience of this kind of CA is that specification of the rule gives us direct control of the dynamics within each partition, and therefore we can impose conservation rules within the partition. Once the conservation rule is imposed inside the partition, it will be maintained globally—throughout the whole space and through every time step. Fig. 1.5.12 illustrates a rule that conserves the number of ON cells inside a 2 × 2 neighborhood. The ON cells may be thought of as particles whose


[Figure 1.5.10 panels (a) and (b)]

Figure 1.5.10 Schematic illustrations of several modifications of the simplest CA rule. The basic CA rule updates a set of spatially arrayed cell variables shown in (a). The first modification uses more than one variable in each cell. Conceptually this may be thought of as describing a set of coupled spaces, where the case of two spaces is shown in (b). The second modification makes use of a compound rule that combines several different rules, where the


[Figure 1.5.10 panels (c) and (d)]

case of two rules is shown in (c). The third modification shown in (d) makes use of a rule that depends on not just the most recent value of the cell variables but also the previous one. Both (c) and (d) may be described as special cases of (b) where two successive values of the cell variables are considered instead as occurring at the same time in different spaces. ❚


[Figure 1.5.11 panels: conventional CA rule; partitioned (Margolus) CA rule; partition alternation]

Figure 1.5.11 Partitioned CA (Margolus rules) enable the imposition of conservation laws in a direct way. A conventional CA gives the value of an individual cell in terms of the previous values of cells in its neighborhood (top). A partitioned CA gives the value of several cells in a particular partition in terms of the previous values of the same cells (center). This enables conservation rules to be imposed directly within a particular partition. An example is given in Fig. 1.5.12. In addition to the rule for updating the partition, the dynamics must specify how the partitions are to be shifted from step to step. For example (bottom), the use of a 2 × 2 partition may be implemented by alternating the partitions from the solid lines to the dashed lines. Every even update the dashed lines are used and every odd update the solid lines are used to partition the space. This restores the cellular periodicity of the space and enables the cells to communicate with each other, which is not possible without the shifting of partitions. ❚


number is conserved. The only requirement is that each of the possible arrangements of particles on the left results in an arrangement on the right with the same number of particles. This rule is augmented by specifying that the 2 × 2 partitions are shifted by a single cell to the right and down after every update. The motion of these particles is that of an unusual gas of particles.

The rule shown is only one of many possible that use this 2 × 2 neighborhood and conserve the number of particles. Some of these rules have additional properties or symmetries. A rule that is constructed to conserve particles may or may not be reversible. The one illustrated in Fig. 1.5.12 is not reversible: there is more than one predecessor for particular values of the cell variables. This can be seen from the two mappings on the lower left that have the same output but different input. A rule that conserves particles also may or may not have a particular symmetry, such as a symmetry of reflection. A symmetry of reflection means that reflection of a configuration across a particular axis before application of the rule results in the same effect as reflection after application of the rule.

The existence of a well-defined set of rules that conserves the number of particles enables us to choose to study one of them for a specific reason. Alternatively, by randomly constructing a rule which conserves the number of particles, we can learn what particle conservation does in a dynamical system independent of other regularities of the system such as reversibility and reflection or rotation symmetries. More systematically, it is possible to consider the class of automata that conserve particle number and investigate their properties.
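A minimal Python sketch of a 2 × 2 Margolus update, assuming a lookup table that maps each four-cell block to a block with the same number of ON cells (the table itself stands in for the specific rule of Fig. 1.5.12):

```python
def margolus_step(grid, L, table, offset):
    # One update of a 2 x 2 Margolus rule on a periodic L x L grid.
    # `table` maps each 4-cell block (a tuple read row by row) to a new
    # block with the same number of ON cells, so particle number is
    # conserved by construction.  `offset` (0 or 1) alternates the
    # partitioning between updates, as in Fig. 1.5.11.
    for i in range(offset, L + offset, 2):
        for j in range(offset, L + offset, 2):
            cells = [((i + a) % L, (j + b) % L) for a in (0, 1) for b in (0, 1)]
            block = tuple(grid[x][y] for x, y in cells)
            for (x, y), v in zip(cells, table[block]):
                grid[x][y] = v
```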

Question 1.5.6 Design a 2-d Margolus CA that represents a particle or chemical reaction: A + B ↔ C. Discuss some of the parameters that must be set and how you could use symmetries and conservation laws to set them.

Solution 1.5.6 We could use a 2 × 2 partition just like that in Fig. 1.5.12. On each of the four squares there can appear any one of the four possibilities (0, A, B, C). There are 4^4 = 256 different initial conditions of the partition. Each of these must be paired with one final condition, if the rule is deterministic. If the rule is probabilistic, then probabilities must be assigned for each possible transition.

To represent a chemical reaction, we choose cases where A and B are adjacent (horizontally or vertically) and replace them with a C and a 0. If we prefer to be consistent, we can always place the C where A was before. To go the other direction, we take cases where C is next to a 0 and replace them with an A and a B. One question we might ask is: Do we want to have a reaction whenever it is possible, or do we want to assign some probability for the reaction? The latter case is more interesting and we would have to use a probabilistic CA to represent it. In addition to the reaction, the rule would include particle motion similar to that in Fig. 1.5.12.

To apply symmetries, we could assume that reflection along horizontal or vertical axes, or rotations of the partition by 90°, before the update will have the same effect as a reflection or rotation of the partition after the


update. We could also assume that A, B and C move in the same way when they are by themselves. Moreover, we might assume that the rule is symmetric under the transformation A ↔ B.

There is a simpler approach that requires enumerating many fewer states. We choose a 2 × 1 rectangular partition that has only two cells, and 4^2 = 16 possible states. Of these, four do not change: [A,A], [B,B], [C,C] and [0,0].


Figure 1.5.12 Illustration of a particular 2-d Margolus rule that preserves the number of ON cells, which may be thought of as particles in a gas. The requirement for conservation of number of particles is that every initial configuration is matched with a final configuration having the same number of ON cells. This particular rule does not observe conventional symmetries such as reflection or rotation symmetries that might be expected in a typical gas. Many rules that conserve particles may be constructed in this framework by changing around the final states while preserving the number of particles in each case. ❚


Eight others are paired because the cell values can be switched to achieve particle motion (with a certain probability): [A,0] ↔ [0,A], [B,0] ↔ [0,B], [C,A] ↔ [A,C], and [C,B] ↔ [B,C]. Finally, the last four, [C,0], [0,C], [A,B] and [B,A], can participate in reactions. If the rule is deterministic, they must be paired in a unique way for possible transitions. Otherwise, each possibility can be assigned a probability: [C,0] ↔ [A,B], [0,C] ↔ [B,A], [C,0] ↔ [B,A] and [0,C] ↔ [A,B]. The switching of the particles without undergoing reaction for these states may also be allowed with a certain probability. Thus, each of the four states can have a nonzero transition probability to each of the others. These probabilities may be related by the symmetries mentioned before. Once we have determined the update rule for the 2 × 1 partition, we can choose several ways to map the partitions onto the plane. The simplest are obtained by dividing each of the 2 × 2 partitions in Fig. 1.5.11 horizontally or vertically. This gives a total of four ways to partition the plane. These four can alternate when we simulate this CA. ❚
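A hedged Python sketch of the 2 × 1 version of this rule; the probabilities p_move and p_react are hypothetical parameters of the kind the solution says must be set:

```python
import random

p_move, p_react = 0.5, 0.1   # hypothetical probabilities to be set

def update_pair(a, b):
    # Update one 2 x 1 partition [a, b]; states are '0', 'A', 'B', 'C'.
    r = random.random()
    if {a, b} in ({'A', '0'}, {'B', '0'}, {'C', 'A'}, {'C', 'B'}):
        return (b, a) if r < p_move else (a, b)        # particle motion
    if (a, b) in (('A', 'B'), ('B', 'A')):
        return ('C', '0') if r < p_react else (a, b)   # A + B -> C + 0
    if (a, b) in (('C', '0'), ('0', 'C')):
        return ('A', 'B') if r < p_react else (a, b)   # C + 0 -> A + B
    return (a, b)            # [A,A], [B,B], [C,C] and [0,0] do not change
```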

1.5.7 Differential equations and CA

Cellular automata are an alternative to differential equations for the modeling of physical systems. Differential equations when modeled numerically on a computer are often discretized in order to perform integrals. This discretization is an approximation that might be considered essentially equivalent to setting up a locally discrete dynamical system that in the macroscopic limit reduces to the differential equation. Why not then start from a discrete system and prove its relevance to the problem of interest? This a priori approach can provide distinct computational advantages. This argument might lead us to consider CA as an approximation to differential equations. However, it is possible to adopt an even more direct approach and say that differential equations are themselves an approximation to aspects of physical reality. CA are a different but equally valid approach to approximating this reality. In general, differential equations are more convenient for analytic solution while CA are more convenient for simulations. Since complex systems of differential equations are often solved numerically anyway, the alternative use of CA appears to be worth systematic consideration.

While both cellular automata and differential equations can be used to model macroscopic systems, this should not be taken to mean that the relationship between differential equations and CA is simple. Recognizing a CA analog to a standard differential equation may be a difficult problem. One of the most extensive efforts to use CA for simulation of a system more commonly known by its differential equation is the problem of hydrodynamics. Hydrodynamics is typically modeled by the Navier-Stokes equation. A type of CA called a lattice gas (Section 1.5.8) has been designed that on a length scale that is large compared to the cellular scale reproduces the behavior of the Navier-Stokes equation. The difficulties of solving the differential equation for specific boundary conditions make this CA a powerful tool for studying hydrodynamic flow.


A frequently occurring differential equation is the wave equation. The wave equation describes an elastic medium that is approximated as a continuum. The wave equation emerges as the continuum limit of a large variety of systems. It is to be expected that many CA will also display wavelike properties. Here we use a simple example to illustrate one way that wavelike properties may arise. We also show how the analogy may be quite different than intuition might suggest. The wave equation written in 1-d as

∂²f/∂t² = c² ∂²f/∂x² (1.5.26)

has two types of solutions that are waves traveling to the right and to the left with wave vectors k and frequencies of oscillation ωk = ck:

f = ∑_k ( Ak e^(i(kx − ωk t)) + Bk e^(i(kx + ωk t)) ) (1.5.27)

A particular solution is obtained by choosing the coefficients Ak and Bk. These solutions may also be written in real space in the form:

f = Ã(x − ct) + B̃(x + ct) (1.5.28)

where

Ã(x) = ∑_k Ak e^(ikx),  B̃(x) = ∑_k Bk e^(ikx) (1.5.29)

are two arbitrary functions that specify the initial conditions of the wave in an infinite space.

We can construct a CA analog of the wave equation as illustrated in Fig. 1.5.13. It should be understood that the wave equation will arise only as a continuum or long wave limit of the CA dynamics. However, we are not restricted to considering a model that mimics a vibrating elastic medium. The rule we construct consists of a 1-d partitioned space dynamics. Each update, adjacent cells are paired into partitions of two cells each. The pairing switches from update to update, analogous to the 2-d example in Fig. 1.5.11. The dynamics consists solely of switching the contents of the two adjacent cells in a single partition. Starting from a particular initial configuration, it can be seen that the contents of the odd cells moves systematically in one direction (right in the figure), while the contents of the even cells moves in the opposite direction (left in the figure). The movement proceeds at a constant velocity of c = 1 cell/update. Thus we identify the contents of the odd cells as the rightward traveling wave, and the even cells as the leftward traveling wave.

The dynamics of this CA is the same as the dynamics of the wave equation of Eq. (1.5.28) in an infinite space. The only requirement is to encode appropriately the initial conditions Ã(x), B̃(x) in the cells. If we use variables with values in the conven-



tional real continuum si ∈ ℜ, then the (discretized) waves may be encoded directly. If a binary representation si = ±1 is used, the local average over odd cells represents the right traveling wave Ã(x − ct), and the local average over even cells represents the left traveling wave B̃(x + ct).
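A minimal Python sketch of this switching rule (ours; the alternation of partitions is handled by the parity of the time step):

```python
def wave_step(s, t):
    # Switch the contents of the two cells in each partition; the
    # pairing alternates with the parity of t (an even number of
    # cells and periodic boundaries are assumed).
    N = len(s)
    for i in range(t % 2, N, 2):
        j = (i + 1) % N
        s[i], s[j] = s[j], s[i]
    return s
```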

1.5.8 Lattice gases

A lattice gas is a type of CA designed to model gases or liquids of colliding particles. Lattice gases are formulated in a way that enables the collisions to conserve momentum as well as number of particles. Momentum is represented by setting the velocity of each particle to a discrete set of possibilities. A simple example, the HPP gas, is illustrated in Fig. 1.5.14. Each cell contains four binary variables that represent the presence (or absence) of particles with unit velocity in the four compass directions NESW. In the figure, the presence of a particle in a cell is indicated by an arrow. There can be up to four particles at each site. Each particle present in a single cell must have a distinct velocity.


[Figure 1.5.13: cell values si = ±1 at time steps t = 0 through 9, shown for the two alternating partitionings]

Figure 1.5.13 A simple 1-d CA using a Margolus rule, which switches the values of the two adjacent cells in the partition, can be used to model the wave equation. The partitions alternate between the two possible ways of partitioning the cells every time step. It can be seen that the initial state is propagated in time so that the odd (even) cells move at a fixed rate of one cell per update to the right (left). The solutions of the wave equation likewise consist of a right and left traveling wave. The initial conditions of the wave equation solution are the analog of the initial condition of the cells in the CA. ❚


The dynamics of the HPP gas is performed in two steps that alternate: propagation and collision. In the propagation step, particles move from the cell they are in to the neighboring cell in the direction of their motion. In the collision step, each cell acts independently, changing the particles from incoming to outgoing according to prespecified collision rules. The rule for the HPP gas is illustrated in Fig. 1.5.15. Because of momentum conservation in this rule, there are only two possibilities for changes in the particle velocity as a result of a collision. A similar lattice gas, the FHP gas, which is implemented on a hexagonal lattice of cells rather than a square lattice, has been proven to give rise to the Navier-Stokes hydrodynamic equations on a macroscopic scale. Due to properties of the square lattice in two dimensions, this behavior does not occur for the HPP gas. One way to understand the limitation of the square lattice is to realize that for the HPP gas (Fig. 1.5.14), momentum is conserved in any individual horizontal or vertical stripe of cells. This type of conservation law is not satisfied by hydrodynamics.
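A hedged Python sketch of the two HPP steps on a periodic grid (the direction encoding, grid size and names are our choices):

```python
L = 16                                                  # illustrative grid size
DIRS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}   # N, E, S, W

def propagate(cells):
    # Move each particle one cell in the direction of its velocity.
    new = [[[0, 0, 0, 0] for _ in range(L)] for _ in range(L)]
    for i in range(L):
        for j in range(L):
            for d, (di, dj) in DIRS.items():
                if cells[i][j][d]:
                    new[(i + di) % L][(j + dj) % L][d] = 1
    return new

def collide(cells):
    # HPP collision: two head-on particles (N+S or E+W alone in a
    # cell) leave along the perpendicular axis; nothing else changes.
    for i in range(L):
        for j in range(L):
            if cells[i][j] == [1, 0, 1, 0]:
                cells[i][j] = [0, 1, 0, 1]
            elif cells[i][j] == [0, 1, 0, 1]:
                cells[i][j] = [1, 0, 1, 0]
    return cells
```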

1.5.9 Material growth

One of the natural physical systems to model using CA is the problem of layer-by-layer material growth such as is achieved in molecular beam epitaxy. There are many areas of study of the growth of materials. For example, in cases where the material is formed of only a single type of atom, it is the surface structure during growth that is of interest. Here, we focus on an example of an alloy formed of several different atoms, where the growth of the atoms is precisely layer by layer. In this case the surface structure is simple, but the relative abundance and location of different atoms in the material is of interest. The simplest case is when the atoms are found on a lattice that is prespecified; it is only the type of atom that may vary.

The analogy with a CA is established by considering each layer of atoms, when it is deposited, as represented by a 2-d CA at a particular time. As shown in Fig. 1.5.16 the cell values of the automaton represent the type of atom at a particular site. The values of the cells at a particular time are preserved as the atoms of the layer deposited at that time. It is the time history of the CA that is to be interpreted as representing the structure of the alloy. This picture assumes that once an atom is incorporated in a complete layer it does not move.

In order to construct the CA, we assume that the probability of a particular atom being deposited at a particular location depends on the atoms residing in the layer immediately preceding it. The stochastic CA rule in the form of Eq. (1.5.20) specifies the probability of attaching each kind of atom to every possible atomic environment in the previous layer.

We can illustrate how this might work by describing a specific example. There exist alloys formed out of a mixture of gallium, arsenic and silicon. A material formed of equal proportions of gallium and arsenic forms a GaAs crystal, which is exactly like a silicon crystal, except the Ga and As atoms alternate in positions. When we put silicon together with GaAs then the silicon can substitute for either the Ga or the As atoms. If there is more Si than GaAs, then the crystal is essentially a Si crystal with small regions of GaAs, and isolated Ga and As. If there is more GaAs than Si, then the


[Figure 1.5.14 panels: propagation step; collision step]

Figure 1.5.14 Illustration of the update of the HPP lattice gas. In a lattice gas, binary variables in each cell indicate the presence of particles with a particular velocity. Here there are four possible particles in each cell with unit velocities in the four compass directions, NESW. Pictorially the presence of a particle is indicated by an arrow in the direction of its velocity. Updating the lattice gas consists of two steps: propagating the particles according to their velocities, and allowing the particles to collide according to a collision rule. The propagation step consists of moving particles from each cell into the neighboring cells in the direction of their motion. The collision step consists of each cell independently changing the velocities of its particles. The HPP collision rule is shown in Fig. 1.5.15, and implemented here from the middle to the bottom panel. For convenience in viewing the different steps the arrows in this figure alternate between incoming and outgoing. Particles before propagation (top) are shown as outward arrows from the center of the cell. After the propagation step (middle) they are shown as incoming arrows. After collision (bottom) they are again shown as outgoing arrows. ❚



Figure 1.5.16 Illustration of the time history of a CA and its use to model the structure of a material (alloy) formed by layer-by-layer growth. Each horizontal dashed line represents a layer of the material. The alloy has three types of atoms. The configuration of atoms in each layer depends only on the atoms in the layer preceding it. The types of atom, indicated in the figure by filled, empty and shaded dots, are determined by the values of the cell variables of the CA at a particular time, si(t) = ±1, 0. The time history of the CA is the structure of the material. ❚

Figure 1.5.15 The collision rule for the HPP lattice gas. With the exception of the case of two particles coming in from N and S and leaving from E and W, or vice versa (dashed box), there are no changes in the particle velocities as a result of collisions in this rule. Momentum conservation does not allow any other changes. ❚


crystal will be essentially a GaAs crystal with isolated Si atoms. We can model the growth of the alloys formed by different relative proportions of GaAs and Si of the form (GaAs)1−xSix using a CA. Each cell of the CA has a variable with three possible values si = ±1, 0 that would represent the occupation of a crystal site by Ga, As and Si respectively. The CA rule (Eq. (1.5.20)) would then be constructed by assuming different probabilities for adding a Si, Ga and As atom at the surface. For example, the likelihood of finding a Ga next to a Ga atom or an As next to an As is small, so the probability of adding a Ga on top of a Ga can be set to be much smaller than other probabilities. The probability of an Si atom si = 0 could be varied to reflect different concentrations of Si in the growth. Then we would be able to observe how the structure of the material changes as the Si concentration changes.
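A hedged Python sketch of such a growth rule; for brevity the deposition probability here depends only on the atom directly below (the rule described in the text uses the full neighborhood), and the weights are illustrative only:

```python
import random

def attach_weight(below, x_si):
    # Relative weights for depositing Ga (+1), As (-1) or Si (0) above
    # atom `below`; the numbers are illustrative, and random.choices
    # normalizes the weights.
    w = {0: x_si, +1: 1 - x_si, -1: 1 - x_si}
    if below in (+1, -1):
        w[below] *= 0.1      # Ga on Ga and As on As strongly suppressed
    return w

def grow_layer(prev, x_si):
    # Deposit one layer of a 1-d cross section; each site here depends
    # only on the atom directly below it.
    layer = []
    for below in prev:
        w = attach_weight(below, x_si)
        layer.append(random.choices(list(w), weights=list(w.values()))[0])
    return layer
```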

This is one of many examples of physical, chemical and biological systems that have been modeled using CA to capture some of their dynamical properties. We will encounter others in later chapters.

1.6 Statistical Fields

In real systems as well as in kinetic models such as cellular automata (CA) discussed in the previous section, we are often interested in finding the state of a system—the time averaged (equilibrium) ensemble when cycles or randomness are present—that arises after the fast initial kinetic processes have occurred. Our objective in this section is to treat systems with many degrees of freedom using the tools of equilibrium statistical mechanics (Section 1.3). These tools describe the equilibrium ensemble directly rather than the time evolution. The simplest example is a collection of interacting binary variables, which is in many ways analogous to the simplest of the CA models. This model is known as the Ising model, and was introduced originally to describe the properties of magnets. Each of the individual variables corresponds to a microscopic magnetic region that arises due to the orbital motion of an electron or the internal degree of freedom known as the spin of the electron.

The Ising model is the simplest model of interacting degrees of freedom. Each of the variables is binary and the interactions between them are only specified by one parameter—the strength of the interaction. Remarkably, many complex systems we will be considering can be modeled by the Ising model as a first approximation. We will use several versions of the Ising model to discuss neural networks in Chapter 2 and proteins in Chapter 4. The reason for the usefulness of this model is the very existence of interactions between the elements. This interaction is not present in simpler models and results in various behaviors that can be used to understand some of the key aspects of complex systems. The concepts and tools that are used to study the Ising model also may be transferred to more complicated models. It should be understood, however, that the Ising model is a simplistic model of magnets as well as of other systems.

In Section 1.3 we considered the ideal gas with collisions. The collisions were aform of interaction. However, these interactions were incidental to the model becausethey were assumed to be so short that they were not present during observation. Thisis no longer true in the Ising model.


1.6.1 The Ising model without interactions

The Ising model describes the energy of a collection of elements (spins) represented by binary variables. It is so simple that there is no kinetics, only an energy E[{si}]. Later we will discuss how to reintroduce a dynamics for this model. The absence of a dynamics is not a problem for the study of the equilibrium properties of the system, since the Boltzmann probability (Eq. (1.3.29)) depends only upon the energy. The energy is specified as a function of the values of the binary variables {si = ±1}. Unless necessary, we will use one index for all of the spin variables regardless of dimensionality. The use of the term "spin" originates from the magnetic analogy. There is no other specific term, so we adopt this terminology. The term "spin" emphasizes that the binary variable represents the state of a physical entity such that the collection of spins is the system we are interested in. A spin can be illustrated as an arrow of fixed length (see Fig. 1.6.1). The value of the binary variable describes its orientation, where +1 indicates a spin oriented in the positive z direction (UP), and −1 indicates a spin oriented in the negative z direction (DOWN).

Before we consider the effects of interactions between the spins, we start by considering a system where there are no interactions. We can write the energy of such a system as:

E[{si}] = ∑_i ei(si) (1.6.1)

where ei(si) is the energy of the ith spin, which does not depend on the values of any of the other spins. Since the si are binary we can write this as:

E[{si}] = ∑_i ½(ei(1) − ei(−1)) si + ∑_i ½(ei(1) + ei(−1)) = E0 − ∑_i hi si → −∑_i hi si (1.6.2)

All of the terms that do not depend on the spin variables have been collected together into a constant. We set this constant to zero by redefining the energy scale. The quantities {hi} describe the energy due to the orientation of the spins. In the magnetic system they correspond to an external magnetic field that varies from location to location. Like small magnets, spins try to orient along the magnetic field. A spin oriented along the magnetic field (si and hi have the same sign) has a lower energy than if it is antiparallel to the magnetic field. As in Eq. (1.6.2), the contribution of the magnetic field to the energy is −|hi| (|hi|) when the spin is parallel (antiparallel) to the field direction. When convenient we will simplify to the case of a uniform magnetic field, hi = h.

When the spins are noninteracting, the Ising model reduces to a collection of two-state systems that we investigated in Section 1.4. Later, when we introduce interactions between the spins, there will be differences. For the noninteracting case we can write the probability for a particular configuration of the spins using the Boltzmann probability:

P[{si}] = e^(−βE[{si}]) / Z = e^(β ∑_i hi si) / Z = ∏_i e^(β hi si) / Z (1.6.3)



Figure 1.6.1 One way to visualize the Ising model is as a spatial array of binary variables called spins, represented as UP or DOWN arrows. A one-dimensional (1-d) example with all spins UP is shown on top. The middle and lower figures show two-dimensional (2-d) arrays which have all spins UP (middle) or have some spins UP and some spins DOWN (bottom). ❚


where β = 1/kT. The partition function Z is given by:

Z = ∑_{{si}} e^(−βE[{si}]) = ∑_{{si}} ∏_i e^(β hi si) = ∏_i ∑_{si=±1} e^(β hi si) = ∏_i ( e^(β hi) + e^(−β hi) ) (1.6.4)

where the second-to-last equality replaces the sum over all possible values of the spin variables with a sum over each spin variable si = ±1 within the product. Thus the probability factors as:

P[{si}] = ∏_i P(si) = ∏_i e^(β hi si) / ( e^(β hi) + e^(−β hi) ) (1.6.5)

This is a product over the result we found for the probability of the two-state system (Eq. (1.4.14)) if we write the energy of a single spin using the notation Ei(si) = −hi si.
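Because the probability factors, each spin can be averaged independently; a short Python check (ours, not from the text) confirms the single-spin average of Eq. (1.6.10) numerically:

```python
import math

def magnetization(beta, h):
    # Average of s = +/-1 under Boltzmann weights exp(beta*h*s); by
    # Eq. (1.6.10) this equals tanh(beta*h).
    weights = {s: math.exp(beta * h * s) for s in (+1, -1)}
    Z = sum(weights.values())
    return sum(s * w for s, w in weights.items()) / Z

for beta in (0.5, 1.0, 2.0):
    assert abs(magnetization(beta, 0.7) - math.tanh(beta * 0.7)) < 1e-12
```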

Now that we have many spin variables, we can investigate the thermodynamics of this model by writing down its free energy and entropy. This is discussed in Question 1.6.1.

Question 1.6.1 Evaluate the thermodynamic free energy, energy and entropy for the Ising model without interactions.

Solution 1.6.1 The free energy is given in terms of the partition function by Eq. (1.3.37):

F = −kT ln(Z) = −kT ∑_i ln( e^(β hi) + e^(−β hi) ) = −kT ∑_i ln( 2 cosh(β hi) ) (1.6.6)

The latter expression is a more common way of writing this result. The thermodynamic energy of the system is found from Eq. (1.3.38) as

U = −∂ln(Z)/∂β = −∑_i hi ( e^(β hi) − e^(−β hi) ) / ( e^(β hi) + e^(−β hi) ) = −∑_i hi tanh(β hi) (1.6.7)

There is another way to obtain the same result. The thermodynamic energy is the average energy of the system (Eq. (1.3.30)), which can be evaluated directly:

U = ⟨E[{si}]⟩ = ⟨−∑_i hi si⟩ = −∑_i hi ⟨si⟩ = −∑_i hi ∑_{si=±1} si P(si) = −∑_i hi ( e^(β hi) − e^(−β hi) ) / ( e^(β hi) + e^(−β hi) ) = −∑_i hi tanh(β hi) (1.6.8)

which is the same as before. We have used the possibility of writing the probability of a single spin variable independent of the others in order to perform this average. It is convenient to define the local magnetization mi as the average value of a particular spin variable:

mi = ⟨si⟩ = ∑_{si=±1} si Psi(si) = Psi(1) − Psi(−1) (1.6.9)


Or using Eq. (1.6.5):

\[
m_i = \langle s_i \rangle = \tanh(\beta h_i)
\tag{1.6.10}
\]

In Fig. 1.6.2, the magnetization at a particular site is plotted as a function of the magnetic field for several different temperatures (β = 1/kT). The magnetization increases with increasing magnetic field and with decreasing temperature, until it saturates asymptotically to a value of +1 or −1. In terms of the magnetization the energy is:

\[
U = -\sum_i h_i m_i
\tag{1.6.11}
\]

We can calculate the entropy of the Ising model using Eq. (1.3.36):

\[
S = k\beta U + k\ln Z = -k\beta\sum_i h_i \tanh(\beta h_i) + k\sum_i \ln\left(2\cosh(\beta h_i)\right)
\tag{1.6.12}
\]


Figure 1.6.2 Plot of the magnetization at a particular site as a function of the magnetic field for independent spins in a magnetic field. The magnetization is the average of the spin value, so the magnetization shows the degree to which the spin is aligned to the magnetic field. The different curves are for several temperatures, β = 0.5, 1, 2 (β = 1/kT). The magnetization has the same sign as the magnetic field. The magnitude of the magnetization increases with increasing magnetic field. Increasing temperature, however, decreases the alignment due to increased random motion of the spins. The maximum magnitude of the magnetization is 1, corresponding to a fully aligned spin. ❚


which is not particularly enlightening. However, we can rewrite this in terms of the magnetization using the identity:

\[
\cosh(x) = \frac{1}{\sqrt{1 - \tanh^2(x)}}
\tag{1.6.13}
\]

and the inverse of Eq. (1.6.10):

\[
\beta h_i = \frac{1}{2}\ln\left(\frac{1 + m_i}{1 - m_i}\right)
\tag{1.6.14}
\]

Substituting into Eq. (1.6.12) gives

\[
S = -k\sum_i \frac{m_i}{2}\ln\left(\frac{1 + m_i}{1 - m_i}\right) + kN\ln(2) - \frac{k}{2}\sum_i \ln\left(1 - m_i^2\right)
\tag{1.6.15}
\]

Rearranging slightly, we have:

\[
S = k\left[N\ln(2) - \frac{1}{2}\sum_i \left((1 + m_i)\ln(1 + m_i) + (1 - m_i)\ln(1 - m_i)\right)\right]
\tag{1.6.16}
\]

The final expression can be derived, at least for the case when all mi are the same, by counting the number of states directly. It is worth deriving the entropy twice, because it may be used more generally than this treatment indicates. We will assume that all hi = h are the same. The energy then depends only on the total magnetization:

\[
E[\{s_i\}] = -h\sum_i s_i\,, \qquad U = -h\sum_i m_i = -hNm
\tag{1.6.17}
\]

To obtain the entropy from the counting of states (Eq. (1.3.25)) we evaluate the number of states within a particular narrow energy range. Since the energy is the sum over the values of the spins, it may also be written as the difference between the number of UP spins N(1) and DOWN spins N(−1):

\[
E[\{s_i\}] = -h\left(N(1) - N(-1)\right)
\tag{1.6.18}
\]

Thus, to find the entropy for a particular energy we must count how many states there are with a particular number of UP and DOWN spins. Moreover, flipping a spin from DOWN to UP causes a fixed increment in the energy. Thus there is no need to include in the counting the width of the energy interval in which we are counting states. The number of states with N(1) UP spins and N(−1) DOWN spins is:

\[
\Omega(E, N) = \binom{N}{N(1)} = \frac{N!}{N(1)!\,N(-1)!}
\tag{1.6.19}
\]


The entropy can be written using Stirling's approximation (Eq. (1.2.27)), neglecting terms that are less than of order N, as:

\[
S = k\ln\Omega(E,N) = k\left[N(\ln N - 1) - N(1)(\ln N(1) - 1) - N(-1)(\ln N(-1) - 1)\right] = k\left[N\ln N - N(1)\ln N(1) - N(-1)\ln N(-1)\right]
\tag{1.6.20}
\]

the latter following from N = N(1) + N(−1). To simplify this expression further, we write it in terms of the magnetization. Using Psi(−1) + Psi(1) = 1 and Eq. (1.6.9) for the magnetization, we have the probability that a particular spin is UP or DOWN in terms of the magnetization as:

\[
P_{s_i}(1) = (1 + m)/2\,, \qquad P_{s_i}(-1) = (1 - m)/2
\tag{1.6.21}
\]

Since there are many spins in the system, we can obtain the number of UP and DOWN spins using

\[
N(1) = N P_{s_i}(1) = N(1 + m)/2\,, \qquad N(-1) = N P_{s_i}(-1) = N(1 - m)/2
\tag{1.6.22}
\]

Using these expressions, Eq. (1.6.20) becomes the same as Eq. (1.6.16), with hi = h.

There is an important difference between the two derivations, in that the second assumed that all of the magnetic fields were the same. Thus, the first derivation appears more general. However, since the original system has no interactions, we could consider each of the spins with its own field hi as a separate system. If we want to calculate the entropy of the individual spin, we would consider an ensemble of such spins. The ensemble consists of many spins with the same field h = hi. The derivation of the entropy using the ensemble would be identical to the derivation we have just given, except that at the end we would divide by the number of different systems in the ensemble N. Adding together the entropies of different spins would then give exactly Eq. (1.6.16).

The entropy of a spin from Eq. (1.6.16) is maximal for a magnetization of zero, when it has the value k ln(2). From the original definition of the entropy, this corresponds to the case when there are exactly two different possible states of the system. It thus corresponds to the case where the probability of each state s = ±1 is 1/2. The minimal entropy is for either m = 1 or m = −1, when there is only one possible state of the spin, so the entropy must be zero. ❚
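The agreement between the counting derivation and Eq. (1.6.16) is easy to confirm numerically. A minimal sketch, with k = 1 and an illustrative system size: the exact binomial entropy of Eqs. (1.6.19) and (1.6.20) matches the closed form up to the O(ln N) corrections dropped in Stirling's approximation.

```python
import math

def xlogx(x):
    # x ln x, extended continuously to x = 0
    return x * math.log(x) if x > 0 else 0.0

N = 1000                                    # illustrative system size
for m in [0.0, 0.2, 0.5, 0.9]:
    N_up = round(N * (1 + m) / 2)           # Eq. (1.6.22)
    N_dn = N - N_up
    # Exact counting entropy, S = ln[N! / (N(1)! N(-1)!)]
    S_count = (math.lgamma(N + 1) - math.lgamma(N_up + 1)
               - math.lgamma(N_dn + 1))
    # Closed form, Eq. (1.6.16), with all m_i = m
    S_closed = N * (math.log(2) - 0.5 * (xlogx(1 + m) + xlogx(1 - m)))
    print(m, round(S_count, 1), round(S_closed, 1))
```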

1.6.2 The Ising model

We now add the essential aspect of the Ising model: interactions between the spins. The location of the spins in space was unimportant in the case of the noninteracting model. However, for the interacting model, we consider the spins to be located on a periodic lattice in space. Similar to the CA models of Section 1.5, we allow the spins to interact only with their nearest neighbors.


It is conventional to interpret neighbors strictly as the spins with the shortest Euclidean distance from a particular site. This means that for a cubic lattice there are two, four and six neighbors in one, two and three dimensions respectively. We will assume that the interaction with each of the neighbors is the same, and we write the energy as:

\[
E[\{s_i\}] = -\sum_i h_i s_i - J\sum_{\langle ij\rangle} s_i s_j
\tag{1.6.23}
\]

The notation ⟨ij⟩ under the summation indicates that the sum is to be performed over all i and j that are nearest neighbors. For example, in one dimension this could be written as:

\[
E[\{s_i\}] = -\sum_i h_i s_i - J\sum_i s_i s_{i+1}
\tag{1.6.24}
\]

If we wanted to emphasize that each spin interacts with its two neighbors, we could write this as

\[
E[\{s_i\}] = -\sum_i h_i s_i - \frac{J}{2}\sum_i \left(s_i s_{i+1} + s_i s_{i-1}\right)
\tag{1.6.25}
\]

where the factor of 1/2 corrects for the double counting of the interaction between every two neighboring spins. In two and three dimensions (2-d and 3-d), additional indices are needed to represent the spatial dependence. We could write the energy in 2-d as:

\[
E[\{s_{i,j}\}] = -\sum_{i,j} h_{i,j} s_{i,j} - J\sum_{i,j}\left(s_{i,j} s_{i+1,j} + s_{i,j} s_{i,j+1}\right)
\tag{1.6.26}
\]

and in 3-d as:

\[
E[\{s_{i,j,k}\}] = -\sum_{i,j,k} h_{i,j,k} s_{i,j,k} - J\sum_{i,j,k}\left(s_{i,j,k} s_{i+1,j,k} + s_{i,j,k} s_{i,j+1,k} + s_{i,j,k} s_{i,j,k+1}\right)
\tag{1.6.27}
\]

In these sums, each nearest neighbor pair appears only once. We will be able to hide the additional indices in 2-d and 3-d by using the nearest neighbor notation ⟨ij⟩ as in Eq. (1.6.23).

The interaction J between spins may arise from many different sources. Similar to the derivation of hi in Eq. (1.6.2), this is the only form that an interaction between two spins can take (Question 1.6.2). There are two distinct possibilities for the behavior of the system depending on the sign of the interaction. Either the interaction tries to orient the spins in the same direction (J > 0) or in the opposite direction (J < 0). The former is called a ferromagnet and is the common form of a magnet. The other is called an antiferromagnet (Section 1.6.4) and has very different external properties, but it can be represented by the same model, with J having the opposite sign.

Question 1.6.2 Show that the form of the interaction given in Eq. (1.6.24), Jss′, is the most general interaction between two spins.

Solution 1.6.2 We write a general form of the energy of two spins:


\[
e(s, s') = e(1,1)\frac{(1+s)(1+s')}{4} + e(1,-1)\frac{(1+s)(1-s')}{4} + e(-1,1)\frac{(1-s)(1+s')}{4} + e(-1,-1)\frac{(1-s)(1-s')}{4}
\tag{1.6.28}
\]

If we expand this we will find a constant term, terms that are linear in s and s′, and a term that is proportional to ss′. The linear terms give rise to the local field hi, and the final term is the interaction. There are other possible interactions that could be written that would include three or more spins. ❚

In a magnetic system, each microscopic spin is itself the source of a small magnetic field. Magnets have the property that they can be the source of a macroscopic magnetic field. When a material is a source of a magnetic field, we say that it is magnetized. The magnetic field arises from constructive superposition of the microscopic sources of the magnetic field that we represent as spins. In effect, the small spins combine together to form a large spin. We have seen in Section 1.6.1 that when there is a magnetic field hi, each spin will orient itself with the magnetic field. This means that in an external field (a field due to a source outside of the magnet) there will be a macroscopic orientation of the spins, and they will in turn give rise to a magnetic field. Magnets, however, can be the source of a magnetic field even when there is no external field. This occurs only below a particular temperature known as the Curie temperature of the material. At higher temperatures, a magnetization exists only in an external magnetic field. The Ising model captures this behavior by showing that the interactions between the spins can cause a spontaneous orientation of the spins without any external field. The spontaneous magnetization is a collective phenomenon. It would not exist for an isolated spin or even for a small collection of interacting spins.

Ultimately, the reason that the spontaneous magnetization is a collective phenomenon has more to do with the kinetics than the thermodynamics of the system. The spontaneous magnetization must occur in a particular direction. Without an external field, there is no reason for any particular direction, but the system must choose one. In our case, it must choose between one of two possibilities: UP or DOWN. Once the magnetization occurs, it breaks a symmetry of the system, because we can now tell the difference between UP and DOWN on the macroscopic scale. At this point, the kinetics of the system must reenter. If the system were able to flip between UP and DOWN very rapidly, we would not be able to measure either case. However, we know that if all of the spins have to flip at once, the likelihood of this happening becomes vanishingly small as the number of spins grows. Thus for a large number of spins in a macroscopic material, this flipping becomes slower than our observation of the magnet. On the other hand, if we had only a few spins, they would still flip back and forth. It is this property of the system that makes the spontaneous magnetization a collective phenomenon.

Returning briefly to the discussion at the end of Section 1.3, we see that by choosing a direction for the magnetization, the magnet breaks the ergodic theorem. It is no longer possible to represent the system using an ensemble with all possible states of the system.


We must exclude the half of the states that have the opposite magnetization. The reason, as we described there, is because of the existence of a slow process, or a long time scale, that prevents the system from going from one choice of magnetization to the other.

The existence of a spontaneous magnetization arises because of the energy lowering of the system when neighboring spins align with each other. At sufficiently low temperatures, this causes the system to align collectively one way or another. Above the Curie temperature, Tc, the energy gain by alignment is destroyed by the temperature-induced random flipping of individual spins. We say that the higher temperature phase is a disordered phase, as compared to the ordered low temperature phase, where all spins are aligned. When we think about this thermodynamically, the disorder is an effect of optimizing the entropy, which promotes the disordered state and competes with the energy as the temperature is increased.

1.6.3 Mean field theory

Despite the simplicity of the Ising model, it has never been solved exactly except in one dimension, and in two dimensions for hi = 0. The techniques that are useful in these cases do not generalize well. We will emphasize instead a powerful approximation technique for describing systems of many interacting parts known as the mean field approximation. The idea of this approximation is to treat a single element of the system under the average influence of the rest of the system. The key to doing this correctly is to recognize that this average must be performed self-consistently. The meaning of self-consistency will be described shortly. The mean field approximation cannot be applied to all interacting systems. However, when it can be, it enables the system to be understood in a direct way.

To use the mean field approximation we single out a particular spin si and find the effective field (or mean field) hi′ that it experiences. This field is obtained by replacing all variables in the energy by their average values, except for si. This leads to an effective energy EMF(si) for si. To obtain it we can neglect all terms in the energy (Eq. (1.6.23)) that do not include si:

\[
E_{MF}(s_i) = -h_i s_i - J s_i \sum_{j\,\in\,nn(i)} \langle s_j\rangle = -h_i' s_i\,, \qquad h_i' = h_i + J \sum_{j\,\in\,nn(i)} \langle s_j\rangle
\tag{1.6.29}
\]

The sum is over all nearest neighbors of si. If we are able to find the mean field hi′, then we can solve this interacting Ising model using the solution of the Ising model without interactions. The problem is that in order to find the field we have to know the average value of the spins, which in turn depends on the effective fields. This is the self-consistency. We will develop a single algebraic equation for the solution. It is interesting first to consider this problem when the external fields hi are zero. Eq. (1.6.29) shows that a mean field might still exist. When the external field is zero, each of the spin variables has the same equation. We might guess that the average value of the spin in one location will be the same as that in any other location:


\[
m = m_i = \langle s_i \rangle
\tag{1.6.30}
\]

In this case our equations become

\[
E_{MF}(s_i) = -h_i' s_i\,, \qquad h_i' = zJm
\tag{1.6.31}
\]

where z is the number of nearest neighbors, known as the coordination number of the system. Eq. (1.6.10) gives us the value of the average magnetization when the spin is subject to a field. Using this same expression under the influence of the mean field we have

\[
m = \tanh(\beta h_i') = \tanh(\beta zJm)
\tag{1.6.32}
\]

This is the self-consistent equation, which gives the value of the magnetization in terms of itself. The solution of this equation may be found graphically, as illustrated in Fig. 1.6.3, by plotting the functions y = m and y = tanh(βzJm) and finding their intersections. There is always a solution m = 0. In addition, for values of βzJ > 1, there are two more solutions related by a change of sign, m = ±m0(βzJ), where we name the positive solution m0(βzJ). When βzJ = 1, the line y = m is tangent to the plot of y = tanh(βzJm) at m = 0. For values βzJ > 1, the value of y = tanh(βzJm) must rise above the line y = m for small positive m and then cross it. The crossing point is the solution m0(βzJ). m0(βzJ) approaches one asymptotically as βzJ → ∞, e.g. as the temperature goes to zero. A plot of m0(βzJ) from a numerical solution of Eq. (1.6.32) is shown in Fig. 1.6.4.

We see that there are two different regimes for this model, with a transition at a temperature Tc given by βzJ = 1, or

\[
kT_c = zJ
\tag{1.6.33}
\]
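Eq. (1.6.32) is also easy to solve numerically. The sketch below (in units where k = 1, so the single parameter is βzJ) uses fixed-point iteration starting from full alignment; it converges to m0(βzJ) whenever βzJ > 1 and reproduces, for example, m0 ≈ 0.96 at βzJ = 2 (cf. Fig. 1.6.4).

```python
import math

def m0(beta_zJ, iters=100000, tol=1e-12):
    """Positive solution of m = tanh(beta_zJ * m) by fixed-point iteration.

    Returns 0.0 when beta_zJ <= 1, where only the trivial solution exists.
    """
    if beta_zJ <= 1:
        return 0.0
    m = 1.0                          # start from full alignment
    for _ in range(iters):
        m_new = math.tanh(beta_zJ * m)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m_new

for x in [0.5, 1.0, 1.2, 2.0, 5.0]:
    print(x, round(m0(x), 4))
```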

To understand what is happening it is helpful to look at the energy U(m) and the free energy F(m) as a function of the magnetization, assuming that all spins have the same magnetization. We will treat the magnetization as a parameter that can be varied. The actual magnetization is determined by minimizing the free energy.

To determine the energy, we must average Eq. (1.6.23), which includes a product of spins on neighboring sites. The mean field approximation treats each spin as if it were independent of other spins except for their average field. This implies that we have neglected correlations between the value of one spin and the others around it. Assuming that the spins are uncorrelated means the average of the product of two spins may be approximated by the product of the averages:

\[
\langle s_i s_j\rangle \approx \langle s_i\rangle\langle s_j\rangle = m^2
\tag{1.6.34}
\]

The average of the energy without any external fields is then:

\[
U(m) = \left\langle -J\sum_{\langle ij\rangle} s_i s_j\right\rangle = -\frac{1}{2}NJzm^2
\tag{1.6.35}
\]


The factor of 1/2 arises because we count each interaction only once (see Eqs. (1.6.24)–(1.6.27)). A sum over the average of EMF(si) would give twice as much, due to counting each of the interactions twice.

Since we have fixed the magnetization of all spins to be the same, we can use the entropy we found in Question 1.6.1 to obtain the free energy as:

\[
F(m) = -\frac{1}{2}NJzm^2 - NkT\left[\ln(2) - \frac{1}{2}\left((1+m)\ln(1+m) + (1-m)\ln(1-m)\right)\right]
\tag{1.6.36}
\]

This free energy is plotted in Fig. 1.6.5 as a function of m for various values of kT/Jz. We see that the behavior of this system is precisely the behavior of a second-order phase transition described in Section 1.3. Above the transition temperature Tc there is only one possible phase, and below Tc there are two phases of equal energy.


Figure 1.6.3 Graphical solution of Eq. (1.6.32), m = tanh(βzJm), by plotting both the left- and right-hand sides of the equation as a function of m and looking for the intersections. m = 0 is always a solution. To consider other possible solutions we note that both functions are antisymmetric in m, so we need only consider positive values of m. For every positive solution there is a negative solution of equal magnitude. When βzJ = 1 the slope of both sides of the equation is the same at m = 0. For βzJ > 1 the slope of the right side is greater than that of the left side. For large positive values of m the right side of the equation is always less than the left side. Thus for βzJ > 1, there must be an additional solution. The solution is plotted in Fig. 1.6.4. ❚


Question 1.6.3 clarifies a technical point in this derivation, and Question 1.6.4 generalizes the solution to include nonzero magnetic fields hi ≠ 0.

Question 1.6.3 Show that the minima of the free energy are the solutions of Eq. (1.6.32). This shows that our derivation is internally consistent; specifically, that our two ways of defining the mean field approximation, first using Eq. (1.6.29) and then using Eq. (1.6.34), are compatible.

Solution 1.6.3 Taking the derivative of Eq. (1.6.36) with respect to m and setting it to zero gives:

\[
0 = -Jzm + kT\,\frac{1}{2}\left(\ln(1+m) - \ln(1-m)\right)
\tag{1.6.37}
\]

Recognizing the inverse of tanh, as in Eq. (1.6.14), gives back Eq. (1.6.32) as desired. ❚
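This consistency can also be checked numerically: a direct minimization of the free energy of Eq. (1.6.36) returns the same magnetization as the fixed-point solution of Eq. (1.6.32). A minimal sketch in units where zJ = 1 and k = 1; the grid search is a deliberately crude stand-in for a proper minimizer.

```python
import math

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def free_energy(m, kT):
    # Eq. (1.6.36) per spin, in units where zJ = 1 and k = 1.
    return -0.5 * m * m - kT * (math.log(2)
                                - 0.5 * (xlogx(1 + m) + xlogx(1 - m)))

def m_by_minimization(kT, points=20001):
    ms = [-1 + 2 * i / (points - 1) for i in range(points)]
    return min(ms, key=lambda m: free_energy(m, kT))

def m_by_fixed_point(kT):
    m = 1.0
    for _ in range(100000):
        m = math.tanh(m / kT)        # m = tanh(beta zJ m) with zJ = k = 1
    return m

for kT in [0.6, 0.8, 0.95]:
    # Below kT = zJ = 1 the two minima are degenerate; compare magnitudes.
    print(kT, round(abs(m_by_minimization(kT)), 3),
          round(m_by_fixed_point(kT), 3))
```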

Question 1.6.4 Find the replacements for Eqs. (1.6.31)–(1.6.36) for the case where there is a uniform external magnetic field hi = h. Plot the free energy for a few cases.


Figure 1.6.4 The mean field approximation solution of the Ising model gives the magnetization (average value of the spin) as a solution of Eq. (1.6.32). The solution is shown as a function of βzJ. As discussed in Fig. 1.6.3 and the text, for βzJ > 1 there are three solutions. Only the positive one is shown. The solution m = 0 is unstable, as can be seen by analysis of the free energy shown in Fig. 1.6.5. The other solution is the negative of that shown. ❚


Solution 1.6.4 Applying an external magnetic field breaks the symmetry between the two different minima in the energy that we have found. In this case we have, instead of Eq. (1.6.29),

\[
E_{MF}(s_i) = -h_i' s_i\,, \qquad h_i' = h + zJm
\tag{1.6.38}
\]

The self-consistent equation instead of Eq. (1.6.32) is:

\[
m = \tanh\left(\beta(h + zJm)\right)
\tag{1.6.39}
\]

Averaging over the energy gives:

\[
U(m) = \left\langle -h\sum_i s_i - J\sum_{\langle ij\rangle} s_i s_j\right\rangle = -Nhm - \frac{1}{2}NJzm^2
\tag{1.6.40}
\]

The entropy is unchanged, so the free energy becomes:

\[
F(m) = -Nhm - \frac{1}{2}NJzm^2 - NkT\left[\ln(2) - \frac{1}{2}\left((1+m)\ln(1+m) + (1-m)\ln(1-m)\right)\right]
\tag{1.6.41}
\]

Several plots are shown in Fig. 1.6.5. Above the kTc of Eq. (1.6.33), the application of an external magnetic field gives rise to a magnetization by shifting the location of the single minimum. Below this temperature there is a tilting of the two minima. Thus, going from a positive to a negative value of h gives an abrupt transition: a first-order transition which occurs at exactly h = 0. ❚
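The first-order transition can be seen numerically by tracking the global minimum of Eq. (1.6.41) as h is scanned through zero below Tc. A minimal sketch, again per spin in units zJ = 1, k = 1; at h = 0 the two minima are exactly degenerate, so the sign reported there is an arbitrary tie-break.

```python
import math

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def free_energy(m, h, kT):
    # Eq. (1.6.41) per spin, in units where zJ = 1 and k = 1.
    return (-h * m - 0.5 * m * m
            - kT * (math.log(2) - 0.5 * (xlogx(1 + m) + xlogx(1 - m))))

def equilibrium_m(h, kT, points=8001):
    ms = [-1 + 2 * i / (points - 1) for i in range(points)]
    return min(ms, key=lambda m: free_energy(m, h, kT))

kT = 0.8                      # below kTc = zJ = 1
for h in [0.02, 0.01, 0.0, -0.01, -0.02]:
    print(h, round(equilibrium_m(h, kT), 3))
# The global minimum jumps discontinuously from positive to negative m as h
# passes through zero. Repeating the scan at kT > 1 gives a smooth variation.
```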

In discussing the mean field equations, we have assumed that we could specify the magnetization as a parameter to be optimized. However, the prescription we have from thermodynamics is that we should take all possible states of the system with a Boltzmann probability. What is the justification for limiting ourselves to only one value of the magnetization?


Figure 1.6.5 Plots of the mean field approximation to the free energy. (a) shows the free energy for h = 0 as a function of m for various values of kT. The free energy and kT are measured in units of Jz. As the temperature is lowered below kT/zJ = 1 there are two minima instead of one (shown by arrows). These minima are the solutions of Eq. (1.6.32) (see Question 1.6.3). The solutions are illustrated in Fig. 1.6.4. (b) shows the same curves as (a) but with a magnetic field h/zJ = 0.1. The location of the minimum gives the value of the magnetization. The magnetic field causes a magnetization to exist at all temperatures, but it is larger at lower temperatures. At the lowest temperature shown, kT/zJ = 0.8, the effect of the phase transition can be seen in the beginnings of a second (metastable) minimum at negative values of the magnetization. (c) shows plots at a fixed temperature of kT/zJ = 0.8 for different values of the magnetic field. As the value of the field goes from positive to negative, the minimum of the free energy switches from positive to negative values discontinuously. At exactly h = 0 there is a discontinuous jump from positive to negative magnetization: a first-order phase transition. ❚


We can argue that in a macroscopic system, the optimal value of the magnetization will so dominate other magnetizations that any other possibility is negligible. This is reasonable except for the case when the magnetic field is close to zero, below Tc, and we have two equally likely magnetizations. In this case, the usual justification does not hold, though it is often implicitly applied. A more complete justification requires a discussion of kinetics given in Section 1.6.6.

Using the results of Question 1.6.4, we can draw a phase diagram like that illustrated in Section 1.3 for water (Fig. 1.3.7). The phase diagram of the Ising model (Fig. 1.6.6) describes the transitions as a function of temperature (or β) and magnetic field h. It is very simple for the case of the magnetic system, since the first-order phase transition line lies along the h = 0 axis and ends at the second-order transition point given by Eq. (1.6.33).

1.6.4 Antiferromagnets

We found the existence of a phase transition in the last section from the self-consistent mean field result (Eq. (1.6.32)), which showed that there was a nonzero magnetization for βzJ > 1. This condition is satisfied for small enough temperature as long as J > 0. What about the case of J < 0? There are no additional solutions of Eq. (1.6.32) for this case. Does this mean there is no phase transition? Actually, it means that one of our assumptions is not a good one. When J < 0, each spin would like (has a lower energy if…) its neighbors to antialign rather than align their spins. However, we have assumed that all spins have the same magnetization, Eq. (1.6.30). The self-consistent equation assumes and does not guarantee that all spins have the same magnetization. This assumption is not a good one when the spins are trying to antialign.


Figure 1.6.6 The phase diagram of the Ising model found from the mean field approximation. The line of first-order phase transitions at h = 0 ends at the second-order phase transition point given by Eq. (1.6.33). For positive values of h there is a net positive magnetization, and for negative values there is a negative magnetization. The change through h = 0 is continuous above the second-order transition point, and discontinuous below it. ❚


Figure 1.6.7 In order to obtain mean field equations for the antiferromagnetic case J < 0, we consider a square lattice (top) and label every site according to the sum of its rectilinear indices as odd (open circles) or even (filled circles). A few sites are shown with indices. Each site is understood to be the location of a spin. We then invert the spins (redefine them by s → −s) that are on odd sites and find that the new system satisfies the same equations as the ferromagnet. The same trick works for any bipartite lattice; for example the hexagonal lattice shown (bottom). By using this trick we learn that at low temperatures the system will have a spontaneous magnetism that is positive on odd sites and negative on even sites, or the opposite. ❚


We can solve the case of a system with J < 0 on a square or cubic lattice directly using a trick. We label every spin by indices (i, j) in 2-d, as indicated in Fig. 1.6.7, or (i, j, k) in 3-d. Then we consider separately the spins whose indices sum to an odd number (“odd spins”) and those whose indices sum to an even number (“even spins”). Note that all the neighbors of an odd spin are even, and all neighbors of an even spin are odd. Now we invert all of the odd spins. Explicitly, we define new spin variables in 3-d as

\[
s'_{ijk} = (-1)^{i+j+k} s_{ijk}
\tag{1.6.42}
\]

In terms of these new spins, the energy without an external magnetic field is the same as before, except that each term in the sum has a single additional factor of (−1). There is only one factor of (−1) because every nearest neighbor pair has one odd and one even spin. Thus:

\[
E[\{s'_i\}] = -J\sum_{\langle ij\rangle} s_i s_j = -(-J)\sum_{\langle ij\rangle} s'_i s'_j = -J'\sum_{\langle ij\rangle} s'_i s'_j
\tag{1.6.43}
\]

We have completed the transformation by defining a new interaction J′ = −J > 0. In terms of the new variables, we are back to the ferromagnet. The solution is the same, and below the temperature given by kTc = zJ′ there will be a spontaneous magnetization of the new spin variables. What happens in terms of the original variables? They become antialigned. All of the even spins have magnetization in one direction, UP, and the odd spins have magnetization in the opposite direction, DOWN, or vice versa. This lowers the energy of the system, because the negative interaction J < 0 means that all of the neighboring spins want to antialign. This is called an antiferromagnet.
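The transformation can be demonstrated directly. The sketch below builds a small periodic square lattice (the even linear size L and the test configuration are arbitrary choices), inverts the odd sublattice as in Eq. (1.6.42), and confirms that the antiferromagnet at coupling J has exactly the same energy as the ferromagnet at J′ = −J, as in Eq. (1.6.43).

```python
L = 4          # illustrative even lattice size (periodic boundaries)
J = -1.0       # antiferromagnetic coupling

def bonds():
    # Each nearest neighbor pair counted once (right and down neighbors).
    for i in range(L):
        for j in range(L):
            yield (i, j), ((i + 1) % L, j)
            yield (i, j), (i, (j + 1) % L)

def energy(spins, coupling):
    # E = -J sum_<ij> s_i s_j
    return -coupling * sum(spins[a] * spins[b] for a, b in bonds())

# An arbitrary fixed test configuration.
s = {(i, j): 1 if (7 * i + 3 * j) % 5 < 3 else -1
     for i in range(L) for j in range(L)}

# Invert the odd sublattice: s'_ij = (-1)^(i+j) s_ij  (Eq. 1.6.42).
s_prime = {(i, j): ((-1) ** (i + j)) * v for (i, j), v in s.items()}

print(energy(s, J), energy(s_prime, -J))   # identical energies
```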

The trick we have used to solve the antiferromagnet works for certain kinds of periodic arrangements of spins called bipartite lattices. A bipartite lattice can be divided into two lattices so that all the nearest neighbors of a member of one lattice are members of the other lattice. This is exactly what we need in order for our redefinition of the spin variables to work. Many lattices are bipartite, including the cubic lattice and the hexagonal honeycomb lattice illustrated in Fig. 1.6.7. However, the triangular lattice, illustrated in Fig. 1.6.8, is not.

The triangular lattice exemplifies an important concept in interacting systems known as frustration. Consider what happens when we try to assign magnetizations to each of the spins on a triangular lattice in an effort to create a configuration with a lower energy than a disordered system. We start at a position marked (1) on Fig. 1.6.8 and assign it a magnetization of m. Then, since it wants its neighbors to be antialigned, we assign position (2) a magnetization of −m. What do we do with the spin at (3)? It has interactions both with the spin at (1) and with the spin at (2). These interactions would have it be antiparallel with both, an impossible task. We say that the spin at (3) is frustrated, since it cannot simultaneously satisfy the conflicting demands upon it. It should not come as a surprise that the phenomenon of frustration becomes a commonplace occurrence in more complex systems. We might even say that frustration is a source of complexity.


Figure 1.6.8 A triangular lattice (top) is not a bipartite lattice. In this case we cannot solve the antiferromagnet J < 0 by the same method as used for the square lattice (see Fig. 1.6.7). If we try to assign magnetizations to different sites we find that assigning a magnetization to site (1) would lead site (2) to be antialigned. This combination would, however, require site (3) to be antialigned to both sites (1) and (2), which is impossible. We say that site (3) is “frustrated.” The bottom illustration shows what happens when we take the hexagonal lattice from Fig. 1.6.7 and superpose the magnetizations on the triangular lattice, leaving the additional sites (shaded) as unmagnetized (see Questions 1.6.5–1.6.7). ❚


Question 1.6.5 Despite the existence of frustration, it is possible to construct a state with lower energy than a completely disordered state on the triangular lattice. Construct one of them and evaluate its free energy.

Solution 1.6.5 We construct the state by extending the process discussed in the text for assigning magnetizations to individual sites. We start by assigning a magnetization m to site (1) in Fig. 1.6.8 and −m to site (2). Because site (3) is frustrated, we assign it no magnetization. We continue by assigning magnetizations to any site that already has two neighbors with assigned magnetizations. We assign a magnetization of m when the neighbors are −m and 0, a magnetization of −m when the neighbors are m and 0, and a magnetization of 0 when the neighbors are m and −m. This gives the illustration at the bottom of Fig. 1.6.8. Comparing with Fig. 1.6.7, we see that the magnetized sites correspond to the honeycomb lattice. One-third of the triangular lattice sites have a magnetization of +m, one-third −m, and one-third 0. Each magnetized site has three neighbors of the opposite magnetization and three unmagnetized neighbors. The free energy of this state is given by:

\[
F(m) = NJm^2 - \frac{1}{3}NkT\ln(2) - \frac{2}{3}NkT\left[\ln(2) - \frac{1}{2}\left((1+m)\ln(1+m) + (1-m)\ln(1-m)\right)\right]
\tag{1.6.44}
\]

The first term is the energy. Each nearest neighbor pair of spins that are antialigned provides an energy Jm2. Let us call this a bond between two spins. There are a total of three interactions for every spin (each spin interacts with six other spins, but we can count each interaction only once). However, on average only one out of three interactions is a bond in this system. To count the bonds, note that one out of three spins (with mi = 0) has no bonds, while the other two out of three spins each have three bonds. This gives a total of six bonds for three sites, but each bond must be counted only once for a pair of interacting spins. We divide by two to get three bonds for three spins, or an average of one bond per site. The second term in Eq. (1.6.44) is the entropy of the N/3 unmagnetized sites, and the third term is the entropy of the 2N/3 magnetized sites.

There is another way to systematically construct a state with an energy lower than that of a completely disordered state. Assign magnetizations +m and −m alternately along one straight line, forming a one-dimensional antiferromagnet. Then skip both neighboring lines by setting all of their magnetizations to zero. Then repeat the antiferromagnetic line on the next parallel line. This configuration of alternating antiferromagnetic lines is also lower in energy than the disordered state, but at low enough temperatures it is higher in energy than the configuration shown in Fig. 1.6.8, as discussed in the next question. ❚


Question 1.6.6 Show that the state illustrated on the bottom of Fig. 1.6.8 has the lowest possible free energy as the temperature goes to zero, at least in the mean field approximation.

Solution 1.6.6 As the temperature goes to zero, the entropic contribution to the free energy becomes irrelevant. The energy of the Ising model is minimized in the mean field approximation when the magnetization is +1 if the local effective field is positive, or −1 if it is negative. The magnetization is arbitrary if the effective field is zero. If we consider three spins arranged in a triangle, the lowest possible energy of the three interactions between them is given by having one with m = +1, one with m = −1 and the other arbitrary. This is forced, because we must have at least one +1 and one −1, and then the other is arbitrary. This is the optimal energy for any triangle of interactions. The configuration of Fig. 1.6.8 achieves this optimal arrangement for all triangles and therefore must give the lowest possible energy of any state. ❚

Question 1.6.7 In the case of the ferromagnet and the antiferromagnet, we found that there were two different states of the system with the same energy at low temperatures. How many states are there of the kind shown in Fig. 1.6.8 and described in Questions 1.6.5 and 1.6.6?

Solution 1.6.7 There are two ways to count the states. The first is to count the number of distinct magnetization structures. This counting is as follows. Once we assign the values of the magnetization on a single triangle, we have determined them everywhere in the system. This follows by inspection or by induction on the size of the assigned triangle. Since we can assign the three different magnetizations (m, −m, 0) arbitrarily within a triangle, there are a total of six such distinct magnetization structures.

We can also count how many distinct arrangements of spins there are. This is relevant at low temperatures, when we want to know the possible states at the lowest energy. We see that there are 2^{N/3} arrangements of the arbitrary spins for each of the magnetization structures. If we want to count all of the states, we can almost multiply this number by 6. We have to correct this slightly because of states where the arbitrary spins are all aligned UP or DOWN. There are two of these for each arrangement of the magnetizations, and these will be counted twice. Making this correction gives 6(2^{N/3} − 1) states. We see that frustration gives rise to a large number of lowest-energy states.

We have not yet proven that these are the only states with the lowest energy. This follows from the requirement that every triangle must have its lowest possible energy, and the observation that setting the values of the magnetizations of one triangle then forces the values of all other magnetizations uniquely. ❚
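For the smallest frustrated system, a single triangle of antialigning spins (N = 3), the count 6(2^{N/3} − 1) = 6 can be confirmed by brute force:

```python
import itertools

J = -1.0    # antiferromagnetic coupling
states = list(itertools.product([1, -1], repeat=3))
# Triangle energy: E = -J (s1 s2 + s2 s3 + s1 s3)
energies = {s: -J * (s[0] * s[1] + s[1] * s[2] + s[0] * s[2]) for s in states}
E_min = min(energies.values())
ground = [s for s, E in energies.items() if E == E_min]
print(E_min, len(ground))   # E_min = -1.0 (one bond satisfied), 6 ground states
```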

Question 1.6.8 We discovered that our assumption that all spins should have the same magnetization does not always apply. How do we know that we found the lowest energy in the case of the ferromagnet? Answer this for the case of h = 0 and T = 0.


Solution 1.6.8 To minimize the energy, we can consider each term of the energy, which is just the product of spins on adjacent sites. The minimum possible value for each term of a ferromagnet occurs for aligned spins. The two states we found at T = 0 with mi = 1 and mi = −1 are the only possible states with all spins aligned. Since they give the minimum possible energy, they must be the correct states. ❚

1.6.5 Beyond mean field theory (correlations)

Mean field theory treats only the average orientation of each spin and assumes that spins are uncorrelated. This implies that when one spin changes its sign, the other spins do not respond. Since the spins are interacting, this must not be true in a more complete treatment. We expect that even above Tc, nearby spins align to each other. Below Tc, nearby spins should be more aligned than would be suggested by the average magnetization. Alignment of spins implies their values are correlated. How do we quantify the concept of correlation? When two spins are correlated they are more likely to have the same value. So we might define the correlation of two spins as the average of the product of the spins:

\[
\langle s_i s_j \rangle = \sum_{s_i, s_j} s_i s_j P(s_i, s_j) = P_{s_i s_j}(1,1) + P_{s_i s_j}(-1,-1) - P_{s_i s_j}(-1,1) - P_{s_i s_j}(1,-1)
\tag{1.6.45}
\]

According to this definition, they are correlated if they are both always +1, so that Psisj(1,1) = 1. Then ⟨sisj⟩ achieves its maximum possible value +1. The problem with this definition is that when si and sj are both always +1 they are completely independent of each other, because each one is +1 independently of the other. Our concept of correlation is the opposite of independence. We know that if spins are independent, then their joint probability distribution factors (see Section 1.2):

\[
P(s_i, s_j) = P(s_i)P(s_j)
\tag{1.6.46}
\]

Thus we define the correlation as a measure of the departure of the joint probability from the product of the individual probabilities:

\[
\sum_{s_i, s_j} s_i s_j \left(P(s_i, s_j) - P(s_i)P(s_j)\right) = \langle s_i s_j \rangle - \langle s_i \rangle\langle s_j \rangle
\tag{1.6.47}
\]

This definition means that when the correlation is zero, we can say that si and sj are independent. However, we must be careful not to assume that they are not aligned with each other. Eq. (1.6.45) measures the spin alignment.

Question 1.6.9 One way to think about the difference between Eq. (1.6.45) and Eq. (1.6.47) is by considering a hierarchy of correlations. The first kind of correlation is of individual spins with themselves and is just the average of the spin. The second kind are correlations between pairs of spins that are not contained in the first kind. Define the next kind of correlation in the hierarchy that would describe correlations between three spins but exclude the correlations that appear in the first two.


Solution 1.6.9 The first three elements in the hierarchy of correlations are:

\[
\begin{aligned}
&\langle s_i \rangle\\
&\langle s_i s_j \rangle - \langle s_i \rangle \langle s_j \rangle\\
&\langle s_i s_j s_k \rangle - \langle s_i s_j \rangle\langle s_k \rangle - \langle s_i s_k \rangle\langle s_j \rangle - \langle s_j s_k \rangle\langle s_i \rangle + 2\langle s_i \rangle\langle s_j \rangle\langle s_k \rangle
\end{aligned}
\tag{1.6.48}
\]

The expression for the correlation of three spins can be checked by seeing what happens if the variables are independent. When variables are independent, the average of their product is the same as the product of their averages. Then all averages become products of averages of single variables and everything cancels. Similarly, if the first two variables si and sj are correlated and the last one sk is independent of them, then the first two terms cancel and the last three terms also cancel. Thus, this expression measures the correlations of three variables that are not present in any two of them. ❚
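This cancellation is easy to verify numerically for independent spins. In the sketch below, the biases pi = P(si = 1) are arbitrary illustrative values; both connected correlations vanish to floating-point precision.

```python
import itertools
import math

p = [0.8, 0.3, 0.6]    # illustrative biases P(s_i = 1) of independent spins

def avg(f):
    # Average of f(s) over the product distribution of the three spins.
    total = 0.0
    for s in itertools.product([1, -1], repeat=3):
        w = math.prod(pi if si == 1 else 1 - pi for pi, si in zip(p, s))
        total += w * f(s)
    return total

m = [avg(lambda s, k=k: s[k]) for k in range(3)]
c2 = avg(lambda s: s[0] * s[1]) - m[0] * m[1]
c3 = (avg(lambda s: s[0] * s[1] * s[2])
      - avg(lambda s: s[0] * s[1]) * m[2]
      - avg(lambda s: s[0] * s[2]) * m[1]
      - avg(lambda s: s[1] * s[2]) * m[0]
      + 2 * m[0] * m[1] * m[2])
print(c2, c3)   # both are zero (up to rounding) for independent spins
```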

Question 1.6.10 To see the difference between Eqs. (1.6.45) and (1.6.47), evaluate them for two cases: (a) si is always equal to 1 and sj is always equal to −1, and (b) si is always the opposite of sj, but each of them averages to zero (i.e., is equally likely to be +1 or −1).

Solution 1.6.10

a. Psisj(1, −1) = 1, so ⟨sisj⟩ = −1, but ⟨sisj⟩ − ⟨si⟩⟨sj⟩ = 0.

b. ⟨sisj⟩ = −1, and ⟨sisj⟩ − ⟨si⟩⟨sj⟩ = −1. ❚

Comparing Eq. (1.6.34) with Eq. (1.6.47), we see that correlations measure the departure of the system from mean field theory. When there is an average magnetization, such as there is below Tc in a ferromagnet, the effect of the average magnetization is removed by our definition of the correlation. This can also be seen from rewriting the expression for correlations as:

\[
\langle s_i s_j \rangle - \langle s_i \rangle\langle s_j \rangle = \left\langle \left(s_i - \langle s_i \rangle\right)\left(s_j - \langle s_j \rangle\right) \right\rangle
\tag{1.6.49}
\]

Correlations measure the behavior of the difference between the spin and its average value. In the rest of this section we discuss qualitatively the correlations that are found in a ferromagnet and the breakdown of the mean field approximation.

The energy of a ferromagnet is determined by the alignment of neighboring spins. Positive correlations between neighboring spins reduce its energy. Positive or negative correlations diminish the possible configurations of spins and therefore reduce the entropy. At very high temperatures, the competition between the energy and the entropy is dominated by the entropy, so there should be no correlations and each spin is independent. At low temperatures, well below the transition temperature, the average value of the spins is close to one. For example, for βzJ = 2, which corresponds to T = Tc/2, the value of m0(βzJ) is 0.96 (see Fig. 1.6.4). So the correlations given by Eq. (1.6.47) play almost no role. Correlations are most significant near Tc, so it is near the transition that the mean field approximation is least valid.


For all T > Tc and for h = 0, the magnetization is zero. However, starting from high temperature, the correlation between neighboring spins increases as the temperature is lowered. Moreover, the correlation of one spin with its neighbors, and their correlation with their neighbors, induces a correlation of each spin with spins farther away. The distance over which spins are correlated increases as the temperature decreases. The correlation decays exponentially, so a correlation length ξ(T) may be defined as the decay constant of the correlation:

\[
\langle s_i s_j \rangle - \langle s_i \rangle\langle s_j \rangle \propto e^{-r_{ij}/\xi(T)}
\tag{1.6.50}
\]

where rij is the Euclidean distance between si and sj. At Tc the correlation length diverges. This is one way to think about how the phase transition occurs. The divergence of the correlation length implies that two spins anywhere in the system become correlated. As mentioned previously, in order for the instantaneous magnetization to be measured, there must also be a divergence of the relaxation time between opposite values of the magnetization. This will be discussed in Sections 1.6.6 and 1.6.7.
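In one dimension, where exact enumeration is easy, the exponential decay of Eq. (1.6.50) can be seen directly. The sketch below enumerates all states of a short Ising ring at h = 0 (the size, coupling, and β are illustrative) and compares the connected correlation with the decay tanh(βJ)^r expected for a long chain; the two agree well for r much smaller than N.

```python
import itertools
import math

N, J, beta = 12, 1.0, 0.7    # illustrative ring size, coupling, and 1/kT

def energy(s):
    # Nearest neighbor ring: E = -J sum_i s_i s_{i+1}
    return -J * sum(s[i] * s[(i + 1) % N] for i in range(N))

states = list(itertools.product([1, -1], repeat=N))
weights = [math.exp(-beta * energy(s)) for s in states]
Z = sum(weights)

def correlation(r):
    # <s_0 s_r>; at h = 0, <s_i> = 0, so this is the connected correlation.
    return sum(w * s[0] * s[r] for s, w in zip(states, weights)) / Z

for r in range(1, 6):
    print(r, round(correlation(r), 4), round(math.tanh(beta * J) ** r, 4))
```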

For temperatures just below Tc, the average magnetization is small. The correlation length of the spins is large. The average alignment (Eq. (1.6.45)) is essentially the same as the correlation (Eq. (1.6.47)). However, as T is further reduced below Tc, the average magnetization grows precipitously and the correlation measures the difference between the spin-spin alignment and the average spin value. Both the correlation and the correlation length decrease away from Tc. As the temperature goes to zero, the correlation length also goes to zero, even as the correlation itself vanishes.

At T = Tc there is a special circumstance where the correlation length is infinite. This does not mean that the correlation is unchanged as a function of the distance between spins, rij. Since the magnetization is zero, the correlation is the same as the spin alignment. If the alignment did not decay with distance, the magnetization would be unity, which is not correct. The infinite correlation length corresponds to power law rather than exponential decay of the correlations. A power law decay of the correlations is more gradual than exponential and implies that there is no characteristic size for the correlations: we can find correlated regions of spins that are of any size. Since the correlated regions fluctuate, we say that there are fluctuations on every length scale.

The existence of correlations on every length scale near the phase transition, and the breakdown of the mean field approximation that neglects these correlations, played an important role in the development of the theory of phase transitions. The discrepancy between mean field predictions and experiment was one of the great unsolved problems of statistical physics. The development of renormalization techniques that directly consider the behavior of the system on different length scales solved this problem. This will be discussed in greater detail in Section 1.10.

In Section 1.3 we discussed the nature of ensemble averages and indicated that one of the central issues was determining the size of an independent system. For the Ising model and other systems that are spatially uniform, it is the correlation length that determines the size of an independent system. If a physical system is much larger than a correlation length, then the system is self-averaging, in that experimental measurements average over many independent samples.


We see that far from a phase transition, uniform systems are generally self-averaging; near a phase transition, the physical size of a system may enter in a more essential way.

The mean field approximation is sufficient to capture the collective behavior of the Ising model. However, even Tc is not given correctly by mean field theory, and indeed it is difficult to calculate. The actual transition temperature differs from the mean field value by a factor that depends on the dimensionality and structure of the lattice. In 1-d, the failure of mean field theory is most severe, since there is actually no real transition. Magnetization does not occur, except in the limit of T → 0. The reason that there is no magnetization in 1-d is that there is always a finite probability that at some point along the chain there will be a switch from having spins DOWN to having spins UP. This is true no matter how low the temperature is. The probability of such a boundary between UP and DOWN spins decreases exponentially as the temperature is lowered. It is given by 1/(1 + e^{2J/kT}) ≈ e^{−2J/kT} at low temperature. Even one such boundary destroys the average magnetization for an arbitrarily large system. While formally there is no phase transition in one dimension, under some circumstances the exponentially growing distance between boundaries may have consequences like a phase transition. The effect is, however, much more gradual than the actual phase transitions in 2-d and 3-d.

The mean field approximation improves as the dimensionality increases. This is a consequence of the increase in the number of neighbors. As the number of neighbors increases, the averaging used for determining the mean field becomes more reliable as a measure of the environment of the spin. This is an important point that deserves some thought. As the number of different influences on a particular variable increases, they become better represented as an average influence. Thus in 3-d, the mean field approximation is better than in 2-d. Moreover, it turns out that rather than just gradually improving as the number of dimensions increases, for 4-d the mean field approximation becomes essentially exact for many of the properties of importance in phase transitions. This happens because correlations become irrelevant on long length scales in more than 4-d. The number of effective neighbors of a spin also increases if we increase the range of the interactions. Several different models with long-range interactions are discussed in the following section.

The Ising model has no built-in dynamics; however, we often discuss fluctuations in this model. The simplest fluctuation would be a single spin flipping in time. Unless the average value of a spin is +1 or −1, a spin must spend some time in each state. We can see that the presence of correlations implies that there must be fluctuations in time that affect more than one spin. This is easiest to see if we consider a system above the transition, where the average magnetization is zero. When one spin has the value +1, then the average magnetization of spins around it will be positive. On average, a region of spins will tend to flip together from one sign to the other. The amount of time that the region takes to flip depends on the length of the correlations. We have defined correlations in space between two spins. We could generalize the definition in Eq. (1.6.47) to allow the indices i and j to refer to different times as well as spatial positions. This would tell us about the fluctuations over time in the system. The analog of the correlation length Eq. (1.6.50) would be the relaxation time (Eq. (1.6.69) below).


The Ising model is useful for describing a large variety of systems; however, there are many other statistical models using more complex variables and interactions that have been used to represent various physical systems. In general, these models are treated first using the mean field approximation. For each model, there is a lower dimension (the lower critical dimension) below which the mean field results are completely invalid. There is also an upper critical dimension, at and above which the mean field results become exact. These dimensions are not necessarily the same as for the Ising model.

1.6.6 Long-range interactions and the spin glass

Long-range interactions enable the Ising model to serve as a model of systems that are much more complex than might be expected from the magnetic analog that motivated its original introduction. If we just consider ferromagnetic interactions separately, the model with long-range interactions actually behaves more simply. If we just consider antiferromagnetic interactions, larger scale patterns of UP and DOWN spins arise. When we include both negative and positive interactions together, there will be additional features that enable a richer behavior. We will start by considering the case of ferromagnetic long-range interactions.

The primary effect of the increase in the range of ferromagnetic interactions is improvement of the mean field approximation. There are several ways to model interactions that extend beyond nearest neighbors in the Ising model. We could set a sphere of a particular radius r0 around each spin and consider all of the spins within the sphere to be neighbors of the spin at the center:

E[\{s_i\}] = -\sum_i h_i s_i - \frac{1}{2} J \sum_{r_{ij} < r_0} s_i s_j \qquad (1.6.51)

Here we do not restrict the summations over i and j in the second term, so we explicitly include a factor of 1/2 to avoid counting interactions twice. Alternatively, we could use an interaction J(rij) that decays either exponentially or as a power law with distance from each spin:

E[\{s_i\}] = -\sum_i h_i s_i - \frac{1}{2} \sum_{i,j} J(r_{ij})\, s_i s_j \qquad (1.6.52)

In both Eqs. (1.6.51) and (1.6.52) the self-interaction terms i = j are generally to be excluded. Since si² = 1, they only add a constant to the energy.

Quite generally, and independent of the range or even the variability of the interactions, when all interactions are ferromagnetic, J > 0, all the spins will align at low temperatures. The mean field approximation may be used to estimate the behavior. All cases then reduce to the same free energy (Eq. (1.6.36) or Eq. (1.6.41)) with a measure of the strength of the interactions replacing zJ. The only difference from the nearest neighbor model then relates to the accuracy of the mean field approximation. It is simplest to consider the model of a fixed interaction strength with a cutoff length. The mean field is accurate when the correlation length is shorter than the interaction distance. When this occurs, a spin is interacting with other spins that are uncorrelated with it. The averaging used to obtain the mean field is then correct.


Thus the approximation improves if the interaction between spins becomes longer ranged. However, the correlation length becomes arbitrarily long near the phase transition. Thus, for longer interaction lengths, the mean field approximation holds closer to Tc but eventually becomes inaccurate in a narrow temperature range around Tc. There is one model for which the mean field approximation is exact independent of temperature or dimension. This is a model of infinite range interactions discussed in Question 1.6.11. The distance-dependent interaction model of Eq. (1.6.52) can be shown to behave like a finite range interaction model for interactions that decay more rapidly than 1/r^3 in 3-d. For weaker decay than 1/r^3 this model is essentially the same as the long-range interaction model of Question 1.6.11. Interactions that decay as 1/r^3 are a borderline case.

Question 1.6.11 Solve the Ising model with infinite-ranged interactions in a uniform magnetic field. The infinite range means that all spins interact with the same interaction strength. In order to keep the energy extensive (proportional to the volume) we must make the interactions between pairs of spins weaker as the system becomes larger, so replace J → J/N. The energy is given by:

E[\{s_i\}] = -h\sum_i s_i - \frac{1}{2N} J \sum_{i,j} s_i s_j \qquad (1.6.53)

For simplicity, keep the i = j terms in the second sum even though they add only a constant.

Solution 1.6.11 We can solve this problem exactly by rewriting the energy in terms of a collective coordinate which is the average over the spin variables

m = \frac{1}{N}\sum_i s_i \qquad (1.6.54)

in terms of which the energy becomes:

E(\{s_i\}) = -hNm - \frac{1}{2} J N m^2 \qquad (1.6.55)

This is the same as the mean field Eq. (1.6.39) with the substitution Jz → J. Here the equation is exact. The result for the entropy is the same as before, since we have fixed the average value of the spin by Eq. (1.6.54). The solution for the value of m for h = 0 is given by Eq. (1.6.32) and Fig. 1.6.4. For h ≠ 0 the discussion in Question 1.6.4 applies. ❚
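The self-consistency that makes this solution tractable can be explored numerically. Below is a minimal sketch (ours, not the text's) that iterates the mean field condition m = tanh((Jm + h)/kT) implied by Eq. (1.6.55) to a fixed point; the initial guess selects which of the symmetry-broken solutions is found below the transition at kTc = J:

```python
import math

def mean_field_m(J=1.0, h=0.0, kT=0.5, m0=0.9, tol=1e-12, max_iter=100000):
    """Fixed-point iteration of m -> tanh((J*m + h)/kT)."""
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh((J * m + h) / kT)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m_new

# Below kTc = J there are two solutions at h = 0, selected by m0:
print(mean_field_m(kT=0.5, m0=+0.9))   # ~ +0.957
print(mean_field_m(kT=0.5, m0=-0.9))   # ~ -0.957
# Above kTc only m = 0 remains:
print(mean_field_m(kT=1.5, m0=+0.9))   # ~ 0
```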

The case of antiferromagnetic interactions will be considered in greater detail in Chapter 7. If all interactions are antiferromagnetic, J < 0, then extending the range of the interactions tends to reduce their effect, because it is impossible for neighboring spins to be antialigned and lower the energy. To be antialigned with a neighbor is to be aligned with a second neighbor. However, by forming patches of UP and DOWN spins it is possible to lower the energy. In an infinite-ranged antiferromagnetic system, all possible states with zero magnetization have the same lowest energy at h = 0.


This can be seen from the energy expression in Eq. (1.6.55). In this sense, frustration from many sources is almost the same as no interaction.
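For a small system this degeneracy can be verified by direct enumeration. The sketch below (an illustration we add here, not from the text; the system size is an arbitrary choice) lists all states of N = 6 spins with the infinite-range energy of Eq. (1.6.55) at h = 0 and antiferromagnetic J < 0, and confirms that every zero-magnetization state is a ground state:

```python
from itertools import product

N, J = 6, -1.0  # antiferromagnetic infinite-range coupling, h = 0

def energy(spins):
    # Eq. (1.6.55) with h = 0: E = -(1/2) J N m^2 = -J (sum s_i)^2 / (2N)
    total = sum(spins)
    return -J * total * total / (2 * N)

states = list(product([-1, +1], repeat=N))
E_min = min(energy(s) for s in states)
ground = [s for s in states if energy(s) == E_min]

print(E_min)                              # 0.0
print(len(ground))                        # C(6,3) = 20 degenerate states
print(all(sum(s) == 0 for s in ground))   # all have zero magnetization
```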

In addition to the ferromagnet and antiferromagnet, there is a third possibility where there are both positive and negative interactions. The physical systems that have motivated the study of such models are known as spin glasses. These are materials where magnetic atoms are found or placed in a nonmagnetic host. The randomly placed magnetic sites interact via long-range interactions that oscillate in sign with distance. Because of the randomness in the location of the spins, there is a randomness in the interactions between them. Experimentally, it is found that such systems also undergo a transition that has been compared to a glass transition, and therefore these systems have become known as spin glasses.

A model for these materials, known as the Sherrington-Kirkpatrick spin glass, makes use of the Ising model with infinite-range random interactions:

E[\{s_i\}] = -\frac{1}{2N}\sum_{ij} J_{ij}\, s_i s_j, \qquad J_{ij} = \pm J \qquad (1.6.56)

The interactions Jij are fixed uncorrelated random variables—quenched variables. The properties of this system are to be averaged over the random variables Jij, but only after it is solved.

Similar to the ferromagnetic or antiferromagnetic Ising model, at high temperatures kT >> J the spin glass model has a disordered phase where spins do not feel the effect of the interactions beyond the existence of correlations. As the temperature is lowered, the system undergoes a transition that is easiest to describe as a breaking of ergodicity. Because of the random interactions, some arrangements of spins are much lower in energy than others. As with the case of the antiferromagnet on a triangular lattice, there are many of these low-energy states. The difference between any two of these states is large, so that changing from one state to the other would involve the flipping of a finite fraction of the spins of the system. Such a flipping would have to be cooperative, so that overcoming the barrier between low-energy states becomes impossible below the transition temperature during any reasonable time. The low-energy states have been shown to be organized into a hierarchy determined by the size of the overlaps between them.
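To make the model concrete, here is a small sketch (our own, with arbitrary size and seed) that constructs one quenched realization of the couplings J_ij = ±J of Eq. (1.6.56) and evaluates the energy of a spin configuration; any average over the disorder would be taken only after each realization is studied on its own:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_couplings(N, J=1.0):
    """One quenched realization: symmetric J_ij = +/-J, zero diagonal."""
    signs = rng.choice([-1.0, 1.0], size=(N, N))
    upper = J * np.triu(signs, k=1)
    return upper + upper.T

def energy(spins, Jij):
    # Eq. (1.6.56): E = -(1/2N) sum_ij J_ij s_i s_j
    return -spins @ Jij @ spins / (2 * len(spins))

N = 16
Jij = make_couplings(N)                 # the quenched random variables
spins = rng.choice([-1, 1], size=N)     # one spin configuration
print(energy(spins, Jij))
```

Searching over spin configurations for a single realization quickly reveals many well-separated low-energy states, consistent with the breaking of ergodicity described above.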

Question 1.6.12 Solve a model that includes a special set of correlated random interactions of the type of the Sherrington-Kirkpatrick model, where the interactions can be written in the separable form

J_{ij} = \eta_i \eta_j, \qquad \eta_i = \pm 1 \qquad (1.6.57)

This is the Mattis model. For simplicity, keep the terms where i = j.

Solution 1.6.12 We can solve this problem by defining a new set of variables

s'_i = \eta_i s_i \qquad (1.6.58)


In terms of these variables the energy becomes:

E[\{s_i\}] = -\frac{1}{2N}\sum_{ij} \eta_i \eta_j\, s_i s_j = -\frac{1}{2N}\sum_{ij} s'_i s'_j \qquad (1.6.59)

which is the same as the ferromagnetic Ising model. The phase transition of this model would lead to a spontaneous magnetization of the new variables. This corresponds to a net orientation of the spins toward (or opposite) the state si = ηi. This can be seen from

m = \langle s'_i \rangle = \eta_i \langle s_i \rangle \qquad (1.6.60)

This model shows that a set of mixed interactions can cause the system tochoose a particular low-energy state that behaves like the ordered state foundin the ferromagnet. By extension, this makes it plausible that fully randominteractions lead to a variety of low-energy states. ❚
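The gauge transformation of Eq. (1.6.58) is easy to verify numerically. In the sketch below (ours, not the text's; the pattern size and seed are arbitrary), the state s_i = η_i of the Mattis model has exactly the energy of the fully magnetized state of the corresponding infinite-range ferromagnet:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12
eta = rng.choice([-1, 1], size=N)   # the pattern eta_i = +/-1 of Eq. (1.6.57)
Jij = np.outer(eta, eta)            # Mattis couplings J_ij = eta_i eta_j

def energy(s, J):
    # E = -(1/2N) sum_ij J_ij s_i s_j, cf. Eq. (1.6.59); i = j terms kept
    return -s @ J @ s / (2 * len(s))

E_mattis = energy(eta, Jij)                     # state aligned with the pattern
E_ferro = energy(np.ones(N), np.ones((N, N)))   # all spins UP, J_ij = 1
print(E_mattis, E_ferro, E_mattis == E_ferro)   # both equal -N/2
```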

The existence of a large number of randomly located energy minima in the spin glass might suggest that by engineering such a system we could control where the minima occur. Then we might use the spin glass as a memory. The Mattis model provides a clue to how this might be accomplished. The use of an outer product representation for the matrix of interactions turns out to be closely related to the model developed by Hebb for biological imprinting of memories on the brain. The engineering of minima in a long-range-interaction Ising model is precisely the model developed by Hopfield for the behavior of neural networks that we will discuss in Chapter 2.

In the ferromagnet and antiferromagnet, there were intuitive ways to deal with the breaking of ergodicity, because we could easily define a macroscopic parameter (the magnetization) that differentiated between different macroscopic states of the system. More general ways to do this have been developed for the spin glass and applied to the study of neural networks.

1.6.7 Kinetics of the Ising model

We have introduced the Ising model without the benefit of a dynamics. There are many choices of dynamics that would lead to the equilibrium ensemble given by the Ising model. One of the most natural would arise from considering each spin to have the two-state system dynamics of Section 1.4. In this dynamics, transitions between UP and DOWN occur across an intermediate barrier that sets the transition rate. We call this the activated dynamics and will use it to discuss protein folding in Chapter 4, because it can be motivated microscopically. The activated dynamics describes a continuous rate of transition for each of the spins. It is often convenient to consider transitions as occurring at discrete times. A particularly simple dynamics of this kind was introduced by Glauber for the Ising model. It also corresponds to the dynamics popular in studies of neural networks that we will discuss in Chapter 2. In this section we will show that the two different dynamics are quite closely related. In Section 1.7 we will consider several other forms of dynamics when we discuss Monte Carlo simulations.


If there are many different possible ways to assign a dynamics to the Ising model, how do we know which one is correct? As for the model itself, it is necessary to consider the system that is being modeled in order to determine which kinetics is appropriate. However, we expect that there are many different choices for the kinetics that will provide essentially the same results as long as we consider the long time behavior. The central limit theorem in Section 1.2 shows that in a stochastic process, many independent steps lead to the same Gaussian distribution of probabilities, independent of the specific steps that are taken. Similarly, if we choose a dynamics for the Ising model that allows individual spin flips, the behavior of processes that involve many spin flips should not depend on the specific dynamics chosen. Having said this, we emphasize that the conditions under which different dynamic rules provide the same long time behavior are not fully established. This problem is essentially the same as the problem of classifying dynamic systems in general. We will discuss it in more detail in Section 1.7.

Both the activated dynamics and the Glauber dynamics assume that each spin relaxes from its present state toward its equilibrium distribution. Relaxation of each spin is independent of other spins. The equilibrium distribution is determined by the relative energy of its UP and DOWN states at a particular time. The energy difference between having the ith spin si UP and DOWN is:

E_{+i}(\{s_j\}_{j \neq i}) = E(s_i = +1, \{s_j\}_{j \neq i}) - E(s_i = -1, \{s_j\}_{j \neq i}) \qquad (1.6.61)

The probability of the spin being UP or DOWN is given by Eq. (1.4.14) as:

P_{s_i}(1) = \frac{1}{1 + e^{E_{+i}/kT}} = f(E_{+i}) \qquad (1.6.62)

P_{s_i}(-1) = 1 - f(E_{+i}) = f(-E_{+i}) \qquad (1.6.63)

In the activated dynamics, all spins perform transitions at all times with rates R(1|−1) and R(−1|1) given by Eqs. (1.4.38) and (1.4.39), with a site-dependent energy barrier EBi that sets the relaxation time τi for the dynamics. As with the two-state system, it is assumed that each transition occurs essentially instantaneously. The choice of the barrier EBi is quite important for the kinetics, particularly since it may also depend on the state of other spins with which the ith spin interacts. As soon as one of the spins makes a transition, all of the spins with which it interacts must change their rate of relaxation accordingly. Instead of considering directly the rate of transition, we can consider the evolution of the probability using the Master equation, Eq. (1.4.40) or (1.4.43). This would be convenient for Master equation treatments of the whole system. However, the necessity of keeping track of all of the probabilities makes this impractical for all but simple considerations.

Glauber dynamics is simpler in that it considers only one spin at a time. The system is updated in equal time intervals. Each time interval is divided into N small time increments. During each time increment, we select a particular spin and only consider its dynamics. The selected spin then relaxes completely, in the sense that its state is set to be UP or DOWN according to its equilibrium probability, Eq. (1.6.62). The transitions of different spins occur sequentially and are not otherwise coupled. The way we


select which spin to update is an essential part of the Glauber dynamics. The simplest and most commonly used approach is to select a spin at random in each time increment. This means that we do not guarantee that every spin is selected during a time interval consisting of N spin updates. Likewise, some spins will be updated more than once in a time interval. On average, however, every spin is updated once per time interval.
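A minimal implementation of this update rule for the 2-d nearest-neighbor Ising model might look like the sketch below (our illustration; the lattice size, temperature and seed are arbitrary choices, and the standard energy E = −J Σ si sj − h Σ si is assumed). In each increment a random site is set UP with its equilibrium probability from Eq. (1.6.62), computed from its current neighbors:

```python
import numpy as np

rng = np.random.default_rng(42)

def glauber_step(spins, J=1.0, h=0.0, kT=1.5):
    """One Glauber time step: N = L*L single-spin updates at random sites."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # Sum over the four nearest neighbors (periodic boundaries).
        nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
              spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        # E_{+i} = E(si = +1) - E(si = -1) = -2(J*nb + h), cf. Eq. (1.6.61).
        e_plus = -2.0 * (J * nb + h)
        # Set the spin UP with its equilibrium probability, Eq. (1.6.62).
        if rng.random() < 1.0 / (1.0 + np.exp(e_plus / kT)):
            spins[i, j] = 1
        else:
            spins[i, j] = -1
    return spins

L = 32
spins = rng.choice([-1, 1], size=(L, L))
for _ in range(100):
    glauber_step(spins)
print("magnetization per spin:", spins.mean())
```

Note that the new value of the selected spin does not depend on its old value; the connection between this rule and the activated dynamics is derived next.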

In order to show that the Glauber dynamics are intimately related to the activated dynamics, we begin by considering how we would implement the activated dynamics on an ensemble of independent two-state systems whose dynamics are completely determined by the relaxation time τ = (R(1|−1) + R(−1|1))−1 (Eq. (1.4.44)). We can think about this ensemble as representing the dynamics of a single two-state system, or, in a sense that will become clear, as representing a noninteracting Ising model. The total number of spins in our ensemble is N. At time t the ensemble is described by the number of UP spins given by NP(1;t) and the number of DOWN spins NP(−1;t).

We describe the activated dynamics of the ensemble using a small time interval ∆t, which eventually we would like to make as small as possible. During the interval of time ∆t, which is much smaller than the relaxation time τ, a certain number of spins make transitions. The probability that a particular spin will make a transition from UP to DOWN is given by R(−1|1)∆t. The total number of spins making a transition from DOWN to UP, and from UP to DOWN, is:

NP(-1;t)\,R(1|-1)\,\Delta t
NP(1;t)\,R(-1|1)\,\Delta t \qquad (1.6.64)

respectively. To implement the dynamics, we must randomly pick out of the whole ensemble this number of UP spins and DOWN spins and flip them. The result would be a new number of UP and DOWN spins, NP(1;t + ∆t) and NP(−1;t + ∆t). The process would then be repeated.

It might seem that there is no reason to randomly pick the ensemble elements to flip, because the result is the same if we rearrange the spins arbitrarily. However, if each spin represents an identifiable physical system (e.g., one spin out of a noninteracting Ising model) that is performing an internal dynamics we are representing, then we must randomly pick the spins to flip.

It is somewhat inconvenient to have to worry about selecting a particular number of UP and DOWN spins separately. We can modify our prescription so that we select a subset of the spins regardless of orientation. To achieve this, we must allow that some of the selected spins will be flipped and some will not. We select a fraction α of the spins of the ensemble. The number of these that are DOWN is αNP(−1;t). In order to flip the same number of spins from DOWN to UP as in Eq. (1.6.64), we must flip UP a fraction R(1|−1)∆t/α of the αNP(−1;t) spins. Consequently, the fraction of spins we do not flip is (1 − R(1|−1)∆t/α). Similarly, the number of selected UP spins is αNP(1;t), the fraction of these to be flipped is R(−1|1)∆t/α, and the fraction we do not flip is (1 − R(−1|1)∆t/α). In order for these expressions to make sense (to be positive), α must be large enough; this implies α > max(R(1|−1)∆t, R(−1|1)∆t). Moreover, we do not want α to be larger than it must be


because this will just force us to select additional spins we will not be flipping. A convenient choice would be to take

\alpha = (R(1|-1) + R(-1|1))\,\Delta t = \Delta t/\tau \qquad (1.6.65)

The consequences of this choice are quite interesting, since we find that the fraction of selected DOWN spins to be flipped UP is R(1|−1)/(R(1|−1) + R(−1|1)) = P(1), the equilibrium fraction of UP spins. The fraction not to be flipped is the equilibrium fraction of DOWN spins. Similarly, the fraction of selected UP spins that are to be flipped DOWN is the equilibrium fraction of DOWN spins, and the fraction to be left UP is the equilibrium fraction of UP spins. Consequently, the outcome of the dynamics of the selected spin does not depend at all on the initial state of the spin. The revised prescription for the dynamics is to select a fraction α of spins from the ensemble and set them according to their equilibrium probability.

We still must choose the time interval ∆t. The smallest time interval that makes sense is the interval for which the number of selected spins would be just one. A smaller number would mean that sometimes we would not choose any spins. Setting the number of selected spins αN = 1 using Eq. (1.6.65) gives:

\Delta t = \frac{1}{N(R(1|-1) + R(-1|1))} = \frac{\tau}{N} \qquad (1.6.66)

which also implies the condition ∆t << τ, and means that the approximation of a finite time increment ∆t is directly coupled to the size of the ensemble. Our new prescription is that we select a single spin and set it UP or DOWN according to its equilibrium probability. This would be the prescription of Glauber dynamics if the ensemble were considered to be the Ising model without interactions. Thus for a noninteracting Ising model, the Glauber dynamics and the activated dynamics are the same. So far we have made no approximation except the finite size of the ensemble. We still have one more step to go to apply this to the interacting Ising model.

The activated dynamics is a stochastic dynamics, so it does not make sense to discuss only the dynamics of a particular system but rather the dynamics of an ensemble of Ising models. At any moment, the activated dynamics treats the Ising model as a collection of several kinds of spins. Each kind of spin is identified by a particular value of E+ and EB. These parameters are controlled by the local environment of the spin. The dynamics is not concerned with the source of these quantities, only their values. The dynamics are that of an ensemble consisting of several kinds of spins with a different number Nk of each kind of spin, where k indexes the kind of spin. According to the result of the previous paragraph, and specifically Eq. (1.6.65), we can perform this dynamics over a time interval ∆t by selecting Nk∆t/τk spins of each kind and updating them according to the Glauber method. This is strictly applicable only for an ensemble of Ising systems. If the Ising system that we are considering contains many correlation lengths, Eq. (1.6.50), then it represents the ensemble by itself. Thus for a large enough Ising model, we can apply this to a single system.


If we want to select spins arbitrarily, rather than of a particular kind, we must make the assumption that all of the relaxation times are the same, τk → τ. This assumption means that we would select a total number of spins:

\sum_k \frac{N_k \Delta t}{\tau_k} \to \frac{N \Delta t}{\tau} \qquad (1.6.67)

As before, ∆t may also be chosen so that in each time interval only one spin is selected. Using two assumptions, we have been able to derive the Glauber dynamics directly from the activated dynamics. One of the assumptions is that the dynamics must be considered to apply only as the dynamics of an ensemble. Even though both dynamics are stochastic dynamics, applying the Glauber dynamics directly to a single system is only the same as the activated dynamics for a large enough system. The second assumption is the equivalence of the relaxation times τk. When is this assumption valid? The expression for the relaxation time in terms of the two-state system is given by Eq. (1.4.44) as

1/\tau = R(1|-1) + R(-1|1) \propto e^{-(E_B - E_1)/kT} + e^{-(E_B - E_{-1})/kT} \qquad (1.6.68)

When the relative energy of the two states E1 and E−1 varies between different spins, τ will in general vary as well. The size of the relaxation time is largely controlled by the smaller of the two energy differences EB − E1 and EB − E−1. Thus, maintaining the same relaxation time would require that the smaller energy difference be nearly constant. This is essential, because the relaxation time changes exponentially with the energy difference.

We have shown that the Glauber dynamics and the activated dynamics are closely related despite appearing to be quite different. We have also found how to generalize the Glauber dynamics if we must allow different relaxation times for different spins. Finally, we have found that the time increment for a single spin update corresponds to τ/N. This means that a single Glauber time step consisting of N spin updates corresponds to a physical time τ, the microscopic relaxation time of the individual spins.

At this point we have introduced a dynamics for the Ising model, and it should be possible for us to investigate questions about its kinetics. Often questions about the kinetics may be described in terms of time correlations. Like the correlation length, we can introduce a correlation time τs that is given by the decay of the spin-spin correlation

\langle s_i(t')s_i(t)\rangle - \langle s_i\rangle^2 \propto e^{-|t-t'|/\tau_s} \qquad (1.6.69)

For the case of a relaxing two-state system, the correlation time is the relaxation time τ. This follows from Eq. (1.4.45), with some attention to notation as described in Question 1.6.13.

Question 1.6.13 Show that for a two-state system, the correlation time τs is the relaxation time τ.


Solution 1.6.13 The difficulty in this question is restoring some of the notational details that we have been leaving out for convenience. From Eq. (1.6.45) we have for the average:

\langle s_i(t')s_i(t)\rangle = P_{s_i(t'),s_i(t)}(1,1) + P_{s_i(t'),s_i(t)}(-1,-1) - P_{s_i(t'),s_i(t)}(1,-1) - P_{s_i(t'),s_i(t)}(-1,1) \qquad (1.6.70)

Let's assume that t' > t; then each of these joint probabilities of the form P_{s_i(t'),s_i(t)}(s_2, s_1) is given by the probability that the two-state system starts in the state s_1 at time t, multiplied by the probability that it will evolve from s_1 into s_2 at time t'.

P_{s_i(t'),s_i(t)}(s_2, s_1) = P_{s_i(t'),s_i(t)}(s_2|s_1)\, P_{s_i(t)}(s_1) \qquad (1.6.71)

The first factor on the right is called the conditional probability. The probability for a particular state of the spin is the equilibrium probability that we wrote as P(1) and P(−1). The conditional probabilities satisfy P_{s_i(t'),s_i(t)}(1|s_1) + P_{s_i(t'),s_i(t)}(−1|s_1) = 1, so we can simplify Eq. (1.6.70) to:

\langle s_i(t')s_i(t)\rangle = (2P_{s_i(t'),s_i(t)}(1|1) - 1)\,P(1) + (2P_{s_i(t'),s_i(t)}(-1|-1) - 1)\,P(-1) \qquad (1.6.72)

The evolution of the probabilities is described by Eq. (1.4.45), repeated here:

P(1;t) = (P(1;0) - P(1;\infty))\,e^{-t/\tau} + P(1;\infty) \qquad (1.6.73)

Since the conditional probability assumes a definite value for the initial state (e.g., P(1;0) = 1 for P_{s(t'),s(t)}(1|1)), we have:

P_{s(t'),s(t)}(1|1) = (1 - P(1))\,e^{-(t'-t)/\tau} + P(1)
P_{s(t'),s(t)}(-1|-1) = (1 - P(-1))\,e^{-(t'-t)/\tau} + P(-1) \qquad (1.6.74)

Inserting these into Eq. (1.6.72) gives:

\langle s_i(t')s_i(t)\rangle = (2[(1 - P(1))e^{-(t'-t)/\tau} + P(1)] - 1)\,P(1)
\quad + (2[(1 - P(-1))e^{-(t'-t)/\tau} + P(-1)] - 1)\,P(-1)
= 4P(1)P(-1)\,e^{-(t'-t)/\tau} + (P(1) - P(-1))^2 \qquad (1.6.75)

The constant term on the right is the same as the square of the average of the spin:

\langle s_i(t)\rangle^2 = (P(1) - P(-1))^2 \qquad (1.6.76)

Inserting into Eq. (1.6.69) leads to the desired result (we have assumed that t' > t):

\langle s_i(t')s_i(t)\rangle - \langle s_i(t)\rangle^2 = 4P(1)P(-1)\,e^{-(t'-t)/\tau} \propto e^{-(t'-t)/\tau} \qquad (1.6.77) ❚
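Given a recorded trajectory of a single spin, the correlation time can be estimated directly from the decay in Eq. (1.6.69). The sketch below is our illustration (the synthetic trajectory and all its parameters are arbitrary choices): it generates a two-state time series with a known relaxation time and measures the connected correlation at several separations:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic two-state trajectory: at each step, with probability 1/tau the
# spin is redrawn from its equilibrium distribution; otherwise it is kept.
tau, p_up, T = 20.0, 0.5, 200_000
s = np.empty(T)
s[0] = 1.0
for t in range(1, T):
    if rng.random() < 1.0 / tau:
        s[t] = 1.0 if rng.random() < p_up else -1.0
    else:
        s[t] = s[t - 1]

def connected_corr(s, lag):
    # <s(t')s(t)> - <s>^2 at separation lag, cf. Eq. (1.6.69)
    return np.mean(s[:-lag] * s[lag:]) - np.mean(s) ** 2

for lag in (1, 5, 10, 20, 40):
    print(lag, connected_corr(s, lag))   # decays roughly as exp(-lag/tau)
```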


From the beginning of our discussion of the Ising model, a central issue has been the breaking of the ergodic theorem associated with the spontaneous magnetization. Now that we have introduced a kinetic model, we will tackle this problem directly. First we describe the problem fully. The ergodic theorem states that a time average may be replaced by an ensemble average. In the ensemble, all possible states of the system are included with their Boltzmann probability. Without formal justification, we have treated the spontaneous magnetization of the Ising model at h = 0 as a macroscopically observable quantity. According to our prescription, this is not the case. Let us perform the average < si > over the ensemble at T = 0 and h = 0. There are two possible states of the system with the same energy, one with {si = 1} and one with {si = −1}. Since they must occur with equal probability by our assumption, we have that the average < si > is zero.

This argument breaks down because of the kinetics of the system that prevents a transition from one state to the other during the course of a measurement. Thus we measure only one of the two possible states and find a magnetization of +1 or −1. How can we prove that this system breaks the ergodic theorem? The most direct test is to start from a system with a slightly positive magnetic field near T = 0, where the magnetization is +1, and reverse the sign of the magnetic field. In this case the equilibrium state of the system should have a magnetization of −1. Instead the system will maintain its magnetization at +1 for a long time before eventually switching from one to the other. The process of switching corresponds to the kinetics of a first-order transition.

1.6.8 Kinetics of a first-order phase transition

In this section we discuss the first-order transition kinetics in the Ising model. Similar arguments apply to other first-order transitions like the freezing or boiling of water. If we start with an Ising model in equilibrium at a temperature T < Tc and a small positive magnetic field h << zJ, the magnetization of the system is essentially m0(βzJ). If we change the magnetic field suddenly to a small negative value, the equilibrium state of the system is −m0(βzJ); however, the system will require some time to change its magnetization. The change in the magnetic field has very little effect on the energy of an individual spin si. This energy is mostly due to the interaction with its neighbors, with a relatively small contribution due to the external field. Most of the time the neighbors are oriented UP, and this makes the spin have a lower energy when it is UP. This gives rise to the magnetization m0(βzJ). Until si's neighbors change their average magnetization, si has no reason to change its magnetization. But then neither do the neighbors. Thus, because each spin is in its own local equilibrium, the process that eventually equilibrates the system requires a cooperative effect including more than one spin. The process by which such a first-order transition occurs is not the simultaneous switching of all of the spins from one value to the other. This would require an impossibly long time. Instead the transition occurs by nucleation and growth of the equilibrium phase.

It is easiest to describe the nucleation process when T is sufficiently less than Tc, so that the spins are almost always +1. In mean field, already for T < 0.737Tc the


probability of a spin being UP is greater than 90% (P(1) = (1 + m)/2 > 0.9), and for T < 0.61Tc the probability of a spin being UP is greater than 95%. As long as T is greater than zero, individual spins will flip from time to time. However, even though the magnetic field would like them to be DOWN, their local environment consisting of UP spins does not. Since the interaction with their neighbors is stronger than the interaction with the external field, the spin will generally flip back UP after a short time. There is a smaller probability that a second spin, a neighbor of the first spin, will also flip DOWN. Because one of the neighbors of the second spin is already DOWN, there is a lower energy cost than for the first one. However, the energy of the second spin is still higher when it is DOWN, and the spins will generally flip back, first one then the other. There is an even smaller probability that three interacting spins will flip DOWN. The existence of two DOWN spins makes it more likely for the third to do so. If the first two spins were neighbors, then the third spin can have only one of them as its neighbor. So it still costs some energy to flip DOWN the third spin. If there are three spins flipped DOWN in an L shape, the spin that completes a 2 × 2 square has two neighbors that are +1 and two neighbors that are −1, so the interactions with its neighbors cancel. The external field then gives a preference for it to be DOWN. There is still a high probability that several of the spins that are DOWN will flip UP and the little cluster will then disappear. Fig. 1.6.9 shows various clusters and their energies compared to a uniform region of +1 spins. As more spins are added, the internal region of the cluster becomes composed of spins that have four neighbors that are all DOWN. Beyond a certain size (see Question 1.6.14) the cluster of DOWN spins will grow, because adding spins lowers the energy of the system. At some point the growing region of DOWN spins encounters another region of DOWN spins and the whole system reaches its new equilibrium state, where most spins are DOWN.

Question 1.6.14 Using an estimate of how the energy of large clusters of DOWN spins grows, show that large enough clusters must have a lower energy than the same region if it were composed of UP spins.

Solution 1.6.14 The energy of a cluster of DOWN spins is given by its interaction with the external magnetic field and the number of antialigned bonds that form its boundary. The change in energy due to the external magnetic field is exactly 2hNc, which is proportional to the number of spins in the cluster, Nc.


Figure 1.6.9 Illustration of small clusters of DOWN spins shown as filled dark squares residing in a background of UP spins on a square lattice. The energies for creating the clusters are shown. The magnetic field, h, is negative. The formation of such clusters is the first step towards nucleation of a DOWN region when the system undergoes a first-order transition from UP to DOWN. The energy is counted by the number of spins that are DOWN times the magnetic field strength, plus the interaction strength times the number of antialigned neighboring spins, which is the length of the boundary of the cluster. In a first-order transition, as the size of the clusters grows the gain from orienting toward the magnetic field eventually becomes greater than the loss from the boundary energy. Then the cluster becomes more likely to grow than shrink. See Question 1.6.14 and Fig. 1.6.10. ❚


[Figure 1.6.9 panel labels, cluster energies: 2h+8J; 4h+12J; 6h+16J; 8h+16J and 8h+20J; 10h+20J and 10h+24J; 12h+22J, 12h+24J and 12h+26J.]


This is negative, since h is negative. The energy of the boundary is proportional to the number of antialigned bonds, and it is always positive. Because every additional antialigned bond raises the cluster energy, the boundary of the cluster tends to be smooth at low temperatures. Therefore, we can estimate the boundary energy using a simple shape like a square or circular cluster in 2-d (a cube or ball in 3-d). Either way the energy will increase as fJNc^{(d-1)/d}, where d is the dimensionality and f is a constant accounting for the shape. Since the negative contribution to the energy increases in proportion to the area (volume) of the cluster, and the positive contribution to the energy increases in proportion to the perimeter (surface area) of the cluster, the negative term eventually wins. Once a cluster is large enough so that its energy is dominated by the interaction with the magnetic field, then, on average, adding an additional spin to the cluster will lower the system energy. ❚

Question 1.6.15 Without looking at Fig. 1.6.9, construct all of the different possible clusters of as many as five DOWN spins. Label them with their energy.

Solution 1.6.15 See Fig. 1.6.9. ❚

The scenario just described, known as nucleation and growth, is generally responsible for the kinetics of first-order transitions. We can illustrate the process schematically (Fig. 1.6.10) using a one-dimensional plot indicating the energy of a cluster as a function of the number of spins in the cluster. The energy of the cluster increases at first when there are very few spins in the cluster, and then decreases once it is large enough. Eventually the energy decreases linearly with the number of spins in the cluster. The decrease per spin is the energy difference per spin between the two phases. The first cluster size that is “over the hump” is known as the critical cluster. The process of reaching this cluster is known as nucleation. A first estimate of the time to nucleate a critical cluster at a particular place in space is given by the inverse of the Boltzmann factor of the highest energy barrier in Fig. 1.6.10. This corresponds to the rate of transition over the barrier given by a two-state system with this same barrier (see Eq. (1.4.38) and Eq. (1.4.44)). The size of the critical cluster depends on the magnitude of the magnetic field. A larger magnetic field implies a smaller critical cluster. Once the critical cluster is reached, the kinetics corresponds to the biased diffusion described at the end of Section 1.4. The primary difficulty with an illustration such as Fig. 1.6.10 is that it is one-dimensional. We would need to show the energy of each type of cluster and all of the ways one cluster can transform into another. Moreover, the clusters themselves may move in space and merge or separate. In Fig. 1.6.11 we show frames from a simulation of nucleation in the Ising model using Glauber dynamics. The frames illustrate the process of nucleation and growth.
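Using the estimate from Question 1.6.14 for a compact (square) cluster in 2-d, E(Nc) ≈ 2hNc + 4J√Nc with h < 0, the critical cluster and the barrier height can be located in a few lines. The sketch below is our illustration, reusing the parameters quoted for the simulation of Fig. 1.6.11:

```python
import numpy as np

J, h = 1.0, -0.25      # values used in the simulation of Fig. 1.6.11

Nc = np.arange(1, 101)
# Square-cluster estimate in 2-d: field term 2*h*Nc, boundary term 4*J*sqrt(Nc).
E = 2 * h * Nc + 4 * J * np.sqrt(Nc)

n_star = Nc[np.argmax(E)]
print(f"critical cluster ~ {n_star} spins; analytic (J/|h|)^2 = "
      f"{(J / abs(h)) ** 2:.0f}")
print(f"barrier height ~ {E.max():.2f}; analytic 2J^2/|h| = "
      f"{2 * J ** 2 / abs(h):.2f}")
```

A stronger field |h| lowers and narrows the barrier, which is why a larger magnetic field implies a smaller critical cluster.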

Experimental studies of nucleation kinetics are sometimes quite difficult. In physical systems, impurities often lower the barrier to nucleation and therefore control the rate at which the first-order transition occurs. This can be a problem for the investigation of the inherent nucleation because of the need to study highly purified


systems. However, this sensitivity should be understood as an opportunity for control over the kinetics. It is similar to the sensitivity of electrical properties to dopant impurities in a semiconductor, which enables the construction of semiconductor devices. There is at least one direct example of the control of the kinetics of a first-order transition. Before describing the example, we review a few properties of the water-to-ice transition. The temperature of the water-to-ice transition can be lowered significantly by the addition of impurities. The freezing temperature of salty ocean water is lower than that of pure water. This suppression is thermodynamic in origin, which means that the transition temperature Tc is actually lower. There exist fish that live in sub-zero-degree ocean water whose blood has less salt than the surrounding ocean. These fish use a family of so-called antifreeze proteins that are believed to kinetically suppress the freezing of their blood. Instead of lowering the freezing temperature, these proteins suppress ice nucleation.

The existence of a long nucleation time implies that it is often possible to create metastable materials. For example, supercooled water is water whose temperature has been lowered below its freezing point. For many years, particle physicists used a superheated fluid to detect elementary particles. Ultrapure liquids in large tanks were



Figure 1.6.10 Schematic illustration of the energies that control the kinetics of a first-order phase transition. The horizontal axis is the size of a cluster of DOWN spins, Nc, that are the equilibrium phase. The cluster is in a background of UP spins that are the metastable phase. The vertical axis is the energy of the cluster. Initially the energy increases with cluster size until the cluster reaches the critical cluster size. Then the energy decreases. Each spin flip has its own barrier to overcome, leading to a washboard potential. The highest barrier EBmax that the system must overcome to create a critical nucleus controls the rate of nucleation. This is similar to the relaxation of a two-level system discussed in Section 1.4. However, this simple picture neglects the many different possible clusters and the many ways they can convert into each other by the flipping of spins. A few different types of clusters are shown in Fig. 1.6.9. ❚



Figure 1.6.11 Frames from a simulation illustrating nucleation and growth in an Ising model in 2-d. The temperature is T = zJ/3 and the magnetic field is h = −0.25. Glauber dynamics was used. Each time step consists of N updates where the space size is N = 60 × 60. Frames shown are in intervals of 40 time steps. The first frame shown is at t = 200 steps after the beginning of the simulation. Black squares are DOWN spins and white areas are UP spins.


The metastability of the UP phase is seen in the existence of only a few DOWN spins until the frame at t = 320. All earlier frames are qualitatively the same as the frames at t = 200, 240 and 280. A critical nucleus forms between t = 280 and t = 320. This nucleus grows systematically until the final frame, when the whole system is in the equilibrium DOWN phase. ❚


suddenly shifted above their boiling temperature. Small bubbles would then nucleate along the ionization trail left by charged particles moving through the tank. The bubbles could be photographed and the tracks of the particles identified. Such detectors were called bubble chambers. This methodology has been largely abandoned in favor of electronic detectors. There is a limit to how far a system can be supercooled or superheated. The limit is easy to understand in the Ising model. If a system with a positive magnetization m is subject to a negative magnetic field of magnitude greater than zJm, then each individual spin will flip DOWN independent of its neighbors. This is the ultimate limit for nucleation kinetics.

1.6.9 Connections between CA and the Ising model

Our primary objective throughout this section is the investigation of the equilibrium properties of interacting systems. It is useful, once again, to consider the relationship between the equilibrium ensemble and the kinetic CA we considered in Section 1.5. When a deterministic CA evolves to a unique steady state independent of the initial conditions, we can identify the final state as the T = 0 equilibrium ensemble. This is, however, not the way we usually consider the relationship between a dynamic system and its equilibrium condition. Instead, the equilibrium state of a system is generally regarded as the time average over microscopic dynamics. Thus when we use the CA to represent a microscopic dynamics, we could also identify a long time average of a CA as the equilibrium ensemble. Alternatively, we can consider a stochastic CA that evolves to a unique steady-state distribution where the steady state is the equilibrium ensemble of a suitably defined energy function.

1.7 Computer Simulations (Monte Carlo, Simulated Annealing)

Computer simulations enable us to investigate the properties of dynamical systems by directly studying the properties of particular models. Originally, the introduction of computer simulation was viewed by many researchers as an undesirable adjunct to analytic theory. Currently, simulations play such an important role in scientific studies that many analytic results are not believed unless they are tested by computer simulation. In part, this reflects the understanding that analytic investigations often require approximations that are not necessary in computer simulations. When a series of approximations has been made as part of an analytic study, a computer simulation of the original problem can directly test the approximations. If the approximations are validated, the analytic results often generalize the simulation results. In many other cases, simulations can be used to investigate systems where analytic results are unknown.

1.7.1 Molecular dynamics and deterministic simulations

The simulation of systems composed of microscopic Newtonian particles that experience forces due to interparticle interactions and external fields is called molecular dynamics. The techniques of molecular dynamics simulations, which integrate


Newton's laws for individual particles, have been developed to optimize the efficiency of computer simulation and to take advantage of parallel computer architectures. Typically, these methods implement a discrete iterative map (Section 1.1) for the particle positions. The most common (Verlet) form is:

r(t) = 2r(t - \Delta t) - r(t - 2\Delta t) + \Delta t^2\, a(t - \Delta t) \qquad (1.7.1)

where a(t) = F(t)/m is the acceleration, with the force on the particle calculated from models for interparticle and external forces. As in Section 1.1, time would be measured in units of the time interval ∆t for convenience and efficiency of implementation. Eq. (1.7.1) is algebraically equivalent to the iterative map in Question 1.1.4, which is written as an update of both position and velocity:

r(t) = r(t - \Delta t) + \Delta t\, v(t - \Delta t/2)
v(t + \Delta t/2) = v(t - \Delta t/2) + \Delta t\, a(t) \qquad (1.7.2)

As indicated, the velocity is interpreted to be at half integral times, though this doesnot affect the result of the iterative map.
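A direct implementation of the leapfrog form of Eq. (1.7.2) is shown below as a self-contained sketch (ours, not the text's; the harmonic test force and the step size are arbitrary choices):

```python
import math

def leapfrog(r, v_half, accel, dt, steps):
    """Iterate Eq. (1.7.2): positions at integer times, velocities at
    half-integer times (kick-drift ordering)."""
    for _ in range(steps):
        v_half = v_half + dt * accel(r)  # v(t + dt/2) = v(t - dt/2) + dt a(t)
        r = r + dt * v_half              # r(t + dt) = r(t) + dt v(t + dt/2)
    return r, v_half

# Test on a harmonic oscillator, a(r) = -omega^2 r, starting at r = 1, v = 0.
omega, dt = 1.0, 0.01
accel = lambda r: -omega ** 2 * r
r0, v0 = 1.0, 0.0
v_half = v0 - 0.5 * dt * accel(r0)   # offset the velocity back half a step

r, _ = leapfrog(r0, v_half, accel, dt, steps=628)   # ~ one period (2*pi)
print(f"r after one period: {r:.5f} (exact {math.cos(6.28):.5f})")
```

The half-step offset of the velocity is what makes the scheme time-reversible and keeps the energy error bounded over long runs.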

For most such simulations of physical systems, the accuracy is limited by the use of models for interatomic interactions. Modern efforts attempt to improve upon this approach by calculating forces from quantum mechanics. However, such simulations are very limited in the number of particles and the duration of a simulation. A useful measure of the extent of a simulation is the product N tmax of the amount of physical time tmax and the number of particles that are simulated, N. Even without quantum mechanical forces, molecular dynamics simulations are still far from being able to describe systems on a space and time scale comparable to human senses. However, there are many questions that can be addressed regarding microscopic properties of molecules and materials.

The development of appropriate simplified macroscopic descriptions of physical systems is an essential aspect of our understanding of these systems. These models may be based directly upon macroscopic phenomenology obtained from experiment. We may also make use of the microscopic information obtained from various sources, including both theory and experiment, to inform our choice of macroscopic models. It is more difficult, but important as a strategy for the description of both simple and complex systems, to develop systematic methods that enable macroscopic models to be obtained directly from microscopic models. The development of such methods is still in its infancy, and it is intimately related to the issues of emergent simplicity and complexity discussed in Chapter 8.

Abstract mathematical models that describe the deterministic dynamics for various systems, whether represented in the form of differential equations or deterministic cellular automata (CA, Section 1.5), enable computer simulation and study through integration of the differential equations or through simulation of the CA. The effects of external influences, not incorporated in the parameters of the model, may be modeled using stochastic variables (Section 1.2). Such models, whether of fluids or of galaxies, describe the macroscopic behavior of physical systems by assuming that the microscopic (e.g., molecular) motion is irrelevant to the macroscopic


phenomena being described. The microscopic behavior is summarized by parameters such as density, elasticity or viscosity. Such model simulations enable us to describe macroscopic phenomena on a large range of spatial and temporal scales.

1.7.2 Monte Carlo simulations

In our investigations of various systems, we are often interested in average quantities rather than a complete description of the dynamics. This was particularly apparent in Section 1.3, when equilibrium thermodynamic properties of systems were discussed. The ergodic theorem (Section 1.3.5) suggested that we can use an ensemble average instead of the space-time average of an experiment. The ensemble average enables us to treat problems analytically, when we cannot integrate the dynamics explicitly. For example, we studied equilibrium properties of the Ising model in Section 1.6 without reference to its dynamics. We were able to obtain estimates of its free energy, energy and magnetization by averaging various quantities using ensemble probabilities.

However, we also found that there were quite severe limits to our analytic capabilities even for the simplest Ising model. It was necessary to use the mean field approximation to obtain results analytically. The essential difficulty that we face in performing ensemble averages for complex systems, and even for the simple Ising model, is that the averages have to be performed over the many possible states of the system. For as few as one hundred spins, the number of possible states of the system, 2^100, is so large that we cannot average over all of the possible states. This suggests that we consider approximate numerical techniques for studying the ensemble averages. In order to perform the averages without summing over all the states, we must find some way to select a representative sample of the possible states.

Monte Carlo simulations were developed to enable numerical averages to be performed efficiently. They play a central role in the use of computers in science. Monte Carlo can be thought of as a general way of estimating averages by selecting a limited sample of states of the system over which the averages are performed. In order to optimize convergence of the average, we take advantage of information that is known about the system to select the limited sample. As we will see, under some circumstances, the sequence of states selected in a Monte Carlo simulation may itself be used as a model of the dynamics of a system. Then, if we are careful about designing the Monte Carlo, we can separate the time scales of a system by treating the fast degrees of freedom using an ensemble average and still treat explicitly the dynamic degrees of freedom.

To introduce the concept of Monte Carlo simulation, we consider finding the average of a function f(s), where the system variable s has the probability P(s). For simplicity, we take s to be a single real variable in the range [−1,+1]. The average can be approximated by a sum over equally spaced values si:

\langle f(s)\rangle = \int_{-1}^{1} f(s)P(s)\,ds \approx \sum_i f(s_i)P(s_i)\,\Delta s = \frac{1}{M}\sum_{n=-M}^{M} f(n/M)\,P(n/M) \qquad (1.7.3)

This formula works well if the functions f(s) and P(s) are reasonably smooth and uniform in magnitude. However, when they are not smooth, this sum can be a very inefficient way to perform the integral.

< f (s) > = f (s)P(s)ds−1

1

∫ ≈ f (s i )P(si )s i

∑ s =1

Mf (n/ M)P(n /M )

n=−M

M

188 I n t r o duc t i on a nd P r e l i m i n a r i e s

# 29412 Cust: AddisonWesley Au: Bar-Yam Pg. No. 188Title: Dynamics Complex Systems Short / Normal / Long

01adBARYAM_29412 3/10/02 10:17 AM Page 188

Page 174: Introduction and Preliminaries · 2013. 3. 24. · 16 # 29412 Cust: AddisonWesley Au: Bar-Yam Pg. No. 16 Title: Dynamics Complex Systems Short / Normal / Long 1 Introduction and Preliminaries

ficient way to perform the integral. Consider this integral when P(s) is a Gaussian,andf (s) is a constant:

\langle f(s) \rangle \propto \int_{-1}^{1} e^{-s^2/2\sigma^2}\, ds \approx \frac{1}{M} \sum_{n=-M}^{M} e^{-(n/M)^2/2\sigma^2}        (1.7.4)

A plot of the integrand in Fig. 1.7.1 shows that for σ << 1 we are performing the integral by summing many values that are essentially zero. These values contribute nothing to the result and require as much computational effort as the comparatively few points that do contribute to the integral near s = 0, where the function is large. The few points near s = 0 will not give a very accurate estimate of the integral. Thus, most of the computational work is being wasted and the integral is not accurately evaluated. If we want to improve the accuracy of the sum, we have to increase the value of M. This means we will be summing many more points that are almost zero.
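The wasted effort is easy to demonstrate. The following C sketch (our illustration; the values of sigma, M and the cutoff are arbitrary choices, not from the discussion above) performs the uniform-grid sum of Eq. (1.7.4) and counts how many of the 2M + 1 points contribute appreciably:

/* Sketch of the uniform-grid sum of Eq. (1.7.4) for a narrow Gaussian.
   sigma, M and the 1e-10 cutoff are illustrative assumptions. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double sigma = 0.01, sum = 0.0;
    int M = 1000, n, useful = 0;
    for (n = -M; n <= M; n++) {
        double s = (double)n / M;
        double term = exp(-s * s / (2.0 * sigma * sigma));
        sum += term / M;                  /* grid spacing is 1/M */
        if (term > 1e-10) useful++;       /* points that actually contribute */
    }
    printf("integral ~ %f, contributing points: %d of %d\n",
           sum, useful, 2 * M + 1);
    return 0;
}

For sigma = 0.01 only a small fraction of the 2001 points contribute; the rest are computed and then effectively discarded.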

To avoid this problem, we would like to focus our attention on the region in Eq. (1.7.4) where the integrand is large. This can be done by changing how we select the points where we perform the average. Instead of picking the points at equal intervals along the line, we pick them with a probability given by P(s). This is the same as saying that we have an ensemble representing the system with the state variable s. Then we perform the ensemble average:

\langle f(s) \rangle = \int f(s) P(s)\, ds = \frac{1}{N} \sum_{s:P(s)}^{N} f(s)        (1.7.5)

Figure 1.7.1 Plot of the Gaussian distribution illustrating that an integral that is performed by uniform sampling will use a lot of points to represent regions where the Gaussian is vanishingly small. The problem gets worse as σ becomes smaller compared to the region over which the integral must be performed. It is much worse in typical multidimensional averages where the Boltzmann probability is used. Monte Carlo simulations make such integrals computationally feasible by sampling the integrand in regions of high probability. ❚


The latter expression represents the sum over N values of s, where these values have the probability distribution P(s). We have implicitly assumed that the function f(s) is relatively smooth compared to P(s). In Eq. (1.7.5) we have replaced the integral with a sum over an ensemble. The problem we now face is to obtain the members of the ensemble with probability P(s). To do this we will invert the ergodic theorem of Section 1.3.5.
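For a case where P(s) can be sampled directly, Eq. (1.7.5) can be illustrated in a few lines of C. In this sketch, samples are drawn from the Gaussian P(s) using the Box-Muller transform; the integrand f(s) = s^2 and the values of sigma and N are placeholder assumptions for the example:

/* Sketch of the ensemble average of Eq. (1.7.5): every sample falls where
   the probability is significant. sigma, N and f(s) = s^2 are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
    const double PI = 3.14159265358979;
    double sigma = 0.01, sum = 0.0;
    int N = 10000, n;
    for (n = 0; n < N; n++) {
        /* Box-Muller transform: two uniform deviates -> one Gaussian deviate */
        double u1 = (rand() + 1.0) / (RAND_MAX + 1.0);   /* in (0,1], avoids log(0) */
        double u2 = (double)rand() / RAND_MAX;
        double s = sigma * sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
        sum += s * s;            /* f(s) = s^2; P(s) enters through the sampling */
    }
    printf("<f(s)> ~ %g (exact value sigma^2 = %g)\n", sum / N, sigma * sigma);
    return 0;
}

Every evaluation of f(s) is spent in the region of high probability, in contrast with the uniform grid above.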

Since Section 1.3 we have described an ensemble as representing a system whose dynamics satisfies the ergodic theorem. We now turn this around and say that the ensemble sum in Eq. (1.7.5) can be represented by any dynamics that satisfies the ergodic theorem, and which has as its equilibrium probability P(s). To do this we introduce a time variable t that, for our current purposes, just indicates the order of terms in the sum we are performing. The value of s appearing in the t-th term would be s(t). We then rewrite the ergodic theorem by considering the time average as an approximation to the ensemble average (rather than the opposite):

\langle f(s) \rangle = \frac{1}{T} \sum_{t=1}^{T} f(s(t))        (1.7.6)

The problem remains to sequentially generate the states s(t), or, in other words, to specify the dynamics of the system. If we know the probability P(s), and s is a few binary or real variables, this may be done directly with the assistance of a random number generator (Question 1.7.1). However, often the system coordinate s represents a large number of variables. A more serious problem is that for models of physical systems, we generally don't know the probability distribution explicitly.

Thermodynamic systems are described by the Boltzmann probability (Section 1.3):

P(\{x,p\}) = \frac{1}{Z} e^{-E(\{x,p\})/kT} \qquad Z = \sum_{\{x,p\}} e^{-E(\{x,p\})/kT}        (1.7.7)

where {x,p} are the microscopic coordinates of the system, and E({x,p}) is the microscopic energy. An example of a quantity we might want to calculate would be the average energy:

U = \frac{1}{Z} \sum_{\{x,p\}} E(\{x,p\})\, e^{-E(\{x,p\})/kT}        (1.7.8)

In many cases, as discussed in Section 1.4, the quantity that we would like to find the average of depends only on the position of particles and not on their momenta. We then write more generally

P(s) = \frac{1}{Z_s} e^{-F(s)/kT} \qquad Z_s = \sum_{s} e^{-F(s)/kT}        (1.7.9)

where we use the system state variable s to represent the relevant coordinates of the system. We make no assumption about the dimensionality of the coordinate s which may, for example, be the coordinates {x} of all of the particles. F(s) is the free energy of the set of states associated with the coordinate s. A precise definition, which indicates both the variable s and its value s′, is given in Eq. (1.4.27):

F_s(s') = -kT \ln\left( \sum_{\{x,p\}} \delta_{s,s'}\, e^{-E(\{x,p\})/kT} \right)        (1.7.10)

We note that Eq. (1.7.9) is often written using the notation E(s) (the energy of s) instead of F(s) (the free energy of s), though F(s) is more correct. An average we might calculate, of a quantity Q(s), would be:

\langle Q \rangle = \frac{1}{Z_s} \sum_{s} Q(s)\, e^{-F(s)/kT}        (1.7.11)

where Q(s) is assumed to depend only on the variable s and not directly on {x,p}.

The problem with the evaluation of either Eq. (1.7.8) or Eq. (1.7.11) is that the Boltzmann probability does not explicitly give us the probability of a particular state. In order to find the actual probability, we need to find the partition function Z. To calculate Z we need to perform a sum over all states of the system, which is computationally impossible. Indeed, if we were able to calculate Z, then, as discussed in Section 1.3, we would know the free energy and all the other thermodynamic properties of the system. So a prescription that relies upon knowing the actual value of the probability doesn't help us. However, it turns out that we don't need to know the actual probability in order to construct a dynamics for the system, only the relative probabilities of particular states. The relative probability of two states, P(s)/P(s′), is directly given by the Boltzmann probability in terms of their relative energy:

P(s)/P(s') = e^{-(F(s) - F(s'))/kT}        (1.7.12)

This is the key to Monte Carlo simulations. It is also a natural result, since a system that is evolving in time does not know global properties that relate to all of its possible states. It only knows properties that are related to the energy it has, and how this energy changes with its configuration. In classical mechanics, the change of energy with configuration would be the force experienced by a particle.

Our task is to describe a dynamics that generates a sequence of states of a system s(t) with the proper probability distribution, P(s). The classical (Newtonian) approach to dynamics implies that a deterministic dynamics exists which is responsible for generating the sequence of states of a physical system. In order to generate the equilibrium ensemble, however, there must be contact with a thermal reservoir. Energy transfer between the system and the reservoir introduces an external interaction that disrupts the system's deterministic dynamics.

We will make our task simpler by allowing ourselves to consider a stochastic Markov chain (Section 1.2) as the dynamics of the system. The Markov chain is described by the probability Ps(s′|s′′) of the system in a state s = s′′ making a transition to the state s = s′. A particular sequence s(t) is generated by starting from one configuration and choosing its successors using the transition probabilities.

The general formulation of a Markov chain includes the classical Newtonian dynamics and can also incorporate the effects of a thermal reservoir. However, it is generally convenient and useful to use a Monte Carlo simulation to evaluate averages that do not depend on the momenta, as in Eq. (1.7.11). There are some drawbacks to this approach. It limits the properties of the system whose averages can be evaluated. Systems where interactions between particles depend on their momenta cannot be easily included. Moreover, averages of quantities that depend on both the momentum and the position of particles cannot be performed. However, if the energy separates into potential and kinetic energies as follows:

E(\{x,p\}) = V(\{x\}) + \sum_{i} p_i^2/2m        (1.7.13)

then averages over all quantities that just depend on momenta (such as the kinetic energy) can be evaluated directly without need for numerical computation. These averages are the same as those of an ideal gas. Monte Carlo simulations can then be used to perform the average over quantities that depend only upon position {x}, or more generally, on position-related variables s. Thus, in the remainder of this section we focus on describing Markov chains for systems described only by position-related variables s.

As described in Section 1.2 we can think about the Markov dynamics as a dynamics of the probability rather than the dynamics of a system. Then the dynamics are specified by

P_s(s';t) = \sum_{s''} P_s(s'|s'')\, P_s(s'';t-1)        (1.7.14)

In order for the stochastic dynamics to represent the ensemble, we must have the time average over the probability distribution Ps(s′,t) equal to the ensemble probability. This is true for a long enough time average if the probability converges to the ensemble probability distribution, which is a steady-state distribution of the Markov chain:

P_s(s') = P_s(s';\infty) = \sum_{s''} P_s(s'|s'')\, P_s(s'';\infty)        (1.7.15)

Thermodynamics and stochastic Markov chains meet when we construct the Markovchain so that the Boltzmann probability, Eq. (1.7.9), is the limiting distribution.
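The convergence to the steady state of Eq. (1.7.15) can be seen by iterating Eq. (1.7.14) directly. In the C sketch below, the 3-state transition matrix is an arbitrary illustrative choice (irreducible and acyclic, with columns summing to one), not one taken from the text:

/* Sketch of the probability dynamics of Eq. (1.7.14): repeatedly apply a
   fixed transition matrix to a probability vector until it stops changing,
   giving the steady state of Eq. (1.7.15). The matrix is illustrative. */
#include <stdio.h>
#include <math.h>

#define NS 3

int main(void) {
    /* P[i][j] = Ps(s' = i | s'' = j); each column sums to one */
    double P[NS][NS] = {{0.5, 0.3, 0.2},
                        {0.3, 0.4, 0.3},
                        {0.2, 0.3, 0.5}};
    double p[NS] = {1.0, 0.0, 0.0}, pnew[NS];
    double diff = 1.0;
    int i, j;
    while (diff > 1e-12) {
        diff = 0.0;
        for (i = 0; i < NS; i++) {
            pnew[i] = 0.0;
            for (j = 0; j < NS; j++) pnew[i] += P[i][j] * p[j];
        }
        for (i = 0; i < NS; i++) {
            diff += fabs(pnew[i] - p[i]);
            p[i] = pnew[i];
        }
    }
    printf("steady state: %f %f %f\n", p[0], p[1], p[2]);
    return 0;
}

The limiting vector is independent of the starting distribution, which is the content of the theorem discussed next.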

We now make use of the Perron-Frobenius theorem (see Section 1.7.3 below), which says that a Markov chain governed by a set of transition probabilities Ps(s′|s′′) converges to a unique limiting probability distribution as long as it is irreducible and acyclic. Irreducible means that there exist possible paths between each state and all other possible states of the system. This does not mean that all states of the system are connected by nonzero transition probabilities. There can be transition probabilities that are zero. However, it must be impossible to separate the states into two sets for which there are no transitions from one set to the other. Acyclic means that the system is not ballistic—the states are not organized by the transition matrix into a ring with a deterministic flow around it. There may be currents, but they must not be deterministic. It is sufficient for there to be a single state which has a nonzero probability of making a transition to itself for this condition to be satisfied, thus it is often assumed and unstated.

We can now summarize the problem of identifying the desired Markov chain. We must construct a matrix Ps(s′|s′′) that satisfies three properties. First, it must be an allowable transition matrix. This means that it must be nonnegative, Ps(s′′|s′) ≥ 0, and satisfy the normalization condition (Eq. (1.2.4)):

\sum_{s''} P_s(s''|s') = 1        (1.7.16)

Second, it must have the desired probability distribution, Eq. (1.7.9), as a fixed point. Third, it must not be reducible—it must be possible to construct a path between any two states of the system.

These conditions are sufficient to guarantee that a long enough Markov chain will be a good approximation to the desired ensemble. There is no guarantee that the convergence will be rapid. As we have seen in Section 1.4, in the case of the glass transition, the ergodic theorem may be violated on all practical time scales for systems that are following a particular dynamics. This applies to realistic or artificial dynamics. In general such violations of the ergodic theorem, or even just slow convergence of averages, are due to energy barriers or entropy "bottlenecks" that prevent the system from reaching all possible configurations of the system in any reasonable time. Such obstacles must be determined for each system that is studied, and are sometimes but not always apparent. It should be understood that different dynamics will satisfy the conditions of the ergodic theorem over very different time scales. The equivalence of results of an average performed using two distinct dynamics is only guaranteed if they are both simulated for long enough so that each satisfies the ergodic theorem.

Our discussion here also gives some additional insights into the conditions under which the ergodic theorem applies to the actual dynamics of physical systems. We note that any proof of the applicability of the ergodic theorem to a real system requires considering the actual dynamics rather than a model stochastic process. When the ergodic theorem does not apply to the actual dynamics, then the use of a Monte Carlo simulation for performing an average must be considered carefully. It will not give the same results if it satisfies the ergodic theorem while the real system does not.

We are still faced with the task of selecting values for the transition probabilities Ps(s′|s′′) that satisfy the three requirements given above. We can simplify our search for transition probabilities Ps(s′|s′′) for use in Monte Carlo simulations by imposing the additional constraint of microscopic reversibility, also known as detailed balance:

P_s(s''|s')\, P_s(s';\infty) = P_s(s'|s'')\, P_s(s'';\infty)        (1.7.17)

This equation implies that the transition currents between two states of the system are equal and therefore cancel in the steady state, Eq. (1.7.15). It corresponds to true equilibrium, as would be present in a physical system. Detailed balance implies the steady-state condition, but is not required by it. Steady state can also include currents that do not change in time. We can prove that Eq. (1.7.17) implies Eq. (1.7.15) by summing over s′:

\sum_{s'} P_s(s''|s')\, P_s(s';\infty) = \sum_{s'} P_s(s'|s'')\, P_s(s'';\infty) = P_s(s'';\infty)        (1.7.18)

We do not yet have an explicit prescription for Ps(s′|s′′). There is still a tremendous flexibility in determining the transition probabilities. One prescription that enables direct implementation, called Metropolis Monte Carlo, is:

P_s(s'|s'') = \alpha(s'|s'') \hspace{6em} P_s(s')/P_s(s'') \ge 1,\quad s'' \ne s'
P_s(s'|s'') = \alpha(s'|s'')\, P_s(s')/P_s(s'') \hspace{2em} P_s(s')/P_s(s'') < 1,\quad s'' \ne s'
P_s(s''|s'') = 1 - \sum_{s' \ne s''} P_s(s'|s'')        (1.7.19)

These expressions specify the transition probability Ps(s′|s′′) in terms of a symmetric stochastic matrix α(s′|s′′). α(s′|s′′) is independent of the limiting equilibrium distribution. The constraint associated with the limiting distribution has been incorporated explicitly into Eq. (1.7.19). It satisfies detailed balance by direct substitution in Eq. (1.7.17), since for Ps(s′) ≥ Ps(s′′) (similarly for the opposite) we have

P_s(s''|s')\, P_s(s') = \alpha(s''|s')\, P_s(s') = \alpha(s'|s'')\, P_s(s') = \left( \alpha(s'|s'')\, P_s(s')/P_s(s'') \right) P_s(s'') = P_s(s'|s'')\, P_s(s'')        (1.7.20)

The symmetry of the matrix α(s′|s′′) is essential to the proof of detailed balance. One must often be careful in the design of specific algorithms to ensure this property. It is also important to note that the limiting probability appears in Eq. (1.7.19) only in the form of a ratio Ps(s′)/Ps(s′′) which can be given directly by the Boltzmann distribution.

To understand Metropolis Monte Carlo, it is helpful to describe a few examples. We first describe the movement of the system in terms of the underlying stochastic process specified by α(s′|s′′), which is independent of the limiting distribution. This means that the limiting distribution of the underlying process is uniform over the whole space of possible states.

A standard way to choose the matrix α(s′|s′′) is to set it to be constant for a few states s′ that are near s′′. For example, the simplest random walk is such a case, since it allows a probability of 1/2 for the system to move to the right and to the left. If s is a continuous variable, we could choose a distance r0 and allow the walker to take a step anywhere within the distance r0 with equal probability. Both the discrete and continuous random walk have d-dimensional analogs or, for a system of interacting particles, N-dimensional analogs. When there is more than one dimension, we can choose to move in all dimensions simultaneously. Alternatively, we can choose to move in only one of the dimensions in each step. For an Ising model (Section 1.6), we could allow equal probability for any one of the spins to flip.

Once we have specified the underlying stochastic process, we generate the sequence of Monte Carlo steps by applying it. However, we must modify the probabilities according to Eq. (1.7.19). This takes the form of choosing a step, but sometimes rejecting it rather than taking it. When a step is rejected, the system does not change its state. This gives rise to the third equation in Eq. (1.7.19) where the system does not move. Specifically, we can implement the Monte Carlo process according to the following prescription:

1. Pick one of the possible moves allowed by the underlying process. The selection is random from all of the possible moves. This guarantees that we are selecting it with the underlying probability α(s′|s′′).

2. Calculate the ratio of probabilities between the location we are going to, compared to the location we are coming from:

P_s(s')/P_s(s'') = e^{-(E(s') - E(s''))/kT}        (1.7.21)

If this ratio of probabilities is greater than one, which means the energy is lower where we are going, the step is accepted. This gives the probability for the process to occur as α(s′|s′′), which agrees with the first line of Eq. (1.7.19). However, if this ratio is less than one, we accept it with a probability given by the ratio. For example, if the ratio is 0.6, we accept the move 60% of the time. If the move is rejected, the system stays in its original location. Thus, if the energy where we are trying to go is higher, we do not accept it all the time, only some of the time. The likelihood that we accept it decreases the higher the energy is.
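As a concrete illustration of this prescription, here is a minimal C sketch for a one-dimensional Ising chain; the chain with J = 1, periodic boundaries, and the parameters N, STEPS and T (in units where k = 1) are assumptions chosen for the example:

/* Minimal sketch of Metropolis Monte Carlo for a 1D Ising chain with
   periodic boundaries. J = 1 and temperature in units of J/k are
   illustrative assumptions. The underlying process alpha picks a single
   spin at random; acceptance follows Eq. (1.7.19). */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 100             /* number of spins (illustrative) */
#define STEPS 100000      /* number of Monte Carlo steps (illustrative) */

int main(void) {
    int s[N], i, t, m = 0;
    double T = 2.0;       /* temperature (illustrative) */

    for (i = 0; i < N; i++) s[i] = 1;      /* start with all spins UP */

    for (t = 0; t < STEPS; t++) {
        i = rand() % N;                    /* underlying move: pick a spin */
        /* energy change if spin i is flipped (nearest-neighbor coupling) */
        double dE = 2.0 * s[i] * (s[(i + 1) % N] + s[(i + N - 1) % N]);
        /* accept with probability min(1, exp(-dE/kT)); otherwise reject
           and leave the state unchanged, as in the third line of Eq. (1.7.19) */
        if (dE <= 0.0 || (double)rand() / RAND_MAX < exp(-dE / T))
            s[i] = -s[i];
    }
    for (i = 0; i < N; i++) m += s[i];
    printf("magnetization per spin: %f\n", (double)m / N);
    return 0;
}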

The Metropolis Monte Carlo prescription makes logical sense. It tends to move the system to regions of lower energy. This must be the case in order for the final distribution to satisfy the Boltzmann probability. However, it also allows the system to climb up in energy so that it can reach, with a lower probability, states of higher energy. The ability to climb in energy also enables the system to get over barriers such as the one in the two-state system in Section 1.4.

For the Ising model, we can see that the Monte Carlo dynamics that uses all single spin flips as its underlying stochastic process is not the same as the Glauber dynamics (Section 1.6.7), but is similar. Both begin by selecting a particular spin. After selection of the spin, the Monte Carlo will set the spin to be the opposite with a probability:

\min(1, e^{-(E(1) - E(-1))/kT})        (1.7.22)

This means that if the energy is lower for the spin to flip, it is flipped. If it is higher, it may still flip with the indicated probability. This is different from the Glauber prescription, which sets the selected spin to UP or DOWN according to its equilibrium probability (Eq. (1.6.61)–Eq. (1.6.63)). The difference between the two schemes can be shown by plotting the probability of a selected spin being UP as a function of the energy difference between UP and DOWN, E+ = E(1) − E(−1) (Fig. 1.7.2). The Glauber dynamics prescription is independent of the starting value of the spin. The Metropolis Monte Carlo prescription is not. The latter causes more changes, since the spin is more likely to flip. Unlike the Monte Carlo prescription, the Glauber dynamics explicitly requires knowledge of the probabilities themselves. For a single spin flip in an Ising system this is fine, because there are only two possible states and the probabilities depend only on E+. However, this is difficult to generalize when a system has many more possible states.
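The comparison plotted in Fig. 1.7.2 can be tabulated directly. The following C sketch is our illustration of the two update rules just described (the Glauber probability is the equilibrium value 1/(1 + e^{E+/kT}); the Metropolis result depends on the starting spin):

/* Sketch comparing the single-spin update probabilities of Fig. 1.7.2 as
   functions of E+ = E(1) - E(-1), measured in units of kT. */
#include <stdio.h>
#include <math.h>

/* probability the selected spin ends UP under Glauber dynamics */
static double glauber_up(double eplus) {
    return 1.0 / (1.0 + exp(eplus));
}

/* probability the spin ends UP under Metropolis, given its current value */
static double metropolis_up(double eplus, int s_old) {
    if (s_old == 1)                      /* UP -> DOWN flip costs -E+ */
        return 1.0 - fmin(1.0, exp(eplus));
    return fmin(1.0, exp(-eplus));       /* DOWN -> UP flip costs +E+ */
}

int main(void) {
    double eplus;
    for (eplus = -4.0; eplus <= 4.0; eplus += 2.0)
        printf("E+/kT=%5.1f glauber=%.3f metro(up)=%.3f metro(down)=%.3f\n",
               eplus, glauber_up(eplus),
               metropolis_up(eplus, 1), metropolis_up(eplus, -1));
    return 0;
}

At E+ = 0 the Glauber rule leaves the spin UP half the time, while the Metropolis rule always flips it: the Metropolis dynamics causes more changes, as noted above.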


There is a way to generalize further the use of Monte Carlo by recognizing that we do not even have to use the correct equilibrium probability distribution when generating the time series. The generalized expression for an arbitrary probability distribution P′(s) is:

\langle f(s) \rangle_{P(s)} = \int \frac{f(s) P(s)}{P'(s)}\, P'(s)\, ds = \frac{1}{N} \sum_{s:P'(s)}^{N} \frac{f(s) P(s)}{P'(s)}        (1.7.23)

The subscript P(s) indicates that the average assumes that s has the probability distribution P(s) rather than P′(s). This equation generalizes Eq. (1.7.5). The problem with this expression is that it requires that we know explicitly the probabilities P(s) and P′(s). This can be remedied. We illustrate for a specific case, where we use the Boltzmann distribution at one temperature to evaluate the average at another temperature:

\langle f(s) \rangle_{P(s)} = \frac{1}{N} \sum_{s:P'(s)}^{N} \frac{f(s) P(s)}{P'(s)} = \frac{Z'}{Z}\, \frac{1}{N} \sum_{s:P'(s)}^{N} f(s)\, e^{-E(s)(1/kT - 1/kT')}        (1.7.24)

The ratio of partition functions can be directly evaluated as an average:


Figure 1.7.2 Illustration of the difference between Metropolis Monte Carlo and Glauber dynamics for the update of a spin in an Ising model. The plots show the probability Ps(1;t) of a spin being UP at time t, as a function of E+/kT. The Glauber dynamics probability does not depend on the starting value of the spin. There are two curves for the Monte Carlo probability, for s(t − 1) = 1 and s(t − 1) = −1. ❚


\frac{Z}{Z'} = \frac{\sum_s e^{-E(s)/kT}}{\sum_s e^{-E(s)/kT'}} = \frac{\sum_s e^{-E(s)(1/kT - 1/kT')}\, e^{-E(s)/kT'}}{\sum_s e^{-E(s)/kT'}} = \left\langle e^{-E(s)(1/kT - 1/kT')} \right\rangle_{P'(s)} = \frac{1}{N} \sum_{s:P'(s)}^{N} e^{-E(s)(1/kT - 1/kT')}        (1.7.25)

Thus we have the expression:

\langle f(s) \rangle_{P(s)} = \sum_{s:P'(s)}^{N} f(s)\, e^{-E(s)(1/kT - 1/kT')} \Big/ \sum_{s:P'(s)}^{N} e^{-E(s)(1/kT - 1/kT')}        (1.7.26)

This means that we can obtain the average at various temperatures using only a single Monte Carlo simulation. However, the whole point of using the ensemble average is to ensure that the average converges rapidly. This may not happen if the ensemble temperature T′ is much different from the temperature T. On the other hand, there are circumstances where the function f(s) may have an energy dependence that makes it better to perform the average using an ensemble that is not the equilibrium ensemble.
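As a sketch of how Eq. (1.7.26) might be used in practice, the C function below reweights a stored trajectory. The arrays f[] and Es[] are assumed to hold f(s(t)) and E(s(t)) recorded along a Monte Carlo run performed at temperature T′; the values in main() are placeholders for illustration only:

/* Sketch of temperature reweighting, Eq. (1.7.26): samples generated at
   kT' are reused to estimate an average at kT. Placeholder data only. */
#include <stdio.h>
#include <math.h>

double reweighted_average(const double *f, const double *Es, int N,
                          double kT, double kTp) {
    double num = 0.0, den = 0.0;
    int n;
    for (n = 0; n < N; n++) {
        double w = exp(-Es[n] * (1.0 / kT - 1.0 / kTp));  /* reweighting factor */
        num += f[n] * w;
        den += w;
    }
    return num / den;     /* ratio of the two sums in Eq. (1.7.26) */
}

int main(void) {
    double f[4]  = {1.0, 2.0, 1.5, 1.2};   /* placeholder f(s(t)) values */
    double Es[4] = {0.5, 1.0, 0.8, 0.6};   /* placeholder E(s(t)) values */
    printf("<f> at kT=0.8 from samples at kT'=1.0: %f\n",
           reweighted_average(f, Es, 4, 0.8, 1.0));
    return 0;
}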

The approach of Monte Carlo simulations to the study of statistical averages ensures that we do not have to be concerned that the dynamics we are using for the system is a real dynamics. The result is the same for a broad class of artificial dynamics. The generality provides a great flexibility; however, this is also a limitation. We cannot use the Monte Carlo dynamics to study dynamics. We can only use it to perform statistical averages. Must we be resigned to this limitation? The answer, at least in part, is no. The reason is rooted in the central limit theorem. For example, the implementations of Metropolis Monte Carlo and the Glauber dynamics are quite different. We know that in the limit of long enough times, the distribution of configurations generated by both is the same. We expect that since each of them flips only one spin, if we are interested in changes in many spins, the two should give comparable results in the sense of the central limit theorem. This means that aside from an overall scale factor, the time evolution of the distribution of probabilities for long times is the same. Since we already know that the limiting distribution is the same in both cases, we are asserting that the approach to this limiting distribution, which is the long time dynamics, is the same.

The claim that for a large number of steps all dynamics is the same is not true about all possible Monte Carlo dynamics. If we allow all of the spins in an Ising model to change their values in one step of the underlying dynamics α(s′|s′′), then this step would be equivalent to many steps in a dynamics that allows only one spin to flip at a time. In order for two different dynamics to give the same results, there are two types of constraints that are necessary. First, both must have similar kinds of allowed steps. Specifically, we define steps to the naturally proximate configurations as local moves. As long as the Monte Carlo allows only local moves, the long time dynamics should be the same. Such dynamics correspond to a local diffusion in the space of possible configurations of the system. More generally, two different dynamics should be the same if configuration changes that require many steps in one also require many steps in the other. The second type of constraint is related to symmetries of the problem. A lack of bias in the random walk was necessary to guarantee that the Gaussian distribution resulted from a generalized random walk in Section 1.2. For systems with more than one dimension, we must also ensure that there is no relative bias between motion in different directions.

We can think about Monte Carlo dynamics as diffusive dynamics of a system that interacts frequently with a reservoir. There are properties of more realistic dynamics that are not reproduced by such configuration Monte Carlo simulations. Correlations between steps are not incorporated because of the assumptions underlying Markov chains. This rules out ballistic motion, and exact or approximate momentum conservation. Momentum conservation can be included if both position and momentum are included as system coordinates. The method called Brownian dynamics incorporates both ballistic and diffusive dynamics in the same simulation. However, if correlations in the dynamics of a system have a shorter range than the motion we are interested in, momentum conservation may not matter to results that are of interest, and conventional Monte Carlo simulations can be used directly.

In summary, Monte Carlo simulations are designed to reproduce an ensemble rather than the dynamics of a particular system. As such, they are ideally suited to investigating the equilibrium properties of thermodynamic systems. However, Monte Carlo dynamics with local moves often mimic the dynamics of real systems. Thus, Monte Carlo simulations may be used to investigate the dynamics of systems when they are appropriately designed. This property will be used in Chapter 5 to simulate the dynamics of long polymers.

There is a flip side to the design of Monte Carlo dynamics to simulate actual dynamics. If our objective is the traditional objective of a Monte Carlo simulation, of obtaining an ensemble average, then the ability to simulate dynamics may not be an advantage. In some systems, the real dynamics is slow and we would prefer to speed up the process. This can often be done by knowingly introducing nonlocal moves that displace the state of the system by large distances in the space of conformations. Such nonlocal Monte Carlo dynamics have been designed for various systems. In particular, both local and nonlocal Monte Carlo dynamics for the problem of polymer dynamics will be described in Chapter 5.

Question 1.7.1 In order to perform Monte Carlo simulations, we must be able to choose steps at random and accept or reject steps with a certain probability. These operations require the availability of random numbers. We might think of the source of these random numbers as a thermal reservoir. Computers are specifically designed to be completely deterministic. This means that inherently there is no randomness in their operation. To obtain random numbers in a computer simulation requires a deterministic algorithm that generates a sequence of numbers that look random but are not random. Such sequences are called pseudo-random numbers. Random numbers should not be correlated to each other. However, using pseudo-random numbers, if we start a program over again we must get exactly the same sequence of numbers. The difficulties associated with the generation of random numbers are central to performing Monte Carlo computer simulations. If we assume that we have random numbers, and they are not really uncorrelated, then our results may very well be incorrect. Nevertheless, pseudo-random numbers often give results that are consistent with those expected from random numbers.

There are a variety of techniques to generate pseudo-random numbers. Many of these pseudo-random number generators are designed to provide, with equal "probability," an integer between 0 and the maximal integer possible. The maximum integer used by a particular routine on a particular machine should be checked before using it in a simulation. Some use a standard short integer which is represented by 16 bits (2 bytes). One bit represents the unused sign of the integer. This leaves 15 bits for the magnitude of the number. The pseudo-random number thus ranges up to 2^15 − 1 = 32767. An example of a routine that provides pseudo-random integers is the subroutine rand() in the ANSI C library, which is executed using a line such as:

k = rand();        (1.7.27)

The following three questions discuss how to use such a pseudo-random number generator. Assume that it provides a standard short integer.

1. Explain how to use a pseudo-random number generator to choose a move in a Metropolis Monte Carlo simulation, Eq. (1.7.19).

2. Explain how to use a pseudo-random number generator to accept or reject a move in a Metropolis Monte Carlo simulation, Eq. (1.7.19).

3. Explain how to use a pseudo-random number generator to provide values of x with a probability P(x) for x in the interval [0,1]. Hint: Use two pseudo-random numbers every step.

Solution 1.7.1

1. Given the necessity of choosing one out of M possible moves, we create a one-to-one mapping between the M moves and the integers {0, . . . , M − 1}. If M is smaller than 2^15 we can use the value of k = rand() to determine which move is taken next. If k is larger than M − 1, we don't make any move. If M is much smaller than 2^15 then we can use only some of the bits of k. This avoids making many unused calls to rand(). Fewer bits can be obtained using a modulo operation. For example, if M = 10 we might use k modulo 16. We could also ignore values above 32759, and use k modulo 10. This also causes each move to occur with equal frequency. However, a standard word of caution about using only a few bits is that we shouldn't use the lowest order bits (i.e., the units, twos and fours bits), because they tend to be more correlated than the higher order bits. Thus it may be best first to divide k by a small number, like 8 (or equivalently to shift the bits to the right), if it is desired to use fewer bits. If M is larger than 2^15 it is necessary to use more than one call to rand() (or a random number generator that provides a 4-byte integer) so that all possible moves are accounted for.

2. Given the necessity of determining whether to accept a move with the probability P, we compare 2^15 P with a number given by k = rand(). If the former is bigger we accept the move, and if it is smaller we reject the move.

3. One way to do this is to generate two random numbers r1 and r2. Dividing both by 32767 (or 2^15), we use the first random number to be the location in the interval x = r1/32767. However, we use this location only if the second random number r2/32767 is smaller than P(x). If the random number is not used, we generate two more and proceed. This means that we will use the position x with a probability P(x) as desired. Because it is necessary to generate many random numbers that are rejected, this method for generating numbers for use in performing the integral Eq. (1.7.3) is only useful if evaluations of the function f(x) are much more costly than random number generation. ❚
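The three operations can be collected into a short C sketch. Two assumptions are made for illustration: it uses the full range RAND_MAX of the library rand() in place of the 15-bit generator discussed above, and part 3 samples a placeholder density P(x) = 2x, which must be compared against its maximum value Pmax because this P(x) is not bounded by one:

/* Sketch of the three rand() uses from Question 1.7.1; RAND_MAX plays
   the role of 2^15 - 1 in the text, and P(x) = 2x is a placeholder. */
#include <stdio.h>
#include <stdlib.h>

/* 1. choose one of M possible moves with (nearly) equal probability */
int choose_move(int M) {
    return (rand() >> 3) % M;   /* shift drops the more correlated low bits */
}

/* 2. accept a move with probability p */
int accept(double p) {
    return rand() < p * RAND_MAX;
}

/* 3. rejection sampling: return x in [0,1] with placeholder density
      P(x) = 2x; requires P(x) <= Pmax on [0,1] */
double sample(void) {
    double Pmax = 2.0;
    for (;;) {
        double x  = (double)rand() / RAND_MAX;   /* first random number */
        double r2 = (double)rand() / RAND_MAX;   /* second random number */
        if (r2 * Pmax < 2.0 * x) return x;       /* keep x with prob P(x)/Pmax */
    }
}

int main(void) {
    printf("move: %d accepted: %d x: %f\n",
           choose_move(10), accept(0.6), sample());
    return 0;
}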

Question 1.7.2 To compare the errors that arise from conventional numerical integration and Monte Carlo sampling, we return to Eq. (1.7.4) and Eq. (1.7.5) in this and the following question. We choose two integrals that can be evaluated analytically and for which the errors can also be evaluated analytically.

Evaluate two examples of the integral ∫P(x)f(x)dx over the interval x ∈ [−1,1]. For the first example (1) take f(x) = 1, and for the second (2) f(x) = x. In both cases assume the probability distribution is an exponential

P(x) = A\, e^{-\lambda x} = \left[ \frac{\lambda}{e^{\lambda} - e^{-\lambda}} \right] e^{-\lambda x}        (1.7.28)

where the normalization constant A is given by the expression in square brackets.

Calculate the two integrals exactly (analytically). Then evaluate approximations to the integrals using sums over N equally spaced points, Eq. (1.7.4). These sums can also be evaluated analytically. To improve the result of the sum, you can use Simpson's rule. This modifies Eq. (1.7.4) only by subtracting 1/2 of the value of the integrand at the first and last points. The errors in evaluation of the same integral by Monte Carlo simulation are to be calculated in Question 1.7.3.

Solution 1.7.2

1. The value of the integral of P(x) is unity as required by normalization. If we use a sum over equally spaced points we would have:


A \int_{-1}^{1} e^{-\lambda x}\, dx \approx \frac{A}{M} \sum_{n=-M}^{M} e^{-\lambda(n/M)} = \frac{A}{M} \sum_{n=-M}^{M} a^n        (1.7.29)

where we used the temporary definition a = e^{-\lambda/M} to obtain

\frac{A}{M} \sum_{n=-M}^{M} a^n = \frac{A}{M}\, \frac{a^{M+1} - a^{-M}}{a - 1} = \frac{A}{M}\, \frac{e^{\lambda} e^{\lambda/M} - e^{-\lambda}}{e^{\lambda/M} - 1}        (1.7.30)

Expanding the answer in powers of \lambda/M gives:

A \int_{-1}^{1} e^{-\lambda x}\, dx \approx A\, \frac{e^{\lambda} - e^{-\lambda}}{\lambda} + A\, \frac{e^{\lambda} + e^{-\lambda}}{2M} + A\lambda\, \frac{e^{\lambda} - e^{-\lambda}}{12M^2} + \ldots = 1 + \frac{\lambda}{2M \tanh(\lambda)} + \frac{\lambda^2}{12M^2} + \ldots        (1.7.31)

The second term can be eliminated by noting that the sum could be evaluated using Simpson's rule by subtracting 1/2 of the contribution of the end points. Then the third term gives an error of \lambda^2/12M^2. This is the error in the numerical approximation to the average of f(x) = 1.

2. For f(x) = x the exact integral is:

A \int_{-1}^{1} x\, e^{-\lambda x}\, dx = -A \frac{d}{d\lambda} \int_{-1}^{1} e^{-\lambda x}\, dx = -A \frac{d}{d\lambda} \left[ \frac{e^{\lambda} - e^{-\lambda}}{\lambda} \right] = -\coth(\lambda) + \frac{1}{\lambda}        (1.7.32)

while the sum is:

A \int_{-1}^{1} x\, e^{-\lambda x}\, dx \approx \frac{A}{M^2} \sum_{n=-M}^{M} n\, e^{-\lambda(n/M)} = \frac{A}{M^2} \sum_{n=-M}^{M} n\, a^n = \frac{A}{M^2}\, a \frac{d}{da} \frac{a^{M+1} - a^{-M}}{a - 1} = \frac{A}{M^2}\, \frac{(M+1) a^{M+1} + M a^{-M}}{a - 1} - \frac{A}{M^2}\, \frac{a\,(a^{M+1} - a^{-M})}{(a - 1)^2}        (1.7.33)

With some assistance from Mathematica, the expansion to second order in \lambda/M is:

= \frac{1}{\lambda} - \coth(\lambda) - \frac{\lambda}{2M} - \frac{\lambda\,(1 + \lambda \coth(\lambda))}{12M^2} + \ldots        (1.7.34)

The first two terms are the correct result. The third term can be seen to be eliminated using Simpson's rule. The fourth term is the error. ❚

Question 1.7.3 Estimate the errors in performing the same integrals as in Question 1.7.2 using a Monte Carlo ensemble sampling with N terms as in Eq. (1.7.5). It is not necessary to evaluate the integrals to evaluate the errors.

Solution 1.7.3

1. The errors in performing the integral for f(x) = 1 are zero, since the Monte Carlo sampling would be given by the expression:

\langle 1 \rangle_{P(s)} = \frac{1}{N} \sum_{s:P(s)}^{N} 1 = 1        (1.7.35)

One way to think about this result is that Monte Carlo takes advantage of the normalization of the probability, which the technique of summing the integrand over equally spaced points cannot do. This knowledge makes this integral trivial, but it is also of use in performing other integrals.

2. To evaluate the error for the integral over f(x) = x we use an argument based on the sampling error of different regions of the integral. We break up the domain [−1,1] into q regions of size ∆x = 2/q. Each region is assumed to have a significant number of samples. The number of these samples is approximately given by:

N\, P(x)\, \Delta x        (1.7.36)

If this were the exact number of samples as q increased, then the integral would be exact. However, since we are picking the points at random, there will be a deviation in the number of these from this ideal value. The typical deviation, according to the discussion in Section 1.2 of random walks, is the square root of this number. Thus the error in the sum

\int_{-1}^{1} P(x) f(x)\, dx \approx \frac{1}{N} \sum_{s:P(s)}^{N} f(x)        (1.7.37)

from a particular interval ∆x is

(N\, P(x)\, \Delta x)^{1/2}\, f(x)        (1.7.38)

Since this error could have either a positive or negative sign, we must take the square root of the sum of the squares of the error in each region to give us the total error:

\frac{1}{N} \sqrt{ \sum N\, P(x)\, \Delta x\, f(x)^2 } \approx \frac{1}{N} \sqrt{ N \int P(x)\, f(x)^2\, dx } = \frac{1}{\sqrt{N}} \left( \int P(x)\, f(x)^2\, dx \right)^{1/2}        (1.7.39)


For f(x) = x the integral in the square root is:

\int A\, e^{-\lambda x}\, f(x)^2\, dx = \int A\, e^{-\lambda x}\, x^2\, dx = A\, \frac{d^2}{d\lambda^2} \left[ \frac{e^{\lambda} - e^{-\lambda}}{\lambda} \right] = \frac{2}{\lambda^2} - \frac{2 \coth(\lambda)}{\lambda} + 1        (1.7.40)

The approach of Monte Carlo is useful when the exponential is rapidly decaying. In this case, \lambda >> 1, and we keep only the third term and have an error that is just of magnitude 1/\sqrt{N}. Comparing with the sum over equally spaced points from Question 1.7.2, we see that the error in Monte Carlo is independent of \lambda for large \lambda, while it grows for the sum over equally spaced points. This is the crucial advantage of the Monte Carlo method. However, for a fixed value of \lambda we also see that the error is more slowly decreasing with N than the sum over equally spaced points. So when a large number of samples is possible, the sum over equally spaced points is more rapidly convergent. ❚
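These estimates can be checked numerically. The C sketch below (our illustration, not part of the question) samples x from the distribution of Eq. (1.7.28) by inverting its cumulative distribution, and compares the Monte Carlo estimate of the f(x) = x average with the exact result of Eq. (1.7.32); lambda and N are illustrative choices:

/* Sketch: Monte Carlo estimate of <x> under P(x) of Eq. (1.7.28), sampled
   by inverse-CDF, versus the exact -coth(lambda) + 1/lambda of Eq. (1.7.32).
   lambda and N are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
    double lambda = 5.0, sum = 0.0;
    int N = 100000, n;
    for (n = 0; n < N; n++) {
        double u = (double)rand() / RAND_MAX;
        /* inverse CDF of the normalized exponential on [-1,1] */
        double x = -log(exp(lambda) - u * (exp(lambda) - exp(-lambda))) / lambda;
        sum += x;        /* f(x) = x; P(x) enters through the sampling */
    }
    double exact = -1.0 / tanh(lambda) + 1.0 / lambda;
    printf("Monte Carlo: %f  exact: %f\n", sum / N, exact);
    return 0;
}

The deviation between the two printed values is of the order 1/\sqrt{N}, essentially independent of lambda, as argued above.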

Question 1.7.4 How would the discrete nature of the integer random numbers described in Question 1.7.1 affect the ensemble sampling? Answer qualitatively. Is there a limit to the accuracy of the integral in this case?

Solution 1.7.4 The integer random numbers introduce two additional sources of error, one due to the sampling interval along the x axis and the other due to the imperfect approximation of P(x). In the limit of a large number of samples, each of the possible values along the x axis would be sampled equally. Thus, the ensemble sum would reduce to a sum of the integrand over equally spaced points. The number of points is given by the largest integer used (e.g., 2^15). This limits the accuracy accordingly. ❚

1.7.3 Perron-Frobenius theorem

The Perron-Frobenius theorem is tied to our understanding of the ergodic theorem and the use of Monte Carlo simulations for the representation of ensemble averages. The theorem only applies to a system with a finite space of possible states. It says that a transition matrix that is irreducible must ultimately lead to a stable limiting probability distribution. This distribution is unique, and thus depends only on the transition matrix and not on the initial conditions. The Perron-Frobenius theorem assumes an irreducible matrix, so that starting from any state, there is some path by which it is possible to reach every other state of the system. If this is not the case, then the theorem can be applied to each subset of states whose transition matrix is irreducible.

In a more general form than we will discuss, the Perron-Frobenius theorem deals with the effect of matrix multiplication when all of the elements of a matrix are positive. We will consider it only for the case of a transition matrix in a Markov chain, which also satisfies the normalization condition, Eq. (1.7.16). In this case, the proof of the Perron-Frobenius theorem follows from the statement that there cannot be any eigenvalues of the transition matrix that are larger than one. Otherwise there would be a vector that would increase everywhere upon matrix multiplication. This is not possible, because probability is conserved. Thus if the probability increases in one place it must decrease someplace else, and tend toward the limiting distribution.

A difficulty in the proof of the theorem arises from dealing with the case in which there are deterministic currents through the system: e.g., ballistic motion in a circular path. An example for a two-state system would be

P(1|1) = 0        P(1|-1) = 1
P(-1|1) = 1       P(-1|-1) = 0        (1.7.41)

In this case, a system in the state s = +1 goes into s = −1, and a system in the state s = −1 goes into s = +1. The limiting behavior of this Markov chain is of two probabilities that alternate in position without ever settling down into a limiting distribution. An example with three states would be

P(1|1) = 0        P(1|2) = 1        P(1|3) = 1
P(2|1) = .5       P(2|2) = 0        P(2|3) = 0        (1.7.42)
P(3|1) = .5       P(3|2) = 0        P(3|3) = 0

Half of the systems with s = 1 make transitions to s = 2 and half to s = 3. All systems with s = 2 and s = 3 make transitions to s = 1. In this case there is also a cyclical behavior that does not disappear over time. These examples are special cases, and the proof shows that they are special. It is sufficient, for example, for there to be a single state where there is some possibility of staying in the same state. Once this is true, these examples of cyclic currents do not apply and the system will settle down into a limiting distribution.
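Both behaviors are easy to observe numerically. In the C sketch below, the first chain is the two-state cycle of Eq. (1.7.41); the second is a variant with a single self-transition (the variant matrix is our illustrative assumption). The first alternates forever, while the second converges:

/* Sketch: the cyclic chain of Eq. (1.7.41) versus an acyclic variant with
   one self-transition. The variant matrix is an illustrative assumption. */
#include <stdio.h>

int main(void) {
    /* index 0 is the state s = +1, index 1 is s = -1; start in s = +1 */
    double p[2] = {1.0, 0.0}, q[2] = {1.0, 0.0};
    int t;
    for (t = 0; t < 10; t++) {
        /* cyclic chain: P(1|-1) = 1, P(-1|1) = 1, Eq. (1.7.41) */
        double p0 = p[1], p1 = p[0];
        p[0] = p0; p[1] = p1;
        /* acyclic variant: P(1|1) = 0.5, so some probability stays put */
        double q0 = 0.5 * q[0] + q[1], q1 = 0.5 * q[0];
        q[0] = q0; q[1] = q1;
        printf("t=%2d cyclic: (%.3f, %.3f)  acyclic: (%.3f, %.3f)\n",
               t + 1, p[0], p[1], q[0], q[1]);
    }
    return 0;
}

The cyclic probabilities swap forever between (1,0) and (0,1); the acyclic ones settle toward (2/3, 1/3).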

We will prove the Perron-Frobenius theorem in a few steps enumerated below. The proof is provided for completeness and reference, and can be skipped without significant loss for the purposes of this book. The proof relies upon properties of the eigenvectors and eigenvalues of the transition matrix. The eigenvectors need not always be positive, real or satisfy the normalization condition that is usually applied to probability distributions, P(s). Thus we use v(s) to indicate complex vectors that have a value at every possible state of the system.

Given an irreducible real nonnegative matrix (P(s′|s) ≥ 0) satisfying

\sum_{s'} P(s'|s) = 1        (1.7.43)

we have:

1. Applying P(s′|s) cannot increase the value of all elements of a nonnegative vector, v(s′) ≥ 0:

\min_{s'} \left[ \frac{1}{v(s')} \sum_{s} P(s'|s)\, v(s) \right] \le 1        (1.7.44)

To avoid infinities, we can assume that the minimization only includes s′ such that v(s′) ≠ 0.


Proof: Assume that Eq. (1.7.44) is not true. In this case

\sum_{s} P(s'|s)\, v(s) > v(s')        (1.7.45)

for all v(s′) ≠ 0, which implies

\sum_{s'} \sum_{s} P(s'|s)\, v(s) > \sum_{s'} v(s')        (1.7.46)

Using Eq. (1.7.43), the left is the same as the right and the inequality is impossible.

2. The magnitude of eigenvalues of P(s′|s) is not greater than one.

Proof: Let v(s) be an eigenvector of P(s′|s) with eigenvalue λ:

\sum_{s} P(s'|s)\, v(s) = \lambda\, v(s')        (1.7.47)

Then:

\sum_{s} P(s'|s)\, |v(s)| \ge |\lambda|\, |v(s')|        (1.7.48)

This inequality follows because each term in the sum on the left has been made positive. If all terms started with the same phase, then equality holds. Otherwise, inequality holds. Comparing Eq. (1.7.48) with Eq. (1.7.44), we see that |λ| ≤ 1.

If |λ| = 1, then equality must hold in Eq. (1.7.48), and this implies that |v(s)|, the vector whose elements are the magnitudes of v(s), is an eigenvector with eigenvalue 1. Steps 3–5 show that there is one such vector which is strictly positive (greater than zero) everywhere.

3. P(s′|s) has an eigenvector with eigenvalue λ = 1. We use the notation v1(s) for this vector.

Proof: The existence of such an eigenvector follows from the existence of an eigenvector of the transpose matrix with eigenvalue λ = 1. Eq. (1.7.43) implies that the vector v(s) = 1 (one everywhere) is an eigenvector of the transpose matrix with eigenvalue λ = 1. Thus v1(s) exists, and by step 2 we can take it to be real and nonnegative, v1(s) ≥ 0. We can, however, assume more, as the following shows.

4. An eigenvector of P(s′|s) with eigenvalue 1 must be strictly positive, v1(s) > 0.

Proof: Define a new Markov chain given by the transition matrix

Q(s'|s) = (P(s'|s) + \delta_{s,s'})/2        (1.7.49)

Applying Q(s′|s) N − 1 times to any vector v1(s) ≥ 0 must yield a vector that is strictly positive. This follows because P(s′|s) is irreducible. Starting with unit probability at any one value of s, after N − 1 steps we will move some probability everywhere. Also, by the construction of Q(s′|s), any s which has a nonzero probability at one time will continue to have a nonzero probability at all later times. By linear superposition, this applies to any initial probability distribution. It also applies to any unnormalized vector v1(s) ≥ 0. Moreover, if v1(s) is an eigenvector of P(s′|s) with eigenvalue one, then it is also an eigenvector of Q(s′|s) with the same eigenvalue. Since applying Q(s′|s) to v1(s) changes nothing, applying it N − 1 times also changes nothing. We have just proven that v1(s) must be strictly positive.

5. There is only one linearly independent eigenvector of P(s′|s) with eigenvalue λ = 1.

Proof: Assume there are two such eigenvectors: v1(s) and v2(s). Then we can make a linear combination c1v1(s) + c2v2(s), so that at least one of the elements is zero and others are positive. This linear combination is also an eigenvector of P(s′|s) with eigenvalue λ = 1, which violates step 4. Thus there is exactly one eigenvector of P(s′|s) with eigenvalue λ = 1, v1(s):

\sum_{s} P(s'|s)\, v_1(s) = v_1(s')        (1.7.50)

6. Either P(s′|s) has only one eigenvalue with |λ| = 1 (in which case λ = 1), or it can be written as a cyclical flow.

Proof: Steps 2 and 5 imply that all eigenvectors of P(s′|s) with eigenvalues satisfying |λ| = 1 can be written as:

v_i(s) = D_i(s)\, v_1(s) = e^{i\theta_i(s)}\, v_1(s)        (1.7.51)

As indicated, Di(s) is a vector with elements of magnitude one, |Di(s)| = 1. We can write

\sum_{s} P(s'|s)\, D_i(s)\, v_1(s) = \lambda_i\, D_i(s')\, v_1(s')        (1.7.52)

There cannot be any terms in the sum on the left of Eq. (1.7.52) that add terms of different phase. If there were, then we would have a smaller magnitude than adding the absolute values, which would not agree with Eq. (1.7.50). Thus we can assign all of the elements of Di(s) into groups that have the same phase. P(s′|s) cannot allow transitions to occur from any two of these groups into the same group. Since P(s′|s) is irreducible, the only remaining possibility is that the different groups are connected in a ring with the first mapped onto the second, and the second mapped onto the third, and so on until we return to the first group. In particular, if there are any transitions between a site and itself this would violate the requirements and we could have no complex eigenvalues.

7. A Markov chain governed by an irreducible transition matrix, which has only one eigenvector v1(s) with |λ| = 1, has a limiting distribution over long enough times which is proportional to this eigenvector. Using P^t(s′|s) to represent the effect of applying P(s′|s) t times, we must prove that:

\lim_{t \to \infty} v(s';t) = \lim_{t \to \infty} \sum_{s} P^t(s'|s)\, v(s) = c\, v_1(s')        (1.7.53)

for v(s) ≥ 0. The coefficient c depends on the normalization of v(s) and v1(s). If both are normalized so that the total probability is one, then conservation of probability implies that c = 1.


Proof: We write the matrix P(s′|s) in the Jordan normal form using a similarity transformation. In matrix notation:

P = S^{-1} J S        (1.7.54)

J consists of a block diagonal matrix. Each of the block matrices along the diagonal is of the form

N = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}        (1.7.55)

where λ is an eigenvalue of P. In this block the only nonzero elements are λs on the diagonal, and 1s just above the diagonal.

Since P^t = S^{-1} J^t S, we consider J^t, which consists of diagonal blocks N^t. We prove that N^t → 0 as t → ∞ for |λ| < 1. This can be shown by evaluating explicitly the matrix elements. The q-th element above the diagonal of N^t is:

\binom{t}{q}\, \lambda^{t-q}        (1.7.56)

which vanishes as t → ∞.

Since λ = 1 is an eigenvalue with only one eigenvector, there must be one 1 × 1 block along the diagonal of J for the eigenvalue 1. Then J^t as t → ∞ has only one nonzero element, which is a 1 on the diagonal. Eq. (1.7.53) follows, because applying the matrix P^t always results in the unique column of S^{-1} that corresponds to the nonzero diagonal element of J^t. By our assumptions, this column must be proportional to v1(s). This completes our proof and discussion of the Perron-Frobenius theorem.

1.7.4 Minimization

At low temperatures, a thermodynamic system in equilibrium will be found in its minimum energy configuration. For this and other reasons, it is often useful to identify the minimum energy configuration of a system without describing the full ensemble. There are also many other problems that can be formulated as minimization or optimization problems.

Minimization problems are often described in a d-dimensional space of continuous variables. When there is only a single valley in the parameter space of the problem, there are a variety of techniques that can be used to obtain this minimum. They may be classified into direct search and gradient-based techniques. In this section we focus on the single-valley problem. In Section 1.7.5 we will discuss what happens when there is more than one valley.

Direct search techniques involve evaluating the energy at various locations and closing in on the minimum energy. In one dimension, search techniques can be very effective. The key to a search is bracketing the minimum energy. Then each energy evaluation is used to geometrically shrink the possible domain of the minimum.

We start in one dimension by looking at the energy at two positions s1 and s2 that are near each other. If the left of the two positions s1 is higher in energy E(s1) > E(s2), then the minimum must be to its right. This follows from our assumption that there is only a single valley—the energy rises monotonically away from the minimum and therefore cannot be lower than E(s2) anywhere to the left of s1. Evaluating the energy at a third location s3 to the right of s2 further restricts the possible locations of the minimum. If E(s3) is also greater than the middle energy location, E(s3) > E(s2), then the minimum must lie between s1 and s3. Thus, we have successfully bracketed the minimum. Otherwise, we have that E(s3) < E(s2), and the minimum must lie to the right of s2. In this case we look at the energy at a location s4 to the right of s3. This process is continued until the energy minimum is bracketed. To avoid taking many steps to the right, the size of the steps to the right can be taken to be an increasing geometric series, or may be based on an extrapolation of the function using the values that are available.

Once the energy minimum is bracketed, the segment is bisected again and again to locate the energy minimum. This is an iterative process. We describe a simple version of this process that can be easily implemented (see the sketch below). An iteration begins with three locations s1 < s2 < s3. The values of the energy at these locations satisfy E(s1), E(s3) > E(s2). Thus the minimum is between s1 and s3. We choose a new location s4, which in even steps is s4 = (s1 + s2)/2 and in odd steps is s4 = (s2 + s3)/2. Then we eliminate either s1 or s3. The one that is eliminated is the one next to s2 if E(s2) > E(s4), or the one next to s4 if E(s2) < E(s4). The remaining three locations are relabeled to be s1, s2, s3 for the next step. Iterations stop when the distance between s1 and s3 is smaller than an error tolerance which is set in advance. More sophisticated versions of this algorithm use improved methods for selecting s4 that accelerate the convergence.
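The following minimal sketch implements this bisection procedure; the test energy function, the initial bracket, and the tolerance are illustrative choices, not from the text.

def bisect_minimum(energy, s1, s2, s3, tol=1e-8):
    """Shrink a bracket s1 < s2 < s3 with E(s1), E(s3) > E(s2) by bisection."""
    step = 0
    while s3 - s1 > tol:
        # Even steps bisect (s1, s2); odd steps bisect (s2, s3).
        s4 = 0.5 * (s1 + s2) if step % 2 == 0 else 0.5 * (s2 + s3)
        if energy(s4) < energy(s2):
            # s4 becomes the new middle point; drop the endpoint adjacent to s2.
            s1, s2, s3 = (s1, s4, s2) if s4 < s2 else (s2, s4, s3)
        else:
            # s2 remains the middle point; drop the endpoint adjacent to s4.
            s1, s2, s3 = (s4, s2, s3) if s4 < s2 else (s1, s2, s4)
        step += 1
    return s2

# Example: a single-valley energy with its minimum at s = 1.
print(bisect_minimum(lambda s: (s - 1.0)**2, -2.0, 0.0, 3.0))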

In higher-dimensional spaces, direct search can be used. However, mapping a multidimensional energy surface is much more difficult. Moreover, the exact logic that enables an energy minimum to be bracketed within a particular domain in one dimension is not possible in higher-dimensional spaces. Thus, techniques that make use of a gradient of the function are typically used, even if the gradient must be numerically evaluated. The most common gradient-based minimization techniques include steepest descent, second-order and conjugate gradient methods.

Steepest descent techniques involve taking steps in the direction of most rapid descent, as determined by the gradient of the energy. This is the same as using a first-order expansion of the energy to determine the direction of motion toward lower energy. Illustrating first in one dimension, we start from a position s1 and write the expansion as:

E(s) = E(s1) + (s − s1) dE(s)/ds|s1 + O((s − s1)^2)   (1.7.57)


We now take a step in the direction of the minimum by setting:

s2 = s1 − c dE(s)/ds|s1   (1.7.58)

From the expansion we see that for small enough c, E(s2) must be smaller than E(s1). The problem is to carefully select c so that we do not go too far. If we go too far we may reach beyond the energy minimum and increase the energy. We also do not want to make such a small step that many steps will be needed to reach the minimum. We can think of the sequence of configurations we pick as a time sequence, and the process we use to pick the next location as an iterative map. Then the minimum energy configuration is a fixed point of the iterative map given by Eq. (1.7.58). From a point near to the minimum we can have all of the behaviors described in Section 1.1—stable (converging) and unstable (diverging), both of these with or without alternation from side to side of the minimum. Of particular relevance is the discussion in Question 1.1.12 that suggests how c may be chosen to stabilize the iterative map and obtain rapid convergence.
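As a rough illustration of this fixed-point picture, the following sketch iterates Eq. (1.7.58) for the quadratic energy E(s) = s^2/2 (an invented example), for which the map is s → (1 − c)s; the step sizes are chosen only to display the stable, alternating, and unstable regimes.

def gradient_step(s, c):
    # One step of Eq. (1.7.58) for E(s) = s**2/2, where dE/ds = s.
    return s - c * s

for c in (0.5, 1.5, 2.5):
    s = 1.0
    for _ in range(20):
        s = gradient_step(s, c)
    print(c, s)
# c = 0.5 converges monotonically; c = 1.5 converges while alternating
# from side to side of the minimum; c = 2.5 diverges.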

When s is a multidimensional variable, Eq. (1.7.57) and Eq. (1.7.58) both continue to apply as long as the derivative is replaced by the gradient:

E(s) = E(s1) + (s − s1)⋅∇s E(s)|s1 + O((s − s1)^2)   (1.7.59)

s2 = s1 − c ∇s E(s)|s1   (1.7.60)

Since the direction opposite to the gradient is the direction in which the energy decreases most rapidly, this is known as a steepest descent technique. For the multidimensional case it is more difficult to choose a consistent value of c, since the behavior of the function may not be the same in different directions. The value of c may be chosen "on the fly" by making sure that the new energy is smaller than the old. If the current value of c gives a value E(s2) which is larger than E(s1), then c is reduced. We can improve upon this by looking along the direction of the gradient and considering the energy to be a function of c:

E(s1 − c ∇s E(s)|s1)   (1.7.61)

Then c can be chosen by finding the actual minimum in this direction using the search technique that works well in one dimension.
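A minimal sketch of steepest descent with this one-dimensional search along the gradient, assuming numpy; the bracketing strategy, tolerances, and test energy are illustrative assumptions.

import numpy as np

def line_minimize(f, tol=1e-10):
    """Minimize f(c) for c >= 0: bracket the minimum by geometric growth
    of the step (as described above), then shrink the bracket."""
    a, b = 0.0, 1e-6
    while f(2 * b) < f(b):
        b *= 2          # geometric series of steps to bracket the minimum
    b *= 2              # the minimum now lies within [a, b]
    while b - a > tol * (1 + b):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if f(m1) < f(m2):
            b = m2
        else:
            a = m1
    return 0.5 * (a + b)

def steepest_descent(energy, grad, s, steps=100):
    """Repeat Eq. (1.7.60), choosing c by minimizing Eq. (1.7.61)."""
    for _ in range(steps):
        g = grad(s)
        if np.linalg.norm(g) < 1e-12:
            break
        c = line_minimize(lambda c: energy(s - c * g))
        s = s - c * g
    return s

# Example: an anisotropic single-valley energy E(s) = s_x^2 + 10 s_y^2.
energy = lambda s: s[0]**2 + 10 * s[1]**2
grad = lambda s: np.array([2 * s[0], 20 * s[1]])
print(steepest_descent(energy, grad, np.array([3.0, 1.0])))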

Gradient techniques work well when different directions in the energy have the same behavior in the vicinity of the minimum energy. This means that the second derivative in different directions is approximately the same. If the second derivatives are very different in different directions, then the gradient technique tends to bounce back and forth perpendicular to the direction in which the second derivative is very small, without making much progress toward the minimum (Fig. 1.7.3). Improvements of the gradient technique fall into two classes. One class of techniques makes direct use of the second derivatives, the other does not. If we expand the energy to second order at the present best guess for the minimum energy location s1 we have

E(s) = E(s1) + (s − s1)⋅∇s E(s)|s1 + (s − s1)⋅∇s∇s E(s)|s1⋅(s − s1) + O((s − s1)^3)   (1.7.62)



Setting the gradient of this expression to zero gives the next approximation for the minimum energy location s2 as:

s2 = s1 − (1/2)[∇s∇s E(s)|s1]^(−1)⋅∇s E(s)|s1   (1.7.63)

This, in effect, gives a better description of the value of c for Eq. (1.7.60), which turns out to be a matrix inversely related to the second-order derivatives. Steps are large in directions in which the second derivative is small. If the second derivatives are not easily available, approximate second derivatives are used that may be improved upon as the minimization is being performed. Because of the need to evaluate the matrix of second-order derivatives and invert the matrix, this approach is not often convenient. In addition, the use of second derivatives assumes that the expansion is valid all the way to the minimum energy. For many minimization problems, this is not valid enough to be a useful approximation. Fortunately, there is a second approach called the conjugate gradient technique that often works as well and sometimes better.
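A sketch of one such second-order step, assuming numpy and an available matrix of second derivatives; note that with the expansion convention of Eq. (1.7.62), where the quadratic term carries no explicit factor of 1/2, the step of Eq. (1.7.63) is identical to the usual Newton step s2 = s1 − H^(−1)⋅∇E(s)|s1 with H the true second-derivative matrix.

import numpy as np

def second_order_step(grad, hessian, s):
    # Newton step: equivalent to Eq. (1.7.63) with H the true Hessian.
    return s - np.linalg.solve(hessian(s), grad(s))

# For a quadratic well the step lands on the minimum in one move,
# however different the second derivatives are in different directions.
grad = lambda s: np.array([2 * s[0], 20 * s[1]])
hessian = lambda s: np.array([[2.0, 0.0], [0.0, 20.0]])
print(second_order_step(grad, hessian, np.array([3.0, 1.0])))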

Conjugate gradient techniques make use of the gradient but are designed to avoid the difficulties associated with long narrow wells, where the steepest descent techniques result in oscillations. This is done by starting from a steepest descent in the first step of the minimization. In the second step, the displacement is taken to be along a direction that does not include the direction taken in the first step. Explicitly, let vi be the direction taken in the ith step; then the first two directions would be:

v1 = −∇s E(s)|s1
v2 = −∇s E(s)|s2 + v1 (v1⋅∇s E(s)|s2)/(v1⋅v1)   (1.7.64)

This ensures that v2 is orthogonal to v1. Subsequent directions are made orthogonal to some number of previous steps. The use of orthogonal directions avoids much of the problem of bouncing back and forth in the energy well.
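In practice, the refinement of this idea that is usually implemented makes successive directions conjugate—orthogonal in the metric of the second-derivative matrix—rather than simply orthogonal. The following is a minimal sketch of that standard conjugate gradient iteration for a quadratic well E(s) = s⋅A⋅s/2, assuming numpy; it is the production version of the construction, not the literal formula of Eq. (1.7.64).

import numpy as np

def conjugate_gradient(A, s, n_steps):
    """Minimize E(s) = s.A.s/2; each direction is conjugate to the last."""
    r = -(A @ s)          # negative gradient
    v = r.copy()          # first step: steepest descent
    for _ in range(n_steps):
        alpha = (r @ r) / (v @ A @ v)   # exact minimum along v
        s = s + alpha * v
        r_new = r - alpha * (A @ v)
        v = r_new + ((r_new @ r_new) / (r @ r)) * v
        r = r_new
    return s

# A long narrow well: an n-dimensional quadratic well converges in n steps.
A = np.diag([2.0, 20.0])
print(conjugate_gradient(A, np.array([3.0, 1.0]), 2))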

Monte Carlo simulation can also be used to find minimum energy configurations if the simulations are done at zero temperature. A zero-temperature Monte Carlo means that the steps taken always reduce the energy of the system. This approach works not only for continuous variables, but also for discrete variables like those in the Ising model. For the Ising model, the zero-temperature Monte Carlo described above



Figure 1.7.3 Illustration of the difficulties in finding a minimum energy by steepest descent when the second derivative is very different in different directions. The steps tend to oscillate and do not make progress toward the minimum along the flat direction. ❚


and the zero-temperature Glauber dynamics are the same. Every selected spin is placed in its low-energy orientation—aligned with the local effective field.
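A minimal sketch of such a zero-temperature update for a one-dimensional Ising chain with ferromagnetic coupling J = 1 (the chain length and update count are arbitrary choices):

import random

def zero_temperature_mc(spins, n_updates, J=1.0):
    """Align each selected spin with its local effective field."""
    N = len(spins)
    for _ in range(n_updates):
        i = random.randrange(N)
        field = J * (spins[(i - 1) % N] + spins[(i + 1) % N])
        if field != 0:
            spins[i] = 1 if field > 0 else -1
        # A zero field leaves the spin unchanged: both states are degenerate.
    return spins

random.seed(0)
spins = [random.choice([-1, 1]) for _ in range(20)]
print(zero_temperature_mc(spins, 2000))
# The energy only decreases; the chain relaxes into aligned domains, and
# any remaining domain walls are frozen -- a local, not global, minimum.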

None of these techniques are suited to finding the minimum energy configuration if there are multiple energy minima, and we do not know if we are located near the correct minimum energy location. One way to address this problem is to start from various initial configurations and to look for the local minimum nearby. By doing this many times it might be possible to identify the global minimum energy. This works when there are only a few different energy minima. There are no techniques that guarantee finding the global minimum energy for an arbitrary energy function E(s). However, by using Monte Carlo simulations that are not at T = 0, a systematic approach called simulated annealing has been developed to try to identify the global minimum.

1.7.5 Simulated annealing

Simulated annealing was introduced relatively recently as an approach to finding the global minimum when the energy or other optimization function contains many local minima. The approach is based on the physical process of heating a system and cooling it down slowly. The minimum energy for many simple materials is a crystal. If a material is heated to a liquid or vapor phase and cooled rapidly, the material does not crystallize. It solidifies as a glass or amorphous solid. On the other hand, if it is cooled slowly, crystals may form. If the material is formed out of several different kinds of atoms, the cooling may also result in phase separation into particular compounds or atomic solids. The separated compounds are lower in energy than a rapidly cooled mixture.

Simulated annealing works in much the same way. A Monte Carlo simulation is started at a high temperature. Then the temperature is lowered according to a cooling schedule until the temperature is so low that no additional movements are likely. If the procedure is effective, the final energy should be the lowest energy of the simulation. We could also keep track of the energy during the simulation and take the lowest value, and the configuration at which the lowest value was reached.
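A minimal sketch of this procedure for binary variables, assuming a Metropolis acceptance rule and a geometric cooling schedule (both common choices; the text does not fix them), with an invented random-coupling energy as the test problem:

import math
import random

def simulated_annealing(energy, s, T0=5.0, factor=0.99, n_sweeps=500):
    """Metropolis Monte Carlo while slowly lowering the temperature,
    keeping track of the lowest-energy configuration encountered."""
    T, E = T0, energy(s)
    best_s, best_E = list(s), E
    for _ in range(n_sweeps):
        for _ in range(len(s)):
            i = random.randrange(len(s))
            s[i] = -s[i]                      # trial move: flip one spin
            dE = energy(s) - E
            if dE <= 0 or random.random() < math.exp(-dE / T):
                E += dE                       # accept the move
                if E < best_E:
                    best_s, best_E = list(s), E
            else:
                s[i] = -s[i]                  # reject: undo the flip
        T *= factor                           # cooling schedule
    return best_s, best_E

random.seed(1)
n = 12
J = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
energy = lambda s: -sum(J[i][j] * s[i] * s[j]
                        for i in range(n) for j in range(i + 1, n))
print(simulated_annealing(energy, [random.choice([-1, 1]) for _ in range(n)])[1])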

In general, simulated annealing improves upon methods that find only a local minimum energy, such as steepest descent, discussed in the previous section. For some problems, the improvement is substantial. Even if the minimum energy that is found is not the absolute minimum energy of the system, it may be close. For example, in problems where there are many configurations that have roughly the same low energy, simulated annealing may find one of the low-energy configurations.

However, simulated annealing does not work well for all problems, and for some problems it fails completely. It is also true that annealing of physical materials does not always result in the lowest-energy conformation. Many materials, even when cooled slowly, result in polycrystalline materials, disordered solids and mixtures. When it is important for technological reasons to reach the lowest energy state, special techniques are often used. For example, the best crystal we know how to make is silicon. In order to form a good silicon crystal, it is grown using careful nonuniform cooling. A single crystal can be gradually pulled from a liquid that solidifies only on the surfaces of the existing crystal. Another technique for forming crystals is growth


from the vapor phase, where atoms are deposited on a previously formed crystal that serves as a template for the continuing growth. The difficulties inherent in obtaining materials in their lowest energy state are also apparent in simulations.

In Section 1.4 we considered the cooling of a two-state system as a model of a glass transition. We can think about this simulation to give us clues about why both physical and simulated annealing sometimes fail to find low energy states of the system. We saw that using a constant cooling rate leaves some systems stuck in the higher energy well. When there are many such high energy wells then the system will not be successful in finding a low energy state. The problem becomes more difficult if the height of the energy barrier between the two wells is much larger than the energy difference between the upper and lower wells. In this case, at higher temperatures the system does not care which well it is in. At low temperatures, when it would like to be in the lower energy well, it cannot overcome the barrier. How well the annealing works in finding a low energy state depends on whether we care about the energy scale characteristic of the barrier, or characteristic of the energy difference between the two minima.

There is another characteristic of the energy that can help or hurt the effectiveness of simulated annealing. Consider a system where there are many local minimum energy states (Fig. 1.7.4). We can think about the effect of high temperatures as placing the system in one of the many wells of the energy minima. These wells are called basins of attraction. A system in a particular basin of attraction will go into the minimum energy configuration of the basin if we suddenly cool to zero temperature. We



Figure 1.7.4 Schematic plot of a system energy E(s) as a function of a system coordinate s. In simulated annealing, the location of a minimum energy is sought by starting from a high-temperature Monte Carlo and cooling the system to a low temperature. At the high temperature the system has a high kinetic energy and explores all of the possible configurations. As the temperature is cooled it descends into one of the wells, called basins of attraction, and cannot escape. Finally, when the temperature is very low it loses all kinetic energy and sits in the bottom of the well. Minima with larger basins of attraction are more likely to capture the system. Simulated annealing works best when the lowest-energy minima have the largest basins of attraction. ❚


also can see that the gradual cooling in simulated annealing will result in low energy states if the size of the basin of attraction increases with the depth of the well. This means that at high temperatures the system is more likely to be in the basin of attraction of a lower energy minimum. Thus, simulated annealing works best when the energy varies in the space in such a way that deep energy minima also have large basins of attraction. This is sometimes but not always true, both in physical systems and in mathematical optimization problems.

Another way to improve the performance of simulated annealing is to introduce nonlocal Monte Carlo steps. If we understand the characteristics of the energy, we can design steps that take us through energy barriers. The problem with this approach is that if we don't know the energy surface well enough, then moving around in the space by arbitrary nonlocal steps will result in attempts to move to locations where the energy is high. These steps will be rejected by the Monte Carlo and the nonlocal moves will not help. An example where nonlocal Monte Carlo moves can help is treatments of low-energy atomic configurations in solids. Nonlocal steps can allow atoms to move through each other, switching their relative positions, instead of trying to move gradually around each other.

Finally, for the success of simulated annealing, it is often necessary to design the cooling schedule carefully. Generally, the slower the cooling, the more likely the simulation will end up in a low energy state. However, given a finite amount of computer and human time, it is impossible to allow an arbitrarily slow cooling. Often there are particular temperatures where the cooling rate is crucial. This happens at phase transitions, such as at the liquid-to-solid phase boundary. If we know of such a transition, then we can cool rapidly down to the transition, cool very slowly in its vicinity and then speed up thereafter. The most difficult problems are those where there are barriers of varying heights, leading to a need to cool slowly at all temperatures.

For some problems the cooling rate should be slowed as the temperature becomes lower. One way to achieve this is to use a logarithmic cooling schedule. For example, we set the temperature T(t) at time step t of the Monte Carlo to be:

T(t) = T0 / ln(t / t0 + 1) (1.7.65)

where t0 and T0 are parameters that must be chosen for the particular problem. In Question 1.7.5 we show that for the two-state system, if kT0 > (EB − E1), then the system will always relax into its ground state.
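As a sketch, Eq. (1.7.65) can be written as a schedule function and handed to an annealing loop like the one above (the parameter values are placeholders that must be tuned to the problem):

import math

def logarithmic_schedule(t, T0=2.0, t0=10.0):
    # Eq. (1.7.65): the cooling slows down as the temperature becomes lower.
    return T0 / math.log(t / t0 + 1.0)

print([round(logarithmic_schedule(t), 3) for t in (10, 100, 1000, 10000)])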

Question 1.7.5 Show that by using a logarithmic cooling schedule, Eq. (1.7.65), where kT0 > (EB − E1), the two-state system of Section 1.4 always relaxes into the ground state. To simplify the problem, consider an incremental time ∆t during which the temperature is fixed. Show that the system will still relax to the equilibrium probability over this incremental time, even at low temperatures.

Solution 1.7.5 We write the solution of the time evolution during the incremental time ∆t from Eq. (1.4.45) as:

P(1; t + ∆t) − P(1; ∞) = (P(1; t) − P(1; ∞)) e^(−∆t/τ(t))   (1.7.66)


where P(1; ∞) is the equilibrium value of the probability for the temperature T(t), and τ(t) is the relaxation time at that temperature. In order for relaxation to occur we must have e^(−∆t/τ(t)) << 1, or equivalently:

∆t/τ(t) >> 1   (1.7.67)

We calculate τ(t) from Eq. (1.4.44):

1/τ(t) = ν(e^(−(EB − E1)/kT(t)) + e^(−(EB − E−1)/kT(t))) > ν e^(−(EB − E1)/kT(t)) = ν(t/t0 + 1)^(−α)   (1.7.68)

where we have substituted Eq. (1.7.65) and defined α = (EB − E1)/kT0. We make the reasonable assumption that we start our annealing at a high temperature where relaxation is not a problem. Then by the time we get to the low temperatures that are of interest, t >> t0, so:

1/τ(t) > ν 2^(−α) (t/t0)^(−α)   (1.7.69)

and, taking the incremental time to grow in proportion to the elapsed time, ∆t ~ t:

∆t/τ(t) > ν 2^(−α) t0^α t^(1−α)   (1.7.70)

For α < 1 the right-hand side increases with time, and thus the relaxation improves with time according to Eq. (1.7.67). If relaxation occurs at higher temperatures, it will continue to occur at all lower temperatures despite the increasing relaxation time. ❚

1.8 Information

Ultimately, our ability to quantify complexity (How complex is it?) requires a quantification of information (How much information does it take to describe it?). In this section, we discuss information. We will also need the computation theory described in Section 1.9 to discuss complexity in Chapter 8. A quantitative theory of information was developed by Shannon to describe the problem of communication: specifically, how much information can be communicated through a transmission channel (e.g., a telephone line) with a specified alphabet of letters and a rate at which letters can be transmitted. The simplest example is a binary alphabet consisting of two characters (digits) with a fixed rate of binary digits (bits) per second. However, the theory is general enough to describe quite arbitrary alphabets, letters of variable duration such as are involved in Morse code, or even continuous sound with a specified bandwidth. We will not consider many of the additional applications; our objective is to establish the basic concepts.

1.8.1 The amount of information in a message

We start by considering the information content of a string of digits s = (s1s2...sN). One might naively expect that information is contained in the state of each digit. However, when we receive a digit, we not only receive information about what the digit is, but


also what the digit is not. Let us assume that a digit in the string of digits we receive is the number 1. How much information does this provide? We can contrast two different scenarios—binary and hexadecimal digits:

1. There were two possibilities for the number, either 0 or 1.

2. There were sixteen possibilities for the number {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F}.

In which of these did the "1" communicate more information? Since the first case provides us with the information that it is "not 0," while the second provides us with the information that it is "not 0," "not 2," "not 3," etc., the second provides more information. Thus there is more information in a digit that can have sixteen states than in a digit that can have only two states. We can quantify this difference if we consider a binary representation of hexadecimal digits {0000, 0001, 0010, 0011, …, 1111}. It takes four binary digits to represent one hexadecimal digit. The hexadecimal number 1 is represented as 0001 in binary form and uses four binary digits. Thus a hexadecimal 1 contains four times as much information as a binary 1.

We note that the amount of information does not depend on the particular value that is taken by the digit. For hexadecimal digits, consider the case of a digit that has the value 5. Is there any difference in the amount of information given by the 5 than if it were 1? No, either number contains the same amount of information.

This illustrates that information is actually contained in the distinction between the state of a digit compared to the other possible states the digit may have. In order to quantify the concept of information, we must specify the number of possible states. Counting states is precisely what we did when we defined the entropy of a system in Section 1.3. We will see that it makes sense to define the information content of a string in the same way as the entropy—the logarithm of the number Ω of possible states of the string:

I(s) = log2(Ω)   (1.8.1)

By convention, the information is defined using the logarithm base two. Thus, the information contained in a single binary digit which has two possible states is log2(2) = 1. More generally, the number of possible states in a string of N bits, with each bit taking one of two values (0 or 1), is 2^N. Thus the information in a string of N bits is (in what follows the function log( ) will be assumed to be base two):

I(s) = log(2^N) = N   (1.8.2)

Eq. (1.8.2) says that each bit provides one unit of information. This is consistent with the intuition that the amount of information grows linearly with the length of the string. The logarithm is essential, because the number of possible states grows exponentially with the length of the string, while the information grows linearly.

It is important to recognize that the definition of information we have given assumes that each of the possible realizations of the string has equal a priori probability. We use the phrase a priori to emphasize that this refers to the probability prior to receipt of the string—once the string has arrived there is only one possibility.


To think about the role of probability we must discuss further the nature of the message that is being communicated. We construct a scenario involving a sender and a receiver of a message. In order to make sure that the recipient of the message could not have known the message in advance (so there is information to communicate), we assume that the sender of the information is sending the result of a random occurrence, like the flipping of a coin or the throwing of a die. To enable some additional flexibility, we assume that the random occurrence is the drawing of a ball from a bag. This enables us to construct messages that have different probabilities. To be specific, we assume there are ten balls in the bag numbered from 0 to 9. All of them are red except the ball marked 0, which is green. The person communicating the message only reports if the ball drawn from the bag is red (using the digit 1) or green (using the digit 0). The recipient of the message is assumed to know about the setup. If the recipient receives the number 0, he then knows exactly which ball was selected, and all that were not selected. However, if he receives a 1, this provides less information, because he only knows that one of nine was selected, not which one. We notice that the digit 1 is nine times as likely to occur as the digit 0. This suggests that a higher-probability digit contains less information than a lower-probability digit.

We generalize the definition of the information content of a string of digits to allow for the possibility that different strings have different probabilities. We assume that the string is one of an ensemble of possible messages, and we define the information as:

I(s) = −log(P(s)) (1.8.3)

where P(s) is the probability of the occurrence of the message s in the ensemble. Note that in the case of equal a priori probability, P(s) = 1/Ω, Eq. (1.8.3) reduces to Eq. (1.8.1). The use of probabilities in the definition of information makes sense in one of two cases: (1) the recipient knows the probabilities that represent the conventions of the transmission, or (2) a large number of independent messages are sent, and we are considering the information communicated by one of them. Then we can approximate the probability of a message by its proportion of appearance among the messages sent. We will discuss these points in greater detail later.

Question 1.8.1 Calculate the information, according to Eq. (1.8.3), that is provided by a single digit in the example given in the text of drawing red and green balls from a bag.

Solution 1.8.1 For the case of a 0, the information is the same as that of a decimal digit:

I(0) = −log(1/10) ≈ 3.32 (1.8.4)

For the case of a 1 the information is

I(1) = −log(9/10) ≈ 0.152 (1.8.5) ❚
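These values are easy to check numerically; a quick sketch:

import math

def information(p):
    # Eq. (1.8.3): information of a message received with probability p.
    return -math.log2(p)

print(information(1 / 10))   # the green ball (digit 0): ~3.32 bits
print(information(9 / 10))   # a red ball (digit 1):     ~0.152 bits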

We can specialize the definition of information in Eq. (1.8.3) to a message s = (s1s2...sN) composed of individual characters (bits, hexadecimal characters, ASCII characters, decimals, etc.) that are completely independent of each other


(for example, each corresponding to the result of a separate coin toss). This means that the total probability of the message is the product of the probabilities of the characters, P(s) = ∏i P(si). Then the information content of the message is given by:

I(s) = −∑i log(P(si))   (1.8.6)

If all of the characters have equal probability and there are k possible characters in the alphabet, then P(si) = 1/k, and the information content is:

I(s) = N log(k) (1.8.7)

For the case of binary digits, this reduces to Eq. (1.8.2). For other cases like the hexadecimal case, k = 16, this continues to make sense: the information I = 4N corresponds to the requirement of representing each hexadecimal digit with four bits. Note that the previous assumption of equal a priori probability for the whole string is stronger than the independence of the digits and implies it.
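A sketch of Eq. (1.8.6) for a message of independent characters (the probability table is an invented example):

import math

def message_information(message, prob):
    # Eq. (1.8.6): I(s) = -sum_i log2 P(s_i) for independent characters.
    return -sum(math.log2(prob[ch]) for ch in message)

prob = {ch: 1 / 16 for ch in "0123456789ABCDEF"}   # equal-probability hex
print(message_information("3F9A", prob))           # 16 bits = N log2(k)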

Question 1.8.2 Apply the definition of information content in Eq. (1.8.3) to each of the following cases. Assume messages consist of a total of N bits subject to the following constraints (aside from the constraints, assume equal probabilities):

1. Every even bit is 1.

2. Every (odd, even) pair of bits is either 11 or 00.

3. Every eighth bit is a parity bit (the sum modulo 2 of the previous seven bits).

Solution 1.8.2: In each case, we first give an intuitive argument, and then we show that Eq. (1.8.3) or Eq. (1.8.6) gives the same result.

1. The only information that is transferred is the state of the odd bits. This means that only half of the bits contain information. The total information is N/2. To apply Eq. (1.8.6), we see that the even bits, which always have the value 1, have a probability P(1) = 1, which contributes no information. Note that we never have to consider the case P(0) = 0 for these bits, which is good, because by the formula it would give infinite information. The odd bits with equal probabilities, P(1) = P(0) = 1/2, give an information of one for either value received.

2. Every pair of bits contains only two possibilities, giving us the equivalent of one bit of information rather than two. This means that the total information is N/2. To apply Eq. (1.8.6), we have to consider every (odd, even) pair of bits as a single character. These characters can never have the value 01 or 10, and they have the value 11 or 00 with probability P(11) = P(00) = 1/2, which gives the expected result. We will see later that there is another way to think about this example by using conditional probabilities.

3. The number of independent pieces of information is 7N/8. To see this from Eq. (1.8.6), we group each set of eight bits together and consider


them as a single character (a byte). There are only 2^7 different possibilities for each byte, and each one has equal probability according to our constraints and assumptions. This gives the desired result.

Note: Such representations are used to check for noise in transmission. If there is noise, the redundancy of the eighth bit provides additional information. The noise-dependent amount of additional information can also be quantified; however, we will not discuss it here. ❚

Question 1.8.3 Consider a transmission of English characters using an ASCII representation. ASCII characters are the conventional method for computer representation of English text, including small and capital letters, numerals and punctuation. Discuss (do not evaluate for this question) how you would determine the information content of a message. We will evaluate the information content of English in a later question.

Solution 1.8.3 In ASCII, characters are represented using eight bits. Some of the possible combinations of bits are not used at all. Some are used very infrequently. One way to determine the information content of a message is to assume a model where each of the characters is independent. To calculate the information content using this assumption, we must find the probability of occurrence of each character in a sample text. Using these probabilities, the formula Eq. (1.8.6) could be applied. However, this assumes that the likelihood of occurrence of a character is independent of the preceding characters, which is not correct. ❚
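A sketch of this single-character model (the sample text stands in for a real corpus; a serious estimate would use a large text and, per the caveat above, would still overestimate the information):

import math
from collections import Counter

def single_character_estimate(text):
    """Apply Eq. (1.8.6) with P(c) taken as the observed frequency of c."""
    counts = Counter(text)
    N = len(text)
    return -sum(n * math.log2(n / N) for n in counts.values())

sample = "the quick brown fox jumps over the lazy dog"
print(single_character_estimate(sample) / len(sample), "bits per character")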

Question 1.8.4 Assume that you know in advance that the number of ones in a long binary message is M. The total number of bits is N. What is the information content of the message? Is it similar to the information content of a message of N independent binary characters where the probability that any character is one is P(1) = M/N?

Solution 1.8.4 We count the number of possible messages with M ones and take the logarithm to obtain the information as

I = log(N!/(M!(N − M)!))   (1.8.8)

We can show that this is almost the same as the information of a message of the same length with a particular probability of ones, P(1) = M/N, by use of the first two terms of Stirling's approximation, Eq. (1.2.36). Assuming 1 << M << N (a correction to this would grow logarithmically with N and can be found using the additional terms in Eq. (1.2.36)):

I ≈ N(log(N) − 1) − M(log(M) − 1) − (N − M)(log(N − M) − 1)
  = −N[P(1) log(P(1)) + (1 − P(1)) log(1 − P(1))]   (1.8.9)


This is the information from a string of independent characters where P(1) = M/N. For such a string, the number of ones is approximately NP(1) and the number of zeros N(1 − P(1)) (see also Question 1.8.7). ❚
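A quick numerical check of this equivalence (the values of N and M are arbitrary):

import math

N, M = 1000, 300
exact = math.log2(math.comb(N, M))                             # Eq. (1.8.8)
p = M / N
approx = -N * (p * math.log2(p) + (1 - p) * math.log2(1 - p))  # Eq. (1.8.9)
print(exact, approx)   # ~876.1 vs ~881.3: equal up to a log(N) correction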

1.8.2 Characterizing sources of information

The information content of a particular message is defined in terms of the probability that it, out of all possible messages, will be received. This means that we are characterizing not just a message but the source of the message. A direct characterization of the source is not the information of a particular message, but the average information over the ensemble of possible messages. For a set of possible messages with a given probability distribution P(s) this is:

<I> = −∑s P(s) log(P(s))   (1.8.10)

If the messages are composed out of characters s = (s1s2...sN), and each character is determined independently with a probability P(si), then we can write the information content as:

<I> = −∑s [∏i P(si)] log(∏i′ P(si′)) = −∑s [∏i P(si)] ∑i′ log(P(si′))   (1.8.11)

We can move the factor in parentheses inside the inner sum and interchange the order of the summations:

<I> = −∑i′ ∑s [∏i P(si)] log(P(si′)) = −∑i′ ∑{si, i≠i′} [∏i≠i′ P(si)] ∑si′ P(si′) log(P(si′))   (1.8.12)

The latter expression results from recognizing that the sum over all possible states is a sum over all possible values of each of the letters. The sum and product can be interchanged:

∑{si, i≠i′} ∏i≠i′ P(si) = ∏i≠i′ ∑si P(si) = 1   (1.8.13)

giving the result:

<I> = −∑i′ ∑si′ P(si′) log(P(si′))   (1.8.14)

This shows that the average information content of the whole message is the average information content of each character summed over the whole character string. If the characters have the same probability distribution, this is just the average information content of an individual character times the number of characters. If all letters of the alphabet have the same probability, this reduces to Eq. (1.8.7).
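A sketch of the source average, Eq. (1.8.14), for a single character (the distributions are invented examples):

import math

def average_information(prob):
    # -sum_x P(x) log2 P(x): the average information per character.
    return -sum(p * math.log2(p) for p in prob.values() if p > 0)

print(average_information({"0": 0.5, "1": 0.5}))   # 1.0 bit (equal case)
print(average_information({"0": 0.1, "1": 0.9}))   # ~0.469 bits (biased)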


The average information content of a binary variable is given by:

< I > = −(P(1)log(P(1)) + P(0)log(P(0))) (1.8.15)

Aside from the use of a logarithm base two, this is the same as the entropy of a spin (Section 1.6) with two possible states s = ±1 (see Question 1.8.5). The maximum information content occurs when the probabilities are equal, and the information goes to zero when one of the two becomes one and the other zero (see Fig. 1.8.1). The information reflects the uncertainty in, or the lack of advance knowledge about, the value received.

Question 1.8.5 Show that the expression for the entropy S given in Eq. (1.6.16) of a set of noninteracting binary spins is the same as the information content defined in Eq. (1.8.15), aside from a normalization constant k ln(2). Consider the binary notation si = 0 to be the same as si = −1 for the spins.



Figure 1.8.1 Plots of functions related to the information content of a message with probability P. −log(P) is the information content of a single message of probability P. −Plog(P) is the contribution of this message to the average information given by the source. While the information content of a message diverges as P goes to zero, it appears less frequently so its contribution to the average information goes to zero. If there are only two possible messages, or two possible (binary) characters with probability P and 1 − P, then the average information given by the source per message or per character is given by −Plog(P) − (1 − P)log(1 − P). ❚


Solution 1.8.5 The local magnetization mi is the average value of a particular spin variable:

mi = Psi(1) − Psi(−1)   (1.8.16)

Using Psi(1) + Psi(−1) = 1 we have:

Psi(1) = (1 + mi)/2
Psi(−1) = (1 − mi)/2   (1.8.17)

Inserting these expressions into Eq. (1.8.15) and summing over a set of binary variables leads to the expression:

<I> = N − (1/2) ∑i [(1 + mi) log(1 + mi) + (1 − mi) log(1 − mi)] = S/(k ln(2))   (1.8.18)

The result is more general than this derivation suggests and will be discussed further in Chapter 8. ❚

Question 1.8.6 For a given set of possible messages, prove that the ensemble where all messages have equal probability provides the highest average information.

Solution 1.8.6 Since the sum over all probabilities is a fixed number (1), we consider what happens when we transfer some probability from one message to another. We start with the information given by

<I> = −P(s′) log(P(s′)) − P(s″) log(P(s″)) − ∑s≠s′,s″ P(s) log(P(s))   (1.8.19)

and after shifting a small probability ε > 0 from s″ to s′ we have:

<I′> = −(P(s′) + ε) log(P(s′) + ε) − (P(s″) − ε) log(P(s″) − ε) − ∑s≠s′,s″ P(s) log(P(s))   (1.8.20)

We need to expand the change in information to first nonzero order in ε. We simplify the task by using the expression:

<I′> − <I> = f(P(s′) + ε) − f(P(s′)) + f(P(s″) − ε) − f(P(s″))   (1.8.21)

where

f(x) = −x log(x)   (1.8.22)

Taking a derivative, we have

(d/dx) f(x) = −(log(x) + 1)   (1.8.23)

This gives the result:

<I′> − <I> = −ε (log(P(s′)) − log(P(s″)))   (1.8.24)


Since log(x) is a monotonically increasing function, we see that the average information increases ((<I′> − <I>) > 0) when a probability ε > 0 is transferred from a higher-probability message to a lower-probability one (P(s″) > P(s′) ⇒ −(log(P(s′)) − log(P(s″))) > 0). Thus, any change of the probabilities toward a more uniform probability distribution increases the average information. ❚

Question 1.8.7 A source produces strings of characters of length N. Each character that appears in the string is independently selected from an alphabet of characters with probabilities P(si). Write an expression for the probability P(s) of a typical string of characters. Show that this expression implies that the string gives N times the average information content of an individual character. Does this mean that every string must give this amount of information?

Solution 1.8.7 For a long string, each character will appear NP(si) times. The probability of such a string is:

P(s) = ∏si P(si)^(NP(si))   (1.8.25)

The information content is:

I(s) = −log(P(s)) = −N ∑si P(si) log(P(si))   (1.8.26)

which is N times the average information of a single character. This is the information of a typical string. A particular string might have information significantly different from this. However, as the number of characters in the string increases, by the central limit theorem (Section 1.2), the fraction of times a particular character appears (i.e., the distance traveled in a random walk divided by the total number of steps) becomes more narrowly distributed around the expected probability P(si). This means the proportion of messages whose information content differs from the typical value decreases with increasing message length. ❚

1.8.3 Correlations between characters

Thus far we have considered characters that are independent of each other. We can also consider characters whose values are correlated. We describe the case of two correlated characters. Because there are two characters, the notation must be more complete. As discussed in Section 1.2, we use the notation Ps1,s2(s′1, s′2) to denote the probability that in the same string the character s1 takes the value s′1 and the variable s2 takes the value s′2. The average information contained in the two characters is given by:

<Is1,s2> = −∑s′1,s′2 Ps1,s2(s′1, s′2) log(Ps1,s2(s′1, s′2))   (1.8.27)


Note that the notation I(s1, s2) is often used for this expression. We use <Is1,s2> because it is not a function of the values of the characters—it is the average information carried by the characters labeled by s1 and s2. We can compare the information content of the two characters with the information content of each character separately:

<Is1> = −∑s′1,s′2 Ps1,s2(s′1, s′2) log(∑s″2 Ps1,s2(s′1, s″2))   (1.8.28)

<Is2> = −∑s′1,s′2 Ps1,s2(s′1, s′2) log(∑s″1 Ps1,s2(s″1, s′2))   (1.8.29)

where the sums inside the logarithms are the single-character (marginal) probabilities Ps1(s′1) = ∑s′2 Ps1,s2(s′1, s′2) and Ps2(s′2) = ∑s′1 Ps1,s2(s′1, s′2).

It is possible to show (see Question 1.8.8) the inequalities:

<Is2> + <Is1> ≥ <Is1,s2> ≥ <Is2>, <Is1>   (1.8.30)

The right inequality means that we receive more information from both characters than from either one separately. The left inequality means that the information we receive from both characters together cannot exceed the sum of the information from each separately. It can be less if the characters are dependent on each other. In this case, receiving one character reduces the information given by the second.

The relationship between the information from a character s1 and the information from the same character after we know another character s2 can be investigated by defining a contingent or conditional probability:

Ps1,s2(s′1|s′2) = Ps1,s2(s′1, s′2) / ∑s″1 Ps1,s2(s″1, s′2)   (1.8.31)

This is the probability that s1 takes the value s′1 assuming that s2 takes the value s′2. We used this notation in Section 1.2 to describe the transitions from one value to the next in a chain of events (random walk). Here we are using it more generally. We could recover the previous meaning by writing the transition probability as Ps(s′1|s′2) = Ps(t),s(t−1)(s′1|s′2). In this section we will be concerned with the more general definition, Eq. (1.8.31).

We can find the information content of the character s1 when s2 takes the value s′2:

<Is1>s2=s′2 = −∑s′1 Ps1,s2(s′1|s′2) log(Ps1,s2(s′1|s′2))
           = −∑s′1 [Ps1,s2(s′1, s′2)/∑s″1 Ps1,s2(s″1, s′2)] [log(Ps1,s2(s′1, s′2)) − log(∑s″1 Ps1,s2(s″1, s′2))]   (1.8.32)


This can be averaged over the possible values of s2, giving us the average information content of the character s1 when the character s2 is known:

<<Is1|s2>> ≡ <<Is1>s2=s′2>
          = −∑s′2 Ps2(s′2) ∑s′1 Ps1,s2(s′1|s′2) log(Ps1,s2(s′1|s′2))
          = −∑s′1,s′2 Ps1,s2(s′1, s′2) log(Ps1,s2(s′1|s′2))   (1.8.33)

The average we have taken should be carefully understood. The unconventional double average notation is used to indicate that the two averages are of a different nature. One way to think about it is as treating the information content of a dynamic variable s1 when s2 is a quenched (frozen) random variable. We can rewrite this in terms of the information content of the two characters, and the information content of the character s2 by itself, as follows:

<<Is1|s2>> = −∑s′1,s′2 Ps1,s2(s′1, s′2) [log(Ps1,s2(s′1, s′2)) − log(∑s″1 Ps1,s2(s″1, s′2))] = <Is1,s2> − <Is2>   (1.8.34)

Thus we have:

<Is1,s2> = <Is1> + <<Is2|s1>> = <Is2> + <<Is1|s2>>   (1.8.35)

This is the intuitive result that the information content given by both characters is the same as the information content gained by sequentially obtaining the information from the characters. Once the first character is known, the second character provides only the information given by the conditional probabilities. There is no reason to restrict the use of Eq. (1.8.27)–Eq. (1.8.35) to the case where s1 is a single character and s2 is a single character. It applies equally well if s1 is one set of characters, and s2 is another set of characters.
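These relations can be verified numerically for a small joint distribution; a sketch with an invented table of joint probabilities:

import math

def info(dist):
    # Average information -sum p log2 p of a probability table.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # joint P(s1, s2)
P1 = {a: sum(p for (x, y), p in P.items() if x == a) for a in (0, 1)}
P2 = {b: sum(p for (x, y), p in P.items() if y == b) for b in (0, 1)}

I12, I1, I2 = info(P), info(P1), info(P2)
print(I1 + I2 >= I12 >= max(I1, I2))   # Eq. (1.8.30): True
print(I12 - I2)                        # <<I_{s1|s2}>>, per Eq. (1.8.34)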

Question 1.8.8 Prove the inequalities in Eq. (1.8.30).

Hints for the left inequality:

1. It is helpful to use Eq. (1.8.35).

2. Use convexity ( f(⟨x⟩) > ⟨ f(x)⟩) of the function f (x) = −xlog(x).

Solution 1.8.8 The right inequality in Eq. (1.8.30) follows from the inequality:

Ps2(s′2) = ∑s″1 Ps1,s2(s″1, s′2) > Ps1,s2(s′1, s′2)   (1.8.36)


The logarithm is a monotonically increasing function, so we can take the logarithm:

log(∑s″1 Ps1,s2(s″1, s′2)) > log(Ps1,s2(s′1, s′2))   (1.8.37)

Changing sign and averaging leads to the desired result:

<Is2> = −∑s′1,s′2 Ps1,s2(s′1, s′2) log(∑s″1 Ps1,s2(s″1, s′2)) < −∑s′1,s′2 Ps1,s2(s′1, s′2) log(Ps1,s2(s′1, s′2)) = <Is1,s2>   (1.8.38)

The left inequality in Eq. (1.8.30) may be proven from Eq. (1.8.35) and the intuitive inequality

<Is1> > <<Is1|s2>>   (1.8.39)

To prove this inequality we make use of the convexity of the function f(x) = −x log(x). Convexity of a function means that its value always lies above line segments (secants) that begin and end at points along its graph. Algebraically:

f((ax + by) / (a + b)) > (af(x) + bf(y)) / (a + b) (1.8.40)

More generally, taking a set of values of x and averaging over them gives:

f (⟨x⟩) > ⟨ f (x)⟩ (1.8.41)

Convexity of f(x) follows from the observation that

d²f/dx² = −1/(x ln(2)) < 0   (1.8.42)

for all x > 0, which is where the function f (x) is defined.

We then note the relationship:

Ps1(s′1) = ∑s′2 Ps2(s′2) Ps1,s2(s′1|s′2) = <Ps1,s2(s′1|s′2)>s2   (1.8.43)

where, to simplify the following equations, we use a subscript to indicate the average with respect to s2. The desired result follows from applying convexity as follows:

<Is1> = −∑s′1 Ps1(s′1) log(Ps1(s′1)) = ∑s′1 f(Ps1(s′1)) = ∑s′1 f(<Ps1,s2(s′1|s′2)>s2)
      > ∑s′1 <f(Ps1,s2(s′1|s′2))>s2 = −∑s′2 Ps2(s′2) ∑s′1 Ps1,s2(s′1|s′2) log(Ps1,s2(s′1|s′2)) = <<Is1|s2>>   (1.8.44)


the final equality following from the definition in Eq. (1.8.33). We can now make use of Eq. (1.8.35) to obtain the desired result. ❚

1.8.4 Ergodic sources

We consider a source that provides arbitrarily long messages, or simply continues to give characters at a particular rate. Even though the messages are infinitely long, they are still considered elements of an ensemble. It is then convenient to measure the average information per character. The characterization of such an information source is simplified if each (long) message contains within it a complete sampling of the possibilities. This means that if we wait long enough, the entire ensemble of possible character sequences will be represented in any single message. This is the same kind of property as an ergodic system discussed in Section 1.3. By analogy, such sources are known as ergodic sources. For an ergodic source, not only do the characters appear with their ensemble probabilities, but also the pairs of characters, the triples of characters, and so on.

For ergodic sources, the information from an ensemble average over all possible messages is the same as the information for a particular long string. To write this down we need a notation that allows variable-length messages. We write s^N = (s1s2...sN), where N is the length of the string. The average information content per character may be written as:

<is> = lim(N→∞) <Is^N>/N = −lim(N→∞) (1/N) ∑s^N P(s^N) log(P(s^N)) = −lim(N→∞) (1/N) log(P(s^N))   (1.8.45)

The rightmost equality is valid for an ergodic source. An example of an ergodic source is a source that provides independent characters—i.e., selects each character from an ensemble. For this case, Eq. (1.8.45) was shown in Question 1.8.7. More generally, for a source to be ergodic, long enough strings must break up into independent substrings, or substrings that are more and more independent as their length increases.

Assuming that N is large enough, we can use the limit in Eq. (1.8.45) and write:

P(s^N) ≈ 2^(−N<is>)   (1.8.46)

Thus, for large enough N, there is a set of strings that are equally likely to be generated by the source. The number of these strings is

2^(N<is>)   (1.8.47)

Since any string of characters is possible, in principle, this statement must be formally understood as saying that the total probability of all other strings becomes arbitrarily small.

If the string of characters is a Markov chain (Section 1.2), so that the probability of each character depends only on the previous character, then there are general conditions that can ensure that the source is ergodic. Similar to the discussion of Monte Carlo simulations in Section 1.7, for the source to be ergodic, the transition


probabilities between characters must be irreducible and acyclic. Irreducibility guarantees that all characters are accessible from any starting character. The acyclic property guarantees that starting from one substring, all other substrings are accessible. Thus, if we can reach any particular substring, it will appear with the same frequency in all long strings.

We can generalize the usual Markov chain by allowing the probability of a character to depend on several (n) previous characters. A Markov chain may be constructed to represent such a chain by defining new characters, where each new character is formed out of a substring of n characters. Then each new character depends only on the previous one. The essential behavior of a Markov chain that is important here is that correlations measured along the chain of characters disappear exponentially. Thus, the statistical behavior of the chain in one place is independent of what it was in the sufficiently far past. The number of characters over which the correlations disappear is the correlation length. By allowing sufficiently many correlation lengths along the string—segments that are statistically independent—the average properties of one string will be the same as any other such string.

Question 1.8.9 Consider ergodic sources that are Markov chains with two characters si = ±1 with transition probabilities:

a. P(1|1) = .999, P(−1|1) = .001, P(−1|−1) = 0.5, P(1|−1) = 0.5

b. P(1|1) = .999, P(−1|1) = .001, P(−1|−1) = 0.999, P(1|−1) = 0.001

c. P(1|1) = .999, P(−1|1) = .001, P(−1|−1) = 0.001, P(1|−1) = 0.999

d. P(1|1) = .001, P(−1|1) = .999, P(−1|−1) = 0.5, P(1|−1) = 0.5

e. P(1|1) = .001, P(−1|1) = .999, P(−1|−1) = 0.999, P(1|−1) = 0.001

f. P(1|1) = .001, P(−1|1) = .999, P(−1|−1) = 0.001, P(1|−1) = 0.999

Describe the appearance of the strings generated by each source, and (roughly) its correlation length.

Solution 1.8.9 (a) has long regions of 1s of typical length 1000. In between there are short strings of −1s of average length 2 = 1 + 1/2 + 1/4 + ... (there is a probability of 1/2 that a second character will be −1 and a probability of 1/4 that both the second and third will be −1, etc.). (b) has long regions of 1s and long regions of −1s, both of typical length 1000. (c) is like (a) except the regions of −1s are of length 1. (d) has no extended regions of 1 or −1 but has slightly longer regions of −1s. (e) inverts (c). (f) has regions of alternating 1 and −1 of length 1000 before switching to the other possibility (odd and even indices are switched). We see that the characteristic correlation length is of order 1000 in (a), (b), (c), (e) and (f) and of order 2 in (d). ❚
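The following sketch generates such a chain so these descriptions can be checked directly (source (a) is shown; the chain length is arbitrary):

import random

def markov_chain(p_stay_plus, p_stay_minus, n, s=1):
    """Two-character Markov chain defined by the probabilities of
    repeating the current character."""
    out = []
    for _ in range(n):
        out.append(s)
        stay = p_stay_plus if s == 1 else p_stay_minus
        if random.random() > stay:
            s = -s
    return out

random.seed(0)
chain = markov_chain(0.999, 0.5, 100000)   # source (a)
runs, length = [], 1
for a, b in zip(chain, chain[1:]):
    if a == b:
        length += 1
    else:
        runs.append((a, length))
        length = 1
ones = [l for v, l in runs if v == 1]
minus = [l for v, l in runs if v == -1]
print(sum(ones) / len(ones), sum(minus) / len(minus))   # ~1000 and ~2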

We have considered in detail the problem of determining the information content of a message, or the average information generated by a source, when the characteristics of the source are well defined. The source was characterized by the ensemble of possible messages and their probabilities. However, we do not usually have a well-defined characterization of a source of messages, so a more practical question is to determine the information content from the message itself. The definitions that we have provided do not guide us in determining the information of an arbitrary message. We must have a model for the source. The model must be constructed out of the information we have—the string of characters it produces. One possibility is to model the source as ergodic. An ergodic source can be modeled in two ways, as a source of independent substrings or as a generalized Markov chain where characters depend on a certain number of previous characters. In each case we construct not one, but an infinite sequence of models. The models are designed so that if the source is ergodic then the information estimates given by the models converge to give the correct information content.

There is a natural sequence of independent substring models indexed by the number of characters in the substrings n. The first model is that of a source producing independent characters with a probability specified by their frequency of occurrence in the message. The second model would be a source producing pairs of correlated characters so that every pair of characters is described by the probability given by their occurrence (we allow character pairs to overlap in the message). The third model would be that of a source producing triples of correlated characters, and so on. We use each of these models to estimate the information. The nth model estimate of the information per character given by the source is:

⟨i_s⟩_{1,n} = −lim_{N→∞} (1/n) Σ_{s_n} P̃_N(s_n) log(P̃_N(s_n))        (1.8.48)

where we indicate using the subscript 1,n that this is an estimate obtained using the first type of model (independent substring model) using substrings of length n. We also make use of an approximate probability for the substring defined as

P̃_N(s_n) = N(s_n)/(N − n + 1)        (1.8.49)

where N(s_n) is the number of times s_n appears in the string of length N. The information of the source might then be estimated as the limit n → ∞ of Eq. (1.8.48):

⟨i_s⟩ = −lim_{n→∞} lim_{N→∞} (1/n) Σ_{s_n} P̃_N(s_n) log(P̃_N(s_n))        (1.8.50)

For an ergodic source, we can see that this converges to the information of the message. The n limit converges monotonically from above. This is because the additional information provided by the (n+1)st character of s_{n+1}, given the preceding characters, is less than the information added by each previous character (see Eq. (1.8.59) below). Thus, the estimate of information per character based on s_n is higher than the estimate based on s_{n+1}. Therefore, for each value of n the estimate ⟨i_s⟩_{1,n} is an upper bound on the information given by the source.
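As a concrete illustration of Eqs. (1.8.48) and (1.8.49), the following minimal Python sketch (our own construction, not from the text) counts overlapping n-character substrings of a given string and returns the estimate ⟨i_s⟩_{1,n} in bits per character:

```python
from collections import Counter
from math import log2

def info_substring(string, n):
    """Estimate <i_s>_{1,n} of Eq. (1.8.48): the entropy of overlapping
    n-character substrings divided by n, in bits per character."""
    total = len(string) - n + 1
    counts = Counter(string[i:i + n] for i in range(total))
    # approximate probabilities P_N(s_n) = N(s_n)/(N - n + 1), Eq. (1.8.49)
    return -sum((c / total) * log2(c / total) for c in counts.values()) / n

text = "0110010101101101" * 500   # stand-in for a long message
for n in (1, 2, 3, 4):
    print(n, round(info_substring(text, n), 3))
```

For an ergodic source the printed values decrease toward the information of the source as n grows, in agreement with the monotonic convergence from above, provided the string is long enough to sample the substrings, as discussed next.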

How large does N have to be? Since we must have a reasonable sample of the occurrence of substrings in order to estimate their probability, we can only estimate probabilities of substrings that are much shorter than the length of the string. The number of possible substrings grows exponentially with n as k^n, where k is the number of possible characters. If substrings occur with roughly similar probabilities, then to estimate the probability of a substring of length n would require at least a string of length k^n characters. Thus, taking the large N limit should be understood to correspond to N greater than k^n. This is a very severe requirement. This means that to study a model of English character strings of length n = 10 (ignoring upper and lower case, numbers and punctuation) would require 26^10 ~ 10^14 characters. This is roughly the number of characters in all of the books in the Library of Congress (see Question 1.8.15).

The generalized Markov chain model assumes a particular character is dependent only on n previous characters. Since the first n characters do not provide a significant amount of information for a very long chain (N >> n), we can obtain the average information per character from the incremental information given by a character. Thus, for the nth generalized Markov chain model we have the estimate:

⟨i_s⟩_{2,n} = ⟨I_{s_n|s_{n−1}}⟩ = −lim_{N→∞} Σ_{s_{n−1}} P̃_N(s_{n−1}) Σ_{s_n} P̃(s_n|s_{n−1}) log(P̃(s_n|s_{n−1}))        (1.8.51)

where we define the approximate conditional probability using:

P̃_N(s_n|s_{n−1}) = N(s_{n−1}s_n)/N(s_{n−1})        (1.8.52)

Taking the limit n → ∞ we have an estimate of the information of the source per character:

⟨i_s⟩ = −lim_{n→∞} lim_{N→∞} Σ_{s_{n−1}} P̃_N(s_{n−1}) Σ_{s_n} P̃(s_n|s_{n−1}) log(P̃(s_n|s_{n−1}))        (1.8.53)

This also converges from above as a function of n for large enough N. For a given n, a Markov chain model takes into account more correlations than the previous independent substring model and thus gives a better estimate of the information (Question 1.8.10).
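For a concrete counterpart to Eqs. (1.8.51) and (1.8.52), here is a minimal Python sketch (our own illustration, not the book's program) that estimates ⟨i_s⟩_{2,n} from a single long string by counting (n−1)-character contexts and the characters that follow them:

```python
from collections import Counter
from math import log2

def info_markov(string, n):
    """Estimate <i_s>_{2,n} of Eq. (1.8.51): the average information of a
    character given the n-1 preceding characters, in bits per character."""
    total = len(string) - n + 1
    joint = Counter(string[i:i + n] for i in range(total))    # context + character
    ctx = Counter(string[i:i + n - 1] for i in range(total))  # contexts that are followed
    info = 0.0
    for s, c in joint.items():
        p_joint = c / total           # P(s_{n-1} s_n) = P(s_n), as in Eq. (1.8.57)
        p_cond = c / ctx[s[:-1]]      # P(s_n | s_{n-1}), Eq. (1.8.52)
        info -= p_joint * log2(p_cond)
    return info

text = "0110010101101101" * 500
for n in (1, 2, 3):
    print(n, round(info_markov(text, n), 3))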

Question 1.8.10 Prove that the Markov chain model gives a better estimate of the information for ergodic sources than the independent substring model for a particular n. Assume the limit N → ∞ so that the estimated probabilities become actual and we can substitute P̃_N → P in Eq. (1.8.48) and Eq. (1.8.51).

Solution 1.8.10 The information in a substring of length n is given by the sum of the information provided incrementally by each character, where the previous characters are known. We derive this statement algebraically (Eq. (1.8.59)) and use it to prove the desired result. Taking the N → ∞ limit in Eq. (1.8.48), we define the nth approximation using the independent substring model as:

⟨i_s⟩_{1,n} = −(1/n) Σ_{s_n} P(s_n) log(P(s_n))        (1.8.54)


and for the nth generalized Markov chain model we take the same limit in Eq. (1.8.51):

⟨i_s⟩_{2,n} = −Σ_{s_{n−1}} P(s_{n−1}) Σ_{s_n} P(s_n|s_{n−1}) log(P(s_n|s_{n−1}))        (1.8.55)

To relate these expressions to each other, follow the derivation of Eq. (1.8.34), or use it with the substitutions s_1 → s_{n−1} and s_2 → s_n, to obtain

⟨i_s⟩_{2,n} = −Σ_{s_{n−1},s_n} P(s_{n−1}s_n) [log(P(s_{n−1}s_n)) − log(Σ_{s_n} P(s_{n−1}s_n))]        (1.8.56)

Using the identities

P(s_{n−1}s_n) = P(s_n),    P(s_{n−1}) = Σ_{s_n} P(s_{n−1}s_n)        (1.8.57)

this can be rewritten as:

⟨i_s⟩_{2,n} = n⟨i_s⟩_{1,n} − (n − 1)⟨i_s⟩_{1,n−1}        (1.8.58)

This result can be summed over n′ from 1 to n (the n′ = 1 case is ⟨i_s⟩_{2,1} = ⟨i_s⟩_{1,1}) to obtain:

Σ_{n′=1}^{n} ⟨i_s⟩_{2,n′} = n⟨i_s⟩_{1,n}        (1.8.59)

Since ⟨i_s⟩_{2,n} is monotonically decreasing, and ⟨i_s⟩_{1,n} is seen from this expression to be an average over ⟨i_s⟩_{2,n′} with lower values of n′, we must have that

⟨i_s⟩_{2,n} ≤ ⟨i_s⟩_{1,n}        (1.8.60)

as desired. ❚

Question 1.8.11 We have shown that the two models—the independent substring model and the generalized Markov chain model—are upper bounds to the information in a string. How good is the upper bound? Think up an example that shows that it can be terrible for both, but better for the Markov chain.

Solution 1.8.11 Consider the example of a long string formed out of a repeating substring, for example (000000010000000100000001…). The average information content per character of this string is zero. This is because once the repeat structure has become established, there is no more information. Any model that gives a nonzero estimate of the information content per character will make a great error in its estimate of the information content of the string, which is N times as much as the information per character.

For the independent substring model, the estimate is never zero. For the Markov chain model it is nonzero until n reaches the repeat distance. A Markov model with n the same size or larger than the repeat length will give the correct answer of zero information per character. This means that even for the Markov chain model, the information estimate does not work very well for n less than the repeat distance. ❚

Question 1.8.12 Write a computer program to estimate the information in English and find the estimate. For simple, easy-to-compute estimates, use single-character probabilities, two-character probabilities, and a Markov chain model for individual characters. These correspond to the above definitions of ⟨i_s⟩_{2,1} = ⟨i_s⟩_{1,1}, ⟨i_s⟩_{1,2}, and ⟨i_s⟩_{2,2} respectively.

Solution 1.8.12 A program that evaluates the information content using single-character probabilities applied to the text (excluding equations) of Section 1.8 of this book gives an estimate of information content of 4.4 bits/character. Two-character probabilities gives 3.8 bits/character, and the one-character Markov chain model gives 3.3 bits/character. A chapter of a book by Mark Twain gives similar results. These estimates are decreasing in magnitude, consistent with the discussion in the text. They are also still quite high as estimates of the information in English per character.

The best estimates are based upon human guessing of the next character in a written text. Such experiments with human subjects give estimates of the lower and upper bounds of information content per character of English text. These are 0.6 and 1.2 bits/character. This range is significantly below the estimates we obtained using simple models. Remarkably, these estimates suggest that it is enough to give only one in four to one in eight characters of English in order for text to be decipherable. ❚
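A minimal version of the program the question asks for might look as follows (our own sketch; the input file name sample.txt is hypothetical). It computes ⟨i_s⟩_{1,1}, ⟨i_s⟩_{1,2} and ⟨i_s⟩_{2,2} from character and pair counts, using ⟨i_s⟩_{2,2} = 2⟨i_s⟩_{1,2} − ⟨i_s⟩_{1,1} from Eq. (1.8.58):

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Entropy in bits of a Counter of occurrences."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

text = open("sample.txt").read()   # hypothetical input file of English text
singles = Counter(text)                                      # single characters
pairs = Counter(text[i:i + 2] for i in range(len(text) - 1)) # overlapping pairs

h1 = entropy(singles)                        # <i_s>_{1,1} = <i_s>_{2,1}
h2 = entropy(pairs) / 2                      # <i_s>_{1,2}
h_markov = entropy(pairs) - entropy(singles) # <i_s>_{2,2}, via Eq. (1.8.58)

print(f"{h1:.2f} {h2:.2f} {h_markov:.2f} bits/character")
```

On ordinary English text this kind of counting gives values in the 3–5 bits/character range, as in the solution, well above the human-guessing bounds.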

Question 1.8.13 Construct an example illustrating how correlations can arise between characters over longer than, say, ten characters. These correlations would not be represented by any reasonable character-based Markov chain model. Is there an example of this type relevant to the English language?

Solution 1.8.13 Example 1: If we have information that is read from a matrix row by row, where the matrix entries have correlations between rows, then there will be correlations that are longer than the length of the matrix rows.

Example 2: We can think about successive English sentences as rows of a matrix. We would expect to find correlations between rows (i.e., between words found in adjacent sentences) rather than just between letters. ❚


Question 1.8.14 Estimate the amount of information in a typical book (order of magnitude is sufficient). Use the best estimate of information content per character of English text of about 1 bit per character.

Solution 1.8.14 A rough estimate can be made as follows: A 200-page novel with 60 characters per line and 30 lines per page has 4 × 10^5 characters. Textbooks can have several times this many characters. A dictionary, which is significantly longer than a typical book, might have 2 × 10^7 characters. Thus we might use an order of magnitude value of 10^6 bits per book. ❚

Question 1.8.15 Obtain an estimate of the number of characters (and thus the number of bits of information) in the Library of Congress. Assume an average of 10^6 characters per book.

Solution 1.8.15 According to information provided by the Library of Congress, there are presently (in 1996) 16 million books classified according to the Library of Congress classification system, 13 million other books at the Library of Congress, and approximately 80 million other items such as newspapers, maps and films. Thus with 10^7–10^8 book equivalents, we estimate the number of characters as 10^13–10^14. ❚

Inherent in the notion of quantifying information content is the understanding that the same information can be communicated in different ways, as long as the amount of information that can be transmitted is sufficient. Thus we can use binary, decimal, hexadecimal or typed letters to communicate both numbers and letters. Information can be communicated using any set of (two or more) characters. The presumption is that there is a way of translating from one to another. Translation operations are called codes; the act of translation is encoding or decoding. Among possible codes are those that are invertible. Encoding a message cannot add information; it might, however, lose information (Question 1.8.16). Invertible codes must preserve the amount of information.

Once we have determined the information content, we can compare different ways of writing the same information. Assume that one source generates a message of length N characters with information I. Then a different source may transmit the same information using fewer characters. Even if characters are generated at the same rate, the information may be more rapidly transmitted by one source than another. In particular, regardless of the value of N, by definition of information content, we could have communicated the same information using a binary string of length I. It is, however, impossible to use fewer than I bits because the maximum information a binary message can contain is equal to its length. This amount of information occurs for a source with equal a priori probability.

Encoding the information in a shorter form is equivalent to data compression. Thus a completely compressed binary data string would have an amount of information given by its length. The source of such a message would be characterized as a source of messages with equal a priori probability—a random source. We see that randomness and information are related. Without a translation (decoding) function it would be impossible to distinguish the completely compressed information from random numbers. Moreover, a random string could not be compressed.

Question 1.8.16 Prove that an encoding operation that takes a message as input and converts it into another well-defined message (i.e., for a particular input message, the same output message is always given) cannot add information but may reduce it. Describe the necessary conditions for it to keep the same amount of information.

Solution 1.8.16 Our definition of information relies upon the specification of the ensemble of possible messages. Consider this ensemble and assume that each message appears in the ensemble a number of times in proportion to its probability, like the bag with red and green balls. The effect of a coding operation is to label each ball with the new message (code) that will be delivered after the coding operation. The amount of information depends not on the nature of the label, but rather on the number of balls with the same label. The requirement that a particular message is encoded in a well-defined way means that two balls that start with the same message cannot be labeled with different codes. However, it is possible for balls with different original messages to be labeled the same. The average information is not changed if and only if all distinct messages are labeled with distinct codes. If any distinct messages become identified by the same label, the information is reduced.

We can prove this conclusion algebraically using the result of Question 1.8.8, which showed that transferring probability from a less likely to a more likely case reduced the information content. Here we are, in effect, transferring all of the probability from the less likely to the more likely case. The change in information upon labeling two distinct messages with the same code is given by (f(x) = −x log(x), as in Question 1.8.8):

ΔI = f(P(s1) + P(s2)) − (f(P(s1)) + f(P(s2)))
   = (f(P(s1) + P(s2)) + f(0)) − (f(P(s1)) + f(P(s2))) < 0        (1.8.61)

where the inequality follows because f(x) is concave with f(0) = 0 in the range 0 ≤ x ≤ 1, so that f(P(s1) + P(s2)) + f(0) < f(P(s1)) + f(P(s2)) whenever both probabilities are nonzero. ❚
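A quick numerical check of Eq. (1.8.61), under our own choice of sample probabilities:

```python
from math import log2

def f(x):
    """f(x) = -x log x, with f(0) = 0, as in Question 1.8.8."""
    return 0.0 if x == 0 else -x * log2(x)

p1, p2 = 0.3, 0.2                        # sample probabilities of two messages
delta_I = f(p1 + p2) - (f(p1) + f(p2))   # Eq. (1.8.61)
print(delta_I)                           # negative: merging labels loses information
```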

1.8.5 Human communication

The theory of information, like other theories, relies upon idealized constructs that are useful in establishing the essential concepts, but do not capture all features of real systems. In particular, the definition and discussion of information relies upon sources that transmit the result of random occurrences, which, by definition, cannot be known by the recipient. The sources are also completely described by specifying the nature of the random process. This model for the nature of the source and the recipient does not adequately capture the attributes of communication between human beings. The theory of information can be applied directly to address questions about information channels and the characterization of communication in general. It can also be used to develop an understanding of the complexity of systems. In this section, however, we will consider some additional issues that should be kept in mind when applying the theory to the communication between human beings. These issues will arise again in Chapter 8.

The definition of information content relies heavily on the concepts of probability, ensembles, and processes that generate arbitrarily many characters. These concepts are fraught with practical and philosophical difficulties—when there is only one transmitted message, how can we say there were many that were possible? A book may be considered as a single communication. A book has finite length and, for a particular author and a particular reader, is a unique communication. In order to understand both the strengths and the limitations of applying the theory of information, it is necessary to recognize that the information content of a message depends on the information that the recipient of the message already has, in particular the information that the recipient has about the source. In the discussion above, a clear distinction has been made. The only information that characterizes the source is in the ensemble probabilities P(s). The information transmitted by a single message is distinct from the ensemble probabilities and is quantified by I(s). It is assumed that the characterization of the source is completely known to the recipient. The content of the message is completely unknown (and unknowable in advance) to the recipient.

A slightly more difficult example to consider is that of a recipient who does not know the characterization of the source. However, such a characterization in terms of an ensemble P(s) does exist. Under these circumstances, the amount of information transferred by a message would be more than the amount of information given by I(s). However, the maximum amount of information that could be transferred would be the sum of the information in the message, and the information necessary to characterize the source by specifying the probabilities P(s). This upper bound on the information that can be transferred is only useful if the amount of information necessary to characterize the source is small compared to the information in the message.

The difficulty with discussing human communication is that the amount of information necessary to fully characterize the source (one human being) is generally much larger than the information transmitted by a particular message. Similarly, the amount of information possessed by the recipient (another human being) is much larger than the information contained in a particular message. Thus it is reasonable to assume that the recipient does not have a full characterization of the source. It is also reasonable to assume that the model that the recipient has about the source is more sophisticated than a typical Markov chain model, even though it is a simplified model of a human being. The information contained in a message is, in a sense, the additional information not contained in the original model possessed by the recipient. This is consistent with the above discussion, but it also recognizes that specifying the probabilities of the ensemble may require a significant amount of information. It may also be convenient to summarize this information by a different type of model than a Markov chain model.


Once the specific model and information that the recipient has about the source enters into an evaluation of the information transfer, there is a certain and quite reasonable degree of relativity in the amount of information transferred. An extreme example would be if the recipient has already received a long message and knows the same message is being repeated; then no new information is being transmitted. A person who has memorized the Gettysburg Address will receive very little new information upon hearing or reading it again. The prior knowledge is part of the model possessed by the recipient about the source.

Can we incorporate this in our definition of information? In every case where we have measured the information of a message, we have made use of a model of the source of the information. The underlying assumption is that this model is possessed by the recipient. It should now be recognized that there is a certain amount of information necessary to describe this model. As long as the amount of information in the model is small compared to the amount of information in the message, we can say that we have an absolute estimate of the information content of the message. As soon as the information content of the model approaches that of the message itself, then the amount of information transferred is sensitive to exactly what information is known. It might be possible to develop a theory of information that incorporates the information in the model, and thus to arrive at a more absolute measure of information. Alternatively, it might be necessary to develop a theory that considers the recipient and source more completely, since in actual communication between human beings, both are nonergodic systems possessed of a large amount of information. There is significant overlap of the information possessed by the recipient and the source. Moreover, this common information is essential to the communication itself.

One effort to arrive at a universal definition of information content of a message has been made by formally quantifying the information contained in models. The resulting information measure, Kolmogorov complexity, is based on computation theory discussed in the next section. While there is some success with this approach, two difficulties remain. In order for a universal definition of information to be agreed upon, models must still have an information content which is less than the message—knowledge possessed must be smaller than that received. Also, to calculate the information contained in a particular message is essentially impossible, since it requires computational effort that grows exponentially with the length of the message. In any practical case, the amount of information contained in a message must be estimated using a limited set of models of the source. The utilization of a limited set of models means that any estimate of the information in a message is an upper bound.

1.9 Computation

The theory of computation describes the operations that we perform on numbers, including addition, subtraction, multiplication and division. More generally, a computation is a sequence of operations each of which has a definite/unique/well-defined result. The fundamental study of such operations is the theory of logic. Logical operations do not necessarily act upon numbers, but rather upon abstract objects called statements. Statements can be combined together using operators such as AND and OR, and acted upon by the negation operation NOT. The theory of logic and the theory of computation are at root the same. All computations that have been conceived of can be constructed out of logical operations. We will discuss this equivalence in some detail.

We also discuss a further equivalence, generally less well appreciated, between computation and deterministic time evolution. The theory of computation strives to describe the class of all possible discrete deterministic or causal systems. Computations are essentially causal relationships. Computation theory is designed to capture all such possible relationships. It is thus essential to our understanding not just of the behavior of computers, or of human logic, but also to the understanding of causal relationships in all physical systems. A counterpoint to this association of computation and causality is the recognition that certain classes of deterministic dynamical systems are capable of the property known as universal computation.

One of the central findings of the theory of computation is that many apparently different formulations of computation turn out to be equivalent. The sense in which they are equivalent is that each one can simulate the other. In the early years of computation theory, there was an effort to describe sets of operations that would be more powerful than others. When all of them were shown to be equivalent, it became generally accepted (the Church-Turing hypothesis) that there is a well-defined set of possible computations realized by any of several conceptual formulations. This has become known as the theory of universal computation.

1.9.1 Propositional logic

Logic is the study of reasoning, inference and deduction. Propositional logic describes the manipulation of statements that are either true or false. It assumes that there exists a set of statements that are either true or false at a particular time, but not both. Logic then provides the possibility of using an assumed set of relationships between the statements to determine the truth or falsehood of other statements.

For example, the statements Q1 = “I am standing” and Q2 = “I am sitting” may be related by the assumption: Q1 is true implies that Q2 is not true. Using this assumption, it is understood that a statement “Q1 AND Q2” must be false. The falsehood depends only on the relationship between the two sentences and not on the particular meaning of the sentences. This suggests that an abstract construction that describes mechanisms of inference can be developed. This abstract construction is propositional logic.

Propositional logic is formed out of statements (propositions) that may be true (T) or false (F), and operations. The operations are described by their actions upon statements. Since the only concern of logic is the truth or falsehood of statements, we can describe the operations through tables of truth values (truth tables) as follows. NOT (^) is an operator that acts on a single statement (a unary operator) to form a new statement. If Q is a statement then ^Q (read “not Q”) is the symbolic representation of “It is not true that Q.” The truth of ^Q is directly (causally) related to the truth of Q by the relationship in the table:

Q    ^Q
T    F                                                   (1.9.1)
F    T

The value of the truth or falsehood of Q is shown in the left column and the corresponding value of the truth or falsehood of ^Q is given in the right column.

Similarly, we can write the truth tables for the operations AND (&) and OR (|):

Q1   Q2   Q1&Q2
T    T    T
T    F    F                                              (1.9.2)
F    T    F
F    F    F

Q1   Q2   Q1|Q2
T    T    T
T    F    T                                              (1.9.3)
F    T    T
F    F    F

As the tables show, Q1&Q2 is only true if both Q1 is true and Q2 is true. Q1|Q2 is only false if both Q1 is false and Q2 is false.
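These operations are easily mirrored in code; the following Python sketch (our own, not from the text) defines the three connectives and prints their truth tables:

```python
from itertools import product

def NOT(q): return not q
def AND(q1, q2): return q1 and q2
def OR(q1, q2): return q1 or q2

def show(q):
    return "T" if q else "F"

print("Q  ^Q")
for q in (True, False):                      # reproduces table (1.9.1)
    print(show(q), " ", show(NOT(q)))

def truth_table(op, name):
    """Print the truth table of a binary operation in T/F notation."""
    print("Q1 Q2 ", name)
    for q1, q2 in product((True, False), repeat=2):
        print(show(q1), " ", show(q2), " ", show(op(q1, q2)))

truth_table(AND, "Q1&Q2")                    # table (1.9.2)
truth_table(OR, "Q1|Q2")                     # table (1.9.3)
```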

Propositional logic includes logical theorems as statements. For example, the statement Q1 is true if and only if Q2 is true can also be written as a binary operation Q1 ≡ Q2 with the truth table:

Q1   Q2   Q1 ≡ Q2
T    T    T
T    F    F                                              (1.9.4)
F    T    F
F    F    T

Another binary operation is the statement Q1 implies Q2, Q1 ⇒ Q2. When this statement is translated into propositional logic, there is a difficulty that is usually bypassed by the following convention:

Q1   Q2   Q1 ⇒ Q2
T    T    T
T    F    F                                              (1.9.5)
F    T    T
F    F    T

The difficulty is that the last two lines suggest that when the antecedent Q1 is false, the implication is true, whether or not the consequent Q2 is true. For example, the statement “If I had wings then I could fly” is as true a statement as “If I had wings then I couldn't fly,” or the statement “If I had wings then potatoes would be flat.” The problem originates in the necessity of assuming that the result is true or false in a unique way based upon the truth values of Q1 and Q2. Other information is not admissible, and a third choice of “nonsense” or “incomplete information provided” is not allowed within propositional logic. Another way to think about this problem is to say that there are many operators that can be formed with definite outcomes. Regardless of how we relate these operators to our own logical processes, we can study the system of operators that can be formed in this way. This is a model, but not a complete one, for human logic. Or, if we choose to define logic as described by this system, then human thought (as reflected by the meaning of the word “implies”) is not fully characterized by logic (as reflected by the meaning of the operation “⇒”).

In addition to unary and binary operations that can act upon statements to form other statements, it is necessary to have parentheses that differentiate the order of operations to be performed. For example, a statement ((Q1 ≡ Q2)&(^Q3)|Q1) is a series of operations on primitive statements that starts from the innermost parenthesis and progresses outward. As in this example, there may be more than one innermost parenthesis. To be definite, we could insist that the order of performing these operations is from left to right. However, this order does not affect any result.

Within the context of propositional logic, it is possible to describe a systematic mechanism for proving statements that are composed of primitive statements. There are several conclusions that can be arrived at regarding a particular statement. A tautology is a statement that is always true regardless of the truth or falsehood of its component statements. Tautologies are also called theorems. A contradiction is a statement that is always false. Examples are given in Question 1.9.1.

Question 1.9.1 Evaluate the truth table of:

a. (Q1 ⇒ Q2)|((^Q2)&Q1)

b. (^(Q1 ⇒ Q2))≡((^Q1)|Q2)

Identify which is a tautology and which is a contradiction.

Solution 1.9.1 Build up the truth table piece by piece:

a. Tautology:

Q1   Q2   Q1 ⇒ Q2   (^Q2)&Q1   (Q1 ⇒ Q2)|((^Q2)&Q1)
T    T    T         F          T
T    F    F         T          T                         (1.9.6)
F    T    T         F          T
F    F    T         F          T


b. Contradiction:

Q1   Q2   ^(Q1 ⇒ Q2)   (^Q1)|Q2   (^(Q1 ⇒ Q2)) ≡ ((^Q1)|Q2)
T    T    F            T          F
T    F    T            F          F                      (1.9.7)
F    T    F            T          F
F    F    F            T          F ❚
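Classifications like those of Question 1.9.1 can also be checked mechanically by enumerating all truth assignments. A brief Python sketch (our own construction):

```python
from itertools import product

def implies(q1, q2): return (not q1) or q2       # the convention of table (1.9.5)
def equiv(q1, q2): return q1 == q2               # table (1.9.4)

def classify(statement):
    """Classify a two-variable statement by checking every truth assignment."""
    values = [statement(q1, q2) for q1, q2 in product((True, False), repeat=2)]
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "contingent"

a = lambda q1, q2: implies(q1, q2) or ((not q2) and q1)        # Question 1.9.1(a)
b = lambda q1, q2: equiv(not implies(q1, q2), (not q1) or q2)  # Question 1.9.1(b)

print(classify(a))   # tautology, matching (1.9.6)
print(classify(b))   # contradiction, matching (1.9.7)
```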

Question 1.9.2: Construct a theorem (tautology) from a contradiction.

Solution 1.9.2: By negation. ❚

1.9.2 Boolean algebra

Propositional logic is a particular example of a more general symbolic system known as a Boolean algebra. Set theory, with the operators complement, union and intersection, is another example of a Boolean algebra. The formulation of a Boolean algebra is convenient because within this more general framework a number of important theorems can be proven. They then hold for propositional logic, set theory and other Boolean algebras.

A Boolean algebra is a set of elements B={Q1,Q2, …}, a unary operator (^), and two binary operators, for which we adopt the notation (+,•), that satisfy the following properties for all Q1, Q2, Q3 in B:

1. Closure: ^Q1, Q1+Q2, and Q1•Q2 are in B

2. Commutative law: Q1+Q2=Q2+Q1, and Q1•Q2=Q2•Q1

3. Distributive law: Q1•(Q2+Q3)=(Q1•Q2)+(Q1•Q3) and Q1+(Q2•Q3)=(Q1+Q2)•(Q1+Q3)

4. Existence of identity elements, 0 and 1: Q1+0=Q1, and Q1•1=Q1

5. Complementarity law: Q1+(^Q1)=1 and Q1•(^Q1)=0

The statements of properties 2 through 5 consist of equalities. These equalities indicate that the element of the set that results from operations on the left is the same as the element resulting from operations on the right. Note particularly the second part of the distributive law and the complementarity law, which would not be valid if we interpreted + as addition and • as multiplication.

Assumptions 1 to 5 allow the proof of additional properties as follows:

6. Associative property: Q1+(Q2+Q3)=(Q1+Q2)+Q3 and Q1•(Q2•Q3)=(Q1•Q2)•Q3

7. Idempotent property: Q1+Q1=Q1 and Q1•Q1=Q1

8. Identity elements are nulls: Q1+1=1 and Q1•0=0

9. Involution property: ^(^Q1)=Q1

C o m p u t a t i o n 239

# 29412 Cust: AddisonWesley Au: Bar-Yam Pg. No. 239Title: Dynamics Complex Systems Short / Normal / Long

01adBARYAM_29412 3/10/02 10:17 AM Page 239

Page 225: Introduction and Preliminaries · 2013. 3. 24. · 16 # 29412 Cust: AddisonWesley Au: Bar-Yam Pg. No. 16 Title: Dynamics Complex Systems Short / Normal / Long 1 Introduction and Preliminaries

10. Absorption property: Q1+(Q1•Q2)=Q1 and Q1•(Q1+Q2)=Q1

11. DeMorgan’s Laws: ^(Q1+Q2)=(^Q1)•(^Q2) and ^(Q1•Q2)=(^Q1)+(^Q2)

To identify propositional logic as a Boolean algebra we use the set B={T,F} and map the operations of propositional logic to Boolean operations as follows: (^ to ^), (| to +) and (& to •). The identity elements are mapped: (1 to T) and (0 to F). The proof of the Boolean properties for propositional logic is given as Question 1.9.3.

Question 1.9.3: Prove that the identification of propositional logic as a Boolean algebra is correct.

Solution 1.9.3: (1) is trivial; (2) is the invariance of the truth tables of Q1&Q2, Q1|Q2 to interchange of values of Q1 and Q2; (3) requires comparison of the truth tables of Q1|(Q2&Q3) and (Q1|Q2)&(Q1|Q3) (see below). Comparison of the truth tables of Q1&(Q2|Q3) and (Q1&Q2)|(Q1&Q3) is done similarly.

Q1   Q2   Q3   Q2&Q3   Q1|(Q2&Q3)   Q1|Q2   Q1|Q3   (Q1|Q2)&(Q1|Q3)
T    T    T    T       T            T       T       T
T    T    F    F       T            T       T       T
T    F    T    F       T            T       T       T
T    F    F    F       T            T       T       T    (1.9.8)
F    T    T    T       T            T       T       T
F    T    F    F       F            T       F       F
F    F    T    F       F            F       T       F
F    F    F    F       F            F       F       F

(4) requires verifying Q1&T=Q1, and Q1|F=Q1 (see the truth tables for & and | above); (5) requires constructing a truth table for Q|^Q and verifying that it is always T (see below). Similarly, the truth table for Q&^Q shows that it is always F.

Q    ^Q   Q|^Q
T    F    T                                              (1.9.9)
F    T    T ❚
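Because B = {T, F} is finite, derived properties such as absorption and DeMorgan's Laws can also be verified exhaustively. A short Python sketch (our own):

```python
from itertools import product

for q1, q2 in product((True, False), repeat=2):
    # DeMorgan's Laws (property 11), with | as + and & as •
    assert (not (q1 or q2)) == ((not q1) and (not q2))
    assert (not (q1 and q2)) == ((not q1) or (not q2))
    # absorption property (property 10)
    assert (q1 or (q1 and q2)) == q1
    assert (q1 and (q1 or q2)) == q1

print("DeMorgan and absorption hold for all truth values")
```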

1.9.3 Completeness

Our objective is to show that an arbitrary truth table, an arbitrary logical statement, can be constructed out of only a few logical operations. Truth tables are also equivalent to numerical functions—specifically, functions of binary variables that have binary results (binary functions of binary variables). This can be seen using the Boolean representation of T and F as {1,0} that is more familiar as a binary notation for numerical functions. For example, we can write the AND and OR operations (functions) also as:


Q1   Q2   Q1•Q2   Q1+Q2
1    1    1       1
1    0    0       1                                      (1.9.10)
0    1    0       1
0    0    0       0

Similarly for all truth tables, a logical operation is a binary function of a set of binary variables. Thus, the ability to form an arbitrary truth table from a few logical operators is the same as the ability to form an arbitrary binary function of binary variables from these same logical operators.

To prove this ability, we use the properties of the Boolean algebra to systematically discuss truth tables. We first construct an alternative Boolean expression for Q1+Q2 by a procedure that can be generalized to arbitrary truth tables. The procedure is to look at each line in the truth table that contains an outcome of 1 and write an expression that provides unity for that line only. Then we combine the lines to achieve the desired table. Q1•Q2 is only unity for the first line, as can be seen from its column. Similarly, Q1•(^Q2) is unity for the second line and (^Q1)•Q2 is unity for the third line. Using the properties of + we can then combine the terms together in the form:

Q1•Q2+Q1•(^Q2)+(^Q1)•Q2 (1.9.11)

Using associative and identity properties, this gives the same result as Q1+Q2.

We have replaced a simple expression with a much more complicated expression in Eq. (1.9.11). The motivation for doing this is that the same procedure can be used to represent any truth table. The general form we have constructed is called the disjunctive normal form. We can construct a disjunctive normal representation for an arbitrary binary function of binary variables. For example, given a specific binary function of binary variables, f (Q1,Q2,Q3), we construct its truth table, e.g.,

Q1   Q2   Q3   f (Q1,Q2,Q3)
1    1    1    1
1    0    1    0
0    1    1    1
0    0    1    0                                         (1.9.12)
1    1    0    0
1    0    0    1
0    1    0    0
0    0    0    0

The disjunctive normal form is given by:

f (Q1,Q2,Q3)=Q1•Q2•Q3+(^Q1)•Q2•Q3+Q1•(^Q2)•(^Q3) (1.9.13)

as can be verified by inspection. An analogous construction can represent any binary function.
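The disjunctive normal form construction is itself a short algorithm: for each line of the truth table with outcome 1, build the product term that is unity on that line only, and join the terms with +. A Python sketch of this procedure (our own construction), applied to the truth table of Eq. (1.9.12):

```python
from itertools import product

def dnf(table, names):
    """Build a disjunctive-normal-form expression from {inputs: output}."""
    terms = []
    for inputs, output in table.items():
        if output:  # one product term per truth-table line with outcome 1
            literals = [n if v else "(^" + n + ")" for n, v in zip(names, inputs)]
            terms.append("•".join(literals))
    return " + ".join(terms)

# the truth table of Eq. (1.9.12): f = 1 on inputs 111, 011 and 100 only
f_table = {bits: 1 if bits in {(1, 1, 1), (0, 1, 1), (1, 0, 0)} else 0
           for bits in product((1, 0), repeat=3)}

print(dnf(f_table, ["Q1", "Q2", "Q3"]))
# prints the three product terms of Eq. (1.9.13), up to ordering
```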

We have demonstrated that an arbitrary truth table can be constructed out of the three operations (^, +, •). We say that these form a complete set of operations. Since there are 2^n lines in a truth table formed out of n binary variables, there are 2^{2^n} possible functions of these n binary variables. Each is specified by a particular choice of the 2^n possible outcomes. We have achieved a dramatic simplification by recognizing that all of them can be written in terms of only three operators. We also know that at most (1/2)n2^n (^) operations, (n − 1)2^n (•) operations and 2^n − 1 (+) operations are necessary. This is the number of operations needed to represent the identity function 1 in disjunctive normal form.

It is possible to further simplify the set of operations required. We can eliminate either the + or the • operations and still have a complete set. To prove this we need only display an expression for either of them in terms of the remaining operations:

Q1•Q2 = ^((^Q1)+(^Q2))
Q1+Q2 = ^((^Q1)•(^Q2))                                   (1.9.14)

Question 1.9.4: Verify Eq. (1.9.14).

Solution 1.9.4: They may be verified using DeMorgan's Laws and the involution property, or by construction of the truth tables, e.g.:

Q1   Q2   ^Q1   ^Q2   Q1•Q2   (^Q1)+(^Q2)
1    1    0     0     1       0
1    0    0     1     0       1                          (1.9.15)
0    1    1     0     0       1
0    0    1     1     0       1 ❚

It is possible to go one step further and identify binary operations that can represent all possible functions of binary variables. Two possibilities are the NAND (^&) and NOR (^|) operations defined by:

Q1 ^& Q2 = ^(Q1&Q2) → ^(Q1•Q2)
Q1 ^| Q2 = ^(Q1|Q2) → ^(Q1+Q2)                           (1.9.16)

Both the logical and Boolean forms are written above. The truth tables of these operators are:

Q1   Q2   ^(Q1•Q2)   ^(Q1+Q2)
1    1    0          0
1    0    1          0                                   (1.9.17)
0    1    1          0
0    0    1          1

We can prove that each is complete by itself (capable of representing all binary functions of binary variables) by showing that they are capable of representing one of the earlier complete sets. We prove the case for the NAND operation and leave the NOR operation to Question 1.9.5.


^Q1 = ^(Q1•Q1) = Q1 ^& Q1
(Q1•Q2) = ^(^(Q1•Q2)) = ^(Q1 ^& Q2) = (Q1 ^& Q2) ^& (Q1 ^& Q2)        (1.9.18)

Question 1.9.5: Verify completeness of the NOR operation.

Solution 1.9.5: We can use the same formulas as in the proof of the completeness of NAND by replacing • with + and ^& with ^| everywhere. ❚
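Completeness of the NAND operation can be confirmed directly by building NOT, AND and OR out of NAND alone, following Eq. (1.9.18), and comparing against the built-in operations. A Python sketch (our own):

```python
from itertools import product

def nand(q1, q2): return not (q1 and q2)

def not_(q): return nand(q, q)                             # Eq. (1.9.18)
def and_(q1, q2): return nand(nand(q1, q2), nand(q1, q2))  # Eq. (1.9.18)
def or_(q1, q2): return nand(nand(q1, q1), nand(q2, q2))   # via DeMorgan's Laws

for q1, q2 in product((True, False), repeat=2):
    assert not_(q1) == (not q1)
    assert and_(q1, q2) == (q1 and q2)
    assert or_(q1, q2) == (q1 or q2)

print("NOT, AND and OR all reduce to NAND")
```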

1.9.4 Turing machines

We have found that logical operators can represent any binary function of binary variables. This means that all well-defined mathematical operations on integers can be represented in this way. One of the implications is that we might make machines out of physical elements, each of which is capable of performing a Boolean operation. Such a machine would calculate a mathematical function and spare us a tedious task. We can graphically display the operations of a machine performing a series of Boolean operations as shown in Fig. 1.9.1. This is a simplified symbolic form similar to forms used in the design of computer logic circuits.

By looking carefully at Fig. 1.9.1 we see that there are several additional kinds of actions that are necessary in addition to the elementary Boolean operation. These actions are indicated by the lines that might be thought of as wires. One action is to transfer information from the location where it is input into the system, to the place where it is used. The second is to duplicate the information. Duplication is represented in the figure by a branching of the lines. The branching enables the same information to be used in more than one place. Additional implicit actions involve timing, because the representation makes an assumption that time causes the information to be moved and acted upon in a sequence from left to right. It is also necessary to have mechanisms for input and output.

Figure 1.9.1 Graphical representation of Boolean operations. The top figure shows a graphical element representing the NOR operation Q1 ^| Q2 = ^(Q1|Q2). In the bottom figure we combine several operations together with lines (wires) indicating input, output, data duplication and transfer to form the AND operation, (Q1 ^| Q1) ^| (Q2 ^| Q2) = (^Q1) ^| (^Q2) = Q1&Q2. This equation may be used to prove completeness of the NOR operation. ❚

The kind of mathematical machine we just described is limited to performing one prespecified function of its inputs. The process of making machines is time consuming. To physically rearrange components to make a new function would be inconvenient. Thus it is useful to ask whether we might design a machine such that part of its input could include a specification of the mathematical operation to be performed. Both information describing the mathematical function, and the numbers on which it is to be performed, would be encoded in the input, which could be described as a string of binary characters.

This discussion suggests that we should systematically consider the properties/qualities of machines able to perform computations. The theory of computation is a self-consistent discussion of abstract machines that perform a sequence of prespecified well-defined operations. It extends the concept of universality that was discussed for logical operations. While the theory of logic determined that all Boolean functions could be represented using elementary logic operations, the theory of computation endeavors to establish what is possible to compute using a sequence of more general elementary operations. For this discussion many of the practical matters of computer design are not essential. The key question is to establish a relationship between machines that might be constructed and mathematical functions that may be computed. Part of the problem is to define what a computation is.

There are several alternative models of computation that have been shown to be equivalent in a formal sense, since each one of them can simulate any other. Turing introduced a class of machines that represent a particular model of computation. Rather than maintaining information in wires, Turing machines (Fig. 1.9.2) use a storage device that can be read and written to. The storage is represented as an infinite one-dimensional tape marked into squares. On the tape can be written characters, one to a square. The total number of possible characters, the alphabet, is finite. These characters are often taken to be digits plus a set of markers (delimiters). In addition to the characters, the tape squares can also be blank. All of the tape is blank except for a finite number of nonblank places. Operations on the tape are performed by a roving read-write head that has a specified (finite) number of internal storage elements and a simple kind of program encoded in it. We can treat the program as a table similar to the tables discussed in the context of logic. The table operation acts upon the value of the tape at the current location of the head, and the value of storage elements within the read head. The result of an operation is not just a single binary value. Instead it corresponds to a change in the state of the tape at the current location (write), a change in the internal memory of the head, and a shift of the location of the head by one square either to the left or to the right.

We can also think about a Turing machine (TM) as a dynamic system. The internal table does not change in time. The internal state s(t), the current location l(t), the current character a(t) and the tape c(t) are all functions of time. The table consists of a set of instructions or rules of the form {∆, s′, a′, s, a} corresponding to a deterministic transition matrix. s and a are the current internal state and current tape character respectively. s′ and a′ are the new internal state and character. ∆ is the move to be made, either right or left (R or L).

Using either conceptual model, the TM starts from an initial state and location and a specified tape. In each time interval the TM head performs the following operations:

1. Read the current tape character

2. Find the instruction that corresponds to the existing combination of (s,a)

3. Change the internal memory to the corresponding s′
4. Write the tape with the corresponding character a′
5. Move the head to the left or right as specified by ∆

When the TM head reaches a special internal state known as the halt state, then the outcome of the computation may be read from the tape. For simplicity, in what follows we will indicate entering the halt state by a move ∆ = H, which is to halt.

The best way to understand the operation of a TM is to construct particular tables that perform particular actions (Question 1.9.6). In addition to logical operations, the possible actions include moving and copying characters. Constructing particular actions using a TM is tedious, in large part because the movements of the head are limited to a single displacement right or left. Actual computers use direct addressing that enables access to a particular storage location in its memory using a number (address) specifying its location. TMs do not generally use this because the tape is arbitrarily long, so that an address is an arbitrarily large number, requiring an arbitrarily large storage in the internal state of the head. Infinite storage in the head is not part of the computational model.

Figure 1.9.2 Turing's model of computation — the Turing machine (TM) — consists of a tape divided into squares with characters of a finite alphabet written on it. A roving “head” indicated by the triangle has a finite number of internal states and acts by reading and writing the tape according to a prespecified table of rules. Each rule consists of a command to read the tape, write the tape, change the internal state of the TM head and move either to the left or right. A simplified table is shown consisting of several rules of the form {∆, s′, a′, s, a} where a and a′ are possible tape characters, s and s′ are possible states of the head and ∆ is a movement of the head right (R), left (L) or halt (H). Each update, the TM starts by finding the rule {∆, s′, a′, s, a} in the table such that a is the character on the tape at the current location of the head, and s is its current state. The tape is written with the corresponding a′ and the state of the TM head is changed to s′. Then the TM head moves according to the corresponding ∆, right or left. The illustration simplifies the characters to binary digits 0 and 1 and the states of the TM head to s1 and s2. ❚

Question 1.9.6 The following TM table is designed to move a string of binary characters (0 and 1) that are located to the left of a special marker M to blank squares on the tape to the right of the M and then to stop on the M. Blank squares are indicated by B. The internal states of the head are indicated by s1, s2, … These are not italicized, since they are values rather than variables. The movements of the head right and left are indicated by R and L. As mentioned above, we indicate entering the halt state by a movement H. Each line has the form {∆, s′, a′, s, a}.

Read over the program and convince yourself that it does what it is supposed to. Describe how it works. The TM must start from state s1 and must be located at the leftmost nonblank character. The line numbering is only for convenience in describing the TM, and has no role in its operation.

1. R s2 B s1 0
2. R s3 B s1 1
3. R s2 0 s2 0
4. R s2 1 s2 1
5. R s2 M s2 M
6. R s3 0 s3 0
7. R s3 1 s3 1
8. R s3 M s3 M                                           (1.9.19)
9. L s4 0 s2 B
10. L s4 1 s3 B
11. L s4 0 s4 0
12. L s4 1 s4 1
13. L s4 M s4 M
14. R s1 B s4 B
15. H s1 M s1 M

Solution 1.9.6 This TM works by (lines 1 or 2) reading a nonblank character (0 or 1) into the internal state of the head; 0 is represented by s2 and 1 is represented by s3. The character that is read is set to a blank B. Then the TM moves to the right, ignoring all of the tape characters 0, 1 or M (lines 3 through 8) until it reaches a blank B. It writes the stored character (lines 9 or 10), changing its state to s4. This state specifies moving to the left, ignoring all characters 0, 1 or M (lines 11 through 13) until it reaches a blank B. Then (line 14) it moves one step right and resets its state to s1. This starts the procedure from the beginning. If it encounters the marker M in the state s1 instead of a character to be copied, then it halts (line 15). ❚
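The copy machine is small enough to run by direct simulation. The following Python sketch (our own construction, not from the text) encodes the table of Eq. (1.9.19) as a dictionary keyed on (s, a), and stores the tape as a dictionary so that blank squares on both sides are implicit:

```python
RULES = {
    # (s, a): (move, s', a') -- the fifteen rules of Eq. (1.9.19)
    ("s1", "0"): ("R", "s2", "B"), ("s1", "1"): ("R", "s3", "B"),
    ("s2", "0"): ("R", "s2", "0"), ("s2", "1"): ("R", "s2", "1"),
    ("s2", "M"): ("R", "s2", "M"), ("s3", "0"): ("R", "s3", "0"),
    ("s3", "1"): ("R", "s3", "1"), ("s3", "M"): ("R", "s3", "M"),
    ("s2", "B"): ("L", "s4", "0"), ("s3", "B"): ("L", "s4", "1"),
    ("s4", "0"): ("L", "s4", "0"), ("s4", "1"): ("L", "s4", "1"),
    ("s4", "M"): ("L", "s4", "M"), ("s4", "B"): ("R", "s1", "B"),
    ("s1", "M"): ("H", "s1", "M"),
}

def run(tape_string):
    """Run the copy TM of Question 1.9.6; the head starts on the leftmost
    nonblank character, in state s1."""
    tape = dict(enumerate(tape_string))         # blank squares (B) are implicit
    state, pos = "s1", 0
    while True:
        a = tape.get(pos, "B")                  # 1. read the current character
        move, state, a_new = RULES[(state, a)]  # 2. find the matching rule
        tape[pos] = a_new                       # 3.-4. new state, write character
        if move == "H":
            break                               # halt state reached
        pos += 1 if move == "R" else -1         # 5. move the head
    return "".join(tape.get(i, "B") for i in range(min(tape), max(tape) + 1))

print(run("0110M"))   # BBBBM0110 -- the string 0110 is copied right of M
```

Running it on the tape 0110M yields BBBBM0110: the original characters have been blanked, the string has been copied to the right of the marker, and the machine halts on the M.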

Since each character can also be represented by a set of other characters (i.e., 2 in binary is 10), we can allow the TM head to read and write not one but a finite prespecified number of characters without making a fundamental change. The following TM, which acts upon pairs of characters and moves on the tape by two characters at a time, is the same as the one given in Question 1.9.6.

1. 01 01 00 00 01
2. 01 11 00 00 11
3. 01 01 01 01 01
4. 01 01 11 01 11
5. 01 01 10 01 10
6. 01 11 01 11 01
7. 01 11 11 11 11
8. 01 11 10 11 10                                        (1.9.20)
9. 10 10 01 01 00
10. 10 10 11 11 00
11. 10 10 01 10 01
12. 10 10 11 10 11
13. 10 10 10 10 10
14. 01 00 00 10 00
15. 00 00 10 00 10

The particular choice of the mapping from characters and internal states onto the binary representation is not unique. This choice is characterized by using the left and right bits to represent different aspects. In columns 3 or 5, which represent the tape characters, the right bit represents the type of element (marker or digit), and the left represents which element or marker it is: 00 represents the blank B, 10 represents M, 01 represents the digit 0, and 11 represents the digit 1. In columns 2 or 4, which represent the state of the head, the states s1 and s4 are represented by 00 and 10, and s2 and s3 are represented by 01 and 11 respectively. In column 1, moving right is 01, left is 10, and halt is 00.

The architecture of a TM is very general and allows for a large variety of actions using complex tables. However, all TMs can be simulated by transferring all of the responsibility for the table and data to the tape. A TM that can simulate all TMs is called a universal Turing machine (UTM). As with other TMs, the responsibility of arranging the information lies with the “programmer.” The UTM works by representing the table, current state, and current letter on the UTM tape. We will describe the essential concepts in building a UTM but will not explicitly build one.

The UTM acts on its own set of characters with its own set of internal states. In order to use it to simulate an arbitrary TM, we have to represent the TM on the tape of the UTM in the characters that the UTM can operate on. On the UTM tape, we


must be able to represent four types of entities: a TM character, the state of the TM head, the movement to be taken by the TM head, and markers that indicate to the UTM what is where on the tape. The markers are special to the UTM and must be carefully distinguished from the other three. For later reference, we will build a particular type of UTM where the tape can be completely represented in binary.

The UTM tape has three parts: the part that represents the table of the TM, a work area, and the part that represents the tape of the TM (Fig. 1.9.3). To represent the tape and table of a particular but arbitrary TM, we start with a binary representation of its alphabet and of its internal states

a1 → 00000, a2 → 00001, a3 → 00010, …   (1.9.21)

s1 → 000, s2 → 001, …

where we keep the left zeros, as needed for the number of bits in the longest binary number. We then make a doubled binary representation like that used in the previous example, where each bit becomes two bits with the low-order bit a 1. The doubled binary notation will enable us to distinguish between UTM markers and all other entities on the tape. Thus we have:

a1 → 01 01 01 01 01, a2 → 01 01 01 01 11, a3 → 01 01 01 11 01, …   (1.9.22)

s1 → 01 01 01, s2 → 01 01 11, …

These labels of characters and states are in a sense arbitrary, since the transition table is what gives them meaning.

We also encode the movement commands. The movement commands are not arbitrary, since the UTM must know how to interpret them. We have allowed the TM to displace more than one character, so we must encode a set of movements such as R1, L1, R2, L2, and H. These correspond respectively to moving one character right, one character left, two characters right, two characters left, and entering the halt state. Because the UTM must understand the move that is to be made, we must agree once


Figure 1.9.3 The universal Turing machine (UTM) is a special TM that can simulate the computation performed by any other TM. The UTM does this by executing the rules of the TM that are encoded on the tape of the UTM. There are three parts to the UTM tape: the part where the TM table is encoded (on the left), the part where the tape of the TM is encoded (on the right) and a workspace (in the middle) where information representing the current state of the TM head, the current character of the TM tape, and the movement command are encoded. See the text for a description of the operation of the UTM based on its own rule table. ❚


and for all on a coding of these movements. We use the lowest-order bit as a direction bit (1 Right, 0 Left) and the rest of the bits as the number of displacements in binary

R1 → 011, R2 → 101, …,

L1 → 010, L2 → 100, …, (1.9.23)

H → 000 or 001

The doubled binary representation is as before: each bit becomes two bits with the low-order bit a 1,

R1 → 01 11 11 , R2 → 11 01 11 , …,

L1 → 01 11 01 , L2 → 11 01 01 , …, (1.9.24)

H → 01 01 01 or 01 01 11

Care is necessary in the UTM design because we do not know in advance how many types of TM moves are possible. We also don't know how many characters or internal states the TM has. This means that we don't know the length of their binary representations.

We need a number of markers that indicate to the UTM the beginning and end of encoded characters, states and movements described above. We also need markers to distinguish different regions of the tape. A sufficient set of markers is:

M1—the beginning of a TM character,

M2—the beginning of a TM internal state,

M3—the beginning of a TM table entry, which is also the beginning of a movement command,

M4—a separator between the TM table and the workspace,

M5—a separator between the workspace and the TM tape,

M6—the beginning of the current TM character (the location of the TM head),

M7—the identified TM table entry to be used in the current step, and

B—the blank, which we include among the markers.

Depending on the design of the UTM, these markers need not all be distinct. In any case, we encode them also in binary

B → 000, M1 → 001, M2 → 010, … (1.9.25)

and then in doubled binary form, where the second bit of each pair is now zero:

B → 00 00 00, M1 → 00 00 10, M2 → 00 10 00, … (1.9.26)
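The doubled-binary scheme of Eqs. (1.9.22) through (1.9.26) is mechanical enough to transcribe. Below is a hedged Python sketch; the function name and bit-string interface are our own choices for the illustration.

```python
# Doubled-binary encoding sketch: each bit of a character, state, or
# movement code becomes a pair with low-order bit 1; marker bits become
# pairs with low-order bit 0, so a UTM scanning pairs can always tell
# markers apart from everything else.

def double(bits, marker=False):
    tag = '0' if marker else '1'
    return ' '.join(b + tag for b in bits)

print(double('00000'))              # a1 -> 01 01 01 01 01  (Eq. 1.9.22)
print(double('011'))                # R1 -> 01 11 11        (Eq. 1.9.24)
print(double('001', marker=True))   # M1 -> 00 00 10        (Eq. 1.9.26)
```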

We are now in a position to encode both the tape and table of the TM on the tape of the UTM. The representation of the table consists of a sequence of representations of the lines of the table, L1L2..., where each line is represented by the doubled binary representation of

M3 M2 s′ M1 a′ M2 s M1 a   (1.9.27)


The markers are definite but the characters and states and movements correspond to those in a particular line in the table. The UTM representation of the tape of the TM, a1a2 . . ., is a doubled binary representation of

M1 a1 M1 a2 M1 a3 . . . (1.9.28)

The workspace starts with the character M4 and ends with the character M5. There is room enough for the representation of the current TM machine state, the current tape character and the movement command to be executed. At a particular time in execution it appears as:

M4 M2 s M1 a M5 (1.9.29)

We describe in general terms the operation of the UTM using this representation of a TM. Before execution we must indicate the starting location of the TM head and its initial state. This is done by changing the corresponding marker M1 to M6 (at the UTM tape location to the left of the character corresponding to the initial location of the TM head), and the initial state of the TM is encoded in the workspace after M2.

The UTM starts from the leftmost nonblank character of its tape. It moves to the right until it encounters M6. It then copies the character after M6 into the work area after M1. It compares the values of (s, a) in the work area with all of the possible (s, a) pairs in the transition table until it finds the same pair. It marks this table entry with M7. The corresponding s′ from the table is copied into the work area after M2. The corresponding a′ is copied to the tape after M6. The corresponding movement command is copied to the work area after M4. If the movement command is H the TM halts. Otherwise, the marker M6 is moved according to the value of the movement command. It is moved one step at a time (i.e., the marker M6 is switched with the adjacent M1) while decrementing the digits of the movement command (except the rightmost bit) and in the direction specified by the rightmost bit. When the movement command is decremented to zero, the UTM begins the cycle again by copying the character after M6 into the work area.

There is one detail we have overlooked: the TM can write to the left of its nonblank characters. This would cause problems for the UTM we have designed, since to the left of the TM tape representation is the workspace and TM table. There are various ways to overcome this difficulty. One is to represent the TM tape by folding it upon itself and interleaving the characters. Starting from an arbitrary location on the TM tape we write all characters on the UTM tape to the right of M5, so that odd characters are the TM tape to the right, and even ones are the TM tape to the left. Movements of the M6 marker are doubled, and it is reflected (bounces) when it encounters M5.

A TM is a dynamic system. We can reformulate Turing's model of computation in the form of a cellular automaton (Section 1.5) in a way that will shed some light on the dynamics that are being discussed. The most direct way to do this is to make an automaton with two adjacent tapes. The only information in the second strip is a single nonblank character at the location of the head that represents its internal state. The TM update is entirely contained within the update rule of the automaton. This update rule may be constructed so that it acts at every point in the space, but is enabled by the nonblank character in the adjacent square on the second tape. When the


dynamics reaches a steady state (it is enough that two successive states of the automaton are the same), the computation is completed. If desired we could reduce this CA to one tape by placing each pair of squares in the two tapes adjacent to each other, interleaving the two tapes. While a TM can be represented as a CA, any CA with only a finite number of active cells can be updated by a Turing machine program (it is computable). There are many other CA that can be programmed by their initial state to perform computations. These can be much simpler than using the TM model as a starting point. One example is Conway's Game of Life, discussed in Section 1.5. Like a UTM, this CA is a universal computer—any computation can be performed by starting from some initial state and looking at the final steady state for the result.
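The two-tape construction can be sketched concretely. In the hedged Python fragment below, the three-rule table is an arbitrary toy fragment (loosely modeled on table (1.9.19)) chosen only to show one synchronous sweep of the local rule reproducing one TM step:

```python
# TM as a two-tape cellular automaton (a sketch): the head tape is blank
# except at the head position, where it carries the internal state; one
# synchronous sweep of the rule performs one TM step.
TOY_RULES = {('s1', '1'): ('R', 's3', 'B'),   # illustrative fragment only
             ('s3', 'M'): ('R', 's3', 'M'),
             ('s3', 'B'): ('L', 's4', '1')}

def ca_step(tape, head, rules, blank='B'):
    new_tape, new_head = list(tape), [blank] * len(tape)
    for i, s in enumerate(head):
        if s != blank:                        # the rule fires only at the head
            move, s2, a2 = rules[(s, tape[i])]
            new_tape[i] = a2
            new_head[i + (1 if move == 'R' else -1)] = s2
    return new_tape, new_head

tape, head = list('1MB'), ['s1', 'B', 'B']
for _ in range(3):
    tape, head = ca_step(tape, head, TOY_RULES)
print(''.join(tape), head)   # prints BM1 with the head in state s4 over M
```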

When we consider the relationship of computation theory to dynamic systems, there are some intentional restrictions in the theory that should be recognized. The conventional theory of computation describes a single computational unit operating on a character string formed from a finite alphabet of characters. Thus, computation theory does not describe a continuum in space, an infinite array of processors, or real numbers. Computer operations only mimic approximately the formal definition of real numbers. Since an arbitrary real number requires infinitely many digits to specify, computations upon them in finite time are impossible. The rejection by computation theory of operations upon real numbers is not a trivial one. It is rooted in fundamental results of computation theory regarding limits to what is inherently possible in any computation.

This model of computation as dynamics can be summarized by saying that a computation is the steady-state result of a deterministic CA with a finite alphabet (finite number of characters at each site) and finite domain update rule. One of the characters (the blank or vacuum) must be such that it is unchanged when the system is filled with these characters. The space is infinite but the conditions are such that all space except for a finite region must be filled with the blank character.

1.9.5 Computability and the halting problem

The construction of a UTM guarantees that if we know how to perform a particular operation on numbers, we can program a UTM to perform this computation. However, if someone gives you such a program—can you determine what it will compute? This seemingly simple question turns out to be at the core of a central problem of logic theory. It turns out that it is not only difficult to determine what it will compute; it is, in a formal sense that will be described below, impossible to figure out if it will compute anything at all. The requirement for it to compute something is that it eventually halts. By halting, it declares its computation completed and the answer given. Instead of halting, it might loop forever or it might continue to write on ever larger regions of tape. Determining whether it will compute something is thus equivalent to determining whether it will eventually halt. This is called the halting problem. How could we determine if it would halt? We have seen above how to represent an arbitrary TM on the tape of a particular TM. Consistent with computation theory, the halting problem is to construct a special TM, TH, whose input is a description of a TM and whose output is a single bit that specifies whether or not the


TM will halt. In order for this to make sense, the program TH must itself halt. We can prove by contradiction that this is not possible in general, and therefore we say that the halting problem is not computable. The proof is based on constructing a paradoxical logical statement of the form “This statement is false.”

A proof starts by assuming we have a TM called TH that accepts as input a tape representing a TM Y and its tape y. The output, which can be represented in functional form as TH(Y, y), is always well-defined and is either 1 or 0, representing the statement that the TM Y halts on y or doesn't halt on y respectively. We now construct a logical contradiction by constructing an additional TM based on TH. First we consider TH(Y, Y), which asks whether Y halts when acting on a tape representing itself. We design a new TM TH1 that takes only Y as input, copies it and then acts in the same way as TH. So we have

TH1(Y) = TH(Y, Y)   (1.9.30)

We now define a TM TH2 that is based on TH1 but whenever TH1 gives the answer 0 it gives the answer 1, and whenever TH1 gives the answer 1 it enters a loop and computes forever. A moment's meditation shows that this is possible if we have TH1. Applying TH2 to itself then gives us the contradiction, since TH2(TH2) gives 1 if

TH1(TH2) = TH(TH2, TH2) = 0   (1.9.31)

By definition of TH this means that TH2(TH2) does not halt, which is a contradiction. Alternatively, TH2(TH2) computes forever if

TH1(TH2) = TH(TH2, TH2) = 1

By definition of TH this means that TH2(TH2) halts, which is a contradiction.
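The diagonalization can be written out in program form. The sketch below is purely illustrative: halts() is the hypothetical oracle TH whose existence the argument refutes, and all names are our own.

```python
# Illustration of the halting-problem contradiction (not a real decider:
# halts() is the impossible oracle TH).

def halts(program, data):
    """Hypothetical TH: return 1 if program(data) halts, else 0."""
    raise NotImplementedError   # no such total procedure can exist

def th2(program):
    """TH2: loop forever where TH1 answers 1; halt with 1 where it answers 0."""
    if halts(program, program):      # TH1(program) = TH(program, program)
        while True:                  # answer 1: compute forever
            pass
    return 1                         # answer 0: halt

# th2(th2) has no consistent behavior: if halts(th2, th2) = 1 it loops
# forever (so it does not halt); if 0, it halts. Either way TH is wrong.
```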

The noncomputability of the halting problem is similar to Gödel's theorem and other results denying the completeness of logic, in the sense that we can ask a question about a logical construction that cannot be answered by it. Gödel's theorem may be paraphrased as: in any axiomatic formulation of number theory (i.e., integers), it is possible to write a statement that cannot be proven T or F. There has been a lot of discussion about the philosophical significance of these theorems. A basic conclusion that may be reached is that they describe something about the relationship of the finite and infinite. Turing machines can be represented, as we have seen, by a finite set of characters. This means that we can enumerate them, and they correspond one-to-one to the integers. Like the integers, there are (countably) infinitely many of them. Gödel's theorem is part of our understanding of how an infinite set of numbers must be described. It tells us that we cannot describe their properties using a finite set of statements. This is appealing from the point of view of information theory, since an arbitrary integer contains an arbitrarily large amount of information. The noncomputability of the halting problem tells us more specifically that we can ask a question about a system that is described by a finite amount of information whose answer (in the sense of computation) is not contained within it. We have thus made a vague connection between computation and information theory. We take this connection one step further in the following section.


1.9.6 Computation and information in brief

One of our objectives will be to relate computation and information. We therefore ask, can a calculation produce information? Let us think about the result of a TM calculation, which is a string of characters—the nonblank characters on the output tape. How much information is necessary to describe it? We could describe it directly, or use a Markov model as in Section 1.8. However, we could also give the input of the TM and the TM description, and this would be enough information to enable us to obtain the output by computation. This description might contain more or fewer characters than the direct description of the output. We now return to the problem of defining the information content of a string of characters. Utilizing the full power of computation, we can define this as the length of the shortest possible input tape for a UTM that gives the desired character string as its output. This is called the algorithmic (or Kolmogorov) complexity of a character string. We have to be careful with the definition, since there are many different possible UTMs. We will discuss this in greater detail in Chapter 8. However, this discussion does imply that a calculation cannot produce information. The information present at the beginning is sufficient to obtain the result of the computation. It should be understood, however, that the information that seems to us to be present in a result may be larger than the original information unless we are able to reconstruct the starting point and the TM used for the computation.
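Algorithmic complexity itself is uncomputable, but a general-purpose compressor gives a crude upper bound on description length that illustrates the distinction between regular and random strings; zlib here is a convenient stand-in of our choosing, not the definition.

```python
# Compressed length as a rough proxy for description length (an upper
# bound only; the true algorithmic complexity is uncomputable).
import os
import zlib

def description_length(s: bytes) -> int:
    return len(zlib.compress(s, 9))

print(description_length(b'01' * 500))       # regular string: compresses well
print(description_length(os.urandom(1000)))  # random bytes: barely compress
```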

1.9.7 Logic, computation and human thought

Both logic and computation theory are designed to capture aspects of human thought. A fundamental question is whether they capture enough of this process—are human beings equivalent to glorified Turing machines? We will ask this question in several ways throughout the text and arrive at various conclusions, some of which support this identification and some that oppose it. One way to understand the question is as one of progressive approximation. Logic was originally designed to model human thought. Computation theory, which generalizes logic, includes additional features not represented in logic. Computers as we have defined them are instruments of computation. They are given input (information) specifying both program and data and provide well-defined output an indefinite time later. One of the features that is missing from this kind of machine is the continuous input-output interaction with the world characteristic of a sensory-motor system. An appropriate generalization of the Turing machine would be a robot. As it is conceived and sometimes realized, a robot has both sensory and motor capabilities and an embedded computer. Thus it has more of the features characteristic of a human being. Is this sufficient, or have we missed additional features?

Logic and computation are often contrasted with the concept of creativity. One of the central questions about computers is whether they are able to simulate creativity. In Chapter 3 we will produce a model of creativity that appears to be possible to simulate on a computer. Hidden in this model, however, is a need to use random numbers. This might seem to be a minor problem, since we often use computers to


generate random numbers. However, computers do not actually generate randomness; they generate pseudo-random numbers. If we recall that randomness is the same as information, by the discussion in the previous section, a computer cannot generate true randomness. A Turing machine cannot generate a result that has more information than it is given in its initial data. Thus creativity appears to be tied at least in part to randomness, which has often been suggested, and this may be a problem for conventional computers. Conceptually, this problem can be readily resolved by adding to the description of the Turing machine an infinite random tape in addition to the infinite blank tape. This new system appears quite similar to the original TM specification. A reasonable question would ask whether it is really inherently different. The main difference that we can ascertain at this time is that the new system would be capable of generating results with arbitrarily large information content, while the original TM could not. This is not an unreasonable distinction to make between a creative and a logical system. There are still key problems with understanding the practical implications of this distinction.

The subtlety of this discussion increases when we consider that one branch of theoretical computer science is based on the commonly believed assumption that there exist functions that are inherently difficult to invert—they can only be inverted in a time that grows exponentially with the length of the nonblank part of the tape. For all practical purposes, they cannot be inverted, because the estimated lifetime of the universe is insufficient to invert such functions. While their existence is not proven, it has been proven that if they do exist, then such a function can be used to generate a string of characters that, while not random, cannot be distinguished from a random string in less than exponential time. This would suggest that there can be no practical difference between a TM with a random tape and one without. Thus, the possibility of the existence of noninvertible functions is intimately tied to questions about the relationship between TMs, randomness and human thought.

1.9.8 Using computation and information to describe the real world

In this section we review the fundamental relevance of the theories of computation and information in the real world. This relevance ultimately arises from the properties of observations and measurements.

In our observations of the world, we find that quantities we measure vary. Indeed, without variation there would be no such thing as an observation. There are variations over time as well as over space. Our intellectual effort is dedicated to classifying or understanding this variation. To concretize the discussion, we consider observations of a variable s which could be as a function of time s(t) or of space s(x). Even though x or t may appear continuous, our observations may often be described as a finite discrete set {si}. One of the central (meta)observations about the variation in value of {si} is that sometimes the value of the variable si can be inferred from, is correlated with, or is not independent from its value or values at some other time or position sj.

These concepts have to do with the relatedness of si to sj. Why is this important? The reason is that we would like to know the value of si without having to observe it.


We can understand this as a problem in prediction—to anticipate events that will occur. We would also like to know what is located at unobserved positions in space; e.g., around the corner. And even if we have observed something, we do not want to have to remember all observations we make. We could argue more fundamentally that knowledge/information is important only if prediction is possible. There would be no reason to remember past observations if they were uncorrelated with anything in the future. If correlations enable prediction, then it is helpful to store information about the past. We want to store as little as possible in order to make the prediction. Why? Because storage is limited, or because accessing the right information requires a search that takes time. If a search takes more time than we have till the event we want to predict, then the information is not useful. As a corollary (from a simplified utilitarian point of view), we would like to retain only information that gives us the best, most rapid prediction, under the most circumstances, for the least storage.

Inference is the process of logic or computation. To be able to infer the state of a variable si means that we have a definite formula f(sj) that will give us the value of si with complete certainty from a knowledge of sj. The theory of computation describes what functions f are possible. If the index i corresponds to a later time than j, we say that we can predict its value. In addition to the value of sj we need to know the function f in order to predict the value of si. This relationship need not be from a single value sj to a single value si. We might need to know a collection of values {sj} in order to obtain the value of si from f({sj}).

As part of our experience of the world, we have learned that observations at a particular time are more closely related to observations at a previous time than observations at different nearby locations. This has been summarized by the principle of causality. Causality is the ability to determine what happens at one time from what happened at a previous time. This is more explicitly stated as microcausality—what happens at a particular time and place is related to what happened at a previous time in its immediate vicinity. Causality is the principle behind the notion of determinism, which suggests that what occurs is determined by prior conditions. One of the ways that we express the relationship between system observations over time is by conservation laws. Conservation laws are the simplest form of a causal relationship.

Correlation is a looser relationship than inference. The statement that values si and sj are correlated implies that even if we cannot tell exactly what the value si is from a knowledge of sj, we can describe it at least partially. This partial knowledge may also be inherently statistical in the context of an ensemble of values as discussed below. Correlation often describes a condition where the values si and sj are similar. If they are opposite, we might say they are anticorrelated. However, we sometimes use the term “correlated” more generally. In this case, to say that si and sj are correlated would mean that we can construct a function f(sj) which is close to the value of si but not exactly the same. The degree of correlation would tell us how close we expect them to be. While correlations in time appear to be more central than correlations in space, systems with interactions have correlations in both space and time.

Concepts of relatedness are inherently of an ensemble nature. This means that they do not refer to a particular value si or a pair of values (si, sj) but rather to a


collection of such values or pairs. The ensemble nature of relationships is often more explicit for correlations, but it also applies to inference. This ensemble nature is hidden by functional terminology that describes a relationship between particular values. For example, when we say that the temperature at 1:00 P.M. is correlated with the temperature at 12:00 P.M., we are describing a relationship between two temperature values. Implicitly, we are describing the collection of all pairs of temperatures on different days or at different locations. The set of such pairs are analogs. The concept of inference also generally makes sense only in reference to an ensemble. Let us assume for the moment that we are discussing only a single value si. The statement of inference would imply that we can obtain si as the value f(sj). For a single value, the easiest way (requiring the smallest amount of information) to specify f(sj) would be to specify si. We do not gain by using inference for this single case. However, we can gain if we know that, for example, the velocity of an object will remain the same if there are no forces upon it. This describes the velocity v(t) in terms of v(t′) of any one object out of an ensemble of objects. We can also gain from inference if the function f(sj) gives a string of more than one si.

The notion of independence is the opposite of inference or correlation. Two values si and sj are independent if there is no way that we can infer the value of one from the other, and if they are not correlated. Randomness is similar to independence. The word “independent” is used when there is no correlation between two observations. The word “random” is stronger, since it means that there is no correlation between an observed value and anything else. A random process, like a sequence of coin tosses, is a sequence where each value is independent of the others. We have seen in Section 1.8 that randomness is intimately related with information. Random processes are unpredictable; therefore it makes no sense for us to try to accumulate information that will help predict them. In this sense, a random process is simple to describe. However, once a random process has occurred, other events may depend upon it. For example, someone who wins a lottery will be significantly affected by an event presumed to be random. Thus we may want to remember the results of the random process after it occurs. In this case we must remember each value. We might ask, once the random process has occurred, can we summarize it in some way? The answer is that we cannot. Indeed, this property has been used to define randomness.

We can abstract the problem of prediction and description of observations to the problem of data compression. Assume there is a set of observations {si} for which we would like to obtain the shortest possible description from which we can reconstruct the complete set of observations. If we can infer one value from another, then the set might be compressed by eliminating the inferable values. However, we must make sure that the added information necessary to describe how the inference is to be done is less than the information in the eliminated values. Correlations also enable compression. For example, let us assume that the values are biased ON with a probability P(1) = 0.999 and OFF with a probability P(−1) = 0.001. This means that one in a thousand values is OFF and the others are ON. In this case we can remember which ones are OFF rather than keeping a list of all of the values. We would say they are ON except for numbers 3, 2000, 2403, 5428, etc. This is one way of coding the information. This


method of encoding has a problem in that the numbers representing the locations of the OFF values may become large. They will be correlated, because the first few digits of successive locations will be the same (…, 431236, 432112, 434329, …). We can further reduce the list if we are willing to do some more processing, by giving the intervals between successive OFF values rather than the absolute numbers of their locations.
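Both codings are easy to exhibit. In the sketch below, the sequence length, bias, and seed are arbitrary illustrative choices:

```python
# Coding a biased sequence by its rare OFF positions, then by the gaps
# between them (the gaps are small numbers, hence cheaper to store).
import random

random.seed(0)
values = [1 if random.random() < 0.999 else -1 for _ in range(100000)]

off = [i for i, v in enumerate(values) if v == -1]
gaps = [off[0]] + [b - a for a, b in zip(off, off[1:])]

print(len(values), '->', len(off), 'OFF positions')
print('positions:', off[:5])
print('intervals:', gaps[:5])
```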

Ultimately, when we have reached the limits of our ability to infer one observation from another, the rest is information that we need. For example, differential equations are based on the presumption that boundary conditions (initial conditions in time, and boundary conditions in space) are sufficient to predict the behavior of a system. The values of the initial conditions and the boundary conditions are the information we need. This simple model of a system, where information is clearly and simply separated from the problem of computation, is not always applicable.

Let us assume that we have made extensive observations and have separated from these observations a minimal set that then can be used to infer all the rest. A minimal set of information would have the property that no one piece of information in it could be obtained from other pieces of information. Thus, as far as the set itself is concerned, the information appears to be random. Of course we would not be satisfied with any random set; it would have to be this one in particular, because we want to use this information to tell us about all of the actual observations.

One of the difficulties with random numbers is that it is inherently difficult to prove that numbers are random. We may simply not have thought of the right function f that can predict the value of the next number in a sequence from the previous numbers. We could argue that this is one of the reasons that gambling is so attractive: individuals use “lucky numbers” that they expect to have a better-than-random chance of success. Indeed, it is the success of science to have shown that apparently uncorrelated events may be related—for example, the falling of a ball and the motion of the planets. At the same time, science provides a framework in which noncausal correlations, otherwise called superstitions, are rejected.

We have argued that the purpose of knowledge is to succinctly summarize information that can be used for prediction. Thus, in its most abstract form, the problem of deduction or prediction is a problem in data compression. It can thus be argued that science is an exercise in data compression. This is the essence of the principle of Occam's razor and the importance of simplicity and universality in science. The more universal and the more general a law is, and the simpler it is, the more data compression has been achieved. Often this is considered to reflect how valuable the law's contribution to science is. Of course, even if the equations are general and simple, if we cannot solve them then they are not particularly useful from a practical point of view. The concept of simplicity has always been poorly defined. While science seeks to discover correlations and simplifications in observations of the universe around us, ultimately the minimum description of a system (i.e., the universe) is given by the number of independent pieces of information required to describe it.

Our understanding of information and computation also enters into the models of systems discussed in previous sections. In many of these models, we


assumed the existence of random variables, or random processes. This randomness represents either unknown or complex phenomena. It is important to recognize that this represents an assumption about the nature of correlations between different aspects of the problem that we are modeling. It assumes that the random process is independent of (uncorrelated with) the aspects of the system we are explicitly studying. When we model the random process on a computer by a pseudo-random number generator, we are assuming that the computations in the pseudo-random number generator are also uncorrelated with the system we are studying. These assumptions may or may not be valid, and tests of them are not generally easy to perform.

1.10 Fractals, Scaling and Renormalization

The physics of Newton and the related concepts of calculus, which have dominated scientific thinking for three hundred years, are based upon the understanding that at smaller and smaller scales—both in space and in time—physical systems become simple, smooth and without detail. A more careful articulation of these ideas would note that the fine scale structure of planets, materials and atoms is not without detail. However, for many problems, such detail becomes irrelevant at the larger scale. Since the details are irrelevant, formulating theories in a way that assumes that the detail does not exist yields the same results as a more exact description.

In the treatment of complex systems, including various physical and biological systems, there has been a recognition that the concept of progressive smoothness on finer scales is not always a useful mathematical starting point. This recognition is an important fundamental change in perspective whose consequences are still being explored.

We have already discussed in Section 1.1 the subject of chaos in iterative maps. In chaotic maps, the smoothness of dynamic behavior is violated because fine scale details matter. In this section we describe fractals, mathematical models of the spatial structure of systems that have increasing detail on finer scales. Geometric fractals have a self-similar structure, so that the structure on the coarsest scale is repeated on finer length scales. A more general framework in which we can articulate questions about systems with behavior on all scales is that of scaling theory, introduced in Section 1.10.3. One of the most powerful analytic tools for studying systems that have scaling properties is the renormalization group. We apply it to the Ising model in Section 1.10.4, and then return full cycle by applying the renormalization group to chaos in Section 1.10.5. A computational technique, the multigrid method, that enables the description of problems on multiple scales is discussed in Section 1.10.6. Finally, we discuss briefly the relevance of these concepts to the study of complex systems in Section 1.10.7.

1.10.1 Fractals

Traditional geometry is the study of the properties of spaces or objects that have integral dimensions. This can be generalized to allow effective fractional dimensions of objects, called fractals, that are embedded in an integral dimension space. In recent



years, the recognition that fractals can play an important role in modeling natural phenomena has fueled a whole area of research investigating the occurrence and properties of fractal objects in physical and biological systems.

Fractals are often defined as geometric objects whose spatial structure is self-similar. This means that by magnifying one part of the object, we find the same structure as of the original object. The object is characteristically formed out of a collection of elements: points, line segments, planar sections or volume elements. These elements exist in a space of the same or higher dimension to the elements themselves. For example, line segments are one-dimensional objects that can be found on a line, plane, volume or higher dimensional space. We might begin to describe a fractal by the objects of which it is formed. However, geometric fractals are often described by a procedure (algorithm) that creates them in an explicitly self-similar manner.

One of the simplest examples of a fractal object is the Cantor set (Fig. 1.10.1). This set is formed by a procedure that starts from a single line segment. We remove the middle third from the segment. There are then two line segments left. We then remove the middle third from both of these segments, leaving four line segments. Continuing iteratively, at the kth iteration there are 2^k segments. The Cantor set, which is the limiting set of points obtained from this process, has no line segments in it. It is self-similar by direct construction, since the left and right third of the original line segment can be expanded by a factor of three to appear as the original set.
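The construction is a short recursion on intervals; a minimal sketch (the interval-pair representation is our own choice):

```python
# The Cantor construction as interval bookkeeping: each iteration
# replaces every interval [a, b] by its two outer thirds.

def cantor_step(intervals):
    return [piece for (a, b) in intervals
            for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]

intervals = [(0.0, 1.0)]
for k in range(4):
    intervals = cantor_step(intervals)
print(len(intervals))   # 2^4 = 16 segments, each of length 3^(-4)
```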

An analog of the Cantor set in two dimensions is the Sierpinski gasket (Fig. 1.10.2). It is constructed from an equilateral triangle by removing an internal triangle which is half of the size of the original triangle. This procedure is then iterated for all of the smaller triangles that result. We can see that there are no areas left in this shape. It is self-similar, since each of the three corner triangles can be expanded by a factor of two to appear as the original set.

For self-similar objects, we can obtain the effective fractal dimension directly by considering their composition from parts. We do this by analogy with conventional


Figure 1.10.1 Illustration of the construction of the Cantor set, one of the best-known fractals. The Cantor set is formed by iteratively removing the middle third from a line segment, then the middle third from the two remaining line segments, and so on. Four iterations of the procedure are shown starting from the complete line segment at the top. ❚


geometric objects which are also self-similar. For example, a line segment, a square, or a cube can be formed from smaller objects of the same type. In general, for a d-dimensional cube, we can form the cube out of smaller cubes. If the size of the smaller cubes is reduced from that of the large cube by a factor of ε, where ε is inversely proportional to their diameter, ε ∝ 1/R, then the number of smaller cubes necessary to form the original is N = ε^d. Thus we could obtain the dimension as:

d = ln(N) / ln(ε)   (1.10.1)

For self-similar fractals we can do the same, where N is the number of parts that make up the whole. Each of the parts is assumed to have the same shape, but reduced in size by a factor of ε from the original object.
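Eq. (1.10.1) can be evaluated directly for the two examples above, with N parts each reduced by the factor ε; a one-loop sketch:

```python
# Self-similarity dimension d = ln(N) / ln(eps) of Eq. (1.10.1).
import math

for name, N, eps in [('Cantor set', 2, 3), ('Sierpinski gasket', 3, 2)]:
    print(name, math.log(N) / math.log(eps))
# Cantor set 0.6309..., Sierpinski gasket 1.5849... (cf. Eqs. 1.10.6, 1.10.7)
```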

We can generalize the definition of fractal dimension so that we can use it to characterize geometric objects that are not strictly self-similar. There is more than one way to generalize the definition. We will adopt an intuitive definition of fractal dimension which is closely related to Eq. (1.10.1). If the object is embedded in d dimensions, we cover the object with d-dimensional disks. This is illustrated in Fig. 1.10.3 for a line segment and a rectangle in a two-dimensional space. If we cover the object with two-dimensional disks of a fixed radius R, using the minimal number of disks possible, the number of these disks changes with the radius of the disks according to the power law:

N(R) ∝ R^(−d)   (1.10.2)

where d is defined as the fractal dimension. We note that the use of disks is only illustrative. We could use squares, and the result can be proven to be equivalent.
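The power law of Eq. (1.10.2) can also be checked by explicit box counting. In the hedged sketch below, truncating the construction at level 12 and working in integer units of 3^(-12) are our own choices, made so that the count is exact:

```python
# Box-counting check for the Cantor set: build the level-12 set exactly,
# in integer units of 3^(-12), cover it with boxes of size R = 3^(-k),
# and read off log N / log(1/R).
import math

LEVEL = 12
endpoints, length = [0], 3 ** LEVEL        # left endpoints; interval length
for _ in range(LEVEL):
    length //= 3
    endpoints = [e for x in endpoints for e in (x, x + 2 * length)]

for k in range(1, 8):
    boxes = {e // 3 ** (LEVEL - k) for e in endpoints}   # box index per point
    N = len(boxes)                                       # equals 2^k
    print(k, N, math.log(N) / math.log(3 ** k))          # ln 2 / ln 3 = 0.6309...
```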


Figure 1.10.2 The Sierpinski gasket is formed in a similar manner to the Cantor set. Starting from an equilateral triangle, a similar triangle one half the size is removed from the middle leaving three triangles at the corners. The procedure is then iteratively applied to the remaining triangles. The figure shows the set that results after four iterations of the procedure. ❚


We can use either Eq. (1.10.1) or Eq. (1.10.2) to calculate the dimension of the Cantor set and the Sierpinski gasket. We illustrate the use of Eq. (1.10.2). For the Cantor set, by construction, 2^k disks (or line segments) of radius 1/3^k will cover the set. Thus we can write:

N(R/3^k) = 2^k N(R)   (1.10.3)


Figure 1.10.3 In order to define the dimension of a fractal object, we consider the problem of covering a set with a minimal number of disks of radius R. (a) shows a line segment with three different coverings superimposed. (b) and (c) show a rectangle with two different coverings respectively. As the size of the disks decreases, the number of disks necessary to cover the shape grows as R^(−d). This behavior becomes exact only in the limit R → 0. The fractal dimension defined in this way is sometimes called the box-counting dimension, because d-dimensional boxes are often used rather than disks. ❚


Using Eq. (1.10.2) this is:

(R/3^k)^(−d) = 2^k R^(−d)   (1.10.4)

or:

3^d = 2   (1.10.5)

which is:

d = ln(2) / ln(3) ≅ 0.631   (1.10.6)

We would arrive at the same result more directly from Eq. (1.10.1). For the Sierpinski gasket, we similarly recognize that the set can be covered by

three disks of radius 1/2, nine disks of radius 1/4, and more generally 3^k disks of radius 1/2^k. This gives a dimension of:

d = ln(3) / ln(2) ≅ 1.585   (1.10.7)

For these fractals there is a deterministic algorithm that is used to generate them. We can also consider a kind of stochastic fractal generated in a similar way; however, at each level the algorithm involves choices made from a probability distribution. The simplest modification of the sets is to assume that at each level a choice is made with equal probability from several possibilities. For example, in the Cantor set, rather than removing the middle third from each of the line segments, we could choose at random which of the three thirds to remove. Similarly, for the Sierpinski gasket, we could choose which of the four triangles to remove at each stage. These would be stochastic fractals, since they are not described by a deterministic self-similarity but by a statistical self-similarity. Nevertheless, they would have the same fractal dimension as the deterministic fractals.
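A hedged sketch of the stochastic Cantor variant; the random choices leave the count N(R) unchanged, which is why the dimension is the same:

```python
# Stochastic Cantor set: at each step remove one randomly chosen third
# of every interval, keeping the other two.
import random

def random_cantor_step(intervals):
    out = []
    for (a, b) in intervals:
        w = (b - a) / 3
        thirds = [(a, a + w), (a + w, a + 2 * w), (a + 2 * w, b)]
        thirds.pop(random.randrange(3))   # discard a random third
        out += thirds
    return out

intervals = [(0.0, 1.0)]
for _ in range(4):
    intervals = random_cantor_step(intervals)
print(len(intervals))   # still 2^4 = 16: same counting, same dimension
```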

Question 1.10.1 How does the dimension of a fractal, as defined by Eq. (1.10.2), depend on the dimension of the space in which it is embedded?

Solution 1.10.1 The dimension of a fractal is independent of the dimension of the space in which it is embedded. For example, we might start with a d-dimensional space and increase the dimension of the space to d + 1 dimensions. To show that Eq. (1.10.2) is not changed, we form a covering of the fractal by d + 1 dimensional spheres whose intersection with the d-dimensional space is the same as the covering we used for the analysis in d dimensions. ❚

Question 1.10.2 Prove that the fractal dimension does not change if we use squares or circles for covering an object.

Solution 1.10.2 Assume that we have minimal coverings of a shape using N1(R) = c1 R^(−d1) squares, and minimal coverings by N2(R) = c2 R^(−d2) circles, with d1 ≠ d2. The squares are characterized using R as the length of their side, while the circles are characterized using R as their radius. If d2 is less than d1, then for smaller and smaller R the number of disks becomes arbitrarily smaller than the number of squares. However, we can cover the same shape


using squares that circumscribe the disks. The number of these squares is N′1(R) = c2 (R/2)^(−d2). This is impossible, because for small enough R, N′1(R) will be smaller than N1(R), which violates the assumption that the latter is a minimal covering. Similarly, if d2 is greater than d1, we use disks circumscribed around the squares to arrive at a contradiction. ❚

Question 1.10.3 Calculate the fractal dimension of the Koch curve given in Fig. 1.10.4.

Solution 1.10.3 The Koch curve is composed out of four Koch curves reduced in size from the original by a factor of 3. Thus, the fractal dimension is d = ln(4) / ln(3) ≈ 1.2619. ❚

Question 1.10.4 Show that the length of the Koch curve is infinite.

Solution 1.10.4 The Koch curve can be constructed by taking out the middle third of a line segment and inserting two segments equivalent to the one that was removed. They are inserted so as to make an equilateral triangle with the removed segment. Thus, at every iteration of the construction procedure, the length of the perimeter is multiplied by 4/3, which means that it diverges to infinity. It can be proven more generally that any fractal of dimension 2 > d > 1 must have an infinite length and zero area, since these measures of size are for one-dimensional and two-dimensional objects respectively. ❚

Eq. (1.10.2) neglects the jumps in N(R) that arise as we vary the radius R. Since N(R) can only have integral values, as we lower R and add additional disks there are discrete jumps in its value. It is conventional to define the fractal dimension by taking the limit of Eq. (1.10.2) as R → 0, where this problem disappears. This approach, however, is linked philosophically to the assumption that systems simplify in the limit of small length scales. The assumption here is not that the system becomes smooth and featureless, but rather that the fractal properties will continue to all finer scales and remain ideal. In a physical system, the fractal dimension cannot be taken in this limit. Thus, we should allow the definition to be applied over a limited domain of length scales as is appropriate for the problem. As long as the domain of length scales is large, we can use this definition. We then solve the problem of discrete jumps by treating the leading behavior of the function N(R) over this domain.

The problem of treating distinct dimensions at different length scales is only one of the difficulties that we face in discussing fractal systems. Another problem is inhomogeneity. In the following section we discuss objects that are inherently inhomogeneous but for which an alternate natural definition of dimension can be devised to describe their structure on all scales.

1.10.2 Trees

Iterative procedures like those used to make fractals can also be used to make geometric objects called trees. An example of a geometric tree, which bears vague


resemblance to physical trees, is shown in Fig. 1.10.5. The tree is formed by starting with a single object (a line segment), scaling it by a factor of 1/2, duplicating it two times and attaching the parts to the original object at its boundary. This process is then iterated for each of the resulting parts. The iterations create structure on finer and finer scales.


Figure 1.10.4 Illustration of the starting line segment and four successive stages in the formation of the Koch curve. For further discussion see Questions 1.10.3 and 1.10.4. ❚


We can generalize the definition of a tree to be a set formed by iteratively adding to an object copies of itself. At iteration t, the added objects are reduced in size by a factor ε^t and duplicated N^t times, the duplicated versions being rotated and then shifted by vectors whose lengths converge to zero as a function of t. A tree is different from a fractal because the smaller versions of the original object are not contained within the original object.
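The tree of Fig. 1.10.5 can be generated explicitly. In the sketch below, representing segments as pairs of complex numbers is our own convenience:

```python
# The tree of Fig. 1.10.5 as complex-plane line segments: a trunk, then
# at every tip two copies scaled by 1/2 and rotated by +/-45 degrees.
import cmath
import math

def tree(levels):
    trunk = (0j, 1j)
    segments, tips = [trunk], [trunk]
    for _ in range(levels):
        new_tips = []
        for (a, b) in tips:
            step = (b - a) / 2                       # half-size copy
            for angle in (math.pi / 4, -math.pi / 4):
                new_tips.append((b, b + step * cmath.exp(1j * angle)))
        segments += new_tips
        tips = new_tips
    return segments

print(len(tree(4)))   # 1 + 2 + 4 + 8 + 16 = 31 segments
```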

The fractal dimension of trees is not as straightforward as it is for self-similar fractals. The effective fractal dimension can be calculated; however, it gives results that are not intuitively related to the tree structure. We can see why this is a problem in Fig. 1.10.6. The dimension of the region of the tree which is above the size R is that of the embedded entity (line segments), while the fractal dimension of the region which is less than the size R is determined by the spatial structure of the tree. Because of the changing value of R in the scaling relation, an intermediate value for the fractal dimension would typically be found by a direct calculation (Question 1.10.5).

It is reasonable to avoid this problem by classifying trees in a different category than fractals. We can define the tree dimension by considering the self-similarity of the tree structure using the same formula as Eq. (1.10.1), but now applying the definition to the number N and scaling ε of the displaced parts of the generating structure, rather than the embedded parts as in the fractal. In Section 1.10.7 we will encounter a treelike structure; however, it will be more useful to describe it than to give a dimension that might characterize it.

Figure 1.10.5 A geometric tree formed by an iterative algorithm similar to those used in forming fractals. This tree can be formed starting from a single line segment. Two copies of it are then reduced by a factor of 2, rotated by 45° left and right and attached at one end. The procedure is repeated for each of the resulting line segments. Unlike a fractal, a tree is not solely composed out of parts that are self-similar. It is formed out of self-similar parts, along with the original shape, its trunk. ❚

Question 1.10.5 A simple version of a tree can be constructed as a set of points {1/k} where k takes all positive integer values. The tree dimension of this set is zero because it can be formed from a point which is duplicated and then displaced by progressively smaller vectors. Calculate the fractal dimension of this set.

Solution 1.10.5 We construct a covering of scale R from line segments of this length. The covering that we construct will be formed out of two parts. One part is constructed from segments placed side by side. This part starts from zero and covers infinitely many points of the set. The other part is constructed from segments that are placed on individual points. The crossing point between the two sets can be calculated as the value of k where the difference between successive points is R. For k below this value, it is not possible to include more than one point in one line segment. For k above this value, there are two or more points per line segment. The critical value of k is found by setting:

1/kc − 1/(kc + 1) = 1/(kc(kc + 1)) ≈ 1/kc² = R   (1.10.8)

or kc = R^(−1/2). This means that the number of segments needed to cover individual points is given by this value. Also, the number of segments that are placed side by side must be enough to go up to this point, which has the value 1/kc. This number of segments is given by

(1/kc)/R = R^(−1/2) ≈ kc   (1.10.9)

Figure 1.10.6 Illustration of the covering of a geometric tree by disks. The covering shows that the larger-scale structures of the tree (the trunk and first branches in this case) have an effective dimension given by the dimension of their components. The smaller-scale structures have a dimension that is determined by the algorithm used to make the tree. This inhomogeneity implies that the fractal dimension is not always the natural way to describe the tree. ❚

Thus we must cover the line segment up to the point R^(1/2) with R^(−1/2) line segments, and use an additional R^(−1/2) line segments to cover the rest of the points. This gives a total number of line segments in a covering of 2R^(−1/2). The fractal dimension is thus d = 1/2.

We could have used fewer line segments in the covering by covering pairs of points and triples of points rather than covering the whole line segment below 1/kc. However, each partial covering of the set that is concerned with pairs, triples and so on consists of a number of segments that grows as R^(−1/2). Thus our conclusion remains unchanged by this correction. ❚
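As a numerical check (our own sketch, not part of the text; the number of points K and the range of scales are arbitrary choices), we can count covering segments for the set {1/k} directly and estimate the dimension from the scaling of the count:

    import numpy as np

    # Box count for the set {1/k}: N(R) segments of length R cover the set;
    # successive slopes of log N against log(1/R) estimate d = 1/2.
    K = 10**6
    points = 1.0 / np.arange(1, K + 1)   # 1/K is far below the finest R used

    Rs = [2.0 ** (-m) for m in range(6, 18, 2)]
    Ns = [len(np.unique(np.floor(points / R))) for R in Rs]

    for i in range(1, len(Rs)):
        d = (np.log(Ns[i]) - np.log(Ns[i - 1])) / (np.log(1 / Rs[i]) - np.log(1 / Rs[i - 1]))
        print(f"R = 2^-{6 + 2 * i}: d = {d:.3f}")   # approaches 1/2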

Trees illustrate only one example of how system properties may exist on many scales, but are not readily described as fractals in the conventional sense. In order to generalize our concepts to enable the discussion of such properties, we will introduce the concept of scaling.

1.10.3 Scaling

Geometric fractals suggest that systems may have a self-similar structure on all length scales. This is in contrast with the more typical approach of science, where there is a specific scale at which a phenomenon appears. We can think about the problem of describing the behavior of a system on multiple length scales in an abstract manner. A phenomenon (e.g., a measurable quantity) may be described by some function of scale, f(x). Here x represents the characteristic scale rather than the position. When there is a well-defined length scale at which a particular effect occurs, for longer length scales the function would typically decay exponentially:

f(x) ∼ e^(−x/λ)   (1.10.10)

This functional dependence implies that the characteristic scale at which this property disappears is given by λ.

In order for a system property to be relevant over a large range of length scales, it must vary more gradually than exponentially. In such cases, typically, the leading behavior is a power law:

f(x) ∼ x^α   (1.10.11)

A function that follows such power-law behavior can also be characterized by the scaling rule:

f(ax) = a^α f(x)   (1.10.12)

This means that if we characterize the system on one scale, then on a scale that is larger by the factor a it has a similar appearance, but scaled by the factor a^α. α is called the scaling exponent. In contrast to the behavior of an exponential, for a power law there is no particular length at which the property disappears. Thus, it may extend over a wider range of length scales. When the scaling exponent is not an integer, the function f(x) is nonanalytic. Non-analyticity is often indicative of a property that cannot be treated by assuming that it becomes smooth on small or large scales. However, fractional scaling exponents are not necessary in order for power-law scaling to be applicable.
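To make the idea concrete: in practice a scaling exponent is extracted from measurements as the slope of a log-log plot. The following sketch is ours, with an arbitrary synthetic data set; the prefactor and noise level are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic power-law data f(x) ~ x^alpha over four decades, with noise.
    alpha = 0.75
    x = np.logspace(0, 4, 40)
    f = 2.0 * x**alpha * (1.0 + 0.05 * rng.standard_normal(x.size))

    # The scaling exponent is the slope in log-log coordinates.
    slope, _ = np.polyfit(np.log(x), np.log(f), 1)
    print(f"fitted exponent: {slope:.3f}")   # close to 0.75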


Even when a system property follows power-law scaling, the same behavior cannot continue over arbitrarily many length scales. The disappearance of a certain power law may occur because of the appearance of a new behavior on a longer scale. This change is characterized by a crossover in the scaling properties of f(x). An example of crossover occurs when we have a quantity whose scaling behavior is

f(x) ∼ A1 x^α1 + A2 x^α2   (1.10.13)

If A1 > A2 and α1 < α2 then the first term will dominate at smaller length scales, and the second at larger length scales. Alternatively, the power-law behavior may eventually succumb to exponential decay at some length scale.

There are three related approaches to applying the concept of scaling in model or physical systems. The first approach is to consider the scale x to be the physical size of the system, or the amount of matter it contains. The quantity f(x) is then a property of the system measured as the size of the system changes. The second approach is to keep the system the same, but vary the scale of our observation. We assume that our ability to observe the system has a limited degree of discernment of fine details, a finest scale of observation. Finer details are to be averaged over or disregarded. By moving toward or away from the system, we change the physical scale at which our observation can no longer discern details. x then represents the smallest scale at which we can observe variation in the system structure. Finally, in the third approach we consider the relationship between a property measured at one location in the system and the same property measured at another location separated by the distance x. The function f(x) is a correlation of the system measurements as a function of the distance between regions that are being considered.

Examples of quantities that follow scaling relations as a function of system size are the extensive properties of thermodynamic systems (Section 1.3) such as the energy, entropy, free energy, volume, number of particles and magnetization:

U(ax) = a^d U(x)   (1.10.14)

These properties measure quantities of the whole system as a function of system size. All have the same scaling exponent: the dimension of space. Intrinsic thermodynamic quantities are independent of system size and therefore also follow a scaling behavior where the scaling exponent is zero.

Another example of scaling can be found in the random walk (Section 1.2). We can generalize the discussion in Section 1.2 to allow a walk in d dimensions by choosing steps which are ±1 in each dimension independently. A random walk of N steps in three dimensions can be thought of as a simple model of a molecule formed as a chain of molecular units, a polymer. If we measure the average distance between the ends of the chain as a function of the number of steps R(N), we have the scaling relation:

R(aN) = a^(1/2) R(N)   (1.10.15)

This scaling of distance traveled in a random walk with the number of steps taken is independent of dimension. We will consider random walks and other models of polymers in Chapter 5.
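A quick Monte Carlo check of Eq. (1.10.15) (our own illustrative sketch; the walker count is an arbitrary choice): quadrupling the number of steps should double the mean end-to-end distance.

    import numpy as np

    rng = np.random.default_rng(0)

    # Steps are +/-1 in each of d dimensions independently, as in the text.
    def mean_end_distance(N, d=3, walkers=5000):
        steps = 2 * rng.integers(0, 2, size=(walkers, N, d), dtype=np.int8) - 1
        ends = steps.sum(axis=1, dtype=np.int64)      # end positions
        return np.linalg.norm(ends, axis=1).mean()    # <R(N)>

    # Quadrupling N should double R(N), independent of d.
    for N in (100, 400, 1600):
        print(N, round(mean_end_distance(N), 1))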



Often our interest is in knowing how different parts of the system affect each other. Direct interactions do not always reflect the degree of influence. In complex systems, in which many elements are interacting with each other, there are indirect means of interacting that transfer influence between one part of a system and another. The simplest example is the Ising model, where even short-range interactions can lead to longer-range correlations in the magnetization. The correlation function introduced in Section 1.6.5 measures the correlations between different locations. These correlations show the degree to which the interactions couple the behavior of different parts of the system. Correlations of behavior occur in both space and time. As we mentioned in Section 1.3.4, near a second-order phase transition, there are correlations between different places and times on every length and time scale, because they follow a power-law behavior. This example will be discussed in greater detail in the following section.

Our discussion of scaling also finds application in the theory of computation (Section 1.9) and the practical aspects of simulation (Section 1.7). In addition to the question of computability discussed in Section 1.9, we can also ask how hard it is to compute something. Such questions are generally formulated by describing a class of problems that can be ordered by a parameter N that describes the size of the problem. The objective of the theory of computational complexity is to determine how the number of operations necessary to solve a problem grows with N. A scaling analysis can also be used to compare different algorithms that may solve the same problem. We are often primarily concerned with the scaling behavior (exponential, power law and the value of the scaling exponent) rather than the coefficients of the scaling behavior, because in the comparison of the difficulty of solving different problems or different methodologies this is often, though not always, the most important issue.

1.10.4 Renormalization group

General method  The renormalization group is a formalism for studying the scaling properties of a system. It starts by assuming a set of equations that describe the behavior of a system. We then change the length scale at which we are describing the system. In effect, we assume that we have a finite ability to see details. By moving away from a system, we lose some of the detail. At the new scale we assume that the same set of equations can be applied, but possibly with different coefficients. The objective is to relate the set of equations at one scale to the set of equations at the other scale. Once this is achieved, the scale-dependent properties of the system can be inferred.

Applications of the renormalization group method have been largely to the study of equilibrium systems, particularly near second-order phase transitions where mean field approaches break down (Section 1.6). The premise of the renormalization group is that exactly at a second-order phase transition, the equations describing the system are independent of scale. In recent years, dynamic renormalization theory has been developed to describe systems that evolve in time. In this section we will describe the more conventional renormalization group for thermodynamic systems.


We illustrate the concepts of renormalization using the Ising model. The Ising model, discussed in Section 1.6, describes the interactions of spins on a lattice. It is a first model of any system that exhibits simple cooperative behavior, such as a magnet.

In order to appreciate the concept of renormalization, it is useful to recognize that the Ising model is not a true microscopic theory of the behavior of a magnet. It might seem that there is a well-defined way to identify an individual spin with a single electron at the atomic level. However, this is far from apparent when equations that describe quantum mechanics at the atomic level are considered. Since the relationship between the microscopic system and the spin model is not manifest, it is clear that our description of the magnet using the Ising model relies upon the macroscopic properties of the model rather than its microscopic nature. Statistical mechanics does not generally attempt to derive macroscopic properties directly from microscopic reality. Instead, it attempts to describe the macroscopic phenomena from simple models. We might not give up hope of identifying a specific microscopic relationship between a particular material and the Ising model; however, the use of the model does not rely upon this identification.

Essential to this approach is that many of the details of the atomic regime are somehow irrelevant at longer length scales. We will return later to discuss the relevance or irrelevance of microscopic details. However, our first question is: What is a single spin variable? A spin variable represents the effective magnetic behavior of a region of the material. There is no particular reason that we should imagine an individual spin variable as representing a small or a large region of the material. Sometimes it might be possible to consider the whole magnet as a single spin in an external field. Identifying the spin with a region of the material of a particular size is an assignment of the length scale at which the model is being applied.

What is the difference between an Ising model describing the system at one length scale and the Ising model describing it on another? The essential point is that the interactions between spins will be different depending on the length scale at which we choose to model the system. The renormalization group takes this discussion one step further by explicitly relating the models at different scales.

In Fig. 1.10.7 we illustrate an Ising model in two dimensions. There is a second Ising model that is used to describe this same system but on a length scale that is twice as big. The first Ising model is described by the energy function (Hamiltonian):

E[{si}] = −c ∑i 1 − h ∑i si − J ∑<ij> si sj   (1.10.16)

For convenience, in what follows we have included a constant energy term −cN = −c ∑i 1. This term does not affect the behavior of the system; however, its variation from scale to scale should be included. The second Ising model is described by the Hamiltonian

E′[{s′i}] = −c′ ∑i 1 − h′ ∑i s′i − J′ ∑<ij> s′i s′j   (1.10.17)

where both the variables and the coefficients have primes. While the first model has N spins, the second model has N′ spins. Our objective is to relate these two models. The general process is called renormalization. When we go from the fine scale to the coarse scale by eliminating spins, the process is called decimation.


Figure 1.10.7 Schematic illustration of two Ising models in two dimensions. The spins are indicated by arrows that can be UP or DOWN. These Ising models illustrate the modeling of a system with different levels of detail. In the upper model there are one-fourth as many spins as in the lower model. In a renormalization group treatment the parameters of the lower model are related to the parameters of the upper model so that the same system can be described by both. Each of the spins in the upper model, in effect, represents four spins in the lower model. The interactions between adjacent spins in the upper model represent the net effect of the interactions between groups of four spins in the lower model. ❚

There are a variety of methods used for relating models at different scales. Each of them provides a distinct conceptual and practical approach. While in principle they should provide the same answer, they are typically approximated at some stage of the calculation and therefore the answers need not be the same. All the approaches we describe rely upon the partition function to enable direct connection from the microscopic statistical treatment to the macroscopic thermodynamic quantities. For a particular system, the partition function can be written so that it has the same value, independent of which representation is used:

Z = ∑{si} e^(−βE[{si}]) = ∑{s′i} e^(−βE′[{s′i}])   (1.10.18)

It is conventional and convenient when performing renormalization transformations to set β = 1/kT = 1. Since β multiplies each of the parameters of the energy function, it is a redundant parameter. It can be reinserted at the end of the calculations.

The different approaches to renormalization are useful for various models that can be studied. We will describe three of them in the following paragraphs because of the importance of the different conceptual treatments. The three approaches are (1) summing over values of a subset of the spins, (2) averaging over a local combination of the spins, and (3) summing over the short wavelength degrees of freedom in a Fourier space representation.

1. Summing over values of a subset of the spins. In the first approach we consider the spins on the larger scale to be a subset of the spins on the finer scale. To find the energy of interaction between the spins on the larger scale we need to eliminate (decimate) some of the spins and replace them by new interactions between the spins that are left. Specifically, we identify the larger scale spins as corresponding to a subset {si}A of the smaller scale spins. The rest of the spins {si}B must be eliminated from the fine scale model to obtain the coarse scale model. We can implement this directly by using the partition function:

e^(−E′[{s′i}]) = ∑{si}B e^(−E[{s′i}A, {si}B]) = ∑{si} e^(−E[{si}]) ∏i∈A δ(s′i, si)   (1.10.19)

In this equation we have identified the spins on the larger scale as a subset of the finer scale spins and have summed over the finer scale spins to obtain the effective energy for the larger scale spins.

2. Averaging over a local combination of the spins. We need not identify a particular spin of the finer scale with a particular spin of the coarser scale. We can choose to identify some function of the finer scale spins with the coarse scale spin. For example, we can identify the majority rule of a certain number of fine scale spins with the coarse scale spins:

e^(−E′[{s′i}]) = ∑{si} ∏i∈A δ(s′i, sign(∑j sj)) e^(−E[{si}])   (1.10.20)

where, for each coarse spin i, the inner sum ∑j runs over the fine scale spins that it replaces.



This is easier to think about when an odd number of spins are being renormalized to become a single spin. Note that this is quite similar to the concept of defining a collective coordinate that we used in Section 1.4 in discussing the two-state system. The difference here is that we are defining a collective coordinate out of only a few original coordinates, so that the reduction in the number of degrees of freedom is comparatively small. Note also that by convention we continue to use the term "energy," rather than "free energy," for the collective coordinates. (A numerical sketch of this majority rule is given just after this list.)

3. Summing over the short wavelength degrees of freedom in a Fourier space representation. Rather than performing the elimination of spins directly, we may recognize that our procedure is having the effect of removing the fine scale variation in the problem. It is natural then to consider a Fourier space representation where we can remove the rapid changes in the spin values by eliminating the higher Fourier components. To do this we need to represent the energy function in terms of the Fourier transform of the spin variables:

sk = ∑i e^(ikxi) si   (1.10.21)

Writing the Hamiltonian in terms of the Fourier transformed variables, we then sum over the values of the high frequency terms:

e^(−E′[{sk}k<k0]) = ∑{sk}k>k0 e^(−E[{sk}])   (1.10.22)

The remaining coordinates sk have k < k0.
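As promised above, here is a minimal sketch of the majority rule of approach 2 (our own illustration; the block size and lattice size are arbitrary): each b × b block of fine scale spins is replaced by a coarse spin equal to the sign of the block sum. An odd b avoids ties, as noted in the text.

    import numpy as np

    rng = np.random.default_rng(1)

    # Replace each b x b block of spins by the sign of the block's sum.
    def majority_coarse_grain(spins, b=3):
        n = spins.shape[0] // b
        blocks = spins[:n * b, :n * b].reshape(n, b, n, b)
        return np.sign(blocks.sum(axis=(1, 3)))

    spins = rng.choice([-1, 1], size=(81, 81))  # a random configuration
    print(majority_coarse_grain(spins).shape)   # (81, 81) -> (27, 27)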

All of the approaches described above typically require some approximation in order to perform the analysis. In general there is a conservation of effort in that the same difficulties tend to arise in each approach, but with different manifestation. Part of the reason for the difficulties is that the Hamiltonian we use for the Ising model is not really complete. This means that there can be other parameters that should be included to describe the behavior of the system. We will see this by direct application in the following examples.

Ising model in one dimension  We illustrate the basic concepts by applying the renormalization group to a one-dimensional Ising model where the procedure can be done exactly. It is convenient to use the first approach (number 1 above) of identifying a subset of the fine scale spins with the larger scale model. We start with the case where there is an interaction between neighboring spins, but no magnetic field:

E[{si}] = −c ∑i 1 − J ∑<ij> si sj   (1.10.23)

We sum the partition function over the odd spins to obtain

Z = ∑{si}even ∑{si}odd e^(c ∑i 1 + J ∑i si si+1) = ∑{si}even ∏i even 2 cosh(J(si + si+2)) e^(2c)   (1.10.24)



We equate this to the energy for the even spins by themselves, but with primed quantities:

Z = ∑{si}even e^(c′ ∑i 1 + J′ ∑i si si+2) = ∑{si}even ∏i even 2 cosh(J(si + si+2)) e^(2c)   (1.10.25)

This gives:

e^(c′ + J′ si si+2) = 2 cosh(J(si + si+2)) e^(2c)   (1.10.26)

or

c′ + J′ si si+2 = ln(2 cosh(J(si + si+2))) + 2c   (1.10.27)

Inserting the two distinct combinations of values of si and si+2 (si = si+2 and si = −si+2), we have:

c′ + J′ = ln(2 cosh(2J)) + 2c
c′ − J′ = ln(2 cosh(0)) + 2c = ln(2) + 2c   (1.10.28)

Solving these equations gives the primed quantities for the larger scale model as:

J′ = (1/2) ln(cosh(2J))
c′ = 2c + (1/2) ln(4 cosh(2J))   (1.10.29)

This is the renormalization group relationship that we have been looking for. It relates the values of the parameters in the two different energy functions at the different scales.

While it may not be obvious by inspection, this iterative map always causes J to decrease. We can see this more easily if we transform the relationship of J to J′ to the equivalent form:

tanh(J′) = tanh(J)²   (1.10.30)

This means that on longer and longer scales the effective interaction between neighboring spins becomes smaller and smaller. Eventually the system on long scales behaves as a string of decoupled spins.
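The flow is easy to see by iterating Eq. (1.10.30) numerically (a minimal sketch of ours; the starting coupling is arbitrary):

    import numpy as np

    # Iterating J' = arctanh(tanh(J)^2): whatever the starting coupling,
    # J flows toward zero, i.e. the spins decouple on large scales.
    J = 2.0
    for step in range(1, 9):
        J = np.arctanh(np.tanh(J) ** 2)
        print(f"after {step} decimations: J = {J:.6f}")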

The analysis of the one-dimensional Ising model can be extended to include a magnetic field. The decimation step becomes:

Z = ∑{si}even ∑{si}odd e^(c ∑i 1 + h ∑i si + J ∑i si si+1)
  = ∑{si}even ∏i odd 2 cosh(h + J(si−1 + si+1)) e^(2c + h(si−1 + si+1)/2)   (1.10.31)

We equate this to the coarse scale partition function:

Z = ∑{s′i} e^(c′ ∑i 1 + h′ ∑i s′i + J′ ∑i s′i s′i+1)   (1.10.32)

which requires that:



c′ + h′ + J′ = h + ln(2 cosh(h + 2J)) + 2c
c′ − J′ = ln(2 cosh(h)) + 2c   (1.10.33)
c′ − h′ + J′ = −h + ln(2 cosh(h − 2J)) + 2c

We solve these equations to obtain:

c′ = 2c + (1/4) ln(16 cosh(h + 2J) cosh(h − 2J) cosh(h)²)
J′ = (1/4) ln(cosh(h + 2J) cosh(h − 2J)/cosh(h)²)   (1.10.34)
h′ = h + (1/2) ln(cosh(h + 2J)/cosh(h − 2J))

which is the desired renormalization group transformation. The renormalization transformation is an iterative map in the parameter space (c, h, J).
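The iterative map of Eq. (1.10.34) can be followed numerically (our own sketch; c is dropped because it does not feed back into h and J, and the starting point is arbitrary):

    import numpy as np

    def rg_step(h, J):
        J_new = 0.25 * np.log(np.cosh(h + 2*J) * np.cosh(h - 2*J) / np.cosh(h)**2)
        h_new = h + 0.5 * np.log(np.cosh(h + 2*J) / np.cosh(h - 2*J))
        return h_new, J_new

    h, J = 0.1, 1.0
    for _ in range(8):
        h, J = rg_step(h, J)
        print(f"h = {h:.4f}, J = {J:.5f}")   # J -> 0, h -> a finite limit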

We can show what happens in this iterative map using a plot of changes in the values of J and h at a particular value of these parameters. Such a diagram of flows in the parameter space is illustrated in Fig. 1.10.8. We can see from the figure or from Eq. (1.10.34) that there is a line of fixed points of the iterative map at J = 0 with arbitrary value of h.

Figure 1.10.8 The renormalization transformation for the one-dimensional Ising model is illustrated as an iterative flow diagram in the two-dimensional (h, J) parameter space. Each of the arrows represents the effect of decimating half of the spins. We see that after a few iterations the value of J becomes very small. This indicates that the spins become decoupled from each other on a larger scale. The absence of any interaction on this scale means that there is no phase transition in the one-dimensional Ising model. ❚

This simply means that the spins are decoupled. For J = 0 on any scale, the behavior of the spins is determined by the value of the external field.

The line of fixed points at J = 0 is a stable (attracting) set of fixed points. The flow lines of the iterative map take us to these fixed points on the attractor line. In addition, there is an unstable fixed point at J = ∞. This would correspond to a strongly coupled line of spins, but since this fixed point is unstable it does not describe the large scale behavior of the model. For any finite value of J, changing the scale rapidly causes the value of J to become small. This means that the large scale behavior is always that of a system with J = 0.

Ising model in two dimensions  In the one-dimensional case treated in the previous section, the renormalization group works perfectly and is also, from the point of view of studying phase transitions, uninteresting. We will now look at two dimensions, where the renormalization group must be approximated and where there is also a phase transition.

We can simplify our task in two dimensions by eliminating half of the spins (Fig. 1.10.9) instead of three out of four spins as illustrated previously in Fig. 1.10.7. Eliminating half of the spins causes the square cell to be rotated by 45°, but this should not cause any problems. Labeling the spins as in Fig. 1.10.9 we write the decimation step for a Hamiltonian with h = 0:

Z = ∑{si}A ∑{si}B e^(c ∑i 1 + J ∑i s0(s1 + s2 + s3 + s4))
  = ∑{si}A ∏i∈B 2 cosh(J(s1 + s2 + s3 + s4)) e^c
  = ∑{si}A ∏i∈B e^(c′ + (J′/2)(s1s2 + s2s3 + s3s4 + s4s1))   (1.10.35)

In the last expression we take into consideration that each bond of the form s1s2 appears in two squares and each spin appears in four squares.

In order to solve Eq. (1.10.35) for the values of c′ and J′ we must insert all possible values of the spins (s1, s2, s3, s4). However, this leads to a serious problem. There are four distinct equations that arise from the different values of the spins. This is reduced from the 2⁴ = 16 possible configurations because, by symmetry, inverting all of the spins gives the same answer, as do rotations of the square. The problem is that while there are four equations, there are only two unknowns to solve for, c′ and J′. The problem can be illustrated by recognizing that there are two distinct ways to have two spins UP and two spins DOWN. One way is to have the spins that are the same be adjacent to each other, and the other way is to have them be opposite each other across a diagonal. The two ways give the same result for the value of (s1 + s2 + s3 + s4) but different results for (s1s2 + s2s3 + s3s4 + s4s1).



In order to solve this problem, we must introduce additional parameters which correspond to other interactions in the Hamiltonian. To be explicit, we would make a table of symmetry-related combinations of the four spins as follows:

(s1,s2,s3,s4)                 (1,1,1,1)  (1,1,1,−1)  (1,1,−1,−1)  (1,−1,1,−1)
1                                 1          1           1            1
(s1 + s2 + s3 + s4)               4          2           0            0
(s1s2 + s2s3 + s3s4 + s4s1)       4          0           0           −4
(s1s3 + s2s4)                     2          0          −2            2
s1s2s3s4                          1         −1           1            1     (1.10.36)

In order to make use of these to resolve the problems with Eq. (1.10.35), we must introduce new interactions in the Hamiltonian and new parameters that multiply them. This leads to second-neighbor interactions (across a cell diagonal), and four spin interactions around a square:

E′[{si}] = −c′ ∑i 1 − J′ ∑<ij> si sj − K′ ∑<<ij>> si sj − L′ ∑<ijkl> si sj sk sl   (1.10.37)

where the notation <<ij>> indicates second-neighbor spins across a square diagonal, and <ijkl> indicates spins around a square. This might seem to solve our problem. However, we started out from a Hamiltonian with only two parameters, and now we are switching to a Hamiltonian with four parameters. To be self-consistent, we should start from the same set of parameters we end up with. When we start with the additional parameters K and L this will, however, lead to still further terms that should be included.


Figure 1.10.9 In a renormalization treatment of the two-dimensional Ising model it is possible to decimate one out of two spins as illustrated in this figure. The black dots represent spins that remain in the larger-scale model, and the white dots represent spins that are decimated. The nearest-neighbor interactions in the larger-scale model are shown by dashed lines. As discussed in the text, the process of decimation introduces new interactions between spins across the diagonal, and four spin interactions between spins around a square. ❚

Relevant and irrelevant parameters  In general, as we eliminate spins by renormalization, we introduce interactions between spins that might not have been included in the original model. We will have interactions between second or third neighbors or between more than two spins at a time. In principle, by using a complete set of parameters that describe the system we can perform the renormalization transformation and obtain the flows in the parameter space. These flows tell us about the scale-dependent properties of the system.

We can characterize the flows by focusing on the fixed points of the iterative map. These fixed points may be stable or unstable. When a fixed point is unstable, renormalization takes us away from the fixed point so that on a larger scale the properties of the system are found to be different from the values at the unstable fixed point. Significantly, it is the unstable fixed points that represent the second-order phase transitions. This is because deviating from the fixed point in one direction causes the parameters to flow in one direction, while deviating from the fixed point in another direction causes the parameters to flow in a different direction. Thus, the macroscopic properties of the system depend on the direction microscopic parameters deviate from the fixed point, a succinct characterization of the nature of a phase transition.

Using this characterization of fixed points, we can now distinguish between different types of parameters in the model. This includes all of the additional parameters that might be introduced in order to achieve a self-consistent renormalization transformation. There are two major categories of parameters: relevant or irrelevant. Starting near a particular fixed point, changes in a relevant parameter grow under renormalization. Changes in an irrelevant parameter shrink. Because renormalization indicates the values of system parameters on a larger scale, this tells us which microscopic parameters are important to the macroscopic scale. When observed on the macroscopic scale, relevant parameters change at the phase transition, while irrelevant parameters do not. A relevant parameter should be included in the Hamiltonian because its value affects the macroscopic behavior. An irrelevant parameter may often be included in the model in a more approximate way. Marginal parameters are the borderline cases that neither grow nor shrink at the fixed point.

Even when we are not solely interested in the behavior of a system at a phase transition, but rather are concerned with its macroscopic properties in general, the definition of "relevant" and "irrelevant" continues to make sense. If we start from a particular microscopic description of the system, we can ask which parameters are relevant for the macroscopic behavior. The relevant parameters are the ones that can affect the macroscopic behavior of the system. Thus, a change in a relevant microscopic parameter changes the macroscopic behavior. In terms of renormalization, changes in relevant parameters do not disappear as a result of renormalization.

We see that the use of any model, such as the Ising model, to model a physical system assumes that all of the parameters that are essential in describing the system have been included. When this is true, the results are universal in the sense that all microscopic Hamiltonians will give rise to the same behavior. Additional terms in the Hamiltonian cannot affect the macroscopic behavior. We know that the microscopic behavior of the physical system is not really described by the Ising model or any other simple model. Thus, in creating models we always rely upon the concept, if not the process, of renormalization to make many of the microscopic details disappear, enabling our simple models to describe the macroscopic behavior of the physical system.

In the Ising model, in addition to longer range and multiple spin interactions, there is another set of parameters that may be relevant. These parameters are related to the use of binary variables to describe the magnetization of a region of the material. It makes sense that the process of renormalization should cause the model to have additional spin values that are intermediate between fully magnetized UP and fully magnetized DOWN. In order to accommodate this, we might introduce a continuum of possible magnetizations. Once we do this, the amplitude of the magnetization has a probability distribution that will be controlled by additional parameters in the Hamiltonian. These parameters may also be relevant or irrelevant. When they are irrelevant, the Ising model can be used without them. However, when they are relevant, a more complete model should be used.

The parameters that are relevant generally depend on the dimensionality of space. From our analysis of the behavior of the one-dimensional Ising model, the parameter J is irrelevant. It is clearly irrelevant because not only variations in J but J itself disappears as the scale increases. However, in two dimensions this is not true.

For our purposes we will be satisfied by simplifying the renormalization treatment of the two-dimensional Ising model so that no additional parameters are introduced. This can be done by a fourth renormalization group technique which has some conceptual as well as practical advantages over the others. However, it does hide the importance of determining the relevant parameters.

Bond shifting  We simplify our analysis of the two-dimensional Ising model by making use of the Migdal-Kadanoff transformation. This renormalization group technique is based on the recognition that the correlation between adjacent spins can enable us to, in effect, substitute the role of one spin for another. As far as the coarser scale model is concerned, the function of the finer scale spins is to mediate the interaction between the coarser scale spins. Because one spin is correlated to the behavior of its neighbor, we can shift the responsibility for this interaction to a neighbor, and use this shift to simplify elimination of the spins.

To apply these ideas to the two-dimensional Ising model, we move some of the interactions (bonds) between spins, as shown in Fig. 1.10.10. We note that the distance over which the bonds act is preserved. The net result of the bond shifting is that we form short linear chains that can be renormalized just like a one-dimensional chain. The renormalization group transformation is thus done in two steps. First we shift the bonds, then we decimate. Once the bonds are moved, we write the renormalization of the partition function as:

Z = ∑{si}A ∑{si}B ∑{si}C e^(c ∑i 1 + 2J ∑i s0(s1 + s2))
  = ∑{si}A ∏i∈A 2 cosh(2J(s1 + s2)) e^(4c)
  = ∑{si}A ∏i∈A e^(c′ + J′ s1s2)   (1.10.38)



The spin labels s0, s1, s2 are assigned along each doubled bond, as indicated in Fig. 1.10.10. The three types of spin A, B and C correspond to the white, black and gray dots in the figure. The resulting equation is the same as the one we found when performing the one-dimensional renormalization group transformation with the exception of factors of two. It gives the result:

J′ = (1/2) ln(cosh(4J))
c′ = 4c + (1/2) ln(4 cosh(4J))   (1.10.39)

Figure 1.10.10 Illustration of the Migdal-Kadanoff renormalization transformation that enables us to bypass the formation of additional interactions. In this approach some of the interactions between spins are moved to other spins. If all the spins are aligned (at low temperature or high J), then shifting bonds doesn't affect the spin alignment. At high temperature, when the spins are uncorrelated, the interactions are not important anyway. Near the phase transition, when the spins are highly correlated, shifting bonds still makes sense. A pattern of bond movement is illustrated in (a) that gives rise to the pattern of doubled bonds in (b). Note that we are illustrating only part of a periodic lattice, so that bonds are moved into and out of the region illustrated. Using the exact renormalization of one-dimensional chains, the gray spins and the black spins can be decimated to leave only the white spins. ❚

The renormalization of J in the two-dimensional Ising model turns out to behave qualitatively different from the one-dimensional case. Its behavior is plotted in Fig. 1.10.11 using a flow diagram. There is an unstable fixed point of the iterative map at J ≈ .305. This nonzero and noninfinite fixed point indicates that we have a phase transition. Reinserting the temperature, we see that the phase transition occurs at J = .305 which is significantly larger than the mean field result zJ = 1 or J = .25 found in Section 1.6. The exact value for the phase transition for this lattice, J ≈ .441, which can be obtained analytically by other techniques, is even larger.
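Numerically (our own sketch; brentq is just one of many ways to find the root of J′ − J away from the trivial solution J = 0), the unstable fixed point of Eq. (1.10.39) and the two-sided flow around it look like this:

    import numpy as np
    from scipy.optimize import brentq

    Jc = brentq(lambda J: 0.5 * np.log(np.cosh(4 * J)) - J, 0.1, 1.0)
    print(f"Jc = {Jc:.4f}")   # about 0.305

    # Flows on either side of Jc: below it J shrinks, above it J grows.
    for J in (0.9 * Jc, 1.1 * Jc):
        for _ in range(5):
            J = 0.5 * np.log(np.cosh(4 * J))
        print(f"after 5 iterations: J = {J:.4f}")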

It turns out that there is a trick that can give us the exact transition point using a similar renormalization transformation. This trick begins by recognizing that we could have moved bonds in a larger square. For a square with b cells on a side, we would end up with each bond on the perimeter being replaced by a bond of strength b. Using Eq. (1.10.30) we can infer that a chain of b bonds of strength bJ gives rise to an effective interaction whose strength is

J′(b) = tanh^(−1)(tanh(bJ)^b)   (1.10.40)

Figure 1.10.11 The two-dimensional Ising model renormalization group transformation obtained from the Migdal-Kadanoff transformation is illustrated as a flow diagram in the one-dimensional parameter space (J). The arrows show the effect of successive iterations starting from the black dots. The white dot indicates the position of the unstable fixed point, Jc, which is the phase transition in this model. Starting from values of J slightly below Jc, iteration results in the model on a large scale becoming decoupled with no interactions between spins (J → 0). This is the high-temperature phase of the material. However, starting from values of J slightly above Jc iteration results in the model on the large scale becoming strongly coupled (J → ∞) and spins are aligned. (a) shows only the range of values from 0 to 1, though the value of J can be arbitrarily large. (b) shows an enlargement of the region around the fixed point. ❚

The trick is to take the limit b → 1, because in this limit we are left with the original Ising model. Extending b to nonintegral values by analytic continuation may seem a little strange, but it does make a kind of sense. We want to look at the incremental change in J as a result of renormalization, with b incrementally different from 1. This can be most easily found by taking the hyperbolic tangent of both sides of Eq. (1.10.40), and then taking the derivative with respect to b. The result is:

dJ′(b)/db |b=1 = J + sinh(J) cosh(J) ln(tanh(J))   (1.10.41)

Setting this equal to zero to find the fixed point of the transformation actually gives the exact result for the phase transition.
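A numerical check of ours: the root of Eq. (1.10.41) lands on the exact 2D Ising coupling Jc = (1/2) ln(1 + √2) ≈ 0.4407, consistent with the value quoted above.

    import numpy as np
    from scipy.optimize import brentq

    f = lambda J: J + np.sinh(J) * np.cosh(J) * np.log(np.tanh(J))
    print(brentq(f, 0.2, 1.0), 0.5 * np.log(1.0 + np.sqrt(2.0)))  # both ~0.4407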

The renormalization group gives us more information than just the location of the phase transition. Fig. 1.10.11 shows changes that occur in the parameters as the length scale is varied. We can use this picture to understand the behavior of the Ising model in some detail. It shows what happens on longer length scales by the direction of the arrows. If the flow goes toward a particular point, then we can tell that on the longest (thermodynamic) length scale the behavior will be characterized by the behavior of the model at that point. By knowing how close we are to the original phase transition, we can also learn from the renormalization group what is the length scale at which the behavior characteristic of the phase transition will disappear. This is the length scale at which the iterative map leaves the region of the repelling fixed point and moves to the attracting one.

We can also characterize the relationship between systems at different values of the parameters: temperatures or magnetic fields. Renormalization takes us from a system at one value of J to another. Thus, we can relate the behavior of a system at one temperature to another by performing the renormalization for both systems and stopping both at a particular value of J. At this point we can directly relate properties of the two systems, such as their free energies. Different numbers of renormalization steps in the two cases mean that we are relating the two systems at different scales. Such descriptions of relationships of the properties of one system at one scale with another system at a different scale are known as scaling functions because they describe how the properties of the system change with scale.

The renormalization group was developed as an analytic tool for studying the scaling properties of systems with spatially arrayed interacting parts. We will study another use of renormalization in Section 1.10.5. Then in Section 1.10.6 we will introduce a computational approach, the multigrid method.

Question 1.10.6 In this section we displayed our iterative maps graphically as flow diagrams, because in renormalization group transformations we are often interested in maps that involve more than one variable. Make a diagram like Fig. 1.1.1 for the single variable J showing the iterative renormalization group transformation for the two-dimensional Ising model as given in Eq. (1.10.39).



Solution 1.10.6 See Fig. 1.10.12. The fixed point and the iterative behavior are readily apparent. ❚

1.10.5 Renormalization and chaos

Our final example of renormalization brings us back to Section 1.1, where we studied the properties of iterative maps and the bifurcation route to chaos. According to our discussion, cycles of length 2^k, k = 0,1,2,..., appeared as the parameter a was varied from 0 to ac = 3.56994567, at which point chaotic behavior appeared. Fig. 1.1.3 summarizes the bifurcation route to chaos. A schematic of the bifurcation part of this diagram is reproduced in Fig. 1.10.13. A brief review of Section 1.1 may be useful for the following discussion.

The process of bifurcation appears to be a self-similar process in the sense that the appearance of a 2-cycle for f(s) is repeated in the appearance of a 2-cycle for f²(s), but over a smaller range of a. The idea of self-similarity seems manifest in Fig. 1.10.13, where we would only have to change the scale of magnification in the s and a directions in order to map one bifurcation point onto the next one. While this mapping might not work perfectly for smaller cycles, it becomes a better and better approximation as the number of cycles increases.

Figure 1.10.12 The iterative map shown as a flow diagram in Fig. 1.10.11 is shown here in the same manner as the iterative maps in Section 1.1. On the left are shown the successive values of J as iteration proceeds. Each iteration should be understood as a loss of detail in the model and hence an observation of the system on a larger scale. Since in general our observations of the system are macroscopic, we typically observe the limiting behavior as the iterations go to ∞. This is similar to considering the limiting behavior of a standard iterative map. On the right is the graphical method of determining the iterations as discussed in Section 1.1. The fixed points are visible as intersections of the iterating function, f(J) = 0.5 ln(cosh(4J)), with the diagonal line. ❚

Figure 1.10.13 Schematic reproduction of Fig. 1.1.4, which shows the bifurcation route to chaos. Successive branchings are approximately self-similar. The bottom figure shows the definition of the scaling factors that relate the successive branchings. The horizontal rescaling of the branches, δ, is given by the ratio of ∆ak to ∆ak+1. The vertical rescaling of the branches, α, is given by the ratio of ∆sk to ∆sk+1. The top figure shows the values from which we can obtain a first approximation to the values of α and δ, by taking the ratios from the zeroth, first and second bifurcations. The zeroth bifurcation point is actually the point a = 1. The first bifurcation point occurs at a = 3. The second occurs at a = 1 + √6. The values of s at the bifurcation points were obtained in Section 1.1; the formulas indicated on the figure are s1 = (a − 1)/a and s2 = ((a + 1) ± √((a + 1)(a − 3)))/2a. When the scaling behavior of the tree is analyzed using a renormalization group treatment, we focus on the tree branches that cross s = 1/2. These are indicated by bold lines in the top figure. ❚

The bifurcation diagram is thus a treelike object. This means that the sequence of bifurcation points forms a geometrically converging sequence, and the width of the branches is also geometrically converging. However, the distances in the s and a directions are scaled by different factors. The factors that govern the tree rescaling at each level are δ and α, as shown in Fig. 1.10.13(b):

δ = lim(k→∞) ∆ak/∆ak+1
α = lim(k→∞) ∆sk/∆sk+1   (1.10.42)

By this convention, the magnitude of both α and δ is greater than one. α is defined to be negative because the longer branch flips up to down at every branching. The values are to be obtained by taking the limit as k → ∞ where these scale factors have well-defined limits.

We can find a first approximation to these scaling factors by using the values at the first and second bifurcations that we calculated in Section 1.1. These values, given in Fig. 1.10.13, yield:

δ ≈ (3 − 1)/(1 + √6 − 3) = 4.449   (1.10.43)

α ≈ 2s1|a=3 / (s2+ − s2−)|a=1+√6 = (4/3) a/√((a + 1)(a − 3)) |a=1+√6 = 3.252   (1.10.44)

Numerically, the asymptotic value of δ for large k is found to be 4.6692016. This differs from our first estimate by only 5%. The numerical value for α is 2.50290787, which differs from our first estimate by a larger margin of 30%.
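The value of δ can be approximated directly by a rough numerical sketch of ours: locate the bifurcation points a_k of f(s) = a s(1 − s) by bisecting on the attractor's period, then take ratios of successive intervals. Accuracy is limited by the slow convergence of the orbit near each bifurcation, so this is an estimate only.

    def cycle_length(a, transient=20000, max_period=64, tol=1e-8):
        # Iterate to the attractor, then detect the period of the orbit.
        s = 0.5
        for _ in range(transient):
            s = a * s * (1.0 - s)
        orbit = []
        for _ in range(max_period):
            s = a * s * (1.0 - s)
            orbit.append(s)
        for p in (1, 2, 4, 8, 16, 32):
            if abs(orbit[-1] - orbit[-1 - p]) < tol:
                return p
        return max_period

    def bifurcation_point(p, lo=2.8, hi=3.6):
        # Smallest a at which the attractor's period exceeds p.
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if cycle_length(mid) <= p else (lo, mid)
        return 0.5 * (lo + hi)

    a = [bifurcation_point(p) for p in (1, 2, 4, 8)]
    for k in range(1, len(a) - 1):
        print((a[k] - a[k - 1]) / (a[k + 1] - a[k]))   # ratios near 4.669...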

We can determine these constants with greater accuracy by studying directly the properties of the functions f, f², ..., f^(2^k), ... that are involved in the formation of 2^k cycles. In order to do this we modify our notation to explicitly include the dependence of the function on the parameter a: f(s,a), f²(s,a), etc. Note that iteration of the function f only applies to the first argument.

The tree is formed out of curves s_(2^k)(a) that are obtained by solving the fixed point equation:

s_(2^k)(a) = f^(2^k)(s_(2^k)(a), a)   (1.10.45)

We are interested in mapping a segment of this curve between the values of s where

d f^(2^k)(s,a)/ds = 1   (1.10.46)

and

d f^(2^k)(s,a)/ds = −1   (1.10.47)



onto the next function, where k is replaced everywhere by k + 1. This mapping is a kind of renormalization process similar to the one we discussed in the previous section. In order to do this it makes sense to expand this function in a power series around an intermediate point, which is the point where these derivatives are zero. This is known as the superstable point of the iterative map. The superstable point is very convenient for study, because for any value of k there is a superstable point at s = 1/2. This follows because f(s,a) has its maximum at s = 1/2, and so its derivative is zero there. By the chain rule, the derivative of f^{2^k}(s,a) is also zero at s = 1/2. As illustrated in Fig. 1.10.13, the line at s = 1/2 intersects the bifurcation tree at every level of the hierarchy at an intermediate point between bifurcation points. These intersection points must be superstable.
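The superstable points also give a practical way to estimate δ numerically, since the parameter values a_k at which s = 1/2 lies on the 2^k-cycle converge geometrically with the same ratio δ. The sketch below is an illustrative root search of our own devising, not the renormalization treatment of the text; the brackets around each superstable point are assumed from inspection of the bifurcation diagram:

```python
def iterate(s, a, n):
    """Apply f(s) = a*s*(1 - s) to s a total of n times."""
    for _ in range(n):
        s = a * s * (1.0 - s)
    return s

def superstable_a(k, lo, hi, tol=1e-13):
    """Bisect for the a where f^(2^k)(1/2, a) = 1/2, i.e., where the
    2^k-cycle passes through the maximum of the map."""
    g = lambda a: iterate(0.5, a, 2 ** k) - 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Brackets chosen by hand around the first few superstable points
brackets = [(1.9, 2.1), (3.1, 3.3), (3.45, 3.55), (3.55, 3.57)]
a = [superstable_a(k, lo, hi) for k, (lo, hi) in enumerate(brackets)]
for k in range(2, len(a)):
    print((a[k - 1] - a[k - 2]) / (a[k] - a[k - 1]))  # approaches 4.6692...
```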

It is convenient to displace the origin of s to be at s = 1/2, and the origin of a to be at the convergence point of the bifurcations. We thus define a function g which represents the structure of the tree. It is approximately given by:

g(s,a) ≈ f(s + 1/2, a + a_c) − 1/2    (1.10.48)

However, we would like to represent the idealized tree rather than the real tree. The idealized tree would satisfy the scaling relation exactly at all values of a. Thus g should be the analog of the function f which would give us an ideal tree. To find this function we need to expand the region near a = a_c by the appropriate scaling factors. Specifically we define:

g(s,a) = lim_{k→∞} α^k [f^{2^k}(s/α^k + 1/2, a/δ^k + a_c) − 1/2]    (1.10.49)

The easiest way to think about the function g(s,a) is that it is quite similar to the quadratic function f(s,a), but it has the form necessary to cause the bifurcation tree to have the ideal scaling behavior at every branching. We note that g(s,a) depends on the behavior of f(s,a) only very near to the point s = 1/2. This is apparent in Eq. (1.10.49), since the region near s = 1/2 is expanded by a factor of α^k.

We note that g(s,a) has its maximum at s = 0. This is a consequence of the shift in origin that we chose to make in defining it.

Our objective is to find the form of g(s,a) and, with this form, the values of α and δ. The trick is to recognize that what we need to know can be obtained directly from its scaling properties. To write the scaling properties we look at the relationship between successive iterations of the map and write:

g(s,a) = α g²(s/α, a/δ)    (1.10.50)

This follows either from our discussion and definition of the scaling parameters α and δ, or directly from Eq. (1.10.49).

For convenience, we analyze Eq. (1.10.50) first in the limit a → 0. This corresponds to looking at the function g(s,a) as a function of s at the limit of the bifurcation sequence. This function (Fig. 1.10.14) still looks quite similar to our original function f(s), but its specific form is different. It satisfies the relationship:

g(s,0) = g(s) = α g²(s/α)    (1.10.51)
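The convergence of Eq. (1.10.49) at a = 0 can be checked directly, as in Fig. 1.10.14, by rescaling successive iterates of the map at a_c. In the sketch below, the accumulation point a_c ≈ 3.5699456 and α ≈ −2.5029079 are taken as given inputs; their values are quoted from the numerical literature rather than derived here:

```python
A_C = 3.5699456      # accumulation point of the bifurcations (assumed value)
ALPHA = -2.5029079   # Feigenbaum's alpha, negative by our sign convention

def g_approx(s, k):
    """k-th approximation to g(s) = g(s,0) from Eq. (1.10.49):
    alpha^k * (f^(2^k)(s/alpha^k + 1/2, a_c) - 1/2)."""
    x = s / ALPHA ** k + 0.5
    for _ in range(2 ** k):
        x = A_C * x * (1.0 - x)
    return ALPHA ** k * (x - 0.5)

# Successive approximations remain close to one another, as in Fig. 1.10.14
for s in (0.0, 0.2, 0.4):
    print([round(g_approx(s, k), 4) for k in (0, 1, 2, 3)])
```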


We approximate this function by a quadratic with no linear term, because g(s) has its maximum at s = 0:

g(s) ≈ g_0 + cs²    (1.10.52)

Inserting into Eq. (1.10.51) we obtain:

g_0 + cs² ≈ α(g_0 + c(g_0 + c(s/α)²)²)    (1.10.53)

Equating separately the coefficients of the constant and quadratic terms in the expansion gives the solution:

α = 1/(1 + cg_0),    α = 2cg_0    (1.10.54)

We see that c and g_0 only appear in the combination cg_0, which means that there is one parameter that is not determined by the scaling relationship. However, this does not prevent us from determining α. Eq. (1.10.54) can be solved to obtain:


Figure 1.10.14 Three functions are plotted that are successive approximations to g(s) = g(s,0). This function is essentially the limiting behavior of the quadratic iterative map f(s) at the end of the bifurcation tree, a_c. The functions plotted are the first three k values inserted in Eq. (1.10.49): f(s + 1/2, a_c) − 1/2, αf²(s/α + 1/2, a_c) − 1/2 and α²f⁴(s/α² + 1/2, a_c) − 1/2. The latter two are almost indistinguishable, indicating that the sequence of functions converges rapidly. ❚


cg_0 = (−1 ± √3)/2 = −1.3660254,    α = (−1 ± √3) = −2.73205081    (1.10.55)

We have chosen the negative solutions because the value of α, and therefore the value of cg_0, must be negative.

We return to consider the dependence of g(s,a) on a to obtain a new estimate for δ. Using a first-order linear dependence on a we have:

g(s,a) ≈ g_0 + cs² + ba    (1.10.56)

Inserting into Eq. (1.10.50) we have:

g_0 + cs² + ba ≈ α(g_0 + c(g_0 + c(s/α)² + ba/δ)² + ba/δ)    (1.10.57)

Taking only the first-order term in a from this equation we have:

δ = 2αcg_0 + α = 4.73205    (1.10.58)

Eq. (1.10.55) and Eq. (1.10.58) are a significant improvement over Eq. (1.10.44) and Eq. (1.10.43). The new value of α is less than 10% from the exact value. The new value of δ is less than 1.5% from the exact value. To improve the accuracy of the results, we need only add additional terms to the expansion of g(s,a) in s. The first-order term in a is always sufficient to obtain the corresponding value of δ.
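The arithmetic of Eqs. (1.10.54)–(1.10.58) is compact enough to check in a few lines; a minimal sketch:

```python
from math import sqrt

# Eq. (1.10.54): alpha = 1/(1 + c*g0) and alpha = 2*c*g0 together give
# 2*(c*g0)^2 + 2*(c*g0) - 1 = 0; we take the negative root, Eq. (1.10.55).
cg0 = (-1.0 - sqrt(3.0)) / 2.0
alpha = 2.0 * cg0
assert abs(alpha - 1.0 / (1.0 + cg0)) < 1e-12   # the two expressions agree

# Eq. (1.10.58): the first-order term in a gives delta
delta = 2.0 * alpha * cg0 + alpha

print(f"alpha = {alpha:.8f}  (quadratic estimate; numerical value -2.50290787)")
print(f"delta = {delta:.5f}   (quadratic estimate; numerical value 4.66920161)")
```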

It is important, and actually central to the argument in this section, that the explicit form of f(s,a) never entered into our discussion. The only assumption was that the functional behavior near the maximum is quadratic. The rest of the argument follows independent of the form of f(s,a), because we are looking at its properties after many iterations. These properties depend only on the region right in the vicinity of the maximum of the function. Thus only the leading quadratic term in the expansion of the original function f(s,a) around its maximum matters. This illustrates the notion of universality so essential to the concept of renormalization—the behavior is controlled by very few parameters. All other parameters are irrelevant—changing their values in the original iterative map is irrelevant to the behavior after many iterations (many renormalizations) of the iterative map. This is similar to the study of renormalization in models like the Ising model, where most of the details of the behavior at small scales no longer matter on the largest scales.

1.10.6 Multigrid

The multigrid technique is designed for the solution of computational problems that benefit from a description on multiple scales. Unlike renormalization, which is largely an analytic tool, the multigrid method is designed specifically as a computational tool. It works well when a problem can be approximated using a description on a coarse lattice, but becomes more and more accurate as the finer-scale details on finer-scale lattices are included. The idea of the method is not just to solve an equation on finer and finer levels of description, but also to correct the coarser-scale equations based on the finer-scale results. In this way the methodology creates an improved description of the problem on the coarser scale.


The multigrid approach relies upon iterative refinement of the solution. Solutions on coarser scales are used to approximate the solutions on finer scales. The finer-scale solutions are then iteratively refined. However, by correcting the coarser-scale equations, it is possible to perform most of the iterative refinement of the fine-scale solution on the coarser scales. Thus the iterative refinement of the solution is based both upon correction of the solution and correction of the equation. The idea of correcting the equation is similar in many ways to the renormalization group approach. However, in this case it is a particular solution, which may be spatially dependent, rather than an ensemble averaging process, which provides the correction.

We explain the multigrid approach using a conventional problem, which is the solution of a differential equation. To solve the differential equation we will find an approximate solution on a grid of points. Our ultimate objective is to find a solution on a fine enough grid so that the solution is within a prespecified accuracy of the exact answer. However, we will start with a much coarser grid solution and progressively refine it to obtain more accurate results. Typically the multigrid method is applied in two or three dimensions, where it has greater advantages than in one dimension. However, we will describe the concepts in one dimension and leave out many of the subtleties.

For concreteness we will assume a differential equation which is:

d²f(x)/dx² = g(x)    (1.10.59)

where g(x) is specified. The domain of the equation is specified, and boundary conditions are provided for f(x) and its derivative. On a grid of equally spaced points, with spacing d, we might represent this equation as:

(1/d²)(f(i+1) + f(i−1) − 2f(i)) = g(i)    (1.10.60)

This can be written as a matrix equation:

∑_j A(i,j) f(j) = g(i)    (1.10.61)

The matrix equation can be solved for the values of f(i) by matrix inversion (using matrix diagonalization). However, diagonalization is very costly when the matrix is large, i.e., when there are many points in the grid.
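As a concrete illustration, a small sketch (numpy; the Dirichlet boundary values and the choice of g(x) are ours for definiteness) builds the matrix of Eq. (1.10.60) on the unit interval and solves it directly:

```python
import numpy as np

N = 8                        # number of interior grid points
d = 1.0 / (N + 1)            # grid spacing
x = d * np.arange(1, N + 1)

# A(i,j): the second-difference operator of Eq. (1.10.60), with the
# boundary values f(0) = f(1) = 0 folded into the equations.
A = (np.diag(-2.0 * np.ones(N)) +
     np.diag(np.ones(N - 1), 1) +
     np.diag(np.ones(N - 1), -1)) / d**2

g = np.sin(np.pi * x)             # a right-hand side chosen for the demo
f = np.linalg.solve(A, g)         # direct solution: O(N^3) work

# exact solution of f'' = sin(pi x) with these boundaries: -sin(pi x)/pi^2
print(np.max(np.abs(f + np.sin(np.pi * x) / np.pi**2)))
```

For a small one-dimensional grid this is entirely tractable, but the cost grows rapidly with the number of grid points, which is what motivates the multigrid alternative.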

A multigrid approach to solving this equation starts by defining a set of lattices (grids), L_j, j ∈ {0, . . ., q}, where each successive lattice has twice as many points as the previous one (Fig. 1.10.15). To explain the procedure it is simplest to assume that we start with a good approximation for the solution on grid L_{j−1} and we are looking for a solution on the grid L_j. The steps taken are then:

1. Interpolate to find f_j^0(i), an approximate value of the function on the finer grid L_j.



2. Perform an iterative improvement (relaxation) of the solution on the finer grid. This iteration involves calculating the error

∑_{i′} A(i,i′) f_j^0(i′) − g(i) = r_j(i)    (1.10.62)

where all indices refer to the grid L_j. This error is used to improve the solution on the finer grid, as in the minimization procedures discussed in Section 1.7.5:

f_j^1(i) = f_j^0(i) − c r_j(i)    (1.10.63)

The scalar c is generally replaced by an approximate inverse of the matrix A(i,j), as discussed in Section 1.7.5. This iteration captures much of the correction of the solution at the fine-scale level; however, there are resulting corrections at coarser levels that are not captured. Rather than continuing to iteratively improve the solution at this fine-scale level, we move the iteration to the next coarser level. (A sketch of this relaxation in code appears after the V-cycle discussion below.)

3. Recalculate the value of the function on the coarse grid L_{j−1} to obtain f_{j−1}^1(i). This might be just a restriction from the fine-grid points to the coarse-grid points. However, it often involves some more sophisticated smoothing. Ideally, it should be such that interpolation will invert this process to obtain the values that were found on the finer grid. The correction for the difference between the interpolated and exact fine-scale results is retained.

4. Correct the value of g(i) on the coarse grid using the values of r_j(i) restricted to the coarser grid. We do this so that the coarse-grid equation has an exact solution that is consistent with the fine-grid equation. From Eq. (1.10.62) this essentially means adding r_j(i) to g(i). The resulting corrected values we call g_{j−1}^1(i).



Figure 1.10.15 Illustration of four grids, L0 through L3, for a one-dimensional application of the multigrid technique to a differential equation by the procedure illustrated in Fig. 1.10.16. ❚


5. Relax the solution f_{j−1}^1(i) on the coarse grid to obtain a new approximation to the function on the coarse grid, f_{j−1}^2(i). This is done using the same procedure for relaxation described in step 2; however, g(i) is replaced by g_{j−1}^1(i).

The procedure of going to coarser grids in steps 3 through 5 is repeated for all grids L_{j−2}, L_{j−3}, . . ., until the coarsest grid, L0. The values of the function g(i) are progressively corrected by the finer-scale errors. Step 5 on the coarsest grid is replaced by exact solution using matrix diagonalization. The subsequent steps are designed to bring all of the iterative refinements to the finest-scale solution.

6. Interpolate the coarse-grid solution L0 to the finer grid L1.

7. Add the correction that was previously saved when going from the fine to the coarse grid.

Steps 6–7 are then repeated to take us to progressively finer-scale grids all the way back to L_j.

This procedure is called a V-cycle, since it appears as a V in a schematic that shows the progressive movement between levels. A V-cycle starts from a relaxed solution on grid L_{j−1} and results in a relaxed solution on the grid L_j. A full multigrid procedure involves starting with an exact solution at the coarsest scale L0 and then performing V-cycles for progressively finer grids. Such a multigrid procedure is graphically illustrated in Fig. 1.10.16.
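A minimal sketch of a V-cycle in one dimension follows, using the same model problem as above. Weighted Jacobi sweeps stand in for the relaxation of steps 2 and 5 (playing the role of the approximate inverse of A in Eq. (1.10.63)), and the restriction and interpolation are the simplest standard choices; all names and parameter values are illustrative rather than prescribed by the text:

```python
import numpy as np

def relax(f, g, d, sweeps=3, w=2.0 / 3.0):
    """Weighted-Jacobi relaxation of (f[i+1] + f[i-1] - 2f[i])/d^2 = g[i],
    the relaxation of steps 2 and 5 (Eq. (1.10.63) with an approximate
    inverse of A in place of the scalar c)."""
    for _ in range(sweeps):
        f[1:-1] = (1 - w) * f[1:-1] + \
                  w * 0.5 * (f[2:] + f[:-2] - d * d * g[1:-1])
    return f

def residual(f, g, d):
    """r(i) of Eq. (1.10.62), with the sign chosen so that A e = r."""
    r = np.zeros_like(f)
    r[1:-1] = g[1:-1] - (f[2:] + f[:-2] - 2.0 * f[1:-1]) / (d * d)
    return r

def restrict(v):
    """Fine-to-coarse transfer (step 3): full weighting of neighbors."""
    c = v[::2].copy()
    c[1:-1] = 0.25 * v[1:-2:2] + 0.5 * v[2:-1:2] + 0.25 * v[3::2]
    return c

def interpolate(v):
    """Coarse-to-fine transfer (step 6): linear interpolation."""
    f = np.zeros(2 * (len(v) - 1) + 1)
    f[::2] = v
    f[1::2] = 0.5 * (v[:-1] + v[1:])
    return f

def v_cycle(f, g, d):
    """One V-cycle for d^2 f/dx^2 = g with fixed boundary values."""
    if len(f) == 3:                       # coarsest grid: solve exactly
        f[1] = 0.5 * (f[0] + f[2] - d * d * g[1])
        return f
    f = relax(f, g, d)                    # pre-relaxation (step 2)
    r = residual(f, g, d)                 # fine-grid error equation
    e = np.zeros(len(r) // 2 + 1)         # correction, zero on boundaries
    e = v_cycle(e, restrict(r), 2.0 * d)  # steps 3-5 on the coarse grid
    f += interpolate(e)                   # steps 6-7: bring correction up
    return relax(f, g, d)                 # post-relaxation

# Demo: f'' = sin(pi x) on [0,1] with f(0) = f(1) = 0
n = 2**6 + 1
x = np.linspace(0.0, 1.0, n)
d = x[1] - x[0]
g = np.sin(np.pi * x)
f = np.zeros(n)
for _ in range(8):
    f = v_cycle(f, g, d)
print(np.max(np.abs(f + np.sin(np.pi * x) / np.pi**2)))  # ~ discretization error
```

A few V-cycles reduce the error to the level set by the grid spacing; a full multigrid procedure would instead nest these V-cycles from the coarsest grid upward, as in Fig. 1.10.16.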


Figure 1.10.16 The multigrid algorithm used to obtain the solution to a differential equation on the finest grid is described schematically by this sequence of operations. The operation sequence is to be read from left to right. The different grids that are being used are indicated by successive horizontal lines, with the coarsest grid on the bottom and the finest grid on the top. The sequence of operations starts by solving a differential equation on the coarsest grid by exact matrix diagonalization (shaded circle). Then iterative refinement of the equations is performed on finer grids. When the finer-grid solutions are calculated, the coarse-grid equations are corrected so that the iterative refinement of the fine-scale solution can be performed on the coarse grids. This involves a V-cycle as indicated in the figure by the boxes. The horizontal curved arrows indicate the retention of the difference between coarse- and fine-scale solutions so that subsequent refinements can be performed. ❚


There are several advantages of the multigrid methodology for the solution of differential equations over more traditional integration methods that use a single-grid representation. With careful implementation, the cost of the finer-scale grids grows slowly with the number of grid points, scaling as N ln(N). The solution of multiple problems of similar type can be even more efficient, since the corrections of the coarse-scale equations can often be carried over to similar problems, accelerating the iterative refinement. This is in the spirit of developing universal coarse-scale representations as discussed earlier. Finally, it is natural in this method to obtain estimates of the remaining error due to limited grid density, which is important to achieving a controlled error in the solution.

1.10.7 Levels of description, emergence of simplicity and complexity

In our explorations of the world we have often discovered that the natural world may be described in terms of underlying simple objects, concepts, and laws of behavior (mechanics) and interactions. When we look still closer we see that these simple objects are composite objects whose internal structure may be complex and have a wealth of possible behavior. Somehow, the wealth of behavior is not relevant at the larger scale. Similarly, when we look at longer length scales than our senses are normally attuned to, we discover that the behavior at these length scales is not affected by objects and events that appear important to us.

Examples are found from the behavior of galaxies to elementary particles: galaxies are composed of suns and interstellar gases, suns are formed of complex plasmas and are orbited by planets, planets are formed from a diversity of materials and even life, materials and living organisms are formed of atoms, atoms are composed of nuclei and electrons, nuclei are composed of protons and neutrons (nucleons), and nucleons appear to be composed of quarks.

Each of these represents what we may call a level of description of the world. A level is an internally consistent picture of the behavior of interacting elements that are simple. When taken together, many such elements may or may not have a simple behavior, but the rules that give rise to their collective behavior are simple. We note that the interplay between levels is not always just a self-contained description of one level by the level immediately below. At times we have to look at more than one level in order to describe the behavior we are interested in.

The existence of these levels of description has led science to develop the notion of fundamental law and unified theories of matter and nature. Such theories are the self-consistent descriptions of the simple laws governing the behavior and interplay of the entities on a particular level. The laws at a particular level then give rise to the larger-scale behavior.

The existence of simplicity in the description of underlying fundamental laws is not the only way that simplicity arises in science. The existence of multiple levels implies that simplicity can also be an emergent property. This means that the collective behavior of many elementary parts can behave simply on a much larger scale.


The study of complex systems focuses on understanding the relationship between simplicity and complexity. This requires both an understanding of the emergence of complex behavior from simple elements and laws, as well as the emergence of simplicity from simple or complex elements that allows a simple larger-scale description to exist.

Much of our discussion in this section was based upon the understanding that the macroscopic behavior of physical systems can be described or determined by only a few relevant parameters. These parameters arise from the underlying microscopic description. However, many of the aspects of the microscopic description are irrelevant; different microscopic models can be used to describe the same macroscopic phenomenon. The approach of scaling and renormalization does not assume that all the details of the microscopic description become irrelevant; rather, it tries to determine self-consistently which of the microscopic parameters are relevant to the macroscopic behavior, in order to enable us to simplify our analysis and come to a better understanding.

Whenever we are describing a simple macroscopic behavior, it is natural that the number of microscopic parameters relevant to model this behavior must be small. This follows directly from the simplicity of the macroscopic behavior. On the other hand, if we describe a complex macroscopic behavior, the number of microscopic parameters that are relevant must be large.

Nevertheless, we know that the renormalization group approach has some validity even for complex systems. At long length scales, all of the details that occur on the smallest length scale are not relevant. The vibrations of an individual atom are not generally relevant to the behavior of a complex biological organism. Indeed, there is a pattern of levels of description in the structure of complex systems. For biological organisms, composed out of atoms, there are additional levels of description that are intermediate between atoms and the organism: molecules, cells, tissues, organs and systems. The existence of these levels implies that many of the details of the atomic behavior are not relevant at the macroscopic level. This should also be understood from the perspective of the multigrid approach. In this picture, when we are describing the behavior of a complex system, we have the possibility of describing it at a very coarse level or at a finer and yet finer level. The number of levels that are necessary depends on the level of precision or level of detail we wish to achieve in our description. It is not always necessary to describe the behavior in terms of the finest scale. It is essential, however, to identify properly a model that can capture the essential underlying parameters in order to discuss the behavior of any system.

Like biological organisms, man-made constructs are also built from levels of structure. This method of organization is used to simplify the design and enable us to understand and work with our own creations. For example, we can consider the construction of a factory from machines and computers, machines constructed from individual moving parts, computers constructed from various components including computer chips, chips constructed from semiconductor devices, semiconductor devices composed out of regions of semiconductor and metal. Both biology and


engineering face problems of design for function or purpose. They both make use of interacting building blocks to engineer desired behavior, and therefore construct the complex out of the simple. The existence of these building blocks is related to the existence of levels of description for both natural and artificial systems.

Our discussion thus brings us to recognize the importance of studying the properties of substructure and its relationship to function in complex systems. This relationship will be considered in Chapter 2 in the context of our study of neural networks.
