167
1 INTRODUCTION The aim of computational structural biology is to understand (and predict) the structure and function of biological macromolecules, such as proteins and nucleic acids based on their microscopic (atomic) interactions. These are thermodynamic systems, which are also affected by environmental factors such as temperature, pressure, and solvent conditions. Classical thermodynamics is a general

INTRODUCTION The aim of computational structural biology is to

  • Upload
    metea

  • View
    14

  • Download
    0

Embed Size (px)

DESCRIPTION

INTRODUCTION The aim of computational structural biology is to understand (and predict ) the structure and function of biological macromolecules, such as proteins and nucleic acids based on their microscopic (atomic) interactions. These are thermodynamic systems, which are also - PowerPoint PPT Presentation

Citation preview

Page 1: INTRODUCTION The aim of computational structural biology is to

1

INTRODUCTION 

The aim of computational structural biology is to understand (and predict) the structure and function of biological macromolecules, such as proteins and nucleic acids based on their microscopic (atomic) interactions. These are thermodynamic systems, which are also affected by environmental factors such as temperature,

pressure, and solvent conditions.   Classical thermodynamics is a general methodology that enables one to derive relations among macroscopic properties such as volume (V), pressure (P), temperature (T), energy (E) and entropy (S) without the need to consider the atomic properties of the system, e.g., the equation of state, PV=NRT.

Page 2: INTRODUCTION The aim of computational structural biology is to

2

In statistical mechanics, on the other hand, the macroscopic thermodynamic behavior is obtained from the specific microscopic description of a system (atomic interactions, atomic masses, etc.). Therefore, statistical mechanics also provides information that is beyond the reach of classical thermodynamics - what is the native structure of a protein ?

It is impossible to know the exact dynamic behavior of a large system (i.e., the atomic coordinates at time t). Appearance of a 3D configuration is only known with a certain probability. Thus, statistical mechanics is a probabilistic theory, and computational methods, such as Monte Carlo and molecular dynamics are based on these probabilistic properties. Part of the course will be devoted to basic probability theory.  

Page 3: INTRODUCTION The aim of computational structural biology is to

3

This is a short course. Therefore, the theory of statistical mechanics will not be derived rigorously. The emphasis will be on solving problems in statistical mechanics within the framework of the canonical ensemble, treating polymer systems and proteins. The probabilistic nature of the theory will be emphasized as reflected in computer simulation techniques.

Page 4: INTRODUCTION The aim of computational structural biology is to

4

Mechanical Systems are Deterministic

Refreshing some basic physics

Examples of forces F (F=|F|):

1) Stretching a spring by a distance x: F=-kx, Hook’s Law k- spring constant.

2) Gravitation force: F= kMm/r2 - m and M masses with distance r; k - constant. On earth (R,M large), g=kM/R2

F=mg

3) Coulomb law: F=kq1q2/r2 q1,q2 charges.

Page 5: INTRODUCTION The aim of computational structural biology is to

5

Newton’s second law: F=ma - a acceleration

Mechanical work W: if a constant force is applied along distance d, W=Fd (F=|F|). More general, W= F.

.dx.

Potential energy: If mass m is raised to height, h negative work is done, W = –mgh and the mass gains potential energy,Ep= -W = +mgh - the ability to do mechanical work: when m falls dawn, Ep is converted into kinetic energy, Ek = mv2/2, where v2/2=gh (at floor).

A spring stretched by d: Ep= -W = k xdx = kd2/2

In a closed system the total energy, Et = Ep+ Ek is constant but Ep/Ek can change; e.g., oscillation of a mass hung on a spring and distorted from its equilibrium position.

Page 6: INTRODUCTION The aim of computational structural biology is to

6

The dynamics of a mechanical macroscopic system in principle is deterministic in the sense that if the forces are known, and the positions and velocities of the masses at time t=0 are known as well, their values at time t can in principle be determined by solving Newton’s equations.

Simple examples: harmonic oscillator (a spring), a trajectory of a projectile, movement of spaceship, etc. In some cases the solution is difficult and requires strong computers.

Page 7: INTRODUCTION The aim of computational structural biology is to

7

Stability - a system of interacting masses (by some forces) tends to arrange itself in the lowest potential energy structure (which might be degenerate) also called the ground state. The system will stay in the ground state if the kinetic energy is very small - this situation defines maximum order.

The larger the kinetic energy the larger is the disorder - in the sense that at each moment a different arrangement of the masses will occur (no ground state any more). Still, in principle, the trajectories of the masses can be calculated. Two argon atoms at rest positioned at the lowest energy distance interacting through Lennard-Jones potential. Microscopic system.

612

4rr

r)(

Page 8: INTRODUCTION The aim of computational structural biology is to

8

Thermodynamic systems and Statistical MechanicsA typical system is described below; examined from a microscopic point of view – non-rigorous treatment

TR

A system C of N molecules in a constant volume V is in thermal contact with a large reservoir (R) (also called heat bath) with a well defined temperature TR. At equilibrium (after a long time) energy is still exchanged between R and C but the average kinetic (and potential) energy of a molecule of C is constant, leading to Tc=TR.

CRTC

Page 9: INTRODUCTION The aim of computational structural biology is to

9

However, in contrast to a macroscopic mechanical system, there is no way to know the trajectories of the particles thatare changed constantly due to the energy exchange between C&R and quantum mechanics limitations.

Relating kinetic energy to temperature, at low T, Ek is low, the effect of the molecular forces significant - the system arrange itself in a low potential energy state – relatively high order. At high T, Ek is high and dominant, Ep is high – high disorder that includes the uncertainty related to trajectories.

Therefore, a thermodynamic system at equilibrium cannot be characterized by the positions & velocities of its 1023 particles but only by the average values of several macroscopic parameters such as P, T, E (internal energy) and entropy, S.

Page 10: INTRODUCTION The aim of computational structural biology is to

10

For example, in measuring T the thermometer feels the average effect of many molecular configurations of the tested system and for long measurement all the microscopic states are realized and affect the result of T.

Hence to obtain the average values of macroscopic parameters from microscopic considerations a probability density P(xN,vN) should be assigned to each system state (xN,vN) where (xN,vN) = (x1,y1,z1,x2,y2,z2, ….xN,yN,zN,vx1,vy1vz1…. vxN,vyNvzN) thus assuming that all states contribute.

Page 11: INTRODUCTION The aim of computational structural biology is to

11

Then a macroscopic parameter M is a statistical average,

<M> = P (xN,vN) M(xN,vN) d(xNvN).

The entropy for a discrete and continuous system is (kB is the Boltzmann constant),

S = <S> = -kB Pi ln Pi and

S= -kB P(xNvN) lnP(xN,vN) d(xNvN)+

+ ln{const.[dimension (xNvN)]}

Notice, P is a probability density with dimension 1/ (xNvN) . The constant is added to make S independent of (xNvN).

Page 12: INTRODUCTION The aim of computational structural biology is to

12

The problem – how to determine P. In thermodynamics an N,V,T system is described by the Helmholtz free energy A,

A(T,V,N)=E –TS,

which from the second law of thermodynamics should be minimum for a given set of constraints.

We shall determine P by minimizing the statistical free energy with respect to P.

Page 13: INTRODUCTION The aim of computational structural biology is to

13

A can be expressed as the average

A = <A(P)>= P(X)[E(X)+kBTlnP(X)]dX +const.

X=(xN,vN). We derive A with respect to P andequate to zero; the const. is omitted.

A’= [E(X)+kBTlnP(X) + kBT P(X) /P(X)]dX=0

E(X)+kBT[lnP(X) + 1]=0 lnP(X) = - [E(X) + kBT]/kBT= = - [E(X)/kBT] +1 P(X) =const.exp[-E(X)/kBT]

The normalization is Q= exp[-E(X)/kBT)]

Page 14: INTRODUCTION The aim of computational structural biology is to

14

i.e., PB(X)=exp[-E(X)/kBT]/Q

PB – the Boltzmann probability (density).

Q – the canonical partition function.

The intgrand defining A is E(X)+kBTlnPB(X)

Substituting PB and taking the ln gives,

E(X)+kBT[- E(X)/kBT –lnQ]= -kBTlnQ

-kBTlnQ is constant for any X and can be taken out of the integral of A. Thus, A=E-TS= - kBTlnQ

Page 15: INTRODUCTION The aim of computational structural biology is to

15

The relation A= - kBTlnQ is very important. It enables calculating A by Q that is based on the details of the system.

In classical thermodynamics all the quantities are obtained as derivatives of A. Hence, with statistical mechanics all these quantities can be obtained by taking derivatives of lnQ.

Also, the probabilistic character of statistical mechanics enables calculating averages even without calculating Q. These averages can be even geometrical properties that are beyond the reach of thermodynamics, such as the end-to-end distance of a polymer, the radius of gyration of a protein and many other geometrical or dynamic properties. This is in particular useful in computer simulation.

Page 16: INTRODUCTION The aim of computational structural biology is to

16

(1)   Probability and Statistics, M.R. Spiegel, Schaum’s Outline Series, McGRAW-Hill ISBN 0-07-060220-4.

(2)  An Introduction to Statistical Thermodynamics, T.L. Hill, Dover, ISBN 0-486-652- 42-4. (Also Eddison-Wesley).

(3) Statistical Mechanics, R. Kubo. North Holland (ISBN 0-7204-0090-2) and Elsevier (0-444-10637-5).

(4)   Statistical Mechanics of Chain Molecules. P. J. Flory. Hanser (ISBN 3-446- 15205-9) and Oxford (ISBN 0-19-520756-4).

(5)   Phase Transitions and Critical Phenomena. H.E Stanley (Oxford).

6) Introduction to Modern Statistical Mechanics. David Chandler (Oxford). ISBN 0-19-504277-8.

Page 17: INTRODUCTION The aim of computational structural biology is to

17

TR

N particles with velocities vN and coordinates xN are moving in a container in contact with a reservoir of temperature T.We have seen last time that the Helmholtz free energy, A is

A=E-TS= - kBTlnQ where

Q= exp[-E(xN,vN)/kBT) d(xNvN)]

Lecture 2: Calculation of the partition function Q

Systems at equilibrium

Page 18: INTRODUCTION The aim of computational structural biology is to

18

E(xN,vN)= Ek(vN) + Ep(xN)

E(xN,vN) is the Hamiltonian of the system.If the forces do not depend on the velocities (most cases) Ek is independent of Ep and the integrations can be separated. Also, the integrations over the velocities of different particles are independent. Moreover, the integrations over the components vx ,vy ,and vz are independent. Therefore, we treat first the integration over vx (denoted v) of a single particle.

To recall: the linear momentum vector p=mv; therefore, for one component: Ek= mv2/2 = p2/2m

Page 19: INTRODUCTION The aim of computational structural biology is to

19

A useful integral (from table):

adxax

0 2

12 )exp(

Therefore, our integral is

TBmkdp

TBmkp

22

2)exp(

because

TBmka

21

Page 20: INTRODUCTION The aim of computational structural biology is to

20

The integral over the 3N components of the momentum (velocities) is the following product:

NTBmk 32 )( Q is (h is the Planck constant – see Hill p.74; =VN),

NdTBk

NENhN

NTBmkQ x)

)(xexp(

!

)(3

32

The problem is to calculate the configurational integral

Page 21: INTRODUCTION The aim of computational structural biology is to

21

The origin of the division by N factorial, N!=12 3… N is that each configuration of the N particles in positions x1 , x2, x3,….,xN can be obtained N! times by exchanging the particles in these positions (introduced by Gibbs). For example, for 3 particles there are 3!=6 possible permutations.

sites

Particles 1,2,3 1 2 3 3 2 1 3 1 2 1 3 2 2 1 3 2 3 1

....))(x

exp( 222111 dzdydxdzdydxTBk

NE

Page 22: INTRODUCTION The aim of computational structural biology is to

22

Stirling approximate formula: ln N! N ln (N/e)

The Helmholtz free energy A(N,V,T) is:

NdTBk

NETBk

Ne

h

TBmkTNBk x

)(xexpln}

/ln{

23

22

The velocities (momenta) part is completely solved; it contains m and h - beyond classical thermodynamics!The problem of statistical mechanics is thus to solve the integral. For an ideal gas, E = 0 (no interactions) hence =VN

trivial!!}

/ln{),,(

NeV

h

TBmkTNBkTVNA

23

22

Page 23: INTRODUCTION The aim of computational structural biology is to

23

Thermodynamic derivatives of A of an ideal gas (properties ~N or V are called extensive)

TBNkPVVTBNkNTV

AP

/

,

TBNk

NV

TTA

TE232

,

)(

Pressure:Intensive variable ~N/V

Internal energy:Extensive variable~N

E is the average kinetic energy, proportional to T, independent of V. Each degree of freedom contributes (1/2)kBT. If the forces (interactions) do not depend on the velocities, T is determined by the kinetic energy only (see first lecture).

Page 24: INTRODUCTION The aim of computational structural biology is to

24

Specific heat CV is independent of T and V.

BNkNVT

EVC

23

,

NVe

h

TBmkBNk

TAE

NVTA

S2523

22 //

ln,

The entropy is:

S increases with increasing T, V and N- extensive variable. S is not defined at T=0 – should be 0 according the third low of thermodynamics; the ideal gas pictureholds only for high T. Both S and E increase with T.

Page 25: INTRODUCTION The aim of computational structural biology is to

25

In the case of a real gas E(xN)0 and the problem is to calculate the configurational partition function denoted Z, where the momentum part is ignored,

Z= exp[- E(xN)/kBT]dxN

Notice: While Q is dimensionless, Z has the dimension of xN.Also, Z= (E)exp[- E/kBT] dE

(E) – the density of states around E; (E)dE – the volume in configurational space with energy between E and E+ dE. For a discrete system (n(Ei) is the degeneracy of E)

Z= exp [- Ei/kBT] = n(Ei) exp [- Ei/kBT]

Page 26: INTRODUCTION The aim of computational structural biology is to

26

The contribution of Z to the thermodynamic functions is obtained from derivatives of the configurational Helmholtz free energy, F F=-kBTln ZCalculating Z for realistic models of fluids, polymers, proteins, etc. by analytical techniques is unfeasible. Powerful numerical methods such as Monte Carlo and molecular dynamics are very successful.

A simple example: N classical harmonic oscillators are at equilibrium with a heat bath of temperature T - a good model for a crystal at high temperature (the Einstein model), where each atom feels the forces exerted by its neighbor atoms and can approximately be treated as an independent oscillator that does not interact with the neighbor ones.

Page 27: INTRODUCTION The aim of computational structural biology is to

27

Therefore, QN=qN, where q is the partition function of a single oscillator. Moreover, the components (x, y, z),are independent as well; therefore, one can calculate qx

and obtain QN=qx3N.

The energy of a macroscopic oscillator (e.g., a mass hung on a spring) is determined by its amplitude (the stretching distance). The amplitude of a microscopic oscillator is caused by the energy provided by the heat bath. This energy changes all the time and the amplitude changes as well but has an averagevalue that increases as T increases. Unlike a macroscopicmechanical oscillator, the position of the mass is unknown asa function of t. We only know PB(x).

Page 28: INTRODUCTION The aim of computational structural biology is to

28

The kinetic and potential energy (Hamiltonian) of an oscillator are p2/2m+fx2/2

f is the force constant and q=qkqp where qk was calculated

before; is the frequency of the oscillator.

h

TBmkq

2k f

TBkq

2p

hTBk

hTBk

fm

q 2mf

21

Page 29: INTRODUCTION The aim of computational structural biology is to

29

TBkTTA

TE

2 CV = kB

The average energy of one component (e.g., x direction) ofan oscillator is twice as that of an ideal gas – effect of interaction. For N 3D oscillators, E=3NkBT – extensive (~N) The entropy is (also extensive),

S = E/T- A/T=3NkB(1+ln [kBT/h] )E and S increase with T.In mechanics the total energy of an oscillator is constant, fd2/2 where d is the amplitude of the motion and at time t the position of the mass is known exactly.

Page 30: INTRODUCTION The aim of computational structural biology is to

30

In statistical mechanics a classical oscillator changes its positions due to the random energy delivered by the heat bath. The amplitude is not constant, but the average energy is proportional to T.

The positions of the mass are only known with their Boltzmann probability. When T increases the energy increases meaning that the average amplitude increases and the position of the mass is less defined; therefore, the entropy is enhanced.

Notice: A classical oscillator is a valid system only at high T. At low T one has to use the quantum mechanical oscillator.

Page 31: INTRODUCTION The aim of computational structural biology is to

31

Lecture 3

Thus far we have obtained the macroscopic thermodynamic quantities from a microscopic picture by calculating thepartition function Q and taking (thermodynamic)derivatives of the free energy –kBTln Q. We have discussed two simple examples, ideal gas and classical oscillators.

We have not yet discussed the probabilistic significance of statistical mechanics. To do that, we shall first devote twolectures to basic probability theory.

Page 32: INTRODUCTION The aim of computational structural biology is to

32

Experimental probability Rolling a die n times.  What is the chance to get an odd number? n 10 50 100 400 1000 10,000 …. m 7 29 46 207 504 5,036 .…  Relative frequency f(n)=m/n 0.7 0.58 0.46 0.517 0.5040 0.5036 ….   f(n) → P = 0.5; P = experimental probability 

 

Page 33: INTRODUCTION The aim of computational structural biology is to

33

In many problems there is an interest in P and other properties. To predict them, it is useful to define for each problem an idealized model - a probability space.

Sample space

Elementary event:

Tossing a coin – A happened, or B happened.

Rolling a die - 1, 2, 3, 4, 5, or 6 happened.

 Event: Any combination of elementary events.

An even number happened (2,4,6); a number larger than 3 happened (4,5,6).

 

Page 34: INTRODUCTION The aim of computational structural biology is to

34

The empty event: impossible event –

(2 < a number < 3).

 The certain event – Ω (Coin: A or B happened).

 Complementary event - Ā=Ω-A (1,2,3,4,5,6) -(2,4,6) = (1,3,5).

 Union – (1,2,3) (2,3,5) = (1,2,3,5).

 Intersection – (1,2,4) (2,5,6) = (2)

 

 

Page 35: INTRODUCTION The aim of computational structural biology is to

35

  

A   B AB = AB = intersection red +green AB = A and B AB = whole    

Page 36: INTRODUCTION The aim of computational structural biology is to

36

Elementary probability space      The sample space consists of a finite number n of points ( (elementary events) B.      Every partial set is an event.      A probability P(A) is defined for each event A.      The probability of an elementary event B is P(B)=1/n.      P(A)=m/n; m- # of points in event A.

Page 37: INTRODUCTION The aim of computational structural biology is to

37

Properties of P      0 P(A) 1     P(AB) P(A) + P(B)     i P(Ai) = 1 (Ai, elementary events)

 Examples: a symmetric coin; an exact die. However, in the experimental world a die is not exact and its rolling is not random; thus, the probabilities of the elementary events are not equal.On the other hand, the probability space constitutes an ideal model with equal P’s. Comment: In general, elementary events can have different probabilities (e.g., a non-symmetric coin). 

Page 38: INTRODUCTION The aim of computational structural biology is to

38

Example:

A box contains 20 marbles, 9 white, 11 black. A marble is drawn at random.

What is the probability that it is white?

Elementary event (EE): selection of one of the 20 marbles.

Probability of EE: 1/20The event A – a white marble was chosen contains 9 EE,

P=9/20 This consideration involves the ideal probability space; the real world significance: P is a result of many experiments, P=f(n), n.

Page 39: INTRODUCTION The aim of computational structural biology is to

39

More complicated examples:

What is the number of ways to arrange r different balls in n cells?

Every cell can contain any number of balls.Each ball can be put in n cells

##of# ways= nn n….n=nr

nr= the number of words of length r (with repetitions)based on n letters. A,B,C AA, BB, AB, BA, CC, CA, AC, BC, CB = 32 = 9

Page 40: INTRODUCTION The aim of computational structural biology is to

40

Permutations:

# of samples of r objects out of n objects without repetitions (the order is considered) (0!=1):

(n)r= n(n-1)…(n-r+1) = 12 ... (n-r)(n-r+1)…n = n! 12 …(n-r) (n-r)!

(3)2 (1,2), (1,3), (2,1), (2,3), (3,1), (3,2) (1,3) (3,1)

# of r (2) letter words from n (3) letters:AB, BA, CA, AC, BC, CB

= 6

Page 41: INTRODUCTION The aim of computational structural biology is to

41

Problem:

Find the probability that r people (r 365) selected at random will have different birthdays?

Sample space: all arrangements of r objects (people) in 365 cells (days) – their number = 365r p(EE) = 1/365r

Event A: not two birthdays fall on the same day-

# of points in A = 365364 ……. (365-r+1)=(n)r

P(A) = (n)r = 365! 365r (365-r)!365r

Page 42: INTRODUCTION The aim of computational structural biology is to

42

Combinations # of ways to select r out of n objects if the order is not considered.

# of combinations = # of permutations / r!

!)!(!

!)(

rrnn

rrn

r

n

n=3; r=2Permutations/2(1,2) (2,1) /2 (1,3) (3,1) /2 (2,3) (3,2) /2

Page 43: INTRODUCTION The aim of computational structural biology is to

43

Problem: In how many ways can n objects be divided into k groups of r1, r2,…..rk; rk= n without considering the order in each group but considering the order between the groups?

!!....!!

......krrr

nr

rrn

r

rn

r

n

213

21

2

1

1

Problem: How many events are defined in a sample space of n elementary events?

nn

nnnn2

310

.....

r

n Binomial coefficient

nxn

nrxr

nx

nnnx

........)(

101

Page 44: INTRODUCTION The aim of computational structural biology is to

44

Problem: 23 chess players are divided into 3 groups of 8, 8, and 7 players. What is the probability that players A, B, and C are in the same group (event A)?

EE – an arrangement of the players in 3 groups.

# EE = 23!/(8!8!7!)

If A, B, and C in the first group the # of arrangs. 20!/(5!8!7!) etc.

25321

78823

48820

785220

A !!!

!!!!

!!!!

!)(P

Page 45: INTRODUCTION The aim of computational structural biology is to

45

Problem: What is the number of ways to arrange n objects that r1 , r2 , r3 , … rk of them are identical, ri= n?

A permutation of the n objects can be obtained in r1 ! r2 ! … rn !

times the n! permutations should be divided by this factor

6 6 24 5551112222 5115251222

!!...!!

#krrr

n

21ways

Page 46: INTRODUCTION The aim of computational structural biology is to

46

Problem: 52 cards are divided among 4 players. What is the probability that every player will have a king?

EE – a possible division of the cards to 4 groups of 13.

# EE 52!/(13!)4

If every player has a king, only 48 cards remained to be

distributed into 4 groups of 12 # of EE(A) = 48!/(12!)4

P(A)=[48!/(12!)4]/[52!/(13!)4]

Page 47: INTRODUCTION The aim of computational structural biology is to

47

Product Spaces

So far the probability space modeled a single experiment –tossing a coin, rolling a die, etc. In the case of n experiments we define a product space:

Coin; two EE: 0 ,1 P=1/2

1 2 ……. n 1 ………. nEE: (1,0,0,1,…,1); (0,1,1,1,…,0); (…….); # (EE)=2n

If the experiments are independent, P(EE)=(1/2)n

0

1

0

13

0

12

0

11 n

....

Page 48: INTRODUCTION The aim of computational structural biology is to

48

Die: EE are: 1, 2 , 3 , 4 , 5 ,6

1…………..n 1…………...nEE= (1,5,2,4….,3,6); (2,3,2,5,….,1,1); (4,…….)…; #(EE)=6n

In the cases of independent experiments, P(EE) = (1/6)n

6

5

43

2

1

6

5

43

2

13

6

5

43

2

12

6

5

43

2

11 n

....

Page 49: INTRODUCTION The aim of computational structural biology is to

49

Problem:15 dice are rolled. Find the probability to obtain three times the numbers 1,2, and 3 and twice, 4, 5, and 6?

EE: all possible outcomes of 15 experiments, #(EE)= 615

#(A): according to formula on p. 45: 15!/[(3!)3(2!)3]

Also: 1 can be chosen in (15 14 13)/3! ways.

2 in (12 11 10) )/3! ways etc.

3315 236

15A

)!()!(

!)( P

Page 50: INTRODUCTION The aim of computational structural biology is to

50

Dependent and independent events

Event A is independent of B if P(A) is not affected if B occurred.

P(A/B)=P(A)

P(A/B) – conditional probability. For example, die:

Independent:P(even)=1/2; P(even/square)=1/2 [ P({2,4,6}/{1,4})=1/2]

Dependent: P(2)=1/6; P(2/even)=1/3 P(even/odd)=0, while P(even)=1/2 (disjoint)

Page 51: INTRODUCTION The aim of computational structural biology is to

51

Bayes Formula

P(A/B) = P(AB)/P(B); P(B/A) = P(AB)/P(A);

P(A)>0; P(B)>0

)()()/(

)/(B

AABBA

PPP

P

A=(2) B=even (2,4,6) P(A/B)=1/3. Using formula: P(A)=1/6; P(B)=1/2; P(AB)=1/6

312161

BA ///

)/( P Independency: P(AB)= P(A)P(B)

Page 52: INTRODUCTION The aim of computational structural biology is to

52

If an event A must result in a mutually exclusive events, A1,….An , i.e., A=AA1 + AA2 +… AAn then

P(A)=P(A1)P(A/A1) +……+P(An)P(A/An)

Problem: Two cards are drawn successively from a deck. What is the probability that both are red?

EE- product space: (r,r), (r,no), (no,r), (no,no)A first card is red {(r,r); (r,no)}B second card is red {(no,r); (r,r)}

P(AB)=P(r,r)=P(A)P(B/A)=1/2·(25/51)

Page 53: INTRODUCTION The aim of computational structural biology is to

53

Problems:• Calculate the energy E of an oscillator by a free energy

derivative, where q=kBT/h

2) Prove the equation on p.45.

3) A die is rolled four times. What is the probability to obtain 6 exactly one time.

4) A box contains 6 red balls, 4 white balls, and 5 blue balls. 3 balls are drawn successively. Find the probability that they are drawn in the order red, white, and blue if the ball is (a) replaced, (b) not replaced. 

Page 54: INTRODUCTION The aim of computational structural biology is to

54

What is the expectation value of m in the random variable of Poisson:  P(X = m)=mexp(-)/m! (m = 0,1,2,…..). .

Page 55: INTRODUCTION The aim of computational structural biology is to

55

Summary

We have defined experimental probability as a limit of relative frequency, and then defined an elementary probability space, where P is known exactly. This space enables addressing and solving complicated problems without the need to carry out experiments.

We have defined permutations, combinations, product spaces, and conditional probability and described a systematic approach for solving problems.

However, in many cases solving problems analytically in the framework of a probability space is not possible and one has to use experiments or computer experiments, such as Monte Carlo methods.

Page 56: INTRODUCTION The aim of computational structural biology is to

56

Random variables

For a given probability space with a set of elementary events{}=, a random variable is a function X=X() on the real line - < X() < .

Examples:

Coin: p - head; q- tail. One can define X(p)=1; X(q) =0.However, any other definition is acceptable - X(p)=15 X(q) =2,etc., where the choice is dictated by the problem.

Tossing a coin n times, the sample space is vectors (1,0,0,1…) with P(1,0,0,1…). One can define X=m, where m is the number of successes (heads).

Page 57: INTRODUCTION The aim of computational structural biology is to

57

Distribution Function (DF) or Cumulative DF

For a random variable X (- < X() < )

Fx(X) = P[X()] x

Fx(X) is a monotonically increasing function

1 2 3 4 5 6

15/64/63/62/61/6

Die: for x <1, Fx(X) = 0

for 6, Fx(X) = 1

Page 58: INTRODUCTION The aim of computational structural biology is to

58

Random variable of Poisson

)exp(!

)( m

mxPm

)exp(!

)(

xm

m

mxF

m=1,2,3,…

So far we have discussed discrete random variables.

Continuous random variable – if F(x) continuous and its derivative f(x) = F’(x) is also continuous. f(x) is called the probability density function; f(x)dx is the probability between x and x+dx.

1

)()(;)()( FdttfdttfxF

x

Page 59: INTRODUCTION The aim of computational structural biology is to

59

The normal random variable

The uniform random variable

)exp()(

)exp()()(

221

221

2

2

xxf

dtt

XFxXPx

x

otherwise0

const bxaxf

.)(

f(x) is constant between a and b

x

x

b

a

bx

bxa

x

abax

dttfXF

abxfabcdxcxf

0

1

0

11

)()(

)()()()(

a b

1/(b-a)f(x)

Page 60: INTRODUCTION The aim of computational structural biology is to

60

Expectation Value

X is a discrete random variable with n values, x1, x2 ,… xn.

and P(x1), P(x2), … P(xn), the expectation value E(X) is:

n

iii xxPXE

1)()(

Other names are: mean, statistical average.Coin: X 1 ; 0 with P and 1-P. E(X) = P1 + (1-P)0=P

Die: X 1, 2, 3, 4, 5, 6 with P =1/6 for all.

E(X) = (1/6)(1+2+3+4+5+6)=21/6=3.5

Page 61: INTRODUCTION The aim of computational structural biology is to

61

Continuous random variable with f(x)

dxxxfXE )()(

Provided that the integral converges.

Uniform random variable

2221 222 ab

abab

a

b

abx

dxab

xXEb

a

)()(

)(

Properties of E(X): If X and Y are defined on the same space

E(X+Y) =E(X)+E(Y) ; E(CX)=CE(X) C= const.

[X()+Y()]P() = X()P() + Y()]P()

Page 62: INTRODUCTION The aim of computational structural biology is to

62

Arithmetic averageX1 , X2 , …… Xn are n random variables defined on the same sample space with the same expectation value =E(Xi), then the arithmetic average:

nXXX

X n.... 21

is also a random variable with )(XE

nn

XEXEXEn

XXXEnn

XXXEXE

n

nn

)]()...()([

)....()....

()(

21

2121

1

1 ,

Notice: is defined over the product space (X1, X2,…… Xn) with P(X1, X2,…… Xn). is important in simulations.

XX

Page 63: INTRODUCTION The aim of computational structural biology is to

63

Variance

dxxfxXEXXV )()(])[()()(

222

222222

222

2

2

)()()()()(

][])[()(

XEXEXEXEXE

XXEXEXV

Standard deviation: )()( XVX

Page 64: INTRODUCTION The aim of computational structural biology is to

64

diverges1

converges1

322

333

nn

nn

nc

n

cnXE

nc

n

cnE

n

cnxP

)(

;;;)(

Example: Random variable with an expectation value but without variance.

Independence: random variables X and Y defined on the same space are called independent if

P(X,Y)=P(X)P(Y)

Tossing of a coin twice: (1,1), (0,1), (0,0), (1,0) – the probability of the second toss is independent of the first.In a product space: P(X1, X2 …., Xn)=P(X1)P(X2)…….P(Xn) P(X1, X2 …., Xn) _ Joint probability

Page 65: INTRODUCTION The aim of computational structural biology is to

65

Uncorrelated random variables

If X and Y are independent random variables defined on the same sample space they are uncorrelated, i.e.,

E(XY)=E(X)E(Y)

Proof: E(XY) = ij xiyjP(xiyj) = ij xiyj P(xi)P(yj)=

=i xiP(xi)j yj P(yj) = E(X)E(Y)X,Y independent X,Y uncorrelated. The opposite is not always true. X,Y uncorrelated defined on the same sample space then V(X+Y)=V(X) +V(Y)

V(CX) = E(C2X2) - E2(CX) = C2E(X2) - [CE(X)]2 = C2V(X) V is not a linear operator.

Page 66: INTRODUCTION The aim of computational structural biology is to

66

Variance of the arithmetic average

X1, X2, ….. , Xn are uncorrelated random variables with the same and 2

nnV

nVn

XXXVn

nXXX

VXVX

n

n

2

2212

212

11

)...(

)...

()()(

While the expectation value of the arithmetic average is also , the variance decreases with increasing n!

The above result, is extremely important playing a central role in statistics & analysis of simulation data.

nX

)(

Page 67: INTRODUCTION The aim of computational structural biology is to

67

Sampling

So far we have dealt with probability spaces (ideal world), where the probability of an elementary event is known exactly and probabilities of events A could be calculated.

Then, we defined the notion of a random variable (X) which is a function from the objects of a sample space to the real line, where the function can be defined according to the problem of interest. Accumulative distribution function and probability density function (for a continuous random variable) were defined.

This enables, at least in principle, calculating expectation values E(X) and variances V(X).

Page 68: INTRODUCTION The aim of computational structural biology is to

68

It is important to calculate E ( ) and V of the normal distribution (also called the Gaussian distribution), see p. 61.

][exp)(2

2

221

x

xf

It can be shown (see integral on p.19) that f(x) is normalized,

1

221

2

2dx

xdxxf ][exp)(

0

dxxxfXE )()(

and its expectation value is 0, because x is a symmetric odd function

Page 69: INTRODUCTION The aim of computational structural biology is to

69

The variance is therefore:

22

32

2

22

2

21

22

221

)(][exp

)()(

dxx

x

dxxfxXV

Thus, a Gaussian is defined by only two parameters, E(X) and V(X) - in the above case, 0 and 2, respectively. In the general case, f(x) ~ exp-[(x- )2/(2 2)]. defines the width of the distribution.

Here we used the integral:

23

22

2 /][exp

adxaxx

Page 70: INTRODUCTION The aim of computational structural biology is to

70

= V

E==0

f(x)

The integral of f(x) from - x + provides ~ 68% of the total probability = 1; the integral over - 2 x + 2 covers ~95% of the total area.

Unlike the ideal case (i.e., known probabilities) sometimes a distribution is known to be Gaussian but and are unknown. To estimate one can sample an x value from the distribution – the smaller is the larger the chance that x is closer to .

x

Page 71: INTRODUCTION The aim of computational structural biology is to

71

We shall discuss later how to sample from a distribution without knowing its values. One example is a coin with unknown p (1) and 1-p (0); tossing this coin will produce a sample of relative frequencies (1) p, (0) 1-p.

Also, E(X) = p and V(X)=p(1-p) are unknown a-priori. If p 0 or p 1, V(X) 0 and even a single tossing experiment would lead (with high chance) to 0 and 1, the values of E(X), respectively.

To estimate an unknown E(X) in a rigorous and systematic way it is useful to use the structure of probability spaces developed thus far, in particular the product spaces and the properties of the arithmetic average, .

...

n

XXXX n 21

Page 72: INTRODUCTION The aim of computational structural biology is to

72

Thus, if X1, X2, ….. , Xn are random variables with the same and 2 ( ) and if these random variables are also uncorrelated then

)(XE

For example: one can toss a coin [p (1), 1-p (0), p unknown] n times independently. The result of this experiment is one term (vector) denoted (x1, x2, ….. , xn), [e.g., (1,0,0,1….1,0)] out of the 2n vectors in the product space. Estimation of by (x1+ x2+ ….. + xn )/n is improved in the statistical sense as the variance is decreased by increasing n, the number of experiments; for n the estimation becomes exact because .

)(XE

)(XV

0)(XV

2

nXV

)(

Page 73: INTRODUCTION The aim of computational structural biology is to

73

Thus, to estimate (and other properties) one has to move to the experimental world using the theoretical structure of the probability spaces. Notice again that while the value of P [or f(x)] is unknown, one should be able to sample according to P! (see the above example for the coin).

This is the basic theory of sampling that is used in Monte Carlo techniques and molecular dynamics. However, notice that with these methods the random variables in most cases are correlated; therefore, to use the equation , the #

of samples generated, n’ should be larger, sometimes significantly larger than n, the number of uncorrelated samples used in the above equation. This topic will be discussed in more detail later.

2

nXV

)(

Page 74: INTRODUCTION The aim of computational structural biology is to

74

The probabilistic character of statistical mechanics

Thus far, the thermodynamic properties were obtained from the relation between the free energy, A and the partition function Q, A=-kBTlnQ using known thermodynamics derivatives. However, the theory is based on the assumption that each configuration in phase space has a certain probability (or probability density) to occur - the Boltzmann probability,

PB(X)=exp[-E(X)/kBT]/Q

where X xNvN is a 6N vector of coordinates and velocities Therefore, any thermodynamic property such as the energy is an expectation value defined with PB(X).

Page 75: INTRODUCTION The aim of computational structural biology is to

75

For example, the statistical average (denoted by <>) of the energy is: <E>= PB(xN,vN) E(xN,vN) d (xNvN),where E(xN,vN) is a random variable. <E> is equal to E calculated by a thermodynamic derivative of the free energy, A. For an ideal gas (pp.19-20) <E> is obtained by,

TNkVTmk

ddTmkm

E

VTmk

TmkQ

hNTmk

P

VhN

TmkQ

BNNB

B

NNB

B

N

B

NN

NB

23

2

22

2

22

2

3

22

3

2

IG

32

B

3

3

IG

)(

xp]p

[expp

)(

]p

[exp!/]p

[exp

!

)(

Page 76: INTRODUCTION The aim of computational structural biology is to

76

This is the same result obtained from thermodynamics (p. 23). For two degrees of freedom the integral is:

TkTmk

m

TmkTmk

VTmk

dxdxdpdpTmkpp

mpp

BB

BB

B

B

223

222

222

2

22

2

23

22

2121

22

21

22

21

)(

)]()([

)(

][exp)(

/

Page 77: INTRODUCTION The aim of computational structural biology is to

77

The entropy can also be expressed as a statistical average. For a discrete system, S=<S>= -kB Pi ln Pi

If the system populates a single state k, Pk=1 and S=0 there is no uncertainty. This never occurs at a finite temperature. It occurs only for a quantum system at T=0 K.

On the other hand, if all states have the same probability, Pi=1/, where is the total number of states, the uncertainty about the state of the system is maximal (any state can be populated) and the entropy is maximal as well, S= -kBln This occurs at very high temperatures where the kinetic energy is large and the majority of the system’s configurations can be visited with the same random probability.

Page 78: INTRODUCTION The aim of computational structural biology is to

78

It has already been pointed out that the velocities (momenta) part of the partition function is completely solved (pp. 19-26). On the other hand, unlike an ideal gas, in practical cases the potential energy, E(xN)0 and the problem of statistical mechanics is to calculate the configurational partition function denoted Z, where the momentum part is ignored (see p. 20)

Z= exp[- E(xN)/kBT]dxN where Z has the dimension of xN. Also, Z= (E)exp[- E/kBT] dE (E) – the density of states; (E)dE – the volume in configurational space with energy between E and E+ dE. For a discrete system n(Ei) is the degeneracy and Z is

Z= exp [- Ei/kBT] = n(Ei) exp [- Ei/kBT]

Page 79: INTRODUCTION The aim of computational structural biology is to

79

The thermodynamic functions can be obtained from derivatives of the configurational Helmholtz free energy, F F=-kBTln Z.From now on we ignore the velocities part and mainly treat Z. Thus,the configurational space is viewed as a 3N dimensional sample space , where to each “point” xN (random variable) corresponds the Boltzmann probability density,

PB(xN)=exp-[E(xN )]/Z

where PB(xN)dxN is the probability to find the system between xN and xN + dxN. The potential energy E(xN) defined for xN is also a random variable with an expectation value <E>

Page 80: INTRODUCTION The aim of computational structural biology is to

80

<E> = E(xN ) PB(xN)dxN

While <E> is identical to the energy obtained by deriving F/T with respect to T (see p. 23 ), calculation of the latter is significantly more difficult than calculating <E> because of the difficulty to evaluate Z hence F. In simulations calculation of <E> is straightforward.

Again, notice the difference between E(xN ) -the potential energy of the system in configuration xN, and <E> - the average potential energy of all configurations weighed by the Boltzmann probability density.

Page 81: INTRODUCTION The aim of computational structural biology is to

81

The power of the probabilistic approach is that it enables calculating not only macroscopic thermodynamic properties such as the average energy, pressure etc. of the whole system, but also microscopic quantities, such as the average end-to-end distance of a polymer. This approach is extremely useful in computer simulation, where every part of the system can be treated, hence almost any microscopic average can be calculated (distances between the atoms of a protein, its radius of gyration, etc.).

The entropy can also be viewed as an expectation value, where ln PB(xN) is a random variable,

S = <S> = -kB PB(xN)ln PB(xN) dxN

Page 82: INTRODUCTION The aim of computational structural biology is to

82

Likewise, the free energy F can formally be expressed as an average of the random variable, E(xN ) + kBT ln PB(xN),

F = - kBT lnZ = PB(xN)[E(xN ) + kBT ln PB(xN)] dxN

Fluctuations (variances)

The variance (fluctuation) of the energy (see p. 63) is:

2(E) = PB(xN)[E(xN ) - <E>]2 dxN =

= <E(xN )2 > - <E(xN ) >2

Notice that the expectation value is denoted by <> and E is the energy. It can be shown (Hill p. 35) that the specific heat at

constant volume is: Cv = (dE/dT)V = 2(E) /kBT2

Page 83: INTRODUCTION The aim of computational structural biology is to

83

In regular conditions Cv = 2(E) /kBT2 is an extensive variable,

i.e., it increases ~N as the number of particles N increases; therefore, (E) ~ N1/2 and the relative fluctuation of E decreases with increasing N,

NNN

EE 1

~

)(

Thus, in macroscopic systems (N ~ 1023) the fluctuation (i.e., the standard deviation) of E can be ignored because it is ~1011 times smaller than the energy itself these fluctuations are not observed in macroscopic objects. Like Z,<E> can be expressed, <E> = E(E)PB(E)dE where (E) is the density of states and PB(E) is the Boltzmann probability of a configuration with E.

Page 84: INTRODUCTION The aim of computational structural biology is to

84

The fact that (E) is so small means that the contribution to <E > comes from a very narrow range of energies around a typical energy E*(T) that depends on the temperature

E*

PB

Therefore, the partition function can be approximated by taking into account only the contribution related to E*(T).

Z = (E)exp[- E/kBT] dE fT(E*) = (E*)exp[- E*/kBT]

F E*(T)+kBTln (E*) = E*(T)-TS

Page 85: INTRODUCTION The aim of computational structural biology is to

85

The entropy, -kBln (E*) is the logarithm of the degeneracy of the most probable energy. For a discrete system Z = n(Ei) exp [- Ei/kBT] where Ei are the set of energies of the system and n(Ei) their degeneracies. For a macroscopic system the number of different energies is large (~N for a discrete system) while only the maximal term n(E*) exp [- E*/kBT] contributes. This product consists of two exponentials. At very low T the product is maximal for the ground state energy, where most of the contribution comes from exp[- EGS*/kBT] while n(EGS*) ~ 1 (S=0). At very high T the product is maximal for a high energy, where n(E*) is maximal (maximum degeneracy maximum entropy) but the exponential of the energy is small. For intermediate T

Page 86: INTRODUCTION The aim of computational structural biology is to

86

fT(E) e-E*/kB

T

n(E)

E*(T)E

n(E)e-E/kBT

Page 87: INTRODUCTION The aim of computational structural biology is to

87

The fact that the contribution to the integrals comes from an extremely narrow region of energies makes it very difficult toestimate <E>, S and other quantities by numerical integration. This is because the 3N dimensional configurational space is huge and the desired small region that contributes to the integrals is unknown a-priori.

Therefore, dividing the space into small regions (grid) would be impractical and even if done the corresponding integration would contribute zero because the small important region would be missed - clearly a waste of computer time.

The success of Monte Carlo methods lies in their ability to find the contributing region very efficiently leading to precise estimation of various averages, such as <E>.

Page 88: INTRODUCTION The aim of computational structural biology is to

88

x

f(x)

x1 x2 x3 xn

f(x)dx i f(xi)xi xi= xi-xi-1

Numerical integration

Page 89: INTRODUCTION The aim of computational structural biology is to

89

What is the probability to find the system in a certain energy (not xN)?

PB(E)=n(E)exp-[E/kBT]/Z

So, this probability depends not only on the energy but also on the degeneracy n(E). The relative population of two energies is therefore:

21

2

1

2

1

2

1

2

1

EEE

TkEEnEn

TkETkE

EnEn

EP

EPB

B

BB

B

)/exp(

)()(

)/exp()/exp(

)()(

)(

)(

Page 90: INTRODUCTION The aim of computational structural biology is to

90

Problems:

1. Show that the number of ways n objects can be divided into k groups of r1, r2,…..rk; rk= n without considering the order in each group but considering the order between the groups is

n!/(r1!r2!….rk!)

2. Two random variables X and Y are uncorrelated if E(XY)=E(X)E(Y). Show that in this case V(X+Y)=V(X)+V(Y). V is the variance.

3. The configurational partition function of an one dimensional oscillator qp is defined on p.28. Calculate the average potential energy <E>. Use the integral on p. 69.

Page 91: INTRODUCTION The aim of computational structural biology is to

91

Solving problems in statistical mechanics

The first step is to identify the states of the system and the corresponding energies (e.g., the configurations of a fluid and their energies). Then, three options are available:

1) The thermodynamic approach: Calculate the partition function Z the free energy F=-kBTlnZ and obtain the

properties of interest as suitable derivatives of F.

2) Calculate statistical averages of the properties of interest.

3) Calculate the most probable term of Z and the most dominant contributions of the other properties.

Page 92: INTRODUCTION The aim of computational structural biology is to

92

Problem: N independent spins interact with a magnetic field H. the interaction energy (potential) of a spin is H or - H, depending of whether , the magnetic moment is positive or negative. Positive leads to energy - H. Calculate the various thermodynamic functions (E,F,S, etc.) at a given temperature T.

2N stats of the system because each spin is or +1(-) or -1().

Potential energy of spin configuration i: Ei = -N+ H +N- H or Ei = -(N-N-)H + N- H where N+ and N- are the numbers of +1 and –1 spins. The magnetization of i is:

M = N+H - N- H

No kinetic energy is defined for this model.

Page 93: INTRODUCTION The aim of computational structural biology is to

93

Option 1: Thermodynamic approach

We have to calculate Z = i exp-[Ei/kBT] ; i runs over all the 2N different states of the system!

This summation can be calculated by a trick. The spins are independent, i.e., they do not interact with each other changing a spin does not affect the other spins. Therefore, the summation over the states of N spins can be expressed as the product Z=(z1)N where z1 is the partition function of a single spin.z1 = exp(-H/kBT) + exp(+H/kBT)=2cosh[H/kBT]

cosh(x)=[exp(x)+exp(-x)]/2 Z= [2cosh(H/kBT)]N

Page 94: INTRODUCTION The aim of computational structural biology is to

94

F=-kBTlnZ =-kBTNln[2cosh(H/kBT)]

Entropy:

)cosh(

)sinh()tanh(

x

xxexe

xexex

Energy:

)tanh()}cosh(ln{Tk

H

Tk

H

Tk

HNk

T

FS

BBBB

H2

)tanh(Tk

HHN

T

TF

TTSFEB

H

2

T=, S/N=ln 2 ; T=0, S=0

Page 95: INTRODUCTION The aim of computational structural biology is to

95

Magnetization: M=N+- N-

)tanh(Tk

HN

HF

MBT

Specific heat:

)(coshTk

HTk

HNk

TE

C

B

BB

H

2

2

Page 96: INTRODUCTION The aim of computational structural biology is to

96

Option 2: Statistical approach

Again we can treat first a single spin and calculate its average energy. z1 for a single spin is: z1 = exp(-H/kBT) + exp(+H/kBT)= 2cosh[H/kBT]

The Boltzmann probability for spin is: exp[ H/kBT] / z1

The average energy is

<E>1= [-Hexp(+H/kBT) + Hexp(-H/kBT)]/z1

= -H {2sinh[H/kBT]}/ {2cosh[H/kBT]}=

= -H tanh(H/kBT) ; <E>= -HNtanh(H/kBT)

Page 97: INTRODUCTION The aim of computational structural biology is to

97

The entropy s1of a single spin is:

s1= -kB[P+lnP+ +P-lnP-], where P is the Boltzmann probability.

s1= -kB{exp[+H/kBT][H/kBT -ln z1] +

exp[-H/kBT][-H/kBT -ln z1]}/z1=

=-kB{{H/kBT [e+ - e-]/ z1- ln z1[e+ +e-]/ z1}=

= -kB {{H/kBT tanh[H/kBT ] –ln z1}

The same result as on p. 94.

Page 98: INTRODUCTION The aim of computational structural biology is to

98

Option 3: Using the most probable term

E =-N+ H + N- H E’= E/H =-N++ N-

N = N++N-

N- = (N+ E’)/2 ; N+= (N- E’)/2

M = (N+ - N-) E =-MH

The # of spin configurations with E, W(E)=N!/(N+!N-!)The terms of the partition function have the form,

]/exp[!!

!)( TkE

ENENN

Ef BT

22

Page 99: INTRODUCTION The aim of computational structural biology is to

99

For a given N, T, and H we seek to find the maximal term. We take ln fT(E), derive it with respect to E’, and equate the result to 0, using the Stirling formula, lnN! NlnN.

Tk

EENENENENNNEf

BT

2222lnlnln)(ln

022

122

1

TkHENEN

EEf

B

T lnln)(

TkH

ENEN

TkH

ENEN

BB

2

21

expln

The maximum or minimum of a function f(x) with respect to x, is obtained at the value x* where (df/dx)|x* = f’(x*)= 0 and (df’/dx)|x*<0 or >0, respectively.

Page 100: INTRODUCTION The aim of computational structural biology is to

100

]tanh[

)exp()exp(

)exp()exp(

)exp(

)exp(

)exp(

)exp(

Tk

HNE

Tk

H

Tk

HTk

H

Tk

H

Tk

HTk

H

N

Tk

HTk

H

NE

B

BB

BB

B

B

B

B

2

1

21

The most probable energy, E* for given T and H is:

E*=-NH tanh(H/kBT)and M= N tanh(H/kBT)As obtained before.

Page 101: INTRODUCTION The aim of computational structural biology is to

101

degeneracy energy typical T spin configurations

1 (min. S=0) -NH (min.) T0 = 0 … (H)

N -(N-1)H+H very low (T0) … = -NH+2H N!/[(N-2)!2!] =N(N-1)/2 -NH+4H T1>T 0 … . k N!/[(N-k)!k!] -NH +2kH Tk >Tk-1 … . . N/2 N/2

N! / [(N/2)!(N/2)!] H (N/2-N/2) high T . . (max. degeneracy =0

S = ln 2)

N - > N+ not physical, negative temperature! dE decreases, dS increases.

Page 102: INTRODUCTION The aim of computational structural biology is to

102

Page 103: INTRODUCTION The aim of computational structural biology is to

103

Several points:

The entropy can be defined in two ways:1) As a statistical average: S = -kBi PilnPi (Pi – Boltzmann) and

2) as S kB ln n(E*) n(E*) - degeneracy of the most probable energy. For large systems the two definitions are identical.

As a mechanical system the spins “would like” to stay in the ground state (all spins are up; lowest potential energy most stable state), where no uncertainty exists (maximum order) the entropy is 0.

Page 104: INTRODUCTION The aim of computational structural biology is to

104

However, the spins interact with a heat bath at a finite T, where random energy flows in and out the spin system. Thus, spins parallel to H (spin up) might absorb energy and “jump” to their higher energy level (spin down), then some of them will release their energy back to the bath by returning to the lower energy state () and vice versa.

For a given T the average number of excited spins () is constant and this number increases (i.e., the average energy increases) as T is increased.

The statistical mechanics treatment of this model describes this physical picture. As T increases the average energy (= E*(T) - the most probable energy) increases correspondingly.

Page 105: INTRODUCTION The aim of computational structural biology is to

105

As E*(T) increases the number of states n(E*) with energy E*(T) increases as well, i.e., the system can populate more states with the same probability the uncertainty about its location increases S increases.

So, the increased energy and its randomness provided by the heat bath as T increases, is expressed in the spin system by higher E*(T) and enhanced disorder, i.e., larger ln n(E*) larger S(T).

The stability of a thermodynamic system is a “compromise” between two opposing tendencies: to be in the lowest potential energy and to be in a maximal disorder. At T=0 the potential energy wins; it is minimal complete order S=0 (minimal). At T= the disorder wins S and E* are both maximal.

Page 106: INTRODUCTION The aim of computational structural biology is to

106

At finite T the stability becomes a compromise between the tendencies for order and disorder; it is determined by finding the most probable (maximal) term of the partition function [at E*(T)]: n(E*) exp-[E*/kBT] or equivalently the minimal term of the free energy,

E* + kBTln n(E*) = E*-TS*

Notice that while the (macroscopic) energy is known very accurately due to the small fluctuations, the configuration (state) is unknown. We only know that the system can be located with equal probability in any of the n(E*) states.

Page 107: INTRODUCTION The aim of computational structural biology is to

107

Simplest polymer model – “ideal chain”

This model only satisfies the connectivity of the chain’s monomers; the excluded volume interaction is neglected, i.e., two monomers can occupy the same place in space. In spite of this unrealistic feature the model is important as discussed later. We study this model on a d-dimensional lattice; the chain starts from the origin ;no kinetic or potential energy.

end–to-enddistance

a1

a2

a4

Probabilistic approach: Sample space – ensemble of all chain configurations for a given N. No interactions, E=0, exp-E/kBT =1 for all chains they are equally probable.

N=8 bonds (steps) orN+1=9 monomers

Page 108: INTRODUCTION The aim of computational structural biology is to

108

This ensemble can be built step-by-step. For a square lattice: first bond – 2d=4 possible directions, and the same for the rest. the partition function Z is the total # of chain configurations Z= i1=4N= (2d)N ; PB(i)=1/Z= (1/4)N = (1/2d)N

S = -kB i PB(i)ln PB(i) = NkB ln4 = NkB ln2d

There is interest in global geometrical properties of a polymer such as the root mean square end-to-end distance (ETED) R. Denoting by V the ETED vector, V= iai

and R2 = <VV> = ij <ai aj> = i <ai ai>= Na2 a=|a| R=N½a

Page 109: INTRODUCTION The aim of computational structural biology is to

109

This is an important result: even though the chain can intersect itself and go on itself many times, the global dimension of the chain increases as N1/2 as N is increased; this means that the number of open structures is larger than that of the compact ones dominating thereby the statistical averages.

About the calculation: i

For any direction of i :

i 1=1 ; i 2 = 0 ; i 3 = -1 ; i 4 = 0

if i j <aiaj> = 0A similar proof applies to a continuum chain.

1

2

3

4j

Page 110: INTRODUCTION The aim of computational structural biology is to

110

Problem: One dimensional (d=1) ideal chain of n bonds each of length a starts from the origin; the distance between the chain ends is denoted by x (ETED). Find the entropy as a function of x and the relation between the temperature and the force required to hold the chain ends at x.

Comments: 1) The force is defined as =(F/x)T parallel to the definition of the pressure of gas, P=-(F/V)T

2) We have defined the entropy in terms of the degeneracy of the most probable energy. Here it is defined as the degeneracy of the most probable x.

Thus, the entropy is maximal for x=0 and minimal when completely stretched, x=na (without force <x2>=an).

Page 111: INTRODUCTION The aim of computational structural biology is to

111

x

x can be defined by n+ bonds in direction + and n- bonds in

direction _ x= (n+ - n- )a n= n+ + n-

n+= (na+x)/2a=(n+x/a)/2 ; n-=(na-x)/2a = (n-x/a)/2

The number of chain combinations for given n+ and n- (i.e., a given x) is w(x)=n!/ (n+!n-!)

+

We shall use the most probable term method

Page 112: INTRODUCTION The aim of computational structural biology is to

112

Using Stirling’s formula,

S(x)=kBlnw(x)= kB (nln n - n+ln n+ - n-ln n-)

)}ln()()ln()({lnna

x

na

x

na

x

na

xnkS B 11

2

111

2

12

F = -TS = (F/x)T = -T(S/x) =

....))(ln(ln

22

211

1

1

2 an

xnax

nax

nax

nax

aTkB

Using 1/(1-q)= 1+q+q2+q3+ …… for q < 1 q = x/na 1i.e., x is much smaller than the stretched chain.

Page 113: INTRODUCTION The aim of computational structural biology is to

113

Because x/2a 1 it is justified to take only the leading terms in the expansion, 1 and 2x/na.

We also used here the expansion, ln(1+y) y for y 1.

Therefore, for x na we obtain Hook’s law where kBT/na2 is the force constant. This derivation applies to rubber elasticity which is mainly caused by entropic effects.

.........)ln( xna

Tkna

xaTk

nax

aTk BBB

22

22

12

Page 114: INTRODUCTION The aim of computational structural biology is to

114

Solution of set problems 1:

1) Calculate the energy E of an oscillator by a free energy derivative, where q=kBT/h

F=-kBTlnq= =-kBTln(kBT/h)

Tkh

k

Tk

hT

Th

Tk

TkT

h

Tkk

TTT

F

TE BB

B

B

B

BB

2222][ln]ln[

E=3NkBT

For N oscillators:

Page 115: INTRODUCTION The aim of computational structural biology is to

115

2) Prove the equation on p.45.

3) A die is rolled four times. What is the probability to obtain 6 exactly one time.

This experiment is defined in the product space, where elementary events are vectors (i,j,k,l), 0 i,j,k,l 1.

The total number of EE: 64 (P=1/6)The event “6 occurred exactly one time” consists of the

following events:A) i=6, j,k,l remain between 1-5 53 EE.B) j=6 the rest are between 1-5 53 EE, etc.

P = 453/64 = 0.3858

Page 116: INTRODUCTION The aim of computational structural biology is to

116

4) A box contains 6 red balls, 4 white balls, and 5 blue balls. 3 balls are drawn successively. Find the probability that they are drawn in the order red, white, and blue if the ball is (a) replaced, (b) not replaced.

(a) The problem is defined in the product space: (15)×(15)×(15), where the experiments are independent.

P(red)=6/15 ; P(white)=4/15 ; P(blue)=5/15

P(red,white,blue) =(645)/153= 120/3375=0.0356(b) P(red,white,blue)=(6/15)(4/14)(5/13)=0.0440

Page 117: INTRODUCTION The aim of computational structural biology is to

117

About force and entropy

In the mechanical systems the force is obtained from a derivative of the potential energy; in gravitation E=mgh; force = -dE/dh=-mg. For a spring, E=kx2/2; force = -dE/dx =-kx.

If the spring is stretched and released the mass will oscillate. The potential energy is converted into kinetic energy and vice versa.

TS has a dimension of energy. Can a force be obtained from pure entropy? Answer – yes.

To show that we examine the 1d ideal chain studied recently.

Page 118: INTRODUCTION The aim of computational structural biology is to

118

We have shown on p. 108 that the entropy of an1d ideal chain (i.e., ln of total # of chains) is, S = NkB ln2d = NkB ln2.

On p. 113 we obtained, -/T = dS/dx=-const.ln(1+2x/na+..) for x na. Equating this derivative to 0 and solving for x leads to x=0, which is the ETED value for which S is maximal (the second derivative of S with respect to x is negative at x=0).

For x=0, n+= n-= n/2; the # of chain configurs. for x=0 is:

w(x=0)=n!/[(n/2)!(n/2)!] and ln(w) = nlnn - nln(n/2) = nln2 = S/kB of this system!

using Stirling approximation the # of configurations at x=0 equals to the total number of configurations.

Page 119: INTRODUCTION The aim of computational structural biology is to

119

x=0 is the most probable ETED, i.e., it has the maximal entropy; thus, to “hold” the chain ends at a distance x>0 one has to apply force against the “will” of the chain to remain in its most probable state (x=0). In other words, the force is required for decreasing the chain entropy.

This is an example for what is known as the “potential of mean force” (PMF), which is the contribution to the free energy of configurations that satisfy a certain geometrical restriction. In our case this restriction is a certain value of the ETED x.

Deriving the PMF (free energy) with respect to the corresponding restriction (e.g., x), leads to the average force required to hold the system in the restricted geometry.

Page 120: INTRODUCTION The aim of computational structural biology is to

120

Rubber elasticity stems from such entropic effects. Rubber consists of polymer chains that are connected by cross-links. When the rubber is stretched bond angles and lengths are not changed (or broken). The main effect is that the polymers become more ordered, i.e., their entropy is decreased. When the rubber returns to its normal length the energy invested is released as heat which can be felt. Thus, this “thermo-dynamic spring” differs from the mechanical one.

Calculation of PMF values and the corresponding forces is also carried out for proteins. An example is the huge muscle protein titin (~33,000 residues), which is stretched by atomic force microscopy and the forces are compared to those obtained from derivatives of PMF calculated by MD. (see, Lu & Schulten, Biophysical Journal vol. 79, 51-65, 2000.

Page 121: INTRODUCTION The aim of computational structural biology is to

121

For x/2a 1 we have obtained Hook’s law for the force :

Thus, the force required to hold the chain at x increases linearly with increasing T, because the free energy (-TS) at x=0 decreases, i.e., the system becomes more stable at higher T and it is difficult to stretch it to a larger x.

Also, the force is proportional to 1/(na2 ), meaning that the longer the chain the easier is to stretch it.

.........)ln( xna

Tkna

xaTk

nax

aTk BBB

22

22

12

Page 122: INTRODUCTION The aim of computational structural biology is to

122

Phase transitions

A phase transition occurs when a solid become liquid and the latter gas. A Phase transition also occurs when a magnet loses its magnetization or a liquid crystal becomes disordered.

While phase transitions are very common they involve a discontinuity in thermodynamic properties and it was not clear until 1944 whether this phenomenon can be described within the framework of equilibrium statistical mechanics.

In 1944 Onsager solved exactly the Ising model for magnetism where the properties of phase transition appeared in the solution. This field was developed considerably during the last 40 years. It was also established that polymers correspond to

Page 123: INTRODUCTION The aim of computational structural biology is to

123

magnetic systems and show a phase transition behavior.

First order phase transition is a discontinuity in a first derivative of the free energy F. Examples of first derivatives are the energy, <E> = T2 [(F/T)/T]N,V and the magnetization M of a spin system in a magnetic field, <M> = (F/ H)T.

A known example for a system undergoing first order transition is a nematic liquid crystal. The molecules are elliptic or elongated. At low T they are ordered in some direction in space and <E> is low. A random arrangement (and high <E>) occurs at high T. At the critical temperature Tc the system can co-exist in two states random and ordered.

There is finite difference in the energy, E (latent heat) and entropy S of the two phases where E = TS F1= F2.

Page 124: INTRODUCTION The aim of computational structural biology is to

124

In other words, the two states are equally stable at Tc.

T< Tc ordered state T >Tc disordered statelow energy & entropy high energy & entropy

-<E>

ordered disordered T

EFordered=Fdisordered

Page 125: INTRODUCTION The aim of computational structural biology is to

125

For a regular system the function fT(E)=n(E)exp[-E/kBT] has a single sharp peak at E*(T). In a first order phase transition two peaks exist at Tc, around Eorder and Edisorder .

Notice that in a small system the peaks will be smeared, while in a macroscopic system they become very sharp.

In the case of a peptide or a protein it is possible that several energies contribute significantly to Z and the molecule will populate all of them significantly in thermodynamic equilibrium. Because the system is finite and relatively small the peaks will be smeared.

fT(E)

E1 E2 E3 E4

Page 126: INTRODUCTION The aim of computational structural biology is to

126

Question: The ``jump” in energy at a first order transition is E (latent heat). What is S in terms of E and Tc?

F1=F2 E1- TcS1=E1- TcS1 E2-E1 = Tc (S2-S1)

E = TcS S = E/Tc

In a second order phase transition the first derivatives of the free energy F are continuous but there is a discontinuity (or divergence to ) in a second derivative. Examples: the specific heat Cv & the magnetic susceptibility T,

Cv= (E/T)V - E itself is a first derivative of F T = -(M/H)T ; M= -(F/H)T

Page 127: INTRODUCTION The aim of computational structural biology is to

127

A second order phase transition occurs at the critical point of a magnetic system or a liquid, where the susceptibility & the compressibility, respectively diverge to . Polymers in some respect behave as a system undergoing such a transition.

To understand this phenomenon we define on an LL=N square lattice a simple spin model called the Ising model. On lattice site i a spin i is defined with two states, ``up” and ``down”, I = +1 or –1, respectively; the total # of spin configurations is 2N. Unlike the model studied on p. 92, spin i interacts with a nearest neighbor spin j with energy ij=-Jij, where the strength of the interaction is J >0 (ferromagnetic), but the magnetic field, H=0. ij(++)= ij(--)= -J ij(+-)= ij(-+)= J

Page 128: INTRODUCTION The aim of computational structural biology is to

128

The Ising model on a square lattice was solved analytically by Onsager in 1944; on a cubic lattice only approximate solutions exist. In some respects the behavior of this model is similar to the model of independent spins in a magnetic field studied earlier. In the ground state (typical for T=0) all spins are up (i=+1 ), while the random spin configuration are the most probable at high T. However, unlike the previous model, a critical T exists at which all the first derivatives of F are continuous but the second derivatives diverge with typical critical exponents.

- + + + - ++ - + + + ++ + + - + +

+ + + + - -+ + + - - -+ + - - - -

+ + + + + ++ + + + + ++ + + + + +

+ - + + - ++ + + - + -+ - - - + +

T=0, M=1 T <Tc , M=0.6 T =Tc , M=0 T Tc , M=0 large droplets of + & -

Page 129: INTRODUCTION The aim of computational structural biology is to

129

At Tc, CV/N and /N diverge as:

And the magnetization (# spins up - # of spins down)/N approach to 0 at Tc from the cold side as:

)()( c

c

c

c

TT

TNTT

TN

C TV

M1

T=0 Tc T > Tc

CV

c

c

TTT

M )(

=1/8 ; =1.75 for 2d Ising model

Page 130: INTRODUCTION The aim of computational structural biology is to

130

What is happening?

<M> is a first derivative of F and it is continuous.<E> (not shown) is a first derivative & continuous as well.

The divergence of CV/N and the susceptibility T/N stems from the fact that CV ~ 2(E) (as has already been pointed out) and ~ 2(M)N. At non-critical temperatures 2(E) & 2(M)N are extensive (i.e., ~ N), hence CV /N and T/N are finite.

At Tc 2(M)N & 2(E) increase with N stronger, as N1+x & N1+y; x, y >0 are critical exponents CV/N and T/N diverge.

This increase in the fluctuations is a result of the large ( for an system) droplets of spin up and down that occur at Tc.

Page 131: INTRODUCTION The aim of computational structural biology is to

131

Another measure of this “ criticality” is the correlations between spins as a function of distance

22

2corr

kiki ),(

In this equation is M. Thus, two spins on the same droplet will have the same sign they will be correlated. The correlation length increases as T Tc because the droplet size increases,

)( c

c

TT

T

where is another critical exponent (=1for a 2d Ising model).

+ + - - + - + - + + + - - + - - - - - + + - - - -

Page 132: INTRODUCTION The aim of computational structural biology is to

132

The interesting property of these exponents is that they are universal - i.e., they do not depend on the system (in most cases) but depend on the dimension they are different in 1, 2 and 3 dimensions. There are exact relations between them and inequalities.

It turns out that a polymer chain- like a magnetic or a fluid system at Tc is also a critical system with characteristic critical exponents, , , and . For example, we have already found that the ETED of an ideal chain behaves like an, where =1/2.

Page 133: INTRODUCTION The aim of computational structural biology is to

133

Models for polymers and proteins

Proteins, nucleic acids, or poly-saccharides are all long polymers; therefore, before treating such biological macromolecules (in particular proteins) we shall study general properties of polymers.

The most general model is of many polymers and solvent molecules (e.g., water) contained in a volume V. The potential energy is based on intra-polymer, polymer-polymer, polymer-solvent and solvent-solvent interactions. We shall be interested mainly in the very dilute regime where a polymer ”meets” another one very rarely one can treat a single polymer surrounded by solvent molecules.

This model describes faithfully proteins than do not aggregate.

Page 134: INTRODUCTION The aim of computational structural biology is to

134

The accuracy of the interaction energy depends on the questions asked. If one is interested in the specific 3D structure of a protein the potential energy function (also called force field) will be complicated, typically based on electrostatic, van der Waals, hydrogen-bond and other interactions.

An analytical solution of Z and other thermodynamic and geometrical parameters is impossible and the only available approach is numerical, in particular Monte Carlo (MC) and molecular dynamics (MD) simulation methods.

On the other hand, if one is interested in global properties of polymers (e,g., shape, size), relatively simple lattice models are sufficient.

Page 135: INTRODUCTION The aim of computational structural biology is to

135

Simple models for polymers – ideal chains

As we have seen, an ideal chain only satisfies the connectivity of a polymer but not the excluded volume (EV) interaction, i.e., the chain can intersect itself and go on itself. No potential energy (attractive) is applied.

Other names – Gaussian chain, random walk, and the unperturbed chain. If the chain is off-lattice (i.e., in continuum) it is called the freely jointed chain or a random flight chain.

When the EV interaction is considered the chain is called a “real chain”.

Page 136: INTRODUCTION The aim of computational structural biology is to

136

Importance of ideal chains

• Can be treated analytically; provide the basis for understanding real chains.

• Explain rubber elasticity.

• Describe a polymer at the Flory -point.

• An Ideal chain models a polymer in a dense multiple- polymer system.

We shall treat a single long polymer of N bonds (steps) (i.e., N+1 monomers) ignoring the volume. The chain interacts with a heat bath at temperature

T.

Page 137: INTRODUCTION The aim of computational structural biology is to

137

The global shape of a polymer

Besides the ETED, the spatial size of the entire chain can be expressed by the radius of gyration s, R

where si is the distance of monomer i from the center of mass, cM of the chain,

1

1

22

11 N

iis

Ns

i

iiM m

mc

x xi- vector coordinates

It can be shown (Flory, p.5) that

Nji

ijrN

s0

22

2

1

1

)(rij are distances between monomers i and j.

cM

si

Page 138: INTRODUCTION The aim of computational structural biology is to

138

Notice, s2 is based on all rij whereas the ETED on a single distance; therefore, in simulations <s2> converges faster than <R2> (ETED).

For a sphere of radius R the radius of gyration is

For a protein with a shape close to spherical <s2>1/2 will be ~ 0.8R of the radius of the protein.

RRs

RR

Rdrr

Rdrddrs

RR

7705

3

5

3

5

3

4

43

volume

1

2

25

30

43

0

42

.

sin

Page 139: INTRODUCTION The aim of computational structural biology is to

139

It was shown that on a lattice and in continuum while <R>=0 <R2>=Na2 ; <R2>½ = N½a=Na

and one can show that =1/2 also for the radius of gyration

<s2>= <R2>/6 <s2> ½ = N½a/6 the shape of the chain increases with critical exponent =1/2, the open chains dominate these averages as N is increased; this result is true for any dimension, d = 1,2 ,3 …

Recall, in the Ising model describes the increase of the spin droplets (correlation length) as TTc.

Page 140: INTRODUCTION The aim of computational structural biology is to

140

Adding various geometrical restriction to an ideal chain does not change the value =1/2 as long as the EV interaction is ignored. The only change occurs in the pre-factor.

Thus, for a chain with constant bond angles ,

If =109.5o <R2>½ = 2N½a ; for =90o <R2>½ = N½a Like for a freely-jointed chain.

Adding restrictions to the bond angles will open the chain (the size of the shortest possible loop increases), <R2> will grow leading however only to a larger pre-factor (2 for =109.5o). This is true as long as the restrictions are short range.

aNR 21212

11 //

)cos()cos(

Page 141: INTRODUCTION The aim of computational structural biology is to

141

Adding long-range interactions such as EV will affect the exponents as well.

One can define a statistical unit a’, based on several monomers. Thus, for a chain of N monomers a, M units a’ can be defined where, <R2>½ = M½a’ (M < N; a’ > a)

For =109.5o M = N/2 ; a’=2a. a’ is related to the persistence length.

Notice, the relation, <R2>½ = N½a is satisfied, not only for the chain ends, but for any pair of monomers inside the chain (internal monomers) – the chain “tails” do not change this relation; however, the pre-factor might be changed.

Page 142: INTRODUCTION The aim of computational structural biology is to

142

Distribution of the end-to-end distance

For an ideal chain the probability density to be at ETED, x can be obtained exactly and an excellent approximation is a Gaussian function (normal distribution; see pp. 59 & 68). It is easy to derive this probability density for the 1d lattice chain model studied on pages110-113. For simplicity we assume that the bond length is a=1.

One has to calculate the ratio between the number of chains with ETED=x and the total number of chains, 2N.

We write this expression exactly, simplify it by taking ln and applying Stirling’s formula; the result is then exponentiated back; it is a Gaussian distribution.

Page 143: INTRODUCTION The aim of computational structural biology is to

143

+n+ , n- : # of bonds in direction +, -n= n+ + n- x = n+ - n- but also -x = n- - n+

x

n+ = (n+x)/2 ; n- = (n-x)/2 ; total # of chains 2n

The ratio is: )!()!(

)(!)(

222

2xnxn

nxP

n

22222

2 lnlnlnlnln)(ln nxnxnxnxn

nnxP

Using Stirling’ formula:

Page 144: INTRODUCTION The aim of computational structural biology is to

144

After opening the ln terms and regrouping:

n

xn

xx

n

xn

nnn

xn

xnxxnxn

nnnxP

1

1

21

22

222

2

22 ln)(lnlnln

ln))(ln(lnln)(ln

For xn ln(1+x/n) ~ x/n

This Gaussian should be normalized (due to approx.) ln2 is ignored.

)exp()exp()(

lnlnln)(ln

2

22

222

2

2

22

22

22

222

R

x

n

xxP

n

x

n

x

n

x

n

x

n

xx

n

xnxP

Page 145: INTRODUCTION The aim of computational structural biology is to

145

For a freely jointed chain in 3 dimensions,

drrR

r

RdrrP 2

2

223

24

2

3

2

3

exp)(

/

|-----------------------------------|This is the probability to find the chain in a spherical shell defined by the radii r and r+dr. It is zero at r=0.The first part is the probability density per volume as obtained in the 1d case; this part is maximal at r=0 (see figure). 4 comes from integrating over the angles and .

This Gaussian is approximate because it is non 0 also for r>Na – the fully stretched chain! However, this is a very good approximation for large N. The Gaussian distribution exists also for distant enough internal monomers i,j.

Page 146: INTRODUCTION The aim of computational structural biology is to

146

n=10

r/na

Page 147: INTRODUCTION The aim of computational structural biology is to

147

n=10

n=20

n=4

r/na

Page 148: INTRODUCTION The aim of computational structural biology is to

148

Ideal chains and random walks

There is one-to-one correspondence, ideal chains random walks. E.g., on a lattice, the walker starts from the origin and “jumps” to one of the nearest neighbor sites with equal prob. & continues in the same way. Its pathway defines a chain configuration. The decision of the walker does not depend on previous jumps – no memory. Every pathway of the walker has the same probability- (1/2d)N as that of an ideal chain.

Indeed, it is known from Brownian motion, which describes a random walk, that the average distance squared <S2> is <S2> = 2D·t = (2kBT/f)·tD- diffusion constant ; f- viscosity constant ; t-time n ;<S2> <R2>; it is also known that <S2> has Gaussian distribution.

Page 149: INTRODUCTION The aim of computational structural biology is to

149

Problems - third set

(1) Given: <E> = i Ei exp[-Ei/kBT]/Z Z = i exp[-Ei/kBT]

Prove: 2(E) = <E2> - <E>2 = kBT2CV

where,

(2) A system of N0 adsorption points and N adsorbing molecules (N N0) is given. Calculate the chemical potential: Hint: calculate # of arrangements

TNF

Page 150: INTRODUCTION The aim of computational structural biology is to

150

(3) Nc monomeric units are ordered along a straight line and create a chain. Every unit can be in state or where its length is a or b with energy E or E, respectively. Find the relation between the chain length and the force K between the chain ends. K K

x See previous model solved in class.

Page 151: INTRODUCTION The aim of computational structural biology is to

151

Two comments on ideal chains

ETED – Because the statistical average of the end-to-end distance is zero, i.e., <R>=0, <R2> is the variance. This should be compared to the energy, where <E>#0, <E> ~N and the variance is <E2>-<E>2. However, in both cases the distribution is Gaussian with a very good approximation,

<R> = 0, 2(R) = <R2> ~Na<E> ~ N, 2(E) = <E2> - <E>2 ~ N

P(R)dV = C’exp[-3R2/22(R)]P(E)dE = Cexp[-(E - <E>)2/22(R)]

Page 152: INTRODUCTION The aim of computational structural biology is to

152

Ideal chain and ideal gas: Both do not satisfy the EV interaction, but because of connectivity, # of chain configurations is smaller than those of an ideal gas. Also, the chain has a global structure that increases with increasing N.

Chains with excluded volume – real chains

Assume again a chain of N steps (bonds), i.e., N+1 monomers anchored to the origin on a square lattice. The chain is not allowed to intersect itself because two monomers occupying the same site have interaction energy. Therefore, this model is called self-avoiding walks (SAWs) where ideal chains are also called random walks (RW). Due to the EV interaction, exact analytical solution of Z is not available and the information about the model is based only on

Page 153: INTRODUCTION The aim of computational structural biology is to

153

numerical calculations (mostly simulations) and approximate analytical treatments. Two interaction energies are defined for chain configuration i, Ei=0 if i is a SAW and Ei = for a self- intersecting walk. The partition function is

Z = exp[-Ei /kBT] = 1i

RW i SAW i

Thus, Z is the total number of SAWs – they constitute a subgroup of the random walks (ideal chains).

Self-intersecting walk, Ei =

SAW, Ei = 0

Page 154: INTRODUCTION The aim of computational structural biology is to

154

As for random walks, each SAW contribute 1 to the partition function (Ei=0); therefore, the Bontzmann probability is equal for all SAWs. PB(i)=1/ZZ can be calculated exactly for relatively short SAWs on lattices (N<50 on a square lattice) by enumerating all the different SAWs with a computer. Approximate value of Z for large N can then be obtained by a suitable extrapolation.

Because of the repulsion among the monomers of SAWs (EV interaction) the expansion of the chain’s shape with increasing N is larger than that found for random walks (ideal chains). [the compact RW configurations where a chain go on itself do not contribute to the average ETED(SAWs)]. Thus, the increase of the ETED (<R2>) and the radius of gyration, <s2> is governed by an exponent > ½.

Page 155: INTRODUCTION The aim of computational structural biology is to

155

<R2>1/2= BNa <s2>1/2 = DNa

B and D are constant pre-factors that depend on the kind of lattice, the model, and the dimension. on the other hand depend only on the dimensionality – all the d=2 models will have the same (e.g., a square or triangular lattice, etc.) – universality. An excellent approximation for of SAWs is given by Flory’s formula

d (Flory) (best) method 1 1 1 exact (trivial)

2 0.75 0.75 almost an exact solution

3 0.60 0.588 simulations (lattice) 4 0.50 0.5 exact

= 3/(d+2)

Page 156: INTRODUCTION The aim of computational structural biology is to

156

For d=1 is maximal, =1 because the chain can go left or right only; this also describes the increase of a rod-like molecule such as an -helix or a DNA. As d increases the EV interaction becomes less effective because the chain has more alternative pathways to circumvent the EV disturbances; therefore, decreases.

In 4 and higher dimensions the spatial freedom is so large that the chain does not “feel” the EV disturbances at all and it grows like an ideal chain, i.e., <R2>1/2= CN1/2a , =1/2.

SAW in three dimensions is sometimes called a swollen chain (because its shape is larger than that of a RW); it is a model for a polymer in a good solvent, where the solvent attracts the monomers and opens up the chain.

Page 157: INTRODUCTION The aim of computational structural biology is to

157

Returning to Z. One can express Z(SAW) as follows:

Z = CNN-1

C - a constant pre-factor that depends on the model.

- growth parameter - effective coordination number- depends on the model.

- critical exponent that depend only on d.

To understand this formula we consider a random walk on a d-dimensional lattice, where immediate return is forbidden; 2d-1 directions are available for each step besides the first one that has 2d possible directions

Z=2d(2d-1)N-1=[2d/(2d-1)](2d-1)N

In this case, C = 2d/(2d-1); =2d-1, =1 (N-1=N0=1)

Page 158: INTRODUCTION The aim of computational structural biology is to

158

For SAW on a simple cubic lattice 4.6 (as expected, it is smaller than =2d-1=5 for RW). On a square lattice 2.6 (again, smaller than 2d-1=3 for RW). For d=4 (SAW)=2d-1=7 – as for RW.

For large N, the N term dominates the N-1 term. Again, for a given d, depends on the model (lattice) while depends only on the dimensionality- i.e., it is universal.

(d=2)=1.34… exact (d=3)=1.16 numerical calculations (d=4)=1 exact- like random walks

Summary: As d increases a SAW becomes more and more like a random walk, because the EV interactions weaken. For d4 the EV becomes ineffective a RW behavior.

Page 159: INTRODUCTION The aim of computational structural biology is to

159

Polymer chain as a critical system

For the magnetic Ising model and a fluid we defined several critical exponents that describe the divergence of second derivatives of the free energy at the critical temperature Tc of a second order phase transition.

)(

)(

)(

c

c

c

c

c

c

TT

T

TT

T

N

TT

T

N

CV

specific heat (variance of energy)

Susceptibility (variance of magnetization)

correlation length (growth of droplet)

Other magnetic lattice models exist which undergo second order transition.

Page 160: INTRODUCTION The aim of computational structural biology is to

160

Of special interest is the n-vector model in the limit of n 0.We shall not define the model here but will only say that there is an one-to-one correspondence between the exponents of this model and the exponents we defined for SAWs (, and ).

One can consider the radius of gyration of a chain as defining the size of a droplet. In the magnetic model Tc/(T-Tc)

while <s2>1/2 ~ N = 1/(1/N)

So, 1/N corresponds to (T-Tc)/Tc. As N increases the chain approaches its critical temperature. What happens is that the contribution to of the magnet comes from the values of <R2> calculated for chains of connected spins that do not self-intersect (SAWs). Therefore, the two models have the same .

Page 161: INTRODUCTION The aim of computational structural biology is to

161

This correspondence is important for several reasons: First, sophisticated renormalization group techniques were developed for the magnetic systems critical exponents for the polymer systems can be obtained by treating magnetic models, and vice versa – certain critical exponents of magnetic systems can be obtained very accurately by Monte Carlo simulations of long polymers.

Also, this correspondence discovered in1972 by de Gennes (a Nobel prize winner) has shown that a branch of analytical approximations for polymers that had been active for 25 years- was incorrect!!!A more philosophical note: Critical behavior occurs in high energy and solid state physics, liquids, and polymers – these very different fields live now under the same roof ,i.e., all are treated by the normalization group theory.

Page 162: INTRODUCTION The aim of computational structural biology is to

162

These ideas are summarized in a book by de Gennes “scaling concepts in polymers”. Another polymer scientist who in many respects established this field, is Paul Flory who won the Nober prize in 1974, before de-Gennes.

Distribution of the end-to-end distance of SAWs

First, the growth with N of the ETED of two monomers i and j that are not end monomers is, <R2(ij)>1/2=A|j-i| with the same exponent as for the end monomers (<R2(ij)>1/2=B|j-i|) but with a different pre-factor, i.e., AB.

i

jETED When i and j approach the ends A B (might be important for

analyzing fluorescence experiments).

Page 163: INTRODUCTION The aim of computational structural biology is to

163

A non-Gaussian distribution for R has been developed

11 1

11

3

xxxfxxxf

ji

rfjirPij

),exp()(;,)(

)()(

f(x)

x

Other parameters (not exponents) appearing in this function change when i j are internal, end- internal etc. See Valleau JCP 104, 3071 (1996).

Page 164: INTRODUCTION The aim of computational structural biology is to

164

As discussed later, denatured proteins belong to the same regime as SAWs. In fluorescence experiments one can measure the ETED distributions of pairs of certain residues along the chain one can use the formulas for Pij(r) and Rij~|i-j| for analyzing the data.

The effect of solvent and temperature on chain size

A polymer in a good solvent is open due to its strong favor interaction with the solvent molecules. In a bad solvent the attractions among the monomers are stronger than with the solvent; therefore, the polymer’s size decreases and in extreme conditions the solute molecules will aggregate and precipitate.

Page 165: INTRODUCTION The aim of computational structural biology is to

165

A simple modeling for solvent (or temperature) effects is a SAW on a lattice where an attraction energy = -|| is defined between any two nearest neighbor monomers that are non-bonded.

Mi=5 # of attractions

Z= exp [-Ei /kBT] = exp [-Mi /kBT]

By decreasing the temperature the most probable chains are those with the lower energy, i.e., the more compact ones – a bad solvent condition. Notice that two kinds of long-range interactions exist here attractions and the EV repulsions.

SAWs SAWs

Page 166: INTRODUCTION The aim of computational structural biology is to

166

This model defines three solvent (or temperature) regimes:

T > - high temperature regime – high energy- open chain – good solvent behavior - exponents of SAWs( = 0).

T = - - Flory’s theta temperature – the polymer behaves, to a large extent, as an ideal chain (random walk). A counterbalance between the attractions and repulsions (EV) exists - ideal chain exponents ( =1/2, =1).

T < - the quality of the solvent decreases further – the polymer collapses – compact structure – low energy – corresponds to the state of a folded protein - =1/d –

other exponents are not defined – no connection to the n-vector model.

Page 167: INTRODUCTION The aim of computational structural biology is to

167

The -point

It is expected that at T= the attractions compensate the EV repulsions and the chain has an ideal behavior. The following parameters are obtained:

d method

2 0.571… 1.143… 3.2 exact 3 ~0.5 1.0050.02 5.058 simulation (MC) 4 0.5 1 RW 0.5 1 3(5)

Conclusions: in two dimensions (d=2) at the balance point, the EV prevails (even though =3.2 rather than 3. In d=3 the behavior is to a

large extent ideal even though is slightly larger than 5.