

Introduction To DCA

K.K M

[email protected]

December 12, 2016


Contents

• Distance And Correlation
• Maximum-Entropy Probability Model
• Maximum-Likelihood Inference
• Pairwise Interaction Scoring Function
• Summary


Distance And Correlation: Definition of Distance

A statistical distance quantifies the distance between two statistical objects, which can be two random variables, two probability distributions or samples, or an individual sample point and a population or a wider sample of points.

• d(x, x) = 0   (identity of indiscernibles)
• d(x, y) ≥ 0   (non-negativity)
• d(x, y) = d(y, x)   (symmetry)
• d(x, k) + d(k, y) ≥ d(x, y)   (triangle inequality)


Distance And Correlation: Different Distances

• Minkowski distance
• Euclidean distance
• Manhattan distance
• Chebyshev distance
• Mahalanobis distance
• Cosine similarity
• Pearson correlation
• Hamming distance
• Jaccard similarity
• Levenshtein distance
• DTW distance
• KL-divergence


Distance And Correlation: Minkowski Distance

Let P = (x_1, x_2, \ldots, x_n) and Q = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^n.

• Minkowski distance: \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
• Euclidean distance: p = 2
• Manhattan distance: p = 1
• Chebyshev distance: p = \infty, since
  \lim_{p \to \infty} \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} = \max_{i=1,\ldots,n} |x_i - y_i|
• z-transform: (x_i, y_i) \mapsto \left( \frac{x_i - \mu_x}{\sigma_x}, \frac{y_i - \mu_y}{\sigma_y} \right), assuming x_i and y_i are independent

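To make the special cases concrete, here is a minimal NumPy sketch (an illustrative addition, not part of the original slides; the example vectors are arbitrary):

    import numpy as np

    def minkowski(x, y, p):
        """Minkowski distance (sum_i |x_i - y_i|^p)^(1/p)."""
        return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 0.0, 3.0])

    print(minkowski(x, y, 1))        # Manhattan distance: 5.0
    print(minkowski(x, y, 2))        # Euclidean distance: sqrt(13)
    print(np.max(np.abs(x - y)))     # Chebyshev distance, the p -> infinity limit: 3.0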

Distance And Correlation: Mahalanobis Distance

• Mahalanobis distance: d_M(x, \mu) = \sqrt{(x - \mu)^T C^{-1} (x - \mu)}
• Cholesky factorization of the covariance matrix: C = L L^T
• Transformation x \mapsto x' = L^{-1}(x - \mu), after which the Mahalanobis distance is the ordinary Euclidean distance in the transformed (whitened) coordinates

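A minimal NumPy sketch of this whitening view (an illustrative addition; the synthetic 2-D data and the test point are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[2.0, 1.2], [1.2, 1.0]], size=500)

    mu = X.mean(axis=0)
    C = np.cov(X, rowvar=False)        # empirical covariance matrix
    L = np.linalg.cholesky(C)          # C = L L^T
    x = np.array([1.0, 2.0])
    z = np.linalg.solve(L, x - mu)     # whitening transformation x -> x' = L^{-1}(x - mu)

    print(np.linalg.norm(z))                                  # Euclidean norm after whitening
    print(np.sqrt((x - mu) @ np.linalg.inv(C) @ (x - mu)))    # Mahalanobis distance; same value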

Distance And Correlation: Example for the Mahalanobis Distance

[Figure: Euclidean distance]   [Figure: Mahalanobis distance]


Distance And Correlation: Pearson Correlation

• Inner product: \mathrm{Inner}(x, y) = \langle x, y \rangle = \sum_i x_i y_i

• Cosine similarity: \mathrm{CosSim}(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}} = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}

• Pearson correlation:

\mathrm{Corr}(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} = \frac{\langle x - \bar{x},\, y - \bar{y} \rangle}{\|x - \bar{x}\|\,\|y - \bar{y}\|} = \mathrm{CosSim}(x - \bar{x},\, y - \bar{y})

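A quick NumPy check of this "Pearson correlation = cosine similarity of the centered vectors" identity (an illustrative addition with synthetic data):

    import numpy as np

    def cos_sim(x, y):
        """Cosine similarity <x, y> / (||x|| ||y||)."""
        return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    y = 0.5 * x + rng.normal(size=100)

    print(np.corrcoef(x, y)[0, 1])                 # Pearson correlation
    print(cos_sim(x - x.mean(), y - y.mean()))     # same value up to floating-point error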

Distance And Correlation: Different Values for the Pearson Correlation

[Figure: examples of data sets with different values of the Pearson correlation]


Distance And Correlation: Limits of Pearson Correlation

Q: Why not use the Pearson correlation to score pairwise associations?

A: The Pearson correlation is a misleading measure of direct dependence: it only reflects the association between two variables while ignoring the influence of the remaining ones.


Distance And Correlation: Partial Correlation

For M given samples in L measured variables,

x^1 = (x^1_1, \ldots, x^1_L)^T, \;\ldots,\; x^M = (x^M_1, \ldots, x^M_L)^T \in \mathbb{R}^L

the Pearson correlation coefficient is

r_{ij} = \frac{C_{ij}}{\sqrt{C_{ii} C_{jj}}}

where C_{ij} = \frac{1}{M} \sum_{m=1}^{M} (x^m_i - \bar{x}_i)(x^m_j - \bar{x}_j) is the empirical covariance matrix. Scaling each variable to zero mean and unit standard deviation,

x_i \mapsto \frac{x_i - \bar{x}_i}{\sqrt{C_{ii}}}

simplifies the correlation coefficient to r_{ij} \equiv \overline{x_i x_j}.


Distance And Correlation: Partial Correlation of a Three-Variable System

The partial correlation between X and Y given a set of n controlling variables Z = \{Z_1, Z_2, \ldots, Z_n\}, written r_{XY \cdot Z}, is

r_{XY \cdot Z} = \frac{N \sum_{i=1}^{N} r_{X,i} r_{Y,i} - \sum_{i=1}^{N} r_{X,i} \sum_{i=1}^{N} r_{Y,i}}{\sqrt{N \sum_{i=1}^{N} r_{X,i}^2 - \big(\sum_{i=1}^{N} r_{X,i}\big)^2}\,\sqrt{N \sum_{i=1}^{N} r_{Y,i}^2 - \big(\sum_{i=1}^{N} r_{Y,i}\big)^2}}

where r_{X,i} and r_{Y,i} are the residuals of the i-th of N observations after regressing X and Y on Z. In particular, when Z is a single variable, the partial correlation of a three-variable system between x_A and x_B given x_C is

r_{AB \cdot C} = \frac{r_{AB} - r_{AC} r_{BC}}{\sqrt{1 - r_{AC}^2}\,\sqrt{1 - r_{BC}^2}} \equiv -\frac{(C^{-1})_{AB}}{\sqrt{(C^{-1})_{AA} (C^{-1})_{BB}}}

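A small NumPy check of the final identity on three synthetic variables (an illustrative addition; the data-generating model is an assumption):

    import numpy as np

    rng = np.random.default_rng(2)
    z = rng.normal(size=1000)
    a = z + 0.3 * rng.normal(size=1000)
    b = z + 0.3 * rng.normal(size=1000)
    X = np.column_stack([a, b, z])               # columns: A, B, C

    R = np.corrcoef(X, rowvar=False)             # pairwise Pearson correlations
    r_ab, r_ac, r_bc = R[0, 1], R[0, 2], R[1, 2]
    recursive = (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

    P = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix C^{-1}
    from_precision = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

    print(recursive, from_precision)             # the two expressions agree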

Distance And Correlation: Pearson vs. Partial Correlation

Reaction System Reconstruction

[Figure: reconstruction of a reaction system, comparing Pearson correlation with partial correlation]


Maximum-Entropy Probability Model: Entropy

The thermodynamic definition of entropy was developed in the early 1850s by Rudolf Clausius:

\Delta S = \int \frac{\delta Q_{\mathrm{rev}}}{T}

In 1948, Shannon defined the entropy H of a discrete random variable X with possible values x_1, \ldots, x_n and probability mass function p(x) as

H(X) = -\sum_{x} p(x) \log p(x)

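For concreteness, a tiny sketch of Shannon's formula (an illustrative addition; entropies are reported in bits via log base 2):

    import numpy as np

    def shannon_entropy(p):
        """H(X) = -sum_x p(x) log2 p(x), skipping zero-probability outcomes."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    print(shannon_entropy([0.5, 0.5]))      # 1 bit: fair coin
    print(shannon_entropy([0.25] * 4))      # 2 bits: uniform over four outcomes
    print(shannon_entropy([0.9, 0.1]))      # about 0.47 bits: biased coin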

Maximum-Entropy Probability Model: Joint & Conditional Entropy

• Joint entropy: H(X, Y)
• Conditional entropy: H(Y | X)

H(X, Y) - H(X) = -\sum_{x,y} p(x, y) \log p(x, y) + \sum_{x} p(x) \log p(x)
               = -\sum_{x,y} p(x, y) \log p(x, y) + \sum_{x} \Big( \sum_{y} p(x, y) \Big) \log p(x)
               = -\sum_{x,y} p(x, y) \log p(x, y) + \sum_{x,y} p(x, y) \log p(x)
               = -\sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)}
               = -\sum_{x,y} p(x, y) \log p(y | x)
               = H(Y | X)


Maximum-Entropy Probability Model: Relative Entropy & Mutual Information

• Relative entropy (KL-divergence): D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}
  • D(p \| q) \neq D(q \| p) in general (not symmetric)
  • D(p \| q) \geq 0

• Mutual information:

I(X, Y) = D\big( p(x, y) \,\|\, p(x) p(y) \big) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}

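A small numerical illustration of both quantities (an illustrative addition; the toy joint distribution is an assumption):

    import numpy as np

    def kl(p, q):
        """Relative entropy D(p || q) = sum_x p(x) log(p(x) / q(x))."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        mask = p > 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))

    # Joint distribution of two binary variables (rows: X, columns: Y).
    p_xy = np.array([[0.3, 0.2],
                     [0.1, 0.4]])
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)

    print(kl([0.5, 0.5], [0.9, 0.1]), kl([0.9, 0.1], [0.5, 0.5]))   # KL is not symmetric
    print(kl(p_xy.ravel(), np.outer(p_x, p_y).ravel()))             # I(X, Y) as a KL-divergence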

Maximum-Entropy Probability Model: Mutual Information

H(Y) - I(X, Y) = -\sum_{y} p(y) \log p(y) - \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}
               = -\sum_{y} \Big( \sum_{x} p(x, y) \Big) \log p(y) - \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}
               = -\sum_{x,y} p(x, y) \log p(y) - \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}
               = -\sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)}
               = H(Y | X)

H(X, Y) - H(X) = H(Y | X) ⇒ I(X, Y) = H(X) + H(Y) - H(X, Y)


Maximum-Entropy Probability Model: MI & DI

• Mutual information: MI_{ij} = \sum_{\sigma, \omega} f_{ij}(\sigma, \omega) \ln \left( \frac{f_{ij}(\sigma, \omega)}{f_i(\sigma) f_j(\omega)} \right)

• Direct information: DI_{ij} = \sum_{\sigma, \omega} P^{dir}_{ij}(\sigma, \omega) \ln \left( \frac{P^{dir}_{ij}(\sigma, \omega)}{f_i(\sigma) f_j(\omega)} \right)

  where P^{dir}_{ij}(\sigma, \omega) = \frac{1}{z_{ij}} \exp\big( e_{ij}(\sigma, \omega) + h_i(\sigma) + h_j(\omega) \big)

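A minimal sketch of the MI part, computed from empirical frequency counts of two alignment columns (the toy columns, the reduced ACGT alphabet, and the tiny pseudocount are illustrative assumptions, not from the slides; DI additionally requires the inferred couplings e_ij and fields h_i):

    import numpy as np

    def mutual_information(col_i, col_j, alphabet, pseudocount=1e-9):
        """MI_ij from single-site and pair frequency counts of two columns."""
        q = len(alphabet)
        idx = {a: k for k, a in enumerate(alphabet)}
        fij = np.full((q, q), pseudocount)
        for a, b in zip(col_i, col_j):
            fij[idx[a], idx[b]] += 1.0
        fij /= fij.sum()
        fi, fj = fij.sum(axis=1), fij.sum(axis=0)
        return np.sum(fij * np.log(fij / np.outer(fi, fj)))

    col_i = list("AACCGGTTAA")
    col_j = list("TTGGCCAATT")      # perfectly co-varying with col_i
    print(mutual_information(col_i, col_j, alphabet="ACGT"))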

Maximum-Entropy Probability Model: Principle of Maximum Entropy

• The principle was first expounded by E. T. Jaynes in two papers in 1957, where he emphasized a natural correspondence between statistical mechanics and information theory.

• The principle of maximum entropy states that, subject to precisely stated prior data, the probability distribution which best represents the current state of knowledge is the one with the largest entropy.

• Maximize S = -\int_x P(x) \ln P(x) \, dx


Maximum-Entropy Probability Model: Example

• Suppose we want to estimate a probability distribution p(a, b), where a ∈ {x, y} and b ∈ {0, 1}.

• Furthermore, the only fact known about p is that p(x, 0) + p(y, 0) = 0.6.

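A worked completion of this example (the numbers are implied by the constraint rather than printed on the slide): spreading probability as evenly as the constraint allows gives p(x, 0) = p(y, 0) = 0.3 and p(x, 1) = p(y, 1) = 0.2; any other allocation satisfying p(x, 0) + p(y, 0) = 0.6 has strictly lower entropy, which is exactly the "most uncertain" assignment illustrated on the next slide.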

Maximum-Entropy Probability Model: Unbiased Principle

[Figure: one way to satisfy the constraints]
[Figure: the most uncertain way to satisfy the constraints]


Maximum-Entropy Probability Model

• Continuous random variables: x = (x_1, \ldots, x_L)^T \in \mathbb{R}^L

• Constraints:
  • \int_x P(x) \, dx = 1
  • \langle x_i \rangle = \int_x P(x) \, x_i \, dx = \frac{1}{M} \sum_{m=1}^{M} x^m_i = \bar{x}_i
  • \langle x_i x_j \rangle = \int_x P(x) \, x_i x_j \, dx = \frac{1}{M} \sum_{m=1}^{M} x^m_i x^m_j = \overline{x_i x_j}

• Maximize: S = -\int_x P(x) \ln P(x) \, dx


Maximum-Entropy Probability Model: Lagrange Multipliers Method

Convert the constrained optimization problem into an unconstrained one by means of the Lagrangian \mathcal{L}:

\mathcal{L} = S + \alpha (\langle 1 \rangle - 1) + \sum_{i=1}^{L} \beta_i (\langle x_i \rangle - \bar{x}_i) + \sum_{i,j=1}^{L} \gamma_{ij} (\langle x_i x_j \rangle - \overline{x_i x_j})

\frac{\delta \mathcal{L}}{\delta P(x)} = 0 \;\Rightarrow\; -\ln P(x) - 1 + \alpha + \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j = 0

Pairwise maximum-entropy probability distribution:

P(x; \beta, \gamma) = \exp\Big( -1 + \alpha + \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j \Big) = \frac{1}{Z} e^{-H(x; \beta, \gamma)}


Maximum-Entropy Probability Model: Lagrange Multipliers Method

• Partition function as normalization constant:

Z(\beta, \gamma) := \int_x \exp\Big( \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j \Big) dx \equiv \exp(1 - \alpha)

• Hamiltonian:

H(x) := -\sum_{i=1}^{L} \beta_i x_i - \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j


Maximum-Entropy Probability Model: Categorical Random Variables

• Jointly distributed categorical variables x = (x_1, \ldots, x_L)^T \in \Omega^L, where each x_i is defined on the finite set \Omega = \{\sigma_1, \ldots, \sigma_q\}.

• In the concrete example of modeling protein co-evolution, this set contains the 20 amino acids, represented by a 20-letter alphabet, plus one gap element:

\Omega = \{A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, -\}, \qquad q = 21


Maximum-Entropy Probability Model: Binary Embedding of Amino Acid Sequence

[Figure: binary embedding of an amino acid sequence]


Maximum-Entropy Probability Model: Pairwise MEP Distribution on Categorical Variables

• Single and pairwise marginal probabilities:

\langle x_i(\sigma) \rangle = \sum_{x(\sigma)} P(x(\sigma)) \, x_i(\sigma) = \sum_{x} P(x_i = \sigma) = P_i(\sigma)

\langle x_i(\sigma) x_j(\omega) \rangle = \sum_{x(\sigma)} P(x(\sigma)) \, x_i(\sigma) x_j(\omega) = \sum_{x} P(x_i = \sigma, x_j = \omega) = P_{ij}(\sigma, \omega)

• Single-site and pair frequency counts:

\overline{x_i(\sigma)} = \frac{1}{M} \sum_{m=1}^{M} x^m_i(\sigma) = f_i(\sigma)

\overline{x_i(\sigma) x_j(\omega)} = \frac{1}{M} \sum_{m=1}^{M} x^m_i(\sigma) \, x^m_j(\omega) = f_{ij}(\sigma, \omega)


Maximum-Entropy Probability Model: Pairwise MEP Distribution on Categorical Variables

• Constraints:

\sum_{x} P(x) = \sum_{x(\sigma)} P(x(\sigma)) = 1

P_i(\sigma) = f_i(\sigma), \qquad P_{ij}(\sigma, \omega) = f_{ij}(\sigma, \omega)

• Maximize:

S = -\sum_{x} P(x) \ln P(x) = -\sum_{x(\sigma)} P(x(\sigma)) \ln P(x(\sigma))

• Lagrangian:

\mathcal{L} = S + \alpha (\langle 1 \rangle - 1) + \sum_{i=1}^{L} \sum_{\sigma \in \Omega} \beta_i(\sigma) \big( P_i(\sigma) - f_i(\sigma) \big) + \sum_{i,j=1}^{L} \sum_{\sigma, \omega \in \Omega} \gamma_{ij}(\sigma, \omega) \big( P_{ij}(\sigma, \omega) - f_{ij}(\sigma, \omega) \big)


Maximum-Entropy Probability Model: Pairwise MEP Distribution on Categorical Variables

Distribution:

P(x(\sigma); \beta, \gamma) = \frac{1}{Z} \exp\Big( \sum_{i=1}^{L} \sum_{\sigma \in \Omega} \beta_i(\sigma) x_i(\sigma) + \sum_{i,j=1}^{L} \sum_{\sigma, \omega \in \Omega} \gamma_{ij}(\sigma, \omega) x_i(\sigma) x_j(\omega) \Big)

Let

h_i(\sigma) := \beta_i(\sigma) + \gamma_{ii}(\sigma, \sigma), \qquad e_{ij}(\sigma, \omega) := 2 \gamma_{ij}(\sigma, \omega)

Then

P(x_1, \ldots, x_L) \equiv \frac{1}{Z} \exp\Big( \sum_{i=1}^{L} h_i(x_i) + \sum_{1 \le i < j \le L} e_{ij}(x_i, x_j) \Big)


Maximum-Entropy Probability Model: Gauge Fixing

There are L(q - 1) + \frac{L(L-1)}{2} (q - 1)^2 independent constraints compared to Lq + \frac{L(L-1)}{2} q^2 free parameters to be estimated, because

1 = \sum_{\sigma \in \Omega} P_i(\sigma), \qquad i = 1, \ldots, L

P_i(\sigma) = \sum_{\omega \in \Omega} P_{ij}(\sigma, \omega), \qquad i, j = 1, \ldots, L

Two gauge-fixing choices:

• gap to zero: e_{ij}(\sigma_q, \cdot) = e_{ij}(\cdot, \sigma_q) = 0, \quad h_i(\sigma_q) = 0
• zero-sum gauge: \sum_{\sigma} e_{ij}(\sigma, \omega) = \sum_{\sigma} e_{ij}(\omega', \sigma) = 0 for all \omega, \omega', \quad \sum_{\sigma} h_i(\sigma) = 0


Maximum-Entropy Probability Model: Closed-Form Solution for Continuous Variables

Rewrite the exponent of the pairwise maximum-entropy probability distribution:

P(x; \beta, \gamma) = \frac{1}{Z} \exp\Big( \beta^T x - \frac{1}{2} x^T \tilde{\gamma} x \Big)
                    = \frac{1}{Z} \exp\Big( \frac{1}{2} \beta^T \tilde{\gamma}^{-1} \beta - \frac{1}{2} (x - \tilde{\gamma}^{-1} \beta)^T \tilde{\gamma} (x - \tilde{\gamma}^{-1} \beta) \Big)

where \tilde{\gamma} := -2\gamma. Let z = (z_1, \ldots, z_L)^T := x - \tilde{\gamma}^{-1} \beta; then

P(x) = \frac{1}{\tilde{Z}} \exp\Big( -\frac{1}{2} (x - \tilde{\gamma}^{-1} \beta)^T \tilde{\gamma} (x - \tilde{\gamma}^{-1} \beta) \Big) \equiv \frac{1}{\tilde{Z}} e^{-\frac{1}{2} z^T \tilde{\gamma} z}

with the normalization condition

1 = \int_x P(x) \, dx \equiv \frac{1}{\tilde{Z}} \int_z e^{-\frac{1}{2} z^T \tilde{\gamma} z} \, dz


Maximum-Entropy Probability Model: Closed-Form Solution for Continuous Variables

Using the point symmetry of the integrand,

\int_z e^{-\frac{1}{2} z^T \tilde{\gamma} z} z_i \, dz = 0

the first and second moments become

\langle x_i \rangle = \int_x P(x) \, x_i \, dx \equiv \frac{1}{\tilde{Z}} \int_z e^{-\frac{1}{2} z^T \tilde{\gamma} z} \Big( z_i + \sum_{j=1}^{L} (\tilde{\gamma}^{-1})_{ij} \beta_j \Big) dz = \sum_{j=1}^{L} (\tilde{\gamma}^{-1})_{ij} \beta_j

\langle x_i x_j \rangle = \int_x P(x) \, x_i x_j \, dx \equiv \frac{1}{\tilde{Z}} \int_z e^{-\frac{1}{2} z^T \tilde{\gamma} z} (z_i + \langle x_i \rangle)(z_j + \langle x_j \rangle) \, dz = \langle z_i z_j \rangle + \langle x_i \rangle \langle x_j \rangle

so that

C_{ij} = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle \equiv \langle z_i z_j \rangle


Maximum-Entropy Probability Model: Closed-Form Solution for Continuous Variables

The term \langle z_i z_j \rangle is solved using a spectral decomposition:

\langle z_i z_j \rangle = \frac{1}{\tilde{Z}} \sum_{l,n=1}^{L} (v_l)_i (v_n)_j \int_y \exp\Big( -\frac{1}{2} \sum_{k=1}^{L} \lambda_k y_k^2 \Big) y_l y_n \, dy = \sum_{k=1}^{L} \frac{1}{\lambda_k} (v_k)_i (v_k)_j \equiv (\tilde{\gamma}^{-1})_{ij}

With C_{ij} = (\tilde{\gamma}^{-1})_{ij}, the Lagrange multipliers \beta and \gamma are

\beta = C^{-1} \langle x \rangle, \qquad \gamma = -\frac{1}{2} \tilde{\gamma} = -\frac{1}{2} C^{-1}

and the real-valued maximum-entropy distribution is the multivariate Gaussian

P(x; \langle x \rangle, C) = (2\pi)^{-L/2} \det(C)^{-1/2} \exp\Big( -\frac{1}{2} (x - \langle x \rangle)^T C^{-1} (x - \langle x \rangle) \Big)


Maximum-Entropy Probability Model: Pair Interaction Strength

The pair interaction strength is evaluated by the already introduced partial correlation coefficient between x_i and x_j given the remaining variables \{x_r\}_{r \in \{1,\ldots,L\} \setminus \{i,j\}}:

\rho_{ij \cdot \{1,\ldots,L\} \setminus \{i,j\}} \equiv \frac{\gamma_{ij}}{\sqrt{\gamma_{ii}\,\gamma_{jj}}} = -\frac{(C^{-1})_{ij}}{\sqrt{(C^{-1})_{ii}\,(C^{-1})_{jj}}} \;\text{ for } i \neq j, \qquad \text{and } 1 \text{ for } i = j.

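A small NumPy sketch of this score, contrasted with the raw Pearson correlation on a synthetic chain-like system (an illustrative addition; the data-generating model is an assumption):

    import numpy as np

    def partial_correlations(X):
        """Partial correlation of every pair given all remaining variables,
        computed from the inverse of the empirical covariance matrix."""
        P = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix C^{-1}
        d = np.sqrt(np.diag(P))
        rho = -P / np.outer(d, d)
        np.fill_diagonal(rho, 1.0)
        return rho

    # Chain 0 -> 1 -> 2: variables 0 and 2 interact only indirectly through 1.
    rng = np.random.default_rng(3)
    x0 = rng.normal(size=2000)
    x1 = x0 + 0.5 * rng.normal(size=2000)
    x2 = x1 + 0.5 * rng.normal(size=2000)
    X = np.column_stack([x0, x1, x2])

    print(np.corrcoef(X, rowvar=False).round(2))     # Pearson: large (0, 2) entry
    print(partial_correlations(X).round(2))          # partial: (0, 2) entry close to zero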

Maximum-Entropy Probability Model: Mean-Field Approximation

The empirical covariance matrix is

C_{ij}(\sigma, \omega) = f_{ij}(\sigma, \omega) - f_i(\sigma) f_j(\omega)

Applying the closed-form solution for continuous variables to these categorical variables yields the so-called mean-field (MF) approximation:

\gamma^{MF}_{ij}(\sigma, \omega) = -\frac{1}{2} (C^{-1})_{ij}(\sigma, \omega) \;\Rightarrow\; e^{MF}_{ij}(\sigma, \omega) = -(C^{-1})_{ij}(\sigma, \omega)

The same solution has been obtained by using a perturbation ansatz to solve the q-state Potts model, termed (mean-field) Direct Coupling Analysis (DCA or mfDCA).

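A compact sketch of the mean-field coupling computation on a toy integer-encoded alignment (an illustrative addition: the uniform pseudocount, the dropped last state, and the toy data follow common mfDCA practice rather than anything specified on the slides):

    import numpy as np

    def mf_couplings(msa, q, pc_weight=0.5):
        """Mean-field DCA sketch: e_ij(sigma, omega) = -(C^{-1})_ij(sigma, omega).
        Frequencies are mixed with a uniform pseudocount (weight pc_weight) and
        the last state of every site is dropped so that C is invertible (gauge)."""
        M, L = msa.shape
        fi = np.zeros((L, q))
        fij = np.zeros((L, L, q, q))
        sites = np.arange(L)
        for seq in msa:                        # single-site and pair frequency counts
            fi[sites, seq] += 1.0 / M
            fij[sites[:, None], sites[None, :], seq[:, None], seq[None, :]] += 1.0 / M
        fi = (1 - pc_weight) * fi + pc_weight / q
        fij = (1 - pc_weight) * fij + pc_weight / q**2
        for i in range(L):                     # a site paired with itself is diagonal
            fij[i, i] = np.diag(fi[i])
        # Connected correlations C_ij(sigma, omega) = f_ij - f_i f_j, last state dropped.
        C = (fij - fi[:, None, :, None] * fi[None, :, None, :])[:, :, :q - 1, :q - 1]
        C = C.transpose(0, 2, 1, 3).reshape(L * (q - 1), L * (q - 1))
        e = -np.linalg.inv(C).reshape(L, q - 1, L, q - 1).transpose(0, 2, 1, 3)
        return e                               # e[i, j, sigma, omega]

    # Toy alignment: 8 sequences of length 4 over a 3-state alphabet.
    msa = np.array([[0, 1, 2, 0], [0, 1, 2, 1], [1, 2, 0, 1], [1, 2, 0, 0],
                    [2, 0, 1, 2], [2, 0, 1, 2], [0, 2, 1, 0], [1, 0, 2, 2]])
    print(mf_couplings(msa, q=3).shape)        # (4, 4, 2, 2)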

Maximum-Likelihood Inference: What is Maximum Likelihood?

• A well-known approach to estimating the parameters of a model is maximum-likelihood inference.

• The likelihood is a scalar measure of how likely the model parameters are, given the observed data, and the maximum-likelihood solution denotes the parameter set maximizing the likelihood function.


Maximum-Likelihood Inference: Likelihood Function

To use the method of maximum likelihood, one first specifies the joint density function for all observations. For an independent and identically distributed sample, this joint density function is

f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1 \mid \theta) \times f(x_2 \mid \theta) \times \cdots \times f(x_n \mid \theta)

Now let \theta be the function's variable, allowed to vary freely; this function is called the likelihood:

\mathcal{L}(\theta; x_1, \ldots, x_n) = f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)

The log-likelihood is

\ln \mathcal{L}(\theta; x_1, \ldots, x_n) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)

and the average log-likelihood is \ell = \frac{1}{n} \ln \mathcal{L}.


Maximum-Likelihood Inference

For a pairwise model with parameters h(\sigma) and e(\sigma, \omega), the likelihood function given observed data x^1, \ldots, x^M \in \Omega^L, assumed to be independent and identically distributed, is

l(h(\sigma), e(\sigma, \omega) \mid x^1, \ldots, x^M) = \prod_{m=1}^{M} P(x^m; h(\sigma), e(\sigma, \omega))

h^{ML}(\sigma), e^{ML}(\sigma, \omega) = \arg\max_{h(\sigma), e(\sigma, \omega)} l(h(\sigma), e(\sigma, \omega)) = \arg\min_{h(\sigma), e(\sigma, \omega)} -\ln l(h(\sigma), e(\sigma, \omega))

With the maximum-entropy distribution as the model distribution,

\ln l(h(\sigma), e(\sigma, \omega)) = \sum_{m=1}^{M} \ln P(x^m; h(\sigma), e(\sigma, \omega))
= -M \Big( \ln Z - \sum_{i=1}^{L} \sum_{\sigma} h_i(\sigma) f_i(\sigma) - \sum_{1 \le i < j \le L} \sum_{\sigma, \omega} e_{ij}(\sigma, \omega) f_{ij}(\sigma, \omega) \Big)


Maximum-Likelihood Inference

Taking the derivatives,

\partial_{h_i(\sigma)} \ln l = -M \Big[ \partial_{h_i(\sigma)} \ln Z \big|_{h(\sigma), e(\sigma, \omega)} - f_i(\sigma) \Big] = 0

\partial_{e_{ij}(\sigma, \omega)} \ln l = -M \Big[ \partial_{e_{ij}(\sigma, \omega)} \ln Z \big|_{h(\sigma), e(\sigma, \omega)} - f_{ij}(\sigma, \omega) \Big] = 0

Partial derivatives of the partition function:

\partial_{h_i(\sigma)} \ln Z \big|_{h(\sigma), e(\sigma, \omega)} = \frac{1}{Z} \partial_{h_i(\sigma)} Z \big|_{h(\sigma), e(\sigma, \omega)} = P_i(\sigma; h(\sigma), e(\sigma, \omega))

\partial_{e_{ij}(\sigma, \omega)} \ln Z \big|_{h(\sigma), e(\sigma, \omega)} = \frac{1}{Z} \partial_{e_{ij}(\sigma, \omega)} Z \big|_{h(\sigma), e(\sigma, \omega)} = P_{ij}(\sigma, \omega; h(\sigma), e(\sigma, \omega))


Maximum-Likelihood Inference

The maximizing parameters h^{ML}(\sigma) and e^{ML}(\sigma, \omega) are those matching the distribution's single and pair marginal probabilities to the empirical single and pair frequency counts:

P_i(\sigma; h^{ML}(\sigma), e^{ML}(\sigma, \omega)) = f_i(\sigma), \qquad P_{ij}(\sigma, \omega; h^{ML}(\sigma), e^{ML}(\sigma, \omega)) = f_{ij}(\sigma, \omega)

In other words, matching the moments of the pairwise maximum-entropy probability distribution to the given data is equivalent to maximum-likelihood fitting of an exponential family.


Maximum-Likelihood Inference: Pseudo-Likelihood Maximization

Besag introduced the pseudo-likelihood as an approximation to the likelihood function in which the global partition function is replaced by computationally tractable local estimates.

In this approach, the probability of the m-th observation, x^m, is approximated by the product of the conditional probabilities of x_r = x^m_r given the observations of the remaining variables x_{\setminus r} := (x_1, \ldots, x_{r-1}, x_{r+1}, \ldots, x_L)^T \in \Omega^{L-1}:

P(x^m; h(\sigma), e(\sigma, \omega)) \simeq \prod_{r=1}^{L} P(x_r = x^m_r \mid x_{\setminus r} = x^m_{\setminus r}; h(\sigma), e(\sigma, \omega))


Maximum-Likelihood Inference: Pseudo-Likelihood Maximization

Each factor has the following analytical form:

P(x_r = x^m_r \mid x_{\setminus r} = x^m_{\setminus r}; h(\sigma), e(\sigma, \omega)) = \frac{\exp\Big( h_r(x^m_r) + \sum_{j \neq r} e_{rj}(x^m_r, x^m_j) \Big)}{\sum_{\sigma} \exp\Big( h_r(\sigma) + \sum_{j \neq r} e_{rj}(\sigma, x^m_j) \Big)}

By this approximation, the log-likelihood becomes the pseudo-log-likelihood:

\ln l_{PL}(h(\sigma), e(\sigma, \omega)) := \sum_{m=1}^{M} \sum_{r=1}^{L} \ln P(x_r = x^m_r \mid x_{\setminus r} = x^m_{\setminus r}; h(\sigma), e(\sigma, \omega))

An \ell_2-regularizer is added to select for small absolute values of the inferred parameters:

h^{PLM}(\sigma), e^{PLM}(\sigma, \omega) = \arg\min_{h(\sigma), e(\sigma, \omega)} \Big( -\ln l_{PL}(h(\sigma), e(\sigma, \omega)) + \lambda_h \|h(\sigma)\|_2^2 + \lambda_e \|e(\sigma, \omega)\|_2^2 \Big)
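A minimal sketch of one such conditional factor as a softmax over the q states (an illustrative addition; the array shapes h[i, sigma] and e[i, j, sigma, omega] and the random test parameters are assumptions):

    import numpy as np

    def conditional(r, x, h, e):
        """P(x_r = sigma | all other x_j): a softmax over
        h_r(sigma) + sum_{j != r} e_rj(sigma, x_j)."""
        logits = h[r].copy()
        for j in range(len(x)):
            if j != r:
                logits += e[r, j, :, x[j]]
        p = np.exp(logits - logits.max())      # subtract the max for numerical stability
        return p / p.sum()

    rng = np.random.default_rng(4)
    L, q = 5, 3
    h = rng.normal(scale=0.1, size=(L, q))
    e = rng.normal(scale=0.1, size=(L, L, q, q))
    e = (e + e.transpose(1, 0, 3, 2)) / 2      # enforce e_ij(a, b) = e_ji(b, a)
    x = rng.integers(q, size=L)

    p = conditional(2, x, h, e)
    print(p, p.sum())                          # a normalized distribution over the q states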


Scoring Functions for Pairwise Interaction Strengths

• Direct information: DI_{ij} = \sum_{\sigma, \omega \in \Omega} P^{dir}_{ij}(\sigma, \omega) \ln \frac{P^{dir}_{ij}(\sigma, \omega)}{f_i(\sigma) f_j(\omega)}

• Frobenius norm: \|e_{ij}\|_F = \Big( \sum_{\sigma, \omega \in \Omega} e_{ij}(\sigma, \omega)^2 \Big)^{1/2}

• Average product correction: APC\text{-}FN_{ij} = \|e_{ij}\|_F - \frac{\|e_{i\cdot}\|_F \, \|e_{\cdot j}\|_F}{\|e_{\cdot\cdot}\|_F}

  where

  \|e_{i\cdot}\|_F := \frac{1}{L - 1} \sum_{j=1}^{L} \|e_{ij}\|_F, \qquad
  \|e_{\cdot j}\|_F := \frac{1}{L - 1} \sum_{i=1}^{L} \|e_{ij}\|_F, \qquad
  \|e_{\cdot\cdot}\|_F := \frac{1}{L(L - 1)} \sum_{i,j=1}^{L} \|e_{ij}\|_F

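A small sketch of the Frobenius-norm score with the average product correction, applied to a coupling tensor e[i, j, sigma, omega] such as the one produced by the mean-field sketch earlier (an illustrative addition; the random toy couplings are assumptions, and the zero-sum gauge shift usually applied before scoring is omitted for brevity):

    import numpy as np

    def apc_fn_scores(e):
        """Frobenius norms ||e_ij||_F with the average product correction (APC)."""
        L = e.shape[0]
        fn = np.sqrt(np.sum(e ** 2, axis=(2, 3)))     # ||e_ij||_F for every pair (i, j)
        np.fill_diagonal(fn, 0.0)                     # no self-interactions
        row_mean = fn.sum(axis=1) / (L - 1)           # ||e_i.||_F
        col_mean = fn.sum(axis=0) / (L - 1)           # ||e_.j||_F
        total_mean = fn.sum() / (L * (L - 1))         # ||e_..||_F
        return fn - np.outer(row_mean, col_mean) / total_mean

    rng = np.random.default_rng(5)
    e = rng.normal(size=(4, 4, 3, 3))
    e = (e + e.transpose(1, 0, 3, 2)) / 2             # symmetry e_ij(a, b) = e_ji(b, a)
    print(apc_fn_scores(e).round(2))                  # larger scores suggest direct couplings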

Summary
