

Introduction To DCA

K.K M

[email protected]

December 12, 2016


Contents

• Distance And Correlation
• Maximum-Entropy Probability Model
• Maximum-Likelihood Inference
• Pairwise Interaction Scoring Function
• Summary


Distance And Correlation: Definition of Distance

A statistical distance quantifies the distance between two statistical objects, which can be two random variables, two probability distributions or samples, or an individual sample point and a population or a wider sample of points.

• d(x, x) = 0   (identity of indiscernibles)
• d(x, y) ≥ 0   (non-negativity)
• d(x, y) = d(y, x)   (symmetry)
• d(x, k) + d(k, y) ≥ d(x, y)   (triangle inequality)


Distance And Correlation: Different Distances

• Minkowski distance
• Euclidean distance
• Manhattan distance
• Chebyshev distance
• Mahalanobis distance
• Cosine similarity
• Pearson correlation
• Hamming distance
• Jaccard similarity
• Levenshtein distance
• DTW distance
• KL-divergence


Distance And Correlation: Minkowski Distance

Let P = (x_1, x_2, \ldots, x_n) and Q = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^n.

• Minkowski distance: \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
• Euclidean distance: p = 2
• Manhattan distance: p = 1
• Chebyshev distance: p = \infty, since
  \lim_{p \to \infty} \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} = \max_{i=1,\ldots,n} |x_i - y_i|
• z-transform: (x_i, y_i) \mapsto \left( \frac{x_i - \mu_x}{\sigma_x}, \frac{y_i - \mu_y}{\sigma_y} \right), assuming x_i and y_i are independent

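To make the special cases concrete, here is a minimal NumPy sketch (an illustrative addition, not part of the original slides; the example vectors are arbitrary):

    import numpy as np

    def minkowski(x, y, p):
        """Minkowski distance (sum_i |x_i - y_i|^p)^(1/p)."""
        return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 0.0, 3.0])

    print(minkowski(x, y, 1))        # Manhattan distance: 5.0
    print(minkowski(x, y, 2))        # Euclidean distance: sqrt(13)
    print(np.max(np.abs(x - y)))     # Chebyshev distance, the p -> infinity limit: 3.0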

Distance And Correlation: Mahalanobis Distance

• Mahalanobis distance: d_M(x, \mu) = \sqrt{(x - \mu)^T C^{-1} (x - \mu)}
• Cholesky factorization of the covariance matrix: C = L L^T
• Transformation x \mapsto x' = L^{-1}(x - \mu), after which the Mahalanobis distance is the ordinary Euclidean distance in the transformed (whitened) coordinates

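A minimal NumPy sketch of this whitening view (an illustrative addition; the synthetic 2-D data and the test point are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[2.0, 1.2], [1.2, 1.0]], size=500)

    mu = X.mean(axis=0)
    C = np.cov(X, rowvar=False)        # empirical covariance matrix
    L = np.linalg.cholesky(C)          # C = L L^T
    x = np.array([1.0, 2.0])
    z = np.linalg.solve(L, x - mu)     # whitening transformation x -> x' = L^{-1}(x - mu)

    print(np.linalg.norm(z))                                  # Euclidean norm after whitening
    print(np.sqrt((x - mu) @ np.linalg.inv(C) @ (x - mu)))    # Mahalanobis distance; same value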

Distance And Correlation: Example for the Mahalanobis Distance

[Figure: Euclidean distance]   [Figure: Mahalanobis distance]


Distance And Correlation: Pearson Correlation

• Inner product: \mathrm{Inner}(x, y) = \langle x, y \rangle = \sum_i x_i y_i

• Cosine similarity: \mathrm{CosSim}(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}} = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}

• Pearson correlation:

\mathrm{Corr}(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} = \frac{\langle x - \bar{x},\, y - \bar{y} \rangle}{\|x - \bar{x}\|\,\|y - \bar{y}\|} = \mathrm{CosSim}(x - \bar{x},\, y - \bar{y})

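A quick NumPy check of this "Pearson correlation = cosine similarity of the centered vectors" identity (an illustrative addition with synthetic data):

    import numpy as np

    def cos_sim(x, y):
        """Cosine similarity <x, y> / (||x|| ||y||)."""
        return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    y = 0.5 * x + rng.normal(size=100)

    print(np.corrcoef(x, y)[0, 1])                 # Pearson correlation
    print(cos_sim(x - x.mean(), y - y.mean()))     # same value up to floating-point error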

Distance And Correlation: Different Values for the Pearson Correlation

[Figure: examples of data sets with different values of the Pearson correlation]


Distance And Correlation: Limits of Pearson Correlation

Q: Why not use the Pearson correlation to score pairwise associations?

A: The Pearson correlation is a misleading measure of direct dependence: it only reflects the association between two variables while ignoring the influence of the remaining ones.


Distance And Correlation: Partial Correlation

For M given samples in L measured variables,

x^1 = (x^1_1, \ldots, x^1_L)^T, \;\ldots,\; x^M = (x^M_1, \ldots, x^M_L)^T \in \mathbb{R}^L

the Pearson correlation coefficient is

r_{ij} = \frac{C_{ij}}{\sqrt{C_{ii} C_{jj}}}

where C_{ij} = \frac{1}{M} \sum_{m=1}^{M} (x^m_i - \bar{x}_i)(x^m_j - \bar{x}_j) is the empirical covariance matrix. Scaling each variable to zero mean and unit standard deviation,

x_i \mapsto \frac{x_i - \bar{x}_i}{\sqrt{C_{ii}}}

simplifies the correlation coefficient to r_{ij} \equiv \overline{x_i x_j}.


Distance And Correlation: Partial Correlation of a Three-Variable System

The partial correlation between X and Y given a set of n controlling variables Z = \{Z_1, Z_2, \ldots, Z_n\}, written r_{XY \cdot Z}, is

r_{XY \cdot Z} = \frac{N \sum_{i=1}^{N} r_{X,i} r_{Y,i} - \sum_{i=1}^{N} r_{X,i} \sum_{i=1}^{N} r_{Y,i}}{\sqrt{N \sum_{i=1}^{N} r_{X,i}^2 - \big(\sum_{i=1}^{N} r_{X,i}\big)^2}\,\sqrt{N \sum_{i=1}^{N} r_{Y,i}^2 - \big(\sum_{i=1}^{N} r_{Y,i}\big)^2}}

where r_{X,i} and r_{Y,i} are the residuals of the i-th of N observations after regressing X and Y on Z. In particular, when Z is a single variable, the partial correlation of a three-variable system between x_A and x_B given x_C is

r_{AB \cdot C} = \frac{r_{AB} - r_{AC} r_{BC}}{\sqrt{1 - r_{AC}^2}\,\sqrt{1 - r_{BC}^2}} \equiv -\frac{(C^{-1})_{AB}}{\sqrt{(C^{-1})_{AA} (C^{-1})_{BB}}}

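A small NumPy check of the final identity on three synthetic variables (an illustrative addition; the data-generating model is an assumption):

    import numpy as np

    rng = np.random.default_rng(2)
    z = rng.normal(size=1000)
    a = z + 0.3 * rng.normal(size=1000)
    b = z + 0.3 * rng.normal(size=1000)
    X = np.column_stack([a, b, z])               # columns: A, B, C

    R = np.corrcoef(X, rowvar=False)             # pairwise Pearson correlations
    r_ab, r_ac, r_bc = R[0, 1], R[0, 2], R[1, 2]
    recursive = (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

    P = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix C^{-1}
    from_precision = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

    print(recursive, from_precision)             # the two expressions agree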

Distance And Correlation: Pearson vs. Partial Correlation

Reaction System Reconstruction

[Figure: reconstruction of a reaction system, comparing Pearson correlation with partial correlation]


Maximum-Entropy Probability Model: Entropy

The thermodynamic definition of entropy was developed in the early 1850s by Rudolf Clausius:

\Delta S = \int \frac{\delta Q_{\mathrm{rev}}}{T}

In 1948, Shannon defined the entropy H of a discrete random variable X with possible values x_1, \ldots, x_n and probability mass function p(x) as

H(X) = -\sum_{x} p(x) \log p(x)

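For concreteness, a tiny sketch of Shannon's formula (an illustrative addition; entropies are reported in bits via log base 2):

    import numpy as np

    def shannon_entropy(p):
        """H(X) = -sum_x p(x) log2 p(x), skipping zero-probability outcomes."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    print(shannon_entropy([0.5, 0.5]))      # 1 bit: fair coin
    print(shannon_entropy([0.25] * 4))      # 2 bits: uniform over four outcomes
    print(shannon_entropy([0.9, 0.1]))      # about 0.47 bits: biased coin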

Maximum-Entropy Probability Model: Joint & Conditional Entropy

• Joint entropy: H(X, Y)
• Conditional entropy: H(Y | X)

H(X, Y) - H(X) = -\sum_{x,y} p(x, y) \log p(x, y) + \sum_{x} p(x) \log p(x)
               = -\sum_{x,y} p(x, y) \log p(x, y) + \sum_{x} \Big( \sum_{y} p(x, y) \Big) \log p(x)
               = -\sum_{x,y} p(x, y) \log p(x, y) + \sum_{x,y} p(x, y) \log p(x)
               = -\sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)}
               = -\sum_{x,y} p(x, y) \log p(y | x)
               = H(Y | X)


Maximum-Entropy Probability Model: Relative Entropy & Mutual Information

• Relative entropy (KL-divergence): D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}
  • D(p \| q) \neq D(q \| p) in general (not symmetric)
  • D(p \| q) \geq 0

• Mutual information:

I(X, Y) = D\big( p(x, y) \,\|\, p(x) p(y) \big) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}

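A small numerical illustration of both quantities (an illustrative addition; the toy joint distribution is an assumption):

    import numpy as np

    def kl(p, q):
        """Relative entropy D(p || q) = sum_x p(x) log(p(x) / q(x))."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        mask = p > 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))

    # Joint distribution of two binary variables (rows: X, columns: Y).
    p_xy = np.array([[0.3, 0.2],
                     [0.1, 0.4]])
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)

    print(kl([0.5, 0.5], [0.9, 0.1]), kl([0.9, 0.1], [0.5, 0.5]))   # KL is not symmetric
    print(kl(p_xy.ravel(), np.outer(p_x, p_y).ravel()))             # I(X, Y) as a KL-divergence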

Maximum-Entropy Probability Model: Mutual Information

H(Y) - I(X, Y) = -\sum_{y} p(y) \log p(y) - \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}
               = -\sum_{y} \Big( \sum_{x} p(x, y) \Big) \log p(y) - \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}
               = -\sum_{x,y} p(x, y) \log p(y) - \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}
               = -\sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)}
               = H(Y | X)

H(X, Y) - H(X) = H(Y | X) ⇒ I(X, Y) = H(X) + H(Y) - H(X, Y)


Maximum-Entropy Probability Model: MI & DI

• Mutual information: MI_{ij} = \sum_{\sigma, \omega} f_{ij}(\sigma, \omega) \ln \left( \frac{f_{ij}(\sigma, \omega)}{f_i(\sigma) f_j(\omega)} \right)

• Direct information: DI_{ij} = \sum_{\sigma, \omega} P^{dir}_{ij}(\sigma, \omega) \ln \left( \frac{P^{dir}_{ij}(\sigma, \omega)}{f_i(\sigma) f_j(\omega)} \right)

  where P^{dir}_{ij}(\sigma, \omega) = \frac{1}{z_{ij}} \exp\big( e_{ij}(\sigma, \omega) + h_i(\sigma) + h_j(\omega) \big)

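A minimal sketch of the MI part, computed from empirical frequency counts of two alignment columns (the toy columns, the reduced ACGT alphabet, and the tiny pseudocount are illustrative assumptions, not from the slides; DI additionally requires the inferred couplings e_ij and fields h_i):

    import numpy as np

    def mutual_information(col_i, col_j, alphabet, pseudocount=1e-9):
        """MI_ij from single-site and pair frequency counts of two columns."""
        q = len(alphabet)
        idx = {a: k for k, a in enumerate(alphabet)}
        fij = np.full((q, q), pseudocount)
        for a, b in zip(col_i, col_j):
            fij[idx[a], idx[b]] += 1.0
        fij /= fij.sum()
        fi, fj = fij.sum(axis=1), fij.sum(axis=0)
        return np.sum(fij * np.log(fij / np.outer(fi, fj)))

    col_i = list("AACCGGTTAA")
    col_j = list("TTGGCCAATT")      # perfectly co-varying with col_i
    print(mutual_information(col_i, col_j, alphabet="ACGT"))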

Maximum-Entropy Probability Model: Principle of Maximum Entropy

• The principle was first expounded by E. T. Jaynes in two papers in 1957, where he emphasized a natural correspondence between statistical mechanics and information theory.

• The principle of maximum entropy states that, subject to precisely stated prior data, the probability distribution which best represents the current state of knowledge is the one with the largest entropy.

• Maximize S = -\int_x P(x) \ln P(x) \, dx


Maximum-Entropy Probability Model: Example

• Suppose we want to estimate a probability distribution p(a, b), where a ∈ {x, y} and b ∈ {0, 1}.

• Furthermore, the only fact known about p is that p(x, 0) + p(y, 0) = 0.6.

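A worked completion of this example (the numbers are implied by the constraint rather than printed on the slide): spreading probability as evenly as the constraint allows gives p(x, 0) = p(y, 0) = 0.3 and p(x, 1) = p(y, 1) = 0.2; any other allocation satisfying p(x, 0) + p(y, 0) = 0.6 has strictly lower entropy, which is exactly the "most uncertain" assignment illustrated on the next slide.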

Maximum-Entropy Probability Model: Unbiased Principle

[Figure: one way to satisfy the constraints]
[Figure: the most uncertain way to satisfy the constraints]


Maximum-Entropy Probability Model

• Continuous random variables: x = (x_1, \ldots, x_L)^T \in \mathbb{R}^L

• Constraints:
  • \int_x P(x) \, dx = 1
  • \langle x_i \rangle = \int_x P(x) \, x_i \, dx = \frac{1}{M} \sum_{m=1}^{M} x^m_i = \bar{x}_i
  • \langle x_i x_j \rangle = \int_x P(x) \, x_i x_j \, dx = \frac{1}{M} \sum_{m=1}^{M} x^m_i x^m_j = \overline{x_i x_j}

• Maximize: S = -\int_x P(x) \ln P(x) \, dx


Maximum-Entropy Probability Model: Lagrange Multipliers Method

Convert the constrained optimization problem into an unconstrained one by means of the Lagrangian \mathcal{L}:

\mathcal{L} = S + \alpha (\langle 1 \rangle - 1) + \sum_{i=1}^{L} \beta_i (\langle x_i \rangle - \bar{x}_i) + \sum_{i,j=1}^{L} \gamma_{ij} (\langle x_i x_j \rangle - \overline{x_i x_j})

\frac{\delta \mathcal{L}}{\delta P(x)} = 0 \;\Rightarrow\; -\ln P(x) - 1 + \alpha + \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j = 0

Pairwise maximum-entropy probability distribution:

P(x; \beta, \gamma) = \exp\Big( -1 + \alpha + \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j \Big) = \frac{1}{Z} e^{-H(x; \beta, \gamma)}


Maximum-Entropy Probability Model: Lagrange Multipliers Method

• Partition function as normalization constant:

Z(\beta, \gamma) := \int_x \exp\Big( \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j \Big) dx \equiv \exp(1 - \alpha)

• Hamiltonian:

H(x) := -\sum_{i=1}^{L} \beta_i x_i - \sum_{i,j=1}^{L} \gamma_{ij} x_i x_j


Maximum-Entropy Probability Model: Categorical Random Variables

• Jointly distributed categorical variables x = (x_1, \ldots, x_L)^T \in \Omega^L, where each x_i is defined on the finite set \Omega = \{\sigma_1, \ldots, \sigma_q\}.

• In the concrete example of modeling protein co-evolution, this set contains the 20 amino acids, represented by a 20-letter alphabet, plus one gap element:

\Omega = \{A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, -\}, \qquad q = 21


Maximum-Entropy Probability Model: Binary Embedding of Amino Acid Sequence

[Figure: binary embedding of an amino acid sequence]


Maximum-Entropy Probability Model: Pairwise MEP Distribution on Categorical Variables

• Single and pairwise marginal probabilities:

\langle x_i(\sigma) \rangle = \sum_{x(\sigma)} P(x(\sigma)) \, x_i(\sigma) = \sum_{x} P(x_i = \sigma) = P_i(\sigma)

\langle x_i(\sigma) x_j(\omega) \rangle = \sum_{x(\sigma)} P(x(\sigma)) \, x_i(\sigma) x_j(\omega) = \sum_{x} P(x_i = \sigma, x_j = \omega) = P_{ij}(\sigma, \omega)

• Single-site and pair frequency counts:

\overline{x_i(\sigma)} = \frac{1}{M} \sum_{m=1}^{M} x^m_i(\sigma) = f_i(\sigma)

\overline{x_i(\sigma) x_j(\omega)} = \frac{1}{M} \sum_{m=1}^{M} x^m_i(\sigma) \, x^m_j(\omega) = f_{ij}(\sigma, \omega)


Maximum-Entropy Probability Model: Pairwise MEP Distribution on Categorical Variables

• Constraints:

\sum_{x} P(x) = \sum_{x(\sigma)} P(x(\sigma)) = 1

P_i(\sigma) = f_i(\sigma), \qquad P_{ij}(\sigma, \omega) = f_{ij}(\sigma, \omega)

• Maximize:

S = -\sum_{x} P(x) \ln P(x) = -\sum_{x(\sigma)} P(x(\sigma)) \ln P(x(\sigma))

• Lagrangian:

\mathcal{L} = S + \alpha (\langle 1 \rangle - 1) + \sum_{i=1}^{L} \sum_{\sigma \in \Omega} \beta_i(\sigma) \big( P_i(\sigma) - f_i(\sigma) \big) + \sum_{i,j=1}^{L} \sum_{\sigma, \omega \in \Omega} \gamma_{ij}(\sigma, \omega) \big( P_{ij}(\sigma, \omega) - f_{ij}(\sigma, \omega) \big)


Maximum-Entropy Probability Model: Pairwise MEP Distribution on Categorical Variables

Distribution:

P(x(\sigma); \beta, \gamma) = \frac{1}{Z} \exp\Big( \sum_{i=1}^{L} \sum_{\sigma \in \Omega} \beta_i(\sigma) x_i(\sigma) + \sum_{i,j=1}^{L} \sum_{\sigma, \omega \in \Omega} \gamma_{ij}(\sigma, \omega) x_i(\sigma) x_j(\omega) \Big)

Let

h_i(\sigma) := \beta_i(\sigma) + \gamma_{ii}(\sigma, \sigma), \qquad e_{ij}(\sigma, \omega) := 2 \gamma_{ij}(\sigma, \omega)

Then

P(x_1, \ldots, x_L) \equiv \frac{1}{Z} \exp\Big( \sum_{i=1}^{L} h_i(x_i) + \sum_{1 \le i < j \le L} e_{ij}(x_i, x_j) \Big)


Maximum-Entropy Probability Model: Gauge Fixing

There are L(q - 1) + \frac{L(L-1)}{2} (q - 1)^2 independent constraints compared to Lq + \frac{L(L-1)}{2} q^2 free parameters to be estimated, because

1 = \sum_{\sigma \in \Omega} P_i(\sigma), \qquad i = 1, \ldots, L

P_i(\sigma) = \sum_{\omega \in \Omega} P_{ij}(\sigma, \omega), \qquad i, j = 1, \ldots, L

Two gauge-fixing choices:

• gap to zero: e_{ij}(\sigma_q, \cdot) = e_{ij}(\cdot, \sigma_q) = 0, \quad h_i(\sigma_q) = 0
• zero-sum gauge: \sum_{\sigma} e_{ij}(\sigma, \omega) = \sum_{\sigma} e_{ij}(\omega', \sigma) = 0 for all \omega, \omega', \quad \sum_{\sigma} h_i(\sigma) = 0


Maximum-Entropy Probability Model: Closed-Form Solution for Continuous Variables

Rewrite the exponent of the pairwise maximum-entropy probability distribution:

P(x; \beta, \gamma) = \frac{1}{Z} \exp\Big( \beta^T x - \frac{1}{2} x^T \tilde{\gamma} x \Big)
                    = \frac{1}{Z} \exp\Big( \frac{1}{2} \beta^T \tilde{\gamma}^{-1} \beta - \frac{1}{2} (x - \tilde{\gamma}^{-1} \beta)^T \tilde{\gamma} (x - \tilde{\gamma}^{-1} \beta) \Big)

where \tilde{\gamma} := -2\gamma. Let z = (z_1, \ldots, z_L)^T := x - \tilde{\gamma}^{-1} \beta; then

P(x) = \frac{1}{\tilde{Z}} \exp\Big( -\frac{1}{2} (x - \tilde{\gamma}^{-1} \beta)^T \tilde{\gamma} (x - \tilde{\gamma}^{-1} \beta) \Big) \equiv \frac{1}{\tilde{Z}} e^{-\frac{1}{2} z^T \tilde{\gamma} z}

with the normalization condition

1 = \int_x P(x) \, dx \equiv \frac{1}{\tilde{Z}} \int_z e^{-\frac{1}{2} z^T \tilde{\gamma} z} \, dz


Maximum-Entropy Probability Model: Closed-Form Solution for Continuous Variables

Using the point symmetry of the integrand,

\int_z e^{-\frac{1}{2} z^T \tilde{\gamma} z} z_i \, dz = 0

the first and second moments become

\langle x_i \rangle = \int_x P(x) \, x_i \, dx \equiv \frac{1}{\tilde{Z}} \int_z e^{-\frac{1}{2} z^T \tilde{\gamma} z} \Big( z_i + \sum_{j=1}^{L} (\tilde{\gamma}^{-1})_{ij} \beta_j \Big) dz = \sum_{j=1}^{L} (\tilde{\gamma}^{-1})_{ij} \beta_j

\langle x_i x_j \rangle = \int_x P(x) \, x_i x_j \, dx \equiv \frac{1}{\tilde{Z}} \int_z e^{-\frac{1}{2} z^T \tilde{\gamma} z} (z_i + \langle x_i \rangle)(z_j + \langle x_j \rangle) \, dz = \langle z_i z_j \rangle + \langle x_i \rangle \langle x_j \rangle

so that

C_{ij} = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle \equiv \langle z_i z_j \rangle


Maximum-Entropy Probability Model: Closed-Form Solution for Continuous Variables

The term \langle z_i z_j \rangle is solved using a spectral decomposition:

\langle z_i z_j \rangle = \frac{1}{\tilde{Z}} \sum_{l,n=1}^{L} (v_l)_i (v_n)_j \int_y \exp\Big( -\frac{1}{2} \sum_{k=1}^{L} \lambda_k y_k^2 \Big) y_l y_n \, dy = \sum_{k=1}^{L} \frac{1}{\lambda_k} (v_k)_i (v_k)_j \equiv (\tilde{\gamma}^{-1})_{ij}

With C_{ij} = (\tilde{\gamma}^{-1})_{ij}, the Lagrange multipliers \beta and \gamma are

\beta = C^{-1} \langle x \rangle, \qquad \gamma = -\frac{1}{2} \tilde{\gamma} = -\frac{1}{2} C^{-1}

and the real-valued maximum-entropy distribution is the multivariate Gaussian

P(x; \langle x \rangle, C) = (2\pi)^{-L/2} \det(C)^{-1/2} \exp\Big( -\frac{1}{2} (x - \langle x \rangle)^T C^{-1} (x - \langle x \rangle) \Big)


Maximum-Entropy Probability Model: Pair Interaction Strength

The pair interaction strength is evaluated by the already introduced partial correlation coefficient between x_i and x_j given the remaining variables \{x_r\}_{r \in \{1,\ldots,L\} \setminus \{i,j\}}:

\rho_{ij \cdot \{1,\ldots,L\} \setminus \{i,j\}} \equiv \frac{\gamma_{ij}}{\sqrt{\gamma_{ii}\,\gamma_{jj}}} = -\frac{(C^{-1})_{ij}}{\sqrt{(C^{-1})_{ii}\,(C^{-1})_{jj}}} \;\text{ for } i \neq j, \qquad \text{and } 1 \text{ for } i = j.

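A small NumPy sketch of this score, contrasted with the raw Pearson correlation on a synthetic chain-like system (an illustrative addition; the data-generating model is an assumption):

    import numpy as np

    def partial_correlations(X):
        """Partial correlation of every pair given all remaining variables,
        computed from the inverse of the empirical covariance matrix."""
        P = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix C^{-1}
        d = np.sqrt(np.diag(P))
        rho = -P / np.outer(d, d)
        np.fill_diagonal(rho, 1.0)
        return rho

    # Chain 0 -> 1 -> 2: variables 0 and 2 interact only indirectly through 1.
    rng = np.random.default_rng(3)
    x0 = rng.normal(size=2000)
    x1 = x0 + 0.5 * rng.normal(size=2000)
    x2 = x1 + 0.5 * rng.normal(size=2000)
    X = np.column_stack([x0, x1, x2])

    print(np.corrcoef(X, rowvar=False).round(2))     # Pearson: large (0, 2) entry
    print(partial_correlations(X).round(2))          # partial: (0, 2) entry close to zero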

Maximum-Entropy Probability Model: Mean-Field Approximation

The empirical covariance matrix is

C_{ij}(\sigma, \omega) = f_{ij}(\sigma, \omega) - f_i(\sigma) f_j(\omega)

Applying the closed-form solution for continuous variables to these categorical variables yields the so-called mean-field (MF) approximation:

\gamma^{MF}_{ij}(\sigma, \omega) = -\frac{1}{2} (C^{-1})_{ij}(\sigma, \omega) \;\Rightarrow\; e^{MF}_{ij}(\sigma, \omega) = -(C^{-1})_{ij}(\sigma, \omega)

The same solution has been obtained by using a perturbation ansatz to solve the q-state Potts model, termed (mean-field) Direct Coupling Analysis (DCA or mfDCA).

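A compact sketch of the mean-field coupling computation on a toy integer-encoded alignment (an illustrative addition: the uniform pseudocount, the dropped last state, and the toy data follow common mfDCA practice rather than anything specified on the slides):

    import numpy as np

    def mf_couplings(msa, q, pc_weight=0.5):
        """Mean-field DCA sketch: e_ij(sigma, omega) = -(C^{-1})_ij(sigma, omega).
        Frequencies are mixed with a uniform pseudocount (weight pc_weight) and
        the last state of every site is dropped so that C is invertible (gauge)."""
        M, L = msa.shape
        fi = np.zeros((L, q))
        fij = np.zeros((L, L, q, q))
        sites = np.arange(L)
        for seq in msa:                        # single-site and pair frequency counts
            fi[sites, seq] += 1.0 / M
            fij[sites[:, None], sites[None, :], seq[:, None], seq[None, :]] += 1.0 / M
        fi = (1 - pc_weight) * fi + pc_weight / q
        fij = (1 - pc_weight) * fij + pc_weight / q**2
        for i in range(L):                     # a site paired with itself is diagonal
            fij[i, i] = np.diag(fi[i])
        # Connected correlations C_ij(sigma, omega) = f_ij - f_i f_j, last state dropped.
        C = (fij - fi[:, None, :, None] * fi[None, :, None, :])[:, :, :q - 1, :q - 1]
        C = C.transpose(0, 2, 1, 3).reshape(L * (q - 1), L * (q - 1))
        e = -np.linalg.inv(C).reshape(L, q - 1, L, q - 1).transpose(0, 2, 1, 3)
        return e                               # e[i, j, sigma, omega]

    # Toy alignment: 8 sequences of length 4 over a 3-state alphabet.
    msa = np.array([[0, 1, 2, 0], [0, 1, 2, 1], [1, 2, 0, 1], [1, 2, 0, 0],
                    [2, 0, 1, 2], [2, 0, 1, 2], [0, 2, 1, 0], [1, 0, 2, 2]])
    print(mf_couplings(msa, q=3).shape)        # (4, 4, 2, 2)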

Maximum-Likelihood Inference: What is Maximum Likelihood?

• A well-known approach to estimating the parameters of a model is maximum-likelihood inference.

• The likelihood is a scalar measure of how likely the model parameters are, given the observed data, and the maximum-likelihood solution denotes the parameter set maximizing the likelihood function.


Maximum-Likelihood Inference: Likelihood Function

To use the method of maximum likelihood, one first specifies the joint density function for all observations. For an independent and identically distributed sample, this joint density function is

f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1 \mid \theta) \times f(x_2 \mid \theta) \times \cdots \times f(x_n \mid \theta)

Now let \theta be the function's variable, allowed to vary freely; this function is called the likelihood:

\mathcal{L}(\theta; x_1, \ldots, x_n) = f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)

The log-likelihood is

\ln \mathcal{L}(\theta; x_1, \ldots, x_n) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)

and the average log-likelihood is \ell = \frac{1}{n} \ln \mathcal{L}.


Maximum-Likelihood Inference

For a pairwise model with parameters h(\sigma) and e(\sigma, \omega), the likelihood function given observed data x^1, \ldots, x^M \in \Omega^L, assumed to be independent and identically distributed, is

l(h(\sigma), e(\sigma, \omega) \mid x^1, \ldots, x^M) = \prod_{m=1}^{M} P(x^m; h(\sigma), e(\sigma, \omega))

h^{ML}(\sigma), e^{ML}(\sigma, \omega) = \arg\max_{h(\sigma), e(\sigma, \omega)} l(h(\sigma), e(\sigma, \omega)) = \arg\min_{h(\sigma), e(\sigma, \omega)} -\ln l(h(\sigma), e(\sigma, \omega))

With the maximum-entropy distribution as the model distribution,

\ln l(h(\sigma), e(\sigma, \omega)) = \sum_{m=1}^{M} \ln P(x^m; h(\sigma), e(\sigma, \omega))
= -M \Big( \ln Z - \sum_{i=1}^{L} \sum_{\sigma} h_i(\sigma) f_i(\sigma) - \sum_{1 \le i < j \le L} \sum_{\sigma, \omega} e_{ij}(\sigma, \omega) f_{ij}(\sigma, \omega) \Big)


Maximum-Likelihood Inference

Taking the derivatives,

\partial_{h_i(\sigma)} \ln l = -M \Big[ \partial_{h_i(\sigma)} \ln Z \big|_{h(\sigma), e(\sigma, \omega)} - f_i(\sigma) \Big] = 0

\partial_{e_{ij}(\sigma, \omega)} \ln l = -M \Big[ \partial_{e_{ij}(\sigma, \omega)} \ln Z \big|_{h(\sigma), e(\sigma, \omega)} - f_{ij}(\sigma, \omega) \Big] = 0

Partial derivatives of the partition function:

\partial_{h_i(\sigma)} \ln Z \big|_{h(\sigma), e(\sigma, \omega)} = \frac{1}{Z} \partial_{h_i(\sigma)} Z \big|_{h(\sigma), e(\sigma, \omega)} = P_i(\sigma; h(\sigma), e(\sigma, \omega))

\partial_{e_{ij}(\sigma, \omega)} \ln Z \big|_{h(\sigma), e(\sigma, \omega)} = \frac{1}{Z} \partial_{e_{ij}(\sigma, \omega)} Z \big|_{h(\sigma), e(\sigma, \omega)} = P_{ij}(\sigma, \omega; h(\sigma), e(\sigma, \omega))


Maximum-Likelihood Inference

The maximizing parameters h^{ML}(\sigma) and e^{ML}(\sigma, \omega) are those matching the distribution's single and pair marginal probabilities to the empirical single and pair frequency counts:

P_i(\sigma; h^{ML}(\sigma), e^{ML}(\sigma, \omega)) = f_i(\sigma), \qquad P_{ij}(\sigma, \omega; h^{ML}(\sigma), e^{ML}(\sigma, \omega)) = f_{ij}(\sigma, \omega)

In other words, matching the moments of the pairwise maximum-entropy probability distribution to the given data is equivalent to maximum-likelihood fitting of an exponential family.


Maximum-Likelihood Inference: Pseudo-Likelihood Maximization

Besag introduced the pseudo-likelihood as an approximation to the likelihood function in which the global partition function is replaced by computationally tractable local estimates.

In this approach, the probability of the m-th observation, x^m, is approximated by the product of the conditional probabilities of x_r = x^m_r given the observations of the remaining variables x_{\setminus r} := (x_1, \ldots, x_{r-1}, x_{r+1}, \ldots, x_L)^T \in \Omega^{L-1}:

P(x^m; h(\sigma), e(\sigma, \omega)) \simeq \prod_{r=1}^{L} P(x_r = x^m_r \mid x_{\setminus r} = x^m_{\setminus r}; h(\sigma), e(\sigma, \omega))


Maximum-Likelihood Inference: Pseudo-Likelihood Maximization

Each factor has the following analytical form:

P(x_r = x^m_r \mid x_{\setminus r} = x^m_{\setminus r}; h(\sigma), e(\sigma, \omega)) = \frac{\exp\Big( h_r(x^m_r) + \sum_{j \neq r} e_{rj}(x^m_r, x^m_j) \Big)}{\sum_{\sigma} \exp\Big( h_r(\sigma) + \sum_{j \neq r} e_{rj}(\sigma, x^m_j) \Big)}

By this approximation, the log-likelihood becomes the pseudo-log-likelihood:

\ln l_{PL}(h(\sigma), e(\sigma, \omega)) := \sum_{m=1}^{M} \sum_{r=1}^{L} \ln P(x_r = x^m_r \mid x_{\setminus r} = x^m_{\setminus r}; h(\sigma), e(\sigma, \omega))

An \ell_2-regularizer is added to select for small absolute values of the inferred parameters:

h^{PLM}(\sigma), e^{PLM}(\sigma, \omega) = \arg\min_{h(\sigma), e(\sigma, \omega)} \Big( -\ln l_{PL}(h(\sigma), e(\sigma, \omega)) + \lambda_h \|h(\sigma)\|_2^2 + \lambda_e \|e(\sigma, \omega)\|_2^2 \Big)
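A minimal sketch of one such conditional factor as a softmax over the q states (an illustrative addition; the array shapes h[i, sigma] and e[i, j, sigma, omega] and the random test parameters are assumptions):

    import numpy as np

    def conditional(r, x, h, e):
        """P(x_r = sigma | all other x_j): a softmax over
        h_r(sigma) + sum_{j != r} e_rj(sigma, x_j)."""
        logits = h[r].copy()
        for j in range(len(x)):
            if j != r:
                logits += e[r, j, :, x[j]]
        p = np.exp(logits - logits.max())      # subtract the max for numerical stability
        return p / p.sum()

    rng = np.random.default_rng(4)
    L, q = 5, 3
    h = rng.normal(scale=0.1, size=(L, q))
    e = rng.normal(scale=0.1, size=(L, L, q, q))
    e = (e + e.transpose(1, 0, 3, 2)) / 2      # enforce e_ij(a, b) = e_ji(b, a)
    x = rng.integers(q, size=L)

    p = conditional(2, x, h, e)
    print(p, p.sum())                          # a normalized distribution over the q states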


Scoring Functions for Pairwise Interaction Strengths

• Direct information: DI_{ij} = \sum_{\sigma, \omega \in \Omega} P^{dir}_{ij}(\sigma, \omega) \ln \frac{P^{dir}_{ij}(\sigma, \omega)}{f_i(\sigma) f_j(\omega)}

• Frobenius norm: \|e_{ij}\|_F = \Big( \sum_{\sigma, \omega \in \Omega} e_{ij}(\sigma, \omega)^2 \Big)^{1/2}

• Average product correction: APC\text{-}FN_{ij} = \|e_{ij}\|_F - \frac{\|e_{i\cdot}\|_F \, \|e_{\cdot j}\|_F}{\|e_{\cdot\cdot}\|_F}

  where

  \|e_{i\cdot}\|_F := \frac{1}{L - 1} \sum_{j=1}^{L} \|e_{ij}\|_F, \qquad
  \|e_{\cdot j}\|_F := \frac{1}{L - 1} \sum_{i=1}^{L} \|e_{ij}\|_F, \qquad
  \|e_{\cdot\cdot}\|_F := \frac{1}{L(L - 1)} \sum_{i,j=1}^{L} \|e_{ij}\|_F

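A small sketch of the Frobenius-norm score with the average product correction, applied to a coupling tensor e[i, j, sigma, omega] such as the one produced by the mean-field sketch earlier (an illustrative addition; the random toy couplings are assumptions, and the zero-sum gauge shift usually applied before scoring is omitted for brevity):

    import numpy as np

    def apc_fn_scores(e):
        """Frobenius norms ||e_ij||_F with the average product correction (APC)."""
        L = e.shape[0]
        fn = np.sqrt(np.sum(e ** 2, axis=(2, 3)))     # ||e_ij||_F for every pair (i, j)
        np.fill_diagonal(fn, 0.0)                     # no self-interactions
        row_mean = fn.sum(axis=1) / (L - 1)           # ||e_i.||_F
        col_mean = fn.sum(axis=0) / (L - 1)           # ||e_.j||_F
        total_mean = fn.sum() / (L * (L - 1))         # ||e_..||_F
        return fn - np.outer(row_mean, col_mean) / total_mean

    rng = np.random.default_rng(5)
    e = rng.normal(size=(4, 4, 3, 3))
    e = (e + e.transpose(1, 0, 3, 2)) / 2             # symmetry e_ij(a, b) = e_ji(b, a)
    print(apc_fn_scores(e).round(2))                  # larger scores suggest direct couplings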

Summary
