Using residue coevolution to retrieve protein homologs [1]

ComPotts

Hugo Talibart, François Coste

Co-evolutionary methods for the prediction and design of protein structure and interactions

CECAM-HQ-EPFL, June 18, 2019

[1] Work in progress

The Dyliss bioinformatics team: http://www.irisa.fr/dyliss

H. Talibart, F. Coste ComPotts June 18, 2019 1 / 35


Bioinformatics at IRISA / Inria Rennes

Symbiose (bioinformatics Irisa/Inria Rennes):

Dyliss research team
Genscale research team
Genouest bioinformatics platform

Seminars: http://symbiose.irisa.fr/symbioseSeminars

Biogenouest: western France life science and environment network
Marine biology, agriculture/food-processing, human health, and bioinformatics.



Motivation

Sequence annotation problem

High-throughput production of raw sequences

Problem

Function(s) of these sequences?


Protein function?

In-vivo / in-vitro experiments, especially on model organisms:

Gene knockout and other mutations → key sequence(s) for a function
Structure determination → key positions for a function
. . .

Does not scale well. . .
To face the (ever-increasing) amount of available sequences, automatic methods are needed: in-silico functional or structural predictions.

Classical approach to predict the function of a new gene sequence

Search for annotated homologs . . .


Retrieve the homologs of a protein gene

Search for a significant match with:

an (already annotated) protein sequence, e.g. with BLAST [2]

>1shg:A

AKELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD

[2] S. F. Altschul et al. “Basic local alignment search tool”. Journal of molecular biology (1990).
[3] S. R. Eddy. “Profile hidden Markov models.”. Bioinformatics 14.9 (1998), pp. 755–763.
[4] M. Steinegger et al. “HH-suite3 for fast remote homology detection and deep protein annotation”. bioRxiv (2019), p. 560029.






the profile HMM of a protein family, e.g. with HMMER [3] or HH-suite [4]



Yet. . .








Score each position independently :-(


Our research: grammatical inference on biological sequences

Automatic characterization of protein sequence families with:

Automata (Protomata-Learner [5, 6])

[Figure: automaton modeling a protein family, labeled with conserved sequence blocks such as ALYDYE, DLSFKKGEKM, WW and TGKEGYIP]

Local dependencies :-)

[5] G. Kerbellec. “Apprentissage d’automates modélisant des familles de séquences protéiques”. PhD thesis. Université de Rennes 1, Apr. 2008, p. 139.

[6] A. Bretaudeau et al. “CyanoLyase: a database of phycobilin lyase sequences, motifs and functions”. Nucleic Acids Research 41.Database-Issue (2013), pp. 396–401.


Our research: grammatical inference on biological sequences

Automatic characterization of protein sequence families with:

Context-free grammars (ReGLiS [7], see also [8])

[Figure: context-free grammar modeling a protein family, labeled with sequence blocks such as ALYDYEA, DLSFKKG, WWKAR and KEGYIPS]

Nested dependencies :-D

[7] F. Coste, G. Garet, and J. Nicolas. “A bottom-up efficient algorithm learning substitutable languages from positive examples”. ICGI. 2014.

[8] W. Dyrka et al. “Estimating probabilistic context-free grammars for proteins using contact map constraints”. PeerJ (2019).


Proteins are 3D objects

Many crossing interactions between amino acids distant in the sequence but close in the structure


The Chomsky Hierarchy


Mutual information in HIV-1 gp120 homologs

Many (crossing) correlations between MSA columns
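For illustration, the mutual information between two MSA columns can be computed as in the following minimal Python sketch. The function name `column_mutual_information` and the toy alignment are assumptions made for this example; no sequence weighting, pseudocounts, or gap handling is applied.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an MSA.

    msa is a list of equal-length aligned sequences (hypothetical toy input).
    """
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)             # single-column counts
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)  # pair counts
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


# Toy alignment: columns 0 and 1 always change together, column 2 is constant.
toy_msa = ["AKC", "AKC", "GEC", "GEC"]
mi_01 = column_mutual_information(toy_msa, 0, 1)
mi_02 = column_mutual_information(toy_msa, 0, 2)
```

In the toy alignment, columns 0 and 1 co-vary perfectly (1 bit), whereas the constant column 2 carries no information about column 0 (0 bits).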


Direct Coupling Analysis (DCA) to the rescue

Hugo Talibart’s PhD

A recent breakthrough for the prediction of 3D structures by prediction of contacts

Principle: disentangle direct from indirect effects

Mutual information on HIV-1 gp120 protein

Idea: use DCA for automatic characterization of protein families:

Identify important (crossing) dependencies with DCA
Build accordingly a syntactic model that can be used in practice. . .


Choice of DCA method

CCMpred [9]

Best one-model precision for contact prediction [10]

“Structuring” couplings

Figure: Top 25 PSICOV predictions
Figure: Top 25 CCMpred predictions

[9] S. Seemayer, M. Gruber, and J. Söding. “CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations”. Bioinformatics 30.21 (2014), pp. 3128–3130.

[10] S. H. P. de Oliveira, J. Shi, and C. M. Deane. “Comparing co-evolution methods and their application to template-free protein structure prediction”. Bioinformatics 33.3 (2017), pp. 373–381.


DCA workflow

1. Protein sequence query q

1CC8:A|PDBID|CHAIN|SEQUENCE MAEIKHYQFNVVMTCSGCSGAVNKVLTKLEPDVSKIDISLEKQLVDVYT



2. Retrieve close homologs and build an MSA (e.g. with HHblits [11])

1CC8:A|PDBID|CHAIN|SEQUENCE      MAEIKHYQFNVVMTCSGCSGAVNKVLTKLEPDVSKIDISLEKQLVDVYT
sp|Q54PZ2|ATOX1_DICDI            ....MTYSFFVDMTCGGCSKAVNAILSKIDGVS.NIQIDLENKKVCESS
tr|A0A0C7MWI5|A0A0C7MWI5_9SACH   .STAQHYHFDVVMTCAGCSNAINRVLTRLEPDVSNIEISLEKQTVDVVS
tr|A7TF58|A7TF58_VANPO           .SNDNHYQFEVVMTCSGCSNAVNKALTRLEPDVSNIDISLENQTVDVHS
tr|G0WD69|G0WD69_NAUDC           .MAENHYQFNVVMTCSGCSNAINRVLTKLEPEVSKIDISLEDQTVDVTT
tr|G8ZQK6|G8ZQK6_TORDC           .SQQNHYQFNVVMSCSGCSNAINKVLSRLEPDVSKIETSLDSQTVDVYT
tr|S6E8D5|S6E8D5_ZYGB2           .MSQNHYHFEVVMSCEGCSNAINRVLTKLKPDVSEIRISLENQTVDVYT
tr|J7R785|J7R785_KAZNA           ..MSNHYQFDVVMTCSACSNAISKVLTRMEPEVTKFDVSLEKQTVDVQT
tr|W1QBQ2|W1QBQ2_OGAPD           .MSAKHYKFDVTMACSGCSNAVNRVLTRL.PGVKNVEISLEKQTVDVIS
tr|H2AUI5|H2AUI5_KAZAF           ..MIYCYHFNVVMTCSGCSDAIHRSLSKLGPEVTDIDISLENQYVEVFT
tr|G8JMM3|G8JMM3_ERECY           .MDTKHYQFQVALACSGCVAAVEKALAKLQPDISKFDISLEKQIVDVYT
tr|S9Q3L9|S9Q3L9_SCHOY           ....MKYSFNVVMTCDGCKNAIDRVLNRL..GVDEKEISLEAQEVHVTT
tr|Q01AV4|Q01AV4_OSTTA           ..MSTTVTLRCDFACDGCANAVKRILSKDDA....VRTSVEDKLVVVV.
tr|E5R4F7|E5R4F7_LEPMJ           ..MTHTYKFNVTMTCGGCSGAVERVLRKLE.GVESFNVNLETQTAEVVA
tr|R7Z484|R7Z484_CONA1           .MSEHNYKFNVAMSCGGCSGAVERVLKKLD.GVKSFNVSLDTQTAEIVA
tr|M3CXY4|M3CXY4_SPHMS           .MAEHKYKFNVSMSCGGCSGAIERVLKKLD.GVKEFNVSLETQTAEITT
tr|W9XE16|W9XE16_9EURO           .MSEHHYKFNVTMTCGGCSGAVERVLKKLD.GVKNYTVSLDTQTADVTT
tr|Q5BDJ0|Q5BDJ0_EMENI           .DQEHHYKFNVSMSCGGCSGAVERVLKKLD.GVKSFDVNLDSQTASVVT
tr|W3WZP2|W3WZP2_9PEZI           .ADNHTYKFNVSMSCGGCSGAVDRVLKKLD.GIESYDVSLEKQEATVIA
tr|A0A0D2B224|A0A0D2B224_9PEZI   ..MSHTYKFNVAMSCGGCSGAIDRVLKKLE.GVDKYEVSLEKQTAEVHT
tr|A0A093XHT8|A0A093XHT8_PENMA   .MAEHQYKFNVSMSCGGCSGAVERVLKKLDVGVKSYDVSLESQTATVVA
tr|A0A074WQB6|A0A074WQB6_9PEZI   .MSDHTYNFNITMTCGGCSGAVERVLKKLD.GVKSFDVSLDSQTAFVIT

[11] M. Remmert et al. “HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment”. Nature methods 9.2 (2012), p. 173.



3. Infer a Potts model from MSA

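$$P(a \mid w, v) = \frac{1}{Z} \exp\left( \sum_{i=1}^{L-1} \sum_{j=i+1}^{L} w_{ij}(a_i, a_j) + \sum_{i=1}^{L} v_i(a_i) \right)$$

where:

$P(a \mid w, v)$: probability of sequence $a = a_1, \dots, a_L$
$Z$: normalization constant
$w_{ij}$: couplings
$v_i$: fields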
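To make the formula concrete, here is a minimal Python sketch that evaluates the exponent and the exact probability for a tiny model. The data layout (`v[i][a]` for fields, `w[(i, j)][a][b]` for couplings with i < j), the function names and the toy numbers are assumptions made for this example; Z is obtained by brute-force enumeration, which is only feasible for very small L and alphabet size q.

```python
import itertools
import math


def hamiltonian(a, v, w):
    """Sum of fields v[i][a_i] and couplings w[(i, j)][a_i][a_j] for i < j."""
    L = len(a)
    h = sum(v[i][a[i]] for i in range(L))
    h += sum(w[(i, j)][a[i]][a[j]]
             for i in range(L - 1) for j in range(i + 1, L))
    return h


def probability(a, v, w, q):
    """Exact P(a | w, v); Z is enumerated over all q**L sequences."""
    L = len(a)
    z = sum(math.exp(hamiltonian(s, v, w))
            for s in itertools.product(range(q), repeat=L))
    return math.exp(hamiltonian(a, v, w)) / z


# Toy model (made-up numbers): L = 3 positions, q = 2 symbols.
q = 2
v = [[0.5, -0.5], [0.0, 0.0], [0.2, -0.2]]
w = {
    (0, 1): [[1.0, -1.0], [-1.0, 1.0]],  # positions 0 and 1 prefer equal symbols
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [0.0, 0.0]],
}

h_000 = hamiltonian((0, 0, 0), v, w)
p_000 = probability((0, 0, 0), v, w, q)
p_010 = probability((0, 1, 0), v, w, q)
total = sum(probability(s, v, w, q)
            for s in itertools.product(range(q), repeat=3))
```

In this toy model the coupling between positions 0 and 1 makes the sequence (0, 0, 0) more probable than (0, 1, 0), although the fields at position 1 are neutral.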


Inference of Potts model from MSA (CCMpred)

Maximise pseudo-likelihood of N aligned sequences, i.e.:

$$(w, v) = \operatorname*{argmax}_{w, v} \sum_{n=1}^{N} \sum_{i=1}^{L} \log P\left(A_i = a_i^n \mid a_1^n, \dots, a_{i-1}^n, a_{i+1}^n, \dots, a_L^n, v, w\right)$$

(more tractable and still good precision)

while respecting empirical frequencies:

$$P_i(a) = f_i(a), \qquad P_{ij}(a, b) = f_{ij}(a, b)$$
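The pseudo-likelihood objective can be sketched in Python as follows. This only evaluates the objective for given parameters; the optimisation itself and any regularisation are not shown. The data layout (`v[i][a]`, `w[(i, j)][a][b]` with i < j), the function names and the toy model are assumptions made for this example.

```python
import math


def log_conditional(seq, i, v, w, q):
    """log P(A_i = seq[i] | all other positions of seq, v, w)."""
    def local_energy(a):
        e = v[i][a]
        for j in range(len(seq)):
            if j < i:
                e += w[(j, i)][seq[j]][a]
            elif j > i:
                e += w[(i, j)][a][seq[j]]
        return e

    energies = [local_energy(a) for a in range(q)]
    m = max(energies)  # log-sum-exp trick for numerical stability
    log_norm = m + math.log(sum(math.exp(e - m) for e in energies))
    return energies[seq[i]] - log_norm


def pseudo_log_likelihood(msa, v, w, q):
    """Sum of the log conditionals over all sequences and all positions."""
    return sum(log_conditional(seq, i, v, w, q)
               for seq in msa for i in range(len(seq)))


def toy_model(c):
    """Two positions, two symbols, coupling strength c favouring equal symbols."""
    v = [[0.0, 0.0], [0.0, 0.0]]
    w = {(0, 1): [[c, -c], [-c, c]]}
    return v, w


msa = [(0, 0), (1, 1)]  # the two columns always agree
pll_uncoupled = pseudo_log_likelihood(msa, *toy_model(0.0), q=2)
pll_coupled = pseudo_log_likelihood(msa, *toy_model(1.0), q=2)
```

On this toy alignment the coupled model obtains a higher pseudo-log-likelihood than the uncoupled one, which is the behaviour the maximisation exploits.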


Using Potts model for contact prediction

4. Contacts in q are predicted using Frobenius norm of the couplings

$$\|w_{ij}\| = \sqrt{\sum_{a} \sum_{b} w_{ij}(a, b)^2}$$

A larger norm is interpreted as a likelier contact between positions
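A minimal Python sketch of this scoring step is given below. The coupling layout (`w[(i, j)]` as a q × q nested list), the function names and the toy values are assumptions made for this example; contact predictors commonly apply further corrections to these raw norms, which are omitted here.

```python
import math


def frobenius_norm(w_ij):
    """Square root of the sum of squared entries of one coupling matrix."""
    return math.sqrt(sum(x * x for row in w_ij for x in row))


def rank_contacts(w):
    """Return the pairs (i, j) sorted by decreasing norm, plus the norms."""
    scores = {pair: frobenius_norm(matrix) for pair, matrix in w.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Toy couplings for L = 3 positions and q = 2 symbols (made-up numbers).
w = {
    (0, 1): [[1.0, -1.0], [-1.0, 1.0]],
    (0, 2): [[0.1, 0.0], [0.0, 0.1]],
    (1, 2): [[0.5, 0.0], [0.0, -0.5]],
}
ranking, scores = rank_contacts(w)
```

Here the pair (0, 1) has the largest norm and would be predicted as the most likely contact.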


Using Potts model for homology search


Use whole Potts model Pq of q instead of Frobenius norms

Use Pq to score each possibly homologous sequence s

Requires computing the best alignment of s to Pq

As HHalign does for pairs of HMMs [12], directly align pairs of Potts models

A new tool: ComPotts

Pq, Ps → ComPotts → alignment

with two options to get the Potts model Ps of s:

One-hot encoding: $v_i(a_i) = 1$, $w_{ij}(a_i, a_j) = 1$, others are 0

s → 1-hot encoding → Ps

From close homologs of s, as for Pq

s → HHblits → MSA → trimAl → trimmed MSA → CCMpredPy → Ps
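The one-hot option can be sketched in Python as follows: fields and couplings are 1 exactly at the letters observed in s and 0 elsewhere. The function name `one_hot_potts`, the data layout (`v[i][a]`, `w[(i, j)][a][b]` with i < j) and the two-letter toy alphabet are assumptions made for this example.

```python
def one_hot_potts(seq, alphabet):
    """Build one-hot fields v and couplings w from a single sequence."""
    index = {letter: k for k, letter in enumerate(alphabet)}
    q, L = len(alphabet), len(seq)
    # v[i][a] = 1 iff a is the letter observed at position i
    v = [[1.0 if a == index[seq[i]] else 0.0 for a in range(q)]
         for i in range(L)]
    # w[(i, j)][a][b] = 1 iff (a, b) is the letter pair observed at (i, j)
    w = {
        (i, j): [[1.0 if (a == index[seq[i]] and b == index[seq[j]]) else 0.0
                  for b in range(q)]
                 for a in range(q)]
        for i in range(L - 1) for j in range(i + 1, L)
    }
    return v, w


# Toy example with a two-letter alphabet.
v, w = one_hot_potts("ACA", "AC")
```

For a real protein the alphabet would contain the amino acids (and possibly a gap symbol), giving one q × q matrix per pair of positions.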

[12] J. Söding. “Protein homology detection by HMM–HMM comparison”. Bioinformatics 21.7 (2004), pp. 951–960.


ComPotts (Comparing Potts models)

Formulation of Potts model alignment as an Integer Linear Programming (ILP) problem

Based on Inken Wohlers’ solver [13]

[13] I. Wohlers. “Exact Algorithms For Pairwise Protein Structure Alignment”. PhD thesis. Vrije Universiteit, Jan. 2012, pp. 1–147.


Scoring alignment of Potts models A and B

$$s(A, B) = \sum_{i=1}^{L_A} \sum_{k=1}^{L_B} s_v(v_i^A, v_k^B)\, x_{ik} + \sum_{i=1}^{L_A-1} \sum_{j=i+1}^{L_A} \sum_{k=1}^{L_B-1} \sum_{l=k+1}^{L_B} s_w(w_{ij}^A, w_{kl}^B)\, y_{ikjl}$$

where

$x_{ik} = 1$ iff position $i$ of A and position $k$ of B are aligned (otherwise, $x_{ik} = 0$)
$y_{ikjl} = 1$ iff $x_{ik} = 1$ and $x_{jl} = 1$ (otherwise, $y_{ikjl} = 0$)
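The objective can be evaluated for a given alignment as in the following Python sketch, which also searches all alignments of two tiny models by brute force. This is not the ILP solver used by ComPotts, only an illustration of the score being maximised. The function names, the data layout (`v[i]` as a vector, `w[(i, j)]` as a matrix with i < j), the restriction to order-preserving alignments and the use of plain scalar products for s_v and s_w in the toy run are assumptions made for this example.

```python
from itertools import combinations


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def frobenius_dot(X, Y):
    return sum(dot(row_x, row_y) for row_x, row_y in zip(X, Y))


def alignment_score(aligned, vA, wA, vB, wB, s_v, s_w):
    """Score of an alignment given as pairs (i, k) with x_ik = 1.

    The pairs must be strictly increasing in both i and k.
    """
    score = sum(s_v(vA[i], vB[k]) for i, k in aligned)
    for (i, k), (j, l) in combinations(aligned, 2):  # i < j and k < l: y_ikjl = 1
        score += s_w(wA[(i, j)], wB[(k, l)])
    return score


def best_alignment_bruteforce(vA, wA, vB, wB, s_v, s_w):
    """Enumerate all order-preserving alignments (tiny models only)."""
    best = (0.0, [])  # the empty alignment scores 0
    for n in range(1, min(len(vA), len(vB)) + 1):
        for rows in combinations(range(len(vA)), n):
            for cols in combinations(range(len(vB)), n):
                aligned = list(zip(rows, cols))
                s = alignment_score(aligned, vA, wA, vB, wB, s_v, s_w)
                if s > best[0]:
                    best = (s, aligned)
    return best


# Two identical toy models with 2 positions and 2 symbols (made-up numbers).
vA = [[1.0, 0.0], [0.0, 1.0]]
wA = {(0, 1): [[0.0, 1.0], [0.0, 0.0]]}
vB = [[1.0, 0.0], [0.0, 1.0]]
wB = {(0, 1): [[0.0, 1.0], [0.0, 0.0]]}
best_score, best_pairs = best_alignment_bruteforce(vA, wA, vB, wB, dot, frobenius_dot)
```

The brute-force search grows combinatorially with the model lengths, which is why an exact solver such as the ILP formulation is needed in practice.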


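Choice of $s_v(v_i, v_k)$ and $s_w(w_{ij}, w_{kl})$: scalar products

$$s_v(v_i^A, v_k^B) = \langle v_i^A, v_k^B \rangle$$

→ standard scalar product: $\langle x, y \rangle = \sum_i x_i y_i$

$$s_w(w_{ij}^A, w_{kl}^B) = \langle w_{ij}^A, w_{kl}^B \rangle_F$$

→ Frobenius scalar product: $\langle X, Y \rangle_F = \sum_i \sum_j X_{ij} Y_{ij}$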

Geometric insight

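$$\langle v_i^A, v_k^B \rangle = \|v_i^A\| \, \|v_k^B\| \cos\theta$$

where $\theta$ is the angle between $v_i^A$ and $v_k^B$, and:

$\|v_i^A\|$: importance of position i
$\|v_k^B\|$: importance of position k
$\cos\theta$: similarity measure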
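The decomposition can be checked numerically with a few lines of Python. The vectors below are made-up toy values and the helper names are assumptions made for this example.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def cosine(x, y):
    return dot(x, y) / (norm(x) * norm(y))


v_a = [2.0, 0.0]   # toy field vector of position i in model A
v_b = [1.0, 1.0]   # toy field vector of position k in model B

score = dot(v_a, v_b)
decomposed = norm(v_a) * norm(v_b) * cosine(v_a, v_b)

# Scaling one vector changes its norm (importance) but not the angle.
v_a_scaled = [2.0 * x for x in v_a]
score_scaled = dot(v_a_scaled, v_b)
```

Doubling one vector doubles the score while the cosine stays the same, which illustrates how the norms act as position weights on top of the similarity term.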


Natural extension of the 1D score of a sequence

$$P(a \mid w, v) = \frac{1}{Z} \exp\left(H(a \mid v, w)\right)$$

$$H(a \mid v, w) = \sum_{i=1}^{L} v_i(a_i) + \sum_{i=1}^{L-1} \sum_{j=i+1}^{L} w_{ij}(a_i, a_j) = \sum_{i=1}^{L} \langle v_i, e_{a_i} \rangle + \sum_{i=1}^{L-1} \sum_{j=i+1}^{L} \langle w_{ij}, e_{a_i a_j} \rangle_F$$

where $e_{a_i}$ is the vector with a 1 at index $a_i$ and 0 elsewhere, and $e_{a_i a_j}$ is the matrix with a 1 at entry $(a_i, a_j)$ and 0 elsewhere.
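The identity between the table-lookup form and the scalar-product form of H can be verified with the following Python sketch. The data layout (`v[i][a]`, `w[(i, j)][a][b]` with i < j), the function names and the toy numbers are assumptions made for this example.

```python
def one_hot_vector(a, q):
    return [1.0 if k == a else 0.0 for k in range(q)]


def one_hot_matrix(a, b, q):
    return [[1.0 if (r == a and c == b) else 0.0 for c in range(q)]
            for r in range(q)]


def dot(x, y):
    return sum(s * t for s, t in zip(x, y))


def frobenius_dot(X, Y):
    return sum(dot(row_x, row_y) for row_x, row_y in zip(X, Y))


def hamiltonian_lookup(a, v, w):
    """H as a sum of table lookups v_i(a_i) and w_ij(a_i, a_j)."""
    L = len(a)
    return (sum(v[i][a[i]] for i in range(L))
            + sum(w[(i, j)][a[i]][a[j]]
                  for i in range(L - 1) for j in range(i + 1, L)))


def hamiltonian_scalar_products(a, v, w, q):
    """H as a sum of scalar products with one-hot vectors and matrices."""
    L = len(a)
    return (sum(dot(v[i], one_hot_vector(a[i], q)) for i in range(L))
            + sum(frobenius_dot(w[(i, j)], one_hot_matrix(a[i], a[j], q))
                  for i in range(L - 1) for j in range(i + 1, L)))


# Toy model (made-up numbers): L = 3 positions, q = 2 symbols.
q = 2
v = [[0.5, -0.5], [0.0, 0.0], [0.2, -0.2]]
w = {
    (0, 1): [[1.0, -1.0], [-1.0, 1.0]],
    (0, 2): [[0.0, 0.3], [0.3, 0.0]],
    (1, 2): [[0.1, 0.0], [0.0, 0.1]],
}
a = (0, 1, 0)
h_lookup = hamiltonian_lookup(a, v, w)
h_products = hamiltonian_scalar_products(a, v, w, q)
```

Both forms give the same value, so scoring a sequence with a Potts model is the special case of the scalar-product score in which one of the two models is one-hot and positions are aligned one to one.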


First experiments

PDB 1CC8 : Atx1 metallochaperone (Saccharomyces cerevisiae)

× one homolog s (sampling of 150 sequences, identity with 1CC8: 25%-50%)

One-hot encoding of Ps (timeout: 6 hours)

                      trimmed                     not trimmed
ε = machine epsilon   t ∈ [11s, 6h], avg: 2h      t ∈ [25s, 6h], avg: 4h30
ε = 1 [14]            t ∈ [8s, 6h], avg: 1h30     t ∈ [16s, 6h], avg: 2h

Build Ps from homologs of s (timeout: 6 hours)

                      trimmed                     not trimmed
ε = machine epsilon   t ∈ [3s, 6h], avg: 3 min    t ∈ [3s, 6h], avg: 8 min
ε = 1                 t ∈ [2s, 6s], avg: 5s       t ∈ [3s, 50s], avg: 21s

Tractable time! Not for the simpler models?
Small proteins, easy to align. . .

[14] ≃ Energy needed to change one a.a. into another

Testing the limits on thioredoxins

Enzymes involved in reduction–oxidation reactions through oxidation of their active site

Figure: 3D structure of thioredoxin-1 (Caenorhabditis elegans) (Q09433)

100 amino acids on average

Between 15 and 20% sequence identity within the family

→ known to be hard to align


An example of failure

$\{v_i^{Q12404}\}_i$ and $\{v_i^{P17967}\}_i$ aligned by ComPotts:

Even well-conserved positions of the active site are not aligned


The trouble with scalar product alone

A well-conserved column i may have a smaller $\|v_i\|$ than a less conserved column j

It may be more profitable to align many less conserved columns than to align fewer well-conserved columns with each other


An idea

Use rescaling function: $f(x) = \operatorname{sign}(x)\,(e^{|x|} - 1)$
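A minimal Python sketch of this rescaling, applied entry-wise to a field vector, is given below. The function names and the two toy vectors are assumptions made for this example; they only illustrate that large entries are amplified much more than small ones, so a peaked vector gains weight relative to a flat one.

```python
import math


def rescale(x):
    """f(x) = sign(x) * (exp(|x|) - 1): odd, increasing, f(0) = 0."""
    return math.copysign(math.expm1(abs(x)), x)


def rescale_vector(v):
    return [rescale(x) for x in v]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


# Toy field vectors (made-up numbers): one peaked, one flat.
peaked = [3.0, 0.0, 0.0]
flat = [1.5, 1.5, 1.5]

ratio_before = dot(peaked, peaked) / dot(flat, flat)
ratio_after = (dot(rescale_vector(peaked), rescale_vector(peaked))
               / dot(rescale_vector(flat), rescale_vector(flat)))
```

In this toy case the self-score of the peaked vector relative to the flat one increases after rescaling.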

Figure: Before rescaling

Figure: After rescaling


It’s better :-)

$\{v_i^{Q12404}\}_i$ and $\{v_i^{P17967}\}_i$ aligned by ComPotts (without couplings!)


To be continued. . .

How to rescale also consistently the couplings wij?

→ Slightly change the rescaling function: $f(x) = \operatorname{sign}(x)\,(\beta e^{\alpha |x|} - \gamma)$?

Other similarity functions?. . .

Introduce gap costs

Constrain Potts model inference?

Canonical Potts model?
Better control amplitude of vectors and matrices?


Conclusion so far. . .

Good news: alignment to Potts model is tractable

A surprise: may require a transformation to a Potts model

A working efficient implementation

Quality of alignment can still be improved. . .


Thanks for your attention!

Ideas, remarks, suggestions are welcome.
See you next to our poster. . .

Bibliography I

[Alt+90] S. F. Altschul et al. “Basic local alignment search tool”. Journal of molecular biology (1990).

[Edd98] S. R. Eddy. “Profile hidden Markov models.”. Bioinformatics 14.9 (1998), pp. 755–763.

[Ste+19] M. Steinegger et al. “HH-suite3 for fast remote homology detection and deep protein annotation”. bioRxiv (2019), p. 560029.

[Ker08] G. Kerbellec. “Apprentissage d’automates modélisant des familles de séquences protéiques”. PhD thesis. Université de Rennes 1, Apr. 2008, p. 139.

[Bre+13] A. Bretaudeau et al. “CyanoLyase: a database of phycobilin lyase sequences, motifs and functions”. Nucleic Acids Research 41.Database-Issue (2013), pp. 396–401.


Bibliography II

[CGN14] F. Coste, G. Garet, and J. Nicolas. “A bottom-up efficient algorithm learning substitutable languages from positive examples”. ICGI. 2014.

[Dyr+19] W. Dyrka et al. “Estimating probabilistic context-free grammars for proteins using contact map constraints”. PeerJ (2019).

[SGS14] S. Seemayer, M. Gruber, and J. Söding. “CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations”. Bioinformatics 30.21 (2014), pp. 3128–3130.

[OSD17] S. H. P. de Oliveira, J. Shi, and C. M. Deane. “Comparing co-evolution methods and their application to template-free protein structure prediction”. Bioinformatics 33.3 (2017), pp. 373–381.


Bibliography III

[Jon+11] D. T. Jones et al. “PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments”. Bioinformatics 28.2 (2011), pp. 184–190.

[Rem+12] M. Remmert et al. “HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment”. Nature methods 9.2 (2012), p. 173.

[Söd04] J. Söding. “Protein homology detection by HMM–HMM comparison”. Bioinformatics 21.7 (2004), pp. 951–960.

[Woh12] I. Wohlers. “Exact Algorithms For Pairwise Protein Structure Alignment”. PhD thesis. Vrije Universiteit, Jan. 2012, pp. 1–147.

