Remote Homology Detection of Beta-Structural Motifs Using Random Fields
Matt Menke, Tufts
Bonnie Berger, MIT
Lenore Cowen, Tufts
ISMB 3Dsig 2010
July 10, 2010
Inferring structural similarity from homology is hard at the SCOP superfamily/fold level
Profile HMMs
HMM is trained from Sequence Alignment of Known Structures
But: cannot capture pairwise long-range beta-sheet interactions!
HMMs cannot capture statistical preferences between residues that are close in space but far apart, at a variable distance, in sequence.
Pectate Lyase C (Yoder et al. 1993)
Look at Just Pairs or Generalize to Markov Random Fields
Only look at pairs:
[Bradley, Cowen, Menke, King, Berger, PNAS, 2001, 98:26, 14,819-14,824; Cowen, Bradley, Menke, King, Berger (2002), J Comp Biol, 9, 261-276]
Generalize to Markov Random Fields:
Liu et al. 2009
Zhao et al. 2010
Menke et al. 2010 (this work)
[Figure labels: B1, B2, B3, T2]
Let’s look at what this would mean for propeller folds
Goal: capture HMM sequence information and pairwise information in beta-structural motifs at the same time!
SCOP (http://scop.mrc-lmb.cam.ac.uk/scop)
SMURF: Structural Motifs Using Random Fields
Can we get the benefit of pairwise correlations without having to throw away all sequence info?
The template is learned from solved structures in the PDB:
Aligned with Matt
Digression: Matt structural alignment program
Menke, Berger, Cowen (PLoS Comput Biol 2008)
Specifically designed to align more distant homologs
AFP (aligned fragment pair) chaining using dynamic programming with “translations and twists” (flexibility)
Two beta tables are learned from amphipathic beta sheets in solved PDB structures that are not propellers.
Buried Residue
  A    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S    T    V    W    Y
A 0.78 0.18 0.14 0.15 0.59 0.70 0.06 1.06 0.07 1.19 0.17 0.12 0.05 0.11 0.08 0.22 0.25 1.53 0.17 0.27
C 0.18 0.24 0.03 0.06 0.12 0.14 0.05 0.28 0.03 0.34 0.07 0.02 0.01 0.03 0.02 0.05 0.08 0.39 0.10 0.10
D 0.14 0.03 0.03 0.06 0.10 0.15 0.02 0.11 0.01 0.16 0.05 0.07 0.01 0.05 0.08 0.07 0.11 0.16 0.03 0.03
E 0.15 0.06 0.06 0.05 0.26 0.18 0.14 0.40 0.10 0.57 0.08 0.10 0.02 0.08 0.15 0.19 0.25 0.57 0.05 0.18
F 0.59 0.12 0.10 0.26 0.66 0.61 0.10 1.06 0.05 1.19 0.24 0.08 0.05 0.15 0.08 0.13 0.22 1.35 0.13 0.43
G 0.70 0.14 0.15 0.18 0.61 0.58 0.10 0.77 0.07 1.13 0.11 0.23 0.07 0.17 0.09 0.24 0.31 1.27 0.18 0.48
H 0.06 0.05 0.02 0.14 0.10 0.10 0.04 0.13 0.02 0.13 0.04 0.05 0.01 0.01 0.02 0.06 0.09 0.23 0.03 0.07
I 1.06 0.28 0.11 0.40 1.06 0.77 0.13 2.27 0.10 2.21 0.38 0.14 0.05 0.29 0.13 0.26 0.45 2.56 0.18 0.42
K 0.07 0.03 0.01 0.10 0.05 0.07 0.02 0.10 0.03 0.16 0.03 0.04 0.00 0.05 0.01 0.05 0.05 0.17 0.02 0.10
L 1.19 0.34 0.16 0.57 1.19 1.13 0.13 2.21 0.16 2.96 0.48 0.18 0.06 0.33 0.18 0.29 0.36 2.64 0.25 0.50
M 0.17 0.07 0.05 0.08 0.24 0.11 0.04 0.38 0.03 0.48 0.10 0.01 0.01 0.03 0.04 0.06 0.07 0.49 0.08 0.06
N 0.12 0.02 0.07 0.10 0.08 0.23 0.05 0.14 0.04 0.18 0.01 0.05 0.01 0.05 0.06 0.12 0.16 0.18 0.04 0.08
P 0.05 0.01 0.01 0.02 0.05 0.07 0.01 0.05 0.00 0.06 0.01 0.01 0.01 0.01 0.01 0.02 0.02 0.09 0.02 0.04
Q 0.11 0.03 0.05 0.08 0.15 0.17 0.01 0.29 0.05 0.33 0.03 0.05 0.01 0.04 0.08 0.17 0.17 0.27 0.05 0.13
R 0.08 0.02 0.08 0.15 0.08 0.09 0.02 0.13 0.01 0.18 0.04 0.06 0.01 0.08 0.04 0.05 0.07 0.16 0.02 0.07
S 0.22 0.05 0.07 0.19 0.13 0.24 0.06 0.26 0.05 0.29 0.06 0.12 0.02 0.17 0.05 0.17 0.15 0.29 0.08 0.09
T 0.25 0.08 0.11 0.25 0.22 0.31 0.09 0.45 0.05 0.36 0.07 0.16 0.02 0.17 0.07 0.15 0.25 0.44 0.03 0.11
V 1.53 0.39 0.16 0.57 1.35 1.27 0.23 2.56 0.17 2.64 0.49 0.18 0.09 0.27 0.16 0.29 0.44 3.74 0.23 0.64
W 0.17 0.10 0.03 0.05 0.13 0.18 0.03 0.18 0.02 0.25 0.08 0.04 0.02 0.05 0.02 0.08 0.03 0.23 0.05 0.05
Y 0.27 0.10 0.03 0.18 0.43 0.48 0.07 0.42 0.10 0.50 0.06 0.08 0.04 0.13 0.07 0.09 0.11 0.64 0.05 0.10

Exposed Residue
  A    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S    T    V    W    Y
A 0.27 0.04 0.13 0.28 0.22 0.18 0.11 0.31 0.23 0.38 0.06 0.11 0.06 0.13 0.22 0.28 0.37 0.49 0.06 0.25
C 0.04 0.08 0.05 0.07 0.04 0.03 0.03 0.04 0.07 0.04 0.02 0.06 0.01 0.08 0.11 0.05 0.06 0.10 0.04 0.09
D 0.13 0.05 0.09 0.13 0.09 0.08 0.13 0.08 0.71 0.12 0.06 0.22 0.03 0.15 0.50 0.36 0.41 0.24 0.02 0.12
E 0.28 0.07 0.13 0.43 0.31 0.15 0.21 0.43 1.92 0.50 0.14 0.28 0.10 0.25 1.49 0.60 1.01 0.63 0.09 0.32
F 0.22 0.04 0.09 0.31 0.23 0.16 0.12 0.34 0.28 0.32 0.12 0.14 0.06 0.19 0.29 0.27 0.34 0.38 0.13 0.33
G 0.18 0.03 0.08 0.15 0.16 0.08 0.06 0.15 0.16 0.15 0.06 0.08 0.05 0.10 0.15 0.14 0.17 0.21 0.03 0.19
H 0.11 0.03 0.13 0.21 0.12 0.06 0.06 0.08 0.25 0.12 0.04 0.10 0.07 0.11 0.14 0.19 0.20 0.21 0.05 0.14
I 0.31 0.04 0.08 0.43 0.34 0.15 0.08 0.48 0.57 0.32 0.10 0.14 0.07 0.28 0.43 0.30 0.32 0.59 0.07 0.40
K 0.23 0.07 0.71 1.92 0.28 0.16 0.25 0.57 0.63 0.38 0.15 0.46 0.08 0.42 0.33 0.70 1.17 0.71 0.22 0.52
L 0.38 0.04 0.12 0.50 0.32 0.15 0.12 0.32 0.38 0.48 0.10 0.15 0.12 0.23 0.36 0.26 0.34 0.62 0.07 0.39
M 0.06 0.02 0.06 0.14 0.12 0.06 0.04 0.10 0.15 0.10 0.12 0.09 0.04 0.08 0.10 0.12 0.14 0.10 0.02 0.08
N 0.11 0.06 0.22 0.28 0.14 0.08 0.10 0.14 0.46 0.15 0.09 0.38 0.09 0.22 0.25 0.48 0.49 0.27 0.05 0.18
P 0.06 0.01 0.03 0.10 0.06 0.05 0.07 0.07 0.08 0.12 0.04 0.09 0.02 0.06 0.07 0.07 0.13 0.13 0.02 0.16
Q 0.13 0.08 0.15 0.25 0.19 0.10 0.11 0.28 0.42 0.23 0.08 0.22 0.06 0.24 0.32 0.28 0.48 0.26 0.03 0.16
R 0.22 0.11 0.50 1.49 0.29 0.15 0.14 0.43 0.33 0.36 0.10 0.25 0.07 0.32 0.36 0.47 0.68 0.72 0.11 0.30
S 0.28 0.05 0.36 0.60 0.27 0.14 0.19 0.30 0.70 0.26 0.12 0.48 0.07 0.28 0.47 0.91 0.88 0.50 0.06 0.27
T 0.37 0.06 0.41 1.01 0.34 0.17 0.20 0.32 1.17 0.34 0.14 0.49 0.13 0.48 0.68 0.88 1.60 0.82 0.07 0.27
V 0.49 0.10 0.24 0.63 0.38 0.21 0.21 0.59 0.71 0.62 0.10 0.27 0.13 0.26 0.72 0.50 0.82 0.87 0.21 0.64
W 0.06 0.04 0.02 0.09 0.13 0.03 0.05 0.07 0.22 0.07 0.02 0.05 0.02 0.03 0.11 0.06 0.07 0.21 0.02 0.13
Y 0.25 0.09 0.12 0.32 0.33 0.19 0.14 0.40 0.52 0.39 0.08 0.18 0.16 0.16 0.30 0.27 0.27 0.64 0.13 0.38
http://bcb.cs.tufts.edu/propellers/si/
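A minimal sketch of how a symmetric pairwise table of this shape could be built from observed residue pairs in paired beta-strands. The function name, the toy observations, and the choice to report percentages are assumptions for illustration; the talk does not say how the published tables were normalized.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def pair_frequencies(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Turn observed residue pairs into a symmetric table of percentages.

    Normalizing by the number of observed pairs is an assumption made here.
    """
    counts: Counter = Counter()
    total = 0
    for a, b in pairs:
        counts[(a, b)] += 1
        if a != b:
            counts[(b, a)] += 1  # mirror so table[(a, b)] == table[(b, a)]
        total += 1
    if total == 0:
        return {}
    return {pair: 100.0 * n / total for pair, n in counts.items()}


# Hypothetical toy observations, not data from the talk.
toy_pairs = [("V", "I"), ("I", "V"), ("V", "V"), ("L", "K")]
table = pair_frequencies(toy_pairs)
```

Keeping separate tables for buried and exposed positions would mean calling the function once per environment.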
Computing a Score
• Sequences are scored by computing their best “threading” or “parse” against the template as a sum of HMM(score) + pairwise(score)
• No longer polynomial time (multi-dimensional dynamic programming)
• Tractable on propellers because paired beta-strands don’t interleave too much
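The scoring idea above can be sketched as follows. This is an illustration under stated assumptions, not SMURF's implementation: the per-position HMM scores, the paired positions, and the pairwise table are hypothetical inputs, and the best parse is found by enumerating a short list of candidates rather than by the multi-dimensional dynamic programming the slide mentions.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]
Parse = Tuple[Sequence[float], Sequence[Pair]]


def parse_score(sequence: str,
                hmm_scores: Sequence[float],
                paired_positions: Sequence[Pair],
                pair_table: Dict[Tuple[str, str], float]) -> float:
    """Score one parse as HMM(score) + pairwise(score)."""
    hmm_part = sum(hmm_scores)
    pair_part = 0.0
    for i, j in paired_positions:
        key = (sequence[i], sequence[j])
        # The table is treated as symmetric, so try both residue orders.
        pair_part += pair_table.get(key, pair_table.get(key[::-1], 0.0))
    return hmm_part + pair_part


def best_parse_score(sequence: str,
                     candidates: List[Parse],
                     pair_table: Dict[Tuple[str, str], float]) -> float:
    """Return the highest combined score over the candidate parses."""
    return max(parse_score(sequence, h, p, pair_table) for h, p in candidates)


# Hypothetical toy inputs, chosen only to exercise the functions.
seq = "VKVAI"
toy_table = {("I", "V"): 1.5}
parse_a = ([1.0, -0.5, 2.0, 0.0, 0.5], [(0, 4)])  # pairs V with I
parse_b = ([1.0, 1.0, 1.0, 0.5, 0.5], [(1, 3)])   # pairs K with A
best = best_parse_score(seq, [parse_a, parse_b], toy_table)
```

Here parse_a has the lower HMM sum but wins once its pairwise term is added, which is the effect the combined score is meant to capture.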
Let’s look at what this would mean for propeller folds
• Training set for HMM score: leave-superfamily-out cross validation
• Training set for pairwise score: amphipathic beta-sheets from NON-propellers
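A minimal sketch of leave-superfamily-out cross validation, assuming each protein is labeled with its SCOP superfamily. The protein and superfamily identifiers below are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def leave_superfamily_out(
    superfamily_of: Dict[str, str],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_superfamily, training_proteins, test_proteins).

    Each fold holds out every protein of one superfamily, so the HMM is
    never trained on a member of the superfamily it is tested on.
    """
    groups = defaultdict(list)
    for protein, superfamily in superfamily_of.items():
        groups[superfamily].append(protein)
    for held_out in sorted(groups):
        test = sorted(groups[held_out])
        train = sorted(
            p for sf, members in groups.items() if sf != held_out for p in members
        )
        yield held_out, train, test


# Hypothetical labels, for illustration only.
labels = {"p1": "sfA", "p2": "sfA", "p3": "sfB", "p4": "sfC"}
folds = list(leave_superfamily_out(labels))
```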
Results on Propellers
       6-bladed       7-bladed
TNeg   HMMER  SMURF   HMMER  SMURF
97%    52     80      80     87
96%    56     80      80     87
95%    64     80      87     93
94%    68     84      90     93
93%    68     84      90     93
92%    68     88      90     97
91%    68     92      90     97
90%    68     92      93     100
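If the table is read as the percentage of true propellers recovered at each true-negative rate (an assumption; the slide does not spell this out), one entry could be computed roughly as below. The function name and the toy scores are hypothetical.

```python
import math
from typing import List


def true_positive_rate_at(tn_rate: float,
                          pos_scores: List[float],
                          neg_scores: List[float]) -> float:
    """Fraction of positives scoring above the lowest threshold that
    rejects at least tn_rate of the negatives."""
    ranked = sorted(neg_scores)
    # Number of negatives that must score at or below the threshold;
    # the small epsilon guards against floating-point round-up.
    k = math.ceil(tn_rate * len(ranked) - 1e-9)
    threshold = ranked[k - 1] if k > 0 else float("-inf")
    hits = sum(1 for s in pos_scores if s > threshold)
    return hits / len(pos_scores)


# Hypothetical scores, for illustration only.
negatives = [float(x) for x in range(10)]   # 0.0 .. 9.0
positives = [8.5, 9.5, 3.0, 10.0]
tpr = true_positive_rate_at(0.90, positives, negatives)
```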
• Note that this is “6- (or 7-) bladed propeller versus non-propeller”; distinguishing the number of blades in the propeller seems to be a much harder problem.
Different propeller closures
1jof 2trc
So: what new sequences fold into propellers?
• We predict a double propeller motif in the N-terminal region of a hybrid 2-component sensor protein.
What are these proteins?
• First found in a benign bacterium in the human gut.
• May be involved in adapting to changes in diet/efficiently processing different sugars
• Found in other bacterial species: help sense and adapt to environmental changes.
• Big stretch (I am not a biologist): help to study human obesity epidemic??
Popular Domains
• HisKA histidine kinase domain
• GGDEF adenylyl cyclase signalling domain
• SpoIIE sporulation domain
• GAF domain
• PAS domain
• HATPase domain
Species distribution
Distinguishing Number of Blades
• The automatic SMURF consensus 7-bladed template only learns 6 blades.
• Sequence motifs are similar: the same Pfam motif occurs in propellers with different numbers of blades
• The fix: throw out propellers with a “funky” 7th blade by hand and build a new template. Now 6-bladed propellers don’t like the 7-bladed template
• Double propellers we found are probably 7-7 (but 7-6 is also plausible).
Predict propellers with SMURF!
• http://smurf.cs.tufts.edu
– Accepts sequences in FASTA format
– 6-, 7-, 8-bladed templates, as well as all 9 double-propeller templates
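Since the server takes FASTA input, a minimal sketch of reading FASTA text into named sequences may help when preparing a submission. This is generic FASTA handling, not code from the SMURF server; the record names and sequences are hypothetical.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record name: sequence}.

    The record name is the first whitespace-delimited token after '>'.
    Sequence lines of one record are concatenated.
    """
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


# Hypothetical records, for illustration only.
example = ">seq1 hypothetical protein\nMKV\nLLA\n\n>seq2\nGGH\n"
records = read_fasta(example)
```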
http://bcb.cs.tufts.edu/propellers/si
– pairwise tables
– long list of predicted propeller sequences
What’s Next for SMURF?
• Long-range dependencies
• Deeply interleaved β-strand pairs
Conclusions
• Combining an HMM score with a pairwise score can help recognize beta-structures
• Computing this score exactly with a random field is highly computationally intensive
• We will begin to look at when it is feasible and when we should use heuristics.
• Also: add side-chain packing, other model refinements.
More Questions
• When should we over-weight the HMM versus the pair portion of the score?
-- the case of 8-bladed propellers
• Are there other ways to incorporate pairwise dependencies into HMMs?
An HMM is only as good as its training data
• An HMM is only as good as its training data; or is it?
• Idea: we augment the training set, using the simplest model of evolution!
• See Kumar and Cowen’s ISMB proceedings paper!
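A minimal sketch of training-set augmentation by simulated point mutations. The uniform substitution model, the function names, and the parameters are assumptions made for illustration; the slide does not specify the evolutionary model, so this is not necessarily the one used in the cited paper.

```python
import random
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each residue, with probability `rate`, by a different
    amino acid chosen uniformly at random (assumed model)."""
    out = []
    for residue in seq:
        if rng.random() < rate:
            out.append(rng.choice([a for a in AMINO_ACIDS if a != residue]))
        else:
            out.append(residue)
    return "".join(out)


def augment(training: List[str], copies: int, rate: float, seed: int = 0) -> List[str]:
    """Return the original sequences followed by `copies` mutated
    variants of each one."""
    rng = random.Random(seed)
    augmented = list(training)
    for seq in training:
        for _ in range(copies):
            augmented.append(mutate(seq, rate, rng))
    return augmented


# Hypothetical training sequences, for illustration only.
originals = ["MKVLA", "GGHIT"]
bigger = augment(originals, copies=3, rate=0.2)
```

The augmented list would then be used to train the HMM in place of the original sequences alone.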
Acknowledgements
• National Institutes of Health
Thank you!