Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Materials & Methods
Utilizing fDETECT, we performed a first-of-its-kind analysis of predicted crystallization propensity of 8,652,940 non-redundant proteins from 1,953 fully sequenced proteomes (106 archaea, 1,043 bacteria, 265 eukaryotes and 539 viruses) collected from release 2012_07 of UniProt.
0.1
0.2
0.3
0.4
0.5
0.6
Cry
stal
lizat
ion
pro
pe
nsi
ty
(me
dia
n +
/- 2
5 c
en
tile
)
resolution [Å]
Crystallization score3rd degree polynomial
Empirical tests show that fDETECT predicts crystallization propensity with competitive predictive performance while being six orders of magnitude faster than similarly accurate methods. Our evaluation also shows an interesting trend between structure resolution and the predicted crystallization propensity score; higher score implies, on average, that the crystal resolution of the structure is better.
There are 1,734,048 protein families clustered at 30% sequence identity level (modeling families), in the analyzed 1,953 proteomes. Each superkingdom show very different overall
propensity for crystallization.
0%
20%
40%
60%
80%
100%
0 0.2 0.4 0.6 0.8 1Crystallization propensity score
All SuperkingdomsArchaeaBacteriaEukaryotaViruses Archaea+BacteriaViruses Eukaryota
25
th c
enti
le s
core
on
PD
B
med
ian
sco
re o
n P
DB
75
th c
enti
le s
core
on
PD
B
stru
ctu
ral c
ove
rage
[%
of
clu
ster
s a
bo
ve t
he
sco
re]
Structural coverage using X-ray crystallography for a current snapshot of the protein universe
Marcin J Mizianty1, Xiao Fan1, Jing Yan1, Eric Chalmers1, Christopher Woloschuk1, Andrzej Joachimiak2 & Lukasz Kurgan1,*
1Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada and 2Midwest Center for Structural Genomics, Biosciences Division, Argonne National Laboratory, USA *[email protected]
MCSG
Can three-dimensional structures of all protein families be determined using X-ray crystallography? Utilizing an accurate and time-efficient method for Fast DEtermination of Targets’ Eligibility for CrysTalization (fDETECT), we performed a first-of-its-kind analysis of crystallization propensity for all proteins encoded in nearly 2,000 fully sequenced genomes. We computed the attainable structural coverage that combines use of current crystallization protocols, and homology modeling, by utilizing crystallization propensity scores for target selection.
Introduction
We would like to gratefully acknowledge the travel award from Genome Canada to M.J.M. This work was supported by the Dissertation fellowship awarded by the University of Alberta to M.J.M.; and by the Discovery grant from the NSERC of Canada to L.K.
Acknowledgments
Analysis of the predicted crystallization propensity reveals that three superkingdoms show very different overall propensity for crystallization, with archaea being the easiest to crystallize and eukaryota the hardest. It appears that crystallization propensities of viral proteomes have properties similar to their host proteomes. We also show that current X-ray crystallography know-how combined with homology modeling (using 30% sequence identity cut off) can provide an average structural coverage of 73%. The structural coverage could be increased to 96% by improving homology modeling, if it would produce reliable predictions at 20% sequence identity. Moreover, the use of the knowledge-based target selection strategy significantly increases structural coverage.
Results
Using homology modeling at 30% sequence identity cut-off virtually all bacterial and archaeal proteomes as well as bacteriophages can be structurally covered at above 70%. However,
majority of the eukaryotes and eukaryotic viruses have coverage well below 70%.
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
0 200 400 600 800 1000 1200 1400proteomes
BacteriaArchaeaEukaryotaViruses BacteriaViruses EukaryotaHomo sapiens
stru
ctu
ral c
ove
rage
[%
of
clu
ster
s ab
ove
th
e m
edia
n P
DB
sco
re]
*with homology modeling at 30% seq. ident.
Bacterial proteomes have the widest distribution of crystallization scores that overlap with scores for archaea proteomes and, to a smaller extend, with eukaryotes. Propensities for
archaea and eukaryotes show virtually no overlap.
0
20
40
60
80
100
120
140
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60
nu
mb
er o
f p
rote
om
es
median crystallization propensity score
BacteriaArchaeaEukaryotaViruses 0
2
4
6
0.0
0.1
0.2
0.3
0.4
0.5
0.6
Viruses - BacteriaViruses - Eukaryota
Method Avg. time [ms] AUC
fDETECT 0.8 0.754 PPCpred 152912.9 0.741 XtalPred 70624.4 0.665
CRYSTALP2 0.3 0.658 SVMcrys 153.3 0.615 OBScore 64 0.569 ParCrys N/A 0.557