PHAR 201 Lecture 08, 2012 1
From Reductionism Comes New Science:
Protein Structure Data Reveals How Environmental Pressures
Shape Evolution
PHAR 201/Bioinformatics I
Philip E. Bourne
Department of Pharmacology, UCSD
Introduction
• Previously we reviewed one system of reductionism – SCOP
• SCOP is used to assign superfamilies and families to complete proteomes in another resource called SUPERFAMILY
• Today we will see how this is used to do new science (Dupont et al. PNAS 2006 103(47) 17822-17827; PNAS 2010 doi: 10.1073/pnas.0912491107)
• We cast this new science in the context of the Gaia hypothesis
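The SCOP Hierarchy v1.75
Based on 38,221 Structures
[Figure: the SCOP hierarchy, from top to bottom: 7 classes, 1195 folds, 1962 superfamilies, 3902 families, 110,800 domains]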
The Gaia Hypothesis
Gaia - a complex entity involving the Earth's biosphere, atmosphere, oceans, and soil; the totality constituting a feedback system which seeks an optimal physical and chemical environment for life on this planet.
James Lovelock
Gaia (pronounced /'geɪ.ə/ or /'gaɪ.ə/), "land" or "earth", from the Greek Γαῖα, is a Greek goddess personifying the Earth
We Show Some Support for the Gaia Hypothesis
Emergent properties of an organism have been influenced by the environment
These organisms in turn have influenced the environment
Nature’s Reductionism
There are ~ 20^300 possible proteins >>>> all the atoms in the Universe
11.2M protein sequences from 10,854 species (source RefSeq)
38,221 protein structures yield 1195 domain folds (SCOP 1.75)
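As a rough check on the scale of this comparison, the sketch below computes the size of the sequence space for a 300-residue chain built from 20 amino acids. The figure of about 10^80 atoms in the observable Universe is a commonly quoted order-of-magnitude estimate and is an assumption here, not a number taken from the slides.

```python
import math

AMINO_ACIDS = 20   # standard amino acid alphabet
LENGTH = 300       # chain length used for the sequence-space estimate
ATOMS_LOG10 = 80   # assumed: ~10^80 atoms, a commonly quoted estimate


def log10_sequence_space(alphabet: int, length: int) -> float:
    """Return log10(alphabet ** length) without building the huge integer."""
    return length * math.log10(alphabet)


space_log10 = log10_sequence_space(AMINO_ACIDS, LENGTH)
excess_log10 = space_log10 - ATOMS_LOG10

print(f"Sequence space is roughly 10^{space_log10:.0f}")
print(f"That exceeds the assumed atom count by a factor of roughly 10^{excess_log10:.0f}")
```

Working in log10 keeps the arithmetic readable and makes the gap to the 1195 observed folds easy to appreciate.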
What Does Nature’s Reductionism Tell Us?
• The advent of a new fold is a big deal
• From new folds come new function(s)
• Are these new folds enough to distinguish “species”?
To Answer this Question We Only Need to Make Use of Existing Resources
• SCOP – Further catalogs Nature’s reductionism into structural domains, folds, families and superfamilies
• SUPERFAMILY assigns the above to fully sequenced proteomes
Method – Distance Determination
SCOP fold superfamilies (FSF) are mapped to organisms through SUPERFAMILY

Presence/Absence Data Matrix
FSF       C. intestinalis   C. briggsae   F. rubripes
a.1.1     1                 1             1
a.1.2     1                 1             1
a.10.1    0                 0             1
a.100.1   1                 1             1
a.101.1   0                 0             0
a.102.1   0                 1             1
a.102.2   1                 1             1

Distance Matrix
                  C. intestinalis   C. briggsae   F. rubripes
C. intestinalis   0                 101           109
C. briggsae                         0             144
F. rubripes                                       0
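The step from a presence/absence matrix to a distance matrix can be sketched as follows. The slides do not state which distance measure was used, so the Hamming distance (the number of superfamilies present in one organism but not the other) is an assumption chosen for illustration. The function names are hypothetical. Only the seven rows shown on the slide are used, so the output will not reproduce the 101, 109 and 144 values, which come from the full superfamily set.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Superfamily ids and presence (1) / absence (0) values copied from the slide excerpt.
SUPERFAMILIES = ["a.1.1", "a.1.2", "a.10.1", "a.100.1", "a.101.1", "a.102.1", "a.102.2"]
PROFILES: Dict[str, List[int]] = {
    "C. intestinalis": [1, 1, 0, 1, 0, 0, 1],
    "C. briggsae":     [1, 1, 0, 1, 0, 1, 1],
    "F. rubripes":     [1, 1, 1, 1, 0, 1, 1],
}


def hamming(a: List[int], b: List[int]) -> int:
    """Count positions where two presence/absence profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same superfamilies")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(profiles: Dict[str, List[int]]) -> Dict[Tuple[str, str], int]:
    """Build a symmetric pairwise distance table keyed by organism pair."""
    dist = {(name, name): 0 for name in profiles}
    for left, right in combinations(profiles, 2):
        d = hamming(profiles[left], profiles[right])
        dist[(left, right)] = dist[(right, left)] = d
    return dist


distances = distance_matrix(PROFILES)
for left, right in combinations(PROFILES, 2):
    print(f"{left} vs {right}: {distances[(left, right)]}")
```

A table like this can be passed to any standard distance-based tree-building method.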
The Answer Would Appear to be Yes
• It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies within a given proteome
Yang, Doolittle and Bourne 2005 PNAS 102(2): 373-378
Moreover… Distribution among the three kingdoms as taken from SUPERFAMILY
• Superfamily distributions would seem to be related to the complexity of life
• Update of the work of Caetano-Anollés and Caetano-Anollés (2003) Genome Research 13:1563
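[Figure: Venn diagram of SCOP folds (765 total) across the three kingdoms]
Eukaryota (650), Archaea (416), Bacteria (564)
Archaea only: 2
Bacteria only: 42
Eukaryota only: 135
Archaea and Eukaryota: 10
Eukaryota and Bacteria: 118
Archaea and Bacteria: 17
All three: 387
[Figure: second Venn diagram, region counts given as any genome / all genomes: 153/14, 9/1, 21/2, 310/0, 645/49, 29/0, 68/0]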
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
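Counts of this kind can be derived with plain set operations. The sketch below is illustrative only: the genome contents and fold identifiers are hypothetical placeholders, and the function names are not taken from SUPERFAMILY. It shows the "any genome / all genomes" distinction (union versus intersection within a kingdom) and the seven regions of a three-kingdom Venn diagram.

```python
from typing import Dict, List, Set, Tuple


def any_and_all(genomes: List[Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return folds seen in at least one genome and folds seen in every genome."""
    if not genomes:
        return set(), set()
    present_any = set().union(*genomes)
    present_all = set.intersection(*genomes)
    return present_any, present_all


def venn_regions(archaea: Set[str], bacteria: Set[str], eukaryota: Set[str]) -> Dict[str, int]:
    """Count folds in each exclusive region of a three-set Venn diagram."""
    a, b, e = archaea, bacteria, eukaryota
    return {
        "Archaea only": len(a - b - e),
        "Bacteria only": len(b - a - e),
        "Eukaryota only": len(e - a - b),
        "Archaea and Bacteria": len((a & b) - e),
        "Archaea and Eukaryota": len((a & e) - b),
        "Bacteria and Eukaryota": len((b & e) - a),
        "All three": len(a & b & e),
    }


# Hypothetical toy genomes; "f1".."f6" are placeholder fold ids.
archaea_genomes = [{"f1", "f2"}, {"f1", "f3"}]
bacteria_genomes = [{"f1", "f4"}, {"f1", "f4", "f5"}]
eukaryota_genomes = [{"f1", "f3", "f6"}, {"f1", "f6"}]

archaea_any, archaea_all = any_and_all(archaea_genomes)
bacteria_any, _ = any_and_all(bacteria_genomes)
eukaryota_any, _ = any_and_all(eukaryota_genomes)

regions = venn_regions(archaea_any, bacteria_any, eukaryota_any)
for name, count in regions.items():
    print(f"{name}: {count}")
```

Because the regions are mutually exclusive, their counts sum to the number of distinct folds seen in any kingdom.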
The Unique Superfamily in Archaea – d.17.6
• Archaeosine tRNA-guanine transglycosylase (tgt), C2 domain
• First step in the biosynthesis of an archaea-specific modified base, archaeosine (7-formamidino-7-deazaguanosine)
• Found in tRNAs
• At present found exclusively in Archaea. Reference: InterPro IPR004804
Let us Take This a Step Further
Consider the Distribution of Disulfide Bonds among Folds
• Disulfides are only stable under oxidizing conditions
• Oxygen content gradually accumulated during the earth’s evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~ 2.0 billion years ago
• Logical deduction – disulfides more prevalent in folds (organisms) that evolved later
• This would seem to hold true
• Can we take this further?
[Figure: Venn diagram of SCOP folds (708 total) across the three kingdoms, with the percentage of folds containing disulfide bonds in each region]
Archaea only: 0% (0/2)
Bacteria only: 16.7% (7/42)
Eukaryota only: 31.9% (43/135)
Archaea and Eukaryota: 0% (0/10)
Eukaryota and Bacteria: 14.4% (17/118)
Archaea and Bacteria: 5.9% (1/17)
All three: 4.7% (18/387)
Recap So Far
• Structure is a useful tool to study evolution since it is conserved over longer periods of geological time
• A coarse-grained characterization of structure, namely superfamily, distinguishes between species
• There is a tantalizing suggestion that proteomes may contain imprints of their ancient environment
Consider Changes in Metal Ion Concentrations
Chris Dupont, Scripps Institution of Oceanography (now JCVI)
Bioinformatics Final Exam 2004
Dupont, Yang, Palenik, Bourne. PNAS 2006 103(47) 17822-17827; PNAS 2010 doi: 10.1073/pnas.0912491107
Evolution of the Earth
• 4.5 billion years of change
• 300 ± 50 K
• 1-5 atmospheres
• Constant photoenergy
• Chemical and geological changes
• Life has evolved in this time
• The ocean was the “cradle” for 90% of evolution
Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History
• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
• The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.
[Figure: concentration (O2 in arbitrary units, Zn and Fe in moles L-1) versus billions of years before present (4.5 to 0), with curves for oxygen, zinc, iron, cobalt and manganese, and diversification periods marked for Bacteria, Archaea and Eukarya]
Replotted from Saito et al, 2003 Inorganica Chimica Acta 356: 308-318
Making the Metallome of Each Species – Can Only be Done from Structure
1. Start with SCOP
2. Each {super}family level assignment was checked manually for metal binding
3. All the structures representing the family had to bind the metal for it to be considered unambiguous
4. The literature was consulted to resolve ambiguities
5. SUPERFAMILY database used to map to proteomes
6. 23 Archaea, 233 Bacteria, 57 Eukaryota
7. Cu, Ni, Mo ignored (<0.3% of proteome)
Levels of Ambiguity
• An ambiguous superfamily binds different metals or has members that are not known to bind metals
• Ditto families
• Approx 50% of superfamilies and 10% of families are ambiguous
• Only unambiguous families used in this study
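The ambiguity rule can be expressed as a small check. This is a simplified sketch, not the procedure used in the study, which relied on manual inspection and the literature. It assumes each family is described by a list of representative structures, each annotated with the set of metals it binds, and the function name `unambiguous_metal` is hypothetical.

```python
from typing import Optional, Sequence, Set


def unambiguous_metal(structures: Sequence[Set[str]]) -> Optional[str]:
    """Return the metal bound by every structure in a family, or None if ambiguous.

    A family counts as unambiguous only when every representative structure
    binds the same single metal. Families with no structures, with members
    that bind no metal, or with members that bind different metals yield None.
    """
    if not structures:
        return None
    first = structures[0]
    if len(first) != 1:
        return None
    if all(bound == first for bound in structures):
        return next(iter(first))
    return None


# Hypothetical annotations for illustration only.
examples = {
    "all structures bind Fe": [{"Fe"}, {"Fe"}, {"Fe"}],
    "mixed Fe and Zn": [{"Fe"}, {"Zn"}],
    "one member binds nothing": [{"Fe"}, set()],
}
for label, structures in examples.items():
    print(f"{label}: {unambiguous_metal(structures)}")
```

Only families for which the check returns a metal would be carried forward to the proteome mapping.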
Superfamily Distribution As Well As Overall Content Has Changed
Bacteria Fe superfamilies
a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1
b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2
c.56.6, c.83.1, c.96.1
d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1
e.18.1, e.19.1, e.26.1, e.5.1
f.21.1, f.21.2, f.24.1, f.26.1
g.35.1, g.36.1, g.41.5
Eukaryotic Fe superfamilies
a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1
b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2
c.56.6, c.83.1, c.96.1
d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1
e.18.1, e.19.1, e.26.1, e.5.1
f.21.1, f.21.2, f.24.1, f.26.1
g.35.1, g.36.1, g.41.5
Metallomes are Very Diverse (Discriminatory)
• A quantile plot showing the percent of Bacterial proteomes each Fe-binding fold family occurs in (x).
• This plot also shows the average copy number of that fold family in the proteomes where it occurs (♦).
• Few Fe-binding folds are in most proteomes.
• Widespread Fe-binding folds are not necessarily abundant.
• Similar trends are observed for Zn, Mn, and Co in all three Superkingdoms.
[Figure: quantile plot over the unique Fe-binding fold families (108 total); (x) percent of Bacterial proteomes in which a fold family occurs (0 to 100); (♦) average copy number (0 to 14)]
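The two quantities plotted for each fold family can be computed as sketched below. The input layout (a mapping from proteome to per-family copy numbers), the toy values and the function name `family_stats` are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def family_stats(copy_numbers: Dict[str, Dict[str, int]]) -> Dict[str, Tuple[float, float]]:
    """For each family, return (percent of proteomes containing it,
    average copy number in the proteomes where it occurs)."""
    n_proteomes = len(copy_numbers)
    families = {
        fam
        for counts in copy_numbers.values()
        for fam, copies in counts.items()
        if copies > 0
    }
    stats = {}
    for fam in sorted(families):
        # Copy numbers only from proteomes in which the family is present.
        present = [c[fam] for c in copy_numbers.values() if c.get(fam, 0) > 0]
        percent = 100.0 * len(present) / n_proteomes
        stats[fam] = (percent, sum(present) / len(present))
    return stats


# Hypothetical toy data: one widespread low-copy family, one rare high-copy family.
toy = {
    "proteome_1": {"fam_widespread": 1, "fam_rare": 8},
    "proteome_2": {"fam_widespread": 1},
    "proteome_3": {"fam_widespread": 1},
    "proteome_4": {"fam_widespread": 1},
}
stats = family_stats(toy)
for fam, (percent, mean_copies) in stats.items():
    print(f"{fam}: in {percent:.0f}% of proteomes, mean copy number {mean_copies:.1f}")
```

The toy data mirrors the point made on the slide: a family can be present everywhere at low copy number, while a rare family can be abundant where it occurs.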
Metal Binding Proteins are Not Consistent Across Superkingdoms
[Figure, two panels (A, B): total Zn-binding domains in a proteome (10 to 10^4) versus total domains in a proteome (10^2.5 to 10^5); slope of fitted power law (0 to 2) for Zn, Fe, Mn and Co in Archaea, Bacteria and Eukarya]
Since these data are derived from current species they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis
Power Laws: Fundamental Constants in the Evolution of Proteomes
A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained in the case of genome reductions).
van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP (Eds.). Power laws, scale-free networks, and genome biology
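A power law y = c * x^k appears as a straight line of slope k on log-log axes, so the exponent can be estimated by a linear fit to log-transformed counts. The sketch below uses ordinary least squares, which is an assumption, since the slides do not state the fitting procedure. The data are synthetic and the function name `power_law_slope` is hypothetical.

```python
import math
from typing import Sequence


def power_law_slope(total_domains: Sequence[float], group_domains: Sequence[float]) -> float:
    """Estimate k in group = c * total**k by least squares on log10 values.

    All counts must be positive, because zero counts cannot be log-transformed.
    """
    xs = [math.log10(x) for x in total_domains]
    ys = [math.log10(y) for y in group_domains]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic proteome sizes (total domain counts), for illustration only.
totals = [1000.0, 2000.0, 4000.0, 8000.0]
in_step = [0.02 * t for t in totals]             # grows in proportion to the genome
expanding = [0.001 * t ** 1.5 for t in totals]   # preferentially duplicated

slope_in_step = power_law_slope(totals, in_step)
slope_expanding = power_law_slope(totals, expanding)
print(f"proportional group: slope = {slope_in_step:.2f}")
print(f"expanding group:    slope = {slope_expanding:.2f}")
```

Under this reading, a fitted slope near 1 corresponds to a domain group keeping pace with genome growth, and a slope above 1 to a group that expands faster than the genome.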
Why are the Power Laws Different for Each Superkingdom?
• Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power law slopes describing Eukarya and Prokarya are correlated to the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen
• We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor in each Superkingdom
• This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic environments
Do the Metallomes Contain Further Support for this Hypothesis?
Superkingdom   Fold family                    %             Fe-binding   O2
Eukarya        Cytochrome P450                0.44 ± 0.48   heme         yes
Eukarya        Cytochrome c3-like             0.13 ± 0.3    heme         no
Eukarya        Cytochrome b5                  0.12 ± 0.09   heme         no
Eukarya        Purple acid phosphatase        0.11 ± 0.08   amino        no
Eukarya        Penicillin synthase-like       0.07 ± 0.1    amino        yes
Eukarya        Hypoxia-inducible factor       0.07 ± 0.04   amino        yes
Eukarya        Di-heme elbow motif            0.06 ± 0.01   heme         no
Archaea        4Fe-4S ferredoxins             1.80 ± 0.7    Fe-S         no
Archaea        MoCo biosynthesis proteins     1.60 ± 0.3    Fe-S         no
Archaea        Heme-binding PAS domain        1.10 ± 1.0    heme         no
Archaea        HemN                           0.80 ± 0.20   Fe-S         see note 1
Archaea        α-helical ferredoxin           0.60 ± 0.16   Fe-S         no
Archaea        biotin synthase                0.55 ± 0.1    Fe-S         no
Archaea        ROO N-terminal domain-like     0.5 ± 0.1     amino        see note 2
Bacteria       High potential iron protein    0.38 ± 0.25   Fe-S         no
Bacteria       Heme-binding PAS domain        0.3 ± 0.4     heme         see note 1
Bacteria       MoCo biosynthesis proteins     0.21 ± 0.15   Fe-S         no
Bacteria       HemN                           0.2 ± 0.15    Fe-S         no
Bacteria       4Fe-4S ferredoxins             0.2 ± 0.2     Fe-S         no
Bacteria       cytochrome c                   0.14 ± 0.2    heme         no
Bacteria       α-helical ferredoxin           0.12 ± 0.09   Fe-S         no

Overall percent of Fe bound by:
Superkingdom   Fe-S      heme      amino
Eukarya        21 ± 9    47 ± 19   32 ± 12
Archaea        68 ± 12   13 ± 14   19 ± 6
Bacteria       47 ± 11   22 ± 12   31 ± 16

1. Some, but not all, PAS domains actually sense oxygen
2. The Rubredoxin oxygen:oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway
e- Transfer Proteins
Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
Fe-S clusters
• Fe bound by S
• Cluster held in place by Cys
• Generally negative reduction potentials
• Very susceptible to oxidation
Cytochromes
• Fe bound by heme (and amino acids)
• Generally positive reduction potentials
• Less susceptible to oxidation
The importance of “small class” Zn folds to Eukarya
[Figure, two panels (A, B): total “small class” Zn binding domains (1 to 10,000) versus total number of domains in a proteome (100 to 100,000), on log-log axes; Venn diagram of the distribution of 53 unique small class Zn families]
Venn diagram counts: Archaea only 0/53, 1/28; Eukarya only 30/53, 18/28; Bacteria only 0/53, 0/28; shared regions 5/53, 0/28; 11/53, 9/28; 7/53, 0/28; 0/53, 0/28
[Figure: theoretical levels of oxygen, zinc, iron, cobalt and manganese in the deep ocean versus billions of years before present (4.5 to 0), with Bacteria, Archaea and Eukarya marked]
Hypothesis
• Emergence of cyanobacteria changed oxygen concentrations
• Impacted metal concentrations in the ocean
• Organisms used new metals in new ways to evolve new biological processes, e.g. complex signaling
• This in turn further impacted the environment
A Final Thought
Perhaps We Should Study Both the Life Sciences and Earth Sciences Together?