A lecture in BILD94 at UCSD on introducing undergraduates to various aspects of bioinformatics.
UCSD BILD 94
Bioinformatics in the Bourne Lab
Philip E. Bourne
BILD 94, May 3, 2012
August 14, 2009
5/3/12
Some Personal Background…
The Life of One Scientist – The Early Years
So That You Might Not Make the Same Mistakes
• My high school teacher Mr. Wilson said I would be a failure at chemistry
• My PhD is in chemistry
• The opportunity to live in different places shaped my life
• Good friends are forever
40+ Years Later
Ten Simple Rules for Starting a Company, PLoS Comp Biol 2012 8(3) 1002439
PhD in Physical Chemistry
Always Loved Computing
Circa 1974
Postdoctoral Work – The Molecular Basis of How the Body Works
• Regrets: never learnt another language
Post Doc
Some Things Stay with You Your Whole Life
Senior Scientist, HHMI, Columbia University, New York
• Driven not by career but by wanting to live in New York City
~1990: Got Involved with the Human Genome
• Was only possible by applying computers to problems in biology
• Developed algorithms to support physical and genetic mapping of Chr 13
Came to UCSD to Apply Computers to Big Biological Problems
• Possibly the best place in the world to do computational biology
The Protein Kinase Family
• A large family important to signal transduction in eukaryotes and many bacteria
• Phosphotransferases: transfer a phosphate group from ATP to a Ser/Thr or Tyr residue on the target protein, producing a range of downstream signaling effects
• PKA: an example of the typical protein kinase (TPK) fold, shown in “open book” format
Sometimes Ya Got to Just Do It Yourself
The Growth of Data Is a Major Driver in Biology
[Figure: number of released entries per year]
Big Research Questions in the Lab
1. Can we improve how science is disseminated and comprehended?
2. What is the ancestry of the protein structure universe and what can we learn from it?
3. Are there alternative ways to represent proteins from which we can learn something new?
4. What really happens when we take a drug?
5. Can we contribute to the treatment of neglected {tropical} diseases?
Studying Evolution Through Structure
Nature’s Reductionism
There are ~20^300 possible proteins, far more than all the atoms in the Universe
11.2M protein sequences from 10,854 species (source RefSeq)
38,221 protein structures yield 1195 domain folds (SCOP 1.75)
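A quick log-scale check of the "~20^300" claim. We read the exponent as 20 amino acid choices at each of 300 positions (a chain of about 300 residues), and we compare against the commonly quoted rough estimate of ~10^80 atoms in the observable universe; both readings are our assumptions, not figures from the slide.

```python
import math

def log10_sequence_space(length: int, alphabet: int = 20) -> float:
    """log10 of the number of distinct sequences of a given length."""
    return length * math.log10(alphabet)

# Assumed comparison point: ~1e80 atoms, a commonly quoted rough estimate.
ATOMS_IN_UNIVERSE_LOG10 = 80

exponent = log10_sequence_space(300)
print(f"20^300 is about 10^{exponent:.1f}")   # 20^300 is about 10^390.3
print(exponent > ATOMS_IN_UNIVERSE_LOG10)     # True
```

Working in log10 avoids building the 391-digit integer, although Python could also compute `20 ** 300` exactly.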
Initial Question: With the current structural coverage of proteomes, and assuming we know a high percentage of all folds, is structure a useful discriminator of species?
Chapter 2 Initial Findings
Russ Doolittle, Professor, Center for Molecular Genetics, UCSD
Song Yang, Post Doc, UC Berkeley
Department of Chemistry and Biochemistry, UCSD
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
To Answer this Question We Only Need to Make Use of Existing Resources
• SCOP – Further catalogs Nature’s reductionism into structural domains, folds, families and superfamilies
• SUPERFAMILY assigns the above to fully sequenced proteomes
The SCOP Hierarchy v1.75, Based on 38,221 Structures
• Classes: 7
• Folds: 1,195
• Superfamilies: 1,962
• Families: 3,902
• Domains: 110,800
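SCOP labels each superfamily with a concise classification string (sccs) of the form class.fold.superfamily, such as a.1.1 or d.58.1. A minimal Python sketch, with hypothetical helper names, that splits these codes into hierarchy levels and groups superfamilies under their fold:

```python
from collections import defaultdict

LEVELS = ("class", "fold", "superfamily", "family")

def parse_sccs(sccs: str) -> dict:
    """Split a SCOP concise classification string such as 'a.1.1' into levels."""
    return dict(zip(LEVELS, sccs.split(".")))

def group_by_fold(sccs_codes):
    """Group superfamily codes under their 'class.fold' prefix."""
    folds = defaultdict(list)
    for code in sccs_codes:
        p = parse_sccs(code)
        folds[f"{p['class']}.{p['fold']}"].append(code)
    return dict(folds)

codes = ["a.1.1", "a.1.2", "a.10.1", "a.102.1", "a.102.2"]
print(group_by_fold(codes))
# {'a.1': ['a.1.1', 'a.1.2'], 'a.10': ['a.10.1'], 'a.102': ['a.102.1', 'a.102.2']}
```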
Is Structure a Useful Discriminator of Species? Maybe…
Distribution among the three kingdoms, as taken from SUPERFAMILY
• Superfamily distributions would seem to be related to the complexity of life
• Update of the work of Caetano-Anollés & Caetano-Anollés (2003) Genome Research 13:1563
[Figure: Venn diagram of SCOP folds (765 total) across Eukaryota (650), Archaea (416) and Bacteria (564), with counts for folds found in any genome / all genomes of each kingdom]
Method – Distance Determination
Presence/Absence Data Matrix (rows: SCOP superfamilies (FSF) assigned by SUPERFAMILY; columns: organisms)
FSF | C. intestinalis | C. briggsae | F. rubripes
a.1.1 | 1 | 1 | 1
a.1.2 | 1 | 1 | 1
a.10.1 | 0 | 0 | 1
a.100.1 | 1 | 1 | 1
a.101.1 | 0 | 0 | 0
a.102.1 | 0 | 1 | 1
a.102.2 | 1 | 1 | 1
Distance Matrix
 | C. intestinalis | C. briggsae | F. rubripes
C. intestinalis | 0 | 101 | 109
C. briggsae | | 0 | 144
F. rubripes | | | 0
Chapter 2 Initial Findings
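The slide does not spell out the distance metric. The integer entries (101, 109, 144) are consistent with a simple count of superfamilies present in one proteome but absent from the other, so this sketch assumes that Hamming-style count; the published method may weight or normalize differently. On only the seven rows shown, the distances are 1, 2 and 1 rather than the full-proteome values.

```python
from itertools import combinations

# Presence (1) / absence (0) of the seven superfamilies shown on the slide,
# in the order a.1.1, a.1.2, a.10.1, a.100.1, a.101.1, a.102.1, a.102.2.
profiles = {
    "C. intestinalis": [1, 1, 0, 1, 0, 0, 1],
    "C. briggsae":     [1, 1, 0, 1, 0, 1, 1],
    "F. rubripes":     [1, 1, 1, 1, 0, 1, 1],
}

def presence_absence_distance(x, y):
    """Count superfamilies present in one proteome but absent from the other."""
    return sum(a != b for a, b in zip(x, y))

def distance_matrix(profiles):
    """Upper triangle of the pairwise distance matrix, keyed by species pair."""
    return {
        (s, t): presence_absence_distance(profiles[s], profiles[t])
        for s, t in combinations(profiles, 2)
    }

for pair, d in distance_matrix(profiles).items():
    print(pair, d)
```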
Is Structure a Useful Discriminator of Species? Yes
[Figure: resulting tree of species, with branches labeled Archaea, Bacteria and Eukaryota]
The method cleanly placed all species in their correct superkingdoms
The Answer Would Appear to be Yes
• It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies (FSFs) within a given proteome
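One minimal way to turn such a distance matrix into a tree is UPGMA (average-linkage clustering). The slides do not name the tree-building method used, so treat this as a stand-in rather than the published procedure. `upgma` is a hypothetical helper that returns a Newick-style topology without branch lengths; here it is applied to the three-species distances from the distance-determination slide (101, 109, 144).

```python
def upgma(pairwise):
    """Cluster taxa by UPGMA; `pairwise` maps (name_a, name_b) to a distance."""
    size = {}   # current cluster label -> number of taxa it contains
    dist = {}   # frozenset of two cluster labels -> distance between them
    for (a, b), value in pairwise.items():
        size[a] = size[b] = 1
        dist[frozenset((a, b))] = float(value)
    while len(size) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # Size-weighted average of the two old distances (UPGMA update).
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        # Drop every distance that still refers to the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"

distances = {
    ("C. intestinalis", "C. briggsae"): 101,
    ("C. intestinalis", "F. rubripes"): 109,
    ("C. briggsae", "F. rubripes"): 144,
}
print(upgma(distances))  # ((C. briggsae,C. intestinalis),F. rubripes);
```

C. intestinalis and C. briggsae join first because 101 is the smallest distance; F. rubripes then attaches at the averaged distance (109 + 144) / 2 = 126.5.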
Environmental Influence
Chris Dupont, Scripps Institution of Oceanography, UCSD
Dupont, Yang, Palenik, Bourne 2006 PNAS 103(47) 17822-17827
Consider the Distribution of Disulfide Bonds among Folds
• Disulfides are only stable under oxidizing conditions
• Oxygen content gradually accumulated during the earth’s evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~ 2.0 billion years ago
• Logical deduction: disulfides should be more prevalent in folds (organisms) that evolved later
• This would seem to hold true
• Can we take this further?
[Figure: Venn diagram of the percentage of SCOP folds (708 total) containing disulfides in each region shared among Eukaryota, Archaea and Bacteria: 0% (0/2), 16.7% (7/42), 0% (0/10), 31.9% (43/135), 14.4% (17/118), 4.7% (18/387), 5.9% (1/17)]
Evolution of the Earth
• 4.5 billion years of change
• 300 ± 50 K
• 1-5 atmospheres
• Constant photoenergy
• Chemical and geological changes
• Life has evolved in this time
• The ocean was the “cradle” for 90% of evolution
Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History
• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated; therefore both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
• The phylogenetic tree symbols at the top of the figure show one idea of the theoretical periods of diversification for each superkingdom.
[Figure: concentrations of oxygen (arbitrary units) and of zinc, iron, cobalt and manganese (mol L⁻¹) in the deep ocean from 4.5 billion years before present to today, with the diversification periods of Bacteria, Archaea and Eukarya marked]
Replotted from Saito et al. 2003, Inorganica Chimica Acta 356: 308-318
Bacteria Fe superfamilies
a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5
Eukaryotic Fe superfamilies
a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5
Superfamily Distribution As Well As Overall Content Has Changed
Hypothesis
• Emergence of cyanobacteria changed oxygen concentrations
• Impacted metal concentrations in the ocean
• Organisms used new metals in new ways to evolve new biological processes, e.g., complex signaling
• This in turn further impacted the environment
Big Research Questions in the Lab
1. Can we improve how science is disseminated and comprehended?
2. What is the ancestry of the protein structure universe and what can we learn from it?
3. Are there alternative ways to represent proteins from which we can learn something new?
4. What really happens when we take a drug?
5. Can we contribute to the treatment of neglected {tropical} diseases?
Our Motivation
• Tykerb – Breast cancer
• Gleevec – Leukemia, GI cancers
• Nexavar – Kidney and liver cancer
• Staurosporine – Natural product (alkaloid) with many uses, e.g., antifungal, antihypertensive
Collins and Workman 2006 Nature Chemical Biology 2: 689-700
Our Broad Approach
• Involves the fields of:
– Structural bioinformatics
– Cheminformatics
– Biophysics
– Systems biology
– Pharmaceutical chemistry
• L. Xie, L. Xie, S.L. Kinnings and P.E. Bourne 2012 Novel Computational Approaches to Polypharmacology as a Means to Define Responses to Individual Drugs, Annual Review of Pharmacology and Toxicology 52: 361-379
• L. Xie, S.L. Kinnings, L. Xie and P.E. Bourne 2012 Predicting the Polypharmacology of Drugs: Identifying New Uses Through Bioinformatics and Cheminformatics Approaches in Drug Repurposing M. Barrett and D. Frail (Eds.) Wiley and Sons. (available upon request)
Approach: Need to Start with a 3D Drug-Receptor Complex, Either Experimental or Modeled
Drug | Other Name | Treatment | PDB IDs
Lipitor | Atorvastatin | High cholesterol | 1HWK, 1HW8…
Testosterone | Testosterone | Osteoporosis | 1AFS, 1I9J ..
Taxol | Paclitaxel | Cancer | 1JFF, 2HXF, 2HXH
Viagra | Sildenafil citrate | Erectile dysfunction (ED), pulmonary arterial hypertension | 1TBF, 1UDT, 1XOS..
Digoxin | Lanoxin | Congestive heart failure | 1IGJ
A Reverse Engineering Approach to Drug Discovery Across Gene Families
Characterize ligand binding site of primary target (Geometric Potential)
Identify off-targets by ligand binding site similarity (Sequence order independent profile-profile alignment)
Extract known drugs or inhibitors of the primary and/or off-targets
Search for similar small molecules
Dock molecules to both primary and off-targets
Statistical analysis of docking score correlations
…
Xie and Bourne 2009 Bioinformatics 25(12) 305-312
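Characterization of the Ligand Binding Site: The Geometric Potential
• Initially assign each Cα atom a value that is its distance to the environmental boundary
• Update the value with those of the surrounding Cα atoms, weighted by distance and orientation; atoms within a 10 Å radius define the neighbors i
$$GP = P + \sum_{i \in \text{neighbors}} \frac{P_i}{D_i + 1.0} \cdot \frac{\cos(\alpha_i) + 1.0}{2.0}$$
where P is the initial value of the central Cα atom, P_i is the initial value of neighbor i, D_i is the distance between the two atoms, and α_i is the angle describing their relative orientation
Conceptually similar to hydrophobicity or electrostatic potential: dependent on both global and local environments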
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
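The geometric potential update translates directly into code. In this sketch each neighbor's initial value P_i, distance D_i (Å) and orientation angle α_i (radians) are assumed to be precomputed; `Neighbor` and `geometric_potential` are hypothetical names, and the exact definition of α_i should be taken from Xie and Bourne (2007).

```python
import math
from dataclasses import dataclass

@dataclass
class Neighbor:
    p: float      # initial value P_i of the neighboring C-alpha atom
    d: float      # distance D_i (Angstrom) to the central C-alpha atom
    alpha: float  # orientation angle alpha_i in radians (assumed precomputed)

def geometric_potential(p: float, neighbors: list[Neighbor], cutoff: float = 10.0) -> float:
    """GP = P + sum over neighbors within the cutoff of
    P_i / (D_i + 1.0) * (cos(alpha_i) + 1.0) / 2.0."""
    total = p
    for n in neighbors:
        if n.d <= cutoff:  # only atoms inside the 10 Angstrom radius contribute
            total += n.p / (n.d + 1.0) * (math.cos(n.alpha) + 1.0) / 2.0
    return total

# A neighbor facing the same way (alpha = 0) contributes fully;
# one facing the opposite way (alpha = pi) contributes nothing.
print(geometric_potential(1.0, [Neighbor(2.0, 1.0, 0.0), Neighbor(5.0, 3.0, math.pi)]))  # 2.0
```

The orientation factor (cos α_i + 1) / 2 rescales the cosine to the range 0 to 1, and the D_i + 1.0 term keeps the weight finite for very close neighbors.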
Discrimination Power of the Geometric Potential
[Figure: distribution of geometric potential values (0 to 100) for binding-site versus non-binding-site residues]
• Geometric potential can distinguish binding and non-binding sites
[Figure: geometric potential scale (100 to 0) for residue clusters]
Local Sequence-order Independent Alignment with Maximum-Weight Sub-Graph Algorithm
[Figure: graph representations of residues (L, E, R, V, K, D, L) in Structure A and Structure B]
• Build an association graph from the graph representations of the two structures being compared; each node is assigned a weight from the similarity matrix
• The maximum-weight clique corresponds to the optimal alignment of the two structures
Xie and Bourne 2008 PNAS, 105(14) 5441
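A toy version of the association-graph idea. Each node pairs residue i of structure A with residue j of structure B; two nodes are joined when the pairs are geometrically compatible (the A-side and B-side inter-residue distances agree within a tolerance). Residue order never enters, which is what makes the alignment sequence-order independent. As stand-ins for the published method, this sketch uses a hypothetical identity score instead of the similarity matrix, an assumed 1.0 Å tolerance, and plain Bron–Kerbosch enumeration of maximal cliques (exponential, but fine for a handful of residues).

```python
import math
from itertools import product

def identity_score(x: str, y: str) -> float:
    """Hypothetical node weight: 1.0 for identical residue types, else 0.0."""
    return 1.0 if x == y else 0.0

def association_graph(res_a, res_b, score=identity_score, tol=1.0):
    """Residues are (one_letter_code, (x, y, z)) tuples. Returns node weights
    and adjacency sets for the association graph of the two structures."""
    weight = {}
    for i, j in product(range(len(res_a)), range(len(res_b))):
        w = score(res_a[i][0], res_b[j][0])
        if w > 0:  # keep only positively scored residue pairs
            weight[(i, j)] = w
    adj = {node: set() for node in weight}
    for (i, j), (k, l) in product(weight, weight):
        if i != k and j != l:
            d_a = math.dist(res_a[i][1], res_a[k][1])
            d_b = math.dist(res_b[j][1], res_b[l][1])
            if abs(d_a - d_b) <= tol:  # same spacing in both structures
                adj[(i, j)].add((k, l))
    return weight, adj

def max_weight_clique(weight, adj):
    """Enumerate maximal cliques (Bron-Kerbosch) and keep the heaviest.
    With positive weights, the heaviest clique is always a maximal one."""
    best_weight, best_clique = 0.0, set()

    def expand(r, p, x):
        nonlocal best_weight, best_clique
        if not p and not x:  # r cannot be extended: it is a maximal clique
            w = sum(weight[n] for n in r)
            if w > best_weight:
                best_weight, best_clique = w, set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(weight), set())
    return best_weight, best_clique

# Structure B holds the same three residues as A, shifted and listed in another order.
struct_a = [("L", (0.0, 0.0, 0.0)), ("E", (3.8, 0.0, 0.0)), ("R", (7.6, 0.0, 0.0))]
struct_b = [("R", (17.6, 0.0, 0.0)), ("L", (10.0, 0.0, 0.0)), ("E", (13.8, 0.0, 0.0))]
w, clique = max_weight_clique(*association_graph(struct_a, struct_b))
print(w, sorted(clique))  # 3.0 [(0, 1), (1, 2), (2, 0)]
```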
Similarity Matrix of Alignment
Chemical Similarity
• Amino acid grouping: (LVIMC), (AGSTP), (FYW), and (EDNQKRH)
• Amino acid chemical similarity matrix
Evolutionary Correlation
• Amino acid substitution matrix such as BLOSUM45
• Similarity score between two sequence profiles:
$$d = \sum_{i} f_{a,i} S_{b,i} + \sum_{i} f_{b,i} S_{a,i}$$
where f_a, f_b are the 20 amino acid target frequencies of profiles a and b, respectively, and S_a, S_b are the PSSMs of profiles a and b, respectively
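Both similarity ingredients are small enough to sketch. This assumes each profile column is given as two length-20 lists (target frequencies and PSSM scores) in the same amino acid order; the function names are hypothetical, and the grouping is the four chemical classes listed on the slide.

```python
CHEMICAL_GROUPS = ["LVIMC", "AGSTP", "FYW", "EDNQKRH"]

def same_chemical_group(a: str, b: str) -> bool:
    """True if two one-letter amino acid codes fall in the same chemical group."""
    return any(a in g and b in g for g in CHEMICAL_GROUPS)

def profile_profile_score(f_a, s_a, f_b, s_b) -> float:
    """d = sum_i f_a[i] * S_b[i] + sum_i f_b[i] * S_a[i] over the 20 amino acids."""
    if not (len(f_a) == len(s_a) == len(f_b) == len(s_b) == 20):
        raise ValueError("each profile column needs 20 amino acid values")
    return sum(fa * sb for fa, sb in zip(f_a, s_b)) + sum(fb * sa for fb, sa in zip(f_b, s_a))

# Toy columns: only the first two amino acid positions carry non-zero values.
f_a = [0.5, 0.5] + [0.0] * 18
s_a = [2.0] + [0.0] * 19
f_b = [1.0] + [0.0] * 19
s_b = [4.0, -2.0] + [0.0] * 18
print(profile_profile_score(f_a, s_a, f_b, s_b))  # 3.0
```

Scoring each profile's frequencies against the other profile's PSSM, in both directions, keeps the score symmetric in a and b.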
The Problem with Tuberculosis
• One third of the global population infected
• 1.7 million deaths per year
• 95% of deaths in developing countries
• Anti-TB drugs hardly changed in 40 years
• MDR-TB and XDR-TB pose a threat to human health worldwide
• Development of novel, effective and inexpensive drugs is an urgent priority
The TB-Drugome
1. Determine the TB structural proteome
2. Determine all known drug binding sites from the PDB
3. Determine which of the sites found in 2 exist in 1
4. Call the result the TB-drugome
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
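The four steps amount to a filtered all-against-all comparison between TB binding sites and drug binding sites. In this sketch, steps 1 and 2 are replaced by tiny hypothetical inputs (`protein_1`, `drug_X` and the residue sets are made up), and site similarity is a toy Jaccard index over residue types with an arbitrary 0.5 threshold. The published TB-drugome compares sites structurally with the sequence-order independent alignment described earlier, not with this shortcut.

```python
def jaccard(a: set, b: set) -> float:
    """Toy binding-site similarity: shared residue types over all residue types."""
    return len(a & b) / len(a | b) if a | b else 0.0

def tb_drugome(tb_sites: dict, drug_sites: dict, threshold: float = 0.5) -> set:
    """Step 3: keep (TB protein, drug) pairs whose binding sites look alike."""
    return {
        (protein, drug)
        for protein, site in tb_sites.items()
        for drug, drug_site in drug_sites.items()
        if jaccard(site, drug_site) >= threshold
    }

# Hypothetical stand-ins for step 1 (TB structural proteome sites)
# and step 2 (drug binding sites from the PDB).
tb_sites = {"protein_1": {"L", "E", "R", "V"}, "protein_2": {"A", "G", "S"}}
drug_sites = {"drug_X": {"L", "E", "R", "K"}, "drug_Y": {"W", "Y", "F"}}
print(tb_drugome(tb_sites, drug_sites))  # {('protein_1', 'drug_X')}
```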
1. Determine the TB Structural Proteome
[Figure: structural coverage of the TB proteome (3,996 proteins): 284 with solved structures, 1,446 with homology models, 2,266 without structural coverage]
• High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
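The two coverage percentages follow from the counts in the chart, assuming the 284 solved structures and the 1,446 homology models cover disjoint sets of the 3,996 proteins (the three segments of the chart do sum to 3,996). A quick arithmetic check:

```python
TB_PROTEOME = 3996  # proteins in the TB proteome
SOLVED = 284        # proteins with solved structures
MODELS = 1446       # proteins covered only by homology models (assumed disjoint)

def coverage(n_structures: int, total: int = TB_PROTEOME) -> float:
    """Percentage of the proteome with structural coverage."""
    return 100.0 * n_structures / total

print(f"solved only: {coverage(SOLVED):.1f}%")               # solved only: 7.1%
print(f"solved + models: {coverage(SOLVED + MODELS):.1f}%")  # solved + models: 43.3%
print(f"no structure: {TB_PROTEOME - SOLVED - MODELS}")      # no structure: 2266
```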
2. Determine All Known Drug Binding Sites in the PDB
• Searched the PDB for protein crystal structures bound with FDA-approved drugs
• 268 drugs bound in a total of 931 binding sites
[Figure: number of drugs (y-axis) versus number of drug binding sites per drug (x-axis, 1 to 37), with labeled examples Methotrexate, Chenodiol, Alitretinoin, Conjugated estrogens, Darunavir and Acarbose]
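The chart is a distribution of binding sites per drug. Given (drug, binding site) records, that distribution takes two counting passes. The records below are hypothetical toy data, not PDB results; only the 931 sites and 268 drugs in the final line come from the slide.

```python
from collections import Counter

def sites_per_drug(records):
    """Map each drug to its number of distinct binding sites.
    `records` is an iterable of (drug_name, binding_site_id) pairs."""
    return Counter(drug for drug, _site in set(records))

def drugs_by_site_count(site_counts):
    """For each k, count how many drugs have exactly k binding sites."""
    return Counter(site_counts.values())

# Hypothetical toy records, not PDB data.
records = [
    ("drug_A", "site_1"), ("drug_A", "site_2"),
    ("drug_B", "site_3"),
    ("drug_C", "site_4"), ("drug_C", "site_5"), ("drug_C", "site_6"),
    ("drug_D", "site_7"),
]
counts = sites_per_drug(records)
print(sorted(drugs_by_site_count(counts).items()))  # [(1, 2), (2, 1), (3, 1)]
print(f"mean sites per drug on the slide: {931 / 268:.2f}")  # 3.47
```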
Map 2 onto 1: The TB-Drugome (http://funsite.sdsc.edu/drugome/TB/)
Similarities between the binding sites of M. tb proteins (blue) and binding sites containing approved drugs (red).
Research is a Good Life