Structural Genomics of Pathogenic Protozoa
Christopher Mehlin
Production and Crystallization Workshop 2004
WWW.SGPP.ORG
The SGPP focuses on protozoa that cause human disease:
• Malaria – Plasmodium falciparum, P. vivax
• Leishmaniasis – Leishmania major + 8 others
• African sleeping sickness – Trypanosoma brucei
• Chagas’ disease – Trypanosoma cruzi
These diseases afflict ~500 million people per year; roughly half the world’s population is at risk.
These targets are challenging!
• Eukaryotic organisms
• Leishmania
– Only the L. major sequence is known (more coming…)
• Plasmodium falciparum
– 80% AT-rich genome
– Requires cDNA; intron prediction is difficult
– Floppy loops, e.g. CDK-2 has 83 asparagines in a row
Primers-to-Protein: Normally ~5% Overall Yield
Data from 1318 L. major and 368 P. falciparum targets:
• L. major: 5.2%
• P. falciparum: 4.9%
More than 85% of our effort goes into the cloning, screening, and expression needed to obtain this 5%.
[Figure: % of targets remaining after each step: Selection, PCR, Express, Soluble]
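Because every target must pass every step, the overall yield is the product of the per-step pass rates, so moderate losses at each step compound into a ~5% result. The Python sketch below assumes a simple linear pipeline. The per-step fractions in the first usage line are hypothetical, since the slide reports only overall yields. The pooled figure uses the reported target counts and yields.

```python
from math import prod


def remaining_after_each_step(step_pass_fractions):
    """Cumulative fraction of targets left after each step of a linear pipeline."""
    remaining, curve = 1.0, []
    for fraction in step_pass_fractions:
        remaining *= fraction
        curve.append(remaining)
    return curve


def overall_yield(step_pass_fractions):
    """Fraction of targets that survive every step (product of the pass rates)."""
    return prod(step_pass_fractions)


def pooled_yield(groups):
    """Target-weighted yield across organisms; groups = [(n_targets, yield), ...]."""
    total_targets = sum(n for n, _ in groups)
    successes = sum(n * y for n, y in groups)
    return successes / total_targets


# Hypothetical pass rates for PCR, expression, and solubility.
print(remaining_after_each_step([0.8, 0.4, 0.16]))
# Reported data: 1318 L. major targets at 5.2%, 368 P. falciparum targets at 4.9%.
print(round(pooled_yield([(1318, 0.052), (368, 0.049)]), 4))  # → 0.0513
```

With the hypothetical rates, no single step looks catastrophic, yet only about 5% of targets survive to the end.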
Protein Variants Increase the Odds
• Multiple species variants
– Especially Leishmania
• “Chunking”
– Computational domain prediction
– Random truncation
[Figure: species ordered by decreasing homology to L. major (axis from 97% to 60%): L. major, L. aethiopica, L. infantum, L. donovani, L. tropica, L. mexicana, L. guyanensis, L. naiffi, L. braziliensis, L. tarentolae, E. scheideri; human pathogens are marked]
Primers designed for L. major can fish out homologues from other species
[Figure, next build: PCR success using L. major primers across the same species series (axis from 83% to 10%)]
Multiple species targeted with a list of 40 high-value targets (enzymes with known inhibitors)
[Figure: proteins obtained per organism (P. falciparum 4, L. major 4) across target numbers 1-7]
Two species gave us eight proteins and 7/40 (18%) of the targets.
Adding homologues:
[Figure, next build: proteins obtained per organism (P. falciparum 4, L. major 4, L. infantum 3) across target numbers 1-10]
L. major and L. infantum: 95% identical, yet no overlap!
Small changes in sequence make an enormous difference in the behavior of the protein.
[Figure, final build: proteins obtained per organism across target numbers 1-14]
• P. falciparum: 4
• L. major: 4
• L. infantum: 3
• L. mexicana: 3
• L. guyanensis: 2
• L. tarentolae: 1
• L. braziliensis: 2
TOTAL: 19 proteins, 14 of 40 (35%) of targets; 10 targets would not have been obtained otherwise.
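Coverage is counted over distinct targets, not proteins. Two species giving 4 proteins each covered only 7 targets because one target was obtained from both. The Python sketch below assumes each organism's hits are recorded as a set of target numbers. The target numbers are hypothetical, chosen only to reproduce the per-organism counts and totals on the slides, which do not say which targets each species hit.

```python
# Hypothetical target numbers (1-40) hit by each organism; only the counts
# per organism and the totals match the slides.
HITS = {
    "P. falciparum": {1, 2, 3, 4},
    "L. major": {4, 5, 6, 7},
    "L. infantum": {8, 9, 10},
    "L. mexicana": {5, 11, 12},
    "L. guyanensis": {8, 13},
    "L. tarentolae": {11},
    "L. braziliensis": {2, 14},
}
N_TARGETS = 40


def coverage(hits, organisms):
    """Return (proteins obtained, set of distinct targets covered) for the chosen organisms."""
    proteins = sum(len(hits[o]) for o in organisms)
    targets = set().union(*(hits[o] for o in organisms))
    return proteins, targets


for chosen in (["P. falciparum", "L. major"], list(HITS)):
    proteins, targets = coverage(HITS, chosen)
    share = 100 * len(targets) / N_TARGETS
    print(f"{len(chosen)} organisms: {proteins} proteins, "
          f"{len(targets)}/{N_TARGETS} targets ({share:.1f}%)")
```

Taking the union rather than summing protein counts is what keeps redundant hits from inflating the coverage figure.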
Multiple species variants help crystallization, too!
Lmaj001686     1 MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHELVPENTKYIVTEHY 60
Ldon001686     1 MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHEVVPENTKYIVTEHY 60
Lmaj001686    61 PKGLGRIVPGITLPQTAHLIEKTRFSCIVPQVEELLEDVDNAVVFGIEGHACILQTVADL 120
Ldon001686    61 PKGLGRIVPEITLPKTAHLIEKTRFSCVVPQVEELLEDVDNAVVFGIEGHACILQTVADL 120
Lmaj001686   121 LDMNERVFLPKDGLGSQKKTDFKAAMKLMGSWSPNCEITTSESILLQMTKDAMDPDFKKI 180
Ldon001686   121 LDMNKRVFLPKDGLGSQKKTDFKAAIKLMSSWGPNCEITTSESILLQMTKDAMDPNFKRI 180
Lmaj001686   181 SKLLKEEPPIPL. 193
Ldon001686   181 SKLLKEEPPIPL. 193
95% IDENTITY
Lmaj001686AAA: nice crystals, no diffraction
Ldon001686AAA: “huge” crystals, 2.7 Å diffraction
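The identity figure can be recomputed from the alignment. The Python sketch below assumes the alignment is gap-free, which holds here because both sequences have 192 residues with no gaps. The C-terminal "." is omitted. It finds 10 substitutions, giving 182/192 = 94.8% identity, which rounds to the 95% on the slide.

```python
# Sequences copied from the alignment above (C-terminal "." omitted).
LMAJ001686 = (
    "MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHELVPENTKYIVTEHY"
    "PKGLGRIVPGITLPQTAHLIEKTRFSCIVPQVEELLEDVDNAVVFGIEGHACILQTVADL"
    "LDMNERVFLPKDGLGSQKKTDFKAAMKLMGSWSPNCEITTSESILLQMTKDAMDPDFKKI"
    "SKLLKEEPPIPL"
)
LDON001686 = (
    "MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHEVVPENTKYIVTEHY"
    "PKGLGRIVPEITLPKTAHLIEKTRFSCVVPQVEELLEDVDNAVVFGIEGHACILQTVADL"
    "LDMNKRVFLPKDGLGSQKKTDFKAAIKLMSSWGPNCEITTSESILLQMTKDAMDPNFKRI"
    "SKLLKEEPPIPL"
)


def percent_identity(a, b):
    """Percent identical positions in a gap-free alignment of equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def substitutions(a, b):
    """List (1-based position, residue in a, residue in b) wherever the sequences differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


print(f"{percent_identity(LMAJ001686, LDON001686):.1f}% identity")  # → 94.8% identity
for position, lmaj, ldon in substitutions(LMAJ001686, LDON001686):
    print(position, lmaj, "->", ldon)
```

Listing the substitutions alongside the percentage shows how few changes separate the non-diffracting Lmaj001686AAA crystals from the 2.7 Å Ldon001686AAA crystals.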
The concept of chunking
Consider a 3-domain protein:
Standard chunks are the entire protein, each individual domain, and every contiguous series of domains. A 3-domain protein therefore becomes 6 chunks: 1 full length, 2 pairs of adjacent domains, and 3 single domains. In general, an N-domain protein yields $N(N+1)/2$ chunks.
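Each chunk is fixed by choosing a start and an end among the N + 1 domain boundaries, which is why the count is $\binom{N+1}{2} = N(N+1)/2$. The Python sketch below enumerates the chunks. The domain labels D1 to D3 are hypothetical placeholders for real domain ranges.

```python
def contiguous_chunks(domains):
    """All contiguous runs of domains: full length, adjacent groups, and single domains."""
    n = len(domains)
    return [domains[start:end] for start in range(n) for end in range(start + 1, n + 1)]


chunks = contiguous_chunks(["D1", "D2", "D3"])
for chunk in sorted(chunks, key=len, reverse=True):
    print("-".join(chunk))
# Prints: D1-D2-D3, D1-D2, D2-D3, D1, D2, D3 (one per line)
```

The count grows quadratically, so a 5-domain protein already gives 15 chunks.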
Domain Parsing using GINZU
Step 1: PSI-BLAST against the PDB
Step 2: Use consensus fold recognition methods to find remote PDB matches
Step 3: Search the Pfam database for preassigned modular “chunks”
Step 4: Identify new modular “chunk” regions in a multiple sequence alignment (MSA)
Step 5: Identify parse points in Rosetta structure predictions
Final step: Select cut points in linker regions using the assigned boundaries and coil predictions
[Figure: the target sequence progressively covered by PDB, fold-recognition, Pfam, MSA, and Rosetta assignments, ordered by confidence]
Chunk Generation
David Kim, UW
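GINZU's own implementation is not described here. The Python sketch below only illustrates the idea behind the final step, under several assumptions:
• Domain boundaries are 0-based, inclusive residue ranges.
• The coil prediction is a per-residue string of 'H', 'E', or 'C'.
• The cut falls at the middle coil residue of each linker.
• If no linker residue is predicted as coil, the cut falls at the middle of the linker.

The function name and the midpoint rule are hypothetical.

```python
def select_cut_points(domains, ss):
    """Choose one cut point per linker between consecutive assigned domains (hypothetical rule).

    domains: sorted, non-overlapping (start, end) residue ranges, 0-based and inclusive.
    ss: predicted secondary structure, one character per residue ('H', 'E', or 'C' for coil).
    Returns the residue index at which each new chunk would begin.
    """
    cuts = []
    for (_, prev_end), (next_start, _) in zip(domains, domains[1:]):
        linker = list(range(prev_end + 1, next_start))
        coil = [i for i in linker if ss[i] == "C"]
        # Prefer predicted coil; otherwise use the whole linker; if the domains
        # abut with no linker at all, cut exactly at the boundary.
        candidates = coil or linker or [next_start]
        cuts.append(candidates[len(candidates) // 2])
    return cuts


ss = "H" * 10 + "CCHCC" + "E" * 15
print(select_cut_points([(0, 9), (15, 29)], ss))  # → [13]
```

Cutting inside predicted coil avoids splitting a helix or strand that may belong to either neighboring domain.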
Pfal006650AAA Example: tRNA Synthetase (Pfam, PDB, and MSA coverage)
Ginzu domains:
1. No assignment, but still based on MSA (remaining region)
2. Pfam hit to PF01411, tRNA synthetases class II (A)
3. PDB hit to 1nyqA (threonyl-tRNA synthetase)
4. MSA-based assignment
Ginzu parse results with multiple sequence alignment (PSI-BLAST against the non-redundant (NR) sequence database)
[Figure: parse of the sequence into Pfam, PDB, MSA, and remaining regions]
David Kim, UW
Chunking L. major Proteins
• 71 ORFs; GINZU generated 205 chunks (not counting full length)
• 5 ORFs solubly expressed (7%); 15 chunks solubly expressed (7%)
– 11 ORFs had 1 soluble chunk; 2 ORFs had 2 soluble chunks
• 2 of the 16 chunks from soluble ORFs were soluble (both from the same ORF)
• 12 of the 66 inaccessible proteins (not soluble as full-length ORFs) had at least one soluble chunk (18%)
• 17 of 71 proteins accessible via this technique (24%)
• 1 chunk of a soluble but non-crystallizing ORF crystallized
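The percentages on this slide follow from a few counts. The Python sketch below recomputes them from the reported numbers. It assumes "inaccessible" means the ORFs not solubly expressed at full length (71 − 5 = 66), which is consistent with every figure given. The function name is illustrative.

```python
def chunking_summary(total_orfs, soluble_full_length, rescued_orfs,
                     chunks_made, soluble_chunks):
    """Recompute the percentages reported for the L. major chunking experiment."""
    inaccessible = total_orfs - soluble_full_length  # not soluble as full-length ORFs
    accessible = soluble_full_length + rescued_orfs  # soluble by either route
    return {
        "full-length soluble %": 100 * soluble_full_length / total_orfs,
        "chunks soluble %": 100 * soluble_chunks / chunks_made,
        "inaccessible rescued %": 100 * rescued_orfs / inaccessible,
        "accessible": accessible,
        "accessible %": 100 * accessible / total_orfs,
    }


# 11 ORFs with 1 soluble chunk + 2 ORFs with 2 soluble chunks = 15 soluble chunks,
# spread over 13 ORFs: the 12 rescued ORFs plus 1 ORF already soluble at full length.
summary = chunking_summary(total_orfs=71, soluble_full_length=5, rescued_orfs=12,
                           chunks_made=205, soluble_chunks=15)
for name, value in summary.items():
    print(f"{name}: {value:.0f}")
```

This prints 7, 7, 18, 17, and 24, matching the slide.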
Superchunking: for high-value targets
Step 1: Determine the functional domain of the protein by comparison to a known protein.
Step 2: Choose 10 truncation sites on each side of the functional domain and make 20 primers (10 forward, 10 reverse).
Step 3: Run 10 × 10 = 100 PCRs, clone the products, and screen for soluble expression and crystallizability.
[Figure: functional domain flanked by truncation sites on each side]
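Pairing each N-terminal start with each C-terminal end turns 20 primers into 100 constructs. The Python sketch below illustrates Steps 2 and 3 under hypothetical assumptions. The functional domain is assumed to span residues 120 to 420, and truncation sites are placed every 5 residues on each side, including the domain edges. The slide does not specify residue numbers or spacing.

```python
from itertools import product


def superchunk_constructs(n_starts, c_ends):
    """Pair every N-terminal start with every C-terminal end: one PCR per pair."""
    return [(start, end) for start, end in product(n_starts, c_ends) if start < end]


# Hypothetical functional domain spanning residues 120-420 (1-based), with ten
# truncation sites every 5 residues on each side (including the domain edges).
n_starts = list(range(75, 121, 5))  # 75, 80, ..., 120 -> 10 forward primers
c_ends = list(range(420, 466, 5))   # 420, 425, ..., 465 -> 10 reverse primers
constructs = superchunk_constructs(n_starts, c_ends)
print(len(n_starts) + len(c_ends), "primers,", len(constructs), "PCRs")  # → 20 primers, 100 PCRs
```

The number of primers grows linearly with the number of sites, while the number of constructs grows as their product.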
Superchunking Thioredoxin Reductase from P. falciparum
► 20 different soluble proteins from 90 cloned constructs
► PCR success 100% (the full-length PCR product was used as template)
Erica Boni
[Figure: chunks expressed solubly, plotted by molecular weight (Da, axis 0 to 75,000)]
►TR is a 60.7 kDa enzyme with a high degree of domain interaction
Erica Boni
[Figure: protein yield (mg/liter protein produced, axis 0 to 100) for chunks of molecular weight 60708 (native), 59862, 59340, 59242, 58534, 57915, 57672, 57424, and 57052 Da]
Superchunking Thioredoxin Reductase
[Figure: the same protein-yield plot, with three constructs annotated: 18 off N-terminus & 8 off C-terminus; 16 off C-terminus; 7 off N-terminus]
Erica Boni
Conclusions:
Relatively small changes in protein sequence can have dramatic effects on the behavior of proteins in expression and crystallization.
Multiple species and chunking are two promising methods for obtaining protein variants.
Acknowledgements:
• University of Washington
– Jamie Andreyka, Erica Boni, Tiffany Feist, Lutfiyah Haji, Colleen Liu, Natascha Mueller
– Fred Buckner, Mike Gelb, Wes Van Voorhis, Kevin Bauer
– David Baker, David Kim, Erkang Fan, Stan Fields group
– Wim Hol and Hol group
• Seattle Biomedical Research Institute
– Liz Worthey, Ellen Sisk, Peter Myler
• Hauptman-Woodward Medical Research Institute
– George DeTitta, Joe Luft, Nancy Fehrman, Angela Lauricella et al.
• Seattle Crystallization and Structure Determination Units
– Oleksandr Kalyuzhniy, Lori Anderson
– Ethan Merritt, Isolde Le Trong, Mark Robien
• Collaborators:
– SSRL Stanford
– ALS Berkeley
NIH/NIGMS/NIAID