Structural Genomics of Pathogenic Protozoa

Christopher Mehlin [email protected]

Protein Production and Crystallization Workshop 2004


WWW.SGPP.ORG

The SGPP is focused on protozoa which cause human disease

• Malaria – Plasmodium falciparum, P. vivax

• Leishmaniasis – Leishmania major + 8 others

• African sleeping sickness – Trypanosoma brucei

• Chagas’ disease – Trypanosoma cruzi

These diseases afflict ~500 million people per year; roughly half the world’s population is at risk.

These targets are challenging!

• Eukaryotic organisms

• Leishmania
  – Only L. major sequence is known (more coming…)

• Plasmodium falciparum
  – 80% AT-rich genome
  – Requires cDNA; intron prediction is difficult
  – Floppy loops, e.g. CDK-2 has 83 asparagines in a row

Primers-to-Protein Normally ~5% Overall Yield

Data from 1318 L. major and 368 P. falciparum targets

L. major 5.2%

P. falciparum 4.9%

>85% of our effort is put into cloning, screening, and expressing this 5%

[Figure: bar chart of % remaining after each step (0 to 100): SELECTION, PCR, EXPRESS, SOLUBLE]
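To make the ~5% figure concrete, the short sketch below converts the target counts and overall yields quoted on this slide into approximate numbers of proteins. The counts are implied by the arithmetic only; they are not reported on the slide, and the function name is illustrative.

```python
# Target counts and overall primers-to-protein yields as quoted on the slide.
targets = {"L. major": 1318, "P. falciparum": 368}
overall_yield = {"L. major": 0.052, "P. falciparum": 0.049}


def implied_proteins(n_targets, yield_fraction):
    """Approximate number of targets that survive all the way to protein.

    This is an estimate derived from the quoted percentage, not a reported count.
    """
    return round(n_targets * yield_fraction)


for organism, n_targets in targets.items():
    n_proteins = implied_proteins(n_targets, overall_yield[organism])
    print(f"{organism}: roughly {n_proteins} of {n_targets} targets")
```

Under these numbers, the >85% of effort mentioned above is spent on the targets that end up in this small surviving group.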

Protein Variants Increase the Odds

• Multiple species variants
  – Especially Leishmania

• “Chunking”
  – Computational domain prediction
  – Random truncation

[Figure: species ordered by homology to L. major, from 97% down to 60%, with the human pathogens marked]

L. major, L. aethiopica, L. infantum, L. donovani, L. tropica, L. mexicana, L. guyanensis, L. naiffi, L. braziliensis, L. tarentolae, E. schneideri

Primers designed for L. major can fish out homologues from other species

[Figure: the same species series (homology 97% down to 60%) annotated with PCR success using L. major primers, from 83% down to 10%]

Multiple species targeted with a list of 40 high-value targets (enzymes with known inhibitors)

Organism         Proteins obtained
P. falciparum    4
L. major         4

[Figure: grid of organism against target number, 1 to 7]

Two species gave us eight proteins and 7/40 (18%) of the targets.

HOMOLOGUES


Organism         Proteins obtained
P. falciparum    4
L. major         4
L. infantum      3

[Figure: grid of organism against target number, 1 to 10]

L. major and L. infantum: 95% IDENTICAL, yet no overlap in the targets obtained!

Small changes in sequence make an enormous difference in the behavior of the protein.


Organism           Proteins obtained
P. falciparum      4
L. major           4
L. infantum        3
L. mexicana        3
L. guyanensis      2
L. tarentolae      1
L. braziliensis    2

[Figure: grid of organism against target number, 1 to 14]

TOTAL: 19 proteins, 14 of 40 (35%) of targets

10 targets would not have been obtained otherwise
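The totals on this slide can be checked with a few lines of arithmetic. The sketch below uses only the per-species counts and target numbers quoted above; the variable and function names are illustrative.

```python
# Proteins obtained per species, as listed on the slide.
proteins_per_species = {
    "P. falciparum": 4,
    "L. major": 4,
    "L. infantum": 3,
    "L. mexicana": 3,
    "L. guyanensis": 2,
    "L. tarentolae": 1,
    "L. braziliensis": 2,
}

TARGET_LIST_SIZE = 40  # high-value targets attempted


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


total_proteins = sum(proteins_per_species.values())
print("total proteins:", total_proteins)
print("two species only:", percent(7, TARGET_LIST_SIZE), "% of targets")
print("all seven species:", percent(14, TARGET_LIST_SIZE), "% of targets")
```

Going from two species to seven doubles the number of distinct targets obtained, from 7 to 14 of 40.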

Multiple species variants help crystallization, too!

Residues 1-60
Lmaj001686 MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHELVPENTKYIVTEHY
Ldon001686 MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHEVVPENTKYIVTEHY

Residues 61-120
Lmaj001686 PKGLGRIVPGITLPQTAHLIEKTRFSCIVPQVEELLEDVDNAVVFGIEGHACILQTVADL
Ldon001686 PKGLGRIVPEITLPKTAHLIEKTRFSCVVPQVEELLEDVDNAVVFGIEGHACILQTVADL

Residues 121-180
Lmaj001686 LDMNERVFLPKDGLGSQKKTDFKAAMKLMGSWSPNCEITTSESILLQMTKDAMDPDFKKI
Ldon001686 LDMNKRVFLPKDGLGSQKKTDFKAAIKLMSSWGPNCEITTSESILLQMTKDAMDPNFKRI

Residues 181-193
Lmaj001686 SKLLKEEPPIPL.
Ldon001686 SKLLKEEPPIPL.

95% IDENTITY

Lmaj001686AAA nice crystals, no diffraction

Ldon001686AAA “huge” crystals, 2.7Å diffraction
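The identity figure for this pair can be reproduced directly from the alignment. The sketch below is a minimal position-by-position comparison; it assumes the two sequences are already aligned without gaps (as they appear on the slide) and leaves out the trailing "." at position 193, which is taken here to be a terminator rather than a residue.

```python
# Sequences copied from the alignment on the slide, without the trailing ".".
lmaj = (
    "MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHELVPENTKYIVTEHY"
    "PKGLGRIVPGITLPQTAHLIEKTRFSCIVPQVEELLEDVDNAVVFGIEGHACILQTVADL"
    "LDMNERVFLPKDGLGSQKKTDFKAAMKLMGSWSPNCEITTSESILLQMTKDAMDPDFKKI"
    "SKLLKEEPPIPL"
)
ldon = (
    "MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHEVVPENTKYIVTEHY"
    "PKGLGRIVPEITLPKTAHLIEKTRFSCVVPQVEELLEDVDNAVVFGIEGHACILQTVADL"
    "LDMNKRVFLPKDGLGSQKKTDFKAAIKLMSSWGPNCEITTSESILLQMTKDAMDPNFKRI"
    "SKLLKEEPPIPL"
)


def mismatches(a, b):
    """Return 1-based positions where two gap-free aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def percent_identity(a, b):
    """Percentage of aligned positions carrying the same residue."""
    return 100.0 * (len(a) - len(mismatches(a, b))) / len(a)


diffs = mismatches(lmaj, ldon)
print(f"{len(diffs)} differences over {len(lmaj)} residues")
print(f"identity: {percent_identity(lmaj, ldon):.1f}%")
```

Ten substitutions over 192 residues separate the variant that gave non-diffracting crystals from the one that diffracted to 2.7 Å.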

Consider a 3-domain protein:

Standard chunks would be the entire protein, each individual domain, and any contiguous series of domains. A 3-domain protein therefore becomes 6 chunks.

Full length

Adjacent domains

Single domains

The concept of chunking… a protein with N domains gives N(N+1)/2 chunks
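The chunk count can be illustrated by enumerating every contiguous run of domains. This is a minimal sketch; the domain labels "A", "B", "C" and the function name are placeholders, not names from the SGPP pipeline.

```python
def contiguous_chunks(domains):
    """Return every contiguous run of domains, from single domains to full length."""
    n = len(domains)
    return [
        tuple(domains[start:end])
        for start in range(n)
        for end in range(start + 1, n + 1)
    ]


# Hypothetical 3-domain protein.
chunks = contiguous_chunks(["A", "B", "C"])
for chunk in chunks:
    print("-".join(chunk))

# The count matches N(N+1)/2 for any number of domains.
for n in range(1, 8):
    assert len(contiguous_chunks(list(range(n)))) == n * (n + 1) // 2
```

For three domains this lists the three single domains, the two adjacent pairs, and the full-length protein, six chunks in all.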

Domain Parsing using GINZU

Step 1: PSI-BLAST against the PDB

Step 2: Use consensus fold recognition methods to find remote PDB matches

Step 3: Search the Pfam database for preassigned modular “chunks”

Step 4: Identify new modular “chunk” regions in multiple sequence alignment (MSA)

Final Step: Select cut points in linker regions using assigned boundaries and coil predictions

Step 5: Identify parse points in Rosetta structure predictions

[Figure: target sequence on a confidence axis, with regions assigned by PDB, fold recognition, Pfam, MSA, and Rosetta, leading to chunk generation]

David Kim, UW
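The layered search above can be pictured as filling in a sequence region by region, with each later method only claiming residues that earlier methods left unassigned. The sketch below is an illustration of that idea only, not Ginzu's actual implementation: the precedence order, the hit coordinates, and the function names are all assumptions made for the example.

```python
def assign_regions(length, hits_by_method, priority):
    """Label each residue with the first method (in priority order) that covers it.

    hits_by_method maps a method name to a list of (start, end) ranges,
    1-based and inclusive. Residues no method covers stay None.
    """
    labels = [None] * length
    for method in priority:
        for start, end in hits_by_method.get(method, []):
            for i in range(start - 1, end):
                if labels[i] is None:
                    labels[i] = method
    return labels


def segments(labels):
    """Collapse per-residue labels into (start, end, label) runs, 1-based inclusive."""
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((start + 1, i, labels[start]))
            start = i
    return runs


# Hypothetical 10-residue toy sequence with two overlapping hits.
priority = ["PDB", "FoldRecognition", "Pfam", "MSA"]
hits = {"PDB": [(1, 4)], "Pfam": [(3, 7)]}
parsed = segments(assign_regions(10, hits, priority))
print(parsed)
```

In this toy case the PDB hit keeps residues 1 to 4, Pfam claims only the remaining 5 to 7, and residues 8 to 10 stay unassigned; boundaries between runs are the natural candidates for cut points.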

Pfal006650AAA Example - tRNA Synthetase

PFAM, PDB, and MSA coverage

Ginzu Domains
1. No assignment but still based on MSA (remaining region)
2. PFAM hit to PF01411 tRNA synthetases class II (A)
3. PDB hit to 1nyqA (Threonyl-tRNA Synthetase)
4. MSA based assignment

Ginzu Parse Results w/ Multiple Sequence Alignment: PSI-BLAST against Non-redundant (NR) sequence database

[Figure: sequence map with regions labeled PFAM, PDB, MSA, and Remaining Region]

David Kim, UW

CHUNKING L. major PROTEINS

71 ORFs, parsed with GINZU: 205 chunks (not counting full length)

• 5 ORFs solubly expressed (7%)

• 15 chunks solubly expressed (7%)
  – 11 ORFs had 1 soluble chunk
  – 2 ORFs had 2 chunks soluble

• 2/16 chunks of soluble ORFs soluble (both of the same ORF)

• 1 chunk of non-crystallizing, soluble ORF crystallized

• 12/66 inaccessible proteins have had at least one soluble chunk (18%)

• 17/71 proteins accessible via this technique (24%)
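The counts on this slide fit together, which a short check makes explicit. The sketch below uses only numbers quoted on the slide; reading the 17 accessible proteins as the 5 soluble full-length ORFs plus the 12 rescued by a chunk is an interpretation, flagged as such in the comments.

```python
def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


total_orfs = 71
total_chunks = 205          # not counting full length
soluble_full_length = 5     # ORFs solubly expressed as full-length protein
inaccessible = total_orfs - soluble_full_length  # 66 ORFs insoluble at full length

# 11 ORFs contributed one soluble chunk each, 2 ORFs contributed two each.
soluble_chunks = 11 * 1 + 2 * 2

# Interpretation: accessible = soluble full-length ORFs + ORFs rescued by a chunk.
rescued_by_chunk = 12
accessible = soluble_full_length + rescued_by_chunk

print("soluble chunks:", soluble_chunks, f"({percent(soluble_chunks, total_chunks)}%)")
print("rescued:", f"{rescued_by_chunk}/{inaccessible}", f"({percent(rescued_by_chunk, inaccessible)}%)")
print("accessible:", f"{accessible}/{total_orfs}", f"({percent(accessible, total_orfs)}%)")
```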

Superchunking: for high-value targets

Step 1: Determine functional domain of protein by comparison to known protein.

Step 2: Determine 10 truncation sites on each side of functional domain; make 20 primers.

[Figure: protein schematic with the functional domain marked]

Step 3: Run 10 x 10 = 100 PCRs, clone products, screen for soluble expression, crystallizability
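The 10 x 10 grid in Step 3 comes from pairing every N-terminal truncation site with every C-terminal one, so 20 primers cover 100 constructs. The sketch below enumerates such a grid; the protein length, domain position, and 10-residue spacing are hypothetical values chosen for illustration, not the sites used for any SGPP target.

```python
from itertools import product


def superchunk_constructs(n_term_starts, c_term_ends):
    """Pair each N-terminal start with each C-terminal end (1-based, inclusive)."""
    return [
        (start, end)
        for start, end in product(n_term_starts, c_term_ends)
        if start <= end
    ]


# Hypothetical 500-residue protein with a functional domain at residues 101-400.
n_term_starts = list(range(1, 101, 10))    # 10 start sites upstream of the domain
c_term_ends = list(range(410, 501, 10))    # 10 end sites downstream of the domain

constructs = superchunk_constructs(n_term_starts, c_term_ends)
print(len(n_term_starts) + len(c_term_ends), "primers")
print(len(constructs), "constructs")
print("shortest:", min(constructs, key=lambda c: c[1] - c[0]))
print("longest:", max(constructs, key=lambda c: c[1] - c[0]))
```

Every construct contains the whole functional domain, and the full-length protein is one corner of the grid.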

Superchunking Thioredoxin Reductase from P. falciparum

► 20 different soluble proteins from 90 cloned constructs.

► PCR success 100% -- used template of full-length PCR

[Figure: molecular weight in Daltons (axis 0 to 75000) of constructs at each stage: Chunks, Expressed, Soluble]

► TR is a 60.7 kDa enzyme with a high degree of domain interaction

Erica Boni

Superchunking Thioredoxin Reductase

PROTEIN YIELD

[Figure: bar chart of protein produced (mg/liter, axis 0 to 100) against molecular weight of chunks in Daltons: 60708 (NATIVE), 59862, 59340, 59242, 58534, 57915, 57672, 57424, 57052]

[Figure: PROTEIN YIELD chart repeated (mg/liter protein produced against molecular weight of chunks), with bars annotated “18 off N-terminus & 8 off C-terminus”, “16 off C-terminus”, and “7 off N-terminus”]

Erica Boni

Conclusions:

Relatively small changes in protein sequence can have dramatic effects on the behavior of proteins in expression and crystallization.

Multiple species and chunking are two promising methods for obtaining protein variants.

Acknowledgements:

• University of Washington
  – Jamie Andreyka, Erica Boni, Tiffany Feist, Lutfiyah Haji, Colleen Liu, Natascha Mueller
  – Fred Buckner, Mike Gelb, Wes Van Voorhis, Kevin Bauer
  – David Baker, David Kim, Erkang Fan, Stan Fields Group
  – Wim Hol and Hol group

• Seattle Biomedical Research Institute
  – Liz Worthey, Ellen Sisk, Peter Myler

• Hauptman Woodward Medical Research Institute
  – George Detitta, Joe Luft, Nancy Fehrman, Angela Luricella et al.

• Seattle Crystallization and Structure Determination Units
  – Oleksandr Kalyuzhniy, Lori Anderson
  – Ethan Merritt, Isolde Le Trong, Mark Robien

• Collaborators:
  – SSRL Stanford
  – ALS Berkeley

NIH/NIGMS/NIAID
