Practical tips for cloning, expressing and purifying proteins for structural biology

Aled Edwards

Banting and Best Department of Medical Research
University of Toronto, Canada
aled.edwards@utoronto.ca

Affinium Pharmaceuticals
Toronto, Canada
aedwards@afnm.com

Molecular biological approaches to structural biology

An excellent structural sample usually has the following properties

• Lack of conformational heterogeneity

• Soluble at high concentrations

• Pure

Molecular biology is probably the fastest way to transform a “poor” sample into an “excellent” one.

Outline

• Historical perspective on engineering proteins for structural biology

• Practical advice for cloning/purification of structural samples

• Ancillary benefits of high-throughput studies

RNA polymerase II: from 15Å to 3Å by eliminating heterogeneity

Another source of sample heterogeneity: eukaryotic proteins comprise multiple domains

• Conformational heterogeneity lowers probability of crystallization

• Protein domains
  - Are resistant to proteolysis
  - Fold autonomously
  - Can usually be expressed in bacteria
  - Are between 15 and 30 kDa (NMR or X-ray size)
  - Are the fundamental unit of protein function

• Domains are often the only tractable targets for HTP crystallography

EBNA1 DNA-binding domain (no sequence homologue in database)

RPA domain structure: a collection of OB-folds

[Figure: domain organization of the RPA70, RPA32 and RPA14 subunits, with labels A and B]

RPA crystallization

• Start with full-length protein purified using baculovirus (Wold)

• Identify domain (aa 1-442) soluble in E. coli (Wold)

• Crystallize domain (7Å)

• Use limited proteolysis to define smaller domain (aa 161-442) (3.5Å, and same cell as 7Å crystal)

• Create many constructs varying N- and C-termini to identify final construct (aa 181-422) (2.2Å; structure solved)

Final tally: 15 different constructs

RPA70 domains A and B: two OB-folds bound to DNA

[Figure: structure of domains A and B with the L12 and L45 loops labeled]

How does one map domains?

Domain mapping using limited proteolysis

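[Figure: limited proteolysis of TFIIS with protease (Integrative Proteomics). Residue positions marked: 131, 240, 264, 309. Annotated functions: RNA polymerase binding; transcript cleavage and read-through (nucleic acid binding?)]

TFIIS Domain Structure

[Figure: TFIIS domains I, II and III. Residues 1 and 124 are marked; annotation: binds holoenzyme, similar to elongin and CRSP70]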

Industrialized Domain Mapping

• Partial proteolysis in 96-well plates

• Optimized set of proteases

• Low protein requirement

• No SDS-PAGE

• No N-terminal sequencing

• Direct identification of domains by mass spectrometry

DomainHunter™

[Figure: DomainHunter™ mass spectra across a protease titration (0, 0.1, 0.25, 1.0, 2.5); x-axis m/z from 23000 to 33000, y-axis relative intensity (r.i.), with peaks labeled by mass]

Fragment  Mass      Matching sequence  Expression  Solubility
B         10324.0   G[44-133]R         +++         ++
C         12352.0   G[44-150]D         no
A          9131.0   I[55-133]R         ++          ++
D         11159.0   I[55-150]D         no
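
Matching an observed fragment mass to a stretch of the protein sequence can be sketched as below. This is a minimal illustration, not the DomainHunter implementation: it assumes standard average residue masses, an unmodified fragment, and a hypothetical tolerance; `fragment_mass` and `match_fragments` are names invented here.

```python
# Average residue masses in Da (standard values); water is added once per fragment.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def fragment_mass(seq):
    """Average mass of an unmodified peptide fragment."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def match_fragments(sequence, observed_mass, tol=2.0, min_len=5):
    """Return (start, end, mass) for every fragment within tol Da.

    start and end are 1-based, inclusive residue numbers.
    tol and min_len are illustrative defaults, not values from the slides.
    """
    # Prefix sums let each candidate fragment be evaluated in constant time.
    prefix = [0.0]
    for aa in sequence:
        prefix.append(prefix[-1] + AVG_RESIDUE_MASS[aa])

    hits = []
    n = len(sequence)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            mass = prefix[j] - prefix[i] + WATER
            if abs(mass - observed_mass) <= tol:
                hits.append((i + 1, j, round(mass, 1)))
    return hits
```

Several fragments can fall within tolerance of one mass, so knowing the protease specificity (for example, V8 or chymotrypsin sites) is what narrows the candidates in practice.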

DomainHunter Applied to NMR Sample

[Figure: fragments A, B, C and D mapped onto the residue-number axis (N-terminus to residue 140), with V8 and chymotrypsin cleavage sites marked]

Structural Proteomics

[Figure: structures of MTH40, MTH1184, MTH538, MTH129, MTH1048, MTH1699, MTH1790, MTH152, MTH1615, MTH1175 and MTH150]

Nat. Struct. Biol. Oct/Nov 2000

5 more done

3 more soon

Molecular biology for crystallization and for large-scale studies

1. Basic steps in creating expression vectors for E. coli

2. Practical tips for making fewer mistakes

3. Application of methods to higher-throughput

4. Alternate expression systems

5. Some results

E. coli is the first choice... why?

• Cost effective
• Easy to grow
• Abundance of expertise and reagents
• Easy to incorporate selenomethionine
• High yield
• Rapid doubling time and rapid scale-up

Factors involved in successful expression of recombinant proteins in Escherichia coli cytoplasm

Expression vector

Copy number (gene dosage: sometimes less is better than more)

Promoter choice (T7, Ptac, Plac, Para)

Little or no expression before induction

Reliable and adjustable expression

mRNA stability (RNase E- mutant)

Translation

Consensus SD sequence

Proper spacing and sequence before the initiation codon

Possible mRNA secondary structures that block ribosome binding, or an internal ribosome binding site

Codon Bias
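
A quick way to flag codon bias before ordering a construct is to count rare codons in the ORF. The sketch below is illustrative only: the rare-codon set is an assumption (a commonly cited short list for E. coli, not taken from these slides), and `rare_codon_report` is a hypothetical helper.

```python
from collections import Counter

# Assumption: a commonly cited set of codons that are rare in E. coli.
RARE_CODONS = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}


def rare_codon_report(orf):
    """Count rare codons in an in-frame ORF.

    Returns (counts per rare codon, fraction of rare codons,
    1-based codon positions where two rare codons occur in tandem).
    """
    orf = orf.upper().replace("U", "T")
    if not orf or len(orf) % 3 != 0:
        raise ValueError("ORF must be non-empty and a multiple of 3 nt long")

    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    counts = Counter(c for c in codons if c in RARE_CODONS)
    fraction = sum(counts.values()) / len(codons)

    # Tandem rare codons are listed separately so they are easy to spot.
    tandem = [
        i + 1
        for i in range(len(codons) - 1)
        if codons[i] in RARE_CODONS and codons[i + 1] in RARE_CODONS
    ]
    return counts, fraction, tandem
```

A high fraction or tandem rare codons would suggest trying a strain that supplies the matching tRNAs, or a codon-optimized gene.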

But which E. coli?

BL21(DE3): F- ompT hsdSB (rB-, mB-) gal dcm (DE3)

BL21-Star(DE3): F- ompT hsdSB (rB-, mB-) gal dcm rne131 (DE3)

Tuner(DE3): F- ompT hsdSB (rB-, mB-) gal dcm lacY1 (DE3)

BL21-Gold(DE3): F- ompT hsdS (rB-, mB-) dcm+ Tetr gal endA (DE3)

Conventional cloning approach

1. Select vector of choice

2. Restriction digest the vector

3. PCR the insert

4. Restriction digest the insert

5. Ligate the vector and insert

6. Transform and plate

7. Pick colonies and screen for insert

8. Screen positive clones for protein expression

9. Sequence positive clones

Which vector/tag?

1. T7 RNA polymerase-based systems are the overwhelming choice

- Highly specific

- High yields

- Exquisitely controlled

2. Choice of vector

- Restriction sites (are there internal sites in gene?)

- Are there many possible sites?

- Are the enzymes commonly available?

- Do the enzymes cut near ends of DNA fragments?

3. Which tag?

- Relatively little data on which generates best proteins for crystallization

- His-tag, GST, MBP all are effective at purification

- His tag offers advantage of being able to screen +/- tag for crystals (double bang for the buck)

- Make sure there is a protease site to remove tag

Practical issues with cloning

1. Choice of protease?

- Thrombin (more difficult to get but highly effective)

- TEV: recombinant with His-tag, stable mutant with less autoproteolysis activity (Waugh), needs calcium, finicky

- Factor Xa, enterokinase: avoid

“I can’t use thrombin, it digests my protein”

Purification of Thrombin from Thrombostat

1. We started with 10,000 units of Thrombostat from Parke-Davis, dissolved in 10 ml of 50 mM NaPO4 pH 6.5 and 5% glycerol.

2. The solution was then spun at 10,000 rpm for 10 min in an SS34 rotor to clarify it.

3. This was then loaded onto a Poros S column (7.5 mm x 100 mm, Perseptive Biosystems) pre-equilibrated in the above buffer at 3 ml/min.

4. The column was then washed in the above buffer until the OD 280 reached zero.

5. The column was then washed with 100 mM NaPO4 pH 6.5 and 5% glycerol until the absorbance went to zero.

6. Thrombin was then eluted from the column in 300 mM NaPO4 pH 8.5 and 5% glycerol at a flow rate of 1 ml/min. 0.5 ml fractions were collected and run on 15% SDS-PAGE, and the 35 kD protein (thrombin) was pooled and frozen in small aliquots.

7. Total protein yield was about 3 mg in 10 ml of buffer.

Schleiff, E., Khanna, R., Orlicky, S. and Vrielink, A. Expression, purification, and in vitro characterization of the human outer mitochondrial membrane receptor human translocase of the outer mitochondrial membrane 20. Arch. Biochem. Biophys. 367:95-103 (1999)

Practical issues with cloning

Restrict the plasmid

- Double digestion often leaves one end undigested, which in turn results in high background due to re-ligation

- Phosphatase treatment and gel purification of a large prep make life much easier in the long run

- Optimize system to get no background

Practical issues with cloning

PCR the insert

- For HTP studies, need to optimize conditions for genome or clone

- Order primers from reputable supplier (most common problem is in deprotecting oligos)

- Have someone else double-check primer sequence

- Order primers with requisite overhang (be over-cautious)

- Use error-correcting polymerase
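
Primer sequences are easier to double-check when they are generated rather than typed. The sketch below assumes palindromic restriction sites (so the same site string is valid in both primers), a hypothetical 4-nt 5' overhang, and a fixed annealing length; it does not check reading frame, melting temperature or stop codons. `design_primers` is a name invented here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(orf, site_5p, site_3p, anneal_len=20, overhang="GCGC"):
    """Build forward and reverse primers as overhang + site + annealing region.

    site_5p and site_3p are assumed to be palindromic restriction sites,
    e.g. BamHI (GGATCC) and XhoI (CTCGAG).
    The overhang gives the enzyme extra bases to cut near the fragment end.
    """
    orf = orf.upper()
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the annealing length")

    forward = overhang + site_5p + orf[:anneal_len]
    # The reverse primer anneals to the top strand, so it is the reverse
    # complement of the last anneal_len bases of the ORF.
    reverse = overhang + site_3p + reverse_complement(orf[-anneal_len:])
    return forward, reverse
```

For example, `design_primers(orf, "GGATCC", "CTCGAG")` returns both primers 5' to 3', ready for a second person to compare against the vector map.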

Practical issues with cloning

Digest the PCR insert

- Make sure that there are no internal sites

- Purify the restricted product
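
Checking for internal sites before choosing enzymes is simple to automate. A minimal sketch, assuming a small hand-entered table of palindromic recognition sequences (so searching one strand is sufficient); `find_internal_sites` is a hypothetical helper.

```python
# Recognition sequences for a few common enzymes (all palindromic).
SITES = {
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
}


def find_internal_sites(insert, enzymes):
    """Return {enzyme: [1-based positions]} for sites found inside the insert.

    Enzymes with no site in the insert are omitted from the result.
    """
    insert = insert.upper()
    hits = {}
    for name in enzymes:
        site = SITES[name]
        positions = []
        start = insert.find(site)
        while start != -1:
            positions.append(start + 1)
            # Step one base at a time so overlapping sites are not missed.
            start = insert.find(site, start + 1)
        if positions:
            hits[name] = positions
    return hits
```

An empty result for the two cloning enzymes means the digest should cut only at the primer-encoded ends.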

Practical issues with cloning

Ligation and transformation

- If vector control background is low, and PCR product is purified, then should be no problem

- Use highly competent cells

Practical issues with cloning

Screen for positive clones

- PCR screen from colony

- Screen by protein expression

- Make note of expression, as well as solubility

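[Figure: T7-promoter expression constructs. Tag variants shown: 6His-TEV, 6His-TEV with MBP, and 6His-TEV with TRX, each followed by the gene and a STOP codon; clones are derived from these vectors]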

Screening for inserts by PCR

Cloning (conventional method)

TOPO cloning

GATEWAY™ Cloning System Technology - Phage

[Figure: phage integration and excision in E. coli. attB (E. coli) x attP (phage) gives attL + attR in the E. coli lysogen, with IHF and Int; attL x attR gives attB + attP, with IHF, Int and Xis]

[Figure: engineered site pairs. attB1 x attP1 and attB2 x attP2 (IHF, Int); attR1 x attL1 and attR2 x attL2 (IHF, Int, Xis). Several crosses between sites are marked with question marks]
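
The site logic in these diagrams can be written down as a small lookup. This is only a sketch of the pairing rules: it assumes that a numbered site recombines only with the partner carrying the same number (the designed specificity of the engineered sites), and `recombine` is a hypothetical function, not part of any kit or library.

```python
# Which pairs of site types recombine, and what they produce.
# attB x attP uses IHF and Int; attL x attR uses IHF, Int and Xis.
REACTIONS = {
    frozenset({"attB", "attP"}): ("attL", "attR"),
    frozenset({"attL", "attR"}): ("attB", "attP"),
}


def recombine(site_a, site_b):
    """Return the product sites, or None if the pair does not recombine.

    Sites are written as type plus optional number, e.g. 'attB1' or 'attP'.
    """
    kind_a, num_a = site_a[:4], site_a[4:]
    kind_b, num_b = site_b[:4], site_b[4:]

    products = REACTIONS.get(frozenset({kind_a, kind_b}))
    # Assumption: mismatched numbers (attB1 with attP2) do not recombine.
    if products is None or num_a != num_b:
        return None
    return tuple(p + num_a for p in products)
```

Because site 1 and site 2 pair independently, an insert flanked by both keeps its orientation when it moves between vectors.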

“Gateway type” cloning


Cloning and Test Expression

[Workflow figure, x96 at each stage:
PCR, ligate, transform (Kan, Amp)
24 x 3 ml LB (Kan, Amp), 37C, induce at OD600, grow O/N at 15C or 20C
Two 300 ul samples
Spin, freeze, lyse with BugBuster™, spin again, supernatant to SDS PAGE
Spin, dissolve pellet in SDS, SDS PAGE]

[Figure: bar chart for 1750 clones showing the percentage (0 to 100) cloned, expressed and soluble]

Expression systems for eukaryotic proteins

• Baculovirus infection of insect cells
  - Simple, relatively cost effective, selenomethionine-compatible, not fully able to replicate human post-translational modifications

• Viral infection of human cells
  - Viruses not as easy to work with, high yield, proper modification

• Stable transformation of human cells
  - Usually lower expression. After selection, transcription sometimes goes away. Low throughput due to selection process

• Transfection of human cells
  - High expression in few cells, uses up lots of DNA

Generation of recombinant baculoviruses and gene expression with the Bac-To-Bac expression system

[Figure: Bac-To-Bac workflow. The pFastBac Dual donor carries foreign gene 1 (pPolh promoter) and foreign gene 2 (p10 promoter) between Tn7L and Tn7R. Steps shown: transformation into competent DH10Bac E. coli cells (bacmid with lacZ mini-attTn7, plus helper); transposition; antibiotic selection; E. coli (lacZ-) containing recombinant bacmid; mini-prep of high molecular weight DNA; recombinant bacmid DNA; transfection of insect cells with CELLFECTIN reagent; infection for recombinant gene expression or viral amplification. Timeline labels: Day 1, Days 2-3, Day 4, Day 8]

Protein Purification

Parallel purification of proteins

[Figure: gel panels 1 and 2, lanes 1 to 5 and 1' to 5']

ProteoMax – Automated Protein Purification and Concentration System

Affinium Pharmaceuticals

A few observations from our work

Structure determination strategy

< 20 kDa: 15N-labeled and 15N/13C-labeled samples; 3-5 weeks of NMR data collection

> 20 kDa: Se-Methionine labeled sample; synchrotron data

                              Escherichia coli   Thermotoga maritima
ORFs                          4,288              1,877
Genome                        4,639,221 bp       1,860,725 bp
Topt                          37 °C              80 °C
Orthologues                   68                 68
Expressed & soluble           62                 48
Concentratable to > 2 mg/ml   50                 44

[Venn diagram, concentratable proteins: 15 from E. coli only, 35 from both, 9 from T. maritima only]

9 proteins could not be purified from either species

Total Crystals (30): E. coli versus T. maritima [Venn diagram labels: 3, 11, 13]

Total Good/Promising NMR spectra (14): E. coli versus T. maritima [Venn diagram labels as extracted: 4; 24; 310 6]

NMR & Crystallography: complementary!

24 small proteins for which both crystal trials and NMR data collected

[Venn diagram: good/promising HSQC versus crystals]

Of 32 proteins that gave poor HSQCs, 7 have crystallized

Data storage and Mining: Defined Vocabulary

Property                      Vocabulary
Expression level              0-5 (no expression to high expression)
Solubility (test expression)  0-5 (insoluble to highly soluble)
Concentratability             0-5 (or mg/ml)
Crystal trials                clear, precipitate, crystal
Initial HSQC NMR              good, promising, poor
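
A defined vocabulary is easiest to enforce when each record is validated as it is entered. The sketch below is one possible encoding, not the schema actually used: `TargetRecord` and its field names are hypothetical, and concentratability is stored as a plain non-negative number because the slide allows either a 0-5 score or mg/ml.

```python
from dataclasses import dataclass
from typing import Optional

CRYSTAL_OUTCOMES = {"clear", "precipitate", "crystal"}
HSQC_OUTCOMES = {"good", "promising", "poor"}


@dataclass
class TargetRecord:
    target_id: str
    expression: int            # 0 (no expression) to 5 (high expression)
    solubility: int            # 0 (insoluble) to 5 (highly soluble)
    concentratability: float   # 0-5 score or mg/ml, as recorded by the lab
    crystal_trial: Optional[str] = None
    hsqc: Optional[str] = None

    def __post_init__(self):
        # Scores outside the vocabulary are rejected at entry time.
        for name in ("expression", "solubility"):
            value = getattr(self, name)
            if not isinstance(value, int) or not 0 <= value <= 5:
                raise ValueError(f"{name} must be an integer from 0 to 5")
        if self.concentratability < 0:
            raise ValueError("concentratability must be non-negative")
        if self.crystal_trial is not None and self.crystal_trial not in CRYSTAL_OUTCOMES:
            raise ValueError(f"crystal_trial must be one of {sorted(CRYSTAL_OUTCOMES)}")
        if self.hsqc is not None and self.hsqc not in HSQC_OUTCOMES:
            raise ValueError(f"hsqc must be one of {sorted(HSQC_OUTCOMES)}")
```

Keeping the vocabulary this narrow is what makes later mining possible, since every target can be compared on the same scales.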

Expression/solubility testing

[Figure: example samples scored 5, 5, 4, 3, 2, 1, 0, 0]

Solubility tree based on 58 sequence properties

Kluger & Gerstein

Mostly soluble / Mostly insoluble

Empirical Bioinformatics

Clear drop / Precipitate / Crystal


Efficiency through mining crystal screens

[Figure: matrix of crystallization conditions (vertical axis) against different proteins (horizontal axis)]

Crystal trial: Diminishing Returns

[Figure: plot against the number of screening conditions (x-axis, 0 to 300)]
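
One way to see diminishing returns in screening data is to order conditions greedily by how many new proteins each one crystallizes. This is an illustrative sketch under stated assumptions, not the analysis behind the plot: the input format (condition mapped to the set of proteins that crystallized in it) and the function name `greedy_screen_order` are hypothetical.

```python
def greedy_screen_order(hits):
    """Order conditions by marginal gain in proteins crystallized.

    hits: dict mapping condition name -> set of protein names.
    Returns a list of (condition, cumulative proteins covered),
    stopping once no remaining condition adds a new protein.
    """
    remaining = dict(hits)
    covered = set()
    order = []
    while remaining:
        # Pick the condition adding the most uncovered proteins;
        # ties are broken by name so the result is deterministic.
        best = max(remaining, key=lambda c: (len(remaining[c] - covered), c))
        gain = remaining.pop(best) - covered
        if not gain:
            break
        covered |= gain
        order.append((best, len(covered)))
    return order
```

Plotting the cumulative count against position in this ordering gives a curve that flattens once the most productive conditions have been used.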

Collaborators on Structural Proteomics

Lawrence McIntosh (UBC): C. Mackereth, G. Lee
Mike Kennedy (PNNL)*: J. Cort, T. Ramelot
Kalle Gehring (McGill): I. Ekiel, G. Kozlov
Dave Wishart (U. Alberta): S. Bhattacharyya
Weontae Lee (Yonsei U.)
Emil Pai (U. Toronto): V. Saridakis, N. Wu
Thomas Szyperski (SUNY Buffalo)*
Mark Gerstein (Yale)*: Yuval Kluger, Ning Lan
Sherry Mowbray (Sweden)
Liang Tong (Columbia)*
John Hunt (Columbia)*
Andrzej Joachimiak (ANL)*
Guy Montelione (Rutgers)*

*Northeast Structural Genomics Consortium
*Midwest Structural Genomics Consortium