Sponsored by:
Science Webinar Series
Brought to you by the Science/AAAS Business Office
28 October, 2009
From Genes to Proteins: The Impact of Gene Sequence on Translation and Expression
Participating Experts:
Joshua Plotkin, Ph.D., University of Pennsylvania, Philadelphia, PA
Christine Vogel, Ph.D., University of Texas at Austin, Austin, TX
Mark Welch, Ph.D., DNA2.0, Menlo Park, CA
coding-sequence determinants of gene expression
joshua b. plotkin
university of pennsylvania
in collaboration with
grzegorz kudla
andrew murray
david tollervey
the genetic code
the expression code
High CAI => High expression
Low CAI => Low expression
codon adaptation (codon bias)
what features of coding sequences influence expression levels?
• synthesize library of synonymous GFP genes
• systematically interrogate effects of codon usage on transcription, mRNA stability, translation (starting in e. coli)
[Figure: three synonymous gene variants, each drawn from ATG to TAA]
experimental plan
5’...GGGNGTNCTNCARG
CANGANGTYCTYCA...5’
N = {25%A, 25%C, 25%G, 25%T}
R = {50%A, 50%G}
Y = {50%C, 50%T}
GFP library - synthesis
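The degenerate positions in the synthesis oligos can be expanded programmatically. The following is a minimal sketch, assuming the equal base probabilities listed on the slide; the function names `expand_degenerate` and `library_size` are illustrative and do not come from the talk.

```python
from itertools import product

# Degenerate base codes from the slide: each letter maps to equally likely bases.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # 25% each
    "R": "AG",    # 50% each
    "Y": "CT",    # 50% each
}


def expand_degenerate(oligo):
    """Return every concrete sequence encoded by a degenerate oligo."""
    choices = [DEGENERATE[base] for base in oligo]
    return ["".join(combo) for combo in product(*choices)]


def library_size(oligo):
    """Number of distinct sequences, computed without enumerating them."""
    size = 1
    for base in oligo:
        size *= len(DEGENERATE[base])
    return size


# Leading bases of the forward oligo shown on the slide.
fragment = "GGGNGTNCTNCARG"
print(library_size(fragment))
print(expand_degenerate("GTNCAR")[:4])
```

Counting with `library_size` avoids enumerating the sequences, which matters because the number of variants grows multiplicatively with every degenerate position.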
GFP library - alignment
Completely random mutations (cf Welch et al)
[Figure: CAI (y-axis, 0 to 1) versus GC3 (x-axis, 0 to 1); panels: synthetic GFPs, all E.coli genes]
GFP library – sequence diversity
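The two axes of the diversity plot, GC3 and CAI, can be computed per gene as sketched below. This assumes the standard definitions (GC3 as the fraction of codons ending in G or C, CAI as the geometric mean of per-codon relative adaptiveness); the weight table `toy_weights` is hypothetical and is not real E. coli data.

```python
import math


def codons(cds):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def gc3(cds):
    """Fraction of codons whose third position is G or C."""
    cs = codons(cds)
    return sum(c[2] in "GC" for c in cs) / len(cs)


def cai(cds, weights):
    """Codon adaptation index: geometric mean of per-codon relative adaptiveness.

    `weights` maps codon -> w in (0, 1], where w is the codon's usage in a
    reference gene set divided by that of the most-used synonymous codon.
    Codons missing from `weights` (e.g. ATG, TGG, stops) are skipped.
    """
    logs = [math.log(weights[c]) for c in codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs))


# Hypothetical weights for illustration only (not real E. coli values).
toy_weights = {"CTG": 1.0, "CTA": 0.25, "AAA": 1.0, "AAG": 0.25}
print(gc3("CTGAAACTAAAG"))
print(cai("CTGAAACTAAAG", toy_weights))
```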
protocol (E. coli)
• in vitro recombination reaction (Gateway entry vector -> Gateway expression vector)
• inoculate medium with 4 replicates of the same GFP
• grow overnight to saturation
• dilute 1/15
• grow 1h
• induce expression (T7 polymerase), 1 mM IPTG
• grow 3h
• measure GFP fluorescence
GFP library – protein levels
[Figure: fluorescence (axis 0 to 12000) by GFP ID]
codon adaptation and protein levels
cf Bulmer (1991)
[Figure: fluorescence versus codon adaptation (CAI)]
mRNA folding and protein levels
cf Andersson & Kurland (1990); Eyre-Walker & Bulmer (1993)
[Figure: fluorescence versus mRNA folding energy (nt -4 to +37)]
[Figure: significance (-log p) versus window center (nt)]
mRNA folding and protein levels
endogenous e. coli genes may have already undergone selection for reduced 5’ mRNA structure:
energy (-4 to +37) vs (+38 to +79): Wilcoxon p
codon adaptation and cellular fitness
[Figure: optical density versus codon adaptation (CAI); R2 = 0.231]
Toxic mistranslation-induced misfolding (Drummond et al 2005)?
r(CAI, fluor/mRNA) = 0.09 (ns)
r(CAI, fluor/coomassie)= -0.07 (ns)
…but even an undetectably small amount of mistranslation-induced toxicity could impose a large fitness cost
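The two coefficients above relate CAI to protein output normalized per construct. A minimal sketch of that calculation follows, assuming `r` denotes a Pearson correlation (the slide does not say which coefficient was used); the helper names and the numbers in the example are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def per_unit_output(signal, reference):
    """Normalize a signal (e.g. fluorescence) by a reference (e.g. mRNA level)."""
    return [s / r for s, r in zip(signal, reference)]


# Hypothetical measurements for four constructs.
cai_values = [0.30, 0.40, 0.50, 0.60]
fluorescence = [4000.0, 9000.0, 2500.0, 7000.0]
mrna = [2.0, 3.0, 1.0, 3.5]
print(pearson_r(cai_values, per_unit_output(fluorescence, mrna)))
```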
• codon adaptation does not correlate with expression in E. coli
• 5’ mRNA structure had a predominant effect on gene expression in our data (based on random mutations)
• significant residual variation yet unexplained
• poor codon adaptation reduces cellular fitness, likely by imposing a load on ribosome pool
summary
grzegorz kudla
andrew murray
david tollervey
thanks!
The relationship between protein and mRNA expression levels
Christine Vogel
University of Texas at Austin
The Central Dogma of Biology
[Figure: DNA, RNA, Protein, linked by Transcription, Translation, mRNA degradation, Protein degradation]
structures: http://www.molecularstation.com
Multiple mechanisms regulate protein expression
[Figure: RNA (Cap, AAA(A)n) and Protein (Ub, PEST, NH2, K)]
Translation:
• poly(A) tail
• Ribosome
• Internal entry sites
• uORFs
• Nucleotide composition
• Codon usage
• miRNA
• RNA-binding protein
Protein degradation:
• poly(A) tail
• Ubiquitinylation
• N-degrons
• Degradation signals
• Amino acid composition
• Structure
• Modifications
Abreu, Molecular BioSystems, 2009, DOI: 10.1039/b908315d
Methods to study protein expression regulation
[Figure: DNA, RNA, Protein]
• Microarrays, SAGE, RNA-seq
• Time course, poly(A)
• Polysomal profiling, Ribosome footprinting
• Tagged proteins, Shotgun proteomics
• Cycloheximide, Pulse chase
Lu, Nat Biotech, 2007 25(1); Braisted, BMC Bioinf, 2008 (8)
Protein ~ mRNA across organisms
Yeast: N=2468, R2=0.58
E. coli: N=423, R2=0.47
Human: N=511, R2=0.22
Abreu, Molecular BioSystems, 2009
Protein expression regulation in humans
Human Daoy medulloblastoma cell lysate
[Figure: Protein versus mRNA; annotations: efficient translation, stable protein; inefficient translation, unstable protein]
Protein variance explained: 27%
What explains the rest?
Translation and protein degradation regulation are encoded in sequence features
[Figure: RNA (Cap, AAA(A)n) and Protein (Ub, PEST, NH2, K); translation and protein degradation regulation]
sequence signatures:
• Sequence length
• uORFs
• Nucleotides
• ALU
• Kozak
• Codon usage
• miRNA binding
• Poly-adenylation sites
• Amino acids and Properties
• Unstructuredness
• Degradation Signals
• Secondary Structure
• etc. etc. etc.
Sequence length correlates strongly (and inversely) with protein expression
Spearman rank (Protein vs. feature, fixing mRNA): -0.53***, -0.19***, -0.10 (figure labels: 3’, 5’)
• ribosome fidelity [Ingolia, Science 2009 324(5924)]
• protein folding [Drummond, PNAS 2005 102(40)]
• miRNA [Sandberg, Science 2008 320(5883)]
• alternative cleavage [Mayr, Cell 2009 138(4)]
• secondary structures [e.g. Ringner, Plos CompBio 2005 1(7)]
[Figure: mRNA, Protein, Length]
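One way to read "Spearman rank (Protein vs. feature, fixing mRNA)" is as a first-order partial rank correlation. The sketch below rests on that assumption (the slide does not spell out the procedure), and the function names are illustrative.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y with z held fixed (first-order partial)."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical values: protein abundance, sequence length, mRNA abundance.
protein = [5.0, 4.0, 3.0, 2.0, 1.0]
length = [1.0, 2.0, 3.0, 4.0, 5.0]
mrna = [2.0, 1.0, 4.0, 3.0, 5.0]
print(partial_spearman(protein, length, mrna))
```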
Protein stability is a significant factor to regulate expression levels
Spearman rank
PEST regions -0.37***
Unstructuredness -0.18*
Protein Stability Index# 0.09
Ser, Glu, Leu (Polar amino acids) -0.24***
Glycine 0.17***
Phosphorylation$ 0.06
# Yen, Science 2008, 322(5903)
$ http://www.phosphopep.org/
[Figure: mRNA, Protein, Length, Stability, Amino acids; Ub, PEST, NH2, K]
Translation initiation efficiency influences protein production (per mRNA)
Spearman rank
AUG and uORFs (5’UTR) -0.21***
Secondary structures (5’UTR) -0.20***
[Figure: mRNA, Protein, Translation initiation; labels: 3’, 5’]
>Putative Transcription Factor ZNF462 5’UTR
GGAGAGGGAGGGAGGGAGAGAGAGAGAGAGGGAGAGAGACGGATATCTCAGGTCATCTGC
AGCTGCAGCGAGTCTGAGGAGCCGAGGAAGGCAGGGAAGATGGCGATCCTCCATTGCTG
AGACCCGGCAGAAGCACATGAGACTCCCAAACAACTTCCACAACAATAACCCGAGCAGGAA
GAGGAGAAAGAGAAAGAGGATAAGGAGGCGGTGGGGCTGGAGAACCCGAAGCACCTCCCG
GCGCCGGGACGCTTCTTCTGTTCCTAATGTGAGAGGCTAGACCCAGATC
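Upstream AUGs and uORFs in a 5’UTR like the one above can be located with a simple scan. The sketch below assumes a uORF is an ATG followed in frame by a stop codon that lies entirely within the UTR; uORFs running into the main coding sequence are not reported. The function names and the toy sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def clean(seq):
    """Upper-case a DNA sequence and drop spaces and line breaks."""
    return "".join(seq.upper().split())


def upstream_augs(utr5):
    """0-based positions of every ATG in the 5'UTR (DNA alphabet)."""
    utr5 = clean(utr5)
    return [i for i in range(len(utr5) - 2) if utr5[i:i + 3] == "ATG"]


def upstream_orfs(utr5):
    """(start, end) of each uORF: an ATG with an in-frame stop inside the UTR."""
    utr5 = clean(utr5)
    orfs = []
    for start in upstream_augs(utr5):
        for j in range(start + 3, len(utr5) - 2, 3):
            if utr5[j:j + 3] in STOPS:
                orfs.append((start, j + 3))  # end index is exclusive
                break
    return orfs


# Toy UTR: one complete uORF (ATGCCCTAA) and one ATG without an in-frame stop.
toy_utr = "GGATGCCCTAAGGATGGC"
print(upstream_augs(toy_utr))
print(upstream_orfs(toy_utr))
```

The same functions accept the FASTA sequence lines directly, since `clean` strips whitespace before scanning.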
mRNA expression and sequence characteristics explain two-thirds of protein expression variation
Protein variance explained:
• mRNA: 27%
• mRNA + Length: 46%
• Combined contributions (mRNA, Length, Amino acids (and properties), Nucleotides (and structure), …): 67%
[Figure: Protein versus mRNA; contribution labels: 27, 19, 11, 5, …]
Summary and conclusions
The protein vs. mRNA correlation varies widely across organisms.
We can explain and predict ~2/3 of the variation in protein expression in a human cell system.
We can use such models to:
- Identify ‘hotspots’ of extreme translation and degradation regulation
- Characterize human cell types
- Understand the relationship between transcription, translation and degradation
Acknowledgments
Collaborators and Co-authors:
Edward Marcotte, Dan Boutz (UT Austin, TX)
Luiz Penalva, Raquel de Sousa Abreu, Daijin Ko, Devraj Sandhu (UT San Antonio, TX)
Dan Miranker, Smriti Ramakrishnan (Computer Science, UT Austin, TX)
John Braisted, Srilatha Kuntumalla, Rembert Pieper (JC Venter Institute, DC)
Bruce A. Shapiro, Shu-Yun Le (National Cancer Institute)
Funding:
Synthetic Gene Design for Heterologous Expression
Mark Welch
DNA2.0, Inc
October 28, 2009
Navigating Gene Design Space
• Max CAI
• Match Host Bias
• Codon Pair Bias
• mRNA Structure
• Target GC%
• Min Rare Codons
• Min RNaseE
• Min SD-like
• Remove Splice
• Min polyA Sites
• Harmonize Codons
• ?
~10^100 codings for a 30 kDa protein!
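The size of the design space follows from the degeneracy of the genetic code: the number of synonymous codings of a protein is the product of the codon counts of its residues. The sketch below assumes the standard genetic code and, as a rule of thumb, about 110 Da per residue, so a 30 kDa protein is roughly 270 residues; the 270-residue sequence used here is a made-up uniform mix, so the exact exponent will differ for a real protein.

```python
import math

# Number of codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def n_codings(protein):
    """Exact number of synonymous coding sequences for a protein."""
    total = 1
    for aa in protein:
        total *= DEGENERACY[aa]
    return total


def log10_codings(protein):
    """Order of magnitude of the coding space (log10 of the count)."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


# Hypothetical 270-residue protein with near-uniform composition.
toy_protein = "ACDEFGHIKLMNPQRSTVWY" * 13 + "ACDEFGHIKL"
print(len(toy_protein))
print(log10_codings(toy_protein))
```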
Interrogating E. coli Preferences
NSF SBIR Funded Study
• Only synonymous codon usage varied
• Two different genes studied: a DNA polymerase and an scFv
In silico design (DoE) -> Gene synthesis -> Express in E. coli (pET, BL21)
scFv Gene Variant Set
[Figure: expression (% cell mass, 0% to 30%) for scFv variants 1 to 24; legend: Increased bias, Decreased bias]
Variables 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
GCA 0.025 0.011 0.014 0.007 0.018 0 0.025 0.018 0.011 0.011 0.021 0.014 0.014 0.021 0 0.014 0.007 0.025 0.021 0.028 0.007 0.018 0.007 0.014
GCC 0.021 0.021 0 0.021 0.011 0 0.007 0.011 0 0.007 0.007 0.007 0.011 0.014 0 0.014 0.007 0.011 0.007 0 0.014 0.018 0.007 0.004
GCG 0.018 0.025 0.028 0.018 0.014 0.028 0.032 0.032 0.021 0.021 0.036 0.028 0.028 0.028 0.021 0.025 0.032 0.011 0.014 0.025 0.032 0.018 0.028 0.032
GCT 0 0.007 0.021 0.018 0.021 0.036 0 0.004 0.032 0.025 0 0.014 0.011 0 0.043 0.011 0.018 0.018 0.021 0.011 0.011 0.011 0.021 0.014
AGA 0 0 0 0 0 0 0 0 0 0 0 0 0.018 0 0 0 0 0 0 0 0 0 0 0
CGA 0 0.007 0 0 0 0 0 0.007 0 0 0 0.014 0 0 0 0 0 0 0 0 0 0 0 0
CGC 0.014 0.007 0.021 0.014 0.007 0.014 0.018 0.007 0.018 0.018 0.021 0.011 0.018 0.021 0.021 0.011 0.007 0.011 0.018 0.025 0.028 0.014 0.018 0.018
CGG 0 0.004 0 0 0 0 0 0.014 0 0 0 0 0.004 0 0 0 0 0 0 0 0 0.011 0 0
CGT 0.032 0.028 0.025 0.032 0.039 0.032 0.028 0.018 0.028 0.028 0.025 0.021 0.007 0.025 0.025 0.036 0.039 0.036 0.028 0.021 0.018 0.021 0.028 0.028
AAC 0.011 0.007 0.021 0.018 0.018 0.021 0.007 0.011 0.021 0.011 0.011 0.011 0.014 0.014 0.021 0.014 0.014 0.011 0.018 0.021 0.011 0.014 0.021 0.011
AAT 0.011 0.014 0 0.004 0.004 0 0.014 0.011 0 0.011 0.011 0.011 0.007 0.007 0 0.007 0.007 0.011 0.004 0 0.011 0.007 0 0.011
GAC 0.028 0.028 0.028 0.028 0.018 0.011 0.018 0.028 0.046 0.036 0.021 0.018 0.025 0.021 0.032 0.032 0.036 0.036 0.036 0.032 0.043 0.032 0.028 0.028
GAT 0.032 0.032 0.032 0.032 0.043 0.05 0.043 0.032 0.014 0.025 0.039 0.043 0.036 0.039 0.028 0.028 0.025 0.025 0.025 0.028 0.018 0.028 0.032 0.032
TGC 0.007 0 0.011 0.007 0.004 0.007 0.004 0.007 0.007 0.004 0.007 0.014 0.007 0.014 0.011 0.011 0 0.007 0.007 0.007 0.004 0.011 0.007 0.007
TGT 0.007 0.014 0.004 0.007 0.011 0.007 0.011 0.007 0.007 0.011 0.007 0 0.007 0 0.004 0.004 0.014 0.007 0.007 0.007 0.011 0.004 0.007 0.007
CAA 0.021 0.018 0.004 0.018 0.021 0.004 0.028 0.032 0.004 0.021 0.032 0.039 0.021 0.021 0.004 0.014 0.004 0.018 0.007 0.004 0.028 0.028 0.014 0.011
CAG 0.028 0.032 0.046 0.032 0.028 0.046 0.021 0.018 0.046 0.028 0.018 0.011 0.028 0.028 0.046 0.036 0.046 0.032 0.043 0.046 0.021 0.021 0.036 0.039
GAA 0.011 0.011 0.011 0.014 0.014 0.014 0.014 0.021 0.014 0.014 0.011 0.014 0.021 0.014 0.021 0.018 0.014 0.011 0.018 0.014 0.007 0.007 0.014 0.011
GAG 0.011 0.011 0.011 0.007 0.007 0.007 0.007 0 0.007 0.007 0.011 0.007 0 0.007 0 0.004 0.007 0.011 0.004 0.007 0.014 0.014 0.007 0.011
GGA 0 0.018 0 0.007 0 0 0 0.011 0 0.032 0 0.011 0.018 0 0 0.004 0 0 0 0 0 0.021 0 0
GGC 0.039 0.039 0.046 0.039 0.046 0.064 0.064 0.043 0.046 0.032 0.071 0.032 0.053 0.05 0.046 0.046 0.043 0.053 0.068 0.036 0.021 0.036 0.06 0.064
GGG 0 0.014 0 0.021 0.004 0 0 0.014 0 0.021 0 0.021 0.011 0 0 0 0 0 0 0 0 0.014 0 0
GGT 0.078 0.046 0.071 0.05 0.068 0.053 0.053 0.05 0.071 0.032 0.046 0.053 0.036 0.068 0.071 0.068 0.075 0.064 0.05 0.082 0.096 0.046 0.057 0.053
CAC 0.025 0.011 0.014 0.011 0.018 0.018 0.018 0.014 0.011 0.021 0.014 0.014 0.014 0.018 0.021 0.018 0.021 0.025 0.028 0.011 0.014 0.011 0.025 0.011
CAT 0.007 0.021 0.018 0.021 0.014 0.014 0.014 0.018 0.021 0.011 0.018 0.018 0.018 0.014 0.011 0.014 0.011 0.007 0.004 0.021 0.018 0.021 0.007 0.021
ATA 0 0.007 0 0 0 0 0 0.004 0 0 0 0.007 0 0 0 0 0 0 0 0 0 0.004 0 0
ATC 0.021 0.018 0.028 0.021 0.028 0.036 0.018 0.021 0.028 0.036 0.032 0.018 0.025 0.025 0.025 0.039 0.021 0.036 0.028 0.021 0.014 0.011 0.032 0.032
ATT 0.021 0.018 0.014 0.021 0.014 0.007 0.025 0.018 0.014 0.007 0.011 0.018 0.018 0.018 0.018 0.004 0.021 0.007 0.014 0.021 0.028 0.028 0.011 0.011
CTA 0 0 0 0 0 0 0 0.004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CTC 0 0.007 0 0.011 0.004 0 0 0.007 0 0.014 0 0.011 0.007 0 0 0.004 0 0 0 0 0 0.018 0 0
CTG 0.071 0.028 0.071 0.036 0.039 0.071 0.071 0.043 0.071 0.043 0.071 0.007 0.021 0.071 0.071 0.053 0.071 0.071 0.071 0.071 0.071 0.014 0.071 0.071
CTT 0 0.007 0 0.004 0.011 0 0 0.004 0 0.004 0 0.025 0.014 0 0 0.011 0 0 0 0 0 0.011 0 0
TTA 0 0.011 0 0.007 0.007 0 0 0.007 0 0.007 0 0.018 0.007 0 0 0 0 0 0 0 0 0.011 0 0
TTG 0.004 0.021 0.004 0.018 0.014 0.004 0.004 0.011 0.004 0.007 0.004 0.014 0.025 0.004 0.004 0.007 0.004 0.004 0.004 0.004 0.004 0.021 0.004 0.004
AAA 0.021 0.021 0.025 0.025 0.032 0.039 0.028 0.032 0.025 0.028 0.025 0.028 0.036 0.032 0.039 0.025 0.036 0.032 0.028 0.021 0.018 0.018 0.025 0.036
AAG 0.018 0.018 0.014 0.014 0.007 0 0.011 0.007 0.014 0.011 0.014 0.011 0.004 0.007 0 0.014 0.004 0.007 0.011 0.018 0.021 0.021 0.014 0.004
TTC 0.021 0.021 0.018 0.018 0.028 0.025 0.004 0.018 0.025 0.025 0.018 0.011 0.011 0.021 0.025 0.021 0.014 0.018 0.025 0.011 0.021 0.018 0.032 0.028
TTT 0.011 0.011 0.014 0.014 0.004 0.007 0.028 0.014 0.007 0.007 0.014 0.021 0.021 0.011 0.007 0.011 0.018 0.014 0.007 0.021 0.011 0.014 0 0.004
CCA 0 0.011 0 0.007 0.007 0 0 0.014 0 0.007 0 0.007 0.011 0 0 0 0.007 0.014 0.004 0 0.011 0.011 0.018 0.007
CCC 0 0.004 0 0 0 0 0 0 0 0 0 0.014 0.007 0 0 0 0 0 0 0 0 0.011 0 0
CCG 0.036 0.011 0.036 0.021 0.025 0.036 0.036 0.018 0.036 0.021 0.036 0.007 0.011 0.036 0.036 0.036 0.025 0.018 0.028 0.036 0.007 0.007 0.018 0.025
CCT 0 0.011 0 0.007 0.004 0 0 0.004 0 0.007 0 0.007 0.007 0 0 0 0.004 0.004 0.004 0 0.018 0.007 0 0.004
AGC 0.121 0.025 0.039 0.018 0.039 0.004 0.121 0.028 0.028 0.039 0.121 0.032 0.032 0.121 0.004 0.021 0.036 0.028 0.032 0.043 0.036 0.032 0.053 0.032
AGT 0 0.014 0 0.025 0.004 0 0 0.018 0 0.007 0 0.018 0.021 0 0 0.004 0 0 0 0 0 0.018 0 0
TCA 0.004 0.025 0.004 0.011 0.007 0.004 0.004 0.028 0.004 0.025 0.004 0.021 0.021 0.004 0.004 0.007 0.004 0.004 0.004 0.004 0.004 0.014 0.004 0.004
TCC 0 0.011 0.05 0.032 0.021 0.05 0 0.025 0.046 0.028 0 0.021 0.021 0 0.06 0.039 0.036 0.039 0.05 0.039 0.05 0.021 0.025 0.05
TCG 0 0.025 0 0.014 0.004 0 0 0.004 0 0.011 0 0.021 0.004 0 0 0.014 0 0 0 0 0 0.021 0 0
TCT 0 0.025 0.032 0.025 0.05 0.068 0 0.021 0.046 0.014 0 0.011 0.025 0 0.057 0.039 0.05 0.053 0.039 0.039 0.036 0.018 0.043 0.039
ACA 0 0.025 0 0.011 0 0 0 0.018 0 0.004 0 0.014 0.014 0 0 0.007 0 0 0 0 0 0.028 0 0
ACC 0.039 0.032 0.043 0.028 0.05 0.057 0.05 0.018 0.057 0.036 0.043 0.021 0.028 0.046 0.043 0.032 0.05 0.043 0.05 0.043 0.018 0.007 0.043 0.046
ACG 0.043 0.018 0.004 0.025 0.007 0.004 0.032 0.039 0.004 0.021 0.039 0.036 0.028 0.036 0.004 0.011 0.007 0.021 0.014 0.004 0.036 0.021 0.011 0.004
ACT 0 0.007 0.036 0.018 0.025 0.021 0 0.007 0.021 0.021 0 0.011 0.011 0 0.036 0.032 0.025 0.018 0.018 0.036 0.028 0.025 0.028 0.032
TAC 0.036 0.036 0.025 0.046 0.046 0.043 0.043 0.014 0.028 0.036 0.039 0.025 0.025 0.021 0.039 0.043 0.043 0.039 0.028 0.025 0.039 0.025 0.053 0.06
TAT 0.028 0.028 0.039 0.018 0.018 0.021 0.021 0.05 0.036 0.028 0.025 0.039 0.039 0.043 0.025 0.021 0.021 0.025 0.036 0.039 0.025 0.039 0.011 0.004
GTA 0 0.007 0.014 0.011 0.014 0 0 0 0.011 0.004 0 0.014 0.007 0 0 0.018 0 0.007 0.007 0.014 0.014 0.007 0.004 0
GTC 0.021 0.018 0 0.007 0.004 0 0.014 0.021 0 0 0.011 0.004 0 0.014 0 0 0 0.007 0.007 0 0.014 0.014 0.004 0.011
GTG 0.011 0.011 0.011 0.014 0.007 0.025 0.018 0.011 0.018 0.025 0.018 0.014 0.025 0.014 0.021 0.018 0.011 0.011 0.011 0.011 0.007 0.018 0.025 0.021
GTT 0.011 0.007 0.018 0.011 0.018 0.018 0.011 0.011 0.014 0.014 0.014 0.011 0.011 0.014 0.021 0.007 0.032 0.018 0.018 0.018 0.007 0.004 0.011 0.011
GC% 57.4 51.7 54.9 53.3 52.0 54.9 56.5 51.6 55.4 54.2 58.0 49.7 50.7 57.1 54.3 54.6 53.7 54.2 56.0 53.7 54.0 51.5 56.7 56.3
5' AT 0.400 0.067 0.200 0.667 0.667 0.067 0.400 0.133 0.400 0.200 0.067 0.800 0.733 0.067 0.400 0.133 0.400 0.333 0.267 0.200 0.333 0.400 0.333 0.267
CAI 0.725 0.460 0.808 0.567 0.712 0.868 0.723 0.487 0.828 0.563 0.725 0.417 0.478 0.747 0.863 0.722 0.822 0.768 0.805 0.793 0.677 0.421 0.800 0.803
Rare Codons 2 5 3 2 3 6 1 5 2 2 4 2 2 3 7 2 5 9 4 8 2 2 1 6 2 2 2 2 2 5 7 2 2
GC Clusters 2 3 1 2 1 4 2 5 1 3 1 0 0 1 1 4 1 4 1 3 1 6 1 0 8 7 9 2 0 2 8 2 0
AT Clusters 5 1 1 1 1 8 1 0 4 5 2 1 1 0 9 8 1 3 8 1 3 6 7 5 4 1 3 1 2 6 1 0 3 2
mRNA Structure -11.8 -9.6 -10.8 -10.2 -9.1 -10.0 -11.7 -9.2 -10.4 -10.7 -12.1 -9.4 -9.6 -12.2 -9.5 -10.4 -10.5 -10.2 -10.7 -10.6 -10.4 -9.1 -10.9 -10.3
5' RNA Structure -13.0 -15.0 -12.7 -11.6 -12.2 -12.9 -13.3 -10.1 -12.3 -11.5 -14.5 -10.0 -10.0 -16.2 -12.1 -17.8 -10.9 -13.6 -14.6 -12.8 -11.8 -10.0 -13.3 -11.9
RNaseE Sites 7.0 6.0 3.0 7.0 7.0 6.0 10.0 5.0 3.0 5.0 6.0 7.0 7.0 6.0 5.0 4.0 5.0 2.0 3.0 5.0 6.0 7.0 1.0 3.0
[Table row groups: Codon Frequencies (amino acid labels F, P, S, T, Y, V, E, G, H, I, L, K, A, R, N, D, C, Q); Other Features]
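A codon-frequency row for one variant can be derived from its coding sequence as sketched below. This assumes each table entry is the count of a codon divided by the total number of codons in the gene (the slide does not state the normalization); the function name and the toy sequence are illustrative.

```python
from collections import Counter


def codon_frequencies(cds):
    """Fraction of all codons in the gene that each codon accounts for."""
    cds = cds.upper()
    codon_list = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codon_list)
    return {codon: n / len(codon_list) for codon, n in counts.items()}


# Toy gene fragment: two GCA, one GCT, one AAA.
freqs = codon_frequencies("GCAGCAGCTAAA")
for codon in sorted(freqs):
    print(codon, round(freqs[codon], 3))
```

Codons absent from the gene are simply missing from the returned dictionary, which corresponds to the 0 entries in the table.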
Protein Expression not Correlated to CAI or 5’ mRNA Structure
[Figure, panel 5’ mRNA Structure (Window: -4 to +37): relative expression versus free energy (kcal/mol); R²(scFv) = 0.1003, R²(Pol) = 0.0109]
[Figure, panel CAI: relative expression versus CAI; R²(scFv) = 0.0013, R²(Pol) = 0.0029]
Series: Polymerase, scFv
Variant Hybrids
Distributed Coding Effects – Polymerase Hybrid Set #1
Variant          Relative Expression
15               7 ± 1
Hyb 15-15-19     17 ± 6
Hyb 15-19-15     7 ± 1
Hyb 19-15-15     10 ± 1
Hyb 15-19-19     34 ± 2
Hyb 19-15-19     35 ± 4
Hyb 19-19-15     25 ± 3
19               100 ± 22
75 325 575
• Search for any local effects (deleterious motifs, etc.)
• Further diversification along useful trajectories
Multivariate Regression Analysis of Codon Usage
• Partial least squares regression (PLS) used to identify relationships between codon use and expression
• Input: individual codon frequencies in genes
• Output: optimal set of weights for each codon that best predict expression
Other directions of codon bias?
[Figure: Expression versus CAI]
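The PLS step described above (codon frequencies in, per-codon weights out) can be sketched with a single-component PLS1 fit. The slide does not state the number of components or the software used, so the one-component choice, the function names, and the toy data below are all assumptions made to keep the example short.

```python
import math


def pls1_one_component(X, y):
    """Fit a single-component PLS1 model.

    X: list of rows (one per gene), each a list of codon frequencies.
    y: list of expression values.
    Returns (x_means, y_mean, coefs), with one coefficient per codon.
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in codon space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each gene onto that direction.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress centered expression on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coefs = [wj * q for wj in w]
    return x_means, y_mean, coefs


def predict(model, x):
    """Predicted expression for one gene's codon-frequency vector."""
    x_means, y_mean, coefs = model
    return y_mean + sum((xj - mj) * cj for xj, mj, cj in zip(x, x_means, coefs))


# Toy data: two "codons", expression equal to the sum of their frequencies.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [0.0, 1.0, 1.0, 2.0]
model = pls1_one_component(X, y)
print(model[2])
print(predict(model, [1.0, 0.0]))
```

A full analysis would typically extract several components by deflating X and y after each one; that loop is omitted here.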
Multivariate Analysis of Expression Combined Model
• Validated with random subset cross validation (20% left out)
New Designs
Polymerase
scFv
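Random subset cross validation with 20% left out can be sketched as repeated random splits of the variant set. The number of rounds, the seed, and the function names below are assumptions; the slide gives only the held-out fraction.

```python
import random


def random_subset_cv(n_samples, holdout=0.2, rounds=5, seed=0):
    """Yield (train_idx, test_idx) pairs, each holding out a random subset."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * holdout))
    for _ in range(rounds):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Example: splits for a 24-variant set such as the scFv library.
for train_idx, test_idx in random_subset_cv(24, rounds=2):
    print(len(train_idx), len(test_idx))
```

In use, a model is fitted on each training subset and scored with `r_squared` on the held-out variants, and the scores are then summarized across rounds.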
Preference for ‘Reserve’ tRNAs?
AA    %AA, scFv   %AA, coli   Codon   Preference Ratio   tRNA sensitivity (1)
Ser   12.5        4.7         AGC     2.1                3.4
                              AGU     0.0                3.4
                              UCA     0.4                7.5
                              UCC     0.6                35.5
                              UCG     0.6                4.4
                              UCU     0.4                7.9
Thr   8.2         5.4         ACA     0.0                5.7
                              ACC     0.9                20.9
                              ACG     2.4                2.4
                              ACU     0.5                6.6
(1) Elf, et al (2003) Science, 300:1718
tRNA sensitivity dependent on Fci[tRNAi ]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
NEHKY 0 1 0 -1 0 -1 0 1 0 -1 0 1 0 1 0 -1 0 1 0 -1 0 -1 0 1 0 -1 0 1 0 1 0 -1 1
A 0 0 -1 -1 0 0 -1 -1 0 0 -1 -1 0 0 -1 -1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1
R 0 1 -1 0 0 1 -1 0 0 1 -1 0 0 1 -1 0 0 -1 1 0 0 -1 1 0 0 -1 1 0 0 -1 1 0 1
D 0 0 0 0 -1 -1 1 1 0 0 0 0 -1 -1 1 1 0 0 0 0 1 1 -1 -1 0 0 0 0 1 1 -1 -1 1
C 0 1 0 -1 -1 0 1 0 0 1 0 -1 -1 0 1 0 0 -1 0 1 1 0 -1 0 0 -1 0 1 1 0 -1 0 1
Q 0 0 -1 -1 1 1 0 0 0 0 -1 -1 1 1 0 0 0 0 1 1 -1 -1 0 0 0 0 1 1 -1 -1 0 0 1
G 0 1 -1 0 1 0 0 -1 0 1 -1 0 1 0 0 -1 0 -1 1 0 -1 0 0 1 0 -1 1 0 -1 0 0 1 1
I 0 0 0 0 0 0 0 0 -1 -1 1 1 1 1 -1 -1 0 0 0 0 0 0 0 0 1 1 -1 -1 -1 -1 1 1 1
L 0 1 0 -1 0 -1 0 1 -1 0 1 0 1 0 -1 0 0 -1 0 1 0 1 0 -1 1 0 -1 0 -1 0 1 0 1
F 0 0 -1 -1 0 0 -1 -1 1 1 0 0 1 1 0 0 0 0 1 1 0 0 1 1 -1 -1 0 0 -1 -1 0 0 1
P 0 1 -1 0 0 1 -1 0 1 0 0 -1 1 0 0 -1 0 -1 1 0 0 -1 1 0 -1 0 0 1 -1 0 0 1 1
S 0 0 0 0 -1 -1 1 1 1 1 -1 -1 0 0 0 0 0 0 0 0 1 1 -1 -1 -1 -1 1 1 0 0 0 0 1
T 0 1 0 -1 -1 0 1 0 1 0 -1 0 0 -1 0 1 0 -1 0 1 1 0 -1 0 -1 0 1 0 0 1 0 -1 1
V 0 0 -1 -1 1 1 0 0 1 1 0 0 0 0 -1 -1 0 0 1 1 -1 -1 0 0 -1 -1 0 0 0 0 1 1 1
5'AT 1 -1 1 -1 -1 1 -1 1 -1 1 -1 1 1 -1 1 -1 -1 1 -1 1 1 -1 1 -1 1 -1 1 -1 -1 1 -1 1 1
RCO 1 1 -1 -1 -1 -1 1 1 -1 -1 1 1 1 1 -1 -1 -1 -1 1 1 1 1 -1 -1 1 1 -1 -1 -1 -1 1 1 1
Gene Variant Set for Yeast with Dr. Robert Stroud, UCSF
[Figure: design matrix; columns: Gene Variants, rows: Design Variables; output: Protein]
1 Increased bias (use more frequent codons)
0 Host bias
-1 Decreased bias (use more infrequent codons)
• Human membrane protein expressed in S. cerevisiae
• Total protein in membrane fraction analyzed
• WT gene shows no detectable expression
• Top expression level ~1mg/L
Yeast Bias, high frequency cut-off
Yeast Bias, low frequency cut-off
High frequency codon biased
PLS Model of Yeast Expression Data
R2 = 0.891
R2 (CV) = 0.809
Conclusions, Ongoing Work
• Systematic gene diversification useful to identify coding parameters relevant to expression
• Heterologous gene expression is correlated to codon usage
• Codon preferences may reflect tRNA sensitivity to over-consumption
• Several ongoing studies with academic and industry collaborators: E. coli, plants, yeasts, fungi, mammalian cells, trypanosomes, cell-free systems and more
Look out for more webinars in the series at:
www.sciencemag.org/webinar
For related information on this webinar topic, go to:
www.optimizedgene.com
To provide feedback on this webinar, please e‐mail
your comments to [email protected]