From Genes to Proteins: The Impact of Gene Sequence on Translation and Expression

Science Webinar Series, 28 October, 2009
Brought to you by the Science/AAAS Business Office
Sponsored by:

Participating Experts:
Joshua Plotkin, Ph.D., University of Pennsylvania, Philadelphia, PA
Christine Vogel, Ph.D., University of Texas at Austin, Austin, TX
Mark Welch, Ph.D., DNA2.0, Menlo Park, CA



  • coding-sequence determinants of gene expression

    joshua b. plotkin, university of pennsylvania

  • in collaboration with

    grzegorz kudla

    andrew murray

    david tollervey

  • the genetic code

  • the expression code

  • codon adaptation (codon bias)

    high codon adaptation index (CAI) => high expression

    low CAI => low expression
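The CAI is conventionally the geometric mean, over a gene's codons, of each codon's relative adaptiveness w: the codon's count in a reference set of highly expressed genes divided by the count of its most-used synonym. The sketch below uses a tiny hypothetical reference table covering only Lys and Glu. The 0.5 pseudocount for unseen codons is an assumed workaround, not necessarily what the study used.

```python
import math

# Hypothetical reference data: codon counts from a set of highly expressed genes.
# Single-codon amino acids (Met, Trp) and stop codons are conventionally left out.
SYNONYMS = {"Lys": ["AAA", "AAG"], "Glu": ["GAA", "GAG"]}
REF_COUNTS = {"AAA": 80, "AAG": 20, "GAA": 60, "GAG": 40}


def relative_adaptiveness(ref_counts, synonyms):
    """w(codon) = count(codon) / count of the most-used synonymous codon."""
    w = {}
    for codons in synonyms.values():
        top = max(ref_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            # Assumed pseudocount of 0.5 for unseen codons, to avoid log(0).
            w[c] = max(ref_counts.get(c, 0), 0.5) / top
    return w


def cai(seq, w):
    """Geometric mean of w over the complete in-frame codons of seq that have a weight."""
    seq = seq.upper()
    logs = [math.log(w[seq[i:i + 3]])
            for i in range(0, len(seq) - 2, 3)
            if seq[i:i + 3] in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


W = relative_adaptiveness(REF_COUNTS, SYNONYMS)
print(cai("AAAGAA", W))  # only preferred codons -> 1.0
print(cai("AAGGAG", W))  # sqrt(0.25 * 40/60), about 0.41
```

Using a geometric mean means that a few very rare codons pull the score down sharply, which is why CAI is sensitive to the reference set chosen.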

  • what features of coding sequences influence expression levels?

  • experimental plan

    • synthesize a library of synonymous GFP genes

    • systematically interrogate the effects of codon usage on transcription, mRNA stability, and translation (starting in E. coli)

    [figure: synonymous GFP variants, each running from the ATG start codon to the TAA stop codon]

  • GFP library - synthesis

    5’...GGGNGTNCTNCARG

    CANGANGTYCTYCA...5’

    N = {25% A, 25% C, 25% G, 25% T}; R = {50% A, 50% G}; Y = {50% C, 50% T}
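Each degenerate position above is filled from a defined base mixture, so one synthesis yields many sequence variants. A minimal sketch, assuming only the three IUPAC codes and mixtures listed on the slide and ideal (unbiased) coupling, that counts the distinct sequences a degenerate oligo encodes and draws random instances from it:

```python
import random

# Base mixtures from the slide: N = 25% each base, R = A/G, Y = C/T.
MIXTURES = {
    "N": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "R": {"A": 0.5, "G": 0.5},
    "Y": {"C": 0.5, "T": 0.5},
}


def n_variants(oligo):
    """Number of distinct sequences encoded by a degenerate oligo."""
    count = 1
    for base in oligo:
        count *= len(MIXTURES.get(base, {base: 1.0}))  # fixed bases contribute 1
    return count


def sample(oligo, rng):
    """Draw one concrete sequence, filling each degenerate position from its mixture."""
    out = []
    for base in oligo:
        mix = MIXTURES.get(base)
        if mix is None:
            out.append(base)  # fixed position
        else:
            bases, weights = zip(*mix.items())
            out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out)


oligo = "GGGNGTNCTNCARG"  # degenerate stretch of the top-strand oligo on the slide
print(n_variants(oligo))  # 3 N positions and 1 R position: 4**3 * 2 = 128
rng = random.Random(1)
print([sample(oligo, rng) for _ in range(3)])
```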

  • GFP library - alignment

    Completely random mutations (cf Welch et al)

  • GFP library – sequence diversity

    [figure: CAI vs. GC3 for the synthetic GFPs and for all E. coli genes; both axes 0 to 1]
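GC3, the x-axis of both panels, is the fraction of codons whose third (wobble) position is G or C. A minimal sketch, assuming an in-frame coding sequence that starts at its first base. Excluding the start and stop codons, as done here by default, is an assumption, because conventions differ between studies.

```python
def gc3(cds, skip_start_stop=True):
    """Fraction of complete in-frame codons whose third base is G or C."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if skip_start_stop and len(codons) >= 2:
        codons = codons[1:-1]  # drop the start and stop codons
    if not codons:
        raise ValueError("no codons to score")
    return sum(c[2] in "GC" for c in codons) / len(codons)


print(gc3("ATGGCCAAGCTTTAA"))  # GCC and AAG end in G/C, CTT does not -> 0.666...
```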

  • protocol (E. coli)

    in vitro recombination reaction (Gateway entry vector -> Gateway expression vector)

    -> E. coli

    -> inoculate medium with 4 replicates of the same GFP

    -> grow overnight to saturation

    -> dilute 1/15

    -> grow 1 h

    -> induce expression (T7 polymerase; 1 mM IPTG)

    -> grow 3 h

    -> measure GFP fluorescence
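Because each GFP variant is grown as four replicate cultures, its fluorescence is naturally reported as a replicate mean with a spread. A minimal sketch of that summary, using hypothetical readings. This is an illustration, not the study's analysis.

```python
import statistics


def summarize_replicates(readings):
    """Mean, sample standard deviation and coefficient of variation of replicate readings."""
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)  # sample SD; needs at least two readings
    return mean, sd, sd / mean


# Hypothetical fluorescence readings for four replicate cultures of one GFP variant.
print(summarize_replicates([100.0, 110.0, 90.0, 100.0]))
```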

  • GFP library – protein levels

    [figure: GFP fluorescence (0 to 12,000) for each variant, by GFP ID]

  • codon adaptation and protein levels

    cf Bulmer (1991)

    [figure: fluorescence vs. codon adaptation (CAI)]

  • mRNA folding and protein levels

    cf Andersson & Kurland (1990); Eyre-Walker & Bulmer (1993)

    [figure: fluorescence vs. mRNA folding energy (nt -4 to +37)]

  • mRNA folding and protein levels

    [figure: fluorescence vs. mRNA folding energy (nt -4 to +37)]

  • mRNA folding and protein levels

    [figure: significance (-log p) vs. window center (nt)]
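The folding-energy windows on these slides are indexed relative to the start codon. A minimal sketch of extracting such a window, assuming the common convention that the A of ATG is +1 and the base before it is -1, with no position 0. Computing the folding energy itself would normally be done with an RNA folding program and is not implemented here. The transcript in the example is hypothetical.

```python
def to_index(start_index, pos):
    """Map a start-codon-relative position to a 0-based string index.

    Assumes the A of ATG is +1 and the preceding base is -1 (no position 0)."""
    if pos == 0:
        raise ValueError("position 0 does not exist in this convention")
    return start_index + pos - 1 if pos > 0 else start_index + pos


def window(mrna, start_index, first, last):
    """Nucleotides first..last (inclusive, start-codon-relative) of mrna."""
    lo, hi = to_index(start_index, first), to_index(start_index, last)
    if lo < 0 or hi >= len(mrna):
        raise ValueError("window extends past the sequence")
    return mrna[lo:hi + 1]


# Hypothetical transcript: 8-nt 5'UTR, then ATG and 40 alanine (GCT) codons.
mrna = "CCCCAAGG" + "ATG" + "GCT" * 40
start = mrna.index("ATG")            # 0-based index of the A of the start codon
print(window(mrna, start, -4, 37))   # region whose folding energy is plotted
print(window(mrna, start, 38, 79))   # downstream comparison window
```

Sliding this window along the transcript and correlating each window's folding energy with expression gives a significance-by-position profile like the one in the figure.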

  • mRNA folding and protein levels

    endogenous E. coli genes may have already undergone selection for reduced 5’ mRNA structure:

    energy (-4 to +37) vs (+38 to +79): Wilcoxon p
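The comparison above pairs two windows within each gene, which is the setting for a Wilcoxon signed-rank test. A minimal stdlib sketch using the large-sample normal approximation: zero differences are dropped, tied |differences| get average ranks, and no tie or continuity correction is applied. A library routine such as SciPy's `scipy.stats.wilcoxon` would normally be used instead. The energies in the example are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired test of x vs. y; returns (W_plus, z, p) via the normal approximation."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to runs of tied |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w_plus, z, p


# Hypothetical folding energies (kcal/mol) per gene: 5' window vs. downstream window.
five_prime = [-6.2, -8.0, -5.1, -7.4, -4.9, -6.8, -5.5, -7.0]
downstream = [-9.1, -10.2, -8.8, -9.5, -7.7, -10.0, -8.1, -9.9]
print(wilcoxon_signed_rank(five_prime, downstream))  # normal approximation is rough for n = 8
```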

  • codon adaptation and cellular fitness

    [figure: optical density (0.3 to 1.1) vs. codon adaptation (CAI, 0.2 to 0.6); R2 = 0.231]

  • codon adaptation and cellular fitness

    Toxic mistranslation-induced misfolding (Drummond et al 2005)?

    r(CAI, fluor/mRNA) = 0.09 (ns)

    r(CAI, fluor/coomassie) = -0.07 (ns)

    …but even an undetectably small amount of mistranslation-induced toxicity could impose a large fitness cost

  • summary

    • codon adaptation does not correlate with expression in E. coli

    • 5’ mRNA structure had a predominant effect on gene expression in our data (based on random mutations)

    • significant residual variation remains unexplained

    • poor codon adaptation reduces cellular fitness, likely by imposing a load on the ribosome pool

  • thanks!

    grzegorz kudla

    andrew murray

    david tollervey

  • The relationship between protein and mRNA expression levels

    Christine Vogel

    University of Texas at Austin

  • The Central Dogma of Biology

    DNA -> (transcription) -> RNA -> (translation) -> Protein

    Processes: transcription, translation, mRNA degradation, protein degradation

    structures: http://www.molecularstation.com
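The four processes on this slide are often summarized by a simple first-order model. This is an assumption, not something the slide states. mRNA is made at rate k_tx and decays at rate d_m. Protein is made at rate k_tl per mRNA and decays at rate d_p. At steady state, m* = k_tx/d_m and p* = k_tl·m*/d_p, so protein tracks mRNA only to the extent that k_tl/d_p is similar across genes. The sketch integrates the model numerically and compares the result with that closed form. All rates are hypothetical.

```python
def simulate(k_tx, d_m, k_tl, d_p, t_end=200.0, dt=0.01):
    """Forward-Euler integration of dm/dt = k_tx - d_m*m and dp/dt = k_tl*m - d_p*p from m = p = 0."""
    m = p = 0.0
    for _ in range(int(round(t_end / dt))):
        dm = k_tx - d_m * m        # transcription minus mRNA degradation
        dp = k_tl * m - d_p * p    # translation minus protein degradation
        m += dm * dt
        p += dp * dt
    return m, p


def steady_state(k_tx, d_m, k_tl, d_p):
    """Closed-form steady state: m* = k_tx/d_m, p* = k_tl*m*/d_p."""
    m_star = k_tx / d_m
    return m_star, k_tl * m_star / d_p


# Two hypothetical genes with identical mRNA synthesis and decay but different translation/decay.
gene_a = dict(k_tx=2.0, d_m=0.2, k_tl=5.0, d_p=0.05)
gene_b = dict(k_tx=2.0, d_m=0.2, k_tl=1.0, d_p=0.10)
for gene in (gene_a, gene_b):
    # both genes: m* = 10; p* = 1000 for gene_a, 100 for gene_b
    print(steady_state(**gene), simulate(**gene))
```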



  • Multiple mechanisms regulate protein expression

    RNA (5’ cap, poly(A) tail AAA(A)n) -> translation -> Protein

    Translation: ribosome, internal entry sites, uORFs, nucleotide composition, codon usage, miRNA, RNA-binding protein

    Abreu, Molecular BioSystems, 2009, DOI: 10.1039/b908315d


  • Multiple mechanisms regulate protein expression

    RNA (5’ cap, poly(A) tail AAA(A)n) -> Protein -> protein degradation

    Protein degradation: ubiquitinylation, N-degrons, degradation signals, modifications, amino acid composition, structure

    [figure labels: Ub, PEST, NH2, K]

    Abreu, Molecular BioSystems, 2009, DOI: 10.1039/b908315d


  • Methods to study protein expression regulation

    DNA -> RNA -> Protein

    Microarrays, SAGE, RNA-seq

    Time course, poly(A)

    Polysomal profiling, ribosome footprinting

    Tagged proteins, shotgun proteomics (Lu, Nat Biotech, 2007 25(1); Braisted, BMC Bioinf, 2008 (8))

    Cycloheximide, pulse chase


  • Protein ~ mRNA across organisms

    Yeast: N = 2468, R2 = 0.58

    E. coli: N = 423, R2 = 0.47

    Human: N = 511, R2 = 0.22

    Abreu, Molecular BioSystems, 2009
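Protein-mRNA R2 values like these are commonly computed as the squared Pearson correlation of log-transformed abundances. That is an assumption about the analysis, since the slide does not state the transform. A minimal stdlib sketch with hypothetical abundances:

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_r2(mrna, protein):
    """Squared Pearson r of log10 mRNA vs. log10 protein; non-positive values are dropped."""
    pairs = [(math.log10(m), math.log10(p))
             for m, p in zip(mrna, protein) if m > 0 and p > 0]
    xs, ys = zip(*pairs)
    return pearson(xs, ys) ** 2


# Hypothetical abundances for five genes (arbitrary units).
mrna = [1.2, 15.0, 3.3, 80.0, 40.0]
protein = [30.0, 900.0, 20.0, 5000.0, 700.0]
print(round(log_r2(mrna, protein), 2))
```

Taking logs keeps a handful of very abundant genes from dominating the correlation, which matters because abundances span several orders of magnitude.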

  • Protein expression regulation in humans

    Human Daoy medulloblastoma cell lysate

    efficient translation, stable protein

    inefficient translation, unstable protein

  • Protein expression regulation in humans

    Human Daoy medulloblastoma cell lysate

    Protein variance explained by mRNA: 27%

    What explains the rest?


  • Translation and protein degradation regulation are encoded in sequence features

    RNA (5’ cap, AAA(A)n) -> Protein: translation and protein degradation regulation

    Sequence signatures: sequence length, uORFs, nucleotides, ALU, Kozak, codon usage, miRNA binding, polyadenylation sites, amino acids and properties, unstructuredness, degradation signals, secondary structure, etc.

    [figure labels: PEST, NH2, K, Ub]
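Of the sequence signatures listed, Kozak context is among the simplest to score. A minimal sketch, assuming the widely used rule of thumb for vertebrate mRNAs that a purine (A/G) at -3 and a G at +4 are the most important positions, with the A of AUG as +1. The three-level labels are a simplification and are not the feature encoding used in the study.

```python
def kozak_context(mrna, aug_index):
    """Classify the context of the AUG whose A sits at 0-based aug_index."""
    seq = mrna.upper().replace("U", "T")
    if seq[aug_index:aug_index + 3] != "ATG":
        raise ValueError("no AUG at aug_index")
    if aug_index < 3 or aug_index + 3 >= len(seq):
        raise ValueError("not enough flanking sequence")
    minus3 = seq[aug_index - 3] in "AG"  # purine at -3
    plus4 = seq[aug_index + 3] == "G"    # G at +4
    if minus3 and plus4:
        return "strong"
    if minus3 or plus4:
        return "adequate"
    return "weak"


print(kozak_context("GCCACCATGG", 6))  # A at -3 and G at +4 -> "strong"
```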


  • Sequence length correlates strongly (and inversely) with protein expression

    Spearman rank (protein vs. feature, controlling for mRNA): -0.53***, -0.19***, -0.10 (values shown along the 5’ to 3’ mRNA diagram)

    • ribosome fidelity [Ingolia, Science 2009 324(5924)]
    • protein folding [Drummond, PNAS 2005 102(40)]
    • miRNA [Sandberg, Science 2008 320(5883)]
    • alternative cleavage [Mayr, Cell 2009 138(4)]
    • secondary structures [e.g. Ringner, Plos CompBio 2005 1(7)]

    [figure labels: mRNA, protein, length]
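The correlations on this slide relate protein level to a sequence feature while controlling for mRNA level. One standard way to compute this is a partial Spearman correlation: rank-transform all three variables, then apply the first-order partial-correlation formula r_xy·z = (r_xy - r_xz·r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). Whether the study used exactly this procedure is an assumption, and the example data are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y controlling for z (partial correlation on ranks)."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data for six genes: x = length, y = protein, z = mRNA.
length = [300, 1200, 450, 2000, 800, 1500]
protein = [900, 150, 700, 60, 400, 120]
mrna = [50, 40, 45, 30, 35, 20]
print(partial_spearman(length, protein, mrna))
```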

  • Protein stability is a significant factor in regulating expression levels

    Spearman rank

    PEST regions: -0.37***
    Unstructuredness: -0.18*
    Protein Stability Index#: 0.09
    Ser, Glu, Leu (Polar amino acids): -0.24***
    Glycine: 0.17***
    Phosphorylation$: 0.06

    # Yen, Science 2008, 322(5903)
    $ http://www.phosphopep.org/

    [figure labels: mRNA, protein, length, stability, amino acids, Ub, PEST, NH2, K]


  • Translation initiation efficiency influences protein production (per mRNA)

    Spearman rank

    AUG and uORFs (5’UTR): -0.21***
    Secondary structures (5’UTR): -0.20***

    [figure labels: 5’, 3’, mRNA, protein, translation initiation]

    >Putative Transcription Factor ZNF462 5’UTR
    GGAGAGGGAGGGAGGGAGAGAGAGAGAGAGGGAGAGAGACGGATATCTCAGGTCATCTGC
    AGCTGCAGCGAGTCTGAGGAGCCGAGGAAGGCAGGGAAGATGGCGATCCTCCATTGCTG
    AGACCCGGCAGAAGCACATGAGACTCCCAAACAACTTCCACAACAATAACCCGAGCAGGAA
    GAGGAGAAAGAGAAAGAGGATAAGGAGGCGGTGGGGCTGGAGAACCCGAAGCACCTCCCG
    GCGCCGGGACGCTTCTTCTGTTCCTAATGTGAGAGGCTAGACCCAGATC
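A 5'UTR like the ZNF462 example above can be scanned for upstream AUGs (uAUGs) and upstream open reading frames (uORFs). A minimal sketch, assuming a uORF is defined as an upstream ATG followed by an in-frame stop codon (TAA, TAG, TGA) before the UTR ends. Definitions vary, for example in whether ORFs that run into the main coding sequence are counted, and this is not the exact feature definition used in the study.

```python
STOPS = {"TAA", "TAG", "TGA"}


def upstream_atgs(utr):
    """0-based positions of every ATG (AUG) in a 5'UTR given as DNA or RNA."""
    utr = utr.upper().replace("U", "T")
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG"]


def uorfs(utr):
    """(start, end) of ORFs that start at an upstream ATG and hit an in-frame stop inside the UTR.

    end is exclusive and includes the stop codon; ATGs with no in-frame stop
    before the UTR ends (ORFs running into the main coding sequence) are skipped."""
    utr = utr.upper().replace("U", "T")
    found = []
    for start in upstream_atgs(utr):
        for j in range(start + 3, len(utr) - 2, 3):
            if utr[j:j + 3] in STOPS:
                found.append((start, j + 3))
                break
    return found


example = "CCAUGAAAUAGCCAUGCCC"   # hypothetical 5'UTR in the RNA alphabet
print(upstream_atgs(example))    # [2, 13]
print(uorfs(example))            # [(2, 11)]; the ATG at 13 has no stop before the UTR ends
```

Applied to the ZNF462 5'UTR string above, `upstream_atgs` and `uorfs` give the per-transcript counts that a feature such as "AUG and uORFs (5’UTR)" summarizes.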


  • mRNA expression and sequence characteristics explain two-thirds of protein expression variation

    Protein variance explained:

    mRNA alone: 27%

    mRNA + length: 46%

    Combined contributions of mRNA, length, amino acids (and properties) and nucleotides (and structure); bar values 27, 19, 11, 5, …; total 67%
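The percentages on these slides are shares of protein variance explained as predictors are added to a model. A minimal sketch of that bookkeeping, assuming ordinary least squares on hypothetical, already log-transformed data. The study's actual model and transforms may differ. The sketch fits protein on mRNA alone, then on mRNA plus length, and compares the two R2 values.

```python
def ols_r2(columns, y):
    """R^2 of an ordinary-least-squares fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X^T X | X^T y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]  # ZeroDivisionError if predictors are collinear
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot


# Hypothetical log10 values for six genes.
log_mrna = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
log_len = [3.2, 2.9, 3.5, 3.0, 3.6, 3.1]
log_prot = [2.1, 3.0, 2.6, 3.9, 3.4, 4.6]
r2_m = ols_r2([log_mrna], log_prot)
r2_ml = ols_r2([log_mrna, log_len], log_prot)
print(f"mRNA alone: {r2_m:.2f}; mRNA + length: {r2_ml:.2f}; added by length: {r2_ml - r2_m:.2f}")
```

The gain from adding a predictor depends on the order in which predictors enter when they are correlated, so stepwise percentages such as 27% and then 46% describe one particular ordering.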

  • Summary and conclusions

    The protein vs. mRNA correlation varies widely across organisms.

    We can explain and predict ~2/3 of the variation in protein expression in a human cell system.

    We can use such models to:

    - Identify ‘hotspots’ of extreme translation and degradation regulation

    - Characterize human cell types

    - Understand the relationship between transcription, translation and degradation

  • Acknowledgments

    Collaborators and co-authors:

    Edward Marcotte, Dan Boutz (UT Austin, TX)

    Luiz Penalva, Raquel de Sousa Abreu, Daijin Ko, Devraj Sandhu (UT San Antonio, TX)

    Dan Miranker, Smriti Ramakrishnan (Computer Science, UT Austin, TX)

    John Braisted, Srilatha Kuntumalla, Rembert Pieper (JC Venter Institute, DC)

    Bruce A. Shapiro, Shu-Yun Le (National Cancer Institute)

    Funding:

  • Synthetic Gene Design for Heterologous Expression

    Mark Welch, DNA2.0, Inc.

    October 28, 2009

  • Navigating Gene Design Space

    Design objectives: Max CAI, Match Host Bias, Codon Pair Bias, mRNA Structure, Target GC%, Min Rare Codons, Min RNaseE, Min SD-like, Remove Splice, Min polyA Sites, Harmonize Codons, ?

    ~10^100 codings for a 30 kDa protein!
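The ~10^100 figure comes from multiplying, over every residue, the number of synonymous codons for that amino acid. A minimal sketch using the standard genetic code's codon counts. The example protein is a hypothetical 260-residue sequence with all 20 amino acids in equal proportion (about 29 kDa at ~110 Da per residue), so the exact exponent is illustrative only.

```python
import math

# Synonymous codon counts per amino acid in the standard genetic code (61 sense codons).
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def log10_codings(protein):
    """log10 of the number of distinct coding sequences for `protein` (stop codon choice ignored)."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein.upper())


hypothetical_protein = "ACDEFGHIKLMNPQRSTVWY" * 13  # 260 residues
print(round(log10_codings(hypothetical_protein), 1))  # ~110.9, i.e. ~10^111 codings
```

Working in log10 avoids handling a 111-digit integer directly and makes it clear that each Leu, Ser or Arg adds about 0.78 orders of magnitude, while Met and Trp add none.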

  • Interrogating E. coli Preferences (NSF SBIR Funded Study)

    • Only synonymous codon usage varied

    • Two different genes studied: a DNA polymerase and an scFv

    In silico design (DoE) -> gene synthesis -> express in E. coli (pET, BL21)

  • scFv Gene Variant Set

    [figure: expression (% cell mass, 0 to 30%) of scFv variants 1 to 24; labels: increased bias, decreased bias]

    Variables (columns: scFv variants 1 to 24)

    Codon frequencies
    Codon 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
    GCA 0.025 0.011 0.014 0.007 0.018 0 0.025 0.018 0.011 0.011 0.021 0.014 0.014 0.021 0 0.014 0.007 0.025 0.021 0.028 0.007 0.018 0.007 0.014
    GCC 0.021 0.021 0 0.021 0.011 0 0.007 0.011 0 0.007 0.007 0.007 0.011 0.014 0 0.014 0.007 0.011 0.007 0 0.014 0.018 0.007 0.004
    GCG 0.018 0.025 0.028 0.018 0.014 0.028 0.032 0.032 0.021 0.021 0.036 0.028 0.028 0.028 0.021 0.025 0.032 0.011 0.014 0.025 0.032 0.018 0.028 0.032
    GCT 0 0.007 0.021 0.018 0.021 0.036 0 0.004 0.032 0.025 0 0.014 0.011 0 0.043 0.011 0.018 0.018 0.021 0.011 0.011 0.011 0.021 0.014
    AGA 0 0 0 0 0 0 0 0 0 0 0 0 0.018 0 0 0 0 0 0 0 0 0 0 0
    CGA 0 0.007 0 0 0 0 0 0.007 0 0 0 0.014 0 0 0 0 0 0 0 0 0 0 0 0
    CGC 0.014 0.007 0.021 0.014 0.007 0.014 0.018 0.007 0.018 0.018 0.021 0.011 0.018 0.021 0.021 0.011 0.007 0.011 0.018 0.025 0.028 0.014 0.018 0.018
    CGG 0 0.004 0 0 0 0 0 0.014 0 0 0 0 0.004 0 0 0 0 0 0 0 0 0.011 0 0
    CGT 0.032 0.028 0.025 0.032 0.039 0.032 0.028 0.018 0.028 0.028 0.025 0.021 0.007 0.025 0.025 0.036 0.039 0.036 0.028 0.021 0.018 0.021 0.028 0.028
    AAC 0.011 0.007 0.021 0.018 0.018 0.021 0.007 0.011 0.021 0.011 0.011 0.011 0.014 0.014 0.021 0.014 0.014 0.011 0.018 0.021 0.011 0.014 0.021 0.011
    AAT 0.011 0.014 0 0.004 0.004 0 0.014 0.011 0 0.011 0.011 0.011 0.007 0.007 0 0.007 0.007 0.011 0.004 0 0.011 0.007 0 0.011
    GAC 0.028 0.028 0.028 0.028 0.018 0.011 0.018 0.028 0.046 0.036 0.021 0.018 0.025 0.021 0.032 0.032 0.036 0.036 0.036 0.032 0.043 0.032 0.028 0.028
    GAT 0.032 0.032 0.032 0.032 0.043 0.05 0.043 0.032 0.014 0.025 0.039 0.043 0.036 0.039 0.028 0.028 0.025 0.025 0.025 0.028 0.018 0.028 0.032 0.032
    TGC 0.007 0 0.011 0.007 0.004 0.007 0.004 0.007 0.007 0.004 0.007 0.014 0.007 0.014 0.011 0.011 0 0.007 0.007 0.007 0.004 0.011 0.007 0.007
    TGT 0.007 0.014 0.004 0.007 0.011 0.007 0.011 0.007 0.007 0.011 0.007 0 0.007 0 0.004 0.004 0.014 0.007 0.007 0.007 0.011 0.004 0.007 0.007
    CAA 0.021 0.018 0.004 0.018 0.021 0.004 0.028 0.032 0.004 0.021 0.032 0.039 0.021 0.021 0.004 0.014 0.004 0.018 0.007 0.004 0.028 0.028 0.014 0.011
    CAG 0.028 0.032 0.046 0.032 0.028 0.046 0.021 0.018 0.046 0.028 0.018 0.011 0.028 0.028 0.046 0.036 0.046 0.032 0.043 0.046 0.021 0.021 0.036 0.039
    GAA 0.011 0.011 0.011 0.014 0.014 0.014 0.014 0.021 0.014 0.014 0.011 0.014 0.021 0.014 0.021 0.018 0.014 0.011 0.018 0.014 0.007 0.007 0.014 0.011
    GAG 0.011 0.011 0.011 0.007 0.007 0.007 0.007 0 0.007 0.007 0.011 0.007 0 0.007 0 0.004 0.007 0.011 0.004 0.007 0.014 0.014 0.007 0.011
    GGA 0 0.018 0 0.007 0 0 0 0.011 0 0.032 0 0.011 0.018 0 0 0.004 0 0 0 0 0 0.021 0 0
    GGC 0.039 0.039 0.046 0.039 0.046 0.064 0.064 0.043 0.046 0.032 0.071 0.032 0.053 0.05 0.046 0.046 0.043 0.053 0.068 0.036 0.021 0.036 0.06 0.064
    GGG 0 0.014 0 0.021 0.004 0 0 0.014 0 0.021 0 0.021 0.011 0 0 0 0 0 0 0 0 0.014 0 0
    GGT 0.078 0.046 0.071 0.05 0.068 0.053 0.053 0.05 0.071 0.032 0.046 0.053 0.036 0.068 0.071 0.068 0.075 0.064 0.05 0.082 0.096 0.046 0.057 0.053
    CAC 0.025 0.011 0.014 0.011 0.018 0.018 0.018 0.014 0.011 0.021 0.014 0.014 0.014 0.018 0.021 0.018 0.021 0.025 0.028 0.011 0.014 0.011 0.025 0.011
    CAT 0.007 0.021 0.018 0.021 0.014 0.014 0.014 0.018 0.021 0.011 0.018 0.018 0.018 0.014 0.011 0.014 0.011 0.007 0.004 0.021 0.018 0.021 0.007 0.021
    ATA 0 0.007 0 0 0 0 0 0.004 0 0 0 0.007 0 0 0 0 0 0 0 0 0 0.004 0 0
    ATC 0.021 0.018 0.028 0.021 0.028 0.036 0.018 0.021 0.028 0.036 0.032 0.018 0.025 0.025 0.025 0.039 0.021 0.036 0.028 0.021 0.014 0.011 0.032 0.032
    ATT 0.021 0.018 0.014 0.021 0.014 0.007 0.025 0.018 0.014 0.007 0.011 0.018 0.018 0.018 0.018 0.004 0.021 0.007 0.014 0.021 0.028 0.028 0.011 0.011
    CTA 0 0 0 0 0 0 0 0.004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    CTC 0 0.007 0 0.011 0.004 0 0 0.007 0 0.014 0 0.011 0.007 0 0 0.004 0 0 0 0 0 0.018 0 0
    CTG 0.071 0.028 0.071 0.036 0.039 0.071 0.071 0.043 0.071 0.043 0.071 0.007 0.021 0.071 0.071 0.053 0.071 0.071 0.071 0.071 0.071 0.014 0.071 0.071
    CTT 0 0.007 0 0.004 0.011 0 0 0.004 0 0.004 0 0.025 0.014 0 0 0.011 0 0 0 0 0 0.011 0 0
    TTA 0 0.011 0 0.007 0.007 0 0 0.007 0 0.007 0 0.018 0.007 0 0 0 0 0 0 0 0 0.011 0 0
    TTG 0.004 0.021 0.004 0.018 0.014 0.004 0.004 0.011 0.004 0.007 0.004 0.014 0.025 0.004 0.004 0.007 0.004 0.004 0.004 0.004 0.004 0.021 0.004 0.004
    AAA 0.021 0.021 0.025 0.025 0.032 0.039 0.028 0.032 0.025 0.028 0.025 0.028 0.036 0.032 0.039 0.025 0.036 0.032 0.028 0.021 0.018 0.018 0.025 0.036
    AAG 0.018 0.018 0.014 0.014 0.007 0 0.011 0.007 0.014 0.011 0.014 0.011 0.004 0.007 0 0.014 0.004 0.007 0.011 0.018 0.021 0.021 0.014 0.004
    TTC 0.021 0.021 0.018 0.018 0.028 0.025 0.004 0.018 0.025 0.025 0.018 0.011 0.011 0.021 0.025 0.021 0.014 0.018 0.025 0.011 0.021 0.018 0.032 0.028
    TTT 0.011 0.011 0.014 0.014 0.004 0.007 0.028 0.014 0.007 0.007 0.014 0.021 0.021 0.011 0.007 0.011 0.018 0.014 0.007 0.021 0.011 0.014 0 0.004
    CCA 0 0.011 0 0.007 0.007 0 0 0.014 0 0.007 0 0.007 0.011 0 0 0 0.007 0.014 0.004 0 0.011 0.011 0.018 0.007
    CCC 0 0.004 0 0 0 0 0 0 0 0 0 0.014 0.007 0 0 0 0 0 0 0 0 0.011 0 0
    CCG 0.036 0.011 0.036 0.021 0.025 0.036 0.036 0.018 0.036 0.021 0.036 0.007 0.011 0.036 0.036 0.036 0.025 0.018 0.028 0.036 0.007 0.007 0.018 0.025
    CCT 0 0.011 0 0.007 0.004 0 0 0.004 0 0.007 0 0.007 0.007 0 0 0 0.004 0.004 0.004 0 0.018 0.007 0 0.004
    AGC 0.121 0.025 0.039 0.018 0.039 0.004 0.121 0.028 0.028 0.039 0.121 0.032 0.032 0.121 0.004 0.021 0.036 0.028 0.032 0.043 0.036 0.032 0.053 0.032
    AGT 0 0.014 0 0.025 0.004 0 0 0.018 0 0.007 0 0.018 0.021 0 0 0.004 0 0 0 0 0 0.018 0 0
    TCA 0.004 0.025 0.004 0.011 0.007 0.004 0.004 0.028 0.004 0.025 0.004 0.021 0.021 0.004 0.004 0.007 0.004 0.004 0.004 0.004 0.004 0.014 0.004 0.004
    TCC 0 0.011 0.05 0.032 0.021 0.05 0 0.025 0.046 0.028 0 0.021 0.021 0 0.06 0.039 0.036 0.039 0.05 0.039 0.05 0.021 0.025 0.05
    TCG 0 0.025 0 0.014 0.004 0 0 0.004 0 0.011 0 0.021 0.004 0 0 0.014 0 0 0 0 0 0.021 0 0
    TCT 0 0.025 0.032 0.025 0.05 0.068 0 0.021 0.046 0.014 0 0.011 0.025 0 0.057 0.039 0.05 0.053 0.039 0.039 0.036 0.018 0.043 0.039
    ACA 0 0.025 0 0.011 0 0 0 0.018 0 0.004 0 0.014 0.014 0 0 0.007 0 0 0 0 0 0.028 0 0
    ACC 0.039 0.032 0.043 0.028 0.05 0.057 0.05 0.018 0.057 0.036 0.043 0.021 0.028 0.046 0.043 0.032 0.05 0.043 0.05 0.043 0.018 0.007 0.043 0.046
    ACG 0.043 0.018 0.004 0.025 0.007 0.004 0.032 0.039 0.004 0.021 0.039 0.036 0.028 0.036 0.004 0.011 0.007 0.021 0.014 0.004 0.036 0.021 0.011 0.004
    ACT 0 0.007 0.036 0.018 0.025 0.021 0 0.007 0.021 0.021 0 0.011 0.011 0 0.036 0.032 0.025 0.018 0.018 0.036 0.028 0.025 0.028 0.032
    TAC 0.036 0.036 0.025 0.046 0.046 0.043 0.043 0.014 0.028 0.036 0.039 0.025 0.025 0.021 0.039 0.043 0.043 0.039 0.028 0.025 0.039 0.025 0.053 0.06
    TAT 0.028 0.028 0.039 0.018 0.018 0.021 0.021 0.05 0.036 0.028 0.025 0.039 0.039 0.043 0.025 0.021 0.021 0.025 0.036 0.039 0.025 0.039 0.011 0.004
    GTA 0 0.007 0.014 0.011 0.014 0 0 0 0.011 0.004 0 0.014 0.007 0 0 0.018 0 0.007 0.007 0.014 0.014 0.007 0.004 0
    GTC 0.021 0.018 0 0.007 0.004 0 0.014 0.021 0 0 0.011 0.004 0 0.014 0 0 0 0.007 0.007 0 0.014 0.014 0.004 0.011
    GTG 0.011 0.011 0.011 0.014 0.007 0.025 0.018 0.011 0.018 0.025 0.018 0.014 0.025 0.014 0.021 0.018 0.011 0.011 0.011 0.011 0.007 0.018 0.025 0.021
    GTT 0.011 0.007 0.018 0.011 0.018 0.018 0.011 0.011 0.014 0.014 0.014 0.011 0.011 0.014 0.021 0.007 0.032 0.018 0.018 0.018 0.007 0.004 0.011 0.011

    Other features
    GC% 57.4 51.7 54.9 53.3 52.0 54.9 56.5 51.6 55.4 54.2 58.0 49.7 50.7 57.1 54.3 54.6 53.7 54.2 56.0 53.7 54.0 51.5 56.7 56.3
    5' AT 0.400 0.067 0.200 0.667 0.667 0.067 0.400 0.133 0.400 0.200 0.067 0.800 0.733 0.067 0.400 0.133 0.400 0.333 0.267 0.200 0.333 0.400 0.333 0.267
    CAI 0.725 0.460 0.808 0.567 0.712 0.868 0.723 0.487 0.828 0.563 0.725 0.417 0.478 0.747 0.863 0.722 0.822 0.768 0.805 0.793 0.677 0.421 0.800 0.803
    Rare Codons 2 5 3 2 3 6 1 5 2 2 4 2 2 3 7 2 5 9 4 8 2 2 1 6 2 2 2 2 2 5 7 2 2
    GC Clusters 2 3 1 2 1 4 2 5 1 3 1 0 0 1 1 4 1 4 1 3 1 6 1 0 8 7 9 2 0 2 8 2 0
    AT Clusters 5 1 1 1 1 8 1 0 4 5 2 1 1 0 9 8 1 3 8 1 3 6 7 5 4 1 3 1 2 6 1 0 3 2
    mRNA Structure -11.8 -9.6 -10.8 -10.2 -9.1 -10.0 -11.7 -9.2 -10.4 -10.7 -12.1 -9.4 -9.6 -12.2 -9.5 -10.4 -10.5 -10.2 -10.7 -10.6 -10.4 -9.1 -10.9 -10.3
    5' RNA Structure -13.0 -15.0 -12.7 -11.6 -12.2 -12.9 -13.3 -10.1 -12.3 -11.5 -14.5 -10.0 -10.0 -16.2 -12.1 -17.8 -10.9 -13.6 -14.6 -12.8 -11.8 -10.0 -13.3 -11.9
    RNaseE Sites 7.0 6.0 3.0 7.0 7.0 6.0 10.0 5.0 3.0 5.0 6.0 7.0 7.0 6.0 5.0 4.0 5.0 2.0 3.0 5.0 6.0 7.0 1.0 3.0
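The codon-frequency rows of this table can be derived from each variant's coding sequence. A minimal sketch, assuming the frequencies are expressed as a fraction of all codons in the gene. The slide does not state the denominator, and the example fragment is hypothetical.

```python
from collections import Counter


def codon_frequencies(cds):
    """Fraction of all complete in-frame codons in cds that each codon accounts for."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    total = len(codons)
    return {codon: n / total for codon, n in Counter(codons).items()}


# Hypothetical fragment: Ala-Ala-Ala-Ser (GCA GCA GCG AGC).
freqs = codon_frequencies("GCAGCAGCGAGC")
print({c: round(f, 3) for c, f in sorted(freqs.items())})  # {'AGC': 0.25, 'GCA': 0.5, 'GCG': 0.25}
```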

  • Protein Expression not Correlated to CAI or 5’ mRNA Structure

    [figure: relative expression (0 to 3.5) vs. 5’ mRNA structure free energy (kcal/mol, -11 to -1; window -4 to +37); R²(scFv) = 0.1003, R²(Pol) = 0.0109]

    [figure: relative expression (0 to 3.5) vs. CAI (0.3 to 1.0); R²(scFv) = 0.0013, R²(Pol) = 0.0029]

    Series: polymerase, scFv

  • Variant Hybrids: Distributed Coding Effects – Polymerase Hybrid Set #1

    Variant         Relative expression
    15              7 ± 1
    Hyb 15-15-19    17 ± 6
    Hyb 15-19-15    7 ± 1
    Hyb 19-15-15    10 ± 1
    Hyb 15-19-19    34 ± 2
    Hyb 19-15-19    35 ± 4
    Hyb 19-19-15    25 ± 3
    19              100 ± 22

    [figure: hybrid segment map; axis marks 75, 325, 575]

    • Search for any local effects (deleterious motifs, etc.)
    • Further diversification along useful trajectories
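The hybrid names appear to list which parent variant (15 or 19) supplied each of three consecutive segments of the gene. That reading is an interpretation of the labels, not something the slide spells out. A minimal sketch of assembling such chimeras, with hypothetical parent sequences and hypothetical cut positions:

```python
def make_hybrid(parents, name, boundaries):
    """Assemble a chimera such as '15-19-19' from equal-length parent sequences.

    parents: dict mapping variant name -> coding sequence (hypothetical inputs)
    boundaries: 0-based cut positions between segments (hypothetical; not from the slide)"""
    labels = name.split("-")
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more segment label than boundaries")
    cuts = [0] + list(boundaries) + [len(parents[labels[0]])]
    return "".join(parents[label][cuts[i]:cuts[i + 1]] for i, label in enumerate(labels))


parents = {"15": "AAAAAA", "19": "CCCCCC"}        # toy stand-ins for the two parent genes
print(make_hybrid(parents, "15-19-19", [2, 4]))   # AACCCC
```

Comparing hybrids that differ in a single segment then shows whether an expression difference maps to one region or is spread along the gene.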

  • Multivariate Regression Analysis of Codon Usage

    • Partial least squares regression (PLS) used to identify relationships between codon use and expression

    • Input: individual codon frequencies in genes

    • Output: the set of per-codon weights that best predicts expression

    Other directions of codon bias?

    [figure: expression vs. CAI and vs. an alternative codon-usage axis X]
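PLS relates many correlated inputs, such as codon frequencies, to a single output, such as expression, through a few latent components. A minimal pure-Python sketch of single-response PLS (PLS1) using the textbook NIPALS algorithm. Predictions apply the same deflation steps to a new sample, and per-input weights are read off from the fitted linear model. This is a generic formulation, not DNA2.0's implementation, and the data are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS. X: list of rows (e.g. codon frequencies); y: responses."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xk = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered, then deflated
    yk = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        w = [sum(Xk[i][j] * yk[i] for i in range(n)) for j in range(p)]  # covariance direction
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in Xk]                                 # scores
        tt = _dot(t, t)
        loading = [sum(Xk[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = _dot(yk, t) / tt                                             # y regressed on scores
        Xk = [[Xk[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        yk = [yk[i] - q * t[i] for i in range(n)]
        components.append((w, loading, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def pls1_predict(model, x):
    """Apply the fitted deflation steps to a new sample and sum the score contributions."""
    xk = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, loading, q in model["components"]:
        t = _dot(xk, w)
        y_hat += q * t
        xk = [a - t * b for a, b in zip(xk, loading)]
    return y_hat


def pls1_coefficients(model):
    """Per-input weights of the (linear) fitted model: effect of a unit increase in each input."""
    base = model["x_mean"]
    return [pls1_predict(model, [m + (1.0 if j == i else 0.0) for j, m in enumerate(base)])
            - model["y_mean"] for i in range(len(base))]


# Hypothetical data: 2 inputs, response exactly 2*x1 - x2 + 3.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [2 * a - b + 3 for a, b in X]
model = pls1_fit(X, y, n_components=2)
print(pls1_coefficients(model))  # close to [2.0, -1.0]
```

With as many components as independent inputs, PLS1 reproduces ordinary least squares. Its usefulness comes from keeping fewer components when inputs outnumber or strongly correlate with each other, as codon frequencies do.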

  • Multivariate Analysis of Expression: Combined Model

    • Validated with random subset cross-validation (20% left out)

    New designs

    [figure panels: polymerase, scFv]
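Random-subset cross-validation repeatedly holds out a fraction of the variants (20% here), fits the model on the rest, and scores its predictions on the held-out variants. A minimal sketch, assuming the summary statistic is a cross-validated R2 of the form 1 - PRESS / SS_tot, with the training mean as the baseline. The slide does not define the statistic. `fit_line` and `predict_line` are hypothetical stand-ins for the actual model.

```python
import random


def random_subset_cv(xs, ys, fit, predict, holdout=0.2, repeats=100, seed=0):
    """Repeated random hold-out; returns 1 - PRESS / SS_tot over all held-out predictions.

    SS_tot uses the training-set mean as the baseline prediction (an assumed convention)."""
    rng = random.Random(seed)
    n = len(xs)
    k = max(1, round(holdout * n))
    press = ss_tot = 0.0
    for _ in range(repeats):
        test = set(rng.sample(range(n), k))
        train = [i for i in range(n) if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        train_mean = sum(ys[i] for i in train) / len(train)
        for i in test:
            press += (ys[i] - predict(model, xs[i])) ** 2
            ss_tot += (ys[i] - train_mean) ** 2
    return 1 - press / ss_tot


def fit_line(x, y):
    """Hypothetical stand-in model: least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b


def predict_line(model, x):
    a, b = model
    return a + b * x


xs = list(range(10))
ys = [1.0, 2.9, 5.2, 7.1, 8.8, 11.2, 13.0, 14.9, 17.1, 19.0]  # hypothetical, roughly linear
print(round(random_subset_cv(xs, ys, fit_line, predict_line), 3))
```

Because every prediction is made for data the model did not see, the cross-validated R2 is lower than the training R2 whenever the model overfits. That is why both values are reported for the yeast PLS model.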

  • Preference for ‘Reserve’ tRNAs?

    AA    %AA, scFv    %AA, E. coli    Codon    Preference ratio    tRNA sensitivity[1]
    Ser   12.5         4.7             AGC      2.1                 3.4
                                       AGU      0.0                 3.4
                                       UCA      0.4                 7.5
                                       UCC      0.6                 35.5
                                       UCG      0.6                 4.4
                                       UCU      0.4                 7.9
    Thr   8.2          5.4             ACA      0.0                 5.7
                                       ACC      0.9                 20.9
                                       ACG      2.4                 2.4
                                       ACU      0.5                 6.6

    [1] Elf, et al (2003) Science, 300:1718

    tRNA sensitivity dependent on Fci[tRNAi]

  • Gene Variant Set for Yeast with Dr. Robert Stroud, UCSF

    • Human membrane protein expressed in S. cerevisiae
    • Total protein in membrane fraction analyzed
    • WT gene shows no detectable expression
    • Top expression level ~1 mg/L

    Design variables (rows) for gene variants 1 to 33 (columns):
    1 = increased bias (use more frequent codons); 0 = host bias; -1 = decreased bias (use more infrequent codons)

    Variable 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
    NEHKY 0 1 0 -1 0 -1 0 1 0 -1 0 1 0 1 0 -1 0 1 0 -1 0 -1 0 1 0 -1 0 1 0 1 0 -1 1
    A 0 0 -1 -1 0 0 -1 -1 0 0 -1 -1 0 0 -1 -1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1
    R 0 1 -1 0 0 1 -1 0 0 1 -1 0 0 1 -1 0 0 -1 1 0 0 -1 1 0 0 -1 1 0 0 -1 1 0 1
    D 0 0 0 0 -1 -1 1 1 0 0 0 0 -1 -1 1 1 0 0 0 0 1 1 -1 -1 0 0 0 0 1 1 -1 -1 1
    C 0 1 0 -1 -1 0 1 0 0 1 0 -1 -1 0 1 0 0 -1 0 1 1 0 -1 0 0 -1 0 1 1 0 -1 0 1
    Q 0 0 -1 -1 1 1 0 0 0 0 -1 -1 1 1 0 0 0 0 1 1 -1 -1 0 0 0 0 1 1 -1 -1 0 0 1
    G 0 1 -1 0 1 0 0 -1 0 1 -1 0 1 0 0 -1 0 -1 1 0 -1 0 0 1 0 -1 1 0 -1 0 0 1 1
    I 0 0 0 0 0 0 0 0 -1 -1 1 1 1 1 -1 -1 0 0 0 0 0 0 0 0 1 1 -1 -1 -1 -1 1 1 1
    L 0 1 0 -1 0 -1 0 1 -1 0 1 0 1 0 -1 0 0 -1 0 1 0 1 0 -1 1 0 -1 0 -1 0 1 0 1
    F 0 0 -1 -1 0 0 -1 -1 1 1 0 0 1 1 0 0 0 0 1 1 0 0 1 1 -1 -1 0 0 -1 -1 0 0 1
    P 0 1 -1 0 0 1 -1 0 1 0 0 -1 1 0 0 -1 0 -1 1 0 0 -1 1 0 -1 0 0 1 -1 0 0 1 1
    S 0 0 0 0 -1 -1 1 1 1 1 -1 -1 0 0 0 0 0 0 0 0 1 1 -1 -1 -1 -1 1 1 0 0 0 0 1
    T 0 1 0 -1 -1 0 1 0 1 0 -1 0 0 -1 0 1 0 -1 0 1 1 0 -1 0 -1 0 1 0 0 1 0 -1 1
    V 0 0 -1 -1 1 1 0 0 1 1 0 0 0 0 -1 -1 0 0 1 1 -1 -1 0 0 -1 -1 0 0 0 0 1 1 1
    5'AT 1 -1 1 -1 -1 1 -1 1 -1 1 -1 1 1 -1 1 -1 -1 1 -1 1 1 -1 1 -1 1 -1 1 -1 -1 1 -1 1 1
    RCO 1 1 -1 -1 -1 -1 1 1 -1 -1 1 1 1 1 -1 -1 -1 -1 1 1 1 1 -1 -1 1 1 -1 -1 -1 -1 1 1 1

    [figure: protein expression of the gene variants; labels: yeast bias, high frequency cut-off; yeast bias, low frequency cut-off; high frequency codon biased]

  • PLS Model of Yeast Expression Data

    R2 = 0.891; R2 (CV) = 0.809

  • Conclusions, Ongoing Work

    • Systematic gene diversification is useful for identifying coding parameters relevant to expression

    • Heterologous gene expression is correlated with codon usage

    • Codon preferences may reflect tRNA sensitivity to over-consumption

    • Several ongoing studies with academic and industry collaborators: E. coli, plants, yeasts, fungi, mammalian cells, trypanosomes, cell-free systems and more

  • Look out for more webinars in the series at:

    www.sciencemag.org/webinar

    For related information on this webinar topic, go to:

    www.optimizedgene.com

    To provide feedback on this webinar, please e-mail your comments to [email protected]
