Tom Knight Ginkgo Bioworks Life with four billion atoms


Energy in the 1800s

"The steam engine has given more to science than science has given to the steam engine." (Lord Kelvin)

Information in the 1900s

Figure: catabolism breaks natural complexity (food) down into a core of simple universal parts (central metabolites) and energy carriers; anabolism builds specified complexity (organisms) from that core, directed by design information (the genome).

Harold Morowitz 1962 & 1984

Some history…

• Confusion over what PPLO/Mycoplasma were: "The microbe of pleuropneumonia" (Nocard 1896)
• 1932 isolation of "PPLO"; Koch's postulates
• 1958 Klieneberger-Nobel: free-living bacterial species
• 1962 Morowitz, Scientific American: "the smallest living cell"
• 1980 Gilbert effort to sequence M. capricolum
• 1982 Morowitz: "complete understanding of life"
• 1995 Fraser et al.: M. genitalium sequence
• 1999 Hutchison et al.: minimal genome set for M. genitalium
• 2009 Gibson & Lartigue: genome transplantation
• 2012 Serrano et al.: comprehensive model

Complexity, minimality & simplicity

• Complex systems have many parts
• Reducing the part count generally leads to a simpler system (minimal part count)
• Extreme part-count reduction leads to shared parts; these systems are, ironically, less simple
• There is an optimal part count for modular design
• Conservation of complexity
• Stratified design

Structure / function

Complexity Reduction

User
Application software
Operating system, user interface (100's of OS calls)
Programming language (100 statements)
Instruction set architecture (100's of instructions)
Virtual machine
Computer hardware design
Functional computing units (10's of units)
Logic synthesis
Logic gates (10's of gate types)
Circuit design
Transistors (4 types of transistors)
Mask geometry (15 mask layers)
Fabrication technologies (6 materials)
Semiconductor physics
Quantum physics

Complexity Reduction

• Good News:

Biology is modular and abstract

Evolution needs modular design as much as we do

We can discover the modular designs, modify them, and use them

Engineered Simple Organisms
• modular
• understood
• malleable
• low complexity

• Start with a simple existing organism
• Remove structure until failure
• Rationalize the infrastructure
• Learn new biology along the way

The chassis and power supply for our computing

Relative Complexity

Figure: log genome size in base pairs, on a scale from 100 to 100G. Labeled entries: Gene; Plasmid; T7 phage (36 kb); Mycoplasma genitalium (580 kb); Mesoplasma florum (793 kb); E. coli (4.6 Mb); S. cerevisiae (12 Mb); Human (3.3 Gb); Lily (12 Gb). Also labeled: Alive, Autotroph.

Choosing an organism: Mesoplasma florum

• Isolated from the flower of a lemon tree, Florida (McCoy84)
• Safe BSL-1 organism: an insect commensal
• Not a human, plant, or animal pathogen
• No growth at 37 °C
• Fast growing: 40 minute doubling time vs. six hours for M. genitalium
• Convenient to work with: facultative anaerobe
• Small genome: 793,244 bp, 682 coding regions
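The doubling-time advantage is easier to appreciate as exponential-growth arithmetic. The sketch below is a minimal illustration, assuming unlimited exponential growth from a single cell with no lag phase or nutrient limit, which real cultures do not sustain. The 24-hour window is an arbitrary choice for illustration.

```python
def doublings(hours: float, doubling_minutes: float) -> float:
    """Number of doublings in `hours`, assuming constant exponential growth."""
    return hours * 60 / doubling_minutes


def cells_after(hours: float, doubling_minutes: float, start: int = 1) -> float:
    """Population after `hours` from `start` cells (no lag phase, no limits)."""
    return start * 2 ** doublings(hours, doubling_minutes)


# Doubling times from the slide: 40 minutes (Me. florum) vs. six hours (M. genitalium)
for name, t_d in [("Me. florum", 40), ("M. genitalium", 6 * 60)]:
    print(f"{name}: {doublings(24, t_d):.0f} doublings, "
          f"{cells_after(24, t_d):.3g} cells after 24 h")
```

Under these assumptions, a single Me. florum cell could in principle reach about 6.9e10 descendants in a day (36 doublings), while M. genitalium manages only 4 doublings, or 16 cells.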

Tomographic EM of Me. florum

Grant Jensen – Caltech

3-D TEM image of Mesoplasma florum, reconstructed from angled TEM images

300 x 400 nm; 6 nm membranes; 5 nm ribosomes; false-colored DNA

How many atoms?

• Cell diameter is about 400 nm
• Approximately 2000 atoms in diameter
• About four billion atoms
  70% of these are in water molecules
  1.2 billion atoms in biomolecules
  DNA is about 40 million atoms, 3%
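The numbers on this slide follow from a back-of-the-envelope estimate. The sketch below treats the cell as a sphere packed with atoms about 0.2 nm apart. That spacing is an assumption implied by "400 nm is about 2000 atoms across", and the model ignores membrane structure and packing details. It also shows that the 3% DNA figure is relative to the biomolecule atoms, not to the total.

```python
import math

# Figures from the slide; the 0.2 nm per-atom spacing is an assumption implied
# by "400 nm is approximately 2000 atoms in diameter".
CELL_DIAMETER_NM = 400
ATOM_SPACING_NM = 0.2
WATER_FRACTION = 0.70      # 70% of atoms are in water molecules
DNA_ATOMS = 40e6           # about 40 million atoms of DNA

atoms_across = CELL_DIAMETER_NM / ATOM_SPACING_NM
# Sphere of diameter d (in atoms) holds roughly (pi / 6) * d**3 atoms.
total_atoms = math.pi / 6 * atoms_across ** 3
biomolecule_atoms = total_atoms * (1 - WATER_FRACTION)
dna_share = DNA_ATOMS / biomolecule_atoms

print(f"atoms across:    {atoms_across:.0f}")
print(f"total atoms:     {total_atoms:.2e}")
print(f"in biomolecules: {biomolecule_atoms:.2e}")
print(f"DNA share:       {dna_share:.1%}")
```

The printed values come out near 2000 atoms across, 4.2 billion atoms in total, 1.26 billion in biomolecules and a 3.2% DNA share. These agree with the slide's rounded figures.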

Genome characteristics
• 793,281 base pairs
• 26.52% G + C
• 682 protein coding regions
  UGA for tryptophan
  No CGG codon or corresponding tRNA
  Classic circular genome: oriC, terminator region, gene orientation
• 39 stable RNAs
  29 tRNAs
  2x 16S, 23S, 5S rRNA
  RNase P, tmRNA, SRP
• One inactive insertion sequence
• Gene direction largely oriented with the replication fork
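Two of these features, UGA read as tryptophan and the absence of CGG codons, are simple to check computationally. The sketch below builds codon tables in the order NCBI uses for its genetic-code tables. NCBI translation table 4 (Mycoplasma/Spiroplasma) differs from the standard code only in reading TGA as tryptophan. The functions compute G + C content, translate an ORF and list in-frame CGG codons. The example ORF is hypothetical, not taken from the M. florum genome.

```python
from itertools import product

BASES = "TCAG"
# Amino acids in NCBI codon order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

STANDARD_CODE = dict(zip(CODONS, STANDARD_AA))
# Table 4 (Mycoplasma/Spiroplasma): identical except TGA encodes tryptophan.
MYCOPLASMA_CODE = {**STANDARD_CODE, "TGA": "W"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def codons(orf: str) -> list:
    """Split an ORF into in-frame codons (trailing bases ignored)."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]


def translate(orf: str, code: dict) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognized codon."""
    return "".join(code.get(c, "X") for c in codons(orf))


def cgg_positions(orf: str) -> list:
    """Codon indices of in-frame CGG (no M. florum tRNA decodes it)."""
    return [i for i, c in enumerate(codons(orf)) if c == "CGG"]


orf = "ATGTGGTGAAAACGGTAA"               # hypothetical example ORF
print(translate(orf, STANDARD_CODE))     # MW*KR*
print(translate(orf, MYCOPLASMA_CODE))   # MWWKR*
print(cgg_positions(orf))                # [4]
print(f"{gc_content(orf):.2%}")
```

The same ORF yields a truncated product under the standard code but a full-length one under table 4. That asymmetry is what the biosafety slides later exploit.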

Understand the metabolism

• Identify major metabolic pathways by finding critical genes coding for known enzymes

• Predict necessary enzymes which may not have been found

• Evaluate the list of unknown function genes for candidates

• Build the major metabolic pathway map of the organism

• Consider elimination of entire pathways
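The first three steps amount to a set comparison. You list the enzymes a pathway needs, match them against the annotated genes, and report any step with no assigned gene as a gap to search for among the unknown-function genes. The sketch below illustrates this with the glycolytic steps from glucose-6-phosphate to pyruvate. The gene IDs and their assignments are hypothetical placeholders, not the real M. florum annotation.

```python
# Hypothetical gap-finding illustration; gene IDs are placeholders.
GLYCOLYSIS = [
    "glucose-6-phosphate isomerase",
    "6-phosphofructokinase",
    "fructose-bisphosphate aldolase",
    "triose-phosphate isomerase",
    "glyceraldehyde-3-phosphate dehydrogenase",
    "phosphoglycerate kinase",
    "phosphoglycerate mutase",
    "enolase",
    "pyruvate kinase",
]

# gene ID -> predicted enzyme (None = unknown function)
annotation = {
    "gene_01": "glucose-6-phosphate isomerase",
    "gene_02": "6-phosphofructokinase",
    "gene_03": "fructose-bisphosphate aldolase",
    "gene_04": "triose-phosphate isomerase",
    "gene_05": "glyceraldehyde-3-phosphate dehydrogenase",
    "gene_06": "phosphoglycerate kinase",
    "gene_07": "enolase",
    "gene_08": "pyruvate kinase",
    "gene_09": None,
}


def pathway_gaps(required, annotation):
    """Required enzymes with no assigned gene, in pathway order."""
    found = {enzyme for enzyme in annotation.values() if enzyme is not None}
    return [enzyme for enzyme in required if enzyme not in found]


def unknown_genes(annotation):
    """Genes of unknown function: the pool to search for missing enzymes."""
    return [gene for gene, enzyme in annotation.items() if enzyme is None]


print(pathway_gaps(GLYCOLYSIS, annotation))  # ['phosphoglycerate mutase']
print(unknown_genes(annotation))             # ['gene_09']
```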

Identified Metabolic Pathways in Mesoplasma florum

Figure: pathway map (G. Fournier, 02/23/04) assigning annotated genes, labeled by Mfl locus number, to the following functions:
• Sugar uptake: ribose and sn-glycerol-3-phosphate ABC transporters and a PTS II system. Substrates labeled include glucose, sucrose, trehalose, xylose, fructose and beta-glucosides.
• Central metabolism: glycolysis (to L-lactate and acetate), the pentose-phosphate pathway and chitin degradation.
• Nucleotides and nucleic acids: purine/pyrimidine salvage, DNA polymerase, RNA polymerase and competence/DNA transport.
• Translation: the ribosome (23S, 16S and 5S rRNA), tRNA aminoacylation and tmRNA.
• Proteins: secretion (Sec, SRP, ftsY) and degradation.
• Amino acid transport, including oligopeptide, glutamine, spermidine/putrescine, arginine/ornithine, alanine/Na+ and glutamate/Na+ transporters.
• Lipids and membrane: lipid and membrane synthesis (cardiolipin/phospholipids).
• Cofactors and energy: formyl-THF synthesis and met-tRNA formylation, the ATP synthase complex, pyridine nucleotide cycling (NAD+, NADH, NADPH) and flavin synthesis (FMN, FAD).
• Other transporters: phosphate, phosphonate, metal ion, cobalt and K+/Na+ transporters.
• Surface and membrane proteins: variable surface lipoproteins, 57 hypothetical transmembrane proteins and 13+ transporters of unknown substrate.

How Simple is this?
• Missing cell wall, outer membrane
• Missing TCA cycle
• Missing amino acid synthesis
• Missing fatty acid synthesis
• One sigma factor
• Small number of DNA binding proteins
• One insertion sequence, probably not active
• One restriction system (Sau3AI-like)
• CTG/CAG methylation (function?)
• Evidence for shared protein function
  MDH/LDH (Pollack 97 Crit Rev Microbiol 23:269)

Proteome
• Collaboration with Steve Tannenbaum / Yingwu Wang
• 2-D gels + MS spot ID
• LC/LC/MS/MS ID of trypsin digests

Proteome Results

• 180 spots picked and analyzed

• MudPIT LC/LC/MS/MS also carried out

• 369 proteins identified by trypsin digestion and mass spec out of 682 annotated coding regions

• Transcription of 16S ribosomal RNA

• Stops don't always stop: instead they cause frame shifts into other frames

Transposome insertions
• Engineered tetM tetracycline resistance gene
• Promoter from Tn4001 tetM gene
• Outward-directed primers for insertion site verification
• Unique I-SceI cut site

• In vitro binding of Tn5 transposase
• Electroporation of Tn5 transposome
• Selection with tetracycline
• Genomic DNA prep
• Cut with MboI (frequent cutter) and religate
• PCR with outward-directed primers
• Sequence to identify insertion site
• Locate disrupted genes
• Alternatively: sequence directly from genomic DNA
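The MboI cut-and-religate step is an inverse PCR. MboI cuts at every GATC (Sau3AI recognizes the same site). The fragment carrying the transposon therefore also carries genomic DNA out to the nearest GATC on each side. After self-ligation, the outward-directed primers read from the transposon ends into that flanking DNA. The sketch below computes which top-strand genomic stretch would be recovered for an insertion at a hypothetical coordinate in a toy sequence. Simplifying assumptions: a linear genome, cuts placed before the G of each site, and no GATC inside the transposon.

```python
MBOI_SITE = "GATC"  # MboI (and Sau3AI) recognition sequence; cuts before the G


def recovered_flanks(genome: str, insert_pos: int, site: str = MBOI_SITE):
    """Top-strand genomic DNA captured with an insertion at `insert_pos`.

    Returns (left_flank, right_flank): from the nearest upstream site to the
    insertion, and from the insertion up to the cut at the nearest downstream
    site. Returns None if either side lacks a site in this linear toy model.
    """
    left = genome.rfind(site, 0, insert_pos)
    right = genome.find(site, insert_pos)
    if left == -1 or right == -1:
        return None
    return genome[left:insert_pos], genome[insert_pos:right]


# Hypothetical 28 bp stretch, insertion between positions 13 and 14.
genome = "AAGATCTTTTCCCCGGGGAAAAGATCTT"
print(recovered_flanks(genome, 14))  # ('GATCTTTTCCCC', 'GGGGAAAA')
```

Because GATC occurs on average about once every 256 bp in random sequence, the recovered flanks are usually short enough to amplify. The AT-rich M. florum genome would shift that spacing somewhat.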

Custom Transposon

Figure: map of tetM-recoded-transposon (32165 bp). The tetM gene lies between the two ME ends. Other labeled features are stop codons, SP6, T3 and T7 promoters, an I-SceI site and an M13-forward primer site. Restriction sites (position): PvuII (4), EcoRI (78), XbaI (117), SpeI (2046), NotI (2054), PstI (2064), AvrII (2139), PvuII (2163).

Tn5 Transposomes

• Transposon design issues

• Codon usage

• Promoter design

• Restriction site avoidance

• Cell transformation

• Electroporation voltage

• Selection medium
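Restriction site avoidance comes down to a string search on both strands. The sketch below scans a sequence for the enzymes named on the transposon map, using their standard recognition sequences. A site on the bottom strand is found by searching the top strand for the site's reverse complement. That pass is redundant for these palindromic sites, but it keeps the function correct for non-palindromic ones. The example sequence is hypothetical.

```python
# Recognition sequences of the enzymes labeled on the transposon map.
SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "NotI": "GCGGCCGC",
    "AvrII": "CCTAGG",
    "PvuII": "CAGCTG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list:
    """Start positions of every (possibly overlapping) occurrence of motif."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def restriction_sites(seq: str, sites: dict = SITES) -> dict:
    """Map enzyme -> sorted top-strand positions of its site on either strand."""
    seq = seq.upper()
    found = {}
    for name, motif in sites.items():
        positions = set(find_all(seq, motif))
        # A bottom-strand site appears on the top strand as its reverse complement.
        positions.update(find_all(seq, reverse_complement(motif)))
        if positions:
            found[name] = sorted(positions)
    return found


print(restriction_sites("ATGGAATTCAAACTGCAGTT"))  # {'EcoRI': [3], 'PstI': [12]}
```

In a recoding workflow, each reported hit would then be removed by a synonymous codon change. This sketch only performs the search.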

Transposome insertion events

• 2700 currently picked, saved, and sequenced

• 337 Essential Genes + 29 tRNA + 7 essential RNA genes

• Most are unsurprising surface lipoproteins and “unknown function”

• Some surprises: ftsZ, mreB, and many ribosomal and tRNA modification proteins are inessential, but the Sau3AI-homologous restriction system appears essential
  About 80 unknown-function genes (many GTPases) are essential

• Compare with Dybvig08 results on M. arthritidis, French08 results on M. pulmonis, and Glass06 results on M. genitalium

• Ordered library of cells with 330 inactivated genes
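Essential-gene calls from insertion data rest on a simple rule. A gene that never receives a disrupting insertion among many independent mutants is a candidate essential gene. The sketch below applies that rule to gene coordinates and sequenced insertion sites, taking strand into account. It ignores insertions in the 3'-most 20% of a gene, since these may leave a working truncated protein. That 0.8 cutoff is an illustrative assumption, not a threshold stated in the talk. The genes and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int    # lowest coordinate, inclusive
    end: int      # highest coordinate, inclusive
    strand: str   # "+" or "-"


def disrupts(gene: Gene, pos: int, max_fraction: float = 0.8) -> bool:
    """True if an insertion at `pos` lies in the 5'-most `max_fraction` of the gene."""
    if not gene.start <= pos <= gene.end:
        return False
    length = gene.end - gene.start + 1
    # Distance from the start codon depends on which strand the gene is on.
    from_5prime = pos - gene.start if gene.strand == "+" else gene.end - pos
    return from_5prime < max_fraction * length


def candidate_essential(genes, insertions, max_fraction=0.8):
    """Names of genes that no insertion disrupts, in input order."""
    return [g.name for g in genes
            if not any(disrupts(g, p, max_fraction) for p in insertions)]


# Hypothetical genes and insertion sites (not M. florum data).
genes = [Gene("A", 1, 1000, "+"), Gene("B", 1101, 2000, "-"), Gene("C", 2101, 3000, "+")]
insertions = [150, 1150, 2500]   # the hit in B falls near its 3' end
print(candidate_essential(genes, insertions))  # ['B']
```

A gene can also lack insertions simply because too few mutants were sampled. For that reason, the count of independent insertions (about 2700 here) matters as much as the rule itself.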

Functional Categories of Essential Genes

Category                               Number
DNA replication                            22
Cell division                               5
Transcription                              12
Nucleotide transport & metabolism          20
Protein translation                       112
Post-translational modification             6
Protein secretion                           7
Lipid metabolism                            7
Coenzyme metabolism                         7
Energy production                          11
Transporters                               35

Transposome insertion events

Essential is not absolute

• Multi-copy genes are not identified as essential
  NADH oxidase
  Acyl carrier protein

• Essentiality is defined by the culture conditions

• Genes with stability and reliability functions are marked as dispensable
  DNA repair
  Chaperones
  Some RNA modification enzymes

• This is a much more important effect in larger genomes

Next in Analysis and Tools

• Genome re-engineering with knock-in/knock-out

• Resequencing

• Whole cell metabolic models

• Plug and play modules for additional function

• Biosafety issues

Genome re-engineering tools

• Plasmid: S. citri pSci2 PE protein based (Breton08)
  J70302 registry part, under test now

• recET recombination system (S. citri recT gene)
  J70007 recT part, DNA available, being mutated
  Chloramphenicol resistance gene cassette
  PheS mutant gene cassette

• Phase 1: Turn on recombination; insert the PheS/cat cassette at the target location; select with chloramphenicol

• Phase 2: Turn on recombination; insert the final modification; select with p-chlorophenylalanine

• Result: seamless editing of the chromosome
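The two-phase scheme can be modeled as string edits on a chromosome. The model shows why the result is seamless. The cassette carries both the positive marker (cat) and the counter-selectable marker (mutant pheS), and it is first inserted and then completely replaced. The sketch below is a toy model: the sequences and the cassette placeholder are hypothetical, recombination is treated as exact replacement between homology arms, and selection is reduced to checking whether the cassette is present.

```python
# Toy model of two-phase recombineering; all sequences are hypothetical.
CASSETTE = "[pheS*-cat]"  # stand-in for the PheS mutant / chloramphenicol cassette


def recombine(chromosome: str, left_arm: str, right_arm: str, payload: str) -> str:
    """Idealized recombination: replace whatever lies between the two arms."""
    i = chromosome.index(left_arm) + len(left_arm)
    j = chromosome.index(right_arm, i)
    return chromosome[:i] + payload + chromosome[j:]


def survives_chloramphenicol(chromosome: str) -> bool:
    # Positive selection: only cells carrying the cat gene grow.
    return CASSETTE in chromosome


def survives_p_chlorophenylalanine(chromosome: str) -> bool:
    # Counter-selection: mutant PheS charges tRNA with the toxic analog.
    return CASSETTE not in chromosome


def seamless_edit(chromosome, left_arm, right_arm, final_sequence):
    # Phase 1: insert the cassette at the target, select with chloramphenicol.
    intermediate = recombine(chromosome, left_arm, right_arm, CASSETTE)
    if not survives_chloramphenicol(intermediate):
        raise RuntimeError("phase 1: no chloramphenicol-resistant recombinants")
    # Phase 2: replace the cassette with the final edit, counter-select.
    edited = recombine(intermediate, left_arm, right_arm, final_sequence)
    if not survives_p_chlorophenylalanine(edited):
        raise RuntimeError("phase 2: cassette still present")
    return edited


print(seamless_edit("AAAACCCCtargetGGGGTTTT", "CCCC", "GGGG", "edited"))
# AAAACCCCeditedGGGGTTTT
```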

Resequencing
• Illumina sequencing is cheap and very high throughput
• Relatively straightforward with a pre-existing scaffold sequence
• We get millions of reads of limited length (35-70 bp)
  Paired ends, 250-500 bp fragments
• Bar-coded samples can multiplex the sequencing effort
  Allows many samples to be sequenced in a single run

• Resequence the Mesoplasma florum genome
• De novo sequence for sixteen additional strains
  Collection of Robert Whitcomb
• De novo sequencing for several closely related species
  Mesoplasma entomophilum
  Mesoplasma lactucae
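Two small calculations underlie this slide. Expected coverage is reads × read length ÷ genome size. Barcoded multiplexing means sorting reads into samples by a sample-specific sequence before mapping. The sketch below shows both. The read count, the 4-base barcodes and the convention that the barcode forms the first bases of each read are assumptions for illustration, since barcode placement and length vary between protocols.

```python
from collections import defaultdict


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth, assuming reads land uniformly on the genome."""
    return n_reads * read_length / genome_size


def demultiplex(reads, barcodes, max_mismatches=0):
    """Sort reads into samples by a leading barcode and trim the barcode off.

    `barcodes` maps barcode -> sample name; all barcodes must share one length.
    Reads that match no barcode, or more than one, go to "unassigned".
    """
    size = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read in reads:
        prefix = read[:size]
        hits = [sample for code, sample in barcodes.items()
                if len(prefix) == size
                and sum(a != b for a, b in zip(prefix, code)) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append(read[size:])
        else:
            bins["unassigned"].append(read)
    return dict(bins)


# One million 70 bp reads on the 793,244 bp genome: roughly 88x coverage.
print(f"{expected_coverage(1_000_000, 70, 793_244):.0f}x")

barcodes = {"ACGT": "strain_1", "TGCA": "strain_2"}   # hypothetical barcodes
reads = ["ACGTAAAA", "TGCATTTT", "GGGGCCCC", "ACGAAAAA"]
print(demultiplex(reads, barcodes))
print(demultiplex(reads, barcodes, max_mismatches=1))
```

Allowing one mismatch recovers the read with a sequencing error in its barcode. This is only safe when every pair of barcodes differs at several positions, as these two do.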

Whole cell modeling

• Approximately 2000 chemical reactions

• About 300 small molecule species

• Faster implementations of stochastic models

• Faster computers

• Comparison against reality
  Mass spec quantitation of metabolites
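Stochastic models of reaction networks are commonly simulated with Gillespie's stochastic simulation algorithm. At each step it draws the waiting time to the next reaction and which reaction fires, both from the current propensities. The sketch below is a minimal direct-method implementation. It is applied to a hypothetical one-species system (constant-rate synthesis, first-order decay), not to the M. florum network. With about 2000 reactions, recomputing every propensity at each step becomes costly, which is why faster variants such as the Gibson and Bruck next-reaction method exist.

```python
import random


def gillespie(initial, reactions, t_end, seed=0):
    """Gillespie direct-method stochastic simulation.

    initial:   dict species -> molecule count
    reactions: list of (propensity, change) pairs; propensity(state) gives the
               reaction's rate and change maps species -> count delta
    Returns [(time, state)] recorded after every reaction event.
    """
    rng = random.Random(seed)
    state, t = dict(initial), 0.0
    trajectory = [(t, dict(state))]
    while True:
        rates = [propensity(state) for propensity, _ in reactions]
        total = sum(rates)
        if total <= 0:
            break                              # no reaction can fire
        t += rng.expovariate(total)            # exponential waiting time
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its propensity.
        _, change = rng.choices(reactions, weights=rates)[0]
        for species, delta in change.items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


def time_average(trajectory, species, t_end):
    """Time-weighted mean count, holding the last state until t_end."""
    area = 0.0
    for (t0, s0), (t1, _) in zip(trajectory, trajectory[1:]):
        area += s0[species] * (t1 - t0)
    t_last, s_last = trajectory[-1]
    area += s_last[species] * (t_end - t_last)
    return area / t_end


# Hypothetical test system: synthesis at rate 10, first-order decay at rate 1.
K_SYN, K_DEG = 10.0, 1.0
reactions = [
    (lambda s: K_SYN, {"m": +1}),
    (lambda s: K_DEG * s["m"], {"m": -1}),
]
traj = gillespie({"m": 0}, reactions, t_end=1000.0)
print(len(traj), time_average(traj, "m", 1000.0))  # mean should be near K_SYN / K_DEG
```

The analytic steady-state mean for this toy system is K_SYN / K_DEG = 10, which gives a quick correctness check. A "comparison against reality" step plays the same role for a whole-cell model.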

Open Cell Modules

• Energy sources
  Arginine vs. glucose
  Photosynthesis pathway
  Citric acid cycle (reverse?)
• Amino acid synthesis
  Add unnatural AAs
• Nucleotide synthesis
• Lipid synthesis
• Cofactor synthesis
• Measurement structures
• Environmental niche
  Halobacterium
  Sulfur reducer
  Temperature optimum
• Membrane export / import
• Membrane structure
• Sensing of chemical environment
• Flagellar motion
• Light sensing
• Light production
• Cell cycle control (sporulation)
• Biosafety modules

Biosafety Barriers

• Codon isolation
  CGG-containing genes are unusable inside
  TGA-containing genes are unusable outside
  Extend this idea with more codons

• Pairs of required essential nutrients
  Reduces likelihood of gradual evolution of workarounds

• Explicit "kill" switches
  Otherwise benign chemicals lethal to this organism
  Shared function with critical metabolism reduces drift
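The codon-isolation barrier can be expressed as two checks on a gene's reading frame. A gene can be expressed in the engineered cell only if it has no in-frame CGG, because the cell has no tRNA for that codon. A gene from the engineered cell can be expressed intact in a standard-code host only if it has no internal in-frame TGA, because that host reads TGA as a stop. The function names and example genes below are hypothetical. Real gene transfer also depends on promoters, ribosome binding sites and other context that this sketch ignores.

```python
def in_frame_codons(orf: str) -> list:
    """Split an ORF into codons from its first base (trailing bases ignored)."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]


def expressible_in_engineered_cell(orf: str) -> bool:
    """False if the ORF uses CGG, which the engineered cell cannot decode."""
    return "CGG" not in in_frame_codons(orf)


def intact_in_standard_code_host(orf: str) -> bool:
    """False if an internal TGA (tryptophan in the engineered cell) would be
    read as a stop and truncate the protein in a standard-code organism."""
    return "TGA" not in in_frame_codons(orf)[:-1]  # final codon is the real stop


natural_gene = "ATGCGGAAATAA"      # hypothetical: Met-Arg(CGG)-Lys-stop
engineered_gene = "ATGTGAAAATAA"   # hypothetical: Met-Trp(TGA)-Lys-stop

print(expressible_in_engineered_cell(natural_gene))    # False: unusable inside
print(intact_in_standard_code_host(engineered_gene))   # False: unusable outside
```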

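Recoding the genome of entire organisms

Figure: an engineered organism and an engineered phage alongside a natural organism and a natural phage, with gene transfer between them blocked (X) in both directions. Labels: "No transfer: UGA not translated"; "No transfer: CGG not translated".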

Kit Part the genome

• Make BioBrick parts from each gene, tRNA, promoter, and other part-like genome element

• Develop techniques for recombining parts into coherent modules
  YAC editing and assembly, e.g. Lambda Red or RecET recombination

• Enable the bootstrapping of cells based on the redesigned genome
  e.g. liposome fusion

• Learn the design rules for chromosomes

Thanks to…
• Harold Morowitz
• Greg Fournier
• Gail Gasparich
• Bob Whitcomb
• Eric Lander
• Bruce Birren
• Nicole Stange-Thomann
• George Church
• Roger Brent
• Grant Jensen
• Yingwu Wang
• Samantha Burke
• PJ Steiner
• Nick Papadakis
• Ron Weiss
• Drew Endy
• Randy Rettberg
• Austin Che
• Reshma Shetty
• MIT Synthetic Biology working group
• DARPA, NTT, NSF, Microsoft
• Colleagues at Ginkgo Bioworks

Thank you for your attention

Our Plan

• Completely understand a simple organism
• Build excellent models and predictive tools
• Simplify the organism further
  Remove inessential genes
  Replace dual-function genes with single-function equivalents

• Abstract useful modules from other living systems

• Understand and create good models for these modules

• Selectively add these modules to the existing simple cell

The code’s 4 billion years old; it’s time for a rewrite

The Mollicute Bibliome

Complete collection of mycoplasma related papers:

• 6,411 and counting

• Books and book chapters also

• EndNote file: mycoplasmas.enl

• Downloaded PDFs for articles > 1995

• Scanned articles and books, OCR

• Collaboration for "shallow semantic" understanding

• people.csail.mit.edu/tk/mfpapers/ user=meso, pass=meso