Systems and Synthetic Biology: A programming languages point of view. Saurabh Srivastava, Assistant Research Engineer, Computer Science + Bioengineering, University of California, Berkeley


Systems and Synthetic Biology: A programming languages point of view

Saurabh Srivastava

Assistant Research Engineer

Computer Science + Bioengineering

University of California, Berkeley

• Systems biology
  – Modeling complete biological processes
• Genomics
  – Reading DNA sequences got incredibly fast, and cheap
  – Algorithms for sequence data analytics within organisms
• Synthetic biology
  – DNA synthesis got incredibly cheap
  – Functional characterization on the way
  – Algorithms to predict tweaks to organisms

2

One view: biological specifications

One view: syntax of components

One view: semantics of components

In perspective

3

Different scales in biology

Systems Biology operates here

Synthetic Biology operates here

Intracellular processes

Intercellular processes

Protein interactions

Protein function

Organisms

4

Systems Biology operates here

Results and techniques used

Synthetic Biology operates here

Intracellular processes

Intercellular processes

Protein interactions

Protein function


Results:
• Constructed a bacterium that produces Paracetamol/Depon
• Model of cell communication helps in understanding cancer

Techniques:
• Automatically generating concurrent programs from input-output examples
• Big data analysis and abstraction

PART I

Synthesizing models for Systems Biology

7

Synthesizing Systems Biology “Programs”

Synthesizing concurrent programs from examples
• Programs ≡ biological models
• Examples ≡ biological experiments

We assist natural sciences with formal methods
• Given experiments, are there other models?
• If so, compute a new, disambiguating experiment

Part I: how stem cells coordinate their fates

8

Understanding Diseases

• “Cancer is fundamentally a disease of failure of regulation of tissue growth. In order for a normal cell to transform into a cancer cell, the genes which regulate cell growth and differentiation must be altered.” – from Wikipedia

• Research on cell differentiation helps in understanding diseases such as cancer.

9

C. elegans: A Model Organism

Nematode (roundworm) used in developmental biology.

959 somatic cells in the adult hermaphrodite; its organs have counterparts in other animals.

Differentiation is studied through vulval development.

10

Differentiation and then development into organ parts

Initial division of embryo

Identical precursor cells collaborate to decide their fate

11

Modeling Goal

What is the mechanism (program) within each cell for deciding fates through communication?

12

Building Blocks of these Programs

Cells contain communicating proteins.

Protein interaction: a protein senses the concentration of other proteins.

Interaction is either activation or inhibition.

[Diagram: two A-to-B edges, one for activation and one for inhibition]

13

How the Vulval Cells Differentiate

[Figure: a row of vulval precursor cells, each containing let-23, lin-12, sem-5, let-60, mpk-1, and lst. The anchor cell sends a high, med, or low signal; the corresponding fates are labeled 1º, 2º, 3º.]

If cells sense the same signal strength, data races occur.

14

How Biologists Discover Interactions

• Measuring protein levels over time is infeasible

• If such “cell tracing” is infeasible, infer protein interaction from end-to-end experiments

• That is, mutate cells and observe the resulting fates

15

A Mutation Experiment

[Figure: the same row of cells showing only let-23 and lin-12, with anchor-cell signal high, med, low and fate labels 1º, 2º, 3º, plus an additional 1º label.]

16

Putting Experiments Together

Experiment  AC  lin-12  lin-15  let-23  lst  Fate decisions
1           ON  ON      ON      ON      ON   {332123}
2           ON  OFF     ON      ON      ON   {331113}
3           ON  ON      OFF     ON      ON   {112121, 122121, 212121}
...
48          ...

• Experiment 1: no protein is mutated.
• Experiment 2: lin-12 is turned off.
• Experiment 3: multiple outcomes observed.
• Fate decisions: fate of six neighboring cells.

Experiments over 35 years by 11 groups
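A minimal sketch of how the experiment table could be held in code, assuming each experiment is a set of components left ON plus the set of observed fate strings. The names `Experiment`, `make`, and `EXPERIMENTS` are illustrative, not from the talk, and only the three rows shown on the slide are included.

```python
from dataclasses import dataclass

COMPONENTS = ("AC", "lin-12", "lin-15", "let-23", "lst")

@dataclass(frozen=True)
class Experiment:
    number: int
    on: frozenset      # components left ON (not mutated)
    fates: frozenset   # observed fate patterns, one digit per cell

def make(number, states, fates):
    """Build an Experiment from one table row."""
    on = frozenset(c for c, s in zip(COMPONENTS, states) if s == "ON")
    return Experiment(number, on, frozenset(fates))

# Only the rows visible on the slide; the remaining experiments are elided there.
EXPERIMENTS = [
    make(1, ["ON", "ON", "ON", "ON", "ON"], ["332123"]),
    make(2, ["ON", "OFF", "ON", "ON", "ON"], ["331113"]),
    make(3, ["ON", "ON", "OFF", "ON", "ON"], ["112121", "122121", "212121"]),
]

def is_nondeterministic(exp):
    """True when the wet lab observed more than one outcome."""
    return len(exp.fates) > 1
```

Storing fates as a set makes the "multiple outcomes observed" case (experiment 3) a first-class part of the specification.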

17

How to Build Accurate Models?

[Figure: the vulval precursor cell model, each cell containing let-23, lin-12, sem-5, let-60, mpk-1, and lst, with anchor-cell signal high, med, low and fates 1º, 2º, 3º.]

Inhibition discovered by predictive modeling [Fisher et al. 2007]

18

Semantics of the Modeling Language

• Program has cells
• Non-deterministic outcomes via schedule interleaving
• Cell has proteins
• All proteins advance synchronously
• Proteins have discrete state and update functions.

[Figure: Cell 1 and Cell 2, each containing let-23, lin-12, sem-5, let-60, mpk-1, lst]
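The semantics above can be sketched with a toy two-cell program. Everything here is hypothetical: the proteins `signal` and `commit` and their update functions are invented for illustration and are not the VPC model. Within a cell, every protein reads the old state (synchronous advance); across cells, the schedule decides who steps next (interleaving).

```python
from itertools import permutations

def step_cell(cell, neighbor, updates):
    """Advance every protein of one cell synchronously (all read the old state)."""
    return {p: f(cell, neighbor) for p, f in updates.items()}

# Hypothetical update functions: the neighbor's commit inhibits signal,
# and the cell's own signal activates commit.
UPDATES = {
    "signal": lambda cell, nb: 0 if nb["commit"] else 1,
    "commit": lambda cell, nb: max(cell["commit"], cell["signal"]),
}

def run(schedule):
    """Run one schedule (a sequence of cell indices) and return the commit levels."""
    cells = [{"signal": 0, "commit": 0}, {"signal": 0, "commit": 0}]
    for i in schedule:
        cells[i] = step_cell(cells[i], cells[1 - i], UPDATES)
    return tuple(c["commit"] for c in cells)

def outcomes(steps_per_cell=2):
    """Collect outcomes over all interleavings in which each cell steps equally often."""
    schedules = set(permutations([0, 1] * steps_per_cell))
    return {run(s) for s in schedules}
```

With two steps per cell, `outcomes()` contains more than one result: identical cells reach different fates purely because of the interleaving, which is the data-race effect the slides describe.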

19

Synthesizing Cellular Programs

20

Synthesis of Programs

[Diagram: specification + biological insight → synthesizer → completed program]

Specification:

Experiment  AC  lin-12  lin-15  let-23  lst  Fate decisions
1           ON  ON      ON      ON      ON   {332123}
2           ON  OFF     ON      ON      ON   {331113}
3           ON  ON      OFF     ON      ON   {112121, 122121, 212121}
...

Biological insight: given as a partial program

21

Partial Programs

Partial programs express biological insight:
• Which proteins are in the cell
• Which proteins may interact

Update functions can be unknown.

[Figure: a cell containing let-23, lin-12, sem-5, let-60, mpk-1, lst, with unknown update functions marked "??"]

22

Synthesis Algorithm

23

Classical CEGIS

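[Diagram: CEGIS loop between synthesizer and verifier]
• The synthesizer starts from an initial input/output set.
• Synthesizer SAT: a candidate solution goes to the verifier. Synthesizer UNSAT: no solution exists.
• Verifier SAT: add the input-output counterexample and re-run the synthesizer. Verifier UNSAT: the candidate is correct.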
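A minimal sketch of the classical CEGIS loop, assuming a toy domain: synthesizing a 2-input Boolean function. Brute-force enumeration stands in for the SAT queries, and `synthesize`, `verify`, and `cegis` are illustrative names rather than the talk's implementation.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=2))

def candidates():
    """Enumerate every 2-input Boolean function as a truth table (dict)."""
    for outputs in product([0, 1], repeat=len(INPUTS)):
        yield dict(zip(INPUTS, outputs))

def synthesize(examples):
    """Return some candidate consistent with all examples, or None."""
    for cand in candidates():
        if all(cand[x] == y for x, y in examples):
            return cand
    return None

def verify(cand, spec):
    """Return an input on which cand disagrees with spec, or None."""
    for x in INPUTS:
        if cand[x] != spec(*x):
            return x
    return None

def cegis(spec, examples=()):
    examples = list(examples)
    while True:
        cand = synthesize(examples)
        if cand is None:
            return None, examples        # synthesizer "UNSAT": no solution
        cex = verify(cand, spec)
        if cex is None:
            return cand, examples        # verifier "UNSAT": candidate is correct
        examples.append((cex, spec(*cex)))   # add input-output counterexample
```

Calling `cegis(lambda a, b: a ^ b)` returns the XOR truth table together with the counterexamples gathered along the way. Each counterexample rules out at least the current candidate, so the loop terminates on a finite candidate space.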

24

Correctness Condition

Safety: all schedules must lead the program to produce experiment outcomes observed in the wet lab.

∀ mutation m. ∀ schedule s. P(m, s) ∈ E(m)

Completeness: each observed experiment outcome must be reproducible by the program for some schedule.

∀ mutation m. ∀ fate f ∈ E(m). ∃ schedule s. P(m, s) = f

Experiment  AC  lin-12  lin-15  let-23  lst  Fate decisions
1           ON  ON      ON      ON      ON   {332123}
2           ON  OFF     ON      ON      ON   {331113}
3           ON  ON      OFF     ON      ON   {112121, 122121, 212121}
...
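The two conditions can be checked directly when schedules are few enough to enumerate. This is a sketch under that assumption; the talk searches schedules symbolically instead. `P` is any function from (mutation, schedule) to a fate, `E` maps each mutation to its observed fates, and the toy data and both models are hypothetical.

```python
def safety_violations(P, E, schedules):
    """Pairs (m, s) whose outcome P(m, s) was never observed for mutation m."""
    return [(m, s) for m in E for s in schedules if P(m, s) not in E[m]]

def completeness_violations(P, E, schedules):
    """Pairs (m, f) where observed fate f is not produced by any schedule."""
    return [(m, f) for m in E for f in E[m]
            if all(P(m, s) != f for s in schedules)]

# Hypothetical toy data: two mutations, two schedules.
E = {"wild-type": {"12", "21"}, "mutant": {"11"}}
SCHEDULES = ["cell1-first", "cell2-first"]

def good_model(m, s):
    if m == "mutant":
        return "11"
    return "12" if s == "cell1-first" else "21"

def bad_model(m, s):
    return "12"   # ignores both the mutation and the schedule
```

`good_model` passes both checks. `bad_model` fails safety on the mutant (it produces an unobserved fate) and fails completeness because it never reproduces `"21"` or `"11"`.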

25

Counterexample-Guided Inductive Synthesis

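[Diagram: CEGIS loop with two verifiers]
• The inductive synthesizer starts from an initial set of input/output examples and emits a candidate completion, or reports that no candidate completion exists.
• Safety verifier counterexample: an execution P(m, s) with a bad outcome.
• Completeness verifier counterexample: an observation (m, f) that is not reproducible.
• If neither verifier finds a counterexample, the candidate is ok.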

26

Verifying for Safety

• Safety: ∀ mutation m. ∀ schedule s. P(m, s) ∈ E(m)

• Attempt to disprove by searching for a demonic schedule: ∃ mutation m. ∃ schedule s. P(m, s) ∉ E(m)

Unroll over the set of performed experiments

Search symbolically for a demonic schedule

27

Verifying for Completeness

• Completeness: ∀ mutation m. ∀ fate f ∈ E(m). ∃ schedule s. P(m, s) = f

• Attempt to disprove by showing lack of an angelic schedule for some outcome: ∃ mutation m. ∃ fate f ∈ E(m). ∀ schedule s. P(m, s) ≠ f

Unroll over pairs of mutation and fate

Query symbolically for an angelic schedule

28

Synthesized Models

• We synthesized two models of VPCs.
• Input: Partial model that specifies known, simple protein behaviors.
• Output: Synthesized update functions for two key proteins.

[Figure: model 1 and model 2]

29

Additional Algorithms for Going Beyond Synthesis to Assist Scientists

Does there exist an alternative model that differs on a new experiment?

For two or more models within the space, do there exist disambiguating experiments?

What is the minimal number of experiments that constrains the space to the current model?
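The disambiguation question can be sketched as a search over experiments not yet performed. This assumes a model is a function from a mutation (the set of components left ON) to its set of predicted fates; `model_a` and `model_b` are hypothetical stand-ins, not the two synthesized VPC models.

```python
from itertools import product

COMPONENTS = ("lin-12", "lin-15", "let-23")

def all_mutations():
    """Every ON/OFF combination, encoded as the frozenset of ON components."""
    for bits in product([True, False], repeat=len(COMPONENTS)):
        yield frozenset(c for c, on in zip(COMPONENTS, bits) if on)

def disambiguating_experiments(model_a, model_b, performed):
    """Unperformed mutations on which the two models predict different fates."""
    return [m for m in all_mutations()
            if m not in performed and model_a(m) != model_b(m)]

# Hypothetical models that agree on the wild type but differ elsewhere.
def model_a(on):
    return {"1"} if "let-23" in on else {"3"}

def model_b(on):
    return {"1"} if "let-23" in on or "lin-12" not in on else {"3"}

PERFORMED = {frozenset(COMPONENTS)}   # only the unmutated experiment so far
```

Here the two models separate only when both let-23 and lin-12 are OFF, so those are the experiments worth running next.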

PART II

Predicting DNA insertions for Synthetic Biology

• Microbial chemical factories
  – Sustainable bacterial production of chemicals, e.g., drugs, polymers
• Bacteria as a sensor
  – Agricultural apps: e.g., sense nitrogen depletion in soil, change color
• Tumor-killing bacteria
  – Sense multiple environmental factors (lower oxygen, high lactic acid)
  – Invade cancerous cell
  – Release drug inside cell

31

Synthetic Biology inserts DNA

Applications enabled

Does it actually work?

• Jay D. Keasling
  – Artemisinin: from Artemisia annua to yeast
  – Amyris company: 219M incidence, 300M cure target
  – 8 years of work; manual insights
• Computationally predicted
  – Our Tylenol E. coli strain: Sugar → Tylenol

Incoming chemicals

How do cells manipulate chemicals

Some proteins are enzymes

Output chemicals

Changing the cellular chemical machinery

What happens when we add unnatural/external enzymes?

35

[Pipeline diagram]

Current status: Sugar → Acetaminophen (Tylenol)
• Bio repositories: data dedup/correlation + search
• DNA synthesis + plasmid transformation

Opportunity: Sugar → polymers (nylon etc.), biofuels
• Abstractions or rules from data + rule application to predict

36

Prediction for Tylenol

Chorismate pathway → 4-aminobenzoate → 4-aminophenol (4AP) → Tylenol

4ABH gene from mushroom

[Figure labels: Tylenol, 4AP; 4ABH gene inserted]

38

Opportunity: Sugar → polymers (nylon etc.), biofuels

Abstractions or rules from data + rule application to predict

Biochemical rules/abstractions?

f: graph transformation function

Operator taking a (chemical) graph and transforming it

f′, f″, ..., fclass

Open problem 1: Language for chemicals and transforms

Example (SMILES): C1=CC=CC=C1

Representation  Formats         Good for
3D              XYZ, CML, PDB   crystallographers
2D                              biochemists
1D              SMILES, SYBYL   computational storage, retrieval
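To make the 1D-versus-graph distinction concrete, here is a sketch that turns a small SMILES string into an explicit atom and bond list. It handles only a tiny subset (single-letter uppercase atoms, `=` and `#` bonds, single-digit ring closures) and is not a conforming SMILES parser; `parse_smiles` is an illustrative name.

```python
def parse_smiles(smiles):
    """Parse a tiny SMILES subset into (atoms, bonds).

    bonds are (atom_index, atom_index, bond_order) triples.
    Not supported: branches, brackets, aromatic lowercase atoms, charges.
    """
    atoms, bonds = [], []
    prev = None          # index of the previous atom in the chain
    order = 1            # bond order to use for the next bond
    open_rings = {}      # ring digit -> (atom index, bond order)
    for ch in smiles:
        if ch in "=#":
            order = 2 if ch == "=" else 3
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring digit before any atom")
            if ch in open_rings:
                start, ring_order = open_rings.pop(ch)
                bonds.append((start, prev, max(order, ring_order)))
            else:
                open_rings[ch] = (prev, order)
            order = 1
        elif ch.isalpha() and ch.isupper():
            atoms.append(ch)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev = len(atoms) - 1
            order = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    if open_rings:
        raise ValueError("unclosed ring")
    return atoms, bonds
```

For the slide's example `C1=CC=CC=C1` (benzene in Kekulé form), the ring digit `1` is where the cycle is cut in the string. The parser has to stitch that cut back together, which hints at why trace-style encodings are awkward at cycle cuts.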

Alternative new representation

• Transformation representation
  – encmolecule
  – Be able to efficiently compute fclass(encmolecule)
• Conflicting objectives:
  – Trace-based encoding
    • Difficulty at cycle cuts
  – True graph encoding
    • Subgraph isomorphism

Need midway encoding

Applying reaction operators

We can now predict new edges, i.e., external enzymes or even newly constructed enzymes

Open problem 2: Deriving biochemical programs

• fclass(encmolecule) is a single step
• Function composition to get “pathway programs”
  – fclass0(fclass1(fclass2(fclass3(encmolecule))))
• Complications
  – Path can be “acyclic”, not just straight-line
  – Mixed concrete and rule instantiation
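A straight-line "pathway program" is plain function composition. The sketch below uses chemical names as string stand-ins for encoded molecules and lookup tables as stand-ins for transformation functions; `compose`, `rule`, and the three step names are hypothetical. The steps follow the Tylenol prediction from the earlier slide.

```python
from functools import reduce

def compose(*steps):
    """compose(f0, f1, f2)(x) == f0(f1(f2(x))): the rightmost step runs first."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(steps), x)

def rule(table):
    """A toy transformation function: rewrite a molecule if the rule matches."""
    def apply(molecule):
        return table.get(molecule, molecule)
    return apply

# String stand-ins for encoded molecules; one rule per pathway step.
to_aminobenzoate = rule({"chorismate": "4-aminobenzoate"})
to_aminophenol = rule({"4-aminobenzoate": "4-aminophenol"})
to_tylenol = rule({"4-aminophenol": "Tylenol"})

pathway = compose(to_tylenol, to_aminophenol, to_aminobenzoate)
```

`pathway("chorismate")` walks all three steps. The acyclic, non-straight-line pathways named under Complications would need a DAG of steps rather than a single chain.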

Crux of the research problem

Intelligent rule instantiation: ------ edges

Given target chemical

Phrase it as a model checking reachability problem: initial experiments point to a need for more efficient representation of search space.

Need better search methods
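The reachability phrasing can be sketched as explicit-state breadth-first search over chemicals. This is a simple stand-in for a model checker, with abstract chemical names and a hypothetical rule list, and it also shows why the search-space representation matters: every reachable chemical is stored explicitly.

```python
from collections import deque

def find_pathway(start, target, rules):
    """Breadth-first search; returns a shortest list of rule names, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        chem, path = queue.popleft()
        if chem == target:
            return path
        for name, substrate, product in rules:
            if substrate == chem and product not in seen:
                seen.add(product)
                queue.append((product, path + [name]))
    return None

# Hypothetical rules: (rule name, substrate, product).
RULES = [
    ("r1", "A", "B"),
    ("r2", "B", "C"),
    ("r3", "A", "C"),
    ("r4", "C", "D"),
]
```

`find_pathway("A", "D", RULES)` returns the two-step route through `r3` and `r4` rather than the three-step one, since breadth-first search reaches shorter pathways first.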

Arbekacin (semi-synthetic): MRSA drug, $20,900/gram

Amikacin (natural): nothing interesting, $118/gram

Eventual targets

Arbekacin (semi-synthetic): MRSA drug, $20,900/gram

Modular decomposition through function summaries

Amikacin (natural): nothing interesting, $118/gram

Acknowledgements

48

49

http://pathway.berkeley.edu

[Summary diagram]

Current status: Sugar → Acetaminophen (Tylenol)
• Bio repositories: data dedup/correlation + search
• DNA synthesis + plasmid transformation

Opportunity: Sugar → polymers (nylon etc.), biofuels
• Chemical transformation functions from data
• Language for biochemistry
• Synthesis of biochemical program (i.e., DNA modifications)
• Synthesis of internals of transformation function

Backup slides

51

Nylon precursor (polymer)

52

Butanol (fuel)

Architecture

55,000 entries

Pubchem: 53M entries

Uniprot: 23M entries

Organisms:

Pubchem names/Brenda names: 55k