Ross D. King, Department of Computer Science, Aberystwyth University, rdk@aber.ac.uk: Automating Biology using Robot Scientists


Page 1: Ross D. King Department of Computer Science Aberystwyth University rdk@aber.ac.uk

Ross D. King

Department of Computer Science

Aberystwyth University

rdk@aber.ac.uk

Automating Biology using Robot Scientists


Automating Biology


The Concept of a Robot Scientist

[Diagram: cycle linking Background Knowledge, Analysis, Hypothesis Formation, Consistent Hypotheses, Experiment selection, Robot, Experiment, Results Interpretation, and Final Theory]

We have developed the first computer system that is capable of originating its own experiments, physically doing them, interpreting the results, and then repeating the cycle.
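The cycle described on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not Adam's software: the function names (`run_cycle`, `predict`, `run_experiment`), the three gene labels, and the "split the hypotheses as evenly as possible" selection rule are all assumptions made for the example.

```python
def run_cycle(hypotheses, experiments, predict, run_experiment, max_cycles=10):
    """Closed loop: select an experiment, run it, keep consistent hypotheses."""
    consistent = set(hypotheses)
    for _ in range(max_cycles):
        if len(consistent) <= 1:
            break  # nothing left to discriminate between

        def imbalance(experiment):
            # How unevenly the experiment splits the remaining hypotheses.
            positives = sum(1 for h in consistent if predict(h, experiment))
            return abs(len(consistent) - 2 * positives)

        experiment = min(experiments, key=imbalance)
        outcome = run_experiment(experiment)
        consistent = {h for h in consistent if predict(h, experiment) == outcome}
    return consistent


# Toy world (hypothetical): exactly one of three genes encodes the enzyme.
TRUE_GENE = "g2"
genes = ["g1", "g2", "g3"]


def predict(hypothesis, knocked_out_gene):
    # Hypothesis "gene h encodes the enzyme" predicts a growth defect
    # only when h itself is knocked out.
    return hypothesis == knocked_out_gene


def run_experiment(knocked_out_gene):
    # Stand-in for the robot: reports whether a growth defect was observed.
    return knocked_out_gene == TRUE_GENE


final_theory = run_cycle(genes, genes, predict, run_experiment)
```

Here `final_theory` ends up holding the single hypothesis that survived every experiment. A real system would replace `run_experiment` with laboratory automation and would weigh experiment cost and hypothesis probability when choosing what to do next.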


Motivation: Philosophical

What is Science?

The question whether it is possible to automate the scientific discovery process seems to me central to understanding science.

There is a strong philosophical position which holds that we do not fully understand a phenomenon unless we can make a machine which reproduces it.

“What I cannot create, I do not understand” (Richard Feynman).


Motivation: AI

Science is an excellent test bed for AI.

Science v Chess
– Abstract world: 64 squares, 32 pieces.
– Computers play chess as well as or better than the best humans, and computers can now make strikingly beautiful moves.

Science v General Intelligence
– Abstraction
– Nature is honest


Motivation: Technological

In many areas of science our ability to generate data is outstripping our ability to analyse it.

One scientific area where this is true is Systems Biology, where data is now being generated on an industrial scale.

The analysis of scientific data needs to become as industrialised as its generation.


Technological Advantages

Robot Scientists have the potential to increase the productivity of science
– by enabling the high-throughput testing of hypotheses.

Robot Scientists have the potential to improve the quality of science
– by enabling the description of experiments in greater detail and semantic clarity.


Scientific Discovery

Meta-Dendral: Analysis of mass-spectrometry data. Feigenbaum, Djerassi, Lederberg (1969).

Bacon: Rediscovering physics and chemistry. Langley, Bradshaw, Simon (1979).

Automated discovery in a chemistry laboratory. Zytkow, Zhu, Hussman (1990).


Robot Scientist Timeline

1999-2004 Initial Robot Scientist Project
– Limited Hardware
– Collaboration with Douglas Kell (Aber Biology), Steve Oliver (Manchester), Stephen Muggleton (Imperial)
King et al. (2004) Nature, 427, 247-252

2004-2011 Adam Project
– Sophisticated Laboratory Automation
– Collaboration with Steve Oliver (Cambridge).
King et al. (2009) Science, 324, 85-89

2008-2011 Eve Project


Adam


The Experimental Cycle

[Diagram: cycle linking Background Knowledge, Analysis, Hypothesis Formation, Consistent Hypotheses, Experiment(s) selection, Robot, Experiment(s), Results Interpretation, and Final Theory]


The Application Domain

Functional genomics

In yeast (S. cerevisiae) ~15% of the 6,000 genes still have no known function.

EUROFAN 2 made all viable single deletant strains.

Task: to determine the “function” of a gene by growth experiments.


Logical Cell Model

We have developed a logical formalism for modelling metabolic pathways (encoded in Prolog). This is essentially a directed, labelled hypergraph, with metabolites as nodes and enzymes as arcs.

If a path can be found from cell inputs (metabolites in the growth medium) to all the cell outputs (essential compounds) then the cell can grow.
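The growth rule above can be illustrated with a small reachability sketch. It is written in Python rather than the Prolog of the actual model, and everything in it is an assumption for illustration: the gene labels (`geneA`, `geneB`, `geneC`), the three-step toy pathway, and the function names. Each reaction is a hyper-arc from a set of substrates to a set of products, labelled by the gene encoding its enzyme.

```python
def reachable(medium, reactions, deleted_genes=frozenset()):
    """Return every metabolite producible from the growth medium."""
    present = set(medium)
    changed = True
    while changed:
        changed = False
        for gene, substrates, products in reactions:
            if gene in deleted_genes:
                continue  # the enzyme is missing in this deletant strain
            if substrates <= present and not products <= present:
                present |= products
                changed = True
    return present


def can_grow(medium, reactions, essential, deleted_genes=frozenset()):
    """The cell grows if all essential compounds are reachable."""
    return set(essential) <= reachable(medium, reactions, deleted_genes)


# Hypothetical, heavily simplified pathway (not the real yeast network).
REACTIONS = [
    ("geneA", frozenset({"glucose"}), frozenset({"chorismate"})),
    ("geneB", frozenset({"chorismate"}), frozenset({"prephenate"})),
    ("geneC", frozenset({"prephenate"}), frozenset({"phenylalanine"})),
]
ESSENTIAL = {"phenylalanine"}

wild_type = can_grow({"glucose"}, REACTIONS, ESSENTIAL)
deletant = can_grow({"glucose"}, REACTIONS, ESSENTIAL, deleted_genes={"geneB"})
rescued = can_grow({"glucose", "prephenate"}, REACTIONS, ESSENTIAL,
                   deleted_genes={"geneB"})
```

In this toy network the wild type grows, the `geneB` deletant does not, and adding the missing intermediate to the medium restores growth. That pattern is what makes growth experiments with deletant strains and added metabolites informative about gene function.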


Genome Scale Model of Yeast Metabolism

It covers most of what is known about yeast metabolism.

Includes 1,166 ORFs (940 known, 226 inferred).

Growth if path from growth medium to defined end-points.

State-of-the-art accuracy in predicting cell viability.

Now integrated with Yeast 4.0.


The Experimental Cycle

[Diagram: cycle linking Background Knowledge, Analysis, Hypothesis Formation, Consistent Hypotheses, Experiment(s) selection, Robot, Experiment(s), Results Interpretation, and Final Theory]


Locally Orphan Enzymes

Adam’s model of yeast metabolism has “locally orphan enzymes”: these catalyse biochemical reactions known to be in yeast, but for which the coding genes are unknown.

Adam uses bioinformatic methods to abduce genes which could encode these orphan enzymes; these candidate genes are its hypotheses.


Automated Model Completion

[Diagram labels: Bioinformatics Database, Model of Metabolism, Hypothesis Formation, Experiment Formation, Gene Identification, REACTION, Experiment, FASTA32, PSI-BLAST]

Deduction: orthologous(Gene1, Gene2) → similar_sequence(Gene1, Gene2).

Abduction: similar_sequence(Gene1, Gene2) → orthologous(Gene1, Gene2).
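The abductive step can be sketched as follows, in Python. This is a toy under stated assumptions, not Adam's pipeline: the function name `abduce_candidates`, the other-organism gene labels (`otherA`, `otherB`, `otherC`), and all similarity scores are invented for illustration. The idea is to run the deductive rule backwards: if a yeast gene has a sequence similar to a gene known elsewhere to encode the orphan enzyme, hypothesise that the yeast gene encodes it too.

```python
def abduce_candidates(orphan_ec, known_enzymes, similarity_hits):
    """Rank yeast genes that might encode the enzyme with class orphan_ec.

    known_enzymes: maps a gene in another organism to its EC number.
    similarity_hits: (yeast_gene, other_gene, score) triples, e.g. as a
    sequence-similarity search might report them.
    """
    best_score = {}
    for yeast_gene, other_gene, score in similarity_hits:
        # Abduction: similar sequence to a known enzyme -> candidate ortholog.
        if known_enzymes.get(other_gene) == orphan_ec:
            best_score[yeast_gene] = max(score, best_score.get(yeast_gene, 0.0))
    return sorted(best_score, key=lambda gene: -best_score[gene])


# Invented data: scores and other-organism genes are purely illustrative.
KNOWN = {"otherA": "2.6.1.39", "otherB": "2.6.1.39", "otherC": "1.1.1.1"}
HITS = [
    ("YER152C", "otherA", 0.9),
    ("YJL060W", "otherB", 0.8),
    ("YGL202W", "otherA", 0.7),
    ("YGL202W", "otherC", 0.95),  # ignored: similar to a different enzyme
]

hypotheses = abduce_candidates("2.6.1.39", KNOWN, HITS)
```

Abduction is not logically sound, which is why each ranked candidate is only a hypothesis that still has to be tested by experiment.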


The Experimental Cycle

[Diagram: cycle linking Background Knowledge, Analysis, Hypothesis Formation, Consistent Hypotheses, Experiment(s) selection, Robot, Experiment(s), Results Interpretation, and Final Theory]


The Form of the Experiments

Hypothesis 1: Gene YER152C codes for the enzyme catalysing the reaction: chorismate → prephenate.

Hypothesis 2: Gene YGL060W codes for the enzyme catalysing the reaction: chorismate → prephenate.

These can be tested by:
– Growing the YER152C deletant in environment +/- prephenate.
– Growing the YGL060W deletant in environment +/- prephenate.


The Experimental Cycle

[Diagram: cycle linking Background Knowledge, Analysis, Hypothesis Formation, Consistent Hypotheses, Experiment(s) selection, Robot, Experiment(s), Results Interpretation, and Final Theory]


Adam

One of the most sophisticated pieces of laboratory automation in the world.

Designed to fully automate yeast growth experiments.

Has a -20 °C freezer, 3 incubators, 2 readers, 3 liquid handlers, 3 robotic arms, 2 robot tracks, a centrifuge, a washer, an environmental control system, etc.

Is capable of initiating ~1,000 new experiments and making >200,000 observations per day in a continuous cycle.


LIMS Setup


Diagram of Adam


Adam


Adam in Action


Growth plates


Example Growth Curves: Arginine as N source


Growth curves

[Figure: four panels of growth curves, each plotting OD against Time]


The Experimental Cycle

[Diagram: cycle linking Background Knowledge, Analysis, Hypothesis Formation, Consistent Hypotheses, Experiment(s) selection, Robot, Experiment(s), Results Interpretation, and Final Theory]


Qualitative to Quantitative

The functions of most genes in S. cerevisiae that when deleted result in auxotrophy (no growth) have already been discovered.

Most genes of unknown function only affect growth quantitatively.

Their deletants may show slower growth (bradytrophs), faster growth, higher/lower biomass yield, etc.
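A quantitative comparison of this kind needs numbers extracted from each growth curve. The Python sketch below is one simple way to do that, assuming OD readings at known time points; the function name `growth_summary`, the two summary statistics, and the example readings are illustrative choices, not the analysis Adam actually performs.

```python
import math


def growth_summary(times, ods):
    """Summarise one growth curve from optical density (OD) readings.

    max_rate: steepest slope of ln(OD) between consecutive readings,
              a crude estimate of the maximum specific growth rate.
    yield:    highest OD reached minus the starting OD.
    """
    rates = [
        (math.log(ods[i + 1]) - math.log(ods[i])) / (times[i + 1] - times[i])
        for i in range(len(ods) - 1)
    ]
    return {"max_rate": max(rates), "yield": max(ods) - ods[0]}


# Invented readings: a strain that doubles each hour and a slower one.
hours = [0, 1, 2, 3]
reference = growth_summary(hours, [0.1, 0.2, 0.4, 0.4])
slow_strain = growth_summary(hours, [0.1, 0.15, 0.225, 0.3375])
```

With these made-up readings the slower strain has both a lower maximum rate and a lower yield than the reference. Real curves are noisy, so a practical analysis would smooth the data or fit a growth model before comparing strains.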


The Experimental Cycle

[Diagram: cycle linking Background Knowledge, Analysis, Hypothesis Formation, Consistent Hypotheses, Experiment(s) selection, Robot, Experiment(s), Results Interpretation, and Final Theory]


Discovery of Novel Science


Novel Science

Adam generated and confirmed twelve novel functional-genomics hypotheses concerning the identity of genes encoding enzymes catalysing orphan reactions in the metabolic network of the yeast S. cerevisiae.

Adam's conclusions have been manually verified using bioinformatic and biochemical evidence.


Novel Scientific Knowledge


A 50 Year Old Puzzle

The enzyme 2-aminoadipate:2-oxoglutarate aminotransferase was a locally orphan enzyme.

It is in the lysine biosynthesis pathway, which has been studied for 50 years in fungi: it is a target for antibiotics, and on the path to penicillin.

Adam formed three hypotheses for the gene encoding this enzyme: YER152C, YJL060W, and YGL202W (in that order of probability).

Currently KEGG states that YGL202W is the gene. There is evidence from the 1960s that at least 2 isoenzymes are involved.


Confirmed New Knowledge

Adam’s differential growth experiments were consistent with all three genes encoding 2-aminoadipate:2-oxoglutarate aminotransferase.

Manual experiments: purified protein + enzyme assays are consistent.
– YGL202W: literature confirmed.
– YJL060W (was annotated as an arylformamidase; new (08) annotation: kynurenine aminotransferase)
– YER152C (currently not annotated)

YGL202W & YJL060W double knockout is lethal.


Adam’s current work


Learning Systems Biology

The goal is to get Adam to automatically improve its state-of-the-art FBA (flux balance analysis) model by designing new experiments.

The experiments are “perturbation” growth experiments
– remove gene(s)
– add metabolite.

Combines ideas from logical and FBA modelling.


Overview


Formalising Biology


Formalization of Science

The goal of science is to increase our knowledge of the natural world through the performance of experiments.

This knowledge should, ideally, be expressed in a formal logical language.

Formal languages promote semantic clarity, which in turn supports the free exchange of scientific knowledge and simplifies scientific reasoning.


Robot Scientist & Formalisation

Robot Scientists provide excellent test-beds for the development of methodologies for formalising science.

Using them it is possible to completely capture and digitally curate all aspects of the scientific process.

The ontology LABORS is designed to give the scientific community open access to the Robot Scientist experimental data and metadata.

Soldatova et al. (2006) Bioinformatics


Adam’s Investigations

This formalisation involves >10,000 different research units in a nested tree-like structure 11 levels deep.

It logically connects >6.6 million OD600nm measurements to hypotheses, experimental goals, results, etc.

No previous large-scale experimental work has been so comprehensively described and recorded.


Levels in the Formalisation

Investigation into the automation of Science
  Investigation into the automation of novel science
    Investigation into the automated discovery of genes encoding orphan enzymes
      Automated study of E.C.2.6.1.39 encoding
        Cycle 1 of automated study of YER152C function
          YER152C and Lysine automated trial
            Experiment 1 (wild-type no metabolite)
              Replicate 1 (well)
                Observation 1
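The nesting of research units can be pictured as a tree in which each unit holds its sub-units. The Python sketch below is a minimal illustration using only the level names listed on this slide; the `ResearchUnit` class and the `chain` helper are hypothetical and are not part of LABORS or Adam's software.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchUnit:
    """One node in the nested record of an investigation."""
    name: str
    children: list = field(default_factory=list)

    def depth(self):
        # Number of levels from this unit down to its deepest descendant.
        return 1 + max((child.depth() for child in self.children), default=0)

    def count(self):
        # Total number of units in this subtree, including this one.
        return 1 + sum(child.count() for child in self.children)


def chain(names):
    """Build a single branch in which each name is nested inside the previous."""
    root = ResearchUnit(names[0])
    node = root
    for name in names[1:]:
        child = ResearchUnit(name)
        node.children.append(child)
        node = child
    return root


LEVELS = [
    "Investigation into the automation of Science",
    "Investigation into the automation of novel science",
    "Investigation into the automated discovery of genes encoding orphan enzymes",
    "Automated study of E.C.2.6.1.39 encoding",
    "Cycle 1 of automated study of YER152C function",
    "YER152C and Lysine automated trial",
    "Experiment 1 (wild-type no metabolite)",
    "Replicate 1 (well)",
    "Observation 1",
]

investigation = chain(LEVELS)
```

This builds only one branch. In the full record each unit would have many children (further studies, cycles, trials, replicates, and observations), which is how individual measurements stay linked to the hypotheses and goals they serve.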


automated study of yer152c function

has text representation:

automated study: automated study of yer152c_function
  has domain of study: functional genomics
  has investigator: robot scientist Adam
  has goal: 'To test the hypothesis that gene YER152C encodes an enzyme with enzyme class E.C.2.6.1.39'.
  has organism of study: Saccharomyces cerevisiae
  has ncbi taxonomy ID: 4932
  has hypotheses-set:
    has research hypothesis 1: encodes(yer152c,ec_2_6_1_39)
    has negative hypothesis 2: not encodes(yer152c,ec_2_6_1_39)
  has cycle 1 of study
  has study result: the strength of evidence that encodes(yer152c,ec_2_6_1_39):
    highest accuracy of random forest evidence: 74%
    proportion of random forest evidence >=70%: 2/3
  has study conclusion: hypothesis 1 confirmed

has datalog representation:

a:automated_study(X) :- a:automated_study_of_yer152c_function.
a:hypotheses-set(X) :- a:research_hypothesis(X).
a:cycle_of_study(X) :- a:cycle_1_of_study_(X).
a:hypotheses-set(X) :- a:negative_hypothesis(X).
a:domain_of_study(Y) :- a:automated_study(X), a:has_domain_of_study(X,Y).
a:investigator(Y) :- a:automated_study(X), a:has_investigator(X,Y).
a:goal(Y) :- a:automated_study(X), a:has_goal(X,Y).
a:organism_of_study(Y) :- a:automated_study(X), a:has_organism_of_study(X,Y).
a:hypotheses-set(Y) :- a:automated_study(X), a:has_hypotheses-set(X,Y).
a:cycle_of_study(Y) :- a:automated_study(X), a:has_cycle_of_study(X,Y).
a:study_result(Y) :- a:automated_study(X), a:has_study_result(X,Y).
a:study_conclusion(Y) :- a:automated_study(X), a:has_study_conclusion(X,Y).
a:domain_of_study(X) :- a:functional_genomics.
a:investigator(X) :- a:adam.
a:goal(X) :- a:to_test_the_hypothesis_that_gene_YER152C_encodes_an_enzyme_with_enzyme_class_E_C_2_6_1_39.
a:organism_of_study(X) :- a:saccharomyces_cerevisiae.
a:study_result(X) :- a:the_strength_of_evidence_of_hypothesis_1.
a:study_conclusion(X) :- a:hypothesis_1_confirmed.

has OWL representation:

<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.owl-ontologies.com/Ontology1204198571.owl#"
  <owl:Class rdf:ID="goal"/>
  <owl:Class rdf:ID="study_result"/>
  <owl:Class rdf:ID="ncbi_taxonomy_ID"/>
  <owl:Class rdf:ID="cycle_of_study"/>
  <owl:Class rdf:ID="negative_hypothesis">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="hypotheses-set"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="domain_of_study"/>
  <owl:Class rdf:ID="organism_of_study"/>
  <owl:Class rdf:ID="cycle_1_of_study_">
    <rdfs:subClassOf rdf:resource="#cycle_of_study"/>
  </owl:Class>
  <owl:Class rdf:ID="automated_study">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:someValuesFrom rdf:resource="#goal"/>
        <owl:onProperty>
          <owl:ObjectProperty rdf:ID="has_goal"/>
        </owl:onProperty>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:someValuesFrom rdf:resource="#organism_of_study"/>
        <owl:onProperty>
          <owl:ObjectProperty rdf:ID="has_organism_of_study"/>
……………………………………………………….


Conclusions

Automation is becoming increasingly important in scientific research, e.g. DNA sequencing, drug design.

The Robot Scientist concept represents the logical next step in scientific automation.

The Robot Scientist Adam is the first machine to have discovered novel scientific knowledge.

Scientific knowledge should be formalised to ensure the comprehensibility, reproducibility, and free exchange of knowledge with human and robot scientists.


Acknowledgments

ABERYSTWYTH

Wayne Aubrey

Amanda Clare

Douglas Kell

Maria Liakata

Chuan Lu

Magda Markham

Katherine Martin

Ronald Pateman

Jem Rowland

Andrew Sparkes

Larisa Soldatova

Mike Young

Ken Whelan

CAMBRIDGE

Elizabeth Bilsland

Steve Oliver

Pınar Pir