Bioinformatics Research Centre, University of Glasgow

David Gilbert (drg@brc.dcs.gla.ac.uk), Department of Computing Science, University of Glasgow, www.brc.dcs.gla.ac.uk


Page 1: Bioinformatics Research Centre University of Glasgow

Bioinformatics Research Centre, University of Glasgow

David Gilbert, www.brc.dcs.gla.ac.uk

Department of Computing Science, University of Glasgow

Page 2: Bioinformatics Research Centre University of Glasgow

Bioinformatics

• Bioinformatics: the study of the application of

– molecular biology, computer science, artificial intelligence, statistics and mathematics

– to model, organise, understand and discover interesting information associated with the large-scale molecular biology databases,

– to guide assays for biological experiments.

(Known as Computational Biology in the USA.)

• Bio: Molecular Biology

• Informatics: Computer Science

Page 3: Bioinformatics Research Centre University of Glasgow

Bioinformatics in context: a new discipline?

[Diagram: overlapping disciplines: Computing, Maths & Stats, Life sciences, Physical sciences and, with a question mark, Psychology]

Page 4: Bioinformatics Research Centre University of Glasgow


Bioinformatics in context (applications)

Page 5: Bioinformatics Research Centre University of Glasgow

How can we analyse the flood of data?

Data: don't just store it, analyse it! By comparing sequences, one can find out about things like

• How organisms are related & evolution

• How proteins function

• Population variability

• How diseases occur

Page 6: Bioinformatics Research Centre University of Glasgow


Separating sheep from goats...

Page 7: Bioinformatics Research Centre University of Glasgow

Dirty data?

Big Horn Sheep [Ovis canadensis]

The Big Horn Sheep [Ovis canadensis] is a large North American species with a brown coat, which turns to bluish-grey in winter. It is so named from the size of the horns of the ram, which often measure over 1 m/3.3 ft round the curve.

Classification: Ovis canadensis is in family Bovidae, order Artiodactyla

Page 8: Bioinformatics Research Centre University of Glasgow

Data, information, knowledge …

• data: nucleotide sequence

• information: where are the “genes”. Found using a classifier, pattern or rule which has been mined/discovered

• knowledge: facts and rules

If a gene X has a weak psi-blast assignment to a function F

– and that gene is in an expression cluster

– and sufficient members of that cluster are known to have function F,

then believe assignment of F to X.
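The rule above can be sketched as a small predicate. This is only an illustration: the function name, the data layout (dictionaries of gene names) and the `min_support` threshold standing in for "sufficient members" are all assumptions, not part of the original slide.

```python
def believe_assignment(gene, function, weak_hits, clusters, known, min_support=3):
    """Return True if a weak assignment of `function` to `gene` should be believed.

    weak_hits: gene -> set of functions with a weak (e.g. psi-blast) assignment
    clusters:  list of sets of genes (expression clusters)
    known:     gene -> set of functions already known for that gene
    min_support is a hypothetical stand-in for "sufficient members".
    """
    # Condition 1: the gene has a weak assignment to the function.
    if function not in weak_hits.get(gene, set()):
        return False
    # Conditions 2 and 3: the gene sits in a cluster where enough
    # other members are already known to have the function.
    for members in clusters:
        if gene in members:
            support = sum(
                1 for other in members
                if other != gene and function in known.get(other, set())
            )
            if support >= min_support:
                return True
    return False


# Hypothetical example data, for illustration only.
WEAK_HITS = {"geneX": {"kinase"}}
CLUSTERS = [{"geneX", "g1", "g2", "g3", "g4"}]
KNOWN = {"g1": {"kinase"}, "g2": {"kinase"}, "g3": {"kinase"}, "g4": {"transporter"}}
```

With this toy data, three cluster members support the "kinase" assignment, so the rule fires at the default threshold and fails if four supporters are required.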

[Diagram: a gene annotated with TATA box, start, termination (stop) and control statements]

Page 9: Bioinformatics Research Centre University of Glasgow

Some projects at the Bioinformatics Research Centre

Page 10: Bioinformatics Research Centre University of Glasgow


Page 11: Bioinformatics Research Centre University of Glasgow


Rat-Mouse-Human

Page 12: Bioinformatics Research Centre University of Glasgow

Indexing

Ela Hunt

• String indexing structures can be used to index DNA, proteins, XML and phylogenetic trees

• All data is read once, and the index is created on disk

• The index reduces the search space of the query (only a fraction of the disk is read)
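To illustrate how a string index narrows the search space, here is a minimal in-memory sketch using a suffix array. It is not the disk-based structure the slide refers to; the function names are hypothetical and the naive construction is only suitable for short strings.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text` in sorted order.

    Naive construction: fine for a short demo, too slow for real genomes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Binary-search the suffix array for every position where `pattern` occurs."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


DNA = "GATTACAGATTACA"
INDEX = build_suffix_array(DNA)  # built once, reused for every query
```

Each query inspects only a logarithmic number of suffixes rather than scanning the whole sequence, which is the same idea as reading only part of an on-disk index.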

Page 13: Bioinformatics Research Centre University of Glasgow

Distributed databases and computation: Cardiovascular Functional Genomics

• £5.4 million project, 5 UK universities: Glasgow, Leicester, Edinburgh, Oxford, Imperial; plus Maastricht

• Led by clinicians

• Combined studies:

– scientific models of disease (Rat)

– parallel studies of patients

– large family and population DNA collections

• 3-pronged approach:

– Targeted transcript sequencing

– Microarray gene expression profiling

– Comparative genome analysis

• Data generated at each of the 5 sites & made available for analysis

• Issues of distributed data and computation

• Mapping gene sequences across rat, mouse and human: an added layer of complexity in the computation

Page 14: Bioinformatics Research Centre University of Glasgow

Wellcome Trust: Cardiovascular Functional Genomics

[Diagram: sites in Glasgow, Edinburgh, Leicester, Oxford, London and the Netherlands, linked to shared data and public curated data]

Page 15: Bioinformatics Research Centre University of Glasgow

BRIDGES: BioMedical Research Informatics Delivered by Grid Enabled Services

• National e-Science Centre, Bioinformatics Research Centre, IBM UK Life Sciences

• Incrementally develop and explore database integration over 6 geographically distributed research sites within the framework of the large Wellcome Trust biomedical research project Cardiovascular Functional Genomics.

• Three classes of integration will be developed to support a sophisticated bioinformatics infrastructure supporting:

– data sources (both public and project generated),

– bioinformatics analysis and visualisation tools,

– research activities combining shared and private data.

• The inclusion of patient records and animal experiment data means that privacy and access control are particular concerns.

• An exploration of index factories accelerating sequence processing will test the hypothesis that the Grid makes a new class of e-Science indexes feasible. Both OGSA-DAI and IBM DiscoveryLink technology will be employed and a report will identify how each performed in this context.

Page 16: Bioinformatics Research Centre University of Glasgow

Functional Genomics

~44,000 genes

~33% of genes have unknown function

Page 17: Bioinformatics Research Centre University of Glasgow

Solution…

• Solve the problem of the twilight zone (sequence alignments below 30% sequence identity)

• How?

• Predict protein function using an alternative method to BLAST:

• Predict protein functional class from sequence, structural and phylogenetic features using machine learning

• Combining these (computationally and statistically) should give biologists the most accurate functional prediction for proteins that fall in the twilight zone.
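As a rough illustration of the twilight-zone criterion, the sketch below computes percent identity for two sequences that are already aligned and checks it against the 30% threshold from the slide. The function names are hypothetical, and the choice to count identity over columns where neither sequence has a gap is an assumption; other definitions of the denominator are in use.

```python
TWILIGHT_THRESHOLD = 30.0  # percent identity, as quoted on the slide


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns (an assumption, see lead-in)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def in_twilight_zone(aligned_a, aligned_b):
    """True when the pair falls below the 30% identity threshold."""
    return percent_identity(aligned_a, aligned_b) < TWILIGHT_THRESHOLD
```

Pairs flagged by `in_twilight_zone` are the ones for which alignment alone is unreliable and the feature-based prediction described above would be needed.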

Ali Al-Shahib, Chao He, Mark Girolami

Page 18: Bioinformatics Research Centre University of Glasgow

Locating genome duplications

Q: did one or more genome-wide events affect all gene families?

[Diagram: a gene tree (Lamprey, Mouse, Mouse, Human, Human) showing a gene duplication, set against a species tree (Lamprey, Mouse, Human, Reptiles + Birds, Lungfish, Teleosts, Sharks & Rays) with the annotation "happened somewhere here"]

Molecular Evolution: A Phylogenetic Approach, Rod Page

Page 19: Bioinformatics Research Centre University of Glasgow

TOPS: Protein topology

David Gilbert, Juris Viksna, Gilleain Torrance (BRC, Glasgow), David Westhead and Ioannis Michalopoulos (Leeds)

BBSRC/EPSRC funded

Page 20: Bioinformatics Research Centre University of Glasgow


Pattern search: TIM Barrel

Page 21: Bioinformatics Research Centre University of Glasgow

Structure comparison

2bop (probe) against (subset of) CATH

Page 22: Bioinformatics Research Centre University of Glasgow

TOPS comparison server: www.tops.leeds.ac.uk

PDB file → TOPS diagram (graph), then either:

– Matches to motif library (v. fast)

– Pairwise comparison to structures in database (slower)

Page 23: Bioinformatics Research Centre University of Glasgow

Protein design

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Brian Kuhlman,1 Gautam Dantas,1 Gregory C. Ireton,4 Gabriele Varani,1,2 Barry L. Stoddard,4 David Baker1,3

“A major challenge of computational protein design is the creation of novel proteins with arbitrarily chosen three-dimensional structures.

Here, we used a general computational strategy that iterates between sequence design and structure prediction to design a 93-residue α/β protein called Top7 with a novel sequence and topology.

Top7 was found experimentally to be folded and extremely stable, and the x-ray crystal structure of Top7 is similar (root mean square deviation equals 1.2 angstroms) to the design model.

The ability to design a new protein fold makes possible the exploration of the large regions of the protein universe not yet observed in nature.”

1 Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.
2 Department of Chemistry, University of Washington, Seattle, WA 98195, USA.
3 Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
4 Division of Basic Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA.

Science. 2003 Nov 21;302(5649):1364-8

Page 24: Bioinformatics Research Centre University of Glasgow


Protein design

Generation of starting models.

“The target structure for the de novo design process can range from a detailed backbone model to a back-of-the-envelope sketch.”

“Because we aimed to create a novel protein fold, we selected a topology not present in the PDB according to the Topology of Protein Structure (TOPS) server (17).”

Page 25: Bioinformatics Research Centre University of Glasgow

User = [email protected] at 20:29:51 on 3/06/03
Structure code = top7a
type = PDB (user declared), Database = atlas
Details of sheets etc (including all connected SSEs): Sheet: [6,7,4,1,2]
======================================================
Comparison time : 43 sec

Domain   Code            Rank
top7a    target_query    0
1bbi00   4.10.100.10.1   7
1pi200   4.10.100.10.1   7
1sro00   2.40.29.10.1    7
1atx00   2.20.20.10.1    9
2sh100   2.20.20.10.1    9
1vcc00   3.30.66.10.1    11
1hpm02   3.10.140.10.1   12
1csp00   2.40.50.40.1    13
2snv01   2.40.10.20.3    13
3tss02   2.40.50.50.3    13
1bcpF0   2.40.50.50.2    14
1bovA0   2.40.50.30.2    14
1tle00   2.10.25.10.1    14
1cdb00   2.60.40.10.1    15
1ckmA3   4.10.87.10.1    15
1kxf01   2.40.10.20.3    15
1svpA1   2.40.10.20.3    15
2pkaX0   2.40.10.20.1    15
1apo00   2.10.25.10.6    16
1ate00   2.10.40.10.1    16
1aww00   2.30.30.10.1    16
1cuk01   2.40.50.80.1    16

Use of TOPS for protein design

Top7a NEEheEC 1:2A 1:4A 2:4R 4:6R 4:7A 6:7A 1:4R 4:6R

Page 26: Bioinformatics Research Centre University of Glasgow


Use of TOPS for protein design

Page 27: Bioinformatics Research Centre University of Glasgow


Systems biology – some definitions

• Systems biology is the study of all the elements in a biological system (all genes, mRNAs, proteins, etc) and their relationships one to another in response to perturbations.

• Systems approaches attempt to study the behaviour of all of the elements in a system and relate these behaviours to the systems or emergent properties

Page 28: Bioinformatics Research Centre University of Glasgow

A Framework for Systems Biology (Ideker, Galitski & Hood, 2001)

• Define all of the components of the system

• Systematically perturb and monitor components of the system

• Reconcile the experimentally observed responses with those predicted by the model

• Design and perform new perturbation experiments to distinguish between multiple or competing model hypotheses

Page 29: Bioinformatics Research Centre University of Glasgow

New database technologies for storing the output from high-throughput biological experiments

Andrew Jones

• Proteomics: the study of the set of proteins expressed in a sample

• Complex, variable output:

– High-resolution images

– Numerical data generated by lab equipment and software

– Human annotation

• The data is not suitable for storage in a standard relational database

• Storage, retrieval and exchange of data is important

• XML (Extensible Markup Language) is being investigated for storing such data
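A small sketch of what storing such mixed output in XML might look like, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`proteomicsExperiment`, `gelImage`, `spot`, and so on) and the sample values are invented for illustration; they are not a schema from the project described above.

```python
import xml.etree.ElementTree as ET


def build_gel_record(sample_id, image_path, spots, annotation):
    """Bundle an image reference, numeric measurements and free text in one record."""
    root = ET.Element("proteomicsExperiment", sample=sample_id)
    # High-resolution images are referenced by path rather than embedded.
    ET.SubElement(root, "gelImage", href=image_path)
    # Numerical data from lab equipment: one element per detected spot.
    spots_el = ET.SubElement(root, "spots")
    for spot in spots:
        ET.SubElement(
            spots_el, "spot",
            id=spot["id"], pI=str(spot["pI"]), massKDa=str(spot["mass_kda"]),
        )
    # Human annotation as free text.
    ET.SubElement(root, "annotation").text = annotation
    return root


# Hypothetical values, for illustration only.
RECORD = build_gel_record(
    "sample-001",
    "images/gel_001.tif",
    [{"id": "s1", "pI": 5.4, "mass_kda": 42.0}, {"id": "s2", "pI": 6.1, "mass_kda": 17.5}],
    "Spot s2 is faint; repeat staining.",
)
XML_TEXT = ET.tostring(RECORD, encoding="unicode")  # serialise for exchange
PARSED = ET.fromstring(XML_TEXT)                    # read it back
```

The same document can hold a varying number of spots or extra annotation elements without a schema migration, which is the flexibility the bullets above point to.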

Page 30: Bioinformatics Research Centre University of Glasgow

• Maintained by National Library of Medicine

• Free of charge, since 1997

• > 10 million references since 1971

• > 4000 biomedical journals

• > 80% in English

• > 80% have an abstract

"Biochemical Network Data Mined from Scientific Texts"
Te Ren (PhD student), with CXR Biosciences.

Page 31: Bioinformatics Research Centre University of Glasgow

Data complexity: Methionine Biosynthesis in E. coli

[Diagram: the methionine biosynthesis pathway with its genes, enzymes and regulation. Recoverable labels:

– Metabolites: L-Aspartate, L-Aspartate-4-P, L-Aspartate semialdehyde, L-Homoserine, alpha-succinyl-L-Homoserine, Cystathionine, Homocysteine, L-Methionine, S-Adenosyl-L-Methionine

– Reactions (EC numbers): 2.7.2.4, 1.2.1.11, 1.1.1.3, 2.3.1.46, 4.2.99.9, 4.4.1.8, 2.1.1.13, 2.1.1.14, 2.5.1.6

– Cofactors and by-products: ATP, ADP, NADPH, H+, NADP+, Pi, PPi, Succinyl SCoA, HSCoA, L-Cysteine, Succinate, H2O, Pyruvate, NH4+, 5-Methyl THF, THF

– Genes and products: asd (aspartate semialdehyde dehydrogenase), metA (homoserine-O-succinyltransferase), metB (cystathionine-gamma-synthase), metC (cystathionine-beta-lyase), metE (cobalamin-independent homocysteine transmethylase), metH (cobalamin-dependent homocysteine transmethylase), metL (aspartate kinase II/homoserine dehydrogenase II), metR (metR activator), metJ (aporepressor); metBL operon; holorepressor

– Relations: codes for, catalyzes, is part of, inhibits, represses, up-regulates, expression

– Linked pathways: aspartate biosynthesis, lysine biosynthesis, threonine biosynthesis]

Page 32: Bioinformatics Research Centre University of Glasgow

Biochemical networks

• Pathway navigation

• Pathway comparison

• Pathway motif discovery

• Pathway simulation

• High-level abstraction inferred from low-level descriptions

• Novel pathways from gene expression experiments

[Diagram: DNA chip experiment → Transcription profiles → Clustering → Clusters of co-regulated genes (functional meaning?) → Pathway extraction in metabolic reaction graph → Putative metabolic pathways → Matching against metabolic pathway database → Known pathways / Novel pathways → Visualization]

Page 33: Bioinformatics Research Centre University of Glasgow

A Software System for Pattern Matching and Motif Discovery in Biochemical Networks

Sebastian Oehm

• Design a suitable data model using bipartite graphs

• Define patterns and develop algorithms for pattern matching in biochemical networks

• Define pathway motifs and develop algorithms for motif searching in biochemical networks

• Develop algorithms for automated motif discovery

• Develop algorithms to search for the largest common part of two or more biochemical networks

• Develop a measure of similarity for pathway comparison
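The bipartite data model and the pathway comparison can be sketched as follows. Compounds and reactions are the two node classes, and every edge links one of each. The class and function names are hypothetical, the assignment of EC numbers to steps is a reading of the slide's methionine diagrams, and the "common part" here is a plain intersection of labelled edges, which is far simpler than a general maximal common subgraph algorithm.

```python
class BipartiteNetwork:
    """Directed bipartite graph with compound nodes and reaction nodes."""

    def __init__(self, edges=None):
        self.edges = set(edges or ())

    def add_reaction(self, ec, substrate, product):
        # Every edge joins a compound node and a reaction node, never two of a kind.
        self.edges.add((("compound", substrate), ("reaction", ec)))
        self.edges.add((("reaction", ec), ("compound", product)))

    def reactions(self):
        return {node[1] for edge in self.edges for node in edge if node[0] == "reaction"}

    def common_part(self, other):
        # Simplification: match nodes by label and keep the shared edges.
        return BipartiteNetwork(self.edges & other.edges)

    def similarity(self, other):
        # Jaccard index over labelled edges, one possible similarity measure.
        union = self.edges | other.edges
        return len(self.edges & other.edges) / len(union) if union else 1.0


def build(steps):
    net = BipartiteNetwork()
    for substrate, ec, product in steps:
        net.add_reaction(ec, substrate, product)
    return net


HEAD = [
    ("L-Aspartate", "2.7.2.4", "L-aspartyl-4-P"),
    ("L-aspartyl-4-P", "1.2.1.11", "L-aspartic semialdehyde"),
    ("L-aspartic semialdehyde", "1.1.1.3", "L-Homoserine"),
]
TAIL = [
    ("Homocysteine", "2.1.1.14", "L-Methionine"),
    ("L-Methionine", "2.5.1.6", "S-Adenosyl-L-Methionine"),
]
E_COLI = build(HEAD + [
    ("L-Homoserine", "2.3.1.46", "Alpha-succinyl-L-Homoserine"),
    ("Alpha-succinyl-L-Homoserine", "4.2.99.9", "Cystathionine"),
    ("Cystathionine", "4.4.1.8", "Homocysteine"),
] + TAIL)
YEAST = build(HEAD + [
    ("L-Homoserine", "2.3.1.31", "O-acetyl-homoserine"),
    ("O-acetyl-homoserine", "4.2.99.10", "Homocysteine"),
] + TAIL)
```

Under these assumptions the two organisms share five reactions, and `E_COLI.similarity(YEAST)` gives 0.5.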

[Diagram: methionine biosynthesis compared between S. cerevisiae and E. coli. Shared: L-Aspartate, L-aspartyl-4-P, L-aspartic semialdehyde, L-Homoserine, Homocysteine, L-Methionine, S-Adenosyl-L-Methionine, with reactions 2.7.2.4, 1.2.1.11, 1.1.1.3, 2.1.1.14 and 2.5.1.6. S. cerevisiae only: O-acetyl-homoserine (2.3.1.31, 4.2.99.10). E. coli only: Alpha-succinyl-L-Homoserine and Cystathionine (2.3.1.46, 4.2.99.9, 4.4.1.8).]

Page 34: Bioinformatics Research Centre University of Glasgow

Biochemical Pathway Simulator: A Software Tool for Simulation & Analysis of Biochemical Networks

Muffy Calder, David Gilbert, Walter Kolch, Keith van Rijsbergen, Brian Ross, Oliver Sturm

DTI ‘Beacon’ project, £0.9M, 4 years

Page 35: Bioinformatics Research Centre University of Glasgow


Not a toy problem!

Experimental Data Analysis

Page 36: Bioinformatics Research Centre University of Glasgow

Complexity: real bioinformatics

Closing the loop from wet lab to in-silico

[Diagram: system architecture. Bio: lab and literature (MAPK, apoptosis). Bioinformatics: tools, databases (MAPK database, apoptosis database, data), text miner, pathway editor, user interface and web portal. Simulator: concurrency theory, analysis, rules, abstract model. Human feedback (in-the-loop). A repeated inset shows the MAPK cascade: Mitogens / Growth factors → Receptor → Ras → Raf → MEK → ERK (with phosphorylation sites marked P) → cytoplasmic substrates and Elk/SAP → Gene.]

Page 37: Bioinformatics Research Centre University of Glasgow

Proliferation (cell division) vs differentiation (neurite outgrowth) in PC12 cell model

NGF (50 ng/ml): differentiation into nerve cell type (neurite outgrowth)

EGF (50 ng/ml): proliferation (cell division stimulated without neurite outgrowth)

Page 38: Bioinformatics Research Centre University of Glasgow

Dynamic Behaviour of the Network

Panel 1 (Receptor → Ras → Raf-1 → MEK1,2 → ERK1,2): Raf-1 is expressed in all cells, and its activation induces ERK activation. Outcome: cell growth.

Panel 2 (adds cAMP → PKA): Many receptors that activate ERK also elevate cAMP levels, leading to activation of PKA. PKA inhibits Raf-1 and blocks ERK activation. Outcome: growth arrest.

Panel 3 (adds B-Raf): However, cAMP induces activation of B-Raf. In cells which express B-Raf, cAMP activates the ERK pathway despite Raf-1 inhibition. Outcome: cell growth.
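The three cases described on this slide can be captured in a tiny Boolean model. This is a deliberately crude sketch: real pathway simulation is quantitative and time-dependent, and the function name and the on/off abstraction are assumptions made here for illustration.

```python
def erk_active(receptor_on, camp_elevated, expresses_b_raf):
    """Boolean abstraction of the Raf-1 / B-Raf wiring described on the slide."""
    ras = receptor_on                        # receptor activates Ras
    pka = camp_elevated                      # elevated cAMP activates PKA
    raf1 = ras and not pka                   # PKA inhibits Raf-1
    b_raf = expresses_b_raf and camp_elevated  # cAMP activates B-Raf where expressed
    mek = raf1 or b_raf                      # either Raf isoform can drive MEK
    return mek                               # MEK activates ERK


def outcome(receptor_on, camp_elevated, expresses_b_raf):
    return "cell growth" if erk_active(receptor_on, camp_elevated, expresses_b_raf) else "growth arrest"
```

Evaluating the three scenarios (no cAMP, cAMP without B-Raf, cAMP with B-Raf) gives cell growth, growth arrest and cell growth, matching the outcomes on the slide.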

Page 39: Bioinformatics Research Centre University of Glasgow


Page 40: Bioinformatics Research Centre University of Glasgow


Mobility

Sometimes a signal sent in a communications network can change the connections or topology of that network. In the example below, a cell-phone is being carried out of range of Cell 1. The base station must send the frequency of the appropriate new Cell (Cell 2) to the phone. The phone connects to Cell 2 and discards its previous link to Cell 1.

[Diagram: before and after the hand-over. The phone's conversation runs via Cell 1; the base station sends the Cell 2 frequency; the conversation then continues via Cell 2.]

Page 41: Bioinformatics Research Centre University of Glasgow

[Diagram: Ras bound to GDP interacts with SoS; GDP is exchanged for GTP; Ras bound to GTP interacts with Raf.]

In biochemical networks, a protein can be granted or denied the opportunity to interact with certain other molecules by exchange factors, effectively changing the network topology dynamically. In the example below, the protein Ras is bound to a molecule of GDP, which renders Ras inactive. A molecule of SoS can interact with this Ras-GDP complex, causing the GDP to be exchanged for GTP. The Ras-GTP complex is active, permitting interaction with the protein Raf.
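The Ras example can be sketched as a state change that rewires the set of possible interactions, in the same spirit as the cell-phone hand-over. The class and function names are hypothetical, and the model ignores everything except the GDP/GTP state described above.

```python
class Ras:
    """Ras with its bound nucleotide; GDP-bound is inactive, GTP-bound is active."""

    def __init__(self):
        self.nucleotide = "GDP"

    @property
    def active(self):
        return self.nucleotide == "GTP"


def exchange_by_sos(ras):
    """SoS interacts with the Ras-GDP complex and swaps GDP for GTP."""
    if ras.nucleotide == "GDP":
        ras.nucleotide = "GTP"


def interaction_edges(ras):
    """Current network topology around Ras, as a set of (protein, partner) edges."""
    # The available link depends on the state, so a state change rewires the network.
    return {("Ras", "Raf")} if ras.active else {("Ras", "SoS")}
```

Calling `exchange_by_sos` replaces the Ras-SoS link with a Ras-Raf link, which is the dynamic change of topology the paragraph describes.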

Page 42: Bioinformatics Research Centre University of Glasgow

Reusable Subcomponents of a Solution for Offline Integration of 3rd party Databases

• By-products of the total process may correspond to other reusable sub-services

– Schema Translation: various schema definition languages are translated into one common, interpretable schema language.

– Record Matching: builds a cross-reference index that identifies records about the “same entity” and records the source and location of the matching records. Two or more records may match.
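A minimal sketch of the record-matching step: it builds a cross-reference index from a normalised name to the source and position of every matching record. The matching rule (case-insensitive, punctuation-insensitive name equality), the source names and the records are all assumptions for illustration; the slide's rules-driven matcher is not specified in this much detail.

```python
from collections import defaultdict


def normalise(name):
    """Hypothetical matching rule: ignore case, spaces and punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def build_cross_reference(sources, key_field="name"):
    """Map each entity key to the (source, position) of every record about it."""
    index = defaultdict(list)
    for source_name, records in sources.items():
        for position, record in enumerate(records):
            index[normalise(record[key_field])].append((source_name, position))
    # Keep only entities that occur in two or more records.
    return {key: refs for key, refs in index.items() if len(refs) >= 2}


# Hypothetical source data, for illustration only.
SOURCES = {
    "amaze": [{"name": "Raf-1"}, {"name": "MEK"}],
    "mapk": [{"name": "RAF1"}, {"name": "ERK"}],
    "camp_pk": [{"name": "raf 1"}, {"name": "PKA"}],
}
CROSS_REF = build_cross_reference(SOURCES)
```

Here the three spellings of Raf-1 collapse to one key, and the index records where each copy lives so that a later merge step can resolve conflicts between them.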

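[Diagram: integration architecture. Sources: aMaze DB, MAPK source data, cAMP PK source data, extracted literature data. Integrator components: Schema Translator (input schemas → translated local schemas), Record Matcher (record matching rules → cross-ref index), Record Merger (default values, conflict resolution rules, target schema). Output: Integrated Database.]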

Page 43: Bioinformatics Research Centre University of Glasgow

Validation

Current Bottlenecks in Drug Development

Drug target discovery: What is a good drug target? How do we select it?

Drug target validation: Does hitting the target change the biological response?

Side effects: What else is affected when the selected target is hit?

Lead Compound Selection: Which compounds should be taken further for development? What properties should the drug have?

Page 44: Bioinformatics Research Centre University of Glasgow

Validation

Current Bottlenecks in Drug Development: EMPIRICAL, SLOW, EXPENSIVE

Page 45: Bioinformatics Research Centre University of Glasgow

Validation

Current Bottlenecks in Drug Development, and how robust pathway simulation software can help:

Drug target discovery: What is a good drug target? How do we select it?
→ Select targets by defining their topology & function in the regulatory networks.

Drug target validation: Does hitting the target change the biological response?
→ Validate the target by predicting how the biological response should change.

Side effects: What else is affected when the selected target is hit?
→ Predict side effects to allow early and targeted testing.

Lead Compound Selection: Which compounds should be taken further for development? What properties should the drug have?
→ Predict the optimal drug profile to improve selection criteria.

Page 46: Bioinformatics Research Centre University of Glasgow

Validation

What we propose …

PC12 cell model of neuronal differentiation

[Diagram: EGF and NGF signal via Ras/Raf-1 and Rap/B-Raf to MEK and ERK. EGF: transient ERK activity, proliferation. NGF: sustained ERK activity, differentiation.]

Target Validation: Predict & test the effect of Raf-1 and B-Raf inhibitors on the biological response to EGF vs. NGF.

Lead Compound Selection: Predict & test which inhibitory efficacy is necessary and sufficient to achieve the desired biological response.

Page 47: Bioinformatics Research Centre University of Glasgow

Bionanotechnology & Bioinformatics

[Diagram linking Nanofab & cell culture with Bioinformatics. Inputs: fab methodology, physical substrate, biochemical environment (other cells + biochemicals), genetic engineering. Measured cell behaviour: morphology, proteome, dynamic behaviour, adhesion, gene expression, cell shape. Bioinformatics side: model of cell behaviour, external databases, other pathway data.]

Page 48: Bioinformatics Research Centre University of Glasgow

Machine Learning for Bioinformatics

• Classification

• Clustering

• Characterisation

• Techniques:

– ensemble methods

– decision trees

– inductive logic programming

– pattern discovery

– statistical approaches

– SVMs

Page 49: Bioinformatics Research Centre University of Glasgow

Cancer Classification Problem

ALL: acute lymphoblastic leukemia (lymphoid precursors)

AML: acute myeloid leukemia (myeloid precursor)

(Golub et al 1999)

Page 50: Bioinformatics Research Centre University of Glasgow

Machine Learning Approach

[Diagram: gene expression profiles labelled ALL / AML → machine learning → classifier (C4.5, SVM, k-NN, ANN) → predicted class ALL / AML]
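To make the workflow concrete, here is a minimal k-nearest-neighbour classifier, one of the methods named on the slide, written in plain Python. The two-gene expression values below are invented toy numbers for illustration only; they are not taken from Golub et al. or any real data set, and the function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training profiles."""
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy expression profiles (two made-up genes per sample), labelled by class.
TRAINING = [
    ((8.1, 1.2), "ALL"),
    ((7.6, 0.9), "ALL"),
    ((7.9, 1.5), "ALL"),
    ((1.0, 6.8), "AML"),
    ((1.4, 7.2), "AML"),
    ((0.8, 7.5), "AML"),
]
```

A new profile is assigned the label most common among its closest labelled profiles. Real expression data has thousands of genes per sample, so feature selection and cross-validation would be needed before trusting any such classifier.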

Page 51: Bioinformatics Research Centre University of Glasgow


Biological Data: Distributed and Heterogeneous!!

LPSYVDWRSA GAVVDIKSQG ECGGCWAFSA IATVEGINKI TSGSLISLSE QELIDCGRTQ NTRGCDGGYI TDGFQFIIND GGINTEENYP YTAQDGDCDV

Protein: Sequence, Structure, Function

Gene expression Morphology

Microarray analysis

Page 52: Bioinformatics Research Centre University of Glasgow


Integrative Machine Learning

(Pratt Emotif)

Aik Choon Tan

Page 53: Bioinformatics Research Centre University of Glasgow

What kind of computational approaches do we use?

• Operations over

– sequences (match)

– trees (e.g. suffix trees, supertree, joining, ...)

– graphs (sub-graph isomorphism, maximal common subgraph, path searching)

• Data modelling, databases, data conversion

• Machine learning, knowledge discovery, pattern discovery,...

• Clustering

• Theorem proving, concurrency analysis,…

• Integration: data, knowledge

• Data visualisation

• Web services, Grid, Coarse Grain parallelism, eScience,...

Page 54: Bioinformatics Research Centre University of Glasgow

Latest from BRC

• New Systems Biology lab (March 9)

• Web services, www.brc.dcs.gla.ac.uk

• Research teams:

– Databases & Visualisation (Ela Hunt)

– Grid & eScience (Richard Sinnott)

– Functional genomics (David Leader)

– Machine learning (Mark Girolami)

– Structural bioinformatics (Pawel Herzyk)

– Systems biology (David Gilbert)

• Teaching: MScIT Bioinformatics Strand

Page 55: Bioinformatics Research Centre University of Glasgow

BRC members

• Investigators:

– Yves Deville (Biochemical Networks) dcs

– David Gilbert (Systems biology, Protein structure) dcs

– Mark Girolami (Machine learning) dcs

– Pawel Herzyk (Protein structure) ibls

– Ela Hunt (Database indexing, Data integration, Visualisation,…) dcs

– David Leader (Visualisation tools) ibls

– Gerhard May (Signalling pathways) ibls

– Rod Page (Phylogenetic trees) ibls

– Richard Sinnott (Grid computing / eScience) dcs

– Juris Viksna (Graph algorithms) dcs

• Research Assistants: Micha Bayer, Rainer Breitling, Neil Hanlon, Derek Houghton, Richard Orton, Evangelos Pafilis, Oliver Sturm, Gilleain Torrance

• Research students: Ali Al-Shahib, David Cook, Iain Darroch, Amelie Gormand, Susan Fairley, Robert Japp, Andrew Jones, Julie Morrison, Te Ren, Aik Choon Tan, Tim Troup, Mallika Veeramalai

• Executive Assistant: Margaret Jackson

• Associated: Malcolm Atkinson, Ernst Wit, John McClure, Mathis Riehle, Des Higham, Oliver Sand

Page 56: Bioinformatics Research Centre University of Glasgow

Funding sources

EPSRC, BBSRC, MRC, Wellcome Trust, DTI, Scottish Enterprise, Synergy, Carnegie Trust, Royal Society, Daiwa Foundation, SHEFCE, EU

Page 57: Bioinformatics Research Centre University of Glasgow

Scottish Bioinformatics Forum

• Network of Bioinformatics researchers and industries in Scotland

• A vehicle for developing Scotland as a Centre of Bioinformatics Excellence

• Nodes in Glasgow, Edinburgh, Dundee, Aberdeen, ...

• Promoting collaborative research

• Development of a Bioinformatics educational programme

• www.sbforum.org

Visionary Meeting, 27 May (Zoology Building)

Keynote: Prof Thornton, Director of the European Bioinformatics Centre

www.brc.dcs.gla.ac.uk/events.html

Page 58: Bioinformatics Research Centre University of Glasgow


Page 59: Bioinformatics Research Centre University of Glasgow

Bioinformatics Research Centre

[Diagram: computing infrastructure. Davidson Building: 15 workstations + visitors’ facilities. Servers: web server, file server, Unix app server, Microsoft app server, database server, Sun Grid Engine (storage labels 3 TB and 1 TB). Cluster Scotgrid+: 2x100 CPU, 5 TB. Other locations: Kelvin Building, Boyd-Orr Building (backup), 17 Lilybank Gardens. A firewall separates parts of the network.]

Page 60: Bioinformatics Research Centre University of Glasgow


www.brc.dcs.gla.ac.uk

Page 61: Bioinformatics Research Centre University of Glasgow

Where we are

[Map labels:]

Department of Computing Science

BRC (in Davidson Building)

BRC & Functional Genomics (Joseph Black)

Functional Genomics; Centre for Cell Engineering

Medicine & Therapeutics

Vet School

Beatson Institute

NeSC Hub

Page 62: Bioinformatics Research Centre University of Glasgow


BRC location

Page 63: Bioinformatics Research Centre University of Glasgow

Bioinformatics Research Centre (230 m²)

[Floor plan labels: Gardiner lab (wet lab), Visitors’ area]

Page 64: Bioinformatics Research Centre University of Glasgow


The Future

Closing the loop from wet lab to in silico!

www.brc.dcs.gla.ac.uk

Collaboration!