From Laboratory to e-Laboratory

Presentation for Lab-J of the Human Genetics Department at the Leiden University Medical Centre.

Page 1: From Laboratory to e-Laboratory

From Laboratory to e-Laboratory?

Introduction for ‘Lab-J’ of the LUMC Human Genetics Department

Marco Roos

Acknowledging the colleagues from BioSemantics, myGrid, OMII-UK, AID, and the LUMC BioInformatics Expertise Centre

Page 2: From Laboratory to e-Laboratory

Introducing

Me

Page 3: From Laboratory to e-Laboratory

Liaison biology/bioinformatics – informatics

Biologist and bioinformatician, e-(bio)science researcher
Coordinator, BioSemantics group Leiden

Human Genetics Department, Leiden University Medical Centre, and Informatics Institute, University of Amsterdam

Project or Area Liaison (PAL), OMII-UK
Member, BioAssist programme committee, NBIC

Page 4: From Laboratory to e-Laboratory

also about

You

Page 5: From Laboratory to e-Laboratory

First about

Me

Page 6: From Laboratory to e-Laboratory

My C.V. before e-Science (before 2003)

• Molecular & Cellular biology (MSc)
  – microscopy and image analysis of chromosome structure
  – ‘minor’ computer science

• Image analysis methods to measure DNA content in bull sperm cells (civil service)

• Chromatin structure & function (PhD molecular cytology)
  – F.I.S.H., microscopy, image analysis, statistics
  – 3-D chromosome structure during cell cycle (no luck)
  – DNA movement in Escherichia coli (success)

• Human Transcriptome Map (post-doc)
  – Gene expression to human genome sequence
  – Analysis of regions of increased gene expression

Page 7: From Laboratory to e-Laboratory

Motivation: Structure and function of DNA in the nucleus

[Figure labels: Escherichia coli; Muntiacus muntjak]

Page 8: From Laboratory to e-Laboratory

Why bioinformatics?

Lab-J suggests…

Page 9: From Laboratory to e-Laboratory

Bioinformatics

A typical bioinformatician

Page 10: From Laboratory to e-Laboratory

Bioinformatics

A biologist behind a computer who (just) learned Perl

Page 11: From Laboratory to e-Laboratory

/*
 * determines ridges in htm expression table
 */

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h"   /* assumed to define TRUE/FALSE and declare the functions below */

/* runs the query and hands the result back through *htmtable */
int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
{
    char querystring[256];

    /* names are pasted into the SQL unescaped: trusted input only */
    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = '%s' ORDER BY genstart",
             htmtablename, chromname);
    *htmtable = PQexec(conn, querystring);

    return validquery(*htmtable, querystring);
}

/* determines if mincount genes in a row are (part of) a ridge */
/* pre: htmtable is valid and sorted on genStart (ascending) */
/* post: */
int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
{
    if (mincount <= 0) return TRUE;

    if (row >= PQntuples(htmtable)) return FALSE;

    /* PQgetvalue returns text, so convert it before comparing */
    if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold) {
        return FALSE;
    }
    return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
}

int main(void)
{
    PGconn *conn;           /* holds database connection */
    char querystring[256];  /* query string */
    PGresult *result;

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");

    if (PQstatus(conn) == CONNECTION_BAD) {
        fprintf(stderr, "connection to database failed.\n");
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }
    else printf("Connection ok\n");

    sprintf(querystring, "SELECT * FROM chromosomes");
    printf("%s\n", querystring);

    result = PQexec(conn, querystring);

    if (validquery(result, querystring)) {
        printresults(result);
    }
    else {
        PQclear(result);
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    PQclear(result);
    PQfinish(conn);
    return EXIT_SUCCESS;
}

int printresults(PGresult *tuples)
{
    int i;

    /* print the first column of at most ten rows */
    for (i = 0; i < PQntuples(tuples) && i < 10; i++) {
        printf("%d, ", i);
        printf("%s\n", PQgetvalue(tuples, i, 0));
    }
    return TRUE;
}

int validquery(PGresult *result, char *querystring)
{
    printf(" in validquery\n");
    if (PQresultStatus(result) != PGRES_TUPLES_OK) {
        printf("Query %s failed.\n", querystring);
        fprintf(stderr, "Query %s failed.\n", querystring);
        return FALSE;
    }
    return TRUE;
}

Page 12: From Laboratory to e-Laboratory

Why e-science? What is wrong with bioinformatics?

Human geneticists think…

Page 13: From Laboratory to e-Laboratory

Why should a biologist be interested in e-science?

BioAssistants guessed…

• Involves Computation
• Interpretation of results
• Biology isn’t that interesting
• Reduce reinvention of the wheel
• Current lack of standards
• Sharing results
• Reshaping biology
• Synergy between different sciences
• Emerging Data driven science

Page 14: From Laboratory to e-Laboratory

Why e-Science?

A needy biologist

Single tiny brain

Lots of data to deal with

Lots of methods and algorithms to try and combine

No computational superpowers

Lots of knowledge to deal with

Page 15: From Laboratory to e-Laboratory

1070 databases, Nucleic Acids Research, Jan 2008 (96 in Jan 2001)

Proteomics, Genomics, Transcriptomics, Protein sequence prediction, Phenotypic studies, Phylogeny, Sequence analysis, Protein Structure prediction, Protein-protein interaction, Metabolomics, Model organism collections, Systems Biology, Epidemiology, etcetera …

All with a splendid interface… all different, of course

Page 16: From Laboratory to e-Laboratory

Traditional data integration in bioinformatics

Local Database

Local Database

Page 17: From Laboratory to e-Laboratory

The ‘spaghetti’ approach

Page 18: From Laboratory to e-Laboratory

Some of my observations

• Reinvention
  – How many reannotation pipelines do you need?
  – Little reuse of components

• Reproducibility
  – Black boxes
  – Emphasis not on clarity
  – Can we understand bioinformatics as wet lab protocols?

• Focus on technicalities, not biological analysis
  – Should bioinformaticians write ‘job submission’ scripts?

• Data graveyards
  – Do we need >1000 databases?
  – Can we understand our own data?

Page 19: From Laboratory to e-Laboratory

SOME EXAMPLES FROM FIELD OF E-SCIENCE

Page 20: From Laboratory to e-Laboratory

Enhancement 1: Workflows (Taverna workflow)

Page 21: From Laboratory to e-Laboratory

Enhancement 2: exploiting brains

Page 22: From Laboratory to e-Laboratory

Exploiting Brains By Web Services
source: http://biocatalogue.org (launched at ISMB2009)

>1000 annotated services, >3000 known to Taverna
Includes BioMart, R, Text mining, KEGG, NCBI PubMed, Ensembl, etc.

Web Services run remotely

Page 23: From Laboratory to e-Laboratory

Exploiting more brains by sharing workflows
source: http://myExperiment.org

Social community web site for scientists
2300 registered users in two years

750 workflows

Page 24: From Laboratory to e-Laboratory

Bioinformatics and e-science

Single-purpose, single-person, black-box application

Customized experiments with reusable components

My component

Your component

My component

Your component

My component

Page 25: From Laboratory to e-Laboratory

What do we know of our data?

Sufficient?

• Query discoveries?
• Query across experiment?
• Fit biological modelling?
• Good basis for new experiments?
• Flexible enough?

Page 26: From Laboratory to e-Laboratory

Model-based data integration

Biological concepts (‘myModel’)

Data

Marshall et al., International Workshop on Knowledge Systems in Bioinformatics 2006
Post et al., Bioinformatics 2007

Biologist readable model

Computer readable model

roos: Principle of the method extensively shown at previous SPX meeting

Page 27: From Laboratory to e-Laboratory

Model-based data integration
Example: UCSC genome browser

partOf

Page 28: From Laboratory to e-Laboratory

Semantic Web (Linked Open Data)

Page 29: From Laboratory to e-Laboratory

Empower me with a ‘virtual brain’

My ws

Your ws

My ws

Your ws

My ws

* From P.J. Verschure, Journal of Cellular Biochemistry 2006, vol. 99(1), pg 23-34

Page 30: From Laboratory to e-Laboratory

Query

Retrieve documents from Medline

Extract proteins (Homo sapiens)

Calculate ranking scores

Create biological cross references

Convert to table (html)

Add documents (IDs) to semantic model

Add proteins to semantic model

Add scores to semantic model

Add cross references to semantic model

Add query to semantic model

Workflow and Semantic Web

Page 31: From Laboratory to e-Laboratory

Concept web from a user’s point of view

Page 32: From Laboratory to e-Laboratory

e-Laboratories and e-Laboratory factories

Page 33: From Laboratory to e-Laboratory

e-Galaxy for NBIC

• Galaxy as front end

• Workflows & Web Services

• Grid enabled Taverna

• MOLGENIS

• Semantic/Concept Web

• myExperiment/BioCatalogue

• Scientific Research Objects

Vacancy! (software engineer)

Page 34: From Laboratory to e-Laboratory

e-Galaxy mock-up

Underlying workflow

Your Scientific Research Object

MOLGENIS, Convert, Import/Export, Research Objects, Store, Configure, Run

Related research and documents

[placeholder text]

Suggestions by semantic components

Page 35: From Laboratory to e-Laboratory

e-Science requirement: Reuse

[Figure label: E-Laboratory component]

Page 36: From Laboratory to e-Laboratory

http://www.epigenius.org/ (mock-up)

Page 37: From Laboratory to e-Laboratory

Research and development aims

• Automated support for hypothesis formation
  – E.g. on epigenetic mechanisms
  – Apply Workflow, Semantic Web, Concept Web
  – Concept-based meta-analysis
  – Automated triple creation from computational analysis

Page 38: From Laboratory to e-Laboratory

Research and development ambitions

• Co-develop e-Laboratories
  – e-Galaxy
  – epiGenius
  – BioBanking

• Help BEC with support environment

• Concept Web services
  – Web services
  – E-Laboratory components
  – Transparent creation of triples
  – Personal semantic repositories

Page 39: From Laboratory to e-Laboratory

Liaison

OMII-UK
Manchester, Southampton, Edinburgh (ca. 30 engineers)
Taverna, myExperiment, e-Labs

W3C Health Care & Life Sciences Interest Group
Semantic Web experts
Linked Open Data

AID
University of Amsterdam
e-Science experts
Grid tools

BioSemantics Rotterdam
Text mining
Concept profile meta-analysis

NBIC
BioAssist core software development
Grid tools, Concept Web, e-Labs

Concept Web
Content, tools and infrastructure

You?

Bioinformatics Expertise Centre LUMC
Statistical and computer science expertise
Generic support

Page 40: From Laboratory to e-Laboratory

‘e’ for enhance, not enforce

Please help me to help you

Register for:
http://snipurl.com/biosemanticsusers
(http://www.myexperiment.org/groups/211)

Allows me to
• Give you preferential treatment
• Not spam everybody
• Keep you informed
• Ask your opinion (user driven development!)

Page 41: From Laboratory to e-Laboratory

Visit the BioSemantics web site
http://www.biosemantics.org/

Page 42: From Laboratory to e-Laboratory

Word of warning

Computer scientists are scientists too!
Need to publish
Score by papers, not by software
Addressed by OMII-UK and BioAssist

Compare
“How can I use it in the clinic?”
“How can I use it in the lab?”

Page 43: From Laboratory to e-Laboratory

Dissemination

• Come by for help or information
• Internal ‘mini-courses’?
• Send me suggestions!

• FYI: Course ‘Managing Life Science Information’ for PhD students, 2010

Page 44: From Laboratory to e-Laboratory

Key points

• Liaising
  between technology contacts and you, the colleagues of Human Genetics.

• No obligations
  Try any new developments that we are involved in with our help, but don't feel obliged.

• Help us help you
  Express your wishes, problems, try things and give feedback – and be patient sometimes

Please join the biosemantics users group on myExperiment.org to help us communicate.

Page 45: From Laboratory to e-Laboratory

Thank you for your attention

An enhanced biologist

Lots of accessible data

Web Services, Workflows, and their creators available

Other people’s computational superpowers

Knowledge bases to query

Community brain power

Homo biologicus enhancis