Your Brains in My e-Laboratory
Feasting on brains with Taverna and Semantic Web tools
Marco Roos, acknowledging the AID team (Scott Marshall, Sophia Katrenko, Willem van Hage, Edgar Meij, Konstantinos Krommydas, Pieter Adriaans), Andrew Gibson, Martijn Schuemie, Piter de Boer, the myGrid team (in particular Katy Wolstencroft, Carole Goble, and Dave de Roure), OMII-UK and NBIC
Amsterdam, May 28, 2009
2
Marco Roos
Biologist and bioinformatician
Post-doc e-(bio)science, University of Amsterdam (BioRange/VL-e)
Project or Area Liaison (PAL), OMII-UK
Member, UK e-Science All Hands Foundation
Member, BioAssist programme committee, NBIC
A biologist in e-Science
3
Mouse fibroblast (skin) cells
My primary motivation
Structure and function of DNA in the nucleus
Escherichia coli
5
/*
 * Determines ridges in the htm expression table.
 * ridge.h is assumed to define TRUE and FALSE and to declare the
 * functions below with matching prototypes.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h"

/* Selects all rows of one chromosome, sorted on genstart.
 * The result is handed back through *htmtable. */
int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
{
    char querystring[256];

    /* chrom is a text value, so it must be quoted in the SQL */
    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = '%s' ORDER BY genstart",
             htmtablename, chromname);
    *htmtable = PQexec(conn, querystring);
    return validquery(*htmtable, querystring);
}

/*
 * Determines if mincount genes in a row are (part of) a ridge.
 * pre:  htmtable is valid and sorted on genstart (ascending)
 * post: TRUE if the mincount rows starting at row all have
 *       movmed39expr >= exprthreshold, FALSE otherwise
 */
int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
{
    if (mincount <= 0) return TRUE;
    if (row >= PQntuples(htmtable)) return FALSE;
    /* PQgetvalue returns text, so convert it before comparing */
    if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold)
        return FALSE;
    return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
}

int main(void)
{
    PGconn *conn;           /* holds database connection */
    char querystring[256];  /* query string */
    PGresult *result;
    int ok;

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
    if (PQstatus(conn) == CONNECTION_BAD) {
        fprintf(stderr, "connection to database failed.\n");
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }
    printf("Connection ok\n");

    snprintf(querystring, sizeof querystring, "SELECT * FROM chromosomes");
    printf("%s\n", querystring);
    result = PQexec(conn, querystring);
    ok = validquery(result, querystring);
    if (ok) printresults(result);

    PQclear(result);
    PQfinish(conn);
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}

/* Prints the first column of at most ten rows. */
int printresults(PGresult *tuples)
{
    int i;

    for (i = 0; i < PQntuples(tuples) && i < 10; i++) {
        printf("%d, ", i);
        printf("%s\n", PQgetvalue(tuples, i, 0));
    }
    return TRUE;
}

/* Returns TRUE if the query produced a tuple set, FALSE otherwise. */
int validquery(PGresult *result, char *querystring)
{
    printf(" in validquery\n");
    if (PQresultStatus(result) != PGRES_TUPLES_OK) {
        fprintf(stderr, "Query %s failed.\n", querystring);
        return FALSE;
    }
    return TRUE;
}
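The ridge test in the program above can also be tried without a database. The following is a minimal sketch, assuming the movmed39expr values have already been loaded into an array sorted on gene start position; the names is_ridge_window and count_ridge_starts are hypothetical and do not appear in the original program.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the original program: returns 1 if the
 * mincount values starting at index row are all >= threshold, else 0.
 * expr holds n expression values sorted on gene start position. */
int is_ridge_window(const double *expr, size_t n, size_t row,
                    double threshold, size_t mincount)
{
    size_t i;

    if (mincount == 0)
        return 1;               /* an empty window is trivially a ridge */
    if (mincount > n || row > n - mincount)
        return 0;               /* window would run past the last gene */
    for (i = row; i < row + mincount; i++) {
        if (expr[i] < threshold)
            return 0;           /* one gene below threshold breaks the run */
    }
    return 1;
}

/* Counts the start positions at which a full ridge window begins. */
size_t count_ridge_starts(const double *expr, size_t n,
                          double threshold, size_t mincount)
{
    size_t row, count = 0;

    for (row = 0; row < n; row++)
        count += (size_t)is_ridge_window(expr, n, row, threshold, mincount);
    return count;
}
```

The loop replaces the recursion of is_ridge but applies the same rule: every one of mincount consecutive genes must reach the threshold, and a window that runs off the end of the table does not count.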
6
‘Old school’ bioinformatics approach
Local database
8
My tiny brain
9
Virtual professor
[Diagram labels, alternating: My ws, Your ws, My ws, Your ws, My ws]
* From P.J. Verschure, Journal of Cellular Biochemistry 2006, vol. 99(1), pp. 23-34
10
Combining expertise
Edgar Meij
Information retrieval expert
11
Combining expertise
Sophia Katrenko
Machine learning expert
12
Combining expertise
Willem van Hage
Semantic web expert (and bass guitar player)
13
Combining expertise
Towards a knowledge framework
Scott Marshall
Computer scientist and bioinformatician
14
The AIDA toolbox: Web Services for knowledge extraction and knowledge management
15
AIDA toolbox
e-Science collaboration
16
“Collaboration through Web Services”
Martijn Schuemie
Bio-text mining expert, BioSemantics group, Erasmus University Rotterdam
17
“Collaboration through Web Services”
Biological Database expert
Hideaki Sugawara
18
“Collaboration through Web Services”
e-bioscientist
19
An insightful computational experiment
Workflow paradigm for biologists
21
e-Science leveraging the use of more brains
Want this…
22
e-Science leveraging the use of more brains
…need this
23
Workflow and Semantic Web
[Cartoon: decondensed chromatin and condensed chromatin; histone methylation at H3K9; DNA methylation; HDAC and HAT; histone acetylation]
Alpha version of Concept Web
24
Separation of models and instances in OWL
[Diagram: a single OWL graph that keeps classes (the models) apart from their instances and combines three models.]
Model for text mining observations: Protein term ("HDAC1", "p68"); Interaction term ("associate"); Proteins or genes association assertion ("p68 and p72 associate with histone deacetylase 1 (HDAC1)"); Document (PMID: 15298701). Properties: isComponentOf, relates, relatesBy.
Experiment log model: Discovered protein term ("p68"); Discovered interaction term ("interacts"); Discovered interaction assertion ("DNMT3B also interacts with histone deacetylase 1 (HDAC1)"); Discovered association assertion ("p68 and p72 associate with histone deacetylase 1 (HDAC1)"); Text mining process (AIDA based extraction process); Document search query ("HDAC1 AND chromatin"); Retrieved document (PMID: 15298701). Properties: discoveredBy, searchesWith.
Biological model (representing cartoon elements): Protein or gene (HDAC1, PCAF, p68); Association and Interaction, with hasParticipant some Protein (HDAC1-PCAF interaction, HDAC1-p68 association); BiologicalModel (Chromatin condensation hypothesis). Properties: hasModelComponent, hasParticipant.
<myModel:HDAC1> <rdf:type> <myModel:Protein>
<myModel:Protein> <rdf:type> <owl:Class>
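The two statements above show how the model level and the instance level are kept apart: HDAC1 is typed as a Protein, and Protein is itself declared a class. As a rough illustration only, the sketch below stores such statements in a plain C array and tells classes from instances. The struct and function names are hypothetical, the standard rdf:type predicate is assumed, and a real application would use an RDF store rather than string comparison.

```c
#include <stddef.h>
#include <string.h>

/* One RDF-style statement: subject, predicate, object. */
struct triple {
    const char *s;
    const char *p;
    const char *o;
};

/* Returns 1 if the statement (s, p, o) is present among the n triples. */
int has_triple(const struct triple *t, size_t n,
               const char *s, const char *p, const char *o)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(t[i].s, s) == 0 && strcmp(t[i].p, p) == 0 &&
            strcmp(t[i].o, o) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 if name is declared an owl:Class (model level). */
int is_class(const struct triple *t, size_t n, const char *name)
{
    return has_triple(t, n, name, "rdf:type", "owl:Class");
}

/* Returns 1 if name has an rdf:type whose object is itself a declared
 * class, i.e. name sits at the instance level. */
int is_instance(const struct triple *t, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(t[i].s, name) == 0 &&
            strcmp(t[i].p, "rdf:type") == 0 &&
            is_class(t, n, t[i].o))
            return 1;
    }
    return 0;
}
```

Loaded with the two statements from the slide, is_class holds for myModel:Protein only and is_instance holds for myModel:HDAC1 only.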
26
Model for text mining observations
Classes: Protein term; Interaction term; Proteins or genes association assertion; Document
Example instances: "HDAC1" and "p68" (protein terms); "associate" (interaction term); "p68 and p72 associate with histone deacetylase 1 (HDAC1)" (association assertion); PMID: 15298701 (document)
Properties: isComponentOf, relates, relatesBy (as class restrictions: isComponentOf some, relates some, relatesBy some)
27
Experiment log model
Classes: Discovered protein term; Discovered interaction term; Discovered interaction assertion; Text mining process; Document search query; Retrieved document
Example instances: "p68" (discovered protein term); "interacts" (discovered interaction term); "DNMT3B also interacts with histone deacetylase 1 (HDAC1)" (discovered interaction assertion); AIDA based extraction process (text mining process); "HDAC1 AND chromatin" (document search query); PMID: 15298701 (retrieved document)
Properties: discoveredBy, searchesWith
28
Mappings between Biological and other models
Experiment and text mining side: Discovered protein or gene term ("p68"); Discovered association assertion ("p68 and p72 associate with histone deacetylase 1 (HDAC1)"); Document search query; term "HDAC1"
Biological model side: Protein (HDAC1, p68); ProteinAssociation (HDAC1-p68 association); BiologicalModel (Chromatin condensation hypothesis)
Mapping property between the two sides: references
29
PRELIMINARY RESULTS
SELECT label(comment), label(query1), label(query2)
FROM {protein_instance} rdf:type {bio:Protein} rdf:type {owl:Class},
     {protein_instance} rdfs:comment {comment};
                        bioModel:isModelComponentOf {model1};
                        bioModel:isModelComponentOf {model2},
     {representation1} mappingModel:partially_represents {model1};
                       methodModel:has_query {query1},
     {representation2} mappingModel:partially_represents {model2};
                       methodModel:has_query {query2}
WHERE model1 != model2
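The query above asks for proteins that are components of two different models and returns the document search query behind each model. The sketch below mimics that join on an in-memory table. It is an illustration only: the struct layout, the function name find_shared and the model labels passed in are assumptions, and unlike the pseudo query it reports each pair of models once rather than in both orders.

```c
#include <stddef.h>
#include <string.h>

/* One row: a protein that is a component of a model, plus the document
 * search query behind that model's representation. */
struct component {
    const char *protein;
    const char *model;
    const char *query;
};

/* One result row: a protein found in two different models. */
struct shared {
    const char *protein;
    const char *query1;
    const char *query2;
};

/* Finds proteins that occur in two different models. Writes at most max
 * results to out and returns the total number of matches found, which
 * may exceed max. */
size_t find_shared(const struct component *c, size_t n,
                   struct shared *out, size_t max)
{
    size_t i, j, found = 0;

    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            /* same protein, different model: the model1 != model2 filter */
            if (strcmp(c[i].protein, c[j].protein) == 0 &&
                strcmp(c[i].model, c[j].model) != 0) {
                if (found < max) {
                    out[found].protein = c[i].protein;
                    out[found].query1 = c[i].query;
                    out[found].query2 = c[j].query;
                }
                found++;
            }
        }
    }
    return found;
}
```

Given rows for two models built from different literature queries, the result lists only the proteins that the two models have in common, which is the overlap the slide reports.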
Pseudo RDF query and results
Protein | Query for model 1 | Query for model 2
"protein referred to by as NF-kappaB and UniProt ID: P19838" | "HDAC1 chromatin" | "(Nutrician OR food) AND (chromatin OR epigenetics) AND (protein OR proteins)"
"protein referred to by as p21 and UniProt ID: P38936" | "HDAC1 chromatin" | "(Nutrician OR food) AND (chromatin OR epigenetics) AND (protein OR proteins)"
"protein referred to by as Bax and UniProt ID: P97436" | "HDAC1 chromatin" | "(Nutrician OR food) AND (chromatin OR epigenetics) AND (protein OR proteins)"
Class | Example instance
Protein | UniProt:P19838
Protein name | NF-KappaB
Discovery process run | Conditional Random Fields Protein Name Recognition
Service run | AIDA:applyCRF
Creator | Sophia Katrenko (UvA)
Run date & time | 2008-11-18 03:29:30
Document | PMID:17540846
Properties linking these: references, discovered by, implemented by, run at, creator, has input, component of
Access to triples in Taverna via AIDA plugin
33
34
Knowledge mining: my knowledge is mine, your knowledge is mine
35
Demonstrate Exploiting Brains (2x)
[Diagram labels, alternating: My ws, Your ws, My ws, Your ws, My ws]
* From P.J. Verschure, Journal of Cellular Biochemistry 2006, vol. 99(1), pp. 23-34
Computational brains
Biological brains
36
A typical biologist…
A needy biologist
Tiny brain
Lots of data to deal with
Lots of methods and algorithms to try and combine
No computational superpowers
Lots of knowledge to deal with
37
An enhanced biologist…
An enhanced biologist
Many brains
Lots of data to support me
Web Services, Workflows, and their creators available
Other people’s computational superpowers
Knowledge bases to query
38
Publish and share on myExperiment.org
Publish & share research objects
39
e-Laboratory factories
41
End of presentation...
Thank you
http://adaptivedisclosure.org
Are you willing to share your brain?