DESCRIPTION
This 15-minute presentation shows how multiple SPARQL endpoints can be used to integrate biological data. With SPARQL, we no longer need to start a data warehouse project to integrate multiple data sources; we just use the SPARQL 1.1 SERVICE keyword.
SPARQLing data stars in the biology cloud
Jerven Bolleman
Developer
Swiss-Prot Group
Swiss Institute of Bioinformatics
Monday, September 3, 2012
Biohackathon: a nursery galaxy for SPARQL endpoints
© 2012 SIB Swiss Institute of Bioinformatics
http://beta.sparql.uniprot.org
• 5.2 billion triples
  – All UniProt data
    • Taxonomy
    • Sequences
    • Enzymes
    • Pathways
    • etc.
• SPARQL 1.1 (January 5, 2012 Working Draft)
  – SERVICE keyword
• No blank nodes
  – SHA-512 series used to stabilize anonymous resources
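The slides do not say which inputs are hashed to replace blank nodes, so the following is only a minimal sketch of the general idea: derive a deterministic IRI from the content that identifies an otherwise anonymous resource. The namespace `http://example.org/anon/`, the function name `stable_iri`, and the choice of hashed fields are all assumptions made for illustration.

```python
import hashlib

# Hypothetical namespace for minted IRIs; not taken from the slides.
BASE = "http://example.org/anon/"


def stable_iri(parent_iri: str, predicate_iri: str, value: str) -> str:
    """Mint a deterministic IRI for a resource that would otherwise be a blank node.

    Assumption: the resource is identified by its parent, the linking
    predicate, and a literal value. The real scheme may hash other fields.
    """
    # Join the parts with a control character that cannot appear in an IRI,
    # so that ("ab", "c") and ("a", "bc") never produce the same key.
    key = "\x1f".join((parent_iri, predicate_iri, value)).encode("utf-8")
    return BASE + hashlib.sha512(key).hexdigest()


# The same input always yields the same IRI, so the resource keeps its
# identity across data releases and can be referenced from other endpoints.
first = stable_iri("http://example.org/protein/1", "http://example.org/annotation", "some text")
second = stable_iri("http://example.org/protein/1", "http://example.org/annotation", "some text")
print(first == second)
```

Because the IRI depends only on the hashed content, reloading the data does not rename the resource, which is what makes it usable in federated queries.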
Data integration RDF/SPARQL
[Diagram labels: Chembl.rdf; UniProt.rdf; Own Lab Data; Triple Store; SPARQL Queries; SPARQL Federation; SPARQL Driving Services; Developer & maintenance time saved]
SELECT ?name (COUNT(?protein) as ?size)
WHERE {
  {
    ?protein :enzyme ?name .
    ?name rdfs:subClassOf enzyme:1.-.-.-
  }
  UNION
  {
    ?protein :enzyme ?name .
    ?name rdfs:subClassOf enzyme:2.-.-.-
  }
  ...
}
GROUP BY ?name
ORDER BY ?name
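A query like the one above is sent to an endpoint over HTTP. As a rough sketch, the SPARQL 1.1 Protocol allows the query text to be passed in a `query` parameter of a GET request. The endpoint path used here (the root of `http://beta.sparql.uniprot.org`) and the placeholder query are assumptions; the request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the query service is reachable at the root of the URL on the slide.
ENDPOINT = "http://beta.sparql.uniprot.org/"

# Placeholder query for illustration only; not one of the queries from the talk.
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    # urlencode percent-encodes the query text so it is safe inside a URL.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# urllib.request.urlopen(request) would execute the query; it is omitted here
# so that the sketch runs without network access.
```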

SELECT ?japaneseTerm ?geneSymbol ?protein
WHERE {
  SERVICE <http://data.allie.dbcls.jp/sparql> {
    ?pc a allie:PairCluster ;
        allie:hasShortFormRepresentationOf ?sfr ;
        allie:hasLongFormRepresentationOf ?lfr .
    ?sfr rdfs:label ?sf .
    ?lfr rdfs:label "β1アドレナリン受容体, β1アドレナリンレセプター"@ja, ?japaneseTerm .
    FILTER (lang(?sf) = "en")
  }
  BIND(str(?sf) as ?geneSymbol) .
  ?gene skos:prefLabel ?geneSymbol .
  ?protein uniprot:encodedBy ?gene .
}
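Conceptually, the query above asks the remote endpoint for bindings and then joins them with local triples on the gene symbol. The sketch below imitates that join in plain Python, purely to illustrate the data flow. All values are made up, the lookup tables are a simplification (one gene per label), and a real engine is free to evaluate the join in a different order.

```python
# Rows as the remote SERVICE block might return them (made-up values).
remote_rows = [
    {"japaneseTerm": "term-1", "sf": "GENE1"},
    {"japaneseTerm": "term-2", "sf": "GENE2"},
]

# Local data reduced to two lookup tables (made-up values).
gene_by_label = {"GENE1": "gene:1"}
proteins_by_gene = {"gene:1": ["protein:A", "protein:B"]}


def federated_join(rows, gene_by_label, proteins_by_gene):
    """Join remote bindings with local data on the gene symbol."""
    results = []
    for row in rows:
        gene_symbol = str(row["sf"])            # BIND(str(?sf) as ?geneSymbol)
        gene = gene_by_label.get(gene_symbol)   # ?gene skos:prefLabel ?geneSymbol
        if gene is None:
            continue                            # no local match: row is dropped
        for protein in proteins_by_gene.get(gene, []):  # ?protein uniprot:encodedBy ?gene
            results.append((row["japaneseTerm"], gene_symbol, protein))
    return results


for solution in federated_join(remote_rows, gene_by_label, proteins_by_gene):
    print(solution)
```

Only the first remote row survives in this toy data, because `GENE2` has no matching local gene label.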

SELECT ?target ?protein
WHERE {
  SERVICE <http://rdf.farmbio.uu.se/chembl/sparql> {
    ?target a chembl:Target .
    ?target owl:sameAs ?bio2rdfUniprot .
    FILTER(contains(str(?bio2rdfUniprot), "uniprot:"))
  }
  BIND(iri(concat("http://purl.uniprot.org/uniprot/", substr(str(?bio2rdfUniprot), 28))) as ?protein)
  ?protein a up:Protein .
  FILTER (NOT EXISTS {
    ?protein up:annotation ?annotation .
    ?annotation a up:Disease_Annotation .
  })
}
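The `BIND` in the query above rewrites a Bio2RDF-style IRI into a UniProt IRI by cutting off a fixed-length prefix. The sketch below reproduces that string manipulation in Python. It assumes the remote IRIs have the form `http://bio2rdf.org/uniprot:<accession>`, which is consistent with the offset 28 used in the query but is not stated on the slide; the accession in the example is illustrative.

```python
# Assumed form of the IRIs returned by the remote endpoint (27 characters).
BIO2RDF_PREFIX = "http://bio2rdf.org/uniprot:"
# Target namespace, taken from the query.
UNIPROT_PREFIX = "http://purl.uniprot.org/uniprot/"


def to_uniprot_iri(bio2rdf_iri: str):
    """Mirror the FILTER and BIND steps of the query for a single IRI."""
    # FILTER(contains(str(?bio2rdfUniprot), "uniprot:"))
    if "uniprot:" not in bio2rdf_iri:
        return None
    # SPARQL substr() is 1-based, so substr(s, 28) corresponds to s[27:] in Python.
    accession = bio2rdf_iri[27:]
    # iri(concat("http://purl.uniprot.org/uniprot/", ...))
    return UNIPROT_PREFIX + accession


print(to_uniprot_iri(BIO2RDF_PREFIX + "P12345"))
```

The fixed offset only works while every matching IRI shares exactly that 27-character prefix, which is why the rewrite is fragile compared with a direct `owl:sameAs` link to the UniProt IRI.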

SELECT ?target ?protein
WHERE {
  SERVICE <http://rdf.farmbio.uu.se/chembl/sparql> {
    ?target a chembl:Target .
    ?target owl:sameAs ?protein
  }
  ?protein a up:Protein .
  FILTER (NOT EXISTS {
    ?protein up:annotation ?annotation .
    ?annotation a up:Disease_Annotation .
  })
}
Questions?