SPARQLing data stars in the biology cloud
Jerven Bolleman, Developer, Swiss-Prot Group, Swiss Institute of Bioinformatics
Monday, September 3, 2012

Uni protsparqlcloud

DESCRIPTION

This 15-minute presentation shows how multiple SPARQL endpoints can be used to integrate biological data. With SPARQL, we no longer need to start a data warehouse project to integrate multiple data sources; we just use the SPARQL 1.1 SERVICE keyword.
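As a rough illustration of the SERVICE keyword mentioned above, the sketch below composes a federated query and builds (but does not send) a SPARQL protocol GET request. The helper names `service_block` and `build_get_request` are hypothetical, and it is an assumption that the UniProt beta endpoint accepts queries directly at the URL shown; the ChEMBL endpoint URL is the one used in the slides.

```python
from urllib.parse import urlencode, parse_qs, urlsplit
from urllib.request import Request


def service_block(endpoint: str, pattern: str) -> str:
    """Wrap a graph pattern in a SPARQL 1.1 SERVICE clause (hypothetical helper)."""
    return f"SERVICE <{endpoint}> {{ {pattern} }}"


def build_get_request(endpoint: str, query: str) -> Request:
    """Build, without sending, a SPARQL protocol GET request (hypothetical helper)."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


REMOTE = "http://rdf.farmbio.uu.se/chembl/sparql"  # remote endpoint used in the slides
LOCAL = "http://beta.sparql.uniprot.org"  # assumption: queries are accepted at this URL

# The local endpoint evaluates the outer pattern and delegates the
# SERVICE block to the remote endpoint, then joins on ?protein.
query = (
    "PREFIX owl: <http://www.w3.org/2002/07/owl#> "
    "PREFIX up: <http://purl.uniprot.org/core/> "
    "SELECT ?target ?protein WHERE { "
    + service_block(REMOTE, "?target owl:sameAs ?protein .")
    + " ?protein a up:Protein . }"
)
request = build_get_request(LOCAL, query)
```

Nothing is sent over the network here; passing `request` to `urllib.request.urlopen` would be the next step, provided both endpoints are reachable.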

Page 1: Uni protsparqlcloud

SPARQLing data stars in the biology cloud

Jerven Bolleman, Developer, Swiss-Prot Group, Swiss Institute of Bioinformatics

Monday, September 3, 2012

Page 2: Uni protsparqlcloud

Biohackathon: a nursery galaxy for SPARQL endpoints

Page 3: Uni protsparqlcloud

© 2012 SIB Swiss Institute of Bioinformatics

http://beta.sparql.uniprot.org

• 5.2 billion triples
  – All UniProt data
    • Taxonomy
    • Sequences
    • Enzymes
    • Pathways
    • etc.

• SPARQL 1.1 (January 5, 2012 Working Draft)
  – SERVICE keyword

• No blank nodes
  – SHA-512 series used to stabilize anonymous resources
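The slide does not say how the SHA-512 hashes are computed, so the following is only one plausible sketch of the idea: derive a stable IRI for an otherwise anonymous resource by hashing a canonical form of its properties. The namespace `BASE`, the function `stable_iri`, and the `ex:` predicates are all hypothetical.

```python
import hashlib

BASE = "http://example.org/anon/"  # hypothetical namespace, not UniProt's


def stable_iri(properties):
    """Mint a content-derived IRI from (predicate, object) pairs.

    Sorting the pairs makes the result independent of input order, so
    the same description always yields the same IRI across releases.
    """
    canonical = "\n".join(f"{p} {o}" for p, o in sorted(properties))
    digest = hashlib.sha512(canonical.encode("utf-8")).hexdigest()
    return BASE + digest


# Two orderings of the same (hypothetical) description give one IRI.
first = stable_iri([("ex:begin", "10"), ("ex:end", "42")])
second = stable_iri([("ex:end", "42"), ("ex:begin", "10")])
```

Because the IRI depends only on content, a federated query can refer to such a resource by name, which is not possible with a blank node.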

Page 4: Uni protsparqlcloud

© 2011 SIB

Data integration RDF/SPARQL

[Diagram. Data sources: Chembl.rdf, UniProt.rdf, own lab data. Stages: triple store with SPARQL queries; triple store with SPARQL federation; SPARQL driving services. Axis: developer and maintenance time saved.]

Page 5: Uni protsparqlcloud

Page 6: Uni protsparqlcloud

SELECT ?name (COUNT(?protein) as ?size)
WHERE {
  {
    ?protein :enzyme ?name .
    ?name rdfs:subClassOf enzyme:1.-.-.-
  } UNION {
    ?protein :enzyme ?name .
    ?name rdfs:subClassOf enzyme:2.-.-.-
  } ...
}
GROUP BY ?name
ORDER BY ?name

Page 7: Uni protsparqlcloud

Page 8: Uni protsparqlcloud

SELECT ?japaneseTerm ?geneSymbol ?protein
WHERE {
  SERVICE <http://data.allie.dbcls.jp/sparql> {
    ?pc a allie:PairCluster ;
        allie:hasShortFormRepresentationOf ?sfr ;
        allie:hasLongFormRepresentationOf ?lfr .
    ?sfr rdfs:label ?sf .
    ?lfr rdfs:label "β1アドレナリン受容体, β1アドレナリンレセプター"@ja, ?japaneseTerm .
    FILTER (lang(?sf) = "en")
  }
  BIND(str(?sf) as ?geneSymbol) .
  ?gene skos:prefLabel ?geneSymbol .
  ?protein uniprot:encodedBy ?gene .
}

Page 9: Uni protsparqlcloud

Page 10: Uni protsparqlcloud

SELECT ?target ?protein
WHERE {
  SERVICE <http://rdf.farmbio.uu.se/chembl/sparql> {
    ?target a chembl:Target .
    ?target owl:sameAs ?bio2rdfUniprot .
    FILTER(contains(str(?bio2rdfUniprot), "uniprot:"))
  }
  BIND(iri(concat("http://purl.uniprot.org/uniprot/", substr(str(?bio2rdfUniprot), 28))) as ?protein)
  ?protein a up:Protein .
  FILTER (NOT EXISTS {
    ?protein up:annotation ?annotation .
    ?annotation a up:Disease_Annotation .
  })
}
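The BIND step in the query above rewrites a Bio2RDF-style IRI into a UniProt IRI by keeping everything from character 28 onward. A small worked sketch of that arithmetic follows, assuming the remote IRIs start with `http://bio2rdf.org/uniprot:` (the slide does not show the prefix; it is inferred from the offset, since that prefix is 27 characters long). The function name and the accession used are illustrative only.

```python
BIO2RDF_PREFIX = "http://bio2rdf.org/uniprot:"  # assumption: 27-character prefix
UNIPROT_PREFIX = "http://purl.uniprot.org/uniprot/"  # taken from the query


def to_uniprot_iri(bio2rdf_iri):
    """Mirror the FILTER(contains(...)) and BIND(substr(..., 28)) steps.

    SPARQL's substr is 1-based, so position 28 corresponds to Python
    index 27. Returns None where the SPARQL FILTER would drop the row.
    """
    if "uniprot:" not in bio2rdf_iri:
        return None
    return UNIPROT_PREFIX + bio2rdf_iri[27:]


# Illustrative accession; any UniProt accession is handled the same way.
example = to_uniprot_iri(BIO2RDF_PREFIX + "P08588")
```

A fixed offset like this is brittle: if the remote dataset changes its IRI scheme, the slice silently produces the wrong identifier.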

Page 11: Uni protsparqlcloud

SELECT ?target ?protein
WHERE {
  SERVICE <http://rdf.farmbio.uu.se/chembl/sparql> {
    ?target a chembl:Target .
    ?target owl:sameAs ?protein
  }
  ?protein a up:Protein .
  FILTER (NOT EXISTS {
    ?protein up:annotation ?annotation .
    ?annotation a up:Disease_Annotation .
  })
}

Page 12: Uni protsparqlcloud

Questions?