
RDF: what and why plus a SPARQL tutorial


Page 1: RDF: what and why plus a SPARQL tutorial

RDF what and why

Jerven Bolleman Developer Swiss-Prot Group

Page 2: RDF: what and why plus a SPARQL tutorial

Introduction

• RDF

• It's a technology

• Cost and affordability are key concerns

Page 3: RDF: what and why plus a SPARQL tutorial
Page 4: RDF: what and why plus a SPARQL tutorial

-.---------.>+++++++[<---------->-]<+.>+++++++[<++++++++++>-]<--.+++++++++++.++++++++.---------.>++++++++[<---------->-]<++.>+++++[<+++++++++++++>-]<.+++++++++++++.----------.>+++++++[<---------->-]<++.>++++++++[<++++++++++>-]<.>+++[<----->-]<.>+++[<++++++>-]<..>+++++++++[<--------->-]<--.>+++++++[<++++++++++>-]<+++.+++++++++++.>++++++++[<----------->-]<++++.>+++++[<+++++++++++++>-]<.>+++[<++++++>-]<-.---.++++++.-------.----------.>++++++++[<----------->-]<+.---.[-]<<<->[-]>[-]<<[>+>+<<-]>>[<<+>>-]>>>[-]<<<+++++++++<[>>>+<<[>+>[-]<<-]>[<+>-]>[<<++++++++++>>>+<-]<<-<-]+++++++++>[<->-]>>+>[<[-]<<+>>>-]>[-]+<<[>+>-<<-]<<<[>>+>+<<<-]>>>[<<<+>>>-]<>>[<+>-]<<-[>[-]<[-]]>>+<[>[-]<-]<++++++++[<++++++<++++++>>-]>>>[>+>+<<-]>>[<<+>>-]<[<<<<<.>>>>>-]<<<<<<.>>[-]>[-]++++[<++++++++>-]<.>++++[<++++++++>-]<++.>+++++[<+++++++++>-]<.><+++++..--------.-------.>>[>>+>+<<<-]>>>[<<<+>>>-]<[<<<<++++++++++++++.>>>>-]<<<<[-]>++++[<++++++++>-]<.>+++++++++[<+++++++++>-]<--.---------.>+++++++[<---------->-]<.>++++++[<+++++++++++>-]<.+++..+++++++++++++.>++++++++[<---------->-]<--.>+++++++++[<+++++++++>-]<--.-.>++++++++[<---------->-]<++.>++++++++[<++++++++++>-]<++++.------------.---.>+++++++[<---------->-]<+.>++++++++[<+++++++++++>-]<-.>++[<----------->-]<.+++++++++++..>+++++++++[<---------->-]<-----.---.+++.---.[-]<<<]@

Page 5: RDF: what and why plus a SPARQL tutorial

What is RDF?

What? | Why? | SPARQL? | Examples

Page 6: RDF: what and why plus a SPARQL tutorial

RDF: Resource Description Framework

• Resource
  – Generalization of “Web resource”
  – A thing that can be identified (but not necessarily retrieved) on the Web
• Description
  – A resource is described with statements that specify the properties and property values of the resource
• Statement (aka Triple)
  – subject: identifies the resource
  – predicate: identifies a property of the resource
  – object: identifies the value of that property
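A minimal sketch of the subject / predicate / object idea, using plain Python tuples rather than an RDF library. The `example.org` URIs and the `describe` helper are invented for illustration; they are not part of any real vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI identifying the resource
    predicate: str  # URI identifying a property of the resource
    obj: str        # URI or literal: the value of that property


# Hypothetical URIs, invented for this illustration.
TALK = "http://example.org/talk/rdf-what-and-why"
PRESENTED_BY = "http://example.org/vocab/presentedBy"
TITLE = "http://example.org/vocab/title"

# A graph is simply a set of statements.
graph = {
    Triple(TALK, TITLE, "RDF what and why"),
    Triple(TALK, PRESENTED_BY, "Jerven Bolleman"),
}


def describe(resource, triples):
    """Return the sorted (predicate, value) pairs stated about one resource."""
    return sorted((t.predicate, t.obj) for t in triples if t.subject == resource)


description = describe(TALK, graph)
```

Because a graph is only a set of statements, adding a new fact about a resource means adding one more tuple; nothing else has to change.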

Page 7: RDF: what and why plus a SPARQL tutorial

Everything can be described with (loads of) triples...

A triple: Subject (resource) → Property (resource) → Object (resource or literal value)

Page 8: RDF: what and why plus a SPARQL tutorial

Related triples form a graph...

Page 9: RDF: what and why plus a SPARQL tutorial

An RDF graph can be serialized in several ways

• RDF/XML: the W3C’s official format
  – XML is well established: good for application developers
  – very verbose, not very “readable”
  – e.g. uniprot.org/uniprot/P00750.rdf
• N-Triples
  – good for loading into triple stores
  – e.g. uniprot.org/uniprot/P00750.nt
• Turtle ⟵ most examples will use this
  – good for reading by humans
  – e.g. uniprot.org/uniprot/P00750.ttl
• JSON-LD
  – easy for JavaScript/websites
• ....
• Conversion 100% lossless

Page 10: RDF: what and why plus a SPARQL tutorial

A simple example

A triple: RDF What and why → presented by → “Jerven Bolleman” (literal value)

Page 11: RDF: what and why plus a SPARQL tutorial

RDF identifies resources with URIs

A triple: UniProt.rdf What and why → presented by → expasy.org/people/Jerven_Tjalling.Bolleman.htm (URI)

Page 12: RDF: what and why plus a SPARQL tutorial

Multiple URIs may identify the same thing

A triple: expasy.org/people/Jerven_Tjalling.Bolleman.htm → owl:sameAs → ch.linkedin.com/in/jervenbolleman

Page 13: RDF: what and why plus a SPARQL tutorial

The life sciences have an identity problem...

• www.genenames.org/data/hgnc_data.php?hgnc_id=9993
  – RGS11: regulator of G-protein signaling 11
• http://www.uniprot.org/taxonomy/9993
  – European alpine marmot
• ...

What is “9993”?

Page 14: RDF: what and why plus a SPARQL tutorial

Hello, I a 9993. I like flower?

Page 15: RDF: what and why plus a SPARQL tutorial

The solution: URIs

• In RDF statements:
  – subjects and predicates must be URIs
  – objects may be URIs or literal values
• Advantages:
  – No risk of “name clashes” when integrating data from different sources
  – Different people can make statements about the same resource: distributed annotation at a global scale!

Page 16: RDF: what and why plus a SPARQL tutorial

Example: From tab-delimited to semantic

RDF in Turtle format

Tab delimited Converted To

An example

Page 17: RDF: what and why plus a SPARQL tutorial

Example: From tab-delimited to semantic

Interactions.txt

P32234    Q9VGZ4
P32234    P25724
P32234    Q9V3H7
P42643    Q00403
P42643    P23312
P42643    P31928
P41932    Q9NAE1
P41932    Q9TYY1
P41932    Q10666
P41932    Q21921

Page 18: RDF: what and why plus a SPARQL tutorial

Example step 1: Use URIs for subjects and objects

Interactions.txt

purl.uniprot.org/uniprot/P32234    ...prot/Q9VGZ4
purl.uniprot.org/uniprot/P32234    ...prot/P25724
purl.uniprot.org/uniprot/P32234    ...prot/Q9V3H7
...prot/P42643    ...prot/Q00403
...prot/P42643    ...prot/P23312
...prot/P42643    ...prot/P31928
...prot/P41932    ...prot/Q9NAE1
...prot/P41932    ...prot/Q9TYY1
...prot/P41932    ...prot/Q10666
...prot/P41932    ...prot/Q21921

Page 19: RDF: what and why plus a SPARQL tutorial

Example step 2: Use shorthand syntax

Interactions.txt

@prefix prot:<purl.uniprot.org/uniprot/>
prot:P32234    prot:Q9VGZ4 .
prot:P32234    prot:P25724 .
prot:P32234    prot:Q9V3H7 .
prot:P42643    prot:Q00403 .
prot:P42643    prot:P23312 .
prot:P42643    prot:P31928 .
prot:P41932    prot:Q9NAE1 .
prot:P41932    prot:Q9TYY1 .
prot:P41932    prot:Q10666 .
prot:P41932    prot:Q21921 .

Page 20: RDF: what and why plus a SPARQL tutorial

Example step 3: Make statements

Interactions.txt

@prefix prot:<purl.uniprot.org/uniprot/>
prot:P32234 interacts_with prot:Q9VGZ4 .
prot:P32234 interacts_with prot:P25724 .
prot:P32234 interacts_with prot:Q9V3H7 .
prot:P42643 interacts_with prot:Q00403 .
prot:P42643 interacts_with prot:P23312 .
prot:P42643 interacts_with prot:P31928 .
prot:P41932 interacts_with prot:Q9NAE1 .
prot:P41932 interacts_with prot:Q9TYY1 .
prot:P41932 interacts_with prot:Q10666 .
prot:P41932 interacts_with prot:Q21921 .

Page 21: RDF: what and why plus a SPARQL tutorial

Example step 4: Use URIs for properties

Interactions.ttl

@prefix prot:<purl.uniprot.org/uniprot/>
@prefix core:<purl.uniprot.org/core/>
prot:P32234 core:interacts_with prot:Q9VGZ4 .
prot:P32234 core:interacts_with prot:P25724 .
prot:P32234 core:interacts_with prot:Q9V3H7 .
prot:P42643 core:interacts_with prot:Q00403 .
prot:P42643 core:interacts_with prot:P23312 .
prot:P42643 core:interacts_with prot:P31928 .
prot:P41932 core:interacts_with prot:Q9NAE1 .
prot:P41932 core:interacts_with prot:Q9TYY1 .
prot:P41932 core:interacts_with prot:Q10666 .
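The four steps can be sketched as a small conversion script. This is only an illustration under assumptions: the input is assumed to have exactly two tab-separated accessions per row, the prefix URIs are written with an explicit `http://` scheme and a terminating dot (which a complete Turtle file needs), and `interacts_with` is the illustrative property name used on the slides.

```python
PROT = "http://purl.uniprot.org/uniprot/"
CORE = "http://purl.uniprot.org/core/"


def interactions_to_turtle(tab_text):
    """Turn 'accession<TAB>accession' rows into Turtle statements."""
    lines = [f"@prefix prot: <{PROT}> .", f"@prefix core: <{CORE}> .", ""]
    for row in tab_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        left, right = row.split("\t")
        # Steps 1 and 2: prefixed URIs for subject and object.
        # Steps 3 and 4: a statement with a URI as the property.
        lines.append(
            f"prot:{left.strip()} core:interacts_with prot:{right.strip()} ."
        )
    return "\n".join(lines) + "\n"


sample = "P32234\tQ9VGZ4\nP32234\tP25724\nP42643\tQ00403\n"
turtle = interactions_to_turtle(sample)
```

Printing `turtle` gives two prefix declarations followed by one statement per input row.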

Page 22: RDF: what and why plus a SPARQL tutorial

RDF What? Quick recap

• RDF describes data with statements (aka triples)
  – statement = subject + predicate + object
  – related statements form a directed graph
• RDF uses URIs to identify things:
  – subjects and predicates must be URIs
  – objects may be URIs or literal values
• Multiple serialisation formats that are 99.999999% automatically convertible

Page 23: RDF: what and why plus a SPARQL tutorial

Why RDF? Isn’t there a simpler solution?

Page 24: RDF: what and why plus a SPARQL tutorial

A very simple example: FASTA

• Why does everyone in the sequence world use FASTA?

Page 25: RDF: what and why plus a SPARQL tutorial

A very simple example: FASTA

• Why does everyone in the sequence world use FASTA?
  – The smallest common denominator
  – You can put in the header what you like and I can choose to ignore it
• BUT: You only get a sequence...

>Who|cares_about:this?
THISISWHATWEWANT

Page 26: RDF: what and why plus a SPARQL tutorial

A simple example: GFF

• Some people want to exchange more than sequences, and invented GFF:
• BUT: ...

SEQ1 EMBL atg 103 105 . + 0
SEQ1 EMBL exon 103 172 . + 0

Page 27: RDF: what and why plus a SPARQL tutorial

A simple example: GFF

• Some people want to exchange more than sequences, and invented GFF:
• BUT: What do the columns mean?
  – Originally, an exchange format for sequence feature descriptions, later also used for other annotations
  – 3 versions known (to me ;)
  – Not extendable without prior agreement of all users

SEQ1 EMBL atg 103 105 . + 0
SEQ1 EMBL exon 103 172 . + 0

Page 28: RDF: what and why plus a SPARQL tutorial

A proper solution: XML

• There is a world beyond sequences and bioinformatics!

• XML is an IT-industry standard
  – Datatypes
  – Multi namespaces
  – Schemas
• BUT:
  – Hierarchical data model
  – Schemas close extension

Page 29: RDF: what and why plus a SPARQL tutorial

XML represents data as a tree

• XML datatypes
  – Multi namespace
  – XML Schema closes extensions
• Tree format

[Tree diagram with nodes: entry, active, Proton acceptor, 196, EC, 2.7.11.-]

Page 30: RDF: what and why plus a SPARQL tutorial

No XML standard for other relationships

• XML datatypes
  – Multi namespace
  – XML Schema closes extensions
• Tree format

[Tree diagram with nodes: entry, active, Proton acceptor, 196, EC, 2.7.11.-]

Page 31: RDF: what and why plus a SPARQL tutorial

Our data is a graph!

[Graph diagram with nodes: entry, active, Proton acceptor, 196, EC, 2.7.11.-]

Page 32: RDF: what and why plus a SPARQL tutorial

RDF advantages

• W3C standard
• Can be serialized as XML or JSON
  – i.e. most benefits of XML or JSON
• Generic graph structure
• URIs as a standard way to identify resources and their properties
  – data integration without name clashes
  – distributed annotation
  – normalization
• Extensible!

Page 33: RDF: what and why plus a SPARQL tutorial

RDF is extensible

• Anyone can say Anything about Anything
  – You can say something about my data
• RDF extensions remain compatible
• RDF encourages data and schema reuse

Interactions.ttl

@prefix prot:<purl.uniprot.org/uniprot/>
@prefix intact:<fake.ebi.ac.uk/intact/example>
prot:P32234 intact:interacts_with prot:Q9VGZ4
prot:P32234 intact:interacts_with prot:P25724

Page 34: RDF: what and why plus a SPARQL tutorial

RDF data model is simple

• Everything can be said with triples

• Generic triple stores
  – low maintenance data integration

• SPARQL
  – SQL for RDF
  – XPath for RDF
  – Regular expressions for RDF

Page 35: RDF: what and why plus a SPARQL tutorial

Comparison

                     Flat file   XML   RDF
Standard             NO          YES   YES
Scalable             NO          YES   YES +
Extendable           NO          NO    YES
Generic data model   NO          NO    YES

Page 36: RDF: what and why plus a SPARQL tutorial

Modeling data using RDF

Page 37: RDF: what and why plus a SPARQL tutorial

Most common failure in RDF world: Philosophy over pragmatism

1. Be honest about your data
   • what you have: not what you want

2. Change the concept, change the IRI
   • One concept can be referred to by multiple IRIs

3. Better to “todo” than to “debate”

Page 38: RDF: what and why plus a SPARQL tutorial

Model real data, not the “real world”

• Describe  records  that  relate  to  real  world  things  

• Acknowledge  that  they  are  records  

• Model  measurements  before  “facts”

Page 39: RDF: what and why plus a SPARQL tutorial

Example: mouse in a lab

1.5g

<weight>

Page 40: RDF: what and why plus a SPARQL tutorial

Example: mouse in a lab

1.5g

<weight>

20g

<weight>

Page 41: RDF: what and why plus a SPARQL tutorial

TIME it made you a liar

Page 42: RDF: what and why plus a SPARQL tutorial

Example: mouse in a lab

[Diagram: the mouse has two <measurement> nodes, _:1 and _:2, each carrying a <weight> and an <age>; values shown: 1.5g, 20g, 1week, 3week]

Page 43: RDF: what and why plus a SPARQL tutorial

Describing models using OWL

Page 44: RDF: what and why plus a SPARQL tutorial

OWL: Web Ontology Language

• Will  be  presented  in  detail  during  the  week  

• Logical  meaning  added  to  RDF  statements  

• That  tools  use  

• Classifies  existing  data  or  infers  new  data  

• Very  powerful  and  useful

Page 45: RDF: what and why plus a SPARQL tutorial

DANGER

It  is  pure  Logic  (first order)  

Page 46: RDF: what and why plus a SPARQL tutorial

Classification by restricting set membership

<human> a owl:Class ;
    rdfs:subClassOf [ owl:onProperty <legs> ; owl:cardinality 2 ] ;
    rdfs:subClassOf [ owl:onProperty <brains> ; owl:cardinality 1 ] ;
    rdfs:subClassOf [ owl:onProperty <referenceGenome> ; owl:allValuesFrom <HGCHR_genome> ] ;
    …

Page 47: RDF: what and why plus a SPARQL tutorial

Classification by restricting set membership

<human> a owl:Class ;
    rdfs:subClassOf [ owl:onProperty <legs> ; owl:cardinality 2 ] ;
    rdfs:subClassOf [ owl:onProperty <brains> ; owl:cardinality 1 ] ;
    rdfs:subClassOf [ owl:onProperty <referenceGenome> ; owl:allValuesFrom <HGCHR_genome> ] ;
    …

Lose a leg → no longer human

Page 48: RDF: what and why plus a SPARQL tutorial

Validating RDF Data

Page 49: RDF: what and why plus a SPARQL tutorial

W3C workgroup in progress

• Data Shapes

• You don’t want to know how the sausage is made…

• Vendors looking forward to implementing it

• Currently not that bad, could be better

• First Working Draft

Page 50: RDF: what and why plus a SPARQL tutorial

SPARQL

Page 51: RDF: what and why plus a SPARQL tutorial

Why provide a public SPARQL endpoint

• A 10-person wet laboratory cannot afford:
  – to host its own in-house database holding all, or even a fraction, of all life science data
  – not to have access to, and make use of, existing life science information

Page 52: RDF: what and why plus a SPARQL tutorial

← Not CPU Time...But Brain Time

The right kind of optimisation

Page 53: RDF: what and why plus a SPARQL tutorial

Why provide a public SPARQL endpoint

• Classical SQL can be provided on the web
  – Is not practical
  – No federation
  – Poor standards conformance
• Local SQL is expensive
• Local JSON is no better
• Nor is local XML

Page 54: RDF: what and why plus a SPARQL tutorial

Data Integration Traditional

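[Diagram: Pathway.txt with its Pathway Parser and Pathway Schema, UniProt.txt with its UniProt Parser and UniProt Schema, and Own Lab Data feed a Data warehouse that answers SQL queries; six parts of the pipeline are marked with a cost ($)]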

Page 55: RDF: what and why plus a SPARQL tutorial

Data Integration RDF/SPARQL

[Diagram: Pathway.rdf, UniProt.rdf and Own Lab Data feed a Triple Store that answers SPARQL Queries; cost markers: $ and $?]

Page 56: RDF: what and why plus a SPARQL tutorial

Why provide a public SPARQL endpoint

• Document-centric REST is not enough
  – Swiss-Prot available as REST
  – (over e-mail !!) since 1986
  – expasy.ch since 1993
  – www.uniprot.org since 2002
• Most users use a GUI, not a CLI
• Developers build GUIs on a CLI

Page 57: RDF: what and why plus a SPARQL tutorial

© 2015 SIB

Page 58: RDF: what and why plus a SPARQL tutorial

Page 59: RDF: what and why plus a SPARQL tutorial
Page 61: RDF: what and why plus a SPARQL tutorial

[Chart: queries per month from 2015-01 to 2015-08, axis ticks at 100, 10'000 and 1'000'000; legend: queries, ask, select, construct, describe]

Queries per month in 2015; peak: 4 million per month

Page 62: RDF: what and why plus a SPARQL tutorial

Real users

Mix between hard analytics and super specific

Estimate somewhere between: 300 - 1000 real humans per month

We know they are real because they take holidays ;)

Page 63: RDF: what and why plus a SPARQL tutorial

Using the Semantic Web for faster (Bio-) Research

Page 64: RDF: what and why plus a SPARQL tutorial

Exercises with SPARQL

tutorial.sparql.uniprot.org

Page 65: RDF: what and why plus a SPARQL tutorial

Why learn SPARQL

• Standardised formal query language
  – implementation independent
• SPARQL ➔ SQL (via R2RML)
• SPARQL ➔ webservice (via SADI)
• SPARQL ➔ LDAP (e.g. SquirrelRDF)
• SPARQL ➔ RDF (triplestore e.g. OWLIM-se)
• SPARQL ➔ HADOOP/HIVE (e.g. SHARD)
• SPARQL ➔ Linked Data Fragments
  – How you query is independent of how you store!
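Because the query language and its HTTP protocol are standardised, any HTTP client can talk to any endpoint, whatever storage sits behind it. A minimal sketch using only the Python standard library; the endpoint URL `https://sparql.uniprot.org/sparql` and the helper name `build_sparql_request` are assumptions made for illustration, and the request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format for SELECT queries.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


QUERY = """PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?taxon WHERE { ?taxon a up:Taxon } LIMIT 5"""

# Assumed endpoint URL; swap in any other SPARQL endpoint.
request = build_sparql_request("https://sparql.uniprot.org/sparql", QUERY)
# To run it: urllib.request.urlopen(request), then parse the JSON body.
```

The same function works unchanged against an endpoint backed by a triple store, a relational database behind R2RML, or anything else in the list above.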

Page 66: RDF: what and why plus a SPARQL tutorial

Apparently it helps kill vampires !!!

Page 67: RDF: what and why plus a SPARQL tutorial

It's SPARQLy mammal time !!

Page 68: RDF: what and why plus a SPARQL tutorial

Let's look at a single taxon record: www.uniprot.org/taxonomy/9993

Page 69: RDF: what and why plus a SPARQL tutorial

Let's look at a single taxon record: www.uniprot.org/taxonomy/9993

Page 70: RDF: what and why plus a SPARQL tutorial
Page 71: RDF: what and why plus a SPARQL tutorial

@base <http://purl.uniprot.org/taxonomy/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix up: <http://purl.uniprot.org/core/> .
<9993> rdf:type up:Taxon ;
    up:rank up:Species ;
    up:reviewed true ;
    up:mnemonic "MARMR" ;
    up:scientificName "Marmota marmota" ;
    up:commonName "Alpine marmot" ;
    up:otherName "European marmot" ;
    rdfs:seeAlso <http://animaldiversity.ummz.umich.edu/site/accounts/information/Marmota_marmota.html> ,
        <http://www.alphagalileo.org/Organisations/ViewItem.aspx?OrganisationId=2043&ItemId=70106&CultureCode=en> ,
        <http://www.arkive.org/alpine-marmot/marmota-marmota/info.html> ,
        <http://www.biolib.cz/en/taxon/id20598/> ,

Page 72: RDF: what and why plus a SPARQL tutorial

Turtle is the RDF serialization aligned with SPARQL

• Shorthand to avoid typing so much
  – . ‘dot’ is end statement
  – ; ‘semi-colon’ is repeat subject
  – , ‘comma’ is repeat subject and predicate
• prefix
  – before ‘:’ is abbreviation of URI
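One way to see what the shorthand stands for is to expand it back into full triples. The sketch below is not a Turtle parser: it assumes the statements are already held in a nested dictionary (subject, then predicate, then list of objects), and the helper names `expand` and `flatten` are invented for illustration. The sample values are taken from the marmot record.

```python
PREFIXES = {
    "taxon": "http://purl.uniprot.org/taxonomy/",
    "up": "http://purl.uniprot.org/core/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(name):
    """Expand a prefixed name such as 'up:Taxon' into a full URI."""
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


def flatten(description):
    """Undo the '.', ';' and ',' shorthand: emit one full triple per value."""
    triples = []
    for subject, properties in description.items():    # '.' ends a subject block
        for predicate, objects in properties.items():  # ';' repeats the subject
            for obj in objects:                        # ',' repeats subject and predicate
                triples.append((expand(subject), expand(predicate), obj))
    return triples


marmot = {
    "taxon:9993": {
        "up:scientificName": ['"Marmota marmota"'],
        "rdfs:seeAlso": [
            "<http://www.arkive.org/alpine-marmot/marmota-marmota/info.html>",
            "<http://www.biolib.cz/en/taxon/id20598/>",
        ],
    }
}

triples = flatten(marmot)
```

Three values under one subject become three complete triples, each repeating the full subject URI that the shorthand saved you from typing.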

Page 73: RDF: what and why plus a SPARQL tutorial

Why don’t these queries work elsewhere?

• PREFIX
  – On the web you often have to add these
  – But some can be preconfigured

PREFIX : <http://purl.uniprot.org/core/>
SELECT ?x
FROM <http://purl.uniprot.org/taxonomy/>
WHERE { ?x a :Taxon }

Page 74: RDF: what and why plus a SPARQL tutorial

a = rdf:type = <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>

Page 75: RDF: what and why plus a SPARQL tutorial

<9993> rdf:type up:Taxon ;
    up:rank up:Species ;
    up:reviewed true ;
    up:mnemonic "MARMR" ;
    up:scientificName "Marmota marmota" ;
    up:commonName "Alpine marmot" ;
    up:otherName "European marmot" ;
    rdfs:subClassOf <9992> ;
    skos:narrowerTransitive <9994> ;

rdfs:subClassOf taxon:9994 is a more specific classification than

Page 76: RDF: what and why plus a SPARQL tutorial

<9993> rdf:type up:Taxon ;
    up:rank up:Species ;
    up:reviewed true ;
    up:mnemonic "MARMR" ;
    up:scientificName "Marmota marmota" ;
    up:commonName "Alpine marmot" ;
    up:otherName "European marmot" ;
    rdfs:subClassOf <9992> ;
    skos:narrowerTransitive <9994> ;

rank => “The level, for nomenclatural purposes, of a taxon in a taxonomic hierarchy”

Page 77: RDF: what and why plus a SPARQL tutorial

Let's learn SPARQL

• Queries over RDF data.
  – Four basic types
• SELECT
  – Returns “tab delimited” results
• CONSTRUCT
  – Makes new triples
• DESCRIBE
  – Returns all triples mentioning a resource
• ASK
  – Returns true or false

Page 78: RDF: what and why plus a SPARQL tutorial

SPARQL:queries triple pattern

taxon:9606 rdf:type core:Taxon .

Page 79: RDF: what and why plus a SPARQL tutorial

SPARQL:queries triple pattern

?anyTaxon rdf:type core:Taxon .

Page 80: RDF: what and why plus a SPARQL tutorial

SPARQL:queries triple pattern

SELECT ?anyTaxon WHERE {
  ?anyTaxon rdf:type core:Taxon .
}

Page 81: RDF: what and why plus a SPARQL tutorial

SPARQL:queries triple pattern

taxon:9606 rdf:type core:Taxon .
taxon:9606 core:reviewed "true" .

Page 82: RDF: what and why plus a SPARQL tutorial

SPARQL:queries triple pattern

?anyTaxon rdf:type core:Taxon .
?anyTaxon core:reviewed "true" .

Page 83: RDF: what and why plus a SPARQL tutorial

SPARQL:queries triple pattern

SELECT ?anyTaxon WHERE {
  ?anyTaxon rdf:type core:Taxon .
  ?anyTaxon core:reviewed "true" .
}

Page 84: RDF: what and why plus a SPARQL tutorial

SPARQL:queries triple pattern

SELECT ?anyTaxon WHERE {
  ?anyTaxon rdf:type core:Taxon .
  ?anyTaxin core:reviewed "true" .
}

Page 85: RDF: what and why plus a SPARQL tutorial

SPARQL:queries triple pattern

SELECT ?anyTaxon WHERE {
  ?anyTaxon rdf:type core:Taxon .
  $anyTaxon core:reviewed "true" .
}
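What a SELECT over triple patterns does can be imitated with a toy matcher over a set of tuples. This is a sketch for intuition only, not how a real SPARQL engine is implemented; the data is a made-up sample, and `taxon:example` is a hypothetical unreviewed entry. It also shows why a misspelled variable (`?anyTaxin`) silently changes the answer, and why `$anyTaxon` and `?anyTaxon` are the same variable.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith(("?", "$")):
            name = term[1:]  # the '?' or '$' is not part of the variable name
            if result.setdefault(name, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant does not match
    return result


def select(patterns, data):
    """Join all patterns (implicit AND): keep bindings satisfying every one."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in data
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# Toy data; taxon:example is a hypothetical entry that is not reviewed.
data = {
    ("taxon:9606", "rdf:type", "core:Taxon"),
    ("taxon:9606", "core:reviewed", "true"),
    ("taxon:9993", "rdf:type", "core:Taxon"),
    ("taxon:9993", "core:reviewed", "true"),
    ("taxon:example", "rdf:type", "core:Taxon"),
}

is_taxon = ("?anyTaxon", "rdf:type", "core:Taxon")
reviewed = select([is_taxon, ("?anyTaxon", "core:reviewed", "true")], data)
typo = select([is_taxon, ("?anyTaxin", "core:reviewed", "true")], data)
dollar = select([is_taxon, ("$anyTaxon", "core:reviewed", "true")], data)
```

With the typo the two patterns no longer share a variable, so every taxon is paired with every reviewed entry: six rows instead of two, and no error message.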

Page 86: RDF: what and why plus a SPARQL tutorial

tutorial.sparql.uniprot.org

Page 87: RDF: what and why plus a SPARQL tutorial

1: Select all taxon from NCBI/UniProt taxonomy

• Taxonomy at www|sparql.uniprot.org

• Matches NCBI

• Time sync

• Adds more names

• Adds images

Page 88: RDF: what and why plus a SPARQL tutorial

Page 89: RDF: what and why plus a SPARQL tutorial

Let's learn SPARQL

Shorthand: a = rdf:type

Page 90: RDF: what and why plus a SPARQL tutorial

2: AND join (default)

Page 91: RDF: what and why plus a SPARQL tutorial

3: Shortcuts

Page 92: RDF: what and why plus a SPARQL tutorial

Remember ‘;’ shortcut

Page 93: RDF: what and why plus a SPARQL tutorial

4: Two variables one output column

Page 94: RDF: what and why plus a SPARQL tutorial

5: Optional

• When values may be missing
  – yet interesting when they are there
• Use as sub query
• bound values from outside stay bound inside
  – ?x ?y ?z . OPTIONAL {?x ?b ?c}
• ?x same variable = same thing
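OPTIONAL behaves like a left join over solutions: every row from the required part is kept, and extended only when the optional part has a compatible match. The sketch below models solutions as dictionaries and ignores FILTERs inside the optional block; `taxon:example` is a hypothetical taxon without a common name.

```python
def compatible(a, b):
    """Two solutions agree if every variable they share has the same value."""
    return all(b[var] == value for var, value in a.items() if var in b)


def left_join(required, optional):
    """Keep every required solution; extend it when an optional one matches."""
    results = []
    for row in required:
        extended = [{**row, **extra} for extra in optional if compatible(row, extra)]
        results.extend(extended or [row])  # no match: keep the row, variable unbound
    return results


# ?taxon a up:Taxon .
taxa = [{"?taxon": "taxon:9993"}, {"?taxon": "taxon:example"}]
# OPTIONAL { ?taxon up:commonName ?name }
names = [{"?taxon": "taxon:9993", "?name": "Alpine marmot"}]

rows = left_join(taxa, names)
```

The second row comes back without a `?name` key, which is the toy equivalent of an unbound variable in the result table.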

Page 95: RDF: what and why plus a SPARQL tutorial

5: OPTIONAL commonName

Page 96: RDF: what and why plus a SPARQL tutorial

6: UNION

• Allows  you  to  combine  query  patterns  as  an  OR  operation.  

• Joins  are  still  from  outer  to  inner.  

Page 97: RDF: what and why plus a SPARQL tutorial

UNION

Page 98: RDF: what and why plus a SPARQL tutorial

Negation

• When you do not want a certain category of matches.

SELECT ?pet WHERE {
  ?pet a pets:Friendly .
}

Page 99: RDF: what and why plus a SPARQL tutorial

Oooops

Page 100: RDF: what and why plus a SPARQL tutorial

7: Not exists (Negation 1)

Page 101: RDF: what and why plus a SPARQL tutorial

8: Minus (Negation 2)

Page 102: RDF: what and why plus a SPARQL tutorial

MINUS{} or FILTER (NOT EXISTS{})

• What's the difference?
  – MINUS subtracts results
  – NOT EXISTS tests if the sub pattern is possible at all.
• Normally the faster option.
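The difference shows up when the inner pattern shares no variable with the outer one, which is what the exercise titles “MINUS all data” and “FILTER (NOT EXISTS{}) no results” suggest. A toy model over solution dictionaries, with made-up data; real NOT EXISTS evaluates the inner pattern with the outer bindings substituted, which this sketch approximates with a compatibility check.

```python
def compatible(a, b):
    """Two solutions agree if every variable they share has the same value."""
    return all(b[var] == value for var, value in a.items() if var in b)


def minus(solutions, removed):
    """Drop a solution only if a compatible one shares at least one variable."""
    return [
        s for s in solutions
        if not any(not set(s).isdisjoint(r) and compatible(s, r) for r in removed)
    ]


def filter_not_exists(solutions, inner):
    """Drop a solution whenever the inner pattern can match alongside it."""
    return [s for s in solutions if not any(compatible(s, r) for r in inner)]


triples = [
    ("taxon:9993", "up:rank", "up:Species"),
    ("taxon:9992", "up:rank", "up:Genus"),
]
outer = [{"?s": s, "?p": p, "?o": o} for s, p, o in triples]    # { ?s ?p ?o }
unshared = [{"?x": s, "?y": p, "?z": o} for s, p, o in triples]  # { ?x ?y ?z }
shared = [{"?s": "taxon:9992"}]                                  # shares ?s

all_data = minus(outer, unshared)                 # nothing in common: nothing removed
no_results = filter_not_exists(outer, unshared)   # pattern always matches: all removed
```

When the inner pattern does share a variable, as in `shared`, both functions remove the same row.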

Page 103: RDF: what and why plus a SPARQL tutorial

9: MINUS all data

Page 104: RDF: what and why plus a SPARQL tutorial

10: FILTER (NOT EXISTS{}) no results

Page 105: RDF: what and why plus a SPARQL tutorial

11: Negation option 3 SPARQL 1.0

SELECT ?subject ?rank
WHERE {
  ?subject up:rank ?rank .
  OPTIONAL {
    ?subject up:rank up:Genus .
    ?subject up:rank ?genus .
  }
  FILTER(! BOUND(?genus))
}

Page 106: RDF: what and why plus a SPARQL tutorial
Page 107: RDF: what and why plus a SPARQL tutorial

FILTERS

• You just saw it twice
  – Once in the !BOUND
  – Once in the NOT EXISTS
• FILTERs restrict a result set by possibly removing values
  – FILTERs do not add values to the result
• Inside the same graph pattern, FILTERs are order independent.

Page 108: RDF: what and why plus a SPARQL tutorial

12: Filter

Page 109: RDF: what and why plus a SPARQL tutorial

13: Filter on not in

Page 110: RDF: what and why plus a SPARQL tutorial

Using implicit AND between lines

Page 111: RDF: what and why plus a SPARQL tutorial

Using implicit AND between lines

Page 112: RDF: what and why plus a SPARQL tutorial

15: FILTER IN

Page 113: RDF: what and why plus a SPARQL tutorial

16: FILTER using OR

Page 114: RDF: what and why plus a SPARQL tutorial

FILTER on numbers

• <
  – FILTER (1 < 2) (17)
• >
  – FILTER (2 > 1) (18)
• =
  – FILTER (1 = 1) (19)
• !=
  – FILTER (1 != 2) (20)

Page 115: RDF: what and why plus a SPARQL tutorial

Filters

• ?x = ?y does casting (value conversions) (21)
  – "1.0"^^xsd:float = "1"^^xsd:int is true
• sameTerm(?x, ?y) does not (22)
  – sameTerm("1.0"^^xsd:float, "1"^^xsd:int) is false
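The distinction can be mimicked by modelling a literal as a pair of lexical form and datatype. This is only an analogy: `value_equal` below handles numeric literals by converting both sides to a Python float, which is far cruder than the type promotion rules of SPARQL itself.

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    lexical: str   # the characters as written, e.g. "1.0"
    datatype: str  # the datatype URI, e.g. xsd:float


def value_equal(x, y):
    """Like '=': compare the values after conversion (numeric literals only)."""
    return float(x.lexical) == float(y.lexical)


def same_term(x, y):
    """Like sameTerm(): identical lexical form and identical datatype."""
    return x == y


one_float = Literal("1.0", XSD + "float")
one_int = Literal("1", XSD + "int")

equal_by_value = value_equal(one_float, one_int)  # True
equal_by_term = same_term(one_float, one_int)     # False
```

`same_term` is the cheaper and stricter test; use value equality when differently typed numbers should count as the same.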

Page 116: RDF: what and why plus a SPARQL tutorial

Functions for use in filters and binds

• Functions  

– STRLEN  

– SUBSTR  

– UCASE  

– LCASE  

– STRSTARTS  

– STRENDS  

– CONTAINS  

– STRBEFORE  

– STRAFTER  

– ENCODE_FOR_URI  

– CONCAT  

– langMatches  

– REGEX  

– REPLACE  

– IRI  

– STR

Page 117: RDF: what and why plus a SPARQL tutorial

24: SUBSTR == substring

Page 118: RDF: what and why plus a SPARQL tutorial

24: STRLEN == String Length

Page 119: RDF: what and why plus a SPARQL tutorial

25: CONTAINS: is it in there? (case sensitive)

Page 120: RDF: what and why plus a SPARQL tutorial

26: REGEX, just like java|python regex

Page 121: RDF: what and why plus a SPARQL tutorial

BIND

• Builds new values
  – Closes the basic graph pattern (22)
• Always declare before use.

SELECT ?p WHERE {
  {
    ?taxon a :Taxon .
  }
  BIND (?taxon AS ?p)
}

Page 122: RDF: what and why plus a SPARQL tutorial

BIND existing variable to a new one

Page 123: RDF: what and why plus a SPARQL tutorial
Page 124: RDF: what and why plus a SPARQL tutorial

27: CONCAT

Page 125: RDF: what and why plus a SPARQL tutorial

BIND can assign any output

Page 126: RDF: what and why plus a SPARQL tutorial

Aggregate functions

• on  select  line  

• limited  in  number  

– count  

– sum  

– avg  

– min  

– max  

– group_concat

– sample

Page 127: RDF: what and why plus a SPARQL tutorial

© 2013 SIB

30: count

Page 128: RDF: what and why plus a SPARQL tutorial

31: SAMPLE should give a random result back

Page 129: RDF: what and why plus a SPARQL tutorial

Follow the path

Page 130: RDF: what and why plus a SPARQL tutorial

32: Path queries

Page 131: RDF: what and why plus a SPARQL tutorial

33: Finding a grand parent using normal joins

Page 132: RDF: what and why plus a SPARQL tutorial

34: Finding a grandParent using a path query

Page 133: RDF: what and why plus a SPARQL tutorial

35: | is OR for predicate

Page 134: RDF: what and why plus a SPARQL tutorial

36: Same result with UNION

Page 135: RDF: what and why plus a SPARQL tutorial

37: Finding any ancestor

Page 136: RDF: what and why plus a SPARQL tutorial

38: Can use the variable in a normal join afterwards
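The path exercises can be pictured with a plain dictionary of parent links. A grandparent is two fixed hops (like `rdfs:subClassOf/rdfs:subClassOf`), while “any ancestor” follows the link one or more times (like `rdfs:subClassOf+`). Only the link from 9993 to 9992 comes from the record shown earlier; `ex:family` and `ex:order` are hypothetical placeholders.

```python
# child -> list of direct parents; ex:family and ex:order are made up.
parent_of = {
    "taxon:9993": ["taxon:9992"],
    "taxon:9992": ["ex:family"],
    "ex:family": ["ex:order"],
}


def grandparents(node, links):
    """Exactly two hops: the parents of the parents."""
    return [g for p in links.get(node, []) for g in links.get(p, [])]


def ancestors(node, links):
    """One or more hops: keep following parent links until none remain."""
    seen = []
    frontier = list(links.get(node, []))
    while frontier:
        current = frontier.pop()
        if current not in seen:  # also guards against cycles
            seen.append(current)
            frontier.extend(links.get(current, []))
    return seen


grand = grandparents("taxon:9993", parent_of)
lineage = ancestors("taxon:9993", parent_of)
```

The join version needs one extra pattern per level, whereas the path version does not need to know the depth in advance.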

Page 137: RDF: what and why plus a SPARQL tutorial

GROUP BY

Page 138: RDF: what and why plus a SPARQL tutorial

GROUP BY

• Needed  for  aggregate  values  

• After  closing  the  where  clause  

– ...  WHERE  {?x  ?y  ?z}  GROUP  BY  ?x

Page 139: RDF: what and why plus a SPARQL tutorial

39: GROUP BY

Page 140: RDF: what and why plus a SPARQL tutorial

HAVING

I have carrot !

Page 141: RDF: what and why plus a SPARQL tutorial

HAVING

• FILTER  for  aggregates    

• After  the  GROUP  BY  clause  

– ...  GROUP  BY  ?x  HAVING  (count(?y)  >  2)  

– ...  GROUP  BY  ?x  HAVING  (min(?y)  =  2)  

– etc...
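GROUP BY with an aggregate, followed by HAVING, amounts to counting per group and then filtering on the count. A toy sketch over (rank, taxon) rows; the rows are illustrative and the ranks have not been checked against UniProt.

```python
from collections import Counter

# Toy (rank, taxon) rows, as if returned by: ?taxon up:rank ?rank
rows = [
    ("up:Species", "taxon:9993"),
    ("up:Species", "taxon:9606"),
    ("up:Species", "taxon:10090"),
    ("up:Genus", "taxon:9992"),
]

# SELECT ?rank (COUNT(?taxon) AS ?n) ... GROUP BY ?rank
counts = Counter(rank for rank, _taxon in rows)

# ... HAVING (COUNT(?taxon) > 2)
having = {rank: n for rank, n in counts.items() if n > 2}
```

A FILTER cannot do this job, because it runs on individual rows before the groups and their counts exist.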

Page 142: RDF: what and why plus a SPARQL tutorial

40: HAVING

Page 143: RDF: what and why plus a SPARQL tutorial

LIMITS & OFFSET

Page 144: RDF: what and why plus a SPARQL tutorial

41: LIMIT and OFFSET

• OFFSET skips the first x results

• LIMIT returns no more than x results
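In list terms, OFFSET and LIMIT are a slice. The sketch sorts first, since paging is only repeatable when the results have a defined order (ORDER BY in the query); the helper name `page` and the sample identifiers are chosen for illustration.

```python
def page(results, limit, offset=0):
    """ORDER BY, then skip `offset` rows, then return at most `limit` rows."""
    ordered = sorted(results)
    return ordered[offset:offset + limit]


results = ["taxon:9993", "taxon:9606", "taxon:10090", "taxon:9992"]

first_page = page(results, limit=2)             # rows 1 and 2
second_page = page(results, limit=2, offset=2)  # rows 3 and 4
```

Asking for a page beyond the end simply returns fewer rows, or none.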

Page 145: RDF: what and why plus a SPARQL tutorial

ORDER

Page 146: RDF: what and why plus a SPARQL tutorial

ORDER

Page 147: RDF: what and why plus a SPARQL tutorial

Page 148: RDF: what and why plus a SPARQL tutorial
Page 149: RDF: what and why plus a SPARQL tutorial

VALUES

• Super  BIND  

• Provide  inline  data

Page 150: RDF: what and why plus a SPARQL tutorial

Marmota marmota marmota

Page 151: RDF: what and why plus a SPARQL tutorial

Examples

• Parameter lists are between ()

VALUES (?annotation) {
  (core:Disease_Annotation)
  (core:Disulfide_Bond_Annotation)
}

Page 152: RDF: what and why plus a SPARQL tutorial

Examples

• UNDEF means no value at all: not bound

VALUES (?annotation ?begin) {
  (core:Disease_Annotation UNDEF)
  (core:Disulfide_Bond_Annotation 2)
}

Page 153: RDF: what and why plus a SPARQL tutorial

VALUES

• After declaring a set of values you can use them in your query.

SELECT ?comment
WHERE {
  VALUES (?annotation ?begin) {
    (core:Disease_Annotation UNDEF)
    (core:Disulfide_Bond_Annotation 2)
  }
  ?annotation rdfs:comment ?comment .
}

Page 154: RDF: what and why plus a SPARQL tutorial

SERVICE: Using other sparql endpoints

• SERVICE <URL of other endpoint>
  – Runs a sub query on the other endpoint and merges it back into your query.

Page 155: RDF: what and why plus a SPARQL tutorial

“Life is better with friends who understand you.”

Page 156: RDF: what and why plus a SPARQL tutorial

SERVICE

Page 157: RDF: what and why plus a SPARQL tutorial

SERVICE

• Useful
  – Quick experimenting with combining multiple data sources
  – Quick for queries where not too much data is sent to the remote endpoint
• Slow
  – When you ask for too much data
  – Remote endpoint not resourced for your questions

Page 158: RDF: what and why plus a SPARQL tutorial

SERVICE

• Slowly  improving  

• Theoretically  unfixable  

• Practically  could  be  much  better  

• 1000  x  speed  up  small  step  away

Page 159: RDF: what and why plus a SPARQL tutorial

Let's make some triples

Page 160: RDF: what and why plus a SPARQL tutorial

Construction

• CONSTRUCT  

– New  triples    

• downloads  RDF  

– Does  not  update  store

Page 161: RDF: what and why plus a SPARQL tutorial
Page 162: RDF: what and why plus a SPARQL tutorial

Constructing an owl:sameAs between two URIs

Page 163: RDF: what and why plus a SPARQL tutorial

INSERT

• Adds  data  

– like  construct

Page 164: RDF: what and why plus a SPARQL tutorial
Page 165: RDF: what and why plus a SPARQL tutorial

DELETE

• Removes  data  

– Triples  matching  are  removed  from  the  data  

– Triples  can  be  bound  using  where  clause

Page 166: RDF: what and why plus a SPARQL tutorial

DELETE

Page 167: RDF: what and why plus a SPARQL tutorial

DELETE INSERT

• Single  atomic  operation  

• Transactions  store  API  option
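The combined DELETE/INSERT can be pictured as one set operation: remove the matched triples first, then add the new ones, and expose the result in a single step. A toy sketch with an immutable set standing in for the store; the helper name `delete_insert` and the replacement value are chosen for illustration.

```python
def delete_insert(store, delete, insert):
    """Return the updated store: deletions are applied before insertions."""
    return (store - frozenset(delete)) | frozenset(insert)


store = frozenset({
    ("taxon:9993", "up:commonName", "Alpine marmot"),
    ("taxon:9993", "up:rank", "up:Species"),
})

updated = delete_insert(
    store,
    delete=[("taxon:9993", "up:commonName", "Alpine marmot")],
    insert=[("taxon:9993", "up:commonName", "European alpine marmot")],
)
# Readers see either `store` or `updated`, never a half-applied change.
```

Running DELETE and INSERT as two separate requests would leave a moment in which the taxon has no common name at all.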

Page 168: RDF: what and why plus a SPARQL tutorial

Atomic operation

Page 169: RDF: what and why plus a SPARQL tutorial

I’m exhausted now

Page 170: RDF: what and why plus a SPARQL tutorial

Of Course Biology is complicated

#baseURI: http://purl.uniprot.org/unirule/UR000107224
#Rule UR000107224 Created by:bridge on:2009-02-12 Modified by:rantunes on:2015-06-09
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uniprot:<http://purl.uniprot.org/uniprot/>
PREFIX sequence:<http://purl.uniprot.org/sequences/>
PREFIX unirule:<http://purl.uniprot.org/unirules/>
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX hamap-sparql:<http://example.org/hamap_sparql/>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX faldo:<http://biohackathon.org/resource/faldo#>
PREFIX method:<http://example.org/method/>
PREFIX keyword:<http://purl.uniprot.org/keywords/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX proteome:<http://purl.uniprot.org/proteomes/>
PREFIX hamap:<http://purl.uniprot.org/hamap/>
PREFIX annotation:<http://purl.uniprot.org/annotation/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
  ?this up:annotation ?annotation0, ?annotation1, ?annotation2, ?annotation3, ?annotation5;
    up:classifiedWith <http://purl.obolibrary.org/obo/19805>,
      <http://purl.obolibrary.org/obo/334>,
      <http://purl.obolibrary.org/obo/34354>,
      <http://purl.obolibrary.org/obo/43420>,
      <http://purl.obolibrary.org/obo/6569>,
      <http://purl.obolibrary.org/obo/8198>,
      keyword:223, keyword:560, keyword:662 .
  ?annotation0 a up:Function_Annotation;
    rdfs:comment "Catalyzes the oxidative ring opening of 3-hydroxyanthranilate to 2-amino-3-carboxymuconate semialdehyde, which spontaneously cyclizes to quinolinate." .

Page 171: RDF: what and why plus a SPARQL tutorial

Questions