
sparql.uniprot.org in production poster


[Figure: Usage of http://sparql.uniprot.org (y-axis 0 to 3000, unit not given; x-axis years 2014 to 2017)]

See also: www.isb-sib.org

See also: www.sib.swiss

Contact: [email protected], www.uniprot.org

The UniProt SPARQL Endpoint: 34 Billion Triples in Production

Jerven Bolleman¹, Sebastien Gehant¹, Thierry Lombardot¹, Alan Bridge¹, Ioannis Xenarios¹,²,³, Nicole Redaschi¹, and the UniProt Consortium¹,⁴,⁵

¹ Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva 4, Switzerland
² Vital-IT Group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Génopode, 1015 Lausanne, Switzerland
³ University of Lausanne, 1015 Lausanne, Switzerland
⁴ European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
⁵ Protein Information Resource (PIR), Georgetown University Medical Center, 3300 Whitehaven Street, NW, Suite 1200, Washington, DC 20007, USA

UniProt is mainly supported by the National Institutes of Health (NIH), National Human Genome Research Institute (NHGRI) and National Institute of General Medical Sciences (NIGMS) grant U41HG007822. Additional support for the EBI's involvement in UniProt comes from the NIH grant 2P41 HG02273. Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation (SERI). PIR's UniProt activities are also supported by the NIH grants 5R01GM080646-07, 3R01GM080646-07S1, 5G08LM010720-03, and 8P20GM103446-12, and the National Science Foundation (NSF) grant DBI-1062520.

UniProt on the web

UniProt is a comprehensive resource for protein sequence and annotation data. It has been available on the web since its creation in 2002, and its predecessors, Swiss-Prot and TrEMBL, for much longer.

UniProt on the semantic web

All UniProt data has been available in RDF since 2007 and can be downloaded in this format from the UniProt FTP site and through the www.uniprot.org REST interface. Since 2014, you can also query the data directly on our public SPARQL endpoint at sparql.uniprot.org.
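For scripts, a SPARQL endpoint is reached over plain HTTP using the W3C SPARQL 1.1 Protocol: the query travels as the URL-encoded "query" parameter of a GET request, and the Accept header selects the result format. The Python sketch below uses only the standard library and builds the request without sending it, so it runs offline. The full endpoint URL https://sparql.uniprot.org/sparql is an assumption (the poster only gives the host name).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the "/sparql" path on the announced host sparql.uniprot.org.
ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_sparql_get(query: str,
                     accept: str = "application/sparql-results+json") -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    # The Accept header asks the endpoint for a particular result serialization,
    # e.g. "text/csv" or "text/tab-separated-values" instead of JSON.
    return Request(url, headers={"Accept": accept}, method="GET")


QUERY = """PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein WHERE { ?protein a up:Protein } LIMIT 5"""

req = build_sparql_get(QUERY)
print(req.get_method(), req.full_url)

# To actually send it (requires network access):
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Building the Request separately from sending it keeps the encoding logic testable and lets you swap the Accept header per call.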

The UniProt data has grown eightfold over the last five years. UniProt release 2017_11 consists of 34 billion triples and requires just over 1.6 TB of disk space when loaded into Virtuoso 7.2, a columnar relational database that supports SPARQL.
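Two back-of-the-envelope numbers put these figures in perspective. They assume steady compound growth (the poster states only the five-year total) and decimal terabytes (1 TB = 10^12 bytes), with r the average yearly growth rate and b the average disk footprint per triple:

```latex
\begin{aligned}
r &= 8^{1/5} - 1 \approx 1.516 - 1 = 0.516 \quad (\approx 52\%\ \text{per year}) \\
b &= \frac{1.6 \times 10^{12}\ \text{bytes}}{3.4 \times 10^{10}\ \text{triples}} \approx 47\ \text{bytes per triple}
\end{aligned}
```

In other words, the data grew by roughly half again every year, and the loaded store averages on the order of 47 bytes per triple across everything Virtuoso keeps on disk.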

at your SERVICE

The SERVICE keyword lets you run part of a query on another SPARQL endpoint. For example, you can combine the UniProt and Ensembl endpoints to retrieve the coding exons of a protein:

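PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX database: <http://purl.uniprot.org/database/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>

SELECT ?protein ?transcript ?exon ?order
WHERE {
  # Local part, evaluated on sparql.uniprot.org: Ensembl cross-references
  ?protein rdfs:seeAlso ?transcript .
  ?transcript up:database database:Ensembl .
  # Remote part, evaluated on the Ensembl endpoint
  SERVICE <http://www.ebi.ac.uk/rdf/services/ensembl/sparql/> {
    ?transcript obo:SO_translates_to ?peptide .
    ?peptide a ensemblterms:protein .
    ?transcript obo:SO_has_part ?exon ;
                sio:SIO_000974 ?orderedPart .
    ?orderedPart sio:SIO_000628 ?exon ;
                 sio:SIO_000300 ?order .
  }
}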
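Conceptually, the SERVICE group is evaluated as its own graph pattern and its solutions are joined with the rest of the query on the variables they share, here ?transcript. The Python sketch below shows only that join step on in-memory data; every identifier in it (P1, T1, E1 and so on) is a hypothetical placeholder. Real engines may instead push local bindings to the remote endpoint so that it returns fewer rows.

```python
from typing import Dict, List

Binding = Dict[str, str]  # one solution mapping: variable name -> RDF term


def compatible(a: Binding, b: Binding) -> bool:
    """Two solution mappings are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Nested-loop join of two solution sequences (SPARQL algebra Join)."""
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in right
            if compatible(left_row, right_row)]


# Hypothetical solutions of the local UniProt part: protein -> transcript.
local = [
    {"protein": "uniprot:P1", "transcript": "ensembl:T1"},
    {"protein": "uniprot:P2", "transcript": "ensembl:T2"},
]
# Hypothetical solutions returned by the remote SERVICE part.
remote = [
    {"transcript": "ensembl:T1", "exon": "ensembl:E1", "order": "1"},
    {"transcript": "ensembl:T1", "exon": "ensembl:E2", "order": "2"},
    {"transcript": "ensembl:T3", "exon": "ensembl:E9", "order": "1"},
]

rows = join(local, remote)
for row in sorted(rows, key=lambda r: int(r["order"])):
    print(row["protein"], row["exon"], row["order"])
```

Only P1 survives, because T2 has no remote match and T3 has no local one; variables that are not shared simply combine, so two mappings with no variables in common join as a cross product.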

Using http://sparql.uniprot.org

This website contains example queries with brief explanations in English. You can download query results in a number of formats, including tab- and comma-separated values for use in Excel, R and other tools.
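If a script has already fetched results in the SPARQL 1.1 JSON results format, flattening them into a spreadsheet-friendly table takes only the Python standard library. The sketch below keeps just the lexical value of each term and leaves unbound variables empty; its output is plain delimited text, not the W3C TSV results format (which writes IRIs in angle brackets). The sample document and its example.org URIs are hypothetical.

```python
import csv
import io
import json


def sparql_json_to_delimited(doc: str, delimiter: str = "\t") -> str:
    """Flatten a SPARQL 1.1 JSON results document into CSV/TSV text.

    Term types, datatypes and language tags are dropped; unbound
    variables (absent from a binding) become empty cells.
    """
    data = json.loads(doc)
    variables = data["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(variables)  # header row: the projected variables
    for binding in data["results"]["bindings"]:
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()


# Hypothetical result document with one unbound ?name in the second row.
sample = json.dumps({
    "head": {"vars": ["protein", "name"]},
    "results": {"bindings": [
        {"protein": {"type": "uri", "value": "http://example.org/protein/1"},
         "name": {"type": "literal", "value": "hypothetical name"}},
        {"protein": {"type": "uri", "value": "http://example.org/protein/2"}},
    ]},
})

print(sparql_json_to_delimited(sample, delimiter=","))
```

Using the csv module rather than joining strings by hand means values containing commas, tabs or quotes are escaped correctly for Excel and R.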

SPARQL: A graph query language

SPARQL is a standard language for querying graph databases and looks a little like SQL. It is optimised for pattern matching and for queries that span several data sources. There are more than 40 compliant implementations of its latest version, the SPARQL 1.1 W3C Recommendation.

Hardware

Node 1: 64 CPU cores, 256 GB RAM, 8 TB consumer SSD

Node 2: 64 CPU cores, 256 GB RAM, 8 TB consumer SSD

Load balancer: Apache mod_proxy_balancer
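The load balancer spreads incoming queries across the two identical nodes. The poster does not say which balancing method is configured; the Python sketch below is modelled on the weighted request-counting scheme that the Apache documentation describes for lbmethod=byrequests, and is an illustration rather than Apache's code. The Worker class and pick function are hypothetical names; lbfactor and lbstatus mirror the terms used in that documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    name: str
    lbfactor: int = 1   # relative weight of this node
    lbstatus: int = 0   # running credit; the highest credit serves next
    served: int = 0     # number of requests handed to this node


def pick(workers: List[Worker]) -> Worker:
    """Weighted request counting: every worker earns its weight in credit,
    the worker with the most credit takes the request and pays back the
    total weight of all workers."""
    total = 0
    candidate: Optional[Worker] = None
    for w in workers:
        w.lbstatus += w.lbfactor
        total += w.lbfactor
        if candidate is None or w.lbstatus > candidate.lbstatus:
            candidate = w
    assert candidate is not None, "at least one worker is required"
    candidate.lbstatus -= total
    candidate.served += 1
    return candidate


# Two equal nodes, as on the poster: requests simply alternate.
nodes = [Worker("node1"), Worker("node2")]
order = [pick(nodes).name for _ in range(6)]
print(order)

# Hypothetical unequal weights: "large" receives three of every four requests.
weighted = [Worker("small", lbfactor=1), Worker("large", lbfactor=3)]
weighted_order = [pick(weighted).name for _ in range(4)]
print(weighted_order)
```

With equal weights the scheme reduces to round-robin; weights only matter if the nodes differ in capacity.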

Many more endpoints on the web

Triples: Simple sentences for complicated data

RDF describes information with many simple ‘sentences’. Each consists of a subject, a predicate and an object, which is why it is called a triple. Example:

<http://sparql.uniprot.org/> rdfs:comment "a free API for you" .
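Querying triples comes down to pattern matching: a SPARQL triple pattern is a triple in which some positions are variables, and a query succeeds when every pattern matches some triple with consistent variable values. The Python sketch below does this over an in-memory list, with terms kept as plain strings and prefixed names left unexpanded. Only the first triple comes from the poster; the ex:SparqlEndpoint triples and the example.org IRI are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None  # fixed term does not match
    return result


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern: all patterns must match consistently."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


graph = [
    ("<http://sparql.uniprot.org/>", "rdfs:comment", '"a free API for you"'),
    ("<http://sparql.uniprot.org/>", "rdf:type", "ex:SparqlEndpoint"),  # hypothetical
    ("<http://example.org/other>", "rdf:type", "ex:SparqlEndpoint"),    # hypothetical
]

# "Which endpoints have a comment, and what does it say?"
results = query(graph, [("?s", "rdf:type", "ex:SparqlEndpoint"),
                        ("?s", "rdfs:comment", "?c")])
print(results)
```

The shared variable ?s is what links the two patterns; a real SPARQL engine does the same work with indexes instead of scanning every triple.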


[Figure: SPARQL endpoints: communicating over HTTP (SERVICE); Your everyday tools: accessing endpoints over HTTP (SPARQL API); ChEMBL & more. Powered by Vital-IT.]
