RDF Analytics... SPARQL and Beyond

Page 1: RDF Analytics... SPARQL and Beyond

Digital Enterprise Research Institute www.deri.ie

Enabling networked knowledge

© Copyright 2011 Digital Enterprise Research Institute. All rights reserved.

@fsheer

Fadi Maali

RDF Analytics… SPARQL and Beyond…

[email protected]

Page 2: RDF Analytics... SPARQL and Beyond

Why analytics (1/2)

Page 3: RDF Analytics... SPARQL and Beyond

Why analytics (2/2)

Page 4: RDF Analytics... SPARQL and Beyond

Appetite Whetting (1/3)

Google accurately detects flu trends ahead of the U.S. Centers for Disease Control and Prevention (CDC).

http://www.google.org/flutrends/about/how.html

Page 5: RDF Analytics... SPARQL and Beyond

Appetite Whetting (2/3)

http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html

Page 6: RDF Analytics... SPARQL and Beyond

Appetite Whetting (3/3)

http://www.nature.com/srep/2011/111215/srep00196/full/srep00196.html

Flavor pyramids for North American and East Asian cuisines

Page 7: RDF Analytics... SPARQL and Beyond

Data Science and RDF

- Can we do “data science” using RDF data?
    - Do we have the data?
    - Do we have the tools?
- Why should we use RDF?

Page 8: RDF Analytics... SPARQL and Beyond

RDF Characteristics

- Graph data model
- Clearly defined semantics
- Supports Web-scale distributed publication

Page 9: RDF Analytics... SPARQL and Beyond

Available RDF Data

- Freebase (Google) has 1.2 billion triples
- The LOD Cloud has more than 31 billion triples
- Embedded RDF data: schema.org, Drupal…

http://lod-cloud.net/

Page 10: RDF Analytics... SPARQL and Beyond

Available RDF Tools

In this presentation we focus on SPARQL, the standard query language for RDF:
- W3C Recommendation
- Supports querying, transforming and updating RDF data
- Large number of available implementations
- Defines a communication protocol
- 427 public SPARQL endpoints registered on the DataHub*

* http://sw.deri.org/~aidanh/docs/epmonitorISWC.pdf
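The "communication protocol" point can be made concrete with a small sketch. The SPARQL Protocol lets a client send a query over HTTP GET by URL-encoding it into a `query` parameter. The snippet below only builds such a URL and sends nothing over the network; the endpoint address and the `http://example.org/name` property are placeholders assumed for illustration, not taken from the slides.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; replace with a real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Illustrative query; the property IRI is a made-up placeholder.
QUERY = "SELECT ?name WHERE { ?p <http://example.org/name> ?name } ORDER BY ?name"


def build_query_url(endpoint: str, query: str) -> str:
    """Return the URL of a SPARQL Protocol 'query via GET' request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

Fetching this URL from a live endpoint would normally also involve an `Accept` header naming the desired result format, which is left out of this sketch.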

Page 11: RDF Analytics... SPARQL and Beyond

RDF Data… a graph

Page 12: RDF Analytics... SPARQL and Beyond

SPARQL… Simple queries

SELECT ?name
WHERE {
    ?p :name ?name .
}
ORDER BY ?name
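To show what this query asks for without a triple store, here is a minimal Python sketch that treats RDF data as a list of (subject, predicate, object) tuples and matches the single triple pattern `?p :name ?name`. The `ex:` identifiers and the sample people are invented for illustration.

```python
# Toy data: (subject, predicate, object) triples; all identifiers are made up.
TRIPLES = [
    ("ex:carol", "ex:name", "Carol"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:knows", "ex:bob"),
]


def select_names(triples):
    """Match the pattern `?p ex:name ?name` and sort the bindings of ?name."""
    return sorted(o for s, p, o in triples if p == "ex:name")


names = select_names(TRIPLES)
print(names)
```

The list comprehension plays the role of the WHERE clause and `sorted` plays the role of ORDER BY.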

Page 13: RDF Analytics... SPARQL and Beyond

SPARQL… BI queries

SELECT ?gender (COUNT(*) AS ?count)
WHERE {
    ?p :gender ?gender
}
GROUP BY ?gender
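The same aggregation can be sketched in plain Python, which may help readers who know GROUP BY from SQL but not from SPARQL. The triples and the `ex:` names below are assumptions made for the example.

```python
from collections import Counter

# Toy data; identifiers and values are made up for illustration.
TRIPLES = [
    ("ex:alice", "ex:gender", "female"),
    ("ex:bob", "ex:gender", "male"),
    ("ex:carol", "ex:gender", "female"),
    ("ex:alice", "ex:name", "Alice"),
]


def count_by_gender(triples):
    """Group the matches of `?p ex:gender ?gender` by ?gender and count them."""
    return Counter(o for s, p, o in triples if p == "ex:gender")


gender_counts = count_by_gender(TRIPLES)
print(dict(gender_counts))
```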

Page 14: RDF Analytics... SPARQL and Beyond

SPARQL… BI queries

Page 15: RDF Analytics... SPARQL and Beyond

SPARQL… BI queries

SELECT ?name (COUNT(?n) AS ?neighbours)
WHERE {
    ?p :knows ?n .
    ?p :name ?name .
}
GROUP BY ?p ?name
ORDER BY DESC(?neighbours)
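The neighbour count computed by this query is the out-degree of each person in the `:knows` graph. A rough Python equivalent over toy triples follows; the data is invented, and the sketch assumes each person has exactly one name, which the SPARQL query does not require.

```python
from collections import Counter

# Toy data; every identifier below is made up for illustration.
TRIPLES = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:carol", "ex:name", "Carol"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:knows", "ex:carol"),
]


def neighbour_counts(triples):
    """Return (name, out-degree) rows, highest degree first."""
    # Assumption: one ex:name per person.
    names = {s: o for s, p, o in triples if p == "ex:name"}
    degree = Counter(s for s, p, o in triples if p == "ex:knows")
    rows = [(names[s], n) for s, n in degree.items() if s in names]
    # Ties are broken by name here; SPARQL leaves the order of ties unspecified.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


ranking = neighbour_counts(TRIPLES)
print(ranking)
```

As in the query, a person with no outgoing `knows` edge (Carol in this toy data) does not appear in the result at all.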

Page 16: RDF Analytics... SPARQL and Beyond

SPARQL… BI queries

Page 17: RDF Analytics... SPARQL and Beyond

SPARQL… BI queries

- How influential a person is within a social network
- How central a road is within an urban network
- How central an employee is in an enterprise

Page 18: RDF Analytics... SPARQL and Beyond

SPARQL… Graph measure

Can we use SPARQL to compute shortest paths in the graph?
Short answer: NO!
Long answer: Let’s try!

Page 19: RDF Analytics... SPARQL and Beyond

SPARQL… graph measure

SELECT ?v1 ?v2 (MIN(?l) AS ?shortestPath)
WHERE {
    {
        ?v1 :knows ?v2
        BIND (1 AS ?l)
    } UNION {
        # a path of exactly two :knows edges
        ?v1 :knows/:knows ?v2
        BIND (2 AS ?l)
    } UNION {
        # a path of exactly three :knows edges
        ?v1 :knows/:knows/:knows ?v2
        BIND (3 AS ?l)
    }
    FILTER (?v1 != ?v2)
}
GROUP BY ?v1 ?v2
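The query only covers paths of up to three hops, because each path length needs its own UNION branch. For comparison, a breadth-first search outside SPARQL handles any path length. The following is a minimal Python sketch over a made-up directed `knows` edge list, not code from the presentation.

```python
from collections import deque

# Toy directed "knows" edges; node names are made up for illustration.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]


def shortest_path_lengths(edges, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    adjacency = {}
    for s, o in edges:
        adjacency.setdefault(s, []).append(o)

    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in dist:  # the first visit is the shortest in an unweighted graph
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    del dist[source]  # mirrors FILTER (?v1 != ?v2)
    return dist


lengths = shortest_path_lengths(EDGES, "a")
print(lengths)
```

In this toy graph, node `f` is four hops from `a`, so the three-branch query would not report it while the search does.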

Page 20: RDF Analytics... SPARQL and Beyond

SPARQL… graph measure

Page 21: RDF Analytics... SPARQL and Beyond

SPARQL… graph measure

- Finding directions between physical locations
- Finding the most direct way to contact a person
- Finding the min-delay communication path

Page 22: RDF Analytics... SPARQL and Beyond

SPARQL… clustering

Can we do clustering using SPARQL?
YES! Peer-pressure algorithm implemented using (almost only) SPARQL*

* http://yarcdata.com/blog/?p=318

Page 23: RDF Analytics... SPARQL and Beyond

SPARQL… clustering

DROP GRAPH <urn:ga/g/xjz1> ;
CREATE GRAPH <urn:ga/g/xjz1> ;
INSERT { GRAPH <urn:ga/g/xjz1> { ?s :cluster ?clus3 } }
WHERE {
    SELECT ?s (SAMPLE(?clus) AS ?clus3) {
        { SELECT ?s (MAX(?clusCt) AS ?maxClusCt)
            { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
                    ?s :knows ?o .
                    GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
                } GROUP BY ?s ?clus
            } GROUP BY ?s
        }
        { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
                ?s :knows ?o .
                GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
            } GROUP BY ?s ?clus
        }
        FILTER (?clusCt = ?maxClusCt)
    } GROUP BY ?s
}
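The update above performs one round of peer-pressure clustering: every node adopts the cluster label that is most common among the nodes it knows, reading labels from one named graph and writing them to the next. The loop that repeats the round until labels stop changing lives outside SPARQL, hence "almost only". Below is a minimal Python sketch of the whole procedure under a few assumptions: the toy graph is invented, ties are broken by taking the smallest label (SPARQL's SAMPLE may pick any), and the iteration cap is arbitrary.

```python
from collections import Counter

# Toy undirected friendships: two triangles joined by one bridge (made-up data).
PAIRS = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "d")]
# Store each friendship in both directions, as two "knows" edges.
EDGES = PAIRS + [(o, s) for s, o in PAIRS]


def peer_pressure_step(edges, cluster):
    """One round: each node takes the most frequent label among its neighbours."""
    votes = {}
    for s, o in edges:
        if o in cluster:
            votes.setdefault(s, Counter())[cluster[o]] += 1
    new_cluster = {}
    for s, counts in votes.items():
        top = max(counts.values())
        # Assumption: break ties deterministically with the smallest label.
        new_cluster[s] = min(c for c, n in counts.items() if n == top)
    return new_cluster


def peer_pressure(edges, cluster, max_iterations=20):
    """Repeat rounds until the labels stop changing or the cap is reached."""
    for _ in range(max_iterations):
        updated = peer_pressure_step(edges, cluster)
        if updated == cluster:
            break
        cluster = updated
    return cluster


# Start with every node in its own cluster.
initial = {node: node for pair in PAIRS for node in pair}
clusters = peer_pressure(EDGES, initial)
print(clusters)
```

On this toy graph the labels settle into two groups, one per triangle. Synchronous updates like these are not guaranteed to converge on every graph, which is why the sketch carries an iteration cap.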

Page 24: RDF Analytics... SPARQL and Beyond

SPARQL… clustering

Page 25: RDF Analytics... SPARQL and Beyond

SPARQL Expressivity

- BI-like operations (rollup and drilldown)
- Graph measures
- Iterative algorithms (clustering)

Page 26: RDF Analytics... SPARQL and Beyond

SPARQL Scalability…

One approach is to use a scale-out architecture… think MapReduce or Hadoop:
- Translate SPARQL into MapReduce
- Process RDF data directly in MapReduce
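To give a feel for the second option, here is a single-process Python sketch of the map, shuffle and reduce phases applied to the earlier count-by-gender query. It is only an illustration of the programming model under assumed toy data; a real job would run these phases in parallel across machines on a framework such as Hadoop.

```python
from collections import defaultdict

# Toy data; identifiers and values are made up for illustration.
TRIPLES = [
    ("ex:alice", "ex:gender", "female"),
    ("ex:bob", "ex:gender", "male"),
    ("ex:carol", "ex:gender", "female"),
    ("ex:alice", "ex:knows", "ex:bob"),
]


def map_phase(triple):
    """Emit (gender, 1) for every ex:gender triple; ignore everything else."""
    s, p, o = triple
    if p == "ex:gender":
        yield (o, 1)


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the ones emitted for a single gender."""
    return key, sum(values)


mapped = [pair for triple in TRIPLES for pair in map_phase(triple)]
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(result)
```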

Page 27: RDF Analytics... SPARQL and Beyond

Conclusion

- Can we do “data science” using RDF data?
    - Do we have the data? YES
    - Do we have the tools? Almost
        - Is SPARQL expressive enough? Almost
        - Does it scale? Yes… in principle, no in practice
        - Is it usable/easy? Not really

All examples used in this presentation, and Pig Latin equivalents of some of them, are available at: https://github.com/fadmaa/rdf-analytics