55
Ef#icient Execution of topk SPARQL queries Sara Magliacane (VU University Amsterdam) Alessandro Bozzon (Politecnico di Milano) Emanuele Della Valle (Politecnico di Milano)

ISWC 2012 "Efficient execution of top-k SPARQL queries"

Embed Size (px)

DESCRIPTION

Slides of my talk on efficient execution of top-k queries in SPARQL at ISWC 2012 in Boston. Including some bonus back-up slides :)

Citation preview

Page 1: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Ef#icient  Execution  of  top-­‐k  SPARQL  queries  Sara  Magliacane  (VU  University  Amsterdam)  Alessandro  Bozzon  (Politecnico  di  Milano)  Emanuele  Della  Valle  (Politecnico  di  Milano)  

Page 2: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Outline  •  Introduc?on  

• What  are  top-­‐k  queries?  • Why  do  we  need  to  op?mize  them?  

• Our  approach:  •  A  rank-­‐aware  SPARQL  algebra  •  A  rank-­‐aware  execu?on  model  •  Three  planning  strategies  

•  Evalua?on   1  

Page 3: ISWC 2012 "Efficient execution of top-k SPARQL queries"

What  is  a  top-­‐k  query?  

2  

 

• A  query  that  returns    1.  a  limited  number  of  results  k    

2.  ordered  by  a  scoring  func?on  that  combines  several  criteria  

 

Page 4: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Rankings,  rankings  everywhere…  

3  

Page 5: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Rankings,  rankings  everywhere…  

4  

Page 6: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Rankings,  rankings  everywhere…  

5  

Page 7: ISWC 2012 "Efficient execution of top-k SPARQL queries"

 

A  very  intui?ve  and  simplified  example:  

• Top  3  largest  countries  (by  both  area  and  popula?on)  

6  

 

Why  do  we  need  to  optimize  them?  

Page 8: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The  standard  way:    materialize-­‐then-­‐sort  scheme  

7  

Countries  by  area  

Countries  by  popula?on  

Compute  all  the  242  join  combina?ons  

Sort  all  the  242  join  combina?ons    

Fetch  3  best  results  

…  

…  

…  

…  

…  

242   242  

…  

Page 9: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Can  we  make  it  more  ef#icient?  

8  

Countries  by  area  

Countries  by  popula?on  

Order  incrementally  the  combina?ons  using  par0al  orders  

Fetch  3  best  results  

…  

7   9  

Can  we  exploit  the  available  sorted  access  by  area  and  by  popula?on?  

Page 10: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The  split-­‐and-­‐interleave  scheme    

•  The  intui?on  of  the  previous  example  can  be  formalized  with  the  split-­‐and-­‐interleave  scheme  from  RDBMS  [Li2005,  Hwang2007,  Ilyas2004,  Ilyas2008]  1.  Split  the  evalua?on  of  the  scoring  func?on  into  single  criteria    2.  Interleave  them  with  other  operators  3.  Use  par?al  orders  to  construct  incrementally  the  final  order  

•  Standard  assump?ons:  •  Monotone  scoring  func?on  •  Each  criterion  is  evaluated  as  a  [0,1]  number  (normaliza?on)  

•  Op?mized  for  the  case  of  fast  sorted  access  for  each  criterion  9  

Page 11: ISWC 2012 "Efficient execution of top-k SPARQL queries"

No  free  lunch…  

Users  are  interested  in  1<=  k  <=  100  (search  engines)  

!

"#$%&'(%)*+!

"#$%&'(%)*+!

",)*-).-!!

/01(+!

234!,567!

*8697.!0:!-7;5.7-!.7;8<,;!+!

!!+=!

!

>!

?61.0@767*,! /[email protected])-!

234!,567!

C537-!*8697.!0:!-7;5.7-!.7;8<,;!D!

C537-!*8697.!0:!)<<!.7;8<,;!E!

C537-!

237A8,50*!,567!0:!,B7!;A0.5*F!!

:8*A,50*!:0.!7)AB!95*-5*F!

G9.7)+!7@7*!105*,H!,737A!

!

",)*-).-!"#$%&'!

C537-!*8697.!0:!-7;5.7-!.7;8<,;!D!!

",)*-).-!"#$%&'!

*8697.!0:!)<<!.7;8<,;!E!

234!,567!;89<5*7).!

!

"#$%&'(%)*+!

"#$%&'(%)*+!

",)*-).-!!

/01(+!

234!,567!

*8697.!0:!-7;5.7-!.7;8<,;!+!

!!+=!

!

>!

?61.0@767*,! /[email protected])-!

234!,567!

C537-!*8697.!0:!-7;5.7-!.7;8<,;!D!

C537-!*8697.!0:!)<<!.7;8<,;!E!

C537-!

237A8,50*!,567!0:!,B7!;A0.5*F!!

:8*A,50*!:0.!7)AB!95*-5*F!

G9.7)+!7@7*!105*,H!,737A!

!

",)*-).-!"#$%&'!

C537-!*8697.!0:!-7;5.7-!.7;8<,;!D!!

",)*-).-!"#$%&'!

*8697.!0:!)<<!.7;8<,;!E!

234!,567!;89<5*7).!

Orders  of  magnitude    

Split-­‐and-­‐interleave  

Orders  of  magnitude    

Page 12: ISWC 2012 "Efficient execution of top-k SPARQL queries"

11  

Page 13: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Top-­‐k  queries  in  SPARQL  1.1  Example  query  on  BSBM  [Bizer2009]:  •  The  top  10  offers  ordered  by  the  product  ra?ngs  and  offer  price:  

 

     

Tens  of  seconds  on  5M  triples  (could  be  improved  to  milliseconds)  

SELECT  ?product  ?offer    (norm1(?avgRat1)  +  norm2(?avgRat2)  +  norm3(?price)  AS  ?score)  WHERE  {    

?product  hasAvgRat1  ?avgRat1  .  ?product  hasAvgRat2  ?avgRat2  .  ?product  hasName  ?name  .  ?product  hasOffers  ?offer  .  ?offer  hasPrice  ?price    

}  ORDER  BY  DESC  (?score)    LIMIT  10  

       

12  

Page 14: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Split-­‐and-­‐interleave  in  SPARQL?    Related  work    • A  possible  solu?on  [Straccia2010,  Bozzon2011]:  

•  Rewrite  SPARQL  into  SQL    •  Use  exis?ng  op?mized  RDBMS  (e.g.  RankSQL  [Li2005])  

• Disadvantages:    • Works  if  data  are  already  in  a  RDBMS  

• What  about  na?ve  SPARQL  op?miza?ons?  •  Federated  queries  over  Linked  Data  [Wagner2012]:  complementary  to  our  approach   13  

Page 15: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Challenges  for  native  SPARQL  split-­‐and-­‐interleave  solutions    

14  

Differences  with  SQL  and  RDBMS   Proposed  solu0on  

Different  algebra     STEP  1:  New  algebra  (algebraic  operators  and  algebraic  equivalences)  

Different  cost  of  data  access  in  na?ve  RDF  triplestores  (sorted  access  is  slow)  

STEP  2:  New  algorithms  for  physical  operators,  possibly  using  less  sorted  access  

Addi?onal  op?miza?on  dimensions   STEP  3:  New  planning  strategies  

Algebra  Planning  strategies  

Planner  Query   Algebraic    tree  

Physical  operators  

Algebra  generator  

Query  plan    

Page 16: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Step  1:  a  rank-­‐aware  algebra  •  SPARQL-­‐Rank  algebra  [Bozzon2011]  

•  Extends  the  standard  SPARQL  algebra  [Perez2009]    •  Ranked  set  of  mappings:  set  of  mappings  augmented  with  an  order  rela?on  

 

15  

Extended  OPERATORS  

New  EQUIVALENCES  

Page 17: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The  SPARQL-­‐Rank  algebraic  operators  

New  operator  rank    

16  

(a) (b) (c)

g1(?a1)

g3(?p1)

?pr, ?of, ?score

[0,10]SLICE

seqScan

?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1

g3(?p1)

?pr, ?of, ?score

[0,10]SLICE

orderScan_a1

?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1

?pr = ?pr

?pr, ?of, ?score

[0,10]SLICE

g1(?a1)

g3(?p1)seqScan

?pr hasN ?n

Sequence

seqScan

?pr hasA1 ?a1 . ?pr hasO ?of . ?of hasP1 ?p1

Page 18: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Ω  

ρp1  

ρp1(Ω  )  

?x   ?y   ?p1   ?p2  

µ1   1   8   0.8   0.8  

µ2   3   3   0.3   0.6  

µ3   3   4   0.4   0.6  

?x   ?y   ?p1   Fp1  

µ1   1   8   0.8   1.8  

µ3   3   4   0.4   1.4  

µ2   3   3   0.3   1.3  

The  Rank  Operator  

Page 19: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The  SPARQL-­‐Rank  algebraic  operators  

Redefined    standard    operators    

18  

Page 20: ISWC 2012 "Efficient execution of top-k SPARQL queries"

?x   ?z   ?p2   Fp2  µ4   1   9   0.8   1.8  

µ5   3   0   0.6   1.6  

Ω’p2

 

?x   ?y   ?z   ?p1   ?p2   Fp1Up2  

µ1  U  µ4   1   8   9   0.8   0.8   1.6  

µ3  U  µ5   3   4   0   0.4   0.6   1.0  

µ2  U  µ5   3   3   0   0.3   0.6   0.9  

?x   ?y   ?p1   Fp1  

µ1   1   8   0.8   1.8  

µ3   3   4   0.4   1.4  

µ2   3   3   0.3   1.3  

Ωp1  

The  Join  Operator  

Page 21: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-­‐Rank  algebraic  equivalences  

Split  

20  

Page 22: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-­‐Rank  algebraic  equivalences  

21  

•  Allows  the  splimng  of  a  monolithic  scoring  func?on  into  several  rank  operators  

 

Page 23: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-­‐Rank  algebraic  equivalences  

Interleave  

22  

Page 24: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-­‐Rank  algebraic  equivalences  

•  Allows  to  order  incrementally  the  results  by  pushing  the  rank  operator  inside  the  query  tree.  

     

Page 25: ISWC 2012 "Efficient execution of top-k SPARQL queries"

24  

Image  from:    hnp://de-­‐?mekeeper.com/yahoo_site_admin/assets/images/benzinger20gold20gears200291.17120724_std.jpg  

From  algebra  to  execution  

Page 26: ISWC 2012 "Efficient execution of top-k SPARQL queries"

 •  Rank  operator    

•  If  there  is  a  sorted  access  index  on  the  ranking  criterion  we  use  it  •  Otherwise:  rank  aggrega?on  algorithms,  e.g.  [Hwang2007]    

•  Join  operator  •  If  the  right  operand  does  not  influence  the  ranking:  streaming  index  join  

•  Otherwise:  a  rank-­‐join  algorithm  [see  next  slides]  

•  Other  operators  are  straighsorward:  •  E.g.  the  standard  FILTER  conserves  the  ordering  of  its  input  

Step  2:  physical  operators    (top-­‐k  algorithms)  

25  

Page 27: ISWC 2012 "Efficient execution of top-k SPARQL queries"

•  Different  algorithms  based  on  available  access  in  the  inputs:  

•  Hash  Rank-­‐Join  •  e.g.  HRJN  [Ilyas2004]        

     •  Random  Access  Rank-­‐Join  

•  e.g.  RA-­‐HRJN  [Ilyas2004]      

•  RankSequence  (e,g,  RSEQ)  •  Minimum  sorted  access  •  Leverages  random  access  

(a)RankJoin

sortedAccesssortedAccess

(b)RankSequence

randomAccesssortedAccess

(c)

RA-RankJoin

sortedAccessrandomAccess

sortedAccessrandomAccess

(a)RankJoin

sortedAccesssortedAccess

(b)RankSequence

randomAccesssortedAccess

(c)

RA-RankJoin

sortedAccessrandomAccess

sortedAccessrandomAccess(a)

RankJoin

sortedAccesssortedAccess

(b)RankSequence

randomAccesssortedAccess

(c)

RA-RankJoin

sortedAccessrandomAccess

sortedAccessrandomAccess

Rank-­‐Join  algorithms  

26  

Page 28: ISWC 2012 "Efficient execution of top-k SPARQL queries"

•  Different  algorithms  based  on  available  access  in  the  inputs:  

•  Hash  Rank-­‐Join  •  e.g.  HRJN  [Ilyas2004]        

     •  Random  Access  Rank-­‐Join  

•  e.g.  RA-­‐HRJN  [Ilyas2004]      

•  RankSequence  (e,g,  RSEQ)  •  Minimum  sorted  access  •  Leverages  random  access  

(a)RankJoin

sortedAccesssortedAccess

(b)RankSequence

randomAccesssortedAccess

(c)

RA-RankJoin

sortedAccessrandomAccess

sortedAccessrandomAccess

(a)RankJoin

sortedAccesssortedAccess

(b)RankSequence

randomAccesssortedAccess

(c)

RA-RankJoin

sortedAccessrandomAccess

sortedAccessrandomAccess(a)

RankJoin

sortedAccesssortedAccess

(b)RankSequence

randomAccesssortedAccess

(c)

RA-RankJoin

sortedAccessrandomAccess

sortedAccessrandomAccess

Rank-­‐Join  algorithms  

27  

Literature  

Page 29: ISWC 2012 "Efficient execution of top-k SPARQL queries"

•  Different  algorithms  based  on  available  access  in  the  inputs:  

•  Hash  Rank-­‐Join  •  e.g.  HRJN  [Ilyas2004]        

     •  Random  Access  Rank-­‐Join  

•  e.g.  RA-­‐HRJN  [Ilyas2004]      

•  RankSequence  (e,g,  RSEQ)  •  Minimum  sorted  access  •  Leverages  random  access  

(a)RankJoin

sortedAccesssortedAccess

(b)RankSequence

randomAccesssortedAccess

(c)

RA-RankJoin

sortedAccessrandomAccess

sortedAccessrandomAccess

(a)RankJoin

sortedAccesssortedAccess

(b)RankSequence

randomAccesssortedAccess

(c)

RA-RankJoin

sortedAccessrandomAccess

sortedAccessrandomAccess(a)

RankJoin

sortedAccesssortedAccess

(b)RankSequence

randomAccesssortedAccess

(c)

RA-RankJoin

sortedAccessrandomAccess

sortedAccessrandomAccess

Rank-­‐Join  algorithms  

28  

New  

Page 30: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Step3:  planning  strategies  •  Using  the  algebraic  equivalences  we  can  produce  several  equivalent  algebraic  trees  

•  The  planner  can  use  them  to  implement  several  planning  strategies  

 

(a) (b) (c)

g1(?a1)

g3(?p1)

?pr, ?of, ?score

[0,10]SLICE

seqScan

?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1

g3(?p1)

?pr, ?of, ?score

[0,10]SLICE

orderScan_a1

?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1

?pr = ?pr

?pr, ?of, ?score

[0,10]SLICE

g1(?a1)

g3(?p1)seqScan

?pr hasN ?n

Sequence

seqScan

?pr hasA1 ?a1 . ?pr hasO ?of . ?of hasP1 ?p1

(a) (b) (c)

g1(?a1)

g3(?p1)

?pr, ?of, ?score

[0,10]SLICE

seqScan

?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1

g3(?p1)

?pr, ?of, ?score

[0,10]SLICE

orderScan_a1

?pr hasA1 ?a1 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP1 ?p1

?pr = ?pr

?pr, ?of, ?score

[0,10]SLICE

g1(?a1)

g3(?p1)seqScan

?pr hasN ?n

Sequence

seqScan

?pr hasA1 ?a1 . ?pr hasO ?of . ?of hasP1 ?p1

?pr, ?of, ?score

[0,10]SLICE

?pr hasA1 ?a1. ?pr hasA2 ?a2 . ?pr hasN ?n . ?pr hasO ?of .?of hasP ?p1.

[?score]ORDER

[?score =g1(?a1)+g2(?a2)+g3(?p1)]EXTEND

(a)

?pr = ?pr

?pr, ?of, ?score

[0,10]SLICEJoin

g3(?p1) g1(?a1)?pr hasO ?of .?of hasP ?p1 . ?pr hasA1 ?a1 .

?pr = ?prRankJoin

?pr = ?pr?pr hasN ?n .

RankJoin

g2(?a2)

?pr hasA2 ?a2 .

(b)

1.  Rank  of  BGPs   2.  Interleaved   3.  Rank  Join   29  

Page 31: ISWC 2012 "Efficient execution of top-k SPARQL queries"

?product, ?offer, ?score

[0,10]SLICE

?product hasAvgRat1 ?avgRat1. ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ?product hasOffer ?offer .?offer hasPrice ?price.

[?score]ORDER

[?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)]EXTEND

?product = ?product

?product, ?offer, ?score

[0,10]SLICE

norm3(?price) norm1(?avgRat1)?product hasOffer ?offer .?offer hasPrice ?price. ?product hasAvgRat1 ?avgRat1}

?product = ?productRankJoin

?product = ?product {?product hasName ?name} RankJoin

norm2(?avgRat2)

?product hasAvgRat2 ?avgRat2}

norm2(?avgRat2)

norm3(?price)

?product, ?offer, ?score

[0,10]SLICE

?product, ?offer, ?score

norm1(?avgRat1)

norm3(?price)

?product hasAvgRat1 ?avgRat1. ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ?product hasOffer ?offer .?offer hasPrice ?price.

norm1(?avgRat1) norm2(?avgRat2)

?product hasAvgRat1 ?avgRat1. ?product hasAvgRat2 ?avgRat2 . ?product hasOffer ?offer .?offer hasPrice ?price.

1.  Rank  of  BGPs  (ROB)  •  Split  the  monolithic  scoring  func?on  into  several  incremental  rank  operators  (rho)  

30  Rank  of  BGPs  Materialize-­‐then-­‐sort  

Page 32: ISWC 2012 "Efficient execution of top-k SPARQL queries"

?product = ?product

?product, ?offer, ?score

[0,10]SLICE

norm1(?avgRat1)

norm3(?price) {?product hasName ?name }

norm2(?avgRat2)

?product hasAvgRat1 ?avgRat1. ?product hasAvgRat2 ?avgRat2 . ?product hasOffer ?offer .?offer hasPrice ?price.

2.  Interleaved  (INTER)  •  Separate  the  panern  in  two  groups:  

•  Triple  panerns  that  influence  the  ranking    •  Triple  panerns  that  don’t  influence  the  ranking  

 

31  Interleaved  

?product, ?offer, ?score

[0,10]SLICE

?product hasAvgRat1 ?avgRat1. ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ?product hasOffer ?offer .?offer hasPrice ?price.

[?score]ORDER

[?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)]EXTEND

?product = ?product

?product, ?offer, ?score

[0,10]SLICE

norm3(?price) norm1(?avgRat1)?product hasOffer ?offer .?offer hasPrice ?price. ?product hasAvgRat1 ?avgRat1}

?product = ?productRankJoin

?product = ?product {?product hasName ?name} RankJoin

norm2(?avgRat2)

?product hasAvgRat2 ?avgRat2}

Materialize-­‐then-­‐sort  

Page 33: ISWC 2012 "Efficient execution of top-k SPARQL queries"

?product, ?offer, ?score

[0,10]SLICE

?product hasAvgRat1 ?avgRat1. ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ?product hasOffer ?offer .?offer hasPrice ?price.

[?score]ORDER

[?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)]EXTEND

?product = ?product

?product, ?offer, ?score

[0,10]SLICE

norm3(?price) norm1(?avgRat1)

?product hasOffer ?offer .?offer hasPrice ?price. ?product hasAvgRat1 ?avgRat1}

?product = ?productRankJoin

?product = ?product {?product hasName ?name} RankJoin

norm2(?avgRat2)

?product hasAvgRat2 ?avgRat2}

3.  Rank-­‐Join  (RJ)  •  Split  into  one  triple  panern  for  each  ranking  criterion  •  Most  appropriate  join  algorithm  based  on  available  access  

32  Rank-­‐Join  

?product, ?offer, ?score

[0,10]SLICE

?product hasAvgRat1 ?avgRat1. ?product hasAvgRat2 ?avgRat2 . ?product hasName ?name . ?product hasOffer ?offer .?offer hasPrice ?price.

[?score]ORDER

[?score = norm1(?avgRat1)+norm2(?avgRat2)+norm3(?price)]EXTEND

?product = ?product

?product, ?offer, ?score

[0,10]SLICE

norm3(?price) norm1(?avgRat1)?product hasOffer ?offer .?offer hasPrice ?price. ?product hasAvgRat1 ?avgRat1}

?product = ?productRankJoin

?product = ?product {?product hasName ?name} RankJoin

norm2(?avgRat2)

?product hasAvgRat2 ?avgRat2}

Materialize-­‐then-­‐sort  

Page 34: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experimental  evaluation  

33  

Page 35: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experimental  evaluation  •  Prototype  implementa?on  of  our  system:  

•  ARQ-­‐Rank  (extends  Jena  ARQ  2.8.9)    •  Extended  version  of  Berlin  SPARQL  Benchmark  [Bizer2009]  •  Added  ranking  anributes    •  Added  top-­‐k  queries  

•  Jena  TDB  0.8.11  as  storage  

•  Code  and  experiments:  sparqlrank.search-­‐compu?ng.org  34  

Page 36: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experiment  1:  compare  planning  strategies  •  Example  query,  5M  triples  dataset  •  Worst-­‐case  scenario:  no  sorted  access  indexes  (slow  sorted  access)  

35  

One  to  two  orders  of  magnitude  bener    

Page 37: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experiment  1:  compare  planning  strategies  •  Example  query,  5M  triples  dataset  •  Standard  scenario:  sorted  access  indexes  (fast  sorted  access)    

36  

Two  orders  of  magnitude  bener    

Page 38: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Experiment  2:  Small  Benchmark    (8  queries)  

!""#$$ %&"#$$ &""#$ !'$ &'$ !""#$$ %&"#$$ &""#$ !'$ &'$ !""#$$ %&"#$$ &""#$ !'$ &'$

!"($

!")$

!"%$

!"!$

!"($

!")$

!"%$

!"!$

!"($

!")$

!"%$

!"!$*+,-.$,

/,0+12

3$14

,$5467$

89:96,:$6;<,$ 89:96,:$6;<,$ 89:96,:$6;<,$

!"#$ !%#$ !&#$

37  !""#$$ %&"#$$ &""#$ !'$ &'$ !""#$$ %&"#$$ &""#$ !'$ &'$ !""#$$ %&"#$$ &""#$ !'$ &'$

!"($

!")$

!"%$

!"!$

!"($

!")$

!"%$

!"!$

!"($

!")$

!"%$

!"!$*+,

-.$,/,0+12

3$14

,$54

67$

89:96,:$6;<,$ 89:96,:$6;<,$ 89:96,:$6;<,$

!"#$ !%#$ !&#$

Page 39: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Conclusions  and  Future  Work  •  A  system  that  speeds  up  the  execu?on  of  top-­‐k  queries  in  SPARQL  by  orders  of  magnitude:  •  STEP  1:  A  rank-­‐aware  SPARQL  algebra  (SPARQL-­‐Rank  algebra)  •  STEP  2:  A  rank-­‐join  algorithm  (RSEQ)  •  STEP  3:  Three  planning  strategies  (ROB,  INTER,  RJ)  

 •  ARQ-­‐Rank,  a  rank-­‐aware  extension  of  Jena  ARQ  •  A  small  benchmark  for  top-­‐k  queries,  based  on  BSBM  [Bizer2009]    

•  All  available  at  sparqlrank.search-­‐compu?ng.org      •  Future  work:  

•  More  advanced,  cost-­‐based,  op?miza?on  techniques  •  Extension  to  federated  top-­‐k  query  processing    •  Top-­‐k  queries  under  OWL2QL  entailment  regime  

38  

Page 40: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Bibliography  •  [Bozzon2011]  A.  Bozzon  et  al.  Towards  and  efficient  SPARQL  top-­‐k  query  execu?on  in  virtual  RDF  stores.  In  DBRANK  workshop  at  VLDB  ’11,  2011.  

•  [Wagner2012]  A.  Wagner  et  al.  Top-­‐k  Linked  Data  Query  Processing.  In  ESWC  ’12.  Springer,  2012.  

•  [Bizer2009]  C.  Bizer  and  A.  Schultz.  The  Berlin  SPARQL  Benchmark.  Int.  J.  Seman?c  Web  Inf.  Syst.,  5(2),  2009.  

•  [Li2005]  C.  Li  et  al.  RankSQL:  query  algebra  and  op?miza?on  for  rela?onal  top-­‐k  queries.  In  SIGMOD  ’05.  ACM,  2005.  

•  [DellaValle2012]  E.  Della  Valle  et  al.  Order  maners!  harnessing  a  world  of  orderings  for  reasoning  over  massive  data.  Seman?c  Web  Journal,  2012.  

•  [Hwang2007]  S.-­‐w.  Hwang  and  K.  Chang.  Probe  minimiza?on  by  schedule  op?miza?on:  Suppor?ng  top-­‐k  queries  with  expensive  predicates.  IEEE  TKDE,  19(5),  2007.  

39  

Page 41: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Bibliography  •  [Ilyas2004]  I.  F.  Ilyas  et  al.  Rank-­‐aware  Query  Op?miza?on.  In  SIGMOD  ’04.  ACM,  2004.    

•  [Ilyas2008]  I.F.Ilyas  et  al.  A  survey  of  top-­‐k  query  processing  techniques  in  rela?onal  database  systems.  ACM  Comput.  Surv.,  40(4),  2008.    

•  [Perez2009]  J.  Perez  et  al.  Seman?cs  and  complexity  of  SPARQL.  ACM  Trans.  Database  Syst.,  34(3),  2009.    

•  [Schmidt2010]  M.  Schmidt  et  al.  Founda?ons  of  SPARQL  query  op?miza?on.  In  ICDT  ’10,  ACM,  2010.    

•  [Straccia2010]  U.  Straccia.  SoxFacts:  A  top-­‐k  retrieval  engine  for  ontology  mediated  access  to  rela?onal  databases.  In  SMC  ’10.  IEEE,  2010.    

40  

Page 42: ISWC 2012 "Efficient execution of top-k SPARQL queries"

41  

Page 43: ISWC 2012 "Efficient execution of top-k SPARQL queries"

BACK-­‐UP  SLIDES   42  

Page 44: ISWC 2012 "Efficient execution of top-k SPARQL queries"

 

An  addi?onal  less  intui?ve  and  less  simplified  example:  

• Top  2  couples  of  most  populated  ci?es  and  largest  countries  

43  

 

Why  do  we  need  to  optimize  them?  

Shanghai  Moscow  

Page 45: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The  materialize-­‐then-­‐sort  scheme  

44  

Countries  by  area  

Ci?es  by  popula?on  

Materialize  all  14K  combina?ons  

Sort  all  14K  join  combina?ons    

Fetch  2  best  results  

…  

249   14K*  

0.04  

2e-­‐08  

Shanghai  Moscow  

Va?can  

*  According  to  DBPedia,  but  probably  more  

1   Shanghai  Istanbul  Karachi  Mumbai  Moscow  

0.567  

0.563  0.497  

0.05  0.185  

Shanghai  …   Va?can  

Page 46: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Can  we  make  it  more  ef#icient?  

45  

Countries  by  area  

Ci?es  by  popula?on  

Order  incrementally  the  combina?ons  using  par0al  orders  

Fetch  2  best  results  

9   13  

Can  we  exploit  the  sorted  access  by  area  and  by  popula?on?    

Shanghai  Moscow  

…  

Shanghai  Istanbul  Karachi  Mumbai  Moscow  …  

Page 47: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Maximal possible score

Given a scoring function F (p1, …, pn) and a set of predicates P = {p1, …, pj} the maximal possible score for a mapping µ is defined as:

pi = pi [µ] if pi ∈ P pi = 1 otherwise

∀i FP (p1, …, pn) [µ] = F ( )

Mapping µ … an intermediate SPARQL solution, equivalent to a SQL tuple

?x   ?y   ?p1   ?p2  µ1   1   8   0.8   0.8  

µ2   3   3   0.3   0.6  set of mappings

SPARQL-­‐Rank  algebra  De#initions  

Page 48: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Ranking principle

Ranked set of mappings

Given a set of predicates P, a ranked set of mappings ΩP is a set of mappings Ω augmented with the following properties:

•  Score: for each mapping µ, the maximal possible score FP [µ] •  Order: the order relation <ΩP is defined on ΩP based on the scores

of the single mappings

Given two mappings µ1 e µ2 with FP [µ1]> FP [µ2] , if we process µ2 we need to process also µ1.

SPARQL-­‐Rank  algebra  De#initions  

Page 49: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The  SPARQL-­‐Rank  algebraic  operators  

48  

Page 50: ISWC 2012 "Efficient execution of top-k SPARQL queries"

SPARQL-­‐Rank  algebraic  equivalences  

49  

Page 51: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Allows to order incrementally the results by pushing the rank operator inside the query execution tree.

SPARQL-­‐Rank  algebraic  equivalences  

Page 52: ISWC 2012 "Efficient execution of top-k SPARQL queries"

The  RSEQ  algorithm  

51  

Page 53: ISWC 2012 "Efficient execution of top-k SPARQL queries"

Evaluation:  additional  technical  information  •  Experimental  semng:  

•  AMD  64  bit  processor  2.66  GHz  •  4  GB  RAM  •  Debian  kernel  2.6.26-­‐2  •  Sun  Java  1.6.0    

•  Maximum  heap  size  2GB    

•  8  queries  available  at  sparqlrank.search-­‐compu?ng.org  

52  

Page 54: ISWC 2012 "Efficient execution of top-k SPARQL queries"

More  experimental  results  the  RankJoin  operators  •  Example  query,  5M  triples  dataset  •  Worst-­‐case  scenario:  no  sorted  access  indexes  (lex)  

•  RSEQ  is  the  best,  especially  for  k  <  1000  •  Standard  scenario:  sorted  access  indexes  (right)  

•  All  three  are  comparable,  RA-­‐HRJN  is  best  for  k  >  1000  

53  

Page 55: ISWC 2012 "Efficient execution of top-k SPARQL queries"

ARQ-­‐Rank  architecture  

54