
Evaluating Reachability Queries over Path Collections*

P. Bouros¹, S. Skiadopoulos², T. Dalamagas³, D. Sacharidis³, T. Sellis¹,³

¹ National Technical University of Athens
² University of Peloponnese
³ Institute for Management of Information Systems – R.C. Athena

HDMS'09

* Long version of SSDBM’09 paper

Introduction (I)

• Several applications store and query large collections of data sequences
  – Recent advances in GIS and geoservices resulted in large volumes of routes (e.g., sequences of Points of Interest (POIs))
• Route collections
  – Points => nodes
  – Sequences => routes

Introduction (II)

• Web sites retain huge collections of routes
  – ShareMyRoutes.com
  – TravelByGPS.com
• People visiting Athens
  – Track their sightseeing
  – Create routes of interesting places
• Frequent updates
  – Users upload new routes

Problem

• Route collections
  1. Too large to fit in main memory
  2. Frequently updated, adding new routes
• Reachability queries
  – Q: path from Academy to Zappeion
  – A: Academy -> University of Athens (change to route p2) -> Parliament -> Zappeion


Why not a graph-based solution?

• Transform route collection P into graph GP
  1) Searching: depth-first or breadth-first search
     • Low storage and maintenance cost
     • Slow query evaluation
  2) Encoding transitive closure
     • Fast query evaluation
     • Expensive precomputation, not suited to frequently updated graphs
     • 2-hop [CH+02], HOPI [STW05]
     • DAGs: geometric-based & partitioning 2-hop [CY+06,08], interval LB [AB+89]
     • GRIPP [TL07]
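As a point of comparison, the "searching" option can be sketched in Python. This is only an illustration under stated assumptions: the names `routes_to_graph` and `bfs_path` are hypothetical, the collection is held in memory, and `ROUTES` is the five-route example used on the later slides. Each pair of consecutive route nodes becomes one edge of GP, and a breadth-first search then visits single nodes one edge at a time.

```python
from collections import defaultdict, deque

# Example collection taken from the slides (p1..p5).
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def routes_to_graph(routes):
    """Build adjacency sets of G_P: one edge per pair of consecutive route nodes."""
    graph = defaultdict(set)
    for nodes in routes.values():
        for u, v in zip(nodes, nodes[1:]):
            graph[u].add(v)
    return graph

def bfs_path(graph, source, target):
    """Plain breadth-first search; returns a node list or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        # sorted() only makes the visiting order deterministic.
        for v in sorted(graph.get(u, ())):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

if __name__ == "__main__":
    print(bfs_path(routes_to_graph(ROUTES), "F", "C"))
```

The graph itself is cheap to maintain when routes are added, but every query walks it edge by edge, which is the slow evaluation the slide refers to.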


Outline

• The pfs algorithm
  – Indexing route collections
  – Indexing route transitions
• Index maintenance
• Experimental evaluation
• Conclusions and Further work

The pfs algorithm (I)

• Path-first search, basic idea:
  – Examine parts of routes at once, not single nodes
• Extends depth-first search
  – Works with routes instead of graph edges
• For each route p containing the current node v
  – Visit each node after v (its successors) in p
  – Push the set of successors onto the dfs stack at once
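The idea on this slide can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names `pfs` and `_unwind`, the in-memory `ROUTES` dictionary (the five example routes used throughout the slides), and the decision to simply skip successors that were already visited are all assumptions.

```python
from collections import defaultdict

# Example collection taken from the slides (p1..p5).
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def _unwind(parent, node):
    """Follow parent pointers back to the source and return the path."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def pfs(routes, source, target):
    """Path-first search sketch: expand whole route suffixes, not single edges."""
    if source == target:
        return [source]
    # For each node, remember every (route id, 0-based position) where it occurs.
    occurrences = defaultdict(list)
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes):
            occurrences[node].append((rid, pos))

    parent = {source: None}          # doubles as the visited set
    stack = [source]
    while stack:
        v = stack.pop()
        for rid, pos in occurrences[v]:
            prev = v
            successors = []
            for w in routes[rid][pos + 1:]:      # every node after v in this route
                if w not in parent:
                    parent[w] = prev             # prev is w's predecessor in the route
                    successors.append(w)
                    if w == target:
                        return _unwind(parent, w)
                prev = w
            stack.extend(successors)             # push the whole batch at once
    return None

if __name__ == "__main__":
    print(pfs(ROUTES, "F", "C"))
```

On the example collection, `pfs(ROUTES, "F", "C")` returns `['F', 'D', 'N', 'B', 'C']`.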



The pfs algorithm (II)

• Find a path from node F to C

p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)

• Answer: (F, D, N, B, C)


P-Index

• Inverted index on route collections
  – For each node, store the routes containing it
• Access routes containing the current node
• Better termination condition => pfsP
  – Identify a route containing the current node before the target

node  routes list
A     <p1,1>, <p2,1>, <p5,1>
B     <p1,2>, <p2,5>, <p4,3>
C     <p1,3>
D     <p1,4>, <p2,3>, <p4,1>
…     …

p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
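The routes lists in the table can be reproduced with a few lines of Python. This is a sketch under assumptions: the name `build_p_index` is hypothetical, the index is an in-memory dictionary and not an inverted file on disk, and positions are 1-based to match the `<route, position>` entries shown on the slide.

```python
# Example collection taken from the slides (p1..p5).
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def build_p_index(routes):
    """Map each node to its routes list: [(route id, 1-based position), ...]."""
    p_index = {}
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):
            p_index.setdefault(node, []).append((rid, pos))
    return p_index

if __name__ == "__main__":
    p_index = build_p_index(ROUTES)
    for node in "ABCD":
        print(node, p_index[node])
```

Because routes are scanned in order p1 to p5, each routes list comes out sorted by route, as in the table.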


The pfsP algorithm

• Find a path from F to T
• Answer: (F, D, N, B, T)

[Figure labels: JOIN]

node  routes list
…     …
F     <p2,2>, <p4,4>, <p5,2>
…     …
T     <p2,6>

p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
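A hedged Python sketch of the pfsP termination condition follows. It assumes that the JOIN on the slide matches the routes lists of the current node and of the target on route id, and succeeds when some route holds the current node before the target. The names `join` and `pfs_p` are hypothetical, a dictionary lookup stands in for whatever join method the authors use, and routes are assumed to contain each node at most once.

```python
# Example collection taken from the slides (p1..p5).
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def build_p_index(routes):
    """node -> [(route id, 1-based position), ...]"""
    p_index = {}
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):
            p_index.setdefault(node, []).append((rid, pos))
    return p_index

def join(p_index, v, target):
    """Return (route, pos of v, pos of target) if a route holds v before target."""
    target_pos = dict(p_index.get(target, []))   # route id -> position of target
    for rid, pos in p_index.get(v, []):
        if rid in target_pos and pos < target_pos[rid]:
            return rid, pos, target_pos[rid]
    return None

def pfs_p(routes, p_index, source, target):
    """pfs with the P-Index termination check applied to each popped node."""
    if source == target:
        return [source]
    parent = {source: None}
    stack = [source]
    while stack:
        v = stack.pop()
        hit = join(p_index, v, target)
        if hit:
            rid, pos_v, pos_t = hit
            prefix = []
            u = v
            while u is not None:                 # path from source to v
                prefix.append(u)
                u = parent[u]
            # With 1-based positions, routes[rid][pos_v:pos_t] is "after v, up to target".
            return prefix[::-1] + routes[rid][pos_v:pos_t]
        for rid, pos in p_index.get(v, []):
            prev = v
            for w in routes[rid][pos:]:          # successors of v in this route
                if w not in parent:
                    parent[w] = prev
                    stack.append(w)
                prev = w
    return None

if __name__ == "__main__":
    index = build_p_index(ROUTES)
    print(join(index, "F", "T"))
    print(pfs_p(ROUTES, index, "F", "T"))
```

For the query on the slide, the join of the lists for F and T matches on p2 (position 2 before position 6), so the search stops at the first node and returns `['F', 'D', 'N', 'B', 'T']`.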


H-graph (I)

• Graph representation of the collection
  – Nodes
    • Routes of the collection
  – Edges (pi, pj, v)
    • All possible transitions among routes
    • Edge label v: the shared node (link)
• Better termination condition => pfsH
  – Identify an edge on the H-graph

p1 (A, B, C, D, J)
p4 (D, N, B, F, K)



H-graph (II)

• Find a path from node F to J

p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)

• Answer: (F, D, J)


H-Index

• In practice: the H-Index stores the adjacency lists of the H-graph

route  edges list
p1     <p2, B:2:5>, <p2, D:4:3>, <p4, B:2:3>, <p4, D:4:1>
p2     <p1, D:3:4>, <p1, B:5:2>, <p3, N:4:1>, <p4, F:2:4>, <p4, D:3:1>, <p4, N:4:2>, <p4, B:5:3>, <p5, F:2:2>
…      …

p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)

[Figure: H-graph edge from p1 to p2, labelled B, D]
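The edges lists above can be regenerated with the following Python sketch. Reading an entry `<pj, v:i:j>` as "link v sits at position i in the source route and at position j in pj" is an interpretation of the table, and so is the filter: the example lists appear to omit links that are the first node of the source route or the last node of the destination route, and the code applies that rule as an assumption. The name `build_h_index` is hypothetical, and routes are assumed to contain each node at most once.

```python
# Example collection taken from the slides (p1..p5).
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def build_h_index(routes):
    """route pi -> [(pj, link node, position in pi, position in pj), ...]"""
    # 1-based position of every node inside every route.
    positions = {
        rid: {node: pos for pos, node in enumerate(nodes, start=1)}
        for rid, nodes in routes.items()
    }
    h_index = {}
    for pi, nodes_i in routes.items():
        edges = []
        for pj, nodes_j in routes.items():
            if pj == pi:
                continue
            for i, node in enumerate(nodes_i, start=1):
                j = positions[pj].get(node)
                if j is None:
                    continue                      # node is not shared with pj
                # Assumed filter, inferred from the slide's example lists.
                if i == 1 or j == len(nodes_j):
                    continue
                edges.append((pj, node, i, j))
        h_index[pi] = edges
    return h_index

if __name__ == "__main__":
    for route, edges in build_h_index(ROUTES).items():
        print(route, edges)
```

With this filter the lists for p1 and p2 match the table entry by entry, including their order (by destination route, then by position in the source route).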


The pfsH algorithm

• Find a path from F to J
  – routes[F] = {<p2,2>, <p4,4>, <p5,2>}
  – routes[J] = {<p1,5>}
• Answer: (F, D, J)

route  edges list
p2     <p1, D:3:4>, <p1, B:5:2>, <p3, N:4:1>, <p4, F:2:4>, <p4, D:3:1>, <p4, N:4:2>, <p4, B:5:3>, <p5, F:2:2>
p4     <p1, B:3:2>, <p2, N:2:4>, <p2, B:3:5>, <p2, F:4:2>, <p3, N:2:1>, <p5, F:4:2>
p5     <p2, F:2:2>, <p4, F:2:4>
…      …

[Figure labels: JOIN]

p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
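One way to read this example is sketched below in Python. It is an interpretation, not the authors' code: the search stops when some route pi holding the current node has an H-Index edge `<pj, v:i:j>` such that the current node is not after the link in pi and the target is not before the link in pj. The names `h_link` and `pfs_h` are hypothetical, the edge filter in `build_h_index` is inferred from the slide's lists, and when no such edge exists the sketch simply falls back to expanding successors as in plain pfs.

```python
# Example collection taken from the slides (p1..p5).
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

def build_p_index(routes):
    """node -> [(route id, 1-based position), ...]"""
    p_index = {}
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):
            p_index.setdefault(node, []).append((rid, pos))
    return p_index

def build_h_index(routes):
    """route pi -> [(pj, link, pos in pi, pos in pj), ...] (filter is an assumption)."""
    where = {r: {n: k for k, n in enumerate(ns, start=1)} for r, ns in routes.items()}
    h_index = {}
    for pi, nodes_i in routes.items():
        h_index[pi] = [
            (pj, n, i, where[pj][n])
            for pj, nodes_j in routes.items() if pj != pi
            for i, n in enumerate(nodes_i, start=1)
            if n in where[pj] and i != 1 and where[pj][n] != len(nodes_j)
        ]
    return h_index

def h_link(routes, p_index, h_index, v, target):
    """Path v .. link .. target that uses exactly one route transition, or None."""
    target_pos = dict(p_index.get(target, []))    # route id -> position of target
    for pi, pos_v in p_index.get(v, []):
        for pj, _link, i, j in h_index.get(pi, []):
            if i >= pos_v and pj in target_pos and j <= target_pos[pj]:
                # v up to the link along pi, then after the link up to target along pj.
                return routes[pi][pos_v - 1:i] + routes[pj][j:target_pos[pj]]
    return None

def pfs_h(routes, source, target):
    p_index, h_index = build_p_index(routes), build_h_index(routes)
    parent = {source: None}
    stack = [source]
    while stack:
        v = stack.pop()
        tail = [v] if v == target else h_link(routes, p_index, h_index, v, target)
        if tail:
            prefix, u = [], parent[v]
            while u is not None:                  # path from source to v's parent
                prefix.append(u)
                u = parent[u]
            return prefix[::-1] + tail
        for rid, pos in p_index.get(v, []):
            prev = v
            for w in routes[rid][pos:]:           # successors of v in this route
                if w not in parent:
                    parent[w] = prev
                    stack.append(w)
                prev = w
    return None

if __name__ == "__main__":
    print(pfs_h(ROUTES, "F", "J"))
```

For the query on the slide, F is at position 2 in p2, the edge `<p1, D:3:4>` has its link at position 3 in p2 and position 4 in p1, and J is at position 5 in p1, so the sketch returns `['F', 'D', 'J']` without expanding any node.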

Index maintenance

• P-Index and H-Index are stored as inverted files on disk
  – Updates -> adding new routes
  – New routes are not considered one at a time
  – Batch updates: consider a set of new routes
• Basic idea:
  – Build memory-resident P-Index and H-Index for the new routes
  – Merge the disk-based indices with the memory-resident ones
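The batch idea can be illustrated for the P-Index with a short Python sketch. Everything here is an assumption made for illustration: the "disk-based" index is modelled as a plain dictionary, the names `build_p_index` and `merge_p_index` are hypothetical, and new routes are assumed to receive ids that sort after the existing ones. Merging the H-Index is not shown, since new routes also create edges to and from existing routes.

```python
def build_p_index(routes):
    """node -> [(route id, 1-based position), ...]"""
    p_index = {}
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):
            p_index.setdefault(node, []).append((rid, pos))
    return p_index

def merge_p_index(disk_index, memory_index):
    """Append the batch's routes lists to the existing ones, node by node."""
    merged = {node: list(entries) for node, entries in disk_index.items()}
    for node, entries in memory_index.items():
        merged.setdefault(node, []).extend(entries)
    return merged

if __name__ == "__main__":
    existing = {
        "p1": ["A", "B", "C", "D", "J"],
        "p2": ["A", "F", "D", "N", "B", "T"],
        "p3": ["N", "L", "M"],
        "p4": ["D", "N", "B", "F", "K"],
    }
    batch = {"p5": ["A", "F", "K"]}               # set of newly uploaded routes
    disk_index = build_p_index(existing)          # stands in for the on-disk index
    memory_index = build_p_index(batch)           # built once for the whole batch
    print(merge_p_index(disk_index, memory_index)["A"])
```

Building one small index for the whole batch means each routes list of the existing index is touched at most once per batch, not once per new route.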


Setup

• Synthetic route collections
  – |P|, lavg, |V|, zipf, U
• Compare
  – Convert collection to graph, dfs & adjacency lists
  – pfsP & P-Index
  – pfsH & P-Index, H-Index
• Construction cost, query evaluation: vary one of |P|, lavg, |V|, zipf
• Maintenance cost: vary U

Index construction

[Figure: two plots]
• x-axis |P| (x 10^3), with lavg = 10, |V| = 100000, zipf = 0.8
• x-axis |V| (x 10^3), with |P| = 100000, lavg = 10, zipf = 0.8

Query evaluation (I)

[Figure: two plots]
• x-axis |P| (x 10^3), with lavg = 10, |V| = 100000, zipf = 0.8
• x-axis lavg, with |P| = 100000, |V| = 100000, zipf = 0.8

Query evaluation (II)

[Figure: two plots]
• x-axis |V| (x 10^3), with |P| = 100000, lavg = 10, zipf = 0.8
• x-axis zipf, with |P| = 100000, lavg = 10, |V| = 100000

Index maintenance

[Figure: two plots, both with x-axis U (%)]
• |P| = 100000, lavg = 10, |V| = 100000, zipf = 0.8

Conclusions

• Reachability queries over frequently updated route collections
• The path-first search (pfs) algorithm
  – Indexing route collections: P-Index & pfsP
  – Indexing route transitions: H-Index & pfsH
• Handling frequent updates, adding new routes
• Experimental evaluation
  – P-Index & pfsP: low construction & maintenance cost
  – H-Index, P-Index & pfsH: fast query evaluation

Further work

• Ongoing
  – New index that combines the advantages of P-Index & H-Index
    • Low construction and maintenance cost
    • Fast query evaluation
• Future work
  – Other types of queries
    • Considering constraints

Thank you!
