STAR: Steiner-Tree Approximation in Relationship Graphs


Max-Planck Institute for Informatics, Database and Information Systems

Gjergji Kasneci, Maya Ramanath, Mauro Sozio, Fabian M. Suchanek, Gerhard Weikum

Introduction

• Entity-Relationship Graphs
– Another way of representing relational data
– Consist of labeled nodes and edges
– Node labels correspond to entities
– Edge labels represent relations between entities
– Edge weights represent the strength of the relation between entities
– Taxonomic relations (subClassOf, type)

Introduction

• Example of an Entity-Relationship Graph

[Figure: example entity-relationship graph, with the directions "Specialization" and "Generalization" marked.]

Introduction

• Querying E-R Graphs
– The relationship search query class:
• Given a set of two, three, or more entities (nodes), find their closest relationships (edges or paths) that connect the entities in the strongest possible way.
• "Strongest" is related to informativeness
– A relationship search query example:
• Query: “How are Germany’s chancellor Angela Merkel, the mathematician Richard Courant, Turing-Award winner Jim Gray, and the Dalai Lama related?”
• Informative answer: All have a doctoral degree from a German university
– Another example: How are Angela Merkel, Arnold Schwarzenegger, Max Planck, and Germany related?

Motivation and Problem

• Information discovery as opposed to lookup
• The nature of the answer
– Can be a tree embedded in the original graph
– The input nodes (the query) must be connected by the tree
– How good is the answer?
• A scoring model can exploit node and edge weights
• The formal definition of the problem:
– Compute the k lowest-cost Steiner trees: a Steiner tree for a terminal set V’ is a tree in the graph that connects all nodes of V’; its cost is the sum of its edge weights.

Motivation and Problem

• What is a Steiner tree problem?
• Steiner tree examples:
– Steiner tree for three terminals V’ = {A, B, C}. Note the Steiner point S.
– Steiner tree for four terminals V’ = {A, B, C, D}. Note the Steiner points S1 and S2.

Motivation and Problem

• Steiner tree problem complexity
– Computing an optimal Steiner tree is NP-hard
– Approximation algorithms compute approximate solutions
– Approximation ratio:
• Measures the quality of an approximation algorithm
• Weight of the approximate output tree / weight of the optimal output tree
• Benefits of reducing the approximation ratio
– Viable runtimes (efficiency)
– Better, near-optimal result quality (informativeness)
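The ratio described on this slide can be written compactly. The following is only a sketch in notation assumed here rather than taken from the slides: T_A is the tree returned by an approximation algorithm, T_opt is an optimal Steiner tree, and w(·) is the total edge weight of a tree.

```latex
\rho \;=\; \frac{w(T_A)}{w(T_{\mathrm{opt}})} \;\ge\; 1
```

An algorithm is called an α-approximation if ρ ≤ α holds on every input, so a smaller guaranteed α means the output is provably closer to the optimum.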

Paper Contributions

• Presents STAR, a new efficient algorithm
– Computes near-optimal Steiner trees
– Exploits the taxonomic schema (when available)
– Viable runtimes over large graphs
• STAR approximation ratio proofs:
– O(log n), for n given query entities (worst case)
• Improvement over other approximation ratios
– [ratio missing in source], or [ratio missing in source]
– In practice, STAR is better than a [ratio missing in source]-approximation algorithm
• STAR top-k tree capability
• STAR outperforms state-of-the-art algorithms by an order of magnitude
• Can be applied either to main-memory datasets or to disk-resident large graphs
• Evaluation via comparison with other cutting-edge algorithms

The Star Algorithm

• Introduction
• First Phase
• Second Phase
• Examples


The Star Algorithm – Introduction

• Problem definition
– As stated in the introduction
– In addition, we are interested in finding the top-k result trees in increasing order of cost
• Exploitation of taxonomic backbones
– Node labels as entities
– Edge labels as weights or relations
– A taxonomy is not required
• Runs in two phases
• Phase 1: uses taxonomic information (when available)
– Builds a quick tree by pruning the original graph
– Interconnects all given nodes
• Phase 2: iteratively improves the tree from Phase 1


The Star Algorithm – First Phase

• Prunes the original graph
• Runs an iterator from each terminal
• Iterators run in a round-robin manner
• Iterators follow only taxonomic edges:
– subClassOf, type

Single Breadth-First Search Iterator: Pruning Example

Breadth First Search

[Figure sequence: step-by-step breadth-first search from source s over the nodes 2, 3, 4, 5, 6, 7, 8, 9, observing the taxonomic structure. Each node is marked Undiscovered, Discovered, or Finished, and each discovered node is labeled with its shortest-path distance from s.]

• The queue starts as: s
• Expanding s discovers 2, 3, and 5 (distance 1)
• Expanding 2 discovers 4 (distance 2); 5 is already discovered: don't enqueue
• Expanding 3 discovers 6 (distance 2)
• Expanding 5 discovers nothing new
• Expanding 4 discovers 8 (distance 3)
• Expanding 6 discovers 7 and 9 (distance 3)
• Expanding 8, 7, and 9 discovers nothing new; the queue is empty

Level Graph

• Level 0: s
• Level 1: 2, 3, 5
• Level 2: 4, 6
• Level 3: 7, 8, 9
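The breadth-first search trace above can be reproduced with a few lines of Python. This is a minimal sketch, not code from the paper: the adjacency list `adj` is an assumption chosen to be consistent with the queue states shown on the slides, and `bfs_levels` is a hypothetical helper name.

```python
from collections import deque

def bfs_levels(adj, source):
    """Return the BFS level (hop distance from source) of every reachable node."""
    level = {source: 0}          # discovered nodes and their distance from source
    queue = deque([source])      # FIFO queue of discovered, unfinished nodes
    while queue:
        u = queue.popleft()      # take the node at the top of the queue
        for v in adj[u]:
            if v not in level:   # already discovered nodes are not enqueued again
                level[v] = level[u] + 1
                queue.append(v)
    return level

# Assumed edges, consistent with the trace: s-2, s-3, s-5, 2-4, 2-5, 3-6, 4-8, 6-7, 6-9
adj = {
    "s": ["2", "3", "5"],
    "2": ["s", "4", "5"],
    "3": ["s", "6"],
    "5": ["s", "2"],
    "4": ["2", "8"],
    "6": ["3", "7", "9"],
    "7": ["6"],
    "8": ["4"],
    "9": ["6"],
}
levels = bfs_levels(adj, "s")
```

With these assumed edges, `levels` groups the nodes exactly as in the level graph: 2, 3, 5 at level 1, then 4, 6 at level 2, then 7, 8, 9 at level 3.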

First Phase Example

(Simple breadth-first search iterator from each terminal)

V’ = {Max Planck, Arnold Schwarzenegger, Germany}

Breadth First Search Iterators from Each Terminal

[Figure sequence: three breadth-first search iterators, T1 (Max Planck), T2 (Arnold Schwarzenegger), and T3 (Germany), run round-robin over a taxonomy with the nodes Max Planck, Physicist, Scientist, Person, Entity, Politician, Actor, Arnold Schwarzenegger, Organization Unit, State, and Germany. Nodes are marked Undiscovered, Discovered, or Finished, and each iterator has its own queue.]

• As soon as iterators meet, a result is constructed
• Queue T1 grows in the order: Max Planck, Physicist, Scientist, Person, Entity
• Queue T2 grows in the order: Arnold Schwarzenegger, Politician, Actor, Entity
• Queue T3 grows in the order: Germany, State, Organization Unit, Entity
• A node already discovered by the same iterator is not enqueued again
• T3 reaches Entity, which was already discovered by the T2 iterator: the T2 and T3 iterators have met, so the T3 iterator stops
• T1 reaches Entity, which was already discovered by the T2 iterator: the T1 and T2 iterators have met, so the T1 iterator stops
• All queues are empty at the end
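The first phase can be sketched as round-robin breadth-first search iterators that climb taxonomic edges until they meet. The code below is an illustrative simplification, not the authors' implementation: `phase_one` is a hypothetical name, the `parents` map is an assumed taxonomy for the running example (in particular the edges Politician → Person and Actor → Person are assumptions), and the rule of stopping the iterator that runs into another one is a simplification of what the slides show.

```python
from collections import deque

def phase_one(parents, terminals):
    """Connect all terminals by BFS iterators that follow child -> parent edges."""
    n = len(terminals)
    owner = {}                        # node -> iterator that discovered it first
    pred = [dict() for _ in range(n)] # per-iterator BFS predecessor links
    queues = []
    comp = list(range(n))             # component id of each iterator
    active = [True] * n
    tree = set()                      # undirected tree edges as frozensets
    for i, t in enumerate(terminals):
        owner[t] = i
        pred[i][t] = None
        queues.append(deque([t]))

    def add_path(i, node):
        # Add the edges from node back to the terminal of iterator i.
        while pred[i][node] is not None:
            tree.add(frozenset((node, pred[i][node])))
            node = pred[i][node]

    remaining = n                     # number of still separate components
    while remaining > 1 and any(active[i] and queues[i] for i in range(n)):
        for i in range(n):            # round robin: one expansion per iterator
            if remaining == 1:
                break
            if not active[i] or not queues[i]:
                continue
            u = queues[i].popleft()
            for v in parents.get(u, []):
                j = owner.get(v)
                if j is None:         # undiscovered: iterator i claims it
                    owner[v] = i
                    pred[i][v] = u
                    queues[i].append(v)
                elif comp[j] != comp[i]:   # iterators met: join both paths
                    tree.add(frozenset((u, v)))
                    add_path(i, u)
                    add_path(j, v)
                    old, new = comp[i], comp[j]
                    comp = [new if c == old else c for c in comp]
                    active[i] = False      # stop the iterator that ran into the other
                    remaining -= 1
                    break
    return tree

# Assumed taxonomy (child -> parents) for the running example.
parents = {
    "Max Planck": ["Physicist"], "Physicist": ["Scientist"],
    "Scientist": ["Person"], "Person": ["Entity"],
    "Arnold Schwarzenegger": ["Politician", "Actor"],
    "Politician": ["Person"], "Actor": ["Person"],
    "Germany": ["State"], "State": ["Organization Unit"],
    "Organization Unit": ["Entity"],
}
tree = phase_one(parents, ["Max Planck", "Arnold Schwarzenegger", "Germany"])
```

Under these assumptions the resulting tree has nine edges, leaves Actor out, and gives Person degree 3, which agrees with the Phase 1 output used later in the Phase 2 example.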

The Star Algorithm – Second Phase

• Aims to improve the tree from Phase 1
• Follows an iterative improvement procedure
– Certain paths are replaced in each iteration
– The new paths have lower weight
• Some definitions:
• Terminal node:
– Any node v ∈ V’
• Degree of a node v, deg(v):
– The number of edges connected to the node
• Fixed node:
– Any node v with deg(v) ≥ 3
– Any terminal node

The Star Algorithm – Second Phase

• Loose path:
– A path p in T is a loose path if its end nodes are fixed nodes and it has minimal length, i.e., it contains no fixed node in its interior.
• Fixed nodes should not be removed during improvement
• It follows that every intermediate node v in a loose path must be a Steiner node with deg(v) = 2
• A loose path is a path that can be replaced during the improvement process
• A minimal Steiner tree with respect to V’ is a tree in which all loose paths represent shortest paths between fixed nodes.
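The definitions of fixed nodes and loose paths translate directly into a small routine. The sketch below is illustrative only: `loose_paths` and `build_adj` are hypothetical helper names, the example tree is the assumed Phase 1 output for the running example, and the code assumes that every leaf of the tree is a terminal.

```python
def build_adj(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def loose_paths(adj, terminals):
    """Return (fixed nodes, loose paths) of a tree given as an adjacency map."""
    # A node is fixed if it is a terminal or has degree >= 3.
    fixed = {v for v in adj if v in terminals or len(adj[v]) >= 3}
    paths = []
    for start in sorted(fixed):
        for nxt in sorted(adj[start]):
            path = [start, nxt]
            # Walk through unfixed degree-2 nodes until the next fixed node.
            while path[-1] not in fixed:
                prev, cur = path[-2], path[-1]
                (step,) = [x for x in adj[cur] if x != prev]
                path.append(step)
            if path[0] < path[-1]:    # report each loose path only once
                paths.append(path)
    return fixed, paths

# Assumed Phase 1 tree of the running example.
edges = [
    ("Max Planck", "Physicist"), ("Physicist", "Scientist"),
    ("Scientist", "Person"), ("Arnold Schwarzenegger", "Politician"),
    ("Politician", "Person"), ("Person", "Entity"),
    ("Entity", "Organization Unit"), ("Organization Unit", "State"),
    ("State", "Germany"),
]
terminals = {"Max Planck", "Arnold Schwarzenegger", "Germany"}
fixed, paths = loose_paths(build_adj(edges), terminals)
longest = max(paths, key=len)
```

For this tree the fixed nodes are the three terminals plus Person, and there are three loose paths; the one with the most edges runs between Germany and Person.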

The Star Algorithm – Second Phase: Observations

• Removing a loose path (LP) splits T into two subtrees, T1 and T2
• Replacing any LP by a shorter path
– Compute the shortest path from any node of T1 to any node of T2
• Removing and inserting LPs

[Figure legend: fixed nodes and unfixed nodes]

The Star Algorithm – Second Phase: Finding an Approximate Steiner Tree

1. Remove an LP
2. Decompose T into T1 and T2
3. Connect T1 and T2 by a path shorter than the LP

The Star Algorithm – Second Phase: The Tree-Improving Algorithm

• The difficult Steiner tree problem is reduced to
– Finding shortest paths between node subsets
• In each iteration the LP with maximum weight is removed (heuristic)

The Star Algorithm – Second Phase: The Method replace(lp, T)

• Removes the loose path lp from T
• T is split into subgraphs T1 and T2
• The shortest path connecting any node of T1 to any node of T2 is determined
– replace(lp, T) calls findShortestPath(VT1, VT2, lp)
– findShortestPath(VT1, VT2, lp) returns the shortest path
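The first two steps of replace(lp, T), removing the loose path and splitting T into T1 and T2, can be sketched as follows. This is an assumption-laden illustration rather than the paper's code: `split_tree` is a hypothetical name, and the tree is the assumed Phase 1 output of the running example. Only the interior nodes of the loose path are removed, because its fixed end nodes must stay in the tree.

```python
def split_tree(adj, lp):
    """Remove loose path lp (a node list) from the tree and return (T1, T2) node sets."""
    interior = set(lp[1:-1])                        # unfixed nodes to drop
    removed = {frozenset(e) for e in zip(lp, lp[1:])}  # edges of the loose path

    def component(root):
        # Collect all nodes still reachable from root after the removal.
        seen, stack = {root}, [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in seen or v in interior or frozenset((u, v)) in removed:
                    continue
                seen.add(v)
                stack.append(v)
        return seen

    return component(lp[-1]), component(lp[0])

# Assumed Phase 1 tree of the running example.
edges = [
    ("Max Planck", "Physicist"), ("Physicist", "Scientist"),
    ("Scientist", "Person"), ("Arnold Schwarzenegger", "Politician"),
    ("Politician", "Person"), ("Person", "Entity"),
    ("Entity", "Organization Unit"), ("Organization Unit", "State"),
    ("State", "Germany"),
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

lp = ["Germany", "State", "Organization Unit", "Entity", "Person"]
t1, t2 = split_tree(adj, lp)
```

Here `t1` is the subtree that keeps Person and `t2` contains only Germany; the remaining step is to reconnect the two node sets with a shorter path.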

Steiner Tree Approximation – Phase 2

• The overall graph G

[Figure: graph G with the nodes Max Planck, Physicist, Scientist, Person, Entity, Politician, Actor, Arnold Schwarzenegger, Organization Unit, State, Germany, and Angela Merkel.]

Steiner Tree Approximation – Phase 2

• Output of Phase 1
• deg(Person) = 3, therefore it is a fixed node
• The largest LP lies between Person and Germany

[Figure: Phase 1 tree over Max Planck, Physicist, Scientist, Person, Entity, Politician, Arnold Schwarzenegger, Organization Unit, State, and Germany.]

Steiner Tree Approximation – Phase 2

• Remove the largest LP
• Fixed nodes are not removed

Steiner Tree Approximation – Phase 2

• The tree is split into the subtrees T1 and T2
• V(T1) = {Person, Politician, Arnold Schwarzenegger, Scientist, Physicist, Max Planck}
• V(T2) = {Germany}
• An algorithm finds the shortest path between T1 and T2
• Method call: findShortestPath(V(T1), V(T2), lp)

[Figure: T1 (Max Planck, Physicist, Scientist, Person, Politician, Arnold Schwarzenegger) and T2 (Germany) after removal of the loose path.]

Phase 2 – Shortest Path Algorithm

• All vertices pruned in Phase 1 are needed again
• Runs one single-source shortest-path iterator from V(T1) and one from V(T2)
• I.e., each iterator finds shortest paths from its source to all other vertices in the graph

[Figure: graph G with the nodes Max Planck, Physicist, Scientist, Person, Entity, Politician, Actor, Arnold Schwarzenegger, Organization Unit, State, Germany, and Angela Merkel.]

Phase 2 – Shortest Path Algorithm

• Vertex distance d(v) initialization
• Assign two distances (d1, d2) to each vertex
• Assign d1 = 0 to all vertices of V(T1)
• Assign d2 = 0 to all vertices of V(T2)
• Assign d1 = ∞ to all vertices of V(T2)
• Assign d2 = ∞ to all vertices of V(T1)
• Assign d1 = d2 = ∞ to all pruned or not queried vertices

Phase 2 – Shortest Path Algorithm: Worked Example

• T1 is considered a single node at distance 0 from itself and distance ∞ from T2
• T2 accordingly
• Nodes that are members of neither T1 nor T2 have infinite distance from both T1 and T2
• Current points to the iterator with the minimal fringe, which is the one currently expanded

Initial distances (d1, d2):

• Physicist (0, ∞), Max Planck (0, ∞), Scientist (0, ∞), Person (0, ∞), Politician (0, ∞), Arnold Schwarzenegger (0, ∞)
• Germany (∞, 0)
• Entity (∞, ∞), Actor (∞, ∞), Organization Unit (∞, ∞), State (∞, ∞), Angela Merkel (∞, ∞)

Initial queues:

• Q1 (d1): Arn(0), Pol(0), Max(0), Phy(0), Sci(0), Per(0)
• Q2 (d2): Ger(0)

Steps:

1. Fringe(Q2) < Fringe(Q1): swap(current, other) and dequeue Germany from Q2
2. d2(State) = 0 + 0.95 = 0.95; enqueue State in Q2
3. d2(Angela Merkel) = 0 + 0.96 = 0.96; enqueue Angela Merkel in Q2
4. Dequeue Angela Merkel from Q2
5. d2(Physicist) = 0.96 + 0.95 = 1.91; enqueue Physicist in Q2
6. d2(Politician) = 0.96 + 0.95 = 1.91; enqueue Politician in Q2
7. Dequeue Physicist from Q2
8. d2(Scientist) = 1.91 + 0.99 = 2.90; enqueue Scientist in Q2
9. Stop, since Physicist ∈ V(T1)
10. Return the vertices in vector V: V = {Germany, Angela Merkel, Physicist}

Iterator log:

Itr  Cur  Oth  V    V’
     1    2
     2    1    Ger
1    2    1    Ger  Sta
1    2    1    Ger  Ang
2    2    1    Ang  Phy
2    2    1    Ang  Pol
3    2    1    Phy  Sci

Final distances (d1, d2):

• Physicist (0, 1.91), Max Planck (0, ∞), Scientist (0, 2.90), Person (0, ∞), Politician (0, 1.91), Arnold Schwarzenegger (0, ∞)
• Germany (∞, 0), State (∞, 0.95), Angela Merkel (∞, 0.96)
• Entity (∞, ∞), Actor (∞, ∞), Organization Unit (∞, ∞)
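The search for a reconnecting path can be approximated with a multi-source Dijkstra run. The sketch below is a deliberate simplification and not the paper's findShortestPath: it expands from one side only instead of alternating two iterators, `find_shortest_path` and `undirected` are hypothetical names, and `bound` stands in for the loose-path weight used as an upper bound. The weights 0.95, 0.96, and 0.99 on the first five edges come from the slides; the remaining weights are assumptions.

```python
import heapq

def undirected(edges):
    """Build a weighted undirected adjacency map from (u, v, w) triples."""
    g = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g

def find_shortest_path(graph, sources, targets, bound=float("inf")):
    """Dijkstra from all sources at once; stop at the first settled target.

    Returns (cost, path) or None if no path with cost below bound exists.
    """
    dist = {v: 0.0 for v in sources}
    parent = {v: None for v in sources}
    heap = [(0.0, v) for v in sorted(sources)]
    heapq.heapify(heap)
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u in targets:                 # reached the other subtree: rebuild path
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            nd = d + w
            # Prune anything that is not strictly better than the bound.
            if nd < bound and nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return None

graph = undirected([
    ("Germany", "State", 0.95), ("Germany", "Angela Merkel", 0.96),
    ("Angela Merkel", "Physicist", 0.95), ("Angela Merkel", "Politician", 0.95),
    ("Physicist", "Scientist", 0.99),
    # Assumed weights below (not given on the slides).
    ("State", "Organization Unit", 0.99), ("Organization Unit", "Entity", 0.99),
    ("Entity", "Person", 0.99), ("Scientist", "Person", 0.99),
])
t1 = {"Person", "Politician", "Arnold Schwarzenegger", "Scientist", "Physicist", "Max Planck"}
result = find_shortest_path(graph, {"Germany"}, t1)
pruned = find_shortest_path(graph, {"Germany"}, t1, bound=1.0)
```

With these weights `result` is the path Germany, Angela Merkel, Physicist with cost 0.96 + 0.95, matching the vector V returned in the trace, while the artificially tight bound of 1.0 prunes every candidate and yields `None`.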

Phase 2 – Shortest Path Algorithm: First Iteration Result

[Figure: tree after the first improvement step.]

Phase 2 – Shortest Path Algorithm: Second Iteration

• Remove an LP
• Apply the algorithm again to find the shortest path between T1 and T2
• Stop here, since no loose path can be improved

Approximation Guarantee: Lemmas and Theorems

• Lemma 1
– A tree T with terminal set V’, |V’| ≥ 2, has at least |V’| - 1 and at most 2|V’| - 3 loose paths.
• The approximation ratio for the cost of the tree returned by STAR is independent of the result of the first phase.

Approximation Guarantee: Lemmas and Theorems

• Lemma 2
– Let TA be the Steiner tree yielded by the STAR algorithm. Let L(TA) be the set of loose paths in TA. For any circular ordering u1, …, uN of the terminals in TA there is a mapping μ: L(TA) → V’ × V’ such that:
1. μ is defined for all loose paths in TA.
2. For each loose path P with end points u and v, let T1 and T2 be the two trees obtained by removing from TA all nodes in P (and their edges) except u and v; then μ(P) = {ui, ui+1} for some i = 1, …, N, and one of the nodes ui, ui+1 belongs to T1 while the other one belongs to T2.
3. For each pair of terminals {ui, ui+1} there are at most 2⌈log N⌉ + 2 loose paths mapped to {ui, ui+1}.

Approximation Guarantee: Lemmas and Theorems

• Theorem 1 (approximation order)
– The STAR algorithm is a (4⌈log N⌉ + 4)-approximation algorithm for the Steiner tree problem.
– Therefore:

Approximation Guarantee: Lemmas and Theorems

• Improvement-guarantee rule
– STAR might have exponential running time
– The cost reduction in each iteration can be infinitesimally small
– An improvement-guarantee rule solves this:
– Replace loose path P if and only if: [condition missing in source]
– where P’ is the replacement path found by STAR, and ε > 0

Approximation Guarantee: Lemmas and Theorems

• Lemma 3 (time complexity)
– Given ε > 0, the STAR algorithm with the improvement-guarantee rule is guaranteed to terminate in [bound missing in source] steps,
– where m is the number of edges
– and [symbol missing in source] is the ratio of the maximum and minimum cost of the edges in the input graph.

Approximation Guarantee: Lemmas and Theorems

• Theorem 2
– Given ε > 0, the STAR algorithm with the improvement-guarantee rule is a [ratio missing in source]-approximation algorithm for the Steiner tree problem. Its running time is [bound missing in source],
– where n, m, and N denote the number of vertices, edges, and terminals of the input graph.

Approximate Top-K Interconnections

• Observation: the weight of a loose path is an upper bound for the weight of any new interconnecting path
• No loose path in the final tree T can be improved further
• Top-k interconnections are computed starting from the final tree T returned by the original STAR algorithm

Approximate Top-K Interconnections

• Lines 1-3 compute the original tree T
• T is enqueued in the priority queue Q
• New trees are generated by artificially relaxing the loose paths of the current tree (lines 4-9)

Approximate Top-K Interconnections

• relax(T, ε)
– A tunable value ε > 0 is used to artificially increase the loose path weights
– The new weights are used as upper bounds
– They act as artificial upper bounds for new interconnecting paths between subtrees

Approximate Top-K Interconnections

• improveTree’(T’, V’)
– replace(lp, T) calls findShortestPath(V(T1), V(T2), lp)
– findShortestPath(V(T1), V(T2), lp) uses the higher artificial weights
– The new interconnecting paths differ from the loose path but are still the shortest between T1 and T2
– Only new interconnecting paths that are node-disjoint from the loose path are considered
– This gives the result diversity required for top-k

[Figure: pseudocode of the original algorithm]

Approximate Top-K Interconnections

• reweight(T’)
– Re-weights the result of improveTree’(T’, V’) by:
– Acting on the loose paths of T’ (which are also loose paths of T)
– Setting W(T’) back to its initial value before relaxation

Evaluation

• STAR is compared to the best-known Steiner tree approximation algorithms:
– DNH, DPBF, BLINKS, BANKS (both versions)
• Compared in terms of quality (avg. weight) and performance (avg. runtime)
• Semantic quality or user-perceived relevance is not considered
• Earlier work by the authors showed that:
– A Steiner-tree-based scoring function contributes to high relevance from a user's viewpoint

Evaluation – Algorithms in Comparison

• DNH (Distance Network Heuristic)
– 2-approximation algorithm
• DPBF
– Dynamic programming approach
– The optimal tree can be computed (not an approximation)
– Best on a small number of terminals (queries)
• BLINKS
– Newest
– Experimentally the best in the field
• BANKS I & II
– Keyword proximity search on relational data

Evaluation – Types of Comparisons Performed

• Top-1 comparison of STAR, DNH, DPBF, BANKS I & II
• Top-k comparison of STAR, BANKS I & II, BLINKS
• External storage comparison of STAR and BANKS

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Goal: a good approximation ratio on the given G, V’
• Worst-case theoretical properties of the algorithms:
– DNH, approximation ratio: 2(1 - 1/n), n = |V’|
– STAR, approximation ratio: 4 log n + 4
– BANKS I & II, approximation ratio: O(n)
– DPBF: no approximation ratio (it computes the optimal Steiner tree)
• DPBF is used to compare all other algorithms against the optimal tree weights

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Datasets
– DBLP and IMDB viewed as graphs
• Nodes are entities (author, publication, conference, actor, movie, year, etc.)
• Edges are relations (cited by, author of, acted in, etc.)
– Dataset DBLP: subgraph of 15,000 nodes and 150,000 edges
• Subgraphs are used because DNH and DPBF operate in main memory only
– Dataset IMDB: subgraph of 30,000 nodes and 80,000 edges
– Two different datasets are needed to cover different topologies
– Neither dataset has edge weights, so weights are assigned randomly
– Neither dataset has taxonomic information (not a problem for STAR; this is handled in the first phase)

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Queries
– Query sets with 3, 5, and 7 terminals
– Each set contains 60 queries
– All queries in a set have the same number of terminals
– Terminals are chosen randomly

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Metrics
– Reference: the optimal scores returned by DPBF
– Compare the weight achieved by STAR to the weights achieved by all others
– Compare the running times of all algorithms

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Results
– Observe DPBF performance for all numbers of terminals
• Weight
• Runtime
– Observe DPBF performance for 7 terminals (?)

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• DBLP results
– For all numbers of terminals:
• STAR's weight is better than that of all the others
• STAR's runtime outperforms all others
• This holds even though DNH has a better approximation ratio

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• IMDB results
– STAR's weight is slightly worse than DNH's
– One hypothesis: DBLP has a higher edge-to-node ratio
– BANKS II performance improved relative to its competitors (?)
– It is still outperformed by STAR

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

• DNH cannot compute top-k results
• BLINKS
– Uses indexing for query-time speedup
– Requires the entire graph in main memory
– The same datasets are used again
– Uses a partitioning strategy (block sizes in nodes)
– Initially tuned for better results
• DBLP: block size of 100 nodes
• IMDB: block size of 5 nodes

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

• Metrics
– BLINKS avg. weight is not applicable
• It returns only the root nodes of the result trees
• Queries
– Comparison for k = 10, k = 50, k = 100
– DBLP & IMDB:
• 5 terminals
• Random queries
• 60 queries

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

• Results
– Index construction time for BLINKS is excluded
– BLINKS still has the worst runtime
– BANKS II and BLINKS runtimes are worse on the denser DBLP graph

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

• Explanation of STAR's performance:
– Uses only 2 iterators per improvement step
– Does not visit nodes at distance d > W(lp)
– Tighter upper bounds for pruning

External Storage Comparison of STAR and BANKS

• STAR and BANKS are directly applicable to graphs that do not fit into main memory
• Simulation of such a scenario:
– Disk-resident datasets
• Dataset:
– YAGO knowledge base
• Nodes: 1.7 million, edges: 14 million
• Edge weights supported
• Type and subclass taxonomy supported (used by STAR's first phase)
– Graph stored in a relational database with the schema EDGE(source, target, weight)
– Edge exploration: one database access for each edge
– The database call overhead is treated uniformly for STAR and BANKS through edge loading

External Storage Comparison of STAR and BANKS

• Queries:
– 2 sets, with 3 and 6 terminals
– Top-1, top-3, top-6 results
– Terminal nodes randomly chosen
– 30 queries made
• Metrics:
– Average weight (quality of the output trees)
– Efficiency (running times)
– Number of edges accessed

External Storage Comparison of STAR and BANKS

• Results:
– BANKS I & II sometimes take 30 minutes to return results
• Those runs are excluded from the evaluation
– STAR outperforms BANKS:
• It is an order of magnitude faster
– STAR accesses an order of magnitude fewer edges
• This gain comes from the taxonomic structure (first phase)

Results Summary

• Fairness by giving all algorithms the same inputs
• Diversity of algorithms
– DNH only handles graphs in main memory
– BLINKS: indexing, a different metric, lack of an approximation guarantee
– Query methods that are not Steiner-tree-like
• Reasons for STAR's outstanding performance:
– 1) Uses the graph's taxonomic structure when possible
– 2) The number of iterators needed per improvement step is independent of the number of terminals
– 3) Tight upper bounds and path pruning

Conclusion

• The query problem on E-R-style data graphs is addressed
• The inherent taxonomic structure is exploited
• STAR does not depend only on taxonomic information
– The second phase has a fast findShortestPath algorithm
• DNH paradox:
– DNH has a better approximation ratio, yet its results are similar to STAR's
• STAR achieves a good O(log n) approximation of the optimal Steiner tree

Thank you for your attention
