STAR: Steiner-Tree Approximation in Relationship Graphs
Max-Planck Institute for Informatics, Database and Information Systems
Gjergji Kasneci, Maya Ramanath, Mauro Sozio, Fabian M. Suchanek, Gerhard Weikum
Introduction
• Entity-Relationship Graphs
– Another way of representing relational data
– Consist of labeled nodes and edges
– Node labels correspond to entities
– Edge labels represent relations between entities
– Edge weights express the strength of the relation between entities
– Taxonomic relations (subClassOf, type)
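Introduction
• Example of an Entity-Relationship Graph
[Figure: example entity-relationship graph; the taxonomic edges are annotated with the directions "Specialization" and "Generalization".]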
Introduction
• Querying E-R Graphs
– The relationship search query class:
• Given a set of two, three, or more entities (nodes), find their closest relationships (edges or paths) that connect the entities in the strongest possible way.
• "Strongest" is related to informativeness
– A relationship search query example
• Query: “How are Germany’s chancellor Angela Merkel, the mathematician Richard Courant, Turing-Award winner Jim Gray, and the Dalai Lama related?”
• Informative answer: All have a doctoral degree from a German university
– How are Angela Merkel, Arnold Schwarzenegger, Max Planck, and Germany related?
Motivation and Problem
• Information discovery as opposed to lookup
• The nature of the answer
– Can be a tree embedded in the original graph
– The input (query) nodes must be connected by the tree
– How good is the answer?
• A scoring model can exploit node and edge weights
• The formal definition of the problem:
– Compute the k lowest-cost Steiner trees:
Motivation and Problem
• What is a Steiner Tree Problem?
• Steiner tree examples:
– Steiner tree for three terminals V’ = {A, B, C}. Note the Steiner point S.
– Steiner tree for four terminals V’ = {A, B, C, D}. Note the Steiner points S1, S2.
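To make the role of a Steiner point concrete, here is a minimal sketch that brute-forces the cheapest Steiner tree on a tiny graph. The node names follow the three-terminal example; the edge weights and the function names `mst` and `brute_force_steiner` are assumptions made for illustration, and this is not the STAR algorithm. The sketch tries every subset of non-terminal nodes and takes the minimum spanning tree of each induced subgraph, which is only feasible for very small graphs.

```python
from itertools import combinations

def mst(nodes, edges):
    """Kruskal's MST on the subgraph induced by `nodes`.
    Returns (cost, tree_edges), or None if that subgraph is disconnected."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    cost, tree = 0.0, []
    usable = sorted((w, u, v) for (u, v), w in edges.items()
                    if u in parent and v in parent)
    for w, u, v in usable:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            cost += w
            tree.append((u, v))
    return (cost, tree) if len(tree) == len(nodes) - 1 else None

def brute_force_steiner(edges, terminals):
    """Try every set of optional (Steiner) nodes and keep the cheapest tree."""
    nodes = {x for edge in edges for x in edge}
    optional = sorted(nodes - set(terminals))
    best = None
    for r in range(len(optional) + 1):
        for extra in combinations(optional, r):
            candidate = mst(set(terminals) | set(extra), edges)
            if candidate and (best is None or candidate[0] < best[0]):
                best = candidate
    return best

# Hypothetical weights: direct terminal-to-terminal edges cost 1.9,
# edges through the Steiner point S cost 1.0.
edges = {("A", "B"): 1.9, ("B", "C"): 1.9, ("A", "C"): 1.9,
         ("A", "S"): 1.0, ("B", "S"): 1.0, ("C", "S"): 1.0}
cost, tree = brute_force_steiner(edges, {"A", "B", "C"})
print(cost, tree)
```

With these assumed weights, connecting the terminals directly costs 3.8, while routing all three through S costs 3.0, so the cheapest tree includes the non-terminal node S.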
Motivation and Problem
• Steiner Tree Problem complexity
– Computing the optimal tree is NP-hard
– Approximation algorithms are used instead
– Approximation ratio:
• Measures the quality of an approximation algorithm
• Weight of the approximate output tree / weight of the optimal output tree
• Benefits of reducing the approximation ratio
– Viable runtimes (efficiency)
– Better, near-optimal tree quality (informativeness)
Paper Contributions
• Presents STAR, a new efficient algorithm
– Computes near-optimal Steiner trees
– Exploits the taxonomic schema (when available)
– Viable runtimes over large graphs
• STAR approximation ratio, with proof:
– O(log n) for n given query entities (worst case)
• Improvement over other approximation ratios– , or – STAR practically is better than a - approximation algorithm
• STAR top-k tree capability
• STAR outperforms state-of-the-art algorithms by an order of magnitude
• Can be applied to main-memory datasets or to large disk-resident graphs
• Evaluation via comparison with other cutting-edge algorithms
The Star Algorithm
• Introduction • First Phase• Second Phase• Examples
The Star Algorithm – Introduction
• Problem definition
– As stated in the introduction
– In addition, we are interested in finding the top-k result trees in increasing order of cost
• Exploitation of taxonomic backbones
– Node labels as entities
– Edge labels as weights or relations
– Taxonomic information is not compulsory
• Runs in 2 phases
• Phase 1: Uses taxonomic information (when available)
– Builds a quick tree by pruning the original graph
– Interconnects all given nodes
• Phase 2: Iteratively improves the tree from Phase 1
The Star Algorithm – First Phase
• Prunes the original graph
• Runs an iterator from each terminal
• Iterators run in a round-robin manner
• Iterators follow only taxonomic edges:
– subClassOf, type
Single Breadth-First-Search Iterator Pruning Example
[Animation: breadth-first search from source s over the nodes 2 to 9 ("Observe taxonomic structure"). Nodes are colored Undiscovered, Discovered, or Finished, and each discovered node is labeled with its shortest-path distance from s.]
• Queue contents as the animation proceeds:
– s
– s 2
– s 2 3
– 2 3 5
– 2 3 5 4 (5 already discovered: don't enqueue)
– 3 5 4
– 3 5 4 6
– 5 4 6
– 4 6
– 4 6 8
– 6 8
– 6 8 7
– 6 8 7 9
– 8 7 9
– 7 9
– 9
– (empty)
• Resulting level graph (distance from s):
– Level 0: s
– Level 1: 2, 3, 5
– Level 2: 4, 6
– Level 3: 7, 8, 9
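A single breadth-first-search iterator of the kind animated above can be sketched as follows. The edge list is an assumption reconstructed from the discovery order in the animation (the original figure is not available), and `bfs_levels` is a hypothetical helper name.

```python
from collections import deque

def bfs_levels(adj, source):
    """Return each node's BFS level and the order in which nodes are discovered."""
    level = {source: 0}
    order = [source]
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in level:          # already discovered: don't enqueue
                level[nxt] = level[node] + 1
                order.append(nxt)
                queue.append(nxt)
    return level, order

# Assumed undirected edges, chosen to match the discovery order in the slides.
edge_list = [("s", "2"), ("s", "3"), ("s", "5"), ("2", "4"), ("2", "5"),
             ("3", "6"), ("4", "8"), ("6", "7"), ("6", "9")]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

levels, order = bfs_levels(adj, "s")
print(order)
print(levels)
```

Under these assumed edges the levels match the level graph of the example: nodes 2, 3, 5 at distance 1, nodes 4, 6 at distance 2, and nodes 7, 8, 9 at distance 3.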
First-Phase Example
(Simple breadth-first-search iterator from each terminal)
V’ = {Max Planck, Arnold Schwarzenegger, Germany}
[Animation: taxonomy over the nodes Max Planck, Physicist, Scientist, Person, Entity, Politician, Actor, Arnold Schwarzenegger, Organization Unit, State, Germany. Nodes are colored Undiscovered, Discovered, or Finished.]
• Breadth-first-search iterators T1, T2, T3 start from each terminal
• As soon as iterators meet, a result is constructed
• Queue contents as the animation proceeds (only the queues that change in a step are listed):
– T1: Max Planck
– T2: Arnold Schwarzenegger
– T3: Germany
– T1: Max Planck, Physicist
– T2: Arnold Schwarzenegger, Politician
– T3: Germany, State
– T1: Physicist
– T2: Arnold Schwarzenegger, Politician, Actor
– T3: State
– T1: Physicist, Scientist
– T2: Politician, Actor
– T3: State, Organization Unit
– T1: Scientist
– T2: Politician, Actor, Entity
– T3: Organization Unit
– T1: Scientist, Person
– T2: Actor, Entity (Entity already discovered in T2 iterator: don't enqueue)
– T2: Entity; T3: Organization Unit, Entity (Entity already discovered in T2 iterator: T2 and T3 iterators met; stop T3 iterator)
– T1: Person; T3: (empty)
– T1: Person, Entity (Entity already discovered in T2 iterator: T1 and T2 iterators met; stop T1 iterator)
– T1: (empty); T2: Entity
– T2: (empty)
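The first phase can be sketched roughly as below: one breadth-first iterator per terminal, advanced round-robin, following only taxonomic (child to parent) edges until all terminals are interconnected. This is a simplified sketch under stated assumptions, not the authors' implementation: the `parents` map is a guess at the taxonomy in the figure, the function names are hypothetical, iterators keep running after a meeting instead of being stopped, and the result is only guaranteed to be a tree when the taxonomy itself is tree-shaped.

```python
from collections import deque

def path_to_start(pred, node):
    """Edges on the path from `node` back to the iterator's own terminal."""
    edges = set()
    while pred[node] is not None:
        edges.add(frozenset((node, pred[node])))
        node = pred[node]
    return edges

def first_phase(parents, terminals):
    pred = {t: {t: None} for t in terminals}     # per-iterator predecessor maps
    queues = {t: deque([t]) for t in terminals}  # per-iterator BFS queues
    group = {t: t for t in terminals}            # which iterators have met

    def find(t):
        while group[t] != t:
            t = group[t]
        return t

    tree, components = set(), len(terminals)
    while components > 1 and any(queues.values()):
        for t in terminals:                      # round robin over iterators
            if components == 1 or not queues[t]:
                continue
            node = queues[t].popleft()
            for parent in parents.get(node, ()):
                if parent in pred[t]:
                    continue                     # already discovered: don't enqueue
                pred[t][parent] = node
                queues[t].append(parent)
                for other in terminals:          # did we meet another iterator?
                    if (other != t and parent in pred[other]
                            and find(other) != find(t)):
                        group[find(other)] = find(t)
                        components -= 1
                        tree |= path_to_start(pred[t], parent)
                        tree |= path_to_start(pred[other], parent)
    return tree if components == 1 else None

# Assumed taxonomy (child -> parents) for the example.
parents = {
    "Max Planck": ["Physicist"], "Physicist": ["Scientist"],
    "Scientist": ["Person"], "Person": ["Entity"],
    "Arnold Schwarzenegger": ["Politician", "Actor"],
    "Politician": ["Person"], "Actor": ["Person"],
    "Germany": ["State"], "State": ["Organization Unit"],
    "Organization Unit": ["Entity"],
}
tree = first_phase(parents, ["Max Planck", "Arnold Schwarzenegger", "Germany"])
print(sorted(tuple(sorted(e)) for e in tree))
```

On the assumed taxonomy the sketch returns a nine-edge tree that connects the three terminals through Person and Entity and leaves out Actor, which agrees with the Phase 1 output used in the second-phase example.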
The Star Algorithm – Second Phase
• Aims to improve the tree from Phase 1
• Follows an iterative improvement procedure
– Certain paths are replaced in each iteration
– The new paths have lower weights
• Some definitions:
• Terminal node:
– Any node v ∈ V’
• Degree of a node v, deg(v):
– The number of edges connected to the node
• Fixed node:
– Any node v with deg(v) ≥ 3
– Any terminal node
The Star Algorithm – Second Phase
• Loose path:
– A path p in T is a loose path if it has minimal length and its end nodes are fixed nodes.
• Fixed nodes should not be removed during improvement
• It follows that every intermediate node v in a loose path must be a Steiner node with deg(v) = 2
• A loose path is a path that can be replaced during the improvement process
• A minimal Steiner tree with respect to V’ is a tree in which all loose paths represent shortest paths between fixed nodes.
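The definitions above can be turned into a short sketch that finds the fixed nodes and loose paths of a tree. The function name `loose_paths` and the example tree (the Phase 1 output of the running example, as far as it can be read from the slides) are assumptions; the sketch also assumes that node names are strings and that every leaf of the tree is a terminal.

```python
from collections import defaultdict

def loose_paths(tree_edges, terminals):
    """Return (fixed_nodes, loose_paths) for a tree given as a list of edges."""
    adj = defaultdict(set)
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    # Fixed node: any terminal, or any node of degree >= 3.
    fixed = {v for v in adj if v in terminals or len(adj[v]) >= 3}
    paths = []
    for start in fixed:
        for nxt in adj[start]:
            path, prev, cur = [start, nxt], start, nxt
            while cur not in fixed:            # intermediate nodes have degree 2
                (step,) = adj[cur] - {prev}
                prev, cur = cur, step
                path.append(cur)
            if path[0] < path[-1]:             # each path is found from both ends
                paths.append(path)
    return fixed, sorted(paths)

tree_edges = [
    ("Max Planck", "Physicist"), ("Physicist", "Scientist"),
    ("Scientist", "Person"), ("Arnold Schwarzenegger", "Politician"),
    ("Politician", "Person"), ("Person", "Entity"),
    ("Entity", "Organization Unit"), ("Organization Unit", "State"),
    ("State", "Germany"),
]
terminals = {"Max Planck", "Arnold Schwarzenegger", "Germany"}
fixed, paths = loose_paths(tree_edges, terminals)
print(fixed)
for p in paths:
    print(" - ".join(p))
```

For this tree the fixed nodes are the three terminals plus Person (degree 3), and there are three loose paths, the longest one running from Person to Germany.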
The Star Algorithm – Second Phase: Observations
• Removing a loose path (LP) splits T into two subtrees T1 and T2
• Replacing any LP by a shorter path
– Compute the shortest path from any node of T1 to any node of T2
• Removing and inserting LPs: fixed nodes and unfixed nodes
The Star Algorithm – Second Phase: Finding an Approximate Steiner Tree
1. Remove an LP
2. Decompose T into T1 and T2
3. Connect T1 and T2 by a path shorter than the LP
The Star Algorithm – Second Phase: The Tree-Improving Algorithm
• The difficult Steiner tree problem is reduced to a simpler one
– Find shortest paths between node subsets
• In each iteration the LP with maximum weight is removed (heuristic)
The Star Algorithm – Second Phase: The Method replace(lp, T)
• Removes the loose path lp from T
• T is split into subgraphs T1 and T2
• The shortest path connecting any node of T1 to any node of T2 is determined
– replace(lp, T) calls findShortestPath(V(T1), V(T2), lp)
– findShortestPath(V(T1), V(T2), lp) returns the shortest path
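The decomposition step of replace(lp, T) can be illustrated with a small sketch: remove the edges of the loose path and collect the two components that contain its end nodes. The helper name `split_on_loose_path` and the example tree are assumptions for illustration; the search for the reconnecting path is left out here.

```python
from collections import defaultdict

def split_on_loose_path(tree_edges, lp):
    """Remove loose path `lp` (a list of nodes) and return (V(T1), V(T2))."""
    removed = {frozenset(e) for e in zip(lp, lp[1:])}
    adj = defaultdict(set)
    for u, v in tree_edges:
        if frozenset((u, v)) not in removed:
            adj[u].add(v)
            adj[v].add(u)

    def component(root):
        seen, stack = {root}, [root]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # The end nodes of the loose path are fixed nodes and are kept.
    return component(lp[0]), component(lp[-1])

tree_edges = [
    ("Max Planck", "Physicist"), ("Physicist", "Scientist"),
    ("Scientist", "Person"), ("Arnold Schwarzenegger", "Politician"),
    ("Politician", "Person"), ("Person", "Entity"),
    ("Entity", "Organization Unit"), ("Organization Unit", "State"),
    ("State", "Germany"),
]
lp = ["Person", "Entity", "Organization Unit", "State", "Germany"]
t1, t2 = split_on_loose_path(tree_edges, lp)
print(sorted(t1))
print(sorted(t2))
```

Removing the loose path between Person and Germany leaves one component around Person and a second component that contains only Germany.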
Steiner Tree Approximation – Phase 2
• The overall graph G
[Figure: graph G over the nodes Max Planck, Physicist, Scientist, Person, Entity, Politician, Actor, Arnold Schwarzenegger, Organization Unit, State, Germany, Angela Merkel.]
Steiner Tree Approximation – Phase 2
• Output of Phase 1
• deg(Person) = 3, therefore it is a fixed node
• The largest LP occurs between Person and Germany
[Figure: Phase 1 tree over the nodes Max Planck, Physicist, Scientist, Person, Entity, Politician, Arnold Schwarzenegger, Organization Unit, State, Germany.]
Steiner Tree Approximation – Phase 2
• Remove the largest LP
• Fixed nodes are not removed
Steiner Tree Approximation – Phase 2
• The tree is split into subtrees T1 and T2
• V(T1) = {Person, Politician, Scientist, Physicist, Max Planck, Arnold Schwarzenegger}
• V(T2) = {Germany}
• An algorithm finds the shortest path between T1 and T2
• Method call: findShortestPath(V(T1), V(T2), lp)
[Figure: T1 with the nodes Physicist, Max Planck, Scientist, Person, Politician, Arnold Schwarzenegger; T2 with the node Germany.]
Phase 2 – Shortest Path Algorithm
• All pruned vertices are needed
• Runs one single-source shortest-path iterator from V(T1) and one from V(T2)
• A single-source shortest-path iterator finds the shortest paths from a source vertex to all other vertices in the graph
Phase 2 – Shortest Path Algorithm
• Vertex distance d(v) initialization
• Assign two distances (d1, d2) to each vertex
• Assign d1 = 0 to all vertices of V(T1)
• Assign d2 = 0 to all vertices of V(T2)
• Assign d1 = ∞ to all vertices of V(T2)
• Assign d2 = ∞ to all vertices of V(T1)
• Assign d1 = d2 = ∞ to all pruned or non-queried vertices
Phase 2 – Shortest Path Algorithm
• Initial distances (d1, d2):
– Physicist (0, ∞)
– Max Planck (0, ∞)
– Scientist (0, ∞)
– Person (0, ∞)
– Entity (∞, ∞)
– Politician (0, ∞)
– Actor (∞, ∞)
– Arnold Schwarzenegger (0, ∞)
– Organization Unit (∞, ∞)
– State (∞, ∞)
– Germany (∞, 0)
– Angela Merkel (∞, ∞)
• T1 is considered a single node at distance 0 from itself and distance ∞ from T2
• T2 accordingly
• Nodes that belong to neither T1 nor T2 have infinite distances from both T1 and T2
Phase 2 – Shortest Path Algorithm
[Animation: the two iterators expand over G. Q1 and Q2 are the queues of the iterators started from V(T1) and V(T2); queue entries show d1 and d2 respectively.]
• Initial queues: Q1 = Arn(0), Pol(0), Max(0), Phy(0), Sci(0), Per(0); Q2 = Ger(0)
• Current points to the iterator with the minimal fringe, which is the one currently expanded
• Fringe(Q2) < Fringe(Q1): swap(current, other); dequeue Germany from Q2
• d2(State) = 0 + 0.95 = 0.95; enqueue State in Q2
• d2(Angela Merkel) = 0 + 0.96 = 0.96; enqueue Angela Merkel in Q2
• Dequeue Angela Merkel from Q2
• d2(Physicist) = 0.96 + 0.95 = 1.91; enqueue Physicist in Q2
• d2(Politician) = 0.96 + 0.95 = 1.91; enqueue Politician in Q2
• Dequeue Physicist from Q2
• d2(Scientist) = 1.91 + 0.99 = 2.9; enqueue Scientist in Q2
• Stop, since Physicist ∈ V(T1)
• Return the vertices in vector V: V = {Germany, Angela Merkel, Physicist}
• Bookkeeping table at the end of the run (initially Cur = 1, Oth = 2):
Itr  Cur  Oth  V    V’
1    2    1    Ger  Sta
1    2    1    Ger  Ang
2    2    1    Ang  Phy
2    2    1    Ang  Pol
3    2    1    Phy  Sci
• Final queues: Q1 = Arn(0), Pol(0), Max(0), Phy(0), Sci(0), Per(0); Q2 = Per(3.8), Pol(1.91), Sta(0.95)
• Final distances (d1, d2):
– Physicist (0, 1.91)
– Max Planck (0, ∞)
– Scientist (0, 2.9)
– Person (0, ∞)
– Entity (∞, ∞)
– Politician (0, 1.91)
– Actor (∞, ∞)
– Arnold Schwarzenegger (0, ∞)
– Organization Unit (∞, ∞)
– State (∞, 0.95)
– Germany (∞, 0)
– Angela Merkel (∞, 0.96)
Phase 2 – Shortest Path Algorithm
• First iteration result: [Figure: tree after the first improvement step.]
Phase 2 – Shortest Path Algorithm
• Second iteration:
– Remove an LP
– Apply the algorithm again to find the shortest path between T1 and T2
– Stop here, since no loose path can be improved
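The search for a connecting path between two node sets can be sketched with a multi-source Dijkstra run. Note that this is a deliberate simplification and not the two-iterator scheme of the slides: it expands from one side only and stops at the first settled node of the other side. The function name `find_shortest_path`, the `bound` parameter (standing in for the loose-path weight used as an upper bound), and the reduced edge set are assumptions; the edge weights are the ones that appear in the walkthrough.

```python
import heapq

def find_shortest_path(graph, sources, targets, bound=float("inf")):
    """Cheapest path from any node in `sources` to any node in `targets`.
    Paths of cost >= `bound` are pruned. Returns (cost, path) or None."""
    dist = {s: 0.0 for s in sources}
    pred = {s: None for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue                      # stale queue entry
        settled.add(u)
        if u in targets:                  # first settled target is the closest
            path = [u]
            while pred[path[-1]] is not None:
                path.append(pred[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < bound and nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return None

# Only the edges whose weights are visible in the walkthrough; the rest of G is omitted.
weighted_edges = [
    ("Germany", "State", 0.95), ("Germany", "Angela Merkel", 0.96),
    ("Angela Merkel", "Physicist", 0.95), ("Angela Merkel", "Politician", 0.95),
    ("Physicist", "Scientist", 0.99),
]
graph = {}
for u, v, w in weighted_edges:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

v_t1 = {"Person", "Politician", "Scientist", "Physicist", "Max Planck",
        "Arnold Schwarzenegger"}
v_t2 = {"Germany"}
cost, path = find_shortest_path(graph, v_t2, v_t1)
print(cost, path)
```

On this reduced graph the sketch returns the path Germany, Angela Merkel, Physicist with a cost of about 1.91, the same vertex vector as in the walkthrough. Physicist and Politician tie at 1.91; the heap breaks the tie by node name.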
Approximation Guarantee: Lemmas and Theorems
• Lemma 1
– A tree T with terminal set V’, |V’| ≥ 2, has at least |V’| − 1 and at most 2|V’| − 3 loose paths.
• The approximation ratio for the cost of the tree returned by STAR is independent of the first-phase result.
Approximation Guarantee: Lemmas and Theorems
• Lemma 2
– Let T_A be the Steiner tree yielded by the STAR algorithm. Let L(T_A) be the set of loose paths in T_A. For any circular ordering u_1, …, u_N of the terminals in T_A there is a mapping μ: L(T_A) → V’ × V’ such that:
1. μ is defined for all loose paths in T_A;
2. For each loose path P with end points u and v, let T_1 and T_2 be the two trees obtained by removing from T_A all nodes in P (and their edges), except u and v; then μ(P) = {u_i, u_{i+1}} for some i = 1, …, N, and one of the nodes u_i, u_{i+1} belongs to T_1 while the other one belongs to T_2;
3. For each pair of terminals {u_i, u_{i+1}} there are at most 2⌈log N⌉ + 2 loose paths mapped to {u_i, u_{i+1}}.
Approximation Guarantee: Lemmas and Theorems
• Theorem 1 (approximation order)
– The STAR algorithm is a (4⌈log N⌉ + 4)-approximation algorithm for the Steiner tree problem.
– Therefore, the approximation ratio of STAR is O(log N).
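As a quick numeric illustration of Theorem 1, the sketch below evaluates the bound 4⌈log N⌉ + 4 for a few terminal counts. The slides do not state the base of the logarithm; base 2 is assumed here, and `star_ratio_bound` is a hypothetical helper name.

```python
def star_ratio_bound(n_terminals):
    """Worst-case bound 4 * ceil(log2(N)) + 4 (log base 2 is an assumption)."""
    # For N >= 1, (N - 1).bit_length() equals ceil(log2(N)) without float rounding.
    ceil_log2 = (n_terminals - 1).bit_length()
    return 4 * ceil_log2 + 4

bounds = {n: star_ratio_bound(n) for n in (3, 5, 7)}
print(bounds)
```

These are worst-case guarantees only; they say nothing about the tree weights observed in practice.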
Approximation Guarantee: Lemmas and Theorems
• Improvement-guarantee rule
– STAR might have exponential running time.
– The cost reduction in each iteration can be infinitesimally small.
– An improvement-guarantee rule solves this:
– Replace loose path P if and only if:
– Where P’ is the path to be replaced by STAR, given that ε > 0
Approximation Guarantee: Lemmas and Theorems
• Lemma 3 (Time complexity)
– Given ε > 0, the STAR algorithm with the improvement-guarantee rule is guaranteed to terminate in
– steps
– Where m is the number of edges
– is the ratio of the maximum and minimum cost of the edges in the input graph.
Approximation Guarantee: Lemmas and Theorems
• Theorem 2
– Given ε > 0, the STAR algorithm with the improvement-guarantee rule is a - approximation algorithm for the Steiner tree problem. Its running time is
– Where n, m, N denote the number of vertices, edges, and terminals of the input graph.
Approximate Top-K Interconnections
• Observation: the loose-path weight is an upper bound for the weights of new interconnecting paths
• After the improvements, the final tree T has no loose path left to improve
• Top-k interconnections are computed starting from the final tree T returned by the original STAR
Approximate Top-K Interconnections
• Lines 1-3 compute the original tree T
• T is enqueued in priority queue Q
• New trees are generated by artificially relaxing the loose paths of the current tree (Lines 4-9)
Approximate Top-K Interconnections
• relax(T, ε)
– A tunable value ε > 0 is used to create artificial loose-path weights
– The new weights are used as upper bounds
– Artificial upper bounds for new interconnecting paths between subtrees
Approximate Top-K Interconnections
• improveTree’(T’, V’)
– replace(lp, T) calls findShortestPath(V(T1), V(T2), lp)
– findShortestPath(V(T1), V(T2), lp) uses the higher artificial weights
– New interconnecting paths are not the same as the loose paths but are still the shortest between T1 and T2
– Only new interconnecting paths that are node-disjoint from the loose path are considered
– This gives the result diversity required for top-k
[Figure label: Original algorithm]
Approximate Top-K Interconnections
• reweight(T’)
– Re-weights the result of improveTree(T, V’) by:
– Acting on the loose paths of T’ (also loose paths of T)
– Setting W(T’) back to its initial value before relaxation
Evaluation
• STAR is compared to the best-known Steiner tree approximation algorithms:
– DNH, DPBF, BLINKS, BANKS (both versions)
• Compared in terms of quality (avg. weight) and performance (avg. runtime)
• Semantic quality or user-perceived relevance is not considered
• Earlier work by the authors showed that:
– A Steiner-tree-based scoring function contributes to high relevance from a user's viewpoint
Evaluation – Algorithms in Comparison
• DNH (Distance Network Heuristic)
– 2-approximation algorithm
• DPBF
– Dynamic programming approach
– The optimal tree can be computed (not an approximation)
– Best for a small number of terminals (queries)
• BLINKS
– Newest
– Experimentally the best in the field
• BANKS I & II
– Keyword proximity search on relational data
Evaluation – Types of Comparisons Performed
• Top-1 comparison of STAR, DNH, DPBF, BANKS I & II
• Top-k comparison of STAR, BANKS I & II, BLINKS
• External storage comparison of STAR and BANKS
Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)
• Worst-case theoretical properties of the algorithms:
– DNH, approximation ratio: 2(1 − 1/n), n = |V’|
– STAR, approximation ratio: 4⌈log n⌉ + 4
– BANKS I & II, approximation ratio: O(n)
– DPBF, approximation ratio: none (it computes the optimal Steiner tree)
• DPBF is used for comparing all others to the optimal tree weights
• Goal: a good approximation ratio on the given G, V’
Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)
• Datasets
– View DBLP and IMDB as graphs
• Node entities: (author, publication, conference, actor, movie, year, etc.)
• Edge relations: (cited by, author of, acted in, etc.)
– Dataset DBLP: subgraph of 15,000 nodes and 150,000 edges
• Due to DNH and DPBF constraints (they run in main memory only)
– Dataset IMDB: subgraph of 30,000 nodes and 80,000 edges
– Two different datasets are needed to cover different topologies
– No edge weights are present in either dataset; weights are randomly assigned
– No taxonomic information is present in either dataset (not a problem for STAR; handled in the first phase)
Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)
• Queries
– Query sets of 3, 5, and 7 terminals
– Each set contains 60 queries
– All queries in a set have the same number of terminals
– Terminals are chosen randomly
Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)
• Metrics
– Reference: optimal scores returned by DPBF
– Compare the tree weight of STAR to the weights of all others
– Compare the running times of all algorithms
Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)
• Results
– Observe DPBF performance for all numbers of terminals
• Weight
• Runtime
– Observe DPBF performance for 7 terminals
Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)
• DBLP results
– For all numbers of terminals
• STAR's weight is better than all the others
• STAR's runtime outperforms all the others
• Even though DNH has a better approximation ratio
Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)
• IMDB results
– STAR's weight is slightly worse than DNH's
– One hypothesis: DBLP has a higher edge-to-node ratio
– BANKS II performance improved relative to its competitors
– Still outperformed by STAR
Top-k Comparisons (STAR, BANKS I & II, BLINKS)
• DNH cannot compute top-k results
• BLINKS
– Uses indexing for query-time speedup
– Requires the entire graph in main memory
– The same datasets are used again
– Uses a partitioning strategy (block sizes in nodes)
– Initially tuned for better results
• DBLP: block size of 100 nodes
• IMDB: block size of 5 nodes
Top-k Comparisons (STAR, BANKS I & II, BLINKS)
• Metrics
– BLINKS avg. weight is not applicable
• It returns only the root nodes of the result trees
• Queries
– Comparison for k = 10, k = 50, k = 100
– DBLP & IMDB:
• 5 terminals
• Random queries
• 60 queries
Top-k Comparisons (STAR, BANKS I & II, BLINKS)
• Results
– Index construction time of BLINKS is excluded
– BLINKS nevertheless has the worst runtime
– BANKS II and BLINKS runtimes are worse on the denser DBLP graph
Top-k Comparisons (STAR, BANKS I & II, BLINKS)
• STAR performance explanation:
– Uses only 2 iterators per improvement step
– Does not visit nodes with distance d > W(lp)
– Tighter upper bounds for pruning
External Storage Comparison of STAR and BANKS
• STAR and BANKS are directly applicable to graphs that do not fit into main memory
• Simulation of such a scenario:
– Disk-resident datasets
• Dataset:
– YAGO knowledge base
• (Nodes: 1.7 million, edges: 14 million)
• Edge weights supported
• Type and subclass taxonomy (STAR first phase) supported
– Graph stored in a relational database with schema: EDGE(source, target, weight)
– Edge exploration: one database access for each edge
– The overhead is treated uniformly for both STAR and BANKS by edge loading
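The disk-resident setup can be imitated in a few lines with SQLite. This is only a sketch of the idea, assuming the EDGE(source, target, weight) schema named above: the in-memory database, the index names, the sample rows, and the `neighbors` helper are all illustrative assumptions and not the setup used in the evaluation. Each call loads the edges of one node from the database and counts them, mirroring the "number of edges accessed" metric.

```python
import sqlite3

# In-memory stand-in for a disk-resident relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EDGE (source TEXT, target TEXT, weight REAL)")
conn.executemany("INSERT INTO EDGE VALUES (?, ?, ?)", [
    ("Germany", "State", 0.95),
    ("Germany", "Angela Merkel", 0.96),
    ("Angela Merkel", "Physicist", 0.95),
])
conn.execute("CREATE INDEX edge_source ON EDGE (source)")
conn.execute("CREATE INDEX edge_target ON EDGE (target)")

stats = {"edges_accessed": 0}

def neighbors(node):
    """Load all edges incident to `node` (treating the graph as undirected)."""
    rows = conn.execute(
        "SELECT target, weight FROM EDGE WHERE source = ? "
        "UNION ALL "
        "SELECT source, weight FROM EDGE WHERE target = ?",
        (node, node),
    ).fetchall()
    stats["edges_accessed"] += len(rows)
    return sorted(rows)

germany_edges = neighbors("Germany")
print(germany_edges, stats)
```

A graph iterator built on top of `neighbors` would pay one database round trip per expanded node, which is why the number of edges accessed is a meaningful cost measure in this setting.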
External Storage Comparison of STAR and BANKS
• Queries:
– 2 sets, with 3 and 6 terminals
– Top-1, top-3, top-6 results
– Terminal nodes randomly chosen
– 30 queries made
• Metrics:
– Average weight (quality of output trees)
– Efficiency (running times)
– Number of edges accessed
External Storage Comparison of STAR and BANKS
• Results:
– BANKS I & II sometimes take 30 min to return results
• Excluded from the evaluation
– STAR outperforms BANKS:
• An order of magnitude faster
– STAR accesses an order of magnitude fewer edges
• Gain from the taxonomic structure (first phase)
Results Summary
• Fairness by giving all algorithms the same inputs
• Diversity of algorithms
– DNH only handles graphs in main memory
– BLINKS: indexing, different metric, lack of approximation guarantee
– Query methods that are not Steiner-tree-like
• STAR's outstanding performance comes from:
– 1) Using the graph's taxonomic structure when possible
– 2) Only two iterators per improvement step, independent of the number of terminals
– 3) Tight upper bounds and path pruning
Conclusion
• The E-R-style data graph query problem is addressed
• The inherent taxonomic structure is exploited
• STAR does not depend only on taxonomic information
– Second phase: fast findShortestPath algorithm
• DNH contradiction:
– Better approximation ratio, yet results similar to STAR's
• STAR achieves a good approximation, O(log n), of the optimal Steiner tree
Thank You For Your Attention