Upload
ty-chumley
View
213
Download
0
Embed Size (px)
Citation preview
VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional
Indexing Schemes
Presenter: Quang Hieu Vu
H.V.Jagadish, Beng Chin Ooi, Quang Hieu Vu, Rong Zhang, Aoying Zhou
VBI (Virtual Binary Index) -Tree framework
BATON: BAlanced Tree Overlay Network (VLDB’05)
Definition: A tree is balanced if and only if at any node in the tree the height of its two subtrees differ by at most one.
Binary Balanced Tree Index Architecture
m
Properties
Property 1: A tree is a balanced tree if every node in the tree that has a child also has both its left and right routing tables full
A routing table is full if none of its valid links is NULL
Property 2: If a node x contains a link to another node y in its left or right routing tables, the parent node of x must also contain a link to the parent node of y unless the same node is parent of both x and y
VBI-Tree structure
VBI-Tree Structure
VBI-Tree structure
Differences with BATON:
Two kinds of nodes: routing nodes (internal nodes) and data nodes (leaf nodes).
Each peer node is in charge of a pair of node: routing node and data node
Each node has an additional upside path which keeps information of regions covered by the node’s ancestors.
Theorem: In an in-order traversal of the VBI-Tree, data nodes and routing nodes alternate.
Node join
Two phases:
First phase: determine where the new node should joinSimilar to BATON’s join algorithm except that only
routing nodes are considered
Second phase: the node accepting the new node splits its correspondence data node into two parts
It keeps one part, the new node keeps one part
New node u joins the network
a
ih kj ml on
fd ge
b c
p q r s
u
h’ d’ i’ b’ k’ a’ m’ c’ n’ g’ o’l’
p’ j’ q’ e’ r’ f’ s’ t’u’ n’
u
Example
Node departure
Similar to BATON except that only routing nodes are considered
Only leaf routing nodes whose routing neighbor nodes don’t have full routing children nodes can leave the network
Upon leaving, children data nodes of the departed routing node are merged together.
The new data node is pulled up to replace the position of the departed routing node.
Others have to find a replacement routing node which is a routing node in the first case
r’ f’
Leaf routing node r leaves the network
a
ih kj ml on
fd ge
b c
p q r sh’ d’ i’ b’ k’ a’ m’ c’ n’ g’ o’l’
p’ j’ q’ e’ s’ t’
f’
Example
Index construction
Each internal node manages a region covering all regions managed by its children
Data is stored only at leaf nodes
Discrete data: help to avoid updating upside paths frequently
Two dimensional index construction
Range query search
If the node region covers or intersects with the searched region
The query is processed at the node and/or its childrenIf there is any ancestor of a node whose region intersects with the searched region
If the node in other side hasn’t been searched beforeThe query is forwarded to that node
ElseThe query is forwarded upward to the ancestor
Note: ancestors whose children cover the whole searched region don’t need to be searched
Example
Node h wants to search the shaded region
ed gf
b c
a
j ke’ a’ g’
j’ f’ k’ c’
l’h i
h’ d’ i’ b’
Load balancing
Similar to rotations of AVL tree
LL Rotation
LR Rotation
Load balancing
RR Rotation
RL Rotation
Experimental study
Experimental setup
Implement M-tree over VBI-Tree framework.
100000 data objects are inserted into a network of 10000 nodes.
1000 exact queries, 1000 range queries, and 1000 kNN queries are executed.
CAN is used for comparison.
Performance of point queries
Average and maximum hops in different dimensions
Average hops in different network sizes
Performance of range and kNN queries
Average hops to find range query results
Average hops to find kNN query results
Cost of updating upside path vs cost of search
Average number of messages for updating upside paths
Average number of hops for searching queries
Workload distribution
Workload distribution among nodes
Effect of load balancing
Average additional number of messages required in case of skewed data distribution
Size of load balancing process
Access load
Access load for nodes at different levels
Conclusion
VBI-Tree
A framework capable of supporting a variety of well-tested multi-dimensional indexing methods such as the R-Tree, X-Tree, SSTree, and M-Tree in a P2P system.
Introduction of discrete data as a mean to minimize update costs and novel P2P search algorithms that account properly for such discrete data.
An AVL-tree like rotation scheme for rebalancing the virtual binary tree when needed, leading to effective load balance even with highly skewed data.
Thank you !
Questions & Answers