Covering Indexes for Branching Path Queries
Raghav Kaushik, Philip Bohannon, Jeffrey F Naughton and Henry F Korth
Abdullah Mueen
XML as Graph Data
[Figure: example data graph; callouts mark a node's oid and its label, e.g. label(3)]
• Non-tree edges model IDREF relationships in the document.
• Leaf nodes with attributes are suppressed.
Branching Path Expression
ROOT/metro/neighborhoods/neighborhood[/business=>cinema-hall]/cultural=>museum
Example (1)
//hotel[/star][<=business\neighborhood[/cultural=>museum[\art]]]
Covering Index
• A covering index can answer any query from a given set of queries without consulting the original document.
• The goal of this paper is to find a covering index for branching path queries.
k-bisimilarity
[Figure: data graph G with root R (oid 0) and nodes 1 to 9 labeled A, B, C, or D]
Two nodes u and v are called k-bisimilar (u ≈k v) if
1. label(u) = label(v)
2. every incoming label path of length ≤ k to u matches at least one incoming path of length ≤ k to v, and vice versa.
Examples from the graph:
• 2 and 4 are 0-bisimilar.
• 5 and 7 are 1-bisimilar.
• 8 and 9 are 2-bisimilar.
• 6 and 8 are 1-bisimilar.
≈k is an equivalence relation on the set of nodes in G, so it partitions the nodes into equivalence classes.
The algorithm for computing k-bisimulation will be shown later
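As a rough illustration of the definition, the classes of ≈k can be computed by k rounds of partition refinement. This is a minimal sketch, not the paper's algorithm: it assumes the usual recursive formulation (nodes stay together in round i+1 only if they were together in round i and their parents fall into the same round-i blocks). The edge set is hypothetical, chosen only so that the result agrees with the examples listed above, and the names `k_bisim_partition` and `k_bisimilar` are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical data graph: node id -> label, and node id -> list of parent ids.
# The edge set is an assumption; the figure's edges are not recoverable from the text.
LABELS: Dict[int, str] = {0: "R", 1: "A", 2: "B", 3: "C", 4: "B",
                          5: "C", 6: "D", 7: "C", 8: "D", 9: "D"}
PARENTS: Dict[int, List[int]] = {0: [], 1: [0], 2: [0], 3: [0], 4: [1],
                                 5: [2], 6: [3], 7: [4], 8: [5], 9: [7]}


def k_bisim_partition(labels: Dict[int, str],
                      parents: Dict[int, List[int]],
                      k: int) -> List[Set[int]]:
    """Partition the nodes into k-bisimilarity classes by k refinement rounds."""
    # Round 0: nodes are grouped by label only.
    block: Dict[int, object] = dict(labels)
    for _ in range(k):
        # New signature: the node's old block plus the set of its parents' old blocks.
        sig = {u: (block[u], frozenset(block[p] for p in parents[u]))
               for u in labels}
        # Renumber signatures to small integers so they do not grow with k.
        ids: Dict[object, int] = {}
        block = {u: ids.setdefault(sig[u], len(ids)) for u in labels}
    groups: Dict[object, Set[int]] = {}
    for u, b in block.items():
        groups.setdefault(b, set()).add(u)
    return sorted(groups.values(), key=min)


def k_bisimilar(u: int, v: int, k: int) -> bool:
    """True if u and v fall into the same class of the k-bisimilarity partition."""
    return any(u in g and v in g
               for g in k_bisim_partition(LABELS, PARENTS, k))
```

On this assumed graph, `k_bisimilar(5, 7, 1)` holds while `k_bisimilar(5, 7, 2)` does not, because the parents of 5 and 7 are separated after the first round.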
1-index : Covering Index for Simple Path Expression
[Figure: data graph G and its index graphs A(0), A(1), A(2), and A(3) = 1-index; the refinement steps are labeled SuccStable]
Extents of the index nodes:
• A(0): {1}, {2,4}, {3,5,7}, {6,8,9}
• A(1): {1}, {2}, {4}, {5,7}, {3}, {6,8,9}
• A(2): {1}, {2}, {4}, {5}, {3}, {6}, {7}, {8,9}
• A(3) = 1-index: {1}, {2}, {4}, {5}, {3}, {6}, {7}, {8}, {9}
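To make the idea of "covering" concrete, the sketch below builds an index graph from a bisimulation partition and evaluates a simple label path on the index alone, returning the union of the matching extents. It is an illustration under assumptions: the edge set is hypothetical (picked to be consistent with the partitions listed on this slide), the partition is obtained by refining on incoming edges until nothing changes, and the names `bisim_blocks`, `build_index`, and `eval_simple_path` are not from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data graph (assumed edge set; the figure is not recoverable).
LABELS: Dict[int, str] = {0: "R", 1: "A", 2: "B", 3: "C", 4: "B",
                          5: "C", 6: "D", 7: "C", 8: "D", 9: "D"}
EDGES: List[Tuple[int, int]] = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5),
                                (3, 6), (4, 7), (5, 8), (7, 9)]


def bisim_blocks(labels, edges) -> Dict[int, int]:
    """Refine the label partition on incoming edges until it stops changing."""
    parents = {u: [] for u in labels}
    for src, dst in edges:
        parents[dst].append(src)
    block = dict(labels)
    count = len(set(block.values()))
    while True:
        sig = {u: (block[u], frozenset(block[p] for p in parents[u]))
               for u in labels}
        ids: Dict[object, int] = {}
        block = {u: ids.setdefault(sig[u], len(ids)) for u in labels}
        if len(ids) == count:  # refinement only splits, so equal count means stable
            return block
        count = len(ids)


def build_index(labels, edges):
    """Return (extent per block, label per block, edges between blocks)."""
    block = bisim_blocks(labels, edges)
    extent: Dict[int, Set[int]] = {}
    for u, b in block.items():
        extent.setdefault(b, set()).add(u)
    index_label = {b: labels[min(nodes)] for b, nodes in extent.items()}
    index_edges = {(block[s], block[d]) for s, d in edges}
    return extent, index_label, index_edges


def eval_simple_path(path: List[str], extent, index_label, index_edges) -> Set[int]:
    """Match a label path such as ['R', 'B', 'C'] on the index graph only."""
    current = {b for b, lab in index_label.items() if lab == path[0]}
    for lab in path[1:]:
        current = {d for s, d in index_edges
                   if s in current and index_label[d] == lab}
    result: Set[int] = set()
    for b in current:
        result |= extent[b]
    return result
```

On this assumed graph, every block of the stable partition is a single node, so the index is as large as the data; graphs with more repeated structure yield smaller indexes.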
Inverse edges
[Figure: two versions of the data graph G, one captioned "5,7 are not 1-bisimilar" and the other "5,7 are 1-bisimilar"]
The F&B index
• Repeat until the partition no longer changes:
– Reverse all edges.
– Compute the forward bisimilarity partition.
– Reverse all edges again.
– Compute the backward bisimilarity partition.
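The loop above can be sketched as alternating refinement on outgoing and incoming edges until the number of blocks stops growing. This is a minimal sketch under assumptions, not the paper's implementation: instead of physically reversing edges it refines on child lists for the forward step and on parent lists for the backward step, the small graph is hypothetical, and the function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical graph: three A nodes under R, two of which have a B child.
LABELS: Dict[int, str] = {0: "R", 1: "A", 2: "A", 3: "B", 4: "A", 5: "B"}
EDGES: List[Tuple[int, int]] = [(0, 1), (0, 2), (0, 4), (1, 3), (4, 5)]


def refine(block, neighbors):
    """One refinement step: split blocks by the set of neighbor blocks."""
    sig = {u: (block[u], frozenset(block[n] for n in neighbors[u]))
           for u in block}
    ids: Dict[object, int] = {}
    return {u: ids.setdefault(sig[u], len(ids)) for u in block}


def stabilize(block, neighbors):
    """Refine repeatedly until the number of blocks stops growing."""
    while True:
        new = refine(block, neighbors)
        if len(set(new.values())) == len(set(block.values())):
            return new
        block = new


def adjacency(labels, edges):
    parents = {u: [] for u in labels}
    children = {u: [] for u in labels}
    for src, dst in edges:
        parents[dst].append(src)
        children[src].append(dst)
    return parents, children


def groups_of(block) -> List[Set[int]]:
    groups: Dict[object, Set[int]] = {}
    for u, b in block.items():
        groups.setdefault(b, set()).add(u)
    return sorted(groups.values(), key=min)


def backward_partition(labels, edges) -> List[Set[int]]:
    """Bisimulation on incoming edges only (the 1-index partition)."""
    parents, _ = adjacency(labels, edges)
    return groups_of(stabilize(dict(labels), parents))


def fb_partition(labels, edges) -> List[Set[int]]:
    """Alternate forward and backward stabilization until nothing changes."""
    parents, children = adjacency(labels, edges)
    block = dict(labels)
    while True:
        before = len(set(block.values()))
        block = stabilize(block, children)  # forward step (outgoing edges)
        block = stabilize(block, parents)   # backward step (incoming edges)
        if len(set(block.values())) == before:
            return groups_of(block)
```

On this assumed graph the backward-only partition keeps nodes 1, 2, and 4 together, while the alternating version separates node 2, which has no B child. That distinction is what a branching condition such as "an A with a B child" needs.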
Forward Bisimulation
[Figure: four copies of the data graph G illustrating forward bisimulation]
Backward Bisimulation
[Figure: three copies of the data graph G illustrating backward bisimulation]
Properties of F&B index
• The F&B index over a data graph G covers all branching path expressions.
• The F&B index is the smallest index that covers all branching path queries.
• In practice, the F&B index is large for most real documents.
1. Tags to be indexed
• Some tags are not used in queries, e.g. bold and emph.
• We specify the set of tags to be indexed.
• In a 100MB document, the F&B index on all tags has 436,000 nodes, while ignoring formatting tags it has 18,000 nodes.
2. IDREF edges to be indexed
• IDREF edges are not counted in the // operation.
• IDREF edges are explicitly described in the path expression by the => operator.
• We specify the set of IDREF edges to be indexed.
• For the 100MB document, the F&B index has 1.3 million nodes with all IDREF edges, while it has 18,000 nodes without any IDREF edges or formatting tags.
3. Exploiting Local Similarity
• Long queries are neither frequent nor interesting.
• If we restrict the length of the possible queries, we can get a much smaller index than the F&B index.
• We specify the length of the local path by using k-bisimilarity instead of bisimilarity while computing the F&B index.
4. Restricting Tree Depth
• Long nested conditions are less likely to occur.
• We specify the maximum depth of the conditional path expression by tree-depth (defined next).
tree depth
//museums/history/museum[/featured and <=cultural\neighborhood[/cultural=>museum[\art]]]
Definition of an Index
• A set of tags T
• Sets of IDREF edges for both directions, reffwd and refbwd
• Two parameters kbwd and kfwd to restrict the length of the path queries
• One parameter td to restrict the depth of the nested conditional expression.
The BPCI index
• Remove all nodes whose tags are not in T, provided the removal does not cut off a node whose tag is in T.
• Start with the label grouping as the current partition P.
• Set i = 0; while i ≤ td:
– Reverse all edges in G, retaining only the IDREF edges in reffwd.
– P ← forward kfwd-bisimilar partition of P; increment i.
– Reverse all edges in G again, retaining only the IDREF edges in refbwd.
– P ← backward kbwd-bisimilar partition of P; increment i.
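The refinement loop can be sketched as follows. This is a simplified illustration, not the paper's implementation: it omits the removal of tags outside T and the selection of IDREF edges, it follows the loop literally (the bound on i is checked once per iteration, and i is incremented after each of the two steps), and it replaces edge reversal by refining on child lists (forward) and parent lists (backward). The graph and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical graph: three A nodes under R, two of which have a B child.
LABELS: Dict[int, str] = {0: "R", 1: "A", 2: "A", 3: "B", 4: "A", 5: "B"}
EDGES: List[Tuple[int, int]] = [(0, 1), (0, 2), (0, 4), (1, 3), (4, 5)]


def refine_k(block, neighbors, k: int):
    """Apply k refinement rounds: split blocks by the set of neighbor blocks."""
    for _ in range(k):
        sig = {u: (block[u], frozenset(block[n] for n in neighbors[u]))
               for u in block}
        ids: Dict[object, int] = {}
        block = {u: ids.setdefault(sig[u], len(ids)) for u in block}
    return block


def bpci_partition(labels, edges, k_fwd: int, k_bwd: int, td: int) -> List[Set[int]]:
    """Alternate k-limited forward and backward refinement, bounded by td."""
    parents = {u: [] for u in labels}
    children = {u: [] for u in labels}
    for src, dst in edges:
        parents[dst].append(src)
        children[src].append(dst)
    block = dict(labels)  # start from the label grouping
    i = 0
    while i <= td:
        block = refine_k(block, children, k_fwd)  # forward k_fwd-bisimilarity
        i += 1
        block = refine_k(block, parents, k_bwd)   # backward k_bwd-bisimilarity
        i += 1
    groups: Dict[object, Set[int]] = {}
    for u, b in block.items():
        groups.setdefault(b, set()).add(u)
    return sorted(groups.values(), key=min)
```

With `k_fwd = 0` the forward step does nothing, so on this assumed graph the three A nodes stay in one block. With `k_fwd = 1` the A node without a B child is split off.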
Variations of BPCI
Testing if an Index covers a Query
• Build the query graph.
• Check that all tags in the query are in T and all IDREF edges in the query are in (refbwd ∪ reffwd).
• Check that the tree depth of the query is less than td of the index.
• Check that all paths in the query with even tree depth have length < kbwd.
• Check that all paths in the query with odd tree depth have length < kfwd.
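The checks above can be sketched as a single predicate over an index definition and a query. The sketch assumes the query graph has already been built and decomposed into paths, each annotated with its tree depth, its length, its tags, and its IDREF edges. That representation, the class names `IndexDef` and `QueryPath`, and the encoding of an IDREF edge as a (source tag, target tag) pair are all hypothetical; the strict comparisons follow the bullets as written.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

IdrefEdge = Tuple[str, str]  # assumed encoding: (source tag, target tag)


@dataclass(frozen=True)
class IndexDef:
    tags: FrozenSet[str]           # T
    ref_fwd: FrozenSet[IdrefEdge]  # reffwd
    ref_bwd: FrozenSet[IdrefEdge]  # refbwd
    k_fwd: int                     # kfwd
    k_bwd: int                     # kbwd
    td: int                        # tree-depth bound


@dataclass(frozen=True)
class QueryPath:
    tree_depth: int
    length: int
    tags: FrozenSet[str]
    idrefs: FrozenSet[IdrefEdge] = frozenset()


def covers(index: IndexDef, query: List[QueryPath]) -> bool:
    """Apply the four checks to every path of an already decomposed query."""
    allowed_refs = index.ref_fwd | index.ref_bwd
    for path in query:
        if not path.tags <= index.tags:
            return False  # a tag outside T
        if not path.idrefs <= allowed_refs:
            return False  # an IDREF edge that is not indexed
        if not path.tree_depth < index.td:
            return False  # nesting too deep
        # Even tree depth is bounded by kbwd, odd tree depth by kfwd.
        bound = index.k_bwd if path.tree_depth % 2 == 0 else index.k_fwd
        if not path.length < bound:
            return False  # path too long for this index
    return True
```

A query is rejected as soon as one of its paths fails any check, so an index that covers a query must satisfy all four conditions for every path.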
Results on the XMark benchmark
1. Iall is the F&B index
2. Ialmost-all is the F&B index with kfwd = 1
3. Ispecific is built on the query
Result
Conclusion
• BPCI is a covering index for branching path queries.
• By setting appropriate parameters, we can get a wide range of indexes suited to the queries of various applications.
• Extensions
– Updating and bulk loading
– Integration with value indexes