Rewriting Nested XML Queries Using Nested Views
Nicola Onose
joint work with
Alin Deutsch, Yannis Papakonstantinou, Emiran Curtmola
University of California, San Diego
The problem
• views defined by queries V1, …, Vn and materialized as docV1, …, docVn
Can we answer Q using only view access paths?
[Diagram: the query Q maps the input XML data to the query result; the views V1, …, Vn map the same input to docV1, …, docVn]
INTRO
• is there a query R such that R(V1(Input), …, Vn(Input)) = Q(Input)?
[Diagram: the rewriting query R computes the query result from docV1, …, docVn instead of from the input XML data]
Motivation: caching & indexes
• caching: answer new queries using the results of previously answered ones
• (partial) indexes: materialized references to frequently accessed parts of the data
Materialized views are faster to access than the original input.
Motivation: security views
• checking the existence of R addresses a security problem: allow only queries that can be expressed in terms of certain permitted queries, the security views
[Diagram: V1, …, Vn are the security views (permitted queries) over the input XML data; the rewriting query R accesses only docV1, …, docVn]
Motivation: data integration
• data integration: given a query expressed in global terms, rewrite it using the descriptions of the particular sources
[Diagram: the query Q is posed against a virtual global DB; the rewriting query R runs over source1, …, sourcen; local/global mappings are expressed as views]
Rewritings enabled by pattern matching
• Previous literature: find parts of the query that are precomputed by the views.
• How to decide that: match the patterns of the views into the query
– In the relational case, patterns were: tableaux, conjunctive queries
– For XPath: tree patterns
• Matching XML queries?
– (until recently) no pattern-based description of XQuery semantics
– Nested XML Tableaux (NEXT) come to fill the gap
The NEXT Logical Framework for XQuery, A. Deutsch et al., VLDB’04
Scope of Our Approach
• Nested XML Tableaux (NEXT) extend previous work on tree patterns.
• NEXT+ extends NEXT to the whole XQuery.
Tree Patterns cover XPath
NEXT extends Tree Patterns with:
- nested for-loops
- joins
- element construction etc.
NEXT+ extends NEXT to the whole XQuery language, including:
- function calls
- universal quantification
- disjunction, negation etc.
soundness guarantee: if a rewriting is found, it is equivalent to the original query
completeness guarantee: if a rewriting exists, we will find one
Rewriting using views example
Query Q: group titles by author
for each distinct author, output the titles of his/her books
View V: group authors by title
for each book, output its title and the list of authors
Rewriting R
scan the view and create an entry for each distinct author in the view output; add to it all the titles of the respective author
[Diagram: bib.xml (data on the Web) contains book elements with title and author children]
The result of the view is cached and has faster access time than getting the data directly from the source
Rewriting using views example
View V: group authors by title
  for $b1 in $doc//book, $t1 in $b1/title
  return <authorlist> {$t1, $b1/author} </authorlist>
Query Q: group titles by author
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> { $a,
      for $b in $doc//book, $t in $b/title
      where some $a1 in $b/author satisfies $a1 eq $a
      return $t }
    </bibentry>
Previous work captures:
- XPath navigation
NEXT captures:
- XPath navigation
- nested for loops
- joins
- element construction etc.
Rewriting using views example
Rewriting R ($docV is bound to the root of the view output; the paths navigate inside the view output):
  for $a3 in distinct-values($docV/authorlist[title]/author)
  return <bibentry> { $a3,
      for $p in $docV/authorlist, $t3 in $p/title
      where some $a4 in $p/author satisfies $a4 eq $a3
      return $t3 }
    </bibentry>
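The example can be replayed on a toy in-memory stand-in for bib.xml. This is only a sketch under stated assumptions: the data is made up, every book has exactly one title, and the functions `query_q`, `view_v` and `rewriting_r` are hypothetical Python analogues of Q, V and R, not the XQuery semantics itself.

```python
# Made-up stand-in for bib.xml: each book has one title and a list of authors.
BOOKS = [
    {"title": "T1", "authors": ["Ann", "Bob"]},
    {"title": "T2", "authors": ["Bob"]},
    {"title": "T3", "authors": ["Ann", "Cy"]},
]

def distinct(values):
    """Keep the first occurrence of each value (analogue of distinct-values)."""
    seen, out = set(), []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

def query_q(books):
    """Q: for each distinct author, the titles of his/her books."""
    authors = distinct(a for b in books for a in b["authors"])
    return [(a, [b["title"] for b in books if a in b["authors"]])
            for a in authors]

def view_v(books):
    """V: one authorlist entry per book, holding its title and its authors."""
    return [{"title": b["title"], "author": list(b["authors"])} for b in books]

def rewriting_r(doc_v):
    """R: the same grouping as Q, but reading only the view output."""
    authors = distinct(a for p in doc_v for a in p["author"])
    return [(a, [p["title"] for p in doc_v if a in p["author"]])
            for a in authors]

doc_v = view_v(BOOKS)                        # materialize the view once
assert rewriting_r(doc_v) == query_q(BOOKS)  # R(V(Input)) = Q(Input) here
```

Agreement on one instance is of course not a proof of equivalence; that is what the equivalence check of the algorithm is for.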
Outline
• NEXT (NEsted XML Tableaux)
• Rewriting Algorithm and Extensions
• Experiments
• Previous Work
• Conclusions
Architecture of the NEXT framework
[Diagram components: XQuery query and views → Normalization → Nested XML Tableaux (NEXT) → Minimization and Rewriting Using Views → Nested XML Tableaux (NEXT) → either Logical Optimization → Logical Plan → Plan Execution Engine, or Translate to XQuery → To Any XQuery Processor]
[Diagram annotations: "VLDB’04", "presented at this conference", "NEXT patterns"]
The need for normalization
[Diagram: XQuery query and views → Normalization → Nested XML Tableaux (NEXT)]
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> { $a,
      for $b in $doc//book, $t in $b/title
      where some $a1 in $b/author satisfies $a1 eq $a
      return $t }
    </bibentry>
Normalization into NEXT
XQuery query:
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> { $a,
      for $b in $doc//book, $t in $b/title
      where some $a1 in $b/author satisfies $a1 eq $a
      return $t }
    </bibentry>
After normalization (the existential quantifier becomes a join on $a1):
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> { $a,
      for $b in $doc//book, $a1 in $b/author, $t in $b/title
      where $a1 eq $a
      return $t }
    </bibentry>
Normalization into NEXT
cardinality? The normalized query adds an explicit groupby clause to the inner block:
  for $a in distinct-values($doc//book[title]/author)
  return <bibentry> { $a,
      for $b in $doc//book, $a1 in $b/author, $t in $b/title
      where $a1 eq $a
      groupby [$b], [$t]
      return $t }
    </bibentry>
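The cardinality question can be illustrated with a small sketch. It assumes made-up data in which one book carries two author nodes with the same value, and it uses hypothetical Python functions to mimic the three forms of the inner block; book ids stand in for node identity.

```python
# (book_id, title, authors); book 1 lists the value "Ann" twice (made-up data).
BOOKS = [(1, "T1", ["Ann", "Ann"]), (2, "T2", ["Ann", "Bob"])]

def titles_some(a):
    """where some $a1 in $b/author satisfies $a1 eq $a"""
    return [t for (_, t, authors) in BOOKS if any(a1 == a for a1 in authors)]

def titles_join(a):
    """for $b ..., $a1 in $b/author where $a1 eq $a (no groupby)"""
    return [t for (_, t, authors) in BOOKS for a1 in authors if a1 == a]

def titles_join_groupby(a):
    """Same join, then one output per distinct ($b, $t) binding."""
    seen, out = set(), []
    for (b, t, authors) in BOOKS:
        for a1 in authors:
            if a1 == a and (b, t) not in seen:
                seen.add((b, t))
                out.append(t)
    return out
```

On this data the plain join returns T1 twice for "Ann", while the quantified form and the grouped join both return each title once.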
NEXT Patterns
• alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns
• graphical representation of NEXT: nested patterns
View V:
  for $b1 in $doc//book, $t1 in $b1/title
  groupby [$b1], [$t1]
  return <authorlist> {$t1,
      for $a2 in $b1/author
      groupby [$a2]
      return $a2 }
    </authorlist>
[Diagram: view V drawn as two nested blocks, B1(V) and B2(V), each a forest of tree patterns]
B1(V): pattern $doc//book($b1)/title($t1); list of groupby variables [$b1],[$t1]; return function <authorlist> $t1, B2(V) </authorlist>
B2(V): pattern book($b1)/author($a2); list of groupby variables [$a2]; return function $a2
Legend: edges denote descendant navigation (//) or child navigation (/); each block carries a return function and a list of groupby variables.
NEXT Patterns
• alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns
• graphical representation of NEXT: nested patterns
Query Q:
  for $b0 in $doc//book, $t0 in $b0/title, $a in $b0/author
  groupby $a
  return <bibentry> { $a,
      for $b in $doc//book, $a1 in $b/author, $t in $b/title
      where $a1 eq $a
      groupby [$b],[$t]
      return $t }
    </bibentry>
B1(Q): pattern $doc//book($b0) with children title($t0) and author($a); groupby $a; return function <bibentry> $a, B2(Q) </bibentry>
B2(Q): pattern $doc//book($b) with children title($t) and author($a1); groupby [$b], [$t]; return function $t
View V:
  for $b1 in $doc//book, $t1 in $b1/title
  groupby [$b1], [$t1]
  return <authorlist> {$t1,
      for $a2 in $b1/author
      groupby [$a2]
      return $a2 }
    </authorlist>
B1(V): pattern $doc//book($b1)/title($t1); groupby [$b1],[$t1]; return function <authorlist> $t1, B2(V) </authorlist>
B2(V): pattern book($b1)/author($a2); groupby [$a2]; return function $a2
Architecture of the NEXT framework
[Diagram components: XQuery query and views → Normalization → Nested XML Tableaux (NEXT) → Minimization and Rewriting Using Views → Nested XML Tableaux (NEXT) → either Logical Optimization → Logical Plan → Plan Execution Engine, or Translate to XQuery → Independent XQuery Processor. The Rewriting Using Views stage is highlighted as the rewriting algorithm.]
Overview of the Rewriting Algorithm
Input: query Q, views V
1. detect alternative access paths towards the variable bindings through the views
2. build a candidate rewriting R that uses only the access paths from phase 1
3. check that R is equivalent to Q
REWRITING ALGORITHM
[Diagram: Query Q → access paths through V → access paths (candidate rewriting)]
Step 1: Detect View Access Paths
• access paths: ways of accessing data using the view
• identify matching subqueries (extended tree pattern matching)
• find a mapping and add navigation from the view return
• and another one…
• computing all such mappings yields a query extension that uses only view access paths
[Diagram: the view query body (book($b1)/title($t1) and book($b1)/author($a2), with return <authorlist> $t1, B2(V) </authorlist>) is mapped into the query patterns $doc//book($b0) with title($t0), author($a) and $doc//book($b) with title($t), author($a1). The extended query adds the view navigation $docV/authorlist($p0) with title($t2), author($a3) and $docV/authorlist($p) with title($t3), author($a4); these additions form the query extension.]
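The matching in this step can be sketched for plain tree patterns. This is a simplified illustration, not the extended matching of the talk: it ignores nesting, groupby lists and join equalities, it flattens the two view blocks into one pattern, and the `Node` class and the function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # element name, e.g. "book"
    var: str                        # variable bound at this node
    axis: str = "child"             # edge from the parent: "child" or "desc"
    children: list = field(default_factory=list)

def descendants(n):
    """All proper descendants of n."""
    for c in n.children:
        yield c
        yield from descendants(c)

def embed(v, q):
    """Yield every mapping {view var: query var} that sends v to q and
    preserves labels, child edges and descendant edges."""
    if v.label != q.label:
        return
    partial = [{v.var: q.var}]
    for vc in v.children:
        if vc.axis == "child":
            targets = [qc for qc in q.children if qc.axis == "child"]
        else:
            targets = list(descendants(q))
        # Extend every partial mapping with every way of placing this child.
        partial = [{**m, **sub}
                   for m in partial for t in targets for sub in embed(vc, t)]
    yield from partial

def mappings(view_root, query_root):
    """Try every query node as the image of the view root."""
    for q in [query_root, *descendants(query_root)]:
        yield from embed(view_root, q)

# View pattern book($b1)[title($t1), author($a2)] against the inner query
# pattern book($b)[title($t), author($a1)] from the example.
view = Node("book", "$b1", children=[Node("title", "$t1"), Node("author", "$a2")])
query = Node("book", "$b", children=[Node("title", "$t"), Node("author", "$a1")])
found = list(mappings(view, query))
```

Here `found` holds the single mapping that sends $b1 to $b, $t1 to $t and $a2 to $a1; mapping the same view pattern into the outer query pattern gives the second access path.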
Step 2: Candidate Rewriting
• same return function as the initial query, but with other variable bindings
original query: patterns $doc//book($b0) with title($t0), author($a) and $doc//book($b) with title($t), author($a1)
B1(Q): groupby $a; return function <bibentry> $a, B2(Q) </bibentry>
B2(Q): groupby [$b], [$t]; return function $t
candidate rewriting: patterns $docV/authorlist($p0) with title($t2), author($a3) and $docV/authorlist($p) with title($t3), author($a4)
B1(R): groupby $a3; return function <bibentry> $a3, B2(R) </bibentry>
B2(R): groupby [$t3]; return function $t3
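The change of variable bindings can be pictured as a substitution on the return function. The sketch below is only an illustration on strings: the function `rename_variables` is hypothetical, and the mapping is the one read off the example ($a becomes $a3, $t becomes $t3).

```python
import re

def rename_variables(template, mapping):
    """Replace each $variable in a return function by its view-side
    counterpart; variables without an entry are left unchanged."""
    def swap(match):
        name = match.group(0)
        return mapping.get(name, name)
    # \w+ consumes the whole name, so "$a1" is never mistaken for "$a".
    return re.sub(r"\$\w+", swap, template)

mapping = {"$a": "$a3", "$t": "$t3"}
outer_return = rename_variables("<bibentry> $a, B2(Q) </bibentry>", mapping)
inner_return = rename_variables("$t", mapping)
```

The block names B1(Q), B2(Q) are not variables, so they are untouched here; the slides rename them to B1(R), B2(R) separately.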
Step 3: Equivalence Check
• check that R ≡ Q: containment mappings defined on the tree of query blocks
• and then (optional step) translate back to XQuery
[Diagram: candidate rewriting with patterns $docV/authorlist($p0) with title($t2), author($a3) and $docV/authorlist($p) with title($t3), author($a4); B1(R): groupby $a3, return function <bibentry> $a3, B2(R) </bibentry>; B2(R): groupby [$t3], return function $t3]
Rewriting R:
  for $a3 in distinct-values($docV/authorlist[title]/author)
  return <bibentry> { $a3,
      for $p in $docV/authorlist, $t3 in $p/title
      where some $a4 in $p/author satisfies $a4 eq $a3
      return $t3 }
    </bibentry>
Under the Hood
• two types of equality: by value and by node id
– mappings must take it into consideration
– the groupby clause also
• XQuery results have order. We consider rewritings that:
– do not respect order (for DB-centric applications)
– respect order (for text-centric applications)
• for rewritings that respect order: look for an ordering of the view access paths that preserves the original query order (details in the paper)
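The two kinds of equality can be illustrated with a short sketch. It assumes that the unbracketed form in the slides (groupby $a, which replaces distinct-values) groups by value and that the bracketed form (groupby [$b], [$t]) groups by node identity; the class and function names are hypothetical.

```python
class AuthorNode:
    """An element node: its identity is the object, its value is the text."""
    def __init__(self, text):
        self.text = text

first = AuthorNode("Ann")
second = AuthorNode("Ann")          # a different node with the same value
nodes = [first, second, first]

def group_by_value(ns):
    """One group per distinct text value."""
    return sorted({n.text for n in ns})

def group_by_node_id(ns):
    """One group per distinct node, regardless of value."""
    seen, out = set(), []
    for n in ns:
        if id(n) not in seen:
            seen.add(id(n))
            out.append(n)
    return out
```

Grouping by value collapses the three bindings into one group, while grouping by node id keeps two; a mapping that confuses the two kinds of equality would change the result cardinality.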
Extensions to NEXT
• Extended NEXT to NEXT+:
– extend the pattern-based representation to the whole XQuery
– functions and other expressions (negation, disjunction, aggregates etc.) modeled as uninterpreted functions
• Extended the algorithm to use NEXT+: need to identify maximal subparts that are pure NEXT blocks.
Example:
  for $x in $doc/book
  where count( for $a in $x/author
               where $x/price eq 60
               groupby [$a]
               return $a ) eq count( …)
  groupby $x
  return $x
– rewrite blocks inside function arguments, with free variables bound in upper blocks
– rewrite outer block, disregarding function calls
Formal Guarantees
• The rewriting algorithm is sound
• and complete for a large fragment of XQuery (the one that can be translated into NEXT), without order
– Completeness means that if there are any rewritings, we are guaranteed to find at least one.
• There is no hope for completeness for
– ordered rewritings: equivalence is undecidable
– expressions beyond NEXT: negation and universal quantification also lead to undecidability
In these cases, our algorithm is a best effort approach, with guaranteed soundness.
Implementation (considerations)
• completeness guarantees come at a price: compute mappings between view and query patterns
• in general NP-complete, but PTIME if the patterns are trees (no equality conditions): based on M. Yannakakis, Algorithms for acyclic database schemes, 1981
• our goal: design an implementation whose running time is polynomial for pure tree patterns and degrades progressively with the number of added joins
Implementation in practice
• when computing the query plan, apply techniques from the Yannakakis algorithm: push projections & selections
• performance degrades with the number of equalities: the problem is NP-complete in the width of the view pattern (see the paper) and in PTIME when there are no join equalities.
[Diagram: V is compiled into a query plan (SPJ), Q is compiled into an XML instance, and evaluating the plan over the instance yields the mappings]
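For pure tree patterns, the polynomial bound can be seen from a bottom-up check that solves each (view node, query node) pair once. The sketch below is only in the spirit of that idea, not the implementation described in the talk: it covers labels, child edges and descendant edges, leaves out join equalities, and all names are hypothetical.

```python
def pattern(label, *children, axis="child"):
    """Build a pattern node; axis is the edge from its parent."""
    return {"label": label, "axis": axis, "children": list(children)}

def proper_descendants(n):
    for c in n["children"]:
        yield c
        yield from proper_descendants(c)

def can_embed(v, q, memo):
    """True if view subtree v can be mapped with its root on query node q.
    Each (v, q) pair is computed once and cached in memo."""
    key = (id(v), id(q))
    if key not in memo:
        ok = v["label"] == q["label"]
        for vc in v["children"]:
            if not ok:
                break
            if vc["axis"] == "child":
                targets = [c for c in q["children"] if c["axis"] == "child"]
            else:
                targets = proper_descendants(q)
            # Children of a tree pattern can be placed independently.
            ok = any(can_embed(vc, t, memo) for t in targets)
        memo[key] = ok
    return memo[key]

def has_mapping(view_root, query_root):
    """Does the view pattern map somewhere into the query pattern?"""
    memo = {}
    nodes = [query_root, *proper_descendants(query_root)]
    return any(can_embed(view_root, q, memo) for q in nodes)

view = pattern("book", pattern("title"), pattern("author"))
query = pattern("bib", pattern("book", pattern("title"), pattern("author"),
                               axis="desc"))
```

Once equality conditions tie different branches together, the children can no longer be placed independently, which is where the degradation with the number of joins comes from.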
Experiments: Design
• The running time of the algorithm increases with:
– number of nested levels: mappings are block by block
– size of the pattern: # of mapped and target nodes increases
– number of views: more patterns to match
• Our experiments measured how the algorithm scales with these parameters.
• We designed a configuration where we generated queries and views of increasing size and nesting depth.
EXPERIMENTS
Experiments: Implementation
Queries & views with similar basic patterns, in a vertical chain of blocks.
[Diagram: the basic pattern is $doc → mk → (a, ci). Block Bk contains the basic patterns rooted at mk with leaves (a, c1), (a, c2), …; block Bk+1 contains the same patterns rooted at mk+1.]
Irrelevant views don’t matter (can be quickly discarded). We create only relevant views (with mappings into query):
– split the query recursively into fragments = views
– make them overlap on basic patterns
Experiments: Good Scalability
d = depth (# of nested levels in a query)
b = breadth (# of basic patterns in a block)
1.25s for d=16, b=16 and 128 views
Previous work
• rewriting XPath queries using XPath views: Rewriting XPath Queries Using Materialized Views, W. Xu et al., VLDB 2005
• rewriting XQuery using XPath views: A Framework for Using Materialized XPath Views in XML Query Processing, A. Balmin et al., VLDB 2004
• rewrite an XQuery with only one XQuery view that has to contain the query: ACE-XQ: A CachE-aware XQuery Answering System, L. Chen et al., WebDB 2002
• caching common XQuery subexpressions: Implementing Memoization in a Streaming XQuery Processor, Y. Diao et al., XSym 2004
Conclusions
• NEXT is a pattern-based representation that describes what the query result is and not how it is computed, which gives more opportunities for semantic optimizations
• extensible to all of XQuery, using NEXT+
• rewriting using views algorithm
– sound for the whole language
– complete for a large fragment of XQuery
– good scalability
– independent of the underlying algebra of the query processor