Containment of Nested XML Queries


Containment of Nested XML Queries. Presented by: Orly Goren. Xin Dong, Alon Halevy, Igor Tatarinov. Query containment is the most fundamental relationship between a pair of queries: query Q is contained in Q' if, for any database D, Q(D) is a subset of Q'(D).


  • Containment of Nested XML Queries
    Presented by: Orly Goren

    Xin Dong, Alon Halevy, Igor Tatarinov

  • Query Containment
    The most fundamental relationship between a pair of queries.
    Query Q is contained in Q' if, for any database D, Q(D) is a subset of Q'(D).
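    The definition can be tried out on a single database. The sketch below is only an illustration: the database `D` and the queries `q` and `q_prime` are made-up stand-ins, not taken from the slides. Note that one database can refute containment (by exhibiting a counterexample) but can never prove it, since the definition quantifies over every database.

```python
# Hypothetical toy database: a set of (member, project) tuples.
D = {("Alice", "db"), ("Bob", "db"), ("Carol", "ai")}


def q(db):
    """Made-up query: names of the members of project 'db'."""
    return {name for (name, proj) in db if proj == "db"}


def q_prime(db):
    """Made-up query: names of all members."""
    return {name for (name, _proj) in db}


def contained_on(q1, q2, db):
    """Instance-level check only: is q1(db) a subset of q2(db)?"""
    return q1(db) <= q2(db)


print(contained_on(q, q_prime, D))  # q(D) is a subset of q_prime(D) on this D
print(contained_on(q_prime, q, D))  # Carol is in q_prime(D) but not in q(D)
```

    Here the second check fails, so `q_prime` is not contained in `q`; the first check succeeding on `D` alone does not establish that `q` is contained in `q_prime`.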

  • Roadmap
    Introduction and problem definition
    Containment of a subset of XML queries
    Query containment is decidable
    Query containment in practice
    Relaxing the assumptions
    Conclusions

    Fanout \ Depth   Fixed           Arbitrary
    = 1              PTIME           PTIME
    Arbitrary        coNP-complete   in coNEXPTIME

  • Applications of Query Containment
    Semantic caching
    Determining independence of database updates
    Query answering using views
    Detecting that a reformulated query is redundant
    Query minimization
    Verification of knowledge bases

  • Query Processing in PDMS
    XML query containment in a Peer Data Management System (PDMS):
    Answering queries using views to extract remote data
    Removing redundant queries to enhance performance
    [Figure: queries exchanged between peers; labels QW, QP, QB1, QB2, QS]


  • Query Containment: Relational vs. XML

                           Relational                      XML
    Input D                Sets of tuples                  An XML instance tree
    Output Q(D)            A set of tuples                 An XML instance tree
    Instance containment   Q(D) ⊆ Q'(D): subset            Q(D) ⊆ Q'(D): tree embedding
    Query containment      Q ⊆ Q' iff for every input D,   Q ⊆ Q' iff for every input D,
                           Q(D) ⊆ Q'(D)                    Q(D) ⊆ Q'(D)

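  • Example: An XML Instance
    D: [Figure: XML instance tree; surviving text values: Alice, Bob]

  • Example: An XML Query
    Q: for $x in /project
       return {
         for $y in $x/member
         return {
           where $y = Alice return
           where $y = Bob return
         }
       }
    [Figure: the input D and the answer Q(D); the returned element constructors are not recoverable]

  • Example: Another XML Query
    Q: for $x in /project
       return {
         for $y in /project/member
         return {
           where $y = Alice return
           where $y = Bob return
         }
       }
    [Figure: the input D and the answer Q(D); the returned element constructors are not recoverable]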

  • Tree Embedding
    Given two trees T1 and T2, a node mapping φ from T1 to T2 is said to be an embedding from T1 to T2 if:

    φ maps the root of T1 to the root of T2.
    If node n2 is a child of node n1 in T1, then φ(n2) is a child of φ(n1), and n1 and n2 have the same labels as φ(n1) and φ(n2), respectively.

    What is the time complexity of finding an embedding from T1 to T2?
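    One way to approach the question is to write the check down. The following is a minimal sketch, assuming unordered trees and a mapping that need not be injective (the definition above does not require injectivity); the `Node` class and the function name `embeds` are illustrative choices, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A labeled tree node; `children` is a tuple of Node (order is ignored)."""
    label: str
    children: tuple = ()


def embeds(t1: Node, t2: Node) -> bool:
    """True if t1 can be embedded in t2 with t1's root mapped to t2's root."""
    if t1.label != t2.label:
        return False
    # Every child of t1 must be mapped to some child of t2.
    return all(any(embeds(c1, c2) for c2 in t2.children) for c1 in t1.children)


# Illustrative trees (assumed, not from the slides).
a_b = Node("a", (Node("b"),))
a_bb = Node("a", (Node("b"), Node("b")))
a_bc = Node("a", (Node("b"), Node("c")))

print(embeds(a_b, a_bb), embeds(a_bb, a_b))  # each tree embeds in the other
print(embeds(a_bc, a_b))                     # no child of a_b is labeled "c"
```

    In this sketch a pair of nodes (n1, n2) is compared only when their parents have been compared, so each pair is visited at most once and the running time is bounded by |T1|·|T2| label comparisons.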

  • XML Instance Containment
    Let e and e' be two XML instances. e is contained in e', denoted as e ⊆ e', if the tree of e can be embedded in the tree of e'.

    Containment is reflexive and transitive.
    Containment is not antisymmetric: e ⊆ e' and e' ⊆ e do not imply e = e'.

    [Figure: two XML instances (node labels a, b) that contain each other but are not equivalent.]

  • XML Query Containment
    Let Q and Q' be two XML queries.
    Q is contained in Q', denoted as Q ⊆ Q', if for every input XML instance D, Q(D) ⊆ Q'(D).

  • Example: Tree Embedding and Query Containment
    [Figure: the answer trees of the two example queries on D]

  • Query Containment Problem
    From answer containment to query containment

    Our problems:
    Given queries Q and Q', decide whether Q ⊆ Q'
    The complexity of query containment
    [Figure: relating containment of the answers Q(D) and Q'(D) to containment of the queries Q and Q']

  • Previous Work (I)
    Relational query containment:
    Conjunctive queries [Chandra and Merlin, STOC 1977]
    Acyclic queries [Yannakakis, VLDB 1981]
    Queries with union [Sagiv and Yannakakis, JACM 1980]
    Queries with negation [Levy and Sagiv, VLDB 1993]
    Queries with arithmetic comparisons [Klug, JACM 1988]
    Recursive queries [Shmueli, 1993], [Chaudhuri and Vardi, 1992]
    Queries over bags [Ioannidis and Ramakrishnan, 1995]

  • Previous Work (II)
    XML query containment: two new challenges
    XPath containment:
    With *, // and [] [Miklau and Suciu, PODS 2002]
    With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
    Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
    Nested query containment

  • Containment Cannot be Determined Solely by Comparing XPath Components
    Q:  for $g in /group
        where $g/gname/text() = database
        return {
          for $p in $g/person
          return {$p/text()}
                 {for $q in $g/paper
                  where $q/author/text() = $p/text()
                  return {$q/title/text()}}
        }
    Q': for $g in /group
        return {
          for $p in $g/person
          return {$p/text()}
                 {$g/gname/text()}
                 {for $q in $g/paper
                  where $q/author/text() = $p/text()
                  return {$q/title/text()}}
        }

  • Previous Work (II)
    XML query containment: two new challenges
    XPath containment:
    With *, // and [] [Miklau and Suciu, PODS 2002]
    With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
    Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
    Nested query containment:
    Complex object query containment [Levy and Suciu, PODS 1997]
    Containment of nested XML queries has not been fully studied

  • Conjunctive XML Queries (c-XQueries)
    Returned variables are bound to tag names or text values only.
    Conjunctive: no two sibling query blocks return the same tag.
    XPath features allowed: child axis (/), wildcards (*), branches ([]).
    XPath features not allowed: descendant axis (//), arithmetic comparison, union.
    For this fragment, XPath containment is in PTIME.

  • Conjunctive Queries (cont.)
    A c-XQuery consists of nested query blocks.
    The fan-out of a query block is the number of its immediate sub-blocks.
    The nesting depth of a query block is 1 plus the maximal nesting depth of its sub-blocks.
    The nesting depth of the query is the depth of its outermost block.
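    The two measures can be computed directly from the block structure. A minimal sketch, assuming a query is represented only by its nesting of blocks; the `Block` class, its field names, and the sample tags are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A query block, reduced to its returned tag and its immediate sub-blocks."""
    tag: str
    sub_blocks: tuple = ()


def fan_out(block: Block) -> int:
    """Number of immediate sub-blocks of this block."""
    return len(block.sub_blocks)


def max_fan_out(block: Block) -> int:
    """Largest fan-out of any block nested in (or equal to) this block."""
    return max([fan_out(block)] + [max_fan_out(b) for b in block.sub_blocks])


def nesting_depth(block: Block) -> int:
    """1 plus the maximal nesting depth of the sub-blocks (1 for a leaf block)."""
    return 1 + max((nesting_depth(b) for b in block.sub_blocks), default=0)


# Hypothetical query shape: an outer block with two sub-blocks, one of them nested once more.
query = Block("a", (Block("b"), Block("c", (Block("d"),))))
print(fan_out(query), max_fan_out(query), nesting_depth(query))
```

    Calling `nesting_depth` on the outermost block gives the nesting depth of the whole query, as in the last line of the definition.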

  • Query Head Tree
    The structure of an XML query and its answers can be described using a query head tree.
    Edges represent query blocks. The label of a node n in the head tree is the returned tag of the block corresponding to the incoming edge of n in Q.
    A head tree is also an XML instance if its variables are substituted with actual values.

  • Query Head Tree Example
    Q: for $x in /project
       return {for $s in $x/title/text() return {$s}}
              {for $t in $x/member/text() return {$t}}

    [Figure: query head tree; node labels in the extracted text: group, name, proj, title, s, t]
    What are the fan-out and the nesting depth of Q?

  • Constant Conjunctive XML Queries (cc-XQueries)
    A cc-XQuery is a c-XQuery that does not return tag variables.

    The head tree of a cc-XQuery has constant labels only.


  • Deciding Q ⊆ Q'?
    How do we establish a property over an infinite number of input XML instances?
    Standard technique: find a finite set of input representatives (canonical databases).
    Relational query: each canonical database is a minimal input that generates the answer template.
    XML query answers have an infinite number of shapes.
    Find a finite set of answer templates (canonical answers).

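  • Answer Shapes Determined by the Head Tree
    Q: for $x in /project
       return {
         for $y in /project/member
         return {
           where $y = Alice return
           where $y = Bob return
         }
       }
    [Figure: the head tree of Q and its answer shapes; node labels: group, name, Alice, Bob]

  • An Additional Candidate Answer
    [Figure: a further candidate answer next to the head tree; node labels: group, name, Alice, Bob]

  • Why Consider the Additional Case
    [Figure: the head tree, an input D, and the answers of the two queries on D]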

  • What can Serve as Canonical Answers?
    Prefix subtrees of the head tree? Necessary but not sufficient.
    Trees contained in the head tree? Necessary and sufficient, but too many and too complex.

  • A Head Tree can Have Many Trees Contained in it
    [Figure: the head tree (node labels group, name, Alice, Bob) and several trees contained in it]

  • What can Serve as Canonical Answers?
    Prefix subtrees of the head tree? Necessary but not sufficient.
    Trees contained in the head tree? Necessary and sufficient, but too many and too complex.
    Solution: consider only minimal trees that are contained in the head tree.

  • Canonical Answer
    A minimal XML instance: no two sibling subtrees where one is contained in the other.
    Canonical answer: a minimal XML instance contained in the head tree.

    Every answer A of query Q corresponds to a unique canonical answer CA, s.t. A ⊆ CA and CA ⊆ A.
    [Figure: an answer and its canonical answer; node labels: group, name, Alice, Bob]
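    Minimality can be obtained by pruning redundant siblings bottom-up. The sketch below is one possible way to do it, not necessarily the procedure used by the authors; it assumes unordered trees, and `Node`, `embeds` and `minimize` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def embeds(t1: Node, t2: Node) -> bool:
    """True if t1 can be embedded in t2 (root to root, labels preserved)."""
    if t1.label != t2.label:
        return False
    return all(any(embeds(c1, c2) for c2 in t2.children) for c1 in t1.children)


def minimize(t: Node) -> Node:
    """Drop every sibling subtree that is contained in another sibling."""
    kept = []
    for child in (minimize(c) for c in t.children):
        if any(embeds(child, other) for other in kept):
            continue  # child is contained in a sibling we already keep
        # Drop previously kept siblings that are contained in the new child.
        kept = [other for other in kept if not embeds(other, child)]
        kept.append(child)
    return Node(t.label, tuple(kept))


# Illustrative answer tree (tags assumed): the first <name> subtree is contained in the second.
answer = Node("group", (
    Node("name", (Node("Alice"),)),
    Node("name", (Node("Alice"), Node("Bob"))),
))
minimal = minimize(answer)
print(minimal)
print(embeds(answer, minimal), embeds(minimal, answer))  # containment in both directions
```

    The minimized tree and the original contain each other, which matches the statement that every answer corresponds to a canonical answer in both directions.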

  • Canonical Database
    Canonical database DB_CA: the minimal XML instance that generates CA.
    Q: for $x in /project
       return {
         for $y in /project/member
         return {
           where $y = Alice return
           where $y = Bob return
         }
       }
    [Figure: a canonical answer CA (node labels group, name, name, Alice, Bob) and its canonical database DB]

  • Canonical Database: Formal Definition
    The canonical database of a cc-XQuery for a canonical answer CA is denoted DB_CA.
    DB_CA is an XML instance s.t. for each node N of CA, where N's generator query block is qn, the following holds:

    Let p0/p1/.../pn be a path expression in qn, where p0 is an optional node variable from an ancestor query block.
    For each pi, i ∈ [1, n], there is a distinct node, labeled pi, that is a child of the node for pi-1. If p0 is absent, then p1 is a child of DB_CA's root.

  • Sound and Complete Conditions for Nested Query Containment
    Let Q and Q' be two cc-XQueries. The following three conditions are equivalent:
    1. Q ⊆ Q'
    2. For every canonical database DB of Q, Q(DB) ⊆ Q'(DB)
    3. For every canonical answer CA of Q:
       a) CA is a canonical answer of Q'
       b) DB'_CA ⊆ DB_CA, where DB_CA and DB'_CA are the canonical databases for CA of Q and of Q'

  • Properties of Canonical Answers and Databases
    Lemma 1: Let Q be a cc-XQuery and D be an XML instance. There exists a unique canonical answer CA of Q, s.t. Q(D) ⊆ CA and CA ⊆ Q(D).

    Lemma 2: Let Q be a cc-XQuery, CA be a canonical answer of Q, DB_CA be the canonical database for CA of Q, and D be an XML instance. CA ⊆ Q(D) if and only if DB_CA ⊆ D.

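  • Containment of cc-XQueries: Proof (1)
    1) => 2): Follows from the definition.
    2) => 3): Let CA be a canonical answer of Q, with canonical databases DB_CA (for Q) and DB'_CA (for Q').
    CA ⊆ Q(DB_CA) (Lemma 2) and Q(DB_CA) ⊆ Q'(DB_CA) (by 2), so CA ⊆ Q'(DB_CA) (containment is transitive); hence a) holds.
    CA is a canonical answer of Q' (a) and CA ⊆ Q'(DB_CA), so DB'_CA ⊆ DB_CA (Lemma 2); hence b) holds.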

  • Containment of cc-XQueries: Proof (2)
    3) => 1): To show Q ⊆ Q', we need to show that for every XML instance D, Q(D) ⊆ Q'(D).
    There exists a unique CA of Q s.t. Q(D) ⊆ CA and CA ⊆ Q(D) (Lemma 1), so DB_CA ⊆ D (Lemma 2).
    DB'_CA ⊆ DB_CA (by 3), so DB'_CA ⊆ D.
    Hence CA ⊆ Q'(D) (Lemma 2), and therefore Q(D) ⊆ Q'(D).

  • Query Containment Algorithm
    Algorithm:
    for every canonical answer CA of Q do
      check whether CA is a canonical answer of Q'
      generate DB_CA and DB'_CA
      check DB'_CA ⊆ DB_CA

  • Roadmap
    Introduction and problem definition
    Containment of a subset of XML queries
    Query containment is decidable
    Query containment in practice
    Relaxing the assumptions
    Conclusions

    Fanout \ Depth   Fixed   Arbitrary
    = 1              ?       ?
    Arbitrary        ?       ?

  • Query Containment Algorithm
    Algorithm:
    for every canonical answer CA of Q do
      check whether CA is a canonical answer of Q'
      generate DB_CA and DB'_CA
      check DB'_CA ⊆ DB_CA
    Polynomial in the size and number of canonical answers.
    What are the sizes of canonical answers?
    What is the number of canonical answers?
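    The loop can be sketched as a higher-order function. This is only a skeleton under stated assumptions: the four helpers (`canonical_answers`, `is_canonical_answer`, `canonical_db`, `embeds`) are hypothetical parameters that a real implementation would have to supply, and the last check is read as "the canonical database of Q' for CA is contained in the canonical database of Q for CA". The toy stand-ins at the bottom exist only to make the skeleton executable.

```python
from typing import Any, Callable, Iterable


def is_contained(
    q: Any,
    q_prime: Any,
    canonical_answers: Callable[[Any], Iterable[Any]],   # hypothetical helper
    is_canonical_answer: Callable[[Any, Any], bool],     # hypothetical helper
    canonical_db: Callable[[Any, Any], Any],             # hypothetical helper
    embeds: Callable[[Any, Any], bool],                  # instance containment test
) -> bool:
    """Return True if every canonical answer of q passes both checks against q_prime."""
    for ca in canonical_answers(q):
        if not is_canonical_answer(ca, q_prime):
            return False
        db = canonical_db(q, ca)              # DB_CA
        db_prime = canonical_db(q_prime, ca)  # DB'_CA
        if not embeds(db_prime, db):
            return False
    return True


# Toy stand-ins (assumptions for illustration only): a "query" maps each of its
# canonical answers to a set standing in for the canonical database.
toy_q = {"ca1": frozenset({1, 2}), "ca2": frozenset({3})}
toy_q_prime = {"ca1": frozenset({1}), "ca2": frozenset({3})}
helpers = (
    lambda query: list(query),        # canonical_answers
    lambda ca, query: ca in query,    # is_canonical_answer
    lambda query, ca: query[ca],      # canonical_db
    lambda small, big: small <= big,  # embeds, here plain set inclusion
)
print(is_contained(toy_q, toy_q_prime, *helpers))
print(is_contained(toy_q_prime, toy_q, *helpers))
```

    The cost of the loop is the number of canonical answers times the cost of the two checks, which is why the next slides ask how many canonical answers there are and how large they can get.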

  • Containment of XML Queries with Fanout 1
    E.g., d = 3 (the depth); m = 1 (the maximum fanout)

    Q: for $x in /project
       return {
         for $y in /project/member
         return {where $y = Alice return}
       }
    [Figure: the canonical answers of Q; node labels: group, name, Alice]

    Canonical answers and complexity
    Number: the depth of the query
    Size: bounded by the depth of the query
    Complexity: O(d·|Q|·|Q'|)
    Theorem: Testing containment of XML queries with fanout 1 is in PTIME.
    Nesting with fanout 1 does not increase complexity.

  • Roadmap
    Introduction and problem definition
    Containment of a subset of XML queries
    Query containment is decidable
    Query containment in practice
    Relaxing the assumptions
    Conclusions

    Fanout \ Depth   Fixed   Arbitrary
    = 1              PTIME   PTIME
    Arbitrary        ?       ?

  • Containment of XML Queries with Arbitrary Fanout
    E.g., d = 4 (the depth); m = 3 (the maximum fanout)

    Canonical answers and complexity
    Number:

    Size:

    Theorem: Testing containment of XML queries with depth 2 and arbitrary fanout is coNP-hard.

  • Roadmap
    Introduction and problem definition
    Containment of a subset of XML queries
    Query containment is decidable
    Query containment in practice
    Conclusions

    Fanout \ Depth   Fixed       Arbitrary
    = 1              PTIME       PTIME
    Arbitrary        coNP-hard   coNP-hard
    (The bounds for arbitrary fanout are not tight.)

  • Effect of the Depth on Containment of XML Queries
    Insight: kernel canonical answers
    The root node has a single child.
    In any subtree, a path pattern is repeated no more than cd times,
    where d is the query depth and c is the maximum number of path steps in a query block.
    The size of kernel canonical answers is:
    polynomial in the query size (for fixed nesting depth);
    exponential in the query depth (for arbitrary depth).
    Theorem: Testing containment of XML queries with fixed depth is coNP-complete.
    Testing containment of XML queries with arbitrary depth is in coNEXPTIME.

  • Effect of the Depth on Containment of XML Queries (cont.)
    Lemma 3: Let Q and Q' be two cc-XQueries. Q ⊆ Q' iff for each kernel canonical answer (KCA) of Q:
    1. KCA is a canonical answer of Q'.
    2. DB'_KCA ⊆ DB_KCA.

    The size of a KCA is O((bcd)^d).
    The number of KCAs is O(m^((bcd)^d)).
    b = number of query blocks in Q.
    m = maximum fanout in Q.



  • Containment Checking in Practice
    Analyze element cardinality to reduce the number of canonical answers for containment checking.
    Given the query structure and the underlying XML database schema, we can infer the cardinality of elements in the query answer. Specifically, canonical answers are pruned according to the following 3 rules:
    1. (=1) The schema implies that a certain element occurs exactly once under its parent element.
    2. (≥1) The schema implies that a certain element occurs at least once under its parent element.
    3. (≤1) The schema implies that a certain element occurs at most once under its parent element.

  • Containment Checking in Practice: Example

    Number of canonical answers: originally 71; after analysis 2.
    Q:  for $g in /group
        where $g/gname/text() = database
        return {
          for $p in $g/person
          return {$p/text()}
                 {for $q in $g/paper
                  where $q/author/text() = $p/text()
                  return {$q/title/text()}}
        }
    Q': for $g in /group
        return {
          for $p in $g/person
          return {$p/text()}
                 {$g/gname/text()}
                 {for $q in $g/paper
                  where $q/author/text() = $p/text()
                  return {$q/title/text()}}
        }


  • An Example Query that Returns Tag Variables
    for $x in dbGrp
    return {
      for $y in $x/proj
      return {
        for $u in $y/member return $u/text()
        for $v in $y/paper return $v/text()
      }
    }

  • Deciding Query Containment
    Leverage previous results: simulation mapping [Levy and Suciu, PODS 1997]
    Check the query simulation mapping for every canonical answer
    Complexity:
    Simulation mapping can be checked in time polynomial in the query size
    The complexity of checking containment does not increase


  • Other Extensions

    Query type                    Un-nested       Fan-out = 1     Fixed depth
    No tag variables              PTIME           PTIME           coNP-complete
    With tag variables            PTIME           PTIME           coNP-complete
    With unions                   coNP-complete   coNP-complete   coNP-complete
    With negation                 coNP-complete   coNP-complete   coNP-complete
    With //                       coNP-complete   coNP-complete   coNP-complete
    With equi-join on tags        NP-complete     NP-complete     Π2P-complete
    With arithmetic comparisons   Π2P-complete    Π2P-complete    Π2P-complete

    General: in coNEXPTIME

  • Conclusions
    Contributions:
    A sound and complete condition for containment of nested XML queries
    Detailed complexity analysis
    Future work:
    Evaluate and optimize the containment algorithm with element cardinality analysis
    Answering nested XML queries using views

  • Containment cannot be determined solely by comparing the queries' respective XPath components. The two queries, Q and Q', take an input document consisting of paper and person elements, where a person element contains a person name and a paper element contains title and author sub-elements. In the output, the person elements have name and paper sub-elements instead. Clearly, checking containment of the XPath fragments in every block is not sufficient to establish that Q is contained in Q'. In addition, one must consider the structure of the queries, the predicates and returned values that appear in each block, and how the XPath expressions are spread across the query blocks.

    c-XQueries are similar to select-project-join queries in SQL. Variables bound to tag names or text values are tag variables. Note that if an XML query has a variable bound to an XML element, we can easily expand the RETURN clause according to the schema and transform the query into a c-XQuery. A block can return a tag variable only when it has no siblings.

    Since containment of un-nested acyclic queries is in PTIME, our result isolates the exact effect of nesting on the complexity of query containment.

    The semantics of a c-XQuery is an extension of the semantics of an un-nested conjunctive query. Specifically, each node n in the answer is generated by a query block with the same depth as n. Note that since a c-XQuery does not allow disjunction, each node has a unique generator. For every valid variable substitution in a query block, we generate an output element with the corresponding tag. When there is at least one satisfying substitution, we evaluate the block's sub-blocks. The output element of the outermost block of a c-XQuery is the answer to the query. Note that we consider an expression {$s} a query block with empty for and where clauses and with a return tag of $s.

    Although cc-XQueries are rarely useful in practice, we study them for two reasons. First, they already show some of the important lower bounds of query containment. Second, they help us obtain insights on the techniques we use to establish containment, which later carry over to c-XQueries.

    The cardinality-analysis techniques do not decrease the upper bound of the computational complexity. Their effect still needs to be verified by a thorough experimental evaluation.