Transcript
  • Annotated XML: Queries and Provenance

    Nate Foster, TJ Green, Val Tannen

    University of Pennsylvania

    Symposium on Database Provenance, University of Edinburgh, May 21, 2008

  • Need to Track XML Provenance

    For scientific data processing [Buneman+ 01]: tree-structured data from heterogeneous sources, so XML is the natural data model. Data is annotated with source info; annotations need to be propagated during query processing.

    For incomplete/probabilistic data [Sen.&Abit. 06]: query output is annotated with Boolean formulas. Annotations indicate correlations between source data and output data.

    For data warehousing [Cui+ 00]: even when data is relational, we often have XML views.

  • Provenance for Relational Algebra Views

    V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))

    Source R:

    A B C
    a b c
    d b e
    f g e

    View V (provenance ???):

    A B
    a c
    a e
    d c
    d e
    f e

  • Semiring-Annotated Relations [PODS07]

    Associate each tuple in the database with an annotation from a commutative semiring (K, +, ·, 0, 1).

    Combine and propagate annotations during (positive) relational query processing: ⋈, × combine annotations using ·; π, ∪ combine annotations using +; σ multiplies annotations by 0 or 1.

  • Annotated Relations Example

    V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))

    R:

    A B C   annotation
    a b c   p
    d b e   r
    f g e   s

    V:

    A B   annotation
    a c   2p²
    a e   pr
    d c   pr
    d e   2r² + rs
    f e   2s² + rs
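
    The annotations in V can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes annotations are polynomials in N[X], that join multiplies and projection/union add, and that the view's two columns are the A and C columns of each join (the renaming to headers A, B is an assumption). The names Poly, var, project, join and union are hypothetical helpers.

```python
from collections import Counter


class Poly:
    """Polynomial in N[X]: maps a monomial (sorted tuple of variables) to its coefficient."""

    def __init__(self, terms=None):
        self.terms = {m: c for m, c in dict(terms or {}).items() if c}

    def __add__(self, other):
        out = Counter(self.terms)
        out.update(other.terms)  # Counter.update adds coefficients
        return Poly(out)

    def __mul__(self, other):
        out = Counter()
        for m1, c1 in self.terms.items():
            for m2, c2 in other.terms.items():
                out[tuple(sorted(m1 + m2))] += c1 * c2
        return Poly(out)

    def __eq__(self, other):
        return isinstance(other, Poly) and self.terms == other.terms


def var(name):
    return Poly({(name,): 1})


def add_row(rows, key, k):
    # Alternative derivations of the same tuple are summed.
    rows[key] = rows[key] + k if key in rows else k


def project(rel, attrs):
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]
    out = {}
    for row, k in rows.items():
        add_row(out, tuple(row[i] for i in idx), k)
    return tuple(attrs), out


def join(r1, r2):
    (s1, rows1), (s2, rows2) = r1, r2
    shared = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = {}
    for t1, k1 in rows1.items():
        for t2, k2 in rows2.items():
            if all(t1[s1.index(a)] == t2[s2.index(a)] for a in shared):
                # Joint use of two tuples multiplies their annotations.
                add_row(out, t1 + tuple(t2[s2.index(a)] for a in extra), k1 * k2)
    return s1 + tuple(extra), out


def union(r1, r2):
    (s1, rows1), (s2, rows2) = r1, r2
    assert s1 == s2, "union needs identical schemas"
    out = dict(rows1)
    for t, k in rows2.items():
        add_row(out, t, k)
    return s1, out


p, r, s = var("p"), var("r"), var("s")
R = (("A", "B", "C"), {("a", "b", "c"): p, ("d", "b", "e"): r, ("f", "g", "e"): s})

left = join(project(R, "AC"), project(R, "C"))
right = join(project(R, "AB"), project(R, "BC"))
V = union(project(left, "AC"), project(right, "AC"))
```

    Under these assumptions, V[1] maps each of the five view tuples to the polynomial listed in the table, for example ("d", "e") to r*r + r*r + r*s.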

  • Semiring Bestiary

    (B, ∨, ∧, ⊥, ⊤): set semantics
    (N, +, ·, 0, 1): bag semantics
    (PosBool(B), ∨, ∧, ⊥, ⊤): incomplete dbs
    (P(Ω), ∪, ∩, ∅, Ω): probabilistic dbs
    (P(P(X)), ∪, ⋓, ∅, {∅}): why-provenance, where A ⋓ B := {a ∪ b : a ∈ A, b ∈ B}
    (C, min, max, absent, public): security clearances
    (N[X], +, ·, 0, 1): provenance polynomials
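
    To make the bestiary concrete, here is a small sketch that encodes four of these semirings and evaluates one annotation, 2r² + rs, in each. The Semiring record, the integer encoding of clearance levels, and the function name are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Set semantics: (B, or, and, false, true)
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)

# Bag semantics: (N, +, *, 0, 1)
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)

# Why-provenance: sets of sets, with pairwise union as multiplication
WHY = Semiring(
    lambda A, B: A | B,
    lambda A, B: frozenset(a | b for a in A for b in B),
    frozenset(),
    frozenset([frozenset()]),
)

# Security clearances, encoded as integers (assumed encoding): P < C < S < T < absent
P, C, S, T, ABSENT = range(5)
CLEARANCE = Semiring(min, max, ABSENT, P)


def evaluate_2r2_plus_rs(K, r, s):
    """Evaluate the annotation 2r^2 + rs, written as r*r + r*r + r*s, in semiring K."""
    rr = K.times(r, r)
    return K.plus(K.plus(rr, rr), K.times(r, s))


bag = evaluate_2r2_plus_rs(NAT, 2, 3)
why = evaluate_2r2_plus_rs(WHY, frozenset([frozenset(["r"])]), frozenset([frozenset(["s"])]))
clearance = evaluate_2r2_plus_rs(CLEARANCE, C, S)
```

    The same expression yields a multiplicity in NAT, a set of witness sets in WHY, and the lowest clearance that suffices in CLEARANCE.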

  • Our Contribution: Annotated XML

    We show how to decorate unordered XML data with semiring annotations: K-UXML

    We propagate the annotations for K-UXQuery (based on a large fragment of positive XQuery)

    We do this by generalizing the semantics of Nested Relational Calculus (NRC) to handle annotated values and to incorporate a recursive tree type and structural recursion on trees

    We prove a commutation-with-homomorphisms theorem and show that it enables applications in security and incomplete databases

  • K-UXML

    No attributes, no text values, no repeated children (inessential); no order (essential!)

    Each node is decorated with a value k from a semiring K (1 neutral, 0 not present)

    K-collection: a finite set of elements annotated with values from K

    Formally, the children of a node form a K-collection of subtrees (to annotate the root, we also have a top-level K-collection)

  • Example: XPath on K-UXML

    Query: element r { $T//c }

    [Figure: source tree $T, with nodes annotated x1, x2, y1, y2, y3, and the answer tree rooted at r, which includes a c child annotated x1·y3 + y1·y2.]

    Omitted annotations are 1 (and omitted subtrees have annotation 0)

  • Example: For-Loops in K-UXQuery

    Source, $S: a^z { b^x1 { d^y1 }, c^x2 { d^y2, e^y3 } }

    Query: element p { for $t in $S return for $x in ($t)/* return ($x)/* }
    (i.e., element p { $S/*/* })

    Answer: p { d^(z·x1·y1 + z·x2·y2), e^(z·x2·y3) }
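
    The for-loop example can be replayed in a few lines. This is a sketch under stated assumptions, not the paper's implementation: a tree is modelled as a pair (label, frozen K-collection of children), a K-collection as a dict from values to annotations, and annotations as polynomials in N[X]. Poly, var, tree, children and big_union are hypothetical names.

```python
from collections import Counter


class Poly:
    """Polynomial in N[X]: maps a monomial (sorted tuple of variables) to its coefficient."""

    def __init__(self, terms=None):
        self.terms = {m: c for m, c in dict(terms or {}).items() if c}

    def __add__(self, other):
        out = Counter(self.terms)
        out.update(other.terms)
        return Poly(out)

    def __mul__(self, other):
        out = Counter()
        for m1, c1 in self.terms.items():
            for m2, c2 in other.terms.items():
                out[tuple(sorted(m1 + m2))] += c1 * c2
        return Poly(out)

    def __eq__(self, other):
        return isinstance(other, Poly) and self.terms == other.terms

    def __hash__(self):
        return hash(frozenset(self.terms.items()))


ZERO = Poly()


def var(name):
    return Poly({(name,): 1})


def tree(label, kids=None):
    """A tree is (label, K-collection of child subtrees), frozen so it can be a dict key."""
    return (label, frozenset((kids or {}).items()))


def children(t):
    """The K-collection of child subtrees of t, as a dict."""
    return dict(t[1])


def big_union(coll, f):
    """For each x in coll, multiply its annotation into f(x) and sum the results."""
    out = {}
    for x, k in coll.items():
        for y, k2 in f(x).items():
            out[y] = out.get(y, ZERO) + k * k2
    return out


z, x1, x2, y1, y2, y3 = (var(n) for n in ("z", "x1", "x2", "y1", "y2", "y3"))
d, e = tree("d"), tree("e")
b = tree("b", {d: y1})
c = tree("c", {d: y2, e: y3})
a = tree("a", {b: x1, c: x2})
S = {a: z}

# for $t in $S return for $x in ($t)/* return ($x)/*
answer = big_union(S, lambda t: big_union(children(t), children))
p = tree("p", answer)
```

    Because the two d leaves are the same value, their annotations are summed, which is where z·x1·y1 + z·x2·y2 comes from.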

  • Outline of Technical Approach

    Extend NRC with a recursive tree type, which satisfies: tree = label × { tree }

    and an operation for structural recursion on trees (srt) [Robertson+ 07]: apply to each child subtree, collect results using NRC big union

    Generalize NRC + srt to handle semiring-annotated complex values ⇒ NRC_K + srt

    Define the semantics of K-UXQuery by translation to NRC_K + srt

  • Semantics of Small Union

    Sums annotations: [[e1 ∪ e2]]_K (x) := [[e1]]_K (x) + [[e2]]_K (x)

    Example:

    Query: return ($S, $T) (in NRC: $S ∪ $T)

    [Figure: source forests $S and $T, and the answer forest.]

  • Semantics of Big Union

    Sums and multiplies annotations: [[∪(x ∈ e1) e2]]_K (y) := Σ_{i=1..n} [[e1]]_K (a_i) · [[e2]]_K[x := a_i] (y)

    where the support (the set of elements with non-zero annotations) of [[e1]]_K is {a1, ..., an}

  • Big Union Example With K = N

    Query: return $T// (in NRC: ∪(x ∈ $T) ∪(y ∈ x) { y })

    [Figure: source $T with annotations from N, and the answer, in which c has multiplicity 7.]
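
    The small-union and big-union semantics can be sketched directly for K = N. This is an illustration only: K-collections are modelled as dicts from values to multiplicities, and the sample data below is made up, not taken from the slide's figure.

```python
def small_union(e1, e2):
    """Pointwise sum of annotations: (e1 ∪ e2)(x) = e1(x) + e2(x)."""
    out = dict(e1)
    for x, k in e2.items():
        out[x] = out.get(x, 0) + k
    return out


def big_union(e1, f):
    """Sum over the support of e1 of e1(a) * f(a)(y)."""
    out = {}
    for a, k in e1.items():
        for y, k2 in f(a).items():
            out[y] = out.get(y, 0) + k * k2
    return out


# Small union: an element present in both inputs gets the sum of its annotations.
merged = small_union({"a": 1, "b": 1}, {"a": 1, "c": 1})

# Big union used to flatten a bag of bags (hypothetical data). Inner bags are
# frozen as sets of (value, multiplicity) pairs so they can be dict keys.
inner1 = frozenset({("b", 1), ("c", 2)})
inner2 = frozenset({("c", 1)})
T = {inner1: 2, inner2: 3}
flattened = big_union(T, lambda x: dict(x))
```

    Here c ends up with multiplicity 2·2 + 3·1 = 7: each way of reaching c multiplies the annotations along the way, and the alternatives are added.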

  • XPath Descendant Operator Uses srt

    // applied to a forest $T translates to

    ∪(x ∈ $T) π1((srt(b, s). f) x)

    where

    f := let self = Tree(b, ∪(x ∈ s) {π2(x)}) in
         let matches = ∪(x ∈ s) π1(x) in
         (matches ∪ {self}, self)

    //a: similar to the above
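
    A sketch of how this translation might be executed, assuming K = N and the same pair-based tree encoding as before. The functions srt, f and descendants mirror the formula above, but the encoding, the helper freeze, and the sample tree are assumptions chosen for illustration.

```python
def freeze(coll):
    """Immutable K-collection: a frozenset of (value, annotation) pairs, zeros dropped."""
    return frozenset((x, k) for x, k in coll.items() if k != 0)


def tree(label, kids=None):
    return (label, freeze(kids or {}))


def srt(f, t):
    """Structural recursion: recurse into each child, then combine with f(label, results)."""
    label, kids = t
    results = {}
    for child, k in kids:
        r = srt(f, child)
        results[r] = results.get(r, 0) + k  # K-collection of recursive results
    return f(label, results)


def f(b, s):
    """Each recursive result is a pair (matches, self); rebuild self and collect matches."""
    kids, matches = {}, {}
    for (sub_matches, sub_self), k in s.items():
        kids[sub_self] = kids.get(sub_self, 0) + k            # union of {pi2(x)}
        for m, km in sub_matches:
            matches[m] = matches.get(m, 0) + k * km           # union of pi1(x)
    self_ = (b, freeze(kids))
    matches[self_] = matches.get(self_, 0) + 1                # matches ∪ {self}
    return (freeze(matches), self_)


def descendants(forest):
    """Big union over the forest of the first component of the srt result."""
    out = {}
    for x, k in forest.items():
        for m, km in srt(f, x)[0]:
            out[m] = out.get(m, 0) + k * km
    return out


# Hypothetical source: a { b^2 { c^2 }, c^3 } with top-level annotation 1.
leaf_c = tree("c")
node_b = tree("b", {leaf_c: 2})
root_a = tree("a", {node_b: 2, leaf_c: 3})
result = descendants({root_a: 1})
```

    In this sample, the leaf c is reached once through b (2·2) and once directly (3), so it appears with multiplicity 7; the second component of each pair rebuilds the subtree so that it can be returned as a match.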

  • Application: Security Clearances

    Data annotated with clearance levels from the total order C : P < C < S < T < 0

    Joint use of data (·) requires access to both (max of clearances); alternative use of data (+) requires access to either (min of clearances)

    (C, min, max, 0, P) is a commutative semiring

    Query: element p { $S//}

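  • Security Condition: Non-Interference

    For any given clearance level (e.g., C), we want the following diagram to commute:

    [Diagram: applying the query and then erasing annotations > C gives the same result as erasing annotations > C and then applying the query.]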

  • Application: Incomplete XML

    Data annotated with Boolean expressions; a tree T represents a set of possible worlds Mod(T)

    [Figure: an example tree T with 7 possible worlds.]

  • Correctness: Possible Worlds

    For every incomplete tree T, and every UXQuery query q, we want this diagram to commute:

    [Diagram: Mod takes T to Mod(T) and q(T) to Mod(q(T)); q takes T to q(T) and Mod(T) to q(Mod(T)).]

    q(Mod(T)) = Mod(q(T))

  • Commutation with Homomorphisms

    Theorem: Let h : K1 → K2 be a semiring homomorphism. Then for any UXQuery query q, and for any K1-UXML document D, we have h(q(D)) = q(h(D)).

    Ex: security clearances. h_c : C → C, h_c(k) := if k ≤ c then k else 0

    Ex: incomplete dbs. ν : B → B, Eval_ν : PosBool(B) → B

    Ex: duplicate elimination. δ : N → B, δ(k) := if k = 0 then ⊥ else ⊤
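
    The theorem can be spot-checked on one instance. The sketch below assumes the clearance semiring with an integer encoding, the query $S/*/*, and the erasing homomorphism h_c for c = C; the document, its clearance assignments, and all helper names are hypothetical. It checks a single case and proves nothing in general.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "plus times zero one")

# Assumed encoding of the total order P < C < S < T < 0 (absent).
P, C, S, T, ABSENT = range(5)
CLEARANCE = Semiring(min, max, ABSENT, P)


def freeze(K, coll):
    """Immutable K-collection; entries annotated with zero are dropped."""
    return frozenset((x, k) for x, k in coll.items() if k != K.zero)


def tree(K, label, kids=None):
    return (label, freeze(K, kids or {}))


def children(t):
    return dict(t[1])


def big_union(K, coll, f):
    out = {}
    for x, k in coll.items():
        for y, k2 in f(x).items():
            out[y] = K.plus(out.get(y, K.zero), K.times(k, k2))
    return out


def query(K, forest):
    """$S/*/*: the children of the children of the trees in the forest."""
    return big_union(K, big_union(K, forest, children), children)


def hom(h, K2, coll):
    """Apply h to every annotation, recursively; trees that become equal are merged."""
    out = {}
    for (label, kids), k in coll.items():
        t2 = (label, freeze(K2, hom(h, K2, dict(kids))))
        out[t2] = K2.plus(out.get(t2, K2.zero), h(k))
    return out


def support(K, coll):
    return {x: k for x, k in coll.items() if k != K.zero}


def erase_above(level):
    """h_c(k) := k if k <= c, else 0 (absent)."""
    return lambda k: k if k <= level else ABSENT


K = CLEARANCE
td, te = tree(K, "d"), tree(K, "e")
tb = tree(K, "b", {td: P})
tc = tree(K, "c", {td: C, te: T})
ta = tree(K, "a", {tb: C, tc: S})
D = {ta: P}

h = erase_above(C)
query_then_erase = support(K, hom(h, K, query(K, D)))
erase_then_query = support(K, query(K, hom(h, K, D)))
```

    In this instance both orders of evaluation leave only d, at clearance C: the S-level subtree and the T-level leaf are invisible to a user cleared at C either way.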

  • Related Work

    Bag semantics for NRC [Libkin&Wong 97]
    Incomplete XML [Kanza+ 99, Abiteboul+ 06]
    Probabilistic XML [Nierman&Jagadish 02, van Keulen+ 05, Abit.&Senellart 06, Sen.&Abit. 07, Hung+ 07]
    XML provenance [Buneman+ 01]
    NRC provenance [Hidders+ 07]
    Semiring-annotated XPath [Grahne+ 07]
    Negation, expressiveness of RA_K [Geerts&Poggi 08]

  • Conclusion

    We showed how to annotate unordered XML trees (complex values) with values from a commutative semiring K, and how to propagate those annotations in queries for a large, positive fragment of XQuery (NRC + srt)

    We saw novel applications in security and incomplete dbs, made possible by a fundamental property of our framework, commutation with homomorphisms

  • Future Work

    Practical applications based on the framework: security clearances; jointly recording provenance, security, multiplicities, uncertainty, etc. (a product of semirings is also a semiring!)

    Query optimization: containment/equivalence wrt annotated semantics depends on K. In the paper, we show that K-equivalence for UXQuery is the same as B-equivalence when K is a distributive lattice

  • K-UXQuery Syntax


