Symposium on Database Provenance University of Edinburgh May 21, 2008



  • Annotated XML: Queries and Provenance

    Nate Foster, TJ Green, Val Tannen

    University of Pennsylvania

    Symposium on Database Provenance, University of Edinburgh, May 21, 2008

  • Need to Track XML Provenance

    For scientific data processing [Buneman+ 01]: tree-structured data from heterogeneous sources, so XML is the natural data model. Data is annotated with source info, and the annotations need to be propagated during query processing.

    For incomplete/probabilistic data [Sen.&Abit. 06]: query output is annotated with Boolean formulas. The annotations indicate correlations between source data and output data.

    For data warehousing [Cui+ 00]: even when data is relational, often have XML views.

  • Provenance for Relational Algebra Views

    V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))

    Source R:

    A  B  C
    a  b  c
    d  b  e
    f  g  e

    View V (provenance of each tuple: ???):

    A  B
    a  c
    a  e
    d  c
    d  e
    f  e

  • Semiring-Annotated Relations [PODS07]

    Associate each tuple in the database with an annotation from a commutative semiring (K, +, ·, 0, 1).

    Combine and propagate annotations during (positive) relational query processing: join (⋈) combines annotations using ·; projection (π) and union (∪) combine annotations using +; selection (σ) multiplies annotations by 0 or 1.

  • Annotated Relations Example

    V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))

    R:

    A  B  C  annotation
    a  b  c  p
    d  b  e  r
    f  g  e  s

    V:

    A  B  annotation
    a  c  2p²
    a  e  pr
    d  c  pr
    d  e  2r² + rs
    f  e  2s² + rs
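
The annotations of V can be recomputed mechanically. The Python sketch below is illustrative only. It represents polynomials in N[X] as dictionaries from monomials to coefficients, and it assumes the query has the form π_AC((π_AB R ⋈ π_BC R) ∪ (π_AC R ⋈ π_BC R)), with output columns named A and C; under that assumption it yields the five annotations listed for V. All helper names (`var`, `p_add`, `p_mul`, `row`, `project`, `join`, `union`) are hypothetical.

```python
from itertools import product

# A polynomial in N[X] is a dict {monomial: coefficient};
# a monomial is a sorted tuple of variable names, e.g. ("p", "p") for p^2.
def var(x):
    return {(x,): 1}

def p_add(p, q):
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return out

def p_mul(p, q):
    out = {}
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        m = tuple(sorted(m1 + m2))
        out[m] = out.get(m, 0) + c1 * c2
    return out

# A K-relation is a dict {tuple: annotation}; a tuple is a frozenset of
# (attribute, value) pairs so that it can be used as a dictionary key.
def row(**kv):
    return frozenset(kv.items())

def project(rel, attrs):
    out = {}
    for t, k in rel.items():
        s = frozenset((a, v) for a, v in t if a in attrs)
        out[s] = p_add(out.get(s, {}), k)           # projection adds
    return out

def join(r1, r2):
    out = {}
    for (t1, k1), (t2, k2) in product(r1.items(), r2.items()):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            t = t1 | t2
            out[t] = p_add(out.get(t, {}), p_mul(k1, k2))   # join multiplies
    return out

def union(r1, r2):
    out = dict(r1)
    for t, k in r2.items():
        out[t] = p_add(out.get(t, {}), k)           # union adds
    return out

R = {row(A="a", B="b", C="c"): var("p"),
     row(A="d", B="b", C="e"): var("r"),
     row(A="f", B="g", C="e"): var("s")}

AB, BC, AC = {"A", "B"}, {"B", "C"}, {"A", "C"}
V = project(union(join(project(R, AB), project(R, BC)),
                  join(project(R, AC), project(R, BC))), AC)
```

For instance, `V[row(A="d", C="e")]` is `{("r", "r"): 2, ("r", "s"): 1}`, that is, 2r² + rs.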

  • Semiring Bestiary

    (𝔹, ∨, ∧, ⊥, ⊤): set semantics

    (N, +, ·, 0, 1): bag semantics

    (PosBool(B), ∨, ∧, ⊥, ⊤): incomplete dbs

    (P(Ω), ∪, ∩, ∅, Ω): probabilistic dbs

    (P(P(X)), ∪, ⋓, ∅, {∅}): why-provenance, where A ⋓ B := {a ∪ b : a ∈ A, b ∈ B}

    (C, min, max, absent, public): security clearances

    (N[X], +, ·, 0, 1): provenance polynomials
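
Several of these semirings are small enough to write down and test directly. The sketch below is one possible encoding in Python, not part of the talk: the `Semiring` record, the integer encoding 0..4 of the clearance order, and the `check_laws` helper are assumptions made for illustration. `check_laws` only tests the commutative-semiring laws on the sample values it is given; it is not a proof.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    name: str
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any
    one: Any

# Set semantics and bag semantics.
BOOL = Semiring("B", lambda a, b: a or b, lambda a, b: a and b, False, True)
NAT = Semiring("N", lambda a, b: a + b, lambda a, b: a * b, 0, 1)

# Why-provenance: sets of sets of source identifiers.
def cup(A, B):
    return frozenset(a | b for a in A for b in B)

WHY = Semiring("Why", lambda A, B: A | B, cup,
               frozenset(), frozenset({frozenset()}))

# Security clearances, encoded as integers: public = 0, ..., absent = 4.
CLEARANCE = Semiring("C", min, max, 4, 0)

def check_laws(K, samples):
    """Test the commutative-semiring laws of K on all triples of samples."""
    for a, b, c in product(samples, repeat=3):
        assert K.add(a, b) == K.add(b, a)
        assert K.mul(a, b) == K.mul(b, a)
        assert K.add(K.add(a, b), c) == K.add(a, K.add(b, c))
        assert K.mul(K.mul(a, b), c) == K.mul(a, K.mul(b, c))
        assert K.mul(a, K.add(b, c)) == K.add(K.mul(a, b), K.mul(a, c))
    for a in samples:
        assert K.add(a, K.zero) == a
        assert K.mul(a, K.one) == a
        assert K.mul(a, K.zero) == K.zero
    return True
```

A call such as `check_laws(CLEARANCE, range(5))` exercises every clearance level.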

  • Our Contribution: Annotated XML

    We show how to decorate unordered XML data with semiring annotations: K-UXML

    We propagate the annotations for K-UXQuery (based on a large fragment of positive XQuery)

    We do this by generalizing the semantics of Nested Relational Calculus (NRC) to handle annotated values and to incorporate a recursive tree type and structural recursion on trees

    We prove a commutation-with-homomorphisms theorem, and show that it enables applications in security and incomplete databases

  • K-UXML

    No attributes, no text values, no repeated children (inessential); no order (essential!)

    Each node is decorated with a value k from the semiring K (1 is neutral, 0 means not present)

    K-collection: a finite set of elements annotated with values from K

    Formally, the children of a node form a K-collection of subtrees (to annotate the root, there is also a top-level K-collection)

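  • Example: XPath on K-UXML

    Query: element r { $T//c }

    [Figure: source tree $T with nodes labeled a, b, c, d and annotations x1, x2, y1, y2, y3; answer tree rooted at r, in which one c child carries the annotation x1·y3 + y1·y2.]

    Omitted annotations are 1 (and omitted subtrees have annotation 0)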

  • Example: For-Loops in K-UXQuery

    Source, $S: a^z { b^x1 { d^y1 }, c^x2 { d^y2, e^y3 } }

    Query: element p { for $t in $S return for $x in ($t)/* return ($x)/* } (i.e., element p { $S/*/* })

    Answer: p { d^(z·x1·y1 + z·x2·y2), e^(z·x2·y3) }
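
The annotations in this answer can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: annotations are polynomials in N[X] stored as frozensets of (monomial, coefficient) pairs, a tree is a pair of a label and its K-collection of children, and the source is read as a^z with children b^x1 (containing d^y1) and c^x2 (containing d^y2 and e^y3). The names `poly`, `var`, `add`, `mul`, `tree` and `children` are hypothetical.

```python
def poly(d):
    """Canonical, hashable polynomial: frozenset of (monomial, coefficient)."""
    return frozenset((m, c) for m, c in d.items() if c != 0)

def var(x):
    return poly({(x,): 1})

def add(p, q):
    d = dict(p)
    for m, c in q:
        d[m] = d.get(m, 0) + c
    return poly(d)

def mul(p, q):
    d = {}
    for m1, c1 in p:
        for m2, c2 in q:
            m = tuple(sorted(m1 + m2))
            d[m] = d.get(m, 0) + c1 * c2
    return poly(d)

def tree(label, *kids):
    """kids are (annotation, subtree) pairs; no repeated children assumed."""
    return (label, frozenset(kids))

def children(forest):
    """The step $F/* on a K-collection given as a dict {tree: annotation}."""
    out = {}
    for (_label, kids), k in forest.items():
        for k_child, child in kids:
            # multiply along the path, add across different paths
            out[child] = add(out.get(child, poly({})), mul(k, k_child))
    return out

d, e = tree("d"), tree("e")
S = {tree("a",
          (var("x1"), tree("b", (var("y1"), d))),
          (var("x2"), tree("c", (var("y2"), d), (var("y3"), e)))): var("z")}

answer = children(children(S))   # the children of p in element p { $S/*/* }
```

Here `answer[d]` holds the two monomials x1·y1·z and x2·y2·z, and `answer[e]` holds x2·y3·z.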

  • Outline of Technical Approach

    Extend NRC with a recursive tree type that satisfies tree = label × { tree }, and with an operation for structural recursion on trees (srt) [Robertson+ 07]: apply to each child subtree, collect the results using NRC big union

    Generalize NRC + srt to handle semiring-annotated complex values ⇒ NRC_K + srt

    Define the semantics of K-UXQuery by translation to NRC_K + srt

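  • Semantics of Small Union

    Sums annotations:

    ⟦e1 ∪ e2⟧_K (x) := ⟦e1⟧_K (x) + ⟦e2⟧_K (x)

    Example: query: return ($S, $T) (in NRC: $S ∪ $T)

    [Figure: the sources $S and $T together contain the tree a { b^y } twice, each time with annotation x, and the tree a { b^z } once with annotation x; in the answer, a { b^y } has annotation 2x and a { b^z } has annotation x.]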
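
A pointwise sum is all that small union needs. The following Python sketch is illustrative: K-collections are plain dictionaries, trees are abbreviated to strings, K is taken to be N, and the data is hypothetical rather than copied from the slide. The function name `small_union` is an assumption.

```python
def small_union(c1, c2, add):
    """Pointwise sum of two K-collections given as dicts {element: annotation}."""
    out = dict(c1)
    for elem, k in c2.items():
        out[elem] = add(out[elem], k) if elem in out else k
    return out

# Hypothetical data with K = N (multiplicities).
S = {"a{b}": 1}
T = {"a{b}": 1, "a{c}": 1}
result = small_union(S, T, lambda j, k: j + k)
```

The element that occurs in both inputs ends up with annotation 1 + 1 = 2; the other keeps its annotation.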

  • Semantics of Big Union

    Sums and multiplies annotations:

    ⟦∪(x ∈ e1) e2⟧_K (y) := Σ_{i=1..n} ⟦e1⟧_K (a_i) · ⟦e2⟧_K[x := a_i] (y)

    where the support (the set of elements with non-zero annotations) of ⟦e1⟧_K is {a_1, ..., a_n}

  • Big Union Example With K = N

    Query: return $T// (in NRC: ∪(x ∈ $T) ∪(y ∈ x) { y })

    [Figure: source $T and answer, drawn as collections with multiplicities; labels b and c, annotations 2, 3 and 7.]
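
The big-union rule can be tried out on a small bag. This Python sketch is an illustration with hypothetical data (the numbers are chosen for the example and are not read off the slide's figure); K-collections are dictionaries and the semiring operations are passed in as functions. The name `big_union` and its signature are assumptions.

```python
def big_union(e1, body, add, mul):
    """Annotation of y: sum over a in the support of e1 of e1(a) * body(a)(y)."""
    out = {}
    for a, k in e1.items():
        for y, ky in body(a).items():
            term = mul(k, ky)
            out[y] = add(out[y], term) if y in out else term
    return out

# A bag of bags with K = N. Inner bags are tuples of (element, multiplicity)
# pairs so that they can be used as dictionary keys.
inner1 = (("b", 1), ("c", 2))
inner2 = (("c", 1),)
T = {inner1: 2, inner2: 3}

flat = big_union(T, lambda x: dict(x),
                 lambda j, k: j + k, lambda j, k: j * k)
```

Flattening gives b multiplicity 2·1 = 2 and c multiplicity 2·2 + 3·1 = 7.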

  • XPath Descendant Operator Uses srt

    //* applied to forest $T translates to

    ∪(x ∈ $T) π₁((srt(b, s). f) x)

    where

    f := let self = Tree(b, ∪(x ∈ s) {π₂(x)}) in let matches = ∪(x ∈ s) {π₁(x)} in (matches ∪ {self}, self)

    //a is similar to the above
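
To make the recursion concrete, here is a small Python sketch with K = N. It is a reading of the translation under stated assumptions, not the paper's definition: K-collections are frozensets of (value, multiplicity) pairs, `matches` is taken to be the flattened union of the first components π₁(x), and all function names are hypothetical.

```python
def kcoll(pairs):
    """Canonical K-collection for K = N: merge equal values, drop zeros."""
    out = {}
    for value, k in pairs:
        out[value] = out.get(value, 0) + k
    return frozenset((v, k) for v, k in out.items() if k != 0)

def big_union(coll, body):
    """U(x in coll) body(x): multiply along the way, add across alternatives."""
    return kcoll((y, k * ky) for x, k in coll for y, ky in body(x))

def srt(f, tree):
    """Structural recursion: recurse into each child, collect results, apply f."""
    label, kids = tree
    return f(label, kcoll((srt(f, child), k) for child, k in kids))

def f(b, s):
    # s is a K-collection of (matches, self) pairs, one per child subtree.
    self_tree = (b, big_union(s, lambda x: kcoll([(x[1], 1)])))
    matches = big_union(s, lambda x: x[0])
    return (kcoll(list(matches) + [(self_tree, 1)]), self_tree)

def descendant_or_self(forest):
    return big_union(forest, lambda t: srt(f, t)[0])

# Hypothetical source: a { b^2 { c^3 } }
c = ("c", kcoll([]))
b = ("b", kcoll([(c, 3)]))
a = ("a", kcoll([(b, 2)]))
result = dict(descendant_or_self(kcoll([(a, 1)])))
```

Each subtree is returned with the product of the multiplicities on the path leading to it, so c is found 2·3 = 6 times.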

  • Application: Security Clearances

    Data annotated with clearance levels from a total order C : P < C < S < T < 0

    Joint use of data (·) requires access to both (max of clearances); alternative use of data (+) requires access to either (min of clearances)

    (C, min, max, 0, P) is a commutative semiring

    Query: element p { $S/*/* }

  • Security Condition: Non-Interference

    For any given clearance level (e.g., C), want the following diagram to commute:

    [Diagram: the two paths "erase > C, then query" and "query, then erase > C" must give the same result.]
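
The commuting diagram can be checked on a toy document. The Python sketch below is illustrative only: clearance levels are encoded as integers 0..4 (an assumption), the source tree and its annotations are hypothetical, and the query is the two-step child navigation `$S/*/*`. Erasing above level c maps every annotation k to k if k ≤ c and to 0 (absent) otherwise.

```python
P, C, S, T, ZERO = range(5)   # total order P < C < S < T < 0; ZERO = absent

def kcoll(pairs):
    """Canonical K-collection over (C, min, max, 0, P); '+' is min."""
    out = {}
    for k, tree in pairs:
        out[tree] = min(out.get(tree, ZERO), k)
    return frozenset((k, t) for t, k in out.items() if k != ZERO)

def children(forest):
    """The step $F/*; joint use of parent and child is max."""
    return kcoll((max(k, k_child), child)
                 for k, (_label, kids) in forest
                 for k_child, child in kids)

def erase_above(c, forest):
    """Apply h_c to every annotation, recursively into subtrees."""
    h = lambda k: k if k <= c else ZERO
    return kcoll((h(k), (label, erase_above(c, kids)))
                 for k, (label, kids) in forest)

d, e = ("d", frozenset()), ("e", frozenset())
source = kcoll([(P, ("a", kcoll([
    (C, ("b", kcoll([(P, d)]))),
    (S, ("c", kcoll([(P, d), (C, e)]))),
])))])

query = lambda forest: children(children(forest))
```

With this data, `query(source)` contains d at level C and e at level S; erasing above C before or after the query leaves only d at level C.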

  • Application: Incomplete XML

    Data annotated with Boolean expressions; tree T represents a set of possible worlds Mod(T)

    [Figure: an incomplete tree T with 7 possible worlds.]

  • Correctness: Possible Worlds

    For every incomplete tree T, and every UXQuery query q, want this diagram to commute:

    [Diagram: T maps to Mod(T) via Mod and to q(T) via q; both paths must meet, that is, q(Mod(T)) = Mod(q(T)).]

  • Commutation with Homomorphisms

    Theorem: Let h : K1 → K2 be a semiring homomorphism. Then for any UXQuery query q, and for any K1-UXML document D, we have h(q(D)) = q(h(D)).

    Ex: security clearances. h_c : C → C, h_c(k) := if k ≤ c then k else 0

    Ex: incomplete dbs. A valuation ν : B → 𝔹 extends to Eval_ν : PosBool(B) → 𝔹

    Ex: duplicate elimination. δ : N → 𝔹, δ(k) := if k = 0 then ⊥ else ⊤
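
For intuition, consider a single big union, whose annotations are sums of products. The derivation below is a sketch of why a homomorphism can be pushed through such an expression, not the paper's proof; the numeric instance uses duplicate elimination with hypothetical multiplicities.

```latex
\begin{align*}
h\Big(\sum_{i=1}^{n} a_i \cdot b_i\Big)
  &= \sum_{i=1}^{n} h(a_i \cdot b_i)
     && \text{($h$ preserves $+$ and $0$)} \\
  &= \sum_{i=1}^{n} h(a_i) \cdot h(b_i)
     && \text{($h$ preserves $\cdot$ and $1$)}
\end{align*}

% Instance with \delta : \mathbb{N} \to \mathbb{B}:
\begin{align*}
\delta(2 \cdot 2 + 3 \cdot 1) &= \delta(7) = \top, \\
(\delta(2) \wedge \delta(2)) \vee (\delta(3) \wedge \delta(1))
  &= (\top \wedge \top) \vee (\top \wedge \top) = \top .
\end{align*}
```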

  • Related Work

    Bag semantics for NRC [Libkin&Wong 97]

    Incomplete XML [Kanza+ 99, Abiteboul+ 06]

    Probabilistic XML [Nierman&Jagadish 02, van Keulen+ 05, Abit.&Senellart 06, Sen.&Abit. 07, Hung+ 07]

    XML provenance [Buneman+ 01]

    NRC provenance [Hidders+ 07]

    Semiring-annotated XPath [Grahne+ 07]

    Negation, expressiveness of RA_K [Geerts&Poggi 08]

  • Conclusion

    We showed how to annotate unordered XML trees (complex values) with values from a commutative semiring K, and how to propagate those annotations in queries for a large, positive fragment of XQuery (NRC + srt)

    We saw novel applications in security and incomplete dbs, made possible by a fundamental property of our framework: commutation with homomorphisms

  • Future Work

    Practical applications based on the framework: security clearances; jointly recording provenance, security, multiplicities, uncertainty, etc. (a product of semirings is also a semiring!)

    Query optimization: containment/equivalence with respect to annotated semantics depends on K

    In the paper, we show that K-equivalence for UXQuery is the same as 𝔹-equivalence when K is a distributive lattice
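
The remark that a product of semirings is again a semiring suggests a direct way to track two kinds of annotation at once. The Python sketch below is a hypothetical illustration: a semiring is a tuple (add, mul, zero, one), clearances are encoded as integers 0..4, and `product_semiring` is an assumed name.

```python
def product_semiring(K1, K2):
    """Componentwise product of two semirings given as (add, mul, zero, one)."""
    add1, mul1, zero1, one1 = K1
    add2, mul2, zero2, one2 = K2
    return (lambda a, b: (add1(a[0], b[0]), add2(a[1], b[1])),
            lambda a, b: (mul1(a[0], b[0]), mul2(a[1], b[1])),
            (zero1, zero2),
            (one1, one2))

NAT = (lambda a, b: a + b, lambda a, b: a * b, 0, 1)
P, C, S, T, ABSENT = range(5)
CLEARANCE = (min, max, ABSENT, P)

add, mul, zero, one = product_semiring(NAT, CLEARANCE)

joint = mul((2, C), (3, S))    # multiplicities multiply, clearances take max
either = add((2, C), (3, S))   # multiplicities add, clearances take min
```

Each annotation is then a pair of a multiplicity and a clearance level, and both components are propagated by the same query semantics.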


  • K-UXQuery Syntax
