Annotated XML: Queries and Provenance
Nate Foster TJ Green Val Tannen
University of Pennsylvania
Symposium on Database Provenance
University of Edinburgh
May 21, 2008
Need to Track XML Provenance
- For scientific data processing [Buneman+ 01]
  - Tree-structured data, heterogeneous sources, so XML is the natural data model
  - Data annotated with source info; annotations need to be propagated during query processing
- For incomplete/probabilistic data [Sen.&Abit. 06]
  - Query output annotated with Boolean formulas
  - Annotations indicate correlations between source data and output data
- For data warehousing [Cui+ 00]
  - Even when data is relational, often have XML views
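Provenance for Relational Algebra Views
$V := \pi_{AB}((\pi_{AC}(R) \bowtie \pi_{C}(R)) \cup (\pi_{AB}(R) \bowtie \pi_{BC}(R)))$

Source R:
A  B  C
a  b  c
d  b  e
f  g  e

View V (provenance of each tuple: ???):
A  B
a  c
a  e
d  c
d  e
f  e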
Semiring-Annotated Relations [PODS07]
- Associate each tuple in the database with an annotation from a commutative semiring $(K, +, \cdot, 0, 1)$
- Combine and propagate annotations during (positive) relational query processing
  - Joins ($\bowtie$, $\times$) combine annotations using $\cdot$
  - $\pi$, $\cup$ combine annotations using $+$
  - $\sigma$ multiplies annotations by 0 or 1
Annotated Relations Example
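$V := \pi_{AB}((\pi_{AC}(R) \bowtie \pi_{C}(R)) \cup (\pi_{AB}(R) \bowtie \pi_{BC}(R)))$

R:
A  B  C  annotation
a  b  c  $p$
d  b  e  $r$
f  g  e  $s$

V:
A  B  annotation
a  c  $2p^2$
a  e  $pr$
d  c  $pr$
d  e  $2r^2 + rs$
f  e  $2s^2 + rs$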
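The annotations in the table for V can be recomputed mechanically. The following is a minimal sketch, not the authors' implementation: it represents a polynomial in N[X] as a `Counter` over monomials and a K-relation as a dictionary from tuples to annotations. The helper names (`var`, `p_add`, `p_mul`, `row`, `project`, `union`, `join`) are hypothetical, and the query shape used here, a projection onto the first and last attribute of a union of two joins, is an assumption chosen because it reproduces the annotations shown above; the output attributes are named A and C in the sketch.

```python
from collections import Counter
from itertools import product

# A polynomial in N[X]: Counter from a monomial (sorted tuple of variable
# names, repeated for powers) to a natural-number coefficient.
def var(name):
    return Counter({(name,): 1})

def p_add(p, q):
    return p + q  # Counter addition sums coefficients

def p_mul(p, q):
    out = Counter()
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

# A tuple is a sorted tuple of (attribute, value) pairs; a K-relation is a
# dict from tuples to annotations (absent tuples have annotation 0).
def row(**attrs):
    return tuple(sorted(attrs.items()))

def project(rel, attrs):
    out = {}
    for t, k in rel.items():
        key = tuple((a, v) for a, v in t if a in attrs)
        out[key] = p_add(out.get(key, Counter()), k)  # projection uses +
    return out

def union(r1, r2):
    out = dict(r1)
    for t, k in r2.items():
        out[t] = p_add(out.get(t, Counter()), k)  # union uses +
    return out

def join(r1, r2):
    out = {}
    for (t1, k1), (t2, k2) in product(r1.items(), r2.items()):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            key = tuple(sorted({**d1, **d2}.items()))
            out[key] = p_add(out.get(key, Counter()), p_mul(k1, k2))  # join uses *
    return out

R = {
    row(A="a", B="b", C="c"): var("p"),
    row(A="d", B="b", C="e"): var("r"),
    row(A="f", B="g", C="e"): var("s"),
}
AB, BC, AC = project(R, {"A", "B"}), project(R, {"B", "C"}), project(R, {"A", "C"})
# Assumed query shape: project onto A, C the union of two joins.
V = project(union(join(AB, BC), join(AC, BC)), {"A", "C"})
```

Under these assumptions, `V[row(A="d", C="e")]` holds the polynomial $2r^2 + rs$, stored as `{('r', 'r'): 2, ('r', 's'): 1}`.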
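Semiring Bestiary
- $(\mathbb{B}, \vee, \wedge, \bot, \top)$: Set semantics
- $(\mathbb{N}, +, \cdot, 0, 1)$: Bag semantics
- $(\mathrm{PosBool}(B), \vee, \wedge, \bot, \top)$: Incomplete dbs
- $(\mathcal{P}(\Omega), \cup, \cap, \emptyset, \Omega)$: Probabilistic dbs
- $(\mathcal{P}(\mathcal{P}(X)), \cup, \Cup, \emptyset, \{\emptyset\})$: Why-provenance, where $A \Cup B := \{a \cup b : a \in A, b \in B\}$
- $(C, \min, \max, \text{absent}, \text{public})$: Security clearances
- $(\mathbb{N}[X], +, \cdot, 0, 1)$: Prov. polynomials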
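Several of these semirings are easy to write down and spot-check. The sketch below is illustrative only: the `Semiring` record, the instance names, and the integer encoding of clearance levels (P=0 up to the "absent" element at 4) are assumptions made for this example. `check_laws` tests the commutative-semiring identities on a handful of sample values, which is a sanity check rather than a proof.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any

# Set semantics: Booleans with "or" as + and "and" as *.
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)

# Bag semantics: natural numbers.
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)

# Why-provenance: sets of sets; * takes pairwise unions.
WHY = Semiring(
    lambda a, b: a | b,
    lambda a, b: frozenset(x | y for x in a for y in b),
    frozenset(),
    frozenset({frozenset()}),
)

# Security clearances, encoded as integers (encoding is an assumption):
# public = 0 < ... < absent = 4; + is min, * is max.
CLEARANCE = Semiring(min, max, 4, 0)

def check_laws(K, samples):
    """Spot-check the commutative-semiring laws on sample values."""
    for a in samples:
        assert K.plus(a, K.zero) == a
        assert K.times(a, K.one) == a
        assert K.times(a, K.zero) == K.zero
        for b in samples:
            assert K.plus(a, b) == K.plus(b, a)
            assert K.times(a, b) == K.times(b, a)
            for c in samples:
                lhs = K.times(a, K.plus(b, c))
                rhs = K.plus(K.times(a, b), K.times(a, c))
                assert lhs == rhs
    return True
```

For example, `check_laws(CLEARANCE, range(5))` exercises the min/max structure on all five encoded levels.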
Our Contribution: Annotated XML
We show how to decorate unordered XML data with semiring annotations: K-UXML
We propagate the annotations for K-UXQuery (based on a large fragment of positive XQuery)
We do this by generalizing the semantics of Nested Relational Calculus (NRC) to handle annotated values and to incorporate a recursive tree type and structural recursion on trees
We prove a commutation-with-homomorphisms theorem, and show that it enables applications in security and incomplete databases
K-UXML
- No attributes, no text values, no repeated children (inessential); no order (essential!)
- Each node decorated with a value k from semiring K (1 neutral, 0 not present)
- K-collection: a finite set of elements annotated with values from K
- Formally, the children of a node form a K-collection of subtrees (to annotate the root, also have a top-level K-collection)
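Example: XPath on K-UXML
Query: element r { $T//c }
Source, $T: [tree figure; nodes labeled a, b, c, d with annotations $x_1$, $x_2$, $y_1$, $y_2$, $y_3$]
Answer: [tree figure; root r with c children, with annotations including $x_1 y_3 + y_1 y_2$]
Omitted annotations are 1 (and omitted subtrees have annotation 0)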
Example: For-Loops in K-UXQuery
Query: element p { for $t in $S return for $x in ($t)/* return ($x)/* }
(i.e., element p { $S/*/* })
Source, $S (annotations written as superscripts): a^z { b^x1 { d^y1 }, c^x2 { d^y2, e^y3 } }
Answer: p { d^(z x1 y1 + z x2 y2), e^(z x2 y3) }
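The answer annotations in this example can be checked with a few lines of code. This is a sketch under stated assumptions: K is taken to be N with arbitrary numeric values standing in for z, x1, x2, y1, y2, y3; the source tree shape (a with children b and c, d under both, e under c) is read off from the answer annotations; and `tree` and `child_step` are hypothetical helper names. A K-collection is a dictionary from trees to annotations, and one child step multiplies the annotation of a tree by that of each child and sums over equal children.

```python
def tree(label, kids=()):
    """Build a tree; kids is an iterable of (subtree, annotation) pairs."""
    return (label, frozenset(kids))

def child_step(coll):
    """Children of every tree in a K-collection (here K = N)."""
    out = {}
    for (label, kids), k in coll.items():
        for child, k_child in kids:
            # joint use multiplies; equal children reached twice are summed
            out[child] = out.get(child, 0) + k * k_child
    return out

# Arbitrary sample values for the annotation variables (assumption).
z, x1, x2, y1, y2, y3 = 2, 3, 5, 7, 11, 13

S = {
    tree("a", [
        (tree("b", [(tree("d"), y1)]), x1),
        (tree("c", [(tree("d"), y2), (tree("e"), y3)]), x2),
    ]): z
}

# Two nested for-loops over children, i.e. two child steps.
answer = child_step(child_step(S))
```

Here `answer[tree("d")]` equals `z*x1*y1 + z*x2*y2` and `answer[tree("e")]` equals `z*x2*y3`, matching the polynomials in the example.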
Outline of Technical Approach
- Extend NRC with a recursive tree type
  - satisfies: tree = label $\times$ {tree}
- and an operation for structural recursion on trees (srt) [Robertson+ 07]
  - apply to each child subtree, collect results using NRC big union
- Generalize NRC + srt to handle semiring-annotated complex values $\Rightarrow$ NRC$_K$ + srt
- Define semantics of K-UXQuery by translation to NRC$_K$ + srt
Semantics of Small Union
Sums annotations:
$[\![e_1 \cup e_2]\!]_K(x) := [\![e_1]\!]_K(x) + [\![e_2]\!]_K(x)$
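Example:
Query: return ($S, $T) (in NRC: $\$S \cup \$T$)
Source: [tree figures for $S and $T]
Answer: [tree figures; the tree that appears twice in the source has annotation 2 in the answer]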
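As a small illustration of the small-union rule, the sketch below represents a K-collection as a dictionary from elements to annotations and adds annotations pointwise. The function name `small_union` and the string stand-ins for trees are hypothetical; the semiring addition and zero are passed in explicitly so that the same code works for any K.

```python
def small_union(c1, c2, plus, zero):
    """Pointwise sum of two K-collections (dicts: element -> annotation)."""
    out = dict(c1)
    for x, k in c2.items():
        out[x] = plus(out.get(x, zero), k)
    # elements whose annotation is zero are not part of the collection
    return {x: k for x, k in out.items() if k != zero}

# K = N (bag semantics); "t1" and "t2" stand in for two distinct trees.
S = {"t1": 1}
T = {"t1": 1, "t2": 1}
result = small_union(S, T, plus=lambda a, b: a + b, zero=0)
```

With K = N, an element present in both inputs ends up with multiplicity 2.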
Semantics of Big Union
Sums and multiplies annotations:
$[\![\bigcup(x \in e_1)\ e_2]\!]_K(y) := \sum_{i=1}^{n} [\![e_1]\!]_K(a_i) \cdot [\![e_2]\!]_K[x := a_i](y)$
where the support (the set of elements with non-zero annotations) of $[\![e_1]\!]_K$ is $\{a_1, \ldots, a_n\}$
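The big-union rule can be sketched in the same dictionary representation. This is an illustrative reading of the definition, not the authors' code: `big_union` is a hypothetical name, the body `e2` is modelled as a Python function from the bound element to a K-collection, and the sample collection is invented for the example.

```python
def big_union(e1, body, plus, times, zero):
    """Sum over the support of e1 of e1(a) * body(a)(y), for each y."""
    out = {}
    for a, k in e1.items():
        if k == zero:
            continue  # only elements in the support contribute
        for y, k_y in body(a).items():
            out[y] = plus(out.get(y, zero), times(k, k_y))
    return {y: k for y, k in out.items() if k != zero}

# K = N. Invented data: "u" has multiplicity 2 and yields "c" three times,
# "v" has multiplicity 1 and yields "c" once.
e1 = {"u": 2, "v": 1}
inner = {"u": {"c": 3}, "v": {"c": 1}}
result = big_union(
    e1,
    body=lambda x: inner[x],
    plus=lambda a, b: a + b,
    times=lambda a, b: a * b,
    zero=0,
)
```

In this invented instance the multiplicity of `"c"` is 2·3 + 1·1 = 7.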
Big Union Example With K = N
Query: return $T/*/* (in NRC: $\bigcup(x \in \$T)\ \bigcup(y \in x)\ \{y\}$)
Source, $T: [figure; elements carry multiplicities 2 and 3]
Answer: [figure; c has multiplicity 7]
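XPath Descendant Operator Uses srt
// applied to forest $T translates to
$\bigcup(x \in \$T)\ \pi_1((\mathrm{srt}(b, s).\ f)\ x)$
where
$f := \mathbf{let}\ \mathit{self} = \mathrm{Tree}(b, \bigcup(x \in s)\ \{\pi_2(x)\})\ \mathbf{in}$
$\quad\ \mathbf{let}\ \mathit{matches} = \bigcup(x \in s)\ \pi_1(x)\ \mathbf{in}$
$\quad\ (\mathit{matches} \cup \{\mathit{self}\},\ \mathit{self})$
//a, similar to above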
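The translation above can be prototyped directly. The following sketch assumes K = N, represents a tree as a pair of a label and a frozen set of (subtree, annotation) pairs, and reads `matches` as the big union of the first components of the child results; `srt`, `f`, `freeze` and `descendant_or_self` are hypothetical names, and the sample tree is invented. Because `f` adds `self` to `matches`, the result includes each root of the forest as well as its proper descendants.

```python
def freeze(coll):
    """Turn a K-collection (dict) into a hashable value."""
    return frozenset(coll.items())

def srt(f, t):
    """Structural recursion: apply f to the label and the K-collection
    of results obtained recursively from the children."""
    label, kids = t
    results = {}
    for child, k in kids:
        r = srt(f, child)
        results[r] = results.get(r, 0) + k
    return f(label, results)

def f(b, s):
    """s is a K-collection of (matches, self) pairs from the children."""
    kids, matches = {}, {}
    for (child_matches, child_self), k in s.items():
        kids[child_self] = kids.get(child_self, 0) + k       # {pi_2(x)}
        for y, k_y in child_matches:                          # pi_1(x)
            matches[y] = matches.get(y, 0) + k * k_y
    self_ = (b, freeze(kids))                                 # rebuild the tree
    matches[self_] = matches.get(self_, 0) + 1                # matches U {self}
    return (freeze(matches), self_)

def descendant_or_self(forest):
    out = {}
    for x, k in forest.items():
        matches, _ = srt(f, x)
        for y, k_y in matches:
            out[y] = out.get(y, 0) + k * k_y
    return out

# Invented example: a { b^2 { c^3 }, c }, with omitted annotations equal to 1.
c = ("c", frozenset())
b = ("b", frozenset({(c, 3)}))
a = ("a", frozenset({(b, 2), (c, 1)}))
result = descendant_or_self({a: 1})
only_c = {t: k for t, k in result.items() if t[0] == "c"}  # analogue of //c
```

In this instance the leaf `c` is reached with multiplicity 2·3 through `b` and once directly, giving 7.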
Application: Security Clearances
- Data annotated with clearance levels from total order C : P < C < S < T < 0
- Joint use of data ($\cdot$) requires access to both (max of clearances); alternative use of data ($+$) requires access to either (min of clearances)
- (C, min, max, 0, P) is a commutative semiring
Query: element p { $S/*/* }
Security Condition: Non-Interference
For any given clearance level (e.g., C), want the following diagram to commute:
[diagram: applying "erase > C" and then the query gives the same result as applying the query and then "erase > C"]
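The commuting diagram can be checked on a concrete instance. The sketch below is an assumption-laden illustration: clearance levels are encoded as integers (P=0, C=1, S=2, T=3, and the semiring zero as 4), the query is the two-step child query from the clearance example, and the tree and its annotations are invented. `erase_above` applies the erasure to every annotation in a K-collection of trees, dropping anything that becomes 0.

```python
P, C, S, T, ZERO = 0, 1, 2, 3, 4   # integer encoding is an assumption

def tree(label, kids=()):
    return (label, frozenset(kids))

def child_step(coll):
    """One child step in the clearance semiring: + is min, * is max."""
    out = {}
    for (label, kids), k in coll.items():
        for child, k_child in kids:
            out[child] = min(out.get(child, ZERO), max(k, k_child))
    return {t: k for t, k in out.items() if k != ZERO}

def query(coll):
    return child_step(child_step(coll))

def erase_above(c, coll):
    """Replace every annotation above c by ZERO, recursively."""
    out = {}
    for (label, kids), k in coll.items():
        if k > c:
            continue  # erased: annotation becomes the semiring zero
        erased = (label, frozenset(erase_above(c, dict(kids)).items()))
        out[erased] = min(out.get(erased, ZERO), k)
    return out

# Invented document: a^P { b^C { d^P }, c^S { d^P, e^C } }
D = {
    tree("a", [
        (tree("b", [(tree("d"), P)]), C),
        (tree("c", [(tree("d"), P), (tree("e"), C)]), S),
    ]): P
}

left = erase_above(C, query(D))    # query, then erase > C
right = query(erase_above(C, D))   # erase > C, then query
```

On this document both paths yield the single leaf `d` at clearance C, since `e` is only reachable through the S-level subtree.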
Application: Incomplete XML
Data annotated with Boolean expressions; tree T represents set of possible worlds Mod(T)
[figure: an example tree T and its 7 possible worlds]
Correctness: Possible Worlds
For every incomplete tree T, and every UXQuery query q, want this diagram to commute:
[diagram: q takes T to q(T) and Mod(T) to q(Mod(T)); Mod takes T to Mod(T) and q(T) to Mod(q(T))]
q(Mod(T)) = Mod(q(T))
Commutation with Homomorphisms
Theorem: Let $h : K_1 \to K_2$ be a semiring homomorphism. Then for any UXQuery query q, and for any $K_1$-UXML document D, we have h(q(D)) = q(h(D)).
- Ex: security clearances: $h_c : C \to C$, $h_c(k) :=$ if $k \le c$ then $k$ else $0$
- Ex: incomplete dbs: $\nu : B \to \mathbb{B}$, $\mathrm{Eval}_\nu : \mathrm{PosBool}(B) \to \mathbb{B}$
- Ex: duplicate elimination: $\delta : \mathbb{N} \to \mathbb{B}$, $\delta(k) :=$ if $k = 0$ then $\bot$ else $\top$
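The theorem can be spot-checked for the duplicate-elimination homomorphism from N to B. The sketch below is a test on one invented document, not a proof: the query is a two-step child query, the semiring operations are passed in explicitly, and `child_step`, `query` and `apply_hom` are hypothetical names. `apply_hom` maps every annotation in a K1-collection of trees through h and merges trees that become equal.

```python
import operator

def tree(label, kids=()):
    return (label, frozenset(kids))

def child_step(coll, plus, times, zero):
    out = {}
    for (label, kids), k in coll.items():
        for child, k_child in kids:
            out[child] = plus(out.get(child, zero), times(k, k_child))
    return {t: k for t, k in out.items() if k != zero}

def query(coll, plus, times, zero):
    return child_step(child_step(coll, plus, times, zero), plus, times, zero)

def apply_hom(h, coll, plus2, zero2):
    """Apply h to every annotation, recursively; drop zero-annotated items."""
    out = {}
    for (label, kids), k in coll.items():
        hk = h(k)
        if hk == zero2:
            continue
        mapped = (label, frozenset(apply_hom(h, dict(kids), plus2, zero2).items()))
        out[mapped] = plus2(out.get(mapped, zero2), hk)
    return out

NAT = (operator.add, operator.mul, 0)      # (plus, times, zero) for N
BOOL = (operator.or_, operator.and_, False)  # (plus, times, zero) for B

def dup_elim(k):
    return k != 0  # 0 maps to false, everything else to true

# Invented N-annotated document: a^2 { b^3 { d }, c { d^2, e^5 } }
D = {
    tree("a", [
        (tree("b", [(tree("d"), 1)]), 3),
        (tree("c", [(tree("d"), 2), (tree("e"), 5)]), 1),
    ]): 2
}

lhs = apply_hom(dup_elim, query(D, *NAT), BOOL[0], BOOL[2])   # h(q(D))
rhs = query(apply_hom(dup_elim, D, BOOL[0], BOOL[2]), *BOOL)  # q(h(D))
```

On this document both sides are the set containing the leaves `d` and `e`, each annotated with true.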
Related Work
- Bag semantics for NRC [Libkin&Wong 97]
- Incomplete XML [Kanza+ 99, Abiteboul+ 06]
- Probabilistic XML [Nierman&Jagadish 02, van Keulen+ 05, Abit.&Senellart 06, Sen.&Abit. 07, Hung+ 07]
- XML provenance [Buneman+ 01]
- NRC provenance [Hidders+ 07]
- Semiring-annotated XPath [Grahne+ 07]
- Negation, expressiveness of RA$_K$ [Geerts&Poggi 08]
Conclusion
We showed how to annotate unordered XML trees (complex values) with values from a commutative semiring K, and propagate those annotations in queries for a large, positive fragment of XQuery (NRC + srt)
We saw novel applications in security and incomplete dbs, made possible by a fundamental property of our framework, commutation with homomorphisms
Future Work
- Practical applications based on framework
  - Security clearances
  - Jointly recording provenance, security, multiplicities, uncertainty, etc. (product of semirings is also a semiring!)
- Query optimization: containment/equivalence wrt annotated semantics depends on K
  - In paper, we show K-equivalence for UXQuery is the same as B-equivalence when K is a distributive lattice
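The remark that a product of semirings is again a semiring can be made concrete with a componentwise construction. This is a minimal sketch: the `Semiring` record, `product_semiring`, and the integer encoding of clearances (P=0 up to the zero element at 4) are assumptions made for illustration.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "plus times zero one")

NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
# Clearances encoded as integers (assumption): P=0 < C=1 < S=2 < T=3 < zero=4
CLEARANCE = Semiring(min, max, 4, 0)

def product_semiring(k1, k2):
    """Componentwise product: annotations are pairs (a1, a2)."""
    return Semiring(
        plus=lambda a, b: (k1.plus(a[0], b[0]), k2.plus(a[1], b[1])),
        times=lambda a, b: (k1.times(a[0], b[0]), k2.times(a[1], b[1])),
        zero=(k1.zero, k2.zero),
        one=(k1.one, k2.one),
    )

# Record multiplicity and clearance jointly.
NC = product_semiring(NAT, CLEARANCE)
joint = NC.times((2, 1), (3, 2))        # used together
alternative = NC.plus((2, 1), (3, 2))   # either one suffices
```

Here `joint` multiplies the multiplicities and takes the higher clearance, while `alternative` adds the multiplicities and takes the lower clearance.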
K-UXQuery Syntax