Symmetrically Exploiting XML
Shuohao Zhang and Curtis Dyreson
School of E.E. and Computer Science
Washington State University
Pullman, Washington, USA
The 15th International World Wide Web Conference
May 2006
Edinburgh, Scotland
Symmetrically Exploiting XML: Zhang, Dyreson
1970s Database Controversy
• Hierarchical model vs. relational model
• Codd: symmetric exploitation of data
  A part/project hierarchy supports some queries, but not all
• Path expressions are asymmetric
• Currently, all XML query languages use path expressions
[Figure: the same part/project data as a Part-over-Project hierarchy, a Project-over-Part hierarchy, and a Commit(Project, Part) relation]
Querying Data with Path Expressions
• Task: find books by E. F. Codd
• XQuery: return doc("author.xml")//author[name='E. F. Codd']/book
[Figure: author.xml, an author with a name (E. F. Codd) and two book children, each with title, publisher, and price: DB, Addison Wesley, 46.95 and Automata, Academic Press, 9.99]
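The path query can be tried with Python's standard xml.etree.ElementTree, whose limited XPath subset happens to support the [child='text'] predicate used here. This is a minimal sketch: the XML is reconstructed from the figure, and the <bib> wrapper is a hypothetical addition because ElementTree's .// only searches below the element it starts from.

```python
import xml.etree.ElementTree as ET

# author.xml as drawn in the figure, wrapped in a hypothetical <bib> root
# because ElementTree's ".//" only searches below the element it starts from.
AUTHOR_XML = """<bib><author>
  <name>E. F. Codd</name>
  <book><title>DB</title><publisher>Addison Wesley</publisher><price>46.95</price></book>
  <book><title>Automata</title><publisher>Academic Press</publisher><price>9.99</price></book>
</author></bib>"""

root = ET.fromstring(AUTHOR_XML)

# Same shape as //author[name='E. F. Codd']/book: descend to author,
# keep it if its name child has that text, then step down to its books.
books = root.findall(".//author[name='E. F. Codd']/book")
titles = [b.findtext("title") for b in books]
print(titles)  # ['DB', 'Automata']
```

The query walks down from author to book, which is exactly the direction the data happens to be stored in.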
Same Data, Different Structure
• Same task: find books by E. F. Codd
• Needs a different XQuery: return doc("book.xml")//book[author/name='E. F. Codd']
[Figure: book.xml, in which each book has title, author (with a name child), and price children, alongside publisher nodes for Addison Wesley and Academic Press; the author.xml tree is repeated for comparison]
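Running both queries against a book-centered document shows the asymmetry. A minimal sketch with Python's xml.etree.ElementTree; the XML is a hypothetical, simplified book.xml (publishers omitted, wrapped in an assumed <bib> root), not the exact file in the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified book.xml: the same books, but author is now a
# child of book (wrapped in a <bib> root for ElementTree's ".//").
BOOK_XML = """<bib>
  <book><title>DB</title><author><name>E. F. Codd</name></author><price>46.95</price></book>
  <book><title>Automata</title><author><name>E. F. Codd</name></author><price>9.99</price></book>
</bib>"""

root = ET.fromstring(BOOK_XML)

# The author.xml query: author elements exist here, but none has a book child.
author_centric = root.findall(".//author[name='E. F. Codd']/book")

# The book.xml query //book[author/name='E. F. Codd']. ElementTree predicates
# accept only a single child tag, so the author/name test is done in Python.
book_centric = [b for b in root.iter("book")
                if b.findtext("author/name") == "E. F. Codd"]

print(len(author_centric))                          # 0
print([b.findtext("title") for b in book_centric])  # ['DB', 'Automata']
```

The data is the same, yet the first query silently returns nothing; only a query rewritten for the new structure finds the books.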
Goal
• Make the same query work on different structures
• Useful when there is
  a lack of schema knowledge
  heterogeneous data
  irregular data
  schema evolution
• Set aside the problem of differing label sets; others are working on it
Existing Axes are Directional
[Figure: the self node with the ancestor, descendant, preceding, and following axes, each reaching in one fixed direction]
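To make "directional" concrete, the following minimal sketch computes the four axes in the figure for one context node. ElementTree does not implement these axes itself, so directional_axes is a hypothetical helper written over plain element traversal. Each axis reaches only one region of the tree; as the XPath 1.0 specification notes, ancestor, descendant, preceding, following, and self together partition the document (ignoring attribute and namespace nodes).

```python
import xml.etree.ElementTree as ET

# author.xml reconstructed from the earlier figure (element content only).
AUTHOR_XML = """<author><name>E. F. Codd</name>
<book><title>DB</title><publisher>Addison Wesley</publisher><price>46.95</price></book>
<book><title>Automata</title><publisher>Academic Press</publisher><price>9.99</price></book>
</author>"""


def directional_axes(root, ctx):
    """Tags on the ancestor, descendant, preceding, and following axes of ctx.

    Hypothetical helper; each list is in document order, as XPath returns it.
    """
    order = list(root.iter())                  # preorder traversal = document order
    parent = {c: p for p in order for c in p}  # child -> parent map
    ancestors, n = [], ctx
    while n in parent:                         # walk up until the root
        n = parent[n]
        ancestors.append(n)
    ancestors.reverse()                        # root first
    descendants = list(ctx.iter())[1:]         # ctx.iter() yields ctx itself first
    pos = order.index(ctx)
    preceding = [m for m in order[:pos] if m not in ancestors]
    following = [m for m in order[pos + 1:] if m not in descendants]
    return {
        "ancestor": [m.tag for m in ancestors],
        "descendant": [m.tag for m in descendants],
        "preceding": [m.tag for m in preceding],
        "following": [m.tag for m in following],
    }


root = ET.fromstring(AUTHOR_XML)
axes = directional_axes(root, root.find("book/publisher"))  # first book's publisher
for axis, tags in axes.items():
    print(axis, tags)
```

A query that needs the first book's price must therefore pick the right axis (following, or parent then child) in advance; that choice is what a path expression fixes.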
Proposal: A Non-directional Axis
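[Figure: the self node surrounded by the ancestor, descendant, preceding, and following regions; the proposed axis is not tied to any one of these directions]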
The Closest Axis
• Syntax
  closest:: is the axis; ->name is an abbreviation for closest::name
• Semantics
  A function that takes a context node and returns a sequence of closest nodes
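The semantics can be made concrete with a naive in-memory sketch. It is an interpretation, not the authors' implementation: it assumes a node's type is its root-to-node label path (as on the type-distance slide), that closest::label returns the nodes of each matching type whose distance from the context node equals the minimal distance between the two types, and it skips nodes of the context node's own type, which is what reproduces the five-node result for closest::* from the first title. ClosestDoc, dist, and closest are hypothetical names, and each call is roughly quadratic in the number of nodes.

```python
import xml.etree.ElementTree as ET


class ClosestDoc:
    """Naive in-memory closest axis (a sketch, not the authors' engine).

    Assumptions: a node's type is its root-to-node label path, and nodes
    sharing the context node's own type are skipped.
    """

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        self.nodes = list(self.root.iter())                 # document order
        self.parent = {c: p for p in self.nodes for c in p}
        self.depth, self.type = {}, {}
        for n in self.nodes:                                # parents are seen first
            p = self.parent.get(n)
            self.depth[n] = 0 if p is None else self.depth[p] + 1
            self.type[n] = n.tag if p is None else self.type[p] + "/" + n.tag

    def dist(self, a, b):
        """Number of edges on the tree path between a and b."""
        steps = 0
        while self.depth[a] > self.depth[b]:
            a, steps = self.parent[a], steps + 1
        while self.depth[b] > self.depth[a]:
            b, steps = self.parent[b], steps + 1
        while a is not b:                                   # climb to the LCA together
            a, b, steps = self.parent[a], self.parent[b], steps + 2
        return steps

    def closest(self, ctx, label):
        """closest::label from ctx; label "*" matches any element name."""
        ctx_type = self.type[ctx]
        sources = [x for x in self.nodes if self.type[x] == ctx_type]
        min_dist = {}                                       # target type -> minimal type distance
        for y in self.nodes:
            t = self.type[y]
            if t == ctx_type or label not in ("*", y.tag):
                continue
            d = min(self.dist(x, y) for x in sources)
            min_dist[t] = min(d, min_dist.get(t, d))
        return [y for y in self.nodes
                if self.type[y] in min_dist
                and self.dist(ctx, y) == min_dist[self.type[y]]]


doc = ClosestDoc("""<author><name>E. F. Codd</name>
<book><title>DB</title><publisher>Addison Wesley</publisher><price>46.95</price></book>
<book><title>Automata</title><publisher>Academic Press</publisher><price>9.99</price></book>
</author>""")
first_title = doc.root.find("book/title")
print([n.tag for n in doc.closest(first_title, "*")])
# ['author', 'name', 'book', 'publisher', 'price']
print([n.text for n in doc.closest(first_title, "price")])  # ['46.95']
```

Comparing against the minimal distance per type, rather than the raw nearest node, is what makes the answer depend on the data's structure instead of on a fixed direction.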
Closest Axis of the First Title
• closest::* returns a list of five nodes
• closest::price returns the first price node
[Figure: the author.xml tree (an author with a name and two books, each with title, publisher, and price), with the first title as the context node]
When the First Book Lacks a Price
• Node selection is restricted by minimal type distance
  The minimal distance between a title and a price is 2
• closest::price from the first title returns an empty list
[Figure: the author.xml tree in which the first book has no price]
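The empty result can be checked against the minimal type distance. A worked version, in notation assumed here rather than taken from the slides: d(x, y) is the number of edges between nodes x and y, and type(x) is the root-to-node label path of x.

```latex
\begin{aligned}
\delta(T, U) &= \min\{\, d(x, y) \mid \mathrm{type}(x) = T,\ \mathrm{type}(y) = U \,\} \\
\mathrm{closest}(c, U) &= \{\, y \mid \mathrm{type}(y) = U,\ d(c, y) = \delta(\mathrm{type}(c), U) \,\} \\
\delta(\texttt{author/book/title}, \texttt{author/book/price}) &= 2
  \quad \text{(second title, second book, its price)} \\
d(\text{first title}, \text{the only price}) &= 4
  \quad \text{(first title, first book, author, second book, price)} \\
4 \neq 2 \quad &\Rightarrow \quad \mathrm{closest}(\text{first title}, \texttt{author/book/price}) = \emptyset
\end{aligned}
```

The second book fixes the title-to-price distance at 2, so the price four edges away from the first title is not treated as "closest" even though it is the nearest price.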
Type Distance is Crucial
• What does closest::name return for each book?
• Root-to-node path types: author/name and author/book/publisher/name
[Figure: the author.xml tree with a single price and an extra name node below a publisher, so name nodes have two different types]
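The two types on the slide can be computed directly. A minimal sketch with xml.etree.ElementTree; the XML is a hypothetical rendering of the figure (which publisher carries the extra name is an assumption), and path_types is an illustrative helper. Because the two name nodes have different types, the minimal type distance from book is computed separately for each of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendering of the figure: the second book's publisher has a
# <name> child, so name elements occur under two different path types.
XML = """<author><name>E. F. Codd</name>
<book><title>DB</title><publisher>Addison Wesley</publisher><price>46.95</price></book>
<book><title>Automata</title><publisher><name>Academic Press</name></publisher></book>
</author>"""


def path_types(root):
    """Map every element to its root-to-node label path (its type)."""
    types = {root: root.tag}
    for parent in root.iter():      # preorder: a parent is typed before its children
        for child in parent:
            types[child] = types[parent] + "/" + child.tag
    return types


root = ET.fromstring(XML)
types = path_types(root)
for node in root.iter("name"):
    print(types[node], "->", node.text)
# author/name -> E. F. Codd
# author/book/publisher/name -> Academic Press
```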
Querying with the Closest Axes
[Figure: the same query is sent to a closest axis-enabled XQuery evaluation engine for each document, producing Result#1, Result#2, and Result#3]
• Same query: return doc("any.xml")->author[->name='E. F. Codd']->book
Querying with Directional Axes
[Figure: a standard XQuery evaluation engine needs a different query for each document to produce Result#1, Result#2, and Result#3]
• Query#1: return doc("author.xml")//author[name='E. F. Codd']/book
• Query#2: ……
• Query#3: return doc("book.xml")//book[author/name='E. F. Codd']
In-memory Implementation
• Naïve approach: compute Closest for every node
  Time complexity is O(sn^2), where s is the number of labels in the signature and n is the number of nodes
• Converting to a path expression
  Find the closest price for a title
  Non-directional expression: closest::price
  Directional (path) expression: parent::*/child::price
[Figure: type tree in which author has children name and book, and book has children title, publisher, and price]
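The conversion can be sketched at the level of types: climb from the context type to the deepest shared prefix, then descend to the target label. closest_to_path is a hypothetical helper, and the sketch assumes the minimal type distance is realized by some pair of nodes whose lowest common ancestor has the longest common prefix of the two types as its type; when the data contain no such pair, more parent steps would be needed.

```python
def closest_to_path(ctx_type: str, target_type: str) -> str:
    """Rewrite closest::<label> between two types as a directional path.

    Hypothetical helper; assumes the minimal type distance is reached at the
    longest common prefix of the two root-to-node types.
    """
    ctx, tgt = ctx_type.split("/"), target_type.split("/")
    common = 0
    while common < min(len(ctx), len(tgt)) and ctx[common] == tgt[common]:
        common += 1
    up = ["parent::*"] * (len(ctx) - common)            # climb to the shared prefix
    down = ["child::" + label for label in tgt[common:]]  # then descend by label
    return "/".join(up + down)


print(closest_to_path("author/book/title", "author/book/price"))
# parent::*/child::price
print(closest_to_path("author/book/title", "author/name"))
# parent::*/parent::*/child::name
```

Once the types are known, the non-directional step compiles to an ordinary path that an existing engine can evaluate.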
Experiment
• Compare the non-directional and directional versions:
  for $b in doc("bib.xml")//title/closest::publisher
  return $b
  for $b in doc("bib.xml")//title/..//publisher
  return $b
• Implemented closest in eXist (an XML DBMS)
[Figure: query time (milliseconds, 0 to 1600) versus number of nodes (0 to 150,000) for the descendant and closest queries]
Persistent Implementation
• Take advantage of type indexes
• LCA-join
  Every Closest pair is related via an LCA (lowest common ancestor)
  The idea is to merge lists of types
  O(sn)
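The LCA-join idea can be sketched with Dewey-style labels, where a node's label lists its child positions from the root, so the LCA of two nodes is the longest common prefix of their labels and d(x, y) = len(x) + len(y) - 2 * lcp(x, y). This is one possible realization under those assumptions (Dewey labels, and per-type lists sorted in document order as a type index might supply), not necessarily the authors' algorithm; shared_prefix_len and lca_join are hypothetical names. Because all nodes of one type have labels of the same length, a pair is at the minimal type distance exactly when it shares a prefix of the deepest length k that any pair shares, so one merge over the two sorted lists finds every closest pair.

```python
from typing import List, Tuple

Dewey = Tuple[int, ...]  # hypothetical Dewey label: child positions from the root


def shared_prefix_len(xs: List[Dewey], ys: List[Dewey]) -> int:
    """Deepest prefix length shared by some x and some y (0 if none)."""
    for k in range(min(len(xs[0]), len(ys[0])), 0, -1):
        if {x[:k] for x in xs} & {y[:k] for y in ys}:
            return k
    return 0


def lca_join(xs: List[Dewey], ys: List[Dewey]) -> List[Tuple[Dewey, Dewey]]:
    """Closest pairs between two single-type lists sorted in document order."""
    if not xs or not ys:
        return []
    k = shared_prefix_len(xs, ys)
    pairs, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        px, py = xs[i][:k], ys[j][:k]
        if px < py:
            i += 1
        elif px > py:
            j += 1
        else:
            j_end = j                      # run of ys under the same LCA prefix
            while j_end < len(ys) and ys[j_end][:k] == px:
                j_end += 1
            while i < len(xs) and xs[i][:k] == px:
                pairs.extend((xs[i], y) for y in ys[j:j_end])
                i += 1
            j = j_end
    return pairs


# author.xml labels: author=(1,), name=(1, 1), books=(1, 2) and (1, 3)
titles = [(1, 2, 1), (1, 3, 1)]
prices = [(1, 2, 3), (1, 3, 3)]
print(lca_join(titles, prices))
# [((1, 2, 1), (1, 2, 3)), ((1, 3, 1), (1, 3, 3))]
print(lca_join(titles, [(1, 3, 3)]))       # first book lacks a price
# [((1, 3, 1), (1, 3, 3))]
```

Sorting by full label also sorts by any prefix, which is why a single forward pass over each list suffices; the prefix search adds a cost proportional to tree depth times list size.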
Related Work
• Data integration
  TSIMMIS: Garcia-Molina et al. (Journal of Intelligent Information Systems, 1997)
  YAT: Christophides, Cluet, Siméon (SIGMOD Record, June 2000)
  SilkRoute: Fernandez, Tan, Suciu (WWW 2000)
• LCA-related techniques
  Schmidt, Kersten, Windhouwer (ICDE 2001)
  Cohen, Mamou, Kanza, Sagiv (VLDB 2003)
  Li, Yu, Jagadish (VLDB 2004)
Related Research Projects
• XML restructuring: Zhang, Dyreson (IIWeb 2006)
• XML compaction: Zhang, Dyreson, Dang (DASFAA 2006)
• Common theme: symmetric exploitation!
Conclusion
• Current XQuery depends on path expressions
• A path expression is directional (asymmetric)
  It may break down if the structure changes
• The closest axis is non-directional (symmetric)
  Simple in syntax
  Can be easily integrated into XQuery
  Can be implemented efficiently, both in memory and persistently
Thank You!