Streaming XPath / XQuery Evaluation and Course Wrap-Up Zachary G. Ives University of Pennsylvania...

Streaming XPath / XQuery Evaluation and Course Wrap-Up

Zachary G. Ives
University of Pennsylvania

CIS 650 – Implementing Data Management Systems

December 2, 2008

Administrivia

Recall that the final project is due – with a write-up and a 10-minute demo presentation – on Tuesday 12/16, 9-11AM

Also: course evaluations (at end)

XML: Its Roles

Perhaps used as a superset of HTML for documents, but…

Most successful as a transport format for sending data between systems:
* SOAP, WSDL, etc.
* Data interchange formats like ebXML, MAGE-ML, …

So why would we want to store it in a database to query it, when we could query over XML as it streams across the network?
(Note: these are not infinite streams, as in DSMSs, and the data is hierarchical.)

Streaming XPaths and XQueries

Suppose I give an XPath expression (which is a restricted form of regular path expression). Can I match it against the parse tree of the data?

An XQuery takes multiple XPaths in the FOR clause, and iterates over the elements matching each XPath (binding the variable to each):
FOR $i in doc("abc")/xyz, $j in $i/def

We can think of an XQuery as doing tree matching, which returns tuples ($i, $j) for each tree matching $i and $j
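The tuple-producing view of the FOR clause can be sketched in a few lines. This is only an illustration of the binding semantics, not a streaming evaluator: it loads the whole tree with Python's `xml.etree.ElementTree`, and the sample document `DOC` and the function name `bind_tuples` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for doc("abc"); its document element is <xyz>.
DOC = "<xyz><def>a</def><def>b</def></xyz>"

def bind_tuples(xml_text):
    """Yield one ($i, $j) tuple per binding combination of the nested FOR."""
    root = ET.fromstring(xml_text)
    # $i in doc("abc")/xyz : the document element, if it is named xyz
    i_bindings = [root] if root.tag == "xyz" else []
    for i in i_bindings:
        # $j in $i/def : each def child of the current $i
        for j in i.findall("def"):
            yield (i, j)

# Project each tree to something printable.
tuples = [(i.tag, j.text) for i, j in bind_tuples(DOC)]
print(tuples)
```

Each tuple carries whole subtrees, which is why the downstream engine needs an XML tree datatype.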

Where This Leads

An XQuery can be broken into two operations:

A parsing / tree matching stage (FOR and also LET)
* Finds matches to the variables
* Returns a tuple of trees

A (mostly) pipelined SPJ / union / group by / order by engine (WHERE, ORDER BY, nesting in RETURN)
* Like a regular relational engine extended with an XML tree datatype!

The first engine to put these things together: Tukwila (Ives+ 2000, 2002)

IBM DB2 was built upon a nearly identical model – TurboXPath (Josifovski 2004)

The Key: SAX (Simple API for XML)

If we are to match XPaths in streaming fashion, we need a stream of data items

The original parser model: DOM (Document Object Model)
* Builds an entire object hierarchy in memory, which is traversable
* Not incremental! (Until later versions)

SAX: a series of event notifications
* open-tag, close-tag, character data
* Idea: build a state machine (or similar mechanism) to match on the events!
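A minimal sketch of the "state machine over SAX events" idea, using Python's standard `xml.sax` module. It handles only child-axis paths such as /db/customer/name; the class name `PathMatcher`, the path, and the sample document are assumptions for illustration, and this is not the algorithm used by any particular system.

```python
import xml.sax

class PathMatcher(xml.sax.ContentHandler):
    """Match a child-axis-only path, e.g. /db/customer/name, over SAX events."""

    def __init__(self, steps):
        super().__init__()
        self.steps = steps    # element names, one per path step
        self.stack = [0]      # top = number of steps matched so far; -1 = dead
        self.text = None      # character-data buffer while inside a match
        self.matches = []

    def startElement(self, name, attrs):
        state = self.stack[-1]
        if 0 <= state < len(self.steps) and self.steps[state] == name:
            state += 1        # open-tag advances the automaton
        else:
            state = -1        # this subtree cannot match the path
        self.stack.append(state)
        if state == len(self.steps):
            self.text = []    # full path matched: start collecting text

    def characters(self, content):
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        state = self.stack.pop()   # close-tag restores the parent's state
        if state == len(self.steps):
            self.matches.append("".join(self.text))
            self.text = None

# Hypothetical sample data.
DOC = (b"<db><customer><name>Ann</name></customer>"
       b"<customer><name>Bob</name><order/></customer>"
       b"<name>stray</name></db>")

handler = PathMatcher(["db", "customer", "name"])
xml.sax.parseString(DOC, handler)
print(handler.matches)
```

The stack is what lets a flat event stream track a hierarchical document: each open-tag pushes a state and each close-tag pops back to the parent's state.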

Different Options

Many different “streaming XPath” matching algorithms were developed, with some differences:
* What to match with (DFA, NFA, lazy DFA, PDA, proprietary format)
* Complexity of the path language (regular path expressions, XPath), axes (downwards, upwards, sideways), internal references (IDREFs, foreign keys), recursive patterns
* Which operations can be pushed into the operator (selection predicates, joins, position indices)

We’ll consider TurboXPath, highlighted in red above (Tukwila’s x-scan is highlighted in green)
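To make the NFA option concrete, here is a rough sketch of matching a downward-axis path with both child (/) and descendant (//) steps by keeping a stack of sets of active NFA states. It is not TurboXPath or x-scan; the class name `NFAMatcher`, the `("desc" | "child", name)` step encoding, and the sample document are all assumptions for the example.

```python
import xml.sax

class NFAMatcher(xml.sax.ContentHandler):
    """Match paths with child and descendant steps using sets of NFA states."""

    def __init__(self, steps):
        super().__init__()
        self.steps = steps     # list of (axis, name_test) pairs
        self.active = [{0}]    # stack of active state sets, one per open element
        self.path = []         # names of currently open elements
        self.matches = []

    def startElement(self, name, attrs):
        nxt = set()
        for k in self.active[-1]:
            if k == len(self.steps):
                continue                    # accepting state has no successors
            axis, test = self.steps[k]
            if test == name or test == "*":
                nxt.add(k + 1)              # this element satisfies step k
            if axis == "desc":
                nxt.add(k)                  # // may also skip this element
        self.active.append(nxt)
        self.path.append(name)
        if len(self.steps) in nxt:
            self.matches.append("/" + "/".join(self.path))

    def endElement(self, name):
        self.active.pop()
        self.path.pop()

def match(doc, steps):
    handler = NFAMatcher(steps)
    xml.sax.parseString(doc, handler)
    return handler.matches

# Hypothetical sample data, including a recursive <a> inside <a>.
DOC = b"<r><a><b/><c><b/></c></a><a><a><b/></a></a></r>"

child_hits = match(DOC, [("desc", "a"), ("child", "b")])   # //a/b
desc_hits = match(DOC, [("desc", "a"), ("desc", "b")])     # //a//b
print(child_hits, desc_hits)
```

Caching the state-set transitions as they are first computed is, roughly, what turns this into a lazy DFA.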

From XPath Patterns to Tuples and A Normal Query Plan

for $c in doc("d1")//customer
for $p in doc("d2")//profiles[cid/text() = $c/cid/text()]
for $o in $c/order[date = '12/12/01']
return <result> {$c/name} {$p/status} {$o/amount} </result>

Query plan (bottom to top):
* TurboXPath over “d1” outputs ($c/cid/text(), $c/name, $o/amount)
* TurboXPath over “d2” outputs ($p/status, $p/cid/text())
* ⋈ Pipelined join of the two tuple streams outputs ($c/name, $p/status, $o/amount)
* XML tagger (add “result”)
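The upper half of such a plan can be sketched as follows. The slide only says "pipelined join"; a symmetric hash join is assumed here as one way to pipeline both inputs. The function names, the `(side, key, payload)` event encoding, and the sample tuples are hypothetical.

```python
from collections import defaultdict

def symmetric_hash_join(events):
    """Join two interleaved tuple streams on a key, emitting results eagerly.

    events: iterable of (side, key, payload), where side is "L" or "R".
    Yields (left_payload, right_payload) as soon as both halves have arrived.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in events:
        tables[side][key].append(payload)      # build into this side's table
        other = "R" if side == "L" else "L"
        for match in tables[other].get(key, []):   # probe the opposite table
            yield (payload, match) if side == "L" else (match, payload)

def tag(name, status, amount):
    """XML tagger: wrap the joined values in a <result> element."""
    return (f"<result><name>{name}</name><status>{status}</status>"
            f"<amount>{amount}</amount></result>")

# Hypothetical tuples: "L" carries (name, amount) keyed by cid from d1,
# "R" carries status keyed by cid from d2, arriving interleaved.
EVENTS = [
    ("L", "c1", ("Ann", "10")),
    ("R", "c2", "gold"),
    ("R", "c1", "silver"),
    ("L", "c2", ("Bob", "20")),
]

results = [tag(left[0], right, left[1])
           for left, right in symmetric_hash_join(EVENTS)]
print(results)
```

Neither input has to finish before output starts, which matters when both documents arrive over the network.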

XPath Processing in TurboXPath

Performance Issues

Predicate pushdown
* Similar to “sargable predicates” – reduces the internal state that must be run through a cross-product to produce tuples

“Smart” memory management
* Want to deallocate space from partial pattern matches as early as possible

Parser efficiency
* We found that Xerces-C (validating C++ parser used by TurboXPath) was 10x slower than expat (non-validating C parser)
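A toy illustration of why predicate pushdown shrinks internal state, under the assumption that the matcher buffers per-variable matches for one customer and cross-products them when the customer element closes. The data, the `wanted` predicate (taken from the date test in the earlier query), and the function names are made up for the example; the real operator's bookkeeping is more involved.

```python
from itertools import product

def wanted(order):
    # Selection predicate from the example query: order[date = '12/12/01'].
    return order["date"] == "12/12/01"

def customer_tuples(names, orders, pushdown):
    """Return (tuples, number of buffered order matches) for one customer."""
    if pushdown:
        # Predicate applied while matching: only qualifying orders are kept.
        buffered = [o for o in orders if wanted(o)]
        tuples = [(n, o["amount"]) for n, o in product(names, buffered)]
    else:
        # Predicate applied after the cross-product: every order is buffered.
        buffered = list(orders)
        tuples = [(n, o["amount"])
                  for n, o in product(names, buffered) if wanted(o)]
    return tuples, len(buffered)

# Hypothetical orders for a single customer.
ORDERS = [
    {"date": "12/12/01", "amount": "10"},
    {"date": "01/05/02", "amount": "25"},
    {"date": "03/09/02", "amount": "40"},
]

late = customer_tuples(["Ann"], ORDERS, pushdown=False)
early = customer_tuples(["Ann"], ORDERS, pushdown=True)
print(late, early)
```

Both variants produce the same tuples; the pushed-down one simply holds fewer partial matches, which also helps the early-deallocation goal.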

Wrapping up…

This semester has been a whirlwind tour of many different aspects of the “data ecosystem”:
* Storage
* Concurrency control
* Query processing
* Data distribution and streams
* Heterogeneity, mappings, and reformulation (and the limitations thereof)
* Many styles of data integration
* XML processing

I hope I’ve been able to convey some of what makes this field both relevant and, I think, cool…

Where There Is Room for More Work (Among Many Topics)

Storage: rows versus columns

Concurrency control

Query processing
* Is there a theory of adaptivity, and an optimal scheme?

Data distribution, networks, and streams
* How do we distribute to 10,000 nodes?
* What is the relationship between network communication and query processing?

Data integration, better support for collaboration
* How can we make it less human-intensive?

“Lightweight databases”

Probabilistic databases

Visualization and interfaces

Databases meets machine learning and info retrieval

A Sampler of Some of the Systems Work by (Some) Major DB Groups

Washington: Mystiq – probabilistic databases; distrib. streams

Stanford: Trio – probabilities and “lineage” meets databases

Cornell: databases meets games; probabilistic databases

Wisconsin: Cimple; database support for monitoring clusters

MIT: Sensor query processing; signal processing; column stores

Berkeley: Data management for sensors and networks

Maryland: Querying data models; learning and probabilities meets databases

Penn: Orchestra; data and workflow provenance; keyword querying with learned ranks over databases; lightweight data integration; networking meets databases; sensor integration

Thanks!!!

I had a great time this semester – I hope you learned a lot and found it to be enjoyable

I’m looking forward to seeing your projects!
