Introduction to XML and XQuery
Guangjun (Kevin) Xie
Nov 28, 2005 York University 2
Road Map
- XML data model
- XML data vs relational data
- XPath 2.0
- XQuery
- Processing XQuery

XML Data Model: XML Information Set (Infoset)
- The Infoset is an abstract data set containing all the information in an XML document
- It provides a consistent set of definitions for referring to the information in a well-formed XML document
- Infosets usually result from parsing XML documents, but they can also be synthetic:
  - built through an API such as DOM
  - produced by transforming an existing infoset
- An infoset consists of a number of information items

XML Data Model: XML Infoset
- "Information set" and "information item" are similar in meaning to the generic terms "tree" and "node"
- An information item is an abstract description of some part of an XML document
- Each information item has a set of associated named properties, written as [property name]

XML Data Model: Information Items
There are 11 types of information items:
1. Document Information Item
2. Element Information Items
3. Attribute Information Items
4. Character Information Items
5. Processing Instruction Information Items
6. Unexpanded Entity Reference Information Items
7. Comment Information Items
8. The Document Type Declaration Information Item
9. Unparsed Entity Information Items
10. Notation Information Items
11. Namespace Information Items
We will discuss the first 3 today.

XML Data Model: Document Information Item
- There is exactly one document information item in an infoset
- All other information is accessible through its properties:
  - [children]: processing instructions, comments, etc.
  - [document element]: the element item corresponding to the document element
  - [version]: the XML version of the document
  - ... etc.

XML Data Model: Element Information Items
- There is one element item for each element in the XML document
- The "root" element item is the [document element] property of the document information item
- Properties:
  - [namespace name]: the namespace part of the tag name
  - [local name]: the local part of the tag name
  - [children]: all other information items inside this element
  - [attributes]: the attribute items of this element
  - [parent]: the information item containing this item
  - ... etc.

XML Data Model: Attribute Information Items
- There is one attribute item for each attribute of an XML element
- Properties:
  - [namespace name]: the namespace part of the attribute name
  - [local name]: the local part of the attribute name
  - [attribute type]: the data type of this attribute
  - [owner element]: the element information item containing this attribute
  - ... etc.

XML Data Model: Infoset Example
<?xml version="1.0"?>
<msg:message doc:date="19990421"
    xmlns:doc="http://doc.example.org/namespaces/doc"
    xmlns:msg="http://message.example.org/">Phone home!</msg:message>
The information set contains:
- A document information item.
- An element information item with namespace name "http://message.example.org/", local part "message", and prefix "msg".
- An attribute information item with namespace name "http://doc.example.org/namespaces/doc", local part "date", prefix "doc", and normalized value "19990421".
- Three namespace information items for the http://www.w3.org/XML/1998/namespace, http://doc.example.org/namespaces/doc, and http://message.example.org/ namespaces.
- Two attribute information items for the namespace attributes.
- Eleven character information items for the character data.
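To make the list above concrete, here is a minimal sketch that parses the same message with Python's standard xml.etree.ElementTree module. ElementTree is not an Infoset API: it exposes only part of the information (it hides the prefixes and the two namespace attributes), and the helper split_name is a hypothetical name introduced for this illustration.

```python
import xml.etree.ElementTree as ET

# The message document from the slide, written with straight quotes.
MESSAGE_XML = (
    '<?xml version="1.0"?>'
    '<msg:message doc:date="19990421" '
    'xmlns:doc="http://doc.example.org/namespaces/doc" '
    'xmlns:msg="http://message.example.org/">Phone home!</msg:message>'
)


def split_name(qualified):
    """Split ElementTree's '{namespace}local' form into (namespace name, local name)."""
    if qualified.startswith("{"):
        namespace, local = qualified[1:].split("}", 1)
        return namespace, local
    return None, qualified


root = ET.fromstring(MESSAGE_XML)

# Roughly the [namespace name] and [local name] of the element item.
element_name = split_name(root.tag)

# Roughly the same two properties for each ordinary attribute item.
attribute_names = [split_name(name) for name in root.attrib]

# One character information item per character of the text content.
character_count = len(root.text)

print(element_name)
print(attribute_names)
print(character_count)
```

The character count agrees with the eleven character information items listed for "Phone home!".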
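
XML Data Model: Infoset Example
[Figure: tree drawing of the infoset above. The document information item (version 1.0) contains the element item msg:message, which carries the attribute items xmlns:msg, xmlns:doc and doc:date and the character items spelling "Phone home!". Legend: document, element, attribute and character information items.]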

Road Map
- XML data model
- XML data vs relational data
- XPath 2.0
- XQuery
- Processing XQuery

XML Data vs Relational Data
- Relational databases stem from commercial data processing
  - Information usually has a regular structure
- XML has its roots in text document processing
  - Documents often have an irregular structure
- Both are general models, capable of representing all forms of information
- Their different heritages cause them to be optimized for different types of applications

XML Data vs Relational Data: Nesting
- XML model
  - Deeply nested structure
  - Flexible (not predefined)
  - Queries are easily handled by the descendant axis in XPath 2.0
- Relational model
  - Flat table structure
  - Primary and foreign keys represent the nesting relationship
  - Complex and flexible nesting may result in awkward queries

XML Data vs Relational Data: Metadata
- XML model
  - Metadata is mixed with ordinary data
  - High ratio of metadata to ordinary data
- Relational model
  - Metadata is easily factored out
  - Queries that involve metadata are difficult
    - Ex: find the names of the columns containing the value "red"
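On the XML side the same metadata question is straightforward, because element names travel with the data. The sketch below uses Python's standard xml.etree.ElementTree; the document and the function name names_of_elements_with_value are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: each <item> plays the role of a row, each child a column.
ITEMS_XML = """
<items>
  <item><name>apple</name><color>red</color><label>fruit</label></item>
  <item><name>car</name><trim>red</trim><color>blue</color></item>
</items>
"""


def names_of_elements_with_value(root, value):
    """Return the sorted, distinct element names whose text equals `value`."""
    return sorted(
        {el.tag for el in root.iter() if (el.text or "").strip() == value}
    )


root = ET.fromstring(ITEMS_XML)
red_names = names_of_elements_with_value(root, "red")
print(red_names)
```

In SQL the equivalent question usually has to go through the system catalog, since column names are not ordinary data.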

XML Data vs Relational Data: Ordering
- XML model
  - Has an intrinsic ordering that cannot be derived from values
    - Ex: the order of the sentences in a book is essential
  - Poses a challenge for the query language
- Relational model
  - Ordering depends on values
  - Rows are not considered to have an ordering

XML Data vs Relational Data: Null Values
- XML model
  - A missing value is represented by the absence of an element
  - Retrieving a missing value results in an empty list
  - Rules are needed on how to handle an empty list
- Relational model
  - A "null" value represents a missing value
  - Rules define how operators behave in the presence of null
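A small sketch of the XML side, using Python's standard xml.etree.ElementTree on a hypothetical two-book document: looking up an absent element yields an empty list rather than a null, and the program then has to pick a rule for it. The rule in total_price (an empty list counts as 0) is just one possible choice, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second book has no <price> element at all.
BOOKS_XML = """
<books>
  <book><title>Learning XML</title><price>39.95</price></book>
  <book><title>Untitled draft</title></book>
</books>
"""

root = ET.fromstring(BOOKS_XML)
first, second = root.findall("book")

present = first.findall("price")   # a list with one <price> element
missing = second.findall("price")  # an empty list, not a null marker


def total_price(book):
    """One possible rule: an empty list of prices contributes 0 to the sum."""
    return sum(float(p.text) for p in book.findall("price"))


print(len(present), len(missing), total_price(second))
```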

XML Data vs Relational Data: Structural Transformations
- XML model
  - Queries run on XML documents and generate new XML documents
  - XPath 2.0: navigating inside a document
  - XQuery: joining elements, constructing new elements and structures
- Relational model
  - Queries run on tables and generate new tables

XML Data vs Relational Data: Data Definition
- XML model
  - Mixture of primitive data and nested elements
  - Elements may be optional
  - Constraints on cardinality and order
  - Poses challenges for type inference
    - Ex: proving that the output satisfies a given schema
- Relational model
  - Specifies the properties of columns
  - All rows have the same columns
  - Relatively simple

Road Map
- XML data model
- XML data vs relational data
- XPath 2.0
- XQuery
- Processing XQuery

XPath 2.0: What's XPath?
- XPath is a language for addressing parts of an XML document
  - XPath 2.0 provides a way to locate an individual node or a set of nodes in an XML data model
- XPath 2.0 is closely related to XQuery
  - Both use the same data model, based on the XML data model (infoset)
  - XQuery uses XPath to refer to information in the data model
- XPath 2.0 uses path expressions to navigate in XML documents
  - Path expressions select nodes in an XML document
  - An XPath expression evaluates to a sequence of nodes
  - Path expressions look very much like the paths you use with a traditional computer file system
- XPath 2.0 is a W3C recommendation

XPath 2.0: Data Model
- Represents all values, including
  - the input and the output of a query
  - all values of expressions used in intermediate calculations
- Based on the XML infoset data model
  - Shared with XQuery
  - Models XML data as trees
- Sequence-based data model
  - Sequences represent sets of trees or tree fragments
  - Everything is a sequence
  - Sequences never contain other sequences

XPath 2.0: Data Model
- A tree whose root node is a Document Node is referred to as a document.
- A tree whose root node is not a Document Node is referred to as a fragment.

XPath 2.0: Data Model
- Every instance of the data model is a sequence
- A sequence may contain nodes, atomic values, or any mixture of nodes and atomic values
- A sequence is an ordered collection of zero or more items
- An item is either a node or an atomic value
- A single item appearing on its own is modeled as a sequence containing one item
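The flatness rules can be sketched in a few lines of Python. This is only a model of the behaviour, assuming Python lists stand in for sequences and any non-list value stands in for an item; the function name sequence is hypothetical. In XPath 2.0 itself, the expression (1, (2, 3), ()) denotes the same sequence as (1, 2, 3).

```python
def sequence(*items):
    """Build a flat sequence: nested sequences are spliced in, never nested."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # A sequence inside a sequence contributes its items, not itself.
            flat.extend(sequence(*item))
        else:
            flat.append(item)
    return flat


mixed = sequence(1, [2, 3], [], "text")  # atomic values of mixed types
single = sequence(42)                    # one item is a sequence of length 1
empty = sequence()                       # zero items is the empty sequence

print(mixed, single, empty)
```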

XPath 2.0: Data Model
There are seven kinds of nodes in the data model:
- Document node
- Element node
- Attribute node
- Text node
- Namespace node
- Processing instruction node
- Comment node

XPath 2.0: Sample XML Document (books.xml)
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

XPath 2.0: Example
<book category="COOKING"> <title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price> </book>
<book category="CHILDREN"> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book>
<book category="WEB"> <title lang="en">XQuery Kick Start</title>
<author>James McGovern</author> <author>Per Bothner</author> <author>Kurt Cagle</author> <author>James Linn</author> <author>Vaidyanathan Nagarajan</author> <year>2003</year> <price>49.99</price> </book>
<book category="WEB"> <title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price> </book>
/bookstore/book evaluates to the sequence of nodes shown above, one node for each book element.
//book evaluates to the same result.
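The two paths can be tried out with Python's standard xml.etree.ElementTree, which implements only a small subset of XPath 1.0 (not XPath 2.0) and evaluates paths relative to an element. The sketch below is therefore an approximation, and it uses an abridged copy of books.xml with authors and years left out.

```python
import xml.etree.ElementTree as ET

# Abridged copy of books.xml: authors and years are omitted for brevity.
BOOKS_XML = """
<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="WEB"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="WEB"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <bookstore> element

children = root.findall("book")        # counterpart of /bookstore/book
descendants = root.findall(".//book")  # counterpart of //book

titles = [book.findtext("title") for book in children]
print(titles)
```

Both calls return the same four elements in document order, because every book in this document is a direct child of bookstore.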

XPath 2.0: Example
<book category="WEB"> <title lang="en">XQuery Kick Start</title> <author>James McGovern</author> <author>Per Bothner</author> <author>Kurt Cagle</author> <author>James Linn</author> <author>Vaidyanathan Nagarajan</author> <year>2003</year> <price>49.99</price> </book>
<book category="WEB"> <title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price></book>
//book[@category="WEB"] evaluates to a sequence containing the 2 book element nodes shown above.
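The attribute predicate is also part of the XPath subset supported by Python's standard xml.etree.ElementTree, so the selection can be checked with a short sketch. The document is an abridged copy of books.xml, and the leading "." is needed because ElementTree evaluates paths relative to an element.

```python
import xml.etree.ElementTree as ET

# Abridged copy of books.xml: authors, years and prices are omitted.
BOOKS_XML = """
<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title></book>
  <book category="WEB"><title lang="en">XQuery Kick Start</title></book>
  <book category="WEB"><title lang="en">Learning XML</title></book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)

# Counterpart of //book[@category="WEB"].
web_books = root.findall(".//book[@category='WEB']")
web_titles = [book.findtext("title") for book in web_books]
print(web_titles)
```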

XPath 2.0: Example
- some $x in //book satisfies $x/price > 49
  evaluates to a sequence containing the atomic value true (one book costs 49.99)
- every $x in //book satisfies $x/price > 49
  evaluates to a sequence containing the atomic value false
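The quantified expressions behave like Python's built-in any() and all(). The sketch below checks both results against the four prices from books.xml; it uses the standard xml.etree.ElementTree module and an abridged copy of the document, and it converts the price text to float explicitly.

```python
import xml.etree.ElementTree as ET

# Abridged copy of books.xml: only the prices matter here.
BOOKS_XML = """
<bookstore>
  <book><price>30.00</price></book>
  <book><price>29.99</price></book>
  <book><price>49.99</price></book>
  <book><price>39.95</price></book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)
prices = [float(book.findtext("price")) for book in root.findall(".//book")]

# some $x in //book satisfies $x/price > 49
some_over_49 = any(price > 49 for price in prices)

# every $x in //book satisfies $x/price > 49
every_over_49 = all(price > 49 for price in prices)

print(some_over_49, every_over_49)
```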

XPath 2.0: Example
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
/bookstore/book[position()=1] evaluates to a sequence containing the one element node shown above.
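Python's standard xml.etree.ElementTree does not support the position() function, but it accepts the abbreviated positional predicate [1], which selects the same node here. A minimal sketch on an abridged copy of books.xml:

```python
import xml.etree.ElementTree as ET

# Abridged copy of books.xml: only categories and titles are kept.
BOOKS_XML = """
<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title></book>
  <book category="WEB"><title lang="en">XQuery Kick Start</title></book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <bookstore> element

# Counterpart of /bookstore/book[position()=1], written as book[1].
first_books = root.findall("book[1]")
first_title = first_books[0].findtext("title")
print(first_title)
```

Positions are 1-based, as in XPath.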

Road Map
- XML data model
- XML data vs relational data
- XPath 2.0
- XQuery
- Processing XQuery

XQuery: What's XQuery?
- The language for querying XML data
  - XQuery is a language for finding and extracting elements and attributes from XML documents
- XQuery is to XML what SQL is to relational databases
  - Many of the concepts and techniques used in SQL processing and optimization can be applied to XQuery processing and optimization

XQuery: What's XQuery?
- XQuery is built on XPath 2.0 expressions
  - XQuery 1.0 and XPath 2.0 share the same data model
  - They support the same functions and operators
  - Understanding XPath 2.0 is essential to understanding XQuery
- Supported by all the major database vendors
  - IBM
  - Oracle
  - Microsoft
  - etc.

XQuery: What's XQuery?
- Closed with respect to its data model
  - The value of every expression in the language is guaranteed to be in the data model
  - XPath 2.0 is also closed
- Designed to be a functional language
  - No side effects
  - Processes and produces sequences
- XQuery is becoming a W3C standard
  - The current draft version is XQuery 1.0
  - Not yet a W3C Recommendation (XQuery is a Working Draft)

XQuery: FLWOR Expression
- The For clause binds a variable to each item of a sequence in turn
- The Let clause binds a variable to a whole sequence
- The Where clause applies conditions to the bindings produced by the For clause
- The Order By clause sorts the output of the For clause
- The Return clause returns a sequence
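For readers who know Python, the five clauses map loosely onto a sorted list comprehension. The sketch below is an analogy rather than an XQuery implementation; it uses the standard xml.etree.ElementTree module and an abridged, three-book version of a bibliography document.

```python
import xml.etree.ElementTree as ET

# Abridged bibliography: authors and prices are omitted.
BIB_XML = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
</bib>
"""

bib = ET.fromstring(BIB_XML)

titles = sorted(                                       # order by: sort the bindings
    title                                              # return: what each binding yields
    for book in bib.findall("book")                    # for: bind each book in turn
    for title in [book.findtext("title")]              # let: bind one whole value
    if book.findtext("publisher") == "Addison-Wesley"  # where: filter the bindings
)
print(titles)
```

The single-element list in the second loop is a common Python idiom for a let-style binding inside a comprehension.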

XQuery: Sample XML Document (bib.xml)
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>

XQuery: Sample XML Document (reviews.xml)
<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>A very good discussion of semi-structured database systems and XML.</review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix environment</title>
    <price>65.95</price>
    <review>A clear and detailed discussion of UNIX programming.</review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>One of the best books on TCP/IP.</review>
  </entry>
</reviews>

XQuery: Sample XML Document (prices.xml)
<prices>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <source>bstore2.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <source>bstore1.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>bstore2.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>bstore1.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>bstore2.example.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>bstore1.example.com</source>
    <price>39.95</price>
  </book>
</prices>

XQuery: Example 1
List the books published by Addison-Wesley after 1991, including their year and title.
Solution in XQuery:
<bib>
{
  for $b in doc("bib.xml")/bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return
    <book year="{ $b/@year }">
      { $b/title }
    </book>
}
</bib>
Result:
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
  </book>
</bib>

XQuery: Example 2
Create a flat list of all the title-author pairs.
Solution in XQuery:
for $b in doc("bib.xml")/bib/book,
    $t in $b/title,
    $a in $b/author
return
  <result>
    { $t }
    { $a }
  </result>
Result:
<result>
  <title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
</result>
<result>
  <title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
</result>
<result>
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
</result>
<result>
  <title>Data on the Web</title>
  <author><last>Buneman</last><first>Peter</first></author>
</result>
<result>
  <title>Data on the Web</title>
  <author><last>Suciu</last><first>Dan</first></author>
</result>

XQuery: Example 3
For each book in the bibliography, list the title and authors.
Solution in XQuery:
for $b in doc("bib.xml")/bib/book
return
  <result>
    { $b/title }
    { $b/author }
  </result>
Result:
<result>
  <title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
</result>
<result>
  <title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
</result>
<result>
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
</result>
<result>
  <title>The Economics of Technology and Content for Digital TV</title>
</result>

XQuery: Example 4
For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source.
Solution in XQuery:
<books-with-prices>
{
  for $b in doc("bib.xml")//book,
      $a in doc("reviews.xml")//entry
  where $b/title = $a/title
  return
    <book-with-prices>
      { $b/title }
      <bib-price>{ $b/price/text() }</bib-price>
      <review-price>{ $a/price/text() }</review-price>
    </book-with-prices>
}
</books-with-prices>
Result:
<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <bib-price>65.95</bib-price>
    <review-price>65.95</review-price>
  </book-with-prices>
  <book-with-prices>
    <title>Advanced Programming in the Unix environment</title>
    <bib-price>65.95</bib-price>
    <review-price>65.95</review-price>
  </book-with-prices>
  <book-with-prices>
    <title>Data on the Web</title>
    <bib-price>39.95</bib-price>
    <review-price>34.95</review-price>
  </book-with-prices>
</books-with-prices>

XQuery: Example 5
List the titles and years of all books published by Addison-Wesley after 1991, in alphabetical order.
Solution in XQuery:
<bib>
{
  for $b in doc("bib.xml")//book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  order by $b/title
  return
    <book>
      { $b/@year }
      { $b/title }
    </book>
}
</bib>
Result:
<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
  </book>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>

XQuery: Example 6
In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute.
Solution in XQuery:
<results>
{
  let $doc := doc("prices.xml")
  for $t in distinct-values($doc//book/title)
  let $p := $doc//book[title = $t]/price
  return
    <minprice title="{ $t }">
      <price>{ min($p) }</price>
    </minprice>
}
</results>
Result:
<results>
  <minprice title="Advanced Programming in the Unix environment">
    <price>65.95</price>
  </minprice>
  <minprice title="TCP/IP Illustrated">
    <price>65.95</price>
  </minprice>
  <minprice title="Data on the Web">
    <price>34.95</price>
  </minprice>
</results>
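The grouping step in this query (distinct titles, then the minimum over each group) can be mirrored in plain Python as a cross-check. This is a sketch using the standard xml.etree.ElementTree module on an abridged prices document; the function name min_prices is hypothetical, and the result is a dictionary instead of minprice elements.

```python
import xml.etree.ElementTree as ET

# Abridged copy of prices.xml with three of the six entries.
PRICES_XML = """
<prices>
  <book><title>Data on the Web</title><source>bstore2.example.com</source><price>34.95</price></book>
  <book><title>Data on the Web</title><source>bstore1.example.com</source><price>39.95</price></book>
  <book><title>TCP/IP Illustrated</title><source>bstore1.example.com</source><price>65.95</price></book>
</prices>
"""


def min_prices(prices_root):
    """Map each distinct title to its minimum price, in first-seen order."""
    result = {}
    for book in prices_root.iter("book"):
        title = book.findtext("title")
        price = float(book.findtext("price"))
        # Keep the smaller of the stored price and the new one.
        result[title] = min(price, result.get(title, price))
    return result


minimums = min_prices(ET.fromstring(PRICES_XML))
print(minimums)
```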

XQuery: Sample XML Document (book.xml)
<?xml version="1.0"?>
<book>
  <title>Data on the Web</title>
  <author>Serge Abiteboul</author>
  <author>Peter Buneman</author>
  <author>Dan Suciu</author>
  <section id="intro" difficulty="easy">
    <title>Introduction</title>
    <p>Text ... </p>
    <section>
      <title>Audience</title>
      <p>Text ... </p>
    </section>
    <section>
      <title>Web Data and the Two Cultures</title>
      <p>Text ... </p>
      <figure height="400" width="400">
        <title>Traditional client/server architecture</title>
        <image source="csarch.gif"/>
      </figure>
      <p>Text ... </p>
    </section>
  </section>
  <section id="syntax" difficulty="medium">
    <title>A Syntax For Data</title>
    <p>Text ... </p>
    <figure height="200" width="500">
      <title>Graph representations of structures</title>
      <image source="graphs.gif"/>
    </figure>
    <p>Text ... </p>
    <section>
      <title>Base Types</title>
      <p>Text ... </p>
    </section>
    <section>
      <title>Representing Relational Databases</title>
      <p>Text ... </p>
      <figure height="250" width="400">
        <title>Examples of Relations</title>
        <image source="relations.gif"/>
      </figure>
    </section>
    <section>
      <title>Representing Object Databases</title>
      <p>Text ... </p>
    </section>
  </section>
</book>

XQuery: Example 7
Prepare a (nested) table of contents, listing all sections and their titles. Preserve the original attributes of each <section> element, if any.
Solution in XQuery:
declare function local:toc($book-or-section as element()) as element()*
{
  for $section in $book-or-section/section
  return
    <section>
      { $section/@*, $section/title, local:toc($section) }
    </section>
};

<toc>
{
  for $s in doc("book.xml")/book
  return local:toc($s)
}
</toc>
Result:
<toc>
  <section id="intro" difficulty="easy">
    <title>Introduction</title>
    <section>
      <title>Audience</title>
    </section>
    <section>
      <title>Web Data and the Two Cultures</title>
    </section>
  </section>
  <section id="syntax" difficulty="medium">
    <title>A Syntax For Data</title>
    <section>
      <title>Base Types</title>
    </section>
    <section>
      <title>Representing Relational Databases</title>
    </section>
    <section>
      <title>Representing Object Databases</title>
    </section>
  </section>
</toc>
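The recursion in local:toc may be easier to follow in a general-purpose language. The sketch below mirrors it with Python's standard xml.etree.ElementTree on an abridged book.xml. It is an illustration of the same idea, not a translation an XQuery engine would produce, and the function name toc is chosen here for the example.

```python
import xml.etree.ElementTree as ET

# Abridged copy of book.xml: figures and most sections are omitted.
BOOK_XML = """
<book>
  <title>Data on the Web</title>
  <section id="intro" difficulty="easy">
    <title>Introduction</title>
    <p>Text ... </p>
    <section><title>Audience</title><p>Text ... </p></section>
  </section>
  <section id="syntax" difficulty="medium">
    <title>A Syntax For Data</title>
    <p>Text ... </p>
  </section>
</book>
"""


def toc(book_or_section):
    """Return new <section> elements with only attributes, title and subsections."""
    entries = []
    for section in book_or_section.findall("section"):
        # Copy the attributes, like $section/@* in the query.
        entry = ET.Element("section", dict(section.attrib))
        title = section.find("title")
        if title is not None:
            ET.SubElement(entry, "title").text = title.text
        # Recurse into nested sections, like local:toc($section).
        entry.extend(toc(section))
        entries.append(entry)
    return entries


toc_root = ET.Element("toc")
toc_root.extend(toc(ET.fromstring(BOOK_XML)))
print(ET.tostring(toc_root, encoding="unicode"))
```

Paragraphs and other content are dropped because only the attributes, the title and the nested sections are copied into each new element.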

Road Map
- XML data model
- XML data vs relational data
- XPath 2.0
- XQuery
- Processing XQuery

Processing XQuery: Approaches for Querying XML Data
- Mapping XML data into relational data
  - Query with SQL
  - May produce too many relations
  - Loss of information may occur
    - Ex: ordering, explicit hierarchical relationships between elements
- Using specific query languages
  - Usually integrated with SQL and relational data management
  - SQL/XML or XQuery

Processing XQuery: IBM System RX SQL/XQuery Compiler
- A new XQuery parser is added to the existing relational query processor
- All components are extended to process XQuery

Processing XQuery: Oracle XQuery Compilation Engine
- The parser converts XQuery into XQueryX
  - XQueryX is an XML representation of XQuery (another W3C candidate recommendation)
- An XML parser constructs a DOM tree from the XQueryX
  - Later stages work on the DOM tree
- The corresponding components are extended for XQuery too

Processing XQuery: Microsoft XQuery Compilation
- XQuery is compiled into an XML algebra tree, which is an internal representation
- The algebra tree can be optimized and executed by the relational query processor
  - Optimizations are rule-based
- A mapper traverses the algebra tree, converting each XML operator into a relational operator sub-tree

References
- M. Nicola, Bert van der Linden. Native XML Support in DB2 Universal Database. Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.
- Kevin Beyer, Chun Zhang, et al. System RX: One Part Relational, One Part XML. SIGMOD 2005, Baltimore, Maryland, USA.
- Shankar Pal, Istvan Cseri, et al. XQuery Implementation in a Relational Database System. Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.
- Zhen Hua Liu, Vikas Arora. Native XQuery Processing in Oracle XMLDB. SIGMOD 2005, Baltimore, Maryland, USA.
- Scott Boag, Don Chamberlin, et al. XQuery 1.0: An XML Query Language. http://www.w3.org/TR/xquery/
- Mary Fernández, Norman Walsh, et al. XQuery 1.0 and XPath 2.0 Data Model. http://www.w3.org/TR/xpath-datamodel/