
Advanced Database Topics

Copyright © Ellis Cohen 2002-2005

XML & Databases

These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

For more information on how you may use them, please see http://www.openlineconsult.com/db


Lecture Topics

• Processing & Accessing XML
• Querying & Updating XML
• DOM: Document Object Model
• Native XML DBs
• XML-Object Mappings
• XML-Relational Mappings
• Links & Pointers
• XML-Relational Databases


Processing & Accessing XML


Building vs Processing XML Text

XML Text -> XML Data Model (tree): Process it
– Parse it
– Construct a model of it
  • To persist it to a database for fast access/update
  • Or take some programmatic action based on its contents

XML Data Model (tree) -> XML Text: Build XML to store or transport the data


Fidelity and Round-Tripping

When XML text is stored in a data structure, which details of the document are retained?

When XML is regenerated from the data structure, how similar is it to the original XML?

• Order of elements (essential for DTD validation)
• Order of attributes
• Whitespace (esp. newlines, indentations)
• Comments, PIs, Entities


Processing XML Text

Process XML text in 2 ways

1. Parse it with SAX (Simple API for XML)
– One pass through XML
– Very efficient
– Requires minimal memory space

2. Construct XML Data Model
Important if you need
– To parse forward & backwards
– Efficient random access
– Efficient query & update
Can use SAX to construct it
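A minimal sketch of the SAX style, using Python's standard xml.sax module. The handler class, the collect_authors helper and the sample text (modelled on the CourseBooks example in these slides) are illustrative assumptions. The handler sees each start tag, text run and end tag once, in document order, and keeps only the small amount of state it needs.

```python
import xml.sax

# Sample text modelled on the CourseBooks example; "&" is escaped as "&amp;".
COURSE_BOOKS = """<CourseBooks>
  <Course>CS779</Course>
  <Book isbn="3047921861">
    <Title>Database Design, Implementation &amp; Management, 5th Edition</Title>
    <Author>Rob</Author>
    <Author>Coronel</Author>
    <Publisher>Course Technology</Publisher>
  </Book>
  <Book>
    <Title>Professional XML Databases</Title>
    <Author>Williams</Author>
    <Publisher>Wrox Press</Publisher>
  </Book>
</CourseBooks>"""


class AuthorCollector(xml.sax.ContentHandler):
    """Collects the text of every Author element in one forward pass."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self._buffer = None  # None while the parser is outside an Author element

    def startElement(self, name, attrs):
        if name == "Author":
            self._buffer = []

    def characters(self, content):
        # May be called several times for one text run, so pieces are buffered.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Author":
            self.authors.append("".join(self._buffer))
            self._buffer = None


def collect_authors(xml_text):
    handler = AuthorCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.authors
```

Nothing but the current Author's text is held in memory, which is why this style suits one-pass processing but not random access or update.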


Example XML Text

<CourseBooks>
  <Course>CS779</Course>
  <Book isbn="3047921861">
    <Title>Database Design, Implementation &amp; Management, 5th Edition</Title>
    <Author>Rob</Author>
    <Author>Coronel</Author>
    <Publisher>Course Technology</Publisher>
  </Book>
  <Book>
    <Title>Professional XML Databases</Title>
    <Author>Williams</Author>
    <Publisher>Wrox Press</Publisher>
  </Book>
</CourseBooks>


Corresponding XML Tree Data Model

root
  CourseBooks
    Course
    Book
      isbn (attribute)
      Title -> text
      Author -> text
      Author -> text
      Publisher -> text
    Book


Querying & Updating XML


Query & Update Approaches for Relational and OO Data

Result Set Approach
– Any client-side cache is invisible to clients
– Result set returned by query is transferred into the client's memory, where it is independent of any client-side cache
– Persistent data (or data in the cache) must be modified via commands INSERT/DELETE/UPDATE
– Most common approach used with RDBs

Visible Client-Side Cache Approach
– Queries (and navigation) cache results in a client-side cache, which is integrated with the user's address space
– Persistent data modifications result from writing back data modified in the client's address space
– Approach used with OODBs and with Object-Relational Mapping


XPath/XQuery Result

[Figure: the CourseBooks tree, with Course "CS779" and two Books whose Author nodes hold "Rob & Coronel" and "Williams"]

Query: //Author
Result: node sequence

In general, an XML query returns a sequence of nodes and values
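A small sketch of the same idea with Python's standard xml.etree.ElementTree, which supports a subset of XPath. The sample document is a cut-down, assumed version of the CourseBooks example. The query yields a sequence of nodes, from which a sequence of values can be derived.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<CourseBooks>"
    "<Course>CS779</Course>"
    "<Book><Author>Rob</Author><Author>Coronel</Author></Book>"
    "<Book><Author>Williams</Author></Book>"
    "</CourseBooks>"
)

# ".//Author" is ElementTree's spelling of the XPath query //Author
author_nodes = doc.findall(".//Author")          # a sequence of nodes
author_values = [a.text for a in author_nodes]   # a sequence of values
```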


Query & Update for XML

Result Set Approach
– Returns a sequence of values and XML fragments (e.g. <Author>Rob &amp; Coronel</Author>, <Author>Williams</Author>) divorced from persistent XML data
– Must update XML data via commands, e.g. INSERT/ADD/DELETE/REPLACE

Visible Client-Side Cache Approach (PDOM)
– Nodes of the persistent XML's tree model are cached in the client's address space as a result of queries & navigation
– Local XML tree nodes can be modified by the client
– Persistent XML data modifications result from writing back modified XML tree nodes


Command Based XML Modification

No standard approach to command-based XML modification

XUpdate proposal is adopted by a variety of XML DB vendors

Basic idea: FLWM Statements
Use (sequences of) modification commands instead of RETURN clauses in FLWOR expressions


Example XML Modification Operations (not XUpdate)

– for MOVE, the sequence must be a sequence of nodes, and each node (with all its children intact) is moved
– for COPY, the sequence may contain both nodes and values: values become text nodes, nodes are deep-copied

MOVE/COPY sequence AFTER node
MOVE/COPY sequence BEFORE node
MOVE/COPY sequence REPLACE node
– the three commands above place the sequence after/before/instead of the specified node

MOVE/COPY sequence ADD node
– appends the sequence to the children of the node

MOVE/COPY sequence REPLACE [NODES | ATTRIBUTES | CONTENTS ] node
– first deletes either all the child nodes / just the attribute child nodes / all but the attribute nodes, replacing those nodes by the ones specified

DELETE node-sequence
– deletes the specified nodes


XML Modification Example

FOR $a in //Author/Name
LET $nam := concat($a/Firstname, " ", $a/Lastname)
COPY $nam REPLACE CONTENTS $a

Does the DB support XML validation?
If so, when is XML revalidated?
– Explicitly
– At end of modification command
– At end of FLWM statement
– At end of transaction


DOM: Document Object Model


DOM: Document Object Model

Standard Tree Data Model for XML (and HTML)

Used with API to navigate around tree (Different APIs for each programming language)

Widely used with Dynamic XML/HTML (allows client-side languages like JavaScript to update displayed HTML/XML on the fly)

Used both for client-side XML data, and to access server-side XML data (PDOM: Persistent DOM)


JDOM: DOM API for Java

class Document {
    Element getRootElement();
    DocType getDocType();
}

class Content { // represents a node
    Document getDocument();
    void detach();          // detach from parent
    Object clone();         // makes a deep copy
    Parent getParent();     // Parent: interface for Contents & Documents
    void setParent( Parent parent );
}


JDOM Element Class

class Element extends Content {
    String getName();
    void setName( String name );
    List<Attribute> getAttributes();
    Attribute getAttribute( String name );
    void setAttribute( Attribute attr );
    List<Element> getChildren();
    List<Element> getChildren( String name );
    Element getChild( String name );
    List<Content> getContent();
    Content getContent( int index );
    void setContent( List<Content> content );
    void setContent( Content content );
    void setContent( int index, List<Content> content );
    void addContent( List<Content> content );
    void addContent( Content content );
    void addContent( int index, List<Content> content );
    String getText();
}


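Element.getText()

<Description>It's <em>way</em> cool</Description>

[Figure: the Description element (elem) has three children: the text node "It's ", the em element holding the text "way", and the text node " cool"]

elem.getText() => "It's way cool"

corresponds to the XQuery expression string($elem)

Note: in the released JDOM library this full string value is returned by getValue(); getText() there does not recurse into child elements and returns only the directly held text, "It's  cool".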
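The difference between directly held text and the full string value can be sketched with Python's standard xml.etree.ElementTree (an assumed stand-in here, not the JDOM API). Joining itertext() concatenates every descendant text node in document order, which is what string($elem) computes.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<Description>It's <em>way</em> cool</Description>")

# Text held directly by the element, before its first child element
direct_text = elem.text

# Concatenation of all descendant text nodes, like XQuery's string($elem)
string_value = "".join(elem.itertext())

# A phrase that crosses a tag boundary is only visible in the string value
crosses_tags = "way cool" in string_value
```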


JDOM Attributes and Text

class Attribute extends Content {
    String getName();
    void setName( String name );
    String getValue();
    void setValue( String value );
}

class Text extends Content {
    String getText();
    void setText( String text );
    void append( String text );
}


JDOM Example

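// Needs java.util.List, java.util.Iterator, org.jdom.Element, org.jdom.Text
// and org.jdom.xpath.XPath (JDOM 1.x); doc is an already loaded Document.
List nameElems = XPath.selectNodes( doc, "//Author/Name" );

Iterator iter = nameElems.iterator();
while (iter.hasNext()) {
    Element e = (Element) iter.next();
    String firstnam = e.getChildText("Firstname");
    String lastnam = e.getChildText("Lastname");
    String nam = firstnam + " " + lastnam;
    Text text = new Text( nam );
    e.setContent( text );   // replaces the Firstname/Lastname children
}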
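The same tree modification can be sketched in Python with the standard xml.etree.ElementTree, as a rough counterpart to the JDOM loop above. The sample names and the Authors wrapper element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Authors>"
    "<Author><Name><Firstname>John</Firstname><Lastname>Jones</Lastname></Name></Author>"
    "<Author><Name><Firstname>Ann</Firstname><Lastname>Smith</Lastname></Name></Author>"
    "</Authors>"
)

for name in doc.findall(".//Author/Name"):
    full = f"{name.findtext('Firstname')} {name.findtext('Lastname')}"
    # Remove the child elements, then store the combined string as text
    for child in list(name):
        name.remove(child)
    name.text = full

result = ET.tostring(doc, encoding="unicode")
```

The tree is modified in place, as in the visible client-side cache approach; writing the result back to persistent storage would be a separate step.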


Persistent DOM (PDOM)

Provides a client-side DOM API to access XML stored persistently on a server

Nodes of the Persistent XML's DOM model are cached locally as a result of evaluating XPath or XQuery expressions & by navigating to nodes of the DOM not yet in the cache

DOM nodes are directly modified via the DOM API (e.g. setText, setContent, setName, setValue)

Modified nodes are copied back to the server (automatically on commit, or explicitly), updating the persistently stored XML.


Native XML DBs


Why Store XML Natively

• Structured documents, but with varying, non-standardized structures

• Semi-structured documents, especially with fast searching of underlying text (with tags removed)


XML Document Collections

XML documents are often stored in a Folder Tree
– Folder elements contain both documents & subfolders
– Documents and folders have keywords and other attributes
– WebDAV (HTTP extension) is often used to modify the folder tree; XUpdate or PDOM could be used as well

Folder Tree & Documents are generally separate
– Each document element contains a link to a document stored as a separate tree
– If XPath can follow links/pointers, a single XPath expression in the context of a folder element can span multiple XML documents


XML Representation Goals

Efficient tree navigation and querying

Fast locating (find all authors) and indexing (find all authors named Jones)

Efficient text-search, including cross-tag text
– For example, searching for "way cool" in
  <description>It's <em>way</em> cool</description>


DOM + Text Representation

[Figure: description and em nodes drawn above the document text "It's way cool", which is stored as one contiguous run of characters]

Also ensures whitespace fidelity

Should nodes be identified by OIDs or by a lighter-weight representation?


Locating & Indexing

Efficiently access parts of the XML representation to speed up queries

• Element name
  //Author

• Element or Attribute within Parent
  //Author/Name, //Author/@id
  – Locating would efficiently find all //Author/@id nodes
  – Indexing additionally would efficiently find a particular //Author/@id value, e.g. //Author[@id='Jon32']

• Arbitrary XPath expressions
  //Book//Lastname
  //Book[count(Author)=1]

In addition, some XML DBs also support text-based indexes to efficiently locate text within XML
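The distinction between locating and indexing can be sketched with two in-memory dictionaries. This is an illustration only, not how any particular XML DB is implemented; the sample document and the names by_name and author_by_id are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = ET.fromstring(
    "<Authorlist>"
    "<Author id='Jon32'><Name>Jones</Name></Author>"
    "<Author id='Smi07'><Name>Smith</Name></Author>"
    "</Authorlist>"
)

# "Locating": one pass records every element under its tag name,
# so a query such as //Author does not have to walk the tree again.
by_name = defaultdict(list)
for node in doc.iter():
    by_name[node.tag].append(node)

# "Indexing": additionally key the Author nodes by their id value,
# so //Author[@id='Jon32'] becomes a dictionary lookup.
author_by_id = {a.get("id"): a for a in by_name["Author"]}

jones = author_by_id["Jon32"]
```

Both structures would have to be maintained whenever the tree is updated, which is the usual cost of an index.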


Concurrency

Most Native XML DB concurrency is currently lock-based

• Generally entire XML documents are locked
• Often, the folder hierarchy is not locked
• Some systems support fine-grained (i.e. node-level) concurrency
  – Node-based locking (locking a node may or may not place S locks on all nodes up to the parent, depending upon how nodes are identified)

PDOM systems with fine-grained concurrency may do better with optimistic concurrency control


Concurrency, Moveable Subtrees, and Node Identification

Suppose U1 executes myD := //D[1]; COMMIT;

Then U2 executes MOVE //C[1] ADD //B[2]; COMMIT;

Then U1 executes dtext := string( myD )

Can this work?
myD refers to a node which has moved (its parent was moved to //B[2]).

[Figure: tree with root A, two B nodes, two C nodes and a D node]


Access Control

Most systems support access control on XML documents at the document and folder level (much like OS-provided directory and file access control)

Other possibilities include

– Access control on nodes (by granting privileges or associating security predicates or ACLs with nodes specified by XPath expressions)

– Access control on views of XML Documents (although updatable views are computationally problematic both for XSLT and XQuery)

– Security domains: allowing some or all of an XML document to be accessed and/or updated only through operations (to which execute access can be selectively granted, or which dynamically determine when or how to execute)


XML Constraints Built-In To XML Schema

Every author's authid is unique and non-nil

<xs:key name="authkeys">
  <xs:selector xpath=".//author"/>
  <xs:field xpath="@authid"/>
</xs:key>

Each book's Authref refers to a legal authid

<xs:keyref name="authrefs" refer="authkeys">
  <xs:selector xpath=".//book"/>
  <xs:field xpath="Authref"/>
</xs:keyref>

The contents of a book's Authref child element must correspond to some author's authid attribute


Expressing XML Constraints with XQuery

Every author's authid is unique and non-nil

count(//Author) = count(distinct-values(//Author/@authid))

Each book's Authref refers to a legal authid

every $ar in //Book/Authref satisfies
    some $a in //Author/@authid satisfies $a = $ar
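The two constraints can also be checked procedurally. The sketch below uses Python's standard xml.etree.ElementTree; the function names and the sample document are assumptions, and the checks mirror the two XQuery expressions rather than any schema validator.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<BookDB>"
    "<Author authid='a309'/><Author authid='a412'/>"
    "<Book><Authref>a309</Authref></Book>"
    "<Book><Authref>a412</Authref></Book>"
    "</BookDB>"
)


def key_holds(root):
    """Every Author has an authid, and no two Authors share one."""
    ids = [a.get("authid") for a in root.iter("Author")]
    return None not in ids and len(ids) == len(set(ids))


def keyref_holds(root):
    """Every Book/Authref value matches some Author's authid."""
    ids = {a.get("authid") for a in root.iter("Author")}
    return all(ref.text in ids for ref in root.findall(".//Book/Authref"))
```

A database that revalidates after each modification command would run checks like these at that point; one that revalidates at commit would defer them.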


XML-Object Mappings


Object Representations for XML

Having an object representation for XML is important because it allows us to
– Use an OODB (or any persistent object model) to save and query XML
– Save an OO structure as XML
– Have a standard way of accessing and manipulating XML in an OOPL

Two standard approaches

• DOM (Document Object Model)
  Object classes are Document, Node, Element, Attribute, Text, …

• Data Binding
  Use object classes based on DTD elements: Book, Author, …


Data Binding

<!ELEMENT Author (Name, Address)>
<!ELEMENT Name (Firstname, Lastname)>

class Author (extent authorlist) {
    attribute Name name;
    attribute string address;
}

class Name {
    attribute string firstname;
    attribute string lastname;
}


Data Binding Example

<Author>
  <Name>
    <Firstname>Monica</Firstname>
    <Lastname>Lewinsky</Lastname>
  </Name>
  <Address>609 Penn Ave</Address>
</Author>

Author object
    name: (reference to the Name object)
    address: "609 Penn Ave"

Name object
    firstname: "Monica"
    lastname: "Lewinsky"
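A minimal data-binding sketch in Python, using dataclasses for the Author and Name classes and the standard xml.etree.ElementTree for parsing. The bind_author and unbind_author helpers are hypothetical names introduced for illustration; real data-binding tools generate such code from the DTD or schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Name:
    firstname: str
    lastname: str


@dataclass
class Author:
    name: Name
    address: str


def bind_author(elem):
    """Map an <Author> element onto Author/Name objects."""
    name = elem.find("Name")
    return Author(
        name=Name(name.findtext("Firstname"), name.findtext("Lastname")),
        address=elem.findtext("Address"),
    )


def unbind_author(author):
    """Regenerate XML from the objects (whitespace and comments are not kept)."""
    elem = ET.Element("Author")
    name = ET.SubElement(elem, "Name")
    ET.SubElement(name, "Firstname").text = author.name.firstname
    ET.SubElement(name, "Lastname").text = author.name.lastname
    ET.SubElement(elem, "Address").text = author.address
    return elem


author = bind_author(ET.fromstring(
    "<Author><Name><Firstname>Monica</Firstname>"
    "<Lastname>Lewinsky</Lastname></Name>"
    "<Address>609 Penn Ave</Address></Author>"
))
```

The objects hold only the data values, so indentation, comments and similar details of the source text cannot be regenerated.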


DOM Example

Element  name: "Author"
    Element  name: "Name"
        Element  name: "Firstname"
            Text  text: "Monica"
        Element  name: "Lastname"
            Text  text: "Lewinsky"
    Element  name: "Address"
        Text  text: "609 Penn Ave"


DOM vs Data Binding

DOM
– Naturally supports Fidelity
  • Naturally retains attribute & element order
  • Whitespace can be explicitly retained in text nodes or as additional node properties
– Supports Round-Tripping
  • XML generated from DOM can be made identical to original
– Most useful for document-centric XML

Data Binding
– Maps XML to Data Model that reflects natural/intended structure/use
– Fidelity not easily retained
– Most useful for data-centric XML


Runtime XML/Object Mappings

Map runtime objects via Data Binding to XML for storage or transport.
– Doesn't fully address non-hierarchical networks of objects

Map XML to local objects for runtime access using DOM or Data Binding


Persistent XML/Object Mappings

Map OODB objects to XML via Data Binding (typically) for transport

Persist XML in an OODB using DOM or Data Binding


Integrated Object Querying

Suppose we have both persistent XML and OO data. How can we arrange to do queries joining both of them?

Use an OODB, View both as XML
– User still sees XML data as XML. Its mapping to the OODB is hidden
– User sees an XML representation of the OO data as well, based on a hidden mapping
– User still queries all data with XPath & XQuery, which are automatically mapped to OQL

Use an OODB, View both as OO data
– User sees OO data as is
– User sees XML data in terms of its OO representation, based on a public mapping
– User uses OQL to query all data

Use an XML-Object Database
– Has XML classes to store XML and XML sequences
– Extends/integrates OQL and XQuery


Links & Pointers


Non-Hierarchical XML

XML works well for representing hierarchical networks of objects.

How can/should XML represent non-hierarchical networks of objects?

How can/should XML elements reference other XML elements in a different part of the same or different tree?


OO References vs XML Links

In Object model
– Objects can contain references
– References identify other objects
– References are used for navigation (both by OOPL and OQL)
– References require a notion of identity
  • Primary-key based ("semi-soft" links)
  • Based on unique OID ("hard" links)

In XML model
– In addition to just using IDs and IDREFs, XML Links act as references
– Element links are "semi-soft": based on primary-key identity
– XPointer links are "soft": based on XPath description of target
– XPointer links can also refer to ranges of nodes and points or ranges within text


Using IDs and IDREFs

<BookDB>
  <Booklist>
    <Book isbn="4132589794">
      …
      <AuthorRef authref="a309"/>
    </Book>
    …
  </Booklist>
  <Authorlist>
    <Author authid="a309">
      <Name>John Jones</Name>
      …
    </Author>
    …
  </Authorlist>
</BookDB>

authid: Declared as an ID
authref: Declared as an IDREF


Element Links

<BookDB xmlns:xlink="http://www.w3.org/1999/xlink">
  <Booklist>
    <Book isbn="4132589794">
      …
      <AuthorRef xlink:href="#element(a309)"/>
    </Book>
    …
  </Booklist>
  <Authorlist>
    <Author authid="a309">
      <Name>John Jones</Name>
      …
    </Author>
    …
  </Authorlist>
</BookDB>

authid: Declared as an ID


Traversal Across Links

<Booklist>
  <Book isbn="4132589794">
    …
    <AuthorRef xlink:href="#element(a309)"/>
  </Book>
  …
</Booklist>
<Authorlist>
  <Author authid="a309">
    <Name>John Jones</Name>

Should XPath follow the link?

Should //Booklist//AuthorRef/Name work?
– XPath does not currently traverse links.
– Allowing // to traverse links is problematic since it requires testing for circularities
– Might instead be reasonable to extend XPath with an xlink navigation axis:
  //Booklist//AuthorRef/xlink::Author/Name


Cross-Document Links

XLink more generally allows links to nodes within other documents

Suppose books & authors are in separate files

<Book isbn="4132589794">
  …
  <AuthorRef xlink:href="authlist.xml#element(a309)"/>
</Book>


Soft Link

<CompanyDB xmlns:xlink="http://www.w3.org/1999/xlink">
  <Depts>
    …
  </Depts>
  <Emps>
    <Emp>
      …
      <Dept xlink:href="#xpointer(//Depts/Dept[Dname='HR'])"/>
    </Emp>
    …
  </Emps>
</CompanyDB>

Soft link:
– Based on using XPath to describe the destination
– Potentially rebound on each use


Points & Ranges

XPointer extends XLink to allow a pointer to refer to
– A single point within the text of a document
– A range between two points in a document
– These Node points are soft; they represent nodes identified by XPath

XPointer also can be used to refer to
– A range of text (ignoring element boundaries) within a document
– Text points are also soft; they are identified by string pattern matching.


XML-Relational Mappings


XML-Relational Mapping

Knowing how to map between XML and Tables is important because it allows us to
– Use an RDB to save and query XML
– Convert relations and result sets to XML
– Use XPath/XQuery to query relational data

Two situations
• Designing a new relational model suitable for storing our XML data
• Mapping legacy relational data to XML


Persisting XML -> RDB

Use a 2 step approach

1. Map XML -> Object Model
   Either use Data Binding or use DOM

2. Map Object Model -> RDB
   (If using DOM, called "shredding")
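A simplified sketch of shredding, assuming a single generic node table with hypothetical columns (id, parent, pos, tag, text) and Python's standard sqlite3 and xml.etree.ElementTree modules. Attributes and mixed content are ignored to keep the example short; a real shredder would need tables or columns for them.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store each element as one row: (id, parent id, sibling position, tag, text)."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
        "pos INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def visit(elem, parent_id, pos):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent_id, pos, elem.tag, (elem.text or "").strip()),
        )
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)


conn = sqlite3.connect(":memory:")
shred(
    "<Author><Name><Firstname>Monica</Firstname>"
    "<Lastname>Lewinsky</Lastname></Name>"
    "<Address>609 Penn Ave</Address></Author>",
    conn,
)

# The children of <Name>, in document order
rows = conn.execute(
    "SELECT c.tag, c.text FROM node c JOIN node p ON c.parent = p.id "
    "WHERE p.tag = 'Name' ORDER BY c.pos"
).fetchall()
```

Storing the sibling position keeps element order, which data binding to ordinary relational columns would normally lose.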


Mapping Legacy RDB -> XML

Model-Based Mapping (reverse of XML -> RDB mapping)

Table or Template-Based Mapping


Table-Based Mapping

Emps ( empno, ename, deptno )

<table name="Emps">
  <row>
    <col name="empno">304792</col>
    <col name="ename">Joe Jones</col>
    <col name="deptno">20</col>
  </row>
  …
</table>

OR

<table name="Emps">
  <row>
    <empno>304792</empno>
    <ename>Joe Jones</ename>
    <deptno>20</deptno>
  </row>
  …
</table>
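The second (element-per-column) form can be produced mechanically from any table. The sketch below uses Python's standard sqlite3 and xml.etree.ElementTree modules as a stand-in for a real RDB; the table_to_xml helper is a hypothetical name, and the Emps row is the one from the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Emit <table name=...><row><colname>value</colname>...</row>...</table>."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    root = ET.Element("table", name=table)
    for record in cursor:
        row = ET.SubElement(root, "row")
        for col, value in zip(columns, record):
            ET.SubElement(row, col).text = str(value)
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER, ename TEXT, deptno INTEGER)")
conn.execute("INSERT INTO Emps VALUES (304792, 'Joe Jones', 20)")
xml_text = ET.tostring(table_to_xml(conn, "Emps"), encoding="unicode")
```

The mapping needs no knowledge of what the table means, which is why it is convenient for transport but gives flat XML with no nesting.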


Template-Based Mapping

<emplist>
  <sql:apply query="SELECT ename, dname FROM Emps NATURAL JOIN Depts">
    <employee>
      <name> <sql:field name="ename"/> </name>
      <dept> <sql:field name="dname"/> </dept>
    </employee>
  </sql:apply>
</emplist>

Can be extended to support nested queries

• sql:apply names queries
  sql:field specifies result field
  becomes like OQL for clause

• Nested queries can use named field values of outer queries


Reasons for RDB XML Mapping

Transporting RDB data
– Table-based most common, Model-based used as well

Using results for other purposes
– Example: for display using XSLT or CSS
– Data-Binding Model-Based & Template-Based most common

To allow querying with XPath/XQuery
– Supports joins of XML & RDB data
– Simplifies Recursive Queries
– Uses Data-Binding Model-Based approach


Recursive Models & Transitive Closures

Mapping XML to RDB
– Maps XPath to SQL
– But // involves arbitrary depth search => transitive closure

XML
<!ELEMENT Part (ID, Name, Part*, Manufacturer)>

//Part[ID="MB386"]//Part[Manufacturer="DMQ"]/ID

RDB (Oracle)
Part( id, parentid, name, manufacturer )

SELECT id FROM Part
WHERE manufacturer = 'DMQ'
START WITH id = 'MB386'
CONNECT BY PRIOR id = parentid

Oracle syntax (the WHERE clause precedes START WITH and filters rows after the hierarchy is built); SQL-99 defines recursive views which are not generally implemented
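For engines that have since adopted the SQL-99 recursive form, the same transitive closure can be written as a recursive common table expression. The sketch below runs it on SQLite through Python's standard sqlite3 module; the part rows are invented sample data, and the query starts from the children of MB386 so that, like the XPath expression, it returns descendants only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Part (id TEXT, parentid TEXT, name TEXT, manufacturer TEXT)")
conn.executemany(
    "INSERT INTO Part VALUES (?, ?, ?, ?)",
    [
        ("MB386", None, "Motherboard", "ACME"),   # hypothetical sample rows
        ("CPU1", "MB386", "Processor", "DMQ"),
        ("FAN1", "CPU1", "Fan", "DMQ"),
        ("RAM1", "MB386", "Memory", "ACME"),
        ("KB1", None, "Keyboard", "DMQ"),
    ],
)

# Transitive closure: start from the children of MB386, then keep adding
# the children of parts already found.
rows = conn.execute(
    """
    WITH RECURSIVE sub(id, manufacturer) AS (
        SELECT id, manufacturer FROM Part WHERE parentid = 'MB386'
        UNION ALL
        SELECT p.id, p.manufacturer FROM Part p JOIN sub s ON p.parentid = s.id
    )
    SELECT id FROM sub WHERE manufacturer = 'DMQ' ORDER BY id
    """
).fetchall()
```

KB1 is made by DMQ but is not part of MB386, so it is excluded; FAN1 is found two levels down, which a fixed number of joins could not guarantee.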


XML-Based Integration

[Figure: a Client issues an XML query to the Integration Server, which reaches each data source through its native interface: XMLDB (XML Query), OODB (OQL), RDB (SQL/SQLX), Files (Read/Parse), Web Service (SOAP Request), MDDB (MDX)]

Integration Server (maybe combined with XMLDB or XRDB)

• Develop XML Model for each data source

• Maps XPath & XML Query to distributed native queries/request

• Maps combined results to XML

Integrated Relational Querying

Suppose we have both persistent XML and Relational data. How can we arrange to do queries joining both of them?

Use an RDB, View both as XML
– User still sees XML data as XML. Its mapping to the RDB is hidden
– User sees an XML representation of the relational data as well, based on a hidden mapping
– User still queries all data with XPath & XQuery, which are automatically mapped to SQL

Use an RDB, View both as Relational Data
– User sees relational data as is
– User sees XML data in terms of its relational representation, based on a public mapping
– User uses SQL to query all data

Use an XML-Relational Database
– Has XML classes to store XML and XML sequences
– Extends/integrates SQL and XQuery

XRDBs

XML-Relational Databases

XRDBs (XML-Relational Databases)

Integrate XML into RDBs

Add XML data type
– Cells in XML columns each hold a sequence of XML fragments (so XML may be stored both in a folder hierarchy and in SQL tables with XML columns)
– May be stored using
  • LOBs
  • Transparently mapped (generally by shredding)
  • Native XML DB representation

Add Support for SQL/XML Standard
– SQL query functions can build XML dynamically during SQL queries
– The XMLQuery function can apply XQuery expressions to both stored and dynamically created XML
– The XMLTable function can produce a table-like view of XQuery result sequences

Constructing XML in SQL

SELECT XmlElement( emp, empno )
FROM Emps

<emp>3047</emp>

<emp>6051</emp>

XmlElement returns an XML element given a tag & value

The query returns a result set of XMLType objects

Constructing Nested XML

SELECT XmlElement( employee,
         XmlElement( name, ename ),
         XmlElement( dept, dname ) )
FROM Emps NATURAL JOIN Depts

<employee>
  <name>JONES</name>
  <dept>ACCOUNTING</dept>
</employee>

<employee>
  <name>SMITH</name>
  <dept>RESEARCH</dept>
</employee>

XmlElement can return an XML element given a tag & a sequence of values, which themselves may be XML elements

There is also support for adding attributes
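To make the nesting behaviour concrete, the following Python sketch emulates XmlElement on the client side. SQLite has no SQL/XML functions, so the `xml_element` helper is a hypothetical stand-in written for this example, and the sample rows and the ORDER BY are likewise assumed.

```python
import sqlite3
import xml.etree.ElementTree as ET

def xml_element(tag, *content):
    """Rough analogue of XmlElement: a tag plus a sequence of values or elements."""
    elem = ET.Element(tag)
    for item in content:
        if isinstance(item, ET.Element):
            elem.append(item)                      # nested XML element
        elif len(elem):
            # Scalar after a child element: goes into that child's tail text.
            elem[-1].tail = (elem[-1].tail or "") + str(item)
        else:
            elem.text = (elem.text or "") + str(item)
    return elem

# Sample data (assumed for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Depts (deptno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
INSERT INTO Depts VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
INSERT INTO Emps VALUES (3047, 'JONES', 10), (6051, 'SMITH', 20);
""")

rows = conn.execute(
    "SELECT ename, dname FROM Emps NATURAL JOIN Depts ORDER BY ename"
).fetchall()

# One <employee> element per joined row, mirroring the nested query above.
employees = [
    ET.tostring(
        xml_element("employee",
                    xml_element("name", ename),
                    xml_element("dept", dname)),
        encoding="unicode",
    )
    for ename, dname in rows
]
print(employees)
```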

XML Grouping & Aggregation

SELECT XmlElement( dept,
         XmlElement( name, dname ),
         XmlAgg( XmlElement( emp, ename ) ORDER BY ename ) )
FROM Depts NATURAL JOIN Emps
GROUP BY dname

<dept>
  <name>ACCOUNTING</name>
  <emp>FOSTER</emp>
  <emp>JONES</emp>
  <emp>WHITBY</emp>
</dept>

<dept>…

• Group the joined tuples by dname

• Sort the tuples in each group by ename

• Produce an emp element for each tuple in the group
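The three steps above can be sketched in Python, with `itertools.groupby` standing in for GROUP BY plus XmlAgg. This is only an emulation under assumed sample data. The `dept_elements` helper is hypothetical, and the sorting is pushed into the SQL query so that each group arrives already ordered by ename.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import groupby

# Sample data (assumed for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Depts (deptno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
INSERT INTO Depts VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
INSERT INTO Emps VALUES
  (1, 'WHITBY', 10), (2, 'FOSTER', 10), (3, 'JONES', 10), (4, 'SMITH', 20);
""")

# Sorting by dname first makes each department's rows contiguous for groupby.
rows = conn.execute(
    "SELECT dname, ename FROM Depts NATURAL JOIN Emps ORDER BY dname, ename"
).fetchall()

def dept_elements(rows):
    """Build one <dept> element per group of rows sharing the same dname."""
    out = []
    for dname, group in groupby(rows, key=lambda r: r[0]):
        dept = ET.Element("dept")
        ET.SubElement(dept, "name").text = dname
        for _, ename in group:                 # one <emp> per tuple in the group
            ET.SubElement(dept, "emp").text = ename
        out.append(ET.tostring(dept, encoding="unicode"))
    return out

depts = dept_elements(rows)
print(depts)
```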

Using XMLQuery

Suppose Emps has a description column, which contains XML, including an address element

SELECT empno, XmlQuery( './/address' PASSING description )
FROM Emps

The result set contains two columns
• The employee number
• The XML for the address
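A client-side approximation of this query is sketched below. It assumes the description column holds XML text and uses ElementTree's limited XPath support in place of a real XQuery engine. The `xml_query` helper and the sample rows are hypothetical, and an employee with no address simply yields an empty string here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample data (assumed for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emps (empno INTEGER PRIMARY KEY, description TEXT);
INSERT INTO Emps VALUES
  (3047, '<desc><home><address>12 Elm St</address></home></desc>'),
  (6051, '<desc><note>no address on file</note></desc>');
""")

def xml_query(path, xml_text):
    """Apply a path to one XML value and return the matching nodes as XML text."""
    root = ET.fromstring(xml_text)
    return "".join(ET.tostring(n, encoding="unicode") for n in root.findall(path))

# Two columns per row: the employee number and the XML for the address.
result = [
    (empno, xml_query(".//address", description))
    for empno, description in conn.execute(
        "SELECT empno, description FROM Emps ORDER BY empno"
    )
]
print(result)
```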

Generating Tables using XQuery

XmlTable( 'doc("mybooks.xml")//Authors'
  COLUMNS Name VARCHAR(20),
          dob DATE PATH '@dob' )

Generates a read-only view
– Each row is an author (corresponding to the author nodes in mybooks.xml)
– The Name column contains the text value of each author's Name element
– The dob column contains the value of each author's dob attribute (any XQuery expression could be used following PATH)
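The row and column extraction can be imitated in Python as follows. This is a simplified sketch: the contents of mybooks.xml are invented for the example, the `xml_table` helper is hypothetical, and only child-element names and '@attribute' paths are handled rather than arbitrary XQuery expressions. A column with no path falls back to a child element of the same name.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Stand-in for mybooks.xml (contents assumed for illustration only).
MYBOOKS = """<books>
  <book>
    <title>XML Basics</title>
    <Authors dob="1948-05-01"><Name>Ann Lee</Name></Authors>
    <Authors dob="1962-11-30"><Name>Bo Diaz</Name></Authors>
  </book>
</books>"""

def xml_table(xml_text, row_path, columns):
    """Return one tuple per node matched by row_path.

    columns is a list of (name, path, convert); path None means "use the
    column name as a child-element path", and '@x' selects attribute x.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(row_path):
        row = []
        for name, path, convert in columns:
            path = path or name
            if path.startswith("@"):
                raw = node.get(path[1:])       # attribute value
            else:
                raw = node.findtext(path)      # text of the child element
            row.append(convert(raw) if raw is not None else None)
        rows.append(tuple(row))
    return rows

authors = xml_table(
    MYBOOKS,
    ".//Authors",
    [("Name", None, str), ("dob", "@dob", date.fromisoformat)],
)
print(authors)
```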

Joining XML & Relational Data

What does this do?

SELECT DISTINCT e.empno, e.name
FROM Emps e,
     XmlTable( 'doc("mybooks.xml")//Authors'
       COLUMNS ssno CHAR(11) PATH '@ssno',
               dob DATE PATH '@dob' ) t
WHERE e.ssno = t.ssno AND t.dob > '12-31-1949'

XML/Relational Join Solution

SELECT DISTINCT e.empno, e.name
FROM Emps e,
     XmlTable( 'doc("mybooks.xml")//Authors'
       COLUMNS ssno CHAR(11) PATH '@ssno',
               dob DATE PATH '@dob' ) t
WHERE e.ssno = t.ssno AND t.dob > '12-31-1949'

List the employee # and name of employees who have authored a book (listed in mybooks.xml) and who were born after 1949
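One way to try this join without an XML-relational database is to shred the author nodes into a temporary table and run ordinary SQL, as in the Python sketch below. The XML contents, the sample employees, and the temporary table named t are assumptions for illustration. Dates are stored as ISO strings so that plain string comparison orders them correctly in SQLite.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for mybooks.xml (contents assumed for illustration only).
MYBOOKS = """<books>
  <book>
    <title>XML Basics</title>
    <Authors ssno="111-11-1111" dob="1948-05-01"><Name>Ann Lee</Name></Authors>
    <Authors ssno="222-22-2222" dob="1962-11-30"><Name>Bo Diaz</Name></Authors>
  </book>
  <book>
    <title>More XML</title>
    <Authors ssno="222-22-2222" dob="1962-11-30"><Name>Bo Diaz</Name></Authors>
  </book>
</books>"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emps (empno INTEGER PRIMARY KEY, name TEXT, ssno TEXT);
INSERT INTO Emps VALUES
  (3047, 'Ann Lee', '111-11-1111'),
  (6051, 'Bo Diaz', '222-22-2222'),
  (7001, 'Cy Fox',  '333-33-3333');
CREATE TEMP TABLE t (ssno TEXT, dob TEXT);
""")

# Shred: one row of t per Authors node, playing the role of the XmlTable result.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(a.get("ssno"), a.get("dob")) for a in ET.fromstring(MYBOOKS).findall(".//Authors")],
)

# DISTINCT matters: an employee who authored several books appears in t more than once.
matches = conn.execute("""
    SELECT DISTINCT e.empno, e.name
    FROM Emps e, t
    WHERE e.ssno = t.ssno AND t.dob > '1949-12-31'
    ORDER BY e.empno
""").fetchall()
print(matches)
```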