DB2 Version 9 – the Viper Release (LUW)
A next generation hybrid data server

Holger Seubert
DB2 Information Management Development, IBM Laboratory Boeblingen

122nd Datenbankstammtisch, HTW Dresden, Fachbereich Informatik/Mathematik (Department of Computer Science/Mathematics)
13 December 2006

© 2006 IBM Corporation, IBM Information Management


DB2 9 – The next generation hybrid data server

IBM Software Group - Information Management

"There are 68 patents alone in Viper, and it involved 750 developers over five years,"

Bob Picciano, VP WW Information Management Sales said.

"This is something no one else has and will take years to get here."

There's a lot of innovation in Viper. Let's go and explore.

Explore it yourself for free with DB2 Express-C 9:

http://www-306.ibm.com/software/data/db2/express/


Agenda – Start the hybrid engine

• < Overview />
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary


History of XML: How it all began …

Timeline:
• Beginning of the 1980s: the relational data model becomes popular
• 1983: SGML standardization (ISO)
• 1993: HTML, 1st version
• 1998: XML 1.0, W3C Recommendation
• 2000: XML 1.0, 2nd Edition
• 2004: XML 1.0, 3rd Edition & XML 1.1

The "Standard Generalized Markup Language" (SGML) is a metalanguage in which you can define markup languages for documents. SGML was originally designed for sharing machine-readable documents. It has also been used extensively in the printing and publishing industries.

The Extensible Markup Language (XML) is a general-purpose markup language for creating special-purpose markup languages, capable of describing many different kinds of data. It is a simplified subset of SGML.

"HyperText Markup Language" (HTML) is a markup language (an application of SGML) designed for creating web pages and other information viewable in a browser. HTML is used to structure information and can be used to describe the appearance and semantics of a document.


… and where we are today – the hype

[Figure: the 1983–2006 timeline surrounded by XML-related technologies]

XHTML, JAX-RPC, DOM, SAX, Windows Installer XML, XML Schema, XQuery, XUpdate, XPath, XSLT, Ajax, UDDI, SOAP, WSDL, SQL/XML, XPointer, XLink, RSS, XSL-FO, CSS, XML Infoset, XForms, native XML databases, XML-enabled databases

Please note: the order in which the technologies are shown does not reflect the exact order of their invention or appearance.


The evolution to a new database technology

Last generation database technology needed data types like CLOB or text to store XML data:
✗ no chance to query XML directly on the database tier
✗ XML is read as strings and passed to the middle tier, which then runs the queries against the XML data

Next generation database technology interacts directly with the XML data:
✓ a new XML data type stores XML natively inside the database
✓ run queries against XML with XQuery or XPath
✓ embed XQuery statements directly in SQL statements
✓ special XML indexes are used to boost performance
✓ assign a schema to XML data to ensure that the XML data is valid


A New Model is Emerging – a hybrid system

SQL Developer: "I see a sophisticated RDBMS that also supports XML."
XML Developer: "I see a sophisticated XML repository that also supports SQL."

• Familiar programming models
• Optimized storage models
• Mature services
• Familiar tooling
• Optimized performance & scale


Agenda – Start the hybrid engine

• Overview
• < pureXML Storage />
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary


pureXML in DB2 9

• Standards compliant & driving the standards
  – XML, XQuery, SQL/XML, XML Schema …
• 100% integrated in DB2
  – leveraging performance, scalability, reliability, availability …
• 100% integrated with SQL
  – XML is a new SQL type
  – Access relational and XML data in the same statement
• 100% integrated with application APIs
  – JDBC, ODBC, .NET, embedded SQL, PHP


pureXML in DB2 9

• What does pureXML® support mean?
  – Storage, compiler, optimizer, indexing, tools, utilities, APIs, …
  → XML capabilities in all DB2 components
• pureXML® Storage
  – XML is stored in parsed, annotated DOM-like trees; the XQuery Data Model is persisted
  → NOT shredded, NOT stored as LOB
  – XML data is formatted to buffered data pages (LOB pages are not buffered!)
  – XML data can be placed in a separate table space
  → Shared with the LOB data of that table
  – New XDA data object on disk (new data type)
• Customer benefits
  – Faster navigation and queries
  – Simpler indexing
  – Natural XML user paradigm

[Figure: the XDA object]


Integration of XML & Relational Capabilities

[Figure: DB2 hybrid data server. Labels: client (DB2 client / customer client application) using SQL/XML and XQuery; DB2 engine with compiler, relational interface and XML interface; DB2 storage holding relational and XML data]

• Native XML data type (server & client side)
  – not VARCHAR, not CLOB, not object-relational!
• XML capabilities in all DB2 components
• Applications can combine XML & relational data


The XML Data Type

• A column of type XML can hold one well-formed XML document for every row of the table

create table dept (deptID char(8),…, deptdoc xml);

[Figure: table dept with the columns deptID (e.g. "PR27"), … and deptdoc; the deptdoc value <dept><emp>…</emp></dept> is kept in DB2 storage]

• XML and relational columns are stored differently:
  – Relational columns are stored in relational format (tables)
  – XML values are stored natively in the XQuery Data Model
  – A descriptor pointing to the XML storage is stored in the row
• No limit on the size of an XML document (no length is associated with the XML data type; the client-server protocol currently limits document size to 2 GB)
• Parse-once paradigm: no XML parsing at query time!


DB2 XML Storage – XML to the core

• The document is stored in a parsed hierarchical representation
  – This is similar to a DOM representation of the XML Infoset
  – IBM's version of open-source Xerces is used
  – The XQuery Data Model is persisted
• All XML nodes are type annotated, according to the XQuery specification (W3C)
  – XML Schema types if validated
  – Default types otherwise
• All data is stored in UTF-8
  – Regardless of the document encoding
  – Regardless of the locale
  – Regardless of the codepage of the database

Store XML intact, with full DBMS knowledge of the document's internal structure.


DB2 XML Storage – XML to the core

Information stored with every node:
• Name (e.g. element name, encoded as a StringID from the string table)
• A nodeID
• Type of node (e.g. element, attribute, etc.)
• Namespace
• Namespace prefix
• Data type
• Pointer to parent
• Array of child pointers
• For text/attribute nodes: the data itself
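The per-node information listed on this slide can be pictured as a small record. The following is a minimal Python sketch for illustration only: the class name `StoredNode`, its field names, the `"untyped"` label and the in-memory string table are assumptions made here and do not reflect DB2's actual on-disk layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical string table: names are stored once and referenced by StringID.
string_table: List[str] = []

def string_id(name: str) -> int:
    """Return the StringID for a name, adding it to the table if needed."""
    if name not in string_table:
        string_table.append(name)
    return string_table.index(name)

@dataclass
class StoredNode:
    node_id: int
    kind: str                          # "element", "attribute", "text", ...
    name_id: Optional[int] = None      # StringID of the name (None for text nodes)
    namespace: Optional[str] = None
    prefix: Optional[str] = None
    data_type: str = "untyped"         # assumed label for the default type annotation
    parent: Optional["StoredNode"] = field(default=None, repr=False, compare=False)
    children: List["StoredNode"] = field(default_factory=list)
    value: Optional[str] = None        # only for text/attribute nodes

    def add_child(self, child: "StoredNode") -> "StoredNode":
        """Link a child both ways: parent pointer and child array."""
        child.parent = self
        self.children.append(child)
        return child

# Build <dept><emp>John Doe</emp></dept> as a tiny node tree.
dept = StoredNode(node_id=1, kind="element", name_id=string_id("dept"))
emp = dept.add_child(StoredNode(node_id=2, kind="element", name_id=string_id("emp")))
text = emp.add_child(StoredNode(node_id=3, kind="text", value="John Doe"))
```

Because each node carries both a parent pointer and child pointers, navigation in either direction needs no re-parsing of the document text.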


DB2 XML storage – XML to the core

• The node hierarchy of an XML document is stored on DB2 pages
  – Large documents are split into pages/regions
• Nodes are physically connected
  – Query performance
• Regions are logically connected
  – The regions index is a system component

[Figure: a large XML document split into regions across several pages, tied together by the regions index]


New System Indexes

• Entries in SYSCAT.INDEXES with the following INDEXTYPE:
• XRGN: XML Region Index
  – Created once for a table with XML column(s)
  – Maps logical pointers to XML data pages
• XPTH: XML Path Index
  – Created for each XML column
  – Holds the local subset of the global path/pathID mapping information (path table)
  – Can be used for wildcard resolution
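To make the path/pathID mapping and the wildcard resolution more concrete, here is a small Python sketch. It is a toy model only: the class `PathTable`, its methods and the regex-based matching are assumptions for illustration, not the way DB2 implements the XPTH index.

```python
import re
from typing import Dict, List

class PathTable:
    """Toy path table: assigns a numeric pathID to every distinct path."""

    def __init__(self) -> None:
        self.path_ids: Dict[str, int] = {}

    def register(self, path: str) -> int:
        # Reuse the existing ID if the path was seen before.
        return self.path_ids.setdefault(path, len(self.path_ids) + 1)

    def resolve(self, pattern: str) -> List[int]:
        """Return the pathIDs of all stored paths matching a pattern that may
        contain '*' (any one element), '@*' (any attribute) and '//'."""
        regex = ""
        for part in re.split(r"(//|/)", pattern):
            if part == "//":
                regex += r"/(?:[^/]+/)*"      # any number of intermediate levels
            elif part == "/":
                regex += "/"
            elif part == "*":
                regex += r"[^/@]+"
            elif part == "@*":
                regex += r"@[^/]+"
            else:
                regex += re.escape(part)
        compiled = re.compile(regex)
        return sorted(pid for path, pid in self.path_ids.items()
                      if compiled.fullmatch(path))

table = PathTable()
for p in ["/dept", "/dept/employee", "/dept/employee/@id", "/dept/employee/name"]:
    table.register(p)

name_ids = table.resolve("//name")        # pathID of /dept/employee/name
child_ids = table.resolve("/dept/*")      # pathID of /dept/employee
attr_ids = table.resolve("/dept//@*")     # pathID of /dept/employee/@id
```

Resolving a wildcard once against the (small) path table yields a set of concrete pathIDs, so later work can compare integers instead of re-matching path strings.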


DB2 XML Storage

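[Figure: storage objects of a table with an XML column. The DAT object holds the rows (ID values PR27, ACC, PR28, … and the DEPTDOC column); the INX object holds the region index and the path index (/dept, /dept/employee, /dept/employee/@id, …); the XDA object holds the XML data]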


Agenda – Start the hybrid engine

• Overview
• pureXML Storage
• < XML Indexes />
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary


XML Indexes for High Query Performance

• Index elements and attributes inside the document
• Uses an XML pattern expression to index paths and values in XML documents stored in a single XML column
• Specify the type to index
  – Should be the same as the type used in the query
  – The query /Person[Age = 5] needs a numeric index on Age
• 0, 1 or multiple index entries per document

create table t1 (docID int, XMLDoc xml);
create index AgeIndex on t1(XMLDoc)
  generate key using xmlpattern '/Person/Age' as sql double;

NOTE: Declaration and use of namespace prefixes is supported (not shown above)


XML Index DDL

CREATE [UNIQUE] INDEX index-name
  ON table-name (xml-column-name)
  GENERATE KEY USING XMLPATTERN xmlpattern
  AS SQL { VARCHAR (integer) | VARCHAR HASHED | DOUBLE | DATE | TIMESTAMP }

xmlpattern:
  { / | // } { element-tag | * } … [ { / | // } { @attribute-tag | @* | text() } ]

xmlpattern = XPath without predicates, using only the child axis (/) and the descendant-or-self axis (//)


XML Index Examples

create table dept(deptID char(8) primary key, deptdoc xml);

<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>

• create unique index idx1 on dept(deptdoc)
    generate key using xmlpattern '/dept/@bldg' as sql double;
• create unique index idx2 on dept(deptdoc)
    generate key using xmlpattern '/dept/employee/@id' as sql double;
• create index idx3 on dept(deptdoc)
    generate key using xmlpattern '/dept/employee/name' as sql varchar(35);
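A short Python sketch can illustrate why one document contributes zero, one or several entries to an XML value index, depending on the pattern and the declared SQL type. This is a simplified model, not DB2 code: the function `index_keys` is hypothetical, it handles only child steps with an optional trailing attribute step, and the rule that non-numeric values are simply skipped for a double index is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

DOC = """<dept bldg="101">
  <employee id="901"><name>John Doe</name><office>344</office></employee>
  <employee id="902"><name>Peter Pan</name><office>216</office></employee>
</dept>"""

def index_keys(doc: str, pattern: str, sql_type: str) -> List[Union[float, str]]:
    """Collect the key values one document contributes to an index.

    Supports only simple patterns such as '/dept/employee/@id'.
    """
    root = ET.fromstring(doc)
    steps = pattern.strip("/").split("/")
    if steps[0] != root.tag:
        return []                       # pattern does not match this document
    attr = steps.pop()[1:] if steps[-1].startswith("@") else None
    nodes = [root]
    for step in steps[1:]:              # walk the remaining child steps
        nodes = [child for n in nodes for child in n.findall(step)]
    raw = [n.get(attr) if attr else (n.text or "") for n in nodes]
    keys: List[Union[float, str]] = []
    for value in raw:
        if value is None:
            continue                    # attribute absent: no index entry
        if sql_type == "double":
            try:
                keys.append(float(value))
            except ValueError:
                continue                # assumed: non-numeric values are not indexed
        else:
            keys.append(value)
    return keys

bldg_keys = index_keys(DOC, "/dept/@bldg", "double")           # one entry
id_keys = index_keys(DOC, "/dept/employee/@id", "double")      # two entries
name_keys = index_keys(DOC, "/dept/employee/name", "varchar")  # two entries
phone_keys = index_keys(DOC, "/dept/employee/phone", "varchar")  # no entries
```

The sample document here omits the phone elements on purpose, so the last call shows the "zero entries" case.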


XML Index Wizard (DB2 Control Center)

Create a value index on XML elements or XML attributes by right-clicking in the document structure


XML Index Wizard (DB2 Control Center)

A pop-up menu shows the options for creating an XML value index on the selected XML node


The Big Indexing Picture

[Figure: an SQL table with an XML column (relational column 1, relational column 2, XML column) and the objects attached to it]

• Relational index
• Index on XML column: created by users to improve performance during queries on XML documents
• XML column paths index: maps paths to path IDs for each XML column; a subset of the paths stored in the global catalog path table
• XML regions index: logical mapping of the regions in an XML document, used to retrieve the document data
• XML storage: the .XDA file


A full text index for XML

• XML is less like the traditional data stored in a database
• Applications on XML documents often rely on a full-text index
• DB2 9 offers both
  – Traditional-behaving database indexes
  – Full-text indexing
• The existing Net Search Extender is used for the full-text index
  – XML aware: limit the search to specific elements or attributes
  – Proximity searches
  – Wildcard searches
  – and a lot more …


Agenda – Start the hybrid engine

• Overview
• pureXML Storage
• XML Indexes
• < XQuery & SQL/XML support />
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary


XQuery and SQL/XML

• DB2 treats both SQL and XQuery as primary query languages (hybrid system)
• SQL and XQuery operate independently on their respective data models
• DB2 also allows relational and XML data to be combined and correlated

Two ways to query XML data:

This section:
– Querying XML data with XQuery
– Optional: SQL embedded in XQuery

Next section:
– Querying XML data with SQL
– Optional: XQuery embedded in SQL


What is XQuery?

A query language designed for XML data … and supported in DB2 9.

The specifications behind XQuery:
• XQuery (expressions): www.w3.org/TR/xquery
• XPath 2.0: www.w3.org/TR/xpath20/
• Functions & Operators: www.w3.org/TR/xquery-operators/
• XQuery 1.0 & XPath 2.0 Data Model: www.w3.org/TR/query-datamodel/
• XML Schema: www.w3.org/XML/Schema


The FLWOR Expression

• FOR: iterates through a sequence, binding a variable to each item
• LET: binds a variable to a sequence
• WHERE: eliminates items of the iteration
• ORDER: reorders items of the iteration
• RETURN: constructs query results

for $movie in db2-fn:xmlcolumn('movies.doc')
let $actors := $movie//actor
where $movie/duration > 90
order by $movie/@year
return <movie>{$movie/title, $actors}</movie>

Result:

<movie>
  <title>Chicago</title>
  <actor>Renee Zellweger</actor>
  <actor>Richard Gere</actor>
  <actor>Catherine Zeta-Jones</actor>
</movie>
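For readers more at home in a general-purpose language, the five FLWOR clauses map onto familiar steps: iterate, bind, filter, sort, construct. The Python sketch below mimics the query on this slide over a few in-memory documents. The sample movies, their years and durations are made up for the illustration, and the function name `flwor` is hypothetical; this is not how DB2 evaluates XQuery.

```python
import xml.etree.ElementTree as ET

# Illustrative sample data; titles, years and durations are made up.
MOVIES = [
    '<movie year="2005"><title>Long Feature</title><duration>120</duration>'
    '<cast><actor>A. Actor</actor></cast></movie>',
    '<movie year="2001"><title>Short Film</title><duration>15</duration>'
    '<cast><actor>B. Actor</actor></cast></movie>',
    '<movie year="1999"><title>Older Feature</title><duration>95</duration>'
    '<cast><actor>C. Actor</actor><actor>D. Actor</actor></cast></movie>',
]

def flwor(docs):
    kept = []
    # FOR: iterate over the sequence, binding each item to `movie`
    for movie in (ET.fromstring(d) for d in docs):
        # LET: bind a variable to a whole sequence (all descendant actors)
        actors = movie.findall(".//actor")
        # WHERE: drop the items that fail the predicate
        if float(movie.findtext("duration")) > 90:
            kept.append((movie, actors))
    # ORDER BY: reorder the surviving items by the year attribute
    kept.sort(key=lambda pair: int(pair[0].get("year")))
    # RETURN: construct one new <movie> element per item
    out = []
    for movie, actors in kept:
        new = ET.Element("movie")
        new.append(movie.find("title"))
        new.extend(actors)
        out.append(ET.tostring(new, encoding="unicode"))
    return out

results = flwor(MOVIES)
```

Note that LET binds the whole actor sequence once per movie, which is why a movie with several actors still produces a single result element.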


Which data does XQuery (as a primary language) work on?

• All XML data is in XML-typed columns in tables
• The XQuery standard defines a "collection" function
  – Very abstract, implementation dependent
• DB2 XQuery uses 2 XQuery functions to get data:
  – db2-fn:xmlcolumn()
  – db2-fn:sqlquery()


Querying XML data – with XQuery

1) Identifying XML data by column: db2-fn:xmlcolumn()

Operates on the entire XML column:
• for $d in db2-fn:xmlcolumn('dept.deptdoc')/dept/employee

2) Identifying XML data via a select statement: db2-fn:sqlquery()

Leverages predicates/indexes on relational columns:
• entire column:
  for $d in db2-fn:sqlquery("select deptdoc from dept")/dept/employee
• single document:
  for $d in db2-fn:sqlquery("select deptdoc from dept where deptID = 'PR27' ")
• some documents:
  for $d in db2-fn:sqlquery("select deptdoc from dept where contains(deptdoc, SECTION(/dept/employee/) 'John')=1")


Querying XML data – with XQuery

This query returns all customerinfo elements in documents in the CUSTOMER.INFO column where the value of the attribute Cid is greater than 1000.

Prefix each XQuery query with the keyword 'XQuery' to tell the DB2 parser to use the XQuery language.


Querying XML data – with XQuery

Visual XQuery Builder integrated in DB2 Developer Workbench (Eclipse IDE)


XQuery and SQL/XML

• DB2 treats both SQL and XQuery as primary query languages (hybrid system)
• SQL and XQuery operate independently on their respective data models
• DB2 also allows relational and XML data to be combined and correlated

Two ways to query XML data:

Last section:
– Querying XML data with XQuery
– Optional: SQL embedded in XQuery

This section:
– Querying XML data with SQL
– Optional: XQuery embedded in SQL


Querying XML data – with SQL/XML

SQL/XML Publishing Functions since DB2 V8.2

Several functions are available to enable XML values to be constructed, or published, from SQL values.

Function        Type       Description
XMLELEMENT      Scalar     generates an XML element
XMLATTRIBUTES   Scalar     used within XMLELEMENT, specifies attributes
XMLFOREST       Scalar     produces a forest of XML elements from SQL values
XMLCONCAT       Scalar     concatenates a variable number of XML values
XMLNAMESPACES   Scalar     produces a namespace declaration
XMLAGG          Aggregate  groups or aggregates XML data
XML2CLOB        Cast       converts the XML data type into a CLOB
XMLSERIALIZE    Cast       converts the XML data type to serialized XML as a char/varchar/clob/blob


Querying XML data – new SQL/XML functions in DB2 9

XMLPARSE
  Description: Parses character/BLOB data and produces an XML value.
  Example:
    INSERT INTO T1(XMLDOC)
      VALUES(XMLPARSE(DOCUMENT '<a>some XML doc</a>'))

XMLVALIDATE
  Description: Validates an XML value against an XML schema and type-annotates the XML value.
  Example:
    INSERT INTO T1(XMLDOC)
      VALUES(XMLVALIDATE(XMLPARSE(DOCUMENT '<a>...</a>')
             ACCORDING TO XMLSCHEMA ID ibm.invoice))

XMLEXISTS
  Description: Determines whether an XQuery returns a result (i.e. a non-empty sequence of one or more items).
  Example:
    SELECT ID FROM T1
      WHERE XMLEXISTS('$d/dept[@bldg = 101]' passing xmldoc as "d")

XMLQUERY
  Description: Executes an XQuery and returns the result sequence.
  Example:
    SELECT ID, XMLQUERY('for $i in $d/dept
                         let $j := $i//name
                         return $j' passing xmldoc as "d")
      FROM T1

XMLTABLE
  Description: Executes an XQuery and returns the result sequence as a relational table (if possible).
  Example: refer to the following slides.


Querying XML data – with SQL/XML

Examples

-- Create a new table with an XML data type column
CREATE TABLE dept(deptID char(8) primary key, deptdoc xml)

-- Plain SQL to get the full XML document(s)
SELECT deptID, deptdoc FROM dept WHERE deptID = 'PR37'

-- SQL with embedded XPath or XQuery expression
SELECT deptID,
       XMLQUERY('for $i in $d/dept
                 let $j := $i//name
                 return $j' passing deptdoc as "d")
FROM dept
WHERE deptID LIKE 'PR%'
  AND XMLEXISTS('$d/dept[@bldg = 101]' passing deptdoc as "d")


Querying XML data – with SQL/XML

XMLTABLE() generates a table from XML data

<dept bldg="101">
  <employee id="901">
    <name><first>John</first><last>Doe</last></name>
    <office>344</office>
  </employee>
  <employee id="902">
    <name><first>Peter</first><last>Pan</last></name>
    <office>216</office>
  </employee>
</dept>

SELECT X.* FROM dept,
  XMLTABLE ('$d/dept/employee' passing deptdoc as "d"
    COLUMNS
      "empID"     INTEGER     PATH '@id',
      "firstname" VARCHAR(30) PATH 'name/first',
      "lastname"  VARCHAR(30) PATH 'name/last',
      "office"    INTEGER     PATH 'office') AS "X"

Result:

empID  firstname  lastname  office
901    John       Doe       344
902    Peter      Pan       216
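The row-and-column logic of XMLTABLE can be mimicked in a few lines of Python: one row expression selects the nodes that become rows, and each column definition is a relative path plus a target type. This is only a sketch of the idea; the function `xml_table` and the `COLUMNS` structure are hypothetical names, and Python converters stand in for the SQL types.

```python
import xml.etree.ElementTree as ET

DEPTDOC = """<dept bldg="101">
  <employee id="901"><name><first>John</first><last>Doe</last></name><office>344</office></employee>
  <employee id="902"><name><first>Peter</first><last>Pan</last></name><office>216</office></employee>
</dept>"""

# (column name, converter, path relative to the row node); '@x' means attribute x
COLUMNS = [
    ("empID", int, "@id"),
    ("firstname", str, "name/first"),
    ("lastname", str, "name/last"),
    ("office", int, "office"),
]

def xml_table(doc, row_path, columns):
    """Turn the nodes selected by row_path into a list of row dictionaries."""
    root = ET.fromstring(doc)
    rows = []
    for node in root.findall(row_path):      # one output row per matching node
        row = {}
        for name, convert, path in columns:
            if path.startswith("@"):
                raw = node.get(path[1:])      # attribute of the row node
            else:
                raw = node.findtext(path)     # text of a descendant element
            row[name] = convert(raw) if raw is not None else None  # missing -> NULL
        rows.append(row)
    return rows

# The root element is <dept>, so '$d/dept/employee' becomes 'employee' here.
rows = xml_table(DEPTDOC, "employee", COLUMNS)
```

Each `<employee>` element yields exactly one row, which matches the two-row result table on the slide.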


Querying XML data – with SQL/XML

Visual SQL Builder integrated in DB2 Developer Workbench (Eclipse IDE)


Querying XML data – with SQL/XML

Graphically create SQL and SQL/XML queries with the support of an Expression Builder


XML Data Modification

• The Data Manipulation Language (DML) only supports full document replacement (no XUpdate standard yet):
  update dept set deptdoc = ? where …
• DB2 provides a stored procedure for sub-document level updates:
  – Value updates of text nodes or attributes
  – Replace elements or document subtrees
  – Delete any node or subtree
  – Insert (append) any element or subtree
  – Document to update: identified by SQL or XQuery
  – New values or elements can be static, or produced on the fly by SQL or XQuery
  – One or multiple updates in 1 stored procedure call


XML Data Modification

"Update the phone number of employee 301"

Call DB2XMLFUNCTIONS.XMLUPDATE (
  '<updates>
     <update action="replace" col="1"
             path="/dept/employee[@id=301]/phone">
       <phone>408-463-4963</phone>
     </update>
     (…)
   </updates>',
  'Select deptdoc from dept where deptid=1006', '', ?, ?);

Callouts:
• <updates>: 1 or more updates
• action: type of update (action = replace | append | delete)
• path: what to update
• <phone>408-463-4963</phone>: new value (static)
• the select statement: which doc to update


XML Data Modification

"Update the phone number of employee 301"

Call DB2XMLFUNCTIONS.XMLUPDATE (
  '<updates>
     <update using="XQUERY" action="replace" col="1"
             path="/dept/employee[@id=301]/phone">
       for $i in db2-fn:xmlcolumn('T.col')/Phone
       where $i/change/emp/@id = 301
       return $i/phone
     </update>
     (…)
   </updates>',
  'Select deptdoc from dept where deptid=1006', '', ?, ?);

Callouts:
• using = XQUERY | SQL
• body of <update>: new value, produced by an XQuery


Agenda – Start the hybrid engine

• Overview
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• < XML Query Execution />
• XML Schema support (XSR)
• Utilities, Tools & APIs
• Summary


DB2 Query Operators (Explain)

Extended hybrid optimizer:
• Base access methods: TBSCAN, IXSCAN, FETCH
• Joins: NLJOIN, MSJOIN, HSJOIN
• Aggregation: GRPBY
• Temping: TEMP
• Sorting: SORT
• Index ANDing, dynamic bitmap indexing: IXAND
• Index ORing, list prefetch: RIDSCN
• XML scan and navigation: XSCAN (new!)
• XML index access: XISCAN (new!)
• XML index ANDing: XANDOR (new!)
• Table queues (xTQ)

Credits: Matthias Nicola, IBM SVL; Tom Eliaz


XSCAN – XML Scan Operator

Query: /customerinfo[name="Matt Foreman" and phone="905-555-4789"]

<customerinfo>
  <name>Matt Foreman</name>
  <phone>905-555-4789</phone>
</customerinfo>

Access plan (no index):

        RETURN
          |
        NLJOIN
          |
        /-+-\
       /     \
   TBSCAN   XSCAN
      |
   TABLE: MNICOLA.MYTEST

XSCAN = XML Document Scan

• Navigates 1 document at a time
• Evaluates the expression /customerinfo[…]
• Returns XML nodes that satisfy the expression
• Takes input via sideways passing NLJOIN
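A minimal sketch of how the query above might be issued through SQL/XML. The slide names only the table MNICOLA.MYTEST, so the columns id and doc are assumptions.

```sql
-- Hypothetical table layout; only the table name comes from the slide.
CREATE TABLE mnicola.mytest (id INT NOT NULL PRIMARY KEY, doc XML);

INSERT INTO mnicola.mytest (id, doc) VALUES (1, XMLPARSE(DOCUMENT
  '<customerinfo><name>Matt Foreman</name><phone>905-555-4789</phone></customerinfo>'));

-- With no XML index defined, every row is read by TBSCAN and its document
-- is navigated by XSCAN to evaluate the predicate.
SELECT id
FROM mnicola.mytest
WHERE XMLEXISTS('$d/customerinfo[name="Matt Foreman" and phone="905-555-4789"]'
                PASSING doc AS "d");
```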


XISCAN – XML Index Scan Operator

Query: /customerinfo[name="Matt Foreman" and phone="905-555-4789"]

<customerinfo>
  <name>Matt Foreman</name>
  <phone>905-555-4789</phone>
</customerinfo>

Access plan (1 index, on name):

            RETURN
              |
            NLJOIN
              |
            /-+-\
           /     \
       FETCH     XSCAN
         |
     /---+---\
    /         \
 RIDSCN     TABLE: MNICOLA.MYTEST
    |
  SORT
    |
 XISCAN

Find matching rows efficiently using XML indexes

• Evaluates the expression /customerinfo[name="Matt Foreman"]
• A VARCHAR HASHED index may produce false positives, which are eliminated by the XSCAN
• Only for value comparisons, not for "structural" predicates (element existence)

Matthias Nicola, IBM SVL; Tom Eliaz
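A sketch of an index that could make this plan possible. The index name and the XML column name doc are assumptions, and whether the optimizer actually picks XISCAN depends on statistics and costs.

```sql
-- Hashed string index on the name element (index and column names are hypothetical).
CREATE INDEX mnicola.ix_cust_name ON mnicola.mytest (doc)
  GENERATE KEY USING XMLPATTERN '/customerinfo/name'
  AS SQL VARCHAR HASHED;

-- Value comparison on name: a candidate for XISCAN on the index above.
SELECT doc
FROM mnicola.mytest
WHERE XMLEXISTS('$d/customerinfo[name="Matt Foreman"]' PASSING doc AS "d");

-- Structural predicate (element existence): per the slide, not answered by the index.
SELECT doc
FROM mnicola.mytest
WHERE XMLEXISTS('$d/customerinfo[phone]' PASSING doc AS "d");
```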


XANDOR – Pivot XML ANDing (and ORing)

Query: /customerinfo[name="Matt Foreman" and phone="905-555-4789"]

Access plan (2 indexes, on name and phone):

            RETURN
              |
            NLJOIN
              |
            /-+-\
           /     \
       FETCH     XSCAN
         |
     /---+---\
    /         \
 RIDSCN     TABLE: MNICOLA.MYTEST
    |
  SORT
    |
 XANDOR
    |
  /-+-\
 /     \
XISCAN XISCAN

Efficient XML index ANDing using pivot algorithm

• Combine the results of 2 or more XISCANs
• Only for equality predicates without wildcards; traditional IXAND used otherwise

Matthias Nicola, IBM SVL; Tom Eliaz
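A sketch of the two indexes this plan assumes. Index names, the column name doc, and the VARCHAR lengths are hypothetical; the optimizer decides whether XANDOR is actually used.

```sql
-- One XML index per predicate.
CREATE INDEX mnicola.ix_name ON mnicola.mytest (doc)
  GENERATE KEY USING XMLPATTERN '/customerinfo/name' AS SQL VARCHAR(50);

CREATE INDEX mnicola.ix_phone ON mnicola.mytest (doc)
  GENERATE KEY USING XMLPATTERN '/customerinfo/phone' AS SQL VARCHAR(20);

-- Two equality predicates without wildcards: a candidate for XANDOR over two XISCANs.
SELECT doc
FROM mnicola.mytest
WHERE XMLEXISTS('$d/customerinfo[name="Matt Foreman" and phone="905-555-4789"]'
                PASSING doc AS "d");
```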


Agenda – Start the hybrid engine

• Overview
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• < XML Schema support (XSR) />
• Utilities, Tools & APIs
• Summary


DB2 XML Schema Repository (XSR)

§ Database needs a Schema repository
  – Stable & high performance access to Schemas for validation at XML insert/update time
  – Support for XML Schema management

§ DB2 XML Schema Repository (XSR)
  – XML Schemas are registered
    • Consistent set of .xsd documents
  – Registered Schema identification
    • A SQL 2-part name
    • The URL the Schema is externally known as (e.g. used in schemaLocation attributes)
    • The "primary namespace"
  – Also used by Shred
    • Stores annotated Schema
    • Internal formats to make Shredding efficient
  – Also DTDs and External entities
    • Used for entity reference resolution and defaults
    • NOT used for validation
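Registration can also be scripted. The sketch below uses the DB2 command line processor (CLP) command REGISTER XMLSCHEMA, which is a CLP command rather than an SQL statement. The file path is hypothetical; the SQL name ibm.invoice and the URI http://my.world.com are the ones used on the XMLVALIDATE slide of this deck.

```sql
-- CLP command: register a single-document schema and complete it in one step.
REGISTER XMLSCHEMA 'http://my.world.com'
  FROM 'file:///tmp/invoice.xsd'
  AS ibm.invoice COMPLETE;

-- Inspect the registered schemas in the catalog.
SELECT objectschema, objectname, targetnamespace, schemalocation, status
FROM syscat.xsrobjects;
```

The three identifiers listed above map onto this command: AS gives the SQL 2-part name, the first argument is the externally known URL, and the primary namespace is taken from the schema document itself.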


Register a Schema – via DB2 Control Center

• Already registered XML Schema documents

• Register new XML Schema via wizard


Register a new Schema – via DB2 Developer Workbench

• Invoke Schema registration wizard

• Browse registered Schemas


XMLVALIDATE

create table dept(deptID char(8) primary key, deptdoc xml);

Validation is optional. Without a schema clause, the schema location found in the document is used:

insert into dept(deptdoc) values (xmlvalidate(?))

Can override the schema location found in the document by referencing a schema from DB2's schema repository:

insert into dept(deptdoc) values (xmlvalidate(? according to xmlschema id ibm.invoice))

insert into dept(deptdoc) values (xmlvalidate(? according to xmlschema uri 'http://my.world.com'))
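A sketch of the same idea with literal values instead of parameter markers, assuming a schema has been registered as ibm.invoice. The document content is made up and would only validate if that schema actually describes it; the IS VALIDATED predicate is believed to be available in DB2 9.

```sql
-- Validate a literal document against the registered schema ibm.invoice.
INSERT INTO dept (deptID, deptdoc)
VALUES ('1006',
        XMLVALIDATE(XMLPARSE(DOCUMENT '<dept><employee id="301"/></dept>')
                    ACCORDING TO XMLSCHEMA ID ibm.invoice));

-- Because validation is optional, check afterwards which rows were validated.
SELECT deptID FROM dept WHERE deptdoc IS VALIDATED;
```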


Schema evolution with DB2 9

§ No agreement how to evolve schemas because the general problem is very complex

§ Applications do it anyway because there are point solutions

§ Enable schema evolution (don't prevent it)

§ DB2 XML Schema Repository is very flexible
  – Register conflicting schemas
  – Register schemas with same namespace
  – Register schemas with same URL
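One way this flexibility might be used for versioning is to register two versions under the same URL but different SQL names, then choose a version explicitly at validation time. The file paths and the names invoice_v1 and invoice_v2 are hypothetical.

```sql
-- CLP commands: two schema versions registered under the same URL.
REGISTER XMLSCHEMA 'http://my.world.com' FROM 'file:///tmp/invoice_v1.xsd'
  AS ibm.invoice_v1 COMPLETE;
REGISTER XMLSCHEMA 'http://my.world.com' FROM 'file:///tmp/invoice_v2.xsd'
  AS ibm.invoice_v2 COMPLETE;

-- Select the version by SQL name, since the URL alone no longer identifies one schema.
INSERT INTO dept (deptID, deptdoc)
VALUES (?, XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID ibm.invoice_v2));
```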


Shredding into relational tables

§ There are still reasons to shred XML:
  – Co-existence with legacy applications
  – Relational processing is faster than XML
  – Analytics/cubes work over non-XML data

§ Mapping from XML to relational:
  – Annotate the XML schema
  – Register XML schemas in the schema repository
  – Shred via CLP commands or stored procedure calls

§ Replaces XML Extender shred (XML collection)
  – Faster; using XML Schema

Annotation Example:

<xsd:element name="phone" type="xsd:string"
             db2-xdb:rowSet="employee_tab"
             db2-xdb:column="phone_col"/>
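The three mapping steps above can be sketched with CLP commands as follows. The target table matches the rowSet and column names of the annotation example; the column type, file paths, schema URL, and the SQL name ibm.employee are assumptions.

```sql
-- Target table named by db2-xdb:rowSet and db2-xdb:column in the annotation.
CREATE TABLE employee_tab (phone_col VARCHAR(32));

-- CLP command: register the annotated schema and enable it for decomposition.
REGISTER XMLSCHEMA 'http://my.world.com/employee'
  FROM 'file:///tmp/employee.xsd'
  AS ibm.employee COMPLETE ENABLE DECOMPOSITION;

-- CLP command: shred one document into the mapped table, validating it first.
DECOMPOSE XML DOCUMENT /tmp/employee.xml XMLSCHEMA ibm.employee VALIDATE;
```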


Define XML mapping rules – DB2 Developer Workbench

• Invoke Annotated XSD mapping editor


Define XML mapping rules – DB2 Developer Workbench

• Graphically define mapping rules from XML to a relational schema


Agenda – Start the hybrid engine

• Overview
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• < Utilities, Tools & APIs />
• Summary


XML Utilities & Tools

Enhancements for the new XML data type

§ XML Import & Export
§ XML Runstats
§ XML type support in stored procedures
§ XML type supported by HADR replication
§ Control Center extensions (e.g. Index creation wizard)
§ DB2 Developer Workbench
§ and more…
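A sketch of the first two utilities as CLP commands. File names, the XML directory /tmp/xmlout, and the schema name myschema are hypothetical.

```sql
-- Export: relational columns go to dept.del, XML documents to files under /tmp/xmlout.
EXPORT TO dept.del OF DEL XML TO /tmp/xmlout
  SELECT deptID, deptdoc FROM dept;

-- Import the same data, reading the XML documents from that directory.
IMPORT FROM dept.del OF DEL XML FROM /tmp/xmlout
  INSERT INTO dept;

-- Collect statistics, including those for XML columns and XML indexes.
RUNSTATS ON TABLE myschema.dept WITH DISTRIBUTION AND INDEXES ALL;
```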


XML API enhancements

§ New XML type support added to APIs:

– JDBC, .NET, ODBC/CLI, Embedded SQL

§ SQL/XML supported by all APIs

§ XQuery supported by all APIs

– Result sequence will be treated as a resultset

– Each item will be treated as a row
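As an illustration of the last two points, an application can send an XQuery as ordinary statement text by prefixing it with the keyword XQUERY. The sketch assumes the dept table from this deck and a document structure of /dept/employee/phone.

```sql
-- Statement text as it would be passed to JDBC, .NET, CLI, or the CLP.
XQUERY db2-fn:xmlcolumn('DEPT.DEPTDOC')/dept/employee/phone;
```

If the stored documents contain three matching phone elements in total, the result set has three rows, one element per row, regardless of how many documents they came from.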


Agenda – Start the hybrid engine

• Overview
• pureXML Storage
• XML Indexes
• XQuery & SQL/XML support
• XML Query Execution
• XML Schema support (XSR)
• Utilities, Tools & APIs
• < Summary />


Summary

§ Standards compliant & driving the standards
  – XML, XQuery, SQL/XML, XML Schema …

§ 100% integrated in DB2
  – leveraging performance, scalability, reliability, availability …

§ 100% integrated with SQL
  – XML is a new SQL type
  – Access relational and XML data in the same statement

§ 100% integrated with application APIs:
  – JDBC, ODBC, .NET, embedded SQL
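A sketch of one statement that touches both relational and XML data, using the dept table from this deck. The document structure /dept/employee/phone and the literal values are assumptions.

```sql
SELECT d.deptID,
       -- XML part: extract the phone element of employee 301
       XMLQUERY('$doc/dept/employee[@id=301]/phone'
                PASSING d.deptdoc AS "doc") AS phone
FROM dept d
WHERE d.deptID = '1006'                       -- relational predicate
  AND XMLEXISTS('$doc/dept/employee[@id=301]' -- XML predicate
                PASSING d.deptdoc AS "doc");
```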


Summary

§ Flexibility because that is what XML is all about…
  – any document, any schema, not just the ones that are mapped to relational tables

§ pureXML storage
  – XML is parsed and stored hierarchically
  – shredded: using annotated Schema
  – CLOB/BLOB

§ Sophisticated XML indexing

§ Broad XQuery support
  – both embedded in SQL and as a primary language

§ Supports Digital Signatures
  – signatures can be validated on retrieved documents


Users will see

1. New XML data type for columns
   create table s1.t1 (c1 int, c2 xml)

2. Language bindings for the new XML type
   Java, .Net, C, Cobol, Embedded SQL

3. New XML indexes
   create index ix1 on t1(c2) generate keys using pattern '/dept/emp/@empno'

4. An XML Schema/DTD repository

5. Support for XQuery as a primary language as well as:
   – Support for SQL within XQuery
   – Support for XQuery with SQL
   – Support for new SQL/XML functions

6. Performance, scale, and everything else expected from a DBMS
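Points 1, 3 and 5 can be sketched together as follows. The index clause is written as GENERATE KEY USING XMLPATTERN … AS SQL <type>, which is believed to be the spelling in the released DB2 9 product and differs from the wording on the slide; the key type VARCHAR(10) is an assumption, since the slide gives none.

```sql
-- 1. XML column
CREATE TABLE s1.t1 (c1 INT, c2 XML);

-- 3. XML index on the empno attribute (key type is an assumption)
CREATE INDEX s1.ix1 ON s1.t1 (c2)
  GENERATE KEY USING XMLPATTERN '/dept/emp/@empno' AS SQL VARCHAR(10);

-- 5a. XQuery within SQL: one result row per table row
SELECT c1, XMLQUERY('$x/dept/emp' PASSING c2 AS "x") AS emps
FROM s1.t1;

-- 5b. SQL within XQuery: the fullselect picks the documents, XQuery navigates them
XQUERY db2-fn:sqlquery('SELECT c2 FROM s1.t1 WHERE c1 = 1')/dept/emp;
```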


References

DB2 9 on the net: www.ibm.com/db2/viper

Articles @ IBM developerWorks: http://www-128.ibm.com/developerworks/db2/


Thank you for your attention

Holger Seubert
Software Engineer

DB2 Information [email protected]