Native XML support in DB2 9 for z/OS

Preview:

DESCRIPTION

 

Citation preview

Native XML Support inDB2 9 for z/OS

Phil GraingerCA

2

Agenda

> Introduction

> What exactly IS XML?

> DB2 9 XML storage

> DB2 9 XML processing

> Further thoughts on XML and DB2

> Bibliography

3

What IS XML?

> eXtensible Markup Language

> Self describing data storage/transport

> Vendor and platform independent Eg RSS feeds

Podcasts

> Can contain structured, unstructured or a mix of data

4

An example of XML

> XML consists of a series of nodes which form a hierarchy

> Neither the names nor the contents of the nodes are predefined

This is why it’s termed “extensible”

> A node is enclosed between <nodename> and </nodename> tags

Windows Word has a neat way of showing this

> I’ll use some XML borrowed from the Sky television news feed

RSS feeds are a great example of XML usage

> Please note that the screenshots are only the FIRST PART of the XML

So some </end> tags are missing

5

An example of “raw” XML

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel>

<title>Sky News | Strange News | First For Breaking News</title> <link>http://news.sky.com/skynews/strangebuttrue</link>

<image>http://static.sky.com/images/skynews/rss/rss.gif<title>Sky News</title><url>http://static.sky.com/images/skynews/rss/rss.gif</url><link>http://news.sky.com/</link>

</image> <description>Sky News Strange But True</description> <language>en-us</language> <copyright>Copyright 2007, BSKYB. All Rights Reserved.</copyright> <lastBuildDate>Thu, 02 Aug 2007 11:22:30 GMT</lastBuildDate> <category>Sky News</category> <ttl>60</ttl>

<channelLinks> <link name="Uk News" url="http://news.sky.com/skynews/uknews"/> <link name="World" url="http://news.sky.com/skynews/worldnews"/> <link name="Money" url="http://news.sky.com/skynews/money"/> <link name="Business" url="http://news.sky.com/skynews/business"/> </channelLinks>

<item><title><![CDATA[World's Cheekiest Burglar Hunted On Facebook]]></title><link>http://news.sky.com/skynews/article/0,,30100-1278207,00.html?f=rss</link><description><![CDATA[A disgruntled homeowner has fallen victim to possibly the

cheekiest burglar in the world - and has now turned to social networking website Facebook to track him down.]]></description>

<enclosure url="" length="123456" type="image/gif" height="45" width="95"><![CDATA[]]></enclosure>

</item>

6

Or as Microsoft Word shows it

7

Or an alternative Word view

> Showing how all the elements and nodes are related

> Also this makes the hierarchical nature of XML even more obvious

8

So we could draw it as a hierarchy!

rss

channel

ttile link image

title url link

description language copyright lastbuilddate category ttl channellinks

link

item

title link description enclosure

*

So, whodefines the

format? In ourexample

this isrepeating

HeyDoes this look

like IMS to you?

9

XML Schemas

> We’ve already seen that XML is infinitely extensible

> Does this mean anarchy?

> It can

> But there are also things called “XML Schemas” A schema defines what can appear in an XML document

A well formed XML document can still violate an XML schema

10

DB2 9 for z/OS support for XML

> DB2 9 provides a new XML datatype

> An entire XML document can be stored in a single XML column

So one column of one row in a table has one complete document

11

DB2 9 for z/OS support for XML

There are some limitations, for example:

> Only well-formed documents are allowed All tags must have end tags

Elements must be nested correctly

Attributes must have values (enclosed by “ or ‘)

Tags are case sensitive </A> doesn’t end <a>

> An XML schema can optionally be applied with the DSN_XMLValidate() function

Providing you have defined a schema to DB2

In database DNSXRS (part of the catalog)

12

DB2 9 for z/OS support for XML

> Documents are not stored as strings So not comparable with any string data type

> But are manipulated by various XML expressions and functions

Including an XML predicate “function”

13

DB2 9 for z/OS support for XML

> Storage of XML data is a little like LOB storage

> When you create a table with an XML column, you get some other things as well

A hidden column called DB2_GENERATED_DOC_ID_FOR_XML

A unique index on this column

An table space to store the XML data

A table in the above table space

An XML index for the above table

> Luckily DB2 creates all of these things for us!

14

DB2 9 for z/OS support for XML

CREATE TABLE GRAPH02.XML_TABLE2

( KEY_COULMN INTEGER NOT NULL,

XML_COLUMN XML NOT NULL);

> Wasn’t THAT easy!

> Note that there is no length specification Maximum XML size is the same as the max LOB

size

Currently 2GB

15

DB2 9 for z/OS support for XML

> There are some things you can’t do with an XML column Sort

Group

Most predicates

Primary, foreign or unique key

> Also, no host languages have XML data type manipulation support

Yet …….

So XML data has to be manipulated as string data

16

Processing XML data

> Inserting data into an XML column is simply a matter of issuing an INSERT statement

Or a LOAD

> The XML statement MUST conform to DB2 standards And must be “well formed”

Look out for SQLCODE -20398 which says you have an error somewhere

– A byte offset IS given, but this is after DB2 has converted the XML to UTF-8

– So may not exactly match where the error is

17

Processing XML data

> And you can optionally apply a schema too remember Bear in mind that there WILL be an overhead to applying a

schema

18

XML functions

> There are a number of functions for manipulating XML data

> Be careful though, not all the functions starting “XML” are for manipulating XML data

Many are for CREATING XML data from relational data

19

XMLDOCUMENT()

> For creating XML documents from relational data or from parts of other XML documents

> At it’s simplest, it returns the same as a basic SELECT from the table

But can produce XML documents with all the necessary headers

20

XMLSERIALIZE()

> Converts XML data into textual data Can include/exclude XML declarations

Converts to LOB, BLOB, CLOB or DBCLOB

21

XPath

> Before we can talk about working with these XML data types, we need to talk about XPath

> XPATH notation allows you to navigate the XML document

> You can use XPATH to return subsets of your documents

22

XPath

> There is not time here for an in-depth XPATH discussion But, for example ….

DB2 needs to know, when we refer to a node name, which specific one we mean

/rss/channel/item/titlewould allow us to work with the /title/ nodes in our XML data

– In our case the <item> node is also a repeating node

23

XPath

> XPath can be used in SELECT lists Using XMLQUERY functions, for example

> In predicates Using the new XMLEXISTS predicate

Returns TRUE or FALSE depending on XPath expression

24

Let’s start simple - XMLQUERY()

> Returns a portion of an XML document matching a query

> Also returns all the subsidiary nodes

SELECT KEY_COLUMN,

XMLSERIALIZE(XMLQUERY('/rss/channel/item

[title="Worlds Cheekiest Burglar Hunted On Facebook"]'

PASSING XML_COLUMN)

AS CLOB(2K))

FROM GRAPH02.XML_TABLE

> Could return

25

XMLQUERY()

> KEY_COLUMN followed by textual XML

1 <item><title>Worlds Cheekiest Burglar Hunted On Facebook</title><link>http://news.sky.com/skynews/article/0,,30100-1278207,00.html?f=rss</link><description>A disgruntled homeowner has fallen victim to possibly the cheekiest burglar in the world - and has now turned to social networking website Facebook to track him down.</description><enclosure url="" length="123456" type="image/gif" height="45" width="95"/></item>

> This is ONE of a repeating set of nodes from one document

26

XMLQUERY()

> HOWEVER, if the XMLQUERY() returns <null> As it will if it can’t find the text in your document

> A row will still be returned for each row in the table KEY_COLUMN value and <null>

> We also need a way to specify predicates on the XML data

27

A bit more complex - XMLEXISTS()

> XMLEXISTS() returns TRUE or FALSE depending on whether an XPath expression finds a result

> So we expand our query into:SELECT KEY_COLUMN,

XMLSERIALIZE(XMLQUERY('/rss/channel/item

[title="Worlds Cheekiest Burglar Hunted On Facebook"]'

PASSING XML_COLUMN)

AS CLOB(2K))

FROM GRAPH02.XML_TABLE

WHERE XMLEXISTS('/rss/channel/item

[title="Worlds Cheekiest Burglar Hunted On Facebook"]'

PASSING XML_COLUMN)

> Now, rows will only be returned where the XPath in XMLEXISTS() finds data

28

Searching

> You can see though that the arguments to XMLQUERY and XMLEXISTS have to be an EXACT match for the content we are searching for

> What if we want to do a wildcarded sort of search

> XPath has no concept of “%” or “_”, but it does have a series of functions that may help

> One useful one is contains Like this:

29

Searching

SELECT KEY_COLUMN,

XMLSERIALIZE(XMLQUERY('/rss/channel

[contains(title,“Facebook")]'

PASSING XML_COLUMN)

AS CLOB(8K))

FROM GRAPH02.XML_TABLE

WHERE XMLEXISTS('/rss/channel

[contains(title,“Facebook")]'

PASSING XML_COLUMN)

30

Searching

>This should be clear, but we are Wanting data returned that has

“Facebook” in the /rss/channel/title node ONLY for rows that contain “Facebook” in

an /rss/channel/title node

31

Not just SELECT

> Of course, we could also say something like

DELETE

FROM GRAPH02.XML_TABLE

WHERE XMLEXISTS('/rss/channel

[contains(title,“Favebook")]'

PASSING XML_COLUMN)

> Delete all the rows that contain “Facebook” in a title node

32

XML indexes

> Using XPath notation, you can create indexes on your XML column

CREATE UNIQUE INDEX XML_INDEX

ON GRAPH02.XML_TABLE(XML_COLUMN)

GENERATE KEY USING XMLPATTERN

'/rss/channel/item/title'

AS SQL VARCHAR(128)

> Yes, this IS a unique index

And it DOES constrain the content of the node specified to VARCHAR(128)

33

XML indexes

> So you can also uses indexes to constrain the CONTENT of nodes

> In the previous example, we said

GENERATE KEY USING XMLPATTERN

'/rss/channel/item/title'

AS SQL VARCHAR(128)

> Any attempt to insert a document with a /title/ longer than 128 characters will fail

34

XML indexes

> Here’s something unusual

> I have two rows in my table SELECT COUNT(*) does indeed return 2

> So why does REBUILD INDEX sayDSNUCRUL - UNLOAD PHASE STATISTICS - NUMBER OF RECORDS PROCESSED=34 ?

Because each row has MULTIPLE index keys! Look again at the CREATE INDEX XPath statement We’re indexing INTO an XML document Each document (in this case) has 17 occurrences of

/rss/channel/item/title

35

More on XPath

> XPath arguments are case sensitive

> Be VERY careful about how you code them!

> Also, errors in XPath specifications can be hard to debug Syntax errors are easy DB2 tells you

XML errors aren’t so simple

– Spelling errors

– Capitalization errors

– Etc.

36

More on XPath

>Why does this not return any data?SELECT KEY_COLUMN,

XMLSERIALIZE(XMLQUERY('/rss/chanel/item

[title="ITV Profits Take A Dive In First Half"]'

PASSING XML_COLUMN)

AS CLOB(2K))

FROM GRAPH02.XML_TABLE

WHERE XMLEXISTS('/rss/chanel/item

[title="ITV Profits Take A Dive In First Half"]'

PASSING XML_COLUMN)

37

More on Xpath

>It’s because we spelled “channel” with one “n”

NO error is returned even though the node “chanel” does not exist in the XML

>Just because no node of that name exists TODAY, that does not mean one will not be there tomorrow

38

Further thoughts on XML

> Firstly, remember that the XML data is effectively free form

> What is in your XML column could be ANY valid XML data Each row in the table does not have to contain similar data

for the XML column

39

Further thoughts on XML

> My examples just happen to contain two almost identical rows

But I could add a third, very different, document

From a business perspective, this would not be sensible

But DB2 would allow it

40

New Features by APAR

> PK51571, 51572 and 51573XMLTABLE() and XMLCAST()

41

XMLTABLE()

> Turns a “repeating group” in an XML document into rows in a “table”

SELECT X.*

FROM GRAPH02.XML_TABLE G,

XMLTABLE('/rss/channel/item' PASSING G.XML_COLUMN

COLUMNS "SEQ" FOR ORDINALITY,

"TITLE" CHAR(64) PATH 'title')

AS X;

42

XMLTABLE()

> Returns

1 Worlds Cheekiest Burglar Hunted On Facebook

2 Deadly Petrol Roller Skates Seized

3 Why Panda Poo Will Play A Part In Olympics

4 Surgeons Operate By Mobile Phone Light

5 Great White Shark Seen In Cornwall

6 Lightning Strike No Flash In The Pan For Survivor

7 Spooky Scamp Has Skill For Sniffing Death

43

XMLTABLE() in a View

CREATE VIEW XML_VIEW AS

SELECT X.*

FROM GRAPH02.XML_TABLE G,

XMLTABLE('/rss/channel/item' PASSING G.XML_COLUMN

COLUMNS "SEQ" FOR ORDINALITY,

"TITLE" CHAR(64) PATH 'title')

AS X

44

XMLTABLE() in a View

SELECT * FROM

XML_VIEW

WHERE TITLE LIKE '%Facebook%‘

> Now we can use wildcards and column names to access our XML data!!

> Do be careful of performance though This will require materialisation of the data

BEFORE the predicate can be applied

45

New Features by APAR

> PK55585 and PK55831 (still open)13 new XPATH functionse.g. fn.lower-case, fn.upper-case, fn.matches, fn.position, fn.replace & fn.tokenize

> PK47594 and PK58766XML Load performance improvement

Questions??

Bibliography

48

Bibliography

> Look out for GC18-9856

“DB2 Version 9.1 for z/OS – What’s New” SG24-7330

“DB2 9 for z/OS Technical Overview” SG24-7239

“Enhancing SAP by Using DB2 9 for z/OS ” SC18-9858

“DB2 Version 9.1 for z/OS – XML Guide”

> SG24-7315 “DB2 9 pureXML Guide” is for DB2 LUW NOT for z/OS

Recommended