Native XML Support in DB2 9 for z/OS
Phil Grainger, CA
2
Agenda
> Introduction
> What exactly IS XML?
> DB2 9 XML storage
> DB2 9 XML processing
> Further thoughts on XML and DB2
> Bibliography
3
What IS XML?
> eXtensible Markup Language
> Self describing data storage/transport
> Vendor and platform independent
E.g. RSS feeds
Podcasts
> Can contain structured, unstructured or a mix of data
4
An example of XML
> XML consists of a series of nodes which form a hierarchy
> Neither the names nor the contents of the nodes are predefined
This is why it’s termed “extensible”
> A node is enclosed between <nodename> and </nodename> tags
Microsoft Word has a neat way of showing this
> I’ll use some XML borrowed from the Sky television news feed
RSS feeds are a great example of XML usage
> Please note that the screenshots are only the FIRST PART of the XML
So some </end> tags are missing
5
An example of “raw” XML
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel>
<title>Sky News | Strange News | First For Breaking News</title> <link>http://news.sky.com/skynews/strangebuttrue</link>
<image>http://static.sky.com/images/skynews/rss/rss.gif<title>Sky News</title><url>http://static.sky.com/images/skynews/rss/rss.gif</url><link>http://news.sky.com/</link>
</image> <description>Sky News Strange But True</description> <language>en-us</language> <copyright>Copyright 2007, BSKYB. All Rights Reserved.</copyright> <lastBuildDate>Thu, 02 Aug 2007 11:22:30 GMT</lastBuildDate> <category>Sky News</category> <ttl>60</ttl>
<channelLinks> <link name="Uk News" url="http://news.sky.com/skynews/uknews"/> <link name="World" url="http://news.sky.com/skynews/worldnews"/> <link name="Money" url="http://news.sky.com/skynews/money"/> <link name="Business" url="http://news.sky.com/skynews/business"/> </channelLinks>
<item><title><![CDATA[World's Cheekiest Burglar Hunted On Facebook]]></title><link>http://news.sky.com/skynews/article/0,,30100-1278207,00.html?f=rss</link><description><![CDATA[A disgruntled homeowner has fallen victim to possibly the
cheekiest burglar in the world - and has now turned to social networking website Facebook to track him down.]]></description>
<enclosure url="" length="123456" type="image/gif" height="45" width="95"><![CDATA[]]></enclosure>
</item>
6
Or as Microsoft Word shows it
7
Or an alternative Word view
> Showing how all the elements and nodes are related
> Also this makes the hierarchical nature of XML even more obvious
8
So we could draw it as a hierarchy!
rss
  channel
    title
    link
    image
      title
      url
      link
    description
    language
    copyright
    lastBuildDate
    category
    ttl
    channelLinks
      link
    item *
      title
      link
      description
      enclosure
* In our example this is repeating
So, who defines the format?
Hey! Does this look like IMS to you?
9
XML Schemas
> We’ve already seen that XML is infinitely extensible
> Does this mean anarchy?
> It can
> But there are also things called “XML Schemas”
A schema defines what can appear in an XML document
A well formed XML document can still violate an XML schema
10
DB2 9 for z/OS support for XML
> DB2 9 provides a new XML datatype
> An entire XML document can be stored in a single XML column
So one column of one row in a table has one complete document
11
DB2 9 for z/OS support for XML
There are some limitations, for example:
> Only well-formed documents are allowed
All tags must have end tags
Elements must be nested correctly
Attributes must have values (enclosed by " or ')
Tags are case sensitive: </A> doesn’t end <a>
> An XML schema can optionally be applied with the DSN_XMLVALIDATE() function
Providing you have defined a schema to DB2
In database DSNXSR (the XML schema repository, part of the catalog)
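The well-formedness rules above can be tried out away from DB2. This is only a sketch using Python's standard XML parser, not DB2's own parser, and the sample documents and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule on the slide.
samples = {
    "matched tags": "<a><b>text</b></a>",
    "missing end tag": "<a><b>text</a>",
    "bad nesting": "<a><b>text</a></b>",
    "attribute without value": "<a flag>text</a>",
    "case mismatch": "<a>text</A>",
}

results = {name: is_well_formed(doc) for name, doc in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well formed' if ok else 'rejected'}")
```

Only the first sample should be accepted. Checking against a schema is a separate, stricter step that this sketch does not attempt.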
12
DB2 9 for z/OS support for XML
> Documents are not stored as strings
So not comparable with any string data type
> But are manipulated by various XML expressions and functions
Including an XML predicate “function”
13
DB2 9 for z/OS support for XML
> Storage of XML data is a little like LOB storage
> When you create a table with an XML column, you get some other things as well
A hidden column called DB2_GENERATED_DOC_ID_FOR_XML
A unique index on this column
A table space to store the XML data
A table in the above table space
An XML index for the above table
> Luckily DB2 creates all of these things for us!
14
DB2 9 for z/OS support for XML
CREATE TABLE GRAPH02.XML_TABLE2
( KEY_COLUMN INTEGER NOT NULL,
XML_COLUMN XML NOT NULL);
> Wasn’t THAT easy!
> Note that there is no length specification
Maximum XML size is the same as the max LOB size
Currently 2GB
15
DB2 9 for z/OS support for XML
> There are some things you can’t do with an XML column
Sort
Group
Most predicates
Primary, foreign or unique key
> Also, no host languages have XML data type manipulation support
Yet …….
So XML data has to be manipulated as string data
16
Processing XML data
> Inserting data into an XML column is simply a matter of issuing an INSERT statement
Or a LOAD
> The XML document MUST conform to DB2 standards
And must be “well formed”
Look out for SQLCODE -20398 which says you have an error somewhere
– A byte offset IS given, but this is after DB2 has converted the XML to UTF-8
– So may not exactly match where the error is
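Why the byte offset can drift from where you see the error is easy to show. This sketch does not call DB2. It assumes a made-up document containing accented characters and compares the character position of a bad tag with its byte position once the text is encoded as UTF-8.

```python
# Hypothetical document: <oops> is never closed, and the text before it
# contains three characters that each need two bytes in UTF-8.
doc = "<note>Caf\u00e9 d\u00e9j\u00e0 vu<oops></note>"

# Position of the bad tag counted in characters, as an editor would show it.
char_pos = doc.index("<oops>")

# Position of the same tag counted in bytes after conversion to UTF-8.
byte_pos = len(doc[:char_pos].encode("utf-8"))

print(f"character offset: {char_pos}")
print(f"UTF-8 byte offset: {byte_pos}")
print(f"drift: {byte_pos - char_pos} bytes")
```

The more multi-byte characters appear before the error, the further the reported offset moves from the position you count by eye.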
17
Processing XML data
> And, remember, you can optionally apply a schema too
Bear in mind that there WILL be an overhead to applying a schema
18
XML functions
> There are a number of functions for manipulating XML data
> Be careful though, not all the functions starting “XML” are for manipulating XML data
Many are for CREATING XML data from relational data
19
XMLDOCUMENT()
> For creating XML documents from relational data or from parts of other XML documents
> At its simplest, it returns the same as a basic SELECT from the table
But can produce XML documents with all the necessary headers
20
XMLSERIALIZE()
> Converts XML data into textual data
Can include/exclude XML declarations
Converts to a LOB type: BLOB, CLOB or DBCLOB
21
XPath
> Before we can talk about working with these XML data types, we need to talk about XPath
> XPath notation allows you to navigate the XML document
> You can use XPath to return subsets of your documents
22
XPath
> There is not time here for an in-depth XPath discussion
But, for example ….
DB2 needs to know, when we refer to a node name, which specific one we mean
/rss/channel/item/title would allow us to work with the <title> nodes in our XML data
– In our case the <item> node is also a repeating node
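The same navigation can be sketched outside DB2. This uses Python's ElementTree, which supports only a small XPath subset and takes paths relative to the element being searched, so /rss/channel/item/title is written as channel/item/title from the root. The cut-down feed is an assumption based on the sample data.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the feed, trimmed for illustration.
FEED = """<rss version="2.0"><channel>
<title>Sky News</title>
<item><title>Worlds Cheekiest Burglar Hunted On Facebook</title></item>
<item><title>Deadly Petrol Roller Skates Seized</title></item>
</channel></rss>"""

root = ET.fromstring(FEED)  # root is the <rss> element

# The path names exactly which <title> we mean: the one under each <item>.
item_titles = [t.text for t in root.findall("channel/item/title")]

# A different path reaches the channel's own <title> instead.
channel_title = root.findtext("channel/title")

print(item_titles)
print(channel_title)
```

Because <item> repeats, the first path yields one result per item, while the second yields a single value.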
23
XPath
> XPath can be used in SELECT lists
Using XMLQUERY functions, for example
> In predicates
Using the new XMLEXISTS predicate
Returns TRUE or FALSE depending on XPath expression
24
Let’s start simple - XMLQUERY()
> Returns a portion of an XML document matching a query
> Also returns all the subsidiary nodes
SELECT KEY_COLUMN,
XMLSERIALIZE(XMLQUERY('/rss/channel/item
[title="Worlds Cheekiest Burglar Hunted On Facebook"]'
PASSING XML_COLUMN)
AS CLOB(2K))
FROM GRAPH02.XML_TABLE
> Could return
25
XMLQUERY()
> KEY_COLUMN followed by textual XML
1 <item><title>Worlds Cheekiest Burglar Hunted On Facebook</title><link>http://news.sky.com/skynews/article/0,,30100-1278207,00.html?f=rss</link><description>A disgruntled homeowner has fallen victim to possibly the cheekiest burglar in the world - and has now turned to social networking website Facebook to track him down.</description><enclosure url="" length="123456" type="image/gif" height="45" width="95"/></item>
> This is ONE of a repeating set of nodes from one document
26
XMLQUERY()
> HOWEVER, if the XMLQUERY() returns <null>
As it will if it can’t find the text in your document
> A row will still be returned for each row in the table
KEY_COLUMN value and <null>
> We also need a way to specify predicates on the XML data
27
A bit more complex - XMLEXISTS()
> XMLEXISTS() returns TRUE or FALSE depending on whether an XPath expression finds a result
> So we expand our query into:
SELECT KEY_COLUMN,
XMLSERIALIZE(XMLQUERY('/rss/channel/item
[title="Worlds Cheekiest Burglar Hunted On Facebook"]'
PASSING XML_COLUMN)
AS CLOB(2K))
FROM GRAPH02.XML_TABLE
WHERE XMLEXISTS('/rss/channel/item
[title="Worlds Cheekiest Burglar Hunted On Facebook"]'
PASSING XML_COLUMN)
> Now, rows will only be returned where the XPath in XMLEXISTS() finds data
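The difference between extracting in the select list and filtering in the predicate can be mimicked in a few lines. This is a sketch only: the two-row "table" is made up, the helpers query and exists are hypothetical stand-ins for XMLQUERY() and XMLEXISTS(), and query returns just the first match where XMLQUERY() would return them all.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: (KEY_COLUMN, XML_COLUMN) pairs.
ROWS = [
    (1, "<rss><channel><item><title>Worlds Cheekiest Burglar Hunted On Facebook"
        "</title></item></channel></rss>"),
    (2, "<rss><channel><item><title>Deadly Petrol Roller Skates Seized"
        "</title></item></channel></rss>"),
]
PATH = "channel/item[title='Worlds Cheekiest Burglar Hunted On Facebook']"


def query(xml_text):
    """Serialized first matching <item>, or None when nothing matches."""
    match = ET.fromstring(xml_text).find(PATH)
    return None if match is None else ET.tostring(match, encoding="unicode")


def exists(xml_text):
    """True when the path finds at least one node."""
    return ET.fromstring(xml_text).find(PATH) is not None


# Extraction alone: every row comes back, some with nothing in them.
select_only = [(key, query(doc)) for key, doc in ROWS]

# Extraction plus predicate: only rows where the path finds data.
with_predicate = [(key, query(doc)) for key, doc in ROWS if exists(doc)]

print(select_only)
print(with_predicate)
```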
28
Searching
> You can see though that the arguments to XMLQUERY and XMLEXISTS have to be an EXACT match for the content we are searching for
> What if we want to do a wildcarded sort of search?
> XPath has no concept of “%” or “_”, but it does have a series of functions that may help
> One useful one is contains()
Like this:
29
Searching
SELECT KEY_COLUMN,
XMLSERIALIZE(XMLQUERY('/rss/channel
[contains(title,"Facebook")]'
PASSING XML_COLUMN)
AS CLOB(8K))
FROM GRAPH02.XML_TABLE
WHERE XMLEXISTS('/rss/channel
[contains(title,"Facebook")]'
PASSING XML_COLUMN)
30
Searching
> This should be clear, but we are
Wanting data returned that has “Facebook” in the /rss/channel/title node
ONLY for rows that contain “Facebook” in an /rss/channel/title node
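A substring search of this kind can be sketched as follows. Python's ElementTree has no contains() function, so the test is done in Python after selecting the nodes. Two things are assumptions here: the two-row table is made up, and the sketch searches the item titles rather than the channel title.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: (KEY_COLUMN, XML_COLUMN) pairs.
ROWS = [
    (1, "<rss><channel>"
        "<item><title>Worlds Cheekiest Burglar Hunted On Facebook</title></item>"
        "<item><title>Deadly Petrol Roller Skates Seized</title></item>"
        "</channel></rss>"),
    (2, "<rss><channel>"
        "<item><title>Great White Shark Seen In Cornwall</title></item>"
        "</channel></rss>"),
]


def titles_containing(xml_text, needle):
    """Item titles whose text contains needle (case sensitive)."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall("channel/item/title")
            if t.text and needle in t.text]


def matching_keys(needle):
    """Keys of the rows with at least one matching title."""
    return [key for key, doc in ROWS if titles_containing(doc, needle)]


print(matching_keys("Facebook"))
print(matching_keys("facebook"))  # different case, so no match
```

As with the XPath function, the comparison is case sensitive, so the second search finds nothing.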
31
Not just SELECT
> Of course, we could also say something like
DELETE
FROM GRAPH02.XML_TABLE
WHERE XMLEXISTS('/rss/channel
[contains(title,"Facebook")]'
PASSING XML_COLUMN)
> Delete all the rows that contain “Facebook” in a title node
32
XML indexes
> Using XPath notation, you can create indexes on your XML column
CREATE UNIQUE INDEX XML_INDEX
ON GRAPH02.XML_TABLE(XML_COLUMN)
GENERATE KEY USING XMLPATTERN
'/rss/channel/item/title'
AS SQL VARCHAR(128)
> Yes, this IS a unique index
And it DOES constrain the content of the node specified to VARCHAR(128)
33
XML indexes
> So you can also use indexes to constrain the CONTENT of nodes
> In the previous example, we said
GENERATE KEY USING XMLPATTERN
'/rss/channel/item/title'
AS SQL VARCHAR(128)
> Any attempt to insert a document with a <title> longer than 128 characters will fail
34
XML indexes
> Here’s something unusual
> I have two rows in my table
SELECT COUNT(*) does indeed return 2
> So why does REBUILD INDEX say
DSNUCRUL - UNLOAD PHASE STATISTICS - NUMBER OF RECORDS PROCESSED=34 ?
Because each row has MULTIPLE index keys!
Look again at the CREATE INDEX XPath statement
We’re indexing INTO an XML document
Each document (in this case) has 17 occurrences of /rss/channel/item/title
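The arithmetic is easy to reproduce. This sketch builds two made-up documents with 17 items each (the helper make_feed and the generated titles are hypothetical) and counts one key per node matched by the index pattern.

```python
import xml.etree.ElementTree as ET


def make_feed(label, n_items):
    """Build a hypothetical feed with n_items <item> nodes."""
    items = "".join(
        f"<item><title>{label} story {i}</title></item>"
        for i in range(1, n_items + 1)
    )
    return f"<rss><channel><title>{label}</title>{items}</channel></rss>"


# Two rows in the table, 17 items in each document.
table = [make_feed("A", 17), make_feed("B", 17)]

# One index key for every node the pattern matches, across all documents.
keys = [
    title.text
    for doc in table
    for title in ET.fromstring(doc).findall("channel/item/title")
]

row_count = len(table)
key_count = len(keys)
print(f"rows: {row_count}, index keys: {key_count}")
```

The row count stays at 2 while the key count follows the number of matching nodes.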
35
More on XPath
> XPath arguments are case sensitive
> Be VERY careful about how you code them!
> Also, errors in XPath specifications can be hard to debug
Syntax errors are easy: DB2 tells you
XML errors aren’t so simple
– Spelling errors
– Capitalization errors
– Etc.
36
More on XPath
> Why does this not return any data?
SELECT KEY_COLUMN,
XMLSERIALIZE(XMLQUERY('/rss/chanel/item
[title="ITV Profits Take A Dive In First Half"]'
PASSING XML_COLUMN)
AS CLOB(2K))
FROM GRAPH02.XML_TABLE
WHERE XMLEXISTS('/rss/chanel/item
[title="ITV Profits Take A Dive In First Half"]'
PASSING XML_COLUMN)
37
More on XPath
>It’s because we spelled “channel” with one “n”
NO error is returned even though the node “chanel” does not exist in the XML
>Just because no node of that name exists TODAY, that does not mean one will not be there tomorrow
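The silent failure is not peculiar to DB2. This sketch, using Python's ElementTree on a cut-down copy of the feed (an assumption for illustration), shows a misspelled or wrongly capitalized node name simply finding nothing, with no error raised.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the feed, trimmed for illustration.
FEED = """<rss version="2.0"><channel>
<item><title>Worlds Cheekiest Burglar Hunted On Facebook</title></item>
<item><title>Deadly Petrol Roller Skates Seized</title></item>
</channel></rss>"""

root = ET.fromstring(FEED)

good = root.findall("channel/item/title")        # correct spelling
bad = root.findall("chanel/item/title")          # "channel" with one "n"
wrong_case = root.findall("Channel/item/title")  # wrong capitalization

print(len(good), len(bad), len(wrong_case))
```

Both mistakes are valid paths to nodes that happen not to exist, so an empty result is the only clue.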
38
Further thoughts on XML
> Firstly, remember that the XML data is effectively free form
> What is in your XML column could be ANY valid XML data
Each row in the table does not have to contain similar data for the XML column
39
Further thoughts on XML
> My examples just happen to contain two almost identical rows
But I could add a third, very different, document
From a business perspective, this would not be sensible
But DB2 would allow it
40
New Features by APAR
> PK51571, PK51572 and PK51573
XMLTABLE() and XMLCAST()
41
XMLTABLE()
> Turns a “repeating group” in an XML document into rows in a “table”
SELECT X.*
FROM GRAPH02.XML_TABLE G,
XMLTABLE('/rss/channel/item' PASSING G.XML_COLUMN
COLUMNS "SEQ" FOR ORDINALITY,
"TITLE" CHAR(64) PATH 'title')
AS X;
42
XMLTABLE()
> Returns
1 Worlds Cheekiest Burglar Hunted On Facebook
2 Deadly Petrol Roller Skates Seized
3 Why Panda Poo Will Play A Part In Olympics
4 Surgeons Operate By Mobile Phone Light
5 Great White Shark Seen In Cornwall
6 Lightning Strike No Flash In The Pan For Survivor
7 Spooky Scamp Has Skill For Sniffing Death
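The idea of turning a repeating group into rows can be sketched outside DB2 too. This uses Python's ElementTree on a cut-down copy of the feed (an assumption for illustration). Each <item> becomes one row, with a running sequence number standing in for the FOR ORDINALITY column.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the feed, trimmed for illustration.
FEED = """<rss version="2.0"><channel>
<item><title>Worlds Cheekiest Burglar Hunted On Facebook</title></item>
<item><title>Deadly Petrol Roller Skates Seized</title></item>
<item><title>Why Panda Poo Will Play A Part In Olympics</title></item>
</channel></rss>"""

root = ET.fromstring(FEED)

# One (SEQ, TITLE) row per repeating <item>, numbered from 1.
rows = [
    (seq, item.findtext("title"))
    for seq, item in enumerate(root.findall("channel/item"), start=1)
]

for seq, title in rows:
    print(seq, title)
```

Once the data is in row form, ordinary column-style filtering applies, for example keeping only the rows whose title contains a given word.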
43
XMLTABLE() in a View
CREATE VIEW XML_VIEW AS
SELECT X.*
FROM GRAPH02.XML_TABLE G,
XMLTABLE('/rss/channel/item' PASSING G.XML_COLUMN
COLUMNS "SEQ" FOR ORDINALITY,
"TITLE" CHAR(64) PATH 'title')
AS X
44
XMLTABLE() in a View
SELECT * FROM
XML_VIEW
WHERE TITLE LIKE '%Facebook%'
> Now we can use wildcards and column names to access our XML data!!
> Do be careful of performance though
This will require materialisation of the data BEFORE the predicate can be applied
45
New Features by APAR
> PK55585 and PK55831 (still open)
13 new XPath functions
e.g. fn:lower-case, fn:upper-case, fn:matches, fn:position, fn:replace and fn:tokenize
> PK47594 and PK58766
XML Load performance improvement
Questions??
Bibliography
48
Bibliography
> Look out for
GC18-9856 “DB2 Version 9.1 for z/OS – What’s New”
SG24-7330 “DB2 9 for z/OS Technical Overview”
SG24-7239 “Enhancing SAP by Using DB2 9 for z/OS”
SC18-9858 “DB2 Version 9.1 for z/OS – XML Guide”
> SG24-7315 “DB2 9 pureXML Guide” is for DB2 LUW NOT for z/OS