© 2006 IBM Corporation

Information On Demand Conference 2007

A New Way to Handle Complex Dynamic Data Type by Using pureXML

Samson Tai, Manager of BetaWorks, Software Group


IBM Software Group | Lotus software


Introduction

DB2 9 pureXML support makes it possible to justify new applications that may not have been practical before, such as office reports and patient medical records.


Agenda

Characteristics of DB2 9 pureXML

Advantages of XML data type

pureXML for office reports / eForms / e-Catalog / e-Patient Record

Querying XML

Creating indexes on XML

Conclusion


Characteristics of DB2 9 pureXML: What is XML?

XML is eXtensible Markup Language

– XML describes data

XML data example

    <book>
      <authors>
        <author id="47">John Doe</author>
        <author id="58">Peter Pan</author>
      </authors>
      <title>Database systems</title>
      <price>29</price>
      <keywords>
        <keyword>SQL</keyword>
        <keyword>relational</keyword>
      </keywords>
    </book>

CSV data example

    47; John Doe; 58; Peter Pan; Database systems; 29; SQL; relational
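To make the "XML describes data" point concrete, here is a minimal Python sketch (standard library only, not part of the original slides). It reads the book example by element name, while the CSV line can only be read by position. The function name `describe_book` is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book>
  <authors>
    <author id="47">John Doe</author>
    <author id="58">Peter Pan</author>
  </authors>
  <title>Database systems</title>
  <price>29</price>
  <keywords>
    <keyword>SQL</keyword>
    <keyword>relational</keyword>
  </keywords>
</book>
""".strip()

BOOK_CSV = "47; John Doe; 58; Peter Pan; Database systems; 29; SQL; relational"


def describe_book(xml_text):
    """Read the book by element name; no positional knowledge is needed."""
    book = ET.fromstring(xml_text)
    return {
        "authors": [(a.get("id"), a.text) for a in book.findall("authors/author")],
        "title": book.findtext("title"),
        "price": int(book.findtext("price")),
        "keywords": [k.text for k in book.findall("keywords/keyword")],
    }


book = describe_book(BOOK_XML)

# The CSV line carries the same values, but nothing says which field is
# the title or how many authors there are; the reader must know the layout.
csv_fields = [field.strip() for field in BOOK_CSV.split(";")]

print(book)
print(csv_fields)
```

The XML version still works if a third author or keyword is added, whereas every extra value shifts the field positions in the CSV line.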


Characteristics of DB2 9 pureXML: XML is flexible

Easy to expand the data structure

– Schema flexibility
  • what XML is all about
  • comes from its self-describing nature

Original document:

    <book>
      <authors>
        <author id="47">John Doe</author>
        <author id="58">Peter Pan</author>
      </authors>
      <title>Database systems</title>
      <price>29</price>
      <keywords>
        <keyword>SQL</keyword>
        <keyword>relational</keyword>
      </keywords>
    </book>

Expanded document (one more keyword and new review elements):

    <book>
      <authors>
        <author id="47">John Doe</author>
        <author id="58">Peter Pan</author>
      </authors>
      <title>Database systems</title>
      <price>29</price>
      <keywords>
        <keyword>SQL</keyword>
        <keyword>relational</keyword>
        <keyword>hierarchical</keyword>
      </keywords>
      <review>……..</review>
      <review>……..</review>
    </book>


Characteristics of DB2 9 pureXML: DB2 V8 XML extender vs DB2 9 pureXML

DB2 V8 XML extender supported two modes for storing XML, each with a trade-off

– Storing as a CLOB
  • Parsing degrades query performance
– Shredding to relational tables / columns
  • Loses schema flexibility

DB2 9 pureXML realizes both

– Schema flexibility and performance

(Diagram axes: query performance, schema flexibility)


Characteristics of DB2 9 pureXML: Native XML data type support

XML data type support in application / API

    Application: <dept> … <emp>…</emp> </dept>

XML data type support in DDL

    create table dept (…, doc xml)

XML data storage support

– Storing XML data in parsed hierarchical format (W3C XDM)
– Optimal for query
– Schema flexibility
  • Schema-less
  • One or multiple schemas

XML indexing

    create index .. on dept(doc)


Advantages of XML data type


Advantages of XML data type: Single schema – Relational Data

Metadata definition in relational data

– Is mandatory
– Must exist before inserting data

TABLE tbl with columns A, B, C

1. Creating table (metadata is recorded in the DB2 catalog)

    CREATE TABLE tbl (A type, B type, C type)

2. Insert

    INSERT INTO tbl (A, B, C) ……

Adding column D: TABLE tbl with columns A, B, C, D

1. Altering table (metadata in the DB2 catalog is updated)

    ALTER TABLE tbl ADD COLUMN D type

2. Insert

    INSERT INTO tbl (A, B, C, D) ……


Advantages of XML data type: Schema Less – XML

No XML data structure definition is needed as metadata

DB2 pureXML

– parses XML and stores it
– allows any XML schema in an XML column
  • Fewer columns involved in programming
  • Basis for Ultra-RAD

TABLE tbl, XML column doc, holding documents such as

    <Cust>..</Cust>
    <Product>..</Product>
    <Order>..</Order>

Inserting rows

    INSERT INTO tbl (…, doc) (customerXML)
    INSERT INTO tbl (…, doc) (productXML)
    INSERT INTO tbl (…, doc) (orderXML)
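The schema-less insert pattern can be tried out with a small Python sketch. This is only an analogy under stated assumptions: SQLite stands in for DB2, a TEXT column stands in for the XML column `doc`, and the document contents are hypothetical. Unlike pureXML, SQLite stores the text unparsed, so the parse step is done in Python.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Assumption: SQLite has no XML type, so TEXT stands in for the XML column.
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, doc TEXT)")

# Hypothetical customer, product, and order documents with unrelated structures.
documents = [
    "<Cust><name>hypothetical customer</name></Cust>",
    "<Product><code>A</code></Product>",
    "<Order><item>A</item><qty>2</qty></Order>",
]

for doc in documents:
    ET.fromstring(doc)  # parse first: raises ParseError if not well-formed
    conn.execute("INSERT INTO tbl (doc) VALUES (?)", (doc,))

# All three document types live in the same column; no ALTER TABLE was needed.
root_tags = [
    ET.fromstring(doc).tag
    for (doc,) in conn.execute("SELECT doc FROM tbl ORDER BY id")
]
print(root_tags)
```

The table definition never mentions customers, products, or orders; only the documents themselves carry that structure.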


Advantages of XML data type: Multiple Schema – XML

Multiple XML schemas are allowed in an XML column

– Validating XML data against an XML schema
  • On a per-row basis at insert / update
    – without affecting existing data
  • Can restrict the sequence of data, repetition limits, data types, etc.
– Allows continuous evolution of data
  • Schema evolution
  • Realizes new requirements quickly

TABLE tbl, XML column doc: inserts over time

    ~Mar 31   INSERT INTO tbl (.., doc) (XML)   schema1.0
    Apr 1~    INSERT INTO tbl (.., doc) (XML)   schema1.1
    May 1~    INSERT INTO tbl (.., doc) (XML)   schema1.2

Example documents stored in doc

    <A><B>..</B></A>
    <A><B>..</B><B>..</B></A>
    <A><B>...</B><B>...</B><C><D>x</D></C></A>


Advantages of XML data type: Sparse attributes in relational model

Values are mostly sparse or absent across many possible attributes

– Merchant catalog
– Form
– Wage calculation

How do you describe these data in a relational model?

(Table: products A to F as rows; columns Size, Color, Weight, Material, Style, Weave, Power req., Thermal req., Fuel req., ...; each product has an X in only a few of the columns.)


Advantages of XML data type: Sparse attributes in XML

XML allows

– More natural data representation
  • For representing sparse attributes
  • Less complex and expensive search
– To store in a column
– To justify new applications

XML column

    <product code="A">
      <size>S</size><color>blue</color><material>pearl</material><style>flat</style><weave>smooth</weave>
    </product>

    <product code="B">
      <size>Large</size><weight>1200</weight><power>200</power><thermal>flat</thermal><fuel>coal</fuel>
    </product>

    <product code="C">
      <color>red</color><weight>25</weight><material>wool</material><style>slim</style>
    </product>

    <product code="D"> … </product>
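As a rough illustration of searching sparse attributes, the Python sketch below (standard library only) filters the product documents A to C from this slide by whichever elements they happen to contain. The helper name `codes_with` is an assumption made for this example; in DB2 the same search would be expressed with XPath predicates over the XML column.

```python
import xml.etree.ElementTree as ET

# Products A, B, and C as shown on the slide; each has a different set of elements.
PRODUCTS = [
    '<product code="A"><size>S</size><color>blue</color>'
    '<material>pearl</material><style>flat</style><weave>smooth</weave></product>',
    '<product code="B"><size>Large</size><weight>1200</weight>'
    '<power>200</power><thermal>flat</thermal><fuel>coal</fuel></product>',
    '<product code="C"><color>red</color><weight>25</weight>'
    '<material>wool</material><style>slim</style></product>',
]


def codes_with(docs, tag, value=None):
    """Return codes of products that have <tag>, optionally with the given text."""
    hits = []
    for doc in docs:
        product = ET.fromstring(doc)
        found = product.find(tag)
        if found is not None and (value is None or found.text == value):
            hits.append(product.get("code"))
    return hits


print(codes_with(PRODUCTS, "weight"))         # products that have a weight at all
print(codes_with(PRODUCTS, "color", "blue"))  # products whose color is blue
```

Absent attributes simply do not appear in a document, so there are no NULL-filled columns to store or skip.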


Using pureXML for office reports / eForms / eCatalog


pureXML for office reports / eForms: Requirements for Data

Requirements

– Various formats / schemas
– Schema change / evolution
  • adding new fields
– Searching capability
– Workflow support

pureXML satisfies all of them

(Diagram: many form formats, Form1, Form2, ... Form N, each changing from an old version to a new version.)


pureXML for office reports / eForms: If you implemented it on RDB, …

Schema change involves database maintenance

– Application change is dependent on database change
– TCO for the database, DB access, and delay of application change

(Diagram: application and database each hold the form structure; adding new fields to move from the old form to the new form forces a change on the database as well.)


pureXML for office reports / eForms: Schema flexibility of XML

Schema flexibility reduces TCO

– No change on the DB side when the schema changes
– Allows continuous evolution of data and applications

(Diagram: the application adds new fields to move from the old form to the new form; both versions are stored in the same XML column of the database.)


pureXML for office reports / eForms: Productivity improvement by XML

RDB: Strong DB schema dependency in application development

XML: Least DB dependency in application development

– Allows continuous evolution of data and application
– Makes projects justifiable through lower TCO

(Diagram, Pre vs. Post: database and application boxes labelled DB schema design / maintenance, space management, and using data layout, with a dependency between database and application and a "manages evolution of data" label.)


pureXML for office reports / eForms: Storing spreadsheets as XML

Storing office spreadsheets in pureXML makes it possible to

– Eliminate manual gathering and copy-pasting of spreadsheets
– Search and aggregate the data powerfully
– Support better decisions by leveraging data that belongs to individual people
– Find ubiquitous opportunities for leveraging data

(Diagram: a purchase estimation spreadsheet (.XLS file with columns product, price, amount, subtotal) is saved as a cell-oriented .XML file, transformed by XSLT (one stylesheet per format) into XML with <product>, <price>, <amount>, and <subtotal> elements, and loaded with DB2 import into a DB2 9 XML column. XQuery and SQL/XML then serve purchase order and planning / decision support.)


pureXML for e-Catalog

Similarly, an e-Catalog usually contains the descriptions for many products

Each product has its own set of attributes

– T-shirt: size, style, color, price

– TV set: brand, view-type, signal-type, screen-size, price

The total number of attributes across all products can be huge

The e-Catalog should let users search efficiently for products of interest via constraints on attributes

– Find all the OIDs of T-shirts with size='M' and price<$25


Schemas for e-Catalog

Horizontal Schema: one big "fat" table H(OID, A1, A2, ..., An)

– Conceptually easy
– Too many columns: impossible for a real RDBMS
– Very sparse: a lot of null values, resulting in poor query processing
– High processing and maintenance cost for product changes

Binary Schema: each attribute corresponds to one table Bi(OID, Ai)

– Dense
– A lot of joins are involved in a search query: poor query performance


Schemas for e-Catalog (Continued)

Vertical Schema: one big "skinny" table V(OID, attribute_name, attribute_value)

This is the schema used in many commercial e-commerce systems!

– Advantages
  • High flexibility
  • Ease of schema evolution
  • Low storage overhead (dense)
– Disadvantages
  • Writing SQL queries against V is cumbersome
  • A lot of joins are involved in search queries: query performance is no better than with the binary schema
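To show why queries against a vertical schema are cumbersome, here is a small Python sketch using SQLite as a stand-in for the RDBMS. The sample rows, attribute names such as `type`, and the query text are illustrative assumptions, not taken from the slides. The search "T-shirts with size='M' and price<25" needs one self-join of V per attribute constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE V (oid INTEGER, attribute_name TEXT, attribute_value TEXT)"
)
# Hypothetical catalog rows: one row per (product, attribute) pair.
conn.executemany(
    "INSERT INTO V VALUES (?, ?, ?)",
    [
        (1, "type", "T-shirt"), (1, "size", "M"), (1, "price", "19"),
        (2, "type", "T-shirt"), (2, "size", "M"), (2, "price", "30"),
        (3, "type", "T-shirt"), (3, "size", "L"), (3, "price", "15"),
        (4, "type", "TV set"), (4, "brand", "Acme"), (4, "price", "20"),
    ],
)

# Each constraint on an attribute adds another join against V, and values
# must be cast because every attribute_value is stored as text.
QUERY = """
SELECT t.oid
FROM V AS t
JOIN V AS s ON s.oid = t.oid
JOIN V AS p ON p.oid = t.oid
WHERE t.attribute_name = 'type'  AND t.attribute_value = 'T-shirt'
  AND s.attribute_name = 'size'  AND s.attribute_value = 'M'
  AND p.attribute_name = 'price' AND CAST(p.attribute_value AS REAL) < 25
ORDER BY t.oid
"""

matching_oids = [row[0] for row in conn.execute(QUERY)]
print(matching_oids)
```

With three constraints the query already joins V to itself twice; a search over ten attributes would need nine joins, which is the cost the slide refers to.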


Horizontal Schema

    OID  A1   A2   A3   A4   A5
    1    v1   v2   v3        v4
    2    v5   v6   v7        v8
    3    v9   v10            v11
    4              v12  v13  v14
    5              v15       v16
    6                   v17  v18

Binary Schema

    OID  A1        OID  A2        OID  A3        OID  A4        OID  A5
    1    v1        1    v2        1    v3        4    v13       1    v4
    2    v5        2    v6        2    v7        6    v17       2    v8
    3    v9        3    v10       4    v12                     3    v11
                                  5    v15                     4    v14
                                                                5    v16
                                                                6    v18

Vertical Schema (rows for OIDs 4 to 6 are not shown on the slide)

    OID  AttrName  Value
    1    A1        v1
    1    A2        v2
    1    A3        v3
    1    A5        v4
    2    A1        v5
    2    A2        v6
    2    A3        v7
    2    A5        v8
    3    A1        v9
    3    A2        v10
    3    A5        v11

XML Table

    OID  xmlDOC
    1    A5
    2    A4
    3    A5


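pureXML for Patient Medical Records (Schema Flexibility)

(Figure: two sample patient-record tables whose fields differ by patient group.)

– Record 000001, menstrual history: menstrual status, age at menarche, cycle, duration in days, menstrual volume, …
– Second table, columns: dietary characteristics, marital history, age at marriage, spouse's health status, spouse's age at death, spouse's cause of death, menstrual status, …
– Second table, sample row values: usual food preferences, unmarried, 0, healthy, 0, cause of death, 1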


pureXML for Patient Medical Records (Hybrid Support)


Querying XML


Querying XML: XQuery and SQL/XML overview

XQuery

– Can apply XML predicates: XPath expression
– Returns XML sequence
  • from XML column: db2-fn:xmlcolumn()
  • from SQL/XML: db2-fn:sqlquery()

SQL/XML select

– Can apply
  • XML predicates: WHERE XMLEXISTS()
  • Relational predicates: trivial
– Returns XML sequence
  • from XML column: trivial
  • from XQuery: XMLQUERY()
  • from relational: XMLELEMENT(), etc.
– Returns relational
  • from XML: FROM XMLTABLE()
  • from relational: trivial

Any-to-any manipulation

XPath is the key concept


Querying XML: A simple XQuery

Example

– Whose phone number is 415 010 1234?
  • Searches an XML column with XPath
  • Returns the name element

A wildcard is possible

– e.g. for any phone

    xquery
    for $p in db2-fn:xmlcolumn("DEPT.DOC")/dept/employee/phone
    where $p[. = '415 010 1234']
    return $p/../name

    -------------------------------
    <name>John Doe</name>

Dept table, doc column

    <dept bldg="101">
      <employee id="901">
        <name>John Doe</name>
        <phone>408 555 1212</phone>
        <phone>415 010 1234</phone>
        <office>344</office>
      </employee>
      <employee id="902">
        <name>Peter Pan</name>
        <phone>408 555 9918</phone>
        <office>216</office>
      </employee>
    </dept>
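For readers without a DB2 instance, the logic of this XQuery can be approximated in plain Python. This is a sketch with the standard library's ElementTree over a single in-memory document, not DB2's engine; `names_by_phone` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

DEPT_DOC = """
<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <phone>415 010 1234</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>
""".strip()


def names_by_phone(doc_text, phone):
    """Mirror of /dept/employee/phone[. = phone]/../name for one document."""
    dept = ET.fromstring(doc_text)
    names = []
    for employee in dept.findall("employee"):
        # An employee may have several <phone> elements; any match qualifies.
        phones = [p.text for p in employee.findall("phone")]
        if phone in phones:
            names.append(employee.findtext("name"))
    return names


print(names_by_phone(DEPT_DOC, "415 010 1234"))
```

The loop over `phone` elements plays the role of the XPath step, and returning the sibling `name` corresponds to `$p/../name`.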


Querying XML: A simple SQL/XML

Example

– List the names and phone numbers in the document
  • Selects the qualifying row(s)
  • Searches an XML column with XPath
  • Returns the results as relational data

    select x.*
    from dept t,
         xmltable('$d/dept/employee/phone' passing t.doc as "d"
                  columns "NAME"  char(20) path '../name',
                          "PHONE" char(20) path '.') as x
    where id=101

    NAME                 PHONE
    -------------------- --------------------
    John Doe             408 555 1212
    John Doe             415 010 1234
    Peter Pan            408 555 9918

Dept table, id and doc columns

    <dept bldg="101">
      <employee id="901">
        <name>John Doe</name>
        <phone>408 555 1212</phone>
        <phone>415 010 1234</phone>
        <office>344</office>
      </employee>
      <employee id="902">
        <name>Peter Pan</name>
        <phone>408 555 9918</phone>
        <office>216</office>
      </employee>
    </dept>
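The row-generating behaviour of XMLTABLE in this query, one output row per phone element with the employee name repeated, can be sketched in Python as follows. This is an approximation with ElementTree for a single document; `phone_rows` is a hypothetical name, and no relational filtering such as `where id=101` is modelled.

```python
import xml.etree.ElementTree as ET

DEPT_DOC = """
<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <phone>415 010 1234</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>
""".strip()


def phone_rows(doc_text):
    """Flatten the document into (NAME, PHONE) rows, one per <phone> element."""
    dept = ET.fromstring(doc_text)
    rows = []
    for employee in dept.findall("employee"):
        name = employee.findtext("name")    # corresponds to path '../name'
        for phone in employee.findall("phone"):
            rows.append((name, phone.text))  # corresponds to path '.'
    return rows


for name, phone in phone_rows(DEPT_DOC):
    print(f"{name:<20} {phone:<20}")
```

Because the row-generating expression targets `phone`, an employee with two phone numbers yields two rows, matching the result table on the slide.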


Creating indexes on XML


Creating Indexes on XML: XML Indexes

XML indexes can be created

– On any elements or attributes
  • Repeating elements
  • Schema-less docs
  • Multiple-schema docs
  • With namespaces
  • With wildcard expressions

    create unique index idx2 on dept(doc)
      generate key using xmlpattern '/dept/employee/@id' as sql double;

    create index idx3 on dept(doc)
      generate key using xmlpattern '/dept/employee/phone' as sql varchar(35);

Table dept, column DOC

    <dept bldg="101">
      <employee id="901">
        <name>John Doe</name>
        <phone>408 555 1212</phone>
        <phone>415 010 1234</phone>
        <office>344</office>
      </employee>
      <employee id="902">
        <name>Peter Pan</name>
        <phone>408 555 9918</phone>
        <office>216</office>
      </employee>
    </dept>
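Conceptually, an index on the pattern '/dept/employee/phone' maps every matching value to the rows that contain it. The Python sketch below illustrates that idea with a dictionary. It is a simplified mental model, not how DB2 implements XML indexes; the row id 101 and the name `build_phone_index` are assumptions for illustration.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

DEPT_DOC = """
<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <phone>415 010 1234</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>
""".strip()

# Hypothetical table contents: row id -> XML document in the doc column.
ROWS = {101: DEPT_DOC}


def build_phone_index(rows):
    """Map each /dept/employee/phone value to the ids of rows containing it."""
    index = defaultdict(list)
    for row_id, doc_text in rows.items():
        dept = ET.fromstring(doc_text)
        # A repeating element contributes one key per occurrence.
        for phone in dept.findall("employee/phone"):
            index[phone.text].append(row_id)
    return index


index = build_phone_index(ROWS)
print(sorted(index))
print(index["415 010 1234"])
```

One document produces three index keys here, which is why indexes on repeating elements work without any schema declaring how many phones an employee may have.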


Conclusion

– pureXML extends database applications into areas that may not have been practical before.

– Semi-structured office reports / eForms are a typical use case for pureXML.

– Schema flexibility with high performance is the key technology for reducing TCO, because no schema maintenance is required on the DB side. This promotes rapid application development.

– pureXML drives ubiquitous database applications for leveraging data.