© 2006 IBM Corporation
Information On Demand Conference 2007
A New Way to Handle Complex Dynamic Data Type by Using pureXML
Samson Tai, Manager of BetaWorks, Software Group
IBM Software Group | Lotus software
Introduction
DB2 9 pureXML support makes it possible to justify new applications that may not have been practical before, such as office reports and patient medical records.
Agenda
Characteristics of DB2 9 pureXML
Advantages of XML data type
pureXML for office reports / eForms / e-Catalog / e-Patient Record
Querying XML
Creating indexes on XML
Conclusion
Characteristics of DB2 9 pureXML - What is XML?
XML is eXtensible Markup Language
– XML describes data
CSV data example
47; John Doe; 58; Peter Pan; Database systems; 29; SQL; relational
XML data example
<book>
  <authors>
    <author id="47">John Doe</author>
    <author id="58">Peter Pan</author>
  </authors>
  <title>Database systems</title>
  <price>29</price>
  <keywords>
    <keyword>SQL</keyword>
    <keyword>relational</keyword>
  </keywords>
</book>
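The difference between the two examples can be sketched in a few lines. This is a minimal illustration using Python's standard library, not DB2; the function name `book_to_csv` is an assumption made for this sketch. It reads each value from the XML by its element name and then flattens it into the positional CSV line, where the meaning of each field depends only on its position.

```python
import xml.etree.ElementTree as ET

# The book document from the slide, with plain double quotes.
BOOK_XML = """
<book>
  <authors>
    <author id="47">John Doe</author>
    <author id="58">Peter Pan</author>
  </authors>
  <title>Database systems</title>
  <price>29</price>
  <keywords>
    <keyword>SQL</keyword>
    <keyword>relational</keyword>
  </keywords>
</book>
"""


def book_to_csv(xml_text):
    """Flatten the self-describing XML into a positional CSV-style line."""
    book = ET.fromstring(xml_text.strip())
    fields = []
    # Each author contributes its id attribute and its text content.
    for author in book.findall("./authors/author"):
        fields += [author.get("id"), author.text]
    # Values are looked up by element name, not by position.
    fields.append(book.findtext("title"))
    fields.append(book.findtext("price"))
    fields += [keyword.text for keyword in book.findall("./keywords/keyword")]
    return "; ".join(fields)


csv_line = book_to_csv(BOOK_XML)
print(csv_line)
```

Once flattened, nothing in the CSV line says which field is a price and which is a keyword; in the XML form that information travels with the data.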
Characteristics of DB2 9 pureXML - XML is flexible
Easy to expand the data structure
– Schema Flexibility
  • what XML is all about
  • by self describing nature
<book>
  <authors>
    <author id="47">John Doe</author>
    <author id="58">Peter Pan</author>
  </authors>
  <title>Database systems</title>
  <price>29</price>
  <keywords>
    <keyword>SQL</keyword>
    <keyword>relational</keyword>
  </keywords>
</book>
<book>
  <authors>
    <author id="47">John Doe</author>
    <author id="58">Peter Pan</author>
  </authors>
  <title>Database systems</title>
  <price>29</price>
  <keywords>
    <keyword>SQL</keyword>
    <keyword>relational</keyword>
    <keyword>hierarchical</keyword>
  </keywords>
  <review>…….. </review>
  <review>…….. </review>
</book>
Characteristics of DB2 9 pureXML - DB2 V8 XML extender vs DB2 9 pureXML
DB2 V8 XML extender supported two modes for storing XML with trade-offs
– Storing as a CLOB
  • Parsing degrades query performance
– Shredding to relational tables / columns
  • Loses Schema Flexibility
DB2 9 pureXML realizes both
– Schema Flexibility and Performance
[Figure: Query Performance versus Schema Flexibility]
Characteristics of DB2 9 pureXML - Native XML data type support
XML data type support in Application / API
XML data type support in DDL
XML data storage support
– Storing XML data in parsed hierarchical format (W3C XDM)
– Optimal for query
– Schema Flexibility
  • Schema Less
  • One or Multiple Schema
XML indexing
Application: <dept> …<emp>…</emp></dept>
DDL: create table dept (…, doc xml)
Index: create index .. on dept(doc)
Advantages of XML data type
Advantages of XML data type - Single schema – Relational Data
Meta Data definition in relational data
– Is mandatory
– Before inserting data
TABLE tbl with columns A, B, C (Meta Data in the DB2 Catalog)
1. Creating table
   CREATE TABLE tbl( A type, B type, C type)
2. Insert
   INSERT INTO tbl( A, B, C ) ……
Adding column D: TABLE tbl with columns A, B, C, D (Meta Data in the DB2 Catalog)
1. Altering table
   ALTER TABLE tbl ADD COLUMN D type
2. Insert
   INSERT INTO tbl(A, B, C, D) ……
Advantages of XML data type - Schema Less – XML
No XML data structure definition is required as meta data
DB2 pureXML
– parse XML and store
– allows any XML schema in an XML column
  • Fewer columns involved in programming
  • Basis for Ultra-RAD
Inserting rows into TABLE tbl, XML COLUMN doc
INSERT INTO tbl (…, doc ) (customerXML)    <Cust>..</Cust>
INSERT INTO tbl (…, doc ) (productXML)     <Product>..</Product>
INSERT INTO tbl (…, doc ) (orderXML)       <Order>..</Order>
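The schema-less idea can be tried out with a small stand-in. This is a sketch under stated assumptions, not DB2: SQLite has no XML type, so the `doc` column is plain TEXT (closer to the CLOB approach than to pureXML's parsed storage), and the element contents are hypothetical because the slide elides them. The point it illustrates is only that customer, product and order documents go into the same column with no DDL change between inserts.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Stand-in for "create table tbl (…, doc xml)"; TEXT replaces the XML type.
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, doc TEXT)")

# Hypothetical document bodies; only the root element names come from the slide.
documents = [
    "<Cust><name>John Doe</name></Cust>",
    "<Product><price>29</price></Product>",
    "<Order><amount>3</amount></Order>",
]

for doc in documents:
    ET.fromstring(doc)  # reject input that is not well-formed XML
    conn.execute("INSERT INTO tbl (doc) VALUES (?)", (doc,))

# Three differently shaped documents now share one column.
root_tags = [
    ET.fromstring(doc).tag
    for (doc,) in conn.execute("SELECT doc FROM tbl ORDER BY id")
]
print(root_tags)
```

No ALTER TABLE was needed for the second or third document shape, which is the contrast with the relational slide.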
Advantages of XML data type - Multiple Schema – XML
Multiple XML Schemas are allowed in an XML column
– Validating XML data by an XML schema
  • Per Row basis when insert / update
    – w/o affecting existing data
  • Can restrict sequence of data, repeating limits, data types, etc
– Allows continuous evolution of data
  • Schema Evolution
  • Realizes new requirements quickly
TABLE tbl, XML COLUMN doc
~Mar31   INSERT INTO tbl (.., doc )(XML) schema1.0   <A><B>..</B></A>
Apr1~    INSERT INTO tbl (.., doc )(XML) schema1.1   <A><B>..</B><B>..</B></A>
May1~    INSERT INTO tbl (.., doc )(XML) schema1.2   <A><B>...</B><B>...</B><C><D>x</D></C></A>
Advantages of XML data type - Sparse attributes in relational model
Mostly sparse or absent values in many possible attributes
– Merchant Catalog
– Form
– Wage calculation
How do you describe these data in relational model?
[Table: products A to F as rows; attributes Size, Color, Weight, Material, Style, Weave, Power req., Thermal req., Fuel Req., .. as columns; an X marks the few attributes each product actually has]
Advantages of XML data type - Sparse attributes in XML
XML allows
– More natural data representation
  • For representing sparse attributes
  • Less complex and expensive search
– To store in a column
– To justify new applications
XML Column
<product code="A"><size>S</size><color>blue</color><material>pearl</material><style>flat</style><weave>smooth</weave></product>
<product code="B"><size>Large</size><weight>1200</weight><power>200</power><thermal>flat</thermal><fuel>coal</fuel></product>
<product code="C"><color>red</color><weight>25</weight><material>wool</material><style>slim</style></product>
<product code="D"> … </product>
…
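A rough sketch of searching such sparse product documents, using Python's standard library as a stand-in for an XML column. The helper name `codes_where` is an assumption for this illustration; products A, B and C are taken from the slide, and product D is left out because its content is elided there.

```python
import xml.etree.ElementTree as ET

# One document per row of the XML column; each carries only its own attributes.
PRODUCTS = [
    '<product code="A"><size>S</size><color>blue</color>'
    '<material>pearl</material><style>flat</style><weave>smooth</weave></product>',
    '<product code="B"><size>Large</size><weight>1200</weight>'
    '<power>200</power><thermal>flat</thermal><fuel>coal</fuel></product>',
    '<product code="C"><color>red</color><weight>25</weight>'
    '<material>wool</material><style>slim</style></product>',
]


def codes_where(attribute, value):
    """Return the codes of products whose child element `attribute` equals `value`."""
    codes = []
    for text in PRODUCTS:
        product = ET.fromstring(text)
        # findtext returns None when the product lacks the attribute,
        # so absent values need no NULL placeholder.
        if product.findtext(attribute) == value:
            codes.append(product.get("code"))
    return codes


flat_style = codes_where("style", "flat")
coal_fuel = codes_where("fuel", "coal")
print(flat_style, coal_fuel)
```

Attributes a product does not have are simply not present, so the wide, mostly empty table from the relational version never has to exist.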
Using pureXML for office reports / eForms / eCatalog
pureXML for office reports / eForms - Requirements for Data
Requirements
– Various formats / Schema
– Schema Change/Evolution
  • adding new fields
– Searching capability
– Workflow support
pureXML satisfies all
[Figure: many forms (Form1, Form2, … Form N), each changing from an Old to a New version]
pureXML for office reports / eForms - If you implemented it on RDB,…
Schema change involves database maintenance
– Application change is dependent on database change
– TCO for database, DB access and delay of application change
[Figure: application and database layers, Old and New; adding new fields requires a change in both]
pureXML for office reports / eForms - Schema flexibility of XML
Schema flexibility reduces TCO
– No change on the DB side when the schema changes
– Allows continuous evolution of data and applications
[Figure: application and database layers, Old and New; new fields are added in the application while the database keeps the same XML column]
pureXML for office reports / eForms - Productivity improvement by XML
RDB: Strong DB schema dependency in application development
XML: Least DB dependency in application development
– Allows continuous evolution of data and application
– Helps justify projects through lower TCO
[Figure: Pre versus Post split of work between database and application; labels: DB Schema Design / Maintenance, Space management, Using Data Layout, Manages Evolution of data, dependency]
pureXML for office reports / eForms - Storing spread sheets as XML
Storing office spreadsheets in pureXML enables
– Eliminating manual gathering and copy-pasting of spreadsheets
– Powerful searching and aggregation
– Better decision support by leveraging data that belongs to individual persons
– Ubiquitous opportunities for leveraging data
[Figure: .XLS file (product, price, amount, subtotal) → Save As XML → cell oriented .XML file → Transform with XSLT (per format) → <product><price><amount><subtotal></product> → DB2 import → DB2 9 XML → XQuery, SQL/XML → Purchase estimation, Purchase Order, Planning/Decision Support]
pureXML for e-Catalog
Similarly, an e-Catalog usually contains the descriptions for many products
Each product has its own set of attributes
– T-shirt: size, style, color, price
– TV set: brand, view-type, signal-type, screen-size, price
The total number of attributes across all products can be huge
An e-Catalog should let users search efficiently for products of interest via constraints on attributes
– Find all the OIDs of T-shirts with size='M' and price<$25
Schemas for e-Catalog
Horizontal Schema: one big "fat" table H(OID, A1, A2, ..., An)
– Conceptually easy
– Too many columns: impossible for real RDBMS
– Very sparse: a lot of null values, resulting in poor query processing
– High processing and maintenance cost for product changes
Binary Schema: each attribute corresponds to one table Bi(OID, Ai)
– Dense
– A lot of joins are involved in search query: poor query performance
Schemas for e-Catalog (Continued)
Vertical Schema: one big "skinny" table V(OID, attribute_name, attribute_value)
This is the schema used in many commercial e-commerce systems!
– Advantages
  • High Flexibility
  • Ease of schema evolution
  • Low storage overhead (dense)
– Disadvantages
  • Writing SQL queries against V is cumbersome
  • A lot of joins are involved in search queries: query performance is no better than binary schema
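Why queries against V are cumbersome can be seen in a small experiment. The sketch below uses SQLite through Python's standard library rather than DB2, and the sample rows, the `type` attribute and the brand value are assumptions invented for the illustration. It answers the earlier example query (T-shirts with size='M' and price below 25) and needs one self-join of V per attribute constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE V (OID INTEGER, attribute_name TEXT, attribute_value TEXT)"
)

# Hypothetical catalog rows: one row per (product, attribute) pair.
rows = [
    (1, "type", "T-shirt"), (1, "size", "M"), (1, "price", "19"),
    (2, "type", "T-shirt"), (2, "size", "M"), (2, "price", "30"),
    (3, "type", "T-shirt"), (3, "size", "L"), (3, "price", "15"),
    (4, "type", "TV set"), (4, "brand", "Acme"), (4, "price", "20"),
]
conn.executemany("INSERT INTO V VALUES (?, ?, ?)", rows)

# Three constraints on one product require V to be joined with itself twice.
QUERY = """
SELECT t.OID
FROM V AS t
JOIN V AS s ON s.OID = t.OID
JOIN V AS p ON p.OID = t.OID
WHERE t.attribute_name = 'type'  AND t.attribute_value = 'T-shirt'
  AND s.attribute_name = 'size'  AND s.attribute_value = 'M'
  AND p.attribute_name = 'price' AND CAST(p.attribute_value AS REAL) < 25
ORDER BY t.OID
"""

matching_oids = [oid for (oid,) in conn.execute(QUERY)]
print(matching_oids)
```

Every additional constraint adds another join, and all values have to be stored as text and cast at query time, which is the cost the slide attributes to this schema.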
Horizontal Schema
OID  A1   A2   A3   A4   A5
1    v1   v2   v3        v4
2    v5   v6   v7        v8
3    v9   v10            v11
4              v12  v13  v14
5              v15       v16
6                   v17  v18

Binary Schema
OID  A1      OID  A2      OID  A3      OID  A4      OID  A5
1    v1      1    v2      1    v3      4    v13     1    v4
2    v5      2    v6      2    v7      6    v17     2    v8
3    v9      3    v10     4    v12                  3    v11
                          5    v15                  4    v14
                                                    5    v16
                                                    6    v18

Vertical Schema
OID  AttrName  Value
1    A1        v1
1    A2        v2
1    A3        v3
1    A5        v4
2    A1        v5
2    A2        v6
2    A3        v7
2    A5        v8
3    A1        v9
3    A2        v10
3    A5        v11

XML Table
OID  xmlDOC
1    A5
2    A4
3    A5
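pureXML for Patient Medical Records (Schema Flexibility)
[Table: patient record 000001; group "menstrual history" with fields: menstrual status, age at menarche, duration in days, cycle, menstrual flow, …]
[Table: wider record with fields: dietary characteristics, marital history, age at marriage, spouse's health status, spouse's age at death, spouse's cause of death, menstrual status, …; sample values as shown: usual eating preferences, unmarried, 0, healthy, 0, cause of death, 1]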
pureXML for Patient Medical Records (Hybrid Support)
Querying XML
Querying XML - XQuery and SQL/XML overview
XQuery
– Can apply XML predicates: XPath expression
– Returns XML sequence
  • from XML column: db2-fn:xmlcolumn()
  • from SQL/XML: db2-fn:sqlquery()
SQL/XML select
– Can apply
  • XML predicates: WHERE XMLEXISTS()
  • Relational predicates: trivial
– Returns XML sequence
  • from XML column: trivial
  • from XQuery: XMLQUERY()
  • from relational: XMLELEMENT(), etc
– Returns relational
  • from XML: FROM XMLTABLE()
  • from relational: trivial
Any to any manipulation
XPath is the key concept
Querying XML - A Simple XQuery
Example
– Whose phone number = 415 010 1234?
  • Searches an XML column with XPath
  • Returns name element
Wildcard is possible
  • e.g. for any phone
Dept table, doc column
<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <phone>415 010 1234</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>
xquery for $p in db2-fn:xmlcolumn("DEPT.DOC")/dept/employee/phone
where $p[ . ='415 010 1234']
return $p/../name
-------------------------------
<name>John Doe</name>
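For readers without a DB2 instance, the logic of this XQuery can be approximated with Python's standard library. This is only a sketch of the same lookup over one in-memory document, not how DB2 evaluates the query; the function name `names_with_phone` is an assumption. ElementTree cannot step from a phone up to its parent the way `$p/../name` does, so the loop starts from each employee instead.

```python
import xml.etree.ElementTree as ET

# The dept document from the slide, with quoted attribute values.
DEPT_DOC = """
<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <phone>415 010 1234</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>
"""


def names_with_phone(xml_text, number):
    """Return the name of each employee that has a phone equal to `number`."""
    dept = ET.fromstring(xml_text.strip())
    names = []
    for employee in dept.findall("./employee"):
        for phone in employee.findall("phone"):
            # Same comparison as the predicate $p[. = '415 010 1234'].
            if phone.text == number:
                names.append(employee.findtext("name"))
    return names


matches = names_with_phone(DEPT_DOC, "415 010 1234")
print(matches)
```

The repeating phone element needs no special handling: an employee with two phones is simply visited twice by the inner loop.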
Querying XML - A simple SQL/XML
Example
– List the name and phone number of the document
  • Select qualified row(s)
  • Searches an XML column with XPath
  • Return results as relational
Dept table, id, doc column
<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <phone>415 010 1234</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>
select x.* from dept t,
  xmltable('$d/dept/employee/phone' passing t.doc as "d"
    columns "NAME"  char(20) path '../name',
            "PHONE" char(20) path '.') as x
where id=101

NAME                 PHONE
-------------------- --------------------
John Doe             408 555 1212
John Doe             415 010 1234
Peter Pan            408 555 9918
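What XMLTABLE does here, turning one XML document into several relational rows, can be mimicked in plain Python. The sketch below is an approximation using the standard library, with the hypothetical helper name `phone_rows`; it produces one (NAME, PHONE) row per phone element, which is why John Doe appears twice.

```python
import xml.etree.ElementTree as ET

# The dept document from the slide, with quoted attribute values.
DEPT_DOC = """
<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <phone>415 010 1234</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>
"""


def phone_rows(xml_text):
    """Return one (name, phone) tuple per phone element, in document order."""
    dept = ET.fromstring(xml_text.strip())
    rows = []
    for employee in dept.findall("./employee"):
        name = employee.findtext("name")       # column NAME, path '../name'
        for phone in employee.findall("phone"):
            rows.append((name, phone.text))    # column PHONE, path '.'
    return rows


rows = phone_rows(DEPT_DOC)
for name, phone in rows:
    print(f"{name:<20} {phone:<20}")
```

The row-generating expression decides how many rows come out; the column paths only pick values relative to each generated row.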
Creating indexes on XML
Creating Indexes on XML - XML Indexes
XML indexes can be created
– On any elements, attributes
  • Repeating elements
  • Schema less docs
  • Multiple schema docs
  • With namespace
  • With wild card expression
create unique index idx2 on dept(doc) generate key
  using xmlpattern '/dept/employee/@id' as sql double;
create index idx3 on dept(doc) generate key
  using xmlpattern '/dept/employee/phone' as sql varchar(35);
Table dept, column DOC
<dept bldg="101">
  <employee id="901">
    <name>John Doe</name>
    <phone>408 555 1212</phone>
    <phone>415 010 1234</phone>
    <office>344</office>
  </employee>
  <employee id="902">
    <name>Peter Pan</name>
    <phone>408 555 9918</phone>
    <office>216</office>
  </employee>
</dept>
Conclusion
– pureXML extends database applications into areas that may not have been practical before.
– Semi-structured office reports / eForms are a typical use case of pureXML.
– Schema flexibility with high performance is the key technology to reduce TCO, as no schema maintenance is required on the DB side. This promotes rapid application development.
– pureXML drives ubiquitous database applications for leveraging data.