Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping DTDs to Databases
Ronald Bourret
http://www.rpbourret.com
Overview
• Table-based mappings
• Object-based mappings
• Generating relational schemas from DTDs
• Generating DTDs from relational schemas
• Mapping XML Schemas to databases
Table-based Mappings
There is an obvious mapping from this XML document ...
   <A>
      <B>
         <C>ccc</C>
         <D>ddd</D>
         <E>eee</E>
      </B>
      <B>
         <C>fff</C>
         <D>ggg</D>
         <E>hhh</E>
      </B>
   </A>
... to this table
   Table A
      C    D    E
      ...  ...  ...
      ccc  ddd  eee
      fff  ggg  hhh
      ...  ...  ...
How does the mapping work?
• Views the XML document as a single table ...
   <Table>
      <Row>
         <Column_1>...</Column_1>
         ...
         <Column_n>...</Column_n>
      </Row>
      <Row>
         <Column_1>...</Column_1>
         ...
         <Column_n>...</Column_n>
      </Row>
      ...
   </Table>
• ... or a set of tables
   <Tables>
      <Table_1>
         <Row>
            <Column_1>...</Column_1>
            ...
            <Column_n>...</Column_n>
         </Row>
         ...
      </Table_1>
      ...
      <Table_n>
         ...
      </Table_n>
   </Tables>
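A minimal sketch of the single-table view, assuming the root element names the table, each child is a row, and each grandchild is a column. The function name `rows_from_table_xml` is hypothetical; only Python's standard `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

def rows_from_table_xml(xml_text):
    """Read a <Table><Row><Column>...</Column></Row></Table>-shaped document."""
    root = ET.fromstring(xml_text)
    # Each child of the root is one row; each grandchild is one column value.
    rows = [{col.tag: col.text for col in row} for row in root]
    return root.tag, rows

doc = ("<A><B><C>ccc</C><D>ddd</D><E>eee</E></B>"
       "<B><C>fff</C><D>ggg</D><E>hhh</E></B></A>")
table, rows = rows_from_table_xml(doc)
```

The sketch only works for documents with exactly this shape, which is the limitation the table-based mapping has in general.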
Table-based Mappings
• Pros:
  » Easy to understand
  » Easy to write code to transfer data
  » Efficient way to transfer data between relational databases
• Cons:
  » Only works for a small subset of XML documents
• Used by data transfer middleware such as ASP2XML, DatabaseDOM (IBM), DB2XML, DBIx::XML_RDB, ODBC Socket Server, and XML-DB Link
Object-based Mappings
There is also an obvious mapping from this XML document ...
   <A>
      <B>bbb</B>
      <C>ccc</C>
      <D>ddd</D>
   </A>
... to this object ...
   object A {
      B = "bbb"
      C = "ccc"
      D = "ddd"
   }
... to a row in this table
   Table A
      B    C    D
      ...  ...  ...
      bbb  ccc  ddd
      ...  ...  ...
And an obvious mapping from this element type definition ...
   <!ELEMENT A (B, C, D)>
... to this class ...
   class A {
      String B;
      String C;
      String D;
   }
... to this table schema
   CREATE TABLE A (
      B VARCHAR(10) NOT NULL,
      C VARCHAR(10) NOT NULL,
      D VARCHAR(10) NOT NULL
   )
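The step from element type definition to table schema can be sketched in a few lines. This is an illustration only: the function name `create_table_from_element` and the default column type are assumptions, and the sketch handles nothing beyond a flat sequence of child names.

```python
import re

def create_table_from_element(decl, col_type="VARCHAR(10)"):
    """Turn a flat sequence such as <!ELEMENT A (B, C, D)> into CREATE TABLE text."""
    m = re.fullmatch(r"<!ELEMENT\s+(\w+)\s+\(([^()]*)\)>", decl.strip())
    if m is None:
        raise ValueError("only flat content models such as (B, C, D) are handled")
    name, model = m.groups()
    children = [c.strip() for c in model.split(",")]
    if not all(re.fullmatch(r"\w+", c) for c in children):
        raise ValueError("operators such as ?, * and + are not handled here")
    # Every child in a plain sequence is required, hence NOT NULL.
    cols = ", ".join(f"{c} {col_type} NOT NULL" for c in children)
    return f"CREATE TABLE {name} ({cols})"

ddl = create_table_from_element("<!ELEMENT A (B, C, D)>")
```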
A more complex example
• This XML document ...
   <SalesOrder>
      <Number>1234</Number>
      <Customer>Gallagher Industries</Customer>
      <Date>29.10.00</Date>
      <Line Number="1">
         <Part>A-10</Part>
         <Quantity>12</Quantity>
         <Price>10.95</Price>
      </Line>
      <Line Number="2">
         <Part>B-43</Part>
         <Quantity>600</Quantity>
         <Price>3.99</Price>
      </Line>
   </SalesOrder>
• ... maps to these objects ...
   object SalesOrder {
      number = 1234;
      customer = "Gallagher Industries";
      date = 29.10.00;
      lines = {ptrs to Line objects};
   }
   object Line {
      number = 1;
      part = "A-10";
      quantity = 12;
      price = 10.95;
   }
   object Line {
      number = 2;
      part = "B-43";
      quantity = 600;
      price = 3.99;
   }
• ... which map to these rows
   Table SalesOrders
      Number  Customer              Date
      1234    Gallagher Industries  29.10.00
      ...     ...                   ...
      ...     ...                   ...
   Table Lines
      SONumber  Line  Part  Quantity  Price
      1234      1     A-10  12        10.95
      1234      2     B-43  600       3.99
      ...       ...   ...   ...       ...
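The document-to-objects-to-rows path for the sales order can be sketched as follows. The class and function names (`SalesOrder`, `Line`, `load_order`, `to_rows`) are illustrative assumptions, and prices are kept as text to sidestep floating-point rounding.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

ORDER_XML = """<SalesOrder>
  <Number>1234</Number><Customer>Gallagher Industries</Customer><Date>29.10.00</Date>
  <Line Number="1"><Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price></Line>
  <Line Number="2"><Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price></Line>
</SalesOrder>"""

@dataclass
class Line:
    number: int
    part: str
    quantity: int
    price: str  # kept as text in this sketch

@dataclass
class SalesOrder:
    number: int
    customer: str
    date: str
    lines: list = field(default_factory=list)

def load_order(xml_text):
    """Step 1: XML document -> tree of objects."""
    so = ET.fromstring(xml_text)
    order = SalesOrder(int(so.findtext("Number")), so.findtext("Customer"),
                       so.findtext("Date"))
    for ln in so.findall("Line"):
        order.lines.append(Line(int(ln.get("Number")), ln.findtext("Part"),
                                int(ln.findtext("Quantity")), ln.findtext("Price")))
    return order

def to_rows(order):
    """Step 2: objects -> one SalesOrders row plus Lines rows keyed by order number."""
    so_row = (order.number, order.customer, order.date)
    line_rows = [(order.number, l.number, l.part, l.quantity, l.price)
                 for l in order.lines]
    return so_row, line_rows

so_row, line_rows = to_rows(load_order(ORDER_XML))
```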
Objects are data-specific...
• Different for each DTD (schema)
• Model the content (data) of the document
   <Orders>
      <SalesOrder SONumber="12345">
         <Customer CustNumber="543"> ... </Customer>
         <OrderDate>150999</OrderDate>
         <Line LineNumber="1">
            <Part Name="Cherries"> ... </Part>
            <Qty Unit="ton">2</Qty>
         </Line>
      </SalesOrder>
   </Orders>
   Orders
      SalesOrder
         Customer
         Line
            Part
... not the DOM
• Same for all XML documents
• Model the structure of the document
   Element (Orders)
      Element (SalesOrder)
         Attr (SONumber)
         Element (Customer)
         Element (OrderDate)
         Element (Line)
How does the mapping work?
1. Map a DTD to an object schema
2. Map the object schema to a database schema
   » Direct mapping to object-oriented databases
   » Object-relational mapping to relational databases
3. (Optional) Combine steps (1) and (2) for a DTD-to-database mapping
How does data transfer work?
• With intermediate objects:
  1. Transfer data from an XML document to a tree of objects
  2. Transfer data from objects to the database
• Without intermediate objects:
  1. Transfer data directly from an XML document to the database
• Whether intermediate objects are useful depends on the application
The Basic Mapping
Element types are data types
• “Simple” element types have PCDATA-only content
   <!ELEMENT Number (#PCDATA)>
   <!ELEMENT Date (#PCDATA)>
   <!ELEMENT Quantity (#PCDATA)>
• “Complex” element types have element or mixed content and/or attributes
   <!ELEMENT Order (Number, Date, Customer, Line*)>
   <!ELEMENT Line (LineNum, Quantity, Part)>
   <!ELEMENT Customer EMPTY>
   <!ATTLIST Customer CustNum CDATA #REQUIRED
                      Name CDATA #REQUIRED
                      Address CDATA #REQUIRED>
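The simple/complex distinction can be sketched as a small classifier. This is a rough illustration under stated assumptions: `classify` is a hypothetical helper, the regular expressions only cope with straightforward declarations, and an element type counts as simple only if its content is exactly `(#PCDATA)` and it has no attribute list.

```python
import re

DTD = """
<!ELEMENT Order (Number, Date, Customer, Line*)>
<!ELEMENT Customer EMPTY>
<!ATTLIST Customer CustNum CDATA #REQUIRED
                   Name CDATA #REQUIRED>
<!ELEMENT Number (#PCDATA)>
"""

def classify(dtd_text):
    """Return {element type name: 'simple' or 'complex'}."""
    with_attrs = set(re.findall(r"<!ATTLIST\s+(\w+)", dtd_text))
    kinds = {}
    for name, model in re.findall(r"<!ELEMENT\s+(\w+)\s+([^>]+)>", dtd_text):
        pcdata_only = "".join(model.split()) == "(#PCDATA)"
        kinds[name] = "simple" if pcdata_only and name not in with_attrs else "complex"
    return kinds

kinds = classify(DTD)
```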
Mapping “complex” element types
   <!ELEMENT A (B, C)>
   <!ATTLIST A F CDATA #REQUIRED>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (D, E)>
• Map complex element types to classes ...
   class A {
      ...
   }
• ... which are mapped to tables (class tables)
   Table A: ...
Mapping content models
   <!ELEMENT A (B, C)>
   <!ATTLIST A F CDATA #REQUIRED>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (D, E)>
• Map references to simple element types to scalar properties ...
   class A {
      String b;
      ...
   }
• ... which are mapped to data columns
   Table A: Column b, ...
Mapping content models (cont.)
   <!ELEMENT A (B, C)>
   <!ATTLIST A F CDATA #REQUIRED>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (D, E)>
• Map references to “complex” element types to pointer/reference properties ...
   class A {
      String b;
      C c;
      ...
   }
   class C {
      ...
   }
• ... which are mapped to primary / foreign key columns
   Table A: Column b, Column c, ...
   Table C: Column c, ...
Mapping attributes
   <!ELEMENT A (B, C)>
   <!ATTLIST A F CDATA #REQUIRED>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (D, E)>
• Map attributes to scalar properties ...
   class A {
      String b;
      C c;
      String f;
   }
• ... which are mapped to data columns
   Table A: Column b, Column c, Column f
   Table C: Column c, ...
The basic mapping: summary
• Map “complex” element types to classes, then to tables
• Map content models to properties, then to columns
• Map attributes to properties, then to columns
• Join class tables with primary key / foreign key pairs
Some Important Points
References are not definitions
• References must be mapped separately for each content model
   <!ELEMENT Chapter (Title, Section+)>
   <!ELEMENT Appendix (Title, Section+)>
   <!ELEMENT Title (#PCDATA)>
   <!ELEMENT Section (#PCDATA)>
   class Chapter {
      String title;
      String[] section;
   }
   class Appendix {
      String title;
      String[] section;
   }
Data types
• Can map “simple” element types to any scalar data type
   <!ELEMENT Part (Number, Price)>
   <!ELEMENT Number (#PCDATA)>
   <!ELEMENT Price (#PCDATA)>
   class Part {
      String number;
      float price;
   }
   CREATE TABLE Part (
      number CHAR(10) NOT NULL,
      price REAL NOT NULL
   )
Names can be mapped, too
• DTD, object schema, and relational schema can use different names
   <!ELEMENT Part (Number, Price)>
   <!ELEMENT Number (#PCDATA)>
   <!ELEMENT Price (#PCDATA)>
   class PartClass {
      String numberProp;
      float priceProp;
   }
   CREATE TABLE PRT (
      PRTNUM CHAR(10) NOT NULL,
      PRTPRICE REAL NOT NULL
   )
Primary and foreign keys
• Primary key on “one” side of one-to-many relationship
• May be in table of parent or child
   <SalesOrder>
      <Number>123</Number>
      <Date>10/29/00</Date>
      <Line>
         <LineNum>1</LineNum>
         <Part>
            <PartNum>ABC</PartNum>
            <Price>12.95</Price>
         </Part>
         <Quantity>3</Quantity>
      </Line>
      <Line> ... </Line>
   </SalesOrder>
   Table Sales: Number, Date
   Table Lines: SONum, Num, Part, Qty
   Table Parts: Number, Price
Mapping Complex Content Models
Complex content models
• How do you map the following?
<!ELEMENT A (B?, (C | ((D | E | F | G)*, (H | I)+, J?)))>
Sequences
   <!ELEMENT A (B, C, D)>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
   <!ELEMENT D (E, F)>
• Map each reference in a sequence to a property ...
   class A {
      String b;
      String c;
      D d;
   }
• ... which map to tables and columns as appropriate
   Table A: Column b, Column c, Column d
   Table D: Column d, ...
Choices
   <!ELEMENT A (B | C | D)>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
   <!ELEMENT D (E, F)>
• Map each reference in a choice to a property ...
   class A {
      String b;
      String c;
      D d;
   }
• ... which map to nullable columns (no NOT NULL clause)
   CREATE TABLE A (
      b VARCHAR(10),
      c VARCHAR(10),
      d INTEGER
   )
Choices (cont.)
   <A>
      <B>bbb</B>
   </A>
• Property values can be null ...
   object a {
      b = "bbb"
      c = null
      d = null
   }
• ... so column values can be NULL
   Table A
      b    c     d
      ...  ...   ...
      bbb  NULL  NULL
      ...  ...   ...
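A small sketch of the choice mapping using Python's built-in `sqlite3`, assuming an in-memory database and a hypothetical helper `store_a`. Whichever member of the choice is absent simply becomes NULL; the `D` table and its key column are assumptions added so that the reference column `d` has something to point at.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D (d INTEGER PRIMARY KEY, e VARCHAR(10), f VARCHAR(10));
CREATE TABLE A (b VARCHAR(10), c VARCHAR(10), d INTEGER REFERENCES D(d));
""")

def store_a(conn, xml_text):
    """Store one <A> element whose content model is (B | C | D)."""
    a = ET.fromstring(xml_text)
    d_key = None
    d = a.find("D")
    if d is not None:
        cur = conn.execute("INSERT INTO D (e, f) VALUES (?, ?)",
                           (d.findtext("E"), d.findtext("F")))
        d_key = cur.lastrowid
    # findtext returns None for an absent child; sqlite3 stores None as NULL.
    conn.execute("INSERT INTO A (b, c, d) VALUES (?, ?, ?)",
                 (a.findtext("B"), a.findtext("C"), d_key))

store_a(conn, "<A><B>bbb</B></A>")
store_a(conn, "<A><D><E>eee</E><F>fff</F></D></A>")
rows = conn.execute("SELECT b, c, d FROM A ORDER BY rowid").fetchall()
```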
Repeated children
   <!ELEMENT A (B, B, B, C)>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
• Map to multi-valued properties
• Map repeated references to arrays of known size ...
   class A {
      String[3] b;
      String c;
   }
• ... which map to multiple columns (shown) or separate tables
   Table A: Column b1, Column b2, Column b3, ...
Repeated children (cont.)
   <!ELEMENT A (B*, C, D)>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
   <!ELEMENT D (#PCDATA)>
• Map references with * or + operator to arrays of unknown size ...
   class A {
      String[] b;
      String c;
      String d;
   }
• ... which map to separate tables (property tables)
   Table A: Column a, Column c, Column d
   Table B: Column a, Column b
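The property-table idea can be sketched with `sqlite3`: the repeated `B` children go to their own table, joined back to the class table through the key column `a`. The helper `store_a` and the use of an auto-generated integer key are assumptions of this sketch.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (a INTEGER PRIMARY KEY, c VARCHAR(10) NOT NULL, d VARCHAR(10) NOT NULL);
CREATE TABLE B (a INTEGER NOT NULL REFERENCES A(a), b VARCHAR(10) NOT NULL);
""")

def store_a(conn, xml_text):
    """Store one <A> with content model (B*, C, D); returns the generated key."""
    a = ET.fromstring(xml_text)
    cur = conn.execute("INSERT INTO A (c, d) VALUES (?, ?)",
                       (a.findtext("C"), a.findtext("D")))
    key = cur.lastrowid
    # One property-table row per B child, all sharing the parent's key.
    conn.executemany("INSERT INTO B (a, b) VALUES (?, ?)",
                     [(key, b.text) for b in a.findall("B")])
    return key

key = store_a(conn, "<A><B>b1</B><B>b2</B><C>ccc</C><D>ddd</D></A>")
b_rows = conn.execute("SELECT b FROM B WHERE a = ? ORDER BY rowid", (key,)).fetchall()
```

Note that the property table as shown does not record the order of the `B` values; ordering by `rowid` only happens to reflect insertion order in SQLite.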
Optional children
• Map to nullable properties, then to nullable columns
• Applies to children in a choice ...
   <!ELEMENT A (B | C | D)>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
   <!ELEMENT D (E, F)>
   class A {
      String b; // May be null
      String c; // May be null
      D d; // May be null
   }
• ... and to children with ? or * operator
   <!ELEMENT A (B?, C*, D)>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
   <!ELEMENT D (E, F)>
   class A {
      String b; // May be null
      String[] c; // May be null
      D d;
   }
Subgroups
   <!ELEMENT A (B, (C | D))>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
   <!ELEMENT D (E, F)>
• Map references in subgroup to properties of parent class ...
   class A {
      String b;
      String c;
      D d;
   }
• ... which map to columns in class table
   Table A: Column b, Column c, Column d
Subgroups (cont.)
• Works because elements in subgroup are still children of parent
   <!ELEMENT A (B, (C | D))>
   <A>
      <B>bbbbbb</B>
      <C>cccccc</C>
   </A>
   object a {
      b = "bbbbbb"
      c = "cccccc"
      d = null
   }
   <A>
      <B>bbbbbb</B>
      <D>
         <E>eee</E>
         <F>fff</F>
      </D>
   </A>
   object a {
      b = "bbbbbb"
      c = null
      d = ptr. to object d
   }
Subgroups (cont.)
• Repeatability and optionality can be indirect
   <!ELEMENT A (B, (C | (D, E))+)>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
   <!ELEMENT D (E, F)>
   <!ELEMENT E (#PCDATA)>
   class A {
      String b;
      String[] c; // May be null
      D[] d; // May be null
      String[] e; // May be null
   }
Mapping Mixed Content
Model PCDATA as elements
   <A>This text <c>cc</c> makes<b>bbbb</b> no sense<c>cccc</c> except as<b>bb</b> an example.</A>
   <A>
      <pcdata>This text </pcdata>
      <c>cc</c>
      <pcdata> makes</pcdata>
      <b>bbbb</b>
      <pcdata> no sense</pcdata>
      <c>cccc</c>
      <pcdata> except as</pcdata>
      <b>bb</b>
      <pcdata> an example.</pcdata>
   </A>
Mapping mixed content
   <!ELEMENT A (#PCDATA | B | C)*>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
• Map PCDATA and element references to arrays of unknown size ...
   class A {
      String[] pcdata;
      String[] b;
      String[] c;
   }
• ... which are mapped to property tables
   Table A: Column a
   Table PCDATA: Column a, Column pcdata
   Table B: Column a, Column b
   Table C: Column a, Column c
Mixed content example
   <A>This text <c>cc</c> makes<b>bbbb</b> no sense<c>cccc</c> except as<b>bb</b> an example.</A>
   object a {
      pcdata = {"This text ", " makes", " no sense", " except as", " an example."}
      b = {"bbbb", "bb"}
      c = {"cc", "cccc"}
   }
   Table A
      a
      1
   Table PCDATA
      a  pcdata
      1  This text
      1  makes
      1  no sense
      1  except as
      1  an example.
   Table B
      a  b
      1  bbbb
      1  bb
   Table C
      a  c
      1  cc
      1  cccc
Mapping Sibling Order
Siblings
• Sibling means “brother or sister”
• Sibling elements and text have the same parent
   <A>This text <C>cc</C> makes<B>bbbb</B> no sense<C>cccc</C> except as<B>bb</B> an example</A>
   A
      This text
      C
         cc
      makes
      B
         bbbb
      no sense
      C
         cccc
      except as
      B
         bb
      an example
Sibling order
• Sibling order is the order in which siblings occur
   A
      1  This text
      2  C (cc)
      3  makes
      4  B (bbbb)
      5  no sense
      6  C (cccc)
      7  except as
      8  B (bb)
      9  an example
• Sibling order is different from hierarchical order ...
   1  A
   2  This text, C, makes, B, no sense, C, except as, B, an example
   3  cc, bbbb, cccc, bb
Sibling order (cont.)
• ... and from document order
   1   A
   2      This text
   3      C
   4         cc
   5      makes
   6      B
   7         bbbb
   8      no sense
   9      C
   10        cccc
   11     except as
   12     B
   13        bb
   14     an example
Is sibling order important?
• Usually not important in data-centric applications
• Exception: Document validated against DTD
   <Part>
      <Number>123</Number>
      <Desc>Turkey wrench</Desc>
      <Price>10.95</Price>
   </Part>
   <Part>
      <Price>10.95</Price>
      <Desc>Turkey wrench</Desc>
      <Number>123</Number>
   </Part>
   Both documents map to the same object:
   object part {
      number = 123
      desc = "Turkey wrench"
      price = 10.95
   }
Is sibling order important? (cont.)
• Very important in document-centric applications
   <Review>
      <p>Ronald Bourret is an <b>excellent writer</b>. Only an <b>idiot</b> wouldn’t read his work.</p>
   </Review>
   <Review>
      <p>Ronald Bourret is an <b>idiot</b>. Only an <b>excellent writer</b> wouldn’t read his work.</p>
   </Review>
Mapping sibling order
• Store order values in:
  » Order properties/columns
  » Mapping
Order properties/columns
• One order property per element or PCDATA ...
   <!ELEMENT A (#PCDATA | B | C)*>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
   class A {
      String[] pcdata;
      int[] pcdataOrder;
      String[] b;
      int[] bOrder;
      String[] c;
      int[] cOrder;
   }
Order properties/columns (cont.)
• ... each of which is mapped to a separate column
   Table A: a
   Table PCDATA: a, pcdata, pcdataOrder
   Table B: a, b, bOrder
   Table C: a, c, cOrder
Mixed content example
• All sibling order properties share the same order space
   <A>This text <c>cc</c> makes<b>bbbb</b> no sense<c>cccc</c> except as<b>bb</b> an example.</A>
   object a {
      pcdata = {"This text ", " makes", " no sense", " except as", " an example."}
      pcdataOrder = {1, 3, 5, 7, 9}
      b = {"bbbb", "bb"}
      bOrder = {4, 8}
      c = {"cc", "cccc"}
      cOrder = {2, 6}
   }
Mixed content example (cont.)
   Table A
      a
      1
   Table PCDATA
      a  pcdata        order
      1  This text     1
      1  makes         3
      1  no sense      5
      1  except as     7
      1  an example.   9
   Table B
      a  b     order
      1  bbbb  4
      1  bb    8
   Table C
      a  c     order
      1  cc    2
      1  cccc  6
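A sketch of how the shared order space could be computed for mixed content, using `xml.etree.ElementTree`, where text before the first child is `text` and text after each child is that child's `tail`. The function name `split_mixed` and the dictionary layout are assumptions of this sketch, not part of the mapping itself.

```python
import xml.etree.ElementTree as ET

def split_mixed(xml_text):
    """Split mixed content into per-type value lists plus order lists.

    All order values come from one counter, so PCDATA and every child
    element type share the same order space.
    """
    root = ET.fromstring(xml_text)
    values, orders = {"pcdata": []}, {"pcdata": []}
    position = 0

    def add(kind, value):
        nonlocal position
        position += 1
        values.setdefault(kind, []).append(value)
        orders.setdefault(kind, []).append(position)

    if root.text:
        add("pcdata", root.text)
    for child in root:
        add(child.tag, child.text or "")
        if child.tail:
            add("pcdata", child.tail)
    return values, orders

doc = ("<A>This text <c>cc</c> makes<b>bbbb</b> no sense"
       "<c>cccc</c> except as<b>bb</b> an example.</A>")
values, orders = split_mixed(doc)
```

Sorting all (order, value) pairs from the three lists by order value is enough to rebuild the original sequence of text and elements.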
Element content and order properties / columns
• Element content can use order properties / columns
   <!ELEMENT A (B, C, D)>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
   <!ELEMENT D (E, F)>
   class A {
      String b;
      int bOrder;
      String c;
      int cOrder;
      D d;
      int dOrder;
   }
   Table A: a, b, bOrder, c, cOrder
   Table D: a, dOrder, ...
Storing order in the mapping
• Order values can be stored in the mapping
• Children of same type must be groupable
• No ordering within type (data-centric documents only)
   OK:
      <!ELEMENT A (B, C, D)>
      <!ELEMENT A (B*, C+, D)>
      <!ELEMENT A (B, (C | D)+)>
   Not OK:
      <!ELEMENT A (B, C, B, C, D)>
      <!ELEMENT A (B*, (C, D)+)>
   Element Type  Order
   B             1
   C             2
   D             3
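When order lives in the mapping rather than in the data, rebuilding an element amounts to sorting children by their type's position. A minimal sketch, assuming a hypothetical `ORDER` dictionary that plays the role of the element type/order table and a hypothetical helper `build_a`:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: element type -> position among siblings.
ORDER = {"B": 1, "C": 2, "D": 3}

def build_a(children):
    """children: (tag, text) pairs in arbitrary order, e.g. as read from rows."""
    a = ET.Element("A")
    # sorted() is stable, so children of the same type keep their input order;
    # the mapping itself says nothing about order within a type.
    for tag, text in sorted(children, key=lambda item: ORDER[item[0]]):
        ET.SubElement(a, tag).text = text
    return ET.tostring(a, encoding="unicode")

xml_out = build_a([("D", "ddd"), ("B", "bbb"), ("C", "c1"), ("C", "c2")])
```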
Mapping Attributes
Single-valued attributes
• CDATA, ID, IDREF, NMTOKEN, ENTITY, NOTATION, and enumerated
   <!ELEMENT A (B, C)>
   <!ATTLIST A D CDATA #IMPLIED>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
• Map to scalar-valued properties ...
   class A {
      String b;
      String c;
      String d;
   }
• ... which map to columns
   Table A: Column b, Column c, Column d
Multi-valued attributes
• IDREFS, NMTOKENS, and ENTITIES
   <!ELEMENT A (B, C)>
   <!ATTLIST A D IDREFS #IMPLIED>
   <!ELEMENT B (#PCDATA)>
   <!ELEMENT C (#PCDATA)>
• Map to arrays of unknown size ...
   class A {
      String b;
      String c;
      String[] d;
   }
• ... which map to property tables
   Table A: Column a, Column b, Column c
   Table D: Column a, Column d
Order of attributes
• Not significant according to XML Information Set
• No order properties needed
   <A B="bbb" C="ccc" D="ddd"/>
   =
   <A C="ccc" D="ddd" B="bbb"/>
Order in multi-valued attributes
• One order property per multi-valued attribute
• Separate order space for each multi-valued attribute
   <!ELEMENT A EMPTY>
   <!ATTLIST A B IDREFS #IMPLIED
               C NMTOKENS #IMPLIED>
   class A {
      String[] b;
      int[] bOrder;
      String[] c;
      int[] cOrder;
   }
   <A B="dd ee ff" C="gg hh"/>
   object a {
      b = {"dd", "ee", "ff"}
      bOrder = {1, 2, 3}
      c = {"gg", "hh"}
      cOrder = {1, 2}
   }
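Splitting a multi-valued attribute into values and order values is short enough to sketch directly. The helper name `multi_valued` is an assumption; each call starts its own counter, which is what gives every attribute a separate order space.

```python
import xml.etree.ElementTree as ET

def multi_valued(element, attr_name):
    """Return (values, order values) for a whitespace-separated attribute."""
    tokens = element.get(attr_name, "").split()
    return tokens, list(range(1, len(tokens) + 1))

a = ET.fromstring('<A B="dd ee ff" C="gg hh"/>')
b, b_order = multi_valued(a, "B")
c, c_order = multi_valued(a, "C")
```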
ID/IDREF(S) Attributes
• Map to primary key / foreign key relationship
• “Decorate” IDs if not unique across all documents
   <A>
      <B ref_d="1"> ... </B>
      <C ref_d="1"> ... </C>
      <D id="1"> ... </D>
   </A>
   A
      B
      C
      D
   Table A: a, ...
   Table B: a, ref_d, ...
   Table C: a, ref_d, ...
   Table D: id, ...
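One possible way to "decorate" IDs is to prefix them with a per-document identifier, so that `id="1"` in two different documents yields two different key values. The sketch below assumes that scheme; the `doc_id` argument, the `:` separator, and the function name `key_rows` are all illustrative choices.

```python
import xml.etree.ElementTree as ET

def key_rows(doc_id, xml_text):
    """Return decorated primary keys for D and decorated foreign keys for B and C."""
    a = ET.fromstring(xml_text)
    d_rows = [(f"{doc_id}:{d.get('id')}",) for d in a.findall("D")]
    ref_rows = [(el.tag, f"{doc_id}:{el.get('ref_d')}")
                for el in a if el.get("ref_d") is not None]
    return d_rows, ref_rows

doc = '<A><B ref_d="1"/><C ref_d="1"/><D id="1"/></A>'
d_rows_1, ref_rows_1 = key_rows("doc1", doc)
d_rows_2, ref_rows_2 = key_rows("doc2", doc)
```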
Alternate Mappings
Map complex element types to scalar types
• Map references to complex element types to scalar properties, then to columns
• Value is XML (e.g. XHTML)
   <!ELEMENT Part (Number, Desc)>
   <!ELEMENT Number (#PCDATA)>
   <!-- Use Inline entity from XHTML -->
   <!ELEMENT Desc (%Inline;)>
   class Part {
      String num;
      String desc;
   }
   <Part>
      <Number>127</Number>
      <Desc>
         A very <b>big</b> turkey wrench.
      </Desc>
   </Part>
   Table Part
      Num  Desc
      127  A very <b>big</b> turkey wrench.
Map scalar properties to tables
• Useful for storing BLOBs separately
   class Part {
      String num;
      String desc;
   }
   Table Parts: Column num
   Table Descriptions: Column num, Column desc
Additional Comments
What isn’t mapped?
• Physical structure and information:
  » Character and entity references
  » CDATA sections
  » Character encodings
  » Standalone declaration
• Document information:
  » Document type
  » DTD
• Other:
  » Comments
  » Processing instructions
Pros and cons
• Pros:
  » Mapping model works for all XML documents
  » Preserves logical structure of data in XML document
  » Provides basis for easy-to-use, model-driven software
  » Round-tripping at the element/attribute/text level
• Cons:
  » Discards physical structure information
  » Inefficient for mixed content or deeply nested data
• Ideal for data-centric applications
• Poor for document-centric applications
Widely used
• Relational databases such as Oracle 8i, IBM DB2, Informix, and Microsoft SQL Server
• Data transfer middleware such as ADO, XML SQL Utility for Java (Oracle), XML-DBMS, DB/XML Vision, and InterAccess
• Object servers such as Castor, Object Translator (Informix), and Total-e-Business (Bluestone)
• Varying levels of implementation
Generating Database Schema from DTDs
Example
   <!ELEMENT Order (OrderNum, Date, CustNum, Line*)>
   <!ELEMENT OrderNum (#PCDATA)>
   <!ELEMENT Date (#PCDATA)>
   <!ELEMENT CustNum (#PCDATA)>
   <!ELEMENT Line (LineNum, Quantity, Part)>
   <!ELEMENT LineNum (#PCDATA)>
   <!ELEMENT Quantity (#PCDATA)>
   <!ELEMENT Part (PartNum, Price)>
   <!ELEMENT PartNum (#PCDATA)>
   <!ELEMENT Price (#PCDATA)>
1) Generate class tables and primary keys for complex element types
   Order: OrderPK
   Line:  LinePK
   Part:  PartPK
2) Generate columns for single references to simple element types
   Order: OrderPK, OrderNum, Date, CustNum
   Line:  LinePK, LineNum, Quantity
   Part:  PartPK, PartNum, Price
3) Generate foreign keys for references to complex element types
   Order: OrderPK, OrderNum, Date, CustNum
   Line:  LinePK, LineNum, Quantity, OrderFK
   Part:  PartPK, PartNum, Price, LineFK
Generating database schema from DTDs
• Process element types
  » Complex element types generate class tables and primary keys
• Process content models
  » Single references to simple element types generate columns
  » Repeated references to simple element types generate property tables with foreign keys
  » References to complex element types generate foreign keys in remote class tables
  » PCDATA in mixed content generates a property table with a foreign key
Generating database schema from DTDs (cont.)
• Process attributes
  » Single-valued attributes generate columns
  » Multi-valued attributes generate property tables with foreign keys
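The element and content-model rules above can be sketched as a small generator. This is an illustration under several assumptions: the DTD is given as a hand-written dictionary rather than parsed, any name that is not a key of that dictionary is treated as a PCDATA-only element type, attributes and mixed content are left out, and the `PK`/`FK` naming follows the Order/Line/Part example.

```python
# Hypothetical in-memory form of the example DTD:
# complex element type -> list of (referenced element type, repeated?)
DTD = {
    "Order": [("OrderNum", False), ("Date", False), ("CustNum", False), ("Line", True)],
    "Line": [("LineNum", False), ("Quantity", False), ("Part", False)],
    "Part": [("PartNum", False), ("Price", False)],
}

def generate_tables(dtd):
    # 1) Class tables and primary keys for complex element types.
    tables = {name: [f"{name}PK"] for name in dtd}
    # 2) Columns for single references to simple element types;
    #    property tables for repeated references to simple element types.
    for parent, refs in dtd.items():
        for child, repeated in refs:
            if child in dtd:
                continue
            if repeated:
                tables[child] = [f"{parent}FK", child]
            else:
                tables[parent].append(child)
    # 3) Foreign keys in the remote class table for references to complex types.
    for parent, refs in dtd.items():
        for child, _ in refs:
            if child in dtd:
                tables[child].append(f"{parent}FK")
    return tables

tables = generate_tables(DTD)
```

Placing the foreign key in the child's table is one of the two options noted earlier; the sketch makes no attempt to choose data types or to resolve name collisions.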
Generating DTDs from Database Schema
Example
   Orders: OrderNum, Date, CustNum
   Lines:  OrderNum, LineNum, Quantity, PartNum
   Parts:  PartNum, Price
1) Generate element type for root table
   <!ELEMENT Orders ()>
2) Generate PCDATA-only elements for each data column
   <!ELEMENT Orders (Date, CustNum)>
   <!ELEMENT Date (#PCDATA)>
   <!ELEMENT CustNum (#PCDATA)>
3) Generate PCDATA-only element for primary key
   <!ELEMENT Orders (Date, CustNum, OrderNum)>
   <!ELEMENT OrderNum (#PCDATA)>
   <!ELEMENT Date (#PCDATA)>
   <!ELEMENT CustNum (#PCDATA)>
4) Add element for table to which primary key is exported
   <!ELEMENT Orders (Date, CustNum, OrderNum, Lines*)>
   <!ELEMENT OrderNum (#PCDATA)>
   <!ELEMENT Date (#PCDATA)>
   <!ELEMENT CustNum (#PCDATA)>
   <!ELEMENT Lines ()>
5) Process remote table (data columns and primary key column)
   <!ELEMENT Orders (Date, CustNum, OrderNum, Lines*)>
   <!ELEMENT OrderNum (#PCDATA)>
   <!ELEMENT Date (#PCDATA)>
   <!ELEMENT CustNum (#PCDATA)>
   <!ELEMENT Lines (Quantity, LineNum)>
   <!ELEMENT LineNum (#PCDATA)>
   <!ELEMENT Quantity (#PCDATA)>
6) Add element for table to which foreign key corresponds
   <!ELEMENT Orders (Date, CustNum, OrderNum, Lines*)>
   <!ELEMENT OrderNum (#PCDATA)>
   <!ELEMENT Date (#PCDATA)>
   <!ELEMENT CustNum (#PCDATA)>
   <!ELEMENT Lines (Quantity, LineNum, Parts)>
   <!ELEMENT LineNum (#PCDATA)>
   <!ELEMENT Quantity (#PCDATA)>
   <!ELEMENT Parts ()>
7) Process remote table (data columns and primary key column)
   <!ELEMENT Orders (OrderNum, Date, CustNum, Lines*)>
   <!ELEMENT OrderNum (#PCDATA)>
   <!ELEMENT Date (#PCDATA)>
   <!ELEMENT CustNum (#PCDATA)>
   <!ELEMENT Lines (LineNum, Quantity, Parts)>
   <!ELEMENT LineNum (#PCDATA)>
   <!ELEMENT Quantity (#PCDATA)>
   <!ELEMENT Parts (PartNum, Price)>
   <!ELEMENT PartNum (#PCDATA)>
   <!ELEMENT Price (#PCDATA)>
Generating DTDs from database schema
• Process table
  » Generate element type with sequence content model
• Process data columns
  » Generate PCDATA-only elements
  » Add references to generated elements to sequence
Generating DTDs from database schema (cont.)
• Process primary / foreign key columns
  » If key is primary key, optionally:
    • Add PCDATA-only element types for columns in key
    • Add references to element types to sequence
  » Add reference to remote element type to sequence
    • If key is primary key, reference is optional/multiple (* operator)
    • If key is foreign key and is nullable, reference is optional (? operator)
  » Process remote table
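The table-walking procedure can be sketched as a recursive function. The schema description below is a hand-written assumption rather than something read from a database catalog: `columns` lists the data and primary key columns to emit, `exports_to` lists tables that import this table's primary key, and `refers_to` lists tables reached through a non-nullable foreign key. Nullable foreign keys (the `?` case) are not modeled.

```python
# Hypothetical description of the Orders/Lines/Parts schema.
SCHEMA = {
    "Orders": {"columns": ["OrderNum", "Date", "CustNum"],
               "exports_to": ["Lines"], "refers_to": []},
    "Lines": {"columns": ["LineNum", "Quantity"],
              "exports_to": [], "refers_to": ["Parts"]},
    "Parts": {"columns": ["PartNum", "Price"],
              "exports_to": [], "refers_to": []},
}

def generate_dtd(schema, root):
    decls, seen = [], set()

    def process(table):
        if table in seen:
            return
        seen.add(table)
        info = schema[table]
        refs = list(info["columns"])
        # Primary key exported to another table: optional/multiple (* operator).
        refs += [f"{t}*" for t in info["exports_to"]]
        # Non-nullable foreign key: exactly one referenced element.
        refs += info["refers_to"]
        decls.append(f"<!ELEMENT {table} ({', '.join(refs)})>")
        for col in info["columns"]:
            decls.append(f"<!ELEMENT {col} (#PCDATA)>")
        for remote in info["exports_to"] + info["refers_to"]:
            process(remote)

    process(root)
    return decls

dtd_lines = generate_dtd(SCHEMA, "Orders")
```

If two tables share a column name, this sketch would emit the same element type declaration twice, which is one instance of the naming collisions that arise in generation.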
Generation problems
• Naming problems (both directions)
  » Collisions
  » Illegal names
  » Names don’t necessarily make sense
• DTD => database schema
  » Can’t predict data types or lengths
  » Can’t recognize element types/attributes to use as keys
  » Can’t determine if primary key is in parent or child
  » DTD often has excess structure
• Database schema => DTD
  » Can’t recognize order columns or property tables
• Can’t round-trip
Mapping XML Schemas to Databases
Mapping XML Schemas to databases
• Map complex element types to classes (sort of)
  » Map complex type extension to inheritance
• Map simple data types to scalar data types
  » Many facets can’t be mapped
  » Map wildcards to Java Object, C++ pointer to void, etc.
• Treat “all” groups as unordered sequences
• Treat substitution groups as choices
• Map most identity constraints to keys
Resources
• Ronald Bourret’s Papers Page
  » http://www.rpbourret.com/xml/index.htm
• XML:DB.org’s Resources Page
  » http://www.xmldb.org/resources.html
• XML:DB Mailing List
  » http://www.xmldb.org/projects.html
Questions?
Ronald Bourret
http://www.rpbourret.com