Foundational Data Modeling and Schema Transformations for XML Data Engineering

Stephen W. Liddle, Information Systems Department

Reema Al-Kamha & David W. Embley, Computer Science Department

Brigham Young University, Provo, Utah

24 April 2008, UNISCON 2008, Klagenfurt, Austria

XML Data Engineering

- Model XML conceptually
- Map conceptual models to XML
- Reverse-engineer XML to conceptual models
- Ensure properties
  - Information-preserving transformations
  - Constraint-preserving transformations
  - Redundancy-free guarantees

C-XML

Modeling XML Conceptually

- Scaling the mountain of abstraction
- Delicate balance
  - Enough modeling constructs
  - But not too many
- High-level capture of essentials
- Avoidance of low-level implementation details
- Formal but easily understood
- XML needs better abstractions

XML Schema/Model Mismatch

XML features not explicitly supported in traditional conceptual models:
- Ordered lists of concepts
- Choice of concept from among several
- Mixed content
- Use of content from another model
- Nested information hierarchies

C-XML

Missing Modeling Constructs (1)

- Sequence structure
  - Parent concept
  - Ordered child concepts
  - Constrained recurrence of children
  - Constrained recurrence of sequence itself

<xs:sequence minOccurs="1" maxOccurs="2">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
  <xs:element name="LastName" type="xs:string"/>
</xs:sequence>
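To make the two levels of recurrence concrete, the sequence above can be read as a regular expression over child names. The following is a minimal sketch, not part of the original slides; the tuple encoding and the names `NAME_SEQUENCE`, `sequence_pattern`, and `conforms` are assumptions introduced for illustration.

```python
import re

# Hypothetical encoding of the xs:sequence above: (child name, minOccurs, maxOccurs).
NAME_SEQUENCE = [("FirstName", 1, 1), ("MiddleName", 0, 2), ("LastName", 1, 1)]


def sequence_pattern(children, min_occurs, max_occurs):
    """Build a regex over space-terminated child names for an ordered sequence."""
    body = "".join(
        f"(?:{re.escape(name)} ){{{lo},{hi}}}" for name, lo, hi in children
    )
    # The outer quantifier models the recurrence of the sequence itself.
    return re.compile(f"(?:{body}){{{min_occurs},{max_occurs}}}")


def conforms(child_names, pattern):
    """Return True if the ordered child names match the content model."""
    return pattern.fullmatch("".join(name + " " for name in child_names)) is not None


# Sequence as a whole may occur once or twice (minOccurs="1" maxOccurs="2").
NAME_PATTERN = sequence_pattern(NAME_SEQUENCE, 1, 2)
```

For example, `conforms(["FirstName", "MiddleName", "LastName"], NAME_PATTERN)` holds, while three `MiddleName` children in one pass, or children out of order, are rejected.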

Missing Modeling Constructs (2)

- Choice structure
  - Parent concept
  - Choose one child concept from several alternatives
  - Constrained recurrence of chosen child
  - Constrained recurrence of choice itself

<xs:choice maxOccurs="2">
  <xs:element name="PhoneNumber" type="xs:string" minOccurs="1" maxOccurs="2"/>
  <xs:element name="Email" type="xs:string"/>
  <xs:element name="Fax" type="xs:string"/>
</xs:choice>
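The interaction between the recurrence of the chosen child and the recurrence of the choice itself can be sketched as a small backtracking check. This is an illustrative sketch only; the dictionary encoding and the names `CONTACT_CHOICE` and `matches_choice` are assumptions, and the function assumes every alternative has minOccurs of at least 1, as in the snippet above.

```python
# Hypothetical encoding of the xs:choice above: child name -> (minOccurs, maxOccurs).
CONTACT_CHOICE = {"PhoneNumber": (1, 2), "Email": (1, 1), "Fax": (1, 1)}


def matches_choice(children, alternatives, min_occurs=1, max_occurs=1):
    """Check ordered child names against a choice that may itself repeat."""

    def consume(pos, rounds):
        # Accept once every child is consumed within the allowed number of rounds.
        if pos == len(children):
            return min_occurs <= rounds <= max_occurs
        if rounds == max_occurs:
            return False
        name = children[pos]
        if name not in alternatives:
            return False
        lo, hi = alternatives[name]
        run = 0
        # One round picks a single alternative and takes lo..hi consecutive copies.
        while pos + run < len(children) and children[pos + run] == name and run < hi:
            run += 1
            if run >= lo and consume(pos + run, rounds + 1):
                return True
        return False

    return consume(0, 0)
```

With `max_occurs=2`, the list `["Email", "Fax"]` is accepted as two rounds of the choice, whereas `["Email", "Fax", "Email"]` would need a third round and is rejected.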

Missing Modeling Constructs (3)

- Mixed attribute
  - Allows character and element data to be intertwined
  - <xs:complexType mixed="true">
- Any and anyAttribute structures
  - Insert structures from other namespaces
  - Constrained recurrence
  - <xs:any namespace="##other" minOccurs="0"/>
  - <xs:anyAttribute namespace="##any"/>
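As an illustration of what intertwined character and element data looks like in an instance document, the sketch below walks a mixed-content element with Python's standard `xml.etree.ElementTree`, which exposes the interleaved text through each element's `text` and `tail` attributes. The `Note`, `Name`, and `Day` element names are hypothetical and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of a mixed="true" complex type; element names are illustrative.
note = ET.fromstring("<Note>Call <Name>Ann</Name> before <Day>Friday</Day>.</Note>")


def mixed_parts(element):
    """Return the interleaved character data and child elements in document order."""
    parts = []
    if element.text:
        parts.append(("text", element.text))
    for child in element:
        parts.append(("element", child.tag))
        if child.tail:  # character data that follows the child element
            parts.append(("text", child.tail))
    return parts
```

Calling `mixed_parts(note)` yields text and element entries alternating in document order, which is the ordering information a conceptual model has to capture for mixed content.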

Missing Modeling Constructs (4)

- Nesting of hierarchical structures
  - Key organizational characteristic of XML
  - Arbitrarily complex nesting possible

C-XML Example

C-XML TO XML SCHEMA

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    <xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:keyref>
    <xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./GradStudents/GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:keyref>
  </xs:element>
  <xs:element name="Students">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Student" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:choice minOccurs="1" maxOccurs="1">
                <xs:element name="StudentName" type="xs:string"/>
                <xs:sequence>
                  <xs:element name="FirstName" type="xs:string"/>
                  <xs:element name="MiddleNames">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="MiddleName" minOccurs="0" maxOccurs="2">
                          <xs:complexType>
                            <xs:attribute name="MiddleName" type="xs:string" use="required"/>
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                    <xs:key name="MiddleName-Key">
                      <xs:selector xpath="./MiddleName"/>
                      <xs:field xpath="@MiddleName"/>
                    </xs:key>
                  </xs:element>
                  <xs:element name="LastName" type="xs:string"/>
                </xs:sequence>
              </xs:choice>
              <xs:element name="Semester-Course-Grades">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="Semester" use="required"/>
                        <xs:attribute ref="Course" use="required"/>
                        <!-- C-XML: forall x (Course(x)=>exists [0:*] <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3) )) -->
                        <xs:attribute name="Grade" type="xs:string" use="required"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
                <xs:key name="Semester-Course-Grade-Key">
                  <xs:selector xpath="./Semester-Course-Grade"/>
                  <xs:field xpath="@Semester"/>
                  <xs:field xpath="@Course"/>
                  <xs:field xpath="@Grade"/>
                </xs:key>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="StudentOID" type="xs:string" use="required"/>
            <xs:attribute name="StudentID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="StudentOID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentOID"/>
    </xs:key>
    <xs:key name="StudentID-Key">
      <xs:selector xpath="./Student"/>
      <xs:field xpath="@StudentID"/>
    </xs:key>
  </xs:element>
  <xs:element name="Courses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute ref="Course" use="required"/>
            <xs:attribute name="Department" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="Course-Key">
      <xs:selector xpath="./Course"/>
      <xs:field xpath="@Course"/>
    </xs:key>
  </xs:element>
  <xs:element name="GradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="GradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="GradStudentOID" type="xs:string" use="required"/>
            <xs:attribute name="Advisor" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="GradStudentOID-Key">
      <xs:selector xpath="./GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:element name="UndergradStudents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="UndergradStudent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="UndergradStudentOID" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <xs:key name="UndergradStudentOID-Key">
      <xs:selector xpath="./UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:key>
  </xs:element>
  <xs:attribute name="Course" type="xs:string"/>
</xs:schema>

C-XML → XML Schema

Algorithm Overview

- Generate a forest of scheme trees
- Translate an individual object set
- Translate scheme-tree collections of object sets
- Create a root node
- Add uniqueness constraints
- Translate generalization/specialization hierarchies

Generate Scheme Trees

(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*

(Course, Department)*

(GradStudent, Advisor)*

(UndergradStudent)*

Scheme-tree nodes:

Student, StudentID, StudentName, FirstName, LastName
  MiddleName
  Course, Semester, Grade
Course, Department
GradStudent, Advisor
UndergradStudent
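The parenthesized notation for scheme trees maps directly onto a nested node structure. The following is a minimal sketch of a parser for that notation, assuming only the syntax visible in the slides (comma-separated names, nested parentheses, and a trailing `*`); the function name `parse_scheme_tree` and the dictionary layout are hypothetical.

```python
def parse_scheme_tree(text):
    """Parse '(A, B, (C)*)*' into {'names': [...], 'children': [...]}."""
    pos = 0

    def node():
        nonlocal pos
        assert text[pos] == "(", "a scheme tree starts with '('"
        pos += 1
        names, children, token = [], [], ""
        while text[pos] != ")":
            ch = text[pos]
            if ch == "(":
                children.append(node())  # nested scheme-tree node
                continue
            if ch == ",":
                if token.strip():
                    names.append(token.strip())
                token = ""
            else:
                token += ch
            pos += 1
        if token.strip():
            names.append(token.strip())
        pos += 1  # skip ")"
        if pos < len(text) and text[pos] == "*":
            pos += 1  # skip the repetition marker
        return {"names": names, "children": children}

    return node()
```

Parsing the Student scheme tree this way gives one node holding the five object-set names and two child nodes, one for `MiddleName` and one for `Course, Semester, Grade`.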

Individual Object Sets

<xs:attribute name="Department" type="xs:string"/>
<xs:attribute name="Course" type="xs:string"/>
<xs:attribute ref="Course"/>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="Student">
  <xs:complexType>
    ...
    <xs:attribute name="StudentOID" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

Scheme-Tree Translation

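[Figure: scheme-tree nodes and the container elements generated for them]

Container elements: Students, MiddleNames, Course-Semester-Grades, Courses, GradStudents, UndergradStudents

Member elements: Student, MiddleName, Course-Semester-Grade, Course, GradStudent, UndergradStudent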

Scheme-Tree Translation

<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="Semester-Course-Grades">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  ...
</xs:element>
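Both fragments follow the same container/member pattern: a plural wrapper element whose sequence holds an unbounded member element. The sketch below generates that pattern with `xml.etree.ElementTree`; it is an illustration of the pattern only, not the authors' translation algorithm, and the helper name `container_element` is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the xs: prefix


def container_element(container, member, min_occurs=1):
    """Build <container><complexType><sequence><member maxOccurs="unbounded">."""
    outer = ET.Element(f"{{{XS}}}element", {"name": container})
    outer_type = ET.SubElement(outer, f"{{{XS}}}complexType")
    sequence = ET.SubElement(outer_type, f"{{{XS}}}sequence")
    attrs = {"name": member, "maxOccurs": "unbounded"}
    if min_occurs != 1:  # minOccurs="1" is the XML Schema default, so omit it
        attrs["minOccurs"] = str(min_occurs)
    inner = ET.SubElement(sequence, f"{{{XS}}}element", attrs)
    ET.SubElement(inner, f"{{{XS}}}complexType")  # member content is filled in elsewhere
    return outer
```

For instance, `ET.tostring(container_element("Semester-Course-Grades", "Semester-Course-Grade", 0), encoding="unicode")` produces a skeleton with the same shape as the second fragment, with an empty member type where the slides elide the content.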

Scheme-Tree Translation

<xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:attribute name="Semester" use="required"/>
    <xs:attribute ref="Course" use="required"/>
    <!-- C-XML: forall x (Course(x)=> exists [0:*] <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3) )) -->
    <xs:attribute name="Grade" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

Root Element

Root children: Students, Courses, GradStudents, UndergradStudents

<xs:schema ...>
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    ...
  </xs:element>
  ...
</xs:schema>
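With default occurrence values, `xs:all` requires each listed child exactly once, in any order. A minimal sketch of that check follows; the names `ROOT_CHILDREN` and `satisfies_all` are assumptions introduced for illustration.

```python
from collections import Counter

# Children referenced inside the Root element's xs:all group.
ROOT_CHILDREN = ["Students", "Courses", "GradStudents", "UndergradStudents"]


def satisfies_all(child_names, required=ROOT_CHILDREN):
    """xs:all with default occurrences: each listed child exactly once, any order."""
    return Counter(child_names) == Counter(required)
```

So a `Root` whose children appear as `Courses, Students, UndergradStudents, GradStudents` is acceptable, while a missing or repeated container is not.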

Uniqueness Constraints

<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="StudentOID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentOID"/>
  </xs:key>
  <xs:key name="StudentID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentID"/>
  </xs:key>
</xs:element>
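An `xs:key` with a single attribute field requires that the attribute be present on every selected node and that its values be unique within the scope. The sketch below mimics that check over an instance fragment; it is not a schema validator, and the helper `key_violations` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def key_violations(scope, selector, field):
    """Return problems for an xs:key-like constraint: field present and unique."""
    seen, problems = set(), []
    for node in scope.findall(selector):
        value = node.get(field)
        if value is None:
            problems.append(f"missing @{field}")
        elif value in seen:
            problems.append(f"duplicate @{field}={value}")
        else:
            seen.add(value)
    return problems


# Hypothetical instance data: unique StudentOID values, but a repeated StudentID.
students = ET.fromstring(
    "<Students>"
    '<Student StudentOID="s1" StudentID="100"/>'
    '<Student StudentOID="s2" StudentID="100"/>'
    "</Students>"
)
```

Here `key_violations(students, "./Student", "StudentOID")` finds nothing, while the same call with `"StudentID"` reports the duplicate, mirroring the two keys declared on `Students`.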

Generalization/Specialization

<xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
  <xs:field xpath="@UndergradStudentOID"/>
</xs:keyref>
<xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./GradStudents/GradStudent"/>
  <xs:field xpath="@GradStudentOID"/>
</xs:keyref>
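Each keyref states that every specialization identifier must match some `StudentOID` in the generalization. A rough sketch of that referential check over an instance document is shown below; the helper `dangling_refs` and the sample data are hypothetical, and the key selector is written relative to `Root` rather than `Students`.

```python
import xml.etree.ElementTree as ET


def dangling_refs(scope, key_selector, key_field, ref_selector, ref_field):
    """Return referenced values that do not appear among the key values."""
    keys = {node.get(key_field) for node in scope.findall(key_selector)}
    return [
        node.get(ref_field)
        for node in scope.findall(ref_selector)
        if node.get(ref_field) not in keys
    ]


# Hypothetical instance: g9 names a graduate student with no matching Student.
root = ET.fromstring(
    "<Root>"
    '<Students><Student StudentOID="s1"/><Student StudentOID="s2"/></Students>'
    '<GradStudents><GradStudent GradStudentOID="s1"/>'
    '<GradStudent GradStudentOID="g9"/></GradStudents>'
    '<UndergradStudents><UndergradStudent UndergradStudentOID="s2"/></UndergradStudents>'
    "</Root>"
)
```

Checking `./GradStudents/GradStudent` against `./Students/Student` flags `g9`, while the undergraduate references all resolve.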

XML SCHEMA TO C-XML

XML Schema → C-XML

Algorithm Overview

- Generate object sets for each element & attribute
- Specify built-in and simple types in data frames
- Obtain relationship sets from parent-child connections
- Obtain participation constraints from minOccurs, maxOccurs, and use constraints
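The last step can be sketched as a small mapping from XML Schema occurrence attributes to a `min:max` range, written in the same style as the `[0:*]` range that appears in the C-XML pragma comment earlier. This is an assumption-laden illustration rather than the authors' algorithm: the function name `participation` is hypothetical, and it only covers the child's cardinality within its parent, using the XML Schema defaults (minOccurs and maxOccurs default to 1, and attribute `use` defaults to optional).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def participation(node):
    """Return 'min:max' for an xs:element or xs:attribute declaration."""
    if node.tag == XS + "attribute":
        # An attribute occurs at most once; use="required" makes it mandatory.
        low = "1" if node.get("use") == "required" else "0"
        return f"{low}:1"
    low = node.get("minOccurs", "1")
    high = node.get("maxOccurs", "1")
    return f"{low}:{'*' if high == 'unbounded' else high}"


def declaration(fragment):
    """Parse a one-element schema fragment that uses the xs: prefix."""
    return ET.fromstring(
        fragment.replace(" ", ' xmlns:xs="http://www.w3.org/2001/XMLSchema" ', 1)
    )
```

For example, `participation(declaration('<xs:element name="MiddleName" minOccurs="0" maxOccurs="2"/>'))` gives `0:2`, and a required attribute gives `1:1`.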

Attribute Transformation

Element Transformation

Choice Transformation

Sequence Transformation

Key Constraints Transformation

Substitution Group & Extension Transformation

Observation on Transformations

These transformations to and from C-XML are not inverses of one another

However,

C-XMLXML Schema

C-XML XML Schema

Demo

PROPERTY GUARANTEES

Transformation Properties: C-XML to XML Schema

Theorem 1: … preserves information. Proof: injective

Theorem 2: Allowing for pragma constraints, … preserves constraints. Proof: by construction

Theorem 3: … yields an XML-Schema instance whose complying XML documents are redundancy free. Proof: [TKDE, Aug06]

Transformation Properties: XML Schema to C-XML

Theorem 4: … preserves information. Proof: injective

Theorem 5: … preserves constraints. Proof: by construction

Conclusions

- C-XML models XML conceptually
- Transformations
  - C-XML to XML
  - Reverse-engineer XML to C-XML
- Properties
  - Information preserving
  - Constraint preserving
  - Redundancy-free guarantee

www.deg.byu.edu
