Efficiently Publishing Relational Data as XML Documents. University of Wisconsin-Madison / IBM Almaden Research Center. Jayavel Shanmugasundaram. Joint work with: Rimon Barr, Michael Carey, Bruce Lindsay, Hamid Pirahesh, Berthold Reinwald, Eugene Shekita. Outline: Why? How? Which? Hence.
Efficiently Publishing Relational Data as XML Documents
Jayavel Shanmugasundaram
University of Wisconsin-Madison / IBM Almaden Research Center
Joint work with: Rimon Barr, Michael Carey, Bruce Lindsay, Hamid Pirahesh, Berthold Reinwald, Eugene Shekita
XML Example
<department name="Purchasing">
<emplist>
<employee> John </employee>
<employee> Mary </employee>
</emplist>
<projlist>
<project> Internet </project>
<project> Recycling </project>
</projlist>
</department>
What is the big deal about XML?
• Elegantly models complex, hierarchical/graph-structured data
• Domain-specific tags (unlike HTML)
• Simple!
Fast emerging as the dominant standard for data exchange on the WWW
Why Relational Data?
• Most business data stored in relational databases
• Unlikely to change in the near future
  – Scalability, reliability, performance, tools
Need an efficient means to publish relational data as XML documents
Usage Scenario
(Figure: an application or user sends a query over the Internet to an existing database system (RDBMS) to produce XML documents; the XML result is processed or displayed in a browser.)
Example Relational Schema
Department
  DeptId  DeptName
  10      Purchasing
Project
  ProjId  DeptId  ProjName
  888     10      Internet
  795     10      Recycling
Employee
  EmpId  DeptId  EmpName  Salary
  101    10      John     50K
  91     10      Mary     70K
XML Representation
<department name="Purchasing">
  <emplist>
    <employee> John </employee>
    <employee> Mary </employee>
  </emplist>
  <projlist>
    <project> Internet </project>
    <project> Recycling </project>
  </projlist>
</department>
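The mapping from the three flat tables to this nested document can be sketched in a few lines. This is a minimal illustration, assuming the example rows are held in Python lists and using the standard xml.etree.ElementTree module; the names DEPARTMENT, EMPLOYEE, PROJECT and publish_department are hypothetical, not part of the slides. It nests each department's employees and projects by matching DeptId, which is exactly the join the publishing query must express.

```python
import xml.etree.ElementTree as ET

# Example tables from the slide, as in-memory tuples (hypothetical representation).
DEPARTMENT = [(10, "Purchasing")]                                 # (DeptId, DeptName)
EMPLOYEE = [(101, 10, "John", "50K"), (91, 10, "Mary", "70K")]   # (EmpId, DeptId, EmpName, Salary)
PROJECT = [(888, 10, "Internet"), (795, 10, "Recycling")]        # (ProjId, DeptId, ProjName)


def publish_department(dept_id, dept_name):
    """Nest one department's employees and projects under a <department> element."""
    dept = ET.Element("department", name=dept_name)
    emplist = ET.SubElement(dept, "emplist")
    for _, did, ename, _ in EMPLOYEE:
        if did == dept_id:  # join condition: Employee.DeptId = Department.DeptId
            ET.SubElement(emplist, "employee").text = ename
    projlist = ET.SubElement(dept, "projlist")
    for _, did, pname in PROJECT:
        if did == dept_id:  # join condition: Project.DeptId = Department.DeptId
            ET.SubElement(projlist, "project").text = pname
    return dept


for dept_id, dept_name in DEPARTMENT:
    print(ET.tostring(publish_department(dept_id, dept_name), encoding="unicode"))
```

Note that the Salary column is simply not published; the view decides which relational columns appear in the XML.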
Main Issues
• Relational data is flat; XML is a tagged graph
• How do we specify the translation from the flat model to a graph model?
  – A query language to map from relations to XML
• How do we transform flat representations into tagged, nested representations?
  – Efficient implementation strategies
XML-QL: Default XML View
<defaultview>
<department>
<row> <deptid>10</> <deptname>Purchasing</> </row>
</department>
<employee>
<row> <empid>101</> <deptid>10</> <empname>John</> <salary>50K</> </row>
<row> <empid>91</> <deptid>10</> <empname>Mary</> <salary>70K</> </row>
</employee>
<project>
<row> <projid>888</> <deptid>10</> <projname>Internet</> </row>
<row> <projid>795</> <deptid>10</> <projname>Recycling</> </row>
</project>
</defaultview>
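The default view is a purely mechanical mapping: every table becomes an element and every row becomes a <row> child whose sub-elements are the columns. A minimal sketch, assuming a hypothetical in-memory catalog named TABLES and Python's xml.etree.ElementTree; note that ElementTree writes full closing tags rather than XML-QL's </> shorthand.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: table name -> (column names, rows).
TABLES = {
    "department": (["deptid", "deptname"], [(10, "Purchasing")]),
    "employee": (["empid", "deptid", "empname", "salary"],
                 [(101, 10, "John", "50K"), (91, 10, "Mary", "70K")]),
    "project": (["projid", "deptid", "projname"],
                [(888, 10, "Internet"), (795, 10, "Recycling")]),
}


def default_view(tables):
    """Map every table to an element and every row to a <row> child."""
    root = ET.Element("defaultview")
    for table_name, (columns, rows) in tables.items():
        table_elem = ET.SubElement(root, table_name)
        for row in rows:
            row_elem = ET.SubElement(table_elem, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_elem, column).text = str(value)
    return root


view = default_view(TABLES)
print(ET.tostring(view, encoding="unicode"))
```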
XML-QL: Query Over Default View
WHERE <defaultview.department.row>
        <deptid> $did </> <deptname> $dname </>
      </> IN DefaultView
CONSTRUCT <department name=$dname>
            <emplist>
              { WHERE <defaultview.employee.row>
                        <deptid> $did </> <empname> $ename </>
                      </> IN DefaultView
                CONSTRUCT <employee> $ename </> }
            </emplist>
            <projlist>
              { WHERE <defaultview.project.row>
                        <deptid> $did </> <projname> $pname </>
                      </> IN DefaultView
                CONSTRUCT <project> $pname </> }
            </projlist>
          </>
XML-QL: Query Result
<department name="Purchasing">
  <emplist>
    <employee> John </employee>
    <employee> Mary </employee>
  </emplist>
  <projlist>
    <project> Internet </project>
    <project> Recycling </project>
  </projlist>
</department>
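The semantics of the nested query can be read as nested loops: the outer WHERE binds $did and $dname for each department row, and each inner WHERE/CONSTRUCT block is re-evaluated with that binding of $did, which acts as a join condition. The sketch below is an illustration of those semantics only, assuming the default-view rows are held as Python dicts (DEPARTMENT_ROWS, EMPLOYEE_ROWS, PROJECT_ROWS and run_query are hypothetical names); it is not how an XML-QL engine is implemented, and it concatenates strings without XML escaping.

```python
# Default-view rows as dicts keyed by column tag (a hypothetical in-memory form).
DEPARTMENT_ROWS = [{"deptid": "10", "deptname": "Purchasing"}]
EMPLOYEE_ROWS = [
    {"empid": "101", "deptid": "10", "empname": "John", "salary": "50K"},
    {"empid": "91", "deptid": "10", "empname": "Mary", "salary": "70K"},
]
PROJECT_ROWS = [
    {"projid": "888", "deptid": "10", "projname": "Internet"},
    {"projid": "795", "deptid": "10", "projname": "Recycling"},
]


def run_query():
    """Evaluate the nested WHERE/CONSTRUCT query as nested loops."""
    results = []
    for d in DEPARTMENT_ROWS:
        # Outer WHERE binds $did and $dname for each department row.
        did, dname = d["deptid"], d["deptname"]
        # Each nested block reuses the outer binding of $did (a join on deptid).
        employees = "".join(f"<employee> {e['empname']} </employee>"
                            for e in EMPLOYEE_ROWS if e["deptid"] == did)
        projects = "".join(f"<project> {p['projname']} </project>"
                           for p in PROJECT_ROWS if p["deptid"] == did)
        results.append(f'<department name="{dname}">'
                       f"<emplist>{employees}</emplist>"
                       f"<projlist>{projects}</projlist>"
                       "</department>")
    return results


print(run_query())
```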
XML-QL: Pros and Cons
• Pros:
  – Natural for XML users
  – Infrastructure to build hierarchies of XML views
  – One query language for XML and relational data
• Cons:
  – Ignores existing APIs (JDBC), tools, and support
  – New query language still needs to mature (aggregates, etc.)
SQL: Key Ideas
• Sub-queries to specify nesting
• Scalar functions to specify tags/attributes
  – XML constructors
• Aggregate functions to group child elements
SQL: Query to publish XML
Select DEPT(d.deptname,
            <subquery to produce emplist>,
            <subquery to produce projlist>)
From Department d
SQL: XML Constructor
Define XML Constructor DEPT(dname: varchar(20), emplist: xml, projlist: xml) As (
  <department name=$dname>
    <emplist> $emplist </emplist>
    <projlist> $projlist </projlist>
  </department>
)
SQL: Query to publish XML
Select DEPT(d.deptname,
            (Select XMLAGG(EMP(e.empname))
             From Employee e
             Where e.deptid = d.deptid),
            <subquery to produce projlist>)
From Department d
SQL: XML Constructor
Define XML Constructor EMP(ename: varchar(20)) As (
  <employee> $ename </employee>
)
SQL: Query to publish XML
Select DEPT(d.deptname,
            (Select XMLAGG(EMP(e.empname))
             From Employee e
             Where e.deptid = d.deptid),
            (Select XMLAGG(PROJ(p.projname))
             From Project p
             Where p.deptid = d.deptid))
From Department d
Query Result
<department name="Purchasing">
<emplist>
<employee> John </employee>
<employee> Mary </employee>
</emplist>
<projlist>
<project> Internet </project>
<project> Recycling </project>
</projlist>
</department>
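To see the SQL-based approach execute end to end, the following minimal sketch assumes Python's built-in sqlite3 module. SQLite has no XML constructors or XMLAGG, so the sketch registers Python functions under the slide's names DEPT, EMP and PROJ (PROJ is assumed to mirror EMP, since the slides never define it) and a Python aggregate class as XMLAGG, then runs the final query with its correlated sub-queries. It builds XML by plain string concatenation without escaping and is only a stand-in for a real SQL/XML engine.

```python
import sqlite3


class XmlAgg:
    """Stand-in for XMLAGG: concatenates the child fragments of one group."""

    def __init__(self):
        self.parts = []

    def step(self, fragment):
        if fragment is not None:
            self.parts.append(fragment)

    def finalize(self):
        return "".join(self.parts)


def dept(dname, emplist, projlist):
    """Stand-in for the slide's DEPT constructor."""
    return (f'<department name="{dname}">'
            f"<emplist>{emplist or ''}</emplist>"
            f"<projlist>{projlist or ''}</projlist>"
            "</department>")


conn = sqlite3.connect(":memory:")
conn.create_function("DEPT", 3, dept)
conn.create_function("EMP", 1, lambda ename: f"<employee> {ename} </employee>")
conn.create_function("PROJ", 1, lambda pname: f"<project> {pname} </project>")
conn.create_aggregate("XMLAGG", 1, XmlAgg)
conn.executescript("""
    CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
    CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER, EmpName TEXT, Salary TEXT);
    CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
    INSERT INTO Department VALUES (10, 'Purchasing');
    INSERT INTO Employee VALUES (101, 10, 'John', '50K'), (91, 10, 'Mary', '70K');
    INSERT INTO Project VALUES (888, 10, 'Internet'), (795, 10, 'Recycling');
""")

# The correlated sub-queries are evaluated once per department row.
query = """
    SELECT DEPT(d.DeptName,
                (SELECT XMLAGG(EMP(e.EmpName)) FROM Employee e
                  WHERE e.DeptId = d.DeptId),
                (SELECT XMLAGG(PROJ(p.ProjName)) FROM Project p
                  WHERE p.DeptId = d.DeptId))
    FROM Department d
"""
for (document,) in conn.execute(query):
    print(document)
```

Without an ORDER BY, the order in which the aggregate receives child fragments is not guaranteed, so sibling order in the output may vary between engines.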
SQL: Pros and Cons
• Pros:
  – Reuses SQL infrastructure/API
  – Natural for SQL users
  – Efficient execution inside relational engine
• Cons:
  – Limited support for XML View Composition
Relations to XML: Issues
• Two main differences:
  – Nesting (structuring)
  – Tagging
• Space of alternatives:
                       Late Tagging              Early Tagging
  Late Structuring     Inside / Outside Engine   (none)
  Early Structuring    Inside / Outside Engine   Inside / Outside Engine
Stored Procedure Approach
• Issue queries for sub-structures and tag them
• Could be a Stored Procedure
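(Figure: the DBMS engine holds Department, Employee, and Project; separate SQL queries return (10, Purchasing), then (John) and (Mary), then (Internet) and (Recycling), which are tagged outside the engine.)
• Problem: Too many SQL queries!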
Early Tagging, Early Structuring, Outside Engine
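The stored procedure approach can be sketched as a client-side loop, assuming Python's sqlite3 module and the example tables; the function publish_per_department and its query counter are hypothetical. It issues one query for the departments and then two more per department, so the number of SQL statements grows with the data, which is the cost the slide flags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
    CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER, EmpName TEXT, Salary TEXT);
    CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
    INSERT INTO Department VALUES (10, 'Purchasing');
    INSERT INTO Employee VALUES (101, 10, 'John', '50K'), (91, 10, 'Mary', '70K');
    INSERT INTO Project VALUES (888, 10, 'Internet'), (795, 10, 'Recycling');
""")


def publish_per_department(conn):
    """Issue 1 query for departments, then 2 queries per department."""
    documents, query_count = [], 0
    departments = conn.execute("SELECT DeptId, DeptName FROM Department").fetchall()
    query_count += 1
    for dept_id, dept_name in departments:
        emps = conn.execute(
            "SELECT EmpName FROM Employee WHERE DeptId = ?", (dept_id,)).fetchall()
        projs = conn.execute(
            "SELECT ProjName FROM Project WHERE DeptId = ?", (dept_id,)).fetchall()
        query_count += 2
        # Tag each sub-structure as soon as it arrives (early tagging, outside the engine).
        emplist = "".join(f"<employee> {name} </employee>" for (name,) in emps)
        projlist = "".join(f"<project> {name} </project>" for (name,) in projs)
        documents.append(f'<department name="{dept_name}"><emplist>{emplist}</emplist>'
                         f"<projlist>{projlist}</projlist></department>")
    return documents, query_count


documents, query_count = publish_per_department(conn)
print(query_count, documents)
```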
Correlated CLOB Approach
• Problem: Correlated execution of sub-queries
Select DEPT(d.deptname,
            (Select XMLAGG(EMP(e.empname))
             From Employee e
             Where e.deptid = d.deptid),
            (Select XMLAGG(PROJ(p.projname))
             From Project p
             Where p.deptid = d.deptid))
From Department d
Early Tagging, Early Structuring, Inside Engine
De-Correlated CLOB Approach
• Problem: CLOBs (character large objects) are carried through query processing
With EmpStruct (deptname, empinfo) AS (
       Select d.deptname,
              XMLAGG(EMP(e.empname))
       From department d left join employee e
            on d.deptid = e.deptid
       Group By d.deptname),
     ProjStruct (deptname, projinfo) AS (
       Select d.deptname,
              XMLAGG(PROJ(p.projname))
       From department d left join project p
            on d.deptid = p.deptid
       Group By d.deptname)
Select DEPT(coalesce(d1.deptname, d2.deptname), d1.empinfo, d2.projinfo)
From EmpStruct d1 full join ProjStruct d2
     on d1.deptname = d2.deptname
Early Tagging, Early Structuring, Inside Engine
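The de-correlated plan replaces per-row sub-queries with one join and one GROUP BY per child list. The minimal sketch below assumes Python's sqlite3 module with Python stand-ins registered as EMP, PROJ and XMLAGG (the helper constructor is hypothetical). The constructors return NULL for NULL input so that rows padded by the left join contribute nothing. Because FULL JOIN requires SQLite 3.39 or later, the final full outer join on deptname and the DEPT step are done in Python here.

```python
import sqlite3


class XmlAgg:
    """Stand-in for XMLAGG: concatenates non-NULL fragments of one group."""

    def __init__(self):
        self.parts = []

    def step(self, fragment):
        if fragment is not None:  # rows padded with NULL by the left join add nothing
            self.parts.append(fragment)

    def finalize(self):
        return "".join(self.parts)


def constructor(tag):
    """Build a one-argument XML constructor; NULL input yields NULL."""
    return lambda value: None if value is None else f"<{tag}> {value} </{tag}>"


conn = sqlite3.connect(":memory:")
conn.create_function("EMP", 1, constructor("employee"))
conn.create_function("PROJ", 1, constructor("project"))
conn.create_aggregate("XMLAGG", 1, XmlAgg)
conn.executescript("""
    CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
    CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER, EmpName TEXT, Salary TEXT);
    CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
    INSERT INTO Department VALUES (10, 'Purchasing');
    INSERT INTO Employee VALUES (101, 10, 'John', '50K'), (91, 10, 'Mary', '70K');
    INSERT INTO Project VALUES (888, 10, 'Internet'), (795, 10, 'Recycling');
""")

# EmpStruct and ProjStruct: each is one join plus one GROUP BY, evaluated
# set-at-a-time rather than once per department row.
emp_struct = dict(conn.execute("""
    SELECT d.DeptName, XMLAGG(EMP(e.EmpName))
    FROM Department d LEFT JOIN Employee e ON d.DeptId = e.DeptId
    GROUP BY d.DeptName"""))
proj_struct = dict(conn.execute("""
    SELECT d.DeptName, XMLAGG(PROJ(p.ProjName))
    FROM Department d LEFT JOIN Project p ON d.DeptId = p.DeptId
    GROUP BY d.DeptName"""))

# Full outer join of the two structures on deptname, then the DEPT constructor.
documents = [
    f'<department name="{name}">'
    f'<emplist>{emp_struct.get(name, "")}</emplist>'
    f'<projlist>{proj_struct.get(name, "")}</projlist>'
    "</department>"
    for name in sorted(emp_struct.keys() | proj_struct.keys())
]
print(documents)
```

The aggregated fragments (empinfo, projinfo) are exactly the large string values that the slide's "CLOBs during processing" problem refers to.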
Late Tagging, Late Structuring
• XML document content produced without structure (in arbitrary order)
• Tagger enforces order as final step
(Figure: relational query processing produces unstructured content, which tagging turns into the result XML document.)
Redundant Relation Approach
• How do we represent nested content as relations?
(Figure: the tuples (10, Purchasing), (10, Internet), (10, Recycling), (10, John), (10, Mary) are joined into one relation:)
  (Purchasing, John, Internet)
  (Purchasing, John, Recycling)
  (Purchasing, Mary, Internet)
  (Purchasing, Mary, Recycling)
• Problem: Large relation due to data redundancy!
Late Tagging, Late Structuring
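The redundancy comes from joining two independent child lists of the same parent: every employee is paired with every project, so a department with m employees and n projects yields m × n rows. A minimal sketch, assuming the example data in Python lists (redundant_relation is a hypothetical name); it uses inner-join semantics, so a department lacking employees or projects would disappear from its output.

```python
from itertools import product

DEPARTMENT = [(10, "Purchasing")]                 # (DeptId, DeptName)
EMPLOYEE = [(10, "John"), (10, "Mary")]           # (DeptId, EmpName)
PROJECT = [(10, "Internet"), (10, "Recycling")]   # (DeptId, ProjName)


def redundant_relation():
    """Join Department with Employee and Project into one flat relation."""
    rows = []
    for dept_id, dept_name in DEPARTMENT:
        emps = [name for did, name in EMPLOYEE if did == dept_id]
        projs = [name for did, name in PROJECT if did == dept_id]
        # Cross product within the department: len(emps) * len(projs) rows.
        for emp, proj in product(emps, projs):
            rows.append((dept_name, emp, proj))
    return rows


for row in redundant_relation():
    print(row)
```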
Outer Union Approach
• How do we represent nested content as relations?
• Problem: Wide tuples (having many columns)
(Figure: Department is joined with Employee and with Project, and the join results are combined with a union.)
  Inputs: (10, Purchasing) from Department; (Purchasing, Internet), (Purchasing, Recycling), (Purchasing, John), (Purchasing, Mary) from the joins
  Outer union result (the last column indicates the tuple type):
  (Purchasing, null, Internet, 0)
  (Purchasing, null, Recycling, 0)
  (Purchasing, John, null, 1)
  (Purchasing, Mary, null, 1)
Late Tagging, Late Structuring
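The outer union avoids the cross product by giving each kind of child its own column and padding the others with NULL (None in Python), plus a type indicator. A minimal sketch, assuming the example data in Python lists; outer_union and the constant names are hypothetical, and the type values 0 and 1 follow the slide. Each additional child type adds columns to every tuple, which is the "wide tuples" problem.

```python
DEPARTMENT = [(10, "Purchasing")]                 # (DeptId, DeptName)
EMPLOYEE = [(10, "John"), (10, "Mary")]           # (DeptId, EmpName)
PROJECT = [(10, "Internet"), (10, "Recycling")]   # (DeptId, ProjName)

PROJECT_TYPE, EMPLOYEE_TYPE = 0, 1  # type indicator values as on the slide


def outer_union():
    """Union the Department-Project and Department-Employee join results."""
    dept_names = dict(DEPARTMENT)  # DeptId -> DeptName
    rows = []
    # Project branch: the employee column is padded with NULL.
    for dept_id, proj in PROJECT:
        rows.append((dept_names[dept_id], None, proj, PROJECT_TYPE))
    # Employee branch: the project column is padded with NULL.
    for dept_id, emp in EMPLOYEE:
        rows.append((dept_names[dept_id], emp, None, EMPLOYEE_TYPE))
    return rows


for row in outer_union():
    print(row)
```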
Hash-based Tagger
• Results not structured early
  – In arbitrary order
• Tagger has to enforce order during tagging
  – Hash-based approach
• Inside/Outside engine tagger
Late Tagging, Late Structuring
• Problem: Requires memory for entire document
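A hash-based tagger can be sketched as a two-phase process: first hash every unsorted outer-union tuple into a table keyed by its parent, then emit tags per group. This is a minimal sketch, assuming four-column tuples (deptname, empname, projname, type) like the outer union above; hash_tag is a hypothetical name, and sibling order simply follows arrival order. Nothing can be emitted until all input has been read, which is why memory for the whole document is needed.

```python
from collections import defaultdict


def hash_tag(rows):
    """Group unsorted outer-union rows by department, then tag each group."""
    # Hash table: department name -> collected employee and project names.
    groups = defaultdict(lambda: {"employees": [], "projects": []})
    for dept, emp, proj, _kind in rows:
        if emp is not None:
            groups[dept]["employees"].append(emp)
        if proj is not None:
            groups[dept]["projects"].append(proj)
    # Only now, with the whole document held in memory, can tagging start.
    documents = []
    for dept, children in groups.items():
        emplist = "".join(f"<employee> {e} </employee>" for e in children["employees"])
        projlist = "".join(f"<project> {p} </project>" for p in children["projects"])
        documents.append(f'<department name="{dept}"><emplist>{emplist}</emplist>'
                         f"<projlist>{projlist}</projlist></department>")
    return documents


# The slide's outer-union tuples, in an arbitrary order.
rows = [
    ("Purchasing", "Mary", None, 1),
    ("Purchasing", None, "Internet", 0),
    ("Purchasing", "John", None, 1),
    ("Purchasing", None, "Recycling", 0),
]
print(hash_tag(rows))
```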
Late Tagging, Early Structuring
• Structured XML document content produced
• Tagger just adds tags (constant space)
(Figure: relational query processing produces structured content, which tagging turns into the result XML document.)
Sorted Outer Union Approach
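(Figure: element hierarchy in which A has children B and C, B has children D and E, and C has children F and G.)
Outer union tuples (n = null), columns A B C D E F G:
  A  B  n  D  n  n  n
  A  B  n  n  E  n  n
  A  n  C  n  n  F  n
  A  n  C  n  n  n  G
• Sort by: Aid, Bid, Cid
• Problem: Only partial ordering required
Late Tagging, Early Structuring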
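Sorting on the id columns of the non-leaf elements (A, B, C) is enough to make each parent's children contiguous. The sketch below uses the slide's A..G tuples with None for null; null_last and sorted_outer_union are hypothetical names, and placing NULLs last is an assumption, since SQL systems differ in where they sort NULLs. Because the sort is stable and ignores the leaf columns, the D and E rows keep their input order, which illustrates that only a partial order is required.

```python
# Outer-union tuples (A, B, C, D, E, F, G); None plays the role of "n" (null).
ROWS = [
    ("A", "B", None, None, "E", None, None),
    ("A", None, "C", None, None, "F", None),
    ("A", None, "C", None, None, None, "G"),
    ("A", "B", None, "D", None, None, None),
]


def null_last(value):
    """Sort key placing NULLs after all non-NULL values (an assumed convention)."""
    return (value is None, "" if value is None else value)


def sorted_outer_union(rows):
    """Sort only on the non-leaf id columns A, B, C."""
    return sorted(rows, key=lambda row: tuple(null_last(v) for v in row[:3]))


for row in sorted_outer_union(ROWS):
    print(row)
```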
Constant Space Tagger
• Detects changes in XML document hierarchy
• Adds appropriate opening/closing tags
• Inside/outside engine
Late Tagging, Early Structuring
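A constant space tagger only compares each tuple with the previous one: when the department changes it closes the open tags and opens a new <department>, and when the child kind changes it switches between <emplist> and <projlist>. This minimal sketch assumes three-column tuples (deptname, empname, projname) already sorted so that each department's rows are contiguous with employee rows first; constant_space_tag is a hypothetical name, and a department with no employees simply gets no <emplist>.

```python
def constant_space_tag(sorted_rows):
    """Yield XML text for sorted rows, remembering only the current context."""
    current_dept = None
    current_list = None  # "emplist", "projlist", or None
    for dept, emp, proj in sorted_rows:
        if dept != current_dept:
            # Department changed: close everything still open, open a new element.
            if current_list is not None:
                yield f"</{current_list}>"
            if current_dept is not None:
                yield "</department>"
            yield f'<department name="{dept}">'
            current_dept, current_list = dept, None
        wanted = "emplist" if emp is not None else "projlist"
        if wanted != current_list:
            # Child kind changed: close the previous list and open the next one.
            if current_list is not None:
                yield f"</{current_list}>"
            yield f"<{wanted}>"
            current_list = wanted
        if emp is not None:
            yield f"<employee> {emp} </employee>"
        else:
            yield f"<project> {proj} </project>"
    if current_list is not None:
        yield f"</{current_list}>"
    if current_dept is not None:
        yield "</department>"


ROWS = [
    ("Purchasing", "John", None),
    ("Purchasing", "Mary", None),
    ("Purchasing", None, "Internet"),
    ("Purchasing", None, "Recycling"),
]
print("".join(constant_space_tag(ROWS)))
```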
Classification of Alternatives
• Late Tagging, Late Structuring: Unsorted Outer Union (tagging inside engine); Unsorted Outer Union (tagging outside engine)
• Late Tagging, Early Structuring: Sorted Outer Union (tagging inside engine); Sorted Outer Union (tagging outside engine)
• Early Tagging, Early Structuring: Correlated CLOB and De-Correlated CLOB (inside engine); Stored Procedure (outside engine)
Performance Evaluation
(Figure: synthetic hierarchy of tables TABLE0; TABLE00, TABLE01; TABLE000, TABLE001, TABLE010, TABLE011.)
• Parameters: query depth, query fan out, database size
Inside vs. Outside Engine (chart: time in seconds vs. query fan out of 2, 3, 4; series: Stored Proc, CLOB-Corr, CLOB-DeCorr, Redundant Relation, Unsorted OU (Out), Unsorted OU (In), Sorted OU (Out), Sorted OU (In))
Effect of Query Fan Out (chart: time in seconds vs. query fan out of 2, 3, 4; series: CLOB-Corr, CLOB-DeCorr, Unsorted OU, Sorted OU)
Effect of Query Depth (chart: time in seconds vs. query depth of 2, 3, 4; series: CLOB-Corr, CLOB-DeCorr, Unsorted OU, Sorted OU)
Conclusion
• Publishing XML from relational sources is important for the Internet
• Language alternatives:
  – SQL based
  – XML query language based
• Implementation alternatives:
  – Inside engine >> outside engine
  – Unsorted Outer Union: when there is sufficient main memory
  – Sorted Outer Union: otherwise
Recommended