Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001


Page 1: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Using XML to View Relational Data

Xin He

AMPS Seminar

November 30, 2001

Page 2: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Outline

Introduction

An XML Primer

Several related systems and results

An XML middle-ware system: SilkRoute

XML and Spreadsheets

XML and OLAP

Page 3: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Introduction

XML: the basic ideas are simple, but the potential impact is significant
  Easy to read
  Simple and flexible
  Easy to extract useful information

Research opportunities that XML brings to database management
  XML will “turn the Web into a database”
  Thus general database-management issues arise for XML

Page 4: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Introduction

Why use XML to view relational data?

XML is emerging as the standard data-exchange format between applications on the Web, while most existing data is stored in relational databases
  This scenario is common
  This scenario is challenging

Relational data is flat and normalized, and its schema is often proprietary
XML data is nested and unnormalized, and its schema is public
So the mapping is inherently complex and may be difficult to compute efficiently

Page 5: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

An XML (eXtensible Markup Language) Primer

Example XML file (from an Apache Tomcat 4.0 configuration file):

    <Server>
      <Service name="Tomcat-Standalone">
        <Connector className="http.HttpConnector" port="80"/>
        <Connector className="http.HttpConnector" port="8443">
          <Factory className="SSLServerSocketFactory"/>
        </Connector>
        <Engine>
          <parameter>
            <name>mail.smtp.host</name>
            <value>localhost</value>
          </parameter>
        </Engine>
      </Service>
      <Service name="Tomcat-Apache">
        <Connector className="warp.WarpConnector" port="8008"/>
      </Service>
    </Server>
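As a small illustration of how easily information can be extracted from such a file, the following sketch parses a well-formed version of the configuration with Python's standard xml.etree.ElementTree module. The self-closing Connector and Factory elements are an assumption made so that the text parses, and the helper name connector_ports is hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the example; empty elements are self-closed (assumption).
CONFIG = """
<Server>
  <Service name="Tomcat-Standalone">
    <Connector className="http.HttpConnector" port="80"/>
    <Connector className="http.HttpConnector" port="8443">
      <Factory className="SSLServerSocketFactory"/>
    </Connector>
    <Engine>
      <parameter>
        <name>mail.smtp.host</name>
        <value>localhost</value>
      </parameter>
    </Engine>
  </Service>
  <Service name="Tomcat-Apache">
    <Connector className="warp.WarpConnector" port="8008"/>
  </Service>
</Server>
"""


def connector_ports(xml_text):
    """Map each Service name to the ports of its Connector children."""
    root = ET.fromstring(xml_text)
    return {
        service.get("name"): [int(c.get("port")) for c in service.findall("Connector")]
        for service in root.findall("Service")
    }


ports = connector_ports(CONFIG)
```

The nesting of the document carries the grouping: each port is found under the Service it belongs to, with no join needed.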

Page 6: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

An XML Primer

Example DTD (Document Type Definition)

    <?xml encoding="US-ASCII"?>
    <!ELEMENT Server (Service*)>
    <!ELEMENT Service (Connector*, Engine*)>
    <!ATTLIST Service name ID #REQUIRED>
    <!ELEMENT Connector (Factory?)>
    <!ATTLIST Connector className CDATA #REQUIRED
                        port      CDATA #REQUIRED>
    <!ELEMENT Factory EMPTY>
    <!ATTLIST Factory className CDATA #REQUIRED>
    <!ELEMENT Engine (parameter)>
    <!ELEMENT parameter (name, value)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT value (#PCDATA)>

Page 7: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

An XML Primer

XML is a method for putting structured data in a text file

XML looks a bit like HTML but isn't HTML

XML is a family of technologies http://www.w3c.org

XML is new, but not that new

XML is license-free, platform-independent and well-supported

Page 8: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Several Related Systems and Results

An XML middleware system from AT&T Research Labs: SilkRoute
  Automates the conversion of relational data into XML
  A newer paper optimizes its query-processing algorithm

IBM research center: Efficiently Publishing Relational Data as XML Documents
  The language specification is based on SQL with minor extensions
  So standard APIs such as ODBC can be used
  Query performance is worse than that of the revised SilkRoute

Page 9: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Several Related Systems and Results

UCSD: MIX (Mediation of Information using XML)
  DTD inference
  Concentrates on information integration
  A more complicated architecture

University of Wisconsin-Madison: Relational Databases for Querying XML Documents
  The objective is different, but some of the techniques are related
  Limitations and opportunities: some valuable points

Page 10: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

SilkRoute Introduction

Public DTDs: numerous industries are working on them
  http://www.oasis-open.org/cover

Automatically construct XML views that conform to the public DTDs from vast stores of relational data

The system is general, dynamic, and efficient

Page 11: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Motivating Example

A simple example from electronic commerce: suppliers provide product information to resellers
  For mutual benefit, they have agreed on a particular DTD
  Each supplier's business data is organized according to a relational schema

Supplier: converts its relational data into an XML view that conforms to the DTD and makes the XML view available to resellers
  Assume the supplier wants to export only a subset of its inventory (e.g. only its winter-outerwear stock)

Resellers: access that data by formulating queries over the XML view
  A reseller is typically interested in only a small subset of the information (e.g. items whose sale price is less than half of the retail price)

Relational schemas differ from supplier to supplier

Page 12: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Architecture

SilkRoute’s Architecture

Page 13: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

The View Query: RXL

Full power on both sides
  Joins, selection conditions, aggregates, and nested queries
  Generates XML data with arbitrary levels of nesting

RXL has three powerful features that make it possible to create arbitrarily complex XML structures
  Nested queries, Skolem functions, and block structure

Page 14: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

    construct
      <supplier ID=Supp()>
        <company ID=Comp()>"Acme Clothing"</company>
        {
          from Clothing $c
          where $c.category = "outerwear"
          construct
            <product ID=Prod($c.pid)>
              <name ID=Name($c.pid,$c.item)>$c.item</name>
              <category ID=Cat($c.pid,$c.category)>$c.category</category>
              <retail ID=Retail($c.pid,$c.price)>$c.price</retail>
              { from SalePrice $s
                where $s.pid = $c.pid
                construct
                  <sale ID=Sale($c.pid,$s.pid,$s.price)>$s.price</sale>
              }
              { from Problems $p
                where $p.pid = $c.pid
                construct
                  <report code=$p.code ID=Prob($c.pid,$p.pid,$p.code,$p.comments)>
                    $p.comments
                  </report>
              }
            </product>
        }
      </supplier>

RXL (Relational to XML transformation Language) view query ( V )
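To make the meaning of the view query concrete, here is a minimal Python sketch that materializes the same nested structure from an in-memory SQLite database. The table layout (Clothing, SalePrice, Problems with the columns used in the query) is inferred from the query itself, the rows are made-up sample data, and the Skolem-function ID attributes are omitted for brevity. It evaluates the nested blocks naively, one SQL query per block instance, which is not how SilkRoute translates RXL.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    """Build a tiny in-memory database; the rows are made-up sample data."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Clothing (pid INTEGER, item TEXT, category TEXT, price REAL);
        CREATE TABLE SalePrice (pid INTEGER, price REAL);
        CREATE TABLE Problems (pid INTEGER, code TEXT, comments TEXT);
        INSERT INTO Clothing VALUES (1, 'Parka', 'outerwear', 200.0),
                                    (2, 'Raincoat', 'outerwear', 100.0),
                                    (3, 'T-shirt', 'tops', 20.0);
        INSERT INTO SalePrice VALUES (1, 80.0), (2, 90.0);
        INSERT INTO Problems VALUES (1, 'Z1', 'zipper sticks');
    """)
    return db


def materialize_view(db):
    """Evaluate the view block by block: one query per nested RXL block."""
    supplier = ET.Element("supplier")
    ET.SubElement(supplier, "company").text = "Acme Clothing"
    outer = db.execute(
        "SELECT pid, item, category, price FROM Clothing "
        "WHERE category = 'outerwear' ORDER BY pid").fetchall()
    for pid, item, category, price in outer:
        product = ET.SubElement(supplier, "product")
        ET.SubElement(product, "name").text = item
        ET.SubElement(product, "category").text = category
        ET.SubElement(product, "retail").text = str(price)
        # Nested block over SalePrice, correlated on pid.
        for (sale,) in db.execute(
                "SELECT price FROM SalePrice WHERE pid = ?", (pid,)):
            ET.SubElement(product, "sale").text = str(sale)
        # Nested block over Problems, correlated on pid.
        for code, comments in db.execute(
                "SELECT code, comments FROM Problems WHERE pid = ?", (pid,)):
            report = ET.SubElement(product, "report", code=code)
            report.text = comments
    return supplier


view = materialize_view(make_db())
```

Each pair of braces in the RXL query corresponds to one loop here, which is why nested queries give arbitrary levels of XML nesting.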

Page 15: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

The User Query: XML-QL

    construct
      <results>
        { where <supplier>
                  <company>$company</company>
                  <product>
                    <name>$name</name>
                    <retail>$retail</retail>
                    <sale>$sale</sale>
                  </product>
                </supplier> in "http://acme.com/products.xml",
                $sale < 0.5 * $retail
          construct
            <result ID=Result($company)>
              <supplier>$company</supplier>
              <name>$name</name>
            </result>
        }
      </results>

XML-QL user query (U)
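The effect of the user query can be sketched in Python as a filter over a materialized copy of the view. The document below is made-up sample data in the shape the view produces, and the function name user_query is hypothetical. The sketch ignores the Skolem-function ID and simply emits one result element per matching product.

```python
import xml.etree.ElementTree as ET

# Made-up sample of the materialized view (not from the original slides).
PRODUCTS_XML = """
<supplier>
  <company>Acme Clothing</company>
  <product>
    <name>Parka</name><category>outerwear</category>
    <retail>200.0</retail><sale>80.0</sale>
  </product>
  <product>
    <name>Raincoat</name><category>outerwear</category>
    <retail>100.0</retail><sale>90.0</sale>
  </product>
</supplier>
"""


def user_query(view_root):
    """Build <results> with one <result> per sale price below half of retail."""
    results = ET.Element("results")
    company = view_root.findtext("company")
    for product in view_root.findall("product"):
        retail = float(product.findtext("retail"))
        for sale in product.findall("sale"):
            if float(sale.text) < 0.5 * retail:
                result = ET.SubElement(results, "result")
                ET.SubElement(result, "supplier").text = company
                ET.SubElement(result, "name").text = product.findtext("name")
    return results


answer = user_query(ET.fromstring(PRODUCTS_XML))
names = [r.findtext("name") for r in answer.findall("result")]
```

Evaluated this way, the whole view has to exist before the filter runs, which is the cost that query composition avoids.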

Page 16: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

The Query Composer

The composed RXL query is equivalent to the user query evaluated on the materialized view

Composed queries often contain constraints on scalar values that can be evaluated using indexes in the relational database

Page 17: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

The Query Composer

    construct
      <results>
        { from Clothing $c, SalePrice $s
          where $c.category = "outerwear",
                $c.pid = $s.pid,
                $s.price < 0.5 * $c.price
          construct
            <result ID=Result("Acme Clothing")>
              <supplier>"Acme Clothing"</supplier>
              <name ID=Name($c.pid, $c.item)>$c.item</name>
            </result>
        }
      </results>

Composed RXL query ( C )
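Because the composed query has a single from/where block, it corresponds to one SQL join that the relational engine can evaluate directly. The sketch below runs such a join on SQLite. It assumes the retail price is stored in Clothing.price, as the view query's retail element suggests; the sample rows and the name run_composed are hypothetical.

```python
import sqlite3

# Assumed schema and made-up sample rows (not from the original slides).
SETUP = """
    CREATE TABLE Clothing (pid INTEGER, item TEXT, category TEXT, price REAL);
    CREATE TABLE SalePrice (pid INTEGER, price REAL);
    INSERT INTO Clothing VALUES (1, 'Parka', 'outerwear', 200.0),
                                (2, 'Raincoat', 'outerwear', 100.0),
                                (3, 'T-shirt', 'tops', 20.0);
    INSERT INTO SalePrice VALUES (1, 80.0), (2, 90.0), (3, 5.0);
"""

# The from/where clauses of the composed query, written as one SQL join.
COMPOSED_SQL = """
    SELECT c.pid, c.item
    FROM Clothing c, SalePrice s
    WHERE c.category = 'outerwear'
      AND c.pid = s.pid
      AND s.price < 0.5 * c.price
    ORDER BY c.pid
"""


def run_composed(db):
    """Run the single join that the composed query reduces to."""
    return db.execute(COMPOSED_SQL).fetchall()


db = sqlite3.connect(":memory:")
db.executescript(SETUP)
rows = run_composed(db)
```

The user's price filter ends up inside the SQL WHERE clause, so only qualifying rows ever leave the database.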

Page 18: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Composition Algorithm

Problem Statement:

  XD = V(RDB)
  A = U(XD) = U(V(RDB))
  Find C = U ∘ V such that
    C(RDB) = A = U(V(RDB)) = (U ∘ V)(RDB)

Page 19: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Composition Algorithm

Key idea: match U's pattern on V directly, without constructing XD

First step: match U's pattern with V's template
  (The next slide shows V again, with the parts matched by U's patterns highlighted)

Second step: construct C
  C's construct clause is the same as U's construct clause, with variable substitutions
  C's from and where clauses consist of all the "relevant" from and where clauses in V and all the where filters in U, with variable renaming
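The two steps can be sketched on a toy representation of the queries. This is only an illustration of the idea, not SilkRoute's algorithm: the dictionary and tuple encodings of V's template and U's pattern, and the names node, match, and compose, are all hypothetical, and the matcher assumes each tag occurs at most once among siblings.

```python
import re


def node(tag, text=None, children=(), frm=(), where=()):
    """Template node of V, with the from/where clauses of its enclosing block."""
    return {"tag": tag, "text": text, "children": list(children),
            "from": list(frm), "where": list(where)}


VIEW = node("supplier", children=[
    node("company", text='"Acme Clothing"'),
    node("product", frm=["Clothing $c"], where=['$c.category = "outerwear"'],
         children=[
             node("name", text="$c.item"),
             node("category", text="$c.category"),
             node("retail", text="$c.price"),
             node("sale", text="$s.price",
                  frm=["SalePrice $s"], where=["$s.pid = $c.pid"]),
             node("report", text="$p.comments",
                  frm=["Problems $p"], where=["$p.pid = $c.pid"]),
         ]),
])

# U's pattern: (tag, children) for inner nodes, (tag, variable) for leaves.
PATTERN = ("supplier", [("company", "$company"),
                        ("product", [("name", "$name"),
                                     ("retail", "$retail"),
                                     ("sale", "$sale")])])


def match(pattern, template, subst, frm, where):
    """Step 1: match a pattern node on a template node, collecting the
    variable bindings and the from/where clauses of every block touched."""
    tag, body = pattern
    if tag != template["tag"]:
        return False
    frm.extend(template["from"])
    where.extend(template["where"])
    if isinstance(body, str):              # leaf: bind the user variable
        subst[body] = template["text"]
        return True
    for child in body:
        candidates = [t for t in template["children"] if t["tag"] == child[0]]
        if not candidates or not match(child, candidates[0], subst, frm, where):
            return False
    return True


def compose(pattern, template, user_filters):
    """Step 2: keep the relevant from/where clauses of V and add U's
    filters with the user variables replaced by V's expressions."""
    subst, frm, where = {}, [], []
    if not match(pattern, template, subst, frm, where):
        return None
    for f in user_filters:
        where.append(re.sub(r"\$\w+", lambda m: subst.get(m.group(0), m.group(0)), f))
    return subst, frm, where


subst, frm, where = compose(PATTERN, VIEW, ["$sale < 0.5 * $retail"])
```

Note that the Problems block never appears in the result: U's pattern does not mention report, so its from and where clauses are not "relevant".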

Page 20: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

      construct
    *   <supplier ID=Supp()>
    *     <company ID=Comp()>"Acme Clothing"</company>
          {
            from Clothing $c
            where $c.category = "outerwear"
            construct
    *         <product ID=Prod($c.pid)>
    *           <name ID=Name($c.pid,$c.item)>$c.item</name>
                <category ID=Cat($c.pid,$c.category)>$c.category</category>
    *           <retail ID=Retail($c.pid,$c.price)>$c.price</retail>
                { from SalePrice $s
                  where $s.pid = $c.pid
                  construct
    *               <sale ID=Sale($c.pid,$s.pid,$s.price)>$s.price</sale>
                }
                { from Problems $p
                  where $p.pid = $c.pid
                  construct
                    <report code=$p.code ID=Prob($c.pid,$p.pid,$p.code,$p.comments)>
                      $p.comments
                    </report>
                }
              </product>
          }
        </supplier>

RXL view query (V); lines marked with * contain the patterns matched from the XML-QL query (shown in red on the original slide)

Page 21: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Composition Algorithm

Diagram of Query Composition

Page 22: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Translator and XML Generator

The translator takes an RXL query and decomposes it into one or more SQL queries and an XML template

  The initial SilkRoute uses a fully partitioned strategy
  The IBM research paper: sorted outer-union strategy
  The 2001 SIGMOD paper gives an optimal algorithm

The XML generator merges the result tuples into an XML document in a single pass
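A single-pass merge is possible when the tuples arrive sorted on the key of the enclosing element. The sketch below assumes rows of the form (pid, item, sale_price), as they might come from a left outer join of Clothing and SalePrice ordered by pid; the row layout and the name generate_xml are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def generate_xml(sorted_rows):
    """Merge (pid, item, sale_price) rows, sorted by pid, into nested XML
    while reading each row exactly once."""
    supplier = ET.Element("supplier")
    current_pid, product = None, None
    for pid, item, sale_price in sorted_rows:
        if pid != current_pid:             # a new product starts here
            product = ET.SubElement(supplier, "product")
            ET.SubElement(product, "name").text = item
            current_pid = pid
        if sale_price is not None:         # NULL from the outer join: no sale
            ET.SubElement(product, "sale").text = str(sale_price)
    return supplier


rows = [(1, "Parka", 80.0), (1, "Parka", 75.0), (2, "Raincoat", None)]
doc = generate_xml(rows)
xml_text = ET.tostring(doc, encoding="unicode")
```

Because the rows are sorted, the generator only has to remember the current product, so memory use does not grow with the size of the result.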

Page 23: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Other Scenarios

Minor changes to the information flow permit other scenarios

Export the entire database as one, large XML document by materializing the view query

The result of query composition can be kept virtual for later composition with other user queries

Page 24: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

Alternative Approaches

Materialized XML view
  Precompute or compute on demand
  Feasible when the XML view is small and the application needs to load the entire view in memory
  Data may become stale

Use a native XML database engine
  Stanford DB group: Lore Project
  One can materialize an XML view using SilkRoute and store the result in an XML engine
  Avoids the cost of query composition
  Performance is unlikely to compete with SQL engines anytime soon
  Can't guarantee data freshness and incurs a high space cost

Page 25: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

XML and Spreadsheets

XML support in Microsoft Excel 2002 for Office XP

"...these new features mean that Microsoft Excel is set to play an important role in any organization's application environment."

Bidirectional transformation
  Excel can recognize and open XML documents, including XSL processing
  XML flattening
  Any Excel Spreadsheet can be saved as an XML file while preserving "the new XML Spreadsheet file format"

Page 26: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

XML and Spreadsheets

Microsoft enables spreadsheets to manipulate XML documents

We can enable spreadsheets to manipulate relational data using XML views!

Page 27: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

XML and Spreadsheets

Can we execute spreadsheet-style processing directly on XML files?
  XML is hierarchical and unnormalized, which is exactly what people would like to see in spreadsheets

Given a system like SilkRoute:
  Can we define a mapping from spreadsheet functions onto an XML view?
  Think of function executions as XML view queries
  Defining one function on top of another is similar to defining a new XML view from an existing composed view
  This will also raise challenges for view-generation systems
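One way to picture a spreadsheet function as a query over an XML view is to let an element path play the role of a cell range. The sketch below is purely illustrative: the sample document, the path syntax (that of Python's xml.etree.ElementTree), and the names cell_range, xml_sum, and xml_average are assumptions, not part of SilkRoute or Excel.

```python
import xml.etree.ElementTree as ET

# Made-up sample of an XML view (not from the original slides).
VIEW_XML = """
<supplier>
  <product><name>Parka</name><retail>200.0</retail><sale>80.0</sale></product>
  <product><name>Raincoat</name><retail>100.0</retail><sale>90.0</sale></product>
</supplier>
"""


def cell_range(view_root, path):
    """Select the numeric values addressed by a path, like a cell range."""
    return [float(e.text) for e in view_root.findall(path)]


def xml_sum(view_root, path):
    """Spreadsheet-style SUM over the selected values."""
    return sum(cell_range(view_root, path))


def xml_average(view_root, path):
    """AVERAGE defined on top of SUM, as one function built on another."""
    return xml_sum(view_root, path) / len(cell_range(view_root, path))


view = ET.fromstring(VIEW_XML)
total_retail = xml_sum(view, "product/retail")
average_sale = xml_average(view, "product/sale")
```

If the view were virtual, each such function call would have to be composed with the view query rather than run over a parsed document.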

Page 28: Using XML to View Relational Data Xin He AMPS Seminar November 30, 2001

XML and OLAP (OnLine Analytical Processing)

Physically integrating unexpected data into OLAP systems is time-consuming

Logical integration is the better choice
  XML’s increasing use in data exchange suggests that the required data can be made available through XML views

Possibilities:
  Reference external XML data in OLAP queries
  XML data can be presented along with dimensional data in the result of an OLAP query
  Use XML data for selection and grouping

Microsoft and Hyperion published "Open XML for Analysis Specification" in April 2001
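Using XML data for grouping can be sketched as a roll-up of fact rows by an attribute that exists only in an external XML document. Everything in this example is hypothetical: the fact rows, the catalog document, and the function name units_by_xml_category.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Fact rows (item, units sold) as they might come from an OLAP cube.
SALES = [("Parka", 3), ("Raincoat", 5), ("T-shirt", 10)]

# External XML data that supplies a grouping attribute missing from the cube.
CATALOG = """
<catalog>
  <product><name>Parka</name><category>outerwear</category></product>
  <product><name>Raincoat</name><category>outerwear</category></product>
  <product><name>T-shirt</name><category>tops</category></product>
</catalog>
"""


def units_by_xml_category(facts, catalog_xml):
    """Group fact rows by the category found in the XML document."""
    category_of = {
        p.findtext("name"): p.findtext("category")
        for p in ET.fromstring(catalog_xml).findall("product")
    }
    totals = defaultdict(int)
    for item, units in facts:
        totals[category_of.get(item, "unknown")] += units
    return dict(totals)


totals = units_by_xml_category(SALES, CATALOG)
```

The cube itself is not changed; the XML data is joined in logically at query time.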