19
Query Optimization For OLAP-XML Federations Dennis Pedersen Karsten Riiis Torben Bach Pedersen Nykredit Center for Database Research Department of Computer Science, Aalborg University www.cs.auc.dk/NDB

Query Optimization For OLAP-XML Federations

Embed Size (px)

DESCRIPTION

Query Optimization For OLAP-XML Federations. Dennis Pedersen Karsten Riiis Torben Bach Pedersen. Nykredit Center for Database Research Department of Computer Science , Aalborg University www.cs.auc.dk/NDB. Overview. Motivation OLAP-XML federations Federation system Inlining - PowerPoint PPT Presentation

Citation preview

Page 1: Query Optimization  For OLAP-XML Federations

Query Optimization For OLAP-XML Federations

Dennis Pedersen Karsten Riiis Torben Bach Pedersen

Nykredit Center for Database ResearchDepartment of Computer Science, Aalborg University

www.cs.auc.dk/NDB

Page 2: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 2November 8, 2002

Overview• Motivation • OLAP-XML federations• Federation system• Inlining• Optimizing XML retrieval • Caching and prefetching• Experimental results• Related work• Conclusion and future work

Page 3: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 3November 8, 2002

Motivation• OLAP-systems are good for complex analysis queries

Easy-to-use Fast Business, science ...

• More and more data is needed for the analyses• Problems with physical integration in existing OLAP systems

Integrating new data requires (partial) cube rebuild => too slow

• Problems arise with Short term and varying data needs

Demographic info, disease info... Dynamic data

Stock quotes, competitors prices, disease info... Data with limited access

Product information, public databases, scientific data banks

Page 4: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 4November 8, 2002

Motivation• Data will often (in the future) be available in XML format

Info syndicators, public institutions,… Scientific sources: SWISS-PROT, TrEMBL, GenBank,…

• Goal: flexible access to XML data from OLAP systems

<Cities><City>

<Name>Aalborg</Name><Population>160.000</Population><Category>University Town</Category><Company>Sonofon</Company><Company>Nokia</Company>

</City>…

</Cities>

• Hierarchical

• Elements

• Start-tag, end-tag

• Describes the semantics of the data

• Irregular structure

Page 5: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 5November 8, 2002

OLAP-XML Federation• Using of external XML data as virtual dimensions

δ: decoration (add extra dimension) π: generalized projection (grouping) σ: selection

• Loosely-coupled federation Data stored in the ”best” place Supports ad-hoc integration Rapid prototyping of DWs High degree of autonomy Data is always up-to-date Performance problem?

• SQLXM language SELECT City[ALL]/Category, SUM(Price) FROM Sales

GROUP BY City[ALL]/Category WHERE City[ALL]/Population>200,000

Logicalfederation

OLAP

SQLXM query

OLAP query XML query

<?xml version=”1.0” ?>

<?xml version=”1.0” ?>

<?xml version=”1.0” ?>

<?xml version=”1.0” ?>

XML

Page 6: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 6November 8, 2002

OLAP-XML Federation

<Cities><City>

<Name>Aalborg</Name><Population>160,000</Population><Category>University Town</Category><Company>Sonofon</Company><Company>Nokia</Company>

</City>

</Cities>

Time

T

Year

Quarter

Month

Products

T

Family

Brand

Product

Stores

T

Country

City

Store

Amount, Price

Page 7: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 7November 8, 2002

OLAP-XML Federation

<Cities><City>

<Name>Aalborg</Name><Population>160,000</Population><Category>University Town</Category><Company>Sonofon</Company><Company>Nokia</Company>

</City>

</Cities>

Time

T

Year

Quarter

Month

Products

T

Family

Brand

Product

Stores

T

Country

City

Store

Amount, Price

Link

Page 8: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 8November 8, 2002

Prototype Architecture• SQLXM queries handled by

Federation Manager (FM)• XML Data stored in XML

Component (Tamino)• OLAP data stored in OLAP

Component (MS Analysis)• Link, Temp, and Meta-data

stored in RDBMS• FM fetches relevant data

from components and processes SQLXM queries using Temp data

Tem p. D ata XM L D ata

U ser In terface

XM L C om p. Interface

O LAP Com p. Interface

FederationM anager

SQ L M

SQ L XM

SQ L SQ L

SQ LO LAP Q uery

LanguageXM L Q ueryLanguage

XPath

M eta-data Link D ata

O LAP D ata

Page 9: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 9November 8, 2002

Query Processing and Optimization• Naive strategy

Fetch necessary XML data and store in Temp DB Fetch necessary OLAP data and store in Temp DB Join XML and OLAP data + perform XML-specific part of SQLXM query Problem: too much data transferred to Temp DB

• Rule-based optimization Based on algebraic transformation rules Partition query tree into OLAP part, XML part, and Temp part Push as much evaluation to components as possible (always valid)

• Cost-based optimization Inline literal XML data values in OLAP queries Efficient fetch of XML data over XPath interfaces Caching/Prefetching in this particular setting Focus of this paper

Page 10: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 10November 8, 2002

Overview• Motivation • OLAP-XML federations• Federation system• Inlining• Optimizing XML retrieval • Caching and prefetching• Experimental results• Related work• Conclusion and future work

Page 11: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 11November 8, 2002

Inlining• Problem: decorations used for selections are expensive

A lot of low-level OLAP data must be transferred to Temp

• Solution: inlining Get level expression predicate data from XML component Transform predicate to refer to literal data values Allows selection+aggregation to take place in OLAP component

• Predicate length will be linear/quadratic in no of dim values• Queries may get too long

Split into partial queries

• Inlining is particularly effective for OLAP systems

Allows more aggregation in OLAP component

Page 12: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 12November 8, 2002

Inlining• Problem: what predicates to inline?

Exponential number of possibilities Theorem: choosing the best inlining

strategy is NP-hard No obvious best choice Often, an exhaustive search is

feasible Otherwise, two heuristics are used

• Top down (see right above) Start from no inlining and add Will often fail

• Bottom up Start with all inlined and remove Less likely to fail

• General strategy Generate all bottom up, except if

cost goes up very much

ABC D

ABC ABD ACD BCD

AB AC AD BC BD CD

A B C D

Ø

12 100

50

2

20 14

ABC D

ABC ABD ACD BCD

AB AC AD BC BD CD

A B C D

Ø

100 20

2

30

150

90

Page 13: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 13November 8, 2002

Optimizing XML Retrieval• XML component queries retrieve (locator,level exp) tables

Example tuple: (Aalborg, 160,000)

• Easy to do in XQuery, but not in XPath XPath can fetch only entire existing nodes Many systems provide only an XPath interface

• One query for every dimension value? Often too much overhead because of many queries Combining with OR/IN not possible as XPath result is unordered

• Common parent of locator and level exp can be fetched Larger data volume may have to be returned

• Data for several level exps may be fetched together by combining XPath expressions

Algorithm for doing this in full paper

• Cost analysis chooses between these options

Page 14: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 14November 8, 2002

Caching and pre-fetching• General technique specialized for this particular domain

Caching/prefetching not feasible for very dynamic data

• Given a new query tree the cache is checked Identical query tree found: use cache (always fastest) Lower part of query tree found: compare cost of using cache to not

using it (not using cache may be faster due to OLAP pre-aggregation)

• Caching and prefetching is done for component queries only Entire federation queries take up too much space and has too little

probability of reuse

• XML cache Raw XML data + temporary XML mapping tables

• OLAP cache Intermediate OLAP result tables

• LRU cache replacement strategy used

Page 15: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 15November 8, 2002

Cost Model• Federation cost model

Parallel exec of queries 1-4 OLAP waits for 3+4 (inlined) OLAP query not split Results combined in Temp

• OLAP cost model CostOLAP = tOLAPOH+ tOLAPEval+ tOLAPTrans

Based on network+disk rate, selectivity, fact size, rollup fraction, cube size, and possibilities for using pre-aggregation

• XML cost model Based on assumption of constant data rate from a given document CostXML = tXMLOH+ tXMLProc

• Temp component used standard relational cost model • All models adapt based on real execution times• Details + parameter estimation in another paper

XML1

XML2

XML3

XML4

OLAP

TEMP

Time

Page 16: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 16November 8, 2002

Experiments• Test data

50 MB TPC-H OLAP, 10MB preagg, 10 MB cache, 3 MB XML,

• Behavior of different query types Decoration, grouping, selection Temp time generally short compared to OLAP and XML time OLAP time generally highest

• Inlining Compared three cases (no, simple, and cost-based inlining) Simple inlining 5 times faster than no inlining Cost-based inlining

3 times faster than simple inlining 15 times faster than no inlining

• Caching/prefetching Remote data versus all data stored locally Local evaluation 4-25 times faster than remote

Page 17: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 17November 8, 2002

Experiments• Another experiment compared:

Our solution (flexibility optimal) Physical integration

(performance optimal)

• Data 1GB fact data based on TPC-H 10MB XML data 100 MB pre-aggregated data

• Decoration, grouping, selection queries

• All optimizations used• For ”typical” queries

The overhead of our solution was only 25-50%

• Some queries will perform badly

Page 18: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 18November 8, 2002

Related Work• Generic data integration

Relational, XML, semi-structured, OO,… + combinations Do not consider OLAP DB properties such as automatic

aggregation, dimension hierarchies and correct aggregation

• Query processing and optimization for: Data warehousing/OLAP Federated, distributed, and multi-databases Heterogeneous data XML and semi-structured data Does not address the special case of a federated OLAP

environment

• OLAP-object federations (own work) Simple form of inlining Does not consider cost-based optimization

Page 19: Query Optimization  For OLAP-XML Federations

DOLAP 2002 Workshop # 19November 8, 2002

Conclusion and Future Work• OLAP handles schema changes and dynamic data poorly• Solution

OLAP-XML federations Cost-based optimization needed

• Three techniques presented Inlining Optimized XML retrieval Caching and prefetching Made OLAP-XML federation perform very well

• Technology being implemented by OLAP query tool vendor• Future work

Further experimentation Capturing document order Getting measures from XML data