The Generalized Pre-Grouping Transformation: Aggregate Query Optimization in the Presence of Dependencies
Aris Tsois, Timos Sellis
Knowledge and Database Systems Laboratory
National Technical University of Athens, Hellas
29th International Conference on Very Large Databases, September 9-12, 2003
Updated version: DBLAB presentation 14/10/2003
A. Tsois, NTUA, 2003
Motivation
- DW & OLAP require fast answers to expensive aggregate queries
- Recent approach: Multidimensional Hierarchical Clustering & Hierarchical Indexing (MHC/HI) [IDEAS’99]
  - Indexed access to multidimensionally clustered data (UB-tree)
  - Results in a reduced number of I/O operations
- MHC/HI is extremely effective, but query processing can be made even more efficient
- One such technique is Hierarchical Pre-Grouping
The Hierarchical Pre-Grouping (Karayannidis et al. [VLDB’02], Pieringer et al. [ICDE’03])
- Goal: reduce the cost of join operations in star-join queries on MHC/HI databases
  - Joining large parts of the fact table with large dimension tables is very expensive
- Exploits the existence of hierarchical surrogates in the fact table
- Pushes the aggregation operations down as early as possible, even if this introduces a new aggregation operation
- Delays (pulls up) the joins, which are then expected to have a reduced input size
- Removes redundant joins
Example schema
[Figure: star schema. Fact table SALES_FACT (sales, store_key, date_key, product_key, prod_hsk, store_hsk, date_hsk). Dimension tables, each with an hsk attribute: PRODUCT (category, brand, product), DATE (year, month, day), STORE (region, area, store).]
The hierarchical surrogates
[Figure: the same schema; each hsk encodes a hierarchy path as the concatenation of its level codes, e.g. codes 07 and 13 give 0713, and codes 95, 27 and 03 give 952703.]
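The concatenation scheme can be sketched in a few lines. This is a minimal Python illustration, assuming fixed-width two-digit codes for every level (as in the values 07, 13, 95, 27, 03 above); the helper names `make_hsk` and `hsk_prefix` are hypothetical:

```python
# Hypothetical helpers illustrating hierarchical surrogates (h-surrogates).
# Assumption: every level code has the same fixed width (two digits here),
# so a member's surrogate is the concatenation of the codes on its path
# from the top of the hierarchy downwards.
CODE_WIDTH = 2


def make_hsk(level_codes: list[int]) -> str:
    """Concatenate zero-padded level codes, top level first."""
    return "".join(f"{code:0{CODE_WIDTH}d}" for code in level_codes)


def hsk_prefix(hsk: str, level: int) -> str:
    """Return the leading part of an h-surrogate that identifies its ancestor at `level` (1 = top)."""
    return hsk[: level * CODE_WIDTH]


if __name__ == "__main__":
    store_hsk = make_hsk([95, 27, 3])
    print(store_hsk)                 # 952703
    print(hsk_prefix(store_hsk, 2))  # 9527, e.g. store_hsk:area if the levels are region, area, store
```

Because an ancestor's surrogate is a prefix of its descendants' surrogates, fact rows can be grouped by an ancestor member (the `store_hsk:area` notation used in the plans) without reading the dimension table.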
Example query
[Figure: SALES_FACT joined with STORE, PRODUCT and DATE; grouping attributes: area, brand, month; selected attributes: SUM(sales), area, brand.]
SELECT SUM(sales), brand, area
FROM SALES_FACT, STORE, PRODUCT, DATE
WHERE <join conditions>
GROUP BY brand, area, month
Simple execution plan
[Figure: SALES_FACT is joined with STORE, PRODUCT and DATE; a Group By & Aggregate on brand, area, month computes s = SUM(sales) and outputs brand, area, s. Annotation: month corresponds to date_hsk:month.]
Can we avoid some of the joins?
Optimized plan (#1)
[Figure: the join with DATE is removed; SALES_FACT is joined with PRODUCT and STORE, then grouped by brand, area, date_hsk:month with s = SUM(sales). Annotation: area corresponds to store_hsk:area.]
Can we delay a join until after the aggregation?
Optimized plan (#2)
[Figure: SALES_FACT is joined with PRODUCT and grouped by brand, store_hsk:area, date_hsk:month with s = SUM(sales); the result is then joined on store_hsk:area = hsk:area with STORE grouped by hsk:area, area, and area, brand, s are output.]
Can we make the join work on a smaller intermediate result?
Optimized plan (#3)
[Figure: SALES_FACT is first grouped by prod_hsk, store_hsk:area, date_hsk:month with x = SUM(sales); the result is joined with PRODUCT and grouped by brand, store_hsk:area, date_hsk:month with s = SUM(x); finally it is joined with STORE grouped by hsk:area, area, and area, brand, s are output.]
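The effect of the rewrites can be checked on a small example. This Python sketch is illustrative only: the rows are hypothetical, h-surrogates are assumed to use two-digit level codes (so store_hsk:area and date_hsk:month are 4-character prefixes), and area and month values are assumed to correspond one-to-one with those prefixes. It evaluates the simple plan and plan #3 and compares their outputs:

```python
from collections import defaultdict

CODE_WIDTH = 2  # assumed width of each level code in an h-surrogate


def prefix(hsk: str, levels: int) -> str:
    """Ancestor part of an h-surrogate, e.g. store_hsk:area when levels == 2."""
    return hsk[: levels * CODE_WIDTH]


# Hypothetical toy data, hsk first; every hierarchy has three levels.
PRODUCT = [("010101", "c1", "b1", "p1"), ("010102", "c1", "b1", "p2"), ("010201", "c1", "b2", "p3")]
STORE = [("010101", "r1", "N1", "s1"), ("010102", "r1", "N1", "s2"), ("010201", "r1", "N2", "s3")]
DATE = [("030101", 2003, "2003-01", "2003-01-01"), ("030102", 2003, "2003-01", "2003-01-02"),
        ("030201", 2003, "2003-02", "2003-02-01")]
# SALES_FACT rows: (prod_hsk, store_hsk, date_hsk, sales)
SALES_FACT = [("010101", "010101", "030101", 10), ("010102", "010102", "030101", 5),
              ("010201", "010201", "030102", 7), ("010101", "010201", "030201", 3),
              ("010101", "010101", "030102", 4)]


def simple_plan():
    """Join the fact table with all three dimensions, then group by brand, area, month."""
    brand = {hsk: b for hsk, _, b, _ in PRODUCT}
    area = {hsk: a for hsk, _, a, _ in STORE}
    month = {hsk: m for hsk, _, m, _ in DATE}
    groups = defaultdict(int)
    for p, s, d, sales in SALES_FACT:
        groups[(brand[p], area[s], month[d])] += sales
    return sorted((b, a, total) for (b, a, _m), total in groups.items())


def plan_3():
    """Pre-group on h-surrogate prefixes, join PRODUCT, group again, then join STORE."""
    # Stage 1: group by prod_hsk, store_hsk:area, date_hsk:month with x = SUM(sales).
    # DATE is never joined: date_hsk:month stands in for month.
    stage1 = defaultdict(int)
    for p, s, d, sales in SALES_FACT:
        stage1[(p, prefix(s, 2), prefix(d, 2))] += sales
    # Join with PRODUCT on prod_hsk = hsk, then stage 2: s = SUM(x).
    brand = {hsk: b for hsk, _, b, _ in PRODUCT}
    stage2 = defaultdict(int)
    for (p, area_hsk, month_hsk), x in stage1.items():
        stage2[(brand[p], area_hsk, month_hsk)] += x
    # Delayed join with STORE grouped by (hsk:area, area); assumes hsk:area determines area.
    area_of = {prefix(hsk, 2): a for hsk, _, a, _ in STORE}
    return sorted((b, area_of[area_hsk], total) for (b, area_hsk, _m), total in stage2.items())


if __name__ == "__main__":
    print(simple_plan())              # [('b1', 'N1', 19), ('b1', 'N2', 3), ('b2', 'N2', 7)]
    print(plan_3() == simple_plan())  # True
```

In this toy data the pre-grouping step turns five fact rows into four groups before the PRODUCT join, and DATE is never read; the savings on real data depend on how much the pre-grouping shrinks the input.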
Hierarchical Pre-Grouping
Classification of the effects:
- Remove a join with a dimension table (DATE)
- Postpone a join until after the grouping operation (STORE)
- Introduce an additional grouping operation before all joins, thus creating a two-stage grouping process (PRODUCT)
Experimental results show an important impact:
- Response time reduced by 50%-75% or more (Karayannidis et al. [VLDB’02], Pieringer et al. [ICDE’03])
Motivating questions
- Can Pre-Grouping be applied to other database schemata without h-surrogates?
- What are the precise conditions required to apply the transformations done by Pre-Grouping?
- Is Pre-Grouping a combination of known optimization techniques, or does it introduce some novelty?
Main results
- Define the Generalized Pre-Grouping as an algebraic transformation E1 = E2, using the Select (σ), Cross-product (×) and Generalized Projection (Π) operators
- Decompose Pre-Grouping into a sequence of simpler transformations
- Analyze the relationship between Pre-Grouping and other known transformations
- Clarify which transformations use semantic information (integrity constraints, I.C.)
- Establish the importance of the Surrogate-Join transformation
- Identify sufficient conditions for applying the various transformations
- Use relations with bag semantics and NULL values
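The select, cross-product and generalized projection operators can be sketched over bags in Python. This is a minimal illustration, not the formal definitions of the paper: relations are lists of dicts (so duplicates are kept), attribute names in a cross product are assumed to be disjoint, NULL semantics are not modelled, and the function names are hypothetical:

```python
from collections import defaultdict
from typing import Callable

Row = dict
Relation = list  # a bag of Rows: duplicates allowed, order irrelevant


def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """sigma: keep the rows satisfying the predicate (duplicates preserved)."""
    return [r for r in rel if pred(r)]


def cross(r1: Relation, r2: Relation) -> Relation:
    """Cross product: every pair of rows; attribute names are assumed disjoint."""
    return [{**a, **b} for a in r1 for b in r2]


def gproject(rel: Relation, group_by: list, aggs: dict) -> Relation:
    """Generalized projection: one row per distinct group_by value, plus aggregates.

    aggs maps an output name to (input attribute, function over a list of values).
    With no aggregates this is duplicate-eliminating projection.
    """
    groups = defaultdict(list)
    for r in rel:
        groups[tuple(r[a] for a in group_by)].append(r)
    out = []
    for key, rows in groups.items():
        row = dict(zip(group_by, key))
        for name, (attr, fn) in aggs.items():
            row[name] = fn([r[attr] for r in rows])
        out.append(row)
    return out


if __name__ == "__main__":
    sales = [{"pid": 1, "amt": 10}, {"pid": 1, "amt": 5}, {"pid": 2, "amt": 7}]
    prod = [{"key": 1, "brand": "b1"}, {"key": 2, "brand": "b2"}]
    joined = select(cross(sales, prod), lambda r: r["pid"] == r["key"])  # a join as sigma over x
    print(gproject(joined, ["brand"], {"s": ("amt", sum)}))
    # [{'brand': 'b1', 's': 15}, {'brand': 'b2', 's': 7}]
```

Expressing a join as a selection over a cross product, and GROUP BY with aggregates as one generalized projection, is what allows the plans of the earlier slides to be written as equalities between algebraic expressions.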
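Generalized Pre-Grouping (1)
Case #1: Remove redundant join
[Figure: Ru (…Hu1, SHu1, Hu2, SHu2, Hu3, SHu3, Su) is joined with Rd3 (Sd3, SKd3, …Kd3) on Hu3 = Kd3, and the result is grouped by Su, Sd3 with A = F(Ag). After the transformation the join with Rd3 is gone and the grouping uses SHu3 in place of Sd3.]
Conditions:
- I1: a dependency between Sd3 and SKd3
- I2: an inclusion dependency between {Hu3, SHu3} of Ru and {Kd3, SKd3} of Rd3
- I3: Kd3 is a key of Rd3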
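Generalized Pre-Grouping (2)
Case #2: Delay a join
[Figure: Ru (…Hu1, SHu1, Hu2, SHu2, Hu3, SHu3, Su) is joined with Rd2 (Sd2, SKd2, …Kd2) on Hu2 = Kd2, and the result is grouped by Su, Sd2 with A = F(Ag). After the transformation Ru is first grouped by Su, SHu2, and the result is joined on SHu2 = SKd2 with Rd2 grouped by Sd2, SKd2.]
Conditions:
- I4: an inclusion dependency between {Hu2, SHu2} of Ru and {Kd2, SKd2} of Rd2
- I5: Kd2 is a key of Rd2
- I6: a dependency between Sd2 and SKd2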
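Generalized Pre-Grouping (3)
Case #3: Split aggregation into two stages
[Figure: Ru (…Hu1, SHu1, Hu2, SHu2, Hu3, SHu3, Su) is joined with Rd1 (Sd1, SKd1, …Kd1) on Hu1 = Kd1, and the result is grouped by Su, Sd1 with A = F(Ag). After the transformation Ru is first grouped by Su, SHu1 with x = F(Ag); the result is joined on SHu1 = SKd1 with Rd1 grouped by Sd1, SKd1, and then grouped by Su, Sd1 with A = F(AgO(x)).]
Conditions:
- I7: Ag(z) = AgO(Ag(z)), i.e. Ag can be computed in two stages, with AgO applied to the partial Ag results
- I8: an inclusion dependency between {Hu1, SHu1} of Ru and {Kd1, SKd1} of Rd1
- I9: Kd1 is a key of Rd1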
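Condition I7 asks for an aggregate that can be evaluated in two stages. A small Python sketch, assuming the common (Ag, AgO) pairs SUM/SUM, COUNT/SUM, MIN/MIN and MAX/MAX (the table `TWO_STAGE` and the helper names are hypothetical), checks the condition on one partitioning and shows why AVG cannot simply be re-averaged:

```python
# Sketch of condition I7: an aggregate Ag can be evaluated in two stages,
# Ag over all values == AgO over the partial Ag results of any partitioning.
# The (Ag, AgO) pairs below are common textbook choices, assumed for illustration.
TWO_STAGE = {
    "SUM": (sum, sum),
    "COUNT": (len, sum),  # partial counts are added up, so AgO differs from Ag
    "MIN": (min, min),
    "MAX": (max, max),
}


def one_stage(agg: str, values: list) -> float:
    """Apply Ag directly to all values."""
    inner, _outer = TWO_STAGE[agg]
    return inner(values)


def two_stage(agg: str, partitions: list) -> float:
    """Apply Ag per partition (the pre-grouping), then AgO to the partial results."""
    inner, outer = TWO_STAGE[agg]
    return outer([inner(part) for part in partitions])


def avg(values: list) -> float:
    return sum(values) / len(values)


if __name__ == "__main__":
    parts = [[1, 2, 3], [10]]  # values split by a hypothetical pre-grouping
    flat = [v for part in parts for v in part]
    for agg in TWO_STAGE:
        print(agg, one_stage(agg, flat), two_stage(agg, parts))
    # AVG with AgO = AVG violates I7: the mean of partial means differs from the true mean.
    print(avg([avg(p) for p in parts]), avg(flat))  # 6.0 4.0
```

AVG is usually handled by pre-aggregating SUM and COUNT separately and dividing only after the final stage.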
Generalized Pre-Grouping
- The combination of all three cases defines the Generalized Pre-Grouping
- The decomposition establishes:
  - a set of sufficient conditions for applying the Generalized Pre-Grouping transformation
  - the relationship to other known transformations
  - the usage of semantic information
- The Generalized Pre-Grouping uses Surrogate-Join to modify the join conditions
Surrogate-Join transformation
[Figure: R1 (H, SH, A, O) joined with R2 (SK, B, P, K) on H = K, versus R1 joined on SH = SK with R2 grouped by SK, B; both plans output A, B.]
$$\Pi_{A,B}\big(\sigma_{H=K}(R_1 \times R_2)\big) = \Pi_{A,B}\big(\sigma_{SH=SK}(\Pi_{A,SH}(R_1) \times \Pi_{B,SK}(R_2))\big)$$
Surrogate-Join example
[Figure: Page_Hour_Hits (PageID, ServerID, Hour, HourPageHits) joined with Page_Server_Hits (PageID, PageHits, ServerID, ServerHits) to compute PageID, Hour, HourPageHits/ServerHits; after the rewrite, Page_Server_Hits is first grouped by ServerID, ServerHits and the join uses ServerID.]
Rewritten query (Surrogate-Join):
SELECT h.PageID, h.Hour, h.HourPageHits/s.ServerHits
FROM (SELECT DISTINCT ServerID, ServerHits
      FROM Page_Server_Hits) s, Page_Hour_Hits h
WHERE s.ServerID=h.ServerID
Original query:
SELECT h.PageID, h.Hour, h.HourPageHits/s.ServerHits
FROM Page_Server_Hits s, Page_Hour_Hits h
WHERE s.PageID=h.PageID
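Both queries can be run side by side. The sketch below uses Python's built-in sqlite3 module with hypothetical rows chosen to satisfy the dependencies the rewrite appears to rely on (PageID is a key of Page_Server_Hits, ServerID determines ServerHits, and each (PageID, ServerID) pair of Page_Hour_Hits also occurs in Page_Server_Hits); the column types are assumptions:

```python
import sqlite3

# Hypothetical toy data satisfying the assumed dependencies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Page_Server_Hits (PageID INTEGER PRIMARY KEY, PageHits INTEGER,
                               ServerID INTEGER, ServerHits REAL);
CREATE TABLE Page_Hour_Hits (PageID INTEGER, ServerID INTEGER, Hour INTEGER,
                             HourPageHits INTEGER);
INSERT INTO Page_Server_Hits VALUES (1, 60, 10, 200.0), (2, 140, 10, 200.0), (3, 50, 20, 50.0);
INSERT INTO Page_Hour_Hits VALUES (1, 10, 0, 20), (1, 10, 1, 40), (2, 10, 0, 140), (3, 20, 5, 50);
""")

ORIGINAL = """
SELECT h.PageID, h.Hour, h.HourPageHits/s.ServerHits
FROM Page_Server_Hits s, Page_Hour_Hits h
WHERE s.PageID=h.PageID
"""

# Surrogate-Join: join on ServerID against one row per server instead of one per page.
SURROGATE_JOIN = """
SELECT h.PageID, h.Hour, h.HourPageHits/s.ServerHits
FROM (SELECT DISTINCT ServerID, ServerHits FROM Page_Server_Hits) s, Page_Hour_Hits h
WHERE s.ServerID=h.ServerID
"""

original = sorted(conn.execute(ORIGINAL).fetchall())
rewritten = sorted(conn.execute(SURROGATE_JOIN).fetchall())
print(original == rewritten)  # True
```

If the dependencies are violated, for example when a Page_Hour_Hits row carries a ServerID different from the one recorded for its page, the two queries can return different results.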
Bad news: Surrogate-Join can be described as a conjunctive query transformation
[Figure, built up in three steps:
1. R1 (H, SH, A, O) is joined with R2 (SK, B, P, K) and A, B are output; R2 is split into R21 and R22, which are joined on SK.
2. R1 is joined with R21 on H=K & SH=SK.
3. The remaining join condition is SH=SK, between R1 and R22.]
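One way to read these steps is as a lossless decomposition followed by join elimination. The Python sketch below illustrates that reading under stated assumptions, not the paper's proof: it uses set semantics (the final projection on A, B removes duplicates), hypothetical rows, and assumes that K is a key of R2, that SK → B holds in R2, that every (H, SH) pair of R1 appears as a (K, SK) pair of R2, and that R21 is (K, SK, P):

```python
# Hypothetical relations: R1(H, SH, A, O) and R2(K, SK, B, P).
# Assumed: K is a key of R2, SK -> B holds in R2, and R1[H, SH] is contained in R2[K, SK].
R1 = {(1, "x", "a1", "o1"), (2, "x", "a2", "o2"), (3, "y", "a1", "o3")}
R2 = {(1, "x", "b1", "p1"), (2, "x", "b1", "p2"), (3, "y", "b2", "p3")}

# Step 1: decompose R2 using SK -> B; the decomposition is lossless.
R21 = {(k, sk, p) for k, sk, _b, p in R2}  # (K, SK, P)
R22 = {(sk, b) for _k, sk, b, _p in R2}    # (SK, B)
assert R2 == {(k, sk, b, p) for k, sk, p in R21 for sk2, b in R22 if sk == sk2}

# Original query: project A, B from R1 joined with R2 on H = K.
original = {(a, b) for h, _sh, a, _o in R1 for k, _sk, b, _p in R2 if h == k}

# Step 2: join R1 with R21 and R22; by the inclusion dependency and the key,
# the matching R21 row always has SK = SH, so both conditions can be used.
via_decomposition = {(a, b) for h, sh, a, _o in R1 for k, sk, _p in R21
                     for sk2, b in R22 if h == k and sh == sk and sk == sk2}

# Step 3: R21 no longer contributes, so only the condition SH = SK with R22 remains.
surrogate_join = {(a, b) for _h, sh, a, _o in R1 for sk, b in R22 if sh == sk}

print(original == via_decomposition == surrogate_join)  # True
```

Each step uses only standard conjunctive-query reasoning (a functional dependency for the decomposition, an inclusion dependency and a key for dropping R21), which is why the Surrogate-Join is not a new kind of transformation.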
Conclusions
- The Pre-Grouping transformation is a mixture of known and new transformations
- The Generalized Pre-Grouping can be applied in the absence of h-surrogates, using only SQL integrity constraints
- The Surrogate-Join transformation is an important ingredient of Pre-Grouping; it exploits functional and inclusion dependencies
- Semantic Query Optimization techniques are particularly effective in the DW & OLAP areas
Contact
Aris Tsois
Knowledge and Database Systems Laboratory
National Technical University of Athens, Hellas
e-mail: atsois@dblab.ece.ntua.gr
URL: http://www.dblab.ece.ntua.gr/~atsois/
Long version (TR-2003-4) available at: http://www.dblab.ece.ntua.gr/publications/TR-2003-4.pdf