Upload
temira
View
42
Download
0
Embed Size (px)
DESCRIPTION
Optimizing Refresh of a Set of Materialized Views. N. Folkert, A. Gupta, A. Witkowski, S. Subramanian, S. Bellamkonda, S. Shankar, T. Bozkaya, L. Sheng. Oracle Corporation. Overview. Shapes of MVs for analysis: Rollup & Federated Cubes Improvements in refresh of a single MV - PowerPoint PPT Presentation
Citation preview
Optimizing Refresh of a Set of Materialized Views
N. Folkert, A. Gupta, A. Witkowski, S. Subramanian, S. Bellamkonda, S. Shankar, T. Bozkaya, L. Sheng
Oracle Corporation
Overview Shapes of MVs for analysis: Rollup &
Federated Cubes Improvements in refresh of a single MV Scheduling refresh of a set of MVs
Shapes of MV for Analysis Queries
– Ask for aggregations on different hierarchy levels
– Rank, using window functions, within partitions– Rewrite a major performance tool– Hence, we need materialized views
representing cubes DW Maintenance of schema
– Mostly partition based (add, drop, exchange, truncate)
– Corrections done using DML touching few partitions
– Hence, we need partition based refresh
Example – Hierachical Star Schema– Sales(city, day, amt) – fact table. Assume partition by time
– Times(day, month, quart, year) – hierarchical dimension– Geog(city, state, region)- hierarchical dimension
ROLLUP CUBE – A grouping for each level, in each hierarchy– All groupings in a single MV– Easy to Manage– Variation where as many cubes as time attributes
CREATE MATERIALIZED VIEW rollup_cube_MV ASSELECT t.year, t.quart, t.month, t.day, g.region, g.state, g.city, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY ROLLUP (t.year, t.quart, t.month, t.day) ROLLUP (g.region, g.state, g.city)
ROLLUP CUBE – A grouping for each level, in each hierarchy– All groupings in a single MV– Easy to Manage– Variation where as many cubes as time attributes
CREATE MATERIALIZED VIEW year_rollup_cube_MV ASSELECT t.year, g.region, g.state, g.city, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY year, ROLLUP (g.region, g.state, g.city)
CREATE MATERIALIZED VIEW quart_rollup_cube_MV ASSELECT t.quart, g.region, g.state, g.city, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY quart, ROLLUP (g.region, g.state, g.city)
Federated Cube– Each groupings in a separate MV– MVs with a time attribute candidates for partitioning– Flexibility of partitioning– Indexes smaller than ROLLUP CUBE (no NULL
indexing)– Flexible “MV_size – query response” trade-off
CREATE MATERIALIZED VIEW quart_state_MV ASPARTITION BY RANGE (quart) ( PARTITION VALUES LESS THEN ‘Q1 03’ PARTITION VALUES LESS THEN ‘Q2 03’……)SELECT t.quart, g.state, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY t.quart, g.state
Intuition for Partition refresh
Jan
03Fe
b 03
Mar
03
Apr
03
Oct
04
Nov
04
Dec
04
Q1
03Q
2 03
Q3
04Q
4 04
2003
2004
Q3
03Q
4 03
Q1
04Q
2 02
… Sales PBY month on day
quart_state_MV PBY quart
year_state_MV PBY quart
fact
MV
MV
Intuition for Partition refresh - Drop
Jan
03Fe
b 03
Mar
03
Apr
03
Oct
04
Nov
04
Dec
04
Q1
03Q
2 03
Q3
04Q
4 04
2003
2004
Q3
03Q
4 03
Q1
04Q
2 02
… Sales PBY month on day
quart_state_MV PBY quart
year_state_MV PBY quart
fact
MV
MV
Intuition for Partition refresh – Add
Feb
03M
ar 0
3A
pr 0
3
Oct
04
Nov
04
Dec
04
Q1
03Q
2 03
Q3
04Q
4 04
2003
2004
Q3
03Q
4 03
Q1
04Q
2 02
… Sales PBY month on day
quart_state_MV PBY quart
year_state_MV PBY quartJa
n 05
fac t
MV
MV
Intuition for MV refresh – Maintain
Feb
03M
ar 0
3A
pr 0
3
Oct
04
Nov
04
Dec
04
Q1
03Q
2 03
Q3
04Q
4 04
2003
2004
Q3
03Q
4 03
Q1
04Q
2 02
… Sales PBY month on day
quart_state_MV PBY quart
year_state_MV PBY quartJa
n 05
Q1
05
2005
fac t
MV
MV
Refresh Expressions Conventional Refresh Expressions Partition Based Refresh Expressions
Conventional Refresh – Use MV log
Jan
03Fe
b 03
Mar
03
Q1
03
delta_sales s
Maintenance Steps• Delete delta logged in MV log
delta
sal
e
…fac t
MV
mv
log
Conventional Refresh – Use MV log
Feb
03M
ar 0
3Q
1 03
(SELECT t.quart, g.state, sum(amt) amt FROM delta_sales s, geog g, times t WHERE s.day = t.day & s.city = g.city GROUP BY quart, state) delta
Maintenance Steps• Delete delta logged in MV log• Calculate aggregate on the delta
delta
sal
e
…fac t
MV
mv
log
Conventional Refresh – Use MV log
Feb
03M
ar 0
3Q
1 03
UPDATE (SELECT m.quart, m.state, m.amt, delta.amt delta_amt FROM quart_state_MV m, (SELECT t.quart, g.state, sum(amt) amt FROM delta_sales s, geog g, times t WHERE s.day = t.day & s.city = g.city GBY..) delta WHERE m.quart = delta.quart & m.state=delta.state) SET amt = amt – delta_amt
Maintenance Steps• Delete delta logged in MV log• Calculate aggregate on the delta• Join MV with the delta MV• Update MV using delta MV
delta
sal
e
…fac t
MV
mv
log
Partition Change Tracking The base fact table is partitioned RDBMS keeps track of affected partitions
– After partition operations (truncate, add, exchange, etc.)– After a DML (Insert, Delete, Update) to a partition
Can use instead of MV logs, but with high granularity
Jan
03Fe
b 03
Mar
03
Apr
03
Oct
04
Nov
04
Dec
04
…
Single row deleteTruncate partition
Partition Based RefreshJa
n 03
Feb
03M
ar 0
3Q
1 03
CREATE MATERIALIZED VIEW quart_state_MV ASPARTITION BY RANGE (quart) ( PARTITION VALUES LESS THEN ‘Q1 03’ PARTITION VALUES LESS THEN ‘Q2 03’……)SELECT t.quart, g.state, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY quart, state
• Fact table partitioned• MV partitioned • MV partitions determined by Fact partitions • Partition – join dependent columns in MV
…
Dec
04
Q1
04…fact partition columnmv partition column
fact
MV
Partition Based RefreshJa
n 03
Feb
03M
ar 0
3Q
1 03
Maintenance Steps• Determine Affected MV partitions
…
Dec
04
Q1
04
01-0
1-20
03
01-0
2-20
03
(SELECT t.quart FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)
fac t
MV
Partition Based RefreshFe
b 03
Mar
03
Maintenance Steps• Determine Affected MV partitions• Delete / Truncate the partitions
…
Dec
04
Q1
04
DELETE quart_state_MVWHERE quart IN (SELECT t.quart FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)
fac t
MV
Partition Based RefreshFe
b 03
Mar
03
INSERT INTO quart_state_MVSELECT t.quart, g.state, sum(s.amt) amtFROM sales s, geog g, times tWHERE t.day = s.day & g.city = t.city t.quart IN (SELECT t.quart FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)GROUP BY quart, state
Maintenance Steps• Determine Affected MV partitions• Delete / Truncate the partitions• Insert recalculated partitions
…
Dec
04
Q1
04
Q1
03
fac t
MV
Rewrite During RefreshINSERT INTO year_state_MVSELECT t.year, g.state, sum(s.amt) amtFROM sales s, geog g, times tWHERE t.day = s.day & g.city = t.city t.year IN (SELECT t.year FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)GROUP BY year, state
2003
2004
Feb
03Ja
n 03
Dec
03
…
MV
fac t
Rewrite During RefreshINSERT INTO year_state_MVSELECT t.year, g.state, sum(s.amt) amtFROM sales s, geog g, times tWHERE t.day = s.day & g.city = t.city t.year IN (SELECT t.year FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)GROUP BY year, state
Q2
03
INSERT INTO year_state_MVSELECT t.year, mv.state, sum(mv.amt) amtFROM quart_state_mv, (SELECT DISTINCT quart, year FROM t) tWHERE mv.quart = t.quart mv.quart IN (SELECT t.quart FROM times t WHERE t.day >=’01-01-2003’ & t.day < ’02-01-2003’)GROUP BY year, state
2003
2004
Q1
03
Q3
03Fe
b 03
Jan
03
Dec
03
…
MV
MV
fac t
Efficiency of Partition Refresh Truncate is very fast. No undo logs, just dictionary op
INSERT into an MV partition uses fast path with no logging, faster then conventional INSERT or UPDATE
Scan of a base partition may be more efficient than scan of MV logs (as they may have to keep more data)
If more than one partition of MV affected, we use MULTI-TABLE INSERT instead of multiple INSERTs. Scans the base data once
We use new optimization - Dynamic Partition Pruning
Generated refresh expressions automatically rewritten against already fresh MVs. Refresh expressions more suitable for rewrite.
Choosing Optimal Refresh Method Partition Based Refresh may remove and recalculate more data in MV than conventional refresh
Cost based generation of refresh expressions
– Conventional refresh expressions using MV logs
– Partition Refresh with DELETE from MV– Partition Refresh with TRUNCATE of MV
partition Use Optimizer to estimate their cost and
chose the cheapest method
Building Refresh Schedule Usage of rewrite for schedule building Resource allocation for concurrent execution
of the schedule
Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph
• Nodes = Set of MVs to refresh
day_city_mv
month_city_mv
year_region_mv
day_state_mv
day_region_mvquart_city_mv
…
Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph
• Nodes = Set of MVs to refresh• Edges = Optimizer determined best rewrite for MV
day_city_mv
month_city_mv day_state_mv
day_region_mvquart_city_mv
…cost=10
year_region_mv
Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph
• Nodes = Set of MVs to refresh• Edges = Optimizer determined best rewrite for MV
day_city_mv
month_city_mv day_state_mv
day_region_mvquart_city_mv
…
cost=100cost=10
year_region_mv
Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph
• Nodes = Set of MVs to refresh• Edges = Optimizer determines best rewrite for MV
day_city_mv
month_city_mv day_state_mv
day_region_mvquart_city_mv
…cost=10
year_region_mv
Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph
• Nodes = Set of MVs to refresh• Edges = Optimizer determines best rewrite for MV• Result – acyclic graph with cost. (Cycles removed using SSC)
day_city_mv
month_city_mv day_state_mv
day_region_mvquart_city_mv
…
cost=100
cost=10
cost=20cost=40
year_region_mv
Execution of Schedule - ResourcesAssign resources (processes for now) to the graph
• If cost > threshold, assign all processes to the refresh
cost=100. (3)
cost=10
cost=20cost=40
cost=20
cost=10
11 11
4
2 3
6
5
processors
Execution of Schedule - ResourcesAssign resources (processes for now) to the graph
• If cost > threshold, assign all processes to the refresh•Otherwise, assign processors proportionally based on cost
cost=100. (3)
cost=10
cost=20. (1)cost=40. (2)
cost=20
cost=10
11 11
4
2 3
6
5
processors
322
Execution of Schedule - ResourcesAssign resources (processes for now) to the graph
• If cost > threshold, assign all processes to the refresh•Otherwise, assign processors proportionally based on cost
cost=100. (3)
cost=10. (1)
cost=20. (1)cost=40. (2)
cost=20. (2)
cost=10
11 11
4
2 3
6
5
processors
322
544
Execution of Schedule - ResourcesAssign resources (processes for now) to the graph
• If cost > threshold, assign all processes to the refresh•Otherwise, assign processors proportionally based on cost
cost=100. (3)
cost=10. (1)
cost=20. (1)cost=40. (2)
cost=20. (1)
cost=10
11 11
4
2 3
6
5
processors
322
354
Execution of Schedule - ResourcesAssign resources (processes for now) to the graph
• If cost > threshold, assign all processes to the refresh•Otherwise, assign processors proportionally based on cost
cost=100. (3)
cost=10. (1)
cost=20. (1)cost=40. (2)
cost=20. (2)
cost=10
11 11
4
2 3
6
5
processors
322
544
666
Execution of Schedule - OthersAssign resources (processes for now) to the graph
• How to get cost and thus schedule for refresh of empty MVs, • Use query defining the MV to get an estimate of its cost
cost=100
cost=10
cost=20cost=40
cost=20
cost=10
Performance APB schema with 5.0 density
– Fact with four hierarchical dimensions – 62 M rows– Dimensions (channel (2 levels), time (3), cust (3), prod
(7))– Federated full cube with 350 M rows
System 24 CPUs with total of 24 GB shared memory
Complete refresh– 6 times performance improvement over Oracle 9i
Incremental refresh – 1 month (3.5 M rows) added– 5 times performance improvement over Oracle 9i
Partition Based vs Conventional Refresh
50
100
150
200
250
300
350
0 1000 3000
400
Rows inserted
Incremental refresh time
2000
Conventional
Partition Based with Delete
Partition Based with Truncate
Summary Improvements in refresh of a single MV
– Algorithms use partitioning of based tables and/or MVs
– Costing of Multiple refresh methods using Optimizer
– Dynamic Partition Pruning– MV rewrite during refresh
Scheduling refresh of a set of MVs– Usage of rewrite for schedule building– Resource allocation for concurrent execution of
the schedule
Summary MVs can be partitioned for more efficient
refresh Partition based refresh very suitable for
cubes partitioned by time granules (good rewrites)
Federated cube offers good compromise size vs refresh & rewrite time if not too many groupings (less then 100)
Need better rewrite strategy for Federated cubes with many groupings (optimizer needs to do intelligent pruning of mvs)
AQ&Q U E S T I O N SQ U E S T I O N SA N S W E R SA N S W E R S