View
216
Download
1
Embed Size (px)
Citation preview
George Papastefanatos1, Panos Vassiliadis2,
Alkis Simitsis3 ,Yannis Vassiliou1
(1) National Technical University of Athens{gpapas,yv}@dbnet.ece.ntua.gr
(2) University of [email protected]
(3) IBM Almaden Research [email protected]
What-if Analysis for Data Warehouse Evolution
DaWaK'07, Regensburg, September 2007 2
Outline
• Motivation
• Graph-based modeling of ETL
• Adapting ETL workflows
• Evaluation
• Conclusions
DaWaK'07, Regensburg, September 2007 3
Outline
• Motivation
• Graph-based modeling of ETL
• Adapting ETL workflows
• Evaluation
• Conclusions
DaWaK'07, Regensburg, September 2007 5
ETL
WWW
Sources
Extract Transform & Clean
DW
Load
DSA
Act1
Act2
Act3
Act4
Act5
DaWaK'07, Regensburg, September 2007 7
Evolving ETL sources…
• Schema Changes on the sources of ETL processes. Design constructs are– Added, Removed, Modified
• ETL processes affected:– SyntacticallySyntactically – i.e., become invalid– SemanticallySemantically – i.e., query must conform to the new
source database semantics
• Adaptation of activities, queries and views– time-consuming task,treated in most of the cases
manually by the administrators/developers
DaWaK'07, Regensburg, September 2007 8
We would like to know…
• What part of the process is affected and how if e.g., an attribute is deleted?
• Can we predict the impact of changes? • To what extent can readjustment be automated? • Can we perform what-if analysis for potential
changes of source configurations?
DaWaK'07, Regensburg, September 2007 9
Contribution
• A general mechanism for performing what-if analysis for potential changes of ETL source configurations– A graph model for relations, queries, views, ETL activities, and
their significant properties
– A framework for annotating the graph with policies concerning the behavior of nodes in the presence of hypothetical changes.
– A set of rules that dictate the proper actions, when additions, deletions or updates are performed to relations, attributes, and conditions
– An experimental assessment of our proposal
DaWaK'07, Regensburg, September 2007 10
Outline
• Motivation
• Graph-based modeling of ETL
• Adapting ETL workflows
• Evaluation
• Conclusions
DaWaK'07, Regensburg, September 2007 11
Query representation
Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours
FROM EMP, WORKS
WHERE EMP.Emp# = WORKS.Emp#
GROUP BY EMP.Emp#
JoinJoin
DaWaK'07, Regensburg, September 2007 12
Query representation
Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours
FROM EMP, WORKS
WHERE EMP.Emp# = WORKS.Emp#
GROUP BY EMP.Emp#
JoinJoin
DaWaK'07, Regensburg, September 2007 13
Outline
• Motivation
• Graph-based modeling of ETL
• Adapting ETL workflows
• Evaluation
• Conclusions
DaWaK'07, Regensburg, September 2007 14
Annotating the graph with adaptation rules
According to prevailing policy, the proper action is taken graph transformation
Set of evolving database constructs:• relations• attributes• constraints
11
Set of potential evolution changes: • addition• deletion• modification
22
Set of reaction policies: • propagate• block• prompt
33
DaWaK'07, Regensburg, September 2007 15
Annotating the graph with adaptation rules
• Assuming that a graph construct is annotated with a policy for a particular event (e.g., an activity node is tuned to deny deletions of its provider attributes) : – (a) it performs the identification of the affected
subgraph
– (b) if the policy is appropriate, it automates the readjustment of the graph to fit the new semantics imposed by the change.
DaWaK'07, Regensburg, September 2007 16
Query Adaptation - Example
from
map-select
S
Q
SS
EMP
Emp# NameEmp#
NameS
map-select
...
On attribute addition To EMPTHEN propagate
Annotated Query Graph Event
Add attribute Phone to EMP relation
Phone
from
map-select
S
Q
S SS
EMP
PhoneEmp# NameEmp#
NameS
map-select
...
On attribute addition To EMPTHEN propagate
S
map-select
Transformed Query Graph
Q: SELECT EMP.Emp#, EMP.Name
FROM EMP
Q’: SELECT EMP.Emp#, EMP.Name, EMP.Phone
FROM EMP
DaWaK'07, Regensburg, September 2007 17
Algorithm Propagate changeS (PS)
Input: an ETL summary S over a graph Go=(Vo,Eo) and an event eOutput: a graph Gn=(Vn,En)Variables: a set of events E, and an affected node A Begin dps(S, Go, Gn, {e}, A)End
dps(S, Gn, Go, E, A) { I = Ins_by_policy(affected(E)) D = Del_by_policy(affected(E)) Gn = Go – D I E = E–{e}action(affected(E)) if consumer(A)nil for each consumer(A) dps(S,Gn,Go,E,consumer(A))}
DaWaK'07, Regensburg, September 2007 18
Conflict resolution
• Graph constructs may have contradictory policies for the same event
Policies defined on query graph structures are stronger than policies defined on view graph structures which in turn prevail on policies defined on relation graph structures
RuleRule
Query Module
View Module
Relation Module
Query Conditions
Query Attributes
Query
View Conditions
View Attributes
View
Relation Conditions
Relation Attributes
Relation
User’s Choice
DaWaK'07, Regensburg, September 2007 19
Outline
• Motivation
• Graph-based modeling of ETL
• Adapting ETL workflows
• Evaluation
• Conclusions
DaWaK'07, Regensburg, September 2007 20
Evaluation of our framework
• Reverse engineering of real-world ETL processes, extracted from an application of the Greek public sector
• Monitored the changes that took place to the sources of the studied data warehouse
• Performed what-if analysis for several evolution scenarios
DaWaK'07, Regensburg, September 2007 21
Configuration of our setting
7 ETL processes
S1 Tmp1
L1 L2 Tmp2
Join
S2
T1
T2
Add field
Load to DSA
γ V3
Load to DSA
π
π Add field
Patch text fields
Sources DW Data Staging Area (DSA)
Patch text fields
53 ETL activities3 lookup tables
7 source tables 9 target tables
DaWaK'07, Regensburg, September 2007 22
Configuration of our setting
• evolution events– renaming source tables,
– renaming attributes of source tables,
– adding and deleting attributes from source tables,
– modifying the domain of attributes
– changing the primary key of lookup tables
DaWaK'07, Regensburg, September 2007 23
Measurements
• For each event, we counted: – (a) the number of activities affected both semantically
and syntactically,
– (b) the number of activities, that have automatically been adapted by our framework (propagate or block policies) as opposed to those
– (c) that required administrator’s intervention (i.e., a prompt policy)
DaWaK'07, Regensburg, September 2007 24
Adapted activities w.r.t. the ETL scenario size
0
10
20
30
40
50
60
5 5 5 7 7 12 12
scenario size
# A
cti
vit
ies
Affected
Adapted
DaWaK'07, Regensburg, September 2007 25
Adapted activities w.r.t. the complexity of activities.
0
2
4
6
8
10
12
14
16
1 2 3 4 5
complexity
#a
cti
vit
ies
Affected
Adapted
DaWaK'07, Regensburg, September 2007 26
Outline
• Motivation
• Graph-based modeling of ETL
• Adapting ETL workflows
• Evaluation
• Conclusions
DaWaK'07, Regensburg, September 2007 27
Summary
• A uniform representation for modeling relations, queries, views and ETL activities
• A framework for annotating the graph with policies concerning the behavior of nodes in the presence of hypothetical changes
• A set of rules that dictate the proper adaptation actions, when evolution events are performed to ETL sources
• An experimental assessment of our proposal
DaWaK'07, Regensburg, September 2007 28
On-going/Future Work
• Hecataeus: A tool for visualizing and performing what-if analysis for several evolution scenarios.
• SQL extensions for annotating graph constructs with evolution semantics
• Patterns of evolution sequences
DaWaK'07, Regensburg, September 2007 29http://www.cs.uoi.gr/~pvassil/projects/architecture_graph/
Danke schön!Questions?
DaWaK'07, Regensburg, September 2007 32
Related work
• DB schema Evolution– Schema Versioning
• DW schema Evolution
• Materialized View Evolution
• Evolution wrt Model Mappings
DaWaK'07, Regensburg, September 2007 33
Scenario 1
S1 Tmp1
L1 L2 Tmp2
Join
S2
T1
T2
Add field
Load to DSA
γ V3
Load to DSA
π
π Add field
Patch text fields
Sources DW Data Staging Area (DSA)
Patch text fields
DaWaK'07, Regensburg, September 2007 34
Evolution Metadata
• Metadata Repository maintaining– Graph constructs– Annotations– What if analysis scenarios
DaWaK'07, Regensburg, September 2007 35
Extending SQL With Evolution Semantics
• Used for annotating graph constructs
ON <event> TO <element> THEN <policy>
E.g.E.g.
SELECT Emp#, NAME, AGE
FROM V
ON condition addition TO V THEN propagate,
ON attribute deletion TO V.AGE THEN block
DaWaK'07, Regensburg, September 2007 36
Hecataeus
• A tool for visualizing and performing what-if analysis for several evolution scenarios
DaWaK'07, Regensburg, September 2007 37
Adapted activities per Event
Affected Activities Per Event
0
10
20
30
40
50
60
70
80
RenameTable
RenameAttributes
AddAttributes
DeleteAttributes
ModifyAttributes
Change PK
Affected
Adapted
DaWaK'07, Regensburg, September 2007 38
Evolution changes occurred on source and lookup tables
Activities (53) Source Name Event Type
Affected Autom. adjusted Prompt % S1 Rename 14 14 0 100% Rename Attributes 14 14 0 100% Add Attributes 34 31 3 91% Delete Attributes 19 18 1 95% Modify Attributes 18 18 0 100%
S4 Rename 4 4 0 100% Rename Attributes 4 4 0 100% Add Attributes 19 15 4 79% Delete Attributes 14 12 2 86% Modify Attributes 6 6 0 100%
S2 Rename 1 1 0 100% Rename Attributes 1 1 0 100% Add Attributes 4 4 0 100% Delete Attributes 4 3 1 75%
S3 Rename 1 1 0 100% Rename Attributes 1 1 0 100%
S5 Modify Attributes 3 3 0 100% S6 NO_CHANGES 0 0 0 - S7 Rename 1 1 0 100% Rename Attributes 1 1 0 100%
L1 NO_CHANGES 0 0 0 - L2 Add Attributes 1 0 1 0% L3 Add Attributes 9 0 9 0% Change PK 9 0 9 0%
DaWaK'07, Regensburg, September 2007 39
Annotation of graph constructs with policies for kinds of events
parts of the system affected nodes annotated with policies edges annotated with policies
event on database schema
R/V R/V Attr.
R/V Cond.
Q/V Q/V Attr.
Q/V Cond.
R Ω V C Map-Select
Op From GB/OB
Ω
C Add
R/V
Ω
C Delete
R/V
Ω
C Modify/Rename
R/V