
Data Warehouse View Maintenance Presented By: Katrina Salamon For CS561


Page 1: Data Warehouse View Maintenance Presented By: Katrina Salamon For CS561

Data Warehouse View Maintenance

Presented By: Katrina Salamon

For CS561

Page 2

What is a Data Warehouse?

Repository of integrated information

As information becomes available from a source, it is added to the repository

Page 3

What is a View?

A function from a set of base tables to a derived table

Can be recreated every time the view is accessed

Page 4

What’s a Materialized View?

A view whose tuples are stored in a database (or warehouse)

Can create indexes on them

Provides fast data access

Similar to a cache

Page 5

What’s View Maintenance?

View data becomes out of date when base tables are changed

Updating the view to reflect these changes is called view maintenance
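The difference between a view, a materialized view, and a stale materialized view can be sketched in a few lines of Python. The relation names `r1(W, X)` and `r2(X, Y)`, the sample rows, and the `view` function are assumptions made for illustration; they do not come from the slides.

```python
# Base tables as plain lists of tuples (illustrative data only).
r1 = [(1, 2)]            # r1(W, X)
r2 = [(2, 3)]            # r2(X, Y)

def view(r1, r2):
    """Virtual view: project W from the join of r1 and r2 on X."""
    return sorted(w for (w, x1) in r1 for (x2, _y) in r2 if x1 == x2)

materialized = view(r1, r2)   # tuples stored once, like a cache entry
r1.append((4, 2))             # a base table changes at the source

stale = materialized          # the stored copy does not see the new row
fresh = view(r1, r2)          # recomputing the view does
```

View maintenance is whatever brings `stale` back in line with `fresh`.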

Page 6

Sounds Easy, Right?

Page 7

Sounds Easy, Right?

Page 8

Here’s Why It Isn’t. . .

Data sources are typically legacy systems and do not understand views

Sources can tell the warehouse there is new data, but they cannot determine whether any additional data is needed

Page 9

Examples

Ideal World – a new record is added to a base relation, and the view is notified and updated

The Real World
– Maintenance Anomaly – trying to update a view while the underlying data is changing

Update Anomaly

Deletion Anomaly
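A maintenance anomaly can be reproduced with a small simulation. The sketch below assumes a view V = π_W(r1 ⋈ r2) over `r1(W, X)` and `r2(X, Y)`, two concurrent inserts, and a warehouse that applies each answer as it arrives; the data and helper name are hypothetical.

```python
from collections import Counter

def pi_w(r1, r2):
    """Bag of W values from r1(W, X) joined with r2(X, Y) on X."""
    return Counter(w for (w, x1) in r1 for (x2, _y) in r2 if x1 == x2)

# Source state and the (initially empty) materialized view at the warehouse.
r1, r2 = [(1, 2)], []
mv = Counter()

# U1: insert (2, 3) into r2. The warehouse will ask Q1 = pi_W(r1 join {(2, 3)}).
r2.append((2, 3))
# U2: insert (4, 2) into r1 happens BEFORE the source evaluates Q1.
r1.append((4, 2))

a1 = pi_w(r1, [(2, 3)])     # Q1 is evaluated late and already sees U2's row
mv += a1
a2 = pi_w([(4, 2)], r2)     # Q2 = pi_W({(4, 2)} join r2), asked because of U2
mv += a2

correct = pi_w(r1, r2)      # what the view should contain
```

Here `mv` ends up holding the value 4 twice while `correct` holds it once: the row inserted by U2 was counted by both answers.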

Page 10

The (Possible) Solutions

Recompute the view

Store all relations involved in the warehouse

Eager Compensating Algorithm (ECA)

Page 11

Recompute the View

When?

– Whenever an update occurs

– At a periodic interval

Time and resource intensive, especially in a distributed environment (transferring data from one source to the other)

Page 12

Storing Base Relations

Keep up-to-date copies of all relations in the warehouse so that queries can be evaluated locally and no anomalies occur

Takes up extra space in the warehouse, storing duplicate data

Copied relations still need to be updated

Page 13

Eager Compensating Algorithm

Most promising solution

– No duplicated base relations or recomputation overhead

All queries sent have compensating queries added to them to offset concurrent updates to the source data

Page 14

ECA cont. . .

Strongly Consistent
– Upon completion of activity, the view is consistent with the base relations
– Every view state has a corresponding state in the base relations, and the states occur in the same order

Not Complete
– Not every source state may be reflected in a view state (no direct mapping)

Page 15

How ECA Works - 4 basic events

1. Source executes an update (U) and a notification is sent to the warehouse

2. Warehouse receives update (U) and creates query (Q) to be evaluated by the source

3. Source evaluates query (Q) against the base relations and sends answer (A) to the warehouse

4. Warehouse receives answer (A) and updates the view
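The four events can be traced for a single update with no concurrency. This is a minimal sketch, assuming a view V = π_W(r1 ⋈ r2) and a query encoded as the pair (relation, inserted row); the class and method names are invented for the example.

```python
from collections import Counter

class Source:
    """Holds the base relations r1(W, X) and r2(X, Y)."""
    def __init__(self):
        self.rels = {"r1": [(1, 2)], "r2": []}

    def execute_update(self, rel, row):
        # Event 1: apply the insert and return the notification U.
        self.rels[rel].append(row)
        return (rel, row)

    def evaluate(self, query):
        # Event 3: evaluate Q against the current base relations, return A.
        rel, row = query
        r1 = [row] if rel == "r1" else self.rels["r1"]
        r2 = [row] if rel == "r2" else self.rels["r2"]
        return Counter(w for (w, x1) in r1 for (x2, _y) in r2 if x1 == x2)

class Warehouse:
    def __init__(self):
        self.mv = Counter()   # materialized view, as a bag of W values

    def on_update(self, notification):
        # Event 2: build Q, the view with the updated relation
        # replaced by the single inserted row.
        return notification

    def on_answer(self, answer):
        # Event 4: fold the answer into the materialized view.
        self.mv += answer

source, warehouse = Source(), Warehouse()
u = source.execute_update("r2", (2, 3))
q = warehouse.on_update(u)
a = source.evaluate(q)
warehouse.on_answer(a)
```

With only one update in flight, the view is correct after event 4; anomalies appear when another update lands between events 1 and 3.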

Page 16

Resolving Anomalies

Two Updates: Query1 is assumed to be computed before Update2 but is actually computed after Update2
– ECA knows that this happens and takes Update2 into account when updating the view by using a compensating query for each query it creates
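To make the compensation concrete, here is a sketch for a view V = π_W(r1 ⋈ r2) with two inserts, U1 = insert (2, 3) into r2 and U2 = insert (4, 2) into r1, where Query1 is evaluated after Update2. The relations, rows, and the `pi_w` helper are assumptions for illustration. Because Query1 is still unanswered when Update2 arrives, Query2 carries an extra term that subtracts the part Query1 will already have counted.

```python
from collections import Counter

def pi_w(r1, r2):
    """Bag of W values from r1(W, X) joined with r2(X, Y) on X."""
    return Counter(w for (w, x1) in r1 for (x2, _y) in r2 if x1 == x2)

# Source state after both updates have been executed.
r1 = [(1, 2), (4, 2)]
r2 = [(2, 3)]

# Query1 = pi_W(r1 join {(2, 3)}), evaluated late, so it sees U2's row.
a1 = pi_w(r1, [(2, 3)])

# Query2 = pi_W({(4, 2)} join r2) - pi_W({(4, 2)} join {(2, 3)})
a2 = pi_w([(4, 2)], r2)
a2.subtract(pi_w([(4, 2)], [(2, 3)]))   # compensating term

mv = Counter()
mv.update(a1)
mv.update(a2)
mv = +mv    # drop entries whose count fell to zero
```

The compensating term cancels the duplicate, so `mv` matches `pi_w(r1, r2)`.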

Page 17

Resolving Issues

When using compensating queries, we should not apply the results until after all related queries have been received

If the view were updated after each query answer, it could temporarily be in an invalid state

To avoid invalid states, ECA collects the intermediate answers in a relation called Collect (initialized to the empty set)
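The buffering described above can be sketched as a small warehouse class. This is a simplified illustration, not the full algorithm: it assumes a two-relation view V = π_W(r1 ⋈ r2), handles inserts only, and encodes a query as a list of signed terms in which `None` means "read the base relation at the source". All names are hypothetical.

```python
from collections import Counter

def evaluate(query, r1, r2):
    """Source side: evaluate signed terms against the current relations."""
    answer = Counter()
    for sign, a, b in query:
        left = r1 if a is None else a
        right = r2 if b is None else b
        for (w, x1) in left:
            for (x2, _y) in right:
                if x1 == x2:
                    answer[w] += sign
    return answer

def substitute(query, rel, row):
    """Replace `rel` by the inserted row in every term that still reads it."""
    out = []
    for sign, a, b in query:
        if rel == "r1" and a is None:
            out.append((sign, [row], b))
        elif rel == "r2" and b is None:
            out.append((sign, a, [row]))
    return out

VIEW = [(+1, None, None)]

class Warehouse:
    def __init__(self):
        self.mv = Counter()
        self.collect = Counter()   # intermediate answers, initially empty
        self.pending = []          # queries sent but not yet answered

    def on_update(self, rel, row):
        query = substitute(VIEW, rel, row)
        for q in self.pending:     # compensate for each unanswered query
            query += [(-s, a, b) for s, a, b in substitute(q, rel, row)]
        self.pending.append(query)
        return query

    def on_answer(self, query, answer):
        self.collect.update(answer)
        self.pending.remove(query)
        if not self.pending:       # apply only when nothing is outstanding
            self.mv.update(self.collect)
            self.mv = +self.mv
            self.collect.clear()

r1, r2 = [(1, 2)], []
wh = Warehouse()
r2.append((2, 3)); q1 = wh.on_update("r2", (2, 3))
r1.append((4, 2)); q2 = wh.on_update("r1", (4, 2))
wh.on_answer(q1, evaluate(q1, r1, r2))
mv_between = dict(wh.mv)           # view untouched while q2 is outstanding
wh.on_answer(q2, evaluate(q2, r1, r2))
```

The view is never exposed in the intermediate state: it stays empty until the second answer arrives, then moves directly to the correct contents.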

Page 18

Example

Three insertions into three base relations and their effect on the view that references them

Page 19

ECA-Key

Used to streamline the algorithm when keys from the base relations are available in the View

The Collect relation is initialized to the current View and becomes a working copy of the View

Page 20

ECA-Key Algorithms

Delete received, no query sent, delete is directly applied to Collect

Insert received, query sent, no compensating queries created, answers are added to Collect and duplicate values are ignored because of the keys

Once completed, the tuples in Collect replace the tuples of the View
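The three rules above can be walked through on a small example. The sketch assumes a view V = π_{W,Y}(r1 ⋈ r2) in which W is the key of `r1(W, X)` and Y is the key of `r2(X, Y)`, and a source that executes all three updates before answering any query; the data and names are illustrative only.

```python
def view(r1, r2):
    """pi_{W,Y}(r1 join r2) as a set; both base-relation keys are kept."""
    return {(w, y) for (w, x1) in r1 for (x2, y) in r2 if x1 == x2}

r1, r2 = [(1, 2)], [(2, 3)]
mv = view(r1, r2)          # current View
collect = set(mv)          # working copy, initialized to the current View
pending = []               # insert queries sent but not yet answered

# U1: insert into r2. A query is sent; no compensating query is built.
r2.append((2, 4)); pending.append(("r2", (2, 4)))
# U2: insert into r1. Same treatment.
r1.append((3, 2)); pending.append(("r1", (3, 2)))
# U3: delete (1, 2) from r1. No query; delete by key directly in Collect.
r1.remove((1, 2))
collect = {(w, y) for (w, y) in collect if w != 1}

# Answers arrive in order; set union ignores duplicate tuples.
while pending:
    rel, row = pending.pop(0)
    answer = view([row], r2) if rel == "r1" else view(r1, [row])
    collect |= answer

mv = set(collect)          # once completed, Collect replaces the View
```

Both answers contain the tuple (3, 4), but because the view carries the keys, the set union keeps it once and no compensation is needed.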

Page 21

ECA-Local

Combines the compensating queries of ECA and the local updates of ECA-Key to create a more streamlined query

Maintaining the order of execution of local and non-local processes is complicated and will create greater overhead than other algorithms

Future work is needed to see whether this is a worthwhile approach

Page 22

Performance Comparison

[Chart: Total Bytes Transferred vs. Cardinality of Relation]

[Chart: Total Bytes Transferred vs. # of Source Updates]

Page 23

Review of ECA

Incremental updating approach: it doesn’t start from scratch every time

No additional burden placed on sources (timestamps or locks)

Compensating queries are only used when more than one update is occurring, keeping computation costs low