Right In Time
Presented by: Maria Baron
Written by: Rajesh Gadodia
Intelligent Enterprise, Feb 7, 2004, Vol. 7, Iss. 2, p. 26
Traditional Data Warehouse
Central repository of transactional data drawn from heterogeneous platforms and applications
Focused on strategic reporting and analysis
Loaded periodically (nightly, weekly, or monthly), which introduces information latency
Evolution of The Data Warehouse
First generation: reporting
Second generation: analytic processing and data mining, with multidimensional tools for drill-down
New generation: faster information cycle time, minimal latency, information on demand
Why Real Time Data Warehousing?
Active decision support
Business activity monitoring (BAM) and alerting
Efficient execution of business strategy
Monitoring runs in the background
Positions information for use by downstream applications
Can be built on top of an existing data warehouse
Traditional vs. Real-Time Data Warehouse
Traditional data warehouse (EDW):
Strategic
Passive: historical trends
Batch: offline analysis
Isolated: not interactive
Best effort: guarantees neither availability nor performance
Traditional vs. Real-Time Data Warehouse
Real-time data warehouse (RTDW):
Tactical: focuses on executing strategy
Real-time: information on demand, the most up-to-date view of the business
Integrated: integrates data warehousing with business processes
Guaranteed: guarantees both availability and performance
Real-Time Integration
Goal of real-time extraction, transformation, and loading (ETL): keep the warehouse refreshed with minimal delay
Issues:
How does the system identify which data has been added or changed since the last extract?
What performance impact do the extracts have on the source system?
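One common answer to the first issue is a high-water mark: remember the latest change timestamp seen and extract only rows modified after it. The sketch below assumes a hypothetical source table `orders` whose `updated_at` column the source application maintains on every insert and update; SQLite stands in for the source system. An index on `updated_at` keeps each extract from scanning the whole table, which addresses the second issue. This approach does not see deleted rows, and a row that commits later with a timestamp equal to the watermark can be missed, so it is a starting point rather than a complete change-capture design.

```python
import sqlite3

def extract_changes(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    Assumes a hypothetical `orders` table with an `updated_at` column
    (ISO-8601 text) that the source application sets on every write.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only if something new was found.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# Demo: an in-memory database plays the role of the source system.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
# Index the change column so incremental extracts stay cheap for the source.
src.execute("CREATE INDEX orders_updated_at ON orders (updated_at)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2004-02-07T09:00:00"),
    (2, 25.5, "2004-02-07T09:05:00"),
])

batch1, wm = extract_changes(src, "1970-01-01T00:00:00")  # initial load: both rows
src.execute("UPDATE orders SET amount = 30.0, updated_at = '2004-02-07T09:10:00' WHERE id = 1")
batch2, wm = extract_changes(src, wm)                      # only the updated row
```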
Real-Time Data Warehouse – Logical Architecture
Techniques for real-time ETL
Simulated real-time feed:
Increase the frequency of batch runs
Most useful when information does not need to be "up to the minute"
Requires minimal changes to existing ETL infrastructure
Easy to implement
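Why a simulated feed suits data that need not be up to the minute can be shown with simple arithmetic. Assuming runs start every `interval`, each extract snapshots the source when it starts, and each run finishes before the next begins, a change committed just after a snapshot waits one full interval plus one run time before it is queryable. The helper name and the example schedules below are illustrative assumptions, not figures from the article.

```python
from datetime import timedelta

def worst_case_staleness(interval: timedelta, run_time: timedelta) -> timedelta:
    """Upper bound on data age under periodic batch loads (hypothetical helper).

    A change that lands just after a run's snapshot misses that run, waits
    for the next run to start, and is visible only when that run completes.
    """
    if run_time >= interval:
        raise ValueError("runs would overlap; lengthen interval or shorten run_time")
    return interval + run_time

nightly = worst_case_staleness(timedelta(hours=24), timedelta(hours=2))
every_15_min = worst_case_staleness(timedelta(minutes=15), timedelta(minutes=3))
```

Here `nightly` works out to 26 hours and `every_15_min` to 18 minutes. Shortening the interval only pays off while each run stays well under it, which is where this technique stops scaling.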
Techniques for real-time ETL
Trickle feed:
Allows continuous update of the RTDW as data in the source system changes
Uses a messaging infrastructure: a perpetually open data pipe, also called streaming
Basic elements: capture, stage, and apply
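The three basic elements can be sketched in-process with Python's standard `queue` and `threading` modules: a capture function pushes each change into a queue (the stage), and a separate apply thread drains it into the target as changes arrive. A real deployment would use a messaging product between machines; the function names, the `(key, value)` change format, and the dictionary standing in for the warehouse are all assumptions for illustration.

```python
import queue
import threading

END_OF_STREAM = None  # sentinel that stops the demo consumer

def capture(source_changes, stage):
    """Capture: push each source change into the staging queue as it occurs."""
    for change in source_changes:
        stage.put(change)
    stage.put(END_OF_STREAM)

def apply_changes(stage, warehouse):
    """Apply: block on the queue and upsert each change into the target."""
    while True:
        change = stage.get()
        if change is END_OF_STREAM:
            break
        key, value = change
        warehouse[key] = value

stage = queue.Queue()   # Stage: the open pipe between source and warehouse
warehouse = {}          # stand-in for the RTDW target
changes = [("order:1", 10.0), ("order:2", 25.5), ("order:1", 30.0)]

consumer = threading.Thread(target=apply_changes, args=(stage, warehouse))
consumer.start()
capture(changes, stage)
consumer.join()
```

Because `queue.Queue` is FIFO and there is a single consumer, changes are applied in source order, so the later update to `order:1` wins.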
Techniques for real-time ETL
Trickle feed (cont.):
Target and source databases must be configured, and special gateways may be required
Source (capture process): automatically captures changes to data or table structure
The RTDW records changes as logical change records (LCRs), which are kept in a staging partition called the message queue
User applications can also update the message queue explicitly
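A capture process can be approximated with database triggers that write one row per change into a queue table. This is a swapped-in technique: the article does not say how capture is implemented, and commercial tools often read the transaction log instead. Triggers also see only data changes, not table-structure changes. The `customers` and `change_queue` tables and their columns are hypothetical; the final insert shows a user application enqueuing a record directly, as the slide allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- Staging "message queue": one row per logical change record.
CREATE TABLE change_queue (
    seq      INTEGER PRIMARY KEY AUTOINCREMENT,
    tbl      TEXT NOT NULL,
    op       TEXT NOT NULL,
    row_id   INTEGER NOT NULL,
    new_name TEXT
);

-- Capture triggers: record every insert, update, and delete.
CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO change_queue (tbl, op, row_id, new_name)
    VALUES ('customers', 'INSERT', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO change_queue (tbl, op, row_id, new_name)
    VALUES ('customers', 'UPDATE', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
    INSERT INTO change_queue (tbl, op, row_id, new_name)
    VALUES ('customers', 'DELETE', OLD.id, NULL);
END;
""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("UPDATE customers SET name = 'Acme Corp' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")
# A user application may also place a record on the queue explicitly.
conn.execute("INSERT INTO change_queue (tbl, op, row_id, new_name) "
             "VALUES ('customers', 'INSERT', 2, 'Globex')")

lcrs = conn.execute(
    "SELECT tbl, op, row_id, new_name FROM change_queue ORDER BY seq").fetchall()
```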
Techniques for real-time ETL
Trickle feed (cont.): role of the target database
An apply process takes logical change records out of the message queue and applies the changes to selected database objects
Rules set on the message queues handle data transformation
Requires up-front development and can be complex to configure and manage
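The apply side can be sketched as a loop that dequeues LCRs, passes each one through an ordered list of rules, and applies whatever survives. Here a rule is modeled as a function that returns a transformed LCR or `None` to discard it, which covers both selecting database objects and transforming data. The `LCR` class, the two rules, and the assumption that each LCR carries the full new row image are hypothetical, not a vendor API.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List, Optional

@dataclass(frozen=True)
class LCR:
    """Hypothetical logical change record carrying the full new row image."""
    table: str
    op: str      # 'INSERT', 'UPDATE' or 'DELETE'
    key: int
    values: dict

Rule = Callable[[LCR], Optional[LCR]]

def only_customers(lcr: LCR) -> Optional[LCR]:
    """Selection rule: keep only changes to the customers table."""
    return lcr if lcr.table == "customers" else None

def uppercase_name(lcr: LCR) -> Optional[LCR]:
    """Transformation rule: standardize customer names to upper case."""
    if "name" in lcr.values:
        return replace(lcr, values={**lcr.values, "name": lcr.values["name"].upper()})
    return lcr

def run_rules(lcr: LCR, rules: List[Rule]) -> Optional[LCR]:
    """Run rules in order; stop as soon as one discards the change."""
    current: Optional[LCR] = lcr
    for rule in rules:
        current = rule(current)
        if current is None:
            return None
    return current

def apply_lcrs(lcrs: Iterable[LCR], rules: List[Rule], target: Dict[int, dict]) -> Dict[int, dict]:
    """Dequeue LCRs in order and apply the ones that survive the rules."""
    for raw in lcrs:
        lcr = run_rules(raw, rules)
        if lcr is None:
            continue
        if lcr.op == "DELETE":
            target.pop(lcr.key, None)
        else:  # INSERT and UPDATE both become an upsert of the full row image
            target[lcr.key] = lcr.values
    return target

message_queue = [
    LCR("customers", "INSERT", 1, {"name": "acme"}),
    LCR("audit_log", "INSERT", 99, {"msg": "login"}),   # filtered out by only_customers
    LCR("customers", "UPDATE", 1, {"name": "acme corp"}),
    LCR("customers", "INSERT", 2, {"name": "globex"}),
    LCR("customers", "DELETE", 2, {}),
]
customers_dim = apply_lcrs(message_queue, [only_customers, uppercase_name], {})
```

Even this toy version shows why the slide warns about up-front effort: rule order matters, and every rule must handle every operation type.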
Trickle Feed Architecture for Real-Time Load
Information Delivery
Changes to the traditional data warehouse:
Must accommodate continuous trickle feeds intermixed with live user queries
Schema design
Active partition management
Data aggregation
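On the data aggregation point, one option (an assumption here, since the slide only names the topic) is to maintain aggregates incrementally: each trickled fact row updates a running SUM and COUNT per group, so live queries read current totals without rescanning the fact table. The class and group names are hypothetical.

```python
from collections import defaultdict

class RunningAggregate:
    """Hypothetical incremental SUM/COUNT per group, updated per arriving row."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def add(self, group: str, amount: float) -> None:
        # O(1) update per trickled row; no rescan of historical data.
        self.total[group] += amount
        self.count[group] += 1

    def average(self, group: str) -> float:
        # Use .get so that querying an unseen group does not create it.
        n = self.count.get(group, 0)
        return self.total.get(group, 0.0) / n if n else 0.0

agg = RunningAggregate()
for region, amount in [("east", 100.0), ("east", 50.0), ("west", 20.0)]:
    agg.add(region, amount)
```

SUM and COUNT can be maintained this way because each new row only adds to them; aggregates such as MIN after a delete cannot be updated so cheaply.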
Designing an RTDW - Options
Trickle and flip:
A copy of the fact table is made and given a name that queries cannot access
As new data trickles in, it is appended to the copy of the fact table
At set intervals, the trickle is halted, the copy is itself copied, and that new copy is renamed to the active fact table's name (the old active fact table is deleted); then the process starts over
Poses scalability problems: depending on the size of the table, the flip may not keep up with the trickle
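The flip can be sketched in SQLite: rows trickle into a hidden `sales_fact_load` table, and at each flip that table is copied, the active `sales_fact` is dropped, and the copy is renamed into its place inside one transaction, so readers see either the old or the new table. The table names and columns are hypothetical. The full-table copy in `flip()` is the cost behind the scalability warning above. The connection uses autocommit mode so that the explicit `BEGIN`/`COMMIT` controls the swap.

```python
import sqlite3

# Autocommit mode: the explicit BEGIN/COMMIT in flip() defines the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sales_fact (sale_id INTEGER, amount REAL)")
conn.execute("INSERT INTO sales_fact VALUES (1, 10.0)")

# Hidden copy of the active fact table; queries never reference this name.
conn.execute("CREATE TABLE sales_fact_load AS SELECT * FROM sales_fact")

def trickle(conn, rows):
    """Append newly arrived rows to the hidden copy only."""
    conn.executemany("INSERT INTO sales_fact_load VALUES (?, ?)", rows)

def flip(conn):
    """Publish the hidden copy as the active fact table.

    The caller must pause trickle() while this runs. The full-table copy
    on every flip is what limits how often flips can happen.
    """
    conn.execute("DROP TABLE IF EXISTS sales_fact_new")
    conn.execute("CREATE TABLE sales_fact_new AS SELECT * FROM sales_fact_load")
    conn.execute("BEGIN")
    conn.execute("DROP TABLE sales_fact")
    conn.execute("ALTER TABLE sales_fact_new RENAME TO sales_fact")
    conn.execute("COMMIT")

trickle(conn, [(2, 25.5), (3, 7.25)])
before = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]  # trickle not yet visible
flip(conn)
after = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```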
Designing an RTDW - Options
Table partitioning:
Allows the creation of large tables that the database handles internally as a series of smaller ones, each with its own indexes
A partition can be roped off so that it is not visible to active queries
Problem: determining the criteria for partitioning
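SQLite has no native table partitioning, so the sketch below simulates it: one table per day, each with its own index, plus a `sales` view that unions only partitions marked visible. A partition being loaded stays roped off until `publish()` adds it to the view. The daily criterion and all names are hypothetical choices, which illustrates the slide's point that picking the criterion is the hard part. Databases with built-in partitioning handle the routing internally.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE partitions (name TEXT PRIMARY KEY, visible INTEGER NOT NULL)")

def partition_for(sale_date: str) -> str:
    """Hypothetical partitioning criterion: one partition per calendar day."""
    day = datetime.strptime(sale_date, "%Y-%m-%d")  # validate before building a table name
    return day.strftime("sales_%Y_%m_%d")

def rebuild_view(conn):
    """Recreate the query-facing view over visible partitions only."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM partitions WHERE visible = 1 ORDER BY name")]
    conn.execute("DROP VIEW IF EXISTS sales")
    if names:
        body = " UNION ALL ".join(f"SELECT sale_id, amount FROM {n}" for n in names)
    else:
        body = "SELECT NULL AS sale_id, NULL AS amount WHERE 0"  # empty view
    conn.execute(f"CREATE VIEW sales AS {body}")

def load(conn, sale_date, rows):
    """Load rows into their partition; it stays roped off until published."""
    name = partition_for(sale_date)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} (sale_id INTEGER, amount REAL)")
    conn.execute(f"CREATE INDEX IF NOT EXISTS {name}_idx ON {name} (sale_id)")
    conn.execute("INSERT OR IGNORE INTO partitions VALUES (?, 0)", (name,))
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

def publish(conn, sale_date):
    """Make a partition visible to active queries."""
    conn.execute("UPDATE partitions SET visible = 1 WHERE name = ?",
                 (partition_for(sale_date),))
    rebuild_view(conn)

rebuild_view(conn)
load(conn, "2004-02-06", [(1, 10.0)])
publish(conn, "2004-02-06")
load(conn, "2004-02-07", [(2, 25.5)])  # still loading, so not yet visible
visible_before = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
publish(conn, "2004-02-07")
visible_after = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```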
Designing an RTDW - Options
Real-time partitions:
Create new tables that resemble the active fact tables but are designed for quick updates
Interval tables contain data from only the last update
Truly real-time
Can be accessed by analysts and other BI tools
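A real-time partition can be sketched as a small, unindexed table with the same columns as the static fact table, plus a view that unions the two so analysts and BI tools see both. The periodic merge that moves the interval's rows into the static table and empties the real-time table is an assumption consistent with "interval tables contain data from only the last update"; the table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
-- Static fact table, loaded in batch and indexed for queries.
CREATE TABLE sales_fact (sale_id INTEGER, amount REAL);
-- Real-time partition: same shape, kept small and unindexed for fast inserts.
CREATE TABLE sales_fact_rt (sale_id INTEGER, amount REAL);
-- Query-facing view over both.
CREATE VIEW sales_all AS
    SELECT sale_id, amount FROM sales_fact
    UNION ALL
    SELECT sale_id, amount FROM sales_fact_rt;
""")

def trickle(conn, rows):
    """New rows land only in the real-time partition."""
    conn.executemany("INSERT INTO sales_fact_rt VALUES (?, ?)", rows)

def merge_into_static(conn):
    """At the end of an interval, move real-time rows into the static table."""
    conn.execute("BEGIN")
    conn.execute("INSERT INTO sales_fact SELECT sale_id, amount FROM sales_fact_rt")
    conn.execute("DELETE FROM sales_fact_rt")
    conn.execute("COMMIT")

conn.execute("INSERT INTO sales_fact VALUES (1, 10.0)")
trickle(conn, [(2, 25.5)])
total_now = conn.execute("SELECT SUM(amount) FROM sales_all").fetchone()[0]  # includes the new row
merge_into_static(conn)
rt_rows = conn.execute("SELECT COUNT(*) FROM sales_fact_rt").fetchone()[0]
total_after = conn.execute("SELECT SUM(amount) FROM sales_all").fetchone()[0]
```

Queries against `sales_all` return the same total before and after the merge, so the merge is invisible to users while the real-time table stays small.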
Real-Time Partition
Conclusion
RTDWs have a distinct advantage for businesses that rely on time-sensitive data, for example:
Call centers
Performance indicators
Fraud detection
Yield management
Certain financial transactions