belinda-fitzgerald
Data Warehouse
A read-only database for decision analysis, consisting of time-stamped operational and external data. It is:
Subject oriented
Integrated
Time variant
Nonvolatile
Data Warehouse vs. Operational Databases
Operational database:
Highly tuned
Real-time data
Detailed records
Current values
Accesses small amounts of data in a predictable manner
Data warehouse:
Flexible access
Consistent timing
Summarized as appropriate
Historical
Accesses large amounts of data in unexpected ways
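The contrast between the two access patterns can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module. The `sales` table, its columns and its rows are assumptions made up for the example, not part of the original slides. The operational query fetches one current, detailed record by key, while the warehouse-style query reads the whole history and summarizes it.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, "
    "order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "East", "1995-03-01", 100.0),
        (2, "West", "1995-07-15", 250.0),
        (3, "East", "1996-02-10", 175.0),
        (4, "West", "1996-09-30", 300.0),
    ],
)

# Operational style: one detailed record, located by its key.
operational_row = conn.execute(
    "SELECT order_id, region, order_date, amount FROM sales WHERE order_id = ?",
    (3,),
).fetchone()

# Warehouse style: read all history and summarize it by year and region.
warehouse_rows = conn.execute(
    "SELECT substr(order_date, 1, 4), region, SUM(amount) "
    "FROM sales "
    "GROUP BY substr(order_date, 1, 4), region "
    "ORDER BY 1, 2"
).fetchall()

print(operational_row)
print(warehouse_rows)
```

The first query touches a single row through the primary key. The second has to read every row, which is why warehouse workloads tend to be tuned differently from operational ones.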
Data Warehouse Purpose
Identify problems in time to avoid them
Locate opportunities you might otherwise miss
Data Warehouse: New Approach
An old idea with a new interest because of:
Cheap computing power
Special-purpose hardware
New data structures
Intelligent software
Three Approaches
Classical Enterprise Database: Contains operational data from all areas of the organization.
Data Mart: Extracted and managerial support data designed for departmental or end-user computing (EUC) applications.
Data Package: Data required for a specific application.
Classical Warehouse
Source: Archived data
Extraction: Batch extraction programs
Data: Atomic transaction data
Tool: VLDB technology
Analysis: IT-driven software

Mart
Source: Deposit or external sources
Extraction: Batch summary
Data: Designed departmental database
Tool: OLAP, ROLAP, MDBMS
Analysis: IT-driven or trained user

Package
Source: Mart
Extraction: Sample and summary
Data: Problem-specific dataset
Tool: PC tools
Analysis: Trained user
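The two extraction styles named above, "batch summary" for a mart and "sample and summary" for a package, can be sketched roughly as follows. The rows, field names and the helper functions `build_mart` and `build_package` are all hypothetical, and feeding the mart from a flat list of atomic rows is an assumption made to keep the example short.

```python
from collections import defaultdict

# Hypothetical atomic transaction rows standing in for the mart's source.
source_rows = [
    {"dept": "sales", "month": "1996-01", "amount": 120.0},
    {"dept": "sales", "month": "1996-01", "amount": 80.0},
    {"dept": "sales", "month": "1996-02", "amount": 200.0},
    {"dept": "sales", "month": "1996-03", "amount": 50.0},
    {"dept": "service", "month": "1996-01", "amount": 40.0},
]


def build_mart(rows, dept):
    """Batch summary: total amount per month for one department."""
    totals = defaultdict(float)
    for row in rows:
        if row["dept"] == dept:
            totals[row["month"]] += row["amount"]
    return dict(sorted(totals.items()))


def build_package(mart, months):
    """Sample and summary: keep only the months one specific problem needs."""
    sample = {m: mart[m] for m in months if m in mart}
    return {"months": sample, "total": sum(sample.values())}


mart = build_mart(source_rows, "sales")
package = build_package(mart, ["1996-01", "1996-02"])
print(mart)
print(package)
```

Each level is smaller and more specific than the one it draws from: the mart keeps one department's monthly totals, and the package keeps only the slice a single analysis needs.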
Data Acquisition
Handles acquisition of data from legacy systems and outside sources.
Data is identified, copied, formatted and prepared for loading into the warehouse.
Acquisition Steps
Catalog the data: develop an inventory of where it is and what it means.
Clean and prepare the data: extract from legacy files and reformat to make it usable.
Transport data from one location to another.
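The three steps can be sketched as a toy pipeline. Everything here is an assumption for illustration: the legacy extract, the field names, the `CATALOG` layout and the `clean` and `transport` helpers. A real acquisition layer would read actual legacy files and move data between systems rather than into a Python list.

```python
import csv
import io

# Hypothetical legacy extract with inconsistent spacing and a missing value.
LEGACY_EXTRACT = (
    "cust_no,cust_nm,bal\n"
    " 001 ,ACME CORP ,1200.50\n"
    "002,  widget co,  \n"
)

# Step 1: catalog. Record what each legacy field means and where it goes.
CATALOG = {
    "cust_no": {"target": "customer_id", "meaning": "customer identifier", "type": int},
    "cust_nm": {"target": "customer_name", "meaning": "customer name", "type": str},
    "bal": {"target": "balance", "meaning": "current balance", "type": float},
}


def clean(raw_text):
    """Step 2: extract legacy rows and reformat them using the catalog."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        record = {}
        for field, entry in CATALOG.items():
            value = row[field].strip()
            if entry["type"] is str:
                record[entry["target"]] = value.title()
            else:
                # Blank numeric fields become None instead of failing.
                record[entry["target"]] = entry["type"](value) if value else None
        cleaned.append(record)
    return cleaned


def transport(records, staging_area):
    """Step 3: move prepared records to the load location (a list here)."""
    staging_area.extend(records)
    return len(records)


staging = []
loaded = transport(clean(LEGACY_EXTRACT), staging)
print(loaded, staging)
```

Keeping the catalog as data rather than burying it in the cleaning code means the inventory of "where it is and what it means" stays readable on its own.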
Storage
The storage component holds the data so that the many different data mining, executive information and decision support systems can make use of it effectively.
The Storage Area
Managed by relational databases, like those from Oracle Corp. or Informix Software Inc.
Specialized hardware: symmetric multiprocessor (SMP) or massively parallel processor (MPP) machines
Storage
The majority of warehouse storage today is managed by relational databases running on Unix platforms.
Oracle, Sybase Inc., IBM Corp. and Informix control 65 percent of the warehouse storage market (Meta Group Inc., 1996).
Access
Different end-user PCs and workstations draw data from the warehouse with the help of multidimensional analysis products, neural networks, data discovery tools or analysis tools.
These powerful, "smart" software products are the real driving force behind the viability of data warehousing.
Access Tools
Intelligent agents and agencies
Query facilities and managed query environments
Statistical analysis
Data discovery (decision support, artificial intelligence and expert systems)
OLAP
Data visualization
Hardware Budget
A typical startup warehouse project allocates more than 60 percent of its hardware and software budget to the creation of a powerful storage component, spending just 30 percent on data mining and user access technologies.
Systems Analysis Budget
Budgeting for systems analysis and development, however, follows a very different pattern.
More than 50 percent of development dollars are spent on building acquisition capabilities,
30 percent fund the development of user solutions and
20 percent are dedicated to the creation of databases in the storage component.
Design Issues
Relational and Multidimensional Models
Denormalized and indexed relational models are more flexible.
Multidimensional models are simpler to use and more efficient.
Star Schemas in an RDBMS
In most companies doing ROLAP, the DBAs have created countless indexes and summary tables in order to avoid I/O-intensive table scans against large fact tables. As the indexes and summary tables proliferate to optimize performance for the known queries and aggregations that users perform, the build times and disk space needed to create them have grown enormously, often requiring more time than is allotted and more space than the original data!
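The trade-off can be seen on a toy scale. The sketch below uses Python's built-in sqlite3 module. The `sales_fact` table, the `sales_by_product` summary table and the index name are hypothetical. It precomputes a summary table for one known aggregation and adds an index, then counts the extra structures that now have to be built and stored alongside the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (day TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [
        ("1996-01-01", "widget", 10.0),
        ("1996-01-01", "gadget", 5.0),
        ("1996-01-02", "widget", 7.5),
    ],
)

# Summary table: computed once at build time for a known aggregation.
conn.execute(
    "CREATE TABLE sales_by_product AS "
    "SELECT product, SUM(amount) AS total FROM sales_fact GROUP BY product"
)
# An index is another structure built to avoid scanning the fact table.
conn.execute("CREATE INDEX idx_fact_day ON sales_fact (day)")

# The same answer, once by scanning the fact table and once from the summary.
from_fact = conn.execute(
    "SELECT product, SUM(amount) FROM sales_fact GROUP BY product ORDER BY product"
).fetchall()
from_summary = conn.execute(
    "SELECT product, total FROM sales_by_product ORDER BY product"
).fetchall()

# Every schema object other than the fact table is build time and disk space.
extra_structures = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name != 'sales_fact'"
).fetchone()[0]

print(from_fact, from_summary, extra_structures)
```

Each new known query tends to add another summary table or index, so the count of extra structures grows with the workload even though the fact data itself does not.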
Building a Data Warehouse from a Normalized Database
The steps:
Develop a normalized entity-relationship business model of the data warehouse.
Translate this into a dimensional model. This step reflects the information and analytical characteristics of the data warehouse.
Translate this into the physical model. This reflects the changes necessary to reach the stated performance objectives.
The Business Model
Identify the data structure, attributes and constraints for the client’s data warehousing environment.
Stable
Optimized for update
Flexible
Business Model
As always in life, there are some disadvantages to 3NF:
Performance can be truly awful. Most of the work that is performed on denormalizing a data model is an attempt to reach performance objectives.
The structure can be overwhelmingly complex. We may wind up creating many small relations which the user might think of as a single relation or group of data.
Structural Dimensions
The first step is the development of the structural dimensions. This step corresponds very closely to what we normally do in a relational database.
The star architecture that we will develop here depends upon taking the central intersection entities as the fact tables and building the foreign key => primary key relations as dimensions.
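A minimal sketch of that idea, again with Python's built-in sqlite3 module and entirely hypothetical tables: `order_fact` is the intersection entity between `customer` and `product`, so it becomes the fact table, and each of its foreign keys points at the primary key of a dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: one primary key each.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);

    -- The intersection entity becomes the fact table; its foreign keys
    -- reference the dimensions' primary keys.
    CREATE TABLE order_fact (
        customer_id INTEGER REFERENCES customer (customer_id),
        product_id INTEGER REFERENCES product (product_id),
        quantity INTEGER
    );

    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO product VALUES (10, 'Bolt'), (20, 'Nut');
    INSERT INTO order_fact VALUES (1, 10, 5), (1, 10, 3), (1, 20, 2), (2, 20, 7);
    """
)

# A typical star query: join the fact table to its dimensions and aggregate.
star_rows = conn.execute(
    """
    SELECT c.name, p.name, SUM(f.quantity)
    FROM order_fact AS f
    JOIN customer AS c ON c.customer_id = f.customer_id
    JOIN product AS p ON p.product_id = f.product_id
    GROUP BY c.name, p.name
    ORDER BY c.name, p.name
    """
).fetchall()

print(star_rows)
```

The fact table sits at the center of the star and holds only keys and measures, while the descriptive attributes stay in the dimension tables around it.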