PRESENTED BY VASANTHKUMAR C
1DA12CS118
VEERABHADRAPPA KS
1DA12CS120
DWH
Dr. AMBEDKAR INSTITUTE OF TECHNOLOGY
• Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization’s operational database.
• It is of practical interest in many applications, such as decision making in companies by higher-level database administrators, data analysis, etc.
• Selecting and successfully handling particular queries gives better results overall.
INTRODUCTION
DATA WAREHOUSE vs OLTP
DATA WAREHOUSE vs DATA MARTS
DISCUSSION (Data Warehouse terminology)
METHODOLOGY
ETL LIFE CYCLE
FUTURE ENHANCEMENTS
Data Warehouse Concepts and ETL Tool
INTRODUCTION
What is a Data Warehouse?
A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context.
[Barry Devlin]
Definition of a Data Warehouse
“An enterprise structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.”
Warehouses are Very Large Databases
[Figure: bar chart of the percentage of respondents (0% to 35%) by warehouse size (5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB), comparing Initial and Projected 2Q96 sizes.]
Source: META Group, Inc.
Data Warehouse Properties
A data warehouse is:
- Subject Oriented
- Integrated
- Time Variant
- Non Volatile
Subject-Oriented
Data is categorized and stored by business subject
rather than by application
[Diagram: OLTP Applications contrasted with Data Warehouse Subject areas. Labels: Supplier, Customers, Whole Sale, Marketing, Company, Products, Employees, Shippers]
Integrated
Data on a given subject is defined and stored once.
[Diagram: OLTP Applications (Products, Order Detail, Order) on one side, the Data Warehouse (Customer) on the other]
Time-Variant
Data is stored as a series of snapshots, each
representing a period of time
Time Data
Jan-97 January
Feb-97 February
Mar-97 March
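The snapshot idea can be illustrated with a minimal sketch. The table name `sales_snapshot`, its columns, and the sample figures are assumptions made for illustration only; they do not come from the slides. Each period is appended as its own set of rows, so earlier periods stay available for comparison.

```python
import sqlite3

# Hypothetical snapshot table: one row per (period, product), never overwritten.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_snapshot ("
    "period TEXT, product TEXT, units INTEGER, "
    "PRIMARY KEY (period, product))"
)


def load_snapshot(period, rows):
    """Append one period's snapshot; earlier periods are left untouched."""
    conn.executemany(
        "INSERT INTO sales_snapshot VALUES (?, ?, ?)",
        [(period, product, units) for product, units in rows],
    )


# Illustrative monthly snapshots (made-up numbers).
load_snapshot("1997-01", [("widget", 10)])
load_snapshot("1997-02", [("widget", 12)])
load_snapshot("1997-03", [("widget", 9)])

# The full history of a product is a simple query over the snapshots.
history = conn.execute(
    "SELECT period, units FROM sales_snapshot "
    "WHERE product = ? ORDER BY period",
    ("widget",),
).fetchall()
print(history)
```

Keying each row by period is one possible design; it trades extra storage for the ability to query any past state.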
Nonvolatile
Typically, data in the data warehouse is not updated or deleted.
Operational: Insert, Update, Delete, Read
Warehouse: Load, Read
Changing Data
[Diagram: the Operational Database feeds the Warehouse Database through a first-time load followed by repeated refreshes]
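A minimal sketch of a first-time load followed by a refresh, assuming hypothetical tables `op_customer` (operational) and `wh_customer` (warehouse) and made-up dates. The operational row is updated in place, while the warehouse only receives appended, date-stamped rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational table: rows are updated in place by the application.
conn.execute("CREATE TABLE op_customer (id INTEGER PRIMARY KEY, city TEXT)")
# Warehouse table: every load appends rows stamped with a load date.
conn.execute("CREATE TABLE wh_customer (id INTEGER, city TEXT, load_date TEXT)")


def refresh(load_date):
    """Copy the current operational state into the warehouse as new rows."""
    conn.execute(
        "INSERT INTO wh_customer SELECT id, city, ? FROM op_customer",
        (load_date,),
    )


conn.execute("INSERT INTO op_customer VALUES (1, 'Bangalore')")
refresh("1997-01-31")  # first-time load

conn.execute("UPDATE op_customer SET city = 'Mysore' WHERE id = 1")
refresh("1997-02-28")  # refresh: appends, does not overwrite

rows = conn.execute(
    "SELECT city, load_date FROM wh_customer WHERE id = 1 ORDER BY load_date"
).fetchall()
print(rows)
```

After the refresh the warehouse holds both the old and the new city for the customer, which is what makes later time-based analysis possible.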
Data Warehouse Versus OLTP
Property            Operational               Data Warehouse
Response Time       Sub seconds to seconds    Seconds to hours
Operations          DML                       Primarily read only
Nature of Data      30-60 days                Snapshots over time
Data Organization   Applications              Subject, time
Size                Small to large            Large to very large
Data Source         Operational, Internal     Operational, Internal, External
Activities          Processes                 Analysis
Data Warehouses Versus Data Marts
Property              Data Warehouse     Data Mart
Scope                 Enterprise         Department
Subject               Multiple           Single-subject, LOB
Data Source           Many               Few
Size (typical)        100 GB to >1 TB    <100 GB
Implementation time   Months to years    Months
[Diagram: Data Warehouse and Data Mart]
Dependent Data Mart
[Diagram: Operational Systems, Flat Files, and External Data feed the Data Warehouse, which in turn feeds the Data Marts. Other labels: Marketing, Sales, Human Resources (Employees), Shipper, Categories, Orders]
Data Warehouse Terminology
• Operational data store (ODS)
Stores tactical data from production systems that is subject-oriented and integrated to address operational needs
• Metadata
Data Warehouse Terminology
[Diagram labels: Source data, Data Integration, Enterprise data warehouse, Business area warehouse, Architecture]
Methodology
• Ensures a successful data warehouse
• Encourages incremental development
• Provides a staged approach to an enterprise-wide warehouse
- Safe
- Manageable
- Proven
- Recommended
Modeling
• Warehouses differ from operational structures:
- Analytical requirements
- Subject orientation
• Data must map to subject-oriented information:
- Identify business subjects
- Define relationships between subjects
- Name the attributes of each subject
• Modeling is iterative
• Modeling tools are available
Components of the Warehouse
• Data Extraction and Loading
• The Warehouse
• Analyze and Query -- OLAP Tools
• Metadata
• Data Mining tools
Loading the Warehouse
Cleaning the data before it is loaded
Extraction, Transformation & Loading
Purchase specialist tools, or develop programs
• Extraction -- select data using different methods
• Transformation -- validate, clean, integrate, and time stamp data
• Loading -- move data into the warehouse
[Diagram: OLTP Databases, ETL Tool, Warehouse Database]
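The three stages above can be sketched as three small functions. This is only an illustration under stated assumptions: the source rows, the `sales` table, and the validation rules are hypothetical, and a real ETL tool would do far more.

```python
import sqlite3
from datetime import date

# Hypothetical rows as they might arrive from OLTP sources (made-up data).
SOURCE_ROWS = [
    {"cust": " Alice ", "amount": "120.50"},
    {"cust": "BOB", "amount": "80"},
    {"cust": "", "amount": "15"},        # fails validation: no customer
    {"cust": "Carol", "amount": "n/a"},  # fails validation: bad amount
]


def extract():
    """Extraction: select the data to move (here, simply all source rows)."""
    return list(SOURCE_ROWS)


def transform(rows, load_date):
    """Transformation: validate, clean, and time stamp each row."""
    clean = []
    for row in rows:
        name = row["cust"].strip().title()  # clean: trim and normalize case
        try:
            amount = float(row["amount"])   # validate: amount must be numeric
        except ValueError:
            continue
        if not name:                        # validate: customer is required
            continue
        clean.append((name, amount, load_date.isoformat()))  # time stamp
    return clean


def load(conn, rows):
    """Loading: move the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, load_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load(conn, transform(extract(), date(1997, 1, 31)))
print(loaded)
```

Rejected rows are silently skipped here to keep the sketch short; in practice they would more likely be written to an error table for review.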
ETL Life Cycle
• The typical real-life ETL cycle consists of the following execution steps:
1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
6. Stage (load into staging tables, if used)
7. Audit reports (for example, on compliance with business rules; in case of failure, they also help to diagnose and repair)
8. Publish (to target tables)
9. Archive
10. Clean up
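The ten steps can be pictured as an ordered pipeline that stops at the first failure and keeps a log for the audit step. The sketch below is an assumption-laden illustration: the step names, the `run_cycle` function, and the handler dictionary are hypothetical and not taken from any particular ETL tool.

```python
# Step names follow the list above; the identifiers themselves are hypothetical.
STEPS = [
    "initiate",
    "build_reference_data",
    "extract",
    "validate",
    "transform",
    "stage",
    "audit",
    "publish",
    "archive",
    "clean_up",
]


def run_cycle(handlers):
    """Run the steps in order; stop at the first failure and return the log."""
    log = []
    for name in STEPS:
        step = handlers.get(name, lambda: None)  # missing steps are no-ops
        try:
            step()
            log.append((name, "ok"))
        except Exception as exc:
            # The log records where the cycle failed, which helps diagnosis.
            log.append((name, f"failed: {exc}"))
            break
    return log


def bad_validate():
    """Simulated validation failure, for illustration only."""
    raise ValueError("missing customer key")


log = run_cycle({"validate": bad_validate})
print(log)
```

Stopping before the publish step means a failed cycle never writes partial data to the target tables.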
Data Access and Reporting
• Tools that retrieve data for business analysis
• Imperatives
- Ease of use
- Intuitive
- Metadata
- Training
• More than one tool may be required
[Diagram: the Warehouse Database supports Charts, Forecasting, and Drill-down]
Snowflake schema
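• Represents the dimensional hierarchy directly by normalizing tables.
• Easy to maintain and saves storage
[Diagram: a central fact table (date, custno, prodno, cityname, ...) linked to the Time, prod, cust, and city dimension tables, with city further linked to region]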
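A minimal sketch of the city-to-region part of such a schema. The table and column names follow the diagram labels where they exist; the `regionname` key, the `sales` measure, and all sample rows are assumptions added for illustration. Because the region is stored once per city rather than once per fact row, a roll-up to region needs one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized hierarchy: fact -> city -> region.
    CREATE TABLE region (regionname TEXT PRIMARY KEY);
    CREATE TABLE city (
        cityname   TEXT PRIMARY KEY,
        regionname TEXT REFERENCES region(regionname)
    );
    -- 'sales' is a hypothetical measure column.
    CREATE TABLE fact (
        date     TEXT,
        custno   INTEGER,
        prodno   INTEGER,
        cityname TEXT REFERENCES city(cityname),
        sales    REAL
    );
    INSERT INTO region VALUES ('North'), ('South');
    INSERT INTO city VALUES
        ('Bangalore', 'South'), ('Mysore', 'South'), ('Delhi', 'North');
    INSERT INTO fact VALUES
        ('1997-01-05', 1, 10, 'Bangalore', 100.0),
        ('1997-01-06', 2, 11, 'Mysore',     50.0),
        ('1997-01-07', 3, 10, 'Delhi',      70.0);
    """
)

# Roll sales up from city to region by following the normalized hierarchy.
by_region = conn.execute(
    """
    SELECT c.regionname, SUM(f.sales)
    FROM fact AS f
    JOIN city AS c ON c.cityname = f.cityname
    GROUP BY c.regionname
    ORDER BY c.regionname
    """
).fetchall()
print(by_region)
```

The extra join is the usual cost of a snowflake layout; the benefit is that a region name is maintained in a single place.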
Oracle Warehouse Components
Any Source: Operational data, External data
Any Data: Relational / Multidimensional, Text, image, Spatial, Web, Audio, video
Any Access: Relational tools, OLAP tools, Applications/Web
Oracle Data Mart Suite
Data Modeling: Oracle Data Mart Designer
Data Extraction: Oracle Data Mart Builder
Data Management: Oracle Enterprise Manager
Data Access & Analysis: Discoverer & Oracle Reports
[Other diagram labels: OLTP Engines, OLTP Databases, Warehousing Engines, Data Mart Database, SQL*Plus]
Oracle Business Intelligence Tools
Tool         Horizon     Use
Reports      Current     IS develops user’s views
Discoverer   Tactical    Business users
Express      Strategic   Analysis
Data Mining works with Warehouse Data
• Data Warehousing provides the Enterprise with a memory
• Data Mining provides the Enterprise with intelligence
The Tool for Each Task
Task                        Tool         Question
Production reporting        Reports      What were sales by region last quarter?
Ad hoc query and analysis   Discoverer   What is driving the increase in North American sales?
Advanced analysis           Express      Given the rapid increase in Web sales, what will total sales be for the rest of the year?
Reporting Tools
• Andyne Computing -- GQL
• Brio -- BrioQuery
• Business Objects -- Business Objects
• Cognos -- Impromptu
• Information Builders Inc. -- Focus for Windows
• Oracle -- Discoverer2000
• Platinum Technology -- SQL*Assist, ProReports
• PowerSoft -- InfoMaker
• SAS Institute -- SAS/Assist
• Software AG -- Esperant
• Sterling Software -- VISION:Data
Extraction and Transformation Tools
• Carleton Corporation -- Passport
• Evolutionary Technologies Inc. -- Extract
• Informatica -- OpenBridge
• Informatica -- PowerCenter
• Information Builders Inc. -- EDA Copy Manager
• Platinum Technology -- InfoRefiner
• Prism Solutions -- Prism Warehouse Manager
• Red Brick Systems -- DecisionScape Formation
Warehouse Services
[Diagram labels: Education, Consulting, Support Services, Customers]
OLAP constructs in RDBMS:
A relational database designed for OLTP will not serve well as a
database for data analysis. Optimization techniques such as
aggregating fact tables, partitioning fact tables, and denormalizing
relation tables all provide significant improvements in
performance.
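As an illustration of one of these techniques, aggregating a fact table, the sketch below precomputes monthly totals into a summary table so that reports need not rescan the detailed rows. The table names `fact_sales` and `agg_sales_month` and the sample data are assumptions, not part of the original material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical detailed fact table with one row per sale.
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("1997-01-05", "North", 70.0),
        ("1997-01-20", "North", 30.0),
        ("1997-02-03", "South", 50.0),
    ],
)

# Aggregate once, at load time, into a much smaller summary table.
conn.execute(
    """
    CREATE TABLE agg_sales_month AS
    SELECT substr(sale_date, 1, 7) AS month, region, SUM(amount) AS total
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), region
    """
)

# Analytical queries then read the summary instead of the detail rows.
monthly = conn.execute(
    "SELECT month, region, total FROM agg_sales_month ORDER BY month, region"
).fetchall()
print(monthly)
```

The summary table has to be rebuilt or extended on each warehouse refresh, which is the maintenance cost paid for faster queries.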
No Future Without Data Warehousing:
Summary: the following topics were covered:
• Identifying a common, broadly accepted definition of the data warehouse
• Distinguishing the differences between OLTP systems and analytical systems
• Defining some of the common data warehouse terminology
• Identifying some of the elements and processes in a data warehouse
• Identifying and positioning the Oracle Warehouse vision, products, and services