
DWH_PROJECT


Presented by:

VASANTHKUMAR C (1DA12CS118)

VEERABHADRAPPA KS (1DA12CS120)

DWH

Dr. AMBEDKAR INSTITUTE OF TECHNOLOGY


• Loosely speaking, a data warehouse is a database that is maintained separately from an organization’s operational database.

• It is of practical interest in many applications, such as decision making in companies by senior database administrators, data analysis, etc.

• Selecting the right data and handling particular queries efficiently gives better results overall.


INTRODUCTION

DATA WAREHOUSE vs OLTP

DATA WAREHOUSE vs DATA MARTS

DISCUSSION (Data Warehouse Terminology)

METHODOLOGY

ETL LIFE CYCLE

FUTURE ENHANCEMENTS


Data Warehouse Concepts and ETL Tool

INTRODUCTION


What is a Data Warehouse?

A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.

[Barry Devlin]


Definition of a Data Warehouse

“An enterprise structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.”


Warehouses are Very Large Databases

[Chart: percentage of respondents by warehouse size, from 5GB to 500GB-1TB, initial vs. projected 2Q96. Source: META Group, Inc.]


Data Warehouse Properties

A data warehouse is:

• Subject oriented

• Integrated

• Time variant

• Nonvolatile


Subject-Oriented

Data is categorized and stored by business subject rather than by application.

[Diagram: data organized by OLTP application versus by data warehouse subject; labels include Supplier, Customers, Whole Sale, Marketing, Company, Products, Employees and Shippers.]


Integrated

Data on a given subject is defined and stored once.

[Diagram: OLTP applications (Products, Order, Order Detail) and the data warehouse, where the Customer subject is stored once.]
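A minimal sketch of the "defined and stored once" idea, assuming two hypothetical OLTP applications (an order system and a billing system) that encode the same customer with different field names, id formats and gender codes. The `Customer` class, the source records and the conversion functions are illustrative assumptions, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical warehouse definition of the Customer subject (one shape for every source)."""
    customer_id: int
    name: str
    gender: str  # standardized to "M" or "F"


# Assumed extracts from two OLTP applications that describe the same customer differently.
orders_app = [{"cust_no": 7, "cust_name": "asha rao", "sex": "female"}]
billing_app = [{"id": "0007", "full_name": "ASHA RAO", "gender_code": 1}]


def from_orders(rec):
    return Customer(rec["cust_no"], rec["cust_name"].title(),
                    "F" if rec["sex"] == "female" else "M")


def from_billing(rec):
    # Billing keeps the id as zero-padded text and codes gender as 1 = female, 0 = male.
    return Customer(int(rec["id"]), rec["full_name"].title(),
                    "F" if rec["gender_code"] == 1 else "M")


def integrate(*sources):
    """Merge converted records so each customer is stored once, keyed by customer_id."""
    subject = {}
    for source in sources:
        for customer in source:
            subject.setdefault(customer.customer_id, customer)
    return subject


customers = integrate(map(from_orders, orders_app), map(from_billing, billing_app))
```

`customers` ends up holding a single `Customer(7, "Asha Rao", "F")` entry. Keeping the first record seen per id is the simplest possible merge rule; a real integration step would need explicit rules for conflicting values.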


Time-Variant

Data is stored as a series of snapshots, each representing a period of time.

Time Data

Jan-97 January

Feb-97 February

Mar-97 March
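Time variance can be sketched as a table whose rows carry the period they describe, so each load adds a new snapshot instead of overwriting the last one. The account data and the helper names `take_snapshot` and `as_of` are hypothetical.

```python
from datetime import date

# Hypothetical warehouse table: every row is tagged with the period it describes.
snapshots = []


def take_snapshot(period, balances):
    """Append the state of every account as of `period`; earlier rows are left unchanged."""
    for account, balance in balances.items():
        snapshots.append({"period": period, "account": account, "balance": balance})


def as_of(period):
    """Return the account balances recorded for one period."""
    return {row["account"]: row["balance"] for row in snapshots if row["period"] == period}


take_snapshot(date(1997, 1, 31), {"A-1": 100, "A-2": 50})
take_snapshot(date(1997, 2, 28), {"A-1": 120, "A-2": 40})
take_snapshot(date(1997, 3, 31), {"A-1": 90, "A-2": 40})
```

`as_of(date(1997, 2, 28))` still returns the February snapshot (`{'A-1': 120, 'A-2': 40}`) after the March load.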


Nonvolatile

Typically, data in the data warehouse is not updated or deleted.

[Diagram: the operational database supports insert, update, delete and read; the warehouse supports only load and read.]
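One way to picture nonvolatility is a table object whose only write path is a bulk load. This is a hypothetical sketch (the `WarehouseTable` and `ReadOnlyError` names are invented here), not how any particular database enforces the rule.

```python
class ReadOnlyError(Exception):
    """Raised when code tries to change rows that are already in the warehouse."""


class WarehouseTable:
    """Sketch of a nonvolatile table: rows are bulk-loaded and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # The only write path: append a batch delivered by the load process.
        self._rows.extend(tuple(row) for row in rows)

    def read(self, predicate=lambda row: True):
        return [row for row in self._rows if predicate(row)]

    def update(self, *args, **kwargs):
        raise ReadOnlyError("warehouse rows are not updated in place")

    def delete(self, *args, **kwargs):
        raise ReadOnlyError("warehouse rows are not deleted")


sales = WarehouseTable()
sales.load([("Jan-97", 100), ("Feb-97", 120)])
```

Under this design a correction arrives as a newly loaded row rather than as an edit to an existing one.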


Changing Data

[Diagram: the operational database feeds the warehouse database with a first-time load, followed by periodic refreshes.]
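The first-time load followed by refreshes can be sketched with a high-water mark: the loader remembers the newest change timestamp it has copied and, on each refresh, takes only operational rows modified after it. The `IncrementalLoader` class and the `last_modified` column are assumptions for illustration; real sources may instead expose change logs or snapshots to compare.

```python
from datetime import datetime


class IncrementalLoader:
    """First refresh() does the first-time load; later calls copy only rows changed since."""

    def __init__(self):
        self.warehouse = []
        self.high_water_mark = None  # newest last_modified value already loaded

    def refresh(self, source_rows):
        if self.high_water_mark is None:
            new_rows = list(source_rows)
        else:
            new_rows = [r for r in source_rows if r[2] > self.high_water_mark]
        self.warehouse.extend(new_rows)
        if new_rows:
            self.high_water_mark = max(r[2] for r in new_rows)
        return len(new_rows)


# Hypothetical operational rows: (order_id, amount, last_modified).
operational = [(1, 250.0, datetime(1997, 1, 5)), (2, 120.0, datetime(1997, 1, 9))]

loader = IncrementalLoader()
first = loader.refresh(operational)    # first-time load copies both rows
operational.append((3, 75.0, datetime(1997, 2, 1)))
second = loader.refresh(operational)   # refresh copies only order 3
```

A modified operational row arrives in the warehouse as an additional row, which keeps the warehouse nonvolatile.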


Data Warehouse Versus OLTP

Property            Operational               Data Warehouse
Response time       Sub-seconds to seconds    Seconds to hours
Operations          DML                       Primarily read only
Nature of data      30-60 days                Snapshots over time
Data organization   Applications              Subject, time
Size                Small to large            Large to very large
Data source         Operational, internal     Operational, internal, external
Activities          Processes                 Analysis


Data Warehouses Versus Data Marts

Property              Data Warehouse      Data Mart
Scope                 Enterprise          Department
Subject               Multiple            Single-subject, LOB
Data source           Many                Few
Size (typical)        100 GB to >1 TB     <100 GB
Implementation time   Months to years     Months


Dependent Data Mart

[Diagram: operational systems, flat files and external data feed the data warehouse, which in turn feeds dependent data marts; labels include Marketing, Sales, Human Resources (Employees), Shipper, Categories and Orders.]
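A dependent data mart is populated only from the warehouse, never directly from operational systems. A minimal sketch, assuming hypothetical warehouse rows tagged with a subject area, builds a Sales mart by filtering one subject and summarizing it by region.

```python
# Hypothetical warehouse rows: subject area, region and amount.
warehouse = [
    {"subject": "Sales", "region": "North", "amount": 100},
    {"subject": "Sales", "region": "South", "amount": 80},
    {"subject": "Marketing", "region": "North", "amount": 30},
    {"subject": "Sales", "region": "North", "amount": 40},
]


def build_dependent_mart(rows, subject):
    """Populate a departmental mart from warehouse rows only: keep one subject, total by region."""
    mart = {}
    for row in rows:
        if row["subject"] == subject:
            mart[row["region"]] = mart.get(row["region"], 0) + row["amount"]
    return mart


sales_mart = build_dependent_mart(warehouse, "Sales")
```

`sales_mart` is `{'North': 140, 'South': 80}`; the Marketing row stays in the warehouse but is not copied into this mart.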


Data Warehouse Terminology

• Operational data store (ODS)

Stores tactical data from production systems that are subject-oriented and integrated to address operational needs.

• Metadata


Data Warehouse Terminology

[Diagram: data warehouse architecture, showing source data, data integration, an enterprise data warehouse and business area warehouses.]


Methodology

• Ensures a successful data warehouse

• Encourages incremental development

• Provides a staged approach to an enterprise-wide warehouse:

- Safe

- Manageable

- Proven

- Recommended


Modeling

• Warehouses differ from operational structures:

- Analytical requirements

- Subject orientation

• Data must map to subject-oriented information:

- Identify business subjects

- Define relationships between subjects

- Name the attributes of each subject

• Modeling is iterative

• Modeling tools are available


Components of the Warehouse

• Data Extraction and Loading

• The Warehouse

• Analyze and Query -- OLAP Tools

• Metadata

• Data Mining tools


Loading the Warehouse

Cleaning the data before it is loaded


Extraction, Transformation & Loading

Purchase specialist tools, or develop programs

• Extraction -- select data using different methods

• Transformation -- validate, clean, integrate, and time stamp data

• Loading -- move data into the warehouse

[Diagram: OLTP databases -> ETL tool -> warehouse database]
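The three ETL stages can be sketched with the Python standard library alone, assuming a hypothetical CSV extract and an in-memory SQLite database standing in for the warehouse. The column names, the validation rule (amount must parse as a number) and the cleaning rule (trim and upper-case customer names) are illustrative choices, not taken from any specific ETL tool.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Hypothetical extract from an OLTP system, shown here as CSV text.
SOURCE = """order_id,customer,amount
1, Acme ,250.00
2,Globex,not-a-number
3,acme,99.50
"""


def extract(text):
    """Extraction: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, loaded_at):
    """Transformation: validate, clean, integrate and time stamp each row."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])  # validate: reject rows whose amount is not numeric
        except ValueError:
            continue
        clean.append((int(row["order_id"]),
                      row["customer"].strip().upper(),  # clean and integrate spellings
                      amount,
                      loaded_at.isoformat()))           # time stamp the load
    return clean


def load(conn, rows):
    """Loading: move the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(order_id INTEGER, customer TEXT, amount REAL, loaded_at TEXT)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE), datetime.now(timezone.utc)))
```

The row with amount `not-a-number` is rejected during transformation, and the two spellings of Acme are integrated as `ACME`, so `sales_fact` holds two rows.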


ETL Life Cycle

• The typical real-life ETL cycle consists of the following execution steps:

1. Cycle initiation

2. Build reference data

3. Extract (from sources)

4. Validate

5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)


6. Stage (load into staging tables, if used)

7. Audit reports (for example, on compliance with business rules; in case of failure, they also help to diagnose and repair problems)

8. Publish (to target tables)

9. Archive

10. Clean up
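The ten steps can be chained into a simple driver that runs them in order and records an audit trail, stopping at the first failure. Everything below is a hypothetical toy: the step functions, the shared `ctx` dictionary and the sample data stand in for real extraction and loading code.

```python
from typing import Callable

Context = dict
Step = Callable[[Context], None]


def run_etl_cycle(steps: list[tuple[str, Step]]) -> Context:
    """Run the steps in order, logging each outcome; stop at the first failure."""
    ctx: Context = {"audit": []}
    for name, step in steps:
        try:
            step(ctx)
        except Exception as exc:  # broad on purpose: the log is what helps diagnose/repair
            ctx["audit"].append((name, f"failed: {exc}"))
            break
        ctx["audit"].append((name, "ok"))
    return ctx


# Toy step implementations (all hypothetical), one per step of the cycle above.
def initiate(ctx):
    ctx["source"] = [("1", "10"), ("2", "x"), ("3", "5")]  # (product id, quantity)


def build_reference(ctx):
    ctx["valid_ids"] = {"1", "2", "3"}


def extract(ctx):
    ctx["rows"] = list(ctx["source"])


def validate(ctx):
    ctx["rows"] = [r for r in ctx["rows"] if r[0] in ctx["valid_ids"] and r[1].isdigit()]


def transform(ctx):
    ctx["rows"] = [(int(pid), int(qty)) for pid, qty in ctx["rows"]]


def stage(ctx):
    ctx["staging"] = list(ctx["rows"])


def audit(ctx):
    ctx["rejected"] = len(ctx["source"]) - len(ctx["staging"])


def publish(ctx):
    ctx["target"] = list(ctx["staging"])


def archive(ctx):
    ctx["archive"] = list(ctx["source"])


def clean_up(ctx):
    for key in ("rows", "staging"):
        ctx.pop(key, None)


STEPS = [("Cycle initiation", initiate), ("Build reference data", build_reference),
         ("Extract", extract), ("Validate", validate), ("Transform", transform),
         ("Stage", stage), ("Audit reports", audit), ("Publish", publish),
         ("Archive", archive), ("Clean up", clean_up)]

result = run_etl_cycle(STEPS)
```

After `run_etl_cycle(STEPS)`, `result["target"]` holds the published rows `[(1, 10), (3, 5)]`, `result["rejected"]` is 1, and `result["audit"]` lists the outcome of every step.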


Data Access and Reporting

• Tools that retrieve data for business analysis

• Imperatives:

- Ease of use

- Intuitive

- Metadata

- Training

• More than one tool may be required

[Diagram: access tools retrieve data from the warehouse database to produce charts, forecasts and drill-down views.]
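Drill-down can be sketched as re-aggregating the same rows at a finer level of a hierarchy, here region and then city. The rows, the column positions and the `totals` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical warehouse rows: (region, city, sales amount).
ROWS = [
    ("North", "Delhi", 100),
    ("North", "Chandigarh", 50),
    ("South", "Chennai", 70),
    ("North", "Delhi", 30),
]
REGION, CITY, AMOUNT = 0, 1, 2


def totals(rows, level, within=None):
    """Sum sales at one level of the hierarchy, optionally inside one parent value."""
    within = within or {}
    out = defaultdict(int)
    for row in rows:
        if all(row[col] == value for col, value in within.items()):
            out[row[level]] += row[AMOUNT]
    return dict(out)


summary = totals(ROWS, REGION)                          # {'North': 180, 'South': 70}
detail = totals(ROWS, CITY, within={REGION: "North"})   # {'Delhi': 130, 'Chandigarh': 50}
```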


Snowflake schema

• Represent the dimensional hierarchy directly by normalizing tables.

• Easy to maintain and saves storage.

[Diagram: a fact table (date, custno, prodno, cityname, ...) linked to Time, prod, cust and city dimension tables; city is further normalized into region.]
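A snowflake schema can be tried out with SQLite from the Python standard library. This sketch assumes an `amount` measure on the fact table and keeps only the city -> region part of the hierarchy from the diagram; the Time, prod and cust dimensions are left out for brevity, and all data values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The dimension hierarchy is normalized into separate tables: city -> region.
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE city   (cityname TEXT PRIMARY KEY,
                     region_id INTEGER REFERENCES region(region_id));
CREATE TABLE fact   (date TEXT, custno INTEGER, prodno INTEGER,
                     cityname TEXT REFERENCES city(cityname), amount REAL);

INSERT INTO region VALUES (1, 'North'), (2, 'South');
INSERT INTO city   VALUES ('Delhi', 1), ('Chandigarh', 1), ('Chennai', 2);
INSERT INTO fact   VALUES ('1997-01-10', 1, 10, 'Delhi', 100.0),
                          ('1997-01-11', 2, 11, 'Chandigarh', 50.0),
                          ('1997-01-12', 3, 10, 'Chennai', 70.0);
""")

# Rolling up to region walks the normalized hierarchy: fact -> city -> region.
sales_by_region = conn.execute("""
    SELECT r.region_name, SUM(f.amount)
    FROM fact f
    JOIN city c   ON f.cityname = c.cityname
    JOIN region r ON c.region_id = r.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
```

Answering "sales by region" takes one more join than in a star schema, where region would be a column of a denormalized city dimension; that extra join is the cost of the easier maintenance and smaller storage.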


Oracle Warehouse Components

[Diagram: relational, multidimensional, text, image, spatial, Web, audio and video data from operational and external sources, accessed through relational tools, OLAP tools and applications/Web. Any Data, Any Source, Any Access.]


Oracle Data Mart Suite

• Data modeling: Oracle Data Mart Designer

• Data extraction (from OLTP engines and OLTP databases): Oracle Data Mart Builder

• Warehousing engines: Data Mart Database, SQL*Plus

• Data management: Oracle Enterprise Manager

• Data access & analysis: Discoverer & Oracle Reports


Oracle Business Intelligence Tools

[Diagram: Reports, Discoverer and Express mapped to current, tactical and strategic needs, ranging from reports that IS develops to business users' own views and analysis.]


Data Mining Works with Warehouse Data

• Data Warehousing provides the Enterprise with a memory

• Data Mining provides the Enterprise with intelligence


The Tool for Each Task

Tool         Task                        Question
Reports      Production reporting        What were sales by region last quarter?
Discoverer   Ad hoc query and analysis   What is driving the increase in North American sales?
Express      Advanced analysis           Given the rapid increase in Web sales, what will total sales be for the rest of the year?


Reporting Tools

• Andyne Computing -- GQL

• Brio -- BrioQuery

• Business Objects -- Business Objects

• Cognos -- Impromptu

• Information Builders Inc. -- Focus for Windows

• Oracle -- Discoverer2000

• Platinum Technology -- SQL*Assist, ProReports

• PowerSoft -- InfoMaker

• SAS Institute -- SAS/Assist

• Software AG -- Esperant

• Sterling Software -- VISION:Data


Extraction and Transformation Tools

• Carleton Corporation -- Passport

• Evolutionary Technologies Inc. -- Extract

• Informatica -- OpenBridge

• Informatica -- PowerCenter

• Information Builders Inc. -- EDA Copy Manager

• Platinum Technology -- InfoRefiner

• Prism Solutions -- Prism Warehouse Manager

• Red Brick Systems -- DecisionScape Formation


Warehouse Services

[Diagram: Education, Consulting and Support Services offered to customers.]


OLAP constructs in RDBMS:

A relational database designed for OLTP will not serve well as a database for data analysis. Optimization techniques such as aggregating fact tables, partitioning fact tables, and denormalizing relational tables all provide significant improvements in performance.
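Of the three techniques, aggregating fact tables is the easiest to show in a few lines. The sketch assumes a hypothetical `sales_fact` table in an in-memory SQLite database and precomputes a monthly summary table with `CREATE TABLE ... AS SELECT`; partitioning and denormalization are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_date TEXT, prodno INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
    ("1997-01-05", 10, 100.0), ("1997-01-20", 10, 40.0),
    ("1997-02-03", 10, 60.0), ("1997-02-14", 11, 25.0),
])

# Aggregated fact table: monthly totals are computed once at load time, so analysis
# queries read a few summary rows instead of scanning every detail row.
conn.execute("""
    CREATE TABLE sales_by_month AS
    SELECT substr(sale_date, 1, 7) AS month, prodno, SUM(amount) AS total
    FROM sales_fact
    GROUP BY substr(sale_date, 1, 7), prodno
""")

monthly = conn.execute(
    "SELECT month, prodno, total FROM sales_by_month ORDER BY month, prodno").fetchall()
```

Here `sales_by_month` holds three rows instead of four detail rows; the trade-off is that the summary must be rebuilt or refreshed whenever new data is loaded into the fact table.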

No Future Without Data Warehousing:


Summary

The following topics were covered:

• Identifying a common, broadly accepted definition of the data warehouse

• Distinguishing the differences between OLTP systems and analytical systems

• Defining some of the common data warehouse terminology

• Identifying some of the elements and processes in a data warehouse

• Identifying and positioning the Oracle Warehouse vision, products, and services
