Data Warehousing Lecture-3: Introduction and Background


Page 1: Data Warehousing

Data Warehousing Lecture-3

Introduction and Background


Page 2: Data Warehousing

Introduction and Background


Page 3: Data Warehousing

What is a Data Warehouse?

It is a blend of many technologies, the basic concept being:

• Take all data from different operational systems.
• If necessary, add relevant data from industry.
• Transform all data and bring it into a uniform format.
• Integrate all data as a single entity.
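The take, transform and integrate steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two source systems (`billing_rows`, `branch_rows`), their field names, and the uniform target format are hypothetical and do not come from the lecture.

```python
from datetime import date, datetime

# Hypothetical extracts from two operational systems that use
# different field names, types, units and date formats.
billing_rows = [
    {"cust_id": "17", "amount": "120.50", "txn_date": "2023-01-15"},
    {"cust_id": "42", "amount": "75.00", "txn_date": "2023-02-03"},
]
branch_rows = [
    {"CustomerNo": 17, "amt_cents": 3000, "date": "15/03/2023"},
]


def from_billing(row):
    """Map a billing record onto the assumed uniform format."""
    return {
        "customer_id": int(row["cust_id"]),
        "amount": float(row["amount"]),
        "txn_date": date.fromisoformat(row["txn_date"]),
        "source": "billing",
    }


def from_branch(row):
    """Map a branch record onto the same uniform format."""
    return {
        "customer_id": row["CustomerNo"],
        "amount": row["amt_cents"] / 100,  # cents to currency units
        "txn_date": datetime.strptime(row["date"], "%d/%m/%Y").date(),
        "source": "branch",
    }


def integrate(billing, branch):
    """Transform both extracts and merge them into one ordered data set."""
    unified = [from_billing(r) for r in billing] + [from_branch(r) for r in branch]
    return sorted(unified, key=lambda r: (r["customer_id"], r["txn_date"]))


warehouse_rows = integrate(billing_rows, branch_rows)
```

After integration, every record carries the same keys and types, so customer 17's activity from both systems can be read as a single history.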

Page 4: Data Warehousing

What is a Data Warehouse? (Cont…)

It is a blend of many technologies, the basic concept being:

• Store data in a format supporting easy access for decision support.
• Create performance-enhancing indices.
• Implement performance-enhancing joins.
• Run ad-hoc queries with low selectivity.
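As a rough illustration of storing, indexing, joining and querying, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customer` and `sales` tables, the index name and the sample rows are hypothetical, and the join is an ordinary SQL join rather than any specific performance technique from the lecture.

```python
import sqlite3

# In-memory database standing in for the warehouse store (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales (customer_id INTEGER, sale_date TEXT, amount REAL);
    -- Index on the join column so the join below can avoid a full scan.
    CREATE INDEX idx_sales_customer ON sales (customer_id);
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "North"), (2, "South"), (3, "North")],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2023-01-10", 100.0),
        (2, "2023-01-11", 50.0),
        (3, "2023-02-01", 25.0),
        (1, "2023-02-15", 75.0),
    ],
)


def sales_by_region(connection):
    """Ad-hoc decision-support query: total sales per region."""
    cursor = connection.execute("""
        SELECT c.region, SUM(s.amount)
        FROM sales AS s
        JOIN customer AS c ON c.customer_id = s.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """)
    return cursor.fetchall()


totals = sales_by_region(conn)  # [('North', 200.0), ('South', 50.0)]
```

The query reads every sales row to produce two summary rows, which is typical of decision-support work and unlike an operational lookup of a single record.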

Page 5: Data Warehousing

How is it Different? Fundamentally different

• Business user needs info.
• User requests IT people.
• IT people do system analysis and design.
• IT people create reports.
• IT people send reports to business user.
• Business user may get answers.
• Answers result in more questions.

Page 6: Data Warehousing

How is it Different?

• Different patterns of hardware utilization

[Figure: hardware utilization (0% to 100%) of an operational system vs. a DWH]

Bus Service vs. Train

Page 7: Data Warehousing

How is it Different?

• Combines operational and historical data.

Don’t do data entry into a DWH; OLTP or ERP systems are the source systems.

OLTP systems don’t keep history; you can’t get a balance statement more than a year old.

A DWH keeps historical data, even of bygone customers. Why?

In the context of a bank, we want to know why the customer left.

What were the events that led to his/her leaving? Why?

Customer retention.

Page 8: Data Warehousing

How much history?

• Depends on:
– Industry.
– Cost of storing historical data.
– Economic value of historical data.

Page 9: Data Warehousing

How much history?

• Industries and history
– Telecomm calls are far more numerous than bank transactions: 18 months.
– Retailers are interested in analyzing yearly seasonal patterns: 65 weeks.
– Insurance companies want to do actuarial analysis, using historical data to predict risk: 7 years.
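The retention periods above can be turned into a simple cutoff rule. This is only a sketch: the day counts approximate a month as 30 days and a year as 365 days, and the function names and the `event_date` field are hypothetical.

```python
from datetime import date, timedelta

# Retention windows from the slide, converted to days.
# Assumption: 1 month is about 30 days, 1 year is about 365 days.
RETENTION_DAYS = {
    "telecom": 18 * 30,    # 18 months
    "retail": 65 * 7,      # 65 weeks
    "insurance": 7 * 365,  # 7 years
}


def retention_cutoff(industry, today):
    """Oldest date still inside the retention window for the industry."""
    return today - timedelta(days=RETENTION_DAYS[industry])


def rows_to_keep(rows, industry, today):
    """Keep only rows whose event_date falls inside the retention window."""
    cutoff = retention_cutoff(industry, today)
    return [row for row in rows if row["event_date"] >= cutoff]
```

For example, a row dated four years before `today` would be kept under the insurance window but dropped under the telecom and retail windows.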

Page 10: Data Warehousing

How much history?

Economic value of data vs. storage cost

Data Warehouse: a complete repository of data?

Page 11: Data Warehousing

How is it Different?

• Usually (but not always) periodic or batch updates rather than real-time.

The boundary is blurring for active data warehousing.

For an ATM, if updates are not in real time, there is a lot of real trouble.

A DWH is for strategic decision making based on historical data. It won’t hurt if the transactions of the last hour or day are absent.
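A periodic batch update can be sketched as follows. The watermark approach (remembering the highest transaction id already loaded) is one common way to do incremental loads and is an assumption here, not something the lecture prescribes. The `Warehouse` class and the `txn_id` field are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy warehouse: loaded rows plus a watermark of what was loaded."""
    rows: list = field(default_factory=list)
    last_loaded_id: int = 0  # highest txn_id copied so far


def batch_load(warehouse, operational_rows):
    """Copy rows newer than the watermark; return how many were loaded."""
    new_rows = [r for r in operational_rows if r["txn_id"] > warehouse.last_loaded_id]
    warehouse.rows.extend(new_rows)
    if new_rows:
        warehouse.last_loaded_id = max(r["txn_id"] for r in new_rows)
    return len(new_rows)


# The operational system records transactions as they happen.
operational = [{"txn_id": 1, "amount": 40.0}, {"txn_id": 2, "amount": 15.0}]
dwh = Warehouse()
first_batch = batch_load(dwh, operational)

# A new transaction arrives; the warehouse does not see it until the next batch.
operational.append({"txn_id": 3, "amount": 99.0})
rows_before_second_batch = len(dwh.rows)
second_batch = batch_load(dwh, operational)
```

Between the two loads the warehouse lags the operational system by one transaction, which is acceptable for strategic analysis but would not be for an ATM balance.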

Page 12: Data Warehousing

How is it Different?


Rate of update depends on: volume of data, nature of business, cost of keeping historical data, benefit of keeping historical data.