28
1 Data Warehouses BUAD/American University Data Warehouses

1 Data Warehouses BUAD/American University Data Warehouses

Embed Size (px)

Citation preview

1Data WarehousesBUAD/American University

Data Warehouses

2Data WarehousesBUAD/American University

Definition

• Data Warehouse: An integrated and consistent store of subject-oriented data that is obtained from a variety of sources and formatted into a meaningful context to support decision-making in an organization.

3Data WarehousesBUAD/American University

Need forData Warehousing

• Integrated, company-wide view of high-quality information.

• Separation of operational and informational systems and data– operational system: a system that is used to run a

business in real time, based on current data– informational system: systems designed to support

decision making based on stable point-in-time or historical data

4Data WarehousesBUAD/American University

Factors AllowingData Warehousing

• Relational DBMS.

• Advances in hardware: speed and storage capacity.

• End-user computing interfaces and tools.

5Data WarehousesBUAD/American University

Data Warehouse Architectures

• Two-level

– source system files containing operational data

– transformed and integrated data warehouse

• Three-level

– Operational data.

– Enterprise data warehouse (EDW)- single source of data for decision making.

– Data marts - limited scope; data selected from EDW; customized decision-support for individual user groups

6Data WarehousesBUAD/American University

Generic data warehouse architecture

7Data WarehousesBUAD/American University

Three-layer architecture

8Data WarehousesBUAD/American University

Reasons for theThree-Level Architecture

• EDW and data marts have different purposes and data architectures.

• Data transformation is complex and is best performed in two steps.

• Data marts customized decision support for different groups

• Architecture– Operational data, reconciled data, Derived data.

9Data WarehousesBUAD/American University

Three-layer data architecture

10Data WarehousesBUAD/American University

Data Characteristics

• Status vs. Event data.

– A transaction is a business activity that triggers one or more business events: event data captures them

• Transient vs. Periodic data.

– Transient: data in which changes to existing records are written over previous records, thus destroying previous data content

– periodic data: data that are never physically altered or deleted once added

11Data WarehousesBUAD/American University

Example of DBMS log entry

12Data WarehousesBUAD/American University

Transient operational data

13Data WarehousesBUAD/American University

Reconciled DataCharacteristics

• Detailed

• Historical

• Normalized

• Enterprise-wide

• Quality controlled

14Data WarehousesBUAD/American University

The Data Reconciliation Process

• Capture: capture the relevant data from source files to fill EDW– Static - initial load.– Incremental - ongoing update.

• Scrub or data cleansing– missing data, name reconciliation– Pattern recognition and other artificial

intelligence techniques.

15Data WarehousesBUAD/American University

Steps in data reconciliation

16Data WarehousesBUAD/American University

The Data Reconciliation Process

• Transform

– Convert the data format from the source to the target system.

– Record-Level Functions

• Selection.

• Joining.

• Aggregation (for data marts).

– Field-Level Functions

• Single-field transformation

• Multi-field transformation

17Data WarehousesBUAD/American University

The Data Reconciliation Process

• Load and Index– Refresh Mode

• When the warehouse is first created.

• Static data capture.

– Update Mode• Ongoing update of the warehouse.

• Incremental data capture.

18Data WarehousesBUAD/American University

Derived DataCharacteristics

• Type of data– Detailed, possibly periodic.– Aggregated.

• Distributed to departmental servers.

• Implemented in star schema.

19Data WarehousesBUAD/American University

Star Schema

• Also called the dimensional model.

• Fact and dimension tables.– Fact table: consists of factual or quantitative

data about the business– Dimension table: hold descriptive data

• Grain of a fact table - time period for each record.

20Data WarehousesBUAD/American University

Components of a star schema

21Data WarehousesBUAD/American University

Star schema example

22Data WarehousesBUAD/American University

Star schema with sample data

23Data WarehousesBUAD/American University

Example of snowflake sample

24Data WarehousesBUAD/American University

Size of the fact table

• Total number of stores: 1,000

• Total number of products: 10,000

• Total number of periods: 24

• Total rows: 1000 * 10,000 * 24 = 240,000,000

• On average 50% items record sales,– no of rows = 120,000,000

25Data WarehousesBUAD/American University

Types of Data Marts

• Dependent - Populated from the EDW.

• Independent - Data taken directly from the operational databases.

26Data WarehousesBUAD/American University

The User Interface

• The role of metadata.• Traditional query and reporting tools.• On-line analytical processing (OLAP)• The use of a set of graphical tools that provides

users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.

27Data WarehousesBUAD/American University

The User Interface

– Slicing a cube.– Pivot

• Rotate the view for a particular data point to obtain another perspective.

• E.g. take a value from the units column and obtain by-store values.

– Drill-down

28Data WarehousesBUAD/American University

Slicing a data cube