MGS8020_05.ppt/Feb 5, 2015/Page 1 Georgia State University - Confidential MGS 8020 Business Intelligence The Data Warehouse & Relational Database Management System Feb 5, 2015


MGS 8020

Business Intelligence

The Data Warehouse & Relational Database Management System

Feb 5, 2015


Agenda

Designing & Building the Data Warehouse

Data Warehouse

Relational Database


The Data Warehouse

The Data Warehouse

• is physically separated from all other operational systems

• holds aggregated and transactional data for management use, separate from the data used for online transaction processing


Data Flow

[Diagram components: Legacy Systems, Operational Data Store, Data Warehouse, Data Mart, Personal Data Warehouse, Metadata]



Characteristics of a Data Warehouse

• Subject Orientation

• Data Integrated: Consistent Naming and Measurement Attributes

• Time Variant

• Nonvolatility


Business Intelligence & Data Warehouse

[Diagram, from sources to users]

Internal Source Systems and External Data Sources

Extract, Transformation and Load

Data Warehouse / Data Mart

Business Intelligence: Business Objects, Cognos, MicroStrategy, Brio, Microsoft Access, etc.

Advanced Analytics: SAS, Minitab, SPSS

Executive Information System: Dashboard, Balanced Scorecard, KPI, Financial Metrics


Data Warehouse Vendors

• Business Objects

• Cognos

• Hyperion

• IBM

• Microsoft

• NCR / Teradata

• Oracle

• SAS


Metadata

What is Metadata?

• Data about Data

• Without metadata, the data is meaningless

• Provides consistency of the truth

Components of Metadata

• Transformation Mapping

• Extraction and Relationship History

• Algorithms for Summarization (and calculations)

• Data Ownership

• Patterns of Warehouse Access

• Business Friendly naming conventions

• Status Information


Agenda

Data Warehouse

Designing & Building the Data Warehouse

Relational Database


Relational Database

A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970.

The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports.

A relational database is a set of tables containing data fitted into predefined categories. Each table (which is sometimes called a relation) contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. For example, a typical business order entry database would include a table that described a customer with columns for name, address, phone number, and so forth. Another table would describe an order: product, customer, date, sales price, and so forth. A user of the database could obtain a view of the database that fitted the user's needs. For example, a branch office manager might like a view or report on all customers that had bought products after a certain date. A financial services manager in the same company could, from the same tables, obtain a report on accounts that needed to be paid.
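The order-entry example above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table layouts, the sample rows, and the `recent_buyers` view name are assumptions made for the sketch, not part of the original slides.

```python
import sqlite3

# In-memory database; all names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY, cust_no INTEGER,
                         product TEXT, order_date TEXT, sales_price REAL);
    INSERT INTO customer VALUES (1, 'Ann Lee', '555-0100');
    INSERT INTO customer VALUES (2, 'Bob Ray', '555-0101');
    INSERT INTO orders VALUES (10, 1, 'Widget', '2015-01-10', 25.0);
    INSERT INTO orders VALUES (11, 2, 'Gadget', '2015-02-03', 40.0);
    -- The branch manager's view: customers who bought after a cutoff date.
    CREATE VIEW recent_buyers AS
        SELECT DISTINCT c.name
        FROM customer c JOIN orders o ON o.cust_no = c.cust_no
        WHERE o.order_date > '2015-01-31';
""")

recent_buyers = [row[0] for row in
                 conn.execute("SELECT name FROM recent_buyers ORDER BY name")]
print(recent_buyers)
```

The same two tables could serve a second report (for example, unpaid accounts) through a different query, without reorganizing the tables.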


Relational Database

When creating a relational database, you can define the domain of possible values for a data column and any further constraints that apply to its values. For example, a domain of customers could allow up to ten possible customer names, while one table is constrained to accept only three of those names.
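One way to express such a constraint is a `CHECK` clause on the column. The sketch below assumes SQLite and uses a hypothetical `regional_account` table with three made-up customer names; other products may offer richer domain support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: only three customer names are accepted in this column.
conn.execute("""
    CREATE TABLE regional_account (
        account_no INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
            CHECK (customer_name IN ('Acme', 'Globex', 'Initech'))
    )
""")
conn.execute("INSERT INTO regional_account VALUES (1, 'Acme')")

try:
    # 'Umbrella' is outside the allowed set, so the database rejects the row.
    conn.execute("INSERT INTO regional_account VALUES (2, 'Umbrella')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM regional_account").fetchone()[0]
print(rejected, row_count)
```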

The definition of a relational database results in a table of metadata or formal descriptions of the tables, columns, domains, and constraints. Meta is a prefix that in most information technology usages means "an underlying definition or description." Thus, metadata is a definition or description of data and metalanguage is a definition or description of language.
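Most products expose this metadata through a queryable catalog. As a small sketch, assuming SQLite (whose catalog table is `sqlite_master` and whose column listing comes from `PRAGMA table_info`), the definitions of a hypothetical `customer` table can be read back like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The catalog lists every table that has been defined.
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print(tables)
print(columns)
```

Other database products use different catalog names, so the two queries above should be read as SQLite-specific.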

A database is a collection of data that is organized so that its contents can easily be accessed, managed, and updated. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.

SQL (Structured Query Language) is a standard interactive and programming language for getting information from and updating a database. Although SQL is both an ANSI and an ISO standard, many database products support SQL with proprietary extensions to the standard language. Queries take the form of a command language that lets you select, insert, update, find out the location of data, and so forth.


Business Intelligence Environment

[Diagram, from sources to users]

Internal Source Systems and External Data Sources

Extract, Transformation and Load

Data Warehouse / Data Mart (RDBMS)

SQL

Business Intelligence: Business Objects, Cognos, Microsoft Access, etc.


Relational Database

• IBM DB2, DB2/400

• Microsoft SQL Server

• Teradata

• Oracle

• Sybase

• Informix / Red Brick

• Microsoft Access

• MySQL


Relational Database

[Diagram: the BI Software Application sends an SQL Request to the RDBMS, and the RDBMS returns a Result Set.]


SQL

SQL – Structured Query Language

1. DDL – Data Definition Language

• Create

• Drop

• Alter

2. DML – Data Manipulation Language

• Insert

• Update

• Delete

• Select
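The two statement families can be seen side by side in a short sketch. It assumes SQLite through Python's `sqlite3` module and reuses the deck's address-book names; the updated company value is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: statements that define or change structure.
conn.execute("CREATE TABLE addr_book (name TEXT, company TEXT)")
conn.execute("ALTER TABLE addr_book ADD COLUMN e_mail TEXT")

# DML: statements that add, change, remove, and read rows.
conn.execute("INSERT INTO addr_book VALUES ('John Smith', 'Microsoft', NULL)")
conn.execute("INSERT INTO addr_book VALUES ('Jeff Jones', 'Delta', NULL)")
conn.execute("UPDATE addr_book SET company = 'Delta Air Lines' WHERE name = 'Jeff Jones'")
conn.execute("DELETE FROM addr_book WHERE name = 'John Smith'")
rows = conn.execute("SELECT name, company FROM addr_book").fetchall()

# DDL again: remove the table definition and its data.
conn.execute("DROP TABLE addr_book")
remaining_tables = conn.execute("SELECT COUNT(*) FROM sqlite_master").fetchone()[0]

print(rows, remaining_tables)
```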


SQL Select Statement

SELECT column1, column2, . . .

FROM table1, table2, . . .

WHERE criteria1 AND/OR criteria2 . . . . .

ORDER BY column1, column2, . . .


SQL Select Statement

SELECT column1, column2, . . .

FROM table1, table2, . . .

WHERE criteria1 AND/OR criteria2 . . . . .

GROUP BY column1, column2, . . .

HAVING criteria1 AND/OR criteria2 . . . . .

ORDER BY column1, column2, . . .

Aggregation: GROUP BY and HAVING (written after WHERE and before ORDER BY)
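A small worked example may help show how the clauses interact. The `sales` table and its rows are invented for this sketch, which assumes SQLite: `WHERE` filters individual rows first, `GROUP BY` forms groups, `HAVING` filters the groups, and `ORDER BY` sorts what is left.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100), ("East", 250), ("West", 80), ("West", 40), ("North", 300)],
)

totals = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 50            -- row filter: drops the 40 from West
    GROUP BY region              -- one group per region
    HAVING SUM(amount) >= 300    -- group filter: drops West (total 80)
    ORDER BY total DESC
""").fetchall()

print(totals)
```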


SQL – Example 1

SQL

CREATE TABLE ADDR_BOOK (
NAME CHAR(30),
COMPANY CHAR(20),
E_MAIL CHAR(25) )

Output

Name Company Email

John Smith Microsoft [email protected]

Jeff Jones Delta [email protected]


SQL – Example 2

2a)

SQL

SELECT NAME, COMPANY, E_MAIL
FROM ADDR_BOOK
WHERE COMPANY = 'Microsoft'

Output

Name Company Email

John Smith Microsoft [email protected]

2b)

Table - Product

ID Name Category

I Internet A

B Browsers A

A Application Null

G Graphics Null

SQL

SELECT ID, NAME
FROM PRODUCT
WHERE CATEGORY IS NULL
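Comparing a column to NULL with `=` never evaluates to true in SQL, so finding the uncategorized products requires `IS NULL`. The following sketch, assuming SQLite, loads the Product table from this example and runs both forms:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id TEXT, name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("I", "Internet", "A"), ("B", "Browsers", "A"),
     ("A", "Application", None), ("G", "Graphics", None)],
)

# "= NULL" yields unknown for every row, so no rows come back.
equals_null = conn.execute(
    "SELECT id, name FROM product WHERE category = NULL").fetchall()

# "IS NULL" is the test that actually matches missing values.
is_null = conn.execute(
    "SELECT id, name FROM product WHERE category IS NULL ORDER BY id").fetchall()

print(equals_null)
print(is_null)
```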


SQL – Example 3

SQL

SELECT

ADDR_BOOK.NAME,

COMPANY.EMAIL

FROM

ADDR_BOOK,

COMPANY

WHERE ADDR_BOOK.EMPLOYEE_ID = COMPANY.EMPLOYEE_ID

Output

Name Email

John Smith [email protected]

Jeff Jones [email protected]


SQL – Example 4

SQL

CREATE TABLE CUSTOMER (

CUST_NO INTEGER,

FIRST_NAME CHAR(30),

LAST_NAME CHAR(30),

ADDRESS CHAR(50),

CITY CHAR(30),

STATE CHAR (2),

ZIP_CODE CHAR(9),

COUNTRY CHAR(20) )

CREATE TABLE ORDER (

ORDER_NO INTEGER,

DATE_ENTERED DATE,

CUST_NO INTEGER )

SQL

SELECT
ORDER.ORDER_NO, CUSTOMER.FIRST_NAME, CUSTOMER.LAST_NAME, CUSTOMER.ADDRESS, CUSTOMER.CITY, CUSTOMER.ZIP_CODE, CUSTOMER.COUNTRY
FROM
ORDER, CUSTOMER
WHERE ORDER.CUST_NO = CUSTOMER.CUST_NO
AND
ORDER.DATE_ENTERED = '1998-11-20'
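To try this join, the sketch below runs a trimmed version in SQLite. Several things are assumptions made for the sketch: the order table is named `ORDERS` because `ORDER` is a reserved word in SQL, only a few columns are kept, the sample rows and cities are made up, and the filter date is taken to mean 20 November 1998.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CUST_NO INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT, CITY TEXT);
    -- ORDERS rather than ORDER, which is a reserved word.
    CREATE TABLE ORDERS (ORDER_NO INTEGER, DATE_ENTERED TEXT, CUST_NO INTEGER);
    INSERT INTO CUSTOMER VALUES (1, 'John', 'Smith', 'Atlanta');
    INSERT INTO CUSTOMER VALUES (2, 'Jeff', 'Jones', 'Macon');
    INSERT INTO ORDERS VALUES (100, '1998-11-20', 1);
    INSERT INTO ORDERS VALUES (101, '1998-11-21', 2);
    INSERT INTO ORDERS VALUES (102, '1998-11-20', 2);
""")

# The join condition pairs each order with its customer; the date filter
# then keeps only the orders entered on the chosen day.
rows = conn.execute("""
    SELECT ORDERS.ORDER_NO, CUSTOMER.FIRST_NAME, CUSTOMER.LAST_NAME, CUSTOMER.CITY
    FROM ORDERS, CUSTOMER
    WHERE ORDERS.CUST_NO = CUSTOMER.CUST_NO
      AND ORDERS.DATE_ENTERED = '1998-11-20'
    ORDER BY ORDERS.ORDER_NO
""").fetchall()

print(rows)
```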


SQL – Example 5

SQL

CREATE TABLE ADDR_BOOK (
NAME CHAR(30),
COMPANY CHAR(20),
E_MAIL CHAR(25) )

Output

Name Company Email

John Smith Microsoft [email protected]

Jeff Jones Delta [email protected]


SQL – Example 6 – Referential Integrity

SQL

CREATE TABLE CUSTOMER (

CUST_NO INTEGER PRIMARY KEY,

FIRST_NAME CHAR(30),

LAST_NAME CHAR(30),

ADDRESS CHAR(50),

CITY CHAR(30),

ZIP_CODE CHAR(9),

COUNTRY CHAR(20) )

CREATE TABLE ORDER (

ORDER_NO INTEGER PRIMARY KEY,

DATE_ENTERED DATE,

CUST_NO INTEGER REFERENCES CUSTOMER (CUST_NO) )

SQL

CREATE TABLE ORDER_ITEMS (

ORDER_NO INTEGER,

ITEM_NO INTEGER,

PRODUCT CHAR(30),

QUANTITY INTEGER,

UNIT_PRICE MONEY )

ALTER TABLE ORDER_ITEMS
ADD CONSTRAINT PK_ORDER_ITEMS PRIMARY KEY (ORDER_NO, ITEM_NO)

ALTER TABLE ORDER_ITEMS
ADD CONSTRAINT FK_ORDER_ITEMS_1 FOREIGN KEY (ORDER_NO)
REFERENCES ORDER (ORDER_NO)
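What referential integrity buys in practice is that the database refuses rows that point at a parent that does not exist. A minimal sketch, assuming SQLite (where foreign key enforcement must be switched on per connection) and a table named `ORDERS` instead of the reserved word `ORDER`; the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE CUSTOMER (CUST_NO INTEGER PRIMARY KEY, LAST_NAME TEXT);
    CREATE TABLE ORDERS (
        ORDER_NO INTEGER PRIMARY KEY,
        CUST_NO INTEGER REFERENCES CUSTOMER (CUST_NO)
    );
    INSERT INTO CUSTOMER VALUES (1, 'Smith');
    INSERT INTO ORDERS VALUES (100, 1);
""")

try:
    # Customer 99 does not exist, so this order would be an orphan.
    conn.execute("INSERT INTO ORDERS VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM ORDERS").fetchone()[0]
print(orphan_rejected, order_count)
```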


SQL – Example 7 – Index

When you have a primary key, you already have an implicitly (or explicitly) defined unique index on the primary key columns. It's generally a good idea to define non-unique indexes on the foreign keys.

SQL

CREATE UNIQUE INDEX PK_CUSTOMER ON CUSTOMER (CUST_NO)

CREATE UNIQUE INDEX PK_ORDER ON ORDER (ORDER_NO)

CREATE INDEX FK_ORDER_1 ON ORDER (CUST_NO)

CREATE UNIQUE INDEX PK_ORDER_ITEMS ON ORDER_ITEMS (ORDER_NO, ITEM_NO)

CREATE INDEX FK_ORDER_ITEMS_1 ON ORDER_ITEMS (ORDER_NO)
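To see whether a foreign key index is actually used, most products can display a query plan. The sketch below assumes SQLite, whose `EXPLAIN QUERY PLAN` statement reports the chosen access path as text; the exact wording varies between SQLite versions, so the code only checks whether the index name appears. The table is named `ORDERS` because `ORDER` is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDERS (ORDER_NO INTEGER PRIMARY KEY, DATE_ENTERED TEXT, CUST_NO INTEGER);
    CREATE INDEX FK_ORDER_1 ON ORDERS (CUST_NO);
""")

# Ask the planner how it would find all orders for one customer.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM ORDERS WHERE CUST_NO = 1").fetchall()

# The last column of each plan row is the human-readable description.
plan_text = " ".join(str(row[-1]) for row in plan)
uses_index = "FK_ORDER_1" in plan_text

print(plan_text)
print(uses_index)
```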


Agenda

Data Warehouse

Designing & Building the Data Warehouse

Relational Database


Why Business Intelligence

1. Improve consistency and accuracy of reporting

2. Reduce stress on operational systems for reporting and analysis

3. Faster access to information

4. BI tools provide increased analytical capabilities

5. Empowering the Business User

6. Companies are realizing that data is a company’s most underutilized asset


ERM vs. DM

ERM - Entity Relationship Model

• Remove redundancy

• Efficiency of transactions

DM - Dimensional Model

• Intuitive View of the Data

• Efficiency of access and analysis


Dimensional Model

Star Schema

Fact Table: Foreign_Key_1, Foreign_Key_2, Foreign_Key_3, Foreign_Key_4, Metric_1, Metric_2, . . . .

Dimension Table (four in the diagram, each with the same layout): Primary_Key, Descriptive_Attribute_1, Descriptive_Attribute_2, Descriptive_Attribute_3, Descriptive_Attribute_4, Descriptive_Attribute_5, Descriptive_Attribute_6, Descriptive_Attribute_7, . . . .


Retail Sales Dimensional Model (Partial)

Sales Fact Table: Time_Key (FK), Product_Key (FK), Store_Key (FK), Customer_Key (FK), Units, Revenue, Cost, . . .

Product Dimension Table: Product_Key (PK), SKU_Number, Description, Brand, Product_Category, Size, . . . .

Customer Dimension Table: Customer_Key (PK), Customer_Name, Purchase_Profile, Credit_Profile, Demographic_Category, Address, . . . .

Time Dimension Table: Time_Key (PK), Date, Day_of_Week, Week_Number, Month, . . . .

Store Dimension Table: Store_Key (PK), Store_ID, Store_Name, Address, District, Floor_Plan, . . . .
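A typical query against such a model joins the fact table to the dimensions it needs, filters on descriptive attributes, and aggregates the metrics. The sketch below assumes SQLite and keeps only two of the four dimensions and a handful of columns; the brand names and figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim (time_key),
        product_key INTEGER REFERENCES product_dim (product_key),
        units INTEGER, revenue REAL, cost REAL
    );
    INSERT INTO product_dim VALUES (1, 'BrandA');
    INSERT INTO product_dim VALUES (2, 'BrandB');
    INSERT INTO time_dim VALUES (1, 'January');
    INSERT INTO time_dim VALUES (2, 'February');
    INSERT INTO sales_fact VALUES (1, 1, 10, 100.0, 60.0);
    INSERT INTO sales_fact VALUES (1, 2, 5, 75.0, 50.0);
    INSERT INTO sales_fact VALUES (2, 1, 20, 200.0, 120.0);
""")

# Dimension attributes limit and label the result; fact metrics are summed.
revenue_by_month = conn.execute("""
    SELECT p.brand, t.month, SUM(f.units), SUM(f.revenue)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN time_dim t ON t.time_key = f.time_key
    WHERE p.brand = 'BrandA'
    GROUP BY p.brand, t.time_key, t.month
    ORDER BY t.time_key
""").fetchall()

print(revenue_by_month)
```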


Fact Table

1. Contains Foreign Keys that relate to Dimension Tables

2. Has a many-to-one relationship to Dimension Tables

3. Contains Metrics to be aggregated

4. Typically does not contain any non-foreign key or non-metric data elements

5. Level of Granularity defines depth and flexibility of analysis

Sales Fact Table

Time_Key (FK), Product_Key (FK), Store_Key (FK), Customer_Key (FK), Units, Revenue, Cost, . . .


Dimension Table

1. Contains a Primary Key that relates to the Fact Table(s)

2. Has a one-to-many relationship to the Fact Table(s)

3. Contains Descriptive data used to limit and aggregate metrics from the Fact Table(s)

4. Can sometimes contain pre-aggregated data

Product Dimension Table

Product_Key (PK), SKU_Number, Description, Brand, Product_Category, Size, . . . .


Warehouse Architecture Specification

• Common Sources

• Common Dimensions

• Common Business Rules

• Common Semantics

• Common Metrics


Time Dimension

Week – defined by an end of week day

Month – January, February, March, . . .

Quarter – Q1: 01/01 – 03/31

Q2: 04/01 – 06/30

Q3: 07/01 – 09/30

Q4: 10/01 – 12/31

Year – 2000, 2001, 2002, 2003

Date (Primary Key) – a day, 365 per year

Fiscal Month – 4/4/5

Fiscal Quarter

Fiscal Year


Time Dimension

Weekday/Weekend

Day of Week – Monday, Tuesday, Wednesday, . . .

Season – Winter, Spring, Summer, Fall

Holiday – Labor Day, 4th of July, Memorial Day, . . .

. . .

Date (Primary Key) – a day, 365 per year
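Because every attribute of the time dimension follows from the date, its rows are usually generated rather than loaded. The sketch below derives a few of the calendar attributes listed on these two slides. The function and field names are hypothetical, and fiscal periods (such as a 4/4/5 calendar), seasons, and holidays are left out because they depend on business rules the slides do not specify.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]


def time_dimension_row(day: date) -> dict:
    """Derive descriptive attributes for one day (one row of the dimension)."""
    return {
        "date": day.isoformat(),                    # primary key
        "day_of_week": DAY_NAMES[day.weekday()],    # weekday() is 0 for Monday
        "is_weekend": day.weekday() >= 5,           # Saturday or Sunday
        "month": MONTH_NAMES[day.month - 1],
        "quarter": f"Q{(day.month - 1) // 3 + 1}",  # Q1 = Jan-Mar, and so on
        "year": day.year,
    }


def build_time_dimension(start: date, end: date) -> list:
    """Build one row per calendar day from start to end, inclusive."""
    days = (end - start).days + 1
    return [time_dimension_row(start + timedelta(days=i)) for i in range(days)]


rows_2015 = build_time_dimension(date(2015, 1, 1), date(2015, 12, 31))
sample = time_dimension_row(date(2015, 2, 5))
print(len(rows_2015), sample)
```

A leap year produces 366 rows instead of 365, which the inclusive date range handles without special cases.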