
Page 1: Core of Business “Intelligence” technology

Core of Business “Intelligence” technology

Database warehouse, data mining and on-line analytical processing

Page 2: Core of Business “Intelligence” technology

Laudon and Laudon, Chapter 10

Business intelligence and analytics require a strong database foundation, a set of analytic tools, and an involved management team that can ask intelligent questions and analyze data.

Business Intelligence and Analytics for Decision Support

The diagram shows the role played by the data warehouse, data mining and OLAP in the overall business decision-making process.

Page 3: Core of Business “Intelligence” technology

The Data Warehouse

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of “all” an organisation’s data in support of management’s decision making process.”

– Data warehouses developed because some questions cannot be answered from a single database, e.g.:
– If you want to ask “How much does this customer owe?” then the sales database is probably the one to use.
– However, if you want to ask “Was this ad campaign more successful than that one?”, you require data from more disparate sources, e.g. production, marketing etc.

Page 4: Core of Business “Intelligence” technology

Characteristics of a Data Warehouse

• Subject oriented – based around business processes, e.g. sale of products, customer purchases
• Integrated – inconsistencies removed
• Nonvolatile – stored in read-only format
• Time variant – data is “static” and updated periodically
• Summarized – in decision-usable format, e.g. monthly average, total quantity
• Large volume – data sets are quite large; all the pertinent data of an organisation
• Non normalized – often redundant: “non-relational” star flake schema (dimension tables and fact tables)

Page 5: Core of Business “Intelligence” technology

The Atomic Schema

[Figure: the atomic schema. Tables and their columns:]

• Product Ref – Product Code, ProdRef Eff. Date, ProdRef End Date, Product Name, Unit Price, Product Category, Product Type, Product Sub Type
• Cust Purchases – Customer ID, Activity Date, Product Code, Product Name, Sales Rep ID, Qty Purchased, Total Dollars, Promotion Flag
• Customer – Customer ID, Status Date, Cust Addr State, Cust ZIP Code, Customer Type, Customer Status, ...
• Sales Rep Ref – Sales Rep ID, Sales Person Name, Store ID
• Outlet Reference – Store ID, Store Name, Store Location, Distribution Channel
• Cust Averages – Customer ID, Cust Average Date, Cust Avg. End Date, Cust Avg. Rev., Cust Longevity

Page 6: Core of Business “Intelligence” technology

[Figure: star schema built around the Purchases 1 fact table. Tables and their columns:]

• Purchases 1 (fact table) – keys: Sales Rep ID, Product Code, Cust ZIP Code, Customer Type, Week Ending Date; facts: Days of Activity, Unit Price, Total Quantity, Total Dollars, Returned Qty, Returned Dollars, Promotion Qty
• Customer Location – Cust ZIP Code, City, State/Province, Country
• Product – Product Code, Product Name, Prod. Category, Product Type, Prod Sub Type
• Date Information – Week Ending Date, Month, Quarter, Year
• Customer Type – Customer Type, Cust Type Desc
• Selling Responsibility – Sales Rep ID, Sales Rep Name, Store ID, Store Name, Store Location, Sales Channel

For Example:

Page 7: Core of Business “Intelligence” technology

Star Schema Query

    -- A is the fact table
    SELECT E.Month, B.Customer_Type, C.Product_Type, D.Store_Location,
           SUM(A.Total_Quantity)
    FROM Purchases_1 A, Customer_Type B, Product C,
         Selling_Responsibility D, Date_Information E
    WHERE B.Customer_Type = A.Customer_Type
      AND C.Product_Code = A.Product_Code
      AND D.Sales_Rep_ID = A.Sales_Rep_ID
      AND E.Week_Ending_Date = A.Week_Ending_Date
      AND E.Year = '1996'
      AND C.Product_Category = 'V'
    GROUP BY E.Month, B.Customer_Type, C.Product_Type, D.Store_Location;
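To see the star schema query run end to end, the sketch below builds a cut-down copy of the Purchases_1 star in an in-memory SQLite database. The table and column names follow the slides; every data row (product codes, the store location "Dublin", the dates) is invented for illustration, and only the columns the query touches are created.

```python
import sqlite3

# Hypothetical miniature of the Purchases_1 star schema; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer_Type (Customer_Type TEXT PRIMARY KEY, Cust_Type_Desc TEXT);
CREATE TABLE Product (Product_Code TEXT PRIMARY KEY,
                      Product_Category TEXT, Product_Type TEXT);
CREATE TABLE Selling_Responsibility (Sales_Rep_ID TEXT PRIMARY KEY,
                                     Store_Location TEXT);
CREATE TABLE Date_Information (Week_Ending_Date TEXT PRIMARY KEY,
                               Month TEXT, Year TEXT);
CREATE TABLE Purchases_1 (Sales_Rep_ID TEXT, Product_Code TEXT,
                          Customer_Type TEXT, Week_Ending_Date TEXT,
                          Total_Quantity INTEGER);
""")
conn.executemany("INSERT INTO Customer_Type VALUES (?, ?)", [("R", "Retail")])
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [("P1", "V", "Tape"), ("P2", "B", "Paperback")])
conn.executemany("INSERT INTO Selling_Responsibility VALUES (?, ?)",
                 [("S1", "Dublin")])
conn.executemany("INSERT INTO Date_Information VALUES (?, ?, ?)",
                 [("1996-01-07", "Jan", "1996"),
                  ("1996-01-14", "Jan", "1996"),
                  ("1997-01-05", "Jan", "1997")])
conn.executemany("INSERT INTO Purchases_1 VALUES (?, ?, ?, ?, ?)",
                 [("S1", "P1", "R", "1996-01-07", 5),
                  ("S1", "P1", "R", "1996-01-14", 7),
                  ("S1", "P2", "R", "1996-01-07", 3),   # wrong category
                  ("S1", "P1", "R", "1997-01-05", 9)])  # wrong year

QUERY = """
SELECT E.Month, B.Customer_Type, C.Product_Type, D.Store_Location,
       SUM(A.Total_Quantity)
FROM Purchases_1 A, Customer_Type B, Product C,
     Selling_Responsibility D, Date_Information E
WHERE B.Customer_Type = A.Customer_Type
  AND C.Product_Code = A.Product_Code
  AND D.Sales_Rep_ID = A.Sales_Rep_ID
  AND E.Week_Ending_Date = A.Week_Ending_Date
  AND E.Year = '1996'
  AND C.Product_Category = 'V'
GROUP BY E.Month, B.Customer_Type, C.Product_Type, D.Store_Location
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only the two 1996 purchases of the category "V" product survive the filters, so the single result group sums their quantities.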

Page 8: Core of Business “Intelligence” technology

A data warehouse process model

[Figure: process model whose components are the Operational Database(s), External Data, the ETL Routine (Extract/Transform/Load), the Data Warehouse, a Dependent Data Mart, an Independent Data Mart, an Extract/Summarize Data step, the Decision Support System and the Report.]

Page 9: Core of Business “Intelligence” technology

Meta Data

• A key concept behind the data warehouse (DW) is meta data.
– Meta data is data about the data (which has come from the data sources). It shows what data is contained in the DW, where it came from, and what changes have been made to it.
• The metadata are essential ingredients in the transformation of raw data into knowledge. They are the “keys” that allow us to handle the raw data.
– For example, a line in a sales database may contain: 1023 K596 111.21
– This is mostly meaningless until we consult the metadata (in the data directory), which tells us it was store number 1023, product K596 and sales of $111.21.
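The sales-line example can be sketched in a few lines of Python. The `FieldMeta` class, the `METADATA` list and the `decode` function are hypothetical names chosen for this illustration; a real data directory would hold far more (source system, transformation history and so on).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FieldMeta:
    """One (hypothetical) data-directory entry describing a field."""
    name: str
    description: str
    parse: Callable[[str], object]  # converts the raw text to a typed value


# Assumed metadata for the three-field sales line used on the slide.
METADATA = [
    FieldMeta("store_number", "Store that made the sale", int),
    FieldMeta("product_code", "Product sold", str),
    FieldMeta("sales_amount", "Sale value in dollars", float),
]


def decode(line: str, metadata: list) -> dict:
    """Attach names and types to a whitespace-separated raw record."""
    values = line.split()
    if len(values) != len(metadata):
        raise ValueError("record does not match the metadata")
    return {m.name: m.parse(v) for m, v in zip(metadata, values)}


record = decode("1023 K596 111.21", METADATA)
print(record)
```

Without `METADATA` the three tokens are just text; with it they become a store number, a product code and a dollar amount.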

Page 10: Core of Business “Intelligence” technology

Meta Data Answers Questions for Users of the Data Warehouse

• How do I find the data I need?

• What is the original source of the data?

• How was this summarization created?

• What queries are available to access the data?

• How have business definitions and terms changed over time?

• How do product lines vary across organizations?

• What business assumptions have been made?

Page 11: Core of Business “Intelligence” technology

Dependent Data Marts

• A data mart is a data store that is subsidiary to a data warehouse of integrated data.
• The data mart is directed at a partition of data (subject area) that is created for the use of a dedicated group of users; it is sometimes termed a “subject warehouse”.
• The data mart might be a set of denormalised, summarised or aggregated data that can be placed on the data warehouse database or, more often, on a separate physical store.
• Data marts are “dependent data marts” when the data is sourced from the data warehouse.
• Independent data marts represent fragmented solutions to a range of business problems in the enterprise. Such a concept should not be deployed, as it lacks the “data integration” that is associated with data warehouses.

Page 12: Core of Business “Intelligence” technology

Independent Data Marts

• However, such marts are not necessarily all bad.
• They are often a valid solution to a pressing business problem:
– Extremely urgent user requirements
– The absence of a budget for a full data warehouse
– The decentralisation of business units

Page 13: Core of Business “Intelligence” technology

Data Warehousing Architecture

• Access Tools
– The principal purpose of the data warehouse is to provide information for strategic decision making.
– The main decision tools used to achieve this objective are:
  • Data mining tools
  • On-line analytical processing tools
  • Decision support systems / Executive information system tools
• The data warehouse, like all organisational databases, can be centralised or distributed.

Page 14: Core of Business “Intelligence” technology

Data Warehousing Topology

– Central data warehouse: the DW is at a single location.
– Distributed data warehouse: the collection of data is replicated around multiple locations, so users have a local copy of the data warehouse. This can improve query run-times and reduce communications overheads. (Note: the principles associated with distributed databases apply equally to distributed data warehouses; however, the static nature of the data needs to be factored into the design process.)

Page 15: Core of Business “Intelligence” technology

Data Warehouse Construction Tips

• Accept that your first try will require revision
• Examine the data: What formats and specific data are needed to support your application?
• Clean up the data before using it in the warehouse
• Build a prototype mini-data warehouse as a learning experience and revise strategies as necessary
• Plan on more users than anticipated wanting to use the warehouse
• Keep storage requirements constantly in mind

Page 16: Core of Business “Intelligence” technology

Data Mining

• The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.
• Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.

Page 17: Core of Business “Intelligence” technology

Data Mining

• Data mining tools use, e.g., AI techniques to help:
– Predict future trends
– Segment datasets
– Find “product” associations
• This allows businesses to make proactive, knowledge-driven decisions.

Page 18: Core of Business “Intelligence” technology

Data mining: A.I. operations

• Some of the most commonly used A.I. techniques in data mining are:
– Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.
– Rule induction: The extraction of useful if-then rules from data based on statistical significance.
– Artificial neural networks: Predictive models that learn through training and resemble biological neural networks in structure.
– Clustering: A technique to group data points into different clusters.
– Regression analysis: Analyses the relationship between variables.

Page 19: Core of Business “Intelligence” technology

How Data Mining Works

• For example, say that you are the director of marketing for an insurance company and you'd like to acquire some new customers.
– You could just randomly go out and mail coupons to the general population. However, you would not achieve the required result.
– Alternatively, as the marketing director you have access to a lot of information about all of your customers: their age, sex, income range and credit card insurance.

Page 20: Core of Business “Intelligence” technology

How Data Mining Works

• The goal in prospecting is to make some decisions about the information in the lower right-hand quadrant, based on the model that we build going from Customer General Information to Customer Proprietary Information.

                                                        Customers   Prospects
General information (e.g. demographic data)             Known       Known
Proprietary information (e.g. customer transactions)    Known       Target

Page 21: Core of Business “Intelligence” technology

Data Mining Techniques

• Data mining operations include:
– Predictive modelling: decision trees, regression analysis…
– Database segmentation: clustering techniques
– Link analysis: decision trees, association rules

Page 22: Core of Business “Intelligence” technology

Predictive Modeling

• Applications of predictive modelling include direct marketing; it uses techniques like decision trees.
• Uses observations to form a model of the important characteristics of some phenomenon, e.g. those traits associated with those who will buy property.
• Can be used for target marketing….

Simple decision tree example

Page 23: Core of Business “Intelligence” technology

Table 3.1 • The Credit Card Promotion Database

Income Range   Life Insurance Promotion   Credit Card Insurance   Sex      Age
40–50K         No                         No                      Male     45
30–40K         Yes                        No                      Female   40
40–50K         No                         No                      Male     42
30–40K         Yes                        Yes                     Male     43
50–60K         Yes                        No                      Female   38
20–30K         No                         No                      Female   55
30–40K         Yes                        Yes                     Male     35
20–30K         No                         No                      Male     27
30–40K         No                         No                      Male     43
30–40K         Yes                        No                      Female   41
40–50K         Yes                        No                      Female   43
20–30K         Yes                        No                      Male     29
50–60K         Yes                        No                      Female   39
40–50K         No                         No                      Male     55
20–30K         Yes                        Yes                     Female   19

Page 24: Core of Business “Intelligence” technology

[Figure: decision tree that splits the Credit Card Promotion Database (Table 3.1) on Income Range. The Yes/No counts at each leaf are the Life Insurance Promotion values for that range.]

Income Range
– 20–30K: 2 Yes, 2 No
– 30–40K: 4 Yes, 1 No
– 40–50K: 1 Yes, 3 No
– 50–60K: 2 Yes
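The leaf counts of the Income Range tree can be recomputed directly from Table 3.1. The sketch below is one way to do it; the tuple layout and the `split_counts` helper are choices made for this illustration, and the income ranges are typed with plain hyphens.

```python
from collections import defaultdict

# Table 3.1: (income range, life insurance promotion, credit card insurance, sex, age)
ROWS = [
    ("40-50K", "No", "No", "Male", 45), ("30-40K", "Yes", "No", "Female", 40),
    ("40-50K", "No", "No", "Male", 42), ("30-40K", "Yes", "Yes", "Male", 43),
    ("50-60K", "Yes", "No", "Female", 38), ("20-30K", "No", "No", "Female", 55),
    ("30-40K", "Yes", "Yes", "Male", 35), ("20-30K", "No", "No", "Male", 27),
    ("30-40K", "No", "No", "Male", 43), ("30-40K", "Yes", "No", "Female", 41),
    ("40-50K", "Yes", "No", "Female", 43), ("20-30K", "Yes", "No", "Male", 29),
    ("50-60K", "Yes", "No", "Female", 39), ("40-50K", "No", "No", "Male", 55),
    ("20-30K", "Yes", "Yes", "Female", 19),
]


def split_counts(rows, attr_index, target_index=1):
    """Count target values (Yes/No) for each value of the split attribute."""
    counts = defaultdict(lambda: {"Yes": 0, "No": 0})
    for row in rows:
        counts[row[attr_index]][row[target_index]] += 1
    return {value: dict(c) for value, c in counts.items()}


by_income = split_counts(ROWS, attr_index=0)
for income_range in sorted(by_income):
    print(income_range, by_income[income_range])
```

A split is useful to the extent that its leaves are "pure"; here only the 50-60K leaf contains a single class.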

Page 25: Core of Business “Intelligence” technology

How Data Mining Works

• For instance, a simple model for an insurance company might be:
– Customers who earn between 50K and 60K have a life insurance policy.
• This model could then be applied to the general population to target those for the life insurance promotion.
• The tree can be more complex, e.g. see the figure below.

[Figure: decision tree for the life insurance promotion. Each leaf shows the predicted class followed by (records reaching the leaf / records misclassified).]

Age
– <= 43: Sex
  – Female: Yes (6/0)
  – Male: Credit Card Insurance
    – No: No (4/1)
    – Yes: Yes (2/0)
– > 43: No (3/0)
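Read as code, the Age / Sex / Credit Card Insurance tree is just nested conditions. The sketch below writes it as a hypothetical `predict` function and replays it over the rows of Table 3.1 to check the leaf statistics; the function name and tuple layout are assumptions of this example.

```python
# Table 3.1: (income range, life insurance promotion, credit card insurance, sex, age)
ROWS = [
    ("40-50K", "No", "No", "Male", 45), ("30-40K", "Yes", "No", "Female", 40),
    ("40-50K", "No", "No", "Male", 42), ("30-40K", "Yes", "Yes", "Male", 43),
    ("50-60K", "Yes", "No", "Female", 38), ("20-30K", "No", "No", "Female", 55),
    ("30-40K", "Yes", "Yes", "Male", 35), ("20-30K", "No", "No", "Male", 27),
    ("30-40K", "No", "No", "Male", 43), ("30-40K", "Yes", "No", "Female", 41),
    ("40-50K", "Yes", "No", "Female", 43), ("20-30K", "Yes", "No", "Male", 29),
    ("50-60K", "Yes", "No", "Female", 39), ("40-50K", "No", "No", "Male", 55),
    ("20-30K", "Yes", "Yes", "Female", 19),
]


def predict(age, sex, credit_card_insurance):
    """Predict the Life Insurance Promotion response by walking the tree."""
    if age > 43:
        return "No"
    if sex == "Female":
        return "Yes"
    # Males aged 43 or under are split on credit card insurance.
    return "Yes" if credit_card_insurance == "Yes" else "No"


# Rows where the tree's prediction disagrees with the recorded response.
misses = [row for row in ROWS if predict(row[4], row[3], row[2]) != row[1]]
correct = len(ROWS) - len(misses)
print(f"{correct} of {len(ROWS)} rows classified correctly")
print("misclassified:", misses)
```

The one miss is the 29-year-old male without credit card insurance, which accounts for the (4/1) leaf.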

Page 26: Core of Business “Intelligence” technology

Database Segmentation

• Aim is to partition a database into an unknown number of segments, or clusters, of similar records.
• Uses clustering techniques in order to group data.
• Applications of database segmentation include detecting fraudulent activity (credit card), market segmentation, customer segmentation….
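The slides do not name a particular clustering algorithm. As one possible illustration, the sketch below uses k-means, which, unlike the "unknown number of segments" described above, needs the number of clusters `k` to be chosen in advance. The customer points (age, income in thousands) are invented, and the initial centroids are simply the first `k` points so that the run is deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain k-means: assign to nearest centroid, recompute, repeat."""
    centroids = [points[i] for i in range(k)]  # deterministic initialisation
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, income in thousands)
customers = [(25, 30), (27, 32), (24, 28), (50, 80), (52, 85), (48, 78)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

On this toy data the younger, lower-income customers end up in one segment and the older, higher-income customers in the other.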

Page 27: Core of Business “Intelligence” technology

Database Segmentation using a Scatterplot

Page 28: Core of Business “Intelligence” technology

Link Analysis

• Aims to establish links between records, or sets of records, in a database; one such example would be association discovery….
• Applications include product affinity analysis.
• Finds items that imply the presence of other items in the same event.

Page 29: Core of Business “Intelligence” technology

Link Analysis - Associations Discovery

• Affinities between items are represented by association discovery.
– e.g. ‘When a customer rents property for more than 2 years and is more than 25 years old, in 40% of cases, the customer will buy a property. This association happens in 35% of all customers who rent properties’.
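The two percentages in the rental example correspond to what association-rule mining usually calls confidence and support. The sketch below computes both for the same rule over ten invented renter records. It assumes the common definitions (support = share of all records matching both the condition and the outcome; confidence = share of records matching the condition that also match the outcome); the slide's own wording may intend a slightly different denominator.

```python
# Hypothetical renter records: (years renting, age, bought a property)
RENTERS = [
    (3, 30, True), (4, 40, True), (5, 28, False), (3, 35, False), (6, 50, False),
    (1, 30, False), (2, 45, True), (3, 22, False), (1, 20, False), (4, 25, True),
]


def antecedent(record):
    """Rule condition: rents for more than 2 years AND is older than 25."""
    years, age, _ = record
    return years > 2 and age > 25


def consequent(record):
    """Rule outcome: the customer buys a property."""
    return record[2]


matching = [r for r in RENTERS if antecedent(r)]
both = [r for r in matching if consequent(r)]

support = len(both) / len(RENTERS)      # rule holds in this share of all records
confidence = len(both) / len(matching)  # share of matching renters who buy

print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

Rules are normally kept only when both measures clear thresholds chosen by the analyst.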

Page 30: Core of Business “Intelligence” technology

Examples of Applications of Data Mining

• Retail / Marketing:
– Predicting response to mailing campaigns
• Banking:
– Detecting patterns of fraudulent credit card use
• Insurance:
– Claims analysis

Page 31: Core of Business “Intelligence” technology

Data mining in conclusion

• Two critical factors for success with data mining are:
– a large, well-integrated data warehouse, and
– a well-defined understanding of the business process within which data mining is to be applied (e.g. customer prospecting (target marketing), retention, campaign management etc.).

Page 32: Core of Business “Intelligence” technology

What is OLAP

• OLAP stands for "On-Line Analytical Processing", in contrast to OLTP ("On-Line Transaction Processing").
• OLAP describes a class of technologies that are designed for live ad hoc data access and analysis.
• OLTP generally relies solely on relational databases.
• OLAP has become synonymous with multidimensional views of business data supported by multidimensional databases.
• Relational databases were never intended to provide data synthesis, analysis and consolidation functionality.

Page 33: Core of Business “Intelligence” technology

What is OLAP

• OLTP databases are optimised for transaction updating.
• OLAP applications, however, are used by managers and analysts for a higher-level aggregate view of the data; thus they are designed for analysis.
• Many problems that people try to solve using relational databases, e.g. summaries, are handled much more efficiently by an OLAP server than by an RDBMS.

Page 34: Core of Business “Intelligence” technology

Key OLAP “server” Features

OLAP applications are found in widely divergent functional areas, as illustrated in the table opposite. However, they all have the following key features:

1. Multi-dimensional views of data (MD databases via Star Schema)
2. Support for complex calculations
3. Time intelligence
4. Data sparsity

Page 35: Core of Business “Intelligence” technology

A star schema for credit card purchases

[Figure: star schema, the basis of the multidimensional (MD) view. A central Fact Table is linked to four dimension tables. Only the values that survive legibly from the figure are listed.]

• Fact Table – Cardholder Key, Purchase Key, Location Key, Time Key, Amount (sample amounts: 14.50, 8.25, 22.40)
• Cardholder Dimension – Cardholder Key, Name, Gender, Income Range (e.g. 1, John Doe, Male, 50 - 70,000; 2, Sara Smith, Female, 70 - 90,000)
• Purchase Dimension – Purchase Key, Category (1 Supermarket, 2 Travel & Entertainment, 3 Auto & Vehicle, 4 Retail, 5 Restaurant, 6 Miscellaneous)
• Location Dimension – Location Key, Street, City, State, Region (e.g. 10, 425 Church St, Charleston, SC, 3)
• Time Dimension – Time Key, Day, Month, Quarter, Year (e.g. Time Key 10, Month Jan, Year 2002)

Page 36: Core of Business “Intelligence” technology

• Example of three-dimensional query.

• What is the total amount and number of purchases for vehicles in region 2 for December?

[Figure: Multidimensional cube for credit card purchases. Dimensions: Month (Jan. to Dec.), Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle) and Region (One, Two, Three, Four). Highlighted cell: Month = Dec., Region = Two, Category = Vehicle, Count = 110, Amount = 6,720.]

Multi-dimensional view as a cube; it can also be represented as a 4 column table.
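A cube cell such as the one highlighted in the figure is simply a pre-computed aggregate keyed by one value from each dimension. The sketch below builds such a cube from a handful of invented purchase rows and then answers the three-dimensional query with a single lookup; the figures do not match the slide's 110 and 6,720 because the data here is made up.

```python
from collections import defaultdict

# Hypothetical fact rows already joined to their dimensions:
# (month, category, region, amount)
PURCHASES = [
    ("Dec", "Vehicle", "Two", 120.0),
    ("Dec", "Vehicle", "Two", 80.0),
    ("Dec", "Vehicle", "One", 50.0),
    ("Nov", "Vehicle", "Two", 60.0),
    ("Dec", "Retail", "Two", 30.0),
]

# Each cell holds [count, total amount] for one (month, category, region).
cube = defaultdict(lambda: [0, 0.0])
for month, category, region, amount in PURCHASES:
    cell = cube[(month, category, region)]
    cell[0] += 1
    cell[1] += amount

# The three-dimensional query becomes a direct cell lookup.
count, total = cube.get(("Dec", "Vehicle", "Two"), [0, 0.0])
print(f"Count = {count}, Amount = {total}")
```

Because the aggregation is done once when the cube is built, each query afterwards costs a lookup rather than a scan of the fact rows.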

Page 37: Core of Business “Intelligence” technology

Why Multidimensional Data

• Queries requiring only a single number to be retrieved need not use multidimensional databases.
• If queries involve retrieving multiple numbers and aggregating them, response times for large databases can become intolerable, as relational databases can scan only a few hundred records per second.
• However, multidimensional databases can add up 10,000 or more numbers in rows and columns per second.
• Thus for such queries multidimensional databases have an enormous performance advantage.

Page 38: Core of Business “Intelligence” technology

Multi-dimensional Operations

• Slice – A single dimension operation
• Dice – A multidimensional operation
• Roll-up – A higher level of generalization
– Total sales can be rolled up over a single dimension (e.g. region) or over multiple dimensions (region, product type)
• Drill-down – A greater level of detail
• Rotation – View data from a new perspective
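The five operations can be sketched on a tiny cube held as a Python dictionary keyed by (month, category, region). All names and figures below are invented for the illustration, and the function definitions are one simple reading of the terms above rather than any particular OLAP server's behaviour.

```python
DIMS = ("month", "category", "region")

# Hypothetical cube: total sales per (month, category, region) cell.
cube = {
    ("Nov", "Vehicle", "One"): 100, ("Nov", "Vehicle", "Two"): 150,
    ("Dec", "Vehicle", "One"): 200, ("Dec", "Vehicle", "Two"): 250,
    ("Dec", "Retail", "Two"): 50, ("Nov", "Retail", "One"): 70,
}


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: restrict several dimensions at once to sets of allowed values."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}


def roll_up(cube, keep):
    """Roll-up: sum away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = {}
    for key, value in cube.items():
        reduced = tuple(key[i] for i in idx)
        out[reduced] = out.get(reduced, 0) + value
    return out


december = slice_cube(cube, "month", "Dec")
dec_region_two = dice(cube, month={"Dec"}, region={"Two"})
by_region = roll_up(cube, ["region"])
# Drill-down is the reverse step: add a dimension back for more detail.
by_region_category = roll_up(cube, ["region", "category"])
# Rotation keeps every cell but reorders the dimensions.
rotated = roll_up(cube, ["region", "category", "month"])
print(by_region, by_region_category)
```

Moving from `by_region` to `by_region_category` is the drill-down; going the other way is the roll-up.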

Page 39: Core of Business “Intelligence” technology

Drill down to core database

• Most organisations now utilise relational databases as standard for their data warehouses.
• Often there is no need to replicate all the data in the relational database into an MD database for OLAP.
• Summary level data can be kept in the MD database and detailed data in the relational database.

Page 40: Core of Business “Intelligence” technology

Support for complex calculations

• Important computational features of OLAP servers include:
– Independently dimensioned variables (IDV):
  • numeric measure variables (facts) such as sales, cost, price…, each based on its relevant dimensions: region, customer type, product…
– Statistical calculations:
  • provide a range of powerful computational and statistical methods such as those required by sales forecasting: regression analysis, projection, correlations…
– Vector arithmetic

Page 41: Core of Business “Intelligence” technology

Time Series Data Types

• Users want to look at trends in all aspects of their business, e.g. sales trends, market trends etc.
• A series of numbers representing a particular variable over time is called a time series, e.g. 52 weekly sales numbers is a time series.
• Utilising a time-series data type allows you to store an entire string of numbers representing daily, weekly or monthly data.
• Thus an OLAP server that supports a time-series data type allows one to store historical data without having to specify a separate dimension for time.
• Unlike other dimensions, time has special attributes and rules, such as periodicity.

Page 42: Core of Business “Intelligence” technology

Sparse Data

• When less than 10% of the cells contain data, the database is said to be sparsely populated, or sparse.
• Sparsity can also occur if there are many cells that contain the same number, e.g. the price of a product every day.
• This situation can also be represented by storing the number once, along with the number of days for which the number is repeated.
• While a relational database would fill up the database with duplicate data, an OLAP server that understands sparse data can skip over zeros, missing data and duplicate data.
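Both ideas on this slide can be sketched briefly: storing only the populated cells of a sparse grid, and storing a repeated value once with its repeat count (run-length encoding). The grid and the prices are invented for the example, and real OLAP servers use their own, more elaborate storage schemes.

```python
from itertools import groupby

# A hypothetical 5 x 5 grid of cells in which only two cells hold data.
dense = [[0] * 5 for _ in range(5)]
dense[0][3] = 7
dense[2][0] = 3

# Sparse storage: keep only the populated cells, keyed by (row, column).
sparse = {(r, c): v
          for r, row in enumerate(dense)
          for c, v in enumerate(row) if v != 0}
density = len(sparse) / (len(dense) * len(dense[0]))
print(sparse, f"density = {density:.0%}")


def run_length_encode(values):
    """Store each run of identical values once, with its length."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, repeat count) pairs back into the full series."""
    return [value for value, n in pairs for _ in range(n)]


# Hypothetical daily prices: unchanged for five days, then changed.
daily_prices = [9.99] * 5 + [10.49] * 2
encoded = run_length_encode(daily_prices)
print(encoded)
```

Seven daily prices collapse to two (price, days) pairs, and the original series can be rebuilt exactly with `run_length_decode`.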

Page 43: Core of Business “Intelligence” technology

Alternative Database topology: The star schema

[Figure labels: D.W., O.L.A.P., Data mining]

Page 45: Core of Business “Intelligence” technology

The Star Schema

[Figure: a central Fact Table linked to four Dimension Tables.]

• Fact Table – keys: Dimension Key 1, Dimension Key 2, Dimension Key 3, Dimension Key 4; facts: Fact 1, Fact 2, Fact 3, Fact 4, ..., Fact n
• Dimension Table 1 – Dimension Key 1; Description 1, Aggregation Level 1.1, Aggregation Level 1.2, Aggregation Level 1.n
• Dimension Table 2 – Dimension Key 2; Description 2, Aggregation Level 2.1, Aggregation Level 2.2, Aggregation Level 2.n
• Dimension Table 3 – Dimension Key 3; Description 3, Aggregation Level 3.1, Aggregation Level 3.2, Aggregation Level 3.n
• Dimension Table 4 – Dimension Key 4; Description 4, Aggregation Level 4.1, Aggregation Level 4.2, Aggregation Level 4.n

Page 46: Core of Business “Intelligence” technology

Dimension Table

[Figure: Dimension Table 1 – Dimension Key 1; Description 1, Aggregation Level 1.1, Aggregation Level 1.2, Aggregation Level 1.n]

• Describes the data that has been organized in the Fact Table
• Key should either be the most detailed aggregation level necessary (e.g. country vs. county), if possible, or...
• Manageable number of aggregation levels

Page 47: Core of Business “Intelligence” technology

Fact Table

[Figure: Fact Table – keys: Dimension Key 1, Dimension Key 2, Dimension Key 3, Dimension Key 4; facts: Fact 1, Fact 2, Fact 3, Fact 4, ..., Fact n]

• Quantifies the data that has been described by the Dimension Tables
• Key made up of unique combination of values of dimension keys
– ALWAYS contains date or date dimension
• Fact values should be additive
– Aggregations of quantities or amounts from atomic level
– Cannot be percentages or ratios
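A short worked example shows why ratios make poor facts. With two invented fact rows, the overall return rate derived from the additive quantities differs from the average of the per-row rates, so a stored percentage cannot be rolled up correctly. The column names echo the Purchases 1 fact table; the numbers are hypothetical.

```python
# Hypothetical fact rows holding additive quantities only.
FACTS = [
    {"region": "North", "returned_qty": 1, "total_qty": 10},
    {"region": "South", "returned_qty": 30, "total_qty": 100},
]

# Correct roll-up: sum the additive facts first, then derive the ratio.
total_returned = sum(f["returned_qty"] for f in FACTS)
total_sold = sum(f["total_qty"] for f in FACTS)
true_rate = total_returned / total_sold

# What happens if the ratio itself had been stored and then averaged.
row_rates = [f["returned_qty"] / f["total_qty"] for f in FACTS]
averaged_rate = sum(row_rates) / len(row_rates)

print(f"rate from summed facts: {true_rate:.1%}")
print(f"average of stored rates: {averaged_rate:.1%}")
```

Keeping the quantities in the fact table and computing percentages at query time avoids the discrepancy.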

Page 52: Core of Business “Intelligence” technology

Question

• Business decisions require the delivery of critical information in a timely, suitable format. Explain, using appropriate examples, how OLAP can facilitate the business decision-making process.
• Discuss how a data warehouse can play a key role in strategic decision making.
• Discuss, using suitable examples, how data mining can contribute to companies making proactive, knowledge-driven decisions which could help with the formulation of a company's strategy.

Page 53: Core of Business “Intelligence” technology

Question

• A data warehouse, a data mining system and OLAP are three important technologies used in facilitating business decision making. Using suitable examples:
– The star schema is a database schema that can be utilised by all three technologies. Describe, using a simple example, the essential elements of this schema.
  (10 marks)
– Explain how such star schemas can be used by any two of the above technologies to provide information to derive simple business strategies.
  (20 marks)