Section 6 E-Biz & DATABASE Special thanks to Dr. George M. Marakas



LEARNING OUTCOMES

• List, describe, and provide an example of each of the five characteristics of high-quality information

• Define the relationship between a database and a database management system

• Describe the advantages an organization can gain by using a database.


UNDERSTANDING INFORMATION

• Information is everywhere in an organization

• Employees must be able to obtain and analyze the many different levels, formats, and granularities of organizational information to make decisions

• Successfully collecting, compiling, sorting, and analyzing information can provide tremendous insight into how an organization is performing


UNDERSTANDING INFORMATION

• Information granularity – refers to the extent of detail within the information (fine and detailed or coarse and abstract)

– Levels

– Formats

– Granularities


Information Quality

• Business decisions are only as good as the quality of the information used to make the decisions

• Characteristics of high-quality information include:

– Accuracy

– Completeness

– Consistency

– Uniqueness

– Timeliness


Information Quality

• Low quality information example


Understanding the Costs of Poor Information

• The four primary sources of low-quality information include:

1. Online customers intentionally enter inaccurate information to protect their privacy

2. Information from different systems has different entry standards and formats

3. Call center operators enter abbreviated or erroneous information by accident or to save time

4. Third party and external information contains inconsistencies, inaccuracies, and errors


Understanding the Costs of Poor Information

• Potential business effects resulting from low-quality information include:

– Inability to accurately track customers

– Difficulty identifying valuable customers

– Inability to identify selling opportunities

– Marketing to nonexistent customers

– Difficulty tracking revenue due to inaccurate invoices

– Inability to build strong customer relationships


Understanding the Benefits of Good Information

• High-quality information can significantly improve the chances of making a good decision

• Good decisions can directly impact an organization's bottom line


DATABASE FUNDAMENTALS

• Information is everywhere in an organization

• Information is stored in databases

– Database – maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)


DATABASE ADVANTAGES

• Database advantages from a business perspective include:

– Increased flexibility

– Increased scalability and performance

– Reduced information redundancy

– Increased information integrity (quality)

– Increased information security


Increased Flexibility

• A well-designed database should:

– Handle changes quickly and easily

– Provide users with different views

– Have only one physical view

  • Physical view – deals with the physical storage of information on a storage device

– Have multiple logical views

  • Logical view – focuses on how users logically access information


INTEGRATING DATA AMONG MULTIPLE DATABASES

• Integration – allows separate systems to communicate directly with each other

– Forward integration – takes information entered into a given system and sends it automatically to all downstream systems and processes

– Backward integration – takes information entered into a given system and sends it automatically to all upstream systems and processes


INTEGRATING DATA AMONG MULTIPLE DATABASES

• Forward and backward integration


INTEGRATING DATA AMONG MULTIPLE DATABASES

• Building a central repository specifically for integrated information


Data Warehouse Data Mining in eBiz


LEARNING OUTCOMES

• Describe the roles and purposes of data warehouses and data marts in an organization

• Compare the multidimensional nature of data warehouses (and data marts) with the two-dimensional nature of databases

• Identify the importance of ensuring the cleanliness of information throughout an organization

• Explain the relationship between business intelligence and a data warehouse


HISTORY OF DATA WAREHOUSING

• Data warehouses extend the transformation of data into information

• In the 1990s, executives became less concerned with day-to-day business operations and more concerned with overall business functions

• The data warehouse provided the ability to support decision making without disrupting the day-to-day operations


DATA WAREHOUSE FUNDAMENTALS

• Data warehouse – a logical collection of information – gathered from many different operational databases – that supports business analysis activities and decision-making tasks

• The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes


DATA WAREHOUSE FUNDAMENTALS

• Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse

• Data mart – contains a subset of data warehouse information
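The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the two source systems, their field names, the region codes, and the `warehouse` list are all hypothetical.

```python
# Hypothetical source systems with inconsistent field names and formats.
sales_db = [{"cust": "ann lee", "amount": "120.50", "region": "SE"}]
web_db = [{"customer_name": "Bob Roy", "total": 80, "region": "southeast"}]

# Assumed enterprise definition: one common spelling per region.
REGION_CODES = {"se": "Southeast", "southeast": "Southeast"}


def extract():
    """Pull raw rows from each source, tagging where they came from."""
    for row in sales_db:
        yield "sales", row
    for row in web_db:
        yield "web", row


def transform(source, row):
    """Map source-specific fields onto the common definitions."""
    name = row["cust"] if source == "sales" else row["customer_name"]
    amount = row["amount"] if source == "sales" else row["total"]
    return {
        "customer": name.title(),
        "amount": float(amount),
        "region": REGION_CODES[row["region"].lower()],
    }


def load(rows, warehouse):
    """Append the transformed rows to the warehouse store."""
    warehouse.extend(rows)


warehouse = []
load([transform(src, row) for src, row in extract()], warehouse)
```

After the load, both rows share the same field names and the same region spelling, which is what lets them be analyzed together.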


DATA WAREHOUSE FUNDAMENTALS


From Data Warehousing to Data Mining


Multidimensional Analysis

• Databases contain information in a series of two-dimensional tables

• In a data warehouse or data mart, information is multidimensional: it contains layers of columns and rows

– Dimension – a particular attribute of information


Multidimensional Analysis

• Cube – common term for the representation of multidimensional information


Multidimensional Analysis

• Data mining – the process of analyzing data to extract information not offered by the raw data alone

• To perform data mining, users need data-mining tools

– Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making


Information Cleansing or Scrubbing

• An organization must maintain high-quality data in the data warehouse

• Information cleansing or scrubbing – a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
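A cleansing pass might look like the sketch below. The record layout, the rule that a valid phone number has exactly 10 digits, and the choice to discard duplicates are assumptions made for illustration; a real scrubbing tool would apply the organization's own rules.

```python
def cleanse(records):
    """Return (clean, discarded) after fixing or dropping bad records."""
    clean, discarded, seen = [], [], set()
    for rec in records:
        # Fix inconsistent spacing and capitalization in the name.
        name = " ".join(rec.get("name", "").split()).title()
        # Keep only the digits of the phone number.
        phone = "".join(ch for ch in rec.get("phone", "") if ch.isdigit())
        if not name or len(phone) != 10:   # incomplete or incorrect
            discarded.append(rec)
            continue
        if (name, phone) in seen:          # duplicate of an earlier record
            discarded.append(rec)
            continue
        seen.add((name, phone))
        clean.append({"name": name, "phone": phone})
    return clean, discarded


raw = [
    {"name": "  john  SMITH ", "phone": "(555) 123-4567"},
    {"name": "John Smith", "phone": "555-123-4567"},
    {"name": "", "phone": "555-000-1111"},
    {"name": "Ana Diaz", "phone": "12345"},
]
clean, discarded = cleanse(raw)
```

Here the first record is fixed and kept, the second is discarded as a duplicate of it, and the last two are discarded as incomplete or incorrect.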


Information Cleansing or Scrubbing

• Contact information in an operational system


Information Cleansing or Scrubbing

• Standardizing Customer name from Operational Systems


Information Cleansing or Scrubbing


Information Cleansing or Scrubbing

• Accurate and complete information


DATA MINING

• Data-mining software includes many forms of AI such as neural networks and expert systems


Data Mining’s Growth in Popularity

• One reason is that we keep getting more and more data all the time and need tools to understand it.

• We also are aware that the human brain has trouble processing multidimensional data.

• A third reason is that machine learning techniques are becoming more affordable and more refined at the same time.


DATA MINING

• Common forms of data-mining analysis capabilities include:

– Cluster analysis

– Association detection

– Statistical analysis


Cluster Analysis

• Cluster analysis – a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible

• CRM systems depend on cluster analysis to segment customer information and identify behavioral traits
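One common way to form such groups is k-means clustering; the slides do not name a specific method, so treat the choice as an assumption. The sketch below runs a tiny one-dimensional k-means on hypothetical customer spending figures, with starting centers picked by hand.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-center."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical annual spend per customer, in dollars.
spend = [120, 150, 130, 900, 950, 1000]
centers, groups = k_means_1d(spend, centers=[100.0, 500.0])
```

The low spenders and the high spenders end up in separate, mutually exclusive groups, which is the kind of segmentation a CRM system could then act on.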


Cluster Example


Statistical Analysis

• Statistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysis

– Forecast – predictions made on the basis of time-series information

– Time-series information – time-stamped information collected at a particular frequency


Association Detection

• Association detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the information

– Market basket analysis – analyzes such items as Web sites and checkout scanner information to detect customers’ buying behavior and predict future behavior by identifying affinities among customers’ choices of products and services


Making Accurate Predictions with Data Mining

• Although the literature contains statements such as “data mining will allow us to predict who will buy a particular product,” human nature makes individual behavior hard to predict with that kind of certainty.

• In situations where data mining is used to predict response to a marketing campaign, only about 5% of the people selected as “likely respondents” actually do respond.


Making Accurate Predictions with Data Mining (cont.)

• Although the accuracy of predicting individual behavior is not so good, it is better than it seems, since direct marketing efforts often have “hit rates” of only about 1% without data mining.
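The comparison between the two response rates is simple arithmetic, worked through below. The 5% and 1% rates come from the slides; the mailing size of 10,000 is a hypothetical figure chosen only to make the difference concrete.

```python
def lift(rate_with_model, baseline_rate):
    """How many times better the targeted mailing does than the untargeted one."""
    return rate_with_model / baseline_rate


mailing_size = 10_000      # hypothetical campaign size
with_mining = 0.05         # about 5% respond when selected by data mining
without_mining = 0.01      # about 1% respond without data mining

improvement = lift(with_mining, without_mining)
extra_responses = round(mailing_size * (with_mining - without_mining))
```

A 5% hit rate is about five times the 1% baseline, so on a mailing of this size the targeted list would bring in roughly 400 more responses for the same cost.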


Online Analytical Processing (OLAP)

Codd developed a set of 12 rules for the development of multidimensional databases:

1. Multidimensional view

2. Transparent to user

3. Accessible

4. Consistent reporting

5. Client-server architecture

6. Generic dimensionality

7. Dynamic sparse matrix handling

8. Multiuser support

9. Cross-dimensional operations

10. Intuitive manipulation

11. Flexible reporting

12. Unlimited dimension and aggregation


OLAP as Implemented

• To date, it does not appear that any implementation exists that satisfies all 12 rules.

• Some people argue it might not even be possible to attain all of them.

• More recently, the term OLAP has come to represent the broad category of software technology that enables multidimensional analysis of enterprise data.


Multidimensional OLAP (MOLAP)

• Data can be viewed across several dimensions. Here sales are arrayed by region and product.

• A fourth dimension could be added by using several graphs -- perhaps at different time points.

• Most analyses have many more dimensions than this. MOLAP handles data as an n-dimensional hypercube.

Figure: Sales plotted by Product and Region
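A hypercube can be pictured as a lookup keyed by one value per dimension. The sketch below assumes three dimensions (product, region, quarter) and made-up sales figures; `roll_up` is a hypothetical helper, not part of any MOLAP product.

```python
from collections import defaultdict

# Each cell of the cube is addressed by one value per dimension.
# Keys are (product, region, quarter); sales figures are made up.
cube = {
    ("Product 1", "Region 1", "Q1"): 30,
    ("Product 1", "Region 2", "Q1"): 40,
    ("Product 2", "Region 1", "Q1"): 50,
    ("Product 2", "Region 2", "Q2"): 60,
}
DIMENSIONS = ("product", "region", "quarter")


def roll_up(cube, keep):
    """Sum the measure over every dimension not named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, sales in cube.items():
        totals[tuple(key[i] for i in idx)] += sales
    return dict(totals)


by_product = roll_up(cube, keep=("product",))
by_region_quarter = roll_up(cube, keep=("region", "quarter"))
```

Adding a fourth or fifth dimension only lengthens the key; the same roll-up works unchanged, which is the appeal of treating the data as an n-dimensional cube.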


Relational OLAP (ROLAP)

• A large relational database server replaces the multidimensional one.

• The database contains both detailed and summarized data, allowing “drill down” techniques to be applied.

• SQL interfaces allow vendors to build tools that are both portable and scalable.

• This approach requires databases with many relational tables, which may lead to substantial processor overhead on complex joins.
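Drill-down over relational tables can be sketched with Python's built-in `sqlite3` module. The `region` and `sale` tables and their contents are hypothetical; the point is only that both the summary and the detail behind it come from ordinary SQL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (region_id INTEGER, product TEXT, amount INTEGER);
    INSERT INTO region VALUES (1, 'Southeast'), (2, 'Northwest');
    INSERT INTO sale VALUES
        (1, 'Pizza', 40), (1, 'Cola', 25), (2, 'Pizza', 30), (2, 'Milk', 10);
""")

# Summary level: total sales per region (requires a join).
summary = conn.execute("""
    SELECT r.name, SUM(s.amount)
    FROM sale AS s JOIN region AS r ON r.region_id = s.region_id
    GROUP BY r.name ORDER BY r.name
""").fetchall()

# Drill down: the detail rows behind one summary figure.
detail = conn.execute("""
    SELECT s.product, s.amount
    FROM sale AS s JOIN region AS r ON r.region_id = s.region_id
    WHERE r.name = ? ORDER BY s.product
""", ("Southeast",)).fetchall()
```

Every extra dimension adds another table to join, which is where the processor overhead mentioned above comes from.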


A Typical Relational Schema


3-3: Techniques Used to Mine the Data

• Paralleling the popularity of data mining itself, the development of new techniques is exploding as well.

• Many innovations are vendor-specific, which sometimes does little to advance the state of the art.

• Regardless, data-mining techniques tend to fall into four major categories:

1. classification

2. association

3. sequencing

4. clustering


Classification methods

• The goal is to discover rules that define whether an item belongs to a particular subset or class of data.

• For example, if we are trying to determine which households will respond to a direct mail campaign, we will want rules that separate the “probables” from the “not probables.”

• These IF-THEN rules often are portrayed in a tree-like structure.
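A tree of IF-THEN rules translates directly into nested conditionals. The rules below are invented for illustration (a real tree would be learned from data), and the household fields `income`, `age`, and `prior_response` are hypothetical.

```python
def classify(household):
    """Walk a small decision tree and return 'probable' or 'not probable'."""
    if household["income"] == "high":
        # Branch for high-income households: split again on age.
        if household["age"] >= 35:
            return "probable"
        return "not probable"
    # Branch for all other households: split on past behavior.
    if household["prior_response"]:
        return "probable"
    return "not probable"


households = [
    {"income": "high", "age": 48, "prior_response": False},
    {"income": "high", "age": 27, "prior_response": True},
    {"income": "low", "age": 60, "prior_response": True},
]
labels = [classify(h) for h in households]
```

Each path from the top test to a returned label is one IF-THEN rule, for example "IF income is high AND age is at least 35 THEN probable."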


The Knowledge Discovery Search Process

Steps in Discovery:

– Define the business problem and obtain the data to study it.

– Use data mining software to model the problem.

– Mine the data to search for patterns of interest.


The Knowledge Discovery Search Process (cont.)

– Review the mining results and refine them by respecifying the model.

– Once validated, make the model available to other users of the DW.


Creating a Data-Mining Model

Although syntax differs from vendor to vendor, building a model on top of a database is much like creating a table:

    CREATE MODEL mail_list
        Income character input,
        Age integer input,
        Respond character input

To populate it with data, use an SQL INSERT:

    INSERT INTO mail_list
    SELECT income, age, respond
    FROM client_list
    WHERE region = 'Southeast'
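The INSERT ... SELECT step can be tried out with Python's built-in `sqlite3` module. SQLite has no CREATE MODEL statement, so this sketch uses an ordinary table named `mail_list` as a stand-in for the model; the contents of `client_list` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client_list (income TEXT, age INTEGER, respond TEXT, region TEXT);
    INSERT INTO client_list VALUES
        ('high', 45, 'yes', 'Southeast'),
        ('low',  30, 'no',  'Southeast'),
        ('high', 52, 'yes', 'Northwest');
    -- Stand-in for CREATE MODEL: an ordinary table with the same columns.
    CREATE TABLE mail_list (income TEXT, age INTEGER, respond TEXT);
""")

# Populate the stand-in with the same INSERT ... SELECT pattern.
conn.execute("""
    INSERT INTO mail_list
    SELECT income, age, respond
    FROM client_list
    WHERE region = 'Southeast'
""")
rows = conn.execute(
    "SELECT income, age, respond FROM mail_list ORDER BY age"
).fetchall()
```

Only the Southeast clients reach `mail_list`. In a vendor's data-mining extension, the same insert would also train the model on those rows.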


Creating a Data-Mining Model (cont.)

The process automatically created additional views of the model (mail_list_UNDERSTAND and mail_list_PREDICT). These can be examined:

    SELECT * FROM mail_list_UNDERSTAND
    WHERE input_column_name = 'income' and
          input_column_value = 'high' and
          output_column_name = 'respond' and
          output_column_value = 'yes'

Once these are created, they are treated as tables in the database so they can be viewed and joined by other users.


New Applications for Data Mining

As the technology matures, new applications emerge, especially in two new categories: text mining and web mining. Some text mining examples are:

– Distilling the meaning of a text

– Accurate summarization of a text

– Explication of the text theme structure

– Clustering of texts


Web mining

• Web mining is a special case of text mining where the mining occurs over a website.

• It enhances the website with intelligent behavior, such as suggesting related links or recommending new products.

• It allows you to unobtrusively learn the interests of the visitors and modify their user profiles in real time.

• It also allows you to match resources to the interests of the visitor.


Market Basket Analysis:

• This is the most widely used and, in many ways, most successful data mining algorithm.

• It essentially determines what products people purchase together.

• Stores can use this information to place these products in the same area.

• Direct marketers can use this information to determine which new products to offer to their current customers.

• Inventory policies can be improved if reorder points reflect the demand for the complementary products.


Market Basket Analysis Methodology

• We first need a list of transactions and what was purchased. This is pretty easily obtained these days from scanning cash registers.

• Next, we choose a list of products to analyze, and tabulate how many times each was purchased with the others.

• The diagonal of the table shows how often each product is purchased in any combination, and the off-diagonal entries show how often each pair of products was bought together.


A Convenience Store Example

Consider the following simple example about five transactions at a convenience store:

Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels

These need to be cross tabulated and displayed in a table.


A Convenience Store Example

• Pizza and Cola sell together more often than any other combo; a cross-marketing opportunity?

• Milk sells well with everything – people probably come here specifically to buy it.

Product Bought   Pizza also   Milk also   Cola also   Chips also   Pretzels also
Pizza            2            1           2           0            0
Milk             1            3           1           1            1
Cola             2            1           3           0            1
Chips            0            1           0           1            0
Pretzels         0            1           1           0            2
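The cross-tabulation can be reproduced from the five transactions with a short script. The product names are shortened to match the table; the data structure (a dictionary of dictionaries) is just one convenient choice.

```python
from itertools import combinations

transactions = [
    {"Pizza", "Cola", "Milk"},
    {"Milk", "Chips"},
    {"Cola", "Pizza"},
    {"Milk", "Pretzels"},
    {"Cola", "Pretzels"},
]
products = ["Pizza", "Milk", "Cola", "Chips", "Pretzels"]

# counts[a][b] = number of transactions containing both a and b;
# counts[a][a] = number of transactions containing a at all.
counts = {a: {b: 0 for b in products} for a in products}
for basket in transactions:
    for item in basket:
        counts[item][item] += 1
    for a, b in combinations(basket, 2):
        counts[a][b] += 1
        counts[b][a] += 1

# Print one row per product, in the same column order as the table.
for a in products:
    print(f"{a:<9}", *(counts[a][b] for b in products))
```

The diagonal entries count how often each product appears, and the off-diagonal entries count how often each pair appears together.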


Using the Results

• The tabulations can immediately be translated into association rules and the numerical measures computed.

• Comparing this week’s table to last week’s table can immediately show the effect of this week’s promotional activities.

• Some rules are going to be trivial (hot dogs and buns sell together) or inexplicable (toilet rings sell only when a new hardware store is opened).
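The slides do not say which numerical measures are meant; support, confidence, and lift are the ones most commonly attached to association rules, so the sketch below assumes those three. It reuses the five convenience-store transactions.

```python
transactions = [
    {"Pizza", "Cola", "Milk"},
    {"Milk", "Chips"},
    {"Cola", "Pizza"},
    {"Milk", "Pretzels"},
    {"Cola", "Pretzels"},
]


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(if_item, then_item):
    """Of the baskets with `if_item`, the fraction that also hold `then_item`."""
    return support([if_item, then_item]) / support([if_item])


def lift(if_item, then_item):
    """Confidence relative to how often `then_item` sells anyway (>1 is a real link)."""
    return confidence(if_item, then_item) / support([then_item])


pizza_cola = (
    support(["Pizza", "Cola"]),
    confidence("Pizza", "Cola"),
    lift("Pizza", "Cola"),
)
```

For the rule "Pizza IMPLIES Cola," both pizza baskets also contain cola, so the confidence is 1.0, and the lift is above 1 because cola appears in only three of the five baskets overall.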


Limitations to Market Basket Analysis

• A large number of real transactions are needed to do an effective basket analysis, but the data’s accuracy is compromised if all the products do not occur with similar frequency.

• The analysis can sometimes capture results that were due to the success of previous marketing campaigns (and not natural tendencies of customers).


Performing Analysis with Virtual Items

• The sales data can be augmented with the addition of virtual items. For example, we could record that the customer was new to us, or had children.

• The transaction record might look like:

Item 1: Sweater
Item 2: Jacket
Item 3: New

• This might allow us to see what patterns new customers have versus old customers.


Taxonomies

• The presence of items not purchased very frequently is an obstacle to a good market basket analysis.

• One way to deal with this is to eliminate products that occur with a frequency less than some threshold.

• A better idea would be to try to form groups of products that fall below the threshold. Four flavors of popsicle occur 9% of the time all together, but no more than 3% individually.
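Rolling rare items up into a group can be sketched as follows. The taxonomy mapping, the transactions, and the 50% threshold are all hypothetical and chosen to keep the example small; `roll_up_rare_items` is an illustrative helper, not a standard function.

```python
from collections import Counter


def roll_up_rare_items(transactions, taxonomy, threshold):
    """Replace items rarer than `threshold` with their taxonomy group."""
    n = len(transactions)
    freq = Counter(item for basket in transactions for item in set(basket))
    rolled = []
    for basket in transactions:
        rolled.append({
            # Rare items map to their group; items with no group stay as they are.
            taxonomy.get(item, item) if freq[item] / n < threshold else item
            for item in basket
        })
    return rolled


taxonomy = {"Cherry popsicle": "Popsicle", "Grape popsicle": "Popsicle"}
transactions = [
    {"Milk", "Cherry popsicle"},
    {"Milk", "Grape popsicle"},
    {"Milk"},
    {"Milk", "Cola"},
]
rolled = roll_up_rare_items(transactions, taxonomy, threshold=0.5)
```

Each flavor alone appears in only a quarter of the baskets, but the combined "Popsicle" group appears in half of them, so it now clears the threshold instead of being thrown away.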


Multidimensional Market Basket Analysis

• Rules can involve more than two items, for example Plant and Clay Pot IMPLIES Soil.

• These rules are built iteratively. First, pairs are found, then relevant sets of three or four.

• These are then pruned by removing those that occur infrequently.

• In an environment like a grocery store, where customers commonly buy over 100 items, rules could involve as many as 10 items.
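The iterative build-and-prune process resembles the well-known Apriori approach; the slides do not name an algorithm, so that label is an assumption. The sketch below grows itemsets one item at a time on hypothetical garden-store baskets and drops any set seen fewer than `min_count` times.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Grow itemsets one item at a time, pruning infrequent ones at each size."""
    items = sorted({i for basket in transactions for i in basket})
    current = [frozenset([i]) for i in items]
    result = {}
    for size in range(1, max_size + 1):
        counts = {s: sum(s <= basket for basket in transactions) for s in current}
        kept = {s: c for s, c in counts.items() if c >= min_count}
        result.update(kept)
        # Candidates for the next size: unions of surviving sets, one item larger.
        current = list(
            {a | b for a, b in combinations(kept, 2) if len(a | b) == size + 1}
        )
        if not current:
            break
    return result


baskets = [
    {"Plant", "Clay pot", "Soil"},
    {"Plant", "Clay pot", "Soil"},
    {"Plant", "Soil"},
    {"Clay pot"},
    {"Plant", "Clay pot"},
]
found = frequent_itemsets(baskets, min_count=2)
triple = found[frozenset({"Plant", "Clay pot", "Soil"})]
pair = found[frozenset({"Plant", "Clay pot"})]
rule_confidence = triple / pair  # Plant and Clay Pot IMPLIES Soil
```

Pruning early matters because the number of candidate sets grows very quickly with basket size, which is why rules of 10 items are only practical after infrequent pairs and triples have been removed.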


Current Limitations and Challenges to Data Mining

Despite its potential power and value, data mining is still a new field. Some things that have thus far limited advancement are:

– Identification of missing information – not all knowledge gets stored in a database

– Data noise and missing values – future systems need better ways to handle this

– Large databases and high dimensionality – future applications need ways to partition data into more manageable chunks


Popular tools and languages by industry types

David Smith 2012


CLOSING CASE

1. Review the five common characteristics of high-quality information and rank them in order of importance for a government organization.

2. How could data warehouses and data marts be used to help the marketing departments of travel companies improve the efficiency and effectiveness of their operations?