Fox MIS Spring 2011 Data Mining Week 9 Introduction to Data Mining




Page 1

Fox MIS Spring 2011

Data Mining

Week 9: Introduction to Data Mining

Page 2

Data Warehouse

[Diagram: Customer table (Customer No., Name, Address, Membership); Product table (Product No., Product Name, Price, Description); External Source; MySQL; ERD; Data Warehouse; Data Mining; Better Understanding; Good Business Decision; Performance; Competitive Advantage]

Page 3

Defining User Communities

• Information user
  – Generally requires standard reports that often include charts and tables
  – Wants to scan consistently structured reports without needing to slice or dice to find the desired values
  – Static or simple interactive reports
• Information consumer
  – Requires the ability to query the database dynamically, without becoming an expert in database design or the query tool
  – Ad hoc multidimensional analysis
  – Many business people cross the line between information users and information consumers
• Power analyst
  – Requires the full analytical power of the data mart to perform free-form ad hoc analysis

Page 4

Some Questions Analysts Need to Answer

• Sales analysis:
  – What are the sales by quarter and geography?
  – How do sales compare in two different stores in the same state?
• Profitability analysis:
  – Which is the most profitable store in the state of CA?
  – Which product lines are the highest revenue producers this year?
  – Which products and product lines are the most profitable this quarter?
• Sales force analysis:
  – Which salesperson is the best revenue producer this year?
  – Did salesperson X meet his sales target this quarter?
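Most of these sales and sales-force questions reduce to aggregating a table of transactions. The sketch below is a minimal illustration under stated assumptions: the (quarter, state, store, salesperson, revenue) records are invented, and the function names and the revenue-based target check are hypothetical, not part of the course material.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, state, store, salesperson, revenue).
# These values are invented for illustration only.
SALES = [
    ("Q1", "CA", "Store 1", "Ann", 1200.0),
    ("Q1", "CA", "Store 2", "Bob", 800.0),
    ("Q1", "NY", "Store 3", "Cy", 950.0),
    ("Q2", "CA", "Store 1", "Ann", 1500.0),
    ("Q2", "CA", "Store 2", "Bob", 700.0),
    ("Q2", "NY", "Store 3", "Cy", 1100.0),
]


def sales_by_quarter_and_state(records):
    """Total revenue for each (quarter, state) pair."""
    totals = defaultdict(float)
    for quarter, state, _store, _person, revenue in records:
        totals[(quarter, state)] += revenue
    return dict(totals)


def best_salesperson(records):
    """Salesperson with the highest total revenue across all records."""
    totals = defaultdict(float)
    for _quarter, _state, _store, person, revenue in records:
        totals[person] += revenue
    return max(totals, key=totals.get)


def met_target(records, person, quarter, target):
    """True if the person's revenue in the given quarter reaches the target."""
    total = sum(r[4] for r in records if r[3] == person and r[0] == quarter)
    return total >= target


if __name__ == "__main__":
    print(sales_by_quarter_and_state(SALES))
    print(best_salesperson(SALES))
    print(met_target(SALES, "Bob", "Q2", 750.0))
```

With this invented data, Ann comes out as the top revenue producer and Bob misses a hypothetical Q2 target of 750. Against a real warehouse, the same questions would more typically be answered with SQL `GROUP BY` or OLAP queries than with application code.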

Page 5

Finding a Pattern from Data

• Tenure and sick days by department
  – Average tenure for each department: 9.0
  – Average number of sick days for each department: 7.5
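The point of this slide appears to be that identical averages can hide a relationship, which a graph can expose. The sketch below illustrates that idea with invented data: two hypothetical departments that both match the slide's averages (tenure 9.0, sick days 7.5) but whose values move in opposite directions. The Pearson correlation coefficient is a measure chosen for this illustration, not one taken from the slides.

```python
from statistics import mean

# Hypothetical (tenure, sick days) pairs for two departments.
# Both share the slide's averages (9.0 and 7.5); the values are invented.
DEPARTMENTS = {
    "A": [(6, 4.5), (8, 6.5), (10, 8.5), (12, 10.5)],
    "B": [(6, 10.5), (8, 8.5), (10, 6.5), (12, 4.5)],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(pairs):
    """Return (average tenure, average sick days, correlation)."""
    tenure = [t for t, _ in pairs]
    sick = [s for _, s in pairs]
    return mean(tenure), mean(sick), pearson(tenure, sick)


if __name__ == "__main__":
    for name, pairs in DEPARTMENTS.items():
        avg_t, avg_s, r = summarize(pairs)
        print(f"{name}: tenure={avg_t:.1f} sick={avg_s:.1f} r={r:+.2f}")
```

Both departments report the same averages, yet the correlation is close to +1 for A and close to -1 for B, a difference that only shows up once the individual values are examined or plotted.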

Page 6

Finding a Pattern: Graphical Representation

Page 7

Data Analysis Evolutionary Step

• Data Collection (1960s)
  – Business question: "What was my total revenue in the last five years?"
  – Enabling technologies: computers, tapes, disks
  – Characteristics: retrospective, static data delivery
• Data Access (1980s)
  – Business question: "What were unit sales in New England last March?"
  – Enabling technologies: relational databases (RDBMS), Structured Query Language (SQL)
  – Characteristics: retrospective, dynamic data delivery at record level
• Data Warehousing & Decision Support (1990s)
  – Business question: "What were unit sales in New England last March? Drill down to Boston."
  – Enabling technologies: on-line analytic processing (OLAP), multidimensional databases, data warehouses
  – Characteristics: retrospective, dynamic data delivery at multiple levels
• Data Mining (Emerging Today)
  – Business question: "What’s likely to happen to Boston unit sales next month? Why?"
  – Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
  – Characteristics: prospective, proactive information delivery

Page 8

Data Mining

• The application of specific algorithms for extracting patterns from data
• Data mining tools automatically search data for patterns and relationships
• Data mining tools
  – Analyze data
  – Uncover problems or opportunities
  – Form computer models based on findings
  – Predict business behavior with models
  – Require minimal end-user intervention

Page 9

Data Mining

• Goal
  – Simplification and automation of the overall statistical process, from data source(s) to model application
• Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:
  – Massive data collection
  – Powerful multiprocessor computers
  – Data mining algorithms

Page 10

Convergence of Three Key Technologies

Page 11

Data Mining and Knowledge Discovery in the Real World

• Marketing
  – If a customer bought X, he or she is also likely to buy Y and Z
• Investment
  – Stock investment
• Fraud detection
  – Identify financial transactions that might indicate money-laundering activity
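The marketing example (customers who bought X also tend to buy Y and Z) is an association rule. One common way to score such a rule, assumed here rather than taken from the slides, uses support (how often the items appear together in a basket) and confidence (how often the consequent appears when the antecedent does). A minimal sketch over hypothetical baskets:

```python
# Hypothetical market baskets; item names X, Y, Z mirror the slide's example.
BASKETS = [
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X", "Z"},
    {"X", "Y", "Z"},
    {"Y"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets containing the antecedent that also contain the consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / base


if __name__ == "__main__":
    print(support({"X", "Y", "Z"}, BASKETS))       # 0.4
    print(confidence({"X"}, {"Y", "Z"}, BASKETS))  # 0.5
```

Here half of the invented baskets that contain X also contain both Y and Z. Production tools search many candidate rules at once (for example with the Apriori algorithm) and keep only those above chosen support and confidence thresholds.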

Page 12

A Problem...

• You are a marketing manager for a brokerage company
• Problem: churn is too high
  – Turnover (after the six-month introductory period ends) is 40%
  – Customers receive incentives (average cost: $160) when an account is opened
  – Giving new incentives to everyone who might leave is very expensive (as well as wasteful)
  – Bringing back a customer after they leave is both difficult and costly

Page 13

… A Solution

• One month before the introductory period ends, predict which customers will leave
• If you want to keep a customer who is predicted to churn, offer them something based on their predicted value
• Customers who are not predicted to churn need no attention
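This solution amounts to a decision rule applied to model output. The sketch below assumes a hypothetical churn model has already produced, for each customer, a churn prediction and a predicted value. It reads "based on their predicted value" in one simple way: make an offer only when the customer is predicted to churn and is worth more than the offer costs. The `Customer` fields, the sample customers, and the assumption that a retention offer costs the same $160 as the average opening incentive are all illustrative.

```python
from dataclasses import dataclass

# Assumption: a retention offer costs about the same as the $160 average
# opening incentive quoted on the previous slide.
OFFER_COST = 160.0


@dataclass
class Customer:
    customer_id: str
    predicted_churn: bool   # hypothetical output of a churn model
    predicted_value: float  # hypothetical predicted future value, in dollars


def customers_to_target(customers, offer_cost=OFFER_COST):
    """Ids of customers predicted to churn whose predicted value exceeds the offer cost."""
    return [
        c.customer_id
        for c in customers
        if c.predicted_churn and c.predicted_value > offer_cost
    ]


def campaign_cost(num_offers, offer_cost=OFFER_COST):
    """Total cost of sending num_offers retention offers."""
    return num_offers * offer_cost


if __name__ == "__main__":
    customers = [
        Customer("c1", True, 900.0),
        Customer("c2", True, 100.0),
        Customer("c3", False, 1200.0),
        Customer("c4", False, 300.0),
        Customer("c5", True, 400.0),
    ]
    targets = customers_to_target(customers)
    print(targets)                        # ['c1', 'c5']
    print(campaign_cost(len(customers)))  # 800.0 (offer to everyone)
    print(campaign_cost(len(targets)))    # 320.0 (targeted offers only)
```

Customers not predicted to churn (c3, c4) and churners worth less than the offer (c2) receive nothing, which is what keeps the targeted campaign cheaper than a blanket one.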

Page 14

A weather problem

Page 15

A numeric weather problem

Page 16

Benefits of Data Mining

• Data mining creates new business opportunities by providing these capabilities:
• Automated prediction of trends and behaviors
  – Targeted marketing: use promotional mailings to reach the targets most likely to maximize return on investment in future mailings
  – Forecasting bankruptcy and other forms of default
• Automated discovery of previously unknown patterns
  – Data mining tools sweep through databases and identify previously hidden patterns in one step
  – Analysis of retail sales data to identify seemingly unrelated products that are often purchased together

Page 17

Descriptive Data Mining

• Descriptive Data Mining

  – Seeks to describe new patterns in the data; requires human interaction to determine the significance and meaning of these patterns
  – Affinity grouping
    • Which items go together
  – Clustering
    • Divides data into smaller groups based on similarity, without predefining the groups
    • Example: customers with similar buying habits
  – Visualization
    • Graphical representation of data
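K-means, one common clustering algorithm, is a natural way to illustrate the clustering bullet. The sketch below is plain Lloyd-style k-means on invented two-feature customer records (visits per month, average basket in dollars). The features, the data, and seeding the centroids with the first k points are assumptions made to keep the example small and deterministic. Note that k-means still needs the number of groups k up front, even though the group memberships themselves are not predefined.

```python
# Hypothetical customers: (visits per month, average basket in dollars).
CUSTOMERS = [
    (2.0, 20.0), (3.0, 25.0), (2.5, 22.0),    # infrequent, small baskets
    (10.0, 80.0), (11.0, 85.0), (9.5, 78.0),  # frequent, large baskets
]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Lloyd's k-means; seeds centroids with the first k points for determinism."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(CUSTOMERS, 2)
    print(labels)
    print(centroids)
```

On this data the two invented buying-habit groups separate after a few iterations. As the slide notes, a person still has to decide what each group means, for example "occasional small-basket shoppers".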

Page 18

Predictive Data Mining

• Predicts the likelihood of a particular outcome

• Mathematical algorithms are used to create models

• Classification

– A new record is assigned to a specific category defined by the model

– Example: new credit applicants classified as low risk, medium risk, or high risk

• Estimation

– Assigns a predicted value to a new record

– Example: the length of time a customer will stay
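Both predictive tasks can be sketched in a few lines. The example below assumes invented data. Classification uses a k-nearest-neighbor majority vote, one of several possible classifiers. Estimation uses ordinary least-squares linear regression on a single hypothetical predictor. The feature names, the data, and k=3 are illustrative assumptions.

```python
from collections import Counter

# Hypothetical past applicants: (debt-to-income ratio, credit utilization) -> known risk class.
APPLICANTS = [
    ((0.10, 0.20), "low"),
    ((0.15, 0.25), "low"),
    ((0.30, 0.50), "medium"),
    ((0.35, 0.45), "medium"),
    ((0.60, 0.85), "high"),
    ((0.55, 0.90), "high"),
]

# Hypothetical history: (number of products held, months the customer stayed).
TENURE_HISTORY = [(1, 6.0), (2, 10.0), (3, 14.0), (4, 18.0)]


def knn_classify(record, labeled, k=3):
    """Classification: majority class among the k labeled records nearest to record."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(labeled, key=lambda item: dist2(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_line(pairs):
    """Estimation: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    print(knn_classify((0.12, 0.22), APPLICANTS))  # low
    print(knn_classify((0.58, 0.88), APPLICANTS))  # high
    slope, intercept = fit_line(TENURE_HISTORY)
    print(slope * 5 + intercept)                   # 22.0 (months, for 5 products)
```

Classification returns one of the model's predefined categories, while estimation returns a number. Both features in the classifier lie on a similar 0 to 1 scale; with real data, features usually need scaling before distances are meaningful.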

Page 19

Defining Data Mining

• The automated extraction of predictive information from (large) databases

• Two key words:
  – Automated
  – Predictive

• Data mining lets you be proactive
• Prospective rather than retrospective

Page 20

How Data Mining Works: Modeling

• Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't.
• Some models are better than others, judged by:
  – Accuracy
  – Understandability
• Models range from “easy to understand” to incomprehensible:
  – Decision trees
  – Rule induction
  – Regression models
  – Neural networks
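The modeling idea (fit where the answer is known, apply where it is not) can be sketched with the simplest possible decision tree, a one-split "decision stump". Everything here is a hypothetical illustration: the feature (months since a customer's last trade), the labeled data, and the rule form "churn if x > threshold". The held-out labeled records stand in for "another situation", so the model's accuracy can be checked before it is applied to truly unknown cases.

```python
# Hypothetical labeled history: (months since last trade, churned?).
TRAIN = [(1, False), (2, False), (3, False), (4, False), (5, True), (6, True), (8, True)]
# Hypothetical held-out records: answers are known but not used for fitting.
HOLDOUT = [(2, False), (3, False), (4, True), (5, True), (7, True)]


def accuracy(threshold, data):
    """Share of records where the rule 'churn if x > threshold' matches the label."""
    return sum((x > threshold) == label for x, label in data) / len(data)


def fit_stump(data):
    """Pick the threshold (among observed x values) with the best training accuracy."""
    return max(sorted({x for x, _ in data}), key=lambda t: accuracy(t, data))


def predict(threshold, x):
    """Apply the fitted rule to a new case whose answer is unknown."""
    return x > threshold


if __name__ == "__main__":
    t = fit_stump(TRAIN)
    print(t)                     # 4
    print(accuracy(t, HOLDOUT))  # 0.8
    print(predict(t, 6))         # True
```

The stump scores perfectly on its training data but only 0.8 on the held-out records, which is why accuracy is measured on data the model has not seen. It also sits at the "easy to understand" end of the range: the entire model is one readable rule.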

Page 21

Techniques in Data Mining

• Decision Trees

• Nearest Neighbor Classification

• Neural Networks

• Rule Induction

• K-means Clustering

Page 22

Distinctions

Page 23

Distinctions (Continued)