3. DATA MINING Business Intelligence 1

Business Intelligence 3. DATA MINING - KOCW, contents.kocw.net/KOCW/document/2014/koreasejong/... · 2016-09-09


3. DATA MINING

Business Intelligence


Data Mining

Definition

“Computational process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data using methods from artificial intelligence, machine learning, statistics, and database systems.”

Alternative names

◦ Knowledge discovery in databases (KDD), knowledge extraction, data/pattern analysis, business intelligence, big data, etc.


Background

Explosive Growth of Data

◦ Transactions, web, e-commerce, mobile computing, sensors (weather sensors, closed-circuit cameras), etc.

We are drowning in data, but starving for knowledge!

To mine knowledge from abundant data, we need automated analysis

One of the technologies for Big Data


A Typical Data Mining System

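[Figure: components of a typical data mining system, from the user-facing layer down to the data sources]

◦ Visualization

◦ Pattern Evaluation

◦ Data Mining Engine

◦ Database or Data Warehouse Server

◦ Data cleaning, integration, and selection

◦ Data sources: Database, Data Warehouse, World Wide Web, Other Info Repositories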


Data Mining is a Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on the following disciplines]

◦ Database Technology

◦ Statistics

◦ Machine Learning

◦ Pattern Recognition

◦ Algorithm

◦ Visualization

◦ Other Disciplines


Data Mining Classifications

Pattern Mining and Trend Analysis

◦ Association, regression

E.g., Diaper → Beer [support 2%, confidence 75%]

◦ Trend: common temporal behaviors

Classification (symbolic) and prediction (numeric)

◦ E.g., classify a given country based on its ‘climate’ (into tropical climate, temperate climate, polar climate country)

◦ E.g., predicting temperatures and plants in a country.

Cluster analysis

◦ Group data into different clusters that maximize intra-class similarity and inter-class dissimilarity

E.g., cluster animals into various classes (mammal, fish, etc.)

Outlier analysis

◦ Find data objects that do not comply with the general characteristics of the data

E.g., credit card fraud, call fraud, money laundering, etc.
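The two bracketed percentages attached to an association rule are conventionally its support (share of all transactions containing both sides of the rule) and its confidence (share of transactions containing the left side that also contain the right side). The following is a minimal sketch of that computation; the function name and the toy baskets are hypothetical and invented purely for illustration.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions that contain every item of the left-hand side.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions that contain both sides of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy baskets, not taken from the slides.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
]
support, confidence = support_and_confidence(baskets, {"diaper"}, {"beer"})
```

Here two of the four baskets contain both items, and two of the three diaper baskets also contain beer.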


Business Applications of DM

Where does the business data come from?

◦ Credit card transactions, loyalty cards, discount coupons, customer complaint calls, (public) lifestyle studies, mobile devices, sensors, etc.

Where is DM applied?

◦ Customer Clustering

◦ Customer Profiling

◦ Determine customer purchasing patterns

◦ Cross-market analysis

◦ Predict what factors will attract new customers

◦ Fraud detection: auto insurance, medical insurance, phone-call frauds, or money laundering detection

◦ Anti-terrorism


PREDICTION

Business Intelligence 3. Data Mining


Classification and Prediction

Classification (symbolic)

◦ Construct a classification model using a training set (having values and class labels), then classify new instances into categories using the model

◦ Techniques: if/then rules, decision trees, artificial neural networks

Prediction (numeric)

◦ Predicts unknown or missing numeric values

◦ Construct models of continuous-valued functions, then use them for predictions

◦ Techniques: regression

Typical applications of classification and prediction

◦ Credit approval, Target marketing, Medical diagnosis, Fraud detection, etc.


Prediction

Predict a numeric value for a target variable given the values of various condition variables.

Two Step Process

1. Construct a model

2. Use model to predict a value for a given input

Major method for prediction: regression

◦ Model the relationship between one or more independent or predictor variables and a dependent or response variable

Regression analysis

◦ Linear regression

◦ Multiple linear regression

◦ Non-linear regression

◦ Etc.

Linear Regression

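Linear regression: involves a response variable y and a single predictor variable x

$$y = w_0 + w_1 x$$

where w0 (y-intercept) and w1 (slope) are regression coefficients

Method of least squares: estimates the best-fitting straight line

$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$$

where |D| is the number of training pairs (x_i, y_i) and x̄, ȳ are the means of the x and y values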
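The least-squares estimates of the slope w1 and the intercept w0 can be sketched in a few lines: w1 is the sum of products of the deviations of x and y from their means divided by the sum of squared deviations of x, and w0 is the mean of y minus w1 times the mean of x. The function names and the sample points below are hypothetical; the points were chosen to lie exactly on y = 1 + 2x so the result is easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (w0, w1) minimizing the squared error of y = w0 + w1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of x (must be non-zero).
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return w0, w1


def predict(w0, w1, x):
    """Step 2 of the two-step process: use the fitted model on a new input."""
    return w0 + w1 * x


# Hypothetical training data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w0, w1 = fit_simple_linear_regression(xs, ys)
```

The sketch assumes at least two distinct x values; otherwise the denominator is zero and no unique line exists.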


Multiple Linear Regression

Multiple linear regression: involves more than one predictor variable

◦ Training data is in the form of (x1, x2, y)

Ex. For 2-D data

◦ Solvable by extension of the least-squares method for y = w0 + w1 x1 + w2 x2
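One common way to extend least squares to several predictors is to solve the normal equations (XᵀX) w = Xᵀy, where each row of X is (1, x1, x2). The slides do not say which extension is intended, so the sketch below is one possible approach rather than the course's method; the helper names and the sample data (generated from y = 1 + 2 x1 + 3 x2) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations update both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_multiple_linear_regression(rows, ys):
    """Least-squares fit of y = w0 + w1*x1 + w2*x2 + ... via normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 models w0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical 2-D training data generated from y = 1 + 2*x1 + 3*x2.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]
w0, w1, w2 = fit_multiple_linear_regression(rows, ys)
```

The sketch assumes the predictors are not perfectly collinear; for larger or ill-conditioned problems a numerical library routine would usually be preferred.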


Nonlinear Regression

Some nonlinear models can be modeled by a polynomial function

A polynomial regression model can be transformed into a multiple linear regression model.

For example, $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$

can be converted to linear form with new variables: $x_2 = x^2$, $x_3 = x^3$

$y = w_0 + w_1 x + w_2 x_2 + w_3 x_3$

Other functions, such as power functions, can also be transformed to a linear model
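As an illustration of the power-function case, assume a model of the form y = a x^b with a > 0, x > 0, and y > 0 (the slides do not give a specific form, so this is an assumed example). Taking logarithms of both sides yields a model that is linear in the transformed variables:

```latex
y = a\,x^{b}
\;\Longrightarrow\;
\ln y = \ln a + b \ln x
\;\Longrightarrow\;
y' = w_0 + w_1 x',
\quad\text{with } y' = \ln y,\; x' = \ln x,\; w_0 = \ln a,\; w_1 = b
```

Fitting w0 and w1 by ordinary least squares on (ln x, ln y) then gives a = e^{w0} and b = w1.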


Accuracy Measures

                      Predicted C       Predicted ~C      Total (fact)
Fact C                True positive     False negative    Positive
Fact ~C               False positive    True negative     Negative
Total (prediction)    Pos-predictions   Neg-predictions   Total

Accuracy measures

◦ accuracy = (t-pos + t-neg) / (pos + neg)

◦ error-rate = 1 - accuracy

◦ sensitivity = t-pos / pos (e.g., identifying cows with Mad Cow Disease: find all positive cases)

◦ specificity = t-neg / neg (e.g., identifying professional patients: find all negative cases)
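The four measures can be computed directly from the cells of the confusion matrix, where pos and neg are the actual (fact) totals of each class. The function below is a minimal sketch with a hypothetical name, and the counts passed to it are invented for illustration only.

```python
def accuracy_measures(t_pos, f_neg, f_pos, t_neg):
    """Compute the slide's four measures from confusion-matrix cell counts."""
    pos = t_pos + f_neg  # all instances that are actually positive
    neg = f_pos + t_neg  # all instances that are actually negative
    accuracy = (t_pos + t_neg) / (pos + neg)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": t_pos / pos,  # share of actual positives found
        "specificity": t_neg / neg,  # share of actual negatives found
    }


# Hypothetical counts, not taken from the slides.
measures = accuracy_measures(t_pos=40, f_neg=10, f_pos=5, t_neg=45)
```

With these counts, 85 of 100 instances are classified correctly, 40 of 50 actual positives are found, and 45 of 50 actual negatives are found.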



Accuracy Measures – An Exercise

Compute accuracy, error_rate, sensitivity, and specificity for the following example

classes               buy_computer = yes   buy_computer = no   total   recognition (%)
buy_computer = yes    6954                 46                  7000    99.34
buy_computer = no     412                  2588                3000    86.27
total                 7366                 2634                10000   95.42
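A worked check of the exercise, assuming that rows are the actual classes, columns are the predicted classes, and buy_computer = yes is treated as the positive class C (the slide does not state this explicitly). The values follow from the four cell counts alone:

```latex
\begin{aligned}
\text{accuracy}    &= \frac{6954 + 2588}{10000} = 0.9542 \\
\text{error-rate}  &= 1 - 0.9542 = 0.0458 \\
\text{sensitivity} &= \frac{6954}{7000} \approx 0.9934 \\
\text{specificity} &= \frac{2588}{3000} \approx 0.8627
\end{aligned}
```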
