Introduction to Mathematical Programming MA/OR 504 Chapter 7 Machine Learning: Discriminant Analysis Neural Networks 6-1


Introduction to Mathematical Programming

MA/OR 504

Chapter 7

Machine Learning:

Discriminant Analysis

Neural Networks


Part 1: Discriminant Analysis and Mahalanobis Distance

Chapter 7


Introduction to Discriminant Analysis (DA)

DA is a statistical technique that uses information from a set of independent variables to predict the value of a discrete or categorical dependent variable.

The goal is to develop a rule for predicting to which of two or more predefined groups a new observation belongs based on the values of the independent variables.

Examples:

– Credit Scoring: Will a new loan applicant (1) default or (2) repay?

– Insurance Rating: Will a new client be (1) high, (2) medium, or (3) low risk?


Types of DA Problems

2-Group Problems: regression can be used.

k-Group Problems (where k ≥ 2): regression cannot be used if k > 2.


Example of a 2-Group DA Problem: ACME Manufacturing

All employees of ACME Manufacturing are given a pre-employment test measuring mechanical and verbal aptitude.

Each current employee has also been classified into one of two groups: satisfactory or unsatisfactory.

We want to determine whether the two groups of employees differ with respect to their test scores.

If so, we want to develop a rule for predicting whether new applicants will be satisfactory or unsatisfactory.


The Data

See file Fig7-1.xls


Graph of Data for Current Employees

[Figure: scatter plot of verbal aptitude (vertical axis, 25 to 45) against mechanical aptitude (horizontal axis, 25 to 50) for satisfactory and unsatisfactory employees, with the group 1 and group 2 centroids marked C1 and C2.]


Calculating Discriminant Scores

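$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$

where $\hat{Y}_i$ is the discriminant score of observation i (obtained by regressing the group number, 1 or 2, on the test scores), X1 = mechanical aptitude test score, and X2 = verbal aptitude test score.

For our example, using regression we obtain:

$$\hat{Y}_i = 5.373 - 0.0791 X_{1i} - 0.0272 X_{2i}$$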

Figure 7-2
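The regression step above can be sketched in Python with NumPy, as an illustration only (the course itself works in Excel, Fig7-1.xls). `fit_discriminant_scores` and `acme_score` are hypothetical names: the first fits $b_0, b_1, b_2$ by ordinary least squares with the group number as the response, and the second simply evaluates the fitted equation reported for the ACME data.

```python
import numpy as np


def fit_discriminant_scores(x1, x2, group):
    """Least-squares fit of the group code on two predictors.

    Returns (b0, b1, b2) for Y_hat = b0 + b1*x1 + b2*x2.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(group, dtype=float)
    # Design matrix: a column of ones for the intercept, then the predictors
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(coef)


def acme_score(mech, verbal):
    """Discriminant score using the coefficients reported for the ACME example."""
    return 5.373 - 0.0791 * mech - 0.0272 * verbal
```

For example, `acme_score(40, 37)` gives the discriminant score of a hypothetical applicant with a mechanical score of 40 and a verbal score of 37.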


A Classification Rule

If an observation’s discriminant score is less than or equal to some cutoff value, then assign it to group 1; otherwise assign it to group 2

What should the cutoff value be?


Possible Distributions of Discriminant Scores

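[Figure: overlapping distributions of discriminant scores for Group 1 and Group 2, centered at their means Ȳ1 and Ȳ2, with the cutoff value between them.]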


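Cutoff Value

For data that are multivariate normal with equal covariances (and with equal group probabilities and equal misclassification costs), the optimal cutoff value is:

$$\text{Cutoff Value} = \frac{\bar{Y}_1 + \bar{Y}_2}{2}$$

where $\bar{Y}_1$ and $\bar{Y}_2$ are the mean discriminant scores of groups 1 and 2.

For our example, the cutoff value is:

$$\text{Cutoff Value} = \frac{1.193 + 1.764}{2} \approx 1.479$$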

Even when the data is not multivariate-normal, this cutoff value tends to give good results.
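The midpoint cutoff and the classification rule (score less than or equal to the cutoff means group 1) can be sketched as two small Python functions. The names `midpoint_cutoff` and `classify` are hypothetical; the group means 1.193 and 1.764 are the values from the ACME example.

```python
def midpoint_cutoff(mean1, mean2):
    """Cutoff halfway between the two groups' mean discriminant scores."""
    return (mean1 + mean2) / 2.0


def classify(score, cutoff):
    """Assign group 1 if the score is <= cutoff, otherwise group 2."""
    return 1 if score <= cutoff else 2


cutoff = midpoint_cutoff(1.193, 1.764)
print(round(cutoff, 4))  # 1.4785 (reported as 1.479 on the slide)
```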


Calculating Predicted Group

See file Fig7-3.xls


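A Refined Cutoff Value

Costs of misclassification may differ, and the probabilities of group membership may differ. The following refined cutoff value accounts for these considerations:

$$\text{Cutoff Value} = \frac{\bar{Y}_1 + \bar{Y}_2}{2} + \frac{S_p^2}{\bar{Y}_1 - \bar{Y}_2}\,\ln\!\left(\frac{p_2\,C(1|2)}{p_1\,C(2|1)}\right)$$

where $p_j$ is the probability that an observation belongs to group j, $C(i|j)$ is the cost of classifying into group i an observation that actually belongs to group j, and $S_p^2$ is the pooled (within-group) sample variance of the discriminant scores. When $p_1 C(2|1) = p_2 C(1|2)$, the logarithm is zero and the refined cutoff reduces to the midpoint cutoff.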
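A minimal Python sketch of the refined cutoff formula, assuming the reconstructed form above. `refined_cutoff` and its parameter names are hypothetical, and the pooled variance must be supplied by the user (the slides do not report it for the ACME data).

```python
import math


def refined_cutoff(mean1, mean2, pooled_var, p1, p2, cost_2_given_1, cost_1_given_2):
    """Refined two-group cutoff on the discriminant-score scale.

    cost_2_given_1 = C(2|1): cost of assigning a group-1 observation to group 2.
    cost_1_given_2 = C(1|2): cost of assigning a group-2 observation to group 1.
    """
    midpoint = (mean1 + mean2) / 2.0
    log_term = math.log((p2 * cost_1_given_2) / (p1 * cost_2_given_1))
    return midpoint + pooled_var / (mean1 - mean2) * log_term
```

With equal probabilities and equal costs the function returns the midpoint. Making group 1 more likely raises the cutoff, so more observations are assigned to group 1; making $C(1|2)$ larger lowers it.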


Classification Accuracy

                    Predicted Group
                     1      2    Total
Actual Group 1       9      2     11
Actual Group 2       2      7      9
Total               11      9     20

Accuracy rate = 16/20 = 80%
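The accuracy rate is the number of correctly classified observations (the diagonal of the table) divided by the total. The Python sketch below rebuilds the 20 (actual, predicted) pairs implied by the table; `confusion_counts` and `accuracy` are hypothetical helper names.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of observations whose predicted group equals the actual group."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Pairs implied by the table: 11 actual group-1 and 9 actual group-2 observations
actual = [1] * 11 + [2] * 9
predicted = [1] * 9 + [2] * 2 + [1] * 2 + [2] * 7
print(accuracy(actual, predicted))  # 0.8
```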


Classifying New Employees

See file Fig7-4.xls


The k-Group DA Problem

Suppose we have 3 groups (A = 1, B = 2, and C = 3) and one independent variable. We could then fit the following regression function:

$$\hat{Y}_i = b_0 + b_1 X_{1i}$$

The classification rule is then:

If $\hat{Y}_i \le 1.5$, assign the observation to group A.

If $1.5 < \hat{Y}_i \le 2.5$, assign the observation to group B.

If $\hat{Y}_i > 2.5$, assign the observation to group C.
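The threshold rule above can be sketched in Python. `assign_group` is a hypothetical name; it assumes groups are coded 1, 2, ..., k in the order of `labels`, with boundaries halfway between consecutive codes (1.5, 2.5, ...). As the next slides show, this only makes sense when the chosen coding is linearly related to X.

```python
import bisect


def assign_group(score, labels=("A", "B", "C")):
    """Map a regression score to a group label using midpoints between codes."""
    # Codes 1..k, so the boundaries are 1.5, 2.5, ..., k - 0.5
    boundaries = [code + 0.5 for code in range(1, len(labels))]
    # bisect_left keeps a score equal to a boundary in the lower group
    return labels[bisect.bisect_left(boundaries, score)]
```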


Graph Showing Linear Relationship

[Figure: group number Y (0 to 3) plotted against X (0 to 13) for Groups A, B, and C.]


The k-Group DA Problem

Now suppose we reassign the group numbers as follows: A = 2, B = 1, and C = 3.

The relationship between X and Y is no longer linear. In general, there is no way to ensure that group numbers are assigned so that they always produce a linear relationship.


Graph Showing Nonlinear Relationship

[Figure: group number Y (0 to 3) plotted against X (0 to 13) for Groups A, B, and C under the reassigned numbering.]


Example of a 3-Group DA Problem: ACME Manufacturing

All employees of ACME Manufacturing are given a pre-employment test measuring mechanical and verbal aptitude.

Each current employee has also been classified into one of three groups: superior, average, or inferior.

We want to determine whether the three groups of employees differ with respect to their test scores.

If so, we want to develop a rule for predicting whether new applicants will be superior, average, or inferior.


The Data

See file Fig7-5.xls


Graph of Data for Current Employees

[Figure: scatter plot of verbal aptitude (vertical axis, 25 to 45) against mechanical aptitude (horizontal axis, 25 to 50) for superior, average, and inferior employees, with the group 1, 2, and 3 centroids marked C1, C2, and C3.]


The Classification Rule

Compute the distance from the point in question to the centroid of each group.

Assign it to the closest group.


Distance Measures

Euclidean Distance

$$\text{Distance} = \sqrt{(A_1 - B_1)^2 + (A_2 - B_2)^2}$$

for points A = (A1, A2) and B = (B1, B2).

This does not account for possible differences in variances.
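The nearest-centroid rule with Euclidean distance can be sketched in plain Python. The helper names `euclidean`, `centroid`, and `nearest_centroid` are hypothetical; centroids are passed as a dictionary mapping a group label to its mean vector.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def nearest_centroid(point, centroids):
    """Return the label of the centroid closest to the point."""
    return min(centroids, key=lambda g: euclidean(point, centroids[g]))
```

For example, `nearest_centroid((1, 1), {1: (0, 0), 2: (5, 5)})` assigns the point to group 1. Because every variable is weighted equally, a variable measured on a large scale or with a large spread dominates the distance.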


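99% Contours of Two Groups

[Figure: 99% contours of two groups with centroids C1 and C2, plotted on axes X1 and X2, with a point P1 marked.]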


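Distance Measures

Variance-Adjusted Distance

$$D_{ij} = \sum_k \frac{(x_{ik} - \bar{x}_{jk})^2}{s_{jk}^2}$$

where $x_{ik}$ is the value of observation i on the kth independent variable, $\bar{x}_{jk}$ is the mean value of group j on the kth independent variable, and $s_{jk}^2$ is the sample variance of group j on the kth independent variable.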

This can be adjusted further to account for differences in covariances.

The DA.xla add-in uses the Mahalanobis distance measure.
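The variance-adjusted distance can be sketched in Python, together with a hypothetical one-variable illustration (the group means and variances are made up) of why it matters: a point can be closer in raw distance to a tightly clustered group while lying well within the spread of a widely dispersed group. `variance_adjusted_distance` is a hypothetical name.

```python
def variance_adjusted_distance(x, means, variances):
    """D_ij = sum over k of (x_ik - xbar_jk)^2 / s_jk^2 for one observation and one group."""
    return sum((xk - mk) ** 2 / vk for xk, mk, vk in zip(x, means, variances))


# Hypothetical groups: label -> (means, variances).
# Group 1 is widely spread; group 2 is tightly clustered.
groups = {1: ([0.0], [9.0]), 2: ([3.0], [0.25])}
point = [2.0]

adjusted = {g: variance_adjusted_distance(point, m, v) for g, (m, v) in groups.items()}
nearest_adjusted = min(adjusted, key=adjusted.get)  # group 1: 4/9 versus 4.0
nearest_euclid = min(groups, key=lambda g: abs(point[0] - groups[g][0][0]))  # group 2
```

Euclidean distance picks group 2 (distance 1 versus 2), while the variance-adjusted distance picks group 1, because a deviation of 2 is small relative to group 1's spread but a deviation of 1 is large relative to group 2's.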


Mahalanobis Distance

$$D = \left[(\mathbf{x}-\mathbf{m})^T C^{-1} (\mathbf{x}-\mathbf{m})\right]^{1/2}$$

where:

D = Mahalanobis distance

$\mathbf{x}$ = vector of data

$\mathbf{m}$ = vector of mean values of independent variables

$C^{-1}$ = inverse of covariance matrix of independent variables
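A minimal NumPy sketch of the Mahalanobis distance and a nearest-group rule built on it. The names are hypothetical, and this is not the DA.xla implementation; in particular, it assumes each group supplies its own mean vector and covariance matrix, whereas the add-in's internals (for example, whether it pools covariances) are not described here.

```python
import numpy as np


def mahalanobis(x, mean, cov):
    """D = [(x - m)^T C^{-1} (x - m)]^{1/2}."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve C z = (x - m) rather than forming C^{-1} explicitly
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))


def classify_mahalanobis(x, groups):
    """groups maps a label to (mean vector, covariance matrix); return the closest label."""
    return min(groups, key=lambda g: mahalanobis(x, *groups[g]))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance; with a diagonal covariance matrix its square equals the variance-adjusted distance.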


Using the DA.XLA Add-In

See file Fig7-6.xls

For details, see file Fig. 7-7.


Multivariate Normal Distribution

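$$\mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma), \qquad p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

where d is the number of variables, $\boldsymbol{\mu}$ is the mean vector, and $\Sigma$ is the covariance matrix.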
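The density can be evaluated directly with NumPy, as a sketch (`mvn_pdf` is a hypothetical name). The exponent is $-\tfrac{1}{2}$ times the squared Mahalanobis distance, which is why points at the same Mahalanobis distance from $\boldsymbol{\mu}$ lie on the same density contour.

```python
import numpy as np


def mvn_pdf(x, mu, sigma):
    """Multivariate normal density N(mu, sigma) evaluated at x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    d = mu.size
    diff = x - mu
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * quad) / norm)
```

For d = 1 and $\Sigma = [1]$, this reduces to the standard normal density, so `mvn_pdf([0.0], [0.0], [[1.0]])` equals $1/\sqrt{2\pi}$.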


Bivariate Normal

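If X and Y are independent, then Cov(X, Y) = 0. However, in general, Cov(X, Y) = 0 does not imply that X and Y are independent. The exception is when (X, Y) is bivariate normal: then Cov(X, Y) = 0 does imply independence.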
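A small Python illustration (hypothetical data, not from the slides) of zero covariance without independence: with X symmetric about zero and Y = X², the covariance is exactly zero even though Y is completely determined by X. This (X, Y) pair is not bivariate normal, so it does not contradict the normal case.

```python
import numpy as np

# X symmetric about 0; Y is a deterministic function of X
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Population covariance E[XY] - E[X]E[Y]; here E[XY] = E[X^3] = 0 and E[X] = 0
cov = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov)  # 0.0
```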



Example

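Suppose (X, Y) is bivariate normal with

$$\boldsymbol{\mu} = \begin{pmatrix} 500 \\ 500 \end{pmatrix}, \qquad C = \begin{pmatrix} 6292 & 3754 \\ 3754 & 6280 \end{pmatrix}, \qquad C^{-1} \approx \begin{pmatrix} 0.00025 & -0.00015 \\ -0.00015 & 0.00025 \end{pmatrix}$$

For (X, Y) = (410, 400), the squared Mahalanobis distance is:

$$D^2 = (\mathbf{x}-\boldsymbol{\mu})^T C^{-1} (\mathbf{x}-\boldsymbol{\mu}) \approx 1.825$$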
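The example can be checked numerically with NumPy, as a sketch. Using the rounded inverse shown on the slide reproduces 1.825 (0.675 + 1.15); solving with the unrounded covariance matrix gives about 1.818, so the difference comes from rounding $C^{-1}$ to two significant digits. Taking the square root gives $D \approx 1.35$.

```python
import numpy as np

mu = np.array([500.0, 500.0])
C = np.array([[6292.0, 3754.0], [3754.0, 6280.0]])
C_inv_rounded = np.array([[0.00025, -0.00015], [-0.00015, 0.00025]])
x = np.array([410.0, 400.0])

diff = x - mu
d2_rounded = diff @ C_inv_rounded @ diff  # uses the slide's rounded inverse
d2_exact = diff @ np.linalg.solve(C, diff)  # uses the full covariance matrix
```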


MBA Admissions

Salterdine University wants to use DA to determine which applicants to admit to its MBA program. The director believes that undergraduate GPA and GMAT score provide useful information for predicting which applicants will be good students.

Faculty classify 30 current students in the MBA program into 2 groups: (1) good students and (2) weak students.

Information for 5 new applicants has been received by the director.

See Fig. 7-8.


Bank Loans

The commercial loan department manager evaluates loan applications. Important company characteristics for evaluating a loan application are:

1. Liquidity (ratio of current assets to current liabilities)

2. Profitability (ratio of net profit to sales)

3. Activity (ratio of sales to fixed assets)

The 18 past loans the bank has made are categorized as:

1. Acceptable

2. One or two late payments

3. Unacceptable (3 or more late payments)

The manager must evaluate 5 new loan applications.

See Fig. 7-9.


End of Chapter 7