SPSS Discriminant Function Analysis.pdf

Embed Size (px)

Citation preview

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    1/58

    SPSS Discriminant FunctionAnalysis

    By Hui Bian

    Office for Faculty Excellence

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    2/58

    Discriminant Function Analysis

    What is Discriminant function analysisIt builds a predictive model for groupmembership

    The model is composed of a discriminantfunction based on linear combinations ofpredictor variables.

    Those predictor variables provide the bestdiscrimination between groups.

    2

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    3/58

    Discriminant Function Analysis

    Purpose of Discriminant analysis

    to maximally separate the groups.

    to determine the most parsimonious way toseparate groups

    to discard variables which are little relatedto group distinctions

    3

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    4/58

    Discriminant Function Analysis

    Summary: we are interested in the relationshipbetween a group of independent variables and onecategorical variable. We would like to know howmany dimensions we would need to express thisrelationship. Using this relationship, we canpredict a classification based on the independentvariables or assess how well the independentvariables separate the categories in theclassification.

    4

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    5/58

    Discriminant Function Analysis

    It is similar to regression analysis

    A discriminant score can be calculated based on theweighted combination of the independent variables

    D i = a + b 1 x 1 + b 2 x 2 ++ b nx n

    D i is predicted score (discriminant score)

    x is predictor and b is discriminant coefficient

    We use maximum likelihood technique to assign a caseto a group from a specified cut-off score.

    If group size is equal, the cut-off is mean score.If group size is not equal, the cut-off is calculated fromweighted means.

    5

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    6/58

    Discriminant Function Analysis

    Grouping variablesCategorical variables

    Can have more than two values

    The codes for the grouping variables must beintegers

    Independent variablesContinuous

    Nominal variables must be recoded to dummyvariables

    6

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    7/58

    Discriminant Function Analysis

    Discriminant functionA latent variable of a linear combination ofindependent variablesOne discriminant function for 2-group discriminant

    analysisFor higher order discriminant analysis, the number ofdiscriminant function is equal to g-1 (g is the numberof categories of dependent/grouping variable).The first function maximizes the difference betweenthe values of the dependent variable.

    The second function maximizes the differencebetween the values of the dependent variable whilecontrolling the first function.And so on.

    7

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    8/58

    Discriminant Function Analysis

    The first function will be the most powerfuldifferentiating dimension.

    The second and later functions may also representadditional significant dimensions of differentiation.

    8

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    9/58

    Discriminant Function Analysis

    Assumptions (from SPSS 19.0 help)Cases should be independent.Predictor variables should have a multivariate normaldistribution, and within-group variance-covariance

    matrices should be equal across groups.Group membership is assumed to be mutuallyexclusiveThe procedure is most effective when groupmembership is a truly categorical variable; if groupmembership is based on values of a continuousvariable (for example, high IQ versus low IQ),consider using linear regression to take advantage ofthe richer information that is offered by thecontinuous variable itself.

    9

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    10/58

    Discriminant Function Analysis

    Assumptions(similar to those for linear regression)

    Linearity, normality, multilinearity, equalvariances

    Predictor variables should have a multivariatenormal distribution.

    fairly robust to violations of the most of theseassumptions. But highly sensitive to outliers.

    Model specification

    10

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    11/58

    Discriminant Function Analysis

    Test of significanceFor two groups, the null hypothesis is that themeans of the two groups on the discriminantfunction-the centroids, are equal.

    Centroids are the mean discriminant score foreach group.Wilks lambda is used to test for significantdifferences between groups.

    Wilks lambda is between 0 and 1. It tells us thevariance of dependent variable that is notexplained by the discriminant function.

    11

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    12/58

    Discriminant Function Analysis

    Wilks lambda is also used to test for significantdifferences between the groups on the individualpredictor variables.

    It tells which variables contribute a significant amount of

    prediction to help separate the groups.

    12

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    13/58

    Discriminant Function Analysis

    Two groups using an example from SPSS manualExample: the purpose of this example is to identifycharacteristics that are indicative of people who arelikely to default on loans, and use those characteristicsto identify good and bad credit risks.

    Sample includes a total of 850 cases (old andnew/future customers) The first 700 cases arecustomers who were previously given loans.

    Use first 700 customers to create a discriminant analysismodel, setting the remaining 150 customers aside to

    validate the analysis.Then use the model to classify the 150 prospectivecustomers as good or bad credit risks.

    13

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    14/58

    Discriminant Function Analysis

    Grouping variable: Default

    Predictors: employ, address, debtinc, andcreaddebt

    Obtain Discriminant function analysisAnalyze > Classify > Discriminant

    14

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    15/58

    Discriminant Function Analysis

    15

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    16/58

    Discriminant Function Analysis

    16

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    17/58

    Discriminant Function Analysis

    Click Classify to get this window

    17

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    18/58

    Discriminant Function Analysis

    Click Save to get this window

    18

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    19/58

    Discriminant Function Analysis

    SPSS Output: descriptive statistics

    19

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    20/58

    Discriminant Function Analysis

    SPSS output: ANOVA table

    In the ANOVA table, the smaller the Wilks's lambda, the moreimportant the independent variable to the discriminant function.Wilks's lambda is significant by the F test for all independentvariables.

    20

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    21/58

    Discriminant Function Analysis

    SPSS Output (correlation matrix)

    The within-groups correlation matrix shows thecorrelations between the predictors.

    21

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    22/58

    Discriminant Function Analysis

    SPSS output: test of homogeneity ofcovariance matrices

    The larger the log determinantin the table, the more thatgroup's covariance matrixdiffers. The "Rank" columnindicates the number ofindependent variables in thiscase. Since discriminant

    analysis assumes homogeneityof covariance matrices betweengroups, we would like to seethe determinants be relativelyequal.

    22

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    23/58

    Discriminant Function Analysis

    SPSS output: test of homogeneity ofcovariance matrices

    1. Box's M test tests the assumption ofhomogeneity of covariance matrices.

    This test is very sensitive to meetingthe assumption of multivariatenormality.2. Discriminant function analysis isrobust even when the homogeneity ofvariances assumption is not met,provided the data do not containimportant outliers.3. For our data, we conclude the groupsdo differ in their covariance matrices,violating an assumption of DA.4. when n is large, small deviationsfrom homogeneity will be foundsignificant, which is why Box's M mustbe interpreted in conjunction with

    inspection of the log determinants.

    23

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    24/58

    Discriminant Function Analysis

    SPSS output: test of homogeneity ofcovariance matrices

    1. The larger the eigenvalue, the moreof the variance in the dependentvariable is explained by thatfunction.

    2. Dependent has two categories, thereis only one discriminant function.

    3. The canonical correlation is themeasure of association between the

    discriminant function and thedependent variable.

    4. The square of canonical correlationcoefficient is the percentage ofvariance explained in the dependentvariable.

    24

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    25/58

    Discriminant Function Analysis

    SPSS output: summary of canonical discriminantfunctions

    When there are two groups, thecanonical correlation is the mostuseful measure in the table, and it isequivalent to Pearson's correlationbetween the discriminant scores andthe groups.Wilks' lambda is a measure of howwell each function separates casesinto groups. Smaller values of Wilks'lambda indicate greaterdiscriminatory ability of the function.

    The associated chi-square statistic tests thehypothesis that the means of the functions listed areequal across groups. The small significance valueindicates that the discriminant function does better

    than chance at separating the groups.

    25

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    26/58

    Discriminant Function Analysis

    SPSS output: summary of canonical discriminantfunctions

    The standardized discriminant functioncoefficients in the table serve the samepurpose as beta weights in multipleregression (partial coefficient) : theyindicate the relative importance of theindependent variables in predicting thedependent. They allow you to compare

    variables measured on different scales.Coefficients with large absolute valuescorrespond to variables with greaterdiscriminating ability.

    26

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    27/58

    Discriminant Function Analysis

    SPSS output: summary of canonical discriminantfunctions

    1.The structure matrix table shows thecorrelations of each variable with eachdiscriminant function.2. Only one discriminant function is inthis study.3. The correlations then serve likefactor loadings in factor analysis --

    that is, by identifying the largestabsolute correlations associated witheach discriminant function theresearcher gains insight into how toname each function.

    27

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    28/58

    Discriminant Function Analysis

    SPSS output: summary of canonical discriminantfunctions

    1. Discriminant function is a latentvariable that is created as alinear combination ofindependent variables.

    2. Discriminating variables areindependent variables.

    3. The table shows the Pearsoncorrelations between predictorsand standardized canonicaldiscriminant functions.

    4. Loading < .30 may be removedfrom the model.

    28

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    29/58

    Discriminant Function Analysis

    SPSS output: summary of canonicaldiscriminant functions

    This table contains theunstandardized discriminantfunction coefficients. Thesewould be used likeunstandardized b (regression)

    coefficients in multipleregression -- that is, they areused to construct the actualprediction equation which canbe used to classify new cases.

    29

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    30/58

    Discriminant Function Analysis

    Discriminant function: o ur model should belike this:

    D i = 0.058 0.12emplo 0.037addres +

    0.075debin + 0.312 credebt

    30

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    31/58

    Discriminant Function Analysis

    SPSS output: summary of canonicaldiscriminant functions

    Centroids are the mean discriminant scores for each group. This table is used toestablish the cutting point for classifying cases. If the two groups are of equalsize, the best cutting point is half way between the values of the functions atgroup centroids (that is, the average). If the groups are unequal, the optimalcutting point is the weighted average of the two values. The computer does theclassification automatically, so these values are for informational purposes.

    31

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    32/58

    Discriminant Function Analysis

    The centroids are calculated based on the function:D i = 0.058 0.12emplo 0.037addres +0.075debin + 0.312 credebt

    Centroids are discriminant score for each groupwhen the variable means (rather than individualvalues for each case) are entered into the function.

    32

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    33/58

    Discriminant Function Analysis

    SPSS Output: Classification Statistics

    Prior Probabilities are used inclassification. The default isusing observed group sizes. Inyour sample to determine theprior probabilities ofmembership in the groupsformed by the dependent, andthis is necessary if you havedifferent group sizes. If eachgroup is of the same size, as analternative you could specifyequal prior probabilities for allgroups.

    33

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    34/58

    Discriminant Function Analysis

    SPSS Output: Classification Statistics

    Two sets (one for eachdependent group) ofunstandardized lineardiscriminant coefficients arecalculated, which can be usedto classify cases. This is theclassical method ofclassification, though now littleused.

    34

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    35/58

    Discriminant Function Analysis

    SPSS Output: Classification Statistics

    This table is used to assess howwell the discriminant functionworks, and if it works equally wellfor each group of the dependentvariable. Here it correctlyclassifies more than 75% of thecases, making about the sameproportion of mistakes for bothcategories. Overall, 76.0% of thecases are correctly classified.

    35

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    36/58

    Discriminant Function Analysis

    SPSS Output: separate-group plots

    If two distributionsoverlap too much, itmeans they do not

    discriminate too (poordiscriminant function).

    36

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    37/58

    Discriminant Function Analysis

    Run DA

    37

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    38/58

    Discriminant Function Analysis

    We get this classification table

    38 Sensitivity = 45.4%; specificity = 94.2%

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    39/58

    Discriminant Function Analysis

    The table from previous slide shows howaccurately the customers were classifiedinto these groups.

    Sensitivity: highly sensitive test means thatthere are few false negative results (Type IIerror)

    Specificity: highly specific test means that

    there few false positive results (Type Ierror).

    39

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    40/58

    Discriminant Function Analysis

    Discriminant methodsEnter all independent variables into theequation at once

    Stepwise: remove independent variables thatare not significant.

    40

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    41/58

    Discriminant Function Analysis

    Discriminant Function Analysis (more than twoGroups)

    Example from SPSS mannual.

    A telecommunications provider hassegmented its customer base by serviceusage patterns, categorizing the customersinto four groups. If demographic data canbe used to predict group membership, youcan customize offers for individualprospective customers.

    41

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    42/58

    Discriminant Function Analysis

    Variables in the analysis

    Dependent variable custcat (fourcategories)

    Independent variables: demographics

    Obtain discriminant function analysisAnalyze > Classify > Discriminant

    42

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    43/58

    Discriminant Function Analysis

    When you have a lot of predictors, the stepwise methodcan be useful by automatically selecting the "best"variables to use in the model.

    43

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    44/58

    Discriminant Function Analysis

    Click Method

    44

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    45/58

    Discriminant Function Analysis

    MethodWilks' lambda. A variable selection method forstepwise discriminant analysis that chooses variablesfor entry into the equation on the basis of how muchthey lower Wilks' lambda. At each step, the variable

    that minimizes the overall Wilks' lambda is entered.Unexplained variance. At each step, the variable thatminimizes the sum of the unexplained variationbetween groups is entered.Mahalanobis distance. A measure of how much acase's values on the independent variables differ

    from the average of all cases. A large Mahalanobisdistance identifies a case as having extreme valueson one or more of the independent variables.

    45

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    46/58

    Discriminant Function Analysis

    MethodSmallest F ratio. A method of variableselection in stepwise analysis based onmaximizing an F ratio computed from the

    Mahalanobis distance between groups.Rao's V. A measure of the differencesbetween group means. Also called theLawley-Hotelling trace. At each step, thevariable that maximizes the increase in

    Rao's V is entered. After selecting thisoption, enter the minimum value avariable must have to enter the analysis.

    46

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    47/58

    Discriminant Function Analysis

    Use F value. A variable is entered into the model if its Fvalue is greater than the Entry value and is removed ifthe F value is less than the Removal value . Entry mustbe greater than Removal, and both values must bepositive. To enter more variables into the model, lower

    the Entry value. To remove more variables from themodel, increase the Removal value.

    Use probability of F. A variable is entered into the modelif the significance level of its F value is less than theEntry value and is removed if the significance level isgreater than the Removal value. Entry must be less than

    Removal, and both values must be positive. To entermore variables into the model, increase the Entry value.To remove more variables from the model, lower theRemoval value.

    47

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    48/58

    Discriminant Function Analysis

    SPSS output

    The stepwise method starts with a modelthat doesn't include any of the predictors

    (step 0).At each step, the predictor with the largestF to Enter, value that exceeds the entrycriteria (by default, 3.84) is added to themodel.

    48

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    49/58

    Discriminant Function Analysis

    SPSS output

    49

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    50/58

    Discriminant Function Analysis

    SPSS output: variables in the analysis

    Tolerance is the proportion of a variable's variance notaccounted for by other independent variables in the equation.A variable with very low tolerance contributes littleinformation to a model and can cause computationalproblems.Actually, tolerance is about multicollinearity.

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    51/58

    Discriminant Function Analysis

    SPSS output: variables in the analysis

    F to Remove values are useful for describing what happens if a variableis removed from the current model (given that the other variablesremain).

    51

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    52/58

    Discriminant Function Analysis

    SPSS output: Summary of Canonical DiscriminantFunctions

    Nearly all of the variance explained by the model is due to the firsttwo discriminant functions. We can ignore the third function. For eachset of functions, this tests the hypothesis that the means of thefunctions listed are equal across groups. The test of function 3 has ap value of .34, so this function contributes little to the model.

    52

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    53/58

    Discriminant Function Analysis

    SPSS output

    The structure matrix tableshows the correlations ofeach variable with eachdiscriminant function. Thecorrelations serve like factorloadings in factor analysis --

    that is, by identifying thelargest absolute correlationsassociated with eachdiscriminant function.

    53

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    54/58

    Discriminant Function Analysis

    Three discriminant functions

    54

    Function 1: -2.84 + .88ed + .01em + .18reFunction 2: -1.86 + .12ed + .10em + .17reFunction 3: -.82 - .22ed -.01em + .66re

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    55/58

    Discriminant Function Analysis

    SPSS Output: combined-group plot

    The closer the groupcentroids, the moreerrors of classification

    likely will be.

    55

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    56/58

    Discriminant Function Analysis

    SPSS output: classification table

    The model excels at identifying Total service customers. However, it doesan exceptionally poor job of classifying E-service customers. You may needto find another predictor in order to separate these customers.

    56

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    57/58

    Discriminant Function Analysis

    We have created a discriminant model thatclassifies customers into one of four predefined"service usage" groups, based on demographicinformation from each customer. Using the

    structure matrix, we identified which variables aremost useful for segmenting the customer base.Lastly, the classification results show that themodel does poorly at classifying E-servicecustomers. If identifying E-service customers is notthe concern, the model may be accurate enoughfor this purpose.

    57

  • 8/14/2019 SPSS Discriminant Function Analysis.pdf

    58/58

    Discriminant Function Analysis

    58