Slide 1

A Problem in Personnel Classification

This problem is from Phillip J. Rulon, David V. Tiedeman, Maurice Tatsuoka, and Charles R. Langmuir. Multivariate Statistics for Personnel Classification. 1967.

This sample data is for "World Airlines, a company employing over 50,000 persons and operating scheduled flights. This company naturally needs many men who can be assigned to a particular set of functions. The mechanics on the line who service the equipment of World Airlines form one of the groups we shall consider. A second group are the agents who deal with the passengers of the airline. A third group are the men in operations who coordinate airline activities.

The personnel officer of World Airlines has developed an Activity Preference Inventory for the use of the airline. The first section of this inventory contains 30 pairs of activities, each pair naming an indoor activity and an outdoor activity. One item is

_____ Billiards : Golf _____

The applicant for a job in World Airlines checks the activity he prefers. The score is the number of outdoor activities marked." (page 24)

The second section of the Activity Preference Inventory "contains 35 items. One activity of each pair is a solitary activity, the other convivial. An example is

_____ Solitaire : Bridge _____

The apprentice's score is the number of convivial activities he prefers." (page 82)

The third section of the Activity Preference Inventory "contains 25 items. One activity of each pair is a liberal activity, the other a conservative activity. An example is

_____ Counseling : Advising _____

Discriminant Analysis

Slide 2

A Problem in Personnel Classification (continued)

The apprentice's score is the number of conservative activities he prefers." (page 153)

The Activity Preference Inventory was administered to 244 employees in the three job classifications who were successful and satisfied with their jobs. The dependent variable, JOBCLASS 'Job Classification', included three job classifications: 1 - Passenger Agents, 2 - Mechanics, and 3 - Operations Control.

The purpose of the analysis is to develop a classification scheme based on scores on the Activity Preference Inventory to assign new employees to the different job groups.


Slide 3

Stage One: Define the Research Problem

In this stage, the following issues are addressed:

• Relationship to be analyzed
• Specifying the dependent and independent variables
• Method for including independent variables

Relationship to be analyzed

We are interested in the relationship between scores on the three scales of the Activity Preference Inventory and the different job classifications.

Specifying the dependent and independent variables

The dependent variable is:

• JOBCLASS 'Job Classification'

The independent variables are:

• OUTDOOR, 'Outdoor Activity Score'
• CONVIV, 'Convivial Score'
• CONSERV, 'Conservative Score'

Method for including independent variables

Since the purpose of this analysis is to articulate the relationship between the activity scores and job classification, direct entry of independent variables would be an appropriate method for selecting variables.  However, I prefer to use a stepwise method in order to identify which predictors are statistically significant.


Slide 4

Stage 2: Develop the Analysis Plan: Sample Size Issues

In this stage, the following issues are addressed:

• Missing data analysis
• Minimum sample size requirement: 20+ cases per independent variable
• Division of the sample: 20+ cases in each dependent variable group

Missing data analysis

There is no missing data in this data set.

Minimum sample size requirement: 20+ cases per independent variable

The data set contains 244 subjects and 3 independent variables. The ratio of about 81 cases per independent variable exceeds the minimum sample size requirement.

Division of the sample: 20+ cases in each dependent variable group

The sample contains 85 Passenger Agents, 93 Mechanics, and 66 Operations Control staff. There are more than 20 cases in each dependent variable group.
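The two sample size checks on this slide reduce to simple arithmetic. A minimal Python sketch follows, using the group counts reported above; the variable names and the MIN_CASES constant are our own labels, not part of the SPSS output.

```python
# Group sizes reported on this slide; the names are illustrative only.
group_sizes = {"Passenger Agents": 85, "Mechanics": 93, "Operations Control": 66}
n_independent_vars = 3   # OUTDOOR, CONVIV, CONSERV
MIN_CASES = 20           # rule of thumb used in this analysis plan

n_cases = sum(group_sizes.values())
cases_per_variable = n_cases / n_independent_vars

# Check 1: at least 20 cases per independent variable.
ratio_ok = cases_per_variable >= MIN_CASES
# Check 2: at least 20 cases in each dependent variable group.
groups_ok = all(size >= MIN_CASES for size in group_sizes.values())

print(f"{n_cases} cases, {cases_per_variable:.1f} per independent variable")
print("ratio requirement met:", ratio_ok)
print("group size requirement met:", groups_ok)
```

Keeping the threshold in one named constant makes it easy to rerun the check under a stricter rule of thumb.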


Slide 5

Stage 2: Develop the Analysis Plan: Measurement Issues

In this stage, the following issues are addressed:

• Incorporating nonmetric data with dummy variables
• Representing curvilinear effects with polynomials
• Representing interaction or moderator effects

Incorporating Nonmetric Data with Dummy Variables

None of the variables are nonmetric.

Representing Curvilinear Effects with Polynomials

We do not have any evidence of curvilinear effects at this point in the analysis.

Representing Interaction or Moderator Effects

We do not have any evidence at this point in the analysis that we should add interaction or moderator variables.


Slide 6

Stage 3: Evaluate Underlying Assumptions

In this stage, the following issues are addressed:

• Nonmetric dependent variable and metric or dummy-coded independent variables
• Multivariate normality of metric independent variables: assess normality of individual variables
• Linear relationships among variables
• Assumption of equal dispersion for dependent variable groups

Nonmetric dependent variable and metric or dummy-coded independent variables

The dependent variable is nonmetric.  All of the independent variables are metric.

Multivariate normality of metric independent variables

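Since no direct test of multivariate normality is available to us in this analysis, we assess the normality of each metric variable individually. Univariate normality of every variable is necessary, though not sufficient, for multivariate normality.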


Slide 7

Run the 'NormalityAssumptionAndTransformations' Script


Slide 8

Complete the 'Test for Assumption of Normality' Dialog Box


Tests of Normality

We find that all three of the independent variables fail the test of normality, and that none of the transformations induced normality in any of the variables.  We should note the failure to meet the normality assumption for possible inclusion in our discussion of findings.


Slide 9

Linear relationships among variables

Since our dependent variable is not metric, we cannot use it to test for linearity of the independent variables. As an alternative, we can plot each metric independent variable against all other independent variables in a scatterplot matrix to look for patterns of nonlinear relationships. If one of the independent variables shows multiple nonlinear relationships to the other independent variables, we consider it a candidate for transformation.


Slide 10

Requesting a Scatterplot Matrix


Slide 11

Specifications for the Scatterplot Matrix


Slide 12

The Scatterplot Matrix

Blue fit lines were added to the scatterplot matrix to improve interpretability.

Having computed a scatterplot for all combinations of metric independent variables, we identify all of the variables that appear in any plot that shows a nonlinear trend. We will call these variables our nonlinear candidates. To identify which of the nonlinear candidates is producing the nonlinear pattern, we look at all of the plots for each of the candidate variables. The candidate variable that is not linear should show up in a nonlinear relationship in several plots with other linear variables. Hopefully, the form of the plot will suggest the power term to best represent the relationship, e.g. squared term, cubed term, etc.

None of the scatterplots show evidence of any nonlinear relationships.


Slide 13

Assumption of equal dispersion for dependent variable groups

Box's M statistic tests for homogeneity of dispersion matrices across the subgroups of the dependent variable. The null hypothesis is that the dispersion matrices are homogeneous. If the analysis fails this test, we can request classification using separate group dispersion matrices to see if this improves the model's accuracy rate.

Box's M statistic is produced by the SPSS discriminant procedure, so we will defer this question until we have obtained the discriminant analysis output.


Slide 14

Stage 4: Estimation of Discriminant Functions and Overall Fit: The Discriminant Functions

In this stage, the following issues are addressed:

• Compute the discriminant analysis
• Overall significance of the discriminant function(s)

Compute the discriminant analysis

The steps to obtain a discriminant analysis are detailed on the following screens.


Slide 15

Requesting a Discriminant Analysis


Slide 16

Specifying the Dependent Variable


Slide 17

Specifying the Independent Variables


Slide 18

Specifying Statistics to Include in the Output


Slide 19

Specifying the Stepwise Method for Selecting Variables


Slide 20

Specifying the Classification Requirement


Slide 21

Complete the Discriminant Analysis Request


Slide 22

Overall significance of the discriminant function(s) - 1

Our first task is to determine whether or not there is a statistically significant relationship between the independent variables and the dependent variable. We navigate to the section of output titled "Summary of Canonical Discriminant Functions" to locate the following outputs:

Recall that the maximum number of discriminant functions is equal to the number of groups in the dependent variable minus one, or the number of variables in the analysis, whichever is smaller. For this problem, the maximum number of discriminant functions is two.
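The rule just stated can be written as a one-line function. This is only an illustrative Python sketch; the function name is our own and is not an SPSS feature.

```python
def max_discriminant_functions(n_groups: int, n_variables: int) -> int:
    """Upper bound on the number of discriminant functions:
    the smaller of (groups - 1) and the number of variables."""
    return min(n_groups - 1, n_variables)


# This problem: 3 job classifications and 3 activity scores.
n_functions = max_discriminant_functions(n_groups=3, n_variables=3)
print(n_functions)
```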


Slide 23

Overall significance of the discriminant function(s) - 2

In the Wilks' Lambda table, SPSS successively tests models with an increasing number of functions. The first line of the table tests the null hypothesis that the mean discriminant scores for the two possible functions are equal in the three groups of the dependent variable. Since the probability of the chi-square statistic for this test is less than 0.0001, we reject the null hypothesis and conclude that there is at least one statistically significant function. Had the probability for this test been larger than 0.05, we would have concluded that there are no discriminant functions which separate the groups of the dependent variable.

The second line of the Wilks' Lambda table tests the null hypothesis that the mean discriminant scores for the second possible discriminant function are equal in the three groups of the dependent variable. Since the probability of the chi-square statistic for this test is less than 0.0001, we reject the null hypothesis and conclude that the second discriminant function, as well as the first, is statistically significant. Had the probability for this test been larger than 0.05, we would have concluded that there is only one discriminant function to separate the groups of the dependent variable.

Our conclusion from this output is that there are two statistically significant discriminant functions for this problem.
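For readers who want to see where a chi-square statistic of this kind comes from, the sketch below applies Bartlett's chi-square approximation to Wilks' lambda. We assume, without having checked the SPSS output for this problem, that this is the approximation behind the reported values. The lambda values used are HYPOTHETICAL placeholders, and the function name is our own.

```python
import math


def wilks_chi_square(wilks_lambda, n_cases, n_variables, n_groups, functions_removed=0):
    """Bartlett's chi-square approximation for a Wilks' lambda test.

    Returns (chi_square, degrees_of_freedom) for the test of the functions
    that remain after `functions_removed` functions have been set aside.
    """
    multiplier = n_cases - 1 - (n_variables + n_groups) / 2
    chi_square = -multiplier * math.log(wilks_lambda)
    df = (n_variables - functions_removed) * (n_groups - functions_removed - 1)
    return chi_square, df


# HYPOTHETICAL lambda values, for illustration only (not the SPSS output).
chi_1, df_1 = wilks_chi_square(0.35, n_cases=244, n_variables=3, n_groups=3)
chi_2, df_2 = wilks_chi_square(0.80, n_cases=244, n_variables=3, n_groups=3,
                               functions_removed=1)

print(f"functions 1 through 2: chi-square = {chi_1:.1f}, df = {df_1}")
print(f"function 2 alone:      chi-square = {chi_2:.1f}, df = {df_2}")
```

A smaller lambda gives a larger chi-square, which is why the first line of the table, testing both functions together, normally shows the larger statistic.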


Slide 24

Stage 4: Estimation of Discriminant Functions and Overall Fit:  Assessing Model Fit

In this stage, the following issues are addressed:

• Assumption of equal dispersion for dependent variable groups
• Classification accuracy chance criteria
• Press's Q statistic
• Presence of outliers


Slide 25

Assumption of equal dispersion for dependent variable groups

In discriminant analysis, the best measure of overall fit is classification accuracy. The appropriateness of using the pooled covariance matrix in the classification phase is evaluated by the Box's M statistic.

We examine the probability of the Box's M statistic to determine whether or not we meet the assumption of equal dispersion (covariance) matrices across the groups; the dispersion matrix is the multivariate measure of variance. This test is very sensitive, so we should select a conservative alpha value of 0.01. At that alpha level, we fail to reject the null hypothesis for this analysis, so we have no evidence that the assumption is violated.

Had we failed this test, our remedy would be to re-run the discriminant analysis requesting the use of separate covariance matrices in classification.


Slide 26

Classification accuracy chance criteria - 1

The classification matrix for this problem computed by SPSS is shown below:


Slide 27

Classification accuracy chance criteria - 2

Following the text, we compare the accuracy rate for the cross-validated sample (75.0%) to each of the by-chance accuracy rates.

In the table of Prior Probabilities for Groups, we see that the three groups contained .348, .381, and .270 of the sample of 244 cases used to derive the discriminant model.

The proportional chance criterion for assessing model fit is calculated by summing the squared proportion that each group represents of the sample, in this case (0.348 x 0.348) + (0.381 x 0.381) + (0.270 x 0.270) = 0.339. Based on the requirement that model accuracy be 25% better than the chance criterion, the standard to use for comparing the model's accuracy is 1.25 x 0.339 = 0.424. Our model accuracy rate of 75% exceeds this standard.

The maximum chance criterion is the proportion of cases in the largest group, 38.1% in this problem. Based on the requirement that model accuracy be 25% better than the chance criterion, the standard to use for comparing the model's accuracy is 1.25 x 38.1% = 47.6%. Our model accuracy rate of 75% exceeds this standard.
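Both chance criteria can be recomputed directly from the group counts. The Python sketch below is illustrative only: the variable names are ours, and the group sizes (85, 93, 66) are taken from the sample description earlier in this analysis.

```python
group_sizes = [85, 93, 66]   # Passenger Agents, Mechanics, Operations Control
model_accuracy = 0.75        # cross-validated accuracy rate
IMPROVEMENT = 1.25           # "25% better than chance" requirement

n_cases = sum(group_sizes)
proportions = [size / n_cases for size in group_sizes]

# Proportional chance criterion: sum of squared group proportions.
proportional_chance = sum(p ** 2 for p in proportions)
# Maximum chance criterion: proportion of cases in the largest group.
maximum_chance = max(proportions)

proportional_standard = IMPROVEMENT * proportional_chance
maximum_standard = IMPROVEMENT * maximum_chance

print(f"proportional chance: {proportional_chance:.3f}, standard: {proportional_standard:.3f}")
print(f"maximum chance:      {maximum_chance:.3f}, standard: {maximum_standard:.3f}")
print("exceeds both standards:",
      model_accuracy > max(proportional_standard, maximum_standard))
```

Because the sketch works from unrounded proportions, its results can differ from the hand calculation in the third decimal place.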


Slide 28

Press's Q statistic

Substituting the values for this problem (244 cases, 183 correct classifications, and 3 groups) into the formula for Press's Q statistic, we obtain a value of [244 - (183 x 3)]^2 / [244 x (3 - 1)] = 190.6. This value exceeds the critical value of 6.63 (Text, page 305), so we conclude that the prediction accuracy is greater than that expected by chance.

By all three criteria, we would interpret our model as having an accuracy above that expected by chance. Thus, this is a valuable or useful model that supports predictions of the dependent variable.
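The Press's Q calculation can be checked with a few lines of Python. This is a sketch, not SPSS output; the function name and the CRITICAL_VALUE constant are our own, and the inputs are the figures quoted on this slide.

```python
def press_q(n_cases: int, n_correct: int, n_groups: int) -> float:
    """Press's Q = [N - (n * K)]^2 / [N * (K - 1)], where N is the number of
    cases, n the number classified correctly, and K the number of groups."""
    return (n_cases - n_correct * n_groups) ** 2 / (n_cases * (n_groups - 1))


CRITICAL_VALUE = 6.63  # critical value cited on this slide

q = press_q(n_cases=244, n_correct=183, n_groups=3)
print(f"Press's Q = {q:.1f}")
print("better than chance:", q > CRITICAL_VALUE)
```

Note that the denominator is the whole product N x (K - 1); grouping it explicitly avoids the operator-precedence ambiguity of writing the formula on one line.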


Slide 29

Presence of outliers - 1

SPSS prints Mahalanobis distance scores for each case in the table of Casewise Statistics, so we can use this as a basis for detecting outliers.

According to the SPSS Applications Guide, p. 227, cases with large values of the Mahalanobis distance from their group mean can be identified as outliers. For large samples from a multivariate normal distribution, the square of the Mahalanobis distance from a case to its group mean is approximately distributed as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. The critical value of chi-square with 3 degrees of freedom (the stepwise procedure entered three variables in the function) and an alpha of 0.01 (we only want to detect major outliers) is 11.345.

We can request this figure from SPSS using the following compute command:

COMPUTE mahcutpt = IDF.CHISQ(0.99,3).

EXECUTE.

Here 0.99 is the cumulative probability up to the significance level of interest and 3 is the number of degrees of freedom. SPSS will create a column of values in the data set that contains the desired value.

We scan the table of Casewise Statistics to identify any cases that have a squared Mahalanobis distance greater than 11.345 for the group to which the case is most likely to belong, i.e. under the column labeled 'Highest Group.'
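The chi-square cutoff can also be reproduced outside SPSS. The Python sketch below uses only the standard library: it relies on the closed-form cumulative distribution that holds for exactly 3 degrees of freedom and inverts it by bisection. The function names are ours, and the squared distances at the end are HYPOTHETICAL values included only to show how the cutoff would be applied.

```python
import math


def chi_square_cdf_3df(x: float) -> float:
    """Cumulative probability of a chi-square variate with 3 degrees of
    freedom (closed form, valid only for df = 3)."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi_square_critical_3df(cumulative_prob: float) -> float:
    """Invert the CDF by bisection to find the critical value."""
    low, high = 0.0, 100.0
    for _ in range(100):
        mid = (low + high) / 2
        if chi_square_cdf_3df(mid) < cumulative_prob:
            low = mid
        else:
            high = mid
    return (low + high) / 2


cutoff = chi_square_critical_3df(0.99)
print(f"critical value: {cutoff:.3f}")

# HYPOTHETICAL squared Mahalanobis distances, keyed by case number.
squared_distances = {1: 2.10, 2: 12.80, 3: 0.45}
outliers = [case for case, d2 in squared_distances.items() if d2 > cutoff]
print("possible outliers:", outliers)
```

For a different number of variables in the function, the closed form above no longer applies and a general chi-square routine would be needed.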


Slide 30

Presence of outliers - 2

In this particular analysis, I find one case, number 23, with a large enough Mahalanobis distance to indicate that it is an outlier and might be considered for removal from the analysis.  However, since there is only one case out of 244, it is not likely to make any difference, so we will forego re-running the analysis without this case.


Slide 31

Stage 5: Interpret the Results

In this section, we address the following issues:

• Number of functions to be interpreted
• Relationship of functions to categories of the dependent variable
• Assessing the contribution of predictor variables
• Impact of multicollinearity on solution

Number of functions to be interpreted

As indicated previously, there are two significant discriminant functions to be interpreted.


Slide 32

Role of functions in differentiating categories of the dependent variable

The combined-groups scatterplot enables us to link the discriminant functions to the categories of the dependent variable. I have modified the SPSS output by changing the symbols for the different points so that we can easily detect the group members.  In addition, I have added reference lines at the zero value for each axis.

Analyzing this plot, we see that the first function differentiates Passenger Agents from Mechanics and Operations Control personnel.  The second function differentiates Mechanics from Operations Control staff.


Slide 33

Assessing the contribution of predictor variables - 1

Identifying the statistically significant predictor variables

The summary table of variables entering and leaving the discriminant functions is shown below. We can see that three independent variables were included in the analysis, in the order shown in the table. We would conclude that all three of the independent variables (Outdoor Activity Score, Convivial Score, and Conservative Score) make a statistically significant contribution to predicting group membership on the dependent variable.


Slide 34

Assessing the contribution of predictor variables - 2

Importance of Variables and the Structure Matrix

To determine which predictor variables are more important in predicting group membership when we use a stepwise method of variable selection,  we can simply look at the order in which the variables entered, as shown in the following table. 


Slide 35

Assessing the contribution of predictor variables - 3

While we know which variables were important to the overall analysis, we are also concerned with which variables are important to which discriminant function. This information is provided by the structure matrix, which contains the correlations between each of the independent variables and the discriminant function scores.

From the structure matrix, we see that two of the three variables entered into the functions (Convivial Score and Conservative Score) are the important variables in the first discriminant function, while Outdoor Activity Score is the important variable on the second function.


Slide 36

Assessing the contribution of predictor variables - 4

Comparing Group Means to Determine Direction of Relationships

If we examine the pattern of means for the three statistically significant variables for the three job classifications, we can provide a fuller discussion of the relationships between the independent variables, the dependent variable groups, and the discriminant functions.


Slide 37

Assessing the contribution of predictor variables - 5

The first discriminant function distinguishes Passenger Agents from Mechanics and Operations Control staff. The two variables that are important on the first function are convivial score and conservative score.  Passenger agents had higher convivial scores and lower conservative scores than the other two groups.

Operations Control staff are distinguished from Mechanics by the second discriminant function, on which only a single variable is important: the outdoor activity score. Mechanics had a higher average on the outdoor activity score than did Operations Control staff.

In sum, Passenger Agents are more outgoing (convivial) and more tolerant (less conservative) than Mechanics and Operations Control personnel.  Mechanics differ from Operations Control personnel in their stronger preference for outdoor oriented activities.


Slide 38

Impact of Multicollinearity on solution

Multicollinearity is indicated by SPSS for discriminant analysis by very small tolerance values for variables, e.g. less than 0.10 (0.10 is the size of the tolerance, not its significance value).

If we look at the table of Variables Not In The Analysis, we see that nothing was printed for step 3, indicating that all three variables had entered the analysis by that step. Since no variable was excluded for low tolerance, multicollinearity is not an issue in this problem.


Slide 39

Stage 6: Validate The Model

In this stage, we are normally concerned with the following issues:

• Conducting the Validation Analysis
• Generalizability of the Discriminant Model

Conducting the Validation Analysis

To validate the discriminant analysis, we can randomly divide our sample into two groups, a screening sample and a validation sample. The analysis is computed for the screening sample and used to predict membership on the dependent variable in the validation sample. If the model in the screening sample is valid, we would expect the accuracy rates for both samples to be about the same.

In the double cross-validation strategy, we reverse the designation of the screening and validation samples and re-run the analysis. We can then compare the discriminant functions derived for both samples. If the two sets of functions contain very different sets of variables, it indicates that the variables might have achieved significance because of the sample size and not because of the strength of the relationship. In that case, we would conclude that the predictive utility of these variables is not generalizable.
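The split-and-swap logic of double cross-validation can be sketched in Python. This is not the SPSS procedure used in the slides that follow: the function name, the case numbering, and the seed value are all assumptions made for illustration, and this sketch produces exact halves, whereas a random split variable in SPSS may give halves of slightly unequal size.

```python
import random


def split_in_half(case_ids, seed):
    """Randomly assign each case to split 0 or split 1, in equal halves."""
    rng = random.Random(seed)      # fixed seed makes the split repeatable
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {case: (0 if position < half else 1)
            for position, case in enumerate(shuffled)}


# Hypothetical case numbers 1..244 and an arbitrary seed.
split = split_in_half(range(1, 245), seed=12345)

# First run: one split is the screening sample, the other the validation sample.
# Second run (double cross-validation): the roles are reversed.
for screening in (0, 1):
    n_screen = sum(1 for s in split.values() if s == screening)
    n_valid = len(split) - n_screen
    print(f"screening split={screening}: "
          f"{n_screen} screening cases, {n_valid} validation cases")
```

Fixing the seed plays the same role as setting the starting point for random number generation: the same split can be reproduced when the analysis is re-run.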


Slide 40

Set the Starting Point for Random Number Generation


Slide 41

Compute the Variable to Randomly Split the Sample into Two Halves


Slide 42

Specify the Cases to Include in the First Screening Sample


Slide 43

Specify the Value of the Selection Variable for the First Validation Analysis


Slide 44

Specify the Value of the Selection Variable for the Second Validation Analysis


Slide 45

Generalizability of the Discriminant Model

We base our decisions about the generalizability of the discriminant model on a table that compares key outputs from the analysis of the full data set with those from each of the validation runs.

                                       Full Model   Split=0   Split=1
Number of Significant Functions        2            2         2
Cross-validated Accuracy               75.0%        74.8%     76.0%
Accuracy Rate for Validation Sample                 77.7%     76.4%

Significant Coefficients (p < 0.05), numbered by order of entry:

Full Model: 1. OUTDOOR Outdoor Activity Score; 2. CONVIV Convivial Score; 3. CONSERV Conservative Score

Split=0: 1. CONVIV Convivial Score; 2. OUTDOOR Outdoor Activity Score; 3. CONSERV Conservative Score

Split=1: 1. OUTDOOR Outdoor Activity Score; 2. CONVIV Convivial Score; 3. CONSERV Conservative Score

In both of the validation analyses, two significant discriminant functions were found. The cross-validated accuracy rates and the accuracy rates for the validation samples were approximately the same. Both validation analyses included the three available independent variables, though the order of entry differed. The results of the validation analyses are similar to those of the model with the full data set. We can conclude that the model is generalizable.
