TWO GROUPS DISCRIMINANT ANALYSIS: BASIC EXPLORATION AND APPLICATION IN PSYCHOLOGY DATA LEE SAN JING UNIVERSITI TEKNOLOGI MALAYSIA


TWO GROUPS DISCRIMINANT ANALYSIS: BASIC EXPLORATION AND APPLICATION IN PSYCHOLOGY DATA

LEE SAN JING

UNIVERSITI TEKNOLOGI MALAYSIA


TWO GROUPS DISCRIMINANT ANALYSIS:

BASIC EXPLORATION AND APPLICATION IN PSYCHOLOGY DATA

LEE SAN JING

This thesis is submitted in fulfillment of the requirement for the award of the

degree of Bachelor of Science and Education (Mathematics)

Faculty of Education

Universiti Teknologi Malaysia

MAY 2006


To my beloved family, brothers, and friends…..


ACKNOWLEDGEMENT

First of all, thank you, Lord. With all HIS guidance and the assets HE gave me, I was able to complete my thesis successfully.

I would like to take this opportunity to express my deepest gratitude to my thesis supervisor, Pn. Haliza Abdul Rahman, for her guidance and suggestions. Thank you for giving your time and effort to help me complete this thesis successfully.

Besides, I would like to acknowledge Pn. Noraslinda Mohamed Ismail for her guidance and help. I thank her for her kindness in giving her time and effort to help me.

I would also like to express my love and gratitude to my family members, who have been with me every step of the way and have given me support. I want to thank my course mates, roommate, and friends for their great support and encouragement. Thank you for always being there to share all my happiness and sorrow.

Last but not least, I appreciate all the support and guidance provided by everyone who has helped me directly or indirectly. Thank you to all.


ABSTRACT

Multivariate analysis is an extension of univariate analysis that studies two or more variables simultaneously. Discriminant analysis is one of the techniques of multivariate analysis. It is designed for use when the dependent variable is categorical and the independent variables are metric. This report discusses discriminant analysis for testing group differences in terms of the values of the independent variables. It also involves developing a linear discriminant function rule to classify individuals into the defined groups. The focus is on two-group discriminant analysis using the simultaneous direct estimation method. The error rate is also computed to evaluate the performance of the classification procedure. Discriminant analysis is applied to educational psychology data from the Wechsler Adult Intelligence Scale subtest scores using the Statistical Package for the Social Sciences (SPSS) Version 10.0.


ABSTRAK

Multivariate analysis is an extension of univariate analysis that involves two or more variables simultaneously. Discriminant analysis, one of the methods of multivariate analysis, is used when the dependent variable is categorical and the independent variables are metric. This report discusses discriminant analysis for testing differences between groups using the values of the independent variables. It also involves forming a Linear Discriminant Function Rule to classify individuals into groups that are already known. The report focuses on discriminant analysis involving only two groups and using the Simultaneous and Direct Estimation Method. The error rate is also computed to evaluate the classification procedure. Finally, discriminant analysis is applied to educational psychology data from the Wechsler Adult Intelligence Scale subtest scores using the Statistical Package for the Social Sciences (SPSS) 10.0.


TABLE OF CONTENTS

CHAPTER ITEM

THESIS VALIDATION FORM
SUPERVISOR’S DECLARATION
TITLE
DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 INTRODUCTION
1.1 Introduction
1.2 Objectives of Study
1.3 Scope of Study
1.4 Thesis Organization

2 DISCRIMINANT ANALYSIS
2.1 Introduction
2.2 Goal of Discriminant Analysis
2.3 Application of Discriminant Analysis
2.4 Variables in Discriminant Analysis
2.5 Discriminant Rule
2.5.1 A Maximum Likelihood Rule
2.5.2 The Linear Discriminant Function Rule
2.5.3 A Mahalanobis Distance Rule
2.5.4 The Prior Probability Rule
2.6 Assumptions in Discriminant Analysis
2.6.1 Multivariate Normality
2.6.2 Common Covariance Matrices

3 TWO GROUPS DISCRIMINANT ANALYSIS
3.1 Introduction
3.2 Statistical Tests in Discriminant Analysis
3.2.1 Testing Differences Between Two Group Centroids Using Hotelling’s T²
3.2.2 The Significance of Discriminating Variables
3.3 Discriminant Function for Classification
3.3.1 Simultaneous Direct Methods
3.3.2 Stepwise Methods
3.4 Discriminant Score
3.5 Cutoff Score Determination
3.6 Evaluating Discriminant Function
3.6.1 Classification Matrix
3.6.2 Estimating Probabilities of Misclassification
3.7 Linear Discriminant Function for Two Multivariate Normal Populations with Known Parameters when $\Sigma_1 = \Sigma_2 = \Sigma$
3.8 Linear Discriminant Function for Samples of Two Multivariate Normal Populations with Known Parameters when $\Sigma_1 = \Sigma_2 = \Sigma$

4 DATA ANALYSIS USING DISCRIMINANT ANALYSIS
4.1 Description of Data
4.2 Data Analysis Using SPSS
4.2.1 Introduction to SPSS for Windows Version 10.0
4.2.2 Steps of Data Analysis Using SPSS
4.3 Results of Study Using Discriminant Analysis
4.3.1 Test of Normality
4.3.1.1 Statistical Method
4.3.1.2 Graphical Method
4.3.2 Box’s M
4.3.3 Group Statistics
4.3.4 Tests of Equality of Group Means
4.3.5 Covariance Matrices
4.3.6 Eigenvalues
4.3.7 Wilks’ Lambda
4.3.8 Standardized Canonical Discriminant Function Coefficients
4.3.9 Structure Matrix
4.3.10 Unstandardized Discriminant Function Coefficients
4.3.11 Functions at Group Centroids
4.3.12 Classification Results

5 CONCLUSION AND RECOMMENDATION
5.1 Conclusion
5.2 Recommendation

REFERENCES


LIST OF TABLES

TABLE NO. TITLE

1.1 Matrix
1.2 Multivariate Techniques
3.1 Computation of Discriminant Score
3.2 Classification Matrix
4.1 Wechsler Adult Intelligence Scale Subtest Scores
4.2 Descriptions of WAIS subtests
4.3 Test of Normality
4.4 Log Determinants
4.5 Results of Box’s M
4.6 Mean and Standard Deviation of variables in two groups
4.7 ANOVA Table
4.8 Covariance Matrices of two groups
4.9 Pooled Within-Groups Matrices
4.10 Results of Eigenvalues
4.11 Significance test using Wilks’ Lambda
4.12 Standardized Canonical Discriminant Function Coefficients
4.13 Structure Matrix
4.14 Unstandardized Coefficients
4.15 Group Centroids
4.16 Classification Results


LIST OF FIGURES

FIGURE NO. TITLE

2.1 Plots of Skewness and Kurtosis
2.2 Normal Q-Q Plot
2.3 Histogram Plot
3.1 Plot of Discriminant Scores
4.1 Empty Data Editor
4.2 SPSS Data Editor Box for definition of variables
4.3 SPSS Data Editor Box to define the dependent variable and independent variables
4.4 Value Label Box for “group”
4.5 Entering data in the data view
4.6 Starting Discriminant Analysis
4.7 Discriminant Analysis Dialog Box
4.8 Discriminant Analysis: Define Range of grouping variable
4.9 Discriminant Analysis Box with the “Enter variables together”
4.10 Discriminant Analysis Statistics for Descriptives, Function Coefficients and matrices
4.11 Discriminant Analysis: Classification
4.12 Discriminant Analysis: Save
4.13 Output of the analysis
4.14 Normal Q-Q Plots of Variables
4.15 Plot of Histogram


LIST OF ABBREVIATIONS

$c(2|1)$ - Cost when an observation from $\pi_2$ is incorrectly classified as $\pi_1$
$c(1|2)$ - Cost when an observation from $\pi_1$ is incorrectly classified as $\pi_2$
$d_i$ - Mahalanobis distance
$\Sigma$ - Covariance matrix
$H_0$ - Null hypothesis
$H_a$ - Alternative hypothesis
$\hat{m}$ - Cutoff value
$n_i$ - Sample size
NG - Number of groups
$p$ - Number of predictor variables
$p_1$ - Prior probability of $\pi_1$
$p_2$ - Prior probability of $\pi_2$
$S_{pooled}$ - Pooled sample variance-covariance matrix
$S_{pooled}^{-1}$ - Inverse of the pooled sample variance-covariance matrix
$S_i$ - Sample variance-covariance matrices
$SS_w$ - Within-groups sum of squares
$SS_t$ - Total sum of squares
$\mu_i$ - True mean vector
$W_i$ - Discriminant coefficient for variable $i$
WAIS - Wechsler Adult Intelligence Scale
$X_i$ - Value of independent variable $i$
$x$ - Predictor variables or independent variables
$\bar{x}_i$ - Estimated mean vector


$x_0$ - New observation
$y$ - Dependent variable
$\bar{y}_i$ - Group centroids or group means
$Z_n$ - Discriminant score for the $n$th individual
$\pi_i$ - Populations
$\alpha$ - Significance level
$\Lambda$ - Wilks’ Lambda
SPSS - Statistical Package for the Social Sciences


CHAPTER 1

INTRODUCTION

1.1 Introduction

Multivariate data occur in all branches of science. An experimental unit is an object that can be measured in some way. The objects may be items, persons, organizations, events, and so on. Measuring and evaluating experimental units are two principal activities of most researchers. Multivariate data result whenever a researcher measures more than one variable on each experimental unit. The variables, sometimes called characteristics or properties, are the aspects of the objects that are measured. Multivariate data, whether metric (interval and ratio scale) or nonmetric (nominal and ordinal), are typically arranged in a structured array called a matrix.
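To make the matrix layout concrete, the following minimal sketch stores a small data matrix with objects in rows and variables in columns, then computes the estimated mean vector and the sample variance-covariance matrix that later chapters rely on. It assumes Python with NumPy (the thesis itself works in SPSS), and the numbers are made-up toy values, not data from this study.

```python
import numpy as np

# Hypothetical data matrix (toy values, not from this study):
# each row is an object, each column is a variable.
X = np.array([
    [10.0, 12.0,  9.0],
    [ 8.0, 11.0, 10.0],
    [12.0, 14.0, 11.0],
    [10.0, 13.0, 10.0],
])

n, p = X.shape                 # n = 4 objects, p = 3 variables
x_bar = X.mean(axis=0)         # estimated mean vector, one entry per variable
S = np.cov(X, rowvar=False)    # p x p sample variance-covariance matrix (divisor n - 1)

print("n =", n, "p =", p)
print("mean vector:", x_bar)
print("covariance matrix:\n", S)
```

Passing `rowvar=False` tells `np.cov` that variables are in columns, matching the layout in which each row is an object.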


Table 1.1 : Matrix

$$
\begin{array}{c|ccccc}
\text{Objects} \,\backslash\, \text{Variables} & 1 & 2 & 3 & \cdots & p \\ \hline
1 & x_{11} & x_{12} & x_{13} & \cdots & x_{1p} \\
2 & x_{21} & x_{22} & x_{23} & \cdots & x_{2p} \\
3 & x_{31} & x_{32} & x_{33} & \cdots & x_{3p} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
n & x_{n1} & x_{n2} & x_{n3} & \cdots & x_{np}
\end{array}
$$

Here $p$ denotes the number of variables and $n$ denotes the number of objects in the sample. Each column of the matrix is a variable; each row corresponds to an object.

Multivariate techniques are increasingly popular nowadays. Many multivariate techniques are extensions of univariate analysis (the analysis of a single distribution) and bivariate analysis. Multivariate statistics are the complete or general case, while univariate and bivariate statistics are special cases of the multivariate model (Tabachnick, B.G 2001). The term “univariate analysis” refers to analysis involving a single dependent variable, possibly with multiple independent variables. “Bivariate analysis”, in turn, is the analysis of the relationship between two variables. With multivariate analysis, we can simultaneously analyze multiple dependent and multiple independent variables.

A multivariate method is a collection of procedures for analyzing associations between two or more sets of measurements made on each object in one or more samples of objects. Multivariate methods are extremely useful for helping researchers make sense of large, complex data sets that consist of many variables measured on large numbers of experimental units. They are widely applied in industry, government and research centers, and are becoming more widely used because many research questions are too complicated for univariate analysis and because computer software for performing the analyses is readily available. Packages such as SPSS, S-Plus, and SAS have made it easier for statisticians and researchers to process large and complex databases with multivariate analysis.

Multivariate methods are sometimes classified as “variable-directed” techniques, which are primarily concerned with the relationships that might exist among the measured response variables. Multivariate methods are used principally for dependence analysis and independence analysis. In dependence analysis, a set of dependent variables is predicted by other, independent variables. In independence analysis, however, no single variable or group of variables is defined as independent or dependent; instead, the methods examine the relationships among all the variables together.

Table 1.2 : Multivariate Techniques

Dependence Techniques

• Canonical Correlation Analysis
• Conjoint Analysis
• Discriminant Analysis
• Linear Probability Models
• Multiple Regression
• Multivariate Analysis of Variance (MANOVA)
• Structural Equation Modeling

Independence Techniques

• Clusters Analysis
• Correspondence Analysis
• Factor Analysis
• Multidimensional Scaling
• Principal Components Analysis

This report discusses discriminant analysis, one of the multivariate methods. Discriminant analysis can be viewed as a special case of multiple regression. It is designed for a nonmetric (categorical) dependent variable and metric independent variables. Discriminant analysis is similar to regression analysis except that the dependent variable is categorical rather than continuous. In regression, we want to predict the value of a variable of interest based on a set of predictor variables. In discriminant analysis, we want to predict the class membership of an individual based on a set of predictors (Dallas E. Johnson, 1998).
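For two groups, predicting class membership can be sketched with Fisher's linear discriminant function. The sketch assumes a common covariance matrix, estimated by pooling the two sample covariance matrices, and equal prior probabilities and misclassification costs, so the cutoff $\hat{m}$ is the midpoint of the two group centroids on the discriminant axis. It uses Python with NumPy. The helper names `fit_two_group_lda` and `classify` and the toy data are hypothetical; this is an illustration, not the exact computation SPSS performs later in the thesis.

```python
import numpy as np

def fit_two_group_lda(X1, X2):
    """Hypothetical helper: Fisher's linear discriminant for two groups.

    Returns the coefficient vector w and the cutoff value m_hat.
    """
    n1, n2 = len(X1), len(X2)
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    # Pooled sample variance-covariance matrix
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    # Discriminant coefficients w = S_pooled^{-1} (xbar1 - xbar2)
    w = np.linalg.solve(S_pooled, xbar1 - xbar2)
    # Cutoff: midpoint of the two group centroids on the discriminant axis
    m_hat = 0.5 * (w @ xbar1 + w @ xbar2)
    return w, m_hat

def classify(x0, w, m_hat):
    """Assign x0 to group 1 if its discriminant score is at least m_hat, else group 2."""
    return 1 if w @ x0 >= m_hat else 2

# Toy data with p = 2 predictors (hypothetical values)
X1 = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 2.0], [5.0, 1.0]])
X2 = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 4.0], [2.0, 3.0]])

w, m_hat = fit_two_group_lda(X1, X2)
print("coefficients:", w, "cutoff:", m_hat)
print("new observation assigned to group", classify(np.array([4.0, 3.0]), w, m_hat))
```

Using `np.linalg.solve` instead of explicitly inverting $S_{pooled}$ gives the same coefficients and is numerically more stable.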

Discriminant analysis is discussed further in the following chapters.

1.2 Objectives of Study

In general, the objectives of this study are:

1. To learn and understand discriminant analysis between two groups, namely:

(i) To test for significant differences between the two defined groups with respect to a set of predictors.

(ii) To identify the predictor variables that best discriminate between the two groups.

(iii) To evaluate how well the discriminant rule classifies individuals into one of the two existing groups.

2. To apply discriminant analysis in the field of educational psychology.

3. To learn and use the statistical software package Statistical Package for the Social Sciences (SPSS) in analyzing data.
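Objective (iii) is typically assessed with a classification matrix that cross-tabulates actual against predicted group membership. The proportion of misclassified individuals gives the apparent error rate. The following minimal Python sketch assumes groups are coded 1 and 2. The function names and the ten labels are hypothetical and do not come from the WAIS data.

```python
def classification_matrix(actual, predicted, groups=(1, 2)):
    """Count individuals by (actual group, predicted group)."""
    counts = {(a, g): 0 for a in groups for g in groups}
    for a, g in zip(actual, predicted):
        counts[(a, g)] += 1
    return counts

def apparent_error_rate(actual, predicted):
    """Proportion of individuals assigned to the wrong group."""
    wrong = sum(1 for a, g in zip(actual, predicted) if a != g)
    return wrong / len(actual)

# Hypothetical labels for 10 individuals (not data from this study)
actual    = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
predicted = [1, 1, 1, 2, 1, 2, 2, 1, 2, 2]

cm = classification_matrix(actual, predicted)
for a in (1, 2):
    print(f"actual group {a}:", [cm[(a, g)] for g in (1, 2)])
print("apparent error rate:", apparent_error_rate(actual, predicted))
```

The same individuals are used both to build and to assess the rule, so the apparent error rate tends to be optimistic about performance on new individuals.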

1.3 Scope of Study

This study discusses only two-group discriminant analysis with a categorical dependent variable and metric independent variables. Hotelling’s T² is used to test for a significant difference between the two group centroids. The simultaneous estimation method is used to derive the discriminant function. The linear discriminant rule is discussed with a worked example of calculation. Discriminant analysis is applied to data in the educational psychology field from the Wechsler Adult Intelligence Scale (WAIS) subtests using SPSS.
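The Hotelling’s T² test for the difference between two group centroids can be sketched as follows. The sketch assumes a common covariance matrix, estimated by pooling, and uses the standard conversion of T² to an F statistic with $p$ and $n_1 + n_2 - p - 1$ degrees of freedom. It is written in Python with NumPy. The function name `hotelling_t2` and the toy data are hypothetical. The p-value lookup is left to an F table or a statistics package.

```python
import numpy as np

def hotelling_t2(X1, X2):
    """Hypothetical helper: two-sample Hotelling's T^2 with a pooled covariance matrix.

    Returns T^2, the equivalent F statistic and its degrees of freedom.
    """
    n1, p = X1.shape
    n2 = X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)   # difference between group centroids
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
                + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    # T^2 = (n1 n2 / (n1 + n2)) * diff' S_pooled^{-1} diff
    t2 = (n1 * n2 / (n1 + n2)) * (diff @ np.linalg.solve(S_pooled, diff))
    # Under H0 (equal centroids), this statistic follows F(p, n1 + n2 - p - 1)
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)

# Toy data with p = 2 predictors (hypothetical values)
X1 = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 2.0], [5.0, 1.0]])
X2 = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 4.0], [2.0, 3.0]])

t2, f_stat, df = hotelling_t2(X1, X2)
print("T^2 =", t2, "F =", f_stat, "df =", df)
```

A large F relative to the critical value at significance level $\alpha$ leads to rejecting $H_0$ that the two group centroids are equal.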

1.4 Thesis Organization

Chapter 1 introduces multivariate analysis in general, followed by the objectives of the study, the scope of the study and the thesis organization.

Chapter 2 introduces discriminant analysis. The objectives of discriminant analysis and its applications are discussed in this chapter. It also covers some limitations on the dependent and independent variables, four different discriminant rules with an example calculation, and the assessment of the assumptions of discriminant analysis.

Chapter 3 discusses a hypothesis test for differences between two group centroids and the linear discriminant function for two multivariate normal populations with known parameters when $\Sigma_1 = \Sigma_2 = \Sigma$. It also discusses the determination of the cutoff score, the derivation of the discriminant function used to compute discriminant scores, and the evaluation of the discriminant function.

Chapter 4 discusses the application of discriminant analysis in educational psychology, using WAIS subtest score data from the Senile and Non-senile groups. It also includes every essential step of using the statistical software SPSS version 10.0 for discriminant analysis, together with the results of the data analysis.

Chapter 5 presents the conclusion of the whole study and some recommendations for those interested in pursuing further study of discriminant analysis.