Discrimination amongst k populations

We want to determine if an observation vector

\[ \vec{x} = (x_1, \dots, x_p)' \]

comes from one of the k populations

\[ \pi_1: f_1(\vec{x}) = f_1(x_1, \dots, x_p), \quad \dots, \quad \pi_k: f_k(\vec{x}) = f_k(x_1, \dots, x_p) \]

For this purpose we need to partition p-dimensional space into k regions $C_1, C_2, \dots, C_k$.

We will make the decision:

\[ D_i: \vec{x} \text{ came from } \pi_i \quad \text{if } \vec{x} \in C_i \]

Misclassification probabilities

\[ P[j|i] = P[\text{classify the case in } \pi_j \text{ when the case is from } \pi_i] = \int_{C_j} f_i(\vec{x})\, d\vec{x} \]

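Cost of Misclassification

$c_{j|i}$ = cost of classifying the case in $\pi_j$ when the case is from $\pi_i$

Initial probabilities of inclusion

$P[i]$ = P[the case is from $\pi_i$ initially]

Expected Cost of Misclassification of a case from population i

We assume that we know the case came from $\pi_i$:

\[ ECM(i) = c_{1|i} P[1|i] + \dots + c_{i-1|i} P[i-1|i] + c_{i+1|i} P[i+1|i] + \dots + c_{k|i} P[k|i] = \sum_{j \ne i} c_{j|i} P[j|i] \]

Total Expected Cost of Misclassification

\[
\begin{aligned}
ECM &= P[1]\,ECM(1) + \dots + P[k]\,ECM(k) = \sum_i P[i]\,ECM(i) \\
&= \sum_i P[i] \sum_{j \ne i} c_{j|i} P[j|i] \\
&= \sum_i \sum_{j \ne i} P[i]\, c_{j|i} \int_{C_j} f_i(\vec{x})\, d\vec{x} \\
&= \sum_j \int_{C_j} \Big[ \sum_{i \ne j} P[i]\, f_i(\vec{x})\, c_{j|i} \Big] d\vec{x}
\end{aligned}
\]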
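As a rough numerical illustration of the total expected cost of misclassification, ECM = sum over i of P[i] times the sum over j ≠ i of c(j|i) P[j|i], the sketch below evaluates it for k = 2. The function name `expected_cost` and all numbers are hypothetical and chosen only for illustration; indices are 0-based, so `cost[j][i]` plays the role of c(j+1 | i+1).

```python
def expected_cost(prior, cost, p_classify):
    """Total expected cost of misclassification.

    prior[i]         : P[i], prior probability that a case is from population i
    cost[j][i]       : c_{j|i}, cost of classifying into j a case from i
    p_classify[j][i] : P[j|i], probability of classifying into j a case from i
    """
    k = len(prior)
    total = 0.0
    for i in range(k):
        # ECM(i): sum over the wrong decisions j != i
        ecm_i = sum(cost[j][i] * p_classify[j][i] for j in range(k) if j != i)
        total += prior[i] * ecm_i
    return total


# Hypothetical inputs for k = 2 populations (illustration only).
prior = [0.5, 0.5]
cost = [[0.0, 2.0],
        [1.0, 0.0]]
p_classify = [[0.9, 0.2],
              [0.1, 0.8]]
ecm = expected_cost(prior, cost, p_classify)
print(ecm)
```

Here the two contributions are 0.5 × (1 × 0.1) and 0.5 × (2 × 0.2), so the more expensive error dominates the total even though the priors are equal.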

Optimal Classification Rule

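The optimal classification rule will find the regions $C_j$ that minimize:

\[
\begin{aligned}
ECM &= \sum_j \int_{C_j} \Big[ \sum_{i \ne j} P[i]\, f_i(\vec{x})\, c_{j|i} \Big] d\vec{x} \\
&= c \sum_j \int_{C_j} \Big[ \sum_{i \ne j} P[i]\, f_i(\vec{x}) \Big] d\vec{x} \quad \text{if } c_{j|i} = c \text{ for all } i \ne j \\
&= c \sum_j \int_{C_j} \Big[ \sum_{i=1}^{k} P[i]\, f_i(\vec{x}) - P[j]\, f_j(\vec{x}) \Big] d\vec{x}
\end{aligned}
\]

ECM will be minimized if $C_j$ is chosen where the term that is omitted, $P[j]\, f_j(\vec{x})$, is the largest.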

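Optimal Regions when misclassification costs are equal

\[
\begin{aligned}
C_j &= \{ \vec{x} \mid P[j]\, f_j(\vec{x}) \ge P[i]\, f_i(\vec{x}) \text{ for } i \ne j \} \\
&= \{ \vec{x} \mid \ln( P[j]\, f_j(\vec{x}) ) \ge \ln( P[i]\, f_i(\vec{x}) ) \text{ for } i \ne j \}
\end{aligned}
\]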
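With equal misclassification costs, the rule amounts to assigning x to the population j with the largest value of P[j] f_j(x). The following is a minimal sketch of that rule for one-dimensional data; the three univariate normal densities, the priors and the helper names `normal_pdf` and `classify` are assumptions made for illustration and do not come from the slides.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Normal(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def classify(x, priors, densities):
    """Return the index j that maximizes P[j] * f_j(x)."""
    scores = [p * f(x) for p, f in zip(priors, densities)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical populations (illustration only).
priors = [0.5, 0.3, 0.2]
densities = [
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 3.0, 1.0),
    lambda x: normal_pdf(x, 6.0, 2.0),
]
labels = [classify(x, priors, densities) for x in (-1.0, 3.0, 9.0)]
print(labels)
```

Nothing in this version requires normality or a common variance; any set of densities f_j can be plugged in.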

Optimal Regions when misclassification costs are equal and distributions are p-variate Normal with common covariance matrix $\Sigma$

\[ f_i(\vec{x}) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\Big( -\tfrac{1}{2} (\vec{x} - \vec{\mu}_i)' \Sigma^{-1} (\vec{x} - \vec{\mu}_i) \Big) \]

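In the case of normality

\[ \ln( P[i]\, f_i(\vec{x}) ) = \ln P[i] - \tfrac{p}{2} \ln(2\pi) - \tfrac{1}{2} \ln|\Sigma| - \tfrac{1}{2} (\vec{x} - \vec{\mu}_i)' \Sigma^{-1} (\vec{x} - \vec{\mu}_i) \]

and $\ln( P[j]\, f_j(\vec{x}) ) \ge \ln( P[i]\, f_i(\vec{x}) )$ if:

\[
\ln P[j] - \tfrac{p}{2} \ln(2\pi) - \tfrac{1}{2} \ln|\Sigma| - \tfrac{1}{2} (\vec{x} - \vec{\mu}_j)' \Sigma^{-1} (\vec{x} - \vec{\mu}_j)
\ge
\ln P[i] - \tfrac{p}{2} \ln(2\pi) - \tfrac{1}{2} \ln|\Sigma| - \tfrac{1}{2} (\vec{x} - \vec{\mu}_i)' \Sigma^{-1} (\vec{x} - \vec{\mu}_i)
\]

The terms $-\tfrac{p}{2} \ln(2\pi)$, $-\tfrac{1}{2} \ln|\Sigma|$ and $-\tfrac{1}{2} \vec{x}' \Sigma^{-1} \vec{x}$ appear on both sides and cancel, that is

\[ \vec{\mu}_j' \Sigma^{-1} \vec{x} - \tfrac{1}{2} \vec{\mu}_j' \Sigma^{-1} \vec{\mu}_j + \ln P[j] \ge \vec{\mu}_i' \Sigma^{-1} \vec{x} - \tfrac{1}{2} \vec{\mu}_i' \Sigma^{-1} \vec{\mu}_i + \ln P[i] \]

or

\[ \vec{a}_j' \vec{x} + b_j \ge \vec{a}_i' \vec{x} + b_i \]

where $\vec{a}_i = \Sigma^{-1} \vec{\mu}_i$ and $b_i = -\tfrac{1}{2} \vec{\mu}_i' \Sigma^{-1} \vec{\mu}_i + \ln P[i]$.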

Summarizing

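We will classify the observation vector in population j if:

\[ L_j = \vec{a}_j' \vec{x} + b_j = \max_i L_i = \max_i ( \vec{a}_i' \vec{x} + b_i ) \]

where $\vec{a}_i = \Sigma^{-1} \vec{\mu}_i$ and $b_i = -\tfrac{1}{2} \vec{\mu}_i' \Sigma^{-1} \vec{\mu}_i + \ln P[i]$.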

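Figure: partition of the plane into three regions ($L_1 \ge L_2, L_3$; $L_2 \ge L_1, L_3$; $L_3 \ge L_1, L_2$), separated by the boundaries $L_1 = L_2$, $L_2 = L_3$ and $L_1 = L_3$.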
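The linear rule can be sketched in a few lines: compute a_i = Σ⁻¹ μ_i and b_i = ln P[i] − ½ μ_i' Σ⁻¹ μ_i for each population, then pick the largest score L_i = a_i'x + b_i. This sketch assumes the means, common covariance matrix and priors are known; the parameter values and the helper names `solve`, `linear_scores` and `classify` are hypothetical. Plain Python lists are used so the block has no dependencies.

```python
import math


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(A[r]) + [b[r]] for r in range(n)]  # augmented matrix [A | b]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [u - factor * v for u, v in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


def linear_scores(x, means, sigma, priors):
    """Scores L_i = a_i'x + b_i for every population i."""
    scores = []
    for mu, p in zip(means, priors):
        a = solve(sigma, mu)  # a_i = Sigma^{-1} mu_i
        b = math.log(p) - 0.5 * sum(m * ai for m, ai in zip(mu, a))
        scores.append(sum(ai * xi for ai, xi in zip(a, x)) + b)
    return scores


def classify(x, means, sigma, priors):
    """Index of the population with the largest linear score."""
    scores = linear_scores(x, means, sigma, priors)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical parameters for k = 3 bivariate populations (illustration only).
means = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
sigma = [[1.0, 0.5], [0.5, 1.0]]
priors = [1 / 3, 1 / 3, 1 / 3]
points = [[0.2, -0.1], [4.1, 0.3], [-0.5, 3.8]]
labels = [classify(x, means, sigma, priors) for x in points]
print(labels)
```

In practice the means and covariance matrix would be estimated from training samples; that estimation step is outside the scope of this sketch.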

k-means Clustering

A non-hierarchical clustering scheme

We want to subdivide the data set into k groups.

The k-means algorithm

1. Initially subdivide the complete data into k groups.

2. Compute the centroids (mean vector) for each group.

3. Sequentially go through the data reassigning each case to the group with the closest centroid.

4. After reassigning a case to a new group, recalculate the centroids of both the group it left and the group it joined.

5. Continue until no more cases are reassigned.
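The five steps above can be sketched as follows. This is one possible reading of the algorithm, not a definitive implementation: it assumes squared Euclidean distance, an initial partition in which every group is non-empty, a guard that keeps a case in place when moving it would empty its group, and a `max_passes` safety cap. The function names and the six data points are hypothetical.

```python
def centroid(points):
    """Mean vector of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def k_means(data, labels, k, max_passes=100):
    """Sequential k-means from an initial partition `labels` (values 0..k-1)."""
    labels = list(labels)
    # Step 2: centroid of each initial group.
    cents = [centroid([x for x, g in zip(data, labels) if g == j])
             for j in range(k)]
    for _ in range(max_passes):
        moved = False
        # Step 3: go through the cases one at a time.
        for idx, x in enumerate(data):
            old = labels[idx]
            new = min(range(k), key=lambda j: sq_dist(x, cents[j]))
            # Assumption: never move the last remaining member of a group.
            if new != old and labels.count(old) > 1:
                labels[idx] = new
                # Step 4: update only the two affected centroids.
                for j in (old, new):
                    cents[j] = centroid(
                        [p for p, g in zip(data, labels) if g == j])
                moved = True
        # Step 5: stop after a full pass with no reassignment.
        if not moved:
            break
    return labels, cents


# Hypothetical data with a deliberately poor initial partition.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
start = [0, 1, 0, 1, 0, 1]
final_labels, cents = k_means(data, start, k=2)
print(final_labels, cents)
```

Because centroids are updated immediately after each move, the result can depend on the order in which the cases are visited as well as on the initial partition.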

Example: n = 60 cases with two variables (x,y) measured

      x      y      x      y      x      y      x      y
  26.25  17.32  18.29  13.55   8.95   25.8  20.85   8.14
  27.11  23.51  17.45  15.61   8.56  16.98  18.65   8.27
  35.05  19.41  26.21  22.43  11.23  17.89  22.03      5
  29.04  21.98  35.48   21.5  10.99  15.44  19.16   7.26
   28.4  22.39  19.34  15.19   14.1  23.47  25.27   4.31
   22.2  19.26  16.33  20.05   3.51  20.32   21.2  10.26
  29.38  22.61  15.02  17.21  12.36  18.77  19.02   7.22
  26.59  18.76   8.31  23.44   7.87  17.32  20.49   6.67
  27.61   18.9    8.9  21.13   9.98  20.26  20.09   9.57
  25.23  20.05    9.3  17.33   9.32  22.96  33.83   4.13
  25.02  16.63  14.55  22.01  18.79   10.2  23.95  10.57
  33.57  26.85  12.12   28.5  28.91      8  23.92  12.15
  27.25  16.44   1.16  20.54  26.91   5.65  13.51  11.73
   33.7   22.8  13.95  17.69  14.22  10.45  20.53    9.4
  27.16  19.48  13.14  24.02  33.24   5.29  14.88  14.42

Graph: Scattergram of data

Graph: Initial Clustering

Graph: Final Clustering

Graph: True subpopulations

An Example: Cluster Analysis, Discriminant Analysis, MANOVA

A survey was given to 132 students (35 male, 97 female).

They rated, on a Likert scale of 1 to 5, their agreement with each of 40 statements.

All statements are related to the Meaning of Life.

Questions and Statements

1. How religious/spiritual would you say you are?

2. To have trustworthy and intimate friend(s)

3. To have a fulfilling career

4. To be closely connected to family

5. To share values/beliefs with others in your close circle or community

6. To have and raise children

7. To continually set short and long-term, achievable goals for yourself

8. To feel satisfied with yourself (feel good about yourself)

9. To live up to the expectations of family and close friends

10. To contribute to world peace


11. To be involved in an intimate relationship with a significant person

12. To give of yourself to others.

13. To be able to plan and take time for leisure.

14. To act on your own personal beliefs, despite outside pressure.

15. To be seen as physically attractive.

16. To feel confident in choosing new experiences to better yourself.

17. To care about the state of the physical/natural environment.

18. To take responsibility for your mistakes.

19. To make restitution for your mistakes, if necessary.

20. To be involved with social or political causes.

21. To keep up with media and popular-culture trends.

22. To adhere to religious practices based on tradition or rituals.

23. To use your own creativity in a way that you believe is worthwhile.

24. The meaning of life is found in understanding one's ultimate purpose for life.

25. The meaning of life can be discovered through intentionally living a life that glorifies a Spiritual being.

26. There is a reason for everything that happens.

27. Obtaining things in life that are material and tangible is only part of discovering the meaning of life.

28. People unearth the same basic values when attempting to find the meaning of life.

29. It is more important to cultivate character than to be consumed with outward rewards, or, awards.

30. Some aims or goals in life are more valuable than other goals.

31. The purpose of life lies in promoting the ends of truth, beauty, and goodness.

32. A meaningful life is one that contributes to the well-being of others.

33. The meaning of life is the same as a happy life.

34. The meaning of life is found in realizing my potential.

35. Life has purpose only in the everyday details of living.

36. There is no, one, universal way of obtaining a meaningful life for all people.

37. People passionately desire different things. Obtaining these things contributes to making life more meaningful for them.

38. What contributes to a meaningful life varies according to each person (or group).

39. Lives can be meaningful even without the existence of a God or spiritual realm.

40. Our lives have no significance, but we must live as if they do.

The Analysis

The first step in the analysis is to perform cluster analysis to see if there are any subgroups of interest.

Both hierarchical and partitioning (k-means) methods were used for the cluster analysis.

Figure 1: Dendrogram. Clustering using Ward's method with Euclidean distances (axes: cases; linkage distance, 0 to 80).

From the dendrogram in the previous figure, cutting across the branches at a linkage distance between 30 and 75 suggests that 2 or 3 clusters describe the data best.

The k-means method was then used (with k = 2 and k = 3) to identify the members of these clusters. The k-means procedure similarly indicated that two- and three-cluster models fit the data best (attempts to use higher values of k resulted in clusters with only one case).

One-way MANOVA was then used to test for significant differences between the clusters.

It was also used to identify the statements on which the differences between the clusters were most significant.

Table 1: Questions and Descriptive Statistics by Clusters

Columns: p-value; Cluster 1 mean, std dev; Cluster 2 mean, std dev; Cluster 3 mean, std dev; Question

0.000 2.40 0.93 4.41 0.84 1.26 0.45 25. The meaning of life can be discovered through intentionally living a life that glorifies a Spiritual being.

0.000 4.65 0.73 2.59 1.52 4.58 0.96 36. There is no, one, universal way of obtaining a meaningful life for all people.

0.000 4.24 0.85 1.59 0.95 4.37 1.38 39. Lives can be meaningful even without the existence of a God or spiritual realm.

0.000 1.40 0.60 1.17 0.44 3.05 1.78 40. Our lives have no significance, but we must live as if they do.

0.000 2.31 1.10 3.41 1.07 1.37 0.68 22. To adhere to religious practices based on tradition or rituals.

0.000 4.03 0.93 4.34 0.94 2.32 1.57 26. There is a reason for everything that happens.

0.000 3.50 1.11 3.27 1.36 1.53 0.90 27. Obtaining things in life that are material and tangible is only part of discovering the meaning of life.

0.000 2.78 1.01 3.98 1.29 2.37 1.34 1. How religious/spiritual would you say you are?

0.000 4.79 0.41 3.90 1.14 4.53 0.84 38. What contributes to a meaningful life varies according to each person (or group).

0.000 4.22 0.86 3.22 1.26 3.16 1.26 37. People passionately desire different things. Obtaining these things contributes to making life more meaningful for them.

0.000 4.03 0.71 3.34 1.09 2.89 1.15 34. The meaning of life is found in realizing my potential.

0.000 3.89 0.78 4.34 0.62 3.16 1.07 5. To share values/beliefs with others in your close circle or community

0.000 4.25 0.55 4.61 0.54 4.89 0.32 14. To act on your own personal beliefs, despite outside pressure.

0.000 3.53 0.92 3.76 1.11 2.37 1.42 24. The meaning of life is found in understanding one's ultimate purpose for life.

0.000 4.38 0.57 4.37 0.73 3.58 1.43 32. A meaningful life is one that contributes to the well-being of others.

0.001 3.38 1.12 2.61 1.12 2.53 1.54 33. The meaning of life is the same as a happy life.

0.003 2.64 1.07 2.05 1.16 1.84 1.17 35. Life has purpose only in the everyday details of living.

0.005 3.01 0.94 2.95 0.92 2.21 1.08 28. People unearth the same basic values when attempting to find the meaning of life.

0.007 3.93 0.94 4.27 0.90 3.42 1.17 7. To continually set short and long-term, achievable goals for yourself

0.008 3.67 0.95 3.51 0.95 2.84 1.34 9. To live up to the expectations of family and close friends

0.013 3.63 0.78 3.90 0.89 4.26 1.10 17. To care about the state of the physical/natural environment.

0.015 4.32 0.73 4.05 0.89 4.68 0.75 13. To be able to plan and take time for leisure.

0.015 2.72 0.97 2.56 1.05 1.95 1.13 21. To keep up with media and popular-culture trends.

0.041 4.50 0.80 4.54 0.78 3.95 1.35 30. Some aims or goals in life are more valuable than other goals.

Table: Questions and Cluster means

Columns: p-value; Cluster 1 mean; Cluster 2 mean; Cluster 3 mean; Question

0.000 2.40 4.41 1.26 25. The meaning of life can be discovered through intentionally living a life that glorifies a Spiritual being.

0.000 4.65 2.59 4.58 36. There is no, one, universal way of obtaining a meaningful life for all people.

0.000 4.24 1.59 4.37 39. Lives can be meaningful even without the existence of a God or spiritual realm.

0.000 1.40 1.17 3.05 40. Our lives have no significance, but we must live as if they do.

0.000 2.31 3.41 1.37 22. To adhere to religious practices based on tradition or rituals.

0.000 4.03 4.34 2.32 26. There is a reason for everything that happens.

0.000 3.50 3.27 1.53 27. Obtaining things in life that are material and tangible is only part of discovering the meaning of life.

0.000 2.78 3.98 2.37 1. How religious/spiritual would you say you are?

0.000 4.79 3.90 4.53 38. What contributes to a meaningful life varies according to each person (or group).

0.000 4.22 3.22 3.16 37. People passionately desire different things. Obtaining these things contributes to making life more meaningful for them.

0.000 4.03 3.34 2.89 34. The meaning of life is found in realizing my potential.

0.000 3.89 4.34 3.16 5. To share values/beliefs with others in your close circle or community

0.000 4.25 4.61 4.89 14. To act on your own personal beliefs, despite outside pressure.

0.000 3.53 3.76 2.37 24. The meaning of life is found in understanding one's ultimate purpose for life.

0.000 4.38 4.37 3.58 32. A meaningful life is one that contributes to the well-being of others.

0.001 3.38 2.61 2.53 33. The meaning of life is the same as a happy life.

0.003 2.64 2.05 1.84 35. Life has purpose only in the everyday details of living.

0.005 3.01 2.95 2.21 28. People unearth the same basic values when attempting to find the meaning of life.

0.007 3.93 4.27 3.42 7. To continually set short and long-term, achievable goals for yourself

0.008 3.67 3.51 2.84 9. To live up to the expectations of family and close friends

0.013 3.63 3.90 4.26 17. To care about the state of the physical/natural environment.

0.015 4.32 4.05 4.68 13. To be able to plan and take time for leisure.

0.015 2.72 2.56 1.95 21. To keep up with media and popular-culture trends.

0.041 4.50 4.54 3.95 30. Some aims or goals in life are more valuable than other goals.

A stepwise discriminant function analysis was then carried out to predict cluster membership and to identify a minimal set of survey statements that separates the clusters, for the 128 participants in the study.

Table 2: Standardized Canonical Discriminant Function Coefficients

Columns: Function 1; Function 2; Question

 .339  -.069  13. To be able to plan and take time for leisure.
 .012  -.351  14. To act on your own personal beliefs, despite outside pressure.
-.501   .118  25. The meaning of life can be discovered through intentionally living a life that glorifies a Spiritual being.
-.336   .368  26. There is a reason for everything that happens.
-.234   .287  27. Obtaining things in life that are material and tangible is only part of discovering the meaning of life.
-.042   .444  34. The meaning of life is found in realizing my potential.
 .258   .319  36. There is no, one, universal way of obtaining a meaningful life for all people.
 .469   .468  39. Lives can be meaningful even without the existence of a God or spiritual realm.
 .268  -.637  40. Our lives have no significance, but we must live as if they do.

Figure 2: Cluster Mean Scores for discriminating questions (mean score, 0 to 5, for Q13, Q14, Q25, Q26, Q27, Q34, Q36, Q39 and Q40; clusters: Semi-Religious, Religious, Humanistic)

Figure: Scatter plot of F2 (Discriminant function 2) against F1 (Discriminant function 1) for the Semi-Religious, Religious and Humanistic clusters. Annotations on the plot: religious / non-religious and optimistic / pessimistic.

Discrimination performance

1. 96% of the cluster 1 respondents were correctly classified,

2. 88% of the cluster 2 respondents were correctly classified, and

3. 84% of the cluster 3 respondents were correctly classified.

Techniques for studying correlation and covariance structure

Principal Components Analysis (PCA)

Factor Analysis

Principal Component Analysis

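Let $\vec{x}$ have a p-variate Normal distribution with mean vector $\vec{\mu}$ and covariance matrix $\Sigma$.

Definition:

The linear combination

\[ C_1 = a_1 x_1 + \dots + a_p x_p = \vec{a}' \vec{x} \]

is called the first principal component if $\vec{a} = (a_1, \dots, a_p)'$ is chosen to maximize

\[ Var(C_1) = Var(\vec{a}' \vec{x}) = \vec{a}' \Sigma \vec{a} \]

subject to $\vec{a}' \vec{a} = a_1^2 + \dots + a_p^2 = 1$.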

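Consider maximizing

\[ V = Var(\vec{a}' \vec{x}) = \vec{a}' \Sigma \vec{a} \]

subject to $\vec{a}' \vec{a} = a_1^2 + \dots + a_p^2 = 1$.

Using the Lagrange multiplier technique, let

\[ g(\vec{a}, \lambda) = V + \lambda (1 - \vec{a}' \vec{a}) = \vec{a}' \Sigma \vec{a} + \lambda (1 - \vec{a}' \vec{a}) \]

Now

\[ \frac{\partial g(\vec{a}, \lambda)}{\partial \lambda} = 1 - \vec{a}' \vec{a} = 0 \quad \text{if } \vec{a}' \vec{a} = 1 \]

and

\[ \frac{\partial g(\vec{a}, \lambda)}{\partial \vec{a}} = 2 \Sigma \vec{a} - 2 \lambda \vec{a} = \vec{0} \quad \text{if } \Sigma \vec{a} = \lambda \vec{a} \]

Thus $\vec{a}$ is an eigenvector of $\Sigma$ and $\lambda$ is the eigenvalue associated with $\vec{a}$.

Also $Var(\vec{a}' \vec{x}) = \vec{a}' \Sigma \vec{a} = \vec{a}' (\lambda \vec{a}) = \lambda\, \vec{a}' \vec{a} = \lambda$.

Hence $Var(\vec{a}' \vec{x})$ is maximized if $\lambda$ is the largest eigenvalue of $\Sigma$.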

Summary

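\[ C_1 = a_1 x_1 + \dots + a_p x_p = \vec{a}' \vec{x} \]

is the first principal component if $\vec{a} = (a_1, \dots, a_p)'$ is the eigenvector of length 1 (i.e. $\vec{a}' \vec{a} = a_1^2 + \dots + a_p^2 = 1$) of $\Sigma$ associated with the largest eigenvalue $\lambda_1$ of $\Sigma$.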
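As a small numerical check of the summary, the sketch below finds the unit eigenvector for the largest eigenvalue of a covariance matrix by power iteration and confirms that a'Σa equals that eigenvalue. Power iteration is a technique swapped in here for illustration (the slides do not say how the eigenvector is computed), and the 2 × 2 matrix, the function names and the iteration count are all hypothetical. It assumes Σ is symmetric, non-zero, and that the starting vector is not orthogonal to the leading eigenvector.

```python
import math


def mat_vec(S, v):
    """Matrix-vector product S v for a matrix stored as a list of rows."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def first_principal_component(S, iters=500):
    """Unit eigenvector and eigenvalue for the largest eigenvalue of S."""
    p = len(S)
    a = [1.0 / math.sqrt(p)] * p  # starting vector with a'a = 1
    for _ in range(iters):
        w = mat_vec(S, a)
        norm = math.sqrt(sum(x * x for x in w))
        a = [x / norm for x in w]  # rescale so that a'a = 1
    lam = sum(x * y for x, y in zip(a, mat_vec(S, a)))  # a'Sa = Var(a'x)
    return a, lam


# Hypothetical covariance matrix (illustration only).
S = [[4.0, 2.0],
     [2.0, 3.0]]
a, lam = first_principal_component(S)
print(a, lam)
```

For this matrix the eigenvalues are (7 ± √17)/2, so the variance of the first principal component should come out as (7 + √17)/2.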
