Session5 Factor Analysis Handout


  • 7/30/2019 Session5 Factor Analysis Handout

    1/16

Factor Analysis

Research Studies

• In multivariate analysis, several variables are studied together.

• These variables may or may not be mutually independent of each other.

• Some may be strongly correlated with other variables; multicollinearity may exist among the variables.

• Data analysis methods for this situation are called inter-dependence methods.

Major Inter-dependence Methods

• Factor Analysis: reduces several correlated variables to a few uncorrelated, meaningful factors.

• Cluster Analysis: classifies individual elements of the population into a few homogeneous groups.

Research Studies

• Several variables are to be studied.

• The purpose is to establish a cause-and-effect relationship.

• There is one dependent (effect) variable and several independent (cause) variables.

• Data on them are obtained from a sample.

• Data analysis methods in such situations are called dependence methods.


Major Dependence Methods

Dependent variable   Independent variables   Method
Metric               Categorical             Analysis of Variance (ANOVA)
Metric               Metric                  Multiple Regression
Categorical          Categorical             Canonical Correlation
Categorical          Metric                  Multiple Discriminant Analysis

Major Dependence Methods

                                     ANOVA        Discriminant Analysis   Regression
Similarities
  Number of dependent variables      One          One                     One
  Number of independent variables    Many         Many                    Many
Differences
  Nature of the dependent variable   Metric       Categorical             Metric
  Nature of the independent variables Categorical Metric                  Metric

    Multivariate Analysis Methods

    Major Multivariate methods:

    1. Factor Analysis

    2. Cluster Analysis

3. Multivariate Discriminant Analysis

    4. Multivariate Regression Analysis.


Factor Analysis

• Defines the underlying structure among the variables in the analysis.

• Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as factors.

• Examines the entire set of inter-dependent relationships without making any distinction between dependent and independent variables.

• Reduces the total number of variables in the research study to a smaller number of factors by combining a few correlated variables into a factor.

What is a Factor

A factor is a linear combination of the observed original variables V1, V2, …, Vn:

Fi = Wi1V1 + Wi2V2 + Wi3V3 + … + WinVn

where

Fi = the ith factor (i = 1, 2, …, m; m ≤ n)
Wij = weight (factor score coefficient) of variable Vj on factor Fi
n = number of original variables.
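As a minimal sketch in Python, a factor score is just this weighted sum. The weights and standardized scores below are hypothetical, purely to show the mechanics; they are not values from any example in this handout.

```python
import numpy as np

# Hypothetical weights W_i1..W_i6 and one respondent's standardized
# scores on V1..V6 (illustrative values only).
weights = np.array([0.36, 0.00, 0.35, -0.02, -0.35, 0.05])
z_scores = np.array([1.2, -0.4, 0.9, -0.1, -1.1, 0.2])

# F_i = W_i1*V1 + W_i2*V2 + ... + W_in*Vn
factor_score = weights @ z_scores
print(round(factor_score, 3))  # 1.144
```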

Factor Analysis

• Discovers a smaller set of uncorrelated factors (m) that represent the original variables, with m significantly smaller than n (m < n).

• These factors have no multicollinearity, i.e. they are orthogonal to each other.

• They can then be used in further multivariate analysis (regression or discriminant analysis).

Example # 1

• Evaluate credit card usage and behaviour of customers.

• The initial set of variables is large: Age, Gender, Marital Status, Income, Education, Employment Status, Credit History, Family Background: 8 variables in total.

• Fi = Wi1V1 + Wi2V2 + Wi3V3 + … + Wi8V8


Example # 1

Reduction of 8 variables into 3 factors (m = 3):

• Factor 1: Heavy weightage for age, gender, marital status and low weightages for the other variables

• Factor 2: Heavy weightage for income, education, employment status and low weightages for the others

• Factor 3: Heavy weightage for credit history and family background and low weightages for the other variables.

Example # 1

These 3 uncorrelated factors can be identified by the common characteristics of the variables with heavy weightages, and named accordingly as follows:

• Factor 1: (age, gender, marital status) as Demographic Status

• Factor 2: (income, education, employment status) as Socio-economic Status

• Factor 3: (credit history, family background) as Background Status.

Example # 2

• Evaluate customer motivation for buying a two-wheeler.

• The initial set of variables is large:

1. Affordable
2. Sense of freedom
3. Economical
4. Man's vehicle
5. Feel powerful
6. Friends jealous
7. Feel good to see an ad of this brand
8. Comfortable ride
9. Safe travel
10. Ride for three.

Example # 2

Reduction of 10 variables to 3 factors:

• Pride: (man's vehicle, feel powerful, sense of freedom, friends jealous, feel good to see an ad of this brand)

• Utility: (economical, comfortable ride, safe travel)

• Economy: (affordable, ride for three to be allowed)


Standardize the Data

• List all variables that can be important in resolving the research problem.

• Collect metric data on each variable from all subjects sampled.

• Convert the data on each variable into standard form (Mean: 0, Std. Dev.: 1), since different variables may have different units of measurement.

• SPSS / SAS etc. do this automatically.
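A quick sketch of what that standardization does, using one hypothetical variable (the values are illustrative only):

```python
import numpy as np

# Hypothetical raw data on one variable (e.g. income in thousands).
income = np.array([20.0, 35.0, 50.0, 65.0, 80.0])

# Standard form: subtract the mean, divide by the standard deviation,
# so the variable has mean 0 and std. dev. 1 regardless of its units.
# (np.std uses the population formula; packages such as SPSS typically
# divide by n-1 instead.)
z = (income - income.mean()) / income.std()
```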

    Standard Normal Distribution

Two Steps in Factor Analysis

Factor Extraction

What Factor Extraction does

1. It determines the minimum number of factors that can comfortably represent all variables in the research study.
   • Obviously, the maximum number of factors equals the total number of variables.

2. It converts the correlated variables into the desired number of uncorrelated factors.

Tool: Principal Component Method.


Principal Component Method

• SPSS gives the inter-variable correlations.

• PCM assists in checking the appropriateness of factor analysis (Bartlett's test).

• Assists in checking the adequacy of the sample size (KMO test).

• Gives the initial eigen values.

• These determine the minimum number of factors that can represent all variables.

Example # 3

• To determine the benefits consumers seek from purchase of a toothpaste.

• A sample of 30 persons was interviewed on their agreement with the following statements using a 7-point scale (1 = Strongly agree, 7 = Strongly disagree).

    Six Important Variables

    V1: Buy a toothpaste that prevents cavities

    V2: Like a toothpaste that gives shiny teeth

    V3: Toothpaste should strengthen your gums

    V4: Prefer toothpaste that freshens breath

    V5: Prevention of tooth decay is not an important benefit

    V6: Most important concern is attractive teeth

    Data obtained are given in the next slide.

Original Data: 30 persons, 6 variables

RESPONDENT
NUMBER   V1    V2    V3    V4    V5    V6
1        7.00  3.00  6.00  4.00  2.00  4.00
2        1.00  3.00  2.00  4.00  5.00  4.00
3        6.00  2.00  7.00  4.00  1.00  3.00
4        4.00  5.00  4.00  6.00  2.00  5.00
5        1.00  2.00  2.00  3.00  6.00  2.00
6        6.00  3.00  6.00  4.00  2.00  4.00
7        5.00  3.00  6.00  3.00  4.00  3.00
8        6.00  4.00  7.00  4.00  1.00  4.00
9        3.00  4.00  2.00  3.00  6.00  3.00
10       2.00  6.00  2.00  6.00  7.00  6.00
11       6.00  4.00  7.00  3.00  2.00  3.00
12       2.00  3.00  1.00  4.00  5.00  4.00
13       7.00  2.00  6.00  4.00  1.00  3.00
14       4.00  6.00  4.00  5.00  3.00  6.00
15       1.00  3.00  2.00  2.00  6.00  4.00
16       6.00  4.00  6.00  3.00  3.00  4.00
17       5.00  3.00  6.00  3.00  3.00  4.00
18       7.00  3.00  7.00  4.00  1.00  4.00
19       2.00  4.00  3.00  3.00  6.00  3.00
20       .     .     .     .     .     .
21       1.00  3.00  2.00  3.00  5.00  3.00
22       5.00  4.00  5.00  4.00  2.00  4.00
23       2.00  2.00  1.00  5.00  4.00  4.00
24       4.00  6.00  4.00  6.00  4.00  7.00
25       6.00  5.00  4.00  2.00  1.00  4.00
26       3.00  5.00  4.00  6.00  4.00  7.00
27       4.00  4.00  7.00  2.00  2.00  5.00
28       3.00  7.00  2.00  6.00  4.00  3.00
29       4.00  6.00  3.00  7.00  2.00  7.00
30       2.00  3.00  2.00  4.00  7.00  2.00


Inter-variable Correlations: Correlation Matrix from SPSS

Variables   V1      V2      V3      V4      V5      V6
V1           1.000
V2          -0.530   1.000
V3           0.873  -0.155   1.000
V4          -0.086   0.572  -0.248   1.000
V5          -0.858   0.020  -0.778  -0.007   1.000
V6           0.004   0.640  -0.018   0.640  -0.136   1.000

Bartlett's Test

• For valid factor analysis, many variables must be correlated with each other.

• That means, if each original variable is completely independent of each of the remaining n − 1 variables, there is no need to perform factor analysis.

• i.e. if there is zero correlation among all variables.

• H0: The correlation matrix is a unit matrix.

    Unit Matrix

    V1 V2 V3 ---- ---- Vn

    V1 1 0 0 0 0 0

    V2 0 1 0 0 0 0

    V3 0 0 1 0 0 0

    ---- ---- ---- ---- ---- ---- ----

    ---- ---- ---- ---- ---- ---- ----

    Vn 0 0 0 0 0 1

Bartlett's Test

• For valid factor analysis, many variables must be correlated with each other.

• H0: The correlation matrix is a unit matrix.

• Here, SPSS gives a p level < 0.05.

• Reject H0 with a 95% level of confidence.

• So the correlation matrix is not a unit matrix.

Conclusion: Factor analysis can be validly done.


KMO Test

• Kaiser-Meyer-Olkin measure of sampling adequacy in this case = 0.660.

• Values of KMO between 0.5 and 1.0 suggest that the sample is adequate for carrying out factor analysis. Otherwise, we must draw an additional sample.

• Here, 0.660 > 0.5.

• Conclusion: The sample is adequate.

• Thus, these two tests together confirm the appropriateness of factor analysis.

Initial Eigen Values

Factor   Eigen value   % of variance   Cumulative %
1        2.731         45.520          45.520
2        2.218         36.969          82.488
3        0.442          7.360          89.848
4        0.341          5.688          95.536
5        0.183          3.044          98.580
6        0.085          1.420         100.000
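These initial eigen values are the eigenvalues of the correlation matrix itself. A sketch of how to obtain them with NumPy (using the correlation matrix reported for this example; results match the table up to rounding):

```python
import numpy as np

# Correlation matrix for V1..V6 from the toothpaste example.
R = np.array([
    [ 1.000, -0.530,  0.873, -0.086, -0.858,  0.004],
    [-0.530,  1.000, -0.155,  0.572,  0.020,  0.640],
    [ 0.873, -0.155,  1.000, -0.248, -0.778, -0.018],
    [-0.086,  0.572, -0.248,  1.000, -0.007,  0.640],
    [-0.858,  0.020, -0.778, -0.007,  1.000, -0.136],
    [ 0.004,  0.640, -0.018,  0.640, -0.136,  1.000],
])

# Eigenvalues of the (symmetric) correlation matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
percent_variance = 100 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(percent_variance)
```

The eigenvalues sum to 6 (the number of variables, i.e. the total variance of the standardized data), only the first two exceed 1, and together they already account for more than 60% of the variance.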

Eigen Value

• The variance of each standardized variable is 1.

• Total variance in the study = number of variables (here 6).

• Fi = Wi1V1 + Wi2V2 + Wi3V3 + … + Wi6V6

• The variance explained by a factor is called the Eigen Value of that factor.

• It depends on (a) the weights for the different variables and (b) the correlations between the factor and each variable (called Factor Loadings).

• The higher the eigen value of the factor, the bigger the amount of variance explained by the factor.

Principal Component Method

• Each original variable has an eigen value of 1 due to standardization.

• So, factors with an eigen value < 1 are no better than a single variable.

• Only factors with an eigen value ≥ 1 are retained.

• The Principal Component Method determines the least number of factors that explain the maximum variance.


PCM is a Sequential Process

• Selects weights (i.e. factor score coefficients) in such a manner that the first factor explains the largest portion of the total variance:

  F1 = W11V1 + W12V2 + W13V3 + … + W1nVn

• Then selects a second set of weights so that the second factor explains the largest portion of the remaining variance, subject to being uncorrelated with the first factor:

  F2 = W21V1 + W22V2 + W23V3 + … + W2nVn

• The process goes on till the cumulative variance explained crosses a desired level, usually 60%.

Two Factors Explain > 60% Variation

Factor   Eigen Value   % of Variance   Cumulative %
1        2.731         45.520          45.520
2        2.218         36.969          82.488

Conclusion: The number of factors required to explain > 60% of the variation is 2.

Factor Loadings: Correlation Between Each Factor & Each Variable

Factor Matrix

Variables   Factor 1   Factor 2
V1           0.928      0.253
V2          -0.301      0.795
V3           0.936      0.131
V4          -0.342      0.789
V5          - .        - .
V6          -0.177      0.871

Factor Rotation

• The initial factor matrix rarely results in factors that can be easily interpreted.

• Therefore, through a process of rotation, the initial factor matrix is transformed into a simpler matrix that is easier to interpret.

• This helps identify which factors are strongly associated with which original variables.


Rotation of Factors

Factor Rotation

In factor rotation, the reference axes of the factors are turned about the origin until some other position has been reached.

Unrotated factor solutions extract factors in the order of how much variance they account for, with each subsequent factor accounting for less variance.

So the ultimate effect of rotating the factor matrix is to redistribute the variance from earlier factors to later ones, to achieve a simpler, theoretically more meaningful factor pattern.

Two rotational approaches:

1. Orthogonal: the axes are maintained at 90 degrees.

2. Oblique: the axes are not maintained at 90 degrees.

[Figure: Orthogonal Factor Rotation — variables V1, V2, V4, V5 plotted against Unrotated Factor I and Factor II axes (scales −1.0 to +1.0), with Rotated Factor I and Factor II axes kept at 90 degrees.]

[Figure: Oblique Factor Rotation — the same variables with Oblique Rotation Factor I and Factor II axes, shown alongside the Orthogonal Rotation axes for comparison.]


Orthogonal Rotation Methods:

• Quartimax (simplify rows).

• Varimax (simplify columns).

• Equimax (combination).

Simplification means attempting to drive loadings to zero either:

• in rows (variables, i.e. maximizing a variable's loading on a single factor): making as many values in each row as close to zero as possible, or

• in columns (factors, i.e. making the number of high loadings as few as possible): making as many values in each column as close to zero as possible.

Choosing Factor Rotation Methods

Orthogonal rotation methods:

o are the most widely used rotational methods.

o are the preferred method when the research goal is data reduction to either a smaller number of variables or a set of uncorrelated measures for subsequent use in other multivariate techniques.

Oblique rotation methods:

o are best suited to the goal of obtaining several theoretically meaningful factors or constructs because, realistically, very few constructs in the real world are uncorrelated.

Factor Rotation

In rotating the factors, we would like each factor to have high loadings on only a few of the variables.

The process of rotation is called orthogonal rotation if the axes are maintained at right angles.

Let us see how it is done.

Illustration of Rotation of Axes

Let us take a simpler illustration.

• Suppose the factor loadings of 2 variables on 2 factors are:

        Factor 1   Factor 2
  V1     0.6        0.7
  V2     0.5       -0.5

• Variation explained by V1 = (0.6)² + (0.7)² = 0.85

• Variation explained by V2 = (0.5)² + (−0.5)² = 0.50

• None of the loadings is too large or too small to reach any meaningful conclusion.

• Let us rotate the two axes and see what happens.


[Figure: Graph of Original Loadings — V1 and V2 plotted on Factor 1 (horizontal) and Factor 2 (vertical) axes, each running from −1 to +1.]

[Figure: Graph of Rotated Axes (clockwise) — the same points with the two axes rotated clockwise.]

[Figure: Graph of Rotated Axes — the loadings read against the rotated Factor 1 and Factor 2 axes.]

Factor Loadings After Rotation

• The factor loadings of the 2 variables on the 2 factors are now:

        Factor 1   Factor 2
  V1    -0.2        0.9
  V2     0.7        0.1

• Variation explained by V1 = (−0.2)² + (0.9)² = 0.85

• Variation explained by V2 = (0.7)² + (0.1)² = 0.50

• Note that the variation explained remains unchanged.

• Now some of the loadings are clearly large and others clearly small.

• Now, we can reach a meaningful conclusion.
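The claim that rotation leaves the explained variation unchanged is easy to check: the communality of each variable (the row sum of squared loadings) is identical before and after rotation. A small verification in Python using the loadings from this illustration:

```python
import numpy as np

# Loadings of V1 and V2 on the two factors, before and after rotation
# (values from the rotation illustration).
before = np.array([[0.6,  0.7],
                   [0.5, -0.5]])
after = np.array([[-0.2, 0.9],
                  [ 0.7, 0.1]])

# Communality = variation explained per variable = row sum of squares.
communality_before = (before ** 2).sum(axis=1)  # [0.85, 0.50]
communality_after = (after ** 2).sum(axis=1)    # [0.85, 0.50]
```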


Example # 3: Factor Loadings after Rotation

Rotated Factor Matrix

Variables   Factor 1   Factor 2
V1           0.962     -0.027
V2          -0.057      0.848
V3           0.934     -0.146
V4          -0.098      0.845
V5          - -. .
V6           0.083      0.885

Weightages to Variables for Each Factor from SPSS

Factor Score Coefficient Matrix

Variables   Factor 1   Factor 2
V1           0.358      0.011
V2          -0.001      0.375
V3           0.345     -0.043
V4          -0.017      0.377
V5          -0.350     -0.059
V6           0.052      0.395

Factors: (6 variables into 2 factors)

Fi = Wi1V1 + Wi2V2 + Wi3V3 + … + Wi6V6

In example # 3:

F1 = 0.358V1 − 0.001V2 + 0.345V3 − 0.017V4 − 0.350V5 + 0.052V6

F2 = 0.011V1 + 0.375V2 − 0.043V3 + 0.377V4 − 0.059V5 + 0.395V6
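As a sketch, the two factor scores for one respondent can be computed by applying these coefficients to that respondent's standardized scores. The z-values below are hypothetical, purely to show the mechanics (a real analysis would standardize the original 30 × 6 data first):

```python
import numpy as np

# Factor score coefficients for V1..V6 (Factor 1 and Factor 2 columns).
W = np.array([
    [ 0.358,  0.011],
    [-0.001,  0.375],
    [ 0.345, -0.043],
    [-0.017,  0.377],
    [-0.350, -0.059],
    [ 0.052,  0.395],
])

# Hypothetical standardized responses of one person on V1..V6.
z = np.array([1.1, -0.5, 0.8, -0.3, -1.2, 0.1])

scores = z @ W  # scores[0] = F1, scores[1] = F2
```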

Interpretation of Factors

A factor can then be interpreted in terms of the variables that load heavily on it.

FACTOR 1 has high coefficients for:

• V1: Buy a toothpaste that prevents cavities

• V3: Toothpaste should strengthen your gums

• V5: Prevention of tooth decay is not an important benefit (Note: the coefficient is negative)

FACTOR 1 may be labelled as the Health Factor.


Interpretation of Factors

F2 = 0.011V1 + 0.375V2 − 0.043V3 + 0.377V4 − 0.059V5 + 0.395V6

FACTOR 2 has high coefficients on:

• V2: Like a toothpaste that gives shiny teeth

• V4: Prefer toothpaste that freshens breath

• V6: Most important concern is attractive teeth

FACTOR 2 may be labelled as the Aesthetic Factor.

The factors are jointly called principal components.

Conclusion

From the data gathered from 30 respondents on 6 variables, the benefits consumers seek from purchase of a toothpaste are HEALTH and AESTHETICS.

Health has 45.5% importance.

Aesthetics has 36.9% importance.

Selecting a Surrogate Variable

• Sometimes, we are not willing to discover new factors but want to stick to the original variables and know which ones are important.

• By examining the factor matrix, we could select for each factor just one variable: the one with the highest loading on that factor, if possible.

• That variable could then be used as a surrogate variable for the associated factor.

Selecting Surrogate Variables

• V1 has the highest loading on F1.

• So, V1 is the surrogate variable for F1.

• Similarly, V6 could be the surrogate for F2.

• So, we concentrate on only 2 variables: V1 (preventing cavities) and V6 (attractive teeth).
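Mechanically, picking surrogates is just an argmax over absolute loadings. A sketch using five of the six rows of the rotated factor matrix from this example (V5's row is not reproduced here):

```python
import numpy as np

# Rotated loadings for V1, V2, V3, V4 and V6 (V5 omitted).
variables = ["V1", "V2", "V3", "V4", "V6"]
loadings = np.array([
    [ 0.962, -0.027],
    [-0.057,  0.848],
    [ 0.934, -0.146],
    [-0.098,  0.845],
    [ 0.083,  0.885],
])

# Surrogate for each factor: the variable with the largest
# absolute loading in that factor's column.
surrogates = [variables[i] for i in np.abs(loadings).argmax(axis=0)]
print(surrogates)  # ['V1', 'V6']
```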


Assessing Factor Loadings

While factor loadings of ±0.30 to ±0.40 are minimally acceptable, values greater than ±0.50 are considered necessary for practical significance.

To be considered significant:

o A smaller loading (i.e. ±0.30) is enough with either a larger sample size or a larger number of variables being analyzed.

o A larger loading (i.e. ±0.50 and above) is needed for a smaller sample size.

Statistical tests of significance for factor loadings are generally very conservative and should be considered only as starting points for including a variable for further consideration.

Interpreting The Factors

An optimal structure exists when all variables have high loadings on only a single factor.

Variables that cross-load (load highly on two or more factors) are usually deleted unless theoretically justified.

Variables should generally have communalities of > 0.50 to be retained in the analysis.

Re-specification of a factor analysis can include options such as:

o deleting a variable(s),

o changing rotation methods, and/or

o increasing or decreasing the number of factors.