Learn About Reliability Analysis
and Factor Analysis in SPSS With
Online Shopping Data (2006)
© 2019 SAGE Publications, Ltd. All Rights Reserved.
This PDF has been generated from SAGE Research Methods Datasets.
Student Guide
Introduction
This example describes what factor analysis is, provides guidelines for performing
such an analysis, and shows how to produce factors and interpret the results.
This example dataset contains variables used in an online shopping survey. Our
research questions are (1) What are the factors affecting the consumer shopping
experience? (2) Is there a positive relationship between these factors?
The online survey was designed to assess consumers’ attitudes and shopping
experience. A factor analysis is used to analyze the data. We are interested in
forming factors based on survey questions and testing the relationships between
the factors. The survey questions ask about consumers’ satisfaction with online
vendors, the information presented on the websites, and the benefits of online
shopping. There are three questions to measure satisfaction and three questions
to measure ease of navigation. We use a Likert scale from 1 to 7 for each survey
question. Factor analysis allows researchers to combine related questions into
one factor.
In this tutorial, we provide a codebook describing the variables, a data file, and
a step-by-step guide on how to perform a factor analysis and test hypotheses
related to the factors.
SAGE
2019 SAGE Publications, Ltd. All Rights Reserved.
SAGE Research Methods Datasets Part
2
Page 2 of 21 Learn About Reliability Analysis and Factor Analysis in SPSS With Online
Shopping Data (2006)
What Is a Factor Analysis?
Factor analysis is a method for modeling observed variables in terms of a smaller
number of unobserved “factors.” It is a dimension-reduction method: with fewer
dimensions, we are still able to preserve the meaning of our research variables.
There are two types of factor analysis: exploratory and confirmatory. In an
exploratory factor analysis, researchers search for a possible underlying
structure in the variables. In a confirmatory analysis, the researchers know the
structure of the variables ahead of time and only want to verify what is known.
The major difference between the two is that, in a confirmatory factor analysis,
researchers already have knowledge about the variables and factors based on prior
findings in the literature or practical experience. When used as an exploratory
tool, factor analysis does not have many statistical assumptions, except that
variables measuring the same underlying factor are significantly correlated with
each other.
Orthogonal Factor Model
The n continuous random variables \(X_1, X_2, X_3, \ldots, X_n\) are mutually
statistically independent.
\[
\operatorname{Cov}(X_i, X_k) = 0 \quad \text{if } X_i \text{ and } X_k \text{ are independent} \tag{1}
\]
The means and covariances of the n × 1 random vector X can be written as
matrices.
\[
E(X) =
\begin{bmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{bmatrix}
=
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}
= \mu
\]
\[
\Sigma = \operatorname{Cov}(X) =
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots      & \vdots      & \ddots & \vdots      \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{bmatrix}
\]
The observed vector X has n components, with mean vector μ and covariance matrix
Σ. We intend to use the factor model to represent the data. The question is
whether such a model is adequate.
\[
X_{(n \times 1)} = \mu_{(n \times 1)} + L_{(n \times m)} F_{(m \times 1)} + \varepsilon_{(n \times 1)}
\]
where:
• \(\mu_i\) = mean of variable i
• \(\varepsilon_i\) = ith error
• \(F_j\) = jth common factor
• \(l_{ij}\) = loading of the ith variable on the jth factor (the (i, j) entry of L)
\[
\begin{aligned}
X_1 - \mu_1 &= l_{11} F_1 + l_{12} F_2 + l_{13} F_3 + \cdots + l_{1m} F_m + \varepsilon_1 \\
X_2 - \mu_2 &= l_{21} F_1 + l_{22} F_2 + l_{23} F_3 + \cdots + l_{2m} F_m + \varepsilon_2 \\
&\;\;\vdots \\
X_n - \mu_n &= l_{n1} F_1 + l_{n2} F_2 + l_{n3} F_3 + \cdots + l_{nm} F_m + \varepsilon_n
\end{aligned}
\]
Covariance structure for the orthogonal factor model:
\[
\operatorname{Cov}(X) = LL' + \Psi
\]
\[
\operatorname{Cov}(X, F) = L
\]
where \(\Psi = \operatorname{Cov}(\varepsilon)\) is the covariance matrix of the errors.
Factor loadings show the relationship of measures to their corresponding factors.
The larger the factor loading, the stronger the relationship between the measure
and its underlying factor. When evaluating factor loadings, standardized factor
loadings can be interpreted like standardized regression coefficients.
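Writing the covariance structure Cov(X) = LL' + Ψ out element by element may make
the role of the loadings more concrete. The sketch below assumes, as is standard
for the orthogonal factor model but not stated explicitly above, that Ψ is a
diagonal matrix with entries \(\psi_i\):
\[
\operatorname{Var}(X_i) = l_{i1}^2 + l_{i2}^2 + \cdots + l_{im}^2 + \psi_i
\]
\[
\operatorname{Cov}(X_i, X_k) = l_{i1} l_{k1} + l_{i2} l_{k2} + \cdots + l_{im} l_{km}, \quad i \neq k
\]
Under that assumption, the sum of squared loadings in the first equation is the
part of the variance of \(X_i\) shared through the common factors, and \(\psi_i\) is
the part unique to \(X_i\).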
Definitions Used in Factor Analysis
When learning factor analysis, it is important to understand the following basic
concepts:
Factor: Linear combination (variate) of the original variables. Factors
represent the underlying dimensions (latent variables) that account for
the observed variables.
Factor loadings: Correlation between the original variables and the
factors.
Factor matrix: Table listing the factor loadings of all variables on each
factor.
Factor score: Composite score created by averaging the variables
represented by the factor extracted from the factor analysis.
Common variance: Variance in a variable that is shared with the other
variables in the analysis.
Unique variance: Variance of each variable not explained or associated
with other variables.
Eigenvalue: It represents the amount of variance accounted for by a
factor.
Factor rotation: Process of adjusting the factor axes to achieve a simpler
and more meaningful factor solution.
Oblique factor rotation: Factor rotation calculated so that the extracted
factors are correlated.
Orthogonal factor rotation: Factors are extracted by rotating their axes,
so that their axes are maintained at 90 degrees. The correlation between
the factors is zero. Varimax is one of the most popular orthogonal factor
rotation methods.
Steps in Conducting a Factor Analysis
When performing a factor analysis, you typically go through the following
steps. First, identify the variables you will analyze and the factors you expect
them to measure. Then complete the following steps:
1. Compute descriptive statistics for the variables.
2. Conduct a factor analysis using varimax rotation.
3. Interpret the factor results and identify cross-loaded items.
4. Purify the items and rerun the factor analysis until no cross-loadings remain.
5. Produce reliability scores using the items with clean loadings.
6. Compute the composite scores of the factors.
7. Test relationships between the factors.
What Is a Reliability Analysis?
Reliability refers to consistency: a reliable instrument produces stable and
consistent results or scores. After factors are extracted, we need to check the
stability of each factor, that is, to assess its reliability.
If a customer answers the same customer satisfaction survey a few times, and
always receives the same or similar results, then the survey is reliable. Reliability
allows you to repeat the study and obtain consistent results. Another example
of reliability is standardized testing such as the American College Testing (ACT)
or the Scholastic Aptitude Test (SAT). Such tests measure students’ knowledge of
mathematics, language, and science. Cronbach’s alpha is the most widely used
internal consistency coefficient.
The formula is listed below:
\[
\alpha = \frac{N \bar{c}}{\bar{v} + (N - 1)\bar{c}}
\]
In the above equation, N is the number of items, \(\bar{c}\) is the average inter-item
covariance among the items, and \(\bar{v}\) equals the average variance.
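The formula above can be sketched in a few lines of Python using only the
standard library. This is an illustration rather than the SPSS procedure: the
function names are our own, the six respondents are made-up data, and sample
(n − 1) variances and covariances are assumed.

```python
from statistics import mean, variance


def covariance(x, y):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def cronbach_alpha(items):
    """Cronbach's alpha for a list of N item-response lists."""
    n_items = len(items)
    # v_bar: average of the N item variances
    v_bar = mean([variance(col) for col in items])
    # c_bar: average covariance over all distinct item pairs
    c_bar = mean([
        covariance(items[i], items[j])
        for i in range(n_items)
        for j in range(i + 1, n_items)
    ])
    return n_items * c_bar / (v_bar + (n_items - 1) * c_bar)


# Hypothetical 7-point responses from six respondents (not the survey data)
wise = [5, 6, 4, 7, 3, 5]
pleasant = [5, 7, 4, 6, 3, 5]
satisfied = [6, 6, 4, 7, 2, 5]

alpha = cronbach_alpha([wise, pleasant, satisfied])
print(round(alpha, 3))
```

If every item were an exact copy of the others, the average covariance would
equal the average variance and the function would return 1, the upper bound of
the coefficient.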
Table 1 below shows the guidelines for evaluating Cronbach’s alpha scores.
Table 1: Cronbach’s Alpha.
Cronbach’s Alpha    Internal Consistency
α ≥ 0.9             Excellent
0.9 > α ≥ 0.8       Good
0.8 > α ≥ 0.7       Acceptable
0.7 > α ≥ 0.6       Questionable
0.6 > α ≥ 0.5       Poor
Suppose we use three items to measure customer satisfaction and, after running a
reliability analysis, obtain a reliability score of 0.75. This is above the threshold
of 0.7 (Robinson, Shaver, & Wrightsman, 1991). We therefore conclude that the
items we used for customer satisfaction are reliable.
Illustrative Example: Online Shopping Data
To study consumer attitudes and behavior, or to describe characteristics of a large
population, companies develop and distribute surveys to consumers to collect
information. When performing an analysis, each question is a variable. The
variables allow researchers to conduct a wide variety of tests based on their
interests. Business topics such as consumer satisfaction, loyalty, and intention
are enduring research topics.
In this example, we study satisfaction-related factors and variables and walk
through the process of conducting a factor analysis. We developed an
online survey to measure consumers’ shopping experience. Our survey contains
questions related to consumer satisfaction, information quality, and perceptions
of online shopping convenience. We perform the analysis and also test the
relationships between the factors.
Thus, this example addresses the following research questions:
1. What are the factors that affect consumers’ shopping experience?
2. Is there a positive relationship between satisfaction and information
quality?
Stated in the form of a null hypothesis:
H0: There is no correlation between satisfaction and information
quality.
3. Is there a positive relationship between satisfaction and online
shopping convenience?
Stated in the form of a null hypothesis:
H0: There is no correlation between satisfaction and online shopping
convenience.
Measures/Variables
In the satisfaction example explained above, satisfaction is called an unobserved
factor. Factors are abstract, broad concepts used to explain observed
phenomena. In social science, unobserved factors play important roles
in studying social behavior. By creating factors from observable variables,
researchers can better interpret relationships between factors.
In the social and behavioral sciences, abstract concepts such as satisfaction
and intention measure people’s feelings, emotions, perceptions, and so on. These
concepts are subjective and differ from variables such as weight, height, cost,
profit, and income. Variables such as weight and cost are called objective
measures: we use a scale to measure weight and a monetary unit to measure cost.
When studying high-level concepts, we need to differentiate the conceptual
definition from the operational definition. The conceptual definition is the
meaning of a concept. The operational definition is a metric we develop to quantify
something of interest. Let’s use satisfaction as an example. Conceptually,
satisfaction is defined as a judgment that a product or service feature, or
the product or service itself, provided (or is providing) a pleasurable level of
consumption-related fulfillment, including levels of under- or overfulfillment (Oliver,
1997). To operationalize satisfaction, we use three questions: (1) My choice to
use this company was a wise one; (2) I am always delighted with this company’s
service; and (3) overall, I am satisfied with this company. We use a 7-point Likert
scale for possible responses, where 1 indicates strongly disagree and 7 indicates
strongly agree. In this case, satisfaction is called a latent (unobserved) variable,
and the three questions are called observable variables. We assume that,
together, these three questions measure satisfaction.
The Data
This example uses the data collected on consumers’ online shopping perceptions
between 2005 and 2006. We received 328 responses from the online survey.
Table 2 below shows factors, variables, and survey questions. Each question on
the survey is a variable when we perform analysis.
Table 2: Factors and Variables.
Factor               Variable     Survey Question
Information quality  productinfo  The website provides sufficient information about products.
                     infopresent  The information on the website is recent.
                     uptoday      It seems that the website gets updated timely.
Satisfaction         tobuy        I want to buy more products online.
                     wise         Online shopping is wise.
                     pleasant     Online shopping is a pleasure experience.
                     satisfied    I am satisfied with my shopping experience.
e-Convenience        savetime     It saves me time when shopping online.
                     comprice     It is easy to compare prices when shopping online.
                     comproduct   It is easy to compare products when shopping online.
                     convenient   It is convenient to buy products online.
The responses to the survey are coded from 1 to 7, with 1 representing
strongly disagree and 7 representing strongly agree.
Analyzing the Data
Many statistical applications (SAS, SPSS, Statistica, Knime, etc.) can be used to
perform factor analysis. We typically need to report the sample size and descriptive
statistics (mean, minimum, maximum, and standard deviation) for the variables,
as shown in Table 3.
Table 3: Descriptive Statistics.
N Minimum Maximum Mean Standard Deviation
productinfo 328 1 7 5.39 1.30
infopresent 328 1 7 5.37 1.22
uptoday 328 1 7 5.61 1.25
tobuy 328 1 7 5.36 1.38
wise 328 1 7 4.99 1.44
pleasant 328 1 7 5.25 1.30
satisfied 328 1 7 5.23 1.36
savetime 328 1 7 5.25 1.45
comprice 328 1 7 5.47 1.53
comproduct 328 1 7 5.12 1.66
convenient 328 1 7 5.60 1.25
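The columns of Table 3 can also be computed outside SPSS. The following is a
minimal sketch using Python's standard library; the `describe` helper and the
short response lists are hypothetical stand-ins for the 328 survey responses,
and the sample (n − 1) standard deviation is assumed.

```python
from statistics import mean, stdev


def describe(values):
    """Return the Table 3 columns for one variable."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": round(mean(values), 2),
        # stdev() is the sample standard deviation (n - 1 denominator)
        "Standard Deviation": round(stdev(values), 2),
    }


# Made-up 7-point responses for two of the variables (not the real data)
responses = {
    "productinfo": [5, 6, 7, 4, 5, 6, 1, 7],
    "wise": [4, 5, 6, 3, 5, 7, 2, 6],
}

for name, values in responses.items():
    print(name, describe(values))
```

Checking the minimum and maximum is a quick way to confirm that every response
falls inside the 1 to 7 coding range before running the factor analysis.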
Factor Analysis
We conducted the Kaiser-Meyer-Olkin (KMO) test, which measures how well suited
the data are for factor analysis. Kaiser (1970) suggests that a KMO value of 0.8 is
adequate for factor analysis. Our data produce a KMO value of 0.9, which exceeds
the suggested value; thus, our data are appropriate for factor analysis.
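For readers who want to see what the KMO statistic summarizes, its usual
overall form is sketched below. The symbols are our own notation: \(r_{ij}\) is the
ordinary correlation between variables i and j, and \(p_{ij}\) is their partial
correlation after controlling for all other variables.
\[
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^2}{\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} p_{ij}^2}
\]
The value approaches 1 when the partial correlations are small relative to the
ordinary correlations, which is the pattern expected when a few common factors
account for most of the shared variance.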
Next, we use SPSS to perform factor analysis with varimax rotation. Figure 1
shows the emergence of three factors. A variable is assigned to a factor when its
loading on that factor is greater than 0.5. Every variable loads on a single factor
except tobuy, which loads on Factor 1 with a value of 0.59 and on Factor 3 with a
value of 0.52; that is, tobuy cross-loads on two factors. In this case, tobuy
should be removed and the factor analysis rerun. This process is necessary
to produce meaningful factors that are distinct from each other.
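The cross-loading check described above can be sketched as a small Python
function. The 0.5 cutoff and tobuy's loadings of 0.59 and 0.52 come from the
text; the function name and every other number in the matrix are made up for
illustration, since the full loading matrix appears only in Figure 1.

```python
def factors_loaded(loadings, threshold=0.5):
    """Map each variable to the factors (1-based) where |loading| > threshold."""
    return {
        var: [k + 1 for k, value in enumerate(row) if abs(value) > threshold]
        for var, row in loadings.items()
    }


# Columns: loadings on Factor 1, Factor 2, Factor 3.
# Only tobuy's 0.59 and 0.52 are from the text; all other values are hypothetical.
loadings = {
    "tobuy": [0.59, 0.21, 0.52],
    "wise": [0.88, 0.15, 0.18],
    "productinfo": [0.17, 0.84, 0.16],
    "savetime": [0.20, 0.09, 0.77],
}

assigned = factors_loaded(loadings)
cross_loaded = [var for var, hits in assigned.items() if len(hits) > 1]
print(cross_loaded)  # ['tobuy']
```

Variables flagged this way are candidates for removal before the factor
analysis is rerun.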
Figure 1: Factor Loadings.
Figure 2 shows that, after tobuy is removed, each variable loads on its intended
factor with a loading greater than 0.5. Unlike Figure 1, we assign a meaningful
name to each factor in Figure 2. Three variables (productinfo, infopresent, and
uptoday) load on information quality with loadings of 0.86, 0.85, and 0.76,
respectively; three variables (wise, pleasant, and satisfied) load on satisfaction
with loadings of 0.91, 0.82, and 0.88, respectively; and four variables (savetime,
comprice, comproduct, and convenient) load on e-convenience with loadings of
0.78, 0.84, 0.80, and 0.75, respectively.
Figure 2: Clean Loadings.
Validity
When conducting a factor analysis, it is important to perform validity tests. We
focus on content validity, convergent validity, and discriminant validity. Content
validity concerns how the survey items are developed: researchers must conduct
an extensive literature review on the topic and consult experts. Convergent validity
refers to the degree to which two measures designed to measure the same
construct are related. Convergence is found if the two measures are highly
correlated. An early rule of thumb for convergent validity is that the
correlation between two measures designed to assess the same construct should
be statistically significant and “sufficiently large to encourage further examination
of validity” (Campbell & Fiske, 1959). In our data, every item loads above 0.5 on
one construct and all cross-loadings are below 0.5, as suggested by Hair, Anderson,
Tatham, and Black (1998). Discriminant validity assesses the degree to which
two measures designed to measure similar, but conceptually different, constructs
are related. A low-to-moderate correlation is often considered evidence of
discriminant validity. The factor loadings in Figure 2 provide evidence of
both discriminant and convergent validity. Another approach to evaluating
discriminant validity is to examine item-to-construct relationships to ensure that
items correlate more highly with the construct they are intended to measure than
with other constructs (Campbell & Fiske, 1959). Table 4 shows that variables are
more highly correlated with their underlying factor than with other factors. For
example, the correlations of productinfo with infoQuality, satisfaction, and
e-convenience are 0.87, 0.37, and 0.36, respectively, so productinfo has a higher
correlation with infoQuality than with satisfaction or e-convenience.
Table 4: Item-to-Construct Correlation.
             infoQuality  satisfaction  e-convenience
productinfo  0.87         0.37          0.36
infopresent  0.87         0.41          0.31
uptoday      0.82         0.41          0.31
wise         0.38         0.94          0.38
pleasant     0.49         0.90          0.44
satisfied    0.41         0.92          0.42
savetime     0.26         0.38          0.80
comprice     0.32         0.36          0.85
comproduct   0.31         0.26          0.81
convenient   0.35         0.53          0.82
Note: Bold values = 0.1% significance.
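The item-to-construct comparison in Table 4 can be checked mechanically. The
sketch below copies the correlations from Table 4 and the item groupings from
the text; the helper names `strongest_construct` and `margin` are our own.

```python
CONSTRUCTS = ("infoQuality", "satisfaction", "e-convenience")

# Correlations copied from Table 4, in the order given by CONSTRUCTS
TABLE_4 = {
    "productinfo": (0.87, 0.37, 0.36),
    "infopresent": (0.87, 0.41, 0.31),
    "uptoday": (0.82, 0.41, 0.31),
    "wise": (0.38, 0.94, 0.38),
    "pleasant": (0.49, 0.90, 0.44),
    "satisfied": (0.41, 0.92, 0.42),
    "savetime": (0.26, 0.38, 0.80),
    "comprice": (0.32, 0.36, 0.85),
    "comproduct": (0.31, 0.26, 0.81),
    "convenient": (0.35, 0.53, 0.82),
}

# Construct each item is intended to measure
INTENDED = {
    "productinfo": "infoQuality", "infopresent": "infoQuality",
    "uptoday": "infoQuality",
    "wise": "satisfaction", "pleasant": "satisfaction",
    "satisfied": "satisfaction",
    "savetime": "e-convenience", "comprice": "e-convenience",
    "comproduct": "e-convenience", "convenient": "e-convenience",
}


def strongest_construct(item):
    """Construct with which the item correlates most highly."""
    corrs = TABLE_4[item]
    return CONSTRUCTS[corrs.index(max(corrs))]


def margin(item):
    """Own-construct correlation minus the largest other correlation."""
    corrs = TABLE_4[item]
    own = corrs[CONSTRUCTS.index(INTENDED[item])]
    others = [c for name, c in zip(CONSTRUCTS, corrs) if name != INTENDED[item]]
    return own - max(others)


for item in TABLE_4:
    print(item, strongest_construct(item) == INTENDED[item], round(margin(item), 2))
```

Every item correlates most highly with its intended construct. The smallest
margin belongs to convenient (0.82 versus 0.53 with satisfaction).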
Reliability
The most widely used internal consistency reliability coefficient is Cronbach’s
coefficient alpha. Some suggest that the acceptance level for coefficient alpha
should be at least 0.7 (Robinson et al., 1991). We run a reliability analysis and
find that the reliability score is 0.83 for information quality, 0.84 for
e-convenience, and 0.91 for satisfaction. All three factors meet this criterion.
Compute Composite Factor Scores
After we clean up the factor loadings and identify the variables and their
associated factors, we can compute factor scores. Three variables are used to
assess information quality (productinfo, infopresent, and uptoday). We average
the raw scores of these three variables to get the information quality score. We
repeat the same steps to compute the composite scores for satisfaction and
e-convenience. Figure 3 shows a portion of the resulting factor data.
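The averaging step can be sketched as follows. The item groupings follow the
clean loadings reported for Figure 2 (tobuy is excluded); the function name and
the single respondent's answers are hypothetical.

```python
from statistics import mean

# Items retained for each factor after tobuy is dropped
FACTOR_ITEMS = {
    "infoQuality": ["productinfo", "infopresent", "uptoday"],
    "satisfaction": ["wise", "pleasant", "satisfied"],
    "eConvenience": ["savetime", "comprice", "comproduct", "convenient"],
}


def composite_scores(respondent):
    """Average a respondent's raw item scores within each factor."""
    return {
        factor: mean([respondent[item] for item in items])
        for factor, items in FACTOR_ITEMS.items()
    }


# One made-up respondent (not taken from the dataset)
respondent = {
    "productinfo": 6, "infopresent": 5, "uptoday": 7,
    "tobuy": 3,
    "wise": 4, "pleasant": 5, "satisfied": 6,
    "savetime": 7, "comprice": 6, "comproduct": 5, "convenient": 6,
}

print(composite_scores(respondent))
```

For this respondent the composite scores are 6 for information quality, 5 for
satisfaction, and 6 for e-convenience; the tobuy answer does not enter any score.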
Figure 3: Portion of Data.
Hypothesis Testing
After composite scores are calculated, we are ready to perform multiple linear
regression to test our hypotheses. Because we have more than one independent
variable (information quality, e-convenience) predicting the dependent variable
(satisfaction), we need to check the variance inflation factor (VIF) for
multicollinearity. The literature contains various recommendations for acceptable
levels of VIF. For example, Hair et al. (1995) recommend a maximum VIF value of
10, whereas Rogerson (2001) suggests a maximum of 5. With our data, the VIF
value is 1.78, much lower than either suggested limit. When evaluating the model
fit, we check the R-square, the overall model significance, and the significance
of the coefficients. The output shows that R-square is 34.1%, the overall model
is significant, and the two independent variables are significant.
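With exactly two predictors, the VIF depends only on the correlation r between
them: VIF = 1 / (1 − r²). The sketch below illustrates that special case in
plain Python. The function names and the sample composite scores are
hypothetical, and the shortcut does not apply to models with three or more
predictors.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


def implied_correlation(vif):
    """Absolute predictor correlation implied by a two-predictor VIF."""
    return sqrt(1.0 - 1.0 / vif)


# Made-up composite scores for six respondents (not the survey data)
info_quality = [6.0, 5.33, 4.67, 6.33, 3.0, 5.0]
e_convenience = [5.5, 5.75, 4.0, 6.5, 3.75, 4.5]

print(round(vif_two_predictors(info_quality, e_convenience), 2))
print(round(implied_correlation(1.78), 2))  # about 0.66
```

Read this way, the reported VIF of 1.78 corresponds to a correlation of roughly
0.66 between the information quality and e-convenience composite scores.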
Results
As we intend to answer three research questions, in the Results section, we need
to present and discuss each research question.
Research Question 1: What Are the Factors?
Figure 2 shows three factors emerged after running factor analysis with varimax
rotation. By examining the content of the three questions (“The website provides
sufficient information about products,” “The information on the website is recent,”
and “It seems that the website gets updated timely”) in the survey, we feel
that these three questions are related to information quality. We thus name a
factor information quality which is observed through the variables productinfo,
infopresent, and uptoday. Using the similar approach, we can name the other two
factors as satisfaction and e-convenience.
In summary, SPSS only produces the factors; researchers must interpret them and
give each one a meaningful name.
Research Question 2: There Is No Correlation Between Satisfaction
and Information Quality
The result shows that the standardized coefficient between information quality
and satisfaction is 0.35, with p < .001. We therefore reject the null hypothesis
and conclude that there is a positive and statistically significant relationship
between information quality and satisfaction.
Research Question 3: There Is No Correlation Between Satisfaction
and Online Shopping Convenience
The result shows that the standardized coefficient between e-convenience and
satisfaction is 0.32, with p < .001. We therefore reject the null hypothesis and
conclude that there is a positive and statistically significant relationship
between e-convenience and satisfaction.
Review
Factor analysis is a statistical procedure for identifying interrelationships
among a large number of variables, that is, for identifying how sets of variables
are related. Factor analysis can be used for exploratory or confirmatory purposes.
It is commonly used in social, business, and behavioral studies. Continuous
variables, or variables measured on an interval or ratio scale, can be used to
conduct a factor analysis. Factor analysis is a data reduction technique. Factors
are extracted from variables, and these factors can be used to conduct
relationship tests or group difference tests.
In the example we provide, we use a survey to collect data on consumers’
perceptions of their online shopping experience. Using the survey data, we
extracted three factors: satisfaction, information quality, and convenience. We
then compute the composite scores of the three factors and use the composite
scores to test the relationships between satisfaction and information quality, and
between satisfaction and convenience.
You should know:
• What types of variables are suited for a factor analysis.
• The basic assumptions underlying factor analysis.
• How to perform a factor analysis.
• How to compute factor scores.
• How to interpret results of a factor analysis.
• How to refine factor loadings.
• How to perform reliability and validity tests.
Your Turn
Now it is your turn to perform factor analysis and reproduce the results presented
above. Read the file containing the description of variables. Download the dataset
and perform descriptive analysis, correlation analysis, factor analysis, and
a reliability test on the sample data. After factors are extracted, compute the
scores for each factor and use the composite factor scores to perform hypothesis
testing.
References
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation
by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Hair, J. F., Jr., Anderson, R. E., Tatham, R. L., & Black, W. C. (1995).
Multivariate data analysis (3rd ed.). New York, NY: Macmillan.
Kaiser, H. F. (1970). A second generation Little Jiffy. Psychometrika, 35,
401–415.
Oliver, R. L. (1997). Satisfaction: A behavioural perspective on the consumer.
New York, NY: McGraw Hill.
Robinson, J. P., Shaver, P. R., & Wrightsman, L. S. (Eds.). (1991). Criteria
for scale selection and evaluation. In Measures of personality and social
psychological attitudes (pp. 1–15). San Diego, CA: Academic Press.
Rogerson, P. A. (2001). Statistical methods for geography. London, UK: SAGE.