Learn About Reliability Analysis and Factor Analysis in SPSS With Online Shopping Data (2006)

© 2019 SAGE Publications, Ltd. All Rights Reserved.

This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction

This example describes what factor analysis is, provides guidelines for performing such an analysis, and shows how to produce factors and interpret the results.

This example dataset contains variables used in an online shopping survey. Our

research questions are (1) What are the factors affecting the consumer shopping

experience? (2) Is there a positive relationship between these factors?

The online survey was designed to assess consumers' attitudes and shopping experience, and a factor analysis is used to analyze the data. We are interested in forming factors from the survey questions and testing the relationships between the factors. The survey questions ask about consumers' satisfaction with online vendors, the information presented on the websites, and the benefits of online shopping. Each concept is measured with several questions (see Table 2), and each question is answered on a 7-point Likert scale from 1 to 7. Factor analysis allows researchers to combine related questions into a single factor.

In this tutorial, we provide a codebook describing the variables, a data file, and a step-by-step guide on how to perform a factor analysis and test hypotheses related to the factors.

What Is a Factor Analysis?

Factor analysis is a method for modeling observed variables in terms of a smaller number of unobserved "factors." It is a dimension-reduction technique: with fewer dimensions, we can still preserve the meaning of our research variables.

There are two types of factor analysis: exploratory and confirmatory. In an exploratory factor analysis, researchers search for a possible underlying structure in the variables. In a confirmatory factor analysis, researchers know the structure of the variables ahead of time and want only to verify it. The major difference, therefore, is that in a confirmatory analysis researchers have prior knowledge about the variables and factors, based on findings in the literature or on practical experience. When used as an exploratory tool, factor analysis makes few statistical assumptions beyond requiring that variables measuring the same underlying factor be substantially correlated with one another.
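To make "reducing dimensions" concrete, here is a minimal sketch that extracts factors from a small correlation matrix by eigen-decomposition. This is principal-component extraction. It is one common choice and is assumed here; other methods, such as principal axis factoring, give somewhat different loadings. The sketch keeps factors whose eigenvalue exceeds 1 (the Kaiser criterion). The 4 × 4 correlation matrix is hypothetical and does not come from the online shopping data.

```python
import numpy as np

# Hypothetical correlation matrix for four survey items:
# items 0-1 share one dimension, items 2-3 share another.
R = np.array([[1.0, 0.6, 0.2, 0.1],
              [0.6, 1.0, 0.1, 0.2],
              [0.2, 0.1, 1.0, 0.5],
              [0.1, 0.2, 0.5, 1.0]])

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals > 1.0                             # Kaiser criterion: eigenvalue > 1
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])   # unrotated loadings

print("eigenvalues:", np.round(eigvals, 3))
print("share of total variance:", np.round(eigvals / eigvals.sum(), 3))
print("factors kept:", int(keep.sum()))
print(np.round(loadings, 2))
```

Four items are summarized by two factors. The eigenvalues of a correlation matrix always sum to the number of items, so each eigenvalue divided by that sum is the share of variance the factor accounts for.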

Orthogonal Factor Model

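Two random variables $X_i$ and $X_k$ that are statistically independent are uncorrelated:

$$\operatorname{Cov}(X_i, X_k) = 0 \qquad (1)$$

The orthogonal factor model places this kind of assumption on the unobserved quantities rather than on the observed variables. The common factors $F_1, \dots, F_m$ have zero mean and unit variance and are uncorrelated with one another. The error terms $\varepsilon_1, \dots, \varepsilon_n$ have zero mean and are uncorrelated with one another and with the factors. The observed variables $X_1, \dots, X_n$ are generally correlated, because they share the common factors.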

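The mean vector and covariance matrix of the $n \times 1$ random vector $\mathbf{X}$ can be written as

$$E(\mathbf{X}) = \begin{bmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix} = \boldsymbol{\mu}$$

$$\boldsymbol{\Sigma} = \operatorname{Cov}(\mathbf{X}) = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix}$$

The observed vector $\mathbf{X}$ thus has $n$ components, mean $\boldsymbol{\mu}$, and covariance matrix $\boldsymbol{\Sigma}$. We intend to represent the data with a factor model; the question is whether such a model is adequate.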

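$$\mathbf{X}_{n \times 1} = \boldsymbol{\mu}_{n \times 1} + \mathbf{L}_{n \times m}\,\mathbf{F}_{m \times 1} + \boldsymbol{\varepsilon}_{n \times 1}$$

where:

• $\mu_i$ = mean of variable $i$
• $\varepsilon_i$ = $i$th error
• $F_j$ = $j$th common factor
• $\ell_{ij}$ = loading of the $i$th variable on the $j$th factor (the $(i, j)$ element of $\mathbf{L}$)

Written out variable by variable:

$$\begin{aligned}
X_1 - \mu_1 &= \ell_{11}F_1 + \ell_{12}F_2 + \ell_{13}F_3 + \cdots + \ell_{1m}F_m + \varepsilon_1 \\
X_2 - \mu_2 &= \ell_{21}F_1 + \ell_{22}F_2 + \ell_{23}F_3 + \cdots + \ell_{2m}F_m + \varepsilon_2 \\
&\;\;\vdots \\
X_n - \mu_n &= \ell_{n1}F_1 + \ell_{n2}F_2 + \ell_{n3}F_3 + \cdots + \ell_{nm}F_m + \varepsilon_n
\end{aligned}$$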

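Covariance structure for the orthogonal factor model, where $\boldsymbol{\Psi}$ is the diagonal matrix of specific (unique) variances $\psi_1, \dots, \psi_n$:

$$\operatorname{Cov}(\mathbf{X}) = \mathbf{L}\mathbf{L}' + \boldsymbol{\Psi}$$

$$\operatorname{Cov}(\mathbf{X}, \mathbf{F}) = \mathbf{L}$$

In particular, the variance of each variable splits into a common part and a unique part: $\sigma_{ii} = \ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{im}^2 + \psi_i$.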
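The covariance structure can be checked numerically. The sketch below picks hypothetical loadings for four standardized variables on two factors. It builds the model-implied matrix LL' + Ψ, with each specific variance ψᵢ set to 1 minus the communality so that every variance equals 1. It then simulates data from X = μ + LF + ε under the orthogonal-model assumptions. The loadings, means, seed, and sample size are illustrative assumptions, not values from the study.

```python
import numpy as np

# Hypothetical loadings of four standardized variables on m = 2 factors
L = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.1, 0.7],
              [0.0, 0.8]])
communality = (L ** 2).sum(axis=1)      # h_i^2 = l_i1^2 + ... + l_im^2
Psi = np.diag(1.0 - communality)        # specific variances psi_i on the diagonal
Sigma = L @ L.T + Psi                   # model-implied Cov(X) = LL' + Psi

# Simulate X = mu + L F + eps under the orthogonal-model assumptions
rng = np.random.default_rng(seed=0)
n_obs, n_vars, m = 200_000, 4, 2
mu = np.full(n_vars, 5.0)                                          # hypothetical item means
F = rng.standard_normal((n_obs, m))                                # E(F) = 0, Cov(F) = I
eps = rng.standard_normal((n_obs, n_vars)) * np.sqrt(np.diag(Psi))  # Cov(eps) = Psi
X = mu + F @ L.T + eps

S = np.cov(X, rowvar=False)                                        # sample Cov(X)
C = (X - X.mean(axis=0)).T @ (F - F.mean(axis=0)) / (n_obs - 1)    # sample Cov(X, F)

print("largest |S - (LL' + Psi)|:", np.abs(S - Sigma).max().round(3))
print("largest |Cov(X, F) - L|:  ", np.abs(C - L).max().round(3))
```

Both discrepancies shrink toward zero as the sample grows. This is what Cov(X) = LL' + Ψ and Cov(X, F) = L predict when the model assumptions hold.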

Factor loadings show the relationship between the measures and their corresponding factors. The larger the factor loading, the stronger the relationship between the measure and its underlying factor. When evaluating factor loadings, standardized loadings can be interpreted like standardized regression coefficients.

Definitions Used in Factor Analysis

When learning factor analysis, it is important to understand the following basic

concepts:

Factor: Linear combination (variate) of the original variables. Factors

represent the underlying dimensions (latent variables) that account for

the observed variables.

Factor loadings: Correlation between the original variables and the

factors.

Factor matrix: Table listing the factor loadings of all variables on each

factor.

Factor score: Composite score created by averaging the variables that load on a factor extracted by the factor analysis.

Common variance: Variance of a variable that is shared with the other variables.

Unique variance: Variance of each variable that is not explained by or associated with the other variables.

Eigenvalue: The amount of variance accounted for by a factor.

Factor rotation: Process of adjusting the factor axes to achieve a simpler and more meaningful factor solution.

Oblique factor rotation: Factor rotation in which the extracted factors are allowed to be correlated.

Orthogonal factor rotation: Factor rotation in which the factor axes are kept at 90 degrees to one another, so the correlation between the factors is zero. Varimax is one of the most popular orthogonal factor rotation methods.
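As an illustration of orthogonal rotation, the sketch below implements the varimax criterion with the widely used SVD-based iteration. It then checks that the rotation recovers a hypothetical simple structure that was deliberately rotated 30 degrees away. SPSS applies Kaiser normalization (row-normalizing loadings by communality) before varimax by default. This sketch omits that step, so its numbers may differ slightly from SPSS output.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-10):
    """Varimax (orthogonal) rotation of a p x k loading matrix.

    Returns the rotated loadings and the k x k orthogonal rotation matrix.
    """
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lambda = loadings @ R
        # Gradient-like target for the varimax criterion
        target = Lambda ** 3 - (gamma / p) * Lambda @ np.diag(np.diag(Lambda.T @ Lambda))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        R = u @ vt                      # closest orthogonal matrix (Procrustes step)
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break                       # criterion no longer improving
    return loadings @ R, R

# Hypothetical simple structure, rotated 30 degrees so it is no longer "clean"
A = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
theta = np.pi / 6
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
unrotated = A @ T
rotated, R = varimax(unrotated)
print(np.round(rotated, 3))
```

The rotated loadings return to one large and one near-zero value per row, up to column order and sign. Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged.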

Steps in Conducting a Factor Analysis

When performing a factor analysis, you typically go through the following steps. First, identify the variables and the factors you have in mind. Then complete the following steps:

1. Compute descriptive statistics for the variables.

2. Conduct a factor analysis using varimax rotation.

3. Interpret the factor results and identify cross-loaded items.

4. Purify the items and rerun the factor analysis until no cross-loadings remain.

5. Produce reliability scores using the items with clean loadings.

6. Compute the composite scores of factors.

7. Test relationships between factors.

What Is a Reliability Analysis?

Reliability refers to consistency: a reliable instrument produces stable and consistent results or scores. After factors are extracted, we need to check the stability of each factor, that is, to assess its reliability.

If a customer answers the same customer satisfaction survey several times and the survey always yields the same or similar results, then the survey is reliable. Reliability allows you to repeat a study and obtain consistent results. Standardized tests such as the American College Testing (ACT) and the Scholastic Aptitude Test (SAT) are other examples; these tests measure students' knowledge of mathematics, language, and science. For multi-item scales, Cronbach's alpha is the most widely used internal consistency coefficient.

The formula is:

$$\alpha = \frac{N \times \bar{c}}{\bar{v} + (N - 1) \times \bar{c}}$$

where $N$ is the number of items, $\bar{c}$ is the average inter-item covariance among the items, and $\bar{v}$ is the average item variance.

Table 1 below shows the guidelines for evaluating Cronbach’s alpha scores.

Table 1: Cronbach’s Alpha.

Cronbach’s Alpha Internal Consistency

α ≥ 0.9 Excellent

0.9 > α ≥ 0.8 Good

0.8 > α ≥ 0.7 Acceptable

0.7 > α ≥ 0.6 Questionable

0.6 > α ≥ 0.5 Poor

Suppose we use three items to measure customer satisfaction, and a reliability analysis yields a score of 0.75. Because this exceeds the threshold of 0.7 (Robinson, Shaver, & Wrightsman, 1991), we conclude that the items used to measure customer satisfaction are reliable.
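The cut-offs in Table 1 translate directly into a lookup. This is a small helper written for illustration; the label returned for values below 0.5 is our own wording, because Table 1 does not cover that range.

```python
def internal_consistency(alpha):
    """Label a Cronbach's alpha using the cut-offs in Table 1."""
    cutoffs = [(0.9, "Excellent"), (0.8, "Good"), (0.7, "Acceptable"),
               (0.6, "Questionable"), (0.5, "Poor")]
    for threshold, label in cutoffs:
        if alpha >= threshold:
            return label
    return "Below the range covered by Table 1"

print(internal_consistency(0.75))  # Acceptable
```

The same helper labels the reliability scores reported later for the three factors.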

Illustrative Example: Online Shopping Data

To study consumer attitudes and behavior, or to describe the characteristics of a large population, companies develop and distribute surveys to collect information from consumers. In the analysis, each survey question is a variable, and these variables allow researchers to conduct a wide variety of tests depending on their interests. Business topics such as consumer satisfaction, loyalty, and intention are enduring research topics.

In this example, we study satisfaction-related factors and variables and walk through the process of conducting a factor analysis. We developed an online survey to measure consumers' shopping experience. The survey contains questions related to consumer satisfaction, information quality, and perceptions of online shopping convenience. We perform the factor analysis and then test the relationships between the factors.

Thus, this example addresses the following research questions:

1. What are the factors that affect consumers’ shopping experience?

2. Is there a positive relationship between satisfaction and information

quality?

Stated in the form of null hypothesis:

H0: There is no correlation between satisfaction and information

quality.

3. Is there a positive relationship between satisfaction and online

shopping convenience?

Stated in the form of null hypothesis:

H0: There is no correlation between satisfaction and online shopping

convenience.

Measures/Variables

In the satisfaction example explained above, satisfaction is called an unobserved factor. Factors are abstract, broad concepts used to explain observed phenomena. In social science, unobserved factors play an important role in the study of social behavior. By creating factors from observable variables, researchers can better interpret the relationships between factors.

In the social and behavioral sciences, abstract concepts such as satisfaction and intention capture people's feelings, emotions, and perceptions. These concepts are subjective and differ from variables such as weight, height, cost, profit, and income, which are called objective measures: we use a scale to measure weight and a monetary unit to measure cost. When studying high-level concepts, we need to distinguish the conceptual definition from the operational definition. The conceptual definition is the meaning of a concept; the operational definition is a metric we develop to quantify something of interest. Let's use satisfaction as an example. Conceptually, satisfaction is defined as a judgment that a product or service feature, or the product or service itself, provided (or is providing) a pleasurable level of consumption-related fulfillment, including levels of under- or overfulfillment (Oliver, 1997). To operationalize satisfaction, we use three questions: (1) My choice to use this company was a wise one; (2) I am always delighted with this company's service; and (3) Overall, I am satisfied with this company. Responses use a 7-point Likert scale, where 1 indicates strongly disagree and 7 indicates strongly agree. In this case, satisfaction is a latent (unobserved) variable, and the three questions are observable variables that we assume together measure satisfaction.

The Data

This example uses data on consumers' online shopping perceptions collected between 2005 and 2006. We received 328 responses to the online survey. Table 2 below shows the factors, variables, and survey questions; each survey question is treated as a variable in the analysis.

Table 2: Factors and Variables.

Factors Variables Survey Questions

Information quality

productinfo The website provides sufficient information about products.

infopresent The information on the website is recent.

uptoday It seems that the website gets updated timely.

Satisfaction

tobuy I want to buy more products online.

wise Online shopping is wise.

pleasant Online shopping is a pleasure experience.

satisfied I am satisfied with my shopping experience.

e-Convenience

savetime It saves me time when shopping online.

comprice It is easy to compare prices when shopping online.

comproduct It is easy to compare products when shopping online.

convenient It is convenient to buy products online.

The responses to the survey are coded from 1 to 7, with 1 representing strongly disagree and 7 representing strongly agree.

Analyzing the Data

Many statistical applications (SAS, SPSS, Statistica, Knime, etc.) can be used to perform factor analysis. We typically need to report the sample size and the descriptive statistics (minimum, maximum, mean, and standard deviation) of the variables, as shown in Table 3.

Table 3: Descriptive Statistics.

Variable N Minimum Maximum Mean Standard Deviation

productinfo 328 1 7 5.39 1.30

infopresent 328 1 7 5.37 1.22

uptoday 328 1 7 5.61 1.25

tobuy 328 1 7 5.36 1.38

wise 328 1 7 4.99 1.44

pleasant 328 1 7 5.25 1.30

satisfied 328 1 7 5.23 1.36

savetime 328 1 7 5.25 1.45

comprice 328 1 7 5.47 1.53

comproduct 328 1 7 5.12 1.66

convenient 328 1 7 5.60 1.25
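A Table 3-style summary can also be produced outside SPSS. This minimal sketch uses Python's statistics module. We assume SPSS reports the sample standard deviation (n − 1 denominator), which is what statistics.stdev computes. The six responses per variable are hypothetical; the real data file has 328 rows.

```python
from statistics import mean, stdev

def describe(name, values):
    """One row of a Table 3-style summary: N, minimum, maximum, mean, SD."""
    return {"variable": name, "N": len(values), "min": min(values),
            "max": max(values), "mean": round(mean(values), 2),
            "sd": round(stdev(values), 2)}      # sample SD (n - 1 denominator)

# Hypothetical responses for two of the variables
responses = {"savetime": [7, 5, 6, 4, 7, 2], "convenient": [6, 6, 7, 5, 7, 3]}
for name, values in responses.items():
    row = describe(name, values)
    print(f"{row['variable']:<12}{row['N']:>4}{row['min']:>4}{row['max']:>4}"
          f"{row['mean']:>7.2f}{row['sd']:>7.2f}")
```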

Factor Analysis

We first conducted the Kaiser-Meyer-Olkin (KMO) test, which measures how suitable the data are for factor analysis. Kaiser (1970) suggests that a KMO value of 0.8 is adequate for factor analysis. Our data produce a KMO value of 0.9, which exceeds the suggested value, so the data are appropriate for factor analysis.
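The overall KMO compares the size of the observed correlations with the size of the partial correlations. The formula is KMO = Σ r²ᵢⱼ / (Σ r²ᵢⱼ + Σ p²ᵢⱼ), with both sums taken over all pairs i ≠ j. The partial correlations come from the inverse of the correlation matrix. The sketch below applies this to two hypothetical six-item correlation matrices, not to the study's data.

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for correlation matrix R."""
    S = np.linalg.inv(R)
    d = np.sqrt(np.diag(S))
    partial = -S / np.outer(d, d)                # partial correlation of each pair, given all others
    off = ~np.eye(R.shape[0], dtype=bool)        # mask selecting the off-diagonal cells
    r2 = (R[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)

p = 6
# Hypothetical: six items that all correlate 0.64 (one strong common factor)
R_strong = np.full((p, p), 0.64)
np.fill_diagonal(R_strong, 1.0)
# Hypothetical: six items with weak correlations of 0.10
R_weak = np.full((p, p), 0.10)
np.fill_diagonal(R_weak, 1.0)

print(round(kmo(R_strong), 3), round(kmo(R_weak), 3))
```

When items share a common factor, their partial correlations are small relative to their raw correlations, and the KMO approaches 1. Weakly related items give a lower value.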

Next, we use SPSS to perform a factor analysis with varimax rotation. Figure 1 shows that three factors emerge. Each variable loads above 0.5 on a single factor, except tobuy, which loads on Factor 1 (0.59) and on Factor 3 (0.52). Because tobuy cross-loads on two factors, we remove it and rerun the factor analysis. This step is necessary to produce meaningful factors that are distinct from each other.
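The purification rule used here (flag any item whose loading exceeds 0.5 on more than one factor) can be written as a small check. The tobuy values are the ones reported above. The second item and its loadings are hypothetical, included only to show a clean item passing. Factors omitted from an item's entry are treated as loading below the threshold.

```python
def cross_loaded(loadings, threshold=0.5):
    """Return items whose absolute loading exceeds the threshold on more than one factor."""
    flagged = {}
    for item, row in loadings.items():
        high = [factor for factor, value in row.items() if abs(value) > threshold]
        if len(high) > 1:
            flagged[item] = high
    return flagged

loadings = {
    "tobuy": {"Factor 1": 0.59, "Factor 3": 0.52},   # values reported for Figure 1
    "item_x": {"Factor 1": 0.81, "Factor 2": 0.22},  # hypothetical clean item
}
print(cross_loaded(loadings))  # {'tobuy': ['Factor 1', 'Factor 3']}
```

In practice, you drop a flagged item, rerun the factor analysis, and repeat the check until it returns nothing.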

Figure 1: Factor Loadings.

Figure 2 shows the rerun without tobuy: each variable loads on its intended factor with a loading greater than 0.5. Unlike in Figure 1, each factor in Figure 2 has been given a meaningful name. The three variables productinfo, infopresent, and uptoday load on information quality with loadings of 0.86, 0.85, and 0.76, respectively. The three variables wise, pleasant, and satisfied load on satisfaction with loadings of 0.91, 0.82, and 0.88, respectively. The four variables savetime, comprice, comproduct, and convenient load on e-convenience with loadings of 0.78, 0.84, 0.80, and 0.75, respectively.

Figure 2: Clean Loadings.

Validity

When conducting a factor analysis, it is important to perform validity tests. We focus on content validity, convergent validity, and discriminant validity. Content validity concerns the development of the survey items: researchers must conduct an extensive literature review on the topic and consult experts. Convergent validity refers to the degree to which two measures designed to measure the same construct are related; convergence is found if the two measures are highly correlated. An early rule of thumb for convergent validity is that the correlation between two measures designed to assess the same construct should be statistically significant and "sufficiently large to encourage further examination of validity" (Campbell & Fiske, 1959). In our data, all loadings are above 0.5 on their own construct and all cross-loadings are below 0.5, as suggested by Hair, Anderson, Tatham, and Black (1995).

Discriminant validity assesses the degree to which two measures designed to measure similar but conceptually different constructs are related. A low-to-moderate correlation is often considered evidence of discriminant validity. The factor loadings in Figure 2 provide evidence of both discriminant and convergent validity. Another approach to evaluating discriminant validity is to examine item-to-construct relationships, to ensure that items correlate more highly with the construct they are intended to measure than with other constructs (Campbell & Fiske, 1959). Table 4 shows that each variable is more highly correlated with its underlying factor than with the other factors. For example, the correlations of productinfo with infoQuality, satisfaction, and e-convenience are 0.87, 0.37, and 0.36, respectively, so productinfo correlates more highly with infoQuality than with satisfaction or e-convenience.

Table 4: Item-to-Construct Correlations.

infoQuality satisfaction e-convenience

productinfo 0.87 0.37 0.36

infopresent 0.87 0.41 0.31

uptoday 0.82 0.41 0.31

wise 0.38 0.94 0.38

pleasant 0.49 0.90 0.44

satisfied 0.41 0.92 0.42

savetime 0.26 0.38 0.80

comprice 0.32 0.36 0.85

comproduct 0.31 0.26 0.81

convenient 0.35 0.53 0.82

Note: Bold values = 0.1% significance.
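The item-to-construct comparison can be checked mechanically using the values in Table 4. For each item, the sketch below subtracts its highest correlation with another construct from its correlation with its intended construct (from Table 2, without tobuy). A positive margin supports discriminant validity.

```python
# Item-to-construct correlations transcribed from Table 4
constructs = ("infoQuality", "satisfaction", "e-convenience")
table4 = {
    "productinfo": (0.87, 0.37, 0.36),
    "infopresent": (0.87, 0.41, 0.31),
    "uptoday":     (0.82, 0.41, 0.31),
    "wise":        (0.38, 0.94, 0.38),
    "pleasant":    (0.49, 0.90, 0.44),
    "satisfied":   (0.41, 0.92, 0.42),
    "savetime":    (0.26, 0.38, 0.80),
    "comprice":    (0.32, 0.36, 0.85),
    "comproduct":  (0.31, 0.26, 0.81),
    "convenient":  (0.35, 0.53, 0.82),
}
# Construct each item was designed to measure (Table 2, without tobuy)
intended = {**dict.fromkeys(["productinfo", "infopresent", "uptoday"], "infoQuality"),
            **dict.fromkeys(["wise", "pleasant", "satisfied"], "satisfaction"),
            **dict.fromkeys(["savetime", "comprice", "comproduct", "convenient"], "e-convenience")}

margins = {}
for item, row in table4.items():
    own = row[constructs.index(intended[item])]
    others = [r for c, r in zip(constructs, row) if c != intended[item]]
    margins[item] = round(own - max(others), 2)   # > 0: own construct correlates highest

print(all(value > 0 for value in margins.values()))
print(min(margins, key=margins.get), min(margins.values()))
```

Every margin is positive. The smallest is for convenient (0.82 against 0.53 with satisfaction), which is the item to watch if the scale is revised.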

Reliability

The most widely used internal consistency reliability coefficient is Cronbach's coefficient alpha. A commonly suggested minimum acceptable level for coefficient alpha is 0.7 (Robinson et al., 1991). All three factors meet this criterion: the reliability analysis gives a score of 0.83 for information quality, 0.84 for e-convenience, and 0.91 for satisfaction.

Compute Composite Factor Scores

After cleaning up the factor loadings and identifying the variables and their associated factors, we can compute the factor scores. Three variables (productinfo, infopresent, and uptoday) measure information quality, so we average the raw scores of these three variables to obtain the information quality score. We repeat this step to compute the composite scores for satisfaction and e-convenience. Figure 3 shows a portion of the resulting factor data.
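Averaging items into composite scores is a simple row operation. The sketch below uses the retained items from the factor analysis (Table 2 without tobuy) and applies them to one hypothetical respondent. A real analysis would apply the same function to every row of the data file.

```python
from statistics import mean

# Items retained for each factor after purification (Table 2 without tobuy)
FACTOR_ITEMS = {
    "infoQuality": ["productinfo", "infopresent", "uptoday"],
    "satisfaction": ["wise", "pleasant", "satisfied"],
    "e-convenience": ["savetime", "comprice", "comproduct", "convenient"],
}

def composite_scores(respondent):
    """Average each factor's item scores for one respondent (dict of item -> 1-7 score)."""
    return {factor: mean(respondent[item] for item in items)
            for factor, items in FACTOR_ITEMS.items()}

# One hypothetical respondent
row = {"productinfo": 6, "infopresent": 5, "uptoday": 7,
       "wise": 5, "pleasant": 6, "satisfied": 6,
       "savetime": 7, "comprice": 6, "comproduct": 5, "convenient": 7}
print(composite_scores(row))
```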

Figure 3: Portion of Data.

Hypothesis Testing

After the composite scores are calculated, we are ready to perform multiple linear regression to test our hypotheses. Because we use more than one independent variable (information quality and e-convenience) to predict the dependent variable (satisfaction), we need to check the variance inflation factor (VIF) for multicollinearity. The literature contains various recommendations for acceptable VIF levels. For example, Hair et al. (1995) recommend a maximum VIF of 10, whereas Rogerson (2001) suggests a maximum of 5. In our data, the VIF is 1.78, well below both suggested values. To evaluate model fit, we check the R-square, the overall model significance, and the significance of the coefficients. The output shows that R-square is 34.1%, the overall model is significant, and both independent variables are significant.
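With exactly two predictors, the standardized coefficients, R-square, and VIF can be computed from three correlations: each predictor with the outcome, and the predictors with each other. The sketch below uses hypothetical correlations, not the study's values. It then inverts VIF = 1 / (1 − r₁₂²) to show what the reported VIF of 1.78 implies about the correlation between the two composite predictors.

```python
import math

def two_predictor_regression(r1y, r2y, r12):
    """Standardized betas, R^2 and VIF when y is regressed on two predictors.

    r1y, r2y: correlation of each predictor with y; r12: correlation between predictors.
    """
    denom = 1.0 - r12 ** 2
    beta1 = (r1y - r12 * r2y) / denom
    beta2 = (r2y - r12 * r1y) / denom
    r_squared = beta1 * r1y + beta2 * r2y
    vif = 1.0 / denom            # with only two predictors, both share this VIF
    return beta1, beta2, r_squared, vif

# Hypothetical correlations, not the study's actual values
b1, b2, r_sq, vif = two_predictor_regression(r1y=0.50, r2y=0.45, r12=0.40)
print(f"beta1={b1:.3f} beta2={b2:.3f} R^2={r_sq:.3f} VIF={vif:.3f}")

# Inverting VIF = 1 / (1 - r12^2) for the reported VIF of 1.78
print(f"implied r12 = {math.sqrt(1 - 1 / 1.78):.3f}")
```

A VIF of 1.78 corresponds to a correlation of about 0.66 between information quality and e-convenience. That correlation is noticeable but far from the level at which the coefficients become unstable.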

Results

Because we set out to answer three research questions, we present and discuss each one in turn.

Research Question 1: What Are the Factors?

Figure 2 shows that three factors emerged after running the factor analysis with varimax rotation. The content of three survey questions ("The website provides sufficient information about products," "The information on the website is recent," and "It seems that the website gets updated timely") indicates that they relate to information quality. We therefore name this factor information quality; it is observed through the variables productinfo, infopresent, and uptoday. Using a similar approach, we name the other two factors satisfaction and e-convenience.

In summary, SPSS only extracts the factors; researchers must interpret them and give them meaningful names.

Research Question 2: There Is No Correlation Between Satisfaction and Information Quality

The standardized coefficient between information quality and satisfaction is 0.35 (p < .001). We therefore reject the null hypothesis and conclude that there is a positive and statistically significant relationship between information quality and satisfaction.

Research Question 3: There Is No Correlation Between Satisfaction and Online Shopping Convenience

The standardized coefficient between e-convenience and satisfaction is 0.32 (p < .001). We therefore reject the null hypothesis and conclude that there is a positive and statistically significant relationship between e-convenience and satisfaction.

Review

Factor analysis is a statistical procedure for identifying the interrelationships among a large number of variables, that is, for identifying how sets of variables are related. It can be used for exploratory or confirmatory purposes and is commonly used in social, business, and behavioral studies. Continuous variables, or variables measured on an interval or ratio scale, can be used in a factor analysis. Factor analysis is a data reduction technique: factors are extracted from the variables, and these factors can then be used in relationship tests or group difference tests.

In our example, we used a survey to collect data on consumers' perceptions of their online shopping experience. From the survey data, we extracted three factors: satisfaction, information quality, and convenience. We then computed the composite scores of the three factors and used them to test the relationships between satisfaction and information quality, and between satisfaction and convenience.

You should know:

• What types of variables are suited for a factor analysis.

• The basic assumptions underlying factor analysis.

• How to perform a factor analysis.

• How to compute factor scores.

• How to interpret results of a factor analysis.

• How to refine factor loadings.

• How to perform reliability and validity tests.

Your Turn

Now it is your turn to perform a factor analysis and reproduce the results presented above. Read the file containing the description of the variables, download the dataset, and perform descriptive analysis, correlation analysis, factor analysis, and a reliability test on the sample data. After the factors are extracted, compute the score for each factor and use the composite factor scores to perform hypothesis testing.

References

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.

Hair, J. F., Jr., Anderson, R. E., Tatham, R. L., & Black, W. C. (1995). Multivariate data analysis (3rd ed.). New York, NY: Macmillan.

Kaiser, H. F. (1970). A second generation Little Jiffy. Psychometrika, 35, 401–415.

Oliver, R. L. (1997). Satisfaction: A behavioral perspective on the consumer. New York, NY: McGraw-Hill.

Robinson, J. P., Shaver, P. R., & Wrightsman, L. S. (1991). Criteria for scale selection and evaluation. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 1–15). San Diego, CA: Academic Press.

Rogerson, P. A. (2001). Statistical methods for geography. London, UK: SAGE.
