Multivariate Data Analysis
Course Introduction
Vu Hoang Nguyen, PhD.
Course Content
Provide an introduction to multivariate data analysis.
Focus on practical applications for business and marketing.
Topics include the design of multivariate research, the choice of a multivariate method, the evaluation of a multivariate data analysis, and the interpretation of the results.
Course objectives
After this course you will be able to:
Understand the basic concepts in multivariate analysis.
Choose the most appropriate multivariate technique for a given data set.
Perform the analysis using SPSS.
Interpret the numerical and graphical results.
Summarize and communicate the obtained results.
Pedagogy
This course combines lectures, group presentations, home assignments, and group projects.
SPSS is used as the primary software in the course.
Schedule
1 Introduction to Multivariate Analysis
Introduction to SPSS
2 Data preparation
3&4 Frequency Distribution, Cross-tabulation, and Hypothesis Testing
5&6 Analysis of Variance and Covariance
7&8 Correlation and Regression
9 Discriminant Analysis
10 Factor Analysis
11 Cluster Analysis
12 Multidimensional Scaling and Conjoint Analysis
13&14 Group presentation & Course revision
15 Final Exam
Grading
Group presentation: 20%
Group project and homework: 20%
Final exam: 60%
Reference
Required Textbook
Malhotra, N. K. (2009). Marketing Research: An Applied Orientation (6th edition). Prentice-Hall.
Recommended Textbooks
Hair, J. F., Black, W. C., Babin, B. J., and Anderson, R. E. Multivariate Data Analysis: A Global Perspective (7th edition). Pearson.
Mertler, C. A. and Vannatta, R. A. Advanced and Multivariate Statistical Methods: Practical Application and Interpretation (3rd edition). Pyrczak Publishing.
LEARNING OBJECTIVES:
1. Explain what multivariate analysis is and its application.
2. Define and discuss the specific techniques included in multivariate analysis.
3. Determine which multivariate technique is appropriate for a specific research problem.
4. Discuss the nature of measurement scales and their relationship to multivariate techniques.
5. Describe the statistical issues inherent in multivariate analysis.
Introduction to Multivariate Analysis
All statistical methods that simultaneously analyze multiple measurements on each individual or object under investigation.
Why use it?
• Measurement
• Explanation & Prediction
• Hypothesis Testing
What is Multivariate Analysis?
Data set
The Variate
Measurement Scales
• Nonmetric
• Metric
Multivariate Measurement
Measurement Error
Types of Techniques
Basic Concepts
What is data
Data are facts and figures… collected for analysis, presentation and interpretation.
Data set: a collection of data values as a whole.
Subject (or individual): an item for study (e.g., a student, a company).
Variable: a characteristic of the subject (e.g., student name, company revenue).
Observation: the set of data values recorded for one subject (one row of the data set).
What is data
Employee Name     Sex      DOB            Income per year ($)
Gladys Simpson    Female   1-May-1971     120,000
Divid Hinds       Male     17-Dec-1968    135,000
Kenneth Henry     Male     3-Sep-1965     98,000
Each column is a variable; each row is an observation.
A data set with 3 observations
Variables
Univariate: one variable
• Student name
Bivariate: two variables
• Student name, Date of birth
Multivariate: more than two variables
• Student name, Date of birth, Math score
The Variate
The variate is a linear combination of variables with empirically determined weights.
Weights are determined to best achieve the objective of the specific multivariate technique.
Variate equation: Y = W1 X1 + W2 X2 + . . . + Wn Xn
Each respondent has a variate value Y. The Y value is a linear combination of the entire set of variables. It is the dependent variable.
Potential Independent Variables:
• X1 = income
• X2 = education
• X3 = family size
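As a minimal sketch of the variate equation above, the following Python snippet computes Y for one respondent. The function name, the weights, and the respondent's values are made-up illustrations, not estimates from any data set; in practice the weights are determined by the multivariate technique being used.

```python
def variate(weights, values):
    """Return Y = W1*X1 + W2*X2 + ... + Wn*Xn for one respondent."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical weights for X1 = income (in $1,000s), X2 = education (years),
# and X3 = family size. These numbers are assumptions chosen for illustration.
weights = [0.5, 2.0, -1.0]
respondent = [60, 16, 4]

y = variate(weights, respondent)
print(y)
```

Each respondent gets one Y value from the same set of weights, which is what allows respondents to be compared on a single composite score.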
Scales of Measurement
The scale indicates the data summarization and statistical analyses that are most appropriate.
The scale determines the amount of information contained in the data.
Scales of measurement include:
Nominal
Ordinal
Interval
Ratio
Nominal Scale
Data are labels or names used to identify an attribute of the element.
Example:
Students of a university are classified by school as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).
No ordering.
Ordinal Scale
The data have the properties of nominal data and the order or rank of the data is meaningful.
Example:
Students of a university are classified as Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).
Ordering, but differences have no meaning.
Interval Scale
The data have the properties of ordinal data, and the difference between measurements is a meaningful quantity, but the measurements have no true zero value.
Example:
The difference between a temperature of 0 °C and 20 °C is the same as the difference between 20 °C and 40 °C, but we cannot say that 40 °C is twice as hot as 20 °C.
Differences have meaning, but ratios have no meaning.
°C    °F
0     32.0
1     33.8
2     35.6
3     37.4
4     39.2
5     41.0
6     42.8
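The interval-scale property can be checked numerically. The sketch below uses the standard Celsius-to-Fahrenheit conversion (the function name is an illustrative choice) to show that equal differences stay equal after conversion, while the ratio "40 is twice 20" does not survive the change of scale.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Equal differences in Celsius remain equal in Fahrenheit.
diff_low = c_to_f(20) - c_to_f(0)     # step from 0 °C to 20 °C
diff_high = c_to_f(40) - c_to_f(20)   # step from 20 °C to 40 °C

# Ratios are not preserved, because neither scale has a true zero.
ratio_c = 40 / 20
ratio_f = c_to_f(40) / c_to_f(20)

print(diff_low, diff_high)
print(ratio_c, round(ratio_f, 2))
```

The two differences come out identical, whereas the ratio changes from 2 in Celsius to roughly 1.5 in Fahrenheit, so a statement such as "twice as hot" depends on the arbitrary zero point.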
All variables have some error. What are the sources of error?
Measurement error distorts observed relationships and makes multivariate techniques less powerful.
Researchers use summated scales, for which several variables are summed or averaged together to form a composite representation of a concept.
Measurement Error
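A summated scale, as described above, can be sketched in a few lines. The respondents, the four questionnaire items, and the 1 to 7 rating range below are hypothetical; the only point is that several item scores are averaged into one composite score per respondent.

```python
from statistics import mean

def summated_scale(item_scores):
    """Average several item scores into one composite score."""
    return mean(item_scores)

# Hypothetical answers of three respondents to four 1-7 satisfaction items.
responses = {
    "respondent_1": [6, 7, 6, 5],
    "respondent_2": [3, 2, 4, 3],
    "respondent_3": [5, 5, 6, 4],
}

composites = {name: summated_scale(items) for name, items in responses.items()}
print(composites)
```

Averaging rather than summing keeps the composite on the same 1 to 7 range as the individual items, which makes it easier to interpret.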
In addressing measurement error, researchers evaluate two important characteristics of measurement:
• Validity = the degree to which a measure accurately represents what it is supposed to.
• Reliability = the degree to which the observed variable measures the “true” value and is thus error free.
Measurement Error
Type I error, or α, is the probability of rejecting the null hypothesis when it is true.
Type II error, or β, is the probability of failing to reject the null hypothesis when it is false.
Power, or 1-β, is the probability of rejecting the null hypothesis when it is false.
                     H0 true             H0 false
Fail to Reject H0    1-α                 Type II error (β)
Reject H0            Type I error (α)    1-β (Power)
Statistical Significance and Power
Impact of Sample Size on Power
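To illustrate how sample size affects power, here is a minimal sketch for one specific case that is an assumption of this example, not something taken from the slides: a one-sided, one-sample z-test with known standard deviation. The effect size of 0.3 standard deviations and the sample sizes are arbitrary illustrative values.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test with known standard deviation.

    effect_size is the true mean shift measured in standard-deviation units.
    """
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)   # reject H0 above this value
    # Under the alternative, the test statistic is centered at effect_size * sqrt(n).
    return 1 - std_normal.cdf(critical_z - effect_size * sqrt(n))

for n in (10, 30, 100):
    print(n, round(z_test_power(effect_size=0.3, n=n), 3))
```

For a fixed effect size and α, the printed power rises as n grows. With an effect size of zero the function returns α, since rejecting a true null hypothesis is exactly a Type I error.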
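Dependence: one or more variables are identified as dependent variables to be predicted or explained by the other (independent) variables.
Interdependence: all variables are analyzed simultaneously, with no distinction between dependent and independent variables.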
Two Types of Multivariate Methods
A variable or set of variables is identified as the dependent variable to be predicted or explained by other variables known as independent variables.
• Multiple Regression
• Multiple Discriminant Analysis
• Logit/Logistic Regression
• Multivariate Analysis of Variance (MANOVA) and Covariance
• Conjoint Analysis
• Canonical Correlation
• Structural Equations Modeling (SEM)
Dependence Techniques
Involve the simultaneous analysis of all variables in the set, without distinction between dependent variables and independent variables.
• Principal Components and Common Factor Analysis
• Cluster Analysis
• Multidimensional Scaling (perceptual mapping)
• Correspondence Analysis
Interdependence Techniques
Selecting a Multivariate Technique
1. What type of relationship is being examined – dependence or interdependence?
2. Dependence relationship: How many variables are being predicted?
• What is the measurement scale of the dependent variable?
• What is the measurement scale of the independent variable?
3. Interdependence relationship: Are you examining relationships between variables, respondents, or objects?
Selecting the Correct Multivariate Method (decision tree)
Multivariate Methods
• Dependence Methods
  - One Dependent Variable
      Metric: Multiple Regression and Conjoint
      Nonmetric: Discriminant Analysis and Logit
  - Several Dependent Variables
      Metric: MANOVA and Canonical
      Nonmetric: Canonical Correlation, Dummy Variables
  - Multiple Relationships - Structural Equations: SEM, CFA
• Interdependence Methods
      Metric: Factor Analysis, Cluster Analysis, Metric MDS
      Nonmetric: Nonmetric MDS and Correspondence Analysis
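The selection questions can be sketched as a small lookup. This is a simplified illustration only: the function name and argument values are hypothetical, the branch for multiple relationships (SEM) is left out, and the choice among interdependence techniques is not resolved further.

```python
def suggest_technique(relationship, n_dependent=None, dependent_scale=None):
    """Suggest a family of techniques from the selection questions.

    relationship: "dependence" or "interdependence"
    n_dependent: "one" or "several" (dependence only)
    dependent_scale: "metric" or "nonmetric" (dependence only)
    """
    if relationship == "interdependence":
        return "factor analysis, cluster analysis, MDS, or correspondence analysis"
    if relationship != "dependence":
        raise ValueError("relationship must be 'dependence' or 'interdependence'")
    table = {
        ("one", "metric"): "multiple regression or conjoint analysis",
        ("one", "nonmetric"): "discriminant analysis or logistic regression",
        ("several", "metric"): "MANOVA or canonical correlation",
        ("several", "nonmetric"): "canonical correlation with dummy variables",
    }
    try:
        return table[(n_dependent, dependent_scale)]
    except KeyError:
        raise ValueError("unsupported combination") from None

print(suggest_technique("dependence", "one", "nonmetric"))
```

A real choice also depends on the measurement scale of the independent variables and on the research objective, so the returned string is a starting point rather than a decision.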
A single metric dependent variable (Y) is predicted by several metric independent variables (X1, X2, ...).
Multiple Regression
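The course uses SPSS for estimation, so the following is only a language-neutral sketch of what multiple regression computes, written in plain Python. It fits Y = b0 + b1 X1 + b2 X2 by ordinary least squares through the normal equations; the function name and the data are made up, and the data are generated exactly from known coefficients so the result can be checked.

```python
def fit_multiple_regression(rows, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk.

    rows is a list of [x1, ..., xk] lists; returns [b0, b1, ..., bk].
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # add intercept column
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple_regression(rows, y)
print([round(b, 6) for b in coefficients])
```

Because the data contain no noise, the estimated coefficients should recover the intercept and the two weights used to generate y, up to floating-point rounding.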
A single nonmetric (categorical) dependent variable is predicted by several metric independent variables.
Examples of Dependent Variables:
• Gender – Male vs. Female
• Culture – USA vs. Outside USA
• Purchasers vs. Non-purchasers
• Member vs. Non-Member
• Good, Average and Poor Credit Risk
Discriminant Analysis
A single nonmetric dependent variable is predicted by several metric independent variables. This technique is similar to discriminant analysis, but relies on calculations more like regression.
Logistic Regression
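The "calculations more like regression" can be sketched as follows: a regression-style linear combination of the independent variables is passed through the logistic function to give a probability of group membership. The coefficients, the variables, and the 0.5 cut-off below are assumptions for illustration, not estimates from data.

```python
from math import exp

def predicted_probability(intercept, weights, values):
    """Logistic model: P(group = 1) = 1 / (1 + exp(-(b0 + sum(bi * xi))))."""
    z = intercept + sum(w * x for w, x in zip(weights, values))
    return 1 / (1 + exp(-z))

# Hypothetical coefficients for predicting purchaser (1) vs. non-purchaser (0)
# from income (in $1,000s) and family size; the numbers are assumptions.
p = predicted_probability(-3.0, [0.04, 0.2], [80, 3])
label = "purchaser" if p >= 0.5 else "non-purchaser"
print(round(p, 3), label)
```

The linear part works exactly like the variate in multiple regression; only the final transformation differs, which keeps the prediction between 0 and 1.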
Several metric dependent variables are predicted by a set of nonmetric (categorical) independent variables.
MANOVA
Several metric dependent variables are predicted by several metric independent variables.
Canonical Analysis
Conjoint analysis is used to understand respondents’ preferences for products and services. In doing this, it determines the importance of both:
• attributes and
• levels of attributes
. . . based on a smaller subset of combinations of attributes and levels.
Conjoint Analysis
Typical Applications:
• Soft Drinks
• Candy Bars
• Cereals
• Beer
• Apartment Buildings; Condos
• Solvents; Cleaning Fluids
Conjoint Analysis
Structural Equations Modeling (SEM)
Estimates multiple, interrelated dependence relationships based on two components:
1. Measurement Model
2. Structural Model
Multivariate Dependence Methods
Canonical Correlation
Y1 + Y2 + Y3 + … + Yn = X1 + X2 + X3 + … + Xn
Dependent: metric, nonmetric. Independent: metric, nonmetric.
Multivariate Analysis of Variance
Y1 + Y2 + Y3 + … + Yn = X1 + X2 + X3 + … + Xn
Dependent: metric. Independent: nonmetric.
Analysis of Variance
Y1 = X1 + X2 + X3 + … + Xn
Dependent: metric. Independent: nonmetric.
Multiple Discriminant Analysis
Y1 = X1 + X2 + X3 + … + Xn
Dependent: nonmetric. Independent: metric.
Multiple Regression Analysis
Y1 = X1 + X2 + X3 + … + Xn
Dependent: metric. Independent: metric, nonmetric.
Conjoint Analysis
Y1 = X1 + X2 + X3 + … + Xn
Dependent: metric, nonmetric. Independent: nonmetric.
Structural Equation Modeling
Y1 = X11 + X12 + X13 + … + X1n
Y2 = X21 + X22 + X23 + … + X2n
…
Ym = Xm1 + Xm2 + Xm3 + … + Xmn
Dependent: metric. Independent: metric, nonmetric.
Exploratory factor analysis analyzes the structure of the interrelationships among a large number of variables to determine a set of common underlying dimensions (factors).
Exploratory Factor Analysis
Cluster analysis groups objects (respondents, products, firms, variables, etc.) so that each object is similar to the other objects in its cluster and different from objects in all the other clusters.
Cluster Analysis
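One common way to form such groups is k-means clustering. The slides do not name a specific algorithm, so the sketch below is an assumption chosen for illustration, with made-up two-dimensional points and hand-picked starting centroids so the result is deterministic.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid            # keep an empty cluster's centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical objects measured on two metric variables.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
clusters, centroids = k_means(points, centroids=[(0, 0), (10, 10)])
print(clusters)
print(centroids)
```

Objects that end up in the same list are closer to their own cluster's centroid than to the other one, which is the "similar within, different between" idea in the definition above.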
Multidimensional Scaling
Multidimensional scaling identifies “unrecognized” dimensions that affect purchase behavior based on customer judgments of:
• similarities or
• preferences
and transforms these into distances represented as perceptual maps.
Correspondence Analysis
Correspondence analysis uses nonmetric data and evaluates either linear or non-linear relationships in an effort to develop a perceptual map representing the association between objects (firms, products, etc.) and a set of descriptive characteristics of the objects.
• Establish Practical Significance as well as Statistical Significance.
• Sample Size Affects All Results.
• Know Your Data.
• Strive for Model Parsimony.
• Look at Your Errors.
• Validate Your Results.
Guidelines for Multivariate Analysis
Stage 1: Define the Research Problem, Objectives, and Multivariate Technique(s) to be Used
Stage 2: Develop the Analysis Plan
Stage 3: Evaluate the Assumptions Underlying the Multivariate Technique(s)
Stage 4: Estimate the Multivariate Model and Assess Overall Model Fit
Stage 5: Interpret the Variate(s)
Stage 6: Validate the Multivariate Model
A Structured Approach to Multivariate Model Building:
Multivariate Analysis Learning Checkpoint
1. What is multivariate analysis?
2. Why use multivariate analysis?
3. Why is knowledge of measurement scales important in using multivariate analysis?
4. What basic issues need to be examined when using multivariate analysis?
5. Describe the process for applying multivariate analysis.