Multivariate Data Analysis: Course Introduction
Vu Hoang Nguyen, PhD.

Lecture 1




Course Content

Provide an introduction to multivariate data analysis.

Focus on practical applications for business and marketing.

Topics include the design of multivariate research, the choice of a multivariate method, the evaluation of a multivariate data analysis, and the interpretation of the results.


Course Objectives

After this course, you will be able to:

Understand the basic concepts of multivariate analysis.

Choose the most appropriate multivariate technique for a given data set.

Perform the analysis using SPSS.

Interpret the numerical and graphical results.

Summarize and communicate the results obtained.


Pedagogy

This course is conducted through a combination of lectures, group presentations, home assignments, and group projects.

SPSS is used as the primary software in the course.


Schedule

1        Introduction to Multivariate Analysis; Introduction to SPSS
2        Data Preparation
3 & 4    Frequency Distribution, Cross-Tabulation, and Hypothesis Testing
5 & 6    Analysis of Variance and Covariance
7 & 8    Correlation and Regression
9        Discriminant Analysis
10       Factor Analysis
11       Cluster Analysis
12       Multidimensional Scaling and Conjoint Analysis
13 & 14  Group Presentation and Course Revision
15       Final Exam


Grading

Group presentation: 20%
Group project and homework: 20%
Final exam: 60%


References

Required textbook:
Malhotra, N. K. (2009). Marketing Research: An Applied Orientation (6th edition). Prentice Hall.

Recommended textbooks:
Hair, J. F., Black, W. C., Babin, B. J., and Anderson, R. E. Multivariate Data Analysis: A Global Perspective (7th edition). Pearson.
Mertler, C. A., and Vannatta, R. A. Advanced and Multivariate Statistical Methods: Practical Application and Interpretation (3rd edition). Pyrczak Publishing.


Introduction to Multivariate Analysis

Learning Objectives

1. Explain what multivariate analysis is and how it is applied.

2. Define and discuss the specific techniques included in multivariate analysis.

3. Determine which multivariate technique is appropriate for a specific research problem.

4. Discuss the nature of measurement scales and their relationship to multivariate techniques.

5. Describe the statistical issues inherent in multivariate analysis.


What Is Multivariate Analysis?

Multivariate analysis refers to all statistical methods that simultaneously analyze multiple measurements on each individual or object under investigation.

Why use it?
• Measurement
• Explanation and prediction
• Hypothesis testing


Basic Concepts

• Data set
• The variate
• Measurement scales (nonmetric and metric)
• Multivariate measurement
• Measurement error
• Types of techniques


What Is Data?

Data are facts and figures… collected for analysis, presentation, and interpretation.

Data set: a collection of data values as a whole.
Subject (or individual): an item under study (e.g., a student, a company).
Variable: a characteristic of the subject (e.g., student name, company revenue).
Observation: the set of data values recorded for a single subject.


What Is Data? (Example)

Employee Name     Sex      DOB            Income per Year ($)
Gladys Simpson    Female   1-May-1971     120,000
David Hinds       Male     17-Dec-1968    135,000
Kenneth Henry     Male     3-Sep-1965     98,000

Each column is a variable; each row is an observation.

A data set with 3 observations.
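The same table can be held as a small in-memory data set. The sketch below is only an illustration (the key names and ISO date format are assumptions, not an SPSS layout): each dictionary is one observation and each key is one variable, so counting rows and keys gives the numbers of observations and variables.

```python
# Hypothetical in-memory version of the employee table:
# each dict is one observation (row), each key is one variable (column).
employees = [
    {"name": "Gladys Simpson", "sex": "Female", "dob": "1971-05-01", "income": 120_000},
    {"name": "David Hinds", "sex": "Male", "dob": "1968-12-17", "income": 135_000},
    {"name": "Kenneth Henry", "sex": "Male", "dob": "1965-09-03", "income": 98_000},
]

n_observations = len(employees)        # number of rows
variables = list(employees[0].keys())  # column names
n_variables = len(variables)

print(n_observations, n_variables)  # 3 4
```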


Variables

Univariate: one variable (e.g., student name)

Bivariate: two variables (e.g., student name, date of birth)

Multivariate: more than two variables (e.g., student name, date of birth, math score)


The Variate

The variate is a linear combination of variables with empirically determined weights. The weights are determined to best achieve the objective of the specific multivariate technique.

Variate equation: $Y = w_1 X_1 + w_2 X_2 + \dots + w_n X_n$

Each respondent has a variate value Y, which is a linear combination of the entire set of variables. It is the dependent variable.

Potential independent variables:
• $X_1$ = income
• $X_2$ = education
• $X_3$ = family size
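To make the variate equation concrete, the sketch below computes a variate value for three respondents. The weights and respondent data are hypothetical; in practice the weights are estimated by the chosen technique (for example, by least squares in multiple regression).

```python
# Hypothetical weights for X1 = income (in $1,000s), X2 = education (years),
# X3 = family size. Real weights come from estimating the chosen technique.
weights = [0.02, 0.10, -0.05]

respondents = [
    [50, 12, 3],  # income, education, family size
    [80, 16, 2],
    [35, 10, 5],
]

def variate(x, w):
    """Return Y = w1*X1 + w2*X2 + ... + wn*Xn for one respondent."""
    return sum(wi * xi for wi, xi in zip(w, x))

scores = [variate(r, weights) for r in respondents]
print(scores)
```

Every respondent is summarized by a single number Y, which is what the technique then works with.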


Scales of Measurement

The scale indicates the data summarization and statistical analyses that are most appropriate.

The scale determines the amount of information contained in the data.

Scales of measurement include:

Nominal

Ordinal

Interval

Ratio


Nominal Scale

Data are labels or names used to identify an attribute of the element.

Example: Students of a university are classified by school as Business, Humanities, Education, and so on. Alternatively, a numeric code could be used for the school variable (e.g., 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).

The categories have no ordering.
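Because nominal codes are only labels, arithmetic on them is meaningless; counting category frequencies and reporting the mode is the appropriate summary. A minimal sketch, using the school coding from the example (1 = Business, 2 = Humanities, 3 = Education) and a hypothetical sample of ten students:

```python
from collections import Counter

# Hypothetical school codes for ten students (1 = Business, 2 = Humanities, 3 = Education).
labels = {1: "Business", 2: "Humanities", 3: "Education"}
codes = [1, 2, 1, 3, 1, 2, 3, 1, 2, 1]

counts = Counter(codes)  # frequency of each category
mode_code, mode_count = counts.most_common(1)[0]
print(labels[mode_code], mode_count)  # Business 5
```

Averaging these codes would give 1.7, a number that corresponds to no school at all.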


Ordinal Scale

The data have the properties of nominal data, and the order or rank of the data is meaningful.

Example: Students of a university are classified as Freshman, Sophomore, Junior, or Senior. Alternatively, a numeric code could be used for the class standing variable (e.g., 1 denotes Freshman, 2 denotes Sophomore, and so on).

The categories are ordered, but differences between them have no meaning.
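With ordinal data the order can be used, so a median is meaningful even though a mean is not. A minimal sketch with hypothetical class-standing codes; `median_low` is chosen because it always returns an observed category rather than an average of two middle codes (such as 2.5), which would not correspond to any class standing.

```python
import statistics

# Hypothetical class-standing codes (1 = Freshman, 2 = Sophomore, 3 = Junior, 4 = Senior).
names = {1: "Freshman", 2: "Sophomore", 3: "Junior", 4: "Senior"}
standing = [1, 1, 2, 2, 2, 3, 4, 4, 4]

# median_low always returns one of the observed codes, which suits ordinal data.
median_code = statistics.median_low(standing)
print(names[median_code])  # Sophomore
```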


Interval Scale

The data have the properties of ordinal data, and the difference between measurements is a meaningful quantity, but the scale has no true zero point.

Example: The difference between temperatures of 0 °C and 2 °C is the same as the difference between 2 °C and 4 °C, but we cannot say that 4 °C is twice as hot as 2 °C.

Differences have meaning, but ratios do not.

°C     °F
0      32.0
1      33.8
2      35.6
3      37.4
4      39.2
5      41.0
6      42.8
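A short check of the claim, using the Celsius-to-Fahrenheit conversion F = 1.8C + 32 that produces the values listed above: equal differences survive the change of scale, but the apparent 2:1 ratio between 4 °C and 2 °C does not, because the zero point is arbitrary.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit: F = 1.8 * C + 32."""
    return 1.8 * celsius + 32

# Equal differences in Celsius stay equal in Fahrenheit (3.6 degrees F each) ...
diff_low = c_to_f(2) - c_to_f(0)
diff_high = c_to_f(4) - c_to_f(2)

# ... but the 2:1 ratio of 4 C to 2 C does not survive the change of scale.
ratio_c = 4 / 2                  # 2.0
ratio_f = c_to_f(4) / c_to_f(2)  # 39.2 / 35.6, about 1.10
print(diff_low, diff_high, ratio_c, ratio_f)
```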


Measurement Error

All variables have some error. What are the sources of error? Measurement error distorts observed relationships and makes multivariate techniques less powerful. Researchers use summated scales, in which several variables are summed or averaged together to form a composite representation of a concept.
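A summated scale can be sketched in a few lines. The item responses below are hypothetical (four respondents, three 1-7 agreement items assumed to measure the same concept); averaging them yields one composite score per respondent.

```python
import statistics

# Hypothetical responses of four people to three 1-7 agreement items that are all
# meant to measure the same concept (e.g., satisfaction).
items = [
    [6, 7, 6],
    [3, 4, 2],
    [5, 5, 6],
    [2, 1, 2],
]

# Summated scale: average the items for each respondent to get one composite score.
composite = [statistics.mean(row) for row in items]
print(composite)
```

Averaging rather than summing keeps the composite on the original 1-7 metric; either choice preserves the ordering of respondents.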


Measurement Error

In addressing measurement error, researchers evaluate two important characteristics of measurement:
• Validity: the degree to which a measure accurately represents what it is supposed to.
• Reliability: the degree to which the observed variable measures the “true” value and is thus error free.
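One widely used reliability estimate for a summated scale, not named on the slide, is Cronbach's alpha: α = k/(k − 1) · (1 − Σ item variances / variance of the total score), where k is the number of items. A minimal sketch with hypothetical item responses:

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_columns = list(zip(*rows))  # one tuple of scores per item
    item_vars = [statistics.variance(col) for col in item_columns]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of four people to three 1-7 items.
items = [
    [6, 7, 6],
    [3, 4, 2],
    [5, 5, 6],
    [2, 1, 2],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

A common rule of thumb treats values of about 0.70 or above as acceptable; items that move together closely, as in this toy example, push alpha toward 1.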


Statistical Significance and Power

Type I error, or α, is the probability of rejecting the null hypothesis when it is true.

Type II error, or β, is the probability of failing to reject the null hypothesis when it is false.

Power, or 1 − β, is the probability of rejecting the null hypothesis when it is false.

                     H0 true             H0 false
Fail to reject H0    1 − α               Type II error (β)
Reject H0            Type I error (α)    1 − β (power)


Impact of Sample Size on Power
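The chart for this slide did not survive extraction, but the relationship can be illustrated for one simple case, chosen here as an assumption: a two-sided one-sample z-test with standardized effect size d, for which power = Φ(d√n − z₁₋α/₂) + Φ(−d√n − z₁₋α/₂).

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(d, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for standardized effect size d and sample size n."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = d * sqrt(n)
    # Probability of landing in either rejection region when H0 is false.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (10, 25, 50, 100):
    print(n, round(z_test_power(0.3, n), 3))
```

For a modest effect (d = 0.3), power rises from roughly 0.16 at n = 10 to roughly 0.85 at n = 100, while α stays fixed at 0.05.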


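Two Types of Multivariate Methods

Dependence: one or more variables are designated as dependent and are predicted or explained by the independent variables.

Interdependence: all variables are analyzed simultaneously, with no distinction between dependent and independent variables.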


Dependence Techniques

A variable or set of variables is identified as the dependent variable to be predicted or explained by other variables known as independent variables.

• Multiple Regression
• Multiple Discriminant Analysis
• Logit/Logistic Regression
• Multivariate Analysis of Variance (MANOVA) and Covariance
• Conjoint Analysis
• Canonical Correlation
• Structural Equation Modeling (SEM)


Interdependence Techniques

These techniques involve the simultaneous analysis of all variables in the set, without distinction between dependent and independent variables.

• Principal Components and Common Factor Analysis
• Cluster Analysis
• Multidimensional Scaling (perceptual mapping)
• Correspondence Analysis


Selecting a Multivariate Technique

1. What type of relationship is being examined: dependence or interdependence?

2. Dependence relationship:
• How many variables are being predicted?
• What is the measurement scale of the dependent variable?
• What is the measurement scale of the independent variables?

3. Interdependence relationship: Are you examining relationships between variables, respondents, or objects?


Selecting the Correct Multivariate Method

Multivariate methods
• Dependence methods
    One dependent variable
        Metric: multiple regression and conjoint analysis
        Nonmetric: discriminant analysis and logit
    Several dependent variables
        Metric: MANOVA and canonical correlation
        Nonmetric: canonical correlation with dummy variables
    Multiple relationships: structural equations (SEM, CFA)
• Interdependence methods
    Variables: factor analysis
    Respondents: cluster analysis
    Objects
        Metric: metric MDS
        Nonmetric: nonmetric MDS and correspondence analysis
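The chart's branching logic can be sketched as a small lookup function. The function name, argument names, and string values are hypothetical; the returned lists simply mirror the chart's labels and are no substitute for checking each technique's assumptions.

```python
# Dependence branch: (number of dependent variables, scale of the dependent variable).
DEPENDENCE_CHART = {
    ("one", "metric"): ["multiple regression", "conjoint analysis"],
    ("one", "nonmetric"): ["discriminant analysis", "logit"],
    ("several", "metric"): ["MANOVA", "canonical correlation"],
    ("several", "nonmetric"): ["canonical correlation with dummy variables"],
}
# Interdependence branch: what is being related to what.
INTERDEPENDENCE_CHART = {
    "variables": ["factor analysis"],
    "respondents": ["cluster analysis"],
}
OBJECT_CHART = {
    "metric": ["metric MDS"],
    "nonmetric": ["nonmetric MDS", "correspondence analysis"],
}

def suggest_methods(relationship, dependents="one", scale="metric", focus="variables"):
    """Return candidate techniques for a research question, following the chart."""
    if relationship == "dependence":
        if dependents == "multiple relationships":
            return ["structural equation modeling (SEM)", "confirmatory factor analysis (CFA)"]
        methods = DEPENDENCE_CHART.get((dependents, scale))
    elif relationship == "interdependence":
        if focus == "objects":
            methods = OBJECT_CHART.get(scale)
        else:
            methods = INTERDEPENDENCE_CHART.get(focus)
    else:
        methods = None
    if methods is None:
        raise ValueError("unrecognized combination of inputs")
    return methods

print(suggest_methods("dependence", dependents="one", scale="nonmetric"))
```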


Multiple Regression

A single metric dependent variable (Y) is predicted by several metric independent variables (e.g., X1 and X2).
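A minimal sketch of estimating a multiple regression by ordinary least squares with NumPy. The data are simulated under an assumed true model (Y = 1 + 2·X1 − 0.5·X2 + small noise), so the recovered weights can be checked against known values; SPSS would report the same kind of coefficients and R².

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 respondents, two metric predictors.
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column; solve the least-squares problem.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

# R^2: share of the variance in y explained by the fitted variate.
y_hat = X @ coef
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(coef.round(2), round(r_squared, 3))
```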


Discriminant Analysis

A single nonmetric (categorical) dependent variable is predicted by several metric independent variables.

Examples of dependent variables:
• Gender: male vs. female
• Culture: USA vs. outside USA
• Purchasers vs. non-purchasers
• Members vs. non-members
• Good, average, and poor credit risk
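A minimal sketch of the two-group case using Fisher's linear discriminant, with simulated purchasers and non-purchasers described by two metric predictors (all data hypothetical). The discriminant weights are w = S_w⁻¹(m₁ − m₀), and a respondent is classified by comparing the discriminant score with a cutting score halfway between the projected group means, which assumes equal group sizes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: two metric predictors for non-purchasers (0) and purchasers (1).
non_buyers = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(40, 2))
buyers = rng.normal(loc=[3.0, 2.0], scale=1.0, size=(40, 2))

m0, m1 = non_buyers.mean(axis=0), buyers.mean(axis=0)

# Pooled within-group covariance matrix (equal group sizes).
s_w = (np.cov(non_buyers, rowvar=False) + np.cov(buyers, rowvar=False)) / 2

# Fisher's discriminant weights: w = S_w^{-1} (m1 - m0).
w = np.linalg.solve(s_w, m1 - m0)

# Cutting score halfway between the projected group means.
cutoff = w @ (m0 + m1) / 2

def classify(x):
    """Return 1 (purchaser) if the discriminant score exceeds the cutoff, else 0."""
    return int(np.asarray(x) @ w > cutoff)

print(classify([3.0, 2.0]), classify([0.0, 0.0]))
```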


Logistic Regression

A single nonmetric dependent variable is predicted by several metric independent variables. This technique is similar to discriminant analysis but relies on calculations more like those of regression.
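To show the "more like regression" part, here is a minimal sketch that fits a logistic regression by Newton-Raphson (maximum likelihood) on simulated data. The data-generating model (intercept −0.5, slope 1.5) is an assumption for illustration; statistical packages use the same likelihood but add safeguards this sketch omits, such as checks for perfect separation.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X should include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))                  # predicted probabilities
        gradient = X.T @ (y - p)                         # score vector
        information = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information matrix
        beta += np.linalg.solve(information, gradient)   # Newton step
    return beta

# Hypothetical data: one metric predictor, binary outcome (e.g., purchase vs. no purchase).
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
true_p = 1 / (1 + np.exp(-(-0.5 + 1.5 * x)))
y = (rng.random(n) < true_p).astype(float)

X = np.column_stack([np.ones(n), x])
beta = fit_logistic(X, y)
print(beta.round(2))
```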


MANOVA

Several metric dependent variables are predicted by a set of nonmetric (categorical) independent variables.
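A common MANOVA test statistic is Wilks' lambda, Λ = det(W) / det(W + B), where W and B are the within-groups and between-groups sums-of-squares-and-cross-products matrices; smaller values indicate larger differences among the group mean vectors. A minimal one-way sketch with three simulated groups (hypothetical data):

```python
import numpy as np

def wilks_lambda(groups):
    """Wilks' lambda = det(W) / det(W + B) for a one-way MANOVA.

    groups: list of 2-D arrays, one per category of the independent variable,
            each with rows = respondents and columns = metric dependent variables.
    """
    all_data = np.vstack(groups)
    grand_mean = all_data.mean(axis=0)
    p = all_data.shape[1]
    W = np.zeros((p, p))  # within-groups SSCP matrix
    B = np.zeros((p, p))  # between-groups SSCP matrix
    for g in groups:
        centered = g - g.mean(axis=0)
        W += centered.T @ centered
        diff = (g.mean(axis=0) - grand_mean)[:, None]
        B += len(g) * (diff @ diff.T)
    return np.linalg.det(W) / np.linalg.det(W + B)

# Hypothetical data: three groups measured on two metric dependent variables.
rng = np.random.default_rng(4)
groups = [
    rng.normal(loc=[0, 0], size=(30, 2)),
    rng.normal(loc=[2, 2], size=(30, 2)),
    rng.normal(loc=[4, 0], size=(30, 2)),
]
lam = wilks_lambda(groups)
print(round(lam, 3))
```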


Canonical Analysis

Several metric dependent variables are predicted by several metric independent variables.
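A minimal sketch of computing canonical correlations with NumPy. It uses a standard linear-algebra route chosen here for brevity: the canonical correlations are the singular values of Qxᵀ Qy, where Qx and Qy are orthonormal bases (from QR decompositions) of the centered X and Y columns. The data are simulated, with one strong link between the sets (Y1 depends on X1 + X2) and one unrelated variable.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between variable sets X (n x p) and Y (n x q).

    Computed as the singular values of Qx.T @ Qy, where Qx and Qy are orthonormal
    bases for the centered columns of X and Y (assumes full column rank).
    """
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)  # sorted largest first

# Hypothetical data: two independent and two dependent metric variables.
rng = np.random.default_rng(5)
n = 100
X = rng.normal(size=(n, 2))
Y = np.column_stack([
    X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=n),  # strongly tied to X
    rng.normal(size=n),                                 # unrelated to X
])
rho = canonical_correlations(X, Y)
print(rho.round(3))
```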


Conjoint Analysis

Conjoint analysis is used to understand respondents’ preferences for products and services. In doing this, it determines the importance of both attributes and levels of attributes, based on a smaller subset of combinations of attributes and levels.
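A minimal sketch of one common estimation approach: dummy-variable regression of a single respondent's ratings on the attribute levels. The attributes (brand, price), levels, and ratings are hypothetical, and the design is a full factorial only because 2 × 3 profiles is small; real studies usually present a fractional subset. Part-worths are expressed relative to a baseline level, and an attribute's importance is its part-worth range divided by the sum of ranges.

```python
import itertools
import numpy as np

# Hypothetical attributes and levels for a soft-drink study.
brands = ["A", "B"]
prices = ["low", "medium", "high"]
profiles = list(itertools.product(brands, prices))  # 6 product profiles

# Hypothetical ratings (1-10 scale) given by one respondent; order matches `profiles`.
ratings = np.array([9.0, 7.0, 4.0, 7.0, 5.0, 2.0])

# Dummy coding: intercept, brand B vs. A, medium vs. low, high vs. low.
X = np.array([[1, b == "B", p == "medium", p == "high"] for b, p in profiles], dtype=float)
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Part-worths relative to the baseline level of each attribute.
part_worths = {
    "brand": {"A": 0.0, "B": coef[1]},
    "price": {"low": 0.0, "medium": coef[2], "high": coef[3]},
}

# Relative importance: range of an attribute's part-worths over the sum of ranges.
ranges = {a: max(v.values()) - min(v.values()) for a, v in part_worths.items()}
importance = {a: r / sum(ranges.values()) for a, r in ranges.items()}
print(coef.round(2), {a: round(v, 2) for a, v in importance.items()})
```

For these ratings, price matters more to the respondent than brand (about 71% vs. 29% of total importance).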


Conjoint Analysis

Typical applications:
• Soft drinks
• Candy bars
• Cereals
• Beer
• Apartment buildings; condos
• Solvents; cleaning fluids


Structural Equation Modeling (SEM)

SEM estimates multiple, interrelated dependence relationships based on two components:

1. Measurement Model

2. Structural Model
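In the widely used LISREL notation (an assumption; the slide does not fix a notation), the two components can be written as follows. Here ξ and η are the exogenous and endogenous latent constructs, x and y their observed indicators, Λₓ and Λᵧ the factor loadings, B and Γ the structural coefficients among and into the endogenous constructs, and δ, ε, ζ error terms.

```latex
\begin{aligned}
&\text{Measurement model:} && x = \Lambda_x \xi + \delta, \qquad y = \Lambda_y \eta + \varepsilon \\
&\text{Structural model:}  && \eta = B\eta + \Gamma\xi + \zeta
\end{aligned}
```

The measurement model links each construct to its indicators (as in confirmatory factor analysis); the structural model links the constructs to one another (as in a system of regressions).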


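Multivariate Dependence Methods

Canonical correlation: $Y_1 + Y_2 + Y_3 + \dots + Y_n = X_1 + X_2 + X_3 + \dots + X_n$ (dependent: metric or nonmetric; independent: metric or nonmetric)

Multivariate analysis of variance: $Y_1 + Y_2 + Y_3 + \dots + Y_n = X_1 + X_2 + X_3 + \dots + X_n$ (dependent: metric; independent: nonmetric)

Analysis of variance: $Y_1 = X_1 + X_2 + X_3 + \dots + X_n$ (dependent: metric; independent: nonmetric)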


Multiple discriminant analysis: $Y_1 = X_1 + X_2 + X_3 + \dots + X_n$ (dependent: nonmetric; independent: metric)

Multiple regression analysis: $Y_1 = X_1 + X_2 + X_3 + \dots + X_n$ (dependent: metric; independent: metric or nonmetric)

Conjoint analysis: $Y_1 = X_1 + X_2 + X_3 + \dots + X_n$ (dependent: metric or nonmetric; independent: nonmetric)


Structural equation modeling (dependent: metric; independent: metric or nonmetric):

$Y_1 = X_{11} + X_{12} + X_{13} + \dots + X_{1n}$

$Y_2 = X_{21} + X_{22} + X_{23} + \dots + X_{2n}$

$\vdots$

$Y_m = X_{m1} + X_{m2} + X_{m3} + \dots + X_{mn}$


Exploratory Factor Analysis

Exploratory factor analysis analyzes the structure of the interrelationships among a large number of variables to determine a set of common underlying dimensions (factors).
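A minimal sketch of factor extraction using principal components on the correlation matrix (one of the two approaches the slides list; common factor analysis would instead iterate on communalities), with the eigenvalue-greater-than-1 (latent root) criterion for choosing the number of factors. The six survey items are simulated from two hypothetical underlying factors; in practice a rotation such as varimax would usually follow.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Hypothetical survey: six items driven by two underlying factors plus noise.
f1, f2 = rng.normal(size=n), rng.normal(size=n)
items = np.column_stack(
    [f1 + rng.normal(scale=0.3, size=n) for _ in range(3)]
    + [f2 + rng.normal(scale=0.3, size=n) for _ in range(3)]
)

# Principal components extraction from the correlation matrix.
R = np.corrcoef(items, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)  # returned in ascending order
order = np.argsort(eigenvalues)[::-1]          # sort largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Latent root criterion: retain factors with eigenvalue > 1.
n_factors = int(np.sum(eigenvalues > 1))

# Unrotated loadings: eigenvectors scaled by the square root of their eigenvalues.
loadings = eigenvectors[:, :n_factors] * np.sqrt(eigenvalues[:n_factors])
print(n_factors, eigenvalues.round(2))
```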


Cluster Analysis

Cluster analysis groups objects (respondents, products, firms, variables, etc.) so that each object is similar to the other objects in its cluster and different from the objects in all other clusters.
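One widely used nonhierarchical clustering algorithm is k-means, sketched below with NumPy. The deterministic farthest-point initialization is a simplification chosen here so the example is reproducible; SPSS and other packages offer different starting-point options. The respondents are simulated as two well-separated groups.

```python
import numpy as np

def k_means(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Move each center to the mean of its assigned points.
        new_centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical respondents measured on two variables, forming two groups.
rng = np.random.default_rng(6)
points = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(20, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(20, 2)),
])
labels, centers = k_means(points, k=2)
print(labels)
```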


Multidimensional Scaling

Multidimensional scaling identifies “unrecognized” dimensions that affect purchase behavior based on customer judgments of similarities or preferences, and transforms these into distances represented as perceptual maps.
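A minimal sketch of classical (Torgerson) metric MDS, which turns a matrix of distances into map coordinates by double-centering the squared distances and taking the leading eigenvectors. The four "brands" are hypothetical points whose distances are known, so the recovered map can be checked; nonmetric MDS would instead fit only the rank order of the dissimilarities, typically by iterative optimization.

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical (Torgerson) metric MDS: coordinates whose distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centered squared distances
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    idx = np.argsort(eigenvalues)[::-1][:dims]  # largest eigenvalues first
    return eigenvectors[:, idx] * np.sqrt(np.maximum(eigenvalues[idx], 0))

# Hypothetical dissimilarities among four brands (here, exact Euclidean distances).
P = np.array([[0, 0], [3, 0], [0, 4], [3, 4]], dtype=float)
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)

coords = classical_mds(D)
print(coords.round(2))
```

The map is unique only up to rotation and reflection, so the recovered coordinates need not match P, but their pairwise distances do.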


Correspondence Analysis

Correspondence analysis uses nonmetric data and evaluates either linear or nonlinear relationships in an effort to develop a perceptual map representing the association between objects (firms, products, etc.) and a set of descriptive characteristics of the objects.
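A minimal sketch of simple correspondence analysis on a two-way contingency table, via the singular value decomposition of standardized residuals. The counts (three brands by three descriptive attributes) are hypothetical. The row and column principal coordinates are the points plotted on the perceptual map.

```python
import numpy as np

def correspondence_analysis(table):
    """Simple correspondence analysis of a two-way contingency table.

    Returns principal row coordinates, principal column coordinates, and the
    singular values (square roots of the principal inertias).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                                     # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)                 # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]
    return rows, cols, sv

# Hypothetical counts: rows = brands, columns = attributes respondents associate with them.
table = np.array([
    [20, 5, 10],
    [5, 25, 10],
    [10, 10, 30],
])
rows, cols, sv = correspondence_analysis(table)
print(rows[:, :2].round(3))
print(cols[:, :2].round(3))
```

The total inertia (the sum of squared singular values) equals the table's chi-square statistic divided by the total count, which links the map directly to the usual cross-tabulation test.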


Guidelines for Multivariate Analysis

• Establish Practical Significance as well as Statistical Significance.
• Sample Size Affects All Results.
• Know Your Data.
• Strive for Model Parsimony.
• Look at Your Errors.
• Validate Your Results.
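The first two guidelines interact: with a large enough sample, a trivially small effect becomes statistically significant. A minimal sketch for a Pearson correlation (the values of r and n are illustrative), using the t statistic t = r√(n − 2)/√(1 − r²) and, as a simplifying assumption, a normal approximation for the p-value, which is adequate at these sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def correlation_test(r, n):
    """t statistic and approximate two-sided p-value for a Pearson correlation r from n cases.

    Uses a normal approximation to the t distribution, which is adequate for large n.
    """
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

t, p = correlation_test(0.05, 10_000)
shared_variance = 0.05 ** 2  # r squared: only 0.25% of variance in common
print(round(t, 2), p, shared_variance)
```

With 10,000 cases, r = 0.05 is highly significant, yet the two variables share only 0.25% of their variance; with 100 cases the same r is not significant at all.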


A Structured Approach to Multivariate Model Building:

Stage 1: Define the Research Problem, Objectives, and Multivariate Technique(s) to be Used

Stage 2: Develop the Analysis Plan

Stage 3: Evaluate the Assumptions Underlying the Multivariate Technique(s)

Stage 4: Estimate the Multivariate Model and Assess Overall Model Fit

Stage 5: Interpret the Variate(s)

Stage 6: Validate the Multivariate Model


Multivariate Analysis Learning Checkpoint

1. What is multivariate analysis?

2. Why use multivariate analysis?

3. Why is knowledge of measurement scales important in using multivariate analysis?

4. What basic issues need to be examined when using multivariate analysis?

5. Describe the process for applying multivariate analysis.