8/10/2019 SMDE - (US) Multivariate Analysis
MULTIVARIATE ANALYSIS
Pau Fonseca i Casas
Multivariate analysis?
The set of statistical methods that simultaneously analyze
multivariate data sets, that is, data in which several variables
are measured for each individual or object under study.
Its purpose is to better understand the phenomenon under study
by obtaining information that univariate or bivariate methods
cannot provide.
Multivariate analysis objectives
To provide methods for studying multivariate data sets that
univariate or bivariate statistical analysis cannot handle.
To help the researcher make optimal decisions in context,
taking into account the information available in the analyzed
data set.
Multivariate techniques
Dependency methods
  Metric
    Regression analysis
    Survival analysis
    MANOVA
    Canonical correlation
  Non-metric
    Discriminant analysis
    Logistic regression
    Conjoint analysis
Interdependency methods
Structural methods
Multivariate techniques
Dependency methods
Interdependency methods
  Metric data
    Principal component analysis
    Factor analysis
    Multidimensional scaling
    Cluster analysis
  Non-metric data
    Correspondence analysis
    Log-linear models
    Multidimensional scaling
    Cluster analysis
Structural methods
Dependency methods
Regression analysis, Survival analysis, MANOVA,
Canonical correlation, Discriminant analysis,
Logistic regression, Conjoint analysis
Dependency methods
They assume that the variables analyzed are divided into two
groups: dependent and independent variables.
The goal of these methods is to determine whether the set of
independent variables affects the set of dependent variables,
and how.
Dependency methods
They can be classified into two subgroups according to whether
the dependent variable(s) is (are) quantitative or qualitative.
If the dependent variable is quantitative, some techniques that
can be applied are:
Regression Analysis
Survival Analysis
Analysis of Variance
Canonical Correlation
Regression analysis
An appropriate technique when the analysis involves one or
more metric dependent variables whose values depend on one or
more metric independent variables.
For example, trying to predict a person's annual Christmas
expenditure from their income level, education level, gender
and age.
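A minimal sketch of multiple regression by ordinary least squares, using only the Python standard library. The data are invented for illustration (hypothetical income and age values, with expenditure generated from a known equation so the result can be checked), and `solve` and `fit_ols` are illustrative helper names, not part of the course material.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    x = [[1.0] + list(r) for r in rows]  # add the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations X'X b = X'y


# Hypothetical predictors: (income in thousands of euros, age in years)
predictors = [(20, 25), (35, 40), (50, 31), (28, 52), (61, 45), (44, 29)]
# Expenditure generated exactly as 100 + 10*income + 2*age, so the fit is exact
spending = [100 + 10 * inc + 2 * age for inc, age in predictors]

coefficients = fit_ols(predictors, spending)
print([round(c, 6) for c in coefficients])
```

With real data the fit would not be exact, and categorical predictors such as gender would first be coded as indicator variables.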
Survival Analysis
Similar to regression analysis, with the difference that the
dependent variable is the survival time of an individual or
object.
For example, trying to predict how long an individual remains
unemployed from their level of education and age.
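What sets survival data apart is censoring: for some individuals the event has not yet happened when observation ends. As a hedged illustration, the sketch below computes the Kaplan-Meier estimate of the survival curve, a standard nonparametric estimator that ignores covariates such as education and age. The durations are invented and `kaplan_meier` is an illustrative name.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with at least one event."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)  # events plus censored at t
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical months of unemployment; False = still unemployed when the study ended
months = [2, 3, 3, 5, 8, 8, 12]
found_job = [True, True, False, True, True, False, False]

curve = kaplan_meier(months, found_job)
for t, s in curve:
    print(t, round(s, 3))
```

Relating survival time to covariates, as in the unemployment example, would require a regression-type survival model, which this sketch does not attempt.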
Analysis of variance
It is used when the total sample is divided into several groups
based on one or more non-metric independent variables and the
dependent variables analyzed are metric. It aims to find out
whether there are significant differences between the groups in
terms of the dependent variables.
For example, are there differences in cholesterol level by
gender? Does the type of occupation also have an effect?
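A minimal sketch of the one-way case, with a single grouping variable and a single metric response: the F statistic compares the variability between group means with the variability within groups. The cholesterol values are invented and `one_way_anova_f` is an illustrative name.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic for a one-way analysis of variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    # Mean square between groups divided by mean square within groups
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical cholesterol levels (mg/dL) for two groups
group_a = [190, 200, 210]
group_b = [220, 230, 240]

f_stat = one_way_anova_f([group_a, group_b])
print(f_stat)
```

The statistic would then be compared with an F distribution with k - 1 and n - k degrees of freedom to judge significance.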
Canonical Correlation
Its aim is to relate several independent and several dependent
metric variables simultaneously, by finding linear combinations
of each set of variables that maximize the correlation between
the two sets.
For example, analyzing how the time a person devotes to work
and to leisure is related to their income level, age and
education level.
Dependency methods
If the dependent variable is qualitative, some techniques that
can be applied are:
Discriminant Analysis
Logistic regression models
Conjoint Analysis
Discriminant Analysis
This technique provides optimal rules for classifying new
observations whose group of origin is unknown, based on the
values of their independent variables.
For example, determining the financial ratios that best
discriminate between profitable and unprofitable companies.
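As a hedged illustration, the sketch below applies Fisher's linear discriminant to two groups described by two variables. The financial ratios are invented, the midpoint threshold assumes equal priors and misclassification costs, and `fisher_discriminant` and `classify` are illustrative names.

```python
from statistics import mean


def fisher_discriminant(class_a, class_b):
    """Return (w, threshold) of Fisher's linear discriminant for two classes in 2-D."""
    mean_a = [mean(p[i] for p in class_a) for i in range(2)]
    mean_b = [mean(p[i] for p in class_b) for i in range(2)]
    # Pooled within-class scatter matrix S (2x2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((class_a, mean_a), (class_b, mean_b)):
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    # w = S^{-1} (mean_a - mean_b), using the closed-form 2x2 inverse
    w = [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]
    midpoint = [(mean_a[i] + mean_b[i]) / 2 for i in range(2)]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold


def classify(point, w, threshold):
    """Assign a new observation to the group on its side of the threshold."""
    score = w[0] * point[0] + w[1] * point[1]
    return "profitable" if score > threshold else "unprofitable"


# Hypothetical (return on assets, current ratio) pairs
profitable = [(0.30, 1.8), (0.25, 2.0), (0.35, 1.6), (0.28, 2.2)]
unprofitable = [(0.05, 0.9), (0.10, 1.1), (0.02, 0.7), (0.08, 1.2)]

w, threshold = fisher_discriminant(profitable, unprofitable)
print(classify((0.32, 1.9), w, threshold))
print(classify((0.04, 0.8), w, threshold))
```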
Logistic regression models
These are regression models in which the dependent variable is
non-metric. They are used as an alternative to discriminant
analysis when normality cannot be assumed.
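A minimal sketch of a logistic regression with a binary dependent variable and one predictor, fitted by plain gradient ascent on the log-likelihood. The data, the learning rate and the number of steps are assumptions chosen for illustration; statistical software normally uses a faster iterative scheme.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(errors) / n
        b1 += lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


# Hypothetical data: x is a standardized financial ratio, y = 1 if the company is profitable
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(round(sigmoid(b0 + b1 * 2.0), 3), round(sigmoid(b0 + b1 * -2.0), 3))
```

The fitted model returns a probability of group membership, which can be turned into a classification rule by choosing a cut-off such as 0.5.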
Conjoint Analysis
It is a technique that analyzes the effect of non-metric independent variables on metric or non-metric variables. It differs from the analysis of variance in two respects: the dependent variables can be non-metric, and the values of the non-metric independent variables are set by the analyst. In other disciplines it is known as Design of Experiments.
For example, a company wants to design a new product and needs to specify the shape of the container, its price, the content per package and its chemical composition. It presents various combinations of these four factors to 100 customers, who rank the combinations presented. The company wants to determine the optimal values of these four factors.
Interdependence methods
Principal Component Analysis, Factor Analysis,
Multidimensional Scaling, Cluster Analysis,
Correspondence Analysis, Log-Linear Models
Interdependence methods
These methods do not distinguish between dependent and
independent variables; the objective is to identify which
variables are related, how they are related, and why.
Interdependence methods
They can be classified into two groups according to whether the
data to analyze are metric or non-metric.
If the data are metric, the following techniques, among others,
can be used:
Factor Analysis and Principal Component Analysis
Multidimensional Scaling
Cluster Analysis
Factor Analysis and Principal Component Analysis
These techniques analyze the interrelationships among a large number of metric variables, explaining them in terms of a smaller number of variables called factors (if unobservable) or principal components (if observable).
For example, a financial analyst who wants to assess the financial health of a company from a number of financial ratios, by building a few numerical indices that summarize its situation, would solve the problem with Principal Component Analysis.
A psychologist who wants to determine the factors that characterize an individual's intelligence from the answers to an IQ test can use Factor Analysis.
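As a hedged illustration of the principal component idea, the sketch below finds the first principal component of a small data set by power iteration on the sample covariance matrix and reports the share of total variance it explains. The ratios are invented, the variables are left unstandardized for brevity, and the function names are illustrative.

```python
import math
from statistics import mean


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    k = len(rows[0])
    means = [mean(r[j] for r in rows) for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    n = len(rows)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(k)]
            for i in range(k)]


def first_component(cov, iterations=500):
    """Power iteration: return (largest eigenvalue, unit eigenvector) of cov."""
    k = len(cov)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(k)) for i in range(k))
    return eigenvalue, v


# Hypothetical values of three financial ratios for five companies
rows = [(1.0, 2.1, 0.9), (2.0, 3.9, 2.1), (3.0, 6.2, 2.9), (4.0, 8.1, 4.2), (5.0, 9.8, 5.1)]

cov = covariance_matrix(rows)
eigenvalue, direction = first_component(cov)
share = eigenvalue / sum(cov[i][i] for i in range(len(cov)))  # variance explained
print(round(share, 4), [round(x, 3) for x in direction])
```

Because the three ratios move together, a single component captures almost all of the variance, which is the sense in which a few indices can summarize many ratios.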
Multidimensional Scaling
It transforms judgments of preference or similarity into
distances represented in a multidimensional space. The result
is a map whose positions represent the objects compared:
similar objects lie close together and far from dissimilar ones.
For example, analyzing, in the soft drinks market, the
perceptions that a group of consumers has of a list of drinks
and brands, in order to study which subjective factors a
consumer uses when classifying these products.
Interdependence methods
If the data are non-metric, the following techniques can be
used in addition to multidimensional scaling and cluster
analysis:
Correspondence Analysis
Log-linear models
Correspondence Analysis
It applies to multidimensional contingency tables and pursues
an objective similar to that of multidimensional scaling, but
represents the rows and columns of the contingency table
simultaneously.
For example, analyzing unemployment in Aragon by province, sex,
age and educational level of the unemployed.
Log-linear models
They apply to multidimensional contingency tables and model
the multidimensional dependency relationships among the
observed variables in order to explain the observed
frequencies.
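A minimal sketch of the simplest log-linear model, the independence model for a two-way table, in which the expected frequency of each cell is the product of its row and column totals divided by the grand total. The likelihood-ratio statistic G² measures how far the observed frequencies are from the model. The table is invented and `independence_fit` is an illustrative name.

```python
import math


def independence_fit(table):
    """Return (expected frequencies, G^2) under the independence log-linear model."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    expected = [[ri * cj / total for cj in cols] for ri in rows]
    # Likelihood-ratio statistic: 2 * sum of observed * log(observed / expected)
    g2 = 2 * sum(o * math.log(o / e)
                 for row_o, row_e in zip(table, expected)
                 for o, e in zip(row_o, row_e) if o > 0)
    return expected, g2


# Hypothetical 2x2 table: rows = sex, columns = employed / unemployed
table = [[30, 20], [20, 30]]

expected, g2 = independence_fit(table)
print(expected, round(g2, 4))
```

A large G² relative to a chi-square distribution with (rows - 1)(columns - 1) degrees of freedom suggests that an association term is needed in the model.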
Structural Methods
Structural methods
They assume that the variables are divided into two groups:
dependent and independent. The objective of these methods is to
analyze not only how the independent variables affect the
dependent variables, but also how the variables of the two
groups are related to one another.
Multivariate analysis steps
1. Goals of the analysis
2. Design of the analysis
3. Hypotheses of the Analysis
4. Analytical procedure
5. Interpretation of the results
6. Analysis Validation
1. Goals of the analysis
The problem is specified by defining the objectives and the
multivariate techniques that will be used.
The investigator must establish the problem conceptually,
defining the concepts and relationships that are fundamental to
the investigation, and must determine whether these are
relationships of dependence or interdependence. With all of
this, the variables to observe are determined.
2. Design of the analysis
Determine the sample size, the equations to estimate (if
applicable), the distances to calculate (if applicable) and the
estimation techniques to employ. Once this is determined, we
can proceed to collect the data.
3. Hypotheses of the Analysis
We evaluate the assumptions underlying the multivariate
technique.
These assumptions may concern normality, linearity,
independence, homoscedasticity, etc. We must also decide what
to do with missing data.
4. Analytical procedure
We estimate the model and evaluate its fit to the data.
Unusual observations (outliers) or influential observations may
appear in this step; their influence on the estimates and on
the goodness of fit must be analyzed.
6. Analysis Validation
The validity of the results is established by analyzing whether
the results obtained with the sample generalize to the
population from which it comes.
The sample can be divided into several parts in which the model
is re-estimated and the results compared. Resampling techniques
(jackknife and bootstrap) can also be used here.
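As a hedged illustration of the bootstrap, the sketch below resamples the data with replacement many times, recomputes a statistic on each resample and reads a 95% percentile interval from the resulting distribution. The sample, the number of replicates and the seed are assumptions, and `bootstrap_interval` is an illustrative name.

```python
import random
from statistics import mean


def bootstrap_interval(sample, statistic, replicates=2000, seed=0):
    """Return a 95% percentile bootstrap interval for statistic(sample)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = sorted(
        statistic([rng.choice(sample) for _ in sample])  # resample with replacement
        for _ in range(replicates)
    )
    lower = estimates[int(0.025 * replicates)]
    upper = estimates[int(0.975 * replicates) - 1]
    return lower, upper


# Hypothetical sample of ten observations
sample = [12, 15, 9, 20, 14, 18, 11, 16, 13, 17]

lower, upper = bootstrap_interval(sample, mean)
print(lower, mean(sample), upper)
```

The same function can be applied to any statistic, for example a regression coefficient, by passing a different `statistic` callable.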
Example
What technique are you going to apply?
1. Goals of the analysis
Predict the amount of money a person spends on cinema as a
function of income level, education level, gender and age,
which would allow us to better understand the behavior patterns
of the population.
Variables to consider:
Income level
Level of education
Sex
Age
1. Goals of the analysis
We propose a multiple regression analysis in which the
dependent variable is spending on cinema and the other
variables are independent.
2. Design of the analysis
We decide how to select the sample, its size and how to measure
the variables involved in the analysis.
Spending on cinema could be measured as the annual expenditure
in euros.
Income level could be measured with an ordinal variable, given
people's reluctance to give accurate information about it;
education level would also be an ordinal variable, sex a binary
variable and age a quantitative variable measured in years.
2. Design of the analysis
The sample size would be chosen according to the power required
of the multiple regression.
In addition, the ratio of the number of observations to the
number of parameters to be estimated should be large enough to
estimate the model parameters with the least possible error.
3. Hypotheses of the Analysis
We have to check the linearity of the relationship, normality
and homoscedasticity. There are no missing data, and the
possible existence of outliers in each of the variables should
be studied.
4. Analytical procedure
We can use the least squares estimator, whose sampling distribution is known under the normality assumption. Under that assumption the estimator coincides with the maximum likelihood estimator and is efficient.
Stepwise regression can also be used to determine which independent variables to include in the regression.
Once the regression equation is estimated, the goodness of fit is examined by calculating R² and analyzing the residuals.
Homoscedasticity, independence, the possible omission of variables, the existence of outliers and the influence of individual observations are studied.
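A minimal sketch of the goodness-of-fit check, reduced to a single predictor for brevity: the regression line is estimated by least squares, then R² and the residuals are computed. The ages and spending figures are invented, and `simple_ols` and `r_squared` are illustrative names.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def r_squared(y, fitted):
    """Share of the variance of y explained by the fitted values."""
    my = mean(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: age in years and annual cinema spending in euros
age = [20, 30, 40, 50, 60]
spending = [120, 100, 90, 60, 50]

intercept, slope = simple_ols(age, spending)
fitted = [intercept + slope * a for a in age]
residuals = [s - f for s, f in zip(spending, fitted)]
r2 = r_squared(spending, fitted)
print(intercept, slope, round(r2, 4), residuals)
```

Plotting the residuals against the fitted values is the usual next step for spotting heteroscedasticity, omitted variables or outliers.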
5. Interpretation of the results
The values and signs of the estimated coefficients are
interpreted.
6. Analysis Validation
The sample is divided into two subsamples of size 50 and the
regression equation is re-estimated in each subsample,
comparing the results.