SMDE - (US) Multivariate Analysis


  • 8/10/2019 SMDE - (US) Multivariate Analysis


    MULTIVARIATE ANALYSIS

    Pau Fonseca i Casas


    Multivariate analysis?

    It is the set of statistical methods that aim to analyze multivariate data sets simultaneously, in the sense that several variables are measured for each individual or object under study.

    Its purpose is to understand the phenomenon under study better, obtaining information that univariate or bivariate methods cannot provide.


    Multivariate analysis objectives

    To provide methods for studying multivariate data sets that univariate or bivariate statistical analysis cannot handle.

    To help the researcher make optimal decisions in his or her context, taking into account the information available in the analyzed data set.


    Multivariate techniques

    Dependency methods
        Metric: Regression analysis, Survival analysis, MANOVA, Canonical correlation
        Non-metric: Discriminant analysis, Logistic regression, Conjoint analysis
    Interdependency methods
        Metric data: Principal component analysis, Factor analysis, Multidimensional scaling, Cluster analysis
        Non-metric data: Correspondence analysis, Log-linear models, Multidimensional scaling, Cluster analysis
    Structural methods


    Dependency methods

    Regression analysis, Survival analysis, MANOVA, Canonical correlation, Discriminant analysis, Logistic regression, Conjoint analysis


    They assume that the variables analyzed are divided into two groups: the dependent variables and the independent variables.

    The goal of these methods is to determine whether the set of independent variables affects the set of dependent variables, and how.


    They can be classified into two subgroups according to whether the dependent variable or variables are quantitative or qualitative.

    If the dependent variable is quantitative, some techniques that can be applied are:

    Regression Analysis
    Survival Analysis
    Analysis of variance
    Canonical Correlation


    Regression analysis

    This technique is appropriate when the analysis involves one or more metric dependent variables whose values depend on one or more metric independent variables.

    For example, trying to predict a person's annual Christmas expenditure from their income level, education level, gender and age.
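A minimal sketch of the regression idea described above, using only the Python standard library. The data, the names `predictors` and `spending`, and both helper functions are illustrative assumptions and do not come from the slides; education level and gender are left out to keep the sketch short. The coefficients are obtained by solving the normal equations of least squares.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: (income in thousands of euros, age in years) per person.
predictors = [(20, 25), (30, 40), (40, 35), (50, 50), (60, 30), (70, 45)]
# Made-up spending, generated without noise as 50 + 10 * income + 2 * age.
spending = [50 + 10 * income + 2 * age for income, age in predictors]

coefficients = fit_ols(predictors, spending)
print([round(c, 6) for c in coefficients])
```

Because the made-up spending is an exact linear function of the predictors, the fit should recover the generating coefficients up to rounding error. With real data the residuals would not be zero.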


    Survival Analysis

    Similar to regression analysis, but with the difference that the dependent variable is the survival time of an individual or object.

    For example, trying to predict the time an individual spends in unemployment from their level of education and age.
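The slides do not name a specific survival method. As one hedged illustration, the sketch below computes the Kaplan-Meier estimate of the survival curve, a standard descriptive tool for duration data in which some durations are censored (here, people still unemployed when observation ends). It does not use covariates such as education or age. The data and the function name `kaplan_meier` are assumptions made for this example.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each distinct event time."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Events are durations that ended at time t (not censored).
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored cases leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical months spent in unemployment. found_job=False means the
# person was still unemployed when the study ended (a censored duration).
months = [2, 3, 3, 5, 8, 8, 12, 12]
found_job = [True, True, False, True, True, False, False, False]

curve = kaplan_meier(months, found_job)
for time, probability in curve:
    print(time, round(probability, 3))
```

Each pair gives the estimated probability of still being unemployed just after that month. Censored cases contribute to the risk set until they drop out, which is the feature that distinguishes survival analysis from an ordinary regression on the observed durations.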


    Analysis of variance

    It is used in situations where the total sample is divided into several groups based on one or more non-metric independent variables and the dependent variables analyzed are metric. It aims to find out whether there are significant differences between the groups in terms of the dependent variables.

    For example, are there differences in cholesterol level by gender? Does the type of occupation also affect it?
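A minimal sketch of the one-way case with a single metric dependent variable: the F statistic compares the variability between group means with the variability within groups. The cholesterol values and the function name `one_way_anova_f` are assumptions for illustration only.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic of a one-way analysis of variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical cholesterol levels (mg/dL) for two groups defined by gender.
women = [180, 190, 200]
men = [200, 210, 220]

f_stat = one_way_anova_f([women, men])
print(f_stat)
```

A large F value suggests that the group means differ by more than the within-group variability would explain. Judging significance requires comparing it with the F distribution with (groups - 1, observations - groups) degrees of freedom, which this sketch leaves out.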


    Canonical Correlation

    Its aim is to relate several metric independent and dependent variables simultaneously, defining linear combinations of each set of variables that maximize the correlation between the two sets.

    For example, analyzing how the time a person devotes to work and to leisure is related to their income level, age and education level.


    Dependency methods

    If the dependent variable is qualitative, some techniques that can be applied are:

    Discriminant Analysis

    Logistic regression models

    Conjoint Analysis


    Discriminant Analysis

    This technique provides optimal rules for classifying new observations whose source group is unknown, based on the values that the independent variables take for them.

    For example, determining the financial ratios that best discriminate between profitable and unprofitable companies.
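As a hedged illustration of a classification rule of this kind, the sketch below implements Fisher's linear discriminant for two groups and two variables. The financial ratios, the group labels and the function names are assumptions invented for the example. The rule projects each company onto the direction w = Sw^-1 (mean_a - mean_b), where Sw is the pooled within-group scatter, and compares the score with the midpoint between the two group means.

```python
from statistics import mean


def fisher_discriminant(class_a, class_b):
    """Return (weights, threshold) for two classes of 2-feature points."""
    def centroid(points):
        return [mean(p[0] for p in points), mean(p[1] for p in points)]

    def scatter(points, c):
        # Entries of the 2x2 scatter matrix around the class centroid.
        sxx = sum((p[0] - c[0]) ** 2 for p in points)
        sxy = sum((p[0] - c[0]) * (p[1] - c[1]) for p in points)
        syy = sum((p[1] - c[1]) ** 2 for p in points)
        return sxx, sxy, syy

    ca, cb = centroid(class_a), centroid(class_b)
    sa, sb = scatter(class_a, ca), scatter(class_b, cb)
    sxx, sxy, syy = (sa[i] + sb[i] for i in range(3))  # pooled scatter Sw
    det = sxx * syy - sxy ** 2
    dx, dy = ca[0] - cb[0], ca[1] - cb[1]
    # w = Sw^-1 (ca - cb), using the closed-form inverse of a 2x2 matrix.
    w = [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]
    midpoint = [(ca[0] + cb[0]) / 2, (ca[1] + cb[1]) / 2]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold


def classify(point, w, threshold):
    score = w[0] * point[0] + w[1] * point[1]
    return "profitable" if score > threshold else "unprofitable"


# Hypothetical companies: (return on assets in %, current ratio).
profitable = [(12, 1.8), (15, 1.5), (10, 2.2), (14, 2.0)]
unprofitable = [(2, 0.9), (4, 1.3), (1, 1.0), (5, 0.8)]

w, threshold = fisher_discriminant(profitable, unprofitable)
print(classify((11, 1.7), w, threshold))
print(classify((3, 1.1), w, threshold))
```

The midpoint threshold assumes equal prior probabilities and equal misclassification costs for the two groups. Other choices would move the cut-off but not the discriminant direction.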


    Logistic regression models

    These are regression models in which the dependent variable is not metric. They are used as an alternative to discriminant analysis when normality cannot be assumed.
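A minimal sketch of a logistic regression with a binary dependent variable and one predictor, fitted by plain gradient ascent on the log-likelihood. The data, the learning rate, the iteration count and the function names are assumptions chosen for illustration. Statistical packages normally use faster fitting methods such as iteratively reweighted least squares.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the mean log-likelihood: observed minus predicted.
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


# Hypothetical data: income level (ordinal score) and whether the
# person bought a product (1) or not (0). The groups overlap on purpose,
# so the maximum likelihood estimate is finite.
income_level = [1, 2, 3, 4, 5, 6, 7, 8]
bought = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(income_level, bought)
print(round(sigmoid(b0 + b1 * 1), 3), round(sigmoid(b0 + b1 * 8), 3))
```

The two printed values are the estimated purchase probabilities at the lowest and highest income levels. No normality assumption on the predictor is needed, which is the reason the slides present the model as an alternative to discriminant analysis.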


    Conjoint Analysis

    It is a technique that analyzes the effect of non-metric independent variables on metric or non-metric dependent variables. It differs from the analysis of variance in two respects: the dependent variables can be non-metric, and the values of the non-metric independent variables are set by the analyst. In other disciplines it is known as Design of Experiments.

    For example, a company wants to design a new product and needs to specify the shape of the container, its price, the content per package and its chemical composition. It presents various combinations of these four factors, and 100 customers rank the combinations presented to them. The company wants to determine the optimal values of these four factors.


    Interdependence methods

    Principal Component Analysis, Factor Analysis, Multidimensional Scaling, Cluster Analysis, Correspondence Analysis, Log-Linear Models


    These methods do not distinguish between dependent and independent variables. The objective is to identify which variables are related, how they are related, and why.


    They can be classified into two groups according to whether the data to analyze are metric or non-metric.

    If the data are metric, the following techniques, among others, can be used:

    Factor Analysis and Principal Component Analysis
    Multidimensional Scaling
    Cluster Analysis


    Factor Analysis and Principal Component Analysis

    They are used to analyze the interrelationships among a large number of metric variables, explaining those interrelationships in terms of a smaller number of variables called factors (if unobservable) or principal components (if observable).

    For example, if a financial analyst wants to assess the financial health of a company from a number of financial ratios by building several numerical indices that describe its situation, the problem can be solved with Principal Component Analysis.

    If a psychologist wants to determine the factors that characterize an individual's intelligence from their answers to an IQ test, Factor Analysis can be used to solve the problem.
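To make the principal component idea concrete, here is a small sketch that extracts only the first principal component of a covariance matrix by power iteration. The financial ratios are invented, and the function names are assumptions. A real analysis would normally use a full eigendecomposition and would often standardize the variables first.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(k)]
            for i in range(k)]


def first_principal_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov by power iteration.

    Assumes the starting vector is not orthogonal to the leading
    eigenvector, which holds for the data used below.
    """
    k = len(cov)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' cov v gives the variance along the component.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(k))
                     for i in range(k))
    return v, eigenvalue


# Hypothetical financial ratios for five companies (three ratios each).
ratios = [(2.0, 1.9, 0.3), (3.0, 3.2, 0.5), (4.0, 3.9, 0.2),
          (5.0, 5.1, 0.4), (6.0, 6.0, 0.6)]

cov = covariance_matrix(ratios)
component, eigenvalue = first_principal_component(cov)
share = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print([round(c, 3) for c in component], round(share, 3))
```

The weights in `component` define a single index that summarizes the three ratios, and `share` is the proportion of total variance that this index captures.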


    Multidimensional Scaling

    It is intended to transform judgments of preference or similarity into distances represented in a multidimensional space. A map is thus constructed in which the positions represent the objects compared: similar objects lie close together and far from dissimilar ones.

    For example, analyzing, in the soft drinks market, the perceptions that a group of consumers has of a list of drinks and brands, in order to study which subjective factors a consumer uses when classifying these products.


    Interdependence methods

    If the data are not metric, the following techniques can be used in addition to multidimensional scaling and cluster analysis:

    Correspondence Analysis
    Log-linear models


    Correspondence Analysis

    It applies to multidimensional contingency tables and pursues an objective similar to that of multidimensional scaling, but representing the rows and columns of the contingency tables simultaneously.

    For example, analyzing unemployment in Aragon considering the province, sex, age and educational level of the unemployed.


    Log-linear models

    They apply to multidimensional contingency tables and model the multidimensional dependency relationships among the observed variables in order to explain the observed frequencies.
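The simplest log-linear model for a two-way table is the independence model, in which the logarithm of each expected frequency is a sum of a row effect and a column effect, so the expected count is (row total x column total) / grand total. The sketch below fits that model and measures its lack of fit with the likelihood-ratio statistic G². The counts and the function name `independence_fit` are assumptions for illustration.

```python
import math


def independence_fit(table):
    """Expected counts under independence and the G^2 statistic."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # G^2 = 2 * sum(observed * log(observed / expected)); empty cells add 0.
    g2 = 2 * sum(o * math.log(o / e)
                 for orow, erow in zip(table, expected)
                 for o, e in zip(orow, erow) if o > 0)
    return expected, g2


# Hypothetical counts of unemployed people: rows are sex, columns are
# two educational levels.
observed = [[30, 20],
            [20, 30]]

expected, g2 = independence_fit(observed)
print(expected, round(g2, 3))
```

A G² value that is large relative to a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom indicates that the independence model does not explain the observed frequencies, so an interaction term would be needed.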


    Structural Methods


    They assume that the variables are divided into two groups: the dependent variables and the independent variables. The objective of these methods is to analyze not only how the independent variables affect the dependent variables, but also how the variables of the two groups relate to one another.


    Multivariate analysis steps

    1. Goals of the analysis

    2. Design of the analysis

    3. Hypotheses of the Analysis

    4. Analytical procedure

    5. Interpretation of the results

    6. Analysis Validation


    1. Goals of the analysis

    The problem is specified by defining the objectives and the multivariate techniques that will be used.

    The investigator must establish the problem conceptually, defining the concepts and relationships that are fundamental to the investigation, and must determine whether those relationships are relations of dependence or interdependence. With all this, the variables to observe are determined.


    2. Design of the analysis

    Determine the sample size, the equations to estimate (if applicable), the distances to calculate (if applicable) and the estimation techniques to employ. Once this is determined, we can proceed to observe the data.


    3. Hypotheses of the Analysis

    We evaluate the assumptions underlying the multivariate technique.

    These hypotheses may be of normality, linearity, independence, homoscedasticity, etc. You must also decide what to do with missing data.


    4. Analytical procedure

    We estimate the model and evaluate its fit to the data.

    In this step, unusual observations (outliers) or influential observations may appear, whose influence on the estimates and the goodness of fit must be analyzed.


    6. Analysis Validation

    The aim is to establish the validity of the results by analyzing whether the results obtained with the sample generalize to the population from which it comes.

    The sample can be divided into several parts, in which the model is re-estimated and the results are compared. Other techniques that can be used here are resampling techniques (jackknife and bootstrap).
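A hedged sketch of the two resampling techniques named above, applied to the standard error of a sample mean. The data, the number of bootstrap replicates, the seed and the function names are assumptions made for this example. The same functions accept any statistic that takes a list of values.

```python
import random
from statistics import mean, stdev


def bootstrap_se(sample, statistic, replicates=2000, seed=0):
    """Standard error from resampling the data with replacement."""
    rng = random.Random(seed)
    estimates = [statistic(rng.choices(sample, k=len(sample)))
                 for _ in range(replicates)]
    return stdev(estimates)


def jackknife_se(sample, statistic):
    """Standard error from leaving out one observation at a time."""
    n = len(sample)
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    center = mean(leave_one_out)
    return ((n - 1) / n * sum((e - center) ** 2 for e in leave_one_out)) ** 0.5


# Hypothetical annual spending (euros) for ten people.
spending_sample = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]

print(round(bootstrap_se(spending_sample, mean), 3))
print(round(jackknife_se(spending_sample, mean), 3))
```

For the mean, the jackknife result coincides with the textbook formula s / sqrt(n), which makes it a convenient check. The value of both techniques is that they also work for statistics with no simple formula, such as regression coefficients re-estimated on each resample.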


    What technique are you going to apply?

    Example


    1. Goals of the analysis

    Predicting the amount of money a person spends on cinema depending on income level, education level, gender and age, which would allow us to better understand the behavior patterns of the population.

    Variables to consider:

    Income level
    Level of education
    Sex
    Age


    We propose a multiple regression analysis in which the dependent variable would be spending on cinema and the others would be the independent variables.


    2. Design of the analysis

    We decide how to select the sample, its size and how to measure the variables involved in the analysis.

    Spending on cinema could be measured as annual expenditure in euros.

    Income level could be measured with an ordinal variable, given people's reluctance to give accurate information about it; level of education would be an ordinal variable, sex a binary variable and age a quantitative variable measured in years.


    The sample size would be chosen as a function of the power desired for the multiple regression.

    In addition, the ratio of the number of observations to the number of parameters to be estimated should be large enough to estimate the model parameters with the least possible error.


    3. Hypotheses of the Analysis

    You have to check the linearity of the relationship, normality and homoscedasticity. No data are missing, and the possible existence of outliers in each of the variables should be studied.


    4. Analytical procedure

    You can use the least squares estimator, whose sampling distribution is known under the normality assumption. Under that assumption this estimator coincides with the maximum likelihood estimator and is efficient.

    You can also use stepwise regression to determine which independent variables to include in the regression.

    Once the regression equation has been estimated, the goodness of fit is examined by calculating R² and analyzing the residuals.

    Study homoscedasticity, independence, possible omission of variables, the existence of outliers and the influence of individual observations.
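The goodness-of-fit step can be sketched as follows for a regression with a single predictor. The numbers and function names are assumptions for illustration, and the example on the slides would use several predictors. R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys, fitted):
    """Share of the variance of ys explained by the fitted values."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: ordinal income level and annual cinema spending (euros).
income_level = [1, 2, 3, 4, 5]
spending = [10, 12, 15, 19, 24]

intercept, slope = fit_line(income_level, spending)
fitted = [intercept + slope * x for x in income_level]
residuals = [y - f for y, f in zip(spending, fitted)]
r2 = r_squared(spending, fitted)
print(round(r2, 4), [round(r, 2) for r in residuals])
```

In this made-up data the residuals are positive at both ends and negative in the middle. That kind of pattern is what residual analysis looks for, since it hints that the relationship may not be linear even when R² is high.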


    5. Interpretation of the results

    The values of the coefficients obtained and their signs are interpreted.


    6. Analysis Validation

    The sample is divided into two subsamples of size 50, and the regression equation is re-estimated in each subsample and the results compared.
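The split-sample check described above can be sketched with simulated data. Everything here is an assumption made for illustration: the sample of 100 people is generated with a known slope, a single predictor is used instead of the four in the example, and the seed is arbitrary.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(42)

# Simulated sample of 100 people: income (thousands of euros) and
# annual cinema spending generated as 20 + 3 * income plus random noise.
income = [rng.uniform(10, 60) for _ in range(100)]
spending = [20 + 3 * x + rng.gauss(0, 10) for x in income]

# Random split into two subsamples of size 50.
indices = list(range(100))
rng.shuffle(indices)
half_a, half_b = indices[:50], indices[50:]

intercept_a, slope_a = fit_line([income[i] for i in half_a],
                                [spending[i] for i in half_a])
intercept_b, slope_b = fit_line([income[i] for i in half_b],
                                [spending[i] for i in half_b])
print(round(slope_a, 2), round(slope_b, 2))
```

If the two subsamples give clearly different coefficients, the model estimated on the full sample is unlikely to generalize well. Similar estimates in both halves support the validity of the results.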