Computing in Archaeology
Session 12. Multivariate statistics
© Richard Haddlesey www.medievalarchitecture.net
Aims
To introduce the techniques of multivariate analysis
• Cluster analysis
• Correspondence analysis
• Principal components and factor analysis
• Multiple regression
• Discriminant analysis
Key text
• Fletcher & Lock 2005 Digging Numbers
Introduction to multivariate analysis
In earlier lectures we have seen examples of univariate analysis, using such techniques as simple bar charts, frequency tables of one variable and calculations of a simple sample mean.
When 2 variables are involved, as in clustered bar charts and scatterplots, or when we compare the means of 2 groups or ask whether there is any association between 2 variables, we are using techniques of bivariate analysis.
With more than two variables, however, we are dealing with multivariate analysis.
SPSS
These techniques require the use of suitable statistical packages, such as SPSS, because of the considerable computation involved.
Consequently, the approach of working examples by hand used in earlier lectures is not relevant here, and we will not be going into the statistical and mathematical details behind the techniques.
Techniques discussed
Type A: reduction and grouping
• Given several measurements (ordinal, interval or presence/absence) on each of many objects (i.e. several variables and many cases), is it possible to reduce the number of variables while still maintaining the information in the data?
• Using either the original variables or the new reduced set, can these objects be put into groups or clusters so that within each group the objects are similar but between groups there are interpretable differences?
Type B: prediction
• Given several measurements (ordinal, interval or presence/absence) on each of many objects (i.e. several variables and many cases), with one of the variables of particular interest, is it possible to predict this variable from the others and, if so, which variables are important in this prediction?
Type A techniques
Cluster analysis
Correspondence Analysis
Principal Components and Factor Analysis (PCA)
Type B techniques
Multiple regression
Discriminant analysis
Type A:
1. reduction and grouping
2. cluster analysis
We may wish to ask
• Can spearheads be grouped or clustered, so that those within a cluster are similar to each other but there are important differences between the clusters?
• i.e. if we group by dimension, thus creating clusters of like-sized spearheads, will it show a difference between the various size clusters?
Hierarchical cluster analysis
Most stats packages offer a standard clustering method called hierarchical cluster analysis.
It starts by making each spearhead a single cluster. We then tell it how we want the clusters produced, and SPSS will merge the single clusters step by step until they form one big cluster.
It will then output the data, provide information on cluster membership and indicate how good the clustering has been (i.e. how similar the members are).
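The merging process described above can be sketched in a few lines of Python. This is only an illustration, not what SPSS does internally: it assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), and the spearhead measurements are invented for the example.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merges in the order they happen, each as
    (members_a, members_b, distance), where members are case indices.
    """
    # Start with every object in its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Invented (length, maximum width) measurements in cm for five spearheads.
spearheads = [(10.0, 2.0), (10.5, 2.1), (11.0, 2.0), (25.0, 4.0), (26.0, 4.2)]
merges = single_linkage(spearheads)
for left, right, d in merges:
    print(left, "+", right, f"joined at distance {d:.2f}")
```

The three short spearheads join each other at small distances, as do the two long ones, and the two groups are only joined in the final step at a much larger distance. Other linkage rules and distance measures are available in most packages and can give different groupings.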
Dendrograms
The way to “visualise” the clusters as they are formed, as an aid to deciding how many are “significant”, is by asking the software to produce a dendrogram.
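One informal way of reading a dendrogram is to look for the largest jump between the heights (distances) at which successive joins happen, and to cut the tree there. The sketch below shows that rule of thumb only; it is not a significance test and not a feature of SPSS. The function name `suggest_cluster_count` and the heights are invented for illustration, and the heights are assumed to be listed in the order the joins occurred.

```python
def suggest_cluster_count(merge_heights, n_objects):
    """Suggest a number of clusters by cutting the dendrogram across
    the largest jump between successive merge heights (a heuristic)."""
    if len(merge_heights) != n_objects - 1:
        raise ValueError("expected exactly one merge height per join")
    # gaps[i] is the jump between join i and join i + 1 (0-based).
    gaps = [merge_heights[i + 1] - merge_heights[i]
            for i in range(len(merge_heights) - 1)]
    if not gaps:
        return 1  # one object or a single join: nothing to compare
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    # Cutting above join `widest` leaves n_objects - (widest + 1) clusters.
    return n_objects - (widest + 1)

# Invented join heights for five objects (four joins).
heights = [0.51, 0.51, 1.02, 14.14]
suggested = suggest_cluster_count(heights, n_objects=5)
print("suggested number of clusters:", suggested)
```

The suggestion should always be checked against the dendrogram itself and against what makes archaeological sense.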
Type B:
1 prediction
2 multiple regression
We have already covered the theory of prediction and regression in the previous lecture. Although we are now talking about multiple regression, the principle is the same and is best understood through the practical session to follow.
We may ask
• Can the length of a spear be predicted if the tip is missing?
Previously we discussed correlation and regression between two variables. Multiple regression allows us to use several variables in the prediction.
Multiple regression
Multiple regression will produce a linear equation relating spear length, the dependent variable, to several independent variables such as socket length, maximum width, width of upper socket and width of lower socket.
Both the dependent variable (the one to be predicted) and the independent variables (the ingredients for this prediction) must be measured on an interval scale or be presence/absence data.
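As a rough illustration of what such a linear equation looks like, the sketch below fits length = b0 + b1 × socket length + b2 × maximum width by ordinary least squares, using only the Python standard library. The function name and the measurements are invented for the example, and the lengths were constructed to follow an exact linear rule, so the fit recovers it perfectly; real measurements would scatter around the fitted equation.

```python
def fit_multiple_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    X is a list of rows of independent variables, y the dependent variable.
    Returns the coefficients [b0, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for b0
    k = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("independent variables are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]

# Invented measurements in cm: (socket length, maximum width) per spearhead.
predictors = [(4.0, 2.0), (5.0, 3.0), (6.0, 2.5), (7.0, 4.0), (8.0, 3.0)]
# Lengths built as 2 + 3*socket + 1.5*width purely for illustration.
lengths = [17.0, 21.5, 23.75, 29.0, 30.5]

b0, b1, b2 = fit_multiple_regression(predictors, lengths)

# Predict the length of a spearhead with a missing tip from its other measurements.
predicted = b0 + b1 * 6.5 + b2 * 3.0
print(f"length = {b0:.2f} + {b1:.2f}*socket + {b2:.2f}*width")
print(f"predicted length: {predicted:.2f} cm")
```

A statistical package would also report how well the equation fits and which independent variables contribute to the prediction, which this sketch does not attempt.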