Multivariate Distance and Similarity
Robert F. Murphy
Cytometry Development Workshop 2000
General Multivariate Dataset
We are given values of p variables for n independent observations. Construct an n × p matrix M consisting of vectors X1 through Xn, each of length p.
Multivariate Sample Mean
Define the mean vector X̄ of length p, in either matrix notation or vector notation.
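As a sketch in standard notation, write M_{ij} for the value of variable j in observation i and \bar{X} for the mean vector (both symbols are assumed here, not taken from the slide). The two forms of the sample mean would then read:

```latex
\[
\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} M_{ij} \quad (j = 1,\dots,p) \qquad \text{(matrix notation)}
\]
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad \text{(vector notation)}
\]
```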
Multivariate Variance
Define the variance vector s² of length p, in either matrix notation or vector notation.
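A sketch of the two forms, assuming the usual sample variance with an n - 1 denominator (the slide may have used n instead), with M_{ij} the value of variable j in observation i, \bar{X} the mean vector, and \circ an assumed symbol for the elementwise product:

```latex
\[
s^2_j = \frac{1}{n-1}\sum_{i=1}^{n} \left(M_{ij} - \bar{X}_j\right)^2 \quad (j = 1,\dots,p) \qquad \text{(matrix notation)}
\]
\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right) \circ \left(X_i - \bar{X}\right) \qquad \text{(vector notation)}
\]
```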
Covariance Matrix
Define a p × p matrix cov (called the covariance matrix) analogous to s².
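In the same assumed notation (M_{ij} for the value of variable j in observation i, \bar{X} for the mean vector, X_i treated as a row vector, and an n - 1 denominator), the entries of the covariance matrix can be sketched as:

```latex
\[
\mathrm{cov}_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} \left(M_{ij} - \bar{X}_j\right)\left(M_{ik} - \bar{X}_k\right) \qquad (j, k = 1,\dots,p)
\]
\[
\mathrm{cov} = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^{T}\left(X_i - \bar{X}\right)
\]
```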
Note that the covariance of a variable with itself is simply the variance of that variable.
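The relationship between the diagonal of the covariance matrix and the variances can be checked numerically. The following is a minimal pure-Python sketch, not the author's code: the data values and the function names `mean_vector` and `covariance_matrix` are hypothetical, and an n - 1 denominator is assumed.

```python
import statistics

# Hypothetical data: n = 4 observations (rows) of p = 2 variables (columns).
M = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 5.0],
    [4.0, 9.0],
]

def mean_vector(M):
    """Return the length-p list of column means."""
    n = len(M)
    return [sum(col) / n for col in zip(*M)]

def covariance_matrix(M):
    """Return the p x p sample covariance matrix (assumes an n - 1 denominator)."""
    n = len(M)
    xbar = mean_vector(M)
    p = len(xbar)
    return [
        [
            sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in M) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]

cov = covariance_matrix(M)

# Per-variable sample variances, computed independently with the stdlib.
variances = [statistics.variance(col) for col in zip(*M)]

# Each diagonal entry cov[j][j] should agree with variances[j].
for j, var_j in enumerate(variances):
    print(j, cov[j][j], var_j)
```

The off-diagonal entries `cov[0][1]` and `cov[1][0]` are equal, since the covariance matrix is symmetric.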
Univariate Distance
The simple distance between the values of a single variable j for two observations i and l is the magnitude of the difference between those two values.
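Written out, with M_{ij} an assumed symbol for the value of variable j in observation i, and with the absolute value added on the assumption that the distance should be non-negative:

```latex
\[
d_{il} = \left| M_{ij} - M_{lj} \right|
\]
```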
Univariate z-score Distance
To measure distance in units of standard deviation between the values of a single variable j for two observations i and l, we define the z-score distance: the simple distance divided by the standard deviation of variable j.
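A sketch of this definition, where M_{ij} (the value of variable j in observation i), z_{il}, and s_j (the standard deviation of variable j, the square root of s^2_j) are assumed symbols:

```latex
\[
z_{il} = \frac{\left| M_{ij} - M_{lj} \right|}{s_j}, \qquad s_j = \sqrt{s^2_j}
\]
```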
Bivariate Euclidean Distance
The most commonly used measure of distance between two observations i and l on two variables j and k is the Euclidean distance, the square root of the sum of the squared differences on each variable.
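In symbols, with M_{ij} an assumed notation for the value of variable j in observation i:

```latex
\[
d_{il} = \sqrt{\left(M_{ij} - M_{lj}\right)^2 + \left(M_{ik} - M_{lk}\right)^2}
\]
```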
Multivariate Euclidean Distance
This can be extended to more than two variables by summing the squared differences over all p variables.
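For p variables the distance is the square root of the sum, over every variable, of the squared difference between the two observations. A minimal Python sketch follows; the function name `euclidean_distance` and the example observations are hypothetical.

```python
import math

def euclidean_distance(x_i, x_l):
    """Euclidean distance between two observations of equal length p."""
    if len(x_i) != len(x_l):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_l)))

# Hypothetical observations on p = 3 variables.
x_i = [1.0, 2.0, 3.0]
x_l = [4.0, 6.0, 3.0]

# Squared differences are 9, 16 and 0, so the distance is sqrt(25).
d = euclidean_distance(x_i, x_l)
print(d)
```

Every variable contributes with equal weight here, regardless of its variance or its correlation with the other variables.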
Effects of variance and covariance on Euclidean distance
Points A and B have similar Euclidean distances from the mean, but point B is clearly more different from the population than point A.
[Figure: points A and B plotted with an ellipse showing the 50% contour of a hypothetical population.]
Mahalanobis Distance
To account for differences in variance between the variables, and to account for correlations between variables, we use the Mahalanobis distance.
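In its usual form, the squared Mahalanobis distance of an observation X from the mean X̄ is D² = (X − X̄) cov⁻¹ (X − X̄)ᵀ, where cov⁻¹ is the inverse of the covariance matrix. The sketch below illustrates this for p = 2 only, using an explicit 2 × 2 inverse. The covariance values, the points A and B, and the function name `mahalanobis_2d` are all hypothetical, chosen so that A lies along the direction of correlation and B lies across it.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of x from mean for p = 2, given a 2 x 2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Explicit inverse of a 2 x 2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    # Quadratic form dx * inv * dx^T.
    d2 = sum(dx[j] * inv[j][k] * dx[k] for j in range(2) for k in range(2))
    return math.sqrt(d2)

# Hypothetical population: equal variances, positive correlation.
mean = [0.0, 0.0]
cov = [[4.0, 3.0], [3.0, 4.0]]

point_a = [2.0, 2.0]    # along the direction of correlation
point_b = [2.0, -2.0]   # against the direction of correlation

# Both points are the same Euclidean distance from the mean...
euclid_a = math.hypot(point_a[0] - mean[0], point_a[1] - mean[1])
euclid_b = math.hypot(point_b[0] - mean[0], point_b[1] - mean[1])

# ...but B is farther from the population in Mahalanobis terms.
maha_a = mahalanobis_2d(point_a, mean, cov)
maha_b = mahalanobis_2d(point_b, mean, cov)
print(euclid_a, euclid_b, maha_a, maha_b)
```

For these values the squared Mahalanobis distances work out to 8/7 for A and 8 for B, even though the Euclidean distances are identical. For larger p, a general matrix inverse or linear solve would replace the hand-written 2 × 2 inverse.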