More Matrix Algebra; Mean Vectors and Covariance Matrices;
the Multivariate Normal Distribution
PSYC 943 (930): Fundamentals of Multivariate Modeling
Lecture 11: October 3, 2012
PSYC 943: Lecture 11
Today's Class
• The conclusion of Friday's lecture on matrix algebra:
  - Matrix inverse
  - Zero/ones vectors
  - Matrix identity
  - Matrix determinant
  - NOTE: an introduction to principal components analysis will be relocated later in the semester
• Putting matrix algebra to use in multivariate statistics:
  - Mean vectors
  - Covariance matrices
• The multivariate normal distribution
A Guiding Example
• To demonstrate matrix algebra, we will make use of data
• Imagine that somehow I collected SAT test scores for both the Math (SATM) and Verbal (SATV) sections from 1,000 students
• The descriptive statistics of this data set are given below:

[Table: descriptive statistics for SATV and SATM]
Matrix Computing: PROC IML
• To help demonstrate the topics we will discuss today, I will be showing examples in SAS PROC IML
• The Interactive Matrix Language (IML) is a scientific computing package in SAS that is typically used for statistical routines that aren't programmed elsewhere in SAS
• Useful documentation for IML: http://support.sas.com/documentation/cdl/en/imlug/64248/HTML/default/viewer.htm#langref_toc.htm
• A great web reference for IML: http://www.psych.yorku.ca/lab/sas/iml.htm
Moving from Vectors to Matrices
• A matrix can be thought of as a collection of vectors
  - Matrix operations are vector operations on steroids
• Matrix algebra defines a set of operations and entities on matrices
  - I will present a version meant to mirror your previous algebra experiences
• Definitions: identity matrix, zero vector, ones vector
• Basic operations: addition, subtraction, multiplication, "division"
Matrix Addition and Subtraction
• Matrix addition and subtraction are much like vector addition/subtraction
• Rules: the matrices must be the same size (same numbers of rows and columns)
• Method: the new matrix is constructed by element-by-element addition/subtraction of the original matrices
• Order: the order of the matrices (pre- and post-) does not matter
Matrix Multiplication
• Matrix multiplication is a bit more complicated
  - The new matrix may be a different size from either of the two matrices being multiplied:

    A(r x c) B(c x k) = C(r x k)

• Rules: the pre-multiplying matrix must have a number of columns equal to the number of rows of the post-multiplying matrix
• Method: the elements of the new matrix consist of the inner (dot) products of the row vectors of the pre-multiplying matrix with the column vectors of the post-multiplying matrix
• Order: the order of the matrices (pre- and post-) matters
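As a sketch of the row-by-column method above, here is matrix multiplication in a few lines of pure Python (an illustration only; the lecture itself uses SAS PROC IML, and `matmul` is a hypothetical helper, not a library call):

```python
# Matrix multiplication: C[i][j] is the inner (dot) product of row i of the
# pre-multiplying matrix A with column j of the post-multiplying matrix B.
# A is (r x c); B must be (c x k); the result C is (r x k).
def matmul(A, B):
    c, k = len(B), len(B[0])
    assert all(len(row) == c for row in A), "columns of A must equal rows of B"
    return [[sum(A[i][m] * B[m][j] for m in range(c)) for j in range(k)]
            for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]           # 3 x 2
C = matmul(A, B)       # 2 x 2 -- a different size from either input
```

Note that order matters: `matmul(B, A)` here would produce a 3 x 3 matrix, not a 2 x 2 one.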
Multiplication in Statistics
• Many statistical formulae involving summation can be re-expressed with matrices
• A common matrix multiplication form is X'X:
  - Diagonal elements: Σ_{p=1}^{N} x_pv^2
  - Off-diagonal elements: Σ_{p=1}^{N} x_p1 * x_p2
• For our SAT example:

  X'X = [ Σ SATV_p^2        Σ SATV_p SATM_p ] = [ 251,797,800   251,928,400 ]
        [ Σ SATM_p SATV_p   Σ SATM_p^2      ]   [ 251,928,400   254,862,700 ]
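The X'X form can be verified with a tiny made-up (3 x 2) data matrix standing in for the real 1,000-student SAT file (a pure-Python sketch, not the lecture's SAS code):

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Made-up mini data: rows are students, columns are [SATV, SATM]
X = [[500.0, 510.0],
     [450.0, 470.0],
     [550.0, 560.0]]

XtX = matmul(transpose(X), X)
# Diagonal elements are sums of squares; off-diagonals are sums of cross-products
assert XtX[0][0] == sum(row[0] ** 2 for row in X)
assert XtX[0][1] == sum(row[0] * row[1] for row in X)
```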
Identity Matrix
• The identity matrix I is a matrix that, when pre- or post-multiplied by another matrix, results in the original matrix:

  AI = A  and  IA = A

• The identity matrix is a square matrix that has:
  - Diagonal elements = 1
  - Off-diagonal elements = 0

  I(3 x 3) = [ 1  0  0 ]
             [ 0  1  0 ]
             [ 0  0  1 ]
Zero Vector
• The zero vector 0 is a column vector of zeros:

  0(3 x 1) = [ 0 ]
             [ 0 ]
             [ 0 ]

• When pre- or post-multiplied, the result is the zero vector:

  A0 = 0  and  0'A = 0'
Ones Vector
• A ones vector 1 is a column vector of 1s:

  1(3 x 1) = [ 1 ]
             [ 1 ]
             [ 1 ]

• The ones vector is useful for calculating statistical terms, such as the mean vector and the covariance matrix
Matrix "Division": The Inverse Matrix
• Division from algebra:
  - First: a / b = (1/b) * a = b^(-1) * a
  - Second: a / a = a^(-1) * a = 1
• "Division" in matrices serves a similar role
  - For the square, symmetric matrices used in statistics, an inverse matrix A^(-1) is a matrix that, when pre- or post-multiplied with A, produces the identity matrix:

    A^(-1) A = I  and  A A^(-1) = I

• Calculation of the matrix inverse is complicated
  - Even computers have a tough time
• Not all matrices can be inverted
  - Non-invertible matrices are called singular matrices
  - In statistics, singular matrices are commonly caused by linear dependencies
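For a 2 x 2 matrix the inverse has a simple closed form, which is enough to illustrate A^(-1)A = I (a pure-Python sketch; real statistical software uses more numerically stable routines for larger matrices):

```python
def inv2x2(A):
    # For A = [[a, b], [c, d]], the inverse is [[d, -b], [-c, a]] / (ad - bc)
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: no inverse exists")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[2.0, 1.0],
     [1.0, 3.0]]
I = matmul(inv2x2(A), A)   # should be (numerically) the 2 x 2 identity
```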
The Inverse
• In data: the inverse shows up constantly in statistics
  - Models which assume some type of (multivariate) normality need an inverse covariance matrix
• Using our SAT example:
  - Our data matrix was size (1000 x 2), which is not invertible
  - However, X'X was size (2 x 2): square and symmetric

    X'X = [ 251,797,800   251,928,400 ]
          [ 251,928,400   254,862,700 ]

  - The inverse is:

    (X'X)^(-1) = [  3.61E-7   -3.57E-7 ]
                 [ -3.57E-7    3.56E-7 ]
Matrix Determinants
• A square matrix can be characterized by a scalar value called a determinant:

  det(A) = |A|

• Calculation of the determinant is tedious
  - The determinant for the covariance matrix of our SAT example was 6,514,104.5
• For two-by-two matrices:

  | a  b |
  | c  d | = ad - bc

• The determinant is useful in statistics:
  - Shows up in multivariate statistical distributions
  - Is a measure of "generalized" variance of multiple variables
• If the determinant is positive, the matrix is called positive definite
  - And is invertible
• If the determinant is not positive, the matrix is called non-positive definite
  - And is not invertible
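Applying |A| = ad - bc to the SAT covariance matrix reproduces the slide's determinant up to rounding (the printed covariances carry only two decimals, so the result differs slightly from the reported 6,514,104.5):

```python
def det2x2(A):
    (a, b), (c, d) = A
    return a * d - b * c

# SAT covariance matrix as printed in the lecture
S = [[2477.34, 3132.22],
     [3132.22, 6589.71]]
gv = det2x2(S)   # roughly 6.51 million: the "generalized variance"
```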
Matrix Trace
• For a square matrix A with V rows/columns, the trace is the sum of the diagonal elements:

  tr(A) = Σ_{v=1}^{V} a_vv

• For our data, the trace of the correlation matrix is 2
  - For all correlation matrices, the trace is equal to the number of variables, because all diagonal elements are 1
• The trace is considered the total variance in multivariate statistics
  - Used as a target to recover when applying statistical models
Matrix Algebra Operations (for help in reading stats manuals)
• (A + B) + C = A + (B + C)
• A + B = B + A
• c(A + B) = cA + cB
• (c + d)A = cA + dA
• (A + B)' = A' + B'
• (cd)A = c(dA)
• (cA)' = cA'
• c(AB) = (cA)B
• A(BC) = (AB)C
• A(B + C) = AB + AC
• (AB)' = B'A'
• For x_j such that A x_j exists:

  Σ_{j=1}^{N} A x_j = A ( Σ_{j=1}^{N} x_j )

  Σ_{j=1}^{N} (A x_j)(A x_j)' = A ( Σ_{j=1}^{N} x_j x_j' ) A'
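One identity from the list, (AB)' = B'A', is easy to spot-check numerically (a small Python sanity check, not part of the lecture materials):

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
lhs = transpose(matmul(A, B))              # (AB)'
rhs = matmul(transpose(B), transpose(A))   # B'A'
# lhs and rhs are the same matrix
```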
Multivariate Statistics
• Up to this point in this course, we have focused on the prediction (or modeling) of a single variable
  - Conditional distributions (aka generalized linear models)
• Multivariate statistics is about exploring joint distributions
  - How variables relate to each other simultaneously
• Therefore, we must adapt our conditional distributions to have multiple variables, simultaneously (later, as multiple outcomes)
• We will now look at the joint distributions of two variables, f(x1, x2), or in matrix form, f(X) (where X is size N x 2; f(X) gives a scalar/single number)
  - Beginning with two variables, then moving to anything more than two
  - We will begin by looking at multivariate descriptive statistics: mean vectors and covariance matrices
• Today, we will only consider the joint distribution of sets of variables, but next time we will put this into a GLM-like setup
  - The joint distribution will then be conditional on other variables
Multiple Means: The Mean Vector
• We can use a vector to describe the set of means for our data:

  x̄ = (1/N) X'1 = [ x̄_1 ]
                   [ x̄_2 ]
                   [  ⋮  ]
                   [ x̄_V ]

  - Here 1 is a N x 1 vector of 1s
  - The resulting mean vector is a V x 1 vector of means
• For our data:

  x̄ = [ 499.32 ] = [ x̄_SATV ]
      [ 498.27 ]   [ x̄_SATM ]

• In SAS PROC IML:
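The formula x̄ = (1/N) X'1 can be sketched with the same style of made-up mini data (pure Python; the lecture's actual example is in SAS PROC IML):

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

X = [[500.0, 510.0],
     [450.0, 470.0],
     [550.0, 560.0]]           # made-up N x 2 data matrix [SATV, SATM]
N = len(X)
ones = [[1.0]] * N             # N x 1 ones vector
xbar = [[elem / N for elem in row] for row in matmul(transpose(X), ones)]
# xbar is a V x 1 column vector holding the mean of each variable
```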
Mean Vector: Graphically
• The mean vector is the center of the distribution of both variables

[Figure: scatterplot of SATV by SATM (scores 200-800 on each axis), with the mean vector at the center of the cloud]
Covariance of a Pair of Variables
• The covariance is a measure of the relatedness of a pair of variables
  - It is expressed in the product of the units of the two variables:

    s_x1x2 = (1/N) Σ_{p=1}^{N} (x_p1 - x̄_1)(x_p2 - x̄_2)

  - The covariance between SATV and SATM was 3,132.22 (in SAT Verbal x SAT Math units)
  - The denominator N gives the maximum likelihood (ML) version; the unbiased version uses N - 1
• Because the units of the covariance are difficult to interpret, we more commonly describe the association between two variables with the correlation
  - Correlation: covariance divided by the product of each variable's standard deviation
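The covariance formula translates directly to code (ML version with N in the denominator, as on the slide; made-up mini data stands in for the SAT file):

```python
def covariance(x1, x2):
    # s_{x1,x2} = (1/N) * sum over p of (x_p1 - mean1)(x_p2 - mean2)
    N = len(x1)
    m1, m2 = sum(x1) / N, sum(x2) / N
    return sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / N

satv = [500.0, 450.0, 550.0]
satm = [510.0, 470.0, 560.0]
s12 = covariance(satv, satm)   # expressed in SATV-units times SATM-units
```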
Correlation of a Pair of Variables
• Correlation is covariance divided by the product of the standard deviations of each variable:

  r_x1x2 = s_x1x2 / ( sqrt(s²_x1) * sqrt(s²_x2) )

  - The correlation between SATM and SATV was 0.78
• Correlation is unitless and ranges only between -1 and 1
  - If x1 and x2 both had variances of 1, the covariance between them would be a correlation
  - Covariance of standardized variables = correlation
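The claim that the covariance of standardized variables equals the correlation can be checked directly (a pure-Python sketch using ML variances, consistent with the covariance formula above; the data are made up):

```python
import math

def mean(x):
    return sum(x) / len(x)

def cov(x1, x2):   # ML covariance (divides by N)
    m1, m2 = mean(x1), mean(x2)
    return sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / len(x1)

def corr(x1, x2):
    return cov(x1, x2) / math.sqrt(cov(x1, x1) * cov(x2, x2))

x1 = [500.0, 450.0, 550.0]
x2 = [510.0, 470.0, 560.0]
# Standardize each variable to mean 0, SD 1
z1 = [(v - mean(x1)) / math.sqrt(cov(x1, x1)) for v in x1]
z2 = [(v - mean(x2)) / math.sqrt(cov(x2, x2)) for v in x2]
# cov(z1, z2) now equals corr(x1, x2), and lies between -1 and 1
```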
Covariance and Correlation in Matrices
• The covariance matrix S (for any number of variables V) is found by:

  S = (1/N) (X - 1x̄')' (X - 1x̄') = [ s²_x1    ⋯   s_x1xV ]
                                    [   ⋮      ⋱     ⋮    ]
                                    [ s_x1xV   ⋯   s²_xV  ]

• In SAS PROC IML:

• For our data:

  S = [ 2,477.34   3,132.22 ]
      [ 3,132.22   6,589.71 ]
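The matrix form S = (1/N)(X - 1x̄')'(X - 1x̄') can be sketched as (made-up mini data again; pure Python, not the lecture's IML):

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

X = [[500.0, 510.0],
     [450.0, 470.0],
     [550.0, 560.0]]
N = len(X)
means = [sum(col) / N for col in zip(*X)]
# Deviation-score matrix: X minus 1 * xbar'
D = [[x - m for x, m in zip(row, means)] for row in X]
S = [[elem / N for elem in row] for row in matmul(transpose(D), D)]
# S has variances on the diagonal and covariances off the diagonal
```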
From Covariance to Correlation
• If we take the SDs (the square roots of the diagonal of the covariance matrix) and put them into a diagonal matrix D, the correlation matrix is found by:

  R = D^(-1) S D^(-1) = [ s²_x1/(s_x1 s_x1)    ⋯  s_x1xV/(s_x1 s_xV) ]   [   1      ⋯  r_x1xV ]
                        [        ⋮             ⋱          ⋮          ] = [   ⋮      ⋱    ⋮    ]
                        [ s_x1xV/(s_x1 s_xV)   ⋯  s²_xV/(s_xV s_xV)  ]   [ r_x1xV   ⋯    1    ]
Example Covariance Matrix
• For our data, the covariance matrix was:

  S = [ 2,477.34   3,132.22 ]
      [ 3,132.22   6,589.71 ]

• The diagonal matrix D of standard deviations was:

  D = [ sqrt(2,477.34)        0        ] = [ 49.77    0   ]
      [       0        sqrt(6,589.71)  ]   [   0    81.18 ]

• The correlation matrix R was:

  R = D^(-1) S D^(-1) = [ 1/49.77     0     ] [ 2,477.34  3,132.22 ] [ 1/49.77     0     ]
                        [    0     1/81.18  ] [ 3,132.22  6,589.71 ] [    0     1/81.18  ]

  R = [ 1.00   .78 ]
      [  .78  1.00 ]
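With the slide's numbers, R = D^(-1)SD^(-1) reduces to dividing each covariance by the product of the two SDs; it reproduces the .78 correlation (expect small rounding differences in the third decimal, since the printed covariances are rounded):

```python
import math

S = [[2477.34, 3132.22],
     [3132.22, 6589.71]]
sds = [math.sqrt(S[i][i]) for i in range(2)]   # about 49.77 and 81.18
R = [[S[i][j] / (sds[i] * sds[j]) for j in range(2)] for i in range(2)]
# R has 1s on the diagonal and the correlation (~.78) off the diagonal
```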
Generalized Variance
• The determinant of the covariance matrix is the generalized variance:

  Generalized Sample Variance = |S|

• It is a measure of spread across all variables
  - Reflecting how much overlap (covariance) in variables occurs in the sample
  - The amount of overlap reduces the generalized sample variance
  - Generalized variance from our SAT example: 6,514,104.5
  - Generalized variance if zero covariance/correlation: 16,324,929
• The generalized sample variance is:
  - Largest when variables are uncorrelated
  - Zero when variables form a linear dependency
• In data:
  - The generalized variance is seldom used descriptively, but shows up more frequently in maximum likelihood functions
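The point that overlap reduces the generalized variance can be seen by zeroing out the covariance (slide values; the printed figures differ slightly from the results here because the covariances are rounded to two decimals):

```python
def det2x2(A):
    (a, b), (c, d) = A
    return a * d - b * c

S = [[2477.34, 3132.22],
     [3132.22, 6589.71]]
S_indep = [[2477.34, 0.0],
           [0.0, 6589.71]]   # same variances, zero covariance
gv = det2x2(S)               # ~6.51 million
gv_indep = det2x2(S_indep)   # ~16.32 million: no overlap, larger generalized variance
```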
Total Sample Variance
• The total sample variance is the sum of the variances of each variable in the sample
  - The sum of the diagonal elements of the sample covariance matrix
  - The trace of the sample covariance matrix:

    Total Sample Variance = Σ_{v=1}^{V} s²_xv = tr(S)

• Total sample variance for our SAT example: 2,477.34 + 6,589.71 = 9,067.05
• The total sample variance does not take into consideration the covariances among the variables
  - It will not equal zero if a linear dependency exists
• In data:
  - The total sample variance is commonly used as the denominator (target) when calculating variance-accounted-for measures
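Total sample variance as the trace of the covariance matrix, using the slide's values:

```python
S = [[2477.34, 3132.22],
     [3132.22, 6589.71]]
total_variance = sum(S[v][v] for v in range(len(S)))   # tr(S) = 2,477.34 + 6,589.71
```

Note that the off-diagonal covariances never enter the trace, which is exactly why the total sample variance ignores the overlap among variables.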
Multivariate Normal Distribution
• The multivariate normal distribution is the generalization of the univariate normal distribution to multiple variables
  - The bivariate normal distribution just shown is part of the MVN
• The MVN provides the relative likelihood of observing all V variables for a subject p simultaneously:

  x_p = [ x_p1   x_p2   ⋯   x_pV ]

• The multivariate normal density function is:

  f(x_p) = 1 / ( (2π)^(V/2) |Σ|^(1/2) ) * exp[ -(x_p' - μ)' Σ^(-1) (x_p' - μ) / 2 ]
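A pure-Python sketch of the bivariate (V = 2) density, using the closed-form 2 x 2 determinant and inverse; the lecture's SAS IML code is the working version. At the mean vector the exponent is zero, so the height is 1/(2π |Σ|^(1/2)), which matches the 0.0000624 reported later for the SAT data:

```python
import math

def mvn2_pdf(x, mu, S):
    # f(x) = (2*pi)^(-V/2) * |S|^(-1/2) * exp(-(x - mu)' S^-1 (x - mu) / 2), V = 2
    (a, b), (c, d) = S
    det = a * d - b * c
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)' S^-1 (x - mu) via the closed-form 2x2 inverse
    quad = (d * dx0 ** 2 - (b + c) * dx0 * dx1 + a * dx1 ** 2) / det
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(det))

mu = [499.32, 498.27]
S = [[2477.34, 3132.22],
     [3132.22, 6589.71]]
height_at_mean = mvn2_pdf(mu, mu, S)   # ~0.0000624, the peak of the density
```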
The Multivariate Normal Distribution
  f(x_p) = 1 / ( (2π)^(V/2) |Σ|^(1/2) ) * exp[ -(x_p' - μ)' Σ^(-1) (x_p' - μ) / 2 ]

• The mean vector is:

  μ = [ μ_x1 ]
      [ μ_x2 ]
      [  ⋮   ]
      [ μ_xV ]

• The covariance matrix is:

  Σ = [ σ²_x1     σ_x1x2   ⋯  σ_x1xV ]
      [ σ_x1x2    σ²_x2    ⋯  σ_x2xV ]
      [   ⋮         ⋮      ⋱    ⋮    ]
      [ σ_x1xV    σ_x2xV   ⋯  σ²_xV  ]

  - The covariance matrix must be non-singular (invertible)
Comparing Univariate and Multivariate Normal Distributions
• The univariate normal distribution:

  f(x_p) = 1 / sqrt(2πσ²) * exp[ -(x - μ)² / (2σ²) ]

• The univariate normal, rewritten with a little algebra:

  f(x_p) = 1 / ( (2π)^(1/2) |σ²|^(1/2) ) * exp[ -(x - μ)(σ²)^(-1)(x - μ) / 2 ]

• The multivariate normal distribution:

  f(x_p) = 1 / ( (2π)^(V/2) |Σ|^(1/2) ) * exp[ -(x_p' - μ)' Σ^(-1) (x_p' - μ) / 2 ]

  - When V = 1 (one variable), the MVN is a univariate normal distribution
The Exponent Term
• The term in the exponent (without the -1/2) is called the squared Mahalanobis distance:

  d²(x_p) = (x_p' - μ)' Σ^(-1) (x_p' - μ)

  - Sometimes called the statistical distance
  - Describes how far an observation is from its mean vector, in standardized units
  - Like a multivariate Z score (but, if the data are MVN, it is actually distributed as a χ² variable with DF = number of variables in x)
  - Can be used to assess whether data follow a MVN distribution
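The squared Mahalanobis distance written out for the 2 x 2 case (a sketch using the SAT mean vector and covariance matrix; at the mean the distance is exactly zero, and it grows as an observation moves away in standardized units):

```python
def mahalanobis_sq(x, mu, S):
    # d^2 = (x - mu)' S^-1 (x - mu), using the closed-form 2x2 inverse of S
    (a, b), (c, d) = S
    det = a * d - b * c
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    return (d * dx0 ** 2 - (b + c) * dx0 * dx1 + a * dx1 ** 2) / det

mu = [499.32, 498.27]
S = [[2477.34, 3132.22],
     [3132.22, 6589.71]]
d2_mean = mahalanobis_sq(mu, mu, S)             # 0: the mean is at distance 0
d2_obs = mahalanobis_sq([590.0, 730.0], mu, S)  # positive for any other point
```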
Multivariate Normal Notation
• Standard notation for the multivariate normal distribution of V variables is N_V(μ, Σ)
  - Our SAT example would use a bivariate normal: N_2(μ, Σ)
• In data: the multivariate normal distribution serves as the basis for nearly every statistical technique commonly used in the social and educational sciences
  - General linear models (ANOVA, regression, MANOVA)
  - General linear mixed models (HLM/multilevel models)
  - Factor and structural equation models (EFA, CFA, SEM, path models)
  - Multiple imputation for missing data
• Simply put, the world of commonly used statistics revolves around the multivariate normal distribution
  - Understanding it is the key to understanding many statistical methods
Bivariate Normal Plot #1
  μ = [ μ_x1 ] = [ 0 ]    Σ = [ σ²_x1    σ_x1x2 ] = [ 1   0 ]
      [ μ_x2 ]   [ 0 ] ,      [ σ_x1x2   σ²_x2  ]   [ 0   1 ]

[Figures: Density Surface (3D); Density Surface (2D): Contour Plot]
Bivariate Normal Plot #2 (Multivariate Normal)
  μ = [ μ_x1 ] = [ 0 ]    Σ = [ σ²_x1    σ_x1x2 ] = [ 1   .5 ]
      [ μ_x2 ]   [ 0 ] ,      [ σ_x1x2   σ²_x2  ]   [ .5   1 ]

[Figures: Density Surface (3D); Density Surface (2D): Contour Plot]
Multivariate Normal Properties
• The multivariate normal distribution has some useful properties that show up in statistical methods
• If x is distributed multivariate normally:
  1. Linear combinations of x are normally distributed
  2. All subsets of x are multivariate normally distributed
  3. A zero covariance between a pair of variables of x implies that the variables are independent
  4. Conditional distributions of x are multivariate normal
Multivariate Normal Distribution in PROC IML
• To demonstrate how the MVN works, we will now investigate how the PDF provides the likelihood (height) for a given observation
  - Here we will use the SAT data and assume the sample mean vector and covariance matrix are known to be the true values:

    μ = [ 499.32 ]    Σ = [ 2,477.34   3,132.22 ]
        [ 498.27 ] ;      [ 3,132.22   6,589.71 ]

• We will compute the likelihood value for several observations (SEE EXAMPLE SAS SYNTAX FOR HOW THIS WORKS):
  - x_631 = [ 590  730 ];  f(x) = 0.00000087528
  - x_717 = [ 340  300 ];  f(x) = 0.00000037082
  - x = x̄ = [ 499.32  498.27 ];  f(x) = 0.0000624
• Note: this is the height for these observations, not the joint likelihood across all the data
  - Next time we will use PROC MIXED to find the parameters in μ and Σ using maximum likelihood
Likelihoods…From SAS

[SAS PROC IML output showing the computed likelihood values]

  f(x_p) = 1 / ( (2π)^(V/2) |Σ|^(1/2) ) * exp[ -(x_p' - μ)' Σ^(-1) (x_p' - μ) / 2 ]
Wrapping Up
• The last two classes set the stage to discuss multivariate statistical methods that use maximum likelihood
• Matrix algebra was necessary so that we can concisely talk about our distributions (which will soon become models)
• The multivariate normal distribution will be necessary to understand, as it is the most commonly used distribution for estimation of multivariate models
• Friday we will get back into data analysis, but for multivariate observations, using SAS PROC MIXED
  - Each term of the MVN will be mapped onto the PROC MIXED output
Recommended