Matrix Algebra Notes


  • 7/21/2019 Matrix Algebra Notes


    Econometrics - II

Indira Gandhi Institute of Development Research, January - May Semester 2013

(c) Subrata Sarkar

    Elements Of Matrix Algebra

    Start with an example

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + U_t, \qquad t = 1, 2, \ldots, n$$

    Writing for each observation

$$Y_1 = \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_k X_{k1} + U_1$$
$$Y_2 = \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_k X_{k2} + U_2$$
$$\vdots$$
$$Y_n = \beta_1 + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_k X_{kn} + U_n$$

    Summarize these n equations in a convenient form

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & X_{21} & \cdots & X_{k1} \\
1 & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots & & \vdots \\
1 & X_{2n} & \cdots & X_{kn}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} U_1 \\ U_2 \\ \vdots \\ U_n \end{bmatrix}$$

or, $Y = X\beta + U$

$Y$, $X$, $\beta$, $U$ are vectors and matrices.

Vector: an ordered sequence of numbers arranged in a row or a column.

$Y$, $\beta$, $U$ are arranged in columns: $U$ and $Y$ are $n$-element column vectors, and $\beta$ is a $k$-element column vector.
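The matrix form $Y = X\beta + U$ can be checked numerically; a minimal numpy sketch (the data values, $n = 4$ and $k = 3$, are made up for illustration and are not from the notes):

```python
import numpy as np

# Hypothetical data: n = 4 observations, k = 3 regressors (incl. the constant).
n, k = 4, 3
X = np.array([[1., 2., 5.],
              [1., 3., 1.],
              [1., 0., 2.],
              [1., 4., 4.]])      # n x k design matrix; first column is the constant
beta = np.array([1., 2., 3.])     # k x 1 coefficient vector
U = np.zeros(n)                   # disturbances, set to zero here for clarity

# One matrix product computes all n equations Y_t = b1 + b2*X_2t + b3*X_3t + U_t.
Y = X @ beta + U
```

Each entry of `Y` agrees with the corresponding scalar equation written observation by observation.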


$Y' = [Y_1, Y_2, \ldots, Y_n]$, the transpose of $Y$; $U' = [U_1, U_2, \ldots, U_n]$, the transpose of $U$; $\beta' = [\beta_1, \beta_2, \ldots, \beta_k]$, the transpose of $\beta$.

$$X = \begin{bmatrix}
1 & X_{21} & \cdots & X_{k1} \\
1 & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots & & \vdots \\
1 & X_{2n} & \cdots & X_{kn}
\end{bmatrix}$$

    X is a matrix

Matrix: A rectangular array of elements. Order of a matrix = number of rows $\times$ number of columns $= n \times k$ (the number of rows is always written first).

    Observations:

1. A column vector of $n$ elements, i.e.
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$
is a matrix of order $n \times 1$.

2. A row vector of $k$ elements, i.e. $Z = [Z_1 \; Z_2 \; \cdots \; Z_k]$, is a matrix of order $1 \times k$.

3. Representing the $X$ matrix by its columns or by its rows:
$$X = [\underset{n \times 1}{X_1} \;\; \underset{n \times 1}{X_2} \;\; \cdots \;\; \underset{n \times 1}{X_k}]
\qquad \text{or} \qquad
X = \begin{bmatrix} S_1' \\ S_2' \\ \vdots \\ S_n' \end{bmatrix}$$
where each $X_j$ is an $n \times 1$ column and each $S_i'$ is a $1 \times k$ row.


4. Transpose of a matrix

$$X_{n \times k} = \{X_{ij}\} \implies X'_{k \times n} = \{X_{ji}\}$$

Example:
$$X = \begin{bmatrix} 1 & 6 & 4 \\ 3 & 2 & 2 \\ 4 & 1 & 1 \\ 5 & 3 & 5 \end{bmatrix}
\qquad
X' = \begin{bmatrix} 1 & 3 & 4 & 5 \\ 6 & 2 & 1 & 3 \\ 4 & 2 & 1 & 5 \end{bmatrix}$$

1. Operations on Vectors

(a) Multiplication by a scalar
$$2 \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 \cdot 2 \\ 2 \cdot 3 \\ 2 \cdot 4 \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \\ 8 \end{bmatrix}$$

(b) Addition of two vectors: $U + V$ = sum of corresponding elements; the orders have to be the same.

(c) Linear combination: $K_1 U + K_2 V$, where $K_1$ and $K_2$ are constants.

(d) Vector multiplication
$$a'b = [1 \; 2 \; 3] \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6$$
$a'$ is $1 \times 3$ and $b$ is $3 \times 1$: the numbers of elements have to be the same.
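The inner product above is a one-line computation in numpy (a quick check, not part of the original notes):

```python
import numpy as np

a = np.array([1., 2., 3.])   # a' is 1 x 3
b = np.array([4., 5., 6.])   # b  is 3 x 1
ab = a @ b                   # 1*4 + 2*5 + 3*6
```

If the two vectors had different lengths, `@` would raise an error, mirroring the conformability requirement.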


A special vector: $S$, the sum vector
$$S_{n \times 1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\qquad \text{so that} \qquad
S'a = \sum_{i=1}^{n} a_i$$

2. Operations on Matrices

(a) Multiplication by a scalar: $KA = \{K a_{ij}\}$

(b) Addition of two matrices: sum of corresponding elements.

(c) Equality of matrices: the orders have to be the same.

(d) Matrix multiplication: $A_{n \times k} B_{k \times m}$
$$\underset{n \times k}{A}\;\underset{k \times m}{B} =
\begin{bmatrix} a_{11} & \cdots & a_{1k} \\ a_{21} & \cdots & a_{2k} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} \end{bmatrix}
\begin{bmatrix} b_{11} & \cdots & b_{1m} \\ b_{21} & \cdots & b_{2m} \\ \vdots & & \vdots \\ b_{k1} & \cdots & b_{km} \end{bmatrix}
= \underset{n \times m}{\begin{bmatrix} a_1'b_1 & \cdots & a_1'b_m \\ a_2'b_1 & \cdots & a_2'b_m \\ \vdots & & \vdots \\ a_n'b_1 & \cdots & a_n'b_m \end{bmatrix}}$$
where $a_i'$ is the $i$-th row of $A$ and $b_j$ the $j$-th column of $B$.

The two matrices have to be conformable.

An example:
$$\underset{3 \times 2}{\begin{bmatrix} 2 & 3 \\ 3 & 1 \\ 4 & 2 \end{bmatrix}}
\underset{2 \times 2}{\begin{bmatrix} 6 & 3 \\ 2 & 2 \end{bmatrix}}
= \begin{bmatrix}
2 \cdot 6 + 3 \cdot 2 & 2 \cdot 3 + 3 \cdot 2 \\
3 \cdot 6 + 1 \cdot 2 & 3 \cdot 3 + 1 \cdot 2 \\
4 \cdot 6 + 2 \cdot 2 & 4 \cdot 3 + 2 \cdot 2
\end{bmatrix}$$
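The example product can be verified numerically; the matrices below are copied from the example, the numpy usage is mine:

```python
import numpy as np

A = np.array([[2., 3.],
              [3., 1.],
              [4., 2.]])          # 3 x 2
B = np.array([[6., 3.],
              [2., 2.]])          # 2 x 2; inner dimensions match, so A and B are conformable
C = A @ B                         # 3 x 2 result, entry (i,j) = a_i'b_j
```

Trying `B @ A` instead would fail, since a 2 x 2 and a 3 x 2 matrix are not conformable in that order.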

A special case:
$$\begin{bmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{bmatrix}
= \begin{bmatrix} a_1'\beta \\ a_2'\beta \\ \vdots \\ a_n'\beta \end{bmatrix}$$


3. Some Special Matrices

(a) Diagonal matrix
$$A_{n \times n} = \begin{bmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn} \end{bmatrix}$$
has to be square.

(b) The identity matrix
$$I_{n \times n} = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$$

(c) Symmetric matrix: $A = A'$

(d) A scalar matrix
$$\begin{bmatrix} \lambda & & & \\ & \lambda & & \\ & & \ddots & \\ & & & \lambda \end{bmatrix} = \lambda I$$

(e) Idempotent matrix (has to be square):
$$A = A^2 \implies A = A^2 = A^3 = \cdots$$
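A standard example of an idempotent matrix is a projection matrix; the sketch below (my own illustration, with made-up numbers) builds $M = I - X(X'X)^{-1}X'$ and checks $M = M^2 = M^3$:

```python
import numpy as np

# Hypothetical full-column-rank X; M projects onto the space orthogonal to its columns.
X = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
M = np.eye(3) - X @ np.linalg.inv(X.T @ X) @ X.T   # idempotent by construction
```

Multiplying `M` by itself any number of times returns `M` again, up to floating-point rounding.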


4. Some Properties of Matrices

(a) $(AB)' = B'A'$, $(ABC)' = C'B'A'$

(b) $(A + B) + C = A + (B + C)$

(c) $(AB)C = A(BC)$

(d) $A(B + C) = AB + AC$

(e) $AI = A$

(f) $(A + B)' = A' + B'$

5. Trace of a Square Matrix

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}$$

Properties of trace: the trace is invariant under cyclic permutations (whenever the products are defined):
$$\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB)$$
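The cyclic property can be checked on random conformable matrices; a small numpy sketch (the shapes and random draws are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # shapes chosen so that ABC is square (2 x 2)
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

t1 = np.trace(A @ B @ C)          # 2 x 2 product
t2 = np.trace(B @ C @ A)          # cyclic permutation, a 3 x 3 product
t3 = np.trace(C @ A @ B)          # cyclic permutation, a 4 x 4 product
```

Note the three products have different orders, yet their traces coincide; a non-cyclic reordering such as $\operatorname{tr}(ACB)$ is not even defined here.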

6. Matrix Inverse

In algebra we have $ab = 1 \implies b = \frac{1}{a}$.

In matrix algebra we ask: given $A_{n \times n}$, does there exist $B$ such that $AB = I_n$?

Answer: If the columns of $A$ are linearly independent, then there exists $B$ such that $AB = I$. In that case $B$ is denoted $A^{-1}$, i.e. $AA^{-1} = I$.


Linear Independence: $a_1, a_2, \ldots, a_n$ are linearly independent if $\sum_i c_i a_i = 0$ only for $c_1 = c_2 = \cdots = c_n = 0$. If not, then some $a_i$ can be written as a linear combination of the other $a_i$'s.

Theorem: If all columns of $A$ are linearly independent, then so are all the rows. Then there exists $C$ such that $CA = I$.

Now
$$C = CI = C(AB) = (CA)B = IB = B$$
Therefore $C = B = A^{-1}$.

Therefore, if $A$ is a square matrix with all columns (rows) linearly independent, then there exists a unique matrix, called the inverse of $A$ and denoted by $A^{-1}$, such that
$$AA^{-1} = A^{-1}A = I$$
Such an $A$ is non-singular.

7. Properties of Inverse

(a) $[A^{-1}]^{-1} = A$

(b) $[A']^{-1} = [A^{-1}]'$

(c) $[AB]^{-1} = B^{-1}A^{-1}$

8. Calculation of the Inverse

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

Replace each element by its minor: $\begin{bmatrix} a_{22} & a_{21} \\ a_{12} & a_{11} \end{bmatrix}$


Sign the minors, i.e. get the cofactors $(-1)^{i+j}$ times the minors: $\begin{bmatrix} a_{22} & -a_{21} \\ -a_{12} & a_{11} \end{bmatrix}$

Transpose: $\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} = \operatorname{Adj}(A)$

Get the determinant: $a_{11}a_{22} - a_{12}a_{21} = |A|$

Divide each element of $\operatorname{Adj}(A)$ by $|A|$. Therefore
$$A^{-1} = \frac{1}{|A|}\operatorname{Adj}(A)$$

For a $3 \times 3$ matrix
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

Step 1: Minors; e.g. for the first row they are
$$(a_{22}a_{33} - a_{23}a_{32}), \quad (a_{21}a_{33} - a_{23}a_{31}), \quad (a_{21}a_{32} - a_{22}a_{31})$$

Step 2: Cofactors; apply the sign pattern
$$\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}$$

Step 3: Transpose the cofactor matrix to get the adjoint.


Step 4: Determinant (expanding along the first row)
$$|A| = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})$$

Step 5: Inverse

Divide every element of the adjoint (Step 3) by the determinant (Step 4).
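For the $2 \times 2$ case the adjoint recipe can be carried out directly and compared with a library inverse; a sketch with made-up numbers (not from the notes):

```python
import numpy as np

A = np.array([[3., 1.],
              [2., 4.]])          # a hypothetical non-singular 2 x 2 matrix

# Steps from the text: minors -> cofactors -> transpose (adjoint) -> divide by |A|.
adj = np.array([[ A[1, 1], -A[0, 1]],
                [-A[1, 0],  A[0, 0]]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
A_inv = adj / det                 # A^{-1} = Adj(A) / |A|
```

The result matches `np.linalg.inv(A)`, and $AA^{-1}$ returns the identity.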

9. The Rank of a Matrix

The rank of a matrix $A$, not necessarily square, is the maximum number of linearly independent columns (or rows).

The maximum number of linearly independent columns of $A$ = the maximum number of linearly independent rows of $A$.

The rank is unique and is denoted by $\rho(A)$:
$$\rho(A_{m \times n}) \le \min[m, n]$$

When $\rho(A) = m < n$, $A$ has full row rank.

When $\rho(A) = n < m$, $A$ has full column rank.

If $A$ is a square matrix of order $n$ with full row (column) rank, then $A$ is non-singular.

Example:
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 0 & 1 & 1 \\ 2 & 2 & 4 & 5 \\ 3 & 6 & 7 & 4 \end{bmatrix}$$
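In this example the third row equals the sum of the first two, so the rows are linearly dependent; a quick numpy check (the matrix is from the example, the rank claim is my own observation):

```python
import numpy as np

A = np.array([[1., 2., 3., 4.],
              [1., 0., 1., 1.],
              [2., 2., 4., 5.],   # equals row 1 + row 2, so the rows are dependent
              [3., 6., 7., 4.]])
r = np.linalg.matrix_rank(A)      # maximum number of linearly independent rows/columns
```

Since $\rho(A) < 4$, this square matrix is singular and has no inverse.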


Summary of Basic Matrix Algebra

1. Matrix: a rectangular array of elements.
$$A = \begin{bmatrix} 1 & 2 & 3 & 2 \\ 4 & 5 & 6 & 7 \\ 7 & 8 & 9 & 2 \end{bmatrix}_{3 \times 4} = \{a_{ij}\}$$
$A$ is a 3 (rows) $\times$ 4 (columns) matrix.

2. Row vector: $x = [1 \; 2 \; 3 \; 4]_{1 \times 4}$

3. Column vector: $y = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}_{3 \times 1}$

4. Diagonal matrix: $D = \begin{bmatrix} 1 & & \\ & 2 & \\ & & 3 \end{bmatrix}$

5. Symmetric matrix: $\{a_{ij}\} = \{a_{ji}\}$, e.g. $A = \begin{bmatrix} 1 & 2 \\ 2 & 7 \end{bmatrix}$

6. Transpose of a matrix:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}_{2 \times 3}
\qquad
A' = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}_{3 \times 2}$$
Symmetric matrix: $A = A'$.


7. Rank of a matrix: the number of linearly independent rows (columns).
$$\operatorname{Rank}(A_{m \times n}) \le \min[m, n]$$

8. Square matrix $A_{n \times n}$: if $\operatorname{Rank}(A) = n$, then $A$ has an inverse:
$$AA^{-1} = I = A^{-1}A,
\qquad
I_{n \times n} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix}$$

9. Addition of matrices: $A_{n \times m}$ and $B_{n \times m}$ (same order):
$$A + B = \{a_{ij}\} + \{b_{ij}\} = \{a_{ij} + b_{ij}\}$$

10. Multiplication: $A_{n \times m} B_{m \times p} = (AB)_{n \times p}$; the matrices must be conformable (see the earlier example).

11. $(AB)' = B'A'$

12. $(AB)^{-1} = B^{-1}A^{-1}$, assuming $A$ and $B$ are square matrices with full rank.


Quadratic Form and Matrix Derivatives

1. Quadratic Form

Consider the expression $q_1 = 2X_1^2 + X_1X_2 + X_3^2$. Calling $X$ the column vector of the $X$'s, i.e. $X = [X_1, X_2, X_3]'$, a quadratic form can be put in the form $q = X'AX$ with $A$ symmetric. $A$ is unique once the order of $X$ is chosen. $A$ has

- in the diagonal, $a_{ii}$, the coefficient attached to $X_i^2$;
- in the off-diagonal, $a_{ij}$, one half of the coefficient attached to $X_iX_j$.

In our example:
$$A = \begin{bmatrix} 2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Example 1: $A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$
$$X'AX = 2X_1^2 + 2X_1X_2 + X_2^2 = (X_1 + X_2)^2 + X_1^2 > 0 \quad \forall X \ne 0$$

Example 2: $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$
$$X'AX = X_1^2 + 2X_1X_2 + X_2^2 = (X_1 + X_2)^2 \ge 0 \quad \forall X,$$
but there exists $X \ne 0$ such that $X'AX = 0$.
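The difference between the two examples shows up at any $X$ with $X_1 + X_2 = 0$; a small numpy check (the matrices are from the examples, the test vector is my own):

```python
import numpy as np

A1 = np.array([[2., 1.],
               [1., 1.]])         # Example 1: positive definite
A2 = np.array([[1., 1.],
               [1., 1.]])         # Example 2: positive semi-definite only

x = np.array([1., -1.])           # a nonzero vector with X1 + X2 = 0
q1 = x @ A1 @ x                   # (X1 + X2)^2 + X1^2 = 0 + 1
q2 = x @ A2 @ x                   # (X1 + X2)^2 = 0 even though x != 0
```

So `A1` yields a strictly positive value at this nonzero `x`, while `A2` vanishes there.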

    Definition:


A quadratic form is said to be positive definite if $X'AX > 0$ for all $X \ne 0$.

A quadratic form is said to be positive semi-definite if $X'AX \ge 0$ for all $X$ and there exists $X \ne 0$ such that $X'AX = 0$.

Remarks:

(a) A matrix is said to be n.n.d (non-negative definite) if it is either p.d or p.s.d.

(b) The concepts of n.d and n.s.d can be defined similarly (by reversing the signs).

(c) A symmetric matrix $A$ is said to be p.d (p.s.d) if the associated quadratic form is p.d (p.s.d).

There are three equivalent conditions for a symmetric n.n.d matrix $A$ to be p.d. These are iff conditions:

(a) The matrix $A$ is non-singular.

(b) There exists a non-singular matrix $P$ such that $P'P = A$.

(c) There exists a non-singular matrix $Q$ such that $Q'AQ = I$.

Some more properties related to quadratic forms:

(a) Let $B$ be any $n \times k$ matrix. Then

i. $B'B$ (of order $k \times k$) is n.n.d

ii. $B'B$ is p.d if $\operatorname{rank}(B) = k$

iii. $B'B$ is p.s.d if $\operatorname{rank}(B) < k$

(b) $A$ p.d, $B$ n.n.d $\implies$ $A + B$ is p.d.

(c) $A$ p.d $n \times n$, $B$ any $n \times k$:


i. $B'AB$ (of order $k \times k$) is n.n.d

ii. $B'AB$ is p.d if $\operatorname{rank}(B) = k$

iii. $B'AB$ is p.s.d if $\operatorname{rank}(B) < k$

2. Matrix Derivatives

(a) Scalar Function

$Y = f(X)$ where $X$ is a vector. For example,
$$Y = A X_1^{\alpha} X_2^{\beta} = f(X_1, X_2), \qquad X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$$

Definition:
$$\frac{\partial Y}{\partial X} = \begin{bmatrix} \partial Y/\partial X_1 \\ \partial Y/\partial X_2 \\ \vdots \\ \partial Y/\partial X_n \end{bmatrix}
\quad \text{(gradient vector, column form)}$$
$$\frac{\partial Y}{\partial X'} = \begin{bmatrix} \dfrac{\partial Y}{\partial X_1} & \cdots & \dfrac{\partial Y}{\partial X_n} \end{bmatrix}
\quad \text{(gradient vector, row form)}$$

In our example:
$$\frac{\partial Y}{\partial X} = \begin{bmatrix} \alpha A X_1^{\alpha - 1} X_2^{\beta} \\ \beta A X_1^{\alpha} X_2^{\beta - 1} \end{bmatrix}$$

i. Special case: linear function
$$Y = P_1X_1 + P_2X_2 + \cdots + P_nX_n = P'X = X'P$$
$$\frac{\partial Y}{\partial X'} = [P_1 \; P_2 \; \cdots \; P_n] = P', \qquad \frac{\partial Y}{\partial X} = P$$

ii. Special case: quadratic form
$$Y = X'AX, \quad A \text{ symmetric} \implies \frac{\partial Y}{\partial X} = 2AX$$
Example: $A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$
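The gradient formula $\partial(X'AX)/\partial X = 2AX$ can be confirmed against a central-difference approximation; a numpy sketch using the example's $A$ (the evaluation point is my own choice):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])          # the symmetric A from the example
x = np.array([0.5, -1.0])         # an arbitrary evaluation point

grad = 2 * A @ x                  # analytic gradient of Y = X'AX

# Numerical check via central differences, coordinate by coordinate.
eps = 1e-6
num = np.zeros(2)
for i in range(2):
    e = np.zeros(2)
    e[i] = eps
    num[i] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)
```

For a quadratic, the central difference is exact up to floating-point rounding, so the two gradients agree closely.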


(b) Vector Function

$$Y_{m \times 1} = F_{m \times 1}(X_{n \times 1}):
\qquad Y_1 = F_1(X), \quad Y_2 = F_2(X), \quad \ldots, \quad Y_m = F_m(X)$$

Then
$$\frac{\partial Y}{\partial X'} = \begin{bmatrix}
\dfrac{\partial Y_1}{\partial X_1} & \dfrac{\partial Y_1}{\partial X_2} & \cdots & \dfrac{\partial Y_1}{\partial X_n} \\
\vdots & & & \vdots \\
\dfrac{\partial Y_m}{\partial X_1} & \dfrac{\partial Y_m}{\partial X_2} & \cdots & \dfrac{\partial Y_m}{\partial X_n}
\end{bmatrix}$$
the Jacobian matrix.

Special case: linear vector functions
$$Y_1 = P_1'X, \quad Y_2 = P_2'X, \quad \ldots, \quad Y_m = P_m'X$$
$$Y = PX, \qquad P = \begin{bmatrix} P_1' \\ P_2' \\ \vdots \\ P_m' \end{bmatrix}$$
Then
$$\frac{\partial Y}{\partial X'} = P$$
In particular, writing $X = IX$,
$$\frac{\partial X}{\partial X'} = I$$


(c) Application of Derivatives

$$q = \tfrac{1}{2}X'AX + b'X + c \quad \text{where } A \text{ is p.d.}$$
$$\frac{\partial q}{\partial X} = \tfrac{1}{2} \cdot 2AX + b = AX + b = 0 \quad \text{(F.O.C.)}$$
Therefore
$$AX^* = -b \implies X^* = -A^{-1}b \quad [A^{-1} \text{ exists since } A \text{ is p.d.}]$$
Also
$$\frac{\partial^2 q}{\partial X \partial X'} = A \quad \text{(the Hessian matrix)}$$
is p.d, so $X^*$ defines a minimum of $q$.

Proof:

Let $X = X^* + Z$. Then
$$X'AX = (X^* + Z)'A(X^* + Z) = X^{*\prime}AX^* + 2X^{*\prime}AZ + Z'AZ$$
Therefore
$$q = \tfrac{1}{2}X^{*\prime}AX^* + X^{*\prime}AZ + \tfrac{1}{2}Z'AZ + b'(X^* + Z) + c$$
$$= \underbrace{\tfrac{1}{2}X^{*\prime}AX^* + b'X^* + c}_{q^*} + X^{*\prime}AZ + \tfrac{1}{2}Z'AZ + b'Z$$
$$= q^* + (-A^{-1}b)'AZ + \tfrac{1}{2}Z'AZ + b'Z$$
$$= q^* - b'Z + \tfrac{1}{2}Z'AZ + b'Z$$
$$= q^* + \tfrac{1}{2}Z'AZ > q^* \quad \text{for } Z \ne 0$$
Therefore $q > q^*$ for $X \ne X^*$.
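The minimizer $X^* = -A^{-1}b$ can be checked numerically: the gradient vanishes at $X^*$ and any perturbation raises $q$. A numpy sketch (the particular $A$, $b$, $c$, and perturbation are made up):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])          # a hypothetical positive definite A
b = np.array([1., -1.])
c = 3.0

def q(x):
    # q = (1/2) x'Ax + b'x + c
    return 0.5 * x @ A @ x + b @ x + c

x_star = -np.linalg.inv(A) @ b    # X* = -A^{-1} b from the first-order condition

grad = A @ x_star + b             # should be the zero vector
z = np.array([0.3, -0.7])         # an arbitrary nonzero perturbation
```

Here `q(x_star + z) - q(x_star)` equals $\tfrac{1}{2}z'Az$, which is positive because $A$ is p.d.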


Matrix Statistics

(a) Random Vectors and Matrices

If $X_1, X_2, \ldots, X_n$ are random variables then
$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}$$
is a random vector; its elements are r.v.'s.

Likewise, $W = \{W_{ij}\}$ is a random matrix when the $W_{ij}$'s are all random variables.

(b) Expectation
$$E(X) = [E(X_i)] = \begin{bmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{bmatrix}$$
Let $E(X_i) = \mu$ for all $i$. Then
$$E(X) = \begin{bmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{bmatrix}
= \begin{bmatrix} \mu \\ \vdots \\ \mu \end{bmatrix}
= \mu \, S_{n \times 1}$$
where $S$ is the sum vector.

Properties of Expectation (let $A$, $B$, $C$, $U$ be constant):

i. $E(U) = U$

ii. $E(AX) = AE(X)$

iii. $E(X + Y) = E(X) + E(Y)$

iv. $E(BXC) = B\,E(X)\,C$

v. $E(W_1W_2) = E(W_1)E(W_2)$ when $W_1$ and $W_2$ are independent


(c) Variance and Covariance Matrix

$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix},
\qquad
E(X) = \begin{bmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{bmatrix}
= \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \mu,
\qquad E(X_i) = \mu_i$$

i. There are $n$ expectations.

ii. There are variances and covariances:

- there are $n$ variances $E[X_i - \mu_i]^2 = \sigma_{ii} > 0$;
- there are $n(n-1)/2$ covariances $E[X_i - \mu_i][X_j - \mu_j] = \sigma_{ij}$, $i \ne j$, with $\sigma_{ij} = \sigma_{ji}$.

The variance-covariance matrix of $X$ can be written as
$$V(X) = E[(X - \mu)(X - \mu)'] = \{E(X_i - \mu_i)(X_j - \mu_j)\} = \{\sigma_{ij}\} = V$$
In the diagonal we have variances and in the off-diagonal we have covariances. The matrix is symmetric.

Remarks:

i. If the $X_i$'s are uncorrelated then $\sigma_{ij} = 0$ for all $i \ne j$, so $V = \operatorname{diag}\{\sigma_{ii}\}$.

ii. In addition, if there is homoskedasticity, i.e. $\sigma_{ii} = \sigma^2$ for all $i$, then $V = \sigma^2 I_n$.

iii. If $E(X) = 0$ then $V(X) = E(XX')$.

(d) Linear Transformation

Consider $X$ with $E(X) = \mu$, $V(X) = V$. Define $Y = AX$, a linear transformation of $X$.


Then
$$E(Y) = A\mu, \qquad V(Y) = AVA'$$

Proof: $Y = AX$
$$E(Y) = E(AX) = AE(X) = A\mu$$
$$\begin{aligned}
V(Y) &= E[(Y - E(Y))(Y - E(Y))'] \\
&= E[AX - A\mu][AX - A\mu]' \\
&= E[A(X - \mu)][A(X - \mu)]' \\
&= E[A(X - \mu)(X - \mu)'A'] \\
&= A \, E[(X - \mu)(X - \mu)'] \, A' \\
&= AVA'
\end{aligned}$$
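The moment formulas $E(Y) = A\mu$ and $V(Y) = AVA'$ can be illustrated by simulation; a numpy sketch (the particular $\mu$, $V$, $A$, seed, and sample size are my own, and the agreement is only up to sampling error):

```python
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1., 2.])
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
A = np.array([[1.,  1.],
              [1., -1.],
              [2.,  0.]])                           # a 3 x 2 linear transformation

X = rng.multivariate_normal(mu, V, size=200_000)    # rows are draws of X
Y = X @ A.T                                         # Y = AX applied draw by draw

EY = Y.mean(axis=0)                                 # sample analogue of A mu
VY = np.cov(Y, rowvar=False)                        # sample analogue of A V A'
```

With 200,000 draws the sample mean and covariance sit well within sampling error of $A\mu$ and $AVA'$.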

Now consider the scalar linear transformation $Y = Z'X$:

$V(Y) > 0$ if $Y$ is not a constant, i.e. if $Z'X$ is not a constant.

$V(Y) = 0$ if $Y$ is a constant, i.e. if $Z'X$ is a constant, i.e. the $X_i$ are linearly dependent.

$$V(Y) = Z'VZ
\begin{cases}
> 0 \;\; \forall\, Z \ne 0 & \text{if the } X_i \text{ are linearly independent} \\
= 0 \text{ for some } Z \ne 0 & \text{if the } X_i \text{ are linearly dependent}
\end{cases}$$

Conclusion: the variance-covariance matrix is always p.d, except in cases where the $X_i$'s are linearly dependent, in which case it is a p.s.d matrix.

Corollary: Let $E(X) = \mu$, $V(X) = V$ with $V$ positive definite. Then it is possible to get a standard vector through a linear transformation, i.e. one with $E(Y) = 0$, $V(Y) = I$.

Define $Y = Q[X - \mu]$:
$$E(Y) = QE(X - \mu) = Q[E(X) - \mu] = 0$$
$$V(Y) = QVQ' = I \quad \text{for some } Q$$


(e) Expectation of a Quadratic Form

Let $E(X) = 0$, $V(X) = V$, and $q = X'AX$ where $A$ is a symmetric matrix of constants. Then
$$E(q) = E(X'AX) = \operatorname{tr}(AV)$$

Trace: the trace of a square matrix is the sum of its diagonal elements.

Proof: $X'AX$ is a scalar, and so equal to its trace:
$$X'AX = \operatorname{tr}(X'AX)$$
$$E(X'AX) = E(\operatorname{tr} X'AX)$$
The trace is invariant under cyclic permutations:
$$E(X'AX) = E(\operatorname{tr} AXX')$$
The trace is a linear operator:
$$E(X'AX) = \operatorname{tr} E(AXX') = \operatorname{tr} A\,E(XX') = \operatorname{tr}(AV)$$

Example: $V = \sigma^2 I_n$, $A$ idempotent of rank $K$. Therefore
$$E(X'AX) = \operatorname{tr}(A \sigma^2 I) = \sigma^2 \operatorname{tr}(A) = \sigma^2 K$$
[$\operatorname{rank}(A) = \operatorname{tr}(A)$ since $A$ is idempotent.]
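The example $E(X'AX) = \sigma^2 K$ can be verified deterministically via the trace identity; a numpy sketch (the projection matrix and $\sigma^2 = 2.5$ are my own illustration):

```python
import numpy as np

sigma2 = 2.5
n = 5
V = sigma2 * np.eye(n)                   # homoskedastic, uncorrelated case

# An idempotent A of rank K = 2: projection onto the column space of some X.
X = np.array([[1., 0.],
              [1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 4.]])
A = X @ np.linalg.inv(X.T @ X) @ X.T     # idempotent, tr(A) = rank(A) = 2

expected_q = np.trace(A @ V)             # E(X'AX) = tr(AV) = sigma^2 * K
```

No simulation is needed: the result follows from $\operatorname{tr}(A\sigma^2 I) = \sigma^2\operatorname{tr}(A)$ and the idempotency of `A`.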

(f) Multivariate Normal Distribution and Related Distributions


i. Introduction

Let $X_i$, $i = 1, 2, \ldots, n$, be $n$ independent normal random variables with
$$E(X_i) = \mu_i, \qquad V(X_i) = \sigma_i^2, \qquad X_i \sim N(\mu_i, \sigma_i^2)$$

A. Density of $X_i$:
$$f(X_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \, e^{-\frac{1}{2\sigma_i^2}(X_i - \mu_i)^2}$$

B. The joint density of $X_1, X_2, \ldots, X_n$, when the $X_i$'s are independent, is the product of the individual densities:
$$f(X_1, X_2, \ldots, X_n) = (2\pi)^{-\frac{n}{2}} \Big(\prod_{i=1}^{n} \sigma_i^2\Big)^{-\frac{1}{2}} e^{-\frac{1}{2}\sum_i \frac{1}{\sigma_i^2}(X_i - \mu_i)^2}$$

Let us write the above in vector-matrix notation:
$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix},
\qquad
E(X) = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix} = \mu,
\qquad
V(X) = \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_n^2 \end{bmatrix} = V$$

Note:
$$|V| = \sigma_1^2 \sigma_2^2 \cdots \sigma_n^2,
\qquad
V^{-1} = \begin{bmatrix} 1/\sigma_1^2 & & \\ & \ddots & \\ & & 1/\sigma_n^2 \end{bmatrix}$$
and
$$\sum_i \frac{1}{\sigma_i^2}(X_i - \mu_i)^2
= \frac{1}{\sigma_1^2}(X_1 - \mu_1)^2 + \frac{1}{\sigma_2^2}(X_2 - \mu_2)^2 + \cdots + \frac{1}{\sigma_n^2}(X_n - \mu_n)^2$$
is the quadratic form $(X - \mu)'V^{-1}(X - \mu)$.

Therefore
$$f(X_1, \ldots, X_n) = (2\pi)^{-\frac{n}{2}} |V|^{-\frac{1}{2}} e^{-\frac{1}{2}(X - \mu)'V^{-1}(X - \mu)}$$


ii. Formal definition: The random vector $X$, with $E(X) = \mu$ and $V(X) = V$, is said to be normally distributed iff
$$f(X) = (2\pi)^{-\frac{n}{2}} |V|^{-\frac{1}{2}} e^{-\frac{1}{2}(X - \mu)'V^{-1}(X - \mu)}$$
We then write $X \sim N(\mu, V)$.

When $X \sim N(0, I_n)$ we say $X$ is a standard normal vector.

iii. Properties of the normal distribution

A. If $X \sim N(\mu, V)$ and
$$\underset{m \times 1}{Y} = \underset{m \times n}{A}\,\underset{n \times 1}{X} + \underset{m \times 1}{b}$$
then $Y \sim N(A\mu + b, \, AVA')$:
$$E(Y) = A\mu + b, \qquad V(Y) = AVA' \text{ is p.d. since } \rho(A) = m$$

B. The orthogonal transformation of a standard normal vector is also a standard normal vector.
$$\underset{n \times 1}{Z} = \underset{n \times n}{C}\,\underset{n \times 1}{X},
\qquad C \text{ orthogonal: } C^{-1} = C', \;\; CC' = C'C = I$$
$$E(Z) = CE(X) = 0$$
$$V(Z) = CVC' = CIC' = CC' = I$$

Corollary: If $X \sim N(\mu, V)$ then by a suitable transformation we can get a standard normal vector. Since $V$ is p.d, there exists $Q$ such that $QVQ' = I$. Let
$$Y = Q(X - \mu) \implies E(Y) = 0, \quad V(Y) = QVQ' = I$$


$$Y \sim N(0, I)$$

C. For normal variables, zero covariance $\implies$ independence.

$X \sim N(\mu, V)$, partitioned as
$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}
\begin{matrix} S \times 1 \\ (n - S) \times 1 \end{matrix},
\qquad
E(X) = \begin{bmatrix} E(X_1) \\ E(X_2) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},
\qquad
V(X) = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}$$

Zero covariance between $X_1$ and $X_2$ means $V_{12} = V_{21}' = 0$. Then
$$\begin{aligned}
f(X) &= (2\pi)^{-\frac{n}{2}} (|V_{11}||V_{22}|)^{-\frac{1}{2}}
\exp\Big\{-\tfrac{1}{2}\,[(X_1 - \mu_1)' \;\; (X_2 - \mu_2)']
\begin{bmatrix} V_{11}^{-1} & 0 \\ 0 & V_{22}^{-1} \end{bmatrix}
\begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{bmatrix}\Big\} \\
&= (2\pi)^{-\frac{n}{2}} |V_{11}|^{-\frac{1}{2}} |V_{22}|^{-\frac{1}{2}}
e^{-\frac{1}{2}[(X_1 - \mu_1)'V_{11}^{-1}(X_1 - \mu_1) + (X_2 - \mu_2)'V_{22}^{-1}(X_2 - \mu_2)]} \\
&= (2\pi)^{-\frac{S}{2}} |V_{11}|^{-\frac{1}{2}} e^{-\frac{1}{2}(X_1 - \mu_1)'V_{11}^{-1}(X_1 - \mu_1)}
\cdot (2\pi)^{-\frac{n-S}{2}} |V_{22}|^{-\frac{1}{2}} e^{-\frac{1}{2}(X_2 - \mu_2)'V_{22}^{-1}(X_2 - \mu_2)}
\end{aligned}$$
so $X_1$ and $X_2$ are independent.

iv. The chi-squared

$X \sim N(0, I_n)$, so the $X_i$ are independent, and
$$X_1^2 + X_2^2 + \cdots + X_n^2 \sim \chi^2(n)$$


Characterization

A. $X_1^2 + X_2^2 + \cdots + X_n^2 = X'X$. Therefore, for $X \sim N(0, I_n)$,
$$X'X \sim \chi^2(n)$$

B. If $Y \sim N(\mu, V)$ then
$$(Y - \mu)'V^{-1}(Y - \mu) \sim \chi^2(n)$$
Since $V$ is p.d, there exists $Q$ such that $QVQ' = I$. Let $X = Q(Y - \mu)$; $X$ is normal with
$$E(X) = 0, \qquad V(X) = QVQ' = I$$
Therefore $X \sim N(0, I)$ and $X'X \sim \chi^2(n)$, i.e.
$$(Y - \mu)'Q'Q(Y - \mu) \sim \chi^2(n)$$
Now, from $QVQ' = I$:
$$Q^{-1}QVQ' = Q^{-1}
\implies VQ' = Q^{-1}
\implies VQ'(Q')^{-1} = Q^{-1}(Q')^{-1}
\implies V = (Q'Q)^{-1}$$
$$\implies V^{-1} = Q'Q$$
Therefore $(Y - \mu)'V^{-1}(Y - \mu) \sim \chi^2(n)$.

C. $Z \sim N(0, I_n)$, partitioned
$$Z = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix}
\begin{matrix} S \times 1 \\ (n - S) \times 1 \end{matrix}$$
Then $Z_1 \sim N(0, I_S)$ and $Z_2 \sim N(0, I_{n-S})$, independent, and
$$Z_1'Z_1 \sim \chi^2(S), \qquad Z_2'Z_2 \sim \chi^2(n - S), \quad \text{independent}$$


Now observe
$$\underset{S \times 1}{Z_1} = \underset{S \times n}{[I_S \;\; 0]}\,\underset{n \times 1}{Z} = AZ$$
$$Z_1'Z_1 = Z'A'AZ
= Z'\begin{bmatrix} I_S \\ 0 \end{bmatrix}[I_S \;\; 0]\,Z
= Z'\begin{bmatrix} I_S & 0 \\ 0 & 0 \end{bmatrix}Z$$
Therefore $Z_1'Z_1 = Z'MZ$ with $M$ idempotent and $\rho(M) = S$:
$$Z'MZ \sim \chi^2(S)$$
Similarly
$$Z_2 = [0 \;\; I_{n-S}]\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = [0 \;\; I_{n-S}]\,Z$$
$$Z_2'Z_2 = Z'\begin{bmatrix} 0 & 0 \\ 0 & I_{n-S} \end{bmatrix}Z = Z'\bar{M}Z,
\qquad \bar{M} \text{ idempotent}, \;\; \rho(\bar{M}) = n - S$$
Therefore $Z'\bar{M}Z \sim \chi^2(n - S)$.

Also $\bar{M} = I - M$, and
$$M\bar{M} = M(I - M) = M - M \cdot M = M - M = 0$$

Theorem: If $Z \sim N(0, I_n)$ and $M$ is an idempotent matrix of rank $S$, then
$$Z'MZ \sim \chi^2(S), \qquad Z'(I - M)Z \sim \chi^2(n - S)$$
and the two $\chi^2$ variables are independent since $M(I - M) = 0$.
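The algebraic facts underlying the theorem, that $M$ is idempotent of rank $S$, that $M(I - M) = 0$, and that $Z'MZ + Z'(I - M)Z = Z'Z$, can all be checked directly; a numpy sketch (the sizes $n = 5$, $S = 2$ and the test vector are my own):

```python
import numpy as np

n, S = 5, 2
M = np.zeros((n, n))
M[:S, :S] = np.eye(S)             # M = [[I_S, 0], [0, 0]]
M_bar = np.eye(n) - M             # M_bar = I - M

z = np.arange(1., n + 1.)         # any vector z
q1 = z @ M @ z                    # = Z1'Z1, the first S squared components
q2 = z @ M_bar @ z                # = Z2'Z2, the remaining n - S squared components
```

The two quadratic forms split $z'z$ exactly, mirroring the split of $\chi^2(n)$ into independent $\chi^2(S)$ and $\chi^2(n - S)$ pieces.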
