
Angewandte Multivariate Statistik


Prof. Dr. Ostap Okhrin

Ostap Okhrin 1 of 461

Basis

These slides are strongly based on those made by the Ladislaus von Bortkiewicz Chair of Statistics, Humboldt University Berlin.

Applied Multivariate Statistical Analysis (W. Härdle, L. Simar), lvb.wiwi.hu-berlin.de

Comparison of Batches

An old Swiss 1000-franc bank note.

Example: Swiss bank data
The authorities have measured

X1 = length of the bill
X2 = height of the bill (left)
X3 = height of the bill (right)
X4 = distance of the inner frame to the lower border
X5 = distance of the inner frame to the upper border
X6 = length of the diagonal of the central picture.

Example (cont.)
The dataset consists of 200 measurements on Swiss bank notes. The first half of these bank notes are genuine, the other half are forged bank notes.
It is important to be able to decide whether a given bank note is genuine.
We want to derive a good rule that separates the genuine and counterfeit bank notes.
Which measurement is the most informative? We have to visualize the difference.

Boxplots

The boxplot is a graphical technique for displaying the distribution of variables. It
• helps us in seeing location, skewness, spread, tail length and outlying points,
• is particularly useful in comparing different batches,
• is a graphical representation of the Five Number Summary.

City          Country       Pop. (10000)   Order Statistics
Tokyo         Japan         3420           x(15)
Mexico City   Mexico        2280           x(14)
Seoul         South Korea   2230           x(13)
New York      USA           2190           x(12)
Sao Paulo     Brazil        2020           x(11)
Bombay        India         1985           x(10)
Delhi         India         1970           x(9)
Shanghai      China         1815           x(8)
Los Angeles   USA           1800           x(7)
Osaka         Japan         1680           x(6)
Jakarta       Indonesia     1655           x(5)
Calcutta      India         1565           x(4)
Cairo         Egypt         1560           x(3)
Manila        Philippines   1495           x(2)
Karachi       Pakistan      1430           x(1)

Table 1: The 15 largest world cities in 2006.

Five Number Summary

• Upper quartile FU
• Lower quartile FL
• Median = deepest point
• Extremes

Consider the order statistics. The depth of a data value x(i) is min(i, n − i + 1).

depth of fourth = ([depth of median] + 1) / 2

Median

The order statistics x(1), x(2), . . . , x(n) are the ordered values x1, x2, . . . , xn, where x(1) denotes the minimum and x(n) the maximum.

Median M:

M = x((n+1)/2)                          if n is odd
M = (1/2) { x(n/2) + x(n/2+1) }         if n is even

Construction of the Boxplot

Median: 1815 (depth of data 8)
Fourths (depth = 4.5): 1610 = FL, 2105 = FU
Extremes (depth = 1): 1430, 3420
F-spread: FU − FL = dF
Outside bars: FU + 1.5dF, FL − 1.5dF

1. Construct the box with borders at FU and FL
2. Draw the Median as | and the Mean as ...
3. Draw whiskers to the data within the outside bars
4. Mark outliers by • if they are outside [FL − 1.5dF, FU + 1.5dF] and by ? if they lie outside [FL − 3dF, FU + 3dF]
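The construction above can be sketched in a few lines, a minimal version using the world-cities population data of Table 1; the helper `at_depth` (averaging the two neighbours at a half-integer depth) is our own, not part of the lecture's quantlets.

```python
# Five Number Summary and outlier fences for the world-cities data (Table 1).
pop = [3420, 2280, 2230, 2190, 2020, 1985, 1970, 1815, 1800,
       1680, 1655, 1565, 1560, 1495, 1430]

def at_depth(xs, d):
    # value at depth d in a sorted list; average neighbours for half-integer depth
    lo = int(d) - 1
    return xs[lo] if d == int(d) else (xs[lo] + xs[lo + 1]) / 2

x = sorted(pop)                              # order statistics x(1) <= ... <= x(15)
n = len(x)
depth_median = (n + 1) / 2                   # 8.0 for n = 15
depth_fourth = (int(depth_median) + 1) / 2   # ([depth of median] + 1) / 2 = 4.5

M  = at_depth(x, depth_median)               # median
FL = at_depth(x, depth_fourth)               # lower fourth
FU = at_depth(list(reversed(x)), depth_fourth)  # upper fourth (depth from above)
dF = FU - FL                                 # F-spread
outside = [v for v in x if v < FL - 1.5 * dF or v > FU + 1.5 * dF]
print(M, FL, FU, dF, outside)                # 1815 1610.0 2105.0 495.0 [3420]
```

This reproduces the slide's values: median 1815, fourths 1610 and 2105, and Tokyo (3420) as the only point outside the outside bars.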

Boxplot for world cities. MVAboxcity

Boxplot for the mileage of U.S. American, Japanese and European cars (from left to right). MVAboxcar

Variable X6 (diagonal) of bank notes, the genuine on the left. MVAboxbank6

Variable X1 (length) of bank notes, the genuine on the left. MVAboxbank1

Summary: Boxplots

• Median and mean bars indicate the central locations.
• The relative location of the median (and mean) in the box is a measure of skewness.
• The length of the box and whiskers is a measure of spread.
• The length of the whiskers indicates the tail length of the distribution.

Summary: Boxplots

• The outliers are marked by • if they are outside [FL − 1.5dF, FU + 1.5dF] and by ? if they lie outside [FL − 3dF, FU + 3dF].
• The boxplots do not indicate multi-modality or clusters.
• If we compare the relative size and location of the boxes, we are comparing distributions.

Histograms

fh(x) = n^{−1} h^{−1} Σ_{j ∈ Z} Σ_{i=1}^{n} I{xi ∈ Bj(x0, h)} I{x ∈ Bj(x0, h)}

• Bj(x0, h) = [x0 + (j − 1)h, x0 + jh), j ∈ Z.
• [., .) denotes a left-closed and right-open interval.
• I{•} denotes the indicator function.
• h is a smoothing parameter and controls the width of the histogram bins.
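The estimator above can be evaluated pointwise as follows, a minimal sketch; the data values are made up for illustration, only x0 = 137.8 and the binwidths echo the slides.

```python
# Pointwise histogram estimator f_h(x): locate the bin containing x,
# count observations in that bin, and normalise by n*h.
import math

def f_hat(x, data, x0=137.8, h=0.4):
    n = len(data)
    j = math.floor((x - x0) / h)          # x lies in [x0 + j*h, x0 + (j+1)*h)
    lo, hi = x0 + j * h, x0 + (j + 1) * h
    count = sum(lo <= xi < hi for xi in data)
    return count / (n * h)

data = [138.25, 138.92, 139.13, 139.45, 139.62, 140.07, 140.31, 141.02]
print(f_hat(139.5, data))                  # two of eight points fall in [139.4, 139.8)
```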

Diagonal of counterfeit bank notes. Histograms with x0 = 137.8 and h = 0.1 (upper left), h = 0.2 (lower left), h = 0.3 (upper right), h = 0.4 (lower right). MVAhisbank1

Diagonal of counterfeit bank notes. Histograms with h = 0.4 and origins x0 = 137.65 (upper left), x0 = 137.75 (lower left), x0 = 137.85 (upper right), x0 = 137.95 (lower right). MVAhisbank2

Summary: Histograms

• Modes of the density are detected with a histogram.
• Modes correspond to strong peaks in the histogram.
• Histograms with the same h need not be identical. They also depend on the origin x0 of the grid.
• The influence of the origin x0 is drastic. Changing x0 creates different looking histograms.

Summary: Histograms

• The consequence of a too large h is a flat and unstructured histogram.
• A too small binwidth h results in an unstable histogram.
• There is an optimal bandwidth hopt = (24 √π / n)^{1/3}.
• It is recommended to use averaged histograms. They are kernel densities.
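The optimal binwidth formula is easy to evaluate; a sketch, assuming the slide's hopt = (24 √π / n)^{1/3} (derived for standard normal data):

```python
# Optimal histogram binwidth h_opt = (24 * sqrt(pi) / n)^(1/3).
import math

def h_opt(n):
    return (24 * math.sqrt(math.pi) / n) ** (1 / 3)

print(round(h_opt(200), 3))   # binwidth for a sample of size n = 200
```

Note how slowly the binwidth shrinks with n: the n^{−1/3} rate means quadrupling the sample reduces hopt by only about 37 percent.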

Kernel densities

The histogram (at the center of a bin) can be written as

fh(x) = n^{−1} (2h)^{−1} Σ_{i=1}^{n} I(|x − xi| ≤ h)

Define K(u) = I(|u| ≤ 1/2). Then

fh(x) = n^{−1} h^{−1} Σ_{i=1}^{n} K( (x − xi) / h )

K is the kernel.
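The kernel form generalises immediately to smooth kernels; a minimal sketch with the Gaussian kernel (the data values are invented, chosen to resemble the diagonal measurements):

```python
# Kernel density estimate f_h(x) = (1/(n h)) * sum_i K((x - x_i)/h),
# here with the Gaussian kernel K = phi.
import math

def kde(x, data, h):
    phi = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(phi((x - xi) / h) for xi in data) / (len(data) * h)

data = [138.2, 138.9, 139.1, 139.4, 139.6, 140.1]    # made-up diagonal values
grid = [135 + 0.01 * k for k in range(1000)]         # 135.00 .. 144.99
mass = sum(kde(g, data, h=0.4) for g in grid) * 0.01
print(round(mass, 3))                                 # the estimate integrates to ~1
```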

Kernel functions

K(•)                                      Kernel
K(u) = (1/2) I(|u| ≤ 1)                   Uniform
K(u) = (1 − |u|) I(|u| ≤ 1)               Triangle
K(u) = (3/4)(1 − u²) I(|u| ≤ 1)           Epanechnikov
K(u) = (15/16)(1 − u²)² I(|u| ≤ 1)        Quartic (Biweight)
K(u) = (1/√(2π)) exp(−u²/2) = ϕ(u)        Gaussian

Table 2: Kernel functions.

Kernel functions: Uniform, Triangle, Epanechnikov, Quartic (biweight), Gaussian. MVAkernelfunctions

Densities of diagonals of genuine and counterfeit bank notes. MVAdenbank

Choice of the bandwidth h

Silverman's rule of thumb:

Gaussian kernel: K(u) = (1/√(2π)) exp(−u²/2),   hG = 1.06 σ̂ n^{−1/5}

Quartic kernel: K(u) = (15/16)(1 − u²)² I(|u| ≤ 1),   hQ = 2.62 hG

Sample standard deviation: σ̂ = sqrt( n^{−1} Σ_{i=1}^{n} (xi − x̄)² )
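Silverman's rule is a one-liner once σ̂ is computed; a sketch on made-up data, using the n^{−1} variance version shown on the slide:

```python
# Silverman's rule-of-thumb bandwidth h_G = 1.06 * sigma_hat * n^(-1/5),
# and the corresponding quartic-kernel bandwidth h_Q = 2.62 * h_G.
import math

def silverman(data):
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)  # n^{-1} version
    return 1.06 * sigma * n ** (-1 / 5)

data = [9.0, 9.5, 10.1, 10.4, 11.2, 11.8, 12.0, 12.5]   # illustrative sample
hG = silverman(data)
hQ = 2.62 * hG
print(hG, hQ)
```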

Contours of the density of X5, X6 of genuine and counterfeit bank notes. MVAcontbank2

Contours of the density of X4, X5, X6 of genuine and counterfeit bank notes. MVAcontbank3

Summary: Kernel densities

• Kernel densities estimate distribution densities by the kernel method.
• The bandwidth h determines the degree of smoothness of the estimate f̂.
• Kernel densities are smooth functions and they can graphically represent distributions (up to 3 dimensions).

Summary: Kernel densities

• A simple (but not necessarily correct) way to find a good bandwidth is to compute the rule-of-thumb bandwidth hG = 1.06 σ̂ n^{−1/5}. This bandwidth is to be used only in combination with a Gaussian kernel ϕ.
• Kernel density estimates are a good descriptive tool for seeing modes, location, skewness, tails, asymmetry, etc.

Scatterplots

Scatterplots are bivariate or trivariate plots of variables against each other.

• Rotation of data
• Separation lines
• Draftman's plot
• Brushing
• Parallel coordinate plots

2D scatterplot for X5 vs. X6 of the bank notes. Genuine notes are circles, counterfeit are triangles. MVAscabank56

3D scatterplot for (X4, X5, X6) of the bank notes. Genuine notes are circles, counterfeit are triangles. MVAscabank456

Draftman's plot of the bank notes. The pictures in the left-hand column show (X3, X4), (X3, X5) and (X3, X6), in the middle we have (X4, X5) and (X4, X6), and in the lower right (X5, X6). The upper right half contains the corresponding density contour plots. MVAdraftbank4

Summary: Scatterplots

• Scatterplots in two and three dimensions help us in seeing separated points, clouds or sub-clusters.
• They help us in judging positive or negative dependence.
• Draftman scatterplot matrices are useful for detecting structures conditioned on values of certain other variables.
• As the brush of a scatterplot matrix is moving in the point cloud we can study conditional dependence.

Chernoff-Flury Faces

Chernoff-Flury faces for observations 91 to 110 of the bank notes. MVAfacebank10

Six variables - face elements

X1 = 1, 19 (eye sizes)
X2 = 2, 20 (pupil sizes)
X3 = 4, 22 (eye slants)
X4 = 11, 29 (upper hair lines)
X5 = 12, 30 (lower hair lines)
X6 = 13, 14, 31, 32 (face lines and darkness of hair)

Flury faces for observations 1 to 50 of the bank notes. MVAfacebank50

Flury faces for observations 51 to 100 of the bank notes. MVAfacebank50

Flury faces for observations 101 to 150 of the bank notes. MVAfacebank50

Flury faces for observations 151 to 200 of the bank notes. MVAfacebank50

Summary: Faces

• Faces can be used to detect subgroups in multivariate data.
• Subgroups are characterized by similar looking faces.
• Outliers are identified by extreme faces (e.g. dark hair, smile or happy face).
• If one element of X is unusual, the corresponding face element changes significantly in shape.

Andrews' Curves

Each multivariate observation Xi = (Xi,1, . . . , Xi,p) ∈ R^p is transformed into a curve as follows:

p odd:

fi(t) = Xi,1/√2 + Xi,2 sin(t) + Xi,3 cos(t) + . . . + Xi,p−1 sin( ((p − 1)/2) t ) + Xi,p cos( ((p − 1)/2) t )

p even:

fi(t) = Xi,1/√2 + Xi,2 sin(t) + Xi,3 cos(t) + . . . + Xi,p sin( (p/2) t )

such that the observation represents the coefficients of a so-called Fourier series, t ∈ [−π, π].

Andrews' Curves

• Subgroups are characterized by similar curves.
• Outliers are characterized by single curves.
• Order plays an important role in the interpretation.

Let us take the 96th observation of the Swiss bank note dataset,

X96 = (215.6, 129.9, 129.9, 9.0, 9.5, 141.7)

The Andrews' curve is:

f96(t) = 215.6/√2 + 129.9 sin(t) + 129.9 cos(t) + 9.0 sin(2t) + 9.5 cos(2t) + 141.7 sin(3t)
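The curve f96 can be evaluated programmatically; a sketch of the p-even transform using the observation quoted above (the helper `andrews` is our own naming):

```python
# Andrews' curve for a p = 6 (even) observation: the k-th coordinate after the
# first contributes x * sin(freq*t) or x * cos(freq*t) with freq = 1,1,2,2,3,...
import math

def andrews(obs, t):
    val = obs[0] / math.sqrt(2)
    for k, x in enumerate(obs[1:], start=1):
        freq = (k + 1) // 2
        val += x * (math.sin(freq * t) if k % 2 == 1 else math.cos(freq * t))
    return val

x96 = [215.6, 129.9, 129.9, 9.0, 9.5, 141.7]
print([round(andrews(x96, t), 2) for t in (-math.pi, 0.0, math.pi)])
```

At t = 0 all sine terms vanish, so the value reduces to 215.6/√2 + 129.9 + 9.5.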

Andrews' curves of the observations 96–105 of the Swiss bank note data. The order of the variables is 1,2,3,4,5,6. MVAandcur

Let us take the 96th observation of the Swiss bank note dataset,

X96 = (215.6, 129.9, 129.9, 9.0, 9.5, 141.7)

The Andrews' curve using the reversed order of variables is:

f96(t) = 141.7/√2 + 9.5 sin(t) + 9.0 cos(t) + 129.9 sin(2t) + 129.9 cos(2t) + 215.6 sin(3t)

Andrews' curves of the observations 96–105 of the Swiss bank note data. The order of the variables is 6,5,4,3,2,1. MVAandcur2

Summary: Andrews' Curves

• Outliers appear as single Andrews' curves which look different from the rest.
• A subgroup is characterized by a set of similar curves.
• The order of the variables plays an important role for interpretation.
• The order of variables may be optimized by Principal Component Analysis.
• For more than 20 observations we obtain a bad "signal-to-ink ratio", which means we cannot see the structure of so many curves.

Parallel Coordinate Plots

Parallel coordinate plots
• are not based on an orthogonal coordinate system,
• allow to see more than four dimensions.

Idea
Instead of plotting observations in an orthogonal coordinate system, one draws their coordinates in a system of parallel axes. This way of representation is however sensitive to the order of the variables.

Parallel coordinate plot of observations 96–105. MVAparcoo1

The full bank dataset. Genuine bank notes displayed as black lines. The forged bank notes are shown as red lines. MVAparcoo2

Summary: Parallel coordinate plots

• Parallel coordinate plots overcome the visualisation problem of the Cartesian coordinate system for dimensions greater than 4.
• Outliers are seen as outlying polygon curves.
• The order of variables is still important for the detection of subgroups.
• Subgroups may be screened by selective coloring in an interactive manner.

A Short Excursion into Matrix Algebra

A(n × p) =
( a11 · · · a1p )
(  ⋮    ⋱    ⋮  )
( an1 · · · anp )

Definition             Notation
Transpose              A⊤
Sum                    A + B
Difference             A − B
Scalar product         c · A
Product                A · B
Rank                   rank(A)
Trace                  tr(A)
Determinant            det(A) = |A|
Inverse                A^{−1}
Generalised Inverse    A⁻ : A A⁻ A = A

Table 3: Elementary matrix operations.

Name             Definition         Notation    Example
scalar           p = n = 1          a           3
column vector    p = 1              a           (1, 3)⊤
row vector       n = 1              a⊤          (1  3)
vector of ones   (1, . . . , 1)⊤    1n          (1, 1)⊤
vector of zeros  (0, . . . , 0)⊤    0n          (0, 0)⊤
square matrix    n = p              A(p × p)    (2 0; 0 2)

Table 4: Special matrices and vectors.

Name               Definition               Notation     Example
diagonal matrix    aij = 0, i ≠ j, n = p    diag(aii)    (1 0; 0 2)
identity matrix    diag(1, . . . , 1)       Ip           (1 0; 0 1)
unit matrix        aij = 1, n = p           1n 1n⊤       (1 1; 1 1)
symmetric matrix   aij = aji                             (1 2; 2 3)

Table 5: Special matrices and vectors.

Name                      Definition         Example
null matrix               aij = 0            (0 0; 0 0)
upper triangular matrix   aij = 0, i > j     (1 2 4; 0 1 3; 0 0 1)
idempotent matrix         A² = A             (1/2 1/2; 1/2 1/2)
orthogonal matrix         A⊤A = I = AA⊤      (1/√2 1/√2; 1/√2 −1/√2)

Table 6: Special matrices and vectors.

Properties of a Square Matrix

For any A(n × n) and B(n × n) and any scalar c:

tr(A + B) = tr(A) + tr(B)
tr(cA) = c tr(A)
|cA| = c^n |A|
tr(AB) = tr(BA)
|AB| = |BA|
|AB| = |A| |B|
|A^{−1}| = |A|^{−1}
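These identities are easy to verify numerically; a sketch on random 3 × 3 matrices (any nonsingular matrices would do):

```python
# Numerical check of the square-matrix trace and determinant identities.
import numpy as np

rng = np.random.default_rng(0)
A, B, c, n = rng.normal(size=(3, 3)), rng.normal(size=(3, 3)), 2.5, 3

assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
assert np.isclose(np.trace(c * A), c * np.trace(A))
assert np.isclose(np.linalg.det(c * A), c ** n * np.linalg.det(A))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A))
print("all identities hold")
```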

Eigenvalues and Eigenvectors

Square matrix A(n × n)
Eigenvalue λ = Eval(A)
Eigenvector γ = Evec(A)

Aγ = λγ

Using the spectral decomposition, it can be shown that:

|A| = Π_{j=1}^{n} λj

tr(A) = Σ_{j=1}^{n} λj
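The two relations can be checked on a small symmetric example (the matrix is our own choice):

```python
# det(A) = product of eigenvalues, tr(A) = sum of eigenvalues.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, gamma = np.linalg.eigh(A)        # eigenvalues and orthonormal eigenvectors

assert np.allclose(A @ gamma[:, 0], lam[0] * gamma[:, 0])   # A gamma = lambda gamma
assert np.isclose(np.linalg.det(A), np.prod(lam))
assert np.isclose(np.trace(A), np.sum(lam))
print(lam)
```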

Summary: Matrix Algebra

• The determinant |A| is the product of the eigenvalues of A.
• The inverse of a matrix A exists if |A| ≠ 0.
• The trace tr(A) is the sum of the eigenvalues of A.
• The sum of the traces of two matrices equals the trace of the sum of the two matrices.
• The trace tr(AB) equals tr(BA).
• The rank rank(A) is the maximum number of linearly independent rows (columns) of A.

Spectral Decomposition

Every symmetric matrix A(p × p) can be written as

A = Γ Λ Γ⊤ = Σ_{j=1}^{p} λj γj γj⊤

Λ = diag(λ1, · · · , λp)
Γ = (γ1, · · · , γp)

Covariance matrix

Σ = (1 ρ; ρ 1)

Eigenvalues:

| 1 − λ    ρ   |
|   ρ    1 − λ | = 0

λ1 = 1 + ρ, λ2 = 1 − ρ, Λ = diag(1 + ρ, 1 − ρ)

Eigenvectors:

(1 ρ; ρ 1)(x1; x2) = (1 + ρ)(x1; x2)

MVAspecdecomp

x1 + ρx2 = x1 + ρx1
ρx1 + x2 = x2 + ρx2
⇒ x1 = x2.

γ1 = (1/√2; 1/√2),   γ2 = (1/√2; −1/√2)

Γ = (γ1, γ2) = (1/√2  1/√2; 1/√2  −1/√2)

Check: A = Γ Λ Γ⊤
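The check suggested on the slide can be done numerically; a sketch with ρ = 0.5 (the value of ρ is our own choice):

```python
# Verify Gamma Lambda Gamma^T = Sigma for Sigma = [[1, rho], [rho, 1]]
# with the eigenpairs derived on the slide.
import numpy as np

rho = 0.5
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Lam = np.diag([1 + rho, 1 - rho])
s = 1 / np.sqrt(2)
Gamma = np.array([[s, s], [s, -s]])          # columns gamma_1, gamma_2

assert np.allclose(Gamma @ Lam @ Gamma.T, Sigma)
assert np.allclose(Sigma @ Gamma[:, 0], (1 + rho) * Gamma[:, 0])
print("A = Gamma Lambda Gamma^T verified")
```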

Eigenvectors

The direction of the first eigenvector is the main direction of the point cloud. The second eigenvector is orthogonal to the first one.

This eigenvector direction is in general different from the LS regression line.

Scatterplot of observed data (sample size n = 150) and the same data displayed in the coordinate system given by the eigenvectors of the covariance matrix.

Singular Value Decomposition (SVD)

A(n × p), rank(A) = r

A = Γ Λ Δ⊤

Γ(n × r), Δ(p × r), Γ⊤Γ = Δ⊤Δ = Ir and Λ = diag(λ1^{1/2}, . . . , λr^{1/2}), λj > 0.

λj = Eval(A⊤A). Γ and Δ consist of the corresponding eigenvectors of AA⊤ and A⊤A.

The G-inverse of A may be defined as A⁻ = Δ Λ^{−1} Γ⊤. Then A A⁻ A = A.
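The SVD-based G-inverse can be built directly from `numpy.linalg.svd`; a sketch on a rank-2 example matrix (our own choice):

```python
# G-inverse A^- = Delta Lambda^{-1} Gamma^T from the SVD, checking A A^- A = A.
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 1.0]])    # example 3x2 matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))                            # numerical rank
Gamma, Lam, Delta = U[:, :r], np.diag(s[:r]), Vt[:r].T

A_minus = Delta @ np.linalg.inv(Lam) @ Gamma.T
assert np.allclose(A @ A_minus @ A, A)                # defining property
assert np.allclose(Gamma.T @ Gamma, np.eye(r))        # orthonormal columns
print("G-inverse verified")
```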

Summary: Spectral Decomposition

• The spectral (Jordan) decomposition gives a representation of a symmetric matrix in terms of eigenvalues and eigenvectors.
• The eigenvectors belonging to the largest eigenvalues point into the "main direction" of the data.
• The Jordan decomposition allows to easily compute the power of a matrix A: Aα = Γ Λα Γ⊤.
• A^{−1} = Γ Λ^{−1} Γ⊤, A^{1/2} = Γ Λ^{1/2} Γ⊤.

Summary: Spectral Decomposition

• The singular value decomposition (SVD) is a generalization of the Jordan decomposition to non-quadratic matrices.
• The direction of the first eigenvector of the covariance matrix of a two-dimensional point cloud is in general different from the least squares regression line.

Quadratic Forms

For a symmetric matrix A(p × p), the quadratic form can be written as

Q(x) = x⊤ A x = Σ_{i=1}^{p} Σ_{j=1}^{p} aij xi xj

Definiteness:

Q(x) > 0 for all x ≠ 0: positive definite (pd),
Q(x) ≥ 0 for all x ≠ 0: positive semidefinite (psd).

A is pd (psd) iff Q(x) = x⊤Ax is pd (psd).

Example:

Q(x) = x⊤Ax = x1² + x2², A = (1 0; 0 1)
Eigenvalues λ1 = λ2 = 1: positive definite.

Q(x) = (x1 − x2)², A = (1 −1; −1 1)
Eigenvalues λ1 = 2, λ2 = 0: positive semidefinite.

Q(x) = x1² − x2²
Eigenvalues λ1 = 1, λ2 = −1: indefinite.

Theorem
If A is symmetric and Q(x) = x⊤Ax is the corresponding quadratic form, then there exists a transformation x ↦ Γ⊤x = y such that

x⊤ A x = Σ_{i=1}^{p} λi yi²,

where λi are the eigenvalues of A.

Lemma
A > 0 ⇔ λi > 0, i = 1, . . . , p.
A ≥ 0 ⇔ λi ≥ 0, i = 1, . . . , p.

Theorem (Theorem 2.5)
If A and B are symmetric and B > 0, then the maximum of x⊤Ax / x⊤Bx is given by the largest eigenvalue of B^{−1}A. More generally,

max_x (x⊤Ax / x⊤Bx) = λ1 ≥ λ2 ≥ · · · ≥ λp = min_x (x⊤Ax / x⊤Bx),

where λ1, . . . , λp denote the eigenvalues of B^{−1}A. The vector which maximises (minimises) x⊤Ax / x⊤Bx is the eigenvector of B^{−1}A which corresponds to the largest (smallest) eigenvalue of B^{−1}A. If x⊤Bx = 1, we get

max_x x⊤Ax = λ1 ≥ λ2 ≥ · · · ≥ λp = min_x x⊤Ax
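The bound can be illustrated numerically: random vectors never push the ratio x⊤Ax / x⊤Bx outside the interval spanned by the extreme eigenvalues of B^{−1}A. The matrices below are our own example choices.

```python
# Theorem 2.5 numerically: the ratio x'Ax / x'Bx lies between the smallest
# and largest eigenvalue of B^{-1} A (A, B symmetric, B positive definite).
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.array([[2.0, 0.5], [0.5, 1.0]])          # positive definite

lam = np.sort(np.linalg.eigvals(np.linalg.inv(B) @ A).real)
ratios = []
for _ in range(1000):
    x = rng.normal(size=2)
    ratios.append((x @ A @ x) / (x @ B @ x))

assert lam[0] - 1e-9 <= min(ratios) and max(ratios) <= lam[-1] + 1e-9
print("ratio bounded by eigenvalues of B^-1 A")
```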

Summary: Quadratic forms

• A quadratic form can be described by a symmetric matrix A.
• Quadratic forms can always be diagonalized.
• Positive definiteness of a quadratic form is equivalent to positiveness of the eigenvalues of the matrix A.
• The maximum and minimum of a quadratic form under constraints can be expressed in terms of eigenvalues.

Derivatives

For f : R^p → R and a (p × 1) vector x:

∂f(x)/∂x is the column vector of the partial derivatives ∂f(x)/∂xj, j = 1, . . . , p.

∂f(x)/∂x⊤ is the row vector of the same derivatives.

∂f(x)/∂x is called the gradient of f.

Second order derivatives:

∂²f(x)/(∂x ∂x⊤) is the (p × p) Hessian matrix of the second derivatives ∂²f(x)/(∂xi ∂xj), i = 1, . . . , p, j = 1, . . . , p.

Some useful formulae, for A(p × p) with A = A⊤, x(p × 1) ∈ R^p, a(p × 1):

∂a⊤x/∂x = ∂x⊤a/∂x = a

Example:
f : R^p → R, f(x) = a⊤x
a = (1, 2)⊤, x = (x1, x2)⊤

∂a⊤x/∂x = ∂(x1 + 2x2)/∂x = (1, 2)⊤ = a

Derivatives of the quadratic form

∂x⊤Ax/∂x = 2Ax

∂²x⊤Ax/(∂x ∂x⊤) = 2A
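The gradient formula can be confirmed by a finite-difference check; a sketch with an example symmetric A and point x of our own choosing:

```python
# Finite-difference check of d(x'Ax)/dx = 2Ax for a symmetric A.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
x = np.array([0.5, -1.0])
Q = lambda v: v @ A @ v                    # the quadratic form x'Ax

eps = 1e-6
grad = np.array([(Q(x + eps * e) - Q(x - eps * e)) / (2 * eps)
                 for e in np.eye(2)])      # central differences per coordinate

assert np.allclose(grad, 2 * A @ x, atol=1e-6)
print("gradient matches 2Ax")
```

Because Q is quadratic, the central difference is exact up to rounding, so the agreement is essentially to machine precision.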

Summary: Derivatives

• The column vector ∂f(x)/∂x is called the gradient.
• The gradient ∂a⊤x/∂x = ∂x⊤a/∂x equals a.
• The derivative of the quadratic form ∂x⊤Ax/∂x equals 2Ax.
• The Hessian of f : R^p → R is the (p × p) matrix of the second derivatives ∂²f(x)/(∂xi ∂xj).
• The Hessian of the quadratic form x⊤Ax equals 2A.

Partitioned Matrices

A(n × p), B(n × p), A = (A11 A12; A21 A22), Aij(ni × pj), n1 + n2 = n and p1 + p2 = p

A + B = (A11 + B11  A12 + B12; A21 + B21  A22 + B22)

B⊤ = (B11⊤ B21⊤; B12⊤ B22⊤)

AB⊤ = (A11B11⊤ + A12B12⊤   A11B21⊤ + A12B22⊤; A21B11⊤ + A22B12⊤   A21B21⊤ + A22B22⊤)

A(p × p) nonsingular, partitioned in such a way that A11, A22 are square matrices:

A^{−1} = (A^{11} A^{12}; A^{21} A^{22})

where

A^{11} = (A11 − A12 A22^{−1} A21)^{−1} =: (A11·2)^{−1}
A^{12} = −(A11·2)^{−1} A12 A22^{−1}
A^{21} = −A22^{−1} A21 (A11·2)^{−1}
A^{22} = A22^{−1} + A22^{−1} A21 (A11·2)^{−1} A12 A22^{−1}
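The four block formulas can be checked against a direct inverse; a sketch on a well-conditioned random 4 × 4 example split into 2 × 2 blocks (the example matrix is our own):

```python
# Block-inverse formulas vs. numpy's direct inverse on a 4x4 example.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)     # nonsingular, well conditioned
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]

inv22 = np.linalg.inv(A22)
S = np.linalg.inv(A11 - A12 @ inv22 @ A21)      # (A11.2)^{-1}
blocks = np.block([[S,                 -S @ A12 @ inv22],
                   [-inv22 @ A21 @ S,  inv22 + inv22 @ A21 @ S @ A12 @ inv22]])

assert np.allclose(blocks, np.linalg.inv(A))
print("block inverse matches")
```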

If A11 is non-singular:

|A| = |A11| |A22 − A21 A11^{−1} A12|

and if A22 is non-singular:

|A| = |A22| |A11 − A12 A22^{−1} A21|

For B = (1 b⊤; a A):

|B| = |A − a b⊤| = |A| |1 − b⊤ A^{−1} a|

(A − a b⊤)^{−1} = A^{−1} + (A^{−1} a b⊤ A^{−1}) / (1 − b⊤ A^{−1} a)

Summary: Partitioned Matrices

• For partitioned matrices A(n × p) = (A11 A12; A21 A22) and B(n × p) = (B11 B12; B21 B22) it holds that

A + B = (A11 + B11  A12 + B12; A21 + B21  A22 + B22).

Summary: Partitioned Matrices

• The product AB⊤ equals

(A11B11⊤ + A12B12⊤   A11B21⊤ + A12B22⊤; A21B11⊤ + A22B12⊤   A21B21⊤ + A22B22⊤).

Summary: Partitioned Matrices

• For A nonsingular with A11, A22 square matrices:

A^{−1} = (A^{11} A^{12}; A^{21} A^{22})

A^{11} = (A11 − A12 A22^{−1} A21)^{−1} =: (A11·2)^{−1}
A^{12} = −(A11·2)^{−1} A12 A22^{−1}
A^{21} = −A22^{−1} A21 (A11·2)^{−1}
A^{22} = A22^{−1} + A22^{−1} A21 (A11·2)^{−1} A12 A22^{−1}

Summary: Partitioned Matrices

• For B = (1 b⊤; a A) and non-singular A we have |B| = |A − a b⊤| = |A| |1 − b⊤ A^{−1} a|.
• (A − a b⊤)^{−1} = A^{−1} + (A^{−1} a b⊤ A^{−1}) / (1 − b⊤ A^{−1} a)

Geometrical Aspects

Distance function d : R^{2p} → R+

d²(x, y) = (x − y)⊤ A (x − y), A > 0

A = Ip: Euclidean distance.

Ed = {x ∈ R^p | (x − x0)⊤(x − x0) = d²}

Example: x ∈ R², x0 = 0, x1² + x2² = 1

Norm of a vector w.r.t. the metric Ip:

‖x‖_{Ip} = d(0, x) = √(x⊤x)

Distance d: d²(x, y) = (x − y)⊤(x − y)

Iso-distance sphere: A = I2, (x1 − x01)² + (x2 − x02)² = d²

Iso-distance ellipsoid: Ed = {x : (x − x0)⊤A(x − x0) = d²}, γj = Evec(A), A > 0

Angle between Vectors

Scalar product:
⟨x, y⟩ = x⊤y
⟨x, y⟩_A = x⊤Ay

Norm of a vector:
‖x‖_{Ip} = d(0, x) = √(x⊤x)
‖x‖_A = √(x⊤Ax)

Unit vectors: x with ‖x‖ = 1

Angewandte Multivariate Statistik A Short Excursion into Matrix Algebra Geometrical Aspects

Angle between Two Vectors

The angle θ between vectors x and y can be calculated from

cos θ = x⊤y / (‖x‖ ‖y‖)

Example: Angle = Correlation
Observations {xᵢ}ⁿᵢ₌₁, {yᵢ}ⁿᵢ₌₁ with x̄ = ȳ = 0:

r_XY = Σᵢ xᵢyᵢ / √(Σᵢ xᵢ² · Σᵢ yᵢ²) = cos θ

The correlation corresponds to the cosine of the angle between x, y ∈ Rⁿ.

Ostap Okhrin 92 of 461
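The identity "correlation = cosine of the angle" is easy to see on data. A sketch with assumed toy vectors (already centered so that x̄ = ȳ = 0), pure standard library:

```python
# For centered observation vectors, the empirical correlation equals
# the cosine of the angle between them.
import math

x = [-2.0, -1.0, 0.0, 1.0, 2.0]     # mean 0
y = [-3.0, -0.5, 0.5, 1.0, 2.0]     # mean 0

sxy = sum(a * b for a, b in zip(x, y))
r = sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

norm = lambda v: math.sqrt(sum(a * a for a in v))
cos_theta = sxy / (norm(x) * norm(y))
assert math.isclose(r, cos_theta)
```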

Angle between vectors.

cos θ = x⊤y / (‖x‖ ‖y‖) = (x₁y₁ + x₂y₂) / (‖x‖ ‖y‖) = cos θ₁ cos θ₂ + sin θ₁ sin θ₂

Angewandte Multivariate Statistik A Short Excursion into Matrix Algebra Geometrical Aspects

Column space

X (n × p) data matrix

C(X) = {x ∈ Rⁿ | ∃ a ∈ Rᵖ so that Xa = x}

Projection matrix P(n × n), P = P⊤ = P² (P is idempotent).
Let b ∈ Rⁿ; a = Pb is the projection of b on C(P).

Ostap Okhrin 94 of 461

Angewandte Multivariate Statistik A Short Excursion into Matrix Algebra Geometrical Aspects

Projection on C (X )

X(n × p), P = X(X⊤X)⁻¹X⊤

PX = X; P is a projector: PP = P.
Q = Iₙ − P, Q² = Q, QX = 0.

Projection of x on y:
p_x = (y⊤x / ‖y‖²) · y

Ostap Okhrin 95 of 461

Figure: Projection. p_x = y(y⊤y)⁻¹y⊤x = (y⊤x / ‖y‖²) y
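The projector properties listed above can be confirmed numerically. A sketch on arbitrary random data, assuming NumPy is available:

```python
# Build P = X (X^T X)^{-1} X^T and verify the stated projector properties.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3))

P = X @ np.linalg.inv(X.T @ X) @ X.T
Q = np.eye(6) - P

assert np.allclose(P, P.T)          # symmetric
assert np.allclose(P @ P, P)        # idempotent
assert np.allclose(P @ X, X)        # P X = X
assert np.allclose(Q @ X, 0)        # Q X = 0

# projection of x onto a single vector y
y = rng.standard_normal(6)
x = rng.standard_normal(6)
p_x = (y @ x / (y @ y)) * y
Py = np.outer(y, y) / (y @ y)       # projector onto span{y}
assert np.allclose(Py @ x, p_x)
```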

Angewandte Multivariate Statistik A Short Excursion into Matrix Algebra Geometrical Aspects

Summary: Geometrical aspects

 A distance between two p-dimensional points x, y is defined via the quadratic form (x − y)⊤A(x − y) in the vector of differences (x − y). A distance defines the norm of a vector.

 Iso-distance curves of a point x₀ are all those points which have the same distance from x₀. Iso-distance curves are ellipsoids whose principal axes are determined by the directions of the eigenvectors of A. The half-lengths of the principal axes are proportional to the inverse of the square roots of the eigenvalues of A.

Ostap Okhrin 97 of 461

Angewandte Multivariate Statistik A Short Excursion into Matrix Algebra Geometrical Aspects

Summary: Geometrical aspects

 The angle between two vectors x and y w.r.t. the metric A is given by cos θ = x⊤Ay / (‖x‖_A ‖y‖_A).

 For the Euclidean distance with A = I, the correlation between two centered data vectors x and y is given by the cosine of the angle between them, i.e. cos θ = r_XY.

 P = X(X⊤X)⁻¹X⊤ is the projection matrix onto the column space C(X) of X.

 The projection of x ∈ Rⁿ on y ∈ Rⁿ is given by p_x = (y⊤x / ‖y‖²) y.

Ostap Okhrin 98 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Covariance

Covariance

Covariance is a measure of (linear) dependency between variables.

σXY = Cov(X ,Y ) = E(XY )− (EX )(EY )

Covariance of X with itself:

σXX = Var(X ) = Cov(X ,X )

Covariance matrix for p-dimensional X :

Σ = \begin{pmatrix} σ_{X_1X_1} & \cdots & σ_{X_1X_p} \\ \vdots & \ddots & \vdots \\ σ_{X_pX_1} & \cdots & σ_{X_pX_p} \end{pmatrix}

Ostap Okhrin 99 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Covariance

Empirical versions:

s_XY = n⁻¹ Σ_{i=1}^n (xᵢ − x̄)(yᵢ − ȳ)

s_XX = n⁻¹ Σ_{i=1}^n (xᵢ − x̄)²

Empirical covariance matrix:

S = \begin{pmatrix} s_{X_1X_1} & \cdots & s_{X_1X_p} \\ \vdots & \ddots & \vdots \\ s_{X_pX_1} & \cdots & s_{X_pX_p} \end{pmatrix}

Ostap Okhrin 100 of 461
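The empirical formulas translate directly into code. A sketch on assumed toy data, using the factor 1/n from the slides (replace n by n − 1 for the unbiased small-sample version):

```python
# Empirical covariance s_XY and variance s_XX with the factor 1/n.
x = [9.0, 11.0, 10.0, 12.0, 7.0]
y = [10.0, 15.0, 11.0, 15.0, 15.0]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
s_xx = sum((xi - xbar) ** 2 for xi in x) / n
```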

Angewandte Multivariate Statistik Moving to Higher Dimensions Covariance

Example: Swiss bank data

X1 = length of the bill
X2 = height of the bill (left)
X3 = height of the bill (right)
X4 = distance of the inner frame to the lower border
X5 = distance of the inner frame to the upper border
X6 = length of the diagonal of the central picture.

Ostap Okhrin 101 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Covariance

X full bank dataset

S = \begin{pmatrix}
 0.14 &  0.03 &  0.02 & -0.10 & -0.01 &  0.08 \\
 0.03 &  0.12 &  0.10 &  0.21 &  0.10 & -0.21 \\
 0.02 &  0.10 &  0.16 &  0.28 &  0.12 & -0.24 \\
-0.10 &  0.21 &  0.28 &  2.07 &  0.16 & -1.03 \\
-0.01 &  0.10 &  0.12 &  0.16 &  0.64 & -0.54 \\
 0.08 & -0.21 & -0.24 & -1.03 & -0.54 &  1.32
\end{pmatrix}

s_{X1X1} = s₁₁ = 0.14, s_{X4X5} = 0.16

Ostap Okhrin 102 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Covariance

 Scatterplots with point clouds that are "upward-sloping" show variables with positive covariance.

 Scatterplots with "downward-sloping" structure show negative covariance.

Ostap Okhrin 103 of 461

Figure: Swiss bank notes — scatterplot of variables X4 vs. X5 of the full bank dataset. MVAscabank45

Angewandte Multivariate Statistik Moving to Higher Dimensions Covariance

Example: "classic blue" pullover
Sales of "classic blue" pullovers in 10 periods.
X1 = number of pullovers sold
X2 = price in EUR
X3 = advertisement cost in EUR
X4 = presence of sales assistant in hours per period
Does price have a big influence on the number of pullovers sold?
s_{X1X2} = −80.02

Ostap Okhrin 105 of 461

Figure: Pullovers data — scatterplot of variables X2 (price) vs. X1 (sales) of the pullovers dataset. MVAscapull1

Angewandte Multivariate Statistik Moving to Higher Dimensions Covariance

Summary: Covariance

 The covariance is a measure of dependence.
 Covariance measures only linear dependence.
 There are nonlinear dependencies that have zero covariance.
 Zero covariance does not imply independence.
 Independence implies zero covariance.
 Covariance is scale dependent.

Ostap Okhrin 107 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Covariance

Summary: Covariance

 Negative covariance corresponds to downward-sloping scatterplots.

 Positive covariance corresponds to upward-sloping scatterplots.

 The covariance of a variable with itself is its variance: Cov(X, X) = σ_XX.

 For small n we should replace the factor 1/n in the computation of the covariance by 1/(n − 1).

Ostap Okhrin 108 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Correlation

Correlation

ρ_XY = Cov(X, Y) / √(Var(X) Var(Y))

The empirical version of ρ_XY:

r_XY = s_XY / √(s_XX s_YY)

Ostap Okhrin 109 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Correlation

Correlation matrix:

P = \begin{pmatrix} ρ_{X_1X_1} & \cdots & ρ_{X_1X_p} \\ \vdots & \ddots & \vdots \\ ρ_{X_pX_1} & \cdots & ρ_{X_pX_p} \end{pmatrix}

Empirical correlation matrix:

R = \begin{pmatrix} r_{X_1X_1} & \cdots & r_{X_1X_p} \\ \vdots & \ddots & \vdots \\ r_{X_pX_1} & \cdots & r_{X_pX_p} \end{pmatrix}

Ostap Okhrin 110 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Correlation

Example: Swiss bank data
For genuine bank notes:

R_g = \begin{pmatrix}
 1.00 &  0.41 &  0.41 &  0.22 &  0.05 &  0.03 \\
 0.41 &  1.00 &  0.66 &  0.24 &  0.20 & -0.25 \\
 0.41 &  0.66 &  1.00 &  0.25 &  0.13 & -0.14 \\
 0.22 &  0.24 &  0.25 &  1.00 & -0.63 & -0.00 \\
 0.05 &  0.20 &  0.13 & -0.63 &  1.00 & -0.25 \\
 0.03 & -0.25 & -0.14 & -0.00 & -0.25 &  1.00
\end{pmatrix}

Ostap Okhrin 111 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Correlation

For forged bank notes:

R_f = \begin{pmatrix}
 1.00 &  0.35 &  0.24 & -0.25 &  0.08 &  0.06 \\
 0.35 &  1.00 &  0.61 & -0.08 & -0.07 & -0.03 \\
 0.24 &  0.61 &  1.00 & -0.05 &  0.00 &  0.20 \\
-0.25 & -0.08 & -0.05 &  1.00 & -0.68 &  0.37 \\
 0.08 & -0.07 &  0.00 & -0.68 &  1.00 & -0.06 \\
 0.06 & -0.03 &  0.20 &  0.37 & -0.06 &  1.00
\end{pmatrix}

The correlation between X4 and X5 is negative!

Ostap Okhrin 112 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Correlation

If X and Y are independent, then Cov(X, Y) = ρ(X, Y) = 0.

The converse is not true in general.
Example:
 X a standard normally distributed random variable
 Y = X², which is surely not independent of X

Cov(X, Y) = E(XY) − E(X)E(Y) = E(X³) = 0

(because E(X) = 0 and E(X²) = 1), and therefore ρ(X, Y) = 0, too.

Ostap Okhrin 113 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Correlation

Test of Correlation

Fisher's Z-transformation (variance stabilizing transformation):

W = (1/2) log{(1 + r_XY)/(1 − r_XY)}

E(W) ≈ (1/2) log{(1 + ρ_XY)/(1 − ρ_XY)}, Var(W) ≈ 1/(n − 3)

Z = {W − E(W)}/√Var(W) →_L N(0, 1)

Ostap Okhrin 114 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Correlation

Example: Car dataset
Correlation between mileage (X2) and weight (X8): n = 74, r_{X2X8} = −0.823

H0: ρ = 0 vs. H1: ρ ≠ 0

w = (1/2) log{(1 + r_{X2X8})/(1 − r_{X2X8})} = −1.166, z = (−1.166 − 0)/√(1/71) = −9.825

H0: ρ = −0.75:

z = {−1.166 − (−0.973)}/√(1/71) = −1.627.

Ostap Okhrin 115 of 461
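A sketch reproducing the first test of the car-data example (n = 74, r = −0.823 as stated on the slide), pure standard library:

```python
# Fisher's Z-transformation test of H0: rho = 0.
import math

n, r = 74, -0.823
w = 0.5 * math.log((1 + r) / (1 - r))
z = (w - 0.0) / math.sqrt(1 / (n - 3))

# two-sided test at level 5%: reject H0 since |z| > 1.96
reject = abs(z) > 1.96
```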

Figure: Car data — mileage (X2) vs. weight (X8) of U.S. (star), European (plus) and Japanese (circle) cars. MVAscacar

Angewandte Multivariate Statistik Moving to Higher Dimensions Correlation

Summary: Correlation

 The correlation is a standardized measure of dependence.
 The absolute value of the correlation is always less than or equal to one.
 Correlation measures only linear dependence.
 There are nonlinear dependencies that have zero correlation.
 Zero correlation does not imply independence.

Ostap Okhrin 117 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Correlation

Summary: Correlation

 Independence implies zero correlation.
 Negative correlation corresponds to downward-sloping scatterplots.
 Positive correlation corresponds to upward-sloping scatterplots.
 Fisher's Z-transformation helps us in testing hypotheses on correlation.
 For small samples, Fisher's Z-transformation can be improved by W* = W − {3W + tanh(W)}/{4(n − 1)}.

Ostap Okhrin 118 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Summary Statistics

Summary Statistics

X (n × p) data matrix

X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}

xᵢ = (x_{i1}, …, x_{ip})⊤ ∈ Rᵖ: i-th observation of a p-dimensional random variable X ∈ Rᵖ

Ostap Okhrin 119 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Summary Statistics

Mean

x̄ = (x̄₁, …, x̄ₚ)⊤ = n⁻¹X⊤1ₙ

Empirical covariance matrix

S = n⁻¹X⊤X − x̄x̄⊤ = n⁻¹(X⊤X − n⁻¹X⊤1ₙ1ₙ⊤X) = n⁻¹X⊤HX

Centering matrix
H = Iₙ − n⁻¹1ₙ1ₙ⊤

Ostap Okhrin 120 of 461
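The two expressions for S are equivalent, which is easy to verify numerically. A sketch on arbitrary random data, assuming NumPy is available:

```python
# S = n^{-1} X^T H X with the centering matrix H = I_n - n^{-1} 1_n 1_n^T.
import numpy as np

rng = np.random.default_rng(3)
n, p = 8, 3
X = rng.standard_normal((n, p))

ones = np.ones((n, 1))
H = np.eye(n) - ones @ ones.T / n
S = X.T @ H @ X / n

xbar = X.mean(axis=0)
S_direct = X.T @ X / n - np.outer(xbar, xbar)
assert np.allclose(S, S_direct)
assert np.allclose(H @ H, H)        # H is idempotent
```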

Angewandte Multivariate Statistik Moving to Higher Dimensions Summary Statistics

Empirical correlation matrix

R = D^{−1/2} S D^{−1/2}

with D = diag(s_{X_jX_j}) and D^{−1/2} = diag(s_{X_jX_j}^{−1/2}) for j = 1, …, p.

Ostap Okhrin 121 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Summary Statistics

Linear Transformations

A (q × p) matrix

Y = XA⊤ = (y₁, …, yₙ)⊤

ȳ = n⁻¹Y⊤1ₙ = Ax̄
S_Y = n⁻¹Y⊤HY = A S_X A⊤

Example:
Let x = (1, 2)⊤ and y = 4x, x ∈ R².
Then y = 4x = (4, 8)⊤.

Ostap Okhrin 122 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Summary Statistics

Mahalanobis Transformation

Z = (z₁, …, zₙ)⊤

zᵢ = S^{−1/2}(xᵢ − x̄), i = 1, …, n

S_Z = n⁻¹Z⊤HZ = I_p, z̄ = 0

The Mahalanobis transformation leads to a standardized, uncorrelated, zero-mean data matrix Z.

Ostap Okhrin 123 of 461
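A sketch of the Mahalanobis transformation on arbitrary simulated data, assuming NumPy is available; the symmetric inverse square root S^{−1/2} is computed via an eigendecomposition:

```python
# Mahalanobis transformation z_i = S^{-1/2}(x_i - xbar);
# the transformed data have identity covariance (factor 1/n as on the slides).
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = rng.standard_normal((n, 2)) @ np.array([[2.0, 0.7], [0.0, 1.5]])

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / n

vals, vecs = np.linalg.eigh(S)                  # S symmetric positive definite
S_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

Z = (X - xbar) @ S_inv_sqrt
assert np.allclose(Z.mean(axis=0), 0)
assert np.allclose(Z.T @ Z / n, np.eye(2))
```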

Angewandte Multivariate Statistik Moving to Higher Dimensions Summary Statistics

Summary: Summary Statistics

 The center of gravity of a data matrix is given by its mean vector x̄ = n⁻¹X⊤1ₙ.

 The dispersion of the observations in a data matrix is given by the empirical covariance matrix S = n⁻¹X⊤HX.

 The empirical correlation matrix is given by R = D^{−1/2}SD^{−1/2}.

Ostap Okhrin 124 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Summary Statistics

Summary: Summary Statistics

 A linear transformation Y = XA⊤ of a data matrix X has mean Ax̄ and empirical covariance A S_X A⊤.

 The Mahalanobis transformation is a linear transformation zᵢ = S^{−1/2}(xᵢ − x̄) which gives a standardized, uncorrelated data matrix Z.

Ostap Okhrin 125 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions One-Sample and Two-Sample t-Test

One-sample t-test

We have iid observations x₁, …, xₙ.
Assume that the observations stem from N(µ, σ²).
Then x̄ₙ ∼ N(µ, σ²/n), i.e.

√n (x̄ₙ − µ)/σ ∼ N(0, 1).

Ostap Okhrin 126 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions One-Sample and Two-Sample t-Test

H0: µ = µ0 vs. H1: µ ≠ µ0
Assume that σ² is known. Under H0,

√n (x̄ₙ − µ0)/σ ∼ N(0, 1),

and we reject H0 if √n |x̄ₙ − µ0|/σ exceeds the critical value.

Exercise: show that P(reject H0 | H0 is true) = α.

Ostap Okhrin 127 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions One-Sample and Two-Sample t-Test

Usually σ² is not known and we have to estimate it:

σ̂ₙ² = {1/(n − 1)} Σ_{i=1}^n (xᵢ − x̄ₙ)².

It can be shown that

√n (x̄ₙ − µ)/σ̂ₙ ∼ t_{n−1}.

Note: the t-distribution tₙ approaches N(0, 1) as n → ∞ (parameter n: degrees of freedom).

Ostap Okhrin 128 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions One-Sample and Two-Sample t-Test

Test: H0: E(X) = µ0 vs. H1: E(X) ≠ µ0.
We reject H0 if

√n |x̄ₙ − µ0| / σ̂ₙ > t_{1−α/2;n−1},

where t_{1−α/2;n−1} is the critical value for level α, i.e. the 1 − α/2 quantile of the Student's t-distribution with n − 1 degrees of freedom.

Ostap Okhrin 129 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions One-Sample and Two-Sample t-Test

Example: Car damage
McCullagh and Nelder (1989). The response variable Cₙ is the "average cost of claims (in British pounds)".
H0: average costs = 200 vs. H1: average costs ≠ 200

C̄ₙ = 222.11, σ̂ₙ = 123.22, n = 128

√n (C̄ₙ − 200)/σ̂ₙ = 2.0301 > t_{0.975;n−1} = 1.9788

We reject that the average costs are equal to 200.

Ostap Okhrin 130 of 461
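A sketch reproducing the car-damage test from the summary statistics on the slide (σ̂ₙ is taken as the sample standard deviation 123.22), pure standard library:

```python
# One-sample t-test for H0: mu = 200 from summary statistics.
import math

n, mean, sd, mu0 = 128, 222.11, 123.22, 200.0
t_stat = math.sqrt(n) * (mean - mu0) / sd

# critical value t_{0.975; 127} = 1.9788 (from the slide)
reject = abs(t_stat) > 1.9788
```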

Angewandte Multivariate Statistik Moving to Higher Dimensions One-Sample and Two-Sample t-Test

Two-sample t-test

We have two iid samples y₁₁, …, y₁ₙ and y₂₁, …, y₂ₘ.
Assume that Y₁ᵢ ∼ N(µ₁, σ²) and Y₂ⱼ ∼ N(µ₂, σ²).

H0: µ₁ = µ₂ vs. H1: µ₁ ≠ µ₂

Pooled estimate of the variance:

σ̂_P² = {1/(m + n − 2)} { Σ_{i=1}^n (y₁ᵢ − ȳ₁)² + Σ_{j=1}^m (y₂ⱼ − ȳ₂)² }

Ostap Okhrin 131 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions One-Sample and Two-Sample t-Test

Test statistic

T = √{mn/(m + n)} · {(ȳ₁ − ȳ₂) − (µ₁ − µ₂)}/σ̂_P ∼ t_{n+m−2}

Reject H0 if |T| > t_{1−α/2;n+m−2}.

Ostap Okhrin 132 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Linear Model for Two Variables

Linear Model for Two Variables

yᵢ = β₀ + β₁xᵢ + εᵢ, E(εᵢ) = 0, Var(εᵢ) = σ², i = 1, …, n
β₀ = intercept, β₁ = slope

Estimate (β₀, β₁) by least squares:

(β̂₀, β̂₁) = argmin_{(β₀,β₁)} Σ_{i=1}^n (yᵢ − β₀ − β₁xᵢ)²

β̂₁ = s_XY/s_XX = Cov(X, Y)/Var(X)

β̂₀ = ȳ − β̂₁x̄

Ostap Okhrin 133 of 461
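The closed-form least-squares solution is short to implement. A sketch on assumed toy data, pure standard library:

```python
# Least squares via beta1 = s_XY / s_XX and beta0 = ybar - beta1 * xbar.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n
s_xx = sum((a - xbar) ** 2 for a in x) / n

beta1 = s_xy / s_xx
beta0 = ybar - beta1 * xbar
```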

Figure: Pullovers data — regression of sales (X1) on price (X2), β̂₀ = 210.8, β̂₁ = −0.36. MVAregpull

Figure: Swiss bank notes — regression of the upper inner frame (X5) on the lower inner frame (X4) for genuine bank notes. MVAregbank

Angewandte Multivariate Statistik Moving to Higher Dimensions Linear Model for Two Variables

Total variation

Regression equations: yᵢ = β̂₀ + β̂₁xᵢ + ε̂ᵢ and ŷᵢ = β̂₀ + β̂₁xᵢ

Σ_{i=1}^n (yᵢ − ȳ)² = Σ_{i=1}^n (ŷᵢ − ȳ)² + Σ_{i=1}^n (yᵢ − ŷᵢ)²
      SSTO                 SSTR                  SSE

SSTO = SSTR + SSE

SSTO – variation in the response variable (total variation)
SSTR – variation explained by the linear regression
SSE – error sum of squares

Ostap Okhrin 136 of 461

Figure: Pullover data — regression of sales (X1) on price (X2) with highlighted distances. MVAregzoom

Angewandte Multivariate Statistik Moving to Higher Dimensions Linear Model for Two Variables

Coefficient of determination

r² = Σ_{i=1}^n (ŷᵢ − ȳ)² / Σ_{i=1}^n (yᵢ − ȳ)² = SSTR/SSTO

r² = 1: variation fully explained by the linear regression, i.e. y is a linear function of x.

Equivalently,

r² = 1 − Σ_{i=1}^n (yᵢ − ŷᵢ)² / Σ_{i=1}^n (yᵢ − ȳ)²

Ostap Okhrin 138 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Linear Model for Two Variables

Example: "Classic blue" pullover data
Regress sales on price: β̂₀ = 210.774, β̂₁ = −0.364, r² = 0.028.
Low r²: sales are not influenced very much by the price (in a linear way).

The regression of Y on X is in general different from the regression of X on Y.

Ostap Okhrin 139 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Linear Model for Two Variables

t-Test for β1

H0: β₁ = 0 (ρ_XY = 0) vs. H1: β₁ ≠ 0

Var(β̂₁) = σ²/(n · s_XX), SE(β̂₁) = σ/(n · s_XX)^{1/2}, t = β̂₁/SE(β̂₁)

t_{1−α/2;n−2}: the critical value for level α, i.e. the 1 − α/2 quantile of the Student's t-distribution with n − 2 degrees of freedom.

Do not reject H0 if |t| ≤ t_{1−α/2;n−2}.

Ostap Okhrin 140 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Linear Model for Two Variables

Example: Swiss bank data

Distance of the inner frame to the lower and to the upper border, i.e. X4 vs. X5.
Why is a negative slope to be expected?

β̂₀ = 14.666 and β̂₁ = s_XY/s_XX = −0.26347/0.41321 = −0.626.

|t| = |−8.064| > t_{0.975;98} = 1.9845

Ostap Okhrin 141 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Linear Model for Two Variables

Summary: Linear Regression

 The linear regression y = β₀ + β₁x + ε models a linear relation between two one-dimensional variables.

 The sign of the slope β̂₁ is the same as that of the covariance and the correlation of x and y.

 A linear regression predicts values of Y given a possible observation x of X.

Ostap Okhrin 142 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Linear Model for Two Variables

Summary: Linear Regression

 The coefficient of determination r² measures the amount of variation in Y which is explained by a linear regression on X.

 If the coefficient of determination is r² = 1, then all points lie on one line.

 The regression line of X on Y and the regression line of Y on X are in general different.

Ostap Okhrin 143 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Linear Model for Two Variables

Summary: Linear Regression

 The t-test for the hypothesis β₁ = 0 uses t = β̂₁/SE(β̂₁), where SE(β̂₁) = σ/(n · s_XX)^{1/2}.

 The t-test rejects the null hypothesis β₁ = 0 at the level of significance α if |t| ≥ t_{1−α/2;n−2}, where t_{1−α/2;n−2} is the 1 − α/2 quantile of the Student's t-distribution with n − 2 degrees of freedom.

 The standard error SE(β̂₁) increases/decreases with less/more spread in the X variable.

Ostap Okhrin 144 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

Simple Analysis of Variance (ANOVA)

Assumptions

 Average values of the response variable y are induced by one simple factor
 The factor takes on p values
 For each factor level, we have m = n/p observations
 All observations are independent

Ostap Okhrin 145 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

sample element | factor levels l
1              | y₁₁ ⋯ y₁ₗ ⋯ y₁ₚ
2              | ⋮    ⋮    ⋮
k              | y_{k1} ⋯ y_{kl} ⋯ y_{kp}
⋮              | ⋮    ⋮    ⋮
m = n/p        | y_{m1} ⋯ y_{ml} ⋯ y_{mp}

Table 7: Observation structure of a simple ANOVA.

Ostap Okhrin 146 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

Simple ANOVA Model

y_{kl} = µ_l + ε_{kl} for k = 1, …, m and l = 1, …, p. (1)

Note
I Each factor level has a mean value µ_l
I Observation y_{kl} equals the sum of µ_l and a zero-mean random error ε_{kl}
I Linear regression model: m = 1, p = n and µᵢ = α + βxᵢ, where xᵢ is the i-th level value of the factor

Ostap Okhrin 147 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

Example: “Classic blue” pullover data

Analyse the effect of three marketing strategies:
1. Advertisement in local newspapers
2. Presence of sales assistant
3. Luxury presentation in shop windows

p = 3 factor levels, 10 different shops and n = mp = 30 observations

Ostap Okhrin 148 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

shop k | strategy 1 | strategy 2 | strategy 3
1      | 9          | 10         | 18
2      | 11         | 15         | 14
3      | 10         | 11         | 17
4      | 12         | 15         | 9
5      | 7          | 15         | 14
6      | 11         | 13         | 17
7      | 12         | 7          | 16
8      | 10         | 15         | 14
9      | 11         | 13         | 17
10     | 13         | 10         | 15

Table 8: Pullover sales as a function of marketing strategy.

Ostap Okhrin 149 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

Do all three strategies have the same mean effect?

Test

H0: µ_l = µ for l = 1, …, p vs. H1: µ_l ≠ µ_{l′} for some l and l′

Alternative: one marketing strategy is better than the others

Ostap Okhrin 150 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

Decomposition of sums of squares

Σ_{l=1}^p Σ_{k=1}^m (y_{kl} − ȳ)² = m Σ_{l=1}^p (ȳ_l − ȳ)² + Σ_{l=1}^p Σ_{k=1}^m (y_{kl} − ȳ_l)²

Total variation (sum of squares = SS):

SS(reduced) = Σ_{l=1}^p Σ_{k=1}^m (y_{kl} − ȳ)², ȳ = n⁻¹ Σ_{l=1}^p Σ_{k=1}^m y_{kl}

Variation under H1:

SS(full) = Σ_{l=1}^p Σ_{k=1}^m (y_{kl} − ȳ_l)², ȳ_l = m⁻¹ Σ_{k=1}^m y_{kl}

Ostap Okhrin 151 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

F-test

F = [{SS(reduced) − SS(full)}/{df(r) − df(f)}] / {SS(full)/df(f)}

Degrees of freedom
I number of observations minus the number of parameters
I full model: df(f) = n − p
I reduced model: df(r) = n − 1

Ostap Okhrin 152 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

ANOVA Table

SS            | df    | MS                  | F-stat                    | p-value
SS(explained) | p − 1 | SS(explained)/(p−1) | {SS(explained)/(p−1)}/MSE | p-value
SS(full)      | n − p | SS(full)/(n−p) = MSE|                           |
SS(reduced)   | n − 1 |                     |                           |

F ∼ F_{p−1,n−p}

Test: reject H0 if F > F_{1−α;p−1,n−p}, or if the p-value < α.

Ostap Okhrin 153 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

Example: “Classic blue” pullover data

Reduced model: H0: µ_l = µ, l = 1, 2, 3
Full model: H1: the µ_l differ

df(r) = n − #parameters(r) = 30 − 1 = 29
df(f) = n − #parameters(f) = 30 − 3 = 27

SS(reduced) = 260.3, SS(full) = 157.7

F = {(260.3 − 157.7)/(29 − 27)} / (157.7/27) = 8.78 > F_{2,27}(0.95) = 3.35

Ostap Okhrin 154 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

SS    | df | MS    | F-stat | p-value
102.6 |  2 | 51.30 | 8.78   | 0.001
157.7 | 27 |  5.84 |        |
260.3 | 29 |       |        |

Ostap Okhrin 155 of 461
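A sketch reproducing this one-way ANOVA from the data of Table 8 (F = 8.78 as on the slides), pure standard library:

```python
# One-way ANOVA F-test for the pullover marketing strategies.
sales = {
    1: [9, 11, 10, 12, 7, 11, 12, 10, 11, 13],   # strategy 1
    2: [10, 15, 11, 15, 15, 13, 7, 15, 13, 10],  # strategy 2
    3: [18, 14, 17, 9, 14, 17, 16, 14, 17, 15],  # strategy 3
}
p = len(sales)
n = sum(len(v) for v in sales.values())

grand = sum(sum(v) for v in sales.values()) / n
ss_reduced = sum((y - grand) ** 2 for v in sales.values() for y in v)
ss_full = sum((y - sum(v) / len(v)) ** 2 for v in sales.values() for y in v)

df_r, df_f = n - 1, n - p
F = ((ss_reduced - ss_full) / (df_r - df_f)) / (ss_full / df_f)
```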

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

F -test in a linear regression model

Reduced model: yᵢ = β₀ + 0 · xᵢ + εᵢ

SS(reduced) = Σ_{i=1}^n (yᵢ − ȳ)²

SS(full) = Σ_{i=1}^n (yᵢ − ŷᵢ)² = RSS

F = [{SS(reduced) − SS(full)}/1] / {SS(full)/(n − 2)}

Ostap Okhrin 156 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

Explained Variation

Σ_{i=1}^n (ŷᵢ − ȳ)² = Σ_{i=1}^n (β̂₀ + β̂₁xᵢ − ȳ)² = Σ_{i=1}^n β̂₁²(xᵢ − x̄)² = β̂₁² n s_XX

F = β̂₁² n s_XX / {RSS/(n − 2)} = {β̂₁/SE(β̂₁)}²

Ostap Okhrin 157 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

Summary: ANOVA

 Simple ANOVA models an output Y as a function of one factor.
 The reduced model is the hypothesis of equal means.
 The full model is the alternative hypothesis of different means.
 The F-test is based on a comparison of the sums of squares under the full and the reduced models.

Ostap Okhrin 158 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions Simple Analysis of Variance

Summary: ANOVA

 The degrees of freedom are calculated as the number of observations minus the number of parameters.

 The F-statistic is

F = [{SS(reduced) − SS(full)}/{df(r) − df(f)}] / {SS(full)/df(f)}.

 Reject the null if the F-statistic is larger than the (1 − α)-quantile of the F_{df(r)−df(f),df(f)} distribution.

 The F-test statistic for the slope of the linear regression model yᵢ = β₀ + β₁xᵢ + εᵢ is the square of the t-test statistic.

Ostap Okhrin 159 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Multiple Linear Model

y(n × 1), X(n × p), β = (β₁, …, βₚ)⊤

Approximate y by a linear combination ŷ of the columns of X.
Find β̂ such that ŷ = Xβ̂ is the best fit of y = Xβ + ε (errors ε):

β̂ = argmin_β (y − Xβ)⊤(y − Xβ) = argmin_β Σ_{i=1}^n (yᵢ − xᵢ⊤β)² = (X⊤X)⁻¹X⊤y,

if X⊤X is of full rank.

Ostap Okhrin 160 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Linear Model with Intercept

yᵢ = β₀ + β₁xᵢ₁ + … + βₚxᵢₚ + εᵢ, i = 1, …, n

can be written as

y = X*β* + ε

where X* = (1ₙ X) and

β̂* = (β̂₀, β̂⊤)⊤ = (X*⊤X*)⁻¹X*⊤y

Ostap Okhrin 161 of 461
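A sketch of the least-squares fit with intercept on simulated toy data (the true coefficients below are arbitrary illustrations), assuming NumPy is available:

```python
# beta-hat = (X*^T X*)^{-1} X*^T y with X* = (1_n, X).
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 3
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -0.5, 1.0])
y = 1.5 + X @ beta_true + 0.01 * rng.standard_normal(n)

Xs = np.column_stack([np.ones(n), X])            # add intercept column
beta_hat = np.linalg.solve(Xs.T @ Xs, Xs.T @ y)  # solves the normal equations
# beta_hat[0] estimates the intercept, beta_hat[1:] the slopes
```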

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Example: "Classic blue" pullover data
Approximate the sales as a linear function of the three other variables: price (X2), advertisement (X3) and presence of sales assistants (X4).
Adding a column of ones to the data (in order to also estimate the intercept β₀) leads to

β̂₀ = 65.670, β̂₁ = −0.216, β̂₂ = 0.485, β̂₃ = 0.844.

Coefficient of determination: r² = 0.907

Ostap Okhrin 162 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Remark:
The coefficient of determination is influenced by the number of regressors. For a given sample size n, the r² value will increase by adding more regressors into the linear model. A corrected coefficient of determination for p regressors and a constant intercept is

r²_adj = r² − p(1 − r²)/{n − (p + 1)}

Ostap Okhrin 163 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Example: "Classic blue" pullover data
Corrected coefficient of determination:

r²_adj = 0.907 − 3(1 − 0.907²)/(10 − 3 − 1) = 0.818.

81.8% of the variation of the response variable is explained by the explanatory variables.

Ostap Okhrin 164 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Simple ANOVA Model
Example: "Classic blue" pullover data

X = \begin{pmatrix} 1_m & 0_m & 0_m \\ 0_m & 1_m & 0_m \\ 0_m & 0_m & 1_m \end{pmatrix}

m = 10, p = 3, n = mp = 30; X(n × p)
β = (µ₁, µ₂, µ₃)⊤ parameter vector
y = Xβ + ε linear model

Ostap Okhrin 165 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Reduced model (µ₁ = µ₂ = µ₃ = µ):

β̂_{H0} = ȳ, df(r) = n − 1

Full model (the µ_l differ):

β̂_{H1} = (X⊤X)⁻¹X⊤y, df(f) = n − 3

SS(reduced) = Σ_{i=1}^n (yᵢ − ȳ)² = ‖y − Xβ̂_{H0}‖²

SS(full) = ‖y − Xβ̂_{H1}‖²

Ostap Okhrin 166 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Simple ANOVA Model – F-test

F = [{SS(reduced) − SS(full)}/{df(r) − df(f)}] / {SS(full)/df(f)}
  = [{‖y − Xβ̂_{H0}‖² − ‖y − Xβ̂_{H1}‖²}/{df(r) − df(f)}] / {‖y − Xβ̂_{H1}‖²/df(f)}

Comparing the lengths of projections onto different column spaces.

Ostap Okhrin 167 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Summary: Multiple Linear Model

 The relation y = Xβ + ε models a linear relation between a one-dimensional variable Y and a p-dimensional variable X. Py gives the best linear regression fit of the vector y onto C(X). The least squares parameter estimator is β̂ = (X⊤X)⁻¹X⊤y.

 The simple ANOVA model can be written as a linear model.

Ostap Okhrin 168 of 461

Angewandte Multivariate Statistik Moving to Higher Dimensions The Multiple Linear Model

Summary: Multiple Linear Model

 The ANOVA model can be tested by comparing the lengths of the projection vectors.

 The test statistic of the F-test can be written as

[{‖y − Xβ̂_{H0}‖² − ‖y − Xβ̂_{H1}‖²}/{df(r) − df(f)}] / {‖y − Xβ̂_{H1}‖²/df(f)}.

 The adjusted coefficient of determination is

r²_adj = r² − p(1 − r²)/{n − (p + 1)}.

Ostap Okhrin 169 of 461

Angewandte Multivariate Statistik Multivariate Distributions Multivariate Distributions

Multivariate Distributions

Random vector X ∈ Rᵖ

The (multivariate) distribution function is

F(x) = P(X ≤ x) = P(X₁ ≤ x₁, X₂ ≤ x₂, …, Xₚ ≤ xₚ)

f(x) denotes the density of X, i.e.

F(x) = ∫_{−∞}^{x} f(u) du, ∫_{−∞}^{∞} f(u) du = 1

P{X ∈ (a, b)} = ∫_a^b f(x) dx

Ostap Okhrin 170 of 461

Angewandte Multivariate Statistik Multivariate Distributions Multivariate Distributions

X = (X₁, X₂)⊤, X₁ ∈ Rᵏ, X₂ ∈ Rᵖ⁻ᵏ

Marginal density of X₁:

f_{X₁}(x₁) = ∫_{−∞}^{∞} f(x₁, x₂) dx₂

Conditional density of X₂ (conditioned on X₁ = x₁):

f_{X₂|X₁=x₁}(x₂) = f(x₁, x₂)/f_{X₁}(x₁)

Ostap Okhrin 171 of 461

Angewandte Multivariate Statistik Multivariate Distributions Multivariate Distributions

Example

f(x₁, x₂) = (1/2)x₁ + (3/2)x₂ for 0 ≤ x₁, x₂ ≤ 1, and 0 otherwise.

f(x₁, x₂) is a density since

∫ f(x₁, x₂) dx₁ dx₂ = (1/2)[x₁²/2]₀¹ + (3/2)[x₂²/2]₀¹ = 1/4 + 3/4 = 1.

Ostap Okhrin 172 of 461

Angewandte Multivariate Statistik Multivariate Distributions Multivariate Distributions

The marginal densities:

f_{X₁}(x₁) = ∫ f(x₁, x₂) dx₂ = ∫₀¹ {(1/2)x₁ + (3/2)x₂} dx₂ = (1/2)x₁ + 3/4;

f_{X₂}(x₂) = ∫ f(x₁, x₂) dx₁ = ∫₀¹ {(1/2)x₁ + (3/2)x₂} dx₁ = (3/2)x₂ + 1/4.

The conditional densities:

f(x₂ | x₁) = {(1/2)x₁ + (3/2)x₂} / {(1/2)x₁ + 3/4} and f(x₁ | x₂) = {(1/2)x₁ + (3/2)x₂} / {(3/2)x₂ + 1/4}.

These conditional pdfs are nonlinear in x₁ and x₂ although the joint pdf has a simple (linear) structure.

Ostap Okhrin 173 of 461

Angewandte Multivariate Statistik Multivariate Distributions Multivariate Distributions

Definition of independence

X₁, X₂ are independent iff

f(x) = f(x₁, x₂) = f_{X₁}(x₁) f_{X₂}(x₂)

Two random variables may have identical marginals but different joint distributions.

Ostap Okhrin 174 of 461

Angewandte Multivariate Statistik Multivariate Distributions Multivariate Distributions

Example

f(x₁, x₂) = 1, 0 < x₁, x₂ < 1,

f(x₁, x₂) = 1 + α(2x₁ − 1)(2x₂ − 1), 0 < x₁, x₂ < 1, −1 ≤ α ≤ 1.

Both have the same marginals f_{X₁}(x₁) = 1, f_{X₂}(x₂) = 1:

∫₀¹ {1 + α(2x₁ − 1)(2x₂ − 1)} dx₂ = 1 + α(2x₁ − 1)[x₂² − x₂]₀¹ = 1.

Ostap Okhrin 175 of 461

Figure: Swiss bank notes — univariate estimates of the density of X4 (left) and X5 (right) of the bank notes. MVAdenbank2

Figure: Product of univariate density estimates for X4 and X5 of the bank notes. MVAdenbank3

Figure: Joint density estimate for X4 and X5 of the bank notes. MVAdenbank3

Angewandte Multivariate Statistik Multivariate Distributions Multivariate Distributions

Summary: Distributions

 The cumulative distribution function (cdf) is F(x) = P(X ≤ x).

 If a probability density function (pdf) f exists, then F(x) = ∫_{−∞}^x f(u) du.

 Let X = (X₁, X₂)⊤ be partitioned into subvectors X₁ and X₂ with joint cdf F. Then F_{X₁}(x₁) = P(X₁ ≤ x₁) is the marginal cdf of X₁. The marginal pdf of X₁ is f_{X₁}(x₁) = ∫_{−∞}^{∞} f(x₁, x₂) dx₂.

Ostap Okhrin 179 of 461

Angewandte Multivariate Statistik Multivariate Distributions Multivariate Distributions

Summary: Distributions

 Different joint pdfs may have the same marginal pdfs.

 The conditional pdf of X₂ given X₁ = x₁ is f(x₂ | x₁) = f(x₁, x₂)/f_{X₁}(x₁).

 Two random variables X₁, X₂ are called independent iff f(x₁, x₂) = f_{X₁}(x₁)f_{X₂}(x₂). This is equivalent to f(x₂ | x₁) = f_{X₂}(x₂).

Ostap Okhrin 180 of 461

Angewandte Multivariate Statistik Multivariate Distributions Moments and Characteristic Functions

Moments and Characteristic Functions

EX ∈ Rᵖ denotes the p-dimensional vector of expected values of the random vector X:

EX = (EX₁, …, EXₚ)⊤ = ∫ x f(x) dx = (∫ x₁ f(x) dx, …, ∫ xₚ f(x) dx)⊤ = µ.

The properties of the expected value follow from the properties of the integral:

E(αX + βY) = α EX + β EY

Ostap Okhrin 181 of 461

Angewandte Multivariate Statistik Multivariate Distributions Moments and Characteristic Functions

If X and Y are independent, then

E(XY⊤) = ∫∫ xy⊤ f(x, y) dx dy = ∫ x f(x) dx ∫ y⊤ f(y) dy = EX EY⊤

Definition of the covariance matrix (Σ):

Σ = Var(X) = E{(X − µ)(X − µ)⊤}

We say that a random vector X has a distribution with the vector of expected values µ and the covariance matrix Σ:

X ∼ (µ, Σ)

Ostap Okhrin 182 of 461

Angewandte Multivariate Statistik Multivariate Distributions Moments and Characteristic Functions

Properties of the Covariance Matrix

The elements of Σ are the variances and covariances of the components of the random vector X:

Σ = (σ_{XᵢXⱼ}), σ_{XᵢXⱼ} = Cov(Xᵢ, Xⱼ), σ_{XᵢXᵢ} = Var(Xᵢ)

Computational formula: Σ = E(XX⊤) − µµ⊤.
The covariance matrix is positive semidefinite, Σ ≥ 0 (the variance a⊤Σa of any linear combination a⊤X cannot be negative).

Ostap Okhrin 183 of 461

Angewandte Multivariate Statistik Multivariate Distributions Moments and Characteristic Functions

Properties of Variances and Covariances

Var(a⊤X) = a⊤ Var(X) a = Σ_{i,j} aᵢaⱼ σ_{XᵢXⱼ}

Var(AX + b) = A Var(X) A⊤

Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)

Var(X + Y) = Var(X) + Cov(X, Y) + Cov(Y, X) + Var(Y)

Cov(AX, BY) = A Cov(X, Y) B⊤

Ostap Okhrin 184 of 461

Angewandte Multivariate Statistik Multivariate Distributions Moments and Characteristic Functions

Example

f(x₁, x₂) = (1/2)x₁ + (3/2)x₂ for 0 ≤ x₁, x₂ ≤ 1, and 0 otherwise.

The conditional densities:

f(x₂ | x₁) = {(1/2)x₁ + (3/2)x₂} / {(1/2)x₁ + 3/4} and f(x₁ | x₂) = {(1/2)x₁ + (3/2)x₂} / {(3/2)x₂ + 1/4}

Ostap Okhrin 185 of 461

Angewandte Multivariate Statistik Multivariate Distributions Moments and Characteristic Functions

µ₁ = ∫∫ x₁ f(x₁, x₂) dx₁ dx₂ = ∫₀¹∫₀¹ x₁ {(1/2)x₁ + (3/2)x₂} dx₁ dx₂
   = ∫₀¹ x₁ {(1/2)x₁ + 3/4} dx₁ = (1/2)[x₁³/3]₀¹ + (3/4)[x₁²/2]₀¹
   = 1/6 + 3/8 = (4 + 9)/24 = 13/24,

µ₂ = ∫∫ x₂ f(x₁, x₂) dx₁ dx₂ = ∫₀¹∫₀¹ x₂ {(1/2)x₁ + (3/2)x₂} dx₁ dx₂
   = ∫₀¹ x₂ {1/4 + (3/2)x₂} dx₂ = (1/4)[x₂²/2]₀¹ + (3/2)[x₂³/3]₀¹
   = 1/8 + 1/2 = (1 + 4)/8 = 5/8.

Ostap Okhrin 186 of 461

Angewandte Multivariate Statistik Multivariate Distributions Moments and Characteristic Functions

Covariance Matrix

$$\sigma_{X_1 X_1} = E(X_1^2) - \mu_1^2 \quad\text{with}\quad E(X_1^2) = \int_0^1\!\!\int_0^1 x_1^2\left(\tfrac{1}{2}x_1 + \tfrac{3}{2}x_2\right)dx_1\,dx_2 = \tfrac{1}{2}\left[\tfrac{x_1^4}{4}\right]_0^1 + \tfrac{3}{4}\left[\tfrac{x_1^3}{3}\right]_0^1 = \tfrac{3}{8},$$

$$\sigma_{X_2 X_2} = E(X_2^2) - \mu_2^2 \quad\text{with}\quad E(X_2^2) = \int_0^1\!\!\int_0^1 x_2^2\left(\tfrac{1}{2}x_1 + \tfrac{3}{2}x_2\right)dx_1\,dx_2 = \tfrac{1}{4}\left[\tfrac{x_2^3}{3}\right]_0^1 + \tfrac{3}{2}\left[\tfrac{x_2^4}{4}\right]_0^1 = \tfrac{11}{24},$$

$$\sigma_{X_1 X_2} = E(X_1 X_2) - \mu_1\mu_2 \quad\text{with}\quad E(X_1 X_2) = \int_0^1\!\!\int_0^1 x_1 x_2\left(\tfrac{1}{2}x_1 + \tfrac{3}{2}x_2\right)dx_1\,dx_2 = \int_0^1\left(\tfrac{1}{6}x_2 + \tfrac{3}{4}x_2^2\right)dx_2 = \tfrac{1}{6}\left[\tfrac{x_2^2}{2}\right]_0^1 + \tfrac{3}{4}\left[\tfrac{x_2^3}{3}\right]_0^1 = \tfrac{1}{3}.$$

Hence, noting that $\sigma_{X_1 X_2} = \tfrac{1}{3} - \tfrac{13}{24}\cdot\tfrac{5}{8} = -\tfrac{1}{192}$ is negative:
$$\Sigma = \begin{pmatrix} 0.0815 & -0.0052 \\ -0.0052 & 0.0677 \end{pmatrix}$$
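These moment calculations can be checked numerically; a small sketch using a midpoint-rule integration over the unit square (the grid size is an arbitrary choice):

```python
import numpy as np

# Midpoint-rule integration over [0,1]^2 for the example density.
n = 400
g = (np.arange(n) + 0.5) / n
x1, x2 = np.meshgrid(g, g, indexing="ij")
w = 1.0 / n**2                        # area of one grid cell

f = 0.5 * x1 + 1.5 * x2               # joint density from the example

mu1 = np.sum(x1 * f) * w              # should be 13/24
mu2 = np.sum(x2 * f) * w              # should be 5/8
s11 = np.sum(x1**2 * f) * w - mu1**2
s22 = np.sum(x2**2 * f) * w - mu2**2
s12 = np.sum(x1 * x2 * f) * w - mu1 * mu2

print(round(mu1, 4), round(mu2, 4))
print(round(s11, 4), round(s22, 4), round(s12, 4))
```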

Conditional Expectations

Random vector $X = (X_1, X_2)^\top$, $X_1 \in \mathbb{R}^k$, $X_2 \in \mathbb{R}^{p-k}$.
The conditional expectation of $X_2$ given $X_1 = x_1$ is
$$E(X_2 \mid x_1) = \int x_2 f(x_2 \mid x_1)\,dx_2,$$
and the conditional expectation of $X_1$ given $X_2 = x_2$ is
$$E(X_1 \mid x_2) = \int x_1 f(x_1 \mid x_2)\,dx_1.$$
The conditional expectation $E(X_2 \mid x_1)$ is a function of $x_1$. A typical example of this setup is simple linear regression, where $E(Y \mid X = x) = x^\top\beta$.

The error term in this approximation is
$$U = X_2 - E(X_2 \mid X_1).$$
(1) $E(U) = 0$.
(2) $E(X_2 \mid X_1)$ is the best approximation of $X_2$ by a function $h(X_1)$ of $X_1$ in the sense of mean squared error (MSE), where $\operatorname{MSE}(h) = E[\{X_2 - h(X_1)\}^\top\{X_2 - h(X_1)\}]$ and $h: \mathbb{R}^k \to \mathbb{R}^{p-k}$.

Summary: Moments

- The expectation of a random vector X is $\mu = \int x f(x)\,dx$, the covariance matrix $\Sigma = \operatorname{Var}(X) = E(X-\mu)(X-\mu)^\top$. We denote $X \sim (\mu, \Sigma)$.
- Expectations are linear, i.e., $E(\alpha X + \beta Y) = \alpha E(X) + \beta E(Y)$. If X, Y are independent, then $E(XY^\top) = E(X)E(Y)^\top$.
- The covariance between two random vectors X, Y is $\Sigma_{XY} = \operatorname{Cov}(X, Y) = E(X - EX)(Y - EY)^\top = E(XY^\top) - E(X)E(Y)^\top$. If X, Y are independent, then $\operatorname{Cov}(X, Y) = 0$.
- The conditional expectation $E(X_2 \mid X_1)$ is the MSE-best approximation of $X_2$ by a function of $X_1$.

Characteristic Functions

The characteristic function (cf) of a random vector $X \in \mathbb{R}^p$ is defined as
$$\varphi_X(t) = E(e^{it^\top X}) = \int e^{it^\top x} f(x)\,dx, \qquad t \in \mathbb{R}^p,$$
where $i$ is the complex unit: $i^2 = -1$.

Properties of the cf: $\varphi_X(0) = 1$, $|\varphi_X(t)| \le 1$.

- If $\varphi$ is absolutely integrable ($\int_{-\infty}^{\infty}|\varphi(x)|\,dx$ exists and is finite), then
$$f(x) = \frac{1}{(2\pi)^p}\int_{-\infty}^{\infty} e^{-it^\top x}\varphi_X(t)\,dt.$$
- If $X = (X_1, X_2, \ldots, X_p)^\top$, then for $t = (t_1, t_2, \ldots, t_p)^\top$:
$$\varphi_{X_1}(t_1) = \varphi_X(t_1, 0, \ldots, 0), \;\ldots,\; \varphi_{X_p}(t_p) = \varphi_X(0, \ldots, 0, t_p).$$
- For $X_1, \ldots, X_p$ independent random variables and $t = (t_1, t_2, \ldots, t_p)^\top$:
$$\varphi_X(t) = \prod_{j=1}^{p}\varphi_{X_j}(t_j).$$
- For $X_1, \ldots, X_p$ independent random variables and $t \in \mathbb{R}$:
$$\varphi_{X_1 + \ldots + X_p}(t) = \prod_{j=1}^{p}\varphi_{X_j}(t).$$

The characteristic function allows one to recover all the cross-product moments of any order: for $j_k \ge 0$, $k = 1, \ldots, p$, and $t = (t_1, \ldots, t_p)^\top$ we have
$$E\left(X_1^{j_1}\cdots X_p^{j_p}\right) = \frac{1}{i^{\,j_1 + \ldots + j_p}}\left[\frac{\partial^{\,j_1+\ldots+j_p}\varphi_X(t)}{\partial t_1^{j_1}\cdots\partial t_p^{j_p}}\right]_{t=0}.$$

$X \in \mathbb{R}^1$ follows the standard normal distribution:
$$f_X(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$$
$$\varphi_X(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{itx}\exp\left(-\frac{x^2}{2}\right)dx = \exp\left(-\frac{t^2}{2}\right)\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-it)^2}{2}\right\}dx = \exp\left(-\frac{t^2}{2}\right),$$
since $i^2 = -1$ and $\int \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-it)^2}{2}\right\}dx = 1$.
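The result $\varphi_X(t) = e^{-t^2/2}$ can be verified numerically by evaluating the defining integral on a fine grid (a simple Riemann-sum sketch; grid and truncation limits are arbitrary choices):

```python
import numpy as np

# Check phi_X(t) = exp(-t^2/2) for the standard normal by approximating
# E(e^{itX}) with a Riemann sum over a wide, fine grid.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

for t in (0.0, 0.5, 1.0, 2.0):
    phi = np.sum(np.exp(1j * t * x) * pdf) * dx   # ~ E(e^{itX})
    print(t, round(phi.real, 6), round(np.exp(-t**2 / 2), 6))
```

The numerically integrated characteristic function and the closed form agree to many digits.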

Theorem (Cramér-Wold)
The distribution of $X \in \mathbb{R}^p$ is completely determined by the set of all (one-dimensional) distributions of $t^\top X$, $t \in \mathbb{R}^p$.
This theorem says that we can determine the distribution of X in $\mathbb{R}^p$ by specifying all the one-dimensional distributions of the linear combinations
$$\sum_{j=1}^{p} t_j X_j = t^\top X, \qquad t = (t_1, t_2, \ldots, t_p)^\top.$$

Summary: Characteristic Functions

- The characteristic function (cf) of a random vector X is $\varphi_X(t) = E(e^{it^\top X})$.
- The distribution of a p-dimensional random variable X is completely determined by all one-dimensional distributions of $t^\top X$, $t \in \mathbb{R}^p$ (Theorem of Cramér-Wold).

Cumulants

For a random variable X with density f and finite moments up to order k, the cumulants $\kappa_j$ are defined through the derivatives of the log characteristic function $\varphi_X(t) = E(e^{itX})$:
$$\left[\frac{\partial^j \log\varphi_X(t)}{\partial t^j}\right]_{t=0} = i^j\kappa_j, \qquad j = 1, \ldots, k.$$
The values $\kappa_j$ are called cumulants or semi-invariants, since $\kappa_j$ does not change (for $j > 1$) under a shift transformation $X \mapsto X + a$. The cumulants are natural parameters for dimension reduction methods, in particular the Projection Pursuit method.

The relation between the first k moments $m_1, \ldots, m_k$ and the cumulants is given by
$$\kappa_k = (-1)^{k-1}\begin{vmatrix} m_1 & 1 & \cdots & 0\\ m_2 & \binom{1}{0}m_1 & \cdots & \vdots\\ \vdots & \vdots & \ddots & \vdots\\ m_k & \binom{k-1}{0}m_{k-1} & \cdots & \binom{k-1}{k-2}m_1 \end{vmatrix}.$$

Suppose that $k = 1$; then $\kappa_1 = m_1$. For $k = 2$ we obtain
$$\kappa_2 = -\begin{vmatrix} m_1 & 1\\ m_2 & \binom{1}{0}m_1 \end{vmatrix} = m_2 - m_1^2.$$

For $k = 3$ we have to calculate
$$\kappa_3 = \begin{vmatrix} m_1 & 1 & 0\\ m_2 & m_1 & 1\\ m_3 & m_2 & 2m_1 \end{vmatrix}.$$
Expanding this determinant along the first column we arrive at
$$\kappa_3 = m_1\begin{vmatrix} m_1 & 1\\ m_2 & 2m_1 \end{vmatrix} - m_2\begin{vmatrix} 1 & 0\\ m_2 & 2m_1 \end{vmatrix} + m_3\begin{vmatrix} 1 & 0\\ m_1 & 1 \end{vmatrix} = m_1(2m_1^2 - m_2) - m_2(2m_1) + m_3 = m_3 - 3m_1 m_2 + 2m_1^3.$$
In a similar way one calculates
$$\kappa_4 = m_4 - 4m_3 m_1 - 3m_2^2 + 12 m_2 m_1^2 - 6m_1^4.$$

In a similar fashion we find the moments from the cumulants:
$$m_1 = \kappa_1, \qquad m_2 = \kappa_2 + \kappa_1^2, \qquad m_3 = \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3, \qquad m_4 = \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4.$$
A very simple relationship can be observed between the semi-invariants and the central moments $\mu_k = E(X-\mu)^k$, where $\mu = m_1$ as defined before: $\kappa_2 = \mu_2$, $\kappa_3 = \mu_3$, $\kappa_4 = \mu_4 - 3\mu_2^2$.
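These identities can be spot-checked against a distribution whose moments are known in closed form. For the exponential distribution with rate 1, the raw moments are $m_k = k!$ and the cumulants are $\kappa_k = (k-1)!$ (a standard fact, used here as the reference):

```python
from math import factorial

# Raw moments of Exp(1): m_k = k!  (m[0] unused so that m[k] = m_k)
m = [None] + [factorial(k) for k in range(1, 5)]

k1 = m[1]
k2 = m[2] - m[1]**2
k3 = m[3] - 3*m[1]*m[2] + 2*m[1]**3
k4 = m[4] - 4*m[3]*m[1] - 3*m[2]**2 + 12*m[2]*m[1]**2 - 6*m[1]**4

print(k1, k2, k3, k4)   # 1 1 2 6, i.e. kappa_k = (k-1)!
```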

Skewness $\gamma_3$ and kurtosis $\gamma_4$ are defined as
$$\gamma_3 = E(X-\mu)^3/\sigma^3, \qquad \gamma_4 = E(X-\mu)^4/\sigma^4.$$
The skewness and kurtosis determine the shape of one-dimensional distributions. The skewness of a normal distribution is 0 and its kurtosis equals 3. The relation of these parameters to the cumulants is
$$\gamma_3 = \frac{\kappa_3}{\kappa_2^{3/2}}, \qquad \gamma_4 - 3 = \frac{\kappa_4}{\kappa_2^2}$$
(the second identity follows from $\kappa_4 = \mu_4 - 3\mu_2^2$; the ratio $\kappa_4/\kappa_2^2$ is the excess kurtosis).

Angewandte Multivariate Statistik Multivariate Distributions Transformations

Transformations

Suppose $X \sim f_X$. What is the pdf of $Y = 3X$?
Let $X = u(Y)$ for a one-to-one transformation $u: \mathbb{R}^p \to \mathbb{R}^p$ with Jacobian
$$\mathcal{J} = \left(\frac{\partial x_i}{\partial y_j}\right) = \left(\frac{\partial u_i(y)}{\partial y_j}\right).$$
Then
$$f_Y(y) = \operatorname{abs}(|\mathcal{J}|)\, f_X\{u(y)\}.$$

Example

$(x_1, \ldots, x_p)^\top = u(y_1, \ldots, y_p)$; for $Y = 3X$ we have $X = \frac{1}{3}Y = u(Y)$ and
$$\mathcal{J} = \begin{pmatrix} \frac{1}{3} & & 0\\ & \ddots & \\ 0 & & \frac{1}{3} \end{pmatrix}, \qquad \operatorname{abs}(|\mathcal{J}|) = \left(\frac{1}{3}\right)^p.$$

For $Y = AX + b$ with A nonsingular:
$$X = A^{-1}(Y - b), \qquad \mathcal{J} = A^{-1}, \qquad f_Y(y) = \operatorname{abs}(|A|^{-1})\, f_X\{A^{-1}(y - b)\}.$$

$X = (X_1, X_2)^\top \in \mathbb{R}^2$ with density $f_X(x) = f_X(x_1, x_2)$,
$$A = \begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}, \qquad b = \begin{pmatrix} 0\\ 0 \end{pmatrix}.$$
$$Y = AX + b = \begin{pmatrix} X_1 + X_2\\ X_1 - X_2 \end{pmatrix}, \qquad |A| = -2, \quad \operatorname{abs}(|A|^{-1}) = \frac{1}{2}, \quad A^{-1} = -\frac{1}{2}\begin{pmatrix} -1 & -1\\ -1 & 1 \end{pmatrix}.$$
$$f_Y(y) = \frac{1}{2}\, f_X\left\{\frac{1}{2}(y_1 + y_2), \frac{1}{2}(y_1 - y_2)\right\}.$$
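For standard normal X this linear-transformation formula can be checked directly, since then $Y = AX \sim N_2(0, AA^\top)$ has a known density; a small sketch (the evaluation point is an arbitrary choice):

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, -1.0]])

def f_X(x1, x2):
    # standard bivariate normal density
    return np.exp(-(x1**2 + x2**2) / 2) / (2 * np.pi)

def f_Y_via_formula(y1, y2):
    # transformation formula: abs(|A|^-1) * f_X{A^-1 y}, with abs(|A|^-1) = 1/2
    return 0.5 * f_X((y1 + y2) / 2, (y1 - y2) / 2)

def f_Y_direct(y):
    # Y = AX ~ N_2(0, A A^T): evaluate that normal density directly
    S = A @ A.T
    q = y @ np.linalg.inv(S) @ y
    return np.exp(-0.5 * q) / (2 * np.pi * np.sqrt(np.linalg.det(S)))

y = np.array([0.7, -1.2])             # arbitrary evaluation point
print(f_Y_via_formula(*y), f_Y_direct(y))   # the two agree
```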

Summary: Transformations

- If X has pdf $f_X(x)$, then the transformed random vector Y with $X = u(Y)$ has pdf $f_Y(y) = \operatorname{abs}(|\mathcal{J}|)\cdot f_X\{u(y)\}$, where $\mathcal{J} = \left(\frac{\partial u_i(y)}{\partial y_j}\right)$ denotes the Jacobian.
- In the case of a linear relation $Y = AX + b$, the pdfs of X and Y are related via $f_Y(y) = \operatorname{abs}(|A|^{-1})\, f_X\{A^{-1}(y - b)\}$.

Angewandte Multivariate Statistik Multivariate Distributions Multinormal Distribution

Multinormal Distribution

The pdf of a multinormal is (assuming that Σ has full rank):
$$f(x) = |2\pi\Sigma|^{-1/2}\exp\left\{-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right\}, \qquad X \sim N_p(\mu, \Sigma).$$
- The expected value is $E(X) = \mu$.
- The covariance matrix of X is $\operatorname{Var}(X) = \Sigma > 0$.
(What is the meaning of the quadratic form $(x-\mu)^\top\Sigma^{-1}(x-\mu)$ in the formula for the density?)

Geometry of the $N_p(\mu, \Sigma)$ Distribution

The density of $N_p(\mu, \Sigma)$ is constant on ellipsoids of the form
$$(x-\mu)^\top\Sigma^{-1}(x-\mu) = d^2.$$
If $X \sim N_p(\mu, \Sigma)$, then the variable $Y = (X-\mu)^\top\Sigma^{-1}(X-\mu)$ is $\chi^2_p$ distributed, since the Mahalanobis transformation yields $Z = \Sigma^{-1/2}(X-\mu) \sim N_p(0, I_p)$ and $Y = Z^\top Z = \sum_{j=1}^{p} Z_j^2$.
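The $\chi^2_p$ claim is easy to verify by simulation; a minimal sketch (µ and Σ reuse the values from the contour-ellipse example):

```python
import numpy as np

rng = np.random.default_rng(1)

# mu and Sigma as in the contour-ellipse example
mu = np.array([3.0, 2.0])
Sigma = np.array([[1.0, -1.5], [-1.5, 4.0]])
Sinv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(mu, Sigma, size=100_000)
d = X - mu
Y = np.einsum("ij,jk,ik->i", d, Sinv, d)  # (X-mu)^T Sigma^{-1} (X-mu)

# chi^2_p with p = 2 has mean 2 and variance 4
print(round(Y.mean(), 2), round(Y.var(), 2))
```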

Scatterplot of a normal sample and contour ellipses for $\mu = \begin{pmatrix}3\\2\end{pmatrix}$ and $\Sigma = \begin{pmatrix}1.0 & -1.5\\ -1.5 & 4.0\end{pmatrix}$. MVAcontnorm

Singular Normal Distribution

If the matrix Σ is singular, the "normal" distribution is defined using the nonzero eigenvalues $\lambda_1, \ldots, \lambda_k$ of Σ (with $\operatorname{rank}(\Sigma) = k < p$) and the generalized inverse $\Sigma^-$:
$$f(x) = \frac{(2\pi)^{-k/2}}{(\lambda_1\cdots\lambda_k)^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu)^\top\Sigma^-(x-\mu)\right\},$$
where $\Sigma^-$ denotes the G-inverse of Σ.

Summary: Multinormal Distribution

- The pdf of a p-dimensional multinormal $X \sim N_p(\mu, \Sigma)$ is $f(x) = |2\pi\Sigma|^{-1/2}\exp\left\{-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right\}$.
- The contour curves of a multinormal are ellipsoids with half-lengths proportional to $\sqrt{\lambda_i}$, where $\lambda_i$, $i = 1, \ldots, p$, denote the eigenvalues of Σ.
- The Mahalanobis transformation transforms $X \sim N_p(\mu, \Sigma)$ to $Y = \Sigma^{-1/2}(X-\mu) \sim N_p(0, I_p)$. Vice versa, one can create $X \sim N_p(\mu, \Sigma)$ from $Y \sim N_p(0, I_p)$ via $X = \Sigma^{1/2}Y + \mu$.
- If the covariance matrix Σ is singular (i.e., $\operatorname{rank}(\Sigma) = k < p$), then it defines a singular normal distribution with density $f(x) = \frac{(2\pi)^{-k/2}}{(\lambda_1\cdots\lambda_k)^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu)^\top\Sigma^-(x-\mu)\right\}$, where $\Sigma^-$ denotes the G-inverse of Σ.

Angewandte Multivariate Statistik Multivariate Distributions Limit Theorems

Limit Theorems

The Central Limit Theorem describes the (asymptotic) behaviour of the sample mean. For $X_1, X_2, \ldots, X_n$ i.i.d. with $X_i \sim (\mu, \Sigma)$:
$$\sqrt{n}(\bar{x} - \mu) \xrightarrow{\mathcal{L}} N_p(0, \Sigma) \quad\text{for } n \to \infty.$$
The CLT can easily be applied for testing; the normal distribution plays a central role in statistics.

The CLT for Bernoulli distributed random variables. Sample size n = 5 (left) and n = 35 (right). MVAcltbern

The CLT in the two-dimensional case. Sample size n = 5 (left) and n = 85 (right). MVAcltbern2

Let $\widehat{\Sigma}$ be a consistent estimator of Σ: $\widehat{\Sigma} \xrightarrow{P} \Sigma$. Then $\bar{x}$ is asymptotically normal:
$$\sqrt{n}\,\widehat{\Sigma}^{-\frac{1}{2}}(\bar{x} - \mu) \xrightarrow{\mathcal{L}} N_p(0, I_p) \quad\text{as } n \to \infty.$$
Confidence interval for the (univariate) mean µ, $X_i \sim N(\mu, \sigma^2)$:
$$\sqrt{n}\left(\frac{\bar{x} - \mu}{\sigma}\right) \xrightarrow{\mathcal{L}} N(0, 1) \quad\text{as } n \to \infty.$$

Define $u_{1-\alpha/2}$ as the $1-\alpha/2$ quantile of the $N(0,1)$ distribution. Then we get the following $1-\alpha$ confidence interval:
$$C_{1-\alpha} = \left[\bar{x} - \frac{\sigma}{\sqrt{n}}u_{1-\alpha/2},\; \bar{x} + \frac{\sigma}{\sqrt{n}}u_{1-\alpha/2}\right], \qquad P(\mu \in C_{1-\alpha}) \longrightarrow 1 - \alpha \quad\text{for } n \to \infty.$$
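The coverage of this interval can be checked by simulation; a minimal sketch (µ, σ, n and the replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Coverage of the CLT interval xbar +/- u_{1-a/2} * sigma / sqrt(n), known sigma.
mu, sigma, n = 5.0, 2.0, 50
u = 1.959964                          # 0.975 quantile of N(0,1)

reps = 20_000
X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
half = u * sigma / np.sqrt(n)
covered = np.mean((xbar - half <= mu) & (mu <= xbar + half))

print(round(covered, 3))              # close to the nominal 0.95
```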

The standard normal cdf and the empirical distribution function for n = 100. MVAedfnormal

The standard normal cdf and the empirical distribution function for n = 1000. MVAedfnormal

The edf $F_n$ and two bootstrap edfs $F_n^*$, n = 100. MVAedfbootstrap

Bootstrap Confidence Intervals

Empirical distribution function (edf): $F_n(x) = n^{-1}\sum_{i=1}^{n} I(x_i \le x)$, with $X_i \sim F$, $X_i^* \sim F_n$, and $\bar{x}^*$ the mean of the bootstrap sample. Then
$$\sup_u\left|P^*\left(\frac{\sqrt{n}(\bar{x}^* - \bar{x})}{\sigma^*} < u\right) - P\left(\frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} < u\right)\right| \xrightarrow{a.s.} 0.$$
Construction of confidence intervals is possible: the unknown distribution of $\bar{x}$ can be approximated by the known distribution of $\bar{x}^*$.
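A minimal bootstrap sketch for the mean, using the percentile interval (the data-generating distribution, sample size, and number of replications B are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Percentile bootstrap for the mean.
x = rng.exponential(scale=2.0, size=100)              # observed sample
B = 5000

boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean()   # xbar* from F_n
    for _ in range(B)
])

lo, hi = np.quantile(boot_means, [0.025, 0.975])      # percentile interval
print(round(x.mean(), 2), round(lo, 2), round(hi, 2))
```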

Transformation of Statistics

If $\sqrt{n}(t - \mu) \xrightarrow{\mathcal{L}} N_p(0, \Sigma)$ and $f = (f_1, \ldots, f_q)^\top: \mathbb{R}^p \to \mathbb{R}^q$ are real-valued functions which are differentiable at $\mu \in \mathbb{R}^p$, then $f(t)$ is asymptotically normal with mean $f(\mu)$ and covariance $\mathcal{D}^\top\Sigma\mathcal{D}$, i.e.,
$$\sqrt{n}\{f(t) - f(\mu)\} \xrightarrow{\mathcal{L}} N_q(0, \mathcal{D}^\top\Sigma\mathcal{D}) \quad\text{for } n \to \infty,$$
where
$$\mathcal{D} = \left(\frac{\partial f_j}{\partial t_i}\right)\bigg|_{t=\mu}$$
is the $(p \times q)$ matrix of all partial derivatives. This theorem can be applied, e.g., to find a "variance stabilizing" transformation.

Example
Suppose $\{X_i\}_{i=1}^{n} \sim (\mu, \Sigma)$ with $\mu = \begin{pmatrix}0\\0\end{pmatrix}$, $\Sigma = \begin{pmatrix}1 & 0.5\\ 0.5 & 1\end{pmatrix}$, $p = 2$. By the CLT, for $n \to \infty$,
$$\sqrt{n}(\bar{x} - \mu) \xrightarrow{\mathcal{L}} N(0, \Sigma).$$
What is the distribution of $\begin{pmatrix}\bar{x}_1^2 - \bar{x}_2\\ \bar{x}_1 + 3\bar{x}_2\end{pmatrix}$? This means considering $f = (f_1, f_2)^\top$ with
$$f_1(x_1, x_2) = x_1^2 - x_2, \qquad f_2(x_1, x_2) = x_1 + 3x_2, \qquad q = 2.$$

Then $f(\mu) = \begin{pmatrix}0\\0\end{pmatrix}$ and
$$\mathcal{D} = (d_{ij}), \qquad d_{ij} = \left(\frac{\partial f_j}{\partial x_i}\right)\bigg|_{x=\mu} = \begin{pmatrix}2x_1 & 1\\ -1 & 3\end{pmatrix}\bigg|_{x=0} = \begin{pmatrix}0 & 1\\ -1 & 3\end{pmatrix}.$$
The covariance is
$$\mathcal{D}^\top\Sigma\mathcal{D} = \begin{pmatrix}0 & -1\\ 1 & 3\end{pmatrix}\begin{pmatrix}1 & \frac{1}{2}\\ \frac{1}{2} & 1\end{pmatrix}\begin{pmatrix}0 & 1\\ -1 & 3\end{pmatrix} = \begin{pmatrix}1 & -\frac{7}{2}\\ -\frac{7}{2} & 13\end{pmatrix}.$$
This yields
$$\sqrt{n}\begin{pmatrix}\bar{x}_1^2 - \bar{x}_2\\ \bar{x}_1 + 3\bar{x}_2\end{pmatrix} \xrightarrow{\mathcal{L}} N_2\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1 & -\frac{7}{2}\\ -\frac{7}{2} & 13\end{pmatrix}\right).$$
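The matrix product in this delta-method example is quickly verified numerically:

```python
import numpy as np

# D^T Sigma D for the example: f1 = x1^2 - x2, f2 = x1 + 3 x2 at mu = 0.
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
D = np.array([[0.0, 1.0],
              [-1.0, 3.0]])          # partial derivatives evaluated at mu

cov = D.T @ Sigma @ D
print(cov)                           # [[1, -3.5], [-3.5, 13]]
```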

Summary: Limit Theorems

- If $X_1, \ldots, X_n$ are i.i.d. random vectors with $X_i \sim (\mu, \Sigma)$, then the distribution of $\sqrt{n}(\bar{x} - \mu)$ is asymptotically $N(0, \Sigma)$ (Central Limit Theorem).
- If $X_1, \ldots, X_n$ are i.i.d. random variables with $X_i \sim (\mu, \sigma^2)$, then an asymptotic confidence interval can be constructed via the CLT: $\bar{x} \pm \frac{\sigma}{\sqrt{n}}u_{1-\alpha/2}$.
- For small sample sizes the bootstrap improves the precision of this confidence interval.
- The bootstrap estimates $\bar{x}^*$ have the same asymptotic limit.
- If t is a statistic that is asymptotically normal, i.e., $\sqrt{n}(t - \mu) \xrightarrow{\mathcal{L}} N_p(0, \Sigma)$, then this also holds for a function $f(t)$, i.e., $\sqrt{n}\{f(t) - f(\mu)\}$ is asymptotically normal.

Angewandte Multivariate Statistik Multivariate Distributions Heavy-Tailed Distributions

Heavy-Tailed Distributions

- Introduced by Pareto, studied by Paul Lévy
- Applications: finance, medicine, seismology, engineering
  - asset returns in financial markets
  - stream flow in hydrology
  - insurance
  - precipitation and hurricane damage in meteorology
  - earthquake prediction in seismology
  - pollution
  - material strength

Definition

A distribution is called heavy-tailed if it has higher probability density in its tail area than a normal distribution with the same mean µ and variance σ².

Figure 1: Comparison of the pdf of a standard Gaussian (blue) and a Cauchy distribution (red) with location parameter 0 and scale parameter 1. MVAgausscauchy

Kurtosis

In terms of kurtosis, a heavy-tailed distribution has kurtosis greater than 3 and is called leptokurtic, in contrast to a mesokurtic distribution (kurtosis = 3) and a platykurtic distribution (kurtosis < 3).

Generalised Hyperbolic Distribution

Introduced by Barndorff-Nielsen and first applied to model the grain-size distribution of wind-blown sand. Applications: stock-price modelling, market risk measurement.

PDF of the GH Distribution

The density of a one-dimensional generalised hyperbolic (GH) distribution for $x \in \mathbb{R}$ is
$$f_{GH}(x; \lambda, \alpha, \beta, \delta, \mu) = \frac{(\sqrt{\alpha^2 - \beta^2}/\delta)^\lambda}{\sqrt{2\pi}\,K_\lambda(\delta\sqrt{\alpha^2 - \beta^2})}\cdot\frac{K_{\lambda - 1/2}\left\{\alpha\sqrt{\delta^2 + (x-\mu)^2}\right\}}{\left\{\sqrt{\delta^2 + (x-\mu)^2}/\alpha\right\}^{1/2 - \lambda}}\, e^{\beta(x-\mu)},$$
where $K_\lambda$ is a modified Bessel function of the third kind with index λ:
$$K_\lambda(x) = \frac{1}{2}\int_0^\infty y^{\lambda - 1} e^{-\frac{x}{2}(y + y^{-1})}\,dy.$$

Parameters

The domain of variation of the parameters is $\mu \in \mathbb{R}$ and
$$\delta \ge 0,\ |\beta| < \alpha \quad\text{if } \lambda > 0,$$
$$\delta > 0,\ |\beta| < \alpha \quad\text{if } \lambda = 0,$$
$$\delta > 0,\ |\beta| \le \alpha \quad\text{if } \lambda < 0,$$
where µ is the location and δ the scale parameter.

Mean and Variance of the GH Distribution

Writing $\zeta = \delta\sqrt{\alpha^2 - \beta^2}$:
$$E[X] = \mu + \frac{\delta\beta}{\sqrt{\alpha^2 - \beta^2}}\,\frac{K_{\lambda+1}(\zeta)}{K_\lambda(\zeta)}$$
$$\operatorname{Var}[X] = \delta^2\left[\frac{K_{\lambda+1}(\zeta)}{\zeta K_\lambda(\zeta)} + \frac{\beta^2}{\alpha^2 - \beta^2}\left\{\frac{K_{\lambda+2}(\zeta)}{K_\lambda(\zeta)} - \left(\frac{K_{\lambda+1}(\zeta)}{K_\lambda(\zeta)}\right)^2\right\}\right]$$

Hyperbolic and Normal-Inverse Gaussian Distributions

With specific values of λ we obtain different sub-classes of GH.

For $\lambda = 1$ we obtain the hyperbolic distribution (HYP):
$$f_{HYP}(x; \alpha, \beta, \delta, \mu) = \frac{\sqrt{\alpha^2 - \beta^2}}{2\alpha\delta K_1(\delta\sqrt{\alpha^2 - \beta^2})}\, e^{-\alpha\sqrt{\delta^2 + (x-\mu)^2} + \beta(x-\mu)},$$
where $x, \mu \in \mathbb{R}$, $\delta \ge 0$ and $|\beta| < \alpha$.

For $\lambda = -1/2$ we obtain the normal-inverse Gaussian distribution (NIG):
$$f_{NIG}(x; \alpha, \beta, \delta, \mu) = \frac{\alpha\delta}{\pi}\,\frac{K_1\left\{\alpha\sqrt{\delta^2 + (x-\mu)^2}\right\}}{\sqrt{\delta^2 + (x-\mu)^2}}\, e^{\delta\sqrt{\alpha^2 - \beta^2} + \beta(x-\mu)}.$$

Figure 2: pdf (left) and cdf (right) of GH (λ = 0.5), HYP and NIG with α = 1, β = 0, δ = 1, µ = 0. MVAghdis

Student's t-distribution

Introduced by Gosset (1908), who published under the pseudonym "Student" at the request of his employer. Let X be a normally distributed rv with mean µ and variance σ², and let Y be an rv such that Y²/σ² has a chi-square distribution with n degrees of freedom. Assume that X and Y are independent; then
$$t \overset{\mathrm{def}}{=} \frac{X\sqrt{n}}{Y}$$
is distributed as Student's t with n degrees of freedom.

PDF of Student's t-distribution

The t-distribution has the density
$$f_t(x; n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}},$$
where n is the number of degrees of freedom, $-\infty < x < \infty$, and Γ is the gamma function
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx.$$

Figure 3: pdf (left) and cdf (right) of the t-distribution with different degrees of freedom (t3 stands for the t-distribution with 3 degrees of freedom). MVAtdis

Mean, Variance, Skewness and Kurtosis

The mean, variance, skewness and kurtosis of Student's t-distribution (n > 4) are:
$$\mu = 0, \qquad \sigma^2 = \frac{n}{n-2}, \qquad \text{Skewness} = 0, \qquad \text{Kurtosis} = 3 + \frac{6}{n-4}.$$

Property

Student's t-distribution approaches the normal distribution as n increases, since
$$\lim_{n\to\infty} f_t(x; n) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}.$$

Tail of Student's t-distribution

In the tail, the density $f_t(x; n)$ is proportional to $|x|^{-(n+1)}$.

Figure 4: Tails of pdf curves of t-distributions. With higher degrees of freedom, the t-distribution decays faster. MVAdistail

Laplace Distribution

The univariate Laplace distribution with mean zero was introduced by Laplace (1774). The Laplace distribution can be defined as the distribution of the difference between two independent variates with identical exponential distributions.

PDF and CDF of the Laplace Distribution

The Laplace distribution with mean µ and scale parameter θ has the pdf
$$f_{\mathrm{Laplace}}(x; \mu, \theta) = \frac{1}{2\theta}\, e^{-\frac{|x-\mu|}{\theta}}$$
and the cdf
$$F_{\mathrm{Laplace}}(x; \mu, \theta) = \frac{1}{2}\left\{1 + \operatorname{sgn}(x-\mu)\left(1 - e^{-\frac{|x-\mu|}{\theta}}\right)\right\},$$
where sgn is the signum function.

Mean, Variance, Skewness and Kurtosis

The mean, variance, skewness and kurtosis of the Laplace distribution:
$$\mu = \mu, \qquad \sigma^2 = 2\theta^2, \qquad \text{Skewness} = 0, \qquad \text{Kurtosis} = 6.$$

Figure 5: pdf (left) and cdf (right) of Laplace distributions with zero mean and different scale parameters (L1 stands for the Laplace distribution with θ = 1). MVAlaplacedis

Standard Laplace Distribution

The standard Laplace distribution has mean 0 and θ = 1:
$$f(x) = \frac{e^{-|x|}}{2}, \qquad F(x) = \begin{cases}\dfrac{e^x}{2} & \text{for } x < 0,\\[4pt] 1 - \dfrac{e^{-x}}{2} & \text{for } x \ge 0.\end{cases}$$
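The exponential-difference construction and the moments above can be checked together by simulation (sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)

# Standard Laplace as the difference of two independent Exp(1) variates;
# its variance should be 2 (theta = 1) and its kurtosis 6.
n = 500_000
x = rng.exponential(1.0, n) - rng.exponential(1.0, n)

var = x.var()
kurt = np.mean((x - x.mean())**4) / var**2

print(round(var, 2), round(kurt, 2))   # near 2 and 6
```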

Cauchy Distribution

Named after Augustin Cauchy and Hendrik Lorentz. Applications: in physics, the solution of the differential equation describing forced resonance; in spectroscopy, the description of the line shape of spectral lines.

PDF and CDF of the Cauchy Distribution

$$f_{\mathrm{Cauchy}}(x; m, s) = \frac{1}{s\pi}\,\frac{1}{1 + \left(\frac{x-m}{s}\right)^2}, \qquad F_{\mathrm{Cauchy}}(x; m, s) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{x-m}{s}\right),$$
where m and s are the location and scale parameter, respectively.

Standard Cauchy Distribution

The standard Cauchy distribution has m = 0 and s = 1:
$$f_{\mathrm{Cauchy}}(x) = \frac{1}{\pi(1 + x^2)}, \qquad F_{\mathrm{Cauchy}}(x) = \frac{1}{2} + \frac{\arctan(x)}{\pi}.$$

Figure 6: pdf (left) and cdf (right) of Cauchy distributions with m = 0 and different scale parameters (C1 stands for the Cauchy distribution with s = 1). MVAcauchy

Mean, Variance, Skewness and Kurtosis

The mean, variance, skewness and kurtosis of the Cauchy distribution are all undefined, since the corresponding moment integrals do not converge. It does, however, have a mode and a median, both equal to the location parameter m.

Mixture Model

Mixture modelling concerns modelling a distribution by a mixture (weighted sum) of different distributions.

PDF of a Mixture Model

The pdf of a mixture distribution is
$$f(x) = \sum_{l=1}^{n} w_l\, p_l(x)$$
under the constraints
$$0 \le w_l \le 1, \qquad \sum_{l=1}^{n} w_l = 1, \qquad \int p_l(x)\,dx = 1,$$
where $p_l(x)$ is the pdf of the l-th component density and $w_l$ is a weight.

Mean, Variance, Skewness and Kurtosis

$$\mu = \sum_{l=1}^{n} w_l \mu_l$$
$$\sigma^2 = \sum_{l=1}^{n} w_l\left\{\sigma_l^2 + (\mu_l - \mu)^2\right\}$$
$$\text{Skewness} = \sum_{l=1}^{n} w_l\left[\left(\frac{\sigma_l}{\sigma}\right)^3 SK_l + \frac{3\sigma_l^2(\mu_l - \mu)}{\sigma^3} + \left(\frac{\mu_l - \mu}{\sigma}\right)^3\right]$$
$$\text{Kurtosis} = \sum_{l=1}^{n} w_l\left[\left(\frac{\sigma_l}{\sigma}\right)^4 K_l + \frac{6(\mu_l - \mu)^2\sigma_l^2}{\sigma^4} + \frac{4(\mu_l - \mu)\sigma_l^3}{\sigma^4} SK_l + \left(\frac{\mu_l - \mu}{\sigma}\right)^4\right],$$
where $\mu_l, \sigma_l, SK_l$ and $K_l$ correspond to the l-th distribution.

Gaussian Mixture Models

The pdf of a Gaussian mixture:
$$f_{GM}(x) = \sum_{l=1}^{n}\frac{w_l}{\sqrt{2\pi}\sigma_l}\, e^{-\frac{(x-\mu_l)^2}{2\sigma_l^2}}.$$
When the Gaussian components have mean 0:
$$f_{GM}(x) = \sum_{l=1}^{n}\frac{w_l}{\sqrt{2\pi}\sigma_l}\, e^{-\frac{x^2}{2\sigma_l^2}},$$
with variance, skewness and kurtosis
$$\sigma^2 = \sum_{l=1}^{n} w_l\sigma_l^2, \qquad \text{Skewness} = 0, \qquad \text{Kurtosis} = \sum_{l=1}^{n} 3w_l\left(\frac{\sigma_l}{\sigma}\right)^4.$$
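The zero-mean mixture formulas can be checked by simulation; a minimal sketch (the weights and component scales are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# Two-component zero-mean Gaussian mixture.
w = np.array([0.7, 0.3])          # weights, sum to 1
s = np.array([1.0, 3.0])          # component standard deviations

n = 1_000_000
comp = rng.choice(2, size=n, p=w)          # pick a component per draw
x = rng.normal(0.0, s[comp])

var_theory = np.sum(w * s**2)
kurt_theory = np.sum(3 * w * (s / np.sqrt(var_theory))**4)

kurt_sample = np.mean(x**4) / x.var()**2
print(round(x.var(), 2), round(var_theory, 2))
print(round(kurt_sample, 2), round(kurt_theory, 2))
```

The sample kurtosis exceeds 3 here, illustrating that a scale mixture of normals is leptokurtic.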

Figure 7: pdf (left) and cdf (right) of a Gaussian mixture and a Gaussian. MVAmixture

Remark: The Gaussian mixture is in general not a Gaussian distribution.

Multivariate Generalised Hyperbolic Distribution

The multivariate generalised hyperbolic distribution ($GH_d$) has the pdf
$$f_{GH_d}(x; \lambda, \alpha, \beta, \delta, \Delta, \mu) = a_d\,\frac{K_{\lambda - \frac{d}{2}}\left\{\alpha\sqrt{\delta^2 + (x-\mu)^\top\Delta^{-1}(x-\mu)}\right\}}{\left\{\alpha^{-1}\sqrt{\delta^2 + (x-\mu)^\top\Delta^{-1}(x-\mu)}\right\}^{\frac{d}{2}-\lambda}}\, e^{\beta^\top(x-\mu)},$$
$$a_d = a_d(\lambda, \alpha, \beta, \delta, \Delta) = \frac{\left(\sqrt{\alpha^2 - \beta^\top\Delta\beta}/\delta\right)^\lambda}{(2\pi)^{\frac{d}{2}}\, K_\lambda\left(\delta\sqrt{\alpha^2 - \beta^\top\Delta\beta}\right)}.$$

Parameters of $GH_d$

The domain of variation of the parameters:
- $\lambda \in \mathbb{R}$, $\beta, \mu \in \mathbb{R}^d$
- $\delta > 0$, $\alpha^2 > \beta^\top\Delta\beta$
- $\Delta \in \mathbb{R}^{d\times d}$ a positive definite matrix with $|\Delta| = 1$

For $\lambda = \frac{d+1}{2}$ we obtain the multivariate hyperbolic (HYP) distribution; for $\lambda = -\frac{1}{2}$ we get the multivariate normal inverse Gaussian (NIG) distribution.

Second Parameterization

Blæsild and Jensen (1981) introduced a second parameterization $(\zeta, \Pi, \Sigma)$, where
$$\zeta = \delta\sqrt{\alpha^2 - \beta^\top\Delta\beta}, \qquad \Pi = \frac{\beta^\top\Delta^{\frac{1}{2}}}{\sqrt{\alpha^2 - \beta^\top\Delta\beta}}, \qquad \Sigma = \delta^2\Delta.$$

The mean and variance of $X \sim GH_d$ are
$$E[X] = \mu + \delta R_\lambda(\zeta)\,\Pi\Delta^{\frac{1}{2}}$$
$$\operatorname{Var}[X] = \delta^2\left\{\zeta^{-1} R_\lambda(\zeta)\,\Delta + S_\lambda(\zeta)\left(\Pi\Delta^{\frac{1}{2}}\right)^\top\left(\Pi\Delta^{\frac{1}{2}}\right)\right\},$$
where
$$R_\lambda(x) = \frac{K_{\lambda+1}(x)}{K_\lambda(x)}, \qquad S_\lambda(x) = \frac{K_{\lambda+2}(x)K_\lambda(x) - K_{\lambda+1}^2(x)}{K_\lambda^2(x)}.$$

Multivariate t-distribution

If $X \sim N_p(\mu, \Sigma)$ and $Y \sim \chi_n^2$ are independent and $X\sqrt{n/Y} = t - \mu$, then the pdf of t is
$$f_t(t; n, \Sigma, \mu) = \frac{\Gamma\{(n+p)/2\}}{\Gamma(n/2)\, n^{p/2}\pi^{p/2}|\Sigma|^{1/2}\left\{1 + \frac{1}{n}(t-\mu)^\top\Sigma^{-1}(t-\mu)\right\}^{(n+p)/2}}.$$
The distribution of t is the noncentral t-distribution with n degrees of freedom and noncentrality parameter µ.

Multivariate Laplace Distribution

Let g and G be the pdf and cdf of a d-dimensional Gaussian distribution $N_d(0, \Sigma)$. The pdf and cdf of a multivariate Laplace distribution can be written as
$$f_{\mathrm{MLaplace}_d}(x; m, \Sigma) = \int_0^\infty g\left(z^{-\frac{1}{2}}x - z^{\frac{1}{2}}m\right) z^{-\frac{d}{2}}\, e^{-z}\,dz,$$
$$F_{\mathrm{MLaplace}_d}(x; m, \Sigma) = \int_0^\infty G\left(z^{-\frac{1}{2}}x - z^{\frac{1}{2}}m\right) e^{-z}\,dz.$$

PDF of the Multivariate Laplace Distribution

The pdf can also be written as
$$f_{\mathrm{MLaplace}_d}(x; m, \Sigma) = \frac{2\, e^{x^\top\Sigma^{-1}m}}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}\left(\frac{x^\top\Sigma^{-1}x}{2 + m^\top\Sigma^{-1}m}\right)^{\frac{\lambda}{2}} K_\lambda\left\{\sqrt{(2 + m^\top\Sigma^{-1}m)(x^\top\Sigma^{-1}x)}\right\},$$
where $\lambda = \frac{2-d}{2}$ and $K_\lambda(x)$ is the modified Bessel function of the third kind
$$K_\lambda(x) = \frac{1}{2}\left(\frac{x}{2}\right)^\lambda\int_0^\infty t^{-\lambda-1}\, e^{-t - \frac{x^2}{4t}}\,dt, \qquad x > 0.$$

Mean and Variance of the Multivariate Laplace Distribution

$$E[X] = m, \qquad \operatorname{Cov}[X] = \Sigma + mm^\top.$$

Multivariate Mixture Model

A multivariate mixture model combines multivariate component distributions; e.g., the pdf of a multivariate Gaussian mixture can be written as
$$f(x) = \sum_{l=1}^{n}\frac{w_l}{|2\pi\Sigma_l|^{\frac{1}{2}}}\, e^{-\frac{1}{2}(x-\mu_l)^\top\Sigma_l^{-1}(x-\mu_l)}.$$

Generalised Hyperbolic Distribution

The GH distribution has exponential tail decay:
$$f_{GH}(x; \lambda, \alpha, \beta, \delta, \mu = 0) \sim x^{\lambda-1}\, e^{-(\alpha-\beta)x} \quad\text{as } x \to \infty.$$

Figure 8: Graphical comparison of tail behavior. For all distributions the means equal 0 and the variances equal 1. The NIG distribution (line) with λ = −1/2 decays second fastest in the tails and has the highest peak. The Cauchy distribution (dots) has the lowest peak and the fattest tails. MVAghadatail

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Copulae vs Normal Distribution

1. The empirical marginal distributions are skewed and fat-tailed.
2. The multivariate normal distribution does not consider the possibility of extreme joint co-movements of asset returns. The dependency structure of portfolio asset returns differs from the Gaussian one.

Advantages

1. Copulae are useful tools to simulate asset return distributions in a more realistic way.
2. Copulae allow one to model the dependence structure independently from the marginal distributions:
   - construct a multivariate distribution with different margins;
   - the dependence structure is given by the copula.

Dependency Structures

[Figure: three scatter plots of bivariate samples]

Abbildung 9: Scatter plots of bivariate samples with different dependency structures and equal correlation coefficients.

Ostap Okhrin 274 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Varying Dependency

[Figure: two scatter plots of standardized log returns, Siemens vs. Bayer]

Abbildung 10: Standardized log returns of Bayer and Siemens, 2000-01-03 to 2002-01-01 (left) and 2004-01-01 to 2006-01-02 (right). MVAscalogret

Ostap Okhrin 275 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Outline

1. Motivation X
2. Copulae
3. Parameter Estimation
4. Sampling from Copulae
5. Tail Dependence
6. Value-at-Risk with Copulae
7. Application

Ostap Okhrin 276 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Copulae

A copula is a multivariate distribution function defined on the unit cube [0, 1]^d with uniformly distributed margins:

P(X1 ≤ x1, . . . , Xd ≤ xd) = C{P(X1 ≤ x1), . . . , P(Xd ≤ xd)} = C{F1(x1), . . . , Fd(xd)}

Ostap Okhrin 277 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Applications

1. medicine
2. hydrology
3. finance (portfolio selection, time series, risk management)


Abbildung 11: Number of documents on copula theory, 1971–2005. Breakdown by discipline of the 871 documents in the database (41% Finance, 28% Statistics, 10% Biostatistics, 8% Mathematics, 6% Insurance)

Ostap Okhrin 278 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

F-volume

Let U1 and U2 be two sets in R̄ = R ∪ {+∞} ∪ {−∞} and consider the function F : U1 × U2 → R̄.
The F-volume of a rectangle B = [x1, x2] × [y1, y2] ⊂ U1 × U2 is defined as:

V_F(B) = F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1) (2)
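Definition (2) is easy to compute for any bivariate function. A minimal sketch: for the product cdf F(x, y) = xy the F-volume of a rectangle is just its area, which is non-negative (the values below are illustrative):

```python
def f_volume(F, x1, x2, y1, y2):
    """F-volume of the rectangle [x1, x2] x [y1, y2], eq. (2)."""
    return F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

# For the product function F(x, y) = x*y the F-volume of [x1,x2] x [y1,y2]
# equals (x2 - x1)*(y2 - y1), the area of the rectangle.
prod = lambda x, y: x * y
v = f_volume(prod, 0.2, 0.7, 0.1, 0.9)
```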

Ostap Okhrin 279 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

2-increasing Function

F is said to be a 2-increasing function if for every B = [x1, x2] × [y1, y2] ⊂ U1 × U2,

VF (B) ≥ 0 (3)

Remark: Note that "being a 2-increasing function" neither implies nor is implied by "being increasing in each argument".

Ostap Okhrin 280 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

2-increasing Function

Lemma
Let U1 and U2 be non-empty sets in R̄ and let F : U1 × U2 → R̄ be a two-increasing function. Let x1, x2 be in U1 with x1 ≤ x2, and y1, y2 be in U2 with y1 ≤ y2. Then the function t ↦ F(t, y2) − F(t, y1) is non-decreasing on U1 and the function t ↦ F(x2, t) − F(x1, t) is non-decreasing on U2.

Ostap Okhrin 281 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Grounded Function

If U1 and U2 have a smallest element min U1 and min U2 respectively, then we say that a function F : U1 × U2 → R̄ is grounded if:

for all x ∈ U1: F(x, min U2) = 0 and (4)
for all y ∈ U2: F(min U1, y) = 0 (5)

Ostap Okhrin 282 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Distribution Function

A distribution function is a function from R̄² → [0, 1] which:
- is grounded
- is 2-increasing
- satisfies F(∞, ∞) = 1.

Ostap Okhrin 283 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Margins

If U1 and U2 have a greatest element max U1 and max U2 respectively, then we say that a function F : U1 × U2 → R̄ has margins, and that the margins of F are given by:

F(x) = F(x, max U2) for all x ∈ U1 (6)
F(y) = F(max U1, y) for all y ∈ U2 (7)

Ostap Okhrin 284 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Bivariate Copulae

A 2-dimensional copula is a function C : [0, 1]² → [0, 1] with the following properties:
1. For every u ∈ [0, 1], C(0, u) = C(u, 0) = 0 (grounded).
2. For every u ∈ [0, 1], C(u, 1) = u and C(1, u) = u.
3. For every (u1, u2), (v1, v2) ∈ [0, 1] × [0, 1] with u1 ≤ v1 and u2 ≤ v2: C(v1, v2) − C(v1, u2) − C(u1, v2) + C(u1, u2) ≥ 0 (2-increasing).

Ostap Okhrin 285 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Fréchet-Hoeffding Bounds

1. every copula C satisfies

W (u1, u2) ≤ C (u1, u2) ≤ M(u1, u2)

2. upper and lower bounds are copulae

M(u1, u2) = min(u1, u2)

W (u1, u2) = max(u1 + u2 − 1, 0)

Ostap Okhrin 286 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Fréchet Copulae

Abbildung 12: M(u, v) = min(u, v), W(u, v) = max(u + v − 1, 0) and Π(u, v) = uv. SFEfrechet


Ostap Okhrin 287 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Sklar’s Theorem in Two Dimensions

Let F be a two-dimensional distribution function with marginal distribution functions FX1 and FX2. Then a copula C exists such that for all x1, x2 ∈ R:

F(x1, x2) = C{FX1(x1), FX2(x2)} (8)

Moreover, if FX1 and FX2 are continuous, then C is unique. Otherwise C is uniquely determined on the Cartesian product Im(FX1) × Im(FX2).
Conversely, if C is a copula and FX1 and FX2 are distribution functions, then F defined by (8) is a two-dimensional distribution function with marginals FX1 and FX2.

Ostap Okhrin 288 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Gauss Copula

C(u1, u2) = Φρ{Φ⁻¹(u1), Φ⁻¹(u2)}

= ∫_{−∞}^{Φ⁻¹(u1)} ∫_{−∞}^{Φ⁻¹(u2)} 1/(2π√(1 − ρ²)) exp{−(x² − 2ρxy + y²)/(2(1 − ρ²))} dx dy
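Since the Gauss copula is the probability that a correlated normal pair falls below the two quantiles, it can be approximated by Monte Carlo without evaluating the double integral. A sketch (numpy and the standard library only; the bisection quantile function is an assumption to avoid scipy):

```python
import numpy as np
from math import erf

def norm_ppf(u, lo=-10.0, hi=10.0):
    """Standard normal quantile via bisection on Phi (avoids scipy.special)."""
    phi = lambda x: 0.5 * (1.0 + erf(x / 2 ** 0.5))
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if phi(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gauss_copula_mc(u1, u2, rho, n=200_000, seed=0):
    """Monte Carlo estimate of C(u1, u2) = P(X <= Phi^-1(u1), Y <= Phi^-1(u2))."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
    return np.mean((x <= norm_ppf(u1)) & (y <= norm_ppf(u2)))
```

For ρ = 0 this reproduces the product copula u1·u2, and for ρ → 1 it approaches the upper bound min(u1, u2).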

Abbildung 13: Gauss copula density, ρ = 0.4. MSRpdf_cop_Gauss


Ostap Okhrin 289 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

t-Student Copula

C(u1, u2) = tρ,ν{tν⁻¹(u1), tν⁻¹(u2)}

= ∫_{−∞}^{tν⁻¹(u1)} ∫_{−∞}^{tν⁻¹(u2)} 1/(2π√(1 − ρ²)) {1 + (x² − 2ρxy + y²)/(ν(1 − ρ²))}^{−(ν+2)/2} dx dy

Abbildung 14: t-Student copula density, ν = 3, ρ = 0.4.MSRpdf_cop_tStudent


Ostap Okhrin 290 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Archimedean Copulae

Archimedean copula:

Archimedean copula:

C(u, v) = ψ^[−1]{ψ(u) + ψ(v)}

for a continuous, decreasing and convex ψ with ψ(1) = 0, where

ψ^[−1](t) = ψ⁻¹(t) for 0 ≤ t ≤ ψ(0), and ψ^[−1](t) = 0 for ψ(0) < t ≤ ∞.

The function ψ is a generator of the Archimedean copula.
For ψ(0) = ∞: ψ^[−1] = ψ⁻¹ and ψ is called a strict generator.
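The generator construction translates directly into code. A minimal sketch instantiating it with the Clayton generator ψ(t) = (t^{−θ} − 1)/θ, which is strict for θ > 0 (the parameter value is illustrative):

```python
import math

def archimedean(u, v, psi, psi_inv, psi0):
    """C(u, v) = psi^[-1]{psi(u) + psi(v)} with the pseudo-inverse psi^[-1]."""
    t = psi(u) + psi(v)
    return psi_inv(t) if t <= psi0 else 0.0

# Clayton generator (strict, so psi(0) = inf and psi^[-1] = psi^-1)
theta = 2.0
psi = lambda t: (t ** (-theta) - 1.0) / theta
psi_inv = lambda t: (1.0 + theta * t) ** (-1.0 / theta)
C = lambda u, v: archimedean(u, v, psi, psi_inv, math.inf)
```

Plugging the generator back in recovers the closed form C(u, v) = (u^{−θ} + v^{−θ} − 1)^{−1/θ}, and the margin property C(u, 1) = u follows from ψ(1) = 0.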

Ostap Okhrin 291 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Gumbel Copula

C(u, v) = exp[−{(−log u)^θ + (−log v)^θ}^{1/θ}]

Abbildung 15: Gumbel copula density, parameter θ = 2.MSRpdf_cop_Gumbel


Ostap Okhrin 292 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Clayton Copula

C(u, v) = max{(u^{−θ} + v^{−θ} − 1)^{−1/θ}, 0}

Abbildung 16: Clayton copula density, θ = 2. MSRpdf_cop_Clayton

Ostap Okhrin 293 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Frank Copula

C(u, v) = −(1/θ) log{1 + (e^{−θu} − 1)(e^{−θv} − 1)/(e^{−θ} − 1)}

Abbildung 17: Frank copula density, θ = 2. MSRpdf_cop_Frank

Ostap Okhrin 294 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

[Figure: two scatter plots of uniform pseudo-samples, Clayton (left) and Gumbel (right)]

Abbildung 18: Monte Carlo sample of 10,000 realizations of pseudo random variables with uniform marginals in [0, 1] and dependence structure given by the Clayton (left) and Gumbel (right) copula with θ = 3. MVAgumbelclayton

Ostap Okhrin 295 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Transformations of Margins

If (X1, X2) have copula C and g1, g2 are two continuous increasing functions, then {g1(X1), g2(X2)} have the copula C, too.

Ostap Okhrin 296 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Product Copula

Independence implies that the product of the cdfs FX1 and FX2 equals the joint distribution function F, i.e.:

F(x1, x2) = FX1(x1) FX2(x2) (9)

Thus, we obtain the independence or product copula C = Π(u, v) = uv.

Ostap Okhrin 297 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Product Copula

Let X1 and X2 be random variables with continuous distribution functions F1 and F2 and joint distribution function H.
Then X1 and X2 are independent if and only if CX1X2 = Π.
According to Sklar's Theorem, there exists a unique copula C with

P(X1 ≤ x1, X2 ≤ x2) = H(x1, x2) = C{F1(x1), F2(x2)} = F1(x1) · F2(x2)

Ostap Okhrin 298 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Partial Derivatives

Let C(u, v) be a copula. For any v ∈ I, the partial derivative ∂C(u, v)/∂v exists for almost all u ∈ I. For such u and v one has:

∂C(u, v)/∂v ∈ I (10)

The analogous statement is true for the partial derivative ∂C(u, v)/∂u:

∂C(u, v)/∂u ∈ I (11)

Moreover, the functions

u ↦ Cv(u) := ∂C(u, v)/∂v and v ↦ Cu(v) := ∂C(u, v)/∂u

are defined and non-decreasing almost everywhere on I.
Ostap Okhrin 299 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Copulae in d-Dimensions

Let U1, U2, . . . , Ud be non-empty sets in R̄ and consider the function F : U1 × U2 × · · · × Ud → R̄. For a = (a1, a2, . . . , ad) and b = (b1, b2, . . . , bd) with a ≤ b (i.e. ak ≤ bk for all k) let B = [a, b] = [a1, b1] × [a2, b2] × · · · × [ad, bd] be the d-box with vertices c = (c1, c2, . . . , cd). It is obvious that each ck is either equal to ak or to bk.

Ostap Okhrin 300 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

F -volume

The F-volume of a d-box B = [a, b] = [a1, b1] × [a2, b2] × · · · × [ad, bd] ⊂ U1 × U2 × · · · × Ud is defined as follows:

V_F(B) = Σ_c sgn(c) F(c) (12)

where the sum is taken over all 2^d vertices c of B, sgn(c) = 1 if ck = ak for an even number of k's, and sgn(c) = −1 if ck = ak for an odd number of k's.

Ostap Okhrin 301 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

d-increasing Function

F is said to be a d-increasing function if for all d-boxes B with vertices in U1 × U2 × · · · × Ud it holds that:

V_F(B) ≥ 0. (13)

Ostap Okhrin 302 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Grounded Function

If U1, U2, . . . , Ud have a smallest element min U1, min U2, . . . , min Ud respectively, then we say that a function F : U1 × U2 × · · · × Ud → R̄ is grounded if:

F(x) = 0 for all x ∈ U1 × U2 × · · · × Ud (14)

such that xk = min Uk for at least one k.

Ostap Okhrin 303 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Multivariate Copula

A d-dimensional copula is a function C : [0, 1]^d → [0, 1] such that:
1. C(u1, . . . , ui−1, 0, ui+1, . . . , ud) = 0 (at least one ui is 0);
2. For u ∈ [0, 1]^d, C(1, . . . , 1, ui, 1, . . . , 1) = ui (all coordinates except ui are 1);
3. For each u ≤ v ∈ [0, 1]^d (ui ≤ vi):

V_C[u, v] = Σ_a sgn(a) C(a) ≥ 0

where a is taken over all vertices of [u, v], sgn(a) = 1 if ak = uk for an even number of k's and sgn(a) = −1 if ak = uk for an odd number of k's (d-increasing).

Ostap Okhrin 304 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Sklar’s Theorem

For a distribution function F with marginals FX1, . . . , FXd, there exists a copula C : [0, 1]^d → [0, 1] such that

F(x1, . . . , xd) = C{FX1(x1), . . . , FXd(xd)} (15)

for all xi ∈ R, i = 1, . . . , d. If FX1, . . . , FXd are continuous, then C is unique.
Conversely, if C is a copula and FX1, . . . , FXd are cdfs, then the function F defined in (15) is a joint cdf with marginals FX1, . . . , FXd.

Ostap Okhrin 305 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

- A copula C and marginal distributions can be "coupled" together into a distribution function:

FX(x1, . . . , xd) = C{FX1(x1), . . . , FXd(xd)}

- A (unique) copula is obtained from "decoupling" every (continuous) multivariate distribution function from its marginal distributions:

C(u1, . . . , ud) = FX{FX1⁻¹(u1), . . . , FXd⁻¹(ud)}, uj = FXj(xj), j = 1, . . . , d

Ostap Okhrin 306 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

- If C is absolutely continuous, there exists a copula density

c(u1, . . . , ud) = ∂^d C(u1, . . . , ud) / (∂u1 · · · ∂ud)

- The joint density fX is

fX(x1, . . . , xd) = c{FX1(x1), . . . , FXd(xd)} ∏_{j=1}^{d} fj(xj)

Ostap Okhrin 307 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Fréchet-Hoeffding Bounds, Product Copula

1. Every copula C satisfies

W^d(u1, . . . , ud) ≤ C(u1, . . . , ud) ≤ M^d(u1, . . . , ud)

2. Upper and lower bounds

M^d(u1, . . . , ud) = min(u1, . . . , ud)
W^d(u1, . . . , ud) = max(Σ_{i=1}^{d} ui − d + 1, 0)

3. Product copula Π^d(u1, . . . , ud) = ∏_{j=1}^{d} uj

4. The functions M^d and Π^d are d-copulae for all d ≥ 2; the function W^d is not a d-copula for any d > 2.
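Point 4 can be verified numerically: for d = 3 the W^d-volume of the box [1/2, 1]³ is negative, so W³ violates the d-increasing property. A sketch computing volumes by the vertex sign sum of eq. (12):

```python
import itertools

def W(u):
    """Lower Frechet-Hoeffding bound W^d(u) = max(sum(u) - d + 1, 0)."""
    return max(sum(u) - len(u) + 1.0, 0.0)

def volume(F, a, b):
    """F-volume of the d-box [a, b]: signed sum over all 2^d vertices."""
    v = 0.0
    for vertex in itertools.product(*zip(a, b)):
        k = sum(c == ai for c, ai in zip(vertex, a))  # coordinates taken from a
        v += (-1) ** k * F(vertex)  # sign is +1 when that count is even
    return v
```

For d = 2 the same computation stays non-negative, consistent with W² being a copula, while for d = 3 it yields −1/2 on [1/2, 1]³.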

Ostap Okhrin 308 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Multivariate Elliptical Copulae

Gauss

C(u1, . . . , ud) = ∫_{−∞}^{Φ⁻¹(u1)} · · · ∫_{−∞}^{Φ⁻¹(ud)} (2π)^{−d/2} |R|^{−1/2} exp(−½ r⊤R⁻¹r) dr1 · · · drd,

where r = (r1, . . . , rd)⊤

t-Student

C(u1, . . . , ud) = ∫_{−∞}^{tν⁻¹(u1)} · · · ∫_{−∞}^{tν⁻¹(ud)} (2π)^{−d/2} |R|^{−1/2} (1 + r⊤R⁻¹r/ν)^{−(ν+d)/2} dr1 · · · drd,

where r = (r1, . . . , rd)⊤

Ostap Okhrin 309 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Multivariate Archimedean Copulae

Gumbel

C(u1, . . . , ud) = exp[−{(−log u1)^θ + · · · + (−log ud)^θ}^{1/θ}]

Cook-Johnson

C(u1, . . . , ud) = (Σ_{j=1}^{d} uj^{−θ} − d + 1)^{−1/θ}

Frank

C(u1, . . . , ud) = −(1/θ) log{1 + (e^{−θu1} − 1) · · · (e^{−θud} − 1)/(e^{−θ} − 1)^{d−1}}

Ostap Okhrin 310 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Dimensionality

In d dimensions:
1. Elliptical copulae: correlation matrix with d(d − 1)/2 parameters
2. Archimedean copulae: 1 parameter

Ostap Okhrin 311 of 461

Angewandte Multivariate Statistik Multivariate Distributions Copulae

Conclusions

Pluses of copulae
- flexible, wide range of dependence
- easy to simulate, estimate, implement
- explicit form of copula densities
- modelling of fat tails, asymmetries

Minuses of copulae
- Elliptical: correlation matrix, symmetry
- Archimedean: too restrictive, single parameter, exchangeable
- selection of the copula

Ostap Okhrin 312 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Theory of the Multinormal

Elementary Properties of the Multinormal

The pdf of X ∼ Np(µ, Σ) is given by:

f(x) = |2πΣ|^{−1/2} exp{−½ (x − µ)⊤Σ⁻¹(x − µ)}

The expectation and variance are respectively given by:

E(X) = µ, Var(X) = Σ

Ostap Okhrin 313 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Linear transformations

Linear transformations turn normal random variables into normal random variables: for X ∼ Np(µ, Σ), A (p × p), c ∈ Rp,

Y = AX + c ∼ Np(Aµ + c, AΣA⊤).

Ostap Okhrin 314 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Theorem
Let X = (X1, X2) ∼ Np(µ, Σ) with X1 ∈ Rr, X2 ∈ Rp−r, and define X2.1 = X2 − Σ21Σ11⁻¹X1, where

Σ = ( Σ11 Σ12; Σ21 Σ22 ).

Then X1 ∼ Nr(µ1, Σ11) and X2.1 ∼ Np−r(µ2.1, Σ22.1) are independent, with

µ2.1 = µ2 − Σ21Σ11⁻¹µ1,
Σ22.1 = Σ22 − Σ21Σ11⁻¹Σ12.

Ostap Okhrin 315 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Corollary

Corollary
Let X = (X1, X2) ∼ Np(µ, Σ). Then Σ12 = 0 if and only if X1 is independent of X2.

The independence of two linear transforms of a multinormal X can be shown via the following corollary.

Corollary
If X ∼ Np(µ, Σ) and A and B are matrices, then AX and BX are independent if and only if AΣB⊤ = 0.

Ostap Okhrin 316 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Theorem
If X ∼ Np(µ, Σ) and A (q × p), c ∈ Rq, q ≤ p, then Y = AX + c is a q-variate normal, i.e.,

Y ∼ Nq(Aµ + c, AΣA⊤).

Ostap Okhrin 317 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Theorem
The conditional distribution of X2 given X1 = x1 is normal with mean µ2 + Σ21Σ11⁻¹(x1 − µ1) and covariance Σ22.1, i.e.,

(X2 | X1 = x1) ∼ Np−r(µ2 + Σ21Σ11⁻¹(x1 − µ1), Σ22.1).

The conditional mean E(X2 | X1 = x1) is a LINEAR function of x1!

Ostap Okhrin 318 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Example

p = 2, r = 1, µ = (0, 0)⊤, Σ = ( 1 −0.8; −0.8 2 )

Σ11 = 1, Σ21 = −0.8, Σ22.1 = 2 − (0.8)² = 1.36.

⇒ fX1(x1) = (1/√(2π)) exp(−x1²/2)

⇒ f(x2 | x1) = (1/√(2π · 1.36)) exp{−(x2 + 0.8x1)²/(2 · 1.36)}.

Ostap Okhrin 319 of 461

[Figure: conditional normal densities f(X2 | X1) for several values of x1]

Shifts in the conditional density. MVAcondnorm

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Theorem
If X1 ∼ Nr(µ1, Σ11) and (X2 | X1 = x1) ∼ Np−r(Ax1 + b, Ω) where Ω does not depend on x1, then

X = (X1, X2) ∼ Np(µ, Σ),

where

µ = ( µ1; Aµ1 + b ) and Σ = ( Σ11 Σ11A⊤; AΣ11 Ω + AΣ11A⊤ ).

Ostap Okhrin 321 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Conditional Approximations

Best approximation of X2 ∈ Rp−r by X1 ∈ Rr :

X2 = E(X2 | X1) + U = µ2 + Σ21Σ11⁻¹(X1 − µ1) + U = β0 + BX1 + U

with B = Σ21Σ11⁻¹, β0 = µ2 − Bµ1 and U ∼ Np−r(0, Σ22.1).

Ostap Okhrin 322 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Consider the case where X2 ∈ R, i.e., r = p − 1.
Now B is a (1 × r)-row vector β⊤ such that:

X2 = β0 + β⊤X1 + U.

This means that the best MSE approximation of X2 by a function of X1 is a hyperplane.

Ostap Okhrin 323 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Σ = ( Σ11 σ12; σ21 σ22 ) with σ12 ∈ Rr and σ22 ∈ R.

Marginal variance of X2:

σ22 = β⊤Σ11β + σ22.1 = σ21Σ11⁻¹σ12 + σ22.1.

Squared multiple correlation between X2 and the r variables X1:

ρ²_{2.1...r} = σ21Σ11⁻¹σ12 / σ22.

Ostap Okhrin 324 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Example: classic blue pullover data

Suppose that X1 (sales), X2 (price), X3 (advertisement) and X4 (sales assistants) are normally distributed with

µ = (172.7, 104.6, 104.0, 93.8)⊤

and

Σ = ( 1037.21 −80.02 1430.70 271.44
      −80.02 219.84 92.10 −91.58
      1430.70 92.10 2624.00 210.30
      271.44 −91.58 210.30 177.36 ).

Ostap Okhrin 325 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

The conditional distribution of X1 given (X2, X3, X4) is univariate normal with mean

µ1 + σ12Σ22⁻¹ ( X2 − µ2; X3 − µ3; X4 − µ4 ) = 65.7 − 0.2X2 + 0.5X3 + 0.8X4

and variance

σ11.2 = σ11 − σ12Σ22⁻¹σ21 = 96.761

The multiple correlation is ρ²_{1.234} = σ12Σ22⁻¹σ21 / σ11 = 0.907.

Ostap Okhrin 326 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

The correlation matrix between the 4 variables is given by

P = ( 1 −0.168 0.867 0.633
      −0.168 1 0.121 −0.464
      0.867 0.121 1 0.308
      0.633 −0.464 0.308 1 ).

The conditional distribution of (X1, X2) given (X3, X4) is bivariate normal with mean:

( µ1; µ2 ) + ( σ13 σ14; σ23 σ24 ) ( σ33 σ34; σ43 σ44 )⁻¹ ( X3 − µ3; X4 − µ4 )
= ( 32.516 + 0.467X3 + 0.977X4; 153.644 + 0.085X3 − 0.617X4 )

Ostap Okhrin 327 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

and covariance matrix:

( σ11 σ12; σ21 σ22 ) − ( σ13 σ14; σ23 σ24 ) ( σ33 σ34; σ43 σ44 )⁻¹ ( σ31 σ32; σ41 σ42 )
= ( 104.006 −33.574; −33.574 155.592 ).

This covariance matrix allows us to compute the partial correlation between X1 and X2 for a fixed level of X3 and X4:

ρ_{X1X2|X3X4} = −33.574 / √(104.006 · 155.592) = −0.264.
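The computation above can be reproduced directly from the covariance matrix of the example with a few matrix operations:

```python
import numpy as np

# Covariance matrix of the blue pullover example (slide values)
S = np.array([[1037.21,  -80.02, 1430.70,  271.44],
              [ -80.02,  219.84,   92.10,  -91.58],
              [1430.70,   92.10, 2624.00,  210.30],
              [ 271.44,  -91.58,  210.30,  177.36]])

S11 = S[:2, :2]   # Cov of (X1, X2)
S13 = S[:2, 2:]   # Cov of (X1, X2) with (X3, X4)
S33 = S[2:, 2:]   # Cov of (X3, X4)

# Conditional covariance of (X1, X2) given (X3, X4)
S_cond = S11 - S13 @ np.linalg.solve(S33, S13.T)
rho = S_cond[0, 1] / np.sqrt(S_cond[0, 0] * S_cond[1, 1])
```

This recovers the partial correlation ρ ≈ −0.264 reported above.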

Ostap Okhrin 328 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Mahalanobis Transform

If X ∼ Np(µ, Σ) then the Mahalanobis transform is

Y = Σ^{−1/2}(X − µ) ∼ Np(0, Ip)

and it holds that

Y⊤Y = (X − µ)⊤Σ⁻¹(X − µ) ∼ χ²p.

- Y is a random vector and Y⊤Y is a scalar.
- Y⊤Y can be used for testing (assuming that Σ is known).
- Normally, we do not know Σ. Tests in this situation can be carried out using the Wishart and Hotelling distributions.
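A simulation sketch of the two claims (the parameter values are illustrative): after the Mahalanobis transform the sample is approximately standard normal, and the squared norms Y⊤Y behave like χ²p draws, so their mean is close to p:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
x = rng.multivariate_normal(mu, sigma, size=100_000)

# Sigma^(-1/2) via eigendecomposition (Sigma is symmetric positive definite)
w, v = np.linalg.eigh(sigma)
sigma_inv_sqrt = v @ np.diag(w ** -0.5) @ v.T

y = (x - mu) @ sigma_inv_sqrt        # Mahalanobis transform of each row
d2 = np.sum(y ** 2, axis=1)          # Y'Y, approximately chi^2 with p = 2 df
```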

Ostap Okhrin 329 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Summary: Elementary Properties

- If X ∼ Np(µ, Σ) then the linear transformation AX + c, A (q × p), c ∈ Rq, has distribution Nq(Aµ + c, AΣA⊤).
- Two linear transformations AX and BX of X ∼ Np(µ, Σ) are independent if and only if AΣB⊤ = 0.
- If X1 and X2 are partitions of X ∼ Np(µ, Σ) then the conditional distribution of X2 given X1 = x1 is normal again.

Ostap Okhrin 330 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Elementary Properties

Summary: Elementary Properties

- In the multivariate normal case, X1 is independent of X2 if and only if Σ12 = 0.
- The conditional expectation of (X2 | X1) is a linear function if (X1, X2) ∼ Np(µ, Σ).
- The multiple correlation coefficient is defined as ρ²_{2.1...r} = σ21Σ11⁻¹σ12 / σ22.
- The multiple correlation coefficient is the percentage of the variance of X2 explained by the linear approximation β0 + β⊤X1.

Ostap Okhrin 331 of 461

Angewandte Multivariate Statistik Theory of the Multinormal The Wishart Distribution

Wishart Distribution

X ∼ Np(µ, Σ), µ = 0, X (n × p) data matrix:

M (p × p) = X⊤X ∼ Wp(Σ, n)

Example (the Wishart distribution is a generalization of χ²):
p = 1, X ∼ N1(0, σ²),

X = (x1, . . . , xn)⊤, M = X⊤X = Σ_{i=1}^{n} xi² ∼ σ²χ²n = W1(σ², n)

Ostap Okhrin 332 of 461

Angewandte Multivariate Statistik Theory of the Multinormal The Wishart Distribution

Linear Transformation of the Data Matrix

Theorem

M ∼ Wp(Σ, n), B (p × q)

⇒ B⊤MB ∼ Wq(B⊤ΣB, n)

Ostap Okhrin 333 of 461

Angewandte Multivariate Statistik Theory of the Multinormal The Wishart Distribution

Wishart and χ²-Distribution

Theorem

M ∼ Wp(Σ, n), a ∈ Rp, a⊤Σa ≠ 0

⇒ (a⊤Ma)/(a⊤Σa) ∼ χ²n

Ostap Okhrin 334 of 461

Angewandte Multivariate Statistik Theory of the Multinormal The Wishart Distribution

Theorem (Cochran)

Let X (n × p) be a data matrix from a Np(0, Σ) distribution. Then:

- nS = X⊤HX ∼ Wp(Σ, n − 1), where S is the sample covariance matrix,
- x̄ and S are independent.

Ostap Okhrin 335 of 461

Angewandte Multivariate Statistik Theory of the Multinormal The Wishart Distribution

Summary: Wishart Distribution

- The Wishart distribution is a generalization of the χ²-distribution. In particular W1(σ², n) = σ²χ²n.
- The empirical covariance matrix S has a (1/n) Wp(Σ, n − 1) distribution.
- In the normal case, x̄ and S are independent.
- For M ∼ Wp(Σ, m): (a⊤Ma)/(a⊤Σa) ∼ χ²m.

Ostap Okhrin 336 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Hotelling Distribution

Hotelling’s T²-Distribution

Assume that the random vector Y ∼ Np(0, I) is independent of the random matrix M ∼ Wp(I, n). Then

n Y⊤M⁻¹Y ∼ T²(p, n)

Hotelling’s T² is a generalization of Student’s t-distribution.
The critical values of Hotelling’s T² can be calculated using the F-distribution:

T²(p, n) = (np/(n − p + 1)) F_{p,n−p+1}
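For p = 1 the definition reduces to T²(1, n) = n Y²/M with Y ~ N(0, 1) and M ~ W1(1, n) = χ²n, which is the square of a tn variable, so E[T²] = n/(n − 2). A simulation sketch (the values of n and the replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 10, 400_000
y = rng.standard_normal(reps)        # Y ~ N(0, 1)
m = rng.chisquare(n, size=reps)      # M ~ W_1(1, n) = chi^2_n
t2 = n * y ** 2 / m                  # samples from T^2(1, n) = t_n^2
```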

Ostap Okhrin 337 of 461

Angewandte Multivariate Statistik Theory of the Multinormal Hotelling Distribution

Summary: Hotelling’s T 2-Distribution

Hotelling’s T 2-distribution is a generalization of thet-distribution. In particular T (1, n) = tn.

(n − 1)(x − µ)>S−1(x − µ) has a T 2(p, n − 1) distribution. The relation between Hotelling’s T 2− and Fisher’s F -distribution

is given by T 2(p, n) = npn−p+1 Fp,n−p+1.

Ostap Okhrin 338 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Theory of Estimation

In parametric statistics, θ is a k-variate vector θ ∈ Rk characterizing the unknown properties of the population pdf f(x; θ).

The aim is to estimate θ from the sample X through estimators θ̂ which are functions of the sample: θ̂ = θ̂(X).

We must derive the sampling distribution of θ̂ to analyze its properties (is it related to the unknown quantity θ it is supposed to estimate?).

We will utilise maximum likelihood theory.

Ostap Okhrin 339 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

The Likelihood Function

X ∼ f(x; θ): pdf of an i.i.d. sample {xi}_{i=1}^{n} with parameter θ.

Likelihood function:

L(X; θ) = ∏_{i=1}^{n} f(xi; θ)

MLE:

θ̂ = arg max_θ L(X; θ)

Log-likelihood:

ℓ(X; θ) = log L(X; θ)

Ostap Okhrin 340 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example
Sample {xi}_{i=1}^{n} from Np(µ, I), i.e. from the pdf

f(x; θ) = (2π)^{−p/2} exp{−½ (x − θ)⊤(x − θ)}

where θ = µ ∈ Rp is the mean vector parameter. The log-likelihood is

ℓ(X; θ) = Σ_{i=1}^{n} log f(xi; θ) = log (2π)^{−np/2} − ½ Σ_{i=1}^{n} (xi − θ)⊤(xi − θ).

The term (xi − θ)⊤(xi − θ) equals

(xi − x̄)⊤(xi − x̄) + (x̄ − θ)⊤(x̄ − θ) + 2(x̄ − θ)⊤(xi − x̄).

Ostap Okhrin 341 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (cont'd)
If we sum up this term over i = 1, . . . , n we see that

Σ_{i=1}^{n} (xi − θ)⊤(xi − θ) = Σ_{i=1}^{n} (xi − x̄)⊤(xi − x̄) + n(x̄ − θ)⊤(x̄ − θ).

Hence

ℓ(X; θ) = log (2π)^{−np/2} − ½ Σ_{i=1}^{n} (xi − x̄)⊤(xi − x̄) − (n/2)(x̄ − θ)⊤(x̄ − θ).

Only the last term depends on θ and is obviously maximized for

θ̂ = µ̂ = x̄.

Thus x̄ is the MLE.
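Since ℓ(X; θ) = const − (n/2)‖x̄ − θ‖², the log-likelihood is strictly maximized at the sample mean, which is easy to check numerically. A sketch with simulated data (the true mean below is illustrative):

```python
import numpy as np

def loglik(theta, x):
    """Log-likelihood of an i.i.d. Np(theta, I) sample (additive constant dropped)."""
    return -0.5 * np.sum((x - theta) ** 2)

rng = np.random.default_rng(3)
x = rng.normal(loc=[1.0, -0.5], scale=1.0, size=(500, 2))
xbar = x.mean(axis=0)

# The log-likelihood at the sample mean beats any perturbed candidate.
vals = [loglik(xbar + d, x) for d in ([0.1, 0.0], [0.0, -0.1], [0.2, 0.2])]
```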

Ostap Okhrin 342 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (MLEs from a Normal Distribution)
{xi}_{i=1}^{n} is a sample from a normal distribution Np(µ, Σ).
Due to the symmetry of Σ, the unknown parameter θ is in fact p + ½p(p + 1)-dimensional. Then

L(X; θ) = |2πΣ|^{−n/2} exp{−½ Σ_{i=1}^{n} (xi − µ)⊤Σ⁻¹(xi − µ)}

and

ℓ(X; θ) = −(n/2) log |2πΣ| − ½ Σ_{i=1}^{n} (xi − µ)⊤Σ⁻¹(xi − µ).

Ostap Okhrin 343 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (MLEs from a Normal Distribution, cont'd)
The term (xi − µ)⊤Σ⁻¹(xi − µ) equals

(xi − x̄)⊤Σ⁻¹(xi − x̄) + (x̄ − µ)⊤Σ⁻¹(x̄ − µ) + 2(x̄ − µ)⊤Σ⁻¹(xi − x̄).

If we sum up this term over i = 1, . . . , n we see that

Σ_{i=1}^{n} (xi − µ)⊤Σ⁻¹(xi − µ) = Σ_{i=1}^{n} (xi − x̄)⊤Σ⁻¹(xi − x̄) + n(x̄ − µ)⊤Σ⁻¹(x̄ − µ).

Ostap Okhrin 344 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (MLEs from a Normal Distribution, cont'd)
Note that

(xi − x̄)⊤Σ⁻¹(xi − x̄) = tr{(xi − x̄)⊤Σ⁻¹(xi − x̄)} = tr{Σ⁻¹(xi − x̄)(xi − x̄)⊤}.

We sum this up over the index i:

Σ_{i=1}^{n} (xi − µ)⊤Σ⁻¹(xi − µ) = tr{Σ⁻¹ Σ_{i=1}^{n} (xi − x̄)(xi − x̄)⊤} + n(x̄ − µ)⊤Σ⁻¹(x̄ − µ)
= tr{Σ⁻¹nS} + n(x̄ − µ)⊤Σ⁻¹(x̄ − µ).

Ostap Okhrin 345 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (MLEs from a Normal Distribution, cont'd)
Thus the log-likelihood function for Np(µ, Σ) is

ℓ(X; θ) = −(n/2) log |2πΣ| − (n/2) tr{Σ⁻¹S} − (n/2)(x̄ − µ)⊤Σ⁻¹(x̄ − µ)

We can easily see that the third term is maximized by µ = x̄. The MLEs are given by

µ̂ = x̄, Σ̂ = S.

Note that the unbiased covariance estimator Su = (n/(n − 1))S is not the MLE!

Ostap Okhrin 346 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (Linear Regression Model)
Linear regression model yi = β⊤xi + εi, i = 1, . . . , n, with εi i.i.d. N(0, σ²) and xi ∈ Rp.
Here θ = (β⊤, σ) is a (p + 1)-dimensional parameter vector. Denote

y = (y1, . . . , yn)⊤, X = (x1, . . . , xn)⊤.

Then

L(y; θ) = ∏_{i=1}^{n} (1/(√(2π)σ)) exp{−(1/(2σ²))(yi − β⊤xi)²}

and

Ostap Okhrin 347 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (Linear Regression Model, cont'd)

ℓ(y; θ) = log{1/((2π)^{n/2}σ^n)} − (1/(2σ²)) Σ_{i=1}^{n} (yi − β⊤xi)²
= −(n/2) log(2π) − n log σ − (1/(2σ²)) (y − Xβ)⊤(y − Xβ)
= −(n/2) log(2π) − n log σ − (1/(2σ²)) (y⊤y + β⊤X⊤Xβ − 2β⊤X⊤y)

Differentiating w.r.t. the parameters yields

∂ℓ/∂β = −(1/(2σ²)) (2X⊤Xβ − 2X⊤y)

∂ℓ/∂σ = −n/σ + (1/σ³) (y − Xβ)⊤(y − Xβ).

Ostap Okhrin 348 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (Linear Regression Model, cont'd)
∂ℓ/∂β is the vector of the derivatives w.r.t. all components of β (the gradient).
Since the first equation depends only on β, we start by deriving β̂:

X⊤Xβ̂ = X⊤y ⟹ β̂ = (X⊤X)⁻¹X⊤y

Now we plug β̂ into the second equation, which gives

n/σ̂ = (1/σ̂³)(y − Xβ̂)⊤(y − Xβ̂) ⟹ σ̂² = (1/n) ‖y − Xβ̂‖²,

with ‖·‖ denoting the Euclidean vector norm.
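The two-step derivation above maps directly onto the normal equations. A sketch with simulated data (the design, coefficients and noise level are illustrative), checked against numpy's least squares solver:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.3 * rng.standard_normal(n)

# MLE of beta: solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# MLE of sigma^2: residual sum of squares divided by n (not n - p)
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / n
```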

Ostap Okhrin 349 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (Linear Regression Model, cont'd)
We see that the MLE β̂ is identical with the least squares estimator. The variance estimator

σ̂² = (1/n) Σ_{i=1}^{n} (yi − β̂⊤xi)²

is nothing else than the residual sum of squares (RSS) divided by n, generalized to the case of multivariate xi.

Ostap Okhrin 350 of 461

Angewandte Multivariate Statistik Theory of Estimation The Likelihood Function

Example (Linear Regression Model - cont’d)Note that in a fixed design situation where the xi are considered asbeing fixed, we have

E(y) = Xβ and Var(y) = σ2In.

Then, using the properties of moments, we have

E(β) = (X>X )−1X> E(y) = β,

Var(β) = σ2(X>X )−1.
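The closed-form MLE's above can be checked numerically. A minimal sketch with simulated data (NumPy only; all variable names and the chosen parameter values are illustrative, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta, sigma = np.array([1.0, -2.0, 0.5]), 0.7
y = X @ beta + rng.normal(scale=sigma, size=n)

# MLE of beta: solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
# MLE of sigma^2 divides by n (the unbiased estimator would divide by n - p)
sigma2_hat = resid @ resid / n
```

With n = 200 the estimates land close to the true (β, σ²), illustrating consistency of the MLE.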


Summary: Likelihood Function

If {xi}ⁿᵢ₌₁ is an i.i.d. sample from a distribution with pdf f(x; θ), then L(X; θ) = ∏_{i=1}^n f(xi; θ) is the likelihood function.

The maximum likelihood estimator (MLE) is the value of θ which maximizes L(X; θ). Equivalently, one can maximize the log-likelihood ℓ(X; θ).

Summary: Likelihood Function

The MLE's of µ, Σ from a Np(µ, Σ) distribution are µ̂ = x̄ and Σ̂ = S. Note that the MLE for Σ is not unbiased.

The MLE's in a linear model y = Xβ + ε, ε ∼ Nn(0, σ²I), are given by the least squares estimator β̂ = (X⊤X)⁻¹X⊤y and σ̂² = (1/n)||y − Xβ̂||². Moreover, E(β̂) = β and Var(β̂) = σ²(X⊤X)⁻¹.

Angewandte Multivariate Statistik Theory of Estimation Cramer-Rao Lower Bound

Cramer-Rao Lower Bound

One typical property we want for an estimator is unbiasedness: E(θ̂) = θ (x̄ is an unbiased estimator of µ, while S is a biased estimator of Σ in finite samples).
We look for an unbiased estimator with the smallest possible variance.
The Cramer-Rao lower bound achieves this, and it provides the asymptotic optimality property of maximum likelihood estimators.
The Cramer-Rao theorem involves the score function and its properties, which are derived first.

Score Function and Fisher Information

The score function is

s(X; θ) = (∂/∂θ) ℓ(X; θ).

The covariance matrix Fn = Var{s(X; θ)} is called the Fisher information matrix.

Example (Score Function and Fisher Information)
Suppose that X ∼ Np(θ, I). Then

s(X; θ) = (∂/∂θ) ℓ(X; θ) = −(1/2)(∂/∂θ) Σ_{i=1}^n (xi − θ)⊤(xi − θ) = n(x̄ − θ),

hence the information matrix is Fn = Var{n(x̄ − θ)} = nIp.

Theorem
If s = s(X; θ) is the score function and if θ̂ = t = t(X, θ) is any function of X and θ, then under regularity conditions

E(st⊤) = (∂/∂θ) E(t⊤) − E(∂t⊤/∂θ).

Corollary
If s = s(X; θ) is the score function and θ̂ = t = t(X) is any unbiased estimator of θ (i.e., E(t) = θ), then

E(st⊤) = Cov(s, t) = Ik.

Note that the score function has mean zero:

E{s(X; θ)} = 0.

Hence, E(ss⊤) = Var(s) = Fn, and it follows that

Fn = −E{(∂²/∂θ∂θ⊤) ℓ(X; θ)}.

Remark
If x1, …, xn are i.i.d., then Fn = nF1, where F1 is the Fisher information matrix for sample size n = 1.

All estimators which are unbiased and attain the Cramer-Rao lower bound are minimum variance estimators.

Theorem (Cramer-Rao)
If θ̂ = t = t(X) is any unbiased estimator for θ, then under regularity conditions

Var(t) ≥ Fn⁻¹,

where

Fn = E{s(X; θ)s(X; θ)⊤} = Var{s(X; θ)}

is the Fisher information matrix.

Proof.
Consider the correlation ρ_{Y,Z} between Y and Z, where Y = a⊤t, Z = c⊤s, s is the score, and a, c ∈ Rp. By the Corollary, Cov(s, t) = I, and thus

Cov(Y, Z) = a⊤ Cov(t, s) c = a⊤c,
Var(Z) = c⊤ Var(s) c = c⊤Fn c.

Hence,

ρ²_{Y,Z} = Cov²(Y, Z) / {Var(Y) Var(Z)} = (a⊤c)² / {a⊤ Var(t) a · c⊤Fn c} ≤ 1.

cont'd.
In particular, this holds for any c ≠ 0. Therefore it also holds for the maximum of the left-hand side with respect to c. Since

max_c (c⊤aa⊤c)/(c⊤Fn c) = max_{c⊤Fn c = 1} c⊤aa⊤c

and

max_{c⊤Fn c = 1} c⊤aa⊤c = a⊤Fn⁻¹a,

by the maximization theorem in the chapter on Matrix Algebra we have

(a⊤Fn⁻¹a)/(a⊤ Var(t) a) ≤ 1  ∀ a ∈ Rp, a ≠ 0,

i.e.,

a⊤{Var(t) − Fn⁻¹}a ≥ 0  ∀ a ∈ Rp, a ≠ 0,

which is equivalent to Var(t) ≥ Fn⁻¹.

Asymptotic Sampling Distribution of the MLE

Maximum likelihood estimators (MLE's) attain the lower bound as the sample size n goes to infinity. The next theorem states this and, in addition, gives the asymptotic sampling distribution of the maximum likelihood estimator, which turns out to be multinormal.

Theorem
Suppose that the sample {xi}ⁿᵢ₌₁ is i.i.d. If θ̂ is the MLE for θ ∈ Rk, i.e., θ̂ = argmax_θ L(X; θ), then under some regularity conditions, as n → ∞:

√n (θ̂ − θ) →L Nk(0, F1⁻¹),

where F1 denotes the Fisher information for sample size n = 1.
As a consequence we see that (under regularity conditions) the MLE is asymptotically unbiased, efficient (minimum variance) and normally distributed.

It follows that asymptotically

n(θ̂ − θ)⊤F1(θ̂ − θ) →L χ²_p.

If F̂1 is a consistent estimator of F1, then also

n(θ̂ − θ)⊤F̂1(θ̂ − θ) →L χ²_p.

This expression may be useful for testing hypotheses about θ and for constructing confidence regions for θ in a very general setup. It is clear that

P( n(θ̂ − θ)⊤F̂1(θ̂ − θ) ≤ χ²_{1−α;p} ) ≈ 1 − α,

where χ²_{ν;p} denotes the ν-quantile of a χ²_p random variable. So the ellipsoid {θ ∈ Rp : n(θ̂ − θ)⊤F̂1(θ̂ − θ) ≤ χ²_{1−α;p}} provides an asymptotic (1 − α)-confidence region for θ.


Summary: Cramer-Rao Lower Bound

The score function is the derivative s(X; θ) = (∂/∂θ)ℓ(X; θ) of the log-likelihood with respect to θ. The covariance matrix of s(X; θ) is the Fisher information matrix.

Any unbiased estimator θ̂ = t = t(X) has a variance bounded below by the inverse of the Fisher information. Thus, an estimator which attains this lower bound is a minimum variance estimator.

Summary: Cramer-Rao Lower Bound

MLE's attain the lower bound in an asymptotic sense, i.e.,

√n (θ̂ − θ) →L N(0, F1⁻¹)

if θ̂ is the MLE, θ̂ = argmax_θ L(X; θ).

Angewandte Multivariate Statistik Hypothesis Testing Likelihood Ratio Test

Likelihood Ratio Test

Suppose that the distribution of {xi}ⁿᵢ₌₁, xi ∈ Rp, depends on a parameter vector θ. Then

H0 : θ ∈ Ω0
H1 : θ ∈ Ω1.

The hypothesis H0 corresponds to the "reduced model" and H1 to the "full model".

Example
Xi ∼ Np(θ, I)

H0 : θ = θ0
H1 : no constraints on θ,

or equivalently Ω0 = {θ0}, Ω1 = Rp.

Likelihood Ratio
Define L*j = max_{θ∈Ωj} L(X; θ), the maxima of the likelihood under each of the hypotheses, and

λ(X) = L*0 / L*1.

Likelihood Ratio Test
Rejection region:

R = {x : λ(x) < c}, with c chosen so that sup_{θ∈Ω0} Pθ(x ∈ R) = α.

Theorem (Wilks)
If Ω1 ⊂ Rq is a q-dimensional space and if Ω0 ⊂ Ω1 is an r-dimensional subspace, then under regularity conditions, for n → ∞:

∀ θ ∈ Ω0 : −2 log λ →L χ²_{q−r}.

Test problem 1

X1, …, Xn i.i.d. with Xi ∼ Np(µ, Σ).

H0 : µ = µ0, Σ known;  H1 : no constraints.

Ω0 = {µ0}, r = 0; Ω1 = Rp, q = p.

−2 log λ = 2(ℓ*1 − ℓ*0) = n(x̄ − µ0)⊤Σ⁻¹(x̄ − µ0) ∼ χ²_p

Rejection region R: all samples with −2 log λ > χ²_{0.95;p}.
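Test problem 1 is easy to check numerically. A minimal sketch with simulated data (SciPy is used only for the χ² quantile; the seed, dimensions and variable names are illustrative):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, p = 50, 3
Sigma = np.eye(p)            # known covariance under the model
mu0 = np.zeros(p)            # hypothesized mean
X = rng.multivariate_normal(mu0, Sigma, size=n)  # data generated under H0

d = X.mean(axis=0) - mu0
# -2 log lambda = n (xbar - mu0)' Sigma^{-1} (xbar - mu0) ~ chi^2_p under H0
stat = n * d @ np.linalg.solve(Sigma, d)
reject = stat > chi2.ppf(0.95, df=p)
```

Under H0 the statistic exceeds χ²_{0.95;p} in about 5% of repetitions, which can be verified by looping over seeds.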


Example (Bank Data)

µ0 = (214.9, 129.9, 129.7, 8.3, 10.1, 141.5)⊤
x̄ = (214.8, 130.3, 130.2, 10.5, 11.1, 139.4)⊤

−2 log λ = 2(ℓ*1 − ℓ*0) = n(x̄ − µ0)⊤Σ⁻¹(x̄ − µ0) = 7362.32

The LR test statistic −2 log λ ∼ χ²_6 is highly significant.

Test problem 2
Xi ∼ Np(µ, Σ) i.i.d.

H0 : µ = µ0, Σ unknown;  H1 : no constraints.

Under H0 it can be shown that

ℓ*0 = ℓ(µ0, S + dd⊤),  d = x̄ − µ0,

and under H1 we have

ℓ*1 = ℓ(x̄, S).

This leads to

−2 log λ = 2(ℓ*1 − ℓ*0) = n log(1 + d⊤S⁻¹d). (16)

Test problem 2 cont'd
Note that this statistic is a function of (n − 1)d⊤S⁻¹d, which has, under H0, a Hotelling T²-distribution. Therefore,

(n − 1)(x̄ − µ0)⊤S⁻¹(x̄ − µ0) ∼ T²(p, n − 1), (17)

or equivalently

((n − p)/p)(x̄ − µ0)⊤S⁻¹(x̄ − µ0) ∼ F_{p,n−p}.

So the rejection region may be defined as

((n − p)/p)(x̄ − µ0)⊤S⁻¹(x̄ − µ0) > F_{1−α;p,n−p}.

Test problem 2 cont'd
Alternatively we have, under H0, the asymptotic distribution

−2 log λ →L χ²_p,

leading to the rejection region

n log{1 + (x̄ − µ0)⊤S⁻¹(x̄ − µ0)} > χ²_{1−α;p}.
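The exact test of Test problem 2 can be sketched as a small one-sample Hotelling T² routine (simulated data; the function name and the T²-to-F conversion follow the relations above, all other names are illustrative):

```python
import numpy as np
from scipy.stats import f

def hotelling_one_sample(X, mu0, alpha=0.05):
    """One-sample Hotelling T^2 test of H0: mu = mu0, Sigma unknown."""
    n, p = X.shape
    d = X.mean(axis=0) - mu0
    S = np.cov(X, rowvar=False, bias=True)    # MLE S (divides by n)
    t2 = (n - 1) * d @ np.linalg.solve(S, d)  # ~ T^2(p, n-1) under H0
    # ((n-p)/p) d' S^{-1} d ~ F_{p, n-p}
    fstat = (n - p) / ((n - 1) * p) * t2
    return fstat, fstat > f.ppf(1 - alpha, p, n - p)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
fstat, reject = hotelling_one_sample(X, np.zeros(4))
```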


Confidence region for µ

Since ((n − p)/p)(x̄ − µ)⊤S⁻¹(x̄ − µ) ∼ F_{p,n−p},

{µ ∈ Rp | (µ − x̄)⊤S⁻¹(µ − x̄) ≤ (p/(n − p)) F_{1−α;p,n−p}}

is a confidence region at level (1 − α) for µ; it is the interior of an iso-distance ellipsoid in Rp.
When p is large, ellipsoids are difficult to handle in practice. One is thus interested in finding confidence intervals for µ1, µ2, …, µp such that the simultaneous confidence over all intervals reaches the desired level, say 1 − α.

Simultaneous Confidence Intervals for a⊤µ

An obvious confidence interval for a given a⊤µ is

| √(n − 1) (a⊤µ − a⊤x̄) / √(a⊤Sa) | ≤ t_{1−α/2;n−1},

or equivalently

t²(a) = (n − 1){a⊤(µ − x̄)}² / (a⊤Sa) ≤ F_{1−α;1,n−1},

which provides the (1 − α) confidence interval for a⊤µ:

a⊤x̄ − √(F_{1−α;1,n−1} a⊤Sa/(n − 1)) ≤ a⊤µ ≤ a⊤x̄ + √(F_{1−α;1,n−1} a⊤Sa/(n − 1)).

Using the theorem on the maximum of quadratic forms, we see that

max_a t²(a) = (n − 1)(x̄ − µ)⊤S⁻¹(x̄ − µ) ∼ T²(p, n − 1),

which implies that simultaneous confidence intervals for all possible linear combinations a⊤µ, a ∈ Rp, of the elements of µ are given by

( a⊤x̄ − √(Kα a⊤Sa), a⊤x̄ + √(Kα a⊤Sa) ),

where Kα = (p/(n − p)) F_{1−α;p,n−p}.
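The simultaneous intervals above can be sketched in a few lines; for the component means µj the vectors a are unit vectors, so a⊤Sa reduces to the diagonal of S (simulated data; function and variable names are illustrative):

```python
import numpy as np
from scipy.stats import f

def simultaneous_cis(X, alpha=0.05):
    """Simultaneous (1-alpha) CIs for the components of mu (a = unit vectors)."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)     # MLE of Sigma
    K = p / (n - p) * f.ppf(1 - alpha, p, n - p)
    half = np.sqrt(K * np.diag(S))             # a'Sa = S_jj for unit vectors
    return xbar - half, xbar + half

rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, size=(100, 4))
lo, hi = simultaneous_cis(X)
```

These intervals are wider than the marginal t-intervals, the price paid for simultaneous coverage over all a.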


Example
The 95% confidence region for µf, the mean of the forged banknotes, is given by the ellipsoid

{µ ∈ R⁶ | (µ − x̄f)⊤Sf⁻¹(µ − x̄f) ≤ (6/94) F_{0.95;6,94}}.

The 95% simultaneous c.i.'s are given by (using F_{0.95;6,94} = 2.1966):

214.692 ≤ µ1 ≤ 214.954
130.205 ≤ µ2 ≤ 130.395
130.082 ≤ µ3 ≤ 130.304
10.108 ≤ µ4 ≤ 10.952
10.896 ≤ µ5 ≤ 11.370
139.242 ≤ µ6 ≤ 139.658

Example (cont'd)
Comparison with µ0 = (214.9, 129.9, 129.7, 8.3, 10.1, 141.5)⊤ shows that almost all components (except the first one) are responsible for the rejection of µ0.
In addition, choosing e.g. a⊤ = (0, 0, 0, 1, −1, 0) gives the c.i. −1.211 ≤ µ4 − µ5 ≤ 0.005, which shows that for the forged bills the lower border is essentially smaller than the upper border.

Test problem 3
Xi ∼ Np(µ, Σ)

H0 : Σ = Σ0, µ unknown;  H1 : no constraints.

−2 log λ = 2(ℓ*1 − ℓ*0) = n tr(Σ0⁻¹S) − n log |Σ0⁻¹S| − np

−2 log λ →L χ²_m,  m = (1/2)p(p + 1)
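The covariance LR statistic of Test problem 3 is a direct transcription of the formula (simulated data; names illustrative):

```python
import numpy as np
from scipy.stats import chi2

def cov_lr_stat(X, Sigma0):
    """-2 log lambda for H0: Sigma = Sigma0, mu unknown."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)   # MLE of Sigma
    A = np.linalg.solve(Sigma0, S)           # Sigma0^{-1} S
    stat = n * np.trace(A) - n * np.log(np.linalg.det(A)) - n * p
    df = p * (p + 1) // 2
    return stat, chi2.sf(stat, df)

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 3))
stat, pval = cov_lr_stat(X, np.eye(3))
```

The statistic is nonnegative by construction, since x − log x ≥ 1 holds for every eigenvalue of Σ0⁻¹S.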


Example (US companies data)

S = 10⁷ × ( 1.6635 1.2410
            1.2410 1.3747 )  (energy sector)

We want to test whether

Var(X1, X2)⊤ = 10⁷ × ( 1.2248 1.1425
                        1.1425 1.5112 ) = Σ0

(where Σ0 is the covariance matrix of the manufacturing sector).
The LR test statistic −2 log λ = 2.7365 is not significant for χ²_3. Hence, we do not reject the null hypothesis H0 and we cannot conclude that Σ ≠ Σ0.

Test problem 4
Yi ∼ N1(β⊤xi, σ²), xi ∈ Rp

H0 : β = β0, σ² unknown;  H1 : no constraints.

−2 log λ = 2(ℓ*1 − ℓ*0) = n log( ||y − Xβ0||² / ||y − Xβ̂||² ) →L χ²_p

Recall

F = ((n − p)/p) ( ||y − Xβ0||²/||y − Xβ̂||² − 1 ) ∼ F_{p,n−p}.

Example (Classic blue pullover example)
Under H0,

(α, β)⊤ = (211, 0)⊤,

y = (y1, …, y10)⊤ = (x1,1, …, x10,1)⊤,  X = ( 1 x1,2
                                             ⋮  ⋮
                                             1 x10,2 ).

The test statistic for the LR test is −2 log λ = 4.55, which is not significant under the χ²_2 distribution. However, the exact F-test statistic F = 5.93 is significant under the F_{2,8} distribution (F_{2,8;0.95} = 4.46).

Summary: Hypothesis Testing

The hypotheses H0 : θ ∈ Ω0 against H1 : θ ∈ Ω1 can be tested by means of the likelihood ratio test (LRT).

The likelihood ratio (LR) is the quotient λ(X) = L*0/L*1, where the L*j are the maxima of the likelihood under each of the hypotheses.

The test statistic in the LRT is λ(X), or equivalently its logarithm log λ(X).

Summary: Hypothesis Testing

If Ω1 is q-dimensional and Ω0 ⊂ Ω1 is r-dimensional, then the asymptotic distribution of −2 log λ is χ²_{q−r}. This allows H0 to be tested against H1 by calculating the test statistic −2 log λ = 2(ℓ*1 − ℓ*0), where ℓ*j = log L*j.

The hypothesis H0 : µ = µ0 for X ∼ Np(µ, Σ), Σ known, leads to −2 log λ = n(x̄ − µ0)⊤Σ⁻¹(x̄ − µ0) ∼ χ²_p.

The hypothesis H0 : µ = µ0 for X ∼ Np(µ, Σ), Σ unknown, leads to −2 log λ = n log{1 + (x̄ − µ0)⊤S⁻¹(x̄ − µ0)} →L χ²_p, and (n − 1)(x̄ − µ0)⊤S⁻¹(x̄ − µ0) ∼ T²(p, n − 1).

Summary: Hypothesis Testing

The hypothesis H0 : Σ = Σ0 for X ∼ Np(µ, Σ), µ unknown, leads to −2 log λ = n tr(Σ0⁻¹S) − n log |Σ0⁻¹S| − np →L χ²_m, m = (1/2)p(p + 1).

The hypothesis H0 : β = β0 for Yi ∼ N1(β⊤xi, σ²), σ² unknown, leads to −2 log λ = n log( ||y − Xβ0||²/||y − Xβ̂||² ) →L χ²_p.

Angewandte Multivariate Statistik Hypothesis Testing Linear Hypothesis

Linear Hypothesis

We present a general procedure which allows a linear hypothesis to be tested. Linear hypotheses are of the form Aµ = a with known matrices A(q × p) and a(q × 1), with q ≤ p.

Example
Suppose that X1 ∼ N(µ1, σ) and X2 ∼ N(µ2, σ) are independent and that we want to test the hypothesis H0 : µ1 = µ2.
This can be written as the linear hypothesis

H0 : Aµ = (1 −1)(µ1, µ2)⊤ = 0.

Test problem 5
Xi ∼ Np(µ, Σ)

H0 : Aµ = a, Σ known;  H1 : no constraints.

The results of Test Problems 1 and 2 can be used directly on µy, the mean of Yi = AXi. Indeed, Yi ∼ Nq(µy, Σy), where µy = Aµ and Σy = AΣA⊤. Accordingly, ȳ = Ax̄, Sy = ASA⊤, d = Ax̄ − a, and

n(Ax̄ − a)⊤(AΣA⊤)⁻¹(Ax̄ − a) ∼ χ²_q.

Example
We consider hypotheses on the partitioned mean µ = (µ1⊤, µ2⊤)⊤:

H0 : µ1 = µ2, H1 : no constraints,

for N_{2p}( (µ1⊤, µ2⊤)⊤, (Σ 0; 0 Σ) ) with known Σ.
This is equivalent to A = (Ip, −Ip), a = (0, …, 0)⊤ (p zeros), and leads to

−2 log λ = n(x̄1 − x̄2)⊤(2Σ)⁻¹(x̄1 − x̄2) ∼ χ²_p.

Example
Another example is the test whether µ1 = 0, i.e.,

H0 : µ1 = 0, H1 : no constraints,

for N_{2p}( (µ1⊤, µ2⊤)⊤, (Σ 0; 0 Σ) ) with known Σ.
This is equivalent to Aµ = a with A = (I, 0) and a = (0, …, 0)⊤ (p zeros). Hence

−2 log λ = n x̄1⊤Σ⁻¹x̄1 ∼ χ²_p.

Test problem 6
Xi ∼ Np(µ, Σ)

H0 : Aµ = a, Σ unknown;  H1 : no constraints.

Example
Consider the bank data set and test whether µ4 = µ5, i.e., whether the lower border mean equals the upper border mean for the forged bills. Here

A = (0 0 0 1 −1 0),  a = 0.

The test statistic is

99(Ax̄)⊤(A Sf A⊤)⁻¹(Ax̄) ∼ T²(1, 99) = F_{1,99}.

The observed value is 13.638, which is significant.
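Test problem 6 (Aµ = a with Σ unknown) reduces to a Hotelling T² test on Y = AX; a minimal sketch (simulated data in place of the bank notes, so the numbers differ from the example; the T²(q, n−1) to F_{q,n−q} conversion is the standard one, all names illustrative):

```python
import numpy as np
from scipy.stats import f

def linear_hypothesis_test(X, A, a, alpha=0.05):
    """Test H0: A mu = a, Sigma unknown, via Hotelling T^2 on Y = A X."""
    n = X.shape[0]
    q = A.shape[0]
    d = A @ X.mean(axis=0) - a
    Sy = A @ np.cov(X, rowvar=False, bias=True) @ A.T    # A S A'
    t2 = (n - 1) * d @ np.linalg.solve(Sy, d)            # ~ T^2(q, n-1)
    fstat = (n - q) / ((n - 1) * q) * t2                 # ~ F_{q, n-q}
    return fstat, fstat > f.ppf(1 - alpha, q, n - q)

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6))
A = np.array([[0., 0., 0., 1., -1., 0.]])                # tests mu4 = mu5
fstat, reject = linear_hypothesis_test(X, A, np.zeros(1))
```

For q = 1 and n = 100 the F factor equals 1, matching the slide's T²(1, 99) = F_{1,99}.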


Repeated Measurements

Frequently, n independent sampling units are observed under p different experimental conditions (different treatments, …): X1, …, Xn are i.i.d. with Xi ∼ Np(µ, Σ), given p repeated measures.
The hypothesis of interest in that case is the following: there are no treatment effects, H0 : µ1 = µ2 = … = µp. This hypothesis is a direct application of Test Problem 6:

H0 : Cµ = 0, where C((p − 1) × p) = ( 1 −1 0 ⋯ 0
                                      0 1 −1 ⋯ 0
                                      ⋮        ⋮
                                      0 ⋯ 0 1 −1 ).

Repeated Measurements

Note that in many cases one of the experimental conditions is the "control" (a placebo, standard drug or reference condition). In this case,

C((p − 1) × p) = ( 1 −1 0 ⋯ 0
                   1 0 −1 ⋯ 0
                   ⋮        ⋮
                   1 0 0 ⋯ −1 ).

The null hypothesis will be rejected when

((n − p + 1)/(p − 1)) x̄⊤C⊤(CSC⊤)⁻¹Cx̄ > F_{1−α;p−1,n−p+1}.

Repeated Measurements

Simultaneous confidence intervals for linear combinations of the mean of Yi = CXi have already been derived. For all a ∈ R^{p−1}, with probability (1 − α) we have

a⊤Cµ ∈ a⊤Cx̄ ± √( ((p − 1)/(n − p + 1)) F_{1−α;p−1,n−p+1} a⊤CSC⊤a ).

The row sums of the elements of C are zero: C1p = 0; therefore a⊤C is a vector whose elements sum to zero. Such a vector is called a contrast.

Repeated Measurements

Let b = C⊤a; then b⊤1p = Σ_{j=1}^p bj = 0. The result above thus provides, for all contrasts b⊤µ of µ, simultaneous confidence intervals at level (1 − α):

b⊤µ ∈ b⊤x̄ ± √( ((p − 1)/(n − p + 1)) F_{1−α;p−1,n−p+1} b⊤Sb ).

Contrasts are e.g. b⊤ = (1, −1, 0, 0), (1, 0, 0, −1), (1, −1/3, −1/3, −1/3).

Example
40 children were randomly chosen and then followed from grade 8 to grade 11; the scores were obtained from a test of their vocabulary.

x̄⊤ = (1.086, 2.544, 2.851, 3.420)

S = ( 2.902
      2.438 3.049
      2.963 2.775 4.281
      2.183 2.319 2.939 3.162 )  (lower triangle).

Example (cont'd)
The matrix C providing successive differences of the µj is

C = ( 1 −1 0 0
      0 1 −1 0
      0 0 1 −1 ).

The test statistic is F_obs = 53.134, which is significant for F_{3,37}.
We have the following simultaneous 95% confidence intervals:

−1.958 ≤ µ1 − µ2 ≤ −0.959
−0.949 ≤ µ2 − µ3 ≤ 0.335
−1.171 ≤ µ3 − µ4 ≤ 0.036.

Example (cont'd)
The rejection of H0 is mainly due to the difference between the first- and second-year performance of the children. The following confidence intervals for further contrasts may also be of interest:

−2.283 ≤ µ1 − (1/3)(µ2 + µ3 + µ4) ≤ −1.423
−1.777 ≤ (1/3)(µ1 + µ2 + µ3) − µ4 ≤ −0.742
−1.479 ≤ µ2 − µ4 ≤ −0.272

i.e., µ1 is different from the average of the 3 other years, and µ4 turns out to be better than µ2.

Test Problem 7
Suppose Y1, …, Yn are independent with Yi ∼ N1(β⊤xi, σ²), xi ∈ Rp.

H0 : Aβ = a, σ² unknown;  H1 : no constraints.

The constrained maximum likelihood estimators under H0 are

β̃ = β̂ − (X⊤X)⁻¹A⊤{A(X⊤X)⁻¹A⊤}⁻¹(Aβ̂ − a)

for β, and σ̃² = (1/n)(y − Xβ̃)⊤(y − Xβ̃); β̂ denotes the unconstrained MLE as before. The LR statistic is

−2 log λ = 2(ℓ*1 − ℓ*0) = n log( ||y − Xβ̃||² / ||y − Xβ̂||² ) →L χ²_q.
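The constrained estimator β̃ above has a direct transcription; a minimal sketch (simulated data; the tested restriction β1 = −0.5·β2 is chosen only for illustration):

```python
import numpy as np

def constrained_ls(X, y, A, a):
    """OLS beta_hat and the H0-constrained estimator beta_tilde (A beta = a)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    M = A @ XtX_inv @ A.T
    beta_tilde = beta_hat - XtX_inv @ A.T @ np.linalg.solve(M, A @ beta_hat - a)
    return beta_hat, beta_tilde

rng = np.random.default_rng(6)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
A = np.array([[0., 1., 0.5]])      # restriction beta1 + 0.5*beta2 = 0
beta_hat, beta_tilde = constrained_ls(X, y, A, np.zeros(1))
```

By construction, β̃ satisfies the restriction Aβ̃ = a exactly, whatever the data.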


Example ("classic blue" pullovers)
Let us test whether β = 0 in the regression of sales on prices. This can be written as

(0 1)(α, β)⊤ = 0.

The LR statistic here is −2 log λ = 0.142, which is not significant for the χ²_1 distribution. The F-test statistic F = 0.231 is also not significant.

Example ("classic blue" pullovers cont'd)
We can therefore assume independence of sales and prices (alone).
Multivariate regression in the "classic blue" pullovers example: the parameter estimates in the model

X1 = α + β1X2 + β2X3 + β3X4 + ε

are

α̂ = 65.670, β̂1 = −0.216, β̂2 = 0.485, β̂3 = 0.844.

Let us now test the hypothesis

H0 : β1 = −(1/2)β2.

Example ("classic blue" pullovers cont'd)
This is equivalent to

(0 1 1/2 0)(α, β1, β2, β3)⊤ = 0.

The LR statistic in this case is equal to −2 log λ = 0.006; the F statistic is F = 0.007. Hence, in both cases we do not reject the hypothesis.

Test Problem 8 (Comparison of two means)
Suppose Xi1 ∼ Np(µ1, Σ), i = 1, …, n1, and Xj2 ∼ Np(µ2, Σ), j = 1, …, n2, all variables being independent.

H0 : µ1 = µ2, H1 : no constraints.

Both samples provide the statistics x̄k and Sk, k = 1, 2. Let δ = µ1 − µ2; we have

(x̄1 − x̄2) ∼ Np( δ, ((n1 + n2)/(n1n2)) Σ ),
n1S1 + n2S2 ∼ Wp(Σ, n1 + n2 − 2).

The rejection region is thus given by

(n1n2(n1 + n2 − p − 1)) / (p(n1 + n2)²) · (x̄1 − x̄2)⊤S⁻¹(x̄1 − x̄2) ≥ F_{1−α;p,n1+n2−p−1},

where S = (n1S1 + n2S2)/(n1 + n2) is the pooled covariance estimator.
A (1 − α)·100% confidence region for δ is given by the ellipsoid centered at (x̄1 − x̄2):

(δ − (x̄1 − x̄2))⊤S⁻¹(δ − (x̄1 − x̄2)) ≤ (p(n1 + n2)²)/((n1 + n2 − p − 1)(n1n2)) F_{1−α;p,n1+n2−p−1},

and the simultaneous confidence intervals for all linear combinations a⊤δ of the elements of δ are given by

a⊤δ ∈ a⊤(x̄1 − x̄2) ± √( (p(n1 + n2)²)/((n1 + n2 − p − 1)(n1n2)) F_{1−α;p,n1+n2−p−1} a⊤Sa ).
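The two-sample rejection rule above can be sketched directly; a minimal implementation (simulated data; the pooled S = (n1S1 + n2S2)/(n1 + n2) matches the weighting used in the US companies example, other names are illustrative):

```python
import numpy as np
from scipy.stats import f

def two_sample_mean_test(X1, X2, alpha=0.05):
    """Compare two multivariate means, equal but unknown Sigma."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False, bias=True)
    S2 = np.cov(X2, rowvar=False, bias=True)
    S = (n1 * S1 + n2 * S2) / (n1 + n2)          # pooled covariance
    fstat = (n1 * n2 * (n1 + n2 - p - 1)) / (p * (n1 + n2) ** 2) \
        * (d @ np.linalg.solve(S, d))
    return fstat, fstat > f.ppf(1 - alpha, p, n1 + n2 - p - 1)

rng = np.random.default_rng(7)
fstat, reject = two_sample_mean_test(rng.normal(size=(15, 2)),
                                     rng.normal(size=(10, 2)))
```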


Example
We want to compare the means of the assets (X1) and of the sales (X2) of two sectors: energy (group 1) and manufacturing (group 2). We have the following statistics: n1 = 15, n2 = 10, p = 2,

x̄1 = (4084, 2580.5)⊤,  x̄2 = (4307.2, 4925.2)⊤

and

S1 = 10⁷ × ( 1.6635 1.2410
             1.2410 1.3747 ),
S2 = 10⁷ × ( 1.2248 1.1425
             1.1425 1.5112 ),

so that

S = 10⁷ × ( 1.4880 1.2016
            1.2016 1.4293 ).

Example
The observed value of the test statistic is F_obs = 2.7036. Since F_{0.95;2,22} = 3.4434, the hypothesis of equal means of the two groups is not rejected, although it would be rejected at a less severe level (p-value = 0.0892). The 95% simultaneous confidence intervals for the differences are given by

−4628.6 ≤ µ1a − µ2a ≤ 4182.2
−6662.4 ≤ µ1s − µ2s ≤ 1973.0.

Example
Let us compare the mean vectors of the forged and the genuine bank notes. The matrices Sf and Sg were already calculated, and since here nf = ng = 100, S is simply the mean of Sf and Sg: S = (1/2)(Sf + Sg).

x̄g⊤ = (214.97, 129.94, 129.72, 8.305, 10.168, 141.52)
x̄f⊤ = (214.82, 130.3, 130.19, 10.53, 11.133, 139.45)

The test statistic is F_obs = 391.92, which is highly significant for F_{6,193}.

Example
The 95% simultaneous confidence intervals for the differences δj = µgj − µfj, j = 1, …, p, are:

−0.0443 ≤ δ1 ≤ 0.3363
−0.5186 ≤ δ2 ≤ −0.1954
−0.6416 ≤ δ3 ≤ −0.3044
−2.6981 ≤ δ4 ≤ −1.7519
−1.2952 ≤ δ5 ≤ −0.6348
1.8072 ≤ δ6 ≤ 2.3268

All the components (except the first) show a significant difference in the means, the main effects being taken by the lower border (X4) and the diagonal (X6).

Test Problem 9 (Comparison of Covariance Matrices)
Let Xih ∼ Np(µh, Σh), i = 1, …, nh; h = 1, …, k, all variables being independent.

H0 : Σ1 = Σ2 = ⋯ = Σk, H1 : no constraints.

Each subsample provides Sh, an estimator of Σh, with

nhSh ∼ Wp(Σh, nh − 1).

Under H0, Σ_{h=1}^k nhSh ∼ Wp(Σ, n − k), where Σ is the common covariance matrix and n = Σ_{h=1}^k nh. Let S = (n1S1 + ⋯ + nkSk)/n be the weighted average of the Sh (it is in fact the MLE of Σ when H0 is true). The likelihood ratio test leads to the statistic

−2 log λ = n log |S| − Σ_{h=1}^k nh log |Sh|,

which under H0 is approximately distributed as χ²_m, where m = (1/2)(k − 1)p(p + 1).
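The statistic of Test Problem 9 can be sketched for an arbitrary number of groups (simulated data; names illustrative):

```python
import numpy as np
from scipy.stats import chi2

def cov_equality_stat(samples):
    """-2 log lambda for H0: Sigma_1 = ... = Sigma_k."""
    Ss = [np.cov(X, rowvar=False, bias=True) for X in samples]   # MLE S_h
    ns = np.array([X.shape[0] for X in samples])
    n, p, k = ns.sum(), samples[0].shape[1], len(samples)
    S = sum(nh * Sh for nh, Sh in zip(ns, Ss)) / n               # pooled MLE
    stat = n * np.log(np.linalg.det(S)) \
        - sum(nh * np.log(np.linalg.det(Sh)) for nh, Sh in zip(ns, Ss))
    df = (k - 1) * p * (p + 1) // 2
    return stat, chi2.sf(stat, df)

rng = np.random.default_rng(8)
stat, pval = cov_equality_stat([rng.normal(size=(30, 2)),
                                rng.normal(size=(40, 2))])
```

By concavity of log det, the statistic is always nonnegative.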


Example
Coming back to the US companies data, where the means of assets and sales have been compared for companies from the energy and manufacturing sectors: the test Σ1 = Σ2 leads to the value of the test statistic

−2 log λ = 0.9076,

which is not significant (p-value for χ²_3: 0.82). We cannot reject H0, and the comparison of the means above is valid.

Test Problem 10 (Comparison of two means, unequal covariance matrices, large samples)
Suppose Xi1 ∼ Np(µ1, Σ1), i = 1, …, n1, and Xj2 ∼ Np(µ2, Σ2), j = 1, …, n2, all the variables being independent.

H0 : µ1 = µ2, H1 : no constraints.

(x̄1 − x̄2) ∼ Np( δ, Σ1/n1 + Σ2/n2 ).

Therefore, under H0,

(x̄1 − x̄2)⊤( Σ1/n1 + Σ2/n2 )⁻¹(x̄1 − x̄2) ∼ χ²_p.

Since Si is a consistent estimator of Σi, i = 1, 2, we have

(x̄1 − x̄2)⊤( S1/n1 + S2/n2 )⁻¹(x̄1 − x̄2) →L χ²_p. (18)

Example
Let us compare the forged and the genuine bank notes again (n1 and n2 are large). The test statistic turns out to be 2436.8, which is highly significant. The 95% simultaneous confidence intervals are now:

−0.0389 ≤ δ1 ≤ 0.3309
−0.5140 ≤ δ2 ≤ −0.2000
−0.6368 ≤ δ3 ≤ −0.3092
−2.6846 ≤ δ4 ≤ −1.7654
−1.2858 ≤ δ5 ≤ −0.6442
1.8146 ≤ δ6 ≤ 2.3194

showing that all the components except the first are different from zero, the largest differences coming from X6 (length of the diagonal) and X4 (lower border).

Profile Analysis

The p measures are reported in the same units; for instance, measures of blood pressure at p different moments, one group being the control group and the other the group receiving a new treatment.
One is then interested in comparing the profiles of the two groups, a profile being just the vector of means of the p responses. (The comparison may be visualized in a two-dimensional graph using a parallel coordinate plot.)

Profile Analysis

Figure 19: Population profiles (mean response vs. treatment for Group 1 and Group 2). MVAprofil

The following questions are of interest:
1) Are the profiles similar in the sense of being parallel (which means no interaction between the treatments and the groups)?
2) If the profiles are parallel, are they at the same level?
3) If the profiles are parallel, is there any treatment effect (are the profiles horizontal)?
The above questions are easily translated into linear constraints on the means, and a test statistic is readily obtained.

Parallelism

Let C be a ((p − 1) × p) matrix defined as

C = ( 1 −1 0 ⋯ 0
      0 1 −1 ⋯ 0
      0 ⋯ 0 1 −1 ).

The hypothesis to be tested is H0(1) : C(µ1 − µ2) = 0. Under H0,

(n1n2/(n1 + n2)²)(n1 + n2 − 2) {C(x̄1 − x̄2)}⊤(CSC⊤)⁻¹C(x̄1 − x̄2) ∼ T²(p − 1, n1 + n2 − 2),

where S is the pooled covariance matrix. The hypothesis is rejected if

(n1n2(n1 + n2 − p))/((n1 + n2)²(p − 1)) {C(x̄1 − x̄2)}⊤(CSC⊤)⁻¹C(x̄1 − x̄2) > F_{1−α;p−1,n1+n2−p}.

Equality of two levels

The question of equality of the two levels is meaningful only if the two profiles are parallel. In the case of interaction (rejection of H0(1)), the two populations react differently to the treatments and the question of the level has no meaning.
The equality of the two levels is written as

H0(2) : 1p⊤(µ1 − µ2) = 0.

Under H0(2),

(n1n2/(n1 + n2)²)(n1 + n2 − 2) {1p⊤(x̄1 − x̄2)}²/(1p⊤S1p) ∼ T²(1, n1 + n2 − 2) = F_{1,n1+n2−2}.

The rejection region is thus

(n1n2(n1 + n2 − 2)/(n1 + n2)²) {1p⊤(x̄1 − x̄2)}²/(1p⊤S1p) > F_{1−α;1,n1+n2−2}.

Treatment effect

If the parallelism of the profiles has been rejected, then two separate analyses should be done on the two groups using the repeated measurements approach (see above). But if parallelism is accepted, we can exploit the information contained in both groups (possibly at different levels) to test for a treatment effect, i.e., the horizontality of the two profiles.
This may be written as

H0(3) : C(µ1 + µ2) = 0.

It is easy to prove that H0(3) together with H0(1) implies that

C( (n1µ1 + n2µ2)/(n1 + n2) ) = 0.

So under parallel, horizontal profiles we have

√(n1 + n2) Cx̄ ∼ N_{p−1}(0, CΣC⊤),

where x̄ is the overall mean. We obtain

(n1 + n2 − 2)(Cx̄)⊤(CSC⊤)⁻¹Cx̄ ∼ T²(p − 1, n1 + n2 − 2).

This leads to the rejection region for H0(3):

((n1 + n2 − p)/(p − 1)) (Cx̄)⊤(CSC⊤)⁻¹Cx̄ > F_{1−α;p−1,n1+n2−p}.

ExampleWechsler Adult Intelligence Scale (WAIS) for 2 categories of people: ingroup 1 are n1 = 37 people who do not present a senile factor, group 2are those (n2 = 12) presenting a senile factor. The four WAIS subtestsare X1 (information), X2 (similarities), X3 (arithmetic) and X4 (picturecompletion). The relevant statistics are

x>1 = (12.57 9.57 11.49 7.97)

x>2 = (8.75 5.33 8.50 4.75)

Ostap Okhrin 426 of 461

Angewandte Multivariate Statistik Hypothesis Testing Linear Hypothesis

Example

S1 =

11.1648.840 11.7596.210 5.778 10.7902.020 0.529 1.743 3.594

S2 =

9.6889.583 16.7228.875 11.083 12.0837.021 8.167 4.875 11.688

Ostap Okhrin 427 of 461

Angewandte Multivariate Statistik Hypothesis Testing Linear Hypothesis

ExampleThe test statistics for testing the parallelism of the two profiles isFobs = 0.4634 which is not significant (p − value = 0.71) so we canaccept the parallelism.The second test (equality of the levels of the 2 profiles) is given withFobs = 17.2146 which is highly significant (p-value ' 10−4): theglobal level of the test for the non-senile people is superior to thesenile group.Finally, the final test (horizontality of the average profile) givesFobs = 53.317 which is also highly significant (p-value ' 10−14).There are significant differences among the means of the differentsubtests.


Summary: Linear Hypothesis

Hypotheses about µ can often be written as Aµ = a, with a matrix A and a vector a.

The hypothesis H0 : Aµ = a for X ∼ Np(µ, Σ) with Σ known leads to

−2 log λ = n (Ax̄ − a)⊤ (AΣA⊤)⁻¹ (Ax̄ − a) ∼ χ²_q,

where q is the number of elements in a.


Summary: Linear Hypothesis

The hypothesis H0 : Aµ = a for X ∼ Np(µ, Σ) with Σ unknown leads to

−2 log λ = n log{ 1 + (Ax̄ − a)⊤ (ASA⊤)⁻¹ (Ax̄ − a) } −→ χ²_q,

where q is the number of elements in a, and we have the exact test

(n − 1) (Ax̄ − a)⊤ (ASA⊤)⁻¹ (Ax̄ − a) ∼ T²(q, n − 1).
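A minimal numerical sketch of this exact test (Python/NumPy; the data are simulated under H0, and A, a are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate n observations from N_3(mu, I) with mu_1 = mu_2, so H0 holds
n, p = 100, 3
X = rng.normal(size=(n, p)) + np.array([1.0, 1.0, 2.0])

A = np.array([[1.0, -1.0, 0.0]])   # tests H0: mu_1 - mu_2 = 0
a = np.array([0.0])
q = len(a)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False, bias=True)   # divisor n, as in the slides

r = A @ xbar - a
T2 = (n - 1) * r @ np.linalg.solve(A @ S @ A.T, r)
# T2 ~ T^2(q, n-1); equivalently ((n-q)/(q(n-1))) T2 ~ F_{q, n-q}
F = (n - q) / (q * (n - 1)) * T2
print(F)
```

With H0 true, F should be an unremarkable draw from an F_{1,99} distribution, so it stays small for most seeds.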


Summary: Linear Hypothesis

The hypothesis H0 : Aβ = a for Yi ∼ N1(β⊤xi, σ²) with σ² unknown leads to

−2 log λ = n log( ||y − X β̃||² / ||y − X β̂||² ) −→ χ²_q,

with q being the length of a, where β̂ is the unrestricted and β̃ the restricted (under H0) least squares estimator, and with the exact test

(n − p)/q · (Aβ̂ − a)⊤ { A(X⊤X)⁻¹A⊤ }⁻¹ (Aβ̂ − a) / { (y − X β̂)⊤(y − X β̂) } ∼ F_{q, n−p}.
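The exact F test can be sketched numerically as follows (Python/NumPy; the small data set and the restriction β2 = 0 are made up for illustration):

```python
import numpy as np

# Made-up data: y = X beta + eps with beta_2 = 0, so H0: beta_2 = 0 holds
X = np.column_stack([np.ones(8),
                     np.array([1., 2., 3., 4., 5., 6., 7., 8.]),
                     np.array([2., 1., 4., 3., 6., 5., 8., 7.])])
beta_true = np.array([1.0, 0.5, 0.0])
noise = np.array([0.3, -0.2, 0.1, 0.0, -0.3, 0.2, -0.1, 0.1])
y = X @ beta_true + noise

n, p = X.shape
A = np.array([[0.0, 0.0, 1.0]])    # restriction A beta = a, i.e. beta_2 = 0
a = np.array([0.0])
q = len(a)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
rss = resid @ resid

r = A @ beta_hat - a
M = A @ np.linalg.inv(X.T @ X) @ A.T
F = (n - p) / q * (r @ np.linalg.solve(M, r)) / rss
print(F)   # compare with F_{1-alpha; q, n-p}
```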


Angewandte Multivariate Statistik Regression Models

Regression Models

Linear Regression

y = Xβ + ε

X (n × p): explanatory variables;  y (n × 1): response


Example

Let x1, x2 be two factors that explain the variation of the response y:

yi = β0 + β1 xi1 + β2 xi2 + β3 xi1² + β4 xi2² + β5 xi1 xi2 + εi,   i = 1, . . . , n

X =
    1   x11   x12   x11²   x12²   x11 x12
    1   x21   x22   x21²   x22²   x21 x22
    ⋮    ⋮     ⋮     ⋮      ⋮       ⋮
    1   xn1   xn2   xn1²   xn2²   xn1 xn2
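Building this quadratic response-surface design matrix and fitting it by least squares can be sketched as follows (Python/NumPy; the regressors and coefficient values are made up, and the response is noise-free so the fit recovers the coefficients exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)

# Columns: intercept, linear, quadratic, interaction -- as on the slide
X = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])

beta = np.array([1.0, 2.0, -1.0, 0.5, 0.25, 3.0])
y = X @ beta                      # noise-free response

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))
```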


Example

Figure 20: 3-D response surface   MVAresponsesurface


Angewandte Multivariate Statistik Regression Models General ANOVA and ANCOVA Models

ANOVA Models

One factor (p levels) model

y_{kℓ} = µ + α_ℓ + ε_{kℓ},   k = 1, . . . , n_ℓ,  and  ℓ = 1, . . . , p

Pullover example: p = 3 marketing strategies, y = Xβ + ε

X =
    1    1    0
    1    1    0
    1    0    1
    1    0    1
    1   −1   −1
    1   −1   −1
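The effect ("sum-to-zero") coding above can be generated programmatically; a minimal sketch (Python/NumPy, assuming two observations per strategy as in the displayed matrix, with the last level coded −1):

```python
import numpy as np

# One factor with p = 3 levels (marketing strategies), 2 observations each
levels = np.array([0, 0, 1, 1, 2, 2])   # strategy label per observation
p = 3

# Effect coding: level l gets a 1 in column l+1, the last level gets -1's;
# the intercept column then estimates the grand mean of the level means.
X = np.ones((len(levels), p))
for lvl in range(p - 1):
    X[:, lvl + 1] = np.where(levels == lvl, 1.0,
                    np.where(levels == p - 1, -1.0, 0.0))
print(X)
```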


Multiple-Factors Models

Example: 3 marketing strategies, 2 locations

        A1             A2       A3
B1      18 15          5 8 8    10 14
B2      15 20 25 30    10 12    20 25

Table 9: A two-factor ANOVA data set: factor A with three levels of the marketing strategy and factor B with two levels for the location. The figures represent the resulting sales during the same period.


General Two Factor Model

y_{ijk} = µ + α_i + γ_j + (αγ)_{ij} + ε_{ijk}

i = 1, . . . , r,   j = 1, . . . , s,   k = 1, . . . , n_{ij}

Σ_{i=1}^r α_i = 0,    Σ_{j=1}^s γ_j = 0

Σ_{i=1}^r (αγ)_{ij} = 0 for each j,    Σ_{j=1}^s (αγ)_{ij} = 0 for each i

For the marketing data: r = 3, s = 2.  Interactions: (αγ)_{ij}


Example

(αγ)11 > 0: the effect of A1 (advertisement in the local newspaper) is more successful in location B1 (commercial centre).

(αγ)31 < 0: A3 (luxury presentation) is less effective in B1 than in B2 (non-commercial centre).


Model without Interactions

y = (18  15  15  20  25  30  5  8  8  10  12  10  14  20  25)⊤

X⊤ =
    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
    1   1   1   1   1   1   0   0   0   0   0  −1  −1  −1  −1
    0   0   0   0   0   0   1   1   1   1   1  −1  −1  −1  −1
    1   1  −1  −1  −1  −1   1   1   1  −1  −1   1   1  −1  −1


Model with Interactions

X⊤ =
    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
    1   1   1   1   1   1   0   0   0   0   0  −1  −1  −1  −1
    0   0   0   0   0   0   1   1   1   1   1  −1  −1  −1  −1
    1   1  −1  −1  −1  −1   1   1   1  −1  −1   1   1  −1  −1
    1   1  −1  −1  −1  −1   0   0   0   0   0  −1  −1   1   1
    0   0   0   0   0   0   1   1   1  −1  −1  −1  −1   1   1
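Both designs can be checked numerically. The sketch below (Python/NumPy) rebuilds them from the three effect-coded columns (the interaction columns are the products of the main-effect columns) and compares the residual sums of squares of the two fits:

```python
import numpy as np

# Marketing data in the order of the y vector on the slides
y = np.array([18., 15., 15., 20., 25., 30., 5., 8., 8., 10., 12.,
              10., 14., 20., 25.])

ones = np.ones(15)
A1 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, -1, -1, -1, -1], float)
A2 = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1, -1, -1, -1], float)
B  = np.array([1, 1, -1, -1, -1, -1, 1, 1, 1, -1, -1, 1, 1, -1, -1], float)

X_add  = np.column_stack([ones, A1, A2, B])                  # no interaction
X_full = np.column_stack([ones, A1, A2, B, A1 * B, A2 * B])  # with interaction

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

print(rss(X_full, y))   # 158, the RSS_full quoted on the next slide
print(rss(X_add, y))    # larger: the reduced model fits less well
```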


Example

            β̂       p-value
µ          15.25
α1          4.25    0.0218
α2         −6.25    0.0033
γ1         −3.42    0.0139
(αγ)11      0.42    0.7922
(αγ)21      1.42    0.8096

Table 10: The values of β̂ in the full model with interactions for the marketing data (RSS_full = 158)


ANCOVA Models

ANCOVA models are regression models in which some of the explanatory variables are qualitative and others are continuous.

Example: Consider the car data and analyse the effect of weight (W) and displacement (D) on the mileage (M). Test whether the origin of the car (C) has an effect on the response and whether the effect of the continuous variables is the same for different factor levels.


Example

        β̂        p-value      β̂        p-value
µ      41.0066    0.0000     43.4031    0.0000
W      −0.0073    0.0000     −0.0074    0.0000
D       0.0118    0.2250      0.0081    0.4140
C                            −0.9675    0.1250

Table 11: Estimation of the effects of weight and displacement on the mileage (left: without the origin C, right: including C)   MVAcareffect


Example

         µ̂       p-value     W        p-value     D        p-value
c = 1   40.043    0.0000    −0.0065    0.0000     0.0058    0.3790
c = 2   47.557    0.0005     0.0081    0.3666    −0.3582    0.0160
c = 3   44.174    0.0002     0.0039    0.7556    −0.2650    0.3031

Table 12: Effect of the different factor levels on the response   MVAcareffect


Angewandte Multivariate Statistik Regression Models Categorical Responses

Categorical Responses

- The response variable is categorical (qualitative)
- Observe counts y_k for class k = 1, . . . , K
- Likelihood:

  L = ( n! / ∏_{k=1}^K y_k! ) ∏_{k=1}^K (m_k / n)^{y_k}

- Idea: make log m_k linear in X


Two-Way Tables

- y_{jk} is the number of observations in cell (j, k)
- Multinomial likelihood:

  L = ( n! / ∏_{j=1}^J ∏_{k=1}^K y_{jk}! ) ∏_{j=1}^J ∏_{k=1}^K (m_{jk} / n)^{y_{jk}}

- No interaction:

  log m_{jk} = µ + α_j + γ_k   for j = 1, . . . , J,  k = 1, . . . , K

- In matrix form: log m = Xβ


Model without Interaction

log m = (log m11, log m12, log m13, log m21, log m22, log m23)⊤

X =
    1    1    1    0
    1    1    0    1
    1    1   −1   −1
    1   −1    1    0
    1   −1    0    1
    1   −1   −1   −1

β = (β0, β1, β2, β3)⊤
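For the no-interaction (independence) model the ML fitted counts have a well-known closed form, m_{jk} = (row total)_j (column total)_k / n. The following sketch (Python/NumPy, made-up 2×3 table) illustrates this and recovers the effect-coded parameters from log m:

```python
import numpy as np

# Made-up 2x3 table of counts
y = np.array([[10., 20., 30.],
              [20., 10., 10.]])
n = y.sum()

# ML fit of the independence model: m_jk = r_j * c_k / n
m = np.outer(y.sum(axis=1), y.sum(axis=0)) / n

# The fit reproduces the margins of the observed table
print(m.sum(axis=1), y.sum(axis=1))
print(m.sum(axis=0), y.sum(axis=0))

# Recover mu and the sum-to-zero effects from the additive log m
logm = np.log(m)
mu = logm.mean()
alpha = logm.mean(axis=1) - mu          # row effects, sum to 0
gamma = logm.mean(axis=0) - mu          # column effects, sum to 0
```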


Model without Interaction

Log-likelihood

ℓ(β) = Σ_{j=1}^J Σ_{k=1}^K y_{jk} log m_{jk}   s.t.  Σ_{j,k} m_{jk} = n

α1 = β1,  α2 = −β1

γ1 = β2,  γ2 = β3,  γ3 = −(β2 + β3)


Model with Interactions

log m_{jk} = µ + α_j + γ_k + (αγ)_{jk},   j = 1, . . . , J,  k = 1, . . . , K

Σ_{k=1}^K (αγ)_{jk} = 0  for j = 1, . . . , J

Σ_{j=1}^J (αγ)_{jk} = 0  for k = 1, . . . , K


Testing with Count Data

- y_k: observed count;  m̂_k: value predicted by the model

- Pearson chi-square:

  χ² = Σ_{k=1}^K (y_k − m̂_k)² / m̂_k

- Deviance:

  G² = 2 Σ_{k=1}^K y_k log( y_k / m̂_k )
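Both statistics can be computed directly; a short sketch (Python/NumPy, made-up 2×2 table fitted with the independence model):

```python
import numpy as np

# Made-up 2x2 table of observed counts
y = np.array([[10., 20.],
              [30., 40.]])
n = y.sum()
m = np.outer(y.sum(axis=1), y.sum(axis=0)) / n   # independence fit

chi2 = ((y - m) ** 2 / m).sum()                  # Pearson chi-square
G2 = 2 * (y * np.log(y / m)).sum()               # deviance

# d.f. = # free cells - # free parameters = (JK - 1) - (J - 1) - (K - 1)
J, K = y.shape
df = (J * K - 1) - (J - 1) - (K - 1)
print(round(chi2, 4), round(G2, 4), df)   # the two statistics are close
```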


Testing with Count Data

- Both statistics are asymptotically χ² distributed

- Degrees of freedom:

  d.f. = (# free cells) − (# free parameters estimated)

- Test:
  H0: reduced model with r degrees of freedom
  H1: full model with f degrees of freedom


Testing with Count Data

G²_{H0} − G²_{H1} ∼ χ²_{r−f}

Reject H0 if the p-value

P( χ²_{r−f} > (G²_{H0} − G²_{H1})_observed )

is smaller than the significance level α.


Example

2 × 2 × 5 table of n = 5833 counts on prescribed drugs

M      A1    A2    A3    A4    A5
DY     21    32    70    43    19
DN    683   596   705   295    99

F      A1    A2    A3    A4    A5
DY     46    89   169    98    51
DN    738   700   847   336   196

Table 13: A three-way contingency table: top table for men, bottom table for women   MVAdrug


Example

β0   intercept          5.0089    β10                     0.0205
β1   gender: M         −0.2867    β11                     0.0482
β2   drug: DY          −1.0660    β12  drug*age          −0.4983
β3   age               −0.0080    β13                    −0.1807
β4                      0.2151    β14                     0.0857
β5                      0.6607    β15                     0.2766
β6                     −0.0463    β16  gender*drug*age   −0.0134
β7   gender*drug       −0.1632    β17                    −0.0523
β8   gender*age         0.0713    β18                    −0.0112
β9                     −0.0092    β19                    −0.0102


Example

β0   intercept       5.0051    β8   gender*age    0.0795
β1   gender: M      −0.2919    β9                 0.0321
β2   drug: DY       −1.0717    β10                0.0265
β3   age            −0.0030    β11                0.0534
β4                   0.2358    β12  drug*age     −0.4915
β5                   0.6649    β13               −0.1576
β6                  −0.0425    β14                0.0917
β7   gender*drug    −0.1734    β15                0.2822

Table 14: Coefficient estimates based on the saturated model (previous slide) and the ML method (current slide)   MVAdrug3waysTab


Logit Models

p(xi) = P(yi = 1 | xi) = exp( β0 + Σ_{j=1}^p βj xij ) / { 1 + exp( β0 + Σ_{j=1}^p βj xij ) }

The log odds ratio is linear:

log{ p(xi) / (1 − p(xi)) } = β0 + Σ_{j=1}^p βj xij


Logit Models

Likelihood function

L(β0, β) = ∏_{i=1}^n p(xi)^{yi} {1 − p(xi)}^{1−yi}

Log-likelihood function

ℓ(β0, β) = Σ_{i=1}^n [ yi log p(xi) + (1 − yi) log{1 − p(xi)} ]
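This log-likelihood is usually maximised by Newton–Raphson (equivalently, iteratively reweighted least squares). A minimal sketch (Python/NumPy; the tiny data set is made up and not separable, so the MLE is finite):

```python
import numpy as np

# Made-up binary data: one regressor plus intercept
x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([0., 0., 1., 0., 1., 1.])
X = np.column_stack([np.ones_like(x), x])

beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))     # p(x_i)
    W = p * (1.0 - p)                       # Fisher weights
    grad = X.T @ (y - p)                    # score vector
    H = X.T @ (X * W[:, None])              # Fisher information
    beta = beta + np.linalg.solve(H, grad)  # Newton step

p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
loglik = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
print(beta, loglik)
```

The fitted slope is positive, so p̂(x) increases with x, and the maximised log-likelihood exceeds that of the intercept-only model.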


Example

        β̂       p-value
β0     3.6042    0.0660
β3    −0.2031    0.0037
β4    −0.0205    0.0183
β5    −1.1841    0.3108

Table 15: Estimation of the effect of the financial characteristics on bankruptcy with the logit model   MVAbankrupt


Summary: Regression Models

In contingency tables, the categories are defined by the qualitative variables.

The saturated model contains all interaction terms and has 0 degrees of freedom.

A non-saturated model is a reduced model, since it fixes some parameters to zero.


Summary: Regression Models

Two statistics for testing the full model against the reduced model are:

X² = Σ_{k=1}^K (y_k − m̂_k)² / m̂_k

G² = 2 Σ_{k=1}^K y_k log( y_k / m̂_k )


Summary: Regression Models

The logit model allows the column categories to be a quantitative variable and quantifies the effect of the column category using fewer parameters, while incorporating relationships more flexible than just a linear one.

The logit model is equivalent to a log-linear model:

log{ p(xi) / (1 − p(xi)) } = β0 + Σ_{j=1}^p βj xij
