Introduction to Linear Model
Statistical Methods in Finance
Lecture 4
Ta-Wei Huang
December 7, 2016
Ta-Wei Huang Introduction to Linear Model December 7, 2016 1 / 29
Table of Contents
Regression analysis is almost certainly the most important tool at the
econometrician’s disposal. But what is regression? Let’s see what we’ll
cover in today’s lecture.
1 Basic Idea in Regression
2 Matrix Representation
3 The Least Square Estimator
4 Next Lecture
Basic Idea in Regression
What is Linear Model 1
In very general terms, regression (linear model) is concerned with
describing and evaluating the linear relationship between a given
variable and one or more other variables.
More specifically, a linear model is an attempt to explain movements in
a variable by reference to movements in one or more other variables
(a causal relationship).
Basic Idea in Regression
What is Linear Model 2
Causal Relationship: Y = f(X1, · · · , Xk) + ε
Y : Output Variable (Response, Effect)
Xi: Input Variables (Causes)
f : a function representing the causal relationship
ε: a random error term
The causal relationship f is deterministic but unknown. Can we
approximate it by
f(X1, · · · , Xk) ≈ ∑_{i} βi gi(X1, · · · , Xk),
where each gi is known (and chosen by ourselves)?
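As a rough illustration of this idea, here is a minimal Python sketch; the target f = sin, the polynomial basis g_i, and the data range are all assumptions made for illustration, not part of the lecture.

```python
import numpy as np

# Sketch: approximate an unknown f(x) = sin(x) locally by a linear
# combination of chosen basis functions g_i (here: 1, x, x^2, x^3).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = np.sin(x) + rng.normal(0.0, 0.05, 200)   # Y = f(X) + eps

# Design matrix whose columns are g_i(x)
G = np.column_stack([np.ones_like(x), x, x**2, x**3])
beta, *_ = np.linalg.lstsq(G, y, rcond=None)

# The fitted linear combination tracks f on the data range
approx = G @ beta
print(np.max(np.abs(approx - np.sin(x))))    # small approximation error
```

The point is only that a handful of known g_i with unknown linear weights can capture a smooth nonlinear f locally.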
Basic Idea in Regression
Definition of Linear Model
Definition (Linear Model)
The linear model of an output variable Y with input variables X1, · · · , Xk
has the general form
Y = ∑_{i=1}^{p} βi gi(X1, · · · , Xk) + ε,
where X1, · · · , Xk are accurately measured deterministic variables, gi is a
known function of X1, · · · , Xk for i = 1, . . . , p, βi is an unknown
parameter that enters linearly for i = 1, . . . , p, and ε is a random error
term.
Basic Idea in Regression
Explanation of Linear Model
Y = ∑_{i=1}^{p} βi gi(X1, · · · , Xk) + ε
The definition implies that we know the form gi of effects on Y , but
we don’t know the magnitude βi of effects on Y . ⇒ signal
In a linear model, we assume that variation due to random error ε
only occurs on the output Y . ⇒ error
Basic Idea in Regression
Rationale of Linear Model 1
A general model is given by Y = f(X1, · · · , Xk) + ε, where f is unknown
and arbitrary.
By Taylor’s theorem, f(X1, · · · , Xk) = ∑_{i=0}^{∞} (1/i!) D^i f(X0) · (X − X0)^i, which
implies that there are infinitely many parameters to be estimated.
Usually we don’t have enough data to estimate f directly, and so we
have to assume that it has some more restricted form.
A local approximation of f (over the range of the data X1, · · · , Xk)
may be achievable by a linear model.
Because the predictors can be transformed and combined in any way,
linear models are actually very flexible.
Basic Idea in Regression
Rationale of Linear Model 2
Let f̂(X) = ∑_{i=1}^{p} β̂i gi(X). Then the predicted value is Ŷ = f̂(X).
The mean squared error (MSE) is
E(Y − Ŷ)² = E(f(X) + ε − f̂(X))² = (f(X) − f̂(X))² + Var(ε),
where (f(X) − f̂(X))² is the reducible error and Var(ε) is the
irreducible error.
An intuitive approach is to locally minimize the reducible error
(f(X) − f̂(X))², but for several reasons we do not always do that.
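The decomposition of the MSE into a reducible and an irreducible part can be checked by simulation. In this minimal sketch, the true f, the deliberately mis-specified fitted f̂, and σ are all illustrative assumptions.

```python
import numpy as np

# Sketch of E(Y - Yhat)^2 = (f - fhat)^2 + Var(eps) at a fixed point x0.
rng = np.random.default_rng(1)
sigma = 0.5
x0 = 1.0
f = lambda x: 2.0 + 3.0 * x      # true (unknown in practice) signal
fhat = lambda x: 2.3 + 2.9 * x   # some fitted approximation

# Simulate many responses Y = f(x0) + eps at the same x0
eps = rng.normal(0.0, sigma, 100_000)
y = f(x0) + eps

mse = np.mean((y - fhat(x0)) ** 2)    # E(Y - Yhat)^2, estimated
reducible = (f(x0) - fhat(x0)) ** 2   # (f - fhat)^2
irreducible = sigma ** 2              # Var(eps)
print(mse, reducible + irreducible)   # approximately equal
```

No matter how well f̂ is chosen, the simulated MSE never drops below σ², which is exactly the irreducible part.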
Basic Idea in Regression
When to Use Linear Model 1
Examine the causal relationship between an output variable Y
(effect) and some input variables X = (X1, · · · , Xk) (causes).
Question
Let Yi,t = Ri,t − Rf,t and Xi,t = Rm,t − Rf,t. Suppose we find that the
linear model Yi,t = αi,t + βi,tXi,t + εi,t fits well. Can we conclude that
the reason for a higher return on one stock is the higher market premium?
From the above question, we know that sometimes the causal
relationship is not quite clear, especially in financial data. Can we still
use linear models to model behaviours in financial markets?
Basic Idea in Regression
When to Use Linear Model 2
Even where no sensible causal relationship exists between X and Y ,
we may wish to relate them by some sort of mathematical equation
(rationale: a sample from a multivariate normal population), since there
is a strong association between X and Y .
It can be shown that if (X, Y ) follows a joint normal distribution,
then Y |X has the form of a linear model. (See: Sampling Model)
Basic Idea in Regression
When to Use Linear Model 3
Question
To investigate the behaviour of returns, which models should be used?
1 Ri,t −Rf,t = αi,t + βi,t(Rm,t −Rf,t) + εi,t
2 Rm,t −Rf,t = αi,t + βi,t(Ri,t −Rf,t) + εi,t
Domain knowledge is important for determining the input and output
variables, and that is why theoretical models remain important even if
they are not realistic. In machine learning and prediction, however, an
association relationship is enough!
Basic Idea in Regression
Data Type in Linear Model
A response/output/dependent variable Y is modeled or explained by
predictor/input/independent/regressor variables that are functions of
X = (X1, · · · , Xk).
Y : an "approximately" continuous random variable
X: continuous/discrete/categorical deterministic variables
Note that if X is a random vector, then conditioning on X (working
with Y |X) allows us to treat X as a deterministic vector.
Basic Idea in Regression
Types of Linear Model
X = (X1, · · · , Xk): quantitative ⇒ multiple regression
X = (X1, · · · , Xk): qualitative + quantitative ⇒ analysis of
covariance
X = (X1, · · · , Xk): qualitative ⇒ analysis of variance (ANOVA)
multiple Y ’s ⇒ multivariate regression
qualitative Y ⇒ generalized linear model (logistic regression)
Basic Idea in Regression
Statistical Procedure of Linear Model
[Figure: flowchart of the statistical procedure of the linear model]
Matrix Representation
The Data Structure
The data structure of n records with one output variable Y and k input
variables X1, ..., Xk is
Y    X1   X2   · · ·  Xk
y1   x11  x12  · · ·  x1k
y2   x21  x22  · · ·  x2k
...  ...  ...  ...    ...
yn   xn1  xn2  · · ·  xnk
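The table above maps directly onto arrays. A minimal Python sketch with made-up numbers (k = 2 inputs, n = 4 records) builds the response vector Y and the design matrix X with a leading column of ones:

```python
import numpy as np

# Each row is one record (y_i, x_i1, x_i2); the values are illustrative.
data = np.array([
    # y,   x1,  x2
    [10.0, 1.0, 2.0],
    [12.0, 1.5, 1.0],
    [15.0, 2.0, 3.0],
    [11.0, 0.5, 2.5],
])
Y = data[:, 0]                                          # n-vector of responses
X = np.column_stack([np.ones(len(data)), data[:, 1:]])  # n x p, p = k + 1
print(Y.shape, X.shape)                                 # (4,) (4, 3)
```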
Matrix Representation
Matrix Representation
The functional form of a linear model is
Yi = β0 + β1g1(xi1, · · · , xik) + · · ·+ βp−1gp−1(xi1, · · · , xik) + εi,
for i = 1, 2, . . . , n.
The matrix representation of that model is
Y = Xβ + ε,
where Y = (y1, . . . , yn)′ is n×1, X is the n×p matrix whose i-th row is
(1, gi1, · · · , gi,p−1), β = (β0, . . . , βp−1)′ is p×1, and
ε = (ε1, . . . , εn)′ is n×1.
Matrix Representation
Common Linear Models
Definition
A linear model Y = Xβ + ε is said to be
a least squares model if there is no assumption on ε. The parameter
space is Θ = {β : β ∈ Rp}.
a Gauss–Markov model if E(ε) = 0 and cov(ε) = σ2I. The
parameter space is Θ = {(β, σ2) : β ∈ Rp, σ2 ∈ R+}.
an Aitken model if E(ε) = 0 and cov(ε) = σ2V, where V is known.
The parameter space is Θ = {(β, σ2) : β ∈ Rp, σ2 ∈ R+}.
a general linear mixed model if E(ε) = 0 and cov(ε) = Σ ≡ Σ(θ).
The parameter space is Θ = {(β, θ) : β ∈ Rp, θ ∈ Ω}, where Ω is
the set of all values of θ such that Σ(θ) is positive definite.
Matrix Representation
Gauss Markov Model
Definition
A linear model Y = Xβ + ε is said to be a Gauss–Markov model if
E(ε) = 0 and cov(ε) = σ2I. The parameter space of this model is
Θ = {(β, σ2) : β ∈ Rp, σ2 ∈ R+}.
Common Gauss–Markov Models
One-sample Problem
Simple Linear Regression
Multiple Linear Regression
ANOVA and ANCOVA
Matrix Representation
Example 1 (One-sample Problem)
Assume that Y1, · · · , Yn is an iid sample with mean µ and variance
σ2 > 0. If ε1, · · · , εn are iid with mean E(εi) = 0 and common variance
σ2, then the functional form of the GM model is Yi = µ + εi. The matrix
form of this model is Y = Xβ + ε, where Y = (Y1, . . . , Yn)′ is n×1,
X = (1, . . . , 1)′ is n×1, β = µ is 1×1, and ε = (ε1, . . . , εn)′ is n×1.
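A quick numerical check of Example 1 with illustrative data: when X is a column of ones, the least squares estimate of µ is just the sample mean.

```python
import numpy as np

# One-sample GM model Y_i = mu + eps_i; the y values are made up.
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
X = np.ones((len(y), 1))                       # design matrix: a column of ones

# Least squares fit of Y = X mu + eps
mu_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(mu_hat[0], y.mean())                     # both equal the sample mean
```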
Matrix Representation
Example 2 (Simple Linear Regression)
Consider the model where a response variable Y is linearly related to an
independent variable x via Yi = β0 + β1xi + εi for i = 1, 2, . . . , n, where
εi are uncorrelated random variables with mean 0 and common variance
σ2 > 0. The matrix form of this model is Y = Xβ + ε, where
Y = (Y1, . . . , Yn)′ is n×1, X is the n×2 matrix whose i-th row is
(1, xi), β = (β0, β1)′ is 2×1, and ε = (ε1, . . . , εn)′ is n×1.
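A minimal Python sketch of Example 2 with made-up data, comparing the textbook closed form for (β0, β1) against the matrix form of the normal equations:

```python
import numpy as np

# Illustrative data for Y_i = beta0 + beta1 * x_i + eps_i
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed form: beta1 = Sxy / Sxx, beta0 = ybar - beta1 * xbar
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# Same answer from the matrix form Y = X beta + eps
X = np.column_stack([np.ones_like(x), x])
beta_mat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta0, beta1, beta_mat)   # the two routes agree
```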
Matrix Representation
Example 3 (Multiple Linear Regression)
Consider the model where a response variable Y is linearly related to
several independent variables, say x1, · · · , xk via
Yi = β0 + β1xi1 + · · · + βkxik + εi for i = 1, 2, . . . , n, where εi are
uncorrelated with mean 0 and common variance σ2 > 0. The matrix form
of this model is Y = Xβ + ε, where Y = (Y1, . . . , Yn)′ is n×1, X is the
n×p matrix (p = k + 1) whose i-th row is (1, xi1, · · · , xik),
β = (β0, . . . , βk)′ is p×1, and ε = (ε1, . . . , εn)′ is n×1.
The Least Square Estimator
Introduction
Consider the GM linear model Y = Xβ + ε, where Y is an n×1 vector of
observed responses, X is an n×p matrix of functions of the input variables,
β is a p×1 vector of unknown parameters to be estimated, and ε is an n×1
vector of random errors.
If Y is a random vector but the input variables are fixed
constants, then E(Y) = Xβ and cov(Y) = σ2I.
If Y and the input variables X1, · · · , Xk are all random, then
E(Y|X1, · · · , Xk) = Xβ and cov(Y|X1, · · · , Xk) = σ2I.
The Least Square Estimator
A Geometric Viewpoint: Simple Case 1
Now, consider a simple regression model Yi = β0 + β1xi + εi, i = 1, 2, 3.
In matrix form,
(Y1, Y2, Y3)′ = β0 (1, 1, 1)′ + β1 (x1, x2, x3)′ + (ε1, ε2, ε3)′.
Then the random vector Y ∈ R3 has
two dimensions coming from β0 (1, 1, 1)′ + β1 (x1, x2, x3)′, and
one dimension coming from (ε1, ε2, ε3)′.
Let β̂ = (β̂0, β̂1)′ be an estimator for β = (β0, β1)′.
The Least Square Estimator
A Geometric Viewpoint: Simple Case 2
Let Ω = {β0 (1, 1, 1)′ + β1 (x1, x2, x3)′ : β0, β1 ∈ R}. Then dim(Ω) = 2.
Finding β̂ is equivalent to finding a vector Xβ̂ in the two-dimensional Ω.
Our target is to find β̂ such that Y = Xβ + ε = Xβ̂ + ε̂.
Question
Which Ŷ = Xβ̂ captures the information in Y best?
Answer
Find β̂ such that Ŷ = Xβ̂ is close to Y. ⇒ What is the meaning of
"close"? The Euclidean distance?
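The picture behind "close in Euclidean distance" can be verified numerically. A minimal Python sketch with illustrative data (n = 3, p = 2) shows that the least squares choice makes the residual Y − Xβ̂ orthogonal to the subspace Ω spanned by the columns of X:

```python
import numpy as np

# Illustrative simple-regression design with n = 3 observations
x = np.array([1.0, 2.0, 4.0])
X = np.column_stack([np.ones(3), x])   # columns span the 2-dim subspace Omega
Y = np.array([1.0, 3.0, 4.0])

# Least squares via the normal equations X'X beta = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
residual = Y - X @ beta_hat

print(X.T @ residual)   # ~ [0, 0]: residual is perpendicular to Omega
```

This orthogonality is the geometric content of the next slides: Xβ̂ is the projection of Y onto Ω.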
The Least Square Estimator
A Geometric Viewpoint: Simple Case 3
[Figure: geometric illustration of projecting Y onto the plane Ω]
The Least Square Estimator
A General Case
Consider the GM linear model Y = Xβ + ε, where Y is an n×1 vector of
observed responses, X is an n×p matrix of functions of the input variables,
β is a p×1 vector of unknown parameters to be estimated, and ε is an n×1
vector of random errors.
Question
The squared Euclidean distance is (Y − Xβ̂)′(Y − Xβ̂) = ε̂′ε̂. Under what
assumptions is this a good measure of closeness?
The Least Square Estimator
Assumptions
Assumption: cov(ε) = σ2I, i.e.,
homoskedasticity (equal error variances), and
uncorrelated errors.
The Least Square Estimator
The Ordinary Least Square Estimator
Definition (Least Square Estimator)
An estimator β̂ is a least squares estimate of β if
β̂ = arg min_{β∈Rp} (Y − Xβ)′(Y − Xβ).
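A minimal Python sketch of this definition under simulated data (the true coefficients and noise level are assumptions for illustration): when X has full column rank, the minimizer solves the normal equations X′Xβ̂ = X′Y.

```python
import numpy as np

# Simulate a GM model Y = X beta + eps with assumed true beta
rng = np.random.default_rng(2)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0])
Y = X @ beta_true + rng.normal(0.0, 0.1, n)

# Minimizer of (Y - X b)'(Y - X b): solve X'X beta = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)   # close to [1, 2, -1]
```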
Next Lecture
The Next Lecture
In the next lecture, I will introduce a classical optimization method, the
method of least squares, and then discuss the geometric interpretation of
the ordinary least squares estimator.