Second course of Applied Statistics, MSc level, in a Business School
Applied Statistics Vincent JEANNIN – ESGF 4IFM
Q1 2012
vinzjeannin@hotmail.com
Summary of the session (est. 4.5h)
• R Step by Step
• Reminders of last session
• The Value at Risk
• OLS & Exploration
R Step by Step
http://www.r-project.org/
Downloadable for free (open source)
Main screen
Menu: File / New Script
Step 1: load your data
A CSV file exported from Excel is easy to import
Path C:\Users\vin\Desktop
DATA<-read.csv(file="C:/Users/vin/Desktop/DataFile.csv",header=TRUE)
Note: the file has 4 columns with headers, and the path is written with forward slashes
Run your instruction(s)
You can call variables anytime you want
summary(DATA) shows a quick summary of the distribution of all variables:
      SPX              SPXr                AMEXr            AMEX
 Min.   : 86.43   Min.   :-0.0666344   Min.   : 97.6   Min.   :-0.0883287
 1st Qu.: 95.70   1st Qu.:-0.0069082   1st Qu.:104.7   1st Qu.:-0.0094580
 Median :100.79   Median : 0.0010016   Median :108.8   Median : 0.0013007
 Mean   : 99.67   Mean   : 0.0001249   Mean   :109.4   Mean   : 0.0005891
 3rd Qu.:103.75   3rd Qu.: 0.0075235   3rd Qu.:114.1   3rd Qu.: 0.0102923
 Max.   :107.21   Max.   : 0.0474068   Max.   :123.5   Max.   : 0.0710967
summary(DATA$SPX) shows a quick summary of the distribution of one variable:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  86.43   95.70  100.80   99.67  103.80  107.20
Careful when using min(DATA) and max(DATA): these instructions treat DATA as one single variable
> min(DATA)
[1] -0.08832874
> max(DATA)
[1] 123.4793
Mean & SD
> sd(DATA)
       SPX       SPXr      AMEXr       AMEX
4.92763551 0.01468776 6.03035318 0.01915489
> mean(DATA)
         SPX         SPXr        AMEXr         AMEX
9.967046e+01 1.249283e-04 1.093951e+02 5.890780e-04
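In recent versions of R, mean() and sd() no longer accept a whole data frame, so the two calls above may fail or return NA. A minimal sketch of a column-wise equivalent follows; the small data frame is invented for illustration and is not the course file.

```r
# Illustrative data frame; the values are made up for this example
DATA <- data.frame(SPX  = c(100, 101, 99, 102),
                   SPXr = c(0.000, 0.010, -0.020, 0.030))

# Apply a function to every column separately
col_means <- sapply(DATA, mean)
col_sds   <- sapply(DATA, sd)

# Min and max of one column rather than of the whole data frame
spx_range <- range(DATA$SPX)

print(col_means)
print(col_sds)
print(spx_range)
```

sapply() returns a named vector, so col_means[["SPX"]] picks out one column's result.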
A histogram is easy to draw
hist(DATA$SPXr, breaks=25, main="Distribution of SPXr", ylab="Freq", xlab="SPXr", col="blue")
Obvious excess kurtosis, obvious asymmetry
Skewness and kurtosis functions don’t exist directly in base R…
However, some VNP (Very Nice Programmer) built and shared an add-in:
the moments package
Menu: Packages / Install Package(s)
• Choose whatever mirror (server) you want
• Usually France (Toulouse) is very good as it’s a University Server with all the packages available
Once installed, a package can be loaded with either of the following instructions (one is enough):
require(moments)
library(moments)
New functions can now be used!
> require(moments)
> library(moments)
> skewness(DATA)
       SPX       SPXr      AMEXr       AMEX
-0.6358029 -0.4178701  0.1876994 -0.2453693
> kurtosis(DATA)
     SPX     SPXr    AMEXr     AMEX
2.411177 5.671254 2.078366 5.770583
By the way, you can store any result in a variable
> Kur<-kurtosis(DATA$SPXr)
> Kur
[1] 5.671254
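For readers who cannot install the package, both statistics can be sketched in base R. The sketch assumes the moment-based (population) definitions, which is how the moments package is understood to compute them; the function names skew_pop and kurt_pop are hypothetical.

```r
# Skewness: third central moment divided by the variance to the power 3/2
skew_pop <- function(x) {
  m  <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^1.5
}

# Kurtosis (not excess): fourth central moment divided by the squared variance
kurt_pop <- function(x) {
  m  <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)
  m4 / m2^2
}

# A perfectly balanced 0/1 sample: symmetric, and as platykurtic as it gets
x <- c(0, 1, 1, 0, 1, 0, 0, 1)
skew_x <- skew_pop(x)
kurt_x <- kurt_pop(x)
print(c(skewness = skew_x, kurtosis = kurt_x, excess = kurt_x - 3))
```

Subtracting 3 from the kurtosis gives the excess kurtosis relative to the normal distribution.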
Lost? Call the help!
help(kurtosis)
The help page reminds you of:
• the package
• the syntax
• the definition of the arguments
Let’s store a few values
SPMean<-mean(DATA$SPXr)
SPSD<-sd(DATA$SPXr)   # sd() comes from package stats
Build a sequence, the x axis
x<-seq(from=SPMean-4*SPSD,to=SPMean+4*SPSD,length=500)
Build a normal density on these x
Y1<-dnorm(x,mean=SPMean,sd=SPSD)   # package stats
Display the histogram (freq=FALSE plots densities, so the curve is on the same scale)
hist(DATA$SPXr, breaks=25, freq=FALSE, main="S&P Returns / Normal Distribution", xlab="Returns", ylab="Density", col="blue")   # package graphics
Display on top of it the normal density
lines(x,Y1,type="l",lwd=3,col="red")   # package graphics
Positive Excess Kurtosis & Negative Skew
Let’s build a spread
Spd<-DATA$SPXr-DATA$AMEX
What is the mean?
The mean is linear: $E(aX + bY) = aE(X) + bE(Y)$, so $E(X - Y) = E(X) - E(Y)$
Let’s verify
> mean(DATA$SPXr)-mean(DATA$AMEX)-mean(Spd)
[1] 0
What is the standard deviation?
Is the standard deviation linear? NO!
$\mathrm{VAR}(aX + bY) = a^2\,\mathrm{VAR}(X) + b^2\,\mathrm{VAR}(Y) + 2ab\,\mathrm{COV}(X, Y)$
For the spread, a = 1 and b = -1:
> (var(DATA$SPXr)+var(DATA$AMEX)-2*cov(DATA$SPXr,DATA$AMEX))^0.5
[1] 0.01019212
> sd(Spd)
[1] 0.01019212
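The same check can be sketched on simulated data when the course file is not at hand. The two series below are random stand-ins for the returns, and all parameter values are assumptions chosen for illustration.

```r
set.seed(42)

# Simulated stand-ins for two correlated return series
X <- rnorm(250, mean = 0, sd = 0.015)
Y <- 0.8 * X + rnorm(250, mean = 0, sd = 0.010)

a <- 1
b <- -1   # the spread X - Y

# The mean is linear
mean_lhs <- mean(a * X + b * Y)
mean_rhs <- a * mean(X) + b * mean(Y)

# The variance needs the covariance term
var_lhs <- var(a * X + b * Y)
var_rhs <- a^2 * var(X) + b^2 * var(Y) + 2 * a * b * cov(X, Y)

print(c(sd_spread = sqrt(var_lhs), sd_from_formula = sqrt(var_rhs)))
```

Changing a and b to 0.5 each gives the standard deviation of an equally weighted portfolio with the same formula.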
Let’s show the implication in a proper manner
Let’s create a portfolio containing half of each stock
Portf<-0.5*DATA$SPXr+0.5*DATA$AMEX
plot(sd(DATA$SPXr),mean(DATA$SPXr),col="blue",ylim=c(0,0.0008),xlim=c(0.012,0.022),ylab="Return",xlab="Vol")
points(sd(DATA$AMEX),mean(DATA$AMEX),col="red")
points(sd(Portf),mean(Portf),col="green")
The efficient frontier
points(sd(0.1*DATA$SPXr+0.9*DATA$AMEX),mean(0.1*DATA$SPXr+0.9*DATA$AMEX),col="green")
points(sd(0.2*DATA$SPXr+0.8*DATA$AMEX),mean(0.2*DATA$SPXr+0.8*DATA$AMEX),col="green")
points(sd(0.3*DATA$SPXr+0.7*DATA$AMEX),mean(0.3*DATA$SPXr+0.7*DATA$AMEX),col="green")
points(sd(0.4*DATA$SPXr+0.6*DATA$AMEX),mean(0.4*DATA$SPXr+0.6*DATA$AMEX),col="green")
points(sd(0.6*DATA$SPXr+0.4*DATA$AMEX),mean(0.6*DATA$SPXr+0.4*DATA$AMEX),col="green")
points(sd(0.7*DATA$SPXr+0.3*DATA$AMEX),mean(0.7*DATA$SPXr+0.3*DATA$AMEX),col="green")
points(sd(0.8*DATA$SPXr+0.2*DATA$AMEX),mean(0.8*DATA$SPXr+0.2*DATA$AMEX),col="green")
points(sd(0.9*DATA$SPXr+0.1*DATA$AMEX),mean(0.9*DATA$SPXr+0.1*DATA$AMEX),col="green")
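The repeated points() calls can be written as a loop over the weights. This is a sketch on simulated series standing in for DATA$SPXr and DATA$AMEX; the simulation parameters and the names weights and frontier are assumptions.

```r
set.seed(1)

# Simulated stand-ins for the two return series
SPXr <- rnorm(250, mean = 0.0001, sd = 0.015)
AMEX <- 1.1 * SPXr + rnorm(250, mean = 0.0005, sd = 0.010)

# Weight of SPXr in the portfolio, from 0% to 100% in steps of 10%
weights <- seq(0, 1, by = 0.1)

# One row per portfolio: weight, volatility, mean return
frontier <- t(sapply(weights, function(w) {
  p <- w * SPXr + (1 - w) * AMEX
  c(weight = w, vol = sd(p), ret = mean(p))
}))

print(round(frontier, 5))
```

Calling points(frontier[, "vol"], frontier[, "ret"], col = "green") on an open plot would then draw all portfolios at once.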
plot(DATA$SPXr,DATA$AMEX)   # x = S&P returns, y = AmEx returns, matching the regression below
abline(lm(DATA$AMEX~DATA$SPXr), col="blue")
LM stands for Linear Models
> lm(DATA$AMEX~DATA$SPXr)
Call:
lm(formula = DATA$AMEX ~ DATA$SPXr)
Coefficients:
(Intercept)    DATA$SPXr
  0.0004505    1.1096287
$y = 1.1096x + 0.045\%$
Will be used later for linear regression and hedging
Do you remember which distribution is the most platykurtic in nature?
Toss a coin: Head = Success = 1 / Tail = Failure = 0
Let’s test the fairness
100 tosses… otherwise memory issues…
> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.5)
> mean(toss)
[1] 0.52
> sum(toss)
[1] 52
> kurtosis(toss)
[1] 1.006410
> kurtosis(toss)-3
[1] -1.993590
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
Density of a binomial distribution
$f(r \mid H = h, T = t) = \frac{(N + 1)!}{h!\,t!}\, r^h (1 - r)^t$
Let’s plot this density with $h = 52$, $t = 48$, $N = 100$
N<-100
h<-52
t<-48
r<-seq(0,1,length=500)
y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
plot(r,y,type="l",col="red",main="Probability density to have 52 heads out of 100 flips")
If the probability that r lies between 45% and 55% is significant, we’ll accept the fairness
What do you think?
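The density of r used here has the form of a Beta(h+1, t+1) density, so the probability that r lies between 45% and 55% can be sketched with base R's pbeta(). The helper name prob_fair is hypothetical, and the second call uses 72 heads out of 100 purely as a contrasting example.

```r
# Probability that r lies in [lo, hi] given h heads and t tails
prob_fair <- function(h, t, lo = 0.45, hi = 0.55) {
  pbeta(hi, h + 1, t + 1) - pbeta(lo, h + 1, t + 1)
}

# Check that the factorial formula and dbeta() agree at one point
N  <- 100
h  <- 52
t  <- 48
r0 <- 0.5
manual  <- factorial(N + 1) / (factorial(h) * factorial(t)) * r0^h * (1 - r0)^t
builtin <- dbeta(r0, h + 1, t + 1)

p_fair  <- prob_fair(52, 48)   # 52 heads out of 100
p_trick <- prob_fair(72, 28)   # 72 heads out of 100, for contrast

print(c(manual = manual, builtin = builtin))
print(c(p_fair = p_fair, p_trick = p_trick))
```

Using pbeta() also avoids factorial(), which overflows to Inf once N exceeds about 170.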
What is the problem with this coin?
Toss it! Head = Success = 1 / Tail = Failure = 0
Let’s test the fairness (assuming you don’t know it’s a trick)
100 tosses
> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.7)
> mean(toss)
[1] 0.72
> sum(toss)
[1] 72
> kurtosis(toss)
[1] 1.960317
> kurtosis(toss)-3
[1] -1.039683
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
Obvious fake! (simulated assuming the probability of head is 0.7)
If the probability that r lies between 45% and 55% is significant, we’ll accept the fairness
N<-100
h<-72
t<-28
r<-seq(0.2,0.8,length=500)
y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
plot(r,y,type="l",col="red",main="Probability density of r given 72 heads out of 100 flips")
Trick coin!
Reminders of last session
Standard Normal Distribution
Snapshot, 4 moments:
Mean: 0
SD: 1
Skewness: 0
Kurtosis: 3
$P(X \le \mu) = 0.5$
$P(X \le \mu - \sigma) = 0.159$
$P(X \le \mu - 2\sigma) = 0.023$
$P(X \le \mu - 3\sigma) = 0.001$
$P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$
$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.954$
$P(\mu - 3\sigma \le X \le \mu + 3\sigma) = 0.997$
$P(X \le \mu - 1.645\sigma) = 0.05$
$P(X \le \mu - 2.326\sigma) = 0.01$
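These reference probabilities can be recomputed with pnorm() and qnorm() from package stats. A short sketch follows; the variable names are illustrative.

```r
mu    <- 0
sigma <- 1   # the probabilities below are the same for any mu and sigma

# Left-tail probabilities
left_1sd <- pnorm(mu - sigma,     mean = mu, sd = sigma)
left_2sd <- pnorm(mu - 2 * sigma, mean = mu, sd = sigma)
left_3sd <- pnorm(mu - 3 * sigma, mean = mu, sd = sigma)

# Central probabilities
within_1sd <- pnorm(mu + sigma,     mu, sigma) - pnorm(mu - sigma,     mu, sigma)
within_2sd <- pnorm(mu + 2 * sigma, mu, sigma) - pnorm(mu - 2 * sigma, mu, sigma)
within_3sd <- pnorm(mu + 3 * sigma, mu, sigma) - pnorm(mu - 3 * sigma, mu, sigma)

# Quantiles used later for the 95% and 99% VaR
q05 <- qnorm(0.05)
q01 <- qnorm(0.01)

print(round(c(left_1sd, left_2sd, left_3sd), 4))
print(round(c(within_1sd, within_2sd, within_3sd), 4))
print(round(c(q05, q01), 3))
```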
Density: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
Notation: $N(\mu, \sigma)$
CDF: $P(X \le x) = \phi(x) = \int_{-\infty}^{x} f(u)\,du$
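Let $X \sim N(1, 1.5)$. Find $P(X \le 4.75)$
$P(X \le 4.75) = P\left(Y \le \frac{4.75 - 1}{1.5}\right) = P(Y \le 2.5)$ with $Y \sim N(0, 1)$
$P(Y \le 2.5) = ?$
Use the table!
$P(Y \le -2.5) = 0.0062$
$P(Y \le 2.5) = 1 - 0.0062 = 0.9938$
$P(X \le 4.75) = 0.9938$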
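The table lookup can be cross-checked in R. This sketch reads the 1.5 in N(1, 1.5) as the standard deviation, as the standardisation above does.

```r
# Direct computation with the mean and standard deviation of X
p_direct <- pnorm(4.75, mean = 1, sd = 1.5)

# Same probability after standardising to Y ~ N(0, 1)
z <- (4.75 - 1) / 1.5
p_standard <- pnorm(z)

print(c(z = z, p_direct = p_direct, p_standard = p_standard))
```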
QQ Plot
> qqnorm(FCOJ$V1)
> qqline(FCOJ$V1)
Fat tail
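Geometric Brownian Motion
Based on the Stochastic Differential Equation $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$
Discrete form: $dS_t = \mu S_t\,dt + \sigma S_t \sqrt{dt}\,\varepsilon$ with $\varepsilon \sim N(0, 1)$
B&S
CRR
$u = e^{\sigma\sqrt{t}}$
$d = \frac{1}{u} = e^{-\sigma\sqrt{t}}$
$S e^{rt} = pSu + (1 - p)Sd$, hence $e^{rt} = pu + (1 - p)d$
$p = \frac{e^{rt} - d}{u - d}$
$BV = \left(OpUp \cdot p + OpDown \cdot (1 - p)\right) e^{-rt}$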
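As an illustration of the CRR formulas, here is a sketch of a multi-step tree for a European call, compared with the Black-Scholes closed form. The multi-step extension, the function names crr_call and bs_call, and all input values are assumptions made for the example and are not taken from the slides.

```r
# European call on an n-step CRR tree (mat = maturity in years)
crr_call <- function(S, K, r, sigma, mat, n) {
  dt <- mat / n
  u  <- exp(sigma * sqrt(dt))
  d  <- 1 / u
  p  <- (exp(r * dt) - d) / (u - d)   # risk-neutral probability of an up move
  j  <- 0:n                           # number of up moves at maturity
  ST <- S * u^j * d^(n - j)           # terminal prices
  payoff <- pmax(ST - K, 0)
  # Discounted expected payoff under the binomial probabilities
  exp(-r * mat) * sum(dbinom(j, n, p) * payoff)
}

# Black-Scholes price of the same call, for comparison
bs_call <- function(S, K, r, sigma, mat) {
  d1 <- (log(S / K) + (r + sigma^2 / 2) * mat) / (sigma * sqrt(mat))
  d2 <- d1 - sigma * sqrt(mat)
  S * pnorm(d1) - K * exp(-r * mat) * pnorm(d2)
}

tree_price <- crr_call(S = 100, K = 100, r = 0.05, sigma = 0.2, mat = 1, n = 500)
bs_price   <- bs_call(S = 100, K = 100, r = 0.05, sigma = 0.2, mat = 1)
print(c(tree = tree_price, black_scholes = bs_price))
```

With n = 1 the function reduces to the one-step formula for BV given above.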
Greeks Approximation – Taylor Development
$C(S + dS) \approx C(S) + \Delta\,dS + \frac{1}{2}\gamma\,dS^2 + \frac{1}{6}\,Speed\,dS^3 + \frac{1}{24}\,Greek_{4th}\,dS^4 + \dots$
The Value at Risk
Estimate, at a given confidence level (usually 95% or 99%), the worst possible loss. In other words, the point is to identify a particular point in the left tail of the distribution
3 Methods
• Historical
• Parametric
• Monte Carlo
For now, we’ll focus on VaR on one linear asset… FCOJ is back!
Historical VaR
• No assumption about the distribution
• Easy to implement and calculate
• Sensitive to the length of the history
• Sensitive to very extreme values
Let’s get back to our FCOJ time series; the last price is 150 cents
If we work on returns, we’ve seen the use of the PERCENTILE Excel function
• 1% percentile is -5.22%, so the 99% Historical Daily VaR is -7.83 cents
• 5% percentile is -3.34%, so the 95% Historical Daily VaR is -5.00 cents
Works as well on weekly, monthly, quarterly series
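In R, quantile() plays the role of Excel's PERCENTILE. The sketch below uses simulated returns as a stand-in for the FCOJ series, so the figures it prints will differ from those on the slide; the simulation parameters are assumptions.

```r
set.seed(123)

# Simulated daily returns standing in for the FCOJ history
returns    <- rnorm(1000, mean = 0.001, sd = 0.02)
last_price <- 150   # in cents, as in the slides

# 1% and 5% percentiles of the return distribution
pct <- quantile(returns, probs = c(0.01, 0.05))

# Historical daily VaR in cents at 99% and 95%
hist_var <- last_price * pct

print(round(pct, 4))
print(round(hist_var, 2))
```

Passing weekly or monthly returns instead gives the VaR over the corresponding horizon.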
Historical VaR
It can also be computed on price variations instead of returns, but the result then depends on the price level. So be careful about the bias!
• 1% percentile in terms of price movement is -8.11 cents
• 5% percentile in terms of price movement is -4.14 cents
Parametric VaR
• Easy to implement and calculate
• Assumes a particular shape of the distribution
• Not really sensitive to fat tails
We already know:
$P(X \le \mu - 1.645\sigma) = 0.05$
$P(X \le \mu - 2.326\sigma) = 0.01$
FCOJ Mean Return: 0.1364%
FCOJ SD: 2.1664%
Then:
$P(X \le -3.43\%) = 0.05$, VaR 95% (-5.15 cents)
$P(X \le -4.90\%) = 0.01$, VaR 99% (-7.35 cents)
Parametric VaR
Very often you assume a 0 mean anyway, therefore:
$P(X \le -3.56\%) = 0.05$, VaR 95% (-5.35 cents)
$P(X \le -5.04\%) = 0.01$, VaR 99% (-7.56 cents)
At 99%, a smaller loss than the historical VaR (-7.83 cents)
Problem with leptokurtic distributions: the method barely reflects the impact of fat tails
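The parametric figures can be recomputed with qnorm(). This sketch takes the mean, standard deviation and last price quoted above; small differences from the slide values come from rounding the percentages before multiplying.

```r
mu         <- 0.001364   # FCOJ mean daily return (0.1364%)
sigma      <- 0.021664   # FCOJ daily standard deviation (2.1664%)
last_price <- 150        # in cents

alpha <- c(0.05, 0.01)   # tail probabilities for the 95% and 99% VaR

# VaR in cents, with the estimated mean and with a zero mean
var_with_mean <- last_price * (mu + qnorm(alpha) * sigma)
var_zero_mean <- last_price * (qnorm(alpha) * sigma)

print(round(var_with_mean, 2))
print(round(var_zero_mean, 2))
```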
Monte Carlo VaR
Based on an assumption of a price process (for example GBM)
• Most efficient method when assets aren’t linear
• Tough to implement
• Assumes a particular shape of the distribution
A great number of random simulations of the price process are run to build a distribution and read off the VaR
Monte Carlo VaR
Let’s simulate 10,000 GBM paths of 252 steps each and store the final value of each path
library(sde)   # provides GBM()
FCOJ<-read.csv(file="C:/Users/Vinz/Desktop/FCOJStats.csv",header=FALSE,sep=",")
Drift<-mean(FCOJ$V1)   # daily mean return
Volat<-sd(FCOJ$V1)     # daily volatility
nbsim<-252             # number of time steps per path
Spot<-150
Final<-rep(1,10000)
for(i in 1:10000){     # same length as Final
Matr<-GBM(x=Spot,r=Drift, sigma=Volat,N=nbsim)
Final[i]<-Matr[nbsim+1]}
quantile(Final, 0.05)
quantile(Final, 0.01)
Don’t be fooled by the 252: we’re still making a daily simulation. What should change in the code to make it yearly?
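For readers without the sde package, the final value of a GBM path can be drawn directly from its lognormal solution, which avoids simulating the intermediate steps. This is a sketch under that substitution, not the method used on the slide; the drift and volatility are the FCOJ figures quoted in the parametric section, and the names horizon and npaths are illustrative.

```r
set.seed(2012)

drift   <- 0.001364   # daily mean return
volat   <- 0.021664   # daily volatility
spot    <- 150        # last price in cents
horizon <- 1          # horizon expressed in days
npaths  <- 10000      # number of simulated final prices

# Final price of a GBM after 'horizon' days, one draw per path
z     <- rnorm(npaths)
final <- spot * exp((drift - volat^2 / 2) * horizon + volat * sqrt(horizon) * z)

# 5% and 1% quantiles of the simulated prices and the matching VaR in cents
q      <- quantile(final, c(0.05, 0.01))
mc_var <- q - spot

print(round(q, 2))
print(round(mc_var, 2))
```

Setting drift to 0 reproduces the variant without drift.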
Monte Carlo VaR
> quantile(Final, 0.05)
    5%
144.93
> quantile(Final, 0.01)
      1%
142.7941
• 95% Daily VaR is -5.07 cents
• 99% Daily VaR is -7.21 cents
Let’s take off the drift
Monte Carlo VaR
> quantile(Final, 0.05)
      5%
144.7583
> quantile(Final, 0.01)
      1%
142.6412
• 95% Daily VaR is -5.24 cents
• 99% Daily VaR is -7.36 cents
Comparison
Which is the best?
Going forward on the VaR
All methods give different but coherent values
Easy? Yes but…
• We’ve involved one asset only
• We’ve involved a linear asset
What about an option?
What about 2 assets?
Going forward on the VaR
Portfolio scale: what to look at to calculate the VaR?
Big question, is the VaR additive?
NO! Keywords for the future: covariance, correlation, diversification
Going forward on the VaR
Options: what to look at to calculate the VaR?
4 risk factors:
• Underlying price
• Interest rate
• Volatility
• Time
4 answers:
• Delta/Gamma approximation knowing the distribution of the underlying
• Rho approximation knowing the distribution of the underlying rate
• Vega approximation knowing the distribution of implied volatility
• Theta (time decay)
Yes but… do the underlying price, rate and volatility vary independently?
Might be a bit more complicated than expected…
OLS & Exploration
Linear regression model
OLS: Ordinary Least Squares
Minimize the sum of the squared vertical distances between the observations and the linear approximation
$y = f(x) = ax + b$
Residual ε
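Two parameters to estimate:
• Intercept α (written b in the equations)
• Slope β (written a in the equations)
Minimising residuals
$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$
When is E minimal?
When the partial derivatives with respect to a and b are 0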
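$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 = \sum_{i=1}^{n} (y_i - a x_i - b)^2$
Quick high school reminder if necessary…
$(y_i - a x_i - b)^2 = y_i^2 - 2a x_i y_i - 2b y_i + a^2 x_i^2 + 2ab x_i + b^2$
Derivative with respect to a:
$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2 + 2b x_i\right) = 0$
$\sum_{i=1}^{n} \left(-x_i y_i + a x_i^2 + b x_i\right) = 0$
$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$
Derivative with respect to b:
$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2b + 2a x_i\right) = 0$
$\sum_{i=1}^{n} \left(-y_i + b + a x_i\right) = 0$
$a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$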
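From $\frac{\partial E}{\partial b} = 0$:
$a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$
Leads easily to the intercept
$a n \bar{x} + n b = n \bar{y}$
$a \bar{x} + b = \bar{y}$
$b = \bar{y} - a \bar{x}$
The regression line goes through $(\bar{x}, \bar{y})$
The distance from this point to the line is indeed 0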
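With $b = \bar{y} - a\bar{x}$, the line is $y = ax + \bar{y} - a\bar{x}$, i.e. $y - \bar{y} = a(x - \bar{x})$
From the derivative with respect to a:
$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2 + 2b x_i\right) = 0$
$\sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0$
$\sum_{i=1}^{n} x_i (y_i - a x_i - \bar{y} + a\bar{x}) = 0$
$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$
From the derivative with respect to b:
$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2b + 2a x_i\right) = 0$
$\sum_{i=1}^{n} (y_i - b - a x_i) = 0$
$\sum_{i=1}^{n} (y_i - \bar{y} + a\bar{x} - a x_i) = 0$
$\sum_{i=1}^{n} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$
$\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$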
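We have
$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$
and
$\bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$
so
$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = \bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right)$
$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) - \bar{x} \sum_{i=1}^{n} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$
$\sum_{i=1}^{n} (x_i - \bar{x}) \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$
Finally…
$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$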
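$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
The numerator is n times the covariance and the denominator is n times the variance:
$a = \frac{Cov_{xy}}{\sigma_x^2}$
$b = \bar{y} - a\bar{x}$
You can use the Excel functions INTERCEPT and SLOPE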
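The closed-form slope and intercept can be compared with the output of lm(). The data below are simulated purely for illustration; the coefficients used to generate them are assumptions.

```r
set.seed(7)

# Simulated x and y with a known linear relation plus noise
x <- rnorm(200, mean = 0, sd = 0.015)
y <- 0.0005 + 1.1 * x + rnorm(200, mean = 0, sd = 0.010)

# Closed-form OLS estimates from the derivation
a <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b <- mean(y) - a * mean(x)

# Same slope written as covariance over variance
a_cov <- cov(x, y) / var(x)

# Estimates from R's linear model function
fit <- coef(lm(y ~ x))

print(c(a = a, b = b))
print(fit)
```

The ratio cov(x, y) / var(x) is unaffected by the n-1 divisor R uses, since it cancels between numerator and denominator.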
Calculate the Variances and Covariance of X{1,2,3,3,1,2} and Y{2,3,1,1,3,2}
You can use Excel function VAR.P, COVARIANCE.P and STDEV.P
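The same exercise can be sketched in R. Note that R's var(), cov() and sd() divide by n-1, whereas the Excel .P functions divide by n, so small helper functions are defined here; their names var_p and cov_p are hypothetical.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)

# Population versions (divide by n), matching VAR.P and COVARIANCE.P
var_p <- function(v) mean((v - mean(v))^2)
cov_p <- function(u, v) mean((u - mean(u)) * (v - mean(v)))

var_x  <- var_p(X)
var_y  <- var_p(Y)
cov_xy <- cov_p(X, Y)
sd_x   <- sqrt(var_x)   # matches STDEV.P
sd_y   <- sqrt(var_y)

# Correlation coefficient from these quantities
r_xy <- cov_xy / (sd_x * sd_y)

print(c(var_x = var_x, var_y = var_y, cov_xy = cov_xy, r = r_xy))
```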
Let’s assess the quality of the regression
Let’s calculate the correlation coefficient (aka Pearson Product-Moment Correlation Coefficient – PPMCC):
$r = \frac{Cov_{xy}}{\sigma_x \sigma_y}$, a value between -1 and 1
$|r| = 1$: perfect linear dependence
$r \approx 0$: no linear dependence
Gives an idea of the dispersion of the scatterplot
You can use the Excel function CORREL
R=0.96
High quality
R=0.62
Poor quality
What is good quality?
Slightly discretionary…
If $|r| \ge \frac{\sqrt{3}}{2} = 0.866\ldots$
It’s largely admitted as the threshold for acceptable / poor
The regression itself introduces a bias
Let’s introduce the coefficient of determination R-Squared
Total Dispersion = Dispersion Regression + Dispersion Residual
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}}$
In other words the part of the total dispersion explained by the regression
You can use the Excel function RSQ
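The decomposition can be checked numerically. The sketch below uses simulated data, with generating coefficients chosen only for illustration, and compares the ratio with the squared correlation and with the value reported by lm().

```r
set.seed(3)

# Simulated data with a linear relation plus noise
x <- runif(100, min = 0, max = 10)
y <- 3 + 0.5 * x + rnorm(100)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

# The three dispersions
ss_total <- sum((y - mean(y))^2)
ss_reg   <- sum((y_hat - mean(y))^2)
ss_resid <- sum((y - y_hat)^2)

# Coefficient of determination
r_squared <- ss_reg / ss_total

print(c(total = ss_total, regression = ss_reg, residual = ss_resid))
print(c(r_squared = r_squared, cor_squared = cor(x, y)^2))
```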
In a simple linear regression with intercept, $R^2 = r^2$
Is a good correlation coefficient and a good coefficient of determination enough to accept the regression?
Not necessarily!
Residuals must show no remaining structure; in other words, they must be white noise!
For every dataset of the quartet:
$\bar{x} = 9$
$\bar{y} = 7.5$
$y = 3 + 0.5x$
$r = 0.82$
$R^2 = 0.67$
Don’t get fooled by numbers!
Can you say at this stage which regression is the best?
Certainly not those on the right: you need a LINEAR dependence
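The figures on this slide match Anscombe's quartet, which ships with R as the anscombe data frame. Assuming that is the dataset shown, the identical summary statistics of the four series can be sketched as follows.

```r
data(anscombe)   # columns x1..x4 and y1..y4

# One row of summary statistics per dataset of the quartet
stats <- t(sapply(1:4, function(i) {
  x   <- anscombe[[paste0("x", i)]]
  y   <- anscombe[[paste0("y", i)]]
  fit <- coef(lm(y ~ x))
  c(mean_x = mean(x), mean_y = mean(y),
    intercept = fit[[1]], slope = fit[[2]],
    r = cor(x, y), r_squared = cor(x, y)^2)
}))

print(round(stats, 2))
```

Plotting each pair, for example plot(anscombe$x2, anscombe$y2), shows how different the four scatterplots are despite the matching numbers.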
Is any linear regression useless then?
Think about what you could do to the series
Polynomial transformation, log transformation,…
Otherwise, nonlinear regressions, but that’s another story
First application on financial market
S&P / AmEx in 2011
$R_{AmEx} = 0.06\% + 1.1046 \cdot R_{S\&P}$
$r = \frac{Cov_{AmEx,S\&P}}{\sigma_{AmEx}\,\sigma_{S\&P}} = 0.8501$
$R^2 = r^2 = 0.7227$
Oops :-o
Is Excel wrong?
R-Squared has different calculation methods
Let’s accept the following regression then, as the quality seems pretty good
How to use this?
• Forecasting? Not really… Both are random variables
• Hedging? Yes, but mind the basis risk, and be careful with the residuals…
Let’s have a try!
In theory, what is the daily result of the hedge? The intercept (0.06%), plus the residual
Hedging $1.0M of AmEx Stocks with $1.1046M of S&P
It would have been too easy… Great differences… Why?
Sensitivity to the size of the sample
Heteroscedasticity
Basis Risk
The purpose was to see if the market has an effect on a particular stock
The dependence is obvious but the residuals are too volatile for any stable application
But attention!
We are looking for causation, not correlation!
Causation implies correlation
The converse is not true!
DON’T BE FOOLED BY PRETTY NUMBERS
Let’s prove this…
Perfect linear dependence
Excellent R-Squared
Residuals are a white noise
What’s the problem then?
Do you really think fresh lemon reduces car fatalities?
Conclusion
R
Normal Distribution
VaR
OLS