WELCOME TO MY PRESENTATION ON STATISTICAL DISTANCE

Different kind of distance and Statistical Distance


Page 1: Different kind of distance and Statistical Distance

WELCOME TO MY PRESENTATION

ON STATISTICAL DISTANCE

Page 2: Different kind of distance and Statistical Distance

Md. Menhazul Abedin
M.Sc. Student
Dept. of Statistics
Rajshahi University
Mob: 01751385142

Email: [email protected]

Page 3: Different kind of distance and Statistical Distance

Objectives

• To know the meaning of statistical distance, and its relation to and difference from general (Euclidean) distance

Page 4: Different kind of distance and Statistical Distance

Content
• Definition of Euclidean distance
• Concept & intuition of statistical distance
• Definition of statistical distance
• Necessity of statistical distance
• Concept of Mahalanobis distance (population & sample)
• Distribution of Mahalanobis distance
• Mahalanobis distance in R
• Acknowledgement

Page 5: Different kind of distance and Statistical Distance

Euclidean Distance from origin

[Figure: the point (X, Y) and the origin (0,0) plotted on axes X and Y]

Page 6: Different kind of distance and Statistical Distance

Euclidean Distance

For the point P(X,Y) and the origin O(0,0), by Pythagoras $d(O,P) = \sqrt{X^2 + Y^2}$

Page 7: Different kind of distance and Statistical Distance

Euclidean Distance

Specific point

Page 8: Different kind of distance and Statistical Distance
Page 9: Different kind of distance and Statistical Distance

We see two specific points in each picture.

Our problem is to determine the length between the two points.

But how?

Assume that the pictures are placed in two-dimensional space and the points are joined by a straight line.

Page 10: Different kind of distance and Statistical Distance

Let the 1st point be $(x_1, y_1)$ and the 2nd point be $(x_2, y_2)$; then the distance is

$$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

What happens when the dimension is three?

Page 11: Different kind of distance and Statistical Distance

Distance in three dimensions

Page 12: Different kind of distance and Statistical Distance

• Points are (x1,x2,x3) and (y1,y2,y3)

Distance is given by

$$d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$$

Page 13: Different kind of distance and Statistical Distance

For p dimensions it can be written as the following expression, named the Euclidean distance:

$$P = (x_1, x_2, \ldots, x_p), \quad Q = (y_1, y_2, \ldots, y_p)$$

$$d(P,Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}$$

Page 14: Different kind of distance and Statistical Distance

05/01/2023 14

Properties of Euclidean Distance and Mathematical Distance

• The usual human concept of distance is Euclidean distance.
• Each coordinate contributes equally to the distance.

$$P = (x_1, x_2, \ldots, x_p), \quad Q = (y_1, y_2, \ldots, y_p)$$

$$d(P,Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}$$

Mathematicians, generalizing its three properties,

1) d(P,Q) = d(Q,P),

2) d(P,Q) = 0 if and only if P = Q, and

3) d(P,Q) ≤ d(P,R) + d(R,Q) for all R,

define distance on any set.

Page 15: Different kind of distance and Statistical Distance

[Figure: points P(X1,Y1), Q(X2,Y2) and R(Z1,Z2)]

Page 16: Different kind of distance and Statistical Distance

Taxicab Distance: Notion

• Red: Manhattan distance.
• Green: diagonal, straight-line distance.
• Blue, yellow: equivalent Manhattan distances.

Page 17: Different kind of distance and Statistical Distance

• The Manhattan distance is the simple sum of the horizontal and vertical components, whereas

the diagonal distance might be computed by applying the Pythagorean Theorem .

Page 18: Different kind of distance and Statistical Distance

• Red: Manhattan distance.
• Green: diagonal, straight-line distance.
• Blue, yellow: equivalent Manhattan distances.

Page 19: Different kind of distance and Statistical Distance

• Manhattan distance: 12 units

• Diagonal (straight-line, Euclidean) distance: $\sqrt{6^2 + 6^2} = 6\sqrt{2} \approx 8.49$ units

We observe that the Euclidean distance is less than the Manhattan distance.

Page 20: Different kind of distance and Statistical Distance

Taxicab/Manhattan distance :Definition

[Figure: points (p1,p2) and (q1,q2) joined by a horizontal leg of length │p1-q1│ and a vertical leg of length │p2-q2│]

Page 21: Different kind of distance and Statistical Distance

Manhattan Distance

• The taxicab distance between (p1,p2) and (q1,q2) is │p1-q1│+│p2-q2│
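The two definitions can be compared numerically. The sketch below is illustrative only: the helper names `manhattan_dist` and `euclidean_dist` are made up for this example, and the points (0,0) and (6,6) are an assumption standing in for opposite corners of a 6-by-6 street grid.

```r
# Hypothetical helpers (not from the slides); points are plain numeric vectors.
manhattan_dist <- function(p, q) sum(abs(p - q))        # |p1-q1| + |p2-q2| + ...
euclidean_dist <- function(p, q) sqrt(sum((p - q)^2))   # straight-line distance

p <- c(0, 0)   # assumed start corner
q <- c(6, 6)   # assumed opposite corner, 6 blocks across and 6 blocks up

manhattan_dist(p, q)   # 12
euclidean_dist(p, q)   # 6 * sqrt(2), about 8.49
```

Base R gives the same numbers through `dist(rbind(p, q), method = "manhattan")` and `dist(rbind(p, q))`.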

Page 22: Different kind of distance and Statistical Distance

Relationship between Manhattan & Euclidean distance.

[Figure: street grid with a 7-block route and a 6-block route marked]

Page 23: Different kind of distance and Statistical Distance

Relationship between Manhattan & Euclidean distance.

• It now seems that the distance from A to C is 7 blocks, while the distance from A to B is 6 blocks.

• Unless we choose to go off-road, B is now closer to A than C.

• Taxicab distance is sometimes equal to Euclidean distance, but otherwise it is greater than Euclidean distance.

Euclidean distance ≤ Taxicab distance

Is it always true? Does it hold in n dimensions?

Page 24: Different kind of distance and Statistical Distance

Proof……..

Absolute values guarantee non-negative value

Addition property of inequality

Page 25: Different kind of distance and Statistical Distance

Continued………..

Page 26: Different kind of distance and Statistical Distance

Continued………..

Page 27: Different kind of distance and Statistical Distance

For high dimension

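• It holds for the high-dimensional case.
• $\sum_i |x_i - y_i|^2 \le \sum_i |x_i - y_i|^2 + 2\sum_{i<j} |x_i - y_i|\,|x_j - y_j| = \left(\sum_i |x_i - y_i|\right)^2$

Which implies $\sqrt{\sum_i (x_i - y_i)^2} \le \sum_i |x_i - y_i|$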
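For two dimensions the same argument can be written out in full. This is a sketch, with a = x1 - y1 and b = x2 - y2 as shorthand introduced here rather than taken from the slides:

```latex
\begin{align*}
(|a| + |b|)^2 &= a^2 + 2|a||b| + b^2 \\
              &\ge a^2 + b^2 && \text{since } 2|a||b| \ge 0 \\
\Rightarrow\quad |a| + |b| &\ge \sqrt{a^2 + b^2}
\end{align*}
```

Equality holds exactly when a = 0 or b = 0, that is, when the two points differ in only one coordinate.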

Page 28: Different kind of distance and Statistical Distance


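Statistical Distance
• Weight coordinates subject to a great deal of variability less heavily than those that are not highly variable.

[Figure: scatter of a data set around the origin, with two points at the same distance from the origin. Which point would be nearer to the data set if it were a data point?]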

Page 29: Different kind of distance and Statistical Distance

• Here the variability along the x1 axis > the variability along the x2 axis.

Is the same distance from the origin equally meaningful in both directions? Ans: No.

But how do we take the different variability into account? Ans: Give different weights to the axes.

Page 30: Different kind of distance and Statistical Distance


Statistical Distance for Uncorrelated Data

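$$P = (x_1, x_2), \quad O = (0, 0)$$

Standardization:

$$x_1^* = x_1 / \sqrt{s_{11}}, \quad x_2^* = x_2 / \sqrt{s_{22}}$$

$$d(O,P) = \sqrt{(x_1^*)^2 + (x_2^*)^2} = \sqrt{\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}}}$$

The weights are $1/s_{11}$ and $1/s_{22}$.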
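A small numerical sketch of this weighting follows. The function name `stat_dist_origin` and the variances 4 and 1 are invented for illustration; they are not taken from the slides.

```r
# Sketch: statistical distance from the origin for uncorrelated coordinates.
# x holds the coordinates, s the corresponding sample variances (s11, s22, ...).
stat_dist_origin <- function(x, s) sqrt(sum(x^2 / s))

s  <- c(4, 1)    # assumed variances: x1 varies more than x2
p1 <- c(2, 0)    # 2 units out along the high-variance axis
p2 <- c(0, 2)    # 2 units out along the low-variance axis

sqrt(sum(p1^2))            # Euclidean distance of p1: 2
sqrt(sum(p2^2))            # Euclidean distance of p2: 2
stat_dist_origin(p1, s)    # 1
stat_dist_origin(p2, s)    # 2
```

Both points are equally far from the origin in the Euclidean sense, but the point lying along the highly variable axis is statistically closer.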

Page 31: Different kind of distance and Statistical Distance

All points that have coordinates (x1,x2) and are a constant squared distance $c^2$ from the origin must satisfy

$$\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}} = c^2$$

But how do we choose c? That is a problem. Choose c so that 95% of the observations fall in this area.

Page 32: Different kind of distance and Statistical Distance


Ellipse of Constant Statistical Distance for Uncorrelated Data

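[Figure: ellipse centred at 0 on axes x1 and x2, crossing the x1 axis at $\pm c\sqrt{s_{11}}$ and the x2 axis at $\pm c\sqrt{s_{22}}$]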

Page 33: Different kind of distance and Statistical Distance

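• This expression can be generalized to the statistical distance from an arbitrary point P=(x1,x2) to any fixed point Q=(y1,y2):

$$d(P,Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}}}$$

For p dimensions:

$$d(P,Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}} + \cdots + \frac{(x_p - y_p)^2}{s_{pp}}}$$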

Page 34: Different kind of distance and Statistical Distance

Remark:
1) The distance of P to the origin O is obtained by setting all $y_i = 0$.
2) If all $s_{ii}$ are equal, the Euclidean distance formula is appropriate.

Page 35: Different kind of distance and Statistical Distance

Scatter Plot for Correlated Measurements

Page 36: Different kind of distance and Statistical Distance

• How do you measure the statistical distance for the above data set?

• Ans: First make it uncorrelated.

• But why and how?

• Ans: Rotate the axes keeping the origin fixed.

Page 37: Different kind of distance and Statistical Distance


Scatter Plot for Correlated Measurements

Page 38: Different kind of distance and Statistical Distance

Rotation of axes keeping origin fixed

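[Figure: axes x1, x2 rotated through the angle θ about the origin O to give new axes x̃1, x̃2; the point P(x1,x2) and the construction points M, N, Q, R are marked]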

Page 39: Different kind of distance and Statistical Distance

$x_1 = OM = OR - MR = \tilde{x}_1\cos\theta - \tilde{x}_2\sin\theta$ …. (i)
$x_2 = MP = QR + NP = \tilde{x}_1\sin\theta + \tilde{x}_2\cos\theta$ ……….(ii)

Page 40: Different kind of distance and Statistical Distance

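• The solution of the above equations:

$$\tilde{x}_1 = x_1\cos\theta + x_2\sin\theta, \quad \tilde{x}_2 = -x_1\sin\theta + x_2\cos\theta$$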

Page 41: Different kind of distance and Statistical Distance

Choice of θ

What will you choose? How will you do it?

Data matrix → Centred data matrix → Covariance of data matrix → Eigenvectors

θ = angle between the 1st eigenvector and [1,0], or the angle between the 2nd eigenvector and [0,1]

Page 42: Different kind of distance and Statistical Distance

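Why the angle between the 1st eigenvector and [1,0], or the angle between the 2nd eigenvector and [0,1]?
Ans: Let B be a (p by p) positive definite matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ and associated normalized eigenvectors $e_1, e_2, \ldots, e_p$. Then

$$\max_{x \ne 0} \frac{x'Bx}{x'x} = \lambda_1 \quad \text{attained when } x = e_1$$

$$\min_{x \ne 0} \frac{x'Bx}{x'x} = \lambda_p \quad \text{attained when } x = e_p$$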

Page 43: Different kind of distance and Statistical Distance

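$$\max_{x \perp e_1, \ldots, e_k} \frac{x'Bx}{x'x} = \lambda_{k+1} \quad \text{attained when } x = e_{k+1},\ k = 1, 2, \ldots, p-1$$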

Page 44: Different kind of distance and Statistical Distance

Choice of θ
#### Exercise 16, page 309: heights in inches (x) & weights in pounds (y).
#### An Introduction to Statistics and Probability, M. Nurul Islam
x=c(60,60,60,60,62,62,62,64,64,64,66,66,66,66,68,68,68,70,70,70);x
y=c(115,120,130,125,130,140,120,135,130,145,135,170,140,155,150,160,175,180,160,175);y
############
plot(x,y)

Page 45: Different kind of distance and Statistical Distance

data=data.frame(x,y);data
as.matrix(data)
colMeans(data)
xmv=c(rep(64.8,20));xmv ### x mean vector
ymv=c(rep(144.5,20));ymv ### y mean vector
meanmatrix=cbind(xmv,ymv);meanmatrix
cdata=data-meanmatrix;cdata ### mean centred data
plot(cdata)
abline(h=0,v=0)

cor(cdata)

Page 46: Different kind of distance and Statistical Distance

• ##################

cov(cdata)

eigen(cov( cdata))

xx1=c(1,0);xx1

xx2=c(0,1);xx2

vv1=eigen(cov(cdata))$vectors[,1];vv1

vv2=eigen(cov(cdata))$vectors[,2];vv2

Page 47: Different kind of distance and Statistical Distance

################
theta = acos( sum(xx1*vv1) / ( sqrt(sum(xx1 * xx1)) * sqrt(sum(vv1 * vv1)) ) );theta

theta = acos( sum(xx2*vv2) / ( sqrt(sum(xx2 * xx2)) * sqrt(sum(vv2 * vv2)) ) );theta

###############
xx=cdata[,1]*cos( 1.41784)+cdata[,2]*sin( 1.41784);xx
yy=-cdata[,1]*sin( 1.41784)+cdata[,2]*cos( 1.41784);yy
plot(xx,yy)
abline(h=0,v=0)

Page 48: Different kind of distance and Statistical Distance

V=eigen(cov(cdata))$vectors;V
tdata=as.matrix(cdata)%*%V;tdata ### transformed data
cov(tdata)
round(cov(tdata),14)
cor(tdata)
plot(tdata)
abline(h=0,v=0)
round(cor(tdata),16)

Page 49: Different kind of distance and Statistical Distance

• ################ comparison of both method ############

comparison=tdata - as.matrix(cbind(xx,yy));comparison
round(comparison,4)

Page 50: Different kind of distance and Statistical Distance

########### using package. md from original data #####

md=mahalanobis(data,colMeans(data),cov(data),inverted =F);md ## md =mahalanobis distance

######## mahalanobis distance from transformed data ########
tmd=mahalanobis(tdata,colMeans(tdata),cov(tdata),inverted =F);tmd

###### comparison ############
md-tmd

Page 51: Different kind of distance and Statistical Distance

Mahalanobis distance: Manually
mu=colMeans(tdata);mu
incov=solve(cov(tdata));incov
md1=t(tdata[1,]-mu)%*%incov%*%(tdata[1,]-mu);md1
md2=t(tdata[2,]-mu)%*%incov%*%(tdata[2,]-mu);md2
md3=t(tdata[3,]-mu)%*%incov%*%(tdata[3,]-mu);md3
............. ……………. …………..
md20=t(tdata[20,]-mu)%*%incov%*%(tdata[20,]-mu);md20
The md values from the package and from the manual calculation are equal.

Page 52: Different kind of distance and Statistical Distance

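tdata
s1=sd(tdata[,1]);s1
s2=sd(tdata[,2]);s2
xstar=c(tdata[,1])/s1;xstar
ystar=c(tdata[,2])/s2;ystar

md1=sqrt((-1.46787309)^2 + (0.1484462)^2);md1
md2=sqrt((-1.22516896 )^2 + ( 0.6020111 )^2);md2
………. ………… ……………..
Not equal to the distances above. Why? mahalanobis() and the manual quadratic form return the squared distance, so these values must be squared before comparing. The mean is already taken into account, because tdata is built from mean-centred data.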
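The relation between the standardized rotated coordinates and `mahalanobis()` can be checked in one pass. The sketch below reuses the height and weight data from the exercise, but the centring via `scale()` and the names `zstar` and `d2` are choices made here, not the slides' own code.

```r
# Heights (x) and weights (y) from the exercise above
x <- c(60,60,60,60,62,62,62,64,64,64,66,66,66,66,68,68,68,70,70,70)
y <- c(115,120,130,125,130,140,120,135,130,145,135,170,140,155,150,160,175,180,160,175)
data <- cbind(x, y)

cdata <- scale(data, center = TRUE, scale = FALSE)   # mean-centred data
tdata <- cdata %*% eigen(cov(cdata))$vectors         # rotated, uncorrelated coordinates

# Divide each rotated coordinate by its standard deviation
zstar <- scale(tdata, center = FALSE, scale = apply(tdata, 2, sd))

d2 <- rowSums(zstar^2)                               # xstar^2 + ystar^2 per observation
md <- mahalanobis(data, colMeans(data), cov(data))   # squared Mahalanobis distances

all.equal(unname(d2), unname(md))                    # TRUE
```

Taking `sqrt(d2)` gives the distance itself, which is why a value computed with `sqrt()` differs from the `mahalanobis()` output.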

Page 53: Different kind of distance and Statistical Distance


Statistical Distance under Rotated Coordinate System

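$$O = (0, 0), \quad P = (\tilde{x}_1, \tilde{x}_2)$$

$$d(O,P) = \sqrt{\frac{\tilde{x}_1^2}{\tilde{s}_{11}} + \frac{\tilde{x}_2^2}{\tilde{s}_{22}}}$$

$$\tilde{x}_1 = x_1\cos\theta + x_2\sin\theta, \quad \tilde{x}_2 = -x_1\sin\theta + x_2\cos\theta$$

$$d(O,P) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2}$$

$\tilde{s}_{11}$ and $\tilde{s}_{22}$ are sample variances.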

Page 54: Different kind of distance and Statistical Distance

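• After some manipulation this can be written in terms of the original variables:

$$d(O,P) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2}$$

where, with $\tilde{s}_{11}$ and $\tilde{s}_{22}$ the sample variances of the rotated coordinates,

$$a_{11} = \frac{\cos^2\theta}{\tilde{s}_{11}} + \frac{\sin^2\theta}{\tilde{s}_{22}}, \quad a_{22} = \frac{\sin^2\theta}{\tilde{s}_{11}} + \frac{\cos^2\theta}{\tilde{s}_{22}}, \quad a_{12} = \sin\theta\cos\theta\left(\frac{1}{\tilde{s}_{11}} - \frac{1}{\tilde{s}_{22}}\right)$$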

Page 55: Different kind of distance and Statistical Distance

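Proof…………
• $\tilde{s}_{11} = \operatorname{var}(\tilde{x}_1) = \operatorname{var}(x_1\cos\theta + x_2\sin\theta)$

$= \cos^2\theta\, s_{11} + 2\sin\theta\cos\theta\, s_{12} + \sin^2\theta\, s_{22}$

$\tilde{s}_{22} = \operatorname{var}(-x_1\sin\theta + x_2\cos\theta) = \sin^2\theta\, s_{11} - 2\sin\theta\cos\theta\, s_{12} + \cos^2\theta\, s_{22}$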

Page 56: Different kind of distance and Statistical Distance

Continued………….

=

Page 57: Different kind of distance and Statistical Distance

Continued………….
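The algebra can be checked numerically. The sketch below uses simulated data and an arbitrary angle, both assumptions made only for illustration; it verifies that the distance computed in rotated coordinates equals the quadratic form in the original coordinates, with coefficients a11, a12, a22 obtained by expanding the rotated coordinates.

```r
set.seed(1)
x1 <- rnorm(50, sd = 3)          # simulated, correlated measurements (illustrative)
x2 <- 0.6 * x1 + rnorm(50)
S  <- cov(cbind(x1, x2))
s11 <- S[1, 1]; s12 <- S[1, 2]; s22 <- S[2, 2]
theta <- 0.5                     # arbitrary angle; the identity holds for any theta

# Sample variances of the rotated coordinates
st11 <- cos(theta)^2 * s11 + 2 * sin(theta) * cos(theta) * s12 + sin(theta)^2 * s22
st22 <- sin(theta)^2 * s11 - 2 * sin(theta) * cos(theta) * s12 + cos(theta)^2 * s22

# Coefficients of the quadratic form in the original coordinates
a11 <- cos(theta)^2 / st11 + sin(theta)^2 / st22
a22 <- sin(theta)^2 / st11 + cos(theta)^2 / st22
a12 <- sin(theta) * cos(theta) * (1 / st11 - 1 / st22)

p   <- c(2, -1)                  # an arbitrary point P = (x1, x2)
xt1 <-  p[1] * cos(theta) + p[2] * sin(theta)
xt2 <- -p[1] * sin(theta) + p[2] * cos(theta)

d_rot  <- sqrt(xt1^2 / st11 + xt2^2 / st22)
d_orig <- sqrt(a11 * p[1]^2 + 2 * a12 * p[1] * p[2] + a22 * p[2]^2)

all.equal(d_rot, d_orig)                                        # TRUE
all.equal(var(x1 * cos(theta) + x2 * sin(theta)), st11)         # TRUE
```

In the slides θ is chosen so that the rotated coordinates are uncorrelated; the identity checked here does not depend on that choice.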

Page 58: Different kind of distance and Statistical Distance


General Statistical Distance

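$$P = (x_1, x_2, \ldots, x_p), \quad O = (0, 0, \ldots, 0), \quad Q = (y_1, y_2, \ldots, y_p)$$

$$d(O,P) = \sqrt{a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{pp}x_p^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + \cdots + 2a_{p-1,p}x_{p-1}x_p}$$

$$d(P,Q) = \sqrt{\begin{aligned} & a_{11}(x_1-y_1)^2 + a_{22}(x_2-y_2)^2 + \cdots + a_{pp}(x_p-y_p)^2 \\ & + 2a_{12}(x_1-y_1)(x_2-y_2) + 2a_{13}(x_1-y_1)(x_3-y_3) + \cdots + 2a_{p-1,p}(x_{p-1}-y_{p-1})(x_p-y_p) \end{aligned}}$$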

Page 59: Different kind of distance and Statistical Distance

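• The above distances are completely determined by the coefficients (weights) $a_{ik}$, i, k = 1, 2, …, p. These can be arranged in a rectangular array as

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{12} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{pp} \end{pmatrix}$$

This array (matrix) must be symmetric positive definite.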

Page 60: Different kind of distance and Statistical Distance

Why positive definite? Let A be a positive definite matrix. Then

A = C’C
X’AX = X’C’CX = (CX)’(CX) = Y’Y, where Y = CX

so X’AX is a sum of squares, positive for every X ≠ 0 because C is nonsingular. It obeys all the distance properties: √(X’AX) is a distance, and different A give different distances.
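One way to obtain such a C in R is the Cholesky factor. The sketch below assumes an illustrative 2-by-2 matrix A that is not taken from the slides; `chol()` returns an upper triangular C with t(C) %*% C equal to A.

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)   # symmetric positive definite (illustrative)

C <- chol(A)                 # upper triangular factor, t(C) %*% C reproduces A
x <- c(1, -2)
y <- C %*% x                 # Y = CX

q_form <- drop(t(x) %*% A %*% x)   # X'AX
q_sumsq <- sum(y^2)                # Y'Y

all.equal(t(C) %*% C, A)     # TRUE
all.equal(q_form, q_sumsq)   # TRUE, both equal 10
all(eigen(A)$values > 0)     # TRUE: all eigenvalues positive
```

`chol()` stops with an error when the matrix is not positive definite, so it also serves as a quick check of that condition.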

Page 61: Different kind of distance and Statistical Distance

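• Why positive definite matrix?
• Ans: Spectral decomposition: the spectral decomposition of a k×k symmetric matrix A is given by

$$A = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_k e_k e_k'$$

• where $(\lambda_i, e_i)$, i = 1, …, k, are the pairs of eigenvalues and normalized eigenvectors.

If A is positive definite, all $\lambda_i > 0$, so A is invertible and $A^{-1} = \sum_{i=1}^{k} \frac{1}{\lambda_i} e_i e_i'$.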
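The decomposition can be reproduced with `eigen()`. This is a sketch with an illustrative matrix A chosen here, not one from the slides.

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)   # symmetric positive definite (illustrative)

e <- eigen(A)   # for symmetric input the columns of e$vectors are orthonormal

# A = sum of lambda_i * e_i e_i'
A_rebuilt <- e$vectors %*% diag(e$values) %*% t(e$vectors)

# A^{-1} = sum of (1 / lambda_i) * e_i e_i'
A_inv <- e$vectors %*% diag(1 / e$values) %*% t(e$vectors)

all.equal(A, A_rebuilt)       # TRUE
all.equal(solve(A), A_inv)    # TRUE
```

Replacing each eigenvalue by its reciprocal is only possible when every eigenvalue is nonzero, which positive definiteness guarantees.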

Page 62: Different kind of distance and Statistical Distance

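[Figure: ellipse plotted on axes running from about 4.0 to 6.0 and from 2 to 5, with its principal axes along the eigenvectors e1 and e2 and labelled λ1 and λ2]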

Page 63: Different kind of distance and Statistical Distance

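• Suppose p=2. The squared distance from the origin is

$$d^2(O,P) = x'Ax = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2$$

By spectral decomposition

$$x'Ax = \lambda_1 (x'e_1)^2 + \lambda_2 (x'e_2)^2$$

[Figure: ellipse of constant distance on axes X1 and X2, with half-axis labels C√λ1 and C√λ2]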

Page 64: Different kind of distance and Statistical Distance

Another property is

Thus

We use this property in Mahalanobis distance

Page 65: Different kind of distance and Statistical Distance


Necessity of Statistical Distance

[Figure: cluster of points with its center of gravity and another point marked]

Page 66: Different kind of distance and Statistical Distance

• Consider the Euclidean distances from the point Q to the point P and to the origin O.

• Obviously d(P,Q) > d(Q,O).

But P appears to be more like the points in the cluster than does the origin.

If we take into account the variability of the points in the cluster and measure distance by statistical distance, then Q will be closer to P than to O.

Page 67: Different kind of distance and Statistical Distance

Mahalanobis distance

• The Mahalanobis distance is a descriptive statistic that provides a relative measure of a data point's distance from a common point. It is a unitless measure introduced by P. C. Mahalanobis in 1936

Page 68: Different kind of distance and Statistical Distance

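Intuition of Mahalanobis Distance
• Recall the equation

$d(O,P) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2}$ => $d^2(O,P) = x'Ax$, where $x = (x_1, x_2)'$, $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}$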

Page 69: Different kind of distance and Statistical Distance

Intuition of Mahalanobis Distance

$d(O,P) = \sqrt{x'Ax}$, where $x = (x_1, x_2, \ldots, x_p)'$; $A = (a_{ik})$ is the p×p symmetric matrix of weights

Page 70: Different kind of distance and Statistical Distance

Intuition of Mahalanobis Distance

$d(P,Q) = \sqrt{(x - y)'A(x - y)}$, where $A = (a_{ik})$ is the p×p symmetric matrix of weights

Page 71: Different kind of distance and Statistical Distance

Mahalanobis Distance

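• Mahalanobis used $\Sigma^{-1}$, the inverse of the covariance matrix, instead of A.

• Thus $d^2(P,Q) = (x - y)'\Sigma^{-1}(x - y)$ ……………..(1)

• And used $\mu$ instead of y: $d^2 = (x - \mu)'\Sigma^{-1}(x - \mu)$ ………..(2)

(1) and (2) define the Mahalanobis distance.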

Page 72: Different kind of distance and Statistical Distance

Mahalanobis Distance

• The above equations are nothing but Mahalanobis Distance ……

• For example, suppose we took a single observation from a bivariate population with Variable X and Variable Y, and that our two variables had the following characteristics

Page 73: Different kind of distance and Statistical Distance

• single observation, X = 410 and Y = 400 The Mahalanobis distance for that single value as:

Page 74: Different kind of distance and Statistical Distance

• Mahalanobis distance = 1.825

Page 75: Different kind of distance and Statistical Distance

• Therefore, our single observation would have a distance of 1.825 standardized units from the mean (mean is at X = 500, Y = 500).

• If we took many such observations, graphed them and colored them according to their Mahalanobis values, we can see the elliptical Mahalanobis regions come out

Page 76: Different kind of distance and Statistical Distance

• The points are actually distributed along two primary axes:

Page 77: Different kind of distance and Statistical Distance
Page 78: Different kind of distance and Statistical Distance

If we calculate Mahalanobis distances for each of these points and shade them according to their distance value, we see clear elliptical patterns emerge:

Page 79: Different kind of distance and Statistical Distance
Page 80: Different kind of distance and Statistical Distance

• We can also draw actual ellipses at regions of constant Mahalanobis values:

68% obs

95% obs

99.7% obs

Page 81: Different kind of distance and Statistical Distance

• Which ellipse do you choose? Ans: Use the 68-95-99.7 rule.

1) About two-thirds (68%) of the points should be within 1 unit of the origin (along the axis).
2) About 95% should be within 2 units.
3) About 99.7% should be within 3 units.
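The percentages can be reproduced from the chi-square distribution. The sketch below assumes normally distributed data; the variable names are chosen here for illustration. Along a single standardized axis the squared distance follows a chi-square distribution with 1 degree of freedom, while for two variables jointly the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, so the same percentages need larger radii.

```r
k <- c(1, 2, 3)   # distances in standardized units

# One axis at a time: P(|Z| <= k) = P(Z^2 <= k^2), with Z^2 ~ chi-square(1)
p_axis <- pchisq(k^2, df = 1)
round(p_axis, 4)    # 0.6827 0.9545 0.9973

# Two variables jointly (assumed bivariate normal): D^2 ~ chi-square(2)
p_joint <- pchisq(k^2, df = 2)
round(p_joint, 4)   # 0.3935 0.8647 0.9889

# Mahalanobis radii whose ellipses enclose 68%, 95% and 99.7% for p = 2
radii <- sqrt(qchisq(c(0.68, 0.95, 0.997), df = 2))
round(radii, 2)     # 1.51 2.45 3.41
```

So the 1, 2, 3 unit reading applies along one axis, as the slide states; the ellipses enclosing 68%, 95% and 99.7% of bivariate normal points sit at somewhat larger Mahalanobis distances.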

Page 82: Different kind of distance and Statistical Distance

If normal

Page 83: Different kind of distance and Statistical Distance

Sample Mahalanobis Distance
• The sample Mahalanobis distance is made by replacing $\Sigma$ by S and $\mu$ by $\bar{X}$,
• i.e. $(X - \bar{X})' S^{-1} (X - \bar{X})$

Page 84: Different kind of distance and Statistical Distance

For sample

$(X - \bar{X})' S^{-1} (X - \bar{X})$

Distribution of Mahalanobis distance

Page 85: Different kind of distance and Statistical Distance

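Distribution of Mahalanobis distance
Let $X_1, X_2, \ldots, X_n$ be independent observations from any population with mean $\mu$ and finite (nonsingular) covariance Σ. Then $\sqrt{n}(\bar{X} - \mu)$ is approximately $N_p(0, \Sigma)$ and $n(\bar{X} - \mu)' S^{-1} (\bar{X} - \mu)$ is approximately $\chi^2_p$ for n − p large. This is nothing but the central limit theorem.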

Page 86: Different kind of distance and Statistical Distance

Mahalanobis distance in R

• ########### Mahalanobis Distance ##########

• x=rnorm(100);x

• dm=matrix(x,nrow=20,ncol=5,byrow=F);dm ##dm = data matrix

• cm=colMeans(dm);cm ## cm= column means

• cov=cov(dm);cov ##cov = covariance matrix

• incov=solve(cov);incov ##incov= inverse of covariance matrix

Page 87: Different kind of distance and Statistical Distance

Mahalanobis distance in R
• ####### MAHALANOBIS DISTANCE : MANUALLY ######

• ### Mahalanobis distance of first observation ###
• ob1=dm[1,];ob1 ## first observation
• mv1=ob1-cm;mv1 ## deviation of first observation from center of gravity
• md1=t(mv1)%*%incov%*%mv1;md1 ## Mahalanobis distance of first observation from center of gravity

Page 88: Different kind of distance and Statistical Distance

Mahalanobis distance in R
• ### Mahalanobis distance of second observation ###

• ob2=dm[2,];ob2 ## second observation
• mv2=ob2-cm;mv2 ## deviation of second observation from center of gravity
• md2=t(mv2)%*%incov%*%mv2;md2 ## Mahalanobis distance of second observation from center of gravity
................ ……………… …..……………

Page 89: Different kind of distance and Statistical Distance

Mahalanobis distance in R
………....... ……………… ……………

• ### Mahalanobis distance of 20th observation ###
• ob20=dm[20,];ob20 ## 20th observation (row 20 of the 20 x 5 data matrix)
• mv20=ob20-cm;mv20 ## deviation of 20th observation from center of gravity
• md20=t(mv20)%*%incov%*%mv20;md20 ## Mahalanobis distance of 20th observation from center of gravity

Page 90: Different kind of distance and Statistical Distance

Mahalanobis distance in R

####### MAHALANOBIS DISTANCE : PACKAGE ########

• md=mahalanobis(dm,cm,cov,inverted =F);md ## md =mahalanobis distance
• md=mahalanobis(dm,cm,cov);md
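The observation-by-observation calculation can also be done for all rows at once and compared with `mahalanobis()`. This is a sketch that regenerates its own random 20-by-5 data matrix (with a seed chosen here), so the numbers will differ from any earlier run; `md_manual` and `md_pkg` are names introduced for the example.

```r
set.seed(42)                                   # assumed seed, for reproducibility only
dm <- matrix(rnorm(100), nrow = 20, ncol = 5)  # data matrix: 20 observations, 5 variables
cm <- colMeans(dm)                             # column means (center of gravity)
incov <- solve(cov(dm))                        # inverse of the covariance matrix

dev <- sweep(dm, 2, cm)                        # subtract the column means from every row

# (x - m)' S^{-1} (x - m) for each row, without a loop
md_manual <- rowSums((dev %*% incov) * dev)

md_pkg <- mahalanobis(dm, cm, cov(dm))         # squared Mahalanobis distances

all.equal(md_manual, md_pkg)                   # TRUE
sum(md_pkg)                                    # (n - 1) * p = 19 * 5 = 95
```

Note that `mahalanobis()` returns squared distances; take `sqrt()` if the distance itself is needed.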

Page 91: Different kind of distance and Statistical Distance

Another example

• x <- matrix(rnorm(100*3), ncol = 3)

• Sx <- cov(x)

• D2 <- mahalanobis(x, colMeans(x), Sx)

Page 92: Different kind of distance and Statistical Distance

• plot(density(D2, bw = 0.5), main="Squared Mahalanobis distances, n=100, p=3")
• qqplot(qchisq(ppoints(100), df = 3), D2, main = expression("Q-Q plot of Mahalanobis" * ~D^2 * " vs. quantiles of" * ~ chi[3]^2))

• abline(0, 1, col = 'gray')
• ??mahalanobis

Page 93: Different kind of distance and Statistical Distance

Acknowledgement

Prof. Mohammad Nasser, Richard A. Johnson & Dean W. Wichern, & others

Page 94: Different kind of distance and Statistical Distance

THANK YOU ALL

Page 95: Different kind of distance and Statistical Distance

Necessity of Statistical Distance

In home Mother

In mess Female

maid

Student in mess