CORRELATION
ANALYSIS
NDIM
Introduction
Correlation is a LINEAR association between two random variables.
Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables.
When variables are dependent on time, correlation is applied.
Correlation lies between -1 and +1.
A zero correlation indicates that there is no
linear relationship between the variables
A correlation of –1 indicates a perfect negative
correlation
A correlation of +1 indicates a perfect positive
correlation
Types of Correlation
There are three ways of classifying correlation: Type 1, Type 2 and Type 3.
Type 1: Positive, Negative, No, Perfect
Positive: two related variables are such that when one increases (decreases), the other also increases (decreases).
Negative: two variables are such that when one increases (decreases), the other decreases (increases).
No correlation: both the variables are independent.
Perfect: the correlation is exactly +1 or -1.
Type 2: Linear, Non-linear
Linear: when plotted on a graph, the points tend to lie on a straight line.
Non-linear: when plotted on a graph, the points do not lie on a straight line.
Type 3: Simple, Multiple, Partial
Simple: one independent and one dependent variable.
Multiple: one dependent and more than one independent variable.
Partial: one dependent variable and more than one independent variable, but only one independent variable is considered while the other independent variables are held constant.
Methods of Studying Correlation
Scatter Diagram Method
Karl Pearson's Coefficient of Correlation Method
Spearman’s Rank Correlation Method
[Figure: two scatter plots of Symptom Index against dose in mg (0 to 250), each with a fitted line. Drug A: very good fit. Drug B: moderate fit.]
Correlation: Linear Relationships
Strong relationship = good linear fit
Points clustered closely around a line show a strong correlation. The line is then a good predictor (a good fit) of the data. The more spread out the points, the weaker the correlation and the poorer the fit. The line is a REGRESSION line (Y = bX + a).
Coefficient of Correlation
A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations
Represented by "r"
r lies between -1 and +1
r gives both the magnitude and the direction of the relationship
-1 ≤ r ≤ +1
The + and - signs indicate positive linear correlations and negative linear correlations, respectively
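$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
Shared variability of the X and Y variables is on the top.
Individual variability of the X and Y variables is on the bottom.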
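As a rough illustration of the formula for r, the sketch below computes it from the raw sums. The function name `pearson_r` is an assumption made for this example, and the data are the five (X, Y) pairs used in the regression examples later in this deck.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums (hypothetical helper, not from the slides)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Top of the formula: shared variability of X and Y.
    numerator = n * sum_xy - sum_x * sum_y
    # Bottom of the formula: individual variability of X and of Y.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
r = pearson_r(X, Y)
print(round(r, 2))  # prints 0.89
```

For these data the numerator is 5(168) - (24)(29) = 144 and the two bracketed terms are 134 and 194, so r = 144 / sqrt(134 × 194), roughly 0.89.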
Interpreting the Correlation Coefficient r
Strong correlation: r > .70 or r < -.70
Moderate correlation: r is between .30 and .70, or between -.30 and -.70
Weak correlation: r is between 0 and .30, or between 0 and -.30
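The bands above can be sketched as a small lookup. The function name `describe_correlation` is hypothetical, and treating exactly .30 and .70 as "moderate" is an assumption, since the slide does not say which band the boundary values belong to.

```python
def describe_correlation(r):
    """Label r using the strong / moderate / weak bands (illustrative only)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no linear correlation"
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:  # assumption: boundary values count as moderate
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_correlation(0.89))   # prints "strong positive"
print(describe_correlation(-0.50))  # prints "moderate negative"
```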
Spearman's rank coefficient
A method to determine correlation when the data are not available in numerical form. As an alternative, the method of rank correlation is used: the values of the two variables are converted to their ranks, and the correlation obtained from those ranks is known as rank correlation.
Computation of Rank Correlation
Spearman's rank correlation coefficient ρ can be calculated when
Actual ranks are given
Ranks are not given, but grades are given and not repeated
Ranks are not given, and grades are given and repeated
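A minimal sketch of the three cases, under stated assumptions: grades are first converted to ranks (repeated grades share the average of the positions they occupy), and ρ is then taken as the Pearson correlation of the two rank series. This is a swapped-in shortcut, not the correction-factor formula usually taught for repeated ranks. When no grade is repeated it agrees with ρ = 1 - 6Σd² / (n(n² - 1)). The function names and the two judges' scores are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values so that 1 = largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho computed as Pearson's r on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when all values in a series are tied")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores given by two judges to five candidates (no repeats).
judge_a = [86, 72, 91, 65, 78]
judge_b = [80, 75, 95, 60, 70]
rho = spearman_rho(judge_a, judge_b)
```

Here the ranks are (2, 4, 1, 5, 3) and (2, 3, 1, 5, 4), so Σd² = 2 and the untied formula gives 1 - 12/120 = 0.9, which is also what the sketch returns.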
REGRESSION ANALYSIS
Algebraic method
1. Least Squares Method:
The regression equation of X on Y is:
X = a + bY
Where,
X = Dependent variable and Y = Independent variable
The regression equation of Y on X is:
Y = a + bX
Where,
Y = Dependent variable
X = Independent variable
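Simple Linear Regression
[Figure: scatter plot of the dependent variable (y) against the independent variable (x) with a fitted straight line, marking the y intercept a, the slope b = ∆y/∆x and the error ε.]
The output of a regression is a function that predicts the dependent variable based upon values of the independent variables.
Simple regression fits a straight line to the data.
y = a + bx + ε
a = y intercept
b = slope = ∆y/∆x
ε = error, the vertical distance of a point from the line (it may be positive or negative)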
Example 1: From the following data obtain the regression equations using the method of Least Squares.
X 3 2 7 4 8
Y 6 1 8 5 9
Solution:
X     Y     XY    X²    Y²
3     6     18    9     36
2     1     2     4     1
7     8     56    49    64
4     5     20    16    25
8     9     72    64    81
ΣX = 24, ΣY = 29, ΣXY = 168, ΣX² = 142, ΣY² = 207
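The two normal equations for the regression of Y on X are:
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
Substituting the values from the table we get
29 = 5a + 24b ..... (i)
168 = 24a + 142b, or 84 = 12a + 71b ..... (ii)
Multiplying equation (i) by 12 and (ii) by 5
348 = 60a + 288b ..... (iii)
420 = 60a + 355b ..... (iv)
Subtracting (iii) from (iv) gives 72 = 67b, so b = 1.07. Substituting b = 1.07 in (i) gives
a = 0.66 and b = 1.07
(carrying the unrounded b = 72/67 through (i) gives a ≈ 0.64).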
Putting the values of a and b in the regression equation of Y on X we get
Y = 0.66 + 1.07X
Now, to find the regression equation of X on Y, the two normal equations are:
$$\sum X = na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$
Substituting the values in the equations we get
24 = 5a + 29b ..... (i)
168 = 29a + 207b ..... (ii)
Multiplying equation (i) by 29 and (ii) by 5 we get
696 = 145a + 841b
840 = 145a + 1035b
Subtracting gives 144 = 194b, so
a = 0.49 and b = 0.74
Substituting the values of a and b in the regression equation of X on Y we get
X = 0.49 + 0.74Y
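The least squares steps for both regression lines can be sketched as follows. The helper name `fit_line` is an assumption for this illustration. It solves the two normal equations directly instead of eliminating a by hand, and swapping the arguments gives the regression of X on Y.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line ys = a + b*xs (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Normal equations:
    #   sum_y  = n*a     + b*sum_x
    #   sum_xy = a*sum_x + b*sum_x2
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y - b * sum_x) / n
    return a, b

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
a_yx, b_yx = fit_line(X, Y)  # regression of Y on X
a_xy, b_xy = fit_line(Y, X)  # regression of X on Y
print(f"Y = {a_yx:.2f} + {b_yx:.2f}X")  # prints "Y = 0.64 + 1.07X"
print(f"X = {a_xy:.2f} + {b_xy:.2f}Y")  # prints "X = 0.49 + 0.74Y"
```

The sketch keeps b unrounded, so the intercept of Y on X comes out near 0.64. The 0.66 in the worked solution results from rounding b to 1.07 before substituting back.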
2. Deviation from the Arithmetic Mean method:
The calculations by the least squares method are quite cumbersome when the values of X and Y are large, so the work can be simplified by using this method. The formulas for the regression equations by this method are:
Regression Equation of X on Y:
$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$
Regression Equation of Y on X:
$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$
$$b_{xy} = \frac{\sum xy}{\sum y^2} \quad \text{and} \quad b_{yx} = \frac{\sum xy}{\sum x^2}$$
Where x = X - X̄ and y = Y - Ȳ are the deviations from the means, and b_xy and b_yx = Regression Coefficients
Example 2: From the previous data obtain the regression equations by taking deviations from the actual means of the X and Y series.
X 3 2 7 4 8
Y 6 1 8 5 9
Solution:
The means are X̄ = 24/5 = 4.8 and Ȳ = 29/5 = 5.8.
X     Y     x = X - X̄    y = Y - Ȳ    x²       y²       xy
3     6     -1.8          0.2          3.24     0.04     -0.36
2     1     -2.8          -4.8         7.84     23.04    13.44
7     8     2.2           2.2          4.84     4.84     4.84
4     5     -0.8          -0.8         0.64     0.64     0.64
8     9     3.2           3.2          10.24    10.24    10.24
ΣX = 24, ΣY = 29, Σx = 0, Σy = 0, Σx² = 26.8, Σy² = 38.8, Σxy = 28.8
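Regression Equation of X on Y is
$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$
$$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{28.8}{38.8} = 0.74$$
$$X - 4.8 = 0.74\,(Y - 5.8)$$
$$X = 0.74Y + 0.49 \qquad \text{(I)}$$
(the intercept 0.49 is obtained with the unrounded coefficient 28.8/38.8)
Regression Equation of Y on X is
$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$
$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{28.8}{26.8} = 1.07$$
$$Y - 5.8 = 1.07\,(X - 4.8)$$
$$Y = 1.07X + 0.66 \qquad \text{(II)}$$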
It can be observed that these regression equations are the same as those obtained by the direct method.
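A short sketch of the deviation-from-the-mean calculation, assuming a hypothetical helper named `regression_coefficients`. It forms x = X - X̄ and y = Y - Ȳ and then applies b_xy = Σxy/Σy² and b_yx = Σxy/Σx².

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_xy, b_yx) from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [x - mean_x for x in xs]  # x = X - mean of X
    dev_y = [y - mean_y for y in ys]  # y = Y - mean of Y
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_x2 = sum(dx * dx for dx in dev_x)
    sum_y2 = sum(dy * dy for dy in dev_y)
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("a constant series has no regression coefficient")
    return mean_x, mean_y, sum_xy / sum_y2, sum_xy / sum_x2

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
mean_x, mean_y, b_xy, b_yx = regression_coefficients(X, Y)
# Intercepts follow from the fact that each line passes through the means.
intercept_x_on_y = mean_x - b_xy * mean_y
intercept_y_on_x = mean_y - b_yx * mean_x
```

With the example data this gives b_xy ≈ 0.74 and b_yx ≈ 1.07, matching 28.8/38.8 and 28.8/26.8 from the table.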
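3. Deviation from Assumed Mean method:
When the actual means of the X and Y variables are fractions, the calculations can be simplified by taking the deviations from assumed means.
The Regression Equation of X on Y:
$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$
The Regression Equation of Y on X:
$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$
But here the values of b_xy and b_yx will be calculated by the following formulas:
$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$
$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$
where d_x and d_y are the deviations of X and Y from their assumed means.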
Example: From the data given in the previous example, calculate the regression equations by assuming 7 as the mean of the X series and 6 as the mean of the Y series.
Solution:
X     Y     dx = X - 7    dx²    dy = Y - 6    dy²    dxdy
3     6     -4            16     0             0      0
2     1     -5            25     -5            25     +25
7     8     0             0      2             4      0
4     5     -3            9      -1            1      +3
8     9     1             1      3             9      +3
ΣX = 24, ΣY = 29, Σdx = -11, Σdx² = 51, Σdy = -1, Σdy² = 39, Σdxdy = 31
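The Regression Coefficient of X on Y:
$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$
$$b_{xy} = \frac{5(31) - (-11)(-1)}{5(39) - (-1)^2}$$
$$b_{xy} = \frac{155 - 11}{195 - 1}$$
$$b_{xy} = \frac{144}{194}$$
$$b_{xy} = 0.74$$
The means are:
$$\bar{X} = \frac{\sum X}{N} = \frac{24}{5} = 4.8 \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{29}{5} = 5.8$$
The Regression Equation of X on Y:
$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$
$$X - 4.8 = 0.74\,(Y - 5.8)$$
$$X = 0.74Y + 0.49$$
(the intercept 0.49 is obtained with the unrounded coefficient 144/194)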
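The Regression Coefficient of Y on X:
$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$
$$b_{yx} = \frac{5(31) - (-11)(-1)}{5(51) - (-11)^2}$$
$$b_{yx} = \frac{155 - 11}{255 - 121}$$
$$b_{yx} = \frac{144}{134}$$
$$b_{yx} = 1.07$$
The Regression Equation of Y on X:
$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$
$$Y - 5.8 = 1.07\,(X - 4.8)$$
$$Y = 1.07X + 0.66$$
It can be observed that these regression equations are the same as those obtained by the least squares method and by deviations from the arithmetic mean.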
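The assumed-mean formulas can be checked with a sketch like the one below. The function name `coefficients_from_assumed_means` is hypothetical. It also illustrates why the method works: the regression coefficients do not depend on which assumed means are chosen, so convenient whole numbers can be used.

```python
def coefficients_from_assumed_means(xs, ys, assumed_x, assumed_y):
    """Return (b_xy, b_yx) using deviations from assumed means (illustrative)."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    # The numerator is shared by both coefficients.
    numerator = n * sum_dxdy - sum_dx * sum_dy
    b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)
    b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)
    return b_xy, b_yx

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
b_xy, b_yx = coefficients_from_assumed_means(X, Y, 7, 6)
print(round(b_xy, 2), round(b_yx, 2))  # prints 0.74 1.07

# A different choice of assumed means gives the same coefficients.
same_b_xy, same_b_yx = coefficients_from_assumed_means(X, Y, 0, 0)
```

With assumed means 7 and 6 the numerator is 5(31) - (-11)(-1) = 144 and the denominators are 194 and 134, as in the worked example.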