33851

  • 8/12/2019 33851

    1/55

    Correlation & Regression

    Dr. Moataza Mahmoud Abdel Wahab

    Lecturer of Biostatistics

    High Institute of Public Health

    University of Alexandria

    Correlation

    Finding the relationship between two quantitative variables, without being able to infer a causal relationship.

    Correlation is a statistical technique used to determine the degree to which two variables are related.

    Scatter diagram

    Rectangular coordinates

    Two quantitative variables

    One variable is called independent (X) and the second is called dependent (Y)

    Points are not joined

    No frequency table

    [Figure: sketch of a scatter diagram, with points plotted against the X and Y axes]

    Example

    Wt. (kg)       67    69    85    83    74    81    97    92   114    85
    SBP (mmHg)    120   125   140   160   130   180   150   140   200   130

    [Figure: scatter diagram of weight and systolic blood pressure; x-axis Wt (kg), 60 to 120; y-axis SBP (mmHg), 80 to 220]

    Scatter plots

    The pattern of data is indicative of the type of

    relationship between your two variables:

    positive relationship

    negative relationship

    no relationship

    Positive relationship

    [Figure: scatter plot of height in cm (y-axis, 0 to 18) against age in weeks (x-axis, 0 to 90)]

    Negative relationship

    [Figure: scatter plot of reliability (y-axis) against age of car (x-axis)]

    No relation

    Correlation Coefficient

    Statistic showing the degree of relation

    between two variables

    Simple Correlation coefficient (r)

    It is also called Pearson's correlation coefficient or the product-moment correlation coefficient.

    It measures the nature and strength of the association between two variables of the quantitative type.

    The sign of r denotes the nature of the association, while the value of r denotes the strength of the association.

    If the sign is +ve, the relation is direct (an increase in one variable is associated with an increase in the other variable, and a decrease in one variable is associated with a decrease in the other variable).

    If the sign is -ve, the relationship is inverse or indirect (an increase in one variable is associated with a decrease in the other).

    The value of r ranges between -1 and +1.

    The value of r denotes the strength of the association, as illustrated by the following diagram:

    indirect                                                      direct
    -1       -0.75           -0.25     0     0.25           0.75       +1
    | strong | intermediate  | weak    |    weak | intermediate | strong |
    perfect                       no relation                      perfect
    correlation                                                correlation

    If r = 0, there is no association or correlation between the two variables.

    If 0 < |r| < 0.25: weak correlation.

    If 0.25 ≤ |r| < 0.75: intermediate correlation.

    If 0.75 ≤ |r| < 1: strong correlation.

    If |r| = 1: perfect correlation.
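
The cut-offs above can be turned into a small lookup. The following is a minimal sketch in Python, assuming the cut-offs apply to the absolute value of r (as the symmetric diagram suggests); the function name `describe_correlation` is hypothetical and not part of the original slides.

```python
def describe_correlation(r: float) -> str:
    """Label r using the slide's cut-offs, applied to |r| (an assumption)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size < 0.25:
        strength = "weak"
    elif size < 0.75:
        strength = "intermediate"
    elif size < 1:
        strength = "strong"
    else:
        strength = "perfect"
    # The sign gives the nature of the association.
    direction = "direct" if r > 0 else "indirect"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.759))
print(describe_correlation(-0.1))
```

The sign and the magnitude are handled separately, mirroring the slides: the sign gives the nature of the association and the magnitude gives its strength.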

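    How to compute the simple correlation coefficient (r)

    $$ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} $$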

    Example:

    A sample of 6 children was selected, and data on their age in years and weight in kilograms were recorded as shown in the following table. It is required to find the correlation between age and weight.

    Serial no.   Age (years)   Weight (kg)
    1            7             12
    2            6             8
    3            8             12
    4            5             10
    5            6             11
    6            9             13

    Both variables are of the quantitative type. One variable (age) is called the independent variable and denoted X; the other (weight) is called the dependent variable and denoted Y. To find the relation between age and weight, compute the simple correlation coefficient using the following formula:

    $$ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} $$

    Serial no.   Age (years) (x)   Weight (kg) (y)   xy          x²          y²
    1            7                 12                84          49          144
    2            6                 8                 48          36          64
    3            8                 12                96          64          144
    4            5                 10                50          25          100
    5            6                 11                66          36          121
    6            9                 13                117         81          169
    Total        Σx = 41           Σy = 66           Σxy = 461   Σx² = 291   Σy² = 742

    $$ r = \frac{461 - \frac{41 \times 66}{6}}{\sqrt{\left(291 - \frac{(41)^2}{6}\right)\left(742 - \frac{(66)^2}{6}\right)}} $$

    r = 0.759

    Strong direct correlation
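
As a check on the hand calculation, here is a minimal Python sketch of the same computational formula applied to the six children. The function name `pearson_r` and the variable names are illustrative choices, not part of the original slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from the sums used in the slide's computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


age = [7, 6, 8, 5, 6, 9]            # x, years
weight = [12, 8, 12, 10, 11, 13]    # y, kg

r = pearson_r(age, weight)
print(f"r = {r:.4f}")  # close to the slide's 0.759
```

The unrounded value is slightly above the 0.759 quoted on the slide, which appears to be truncated rather than rounded.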

    Example: Relationship between Anxiety and Test Scores

    Anxiety (X)   Test score (Y)   X²          Y²          XY
    10            2                100         4           20
    8             3                64          9           24
    2             9                4           81          18
    1             7                1           49          7
    5             6                25          36          30
    6             5                36          25          30
    ΣX = 32       ΣY = 32          ΣX² = 230   ΣY² = 204   ΣXY = 129

    Calculating the Correlation Coefficient

    $$ r = \frac{(6)(129) - (32)(32)}{\sqrt{\left(6(230) - 32^2\right)\left(6(204) - 32^2\right)}} = \frac{774 - 1024}{\sqrt{(356)(200)}} = -0.94 $$

    r = -0.94

    Indirect strong correlation
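
This example uses a form of the formula in which numerator and denominator are both multiplied by n, whereas the earlier example divides by n. The sketch below, with illustrative variable names, suggests that the two forms give the same r for the anxiety data.

```python
from math import sqrt

anxiety = [10, 8, 2, 1, 5, 6]   # X
score = [2, 3, 9, 7, 6, 5]      # Y

n = len(anxiety)
sum_x, sum_y = sum(anxiety), sum(score)
sum_xy = sum(a * b for a, b in zip(anxiety, score))
sum_x2 = sum(a * a for a in anxiety)
sum_y2 = sum(b * b for b in score)

# Form used in this example: everything multiplied through by n.
r_times_n = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Form used in the earlier age and weight example: sums divided by n.
r_divided = (sum_xy - sum_x * sum_y / n) / sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
)

print(f"{r_times_n:.2f} {r_divided:.2f}")
```

Multiplying through by n keeps the intermediate values as whole numbers (774, 1024, 356, 200), which is convenient for hand calculation.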

    Spearman Rank Correlation Coefficient (rs)

    It is a non-parametric measure of correlation.

    This procedure makes use of the two sets of ranks that may be assigned to the sample values of X and Y.

    The Spearman rank correlation coefficient can be computed in the following cases:

    Both variables are quantitative.

    Both variables are qualitative ordinal.

    One variable is quantitative and the other is qualitative ordinal.

    Procedure:

    1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.

    2. Rank the values of Y from 1 to n.

    3. Compute the value of di for each pair of observations by subtracting the rank of Yi from the rank of Xi.

    4. Square each di and compute Σdi², which is the sum of the squared values.

    5. Apply the following formula:

    $$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

    The value of rs denotes the magnitude and nature of the association, and is interpreted in the same way as simple r.

    Example

    In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.

    Sample   Level of education (X)   Income (Y)
    A        Preparatory              25
    B        Primary                  10
    C        University               8
    D        Secondary                10
    E        Secondary                15
    F        Illiterate               50
    G        University               60

    Answer:

    Sample   (X)           (Y)   Rank X   Rank Y   di     di²
    A        Preparatory   25    5        3        2      4
    B        Primary       10    6        5.5      0.5    0.25
    C        University    8     1.5      7        -5.5   30.25
    D        Secondary     10    3.5      5.5      -2     4
    E        Secondary     15    3.5      4        -0.5   0.25
    F        Illiterate    50    7        2        5      25
    G        University    60    1.5      1        0.5    0.25

    Σdi² = 64

    $$ r_s = 1 - \frac{6 \times 64}{7(48)} = -0.1 $$

    Comment:

    There is an indirect weak correlation between level of education and income.
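
The ranking procedure and formula can be sketched in Python as follows. This is an illustration under stated assumptions: rank 1 goes to the highest value and tied values share their mean rank (which reproduces the ranks in the answer table), and the numeric coding of education levels in `EDUCATION_SCORE` is a hypothetical ordinal scale introduced only so the levels can be sorted.

```python
def average_ranks(values):
    """Rank from 1 (largest value) to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Return (rs, sum of squared rank differences) using the slide's formula."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    n = len(x)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


# Hypothetical ordinal coding: only the ordering of the levels matters.
EDUCATION_SCORE = {"illiterate": 0, "primary": 1, "preparatory": 2,
                   "secondary": 3, "university": 4}

education = ["preparatory", "primary", "university", "secondary",
             "secondary", "illiterate", "university"]      # samples A to G
income = [25, 10, 8, 10, 15, 50, 60]

x = [EDUCATION_SCORE[level] for level in education]
rs, sum_d2 = spearman_rs(x, income)
print(f"sum of d^2 = {sum_d2}, rs = {rs:.2f}")
```

The unrounded coefficient is about -0.14, which the slide reports as -0.1; either way it falls in the weak, indirect range.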

    exercise

    Regression Analyses

    Regression: a technique concerned with predicting some variables by knowing others.

    The process of predicting variable Y using variable X.

    Regression

    Uses a variable (x) to predict some outcome

    variable (y)

    Tells you how values in y change as a function

    of changes in values of x

    Correlation and Regression

    Correlation describes the strength of a linear relationship between two variables.

    Linear means straight line.

    Regression tells us how to draw the straight line described by the correlation.

    Regression

    Calculates the best-fit line for a certain set of data.

    The regression line makes the sum of the squares of the residuals smaller than for any other line.

    Regression minimizes residuals.

    [Figure: plot of SBP (mmHg) against Wt (kg); x-axis 60 to 120, y-axis 80 to 220]
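
The claim that the regression line has the smallest sum of squared residuals can be checked numerically. The sketch below uses the weight and SBP data from the earlier example and the standard least-squares slope and intercept formulas; the function names and the particular shifts tried are illustrative assumptions.

```python
weight = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]        # kg
sbp = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]  # mmHg


def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


a, b = least_squares_line(weight, sbp)
best = sum_squared_residuals(weight, sbp, a, b)

# Shift the intercept or slope a little: every alternative line fits worse.
others = [
    sum_squared_residuals(weight, sbp, a + da, b + db)
    for da, db in [(5, 0), (-5, 0), (0, 0.1), (0, -0.1)]
]
print(f"best = {best:.1f}; alternatives = {[round(v, 1) for v in others]}")
```

Trying a handful of nearby lines is only a spot check; the least-squares formulas guarantee the minimum over all straight lines.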

    By using the least squares method (a procedure that minimizes the vertical deviations of plotted points surrounding a straight line) we are able to construct a best-fitting straight line to the scatter diagram points and then formulate a regression equation in the form of:

    $$ \hat{y} = a + bX $$

    $$ \hat{y} = \bar{y} + b(x - \bar{x}) $$

    $$ b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$

    Regression Equation

    The regression equation describes the regression line mathematically.

    [Figure: plot of SBP (mmHg) against Wt (kg), labelled with the intercept and slope]

    Linear Equations

    Y = bX + a

    a = Y-intercept

    b = slope = change in Y / change in X

    [Figure: straight line on X and Y axes, marking the intercept a and the slope b]

    Hours studying and grades

    Regressing grades on hours

    [Figure: linear regression plot of final grade in course (y-axis, 70 to 90) against number of hours spent studying (x-axis, 2 to 10)]

    Final grade in course = 59.95 + 3.17 * study

    R-Square = 0.88

    Predicted final grade in class = 59.95 + 3.17 * (number of hours you study per week)

    Predict the final grade of:

    Someone who studies for 12 hours:

    Final grade = 59.95 + (3.17*12)

    Final grade = 97.99

    Someone who studies for 1 hour:

    Final grade = 59.95 + (3.17*1)

    Final grade = 63.12

    Predicted final grade in class = 59.95 + 3.17 * (hours of study)
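
The two predictions amount to plugging hours of study into the fitted equation. A minimal Python sketch, with hypothetical names `INTERCEPT`, `SLOPE` and `predicted_grade`:

```python
INTERCEPT = 59.95   # predicted grade at zero hours of study
SLOPE = 3.17        # grade points gained per extra hour of study


def predicted_grade(hours_of_study: float) -> float:
    """Predicted final grade from the fitted line quoted on the slide."""
    return INTERCEPT + SLOPE * hours_of_study


for hours in (12, 1):
    print(f"{hours} hours -> {predicted_grade(hours):.2f}")
```

Note that 12 hours lies outside the 2 to 10 hour range shown on the plot, so that prediction is an extrapolation of the fitted line.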

    Exercise

    A sample of 6 persons was selected; their age (x variable) and weight (y variable) are shown in the following table. Find the regression equation and the predicted weight when age is 8.5 years.

    Serial no.   Age (x)   Weight (y)
    1            7         12
    2            6         8
    3            8         12
    4            5         10
    5            6         11
    6            9         13

    Answer

    Serial no.   Age (x)   Weight (y)   xy    x²    y²
    1            7         12           84    49    144
    2            6         8            48    36    64
    3            8         12           96    64    144
    4            5         10           50    25    100
    5            6         11           66    36    121
    6            9         13           117   81    169
    Total        41        66           461   291   742

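    $$ \bar{x} = \frac{41}{6} = 6.83 \qquad \bar{y} = \frac{66}{6} = 11 $$

    $$ b = \frac{461 - \frac{41 \times 66}{6}}{291 - \frac{(41)^2}{6}} = 0.92 $$

    Regression equation

    $$ \hat{y}_{(x)} = 11 + 0.92(x - 6.83) $$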

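    $$ \hat{y}_{(x)} = 4.675 + 0.92x $$

    $$ \hat{y}_{(8.5)} = 4.675 + 0.92 \times 8.5 = 12.50 \text{ kg} $$

    $$ \hat{y}_{(7.5)} = 4.675 + 0.92 \times 7.5 = 11.58 \text{ kg} $$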
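
The same answer can be reproduced without intermediate rounding. The sketch below is an illustration with hypothetical function names; it computes the slope from the table sums, builds the line in the mean-centred form used above, and confirms that this matches the intercept form.

```python
age = [7, 6, 8, 5, 6, 9]            # x, years
weight = [12, 8, 12, 10, 11, 13]    # y, kg

n = len(age)
sum_x, sum_y = sum(age), sum(weight)
sum_xy = sum(p * q for p, q in zip(age, weight))
sum_x2 = sum(p * p for p in age)

mean_x, mean_y = sum_x / n, sum_y / n
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)   # slope
a = mean_y - b * mean_x                                         # intercept


def predict_centred(x):
    """Mean-centred form: y-hat = y-bar + b * (x - x-bar)."""
    return mean_y + b * (x - mean_x)


def predict_intercept(x):
    """Intercept form: y-hat = a + b * x."""
    return a + b * x


print(f"b = {b:.3f}, a = {a:.3f}")
for x in (8.5, 7.5):
    print(f"age {x}: {predict_centred(x):.2f} kg")
```

Carrying full precision gives values a few hundredths of a kilogram away from the slide's 12.50 kg and 11.58 kg, which were obtained from rounded coefficients.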

    [Figure: regression line of weight in kg (y-axis, 11.4 to 12.6) against age in years (x-axis, 7 to 9)]

    We create a regression line by plotting two estimated values for y against their x component, then extending the line right and left.

    Exercise 2

    The following are the age (in years) and systolic blood pressure of 20 apparently healthy adults.

    Age (x)   B.P (y)      Age (x)   B.P (y)
    20        120          46        128
    43        128          53        136
    63        141          60        146
    26        126          20        124
    53        134          63        143
    31        128          43        130
    58        136          26        124
    46        132          19        121
    58        140          31        126
    70        144          23        123

    Find the correlation between age and blood pressure using the simple and Spearman's correlation coefficients, and comment.

    Find the regression equation.

    What is the predicted blood pressure for a man aged 25 years?

    Serial   x     y      xy       x²
    1        20    120    2400     400
    2        43    128    5504     1849
    3        63    141    8883     3969
    4        26    126    3276     676
    5        53    134    7102     2809
    6        31    128    3968     961
    7        58    136    7888     3364
    8        46    132    6072     2116
    9        58    140    8120     3364
    10       70    144    10080    4900
    11       46    128    5888     2116
    12       53    136    7208     2809
    13       60    146    8760     3600
    14       20    124    2480     400
    15       63    143    9009     3969
    16       43    130    5590     1849
    17       26    124    3224     676
    18       19    121    2299     361
    19       31    126    3906     961
    20       23    123    2829     529
    Total    852   2630   114486   41678

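    $$ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$

    $$ b_1 = \frac{114486 - \frac{852 \times 2630}{20}}{41678 - \frac{(852)^2}{20}} = 0.4547 $$

    $$ \hat{y} = 112.13 + 0.4547x $$

    For age 25:

    B.P = 112.13 + 0.4547 * 25 = 123.49 ≈ 123.5 mmHg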
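
For the regression part of this exercise, the sums, slope, intercept and prediction can be checked with a short Python sketch. Variable names are illustrative; the data are the 20 age and blood pressure pairs in serial order from the answer table.

```python
age = [20, 43, 63, 26, 53, 31, 58, 46, 58, 70,
       46, 53, 60, 20, 63, 43, 26, 19, 31, 23]             # x, years
bp = [120, 128, 141, 126, 134, 128, 136, 132, 140, 144,
      128, 136, 146, 124, 143, 130, 124, 121, 126, 123]    # y, mmHg

n = len(age)
sum_x, sum_y = sum(age), sum(bp)
sum_xy = sum(p * q for p, q in zip(age, bp))
sum_x2 = sum(p * p for p in age)

# Slope from the computational formula, intercept from the two means.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b1 * sum_x / n

predicted_bp_at_25 = a + b1 * 25
print(f"b1 = {b1:.4f}, a = {a:.2f}, B.P at 25 = {predicted_bp_at_25:.1f} mmHg")
```

The intercept is not derived explicitly on the slide; here it is taken as the mean of y minus the slope times the mean of x, which agrees with the quoted 112.13.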

    Multiple Regression

    Multiple regression analysis is a

    straightforward extension of simple

    regression analysis which allows more

    than one independent variable.
