Correlation lecture


  • 8/20/2019 Correlation lecture

    1/79

    Linear correlation and linear regression

    2/79

    Continuous outcome (means)

    Outcome variable: continuous (e.g., pain scale, cognitive function)

    Are the observations independent or correlated?

    Independent:
    - T-test: compares means between two independent groups
    - ANOVA: compares means between more than two independent groups
    - Pearson's correlation coefficient (linear correlation): shows linear correlation between two continuous variables
    - Linear regression: multivariate regression

    Correlated:
    - Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
    - Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
    - Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups

    Alternatives if the normality assumption is violated (and small sample size), i.e. non-parametric statistics:
    - Wilcoxon signed-rank test: non-parametric alternative to the paired t-test
    - Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the t-test
    - Kruskal-Wallis test: non-parametric alternative to ANOVA
    - Spearman rank correlation coefficient: non-parametric alternative to Pearson's correlation coefficient

    3/79

    Recall: Covariance

    cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1), summing over i = 1 to n

    4/79

    Interpreting Covariance

    cov(X, Y) > 0 : X and Y are positively correlated
    cov(X, Y) < 0 : X and Y are inversely correlated
    cov(X, Y) = 0 : X and Y are independent

    5/79

    Correlation coefficient

    Pearson's correlation coefficient is standardized covariance (unitless):

    r = cov(x, y) / sqrt( var(x) × var(y) )
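The covariance and correlation formulas above are straightforward to compute directly. A minimal pure-Python sketch (made-up illustrative data, not from the lecture):

```python
# Sample covariance: cov(x, y) = sum((x_i - xbar)(y_i - ybar)) / (n - 1),
# and Pearson's r, which is the covariance standardized by both
# standard deviations (hence unitless).
from math import sqrt

def sample_cov(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    # r = cov(x, y) / sqrt(var(x) * var(y))
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # perfectly linear in x, so r = +1
```

Note that `sample_cov(x, x)` is just var(x), which keeps the standardization step transparent.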

    6/79

    Correlation

    Measures the relative strength of the linear relationship between two variables.

    Unit-less. Ranges between −1 and +1.

    The closer to −1, the stronger the negative linear relationship.
    The closer to +1, the stronger the positive linear relationship.
    The closer to 0, the weaker any linear relationship.

    7/79

    Scatter Plots of Data with Various Correlation Coefficients

    [Figure: six scatter plots of Y vs. X, with r = −1, r = −.6, r = 0, r = +.3, r = +1, and a second r = 0 panel]

    Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

    8/79

    Linear Correlation

    [Figure: scatter plots contrasting linear relationships with curvilinear relationships]

    Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

    9/79

    Linear Correlation

    [Figure: scatter plots contrasting strong relationships with weak relationships]

    Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

    10/79

    Linear Correlation

    [Figure: scatter plots showing no relationship]

    Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

    11/79

    Calculating by hand...

    r̂ = cov(x, y) / sqrt( var(x) × var(y) )
      = [ Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1) ]
        / sqrt( [ Σ(xᵢ − x̄)² / (n − 1) ] × [ Σ(yᵢ − ȳ)² / (n − 1) ] )

    (all sums over i = 1 to n)

    12/79

    Simpler calculation formula...

    The three (n − 1) terms cancel, leaving:

    r̂ = Σ(xᵢ − x̄)(yᵢ − ȳ) / sqrt( Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)² )
      = SS_xy / sqrt( SS_xx × SS_yy )

    SS_xy is the numerator of the covariance; SS_xx and SS_yy are the numerators of the variances.

    13/79

    Distribution of the correlation coefficient:

    SE(r̂) = sqrt( (1 − r²) / (n − 2) )

    (Note: like a proportion, the variance of the correlation coefficient depends on the correlation coefficient itself; substitute in the estimated r.)

    The sample correlation coefficient follows a T-distribution with n − 2 degrees of freedom (since you have to estimate the standard error).
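Plugging the standard-error formula into a test statistic gives t = r / SE(r) on n − 2 degrees of freedom. A small sketch; the r = 0.49, n = 100 inputs echo the vitamin D example used later in this deck:

```python
# t statistic for testing H0: correlation = 0,
# with SE(r) = sqrt((1 - r^2) / (n - 2))
from math import sqrt

def t_stat_for_r(r, n):
    se = sqrt((1 - r ** 2) / (n - 2))
    return r / se

t = t_stat_for_r(0.49, 100)   # compare against a t-distribution, 98 df
```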

    14/79

    Continuous outcome (means)

    (Recap of the table from slide 2: tests for continuous outcomes with independent vs. correlated observations, plus non-parametric alternatives.)

    15/79

    Linear regression

    In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (= predictor) variable (X) and the other the dependent (= outcome) variable (Y).

    16/79

    What is "Linear"?

    Remember this: Y = mX + B?

    m is the slope and B is the intercept.

    17/79

    What's Slope?

    A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

    18/79

    Prediction

    If you know something about X, this knowledge helps you predict something about Y. (Sound familiar? Sounds like conditional probabilities!)

    19/79

    Regression equation:

    E(yᵢ) = α + β·xᵢ    (expected value of y at a given level of x)

    20/79

    Predicted value for an individual:

    yᵢ = α + β·xᵢ + random errorᵢ

    The α + β·xᵢ part is fixed, exactly on the line; the random error follows a normal distribution.

    21/79

    Assumptions (or the fine print)

    Linear regression assumes that:
    1. The relationship between X and Y is linear
    2. Y is distributed normally at each value of X
    3. The variance of Y at every value of X is the same (homogeneity of variances)
    4. The observations are independent

    22/79

    The standard error of Y given X is the average variability around the regression line at any given value of X. It is assumed to be equal at all values of X.

    [Figure: regression line with the same spread of Y (labeled σ_y|x) at each value of X]

    23/79

    Regression Picture

    Least squares estimation gave us the line (ŷᵢ = β̂·xᵢ + α̂) that minimized the sum of squared residuals.

    Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)²

    SStotal: total squared distance of observations from the naive mean of y (total variation)
    SSreg: variation explained by the regression line
    SSresidual: variance around the regression line, the additional variability not explained by x, which the least squares method aims to minimize

    R² = SSreg / SStotal
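The decomposition above is easy to verify numerically. A minimal sketch (tiny made-up dataset): fit the least-squares line, then check that SStotal = SSreg + SSresidual and R² = SSreg / SStotal.

```python
# Fit y = a + b*x by least squares, then decompose the total sum of
# squares into regression and residual pieces.
def fit_and_decompose(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b = sxy / sxx              # slope = Cov(x, y) / Var(x)
    a = ybar - b * xbar        # line passes through (xbar, ybar)
    yhat = [a + b * xi for xi in x]
    ss_total = sum((yi - ybar) ** 2 for yi in y)
    ss_reg = sum((yh - ybar) ** 2 for yh in yhat)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    return b, a, ss_total, ss_reg, ss_res

b, a, sst, ssr, sse = fit_and_decompose([1, 2, 3, 4], [2, 3, 5, 6])
r_squared = ssr / sst
```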

    24/79

    Recall example: cognitive function and vitamin D

    Hypothetical data loosely based on a cross-sectional study of 100 middle-aged and older European men.¹

    Cognitive function is measured by the Digit Symbol Substitution Test (DSST).

    1. Lee DM, Tajar A, Ulubaev A, et al. Association between 25-hydroxyvitamin D levels and cognitive performance in middle-aged and older European men. J Neurol Neurosurg Psychiatry 2009 Jul;80(7):722-9.

    25/79

    Distribution of vitamin D

    Mean = 63 nmol/L
    Standard deviation = 33 nmol/L

    26/79

    Distribution of DSST

    Normally distributed
    Mean = 28 points
    Standard deviation = 10 points

    27/79

    Four hypothetical datasets

    I generated four hypothetical datasets, with increasing TRUE slopes (between vit D and DSST):

    0.0 points per 10 nmol/L
    0.5 points per 10 nmol/L
    1.0 points per 10 nmol/L
    1.5 points per 10 nmol/L

    28/79

    Dataset 1: no relationship

    29/79

    Dataset 2: weak relationship

    30/79

    Dataset 3: weak to moderate relationship

    31/79

    Dataset 4: moderate relationship

    32/79

    The "Best fit" line

    Regression equation: E(Yᵢ) = 28 + 0.0 × vit Dᵢ (in 10 nmol/L)

    33/79

    The "Best fit" line

    Note how the line is a little deceptive; it draws your eye, making the relationship appear stronger than it really is!

    Regression equation: E(Yᵢ) = 26 + 0.5 × vit Dᵢ (in 10 nmol/L)

    34/79

    The "Best fit" line

    Regression equation: E(Yᵢ) = 22 + 1.0 × vit Dᵢ (in 10 nmol/L)

    35/79

    The "Best fit" line

    Regression equation: E(Yᵢ) = 20 + 1.5 × vit Dᵢ (in 10 nmol/L)

    Note: all the lines go through the point (63, 28)!

    36/79

    Estimating the intercept and slope: least squares estimation

    A little calculus... What are we trying to estimate? β, the slope. What's the constraint? We are trying to minimize the squared distance (hence the "least squares") between the observations themselves and the predicted values ŷ (the left-over differences are also called the "residuals", or unexplained variability).

    37/79

    Resulting formulas:

    Slope (beta coefficient): β̂ = Cov(x, y) / Var(x)

    Intercept: calculate β̂ first, then α̂ = ȳ − β̂·x̄

    The regression line always goes through the point (x̄, ȳ).

    38/79

    Relationship with correlation

    β̂ = r̂ × (SD_y / SD_x)

    In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (= predictor) variable (X) and the other the dependent (= outcome) variable (Y).

    39/79

    Example: dataset 4

    SD_x = 33 nmol/L
    SD_y = 10 points
    Cov(x, y) = 163 points × nmol/L

    Beta = 163 / 33² = 0.15 points per nmol/L = 1.5 points per 10 nmol/L

    r = 163 / (10 × 33) = 0.49

    Or, equivalently: r = β̂ × (SD_x / SD_y) = 0.15 × (33 / 10) = 0.49
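These numbers can be cross-checked in a couple of lines; the identity r = β̂ × (SD_x / SD_y) ties the two slides together (values taken from the slide above):

```python
# Dataset 4 summary statistics from the slide
cov_xy = 163.0          # points * nmol/L
sd_x, sd_y = 33.0, 10.0

beta = cov_xy / sd_x ** 2          # slope: Cov / Var(x)
r_from_cov = cov_xy / (sd_x * sd_y)
r_from_beta = beta * sd_x / sd_y   # same value, reached via the slope
```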

    40/79

    Significance testing: slope

    Distribution of slope ~ T_{n−2}( β, s.e.(β̂) )

    H0: β1 = 0    (no linear relationship)
    H1: β1 ≠ 0    (linear relationship does exist)

    (β̂ − 0) / s.e.(β̂)  ~  T_{n−2}

    41/79

    Formula for the standard error of beta (you will not have to calculate this by hand!):

    ŷᵢ = α̂ + β̂·xᵢ   and   SS_x = Σ(xᵢ − x̄)²

    s²_{y|x} = Σ(yᵢ − ŷᵢ)² / (n − 2)

    s.e.(β̂) = sqrt( s²_{y|x} / SS_x )
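A sketch of the same computation in pure Python (tiny made-up dataset, since the slide's point is the formula rather than the numbers):

```python
# s.e.(beta-hat) = sqrt( s2_yx / SS_x ), where
# s2_yx = sum of squared residuals / (n - 2)
from math import sqrt

def slope_and_se(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b = sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / sxx
    a0 = ybar - b * xbar
    sse = sum((yi - (a0 + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2_yx = sse / (n - 2)
    return b, sqrt(s2_yx / sxx)

b, se = slope_and_se([1, 2, 3, 4], [2, 3, 5, 6])
t = b / se    # compare against a t-distribution with n - 2 df
```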

    42/79

    Example: dataset 4

    Standard error (beta) = 0.03

    T98 = 0.15 / 0.03 = 5, p < .0001

    95% Confidence interval = 0.09 to 0.21

    43/79

    Residual Analysis: check assumptions

    eᵢ = yᵢ − ŷᵢ

    The residual for observation i, eᵢ, is the difference between its observed and predicted value. Check the assumptions of regression by examining the residuals:
    - Examine for the linearity assumption
    - Examine for constant variance for all levels of X (homoscedasticity)
    - Evaluate the normal distribution assumption
    - Evaluate the independence assumption

    Graphical analysis of residuals

    44/79

    Predicted values:

    For vitamin D (dataset 4): ŷᵢ = 20 + 1.5·xᵢ

    45/79

    Residual = observed − predicted

    ŷᵢ = 34
    yᵢ = 48
    yᵢ − ŷᵢ = 14
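The arithmetic above, as code. The observation here (vitamin D = 100 nmol/L, observed DSST = 48) is hypothetical, chosen so the prediction comes out to a round number:

```python
# Dataset 4 best-fit line: E(Y) = 20 + 1.5 * (vitamin D in 10 nmol/L)
def predicted_dsst(vitd_nmol_per_l):
    return 20 + 1.5 * (vitd_nmol_per_l / 10)

observed = 48.0
yhat = predicted_dsst(100.0)   # predicted DSST score
residual = observed - yhat     # residual = observed - predicted
```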

    46/79

    Residual Analysis for Linearity

    [Figure: paired Y-vs-X and residual plots; a curved pattern in the residuals indicates "Not Linear", a patternless band indicates "Linear"]

    Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

    47/79

    Residual Analysis for Homoscedasticity

    [Figure: paired Y-vs-X and residual plots; a fan-shaped spread indicates non-constant variance, an even band indicates constant variance]

    Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

    48/79

    Residual Analysis for Independence

    [Figure: residual plots; a systematic pattern across X indicates "Not Independent", random scatter indicates "Independent"]

    Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

    49/79

    Residual plot, dataset 4

    50/79

    Multiple linear regression...

    What if age is a confounder here?
    - Older men have lower vitamin D
    - Older men have poorer cognition

    "Adjust" for age by putting age in the model:

    DSST score = intercept + slope1 × vitamin D + slope2 × age

    51/79

    [Figure: 3-D view of DSST score against vitamin D and age]

    52/79

    Different 3-D view...

    53/79

    Fit a plane rather than a line...

    On the plane, the slope for vitamin D is the same at every age; thus, the slope for vitamin D represents the effect of vitamin D when age is held constant.

    54/79

    Equation of the "Best fit" plane:

    DSST score = 53 + 0.0039 × vitamin D (in 10 nmol/L) − 0.46 × age (in years)

    P-value for vitamin D >> .05
    P-value for age < .0001

    Thus, the relationship with vitamin D was due to confounding by age!

    55/79

    Multiple Linear Regression

    More than one predictor...

    E(y) = α + β1·X + β2·W + β3·Z ...

    Each regression coefficient is the amount of change in the outcome variable that would be expected per one-unit change of the predictor, if all other variables in the model were held constant.

    56/79

    Functions of multivariate analysis:

    - Control for confounders
    - Test for interactions between predictors (effect modification)
    - Improve predictions

    57/79

    A t-test is linear regression!

    Divide vitamin D into two groups:
    - Insufficient vitamin D (< 50 nmol/L)
    - Sufficient vitamin D (≥ 50 nmol/L), the reference group

    We can evaluate these data with a t-test or a linear regression:

    T98 = (32.5 − 40.1) / 2.17 = −3.46, p = .0008
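The equivalence is easy to see numerically: regress the outcome on a 0/1 group indicator, and the intercept comes out as the reference-group mean while the slope is the difference in means (made-up numbers, not the deck's data):

```python
# Two-group comparison as a simple linear regression on an indicator.
def mean(v):
    return sum(v) / len(v)

sufficient   = [42.0, 38.0, 40.0]    # indicator x = 0 (reference group)
insufficient = [33.0, 31.0, 35.0]    # indicator x = 1

x = [0] * len(sufficient) + [1] * len(insufficient)
y = sufficient + insufficient
xbar, ybar = mean(x), mean(y)
beta = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        / sum((xi - xbar) ** 2 for xi in x))
alpha = ybar - beta * xbar
# alpha equals mean(sufficient); beta equals the difference in means
```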

    58/79

    As a linear regression:

                    Parameter    Standard
    Variable        Estimate     Error       t Value   Pr > |t|
    Intercept       40.07407     1.47511     27.17     <.0001
    insuff          -7.53060     2.17493     -3.46     0.0008

    The intercept represents the mean value in the sufficient group.

    The slope represents the difference in means between the groups. The difference is significant.

    59/79

    ANOVA is linear regression!

    Divide vitamin D into three groups:
    - Deficient (< 25 nmol/L)
    - Insufficient (≥ 25 and < 50 nmol/L)
    - Sufficient (≥ 50 nmol/L), the reference group

    DSST = α (value for sufficient) + β_insufficient × (1 if insufficient) + β_deficient × (1 if deficient)

    This is called "dummy coding", where multiple binary variables are created to represent being in each category (or not) of a categorical variable.
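Dummy coding as described above, with "sufficient" as the reference category (a sketch; the variable names are mine):

```python
# Two 0/1 indicators represent the three vitamin D categories;
# the reference category ("sufficient") is coded (0, 0).
def dummies(category):
    return (1 if category == "insufficient" else 0,   # insufficient indicator
            1 if category == "deficient" else 0)      # deficient indicator

coded = [dummies(c) for c in ["sufficient", "insufficient", "deficient"]]
```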

    60/79

    The picture:

    [Figure: group means, comparing Sufficient vs. Insufficient and Sufficient vs. Deficient]

    61/79

    Results:

    Parameter Estimates

                        Parameter    Standard
    Variable       DF   Estimate     Error       t Value   Pr > |t|
    Intercept      1    40.07407     1.47817     27.11     <.0001
    deficient      1    -9.87407     3.73950     -2.64     0.0096
    insufficient   1    -6.87963     2.33719     -2.94     0.0041

    Interpretation: the deficient group has a mean DSST 9.87 points lower than the reference (sufficient) group; the insufficient group has a mean DSST 6.88 points lower than the reference (sufficient) group.

    62/79

    Other types of multivariate regression

    - Multiple linear regression is for normally distributed outcomes
    - Logistic regression is for binary outcomes
    - Cox proportional hazards regression is used when time-to-event is the outcome

    63/79

    Common multivariate regression models

    Outcome (dependent variable): continuous
    Example outcome variable: blood pressure
    Appropriate model: linear regression
    Example equation: blood pressure (mmHg) = α + β_salt × salt consumption (tsp/day) + β_age × age (years) + β_smoker × ever smoker (yes = 1, no = 0)
    What the coefficients give you: slopes, i.e. how much the outcome variable increases for every 1-unit increase in each predictor

    Outcome (dependent variable): binary
    Example outcome variable: high blood pressure (yes/no)
    Appropriate model: logistic regression
    Example equation: ln (odds of high blood pressure) = α + β_salt × salt consumption (tsp/day) + β_age × age (years) + β_smoker × ever smoker (yes = 1, no = 0)
    What the coefficients give you: odds ratios, i.e. how much the odds of the outcome increase for every 1-unit increase in each predictor

    Outcome (dependent variable): time-to-event
    Example outcome variable: time-to-death
    Appropriate model: Cox regression
    Example equation: ln (rate of death) = α + β_salt × salt consumption (tsp/day) + β_age × age (years) + β_smoker × ever smoker (yes = 1, no = 0)
    What the coefficients give you: hazard ratios, i.e. how much the rate of the outcome increases for every 1-unit increase in each predictor
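One practical reading of the logistic row above: a coefficient β lives on the log-odds scale, so exp(β) is the odds ratio per 1-unit increase in that predictor. A sketch with a hypothetical coefficient:

```python
from math import exp, log

beta_salt = 0.25                 # hypothetical log-odds change per tsp/day
odds_ratio = exp(beta_salt)      # odds ratio per 1 tsp/day, about 1.28

# Going the other way: an odds ratio of 2 corresponds to beta = ln 2
beta_for_or_2 = log(2)
```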

    64/79

    Multivariate regression pitfalls

    - Multi-collinearity
    - Residual confounding
    - Overfitting

    65/79

    Multicollinearity

    Multicollinearity arises when two variables that measure the same thing or similar things (e.g., weight and BMI) are both included in a multiple regression model; they will, in effect, cancel each other out and generally destroy your model.

    Model building and diagnostics are tricky business!

    66/79

    Residual confounding

    You cannot completely wipe out confounding simply by adjusting for variables in multiple regression unless variables are measured with zero error (which is usually impossible).

    Example: meat eating and mortality

    67/79

    Men who eat a lot of meat are unhealthier for many reasons!

    Sinha R, Cross AJ, Graubard BI, Leitzmann MF, Schatzkin A. Meat intake and mortality: a prospective study of over half a million people. Arch Intern Med 2009;169(6):562-71.

    68/79

    Mortality risks...

    Sinha R, Cross AJ, Graubard BI, Leitzmann MF, Schatzkin A. Meat intake and mortality: a prospective study of over half a million people. Arch Intern Med 2009;169(6):562-71.

    69/79

    Overfitting

    In multivariate modeling, you can get highly significant but meaningless results if you put too many predictors in the model. The model is fit perfectly to the quirks of your particular sample, but has no predictive ability in a new sample.

    70/79

    Overfitting: class data example

    I asked SAS to automatically find predictors of optimism in our class dataset. Here's the resulting linear regression model:

                  Parameter   Standard
    Variable      Estimate    Error      Type II SS   F Value   Pr > F
    Intercept     11.80175    2.98       11.96067     15.65     0.0019
    exercise      -0.29106    0.09798    6.74569      8.82      0.0117
    sleep         -1.91592    0.39494    17.98818     23.53     0.0004
    obama         1.73993     0.24352    39.01944     51.05     <.0001

    71/79

    If something seems too good to be true...

    Clinton, univariate:

    [Regression output: parameter estimates for Intercept and Clinton]

    72/79

    More univariate models...

    Obama, univariate:
    [Regression output: parameter estimates for Intercept and obama]

    Math love, univariate:
    [Regression output: parameter estimates for Intercept and mathlove]

    73/79

    Overfitting

    Pure noise variables still produce good R² values if the model is overfitted. The figure shows the distribution of R² values from a series of simulated regression models containing only noise variables. (Figure 1 from: Babyak MA. What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting in Regression-Type Models. Psychosomatic Medicine 2004;66:411-21.)

    Rule of thumb: you need at least 10 subjects for each additional predictor variable in the multivariate regression model.
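The selection effect behind overfitting can be demonstrated with pure noise: screen enough random candidate predictors against a random outcome, and the best-looking correlation is substantial even though nothing is real (an illustration of the phenomenon, not a reproduction of the cited figure):

```python
# Screening 50 pure-noise predictors against a pure-noise outcome
# (n = 20) and keeping the "best" one overstates the evidence.
import random
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed for reproducibility
n = 20
outcome = [rng.gauss(0, 1) for _ in range(n)]
noise_predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(50)]
best_abs_r = max(abs(pearson_r(p, outcome)) for p in noise_predictors)
```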

    74/79

    Review of statistical tests

    The following table gives the appropriate choice of a statistical test or measure of association for various types of data (outcome variables and predictor variables) by study design.

    e.g., continuous outcome with binary and continuous predictors: blood pressure = pounds + age + treatment (1/0)

    75/79

    Types of variables to be analyzed, and the statistical procedure or measure of association:

    Cross-sectional / case-control studies:
    - Categorical (> 2 groups) predictor, continuous outcome: ANOVA
    - Continuous predictor, continuous outcome: simple linear regression
    - Multivariate (categorical and continuous) predictors, continuous outcome: multiple linear regression
    - Categorical predictor, categorical outcome: chi-square test (or Fisher's exact)
    - Binary predictor, binary outcome: odds ratio, risk ratio
    - Multivariate predictors, binary outcome: logistic regression

    Cohort studies / clinical trials:
    - Binary predictor, binary outcome: risk ratio
    - Categorical predictor, time-to-event outcome: Kaplan-Meier / log-rank test
    - Multivariate predictors, time-to-event outcome: Cox proportional hazards regression, hazard ratio
    - Binary predictor (two groups), continuous outcome: T-test
    - Binary predictor, ranks/ordinal outcome: Wilcoxon rank-sum test
    - Categorical predictor, continuous outcome: repeated measures ANOVA
    - Multivariate predictors, continuous outcome: mixed models; GEE modeling

    76/79

    Alternative summary: statistics for various types of outcome data

    Outcome variable: continuous (e.g., pain scale, cognitive function)
    - Independent observations: T-test, ANOVA, linear correlation, linear regression
    - Correlated observations: paired t-test, repeated-measures ANOVA, mixed models/GEE modeling
    - Assumptions: outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship

    Outcome variable: binary or categorical (e.g., fracture yes/no)
    - Independent observations: difference in proportions, relative risks, chi-square test
    - Correlated observations: McNemar's test, conditional logistic regression, GEE modeling
    - Assumptions: chi-square test assumes sufficient numbers in each cell (≥ 5)

    77/79

    Continuous outcome (means)

    (Recap of the table from slide 2: tests for continuous outcomes with independent vs. correlated observations, plus non-parametric alternatives.)

    78/79

    Binary or categorical outcomes (proportions)

    Are the observations correlated?

    Independent:
    - Chi-square test: compares proportions between two or more groups
    - Relative risks: odds ratios or risk ratios
    - Logistic regression: multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios

    Correlated:
    - McNemar's chi-square test: compares a binary outcome between correlated groups (e.g., before and after)
    - Conditional logistic regression: multivariate regression technique for a binary outcome when groups are correlated (e.g., matched data)
    - GEE modeling: multivariate regression technique for a binary outcome when groups are correlated

    Alternative to the chi-square test if sparse cells:
    - Fisher's exact test: compares proportions between independent groups when there are sparse data (some cells < 5)
    - McNemar's exact test: compares proportions between correlated groups when there are sparse data (some cells < 5)

    79/79

    Time-to-event outcome (survival data)

    Are the observation groups independent or correlated?

    Independent:
    - Kaplan-Meier statistics: estimates survival functions for each group (usually displayed graphically); compares survival functions with the log-rank test
    - Cox regression: multivariate technique for time-to-event data; gives multivariate-adjusted hazard ratios

    Correlated: n/a (already over time)

    Modifications to Cox regression if the proportional-hazards assumption is violated:
    - Time-dependent predictors or time-dependent hazard ratios (tricky!)