
Simple Linear Regression...Simple Linear Regression Statistics: Unlocking the Power of Data Lock5 To Do Read Section 9.1 Do HW 9.1 (due Friday, 12/4) Title Sec9-1 Author Kari Lock


10/13/15

Statistics: Unlocking the Power of Data Lock5

STAT 250
Dr. Kari Lock Morgan

Simple Linear Regression

SECTION 9.1
• Inference for correlation
• Inference for slope
• Conditions for inference

Question of the Day

Is the size of certain regions of your brain associated with the size of your social network?

Social Networks and the Brain
• Data from 40 students at City College London
• How to measure brain size?
• How to measure social network size?

Source: R. Kanai, B. Bahrami, R. Roylance and G. Rees (2011). Online social network size is reflected in human brain structure, Proceedings of the Royal Society B: Biological Sciences. 10/19/11.

Measuring Brain Size
• Structural Magnetic Resonance Imaging (MRI)
• Voxel-based morphometry (VBM) to compute regional grey matter volume based on T1-weighted anatomical MRI scans
• Brain regions found significant in initial study
  ◦ Amygdala (emotion and emotional memory)
  ◦ Middle temporal gyrus (social perception)
  ◦ Entorhinal cortex (memory and navigation)
  ◦ Superior temporal sulcus (perception of others)
• Response: normalized z-score of grey matter density for these brain regions

Brain Regions

Image from Do our Brains Determine our Facebook Friend Count? (www.nature.com)

Social Networks and the Brain
• How to measure size of social network?
  ◦ How many were present at your 18th or 21st birthday party?
  ◦ If you were going to have a party now, how many people would you invite?
  ◦ What is the total number of friends in your phonebook?
  ◦ Write down the names of the people to whom you would send a text message marking a celebratory event. How many people is that?
  ◦ Write down the names of people in your phonebook you would meet for a chat in a small group (one to three people). How many people is that?
  ◦ How many friends have you kept from school and university whom you could have a friendly conversation with now?
  ◦ How many friends do you have on 'Facebook'?
  ◦ How many friends do you have from outside school or university?
  ◦ Write down the names of the people of whom you feel you could ask a favor and expect to have it granted. How many people is that?


Social Networks and the Brain

r = 0.436

Is the association significant?

Standard Error Formulas

Parameter                    Distribution                 Standard Error
Proportion                   Normal                       $\sqrt{\frac{p(1-p)}{n}}$
Difference in Proportions    Normal                       $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Mean                         t, df = n - 1                $\sqrt{\frac{\sigma^2}{n}}$
Difference in Means          t, df = min(n1, n2) - 1      $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Correlation                  t, df = n - 2                $\sqrt{\frac{1-\rho^2}{n-2}}$

Social Networks and the Brain
• Is the grey matter volume of these regions of the brain significantly correlated with number of Facebook friends?
• From n = 40 people, we find r = 0.436. Is this significant?

Social Networks and the Brain

1. State hypotheses:
   $H_0: \rho = 0$
   $H_a: \rho \neq 0$

2. Check conditions:
   n = 40 ≥ 30

3. Calculate test statistic:
   r = 0.436
   $SE = \sqrt{\frac{1 - 0.436^2}{40 - 2}} = 0.146$
   $t = \frac{r - 0}{SE} = \frac{0.436}{0.146} = 2.99$

4. Compute p-value:
   p-value = 0.0048

5. Interpret in context:
   This provides strong evidence that the grey matter density of these regions of the brain and number of Facebook friends are positively correlated.
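
The test-statistic step above can be checked with a few lines of Python. This is a minimal sketch that only evaluates the slide's formulas; the function name `correlation_t_test` is made up for illustration. Evaluating the standard error formula with r = 0.436 and n = 40 gives SE ≈ 0.146 and t ≈ 2.99. The p-value would still come from a t-distribution with n - 2 = 38 degrees of freedom (table or software).

```python
import math

def correlation_t_test(r, n):
    """Return (SE, t) for testing H0: rho = 0 given sample correlation r and sample size n."""
    se = math.sqrt((1 - r**2) / (n - 2))  # SE formula for a correlation from the table of standard errors
    t = (r - 0) / se                      # (sample statistic - null value) / SE
    return se, t

se, t = correlation_t_test(r=0.436, n=40)
print(f"SE = {se:.3f}, t = {t:.2f}, df = {40 - 2}")
```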

Social Networks and the Brain

Should you go out and add more Facebook friends to increase the size of your brain?
a) Yes
b) No

R²

$R^2$ is the proportion of the variability in the response variable, Y, that is explained by the explanatory variable, X

• For simple linear regression, $R^2 = r^2$ ($R^2$ is just the sample correlation squared)


R²

[Two scatterplots: $R^2 = 0.67$ and $R^2 = 0.09$]

How much does the variability in Y decrease if you know X?

Regression in Minitab
• Stat -> Regression -> Fitted Line Plot

$0.436^2 = 0.19$
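
As a rough check that $R^2$ from a least squares fit equals the squared sample correlation, here is a small Python sketch. The (x, y) values are hypothetical, chosen only for illustration; they are not the study's data. $R^2$ is computed as 1 - SSE/SST, the proportion of variability in y explained by x.

```python
import math

# Hypothetical (x, y) pairs for illustration only; not the study's data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # total variability in y (SST)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation
r = sxy / math.sqrt(sxx * syy)

# Least squares line and the variability left over after using x (SSE)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy

print(f"r^2 = {r**2:.4f}, R^2 = {r_squared:.4f}")
print(f"For the brain data: 0.436^2 = {0.436**2:.2f}")
```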

Prediction

Should you use this equation to predict the normalized size of these regions of your brain?
a) Yes
b) No

Sample to Population
• Everything we have done so far with regression is based solely on sample data
• Now, we will extend from the sample to the population
• Statistical inference!

Simple Linear Model
• The population/true simple linear model is

  $y = \beta_0 + \beta_1 x + \varepsilon$

  where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the random error
• $\beta_0$ and $\beta_1$ are unknown parameters
• Can use familiar inference methods!
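
To make the pieces of the model concrete, the sketch below simulates data from it. The parameter values and the function name `simulate_simple_linear_model` are assumptions for illustration; in practice the intercept and slope are unknown and must be estimated from a sample. The random error is drawn from a normal distribution with mean 0.

```python
import random

def simulate_simple_linear_model(beta0, beta1, sigma, xs, seed=0):
    """Generate y = beta0 + beta1 * x + error, with error ~ N(0, sigma)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]

# Hypothetical parameter values; real parameters are unknown.
xs = [0, 1, 2, 3, 4]
ys = simulate_simple_linear_model(beta0=1.0, beta1=2.0, sigma=0.5, xs=xs)
for x, y in zip(xs, ys):
    print(f"x = {x}, y = {y:.2f}")
```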

Inference for the Slope
• Test for whether the slope is significantly different from 0 (whether there is any linear relationship between x and y):

  $H_0: \beta_1 = 0$
  $H_a: \beta_1 \neq 0$

• Confidence interval for the true slope!!


Inference for the Slope
• Confidence intervals and hypothesis tests for the slope can be done using the familiar formulas:

  $t = \frac{\text{sample statistic} - \text{null value}}{SE}$

  $\text{sample statistic} \pm t^* \times SE$

• Population Parameter: $\beta_1$, Sample Statistic: $\hat{\beta}_1$
• Use t-distribution with n - 2 degrees of freedom
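
The two formulas translate directly into code. The sketch below is only an illustration: the slope estimate and its standard error are hypothetical numbers (in class they would be read off the Minitab output), and the function names are made up. The value t* = 2.024 is the approximate 97.5th percentile of a t-distribution with 38 degrees of freedom, which is what a 95% interval with n = 40 would use.

```python
def t_statistic(sample_statistic, null_value, se):
    """Test statistic: (sample statistic - null value) / SE."""
    return (sample_statistic - null_value) / se

def confidence_interval(sample_statistic, t_star, se):
    """Interval: sample statistic +/- t* x SE."""
    margin = t_star * se
    return sample_statistic - margin, sample_statistic + margin

# Hypothetical values for illustration only; not taken from the study's output.
b1_hat = 0.50    # estimated slope
se_b1 = 0.20     # standard error of the slope
t_star = 2.024   # approx. t* for 95% confidence with df = 40 - 2 = 38

t = t_statistic(b1_hat, null_value=0, se=se_b1)
low, high = confidence_interval(b1_hat, t_star, se_b1)
print(f"t = {t:.2f}")
print(f"95% CI for the slope: ({low:.3f}, {high:.3f})")
```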

Regression in Minitab
Stat -> Regression -> Regression -> Fit Regression Model

Inference for Slope

n = 40

Is the slope significantly different from 0?
(a) Yes
(b) No

Give a 95% confidence interval for the true slope.

Hypothesis Test

Regression in Minitab
Stat -> Regression -> Regression -> Fit Regression Model

Two Quantitative Variables
• The t-statistic (and p-value) for a test for a non-zero slope and a test for a non-zero correlation are identical!
• They are equivalent ways of testing for a linear association between two quantitative variables.
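
The equivalence of the two tests can be seen numerically. The sketch below uses hypothetical data (not the study's) and computes both t-statistics by hand. The slope's standard error uses the standard least squares formula SE = s / sqrt(Sxx), where s is the residual standard error; that formula is not given on the slides and is brought in here as an assumption for the illustration.

```python
import math

# Hypothetical data for illustration only; not the study's data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 2.9, 2.7, 4.5, 4.1, 6.3, 6.0, 8.4]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Test for correlation: t = r / sqrt((1 - r^2) / (n - 2))
r = sxy / math.sqrt(sxx * syy)
t_corr = r / math.sqrt((1 - r**2) / (n - 2))

# Test for slope: t = b1 / SE(b1), with SE(b1) = s / sqrt(Sxx)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))  # residual standard error
t_slope = b1 / (s / math.sqrt(sxx))

print(f"t (correlation) = {t_corr:.4f}")
print(f"t (slope)       = {t_slope:.4f}")
```

Both lines print the same value, and since both tests use n - 2 degrees of freedom, the p-values match as well.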


Confidence Interval

Multiple Testing?

False Positive (Type I Error) Protection
• To further protect against Type I errors, they performed two independent analyses on two separate samples (n = 125, then n = 40)

Real-World Network Size
• What about real-world network size?

Conditions

Inference based on the simple linear model is only valid if the following conditions hold:
1) Linearity
2) Constant Variability of Residuals
3) Normality of Residuals
4) Independence

Linearity
• The relationship between x and y is linear (it makes sense to draw a line through the scatterplot)


Dog Years
• 1 dog year = 7 human years
• Linear: human age = 7 × dog age
• From www.dogyears.com: "The old rule-of-thumb that one dog year equals seven years of a human life is not accurate. The ratio is higher with youth and decreases a bit as the dog ages."

[Plot comparing the LINEAR rule with the ACTUAL relationship; photo: Charlie]

A linear model can still be useful, even if it doesn't perfectly fit the data.

"All models are wrong, but some are useful"
- George Box

Residuals (errors)

$\varepsilon_i \sim N(0, \sigma_\varepsilon)$

Conditions for residuals:
• The errors are normally distributed (check with a histogram)
• The average of the errors is 0 (always true for least squares regression)
• The standard deviation of the errors is constant for all cases (constant spread of points around the line)
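
The claim that the residuals of a least squares fit always average to 0 is easy to verify. A minimal sketch with hypothetical data (not the study's): fit the line, compute the residuals, and check their mean. The list of residuals is also what would go into the histogram used to judge normality.

```python
# Hypothetical data for illustration only; not the study's data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 2.9, 2.7, 4.5, 4.1, 6.3, 6.0, 8.4]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed y - predicted y
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / n

print("Residuals:", [round(e, 3) for e in residuals])
print(f"Mean residual: {mean_residual:.10f}")  # zero up to floating-point rounding
```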

Regression in Minitab

Is the association approximately linear?
a) Yes
b) No

Is the spread of the points around the line approximately constant?
a) Yes
b) No

Histogram of Residuals

Are the residuals approximately normally distributed?
a) Yes
b) No

Non-Constant Variability


Non-Normal Residuals

Independence
• Cases must be independent of each other (one case's values do not affect another case's values)
• Most common violation of this: data over time
• What would make the independence condition satisfied or violated in the social network and brain size data?

Conditions not Met?
• If the association isn't linear: don't use simple linear regression
• If variability is not constant, residuals are not normal, or cases are not independent: the model itself is still valid, but inference may not be accurate
• If you want to do something more fancy so the conditions are met… take STAT 462!

Simple Linear Regression
1) Plot your data!
   • Association approximately linear?
   • Outliers?
   • Constant variability?
2) Fit the model (least squares)
3) Use the model
   • Interpret coefficients
   • Make predictions
4) Look at histogram of residuals (normal?)
5) Inference (extend to population)
   • Inference on slope (interval and test)

To Do
• Read Section 9.1
• Do HW 9.1 (due Friday, 12/4)