Correlation continued and simple linear regression

emeyers.scripts.mit.edu/emeyers/wp-content/uploads/CS149_slides/…


Outline for today

• Better know a player: Mark Prior
• Review and continuation of correlation
• Simple linear regression!


Worksheet 3: Jeter BA boxplot

What is the reason for the outlier?


Review


Worksheet 3: interpreting z-scores

BA z-scores

• Jeter: 1.02
• Ruth: 2.168
• Gehrig: -0.389
• Mantle: 2.70
• Mattingly: 1.46

How do we interpret what a good z-score is?
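A minimal sketch of how z-scores like these can be computed: a z-score is (value - mean) / standard deviation. The batting averages below are made-up values for illustration, not the worksheet data.

```r
# Hypothetical batting averages for a group of players (not the worksheet data)
ba <- c(0.250, 0.262, 0.271, 0.280, 0.295, 0.310, 0.345)

# z-score: how many standard deviations each value lies from the mean
z <- (ba - mean(ba)) / sd(ba)

# scale() performs the same centering and scaling
z_scaled <- as.vector(scale(ba))

print(round(z, 2))
```

If the values were roughly normally distributed, about 95% of z-scores would fall between -2 and 2, which is one way to judge whether a z-score is unusually good.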


Are BAs normally distributed?


Scatter plots

A scatter plot graphs the relationship between two variables.

If there is an explanatory and a response variable, then the explanatory variable is put on the x-axis and the response variable is put on the y-axis.

R: plot(x, y)

[Figure: scatter plot of runs scored and winning %]


Correlation

The correlation is a measure of the strength and direction of a linear association between two variables.

R: cor(x, y)
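As a sketch of what cor(x, y) computes: the correlation is the sum of the products of the two variables' z-scores, divided by n - 1. The team numbers below are made up for illustration.

```r
# Hypothetical team data (made up for illustration)
runs_scored <- c(650, 700, 720, 760, 800, 850)
win_pct     <- c(0.440, 0.480, 0.470, 0.520, 0.550, 0.580)

# Correlation from z-scores: sum of products divided by n - 1
n  <- length(runs_scored)
zx <- (runs_scored - mean(runs_scored)) / sd(runs_scored)
zy <- (win_pct - mean(win_pct)) / sd(win_pct)
r_manual <- sum(zx * zy) / (n - 1)

# Built-in equivalent
r_builtin <- cor(runs_scored, win_pct)

print(c(manual = r_manual, builtin = r_builtin))
```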


Correlation examples


Correlation examples

• Runs allowed and wins: r = -.55
• (runs scored)/(runs allowed) and wins: r = .93



Correlation cautions

1. A strong positive or negative correlation does not (necessarily) imply a cause and effect relationship between two variables.
2. A correlation near zero does not (necessarily) mean that two variables are not associated. Correlation only measures the strength of a linear relationship.
3. Correlation can be heavily influenced by outliers. Always plot your data!
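Cautions 2 and 3 can be illustrated with a small sketch. All values below are made up: the first part uses a perfect but curved relationship, the second adds a single extreme point to otherwise unrelated data.

```r
# Caution 2: a perfect but non-linear relationship can have zero correlation
x <- -3:3
y <- x^2
r_curve <- cor(x, y)   # symmetric parabola: no linear trend

# Caution 3: a single outlier can create a strong correlation
x_flat <- c(1, 2, 3, 4, 5, 6)
y_flat <- c(5, 4, 6, 5, 4, 6)
r_without <- cor(x_flat, y_flat)                   # weak correlation
r_with    <- cor(c(x_flat, 30), c(y_flat, 40))     # one extreme point added

print(round(c(curve = r_curve, without = r_without, with = r_with), 2))
```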


Anscombe’s quartet (r = 0.81)
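Anscombe's quartet ships with base R as the data frame `anscombe` (columns x1..x4 and y1..y4), so the point can be checked directly. A short sketch: all four correlations come out at approximately 0.816, in line with the value quoted on the slide, yet the four plots look very different.

```r
# anscombe is part of base R (datasets package)
r <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
print(round(r, 3))

# Plot all four panels to see how different the data sets look
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```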


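Offensive statistics

What do the following abbreviations stand for?

H, BB, PA, AB, OBP, BA, SlugPct

• Hits (H): 1B + 2B + 3B + HR
• Walks (BB): 4 balls
• Plate Appearances (PA): number of times “up”
• At Bats (AB): PA - BB
• On-Base Percentage (OBP): (H + BB) / PA
• Batting Average (BA): H / AB
• Slugging percentage (SlugPct): (1·1B + 2·2B + 3·3B + 4·HR) / AB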
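The formulas above translate directly into R. This is a sketch using made-up season counts for one player, and it follows the slide's simplified definitions (for example AB = PA - BB); the official definitions account for a few more event types.

```r
# Hypothetical season counts for one player (made up for illustration)
singles <- 120; doubles <- 30; triples <- 5; home_runs <- 20
walks <- 60
plate_appearances <- 650

hits    <- singles + doubles + triples + home_runs   # H
at_bats <- plate_appearances - walks                 # AB = PA - BB (slide's definition)

ba   <- hits / at_bats                               # batting average
obp  <- (hits + walks) / plate_appearances           # on-base percentage
slug <- (1 * singles + 2 * doubles + 3 * triples + 4 * home_runs) / at_bats

print(round(c(BA = ba, OBP = obp, Slug = slug), 3))
```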


Who is a better hitter: Derek Jeter or David Ortiz?

Who would you rather have on your team?

Derek Jeter    David Ortiz


Jeter has a better batting average


Ortiz hits more home runs


Is power or batting average more important?

Compare them based on the “best” statistic

How do we determine which statistic is best?

[Figure: scatter plot of runs scored and wins]


The great cycle of baseball

More wins → More fans → More $$$ → Better players → Score more runs → More wins

We can evaluate how ‘good’ a statistic is based on how well it correlates with the number of runs a team scores


What is the best statistic to use?

One idea: the ‘best’ statistic to judge a player is the statistic that is most correlated with runs
• We can then use this to examine how good a hitter is

We will use a data set that has season total statistics going back to 1961

These statistics include the total runs a team scored, total team HR, total team BA, etc.

load('/home/shared/baseball_stats_2007/data/team_batting_stats.Rda')


What is the best statistic to use?

Statistics to compare:
1. Home runs (HR)
2. Batting average (BA)
3. On-base percentage (OBP)
4. Slugging percentage (Slug)
5. On-base percentage + slugging percentage (OPS)

For each of these 5 statistics:
• Create a scatter plot between the statistic and runs (R)
• Calculate the correlation between the statistic and runs (R)

Once you have found the statistic that is most correlated with runs, create a side-by-side boxplot to compare Derek Jeter and David Ortiz's data on this statistic

To get the team yearly total statistics, run: load('/home/shared/baseball_stats_2017/team_batting_stats.Rda')

Useful functions:
• plot(x, y)  # create a scatter plot of different statistics and runs
• cor(x, y)  # calculate the correlation between different statistics and runs
• boxplot(v1, v2, names = c('Derek', 'David'))  # compare players on this ‘best’ statistic
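One possible shape for this exercise is sketched below. The data frame is simulated and its column names (OBP, Slug, OPS, R) are assumptions, since the contents of team_batting_stats.Rda are not shown here; the simulated numbers say nothing about real baseball and only demonstrate the mechanics.

```r
# Stand-in for the team data: simulated values with hypothetical column names
# (the real team_batting_stats.Rda may use different names)
set.seed(1)
n <- 100
OBP  <- rnorm(n, mean = 0.325, sd = 0.012)
Slug <- rnorm(n, mean = 0.400, sd = 0.025)
team_stats <- data.frame(
  OBP  = OBP,
  Slug = Slug,
  OPS  = OBP + Slug,
  R    = 700 + 2500 * (OBP - 0.325) + 1500 * (Slug - 0.400) + rnorm(n, sd = 20)
)

# Correlation of each statistic with runs
stat_names <- c("OBP", "Slug", "OPS")
cors <- sapply(stat_names, function(s) cor(team_stats[[s]], team_stats$R))
print(round(sort(cors, decreasing = TRUE), 2))

# Scatter plot for the statistic most correlated with runs
best <- names(which.max(cors))
plot(team_stats[[best]], team_stats$R, xlab = best, ylab = "Runs (R)")
```

With the real data, the same loop would run over HR, BA, OBP, Slug, and OPS once the actual column names are known.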


Results…


Correlation between HR and runs


Correlation between BA and runs


Correlation between OBP and runs


Correlation between Slug and runs


Correlation between OPS and runs


The winner…

On-base plus slugging seems like the best statistic to use!


Who is a better hitter: Derek Jeter or David Ortiz?

Ortiz has a better on-base plus slugging!


Better know a player: Derek Jeter

• Onion infographic
• Other Onion articles


Regression

Regression is a method of using one variable to predict the value of a second variable.

In linear regression we fit a line to the data, called the regression line.


Regression line: runs/game as a function of team batting average (2013)


Equation for a line

ŷ=a+b·x

Response=a+b·Explanatory


Wins runs regression

ŵ=14.47+.088·runs

a=14.47

b=.088

ŷ=a+b·x

R: lm(y ~ x)
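A short sketch of how lm() produces the intercept a and slope b. The runs and wins below are made-up numbers, not the course data, so the coefficients will differ from the 14.47 and .088 on the slide.

```r
# Hypothetical team seasons (made-up numbers, not the course data)
runs <- c(600, 650, 700, 750, 800, 850)
wins <- c(68, 72, 75, 82, 84, 90)

fit <- lm(wins ~ runs)             # response ~ explanatory
a <- coef(fit)[["(Intercept)"]]    # intercept
b <- coef(fit)[["runs"]]           # slope

# The least-squares slope and intercept can also be computed by hand
b_manual <- cor(runs, wins) * sd(wins) / sd(runs)
a_manual <- mean(wins) - b_manual * mean(runs)

# Predicted wins for a team scoring 725 runs
w_hat <- predict(fit, newdata = data.frame(runs = 725))

print(c(a = a, b = b))
```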


Interpreting the slope and intercept

ŷ=a+b·x

The slope b represents the predicted change in the response variable y given a one unit change in the explanatory variable x

The intercept a represents the predicted value of the response variable y if the explanatory variable x were 0


Using the regression line to make predictions

1. Approximately how many additional runs do you need to score for an additional win?
2. How many wins will you have if you score 0 runs all season?

a=14.47

b=.088

ŷ=a+b·x

ŵ=14.47+.088·Runs


Wins runs regression

1. An additional win for ~11 additional runs scored
2. There will be 14.47 wins if you score 0 runs all season
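Both answers follow from the coefficients: each run adds b = .088 predicted wins, so one additional win takes about 1 / .088 runs, and the prediction at 0 runs is the intercept. A quick check in R, using the slide's values:

```r
a <- 14.47   # intercept from the slide
b <- 0.088   # slope from the slide: predicted wins per additional run

runs_per_win <- 1 / b          # runs needed to raise the prediction by one win
wins_at_zero <- a + b * 0      # predicted wins with 0 runs scored

print(round(runs_per_win, 1))
print(wins_at_zero)
```

No team scores anywhere near 0 runs in a season, so the intercept is an extrapolation far outside the data rather than a realistic forecast.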

a=14.47

b=.088

ŷ=a+b·x

ŵ=14.47+.088·Runs


Example 2: Using the regression line to make predictions

a=-3.27

b=29.36

ŷ=a+b·x

1. If a team had a batting average of 0.270, how many runs would you expect in a game?


If a team had a batting average of 0.270, how many runs would you expect in a game?

(R/G)expected = 29.36 * BA - 3.27

(R/G)expected = 29.36 * .270 - 3.27

(R/G)expected = 4.6572

How about if a team is batting .250?
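The same calculation can be wrapped in a small function, sketched here with the slide's a = -3.27 and b = 29.36. The function name predict_runs_per_game is made up for illustration.

```r
a <- -3.27   # intercept from the slide
b <- 29.36   # slope from the slide

# Predicted runs per game for a given team batting average
predict_runs_per_game <- function(ba) a + b * ba

print(predict_runs_per_game(0.270))
print(predict_runs_per_game(0.250))
```

For a team batting .250 this gives 29.36 * .250 - 3.27 = 4.07 expected runs per game.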

a=-3.27

b=29.36

ŷ=a+b·x