Correlation continued and simple linear regression

Outline for today

• Better know a player: Mark Prior
• Review and continuation of correlation
• Simple linear regression!

Worksheet 3: Jeter BA boxplot

What is the reason for the outlier?

Review

Worksheet 3: interpreting z-scores

BA z-scores

Jeter: 1.02
Ruth: 2.168
Gehrig: -0.389
Mantle: 2.70
Mattingly: 1.46

How do we interpret what a good z-score is?

Are BAs normally distributed?
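One way to read the z-scores from the worksheet: if batting averages were approximately normal (an assumption, and exactly what this slide asks you to check), pnorm() would convert a z-score into the share of values expected to fall below it. The sketch below applies that to the z-scores listed on the worksheet slide; `season_ba` is a hypothetical vector used only to show how a z-score is computed.

```r
# z-scores as listed on the worksheet slide
z <- c(Jeter = 1.02, Ruth = 2.168, Gehrig = -0.389, Mantle = 2.70, Mattingly = 1.46)

# Assumption: BAs are roughly normal, so pnorm(z) is the fraction of
# batting averages expected to be below a player with that z-score.
percentile <- pnorm(z)
print(round(100 * percentile, 1))

# A z-score is (value - mean) / standard deviation.
# `season_ba` is made-up data, not taken from the worksheet.
season_ba <- c(0.250, 0.262, 0.271, 0.285, 0.301)
z_scores <- (season_ba - mean(season_ba)) / sd(season_ba)
print(round(z_scores, 2))
```

If a histogram of the BAs looks clearly skewed, the percentiles from pnorm() should not be trusted.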

Scatter plots

A scatter plot graphs the relationship between two variables. If there is an explanatory and a response variable, then the explanatory variable is put on the x-axis and the response variable is put on the y-axis.

R: plot(x, y)

[Figure: scatter plot of runs scored and winning %]

Correlation

The correlation is a measure of the strength and direction of a linear association between two variables.

R: cor(x, y)
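To see what cor(x, y) computes, the sketch below reproduces it by hand as the sum of products of z-scores divided by n - 1. The vectors `x` and `y` are hypothetical values chosen for illustration, not data from the course.

```r
# Hypothetical data: runs scored and winning % for five made-up teams
x <- c(600, 650, 700, 750, 800)
y <- c(0.45, 0.47, 0.52, 0.51, 0.58)

# Standardize each variable
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Correlation as the (n - 1)-averaged product of z-scores
r_manual <- sum(zx * zy) / (length(x) - 1)

# Built-in Pearson correlation
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```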

Correlation examples

Runs allowed and wins: r = -.55

(Runs scored)/(runs allowed) and wins: r = .93

Correlation cautions

1. A strong positive or negative correlation does not (necessarily) imply a cause-and-effect relationship between two variables.
2. A correlation near zero does not (necessarily) mean that two variables are not associated. Correlation only measures the strength of a linear relationship.
3. Correlation can be heavily influenced by outliers. Always plot your data!
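Cautions 2 and 3 can be illustrated with a small sketch. All numbers below are made up for the demonstration: the first pair is perfectly related through a curve yet has zero correlation, and the second shows a single outlier turning a near-zero correlation into a strong one.

```r
# Caution 2: a perfect but nonlinear relationship can have r = 0
x <- -3:3
y <- x^2
r_curve <- cor(x, y)

# Caution 3: one outlier can dominate the correlation
x2 <- 1:10
y2 <- c(5, 3, 6, 4, 5, 3, 6, 4, 5, 4)   # no real trend
r_without <- cor(x2, y2)
r_with <- cor(c(x2, 30), c(y2, 30))     # add a single extreme point

print(c(curve = r_curve, without_outlier = r_without, with_outlier = r_with))
```

Plotting each pair with plot(x, y) makes both problems obvious at a glance.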

Anscombe’s quartet (r = 0.81)
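Anscombe's quartet ships with R as the built-in `anscombe` data frame (columns x1 to x4 and y1 to y4), so the shared correlation can be checked directly. A minimal sketch:

```r
# Built-in data frame with columns x1..x4 and y1..y4
data(anscombe)

# Correlation for each of the four x/y pairs
r <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
print(round(r, 3))

# The four scatter plots look very different despite similar r
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```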

Offensive statistics

What do the following abbreviations stand for?

H, BB, PA, AB, OBP, BA, SlugPct

Hits: 1B + 2B + 3B + HR
Walks: 4 balls
Plate Appearances: Number of times “up”
At Bats: PA - BB
On-Base Percentage: (H + BB)/PA
Batting Average: H/AB
Slugging percentage: (1·1B + 2·2B + 3·3B + 4·HR)/AB
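The definitions above translate directly into code. The function below is a sketch that follows the slide's formulas as written; the function name `batting_summary` and the season totals passed to it are hypothetical.

```r
# Compute the slide's statistics from counts of each hit type,
# walks (bb), and plate appearances (pa).
batting_summary <- function(singles, doubles, triples, hr, bb, pa) {
  h  <- singles + doubles + triples + hr   # Hits
  ab <- pa - bb                            # At bats, as defined on the slide
  c(H    = h,
    AB   = ab,
    OBP  = (h + bb) / pa,
    BA   = h / ab,
    Slug = (1 * singles + 2 * doubles + 3 * triples + 4 * hr) / ab)
}

# Made-up season totals for illustration
s <- batting_summary(singles = 100, doubles = 30, triples = 5,
                     hr = 15, bb = 50, pa = 550)
print(round(s, 3))
```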

Who would you rather have on your team?

Derek Jeter or David Ortiz

Who is a better hitter: Derek Jeter or David Ortiz?

Jeter has a better batting average

Ortiz hits more home runs

Is power or batting average more important?

Compare them based on the “best” statistic

How do we determine which statistic is best?

Runs scored and wins

[Figure: scatter plot of runs scored and wins]

The great cycle of baseball

More wins → More fans → More $$$ → Better players → Score more runs → More wins

We can evaluate how ‘good’ a statistic is based on how well it correlates with the number of runs a team scores

What is the best statistic to use?

One idea: the ‘best’ statistic to judge a player is the statistic that is most correlated with runs
• We can then use this to examine how good a hitter is

We will use a data set that has season total statistics going back to 1961

These statistics include the total runs a team scored, total team HR, total team BA, etc.

load('/home/shared/baseball_stats_2007/data/team_batting_stats.Rda')

What is the best statistic to use?

Statistics to compare:
1. Home runs (HR)
2. Batting average (BA)
3. On-base percentage (OBP)
4. Slugging percentage (Slug)
5. On-base percentage + slugging percentage (OPS)

For each of these 5 statistics:
• Create a scatter plot between the statistic and runs (R)
• Calculate the correlation between the statistic and runs (R)

Once you have found the statistic that is most correlated with runs, create a side-by-side boxplot to compare Derek Jeter's and David Ortiz's data on this statistic

To get the team yearly total statistics, run: load('/home/shared/baseball_stats_2017/team_batting_stats.Rda')

Useful functions:
• plot(x, y)  # create a scatter plot of different statistics and runs
• cor(x, y)  # calculate the correlation between different statistics and runs
• boxplot(v1, v2, names = c('Derek', 'David'))  # compare players on this ‘best’ statistic
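The exercise can be organized as one loop over the five statistics. The sketch below does not use the course file, whose object and column names are not shown on the slides; instead it builds a simulated data frame `team_stats` with assumed column names (HR, BA, OBP, Slug, OPS, R). Runs are generated from OPS by construction, so the ranking it prints is an artifact of the simulation and says nothing about real baseball.

```r
set.seed(1)
n <- 40  # number of simulated team-seasons

# Simulated team totals; column names are assumptions
team_stats <- data.frame(
  HR   = rnorm(n, 160, 30),
  BA   = rnorm(n, 0.260, 0.010),
  OBP  = rnorm(n, 0.325, 0.012),
  Slug = rnorm(n, 0.410, 0.025)
)
team_stats$OPS <- team_stats$OBP + team_stats$Slug

# Made-up relationship between OPS and runs, plus noise
team_stats$R <- 700 + 2000 * (team_stats$OPS - 0.735) + rnorm(n, 0, 20)

# Correlation of each statistic with runs
stat_names <- c("HR", "BA", "OBP", "Slug", "OPS")
r_with_runs <- sapply(stat_names, function(s) cor(team_stats[[s]], team_stats$R))
print(round(sort(r_with_runs, decreasing = TRUE), 2))

# Statistic with the largest correlation in this simulated data
best <- names(which.max(r_with_runs))
print(best)
```

With the real data, replace `team_stats` by the object that load() creates and add plot(team_stats[[s]], team_stats$R) inside the loop to inspect each scatter plot.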

Results…

Correlation between HR and runs

Correlation between BA and runs

Correlation between OBP and runs

Correlation between Slug and runs

Correlation between OPS and runs

The winner…

On-base plus slugging seems like the best statistic to use!

Ortiz has a better on-base plus slugging!

Who is a better hitter: Derek Jeter or David Ortiz?

Better know a player: Derek Jeter

Onion infographic

Other Onion articles

Regression

Regression is a method of using one variable to predict the value of a second variable. In linear regression we fit a line to the data, called the regression line.

Regression line: runs/game as a function of team batting average (2013)

Equation for a line

ŷ = a + b·x

Response = a + b·Explanatory

Wins runs regression

ŵ=14.47+.088·runs

a=14.47

b=.088

ŷ=a+b·x

R: lm(y ~ x)
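A minimal sketch of lm() in use. The `runs` and `wins` vectors are hypothetical: they were constructed to scatter around the slide's line ŵ = 14.47 + .088·runs, with residuals chosen so that the least-squares fit recovers those coefficients. They are not the course data.

```r
# Made-up team seasons scattered around the slide's regression line
runs  <- c(600, 650, 700, 750, 800)
noise <- c(1, -2, 0, 2, -1)            # residuals, chosen for illustration
wins  <- 14.47 + 0.088 * runs + noise

# Fit the regression of wins (response) on runs (explanatory)
fit <- lm(wins ~ runs)

# Extract intercept a and slope b
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["runs"]]
print(c(a = a, b = b))

# Draw the data and the fitted line
plot(runs, wins)
abline(fit)
```

The formula `wins ~ runs` reads "wins as a function of runs", so the response goes on the left of the tilde.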

Interpreting the slope and intercept

ŷ = a + b·x

The slope b represents the predicted change in the response variable y given a one-unit change in the explanatory variable x

The intercept a represents the predicted value of the response variable y if the explanatory variable x were 0

Using the regression line to make predictions

1. Approximately how many additional runs do you need to score for an additional win?
2. How many wins will you have if you score 0 runs all season?

a=14.47

b=.088

ŷ=a+b·x

ŵ=14.47+.088·Runs

Wins runs regression

1. An additional win for ~11 additional runs scored
2. There will be 14.47 wins if you score 0 runs all season
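Both answers follow from a = 14.47 and b = .088. Since b is wins per run, its reciprocal is runs per win; the intercept is the prediction at x = 0. A short check in R:

```r
a <- 14.47   # intercept from the slide
b <- 0.088   # slope from the slide: predicted wins per additional run

# Runs needed for one additional predicted win
runs_per_win <- 1 / b
print(round(runs_per_win, 2))

# Predicted wins for a team that scores 0 runs all season
wins_at_zero <- a + b * 0
print(wins_at_zero)
```

No team scores anywhere near 0 runs in a season, so the second answer is an extrapolation far outside the data and should not be taken literally.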


Example 2: Using the regression line to make predictions

a=-3.27

b=29.36

ŷ=a+b·x

If a team had a batting average of 0.270, how many runs would you expect in a game?

(R/G)expected = 29.36 * BA - 3.27

(R/G)expected = 29.36 * .270 - 3.27

(R/G)expected = 4.6572

How about if a team is batting .250?
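The same calculation can be wrapped in a small function and applied to both batting averages. This is a sketch using a = -3.27 and b = 29.36 from the slide; the function name `predict_runs_per_game` is made up for illustration.

```r
a <- -3.27   # intercept from the slide
b <- 29.36   # slope from the slide

# Predicted runs per game for a given team batting average
predict_runs_per_game <- function(ba) a + b * ba

pred <- predict_runs_per_game(c(0.270, 0.250))
print(round(pred, 4))
```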

