Correlation continued and simple linear regression
Outline for today
Better know a player: Mark Prior
Review and continuation of correlation
Simple linear regression!
Worksheet 3: Jeter BA boxplot
What is the reason for the outlier?
Review
Worksheet 3: interpreting z-scores
BA z-scores
Jeter: 1.02
Ruth: 2.168
Gehrig: -0.389
Mantle: 2.70
Mattingly: 1.46
How do we interpret what a good z-score is?
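A z-score counts how many standard deviations a value lies above or below the mean of its comparison group. The sketch below shows the calculation in R; `league_ba` and `player_ba` are made-up illustrative values, not the worksheet data, and the comparison group used on the worksheet may differ.

```r
# Hypothetical league batting averages (illustrative values only)
league_ba <- c(0.245, 0.258, 0.262, 0.270, 0.275, 0.281, 0.290, 0.301)
player_ba <- 0.301  # hypothetical player of interest

# z-score: distance from the mean in units of standard deviations
z <- (player_ba - mean(league_ba)) / sd(league_ba)
print(z)

# scale() computes the z-score of every value in the vector at once
z_all <- as.numeric(scale(league_ba))
print(round(z_all, 2))
```

If the values are roughly normally distributed, about 95% of z-scores fall between -2 and 2, so a z-score above 2 is unusually high. That rule of thumb depends on the normality assumption.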
Are BAs normally distributed?
Scatter plots
A scatter plot graphs the relationship between two variables.
If there is an explanatory and response variable, then the explanatory variable is put on the x-axis and the response variable is put on the y-axis.
R: plot(x, y)
Runs scored
Winning %
Correlation
The correlation is a measure of the strength and direction of a linear association between two variables.
R: cor(x, y)
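By default, `cor(x, y)` returns the Pearson correlation r. As a rough sketch, the same number can be rebuilt from z-scores: r is the sum of the products of paired z-scores divided by n - 1. The `runs` and `wins` vectors here are hypothetical values chosen only for illustration.

```r
# Hypothetical team totals (illustrative values only)
runs <- c(650, 700, 720, 760, 800, 850)
wins <- c(70, 78, 80, 85, 90, 97)

r <- cor(runs, wins)  # Pearson correlation by default

# Same value from the definition: sum of z-score products over (n - 1)
zx <- (runs - mean(runs)) / sd(runs)
zy <- (wins - mean(wins)) / sd(wins)
r_manual <- sum(zx * zy) / (length(runs) - 1)

print(c(r, r_manual))
```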
Correlation examples
Correlation examples
Runs allowed and wins
r = -0.55
(runs scored)/(runs allowed) and wins
r = 0.93
Correlation cautions
1. A strong positive or negative correlation does not (necessarily) imply a cause and effect relationship between two variables.
2. A correlation near zero does not (necessarily) mean that two variables are not associated. Correlation only measures the strength of a linear relationship.
3. Correlation can be heavily influenced by outliers. Always plot your data!
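Cautions 2 and 3 can be checked numerically. This is a small sketch with made-up numbers: a perfect but curved relationship that yields r = 0, and a nearly uncorrelated cloud that turns into a strong correlation once a single outlier is added.

```r
# Caution 2: a strong but nonlinear relationship can give r near 0
x <- -5:5
y <- x^2
r_curve <- cor(x, y)  # 0 because x is symmetric around 0

# Caution 3: a single outlier can dominate the correlation
x_base <- c(1, 2, 3, 4, 5)
y_base <- c(3, 1, 4, 2, 3)
r_without <- cor(x_base, y_base)

x_out <- c(x_base, 20)  # add one extreme point at (20, 20)
y_out <- c(y_base, 20)
r_with <- cor(x_out, y_out)

print(round(c(curve = r_curve, without = r_without, with = r_with), 3))
```

Plotting each pair with `plot(x, y)` makes both problems obvious at a glance, which is why the slide says to always plot the data.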
Anscombe’s quartet (r = 0.81)
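Anscombe's quartet ships with base R as the `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`), so the claim is easy to check. The sketch below computes the correlation for each of the four data sets and plots them side by side; the 2-by-2 panel layout is just one convenient choice.

```r
# anscombe is part of the datasets package that loads with base R
r_values <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
print(round(r_values, 3))

# Plot all four panels to see how different the data sets look
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)  # restore the previous plotting layout
```

All four correlations agree to about three decimal places even though the scatter plots look nothing alike.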
Offensive statistics
What do the following abbreviations stand for?
H, BB, PA, AB, OBP, BA, SlugPct
Hits (H): 1B + 2B + 3B + HR
Walks (BB): 4 balls
Plate Appearances (PA): Number of times “up”
At Bats (AB): PA - BB
On-Base Percentage (OBP): (H + BB)/PA
Batting Average (BA): H/AB
Slugging percentage (SlugPct): (1·1B + 2·2B + 3·3B + 4·HR)/AB
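The formulas above translate directly into R. This sketch uses a made-up season line (all counts are hypothetical) and follows the slide's simplified definitions; the official definitions include extra terms such as hit-by-pitch and sacrifice flies, which are ignored here.

```r
# Hypothetical season line for one player (illustrative values only)
singles <- 120
doubles <- 30
triples <- 5
hr      <- 20
bb      <- 60   # walks
ab      <- 550  # at bats

h  <- singles + doubles + triples + hr  # hits
pa <- ab + bb  # the slide defines AB = PA - BB, so PA = AB + BB

ba   <- h / ab
obp  <- (h + bb) / pa
slug <- (1 * singles + 2 * doubles + 3 * triples + 4 * hr) / ab

print(round(c(BA = ba, OBP = obp, Slug = slug), 3))
```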
Who would you rather have on your team?
Derek Jeter    David Ortiz
Who is a better hitter: Derek Jeter or David Ortiz?
Jeter has a better batting average
Who is a better hitter: Derek Jeter or David Ortiz?
Ortiz hits more home runs
Who is a better hitter: Derek Jeter or David Ortiz?
Is power or batting average more important?
Compare them based on the “best” statistic
How do we determine which statistic is best?
Runs scored and wins
Runs scored
Wins
The great cycle of baseball
More wins
More fans
More $$$
Better players
Score more runs
We can evaluate how ‘good’ a statistic is based on how well it correlates with the number of runs a team scores
What is the best statistic to use?
One idea: the ‘best’ statistic to judge a player is the statistic that is most correlated with runs
• We can then use this to examine how good a hitter is
We will use a data set that has season total statistics going back to 1961
These statistics include the total runs a team scored, total team HR, total team BA, etc.
load('/home/shared/baseball_stats_2007/data/team_batting_stats.Rda')
What is the best statistic to use?
Statistics to compare:
1. Home runs (HR)
2. Batting average (BA)
3. On-base percentage (OBP)
4. Slugging percentage (Slug)
5. On-base percentage + slugging percentage (OPS)
For each of these 5 statistics:
• Create a scatter plot between the statistic and runs (R)
• Calculate the correlation between the statistic and runs (R)
Once you have found the statistic that is most correlated with runs, create a side-by-side boxplot to compare Derek Jeter's and David Ortiz’s data on this statistic
To get the team yearly total statistics, run: load('/home/shared/baseball_stats_2017/team_batting_stats.Rda')
Useful functions:
• plot(x, y)  # create a scatter plot of different statistics and runs
• cor(x, y)  # calculate the correlation between different statistics and runs
• boxplot(v1, v2, names = c('Derek', 'David'))  # compare players on this ‘best’ statistic
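One way to organize the exercise is to loop over the five statistics rather than repeating the same two calls by hand. The sketch below assumes a data frame named `team_stats` with columns `R`, `HR`, `BA`, `OBP`, `Slug` and `OPS`; the object and column names inside the `.Rda` file are not given on the slides, so these names and all the values are hypothetical stand-ins, and the printed ranking says nothing about the real data.

```r
# Hypothetical stand-in for the loaded data; real names and values may differ
team_stats <- data.frame(
  R    = c(650, 700, 720, 760, 800, 850),
  HR   = c(120, 150, 140, 170, 160, 200),
  BA   = c(0.250, 0.255, 0.262, 0.260, 0.270, 0.275),
  OBP  = c(0.315, 0.322, 0.328, 0.333, 0.340, 0.350),
  Slug = c(0.380, 0.400, 0.405, 0.420, 0.430, 0.455)
)
team_stats$OPS <- team_stats$OBP + team_stats$Slug

stat_names <- c("HR", "BA", "OBP", "Slug", "OPS")

# Correlation of each statistic with runs
cors <- sapply(stat_names, function(s) cor(team_stats[[s]], team_stats$R))
print(round(sort(cors, decreasing = TRUE), 3))

# One scatter plot per statistic, with runs on the y-axis
for (s in stat_names) {
  plot(team_stats[[s]], team_stats$R, xlab = s, ylab = "Runs (R)")
}
```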
Results…
Correlation between HR and runs
Correlation between BA and runs
Correlation between OBP and runs
Correlation between Slug and runs
Correlation between OPS and runs
The winner…
On-base plus slugging seems like the best statistic to use!
Ortiz has a better on-base plus slugging!
Who is a better hitter: Derek Jeter or David Ortiz?
Better know a player: Derek Jeter
Onion infographic
Other Onion articles
Regression
Regression is a method of using one variable to predict the value of a second variable.
In linear regression we fit a line to the data, called the regression line.
Regression line: runs/game as a function of team batting average (2013)
Equation for a line
ŷ=a+b·x
Response=a+b·Explanatory
Wins runs regression
ŵ=14.47+.088·runs
a=14.47
b=.088
ŷ=a+b·x
R: lm(y ~ x)
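In `lm(y ~ x)` the response goes on the left of the `~` and the explanatory variable on the right. This sketch fits a line to made-up `runs` and `wins` values (hypothetical, not the course data) and checks the fitted coefficients against the usual least-squares formulas, b = r · s_y / s_x and a = mean(y) - b · mean(x).

```r
# Hypothetical team-season data (illustrative values only)
runs <- c(650, 700, 720, 760, 800, 850)
wins <- c(70, 78, 80, 85, 90, 97)

fit <- lm(wins ~ runs)  # response ~ explanatory
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["runs"]]

# Least-squares slope and intercept from the closed-form formulas
b_manual <- cor(runs, wins) * sd(wins) / sd(runs)
a_manual <- mean(wins) - b_manual * mean(runs)

print(c(a = a, b = b))

plot(runs, wins)
abline(fit)  # draw the fitted regression line on the scatter plot
```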
Interpreting the slope and intercept
ŷ=a+b·x
The slope b represents the predicted change in the response variable y given a one unit change in the explanatory variable x
The intercept a represents the predicted value of the response variable y if the explanatory variable x were 0
Using the regression line to make predictions
1. Approximately how many additional runs do you need to score for an additional win?
2. How many wins will you have if you score 0 runs all season?
a=14.47
b=.088
ŷ=a+b·x
ŵ=14.47+.088·Runs
Wins runs regression
1. An additional win for ~11 additional runs scored
2. There will be 14.47 wins if you score 0 runs all season
a=14.47
b=.088
ŷ=a+b·x
ŵ=14.47+.088·Runs
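Both answers follow from the fitted coefficients. A minimal sketch using the slide's values for a and b; the helper name `predict_wins` is introduced here only for illustration.

```r
a <- 14.47  # intercept from the slide
b <- 0.088  # slope: predicted wins gained per additional run

# Hypothetical helper: predicted wins for a given season run total
predict_wins <- function(runs) a + b * runs

runs_per_win <- 1 / b            # runs needed for one more predicted win
wins_at_zero <- predict_wins(0)  # equals the intercept a

print(round(runs_per_win, 1))
print(wins_at_zero)
```

The second answer is an extrapolation: no team scores anywhere near 0 runs in a season, so the intercept should not be read as a realistic prediction.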
Example 2: Using the regression line to make predictions
a=-3.27
b=29.36
ŷ=a+b·x
1. If a team had a batting average of 0.270, how many runs would you expect in a game?
(R/G)expected = 29.36 · BA - 3.27
(R/G)expected = 29.36 · 0.270 - 3.27
(R/G)expected = 4.6572
How about if a team is batting .250?
a=-3.27
b=29.36
ŷ=a+b·x
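The same calculation in R, using the slide's a = -3.27 and b = 29.36. The function name `predict_runs_per_game` is a hypothetical helper, not part of any package.

```r
a <- -3.27  # intercept from the slide
b <- 29.36  # slope from the slide

# Hypothetical helper: predicted runs per game for a team batting average
predict_runs_per_game <- function(ba) a + b * ba

rg_270 <- predict_runs_per_game(0.270)
rg_250 <- predict_runs_per_game(0.250)

print(c(BA_270 = rg_270, BA_250 = rg_250))
```

A team batting .270 is predicted to score about 4.66 runs per game, and a team batting .250 about 4.07.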