Sabermetrics The Art and Science of Quantifying an Athlete’s Value Mark Rogers April 2, 2010

SABR

• The Society for American Baseball Research
• Founded in 1971 in Cooperstown, New York
• “To foster the research, preservation, and dissemination of the history and record of baseball”
  • This is certainly made easier by being located in town alongside the National Baseball Hall of Fame.
• 6,700 members
  • Mostly statisticians, sports writers, and former players and officials

A Universe of Statistics

• Baseball fans are particularly fond of using statistics to measure a player’s ability, for several reasons.
  – Low scoring: limits most other quantitative measurements of the only thing affecting the outcome, the points scored.
  – No clock: games can have indeterminate length and scoring chances, so a “fair number of chances to score” can vary.
  – Consistency: the game is played under almost the exact same rules (and using mostly the same strategy) as it was when it premiered in the 19th century, unlike other sports.
    • The modern “live-ball era” of baseball: 1920–present
    • Football: the forward pass altered the way games were played
    • Basketball: the modern “shot clock era” accelerates scoring
    • Hockey/soccer: hey, until recently, who cared?
• This consistency allows current players to be readily compared to almost any other player from the past, unlike most other sports.

The Pioneer

• The first great baseball statistician was Henry Chadwick (1824–1908).
  – An English transplant to Brooklyn, where he followed cricket, rounders, and their American cousin, baseball
  – Wrote summaries of games for New York newspapers, and included a summary table of the game’s major statistics, the box score.
  – For his contributions to the legacy of the game, Chadwick was elected to the Baseball Hall of Fame in 1938.

The Original Baseball Statistics

• Chadwick’s early box scores focused primarily on tabulating easily observable aspects of the game.
  – Hits/H: any ball hit in fair territory, but not easily retrievable by the fielders, resulting in a player safely reaching base
  – Walks (abbreviated BB, for “reached base on balls”)
  – Strikeouts (usually abbreviated K, or occasionally SO)
  – At-bats/AB: # of batting chances not resulting in a walk, since those “ball” pitches are deemed unhittable
  – Stolen bases/SB: safe advances made between hit balls
  – Runs/R: # of times a player scores/crosses home plate
  – Runs batted in/RBI: # of players crossing home plate because of that player’s at-bats, and not other issues

The Original Baseball Statistics

• Chadwick’s early box scores focused primarily on tabulating easily observable aspects of the game.
  – Errors/E: mental or physical lapses resulting in a player being safe who should have theoretically been thrown out, or “running themselves out” by overrunning
  – Single/double/triple (1B/2B/3B): the # of bases safely reached by a player immediately following their hit, not counting any fielding error
  – Home runs/HR: player scores on their own hit, not due to fielder’s error; counts as 4 bases
  – Total extra-base hits/XBH: 2B + 3B + HR
  – Total bases/TB: combined # of bases reached by a player on their own hits, not counting any fielding errors, for an entire game (1B + 2×2B + 3×3B + 4×HR)

The Original Baseball Statistics

• Errors and other issues “outside the player’s control” are often used to distinguish batting performances worthy of credit from those less so.
  – A player who reaches base solely due to a fielding error is not credited with a hit, although the appearance still counts as an official at-bat.
  – Hits followed by an error are scored as the type of hit an errorless version of the play would have resulted in, plus “advanced (or thrown out) on error.”
  – Walks/BB: appearance not counted as an official at-bat
  – Hit by pitch/HBP: player awarded first base as a result, but not given credit for a hit or an at-bat
  – Fielder’s choice/FC: player reached base only because a fielder chose to throw out a runner closer to scoring; thus, the player is not given credit for a hit
  – Double play/DP: batter causes multiple runners to be thrown out; deemed so terrible as to not warrant RBI credit even if a run scores!

Raw Data vs. Average Data

• Since baseball games are of varying length, the number of at-bats (and therefore, the number of “expected” hits, runs, etc.) can vary widely.
  – Therefore, measuring only the raw batting data is not the fairest measure of who is the “best” player.
  – Chadwick devised several alternative statistical methods by calculating averages based on the ratio between the number of achievements made (in batting, pitching, or fielding) and the number of opportunities for them.

The Hitting Averages

• Batting average: a measure of the rate of fair hits made in appropriate batting opportunities:

\[ \text{BA} = \frac{\text{H}}{\text{AB}} \]

  – Chadwick viewed this as superior to the cricket average, which compared the number of runs to the number of outs.
  – “Situational” batting averages can also be measured, such as the batter’s average with “runners in scoring position” (RISP), or one factoring in the # of times they grounded into a double play (GIDP).
• Slugging percentage: a measure of the player’s power, which counts extra-base hits extra, but for the same number of ABs:

\[ \text{SLG} = \frac{\text{TB}}{\text{AB}} = \frac{\text{1B} + (2 \times \text{2B}) + (3 \times \text{3B}) + (4 \times \text{HR})}{\text{AB}} \]
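
As a rough illustration of the two averages on this slide, the Python sketch below computes BA = H / AB and SLG = TB / AB. The function names and the sample stat line (100 singles, 30 doubles, 5 triples, and 20 home runs in 500 at-bats) are hypothetical, chosen only to show the arithmetic.

```python
def total_bases(singles: int, doubles: int, triples: int, home_runs: int) -> int:
    """TB = 1B + 2x2B + 3x3B + 4xHR."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def batting_average(hits: int, at_bats: int) -> float:
    """BA = H / AB."""
    return hits / at_bats


def slugging_percentage(tb: int, at_bats: int) -> float:
    """SLG = TB / AB."""
    return tb / at_bats


# Hypothetical season line (not a real player).
singles, doubles, triples, home_runs, at_bats = 100, 30, 5, 20, 500
hits = singles + doubles + triples + home_runs
tb = total_bases(singles, doubles, triples, home_runs)

ba = batting_average(hits, at_bats)
slg = slugging_percentage(tb, at_bats)
print(f"BA  = {ba:.3f}")
print(f"SLG = {slg:.3f}")
```

Both averages share the same denominator, so the gap between them comes entirely from the extra weight given to doubles, triples, and home runs.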

The Hitting Comparisons

• In addition to the “portion of a whole” ratios of the BA, SLG, OBP, etc., ratios can be measured comparing one type of statistic to another.
  – Walk-to-strikeout ratio (BB/K): measures the hitter’s ability to maximize one and minimize the other
  – Ground-ball-to-fly-ball ratio (G/F): ditto
  – At-bats per home run (AB/HR): measures the rate of home runs, by using its easier-to-work-with reciprocal
    • Mark McGwire holds the all-time career record, with an AB/HR of 10.61 (having hit a home run in 9.4% of his official at-bats).
    • The league average for AB/HR in 2009 was 32.9 (the average player hit a home run in 3.0% of their at-bats).
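
The link between AB/HR and the home-run percentage quoted above is just a reciprocal. The short Python sketch below converts between the two; the function names are hypothetical, and the only data used are the two AB/HR figures given on this slide.

```python
def ab_per_hr(at_bats: int, home_runs: int) -> float:
    """At-bats per home run: AB / HR."""
    return at_bats / home_runs


def hr_percentage(ab_per_hr_value: float) -> float:
    """Share of at-bats ending in a home run, in percent (the reciprocal of AB/HR)."""
    return 100.0 / ab_per_hr_value


mcgwire_pct = hr_percentage(10.61)   # career AB/HR quoted on the slide
league_pct = hr_percentage(32.9)     # 2009 league average quoted on the slide
print(f"McGwire: {mcgwire_pct:.1f}% of at-bats")
print(f"League:  {league_pct:.1f}% of at-bats")
```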

The Triple Crown

• Traditionally, the HR and RBI counts, along with the popularly followed race for the batting-average title, were deemed to be the “major” hitting categories.
• Players who lead the league in all three are said to have won the Triple Crown of hitting.
• However, such a feat is difficult because of the wide gap between a power hitter “swinging for the fences” at the cost of many strikeouts and someone hitting “for average,” aiming for numerous hits even if they were “only” singles.
  – The most recent Triple Crowns were:
    • American League: Carl Yastrzemski (Boston Red Sox), 1967
    • National League: Joe “Ducky” Medwick (St. Louis Cardinals), 1937

A Problem with the System

• One problem with the use of the BA as a gauge of a player’s ability is that it makes no distinction between singles and “bigger” extra-base hits.
  – Thus, a “power hitter” who makes fewer hits, but scores more RBI with a more powerful selection of hits, would be deemed a worse player.
  – Example: Ryan Howard, 2008 (the year he finished 2nd in MVP voting)
    • His 48 home runs and 146 RBI led the league (with 331 total bases in 610 official at-bats), but he had 199 strikeouts to go with them, which helped lower his BA to just .251.
    • He got a hit ¼ of the time, but his TB makes it look as if he did so ½ of the time.
• One solution is to use the slugging percentage (SLG), which gives “extra credit” for these bigger hits.
  – Howard’s SLG for 2008 was .543, much closer to the league-best.
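
The slugging figure in the Howard example can be checked directly from the totals given on this slide. The hit count is not stated here; the value in the second line is only what a .251 average over 610 at-bats implies, so treat it as an inference rather than a quoted statistic.

\[ \text{SLG} = \frac{\text{TB}}{\text{AB}} = \frac{331}{610} \approx .543 \]

\[ \text{H} \approx .251 \times 610 \approx 153 \quad\Rightarrow\quad \text{BA} \approx \frac{153}{610} \approx .251 \]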

Another Problem with the System

• Both the BA and the SLG also fail to count walks as an official at-bat, which fails to give credit to a player with “good eyes” who is able to avoid strikeouts long enough to draw a walk.
  – Example: Pete Rose, 1974
    • In an “off” year, he had a .284 BA, but also a career-best 104 walks.
• A solution to this problem is to use the on-base percentage (OBP), which counts hits and walks (as well as getting hit by a pitch, which can also have tactical advantages) and uses something more closely approximating the total “plate appearances” instead of “at-bats”:

\[ \text{OBP} = \frac{\text{H} + \text{BB} + \text{HBP}}{\text{AB} + \text{BB} + \text{HBP} + \text{SF}} \]

  – SF: sacrifice flies
  – Rose’s 1974 OBP was .385, close to the league lead.
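
To make the effect of walks concrete, the Python sketch below computes OBP = (H + BB + HBP) / (AB + BB + HBP + SF) for two made-up hitters with identical hits and at-bats but very different walk totals. The function name and all numbers are hypothetical.

```python
def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# Two hypothetical hitters with the same .300 batting average (150 H in 500 AB).
obp_few_walks = on_base_percentage(h=150, bb=20, hbp=5, ab=500, sf=5)
obp_many_walks = on_base_percentage(h=150, bb=100, hbp=5, ab=500, sf=5)

print(f"20 walks:  OBP = {obp_few_walks:.3f}")
print(f"100 walks: OBP = {obp_many_walks:.3f}")
```

The batting average cannot tell these two hitters apart, while the OBP separates them clearly.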

To Count, Or Not To Count?

• Some baseball occurrences vary as to which statistical categories they will count toward.
  – Bunt/sacrifice hit/SH: a deliberate attempt to hit the ball so as to allow a runner to advance, at the expense of the batter
    • Like a reverse fielder’s choice, the bunt does not count as a hit, and the attempt does not count as an official at-bat.
    • Assuming the runner is indeed thrown out before reaching base, it will also not count towards the OBP.
    • However, if the runner is deemed to primarily be trying to reach first, it may be scored as a single or out instead.
    • If the batter bunts toward a runner on third (to draw away the third baseman), this squeeze play will result in an RBI credited.

To Count, Or Not To Count?

• Some baseball occurrences vary as to which statistical categories they will count toward.
  – Sacrifice fly/SF: a fly ball hit with less than two outs, that is caught far enough from the infield to allow a runner to score
    • Like a sacrifice hit (bunt), the sacrifice fly does not count as a hit, and the attempt does not count as an official at-bat.
    • However, unlike the sacrifice hit, it will count towards the OBP, as the play is considered more accidental and less a tactical decision.
    • The maneuver’s primary benefit is allowing a runner on third base to score while the caught ball is relayed; thus, the batter is credited with an RBI if successful.

The Best of Both Worlds

• Some armchair statisticians have argued that a measurement fusing the advantages of the SLG and the OBP would be ideal.

• “On-base plus slugging”: OPS = OBP + SLG
• The OPS has become one of the primary statistical benchmarks used for hitters in the modern-day game.
• However, it is not without its share of controversy.
  – The “equal mixture” blends two measurements that normally have very unequal numbers; typically, SLG > OBP, weighting it preferentially.
  – It also has no intrinsic meaning in game-play terms, unlike the BA (the frequency of getting a hit) or OBP (the frequency of reaching base).
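
The weighting objection can be seen with a few lines of Python. This sketch adds the “average” career benchmarks quoted elsewhere in this deck (.330 OBP, .420 SLG) and reports what share of the resulting OPS comes from SLG; the function and variable names are hypothetical.

```python
def ops(obp: float, slg: float) -> float:
    """OPS = OBP + SLG."""
    return obp + slg


average_ops = ops(obp=0.330, slg=0.420)
slg_share = 0.420 / average_ops   # fraction of the OPS contributed by SLG

print(f"OPS = {average_ops:.3f}")
print(f"SLG contributes {slg_share:.0%} of it")
```

Because SLG is usually the larger number, it supplies more than half of a typical OPS even though the formula looks like an even blend.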

Try 1 from Column A, 1 from Column B

• Each of these statistics can also be added to or subtracted from each other for a variety of results.
• Isolated power/IsoP: a measure of the hitter’s power effects without the influence of the number of hits:
  IsoP = SLG − BA
• Secondary average/SecA: a measure of the hitter’s number of bases attained without the influence of the number of hits:

\[ \text{SecA} = \frac{\text{TB} - \text{H} + \text{BB} + \text{SB} - \text{CS}}{\text{AB}} \]

  – Including any gained through walks and stolen bases
  – CS: # of times caught stealing (presumably, additionally subtracted so as to highlight the difference between two players who achieve the same number of stolen bases in a very different number of attempts)
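
A minimal Python sketch of the two derived measures, IsoP = SLG − BA and SecA = (TB − H + BB + SB − CS) / AB, follows. The function names and the season line are hypothetical and serve only to show how the pieces combine.

```python
def isolated_power(slg: float, ba: float) -> float:
    """IsoP = SLG - BA."""
    return slg - ba


def secondary_average(tb: int, h: int, bb: int, sb: int, cs: int, ab: int) -> float:
    """SecA = (TB - H + BB + SB - CS) / AB."""
    return (tb - h + bb + sb - cs) / ab


# Hypothetical season line (not a real player).
h, tb, bb, sb, cs, ab = 155, 255, 60, 10, 4, 500

iso_p = isolated_power(slg=tb / ab, ba=h / ab)
sec_a = secondary_average(tb, h, bb, sb, cs, ab)
print(f"IsoP = {iso_p:.3f}")
print(f"SecA = {sec_a:.3f}")
```

Since SLG and BA share the denominator AB, IsoP is the same as (TB − H) / AB, which is also the first part of the SecA numerator.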

So…What’s “Good”?

• The variety of measurements necessitates some sort of benchmark for what would rank a player among the league’s best in each category.

• Typical career averages (modern “live-ball era,” since 1920):

                 BA                    OBP                               SLG                   OPS
“Average”        .267                  .330                              .420                  .750
“Great”          .300                  .370                              .460                  .830
“Elite”          .325                  .400                              .500                  .900
All-time record  .366 (Ty Cobb)        .482 (Ted Williams)               .690 (Babe Ruth)      1.164 (Babe Ruth)
Active leader    .334 (Albert Pujols)  .427 (Todd Helton/Albert Pujols)  .628 (Albert Pujols)  1.055 (Albert Pujols)

Seeing the Numbers

• In addition to studying the numbers themselves, we can also visualize them using a scatterplot, searching for a presumed correlation between the OBP (the ability to get on base using more than just hits) and the SLG (the ability to get past first base with one’s hits).
• The red lines represent the league average for each statistic.
  – Upper-left: + power, − average
  – Lower-right: − power, + average
  – Lower-left: weaker in both
  – Upper-right: stronger in both

Seeing the Numbers

• We can also plot the best-fit trend line for this scatterplot (shown here in light blue), showing the expected link between hitting for average (OBP) and hitting for power (SLG) that certain players exceed and others trail.
  – Anyone above this line is hitting for more power than their OBP would have suggested.
  – Anyone below this line is hitting for less power than their OBP would have suggested (or possibly getting on base more often than their SLG would have suggested).
• Once again, there is a clear sign of which current player excels at both of these critical areas.
  – Albert Pujols, St. Louis Cardinals

Pitching Statistics

• Much as a hitter is measured over the number of “official” at-bats (AB) or the actual number of “plate appearances” (PA), the pitcher’s performance can be measured by how many batters were faced.
  – The equivalent of the PA is the # of “batters faced” (BF).
  – The equivalent of the BA is the “opponents’ batting average” (OBA), which similarly subtracts any plate appearance not counted as an at-bat for the hitter:

\[ \text{OBA} = \frac{\text{H}}{\text{BF} - \text{BB} - \text{HBP} - \text{SF} - \text{SH} - \text{CI}} \]

    • CI: Catcher Interference with the play (½ times per year per team)
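
As a small sketch of the opponents’ batting average, the Python below strips the non-at-bat plate appearances out of the batters-faced total before dividing, i.e. OBA = H / (BF − BB − HBP − SF − SH − CI). The function name and the pitcher’s season totals are hypothetical.

```python
def opponents_batting_average(h: int, bf: int, bb: int, hbp: int,
                              sf: int, sh: int, ci: int = 0) -> float:
    """OBA = H / (BF - BB - HBP - SF - SH - CI)."""
    official_at_bats = bf - bb - hbp - sf - sh - ci
    return h / official_at_bats


# Hypothetical pitcher's season totals.
oba = opponents_batting_average(h=180, bf=850, bb=50, hbp=5, sf=4, sh=6)
print(f"OBA = {oba:.3f}")
```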

The Pitching Averages

• Earned run average/ERA: a measure of how many earned runs a pitcher would be expected to give up on average over a full 9-inning game, regardless of the actual number of innings pitched (IP):

\[ \text{ERA} = \frac{\text{ER}}{\text{IP}} \times 9 \]

  – “Earned” runs/ER: those directly or indirectly due to the pitcher’s actions, including:
    • Runners who score due to the pitcher’s actions (not any fielders’)
    • Runners left behind by that pitcher (who is “responsible” for them) who later score when the relief pitcher allows a hit
    • But not including runners who score only because a player’s earlier error gave the team an “extra out” to drive them in
  – “Good” ERAs vary widely depending on the era played.
    • 2.00 or less in “pitchers’ eras,” 4.00 or more in “hitters’ eras”
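
The ERA scaling can be sketched in Python as follows. Tracking innings as outs recorded divided by three is an assumption made here to keep partial innings exact; the function names and both sample pitchers are hypothetical.

```python
def innings_from_outs(outs: int) -> float:
    """Convert outs recorded into innings pitched (3 outs per inning)."""
    return outs / 3


def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """ERA = ER / IP x 9, i.e. earned runs scaled to a 9-inning game."""
    return 9 * earned_runs / innings_pitched


# Hypothetical starter: 70 earned runs over 210 innings.
starter_era = earned_run_average(70, innings_from_outs(630))

# Hypothetical reliever: 5 earned runs over 6 2/3 innings (20 outs).
reliever_era = earned_run_average(5, innings_from_outs(20))

print(f"Starter ERA:  {starter_era:.2f}")
print(f"Reliever ERA: {reliever_era:.2f}")
```

The scaling lets a pitcher with only a handful of innings be compared with one who has thrown hundreds, although small samples swing much more from a single bad outing.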

The Other Triple Crown

• For pitchers, the three statistical categories deemed to be the most important have traditionally been wins, strikeouts, and ERA.
  – Like the hitting categories, these were deemed the easiest to follow.
• Players who lead the league in all three are said to have won the Triple Crown of pitching.
• Because the skill sets involved for pitchers are not as disparate as for hitters, some (particularly “power pitchers” excelling in strikeouts as a means to an end) can find winning it easier.
  – The most recent pitching Triple Crowns were:
    • American League: Johan Santana (Minnesota Twins), 2006
    • National League: Jake Peavy (San Diego Padres), 2007

The Pitching Averages

• Much like the batting average “levels the field” of batting statistics between players with different numbers of at-bats, a variety of pitching averages attempt to balance pitchers who have a varying number of innings pitched (IP), in the same manner as the ERA (by dividing the raw data by the number of innings pitched and multiplying by 9).
  – Hits per 9 innings pitched: “H/9” = H ÷ IP × 9
  – Strikeouts per 9 innings: “K/9” = K ÷ IP × 9
  – Walks per 9 innings: “BB/9” = BB ÷ IP × 9
• Since excess walks and hits can still exhaust a pitcher (who must then be replaced) even if they do not result in runs, one popular modern average combines these “trivial” slip-ups.
  – Walks plus hits per inning pitched (WHIP): (BB + H) ÷ IP
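
The per-nine-innings rates and WHIP all follow one pattern, which the Python sketch below captures with a single helper. The names `per_nine` and `whip` and the pitcher’s season totals are hypothetical.

```python
def per_nine(count: int, innings_pitched: float) -> float:
    """Scale a raw count to a 9-inning game: count / IP x 9."""
    return 9 * count / innings_pitched


def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """WHIP = (BB + H) / IP (per inning, not per nine)."""
    return (walks + hits) / innings_pitched


# Hypothetical pitcher's season totals.
h, k, bb, ip = 180, 200, 50, 210.0

h9 = per_nine(h, ip)
k9 = per_nine(k, ip)
bb9 = per_nine(bb, ip)
pitcher_whip = whip(bb, h, ip)

print(f"H/9 = {h9:.2f}, K/9 = {k9:.2f}, BB/9 = {bb9:.2f}, WHIP = {pitcher_whip:.3f}")
```

Note that WHIP is the one measure here that is not multiplied by 9, so it is read per inning rather than per game.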

The Pitching Comparisons

• Many of the ratios used to measure hitting prowess can be inverted to measure pitching prowess.
  – Strikeout-to-walk ratio (K/BB): measures the pitcher’s ability to maximize one and minimize the other
  – Fly-ball-to-ground-ball ratio (F/G): ditto
• The ERAs can be adjusted as well for certain situations, including the “catcher’s ERA” (CERA), the average ERA of the team with a particular catcher playing.
  – A measure of the catcher’s ability to control the game
  – Thus, it is more of a fielding statistic.

Fielding Statistics

• Putouts (PO): the number of outs directly caused by a fielder
• Assists (A): the number of outs in which a player was indirectly involved
• Total (fielding) chances: TC = PO + A + E
  – The total number of opportunities to make a defensive play
• Fielding percentage:

\[ \text{FP} = \frac{\text{PO} + \text{A}}{\text{TC}} = \frac{\text{PO} + \text{A}}{\text{PO} + \text{A} + \text{E}} \]

  – Typically 98.5% (0.985) or better for most players, or slightly lower for difficult defensive positions (third base and shortstop)
• Range factor: RF = (PO + A) ÷ IP × 9
  – A proportional extrapolation of a full game, the fielding equivalent of the ERA; used to gauge the amount
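
The two fielding measures can be sketched in Python as below: FP = (PO + A) / (PO + A + E) and RF = (PO + A) ÷ IP × 9. The function names and the fielder’s totals are hypothetical.

```python
def fielding_percentage(po: int, a: int, e: int) -> float:
    """FP = (PO + A) / TC, where TC = PO + A + E."""
    return (po + a) / (po + a + e)


def range_factor(po: int, a: int, innings: float) -> float:
    """RF = (PO + A) / IP x 9: plays made per 9 innings in the field."""
    return 9 * (po + a) / innings


# Hypothetical fielder's season totals.
po, a, e, innings = 250, 400, 10, 1300.0

fp = fielding_percentage(po, a, e)
rf = range_factor(po, a, innings)
print(f"FP = {fp:.3f}")
print(f"RF = {rf:.2f}")
```

The two numbers answer different questions: FP asks how often a chance is handled cleanly, while RF asks how many plays the fielder gets to in the first place.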

The Prophet

• As general manager of the St. Louis Cardinals and the Brooklyn Dodgers, Branch Rickey profoundly altered baseball, from the integration he pioneered with Jackie Robinson to the “farm system” he invented to find untested players and train them for major-league success in the minor leagues.
  – With both teams struggling when he arrived, he worked to maximize player value by signing them for the lowest cost and training them to their fullest potential.
  – From Ken Burns’ film Baseball: “Nobody knew how to put a dollar sign on the muscle better than Branch Rickey.”

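The Prophet

• Rickey was asked by LIFE magazine if there was a “formula” for baseball success. Skeptical at first, he worked for six months to come up with what he thought might hold the key.

\[ G = (\text{hitting proficiency}) - (\text{pitching proficiency}) \]

\[ G = \left( \frac{\text{H} + \text{BB} + \text{HBP}}{\text{AB} + \text{BB} + \text{HBP}} + \frac{3\,(\text{TB} - \text{H})}{4\,\text{AB}} + \frac{\text{R}}{\text{H} + \text{BB} + \text{HBP}} \right) - \left( \frac{\text{H}}{\text{AB}} + \frac{\text{BB} + \text{HBP}}{\text{BF} + \text{BB} + \text{HBP}} + \frac{\text{ER}}{\text{H} + \text{BB} + \text{HBP}} - \frac{\text{K}}{8\,(\text{BF} + \text{BB} + \text{HBP})} - \text{F} \right) \]

  – Rickey argued that this carefully balanced formula did an excellent job of approximating the final standings at season’s end, even if it violated many long-held beliefs about what was important for a team to win.
  – Hall of Famer George Sisler: “I still don’t believe it, but there it is.”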

Focusing on What Really Matters

• Rickey’s formulas were the precursors to the OBP and SLG: new ways of measuring the effectiveness of a hitter in hopes of finding a better match for the only baseball statistic that really counts, the number of runs scored.
  – Research suggests that a player’s batting average correlates with the team’s run-scoring success only 75% of the time.
  – OPS (OBP + SLG), on the other hand, does so 90% of the time.
• A perfect solution would be to calculate the number of runs each player is personally responsible for, but since scoring runs in baseball is such a communal effort, this is difficult.

The Visionary

• In 1977, baseball writer and statistician Bill James coined the term sabermetrics (based on the acronym SABR) to refer to the statistical analysis of baseball data.
• For years, James wrote a self-published baseball statistical abstract after publishers deemed its subject matter too esoteric for a mainstream audience.
• His work found an obsessive audience of writers, fans, and baseball officials, and was soon published nationwide.
  – It would inspire an entire field of study and copycat publications.
  – 2006: Bill James named one of Time’s 100 Most Influential People

Creating the Runs
• James created a number of new statistical categories.
– Range factor (RF)
– Pythagorean expectation: an estimate of the fraction of its games a team should have won, based on the total runs scored and allowed:

$$\text{Pythagorean expectation} = \frac{(\text{runs scored})^2}{(\text{runs scored})^2 + (\text{runs allowed})^2}$$

• The Pythagorean expectation has a strong correlation with the number of games the team actually goes on to win, although it can be improved still further by using an exponent of 1.82 instead of 2.
– Win shares: like the winnings divided up by a championship team, this formula divides up 3w “shares” of w wins among the players according to the amount each is entitled to, depending on their performance.
• Incorporates hitting, pitching, fielding, and even other “intangible” issues
• However, it is very difficult to calculate; its description in James’s book is 84 pages.
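As a rough illustration of the Pythagorean expectation, the Python sketch below computes the expected winning fraction and scales it to a season. The function names, the 162-game default, and the season totals in the usage section are assumptions made for this example, not figures from the slides.

```python
def pythagorean_expectation(runs_scored: float, runs_allowed: float,
                            exponent: float = 2.0) -> float:
    """Return the estimated fraction of games won (between 0 and 1)."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    if scored + allowed == 0:
        raise ValueError("at least one run total must be positive")
    return scored / (scored + allowed)


def expected_wins(runs_scored: float, runs_allowed: float,
                  games: int = 162, exponent: float = 2.0) -> float:
    """Scale the winning fraction to a number of games (162 is an assumed default)."""
    return games * pythagorean_expectation(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical season totals, chosen only to show the calculation.
    rs, ra = 800, 700
    print(f"exponent 2:    {expected_wins(rs, ra):.1f} expected wins")
    print(f"exponent 1.82: {expected_wins(rs, ra, exponent=1.82):.1f} expected wins")
```

Passing the exponent as a parameter makes it easy to compare the classic value of 2 with the 1.82 refinement mentioned on the slide.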


Creating the Runs

• James created a number of new statistical categories.
– Runs created (RC): a general category of formulas combining an on-base factor A, an advancement factor B, and an opportunity factor C:

$$RC = \frac{A \times B}{C}$$

– This formula can vary widely depending on how those three categories are defined (or refined). One basic formula is:

$$RC = \frac{(H + BB) \times TB}{AB + BB}$$

– Adjustments can be made to this formula by adding in other factors, or by weighting the various factors with coefficients to make them more or less important:

$$RC = \frac{(H + BB - CS) \times (TB + (0.55 \times SB))}{AB + BB}$$
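The runs-created formulas on this slide can be sketched in Python as follows. This is a minimal illustration: the function names are invented for the example, and the abbreviations are read in their usual sense (H hits, BB walks, TB total bases, AB at-bats, SB stolen bases, CS caught stealing).

```python
def runs_created(on_base: float, advancement: float, opportunities: float) -> float:
    """General form: RC = (A x B) / C."""
    if opportunities <= 0:
        raise ValueError("opportunity factor C must be positive")
    return on_base * advancement / opportunities


def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic version: A = H + BB, B = TB, C = AB + BB."""
    return runs_created(h + bb, tb, ab + bb)


def runs_created_stolen_base(h: int, bb: int, cs: int, tb: int,
                             sb: int, ab: int) -> float:
    """Adjusted version: A = H + BB - CS, B = TB + 0.55 * SB, C = AB + BB."""
    return runs_created(h + bb - cs, tb + 0.55 * sb, ab + bb)


if __name__ == "__main__":
    # Hypothetical batting line, chosen only to show the calculation.
    print(round(runs_created_basic(h=150, bb=50, tb=250, ab=500), 1))
    print(round(runs_created_stolen_base(h=150, bb=50, cs=5, tb=250, sb=20, ab=500), 1))
```

Routing both versions through the same A, B, C helper mirrors the slide's point that the variants differ only in how the three factors are defined.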


Creating the Runs

• James created a number of new statistical categories.
– One particularly elaborate version incorporates many small tweaks:

$$RC = \frac{(H + BB - CS + HBP - GIDP) \times (TB + (0.26 \times (BB - IBB + HBP)) + (0.52 \times (SH + SF + SB)))}{AB + BB + HBP + SH + SF}$$

• James also developed formulas to help predict a pitcher’s performance, based on both the ERA and other acts within the game, all carefully balanced to create the “Game Score.”

• Using these very technical formulas, James became one of the leading experts on predicting outcomes in baseball.
– Even he, though, cautioned against overdependence on their use.
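The elaborate runs-created version on this slide has enough terms that it is easy to misplace one, so a small Python sketch may help. The `BattingLine` class and the sample numbers are hypothetical, introduced only for illustration; the coefficients 0.26 and 0.52 are the ones shown in the slide's formula.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season totals for one batter (field names are this example's own)."""
    ab: int    # at-bats
    h: int     # hits
    tb: int    # total bases
    bb: int    # walks
    ibb: int   # intentional walks
    hbp: int   # hit by pitch
    sh: int    # sacrifice hits
    sf: int    # sacrifice flies
    sb: int    # stolen bases
    cs: int    # caught stealing
    gidp: int  # grounded into double play


def runs_created_technical(line: BattingLine) -> float:
    """RC = (A x B) / C with the elaborate definitions of A, B, and C."""
    on_base = line.h + line.bb - line.cs + line.hbp - line.gidp
    advancement = (line.tb
                   + 0.26 * (line.bb - line.ibb + line.hbp)
                   + 0.52 * (line.sh + line.sf + line.sb))
    opportunities = line.ab + line.bb + line.hbp + line.sh + line.sf
    if opportunities <= 0:
        raise ValueError("opportunity factor must be positive")
    return on_base * advancement / opportunities


if __name__ == "__main__":
    # Hypothetical batting line, not taken from any real player.
    sample = BattingLine(ab=500, h=150, tb=250, bb=50, ibb=5, hbp=5,
                         sh=2, sf=3, sb=20, cs=5, gidp=10)
    print(round(runs_created_technical(sample), 1))
```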


The Game Changes
• For a century, professional baseball players’ salaries were restricted by the reserve clause, which guaranteed teams the right of first renewal even after contracts expired.
– In theory, it meant a form of job security; in practice, it was a “non-compete” clause, which meant that players could not choose to pursue higher salaries (or a better team) elsewhere.
– The leagues had been given an anti-trust exemption by the Supreme Court in 1922.

• Following the first MLB strike in 1972, players won raises and, more importantly, binding arbitration on salary issues.
– An arbitrator soon struck down the reserve clause, allowing players whose contract with a team ended after 6 years to declare themselves “free agents,” who could sign for whatever the open market allowed.
– The result was an explosion in player salaries, which forced many teams to take a hard look at where their money could best be spent.


The True Believer
• In 1995, new owners inherited the Oakland A’s, and ordered general manager Sandy Alderson and assistant GM Billy Beane to slash spending on player salaries.
– In 1998, Beane became the GM, “running the team” at just age 35.
– The A’s had been one of the most successful teams of the previous two decades, winning 6 pennants and 4 World Series in a small city.

• Unable to spend freely to acquire talent, Beane was forced to find undervalued players, and used sabermetrics to do so.
– The front office began to emphasize statistics such as OBP, SLG, and fielding ability rather than the traditional favored stats of BA and RBI.
– In 2001 and 2002, the A’s were one of the best teams in baseball, winning over 100 games each year despite the second-lowest payroll.
– Michael Lewis’s book Moneyball chronicles the struggles of the “Beane counters” to convince the team’s skeptical old-guard scouts.


The True Believer

• Since all teams were required to spend a minimum amount on salaries, and all teams won at least 50 games, the key issue was how much extra each team paid for each extra victory.
– Oakland had excelled in this area, paying $500,000 for each “extra” victory on their way to the division title (one of just two teams to spend less than $1 million per extra win), while richer but poorer-performing teams were paying $3 million or more for each of theirs.
– To avoid the high costs of free agency, the A’s were also forced to get as much mileage as possible out of these undervalued stars’ contracts before their success attracted the attention of the big-market teams.
– Before the 2002 season, Oakland had lost Jason Giambi, Johnny Damon, and Jason Isringhausen, three All-Stars whose new $33 million combined annual salaries were as much as the A’s entire team payroll.
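The "cost per extra win" idea on this slide amounts to a single division, sketched below in Python. The 50-win baseline comes from the slide; the baseline payroll and the team figures in the usage section are hypothetical values chosen for illustration, not data from the presentation.

```python
def cost_per_extra_win(payroll: float, wins: int,
                       baseline_payroll: float, baseline_wins: int = 50) -> float:
    """Dollars spent above the minimum payroll for each win above the baseline."""
    extra_wins = wins - baseline_wins
    if extra_wins <= 0:
        raise ValueError("team must win more games than the baseline")
    return (payroll - baseline_payroll) / extra_wins


if __name__ == "__main__":
    # Hypothetical figures: an assumed $7M minimum payroll and two invented teams.
    minimum = 7_000_000
    thrifty = cost_per_extra_win(payroll=34_000_000, wins=102, baseline_payroll=minimum)
    lavish = cost_per_extra_win(payroll=100_000_000, wins=80, baseline_payroll=minimum)
    print(f"thrifty team: ${thrifty:,.0f} per extra win")
    print(f"lavish team:  ${lavish:,.0f} per extra win")
```

Subtracting both the minimum payroll and the baseline wins keeps the comparison focused on discretionary spending, which is the point the slide makes.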


The Apostles

• Following the success of Billy Beane, several other teams hired young general managers who used sabermetrics (rather than a long career as a professional baseball scout) as an integral part of analyzing potential player acquisitions.

• Their success rate varied widely.
– Paul DePodesta
• Named GM of the Los Angeles Dodgers at age 31
• Fired after just his second season, the Dodgers’ second-worst since moving to L.A.
– Theo Epstein
• Named GM of the Boston Red Sox at age 28, the youngest in history
• Hired Bill James as a sabermetric adviser
• Two years later, the Red Sox broke “The Curse,” winning their first World Series in 86 years (as well as another one, three years later).


A Very Serious Subject

• Several professors now teach sabermetrics courses.
– Jim Albert, Bowling Green State University
• www-math.bgsu.edu/~albert/
– Andy Andres, Tufts University
• www.sabermetrics101.com/
– Steven J. Miller, Williams College
• www.williams.edu/go/math/sjmiller/public_html/399/index.htm

• Typically, the courses are elective follow-ups to a standard statistics course, created by baseball fans.
– Student projects often involve designing a series of mathematical models to perform their own analysis.


Play Ball.
PowerPoint slides available at www.mesastate.edu/~mcrogers

Support your team!

Season Opener:
New York Yankees at Boston Red Sox
Sunday, April 4, 6:00 p.m., ESPN2

Opening Day:
St. Louis Cardinals at Cincinnati Reds
Monday, April 5, 11:10 a.m., ESPN


Bibliography
• Jim Albert and Jay Bennett, Curve Ball: Baseball, Statistics, and the Role of Chance in the Game.
• Baseball: A Film by Ken Burns.
• Baseball Reference: www.baseball-reference.com/players/
• ESPN statistics page: espn.go.com/mlb/statistics
• David Grabiner, The Sabermetric Manifesto. (www.baseball1.com/bb-data/grabiner/manifesto.html)
• Bill James’ new website: www.billjamesonline.net/
• Dan Lewis, “Lies, Damn Lies, and RBIs.” National Review, March 31, 2001. (available online at old.nationalreview.com/weekend/play-ball/pb-lewis033101.shtml)
• Michael Lewis, Moneyball: The Art of Winning an Unfair Game.


Bibliography
• MLB Official Rules: mlb.mlb.com/mlb/official_info/official_rules/foreword.jsp
• Branch Rickey, “Goodby to Some Old Baseball Ideas.” LIFE Magazine, August 2, 1954. (available online at www.baseballthinkfactory.org/btf/pages/essays/rickey/goodby_to_old_idea.htm or “scanned” at Google Books)
• SABR: www.sabr.org/
• Alan Schwarz, The Numbers Game: Baseball’s Obsession with Statistics.
• THE print almanac: John Thorn & Pete Palmer, Total Baseball.
• Tom M. Tango, Mitchel Lichtman, & Andrew Dolphin, The Book: Playing the Percentages in Baseball. (insidethebook.com)