SIAM News Blog

Understanding the New Baseball Statistics

By James Case

Smart Baseball: The Story Behind the Old Stats That Are Ruining the Game, the New Ones That Are Running It, And the Right Way to Think About Baseball. By Keith Law. William Morrow, New York, NY, April 2017. 304 pages, $27.99.

Sportswriter Keith Law served as a special assistant to the general manager of the Toronto Blue Jays Baseball Club for several years. His 2017 book, Smart Baseball, describes the quantitative revolution that has swept baseball front offices since Michael Lewis published Moneyball in 2003. All 30 major league teams now have “Analytics Departments” of technically proficient individuals who attempt to improve player procurement practices. In the baseball world, “analytics” is a catch-all phrase that encompasses the collection and storage of data, as well as the use of that data to generate insight. Before spending five to eight million dollars on a recent high school graduate—or several hundred million on an established player—it pays to invest time, effort, and money assessing that player’s potential cash value.

Smart Baseball is divided into three parts. The first section explains how traditional baseball statistics are nearly useless when it comes to evaluating players, while the second describes the bewildering variety of new statistics that have become available in recent years — mainly due to the ever-increasing size and number of online databases. The book’s final section explains how one can use the new statistics to guide player procurement.

Law reserves his harshest condemnation for two familiar baseball statistics: runs batted in (RBIs) and pitcher wins (Ws). Teams, not pitchers, win games, and most of the runs for which batters get credit occur when their teammates are on base ahead of them. It is not that difficult to hit a short fly ball to the outfield with a speedy runner on third base; does the runner deserve less credit than the batter, even though he scored the resulting run? Likewise, a pitcher could pitch badly for five innings—giving up 10 runs while his teammates score 12—before being replaced by a relief pitcher who obliterates the opposition for the next four innings. Why, Law asks, should the inept starting pitcher receive credit for the win instead of the dominating reliever?

Law also considers batting average (AVG)1 to be overrated and uses Figure 1 to justify his skepticism of this statistic. “GORDON” is Dee Gordon, then of the Miami Marlins, who “led the league in batting” in 2015 with an AVG of .333. “HARPER” is Bryce Harper of the Washington Nationals, whose AVG was “only” .330. While Harper finished a close second to Gordon in AVG, he outperformed him by a substantial margin in every other category of interest, including adjusted batting runs (ABR). So does it really make sense to call Gordon the “batting champion”?

Figure 1. Comparison of Dee Gordon of the Miami Marlins and Bryce Harper of the Washington Nationals in 2015, in terms of batting average (AVG), on-base percentage (OBP), slugging percentage (SLG), doubles (2B), home runs (HR), bases on balls (BB), OUTS, and adjusted batting runs (ABR). Figure adapted from Smart Baseball.

Casual fans are often confused by the so-called “triple slash line” that is frequently appended to a player’s name to gauge his prowess as a hitter (e.g., Harper .330/.460/.649). The first number is the usual AVG, which is calculated by the ratio \(AVG= \frac{H}{AB}\). Here, \(H\) is the number of hits made by the player and \(AB\) is the number of his official turns at bat. The middle number is a relatively recent statistic called on-base percentage: \(OBP=(H+BB+HBP)/PA\). \(BB\) is “bases on balls,” \(HBP\) is “hit by pitch,” and \(PA\) is the number of “plate appearances” the player actually made: \(PA=AB+BB+HBP+SF\). \(SF\) stands for “sacrifice flies,” the batter’s fly ball outs that allowed a baserunner to score. Finally, the last number is the less familiar slugging average (SLG), which is the ratio \(TB/AB\), where \(TB\) is the number of “total bases” generated in the same series of “at-bats”: \(TB=1B+2 \times 2B + 3 \times 3B + 4 \times HR\). \(1B\) is singles, \(2B\) is doubles, \(3B\) is triples, and \(HR\) is home runs.
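The three definitions above can be turned into a short calculation. The following Python sketch is only an illustration of those formulas: the class name, field names, and the season totals in the example are made up for this purpose and do not describe any real player.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting stats for one hitter (names are illustrative)."""
    ab: int       # official at-bats (AB)
    h: int        # hits of every kind (H)
    doubles: int  # 2B
    triples: int  # 3B
    hr: int       # home runs (HR)
    bb: int       # bases on balls (BB)
    hbp: int      # hit by pitch (HBP)
    sf: int       # sacrifice flies (SF)

    @property
    def singles(self) -> int:
        # 1B is whatever remains after removing extra-base hits from H.
        return self.h - self.doubles - self.triples - self.hr

    @property
    def pa(self) -> int:
        # PA = AB + BB + HBP + SF, as defined in the text.
        return self.ab + self.bb + self.hbp + self.sf

    @property
    def total_bases(self) -> int:
        # TB = 1B + 2*2B + 3*3B + 4*HR
        return (self.singles + 2 * self.doubles
                + 3 * self.triples + 4 * self.hr)

    def avg(self) -> float:
        return self.h / self.ab

    def obp(self) -> float:
        return (self.h + self.bb + self.hbp) / self.pa

    def slg(self) -> float:
        return self.total_bases / self.ab


def slash_line(line: BattingLine) -> str:
    """Format AVG/OBP/SLG the way broadcasters do, e.g. '.300/.377/.530'."""
    def fmt(x: float) -> str:
        return f"{x:.3f}".lstrip("0")
    return "/".join(fmt(x) for x in (line.avg(), line.obp(), line.slg()))


# Hypothetical season totals, chosen only to exercise the formulas.
example = BattingLine(ab=500, h=150, doubles=30, triples=5,
                      hr=25, bb=60, hbp=5, sf=5)
print(slash_line(example))  # .300/.377/.530
```

Note that AVG and SLG share the denominator \(AB\), while OBP is computed over \(PA\).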

Statisticians can calculate the three “slash line statistics” for both teams and individual players. Moreover, all three correlate strongly with a team’s runs per game \((R/G)\), as Law demonstrates in Figure 2. On-base plus slugging (OPS) is as illegitimate as a statistic can be, since it is calculated by adding two ratios that lack a common denominator, \(OPS=OBP+SLG\), but it evidently works. Teams with high OPS scores accumulate a lot of runs.
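To see what the missing common denominator means, one can write the sum over a single denominator. This is only a restatement of the definitions given earlier, not a formula taken from the book:

\[OPS=\frac{H+BB+HBP}{PA}+\frac{TB}{AB}=\frac{(H+BB+HBP) \times AB+TB \times PA}{PA \times AB}.\]

The combined numerator mixes quantities counted per plate appearance with quantities counted per at-bat, so OPS cannot be read as a rate per opportunity of any single kind.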

Figure 2. The “slash line statistics,” which include batting average (AVG), on-base percentage (OBP), and slugging percentage (SLG), correlate with a team’s runs per game \((R/G)\). Figure adapted from Smart Baseball.

Later in the book, Law devotes an entire chapter to OBP as the measure of a hitter. His reason, roughly speaking, is that “a walk is as good as a hit.” In this new way of thinking, a hitter’s first duty is not to make an out. Each team begins every game with a “supply” of 27 outs; when both teams have exhausted their supplies, the game is over.2 Therefore, a batter who completes a turn at bat without getting out increases the number of runs his team can expect to score in the game. A batter who makes an out diminishes that expectation unless it is a “productive out,” such as a sacrifice fly.

In addition to popularizing OBP, baseball analysts have developed a number of “weighted average metrics” to evaluate specific aspects of player performance. One such metric is “batting runs” (BR), which is calculated as follows:

\[BR=.47H+.38D+.55T+.93HR+.33(BB+HBP)-.28 OUTS.\]

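Here, \(D=2B\) and \(T=3B\). Because \(H\) counts every hit, the coefficients on \(D\), \(T\), and \(HR\) are bonuses added to the .47 already credited for the hit itself; a home run, for instance, is worth \(.47+.93=1.40\) runs. This statistic is easy to compute and particularly amenable to historical studies.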
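A minimal Python sketch of the BR formula follows. The function name and the example totals are hypothetical, and because the text does not spell out how \(OUTS\) is counted, the example simply assumes \(OUTS=AB-H\).

```python
def batting_runs(h: int, d: int, t: int, hr: int,
                 bb: int, hbp: int, outs: int) -> float:
    """BR = .47H + .38D + .55T + .93HR + .33(BB + HBP) - .28 OUTS."""
    return (0.47 * h + 0.38 * d + 0.55 * t + 0.93 * hr
            + 0.33 * (bb + hbp) - 0.28 * outs)


# Hypothetical season: 500 at-bats, 150 hits (30 doubles, 5 triples,
# 25 home runs), 60 walks, 5 hit-by-pitches.
ab, h = 500, 150
outs = ab - h  # assumption: outs approximated as at-bats minus hits
br = batting_runs(h=h, d=30, t=5, hr=25, bb=60, hbp=5, outs=outs)
print(round(br, 2))  # 31.35
```

A positive value means that the hitter produced more runs than the outs he consumed would otherwise have cost his team.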

In recent years, new statistics have emerged that evaluate pitching performance as well as hitting. Traditionally, a pitcher’s effectiveness has been measured by his runs allowed (RA) per nine innings or by his earned run average (ERA), which counts only the earned runs allowed per nine innings. The latter is obviously superior, as it does not hold the pitcher responsible for runs that are attributable to his teammates’ errors in the field. But ERA still depends on the defense even when fielders make no errors, since speedy and/or agile fielders produce outs on balls that would have become hits against less mobile defenders. One attempt to correct for this shortcoming in the ERA statistic is known as batting average on balls in play (BABIP):

\[BABIP=(H-HR)/(AB-K-HR+SF).\]

Home runs \((HR)\) are subtracted from hits \((H)\) in the numerator because they are not balls put in play. For similar reasons, strikeouts \((K)\) and home runs \((HR)\) are subtracted from at-bats \((AB)\) in the denominator, while sacrifice flies \((SF)\) are added because they are balls put in play that do not count as official at-bats. Major league teams use metrics like BABIP to separate a pitcher’s performance from the effects of defense and luck.
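The BABIP formula translates directly into code. In this hedged Python sketch, the function name and the sample totals are invented for illustration; the totals would be those recorded by opposing batters against a given pitcher.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """BABIP = (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play; BABIP is undefined")
    return (h - hr) / balls_in_play


# Hypothetical totals allowed by a pitcher: 500 at-bats, 150 hits,
# 25 home runs, 100 strikeouts, 5 sacrifice flies.
print(round(babip(h=150, hr=25, ab=500, k=100, sf=5), 3))  # 0.329
```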

While devising a single metric that evaluates hitting or pitching prowess is undoubtedly challenging, inventing one that measures fielding ability is even more difficult. The traditional revelation that a player compiled a fielding average of 0.973 in 111 “chances,” for example, merely discloses that he failed to convert three of those chances into outs. Only those three “errors” count against him; any balls that a more mobile defender would have easily reached do not.
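The arithmetic behind the example above is easy to check. The sketch below assumes the usual definition of fielding average as successfully handled chances divided by total chances; the function name is hypothetical.

```python
def fielding_average(chances: int, errors: int) -> float:
    """Fraction of fielding chances handled without an error."""
    if chances <= 0:
        raise ValueError("chances must be positive")
    return (chances - errors) / chances


# 111 chances with three errors, as in the example in the text.
print(round(fielding_average(chances=111, errors=3), 3))  # 0.973
```

Nothing in this calculation records the balls that the fielder never reached, which is exactly the weakness described above.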

However, these traditional metrics are beginning to change. All 30 big league ballparks are now equipped with Statcast, a system that produces vast quantities of information that Major League Baseball (MLB) Advanced Media converts into a data stream of truly heroic proportions. The result is 1.5 billion rows of data that cover all MLB games beginning with Opening Day 2016. To remain competitive, teams must learn to store, access, and (eventually) analyze this data.

Statcast data include the speed, location, and spin rate of each pitch, as well as the exit velocity, launch angle, and direction of every batted ball. This newfound information allows analysts to predict the likely direction of a batter’s hit with surprising accuracy. Several teams have thus improved their defenses by employing exaggerated shifts against vulnerable opponents.

Perhaps the most portentous aspect of Statcast is the information it provides about movement while the ball is in play. Analysts can now identify a fielder’s location both when the batter makes contact with the ball and when the fielder intercepts (or fails to intercept) its path. They also know the amount of time that elapses between those two events. At long last, statisticians may be able to reduce a player’s defensive prowess to a single number.
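As a rough illustration of what two positions and an elapsed time make possible, the following Python sketch computes the ground a fielder covered and his average speed. Everything here is an assumption made for the example: the coordinate system (feet on a flat field), the record layout, and the names are hypothetical and do not reflect Statcast’s actual data format or any official MLB metric.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FielderTrack:
    """One fielder on one batted ball (hypothetical record layout)."""
    start_xy: Tuple[float, float]  # position at bat-ball contact, in feet
    end_xy: Tuple[float, float]    # position at (attempted) interception
    elapsed_s: float               # seconds between the two events


def distance_covered(track: FielderTrack) -> float:
    """Straight-line distance between the two recorded positions."""
    return math.dist(track.start_xy, track.end_xy)


def average_speed(track: FielderTrack) -> float:
    """Average speed in feet per second over the play."""
    if track.elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return distance_covered(track) / track.elapsed_s


# Made-up play: the fielder moves 30 ft sideways and 40 ft back in 4 s.
play = FielderTrack(start_xy=(0.0, 0.0), end_xy=(30.0, 40.0), elapsed_s=4.0)
print(distance_covered(play), average_speed(play))  # 50.0 12.5
```

A real defensive metric would need far more than this, such as the ball’s hang time and the difficulty of the route, but the raw ingredients are the ones listed above.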

Smart Baseball explains far more statistics than it is possible to summarize in a single review. Law’s tone is at times combative, and the casual fan may find his comparisons of little-known players a trifle tedious, but his explanations are both clear and informative. If you have ever found yourself wondering what modern baseball announcers are talking about on television, then this is the book for you.


1 AVG = number of hits / number of official turns at bat.

2 Unless of course the score is tied, in which case each team is issued a resupply of three supplementary outs.

James Case writes from Baltimore, Maryland.
