March 9th, 2012

Stats and History, Part 3

This is the third installment on the subject of how baseball’s statistics evolved from Outs and Runs in 1845. The text below continues the publication online, for the first time, of the opening chapter of The Hidden Game of Baseball (1984). Other statistics introduced before the turn of the century were stolen bases (though not caught stealing), sacrifice bunts, doubles, triples, homers, strikeouts for batters and for pitchers, bases on balls, hit by pitch (HBP), and, erratically, grounded into double plays (GIDP). Caught stealing figures are available on a very sketchy basis in some of the later years of the century, as some newspapers carried the data in the box scores of home-team games. From 1907 on, [Ernie] Lanigan recorded CS in box scores of the New York Press, but the leagues did not keep the figure officially until 1920. The AL has CS from that year to the present, excepting 1927 … [while] the NL kept CS from 1920 to 1925, then not again until 1951. […]

The sacrifice bunt became a prime offensive weapon of the 1880s and began appearing as a statistical entry in box scores by 1889. The totals piled up in the years when a single run was precious—from 1889 to 1893, then again from 1901 to 1920—were stupendous by modern standards (sacrifices counted as at bats until the early 1890s). Hardy Richardson had 68 sacrifice hits in 1891 (in 74 games!), Ray Chapman 67 in 1917; today it is unusual to see a player with as many as 20.

Batter bases on balls (and strikeouts) were recorded for the last year of the American Association, 1891, by Boston’s Clarence Dow, and for some years of the mid-’90s in the National League, but didn’t become an official statistic until 1910 in the NL, 1913 in the AL. Caught stealing, hit by pitch and grounded into double plays were not kept steadily in the nineteenth century, making it impossible for modern statisticians to apply the most sophisticated versions of their measures to early players.

The [next century added] little in the way of new official statistics—ERA and RBIs and SLG are better regarded as revivals despite their respective adoption dates of 1912, 1920, and 1923. These are significant measures, to be sure, but they represent official baseball’s classically conservative response to innovation: wait forty or fifty years, then “make it new.” Running counter to that trend have been baseball’s two most interesting new stats of the century [to 1984], the save and the game winning RBI. Both followed in fairly close relationship to a perception that something was occurring on the field yet, because it was not being measured, it had no verifiable reality [a later analogous situation became the middle reliever’s Hold]. (Another such stat which did not survive, alas, was stolen bases off pitchers, which the American League recorded only in 1920–24.)

The same could have been said back in 1908, in a classic case of a statistic rushing in to fill a void, as Phillies’ manager Billy Murray observed that his outfielder Sherry Magee had the happy facility of providing a long fly ball whenever presented with a situation of man on third, fewer than two outs. Taking up the cudgels on his player’s behalf, Murray protested to the National League office that it was unfair to charge Magee with an unsuccessful time at bat when he was in fact succeeding, doing precisely what the situation demanded. Murray won this point, but baseball flip-flopped a couple of times on this stat, in some years reverting to calling it a time at bat, in other years not even crediting an RBI.

The most delightfully loony stat of the century (though the GWRBI [gave] it a run for the money) was unofficial: the “All-American Pitcher” award, given to the Giants’ reliever Otis Crandall after the 1910 season, then sinking into deserved oblivion. It went like this: Add together a pitcher’s won-lost percentage, fielding percentage, and batting average, and voilà, you get an All-American. Crandall’s combined figures of .810, .984, and .342, respectively, gave him 2,136 points and, according to those in the know, the best mark of all time, surpassing Al Spalding’s 2,096 points of 1875. Who is the all-time All-American since 1910? You tell us. But seriously, folks, the idea wasn’t a bad one—measuring the overall ability of pitchers—it was just that the inadequacies of the individual statistics were magnified by lumping them in this way.
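The award’s arithmetic is simple enough to check. A minimal sketch, using the figures quoted above (the function name and the “points” scaling are our own framing of the text):

```python
# A sketch of the 1910 "All-American Pitcher" score: sum a pitcher's
# won-lost percentage, fielding percentage, and batting average, then
# read the total in "points" (i.e., multiplied by 1,000).
def all_american_points(won_lost_pct, fielding_pct, batting_avg):
    return round((won_lost_pct + fielding_pct + batting_avg) * 1000)

# Otis Crandall, 1910: .810 won-lost, .984 fielding, .342 batting
print(all_american_points(0.810, 0.984, 0.342))  # 2136
```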

There have been other new statistical tabulations in this century, but of a generally innocuous sort: counting intentional bases on balls, balks, wild pitches, shutouts, and sacrifice bunts and sacrifice flies against pitchers. Other new stats of a far superior quality appeared in the 1940s and ’50s but have not yet [as of 1984] gained the official stamp of approval. […]

Now that the genealogy of the more significant official measures has been described, it’s time to evaluate the important ones you saw in the newspapers over breakfast, and a few which are tabulated officially only at year’s end, or are found in the weekly Sporting News. [This sentence today seems particularly wistful and quaint.]

The first offensive statistic to consider will be that venerable, uncannily durable fraud, the batting average. What’s wrong with it? What’s right with it? We’ve recited the objections for the record, but we know as well as anyone else that this monument just won’t topple; the best that can be hoped is that in time fans and officials will recognize it as a bit of nostalgia, a throwback to the period of its invention when power counted for naught, bases on balls were scarce, and no one wanted to place a statistical accomplishment in historical context because there wasn’t much history as yet.

Time has given the batting average a powerful hold on the American baseball public; everyone knows that a man who hits .300 is a good hitter while one who hits .250 is not. Everyone knows that, no matter that it is not true. You want to trade Bill Madlock for Mike Schmidt? Bill Buckner for Darrell Evans? BA treats all its hits in egalitarian fashion. A two-out bunt single in the ninth with no one on base and your team trailing by six runs counts the same as Bobby Thomson’s “shot heard ‘round the world.” And what about a walk? Say you foul off four 3–2 pitches, then watch a close one go by to take your base. Where’s your credit for a neat bit of offensive work? Not in the BA. And a .250 batting average may have represented a distinct accomplishment in certain years, like 1968 when the American League average was .230. That .250 hitter stood in the same relation to an average hitter of his season as a .277 hitter did in the National League in 1983—or a .329 hitter in the NL of 1930! If .329 and .250 mean the same thing, roughly, what good is the measure?
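One remedy implicit in that comparison is a relative batting average: the player’s mark divided by his league’s, which puts .250 in a pitchers’ year and .329 in a hitters’ year on one scale. A sketch, noting that the .230 AL figure of 1968 is from the text while the 1930 NL league average of roughly .303 is our assumption for illustration:

```python
# Relative batting average: a player's average divided by his league's,
# putting hitters from pitching-rich and hitting-rich eras on one scale.
def relative_ba(player_avg, league_avg):
    return player_avg / league_avg

print(round(relative_ba(0.250, 0.230), 3))  # 1.087: a .250 hitter in the 1968 AL
print(round(relative_ba(0.329, 0.303), 3))  # 1.086: roughly the same mark in the 1930 NL
```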

So in attempting to assess batting excellence with the solitary yardstick of the batting average, we tend to diminish the accomplishments of (a) the extra-base hitter, whose blows have greater run-scoring potential, both for himself and for whatever men may be on base; (b) the batter whose talent is to extract walks from pitchers who do not wish to put him on base, or whose power is such that pitchers will take their chances working the corners of the plate rather than risk an extra-base hit; (c) the batter whose misfortune it is to be playing in a period dominated by pitching, either because of the game’s evolutionary cycles or because of rules-tinkering to stem a previous domination by batters; and (d) the man whose hits are few but well-timed, or clutch: they score runs. In brief, the BA is an unweighted average; it fails to account for at least one significant offensive category (not to mention hit by pitch, sacrifices, steals, and grounded into double play); it does not permit cross-era comparison; and it does not indicate value to the team.

And yet, the batting champion each year is declared to be the one with the highest batting average, and this will not soon change. The Hall of Fame is filled with .300 hitters who couldn’t carry the pine tar of many who will stay forever on the outside looking in. Knowledgeable fans have long realized that the abilities to reach base and to produce runs are not adequately measured by batting average, and they have looked to other measures, for example, the other two components of the “triple crown,” home runs and RBIs. Still more sophisticated fans have looked to the slugging percentage or On Base Average. […]

How well do these other stats compensate for the weaknesses of the BA when viewed in conjunction with it or in isolation? The slugging percentage does acknowledge the role of the man whose talent is for the long ball and who may, with management’s blessing, be sacrificing bat control and thus batting average in order to let ‘er rip. (The slugging percentage is the number of total bases divided by at bats rather than hits divided by at bats, which is the BA.) But the slugging percentage has its problems, too.
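In code, the two averages differ only in their numerator; a minimal sketch, with a batting line that is purely hypothetical:

```python
# BA = hits / at bats; SLG = total bases / at bats.
def batting_avg(hits, at_bats):
    return hits / at_bats

def slugging_pct(singles, doubles, triples, homers, at_bats):
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats

# A hypothetical 600-at-bat season: 120 singles, 30 doubles, 5 triples, 25 homers.
hits = 120 + 30 + 5 + 25
print(round(batting_avg(hits, 600), 3))             # 0.3
print(round(slugging_pct(120, 30, 5, 25, 600), 3))  # 0.492
```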

It declares that a double is worth two singles, that a triple is worth one and a half doubles, and that a home run is worth four singles. All of these proportions are intuitively pleasing, for they relate to the number of bases touched on each hit, but in terms of the hits’ value in generating runs, the proportions are wrong. A home run in four at bats is not worth as much as four singles, for instance, in part because the run potential of the four singles is greater, in part because the man who hit the four singles did not also make three outs; yet the man who goes one for four at the plate, that one being a homer, has the same slugging percentage of 1.000 as a man who singles four times in four at bats.
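The one-for-four slugger and the four-for-four singles hitter come out identical, as a quick sketch shows:

```python
# Total bases weight each hit by the bases touched, per the SLG definition.
def total_bases(singles=0, doubles=0, triples=0, homers=0):
    return singles + 2 * doubles + 3 * triples + 4 * homers

slg_homer = total_bases(homers=1) / 4     # 1-for-4, the lone hit a home run
slg_singles = total_bases(singles=4) / 4  # 4-for-4, all singles
print(slg_homer, slg_singles)  # 1.0 1.0, identical despite the three extra outs
```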

Moreover, it is possible to attain a high slugging percentage without being a slugger. In other words, if you have a high batting average, you must have a decent slugging percentage; it’s difficult to hit .350 and have a slugging percentage of only .400. Even a bunt single boosts not only your batting average but also your slugging percentage. […]

Other things the slugging percentage does not do are: indicate how many runs were produced by the hits; give any credit for other offensive categories, such as walks, hit by pitch, or steals; permit the comparison of sluggers from different eras. (If Jimmie Foxx had a slugging percentage of .749 in 1932 and Mickey Mantle had one of .705 in 1957, was Foxx 7 percent superior? The answer is no.) […]

Well, how about On Base Average? It has been around for quite a while and still [in 1984] is not an official statistic of the major leagues. But it does appear on a daily basis in some newspapers’ leaders section, weekly in The Sporting News, and annually in the American League’s averages book (since 1979, when Pete Palmer put it there). The OBA has the advantage of giving credit for walks and hit by pitch, but is an unweighted average and thus makes no distinction between those two events and, say, a grand-slam homer. A fellow like Eddie Yost, who drew nearly a walk a game in some years in which he hit under .250, gets his credit with this stat as does a Gene Tenace, one of those guys whose statistical line looks like zip without his walks. Similarly, players like Mickey Rivers or Mookie Wilson, leadoff hitters with a lot of speed, no power, and no patience, are exposed by the OBA as distinctly marginal major leaguers, even in years when their batting averages look respectable or excellent. In short, the OBA does tell you more about a man’s ability to get on than BA does, and thus is a better indicator of run generation, but it’s not enough by itself to separate “good” hitters from “average” or “poor” ones.
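For reference, OBA is conventionally times on base divided by a plate-appearance denominator. A sketch, with the caveats that the exact official denominator has varied over the years (sacrifice flies are one common inclusion) and that the sample line below is hypothetical, loosely in the Yost mold:

```python
# On Base Average: (hits + walks + hit-by-pitch) over at bats plus
# walks plus hit-by-pitch, with sacrifice flies optionally included.
def on_base_avg(hits, walks, hbp, at_bats, sac_flies=0):
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)

# Hypothetical line: a .243 batting average but 120 walks.
print(round(on_base_avg(hits=125, walks=120, hbp=5, at_bats=515), 3))  # 0.391
```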

RBIs? Don’t they indicate run production and clutch ability? Yes and no. They tell how many runs a batter pushed across the plate, all right, but they don’t tell how many fewer he might have driven in had he batted eighth rather than fourth, or how many more he might have driven in on a team that put more men on base. They don’t even tell how many more runs a batter might have driven in if he had delivered a higher proportion of his hits with men on base. (The American League kept RBI Opportunities—men on base presented to each batter—as an official stat for the first three weeks of 1918, then saw how much work was involved and ditched it.) […]

The RBI does tell you something about run-producing ability, but not enough: It’s a situation-dependent statistic, inextricably tied to factors which vary wildly for individuals on the same team or on others. And the RBI makes no distinction between being hit by a pitch to drive in the twelfth run of a game that concludes 14–3 and, again for comparison, the Thomson blast. [The counting stats are limited in their usefulness, except to say that the fellow who hit 39 doubles was better at that skill than the fellow who hit 38.]

It’s an odd fact that from being the most interesting stat in the early days of baseball, runs has become the least interesting stat of today; it’s odd in that runs remain the essence of baseball, remain the key to victory. What has happened over the years is that the gap between times reached base and runs scored has been almost constantly widening. In 1875 the number of hits allowed per nine innings was, incredibly, not much different from what it is today. Tommy Bond of Hartford allowed only 7.95 hits per nine innings (facing underhand pitching was easy?). Bases on balls were in force at the time, but eight balls were required to get one, which accounts for their scarcity in the 1870s. Today, with walks greatly increased and hits only somewhat reduced, the number of runs per nine innings has dropped dramatically, although not the number of earned runs. Indeed, as the number of runs scored per hit has diminished through the years, the ratio of earned runs to total runs has increased. In 1876, for example, the National League scored 3,066 runs, of which only 1,201—39.2 percent—were earned. By the early 1890s this figure reached 70 percent, an extraordinary advance. It took until 1920 to reach 80 percent, and by the late 1940s it leveled off in the 87–89 percent range, where it remains.
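The 1876 figure can be checked directly from the totals quoted in the text:

```python
# Earned runs as a percentage of all runs scored, NL 1876 (figures from the text).
earned, total = 1201, 3066
print(round(100 * earned / total, 1))  # 39.2
```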

In the fourth and final installment, we will move on to pitching statistics, thus setting the scene for the sabermetric revolution that, in 1984, was still regarded as nerdville and nothing more.