Stats and History, Part 4
This is the fourth and final installment on the subject of how baseball’s statistics evolved. The text below is from the opening chapter of The Hidden Game of Baseball (1984), on which Pete Palmer and I collaborated. On to the pitching statistics, the ones you commonly see. First is wins, with its correlated average of won-lost percentage. Wins are a team statistic, obviously, as are losses, but we credit a win entirely to one pitcher in each game. Why not to the shortstop? Or the left fielder? Or some combination of the three? In a 13–11 game, several players may have had more to do with the win than any pitcher. No matter. We’re not going to change this custom, though Ban Johnson gave it a good try.
To win many games a pitcher generally must play for a team which wins many games (we discount relievers from this discussion because they rarely win 15 or more) or must benefit from extraordinary support in his starts or must allow so few runs that even his team’s meager offense will be enough, as Tom Seaver and Steve Carlton did in the early 1970s. Verdict on both wins and the won-lost percentage: situation-dependent. Look at Red Ruffing’s W-L record with the miserable Red Sox of the 1930s, then his mark with the Yankees. Or Mike Cuellar with Houston, then with Baltimore. Conversely, look at Ron Davis with the Yanks and then the Twins. There is an endless list of good pitchers traded up in the standings by a tailender to “emerge” as stars.
The recognition of the weakness of this statistic came early. Originally it was not computed by such men as Chadwick because most teams leaned heavily, if not exclusively, on one starter, and relievers as we know them today did not exist. As the season schedules lengthened, the need for a pitching staff became evident, and separating out the team’s record on the basis of who was in the box seemed a good idea. It was not and is not a good statistic, however, for the simple reason that one may pitch poorly and win, or pitch well and lose.
The natural corrective to this deficiency of the won-lost percentage is the earned run average—which, strangely, preceded it, gave way to it in the 1880s, and then returned in 1913. Originally, the ERA was computed as earned runs per game because pitchers almost invariably went nine innings. In this century it has been calculated as ER times 9 divided by innings pitched.
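The modern calculation described above can be sketched in a few lines of Python. The function name and the sample season are illustrative assumptions, not figures from the book; innings are derived from outs recorded so that partial innings are handled cleanly.

```python
def earned_run_average(earned_runs: int, outs_recorded: int) -> float:
    """ERA as earned runs times 9, divided by innings pitched."""
    if outs_recorded <= 0:
        raise ValueError("at least one out is needed to compute an ERA")
    innings_pitched = outs_recorded / 3  # three outs per inning
    return earned_runs * 9 / innings_pitched


# Hypothetical season: 80 earned runs over 240 innings (720 outs).
print(round(earned_run_average(80, 720), 2))  # 3.0
```

Under the older earned-runs-per-game method, the same pitcher would get the same figure only if every one of his games ran exactly nine innings.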
The purpose of the earned run average is noble: to give a pitcher credit for doing what he can to prevent runs from scoring, aside from his own fielding lapses and those of the men around him. It succeeds to a remarkable extent in isolating the performance of the pitcher from his situation, but objections to the statistic remain. Say a pitcher retires the first two men in an inning, then has the shortstop kick a ground ball to allow the batter to reach first base. Six runs follow before the third out is secured. How many of these runs are earned? None. (Exception: If a reliever comes on in mid-inning, any men he puts on base who come in to score would be classified as earned for the reliever, though unearned for the team statistic. This peculiarity accounts for the occasional case in which a team’s unearned runs will exceed the individual totals of its staff.) Is this reasonable? Yes. Is it a fair depiction of the pitcher’s performance in that inning? No.
The prime difficulty with the ERA in the early days, say 1913, when one of every four runs scored was unearned, was that a pitcher got a lot of credit in his ERA for playing with a bad defensive club. The errors would serve to cover up in the ERA a good many runs which probably should not have scored. Those runs would hurt the team, but not the pitcher’s ERA. This situation is aggravated further by use of the newly computed ERAs for pitchers prior to 1913, the first year of its official status. Example: Bobby Mathews, sole pitcher for the New York Mutuals of 1876, allowed 7.19 runs per game, yet his ERA was only 2.86—almost a perfect illustration of the league’s 40 percent proportion of earned runs.
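The Mathews example can be checked with simple arithmetic. This is a rough sketch that assumes both of his figures are rates on the same per-game basis, so that their ratio approximates the share of his runs that were earned; the function name is hypothetical.

```python
def earned_share(era: float, runs_allowed_per_game: float) -> float:
    """Approximate fraction of runs allowed that were earned."""
    if runs_allowed_per_game <= 0:
        raise ValueError("runs allowed per game must be positive")
    return era / runs_allowed_per_game


# Bobby Mathews, 1876: 2.86 ERA against 7.19 total runs allowed per game.
print(f"{earned_share(2.86, 7.19):.1%}")  # 39.8%
```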
In modern baseball, post-1946, with 88 out of every 100 runs being earned, the problem has shifted. The pitcher with the bad defense behind him is going to be hurt less by errors than by balls that wind up recorded as base hits which a superior defensive team might have stopped. Bottom line: You pitch for a bad club, you get hurt. There is no way to isolate pitching skill completely unless it is through play-by-play observation and meticulous, consistent bookkeeping.
In a column in The Sporting News on October 9, 1976, Leonard Koppett, in an overall condemnation of earned run average as a misleading statistic, suggested that total runs allowed per game would be a better measure. It is a proposition worth considering, now that the proportion of earned runs has been level for some forty years; one can reasonably assume that further improvements in fielding would be of an infinitesimal nature. [In 2012, this comment seems at least debatable.] However, when you look at the spread in fielding percentage between the worst team and the best, and then examine the number of additional unearned runs scored, pitchers on low-fielding-percentage teams probably still have a good case for continuing to have their effectiveness computed through the ERA. In 1982, for example, in the American League, only 39 of the runs scored against Baltimore were the result of errors; yet Oakland, with the most error-prone defense in the league, allowed 84 unearned runs.
What gave rise to the ERA, and what we appreciate about it, is that like batting average it is an attempt at an isolation stat, a measure of individual performance not dependent upon one’s own team. While the ERA is a far more accurate reflection of a pitcher’s value than the BA is of a hitter’s, it fails to a greater degree than BA in offering an isolated measure. For a truly unalloyed individual pitching measure we must look to the glamour statistic of strikeouts, the pitcher’s mate to the home run (though home runs are highly dependent upon home park, strikeouts to only a slight degree).
Is a strikeout artist a good pitcher? Maybe yes, maybe no, as indicated in the discussion of the Carlton-Ryan-Johnson triad [in the Introduction, not republished in this blog series]; an analogue would be to ask whether a home-run slugger was a good hitter. The two stats run together: periods of high home-run activity (as a percentage of all hits) invariably are accompanied by high strikeout totals. Strikeout totals, however, may soar even in the absence of overzealous swingers, say, as the result of a rules change such as the legalization of overhand pitching in 1884, the introduction of the foul strike (NL, 1901; AL, 1903), or the expansion of the strike zone in 1963.
Just as home-run totals are a function of the era in which one plays, so are strikeouts. The great nineteenth-century totals (Matches Kilroy’s 513, Toad Ramsey’s 499, One Arm Dailey’s 483) were achieved under different rules and fashions. No one in the century fanned batters at the rate of one per inning; indeed, among regular pitchers (154 innings or more), only Herb Score did until 1960. In the next five years the barrier was passed by Sandy Koufax, Jim Maloney, Bob Veale, Sam McDowell, and Sonny Siebert. Walter Johnson, Rube Waddell, and Bob Feller didn’t run up numbers like that. Were they slower, or easier to hit, than Sonny Siebert?
Even in today’s game, which lends itself to the accumulation of, by historic standards, high strikeout totals for a good many pitchers and batters, the strikeout is, as it always has been, just another way to make an out. Yes, it is a sure way to register an out without the risk of advancing baserunners and so is highly useful in a situation like man on third with fewer than two outs; otherwise, it is a vastly overrated stat because it has nothing to do with victory or defeat—it is mere spectacle. A high total indicates raw talent and overpowering stuff, but the imperative of the pitcher is simply to retire the batter, not to crush him. What’s not listed in your daily averages are strikeouts by batters—fans are not as interested in that because it’s a negative measure—yet the strikeout may be a more significant stat for batters than it is for pitchers.
On second thought, maybe it’s just the same. So few errors are being made these days—2 in 100 chances, on average—maybe there’s not a great premium on putting the ball into play anymore. Sure, you might move a runner up with a grounder hit behind him or with a long fly, but on the other hand, with a strikeout you do avoid hitting into a double play. At least that’s what Darryl Strawberry said in his rookie season when asked why he was unperturbed about striking out every third time he came to the plate!
Bases on balls will drive a manager crazy and put lead in fielders’ feet, but it is possible to survive, even to excel, without first-rate control, provided your stuff is good enough to hold down the number of hits. Occasionally you will see a stat called Opponents’ Batting Average, or Opponents’ On Base Average, or Opponents’ Slugging Percentage, all of which seem at first blush more revealing than they are. In fact these calculations are all academic, in that it doesn’t matter how many men a pitcher puts on base. Theoretically he can put three men on base every inning, strand all twenty-seven baserunners, and pitch a shutout. A man who gives up one hit over nine innings can lose 1–0; it’s even possible to allow no hits and lose. Who is the better pitcher? The man with the shutout and twenty-seven baserunners allowed, or the man who allows one hit? No matter how sophisticated your measurements for pitchers, the only really significant one is runs. [Today I might add, “unless you’re evaluating players for purposes of salary offer or acquisition.”]
The nature of baseball at all points is one man against nine. It’s the pitcher against a series of batters. With that situation prevailing, we have tended to examine batting with intricate, ingenious stats, while viewing pitching through generally much weaker, though perhaps more copious, measurements. What if the game were to be turned around so that we had a “pitching order”—nine pitchers facing one batter? Think of that for one minute. The nature of the statistics would change, too, so that your batting stats would be vastly simplified. You wouldn’t care about all the individual components of the batter’s performance, all combining in some obscure fashion to reveal run production. You’d care only about runs. Yet what each of the nine pitchers did would bear intense scrutiny, and over the course of a year each pitcher’s Opponents’ BA, Opponents’ OBA, Opponents’ SLG, and so forth, would be recorded and turned this way and that to come up with a sense of how many runs saved each pitcher achieved.
A stat with an interesting history is completed games. This is your basic counter stat, but it’s taken to mean more than most of those measurements by baseball people and knowledgeable baseball fans. When everyone was completing 90–100 percent of his starts, the stat was without meaning and thus was not kept. As relief pitchers crept into the game after 1905, the percentage of completed games declined rapidly…. By the 1920s it became a point of honor to complete three quarters of one’s starts; today the man who completes half is quite likely to lead his league. [Another sentence that raises an eyebrow in 2012.] So with these shifting standards, what do CGs mean? Well, it’s useful to know that of a pitcher’s 37 starts, he completed 18. That he accepted no assistance in 18 of his 37 games is indisputable; that he required none is a judgment for others such as fans or press to make. There is managerial discretion involved: it is seldom a pitcher’s decision whether to go nine innings or not, and there are different managerial styles and philosophies.
There are the pilots who will say give me a good six or seven, fire as hard as you can as long as you can, and I’ll bring in The Goose to wrap it up. There are others who encourage their starting pitchers to go nine, feeling that it builds team morale, staff morale, and individual confidence. Verdict: situation-dependent, to a fatal degree. CGs tell you as much about the manager and his evaluation of his bullpen as they tell you about the arm or the heart of the pitcher.
Can we say that a pitcher with 18 complete games out of 37 starts is better than one with 12 complete games in 35 starts? Not without a lot of supporting help we can’t, not without a store of knowledge about the individuals, the teams, and especially the eras involved. The more uses to which we attempt to put the stat, the weaker it becomes, the more attenuated its force. If we declare the hurler with 18 CGs “better,” how are we to compare him with another pitcher from, say, fifty years earlier who completed 27 out of 30 starts? Or another pitcher of eighty years ago who completed all the games he started? (Jack Taylor completed every one of the 187 games he started over five years.) Or what about Will White, who in 1880 started 76 games and completed 75 of them? But the rules were different, you say, or the ball was less resilient, or they pitched from a different distance, with a different motion, or this, or that. The point is, there are limits to what a traditional baseball statistic can tell you about a player’s performance in any given year, let alone compare his efforts to those of a player from a different era.
Perhaps the most interesting new statistic of [the last] century is the one associated with the most significant strategic element since the advent of the gopher ball—saves. Now shown in the papers on a daily basis, saves were not officially recorded at all until 1960; it was at the instigation of Jerry Holtzman of the Chicago Sun-Times, with the cooperation of The Sporting News, that this statistic was finally accepted. The need arose because relievers operated at a disadvantage when it came to picking up wins, and at an advantage in ERA. The bullpenners were a new breed, and as their role increased, the need arose to identify excellence, as it had long ago for batters, starting pitchers, and fielders.
The save is, clearly, another stat that hinges on game situation and managerial discretion. If you are a Ron Davis on a team that has a Goose Gossage, the best you can hope for is to have a great won-lost record, as Davis did in 1979 and ’80. To pile up a lot of saves, you have to be saved for save situations, as Martin reserves Gossage; Howser, Quisenberry; or Herzog, Sutter. These relief stars are not brought in with their teams trailing; the game must be tied or preferably the lead is in hand. The prime statistical drawback is that there is no negative to counteract the positive, no stat for saves blown (except, all too often, a victory for the “fireman”).
In April 1982, Sports Illustrated produced a battery of well-conceived, thought-provoking new measurements for relief pitchers which at last attempted, among other things, to give middle and long relievers their due. Alas, the SI method was too rigorous for the average fan, and the scheme dropped from sight. It was a worthy attempt, but perhaps the perfect example of breaking a butterfly on the wheel. The “Rolaids Formula,” which at least takes games lost and games won into account, is a mild improvement over simply counting saves or adding saves and wins. It awards two points for a save or a win and deducts one point for a loss. The reasoning, we suppose, is that a reliever is a high-wire walker without a net—one slip may have fatal consequences. His chances of drawing a loss are far greater than his chances of picking up a win, which requires the intervention of forces not his own.
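The scoring rule just described (two points for a save or a win, minus one for a loss) can be written out as a short Python sketch. The function name and the sample relievers are made up for illustration; only the point values come from the text.

```python
def rolaids_points(saves: int, wins: int, losses: int) -> int:
    """Two points per save or win, minus one point per loss."""
    return 2 * (saves + wins) - losses


# Two hypothetical relievers with the same 30 saves.
closer_a = rolaids_points(saves=30, wins=8, losses=6)   # 2 * 38 - 6
closer_b = rolaids_points(saves=30, wins=3, losses=10)  # 2 * 33 - 10
print(closer_a, closer_b)  # 70 56
```

Counting saves alone would rate the two men as equals; the formula separates them by 14 points, almost entirely on the strength of decisions that depend on the team as much as on the pitcher.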
So today, when we have BABIP, WHIP, VORP, plus video analysis to back up the late-night noodling, we have better ways to evaluate pitching, and especially to correlate it with, or unshackle it from, fielding.