Stats and History, Part 2

Let’s return to the subject of how baseball’s statistics came into being and changed over time. The text below continues, in excerpted form, the publication online, for the first time, of the opening chapter of The Hidden Game of Baseball (1984). Chadwick’s bias against the long ball was in large measure responsible for the game that evolved and for the absence of a hitter like Babe Ruth until 1919. When lively balls were introduced—as they were periodically from the very infancy of baseball—and long drives were being belted willy-nilly, and scores were mounting, Chadwick would ridicule such games in the press. What he valued most in the early days was the low-scoring game marked by brilliant fielding. In the early annual guides, he listed all the “notable” games between significant teams—i.e., those in which the winner scored under ten runs!

Chadwick prevailed, and Hits Per Game became the criterion for the Clipper batting championship and remained so until 1876, when the problem with using games as the denominator in the average at last became clear. If you were playing for a successful team, and thus were surrounded by good batters, or if your team played several weak rivals who committed many errors, the number of at bats for each individual in that lineup would increase. The more at bats one is granted in a game, the more hits one is likely to have. So if Player A had 10 at bats in a game, which was not unusual in the ’60s, he might have 4 base hits. In a more cleanly played game, Player B might bat only 6 times, and get 3 base hits. Yet Player A, with his 4-for-10, would achieve an average of 4.00; the average of Player B, who went 3-for-6, would be only 3.00. By modern standards, of course, Player A would be batting .400 while Player B would be batting .500.
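The arithmetic in this example can be checked with a short Python sketch. The function names are illustrative only and do not come from the original text; the figures are those of the hypothetical Player A and Player B, each credited with a single game.

```python
def hits_per_game(hits, games):
    """1860s-style average: base hits divided by games played."""
    return hits / games


def batting_average(hits, at_bats):
    """Modern average: base hits divided by at bats."""
    return hits / at_bats


# (hits, at bats, games) for the two hypothetical players in the text.
players = {"A": (4, 10, 1), "B": (3, 6, 1)}

for name, (hits, at_bats, games) in players.items():
    print(
        f"Player {name}: {hits_per_game(hits, games):.2f} hits per game, "
        f"{batting_average(hits, at_bats):.3f} batting average"
    )
```

With games as the denominator Player A ranks ahead of Player B; with at bats as the denominator the order reverses.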

In short, the batting average used in the 1860s is the same as that used today except in its denominator, with at bats replacing games. Moreover, Chadwick posited a primitive version of the slugging percentage in the 1860s, with total bases divided by number of games; change the denominator from games to at bats and you have today’s slugging percentage—which, incidentally, was not accepted by the National League as an official statistic until 1923 and the American until 1946 (the game was born conservative). Chadwick’s “total bases average” represents the game’s first attempt at a weighted average—an average in which the elements collected together in the numerator or the denominator are recognized numerically as being unequal. In this instance, a single is the unweighted unit, the double is weighted by a factor of two, the triple by three, and the home run by four. Statistically, this is a distinct leap forward from, first, counting, and next, averaging. The weighted average is in fact the cornerstone of today’s statistical innovations.
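To make the weighting concrete, here is a minimal Python sketch of the two averages described in this paragraph. The function names and the sample batting line (hits, games, at bats) are invented for illustration and do not describe any real player.

```python
# Bases credited for each kind of hit: the weights described in the text.
BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def total_bases(hits):
    """Weighted sum of hits: each hit counts for the bases it is worth."""
    return sum(BASES[kind] * count for kind, count in hits.items())


def total_bases_per_game(hits, games):
    """Chadwick's 1860s 'total bases average': total bases over games."""
    return total_bases(hits) / games


def slugging_percentage(hits, at_bats):
    """Modern form: same numerator, with at bats as the denominator."""
    return total_bases(hits) / at_bats


# Hypothetical season line, made up for this example.
line = {"single": 100, "double": 30, "triple": 5, "home_run": 20}

print(total_bases(line))
print(round(total_bases_per_game(line, games=150), 2))
print(round(slugging_percentage(line, at_bats=500), 3))
```

The only difference between the two averages is the denominator; the weighted numerator is shared.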

The 1870s gave rise to some new batting stats and to the first attempt to quantify thoroughly the other principal facets of the game, pitching and fielding. Although the Clipper recorded base hits and total bases as early as 1868, a significant wrinkle was added in 1870 when at bats were listed as well. This is a critical introduction because it permitted the improvement of the batting average, first introduced in its current form in the Boston press on August 10, 1874, and first computed officially—that is, for the National League—in 1876.

Since then the BA has not changed. [NOTE: later research revealed an earlier inception of the concept for a modern batting average, by Hervie Alden Dobson in the Clipper of March 11, 1871.] The objections to the batting average are well known, but to date [i.e., 1984] have not dislodged the BA from its place as the most popular measure of hitting ability. First of all, the batting average makes no distinction between the single, the double, the triple, and the home run, treating all as the same unit—a base hit—just as its prototype, Runs Per Game, treated the run as its unvarying, indivisible unit. This objection was met in the 1860s with Total Bases Per Game. Second, it gives no indication of the effect of that base hit; in other words, it gives no indication of the value of the hit to the team. This was probably the objection that Chadwick had to tabulating base hits originally, because it is not likely that the idea just popped into his head in 1867, upon which he decided to act immediately; he must have thought of a hit-constructed batting average earlier and rejected it.

A third objection to the batting average is that it does not take into account times first is reached via base on balls, hit by pitch, or error. This, too, was addressed at a surprisingly early date. In 1879 the National League adopted as an official statistic a forerunner of the On Base Average; it was called “Reached First Base.” Paul Hines was the leader that year with 193, which included times reached by error as well as base on balls and base hits. But the figure was dropped after that year. […]

The year 1876 was significant not only for the founding of the National League and the official debut of the batting average in its current form; it was also the Centennial of the United States, which was marked by a giant exposition in Philadelphia celebrating the mechanical marvels of the day. American ingenuity reigned, and technology was seen as the new handmaiden of democracy. Baseball, that mirror of American life, reflected the fervor for things scientific with an explosion of statistics far more complex than those seen before, particularly in the previously neglected areas of pitching and fielding. The increasingly minute statistical examination of the game met a responsive audience, one primed to view complexity as an indication of quality.

When the rule against the wrist-snap was removed in 1872, permitting curve pitching, and as the number of errors declined through the early 1870s—thanks to the heightened level of competition provided by baseball’s first professional league, the National Association—the number of runs scored dropped off markedly.

With the pitcher unshackled—transformed from a mere delivery boy of medium-paced straight balls to a formidable adversary—the need to identify excellence, to plot the stars, arose just as it had for batters in the 1860s. Likewise, as fielding errors became more the exception than the rule, they became at last worth counting and contrasting with chances accepted cleanly: in other words, the fielding percentage. Fielding skill was still the most highly sought-after attribute of a ballplayer, but the balance of fielding, batting, and pitching was in flux; by the 1880s pitching and batting would begin their long rise to domination of the game, Chadwick’s tastes notwithstanding.

The crossroads of 1876 highlights how the game had changed to that point, and how it has changed since.

In that year, the number of offensive stats tabulated at season’s end … was six: games, at bats, runs, hits, runs per game, and batting average. Of these, only runs and runs per game were common in the 1860s, while that decade’s tabulation of total bases vanished. The number of [official] offensive stats a hundred years later? Twenty. (Today [i.e., 1984] the number is twenty-one, with the addition of the game-winning RBI.)

The number of pitching categories in 1876 was eleven, and there were some surprises, such as earned run average, hits allowed, hits per game, and opponent’s batting average. Strikeouts were not recorded, for Chadwick saw them strictly as a sign of poor batting rather than good pitching (his view had such an impact that the pitchers’ K’s were not kept officially until 1887). The number of [official] pitching stats today [i.e., 1984]? Twenty-four.

The number of fielding categories in 1876 was six. One hundred years later it was still six (with the exception of the catcher, who gets a seventh: passed balls), dramatizing how the game—at least the hidden game of statistics—had passed fielding by. The fielding stats of 1876 were combined to form an average, the “percentage of chances accepted,” or fielding percentage. A “missing link” variant, devised by Al Wright in 1875, was to form averages by dividing the putouts by the number of games to yield a “putout average”; dividing the assists similarly to arrive at an “assist average”; and to divide putouts plus assists by games to get “fielding average.” These averages took no account of errors. (Does Wright’s “fielding average” look familiar? You may have recognized it as Bill James’s Range Factor! Everything old is new again.)
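The fielding measures described in this paragraph can be sketched in a few lines of Python. The function names and the sample totals are invented for illustration; the fielding percentage is taken in its usual form, chances accepted (putouts plus assists) divided by total chances (putouts plus assists plus errors).

```python
def fielding_percentage(putouts, assists, errors):
    """Share of total chances that were accepted cleanly."""
    return (putouts + assists) / (putouts + assists + errors)


def wright_averages(putouts, assists, games):
    """Al Wright's 1875 per-game averages, which take no account of errors."""
    return {
        "putout_average": putouts / games,
        "assist_average": assists / games,
        "fielding_average": (putouts + assists) / games,
    }


# Hypothetical fielder, made up for this example.
putouts, assists, errors, games = 120, 180, 60, 60

print(round(fielding_percentage(putouts, assists, errors), 3))
print(wright_averages(putouts, assists, games))
```

The first function rewards avoiding errors, while Wright's averages reward the volume of plays made per game, which is the idea behind the later Range Factor.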

This is all testimony to the changing nature of the game—not just to the evolving approaches of statisticians, but to fundamental changes in the game. […] The public’s appetite for new statistics was not sated by the outburst of 1876. New measures were introduced in dizzying profusion in the remaining years of the century. Some of these did not catch on and were soon dropped, some for all time, others only to reappear with renewed vigor in the twentieth century.

The statistic that never resurfaced after its solitary appearance in 1880 was “Total Bases Run,” a wonderfully silly figure which signified virtually nothing about either an individual’s ability in isolation or his value to his team. It was sort of an RBI in reverse, or from the baserunner’s perspective. Get on with a single, proceed to score in whatever manner, and you’ve touched four bases. Abner Dalrymple of Chicago was baseball history’s only recorded leader in the category with 501. Now there’s a major league trivia question.

Another stat that was stillborn in the 1870s was times reached base on error (it was computed again in 1917–19 by the NL, then dropped for all time). Its twentieth-century companion piece, equally short-lived after its introduction in the 1910s, was runs allowed by fielders. Lanigan records this lovely bit of doggerel written to “honor” Chicago shortstop Red Corriden, whose errors in 1914 let in 20 runs:

Red Corriden was figuring the cost of livelihood.

“‘Tis plain,” he said, “I do not get the money I should.

According to my figrin’, I’d be a millionaire

If I could sell the boots I make for 30 cents a pair.”

Previously mentioned was another stat which blossomed in only one year (1879), Reached First Base. This resurfaced, however, in the early 1950s in an improved form called On Base Average, which may be the most widely familiar of all unofficial statistics. [It was made official in 1985, the year after publication of Hidden Game.] In the same manner, the “total bases per game” tabulation of the 1860s vanished only to be named an official stat decades later in its modified version of slugging percentage. And yet another 1860s stat, earned run average, dropped from sight in the 1880s only to return triumphant to the NL in 1912 and the AL in 1913, when Ban Johnson not only proclaimed it official but also dictated that the AL compile no official won-lost records (this state of affairs lasted for seven years, 1913–19).

Another stat which was “sent back to the minors” before settling in for good in 1920 was the RBI. Introduced by a Buffalo newspaper in 1879, the stat was picked up the following year by the Chicago Tribune, which in the words of Preston D. Orem, “proudly presented the ‘Runs Batted In’ record of the Chicago players for the season, showing Anson and Kelly in the lead.” Readers were unimpressed. Objections were that the men who led off, Dalrymple and Gore, did not have the same opportunities to knock in runs. The paper actually wound up “almost apologizing for the computation.” Even then astute fans knew the principal weakness of the statistic to be its extreme dependence on situation—in a particular at bat, whether or not men are on base; over a season or career, one’s position in the batting order and the overall batting strength of one’s team. It is a curious bit of logical relativism to observe that fans of the nineteenth century rejected ribbies because of their poor relation to run-producing ability while twentieth-century fans embrace the stat for its presumed indication of that same quality.

More tomorrow!

