Moneyball (Movie Tie-In Edition)
Authors: Michael Lewis
The failure of baseball people to acknowledge that fact in their statistics led to exactly the sort of moral corruption Henry Chadwick, in creating them, had sought to eliminate. The many little injustices and misunderstandings embedded in the game’s records spawned exotic inefficiencies. Baseball strategies were often wrongheaded and baseball players were systematically misunderstood. Chadwick succeeded in creating a central role for statistics in baseball, but in doing it he created the greatest accounting scandal in professional sports.
Between Chadwick and James there had been fitful efforts to rethink old prejudices. The legendary GM Branch Rickey employed a professional statistician named Allan Roth who helped to compose an article under Rickey’s byline in Life magazine in 1954 that argued for the importance of on-base and slugging percentages over batting average. A professor of mechanical engineering at Johns Hopkins, Earnshaw Cook, wrote two pompous books, in prose crafted to alienate converts, that argued for the relevance of statistical analysis in baseball. In the early 1960s, a pair of brothers employed by IBM used the company’s computers to analyze baseball strategies and players. But the desire to use statistics to make baseball efficient—to measure and value precisely the events that occur on a baseball field, to give the numbers new powers of language—only became potent when it became practical.
When Bill James published his 1977 Baseball Abstract, two changes were about to occur that would make his questions not only more answerable but also more valuable. First came radical advances in computer technology: this dramatically reduced the cost of compiling and analyzing vast amounts of baseball data. Then came the boom in baseball players’ salaries: this dramatically raised the benefits of having such knowledge. “If we’re going to pay these guys $150,000 a year to do this,” James concluded in his essay on fielding, “we should at least know how good they are—which means knowing how much they allowed in the field just as much as it means knowing how much they created at bat.” If this sounded compelling when baseball players were paid $150,000 a year, it sounded one hundred times more so when they were paid $15 million a year.
James’s first proper essay was the preview to an astonishing literary career. There was but one question he left unasked, and it vibrated between his lines: if gross miscalculations of a person’s value could occur on a baseball field, before a live audience of thirty thousand, and a television audience of millions more, what did that say about the measurement of performance in other lines of work? If professional baseball players could be over- or undervalued, who couldn’t? Bad as they may have been, the statistics used to evaluate baseball players were probably far more accurate than anything used to measure the value of people who didn’t play baseball for a living.
Still, had he left off writing in 1977, James would have been dismissed as just another crank who didn’t know when to shut up about box scores. He didn’t leave off in 1977. It didn’t occur to him to be disappointed by a sale of seventy-five copies; he was encouraged! No author has ever been so energized by so little. As James’s wife, Susan McCarthy, later put it, “instead of one page of a stolen base study lying on top of a couple of pages of pitcher data in the dungeon of a Stokely Van Camp’s cardboard box for years and years, ideas and questions about issues he had been chewing on for a long time took up residence in a climate that allowed for growth and maturation.”
*
In 1978, James came out with a second book, and this time, before entering his discussion, he checked his modesty at the door. The book was titled 1978 Baseball Abstract: The 2nd Annual Edition of Baseball’s Most Informative and Imaginative Review. “I would like to produce here the most complete, detailed, and comprehensive picture of the game of baseball available anywhere,” he wrote, “and I would like to avoid repeating anything that has ever been written before.”
Word had spread this time: 250 people bought a copy. To an author who viewed a sale of 75 copies as encouragement, the sale of 250 was a bonanza. James’s pen was now an unstoppable force. Every winter for the next nine years he wrote with greater confidence; every spring his growing audience found relatively less space devoted to numbers and more to James’s words. The words might run on for many pages but they were typically presented as digressions from the numbers. Wishing to convey the history of his obsession with baseball, for instance, James buried it in a discussion of the year-end stats of the Kansas City Royals. Unable to suppress his distaste for the rich men who bought baseball teams and spent huge sums of money on players, he left off writing about the Atlanta Braves and picked up the subject of their new owner. “Ted Turner,” he wrote, “seems never to have been tempted by moderation, by dignity or restraint. He is a man who plays hard at gentleman’s games and whines when he loses that the victor was not a gentleman. No matter how hard he flees, he will always be pursued by an Awful Commonness, and that is what makes him a winner.” (Yankees fans would soon learn that James was capable of greater contempt: “Turner is the man Steinbrenner dreams of being.”)
The Baseball Abstracts were one long, elaborate aside, and the aside raised all sorts of strange new questions: If Mike Schmidt hit against the Cubs all the time, what would he hit? Did fleet young black players, as it seemed to James, actually lose their speed later in their careers than fleet young white players? Who were the best dead hitters? Even the most obscure questions about baseball, and its history, had practical implications. To calculate what Mike Schmidt would hit if he hit only against the Chicago Cubs, you needed to understand how hitting in Wrigley Field differed from hitting in other parks. To compare white and black speedsters, you needed to find a way to measure speed on the base paths and in the field; and once you’d done that, you might begin to ask questions about the importance of foot speed. To determine the best dead hitters, you needed to build tools to evaluate them, and those tools worked just as well on the living.
That last problem preoccupied James. From his second season on, he more or less set baseball defense to one side and concentrated on baseball offense. He explained to the readers of the second Abstract that his book contained roughly forty thousand baseball statistics. A few of them had been easy for him to obtain, but “the bulk of them were compiled one by one, picked out of the box scores and laboriously sorted into groups of about 30 or so, groups with titles like ‘Double Plays turned in games started by Nino Espinosa,’ and ‘Triples hit by Larry Parrish in July.’” He freely admitted that collecting baseball statistics was, on the face of it, a bizarre way to spend one’s time—unless one was obsessed by the baseball offense. “I am a mechanic with numbers,” he wrote to readers of the third Abstract,
tinkering with the records of baseball games to see how the machinery of the baseball offense works. I do not start with the numbers any more than a mechanic starts with a monkey wrench. I start with the game, with the things that I see there and the things that people say there. And I ask: Is it true? Can you validate it? Can you measure it? How does it fit with the rest of the machinery? And for those answers I go to the record books…. What is remarkable to me is that I have so little company. Baseball keeps copious records, and people talk about them and argue about them and think about them a great deal. Why doesn’t anybody use them? Why doesn’t anybody say, in the face of this contention or that one, “Prove it”?
For what now seem like obvious reasons the baseball offense was more interesting to James than the other two potentially big fields of research, fielding and pitching. Hitting statistics were abundant and had, for James, the powers of language. They were, in his Teutonic coinage, “imagenumbers.” Literary material. When you read them, they called to mind pictures. “Let us start with the number 191 in the hit column,” he wrote,
and with the assertion that it is not possible for a flake (I would hope that no one reading this book doesn’t know what a flake is) to get 191 hits in a season. It is possible for a bastard to do this. It is possible for a warthog to do this. It is possible for many people whom you would not want to marry your sister to do this. But to get 191 hits in a season demands (or seems to demand, which is as good for the drama) a consistency, a day-in, day-out devotion, a self-discipline, a willingness to play with pain and (to some degree) a predisposition to the team game which is wholly inconsistent with flakiness. It is entirely possible, on the other hand, for a flake to hit 48 homers. Hitting 48 homers is something done by large, slow men three-quarters thespian….
James was an aesthete. But he was also a pragmatist: he had happened upon something broken and wanted to fix it. But he could only fix what he had the tools to fix. The power of statistical analysis depends on sample size: the larger the pile of data the analyst has to work with, the more confidently he can draw specific conclusions about it. A right-handed hitter who has gone two for ten against left-handed pitching cannot as reliably be predicted to hit .200 against lefties as a hitter who has gone 200 for 1,000. The offensive statistics available to James in 1978 were sufficiently comprehensive to reach specific, meaningful conclusions. Offense he could fix. He couldn’t fix fielding because, as he had explained in his first Abstract, there wasn’t the data available to make a meaningful appraisal of fielding. Pitching didn’t need to be fixed. Or, at any rate, James didn’t think it did.
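The sample-size point can be made concrete with an interval estimate of a hitter's true average against lefties. The sketch below is illustrative only: it assumes each at-bat is an independent trial with one fixed hit probability (real at-bats are not), and it uses the Wilson score interval, a standard binomial confidence interval chosen here as an assumption, since the text names no method. `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(hits: int, at_bats: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a hitter's true average.

    Assumption: at-bats are independent trials sharing one hit probability.
    """
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    p = hits / at_bats  # observed average
    z2 = z * z
    denom = 1 + z2 / at_bats
    center = (p + z2 / (2 * at_bats)) / denom
    half_width = z * math.sqrt(p * (1 - p) / at_bats + z2 / (4 * at_bats**2)) / denom
    return center - half_width, center + half_width


# Both hitters show a .200 average against lefties; only the sample size differs.
small = wilson_interval(2, 10)
large = wilson_interval(200, 1000)
print(f"2 for 10:     {small[0]:.3f} to {small[1]:.3f}")
print(f"200 for 1000: {large[0]:.3f} to {large[1]:.3f}")
```

With these inputs the two-for-ten hitter's plausible range runs from roughly .057 to .510, while the 200-for-1,000 hitter's runs from roughly .176 to .226: the same observed average, with very different confidence behind it.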
In 1979, in the third, now annual, Baseball Abstract, James wrote, “a hitter should be measured by his success in that which he is trying to do, and that which he is trying to do is create runs. It is startling, when you think about it, how much confusion there is about this. I find it remarkable that, in listing offenses, the league will list first—meaning best—not the team which scored the most runs, but the team with the highest batting average. It should be obvious that the purpose of an offense is not to compile a high batting average.” Because it was not obvious, at least to the people who ran baseball, James smelled a huge opportunity. How did runs score? “We can’t directly see how many runs each player creates,” he wrote, “but we can see how many runs each team creates.”
He set out to build a model to predict how many runs a team would score, given its number of walks, hits, stolen bases, etc. He’d dig out the numbers for, say, the 1975 Red Sox. (Walks by individual players were still hard to find in 1975, thanks to Henry Chadwick, but team totals were available.) He could also find out how many runs the 1975 Red Sox scored. What he needed to determine was the relative importance to the team’s scoring of the various things Red Sox players did at the plate and on the base paths—that is, assign weights to outs, walks, steals, singles, doubles, etc. There was nothing elegant or principled in the way he went about solving the problem. He simply tried out various equations on the right side of the equals sign until he found one that gave him the team run totals on the left side. The first version of what James called his “Runs Created” formula looked like this:
Runs Created = (Hits + Walks) × Total Bases / (At Bats + Walks)
Crude as it was, the equation could fairly be described as a scientific hypothesis: a model that would predict the number of runs a team would score given its walks, steals, singles, doubles, etc. You could plug actual numbers from past seasons into the right side and see if they gave you the runs the team scored that season. James was, in a sense, trying to predict the past. If the actual number of runs scored by the 1975 Boston Red Sox differed dramatically from the predicted number, his model was clearly false. If they were identical, James was probably onto something. As it turned out, James was onto something. His model came far closer, year in and year out, to describing the run totals of every big league baseball team than anything the teams themselves had come up with.
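James's procedure of "predicting the past" can be sketched as a back-test: compute Runs Created from a team's season totals with the first version of the formula, then compare it with the runs the team actually scored. The names `TeamSeason`, `runs_created`, and `back_test` are hypothetical, and the sample totals are invented placeholders, not real season data; actual team totals would have to be substituted for a genuine check.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    name: str
    hits: int
    walks: int
    total_bases: int
    at_bats: int
    actual_runs: int


def runs_created(hits: int, walks: int, total_bases: int, at_bats: int) -> float:
    """First version of the Runs Created formula: (H + BB) x TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


def back_test(seasons: list[TeamSeason]) -> float:
    """Print predicted vs. actual runs per season and return the mean absolute error."""
    if not seasons:
        raise ValueError("need at least one season")
    total_error = 0.0
    for s in seasons:
        predicted = runs_created(s.hits, s.walks, s.total_bases, s.at_bats)
        error = predicted - s.actual_runs  # positive means the model over-predicted
        total_error += abs(error)
        print(f"{s.name}: predicted {predicted:.0f}, actual {s.actual_runs}, off by {error:+.0f}")
    return total_error / len(seasons)


# Hypothetical team totals, for illustration only.
seasons = [
    TeamSeason("Team A", hits=1500, walks=550, total_bases=2250, at_bats=5500, actual_runs=770),
    TeamSeason("Team B", hits=1400, walks=480, total_bases=2050, at_bats=5450, actual_runs=655),
]
print(f"Mean absolute error: {back_test(seasons):.1f} runs")
```

A small average error across many real seasons is what the text means by the model coming close "year in and year out"; a large one would falsify it.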
That, in turn, implied that professional baseball people had a false view of their offenses. It implied, specifically, that they didn’t place enough value on walks and extra base hits, which featured prominently in the “Runs Created” model, and placed too much value on batting average and stolen bases, which James didn’t even bother to include. It implied that sacrifices of any sort were aptly named, as they made no contribution whatsoever. That is: outs were more precious than baseball people believed, or seemed to believe. Not all baseball people, of course. The Jamesean analysis was consistent with an approach to the game championed most vocally by the former manager of the Baltimore Orioles, Earl Weaver. Weaver designed his offenses to maximize the chances of a three-run homer. He didn’t bunt, and he had a special taste for guys who got on base and guys who hit home runs. Big ball, as opposed to small ball.
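The claim about batting average can be checked with the same formula. In the sketch below, two hypothetical teams (invented totals, not historical data) have identical hits and at-bats, and therefore identical batting averages, but one draws more walks and hits for more extra bases. Note that the formula has no term for stolen bases at all. The helper names are illustrative.

```python
def runs_created(hits: int, walks: int, total_bases: int, at_bats: int) -> float:
    """First version of the Runs Created formula: (H + BB) x TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


def batting_average(hits: int, at_bats: int) -> float:
    return hits / at_bats


# Hypothetical teams: same hits and at-bats (a .250 average),
# differing only in walks and total bases (extra-base power).
singles_team = dict(hits=1375, walks=450, total_bases=1900, at_bats=5500)
patient_power_team = dict(hits=1375, walks=650, total_bases=2250, at_bats=5500)

for label, t in [("singles team", singles_team), ("patient power team", patient_power_team)]:
    avg = batting_average(t["hits"], t["at_bats"])
    rc = runs_created(**t)
    print(f"{label}: AVG {avg:.3f}, Runs Created {rc:.0f}")
```

With these invented totals both teams bat .250, yet the second team's Runs Created comes out roughly 158 runs higher, a difference a league table sorted by batting average would not show.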
But once again, the details of James’s equation didn’t matter all that much. He was creating opportunities for scientists as much as doing science himself. Other, more technically adroit people would soon generate closer approximations of reality. What mattered was (a) it was a rational, testable hypothesis; and (b) James made it so clear and interesting that it provoked a lot of intelligent people to join the conversation. “The fact that the formulas work with the accuracy that they do is a way of saying there are essentially stable relationships between batting average, home runs, walks, other offensive elements—and runs,” wrote James.
This kind of talk was catnip to people whose lives were devoted to discovering stable relationships in a seemingly unstable world: physicists, biologists, economists. There was a young statistician at the RAND Corporation, a future chair of the Harvard statistics department, named Carl Morris. “I’d been thinking about advanced ideas in baseball analysis,” said Morris, “and was impressed that someone else was, too, who wrote about it in a very interesting way.” Morris counted the days until the next Baseball Abstract appeared. James pointed the way to big questions that Morris could address more rigorously than even James could.