Authors: Stephen Baker
Ferrucci leaned forward, looking agitated, and said to no one in particular, “It doesn't feel right. Did you leave off half the system?” His colleagues, all typing on their laptops, kept their heads down and murmured that they hadn't. To engage Ferrucci when he was in a darkening mood could backfire. No one was looking for a confrontation this early in the morning.
Watson continued to malfunction. As the two Jeopardy players outscored the machine, it developed a small speech defect. Its genial male voice started to add a “D” to words ending in “N.” In the category the Second Largest City, Watson buzzed for the clue, Lahore, and confidently answered, “What is Pakistand?” After a short consultation, the game judge, strictly following the rules, declared the answer incorrect. That turned Watson's $600 gain into a loss, a difference of $1,200. “This is ridiculous,” Ferrucci muttered.
Then Watson, a still faceless presence at the far left podium, began to place some ludicrous bets. In one game, it was losing to a journalist and former Jeopardy champion named Greg Lindsay, $12,400 to $6,700. Watson landed on a Daily Double. If it bet big, it could pull even with Lindsay or even inch ahead. Yet it wagered a laughable $5. It was Watson's second strange bet in a row. The researchers groaned in unison. Some of their colleagues were sitting in the studio with the New York Times Magazine's Clive Thompson, who was writing a piece on Watson. They looked through the window at Ferrucci and shrugged, as if to ask “What's up with this beast?”
But Ferrucci didn't see them. He was staring at David Gondek. Lithe and unusually cheerful, Gondek was a leading member of the team. Unlike most of his suburban colleagues, he lived far south in Greenpoint, Brooklyn, taking the train and biking from the station. He headed up machine learning and game strategy and seemed to have a hand in practically every aspect of Watson. Ferrucci continued to stare wordlessly at him. Gondek, after all, was responsible for programming Watson's betting strategy, and it looked like the computer was playing to lose. Ferrucci, during this brief interlude, was carrying out an inquisition with his eyes.
Gondek looked up at his boss. “It's a heuristic,” he explained. He meant that Watson was placing bets according to a simple formula. Gondek and his colleagues were hard at work on a more sophisticated betting strategy, which they hoped would be ready in a month. But for now, the computer relied on a handful of rules to guide its wagers.
“I didn't realize that it was this stupid!” Ferrucci said. “You never told me it was brain-dead.” He gestured toward Thompson, who was watching the game on the other side of the glass and taking notes on his laptop. “We really enjoy stinking it up for the New York Times writer.”
Gondek started to explain the thinking behind the heuristic. If Watson had barely half the winnings of the leader, one of its rules told it not to risk much in a Daily Double. Its primary goal at this point was not to catch up but to reach Final Jeopardy within striking distance of the leader. If it fell below half of the leader's total, it risked being locked out of Final Jeopardy, a disaster. So Watson was instructed to be timid in these circumstances, even if it meant losing the game and infuriating the chief scientist.
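The rule Gondek describes can be sketched in a few lines of Python. This is only an illustration, not IBM's code: the function names, the 10 percent margin that stands in for “barely half,” and the choice to return nothing when the rule does not apply are all assumptions. The $5 floor matches the wager Watson placed against Lindsay.

```python
MIN_WAGER = 5  # the wager Watson placed; assumed here to be the smallest allowed


def is_locked_out(own_score, leader_score):
    """True when even doubling own_score in Final Jeopardy cannot reach the leader."""
    return own_score * 2 < leader_score


def timid_wager(own_score, leader_score, margin=0.10):
    """Hypothetical 'be timid' rule for a Daily Double.

    If own_score sits only slightly above half the leader's total
    (within `margin`, an assumed threshold), bet the minimum so that a
    miss cannot push the player below the lockout line. Return None
    when the rule does not apply and some other rule should decide.
    """
    lockout_line = leader_score / 2
    if lockout_line <= own_score <= lockout_line * (1 + margin):
        return MIN_WAGER
    return None
```

With the scores from the Lindsay game, `timid_wager(6700, 12400)` returns 5. Watson sat $500 above the $6,200 lockout line, so under this sketch a miss on a wager larger than $500 would have left it unable to catch Lindsay in Final Jeopardy.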
Nearly every week for several months, IBM had been bringing in groups of six players with game experience to match wits with Watson in this new mock-Jeopardy studio. They competed on game boards that had already been played in Culver City but not yet telecast. Friedman's team would not grant IBM access to the elite players who qualified for Jeopardy's Tournament of Champions. They didn't want to give Watson too much exposure to Jeopardy greatness, at least not yet. For sparring partners, the machine had to settle for mere mortals, players who had won no more than two games in televised matches. It was up to Ferrucci's team to imagine (or, more likely, to calculate) how much more quickly Ken Jennings and Brad Rutter would respond to the buzzer and how many more answers they'd get right.
By the time Watson started the sparring sessions, in November 2009, the machine had already practiced on tens of thousands of Jeopardy clues. But the move from Hawthorne to the Yorktown research center placed the system in a new and surprising laboratory. Playing the game tested new skills, starting with speed. For two years, development at the Hawthorne labs had focused on Watson's cognitive process: coaxing it to come up with right answers more often, to advance up the Jennings Arc. During games, though, nailing the answer meant nothing if Watson lost the buzz. At the same time, it had to grapple with strategy. This meant calculating its bets in Daily Doubles and Final Jeopardy and estimating its chances on clues it had not yet seen. It also had to anticipate the behavior of its human foes, especially in Final Jeopardy, where single bets often won or lost games.
Perhaps the biggest revelation in the sparring matches came from the spectators: They laughed. They were mostly friends of the players and a smattering of IBM employees, watching from four rows of folding chairs. Watson amused them. This isn't to say that they weren't impressed by a machine that came up with some of the most obscure answers in a matter of seconds. But when Watson committed a blooper (and it happened several times a game), they cracked up. They laughed when Watson, exercising its mastery of roman numerals, referred to the civil rights leader Malcolm X as “Malcolm Ten.” They laughed more when Watson, asked what the “Al” in Alcoa stood for, promptly linked the aluminum giant to one of America's most notorious gangsters: “What is Al Capone?” (Watson, during this stage, often referred to people as things. This established a strange symmetry, since the contestants routinely referred to the Jeopardy machine as “him.”) One Final Jeopardy answer a few weeks later produced more merriment. In the category 19th Century Literature, the clue read: “In Chap. 10, the whole mystery of the handkerchiefs, and the watches, and the jewels . . . Rushed upon this title boy's 'mind.'” Instead of Oliver Twist, Watson somehow came up with a British electronic dance music duo, answering, “What is the Pet Shop Boys?”
From a promotional perspective, an occasional nonsensical answer promised to make Watson a more entertaining television performer, as long as the computer kept it clean. This wasn't always assured. In one of its first sparring sessions, in late 2009, the machine was sailing along, thrashing a couple of mid-level Jeopardy players in front of an audience that included Harry Friedman and fellow Jeopardy bosses. Then Watson startled everyone with a botched answer for a German four-letter word in the category Just Say No. Somehow the machine came up with “What is Fuck?” and flashed the word for all to see on its electronic answer panel. To Watson's credit, it didn't have nearly enough confidence in this response to buzz. (It was a human who correctly responded, “What is nein?”) Still, Ferrucci was mortified. It was a relief, he said, to look over at Friedman and his colleagues and see them laughing.
Still, such a blunder could tarnish IBM's brand. Watson was the company's ambassador. It was supposed to represent the future of computing. Machines like this, the company hoped, would soon be answering questions in businesses around the world. But it was clear that Watson could conceivably win the Jeopardy challenge and still be remembered, on YouTube and late-night TV, for its gaffes. After an analysis of Watson's errors, IBM concluded that 5 percent of them were “embarrassing.” This led Ferrucci, early in 2010, to assign a team of researchers to a brand-new task: keeping Watson from looking dumb. “We call it the stupid team,” said Chu-Carroll. Another team worked on a profanity filter.
As each day's sparring sessions began, the six Jeopardy players settled into folding chairs between the three contestant podiums, the host's stand, and the big Jeopardy board, with its familiar grid of thirty clues. David Shepler stood before them. Dark, thin, and impeccably dressed, Shepler ran the logistics of the Jeopardy project. He sweated the details. He made sure that IBM followed to the letter the legal agreements covering the play. He didn't bend an inch for Watson. (It was his ruling that docked Watson $600 for mispronouncing Pakistan.) In the War Room's culture of engineers and scientists, Shepler, a former U.S. Air Force intelligence officer, was an outsider. He told them what they could not do, which at times led to resentment. Before each match, he instructed the contestants on the rules. They weren't to tell anyone or, heaven forbid, blog about the matches, the behavior of Watson, or the clues, which had been entrusted to IBM by Jeopardy. He had them sign lengthy nondisclosure agreements and then introduced David Ferrucci.
On this winter morning, Ferrucci ambled to the front of the room. He was wearing dark slacks and a black pullover bearing IBM's logo. He outlined the Jeopardy challenge and described the goal of building a question-answering dynamo. He pointed to the window behind them, where a set of blue rectangular towers housed the computers running the Watson program. Through the double-pane window, the players could hear the dull roar of the fans working to cool its processors. Ferrucci, priming the humans for the match ahead, tossed out a couple of Jeopardy clues, which they handled with ease. “Oh, I bet Watson's getting nervous,” he said. “He could be in for a tough day.”
Still, Watson had made astounding progress since its early days in the War Room. Ferrucci showed a slide of what used to be the Jennings Arc. It had the same constellation of Jennings dots floating high and to the right. But it had been expanded into a Winners Cloud, with blue dots representing hundreds of other Jeopardy winners. Most of the winners occupied the upper right quadrant, but below and to the left of most of Jennings's dots. The average winner buzzed on about half the questions and got four out of five right. Ferrucci traced Watson's path on the chart. The computer, which in 2007 produced subhuman results, now came up with confident answers to about two-thirds of the clues it encountered and got more than 80 percent of them right. This level of performance put it smack in the middle of the Winners Cloud. It was not yet in Ken Jennings's orbit, but it was moving in that direction. Of its thirty-eight games to date against experienced players, Ferrucci said, it had won 66 percent, coming in third only 10 percent of the time.
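The figures Ferrucci cited can be compared with a little arithmetic. The Python sketch below treats each dot on the chart as a pair (share of clues attempted, share of attempts answered correctly) and multiplies the two to get the share of all clues answered correctly. That reading of the chart's axes is an assumption, the function name is hypothetical, and Watson's precision is entered as exactly 0.8 even though the text says “more than 80 percent,” so its result is a lower bound.

```python
def correct_share(attempted, precision):
    """Share of all clues answered correctly: attempts times accuracy on attempts."""
    return attempted * precision


# Figures quoted in the text; both pairs are rounded approximations.
average_winner = correct_share(attempted=0.5, precision=0.8)
watson = correct_share(attempted=2 / 3, precision=0.8)  # lower bound for Watson

print(f"average winner: {average_winner:.0%} of clues correct")
print(f"Watson: at least {watson:.0%} of clues correct")
```

Under these assumptions the average winner answers roughly two clues in five correctly, and Watson somewhat more than half. The comparison ignores who wins the buzz, which the chart does not capture.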
While explaining Watson's cognitive process, Ferrucci pointed to a black electronic panel. The players wouldn't see it during the game, he explained, but this panel would show the audience Watson's top five candidate answers for each question and how much confidence the machine had in each one. “This gives you a look into Watson's brain,” he said. Moments later, he gave them a glimpse into his own. Showing how the computer answered a complicated clue featuring the Portuguese explorer Vasco da Gama, Ferrucci pointed to the list of candidate answers. “I was confident and I got it right,” he said. Then, realizing that he was doing a mind meld, he explained that he was speaking for Watson. “I identify with the computer sometimes.”
One of the contestants asked him how Watson “heard” the information. “It reads,” Ferrucci said. “When the clue hits your retina, it hits Watson's chips.” Another contestant wondered about the algorithms Watson used to analyze the different answers. “Can you tell us how it generates confidence scores?”
“I could tell you,” Ferrucci said, clearly warming to the competitive nature of the Challenge, “but I'd have to shoot you.”
For these sparring rounds, IBM hired a young and telegenic host named Todd Crain. An actor originally from Rockford, Illinois, Crain had blond hair, a square jaw, and a quick wit, and had acted in comedy videos for TheOnion.com. At IBM's Jeopardy studio, he mastered a fluid and hipper take on Alex Trebek. Unlike Ferrucci's scientists, who usually referred to Watson as a thing, Crain always addressed Watson as a person. Watson was a character he could relate to, an information prodigy who committed the stupidest and most hilarious errors imaginable. Crain encouraged the machine, flattered it, and upbraided it. Sometimes he closed his eyes theatrically and moaned, “Oooooh, Watson!”
Crain had fallen into the Jeopardy gig months earlier through a chance encounter with David Shepler. Crain was working on a pilot documentary called EcoFreaks, telling the stories of people working at the fringes of the environmental movement. He said it involved spending one evening in New York with “freegans,” Dumpster-divers devoted to reusing trash. On the next assignment, Crain and the crew drove north to the college town of New Paltz, New York. There Shepler, with the attention to detail he later demonstrated managing the Jeopardy project, had built a three-story house that would generate as much energy as it consumed, a so-called zero net-energy structure. While showing Crain the triple-pane windows, geothermal exchange unit, and solar panel inverter, Shepler asked the young actor if he might be interested in hosting a series of Jeopardy shows. “I said 'yes' before he even had a chance to finish the sentence,” Crain said.
On occasion, Crain could irritate Ferrucci by making jokes at Watson's expense. The computer often opened itself to such jibes by mauling pronunciation, especially of foreign words. And it had the unfortunate habit of spelling out punctuation it didn't understand. One day, in the category Hair-y Situation, Watson said, “Let's play Hair-dash-Y Situation for two hundred.” Crain imitated this bionic voice, getting a laugh from the small entourage of technicians and scientists. Ferrucci shook his head and muttered. Later, when Crain imitated a mangled name, Ferrucci channeled his irritation into feigned outrage: “He's making fun of him! It's like making fun of someone with a speech impediment!” (Once, Ferrucci said, he brought his wife and two daughters to a sparring session. When one of the girls heard Crain mimicking Watson, she said, “Daddy, why is that man being so mean to Watson?”)