Authors: Stephen Baker

Final Jeopardy

As the day's sparring session began, Crain gave the first two human contestants a chance to acquaint themselves with the buzzers. They tried several dozen old clues. Then he asked if they wanted Watson to join them. They nodded. “Okay, Burn, let him loose,” Crain said. Burn Lewis, the member of Ferrucci's team who orchestrated the show from a tiny control room, pressed some buttons. The third competitor, an empty presence at the podium bearing the nameplate Watson, assumed its position. It might as well have been a ghost.

In the first game, it was clear the humans were dealing with a prodigious force that behaved differently from them. While humans almost always oriented themselves in a category by starting with the easier $200 clues, Watson began with the $1,000 clues at the bottom of the board and worked its way up. There was a logic to this. While humans heard all the answers, right and wrong, and learned from them, Watson was deaf to the proceedings. If it won the buzz, answered the clue, and got to pick another one, it could assume that it had been right. But that was its only feedback. Watson was senseless to all of the signals its human competitors displayed: the smiles, the gasps, the confident tapping of fingers, the halting speech and darting eyes spelling panic. More important, it lost out on game intelligence. If a human answered a clue incorrectly, Watson was liable to buzz on what was known as the rebound and deliver the very same incorrect answer. What was worse, Watson had a far harder time orienting itself to the categories. How would it understand the Hair-y Situation category without hearing the other contestants' correct answers? During these weeks, Ferrucci's team was talking with the Jeopardy executives about giving Watson an electronic feed with the text of the correct answer after each clue. But for the time being, the machine was learning nothing. So why not start each category with the priciest clues? The high dollar values might spook the humans. What's more, IBM's statistical analysis indicated that Watson was likely to land more Daily Doubles in those pricier realms.

Watson started off with Capital Cities, an apparently straightforward category that seemed to promise the machine's favorite type of answer: factoids. It jumped straight to the $1,000 clue: “The Rideau Canal separates this North American capital into upper and lower regions.” Todd Crain read the clue, and Lewis, in the control room, hit the button to turn on the light around the clue, opening it up for buzzes. Within milliseconds Watson had the clue all to itself.

“Watson?” Crain said.

“What is Ottawa?” Watson answered. With that, it raced through the entire category, with each correct answer reinforcing its confidence that it would know the others. Crain read each clue, the humans squeezed the button, and Watson beat them to it. It had no trouble with the South American city founded in 1535 by Pizarro (“What is Lima?”) or the capital that fell to British troops in 1917 and to U.S. troops on April 9, 2003 (“What is Baghdad?”). These were factoids, each one wrapped in the most helpful data for Watson: hard facts unencumbered by humor, slang, or the cultural references that could tie a cognitive engine into knots. No, this category delivered a steady stream of dates, distances, specific names and numbers. For a Jeopardy computer, it was comfort food.

“Very good, Watson!” Crain said.

But after that promising start, Watson started to falter. Certain categories were simply hard for it to figure out. One was called I'll Take the Country from Here, Thanks. When Watson processed the $400 clue, “Nicolas Sarkozy from Jacques Chirac,” it didn't know how to answer. In a few milliseconds it could establish that both names corresponded to presidents of France. But it did not understand the category well enough to build up confidence in an answer (“What is France?”). And it was not getting any orientation from the action of the game. Humans owned that category. Watson sat it out.

Then, in the category Collegiate Rhyme Time, Watson showed its stuff, but not enough to win. One clue asked for the smell of the document you receive upon graduating. Watson understood the clue perfectly and searched for synonyms for “document,” then attempted to match them with words related to “smell.” The best it could come up with was “What is bill feel?” (“What is diploma aroma?”).

The real problems started when Watson found itself facing Greg Lindsay, a journalist and a two-time Jeopardy champion. Lindsay, thirty-two, had spent much of his time at the University of Illinois on the Quiz Bowl circuit, where he occasionally ran into Ken Jennings. In order to spar with Watson, Lindsay had to sign David Shepler's nondisclosure agreement. IBM wanted to keep Harry Friedman and his minions in the dark, as much as possible, about Watson's strengths and vulnerabilities. And Friedman didn't want the clues escaping onto the Internet before they aired on television. This meant that even if Lindsay defeated Watson, he wouldn't be able to brag about it to the Quiz Bowl community. For his crowd, this would be the equivalent of besting Kobe Bryant in a one-on-one game of hoops, then having to pretend it hadn't happened.

Even so, Lindsay came with a clear strategy to defeat Watson. He quickly saw that Watson mastered factoids but struggled with humor and irony, so he steered clear of Watson-friendly categories. He figured Watson would clean up on Name that Continent, picking out the right landmasses for Estado de Matto Grosso (“What is South America?”) and the Filchner Ice Shelf (“What is Antarctica?”). The category Superheroes Names through Pictures looked much more friendly to humans. Sure enough, Watson was bewildered by clues such as “X marks the spot, man, when this guy opens his peeper” (“What is Cyclops?”). Band Names also posed problems for Watson because the clues, like this one, were so murky: “The soul of a deceased person, thankful to someone for arranging his burial” (“What is the Grateful Dead?”). If the clue had included the lead guitarist Jerry Garcia or a famous song by the band, Watson could have identified it in an instant. But clues based on allusions, not facts, left it vulnerable.

More important, since the currency they were playing with was worthless, Lindsay decided to bet the maximum on each Daily Double. If he blew it, he lost nothing. And since he wasn't on national television, his reputation wouldn't suffer. As he put it, “There's no societal fear.” Yet if he won his big bets, he'd be positioned to withstand Watson's inevitable charges through categories it understood. “I knew he would go on tears,” Lindsay said. “I had to build up big leads when I had the chance.” He aced his big bets and ended up thrashing Watson three times, once scoring an astronomical $59,999 of funny money. (The Jeopardy single-game record was $52,000 until Ken Jennings crushed it, winning $75,000 in his thirty-eighth game.)

Fortunately for Lindsay, he got Watson on what soon appeared to be a very bad day for the bionic star. The speech defect returned. When naming “one of the two monarchies that border China,” the computer said, “What is Bhutand?” The game judge, Karen Ingraffea, consulted with David Shepler. From the observation room, Ferrucci could see them talking but could not hear a word. Shepler nodded grimly. Then he delivered the verdict to Todd Crain. Again Watson was docked, this time $1,000.

“This is silliness!” Ferrucci said.

His concern deepened as Watson started to strike out on questions that should have been easy. One Final Jeopardy clue, in the category 20th-Century People, looked like a cinch. It said: “The July 1, 1946, cover of Time magazine featured him with the caption, ‘All matter is speed and flame’” (“Who is Albert Einstein?”). Watson displayed its top answers on its electronic panel. They were all ridiculous, and to the machine's credit, it had rock-bottom confidence in them. First was Time 100, a list of influential people that at one time included Einstein. But Watson should have known that the clue was asking for a “him,” not an “it.” For more than two years, Ferrucci's language programmers had been preparing the machine to parse these clues. They had defined and mapped the twenty-five hundred things Jeopardy clues ask about. The most common of these lexical answer types, or LATs, called for a male person, a “he.” Determining that this clue was asking for a man's name should not have been so hard.

Watson's second choice, even more absurd, was David Koresh, the leader of the apocalyptic Branch Davidian cult near Waco, Texas. Koresh appeared on the May 3, 1993, cover of Time, days after a fire destroyed his compound, killing Koresh and dozens of his followers. No doubt the “flame” in the clue led Watson to Koresh. But Koresh was not born until thirteen years after Einstein appeared on the Time cover. Watson's other stabs were “stroke” and the painter Andrew Wyeth.

At this point, Ferrucci's frustration boiled over. He wasn't so bothered by the wild guesses, like David Koresh. The system had come up with a few answers that were somehow connected to the clue: a common magazine cover or flame. The confidence engine had done its job. After studying them, it had found little to go on and declared them worthless. “Watson's low-confidence answers are just garbage,” Ferrucci had told the contestants earlier.

But why didn't Watson find the right answer? For a computer with access to millions of documents and lists, the July 1, 1946, cover profile in the nation's leading newsmagazine shouldn't be a deep mystery.

Ferrucci concluded that something was wrong with Watson, and he wanted the team in the War Room at Hawthorne to get working on it right away. Yet even in one of the world's leading technology companies, it wasn't clear how to send the digital record of the computer's misadventures through the Internet. Ferrucci asked Eric Brown, then Eddie Epstein, and then Brown again: “How do I get the xml file to Hawthorne?” For Ferrucci, this failed game was brimming with vital feedback. It could point the Hawthorne team toward crucial fixes. The idea that his team could not respond immediately to whatever ailed Watson filled him with dread. Just imagine if Watson reprised this disastrous performance in its nationwide debut with Jennings and Rutter. “HOW DO I GET THIS FILE TO HAWTHORNE?” he shouted. No one had a quick answer. Ferrucci continued to thunder while, on the other side of the window, Todd Crain, Watson, and the other Jeopardy players blithely continued their game. (Watson, for one, was completely unfazed.) Finally Brown confirmed that he could plug a thumb drive into one of Watson's boxes, download the game data, and e-mail it to the team in Hawthorne. It promised to be a long night in the War Room, as the researchers diagnosed Watson's flops and struggled to restore its cognitive mojo.

Cloistered in a refrigerated room on the third floor of the Hawthorne labs stood another version of Watson. It turned out that the team needed two Watsons: the game player, engineered for speed, and this slower, steadier, and more forgiving system for development. The speedy Watson, its algorithms deployed across more than 2,000 processors, was a finicky beast and nearly impossible to tinker with. This slower Watson kept running while developers rewrote certain instructions, swapped out one algorithm for another, or refined its betting strategy. It took forty minutes to run a batch of questions, but it could handle two hundred at a time. Unlike the fast machine, it created meticulous records, and it permitted researchers to experiment, section by section, with its answering process. Because the team could fiddle with the slower machine, it was always up-to-date, usually a month or two ahead of its speedy sibling. After the debacle against Lindsay, IBM could only hope that the slower, smarter Watson wouldn't have been so confused.

Within twenty-four hours, Ferrucci's team had run all of that day's games on the slow machine. The news was encouraging. It performed 10 percent better on the clues. The biggest difference, according to Eric Brown, was that some of the clues were topical, and speedy Watson's most recent data came from 2008. “We got creamed on a couple categories that required much more current information,” he said.

Other recent adjustments in the slow Watson helped it deal with chronology. Keeping track of facts as they change over time is a chronic problem for AI systems, and Watson was no exception. In the recent sparring session, it had confused a mid-nineteenth-century novel for a late-twentieth-century pop duo. Yet when Ferrucci analyzed the slower Watson's performance on the problematic Oliver Twist clue, he was relieved to see that a recent tweak had helped the machine match the clue to the right century. This fix in “temporal reasoning” pushed the Pet Shop Boys answer way down its list, from first to number 79. Watson's latest top answer, “What is magician?”, was still wrong but not as laughable. “It still knows nothing about Oliver Twist,” Ferrucci wrote in a late-night e-mail.

While Ferrucci and a handful of team members attended every sparring match in the winter of 2010, Jennifer Chu-Carroll generally stayed away. For her, their value was in the data they produced, not the spectacle and certainly not the laughs. As she saw it, the team had a long list of improvements to make before autumn. By that point, the immense collection of software running Watson would be locked down, or frozen. After that, the only tinkering would be in a few peripheral applications, like game strategy. But the central operations of the computer, like those of other mission-critical systems, would go through little but testing during the months leading up to the Jeopardy showdown. Engineers didn't dare tinker with Space Shuttle software once the vessel was headed toward the launch pad. Watson would get similar treatment.

With each sparring session, however, the list of fixes was getting longer. For each fix, the team had to weigh the time it would take against the possible gain in performance. “It's triage,” Chu-Carroll said. During one sparring session, for example, Watson mispronounced Wiener schnitzel, neglecting to pronounce the “W” as a “V.” Was it worth the trouble to fine-tune its German phonetics? Not unless someone could do it in a hurry.
