Final Jeopardy

Stephen Baker

In one Final Jeopardy, Watson inched closer to the fix-it threshold. Asked to identify the sole character in the American Film Institute's list of the fifty greatest heroes who was not portrayed by a human, the computer came back with “Who is Buffy the Vampire Slayer?” The audience laughed, and Todd Crain slapped his forehead, saying, “Oh, Watson, for the love of God!”

Still, solving that clue would have been a formidable challenge. Once Watson found the list of heroes, it would have had to carry out fifty separate searches to ascertain that each of the characters, from Atticus Finch to James Bond, Indiana Jones, and Casablanca's Rick Blaine, was human. (It wouldn't necessarily be that easy, since most documents and databases don't note a protagonist's species.) During that search, presumably, it would see that thirty-ninth on the list was a collie, a breed of dog (and therefore not human), and would then display “Who is Lassie?” on its electronic screen. Would the lessons gained in learning how to spot the dog in a long list of humans pay off elsewhere? Probably not.
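The elimination step described above is simple to caricature. Below is a minimal Python sketch with a truncated hero list and an invented evidence table standing in for the fifty document searches; it illustrates the shape of the task, not Watson's actual pipeline.

```python
# A sketch of the elimination step: walk the hero list and keep only
# candidates with evidence of being non-human. The list is truncated and
# the evidence table is invented; in practice each entry would require
# its own search, since documents rarely note a protagonist's species.

AFI_HEROES = ["Atticus Finch", "Indiana Jones", "James Bond",
              "Rick Blaine", "Lassie", "Ellen Ripley"]

SPECIES_EVIDENCE = {"Lassie": "collie"}  # hypothetical lookup result

def nonhuman_candidates(heroes):
    """Return heroes whose evidence suggests they are not human."""
    return [h for h in heroes
            if SPECIES_EVIDENCE.get(h, "human") != "human"]

print(nonhuman_candidates(AFI_HEROES))  # ['Lassie']
```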

That raised another question for the harried team. If Watson had abysmally low confidence in a Final Jeopardy response, as was the case with the Pet Shop Boys and Buffy the Vampire Slayer, would it be better to say nothing? If it was in the company's interest to avoid looking stupid, suppressing wild guesses might be a good move. This dilemma did not arise with the regular Jeopardy clues. There, if Watson lacked confidence in an answer, it simply refrained from buzzing. But in Daily Doubles and Final Jeopardy, contestants had to bet before seeing the clue. Humans guessed when they didn't know the answer. This is what Watson was doing, too. But its chronic shortage of common sense made its guesses infinitely dumber. In the coming weeks, the IBM team would calculate the odds of a lucky guess for each of Watson's confidence levels. While Jeopardy executives, eager for entertainment and high ratings, would no doubt favor the occasional outrageous guess, IBM had other priorities. “At low levels of confidence, I think we'll just have it say it doesn't know,” said Chu-Carroll. “Sometimes that sounds smarter.”
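Chu-Carroll's proposal amounts to a threshold rule. The Python sketch below shows the idea; the cutoff is an assumed value, since the team was still working out the real odds of a lucky guess at each confidence level.

```python
# A sketch of the "sometimes say nothing" rule: below a confidence floor,
# suppress the guess rather than risk a Buffy-style answer. The 0.15
# threshold is invented for illustration, not IBM's actual number.

DONT_KNOW_THRESHOLD = 0.15

def final_jeopardy_response(best_answer, confidence):
    """Answer in Jeopardy form, or admit ignorance at low confidence."""
    if confidence < DONT_KNOW_THRESHOLD:
        return "I don't know"  # sometimes that sounds smarter
    return f"Who is {best_answer}?"

print(final_jeopardy_response("Buffy the Vampire Slayer", 0.04))
print(final_jeopardy_response("Lassie", 0.82))
```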

Mathematics was one category where the IBM machine could not afford to look dumb. The company, after all, was built on math. However, the Jeopardy training data didn't include enough examples to educate Watson in this area. Of the more than seventy-five thousand clues Eric Brown and his team studied, only fourteen involved operations with fractions. A game strategist wouldn't dwell on them. But for IBM, there was more at risk than winning or losing a game. To prepare Watson for math, the team might have to put aside the statistical approach and train the machine in the rules and lingo of arithmetic.

As they worked to lift Watson's performance, the Jeopardy team focused on entire categories that the machine misunderstood. They called them train wrecks. It was a new genre, conceived after Watson's debacle against Lindsay. The most insidious train wrecks, Gondek said one afternoon, were those in which Watson was fooled into “trusting” its expertise—generating high confidence scores—in categories where it in fact had no clue. This double ignorance could lead it to lay costly bets, embarrassing the team and losing the match.
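One way to picture the hunt for train wrecks is as a post-game audit: flag any category where confidence ran high but accuracy ran low. The Python sketch below illustrates the idea with invented numbers; it is not the team's actual tooling.

```python
# Flag categories where Watson was confidently wrong - the "double
# ignorance" that leads to costly bets. All data here is invented.

def find_train_wrecks(results, conf_floor=0.6, acc_ceiling=0.3):
    """results maps category -> list of (confidence, was_correct) pairs."""
    wrecks = []
    for category, attempts in results.items():
        avg_conf = sum(c for c, _ in attempts) / len(attempts)
        accuracy = sum(ok for _, ok in attempts) / len(attempts)
        if avg_conf >= conf_floor and accuracy <= acc_ceiling:
            wrecks.append(category)  # high confidence, low accuracy
    return wrecks

session = {
    "Books in Espanol": [(0.72, False), (0.81, False), (0.66, False)],
    "U.S. Presidents":  [(0.90, True), (0.85, True)],
}
print(find_train_wrecks(session))  # ['Books in Espanol']
```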

Lots of the train wreck categories raised questions about the roots of Watson's misunderstandings. One category, for example, that appeared to confuse it was Books in Español. Watson didn't come close to identifying Ernest Hemingway's Adiós a las Armas, Harper Lee's Matar un Ruiseñor, or Stephen King's La Milla Verde. It already held rudimentary foreign words and phrases in its tool kit. But would it benefit from greater detail? As it turned out, Watson's primitive Spanish wasn't the problem. The issue was simpler than that. From the name of the category and the bare-bones phrasing of the clues—Stephenie Meyer: Luna Nueva—the computer did not know what to look for. And unlike human contestants, it was deaf to the correct answers. If IBM and Jeopardy ironed out an arrangement to provide Watson with the answers after each clue, it might orient itself in puzzling categories. That way, it could move on to the real challenge of the clue, recognizing titles like To Kill a Mockingbird and A Farewell to Arms in Spanish.

As the season of sparring sessions progressed, people in the observation room paid less attention to the matches as they were being played. They talked more and looked up at the big monitor when they heard laughter or when Watson found itself in a tight match. The patterns of the machine were becoming familiar. For them, much of the excitement came a day later, when they began to analyze the data and saw how the smarter version of Watson handled the troublesome clues. Ferrucci occasionally used the time during the matches to explain Watson's workings to visitors, or to give interviews. One March morning, he could be heard across the room talking to a documentary producer. Asked if he would be traveling to California for the televised final match, Ferrucci deadpanned: “I'll be sedated.”

David Gondek, sitting across from Ferrucci, his fingers on his laptop keyboard, said that pressure in the War Room was mounting. He had largely abandoned his commute from Brooklyn and now spent nights in a small apartment he'd rented nearby. It was only ten minutes by bike to the War Room or a half hour to pedal to the Yorktown labs, where the sparring sessions took place.

From the very beginning, Gondek said, the Jeopardy challenge differed from a typical software project. Usually, software developers are given a list of functions and applications to build. And when they finish them, test, tweak, and debug them, they're done. Building Watson, however, never ended, he said. There was always something it failed to understand. The work, he said, “is infinite.”

In graduate school, Gondek had focused on data mining. His thesis, on nonredundant clustering, involved programming machines to organize clusters of data around connections that the users might not have considered. By answering some preliminary questions, for example, an intelligence officer might inform the system that he's all too familiar with Osama bin Laden's connections to terrorism. So the system, when sorting through a batch of intelligence documents, would find other threads and connections, perhaps leading to fresh insights about the Al Qaeda leader. Machines, much like humans, follow conventional patterns of analysis. Gondek had been thinking about this since listening to a recent talk by a cognitive psychologist. It raised this question: If a machine like Watson fell into the same mental traps as humans, was it a sign of intelligence or just a cluelessness that it happened to share with us? He provided an example.

“What color is snow?” he asked.

“White,” I said.

“A wedding dress?”

“White.”

“Puffy clouds?”

“White.”

“What do cows drink?”

“Milk,” I said, falling obediently into the trap he'd set.

Cows, of course, drink water once they're weaned. Because humans naturally seek patterns and associations, most of us get into a “white” frame of mind. Psychologists call this the associative network theory. One node in our mind represents “cow,” said Penn State's Richard Carlson. “It's related to others, for milk and steak and mooing, and so on.” The mention of “cow,” he said, activates the entire network, priming it. “That way, you're going to be quicker to respond.”

Gondek's point was that Watson, unlike most question-answering programs, would fall for the same trick. It focused on patterns and correlations and had a statistical version of an associative network. It was susceptible to being primed for “white.” It was like a human in that narrow way.
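The priming effect is easy to mimic. The toy Python model below spreads activation through a small invented association network; by the time the cow question arrives, "milk" has been warmed up and outscores "water". It is a caricature of both the human network Carlson describes and Watson's statistical version, not a model of either.

```python
# A toy spreading-activation model of Gondek's trick. The network and
# weights are invented; mentioning a concept activates its neighbors.

ASSOCIATIONS = {
    "snow":  {"white": 0.9},
    "dress": {"white": 0.8},
    "cloud": {"white": 0.8},
    "white": {"milk": 0.7},
    "cow":   {"milk": 0.9, "water": 0.4, "white": 0.5},
}

activation = {}

def mention(concept):
    """Spread activation from a mentioned concept to its neighbors."""
    for neighbor, weight in ASSOCIATIONS.get(concept, {}).items():
        activation[neighbor] = activation.get(neighbor, 0.0) + weight

# The run of questions and "white" answers primes the network...
for word in ["snow", "white", "dress", "white", "cloud", "white"]:
    mention(word)

# ...so when the cow question arrives, "milk" beats "water".
mention("cow")
print(max(["milk", "water"], key=lambda w: activation.get(w, 0.0)))
```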

University researchers in psychology and computational neuroscience are building computer models to probe these similarities. At Carnegie Mellon, a team under John Anderson, a psychology professor, has come up with a cognitive architecture called ACT-R that simulates human thought processes. Like Watson, it's a massively parallel system fueled by statistical analysis.

Yet the IBM team resolutely avoided comparisons between Watson's design and that of a brain. Any claims of higher intelligence on the part of their machine, they knew, would provoke a storm of criticism from psychologists and the AI community alike. It was true that on occasion Watson and the human brain appeared to follow similar patterns. But that, said Gondek, was only because they were programmed, each in its own way, to handle the same job.

A few months later, Greg Lindsay was eating sushi in a small Japanese restaurant near his apartment in Brooklyn Heights. He wore wire-rimmed glasses, and his thinning hair was cut so short that it stood straight up. He had to eat quickly. A book editor was waiting for the manuscript fixes on his book, Aerotropolis. It was about the rise of cities built around airports, and it fit his insatiable hunger for facts. He said he had had to delve deeply into transportation, energy, global manufacturing, and economics. Little surprise, then, that the book was nearly five hundred pages long.

Lindsay said he had assumed that Watson would maul him in the sparring rounds, beating him to the buzzer every time. This hadn't happened. By anticipating the end of Todd Crain's delivery of the clue, he had managed to outbuzz Watson a number of times. He also thought that the extra time to answer Final Jeopardy would give Watson plenty of opportunity to find the right answer. This clearly was not the case. In fact, this extra time raised questions among Ferrucci's team. To date, Watson was answering every question the same way, as if it had the usual three to five seconds, even when it had five or six times as long. That meant that it was forgoing precious time that it could be spending hunting and evaluating potential answers. Would that extra time help? Just a few days before, Gondek had said that he wasn't sure. With more time, he said, “Watson might bring more wrong answers to the surface and undermine its confidence in the right one.” In other words, the computer ran the risk of overthinking. In the coming months, Gondek and his colleagues thought they might test a couple of other approaches, but they were starting to run out of time.
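Gondek's worry can be made concrete. If confidence is, roughly, the top candidate's share of the total evidence, then a longer search that surfaces more rivals can dilute confidence in the right answer. The Python sketch below uses invented scores to show the effect; it is not Watson's actual confidence model.

```python
import math

def confidence(scores):
    """Softmax share of the best-scoring candidate."""
    exps = [math.exp(s) for s in scores]
    return max(exps) / sum(exps)

quick_pass = [2.0, 0.5]                  # right answer plus one rival
long_pass  = [2.0, 0.5, 0.9, 1.1, 0.8]   # extra time surfaces more rivals

print(f"{confidence(quick_pass):.2f}")   # ~0.82
print(f"{confidence(long_pass):.2f}")    # ~0.44: same answer, less confidence
```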

For his strategy against Watson, Lindsay said, he took a page out of the William Goldman novel The Princess Bride. The hero in the story is facing a fight with a much better swordsman, so he contrives to move the fight to a stony surface, where the rival might slip. In the same way, Lindsay steered Watson to an equally unstable arena: “areas of semantic complexity.” He predicted that humans playing Watson in the television showdown would follow the same strategy.

But there was one big difference. With a million dollars at stake, the humans would not only be battling Watson, they'd also be competing against each other. This could change the dynamics dramatically. In the sparring sessions, the humans (playing with funny money) focused exclusively on the machine. “I didn't care about the others; I just wanted to beat Watson,” Lindsay said. But as the two humans in the upcoming match probed each other's weaknesses and raced to buzz in preemptively, they could open the door for the third contestant, who would be oblivious to the drama and would go about its business, no doubt, with the unflappable dispatch of a machine.

7. AI

On a midsummer afternoon in 2010, a cognitive scientist at MIT named Joshua Tenenbaum took a few minutes to explain why the human brain was superior to a question-answering machine like Watson. He used the most convenient specimen of human cognition at hand, his own mind, to make his case. Tenenbaum, a youthful professor with sandy hair falling across his forehead and an easy smile, has an office in MIT's imposing headquarters for research on brains, memory, and cognitive science. His window looks across the street at the cascading metallic curves of MIT's Stata Center, designed by the architect Frank Gehry.

Tenenbaum is focusing his research on the computational basis of human learning and trying to replicate it with machines. His goal is to come up with computers whose intelligence reaches far beyond answering questions or finding correlations in masses of data. One day, he hopes, the systems he's working on will come up with concepts and theories, the way humans do, sometimes basing them on just a handful of observations. They would make what he called inductive leaps, behaving more like Charles Darwin than, say, Google's search engine or Watson. Darwin's data—his studies of worms, pigeons, and a host of other plants and animals—was tiny by today's standards; it would occupy no more than a few megabytes on a hard drive. Yet he came up with a theory that explained the evolution of life on earth. Could a computer do that?

Tenenbaum was working toward that distant vision, but for the moment his objective was more modest. He thought Watson acted smarter than it was, and he wanted to demonstrate why. He had recently read in a magazine about Watson's mastery of Jeopardy's Before and After clues, the ones that linked two concepts or people with a shared word in the middle. When asked about a candy bar that was a Supreme Court justice, Watson had quickly come up with “Who is Baby Ruth Ginsburg?”

Now Tenenbaum was creating a Before and After clue of his own. “How about this one?” he said. “A president who wrote a founding document and later led a rebellion against it.” The answer, a combination of the third president of the United States and the only president of the Confederacy: Thomas Jefferson Davis.

Tenenbaum's point was that it took a team of gifted engineers to teach Watson how to handle these questions by devising clever algorithms. But humans, after seeing a single example of a Before and After clue, could build on it, not only figuring out how to respond to such questions but inventing new ones. “I know who Ruth Ginsburg is and I know what Baby Ruth is and I see how they overlap, and from that one example I can extract that template,” he said. “I don't have to be programmed with that question.” We humans, he explained, create our own algorithms on the fly.
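The template Tenenbaum describes is easy to state in code: a Before and After answer is two phrases fused on a shared word. The Python sketch below is one hypothetical rendering of that template, not IBM's algorithm.

```python
def before_and_after(first, second):
    """Fuse two phrases on the longest run of words they share."""
    a, b = first.split(), second.split()
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:  # suffix of one = prefix of the other
            return " ".join(a + b[size:])
    return None

print(before_and_after("Baby Ruth", "Ruth Ginsburg"))           # Baby Ruth Ginsburg
print(before_and_after("Thomas Jefferson", "Jefferson Davis"))  # Thomas Jefferson Davis
```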
