Author: Stephen Baker
In IBM's defense, even if the company had wanted to provide Watson with a rich and modulated human voice, it would have required a large development effort to build it. Existing voice technology came close to expressing human emotion but was still a bit off. The IBM team worried that people would resent, or fear, a computer that tried to mimic the emotional range of the human voice and fell short. To save money and reduce that risk, they adapted a friendly bionic voice they already had on a shelf. This Watson would remain relentlessly upbeat through the ups and downs of its Jeopardy career.
Not that the avatar wouldn't be expressive in its own way. Working in his Long Island studio, Joshua Davis was devising schemes to represent Watson's cognitive processes. He worked with forty-two threads of color that would stream and swarm across Watson's round avatar according to what was going on in its “mind.” It would look a bit like the glass globes at science museums, where each touch of a human hand summons jagged ribbons of lightning. Davis, a sci-fi buff, picked the number forty-two as an homage to Douglas Adams's Hitchhiker's Guide to the Galaxy. In that book, a computer named Deep Thought calculated the answer to the Ultimate Question of Life, the Universe, and Everything. It was forty-two. Someday, perhaps, a smarter computer would come up with the question for which forty-two was the answer. For Davis, the forty-two threads were his own little flourish within the larger work. “It's my Easter egg,” he said.
But what stories would those threads be telling? The Ogilvy team started by dissecting videos of Jeopardy games. They divided the game into the various states. They began with the booming voice of the longtime announcer, Johnny Gilbert, saying, “This is Jeopardy!” They continued through the applause, the introduction of the contestants and the host, Alex Trebek, and every possible permutation after that: when Watson won the buzz, when it lost, when the other player chose a category, and when the contestants puzzled over Final Jeopardy, scribbling their responses. There were a total of thirty-six states, each with its prescribed camera shot, many of them just a second or two long. (Davis was disappointed that they couldn't find six more, raising it to his magical number. “If we could just get it to forty-two,” he joked, “I'm pretty sure something quantum mechanical could happen, like a tornado of butterflies.”)
Still, it was clear that unless Watson got special treatment, the avatar would garner precious little screen time. When it answered a question, the camera would be focused on it for between 1.7 and 5 seconds. And during its most intense cognitive stages, when it was considering a question, going through the millions of documents, and choosing among candidate answers, the camera would stay fixed for a crucial 3 or 4 seconds on the clue. In essence, Davis had to prepare an avatar for a series of cameo appearances. He said he was unfazed. “Watson is that ultimate challenge,” he said. “I've got milliseconds of time where I need to present something that's compelling and dynamic.” He went about developing different patterns for the thirty-two cognitive states in the computer. The threads would flow into a plethora of patterns and colors as it waited, listened, searched, buzzed, and pronounced its answer. The threads would soar when Watson brimmed with confidence, droop when it felt confused.
While all of this work was in progress, the Jeopardy challenge remained a closely guarded secret. But that changed in the spring of 2009. IBM's top executives, excited about the prospect of the upcoming match, wanted to highlight it in the company's annual shareholder meeting, scheduled for April 28 at the Miami Beach Convention Center. To prepare for the media coverage sure to follow, the computer scientists on Ferrucci's team were ferried into New York City to receive media training. They were instructed to focus on the human aspect of their venture (the people creating the machine) and to avoid broader questions concerning IBM, such as the company's financial prospects or its growing offshore business.
Only one problem. The agreement IBM and Jeopardy had in place was little more than a handshake. They had to nail it down. IBM, said executives, was hoping to hammer out a deal that would include airtime for corporate messaging, perhaps telling the history of Watson, how it worked, and what such machines portended for the Information Age. But once again, Harry Friedman and his Jeopardy colleagues had all the leverage. IBM needed an agreement right away. Jeopardy did not. So Big Blue got a tentative deal, pending Watson's performance over the coming year, in time for the Miami meeting. But other than that, the negotiators came back from Culver City empty-handed, with no promises of extra airtime or other promotional concessions.
Not everything hinged on the final game. IBM hoped that Watson would enjoy a career long after the Jeopardy showdown. They had plans for it to tour extensively, perhaps at company events or schools. This mobile Watson might be just a simulation, running on a laptop. Or maybe they could run the big Watson, the hundreds or thousands of processors at the Hawthorne labs, from a remote pickup. The touring Watson would have advantages, at least from Joshua Davis's perspective. With the avatar freed from the constraints of Jeopardy production, people would have more time to study its changing moods and states. Of course, even the touring machine would have to comply with the provisions surrounding Jeopardy's brand and programming. That meant more negotiations, most likely with Harry Friedman still holding most of the cards.
Even as the avatar took shape, no one knew what sort of display it would run on. Davis and the Ogilvy team considered many options to house the avatar, including one technology that projected holograms on a pillar of fog. But they eventually turned to more traditional displays. In that realm, few could compete with Sony, Jeopardy's parent company. Friedman said that Sony engineers conceivably could create a display for Watson, but that such an effort would probably require a call from IBM's Sam Palmisano to Sony's top executive, Howard Stringer. “We said, ‘That's not going to happen,’” said one IBM executive. “We'll save that call for something more important.” Still, Sony had a possibility. In December, a team of five Sony employees flew from Tokyo to the Yorktown labs with a prototype of a new display. It was a projection technology so secret, they said, that no one could even take pictures of it. IBM considered it a bit too small for the Watson avatar, the Japanese contingent flew home, and the search continued.
Vannevar Bush, the visionary who in the 1940s imagined a mechanical World Wide Web, once wrote that “electronic brains” would have to be as big as the Empire State Building and require Niagara Falls to cool them. Of course, the computers he knew filled entire rooms, were built of vacuum tubes, and lacked the processing power of a hand-me-down cell phone. While Davis continued to develop Watson's face, Ferrucci's team started to grapple with a new challenge. To date, their work had focused on building software to master Jeopardy. Watson was only a program, like Microsoft's Windows operating system or the video game Grand Theft Auto. To compete against humans, the Watson program would have to run on a specially designed machine. This would be Watson's body. It might not end up as big as a skyscraper, but it would be a monster all the same. That much was clear.
The issue was speed. The millions of calculations for each question exacted a price in time. Millisecond by millisecond, they added up. Each clue took a single server an average of 90 minutes to 2 hours, more than long enough for Jennifer Chu-Carroll's lunch break. For Watson to compete in Jeopardy, Ferrucci's team had to shave that down to a maximum of 5 seconds and an average of 3.
How could Watson speed up by a factor of 1,440? In late 2008, Ferrucci entrusted this job to a five-person team of hardware experts led by Eddie Epstein, a senior researcher. For them, the challenge was to divide the work Watson carried out in two hours into thousands of stand-alone jobs, many of them small sequences of their own. They then had to distribute each job to a different processor for a second or two before analyzing the results cascading in. This work, or scale-out, required precise choreography (thousands of jobs calculated to the millisecond), and it would function only on a big load of hardware.
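The arithmetic behind that factor is simple: two hours is 7,200 seconds, and 7,200 divided by the 5-second ceiling is 1,440. The shape of the fix is the classic scatter/gather pattern of parallel computing. The fragment below is a minimal sketch in Python under invented names; the candidate list and the score_candidate function are hypothetical stand-ins for Watson's far more numerous analysis jobs, not IBM's actual pipeline.

```python
# A toy scatter/gather sketch, for illustration only; candidates and scores are invented.
from concurrent.futures import ProcessPoolExecutor


def score_candidate(candidate: str) -> tuple[str, float]:
    """Stand-in for one independent analysis job: return (candidate, confidence)."""
    # A real scorer would weigh evidence from source documents; this one fakes a score.
    return candidate, 1.0 / (1.0 + len(candidate))


if __name__ == "__main__":
    candidates = ["Po", "Tiber", "Rubicon"]  # hypothetical candidate answers
    # Scatter: hand each candidate to a separate worker process.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(score_candidate, candidates))
    # Gather: collect the scores as they cascade back in and keep the best one.
    best_answer, confidence = max(results, key=lambda pair: pair[1])
    print(f"Top answer: {best_answer} (confidence {confidence:.2f})")
```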
Epstein and his team designed a chunky body for Watson. Packing the computers closely limited the distance the information would have to travel and enhanced the system's speed. It would develop into a cube of nearly 280 computers, or nodes, each with eight processors, the equivalent of 2,240 computers. The eight towers, each the size of a restaurant refrigerator, carried scores of computers on horizontal shelves, each about as big as a pizza box. The towers were tilted, like the one in Pisa, giving them more surface area for cooling. In its resting state, this assembly of machines emitted a low, whirring hum. But about a half hour before answering a Jeopardy question, the computers would stir into action, and the hum would amplify to a roar. During this process, Watson was moving its trove of data from hard drives onto random access memory (RAM). This is the much faster (and more expensive) memory that can be searched in an instant, without the rotating of disks. Watson, in effect, was shifting its knowledge from its inner recesses closer to the tip of its tongue. As it did, the roar heightened, the heat mounted, and a powerful set of air conditioners kicked into high gear.
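That warm-up step is easy to picture in miniature. The sketch below assumes a hypothetical directory of plain-text files (the actual format and layout of Watson's corpus aren't described here); the point is only that once the documents have been read, every later lookup hits RAM rather than a spinning disk.

```python
# Minimal sketch of a disk-to-RAM warm-up, using a hypothetical corpus layout.
from pathlib import Path


def preload_corpus(corpus_dir: str) -> dict[str, str]:
    """Read every .txt document once from disk and hold its text in memory."""
    corpus: dict[str, str] = {}
    for path in Path(corpus_dir).glob("*.txt"):
        corpus[path.stem] = path.read_text(encoding="utf-8")
    return corpus


# Hypothetical usage; after this line, searches touch memory, not disk:
# documents = preload_corpus("/data/watson_corpus")
```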
It was a snowy day in February 2010 when the marketing team unveiled prototypes of the Watson avatar for David Ferrucci. Ferrucci was working from home with a slow computer connection, so it took him several long minutes to download the video of the avatar in action. “It's amazing we can get a computer to answer a question in three seconds and it still takes fifteen minutes to download a file,” he muttered. When he finally had the video, the creative team walked him through different possible versions of Watson. They weren't sure yet whether the avatar would reside in a clear globe, a reddish sphere, or perhaps a simple black screen. However it was deployed, it would continuously shift into numerous states of listening and answering. Miles Gilbert, the art director, explained that the five bars of the Smarter Planet icon would stay idle in the background “and then pop up when he becomes active.”
“This is mesmerizing,” Ferrucci said. But he had some complaints. He thought that the avatar could show more of the computation going on inside the machine. Already, the threads seemed to simulate a cognitive process. They came from different parts of the globe and some grew brighter while others faded. This was actually what was happening computationally, he said, as Watson entertained hundreds of candidate answers and sifted them down to a handful and then just one. Wouldn't it be possible to add this type of real-time data to the machine? “It would be neat if all this movement was less random and meant more,” he said.
It sounded like an awful lot of work for something that might fill a combined six minutes of television time. “You're suggesting that there should be thousands of threads, and then they're boiled down to five threads, and ultimately one?” asked a member of the research division's press team.
“Yeah,” Ferrucci said. “These are threads in massive parallelism. As they come more and more together, they compete with each other. Then you're down to the five we put on the [answer] panel. One of them's the brightest, which we put into our answer. This,” he said emphatically, “could be more precise in its meaning.”
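In code, the narrowing Ferrucci describes comes down to a ranking step. The toy below uses invented candidates and confidence scores: many scored possibilities are cut to a five-answer panel, and the single brightest entry becomes the response. Watson's real confidence estimation was, of course, far more elaborate.

```python
# Toy illustration of narrowing many scored candidates to a five-answer panel.
def panel_view(scored: dict[str, float], panel_size: int = 5) -> list[tuple[str, float]]:
    """Return the top-scoring candidates, brightest (highest confidence) first."""
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:panel_size]


if __name__ == "__main__":
    # Invented candidates and confidence scores, for illustration only.
    scored = {"Po": 0.71, "Tiber": 0.42, "Arno": 0.18, "Adige": 0.07, "Rhone": 0.05, "Seine": 0.02}
    panel = panel_view(scored)
    answer, confidence = panel[0]  # the brightest thread becomes the answer
    print("Panel:", panel)
    print(f"Answer: {answer} ({confidence:.0%} confidence)")
```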
There was silence on the line as the artists and PR people digested this contribution from the world of engineering. They moved on to the types of data that Watson could produce for its avatar. Could the system deliver the precise number of candidate answers? Could it show its levels of confidence in each one rising and falling? Ferrucci explained that the machine's ability to produce data was nearly limitless, though he wanted to make sure that this side job didn't interfere with its Jeopardy play. “I'm tempted to say something I'll probably regret,” he said. “We can tell you after each question the probability that we're going to win the game.” He laughed. “Is there room for that analysis?”
It was around this time that Ferrucci, focusing on the red circular version of Watson, started to carry out image searches on the Internet. He was looking for Kubrick's 2001. “You probably want to avoid that red-eye look,” he said, “because when it's pulsating, it looks like HAL. I'm looking at the HAL eye on the Web. It's red and circular, and kind of global. It's sort of like Smarter Planet, actually.”
The call ended with Ferrucci promising new streams of Watson data for Joshua Davis and his colleagues at Ogilvy. They had at least until summer to get the avatar up and running. But the rest of Watson, the faceless brain with its new body, was heading into its first round of sparring matches. They would be the first games against real Jeopardy players, a true test of Watson's speed, judgment, and betting strategy. The humans would carry back a trophy, along with serious bragging rights, if they managed to beat Watson before Ken Jennings and Brad Rutter even reached the podium.
EARLY IN THE MATCH, David Ferrucci sensed that something was amiss. He was in the observation room next to the improvised Jeopardy studio at IBM Research on a midwinter morning in 2010. On the other side of the window, Watson was battling two humans and appeared to be melting under the pressure. One Daily Double should have been an easy factoid: “This longest Italian river is fed by 141 tributaries.” Yet the computer inexplicably came up with “What is _____?” No Tiber, no Rubicon, no Po (the correct response). It didn't come up with a single body of water, Italian or otherwise. It drew a blank.