While writing this book I found it fascinating to learn that computer chess programs hold an advantage over humans only when contestants are given less time to ponder moves, not more. This seems counterintuitive: if given unlimited time to search possible moves, computers should be stronger players than humans, not worse. With so much computing power, the extra time should be a benefit. But it isn't. Why? For the same reason that computers can't tell good jokes. They're not messy thinkers. They seek solutions linearly, rather than letting their minds argue and drift until some solution pops out of nowhere. All the time in the world doesn't help if you don't know how to look.
Over the previous four chapters, we've seen that our messy thinking has some benefits. One is humor. Messy thinking also helps with chess and Jeopardy! because it allows us to search vast arrays of possible moves holistically, using intuition rather than algorithms. In each of these cases, the goal isn't to derive some simple solution. It's to make unexpected associations, and even connect ideas that have never been connected before.
All this is a roundabout way of saying that despite Watson's victory, humans are still the only ones who are truly creative. Yet scientists are making great strides in the field of computer intelligence, and in this chapter we'll see how. We'll explore the complex and mysterious nature of creativity, discovering how humor provides unique insights into what creativity really is. And we'll see what all this has to do with telling jokes, and why perhaps computers aren't as far away from being funny as we might think.
PATTERN DETECTION AND HYPOTHESIS GENERATION
What kind of murderer has moral fiber? A cereal killer.
I know this joke isn't particularly funny, but what if I told you it wasn't written by a person? What if I told you it was written by a computer?
The cereal killer joke was just one of many jokes created by a program that you can operate yourself online. Just visit the University of Aberdeen's website and look for a project called The Joking Computer. The program will ask you to choose a word to start with; this is the nucleus around which your new joke will be formed. Then it will ask a few more questions, such as what words rhyme with the one you chose. And, finally, it will show the completed joke. When I logged on and tried it for myself, I came up with the following zinger:
What do you call a witty rabbit? A funny bunny.
Again, not terribly funny, but when the cereal killer joke was submitted to Richard Wiseman's LaughLab competition, it actually outperformed many of the human-created ones. It didn't win (it didn't even come close, ranking just below the middle of the pack), but it also didn't stand out as odd or incomprehensible. That in itself is quite an accomplishment.
These two jokes show how simple joke construction can be. But they're only mildly funny because they rely on a basic word trick without much surprise. One could even argue that they aren't creative, because they're so formulaic. The computer just picks a word, then looks for synonyms and rhymes until finally it comes up with a solution. There's not much thinking involved, so to what extent can such programs reveal how humans really think?
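For readers who like to tinker, the whole recipe fits in a few lines of Python. This is only a toy sketch: the lookup tables are invented for illustration, whereas The Joking Computer works from a full lexicon with pronunciation data.

```python
# A toy version of the pick-a-word, find-synonyms-and-rhymes recipe.
# The lookup tables below are invented stand-ins for a real lexicon.

SYNONYMS = {
    "rabbit": ["bunny"],            # seed noun -> informal synonym
    "funny": ["witty", "comical"],  # adjective -> synonyms for the setup
}

RHYMES = {
    "bunny": ["funny", "money", "sunny"],
}

def make_riddle(noun):
    """Build a 'What do you call a ...?' pun from a single seed noun."""
    for synonym in SYNONYMS.get(noun, []):             # rabbit -> bunny
        for rhyme in RHYMES.get(synonym, []):          # bunny -> funny
            for adjective in SYNONYMS.get(rhyme, []):  # funny -> witty
                return (f"What do you call a {adjective} {noun}? "
                        f"A {rhyme} {synonym}.")
    return None

print(make_riddle("rabbit"))
# What do you call a witty rabbit? A funny bunny.
```

Swap in a bigger dictionary and a proper rhyming list and you have, in essence, a pun generator.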
Quite a bit, as it turns out. Creative behavior can be as simple as combining old ideas in new ways. And, as we learned in previous chapters, jokes are funny because they force us to confront mistakes of thinking, such as errors in scripts. When we create a joke we're not inventing new thoughts or scripts; we're connecting existing ideas in new ways.
“Humor is essentially a matter of combinatorial creativity,” says Margaret Boden, cognitive scientist, professor of informatics at the University of Sussex, and author of The Creative Mind: Myths and Mechanisms. “Elephant jokes, changing light bulb jokes: those are two styles which are easy to recognize. All you need is the ability to connect ideas in novel ways and you have yourself a joke. Deciding why one joke is funnier than another one? Well, that's another matter.”
This is the classic issue in computer science: computers find it easy to create new things but nearly impossible to assess their usefulness or novelty. This failing is most obvious in the realm of humor, because knowing how funny a joke is takes world knowledge, something that most computers lack, even Watson. Consider, for example, a joke made by The Joking Computer's predecessor, the Joke Analysis and Production Engine (JAPE):
What kind of device has wings? An airplane hangar.
The reason JAPE thought this joke was funny was that it classified hangars both as places for storing aircraft and as devices for hanging clothes. That's accurate (to the extent that we accept the misspelling of hangers), but most humans know that a long piece of wire holding a shirt isn't much of a “device.”
Even though it followed its formula correctly, JAPE failed precisely because it couldn't recognize the lack of humor in the final product. This challenge might also explain why there are so many joke production programs but so few specialized in joke recognition. To write a joke, all you need is a strategy, such as manipulating rhymes or replacing words with synonyms. That's the tool used by the online program HAHAcronym, which uses a stored database of potential replacements to identify funny alterations of existing acronyms:
What does FBI stand for? Fantastic Bureau of Intimidation. MIT? Mythical Institute of Theology.
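Here's a toy sketch of the same trick in Python. The substitution table is invented for illustration; the real program draws its candidates from a stored database and scores them for incongruity.

```python
# A toy version of the acronym-twisting trick: swap words in the
# expansion for same-initial substitutes so the acronym still spells out.

EXPANSIONS = {
    "FBI": ["Federal", "Bureau", "of", "Investigation"],
    "MIT": ["Massachusetts", "Institute", "of", "Technology"],
}

# Invented substitution table; a real system would rank many candidates.
REPLACEMENTS = {
    "Federal": "Fantastic",
    "Investigation": "Intimidation",
    "Massachusetts": "Mythical",
    "Technology": "Theology",
}

def twist(acronym):
    """Replace words while preserving each word's initial letter."""
    words = []
    for word in EXPANSIONS[acronym]:
        new = REPLACEMENTS.get(word, word)
        assert new[0] == word[0]  # the letters of the acronym must survive
        words.append(new)
    return " ".join(words)

print(twist("FBI"))  # Fantastic Bureau of Intimidation
print(twist("MIT"))  # Mythical Institute of Theology
```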
Of course, identifying good humor requires more than simple tricks, since there are no shortcuts for classifying the myriad ways to make a joke. Typically, humor recognition programs meet this challenge through massive computing power, like Watson did when answering Jeopardy! questions. Such programs look for language patterns, especially contradictions and incongruities. In this sense they're pattern detectors. But to be effective, they must access vast amounts of material: millions of pieces of text. (As a comparison, since starting this book you've read about forty thousand words yourself.)
One example of a pattern detection program is Double Entendre via Noun Transfer, also known as DEviaNT. Developed by Chloé Kiddon and Yuriy Brun at the University of Washington in Seattle, it identifies words in natural speech that have the potential for both sexual and nonsexual meanings. Specifically, it searches text and inserts the phrase “That's What She Said” during instances of double entendres (a task of great practical importance to frat houses and fans of The Office). DEviaNT is distinctive in that it's not just a joke creator but a humor recognition program too, because it takes a sense of humor to know when to “interrupt.”
DEviaNT was first taught to recognize the seventy-six nouns most commonly used in sexual contexts, with special attention to the sixty-one best candidates for euphemisms. Then it read more than a million sentences from an erotica database, as well as tens of thousands of nonerotic sentences. Each word in these sentences was assigned a “sexiness” value, which, in turn, was entered into an algorithm that differentiated the erotic versus nonerotic sentences. As a test, the model was later exposed to a huge library of quotes, racy stories, and text messages as well as user-submitted “That's What She Said” jokes. The goal was to identify instances of potential double entendre, a particularly interesting challenge, noted the authors, because DEviaNT hadn't actually been taught what a double entendre was. It had been given only lots of single entendres, and then was trained to have a dirty mind.
The researchers were quite pleased when DEviaNT recognized most of the double entendres it was shown, plus two phrases from the nonerotic sentences that had acquired sexual innuendo completely by accident (“Yes, give me all the cream and he's gone” and “Yeah, but his hole really smells sometimes”). DEviaNT's high degree of accuracy was especially impressive given that most of the language it was tested on wasn't sexual. In effect, it was trying to spot needles in haystacks.
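To get a feel for the approach, here's a much-simplified sketch. The word counts are invented, and the real classifier weighed many features beyond this single score, but the core idea survives the compression: a word's “sexiness” is how much more often it appears in erotica than in neutral text.

```python
# A much-simplified sketch of the "sexiness" idea, with invented counts:
# score each word by how much more often it shows up in the erotica
# corpus than in the neutral one, then interrupt when a sentence
# contains a strongly erotic-corpus word.

import math

EROTICA_COUNTS = {"hole": 900, "cream": 650, "smells": 80, "report": 2}
NEUTRAL_COUNTS = {"hole": 40, "cream": 120, "smells": 60, "report": 500}

def sexiness(word):
    """Log ratio of erotic to neutral frequency (1 is a smoothing floor)."""
    return math.log(EROTICA_COUNTS.get(word, 1) / NEUTRAL_COUNTS.get(word, 1))

def thats_what_she_said(sentence, threshold=1.5):
    """Interrupt only when some word scores well above the neutral baseline."""
    words = (w.strip(",.!?").lower() for w in sentence.split())
    return any(sexiness(w) > threshold for w in words)

print(thats_what_she_said("Yeah, but his hole really smells sometimes"))  # True
print(thats_what_she_said("Please file the quarterly report today"))      # False
```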
But that's cheating, you might claim. DEviaNT didn't actually understand the sexual nature of the jokes. It didn't even know what it was reading. All it did was look for language patterns, and a very specific type at that. True, but these arguments also assume that “understanding” involves some special mental state, in addition to coming up with the right answer. (Or, when recognizing bawdy jokes, knowing when to exclaim “That's what she said!”) As we'll soon see, that's a human-centric perspective. Maybe we underestimate computers because we assume too much about how they should think. To explore that possibility, let's turn to one last computer humor program: the University of North Texas's one-liner program, developed by the computer scientist Rada Mihalcea.
Like DEviaNT, this program was trained to recognize humor by reading vast amounts of humorous and nonhumorous material. Specifically, it was shown sixteen thousand humorous “one-liners” that had been culled from a variety of websites, along with an equal number of nonhumorous sentences taken from other public databases. Mihalcea's goal was to teach the program to distinguish between the humorous sentences and the nonhumorous ones. But the program had two versions. One version looked for certain features previously established as common in jokes, such as alliteration, slang, and the proximity of antonyms. The second version was given no such hints at all; it simply learned on its own from thousands of labeled examples. After training, both versions were shown new sentences and asked to identify which were jokes and which weren't.
Mihalcea was surprised to see that the version given hints, the one told which features are most common in jokes, did relatively poorly. Its accuracy hovered only slightly above chance at recognizing humor, meaning that the hints weren't very helpful. By contrast, the version that learned on its own, using algorithms such as Naive Bayes and Support Vector Machine classifiers, which start with no previous knowledge at all, reached accuracy levels averaging 85 percent. This is a fairly impressive outcome, especially considering that many humans also have difficulty recognizing jokes, especially one-liners.
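Here's a minimal sketch of the self-taught version, using an off-the-shelf Naive Bayes classifier. The six training sentences are invented stand-ins for Mihalcea's thirty-two-thousand-sentence corpus, so don't expect 85 percent accuracy from it.

```python
# A minimal sketch of the self-taught approach: a Naive Bayes text
# classifier that induces its own cues from labeled examples, with no
# hand-picked humor features.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

one_liners = [
    "I used to be a banker, but I lost interest.",
    "Take my wife... please.",
    "Veni, vidi, Visa: I came, I saw, I did a little shopping.",
]
plain_sentences = [
    "The meeting is scheduled for nine tomorrow morning.",
    "Interest rates rose slightly in the second quarter.",
    "She parked the car and walked into the office.",
]

texts = one_liners + plain_sentences
labels = [1] * len(one_liners) + [0] * len(plain_sentences)

# The model learns word and word-pair statistics directly from the data.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["I told my computer a joke, but it lost interest."]))
```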
Mihalcea's finding is important because it shows that imposing our own rules on computers' thinking seldom works. Computers must be allowed to “think messy,” just like people, by wandering into new thoughts or discoveries. For humans this requires a brain, but for computers it requires an algorithm capable of identifying broad patterns. This is essential not just for creating and recognizing jokes but for all artistic endeavors. Watson needed to be creative, too. The programmers at IBM didn't try to define what problem-solving strategies Watson used to win at Jeopardy! Rather, they allowed it to learn and to look for patterns on its own, so that it could be a flexible learner just like the human brain.
Some people may argue that humans aren't pattern detectors, at least not like computers. If you believe this, you're not alone. You're also wrong. Recognizing patterns is exactly how the human brain operates. Consider the following example: “He's so modest he pulls down the shade to change his ___.” What's the first word that comes to mind when you read this sentence? If you're in a humorous mood, you might think of mind, which is the traditional punch line to the joke. If you're not, you might say clothes. Or maybe pants.
I share this joke because it illustrates how the human brain, like a computer, is a pattern detector.
Cloze probability is the term that linguists use to describe how well a word “fills in the blank,” based on common language use. To measure cloze probability, linguists study huge databases of text, determining the frequency at which specific words appear within certain contexts. For example, linguists know that the word change most often refers to replacement of a material object, such as clothes. In fact, there's a cloze probability of 42 percent that the word clothes would appear in the context set up by our example, which is why it was probably the first word you thought of. Change referring to an immaterial object, such as a mind, is much less likely: closer to 6 percent.
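As a sketch, imagine we had a hundred people's completions of the sentence. The counts below are invented to mirror the chapter's figures, but the arithmetic is the real thing: cloze probability is just a share of responses.

```python
# A minimal sketch of estimating cloze probability from completion data.
# The counts are invented; real estimates come from large corpora or
# from norming studies with human readers.

from collections import Counter

# Hypothetical completions of "He's so modest he pulls down the shade
# to change his ___." from one hundred respondents.
completions = Counter({"clothes": 42, "shirt": 29, "pants": 20,
                       "mind": 6, "jacket": 3})

def cloze_probability(word):
    """Share of respondents who filled the blank with this word."""
    return completions[word] / sum(completions.values())

for word in ("clothes", "mind", "jacket"):
    print(f"{word}: {cloze_probability(word):.0%}")
# clothes: 42%, mind: 6%, jacket: 3%
```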
These probabilities have a lot to do with humor because, as already discussed, humor requires surprise, which in this case is the difference between 42 percent and 6 percent. Our brains, much like computers, do rapid calculations every time we read a sentence, often jumping ahead and making inferences based on cloze probability. Thus, when we arrive at a punch line like mind, a sudden change in scripts is required. The new script is much less expected than clothes, and so the resolution makes us laugh. Computer humor recognition works the same way, looking for patterns while also identifying the potential for those patterns to be violated.
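In code, that violation detector can be as simple as a probability gap. The figures below are the chapter's cloze probabilities; as the next paragraph explains, the gap alone can't separate a funny violation from a merely unusual one.

```python
# A minimal sketch of surprise as a probability gap, using the chapter's
# cloze figures: a punch line is a completion far less probable than the
# expected ending.

CLOZE = {"clothes": 0.42, "mind": 0.06, "jacket": 0.03}

def surprise(delivered, expected="clothes"):
    """How much less probable the delivered ending is than the default."""
    return CLOZE[expected] - CLOZE[delivered]

print(f"mind:   {surprise('mind'):.0%} gap")    # large gap, funny
print(f"jacket: {surprise('jacket'):.0%} gap")  # even larger gap, not funny
```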
Why, then, aren't computers better at cracking jokes than humans? Because they don't have the world knowledge to know which low-probability answer is funniest. In our current example, mind is clearly the funniest possible ending. But jacket has a low cloze probability too. In fact, the probability that people will refer to changing their jacket is about 3 percent, half the probability they'll talk about changing their mind. Why is the second phrase funny whereas the first one isn't? Because, with our vast world knowledge, we understand that changing one's mind isn't something that can be seen through a window.