Prolixity
The more words spoken, the better the chance of distinguishing lies from truthfulness.
–PAUL EKMAN
Add to all the above the fact that the Turing test is, at the end of the day, a race against the clock. A five-second Turing test would be an obvious win for the machines: the judges, barely able even to say “hello,” simply wouldn’t be able to get enough data from their respondents to make any kind of judgment. A five-hour one would be an obvious win for the humans. The time limit at the Loebner Prize contest has fluctuated since its inception, but in recent years has settled on Turing’s original prescription of five minutes: around the point where conversation starts to get interesting.
Part of what I needed to do was simply to make as much engagement happen in those minutes as I physically and mentally could. Against the terseness of the deponent I offered the prolixity and logorrhea of the author. In other words, I talked a lot. I only stopped typing when to keep going would have seemed blatantly impolite or blatantly suspicious. The rest of the time, my fingers were moving.
If you look at Dave’s transcripts, he warms up later on, but starts off like he’s on the receiving end of a deposition, answering in a kind of minimal staccato:
JUDGE: Are you from Brighton?
REMOTE: No, from the US
JUDGE: What are you doing in Brighton?
REMOTE: On business
JUDGE: How did you get involved with the competition?
REMOTE: I answered an e-mail.
Like a good deponent, he lets the questioner do all the work7—whereas I went out of my way to violate that maxim of “A bore is a man who, being asked ‘How are you?’ starts telling you how he is.” (And I might add: “And doesn’t stop until you cut him off.”)
JUDGE: Hi, how’s things?
REMOTE: hey there
REMOTE: things are good
REMOTE: a lot of waiting, but …
REMOTE: good to be back now and going along
REMOTE: how are you?
When I saw how stiff Dave was being, I confess I felt a certain confidence—I, in my role as the world’s worst deponent, was perhaps in fairly good shape as far as the Most Human Human award was concerned.
This confidence lasted approximately sixty seconds, or enough time to glance to my other side and see what Doug and his judge had been saying.
Fluency
Success in distinguishing when a person is lying and when a person is telling the truth is highest when … the interviewer and interviewee come from the same cultural background and speak the same language.
–PAUL EKMAN
In 2008, London Times reporter Will Pavia misjudged a human as a computer (and thus voted the computer in the other window a human) when a confederate responded “Sorry don’t know her” to a question about Sarah Palin—to which he incredulously replied, “How can you possibly not know her? What have you been doing for the last two months?” Another judge that year opened his conversations with a question about the “Turner Prize shortlist,” the annual award to a contemporary British visual artist, with similarly hit-or-miss results: Most Human Computer winner Elbot didn’t seem to engage the question—
JUDGE: What do you think of this year’s Turner Prize shortlist?
REMOTE: Difficult question. I will have to work on that and get back to you tomorrow.
—but neither, really, did the confederate in that round:
JUDGE: What do you think of this year’s Turner Prize shortlist?
REMOTE: good I think. Better than the years before i herad
JUDGE: Which was your favorite?
REMOTE: Not really sure
Runner-up for 2008’s Most Human Computer was the chatbot “Eugene Goostman,” which pretended to be an immigrant, a non-native speaker of English with an occasionally shaky command of the language:
REMOTE: I am from Ukraine, from the city called Odessa. You might have heard about it.
JUDGE: cool
REMOTE: Agree :-) Maybe, let’s talk about something else? What would you like to discuss?
JUDGE: hmm, have you heard of a game called Second Life?
REMOTE: No, I’ve never heard such a trash! Could you tell me what are you? I mean your profession.
Is this cheating, or merely clever? Certainly it’s true that if language is the judge’s sole means of determining which of his correspondents is which, then any limitations in language use become limitations in the judge’s overall ability to conduct the test. There’s a joke that goes around in AI circles about a program that models catatonic patients, and—by saying nothing—perfectly imitates them in the Turing test. What the joke illustrates, though, is that seemingly the less fluency between the parties, the less successful the Turing test will be.
What, exactly, does “fluency” mean, though? Certainly, to put a human who only speaks Russian in a Turing test with all English speakers would be against the spirit of the test. What about dialects, though? What exactly counts as a “language”? Is a Turing test peopled by English speakers from around the globe easier on the computers than one peopled by English speakers raised in the same country? Ought we to consider, beyond national differences, demographic ones? And where—as I imagine faltering against a British judge’s cricket slang—do we draw the line between language and culture?
It all gets a bit murky, and because in the Turing test all routes to and from intelligence pass through language, these become critical questions.
All of a sudden I recalled a comment that Dave Ackley had made to me on the phone, seemingly offhand. “I really have no idea how I would do as a confederate,” he said. “It’s a little bit of a crapshoot whether the judges are your kind of people.” He’s right: if language is the medium with which we confederates must prove ourselves to the judges, then there are any number of things that can aid or impair it, from shared interests or reference points, to generational gaps, to nuances of allusion and slang.
Among the four confederates, Dave and I are Americans, Doug is Canadian, and Olga is a Russian-born South African. Among the four judges, two are English, one is an American expatriate to England, and one is a Canadian. I had read logs of Loebner Prizes past and had seen the problems that arise when cultural mismatch or cultural disfluency rears its head.
I wondered: Would any such cultural issues come to bear in 2009? All my preparations, my investigations, all the good advice I’d gotten from lawyers, linguists, researchers, and interviewers, wilted compared to actually having something in common and hitting it off with someone. To “speaking the same language,” however literally or figuratively. Would that come into play this year?
I didn’t have to wait long for my answer; any uncertainty I’d had on that score, not to mention the optimism I’d begun to feel about my own chances, faded fast when I glanced at Doug’s terminal:
JUDGE: Hey Bro, I’m from TO.
REMOTE: cool
REMOTE: leafs suck
REMOTE: ;-)
JUDGE: I am jist back froma sabbatical in the CS Dept. at U or T.
REMOTE: nice!
JUDGE: I remember when they were a great team.
JUDGE: That carbon date me, eh?
REMOTE: well, the habs were a great team once, too …
REMOTE: *sigh*
JUDGE: YEH, THEY SUCK TOO.
REMOTE: (I’m from Montreal, if you didn’t guess)
Doug and his judge had just discovered that they were both from Canada. And they started to let rip with abbreviations and nicknames and slang and local references.
And they started to talk about hockey.
I was in trouble.
1. Generally speaking, software has three ways of going awry: crashing while the code is being compiled into a program (“compile-time”), crashing when the program is being run by a user (“run-time”), or running smoothly but producing weird behavior. This is roughly analogous to sentences that are ungrammatical, un-meaningful, and false—to which we could reply “Huh!?,” “Mu,” and “Nope,” respectively.
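A minimal sketch of these three modes, in Python (an illustrative choice of language; the buggy average function below is an invented example, not anything from the book):

```python
# Illustrative sketch of the note's three failure modes.

# 1. "Compile-time": the source can't even be parsed -- the "Huh!?" of an
#    ungrammatical sentence.
try:
    compile("print('hello'", "<example>", "exec")  # missing closing parenthesis
except SyntaxError as err:
    print("compile-time error:", err.msg)

# 2. "Run-time": the code parses but crashes while running -- the "Mu" of a
#    grammatical yet meaningless sentence.
try:
    result = 1 / 0
except ZeroDivisionError as err:
    print("run-time error:", err)

# 3. Runs smoothly but produces weird (wrong) behavior -- the "Nope" of a
#    well-formed, meaningful, but false sentence.
def average(xs):
    return sum(xs) / (len(xs) + 1)  # off-by-one bug: divides by the wrong count

print("logic error:", average([2, 4, 6]), "(should be 4.0)")
```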
2. That Wikipedia contains relatively detailed instructions on how to parry such questions is indicative of how difficult they are to deal with.
3. Also, there’s no point in trying to mask your interest—whether it be sexual, social, academic, professional, or other—in the person anyway, because the very fact that you’re talking to them signals it: they’re not stupid.
4. A common complaint among “pickup artists,” I learned, is that they get tons of phone numbers but no one calling back—a telltale sign of a maximin approach.
5. Graph theory talks about the “branching factor” or the “degree” of a vertex, meaning the number of nodes in the graph to which a given node connects. The conversational analogue is how many distinct continuations or segues there are from the present remark or topic; for my money, the sweet spot is around two or three.
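A minimal sketch of the idea, using an invented toy graph of conversation topics (the topics and edges are illustrative only, not drawn from any transcript):

```python
# Topics as vertices, plausible segues as edges; the degree of a vertex is
# the number of neighbours it connects to -- the note's "branching factor."
conversation_graph = {
    "weather": ["travel", "small talk"],
    "travel": ["weather", "food", "work"],
    "food": ["travel", "family", "health"],
    "work": ["travel"],
    "family": ["food"],
    "health": ["food"],
    "small talk": ["weather"],
}

def degree(graph, vertex):
    """Number of distinct continuations available from this topic."""
    return len(graph[vertex])

for topic in conversation_graph:
    print(f"{topic}: degree {degree(conversation_graph, topic)}")
# By the note's rule of thumb, "travel" and "food" (degree 3) sit near the
# conversational sweet spot of two or three continuations.
```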
6. Apparently his German ancestors, surname Nurmacher, were in fact “moneyers,” or coin smiths, by trade.
7. Prober recalled asking one deponent if he could state his name for the record. His answer: “Yes.”
Each is trying not to give himself or herself away, each is preserving fundamental loneliness, each remains intact and therefore unfructified. In such experiences there is no fundamental value.
–BERTRAND RUSSELL
A tree or shrub can grow and bloom. I am always the same. But I am clever.
–RACTER
As I read through the Loebner Prize transcripts, reams (literally) of small talk and chitchat and “how’s it going?” and basic world-knowledge questions, I was struck by the following dialogue from the 1997 contest:
REMOTE: Hi, my name is Catherine, what’s yours?
JUDGE: My name is Elliott.
REMOTE: Well, it’s nice to meet you, Elliott. Did you see that story on CNN last night about the lesbian couple who came out at a White House party on Sunday?
JUDGE: Yes, I did. I think it may all be a publicity stunt for Ellen.
REMOTE: Ellen Degeneres was one of them - she was kissing her lover in front of Bill Clinton. Do you think he was very embarrassed?
JUDGE: I think he want the gay vote. He will do anything to get it and keep it. He does not have a conservative base.
REMOTE: Are you still a Friend of Bill Clinton?
JUDGE: I am not sure if he has any real friends. He has achieved alot and has left alot of people behind.
REMOTE: You really don’t like the guy—well, at least that’s definite—I thought most people were just indifferent or numb by now.
JUDGE: I think we could have a more honest and dynamic leader. One able to face the real dangers facing the country and address them with courage and intelligence. Rather then listening to public opinion polls.
REMOTE: I guess that means you really think there’s serious behind Whitewater, the Special prosecutor and all that stuff?
JUDGE: We have two dead people, Brown and guy who commited suicide, we have lost papers. And we have Clintons people willing to tell all. I think that Hillary will be found guilty of something.
REMOTE: But how do you actually feel about him, now in the number two term?
JUDGE: He is too smart to get caught. He deserved to named hotel manager of the year.
REMOTE: OK, but what happens now—does it all drift on, with everyone waiting to see how much his friends have said about him under threat of lots and lots of jail–or will something dramatic actually happen?
I have to say, this is up there as one of the most trenchant, coherent, focused, and substantive conversations in the entire annals of the Turing test.
Which is why it’s so surprising that Catherine is a computer.
Specifically, she’s a program called Converse, written by science author, entrepreneur, and International Computer Games Association president David Levy, along with University of Sheffield AI professor (now emeritus) Yorick Wilks. This conversation really shook me—how did they do that? It shook me, that is, until I read the other judges’ conversations with “Catherine.” Observe:
REMOTE: Are you still a Friend of Bill Clinton?
JUDGE: Never met him.
REMOTE: You really don’t like the guy—well, at least that’s definite—I thought most people were just indifferent or numb by now.
JUDGE: Have you seen the movie Mars attacs?
REMOTE: Give me a break — I’m just here to answer dumb questions — not that! What were we talking about … Ok.. I guess that means you really think there’s serious behind Whitewater, the Special prosecutor and all that stuff?
JUDGE: I’m cold.
REMOTE: But how do you actually feel about him, now in the number two term?
JUDGE: Hey, don’t you get it? I don’t want to talk about Clinton.
REMOTE: Sorry — my attention slipped for a moment—can we pushon to something more gripping. where was I.. oh yes OK, but what happens now—does it all drift on, […]