The First Word: The Search for the Origins of Language

Steels distributed his robots’ bodies throughout the real world, with some going to Paris, London, Tokyo, and Amsterdam, among other cities. The virtual entities occupying the bodies, the agents, were able to teleport through the Internet into specific bodies set up in each lab. Only once they were established inside a body could they communicate about what they saw, and only agents that inhabited the same physical space were allowed to talk to one another. The agents were like strangers at an art gallery, not looking at one another but standing side by side, commenting on the painting before them. This ensured not only that the agents had something to talk about but that they talked about the same physical world.

Steels was inspired by the twentieth-century philosopher Ludwig Wittgenstein’s habit of using games to study language. A game captures language in its most basic form, Steels said. It is a simple interaction between individuals within a specific setting. Steels’s agents played a guessing game. One agent would pick an object in the world and generate a word for it. Its agent interlocutor had to guess what the word referred to. Each entity took turns at being a speaker or a listener. If one correctly guessed what the other was referring to, the game was successful.

Steels didn’t program any word lists or mental and perceptual categories into the agents. They had to segment the images they looked at into sensory data, such as color and position on the board, and then the speaker agent would pick an object based on these data (for example, the red circle in the upper-left part of the board). Then it would choose a word to tell the hearer about the object; that word—for example, “malewina” or “bozopite”—was selected at random. If the listener agent guessed the word’s meaning correctly, it might then go on to use it with other robots, and in this way a correspondence between a word and a meaning developed within the population.

Steels found that the game would never get off the ground unless the robots had another channel for communication and verification, so he enabled them to point at the board by moving their camera and zooming in on an area (the other agent could sense the direction the camera was pointing). In one of the largest versions of the talking heads experiment, eight thousand words were generated for five thousand concepts, and a basic vocabulary emerged for fundamental concepts like up, down, left, right, green, and large. There was no central dictionary or record defining each word; they existed only as tokens in the mind of each agent. Meaning was created when agents were able to make perceptually grounded distinctions, such as “left” or “right” and “green” or “red.” The distinctions arose when agents identified the object under discussion, separate from other objects in the context.
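The guessing game described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not Steels’s actual software: the syllable inventory, the object labels, the score values (0.5 for a new word, steps of 0.1), and the names Agent, play_round, and simulate are all hypothetical, and the image segmentation and camera pointing of the real experiment are replaced by ready-made object labels and a simple "adopt" step. What the sketch does show is the core loop: a speaker invents a random word when it has none, the hearer guesses, and success reinforces the word in each agent’s private lexicon.

```python
import random

# Hypothetical syllable inventory; the source only says words were random.
SYLLABLES = ["ma", "le", "wi", "na", "bo", "zo", "pi", "te"]


class Agent:
    """An agent with a private lexicon; there is no shared dictionary."""

    def __init__(self):
        self.lexicon = {}  # meaning -> {word: score between 0 and 1}

    def word_for(self, meaning, rng):
        """Return the preferred word for a meaning, inventing one if needed."""
        words = self.lexicon.setdefault(meaning, {})
        if not words:
            new_word = "".join(rng.choice(SYLLABLES) for _ in range(5))
            words[new_word] = 0.5  # assumed starting score
        return max(words, key=words.get)

    def guess(self, word, context):
        """Pick the object in view most strongly tied to the word, or None."""
        best, best_score = None, 0.0
        for meaning in context:
            score = self.lexicon.get(meaning, {}).get(word, 0.0)
            if score > best_score:
                best, best_score = meaning, score
        return best

    def reinforce(self, meaning, word):
        """Positive feedback: strengthen the winner, weaken its rivals."""
        words = self.lexicon.setdefault(meaning, {})
        for other in words:
            if other != word:
                words[other] = max(0.0, words[other] - 0.1)
        words[word] = min(1.0, words.get(word, 0.5) + 0.1)

    def adopt(self, meaning, word):
        """After a failed guess, learn the word for the pointed-at object."""
        words = self.lexicon.setdefault(meaning, {})
        words[word] = max(words.get(word, 0.0), 0.5)


def play_round(agents, objects, rng, context_size=3):
    """One guessing game between two randomly chosen agents."""
    speaker, hearer = rng.sample(agents, 2)
    context = rng.sample(objects, context_size)  # what both agents can see
    topic = rng.choice(context)
    word = speaker.word_for(topic, rng)
    success = hearer.guess(word, context) == topic
    if success:
        speaker.reinforce(topic, word)
        hearer.reinforce(topic, word)
    else:
        # Stand-in for the speaker pointing at the topic with its camera.
        hearer.adopt(topic, word)
    return success


def simulate(n_agents=6, rounds=4000, seed=0):
    """Run many games and return a list of True/False outcomes."""
    rng = random.Random(seed)
    objects = ["red circle", "green square", "blue triangle",
               "yellow star", "red square"]  # hypothetical scene
    agents = [Agent() for _ in range(n_agents)]
    return [play_round(agents, objects, rng) for _ in range(rounds)]


if __name__ == "__main__":
    outcomes = simulate()
    print("early success rate:", sum(outcomes[:100]) / 100)
    print("late success rate:", sum(outcomes[-500:]) / 500)
```

In a toy like this the first game always fails, because the hearer starts with an empty lexicon, and agreement builds only as words are passed from agent to agent. The pointing step matters here as it did for Steels: without the adopt call, a hearer would have no way to learn what a new word referred to.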

Since conducting the largest of the talking heads experiments in 1999, Steels and his co-workers have built more complexity into their experiments. One researcher has robots not playing games so much as communicating in order to feel emotion. In another, Steels has robots communicating with ears and vocal tracts to further increase their challenge. The lab is also looking at case marking, tense, open-ended semantics, language processing, and the different types of grammars that can emerge.

Steels is also interested in the way that structure spontaneously arises in biological systems where random behavior is reinforced by positive feedback. He was particularly inspired by Jean-Louis Deneubourg’s work on ants. Hundreds, sometimes thousands, of ants organize themselves into long chains when they are carrying material from a food source to their nest. The chains are adaptive: you can sweep away part of one, put objects in its way, remove individual ants or add new ones, and the chain will emerge again until the food source is depleted. There is no central coordinator instructing the ants on what to do and how to organize themselves in the face of disruption. Nevertheless, a greater intelligence—a design—emerges out of the local behavior of many relatively unintelligent individuals. Other systems where order emerges spontaneously from chaos are termite nest building, the growth of cell tissue, the way that cellular slime amoeba form an aggregate entity, and flocking in birds.¹⁴

The language that evolved in the guessing game has many of the same features as these systems, said Steels. It exhibited an absence of central planning, an adaptation to changing circumstances, and a resilience to the unexpected appearance and disappearance of elements (whether objects or individuals). Meaning and linguistic structure simply arose out of interaction between bodies in space.

Steels has recently taken embodiment to more complicated levels. In 2001 he started work with AIBO robots, which are among the most complex robots ever built.¹⁵ Each AIBO is an independent entity. Steels and his co-workers place the robots in various situations—on a floor with objects like boxes and colored balls, for example—and like the talking heads they must build both a conceptual system and a way of talking about it. The robots develop speaking and hearing processes while constantly trying to map their world (as they move about in it). They also have to work out where another robot is in space, and if one asks, “Where are you?” and the other answers, “To the left of the box,” the first AIBO has to decipher what “left” might mean. Steels’s group has also just finished a series of experiments in Tokyo with the QRIO humanoid robot. Working with the QRIO allowed them to implement many of the mechanisms humans use for joint attention, like pointing with a finger.

Because the robots engage in real image analysis (as opposed to being fitted with programs that dictate how to see the world), many errors arise in their interactions. But that’s the point, explained Steels. When successful communication does evolve, it shows how language is possible in difficult circumstances. “There is no reason,” Steels said, “to think that language processing is any less complicated than vision processing—which is very complicated.” He added: “The complexity of language is incredible, but we shouldn’t be afraid of that.”

As they grope their way through the world, Steels’s robots end up evolving rudimentary grammar as well as words and concepts. Syntax arises mainly from a situation of ambiguity. In phrases such as “red ball next to green box,” it is not clear to agents whether “red” goes with “box” or “ball” (unless they already have grammar). When an ambiguity like this is detected, the agent will invent a grammatical pattern to make its intended meaning clear to the listener. This suggests to Steels that human language ability is an emergent adaptive system that is created by a basic cognitive mechanism rather than by a genetically endowed language module.

Neither robotic nor digital linguistic systems can tell us exactly how language evolved. Indeed, the communication systems that arise in Kirby’s modeling or Steels’s experiments may or may not have the characteristics of human languages. What each can do is show how language might have evolved, and this is invaluable data. We couldn’t think these concepts through with our brains alone; instead we had to reach a stage of technological innovation with computers fast enough to model such complicated processes and robots that can enact them. Kirby’s virtual linguistic creatures and Steels’s real ones suggest that in order to get to something that looks a lot like language, you may not need a language-specific mental device. Humans do a lot more with language than simple pointing and referring, but in order for language to become established, the ability to perform these steps is essential.

The most elusive part of the language evolution mystery is working out why all these things happened. Why did our species evolve in the way it did? Why does culture evolve the way it does? And even more complicated, how and why do they evolve together? The rollover of language change is thousands of times more rapid than biological evolution. We might find it difficult to talk with English speakers from a thousand years ago, but we wouldn’t have any trouble procreating with them. The final and greatest challenge for language evolution is discovering how the language suite and language itself evolved together.

14. Why things evolve

Genes mutate as a matter of course. If the carrier of a mutated gene is lucky, some effect of the new version will improve its chances of having offspring that survive, and then those offspring will have their own successful offspring, and so on and so forth. Every animal alive today stands at the end of a long line of lucky entities that begat lucky entities that begat lucky entities. They may not have been happy or fulfilled or at peace with their lives, but that’s not the point.

For a long time people have wondered why a particular trait has evolved. What was it about that trait and the environment in which it arose that meant it was a good thing to have? These considerations have been the most contentious part of the language evolution debate: Why did language evolve?

Part of the problem with posing this question in decades past was that even though scientists were using the same words, they were asking a fundamentally different question. At that time, language was still generally thought of as a single entity. Regarded as such, it left the question truly unanswerable, for different components of language have evolved in different stages in the history of life. If you ask, “Why did the whole thing evolve?” the implication is that it happened all at once, and no evolutionary pressure is up to the task of bringing forth everything from nothing.

The other problem with asking this question is that to some extent you have to imagine the answer. No one can ever know all the details of what happened when our distant ancestors began to talk. The only way to be completely sure is to travel back in time to witness the process, and we can’t do that. And there’s the problem of language fossils. There are none, at least none as definitive as the femur that Lucy left behind. As Chomsky has pointed out: “There is a rich record of the unhappy fate of highly plausible stories about what might have happened, once something was learned about what did happen—and in cases where far more is understood [than with language evolution].”¹

However, the same objections could be raised about any attempt to explain the origins of the universe. In Fire in the Mind, George Johnson reminds us that the big bang scenario is still only a theory. Nevertheless, the intense layering of evidence and theoretical modifications that have accumulated since it was first proposed has given the theory the heft of unassailable truth. Today, says Johnson, the theory remains a work in progress that underpins the productive work of thousands of astronomers and physicists all over the world.

Cautions against employing “just-so” stories and fairy tales to trace language evolution had great resonance when less data were available about what happened and when it happened in the development of language in evolutionary time. Now the accumulation of evidence from genetics, comparative biology, behavioral studies, linguistics, and neuroscience makes such stories more feasible by placing powerful constraints on them.

With the information scientists now have about gesture, thought, and behavior both in humans and in close and distant species, they are better equipped to carve out the problem space and define the outlines of their story. They know more about where to look for clues and what paths not to take in a possible reconstruction of language evolution. It will never be possible to recover and rebuild every step of the way. But significant steps, major biological traits, and evolutionary landmarks can be identified. And while there are a number of ways in which the facts about humans and life and language evolution can be mapped onto the known evolutionary path that brought us to where we are today, data gathered over the next few years will further refine those conjectures.

In this context the prohibition against asking “why?” is starting to look as unscientific as the kind of fairy tale it once warned against. Indeed, there’s something a little disingenuous about the insistence that because you can’t prove it, you shouldn’t imagine it. Imagination is at the core of the scientific process. All the tests and experiments in the world mean nothing without the hunch or the story—the hypothesis—that kicks the process off. Now, instead of not venturing into the imagination or simply not declaring what they suspect, many scientists in the field of language evolution choose to propose a story and be up-front about how much their theory has been informed by data and how much is not yet verifiable.

Michael Arbib, one of the researchers who has investigated mirror neurons, has an idea about what he thinks might have happened and why it might have happened, based on the rigorous work he has carried out on the brain. Arbib’s approach is the opposite of the traditional Chomskyan one. Instead of emphasizing the fundamental sameness of language in a search for universals, he is interested in the different ways that people solve problems with language. As he explained: “Once you get beyond the fact that you’ve got to have words for actions, you’ve got to have words for objects and the agents that act upon them then, I think, you get into the realm of what people have learned over the centuries to do, rather than something that must be in the brain. People advertising universal grammar focus on what is common. I’m just struck by how varied the approaches people in different communities have to solving communicative problems.” Instead of tracing the parameters of language back to genes, Arbib thinks that most of grammar and the way that structure relates to meaning are products of culture. “My feeling is that most of it is probably a tribute to human ingenuity. I mean, kids can surf the Web, and nobody says there’s a Web-surfing gene.”

Like Lieberman and others, Arbib disputes the idea that language is one big package, a kind of all-or-nothing proposition. When it is conceived of in this way, he said, “I think you make some very foolish claims.” The alternative is to take the historical point of view. “You can imagine the first protolanguage as ten words or a hundred words. Then a lot of things can occur over the generations and crystallize out. Language becomes very mysterious if you have to make it a single biological evolutionary leap.”

One of Arbib’s most important points is that language is not inevitable. He encourages thinking about possible stages by stepping away for a while from the end state—the current form of language. We have it today not because we took one crucial turn at some point in the past but because we took hundreds of crucial turns. And for each of these turns, you can’t know that you are going to get language at the end of it. Each step is critical for the value it adds at that point in time. Linguistic evolution was a tumultuous natural experiment that started with a particular brain structure and hundreds of variables—a couple of ice ages, constantly evolving predators and prey, a changing social structure. The process lasted many millennia, there was no control group, there may have been false starts along the way, and the completely unpredictable result of this random experiment was modern language.

The mirror neurons discovery set Arbib on a course that has most recently ended with his fully articulating an idea that many researchers assume but few have examined in detail; that is, language evolution had to occur in a layering of stages. Arbib calls it an ascending spiral. So far he has proposed about ten different stages, though he warns that even a ten-stage theory is still a long way from accounting for all the steps along the journey to language.

Initially, he says, our ancestors must have developed a capacity for complex imitation that went beyond that of even modern apes, and greatly increased the possibility of social transmission of novel skills. Beyond that there must have been some kind of gestural protosign that broke through the fixed set of primate vocalizations and was supported by the mirror system. Gesture, in his view, was an ancient scaffolding on which language started to build. You had to use protosign to build the scaffolding, and then sounds became parasitic. Speech did not arrive directly, and the first gestural steps of language would have been quite simple. “It doesn’t make sense to have a full sign language and then go to vocalization,” said Arbib. “It’s hard to build up a rich tradition just through gesture. You need sound to flesh out that scaffolding.” So there were oral and facial gestures as well, maybe some association between lip movements and what lips are often used for, like eating.

Pantomime probably provided the crucial bridge from imitation of practical skills to imitation of the skills required for proto-sign (and much later for language). “The claim is something like this,” he explained:

You’ve got a system for primate calls, but it’s closed. You can’t add a new call to it. So you use a different system, you go through a different route, to be able to create new patterns of sound that can be paired with new meanings. And then we eventually get to the stage where you can get those sounds and meanings together to create new meanings on the fly. But there must have been an intermediate time when you didn’t create new meanings like that. My argument is that if you look at the ability of the hands to move skillfully, then you can imagine that there was an evolutionary advantage in being able to imitate patterns of hand movement, and having imitated patterns of hand movement (once you had a brain in place that could do that), it’s a plausible step to begin to use patterns of hand movement for communication—pantomime. And the beauty of pantomime is that if you pantomime carefully, and maybe do it three times when the person doesn’t get it, you can convey novel meanings.

Why we didn’t ultimately become a species that is constantly engaged in pantomime with no speech, said Arbib, is because “people aren’t very good at recognizing someone else’s pantomime.” As he explained, “It doesn’t have to be that you suddenly have a society in which everybody was doing pantomime and conveyed thousands of meanings, but maybe in a particular year, two or three pantomimes were added to the tribe’s vocabulary by becoming somewhat ritualized to make them easier to perform and understand.”

After this, Arbib suspects that humans developed protospeech:

The story goes (if it went anything like that) that in the end you can’t disambiguate pantomime by just doing better pantomime. If I flap my hands to imitate the flapping wings of a bird, do I mean “fly”? Do I mean “bird”? Do I mean “bird flying”? So maybe some genius comes along and invents some way of saying, “Well, if I do this sound and I’m flapping my hands, I mean the bird. If I do another sound while I’m flapping my hands, I mean the flying.” You need to make distinctions. So the notion is you got to the stage where a sequence of gestures can convey meaning, and you got across the idea that meaningless gestures are part of conveying meaning. It’s no longer pantomime.

Vocalization was involved all along, said Arbib. There may have been stages where the pantomime was entirely vocal. “My purely fictional example is that you bite the piece of fruit, it’s sour, you go—” He puckered and made a sucking sound, before continuing:

The act of genius there is to go from having that as a reaction when it’s too late and you’ve already bitten the fruit, to making that noise, before somebody bites the fruit, to warn them, “Don’t waste that fruit; it’s too sour to eat.” So the notion is that the pantomime would give you the possibility of conveying a rich sense of meanings. The arbitrary gestures would come in to begin to allow you to save effort and avoid ambiguity. The gesture in the end is conventionalized. It doesn’t have to be a fresh pantomime all the time. Then the sound can come into play, and it can begin to become part of an integrated performance. Beyond this, you begin to find certain conventional distinctions that are easier to convey, and then you begin to build a phonology, and then, as you begin to build a phonology, you begin to put those meaningless phonological gestures together to take over more of the conventionalized meaning. I want to claim that this skill was parasitic on increasing manual dexterity and the mirror system that supported it, which increases the cortical motor representation, and then we can expand that to the new use in the vocal system. So I prefer that story at the moment.

Arbib’s account could be further elaborated by explaining why the pantomimes were taken up and spread throughout the group. Perhaps it was a case of sexual selection, as Pinker and Bloom suggested in 1990. In this scenario, the mime is a male who impresses females with his linguistic skills, thus creating more opportunity to procreate, having more children, and spreading the predisposition for expression. The same principle explains why male peacocks develop such spectacular tails.

Tecumseh Fitch, on the other hand, argues that sexual selection is particularly unlikely as an explanation for linguistic evolution. As with the peacock, this kind of selection generally results in a marked difference between the sexes with regard to a particular trait. However, not only do men and women both use language, but young females are more adept with language than are young males. Other pressures may have come into play. Perhaps a change in the available game required better hunting techniques, which in turn required more precise language. Maybe the step from one form of protolanguage to another occurred when hominids reached a critical level of population density—just like the orangutans with their Neesia-splitting techniques.

In an interview Chomsky suggested there had to be a point in time when a rewiring of the human brain that allowed people to use recursion took place. Perhaps sixty to seventy thousand years ago in a small hominid group in East Africa, a single individual was born with a genetic mutation. This mutation would have caused a restructuring of the brain and instantly endowed the affected person with the capacity for unbounded thought. Linguistic communication would not have begun at this moment, because the individual with the mutation was the only one with the capacity for it. But even a slight advantage spreads quickly throughout a population, and after this new rewiring was passed on to his or her offspring, the entire group would eventually become language-ready.
