Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man

Rube Goldberg machines excel at producing very long events, all part of a single causal chain. Like most events, Rube Goldberg events are built mostly out of hits, slides, and rings. Again letting b, s, and a stand for hits, slides, and rings, Rube Goldberg events sound something like basabababasababababababasabababasa, although the chains are very often much longer than even this. If events were typically like Rube Goldberg events, then even if spoken words have many of the auditory features found in events, words would be much too short to be event-like. Events are, however, not typically Rube Goldberg-like. Events are, instead, much more typically like a pen thrown on a table, the generic event we discussed in the previous section. Pen-on-table events may consist of a hit, hit, and slide. Or possibly just a hit and a slide. Or even just a lone hit. Most events have just several physical interactions or fewer, much nearer in length to spoken words than to Rube Goldberg events.

This is what nature-harnessing expects. Spoken words across human languages are not only built out of sounds like those in solid-object physical events, but words tend to have the size of typical physical events. Words tend to sound like events with up to several interaction sounds (plosives or fricatives), not, say, ten. And although words with a single interaction sound are allowed, two or three interaction sounds are more common, again like solid-object physical events.

Words are not only approximately the size of solid-object physical events (i.e., having several interaction sounds); words also take the amount of time for a typical event. This is something I have thus far ignored. But notice that plosives, fricatives, and rings do not just have similar acoustic characteristics to hits, slides, and rings; they also occur over periods of time similar to those typical of hits, slides, and rings. For example, although I described both hits and plosives as nearly instantaneous explosions, the notion of “instantaneous” depends on the time scale relevant to the listener: what’s instantaneous to a human may not be instantaneous to a fly. Hits and plosives are both instantaneous explosions as heard by human ears. This is why plosives sound hitlike; for example, if a hitlike sound were stretched out it would, instead, sound more slidelike (something we discussed in the earlier section called “Hesitant Hits”). Similarly, fricatives and sonorants tend to occur over time scales similar to the slides and rings of physical events. Typical syllables of human speech (e.g., of the form ba or sa) tend to have a duration approximately on the order of tenths of seconds, roughly the same time scale as is common for physical events involving macroscopic objects. In fact, you’ll notice in Figure 4 earlier that the physical and linguistic analogs (e.g., a hit and “k”) are on the same scale for the time (x) axis.

Words tend to be built out of the constituents of natural solid-object physical events, and to have approximately the size and temporal duration of such events. But are words actually structured like solid-object physical events? Are the natural-sounding phonemes and syllables put together into natural-sounding words? In particular, I’m interested in asking whether the sequences of physical interactions that occur in events (the hits and slides) are similar to the sequences of plosives and fricatives in words. My students and I analyzed the “event structure” of common words across 18 languages, and for each language we measured the distribution of six event types: hit (b), slide (s), hit-hit (bb), hit-slide (bs), slide-hit (sb), and slide-slide (ss). For example, “tea” is a b, “far” is an s, and “faker” is an sb.
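The classification just described can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: it works from spelling rather than phonemes, and the letter sets `PLOSIVES` and `FRICATIVES`, along with the function names, are assumptions made for the example.

```python
from collections import Counter

# ASSUMPTION: a crude letter-based stand-in for phonemes. The study
# classified sounds, not spelling, so this is illustrative only.
PLOSIVES = set("pbtdkg")    # hits, written "b"
FRICATIVES = set("fvszh")   # slides, written "s"

# The six event types measured in the text.
EVENT_TYPES = ("b", "s", "bb", "bs", "sb", "ss")


def event_structure(word: str) -> str:
    """Return the hit/slide pattern of a word, skipping vowels and sonorants (rings)."""
    pattern = []
    for letter in word.lower():
        if letter in PLOSIVES:
            pattern.append("b")
        elif letter in FRICATIVES:
            pattern.append("s")
    return "".join(pattern)


def structure_distribution(words):
    """Relative frequency of the six event types among the given words.

    Words whose pattern is not one of the six types are left out of the total.
    """
    counts = Counter(event_structure(w) for w in words)
    total = sum(counts[t] for t in EVENT_TYPES)
    if total == 0:
        return {t: 0.0 for t in EVENT_TYPES}
    return {t: counts[t] / total for t in EVENT_TYPES}
```

With these assumed letter sets, `event_structure("tea")` gives `b`, `event_structure("far")` gives `s`, and `event_structure("faker")` gives `sb`, matching the three examples in the text.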

Figure 11. The frequency of the structure types found in words across 18 widely diverse languages (listed in the legend of Figure 9). (Standard error bars shown. See Appendix for details.)

To estimate how common these simple event types are in nature, students Elizabeth Counterman, Kyle McDonald, and Romann Weber counted the kinds of events occurring in a wide variety of videos. In deciding upon the kinds of videos to sample, we were not especially interested in having videos of, say, the savanna. Recall our discussion in the previous chapter, where we observed that there are “hard cores” of nature likely found in most or all habitats with solid objects crashing about. In choosing twenty videos from which to enumerate solid-object physical events, we simply aimed for a variety of scenarios in which solid-object physical events occur, including cooking, children playing, family gatherings, assembly instructions, and acrobatics. Each student acquired data on the events occurring, and did so using only the visual modality (that is, the videos were on mute); this helped to deal with a worry that our auditory systems are biased by speech so that we hear speechlike structure in events (akin to seeing faces in clouds). The three observers identified an average of 650 events across the 20 videos. Figure 12 shows the average results for the videos as a dotted line, overlaid on the language data from Figure 11. One can see the close similarity in the plots. (Notice that a simple model assuming hits are more common than slides does not explain why bs occurs more often than sb in the language data.)

Figure 12. The relative frequency of simple event types in videos and in language. One can see their considerable similarity. (Standard error bars shown. See Appendix for details.)

Again, we find the signature of solid-object physical events—of nature—in spoken language! Our final story in this chapter on speech concerns the sounds of speech above the level of words: the structure of whole phrases and sentences.

Unresolved Questions

Earlier in the chapter I remarked on how audition is nature’s more terse modality, only speaking up when there’s an event. In real life, though, there can often be “event overload.” I’m sitting at an airport right now, and I just counted 30 distinct sound events occurring around me over the last 30 seconds. How can we possibly pick out the sounds that matter to us amongst all the noise? There are, in fact, auditory cues that can tell an observer whether an event is relevant to him or her. In particular, these cues can tell the observer that “an event you should pay attention to is coming.”

The most obvious such auditory cue is loudness. As a sequence of events nears me (be it footsteps, the whir of a whiffle ball, or the siren of a police car) it gets louder. Loudness is also worthy of attention because louder events can sometimes be the more energetic events. The ecological importance of loudness may underlie the role of emphasis in language, the way that more important words or sentences are sometimes spoken more loudly. That louder speech is more important speech is one of those things that is so obvious it is difficult to notice. But its analog in vision is not true: brighter parts of a scene are not the more important parts. Brightness in a scene is usually just a matter of where the sun is, and where it glares off objects. The importance of loudness modulations in speech needs explaining, and the explanation is found in the structure of nature.

In addition to loudness, events in nature have another sound quality that is even more informative: pitch (the musical, note-like quality of sound). The pitch of an event depends not on how close it is to the observer, but on the rate at which it is getting closer to the observer. To understand why, let’s imagine standing next to a passing train, the standard example used to explain the Doppler effect. The main observation is that the pitch of the train’s whistle starts high and changes to low as it passes. More specifically, note that when the train is far away and approaching, its whistle is at a fixed high pitch, that is, a pitch that is not changing. (It is actually falling, but negligibly and imperceptibly.) The pitch only begins falling audibly when the train is very close to passing you. And shortly after the train has passed you, the pitch has dropped to nearly its low point, so that from then on the pitch stays effectively constant and low. This drop in pitch would apply in any scenario where sequences of events are passing us by. It also occurs any time we are moving past noisy objects. Our auditory systems can sense pitch changes on the order of half a percent of the sound frequency, sufficient for sensing (if not consciously) the pitch changes due to our walking by a source of sound.
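The passing-train description can be checked against the standard Doppler formula for a moving source and a stationary listener, f_heard = f_source * c / (c - v_toward), where v_toward is the part of the source's speed directed at the listener. The Python sketch below is a simplified illustration: it ignores the time sound takes to reach the listener, and the whistle frequency, train speed, distance from the track, and walking speed are all assumed values, not figures from the text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C


def observed_pitch(t, f_source=440.0, speed=30.0, offset=10.0, c=SPEED_OF_SOUND):
    """Pitch (Hz) heard by a listener standing `offset` metres beside a straight track.

    The source passes the listener's closest point at t = 0 (seconds).
    ASSUMPTIONS: f_source, speed and offset are illustrative values,
    offset must be greater than zero, and sound travel delay is ignored.
    """
    x = speed * t                    # position along the track, negative while approaching
    distance = math.hypot(x, offset)
    v_toward = -speed * x / distance # component of velocity aimed at the listener
    return f_source * c / (c - v_toward)


def walker_drop_fraction(walk_speed=1.4, c=SPEED_OF_SOUND):
    """Total fractional pitch drop heard when walking straight past a stationary source.

    For a moving listener the pitch goes from f*(c+v)/c to f*(c-v)/c,
    a total change of 2*v/c. The 1.4 m/s walking speed is an assumption.
    """
    return 2.0 * walk_speed / c


if __name__ == "__main__":
    for t in (-60, -10, -1, 0, 1, 10, 60):
        print(f"t = {t:>4} s  pitch = {observed_pitch(t):7.2f} Hz")
    print(f"walking-past drop: {walker_drop_fraction():.2%}")
```

Printing the pitch at a few times shows the pattern described above: nearly constant and high while the train is far off, a rapid fall around the moment of passing, then nearly constant and low. Under the assumed walking speed, the drop from walking past a sound source comes out somewhat above the half-percent threshold mentioned in the text.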

The important conclusion of these observations is that a typical sequence of events will tend to have this signature falling pitch (unless headed directly toward you). One might speculate that this is why language has a tendency to signal the approaching end of a sentence with a falling intonation, a drop in pitch. That’s what events typically sound like in nature.

Sequences of events do not always have pitches that fall, however. Pitches can sometimes rise, but special circumstances are required. First, let’s consider what happens if you stand on the railroad tracks rather than beside them. Now the pitch of the train stays the same, right up to the moment that it hits you. Of course, at the instant it hits you, the sound you would be hearing if you were conscious abruptly drops to a lower pitch (because it passes you in a single brain-crushingly short instant), and stays at that pitch as the train moves away. A constant pitch accompanied by increasing loudness is the signature of an impending collision. That same loudness increase, but with a pitch decrease, signals a near miss.

What could make a pitch increase? Considering the train again, imagine first standing beside the tracks as it approaches, but then walking onto the tracks before it gets there. Because you have moved to a position more directly in the train’s path of motion, the frequency your ears receive from the train will increase as you walk onto the tracks. Alternatively, the pitch would also increase if you stayed off to the side, but the train jumped the tracks and headed toward you at the last moment. A pitch increase is the signature of a sequence of events that is changing its direction in your direction. This is true not only when an approaching sequence of events veers toward you, but also when a receding sequence of events veers so as to begin turning around, perhaps to come back and get you after a miss. An increase in the pitch is, in a sense, more important than loudness. An event might be loud and getting louder, but if its pitch is decreasing, it is not going to hit you. But if an event is not so loud, but has a pitch that is increasing, that means it is aiming itself more toward you (or you are aiming more toward it).
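One way to see why veering toward the listener raises the pitch is to write the Doppler formula in terms of the angle between the source's direction of travel and the line from the source to the listener: f_heard = f_source * c / (c - v * cos(angle)). The Python sketch below is a rough illustration under assumed values for the whistle frequency and speed; an angle of 0 degrees means the source is headed straight at the listener, 90 degrees means it is moving sideways, and 180 degrees means it is headed directly away.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C


def pitch_for_heading(angle_deg, f_source=440.0, speed=30.0, c=SPEED_OF_SOUND):
    """Pitch (Hz) heard from a source whose heading makes `angle_deg` with the line to the listener.

    ASSUMPTIONS: f_source and speed are illustrative values, and the
    listener is stationary.
    """
    v_toward = speed * math.cos(math.radians(angle_deg))
    return f_source * c / (c - v_toward)


if __name__ == "__main__":
    # An approaching source that veers toward the listener: the angle shrinks, the pitch rises.
    for angle in (40, 30, 20, 10, 0):
        print(f"approaching, angle {angle:>3} deg: {pitch_for_heading(angle):7.2f} Hz")
    # A receding source that begins to turn around: the angle drops from 180, the pitch rises.
    for angle in (180, 165, 150, 135):
        print(f"receding,    angle {angle:>3} deg: {pitch_for_heading(angle):7.2f} Hz")
```

In this picture, standing on the tracks corresponds to an angle fixed at 0, so the pitch stays constant until impact. A train passing beside you has an angle that grows from near 0 to near 180, so the pitch falls. Any change that shrinks the angle, whether the source veers or the listener steps into its path, makes the pitch rise.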

A rising pitch suggests, then, that the sequence of events is not finished. Events are coming your way. Or, if the sequence of events is moving away from you, then a rising pitch means it is beginning to turn around. This unresolved nature of rising pitches may be the reason why rising pitches in many languages tend to indicate a question. The spoken sentence, “Is that the elephant that stepped on your car?” is a request for further speech. And what better way to sound unresolved than to mimic the sound of nature’s unresolved events?

This is a natural lead-in to the rest of the book, which deals with the origins of music, where loudness and pitch are even more crucial. We will see that “unresolved” pitch even tends to get resolved in melody.

Summary Table

In our modern lives we hear hits, slides, and rings all around us, and we also hear the sounds of speech. They mean fundamentally different things to us, and so our brains quickly learn to treat them differently. Our brains can treat them differently because, despite the many similarities between solid-object physical event sounds and speech sounds that I have pointed to throughout this chapter, there are ample auditory cues distinguishing them (e.g., the timbre of a voice is fundamentally different from the timbre of most solid objects). And once our brains treat these sounds as fundamentally different in their ecological meaning, it can be next to impossible to hear that there are deep similarities in how they sound. A fish struggling up onto land for the first time, however, and listening to human speech intermingled with the solid-object event sounds in the terrestrial environment, might find the similarity overwhelming. “What is wrong with these apes,” it might wonder, “that they spend so much of their day mimicking the sounds of solid-object physical events?”
