Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man
Authors: Mark Changizi
Tags: #Non-Fiction
Therefore, although some hits ring with effectively no delay, other kinds of hits take their time before ringing. Hits can be hesitant, and the delay between hit and ring is highly informative because it tells us about the rigidities of the objects involved. Our auditory systems understand this information very well: they have been designed by evolution to possess mechanisms for sensing this gap and thus for perceiving the rigidity of the objects involved in events.
Because our auditory systems are evolutionarily primed to notice these hit-to-ring delays, we expect that languages should have come to harness this capability, so that plosives may be distinguished on the basis of such hit-to-ring delays. That is, we would expect that plosive phonemes will have as part of their identity a characteristic gap between the initial explosive sound and the subsequent sonorant. Language does, indeed, pay homage to the hit-ring gaps in nature, in the form of voiced and unvoiced plosives. Voiced plosives are like “b,” “g,” and “d,” and in these cases the sonorant sound following them occurs with negligible delay (Figure 7b, left). They even sound bouncy (“boing,” “bob,” and “bounce”), like a properly inflated basketball. Unvoiced plosives are like “p,” “k,” and “t,” and in these cases there is a significant delay after the plosive and before the sonorant sound begins, a delay called the voice onset time (Figure 7b, right). (Try saying “pa,” and listen for when your voice kicks in.) In English we have short voice onset times and long ones, corresponding to voiced and unvoiced plosives, respectively. Some other languages have plosives with voice onset times in between those found in English.
Figure 7. Illustration that voiced plosives are like rigid, elastic hits, and unvoiced plosives like nonrigid, inelastic hits. These plots show the amplitude of the sound on the y-axis, and time on the x-axis. (a) The sound made by a stiff hardcover book landing on my wooden desk on the left, followed by the sound of that same book landing on my desk, but where a wrinkly piece of paper cushioned the landing (making it less rigid and less elastic). (b) Me saying “bee” and “pee.” Notice that in the inelastic book-drop and the unvoiced plosive cases, i.e., the right in (a) and (b), there is a delay after the initial collision before the ringing begins.
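As a rough illustration of the hit-to-ring gap, the sketch below builds a toy signal (a brief click, an optional silence, then a decaying tone) and measures the time from the start of the click to the start of the tone. Everything here is an assumption made for illustration: the names `synth_hit` and `hit_to_ring_gap_ms`, the sample rate, the thresholds, and the 0 ms and 60 ms silences. None of it comes from the recordings in Figure 7, and measuring voice onset time in real speech is considerably more involved.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)


def synth_hit(gap_ms, burst_ms=5, ring_ms=100, ring_hz=200):
    """Toy 'hit then ring': a short click, gap_ms of silence, then a decaying tone."""
    n_burst = SAMPLE_RATE * burst_ms // 1000
    n_gap = SAMPLE_RATE * gap_ms // 1000
    n_ring = SAMPLE_RATE * ring_ms // 1000
    # Deterministic click: the sign flips on every sample (very "noisy").
    burst = [0.8 * (-1) ** i for i in range(n_burst)]
    gap = [0.0] * n_gap
    # Periodic ring that slowly dies away.
    ring = [
        0.5 * math.exp(-3 * i / n_ring)
        * math.sin(2 * math.pi * ring_hz * i / SAMPLE_RATE)
        for i in range(n_ring)
    ]
    return burst + gap + ring


def hit_to_ring_gap_ms(signal, frame_ms=1, energy_floor=0.01, max_crossings=2):
    """Time from the first loud frame (the hit) to the first loud,
    slowly oscillating frame (the ring). Returns None if either is missing."""
    frame_len = SAMPLE_RATE * frame_ms // 1000
    hit_start = None
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy < energy_floor:
            continue  # silence: neither hit nor ring
        if hit_start is None:
            hit_start = start
        # Few sign changes within a frame means a low-pitched, ring-like sound.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        if crossings <= max_crossings:
            return 1000 * (start - hit_start) / SAMPLE_RATE
    return None


rigid_like = hit_to_ring_gap_ms(synth_hit(gap_ms=0))      # 5.0 (click length only)
nonrigid_like = hit_to_ring_gap_ms(synth_hit(gap_ms=60))  # 65.0
```

In this toy setup the "rigid" signal rings as soon as the click ends, while the "nonrigid" one rings only after the added silence, which is the contrast the text draws between voiced and unvoiced plosives.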
Not only do languages utilize a wide variety of voice onset times (hit-to-ring gaps) for plosive phonemes, but one does not find plosive phonemes that don’t care about the length of the gap. One could imagine that, just as the intensity of a spoken plosive doesn’t change the identity of the plosive, the voice onset time after a plosive might not matter to the identity of a plosive. But what we find is that it always does matter. And that’s because the intensity of a hit in nature is not informative about the objects involved, but the gap from hit to ring is informative (as is the timbre). That’s why the gap from hit to ring is harnessed in language. And that’s why, as we saw earlier, the distinct plosive sounds at the start and end of words are treated as the same, despite being acoustically more different than are voiced and unvoiced plosives (like “b” and “p”).
In light of the ecological meaning of voiced versus unvoiced plosives, consider the following two letters from a mystery language: ◆ and ✴. Each stands for a plosive, but one is voiced and the other unvoiced. Which is which? Most people guess that ◆ is voiced, and that ✴ is unvoiced. Why? My speculation is that it is because ◆ looks rigid, and would tend to be involved in hits that are voiced (i.e., a short gap from hit to ring), whereas ✴ looks more kinked, and thus would be likely to have a more complex collision, one that is unvoiced (i.e., a long gap between hit and ring). My “mystery language” is fictional, but could it be that more rigid-looking letters across real human writing systems have a tendency to be voiced, and more kinked-looking letters have a tendency to be unvoiced? It is typically assumed that the shapes of letters are completely arbitrary, and have no connection to the sounds of speech they stand for, but could it be that there are connections because objects with certain shapes tend to make certain sounds? This is the question Kyle McDonald, a graduate student at Rensselaer Polytechnic Institute (RPI) working with me, raised and set out to investigate. He found that letters having junctions with more contours emanating from them (i.e., the more kinked letters) have a greater probability of being unvoiced. For example, in English the three voiced plosives are “b,” “d,” and “g,” and their unvoiced counterparts are “p,” “t,” and “k.” Notice how the unvoiced letters, the “t” and “k” in particular, have more complex structures than the voiced ones. Kyle McDonald’s data (currently unpublished) show that this is a weak but significant tendency across writing systems generally.
Rigid Muffler
As I walk along my upstairs hallway, I accidentally bump the hammer I’m carrying into the antique gong we have, for some inexplicable reason, hung outside the bedroom of our sleeping infant. I need to muffle it, quickly! I have one bare hand, and the other wielding the guilty hammer; what do I do? It’s obvious. I should use my bare hand, not the hammer, to muffle the gong. Whereas my hand will dampen out the gong ring quickly, the hammer couldn’t be worse as a dampener. My hand serves as a good gong-muffler because it is fleshy and nonrigid. My hand muffles the gong faster than the rigid hammer, yet recall from the previous section that nonrigid objects cause explosive hits with long hit-to-ring gaps. Nonrigid hits create rings with a delay, and yet diminish rings without delay. And, similarly, rigid hits create rings without delay, but are slow dampeners of rings.
These gong observations are crucial for understanding what happens to voiced and unvoiced plosives when they are not released (i.e., when the air in the mouth and lungs is not allowed to burst out, creating the explosive hit sound), which often occurs at word endings (as discussed in the section titled “Two-Hit Wonder”). When a plosive is not released, there clearly cannot be a hit-to-ring gap, because it never rings. So how do voiced and unvoiced plosives retain their voiced-versus-unvoiced distinction at word endings? For example, consider the word “bad.” How do we know it is a “d” and not a “t” at the end, given that it is unreleased, and thus there is no hit-to-ring delay characterizing it as a “d” and not a “t”?
My gong story makes a prediction in this regard. If voiced plosives really have their foundation in rigid objects (mimicking rigidity’s imperceptibly tiny hit-to-ring gap at a word’s beginning), then, because rigid objects are poor mufflers, the sonorant preceding an unreleased voiced plosive at a word ending should last longer than the sonorant preceding an unreleased unvoiced plosive at a word ending. For example, the vowel sound in “bad” should last longer than in the word “bat.” The nonrigid “t” at the end of the latter should muffle it quickly. Are words like “bad” spoken with vowels that ring longer than in words like “bat”?
Yes. Say “bad” and “bat.” The main difference is not whether the final plosive is voiced: neither is, because neither is ever released, and thus neither ever gets to ring. Notice how when you say “bad,” the “a” gets more drawn out, lasting longer, than the “a” sound in “bat.” Most nonlinguist readers may never have noticed that the principal distinguishing feature of voiced and unvoiced plosives at word endings is not whether they are voiced at all. It is a seemingly unrelated feature: how long the preceding vowel lasts. But, as we see from the physics of events, a longer-lasting ring before a dampening hit is the signature of a rigid object’s bouncy hit, and so there is a fundamental ecological order to the seemingly arbitrary linguistic phonological regularity. (See Figure 8.)
Figure 8. Matrix illustrating the tight match between the qualities of hits (not in parentheses) and plosives (within parentheses). For hits, the columns distinguish between rigid and nonrigid hits, and the rows distinguish between hits that initiate rings and hits that muffle rings. Inside the matrix are short descriptions of the auditory signature of the four kinds of hits. For plosives, the columns distinguish the analogs of rigid and nonrigid hits, which are, respectively, voiced and unvoiced plosives; the rows distinguish the analogs of ring-initiating and ring-muffling hits, which are, respectively, released and unreleased plosives. Together, this means four kinds of hits, and four expected kinds of plosives, matching the signature features of the respective hits. If the meaning of voiced versus unvoiced concerns rigid versus nonrigid objects, then we expect that plosives at word starts should have little versus a lot of voice-onset time, respectively, for voiced and unvoiced. And we expect that for plosives at word endings the voiced ones should reveal themselves via a longer preceding sonorant (slow to damp) whereas unvoiced should reveal themselves via a shorter preceding sonorant (fast to damp). Plosives do, in fact, modulate across this matrix as predicted from the ecological regularities of rigid and nonrigid hits at ring-inceptions and ring-dampenings.
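The two-by-two matrix described in the Figure 8 caption can be written down as a small lookup table. The sketch below is only one possible encoding: the name `HIT_MATRIX`, the function `plosive_cue`, and the exact wording of each cell are assumptions paraphrased from the caption, and the example words are the ones used in the text.

```python
# Hypothetical encoding of the Figure 8 matrix.
# Key: (rigid, initiates_ring). Value: (signature of the hit, plosive analog).
HIT_MATRIX = {
    (True, True): (
        "ring begins with negligible delay",
        "voiced, released: short voice onset time",
    ),
    (False, True): (
        "ring begins after a noticeable delay",
        "unvoiced, released: long voice onset time",
    ),
    (True, False): (
        "slow to damp an existing ring",
        "voiced, unreleased: longer preceding sonorant",
    ),
    (False, False): (
        "fast to damp an existing ring",
        "unvoiced, unreleased: shorter preceding sonorant",
    ),
}


def plosive_cue(voiced, released):
    """Return the expected acoustic cue for a plosive.

    Voiced plays the role of a rigid hit and released plays the role of a
    ring-initiating hit, following the analogy drawn in the caption.
    """
    return HIT_MATRIX[(voiced, released)][1]


# Example words from the text, looked up by their initial or final plosive.
for word, voiced, released in [
    ("bee", True, True),    # initial "b"
    ("pee", False, True),   # initial "p"
    ("bad", True, False),   # final "d"
    ("bat", False, False),  # final "t"
]:
    print(word, "->", plosive_cue(voiced, released))
```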
Over the last half dozen sections of this chapter we have analyzed the constituents—the hits, slides, and rings—of events and language. Hits, slides, and rings may be the fundamental building blocks for human speech, but that alone doesn’t make speech sound natural. Just as natural contours can be combined in unnatural ways for vision, natural sound atoms can be combined unnaturally for audition. Language will not effectively harness our auditory system if speech combines plosives, fricatives, and sonorants in unnatural ways, like “yowoweelor” or “ptskf.” To find out whether speech sounds like nature, we need to understand how nature’s phonemes combine, and then see if language combines in the same way. For the rest of this chapter, we will look at successively larger combinations of sounds. But we turn first to the simplest combination.
Nature’s Syllables
My friend’s boy made a video of himself solving a Rubik’s Cube blindfolded, and then posted it on the Web. As I watched him put the blindfold on, pick up the cube, and begin twisting, I noticed something strange about the sound, but I couldn’t put my finger on what was unusual. Later, when I commented to my friend how his bright boy must owe it to inheritance, he replied, “Indeed, the apple doesn’t fall far from the tree. He faked it. The movie was in reverse.”
The world does not sound the same when run backward. What had raised my antennae when watching the Rubik’s Cube video was the unusual sounds that occur when one hears events in reverse. One of the first strange sounds occurred when he picked up the cube at the start of the video. Knowing now that it was shown in reverse, what appeared in the video to be him picking up the cube to begin unscrambling it was actually him setting the cube down after having scrambled it. Setting the cube down caused a hit and a ring, but in reverse what one hears is a ring coming out of nowhere, and ending with a sudden ring-stopping hit (the second voice of a hit, as discussed earlier in the section titled “Two-Hit Wonder”). That just doesn’t happen much in nature. When nature comes to the door, it knocks before ringing, not the other way around. Rings don’t start events. Rings are due to the periodic vibrations of objects, and objects do not typically ring without first being in physical contact with another object. Rings therefore do not typically occur without a hit or slide occurring first.
Hits, slides, and rings may be the principal fundamental building blocks for events, but rings are a different animal than hits and slides. Hits and slides involve objects in motion, physically interacting with other objects. Hits and slides are the backbone of the causal chain in an event. Rings, on the other hand, occur as a result of hits or slides, but don’t themselves cause more events. Rings are free riders, contributing nothing to the causality. Events do not have a ring followed by another ring. That’s impossible (although a single complex, or wiggly, ring is possible, as we discussed in an earlier section). And events never have an interaction (i.e., a hit or a slide) followed directly by another interaction without an intervening ring. Sometimes a ring will be inaudible, and so there will appear to be two interactions without an intervening ring, but physically there’s always an intervening ring, because objects that are involved in a physical interaction always vibrate to some extent. Events also always end with a ring, although whether it is audible is another matter.
The most basic way in which hits, slides, and rings combine is, then, this:
Interaction—Ring
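The combination rules just stated (no ring without a preceding interaction, no two rings in a row, no two interactions in a row, and every event ending in a ring) amount to saying that an event is one or more Interaction-Ring pairs. A minimal sketch of that check follows. The single-letter encoding (`H` for hit, `S` for slide, `R` for ring) and the name `is_natural_event` are assumptions made for illustration, and inaudible rings are assumed to be written out explicitly.

```python
import re

# One or more (interaction, ring) pairs, where an interaction is a hit or a slide.
EVENT_PATTERN = re.compile(r"(?:[HS]R)+")


def is_natural_event(sequence):
    """True if the sequence alternates interaction and ring, starting with
    an interaction and ending with a ring."""
    return EVENT_PATTERN.fullmatch(sequence) is not None


print(is_natural_event("HR"))    # True: a hit followed by its ring
print(is_natural_event("HRSR"))  # True: hit, ring, slide, ring
print(is_natural_event("RH"))    # False: a ring cannot start an event
print(is_natural_event("HH"))    # False: two interactions with no ring between
```

The reversed Rubik’s Cube sound from the anecdote above corresponds to the `"RH"` case: a ring that ends in a hit rather than one that follows it.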