One formerly fashionable way of avoiding the insoluble problem of fixing the meaning of a word was to imagine it as the compound product of sublinguistic mental units or “features” of meaning. Take the three words house, hut, and tent. They can all be used to refer to dwellings of some sort, but they refer to three different kinds of dwelling. The task of distinctive feature analysis was to find the minimal semantic constituents that would account for the meaning relations among these three semantically related terms. All three are “marked” with the feature [+dwelling], but only house also has the two features [+permanent] and [+brick]. Tent would be marked [–permanent] [–brick] and hut could be marked [+permanent] [–brick]. How wonderful it would be if all words in the language could be decomposed into atoms of meaning in this way. The meaning of a word would then be fully specified through the list of the distinctive features that mark it. If you could show that it was possible to account for the differences in the meanings of all the words in a language by the distribution of a finite set of semantic features, then you could go further still. You would be in a position to build a great Legoland of the mind, in which all possible meanings could be constructed out of irreducible, binary building blocks of sense.
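To make the bookkeeping of distinctive feature analysis concrete, here is a minimal sketch in Python, assuming only the three features and the values given above for house, hut, and tent. The names FEATURES, notation, distinguishing, and is_fully_specified are illustrative inventions, not part of any established analysis; True stands for a plus value and False for a minus value.

```python
from __future__ import annotations

# Hypothetical feature table for the three dwellings discussed above.
# True stands for a "+" value and False for a "-" value on each binary feature.
FEATURES = {
    "house": {"dwelling": True, "permanent": True, "brick": True},
    "hut": {"dwelling": True, "permanent": True, "brick": False},
    "tent": {"dwelling": True, "permanent": False, "brick": False},
}


def notation(word: str) -> str:
    """Render a word's features in bracketed [+feature]/[-feature] notation."""
    return " ".join(
        f"[{'+' if value else '-'}{name}]" for name, value in FEATURES[word].items()
    )


def distinguishing(a: str, b: str) -> list[str]:
    """Return the features on which two words carry opposite values."""
    return [f for f in FEATURES[a] if FEATURES[a][f] != FEATURES[b][f]]


def is_fully_specified() -> bool:
    """True if every word has a unique feature bundle, i.e. if the
    features alone are enough to tell all the listed words apart."""
    bundles = [tuple(sorted(fs.items())) for fs in FEATURES.values()]
    return len(bundles) == len(set(bundles))


if __name__ == "__main__":
    for w in FEATURES:
        print(w, notation(w))
    print(distinguishing("house", "hut"))  # ['brick']
    print(distinguishing("hut", "tent"))   # ['permanent']
    print(is_fully_specified())            # True
```

Adding a fourth word whose bundle matched hut's would make is_fully_specified return False, and the only remedy inside the scheme is to invent another feature. Nothing in the code, or in the scheme itself, says which feature that ought to be.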
To map some area of vocabulary (let alone a whole language) using only such elementary features of meaning is an enticing prospect, but it runs up against a fundamental problem: what criterion to use to establish the list of the elementary semantic features themselves. Common sense no doubt dictates that [±animate] and [±female] are among the distinctive features relevant to the meaning of the term woman and that [±chrome-plated] is not. But common sense appeals to our total experience of the nonlinguistic world as well as to our ability to find a way through the language maze: it is precisely the kind of fuzzy, vague, and informal knowledge that distinctive feature analysis seeks to overcome and replace. Despite the usefulness of binary decomposition for some kinds of linguistic description and (in far more complex form) in the “natural language processing” that computers can now perform, word meanings can never be fully specified by atomic distinctions alone. People are just too adept at using words to mean something else.
Such quasi-mathematical computation of “meaning” is equally unable to solve an even more basic problem, which is how to identify the very units whose meaning is to be specified. To ask what a word means (and translators often are asked to say what this or that word means) is to suppose that you know what word you are asking about, and that in turn requires you to know what a word is. The word “word” is certainly a familiar, convenient, and effective tool in the mental toolbox we use to talk about language. But it is uncommonly hard to say what it means.
Computers must know the answer, because they count words. That’s no consolation to us, however. What computers know about words is what they’ve been told, which comes down to this: a word is a string of alphabetic characters bounded on left and right by a space or one of these typographical symbols:—/ ? ! : ; , .
5
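The counting rule just described is simple enough to write down. The following Python sketch splits text at any run of the listed boundary characters; treating tabs and line breaks like spaces, and not checking that the pieces are purely alphabetic, are assumptions of the sketch rather than part of the rule as stated. The function names words and word_count are hypothetical.

```python
from __future__ import annotations

import re

# Boundaries as listed in the text: a space, the em dash, and / ? ! : ; , .
# Treating every kind of whitespace (tabs, line breaks) like a space is an
# assumption of this sketch.
_BOUNDARY_RUN = re.compile(r"[\s\u2014/?!:;,.]+")


def words(text: str) -> list[str]:
    """Split text at each run of boundary characters, dropping empty pieces.

    This does not check that a piece is purely alphabetic, so a hyphenated
    form such as "break-ups" comes out as a single piece.
    """
    return [piece for piece in _BOUNDARY_RUN.split(text) if piece]


def word_count(text: str) -> int:
    """Count the pieces that the boundary rule produces."""
    return len(words(text))


if __name__ == "__main__":
    for sample in ("break-ups", "break ups", "breakups", "Annáékkal voltunk moziban"):
        print(f"{sample!r}: {word_count(sample)}")
```

Under this rule break-ups and breakups each count as one word while break ups counts as two, and the Hungarian Annáékkal voltunk moziban counts as three: the rule measures spacing, not meaning.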
Computers don’t need to know what a word means to carry out the operations we ask of them. But we do! And if in some instance we really don’t, then we try to find out from a dictionary, from an acquaintance, or from listening to how other people speak. But all kinds of problems remain.
In languages such as English the identification of words is more art than science. Publishers have their own style sheets with rules for deciding whether couples have break-ups or break ups or breakups; but ordinary people also want to know if “to break up” should be counted as one, two, or three words. Yet nobody can really say.
6
English prepositional verbs provide unending employment for language experts who want to determine what a word is. They come in three or four parts. Sometimes they stay together—“Did you remember to take out the trash?”—and sometimes they don’t: “I promised to take my daughter out to see a film.” Does that mean that “to take out” is a word (or three) or two different words—“to take out” and “to take … out”—(or six) that look the same? Compilers of alphabetical dictionaries adopt practical solutions, but not the same ones, leaving the underlying question—what word is this?—unresolved. Teachers of English as a second language know the best answer to the question of how many words there are in a prepositional verb. If you want to know how to use the language properly, don’t ask.
Given the labyrinthine complexity of the variable terminologies and conflicting expert solutions to the conundrum of establishing what the word units are in perfectly ordinary English expressions, it seems fairly obvious that an ordinary user of a language such as English doesn’t need to know what a word is—or what word it is—in order to make sense. Wordhood is often a useful notion, but it is not a hard-edged thing.
Other languages undermine the wordness of words in a variety of different ways. German runs them together to make new ones. Lastkraftwagenfahrer (truck driver) is of course a single word in ordinary use, but it can easily be seen as two words written next to each other (Lastkraftwagen plus Fahrer, “truck” plus “driver”), or as three words run together (Last plus Kraftwagen plus Fahrer, “freight” plus “motor vehicle” plus “driver”), or as four (Last plus Kraft plus Wagen plus Fahrer, “freight” plus “power” plus “vehicle” plus “driver”). Hungarian also melds what we think of as many separate words, but in a different and equally elegant way. What a computer would count as a three-word expression, Annáékkal voltunk moziban, for example, would be expressed in English by around a dozen words: “We were [voltunk] at the cinema [moziban] with Anna and her folk” (that’s to say, friends or relatives or hangers-on, without distinction). The modest suffix -ék is all that is needed to turn Anna into a whole group, and the “glued-on” or agglutinated addition -kal says that you were part of it, too. Indeed, at my younger daughter’s wedding in London in 2003, in honor of her Hungarian grandparents I was able (after doing my homework) to raise a toast édeslányaméknak, which is to say in one word “to my dear daughter’s husband, in-laws, and friends.”
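The German example can be restated as a search problem: given a list of known pieces, find every way of cutting the compound into them. The Python sketch below does this with memoized recursion over a tiny hypothetical lexicon holding only the pieces named above; it ignores linking elements and everything else a real compound splitter would have to handle, and the function name segmentations is an illustrative invention.

```python
from __future__ import annotations

from functools import lru_cache

# Tiny hypothetical lexicon holding only the pieces named in the text,
# lowercased because the pieces appear in lowercase inside the compound.
LEXICON = frozenset({
    "lastkraftwagenfahrer",
    "lastkraftwagen",
    "kraftwagen",
    "last",
    "kraft",
    "wagen",
    "fahrer",
})


def segmentations(word: str) -> list[list[str]]:
    """Return every way of cutting `word` into a sequence of lexicon entries."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple[tuple[str, ...], ...]:
        # All segmentations of the suffix word[i:].
        if i == len(word):
            return ((),)  # the empty suffix has exactly one (empty) segmentation
        found = []
        for j in range(i + 1, len(word) + 1):
            head = word[i:j]
            if head in LEXICON:
                for tail in split_from(j):
                    found.append((head,) + tail)
        return tuple(found)

    return [list(seg) for seg in split_from(0)]


if __name__ == "__main__":
    for reading in segmentations("Lastkraftwagenfahrer"):
        print(len(reading), " + ".join(reading))
```

For Lastkraftwagenfahrer it returns four readings, of one, two, three, and four pieces. The search can list them all, but nothing in it can say which one is the word.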
Classical Greek has no proper word for “word”; moreover, in manuscripts and monuments from the earlier period, Greek is written without spaces between words. But that does not automatically mean that Greek thinkers had no concept of a basic unit of language smaller than the utterance. There is evidence of word dividers in Greek written in Linear B and Cyprian, ancient scripts that predate the Greek alphabet, and in various other ways a notion of “basic unit” does seem to emerge even in a language that supposedly has no “word” for the unit thus distinguished.
7
Even Hungarians recognize that some “words” are more basic than others, that beneath the practically infinite welter of possible agglutinated and compounded forms lie nuggets that are the elementary building blocks of sense. Gyerek is Hungarian for “child,” and though it may almost never occur in that form in any actual expression, it is nonetheless the “root” or “stem” corresponding to the English stem word child. Without an operative concept of the meaning units of which a language is made, it would be hard to imagine how a dictionary could be constructed. And without a dictionary, how would anyone ever learn a foreign tongue, let alone be able to translate it?
Understanding Dictionaries
Translators use dictionaries all the time. I have a whole set, with the Oxford English Dictionary in two volumes and Roget’s Thesaurus in pride of place, alongside monolingual, bilingual, and picture dictionaries of French idioms, Russian proverbs, legal terminologies, and much else. These books are my constant friends, and they tell me many fascinating things. But the fact that I seek and obtain a lot of help from dictionaries doesn’t mean that without them translation would not exist. The real story is the other way around. Without translators, Western dictionaries would not exist.
Among the very earliest instances of writing are lists of terms for important things in two languages. These bilingual glossaries were drawn up by scribes to maintain consistency in translating between two languages and to accelerate the acquisition of translating skills by apprentices. These still are the main purposes of the bilingual and multilingual glossaries in use today. French perfume manufacturers maintain proprietary databases of the terms of their trade to help translators produce promotional material for export markets, as do lathe manufacturers, medical specialists, and legal firms working in international commercial law. These tools assist translators mightily, but they do not lie at the origin of translating itself. They are the fruits of established translation practice, not the original source of translators’ skills.
Sumerian bilingual dictionaries consist of roomfuls of clay tablets sorted into categories—occupations, kinship, law, wooden artifacts, reed artifacts, pottery, hides, copper, other metals, domestic and wild animals, parts of the body, stones, plants, birds and fish, textiles, place-names, and food and drink, each with its matching term in the unrelated language of Sumer’s Akkadian conquerors.
1
As they are organized by field, they correspond directly to today’s SPDs, or “special purpose” dictionaries—Business French, Russian for the Oil and Gas Industries, German Legal Terminology, and so forth. Some of them are multilingual (as are many of today’s SPDs) and give equivalents in Amoritic, Hurritic, Elamite, Ugaritic, and other languages spoken by civilizations with which the Akkadians were in commercial if not always peaceful contact.
2
From ancient Mesopotamia to the late Middle Ages in Western Europe, word lists with second-language equivalents went on serving the same purposes—to regularize translation practice and to train the next generation of translators. Characteristically, they mediate between the language of conquerors and the language of the conquered retained as a language of culture. What did not arise in the West at any time until after the invention of the printed book were general or all-purpose word lists giving definitions in the same language.
The Western monolingual dictionary—“the general purpose” dictionary, or GPD—is a late by-product of the ancient tradition of the translator’s companion, the bilingual word list, but its impact on the way we think about a language has been immense. The first real GPD was launched by the Académie Française in the seventeenth century (volume 1, A–L, appeared in 1694); the first to be finished from A to Z was Samuel Johnson’s dictionary of the English language, which came out in 1755.
These monuments mark the invention of French and of English as languages in a peculiar, modern sense. Once they had been launched, every other language had to have its own GPD—failing which, it would not be a real language. It wasn’t just rivalry that sparked the great race to produce national dictionaries for every “national language.” The need to compile self-glossing lists of all the words in a language also expressed a new idea of what kind of a thing a language was, an idea taken directly from what had happened in English and French.
The Chinese tradition is entirely different.
3
Its rich history of word lists is essentially linked to the tradition of writing commentaries on ancient texts, not at all to the business of translating foreign languages, in which traditional Chinese civilization seems to have had as little interest as did the Greeks. Early Chinese dictionaries were organized by semantic field and gave definitions roughly like this: “If someone calls me an uncle, I call him a nephew” (from the Erh Ya, third century B.C.E.). It was not easy to find a word in the Erh Ya, and many of the definitions given were too vague to be useful in the way we would now want a dictionary to be. It was a tool for cultivating knowledge of more ancient texts, so as to maintain refinement in speech and script. The second kind of glossary of classical Chinese arose in the first century C.E., and it listed characters organized by their basic written shapes, or “graphic radicals.” These works gave no clues as to how the words should be pronounced, and their purpose was mainly to assist the interpretation of ancient written texts. The third type of early Chinese lexicon was the rhyme dictionary, a handbook for people who needed to know what rhymes with what, because rhyming skills were tested in examinations for the imperial civil service. It was not until the seventeenth century that a system for classifying Chinese characters in a way that made them easily retrievable was devised by the scholar Mei Ying-tso, a few years before Jesuit missionaries produced the first Western-style bilingual dictionaries of Chinese (into Latin, then Portuguese, Spanish, and French). Traditional Chinese dictionaries, lexicons, and glossaries do not list “all the words of the language” in the way that Western dictionaries seek to do; they list written characters and they organize them by semantic field, or by written forms, or by sound. Their profound difference perhaps makes clearer the extent to which Western dictionary making is also a “regional” tradition arising from the particular nature of the script that we have.
What is a dictionary for? The utility of a bilingual glossary is obvious. But what is the purpose of a monolingual one? A GPD seems to imply that speakers of the language do not know it very well, as if English, to take the first real example, were to some degree foreign to speakers of English themselves. Why else would they need a dictionary to translate the words of the language for them? The conceptualization of anything as grand and comprehensive as the Dictionnaire de l’Académie involves treating the written form of a spoken language as a thing that can be learned and studied not by foreigners but by native speakers of that language. It’s a peculiar idea. By definition, what a monolingual dictionary codifies is precisely the ability to speak that users of the dictionary possess.
The second presupposition of general-purpose dictionaries is that a list of all the word forms of a language is possible. We have become so accustomed to GPDs that it takes a moment to realize just what an extraordinary proposition that is. We may grant that dictionaries are always a little bit out-of-date, that even the best among them always miss something we would have liked to see there—but we should stop to take such thoughts a step further. To try to capture “all the words of a language” is as futile as trying to capture all the drops of water in a flowing river. If you managed to do it, it wouldn’t be a flowing river anymore. It would be a fish tank.
Once Latin had ceased to be a spoken language, it became possible to list all the word forms occurring in Latin manuscripts. That was done many times over, just as Roman scholars had compiled lexicons of words in Homer’s Greek, and Buddhist monks had listed all the words in sacred Sanskrit texts.
4
The monolingual dictionary of modern times treats French, or English, or German as if it were Latin—and that was the point. It raises the vernacular to the level of the language of the scholars. It proves that speaking English requires and also shows as much cultivation as using Latin. The monolingual dictionary was in the first place a two-pronged weapon for the improvement and the assertion of the common man.
“Improvement” and “assertion” may seem to go hand in hand, but those locked hands are really engaged in an arm-wrestling match. The first alphabetical lists of words in vernacular languages were extensions of traditional language-teaching tools: Robert Estienne’s Les Mots francois selon l’ordre des lettres ainsi que les fault escrire; tournez en latin, pour les enfans, first published in 1544, helped French-speaking children learn the rudiments of their language of culture, namely Latin, but incidentally gave them a tool for writing the vernacular correctly. (Spelling in French was quite variable in the sixteenth century. As Estienne was a printer, he had a stake in the standardization of the written language.) Over the following century, as both English and French absorbed more words from each other and from classical languages, alphabetical listings of technical, philosophical, and foreign words became quite popular. In 1604, a Coventry schoolmaster, Robert Cawdrey, brought out a work whose lengthy title explains the social and cultural basis for dictionary making ever since:
A Table Alphabeticall of hard usual English wordes, with the interpretation thereof by plaine English words, gathered for the benefit & help of Ladies, gentlewomen, or any other unskilful person. Whereby they more easilie and better understand many hard English wordes, which they shall heare or read in the Scriptures, Sermons, or elsewhere, and also be made able to do the same aptly themselves.
The step from compiling such socially useful works for the improvement of the undereducated classes to making dictionaries of all words may seem natural. It could be accounted for by the spread of literacy, the growth of the book trade, an obsession with the making of more and more specialized glossaries, and the wish to bring all this language lore together in one place. But that would be a retrospective illusion. Intellectually, there is a huge gulf between works, however extensive, that lay down the meanings of “hard” or technical or foreign terms to help less well-educated folk, and an attempt to list all the words that are spoken by the speakers of a given language. To make that leap you have to think of the language you speak as a finite entity. “The English language” has to be conceptualized not as a social practice but as a thing in itself. That is why the history of the English dictionary is the history of the invention of a “language” in the sense that we now understand that word.
Dictionaries alone aren’t responsible for the thingification of natural languages, but they crystallized a peculiar modern view of what it means to have a language. The spread of the printed book is also a major factor in the converging circumstances and technologies that gave us the ideas that have dominated modern language study ever since, and profoundly affected our understanding of what translators do.
GPDs, from Samuel Johnson’s to Webster’s and from Brockhaus to Robert, list the words that are part of the language. In so doing they also tell us that the language we speak is a list of words. From its origin in the Hebrew Bible, the nomenclaturist understanding of what a language is was given a huge, definitive boost by the emergence of the modern typographical mind.
Which words are entitled to be listed in a dictionary that gives not a field-restricted set of words but the words of a whole language? Well, the words that people use. All of them? To the extent that is even possible, GPDs forfeit their historical claim to be instruments of improvement. That’s the arm wrestling. Laying down what words mean and how they should best be used, as was Cawdrey’s laudable plan, runs directly counter to the wider project of listing all the words people actually use with the varied meanings they may give to them. That’s why monolingual reference dictionaries have grown so impractically large. The solution to that problem is vividly illustrated by the career of one of Georges Perec’s fictional characters:
Cinoc … pursued a curious profession. As he said himself, he was a “word-killer”: he worked at keeping Larousse dictionaries up to date. But while other compilers sought out new words and meanings, his job was to make room for them by eliminating all the words and meanings that had fallen into disuse.
When he retired … he had disposed of hundreds and thousands of tools, techniques, customs, beliefs, sayings, dishes, games, nicknames, weights and measures … He had returned to taxonomic anonymity hundreds of varieties of cattle, species of birds, insects and snakes, rather special sorts of fish, kinds of crustaceans, slightly dissimilar plants and particular breeds of vegetables and fruit; and cohorts of geographers, missionaries, entomologists, Church Fathers, men of letters, generals, Gods & Demons had been swept by his hand into eternal obscurity.
5
GPDs of any language, and quite especially those using an alphabetical script, are always of potentially infinite size, because no language can have fixed boundaries in time or space, and there can be no ultimate, definitive division of a social practice into a finite set of components. To escape from this dilemma while pursuing the broad project of mapping a particular language, Peter Mark Roget devised his Thesaurus (“treasure” in Greek), which uses not the arbitrary order of the alphabet but the natural order of the world as its organizing principle. He established six general classes of “real things,” which are not material things but ideas: Abstract Relations, Space, Matter, Intellectual Faculties, Voluntary Power, and Sentient and Moral Powers. These he divided into categories, then broke down each category into lesser groups of ideas, and only at that point did he list all the words and expressions that may be used to communicate the idea. “Sentient and Moral Powers,” for example, incorporates the category of “Personal Affections,” one of whose groups is constituted by “Discriminative Affections,” among which figures the subgroup “Aggravation.” That’s where you find a raft of words and phrases including anger, ire, fury, to get up someone’s nose, to piss someone off, and to get someone’s goat—a long list of synonyms all of which express some quality or variety of aggravation. Roget’s Thesaurus is an extraordinary achievement. Its structure harks back to those Sumerian word hoards on clay tablets sorted by thematic category, but as it contains very few words like polyester, recitative, or crankset, it offers no support at all to those who would like to see a language as a list of the names of things. Rather, it displays to a spectacular degree the sheer redundancy of the vocabulary set that we have, with dozens of words giving only minutely different shades of meaning for almost exactly the same thing (anger, ire, fury …). Roget shows language to be a rich, illogical, and complicated tool for making fine and often arbitrary distinctions—for discriminating, separating out, and saying the same thing in different ways.
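Roget’s scheme is, in effect, a tree: classes branch into categories, categories into groups, groups into subgroups, and the words hang from the leaves. The Python sketch below models only the single branch named above, with hypothetical function names find_paths and neighbors, and shows the direction of lookup the Thesaurus supports, from a word up to the idea it is filed under and across to the other words filed there. It makes no attempt to reproduce the numbering or layout of any printed edition.

```python
from __future__ import annotations

from collections.abc import Iterator

# Hypothetical fragment of Roget's hierarchy: only the branch named in the text.
# Nested dicts are headings (class -> category -> group -> subgroup);
# the list at the bottom holds the words and phrases filed under the subgroup.
THESAURUS = {
    "Sentient and Moral Powers": {
        "Personal Affections": {
            "Discriminative Affections": {
                "Aggravation": [
                    "anger",
                    "ire",
                    "fury",
                    "to get up someone's nose",
                    "to piss someone off",
                    "to get someone's goat",
                ],
            },
        },
    },
}


def find_paths(word: str, node: dict | list | None = None,
               path: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield each chain of headings under which `word` is filed."""
    if node is None:
        node = THESAURUS
    if isinstance(node, list):
        if word in node:
            yield path
        return
    for heading, child in node.items():
        yield from find_paths(word, child, path + (heading,))


def neighbors(word: str) -> list[str]:
    """Return the other entries that share a subgroup with `word`."""
    found: list[str] = []
    for path in find_paths(word):
        node = THESAURUS
        for heading in path:
            node = node[heading]
        found.extend(entry for entry in node if entry != word)
    return found


if __name__ == "__main__":
    print(" > ".join(next(find_paths("ire"))))
    print(neighbors("ire"))
```

Looking up ire returns the path ending in Aggravation, and its neighbors are the other entries in that subgroup, which is what a reader of the Thesaurus goes there to find. A word such as polyester simply yields no path in this fragment.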