The Adventure of Automated Language-Translation Machines
The reluctance of European peoples to retain Latin or to adopt some other transmission language—such as Esperanto—for the dissemination of important information has created a costly and difficult set of translation tasks, carried out under time pressures unimaginable in earlier ages. Now that nearly all other aspects of news transmission are carried out not by couriers but by electronic devices, it seems natural to ask why the core activity itself cannot be handled likewise, by automatic translation machines.
Although it is still in its infancy, machine translation has had an eventful and uneven history. It first arose in dramatic historical circumstances and in response to an overriding political need. It wasn’t initiated by an explicit act of political will, like the language rules of the European Union, but its launching ground was the climate of terror at the start of the Cold War. The United States had developed and used the atomic bomb. For the time being it had a monopoly on this terrible weapon. How long would the monopoly last? When would the Soviet Union catch up? One way of guessing the answer was to comb through all the research journals being published in the U.S.S.R., looking for clues as to the state of knowledge in the relevant disciplines.
The journals were in Russian. The United States needed either to train up a veritable army of Russian–English scientific translators—or to invent a machine that would do the job for them.
But it takes a long time to constitute a large group of translators from a language not widely known. There was no obvious source of English-educated, scientifically literate Russian translators in 1945, and so the authorities began to look toward machines. There were good reasons to think they could help with the urgent task of tracking the Soviets’ ability to design an atomic bomb.
The Second World War had fostered great advances in cryptography, the making and breaking of secret codes. Statistical techniques had been developed for decoding messages even when the language that had been encoded was not known. The astounding successes of the code breakers at the Bletchley Park site in England prompted some thinkers to wonder whether language itself could not be treated as a code. In a famous memorandum written in July 1949, Warren Weaver, then a senior official with the Rockefeller Foundation, found it “very tempting to say that a book written in Chinese is simply a book in English which was coded into the ‘Chinese code.’ If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?”
Weaver was aware of the pioneering work of Claude Shannon and others in the nascent disciplines of information theory and cybernetics and could see that if language could be treated as a code, then there would be huge development contracts available for mathematicians, logicians, and engineers working on the new and exciting number-crunching devices that had only just acquired their modern name of “computers.” But the temptation to see “language as code” comes from much deeper sources than just an intuition that it would create interesting jobs for very smart boys.
A code, or cipher, is a way of representing a piece of information so that it can be read only by someone who holds the (secret) key to the code. However sophisticated the key, however complicated the algorithm that turns the “source” into “code,” there is always a discoverable relationship between the expression in code and the encoded expression. If a language itself is a code of that kind, what does it encode? There’s only one possible answer in the long Western tradition of thinking about language since the time of the Greeks, and that answer is: meaning (sometimes called “thought”). A translation machine would need to strip away from the actual expression in language A all that is “code,” so as to access the real thing that it encodes, namely, the actual, irreducible, plain, and basic meaning of the expression. It’s really no more than a rehearsal of the ancient idea that language is the dress of thought. Weaver himself proposed the following analogy:
Think of individuals living in a series of tall closed towers, all erected on a common foundation. When they try to communicate with one another, they shout back and forth, each from his own closed tower. It is difficult to make the sounds penetrate even the nearest towers, and communication proceeds very poorly indeed. But when an individual goes down his tower, he finds himself in a great open basement, common to all the towers. Here he establishes easy and useful communication with the persons who have also descended from their towers.
That dream of “easy and useful communication” with all our fellow humans in the “great open basement” that is the common foundation of human life expresses an ancient and primarily religious view of language and meaning that has proved very hard to escape, despite its manifestly hypothetical nature. For what language would humans use to communicate with one another in the “great open basement”? The language of pure meaning. At later stages in the adventure of machine translation and modern linguistics, it came to be called “interlingua” or “the invariant core” of meaning and thought that a communication in any language encodes.
The task that machine-translation pioneers set themselves was therefore almost identical to the task of the translator as expressed by many modern theorists and philosophers: to discover and implement the purely hypothetical language that all people really speak in the great open basement of their souls.
How was that to be done by machines? Plenty of intellectual machinery already existed that seemed designed for the purpose. Ever since the Romans started teaching their young to read and write Greek, language learners in Western tongues have always been told they have two basic tasks: to acquire vocabulary in the foreign tongue, and to learn its grammar. That’s why we have bilingual dictionaries separate from our grammar books, which give the set of rules by which the “words” in the vocabulary may be combined into acceptable strings. That’s what a language is, in our ancient but undimmed language theology: a Meccano set, made up of one part nuts, bolts, girders, beams, cogwheels, and perforated bars (let’s say, prepositions, verbs, nouns, adjectives, particles, and determiners) and, for the other part, rules about how to fix them together. A nut goes on a bolt but not on a cogwheel, just as a verb clicks on a subject before and an object after …
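A toy sketch may make the Meccano picture concrete. The few lines of Python below are purely illustrative: no historical system is being reproduced, and the word list and the single reordering rule are invented. They show only what a translator built from nothing but a vocabulary table and combination rules amounts to.

```python
# A toy illustration of the "Meccano set" view of translation:
# a bilingual word list plus a rule for fixing the parts together.
# The vocabulary and the single reordering rule are invented;
# no real system was ever this simple.

BILINGUAL_DICTIONARY = {
    "nauka": "science",          # hypothetical Russian-to-English entries,
    "razvivaetsya": "develops",  # transliterated for readability
    "bystro": "quickly",
}

def toy_translate(source_words):
    """Word-for-word substitution followed by one crude 'grammar' rule."""
    # Step 1: vocabulary lookup, word by word.
    target = [BILINGUAL_DICTIONARY.get(w, f"<{w}?>") for w in source_words]
    # Step 2: a single reordering rule (move any adverb to the end),
    # standing in for the rules about how the parts click together.
    adverbs = {"quickly"}
    target = ([w for w in target if w not in adverbs]
              + [w for w in target if w in adverbs])
    return " ".join(target)

print(toy_translate(["nauka", "razvivaetsya", "bystro"]))
# prints: science develops quickly
# Anything the word list and the rules do not anticipate
# (idioms, exceptions, ambiguity) simply falls through the cracks.
```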
It was theoretically possible at the start of the machine-translation adventure (and it soon became practically possible as well) to store a set of words on a computer, divided into the grammatical classes the Greeks and Romans devised. It was equally possible to store two sets of words, one for Russian, one for English, and to tell the computer which English word matched which Russian one. More dubious was the proposition implicit in Weaver’s fable that you could bring people down from their separate towers to the common basement—that’s to say, tell a computer what to do to unwrap the meaning of a sentence from the form of the sentence itself. To do that, the computer would first need to know the entire grammar of a language. It would have to be told what that consists of. But who knows the entire grammar of English? Every language learner quickly realizes that systematic regularities are frequently overruled by exceptions of arbitrary kinds. Every speaker of a native language knows that she can (and frequently does) break the “rules” of grammar. A complete linguistic description of any language remains an aspiration, not a reality. That is one of the two reasons why the first great phase of machine translation hit the skids. The second is that even humans, who can be assumed to be in full possession of the grammar of their language, still need a heap of knowledge about the world in order to fix the meaning of any expression—and nobody has figured out how to get a computer to know what a sentence is about. A classic conundrum that computers could not solve is to attribute the correct meanings to the words in the following two sentences: “The pen is in the box” and “The box is in the pen.” Understanding them calls on knowledge of the relative sizes of things in the real world (of a pen-size box and a sheep pen, respectively) that can’t be resolved by dictionary meanings and syntactic rules. In 1960, the eminent logician Yehoshua Bar-Hillel, who had been hired by MIT specifically to develop “fully automatic high-quality translation,” or FAHQT, sounded a testy retreat:
I have repeatedly tried to point out the illusory character of the FAHQT ideal even in respect to mechanical determination of the syntactical structure of a given source-language sentence … There exist extremely simple sentences in English—and the same holds, I am sure, for any other natural language—which, within certain linguistic contexts, would be … unambiguously translated into any other language by anyone with a sufficient knowledge of the two languages involved, though I know of no program that would enable a machine to come up with this unique rendering unless by a completely arbitrary and ad hoc procedure …
That pretty much put an end to easy money from the grant-giving foundations. But the establishment in 1957 of the European Economic Community, forerunner of today’s European Union, provided a new political impetus—and a new funding source—for the development of the tools that Bar-Hillel thought impossible. Ambitions were scaled down from FAHQT to more feasible tasks. As computers grew in power and shrank in size, they could more easily be relied upon for tasks that humans find tiresome, such as checking that a given term has been translated the same way each time it occurs in a long document. They could be used for compiling and storing dictionaries not just of technical terms but of whole phrases and expressions. The era not of fully automatic translation but of CAT—computer-aided translation—began. Private companies started developing proprietary systems, for although the big demand came from transnational entities such as the EU, there was a real need for such tools among major companies producing aircraft, automobiles, and other goods to be sold all over the world.
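One of those tiresome tasks, checking that a given term has been rendered the same way at every occurrence, is easy to picture in code. The sketch below is illustrative only and is not drawn from any commercial CAT tool; the glossary entry and the sentence pairs are invented.

```python
# A minimal sketch of one computer-aided-translation chore:
# verifying that a glossary term is translated consistently
# throughout a document. Glossary and sentences are invented examples.

from collections import defaultdict

GLOSSARY = {"landing gear": "train d'atterrissage"}  # hypothetical EN-to-FR entry

def check_term_consistency(segment_pairs, glossary):
    """Report segments where a glossary term appears in the source
    but the approved target term is missing from the translation."""
    problems = defaultdict(list)
    for source, target in segment_pairs:
        for term, approved in glossary.items():
            if term in source.lower() and approved not in target.lower():
                problems[term].append((source, target))
    return problems

pairs = [
    ("Inspect the landing gear before each flight.",
     "Inspecter le train d'atterrissage avant chaque vol."),
    ("Retract the landing gear after takeoff.",
     "Rentrer les roues après le décollage."),  # inconsistent rendering
]

for term, hits in check_term_consistency(pairs, GLOSSARY).items():
    for source, target in hits:
        print(f"'{term}' not rendered with the approved term in: {target}")
```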
It is easier to achieve good results from CAT when the input conforms not to a natural language in its raw and living state but to a restricted code, a delimited subspecies of a language. In an aircraft-maintenance manual you find only a subset of the full range of expressions possible in English. To produce, by means of an automatic-translation device, the hundred or so language versions of the manual that are needed, you do not need to make the device capable of handling restaurant menus, song lyrics, or party chitchat—just aircraft-maintenance language. One way of doing this is to pre-edit the input text into a regularized form that the computer program can handle, and to have proficient translators post-edit the output to make sure it makes sense (and the right sense) in the target tongue. Another way of doing it is to teach the drafters of the maintenance manuals a special, restricted language—Boeinglish, so to speak—designed to eliminate ambiguities and pitfalls within the field of aircraft maintenance. This is now a worldwide practice. Most companies that have global sales have house styles designed to help computers translate their material. From computers helping humans to translate we have advanced to having humans help computers out. It is just one of the truths about translation showing that a language is really not like a Meccano set at all. Languages can always be squeezed and shaped to fit the needs that humans have, even when that means squeezing them into computer-friendly shapes.
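The pre-editing idea can likewise be sketched in a few lines. The substitution table and the flagged words below are invented for illustration; real controlled languages, such as the aerospace industry’s Simplified Technical English, are governed by much larger rule books.

```python
# A sketch of controlled-language pre-editing: draft text is normalized
# into a restricted vocabulary before it reaches the translation engine.
# The substitution table is invented for illustration only.

import re

PREFERRED_TERMS = {
    r"\bshut off\b": "turn off",   # one approved verb per action
    r"\bprior to\b": "before",     # plain connectives only
    r"\bcarry out\b": "do",
}

FORBIDDEN_WORDS = {"it", "this"}   # vague references a human must rewrite

def pre_edit(sentence):
    """Apply approved substitutions and flag words needing human attention."""
    for pattern, replacement in PREFERRED_TERMS.items():
        sentence = re.sub(pattern, replacement, sentence, flags=re.IGNORECASE)
    flagged = [w for w in re.findall(r"\w+", sentence.lower())
               if w in FORBIDDEN_WORDS]
    return sentence, flagged

text, warnings = pre_edit("Shut off the pump prior to carrying out maintenance.")
print(text)      # prints: turn off the pump before carrying out maintenance.
print(warnings)  # prints: []
```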
Computer-aided human translation and human-aided computer translation are both substantial achievements, and without them the global flows of trade and information of the past few decades would not have been nearly so smooth. Until recently, they remained the preserve of language professionals. What they also did, of course, was to put huge quantities of translation products (translated texts paired with their source texts) in machine-readable form. The invention and the explosive growth of the Internet since the 1990s have made this huge corpus available for free to everyone with a terminal. And then Google stepped in.
Using software built on mathematical frameworks originally developed in the 1980s by researchers at IBM, Google has created an automatic-translation tool that is unlike all others. It is not based on the intellectual presuppositions of Weaver, and it has no truck with interlingua or invariant cores. It doesn’t deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before. It uses vast computing power to search, in the blink of an eye, an immense corpus of texts drawn from the Internet, looking for the expression in some text that exists alongside its paired translation. The corpus it can scan includes all the paper put out since 1957 by the EU and its predecessor institutions in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the Web by individuals, libraries, booksellers, authors, and academic departments. Drawing on the already established patterns of matches between these millions of paired documents, GT uses statistical methods to pick out the most probable acceptable version of what’s been submitted to it. Much of the time, it works. It’s quite stunning. And it is largely responsible for the new mood of optimism about the prospects for FAHQT, Weaver’s original pie in the sky.
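The statistical move itself can be caricatured in a handful of lines: harvest phrase pairs from aligned documents, count them, and return the pairing seen most often. The pairs and counts below are invented, and real systems (IBM’s models and GT among them) weigh far more evidence, including alignment probabilities and the fluency of the output, but the basic gesture is the one shown here.

```python
# A caricature of the statistical approach: given phrase pairs harvested
# from aligned bilingual documents, choose the most frequently observed
# translation of each source phrase. The pairs below are invented examples.

from collections import Counter

# Pretend these pairs were extracted from millions of aligned documents.
OBSERVED_PAIRS = [
    ("the committee agreed", "le comité a convenu"),
    ("the committee agreed", "le comité a convenu"),
    ("the committee agreed", "la commission a accepté"),
]

def build_phrase_table(pairs):
    """Count how often each target phrase was paired with each source phrase."""
    table = {}
    for source, target in pairs:
        table.setdefault(source, Counter())[target] += 1
    return table

def most_probable_translation(source, table):
    """Return the candidate seen most often alongside the source phrase."""
    candidates = table.get(source)
    return candidates.most_common(1)[0][0] if candidates else None

table = build_phrase_table(OBSERVED_PAIRS)
print(most_probable_translation("the committee agreed", table))
# prints: le comité a convenu
```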