Surfaces and Essences: Analogy as the Fuel and Fire of Thinking

Only a few years after the dream of machine translation was hatched, it was already starting to run into profound problems. These problems were articulated by a number of skeptics, of whom perhaps the most vociferous was the logician Yehoshua Bar-Hillel, who had once been one of the field’s earliest and most enthusiastic researchers. In the mid-1950’s, then, there was already deep skepticism over the idea of translation as a “mechanical” or “algorithmic” process, and such skepticism is still warranted today, as we have just seen.

Good Analogies Make Good Translations

How, then, should one translate Lincoln’s famous speech-opener? Suppose we wished to translate it into French, for instance. It helps to know what it means. To begin with, then, since a score consists of twenty items, it would follow that “four score” means “eighty”, and that “four score and seven” means “eighty-seven”. Can we say this in French? Yes, fortunately, the French, who have a strong mathematical tradition, do have a way of saying “eighty-seven” — namely, “quatre-vingt-sept”. To say “years”, say “ans”, and to say “ago”, just put “il y a” at the very front. So putting the little pieces together, we get “Il y a quatre-vingt-sept ans”. All done!

There is a problem, though, which is that Lincoln’s phrase “four score and seven” is not the usual way of saying “eighty-seven” in English, but a special way of saying it, based on a somewhat odd way of looking at the number and which, more importantly, has a clear poetic resonance. These are both key qualities of Lincoln’s turn of phrase, so it would be crucial to preserve both of them in our translation.

Now, by a remarkable coincidence, the French word for “eighty” happens not to be based on “huit” (the French word for “eight”), but on the words “quatre” and “vingt” (“four”, “twenty”) — so that “quatre-vingts”, the standard French word for “eighty”, actually means “four twenties”! It would seem that the ideal solution — “Il y a quatre-vingt-sept ans” (“Four-twenty-seven years ago”) — fell right out of the sky. What luck! It certainly would not have worked if the Gettysburg Address had been delivered in 1843 (an event that would have been confusing to more than one historian), since “three score and seven” is expressed in French not as “trois-vingt-sept” (that is, not as “three-twenty-seven”) but as “soixante-sept” (that is, as “sixty-seven”, very literally).

It might at first blush seem that thanks to this stroke of luck, we are in fact done, but this is a hasty conclusion. French speakers almost never notice the “quatre” and the “vingt” inside “quatre-vingts” — no more than English speakers hear the words “for” and “tea” inside “forty”, or, for that matter, an allusion to the terror of Mongol invaders inside “hundred”. Even if for English speakers, “quatre-vingts” seems to brim with the proper arithmetical meaning and also to exude an archaic, poetic flavor, it does nothing of the sort for French speakers; to them, it is just an ordinary, pedestrian word for the number halfway between 70 and 90. By contrast, for English speakers, Lincoln’s phrase requires a tad of conscious calculation and resounds with poetry and nobility. Therefore, if we hope to capture its essence in another language, then we have to avoid, at all costs, the mundane — and this means that what we at first took for a great stroke of luck turns out to have been a cursing in disguise! And so, it’s back to the drawing board.

Our goal is to be high-quality copycats, which is to say, to “do what Lincoln did” — and what Lincoln did is to replace the standard English eight-tens view of eighty by an exotic, attention-grabbing four-twenties view. At this point, it may seem obvious to some that the solution is simply to use a reversal analogy — namely, to replace French’s standard four-twenties view of eighty by an exotic, attention-grabbing eight-tens view. Indeed, this is even reminiscent of that elegant Copycat flip of perspective whereby xyz is seen as the mirror image of abc. And as a matter of fact, the word “huitante” (meaning essentially “eight-ty”), though exotic in France, is commonly used in Switzerland in lieu of “quatre-vingts”, and in some French-speaking locales the even more exotic word “octante” is also a dialectal way of saying “eighty”, though today it has almost fallen out of currency. So what about using one or the other of these unusual French words for “eighty”? Well, unfortunately, the phrases “Il y a huitante-sept ans” and “Il y a octante-sept ans”, rather than sounding poetic and uplifting, come across to most French speakers as simply quaint. As would-be translations of Lincoln’s noble phrase, they are both extremely wanting. In this case, the often clever idea of a reversal analogy doesn’t pay off.

Clearly, then, we will need to search around for a deeper French analogue to Lincoln’s phrase. What might be the French counterpart of the poetic phrase “four score and seven years ago”? Without recounting all our failed forays, we can simply say that we began by exploring the idea of “seven dozen and three years ago”, which uses the common French word “douzaine”. This, however, did not seem analogous enough to the original because, among other things, it involved too much conscious calculation, although it did at least activate the closely related thought of using the French word “dizaine” (which is analogous to “douzaine”, but features only ten items). And once again, although “il y a huit dizaines et sept ans” sounded silly to our ears, it reminded us of the word “trentaine” (which also is like a dozen, but involves thirty items), so that “eighty-seven” could be expressed as “trois trentaines moins trois” — “three thirties minus three”. At this point, however, our translation was becoming too much like an arithmetic problem, or even a mild tongue-twister. Hardly our goal!

Eventually, it dawned on us that French has the word “vingtaine” as well — based on “vingt” and meaning a collection of twenty items (in other words, a score). Thus we finally hit upon the translation “Il y a quatre vingtaines d’années, plus sept ans”. This was far more promising, but still, its flavor was not sufficiently analogous to that of the original, since the explicit mention of the arithmetical operation “plus” was too heavy-handed, as if Lincoln had said “four score plus seven years ago”. In the end, though, we were able to raise the loftiness of the tone by making some minor adjustments, as follows: “Voici quatre vingtaines d’années, et encore sept ans…” (“Four score years ago, and yet seven more…”). We were quite proud of our collaborative find.

Another idea was suggested to us by a translator friend who took advantage of the poetic word “lustre”. Although most native speakers of French think of it as simply meaning “a long time”, it can also mean a five-year chunk of time. This fact allowed our friend to render the president’s lustrous words by “Voici seize lustres et encore sept ans…” (“Sixteen lustres and seven years ago…”), which exudes a rather grand and lofty flavor. (Of course he could have said “Seventeen lusters and two years ago…”, but it’s not clear that that’s an improvement.)

Altogether, then, through a slow process of carefully honed analogy-making and analogy-judging, we eventually managed to recreate some of the high-sounding flavor of Abraham Lincoln’s immortal phrase, while sidestepping various superficially enticing traps along the way.
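Whatever their poetic merits, the candidate phrasings considered above all had to denote the same number. As a minimal sketch of that arithmetic, the following Python snippet checks each decomposition against 87. The names `UNITS`, `CANDIDATES`, `value` and `check_all` are illustrative assumptions and not part of the original text.

```python
# Sizes of the counting words mentioned in the text.
UNITS = {
    "score": 20,      # English "score"
    "dozen": 12,      # French "douzaine"
    "dizaine": 10,
    "vingtaine": 20,
    "trentaine": 30,
    "lustre": 5,      # a five-year span
}


def value(count, unit, remainder):
    """Return count * size(unit) + remainder (the remainder may be negative)."""
    return count * UNITS[unit] + remainder


# (label, count, unit, remainder) for each phrasing discussed in the text.
CANDIDATES = [
    ("four score and seven", 4, "score", 7),
    ("seven dozen and three", 7, "dozen", 3),
    ("huit dizaines et sept", 8, "dizaine", 7),
    ("trois trentaines moins trois", 3, "trentaine", -3),
    ("quatre vingtaines et encore sept", 4, "vingtaine", 7),
    ("seize lustres et encore sept", 16, "lustre", 7),
    ("seventeen lustres and two", 17, "lustre", 2),
]


def check_all(target=87):
    """Map each phrasing to True if it really denotes the target number."""
    return {label: value(c, u, r) == target for label, c, u, r in CANDIDATES}


if __name__ == "__main__":
    for label, ok in check_all().items():
        print(f"{label}: {'ok' if ok else 'MISMATCH'}")
```

The check only confirms that the numbers agree. It says nothing about tone, which is the criterion that actually decided among the candidates.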

Potential Progress in Machine Translation

The preceding anecdote confirms the pervasive thesis of Warren Weaver’s book Alice in Many Tongues, which is that to translate well, the use of analogies is crucial. In order to come up with possible analogies and then to judge their appropriateness, one must carefully exploit one’s full inventory of mental resources, including one’s storehouse of life experiences.

Could machine translation possibly do anything of the sort? Is it conceivable that one day, computer programs will be able to carry out translation at a high level of artistry? A couple of decades ago, some machine-translation researchers, spurred by the low quality of what had then been achieved in their field, began to question the methods on which the field had been built (mostly word-matching and grammatical rules), and started exploring other avenues. What emerged with considerable vigor was the idea of statistical translation, which today has become a very important strategy used in tackling the challenge of machine translation.

This approach is based on the use of statistically-based educated guesswork, where the data base in which all guesses are rooted consists of an enormous storehouse of bilingual texts, all of which have been carefully translated by human experts. A typical example of such a data base is the proceedings, over several decades, of the Canadian Parliament, which are legally required to be made available in both English and French. Such a data base is a marvelous treasurehouse of linguistic information, if only one can figure out how to exploit it.

The basic idea of statistical machine translation is to choose among the many possible meanings of a “chunk” (that is, a word or several-word segment) in a piece of input text (i.e., a text to be translated) by exploiting the context in which the chunk appears in the given passage. Suppose, for instance, that the engine is translating from English to French. The English chunk to translate may appear in many thousands of diverse contexts in the English side of the bilingual data base, but only a small number of those thousands of contexts (say, two dozen) are likely to be found sufficiently “similar” to the original context (where “similarity” is judged by a complex statistical calculation). This phase of narrowing-down on the basis of statistical similarity is the crux of the matter. In the human-translated bilingual data base, each of these relatively few English-language contexts comes aligned with a corresponding French-language context. The translation problem would thus seem to have been reduced to simply zeroing in on the corresponding chunk in these few French contexts. Unfortunately, though, this vision is too optimistic. In general there won’t be just one precisely corresponding chunk in the French contexts; there may be quite a few rival candidate French chunks, and so, to get a good candidate, educated guesswork (i.e., further statistical calculations, the details of which we will skip) is called for. The long and the short of it is that in this computationally intensive fashion, which takes advantage of vast amounts of human-translated text, the French chunk that is “most probably equivalent” to the English chunk is pinpointed and is inserted into the outgoing stream of French words.
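The narrowing-down and guessing steps just described can be illustrated with a deliberately tiny toy. This is a sketch under loud assumptions, not how any real engine works: the five-sentence `CORPUS` is invented for illustration, plain word-set overlap (Jaccard similarity) stands in for the "complex statistical calculation", and a similarity-weighted vote stands in for the "further statistical calculations" that the text skips. The function names are hypothetical.

```python
from collections import Counter

# Hypothetical miniature "bilingual data base": each entry pairs an English
# context containing the chunk "bank" with the French chunk a human chose.
CORPUS = [
    ("the casino bank paid the winning players", "banque"),
    ("she lost money to the bank at the casino table", "banque"),
    ("the bank raised its interest rates", "banque"),
    ("we walked along the river bank at dusk", "rive"),
    ("the boat drifted toward the muddy river bank", "rive"),
]


def similarity(context_a, context_b):
    """Jaccard overlap of word sets: a crude stand-in for the real statistics."""
    a, b = set(context_a.split()), set(context_b.split())
    return len(a & b) / len(a | b)


def translate_chunk(input_context, corpus=CORPUS, k=3):
    """Pick the French chunk favored by the k most similar stored contexts."""
    # Step 1: narrow thousands of contexts (here, five) down to the k closest.
    ranked = sorted(
        corpus,
        key=lambda pair: similarity(input_context, pair[0]),
        reverse=True,
    )
    # Step 2: let the rival French chunks compete, weighted by similarity.
    votes = Counter()
    for english_context, french_chunk in ranked[:k]:
        votes[french_chunk] += similarity(input_context, english_context)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(translate_chunk("he sat on the river bank and watched the boat"))
    print(translate_chunk("the bank of the casino took her money"))
```

In this toy the first call selects "rive" and the second "banque", because the nearest stored contexts differ. A real system would draw on vastly more text and far subtler measures of similarity.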

One way of describing the translation algorithm that we’ve just sketched is that, through a sophisticated and highly efficient set of computations, it repeatedly makes analogies between pieces of text in the two languages involved. This sounds nothing if not promising, but the proof of the pudding is in the eating, and so we will now proceed to sample the pudding. In order to do so, we’ll take a careful look at a short piece of French text in order to see how two extremely different machine-translation programs dealt with it — one using the old strategy, and one using the new strategy. The passage we will examine is taken from an obituary of the novelist Françoise Sagan, written by the literary critic Bertrand Poirot-Delpech, and which appeared in the highly respected national newspaper Le Monde in September of 2004. The paragraph we selected is written in elegant and evocative but standard French, readily understood by any literate native speaker. We did not choose it for its difficulty; indeed, its density of “traps” for a translator is no higher than that of any typical article in Le Monde.

Below we give the original French, followed by the translation furnished by Google’s translation engine shortly after the obituary appeared. At that time, the Google engine was based on the original “Weaverian” machine-translation philosophy — namely, lookup in a very big on-line dictionary, followed by enhancement using grammatical “patching”.

Original paragraph from Le Monde, September 2004:

Parfois, le succès ne fut pas au rendez-vous. On a beau y penser très fort, le bon numéro ne sort pas forcément. Sagan prenait les échecs d’auteur dramatique comme les revers de casino, avec respect pour les caprices de la banque et du ciel. Il faut bien perdre un peu, pour mieux savourer la gagne du lendemain. Qui ne l’a pas vue « récupérer » en quelques quarts d’heure les pertes de toute une nuit ne peut comprendre comme c’est joyeux de narguer le sort.

Google’s translation engine, September 2004:

Sometimes, success was not with go. One thinks of it in vain very extremely, the good number does not leave inevitably. Sagan took the failures of dramatic author like the reverses of casino, with respect for the whims of the bank and the sky. It is necessary well to lose a little, for better enjoying gains it following day. Who did not see it “recovering” in a few fifteen minutes the losses of a whole night cannot include/understand as they is merry of narguer the fate.

It is obvious that the “decoding” technique — the technique that lay behind the original optimistic vision of machine translation — was hopelessly inadequate to the task, since the output that the translation engine yielded is pretty much nonsensical to an English speaker.

It is ironic that the only French word that the Google translation engine considered ambiguous in this passage was the word “comprendre”, for which it gave two possible interpretations separated by a slash, as if to suggest that this were the only spot in the whole paragraph where a translator might have some doubts as to how to word things properly in English. (The French word “narguer”, found near the end and roughly meaning “to flout”, was apparently not in the engine’s on-line dictionary, so it was simply left in French.) This example gives a sense of the quality of machine translation in the fall of 2004.
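The two behaviors noted here, slash-separated alternatives and an unknown word left in French, are what one would expect from plain dictionary lookup. The following toy sketch shows how they arise; it is an assumption-laden illustration and not a reconstruction of Google's 2004 engine. The `LEXICON` entries and the function `lookup_translate` are hypothetical, and the grammatical "patching" stage is omitted entirely.

```python
# Hypothetical toy lexicon; a real on-line dictionary would be vastly larger.
LEXICON = {
    "peut": ["can"],
    "comprendre": ["include", "understand"],  # two senses, no way to choose
    "le": ["the"],
    "sort": ["fate"],
}


def lookup_translate(sentence, lexicon=LEXICON):
    """Translate word by word, with no use of context at all."""
    out = []
    for word in sentence.lower().split():
        if word in lexicon:
            # Ambiguous word: emit every listed sense, separated by slashes.
            out.append("/".join(lexicon[word]))
        else:
            # Unknown word: pass it through untranslated.
            out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    print(lookup_translate("peut comprendre"))
    print(lookup_translate("narguer le sort"))
```

Because nothing in this scheme looks at the surrounding words, it has no basis for choosing between “include” and “understand”, and it flags ambiguity only where its lexicon happens to list more than one sense.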
