Is God a Mathematician?
Author: Mario Livio
Quetelet used to open his course on the history of science with the following insightful observation: “The more advanced the sciences become, the more they have tended to enter the domain of mathematics, which is a sort of center towards which they converge. We can judge of the perfection to which a science has come by the facility, more or less great, with which it may be approached by calculation.”
In December of 1823, Quetelet was sent to Paris at the state’s expense, mostly to study observational techniques in astronomy. As it turned out, however, this three-month visit to the then mathematical capital of the world veered Quetelet in an entirely different direction—the theory of probability. The person who was mostly responsible for igniting Quetelet’s enthusiastic interest in this subject was Laplace himself. Quetelet later summarized his experience with statistics and probability:
Chance, that mysterious, much abused word, should be considered only a veil for our ignorance; it is a phantom which exercises the most absolute empire over the common mind, accustomed to consider events only as isolated, but which is reduced to naught before the philosopher, whose eye embraces a long series of events and whose penetration is not led astray by variations, which disappear when he gives himself sufficient perspective to seize the laws of nature.
The importance of this conclusion cannot be overemphasized. Quetelet essentially denied the role of chance and replaced it with the bold (even though not entirely proven) inference that even social phenomena have causes, and that the regularities exhibited by statistical results can be used to uncover the rules underlying social order.
In an attempt to put his statistical approach to the test, Quetelet started an ambitious project of collecting thousands of measurements related to the human body. For instance, he studied the distributions of the chest measurements of 5,738 Scottish soldiers and of the heights of 100,000 French conscripts by plotting separately the frequency with which each human trait occurred. In other words, he represented graphically how many conscripts had heights between, say, five feet and five feet two inches, and then between five feet two inches and five feet four inches, and so on. He later constructed similar curves even for what he called “moral” traits for which he had sufficient data. The latter qualities included suicides, marriages, and the propensity to crime. To his surprise, Quetelet discovered that all the human characteristics followed what is now known as the
normal (or Gaussian, named somewhat unjustifiably after the “prince of mathematics” Carl Friedrich Gauss) bell-shaped frequency distribution (figure 33). Whether it was heights, weights, measurements of limb lengths, or even intellectual qualities determined by what were then pioneering psychological tests, the same type of curve appeared again and again. The curve itself was not new to Quetelet: mathematicians and physicists had recognized it since the mid-eighteenth century, and Quetelet himself was familiar with it from his astronomical work. It was just the association of this curve with human characteristics that came as somewhat of a shock. Previously, this curve had been known as the error curve, because of its appearance in errors of measurement of every kind.
Figure 33
Imagine, for instance, that you are interested in measuring very accurately the temperature of a liquid in a vessel. You can use a high-precision thermometer and over a period of one hour take one thousand consecutive readings. You will find that due to random errors and possibly some fluctuations in the temperature, not all measurements will give precisely the same value. Rather, the measurements will tend to cluster around a central value, with some measurements giving temperatures that are higher and others that are lower. If you plot the number of times that each measurement occurred against the value of the temperature, you will obtain the same type of bell-shaped curve that Quetelet found for the human characteristics. In fact, the larger the number of measurements performed on any physical quantity, the more closely the obtained frequency distribution will approximate the normal curve. The immediate implication of this fact for the question of the unreasonable effectiveness of mathematics is quite dramatic in itself: even human errors obey some strict mathematical rules.
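This clustering is easy to see in a few lines of Python. The sketch below is a modern illustration, not part of the original text; the liquid's true temperature and the size of the random errors are assumptions chosen for the example:

```python
import random
import statistics

# Simulate one thousand thermometer readings of a liquid whose true
# temperature we take to be 20.0 degrees, each reading perturbed by a
# small random error.  Both values are illustrative assumptions.
random.seed(42)
TRUE_TEMP = 20.0
ERROR_SCALE = 0.5

readings = [random.gauss(TRUE_TEMP, ERROR_SCALE) for _ in range(1000)]

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)

# The signature of the bell curve: roughly 68% of readings fall within
# one standard deviation of the mean, and roughly 95% within two.
within_1 = sum(abs(r - mean) <= stdev for r in readings) / len(readings)
within_2 = sum(abs(r - mean) <= 2 * stdev for r in readings) / len(readings)
```

Binning the readings and plotting the counts would trace out the same bell shape Quetelet found for human traits: most readings pile up near the center and taper off symmetrically on either side.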
Quetelet thought that the conclusions were even more far-reaching. He regarded the finding that human characteristics followed the error curve as an indication that the “average man” was in fact a type that nature was trying to produce. According to Quetelet, just as manufacturing errors would create a distribution of lengths around the average (correct) length of a nail, nature’s errors were distributed around a preferred biological type. He declared that the people of a nation were clustered about their average “as if they were the results of measurements made on one and the same person, but with instruments clumsy enough to justify the size of the variation.”
Clearly, Quetelet’s speculations went a bit too far. While his discovery that biological characteristics (whether physical or mental) are distributed according to the normal frequency curve was extremely important, this could neither be taken as proof for nature’s intentions nor could individual variations be treated as mere mistakes. For instance, Quetelet found the average height of the French conscripts to be five feet four inches. At the low end, however, he found a man of one foot five inches. Obviously one could not make an error of almost four feet in measuring the height of a man five feet four inches tall.
Even if we ignore Quetelet’s notion of “laws” that fashion humans in a single mold, the fact that the distributions of a variety of traits ranging from weights to IQ levels all follow the normal curve is in itself pretty remarkable. And if that is not enough, even the distribution of major-league batting averages in baseball is reasonably normal, as is the annual rate of return on stock indexes (which are composed of many individual stocks). Indeed, distributions that deviate from the normal curve sometimes call for a careful examination. For instance, if the distribution of the grades in English in some school were found not to be normal, this could provoke an investigation into the grading practices of that school. This is not to say that all distributions are normal. The distribution of the lengths of words that Shakespeare used in his plays is not normal. He used many more words of three and four letters than words of eleven or twelve letters. The annual household income in the United States is also represented by a non-normal distribution. In 2006, for instance, the top 6.37% of households earned roughly one third of all income. This fact raises an interesting question in itself: If both the physical and the intellectual characteristics of humans (which presumably determine the potential for income) are normally distributed, why isn’t the income? The answer to such socioeconomic questions is, however, beyond the scope of the present book. From our present limited perspective, the amazing fact is that essentially all the physically measurable particulars of humans, or of animals and plants (of any given variety) are distributed according to just one type of mathematical function.
Human characteristics served historically not only as the basis for the study of statistical frequency distributions, but also for the establishment of the mathematical concept of correlation. Correlation measures the degree to which changes in the value of one variable are accompanied by changes in another. For instance, taller women may be expected to wear larger shoes. Similarly, psychologists have found a correlation between the intelligence of parents and the degree to which their children succeed in school.
The concept of a correlation becomes particularly useful in those situations in which there is no precise functional dependence between the two variables. Imagine, for example, that one variable is the maximal daytime temperature in southern Arizona and the other is the number of forest fires in that region. For a given value of the temperature, one cannot predict precisely the number of forest fires that will break out, since the latter depends on other variables such as the humidity and the number of fires started by people. In other words, for any value of the temperature, there could be many corresponding numbers of forest fires and vice versa. Still, the mathematical concept known as the correlation coefficient allows us to measure quantitatively the strength of the relationship between two such variables.
The person who first introduced the tool of the correlation coefficient was the Victorian geographer, meteorologist, anthropologist, and statistician Sir Francis Galton (1822–1911). Galton—who was, by the way, the half-cousin of Charles Darwin—was not a professional mathematician. Being an extraordinarily practical man, he usually left the mathematical refinements of his innovative concepts to other mathematicians, in particular to the statistician Karl Pearson (1857–1936). Here is how Galton explained the concept of correlation:
The length of the cubit [the forearm] is correlated with the stature, because a long cubit usually implies a tall man. If the correlation between them is very close, a very long cubit would usually imply a very tall stature, but if it were not very close, a very long cubit would be on the average associated with only a tall stature, and not a very tall one; while, if it were nil, a very long cubit would be associated with no especial stature, and therefore, on the average, with mediocrity.
Pearson eventually gave a precise mathematical definition of the correlation coefficient. The coefficient is defined in such a way that when the correlation is very high (that is, when one variable closely follows the up-and-down trends of the other) the coefficient takes the value of 1. When two quantities are anticorrelated, meaning that when one increases the other decreases and vice versa, the coefficient is equal to −1. Two variables that each behave as if the other didn’t even exist have a correlation coefficient of 0. (For instance, the behavior of some governments unfortunately shows almost zero correlation with the wishes of the people whom they supposedly represent.)
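Pearson's coefficient can be computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The sketch below is a modern Python illustration, not Pearson's own formulation, and the synthetic data are assumptions chosen to exhibit the three cases of values near +1, −1, and 0:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient: covariance of xs and ys divided
    by the product of their standard deviations (the factors of n cancel)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)
xs = [random.uniform(0, 10) for _ in range(500)]
noise = [random.gauss(0, 1) for _ in range(500)]

correlated = [x + e for x, e in zip(xs, noise)]        # rises with x: r near +1
anticorrelated = [-x + e for x, e in zip(xs, noise)]   # falls as x rises: r near -1
independent = [random.gauss(0, 1) for _ in range(500)] # ignores x: r near 0
```

Because the coefficient is a ratio, it is unitless: correlating heights with shoe sizes gives the same number whether the heights are recorded in inches or in centimeters.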
Modern medical research and economic forecasting depend crucially on identifying and calculating correlations. The links between smoking and lung cancer, and between exposure to the Sun and skin cancer, for instance, were established initially by discovering and evaluating correlations. Stock market analysts are constantly trying to find and quantify correlations between market behavior and other variables; any such discovery can be enormously profitable.
As some of the early statisticians readily realized, both the collection of statistical data and their interpretation can be very tricky and should be handled with the utmost care. A fisherman who uses a net with holes that are ten inches on a side might be tempted to conclude that all fish are larger than ten inches, simply because the smaller ones would escape from his net. This is an example of selection effects: biases introduced in the results due to either the apparatus used for collecting the data or the methodology used to analyze them. Sampling presents another problem. For instance, modern opinion polls usually interview no more than a few thousand people. How can the pollsters be sure that the views expressed by members of this sample correctly represent the opinions of hundreds of millions? Another point to realize is that correlation does not necessarily imply causation. The sales of new toasters may be on the rise at the same time that audiences at concerts of classical music increase, but this does not mean that the presence of a new toaster at home enhances musical appreciation. Rather, both effects may be caused by an improvement in the economy.
In spite of these important caveats, statistics has become one of the most effective instruments in modern society, literally putting the “science” into the social sciences. But why do statistics work at all? The answer is given by the mathematics of probability, which reigns over many facets of modern life. Engineers trying to decide which safety mechanisms to install in the Crew Exploration Vehicle for astronauts, particle physicists analyzing results of accelerator experiments, psychologists rating children in IQ tests, drug companies evaluating the efficacy of new medications, and geneticists studying human heredity all have to use the mathematical theory of probability.
Games of Chance
The serious study of probability started from very modest beginnings—attempts by gamblers to adjust their bets to the odds of success. In particular, in the middle of the seventeenth century, a French nobleman—the Chevalier de Méré—who was also a reputed gamester, addressed a series of questions about gambling to the famous French mathematician and philosopher Blaise Pascal (1623–62). The latter conducted in 1654 an extensive correspondence about these questions with the other great French mathematician of the time, Pierre de Fermat (1601–65). The theory of probability was essentially born in this correspondence.
Let’s examine one of the fascinating examples discussed by Pascal in a letter dated July 29, 1654. Imagine two noblemen engaged in a game involving the roll of a single die. Each player has put on the table thirty-two pistoles of gold. The first player chose the number 1, and the second chose the number 5. Each time the chosen number of one of the players turns up, that player gets one point. The winner is the first one to have three points. Suppose, however, that after the game has been played for some time, the number 1 has turned up twice (so that the player who had chosen that number has two points), while the number 5 has turned up only once (so the opponent has only one point). If, for whatever reason, the game has to be interrupted at that point, how should the sixty-four pistoles on the table be divided between the two players? Pascal and Fermat found the mathematically logical answer. If the player with two points were to win the next roll, the sixty-four pistoles would belong to him. If the other player were to win the next roll, each player would have had two points, and so each would have gotten thirty-two pistoles. Therefore, if the players separate without playing the next roll, the first player could correctly argue: “I am certain of thirty-two pistoles even if I lose this roll, and as for the other thirty-two pistoles perhaps I shall have them and perhaps you will have them; the chances are equal. Let us then divide these thirty-two pistoles equally and give me also the thirty-two pistoles of which I am certain.” In other words, the first player should get forty-eight pistoles and the other sixteen pistoles. Unbelievable, isn’t it, that a new, deep mathematical discipline could have emerged from this type of apparently trivial discussion? This is, however, precisely the reason why the effectiveness of mathematics is as “unreasonable” and mysterious as it is.
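Pascal's division can be checked mechanically. Rolls that show neither chosen number change nothing and can be ignored, so each decisive roll is a fair coin flip: the point goes to either player with probability one half. Averaging over all the ways the race to three points can end gives each player's rightful share of the stake. The Python sketch below is our restatement of this reasoning, not Pascal's own notation:

```python
from fractions import Fraction

def first_player_share(p, q, target=3):
    """Probability that the player currently holding p points wins a race
    to `target` points against an opponent holding q points, when each
    decisive roll favors either player with probability 1/2."""
    if p == target:
        return Fraction(1)
    if q == target:
        return Fraction(0)
    # Average the two equally likely outcomes of the next decisive roll.
    return Fraction(1, 2) * (first_player_share(p + 1, q, target)
                             + first_player_share(p, q + 1, target))

stake = 64  # the sixty-four pistoles on the table
leader = first_player_share(2, 1) * stake   # the player with two points
trailer = stake - leader                    # the player with one point
```

From the score of two points to one, the leader wins with probability 3/4 (he wins the next decisive roll outright half the time, and from a 2-2 tie half of the remainder), which is exactly Pascal's split of forty-eight pistoles to sixteen.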