Is God a Mathematician?
Author: Mario Livio
The essence of probability theory can be gleaned from the following simple facts. No one can predict with certainty which face a fair coin tossed into the air will show once it lands. Even if the coin has just come up heads ten times in a row, this does not improve our ability to predict with certainty the next toss by one iota. Yet we can predict with certainty that if you toss that coin ten million times, very close to half the tosses will show heads and very close to half will show tails. In fact, at the end of the nineteenth century, the statistician Karl Pearson had the patience to toss a coin 24,000 times. He obtained heads in 12,012 of the tosses. This is, in some sense, what probability theory is really all about. Probability theory provides us with accurate information about the collection of the results of a large number of experiments; it can never predict the result of any specific experiment. If an experiment can produce n possible outcomes, each one having the same chance of occurring, then the probability for each outcome is 1/n.
If you roll a fair die, the probability of obtaining the number 4 is 1/6, because the die has six faces, and each face is an equally likely outcome. Suppose you rolled the die seven times in a row and each time you got a 4, what would be the probability of getting a 4 in the next throw? Probability theory gives a crystal-clear answer: The probability would still be 1/6—the die has no memory and any notions of a “hot hand” or of the next roll making up for the previous imbalance are only myths. What is true is that if you were to roll the die a million times, the results will average out and 4 would appear very close to one-sixth of the time.
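A skeptical reader can check this averaging-out numerically. A few lines of Python, for instance, roll a simulated fair die a million times and count how often a 4 appears (the seed is fixed here only so the run is repeatable):

```python
import random

random.seed(42)  # fixed seed so the simulation is repeatable

# Roll a fair six-sided die one million times and count the fours.
rolls = 1_000_000
fours = sum(1 for _ in range(rolls) if random.randint(1, 6) == 4)

print(f"fraction of fours: {fours / rolls:.4f}  (1/6 = {1 / 6:.4f})")
```

No single roll is predictable, yet the fraction of fours lands within a small fraction of a percent of 1/6, exactly as the theory promises.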
Let’s examine a slightly more complex situation. Suppose you simultaneously toss three coins. What is the probability of getting two tails and one head? We can find the answer simply by listing all the possible outcomes. If we denote heads by “H” and tails by “T,” then there are eight possible outcomes: TTT, TTH, THT, THH, HTT, HTH, HHT, HHH. Of these, you can check that three are favorable to the event “two tails and one head.” Therefore, the probability for this event is 3/8. Or more generally, if out of n equally likely outcomes, m are favorable to the event you are interested in, then the probability for that event to happen is m/n. Note that this means that the probability always takes a value between zero and one. If the event you are interested in is in fact impossible, then m = 0 (no outcome is favorable) and the probability is zero. If, on the other hand, the event is absolutely certain, that means that all n outcomes are favorable (m = n) and the probability is then simply n/n = 1. The results of the three coin tosses demonstrate yet another important result of probability theory—if you have several events that are entirely independent of each other, then the probability of all of them happening is the product of the individual probabilities. For instance, the probability of obtaining three heads is 1/8, which is the product of the probabilities of obtaining heads on each of the three coins: 1/2 × 1/2 × 1/2 = 1/8.
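The same bookkeeping can be handed to a computer. A short Python sketch that enumerates the eight outcomes confirms both the 3/8 answer and the multiplication rule for independent events:

```python
from itertools import product

# Enumerate all 2 × 2 × 2 = 8 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))

# Event: exactly two tails and one head.
favorable = [o for o in outcomes if o.count("T") == 2]
print(len(favorable), "/", len(outcomes))  # 3 / 8

# Independence check: P(three heads) should equal 1/2 × 1/2 × 1/2 = 1/8.
all_heads = [o for o in outcomes if o.count("H") == 3]
print(len(all_heads) / len(outcomes))  # 0.125
```

Counting favorable outcomes among equally likely ones is all the computer is doing—precisely the m/n rule stated above.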
OK, you may think, but other than in casino games and other gambling activities, what additional uses can we make of these very basic probability concepts? Believe it or not, these seemingly insignificant probability laws are at the heart of the modern study of genetics—the science of the inheritance of biological characteristics.
The person who brought probability into genetics was a Moravian priest. Gregor Mendel (1822–84) was born in a village near the border between Moravia and Silesia (today Hyncice in the Czech Republic). After entering the Augustinian Abbey of St. Thomas in Brno, he studied zoology, botany, physics, and chemistry at the University of Vienna. Upon returning to Brno, he began active experimentation with pea plants, with strong support from the abbot of the Augustinian monastery. Mendel focused his research on pea plants because they were easy to grow, and also because they have both male and female reproductive organs. Consequently, pea plants can be either self-pollinated or cross-pollinated with another plant.
By cross-pollinating plants that produce only green seeds with plants that produce only yellow seeds, Mendel obtained results that at first glance appeared to be very puzzling (figure 34). The first offspring generation had only yellow seeds. However, the following generation consistently had a 3:1 ratio of yellow to green seeds! From these surprising findings, Mendel was able to distill three conclusions that became important milestones in genetics:

1. The inheritance of each trait is determined by “factors” (what we now call genes) that are passed on to offspring unchanged.
2. For each trait, an individual inherits one such factor from each parent.
3. A trait may not show up in an individual, but the factor for it can still be passed on to the following generation.
But how can one explain the quantitative results in Mendel’s experiment? Mendel argued that each of the parent plants must have had two identical “factors” (what we would call alleles, varieties of a gene), either two yellow or two green (as in figure 35). When the two were mated, each offspring inherited two different alleles, one from each parent (according to rule 2 above). That is, each offspring seed contained a yellow allele and a green allele. Why then were the peas of this generation all yellow? Because, Mendel explained, yellow was the dominant color and it masked the presence of the green allele in this generation (rule 3 above). However (still according to rule 3), the dominant yellow did not prevent the recessive green from being passed on to the next generation. In the next mating round, each plant containing one yellow allele and one green allele was pollinated with another plant containing the same combination of alleles. Since the offspring contain one allele from each parent, the seeds of the next generation may contain one of the following combinations (figure 35): green-green, green-yellow, yellow-green, or yellow-yellow. All the seeds with a yellow allele become yellow peas, because yellow is dominant. Therefore, since all the allele combinations are equally likely, the ratio of yellow to green peas should be 3:1.
Figure 34
Figure 35
You may have noticed that the entire Mendel exercise is essentially identical to the experiment of tossing two coins. Assigning heads to green and tails to yellow and asking what fraction of the peas would be yellow (given that yellow is dominant in determining the color) is precisely the same as asking what is the probability of obtaining at least one tails in tossing two coins. Clearly that is 3/4, since three of the possible outcomes (tails-tails, tails-heads, heads-tails, heads-heads) contain a tails. This means that the ratio of the number of tosses that do contain at least one tails to the number of tosses that do not should be (in the long run) 3:1, just as in Mendel’s experiments.
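This correspondence is easy to verify on a computer. In the Python sketch below, the labels "Y" and "y" (chosen here purely for illustration) stand for the dominant yellow and recessive green alleles; the program first counts the four equally likely combinations exactly, then simulates a large number of offspring:

```python
import random
from fractions import Fraction

parents = ["Y", "y"]  # Y = dominant (yellow), y = recessive (green)

# Exact count: the four equally likely allele combinations of the cross.
combos = [(a, b) for a in parents for b in parents]
yellow = sum(1 for c in combos if "Y" in c)  # any Y makes the pea yellow
print(Fraction(yellow, len(combos)))  # 3/4

# Monte Carlo check: each offspring draws one allele from each parent.
random.seed(0)  # fixed seed so the simulation is repeatable
n = 100_000
yellows = sum(1 for _ in range(n)
              if "Y" in (random.choice(parents), random.choice(parents)))
print(f"yellow : green ≈ {yellows / (n - yellows):.2f} : 1")  # close to 3 : 1
```

Three of the four equally likely combinations contain a dominant allele, which is the entire mathematical content of Mendel’s 3:1 ratio.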
Although Mendel published his paper “Experiments on Plant Hybridization” in 1865 (and also presented the results at two scientific meetings), his work went largely unnoticed until it was rediscovered at the beginning of the twentieth century. While some questions have been raised about the accuracy of his results, he is still regarded as the first to have laid the mathematical foundations of modern genetics. Following in the path cleared by Mendel, the influential British statistician Ronald Aylmer Fisher (1890–1962) established the field of population genetics—the mathematical branch that centers on modeling the distribution of genes within a population and on calculating how gene frequencies change over time. Today’s geneticists can use statistical sampling in combination with DNA studies to forecast probable characteristics of unborn offspring. But still, how exactly are probability and statistics related?
Facts and Forecasts
Scientists who try to decipher the evolution of the universe usually try to attack the problem from both ends. There are those who start from the tiniest fluctuations in the cosmic fabric in the primordial universe, and there are those who study every detail in the current state of the universe. The former use large computer simulations in an attempt to evolve the universe forward. The latter engage in the detective-style work of trying to deduce the universe’s past from a multitude of facts about its present state. Probability theory and statistics are related in a similar fashion. In probability theory the variables and the initial state are known, and the goal is to predict the most likely end result. In statistics the outcome is known, but the past causes are uncertain.
Let’s examine a simple example of how the two fields supplement each other and meet, so to speak, in the middle. We can start from the fact that statistical studies show that the measurements of a large variety of physical quantities and even of many human characteristics are distributed according to the normal frequency curve. More precisely, the normal curve is not a single curve, but rather a family of curves, all describable by the same general function, and all fully characterized by just two mathematical quantities. The first of these quantities—the mean—is the central value about which the distribution is symmetric. The actual value of the mean depends, of course, on the type of variable being measured (e.g., weight, height, or IQ). Even for the same variable, the mean may be different for different populations. For instance, the mean of the heights of men in Sweden is probably different from the mean of the heights of men in Peru. The second quantity that defines the normal curve is known as the standard deviation. This is a measure of how closely the data are clustered around the mean value. In figure 36, the normal curve (a) has the largest standard deviation, because the values are more widely dispersed. Here, however, comes an interesting fact. By using integral calculus to calculate areas under the curve, one can prove mathematically that irrespective of the values of the mean or the standard deviation, 68.2 percent of the data lie within the values encompassed by one standard deviation on either side of the mean (as in figure 37). In other words, if the mean IQ of a certain (large) population is 100, and the standard deviation is 15, then 68.2 percent of the people in that population have IQ values between 85 and 115. Furthermore, for all the normal frequency curves, 95.4 percent of all the cases lie within two standard deviations of the mean, and 99.7 percent of the data lie within three standard deviations on either side of the mean (figure 37). This implies that in the above example, 95.4 percent of the population have IQ values between 70 and 130, and 99.7 percent have values between 55 and 145.
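These percentages are not empirical accidents; they are areas under the normal curve, which can be computed with nothing more than the error function in Python’s standard math library:

```python
import math

def normal_within(k):
    """Probability that a normally distributed quantity falls within
    k standard deviations of its mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_within(k):.3f}")
```

The three printed values, 0.683, 0.954, and 0.997, reproduce (up to rounding) the 68.2, 95.4, and 99.7 percent figures quoted above, whatever the mean and standard deviation happen to be.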
Figure 36
Suppose now that we want to predict what the probability would be for a person chosen at random from that population to have an IQ value between 85 and 100. Figure 37 tells us that the probability would be 0.341 (or 34.1 percent), since according to the laws of probability, the probability is simply the number of favorable outcomes divided by the total number of possibilities. Or we could be interested in finding out what the probability is for someone (chosen at random) to have an IQ value higher than 130 in that population. A glance at figure 37 reveals that the probability is only about 0.022, or 2.2 percent. Much in the same way, using the properties of the normal distribution and the tool of integral calculus (to calculate areas), one can calculate the probability of the IQ value being in any given range. In other words, probability theory and its complementary helpmate, statistics, combine to give us the answer.
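Both of these probabilities can be read off the normal cumulative distribution function. A short Python sketch, using the same illustrative mean of 100 and standard deviation of 15:

```python
import math

def normal_cdf(x, mean=100.0, sd=15.0):
    """Probability that a normal variable with the given mean and
    standard deviation is at most x, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# P(85 <= IQ <= 100): from one standard deviation below the mean up to it.
print(f"{normal_cdf(100) - normal_cdf(85):.3f}")  # 0.341

# P(IQ > 130): more than two standard deviations above the mean.
print(f"{1 - normal_cdf(130):.3f}")
```

The first value is 0.341, exactly the 34.1 percent quoted above; the second comes out as 0.0228, which the chapter rounds to about 2.2 percent.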