A Troublesome Inheritance: Genes, Race and Human History
Authors: Nicholas Wade
Those who assert that human races don’t exist like to point to the many, mutually inconsistent classification schemes that have recognized anywhere from 3 to 60 races. But the lack of agreement doesn’t mean that races don’t exist, only that it is a matter of judgment as to how to define them. As with any species that evolves into geographically based races, there is usually continuity between neighboring races because of gene exchange between them. Because there is no clear dividing line, there are no distinct races—that is the nature of variation within a species. Nonetheless, useful distinctions can be made.
The first step in making sense of human variation and the emergence of races is to follow the historical succession of major population splits. As noted above, the first such split occurred when a small group of people left northeast Africa some 50,000 years ago and populated the rest of the world. The first major division in the human population is thus between Africans and non-Africans. (Africans here denotes people who live south of the Sahara, because those north of the Sahara are largely Caucasian.) Among the non-Africans, there was an early division, whose nature is still poorly understood, between Europeans and East Asians. This gives a three-way split in the human population that corresponds robustly to the three racial groups that everyone can identify at a glance, those of Africans, East Asians and Caucasians. The fact that other peoples may not be so easy to classify does not alter the validity of these three basic categories.
The first migration out of Africa, the one that gave rise to both Europeans and East Asians, eventually reached Sahul, the ancient Ice Age continent that was split by rising sea levels into the three landmasses of Australia, New Guinea and Tasmania. Australian aborigines, surprisingly, turn out to be a race unlike any other. They and their relatives in New Guinea have no trace in their genome of admixture with other races until the historical period. This implies that once Sahul was settled, some 46,000 years ago, the residents fought off all later migrations until the arrival of Europeans in the 18th century. Australian aborigines can reasonably be considered a race, although a minor one in terms of population size, because of their distinctness, antiquity and the fact that they inhabit a continent.
American Indians, the original inhabitants of North and South America, can also be considered a race. Their ancestors were Siberians who originally crossed into Alaska some 15,000 years ago, but American Indians have diverged considerably since then.
A practical way of classifying human variation is therefore to recognize five races based on continent of origin. These are the three principal races—Africans, East Asians and Caucasians—and the two other continent-based groups of Native Americans and Australian aborigines (including the people of New Guinea, an island joined to Australia until the end of the last ice age).
At the land boundaries where races meet, there are often intermarried or admixed populations, as geneticists call them. Palestinians, Somalis and Ethiopians, for instance, are admixtures of African and Caucasian populations. The Uigur Turks of northwestern China and the Hazara of Afghanistan are admixtures of Caucasian and East Asian populations. African Americans are an admixture mostly of Africans and Caucasians.
Within each continental race are smaller groupings which, to avoid terms like subrace or subpopulation that might be assumed to imply inferiority, may be called ethnicities. Thus Finns, Icelanders, Jews and other groups with recognizable genetics are ethnicities within the Caucasian race.
Such an arrangement, of portioning human variation into five continental races, is to some extent arbitrary. But it makes practical sense. The three major races are easy to recognize. The five-way division matches the known events of human population history. And most significant of all, the division by continent is supported by genetics.
THE GENETICS OF RACE
Selfish and contentious people will not cohere, and without coherence nothing can be effected. A tribe rich in the above qualities would spread and be victorious over other tribes: but in the course of time it would, judging from all past history, be in its turn overcome by some other tribe more highly endowed. Thus the social and moral qualities would tend slowly to advance and be diffused throughout the world.
—Charles Darwin[1]
In the case of human races, the genetic differences from one race to another are slight and subtle. One might expect that different races would have different genes, but they don’t. All humans, so far as is known, have the same set of genes. Each gene comes in various alternative forms, called alleles, so the next expectation might be that races would be distinguished by having different alleles of various genes. But this too is not how the system works. There are a mere handful of known cases where a particular allele of a gene occurs in only one race.
The genetic differences between human races turn out to be based largely in allele frequencies, meaning the percentages of each allele that occur in a given race. How a mere difference in allele frequencies could lead to differences in physical traits is explained below.
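To make the term concrete, here is a minimal sketch of how an allele frequency is computed. The two samples are invented toy data, not measurements from any study, and the function name is only illustrative.

```python
from collections import Counter

def allele_frequencies(alleles):
    """Return each allele's share of the sample as a fraction between 0 and 1."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical toy samples: both groups carry the same two alleles,
# only the proportions differ.
group_1 = ["A"] * 8 + ["G"] * 2
group_2 = ["A"] * 3 + ["G"] * 7

freq_1 = allele_frequencies(group_1)
freq_2 = allele_frequencies(group_2)
```

In this toy case neither allele is exclusive to one group; the groups differ only in how common each allele is.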
A necessary approach to studying racial variation is to look not for absolute differences but at how the genomes of individuals throughout the world cluster together in terms of their genetic similarity. The result is that everyone ends up in the cluster with which they share the most variation in common. These clusters always correspond to the five continental races in the first instance, though when extra DNA markers are used, the people of the Indian subcontinent sometimes split away from Caucasians as a sixth major group, and people of the Middle East as a seventh.
One of the first genetic clustering techniques depended on examining an element of the genome called tandem repeats. There are many sites on the genome where the same pair of DNA units is repeated several times in tandem. CA stands for the DNA unit known as a cytosine followed by adenine, so the DNA sequence CACACACA would be called a tandem CA repeat. The string of repeats occasionally confuses the DNA copying apparatus, which every few generations may add or drop a repeat unit during the copying process that has to occur before a cell can divide. Sites at which repeats occur therefore tend to be quite variable, and this variability is useful for comparing populations.
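The counting of repeat units described above can be sketched in a few lines. This is an illustration under simple assumptions (a plain text sequence and an exact repeat unit), not the laboratory procedure used in the studies; the function name and the example sequences are hypothetical.

```python
import re

def longest_tandem_run(sequence, unit="CA"):
    """Return the number of back-to-back copies of `unit` in the longest run."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# The sequence from the text: four CA units in tandem.
example = longest_tandem_run("CACACACA")

# Hypothetical: the same site after a copying slip has added one repeat unit.
after_slip = longest_tandem_run("CACACACACA")
```

Comparing such counts at many sites is what lets individuals be grouped by similarity.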
In 1994, in one of the earliest attempts to study human differentiation in terms of DNA differences, a research team led by Anne Bowcock of the University of Texas and Luca Cavalli-Sforza of Stanford University looked at CA repeats at 30 sites on the genome in people from 14 populations. Comparing their subjects on the basis of the number of CA repeats at each genomic site, the researchers found that people clustered together in groups that were coincident with their continent of origin. In other words, all the Africans had patterns of CA repeats that resembled one another, all the American Indians had a different pattern of repeats and so on. Altogether there were 5 principal clusters of CA repeats, formed by people living in each of the 5 continental regions of Africa, Europe, East Asia, the Americas and Australasia.[2]
Many larger and more sophisticated surveys have been done since, and all have come to the same conclusion, that “genetic differentiation is greatest when defined on a continental basis,” writes Neil Risch, a statistical geneticist at the University of California, San Francisco. “Effectively, these population genetic studies have recapitulated the classical definition of races based on continental ancestry—namely African, Caucasian (Europe and Middle East), Asian, Pacific Islander (for example, Australian, New Guinean and Melanesian), and Native American.”[3]
In one of these more sophisticated studies, a team led by Noah Rosenberg of the University of Southern California and Marcus Feldman of Stanford University looked at the number of repeats at 377 sites on the genome of more than 1,000 people around the world. When this many sites are examined on a genome, it’s possible to assign segments of an individual’s genome to different races if he or she has mixed ancestry. This is because each race or ethnicity has a characteristic number of repeats at each genomic site.
The Rosenberg-Feldman study showed, as expected, that the 1,000 individuals in their study clustered naturally into five groups, corresponding to the five continental races. It also brought out the fact that several Central Asian ethnicities, such as Pathans, Hazara and Uigurs, are of mixed European and East Asian ancestry. This is not a surprise, given the frequent movement of peoples to and fro across Central Asia.
Language is often an isolating mechanism that deters intermarriage with neighboring groups. The Burusho, a people of Pakistan who speak a unique language, turn out also to be unlike their neighbors genetically. Within races, the Rosenberg-Feldman study showed that different ethnicities could be recognized. Among Africans, it is easy to distinguish by their genomes the Yoruba of Nigeria, the San (a click-speaking people of southern Africa) and the Mbuti and Biaka pygmies.
Many populations are not highly mixed, and the Rosenberg-Feldman survey confirmed the remarkable extent to which people throughout history have lived and died in the place where they were born.[4]
In the ancestral human population in Africa, a large number of alleles had developed for each gene over many generations. Those who migrated out of Africa took away only a sample of these alleles. And each time a new group split off, the number of alleles from the original population again decreased.
The farther from Africa this process continued, the lower the diversity of alleles became. This downhill gradient happens with any population that expands too far from its origins to maintain the regular interbreeding that keeps the gene pool well mixed.
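The loss of alleles through repeated splitting can be illustrated with a small simulation. This is a toy model under stated assumptions: no new mutations, no selection, no later gene exchange, and invented numbers for population size and founder-group size. It shows the sampling effect only and does not reconstruct any real migration.

```python
import random

def serial_founder_counts(n_alleles=100, pool_size=1000, founders=50,
                          steps=5, seed=0):
    """Count distinct alleles after each of several successive founder splits."""
    rng = random.Random(seed)
    # Source population: pool_size gene copies drawn from n_alleles allele types.
    population = [rng.randrange(n_alleles) for _ in range(pool_size)]
    counts = [len(set(population))]
    for _ in range(steps):
        # A small group splits off, carrying only a sample of the alleles.
        founder_group = rng.sample(population, founders)
        # The new group grows back to full size with no new mutations (assumption).
        population = [rng.choice(founder_group) for _ in range(pool_size)]
        counts.append(len(set(population)))
    return counts

counts = serial_founder_counts()
```

Because each new group draws only from the one before it, the number of distinct alleles in `counts` can stay level or fall at each step but can never rise.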
A genetic gradient, or cline, is what some researchers prefer to think exists in place of races. “There are no races, there are only clines,” asserted the biological anthropologist Frank Livingstone.[5]
Critics raised the same objection against the Rosenberg-Feldman result, alleging that the clustering of individuals into races was an artifact and that with a geographically more uniform sampling approach, the researchers would have seen only clines.[6]
The Rosenberg-Feldman team then reanalyzed their data and gave their survey finer resolution by looking at 993 sites, not just 377, on each of the genomes in their study. They found that the clusters are real. Although there are gradients of genetic diversity, there is also a clustering into the continental groups described in their first article.[7]
Rosenberg and Feldman compared people’s genomes on the basis of DNA repeats. Another kind of DNA marker has since become available for global population comparison—the SNP, which is more useful for medical studies. SNP stands for single nucleotide polymorphism, meaning a site on the genome where some people have a different kind of DNA unit from that of the majority. A vast preponderance of sites on the genome are fixed, meaning everyone has the same DNA unit, whether A, T, G or C. The fixed sites, being all the same, say nothing about human variation. It’s the SNP sites, which are variable, that are of particular interest to geneticists because they afford a direct way of comparing populations. To exclude the many random mutations that occur just in particular individuals and have no wider importance, SNPs are arbitrarily defined as sites on the genome where at least 1% of the population has a DNA unit other than the standard one.
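The 1% rule just described can be written as a simple check. This is a sketch under assumptions: the "standard" DNA unit is taken to be the most common one in the sample, the function name is hypothetical, and the samples in the usage lines are invented.

```python
from collections import Counter

def is_snp(bases, threshold=0.01):
    """Return True if at least `threshold` of the sample differs from the
    most common DNA unit at this site."""
    if not bases:
        return False
    counts = Counter(bases)
    # Assumption: the most common unit in the sample is the standard one.
    standard_count = counts.most_common(1)[0][1]
    other_count = len(bases) - standard_count
    return other_count / len(bases) >= threshold

# Invented samples of 1,000 people each at a single genomic site.
variable_site = is_snp(["A"] * 990 + ["G"] * 10)   # 1% carry G
rare_mutation = is_snp(["A"] * 999 + ["G"] * 1)    # 0.1% carry G
fixed_site = is_snp(["A"] * 1000)                  # everyone carries A
```

Only the first site meets the threshold; the second is the kind of individual-level mutation the definition is meant to exclude, and the third is a fixed site.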
A research group led by Jun Z. Li and Richard M. Myers has applied a clustering program like that used by Rosenberg and Feldman to almost 1,000 people in 51 populations across the globe. Each person’s genome was examined at 650,000 SNP sites. On the basis of SNPs, just as with the DNA repeats, people sampled from around the world clustered into 5 continental groups. But in addition, the SNP library brought to light two other major clusters. These had not emerged in the Rosenberg-Feldman study, which had used fewer markers. The more DNA markers that are used, whether tandem repeats or SNPs, the more subdivisions can be established in the human population.
One of the new clusters is formed by the people of Central and South Asia, including India and Pakistan. The second is the Middle East, where there is considerable admixture with people from Europe and Africa.[8]
It might be reasonable to elevate the Indian and Middle Eastern groups to the level of major races, making seven in all. But then many more subpopulations could be declared races, so to keep things simple, the five-race, continent-based scheme seems the most practical for most purposes.
Within each continental race, the SNP analysis could separate out further subgroups. Within Europe it distinguished French, Italians, Russians, Sardinians and Orcadians (people who live in the Orkney Islands, north of Scotland). In China the northern Han can be distinguished from the southern Han.
Groupings within Africa are of particular interest because this is where modern humans spent the first 150,000 years of their existence. In the most thorough survey of Africa so far, Sarah Tishkoff and colleagues surveyed people from 121 populations, scanning their genomes at 1,327 variable sites, most of them DNA repeats. The survey brought to light 14 different ancestral groups within Africa. Tishkoff found that, unlike in the rest of the world, where there are definable continental races, in Africa most populations are admixtures of several ancestral groups. There have presumably been a larger number of migration events within Africa, which served to mix up populations that were originally separate. The most recent large-scale migration was the Bantu expansion, a population explosion driven by new agricultural technology. Within the past few thousand years, Bantu speakers from the region of Nigeria and Cameroon in West Africa have migrated across to eastern Africa and down both coasts to southern Africa. Only a few groups have kept relatively clear of the churning of populations within Africa. These include the click-speaking peoples of Tanzania and southern Africa, who until recently have been hunter-gatherers, and the various pygmy groups, who live deep in the forest.[9]
The click-speakers and pygmies may be remnants of a much earlier hunter-gatherer population that once occupied a large part of southern Africa and the eastern coast as far north as Somalia. The click-speakers speak a group of languages known as Khoisan, which are unlike any others and have only very distant relationships among themselves, probably reflecting their great antiquity. The pygmy groups too may once have spoken Khoisan languages but it is impossible to know for sure, because they have lost their original languages.
Africa has four language superfamilies, of which Khoisan is one and the other three are Niger-Kordofanian (also known as Niger-Congo), Nilo-Saharan and Afro-Asiatic. The Niger-Kordofanian languages, the most widespread, were carried from western to eastern Africa and then south by the Bantu expansion, a great stream of migrations from the proto-Bantu homeland in western Africa that began in about 1000 BC and reached southern Africa a thousand years later. Afro-Asiatic languages are spoken in a broad belt across northern Africa, and the Nilo-Saharan speakers are sandwiched between Afro-Asiatic to the north and Niger-Kordofanian to the south.
Genetics generally correlates with language family, except in the case of populations that have switched languages; the pygmies now speak Niger-Kordofanian languages, and the Luo of Kenya, whose genetics place them with Niger-Kordofanian speakers, now speak a Nilo-Saharan language.
The Tishkoff team surveyed African Americans from Chicago, Baltimore, Pittsburgh and North Carolina and found that 71% of their genomes, on average, matched the genetics of Niger-Kordofanian speakers, 8% matched that of other African populations and 13% were European. These percentages varied greatly from one individual to another.
The origin of a species can often be located by surveying the genetic diversity in its members and seeing where diversity is highest. This is because the founding population will have had longest to accumulate the mutations that generate diversity, and the groups that migrate away will carry with them only a sample of the original mutations. (Other forces, like natural selection, reduce diversity by eliminating harmful mutations and sweeping away others when a beneficial mutation is favored.) On the basis of the new African and other genomic data, the origin of the modern human migration lies in southwestern Africa, near the border of Namibia and Angola, in a region that is the current homeland of the San click-speakers. The finding is not definitive, because the distribution of ancient populations may have been rather different from that of today. Nonetheless, the fact that human genetics points to a single origin confirms that today’s races are all mere variations on the same theme.