A lively debate has been going on over at anthropology.net regarding a proposal by German Dziebel, expounded in his recent book, that modern humans originated in the Americas and spread from there to the rest of the world: an Out of America (OOAm) hypothesis to mirror the more widely accepted Out of Africa (OOAf) hypothesis. The debate has been stimulated by two posts by Dziebel (here and here), which argue that many diverse sources of data suggest that modern humans originated in the Americas, or at the very least, that the available data certainly do not rule this possibility out. Much of the debate in the comment threads focuses on genetic arguments, as one might expect, but I was interested in the linguistic evidence. I asked Dziebel to elaborate on the linguistic evidence, and he kindly responded as follows:

Regarding the relevance of linguistic diversity in the Americas to the problem of the peopling of the Americas, I base myself off of Johanna Nichols’s “Linguistic diversity and the first settlement of the New World.” Language 66:3. (1990) as well as her Linguistic Diversity in Space and Time (1992).

Being stranded in Lima as I am, I have no access to Nichols’ 1992 book, but I was able to get the Language article through JSTOR. In this post my goal is to evaluate to what degree the evidence and arguments presented in Nichols (1990), cited by Dziebel in support of the OOAm hypothesis, in fact support this hypothesis. For those who want the Reader’s Digest summary, my conclusions are the following: to a large degree, the basic evidence given in Nichols (1990) is neutral with respect to the OOAm hypothesis or competing hypotheses that place human origins in other continents. However, those parts of the paper that raise arguments relevant to distinguishing various origin hypotheses come down in favor of America as a site of colonization from the Old World, and not as a site from which humans migrated. (Just to be clear: I am not arguing for or against the OOAm hypothesis as a whole, but rather, taking on the much more restricted question of whether the linguistic evidence that Dziebel cites in fact supports the OOAm hypothesis.)

For Dziebel, the interesting point of Nichols (1990) lies in the relatively high linguistic diversity of the Americas and the implications of this diversity for the antiquity of human presence in the Americas. In his comment to me, Dziebel writes:

As measured by the number of independent linguistic stocks, linguistic divergence in the Americas must have taken at least 35,000 years. Of course, this figure cannot be taken literally but there’s a marked contrast between language diversity in the Americas (and in places like Papua New Guinea, with human archaeological record of some 40,000 years) and language diversity in Africa.

Dziebel raises two points here that are based on Nichols (1990). First, the linguistic diversity found in the Americas suggests that the human presence in the Americas goes back at least 35,000 years. And second, the linguistic diversity of the Americas is significantly greater than that found in Africa.

The arguments that Nichols (1990) marshals for the early date for the initiation of human migration to the Americas are very interesting, and rely on converging sources of data. However, the single most important piece of evidence is the sheer number of linguistic stocks found in the Americas. If we follow a uniformitarian assumption about rates of linguistic differentiation, and then calculate the rate of development of distinct stocks in other parts of the world, we are led to the conclusion that there is simply no way that the linguistic diversity we find in the Americas could have developed in the time window given by Clovis-based chronologies that posit that colonization of the Americas began around 12,000 years ago, or by more recently accepted chronologies that push that date back to about 20,000 years ago. Pulling together as much linguistic and archeological evidence as she can about migration rates across Beringia and the Bering Straits, Nichols suggests a date of roughly 35,000 years for the initial migrations into the Americas.

If we abstract away from the colonization-based scenario that Nichols employs, as Dziebel clearly does, we could argue that Nichols’ calculations support human presence in the Americas from 35,000 years ago, whether due to migration or otherwise. However, this interesting result cannot distinguish between the OOAm hypothesis and hypotheses that place human origins in other continents. It counts as an interesting piece of evidence regarding human presence in the Americas, but does not speak to the validity of OOAm, because it tells us nothing about how these humans got to be in the Americas.

It is worth noting that although Nichols (1990) does indeed argue for an earlier human presence in the Americas than do hypotheses based on physical remains, the entire point of the article is to develop an estimate for the date of human colonization of the Americas, based on linguistic evidence. Dziebel takes the early date for human presence in the Americas presented in the paper as support for the OOAm hypothesis, but discards the fact that this date is given in the context of a model for colonization of the Americas from the Old World.

Let us now take up Dziebel’s second point, which concerns the relative linguistic diversity of the Americas and Africa. Nichols (1990) observes that if one looks at the density of linguistic stocks globally, certain areas, such as New Guinea and South America, show a higher density than other areas, such as Europe. And, as Dziebel correctly notes, the density of the Americas as a whole is higher than that of Africa. But does this fact count as evidence either for or against OOAm? No, not at all.

Dziebel’s interest in the relative linguistic diversity of the Americas and of Africa lies in the supposed ability of linguistic diversity to predict the age of populations:

To summarize, linguistic diversity is a good and straightforward predictor of a population’s age if geography is factored in and if it’s checked against the mtDNA and Y-chromosome picture.

While it is certainly true that, all other things being equal, linguistic diversity in a region increases over time, it does not follow that linguistic diversity is a straightforward indicator of the age of that area’s population. The confounding factor is large-scale language shift. As Nichols argues, there is good reason to believe that in Europe, for example, Indo-European languages replaced pre-Indo-European languages on a massive scale, radically reducing the linguistic diversity of the region.

Of course, Dziebel also mentions the “mtDNA and Y-chromosome picture” — but it’s not clear to me how this is relevant to the utility of using linguistic diversity to estimate the age of a population, unless his following comment gives us a clue:

Linguistic diversity steadily increases with time, unless this process is checked by geography and reversed by population replacements.

So here it appears that Dziebel makes use of the concept of ‘population replacement’ to account for interruptions in the steady growth of linguistic diversity. But of course, language shift need not co-occur with population replacement, and this entirely disrupts the tidy correspondence between linguistic diversity and the age of populations. In Europe, for example, Nichols argues that Indo-European *languages* replaced pre-Indo-European ones, not that *populations* were replaced. The result was a loss of linguistic diversity. And as the following comment shows, Dziebel seems perfectly aware of this fact:

Translated into the levels of linguistic diversity, Europe experienced periods of language replacement (now it’s dominated by Indo-European languages) but all these replacements originated from the same genetic pool.

But then he concludes:

However the factors of geography and population replacement are subordinate to the factor of spontaneous differentiation because differentiation occurs all the time and everywhere, while geographical constraints and population replacements are accidental events.

What Dziebel seems to be arguing here is that even though we know that language shift occurs (and on vast scales, as in Europe and Africa), at the end of the day, linguistic diversity is still a reliable measure of a population’s age. But this is clearly false (or maybe I am misunderstanding his point). The fact that large-scale language shift occurs, without necessarily significant changes in the *biological* population, means that linguistic diversity is good only as a measure of the amount of time that has transpired *subsequent to* such large-scale linguistic shifts. These shifts largely erase the linguistic history of an area, screening off the population’s age prior to that point from measures based on linguistic diversity.

The fact that such large-scale shifts appear to have occurred in Africa and Europe means that measures of linguistic diversity simply cannot tell us very much about the ultimate ages of those populations. Consequently, the fact that the Americas display greater linguistic diversity than Africa tells us nothing about the relative ages of the populations of the two regions. The linguistic diversity evidence that Dziebel cites simply does not bear on the validity of OOAm.

Apart from the linguistic diversity evidence just discussed, Dziebel also cites typological evidence:

The distribution of grammatical features (such as head-marking vs. dependent-marking, numeral classifiers, etc.) again shows a cline from America and Australasia to Africa and Europe, and Nichols’s argued that our perspective on an early human language comes from America and Australasia and not Africa and Europe.

It is certainly true that Nichols (1990) observes that certain typological features appear to cluster in certain geographical areas, and that intermediate areas show intermediate values for the parameters in question. Thus, as extremes, South America shows a very high proportion of head-marking languages, while Europe and Africa show a very high proportion of dependent-marking languages. Intermediate areas, such as Australasia, tend to show either mixed-marking or double-marking. However, the fact that one can identify typological parameters that exhibit a cline of values between the Americas, on the one hand, and Europe and Africa, on the other, tells us little about the locus of modern human origins. By themselves, these linguistic facts are consistent with both OOAm and OOAf scenarios. They simply do not speak to the validity of one hypothesis over the other.

Dziebel also says, however, that “Nichols’s argued that our perspective on an early human language comes from America and Australasia and not Africa and Europe.” Well, if she does so in Nichols (1990), I can’t find it. The closest argument I can find in Nichols (1990) to the one that Dziebel attributes to her is an observation about the relationship between colonization and the preservation of linguistic features. To summarize, Nichols observes that when new areas are colonized, it is not unusual for linguistic features to survive in the colonized area that are subsequently lost in the areas from which the linguistic stocks originally spread. Note, of course, that the languages in the colonized area continue to change, as do all human languages, so it is misleading to characterize them as somehow reflecting “early human languages”. Rather, the languages in question simply preserve some features that were present at the time of colonization, and which tend to get lost in the original area due to language shift. Note, btw, that *were* it possible to show that American languages retain certain features subsequently lost in other parts of the world, this would actually serve as evidence, following Nichols’ arguments, for the Americas having been colonized from the Old World, rather than the reverse, as Dziebel proposes.

Thus far, then, I can find no evidence in Nichols (1990) that supports the OOAm hypothesis. I now wish to briefly review evidence given in the paper that argues against the OOAm hypothesis.

First, linguistic diversity in the Americas tends to increase the further south one goes. Modulo issues of language shift, touched on above, this fact suggests that the older American populations are found in the south, and successively more recent populations are found in Meso-America and North America. These facts are easy to reconcile with a scenario in which populations entered the Americas in the north in stages, with subsequent populations pushing prior ones towards the south. It is not clear how these linguistic diversity facts fit with an OOAm scenario.

Second, Nichols argues that linguistic diversity is, in general, higher in areas that have been colonized than in the centers from which colonization occurred (a point to which I alluded above). Nichols argues (p. 487) that this is due to the fact that centers are loci of large-scale economies, which result in linguistic spreads that reduce linguistic diversity. The greater linguistic diversity of the Americas, by this reasoning, supports the view that the Americas are a colonized region, and not the OOAm hypothesis.

To summarize, Dziebel cited Nichols (1990) as a source of evidence and arguments that support the OOAm hypothesis. In particular, Dziebel cites linguistic evidence from this work for the antiquity of human settlement in the Americas and for the existence of a typological cline linking the Old World and New. However, neither piece of evidence supports an OOAm scenario over an OOAf scenario (or vice versa). Moreover, other evidence and arguments presented in Nichols (1990) cast doubt on an OOAm scenario. In particular, the evidence regarding linguistic diversity within the Americas is consistent with a process of colonization of the New World by multiple migrations from the north, but is not easy to reconcile with an OOAm scenario. Additionally, Nichols makes arguments regarding the effects of colonization on linguistic diversity which are consistent with the Americas being the site of colonization, but not with the Americas being the point from which the Old World was colonized.

Regardless of the ultimate validity of the OOAm hypothesis, then, the linguistic arguments Dziebel presents in its favor are unconvincing to me. I wish to emphasize that I am restricting my attention to the linguistic arguments, and it is possible that the genetic arguments or those based on kinship terminology provide much better evidence for OOAm. At this point, however, I am led to conclude that the linguistic evidence that Dziebel has presented so far in favor of OOAm is weak.


The process of the settlement of the Americas is one of those long-standing and fascinating research questions that can probably only be properly tackled by bringing to bear the tools of multiple disciplines: archeology, historical linguistics, and biology — especially genetic analyses of Native American populations. I was excited to see, therefore, a recent study, Genetic Variation and Population Structure in Native Americans (PLoS Genetics), that sought to use information on genetic variation in Native American populations to develop and test hypotheses about the question of prehistoric migration in the Americas.

There is much to chew on in this interesting article, and I have some queries on methodological issues related to the genetics discussed in the article, but in this post I want to comment on the use the authors made of historical linguistics. Most of the article is devoted to analyses of genetic samples from various indigenous peoples of the Americas, but one section is entitled “Genes and Languages”. The first sentence of this section reads:

We compared the classification of the population into linguistic “stocks” with their genetic relationships as inferred on a neighbor-joining tree constructed from Nei genetic distances.

When I saw the word “stocks”, my eyebrows went up, and I read on:

In a neighbor-joining tree, a reasonably well-supported cluster (86%) includes all non-Andean South American populations, together with the Andean-speaking Inga population from southern Colombia. Within this South American cluster, strong support exists for separate clustering of Chibchan-Paezan (97%) and Equatorial-Tucanoan (96%) speakers (except for the inclusion of the Equatorial-Tucanoan Wayuu population with its Chibchan-Paezan geographic neighbors, and the inclusion of Kaingang, the single Ge-Pano-Carib population, with its Equatorial-Tucanoan geographic neighbors).

Chibchan-Paezan? Equatorial-Tucanoan? Ge-Pano-Carib? Uh-oh, I thought, it looks like the authors are using Greenberg’s classification of the languages of the Americas. The citations confirmed it: Greenberg (1987) and Ruhlen (1991) are their main linguistic references. I was stunned.

The authors are geneticists, and not historical linguists specializing in the Americas, so they are probably blissfully unaware of the fact that Greenberg’s classification (which Ruhlen essentially repeats) has been severely criticized by Americanist historical linguists, and is regarded by most of them as unreliable at best. They may exist, but I’ve never met an Americanist that finds Greenberg’s classification vaguely plausible. But the authors thank Merritt Ruhlen for assistance in their acknowledgement section, which indicates at least one source for their linguistic advice.

The problematic nature of the use of Greenberg’s classification is nicely, if subtly, indicated by the following observation by the authors:

As the use of a single-family grouping (Amerind) of all languages not belonging to the Na-Dene or Eskimo-Aleut families is controversial [here they cite Bolnick et al. 2004], we focused our analysis on the taxonomically lower level of linguistic stocks.

To say that Amerind is “controversial” is an understatement, but never mind that for now. As Lyle Campbell points out, even Greenberg and Ruhlen admit that they have greater confidence in the Amerind supergroup than they do in the accuracy of the subgroupings within Amerind:

Moreover, there is some reason to believe that not even Greenberg and Ruhlen have strong faith in the validity of these eleven groupings, since they repeatedly mentioned their belief that the overall Amerind construct “is really much more robust than some [of these eleven] lower branches of Amerind” (Ruhlen 1994b:15; see Greenberg 1987:59). (Campbell 1997: p.328)

The Greenberg citation in question reads:

The validity of Amerind as a whole is more secure than that of any of its stocks.

So, the authors of GVPSNA think that Amerind is too controversial to be used in their paper, but Greenberg and Ruhlen think that Amerind is “more robust” and “more secure” than the “taxonomically lower level of linguistic stocks” used in GVPSNA. Simple transitivity means that the authors should not trust the lower-level stocks either.

The root problem with the lower-level groupings in Amerind is that even if the method of mass lexical comparison (MMLC) used by Greenberg and Ruhlen is viable (and there are not many historical linguists who would defend this position), the method is (as Bill Poser, among many others, has pointed out) incapable of defining subgroupings. The very best that MMLC can do (and once again, historical linguists have grave doubts even here) is show that a group of languages is related. It cannot elucidate subgroupings within that group of related languages.

I’ll save the explanations for the flaws in MMLC and its inability to define subgroupings for another post, but we see in the case of GVPSNA both a pervasive problem and an opportunity. The pervasive problem is that literacy in linguistics is low both among laymen and in other scientific disciplines, a horse long ago beaten to death over at Language Log (the horse in question is, unfortunately, undead, and requires periodic new beatings). The opportunity is twofold: first, it’s clear that linguistics has something to offer scientists in other fields, which is nice; and second, getting the word out about the state of the art in linguistics gives linguists a great way to achieve world domination. Fast.

Works Cited

Bolnick DA, Shook BA, Campbell L, Goddard I. 2004. Problematic use of Greenberg’s linguistic classification of the Americas in studies of Native American genetic variation. Am J Hum Genet 75: 519–522.

Campbell, Lyle. 1997. American Indian Languages: The historical linguistics of Native America. Oxford University Press.

Greenberg, Joseph. 1987. Language in the Americas. Stanford University Press.

Ruhlen, Merritt. 1991. A guide to the world’s languages. Volume 1: Classification. Stanford, CA: Stanford University Press.