The process of the settlement of the Americas is one of those long-standing and fascinating research questions that can probably only be properly tackled by bringing to bear the tools of multiple disciplines: archeology, historical linguistics, and biology — especially genetic analyses of Native American populations. I was excited to see, therefore, a recent study, Genetic Variation and Population Structure in Native Americans (PLoS Genetics), that sought to use information on genetic variation in Native American populations to develop and test hypotheses about the question of prehistoric migration in the Americas.
There is much to chew on in this interesting article, and I have some queries on methodological issues related to the genetics discussed in the article, but in this post I want to comment on the use the authors made of historical linguistics. Most of the article is devoted to analyses of genetic samples from various indigenous peoples of the Americas, but one section is entitled “Genes and Languages”. The first sentence of this section reads:
We compared the classification of the population into linguistic “stocks” with their genetic relationships as inferred on a neighbor-joining tree constructed from Nei genetic distances.
When I saw the word “stocks”, my eyebrows went up, and I read on:
In a neighbor-joining tree, a reasonably well-supported cluster (86%) includes all non-Andean South American populations, together with the Andean-speaking Inga population from southern Colombia. Within this South American cluster, strong support exists for separate clustering of Chibchan-Paezan (97%) and Equatorial-Tucanoan (96%) speakers (except for the inclusion of the Equatorial-Tucanoan Wayuu population with its Chibchan-Paezan geographic neighbors, and the inclusion of Kaingang, the single Ge-Pano-Carib population, with its Equatorial-Tucanoan geographic neighbors).
Chibchan-Paezan? Equatorial-Tucanoan? Ge-Pano-Carib? Uh-oh, I thought, it looks like the authors are using Greenberg’s classification of the languages of the Americas. The citations confirmed it: Greenberg (1987) and Ruhlen (1991) are their main linguistic references. I was stunned.
The authors are geneticists, not historical linguists specializing in the Americas, so they are probably blissfully unaware that Greenberg’s classification (which Ruhlen essentially repeats) has been severely criticized by Americanist historical linguists, and is regarded by most of them as unreliable at best. They may exist, but I’ve never met an Americanist who finds Greenberg’s classification even vaguely plausible. But the authors thank Merritt Ruhlen for assistance in their acknowledgements section, which indicates at least one source for their linguistic advice.
The problematic nature of the use of Greenberg’s classification is nicely, if subtly, indicated by the following observation by the authors:
As the use of a single-family grouping (Amerind) of all languages not belonging to the Na-Dene or Eskimo-Aleut families is controversial [here they cite Bolnick et al. 2004], we focused our analysis on the taxonomically lower level of linguistic stocks.
To say that Amerind is “controversial” is an understatement, but never mind that for now. As Lyle Campbell points out, even Greenberg and Ruhlen admit that they have greater confidence in the Amerind supergroup than they do in the accuracy of the subgroupings within Amerind:
Moreover, there is some reason to believe that not even Greenberg and Ruhlen have strong faith in the validity of these eleven groupings, since they repeatedly mentioned their belief that the overall Amerind construct “is really much more robust than some [of these eleven] lower branches of Amerind” (Ruhlen 1994b:15; see Greenberg 1987:59). (Campbell 1997: p.328)
The Greenberg citation in question reads:
The validity of Amerind as a whole is more secure than that of any of its stocks.
So, the authors of GVPSNA think that Amerind is too controversial to be used in their paper, but Greenberg and Ruhlen think that Amerind is “more robust” and “more secure” than the “taxonomically lower level of linguistic stocks” used in GVPSNA. Simple transitivity means that the authors should not trust the lower-level stocks either.
The root problem with the lower-level groupings in Amerind is that even if the method of mass lexical comparison (MMLC) used by Greenberg and Ruhlen is viable (and there are not many historical linguists who would defend this position), the method is (as Bill Poser, among many others, has pointed out) incapable of defining subgroupings. The very best that MMLC can do (and once again, historical linguists have grave doubts even here) is show that a group of languages is related. It cannot elucidate subgroupings within that group of related languages.
I’ll save the explanations for the flaws in MMLC and its inability to define subgroupings for another post, but we see in the case of GVPSNA both a pervasive problem and an opportunity. The pervasive problem is that literacy in linguistics is low both among laymen and in other scientific disciplines, a horse long ago beaten to death over at Language Log (the horse in question is, unfortunately, undead, and requires periodic new beatings). The opportunity is twofold: first, it’s clear that linguistics has something to offer scientists in other fields, which is nice; and second, getting the word out about the state of the art in linguistics gives linguists a great way to achieve world domination. Fast.
Bolnick DA, Shook BA, Campbell L, Goddard I. 2004. Problematic use of Greenberg’s linguistic classification of the Americas in studies of Native American genetic variation. Am J Hum Genet 75: 519–522.
Campbell, Lyle. 1997. American Indian Languages: The historical linguistics of Native America. Oxford University Press.
Greenberg, Joseph. 1987. Language in the Americas. Stanford University Press.
Ruhlen, Merritt. 1991. A guide to the world’s languages. Volume 1: Classification. Stanford, CA: Stanford University Press.