A couple of months ago, an announcement by a group of biologists, led by a team working out of the Universidade Federal de Minas Gerais, cleared up a small mystery that has been nagging me for about ten years now. The resolution to this mystery nicely illustrates how the ethnobiological knowledge of the peoples that field linguists work with can outstrip that of the biological experts we often rely upon.

This mystery first raised its head when I was working in Peruvian Amazonia, collaborating with several speakers of Iquito to document the ethnobiological terminology of their language, as part of a broader effort to develop an Iquito dictionary (see here for a draft). Although we eventually got into more challenging domains like birds, fish, and plants, we began with the easiest domain: mammals (1). Our work on mammal terminology went quickly and smoothly, but for one thing: the men I was working with (principally Hermenegildo Díaz Cuyasa and Jaime Pacaya Inuma) provided two Iquito terms corresponding to the local Spanish term for tapir (sachavaca): pɨsɨkɨ and ariyuukʷaaha. The first was clearly Tapirus terrestris, the lowland tapir found all over the Amazon Basin, but I was perplexed by the second term, ariyuukʷaaha, which Hermenegildo and Jaime explained denoted a smaller variety than the one denoted by pɨsɨkɨ. I probed to see if perhaps the two terms referred to different life stages of the same species, or simply to morphological variants (2), but the Iquito speakers were positive that there were in fact two distinct species of tapir, and described the physical characteristics that distinguished them. Mammalogists, however, recognized only a single species of tapir in Amazonia: Tapirus terrestris.

 Twelve Iquito speakers at lunch in their honor (2004); Hermenegildo Díaz Cuyasa is in the back row, far left, and Jaime Pacaya Inuma, far right.

I was stumped by this state of affairs, and in the Iquito dictionary I simply decided to indicate that pɨsɨkɨ was Tapirus terrestris, and that ariyuukʷaaha denoted a smaller variety of tapir which speakers identified as a distinct species. I was never fully satisfied by this, however. How could biologists miss a wholly distinct species of mammal as large as a tapir? But on the other hand, how could a people who hunted tapirs regularly be wrong about a species distinction like this?

I expected this to be one of those numerous mysteries that crop up in fieldwork and are never resolved, and was thus very excited when I read about the discovery of a new species of tapir, Tapirus kabomani, which, crucially, is smaller than Tapirus terrestris. The original Cozzuol et al. BioOne article that announces the discovery can be found here. Interestingly, evidence for this species has been found in various locations in lowland South America, including one location a mere 240 miles northeast of Iquito territory, suggesting that the Iquito ariyuukʷaaha is Tapirus kabomani.

Although the potential solution to the ariyuukʷaaha mystery is quite satisfying, it is worth pointing out that the ‘discovery’ in question is of course a curious one, in that the existence of this second species of tapir is no news to several Amazonian peoples, as Cozzuol et al. themselves point out. Although reports of this species by indigenous peoples to Western scientists date back at least to an early 19th century mention to Carl Friedrich Philipp von Martius (see here), biologists never pursued this lead systematically, and thereby managed to miss identifying a quite massive mammal. Whatever the lesson for biologists in this story, as a field linguist who spends a reasonable amount of time concerned with ethnobiological matters as part of lexical work, this experience has left me with a renewed appreciation for how seriously we should take indigenous ethnobiological knowledge.


(1) In my experience, mammalian ethnobiological terminology is ‘easy’ in the sense that either there are few similar-looking species within a given genus in any given area, making species identification comparatively easy (e.g. within the genus Ateles), or there are a large number of similar-looking species, but a single ethnobiological term is employed for the entire genus, or sometimes only two terms for an entire order, like bats (Chiroptera; the peoples I have worked with in the Amazon Basin make a two-way terminological distinction: vampire bats vs. any other member of the order).

(2) I’ve run across one pervasive terminological distinction in Peruvian Amazonian languages (and local Spanish) that does not correspond to a species distinction, although speakers of these languages believe that it does: the adult and juvenile phases of Bothrops atrox. In local Spanish, for example, the adult phase is referred to as a gergón, and the juvenile phase as a cascabel, and it is believed that they are distinct species.

Vale Constenla

November 11, 2013

I was saddened to hear that Adolfo Constenla Umaña recently passed away. Constenla was a giant in Costa Rican linguistics, doing important work on Chibchan languages and training students who also advanced our understanding of the family. Constenla was also the author of an important book that deserves to be better known than it is, Las lenguas del area intermedia: Introducción a su estudio areal. Among other things, this work evaluates whether the ‘area intermedia’, roughly the region south of the Mayan zone in Meso-America and extending to the northern Colombian Andes, constitutes a linguistic area. This study prefigures by almost two decades the increasingly common use of a relatively large number of typological features to assess areality, and carefully examines the distribution of diagnostic features outside the proposed area, as well as inside, an important methodological point not always attended to in older work on linguistic areas. In many respects this work represented one of the most rigorous studies of a linguistic area until recently, when computational techniques were harnessed to assess areality. Constenla left behind a rich body of work and a cadre of students, through which his influence will live on.

I recently learned of David Fleck’s new monograph Panoan Languages and Linguistics, available online here through the American Museum of Natural History. Fleck provides an internal classification of the family, but perhaps the greatest service he has provided is to sort through the perplexing blizzard of Panoan ethnonyms one finds in the colonial and ethnographic literature, and in older classifications of Panoan languages. He also discusses language names that have been applied to both Panoan languages and non-Panoan ones (Katukina, anyone?), which is another source of confusion. This is a very useful reference to anyone who engages, however briefly, with Panoan linguistics.


This post describes the use of a phylogenetic analysis program, Mesquite, to identify possibly erroneous cognacy judgments in large lexical datasets. I’ve found it to be a very useful tool, and I haven’t heard other linguists talk about it a great deal, so I thought it might be interesting for others to hear about. But first, some background…

For the past couple of years the Berkeley Comparative Tupí-Guarani Project* has been working to develop an improved internal classification of the Tupí-Guaraní (TG) family. By this point we have, among other things, collected lexical data on 30 TG languages (plus Awetí and Mawe, two non-TG Tupian languages, to serve as out-group languages), using a 539-item comparative list, and arranged these data into approximately 1300 non-singleton cognate sets. We will, in the not-too-distant future, start constructing correspondence sets in order to begin applying the Comparative Method to this dataset, but in the meantime, we are running computational phylogenetic analyses on the lexical data to obtain a preliminary internal classification. What we obtain are trees like the following:

An inferred phylogeny of TG languages based on lexical cognate sets

This is actually a pretty credible TG tree (although it may not, of course, be entirely correct): it largely reproduces the basic groups of Rodrigues (1984/5) and the proposed subgroups of Rodrigues and Cabral (2002), along with additional structure that seems plausible if you have, like us, been spending a lot of time looking at TG lexical and morphological data. (It also yields a very sensible model for the geographical dispersal of the family, but that’s a matter for another day.) One weakness of the phylogenetic result, however, is the support values for certain subgroups. Support values correspond roughly to the probability that a given subgroup is, in fact, a subgroup, and we have wanted to use a value of 0.85 as our cutoff point for considering a clade (or subgroup) credible. Unfortunately, some of our most interesting subgroups have lower values. For example, the subgroup that corresponds more or less to Groups I+II+III in the Rodrigues classification has a support value of 0.81.
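For readers who want to see the arithmetic behind a cutoff like this, here is a minimal Python sketch. It assumes that support is computed as the share of sampled trees that contain a given clade, which is one common way such values are obtained; the post does not say how our phylogenetics application computes them. The language labels A to D, the toy sample, and the function name `clade_support` are all hypothetical.

```python
from collections import Counter

def clade_support(sampled_trees):
    """Return, for each clade, the fraction of sampled trees that contain it.

    Each tree is represented (as a simplifying assumption) by the set of
    clades it contains, and each clade by a frozenset of language labels.
    """
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))
    total = len(sampled_trees)
    return {clade: n / total for clade, n in counts.items()}

# Hypothetical sample: 81 trees group A, B and C; 19 group A, B and D instead.
tree_with_abc = {frozenset("AB"), frozenset("ABC")}
tree_with_abd = {frozenset("AB"), frozenset("ABD")}
sample = [tree_with_abc] * 81 + [tree_with_abd] * 19

support = clade_support(sample)

CUTOFF = 0.85  # the credibility threshold mentioned in the text
credible = {clade for clade, value in support.items() if value >= CUTOFF}
```

In this toy sample the clade containing A, B and C falls just short of the cutoff, in the same way as the Group I+II+III subgroup described above, while the smaller clade containing only A and B passes it.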

Fortunately, one can increase the support values by improving the reliability of the cognate sets (assuming that they are not already perfect, ha ha). Computationally, lowered support values arise from ‘conflicting signals’, i.e. different sets of evidence that point to different subgroups. So, for example, there is good evidence for our Group I+II+III subgroup, i.e. cognate sets that uniquely define this subgroup, but there are other cognate sets that lead one to want to include other languages in this larger subgroup, or languages from this subgroup in other subgroups, reducing the support for all of the subgroups.

This kind of conflicting signal can arise from a number of sources, but two important ones are: 1) independent innovations that yield false cognacy; and 2) mistakes in building cognate sets, where two elements are deemed to be cognate when they are not. The latter is, of course, always a potential issue at this stage in the process, i.e. before complete application of the Comparative Method, since without adequate knowledge of the relevant sound changes, it is possible to treat bogus look-alikes as cognate, and to miss true cognates due to changes that obscure cognacy. And in dealing with such a large dataset, human error inevitably comes into play: forms are deemed cognate in the wee hours of a particular morning that really aren’t credibly cognate by the cold light of day.

Fortunately, we have found a very useful tool for ferreting out potentially bogus cognacy judgments in the form of Mesquite, an application that serves to carry out analyses on inferred phylogenetic trees. Mesquite has many functions, but the relevant one for our purposes is its ‘reconstruction’ of ancestral states. Basically what this function does is to ‘reconstruct’ (i.e. identify) how far back in a phylogenetic tree a given phylogenetic character (in our case, a form that is a member of a particular cognate set) reconstructs, according to the tree that one’s phylogenetics application has inferred. In doing so, it also identifies cases of independent innovation (likewise, according to the inferred tree).
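To make the idea concrete, here is a minimal Python sketch of one simple form of ancestral state reconstruction, Fitch parsimony, applied to a presence/absence character on a toy tree. This is an illustration only: it is not Mesquite’s code, the tree and the language labels A to H are hypothetical, and treating the character as absent at the root when the root is ambiguous is an assumption made here for simplicity.

```python
def annotate(node, states):
    """Fitch down-pass: return (possible_states, children) for every node.

    A leaf is a string label; an internal node is a pair (left, right).
    """
    if isinstance(node, str):
        return ({states[node]}, ())
    left = annotate(node[0], states)
    right = annotate(node[1], states)
    common = left[0] & right[0]
    return (common or (left[0] | right[0]), (left, right))

def count_gains(annotated, parent_state):
    """Up-pass: fix one state per node and count 0 -> 1 changes below it."""
    possible, children = annotated
    # Keep the parent's state when it is allowed; otherwise take the only option.
    state = parent_state if parent_state in possible else min(possible)
    gained = 1 if (parent_state == 0 and state == 1) else 0
    return gained + sum(count_gains(child, state) for child in children)

def independent_origins(tree, states):
    """Number of separate origins of the character on the tree."""
    annotated = annotate(tree, states)
    root_state = min(annotated[0])  # assumption: absent at the root if ambiguous
    origins = count_gains(annotated, root_state)
    return origins + (1 if root_state == 1 else 0)

# Hypothetical eight-language tree with two main branches.
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

# A character shared by one whole subgroup: a single origin.
clean = dict.fromkeys("ABCDEFGH", 0)
clean.update(A=1, B=1, C=1, D=1)

# A character scattered over three unrelated languages: three origins.
scattered = dict.fromkeys("ABCDEFGH", 0)
scattered.update(A=1, D=1, G=1)

clean_origins = independent_origins(tree, clean)
scattered_origins = independent_origins(tree, scattered)
```

A character with one origin behaves like a well-supported cognate set, whereas a character with several origins on the inferred tree is the kind of pattern that makes a cognate set worth re-examining.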

One thing that makes Mesquite especially appealing is its graphical interface, which allows one to easily spot instances of independent innovation. First, in the following screen shot, one can see a clear instance of a character (KNEE4, presence of forms for ‘knee’ cognate to, e.g. Assuriní de Tocantíns kanawá) that seems to reconstruct quite solidly for one of the robust subgroups in our larger ‘Central’ subgroup.

A Mesquite 'reconstruction' for the TG KNEE4 set

Next, in the following screen shot, one can see a character (TOE2) that was, according to the ancestral state reconstruction associated with the tree, independently innovated three times: in Chiriguano, Pauserna, and Wayampí. This is a somewhat suspicious state of affairs, suggesting that it might make sense to look at the cognate set again. Doing so, we see that the word for ‘toe’ in these languages is actually a compound meaning something like ‘foot head’. Body-part compounds with ‘head’ or ‘bone’ are fairly common in TG languages, suggesting that these forms for ‘toe’ are independently innovated, based on (true) cognates for ‘foot’ and ‘head’. On this basis we exclude this compound as informative for purposes of phylogenetic analysis. And note that the pattern evident in the ‘reconstruction’ is precisely the kind of conflicting signal that might lower the support for subgroups like Central and Peripheral.

A Mesquite 'reconstruction' for the TOE2 cognate set

Examining suspicious ‘reconstructions’ like the TOE2 one has led us to identify previously unnoticed complex forms, as in this case, as well as instances of poor cognacy judgments. And having identified several dozen problematic sets in this way, we have high hopes that our next TG tree will have the support values that we are pining for. We’re keeping our fingers crossed, and I’ll post our next set of results.

In any case, I’ve found Mesquite to be such a wonderful tool for evaluating cognate sets in the context of phylogenetic analysis that I wanted to share it with others who might not be familiar with it. (And thanks, Natalia, for introducing it to the TG group!)


Rodrigues, A. D. 1984/1985. Relações internas na família lingüística tupí-guaraní. Revista de Antropologia 27/28, 33–53.

Rodrigues, A. D. and A. S. A. C. Cabral. 2002. Revendo a classificação interna da família tupí-guaraní. In A. S. A. C. Cabral and A. D. Rodrigues (eds.), Línguas Indígenas Brasileiras: Fonologia, Gramática e História, pp. 327–337. Belém: Editora Universitária, Universidade Federal do Pará.

*Current project members include Keith Bartolomei, Natalia Chousou-Polydouri, Erin Donelly, and Zachary O’Hagan; alumni include Mike Roberts and Vivian Wauters. The work described here has been funded in part by NSF BCS #0966499. Thanks also to Sebastian Drude and Françoise Rose for data-sharing!

One of my favorite new blogs on the linguistics scene is Diversity Linguistics Comment, which presents itself as

… a scholarly blog that discusses current issues in language typology and language description, written by linguists for other linguists. The notion of “diversity linguistics” recognizes the close connections between the enterprises of language comparison and analysis of particular languages. Topics include grammatical structures (syntax and morphology, phonology), language contact, language change in a comparative perspective, and genealogical linguistics.

Posts appear somewhat infrequently, but they are always substantive and interesting, often accompanied by equally meaty comment threads. As an example of the fare provided, consider Simeon Floyd’s recent post on Quechua adjectives (here), which engages with the debate over the universality of word classes, especially as this question intersects with descriptive linguistic practice. The post focuses on Simeon’s own work on Quechua adjectives, and Martin Haspelmath’s criticism of Simeon’s (and others’) conclusions. I find it to be a very thoughtful piece that provides a nice example of the subtle issues involved in applying putatively cross-linguistically valid labels like ‘adjective’ to language-specific word classes, and also shows how attention to naturally-occurring discourse can play a crucial role in grammatical analysis.

In the context of work that I’ve been carrying out on 17th and 18th century Omagua society, I’ve come to be interested in the etymology of the names of Omagua communities mentioned in the Jesuit records of the period, including those found on Samuel Fritz’s map (a not awful copy is available here). A brief inspection of these names as given in these sources (e.g. Zuruité, Iviraté, Yoaivaté, and Aruparaté) reveals that most of them appear to consist of a nominal root (e.g. zurui = /surui/ ‘catfish sp.’ or ivira = /ɨwɨra/ ‘tree’) and a suffix -té. What, though, is the suffix in question?

An obvious first guess for a Tupí-Guaraní (TG) specialist would be the suffix -eté, found in most TG languages, which expresses meanings like ‘true’ or ‘real’. This suffix surfaces, for example, in the Paraguayan Guaraní word yawareté ‘jaguar’ (lit. ‘true jaguar’, to distinguish it from yawar which now means ‘dog’, due to semantic shift). And indeed one finds cognates to this suffix in Omagua, as in the word Omaguayete ‘true Omaguas’, recorded in Jesuit sources, which was apparently the autonym for the Omagua group that lived on the Upper Napo (see here for details). But for two reasons, it seems unlikely that the -te found in the toponyms in question is the same suffix. In the first place, the meaning of the form that would be derived seems implausible and unmotivated: why would the Omaguas have wanted to call their communities ‘True Catfish’ or ‘True Tree’? Second, the form of the suffix in Omaguayete doesn’t quite seem to match that of the toponyms in question, since in the former it appears to be -yete, but in the toponyms, -te. Even if one argued, as one might want to (see below), that the form of the suffix in the toponyms is underlyingly -ete, it is unclear why vowel hiatus would have been resolved by vowel deletion in the toponyms, but not in the autonym of the Omagua subgroup.

By chance, however, I recently noticed that in at least one TG language, Parintintín, a suspiciously similar suffix, -ete, is used to derive river names (Betts 1981). One possibility, then, is that -te did the same thing in Old Omagua (it no longer does), and that the 17th and 18th century community names were ultimately river names. Nothing would be more natural, in fact: indigenous Amazonian communities very frequently take as their names the names of nearby small tributaries. On this view, then, -te was an endocentric derivational suffix that derived hydronyms. Whether -te derived forms that denoted rivers in general or small rivers is unclear at this point. Although every Amazonian language I’ve done substantial work with exhibits hydronymic derivational morphology like this, some languages exhibit two (or more) morphemes that distinguish the size of the river, while others don’t. For example, Máíhɨ̃ki distinguishes two sizes of river (-ya ‘river’ and -gaya ‘creek’), but Iquito exhibits only a single hydronymic derivational suffix (-mu).

There are two things that would need to be done to properly evaluate the hypothesis sketched out above. First, it would be good to see if there are tributaries in former Omagua territory that actually bear names with the -te suffix. Of course, there has been a lot of toponymic turnover since the 18th century, when the Omaguas were decimated by disease and Portuguese slave raids, and largely abandoned their former territories. Names of Quechua origin have probably replaced most of the older Omagua names, but some traces may remain. I don’t have maps on hand of the necessary detail, but my big 1:3,300,000 scale map of the Amazon basin reveals one such tributary, Puruté, suggesting that some progress could be made here.

The second issue to examine would be to see if other TG languages exhibit a hydronym-deriving suffix cognate to Parintintín -ete. This would help reassure us that the resemblance between the Omagua and Parintintín suffixes is not a chance similarity.


Betts, L. V. 1981. Dicionário Parintintin-Portugues Portugues-Parintintin. Cuiabá: Summer Institute of Linguistics.

A new volume in the occasional Survey (of Californian and Other Indian Languages) Reports series was just published, and is available online here. The volume, entitled Structure and contact in languages of the Americas, was edited by John Sylak-Glassman and Justin Spence, and includes a number of very interesting articles on South and Central American languages, as evident in the table of contents, reproduced below:
  • Subgrouping in the Tupí-Guaraní family: A phylogenetic approach by Natalia Chousou-Polydouri and Vivian Wauters
  • A ‘perfect’ evidential: The functions of -shka in Imbabura Quichua by Jessica Cleary-Kemp
  • Hierarchies, subjects, and the lack thereof in Imbabura Quichua subordinate clauses by Clara Cohen
  • One -mi: An evidential, epistemic modal, and focus marker in Imbabura Quechua by Iksoo Kwon
  • The stops of Tlingit by Ian Maddieson and Caroline L. Smith
  • The plank canoe of southern California: Not a Polynesian import, but a local innovation by Yoram Meroz
  • Variable affix ordering in Kuna by Lindsey Newbold
  • Passive constructions in Kʷak̓ʷala by Daisy Rosenblum
  • Dialect contact, convergence, and maintenance in Oregon Athabaskan by Justin Spence
  • Affix ordering in Imbabura Quichua by John Sylak-Glassman
