Máíhuna film project

April 18, 2014

I recently learned of a new documentary film project that aims to tell the story of the Máíhuna fight to defend their lands in the face of a plan to build a road through their traditional territory. As the project website describes,

The Maijuna, an indigenous group of the northeastern Peruvian Amazon, live in one of the most biologically rich regions of the world. Unfortunately, the Peruvian government wants to build a road directly through the heart of their ancestral lands, an area that they have cared for and lived in for millennia. The direct effects of highway construction and the associated impacts from an influx of colonists and subsequent deforestation would irreversibly alter the ecological fabric of this currently roadless area. Given that the Maijuna are a forest dwelling people who rely on the forest for sustenance and survival, building this road would severely impact their livelihoods and traditional culture. Help us tell the story of the last remaining Maijuna through the power of documentary filmmaking as they fight for their ancestral homeland and their cultural survival. This film is critically important because it will help to get the word out about the plight of the Maijuna and help them in their struggle to defend themselves.

This is a joint project between Professor Michael Gilmore and students Tyler Orton and Will Martinez of George Mason University, documentary filmmaker Jacob Wagner, and the non-governmental organization Rainforest Conservation Fund.

The project is currently seeking to raise $25k through crowd-funding; you can learn more about the project in this article and support it through the project’s Indiegogo page.

I was delighted to receive via email yesterday a copy of a new collection of Máíhɨ̃ki texts compiled by Amalia Skilton, who was a member of the Máíhɨ̃ki Project fieldwork team in 2012 and 2013. Amalia began independent fieldwork on Máíhɨ̃ki in the fall of 2013, and since January of this year, she has been working with speakers of the Northern dialect of Máíhɨ̃ki in the town of El Estrecho, located on the Peruvian side of the Peru-Colombia border.

Northern Máíhɨ̃ki was historically spoken in the basin of the Algodón River (Máíhɨ̃ki: Tótòyà), a major southern tributary of the Putumayo River, and the remaining 13 speakers of this variety either live in the community of Tótòyà, located on the river of the same name, or have moved to El Estrecho to have easier access to education, work, and commercial products. Northern Máíhɨ̃ki was, until Amalia began her work, the least documented of the three Máíhɨ̃ki varieties (Western Máíhɨ̃ki, spoken in the Yanayacu River basin, Eastern Máíhɨ̃ki, spoken in the Sucusari River basin, and Northern Máíhɨ̃ki), but its small number of remaining speakers are considered by many Máíhuna to be among the most knowledgeable in terms of traditional culture, including oral traditions. Amalia has also found quite a number of grammatical and phonological differences between Northern Máíhɨ̃ki and the other Máíhɨ̃ki varieties, which will no doubt lead to interesting insights into the history of the language as a whole.

The text collection that Amalia has compiled for distribution to the Máíhuna communities includes texts from the majority of the speakers of the Northern dialect (Adriano Ríos Sanchez, Enrique Ríos Díez, Féderico Lopez Algoba, Lizardo González Flores, Otília López Gordillo, Pedro López Algoba, Soraida López Algoba, and Trujillo Ríos Díez), and includes illustrations by Gervasio López Mosoline. The oral texts related by these speakers, and transcribed and translated by Amalia with their help, are all fascinating, and exemplify a wide range of themes and forms of verbal artistry.

Anyone with an interest in Tukanoan linguistics or Amazonian verbal art should check it out here (6.3 MB)!

A couple of months ago, an announcement by a group of biologists, led by a team working out of the Universidade Federal de Minas Gerais, cleared up a small mystery that had been nagging me for about ten years. The resolution of this mystery nicely illustrates how the ethnobiological knowledge of the peoples that field linguists work with can outstrip that of the biological experts we often rely upon.

This mystery first raised its head when I was working in Peruvian Amazonia, collaborating with several speakers of Iquito to document the ethnobiological terminology of their language, as part of a broader effort to develop an Iquito dictionary (see here for a draft). Although we eventually got into more challenging domains like birds, fish, and plants, we began with the easiest domain: mammals (1). Our work on mammal terminology went quickly and smoothly, but for one thing: the men I was working with (principally Hermenegildo Díaz Cuyasa and Jaime Pacaya Inuma) provided two Iquito terms corresponding to the local Spanish term for tapir (sachavaca): pɨsɨkɨ and ariyuukʷaaha. The first was clearly Tapirus terrestris, the lowland tapir found all over the Amazon Basin, but I was perplexed by the second term, ariyuukʷaaha, which Hermenegildo and Jaime explained denoted a smaller variety than the one denoted by pɨsɨkɨ. I probed to see if perhaps the two terms referred to different life stages of the same species, or simply to morphological variants (2), but the Iquito speakers were positive that there were in fact two distinct species of tapir, and described the physical characteristics that distinguished them. Mammalogists, however, recognized only a single species of tapir in Amazonia: Tapirus terrestris.

Twelve Iquito speakers at lunch in their honor (2004); Hermenegildo Díaz Cuyasa is in the back row, far left, and Jaime Pacaya Inuma, far right.

I was stumped by this state of affairs, and in the Iquito dictionary I just decided to indicate that pɨsɨkɨ was Tapirus terrestris, and that ariyuukʷaaha denoted a smaller variety of tapir which speakers identified as a distinct species. I was never fully satisfied by this, however. How could biologists miss a wholly distinct species of mammal as large as a tapir? But on the other hand, how could a people who hunted tapirs regularly be wrong about a species distinction like this?

I expected this to be one of those numerous mysteries that crop up in fieldwork that are never resolved, and was thus very excited when I read about the discovery of a new species of tapir, Tapirus kabomani, which, crucially, is smaller than Tapirus terrestris. The original Cozzuol et al. BioOne article which announces the discovery can be found here. Interestingly, evidence for this species has been found in various locations in lowland South America, including one location a mere 240 miles northeast of Iquito territory, suggesting that the Iquito ariyuukʷaaha is Tapirus kabomani.

Although the potential solution to the ariyuukʷaaha mystery is quite satisfying, it is worth pointing out that the ‘discovery’ in question is of course a curious one, in that the existence of this second species of tapir is no news to several Amazonian peoples, as Cozzuol et al. themselves point out. Although reports by indigenous peoples of this species to Western scientists date at least to an early 19th century mention of this species to Carl Friedrich Philipp von Martius (see here), biologists never pursued this lead systematically, and thereby managed to miss identifying a quite massive mammal. Whatever the lesson for biologists in this story, as a field linguist who spends a reasonable amount of time concerned with ethnobiological matters as part of lexical work, this experience has left me with a renewed appreciation for how seriously we should take indigenous ethnobiological knowledge.


(1) In my experience, mammalian ethnobiological terminology is ‘easy’ in the sense that either there are few similar-looking species within a given genus in any given area, making species identification comparatively easy (e.g. within the genus Ateles), or there are a large number of similar-looking species, but there is a single ethnobiological term employed for the entire genus, or sometimes only two terms for an entire order, like bats (Chiroptera; the peoples I have worked with in the Amazon Basin make a two-way terminological distinction: vampire bats vs. any other member of the order).

(2) I’ve run across one pervasive terminological distinction in Peruvian Amazonian languages (and local Spanish) that does not correspond to a species distinction, although speakers of these languages believe that it does: the adult and juvenile phases of Bothrops atrox. In local Spanish, for example, the adult phase is referred to as a jergón, and the juvenile phase as a cascabel, and it is believed that they are distinct species.

Vale Constenla

November 11, 2013

I was saddened to hear that Adolfo Constenla Umaña recently passed away. Constenla was a giant in Costa Rican linguistics, doing important work on Chibchan languages and training students who also advanced our understanding of the family. Constenla was also the author of an important book that deserves to be better known than it is, Las lenguas del Área Intermedia: Introducción a su estudio areal. Among other things, this work evaluates whether the ‘Área Intermedia’, roughly the region south of the Mayan zone in Meso-America and extending to the northern Colombian Andes, constitutes a linguistic area. This study prefigures by almost two decades the increasingly common use of a relatively large number of typological features to assess areality, and carefully examines the distribution of diagnostic features outside the proposed area as well as inside it, an important methodological point not always attended to in older work on linguistic areas. In many respects this work represented one of the most rigorous studies of a linguistic area until recently, when computational techniques were harnessed to assess areality. Constenla left behind a rich body of work and a cadre of students, through which his influence will live on.

I recently learned of David Fleck’s new monograph Panoan Languages and Linguistics, available online here through the American Museum of Natural History. Fleck provides an internal classification of the family, but perhaps the greatest service he has provided is to sort through the perplexing blizzard of Panoan ethnonyms one finds in the colonial and ethnographic literature, and in older classifications of Panoan languages. He also discusses language names that have been applied to both Panoan languages and non-Panoan ones (Katukina, anyone?), which is another source of confusion. This is a very useful reference for anyone who engages, however briefly, with Panoan linguistics.


This post describes the use of a phylogenetic analysis program, Mesquite, to identify possibly erroneous cognacy judgments in large lexical datasets. I’ve found it to be a very useful tool, and I haven’t heard other linguists talk about it a great deal, so I thought it might be interesting for others to hear about. But first, some background…

For the past couple of years the Berkeley Comparative Tupí-Guarani Project* has been working to develop an improved internal classification of the Tupí-Guaraní (TG) family. By this point we have, among other things, collected lexical data on 30 TG languages (plus Awetí and Mawe, two non-TG Tupian languages, to serve as out-group languages), using a 539-item comparative list, and arranged these data into approximately 1300 non-singleton cognate sets. We will, in the not-too-distant future, start constructing correspondence sets in order to begin applying the Comparative Method to this dataset, but in the meantime, we are running computational phylogenetic analyses on the lexical data to obtain a preliminary internal classification. What we obtain are trees like the following:

An inferred phylogeny of TG languages based on lexical cognate sets

This is actually a pretty credible TG tree (although it may not, of course, be entirely correct): it largely reproduces the basic groups of Rodrigues (1984/5) and the proposed subgroups of Rodrigues and Cabral (2002), along with additional structure that seems plausible if you have, like us, been spending a lot of time looking at TG lexical and morphological data. (It also yields a very sensible model for the geographical dispersal of the family, but that’s a matter for another day.) One weakness of the phylogenetic result, however, is the support values for certain subgroups. Support values correspond roughly to the probability that a given subgroup is, in fact, a subgroup, and we have wanted to use a value of 0.85 as our cutoff point for considering a clade (or subgroup) credible. Unfortunately, some of our most interesting subgroups have lower values. For example, the subgroup that corresponds more or less to Groups I+II+III in the Rodrigues classification has a support value of 0.81.
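To make the cutoff concrete, here is a minimal Python sketch of sorting clades by whether they clear the 0.85 threshold. Only the 0.81 value for Groups I+II+III comes from our results; the other two clades and their values are made up for illustration, and the function name is hypothetical.

```python
# Cutoff mentioned above; clades at or above it are treated as credible.
SUPPORT_CUTOFF = 0.85

# Only the Groups I+II+III value is real; the other entries are invented.
clade_support = {
    "Groups I+II+III": 0.81,
    "Clade A (hypothetical)": 0.97,
    "Clade B (hypothetical)": 0.62,
}

def split_by_support(support, cutoff=SUPPORT_CUTOFF):
    """Return (credible, doubtful) clade names, each sorted alphabetically."""
    credible = sorted(name for name, p in support.items() if p >= cutoff)
    doubtful = sorted(name for name, p in support.items() if p < cutoff)
    return credible, doubtful

credible, doubtful = split_by_support(clade_support)
print("credible:", credible)
print("below cutoff:", doubtful)
```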

Fortunately, one can increase the support values by improving the reliability of the cognate sets (assuming that they are not already perfect, ha ha). Computationally, lowered support values arise from ‘conflicting signals’, i.e. different sets of evidence that point to different subgroups. So, for example, there is good evidence for our Group I+II+III subgroup, i.e. cognate sets that uniquely define this subgroup, but there are other cognate sets that lead one to want to include other languages in this larger subgroup, or languages from this subgroup in other subgroups, reducing the support for all of the subgroups.
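One rough way to picture a conflicting signal is as a cognate set whose attesting languages only partially overlap with a proposed subgroup. The Python sketch below checks for that. It is a deliberate simplification: it ignores loss and borrowing, it is not what the phylogenetics software actually computes, and the language labels L1 to L5 are placeholders rather than real TG languages.

```python
def conflicts(cognate_langs, subgroup):
    """True if the cognate set cuts across the subgroup.

    A set is taken to conflict when it is attested in some members of the
    subgroup, is missing from others, and also turns up outside the subgroup.
    """
    inside = cognate_langs & subgroup
    outside = cognate_langs - subgroup
    missing = subgroup - cognate_langs
    return bool(inside) and bool(outside) and bool(missing)

subgroup = {"L1", "L2", "L3"}            # a proposed subgroup (placeholder)

supporting = {"L1", "L2", "L3"}          # attested in exactly the subgroup
nested = {"L1", "L2"}                    # could define a smaller clade inside it
crosscutting = {"L1", "L2", "L4"}        # pulls L4 in and leaves L3 out

for name, langs in [("supporting", supporting),
                    ("nested", nested),
                    ("crosscutting", crosscutting)]:
    print(name, conflicts(langs, subgroup))
```

Sets flagged this way are not necessarily wrong, but they are the ones that drag support values down.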

This kind of conflicting signal can arise from a number of sources, but two important ones are: 1) independent innovations that yield false cognacy; and 2) mistakes in building cognate sets, where two elements are deemed to be cognate when they are not. The latter issue is, of course, always a potential issue at this stage in the process, i.e. before complete application of the Comparative Method, since without adequate knowledge of the relevant sound changes, it is possible to treat bogus look-alikes as cognate, and miss true cognates due to changes that obscure cognacy. And in dealing with such a large dataset, human error inevitably comes into play: forms are deemed cognate in the wee hours of a particular morning, which really aren’t credibly cognate by the cold light of day.

Fortunately, we have found a very useful tool for ferreting out potentially bogus cognacy judgments in the form of Mesquite, an application that serves to carry out analyses on inferred phylogenetic trees. Mesquite has many functions, but the relevant one for our purposes is its ‘reconstruction’ of ancestral states. Basically what this function does is to ‘reconstruct’ (i.e. identify) how far back in a phylogenetic tree a given phylogenetic character (in our case, a form that is a member of a particular cognate set) reconstructs, according to the tree that one’s phylogenetics application has inferred. In doing so, it also identifies cases of independent innovation (likewise, according to the inferred tree).

One thing that makes Mesquite especially useful is its graphical interface, which allows one to easily spot instances of independent innovation. First, in the following screen shot, one can see a nice instance of a character (KNEE4, presence of forms for ‘knee’ cognate to, e.g. Assuriní de Tocantíns kanawá), that seems to reconstruct quite solidly for one of the robust subgroups in our larger ‘Central’ subgroup.

A Mesquite ‘reconstruction’ for the TG KNEE4 cognates

Next, in the following screen shot, one can see a character (TOE2) that was, according to the ancestral state reconstruction associated with the tree, independently innovated three times: in Chiriguano, Pauserna, and Wayampí. This is a somewhat suspicious state of affairs, suggesting that it might make sense to look at the cognate set again. Doing so we see that the word for ‘toe’ in these languages is actually a compound meaning something like ‘foot head’. Body-part compounds with ‘head’ or ‘bone’ are fairly common in TG languages, suggesting that these forms for ‘toe’ are independently innovated, based on (true) cognates for ‘foot’ and ‘head’. On this basis we exclude this compound as informative for purposes of phylogenetic analysis. And note that the pattern evident in the ‘reconstruction’ is precisely the kind of conflicting signal that might lower the support for subgroups like Central and Peripheral.

A Mesquite ‘reconstruction’ for the TG TOE2 cognates
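For readers who want a feel for what counting independent innovations on a tree involves, here is a minimal Python sketch using plain Fitch parsimony on a binary (present/absent) character. This is a stand-in, not Mesquite's own code or necessarily the reconstruction method we ran. The eight-leaf tree and the labels A to H are invented, the function names are hypothetical, and the sketch assumes a strictly binary tree and that the form was absent at the root whenever the root is ambiguous.

```python
def fitch_sets(node, states):
    """Bottom-up pass: annotate each node with its set of possible states.

    A node is either a leaf name (str) or a pair (left, right) of subtrees.
    Returns a nested (state_set, children) structure.
    """
    if isinstance(node, str):
        return ({states[node]}, [])
    left, right = (fitch_sets(child, states) for child in node)
    common = left[0] & right[0]
    if not common:                      # children disagree: keep both options
        common = left[0] | right[0]
    return (common, [left, right])

def count_gains(annotated, parent_state=None):
    """Top-down pass: fix one state per node and count 0 -> 1 changes."""
    state_set, children = annotated
    gains = 0
    if parent_state is None:
        state = 0 if 0 in state_set else 1   # assumption: absent at an ambiguous root
    elif parent_state in state_set:
        state = parent_state                 # no change needed on this branch
    else:
        state = next(iter(state_set))        # forced change on this branch
        gains = 1 if state == 1 else 0
    return gains + sum(count_gains(child, state) for child in children)

# Invented tree; leaves are placeholder labels, not real TG languages.
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

# 1 = the language has a form in the cognate set, 0 = it does not.
scattered = {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0, "F": 0, "G": 1, "H": 0}
clade_like = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0, "F": 0, "G": 0, "H": 0}

scattered_gains = count_gains(fitch_sets(tree, scattered))
clade_gains = count_gains(fitch_sets(tree, clade_like))
print("scattered set:", scattered_gains, "independent gains")
print("clade-like set:", clade_gains, "independent gain")
```

A set that needs two or more independent gains, like the scattered one here, is the kind worth a second look. Note that this picks only one of the possible equally parsimonious reconstructions.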

Examining suspicious ‘reconstructions’ like the TOE2 one has led us to identify previously unnoticed complex forms, as in this case, as well as instances of poor cognacy judgments. And having identified several dozen problematic sets in this way, we have high hopes that our next TG tree will have the support values that we are pining for. We’re keeping our fingers crossed, and I’ll post our next set of results.

In any case, I’ve found Mesquite to be such a wonderful tool for evaluating cognate sets in the context of phylogenetic analysis that I wanted to share it with others who might not be familiar with it. (And thanks, Natalia, for introducing it to the TG group!)


Rodrigues, A. D. 1984/1985. Relações internas na família lingüística tupí-guaraní. Revista de Antropologia 27/28, 33–53.

Rodrigues, A. D. and A. S. A. C. Cabral. 2002. Revendo a classificação interna da família tupí-guaraní. In A. S. A. C. Cabral and A. D. Rodrigues (eds.), Línguas Indígenas Brasileiras: Fonologia, Gramática e História, pp. 327–337. Belém: Editora Universitária, Universidade Federal do Pará.

*Current project members include Keith Bartolomei, Natalia Chousou-Polydori, Erin Donelly, and Zachary O’Hagan; alumni include Mike Roberts and Vivian Wauters. The work described here has been funded in part by NSF BCS #0966499. Thanks also to Sebastian Drude and Françoise Rose for data-sharing!

One of my favorite new blogs on the linguistics scene is Diversity Linguistics Comment, which presents itself as

… a scholarly blog that discusses current issues in language typology and language description, written by linguists for other linguists. The notion of “diversity linguistics” recognizes the close connections between the enterprises of language comparison and analysis of particular languages. Topics include grammatical structures (syntax and morphology, phonology), language contact, language change in a comparative perspective, and genealogical linguistics.

Posts appear somewhat infrequently, but they are always substantive and interesting, often accompanied by equally meaty comment threads. As an example of the fare provided, consider Simeon Floyd’s recent post on Quechua adjectives (here), which engages with the debate over the universality of word classes, especially as this question intersects with descriptive linguistic practice. The post focuses on Simeon’s own work on Quechua adjectives, and Martin Haspelmath’s criticism of Simeon’s (and others’) conclusions. I find it to be a very thoughtful piece that provides a nice example of the subtle issues involved in applying putatively cross-linguistically valid labels like ‘adjective’ to language-specific word classes, and also shows how attention to naturally-occurring discourse can play a crucial role in grammatical analysis.

