WeSay: A tool for community-participatory lexicography?

In the most recent volume of Language Documentation and Conservation there is an article (here) about a piece of lexicographical software called WeSay. The interesting thing about WeSay is that it is designed to be used by lay members of language communities — rather than professional linguists — to build dictionaries of their own languages. There are obvious reasons why this application would be interesting to many of us involved in language documentation, but I want to relate a personal experience that indicates why something like WeSay would definitely fill a serious gap in the range of currently available lexicographical software.

One of the major goals of the Iquito Language Documentation Project, in which I participated, was to integrate trained community members (‘community linguists’) into the day-to-day research activities of the project. One area in which we felt the community linguists could be especially productive was lexicography, especially since the tasks involved squared with the community linguists’ personal interests in the language. The question that immediately arose was how to coordinate the community linguists’ lexicographical work with the task of building the Shoebox lexical database.

Our first idea was simply to teach the community linguists how to enter data into the Shoebox database. We were already in the process of teaching the community linguists how to use PC laptops and word-processing software, so we thought that extending their training to include Shoebox would be a relatively straightforward matter. Unfortunately, this did not turn out to be the case. Shoebox can be difficult to use even for individuals with considerable computer-related experience, and for the community linguists, who were learning to use computers for the first time in their lives, the application proved far too finicky and difficult to use.

One of our team members had significant programming experience, however, and suggested that he write a front end for Shoebox that would considerably simplify the community linguists’ interactions with the database. The idea was a good one, but I had two misgivings. First, I was concerned that regardless of how foolproof the front end seemed, over the course of the nine months we were away from the community, and the community linguists were working on the dictionary independently, *something* unforeseen would happen with the front end, and bring work to a halt. Second, I was concerned that the team member who promised to maintain the front end would not stay with the project for its entire duration, and we would be left with a piece of home-grown software that we didn’t know how to modify or fix, should the need arise. We debated the issue at length, but the team leaned towards the front end idea, so we decided to try it.

At first, everything went well. The front end worked very well, and the community linguists found it easy and comfortable to use. The visiting linguists (including me) left at the end of the summer, and it was then that the problems arose. After about four months, something happened that disrupted the connection between the front end and the Shoebox database, and that was that until the team of visiting linguists returned five months later. The community linguists were smart and started entering their data into an Excel spreadsheet, so their work didn’t grind to a halt, but we had to spend a lot of time transferring the data into Shoebox. So, all in all, the front end experiment was not a great success. And, to top it off, the team member who wrote the front end didn’t return — he decided to quit linguistics and go into real estate.

From then on, the community linguists collected their data in notebooks, and every June, when the visiting linguists arrived, we spent many hours entering the data into Shoebox. Hardly a very efficient process, but the best we could manage at the time.

It should be obvious, then, why I was very excited to read about WeSay. Before I provide a brief description, let me add that apart from the LD&C article, information can also be obtained at the WeSay website (www.wesay.org), which includes a page of screenshots and Flash movies that illustrate how the program works (here).

Basically, WeSay is a much better implemented and more comprehensive version of the front end we came up with in the field. The user interface consists of relatively simple forms into which one enters data, and the entire paraphernalia of directories, data field codes, and the like is hidden from view. The program also provides guidance, in terms of semantic fields, to prompt the collection of lexical data, further facilitating the independent work of community members. Significantly, WeSay also provides localization tools, so that the interface can be translated into the locally appropriate language. Despite the simplicity of the interface, however, WeSay can also export data in formats used by more powerful lexicographical software. And note that WeSay is free, open source software, and can be downloaded from the WeSay site.
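
To make concrete what sort of paraphernalia a form-based interface hides, here is a minimal Python sketch of a simple form entry being turned into a backslash-coded record of the kind a Shoebox database stores. This is not WeSay's code or its export format. The form field names and the function to_standard_format are hypothetical, and the markers (\lx, \ps, \ge) are assumed from the MDF conventions commonly used with Shoebox, which any given project may have customized. The example word is taken from the Iquito data elsewhere on this blog.

```python
# Hypothetical form fields mapped to backslash markers. The markers follow
# the MDF convention often used with Shoebox (an assumption, not something
# a particular project is guaranteed to use):
#   \lx = lexeme, \ps = part of speech, \ge = English gloss
FIELD_MARKERS = {"word": "lx", "part_of_speech": "ps", "gloss": "ge"}


def to_standard_format(entry):
    """Render one form entry (a dict of strings) as a backslash-coded record."""
    lines = []
    for field, marker in FIELD_MARKERS.items():
        value = entry.get(field, "").strip()
        if value:  # skip fields the user left blank
            lines.append(f"\\{marker} {value}")
    return "\n".join(lines)


# Illustrative entry only; the part-of-speech label "n" is an assumption.
example = {
    "word": "anitáaqui",
    "part_of_speech": "n",
    "gloss": "white-lipped peccary",
}
print(to_standard_format(example))
```

The person filling in the form sees only three labeled boxes; the marker codes and file layout stay out of sight, which is roughly the division of labor our own front end was aiming for.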

In many respects, then, WeSay sounds like the answer for those who are interested in linguistic documentation projects with significant community participation. I have yet to try it out myself, but I look forward to doing so when I have the time. If any readers have had personal experience with WeSay, I’d be interested to hear about it.


Two talks in Lima

I am presently in Lima, getting on with the logistical and bureaucratic preparations for fieldwork this summer. Of possible interest to readers in Peru, however, I will also be giving several academic talks on Peruvian Amazonian languages while in Lima. The first two that I have confirmed dates on are both at the Pontificia Universidad Católica del Perú (PUCP), on May 20th and May 22nd. I don’t yet have precise locations for the talks, but I will add that information as soon as I have it. Below I present abstracts for the two talks (with due apologies to the Spanish-speaking peoples of the world). Hope to see some people there!

May 20th, 5:00 pm
La marcación de una categoría flexiva por el orden de palabras: el modo irreal en el idioma iquito (familia zaparoana, Amazonía peruana)

It is well known that the inflectional categories of tense, aspect, and mood (TAM) can be marked by a variety of strategies across human languages, including affixes, phonological mutation, lexical suppletion, and suprasegmental processes such as changes in tonal patterns (Anderson 1992, Spencer 1998). In this talk I describe a type of marking of a TAM category that has not previously been mentioned in typologies of inflectional morphology: marking by means of changes in the order of words in a clause.

A system of marking a TAM category by word order exists in Iquito, a Zaparoan language of Peruvian Amazonia. In this language, irrealis mood is marked by the displacement of post-verbal elements to the position between the subject and the verb, as in (1), where the element /nu/ is displaced to this position. Realis mood is marked by the absence of such displacement, as in (2).

(1) iina anitáaqui nu ása-qui
DET white-lipped.peccary 3.PRO eat-PERF
‘The white-lipped peccary is going to eat it.’

(2) iina anitáaqui ása-qui nuú.
DET white-lipped.peccary eat-PERF 3.PRO
‘The white-lipped peccary ate it.’

I show that the elements that are displaced to the ‘irrealis position’ do not form a coherent syntactic class, and that the displaced elements are not syntactic constituents in all cases but are sometimes fragments of constituents. On the basis of these observations, I argue that the displaced element is a phonological constituent (a ‘phonological word’) and that the meaning of the element and its syntactic features are not relevant to marking irrealis mood. As such, the displaced element is semantically empty, and serves only as phonological material that occupies the irrealis position.


Anderson, Stephen. 1992. A-morphous morphology. Cambridge University Press.

Spencer, Andrew. 1998. Morphophonological operations. In Andrew Spencer and Arnold Zwicky (Eds.), The Handbook of Morphology. pp. 123-143.

May 22nd, 12:00 noon
La evidencialidad, la pragmática y la responsabilidad: nexos entre la gramática y la vida social en la sociedad Nanti (familia arahuaco, Amazonía peruana)

Evidentiality has been an important focus of research on communicative activity as an aspect of social life (Hill & Irvine 1993, Sidnell 2005). It is clear that evidentiality is part of the practices that make up an ‘everyday epistemology’, and its study offers an opening for understanding the ways in which grammatical resources serve as instruments in the construction of social relations and structures through communicative interaction.

In this talk, I analyze the use of evidential resources by speakers of Nanti, an Arawak language of Peruvian Amazonia, in the context of everyday social interactions. I show that one of the principal uses of evidential resources in Nanti society is to construct representations of events that diminish the speaker’s responsibility for mishaps or problematic situations. My argument is that this diminution of responsibility results from a ‘pragmatic metaphor’ (Silverstein 1976), by which the type of perceptual relation indicated by an evidential resource corresponds to the intensity of the speaker’s involvement in a given event, and by this means the resource indirectly indicates the speaker’s responsibility for the situation.

This result illustrates one aspect of the social and communicative functions of evidentials, which have been a topic of debate among linguists and linguistic anthropologists (Aikhenvald 2004). This result also supports the arguments of De Haan (1999) and Aikhenvald (2004), who maintain that evidentiality is distinct from epistemic modality, even at the level of pragmatics.

Aikhenvald, Alexandra. 2004. Evidentiality. Cambridge University Press.

De Haan, Ferdinand. 1999. Evidentiality and epistemic modality: Setting boundaries. Southwest Journal of Linguistics. 18: 83-101.

Hill, Jane and Judith Irvine. 1993. Responsibility and evidence in oral discourse. Cambridge University Press.

Sidnell, Jack. 2005. Talk and practical epistemology. John Benjamins Publishing Company.

Silverstein, Michael. 1976. Shifters, linguistic categories, and cultural description. In Keith Basso and Henry Selby (Eds.), Meaning in Anthropology. University of New Mexico Press. pp. 11-56.

Two on-line resources for Amazonianists

Fabre’s Diccionario etnolingüístico

I recently had a conversation that made me realize that not every single Amazonianist is familiar with Alain Fabre’s on-line Diccionario etnolingüístico y guía bibliográfica de los pueblos indígenas sudamericanos. Since I have found this to be the single most comprehensive and detailed bibliographic reference work on Amazonian languages currently available, I thought I should help publicize this resource.

The Diccionario is organized by language family, and within each family, Fabre provides information on the genetic classification, number of speakers, and location of each language. The greatest value of this work, however, lies in the incredible thoroughness of its bibliographic references on the languages and societies he covers. Of course, I can only judge the references on languages and language families with which I am familiar, but I have found the coverage to be truly impressive. Fabre has managed to locate both printed works of great obscurity and more recent out-of-the-way digital publications. In addition, Fabre updates this work periodically, which means that it has steadily improved over time.


etnolinguistica.org

In a recent message, Eduardo Rivail Ribeiro informed me that the etnolinguistica.org site has been updated, redesigned, and has had a great deal of new information and material added to it. For those who are unfamiliar with the site, etnolinguistica.org is devoted to the linguistics of South America, and to a lesser but still significant degree, the ethnography of the region. The focus of the site is very much on Amazonia, and especially Brazilian Amazonia, but I think this simply reflects the center of gravity of the interests of those involved with the site.

The site has several very useful pages, including my favorite: a page of links to digital versions of MA and PhD theses on Amazonian languages. There is also a page providing links to open access on-line journals that focus on, or touch on, South American languages. Last, but certainly not least, a link is provided to join the etnolinguistica.org listserv, which I have found very informative and interesting.

Off to Peru

Tomorrow morning my partner Chris Beier and I return to Peru for a summer of fieldwork. I plan to keep blogging as much as I can while in Peru, and it’s my hope to break some new ground in terms of how close to the edge of the wired world I can post from. Internet access is spreading further and further into the jungle, so who knows how far I’ll be able to get.

We have a very busy summer planned. In fact, I feel that I have transitioned into a phase of my life where I would like to do much more fieldwork than I possibly have time for. We first plan to return to the Nanti community of Montetoni, where we actually have a house, if the thatch hasn’t given out by now. Apart from seeing our friends and drinking huge amounts of manioc beer with them during village feasts, and continuing our long-term health- and education-related work with the community, I have a number of research goals.

I’m presently writing a paper on the fascinating Nanti reality status (realis/irrealis) inflectional system, and I need to check on reality status marking in epistemic conditional constructions. They’re rare as hen’s teeth in texts, so I need to get some more examples. I’m also planning to do some more work on the Nanti stress system. A few years back Megan Crowhurst and I wrote a paper that focused on stress in Nanti verbs, and I now want to write a paper looking at nominal stress, which behaves quite differently. Finally, in writing my dissertation, I noticed an interesting gap in the textual data I have on an exotic and discursively rare morphosyntactic alignment system found in Nanti ditransitive verbs: I realized I had no data at all on first or second person theme/patient arguments. Curiously, a perusal of the literature on the languages most closely related to Nanti showed the exact same gap, without a word of explanation from any of the authors. So now I am intensely curious whether this empirical gap is accidental, or whether it represents a restriction that the patient/theme argument must be lower in the speech act participant hierarchy than the beneficiary/recipient argument.

We’ll also be returning to the Iquito community of San Antonio, to say hello to our friends there, see how the language revitalization program is doing, and do a couple of weeks of research. Along with several colleagues, I’m working on a comprehensive reference grammar of this language, which I will finally be getting back to, now that my dissertation is out of the way. I’ll be working on a few outstanding things for the grammar, including nailing down the complicated prosodic system, which features some subtle interactions between stress and tone. I feel we have a good analysis of the system, and I think we’ll be able to clear up the remaining questions quickly. I also hope to settle the semantics of one stubborn corner of the evidential system, and in particular, the semantics of a morpheme that appears to exhibit visual evidential, mirative, and malefactive (!) meanings. Once again, this morpheme is textually very rare, so it has proven difficult to figure out.

The single most exciting project we have planned for this summer, however, is an attempt to do some fieldwork on Andoa, a language of the same family as Iquito. The word on the street was that the last speaker of this language died in 1993, but in late 2006 we ran into a French anthropologist who had located two elderly speakers and made several hours of recordings with them, some of which I listened to. The speakers seemed quite fluent, as far as I could tell, so there may still be a chance to get some basic data on the language. In the coming years I’m hoping to do some historical work on the Zaparoan family, and a few weeks of work to gather basic lexical and morphological data on Andoa would be hugely helpful.

Now, back to packing…

Cultural constraints on Aharip grammar

Recent research on Aharip, one of the typologically remarkable languages of the Mt. Iso area of Papua New Guinea, has revealed striking evidence in support of recent proposals that a people’s culture can significantly affect the grammar of the language spoken by that people (Everett 2005). In particular, the culture of the Aharip, who live between the 300 and 400 meter isoclines of Mt. Iso, appears to prohibit any direct reference to immediate experience. Instead Aharip culture appears to be governed by a ‘Distant Experience Principle’ (DEP).

The cultural and grammatical consequences of the DEP are wide-ranging, including a tense system that distinguishes only distant future and distant past tenses. One of the most remarkable findings regarding Aharip grammar, however, is the absence of any grammatical structures lacking recursion.

All sentences in Aharip are minimally biclausal, consisting of a main clause and a subordinate clause. Concepts that are typically expressible monoclausally in most human languages are expressed in Aharip as subordinate clauses to a large class of speech act verbs, verbs of perception, or verbs of cognition.

Obligatory recursion is also found in possessive constructions. Thus, no expression directly corresponding to ‘my foot’ exists in Aharip; the notion must instead be expressed by an expression like ‘my brother’s brother’s foot’. Indeed, it appears that eloquence in Aharip society is measured by a speaker’s ability to employ recursion to create sentences so long that his or her interlocutor loses consciousness before they are complete.

The Aharip numeral system also shows the consequences of the DEP, in that it consists solely of transfinite numbers and infinitesimals.

As far as linguistic anthropologists have been able to determine, all Aharip utterances consist of quotations of creation myths and science fiction novels, the meanings of which are inferred on the basis of culture-specific communicative maxims, including the Maxim of Vast Quantities. This shows that the results reported by Picard et al. are not limited to extra-terrestrial languages, but apply to human ones also.

It is not clear how these results regarding Aharip culture and grammar are related to the previous results linking phonological inventories in Diuwe and Hidbap to altitude, although psychologists speculated that the fact that heavy clouds at the 300 meter isocline block the views of Aharip speakers of everything but distant mountain peaks may have exerted a significant effect on Aharip culture.


Everett, Daniel. 2005. Cultural constraints on grammar and cognition in Pirahã. Current Anthropology. 46 (4): 621-646.

Hot off the presses

The electrons have not even dried on the PDF of my dissertation — I submitted it and handed in the final lump of paperwork a few hours ago — but you can download it here. Here’s the abstract:

This dissertation examines the strategic deployment of evidential resources in communicative interactions among Nantis, an Arawak people of Peruvian Amazonia. In particular, this work focuses on Nantis’ uses of evidentials to modulate representations of responsibility, and shows that two distinct types of responsibility must be distinguished in order to account for the socially instrumental properties of evidential resources: event responsibility and utterance responsibility. Event responsibility concerns praiseworthiness or blameworthiness for happenings in which the relevant individual is causally implicated; while utterance responsibility concerns the socially salient attributes of an utterance (e.g. truthfulness), and not the utterance’s consequences. Evidential resources are shown to mitigate event responsibility in Nanti interactions by serving as a pragmatic metaphor, whereby the sensory directness or indirectness encoded by evidentials yields inferences regarding individuals’ participation in, and responsibility for, events. The use of evidential resources, principally quotative resources, to modulate utterance responsibility operates on quite different principles. Specifically, quotative resources serve to individuate utterances by attributing them to a particular source, thereby rendering explicit that individual’s commitment to the stances expressed by the quoted utterance. In doing so, the use of the quotative resource emphasizes that individual’s responsibility for the expressed stance. Quotative resources are also employed to decrease a first party’s responsibility for a stance, by attributing it to a third party. In this case, inferences based on the Maxim of Quantity lead interactants to infer reduced commitment on the part of the first party on the basis of the attribution of strong commitment to a third party. Both epistemic stance and a variety of moral and evaluative stances are relevant to utterance responsibility. 
Significantly, utterance responsibility is one of the few areas in which a pragmatic tie exists between evidentiality and epistemic modality, indicating the relative marginality of epistemic modality to evidentiality in Nanti, even at the level of pragmatics. An ethnographic and historical sketch of the Nanti people is provided, and a grammatical description of the Nanti language is also included.