Máíhɨ̃ki minimum word

One of the most gratifying aspects of scientific research for me is that satisfying click that results when a set of loosely related facts unexpectedly fit together to yield a deeper insight. The scale of such insights varies tremendously, of course, from ones that are the result of months or years of careful data collection and analysis, and which lead to an understanding of some major component of a language’s grammar, to more modest empirical generalizations that leap out unbidden from a dataset.

We made one very simple but gratifying discovery of the latter type just a couple of weeks ago, during the waning weeks of fieldwork on Máíhɨ̃ki, which reminded me how a language can continue to yield surprises, even in areas that you think you already understand well. Briefly, we realized that Máíhɨ̃ki exhibits a minimum word requirement, and like an image that emerges from an autostereogram, now that we have noticed it, it seems so obvious that we can’t quite understand why it took us so long to become aware of it. But there’s actually an interesting story in how we got there.

Máíhɨ̃ki, a Western Tukanoan language closely related to Siona and Sekoya, is tonal, and for the first several years that we worked with speakers of the language (beginning in 2010), we were particularly preoccupied with understanding its subtle tonal system. Stephanie Farmer carried out the bulk of the early work on Máíhɨ̃ki tone, and we now have a satisfying analysis of the system [pdf]. Crucially, it was during this most recent field season that I really felt confident that I could hear Máíhɨ̃ki tone reliably, and it was clearing this tonal hurdle that I think allowed us to pay proper attention to the segmental issues at play in the minimum word requirement.

Perhaps the first significant step towards the discovery of the minimal bimoraicity of Máíhɨ̃ki words took place last summer, when we realized that verb roots are bimoraic in regular finite forms. Crucially, one can find tonal minimal pairs like the following, whose roots exhibit only a single vowel quality, and where the tonal contrast depends on the bimoraicity of the root.

(1a) sáá-yí  ‘I am leaping’

(1b) sáà-yì  ‘I am taking’

(2a) dáá-yí  ‘I am bringing’

(2b) dáà-yì  ‘I am intoxicated’

(Surface high tones are indicated by an acute accent, surface low tones by a grave accent; high tones spread from roots to inflectional suffixes, and -yi is a first person present tense suffix.)

Before finding the HH vs. HL contrast in roots like these, we had mistakenly thought that bimoraic HH roots, like those in (1a) and (2a), were monomoraic H roots, but once we found bimoraic HL roots with only a single vowel quality, like those in (1b) and (2b), it became clear that their HH counterparts were also bimoraic.

So far so good. Then this most recent summer we began to make some surprising discoveries. For example, in an elicitation session involving plantains (which we thought was the following: ó ‘plantain’), I very clearly heard a consultant say óò. It turns out that óò means something like ‘unit of plantain’ (e.g. a plantain stalk) and probably bears an old phonologically assimilated classifier, but upon comparison, it was also clear that the species name was bimoraic, thus: óó ‘plantain’.

The floodgates opened after this: the classic minimal pair, which we formerly thought was ‘macaw’ and ‘path’, turned out to be máá ‘macaw’ and màà ‘path’, and so on and so forth. Significantly (and this was no doubt one factor that contributed to our late realization of the bimoraic nature of these forms), these words surface as bimoraic only when they are the only morphemes in a given phonological word. Thus, we have óó ‘plantain’, but óhù ‘bunch of plantains’ (bearing the bunch classifier -hu), and máá ‘macaw’, but mánà (bearing the plural suffix -na). In other words, these forms behave like underlyingly monomoraic forms that undergo moraic augmentation to satisfy what is presumably a bimoraic minimum word requirement.
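
For readers who like to see a generalization stated mechanically, here is a minimal Python sketch of the augmentation pattern just described. It is only an illustration under simplifying assumptions, not our actual analysis: the function names (`split_nuclei`, `realize`) and the vowel inventory are hypothetical, each vowel letter is counted as one mora, and suffix tones are taken as given rather than derived.

```python
import unicodedata

# Assumed vowel inventory, for illustration only.
VOWELS = set("aeiouɨ")

def split_nuclei(word):
    """Decompose the word (NFD) and return it together with one
    (start, end) span per vowel letter plus its combining marks."""
    text = unicodedata.normalize("NFD", word)
    spans = []
    i = 0
    while i < len(text):
        if text[i] in VOWELS:
            j = i + 1
            # Absorb tone and nasalization diacritics into the nucleus.
            while j < len(text) and unicodedata.combining(text[j]):
                j += 1
            spans.append((i, j))
            i = j
        else:
            i += 1
    return text, spans

def realize(morphemes):
    """Concatenate morphemes; if the whole word has only one mora,
    copy its vowel (with its tone mark) to reach two moras."""
    text, spans = split_nuclei("".join(morphemes))
    if len(spans) == 1:
        start, end = spans[0]
        text = text[:end] + text[start:end] + text[end:]
    return unicodedata.normalize("NFC", text)

print(realize(["ó"]))        # bare root: augmented
print(realize(["ó", "hù"]))  # root plus classifier: no augmentation
print(realize(["mà"]))       # bare root: augmented
print(realize(["má", "nà"])) # root plus plural: no augmentation
```

The point of the sketch is simply that augmentation is triggered by the size of the whole word rather than by the root itself, which is why the suffixed forms come out with a short root vowel.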

I will write next about two other interesting matters related to this discovery, but suffice it to say that we were both delighted to make this important discovery and somewhat alarmed by how it managed to elude our attention until now. And in one of those strange quirks of perception, now that we have noticed the bimoraicity of the forms in question, it’s so obvious that it’s hard to understand how we could have ever missed it.

Venomous Snakes: 0

The end of August approaches, and I find myself recently back in Berkeley after a full field season in Peru. This summer I worked exclusively in the Máíjuna community of Nueva Vida, collaborating with several colleagues (about whom, more in subsequent posts) to document Máíhɨ̃ki, the westernmost of the Tukanoan languages.

This field season was marked by relatively little in the way of physical adventure, which is in general a good thing. In particular: no venomous snakes. Our two preceding field seasons involved somewhat startling encounters with cascabel snakes (the juvenile form of Bothrops atrox), which we found snuggled up in various places in our tight living quarters. Given that cascabeles are venomous, we followed local custom and dispatched them with machetes, rather than chase them off and run the risk of blundering into them with unfortunate consequences in a less attentive moment. A human armed with a machete is more than a match for a little cascabel, but it was nevertheless gratifying not to have to confront another one, however unequal the terms. We did have one serpentine visitor this field season, though: a charming little boa that lurked for a couple of days in the thatch of the hut that served as our kitchen and dining area.

The one near-disaster we had involved a situation so mundane that I’m still shaking my head over it. Several of us had gone to the community of Sucusari to attend the annual meeting of FECONAMAI, the Máíjuna federation, and had gone with a group of folks from Nueva Vida. This trip involved going down the Napo from Nueva Vida, past the town of Mazán, and on to Sucusari. On the way back, we needed to leave a little early because we had to make a side-trip to Iquitos, via Mazán, to buy supplies for the community linguist workshop we were holding shortly after the end of the FECONAMAI meeting, and so we headed upriver with several Máíjunas who had their own reasons to make a trip to Mazán.

Anyway, we were heading up the Napo, half-mesmerized by the blazing sun and the coffee-colored waters of the river slipping by the bow, when the boat almost sank. The river was about a kilometer wide at that point, and we were right in the middle of the tranquil-looking river when we hit a sandbar, which caused the boat to lurch. Crucially, we were cutting across the current at that moment, so that the lurch sent the ‘upriver’ lip of the boat under the level of the water, and the Napo began to rush into our little vessel, threatening to swamp it in seconds and sink it. Needless to say, that would have been very bad: the banks were several hundred meters away on either side, and several of our number were weak swimmers, never mind all the recording equipment and computers we had with us.

Fortunately, someone with the right reflexes jumped out instantly onto the sandbar and wrenched the lip of the boat up, so that the boat remained afloat. With some vigorous bailing we were able to continue, somewhat damp and shaken, but otherwise unharmed. Ironically, Chris and I had just been discussing how mild the river conditions seemed to us in comparison to the more turbulent rivers we are accustomed to in southern Peru. It just goes to show how deceiving appearances can be.

Ethics and IRBs

During the past week my wife Chris and I have been in transit from Peru, to Austin, and finally to Berkeley, and we are now setting up our new home. Much neglect of this blog has ensued. I did, however, want to pass on two interesting links. The first, brought to my attention by Jane Simpson over at Transient Languages and Cultures, is a link to the draft of the LSA’s ethics statement. The statement itself is available here. The draft statement is cleverly set up as a series of blog posts, with each major section getting its own post, with its own comments section. (The front page of the blog is here.) This seems like a nice way to get discussion going among linguists, and there have already been some interesting comments posted. Also included are links to ethics statements by other professional organizations such as the American Anthropological Association.

On a related note, Claire Bowern over at Angarrgoon (who also mentions the LSA ethics statement blog) provided a link some time ago to Institutional Review Blog, which is maintained by Zachary M. Schrag, an Assistant Professor of History at George Mason University. According to its subheader, the blog is dedicated to providing “[n]ews and commentary about Institutional Review Board oversight of the humanities and social sciences.” Schrag is apparently preparing a book and he posts frequently. His perspective seems like a valuable complement to discussions going on at places like Savage Minds (e.g. here, here, and many others).

I must admit that my personal experience with IRBs at the University of Texas was not that bad. Certainly there were lots of bureaucratic hoops to jump through, but at the end of the day, the members of the IRB seemed sane and did not engage in the over-reaching that I’ve heard about in some of the worse horror stories from my colleagues. I will be very interested to see how the IRB is at Berkeley. (I’m keeping my fingers crossed.)

Fieldwork on Vacacocha

In an earlier post I outlined my plans to do some exploratory work this summer on Andoa, a minimally documented Zaparoan language spoken on the Rio Pastaza, near the Peru-Ecuador border region. As I was preparing for my trip to the Rio Pastaza, however, some of my travel arrangements fell through, and it became apparent that I would not be able to make it to the Andoa community in the time I had available. Fieldwork on Andoa would have to wait until next year.

I thus found myself in the lovely city of Iquitos with a free week on my hands. Perfect, I thought, this would be an opportunity for me to see if I could find any speakers of Vacacocha. Now, if you haven’t ever heard of Vacacocha (also known as Aushiri), you are not alone. It is among the most poorly documented of Peruvian Amazonian languages, and the language is known only from a few short word lists, none of them collected by trained linguists (a bibliography of Vacacocha references is available here (pdf)). Based on this limited information, the language is considered by most classifiers to be a linguistic isolate, but for the most part, so little is known about the language that it tends to elude linguists’ attention. The one clue about where to locate speakers of Vacacocha, repeated in many sources, is that in the early 20th century, there were several families of Vacacochas in a place on the Rio Napo known as Puerto Elvira.

Two days later, then, I found myself on the Rio Napo with a theoretical destination and a general direction to head in — upriver. After two more days’ travel up the Napo I pulled into Puerto Elvira, a community of about 200 people, situated on a bluff overlooking a majestic bend in the Rio Napo. Shortly after touching down I was shuttled over to the community’s three school teachers, who politely asked me what I was doing in their community. After I explained that I was looking for speakers of Vacacocha, the teachers put their heads together and came up with some recommendations for whom I might speak to.

I spent the remainder of the afternoon shuttling back and forth between various little islands upriver of Puerto Elvira, following up on suggestions about where older individuals with some knowledge of Vacacocha might be found. Eventually I met Delia Luisa Andi Macahuachi, a slight woman of some 70 years, who explained that she spoke Vacacocha as a child, and had used it intermittently as a young adult, but had not spoken the language in several decades. She expressed willingness to work with me, however, to document anything she could remember.

[Photo: Delia Luisa Andi Macahuachi with one of her granddaughters]

It very quickly became apparent that the language is tonal — in fact, shortly after beginning the first elicitation session, Delia reprimanded me for repeating the words with a flat intonational contour, and I subsequently paid more attention to carefully reproducing the tone contour of the words. Also obvious is the fact that the language has a contrast between oral and nasal vowels. Neither the tonal nature of the language nor the oral/nasal contrast is mentioned in the available material on the language, so it became clear that even if I were only able to collect lexical data, it would be possible to significantly improve linguists’ knowledge of this isolate.

From what I was able to determine, Delia is the only remaining individual in the Puerto Elvira area with any significant knowledge of Vacacocha. During the two days I was with her, Delia worked hard to remember aspects of the language she had not used regularly in close to sixty years. Although she initially found the work frustrating, she came to find the exercise of recovering long-dormant parts of her knowledge quite gratifying. I promised to bring her, at the earliest opportunity, a copy of all the words and phrases I was collecting from her, and she was especially excited about the idea of leaving the linguistic documentation as a legacy for her grandchildren.

Delia and her family members mentioned another relative, whom they considered to be the best speaker and the only other remaining speaker of the language. Unfortunately, this other speaker was taken several years ago by her children to live on the Rio Momon, near Iquitos, and I did not have the opportunity to work with her. I hope to locate her next year.

After two days, I had to return downriver, as I had other pending fieldwork obligations. I was quite excited, however, to have found at least one semi-speaker with whom I could work to recover aspects of Vacacocha phonology and lexicon, and I am looking forward to returning next year to make some further progress.

Words from Sepahua

I am now writing from the small town of Sepahua, located at the confluence of the Urubamba and Sepahua Rivers. My wife Chris and I arrived here this morning from a trip to the Machiguenga communities of the Camisea and Urubamba basins, and we hope to be heading tomorrow back downriver, en route, eventually, to our new lives in Berkeley.

I am quite surprised to be writing a post from Sepahua. I predicted that with the economic decline in the region brought on by the collapse of the logging industry, the fledgling internet service here would be an early casualty. It seems I was half right.

The signs of economic decline are clear in Sepahua. Over half the stores have closed down, and in the ones that remain open, the goods that are offered are the cheapest possible. A small number of people are working with the oil and gas companies that are active in the region, but most people are very worried about how they are going to survive. Many people we know have already left Sepahua for other places, and I suspect that this trend will accelerate.

As I predicted, the previous internet service, managed by the municipality, has closed down. But to my great surprise, two new internet businesses have been launched in Sepahua, and they seem to be doing well. It is presently 8 pm, and eight of the ten computers are in use at the internet cabinas from which I am writing. It seems that communication and access to information are a high priority for people in Sepahua, despite their shrinking budgets. This seems to be an instance of a more general pattern I have seen here in Peru, namely, that people are willing to spend a relatively large fraction of their income on communication (especially cell phones), even when their financial situation is precarious.

Tomorrow is the next leg: by river down to Atalaya, where we will be waiting for a small plane to Pucallpa.

Notes from Atalaya

I am writing from the humid environs of Atalaya, a town of some 5,000 people located at the point where the Urubamba and Tambo Rivers meet to form the Ucayali. As the major town of the region, it is the only place with internet within two days’ travel by local means. The machine is a battered seven-year-old IBM that looks like it has been repaired many times, and the internet connection is struggling to deal with the WordPress server. But the presence of internet is impressive nevertheless. I still remember from my first visits to this part of Peruvian Amazonia in the early 90s that telephones were scarce to non-existent, never mind internet.

I am on my way from Loreto, and fieldwork on Iquito (in northern Peruvian Amazonia), to a brief visit to the Lower Urubamba region for about two weeks of humanitarian aid and linguistic work in the Matsigenka and Nanti communities on the Camisea River (in southern Peruvian Amazonia). My time working with the Iquito speakers we know was very productive, and yielded some surprising results. I plan to blog about those experiences when I am back in places with reliable internet connections, probably in early August.

We arrived in Atalaya this morning from the major jungle city of Pucallpa by plane — a six seater into which they crammed eight passengers. Tomorrow my wife Chris and I will be heading upriver on the next stage of the trip, to the town of Sepahua, where we will obtain a boat and crew to take us up the Camisea. Unless the internet connection is still working in Sepahua, which I doubt, I will be out of touch until the end of the present month.

In search of Andoa

Tomorrow morning I travel from Lima to Iquitos, en route to the Río Pastaza. My reason for this brief trip is to follow up on information I have received that there are two elderly speakers of Andoa in the community of Andoas Viejo, which is located on the Río Pastaza, not far from the border with Ecuador.

Andoa is a member of the Zaparoan family of languages, which includes Iquito, a language with which I have worked a great deal over the last six years. In recent years I have become interested in the historical linguistics of the family, but there is something of a dearth of information on the other languages of the family. There is quite a good dictionary of Arabela, which includes an adequate description of the language’s morphology (Rich 1999), and a grammatical sketch of Záparo (Peeke 1991), but otherwise the data available on the other languages is very spare. In this context, even basic lexical and morphological data on Andoa would be a tremendous boon.

Andoa, however, is widely believed to be extinct; Ethnologue, for example, reports that the last speaker of Andoa died in 1993. However, in late 2006 I met an anthropologist in Iquitos who had recently made some recordings with two elderly Andoa speakers. I listened to them briefly, and my superficial impression was that the speakers displayed significant fluency. So, the report of the demise of Andoa may have been premature, and that is what I would like to ascertain on this trip.

(Incidentally, Nick Evans has a nice chapter entitled “The last speaker is dead – long live the last speaker”, in Linguistic Fieldwork, edited by Paul Newman and Martha Ratliff, which discusses the phenomenon through which who counts as the “last speaker” of a language is frequently a moving target, as much tied to issues of local identity and community politics as to linguistic ability.)

On this particular trip I don’t anticipate doing a great deal of actual linguistic documentation. If I in fact locate any individuals who identify themselves as speakers, I hope to talk with them about a small-scale documentation project and see if they — and the community more generally — are interested. If I can, I would like to do enough work with the speakers to roughly gauge their level of fluency. Can they remember only core vocabulary, or can they easily construct relative clauses?

Interestingly, there are signs that the ethnically Andoa communities on the Ecuadorean side of the border have recently become interested in language documentation and revitalization. See, for example, a newspaper article here, and a UN report here. These reports indicate that there are several speakers on the Ecuadorean side, which leads me to hope that Ecuadorean linguists will be taking up the challenge to document the language on that side of the border.


Rich, Rolland G., compiler. 1999. Diccionario Arabela–Castellano. Lima: Instituto Lingüístico de Verano.

Peeke, M. Catherine. 1991. Bosquejo gramatical del záparo. Quito: Instituto Lingüístico de Verano.

The Hollow Frontier and the Logistic Travails of Fieldwork

Much to my surprise, I am still in Lima, still waiting to head off to the field. The logistical complications that have led to my prolonged stay in Lima have gotten me thinking about the concept of the ‘hollow frontier’ in Amazonia — for reasons that will become clear. The notion of the hollow frontier is an old one in Brazilian historiography, but I first came across it several years ago while reading William Fisher’s Rainforest Exchanges, a work on the interaction between extractive industry and community politics among the Xikrin Kayapo of central Brazil.

In this work, Fisher invokes the notion of the ‘hollow frontier’ as a way of understanding important aspects of the history of Amazonia from the 18th century on. In particular, Fisher uses the concept to talk about the waves of extractive industry (among them the sarsaparilla, rubber, and timber industries) that have swept through Amazonia. In North America, extractive industries frequently formed the leading edge of long-term colonization of areas previously inhabited and controlled by indigenous peoples. In much of Amazonia, in contrast, the successive waves of extractive activity have not served as the leading edge of substantial permanent settlement by non-indigenous peoples. Rather, as soon as the extractivist boom collapses, the non-indigenous population in the extractive zones drops back off, as does the interest of the nation state, and most of the temporary infrastructure that supported the extractive industry evaporates. In Amazonia, then, the waves of extractive industry are not so much the leading edge of permanent non-indigenous colonization as short-term extractivist booms that leave relatively little state influence or infrastructure in their wake.

My recent reflections on the hollow frontier have been triggered by the fact that my delay in Lima is in large part the consequence of the ongoing collapse of one of these hollow frontiers near one of my fieldsites, the town of Sepahua.

Sepahua is a small town on the banks of the lower Urubamba River, near the southern border of the departamento of Ucayali, that has for several decades effectively marked the edge of mestizo society in its area of the selva. The town began as a Dominican mission that was founded in 1948 to missionize the Amahuaca, Yine (Piro), and Asháninka living in the area, and a small mestizo settlement of traders and minor extractivists began to grow at the side of the mission not long after its foundation. At this time Sepahua was very difficult to get to from the main mestizo jungle urban centers like Pucallpa, requiring a river journey of roughly two weeks. One can get a sense of how remote mestizos and the Peruvian state considered Sepahua by the fact that Sepahua was a day’s travel beyond a penal colony founded in 1951 at the mouth of the Sepa River, a tributary of the Urubamba.

The town grew very slowly until the early 1980s, when rising prices for tropical hardwoods, especially mahogany, made logging in this remote region quite profitable. Another extractivist boom hit the region at about the same time: Shell began petrochemical exploration in the region. Shell built a significant airstrip in Sepahua and a variety of commercial businesses sprang up to supply the company and its workers — from bars and brothels to dry goods merchants. Shell left the region in the late 1980s, but many of the people drawn to Sepahua by Shell stayed and turned to logging to support themselves. Rising prices for mahogany and cedar drew even more people, and by the early 1990s, huge amounts of timber were being harvested from an ever-widening area around Sepahua.

The growth of Sepahua received some unusual help in 1991/2 when the town was attacked by a small group of Sendero Luminoso. The attack did little more than frighten the townsfolk, but the Fujimori government took no chances. To protect the airstrip and prevent the expansion of the SL into this new area, the government built a base and sent in a detachment of marines, which further stimulated commercial growth in Sepahua.

I first visited Sepahua in 1993, and even between then and the late 1990s, the town grew by leaps and bounds, reaching a population of about 4000. Sepahua was a real boom town. By the late 1990s, however, accessible timber was getting noticeably scarcer. The economic collapse of the region was fended off for a few more years by a modest amount of local economic activity linked to the Camisea natural gas project, but by the time this income dried up, logging in the region was in serious decline.

Airplane flights, which were weekly in the early 1990s and almost daily in the late 1990s and early 2000s, became monthly and then ceased altogether when the airstrip could no longer be maintained. Symptomatic of Sepahua’s decline, the military presence in the town was reduced to almost zero a few years ago. As of 2007, when I was last there, the only reliable way to get to Sepahua was by boat, a return to the early 1980s. But even here, things were no longer the same: the collapse of the hollow frontier in Sepahua meant that even river transport became relatively scarce.

Which brings us to me, sitting in Lima. If it were just a question of getting myself to Sepahua, there would not be much of an issue. The trip would be more circuitous and time-consuming than in years past, but I would be able to get from Lima to Sepahua in 4-5 days. The problem is that I also need to transport a sizable quantity of medical supplies to Sepahua, as part of an agreement with several indigenous communities in the region. And here the impact of the collapse of the hollow frontier in the Sepahua region is strongest: previously, there was a regular overland-and-river route from Lima to Sepahua run by numerous traders in Sepahua to supply manufactured goods to the town. However, with the collapse of logging in the Sepahua region, demand for goods has largely dried up, and now there is only one person doing the route, and very irregularly at that! The economic activity and infrastructure that I came to rely on over the course of the last decade have largely evaporated, a victim of the hollow frontier.

It has been close to two years since I was last in Sepahua, so it will be interesting to see how the town has fared. In the go-go early 2000s, Sepahua even had internet service, but when I was last there, it seemed to be on its last legs. If the internet connection is still working when I eventually get there, I will be sure to write a short post. The hollow frontier being what it is, though, I’m not counting on it.


Fisher, William. 2000. Rainforest Exchanges: Industry and Community on an Amazonian Frontier. Washington, DC: Smithsonian Institution Press.

Off to Peru

Tomorrow morning my partner Chris Beier and I return to Peru for a summer of fieldwork. I plan to keep blogging as much as I can while in Peru, and it’s my hope to break some new ground in terms of how close to the edge of the wired world I can post from. Internet access is spreading further and further into the jungle, so who knows how far I’ll be able to get.

We have a very busy summer planned. In fact, I feel that I have transitioned into a phase of my life where I would like to do much more fieldwork than I possibly have time for. We first plan to return to the Nanti community of Montetoni, where we actually have a house, if the thatch hasn’t given out by now. Apart from seeing our friends and drinking huge amounts of manioc beer with them during village feasts, and continuing our long-term health- and education-related work with the community, I have a number of research goals.

I’m presently writing a paper on the fascinating Nanti reality status (realis/irrealis) inflectional system, and I need to check on reality status marking in epistemic conditional constructions. They’re rare as hen’s teeth in texts, so I need to get some more examples. I’m also planning to do some more work on the Nanti stress system. A few years back Megan Crowhurst and I wrote a paper that focused on stress in Nanti verbs, and I now want to write a paper looking at the nominal stress, which behaves quite differently. Finally, in writing my dissertation, I noticed an interesting gap in the textual data I have on an exotic and discursively rare morphosyntactic alignment system found in Nanti ditransitive verbs: I realized I had no data at all on first or second person theme/patient arguments. Curiously, a perusal of the literature on the languages most closely related to Nanti showed the exact same gap, without a word of explanation from any of the authors. So now I am intensely curious if this empirical gap is accidental, or if it represents a restriction that the patient/theme argument must be lower in the speech act participant hierarchy than the beneficiary/recipient argument.

We’ll also be returning to the Iquito community of San Antonio, to say hello to our friends there, see how the language revitalization program is doing, and do a couple of weeks of research. I’m working on a comprehensive reference grammar of this language, along with several colleagues, which I will finally be getting back to, now that my dissertation is out of the way. I’ll be working on a few outstanding things for the grammar, including nailing down the complicated prosodic system, which features some subtle interactions between stress and tone. I feel we have a good analysis of the system, and I think we’ll be able to clear up the remaining questions quickly. I also hope to settle the semantics of one stubborn corner of the evidential system, and in particular, the semantics of a morpheme that appears to exhibit visual evidential, mirative, and malefactive (!) meanings. Once again, this morpheme is textually very rare, so it has proven difficult to figure out.

The single most exciting project we have planned for this summer, however, is an attempt to do some fieldwork on Andoa, a language of the same family as Iquito. The word on the street was that the last speaker of this language died in 1993, but in late 2006 we ran into a French anthropologist who had located two elderly speakers and made several hours of recordings with them, some of which I listened to. The speakers seemed quite fluent, as far as I could tell, so there may still be a chance to get some basic data on the language. In the coming years I’m hoping to do some historical work on the Zaparoan family, and a few weeks of work to gather basic lexical and morphological data on Andoa would be hugely helpful.

Now, back to packing…

Shoebox and beyond

Having previously discussed research funding, let me now turn to another question that I get asked with some frequency by people thinking of heading to the field for the first time: what software do I need — and more specifically: what is Shoebox, and what do you think of it? What follows is my very personal opinion about the Shoebox program.

Briefly, Shoebox is a program for creating dictionaries and interlinearized texts, produced by SIL. What do I think of it? Well, I think of Shoebox like that ancient hulk of a car that your uncle gave you for your 18th birthday, that you still use because you can’t afford anything better, that wheezes, rattles, belches smoke and smells like oil and gasoline, which breaks down so regularly that you have to keep a toolbox in the trunk to make repairs by the side of the road — and which you have to pump the gas on when you come to a stop sign so that it doesn’t stall — but which, at the end of the day, gets you to your destination — perhaps not in style, and definitely not in comfort — but gets you there.

Most people I know who do fieldwork use Shoebox, but dream of the day that something better will come along and they can take the rattling hulk out into the back field and leave it there to rust. (Speaking of which, the most recent version of Shoebox was re-christened Toolbox, although it differs in only a few ways. SIL has, more recently, produced a new program, called FLEx, that fills the same basic function as Shoebox/Toolbox, and which is reviewed here. I have not tried it yet, but it looks like a significant improvement over Shoebox in most respects.)

What is Shoebox? Shoebox is essentially a lexical database program joined to a morphological parsing program. The database is designed with dictionary-making in mind, and one of its major virtues is that it includes an export function that permits one to export the dictionary database as a Microsoft Word document that is formatted in a recognizable dictionary style, with headwords and subentries. The database has basic filtering and search functions.

The morphological parser is intended to split the words in a text into their constituent morphemes and assign glosses and parts of speech to the segmented morphemes. The results of the parsing process are output as interlinearized text. The parser finds morphemes by searching the dictionary database, and a potentially very useful function is that one can set the preferences so as to open a new dictionary entry whenever the parser comes across a morpheme that it does not find in the dictionary, thereby allowing you to build up your dictionary by parsing texts. One can also in this way build up corpora of parsed texts that can then be searched for glosses or morphemes of interest.
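For readers who find this easier to picture in code, here is a minimal sketch in Python of the general idea of lexicon-driven parsing described above. It is not Shoebox’s actual algorithm, which I have not seen. The toy lexicon, the function names `segment` and `interlinearize`, and the longest-match strategy are all assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: morpheme form -> (gloss, part of speech).
LEXICON: Dict[str, Tuple[str, str]] = {
    "dog": ("dog", "n"),
    "walk": ("walk", "v"),
    "s": ("PL", "sfx"),
    "ed": ("PST", "sfx"),
}


def segment(word: str, lexicon: Dict[str, Tuple[str, str]]) -> Optional[List[str]]:
    """Split word into morphemes that are all in the lexicon, or return None."""
    if word == "":
        return []
    # Try the longest candidate first, backing off to shorter ones.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


def interlinearize(text: str, lexicon: Dict[str, Tuple[str, str]]):
    """Return (rows, unknown): one (morphemes, glosses, pos) row per word,
    plus the words that could not be parsed with the current lexicon."""
    rows, unknown = [], []
    for word in text.split():
        morphs = segment(word, lexicon)
        if morphs is None:
            # A real tool might prompt for a new dictionary entry here.
            unknown.append(word)
            rows.append((word, "***", "***"))
        else:
            glosses = "-".join(lexicon[m][0] for m in morphs)
            pos = "-".join(lexicon[m][1] for m in morphs)
            rows.append(("-".join(morphs), glosses, pos))
    return rows, unknown


rows, unknown = interlinearize("dogs walked cat", LEXICON)
for morphs, glosses, pos in rows:
    print(morphs, glosses, pos, sep="\t")
print("not in lexicon:", unknown)
```

The list of unparsed words is the point at which one would add new entries to the lexicon and parse again, which is the dictionary-building loop described above. Note that a sketch like this does nothing about allomorphy or ambiguity between competing segmentations.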

In my view Shoebox’s strengths are its dictionary-format output, and the fact that the lexical database and parser are fairly well integrated. For a long time, there was little else out there that fulfilled these functions so conveniently in a single package, although that is now changing (see below). SIL, who wrote and maintained the program, also made it available for free, which is a good price.

As my griping above suggests, however, I find Shoebox frustrating in several respects.

Perhaps the single greatest weakness of Shoebox is its documentation, which is woefully inadequate. I have known several people who have tried to use Shoebox, but have given up in frustration. The program comes with a tutorial, which is good, as far as it goes, and it has a help feature, but much of what you need to know to use the program is simply not documented anywhere. If you have programming experience or are generally software savvy, you can, with a lot of patience and gnashing of teeth, figure out most of what you need to know (although there are still a number of things that remain mysterious to me, even after several years). Otherwise, I strongly recommend getting some tutoring from someone who has been using the program for a long time. It will save you a lot of time.

Because Shoebox documentation is so spotty, a number of people have written their own notes for set-up, such as this and this. Here is an article that discusses using Shoebox.

The dictionary-formatted output is one of the best things about Shoebox, in my opinion, but it still leaves a lot to be desired. The user has fairly little control over the formatting, unless one is willing to edit the files that control the conversion from the plain-text Shoebox database to Word (or RTF). That is not something that most people feel comfortable about.
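Because the database is plain text, one alternative to editing the conversion files is to read the database directly and format the entries oneself. The following is a rough Python sketch of that idea, not a description of Shoebox’s own export mechanism. It assumes that each field sits on its own line beginning with a backslash marker, and that `\lx`, `\ps`, and `\ge` mark the headword, part of speech, and English gloss; check these against your own database, since marker sets vary between projects. The sample records and the function names are invented for illustration.

```python
from typing import Dict, List

# Invented sample data in the assumed backslash-marker layout.
SAMPLE = r"""
\lx dog
\ps n
\ge domestic canine

\lx walk
\ps v
\ge move on foot
"""


def parse_records(text: str) -> List[Dict[str, str]]:
    """Group marker lines into records; each \\lx line starts a new record."""
    records: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything that is not a field
        marker, _, value = line[1:].partition(" ")
        if marker == "lx" or not records:
            records.append({})
        records[-1][marker] = value.strip()
    return records


def format_entry(record: Dict[str, str]) -> str:
    """Render one record as a simple one-line dictionary entry."""
    return "{} ({}) {}".format(
        record.get("lx", "?"), record.get("ps", "?"), record.get("ge", "?")
    )


for rec in parse_records(SAMPLE):
    print(format_entry(rec))
```

This gives complete control over the layout, at the cost of writing the formatting oneself. It handles only single-line fields and ignores subentries, so it is a starting point and no more.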

The morphological parser works well for languages with agglutinative morphology, and works best when there is little allomorphy or morphophonology. There are ways to handle allomorphy and morphophonology, including a way to input conditional or environmental rules, but I have found that in languages with lots of morphology and complex allomorphy or morphophonology the parser tends to make lots of errors and one has to spend a lot of time telling the parser what to do, or, even worse, one has to spend a lot of time beefing up one’s lexical entries to deal with ambiguities and parsing problems. In Nanti, a Kampan language I work on, the parser works pretty well, but for Iquito, a Zaparoan language I work on, the parser is really not worth the trouble, at least so far. (I suppose this may reflect my lack of computational skill; I’m not a computational linguist, but neither am I computationally illiterate. If a linguist with my abilities is having a hard time getting the parser to perform well, that suggests to me that it’s not very well designed.)

Another major weakness of the parser, in my view, is the formatting of its interlinearized output file. The interlinearized output looks OK on the screen, but it is a very laborious process of cutting, pasting, and reformatting to move it from the Shoebox text file to another document. As a consequence, it’s fairly impractical to use Shoebox to create publishable interlinearized texts (contrast this with the dictionary output).

Finally, I find that the overall design and organization of the user interface leaves much to be desired. In many cases, important functions are buried in such a way that one has to go through nested sets of dialogue boxes to get at them. Finding functions can also be a chore, since they are sometimes put in pretty obscure places. In addition, text windows and menus are sometimes very small, so that it’s hard to see all one needs to see. As a result of these issues, actually using the program can be frustrating. Maybe this is because I’m a Mac user, but I also find Shoebox’s sensitivity to, and stupidity about, file locations to be frustrating, as it makes transferring databases between files or between machines a tricky affair.

In summary, then, Shoebox is vastly, vastly better than nothing, but I find that the program leaves much to be desired. For a long time, though, it was really the only game in town, so the situation was pretty much “put up or shut up” (or for the computationally-minded among us: write something better). However, the situation is beginning to change…

Moving Beyond Shoebox

As I mentioned above, SIL has released a new package, FLEx, which is intended to replace Shoebox/Toolbox. I have not yet tried it, but it looks like a considerable improvement over Shoebox. It is reviewed here. As soon as I have some time (i.e. when I have finished my dissertation), I plan to take FLEx out for a spin and see how it works.

TshwaneLex is a commercially available lexicography program for creating dictionaries. This review makes it look like a fairly attractive option, except that it’s not free (150 Euros for an academic license). Also, it is not integrated with a parser, the way Shoebox is, so if that is important to you, then TshwaneLex is not for you.

There are some other tools out there, but as far as I can tell, most of them are not yet ready for prime time. The E-MELD School of Best Practice is a great resource for any linguist heading off to the field for the first time, and they have a large quantity of information about software here.