In this paper we explore the relationship between language documentation and revitalization, highlighting some methodological issues that emerge. We argue that documentation and revitalization have much to gain from closer collaboration, in order to produce documentary corpora that are more relevant to communities and useful for language revitalization. We also argue that documentation should pay more attention to local histories, ethnographies, goals and management of language use, as well as the crucially important but poorly researched beliefs and ideologies about language and language use held by both speech communities and researchers.
One of the main responses of academia to language endangerment has been the development of the sub-field of Language Documentation (LD, also called Documentary Linguistics). Himmelmann (1998: 161) presented its main goal as “to provide a comprehensive record of the linguistic practices characteristic of a given speech community.” Himmelmann (2006: v) restated this as a focus on “the methods, tools, and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language or one of its varieties.” This approach emphasizes transparency and multifunctionality, as well as ethical engagement with a wide range of stakeholders, including speech community members. As Himmelmann (1998: 161) also pointed out, LD “differs fundamentally from … language description [which] aims at the record of a language … as a system of abstract elements, constructions, and rules.”
The reawakened interest in language practices in context can be traced to the Ethnography of Communication pioneered by Hymes (1964), Ethnopoetics and the study of verbal art developed by Tedlock (1983) and Hymes (1981), and the discourse-based approach of Sherzer (1987), who argues for a change in focus to contextualized language in use rather than fixed objects with grammatical structure.
Language Documentation is generally understood as the creation of a corpus of archivable audio, video, and textual recordings, and translating and annotating them, paying attention to relevant contextual metadata (Austin 2013). The corpus and analysis should be available and accessible to a wide range of users. These goals have been facilitated by advances in information and communication technologies and digital media, and by large infusions of funding (e.g., from the Volkswagen-Stiftung, the Arcadia Fund, and the Documenting Endangered Languages Program). Both of these developments have influenced the methodologies and directions of research.
Documentary linguistics places a strong emphasis on the production of high-quality recordings. Techniques influenced by principles and practices in recording arts pay particular attention to such things as the choice and positioning of microphones, 1 reducing background noise, and lighting, framing, and editing for video. Data collection methods have expanded beyond the traditional gathering of monologic narratives and eliciting structural features, to include adaptations of methods developed within psycholinguistic and discourse analysis research. These include “staged events” (such as picture or video descriptions), information gap–type activities where participants have to communicate in order to match pictures or descriptions (Lüpke 2010), narrative problem-solving (San Roque et al 2012), and retelling of oral narratives (Reiman 2010; Bird 2010). 2
Emphasis on technical issues related to recording, archiving, data structure, and data management (Bird and Simons 2003; Nathan 2011) has led in some cases to “technical parameters such as bit rates and file formats [being] now often foregrounded to the point that they eclipse discussions of documentation methods” (Dobrin, Austin and Nathan 2009: 42). These challenges need to be considered at all levels: funding, research design, and training.
One such issue is the “observer’s paradox” that has long been a concern of sociolinguists, who place high importance on naturalistic speech in data collection: “the aim of linguistic research in the community must be to find out how people talk when they are not being systematically observed; yet we can only obtain these data by systematic observation” (Labov 1972: 209). Increasingly, sociolinguists and linguistic anthropologists address this challenge through ethnographic methods, which are perceived as giving more voice to research participants (or consultants, as they are often called in LD). Yet in documentary linguistics, participant observation of ordinary speech in conversational settings is rare, partly because in the standard model of research funding and design few linguists have the time to become fluent in the languages they study. This leads us to wonder whether in some cases documentary records may include “foreigner speak” due to linguistic accommodation (Giles et al 1973).
In addition, “high-quality recording” has often been misinterpreted as requiring that performances of narratives, songs, and rituals be “purified,” excluding culturally relevant elements of storytelling, verbal art, call and response, audience participation, etc., or ignoring or manipulating their social context (e.g., shifting to a location or time more convenient to the researchers, removing bystanders, directing participants to behave in socially unusual ways so that they are in frame for video, etc.).
Archiving documented materials in digital repositories is required by most donors, and has thus become an integral part of LD. Archives are expected to have transparent policies and processes for acquisition, cataloguing, preservation, dissemination, and format/content migration (future-proofing) (Conathan 2011; Nathan 2011), as well as “access protocols” that formulate and implement participants’ rights and sensitivities, together with methods and processes for controlled access to the materials in the archive. 3 Endangered language archiving thus requires adaptation of the open access trend current in much other academic research and publication.
Documentation and archiving highlight the need for collecting and analyzing metadata (data about data) so that records can be contextualized, managed, preserved, and understood. Two main approaches have emerged to the level and type of data that are desirable:
Corpus theorization in LD has only been weakly developed: few scholars have been explicit about why and how they are collecting and organizing their corpora, other than some vague notion of “documenting the language” or “saving the data.”
Austin (2013: 6ff) argues for a wider approach that he calls “meta-documentation,” which would deal with “methods, tools, and theoretical underpinnings for setting up, carrying out and concluding a documentary linguistics research project.” Meta-documentation should thus include ideological processes, researcher positionality, and sociopolitical issues.
The emphasis on “compiling a representative and lasting record of a language” (often forgetting the multipurpose element) has led documenters to focus on defining and describing individual languages-as-systems, and to ignore the fact that language shift necessarily requires the presence of another (more powerful or desirable) language or variety. It has been argued that linguists should document language ecologies, not just what they define as individual languages or varieties (Mühlhäusler 2000; Grenoble 2011); we should pay attention to multilingual repertoires, mixed codes, translanguaging, contact effects, and language variation and change (Lüpke and Storch 2013; see also Lüpke, Chapter 46, this volume), as well as “the sociolinguistic contexts in which those codes are used” (Childs et al 2014: 169).
The active participation of community members is increasingly valued in documentary linguistics. Yet LD tends to be uncritical with regard to the mediating effect of the camera and microphone, the potential permanence of the records created, and power relationships between researchers and consultants and their ideological implications. A more reflexive approach is favored by practitioners in disciplines such as visual anthropology (e.g., Ruby 1980). 6 Sociolinguistic studies have shown that characteristics of the interviewer (e.g., gender, age, experience, social background, and ethnicity), and characteristics of the interview itself (e.g., data collection strategies, the role of the fieldworker in interviews, and the presence of other interlocutors), may also affect the data (Cukor-Avila 2000).
In LD there has been considerable discussion of relationships between fieldworkers and speech communities, focusing on cultural differences (Dobrin 2005), ethical responsibilities (Grenoble and Furbee 2010; Rice 2012), efficacy and empowerment (Roche et al 2010), or control of the research agenda (Penfield et al 2008; Czaykowska-Higgins 2009). There is increasing rhetoric about documentation with and by communities (Grinevald 2003, based on Cameron 1992) and on collaborative research (Yamada 2007; Leonard and Haynes 2010). Dobrin and Schwartz (2016: 253) argue that collaboration and reciprocity could benefit from using participant observation to deal with the “complexity and diversity of what goes on in particular cross-cultural researcher–community relationships.”
However, practicalities such as lack of time and resources, too closely delimited linguistic training, and the nature of funding mean that the dominant research model remains the “lone wolf” linguist (Crippen and Robinson 2013) who sets (or whose funders set) the research agenda, with the chief output being a description of some aspect of the grammar or phonology of the language as defined by the linguist, rather than materials designed by and for community members.
There is an ongoing and unresolved tension between two apparently contradictory standpoints taken by language documenters (Sallabank 2013: 18). On the one hand, there are perceived priorities to “preserve records of key languages before they become extinct,” 7 with the main beneficiaries being linguistics and comparative typology. On the other hand, we find rhetoric such as the aim “to create a repository of resources for the linguistic, social science, and the language communities” (emphasis added) from the Endangered Languages Documentation Program, 8 which promotes an ethical position to “give something back” to language communities. Both stop short of language revitalization, yet community members are often more interested in revitalization than documentation, which to them has less obvious immediate benefits.
One of the stated aims of LD is mobilization—the practical development of resources that make use of collected data and analysis for purposes such as language revitalization (Nathan 2006). However, the production of educational materials is not usually integrated into language documentation projects (Mosel 2012) and is difficult to obtain funding for without government or private donor support.
In the view of many documentary linguists, materials for reference and revitalization are ideally based on a corpus from language documentation. However, much material in LD corpora and outputs may be unsuitable for revitalization, for several reasons. First, it frequently contains genres or topics such as sacred materials, discussions of death or sexual relationships, as well as gossip, that are inappropriate for language learners, especially children. Second, the outputs of LD tend to consist of linguistic analyses published in academic journals, or descriptive grammars written in a major, typically Western, language, using terminology that is obscure to non-linguists. They tend to focus on grammatical or phonological features rather than on conversational gambits, communicative competence, or notions and functions.
Third, on a practical level, although archived recordings are in principle accessible to community members, they may be difficult for learners to make use of. Documentation is heavily biased towards the language performances of older and fluent speakers, which may be too fast, heavily context dependent, and include slurring or elisions, or even be affected by physiological factors such as lack of teeth. Such material may be difficult for language learners to process, especially at beginning stages. Few, if any, documentary corpora include learner-directed speech. This is partly because, as noted by Cope (2014), documentary linguists are not trained in pedagogical materials design, and applied linguists are rarely included in language documentation teams.
Fourth, observed language practices may not match the perceptions or preferences of speakers and language activists, who may prefer “folk linguistics” (Niedzielski and Preston 2003) or purism to documentary corpora when constructing revitalization materials. Documentation that demonstrates low vitality, attrition, decline, or variation and change may be unwelcome. There are tensions between teaching norms based on the “more authentic” usage and interests of older native speakers versus those of younger or “new” speakers (see O’Rourke, Chapter 26, this volume). If a documentary corpus includes speakers of all ages and variation and change, as recommended by Amery (2009) and Childs et al (2014), corpus-based description may differ from speakers’ idealized perceptions of desirable usage (Dorian 2009).
Language documentation has thus been characterized as “Zombie linguistics” (Perley 2012). Perley goes further than the purely practical issue of accessibility and relevance: he problematizes the documentation of “a language” rather than support for the group who speak it. Both LD and revitalization have been criticized for focusing on languages rather than people (Moore 2007; Labov 2008; Spolsky 2014). Yet in Austin’s work with the Dieri Aboriginal Corporation in South Australia, language documentation and revitalization have been intertwined with social revitalization (Austin 2014). Austin’s work uses innovative methods such as grass-roots community-based workshops and a blog 9 to link revitalization and documentary materials.
Language revitalization has been seen by some documentary linguists as a simple “technical add-on” to their research through the creation of orthographies, dictionaries, subtitled videos, primers, and multimedia, rather than as a field of research or activity that requires theoretical and applied knowledge. For those who are familiar with multimedia software, it can be relatively simple technically to produce certain types of materials (such as talking dictionaries) from corpora. But the processes of negotiating which materials should be published, for what purposes and how, and in what orthography, involve issues of language planning and community dynamics that are not necessarily straightforward (Sallabank 2012). For this reason, it is important to consider extra-linguistic factors such as intercultural and power relationships between linguists and communities, intra-community hierarchies, histories of language contact and dominance, and language attitudes and ideologies (Sallabank 2013; Austin and Sallabank 2014), and to include these in documentation. This necessitates a reflexive, collaborative, and ethnographic approach to data collection and analysis.
The main emphasis of contemporary endangered language documentation remains on eliciting linguistic structures from the oldest consultants through interviews or storytelling (Sugita 2007). As mentioned previously, genres such as monologue narratives, word lists and paradigms, and (sometimes) songs continue to dominate the types of material collected. We may ask the question: what would LD look like if it were done with a goal of producing outputs of direct use for revitalization?
Sugita (2007) and Amery (2009) propose that documentation should also include interactional data, language functions, idiomatic expressions, and commonly occurring speech formulas, as well as conversations about everyday life, especially in non-traditional contexts. Additionally, it is particularly important to extend language documentation to intergenerational interaction (if speakers from different generations still exist), including code-switching, child-directed speech, and the language practices of younger generations. It should also include day-to-day interactions (greetings, leave-takings, and phatic communication), and chunks of language, ranging from formulaic expressions to whole discourses (e.g., “Welcome to Country” speeches that have become popular in revitalizing communities in Australia). Pawley and Syder (1983) propose that speakers know several hundred thousand of these sequences and that much fluent discourse is made up of repurposed memorized, fixed sequences. Documentation of such chunks of language would contribute to revitalization by providing a basis for learners to contribute to ongoing conversations.
One clue to the potential significance of this kind of material is Dorian’s (1977) notion of the “semi-speaker,” who has reduced structural or lexical knowledge compared to more fluent speakers; they are often active in revitalization movements, and they can become important sources of language knowledge as traditional fluent speakers pass away (see Sallabank, Chapter 29, this volume). Such individuals can sometimes be judged by the community to be highly competent in the language because they can interact appropriately and provide chunks that are suitable to the discourse context (Dorian 2009); yet they can also be diffident about their own proficiency and lack confidence to use the language. Research on what causes a semi-speaker to be judged in this way could be valuable for language learning and revitalization strategies, as well as for establishing means to re-activate semi-speakers’ and latent speakers’ language competence (Basham and Fathman 2008).
Nathan and Fang (2009) have also argued that language documenters could pay more attention to metadata for language pedagogy, such as:
They also suggest that documenters could make notes on learner levels, links to associated materials that have explanations and examples, notes on how to use material for teaching, as well as warnings about materials that are inappropriate for certain contexts (e.g., profane or archaic language). A more sociolinguistic approach to documentation can also identify potential learner groups and their abilities, needs, and motivations, as well as potential teachers and consultants and their particular skills.
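One way to make these suggestions concrete is to treat pedagogical metadata as a small structured record attached to each archived item. The following Python sketch is purely illustrative: the class name, the field names, the identifiers, and the three-step learner scale are all assumptions introduced here, not a schema proposed by Nathan and Fang (2009) or used by any existing archive.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical learner levels; the three-step scale is an assumption for illustration.
LEVELS = ("beginner", "intermediate", "advanced")


@dataclass
class PedagogicalMetadata:
    """Illustrative teaching-oriented metadata for one archived recording."""
    recording_id: str                      # hypothetical archive identifier
    learner_level: str                     # must be one of LEVELS
    related_materials: List[str] = field(default_factory=list)  # explanations, examples
    teaching_notes: str = ""               # how the item might be used in teaching
    warnings: List[str] = field(default_factory=list)  # e.g. "profane", "archaic"

    def __post_init__(self) -> None:
        # Reject levels outside the assumed scale so that records stay comparable.
        if self.learner_level not in LEVELS:
            raise ValueError(f"unknown learner level: {self.learner_level!r}")


def suitable_for(records: List[PedagogicalMetadata], level: str,
                 excluded_warnings: frozenset = frozenset()) -> List[PedagogicalMetadata]:
    """Return records at the given level that carry none of the excluded warnings."""
    return [r for r in records
            if r.learner_level == level
            and not excluded_warnings.intersection(r.warnings)]


if __name__ == "__main__":
    corpus = [
        PedagogicalMetadata("rec-001", "beginner", teaching_notes="greetings"),
        PedagogicalMetadata("rec-002", "beginner", warnings=["profane"]),
        PedagogicalMetadata("rec-003", "advanced"),
    ]
    for record in suitable_for(corpus, "beginner", frozenset({"profane"})):
        print(record.recording_id)
```

A record of this kind would allow a teacher or materials developer to filter a documentary corpus by learner level while excluding items flagged as inappropriate for a given context. Which categories and warnings are relevant would need to be negotiated with the community concerned.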
To date there has been little recording and analysis of language revitalization from a documentary linguistic perspective, even though language teaching can provide useful language data and insights about structure and use. Nathan and Fang (2009) argue that language classes provide a locus for uncovering language attitudes, paths of acquisition, language change, literacy, language in use, new types of language usage, etc. Indeed, Gómez (2007: 101) proposes that language teaching needs to precede linguistic documentation so that community members can be fully informed and empowered, and so that their contributions can be richer. A current project in Jersey, Channel Islands, encourages language apprentices to record sessions for future analysis and revision, thus contributing to documentation (see Sallabank, this volume).
In the mid-1990s, Language Documentation emerged as a new sub-field of linguistics that promised a fresh approach to the study of human language, paying attention to language use in context, and to data collection, analysis, and preservation in ways that might help confront global language loss. Hundreds of LD research projects have been initiated over the past 20 years, and the major digital archives in London and Nijmegen contain terabytes of corpus materials. This area of research has attracted large numbers of scholars and students, and engaged with speech communities around the world. Much innovative and challenging theoretical and practical work has been done, including in more recent times engagement between linguists and other academic fields in multidisciplinary projects. The potential for this work and its outputs to contribute to language revitalization seems clear prima facie; however, we believe that it has so far not done so, for reasons explored in this chapter.
There are opportunities to explore documentation for and of revitalization, and to engage with colleagues in related disciplines, especially anthropology, ethnobiology, sociolinguistics, and language pedagogy, that have barely been investigated so far. In addition, we argue that work on both documentation and revitalization has generally failed to pay proper attention to wider contexts of local histories, ethnographies, project goals, and management of language use, and the crucially important but poorly researched beliefs and ideologies about language and language use held by both speech communities and researchers. We hope that in the future documentation and revitalization will see much more collaboration, communication, and understanding on all sides.
Sallabank, Julia. 2013. Attitudes to endangered languages: identities and policies. Cambridge: Cambridge University Press.
San Roque, Lila, Alan Rumsey, Lauren Gawne, Stef Spronck, Darja Hoenigman, Alice Carroll, Julia Colleen Miller and Nicholas Evans. 2012. Getting the story straight: Language fieldwork using a narrative problem-solving task. Language Documentation and Conservation 6, 135–174.
1. See the recommendations at www.elar-archive.org/documenting/equipment/microphones.php (Accessed 12 February 2017).
2. Narratives have limitations, however, e.g., they tend to be in past tenses, refer to imaginary or fantastic events, and lack interactional language. As a result they do not necessarily contain useful material for revitalization.
3. http://emeld.org/school/index.html (Accessed 12 February 2017).
6. Very few linguists, or anthropologists, experience what it is like to be in front of the microphone or video camera.
7. From the NEH Documenting Endangered Languages website, www.neh.gov/grants/guidelines/del.html (Accessed 12 February 2017).
8. www.eldp.net// (Accessed 20 February 2017).
9. http://dieriyawarra.wordpress.com (Accessed 12 February 2017).