Andrea Berez, Wayne State University
Gary Holton, Alaska Native Language Center

Designing Community-Tech Workflows: A field linguist's guide to putting good practice language technology into the hands of speakers

In this paper, we describe our involvement over the past two years with a project to create archival and presentation formats of Dena'ina Athabaskan legacy materials, and to train Dena'ina heritage speakers in the use of language technology. We integrated these goals by developing a workflow for producing digital presentations of annotated corpora, and then training community members in its execution.

We approached the challenge with two sets of goals: methodological goals (those which would not change from project to project) and product-specific goals. Our methodological goals were to develop a workflow that incorporates good practices in digitization and archiving, is relatively streamlined and bug-free, and can be learned with minimal effort. It was also important to us that easy-access software (see note 1) be used at all stages of product development and use. Our product-specific goals included linking legacy audio to annotation, incorporating digitization and archiving seamlessly into the workflow, and supporting machines that are not Unicode-enabled. To these ends, we created an HTML CD-ROM of interlinearized Dena'ina narratives with line-by-line audio, built upon Elan XML files that are archived and then transformed with XSLT. We are also currently training Dena'ina community members to create similar products.
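To make the transformation step concrete, the following is a minimal sketch of turning time-aligned Elan XML into an HTML page with line-by-line audio. It is not the stylesheet used in the project: it substitutes Python's standard-library XML parser for XSLT so that it runs without extra software. The tier name "text", the file name "story.wav", the placeholder annotation values, and the use of the HTML audio element with media-fragment time offsets are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily reduced Elan (.eaf) document; annotation values are placeholders.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="5000"/>
  </TIME_ORDER>
  <TIER TIER_ID="text">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first line (placeholder)</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second line (placeholder)</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def eaf_to_html(eaf_xml, audio_file, tier_id="text"):
    """Return an HTML page with one paragraph and one audio clip per annotation.

    Assumes every time slot carries a TIME_VALUE in milliseconds.
    """
    root = ET.fromstring(eaf_xml)
    # Map each time slot ID to its offset in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")] / 1000  # seconds
            end = slots[ann.get("TIME_SLOT_REF2")] / 1000
            text = ann.findtext("ANNOTATION_VALUE", default="")
            # "#t=start,end" is a media-fragment offset into the audio file.
            src = f"{escape(audio_file, quote=True)}#t={start:g},{end:g}"
            rows.append(
                f'<p>{escape(text)} <audio controls src="{src}"></audio></p>'
            )
    return "<html><body>\n" + "\n".join(rows) + "\n</body></html>"


if __name__ == "__main__":
    print(eaf_to_html(SAMPLE_EAF, "story.wav"))
```

Because the archived XML stays untouched and the HTML is generated from it, the presentation can be regenerated whenever the annotations are corrected. A real stylesheet would also need to handle the additional tiers of an interlinear text, such as glosses and free translations.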

In this paper we discuss our step-by-step procedure for developing the workflow, from investigating various technologies and standards, to debugging the procedure, to training community members in a limited time frame. We examine which steps in our procedure were successful in meeting our goals, and which were not.

We address this paper to two audiences. First, we offer what we have learned to field linguists who may face similar community requests and who feel a commitment to best-practice standards in language digitization. Second, we address tool developers, so that by knowing the goals of one group of linguists, they might better understand the needs of the users of their tools.

1. "Easy access" in this case refers to three things: first, that software and technologies used in making the product are either free or very inexpensive; second, that the final product is viewable on a standard home computer without much additional configuration or software downloads; third, that the product uses standards that are likely to be long-lasting, like XML and HTML.