Chris Hellmuth, Alexander Nakhimovsky & Tom Myers, Colgate University

Linguist's Toolbox and XML Technologies

Toolbox is a powerful linguistic tool, but its capacities for viewing data on screen and for printing are somewhat limited and depend on proprietary Microsoft formats (.DOC, .RTF). The rapid development of Web and XML technologies has created many new opportunities for information storage, query, and display. Thanks to the new XML-export feature of Toolbox, these technologies can now be deployed to make Toolbox data available on the web, presented in a structured way within the browser, and stored in a relational database that can be queried by the back-end code of the web server. This paper presents two web applications that demonstrate possible uses.

The first web application is quite simple and needs nothing beyond a modern Web browser. It maintains a collection of XML files exported by Toolbox, which can contain texts, interlinear data, wordlists, glossaries, and OLAC metadata. It also maintains a collection of XSLT stylesheets, each of which acts much like a query. When the user chooses an XML file and an XSLT stylesheet, the Toolbox data, including audio and video files, is displayed in the web browser. Because there is no database back-end, the application is very simple and provides an easy way to view Toolbox data in flexible ways based on open and stable standards. The entire application can be programmed in JavaScript; no installations are needed. If, in addition, a Web server (Tomcat) and a Java processor are installed, another set of XSLT programs can convert Toolbox data into the PDF format for printing.
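The kind of transformation such a stylesheet performs can be sketched as follows. This is a minimal illustration only: it uses Python's standard library in place of XSLT, and the element names (text, phrase, word, form, gloss), the class names in the output, and the two-word sample are assumptions made for the example, not the actual Toolbox export schema or the stylesheets described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical export layout, invented for this sketch; a real Toolbox
# export uses the marker names defined in the project.
SAMPLE_EXPORT = """<text>
  <phrase>
    <word><form>kitabu</form><gloss>book</gloss></word>
    <word><form>kikubwa</form><gloss>big</gloss></word>
  </phrase>
</text>"""


def interlinear_to_xhtml(xml_source: str) -> str:
    """Turn each <phrase> into a div holding one span per word."""
    root = ET.fromstring(xml_source)
    body = ET.Element("div", {"class": "interlinear"})
    for phrase in root.iter("phrase"):
        phrase_div = ET.SubElement(body, "div", {"class": "phrase"})
        for word in phrase.findall("word"):
            word_span = ET.SubElement(phrase_div, "span", {"class": "word"})
            # Keep form and gloss as sibling spans so a style sheet can
            # stack them vertically for interlinear display.
            for field in ("form", "gloss"):
                cell = ET.SubElement(word_span, "span", {"class": field})
                cell.text = word.findtext(field, default="")
    return ET.tostring(body, encoding="unicode")


xhtml = interlinear_to_xhtml(SAMPLE_EXPORT)
print(xhtml)
```

In the browser-only application the same mapping would be written as an XSLT template and applied by the browser's built-in XSLT processor, so nothing beyond the XML and XSLT files needs to be distributed.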

The second application adds a web server and a relational database to the framework. (The browser, the server, and the database can all be on the same laptop, or they can be on three different computers separated by great distances: the same software can function in both situations.) Like the first application, it converts Toolbox XML exports (e.g., an interlinear text) to a structured XHTML format that conforms to a "microformat" to be developed for that purpose. OLAC metadata collected within Toolbox will be converted to the OLAC-established XML format, validated, and also converted to an XHTML microformat for Web display. Both linguistic data and OLAC metadata will be stored in a database for querying and display. The relational database can serve both as an archive and as a scholarly resource available on the Web. The ultimate goal of our effort is to provide a smooth, almost completely automated path from field data and analysis to a Web-accessible, OLAC-compatible repository that is also a convenient resource for linguists everywhere (subject to intellectual property protections).
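The storage-and-query step can be sketched in a few lines. This is a hedged illustration, not the application's actual back end: SQLite stands in for whatever relational database is deployed, and the table layout, the text identifier "text-001", and the export element names (text, phrase, word, form, gloss) are all assumptions introduced for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical export layout, invented for this sketch.
SAMPLE_EXPORT = """<text>
  <phrase>
    <word><form>kitabu</form><gloss>book</gloss></word>
    <word><form>kikubwa</form><gloss>big</gloss></word>
  </phrase>
</text>"""


def load_words(conn: sqlite3.Connection, text_id: str, xml_source: str) -> int:
    """Store one row per word of an interlinear text; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word ("
        "text_id TEXT, phrase_no INTEGER, word_no INTEGER, "
        "form TEXT, gloss TEXT)"
    )
    root = ET.fromstring(xml_source)
    rows = []
    for phrase_no, phrase in enumerate(root.iter("phrase"), start=1):
        for word_no, word in enumerate(phrase.findall("word"), start=1):
            rows.append((
                text_id,
                phrase_no,
                word_no,
                word.findtext("form", default=""),
                word.findtext("gloss", default=""),
            ))
    conn.executemany("INSERT INTO word VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
inserted = load_words(conn, "text-001", SAMPLE_EXPORT)

# Example query: where does a given gloss occur?
matches = conn.execute(
    "SELECT text_id, phrase_no, form FROM word WHERE gloss = ?", ("book",)
).fetchall()
print(inserted, matches)
```

Keeping phrase and word positions in each row lets the server reassemble the interlinear display from the database as well as answer word-level queries.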

We greatly appreciate generous help from Alan and Karen Buseman, Joan Spanne, and Gary Simons, all of the Summer Institute of Linguistics. Tom Myers of N-Topus Software has provided, as often, invaluable advice. This research was supervised by Dr. Alexander Nakhimovsky and supported in part by Colgate University.