DRH 2007

Proposal abstract

Opening Dawes: Organising Knowledge around a Linguistic Manuscript

David Nathan (SOAS), Susannah Rayner (SOAS), Stuart Brown (OxfordML)

Summary

The notebooks of William Dawes [1] describing the Sydney language, or Dharuk [2], are one of the earliest documentations of any of the Australian Aboriginal languages. The Dawes Online project is a collaboration of the Endangered Languages Archive and the Library Special Collections, both at SOAS, and the NSW Aboriginal Languages Research and Resource Centre (part of the NSW Department of Aboriginal Affairs). It was conceived as a means of disseminating these important and difficult-to-access manuscripts through creating a web resource that combines digital manuscript images with transcriptions and Topic Maps that organise various types of knowledge around the images.

Goals and audience

The project will greatly increase access to the manuscripts, enhance them with information of various kinds not found within their pages, and will decrease the physical handling of the originals. In addition, the project will address some other goals. Since the notebooks are not too different from the fieldnotes of today's linguists and others who are documenting far-flung and less-documented languages, the project will provide one model for the encoding, sharing, and archiving of unique linguistic fieldwork data.

We aim to reach audiences interested in Australian languages and linguistics in general, but in particular those interested in the Sydney language and in language support work, such as the description and revitalisation of this language, which was destroyed so rapidly and comprehensively after the colonisation of Sydney. For such languages, old sources provide essential and irreplaceable information. At the same time, these sources need to be interpreted in a way that takes into account the times and environment in which they were written; the motivations, interests, language and skills of the writer; other complementary sources (such as other writers, related languages, and contemporary information from community members); and today's knowledge about Australian languages and linguistics in general.

Most of the project's components are aimed at providing rich and explicit interpretation of the notebook pages, by combining resources using web technologies including Topic Maps. The website will bring together high quality images of the 136 manuscript pages, a digital transcription, modern linguistic renditions, together with further knowledge markup, enabling users to search and navigate not only by content but also by topic, such as "places", "names" or linguistic topics, such as verb paradigms. In addition, "collateral" information – about people, places, maps, history etc – will be included and linked. A future edition of the website will allow interested users to add further comments and resources. The project could help provide researchers with the material they need to come to new understandings about Aboriginal Sydney. For example, one researcher has found that "smudges" inside the cover of the manuscript were recoverable as maps, which provided new evidence about Aboriginal placenames, and, in turn, about possible linguistic boundaries in the Sydney area.

Historical context

The life and character of Dawes himself are inseparable from the content of the notebooks and the project website will reflect this. In 1787, William Dawes, a lieutenant in the Royal Marines, departed England for Botany Bay on HMS Sirius, flagship of the "First Fleet" to Australia. As an astronomer, Dawes had been commissioned to establish an observatory at Sydney and to make meteorological observations; the notebooks contain considerable lexical elaboration of these domains. He also utilised his map-making, engineering and surveying skills, constructing batteries round the harbour area, laying out the government farm (crucial to the colonists' survival) and the first streets in Sydney and Parramatta. He also took part in explorations to the west of Sydney, including the first attempt to cross the Blue Mountains, with his colleague, Watkin Tench.

What is most important for this project is Dawes' interest in the language and people of the area. He developed friendships with Aboriginal people, in particular one girl, Patyegarang, and he compiled vocabularies, grammatical forms, and many expressions in the language that Patyegarang and others taught him. These friendships, however, were to prove his undoing. Dawes' friend Zachary Macaulay wrote: "Dawes is one of the excellent of the earth … with great sweetness of disposition and self-command he possesses the most unbending principles." These principles regularly brought him into conflict with Governor Phillip, and eventually, when Dawes refused to apologise for some of his humanitarian deeds, and despite his ardent wish to stay, Phillip sent him back to England, where he then campaigned against slavery with William Wilberforce.

The Dawes manuscripts were originally part of the library of William Marsden (1754-1836), which he presented to King's College London in 1835. Part of this collection was then transferred to SOAS in 1916. The two small and insignificant-looking Dawes notebooks remained unremarked until they were listed in 1972 by Phyllis Mander-Jones in Manuscripts in the British Isles relating to Australia, New Zealand, and the Pacific, and subsequently came to the attention of linguists such as Robert Dixon (now Professor of Linguistics at La Trobe University). Since then they have continued to generate increasing research interest. Although in good condition, the original manuscripts are vulnerable to damage, particularly the entries and drawings made in pencil. A poor microfilm copy – probably based on a photocopy of the original – is available in the State Library of New South Wales.

Project steps

The first step of the project was to make high resolution digital images of the notebook pages. We commissioned a photographer experienced in copy photography, and used specialised equipment including a Kaiser copy stand, Bowens lighting and a Nikon D2X camera (12.8 megapixel) with a fixed 55mm lens, storing the images as 600 dpi TIFF files.

Next, the pages were fully transcribed and marked up in XML using the Text Encoding Initiative (TEI) vocabulary. Some minor issues of transcription from original sources arose in this step, most notably around the handling of idiosyncratic characters. Finally, stand-off mark-up was added, as discussed below.

Technical design

The transcription is the starting point, or centre, of a network of interpretations, assertions, and claims; and the digital edition is intended not only to represent these but also to facilitate their expansion. With the language long dead, and other information around Dawes sketchy, the speculative nature of many of the assertions connected with the manuscripts leads to potentially competing interpretations and analyses, not just between modern scholars, but between Dawes' own assertions and those of modern scholars (notably in the case of morphological analyses of the language documented).

Although the transcription itself is, of course, linear in nature, the knowledge surrounding that transcription is non-linear, with claims interconnecting with each other (and with other assertions external to the manuscripts), and falling within a number of semantic domains, both horizontally complementary (linguistic / historical / geographical) and vertically hierarchical (starting with the text itself, through apparently explicit claims of the manuscript author and implied claims of the author, up to modern interpretations and analyses). A key factor in the project design was that assertions may contradict each other, and that assertions should be scoped by their respective authorities.
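The idea of scoping assertions by authority can be illustrated with a minimal Python sketch. This is not the project's implementation: the class and function names, the identifier "token-naa", and the placeholder claim values are all assumptions made for illustration, and no real analysis of the word is implied.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str     # what the claim is about (hypothetical identifier)
    claim_type: str  # semantic domain or kind of claim
    value: str       # content of the claim (placeholder text only)
    scope: str       # the authority making the claim


def in_scope(assertions, authority):
    """Return only the assertions made by one authority."""
    return [a for a in assertions if a.scope == authority]


def competing(assertions):
    """Group assertions by (subject, claim_type) and keep the groups
    in which at least two different values are asserted."""
    groups = defaultdict(list)
    for a in assertions:
        groups[(a.subject, a.claim_type)].append(a)
    return {
        key: group
        for key, group in groups.items()
        if len({a.value for a in group}) > 1
    }


# Placeholder data: the values are invented labels, not real analyses.
ASSERTIONS = [
    Assertion("token-naa", "morphological-analysis", "placeholder analysis A", "Dawes"),
    Assertion("token-naa", "morphological-analysis", "placeholder analysis B", "Troy 1994"),
    Assertion("token-naa", "domain", "linguistic", "editors"),
]

for (subject, claim_type), group in competing(ASSERTIONS).items():
    for a in group:
        print(f"{subject} / {claim_type}: {a.value!r} (according to {a.scope})")
```

Because every assertion carries its own scope, contradictory claims can sit side by side without either being overwritten, and a reader can filter the edition down to a single authority's view.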

The Topic Maps [3] paradigm was therefore selected as the most suitable system for organising the various types of knowledge. Topic Maps are inherently non-linear, scopeable, and extensible. The transcription itself is left minimally annotated, while all the collateral material is stored separately within a Topic Map, with references into the digital editions where necessary ("stand-off mark-up"). Initially, the majority of the content in this component of the digital edition will be drawn from Jakelin Troy's linguistic analysis; however, the easily extensible nature of the paradigm means that additional analyses and commentary may easily be added, possibly even directly through an editorially moderated Wiki-style online interface.
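As a rough sketch of the stand-off principle, the following Python fragment keeps a minimally annotated transcription separate from the records that point into it. The XML fragment is only TEI-like (no TEI namespace or header), and the identifiers "w1" and "n1", the topic names, and the record layout are assumptions for illustration rather than the project's actual encoding.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Minimally annotated transcription: only identifiers, no interpretation.
TRANSCRIPTION = """
<text>
  <body>
    <p><w xml:id="w1">naa</w> <name xml:id="n1">Benelong</name></p>
  </body>
</text>
"""

# Stand-off records stored apart from the transcription (hypothetical layout).
STANDOFF = [
    {"target": "#w1", "topic": "word-naa", "authority": "Troy 1994"},
    {"target": "#n1", "topic": "person-benelong", "authority": "editors"},
]


def index_by_id(xml_text):
    """Map each xml:id in the transcription to its element."""
    root = ET.fromstring(xml_text)
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}


def resolve(annotations, index):
    """Pair each stand-off record with the transcribed text it points to."""
    resolved = []
    for ann in annotations:
        element = index.get(ann["target"].lstrip("#"))
        if element is not None:
            resolved.append((ann["topic"], element.text, ann["authority"]))
    return resolved


for topic, text, authority in resolve(STANDOFF, index_by_id(TRANSCRIPTION)):
    print(f"{topic}: {text!r} (source: {authority})")
```

New analyses can then be added by appending records, without touching the transcription file itself.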

Each item in the linear transcription – whether the word token "naa", the smudges that may be maps, or the name "Benelong" and the person it refers to – doubles as the reification of a node in the Topic Map, allowing the user to navigate between the different representations with ease. As a single topic in a Topic Map may be reified by reference to any number of addressable objects (both digital and physical), the knowledge stored within the Topic Map will additionally point where possible to locations outside of the digital edition, for instance to Google Earth for physical locations, to resources concerning comparable languages, to libraries and museums for cultural contextualisation – and hopefully to further projects which adopt a similar approach.
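The navigation described here, from an item in the transcription to its topic and onward to other mentions and external resources, might be sketched in Python as follows. The classes, the reference strings "#n1" and "#n7", and the example.org addresses are hypothetical; a real system would use an XML Topic Maps engine rather than these hand-rolled structures.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    name: str
    internal_refs: list = field(default_factory=list)  # pointers into the edition
    external_refs: list = field(default_factory=list)  # pointers outside it


class TopicMap:
    def __init__(self):
        self._topics = {}
        self._by_ref = {}

    def add(self, topic):
        """Register a topic and index it by each transcription reference."""
        self._topics[topic.topic_id] = topic
        for ref in topic.internal_refs:
            self._by_ref[ref] = topic.topic_id

    def topic_for(self, ref):
        """Go from an item in the transcription to its topic, if any."""
        topic_id = self._by_ref.get(ref)
        return self._topics.get(topic_id) if topic_id else None

    def other_mentions(self, ref):
        """Go from one mention to the other mentions of the same topic."""
        topic = self.topic_for(ref)
        if topic is None:
            return []
        return [r for r in topic.internal_refs if r != ref]


# Hypothetical data: the identifiers and addresses are placeholders.
topic_map = TopicMap()
topic_map.add(
    Topic(
        topic_id="person-benelong",
        name="Benelong",
        internal_refs=["#n1", "#n7"],
        external_refs=["https://example.org/placeholder-resource"],
    )
)

topic = topic_map.topic_for("#n1")
print(topic.name, topic_map.other_mentions("#n1"), topic.external_refs)
```

Keeping internal and external references in separate lists lets the edition distinguish jumps within the notebooks from links that leave it.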

References

[1] (SOAS MS 41645: The books of Lieutenant William Dawes, School of Oriental and African Studies, London. Marsden Collection; MS 41645b. Vocabulary of the language of New South Wales in the neighbourhood of Sydney by William Dawes, 1790. ML FM4/3431)

[2] Troy, Jakelin. 1994. The Sydney language. Canberra: AIATSIS

[3] First described in ISO/IEC 13250:2000 Topic Maps: Information Technology -- Document Description and Markup Languages, Michel Biezunski, Martin Bryan, Steven R. Newcomb, ed., 3 Dec 1999; the flavour used in the project is that of XML Topic Maps (