Report on the Digital Humanities 2010 conference (DH2010)

General Information

Digital Humanities 2010, the annual international conference of the Alliance of Digital Humanities Organisations (ADHO), was hosted by the Centre for Computing in the Humanities (CCH) and the Centre for e-Research (CeRch) at King's College London from 7th to 10th July. The conference, whose theme was “Cultural expression, old and new”, featured opening and closing plenary talks given by Charles J. Henry and Melissa Terras respectively, the Busa Award and Lecture given by Joseph Raben, three days of academic programme with four strands each, and four poster sessions, as well as a complementary social programme. The highlights of the latter included a conference dinner in the Great Hall at Lincoln's Inn, and excursions to Hampton Court, to the Tate Britain and Tate Modern museums, and to the Globe Theatre and Exhibition. The conference website is at <, from which the conference abstracts can be downloaded. Discussions could also be followed on Twitter at <. The three-day programme of parallel sessions began on Thursday morning; the following is a summary of the panels and paper sessions attended.

Session one

The first session attended was a panel session, entitled “Building the Humanities Lab: Scholarly Practices in Virtual Research Environments” and moderated by Charles van den Heuvel and Smiljana Antonijevic of the Royal Netherlands Academy of Arts and Sciences, in which six panellists presented their work on virtual research environments:

  1. David Bodenhamer, Director of the Polis Center at Indiana University Purdue University Indianapolis (IUPUI), introduced his centre < a multi-disciplinary research unit founded in 1989. The centre adopted GIS as its technology of choice in 1992 and has six focus areas: geoinformatics, community informatics, health geographics, spatial informatics, humanities GIS, and community collaborations. The centre is a matrix organisation with 25 FTE and employs strong central project management and industry-standard practices. It is entirely project-based, running between 50 and 100 projects at a time, and is self-funded from a diverse funding base that includes consultancy work. One of its key focus areas is the Virtual Center for Spatial Humanities, a Humanities GIS with book and journal publications in preparation.
  2. Fotis Jannidis of the University of Würzburg presented on the BMBF-funded Textgrid project < The project is now entering its second phase, and a number of resources and tools are already at researchers' disposal for retrieval and re-use. The integrated tools in the user environment include data capture, editing, analysis, workflows, annotation, and publication. The front-end is the Textgridlab software; the services on the service layer include, among others, tokenize, lemmatize, sort, transform, and collate; the back-end is provided by the Grid. Textgrid is based on Lucene and eXist. Textgrid is now a library-based project, with academics serving as advisers. Some of the challenges include: providing robust, production-ready tools and sustainability for both services and data; the need for structure and an architecture while keeping the system open and its development collaborative; and offering an integrated user interface while also allowing user-defined processing pipelines.
  3. Bethany Nowviskie of the University of Virginia Scholars' Lab presented on the NINES VRE < NINES is a collaborative research environment for nineteenth-century scholarship, which models an alternative to traditional peer-reviewed publishing. Peer review is the core function of NINES; there are currently 670,000 digital objects available in an entirely scholar-driven environment. While the social functions in the community have been less successful, work concentrates on resource discovery rather than on a VRE conceived as an online environment that has to be studied before it can be used. Essentially, NINES establishes federated peer review and resource discovery of quality online materials.
  4. Geoffrey Rockwell of the University of Alberta, Canada, gave a presentation on the TAPoR Portal < which is being moved onto a Grid infrastructure. The Portal is a Web-services broker for the TAPoR tools; the second-generation tools, successors to the original TAPoRware tools, are known as Voyeur. There is also an accompanying web of tutorials, documentation, and a wiki. Some challenges remain: building a portal is tricky, and the complexity of learning about and using it can be prohibitive; tools pose a challenge as well, since humanists think in terms of texts rather than tools, so tools need to be plugged into texts; lastly, community building is difficult, and methods and research practices are important aspects of it. There are, however, benefits too: Web-based tools are easier to develop and maintain and are adaptable to different environments, and it is possible to offer APIs to the community. Funding is always an issue: TAPoR seeks sustainability through collaboration with the HPC infrastructure, and libraries are considered to be permanent homes for projects like this.
  5. Mike Priddy of King's College London gave a presentation on the EU-funded DARIAH project < which develops a digital research infrastructure for the Arts and Humanities. DARIAH's roots are in VREs; it was originally planned as a way to widen the reach of e-science. The ultimate aim of the project is a Virtual Research Community (VRC) that is based on the collective intelligence of the community and establishes an architecture of participation. It aims to be a Social VRE, one that is institutionally grounded in everyday research and empowers research that is open and re-usable. The challenges include interoperability (integration of data), collaboration that can lead to collective intelligence, integration of tools and services into everyday research, and software engineering versus humanities practices. Conceptually, VREs are second-generation digital libraries, in which the concept of “research objects” is foregrounded.
  6. Joris van Zundert of the Department of Software R&D at the Huygens Institute, Royal Netherlands Academy of Arts and Sciences, presented on AlfaLab, a digital research infrastructure < The project is a collaboration between computer science and humanities divisions. It takes the “distant reading” paradigm, coined by Franco Moretti, as its underlying concept. AlfaLab conceptualizes VREs as highly specific applications for very specific research questions of specific communities. AlfaLab comprises TextLab, GISLab, and LifeLab; underlying them is a common technical infrastructure that facilitates very basic functions, e.g. annotations across corpora and collections. AlfaLab 2.0 is currently planned as a support service that enables digital researchers to innovate, provides support, and offers sustainability.

Session two

In the second session, entitled “TEI”, Piotr Bański of the University of Warsaw and Adam Przepiórkowski of the Institute of Computer Science, Polish Academy of Sciences, presented on “TEI P5 as a Text Encoding Standard for Multilevel Corpus Annotation”. The presenters introduced the National Corpus of Polish (NCP) project < a three-year project that involves four institutions. It builds on several previous corpus projects and employs multiple levels of linguistic annotation; one million words have been annotated manually at a detailed level. The workflow starts with the source text, which is segmented and then annotated at the levels of morphosyntax, syntactic words, syntactic groups and named entities, and word senses (kept in a sense inventory). Several standards were investigated to provide the encoding for the corpus: TEI P5, the XML Corpus Encoding Standard (XCES), and the ISO TC37/SC4 family of standards, including LAF (Linguistic Annotation Framework). The pros and cons of the standards were reviewed and weighed, and a decision for TEI P5 was made, primarily because of its good documentation and solid user base. The NCP project has customized the P5 schema for its specific purposes by comparing the solutions offered by TEI to those of the other standards and then recommending the customization as best practice for the project. The seg (segment) element is used for sentences, phrases, words, etc. instead of feature structures, and ptr is used to point to immediate constituents, as it allows for encoding discontinuities, which are very frequent in Polish. TEI has proven to be a reasonable choice, as it is sufficiently flexible to implement data models proposed by a variety of standards.
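
By way of illustration, the fragment below is a minimal sketch, not taken from the NCP schema itself, of how seg and ptr can be combined in TEI P5 to represent a (possibly discontinuous) syntactic group over previously identified word segments; the identifiers and the Polish example sentence are invented for the purpose.

    <!-- Segmentation layer: a sentence segment containing word segments (illustrative identifiers) -->
    <seg xml:id="s1" type="sentence">
      <seg xml:id="w1" type="word">Ala</seg>
      <seg xml:id="w2" type="word">ma</seg>
      <seg xml:id="w3" type="word">kota</seg>
    </seg>

    <!-- Syntactic layer: a group segment points to its immediate constituents;
         a discontinuous group simply lists non-adjacent targets -->
    <seg xml:id="g1" type="group">
      <ptr target="#w1"/>
      <ptr target="#w3"/>
    </seg>

Because the higher-level group is expressed purely through pointers rather than containment, annotation layers of this kind can be kept apart from the base text and still encode constituents that are interrupted by other material.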

John Walsh of Indiana University gave a paper entitled “'It’s Volatile': Standards-Based Research & Research-Based Standards Development”. Digital Humanities scholarship connects humanities scholarship with technological research and development, such as standards development. Unicode is one of the most important standards for the digital humanities. The Chymistry of Isaac Newton project < uses Unicode extensively to digitise, edit, study, and analyse the alchemical works of Isaac Newton and to develop various scholarly tools around the collection. The project has made a proposal to the Unicode Consortium to register all the alchemical symbols found in Newton's alchemical writings. The proposal process is a thorough, peer-reviewed one; the project's proposal is currently in its final stages and is due to be included in the Unicode 6.0 release of the standard. One of the main challenges of the project was that alchemical language is by its nature imprecise, full of allusion and uncertainty, which conflicts with the stringent requirements of character standardisation. The graphic and pictorial nature of alchemical language also makes glyphs very important carriers of meaning, which sits uneasily with Unicode's idea of abstract characters that can be represented by a number of different glyphs. The solutions employed involved additional layers of meaning used in combination with standard Unicode characters, additional markup, the use of images, and the representation of variants of characters and glyphs. The Unicode proposal process has been a rewarding experience and, as it was successful, it has benefited the scholarly community and potentially a much wider audience.
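
For readers unfamiliar with what such interim solutions look like in practice, the sketch below uses the standard TEI P5 character declaration mechanism to tie a locally defined alchemical glyph to an image and to a Unicode code point. It is a hypothetical illustration, not an excerpt from the Chymistry of Isaac Newton encoding, and the code point, file names, and identifiers are placeholders.

    <!-- In the TEI header (encodingDesc): declare a local glyph for an alchemical symbol -->
    <charDecl>
      <glyph xml:id="alch-var1">
        <glyphName>Variant alchemical symbol (illustrative)</glyphName>
        <!-- Mapping to a code point in the Unicode 6.0 Alchemical Symbols block
             (U+1F700 to U+1F77F); the exact value here is a placeholder -->
        <mapping type="Unicode">&#x1F700;</mapping>
        <graphic url="images/alch-var1.png"/>
      </glyph>
    </charDecl>

    <!-- In the transcription: reference the declared glyph -->
    <p>Dissolve <g ref="#alch-var1"/> in the menstruum ...</p>

A declaration of this kind lets a project display an image or a local font glyph today while recording the intended abstract character, so that texts can be migrated to plain Unicode once the proposed characters are published.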

Deborah Anderson of UC Berkeley presented on “Character Encoding and Digital Humanities in 2010 – An Insider's View”, a paper related to the previous one in the same session. Unicode < is now the most widely used encoding standard on the Web, and TEI's endorsement of the standard has been phenomenally helpful in establishing it as the encoding standard of choice for the electronic transmission of text. Unicode 5.2 currently defines over 100,000 characters for over 75 scripts, and Unicode 6.0 is about to be finalized and published. While Unicode offers a vast array of character choices to the digital humanist, its breadth and depth can also be confusing. This is particularly true of the approval process for new characters, which the presenter explained: it involves two standards committees, the Unicode Technical Committee (UTC) and ISO/IEC JTC1/SC2 WG2, and usually takes more than two years. The UTC is primarily concerned with technical questions and computer science applications, and its discussions are often based on potential issues for computer systems and electronic text. The ISO committee is made up of national representatives of all member countries, and its discussions are sometimes more political in nature; support from a national body is highly advisable, and sometimes essential, for proposals to be accepted into the international standard. Being aware of the interests and backgrounds of each standards group and its members can help explain what may appear to be a spotty set of characters in Unicode.

Session three

The third session, headed “Archives”, began with a paper by Peter Doorn and Dirk Roorda of Data Archiving and Networked Services, Netherlands, entitled “The ecology of longevity: the relevance of evolutionary theory for digital preservation”. The presenter borrowed notions from evolutionary theory for thinking about digital longevity or “digital survival”. Thus the concept of “sustainability”, which in biology refers to ecological systems that remain diverse and productive, is analogous to preservation in the digital environment. Digital objects are intended for the long term, but only short-term survival can realistically be accomplished. The preservation of individual objects may be less effective than making them richer and interoperable. Preservation strategies are either emulation, which preserves the environment of the data, or migration, which adapts the data to the environment. Current digital preservation thinking holds that data should first be preserved and then re-used; however, this has the perverse consequence that preserving too well actually prevents re-use. Moreover, in evolutionary theory systems get rid of unused functions, so it is more suitable to re-use first and thereby preserve. Evolution is based on mutation and on copies, not originals: making copies preserves data. Reproduction in ecological systems can therefore be directly linked to the interoperability of data; “mating” your data will preserve it. An ecology of sustainable information must be governed by certain rules: a financial incentive to preserve is helpful, optimization of both value and cost is necessary, data use is instrumental in preservation, and the actors involved must constantly be aware of the chances and risks of survival of the digital data they curate and use.

The second paper, entitled “The Specimen Case and the Garden: Preserving Complex Digital Objects, Sustaining Digital Projects”, was given by Melanie Schlosser and H. Lewis Ulman of Ohio State University. The presenters outlined the preservation of digital humanities projects from a librarian's and a scholar's point of view. Preservation too often means taking a snapshot of living projects and storing it away, where the projects die from lack of use. From a scholarly perspective, digital projects led by scholars are often ambitious and complex, more often than not a labour of love of individuals, and often situated on the fringes of the official institutional preservation infrastructure. From a librarian's perspective, standardisation is paramount in order to support faculty research effectively, and there are few resources available for unique projects; the library is nevertheless committed to offering advice and to preserving projects. The NEH-funded project “Reliable Witnesses: Integrating Multimedia, Distributed Electronic Textual Editions into Library Collections and Preservation Efforts” < a collaboration between faculty and librarians, can serve as an example. For preservation purposes ('the specimen case'), the library has adopted a low-level approach to preserving projects, namely capturing all pieces of a project as METS files, employing a dedicated, registered METS profile. The digital projects are ideally sustained in their production environments ('the garden'); this requires working closely with and connecting all the staff involved in digital projects in order to support them in creating robust, sustainable projects. This makes it possible to curate a set of completed projects that comply with certain criteria and to sustain them beyond the time of involvement of the original creators.
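
The following is a minimal, hypothetical METS sketch of the 'specimen case' approach: every file belonging to a project is inventoried in a file section and bound together by a structural map, with the governing profile named on the root element. The profile URI, identifiers, and file names are placeholders, not those of the project's actual registered profile.

    <mets xmlns="http://www.loc.gov/METS/"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          PROFILE="http://example.org/profiles/reliable-witnesses-mets-profile.xml">
      <fileSec>
        <fileGrp USE="edition">
          <file ID="f-text" MIMETYPE="application/xml">
            <FLocat LOCTYPE="URL" xlink:href="edition/text.xml"/>
          </file>
          <file ID="f-video" MIMETYPE="video/mp4">
            <FLocat LOCTYPE="URL" xlink:href="media/witness-interview.mp4"/>
          </file>
        </fileGrp>
      </fileSec>
      <structMap>
        <div TYPE="project" LABEL="Multimedia electronic edition">
          <div TYPE="text"><fptr FILEID="f-text"/></div>
          <div TYPE="multimedia"><fptr FILEID="f-video"/></div>
        </div>
      </structMap>
    </mets>

Packaging a project this way records what the pieces are and how they fit together, even if the original production environment can no longer be run.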

The final paper of the session, entitled “Digital Libraries of Scholarly Editions”, was authored by George Buchanan of the School of Informatics, City University London, and Kirsti Bohata of the Centre for Research into the English Literature and Language of Wales (CREW), Swansea University. Digital editions are complex, often bespoke, dynamic objects, which are difficult to preserve. The presenters are working on establishing common denominators of the scholarly electronic editions created in Wales that will allow them to create a central “library” of digital editions. Digital libraries have been a major movement in computer science since at least 1995. Digital editions are best organized into collections, but digital libraries are often not designed to hold complex objects; instead they reduce content, render it static, and atomize it. While digital library systems can provide preservation and collection-level description, little attention is paid to the internal organisation of scholarly editions. A user study of producers and consumers of digital editions has been undertaken to establish the common features of scholarly editions. The presenters aim to develop extensions to the Greenstone Digital Library system to bridge the gap between digital library software development and the specific requirements of digital editions. In particular, the extension of the Greenstone software is intended to support both text and commentary, dynamic content, and collections of content, and to provide interoperability.

Session four

In a session headed “Literature”, John Walsh of Indiana University gave a paper entitled “'Quivering Web of Living Thought': Mapping the Conceptual Networks of Swinburne's Songs of the Springtides”. The presenter introduced a new section of the Swinburne Archive < containing experimental work. The starting point for the experiments is the postmodernist view of texts as nodes in networks that operate on both intratextual and intertextual levels. Swinburne himself used the metaphor of networks and nodes in a web of intertextuality. His poetry is highly allusive and referential, and employs its own collection of symbols, images, and concepts. His “Songs of the Springtides” collection is an excellent example of such a deliberately designed volume of networks: the poems have been organized into three parts with a transitional sonnet and a final ode. The thematic networks employed include: literary figures; poetry, music, and song; the passage of time; celestial bodies: sun, moon, and stars; and the sea. This highly structured volume of poetry is almost schematic in its exploration of these topics. The texts have been marked up using the TEI P5 encoding standard, and keywords are used to encode these concepts. A P5 taxonomy is used, and keywords are encoded as <seg type="keyword">...</seg>. The presenter demonstrated an interface that allowed clusters of keywords to be selected and highlighted in individual poems, and also to be seen in a bird's eye view of the entire volume of poetry, thus offering a unique view of the nodes at work in the overall structure of the volume.
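
To make the encoding concrete, the fragment below sketches how such a setup might look in TEI P5: a taxonomy of thematic categories declared in the header and keyword segments in the poems pointing to those categories. The category labels echo the thematic networks listed above, but the identifiers, the @ana linking, and the line of verse are illustrative assumptions rather than excerpts from the Swinburne Archive files.

    <!-- In the TEI header (encodingDesc): a taxonomy of the volume's thematic networks -->
    <classDecl>
      <taxonomy xml:id="themes">
        <category xml:id="sea"><catDesc>The sea</catDesc></category>
        <category xml:id="celestial"><catDesc>Celestial bodies: sun, moon, and stars</catDesc></category>
        <category xml:id="time"><catDesc>The passage of time</catDesc></category>
      </taxonomy>
    </classDecl>

    <!-- In the text of a poem: keyword segments pointing at taxonomy categories -->
    <l>Between the <seg type="keyword" ana="#sea">wave</seg> and the
       <seg type="keyword" ana="#celestial">star</seg> ...</l>

An interface of the kind demonstrated could then gather all keyword segments that share a category reference, highlighting a thematic cluster within a single poem or across the whole volume.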