A Collaborative Linguistic Research Interface for the 1641 Depositions

poster / demo / art installation
Authorship
  1. Deirdre O'Regan

    King's College - University of Aberdeen, School of Language and Literature - University of Aberdeen

  2. Barbara Fennell

    King's College - University of Aberdeen, School of Language and Literature - University of Aberdeen

  3. Séamus Lawless

    Knowledge and Data Engineering Group - Trinity College Dublin

  4. Mark Sweetnam

    King's College - University of Aberdeen, School of Language and Literature - University of Aberdeen

Work text
O’Regan, Deirdre, School of Language and Literature, University of Aberdeen, King’s College, UK, deirdre.oregan@gmail.com
Sweetnam, Mark, School of Language and Literature, University of Aberdeen, King’s College, UK, sweetnammark@gmail.com
Fennell, Barbara, School of Language and Literature, University of Aberdeen, King’s College, UK, b.a.fennell@abdn.ac.uk
Lawless, Seamus, Knowledge and Data Engineering Group, Trinity College Dublin, Ireland, seamus.lawless@scss.tcd.ie
This poster presents an account of the development of a collaborative research environment for the socio-historical linguistic exploration of a unique seventeenth-century resource of Irish national importance and international significance. It also traces, from inception to application, a highly interdisciplinary collaboration between academia and industry that forms part of an evolving set of DH projects.

The '1641 Depositions', held at Trinity College Dublin, comprise some 8,000 personal statements in which mainly Protestant men and women of all classes told of their experiences following the outbreak, in 1641, of the rebellion by the Catholic Irish. Collected by government-appointed commissioners, the witness testimony runs to approximately 20,000 pages and constitutes the chief evidence for the sharply contested allegation that the rebellion began with a general massacre of Protestant settlers. As a result, this material has been central to protracted and bitter historical dispute.

This body of material, unparalleled elsewhere in Early Modern Europe, provides a unique source of information on the causes of, and events surrounding, the 1641 Rebellion and on the social, economic, cultural, religious, political and linguistic history of seventeenth-century Ireland, England and Scotland. In addition, the depositions vividly document various colonial and 'civilizing' processes, including the spread of Protestantism in one of the remotest regions of the Stuart kingdoms and the introduction of lowland agricultural and commercial practices, together with the native response to these developments.

Following the recent completion of a three-year process of digitizing, transcribing and annotating the 1641 Depositions, the resulting Text Encoding Initiative (TEI) encoded corpus has become available for digital enhancement and analysis. The Arts and Humanities Research Council (AHRC) of the United Kingdom has funded the next generation of research on this corpus under the auspices of its ‘Digital Equipment and Database Enhancement for Impact’ programme. ‘Language and Linguistic Evidence in the 1641 Depositions’ is a multi-disciplinary Digital Humanities project designed to create an interactive computer environment in which scholars interested in historical linguistics, corpus analysis, forensic linguistics and discourse analysis can work together with historians, Early Modern prose scholars and other specialists to interrogate these valuable resources, exploiting new methods of personalization, visualization and collaboration.
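To illustrate what a TEI-encoded transcription makes possible, the following is a minimal Python sketch that pulls names, places and dates out of a small TEI fragment. The sample document is invented for illustration and is not taken from the corpus; the use of persName, placeName and date elements is an assumption about how such a transcription might be tagged, not a description of the project's actual encoding.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; element names below are standard TEI elements.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written sample; NOT taken from the 1641 corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample deposition</title></titleStmt>
      <publicationStmt><p>Illustrative only</p></publicationStmt>
      <sourceDesc><p>Invented example</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><persName>Jane Doe</persName> of
         <placeName>Exampletown</placeName> sworne and examined
         <date when="1642-06-01">1 June 1642</date> saith ...</p>
    </body>
  </text>
</TEI>"""


def element_texts(root, tag):
    """Return the whitespace-normalized text of every TEI element named tag."""
    return [" ".join("".join(el.itertext()).split())
            for el in root.iterfind(f".//tei:{tag}", TEI_NS)]


def summarize(tei_xml):
    """Collect title, people, places and machine-readable dates from a TEI string."""
    root = ET.fromstring(tei_xml)
    return {
        "title": element_texts(root, "title"),
        "people": element_texts(root, "persName"),
        "places": element_texts(root, "placeName"),
        # The 'when' attribute carries the normalized date, if the encoder supplied one.
        "dates": [el.get("when") for el in root.iterfind(".//tei:date", TEI_NS)],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Because the markup is explicit, the same few lines would work across many documents, which is the kind of enhancement and analysis an encoded corpus invites.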

The 1641 Collaborative Linguistic Research and Learning Environment (CLRLE) has been developed using Omeka, an open-source digital archival collections management system that has become a popular tool in the Digital Humanities for archiving, publishing and managing access to primary source materials such as documents, images, transcriptions and other multimedia resources (see http://omeka.org/). Omeka is an ideal tool for exploring the concept of a collaborative research interface, since it doubles as a content management system and offers myriad possibilities for personalization and collaboration amongst users.

The resulting web-based portal houses fully searchable records of the 1641 Depositions as ‘Items’ in various ‘Collections’, as is typical in Omeka-powered applications. Privileged users of the interface can collaboratively manage the archive and its content: editing Collection and Item metadata (e.g. deposition transcriptions, dates, and the names of deponents and commissioners), annotating and tagging Items, and contributing specialist content to public web pages on the site.
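The relationship between Items, Collections, metadata and privileged editing can be sketched in a few lines of Python. This is a plain in-memory model written for illustration only: it does not use or reproduce Omeka's own schema or code, and the class names, role names and sample record are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical role names for illustration; not Omeka's actual role list.
EDITOR_ROLES = {"admin", "editor"}


@dataclass
class Item:
    """One record, e.g. a single deposition, with free-form metadata and tags."""
    title: str
    metadata: dict = field(default_factory=dict)
    tags: set = field(default_factory=set)


@dataclass
class Collection:
    """A named group of Items that can be searched as a unit."""
    name: str
    items: list = field(default_factory=list)

    def search(self, term):
        """Case-insensitive substring match on titles, metadata values and tags."""
        needle = term.lower()

        def matches(item):
            haystack = [item.title, *map(str, item.metadata.values()), *item.tags]
            return any(needle in text.lower() for text in haystack)

        return [item for item in self.items if matches(item)]


def edit_metadata(role, item, key, value):
    """Update one metadata field, but only for privileged roles."""
    if role not in EDITOR_ROLES:
        raise PermissionError(f"role {role!r} may not edit metadata")
    item.metadata[key] = value


def build_demo():
    """Build a one-item collection from invented data."""
    item = Item(
        "Deposition of Jane Doe",
        {"deponent": "Jane Doe", "date": "1642-06-01"},
        {"robbery"},
    )
    return Collection("Sample county", [item])
```

For example, `build_demo().search("jane")` returns the single sample Item, while `edit_metadata("visitor", ...)` raises `PermissionError`, mirroring the distinction between ordinary and privileged users.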

A central part of this project has been knowledge exchange with IBM’s LanguageWare Research and Development Team. LanguageWare (http://www.alphaworks.ibm.com/tech/lrw) is IBM’s natural language processing software and is part of the Unstructured Information Management Architecture (UIMA) framework (http://uima.apache.org/). Researchers have addressed the challenge of applying this software, designed for contemporary language analysis, to the highly problematic “dirty data” of the 1641 corpus, with its variable spellings, morphological instability and syntactic complexity. This involved integrating a suite of software into a domain-specific UIMA pipeline, which offered a level of accuracy comparable to that achieved by manual annotators (Sweetnam and Fennell, 2010). The work has allowed the identification of important processes of linguistic change and has enabled linguists to trace the development of English in this unique Early Modern corpus. A crucial element of CLRLE will be the integration and exploitation of the results of this analysis.
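The variable-spelling problem described above can be made concrete with a small Python sketch. It is not LanguageWare, UIMA or the project's pipeline: it swaps in a much simpler technique, a lookup table of known variants with a fuzzy-matching fallback from the standard library's difflib. The variant table, the word list and the 0.8 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical variant table; a real one would be derived from the corpus.
KNOWN_VARIANTS = {
    "sayth": "saith",
    "saieth": "saith",
    "robbd": "robbed",
    "howse": "house",
}

# Hypothetical target lexicon of normalized forms.
LEXICON = ["saith", "robbed", "house", "cattle", "rebels", "goods"]


def normalize_token(token, cutoff=0.8):
    """Map one token to a normalized form, or return it lowercased if unknown."""
    lower = token.lower()
    # Step 1: exact lookup in the table of attested variants.
    if lower in KNOWN_VARIANTS:
        return KNOWN_VARIANTS[lower]
    # Step 2: already a normalized form.
    if lower in LEXICON:
        return lower
    # Step 3: fuzzy fallback; accept the closest lexicon entry above the cutoff.
    close = difflib.get_close_matches(lower, LEXICON, n=1, cutoff=cutoff)
    return close[0] if close else lower


def normalize(text):
    """Tokenize on runs of ASCII letters and normalize each token."""
    return [normalize_token(tok) for tok in re.findall(r"[A-Za-z]+", text)]


if __name__ == "__main__":
    print(normalize("She sayth the rebells robbd her howse"))
```

Here "sayth", "robbd" and "howse" are resolved by the table, while "rebells" is caught only by the fuzzy fallback. A high cutoff keeps short function words such as "she" or "her" from being pulled towards unrelated lexicon entries, at the cost of missing more distant variants.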

A particularly valuable feature of the Omeka-powered CLRLE is the provision of an interactive Exhibit Builder tool enabling users to create personalized ‘Exhibits’ of their research outcomes. These Exhibits draw together a highly extensible collection of reusable research objects, including transcribed depositions and associated metadata, dynamic visualizations, the outputs of statistical linguistic analyses and GIS displays. They facilitate a high level of research cooperation and dissemination, and also have considerable pedagogical and outreach applications.

This poster charts the successful completion of a multi-disciplinary collaboration involving the adaptation of new and evolving open source technologies for humanities research, significant knowledge exchange between industry and academia, and the interaction of a range of private and public institutions including the University of Aberdeen, Trinity College Dublin, Lancaster University, IBM LanguageWare and the Irish Digital Humanities Observatory (http://dho.ie/). The outcomes of this project offer valuable lessons for future undertakings in the Digital Humanities. CLRLE is an exemplar of the potential impact of modern technology and new methodologies on our understanding of historical resources, their underlying processes and their continuing contemporary relevance.

References:
Sweetnam, Mark and Fennell, Barbara (2010). “Natural Language Processing and Early Modern Dirty Data,” Proceedings of the Chicago Colloquium on Digital Humanities.

1641 Depositions Project, Trinity College Dublin (link) 15 Mar 2011

TEI Consortium, (ed.) TEI P5: Guidelines for Electronic Text Encoding and Interchange, (link) 30 Aug 2010

Omeka. Version 1.2.1, Center for History and New Media (CHNM), George Mason University (link) 30 Aug 2010

Conference Info

ADHO - 2011
"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None