Reading the Readers: Modelling Complex Humanities Processes to Build Cognitive Systems

Melissa Terras

Authorship

1. Melissa Terras

School of Library, Archive and Information Studies - University College London

Original URL

http://web.archive.org/web/20040903094154/http://www.hum.gu.se/allcach2004/AP/html/prop9.html

Work text


The Roman stylus tablets from Vindolanda are an unparalleled resource for scholars of the ancient world, giving the sole immediate account of the Roman occupation of Britain (Bowman and Thomas 2003). Unfortunately, the physical condition of the stylus tablets renders them illegible to the human eye. Novel imaging techniques were developed at the Department of Engineering Science, University of Oxford, to analyse these texts (Bowman, Brady et al. 1997; Brady et al. 2003). Whilst scrutiny of the document surface using image processing techniques provided new information, it was also necessary to develop a computer system to aid the historians in reading and interpreting the images themselves, and so speed up the reading process. This paper describes the steps taken to understand the processes used by the papyrologists, in order to build such a computer system.

Before designing and building any tools to aid papyrologists in the reading of texts, it is first necessary to ask: just what does a papyrologist do when trying to read and understand an ancient text? Although the readings generated from ancient documents provide one of the major primary information sources for classicists, linguists, archaeologists, historians, palaeographers, and scholars from associated disciplines, surprisingly little research has been carried out into how an expert constructs meaning from deteriorated and damaged texts (Terras 2002).

This paper discusses an interdisciplinary approach to modelling a complex humanities process, in which techniques from artificial intelligence, cognitive psychology, knowledge elicitation, computational linguistics, and computational content analysis are combined to produce a proposed model of how experts read ancient documents. This representation was subsequently used as the basis for the development of a computer system which can aid historians in the reading of the Vindolanda texts.

The problem with trying to discover the process that papyrologists go through whilst reading an ancient text is that experts are notoriously bad at describing what they are expert at (McGraw and Harbison-Briggs 1989). The primary questions to be asked in this study were: is there a general process that experts use when reading ancient texts? Can this procedure be elucidated? Additionally, what are the differences and similarities between individual experts’ approaches to the problem? In trying to answer these questions, computational techniques were employed to interrogate and manipulate any data collected, and assist in developing a model of how experts operate in the given domain.

Figure 1: Stylus tablet 836, one of the most complete stylus tablets unearthed at Vindolanda. This text is a letter from Albanus to Bellus, containing a receipt and further demand for payment of transport costs. The incisions on the surface can be seen to be complex, whilst the woodgrain, surface discoloration, warping, and cracking of the physical object demonstrate the difficulty papyrologists have in reading such texts.

A process of Knowledge Elicitation was used to gain an understanding of the general techniques the experts utilise when approaching an ancient text, and specifically the Vindolanda tablets. Knowledge Elicitation is a technique borrowed from artificial intelligence, in which the behaviour of experts is analysed to understand the knowledge and procedures they use whilst carrying out expert tasks (Diaper 1989). It consists of well-defined stages. Firstly, the domain literature was researched. Secondly, any other associated literature was collated. Although not a direct comment on the act of reading and transcribing, the two published volumes regarding the Vindolanda ink tablets contain detailed apparatus of the individual texts (Bowman and Thomas 1983; Bowman and Thomas 1994). Electronic versions of these were subjected to Content Analysis techniques and linguistic analysis (using WordSmith and TACT) to detect any underlying structures and decision matrices.
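The kind of content and linguistic analysis described above can be illustrated with a minimal sketch. It is not a reconstruction of the WordSmith or TACT workflow used in the study; it only shows two basic operations such tools provide, a word-frequency count and a keyword-in-context (KWIC) listing. The function names and the sample commentary text are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on either side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Invented sample commentary, NOT quoted from the published volumes.
SAMPLE = (
    "The first letter is probably s. The trace after it is possibly a or r. "
    "The reading is uncertain, but the word is probably a name."
)

print(word_frequencies(SAMPLE).most_common(3))
for line in kwic(SAMPLE, "probably"):
    print(line)
```

Listing hedging terms such as "probably" in context is one simple way to surface where an editor records a decision between competing readings.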

Three experts who were working on the Vindolanda texts, and who were willing to take part in this investigation, were then identified. Think Aloud Protocols, in which each expert is set specific tasks and asked to describe their thought processes, were carried out. Transcripts from these sessions provided data from which an explicit and quantitative representation of the way the papyrologists read damaged and abraded texts could be generated. General procedural information was also collated and analysed, and Automated Knowledge Elicitation techniques were used to help resolve this information into a model, which in turn provided the structure of a computer program to aid the historians. Particular problems in reading the Vindolanda stylus texts were highlighted, indicating areas in which computational tools may be able to aid the papyrologists.
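As a rough illustration of how coded Think Aloud transcripts can yield quantitative data, the sketch below counts how much of a session is spent at each level of reading and how often an expert moves back down to a lower level. The coding scheme, expert identifiers, and session data are all hypothetical; the abstract does not specify the scheme actually used.

```python
from collections import Counter

# Hypothetical coding scheme: each transcript segment is tagged with the level
# of reading under discussion. The labels loosely follow levels named in the
# text (features, characters, words, grammar, phrases, meaning).
LEVELS = ["feature", "character", "word", "grammar", "phrase", "meaning"]

# Invented example data: expert id mapped to an ordered list of coded segments.
SESSIONS = {
    "expert_a": ["feature", "character", "character", "word",
                 "character", "word", "meaning"],
    "expert_b": ["feature", "feature", "character", "word",
                 "grammar", "word", "meaning"],
}


def level_proportions(codes):
    """Share of segments spent at each level, as fractions summing to 1."""
    counts = Counter(codes)
    total = len(codes)
    return {level: counts[level] / total for level in LEVELS}


def transitions(codes):
    """Count moves between consecutive segments, e.g. ('word', 'character')."""
    return Counter(zip(codes, codes[1:]))


def backtracks(codes):
    """Count moves from a higher level back down to a lower one."""
    rank = {level: i for i, level in enumerate(LEVELS)}
    return sum(1 for a, b in zip(codes, codes[1:]) if rank[b] < rank[a])


for expert, codes in SESSIONS.items():
    print(expert, level_proportions(codes), backtracks(codes))
```

Comparing these counts across experts is one way to look for both a shared general process and individual differences in approach.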

The model that was generated from this process is hierarchical and recursive, with separate "agents" that each carry out specific tasks. It is proposed that an expert reads an ancient document by identifying visual features, and then incrementally building up knowledge about the document’s characters, words, grammar, phrases, and meaning, continually proposing hypotheses and checking them against other information, until s/he finds that this process is exhausted. At this point a representation of the text is prepared in the standard publication format. At each agent level, external resources may be consulted, or may be unconsciously compared with the characteristics of the document. Although a simple representation, the model shows the overall scope of the process of reading an ancient text.

Figure 2: Proposed model of the procedure used to read an ancient text, broken down into individual agents.
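The propose-and-check behaviour of the model can be sketched in a few lines. This is a simplified illustration, not the system described in this paper: it covers only two levels (characters and words) and a single pass, whereas the proposed model is recursive and spans features through to meaning. The candidate letters, confidence values, and lexicon entries are invented, and all function names are hypothetical.

```python
from itertools import product

# Hypothetical output of a character-level agent: for each character position,
# candidate letters with a confidence between 0 and 1. Values are invented.
CHAR_HYPOTHESES = [
    {"u": 0.6, "n": 0.4},
    {"i": 0.7, "l": 0.3},
    {"n": 0.5, "u": 0.5},
    {"o": 0.9, "a": 0.1},
]

# Hypothetical external resource consulted at the word level (illustrative).
LEXICON = {"uino", "uina", "lino"}


def propose_words(char_hypotheses):
    """Word-level agent: combine character candidates into scored strings."""
    options = [list(h.items()) for h in char_hypotheses]
    for combo in product(*options):
        word = "".join(letter for letter, _ in combo)
        score = 1.0
        for _, confidence in combo:
            score *= confidence  # treat positions as independent (assumption)
        yield word, score


def check_against_lexicon(candidates, lexicon):
    """Keep only the hypotheses supported by the external resource."""
    return [(word, score) for word, score in candidates if word in lexicon]


def read_word(char_hypotheses, lexicon):
    """Propose readings, check them, and return survivors, best first."""
    supported = check_against_lexicon(propose_words(char_hypotheses), lexicon)
    return sorted(supported, key=lambda pair: pair[1], reverse=True)


for word, score in read_word(CHAR_HYPOTHESES, LEXICON):
    print(word, round(score, 3))
```

In a fuller version, a rejected word-level hypothesis would feed back to the character level so that letters are re-examined, which is where the recursion in the model comes in.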

The computer version of this model was constructed by identifying, adopting and adapting an existing system for the analysis of satellite images: the GRAVA system, which is implemented in YOLAMBDA, a dialect of LISP (Robertson 1999, 2001). This provided an architecture which could easily represent the hierarchical nature of the papyrologist model, and was a successful basis on which to construct a working computer system which takes in images of ancient documents and generates plausible interpretations of their text.

This paper will discuss the knowledge elicitation procedures and techniques adopted in this research, demonstrate results and conclusions of these procedures, and explain how a model can be resolved from this data. Constructing an explicit model of such a process is the first stage in building a computer system which replicates the process, and it will be shown how the resulting computer program depends on the underlying model which was generated through this research. Finally, it will be shown how the analysis of complex humanities procedural tasks in this manner can result in computer systems which aim to aid experts to carry out those tasks more efficiently.

References

1. Bowman, A. K., J. M. Brady, et al. (1997). "Imaging Incised Documents." Literary and Linguistic Computing 12(3): 169-176.
2. Bowman, A. K. and J. D. Thomas (1983). Vindolanda: The Latin Writing Tablets. London, Society for the Promotion of Roman Studies.
3. Bowman, A. K. and J. D. Thomas (1994). The Vindolanda Writing-Tablets (Tabulae Vindolandenses II). London, British Museum Press.
4. Bowman, A. K. and J. D. Thomas (2003). The Vindolanda Writing-Tablets (Tabulae Vindolandenses III). London, British Museum Press.
5. Brady, M., X. Pan, M. Terras, and V. Schenk (forthcoming, 2003). Shadow Stereo, Image Filtering and Constraint Propagation. Images and Artefacts of the Ancient World, London, British Academy.
6. Diaper, D. (1989). Knowledge Elicitation: Principles, Techniques and Applications. Chichester, Ellis Horwood.
7. McGraw, K. L. and K. Harbison-Briggs (1989). Knowledge Acquisition: Principles and Guidelines. London, Prentice-Hall International Editions.
8. Robertson, P. (1999). A Corpus Based Approach to the Interpretation of Aerial Images. IEE IPA99, Manchester.
9. Robertson, P. (2001). A Self Adaptive Architecture for Image Understanding. D.Phil Thesis, Engineering Science, University of Oxford.
10. Terras, M. (2002). Image to Interpretation: Towards an Intelligent System to Aid Historians in the Reading of the Vindolanda Texts. D.Phil Thesis, Engineering Science, University of Oxford.

Full text license: This text is republished here with permission from the original rights holder.


Conference Info


ACH/ALLC / ACH/ICCH / ALLC/EADH - 2004

Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004


Conference website: http://web.archive.org/web/20040815075341/http://www.hum.gu.se/allcach2004/

Series: ACH/ICCH (24), ALLC/EADH (31), ACH/ALLC (16)

Organizers: ACH, ALLC
