Computer-Aided Palaeography, Present and Future

paper
Authorship
  1. 1. Peter Stokes

    Cambridge University

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

The field of digital palaeography has received increasing
attention in recent years (Ciula 2005; Bulacu and
Schomaker 2007; Stokes 2007/8; Rehbein et al. forthcoming).
However, this subdiscipline
is not generally
accepted by the wider humanities community, and indeed
some have argued that handwriting is inherently
unquantifiable and cannot be analysed by digital means
(Costamagna et al. 1995–96). This paper therefore asks
what problems might be amenable to digital analysis and
why existing methods have not been widely accepted,
and then introduces new software designed to address
these issues.
Present Issues in ‘Traditional’
Palaeography
One of the principle difficulties in palaeography is subjectivity,
since palaeographers often express qualitative
opinions rather than objective arguments and this can
lead to an ‘authoritarian discipline’ which depends on
‘the authority of the author and the faith of the reader’
(Derolez 2003). Attempts to make the field more scientific
began well before the use of computers, including
an attempt to measure hundreds of letters by hand
(Gilissen 1973), but these were laborious and problematic
and the approach was largely forgotten. Others have
drawn on forensic document analysis, since forensic
document analysts have had to demonstrate the efficacy
and objectivity of their methods in court and are therefore
far ahead of palaeographers in this respect (Srihari
et al. 2002; Davis 2007). Nevertheless, the differences
between modern and medieval handwriting are significant.
Amongst other things, medieval scribes were (usually)
highly trained and can show very little variety from
one scribe to the next, unlike modern writers; forensic
analysts can generally collect a large corpus of known
handwriting whereas palaeographers often have few or
no known examples; and palaeographers cannot verify
their results with the certainty that forensic analysts can
often achieve.
In addition to these problems, palaeographical study
has also been hampered by the lack of an established
terminology, and this makes commmunicating arguments
very difficult indeed. The Comité international de
paléographie latine (CIPL) was founded in 1953 partly
for this reason but, despite sixteen international meetings
to date, it is nowhere near producing an accepted solution
(but see Muzerelle 2003). However, the problem
is receiving more attention with the increasing use of
databases and online catalogues. Attempts to computerise
existing catalogues have revealed inconsistencies in
earlier sources, and the absence of accepted terminology
has made digital resources difficult to search and almost
impossible to interconnect.
Another issue is the volume of data to be analysed. Palaeographers
have tended to consider small corpora of
scribal hands, but these are not necessarily representative
of wider practices, and any conclusions based on
such samples are necessarily limited. Larger corpora are
beyond traditional methods, however, since they can include
hundreds of scribal hands with potentially thousands
or tens of thousands of features (Stokes 2007/8).
Digital methods can help, as databases and data-mining
can both be used to manage large quantities of material.
Similarly, if catalogues and other online resources followed
standards in describing handwriting then data can
be pooled from many different projects. However, as
noted above, such standards do not yet exist.
Present Issues in ‘Digital’ Palaeography
These difficulties can all be addressed to some extent by
digital methods, and new studies in ‘digital palaeography’
are going some way towards doing that (Ciula 2005;
Bulacu and Schomaker 2007; Stokes 2007/8). However,
promising as these seem, they have received almost no
acceptance and relatively little interest from ‘traditional’
palaeographers. This is partly because the technology is
not yet mature, and perhaps also because the attempts to
date have generally involved small projects without the
sustained funding or larger interdisciplinary groups that
digital humanities often require (Pierazzo 2008). However,
neither of these seems to be a complete explanation,
and another problem may be one of understanding
and engagement. Even if procedures are communicated
clearly using recognised terminology, scholars still require
a good understanding of many complex fields to
fully appreciate and engage with the material, including
post-graduate level probability theory, digital signal processing,
computer graphics, and more, all skills which
cannot be expected from palaeographers. This means
that they cannot engage meaningfully with the results
of digital methods and that we have replaced the human
authority with a digital one.
Another difficulty is that all results should be reproducible
or at least verifiable if any claim to objectivity is
made. However, this is rarely done in practice. For example, most techniques for enhancing images require a
great deal of ad hoc human intervention which cannot
be accurately documented using most existing tools.
Many scholars in the humanities use Adobe Photoshop
for enhancing images (Craig-McFeely and Lock 2006),
but Photoshop provides no easy way of recording actions
such as colour-selection, nor is it always clear exactly
how the proprietary algorithms work. Photoshop is undoubtedly
useful and should not be discarded lightly, but
the methods and tools must allow full documentation
and reproducibility.
The lack of standard terminology for handwriting not
only makes communication between scholars difficult,
but it also makes databases of script all but impossible.
Some less conventional approaches have been tried,
such as the Manchester ‘C11’ database (Rumble 2005;
Scragg et al. 2005), but this particular classification and
interface is very opaque and subjective, so that neither
search criteria nor result-sets are intelligible to users. Alternatives
such as descriptive rather than visual criteria
for searches have also been attempted (Stokes 2007/8),
but these still depend on users understanding a specific
and ad hoc vocabulary and so are difficult to search and
are nearly impossible to integrate with other resources.
Some Future Directions
Perhaps the easiest problem to solve is that of documentation
and reproducibility, since software can be designed
to record all operations performed and then save this
as a MIX/METS file or other standard format (Stokes,
forthcoming). A Photoshop plugin could achieve this but
would be subject to Adobe’s licensing and the restricted
availability of their Software Development Kit. Instead,
basic software for image enhancement can be developed
relatively easily, for example with the Java Advanced
Imaging library (JAI), and any purpose-built software to
quantify scribal hands can (and should) reveal its algorithms
and log every detail of what it does.
Regarding understanding and scholarly engagement, it
is unreasonable to expect palaeographers to understand
the detailed mathematics of image-processing and datamining,
but the techniques are promising and should not
be discarded lightly. One possibility is therefore to find
ways of presenting the results of this software in ways
that are intelligible to palaeographers: if the results can
be understood then they are more likely to be accepted.
This problem is complex, though, as data-mining often
produces results which are not intelligible and cannot
necessarily be articulated in an intelligible way.
The Framework for Image Analysis
To test some of these principles, the author of the present
paper has developed new modular and extendible
software in Java for the analysis of scribal hands. Users
can load images of handwriting and then run processes
to generate metrics for scribal hands, where each
process implements one or more algorithms to extract
features from images of handwriting and records both
the process and the result. A module is also provided to
enhance images before processing, and this can be run as
a stand-alone application to help recover text from damaged
manuscripts (Stokes, forthcoming). The system can
compare metrics generated by different processes and
thereby measure distances between samples of writing,
and both metrics and distances can be exported for further
study. The processes are implemented as plugins so
that users can choose different combinations of processes
for different situations and can also implement their
own algorithms and exchange these to allow others to
test their methods and reproduce their results. This allows
people to compare different techniques in a common
framework, producing libraries of scribal hands and
plugins as a common and documented basis for palaeographical
study.
Fig. 1: Framework for Image Analysis
A system like this should alleviate some of the problems
discussed above. All analysis using the framework is
documented and reproducible, assuming that the images
are freely accessible and that the plugins conform to the
principles outlined above. The results provide objective
evidence, the validity and interpretation of which can then
be debated. The metrics and processes may or may not be
meaningful to scholars in the humanities, depending on the
plugins, but at least they are open to scrutiny by those who
can and wish to do so. Indeed, it is an important question
how the results of complex algorithms can best be presented
to scholars in the humanities, and it may well be that the
plugins should allow both ‘computer-
friendly’ and ‘human-friendly’ output, with the latter including graphical
or even interactive displays. Much work is still required
here, but the system described is designed for precisely
this sort of experimentation and thereby, one hopes, provides
a useful environment in which the work can be
done.
Fig. 2: The Image Enhancement module.
Manuscript page is Cambridge, Corpus Christi College
144, 32r. Reproduced by permission of the Master and
Fellows, Corpus Christi College Cambridge
References
Bulacu, M., and Schomaker, L. (2007). Automatic
Handwriting Identification on Medieval Documents.
Proc. of 14th Int. Conf. on Image Analysis and Processing
(ICIAP 2007), pp. 279–284. <http://www.ai.rug.
nl/~bulacu/iciap2007-bulacu-schomaker.pdf>
Ciula, A. (2005). Digital Palaeography: Using the Digital
Representation of Medieval Script to Support Palaeographic
Analysis. Digital Medievalist 1. <http://www.
digitalmedievalist.org/journal/1.1/ciula/>
Costamagna, G., et al. (1995 and 1996). Commentare
Bischoff. Scrittura e Civiltà 19: 325–48 and 20:401–7.
Craig-McFeely, J., and Lock, A. (2006). Digital Image
Archive of Medieval Music: Digital Restoration Workbook.
Oxford: Oxford Select Specialist Catalogue Publications.
<http://www.methodsnetwork.ac.uk/redist/pdf/
workbook1.pdf>
Davis, T. (2007). The practice of Handwriting Identification.
The Library (7th Series) 8: 251–76.
Derolez, A. (2003). The Palaeography of Gothic Manuscript
Books. Cambridge: Cambridge University Press.
Gilissen, L. (1973). L’expertise des écritures médiévales.
Ghent: Éditions scientifiques E. Story-Scientia.
Muzerelle, D. (2003). Vocabulaire codicologique.
<http://vocabulaire.irht.cnrs.fr/vocab.htm>
Pierazzo, E. (2008). Editorial Teamwork in a Digital
Environment: The Edition of the Correspondence of
Giacomo Puccini. Jahrbuch für Computerphilologie 10.
<http://computerphilologie.tu-darmstadt.de/jg08/pierazzo.
html>
Rumble, A. R. (2005). Palaeography, Scribal
Identification and the Study of Manuscript Characteristics.
Care and Conservation of Manuscripts: Proceedings
of the 8th International Seminar, edited by G. Fellows-
Jensen and P. Springborg. Copenhagen: Museum
Tusculanum Press, pp. 217–28.
Scragg, D. G., et al. (2005). ManCASS C11 Database
Project. <http://www.arts.manchester.ac.uk/mancass/
C11database/>
Srihari, S. N., et al. (2002). Individuality of Handwriting.
Journal of Forensic Science 47: 1–17.
Stokes, P. A. (2007/8). Palaeography and Image-Processing:
Some Solutions and Problems. Digital Medievalist
3. <http://www.digitalmedievalist.org/journal/3/
stokes/>
Stokes, P. A. (forthcoming). Recovering Anglo-Saxon
Erasures: Some Questions, Tools and Techniques. Palimpsests
and the Literary Imagination of Medieval England,
edited by R. Chai-Elsholz et al. New York: Palgrave.
Rehbein, M., et al., eds. (forthcoming). Kodikologie
und Paläographie im Digitalen Zeitalter—Codicology
and Palaeography in the Digital Age. Norderstedt:
Books on Demand.
All URLs last accessed 12 March 2009.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

Complete

ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None