Converting St Paul: A new TEI P5 edition of The Conversion of St Paul using stand-off linking

paper
Authorship
  1. James Cummings

    Oxford University

Work text

In researching the textual phenomena and scribal practices of
late-medieval drama I have been creating an electronic edition
of The Conversion of St Paul. This late-medieval verse play
survives only in Bodleian MS Digby 133. My edition attempts
to be a useful scholarly work, which where possible leverages
existing resources to create an agile interoperable resource.
In undertaking this work I have been able to explore a number
of issues related to the creation of such editions which may
have pedagogic benefit as a case study to others. These include
shortcuts to speed up the creation of the initial edition, the
generation of additional resources based solely on a single
XML encoded text, the use of new mechanisms in the TEI
P5 Guidelines, and the exploitation of stand-off markup to
experiment with seamlessly incorporating external resources.
The TEI P5 Guidelines have been followed in production of
this edition, and specifically I have used a number of features
which are new additions to TEI P5 and with which others may
not yet be familiar. The edition was developed in tandem with
early drafts of the Guidelines, in part to test some of the
features we were adding.
These include:
• A customised view of the TEI expressed as a TEI ODD file.
This allows generation not only of constrained TEI Schemas
and DTDs but also project-specific documentation through
the TEI’s ODD processor ‘Roma’.
• A manuscript description of MS Digby 133, and the
manuscript item of this play, using the TEI’s new module for
manuscript description metadata.
• Consistent and in-depth use of the new <choice>
structure to provide alternative textual information at
individual points in the text. Specifically, in this highly
abbreviated medieval text, this has been used to provide
both abbreviations and expansions which then can be
toggled in the rendered version. Regularisation of medieval
spelling has been handled with stand-off markup, but could
equally have been incorporated into <choice>.
• Inside abbreviations and expansions, the new elements
<am> (abbreviation marker) and <ex> (expanded text)
have been used. This allows the marking and subsequent
display of abbreviation marks in a diplomatic edition view
and italicised rendering of the supplied text in the expanded
view.
• The edition also records information about the digital
photographic surrogate provided by the Bodleian using the
new <facsimile> element. While this will also allow linking
from the text to particular zones of the images, this has not
yet been undertaken.
• The edition also uses various new URI-based pointing
mechanisms and datatype constraints new to TEI P5.
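
The toggling between abbreviated and expanded readings described in the list above can be sketched roughly as follows. This is a minimal Python illustration, not the edition's own rendering code (which the abstract does not show): the sample line, the function name `render`, and the use of `^` and underscores as stand-ins for abbreviation marks and italics are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"

# Invented sample line: one abbreviated word followed by plain text.
SAMPLE = (
    f'<l xmlns="{TEI}">'
    "<choice><abbr>w<am>t</am></abbr><expan>w<ex>ith</ex></expan></choice>"
    " grace"
    "</l>"
)


def render(node, view):
    """Return plain text for `node`, keeping one side of every <choice>.

    view="diplomatic" keeps <abbr>; any other value keeps <expan>.
    <am> content is prefixed with "^" and <ex> content is wrapped in
    underscores (both are arbitrary display conventions for this sketch).
    """
    tag = node.tag.split("}")[-1]  # strip the namespace part
    if tag == "choice":
        wanted = "abbr" if view == "diplomatic" else "expan"
        text = "".join(
            render(child, view)
            for child in node
            if child.tag == f"{{{TEI}}}{wanted}"
        )
    else:
        text = (node.text or "") + "".join(render(c, view) for c in node)
        if tag == "ex":
            text = f"_{text}_"
        elif tag == "am":
            text = f"^{text}"
    return text + (node.tail or "")


line = ET.fromstring(SAMPLE)
print(render(line, "diplomatic"))  # w^t grace
print(render(line, "expanded"))    # w_ith_ grace
```

Because both readings sit side by side in the one XML file, the choice of view is purely a rendering decision and the encoded text itself never changes.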
To speed up the creation of the edition I took an out of
copyright printed version of the text (Furnivall 1896) which
was scanned and passed through optical character recognition.
This was then carefully corrected and proofread letter-by-letter
against freely available (though restrictively licensed)
images made available online by the Bodleian Library, Oxford
(at http://image.ox.ac.uk/show?collection=bodleian&manuscript=msdigby133).
I used OpenOffice to create the initial
edition, with specialised formatting used to indicate various
textual and editorial phenomena such as expanded material,
superscript abbreviation markers, stage directions, and notes.
I then up-scaled this presentational formatting by converting
it, using XSLT, to very basic TEI XML. While this is a quick
method of data entry familiar to many projects, it tends to
work well only in non-collaborative projects, where the
formatting can be applied consistently: any inconsistencies
can lead to significant manual correction of the generated
XML.
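
The idea of up-scaling formatting into markup can be illustrated with a small sketch. The edition itself used XSLT over an OpenOffice document; the Python below is only an analogy under stated assumptions: the style names in `STYLE_TO_TEI`, the run-based input format, and the function name `runs_to_tei` are all hypothetical. It also shows why consistency matters, since any formatting that has no agreed meaning stops the conversion rather than producing silently wrong XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from word-processor styles to TEI element names.
STYLE_TO_TEI = {
    "italic": "ex",           # expanded (supplied) letters
    "superscript": "am",      # abbreviation marker
    "StageDirection": "stage",
}


def runs_to_tei(runs):
    """Turn a list of (style, text) runs into a single <l> element.

    A style of None means unformatted text. An unmapped style raises
    ValueError so inconsistent formatting is caught at conversion time.
    """
    line = ET.Element("l")
    last = None  # most recently created child element, if any
    for style, text in runs:
        if style is None:
            if last is None:
                line.text = (line.text or "") + text
            else:
                last.tail = (last.tail or "") + text
            continue
        if style not in STYLE_TO_TEI:
            raise ValueError(f"unmapped style {style!r} on {text!r}")
        last = ET.SubElement(line, STYLE_TO_TEI[style])
        last.text = text
    return line


runs = [(None, "w"), ("italic", "ith"), (None, " grace")]
print(ET.tostring(runs_to_tei(runs), encoding="unicode"))
# <l>w<ex>ith</ex> grace</l>
```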
Another of the issues I was interested in exploring in
this edition was the use of stand-off markup to create
interoperable secondary resources and how this might affect
the nature of scholarly editing. While I could have stored much
of the information I needed in the edition itself, I wanted to
experiment with storing it in external files and linking them
together by pointing into the digital objects. The motivation for
this comes from a desire to explore notions of interoperability,
since stand-off markup methodology usually leaves the base
text untouched and stores additional information in separate
files. As more good scholarly resources become available
in XML, pointing into a number of resources and combining
them to form an additional, greater resource is becoming
more common.
Stand-off markup was used here partly to experiment with the
idea of creating an interoperable, flexible resource, that is, an
‘agile edition’. For example, an edition can be combined with
associated images, a glossary or word list, or other external
resources such as dictionaries. In the case of this edition, I
generated a word list (encoded using the TEI dictionaries
module) using XSLT. The word list included any distinct
orthographic variants in the edition. This was based on a ‘deep-equals’
comparison which compared not only the spelling of
words, but all of their descendant elements, and thus captured
differences in abbreviation/expansion inside individual words.
The location of individual instances of orthographic variance
in the edition could easily have been stored along with the
entry in the word list. However, since part of the point was to
experiment with handling stand-off markup, I stored these in a
third file whose only purpose was to record <link> elements
pointing both to an entry in the word list and to every single
instance of this word in the edition. This linking was done using
automatically generated xml:id attributes on each word and
word list entry. This enables a number of usability features.
Clicking on any individual word in the edition takes you
to its corresponding entry in the word list. From any entry in
the word list you can similarly get back to any other individual
instance of that word in the edition. Moreover, the word list
entry also contains an optionally displayed concordance of
that word to allow easy comparison of its use in context.
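
The grouping of words by a ‘deep-equals’ comparison and the generation of a separate link file can be sketched as follows. The edition did this in XSLT; this Python version is only an approximation under assumptions. The sample words, the file names `edition.xml` and `wordlist.xml`, the `entryN` identifiers, and the helper names `deep_key` and `build_links` are hypothetical, and the TEI namespace is omitted for brevity. The `<link>` and `<linkGrp>` elements and the space-separated `target` attribute are standard TEI P5.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample: w1 and w3 are deep-equal; w2 has the same letters
# but no <ex>, so it counts as a distinct orthographic form.
EDITION = """<text>
<w xml:id="w1">w<ex>ith</ex></w>
<w xml:id="w2">with</w>
<w xml:id="w3">w<ex>ith</ex></w>
</text>"""


def deep_key(node):
    """Hashable key built from tag, text, and all descendants.

    Attributes (such as xml:id) and the node's own tail are ignored,
    so two words match only if their content and inner markup match.
    """
    children = tuple((deep_key(c), c.tail or "") for c in node)
    return (node.tag, node.text or "", children)


def build_links(edition_xml, edition_file="edition.xml",
                wordlist_file="wordlist.xml"):
    """Return a <linkGrp> with one <link> per distinct word form."""
    root = ET.fromstring(edition_xml)
    groups = defaultdict(list)
    for w in root.iter("w"):
        groups[deep_key(w)].append(w.get(XML_ID))
    link_grp = ET.Element("linkGrp")
    for n, ids in enumerate(groups.values(), start=1):
        targets = [f"{wordlist_file}#entry{n}"]
        targets += [f"{edition_file}#{i}" for i in ids]
        ET.SubElement(link_grp, "link", target=" ".join(targets))
    return link_grp


for link in build_links(EDITION):
    print(link.get("target"))
# wordlist.xml#entry1 edition.xml#w1 edition.xml#w3
# wordlist.xml#entry2 edition.xml#w2
```

Since each link names the word list entry and every instance together, the same file can be read in either direction: from a word to its entry, or from an entry back to all of its instances.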
In addition to using resources I had created myself, one aim
of this investigation into stand-off markup was to use external
resources. The most appropriate freely available resource in
this case is the Middle English Dictionary (MED), created by
the University of Michigan. As this scholarly edition was being
created in my spare time, I did not want to exhaustively check
orthographic words in my edition against the MED and link
directly to the correct senses. While that is certainly possible,
and should be the recommended text-critical practice, it
would take a significant amount of time and be prone to error.
Instead I wanted to pass a regularised form of the word to the
MED headword search engine, retrieve the results, and
incorporate them dynamically into the display of the entry for
that word in the word list. However, this proved impossible
to do from the MED website because its output, despite
claiming to be XHTML, was not well-formed. Luckily, they
were willing to supply me with an underlying XML file which
provided not only headwords, but also their various
orthographic forms and the MED id number to which I could
link directly. Thus, I was able to achieve the same effect as
transcluding the MED search results by reimplementing the
functionality of their search directly in my XSLT, thereby
providing pre-generated links in each entry to possible
headwords in the MED. While successful for my resource, in
terms of true interoperability it is really a failure, one which
helps to highlight some of the problems encountered when
pointing into resources over which you have no control.
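
The local reimplementation of a headword search might look something like the sketch below. The structure of the XML file supplied by the MED is not described in this abstract, so the element names (`entry`, `hw`, `form`), the `id` attribute, and the sample entries and identifiers are all invented placeholders, not real MED data. The original was written in XSLT; Python is used here only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical layout and invented entries; identifiers are placeholders.
HEADWORDS = """<headwords>
  <entry id="ID-0001"><hw>with</hw><form>with</form><form>wyth</form></entry>
  <entry id="ID-0002"><hw>withe</hw><form>withe</form><form>wyth</form></entry>
</headwords>"""


def build_index(xml_text):
    """Map every lower-cased headword or variant form to entry ids."""
    index = defaultdict(list)
    for entry in ET.fromstring(xml_text).iter("entry"):
        forms = {f.text.strip().lower() for f in entry.iter("form")}
        forms.add(entry.findtext("hw").strip().lower())
        for form in forms:
            index[form].append(entry.get("id"))
    return index


def candidate_headwords(index, regularised):
    """Return ids of all entries whose forms include the given word."""
    return index.get(regularised.strip().lower(), [])


index = build_index(HEADWORDS)
print(candidate_headwords(index, "Wyth"))  # ['ID-0001', 'ID-0002']
print(candidate_headwords(index, "nope"))  # []
```

A lookup of this kind returns possible headwords rather than the correct sense, which matches the trade-off described above: the links are pre-generated candidates, not the result of checking each word by hand.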
The proposed paper will describe the process of creation
of the edition, the benefits and drawbacks of using stand-off
markup in this manner, its linking to external resources,
and how the same processes might be used in either legacy
data migration or the creation of new editions. One of the
concluding arguments of the paper is that the advent of new
technologies which make the long-promised interoperability
of resources that much easier also encourages (and is
dependent upon) us making our own existing materials
accessible in a compatible manner.
Bibliography
Baker, Donald C., John L. Murphy, and Louis B. Hall, Jr. (eds)
Late Medieval Religious Plays of Bodleian MSS Digby 133 and E
Museo 160, EETS, 283 (Oxford: Oxford Univ. Press, 1982)
Eggert, Paul. ‘Text-encoding, theories of the text and the
“work-site”’. Literary and Linguistic Computing, (2005), 20:4,
425-435
Furnivall, F. J. (ed.) The Digby Plays, (New Shakespeare Society
Publications, 1882) Re-issued for EETS Extra Series LXX,
(London: EETS, 1896)
Robinson, P. ‘The one and the many text’, Literary and Linguistic
Computing, (2000), 15:1, 5-14.
Robinson, P. ‘Where We Are with Electronic Scholarly
Editions, and Where We Want to Be’, Jahrbuch für
Computerphilologie, 5 (2003), 123-143.
Robinson, P. ‘Current issues in making digital editions of
medieval texts or, do electronic scholarly editions have a
future?’, Digital Medievalist, (2005), 1:1, Retrieved 1 Nov. 2007
<http://www.digitalmedievalist.org/journal/1.1/robinson/>
Sperberg-McQueen, C. M. Textual Criticism and the Text
Encoding Initiative. Annual Convention of the Modern
Language Association, 1994. reprinted in Finneran, R. J. (Ed.)
(1996) The Literary Text in the Digital Age. Ann Arbor:
University of Michigan Press. 37-62
TEI Consortium, eds. TEI P5: Guidelines for Electronic Text
Encoding and Interchange. Retrieved 1 Nov. 2007
<http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index.html>
Unsworth, J. Electronic Textual Editing and the TEI. Annual
Convention of the Modern Language Association, 2002.
Retrieved 1 Nov. 2007.
<http://www3.isrl.uiuc.edu/~unsworth/mla-cse.2002.html>.
Vanhoutte, E. ‘Prose fiction and modern manuscripts:
limitations and possibilities of text-encoding for electronic
editions’. In J. Unsworth, K. O’Keefe, and L. Burnard (Eds).
Electronic Textual Editing. (New York: Modern Language
Association of America, 2006), p. 161-180.


Conference Info

Complete

ADHO - 2008

Hosted at University of Oulu

Oulu, Finland

June 25, 2008 - June 29, 2008

135 works by 231 authors indexed

Conference website: http://www.ekl.oulu.fi/dh2008/

Series: ADHO (3)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None