Text-Image Linking Environment (TILE)

Dorothy Carr Porter; Doug Reside; John Walsh

Authorship

1. Dorothy Carr Porter

Digital Humanities Observatory - Royal Irish Academy
2. Doug Reside

University of Maryland, College Park
3. John Walsh

Indiana University, Bloomington


Introduction
To create the next generation of the technical infrastructure
supporting image-based editions and electronic
archives of humanities content, we are developing
a new web-based image markup tool, the Text-Image
Linking Environment (TILE), through a collaboration of
the Maryland Institute for Technology in the Humanities,
Indiana University Bloomington, the Royal Irish
Academy, the University of Oregon, and Harvard’s
Center for Hellenic Studies. Despite the proliferation of
image-based editions and archives, the linking of images
and textual information remains a slow and frustrating
process for editors and curators. TILE, built on the existing
code of the AXE image tagger, will dramatically
increase the ease and efficiency of this work. TILE will
be interoperable with other popular tools (including both
the Image Markup Tool and the Edition Production and
Presentation Technology suite) and capable of producing
TEI-compliant XML for linking image to text. We
will also put the image linking features of the newest
version of the Text Encoding Initiative Guidelines (TEI P5) through
its first rigorous, “real world” test, and, at the close of the
project, expect to provide the TEI with a list of suggestions
for improving the standard to make it more robust
and effective. TILE will be developed and thoroughly
tested with the assistance of our project partners, who
represent some of today’s most exciting image-based
edition projects, in order to create a tool generated by
the community, for the community, with the expectation
that, unlike so many other tools, it will be used by the
community.
History of Images in the Digital Environment
Texts, from the earliest classical inscriptions to most
twentieth-century correspondence, were originally inscribed
on such physical objects as stones, papyrus
scrolls, codex manuscripts, printed books, and handwritten
and typewritten letters. When editors transfer a
text from its original physical form, some of this material context
is necessarily obscured. Further, editors must often
make potentially questionable decisions as they interpret
the unclear or damaged text on the original artifact.
A good editor will, of course, highlight such interventions
in textual notes, but such notes, usually in
small type and inconveniently separated from the main
text, often go unread. The inclusion of page facsimiles
can make the editorial process more transparent,
but in print editions the reproduction of multiple, high-quality
images is often prohibitively expensive. Digital
facsimile editions, on the other hand, may be distributed
far less expensively, and so many editors are
now choosing to publish their facsimile editions online.
The growth of the Internet as a public space in the
early 1990s led to the first generation of widely accessible
scholarly electronic archives, and even at this
early stage many projects integrated images into their
work in significant ways. The Valley of the Shadow
(1993) provided images for some of the letters in the collection
(in relatively low resolution), and the Rossetti,
Dickinson, and William Blake Archives brought together
encoded texts and images for parallel viewing
and study.[1]
The relationship between image and text
in these archives is quite simple: for example, the page
image of the source of the edited text in the Valley of
the Shadow or the Rossetti Archive may be opened in
a separate window, but the links go no deeper than the
page level. One cannot, for instance, link from a word
in the edited text to its location in the image or click
an interesting area in the image to read an annotation.
At the same time that these relatively open-ended online
archives were under development, other scholars were
taking advantage of digital technologies to build self-contained
scholarly editions. Some of the earliest efforts
include the Wife of Bath’s Prologue on CD-ROM (Chaucer
1996), the Electronic Beowulf (Kiernan 1999), and
the Piers Plowman Electronic Archive, Vol. 1 (Langland
2000). As with the online archives, these early editions
were limited in how closely they linked image and text.
The Electronic Beowulf did provide some annotations
linked to areas on the manuscript folio image, but there
are few of these, because the coordinates for each had to be
added to the HTML “by hand.”
As the community of scholars developing image-based
projects has grown in the past decade, tools have been
created that are actively used for project development.
As of November 14, 2008, the project investigators know
of no fewer than ten tools or collections of tools that allow users to edit or display images within the context of
textual projects or editions. These range from those that
simply display an image alongside a text, to very robust
software suites which support the development of complete
image-based projects with substantial functionality
beyond simple text-to-image mapping.
The simplest tools enable the viewing of images alongside
text transcription, either for editing or for display.
Juxta, developed through Networked Infrastructure for
Nineteenth-century Electronic Scholarship (NINES)
<http://www.nines.org/tools/juxta.html>, provides a
window for viewing image files (if provided) alongside
transcriptions, which could be very useful for an editor
checking readings or adding annotations, but does not
provide any method for connecting the image with the
text beyond the page level. Similar is the Versioning Machine,
developed by Susan Schreibman at the University
of Maryland Libraries (http://v-machine.org/): a display
tool for comparing encoded texts that also enables page
images to be linked to the text at the page level. Both
tools are useful, but they are not suitable for scholars who
want more fine-grained linking in their projects.
There are also tools that support the linking of image to
transcription or annotation. The Edition Production and
Presentation Technology (EPPT), developed by Kevin
Kiernan at the University of Kentucky under the aegis of
the Electronic Boethius and ARCHway projects
(http://www.eppt.org/eppt/), is a set of tools developed
in and run through the Eclipse software development
platform. One of its main functions is to link a
transcription to an image of the text, although it provides
much broader functionality. The Image Markup
Tool (IMT), under development by Martin Holmes at
the University of Victoria, BC, is the first tool to output
complete and valid TEI P5 XML. The IMT enables a
user to place a series of annotations on an image, resulting
in a file that validates against the regular (unmodified)
TEI P5 schema, and then enables the user to create
HTML for the display of those annotations online. The
IMT is very simple and easy to use, and is in many ways
a model of the type of tool that we will be developing in
this project: it does one thing, and it does it very well.
Unfortunately, the IMT runs only on Windows machines
and cannot be easily ported into new web-based projects.
TILE will interoperate with the constrained IMT
TEI format.
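The difference between coordinates added to HTML “by hand” and coordinates generated from stored annotations can be illustrated with a short Python sketch that renders a list of rectangular annotations as an HTML image map. The `Annotation` record, its field names, and the sample file name are hypothetical; this is not how the IMT itself produces its HTML.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Annotation:
    """A hypothetical rectangular annotation on a page image, in pixel coordinates."""
    ulx: int  # upper-left x
    uly: int  # upper-left y
    lrx: int  # lower-right x
    lry: int  # lower-right y
    note: str


def image_map_html(image_src: str, map_name: str, annotations: list[Annotation]) -> str:
    """Render an <img> plus a <map> whose <area> elements carry each note as a tooltip."""
    lines = [
        f'<img src="{escape(image_src)}" usemap="#{escape(map_name)}" alt="">',
        f'<map name="{escape(map_name)}">',
    ]
    for a in annotations:
        # HTML rectangle coordinates are "left,top,right,bottom".
        coords = f"{a.ulx},{a.uly},{a.lrx},{a.lry}"
        label = escape(a.note)  # escapes &, <, > and quotes for attribute values
        lines.append(f'  <area shape="rect" coords="{coords}" title="{label}" alt="{label}">')
    lines.append("</map>")
    return "\n".join(lines)


if __name__ == "__main__":
    notes = [
        Annotation(120, 40, 380, 95, "Damaged initial"),
        Annotation(60, 410, 300, 450, "Marginal gloss"),
    ]
    print(image_map_html("folio-1r.jpg", "folio1r", notes))
```

Because the markup is generated from data, adding a hundredth annotation costs no more effort than adding the first, which is the practical gap between hand-coded image maps and a tool-driven workflow.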
There have also been some efforts to build tools to automate
the creation of links between transcribed text and
an image of that text. Hugh Cayless at UNC-Chapel Hill
has recently developed a system for automating image-text
linking, which he presented at the Text Encoding Initiative
Members’ Meeting in November 2008,[2] and Reside
has also created the Word Linking tool, originally built
for the Shakespeare Quartos project.
The Ajax XML Encoder (AXE), also developed by Reside,
allows users with limited technical knowledge to
add metadata to text, image, video, and audio files. Users
can collaboratively tag a text in TEI, associate XML
with time stamps in video or audio files, and mark off
regions of an image to be linked to external metadata.
At present the web-based image tagger allows users to
select regions in an image and store the coordinates of
this region in a database, but it does not provide tools to
make use of this data once it is stored. The text tagger
allows a user to specify a RELAX NG schema and then tag
a text using this schema, but it requires users to enter
coordinates for image links by hand (it does not, at present,
interface easily with the image tagger). The tools
in AXE were always intended to be interoperable and
to have the functionality described in this narrative, and
this current collaboration allows us to move the suite to
the next stage of its development.
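One way to put stored region coordinates to use, rather than asking users to type them into the text tagger, is to generate TEI facsimile markup from them. The Python sketch below assumes a hypothetical row layout for stored regions (region id, image URL, corner coordinates, and the id of the linked text element); it is not AXE's actual database schema. It loosely follows the TEI P5 facsimile elements (`facsimile`, `surface`, `graphic`, and `zone` with `ulx`/`uly`/`lrx`/`lry`) and returns `#zone` references that could be copied into `facs` attributes on the transcription. For brevity the elements are not placed in the TEI namespace.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serializes as xml:id


def regions_to_facsimile(rows):
    """Build a TEI-style <facsimile> element from stored region rows.

    Each row is a hypothetical (region_id, image_url, ulx, uly, lrx, lry, text_id)
    tuple. Returns the <facsimile> element and a {text_id: "#zone-id"} map whose
    values can be used as @facs pointers from the transcription.
    Note: a real TEI document would put these elements in the TEI namespace.
    """
    by_image = defaultdict(list)  # one <surface> per page image, in first-seen order
    for row in rows:
        by_image[row[1]].append(row)

    facsimile = ET.Element("facsimile")
    facs_refs = {}
    for image_url, image_rows in by_image.items():
        surface = ET.SubElement(facsimile, "surface")
        ET.SubElement(surface, "graphic", url=image_url)
        for region_id, _, ulx, uly, lrx, lry, text_id in image_rows:
            zone_id = f"zone-{region_id}"
            ET.SubElement(surface, "zone", {
                XML_ID: zone_id,
                "ulx": str(ulx), "uly": str(uly),
                "lrx": str(lrx), "lry": str(lry),
            })
            facs_refs[text_id] = "#" + zone_id
    return facsimile, facs_refs


if __name__ == "__main__":
    stored = [
        (1, "folio-1r.jpg", 120, 40, 380, 95, "w1"),
        (2, "folio-1r.jpg", 60, 410, 300, 450, "w2"),
    ]
    facs, refs = regions_to_facsimile(stored)
    print(ET.tostring(facs, encoding="unicode"))
    print(refs)
```

Keeping the coordinates in `zone` elements and pointing to them from the text keeps the transcription free of raw numbers, so a region can be redrawn without touching the encoded text.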
The Tool
TILE will be based primarily on the Ajax XML Encoder
(AXE). Through TILE, we will extend the functionality
of AXE to allow the following:
• Semi-automated creation of links between transcriptions
and images of the materials from which
the transcriptions were made. Using a form of optical
character recognition, our software will recognize
words in a page image and link them to a preexisting
textual transcription. These links can then
be checked, and if need be adjusted, by a human.
• Annotation of any area of an image selected by the
user with a controlled vocabulary (for example, the
tool can be adjusted to allow only the annotations
“damaged” or “illegible”).
• Application of editorial annotations to any area of
an image.
• Support for linking non-horizontal, non-rectangular
areas of source images.
• Creation of links between different, non-contiguous
areas of primary source images. For example:
  • captions and illustrations;
  • illustrations and textual descriptions;
  • analogous texts across different manuscripts.
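The abstract does not specify how the output of optical character recognition will be matched to the existing transcription. As one plausible approach, not necessarily TILE's, the Python sketch below aligns the transcription's words with word boxes from an OCR engine using the sequence matching in the standard `difflib` module. The OCR input format (word text plus a pixel bounding box) is an assumption; words the alignment cannot place come back paired with `None`, leaving them for the human check described in the first item of the list.

```python
import difflib
import re
from typing import Optional

Box = tuple[int, int, int, int]  # hypothetical OCR word box: (ulx, uly, lrx, lry) in pixels


def normalize(word: str) -> str:
    """Lower-case a token and drop punctuation so minor OCR noise matters less."""
    return re.sub(r"\W", "", word.lower())


def propose_links(transcription: str,
                  ocr_words: list[tuple[str, Box]]) -> list[tuple[str, Optional[Box]]]:
    """Propose one image box per transcription word by aligning the two token sequences.

    Words inside matching runs inherit the box of the aligned OCR word; any word the
    alignment cannot place is paired with None so a reviewer can link it by hand.
    """
    words = transcription.split()
    a = [normalize(w) for w in words]
    b = [normalize(text) for text, _ in ocr_words]
    links: list[tuple[str, Optional[Box]]] = [(w, None) for w in words]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for block in matcher.get_matching_blocks():  # the final block always has size 0
        for k in range(block.size):
            i, j = block.a + k, block.b + k
            links[i] = (words[i], ocr_words[j][1])
    return links


if __name__ == "__main__":
    ocr = [("Hwæt,", (10, 5, 60, 30)), ("we", (66, 5, 90, 30)), ("Gardcna", (96, 5, 190, 30))]
    for word, box in propose_links("Hwæt we Gardena", ocr):
        print(word, box if box else "-> needs review")
```

Here the misrecognized “Gardcna” does not match, so “Gardena” is flagged for review instead of being silently linked to a wrong region; a fuzzier token comparison could recover more links at the cost of more false matches.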
We are especially concerned with making our tool available
for integration into many different types of project
environments, and we will therefore work to make the
system requirements for TILE as minimal and as generic
as possible.
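Two items in the feature list above, annotation with a controlled vocabulary and linking of non-rectangular regions, can be illustrated together. This hypothetical Python sketch stores a region as a polygon, rejects labels outside a small vocabulary (here just “damaged” and “illegible”, as in the example), and uses the standard even-odd ray-casting test to find which annotated regions lie under a click, so that a reader could click an area of the image to read its annotation. The class and function names are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary, echoing the example in the feature list.
ALLOWED_LABELS = frozenset({"damaged", "illegible"})

Point = tuple[float, float]


@dataclass(frozen=True)
class RegionAnnotation:
    """A label attached to a polygonal (possibly tilted or irregular) image region."""
    points: tuple[Point, ...]  # polygon vertices in image pixels, in drawing order
    label: str

    def __post_init__(self):
        if len(self.points) < 3:
            raise ValueError("a polygon needs at least three vertices")
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"label {self.label!r} is not in the controlled vocabulary")

    def contains(self, x: float, y: float) -> bool:
        """Even-odd test: count crossings of a ray cast rightward from (x, y)."""
        inside = False
        n = len(self.points)
        for k in range(n):
            (x1, y1), (x2, y2) = self.points[k], self.points[(k + 1) % n]
            if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


def labels_at(annotations, x, y):
    """Return the labels of every annotated region under a click at (x, y)."""
    return [a.label for a in annotations if a.contains(x, y)]


if __name__ == "__main__":
    tilted = RegionAnnotation(((50, 0), (100, 50), (50, 100), (0, 50)), "damaged")
    print(labels_at([tilted], 50, 50), labels_at([tilted], 5, 5))
```

Validating labels when an annotation is created, rather than when it is exported, keeps out-of-vocabulary terms from ever reaching the stored data.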
Notes
1
Valley of the Shadow: Two Communities in the American
Civil War, Virginia Center for Digital History, University
of Virginia (http://valley.vcdh.virginia.edu/); The
Complete Writings and Pictures of Dante Gabriel Rossetti:
A Hypermedia Archive, edited by Jerome J. McGann,
University of Virginia (http://www.rossettiarchive.org/);
Dickinson Electronic Archives, edited by Martha Nell
Smith, Institute for Advanced Technology in the
Humanities (IATH), University of Virginia
(http://www.emilydickinson.org/); The William Blake Archive,
edited by Morris Eaves, Robert N. Essick, and Joseph Viscomi
(http://www.blakearchive.org/).
2
Hugh Cayless, “Experiments in Automated Linking of
TEI Transcripts to Manuscript Images,” presented at the
Text Encoding Initiative Members’ Meeting, November
2008. http://www.cch.kcl.ac.uk/cocoon/tei2008/programme/abstracts/abstract-166.html
References
Carlquist, J. 2004. “Medieval Manuscripts, Hypertext
and Reading. Visions of Digital Editions.” Literary &
Linguistic Computing 19.1, 105-118.
Chaucer, G. 1996. Wife of Bath’s Prologue on CD-ROM.
Edited by P. Robinson. Cambridge University Press.
Holmes, M. 2007. Image Markup Tool v. 1.7.
[http://www.tapor.uvic.ca/~mholmes/image_markup/] Accessed
2008-11-13.
Kiernan, K. 2005. “Digital Facsimiles in Editing: Some
Guidelines for Editors of Image-based Scholarly Editions.”
Electronic Textual Editing. Ed. Lou Burnard,
Katherine O’Brien O’Keeffe, and John Unsworth. New
York: Modern Language Association. [preprint at
http://www.tei-c.org/About/Archive_new/ETE/Preview/kiernan.xml]
Kiernan, K. 1999. The Electronic Beowulf. University of
Michigan Press.
Kirschenbaum, M. G. 2002. “Editor’s Introduction: Image-based
Humanities Computing.” Computers and the
Humanities 36.1, 3-6.
Langland, W. 2000. Piers Plowman Electronic Archive,
Vol. 1. Edited by R. Adams. University of Michigan Press.
TEI Consortium, eds. 2007. “Digital Facsimiles.” Guidelines
for Electronic Text Encoding and Interchange. [Last
modified date: 2008-07-04].
[http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html] Accessed 2008-07-25.


Conference Info

ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

Conference website: http://web.archive.org/web/20130307234434/http://mith.umd.edu/dh09/

Series: ADHO (4)

Organizers: ADHO