A Modest proposal. Analysis of Specific Needs with Reference to Collation in Electronic Editions

paper
Authorship
  1. 1. Ron Van den Branden

    Centrum voor Teksteditie en Bronnenstudie (KANTL)

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Text collation is a vital aspect of textual editing; its results
feature prominently in scholarly editions, either in printed
or electronic form. The Centre for Scholarly Editing and
Document Studies (CTB) of the Flemish Royal Academy
for Dutch Language and Literature has a strong research
interest in electronic textual editing. Much of the research for
the electronic editions of De teleurgang van den Waterhoek
(2000) and De trein der traagheid (forthcoming) concerned
appropriate ways of visualising textual traditions. Starting from
the functionality of collation results in electronic editions, as
illustrated in the latter edition, this paper will investigate the
needs for a dedicated automatic text collation tool for textcritical
research purposes.
The second section sets out with identifying different scenarios
in the production model of electronic scholarly editions (based
on Ott, 1992; Vanhoutte, 2006). Consequently, possible criteria
for an automatic text collation tool in these production
scenarios are identifi ed, both on a collation-internal and
generic software level. These criteria are analysed both from
the broader perspective of automatic differencing algorithms
and tools that have been developed for purposes of automatic
version management in software development (Mouat, 2002;
Komvoteas, 2003; Cobéna, 2003; Cobéna e.a., 2002 and [2004];
Peters, 2005; Trieloff, 2006), and from the specifi c perspective
of text encoding and collation for academic purposes (Kegel
and Van Elsacker, 2007; Robinson, 2007). Collation-internal
criteria distinguish between three phases of the collation
process: the automatic text comparison itself, representation
of these comparisons and aspects of visualisation. Especially
in the context of electronic scholarly editions, for which TEI
XML is the de facto encoding standard, a degree of XMLawareness
can be considered a minimal requirement for
collation algorithms. A dedicated collation algorithm would
be able to track changes on a structural and on word level.
Moreover, since textual editing typically deals with complex
text traditions, the ability to compare more than two versions
of a text could be another requirement for a collation
algorithm. Regarding representation of the collation results,
an XML perspective is preferable as well. This allows for easier
integration of the collation step with other steps in electronic
editing. In a maximal scenario, a dedicated text collation tool
would represent the collation results immediately in one or
other TEI fl avour for encoding textual variation. Visualisation
of the collation results could be considered a criterion for a
text collation tool, but seems less vital, however prominent it features in the broader context of developing an electronic
edition. From a generic software perspective, a dedicated tool
would be open source, free, multi-platform, and embeddable in
other applications. These criteria are summarised in a minimal
and a maximal scenario. The plea for a tool that meets the
criteria for a minimal scenario is illustrated in the third section
of the paper.
The third section of the paper is a case study of another
electronic edition in preparation at the CTB: the complete
works of the 16th century Flemish poetess Anna Bijns,
totalling around 300 poems that come in about 60 variant
pairs or triplets. The specifi c circumstances of this project led
to an investigation for collation solutions in parallel with the
transcription and markup of the texts. After an evaluation of
some interesting candidate tools for the collation step, the
choice was made eventually to investigate how a generic
XML-aware comparison tool could be put to use for multiple
text collations in the context of textual editing. Besides
formal procedures and a set of XSLT stylesheets for different
processing stages of the collation results, this development
process provided an interesting insight in the specifi c nature
of ‘textual’ text collation, and the role of the editor. The
description of the experimental approach that was taken will
illustrate the criteria sketched out in the previous section,
indicate what is possible already with quite generic tools, and
point out the strengths and weaknesses of this approach.
Finally, the fi ndings are summarised and a plea is made for a text
collation tool that fi lls the current lacunae with regards to the
current tools’ capacities for the distinct steps of comparison,
representation and visualisation.
References
Cobéna, Grégory (2003). Change Management of semistructured
data on the Web. Doctoral dissertation, Ecole
Doctorale de l’Ecole Polytechnique. <ftp://ftp.inria.fr/INRIA/
publication/Theses/TU-0789.pdf>
Cobéna, Grégory, Talel Abdessalem, Yassine Hinnach (2002).
A comparative study for XML change detection. Verso report
number 221. France: Institut National de Recherche en
Informatique et en Automatique, July 2002. <ftp://ftp.inria.
fr/INRIA/Projects/verso/VersoReport-221.pdf>
Cobéna, Grégory, Talel Abdessalem, Yassine Hinnach ([2004]).
A comparative study of XML diff tools. Rocquencourt,
[Unpublished update to Cobéna e.a. (2002), seemingly
a report for the Gemo project at INRIA.] <http://www.
deltaxml.com/dxml/90/version/default/part/AttachmentData/
data/is2004.pdf>
Kegel, Peter, Bert Van Elsacker (2007). ‘“A collection, an
enormous accumulation of movements and ideas”. Research
documentation for the digital edition of the Volledige Werken
(Complete Works) of Willem Frederik Hermans’. In: Georg
Braungart, Peter Gendolla, Fotis Jannidis (eds.), Jahrbuch für
Computerphilologie 8 (2006). Paderborn: mentis Verlag. p. 63-
80. <http://computerphilologie.tu-darmstadt.de/jg06/kegelel.
html>
Komvoteas, Kyriakos (2003). XML Diff and Patch Tool. Master’s
thesis. Edinburgh: Heriot Watt University. <http://treepatch.
sourceforge.net/report.pdf>
Mouat, Adrian (2002). XML Diff and Patch Utilities. Master’s
thesis. Edinburgh: Heriot Watt University, School of
Mathematical and Computer Sciences. <http://prdownloads.
sourceforge.net/diffxml/dissertation.ps?download>
Ott, W. (1992). ‘Computers and Textual Editing.’ In
Christopher S. Butler (ed.), Computers and Written Texts.
Oxford and Cambridge, USA: Blackwell, p. 205-226.
Peters, Luuk (2005). Change Detection in XML Trees:
a Survey. Twente Student Conference on IT, Enschede,
Copyright University of Twente, Faculty of Electrical
Engineering, Mathematics and Computer Science. <http://
referaat.cs.utwente.nl/documents/2005_03_B-DATA_AND_
APPLICATION_INTEGRATION/2005_03_B_Peters,L.
J.-Change_detection_in_XML_trees_a_survey.pdf>
Robinson, Peter (2007). How CollateXML should work.
Anastasia and Collate Blog, February 5, 2007. <http://www.
sd-editions.com/blog/?p=12>
Trieloff, Lars (2006). Design and Implementation of a Version
Management System for XML documents. Master’s thesis.
Potsdam: Hasso-Plattner-Institut für Softwaresystemtechnik.
<http://www.goshaky.com/publications/master-thesis/XMLVersion-
Management.pdf>
Vanhoutte, Edward (2006). Prose Fiction and Modern
Manuscripts: Limitations and Possibilities of Text-Encoding
for Electronic Editions. In Lou Burnard, Katherine O’Brien
O’Keeffe, John Unsworth (eds.), Electronic Textual Editing. New
York: Modern Language Association of America, p. 161-180.
<http://www.tei-c.org/Activities/ETE/Preview/vanhoutte.xml>

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

Complete

ADHO - 2008

Hosted at University of Oulu

Oulu, Finland

June 25, 2008 - June 29, 2008

135 works by 231 authors indexed

Conference website: http://www.ekl.oulu.fi/dh2008/

Series: ADHO (3)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None