After the Fall –– Structured Data at IATH

paper
Authorship
  1. John Unsworth

    University of Illinois, Urbana-Champaign

  2. Daniel Pitti

    University of Virginia


This paper argues that in text encoding, as in life, you cannot eat your cake and have it too. At some point it becomes necessary to make choices that rule out, or at least make far more difficult, recourse to other, incompatible choices. The constraints that enforce these consequences may be economic, intellectual, political, technical, or functional. The point at which they impinge on a particular encoding project depends on the nature and context of that project, and often on factors entirely independent of either; these factors are also likely to be weighted differently from one situation to the next.
The Institute for Advanced Technology in the Humanities (IATH) is unusual in the world of humanities computing and text encoding in that it supports projects in disciplines ranging from architecture and archaeology to history, religion, classics, linguistics, literary studies, and more. This makes IATH, in effect, a text-encoding generalist: it deals with many types of data, and that puts us in a position to understand, from experience, the choices we all have to make.
This paper will provide examples of text-encoding choices and their consequences, including:
TEI-based projects, such as Hoyt Duggan's Piers Plowman Electronic Archive (http://www.iath.virginia.edu/piers/), which uses TEI to encode a literary text: although TEI is not particularly good for source description, it is very good for abstracting text and making it available for concordancing, collation, and linguistic analysis;
Michael Satlow's Inscriptions project (http://www.iath.virginia.edu/mls4n/), with its TEI-based inscript DTD, where TEI's implicit assumption that the textual transcription (the body of the etext) is of central importance runs up against the fact that the bibliographical description (the header of the etext) is actually the center of scholarly attention;
Projects that have developed their own DTDs, such as Jerome McGann's Rossetti Archive (http://www.press.umich.edu/bookhome/rossetti/index.html). McGann's reasons for departing from TEI have already been set out in The Rationale of Hypertext (http://www.iath.virginia.edu/public/jjm2f/rationale.html) and in the more recent "Imagining What You Don't Know: The Theoretical Goals of the Rossetti Archive" (http://www.iath.virginia.edu/~jjm2f/chum.html), but they will be reviewed as part of this presentation.
The Blake Archive (eds. Eaves, Essick, and Viscomi: http://www.iath.virginia.edu/blake/) is a prototypical instance of an editorial decision that militates against use of the TEI. The Blake editors have chosen to focus on Blake's work as a printer and to privilege the plate over the poem as the basic unit of that work. This central focus on what TEI treats as source description was critical to the project, and to the editors' decision to develop their own DTD.
Projects that use a mix of SGML and database tools. Among the IATH applications of database encoding are census records, military rosters, and production control data from the Valley of the Shadow Project (http://www.iath.virginia.edu/vshadow2/); item-level descriptions of images in the Pompeii Forum Project (http://pompeii.virginia.edu/); item-level descriptions of text and image production data from the London Project; and scanning records from the Blake Archive. Databases handle certain kinds of numerical processing better than most SGML delivery applications, and database records can be a more appropriate format for information that lacks significant hierarchical structure.
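The claim in the Piers Plowman item, that TEI is very good at abstracting text for concordancing, can be illustrated with a minimal sketch. It assumes a hypothetical TEI-style fragment in which verse lines are `<l>` elements carrying an `n` attribute (the sample lines are invented, not taken from the archive), and it builds a simple keyword-in-context (KWIC) concordance with Python's standard `xml.etree.ElementTree`. The function names `line_texts` and `kwic` are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment: <l> marks a verse line, n its number.
# The sample lines are invented for illustration.
SAMPLE = """<text><body>
<l n="1">The text is abstracted from its source</l>
<l n="2">and the source is described in the header</l>
</body></text>"""


def line_texts(xml_source):
    """Yield (line number, plain text) for every <l> element."""
    root = ET.fromstring(xml_source)
    for line in root.iter("l"):
        # itertext() flattens any nested markup to its character content
        yield line.get("n"), "".join(line.itertext())


def kwic(xml_source, keyword, width=2):
    """Return (line, left context, keyword, right context) for each hit."""
    rows = []
    for n, text in line_texts(xml_source):
        words = re.findall(r"\w+", text.lower())
        for i, word in enumerate(words):
            if word == keyword.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                rows.append((n, left, word, right))
    return rows


for row in kwic(SAMPLE, "source"):
    print(row)
```

Because the markup separates the text from its physical presentation, the same extraction step could feed collation or linguistic analysis just as easily.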
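The Blake Archive's choice to privilege the plate over the poem is, at bottom, a choice between two hierarchies that need not nest neatly: in general a poem may run across several plates, and a plate may carry text from more than one poem. The sketch below assumes hypothetical plate-centred markup (the `<book>`, `<plate>`, and `<frag>` names and the `poem`/`seq` attributes are inventions, not the Blake Archive's DTD). It shows that once the plate is the structural unit, the poem-centred view can be recovered only indirectly, by collecting fragments through their attributes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical plate-centred markup: <plate> is the container, and each
# text fragment points to the poem it belongs to via attributes.
# Names are illustrative, not taken from any real project DTD.
PLATES = """<book>
<plate n="1"><frag poem="A" seq="1">first stanza of A</frag></plate>
<plate n="2"><frag poem="A" seq="2">second stanza of A</frag>
             <frag poem="B" seq="1">opening of B</frag></plate>
</book>"""


def poems_from_plates(xml_source):
    """Rebuild a poem-centred view: {poem: [(seq, plate, text), ...]}."""
    root = ET.fromstring(xml_source)
    parts = defaultdict(list)
    for plate in root.iter("plate"):
        for frag in plate.iter("frag"):
            parts[frag.get("poem")].append(
                (int(frag.get("seq")), plate.get("n"), frag.text))
    # Order each poem's fragments by their sequence number
    return {poem: sorted(frags) for poem, frags in parts.items()}


view = poems_from_plates(PLATES)
for poem, frags in sorted(view.items()):
    print(poem, [(plate, text) for _, plate, text in frags])
```

The design choice cuts both ways: a poem-centred DTD would make this view direct and force the plate into pointers or milestones instead.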
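For the database item, a minimal sketch using Python's built-in `sqlite3` module shows the kind of numerical processing meant: flat, non-hierarchical records grouped and aggregated in a single query. The table, columns, and values are invented for illustration and are not drawn from the Valley of the Shadow data.

```python
import sqlite3

# Hypothetical flat, census-style records (invented values).
ROWS = [
    ("County A", "farmer", 1200),
    ("County A", "merchant", 3500),
    ("County B", "farmer", 800),
    ("County B", "farmer", 1000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (county TEXT, occupation TEXT, property INTEGER)")
conn.executemany("INSERT INTO census VALUES (?, ?, ?)", ROWS)

# Grouping and averaging are built into the query language; an SGML
# delivery application would typically need custom code for this.
summary = conn.execute(
    "SELECT county, COUNT(*), AVG(property) FROM census "
    "GROUP BY county ORDER BY county"
).fetchall()
conn.close()

for county, n, avg in summary:
    print(f"{county}: {n} records, mean property {avg:.0f}")
```

Records like these have no meaningful internal hierarchy, so a table represents them at least as faithfully as markup would.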
These examples (and others) will be described and analyzed with respect not only to problems of encoding but also, in particular, to the problems that arise when encoded text must be processed for output under one or another delivery system. The results of such processing will be demonstrated, in an attempt to derive from our experience some general criteria that other projects could use to assess the costs and benefits of different encoding strategies.
Finally, this paper will describe some recent and ongoing work at IATH that uses commercial document-management and digital-library software to address the problem of generating and tracking control data (production information, version control, etc.) and of integrating or harvesting these data into SGML. We will also consider the possibility that XML may reduce the impetus to adopt standardized representations of data for the sake of gaining a benefit at the level of processing: style-sheet standards such as XSL and DSSSL, together with HyTime, could make it possible to move from rather idiosyncratic representations into more or less standard architectures. On the other hand, this is only likely to make it obvious where the real issues lie: representation is an intellectual activity, not merely a technical necessity, and standard (community-based) intellectual representations matter for reasons that have less to do with processing than with analysis, theory, and our understanding of what the problems are, what the perspectives are, and what we know about the information.
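The control-data problem described in the final section can be sketched as a small export step: production records, as they might come out of a document-management system, are serialized into a header element that could then be merged into an encoded text. This is a minimal sketch under stated assumptions: the record fields are hypothetical, the `<revisionDesc>`/`<change>` structure is only loosely modeled on TEI's revision description (the attribute names are illustrative), and the output is XML produced with `xml.etree.ElementTree` rather than full SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical production-tracking records, as they might be exported
# from a document-management system; field names are assumptions.
RECORDS = [
    {"date": "1998-03-02", "who": "encoder1", "what": "initial transcription"},
    {"date": "1998-04-15", "who": "editor1", "what": "proofread against source"},
]


def revision_desc(records):
    """Serialize control data as an element loosely modeled on TEI's
    <revisionDesc>; the child structure here is illustrative only."""
    desc = ET.Element("revisionDesc")
    # ISO dates sort correctly as strings; list the most recent change first
    for rec in sorted(records, key=lambda r: r["date"], reverse=True):
        change = ET.SubElement(desc, "change", date=rec["date"], who=rec["who"])
        change.text = rec["what"]
    return ET.tostring(desc, encoding="unicode")


print(revision_desc(RECORDS))
```

Keeping the records in the tracking system and generating the markup on demand avoids maintaining the same production history in two places by hand.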
Other relevant IATH Projects:
The Emily Dickinson Archive (http://www.iath.virginia.edu/dickinson/)
The World of Dante (http://www.iath.virginia.edu/dante/)
The Walt Whitman Archive (http://www.iath.virginia.edu/whitman/)
Manumission Inscriptions (http://www.iath.virginia.edu/meyer/)


Conference Info


ACH/ALLC / ACH/ICCH / ALLC/EADH - 1998
"Virtual Communities"

Hosted at Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)

Debrecen, Hungary

July 5, 1998 - July 10, 1998

109 works by 129 authors indexed

Series: ACH/ALLC (10), ACH/ICCH (18), ALLC/EADH (25)

Organizers: ACH, ALLC

Tags
  • Keywords: None
  • Language: English
  • Topics: None