On the Meaning of the Term 'text' in Digital Humanities

Paul Caton

Authorship

1. Paul Caton

Centre for Computing in the Humanities - King's College London

Caton, Paul, Centre for Computing in the Humanities, King's College London, pncaton@gmail.com
In digital humanities the word "text" (in both mass and count noun senses) occurs ubiquitously; familiar uses include text encoding, full text search, there are six texts online, we are using the text of the first edition. [Note: By the end of the paper I hope the relation between the mass and the count senses of the noun "text" will be clear. That "text" has a count noun sense is embodied in the Text Encoding Initiative Guidelines (2008 passim).] Typically the word is not defined in the specific context of its use, nor is there an overarching definition or description so widely accepted that it is taken as given at all times. However, our various uses of "text" show we have a priori assumptions about the nature and scope of its reference. But what are those assumptions? are they justified? do they collectively define "(a) text" for us? [Note: Text encoding models have of course benefited greatly from the "ordered hierarchy of content objects" definition proposed in DeRose et al 1990 and its subsequent refinements such as in Renear et al 1996. Caton 1999 discusses the relation between the TEI element <text> and the concept of "a text". Caton and the INKE Research Group 2010 argues for greater precision in the use of core digital humanities terms such as "text".]

In this preliminary investigation I approach from the outside in. I take a number of marginal cases and of each one ask "is this a text?" - because any attempt to answer that question must draw out the assumptions that underlie our common usages. I focus on the count noun sense because discrete entities with boundaries ought to be easier to recognize. In our professional lives we talk about "texts" all the time: oughtn't we to know one when we see one?

The marginal cases I discuss are an encrypted message (Figure 1), a sigil created by English occultist Austin Osman Spare (Figure 2), a minimal unit (Figure 3), and a poster with quoted words on it (Figure 4). [Note: Sigils in general are magical symbols, and as used by Austin Osman Spare "are developed by fusion and stylization of letters" (Frater U. D. 1991, 7). The letters come from a sentence that expresses a particular desire of the magical practitioner.]

Figure 1 - encrypted message. [Adapted from http://en.wikipedia.org/wiki/Caesar_cipher. Retrieved 28/10/2010.]

Figure 2 - creation of a sigil from a message string. [From Spare 1913, page 50.]

Figure 3 - minimal unit.

Figure 4 - Second World War poster. [From http://en.wikipedia.org/wiki/Never_was_so_much_owed_by_so_many_to_so_few. Retrieved 28/10/2010.]

Summary of Discussion
I suggest the following are core assumptions underlying our collective use of "a text":

representation of language: for any non-metaphorical use we think that language must be involved. Unlike a painting or a piece of music, which seem to affect us unmediated by language, for something to be "a text" it must resolve to language in our heads, even if what we see does not directly represent language. Thus we see Milton Glaser's famous logo "I [heart symbol] NY" and in our heads hear the words "I love New York", because the particular symbol-word association is so common that it resolves almost by default - especially given the linguistic context in which the heart symbol occurs. The encrypted message in Figure 1 is all linguistic symbols, but not directly interpretable as any language. Resolution to language depends upon knowing how to decipher the symbol sequence - though I suggest that as creatures of language even if we do not know the cipher (and so the sequence remains impenetrable to us) we think it likely that the sequence we see is a reversible transformation of a comprehensible linguistic sequence. We accord it an honorary status as "a text" whose lack of recognition is due to our ignorance, and not to its being something other than "a text". But Figure 2 is a different matter. Unlike the product of the cipher transformation - an incomprehensible string of what are recognisably linguistic symbols - sigilization transforms a comprehensible sequence of linguistic symbols into an almost purely graphic image. I suggest that seeing the final sigil without knowing its origin, we would not even accord it honorary status as "a text", because the deletions, substitutions, and spatial reconfigurations make it almost impossible to resolve back into language - there are so few clues that it started out as language in the first place. 
On the other hand, the linguistic message has not been replaced by a figurative image, in the way that a photograph of an emaciated child might replace the symbol sequence "children are starving" - indeed mimetic representation of the desire conveyed by the communication would not be to the purpose, "[t]he idea being," writes Spare, "to obtain a simple form which can be easily visualised at will, and has not too much pictorial relation to the desire." (50) For the person who creates the sigil the message is still completely present, implying that for them at least the sigil is "a text".
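
As an illustration only, the contrast between the cipher of Figure 1 and the sigil of Figure 2 can be sketched in Python. The shift of 3, the function names, and the sample phrases are assumptions made for this sketch. Sigilization is reduced here to a single deletion step (dropping repeated letters); the substitutions and spatial reconfigurations mentioned above are not modelled.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(message: str, shift: int) -> str:
    """Shift each letter by `shift` places; other characters pass through."""
    out = []
    for ch in message.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def drop_repeated_letters(message: str) -> str:
    """Keep only the first occurrence of each letter (hypothetical,
    simplified stand-in for the deletion step of sigilization)."""
    seen = []
    for ch in message.upper():
        if ch in ALPHABET and ch not in seen:
            seen.append(ch)
    return "".join(seen)


# Reversible: applying the opposite shift restores the original sequence.
plain = "CHILDREN ARE STARVING"
cipher = caesar(plain, 3)        # assumed shift of 3
recovered = caesar(cipher, -3)

# Not reversible: two different sequences collapse to the same result,
# so the reduced form alone cannot tell us which one it started from.
reduced_a = drop_repeated_letters("A TOAST")
reduced_b = drop_repeated_letters("AT OATS")
```

Anyone who knows the shift can get from `cipher` back to `plain`, which is the sense in which the encrypted message stays resolvable to language. The reduced strings carry no such guarantee, which is one way of seeing why the finished sigil offers so few clues to its linguistic origin.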

communication: in the normative case for "a text" we assume that a linguistic symbol sequence has been created to communicate, which is the primary function of such sequences. We assume the sequence forms a message (or, in the case of a fragment, would form a message if the entire sequence were present). We have such a propensity to find a message, to make sense out of a sequence, that we will try to establish "a text" even in the least promising cases. Because the glyph shown in Figure 3 represents a character that (in addition to being a letter) is also a lexical item in English it triggers that response, but because the lexical item is supposed to function as a determiner yet here determines nothing it gives us no semantic purchase. Compare this to Figure 3b:

Figure 3b

This is another glyph that represents a character that is both letter and lexical item, and here in majuscule form as proper to the lexical item. Because pronouns carry more semantics than indefinite articles it gives us more 'traction'; I suggest that we would rank Figure 3b as closer to being "a text" than Figure 3, even if we wouldn't commit to saying that it is "a text".

completeness: the completeness of the message embedded in the symbol sequence depends entirely upon the context. What is merely part of "a text" in one context can stand alone in another context, as the excerpt from Churchill's speech does in Figure 4. If we say that the poster in Figure 4 contains "a text", though, does that text contain the words "The Prime Minister" - or are we looking at two texts, one (just the quote) embedded in another (the whole linguistic symbol sequence)? When we hold a paperback book - an edition of Moby Dick, for example - how many texts are represented in that physical object?

Some Preliminary Conclusions
There is a type of thing called "text" which is a symbol or sequence of symbols that either directly represents language or can be resolved back into language by reversing an earlier, non-arbitrary transformation. In this mass noun sense, text exists, is independent of context, and independent of individual interpretation or experience. However, while text in the mass noun sense must be what makes up text in a count noun sense, there is no such thing as "a text" that is independent of context or of individual experience and interpretation. No linguistic symbol sequence is naturally "organic" or "unitary" (the adjectives used by the TEI Guidelines), though any complex sequence will have structural features that offer themselves as convenient boundaries. Nevertheless these boundaries are always artificial, as much recent work on genetic editions has shown. [Note: Particularly interesting examples are the work of Malte Rehbein on a medieval German town record book (2009), and of Justin Tonra on Thomas Moore's long poem "Lalla Rookh" (2009). In each case the multiplicity of symbol sequences that are candidates for being "a text" is striking.] Being "a text" is a status we give some text in a particular context and at our choosing. In this sense "a text" is, as Renear and Dubin say in a somewhat similar context, "a matter of contingent social/linguistic circumstances" (2007 p.8) and is thus - as they similarly concluded about three of the four FRBR Group 1 entity types - not a type but a role. In other words I suggest that being "a text" is not what Guarino and Welty would term a rigid property of any instance of text in its mass noun sense. [Note: Guarino and Welty define a rigid property as "a property that is essential to all its instances" where by "essential" they follow Lowe in saying that "an essential property of an object … [is one where] the object has that property always and in every possible world" (2001 p.57). An example of a rigid property that they use several times is PERSON: "if x is an instance of PERSON, it must be an instance of PERSON in every possible world" (2001 p.57). They contrast PERSON with STUDENT; STUDENT is a property an entity can have and then not have without the entity changing: the same is not true of PERSON.] A good deal of ontological work needs to be done, however, before this can be asserted with confidence.
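
The distinction between a type and a role can be made concrete with a small Python sketch. It is offered as one possible reading, not as Guarino and Welty's formalism or as a finished ontology: the class names `SymbolSequence` and `Context`, the method names, and the sample string (taken from the wording in the Figure 4 source URL) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SymbolSequence:
    """Text in the mass noun sense: the symbols alone, with no context.
    Being a SymbolSequence is treated here as the rigid property."""
    symbols: str


@dataclass
class Context:
    """A situation in which some sequences are given the status 'a text'.
    The status lives in the context, not in the sequence itself."""
    name: str
    designated: set = field(default_factory=set)

    def designate(self, seq: SymbolSequence) -> None:
        """Give `seq` the role of 'a text' within this context."""
        self.designated.add(seq)

    def is_a_text(self, seq: SymbolSequence) -> bool:
        """Report whether `seq` currently plays the role in this context."""
        return seq in self.designated


quote = SymbolSequence("Never was so much owed by so many to so few")

poster = Context("wartime poster")
speech = Context("complete speech")

# On the poster the excerpt stands alone; within the speech it is only a part.
poster.designate(quote)

on_poster = poster.is_a_text(quote)
in_speech = speech.is_a_text(quote)
```

The same unchanged `quote` object answers differently in the two contexts, much as an entity can be a STUDENT at one time and not at another while remaining the same PERSON. Nothing in `SymbolSequence` itself records whether it is "a text".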

References:
Caton, Paul 1999 “Using <TEXT> in TEI Markup,” ALLC/ACH conference, Virginia, June 1999

Caton, Paul, and INKE Research Group 2010 “No representation without taxonomies: Specifying key terms in digital humanities,” Digital Humanities 2010, London, July 2010

DeRose, Steven J., David Durand, Elli Mylonas, and Allen H. Renear 1990 “What is text, really?” Journal of Computing in Higher Education, 1 (2) 3-26

Frater U. D. 1991 Practical Sigil Magic: Creating Personal Symbols for Success [translated by Ingrid Fischer], St. Paul, MN: Llewellyn Publications

Guarino, Nicola, and Christopher Welty 2001 “Supporting ontological analysis of taxonomic relationships,” Data and Knowledge Engineering, 39 51-74

Rehbein, Malte 2009 “Reconstructing the textual evolution of a medieval manuscript,” Literary and Linguistic Computing, 24 (3) 319-327

Renear, Allen, David Durand, and Elli Mylonas 1996 “Refining our notion of what text really is,” Research in Humanities Computing [edited by Nancy Ide and Susan Hockey], Oxford: Oxford University Press

Renear, Allen H., and David Dubin 2007 “Three of the four FRBR Group 1 entity types are roles, not types,” Proceedings of the 70th Annual Meeting of the American Society for Information Science and Technology (ASIST) [edited by Andrew Grove], Milwaukee, WI (US)

Spare, Austin Osman 1913 The Book of Pleasure (Self-Love), London: Co-operative Printing Society Limited

TEI Consortium 2008 TEI P5: Guidelines for Electronic Text Encoding and Interchange [edited by Lou Burnard and Syd Bauman]

Tonra, Justin 2009 “Textual studies and the TEI: Encoding Thomas Moore's 'Lalla Rookh',” Jahrbuch für Computerphilologie, 10 (2009) 25-36

Conference Info

ADHO - 2011

"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO
