Attributes: A Solution

Peter Cripps


1. The problem restated
The principal objective of The Wittgenstein Archives at Bergen University is to publish the literary estate – the Nachlass – of the Austrian philosopher Ludwig Wittgenstein in machine-readable
form. This Nachlass comprises some 20,000 pages
of which roughly 65% are manuscripts and the
remainder typescripts. The greater part of the
handwritten material, as well as much of the typed,
is replete with later alterations, deletions, insertions, rearrangements, cross-references and the
like, the sum of which constitutes a formidable
challenge to text encoding.
Some of the textual features we wish to record in
our transcription work invite the use of a markup
language which can describe properties on more
than one level.
For instance, a word or words constituting a later
addition to a text might be inserted either above or
below the original line of writing; such an insertion may or may not include a marking to indicate
that it is confirmed as an addition to the main text;
such a confirmation marking, if present, might in
turn be modified by means of a wavy underlining
to indicate that the addition has subsequently been
called into doubt, and this wavy underlining might
itself be crossed through, thereby effectively reaffirming what was formerly doubted.
And still this isn’t the whole story. In addition to
the different markings, we can often see
that each step in the process was made in a different ink, and this information needs to be recorded
independently of what that ink was used for. That is,
we can never assume a consistent correlation between, say, confirmation markings and blue ink,
or doubt markings and red ink.
What we have here is a text feature with one
essential property – its being a later addition to the
base text – and a number of secondary properties,
such as position (above or below the line) and
degree of authorial approval (“marked” = confirmed, “marked with added wavy underlining” =
doubted, “marked with added wavy underlining,
wavy underlining cancelled” = reaffirmed).
In Standard Generalized Markup Language
(SGML) secondary properties such as an insertion’s position, its degree of authorial approval (status) and the type of ink it was written with would
commonly be handled by means of attributes. But
when we consider the problem of Wittgenstein’s
insertions closely it becomes clear that SGML
attributes are not capable of describing such features in all their complexity.
Suppose in the above case that we have a single
SGML type generic identifier (GI) to describe the
primary property of a string’s being inserted and
that all the secondary properties are to be accounted for by means of attributes. The insertion’s
position is easy enough to deal with, since for any
one insertion this will be invariable. Thus we
might have a tag which looks like this:
<insertion position="above line"> ... </insertion>
Neither should the confirmation marking be a
problem so long as it isn’t modified by anything
else:
<insertion position="above line" marking=confirmation> ... </insertion>
But what should we do if the confirmation marking has subsequently been modified by a wavy
underlining to indicate doubt? The only way to
specify an attribute in SGML is to list it in the
open-tag of an element along with any others that
might be relevant. Both “marking=confirmation”
and “marking=doubt” appear to be relevant in our
example. Yet SGML does not allow the same attribute
to be specified twice in one tag, and in any case
the insertion as a whole cannot be
both confirmed and doubted!
And how should we represent the use of different
inks? We cannot simply add an attribute “ink=red”
if only the confirmation marking is in red ink
whereas the text of the insertion is in black.
What we seem to be dealing with is properties of
properties: the confirmation marking represents a
property of the insertion as a whole, whereas the
doubt marking annuls the confirmation; the fact of
being written in black ink is a property of the
insertion as a whole, whereas that of being written
in red describes the confirmation marking alone.
And so on. What we need here is a system whereby
attributes can qualify each other as well as the GIs
that describe the element’s primary property.
2. MECSA – a more flexible attribute syntax
MECSA is an attribute syntax currently being
developed as an adjunct to the Multi-Element
Code System (MECS), the markup language used
at the Wittgenstein Archives in Bergen. Well-formed MECS texts are convertible to and derivable
from SGML texts, and it is envisaged that MECSA
attributes will likewise be translatable to and from
SGML attributes.
MECSA incorporates a number of innovations,
the most significant of which is that it permits
attributes to qualify not only GIs but also other
attributes. This it does simply by allowing for
bracketing. For example, if an attribute “att2”
describes a property of an attribute “att1” (rather
than an immediate property of the feature described by the GI) then one can express the relationship in MECSA by quoting “att2” in brackets (see note 1)
immediately after “att1”, thus:
<GI att1=val1(att2=val2)> ... </GI>
Suppose further that “att2” requires the qualification of an attribute “att3”. This we would express
by appending “att3” as a parenthesis to “att2”,
thus:
<GI att1=val1(att2=val2(att3=val3))> ... </GI>
And so on.
When two or more attributes apply to the same
object (either the GI or another attribute) they are
simply listed concurrently within the relevant
bracket:
<GI att1=val1 att2=val2(att3=val3(att4=val4)
att5=val5)> ... </GI>
In this case “att1” and “att2” apply directly to
“GI”, “att3” and “att5” apply to “att2”, and “att4”
applies to “att3”.
MECSA makes it appropriate to talk of attributes
occurring on different levels. In the above example “att1” and “att2” are on the highest level and
could be called primary – or first-level – attributes,
while “att3”, “att4” and “att5” are sub-attributes
(on lower levels), whereof “att3” and “att5” are
second-level attributes and “att4” is on the third level. In this way we could say that SGML provides
a single-level attribute syntax, whereas that of
MECSA is multi-level.
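The bracketing and level rules just described can be sketched as a small recursive parser. The Python below is an illustration only and is not part of MECS or MECSA: the names Attr, parse_attrs and levels are hypothetical, values are assumed to contain no spaces or quotation marks, and “(” and “)” are assumed as the sub-attribute delimiters.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Attr:
    """One attribute, plus the sub-attributes that qualify it."""
    name: str
    value: str
    subs: list = field(default_factory=list)


def parse_attrs(text):
    """Parse e.g. 'a=v(b=w c=x(d=y))' into a list of first-level Attr nodes."""
    # Tokens are name=value pairs or single brackets; anything else,
    # such as whitespace, is skipped by this simplified tokenizer.
    tokens = re.findall(r"[\w-]+=[\w-]+|[()]", text)
    pos = 0

    def parse_level():
        nonlocal pos
        attrs = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                if not attrs:
                    raise ValueError("bracket with no attribute to qualify")
                pos += 1
                # The bracket qualifies the attribute immediately before it.
                attrs[-1].subs = parse_level()
                if pos == len(tokens):
                    raise ValueError("unclosed bracket")
                pos += 1  # consume ")"
            else:
                name, value = tokens[pos].split("=", 1)
                attrs.append(Attr(name, value))
                pos += 1
        return attrs

    result = parse_level()
    if pos != len(tokens):
        raise ValueError("unmatched closing bracket")
    return result


def levels(attrs, depth=1):
    """Yield (name, level) pairs; level 1 attributes qualify the GI itself."""
    for attr in attrs:
        yield attr.name, depth
        yield from levels(attr.subs, depth + 1)


tree = parse_attrs("att1=val1 att2=val2(att3=val3(att4=val4) att5=val5)")
print(list(levels(tree)))
# [('att1', 1), ('att2', 1), ('att3', 2), ('att4', 3), ('att5', 2)]
```

An application that wishes to ignore everything but the primary attributes would, in this sketch, simply read the top-level list and leave each node's subs unvisited.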
The possible uses for such a system are numerous.
In a particular application one might choose to
ignore everything but the primary attributes, or
alternatively to “work out” the brackets, beginning
with the most deeply nested and progressing outwards, before putting the primary attribute(s) to
their task(s). This will be clearer if we consider a
practical example.
Let us imagine some MECSA attributes to describe the different properties associated with Wittgenstein’s insertions as these are outlined above.
The different kinds of markings, which serve to
confirm or doubt an insertion, can be accounted
for by an attribute called “marking” which takes
the values “confirmation” (for the initial insertion
marking) or “doubt” (for the wavy underlining
indicative of doubt). The attribute “ink” takes one
of the values “black”, “blue” or “red”. Allowing
that the “marking” attribute can be applied to itself
and the “ink” attribute to both the GI and the
“marking” attribute, we might then tag a particular
insertion thus:
<insertion ink=black marking=confirmation
(ink=blue marking=doubt(ink=red))> ...
</insertion>
Suppose now that we are interested in Wittgenstein’s text as it looked after he had reworked it in
blue ink. At that stage he evidently approved
of the inserted material, and consequently it
should be included in the text we retrieve. In other
contexts the attribute “marking=doubt” might
well be used to suppress the effect of “marking=confirmation”. But by defining “ink=red”
such that it first suppresses the effect of “marking=doubt”, we can leave the attribute “marking=confirmation” to function uninhibited. In
this way we suppress what was in itself a suppressor.
On another occasion we might wish to view the
text as it looked after the first pass. To achieve this
we can use the “ink=blue” attribute to suppress
any effect the “marking=confirmation” attribute
might have. And so on.
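One possible reading of this "suppressing the suppressor" behaviour can be sketched in Python. The rules encoded here are assumptions made for illustration, not a definition taken from MECSA: a marking is ignored when its ink lies outside the set of inks chosen for the view, and a marking that is not ignored holds unless a marking nested inside it holds. The names ink_of, visible, in_force and status, and the labels "unmarked", "confirmed" and "doubted", are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Attr:
    name: str
    value: str
    subs: list = field(default_factory=list)


def ink_of(attr):
    """Return the value of the 'ink' sub-attribute, or None if absent."""
    for sub in attr.subs:
        if sub.name == "ink":
            return sub.value
    return None


def visible(marking, inks):
    """A marking with no recorded ink is treated as always visible."""
    ink = ink_of(marking)
    return ink is None or ink in inks


def in_force(marking, inks):
    """Assumed rule: a visible marking holds unless a nested marking holds."""
    if not visible(marking, inks):
        return False
    nested = [s for s in marking.subs if s.name == "marking"]
    return not any(in_force(s, inks) for s in nested)


def status(first_level, inks):
    """Classify an insertion for a view limited to the given inks."""
    for attr in first_level:
        if attr.name == "marking" and attr.value == "confirmation":
            if not visible(attr, inks):
                continue
            return "confirmed" if in_force(attr, inks) else "doubted"
    return "unmarked"


# <insertion ink=black marking=confirmation(ink=blue marking=doubt(ink=red))>
insertion = [
    Attr("ink", "black"),
    Attr("marking", "confirmation", [
        Attr("ink", "blue"),
        Attr("marking", "doubt", [Attr("ink", "red")]),
    ]),
]

print(status(insertion, {"black"}))                 # unmarked
print(status(insertion, {"black", "blue"}))         # confirmed
print(status(insertion, {"black", "blue", "red"}))  # doubted
```

Under the same assumed rule, a further marking nested inside the doubt marking (for instance a cancellation of the wavy underlining) would put the doubt out of force and so return the insertion to "confirmed".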
It is not difficult to imagine further applications
for such a system.
3. Parsing MECSA attributes
In MECSA, details about the legal combinations
of attributes and GIs are recorded in an Attribute
Definition Table (ADT), which in most respects
serves the same purposes as the ATTLIST declarations in an SGML DTD.
One of the functions of the SGML ATTLIST, or
– in the case of MECSA – the ADT, is to specify
the value for a legally applicable attribute in the
case that none is made explicit in a particular
document. In SGML this is a straightforward process. If the GI in a particular tag lacks a legal
attribute, then a parser can supply that attribute
together with its default value in accordance with
the appropriate ATTLIST. But since MECSA allows even one and the same attribute to apply to
itself as well as to a GI, an ADT cannot be allowed
to handle the ascription of attributes to attributes
in the same way as it handles the ascription of
attributes to GIs. The danger we have to guard
against is, of course, the possibility of an infinite
regress. It should be obvious what would happen
if a parser were asked to render explicit all the legal
attributes in a system where “att1” can apply to
“att1”!
MECSA avoids this danger by allowing for the
supplementation (extension) only of first-level attributes. If the ADT specifies that “att1” can legally be applied to “GI1” then a parser can, on encountering a tag where “GI1” lacks “att1”, supply
the attribute plus its default value. But although
the ADT might specify that “att1” can legally be
applied to “att1”, a MECSA parser will not supply
a lower-level occurrence of “att1” beneath one that is
already present. The ADT information that “att1”
can be applied to “att1” is available for the purpose
of checking whether the attributes which are already explicit in a document are legal, not for the
purpose of supplying default values where they
are absent.
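The restriction to first-level attributes can be sketched as follows. This Python fragment is illustrative only: the table layout, the function name supply_defaults and the default values shown are hypothetical and do not reproduce an actual ADT. Attributes are represented as (name, value, sub-attributes) tuples.

```python
# Hypothetical ADT fragment. Defaults exist only for attributes of a GI.
GI_DEFAULTS = {
    "insertion": {"position": "above line", "ink": "black"},
}

# Which attributes may qualify which attributes. In this sketch the table is
# consulted only when checking, never when supplying defaults, so the entry
# "marking may qualify marking" cannot start an infinite regress.
SUB_ATTRIBUTES = {
    "marking": {"ink", "marking"},
}


def supply_defaults(gi, first_level):
    """Add missing first-level attributes; sub-attributes are left untouched."""
    present = {name for name, _value, _subs in first_level}
    completed = list(first_level)
    for name, default in GI_DEFAULTS.get(gi, {}).items():
        if name not in present:
            completed.append((name, default, []))
    return completed


# <insertion marking=confirmation(marking=doubt)>
tag = [("marking", "confirmation", [("marking", "doubt", [])])]
for attribute in supply_defaults("insertion", tag):
    print(attribute)
```

In this sketch "position" and "ink" are added to the insertion itself, while the nested "marking=doubt" receives no supplied "ink" or "marking" of its own, even though the table would permit both.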
In an SGML ATTLIST the legal values for each
attribute are listed together with the legal attributes
for the particular GI. But in the MECSA ADT the
information about the values an attribute may take
is handled separately from the information about
the range of attributes which may legally be ascribed to a particular attribute object (GI or other
higher level attribute). This separation is again a
consequence of the fact that MECSA allows attributes to qualify attributes. (It would not be feasible
to express this possibility using the structure of the
SGML ATTLIST.)
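The separation of the two kinds of information can be sketched with two independent tables, one for legal values and one for legal attribute objects. The layout and the names LEGAL_VALUES, LEGAL_ATTRS and check are assumptions made for illustration; the attribute names and values are those used for the insertion example in this paper.

```python
# Table 1: the values each attribute may take, wherever it occurs.
LEGAL_VALUES = {
    "ink": {"black", "blue", "red"},
    "marking": {"confirmation", "doubt"},
    "position": {"above line", "below line"},
}

# Table 2: which attributes may be ascribed to which object. An object is
# either a GI or another attribute, distinguished here by a ("gi" | "att") tag.
LEGAL_ATTRS = {
    ("gi", "insertion"): {"ink", "marking", "position"},
    ("att", "marking"): {"ink", "marking"},
}


def check(owner, attrs):
    """Return error messages for (name, value, subs) attributes of owner."""
    errors = []
    allowed = LEGAL_ATTRS.get(owner, set())
    for name, value, subs in attrs:
        if name not in allowed:
            errors.append(f"{name} may not qualify {owner[1]}")
        if value not in LEGAL_VALUES.get(name, set()):
            errors.append(f"{value!r} is not a legal value of {name}")
        # Sub-attributes are checked against the attribute they qualify.
        errors.extend(check(("att", name), subs))
    return errors


good = [
    ("ink", "black", []),
    ("marking", "confirmation", [
        ("ink", "blue", []),
        ("marking", "doubt", [("ink", "red", [])]),
    ]),
]
bad = [("ink", "black", [("marking", "doubt", [])])]

print(check(("gi", "insertion"), good))  # []
print(check(("gi", "insertion"), bad))   # ['marking may not qualify ink']
```

Because the value table is keyed by attribute name alone, "marking" needs only one entry however deeply it is nested.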
It is worth noting, however, that a MECSA parser
does not necessarily presuppose the existence of
an ADT. In the absence of an ADT a MECSA
parser may still check the attribute syntax. But it
may also, if desired, compile a minimal-ADT
from the document itself. In doing so it notes
which attributes are applied to which GIs, which
to other attributes, and which values are already
attached to those attributes. This information is
then arranged in the format of a normal ADT, such
that this record can, if desired, be used as a measure of the correctness of further document instances. The one thing such a syntax-checker cannot
do is deduce an attribute’s default value. Instead,
in compiling a minimal-ADT, it supplies a System
Default Value (SDV) in the place where an explicit default value would stand in a full-ADT. This
SDV is a reserved character which ensures that
attributes remain functionally meaningless when
inserted into documents automatically on the basis
of a minimal-ADT. If and when a document containing such attributes is checked against a full-ADT, these SDVs can then be replaced by meaningful values.
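Compiling a minimal-ADT from a document can be sketched in Python as below. Everything here is a hypothetical illustration: the dictionary layout, the function name compile_minimal_adt and the choice of "?" as the reserved SDV character are assumptions, and attributes are again represented as (name, value, sub-attributes) tuples.

```python
SDV = "?"  # hypothetical reserved character standing in for a default value


def compile_minimal_adt(gi, attrs, adt=None):
    """Record which attributes qualify what, and which values were seen."""
    if adt is None:
        adt = {"applies": {}, "values": {}, "defaults": {}}

    def visit(owner, owner_attrs):
        for name, value, subs in owner_attrs:
            adt["applies"].setdefault(owner, set()).add(name)
            adt["values"].setdefault(name, set()).add(value)
            if owner[0] == "gi":
                # A default cannot be deduced from the document, so the
                # SDV is recorded where a full-ADT would give a real default.
                adt["defaults"][(owner[1], name)] = SDV
            visit(("att", name), subs)

    visit(("gi", gi), attrs)
    return adt


# <insertion ink=black marking=confirmation(ink=blue marking=doubt(ink=red))>
document_tag = [
    ("ink", "black", []),
    ("marking", "confirmation", [
        ("ink", "blue", []),
        ("marking", "doubt", [("ink", "red", [])]),
    ]),
]

adt = compile_minimal_adt("insertion", document_tag)
print(sorted(adt["applies"][("att", "marking")]))  # ['ink', 'marking']
print(sorted(adt["values"]["ink"]))                # ['black', 'blue', 'red']
print(adt["defaults"][("insertion", "ink")])       # ?
```

Passing the returned dictionary back in as the adt argument lets further tags accumulate into the same record, which can then serve as the measure against which later document instances are checked.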
4. Conclusion
Due to the nature of certain textual phenomena
encountered in Wittgenstein’s Nachlass, the need
arose for an attribute syntax of greater descriptive
resolution than that offered by SGML. MECSA is
a system which promises to satisfy this need by
allowing attributes to qualify not only GIs but also
one another. In designing this system such that
documents using its syntax can be converted to
and obtained from SGML documents, it is hoped
that MECSA will be of use also in contexts other
than the work currently being done at the Wittgenstein Archives in Bergen.
Notes
1. Technically speaking, MECSA provides for a
“sub-attribute open delimiter” and a “sub-attribute close delimiter”, the functions of which
could be assigned to any suitable characters.
That is, there is nothing compulsory about the characters “(” and “)”.
Bibliography
Charles F. Goldfarb, “The SGML Handbook”,
Oxford 1990.
Claus Huitfeldt, “MECS – A Multi-Element Code
System”, Bergen 1992; Working Papers from
the Wittgenstein Archives at the University of
Bergen, 1995.
C.M. Sperberg-McQueen & Lou Burnard (eds.),
“Guidelines for Electronic Text Encoding and
Interchange – TEI P3”, Chicago/Oxford, 1994.

Full text license: This text is republished here with permission from the original rights holder.

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1996