What can you do with a TEI Writing System Declaration?

Mavis Cournane
University College Cork


The TEI's Writing System Declaration (WSD) is one of the four auxiliary DTDs. It is referred to by an entity declaration such as <!ENTITY foo SYSTEM "myTEI.wsd" SUBDOC>, in which the entity type is set to SUBDOC. SUBDOC is an SGML keyword which indicates that an entity's contents are to be treated as a separate SGML document with its own DOCTYPE declaration, potentially pointing to a different DTD. Kimber ([1], pp. 431-439) discusses the SUBDOC feature in detail and points out that, because subdocuments are one of two defined object types (documents are the other), they are stand-alone, self-describing objects that can be reliably re-used.

The WSD defines a triplet of language, writing system, and coded character set (such as ASCII or Unicode). Its main components are as follows ([2], pp. 679-680):

WritingSystemDeclaration, which declares the coded character set, entity set or transliteration formula used to transcribe a given writing system of a given language. Its attributes include name, the official name of the WSD, and date, the date on which the WSD was last modified.

language, the language being described by the WSD.

script, a description (not machine-processable) of the script declared by the WSD.

direction, which indicates the direction in which a language is written using a specific script.

characters, which contains a specification of the characters used in a specific writing system to write a particular language, and how those characters are represented electronically.

note, a note of any type in a writing system.

Implementing the Writing System Declaration

The second part of this presentation will deal with the use of the TEI's Writing System Declaration (WSD) to document and map a correct character encoding for non-Latin characters in complex multilingual texts. The problem to be addressed is this: the TEI's encoding defaults to ASCII characters, but some texts, such as the 11th-century Irish poem Adelphus Adelpha Mater, need characters from other alphabets. This section of the presentation looks specifically at the problem as exemplified in the Adelphus text, a poem with a Latin base that contains words in transliterated hellenized Hebrew and latinized Greek.

In the case of the poem Adelphus Adelpha Mater it was decided to hard-code the original Hebrew and Greek words into the poem via the markup. (I am particularly grateful to Professor Lewis M. Barth, Hebrew Union College, for his help in identifying the Hebrew characters and for suggested corrections to the Hebrew words, and to Ms Sinead O'Sullivan, St Anne's College, Oxford, for identifying the Greek characters.) This was achieved by modifying the TEI DTD to add the attribute reg to the element frn, which is used to identify words in a foreign language, as in the following example.

<L N="19">
<FRN LANG="he" reg="&gimelhb;&vethb;&reshhb;&vavhb;">Gibro</FRN>
<FRN LANG="el" reg="&pgr;&rgr;&xgr;&ogr;&ngr; &agr;&ggr;&agr;&thgr;&ogr;&ngr;">praxon agathon</FRN>
</L>

In the example above, character entities for Hebrew and Greek are contained in the reg attributes attached to the frn element. The frn element uses the lang attribute to identify the language concerned, with the value he for Hebrew or el for Greek. These character entities are associated with a WSD in the TEI header:

<profiledesc>
<langusage>
<language id="he" wsd="foo">Some of the words are in Hebrew.</language>
<language id="el" wsd="bar">Other words are in Greek.</language>
</langusage>
</profiledesc>
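As an illustration of what the reg attribute values in the verse example encode, the following Python sketch expands the entity references into Unicode characters. The Greek names follow the ISO 8879 ISOgrk1 entity set. The Hebrew names are taken from the example, and their code points (including the reading of vethb as the letter bet) are assumptions made for this sketch, not values taken from a published WSD. The function name expand_entities is hypothetical.

```python
import re

# Entity name -> Unicode code point.
# Hebrew code points are ASSUMPTIONS for this sketch (vethb is read as bet);
# Greek names follow ISO 8879 ISOgrk1.
ENTITY_CODE_POINTS = {
    "gimelhb": 0x05D2,  # HEBREW LETTER GIMEL
    "vethb": 0x05D1,    # HEBREW LETTER BET (assumed reading of "vet")
    "reshhb": 0x05E8,   # HEBREW LETTER RESH
    "vavhb": 0x05D5,    # HEBREW LETTER VAV
    "agr": 0x03B1,      # GREEK SMALL LETTER ALPHA
    "ggr": 0x03B3,      # GREEK SMALL LETTER GAMMA
    "thgr": 0x03B8,     # GREEK SMALL LETTER THETA
    "ngr": 0x03BD,      # GREEK SMALL LETTER NU
    "xgr": 0x03BE,      # GREEK SMALL LETTER XI
    "ogr": 0x03BF,      # GREEK SMALL LETTER OMICRON
    "pgr": 0x03C0,      # GREEK SMALL LETTER PI
    "rgr": 0x03C1,      # GREEK SMALL LETTER RHO
}

# Matches a general entity reference such as &vavhb;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(value):
    """Replace known entity references in an attribute value with characters.

    Unknown references are left untouched so that nothing is silently lost.
    """
    def replace(match):
        name = match.group(1)
        if name not in ENTITY_CODE_POINTS:
            return match.group(0)
        return chr(ENTITY_CODE_POINTS[name])

    return ENTITY_REF.sub(replace, value)


if __name__ == "__main__":
    print(expand_entities("&gimelhb;&vethb;&reshhb;&vavhb;"))
    print(expand_entities("&agr;&ggr;&agr;&thgr;&ogr;&ngr;"))
```

A lookup table of this kind only covers display of the characters. It does not replace the WSD, which documents what each entity means.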

One of the primary problems encountered by those implementing the WSD is that the documentation in the TEI Guidelines is rather confusing, and no practical examples of an implemented WSD are given. Use of a WSD may have multiple practical objectives (Professor Birnbaum will exemplify some and this section others). There is a concrete need to enable mapping between the coded character set (ISOheb and ISOgrk in this case) and the glyphs in the font you want to use, without sacrificing document portability.

The use of a WSD also poses a practical concern, as there is a lack of software which will handle WSDs automatically. Displaying these Hebrew and Greek character sets in a plain SGML editor like Emacs is very difficult, as there are few facilities for character replacement. Some graphical SGML editors permit a particular font to be specified for the display of particular elements and attributes. This ought to enable Hebrew and Greek entities such as &vavhb; and &xgr; to be displayed in the correct Hebrew and Greek fonts. However, not all editors provide this facility of entity replacement from attributes: SoftQuad's graphical SGML editor, Author/Editor, does not. SGML browsers such as Panorama Pro permit entity substitution in the before and after replications of attribute values, so attribute values can be assigned a different font and substitution takes place. It is the character entity set which enables this substitution, not the Writing System Declaration. Attempts to print an SGML file containing these Hebrew and Greek characters are likewise hindered by the lack of software to process the WSD automatically.

The WSD, however, provides not only for the documentation of the UCS (Unicode) value of a character but also for other mappings. In the case of the Hebrew character whose symbolic representation is א, it provides for a formal UCS code, i.e. 05D0, and an AFII code, i.e. E140.
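One way to picture the several mappings a WSD can record for a single character is as a small record per character. The Python sketch below is illustrative only: the class name, the field names, and the entity name alefhb are hypothetical, while the UCS and AFII values are the ones quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterMapping:
    """One character with its alternative codes (field names are hypothetical)."""
    entity: str  # symbolic entity name; "alefhb" below is an assumed name
    ucs: str     # UCS (Unicode) code as a hexadecimal string
    afii: str    # AFII code as a hexadecimal string

    def as_unicode(self):
        """Return the character identified by the UCS code."""
        return chr(int(self.ucs, 16))


# Values for UCS and AFII are those quoted in the text; the entity name is assumed.
ALEF = CharacterMapping(entity="alefhb", ucs="05D0", afii="E140")


def index_by(mappings, field):
    """Build a lookup table keyed on one of the recorded codes."""
    return {getattr(mapping, field): mapping for mapping in mappings}


if __name__ == "__main__":
    by_afii = index_by([ALEF], "afii")
    print(by_afii["E140"].entity, by_afii["E140"].as_unicode())
```

Keeping all codes on one record means a converter can be keyed on whichever code the target system understands.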

A short program was written in the Omnimark processing language to convert the SGML to LaTeX for printing. It uses the recursive do sgml-parse action so that processing of the current file (this document) breaks off when the entity reference in the WSD attribute of the LANGUAGE element is detected, processes the WSD file itself, and then resumes the main document. This enables the character entity names in the WSD to be interpreted and written out to disk as a style file in LaTeX format, so that they can be read in again during LaTeX processing to implement the exact character encoding required for the font used.

I wish to acknowledge the help of Peter Flynn, Computer Centre, UCC, in creating the Omnimark scripts, and his contribution towards my understanding of the WSD.
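The style-file step can be sketched as follows. This is not the Omnimark program described above but a minimal Python illustration of the same idea, assuming the entity names and font slot numbers have already been extracted from the WSD. The function names, the slot numbers in the usage example, and the choice of \newcommand with \symbol are all assumptions made for the sketch.

```python
def latex_style_lines(font_slots):
    """Turn {entity name: font slot number} into LaTeX macro definitions.

    Each entity name becomes a control sequence, so names must consist of
    ASCII letters only (a restriction of this sketch).
    """
    lines = []
    for name in sorted(font_slots):
        if not (name.isascii() and name.isalpha()):
            raise ValueError("not usable as a LaTeX command name: %r" % name)
        lines.append("\\newcommand{\\%s}{\\symbol{%d}}" % (name, font_slots[name]))
    return lines


def write_style_file(path, font_slots):
    """Write the definitions to disk so LaTeX can read them in later."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write("\n".join(latex_style_lines(font_slots)) + "\n")


if __name__ == "__main__":
    # Slot numbers here are invented for illustration only.
    for line in latex_style_lines({"vavhb": 101, "gimelhb": 98}):
        print(line)
```

Writing the definitions to a separate file keeps the font-specific slot numbers out of the SGML source, which is the portability concern raised earlier.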

REFERENCES

1. W. Eliot Kimber. Re-usable SGML: Why I demand SUBDOC. SGML '96 Conference Proceedings. Graphic Communications Association, USA, November 1996.

2. C.M. Sperberg-McQueen and Lou Burnard (eds). Guidelines for Electronic Text Encoding and Interchange (TEI P3). ACH, ACL, ALLC. Chicago, Oxford, 1994.

3. Peter Flynn. Writing System Declarations in the TEI. http://imbolc.ucc.ie/~pflynn/wsd/_

Bibliography

David Howlett. Five Experiments in Textual Reconstruction and Analysis. Peritia: Journal of the Medieval Academy of Ireland 9. Brepols, Belgium, 1995.

Harry E. Gaylord. Character Representation. Computers and the Humanities 29(1). Kluwer Academic Publishers, The Netherlands, 1995.

David J. Birnbaum. Standardizing characters, glyphs, and SGML entities for encoding early Cyrillic writing. Computer Standards & Interfaces 18. Elsevier Science B.V., Amsterdam, 1996.


Conference Info


ACH/ALLC / ACH/ICCH / ALLC/EADH - 1998
"Virtual Communities"

Hosted at Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)

Debrecen, Hungary

July 5, 1998 - July 10, 1998


Series: ACH/ALLC (10), ACH/ICCH (18), ALLC/EADH (25)

Organizers: ACH, ALLC
