Text Analysis Markup Language (TAML)

poster / demo / art installation
Authorship
  1. Stéfan Sinclair

    University of Alberta

Work text

Text analysis tools come in a wide variety of shapes and sizes, from small modules meant to do a specific task quickly and efficiently (like many Unix commands) to larger applications that integrate several functions into one interface (like HyperPo). A further distinction can be made between tools that are designed to be used locally (like TACT) and those that have some network intelligence built in (like TAPoR Tools). However, a common characteristic of almost all existing tools is an inability to communicate with other tools beyond the simplest input/output mechanisms. As a new era of fully networked text analysis tools emerges, we are in urgent need of a standardised syntax for transmitting information about both functionality and data between different tools.
The current situation in tools development can be likened to the disjointed and somewhat cacophonous predicament of text digitisation prior to the advent of the Text Encoding Initiative (TEI): many sophisticated and effective means of encoding texts had been devised, but a lack of standardisation made exploiting these texts a challenge. It can be argued that the TEI was a major catalyst in the explosion of text digitisation and encoding projects since the mid-1990s. It is hoped that a standardised syntax for text analysis tools will spark a similar upsurge in tools development, particularly since so many high-quality TEI-conformant texts are now available.
The Text Analysis Markup Language (TAML) project is an initial attempt to specify a standardised syntax for text analysis communication. TAML is a concentric two-tier system. The first tier contains functionality and content components that can be used independently of each other. The second tier is a transcript language meant to express a complete set of text analysis operations (see figure 1).
The TAML Transcript syntax anticipates three primary types of communication: 1) between an initial user interface and an abstract text analysis tool; 2) between various text analysis tools; and 3) between a text analysis tool and a location to store or view the results (such as a browser). A key component of the TAML Transcript system would be a central broker tool that would be able to dispatch requests to different tools, be they TAML-compliant or not (see figure 2).
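The broker role described above can be illustrated with a minimal sketch. It assumes far more than the abstract specifies: the class `Broker`, its `register` and `dispatch` methods, the two stand-in tools, and the use of a plain dictionary as the common result format are all hypothetical, and no actual TAML syntax is involved. The sketch only shows how one dispatcher might serve both a compliant tool and a non-compliant one wrapped in an adapter.

```python
from typing import Callable, Dict, Optional

# Hypothetical common result format: word -> raw frequency.
Result = Dict[str, int]


def compliant_counter(text: str) -> Result:
    """Stand-in for a compliant tool: returns the common format directly."""
    counts: Result = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def legacy_counter(text: str) -> str:
    """Stand-in for a non-compliant tool: returns 'word<TAB>count' lines."""
    counts = compliant_counter(text)
    return "\n".join(f"{word}\t{count}" for word, count in counts.items())


def parse_legacy(output: str) -> Result:
    """Adapter that converts the legacy tool's output to the common format."""
    result: Result = {}
    for line in output.splitlines():
        word, count = line.split("\t")
        result[word] = int(count)
    return result


class Broker:
    """Hypothetical central broker that dispatches requests by tool name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], Result]] = {}

    def register(
        self,
        name: str,
        tool: Callable[[str], object],
        adapter: Optional[Callable[[object], Result]] = None,
    ) -> None:
        # A non-compliant tool is wrapped so that callers always
        # receive the common result format.
        if adapter is None:
            self._tools[name] = tool  # assumed to return Result already
        else:
            self._tools[name] = lambda text: adapter(tool(text))

    def dispatch(self, name: str, text: str) -> Result:
        if name not in self._tools:
            raise KeyError(f"no tool registered as {name!r}")
        return self._tools[name](text)


broker = Broker()
broker.register("count", compliant_counter)
broker.register("legacy-count", legacy_counter, adapter=parse_legacy)
```

With this arrangement, `broker.dispatch("count", text)` and `broker.dispatch("legacy-count", text)` return results in the same shape, so the caller does not need to know which tools are compliant.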
Any standardisation effort is an exercise in compromise and difficult strategic decisions. For instance, what is the optimal level of generality for TAML? Should one build the application-specific intelligence into a schema by using specific tags, like <raw-frequency>, or use more general tags, like <item>, that rely on external documentation? In this context, TAML should only be seen as an opening gambit in a process of arriving at a standardised language for text analysis. What we can learn by trying to standardise the communication is almost as valuable as the standard itself.
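To make the trade-off between specific and general tags concrete, the following sketch encodes the same result both ways using Python's standard `xml.etree.ElementTree` module. Only the tag names <raw-frequency> and <item> come from the text. The attribute names `word`, `type` and `key`, the helper functions, and the sample word and count are assumptions made up for illustration, not part of any published TAML schema.

```python
import xml.etree.ElementTree as ET


def specific_encoding(word: str, count: int) -> ET.Element:
    """Application-specific tag: the element name itself carries the meaning."""
    element = ET.Element("raw-frequency", {"word": word})  # 'word' is hypothetical
    element.text = str(count)
    return element


def generic_encoding(word: str, count: int) -> ET.Element:
    """General tag: the meaning lives in attributes defined by external documentation."""
    element = ET.Element("item", {"type": "raw-frequency", "key": word})  # hypothetical attributes
    element.text = str(count)
    return element


# Illustrative values only; they are not drawn from any real text.
specific_xml = ET.tostring(specific_encoding("example", 42), encoding="unicode")
generic_xml = ET.tostring(generic_encoding("example", 42), encoding="unicode")

print(specific_xml)
print(generic_xml)
```

In the specific form, a schema can validate the element name directly, but every new kind of result requires a new tag. In the general form, the schema stays small, but a consumer has to consult external documentation to know what a `type` value means.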
This poster will present the current state of the TAML schema and provide examples of transcripts and prototype tool implementations.


Conference Info

Complete

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2004

Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004

105 works by 152 authors indexed

Series: ACH/ICCH (24), ALLC/EADH (31), ACH/ALLC (16)

Organizers: ACH, ALLC

Tags
  • Keywords: None
  • Language: English
  • Topics: None