TextAnnotator: A web-based annotation suite for texts

poster / demo / art installation
Authorship
  1. Giuseppe Abrami

    Johann-Wolfgang-Goethe-Universität Frankfurt am Main (Goethe University of Frankfurt)

  2. Alexander Mehler

    Johann-Wolfgang-Goethe-Universität Frankfurt am Main (Goethe University of Frankfurt)

  3. Manuel Stoeckel

    Johann-Wolfgang-Goethe-Universität Frankfurt am Main (Goethe University of Frankfurt)

Work text

The annotation of natural language texts and the use of these annotations is addressed in many projects in the digital humanities. This involves not only the generation of training data, but also the correction of errors introduced by automatic preprocessing. Nowadays there are many methods for automatic text analysis, and just as many tools which encapsulate them for different natural languages as well as for different programming languages. However, there are relatively few annotation tools for correcting annotations or generating training data. Existing annotation tools usually allow only simple annotation of texts and offer only simple visual annotation support. In addition, knowledge bases such as Wikipedia, Wikidata or Geonames can rarely be used. Furthermore, the administration of corpora, the use of different annotation views, the simultaneous and collaborative annotation of the same texts by different users, the user- and group-related granting of access permissions to texts, and the dynamic determination of Inter-Annotator-Agreements are almost non-existent. These limitations reveal a gap in the large field of digital humanities that TextAnnotator is intended to close.

TextAnnotator includes a variety of modules for the annotation of texts, covering argumentative, rhetorical, propositional and temporal structures, as well as a module for named entity linking and rapid annotation of named entities (Fig. 2). The modules for the annotation of temporal, argumentative and propositional structures in particular are currently unique among web-based annotation tools.

TextAnnotator, as a platform for annotating texts, is divided into a frontend and a backend component. The backend is a web service based on WebSockets, which integrates the UIMA Database Interface to manage and use texts.
UIMA acts as a de facto standard for all NLP tasks, and almost all preprocessing tools produce UIMA output. In order to use raw texts and preprocessed texts with TextAnnotator, they are first automatically preprocessed and converted into the UIMA format with the help of TextImager. In addition, texts are made accessible through the ResourceManager and the AuthorityManager, based on user and group access permissions (Fig. 1). Combining these components allows one tool to be used flexibly for different projects and purposes. Texts can therefore be placed in a flexible folder structure and edited by different teams. In addition, different views of a document can be created and used depending on the scenario.

Figure 1: Schematic diagram of the use of annotation views (AV). TextAnnotator has access to documents that contain annotation views which are accessible to users. Based on this assignment, TextAnnotator makes the annotations in the individual views available to the implemented annotation tools.

Figure 2: Extract from an annotation session using the KnowledgeBaseLinker. The individual tokens can be linked to knowledge resources, or existing entries can be modified. In this scenario the texts were already automatically preprocessed by TextImager, and the lower line shows an implicit relation which was derived from the Wikidata entries of the respective assignments to the knowledge base Wikipedia.

The frontend component, developed in ExtJS, provides browser-based access to the texts and the available annotation tools. Once a document has been opened, the user gains access to its annotations, which are organized in annotation views (Fig. 3). Access permissions can be assigned to any annotation view and, by default, each user obtains his or her own user view for every annotated document. In addition, with sufficient access permissions, all annotation views can also be used and curated (Fig. 3).
This makes it possible to calculate an Inter-Annotator-Agreement for a document, which indicates the degree of agreement between the annotators. Annotators without sufficient rights cannot display this value, so that the annotators do not influence each other.

This contribution is intended to reflect the current state of development of TextAnnotator, demonstrate the possibilities of an instantaneous Inter-Annotator-Agreement and trigger a discussion about further functions for the community.

Figure 3: An open document shown in TextAnnotator. The annotation views are displayed on the left. User views are named after their users; the other views are annotation views that can be authorized by the user. The IAA value (right) of the document is visualized based on the selected annotation views (left) and the previously selected annotation classes. The selection on which the agreement is based (views, classes) can be changed arbitrarily, and the agreement is recalculated immediately. At the same time, the agreement is highlighted in different colors.
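
The abstract does not state which agreement coefficient TextAnnotator computes. As a minimal sketch, assuming Cohen's kappa over token-level labels from two annotation views, an agreement value for a selection of annotation classes could be calculated roughly as follows. The data model (a view as a mapping from token index to class label), the function names and the "O" placeholder label are assumptions made for illustration and are not part of TextAnnotator's interface.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical data model: an annotation view maps a token index to a class label.
View = Dict[int, str]

NONE_LABEL = "O"  # assumed placeholder for tokens without a (selected) annotation


def labels_for(view: View, n_tokens: int, classes: Optional[Set[str]]) -> List[str]:
    """Project a view onto one label per token, keeping only the selected classes.

    Passing classes=None keeps every class found in the view.
    """
    labels = []
    for i in range(n_tokens):
        label = view.get(i, NONE_LABEL)
        if classes is not None and label not in classes:
            label = NONE_LABEL
        labels.append(label)
    return labels


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two equally long label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of tokens on which both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Two hypothetical user views of a five-token document.
    view_a = {0: "PER", 1: "PER", 3: "LOC"}
    view_b = {0: "PER", 3: "ORG"}
    for selection in (None, {"PER"}):
        kappa = cohen_kappa(
            labels_for(view_a, 5, selection),
            labels_for(view_b, 5, selection),
        )
        print(selection, round(kappa, 3))
```

Restricting the class selection changes the projected label sequences and therefore the resulting value, which mirrors how the displayed agreement depends on the chosen views and classes.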


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO