A Comprehensive Image-Based Digital Edition Using CEX: A fragment of the Gospel of Matthew

poster / demo / art installation
  1. 1. Janey Capers Newland

    Furman University

  2. 2. Emmett Baumgarten

    Furman University

  3. 3. De'sean Markley

    Furman University

  4. 4. Jeffrey Rein

    Furman University

  5. 5. Brienna Dipietro

    Furman University

  6. 6. Anna Sylvester

    Furman University

  7. 7. Brandon Elmy

    Furman University

  8. 8. Summey Hedden

    Furman University

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

This poster (with accompanying downloadable dataset and application) will demonstrate as a proof-of-concept a comprehensive image-based publication and analysis of a text bearing artifact, [catalog number redacted for anonymous review], a hitherto unpublished 10th Century palimpsest fragment of the Gospel According to Matthew. The fragment contains most of the “Parable of the Sower”.
In editing this text, we sought to be as comprehensive as possible, capturing:

Natural light and UV images, both overview images and details
A diplomatic transcription of the overwritten text of Matthew and any legible characters from the undertext
A word tokenization of the diplomatic transcription, mapping to each token:

a normalization
editorial status
lexical status
morphology, part of speech, and syntactic relations
alignment to the image data

A character tokenization, aligned to the image data
An edition of the whole Gospel according to Matthew from a critical edition, for comparison and context
Translations aligned to the text
Editors’ comments

In publishing it, we sought simplicity, longevity, and clarity. While we use TEI XML as a format for capturing an initial transcription, the overlapping analytical categories, many-to-many alignments of text and image, and open ended possibilities for commentary precluded implementing a coherent data model fully in XML. At the same time, we wanted a concise and integrated publication.
By using the CEX format

CEX (CITE Exchange Format) is a plain text format for capturing data about texts and collections, based on the CITE/CTS architecture and developed by C. Blackwell (
Homer Multitext), T. Köntges (University of Leipzig), and N. Smith (
Homer Multitext). For implementations and projects using CEX, see: T. Köentges, (Meletē)ToPān (topic modelling environment): https://thomask81.github.io/ToPan/; C. Blackwell, N. Smith, CEX Library (Scala): https://github.com/cite-architecture/cex; C. Blackwell, N. Smith, CEX Dataset Repository: https://github.com/cite-architecture/citedx

, a plain text, self-documenting format based on the CITE/CTS architecture, we are able to bring together these many levels of analysis in a form that is at once disaggregated, with each scholarly primitive explicitly and unambiguously citable, while still united in a single file. CEX allows us to distribute a fully integrated dataset in the form of a single plain text file and a single directory of images.

Our publication, a CEX file and a directory of images, is technology-agnostic readable by humans, but also able to serve as the data for an end-user application. We will describe, and have available for download and on USB thumbdrives, a lightweight, zero-configuration single page web application (SPA), fully usable offline, that integrates the data and images for this publication for end-users.

This application is based on the ScalaJS implementation of “CITE App” by C. Blackwell and N. Smith: https://github.com/cite-architecture/CITE-App

Finally, we will outline the low cost, low technology, collaborative work behind the digitization and editing of this manuscript fragment: off-the-shelf cameras, simple handheld UV lighting, readily available FOSS software.
We believe that this work will be of interest to the international Digital Humanities research community both as a new publication of a Byzantine Greek text and as a demonstration of a replicable and sustainable combination of technology and workflow. We think this approach provides a compelling alternative to XML or RDF editions and complex database-driven end user applications, offering advantages both on the back end (a flexible, scalable, and self-documenting format for implementing diverse data models), and on the front end (lightweight and portable presentation for readers). At the same time the data we present as CEX and images is easily transferrable to other standard formats.

Existing code libraries for working with CEX include a microservice framework that delivers textual and other data from CEX files as JSON objects, via HTTP requests (see https://github.com/cite-architecture/scs-akka ) and libraries that export CEX data into other formats, such as 2-column tabular data or 82XF (see https://github.com/cite-architecture/scm ).

All project data is under version control in a public GitHub repository, and licensed under a CC-BY license. All source code is under either a GPL or MIT public license.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info


ADHO / EHD - 2018

Hosted at El Colegio de México, Universidad Nacional Autónoma de México (UNAM) (National Autonomous University of Mexico)

Mexico City, Mexico

June 26, 2018 - June 29, 2018

340 works by 859 authors indexed

Conference website: https://dh2018.adho.org/

Series: ADHO (13), EHD (4)

Organizers: ADHO