This poster (with accompanying downloadable dataset and application) will demonstrate as a proof-of-concept a comprehensive image-based publication and analysis of a text bearing artifact, [catalog number redacted for anonymous review], a hitherto unpublished 10th Century palimpsest fragment of the Gospel According to Matthew. The fragment contains most of the “Parable of the Sower”.
In editing this text, we sought to be as comprehensive as possible, capturing:
Natural light and UV images, both overview images and details
A diplomatic transcription of the overwritten text of Matthew and any legible characters from the undertext
A word tokenization of the diplomatic transcription, mapping to each token:
morphology, part of speech, and syntactic relations
alignment to the image data
A character tokenization, aligned to the image data
An edition of the whole Gospel according to Matthew from a critical edition, for comparison and context
Translations aligned to the text
In publishing it, we sought simplicity, longevity, and clarity. While we use TEI XML as a format for capturing an initial transcription, the overlapping analytical categories, many-to-many alignments of text and image, and open ended possibilities for commentary precluded implementing a coherent data model fully in XML. At the same time, we wanted a concise and integrated publication.
By using the CEX format
CEX (CITE Exchange Format) is a plain text format for capturing data about texts and collections, based on the CITE/CTS architecture and developed by C. Blackwell (
Homer Multitext), T. Köntges (University of Leipzig), and N. Smith (
Homer Multitext). For implementations and projects using CEX, see: T. Köentges, (Meletē)ToPān (topic modelling environment): https://thomask81.github.io/ToPan/; C. Blackwell, N. Smith, CEX Library (Scala): https://github.com/cite-architecture/cex; C. Blackwell, N. Smith, CEX Dataset Repository: https://github.com/cite-architecture/citedx
, a plain text, self-documenting format based on the CITE/CTS architecture, we are able to bring together these many levels of analysis in a form that is at once disaggregated, with each scholarly primitive explicitly and unambiguously citable, while still united in a single file. CEX allows us to distribute a fully integrated dataset in the form of a single plain text file and a single directory of images.
Our publication, a CEX file and a directory of images, is technology-agnostic readable by humans, but also able to serve as the data for an end-user application. We will describe, and have available for download and on USB thumbdrives, a lightweight, zero-configuration single page web application (SPA), fully usable offline, that integrates the data and images for this publication for end-users.
This application is based on the ScalaJS implementation of “CITE App” by C. Blackwell and N. Smith: https://github.com/cite-architecture/CITE-App
Finally, we will outline the low cost, low technology, collaborative work behind the digitization and editing of this manuscript fragment: off-the-shelf cameras, simple handheld UV lighting, readily available FOSS software.
We believe that this work will be of interest to the international Digital Humanities research community both as a new publication of a Byzantine Greek text and as a demonstration of a replicable and sustainable combination of technology and workflow. We think this approach provides a compelling alternative to XML or RDF editions and complex database-driven end user applications, offering advantages both on the back end (a flexible, scalable, and self-documenting format for implementing diverse data models), and on the front end (lightweight and portable presentation for readers). At the same time the data we present as CEX and images is easily transferrable to other standard formats.
Existing code libraries for working with CEX include a microservice framework that delivers textual and other data from CEX files as JSON objects, via HTTP requests (see https://github.com/cite-architecture/scs-akka ) and libraries that export CEX data into other formats, such as 2-column tabular data or 82XF (see https://github.com/cite-architecture/scm ).
All project data is under version control in a public GitHub repository, and licensed under a CC-BY license. All source code is under either a GPL or MIT public license.
If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.
Hosted at El Colegio de México, Universidad Nacional Autónoma de México (UNAM) (National Autonomous University of Mexico)
Mexico City, Mexico
June 26, 2018 - June 29, 2018
340 works by 859 authors indexed
Conference website: https://dh2018.adho.org/