Mind the gap: Filling gaps in cuneiform tablets using Machine Learning Algorithms

poster / demo / art installation
Authorship
  1. Timo Homburg

    Fachhochschule Mainz (Mainz University of Applied Sciences)

Work text

Introduction

A persistent problem in Near Eastern studies is the existence of broken cuneiform tablets (listing 1.1). In recent years, efforts have been undertaken to 3D-scan (Mara et al. 2010), paleographically describe (Homburg 2019), and digitally reconstruct (Collins et al. 2014, 2017) broken fragments of cuneiform tablets. However, broken fragments cannot always complement each other, and parts of a tablet often remain destroyed. These fractures or gaps are not always easy for scholars to fill, and interpreting them takes a considerable amount of time. With more cuneiform text resources becoming digitally available, this publication sees an opportunity to investigate whether auto-complete algorithms based on machine learning and linguistic linked open data (LLOD) resources (Homburg 2017) can be useful in the reconstruction of cuneiform texts. The classification results are to be used to create an epoch- and language-specific recommendation system for filling gaps on cuneiform tablets, thereby assisting cuneiform scholars.

Related Work

Related work exists on autocompletion systems, which face the similar challenge of anticipating the user's input from context and other features (Leung & Zhang 2008, Gikandi 2006, Hyvönen & Mäkelä 2006). These technologies are relied on heavily in input method engines, which have been powered by various dictionary-based algorithms and, more recently, also by machine learning approaches and neural networks (Chen et al. 2015, Huang et al. 2018). Input method engines for cuneiform have been developed by Homburg et al. (2015).

Methodology

Following Homburg & Chiarcos (2016), the machine learning methods applied are based on grammatical rules (POS tagging), on dictionary-based methods exploiting (third-party) dictionary resources, or on statistical approaches using the following types of machine learning features:

– Context-dependent features, e.g. for Hidden Markov Model classifications
– Grammatical features derived from POS taggers
– Semantic features derived from the meaning of surrounding words
– Metadata features, e.g. text categorizations
– Paleographic features using PaleoCodage for a subset of manually annotated texts (Homburg 2019)

Experimental Setup

The effectiveness of the algorithms and features is tested on a corpus of all CDLI texts in ATF, which is split into a training and a test set. Texts are prepared with random gaps for classification and evaluated against the original texts (the gold standard), both on Unicode cuneiform and on the respective transliteration, for different cuneiform languages (Sumerian, Hittite, Akkadian) and epochs. The poster features selected preliminary results of the classification and a significance analysis of the features as a basis for discussing improvements. A possible future goal could be a shared task to improve classification accuracy, similar to the cuneiform language identification challenge (Jauhiainen et al. 2019).

Application

Lastly, the poster presents a prototypical application (fig. 1) displaying the results of the machine learning process, which is currently in development. The implementation builds on the concept of input method engines (Homburg et al. 2015) and will provide a self-learning component.

Figure 1 (prototype.jpg): Text Completion Prototype: "If...Enlil". The dictionary knows that Enlil is a god's name (NE) and is commonly preceded by the determinative character for god (an), which is therefore suggested first to fill the gap. The next most likely options are a person named Enlil (male or female), the people (tribe) of Enlil, or a location.
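
The gap-filling experiment described under Experimental Setup can be illustrated with a minimal sketch. It is not the classifier used for the poster: a simple count-based neighbour model stands in for the context-dependent features, and all names (GAP, train_context_model, suggest, make_gaps, fill_gaps, evaluate) as well as the toy token sequences are assumptions made for illustration, not CDLI data. The sketch hides random tokens, ranks candidates for each gap from the surrounding tokens, and scores the suggestions against the original text as the gold standard.

```python
import random
from collections import Counter, defaultdict

GAP = "[...]"  # hypothetical marker for a destroyed token


def train_context_model(texts):
    """Count which token occurs for each (left, right) neighbour pair,
    for each left neighbour alone, and overall (used as fallbacks)."""
    by_both = defaultdict(Counter)
    by_left = defaultdict(Counter)
    unigrams = Counter()
    for tokens in texts:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for i in range(1, len(padded) - 1):
            left, tok, right = padded[i - 1], padded[i], padded[i + 1]
            by_both[(left, right)][tok] += 1
            by_left[left][tok] += 1
            unigrams[tok] += 1
    return by_both, by_left, unigrams


def suggest(model, left, right, k=3):
    """Return up to k candidates, most frequent first, backing off from
    both neighbours to the left neighbour to plain token frequency."""
    by_both, by_left, unigrams = model
    for counts in (by_both.get((left, right)), by_left.get(left), unigrams):
        if counts:
            return [tok for tok, _ in counts.most_common(k)]
    return []


def make_gaps(tokens, rate, rng):
    """Hide each token with probability `rate`; return the gapped text
    and a dict mapping gap positions to the hidden (gold) tokens."""
    gapped, gold = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            gapped.append(GAP)
            gold[i] = tok
        else:
            gapped.append(tok)
    return gapped, gold


def fill_gaps(model, gapped, k=3):
    """Map each gap position to its ranked suggestions."""
    padded = ["<s>"] + list(gapped) + ["</s>"]
    return {
        i: suggest(model, padded[i], padded[i + 2], k)
        for i, tok in enumerate(gapped)
        if tok == GAP
    }


def evaluate(model, test_texts, rate=0.3, seed=0, k=3):
    """Top-k accuracy of the suggestions against the gold standard."""
    rng = random.Random(seed)
    hits = total = 0
    for tokens in test_texts:
        gapped, gold = make_gaps(tokens, rate, rng)
        suggestions = fill_gaps(model, gapped, k)
        for i, answer in gold.items():
            total += 1
            hits += answer in suggestions[i]
    return hits / total if total else 0.0


# Toy sequences loosely styled after transliteration (illustrative only).
TRAIN = [
    ["{d}", "en", "lil2", "lugal"],
    ["{d}", "en", "lil2", "lugal", "kur"],
    ["{d}", "en", "ki", "lugal"],
    ["lu2", "en", "lil2"],
]
model = train_context_model(TRAIN)
print(fill_gaps(model, [GAP, "en", "lil2"]))
print(evaluate(model, TRAIN, rate=0.3, seed=0))
```

In this toy setting, a gap directly before "en" is ranked with the divine determinative first, which mirrors the behaviour described for the prototype in Figure 1. A real experiment would evaluate on a held-out test set rather than on the training texts, and would add the grammatical, semantic, metadata and paleographic features listed under Methodology.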


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO