Script Analysis In A World Of Anonymous Writers Digital Forensics for Historical Documents

poster / demo / art installation
Authorship
  1. 1. Hannah Busch

    Huygens Institute for the History of the Netherlands (Huygens ING) - Royal Netherlands Academy of Arts and Sciences (KNAW), Utrecht University

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


In the medieval period script and handwriting were not very personal: they were a taught and strictly constructed set of characters. Yet scripts did develop over time, from the cursive scripts of antiquity to Caroline minuscule, Gothic textualis, humanist and early modern scripts. The discipline of studying those chronological and regional variances of script is called palaeography, and it requires a deep and detailed knowledge of historical script features.
The work of palaeographers is characterized by their high level of expertise and their skill of minute manual comparison with the scope to date and localize handwritings. Therefore, there are and there have been only a few authorities in the field. To date script classification addresses the “subjectivity of the human mind,”(Stutzmann, 2016b) but computational methods from the field of image analysis and pattern recognition could be able to objectify the study of historical scripts (see e.g.
Ciula, 2004 and 2005; Stokes, 2007 and 2009; Stutzmann, 201
6a
). The decisive reason for the application of computational approaches is not the distrust in the opinion of the few experts, but the fact that due to mass digitization thousands of medieval manuscripts have become digitally available. Beside the enormous amount of high quality image data,
new standards are being established, such as

IIIF

, which allows to browse through and compare manuscripts kept at different institutions, and at different ends of the world with one mouse click. The increasing computing power enables scholars to process larger datasets and to analyze the images.
There have been attempts in the past to develop quantitative approaches, which have experienced critique by those authorities, like Bischoff who already in 1979 suspected, that palaeography, which used to be an art of vision and empathy, becomes an art of measuring by technical means (Bischoff, 1986: 19). The worries about computational methods might be justified, but as a matter of fact, the number of experts in the field has not grown in proportion to the material which has become available in recent years (see Nichols, 2016: 44). It seems obvious that thinking about new (quantitative) approaches to palaeography becomes more urgent than before.

Therefore, the here presented project
The project “Digital Forensics for Historical Documents. Cracking cold cases with new technology” is funded by the Research Fund of the Royal Netherlands Academy of Arts and Sciences from 2018 to 2021. attempts to create a digital tool, based on a deep learning system, in which the unique characteristics of a certain script sample will be matched with similar script samples by making use of digitized manuscript collections available in the world wide web and their IIIF APIs.

While methods of feature extraction regarding the segmentation and comparison of single letters or groups of letters are time consuming when it comes to larger amounts of image data, deep learning is the technique of choice for processing a large number of historical written documents (see Kestemont et al., 2017). Convolutional neural networks are able to learn the features of a script sample, to reproduce and memorize them, and to compare different samples of script and give a concrete statement about their similarity.
Hence, the
approach of deep learning is similar to the experienced eye of highly trained palaeographers, but the computer is able to compare even more and finer features than the human eye. Moreover, it can give a precise and objective output about the degree of similarity—or distinction between processed samples of scripts. And transform the subjective view of the palaeographer into an objective assessment.

In addition to the state of the art of the technical developments, we also want to discuss the challenges the humanities scholar using computational methods encounters: The need of ground truth data in order to cluster the script samples in a way manuscript scholars can extract information from the results (see Stutzmann, 2016a). The question that arises is whether the given information in the accompanying metadata can help us to add a “ground-truth” to the images available. Within the project we do not only want to build the technical framework, but also evaluate the deep learning approach for palaeographic research. Does the approach allow new insights on spatial and diachronic
evolution of medieval book scripts by comparing different “ground-truth data” (e.g. opinions of manuscript scholars, script classification system in use), and different neural networks: Which differences does the computer see? How do different settings change the classification? How similar are scripts that have been assigned to the same time and place of production, or script family? And, can the application of artificial neural networks help the manuscript scholars to answer their research questions, or the curator of digital collections to add more information to the catalog?

Bibliography

Bischoff, B. (1986
2).
Paläographie des römischen Altertums und des abendländischen Mittelalters. Berlin: Erich Schmidt Verlag.

Ciula, A. (2005). Digital palaeography: using the digital representation of medieval script to support palaeographic analysis.
Digital Medievalist, 1. DOI:
https://doi.org/
10.16995/dm.4.

---. (2004). Modelli digitali di scrittura carolina.
Gazette du livre médiéval 45, pp. 27–38.

Kestemont, M.
et al. (2017). Artificial Paleography: Computational Approaches to Identifying Script Types in Medieval Manuscripts.
Speculum, 92/S1, pp. 86–109, doi:
.

Nichols, S. (2016).
From Parchement to Cyberspace: Medieval Literature in the Digital Age. Bern: Peter Lang.

Stokes, P. (2007). Palaeography and Image-Processing: Some Solutions and Problems.
Digital Medievalist 3, doi:
https://doi.org
10.16995/dm.1
5.

Stokes, P. (2009). Computer-Aided Palaeography, Present and Future. In Busch, H. et al. (eds),
Kodikologie und Paläographie im digitalen Zeitalter / Codicology and Palaeography in the Digital Age, Norderstedt: BoD, pp. 309–38,
.

Stutzmann, D. (2016a). Clustering of medieval scripts through computer image analysis: Towards an evaluation protocol.
Digital Medievalist 10,
.

Stutzmann, D. (2016b). Classification of medieval handwritings in latin script. 28 Septembre 2016.
(accessed 2019/04/15).

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.