Penninc versus Vostaert. Contrasting co-authors by means of authorship attribution techniques

paper
Authorship
  1. Karina van Dalen-Oskam

    Huygens Institute for the History of the Netherlands (Huygens ING) - Royal Netherlands Academy of Arts and Sciences (KNAW), Dept. Dutch Linguistics and Literary Studies - University of Amsterdam

Work text

Only one complete version of the Middle Dutch Arthurian romance Roman van Walewein has come down to us, in a manuscript written in 1350. The text as it stands consists of 11,292 lines and about 65,000 words (tokens). As stated by the clerk who copied it, the romance was written by two authors: Penninc started the work and Pieter Vostaert finished it by adding the last 3300 lines - or the last 330, as has recently been argued. It is unknown when the romance was composed; most researchers opt for a tentative date between 1230 and 1260, which leaves about a century between the composition of the original text and the manuscript of 1350. If the two authors knew each other and both wrote their parts of the text around the middle of the thirteenth century, the subsequent century of copying may have smoothed out possible differences between them. Researchers of the text, however, have pointed to significant differences between Penninc and Vostaert. These were described at length by G.A. van Es in the introduction to his edition of the text, published in 1957. He based his characterisation of Penninc and Vostaert on, among other things, the analysis of syntactic features, the use of the past perfect and the historical present, the use of adverbs, the clearly different ways in which the authors handled certain motifs, and a selective analysis of their vocabulary. This last aspect, he explained, should be researched more deeply in the future, because exhaustive research would undoubtedly reveal more about the character and style of the two authors and might even yield more clues as to the time in which each was writing.

Modern authorship attribution techniques will be used to follow up on Van Es’s work. I will examine the distribution of parts of speech throughout the text (cf. Holmes 1994: 89-90) and carry out vocabulary analyses of several kinds (type-token ratio, vocabulary distribution, comparison of the vocabulary of both authors, vocabulary richness; cf. Holmes 1994: 91-98). Furthermore, because the set of authors is ‘closed’, limited to two, principal component analysis might yield interesting information in the comparison of both authors (Binongo and Smith 1999, Burrows 2003: 8-10, Somers and Tweedie 2003). The most interesting technique, however, is Burrows’s ‘Delta’, an analysis of the use of the most frequent words (Burrows 2002; Burrows 2003), not least because this kind of analysis was not part of Van Es’s research in 1957. Depending on the results of these tests, other (e.g. multivariate) techniques that might supplement the information will be used as well.
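
As an illustration of the most-frequent-word approach, the following is a minimal sketch of a Delta-style score: the mean absolute difference between the z-scores of the most frequent words in two texts, with the z-scores taken relative to a set of reference texts. The function names, the default of 30 words, and the use of the population standard deviation are assumptions made for this sketch; it is not the procedure used in the research described here.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in a list of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocab]


def burrows_delta(test_tokens, candidate_tokens, reference_texts, n_words=30):
    """Delta-style distance between a test text and a candidate text.

    reference_texts is a list of token lists; it supplies the list of
    most frequent words and the mean and standard deviation used to
    turn relative frequencies into z-scores.
    """
    pooled = Counter(tok for text in reference_texts for tok in text)
    vocab = [word for word, _ in pooled.most_common(n_words)]

    reference = [relative_freqs(text, vocab) for text in reference_texts]
    test = relative_freqs(test_tokens, vocab)
    candidate = relative_freqs(candidate_tokens, vocab)

    differences = []
    for i in range(len(vocab)):
        column = [freqs[i] for freqs in reference]
        mu = mean(column)
        sigma = pstdev(column)  # assumption: population standard deviation
        if sigma == 0:
            continue  # word is used identically everywhere; no information
        z_test = (test[i] - mu) / sigma
        z_candidate = (candidate[i] - mu) / sigma
        differences.append(abs(z_test - z_candidate))

    return mean(differences) if differences else 0.0
```

A lower score means the two texts use the most frequent words in a more similar way. In a two-author setting, a passage of uncertain origin could be scored against a sample attributed to each author, and the smaller of the two scores would point to the more likely author.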

In order to apply these techniques to the Roman van Walewein of Penninc and Vostaert, the text will be iteratively divided by a cursor into two smaller parts, both of which will be analysed with the above-mentioned techniques. The cursor will move through the complete text, and the assumption is that the different techniques will concur in locating the point at which the first and last parts of the text contrast most, indicating the place where Vostaert took over from Penninc.
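
The moving-cursor procedure can be sketched roughly as follows. The contrast measure used here (mean absolute difference in relative frequency over the most frequent words) is a simple stand-in chosen for this sketch, and the names scan_for_break, step and min_part are hypothetical; any of the techniques mentioned above could be substituted for the contrast function.

```python
from collections import Counter


def contrast(first, second, vocab):
    """Mean absolute difference in relative word frequency between two parts."""
    counts_first, counts_second = Counter(first), Counter(second)
    return sum(
        abs(counts_first[w] / len(first) - counts_second[w] / len(second))
        for w in vocab
    ) / len(vocab)


def scan_for_break(tokens, n_words=50, step=100, min_part=500):
    """Move a cursor through the tokens and score every split.

    Returns the cursor position with the highest contrast between the
    part before and the part after it, plus all scores by position.
    min_part keeps both parts large enough for frequencies to be meaningful.
    """
    vocab = [w for w, _ in Counter(tokens).most_common(n_words)]
    scores = {}
    for cursor in range(min_part, len(tokens) - min_part + 1, step):
        scores[cursor] = contrast(tokens[:cursor], tokens[cursor:], vocab)
    if not scores:
        raise ValueError("text too short for the chosen min_part")
    best = max(scores, key=scores.get)
    return best, scores
```

Keeping the full dictionary of scores, rather than only the maximum, makes it possible to see whether the contrast peaks sharply at one position or rises and falls gradually, which bears on whether a clear break exists at all.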

Because fourteenth-century Dutch did not yet have a standard spelling, the text is (semi-automatically) tagged at word level, giving each word a modern Dutch headword that brings all spelling variants and inflected or conjugated forms of a word under one heading; the text is also tagged for parts of speech. Word counts will be done for both the tagged text and the ‘original’ text, to find out in what way tagging changes the results of the tests (cf. Burrows 2003: 10).
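
The effect of headword tagging on word counts can be illustrated with a small sketch. The lookup table below is a hypothetical, hand-made example and is not taken from the actual tagging of the Roman van Walewein; it only shows how spelling variants collapse under one heading.

```python
from collections import Counter

# Hypothetical mapping from manuscript spellings to modern Dutch headwords.
HEADWORDS = {
    "coninc": "koning",
    "coninck": "koning",
    "ende": "en",
    "en": "en",
    "riddere": "ridder",
    "ridder": "ridder",
}


def count_forms(tokens):
    """Count word forms exactly as spelled (case-folded only)."""
    return Counter(tok.lower() for tok in tokens)


def count_headwords(tokens, headwords):
    """Count headwords; forms missing from the table count as themselves."""
    return Counter(headwords.get(tok.lower(), tok.lower()) for tok in tokens)


tokens = ["Coninc", "coninck", "ende", "riddere", "ridder", "en"]
print(len(count_forms(tokens)), "types in the original spelling")
print(len(count_headwords(tokens, HEADWORDS)), "types after tagging")
```

Because the number of types drops after tagging while the number of tokens stays the same, measures such as the type-token ratio and the ranking of the most frequent words can differ between the two versions of the text.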

The goals of this research are twofold. The first is to answer the following questions. Is it possible to pinpoint the place where Vostaert took over from Penninc? Did he indeed write about 3300 lines at the end, or only a meagre 330, or will the text as it was written down in 1350 give rise to new assumptions as to the existence (or non-existence) of a break? What will this mean for the suggestions earlier researchers have made as to the lines where Vostaert must have taken over? Another interesting question (cf. Forsyth 1999) is whether it will be possible to find out whether Vostaert wrote only a few years after Penninc or after a longer time - e.g. closer to the year 1350 than to 1250. To go a step further: could Vostaert be the clerk who wrote the manuscript in 1350, building on a (century-)old manuscript that contained Penninc's unfinished text? Or will this new research lead to new insights into the work of copying clerks? The second goal is to evaluate the usefulness of the chosen combination of authorship attribution techniques for this specific type of textual problem.

References

1. Binongo, J.N. and M.W.A. Smith, ‘The Application of Principal Component Analysis to Stylometry’. In: Literary and Linguistic Computing 14 (1999), 445-465.
2. Burrows, J., ‘“Delta”: a Measure of Stylistic Difference and a Guide to Likely Authorship’. In: Literary and Linguistic Computing 17 (2002), 267-287.
3. Burrows, J., ‘Questions of Authorship: Attribution and Beyond’. In: Computers and the Humanities 37 (2003), 5-32.
4. Es, G.A. van (ed.), De jeeste van Walewein en het schaakbord van Penninc en Pieter Vostaert. 2 Vols. Zwolle, 1957.
5. Forsyth, R.S., ‘Stylochronometry with Substrings, or: a Poet Young and Old’. In: Literary and Linguistic Computing 14 (1999), 467-477.
6. Holmes, D.I., ‘Authorship Attribution’. In: Computers and the Humanities 28 (1994), 87-106.
7. Johnson, D.F. and G.H.M. Claassens (Eds. and Transl.), Dutch Romances I: Roman van Walewein. Cambridge, 2000.
8. Somers, H. and F. Tweedie, ‘Authorship Attribution and Pastiche’. In: Computers and the Humanities 37 (2003), 407-429.
9. Tweedie, F.J., S. Singh and D.I. Holmes, ‘Neural Network Applications in Stylometry: The Federalist Papers’. In: Computers and the Humanities 30 (1996), 1-10.


Conference Info

Complete

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2004

Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004

105 works by 152 authors indexed

Series: ACH/ICCH (24), ALLC/EADH (31), ACH/ALLC (16)

Organizers: ACH, ALLC
