Stylometric Analyses of Character Speeches in French Plays

paper, specified "short paper"
Authorship
  1. 1. Ioana Galleron

    Université Sorbonne Nouvelle Paris 3

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


It is usually understood that literary authors have style, and numerous papers have been published about the usefulness of digital approaches and techniques for identifying stylistic specificities of a writer, for confronting styles, for attributing texts, etc. (see Holmes, 1994; Brunet, 2004; Herman et al., 2004). However, another traditional type of stylistic analysis in literary studies has been less operationalized in a digital paradigm, aimed at observing how the characters speak (see however Brynes 2010 and 2012). This paper contributes to the testing of digital tools for such an approach; in other words, it tries to answer, with digital tools, whether literary characters have a style, or if the signal of the author that creates them is prevalent over all other kinds of linguistic specificities.
In order to answer this question, the paper focuses on theatrical texts from the 17
th and 18
th French centuries. To the contrary of what happens in narrative texts, character’s discourses have clear boundaries in plays, and can be easily extracted from an XML/TEI encoded text. Also, characters in plays have been somewhat less studied with digital tools than characters in novels (Jockers and Kiriloff, 2016; Jockers 2013).

A sample of 8 comedies staged approximately from 1630 to 1740 has been put together; the sample tries to balance well known play writers, such as Molière, Corneille Destouches and Marivaux, with more obscure authors (Dancourt, Regnard, Boissy, Boursault). However, all the plays are “grandes comédies” in 5 acts and in verses, with comparable lengths and quite similar numbers of characters. A total amount of 82 discourses has been extracted using an XQuery under BaseX.
First, the plays as a whole (but without the stage directions or any kind of meta-discourse) have been submitted to a PCA using the stylometric library under R written by (Eder and al., 2016). As expected, differences between author styles are well underlined by their distribution on the graph, with Molière in the middle and Regnard closer to him than to the authors from the 18
th century.

This first analysis has been conducted only to confirm that the tool works on the type of texts it has been fed to, and yields sensible results.
Second, character’s discourses have been submitted to the same kind of analysis. After further adjustments, such as the exclusion from the corpus of too short roles that were skewing the general distribution on the graph, and the testing of various algorithms (covariance, correlation or cluster analysis? Classic Delta, Canberra or Eder’s Delta?), the following representation has been obtained:

As it can be observed, characters do not group by “origin” (i. e., more often than not characters from a play do not appear together), nor do they display a clear historical split - to the difference to what was happening with the authors. This tends to confirm that characters do have a style, whose parameters are to be further identified and tested. Moreover, when merging in a same txt file discourses of feminine, respectively masculine, characters from the same play/ author, significant differences appear in certain cases, with Molière’s Agnès and other feminine characters being the most intriguing case:

In a third stage, an analysis of characters’ speeches is conducted with TXM (see Heiden, 2010), so as to delineate the stylistic differences pointed to by the PCA and to attempt an explanation. Verbs do not seem to be useful discriminators, even when the texts are lemmatized. The calculus of specificities shows that personnel pronouns, names and possessives are the most discriminant features. To these, one may add the adjectives, which appear more frequently as a characteristic of male speeches according to the table of preferred words built with the “oppose” function under stylo library in R. While the importance of “pronouns of dialogue” for characterizing plays has already been underlined (Muller, 1979), the other features are a bit more surprising, since one would have expected, for instance, verbs to play a more prominent role, related to the actantial position of the characters. Also, it is not clear why male characters would use more adjectives than their feminine counterparts; sensibility and a trend towards pathos are more often evoked in relation to the second ones. After further inquiries with a new set of discourses, from other plays by the same authors, but also from other authors, so as to confirm the above mentioned phenomena, the paper will try to propose some explanations based on a close analysis of the contexts.

Bibliography

Brunet, Étienne. (2004). Où l’on mesure la distance entre les distances.
Texto!
http://www.revue-texto.net/Inedits/Brunet/Brunet_Distance.html

Brynes, R. (2010). A statistical analysis of the ‘Eumaeus’ Phrasemes in James Joyce’s Ulysses
. JADT 2010: Conference Proceedings, Rome: Edizioni Universitarie di Lettere Economia Diritto, pp. 289-96.

Brynes, R. (2012). The Stylometry of Cliche Density and Character.
Proceedings of JADT 2012, Liège/ Bruxelles: Université de Liège/ Facultés Universitaires Saint-Louis, pp. 239-46.

Eder, M., Kestemont, M. and Rybicki, J. (2013). Stylometry with R: a suite of tools.
Digital Humanities 2013: Conference Abstracts. University of Nebraska--Lincoln, NE, pp. 487-89.

Heiden, S. (2010). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme.
24th Pacific Asia Conference on Language, Information and Computation. Institute for Digital Enhancement of Cognitive Development, Waseda University, pp. 389-98.

Herrmann, J. B., van Dalen-Oskam, K. and Schöch, C. (2015). Revisiting style, a key concept in literary studies.
Journal of Literary Theory, 9(1) : 25-52.

Holmes, D.I. (1994) Authorship attribution.
Computers and the Humanities, 28(2): 87-106.

Jockers, M. and Kiriloff, G. (2016) Understanding Gender and Character Agency in the 19th Century Novel.
Journal of Cultural Analytics. DOI 10.22148/16.010

Jockers, M. (2013)
Macroanalysis. Digital Methods and Literary History, Urbana, Chicago and Springfield: University of Illinois Press.

Kastberg Sjöblom, M. (2004) L’indice pronominal est-il encore d’actualité?
Lexicometrica, (5): 1-21
.

Malrieu, D.
(2001). Stylistique et statistique textuelle. À partir de l’article de C. Muller sur les ‘pronoms de dialogue’.
Texto !
http://www.revue-texto.net/index.php?id=611

Muller, C. (1979).
Langue française et linguistique quantitative. Genève : Slatkine.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2019
"Complexities"

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

436 works by 1162 authors indexed

Series: ADHO (14)

Organizers: ADHO