Character Network Analysis of Émile Zola's Les Rougon-Macquart

paper, specified "long paper"
  1. 1. Yannick Rochat

    École Polytechnique Fédérale de Lausanne (EPFL)

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Character Network Analysis of Émile Zola's Les Rougon-Macquart


EPFL, Switzerland


Paul Arthur, University of Western Sidney

Locked Bag 1797
Penrith NSW 2751
Paul Arthur

Converted from a Word document



Long Paper

character networks
theory of the character
distant reading

literary studies

In this work, we use network analysis methods to sketch a typology of fiction novels based on characters and their proximity in the narration. We construct character networks modelling the 20 novels composing
Les Rougon-Macquart, written by Émile Zola. To categorise them, we rely on methods that track down major and minor characters relative to the character-systems. For that matter, we use centrality measures such as degree and eigenvector centrality. Eventually, with this analysis of a small corpus, we open the stage for a large-scale analysis of novels through their character networks.

Character Network Analysis
A character network is a model of a novel’s plot focusing on a single dimension among the different types of narrative entities—that is, the
character or, at the level of the whole novel, the character-system:

[. . .] the arrangement of multiple and differentiated character-spaces—differentiated configuration and manipulations of the human figure—into a unified narrative structure. (Woloch, 2003, 14])
Characters are represented in the network by nodes. The relations among them are determined on the basis of their proximity in the narration: if two characters appear side-by-side more often than a given threshold, then a link (i.e., an edge) is created between them in the network (Rochat, 2014). If two characters never appear close together, or not significantly enough according to the defined threshold, then they are not linked in the character network.
As examples of existing research, Franco Moretti explored the narrative importance of a character by comparing some features of a character network before and after deletion of the said character (Moretti, 2011), while Mac Carron and Kenna (2012) extracted the structures of three mythological works (
Iliad, and
Táin) and compared them one to another and to real social networks, concluding that they were ‘discernable from real social networks’ and eventually proposing to rank them ‘from the real to the fictitious’ (5).

Les Rougon-Macquart
The novels constituting
Les Rougon-Macquart were published between 1871 and 1893, starting with
La Fortune des Rougon and ending with
Le Docteur Pascal. They cover a historical period going from 1852 to 1870. In these, Zola arranged a society of fictional and real characters in dissimilar ways, once focusing on a single character, and at other times dividing the attention between a few complementary protagonists, along with other characters recurring from one novel to another:

I wish to explain how a family [. . .] conducts itself in a given social system. [. . .] I shall endeavour to discover and follow the thread of connection which leads mathematically from one man to another. (Zola, 1967 [E. A. V. Merton, trans.])
In his study of
Les Rougon-Macquart’s character-systems, Philippe Hamon writes that some novels have a main protagonist, while others have more than one protagonist:

Polyfocalisation of the system on a few heroes—rather than unfocalisation, which alternately shares the ‘hero spots’ of the system—polyfocalisation of which
La Bête Humaine and
La Débâcle are the best examples, processes issued from a network made of marked ‘nodes’ and interstitial light layers, which take distance from a fixed ‘pyramid-like’ hierarchy (a hero, secondary and marginal characters, etc., according to a non-adjustable scale) of classic works. (Hamon, 1998, 320, my translation)

We propose a mathematical formalism to study these questions in the section on typology. The index of
centralisation measures how centralised the network is, i.e., how much more central the most central character is compared to all the other characters, ‘central’ being an open concept thus far. Then,
coreness highlights who the characters at the center of the narration are.

The Index
In order to construct the character networks, we consider an index built on the whole series (Zola, 1967, 1795–884), for which the indexer detailed his/her choices. It is a table compiling the occurrences of characters, from which we extract the co-occurrences that lead to the determination of the sets of edges. Contrarily to an automatic extraction process, we can rely here on the professional work of scholars, which provides exact positions at a page-level by disambiguating characters cited by nicknames, pronouns, or multiple names.
The index contains supplementary information, from which we use the novel names (characters frequently appear in more than one novel) and the characters’ descriptions to distinguish characters with the same name—for example, the six different characters named ‘Rose’.
Eventually, we transformed the index into a table composed of 40,768 entries, each one of them having three attributes: name of character, name of novel, and page. The table contains 1,343 unique characters and 7,290 unique pages.
The Networks
The table is then divided into 20 smaller tables, each one corresponding to a novel. We apply the method developed in Rochat (2014) to include co-occurrences on overlapping pairs of pages in order to take characters appearing in the same sentence but on different pages into account when creating the edges, since they need to be linked together. We build bipartite networks from these tables, with one set of nodes composed of the characters and the other set composed of the pages. Then we compute the graph projections on the sets of characters to obtain the characters’ networks, shown in Figure 1 (see Fruchterman and Reingold, 1991, for the layout algorithm).

Figure 1. The character networks of the
Rougon-Maquart’s 20 novels.

The character networks show significant diversity (Table 1). The number of nodes (i.e., the
order) varies from 16 to 88, and the number of edges (i.e., the
size) from 68 to 1,181. Works like
Le Rêve and
La Faute de l’Abbé Mouret feature few characters and relations: this is consistent with their intimate subjects. In comparison,
Au Bonheur des Dames, and
Germinal feature many characters and relations: they are composed of a rich crowd along with narrative events involving many characters.

Table 1. Basic network properties.
density of a network is the ratio of the number of existing edges by the number of all possible edges. Low density implies that the characters are sparsely connected, while high density means that the characters are more intricately connected to each other. In our case, this property can be used for categorisation, since large (
La Débâcle) and rather small (
La Fortune des Rougon) character networks obtain small density values. However, large density values can also be attained by large (
Germinal) as well as the small (
Le Rêve) character networks.

Typology Based on Major vs. Minor Characters
In this section, we develop two ways to categorise character networks by exploiting the distributions of major and minor characters. The first one consists in studying
centralisation, a global measure based on the centrality measures of all the characters, while the second one measures the
coreness of the network—that is, the size of a particularly dense subgraph that we view as a core of protagonists of the network.

Centrality is a wide concept mathematically expressed by families of measures reflecting particular properties of the network under study. For example, degree is one of them. Here, we use in particular betweenness centrality: it measures how much a character acts as an intermediary at the level of the network. Betweenness centralisation is the global network measure based on betweenness centrality: we sum the differences between the maximal betweenness score and each node’s betweenness score, and then divide it by the theoretical maximal sum (Freeman, 1979). A centralisation index returns a value located between 0 and 1: a value close to 0 means that there is no node playing a central role (e.g., a ring graph), while a value close to 1 implies that there is a centralised structure (e.g., a star graph).
We observe the scores in Table 2: most of the networks have low betweenness centralisation. However, those that rank first are significantly more centralised:
L’Oeuvre, L’Argent, Le Docteur Pascal, and
Son Excellence Eugène Rougon have one and only one protagonist (the main character of
L’Argent appears on every page), and
La Débâcle is the story of two men at the front and their strong friendship.

Table 2. Centralisation scores.

In order to delimit the
core of the network (in opposition to the
periphery), we consider the notion of
k-core (Seidman, 1983; Csardi and Nepusz, 2006), that is, the maximal induced subgraph with all its nodes having a degree equal or superior to
k. Normalised by its respective network order, the highest possible
k value in a network is a measure of how compact the main group of characters is. We call it

Results are shown in Figure 2, plotted with the networks’ orders.
La Faute de l’Abbé Mouret’s character network is composed of a very dense component consisting of more than half the total number of characters. We remark that among the three ‘polyfocalised’ novels noticed earlier by Hamon, two of them (
Pot-Bouille and
Germinal) have high values of coreness, meaning that the central and prominent characters are well connected among them among themselves and act as interchangeable figures. However, for the third one,
La Débâcle, the coreness is low, suggesting that having strong protagonists in a sparser network diminishes the strength of the core of protagonists.

Figure 2. Coreness.
In this work, we have shown a descriptive approach to compare character networks. Our results show that it is possible to discriminate them. By iteration, the comparison of character networks leads to the analysis of large numbers of character networks.


Csardi, G. and Nepusz, T. (2006). The igraph Software Package for Complex Network Research.
InterJournal Complex Systems,

Freeman, L. C. (1979). Centrality in Social Networks I: Conceptual Clarification.
Social Networks,
1(3): 215–39.

Fruchterman, T. M. and Reingold, E. M. (1991). Graph Drawing by Force-Directed Placement.
Software: Practice and Experience,
21(11): 1129–64.

Hamon, P. (1998).
Le personnel du roman: Le système des personnages dans les Rougon-Macquart d’Émile Zola. Librairie Droz.

Mac Carron, P. and Kenna, R. (2012). Universal Properties of Mythological Networks.
Europhysics Letters,
99(2) (July): 28002.

Moretti, F. (2011). Network Theory, Plot Analysis.
New Left Review,
68 (April).

Rochat, Y. (2014).
Character Networks and Centrality. Ph.D. thesis, University of Lausanne.

Seidman, S. B. (1983). Network Structure and Minimum Degree.
Social Networks,
5(3): 269–87.

Woloch, A. (2003).
The One vs. the Many: Minor Characters and the Space of the Protagonist in the Novel. Princeton University Press, Princeton, NJ.

Zola, É. (1967).
Les Rougon-Macquart. Gallimard.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info


ADHO - 2015
"Global Digital Humanities"

Hosted at Western Sydney University

Sydney, Australia

June 29, 2015 - July 3, 2015

280 works by 609 authors indexed

Series: ADHO (10)

Organizers: ADHO