Common Container Correlation: A Simple Method for the Extraction of Structural Models from Statistical Data

paper, specified "short paper"
Authorship
  1. Rafael Alvarado

    University of Virginia

Work text

The use of topological graphs, or networks, to represent and analyze the semantic contents of source materials, such as texts and images, has become a signature contribution by the digital humanities to the humanities in general. Specific techniques, such as topic modeling and network analysis, and general approaches, such as macroanalysis and distant reading, exemplify the popularity and effectiveness of methods based on the graph theoretical representation and statistical modeling of cultural materials. However, because of their mathematical complexity and their focus on very large corpora of texts, these methods are beyond the reach of many humanists interested in the interpretation of smaller sets of source materials for cultural meaning. They are also suspect since they introduce ontological commitments that both elide traditional notions of human agency and reframe culture as a set of abstract, metrical dimensions. In this talk, I introduce “common container correlation” (C3) as a relatively simple and transparent interpretive method for the graph theoretical analysis of source materials that may be practiced by both students and more advanced researchers to excavate and make sense of cultural models implicit in textual materials.
C3 may be described as a variation of co-occurrence analysis designed to take advantage of the abundance of encoded cultural materials available to the digital humanist and to allow for the analysis of small sample sizes, such as individual texts. Formally, a common container correlation is just a link, or edge, that is asserted between any two items, regarded as nodes or vertices, that are contained within the same structural container. The set of all such links produces a graph of nodes and links based on their co-occurrence in a common container. In some cases these graphs will have meaning—that is, they will exhibit patterns that lend themselves to structuralist and other forms of interpretive analysis. These patterns may sometimes be correlated with psychological, sociological, or material causes that will be of interest to the humanist.
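The formal definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `c3_edges`, the container IDs, and the item labels are all hypothetical and do not come from any of the projects discussed here.

```python
from collections import Counter
from itertools import combinations


def c3_edges(containers):
    """Count co-occurrence links between items that share a container.

    `containers` maps a container ID to the list of items found in it.
    The result maps each undirected pair (a, b) to the number of
    containers in which both items occur.
    """
    edges = Counter()
    for items in containers.values():
        # Deduplicate within a container and sort, so that (a, b) and
        # (b, a) are counted as the same undirected edge.
        for a, b in combinations(sorted(set(items)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy data: three containers and the items found in each.
containers = {
    "c1": ["A", "B", "C"],
    "c2": ["A", "B"],
    "c3": ["C"],
}
edges = c3_edges(containers)
```

A container holding a single item, such as `c3` here, asserts no links. The counts may be kept as edge weights or discarded, depending on whether the graph is meant to reflect frequency.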
For example, in a novel marked up with a TEI-based schema, we may choose to define the paragraphs of the text as container elements and tagged references to proper names as contained elements. We then assert that all named agents in a given paragraph are related to each other (in the special sense of co-occurring). The set of all of these assertions for all paragraphs will produce a kind of social graph that may then be visualized and analyzed in structural terms. In such a case, it may emerge that two characters consistently appear on opposite sides in multiple instances of a force-directed representation of the graph. This may be evidence of a structural opposition that will have emerged from the statistical distribution of the selected elements. Other approaches may use other container elements, such as scenes, and combinations of contained elements, such as places and people.
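As a rough sketch of the paragraph-and-name case, the following Python uses the standard library's `xml.etree.ElementTree` to pull (paragraph, person) pairs out of a TEI fragment. The sample markup is invented for illustration, and the use of `xml:id` on paragraphs and `ref` on `persName` is an assumption about how a given text happens to be encoded, not a description of any particular project's files.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace, plus the expanded name of the xml:id attribute.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented toy fragment; the names and sentences are placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p xml:id="p1"><persName ref="#alice">Alice</persName> met
       <persName ref="#bob">Bob</persName>.</p>
    <p xml:id="p2"><persName ref="#alice">She</persName> wrote to
       <persName ref="#carol">Carol</persName>.</p>
    <p xml:id="p3">No one is named here.</p>
  </body></text>
</TEI>"""


def extract_rows(xml_text):
    """Return (container_id, contained_id) pairs, one per tagged name."""
    root = ET.fromstring(xml_text)
    rows = []
    for p in root.iterfind(".//tei:p", TEI_NS):
        container_id = p.get(XML_ID)
        for name in p.iterfind(".//tei:persName", TEI_NS):
            rows.append((container_id, name.get("ref")))
    return rows


rows = extract_rows(SAMPLE)
```

Each resulting row pairs a paragraph with one named agent. Any two rows that share a paragraph ID assert a link between the two agents.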
The C3 method is easy to implement using available tools. Container and contained elements in XML-encoded materials may be extracted using simple XPath statements (by means of a variety of tools) and dumped into tables with columns for container IDs and contained IDs. Such tables may then be transformed using simple SQL queries into various graph data formats for visualization and analysis in tools such as Gephi, GraphViz, SHIVA, and D3. Depending on the intention of the user, the resultant graph may or may not reflect the frequency of edges and vertices in the source data.
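One way the table-to-graph step might look is sketched below, using Python's built-in `sqlite3` module. The table name `occurrence`, its column names, the sample rows, and the helper `to_dot` are all hypothetical. The self-join is one simple SQL formulation of the idea, not necessarily the query used in the projects described.

```python
import sqlite3

# Hypothetical two-column table: one row per (container, contained) pair.
rows = [
    ("p1", "alice"), ("p1", "bob"),
    ("p2", "alice"), ("p2", "bob"), ("p2", "carol"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurrence (container_id TEXT, item_id TEXT)")
conn.executemany("INSERT INTO occurrence VALUES (?, ?)", rows)

# Self-join on the container ID. The condition a.item_id < b.item_id keeps
# one row per unordered pair and drops self-links.
EDGE_QUERY = """
SELECT a.item_id AS source,
       b.item_id AS target,
       COUNT(DISTINCT a.container_id) AS weight
FROM occurrence AS a
JOIN occurrence AS b
  ON a.container_id = b.container_id AND a.item_id < b.item_id
GROUP BY a.item_id, b.item_id
ORDER BY source, target
"""
edges = conn.execute(EDGE_QUERY).fetchall()


def to_dot(edges, weighted=True):
    """Render the edge list as an undirected graph in GraphViz DOT syntax."""
    lines = ["graph c3 {"]
    for source, target, weight in edges:
        attr = f" [weight={weight}]" if weighted else ""
        lines.append(f'  "{source}" -- "{target}"{attr};')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(edges)
```

Passing `weighted=False` drops the co-occurrence counts, which corresponds to a graph that does not reflect edge frequency in the source data.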
In this talk I will describe the C3 method using examples taken from three digital humanities projects with which I have been associated. First, I will describe the application of the method to rhetorical figures (containers) and characters (contained elements) using data from the Princeton Charrette Project. Second, I will describe how undergraduates in an introductory digital humanities course at the University of Virginia created a database relating characters and paragraphs in Austen’s Persuasion. Third, I will describe the use of the method in Stephen Railton’s Digital Yoknapatawpha Project, drawing on data correlating scene containers to people and places as contained elements in William Faulkner’s corpus of fiction. In each case, I will explore the interpretive implications of the algorithms used to visualize the data, taking particular care to describe the specific steps involved in going from markup to data representation to visualization to interpretation. In this way, I hope to connect the discourse on data-driven textual analysis to traditional interpretive methods, such as close reading and structural analysis, in order to produce a genuinely humanistic use of quantitative methods that does not alienate the researcher from the tools of interpretation.

Conference Info

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO