Over the past several years, as part of a Digging into Data Grant and in conjunction with Oxford University's e-Research Centre, the ARTFL Project at the University of Chicago has been developing a large-scale
online resource that allows scholars to examine text
reuse in the 18th century. We used sequence alignment algorithms to compile a database of textual repetitions found in the Gale-Cengage Eighteenth Century Collections Online (ECCO) corpus, which contains the 200,000 works that represent most of the printed literary and scientific output in Britain from 1700 to 1799.
At Digital Humanities 2016 in Krakow, we presented the methodology and algorithms we used to identify these reuses and showed the early results of our work (Roe et al, 2016; for earlier discussions of this project, see Roe et al, 2015 and Abdul-Rahman et al, 2016). For the upcoming Digital Humanities conference, our presentation will have two facets. First, we will outline the technical and editorial approaches we took when building the final version of the alignment database to maximize its usability and usefulness as a scholarly resource (See the Common Place Cultures project site, and its page on the University of Chicago domain). Secondly, we will discuss a use case of the database in which we examined reuses and citations of the first-century BC Roman poet, Lucretius, as a means to get a broad understanding of the 18th-century reception of the De Rerum Natura, the philosophical poem that proposes a materialist conception of the universe.
To construct the database, we used the PhiloLine (see Horton et al, 2010) sequence aligner to identify many millions of similar passages in the often low-quality OCR of the ECCO dataset. These passages range from a handful of words to large extracts of documents. We then used a similar passage matching algorithm to identify passages that were reused many times. The resulting database allows users to track specific passages, identify citations of specific texts in later texts, and find borrowed passages in later documents of an author’s oeuvre.
As the creators and users of our navigational tool, we had to decide on the nature and scope of research the database should support and then strike balances between feasibility, usability, and performance. One of our earliest concerns was to allow users to get a fairly long view of textual reception/citation. To be able to identify the true source of any given reuse, we made the editorial decision to include texts that predated the 18th century. We therefore extended our alignment detection procedure to a variety of curated datasets such as the King James Bible, Classical Latin texts, and EEBO-TCP.
Our extended alignment experiment was extremely successful: we uncovered more than 40 million text reuses across our multiple datasets. At the same time, this success raised the problem of devising a way to explore result sets of a huge scale efficiently, leading us to focus on building a navigation tool that provides filtering and sorting control to users in a precise and intuitive way. We added various UI elements to guide users in their exploration: a list of the most commonly cited authors just a click away from the input box; a faceted browser to help users narrow down search results; and a timeline view of any given text reuse (see Appendix for screenshots) Combined together, these choices greatly enhance the capabilities of our web application, making it a tool that can very easily track intellectual influence from the classical Latin era to the late 18 th century.
In the second part of our presentation we will discuss a use case of our database, examining citations of Lucretius’s Latin text in 18th-century English texts. Our aim was to fill in gaps in current scholarship and discover the aftereffects of the resurgence in interest in Lucretius’s work, the De Rerum Natura (DRN), at the middle of the 17th century in England (for example, Greenblatt (2011) gives only very cursory treatment to reception of Lucretius in the 18th century in The Swerve). During this so called “Epicurean revival,” John Evelyn (1656) and Lucy Hutchinson (unpublished) were the first to translate, either in part or whole, the DNR into English. Walter Charlton published his Physiologia Epicuro-Gassendo-Charltoniana, or, A fabrick of science natural, upon the hypothesis of atoms in 1654. And though Thomas Hobbes never cites Lucretius directly in his Leviathan (1651), the Latin poet's ideas about the material nature of the universe are a distinct antecedent to Hobbes's mechanical philosophy.
Even around the time of the revival, the reception of the DRN was vexed. Lucretius's statements about the materiality and mortality of the soul, the role of chance in the universe, and the detachment of the gods were far too radical ever to gain wide acceptance in early modern England. In the backlash against Hobbesianism in the 1660s, Lucretius was a prime target for criticism. Cottegnies (2016) argues that the backlash against Hobbes's ideas in the 1660s also marks the end of the Epicurean moment. Theologians writing against Hobbes attacked Lucretius's atheistic materialism. And though Lucretius's stature as a poet continued to grow, praise for DRN was almost always tempered. His more extreme ideas were to be dismissed or ignored. In the notes to his translation of DRN in 1682, the first complete translation published in English, Thomas Creech argued against Lucretian and Epicurean attitudes toward the soul and divinity
(Creech and Dryden, 1700; these citations refer to the
text published using the PhiloLogic build of eebo). John Dryden, the preeminent arbiter of literary taste of his era, quoted Lucretius often in his plays and included translations of select passages in Sylvae (1685). In his preface to that collection, Dryden praised the directness of Lucretius's poetic expression, pointing out the “positive assertion of his Opinion” and his “Magisterial authority.” But the subject of his poem is “naturally Crabbed” and the poet himself is “often in the wrong” (Dryden, 1685; text published online by Philo-logic).
Reuses and citations found in the Digging into Data database suggest that this basic framework for understanding Lucretius largely played out across the 18th century. Lucretius was at once an admired poet, a materialist attacked for not admitting divine involvement in the universe, and a philosopher who in fact had important things to say about living well. Even so, the alignment database allows us to see a handful of authors, mostly medical and scientific writers, whose views of Lucretius and his ideas veer from this basic narrative. Mainly toward the middle and latter parts of the century, some reuses suggest a less troubled acceptance of Lucretius's naturalism.
Through this presentation, we hope to show that this alignment database, through the accumulation of so many instances of citation, can facilitate a kind of large-scope reading that allows scholars to gain a nu-anced sense of longer term intellectual trends. Built on a huge quantity of uncorrected OCR, the database provides scholars the specific source evidence -- and a ready means to access it -- that they might need as a starting point to pursue even deeper investigations into the thought of the 18th century.
Figure 1: The Commonplace Cultures search form
Figure 2: Search results filtered by facet
Figure 3: Search results by year
Figure 4: Search results with timeline
Roe, G., Abdul-Rahman, A., Chen, M., Morrissey, R., Olsen, M. (2015). "Visualizing Text Alignments: Image Processing Techniques for Locating 18th-Century Commonplaces." Digital Humanities 2015, Sydney, Australia, July 1, 2015.
Roe, G., Gladstone, C., Morrissey, R., and Olsen, M.
(2016) “Digging into ECCO: Identifying Commonplaces and Other Forms of Text Reuse at Scale.” Digital Humanities 2016, Krakow, July 13, 2016.
Abdul-Rahman, A., Roe, G., Olsen, M., Gladstone, C., Whaling, R., Cronk, N., Morrissey, R. and Chen, M.
(2016). “Constructive Visual Analytics for Text Similarity Detection.” Computer Graphics Forum
Cottegnies, L. (2016) “Michel de Marolles's 1650 Translation”, pp. 161-189 in Norbrook, Harrison, and Hardie eds, Lucretius and the Early Modern (Oxford: Oxford University Press).
Creech, T. and Dryden, J. (1700) Lucretius his six books of epicurean philosophy and Manilius his five books containing a system of the ancient astronomy and astrology together with The philosophy of the Stoicks / both translated into English verse with notes by Mr. Tho. Creech; To which is added the several parts of Lucretius, English'd by Mr. Dryden. London and Westminster, London. Published online by Philologic. (EEBO-TCP; phase 1, no. A49437) Transcribed from Early English Books Online; image set 45711.
Dryden, J. (1685) Sylvæ, or, The second part of Poetical miscellanies London, Tonson. Published online by Philologic. (EEBO-TCP; phase 1, no. A36697) Transcribed from: Early English Books Online; image set 58020
Edelstein, D., Morrissey, R., and Roe, G. (2013) "To Quote or not to Quote: Citation Strategies in the Encyclopédie", Journal of the History of Ideas Vol. 74, No. 2: 213-236.
Greenblatt, S. (2011) The Swerve (New York: W.W. Norton).
Horton, R., Olsen, M., and Roe, G. (2010) "Something Borrowed: Sequence Alignment and the Identification of Similar Passages in Large Text Collections," Digital Studies / Le Champ numérique Volume 2, Number 1
If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.
Hosted at McGill University, Université de Montréal
Aug. 8, 2017 - Aug. 11, 2017
438 works by 962 authors indexed
Conference website: https://dh2017.adho.org/
Series: ADHO (12)