Seven collections of prosody: the difficulty of visualizing non-hierarchical data

paper, specified "short paper"
Authorship
  1. 1. Rebecca Sutton Koeser

    Princeton University

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Why do so many data visualizations require sequential or hierarchical data? The sequential and hierarchical aspects of humanities data are often not the most important or interesting aspects. We’ve been exhorted to embrace the nonscalability and diversity of our data (Rawson and Muñoz), to disavow binary distinctions (D’Ignazio and Klein), and others are working to improve how we visualize uncertainty (Hullman). Creative and evocative visualizations abound, but they are rarely generalizable new approaches. Where are the new data visualizations for Digital Humanities work?  Based on work for a recent Princeton Prosody Archive (PPA) Editorial essay, I will share my experiences and failures attempting to visualize the seven scholar-curated collections within PPA. Categorization of the content in PPA is complicated because of the shifting meaning of the term prosody over time, encompassing both grammar and phonology as well as versification (Martin), and my visualizations reflect aspects of these two overlapping discourses in the form of the Linguistic and Literary collections within PPA.The seemingly simple task of showing the relative size and overlap of all seven collections is actually quite difficult. A bubble plot or bar chart can convey relative sizes, but can’t communicate overlaps and are limited to sets of similar scale. Visualizing the relative size and overlap of PPA (4,792 items) alongside the original bibliography that inspired its creation is useful, but put in context of all of HathiTrust (16 million items) we see a laughable diagram that nevertheless demonstrates the value of smaller, scholar-curated bibliographies (more than mere collections) based on HathiTrust materials. Relative sizes of bar charts are easier to read than circles, but a bar chart requires imposing an order. Euler and Venn diagrams (invented by mathematicians in the 1700s and 1800s) seem promising, but are limited in the number of sets, and also have the same problems with relative scale as bubble plots and bar charts. Treemap diagrams are useful for visualizing relationships, but only within a single hierarchy; likely because it was invented to visualize disk space utilization (Shneiderman and Plaisant). The UpSet plot is a relatively new solution to show “set intersections in a matrix layout” (Lex et al, 2014), which came out of bioinformatics, and is a powerful solution for this problem once viewers are oriented and trained to read an UpSet plot. Length proportional linear diagrams are another solution for set visualization (Rodgers et. al., 2015) which purport to improve set-related task performance, but they do not yet seem to be widely known or used. After a quick overview of these visualization solutions and their limitations, I will demonstrate an experimental visualization I created as I worked on visualizing the PPA collections. It was inspired by the “warming stripes” climate change visualization (Hawkins) and a workshop series on p5.js called “Playing with Data” (Roth and Koeser). Vertical stripes are used to represent each item, with colored horizontal rows designating collection membership; interaction allows the viewer to focus on a single collection and see how it overlaps with others. In an earlier prototype, I sorted items somewhat arbitrarily by publication date. In my most recent prototype, items are ordered randomly to give viewers an overall sense of the distribution of the collections, but with interaction that allows a user to group all items in a single collection to allow focusing on overlaps with one collection at a time. Some might argue that a linear diagram would provide a more objective or efficient solution, but I’m less interested in viewers “accomplishing set-based task performance” than I am in the affective and aesthetic impacts (D’Ignazio and Klein) that make it possible to see and interact with these collections.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO