Co-Cited Author Maps as Real-Time Interfaces for Web-Based Document Retrieval in the Humanities

paper
Authorship
  1. Howard D. White

    College of Information Science and Technology - Drexel University

  2. Xia Lin

    Drexel University

  3. Jan Buzydlowski

    College of Information Science and Technology - Drexel University

Work text

We offer here a brief account of a Web-based visual information retrieval interface that gives humanists new powers in mapping scholarly literatures and retrieving documents from them. The maps make visible bibliographic structures otherwise hidden in a database and, by implication, the intellectual structure of a field as evidenced by citation data on its authors. For writers preparing reviews of literatures or histories of specialties, the maps are publishable documentation in their own right, whether or not they are used for retrieval.

In August 2000, we implemented the system presently called AuthorLink as an interface to the Arts & Humanities Citation Index (AHCI) for 1988-1997, a database with 1.26 million records that the Institute for Scientific Information, its publisher, gave our college for research purposes in 1998. The system is operational on the Web, although, because of site-licensing limitations on our infrastructural software, BRS/Search, we cannot yet handle multiple simultaneous visitors. Given a single author’s name as input, AuthorLink can create two different maps of interrelated author-names from AHCI, as seen in Figure 1. Both are interactive interfaces for document retrieval (White, Buzydlowski, & Lin, 2000; White, Lin, & McCain, 1998).

The relationship among the authors is co-citation, that is, the number of articles that cite a pair of authors jointly in the journal literatures covered by AHCI. The tradition of author co-citation mapping actually began in the humanities with the work of Rosengren (1968), who studied "co-mentions" of authors such as Ibsen and Tolstoy in Scandinavian book reviews. It has been continued in a series of papers and dissertations (including some in the humanities) reviewed in White (1990a, 1990b) and White & McCain (1989, 1997). "Authors" should here be understood as oeuvres, not persons. For example, if any work by Faulkner is cited with any work by Hemingway in the notes of a journal article covered by AHCI, the count for the pair Faulkner–Hemingway would be incremented by one. (Whole citing articles, not their individual references, are being counted; were Faulkner and Hemingway both cited multiple times in a single article, that article would still add only one to the Faulkner–Hemingway count.) Over time, co-citation counts come to vary enormously, ranging from zero for pairs of authors who have never been jointly mentioned, to many hundreds for pairs who, like Faulkner and Hemingway, are frequently studied together. It is recurrent co-citation that signifies an important interrelation; the higher the co-citation counts between pairs of authors, the stronger their relationships as perceived by citers, and the more readily interpretable the maps based on them will be.
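The counting rule just described can be illustrated with a minimal Python sketch. It is not AuthorLink's implementation: the function name `cocitation_counts`, the toy data, and the representation of each citing article as a plain list of cited-author names are assumptions made only for illustration.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_articles):
    """Count, for each unordered author pair, the articles that cite both."""
    counts = Counter()
    for cited_authors in citing_articles:
        # A set collapses repeated citations of one author within an article,
        # so each article adds at most one to any pair's count.
        unique = sorted(set(cited_authors))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: one inner list per citing article.
articles = [
    ["Faulkner-W", "Hemingway-E", "Faulkner-W", "Hemingway-E"],
    ["Faulkner-W", "Hemingway-E", "Fitzgerald-FS"],
    ["Faulkner-W"],
]
counts = cocitation_counts(articles)
print(counts[("Faulkner-W", "Hemingway-E")])
```

The first article cites both authors twice yet contributes only one to the Faulkner–Hemingway count, and a pair that never co-occurs simply reads as zero.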

We can use as an input name any author or artist whose work was cited in AHCI 1988-1997. This means that we can map many thousands of people, famous or obscure, past or present, in the arts and humanities, from Thales to Martin Scorsese, from Palestrina to Erica Jong. The maps are created on the fly, in seconds, for 25 authors at a time. After a particular author’s name is entered (e.g., Albert Camus, entered as Camus-A), AuthorLink retrieves and rank-orders the names of the 24 other authors who are most frequently co-cited with the input author; their co-citation counts with him or her are shown as well. On request, all 25 are then systematically paired with each other, and their co-citation counts are retrieved from AHCI to be placed in a 25 x 25 symmetric matrix. BRS/Search, a commercial package similar to Dialog, is used in these ranking and retrieval operations.
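The two steps above (ranking the most frequently co-cited authors, then filling a symmetric matrix) might be sketched in Python as follows. AuthorLink performs these operations through BRS/Search queries, which are not shown in this account; the in-memory lists, the function names `top_cocited` and `cocitation_matrix`, and the toy records are hypothetical stand-ins.

```python
from collections import Counter
from itertools import combinations


def top_cocited(citing_articles, seed, k=24):
    """Rank the k authors most often co-cited with the seed author."""
    tally = Counter()
    for cited in citing_articles:
        authors = set(cited)
        if seed in authors:
            tally.update(authors - {seed})
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(tally.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def cocitation_matrix(citing_articles, authors):
    """Build the symmetric matrix of pairwise co-citation counts."""
    index = {name: i for i, name in enumerate(authors)}
    n = len(authors)
    matrix = [[0] * n for _ in range(n)]
    for cited in citing_articles:
        present = sorted(index[a] for a in set(cited) if a in index)
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


# Hypothetical toy data; a real map would use k=24 to obtain 25 names.
articles = [
    ["Camus-A", "Sartre-JP", "Beauvoir-S"],
    ["Camus-A", "Sartre-JP"],
    ["Sartre-JP", "Beauvoir-S"],
    ["Camus-A", "Kafka-F"],
]
ranked = top_cocited(articles, "Camus-A", k=2)
names = ["Camus-A"] + [name for name, _ in ranked]
matrix = cocitation_matrix(articles, names)
```

With `k=2` the sketch yields a 3 x 3 matrix; the diagonal is left at zero because an author's co-citation with himself or herself is not defined in the account above.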

The maps are based on the matrix of raw counts, which they usefully simplify by presenting only the strongest co-citation relationships. Authors most frequently seen by citers as "belonging together" are displayed as neighboring or linked points (depending on the mode of mapping). The points are labeled with authors’ names. Properly manipulated, these labels lead to retrieval, from the full 10-year database, of documents that co-cite the input author with other authors in the map. For example, one could retrieve all AHCI records of documents that co-cite Albert Camus and Simone de Beauvoir. The process may, of course, lead to further mappings and further retrievals based on the interest of names as they emerge.

The different two-dimensional display modes are (1) Kohonen feature mapping, programmed by Lin, who also designed the overall interface, and (2) Pathfinder networks (PFNets), programmed by Buzydlowski. The two are demonstrated in Figure 1, with Plato as the input name (on the actual maps, the input name is in red; all other names are in blue). The Kohonen display, a self-organizing map (SOM) produced by a neural networks algorithm (Kohonen, 1989), puts in contiguous spaces the authors that most frequently co-occur; the blocked areas shown are proportional in size to the magnitude of the counts. The PFNet display (Schvaneveldt, 1990) results from what may be thought of as a link-pruning algorithm. With count-data as weights on links, it examines the possible paths linking author-points and reproduces only those whose weights sum to the least amount. This has the effect of basing a link on the single highest (or tied highest) co-citation count between authors, because the weights on all other paths would sum to an even higher amount.
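The link-pruning idea can be sketched in Python under stated assumptions. The account above does not give the Pathfinder parameters, so this sketch assumes the commonly used setting r = ∞, q = n − 1, in which the length of a path is its single longest link. It also assumes that counts are turned into distances by subtracting them from the largest count; with r = ∞ only the rank order of the distances matters. The function name `pathfinder_links` is hypothetical, and this is not Buzydlowski's implementation.

```python
import math


def pathfinder_links(counts):
    """Return index pairs (i, j) of links kept by a PFNet(r=inf, q=n-1) sketch."""
    n = len(counts)
    if n == 0:
        return []
    top = max(max(row) for row in counts)
    # Assumed conversion: higher co-citation count means shorter distance;
    # a zero count means there is no direct link at all.
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
        for j in range(n):
            if i != j and counts[i][j] > 0:
                dist[i][j] = top - counts[i][j] + 1
    # Floyd-Warshall variant: a path is as long as its longest link (r = inf).
    best = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = max(best[i][k], best[k][j])
                if via < best[i][j]:
                    best[i][j] = via
    # Keep a direct link only if no indirect path beats it.
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] < math.inf and dist[i][j] == best[i][j]
    ]


# Hypothetical 3 x 3 matrix for authors A, B, C (rows and columns in that order).
links = pathfinder_links([[0, 2, 1], [2, 0, 2], [1, 2, 0]])
print(links)
```

In this toy case the weaker A–C link (count 1) is pruned because the path A–B–C runs entirely over stronger links (count 2). A variant that sums link weights along a path would replace `max` with addition in the inner loop.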

We maintain that both maps in Figure 1 have high face validity, and we think that literally thousands of similar AuthorLink maps will be judged to have it as well. One need not be Mortimer Adler to see the cogent groupings that arise from algorithmically finding the authors with the highest co-citation counts and rendering these as spatial metaphors. In Figure 1(a), Plato and Aristotle, the big two, are central. Surrounding them, the Greek and Roman historians are automatically placed at upper right; the Greek playwrights and poets all go to lower right; the Roman poets are aligned with the Greek poets along the bottom; the modern Continental philosophers are below the early moderns at lower left; and the Bible and Christian philosophers related to Plato are at upper left.

Figure 1(b) is not oriented the same way as Figure 1(a), but the connections made by the counts, because explicit, are even clearer. Diogenes Laertius, whose name is truncated in the ISI records, appears as a biographer of Plato (and many other classical philosophers). We will let the reader elucidate the other links, which of course are at considerable remove from the actual statements that are being made in the co-citing documents. Here we will simply point out some features of AuthorLink not apparent in Figure 1. If the cursor is passed over any name in either map, that author’s co-citation count with Plato (or whoever is the input author) will appear. If the "Show Numbers" button near the PFNet is clicked, the co-citation counts for all explicit links will appear (e.g., the count for Nietzsche and Derrida or for Homer and Hesiod). This helps the viewer see how big the document sets associated with various retrievals will be.

Retrieval is carried out by double-clicking on a name and dragging it to the "Additional Authors" box at right in both figures. The main author, in this case Plato, is automatically entered there when the map is formed; any additional authors will be searched in a Boolean AND relationship with the main author when the "Search" button is clicked. The search calls up an interface provided by BRS/Search that shows the authors and titles of documents retrieved from AHCI 1988-1997. A click on any one of these shows the full AHCI record, which consists of a bibliographic reference, sometimes an abstract (ISI added these only in recent years), and a list of items that the full document cites. In the latter, the co-cited authors that caused the retrieval are highlighted in blue.
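The Boolean AND search can be pictured with a short Python sketch. The real search runs inside BRS/Search against AHCI records; the record layout (`title`, `cited_authors`), the function name `search_cociting`, and the toy records below are assumptions for illustration only.

```python
def search_cociting(records, main_author, additional_authors=()):
    """Return titles of records whose cited authors include every name given."""
    required = {main_author, *additional_authors}
    # Boolean AND: a record matches only if all required names are cited.
    return [r["title"] for r in records if required <= set(r["cited_authors"])]


# Hypothetical records standing in for AHCI entries.
records = [
    {"title": "Record 1", "cited_authors": ["Plato", "Aristotle", "Homer"]},
    {"title": "Record 2", "cited_authors": ["Plato", "Homer"]},
    {"title": "Record 3", "cited_authors": ["Aristotle", "Homer"]},
]
hits = search_cociting(records, "Plato", ["Aristotle"])
print(hits)
```

Each added author can only narrow the result set, which is why the co-citation counts shown on the map indicate in advance how large a retrieval will be.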

We believe that AuthorLink is particularly congenial to humanists’ patterns of inquiry. Humanists tend to center their work on named persons. AuthorLink, moreover, requires only a single named person as input, rather than complex search strategies. The payoff for a single name is 25 names whose mapped connections are usually intriguing to interpret, allowing one to demonstrate one’s cultural literacy in a chosen domain. There may also be a factor of self-interest: any person reading this account can be mapped if his or her citation record is in our decade of AHCI.

References

Kohonen, Teuvo. (1989). Self-Organization and Associative Memory. 3rd ed. New York: Springer-Verlag.

Rosengren, Karl Erik. (1968). Sociological Aspects of the Literary System. Stockholm, Sweden: Natur och Kultur.

Schvaneveldt, Roger W., ed. (1990). Pathfinder Associative Networks: Studies in Knowledge Organization. Norwood, NJ: Ablex.

White, Howard D. (1990a). Introduction [to Perspectives on...Author Co-Citation Analysis]. Journal of the American Society for Information Science 41: 430-431.

White, Howard D. (1990b). Author Co-Citation Analysis: Overview and Defense. In Bibliometrics and Scholarly Communication, Christine Borgman, ed. Newbury Park, CA: Sage. 84-106.

White, Howard D., Jan Buzydlowski, Xia Lin. (2000). Co-Cited Author Maps as Interfaces to Digital Libraries: Designing Pathfinder Networks in the Humanities. In Information Visualisation 2000, Proceedings of the IEEE International Conference on Information Visualisation, July 19-21, 2000, London, England. Los Alamitos, CA: IEEE Computer Society. 25-30.

White, Howard D., Katherine W. McCain. (1989). Bibliometrics. Annual Review of Information Science and Technology 24. Amsterdam: Elsevier. 119-186.

White, Howard D., Katherine W. McCain. (1997). Visualization of Literatures. Annual Review of Information Science and Technology 32: 99-168.

White, Howard D., Xia Lin, Katherine W. McCain. (1998). Two Modes of Automated Domain Analysis: Multidimensional Scaling vs. Kohonen Feature Mapping of Information Science Authors. In Structures and Relations in Knowledge Organization, Widad Mustafa el Hadi, Jacques Maniez, Steven A. Pollitt, eds. Würzburg, Germany: Ergon Verlag. 57-63.

Figure 1. Two Screen Displays from AuthorLink

Figure 1(a). Kohonen Display of the Co-Citation Map for Plato

Figure 1(b). PFNet Display of the Co-Citation Map for Plato


Conference Info

In review

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2001

Hosted at New York University

New York, NY, United States

July 13, 2001 - July 16, 2001

94 works by 167 authors indexed

Series: ACH/ICCH (21), ALLC/EADH (28), ACH/ALLC (13)

Organizers: ACH, ALLC