Seeing the Text Through the Trees: Data and Program Visualization in the Humanities

Authorship
  1. Geoffrey M. Rockwell

    McMaster University

  2. John Bradley

    King's College London

  3. Patricia Monger

    McMaster University

Can we represent a text as an interactive diagram? Do visualization systems give researchers an interesting way to navigate an electronic text?
In this paper we will discuss research we have been conducting on textual visualization, in particular a visual programming environment for text applications called Eye-ConTact and an interactive visualization system for the WWW called SIMWeb. These two experiments in the design of text systems show the potential of two different types of visualization in computer-assisted text analysis.
The first project, SIMWeb, was designed to test the feasibility of interactive diagrams, derived from a Correspondence Analysis of a text, that are connected to a full-text search engine. Because the combined text engine and visualization environment runs on the WWW, users can query a text by changing the parameters of the visualization and by clicking on parts of it to launch the text engine. In the presentation we will demonstrate SIMWeb and comment on the design issues.
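SIMWeb's diagrams come from a correspondence analysis, which the paper does not spell out. As a hedged illustration only, the sketch below performs textbook correspondence analysis on a word-by-speaker table of counts and returns two-dimensional coordinates for each word and each speaker, the kind of output that can be plotted with word labels. It is not SIMWeb's code: the function names are hypothetical, the choice of a word-by-speaker table is an assumption (the paper does not say how SIMWeb's table was built), power iteration stands in for a library SVD so that only the standard library is needed, the counts are invented, and no row or column of the table may be empty.

```python
import math


def _matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def correspondence_analysis(table, dims=2, iters=1000):
    """Textbook correspondence analysis of a rows x columns table of counts.

    Returns (row_coords, col_coords, inertias): principal coordinates of every
    row and column on the first `dims` axes, and the inertia of each axis.
    Assumes no row or column sums to zero.
    """
    n_rows, n_cols = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    p = [[x / total for x in row] for row in table]           # correspondence matrix
    r = [sum(row) for row in p]                               # row masses
    c = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]  # column masses
    # Standardized residuals: S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j)
    s = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(n_cols)]
         for i in range(n_rows)]
    # A = S^T S; its leading eigenvectors are the right singular vectors of S.
    a = [[sum(s[k][i] * s[k][j] for k in range(n_rows)) for j in range(n_cols)]
         for i in range(n_cols)]
    row_coords = [[] for _ in range(n_rows)]
    col_coords = [[] for _ in range(n_cols)]
    inertias = []
    for _ in range(dims):
        v = [1.0 + 0.1 * j for j in range(n_cols)]  # arbitrary deterministic start
        for _ in range(iters):                      # power iteration on A
            w = _matvec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eig = sum(x * y for x, y in zip(v, _matvec(a, v)))  # sigma^2, the axis inertia
        sigma = math.sqrt(max(eig, 0.0))
        sv = _matvec(s, v)                                   # S v = sigma * u
        for i in range(n_rows):
            row_coords[i].append(sv[i] / math.sqrt(r[i]))     # F = D_r^(-1/2) U Sigma
        for j in range(n_cols):
            col_coords[j].append(sigma * v[j] / math.sqrt(c[j]))  # G = D_c^(-1/2) V Sigma
        inertias.append(eig)
        # Deflate A so that the next pass converges to the next axis.
        a = [[a[i][j] - eig * v[i] * v[j] for j in range(n_cols)] for i in range(n_cols)]
    return row_coords, col_coords, inertias


words = ["god", "nature", "design", "evil", "deity"]
# Invented counts per speaker (Cleanthes, Philo, Demea), for illustration only.
counts = [[10, 12, 8], [5, 15, 2], [9, 3, 1], [2, 11, 6], [5, 6, 4]]
rows, cols, inertias = correspondence_analysis(counts)
for word, (x, y) in zip(words, rows):
    print(f"{word:8s} {x:+.3f} {y:+.3f}")
```

Words whose counts have the same profile across speakers receive the same coordinates (here "god" and "deity"), and the sign of each axis is arbitrary, so a plot may appear mirrored relative to the output of another correspondence analysis program.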
The second, Eye-ConTact, is a visual programming environment for researchers who are studying texts. It is a prototype designed to test possible designs for modular visual development. In Eye-ConTact the user creates a flow chart of how they want their "experiment" on a text to be conducted. The map of this flow of data, from a complete text to refined displays, allows the researcher to see the logic of his or her study of a text and then reproduce such studies with other texts. In this paper we will demonstrate a working prototype and discuss the design issues raised by such an interface.
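To make the idea of an analysis flow chart concrete, here is a minimal sketch, assuming a flow chart can be modelled as a directed graph whose nodes are analysis modules and whose edges carry data downstream. It is not Eye-ConTact's implementation, which was written in Visual Basic on top of a TACTweb variant; the module names, the stopword list, and the run_flow helper are hypothetical. It uses Python's standard graphlib module (Python 3.9 or later) so that each module runs only after the modules feeding it have produced their output.

```python
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical module library: each module turns its input value(s) into one output.
MODULES = {
    "tokenize": lambda text: text.lower().split(),
    "drop_stopwords": lambda words: [w for w in words if w not in {"the", "of", "and", "a"}],
    "frequencies": lambda words: Counter(words),
    "top_words": lambda freqs: [w for w, _ in freqs.most_common(3)],
    "distinct": lambda words: len(set(words)),
}


def run_flow(flow, text):
    """Run a flow chart given as {node: (module name, [upstream nodes])}.

    The special node "text" is the source; every other node runs once all
    of its upstream nodes have produced values.
    """
    graph = {node: inputs for node, (_, inputs) in flow.items()}
    values = {"text": text}
    for node in TopologicalSorter(graph).static_order():
        if node in values:            # the source node has no module to run
            continue
        module, inputs = flow[node]
        values[node] = MODULES[module](*(values[i] for i in inputs))
    return values


# One branching flow chart: a frequency branch and a vocabulary-size branch.
flow = {
    "words": ("tokenize", ["text"]),
    "content": ("drop_stopwords", ["words"]),
    "freqs": ("frequencies", ["content"]),
    "top": ("top_words", ["freqs"]),
    "vocab": ("distinct", ["content"]),
}
result = run_flow(flow, "nature and design and the nature of the deity")
print(result["top"], result["vocab"])
```

Because the flow chart is plain data rather than code, the same flow can be run unchanged against another text, which is the kind of reuse Eye-ConTact aims to support.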
These two experiments in visualization are of different sorts, but connected in that Eye-ConTact is, among other things, designed to facilitate data visualization of the sort experimented with in SIMWeb. In the concluding part of our paper we will comment on the intersection of program and data visualization and the need for a new generation of tools that allow both.
In particular, in this paper we will do the following:
Discuss the history of text and program visualization with particular attention to the types of interactive visualizations possible.
Demonstrate the SIMWeb interactive visualization environment.
Demonstrate the Eye-ConTact visual programming prototype for text-analysis.
Discuss the future of text-analysis tools in the humanities in light of experiments like these.
1. Interactive Text and Program Visualization
Humanists have been graphically showing off the results of their research for some time; one need only look at the frontispiece of Vico's New Science or the illustrations in Tristram Shandy for examples that predate the computer. In an earlier paper, "Watching Scepticism: Computer-Assisted Visualization and Hume's 'Dialogues'", 1 we outlined a typology of these visualizations of texts. This typology was limited in two ways: first, it was limited to static representations, and second, it was limited to representations of texts. In this paper we propose to widen the scope by discussing and showing two other types of graphical representation of interest to the textual scholar.
1.1 Interactive Visualization
The first type of visualization of interest is the interactive one. The graphs found in articles in Computers and the Humanities and Literary and Linguistic Computing are presented to the reader to demonstrate finished research. These types of graphs can also be used to explore a text when they are connected to a text-analysis environment like TACTweb. In the sciences it is common to use visualization systems not only for the presentation of results but also for the exploration of data. This is particularly the case when the researcher is overwhelmed by the amount of data that today's tools can provide. Beyond a certain point, reading tables of numbers to find patterns becomes difficult. Scientists have therefore resorted to visualization tools to help them discover anomalies and find patterns in quantitative data. In the humanities we are in an analogous situation. The quantity of electronic texts, and the amount of quantitative data we can gather about these texts, are overwhelming our ability to interpret the results. For this reason we need to explore interactive visualization tools appropriate to our data. As William Playfair pointed out 200 years ago,
Information, that is imperfectly acquired, is generally as imperfectly retained; and a man who has carefully investigated a printed table, finds, when done, that he has only a very faint and partial idea of what he has read; and that like a figure imprinted on sand, is soon totally erased and defaced. ... On inspecting any one of these Charts attentively, a sufficiently distinct impression will be made, to remain unimpaired for a considerable time, and the idea which does remain will be simple and complete, at once including the duration and the amount. 2
The SIMWeb project, which we will demonstrate later, shows the possibilities for links between interactive visualizations and traditional text-analysis tools like TACTweb, a WWW-accessible version of a subset of the functionality of TACT. This environment provides the researcher with a visualization of a statistical analysis of a particular text, in this case Hume's Dialogues Concerning Natural Religion, which can be explored in a way that the tables of numbers output by such a statistical analysis could not. The point of this demonstration is not to defend the statistical analysis employed, but to demonstrate the value of interactive exploration of data and the connection between interactive visualizations and traditional text tools. In particular, such visualizations offer the following features:
Because the words themselves, rather than anonymous points, are shown in a two-dimensional space, the user can easily connect the words used in the statistical analysis with the resulting data. Many graphing tools do not make it easy to label points with the words whose positions they represent.
Because one can change the parameters of the representation and zoom in on the results, one can easily explore the data for anomalies and patterns. The researcher can thus identify the perspectives that best demonstrate features of the text for further study.
Because one can launch a TACTweb search, and thus call up the full text, from any display, one can easily connect the graphical representation with the text represented. This allows one to confirm possible patterns or to decide that what one sees is uninteresting.
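The three features above (labelled points, zooming, and launching a search from the display) can be sketched in a few functions, assuming the statistical analysis has already produced a two-dimensional coordinate for each word. The names zoom, word_at, and query_url are hypothetical, the coordinates are invented, and the search URL and its "query" parameter are placeholders, not TACTweb's actual interface.

```python
import math
from urllib.parse import urlencode

# Invented 2-D coordinates for a few labelled words (e.g. output of a CA).
points = {"god": (0.10, -0.20), "design": (0.45, 0.05), "evil": (-0.30, 0.25)}


def zoom(points, x_min, x_max, y_min, y_max):
    """Keep only the labelled points inside the current viewport."""
    return {w: (x, y) for w, (x, y) in points.items()
            if x_min <= x <= x_max and y_min <= y <= y_max}


def word_at(points, x, y, radius=0.05):
    """Return the word nearest a click at (x, y), or None if none is within radius."""
    best, best_dist = None, radius
    for word, (px, py) in points.items():
        dist = math.hypot(px - x, py - y)
        if dist <= best_dist:
            best, best_dist = word, dist
    return best


def query_url(word, base="http://example.org/tactweb"):
    """Build a search link for a clicked word.

    Hypothetical: the base URL and the 'query' parameter are placeholders,
    not TACTweb's real interface.
    """
    return base + "?" + urlencode({"query": word})


visible = zoom(points, 0.0, 1.0, -1.0, 1.0)   # zoom in on the right half
clicked = word_at(visible, 0.44, 0.06)
if clicked is not None:
    print(query_url(clicked))
```

Keeping the word as the key of every point is what lets a click be turned directly into a text query; a plot of bare points would need a separate lookup to recover which word was meant.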
As long as the visualization tools we use do not connect with text-analysis tools and do not properly display the objects of our study, namely texts made up of words, we will not be able to enjoy the serendipitous discovery that characterizes highly interactive research environments. As anyone who has used tools like TACT knows, the character of the research done with an interactive tool is different from that of research done with tools where each query has to be thought out, programmed, and then run.
1.2 Program Visualization
In computer science there has been a substantial amount of research in the area of visual programming. (See the bibliography below, especially Myers and Price.) Visual programming environments give the user a graphical way of representing the logic of an application, as an alternative to the way most of us program, by writing code in a programming language. While there is much excitement about the potential of visual programming, there have been few successes. It would seem that visual programming works best in specific domains where there is a consensus about the types of operations and the relationships between operations. Textual scholars form such a community: we have enough experience with computer-assisted text analysis to have built a consensus about the operations likely to be needed. In addition, the promise of domain-specific visual programming is that it should allow humanists to concentrate on the tasks of their discipline, not on the difficulties of programming. A visual programming environment tailored to text analysis promises to provide accessible programming suitable for researchers in the humanities. The Eye-ConTact project, to be demonstrated later in this paper, is a prototype of what such text-analysis visual programming might look like.
As important as the ease of programming is the potential for a rigorous description of the logic of a research project. Too often, when using text-analysis tools, researchers find useful results but have not kept track of how those results were derived. This becomes a particular problem when one is moving data from one application to another. The design principle behind Eye-ConTact is that users be forced to build the process by which they get results, so that the program can record that process as a map which can then be reviewed at a glance (or altered as the research matures). This allows the researcher to recapitulate the process with a different text or to experiment with different processes on the same text. Because such visual programming environments are usually modular, this also allows the researcher to build larger projects out of smaller programs, if those can be repurposed. From the user's point of view, implementing a text-analysis environment as a visual programming environment means that results always come through the conscious arrangement of operations on information, which preserves the logic of the experiment. Visual programming tools like Eye-ConTact are therefore more difficult to use out of the box, but they impose a research discipline on the user which we feel is important to include in the next generation of research text-analysis tools. 3
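The recording principle can be illustrated with a small sketch: every operation the researcher applies is appended, with its parameters, to a list of steps; the list is saved as JSON; and the saved map is then replayed on a different text. The operation names, the Experiment class, the replay function, and the JSON layout are hypothetical illustrations, not Eye-ConTact's file format, and the sample sentences are invented.

```python
import json
from collections import Counter

# Hypothetical operations: each takes the current data plus keyword parameters.
OPERATIONS = {
    "tokenize": lambda text: text.lower().split(),
    "keep_longer_than": lambda words, n: [w for w in words if len(w) > n],
    "frequencies": lambda words: dict(Counter(words)),
}


class Experiment:
    """Applies operations and records each one, with its parameters, as a map."""

    def __init__(self):
        self.steps = []

    def apply(self, data, op, **params):
        self.steps.append({"op": op, "params": params})  # record, then run
        return OPERATIONS[op](data, **params)

    def to_json(self):
        return json.dumps(self.steps, indent=2)


def replay(map_json, text):
    """Recapitulate a saved map, step by step, on a (possibly different) text."""
    data = text
    for step in json.loads(map_json):
        data = OPERATIONS[step["op"]](data, **step["params"])
    return data


exp = Experiment()
words = exp.apply("Nature is the cause of design", "tokenize")
long_words = exp.apply(words, "keep_longer_than", n=4)
freqs = exp.apply(long_words, "frequencies")
saved = exp.to_json()                      # the reviewable record of the analysis
other = replay(saved, "Evil and misery abound in nature and in nature alone")
print(saved)
print(other)
```

Since each step is recorded before it runs, the saved map describes exactly what produced the result; editing a parameter in the JSON (for instance n) and replaying it is one way to alter the map as the research matures.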
It is also important to understand how visual programming, or the visualization of a program, differs from the visualization of data. What we discussed above in part 1.1 was data visualization, where what is represented graphically is a quantification of the text. Program visualizations like Eye-ConTact grow out of a genre of scientific visualization tools like Explorer, which are designed to show you the flow of data through selected operations. What you see is the logic of the transformation of information that might lead to the type of data visualization result which SIMWeb displays. The program visualization shows you the logic of your experiment; the data visualization shows you the text represented. To illustrate this difference it is best to turn to the demonstrations.
SIMWeb
At this point we will demonstrate SIMWeb, an interactive WWW-based environment. SIMWeb can be tried at the URL: http://tactweb.humanities.mcmaster.ca/cgi-dos/simweb/simweb.bat. In the demonstration we will do the following:
Discuss the statistical analysis that generated the interactive graph,
Demonstrate the different perspectives one can take on the data and how one can zoom in on the data,
Demonstrate the link to TACTweb, and
Discuss the underlying technology.
Eye-ConTact
Following the demonstration of SIMWeb we will demonstrate the Eye-ConTact prototype. Eye-ConTact is a working thought piece that was designed to test the feasibility of visual programming for text analysis. It was built in Visual Basic for the Wintel platform and uses a variant of TACTweb as its underlying text engine. It was designed to try out different paradigms for visual programming in the humanities, so the demonstration will emphasize the interface issues that crop up when one tries to represent a text application graphically. In particular we will focus on the following:
What is being represented in a visualization of a text application? We believe that what should be displayed is the flow of transformations applied to a text in order to produce a useful research result.
What should a map of the flow of information through a text application display? We believe the map should be a rigorous description of the transformations to the original electronic text, such that the results could be recapitulated from the map. A visual programming environment should encourage researchers to think about the transformations they subject a text to, and it should allow them to keep an accurate record of how their analysis produced the results they find significant.
What are the limitations of this prototype? This prototype, while it works, was not designed to be a distributable product. More importantly, in the process of prototyping we have discovered limitations in the design that would have to be fixed in any full development project. We will discuss these limitations.
A paper discussing the rationale behind the design of Eye-ConTact, along with screen dumps of the program, can be found at the URL: http://www.humanities.mcmaster.ca/~grockwel/ictpaper/ictintro.htm 4
Conclusion
To conclude, we should stress that these two types of visualization are not the only types that can be imagined. Visualization raises a collection of issues around what it means to represent textual information. In the case of textual visualization we are interpreting and re-presenting a text not through further texts, but through images and diagrams. The computer now allows us to do this automatically, in the sense that the diagram is generated from measurements of the original text or corpus. Illustrations of texts created by artists do not have this direct connection to the original text, though they might be more illuminating. What the direct connection between the diagram and the text allows us to do is design interactive diagrams: diagrams that can respond immediately to the researcher. That said, there is much work to be done in determining how best to measure or quantify texts for graphical representation. The reader of this paper should not confuse the particular statistical methods we used with the potential for graphical representation. Other ways of quantifying texts and a better appreciation of textual statistics will lead to graphical representations we haven't imagined, which is why we stepped back and tried to imagine a visual programming environment where researchers could easily experiment with visualizations. While we expect interactive visualization to become more common, we also expect that the types of visualizations will change as the field matures. This paper has concentrated on showing two major types of interactive visualization and the pragmatic connection between the two. The potential for the use of visualization in text-analysis tools should not be obscured by possible defects in our implementation; instead, these two experiments point to the need for tools that allow us to explore different ways of quantifying and representing texts.
Notes
1 Bradley, John, Rockwell, Geoffrey, "Watching Scepticism: Computer Assisted Visualization and Hume's Dialogues".
2 Playfair, The Commercial and Political Atlas, pp. 3-4; from Tufte, The Visual Display of Quantitative Information, p. 32.
3 In a paper available on the web we went into the reasons for recording one's research in more detail. See Bradley, J., Rockwell, G., "Towards new Research Tools in Computer-Assisted Text Analysis", at the URL: http://www.humanities.mcmaster.ca/~grockwel/ictpaper/learneds.htm.
4 Rockwell, Geoffrey, Bradley, John, "Eye-ConTact: Towards a New Design for Research Text Tools".
Bibliography
1. Arnheim, R., Visual Thinking, University of California Press: Berkeley, California, 1969.
2. Bertin, J. (Berg, W. J. tr.), Semiology of Graphics, University of Wisconsin Press: Madison, Wisconsin, 1983.
3. Bradley, J., Rockwell, G., "Watching Scepticism: Computer Assisted Visualization and Hume's Dialogues", Research in Humanities Computing 5, Oxford: Clarendon Press, 1996, pp. 32-47.
4. Earnshaw, R. A., and Wiseman, N., An Introductory Guide to Scientific Visualization, Springer-Verlag: Berlin, 1992.
5. Foulser, D., "IRIS Explorer: A Framework for Investigation," Computer Graphics, 29.2, pp. 13-16.
6. Hume, David. (Stanley Tweyman, ed.) Dialogues Concerning Natural Religion, New York: Routledge, 1991.
7. McKinnon, A., "Mapping the Dimensions of a Literary Corpus" in Literary and Linguistic Computing, 4:2, pp. 73-84.
8. Myers, B. A., "Taxonomies of Visual Programming and Program Visualization," Journal of Visual Languages and Computing, 1, pp. 97-123.
9. Lancashire, I., Bradley, J., McCarty, W., Stairs, M., and Wooldridge T. R., Using TACT with Electronic Texts, The Modern Language Association of America: New York, 1996.
10. Petre, M., Green, T. R. G., "Learning to Read Graphics: Some Evidence that 'Seeing' an Information Display is an Acquired Skill", Journal of Visual Languages and Computing, 4 (1993), pp. 55-70.
11. Petre, M., "Why Looking Isn't Always Seeing: Readership Skills and Graphical Programming", Communications of the ACM, 38:6 (1995), pp. 33-44.
12. Potter, R. G., "Literary Criticism and Literary Computing: The Difficulties of a Synthesis", Computers and the Humanities, 22, pp. 91-97.
13. Price, B., Small, I., and Baecker, R., "A Principled Taxonomy of Software Visualization", Journal of Visual Languages and Computing, 4.1, pp. 211-266.
14. Raymond, D. R., "Visualizing Texts", Making Sense of Words: Proceedings of the Ninth Annual Conference of the UW Centre for the New OED and Text Research, pp. 19-32.
15. Raymond, D. R., "Characterizing Visual Languages", IEEE Workshop on Visual Languages (1991), pp. 176-182.
16. Repenning, A., and T. S., "Agentsheets: A Medium for Creating Domain-Oriented Visual Languages", IEEE Computer, March 1995, pp. 17-25.
17. Rockwell, Geoffrey, Bradley, John, "Eye-ConTact: Towards a New Design for Research Text Tools", accepted for publication in the online journal Computing in the Humanities Working Papers, URL: http://www.chass.utoronto.ca:8080/epc/chwp/. The current version of the essay is available at URL: http://www.humanities.mcmaster.ca/~grockwel/ictpaper/ictintro.htm.
18. Rockwell, G., Bradley, J., "Empreintes dans le sable: Visualisation scientifique et analyse de texte", forthcoming collection of papers edited by Michel LeNoble and Alain Vuillemin entitled Littérature, Informatique, Lecture: De la lecture assistée par ordinateur à la lecture interactive.
19. Tufte, E. R., The Visual Display of Quantitative Information, Graphics Press: Cheshire, Connecticut, 1983.

Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1998
"Virtual Communities"

Hosted at Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)

Debrecen, Hungary

July 5, 1998 - July 10, 1998

Organizers: ACH, ALLC
