TADA Research Evaluation Exchange: Winning 2008 Submissions

Stéfan Sinclair

Authorship

1. Stéfan Sinclair

McMaster University

Work text


Winners: Dave Beavan, Susan Brown, J.
Stephen Downie, Carlos Fiorentino, Patrick
Juola, Shelly Lukon, Peter Organisciak,
Geoffrey Rockwell, Susan Schreibman,
Kirsten Uszkalo
In the spring of 2008 the Text Analysis Developers’
Alliance organized a digital humanities tools competition
called T-REX, modelled on similar competitions
such as MIREX (music information retrieval) and TREC
(text retrieval); cf. Downie 2006. The
community response to T-REX was very positive and
among the many submissions received, judges selected
winners from the following categories:
• Best New Web-based Tool
• Best New Idea for a Web-based Tool
• Best New Idea for Improving a Current Web-Based
Tool
• Best New Idea for Improving the Interface of the
TAPoR Portal
• Best Experiment of Text Analysis Using High Performance
Computing
The categories above deliberately cover not only working
tools, but also ideas, designs and preliminary experiments;
a primary objective of T-REX is to encourage
the involvement and collaboration of programmers, designers,
and users. More information on the categories,
the judges, and other aspects of T-REX is available at
http://tada.mcmaster.ca/trex/.
The organizers and winning participants of T-REX would
like to propose a cluster of posters that showcase various
aspects of the research done. In particular, we will
prepare seven “half” posters presenting relevant aspects
from each of the winning T-REX submissions. In addition,
an eighth “half” poster will provide an overview of
the inaugural T-REX competition, lessons learned, and
new initiatives for the second round. Below are brief descriptions
of each of the seven project posters.
Susan Brown et al.,
Degrees of Connection Tool (New Tool)
This linkage tracing tool allows users working on a large
collection of documents to explore the linkages within
the collection based on the semantic tags it contains. Our
prototype, based on the Orlando textbase, traces links between
people mentioned in different XML documents
based on co-occurrences of a small set of key tags that
occur across many documents: personal names, organization
names, places, and titles. This exploits the tagging
to get at connections between people that may not be
made by direct linkages between documents, but rather
by the co-occurrence of tags within two documents, or
a pathway from document x to document y by way of
document z in which different tags common to x and y
occur in z. It is a way of getting at implicit but nevertheless
potentially important linkages, and while it emerges
in this case from an interest in literary history, the tool
could be useful to other fields ranging from journalism
to creative writing, sociology or psychology. It provides
a new way of exploring the large digital collections researchers
are increasingly using. The poster will 1) outline
the principles on which the prototype is based, 2) list
key features for a fully-developed, generalized version,
3) explain our application of graph theory to the tagged
text, and 4) outline the design challenges that emerged in
the development of the prototype. A live demo will allow
attendees to test the prototype.
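The pathway idea described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
prototype's actual code: the document ids, tag values, and the function
names build_graph and shortest_path are all hypothetical, and the
breadth-first search stands in for whatever graph method the authors use.

```python
from collections import defaultdict, deque

# Hypothetical data: document id -> set of (tag type, tag value) pairs.
DOCS = {
    "doc_x": {("name", "Person A"), ("place", "London")},
    "doc_z": {("place", "London"), ("org", "Society B")},
    "doc_y": {("org", "Society B"), ("name", "Person C")},
}


def build_graph(docs):
    """Link two documents whenever they share at least one tag value."""
    by_tag = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag in tags:
            by_tag[tag].add(doc_id)

    graph = defaultdict(dict)
    for tag, doc_ids in by_tag.items():
        for a in doc_ids:
            for b in doc_ids:
                if a != b:
                    # Remember which shared tags justify the edge.
                    graph[a].setdefault(b, set()).add(tag)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of document ids, or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in sorted(graph.get(path[-1], {})):
            if nxt in seen:
                continue
            if nxt == goal:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


graph = build_graph(DOCS)
print(shortest_path(graph, "doc_x", "doc_y"))
```

Here doc_x and doc_y share no tag, yet they are connected by way of
doc_z, which shares a place with one and an organization with the other.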
David Beavan,
Collocate Cloud (Idea for Existing Tool
Improvement)
Clouds of information (e.g. keywords, tags or words) are
a very useful way to aggregate and present vast quantities
of data. These clouds are now used on
many Web 2.0 sites, and as such they are becoming a
visualisation that many users know and understand.
TAPoRware currently provides a Word Cloud visualisation,
which shows the frequency of words in a document.
Scholars often wish to go further, to see how a particular
word is used, by examining which words co-occur near
their search word. TAPoRware already offers a Collocation
tool for this, showing the results in tabular form.
The Collocate Cloud would merge the collocation output
with the cloud visualisation technology, showing the
collocates of a particular search word in cloud format.
The alphabetical ordering of the Collocate Cloud would
allow the user to find or discount a word quickly.
Frequency and collocational strength would be shown by size
and brightness, letting the strongest collocates stand out visually.
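As a rough illustration of the proposal, the following Python sketch
counts collocates within a fixed window of a search word and returns
them in alphabetical order with a font size scaled by frequency. The
function names, the window size, and the point-size range are
assumptions made for the example, not part of TAPoRware; brightness
could be mapped from a collocational strength measure in the same way
and is left out here.

```python
import re
from collections import Counter


def collocates(text, node, window=3):
    """Count words occurring within `window` tokens of each hit for `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def cloud(counts, min_pt=10, max_pt=30):
    """Return (word, font size) pairs in alphabetical order."""
    if not counts:
        return []
    top = max(counts.values())
    return [
        (word, min_pt + round((max_pt - min_pt) * counts[word] / top))
        for word in sorted(counts)
    ]


sample = "the cat sat on the mat and the cat ran"
for word, size in cloud(collocates(sample, "cat", window=1)):
    print(f"{word}: {size}pt")
```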
Carlos Fiorentino,
The Magic Circle (Idea for New Tool)
The Magic Circle is an information glyph that allows
scholars to visually summarize combinations of the lexical
information included in text collections (typically frequency
data about words, lemmas, and parts of speech)
and the bibliographic information attached to these texts.
The glyph consists of a set of rings organized outwards
from the centre and divided into wedges or sections. The
lexical data determine the size of the centre, which also
shows a word, a lemma, or a part of speech, and the total
number of search results for that word, lemma, or part
of speech within a specified work or set of works. The
bibliographic data is related to authors specified by the
user, and the rings allow the user to analyze how the
search results are distributed in different collections as
well as in different periods of time. The color sets of the
rings follow patterns of associations with variations in
hue, tone and saturation. A comparative scale helps the
user to understand the volume of information found in
context with the whole volume of information present in
the collections.
Alejandro Giacometti et al.,
Ripper Browser (New Tool)
The Ripper Browser is a prototype for rich-prospect
browsing of text collections. Rich-prospect browsing
interfaces are designed to aid research tasks such as exploration
and synthesis by providing both a meaningful
representation of each item in a collection and tools to
manipulate their visual organization (Ruecker 2003). The
Ripper browser offers an environment for exploration
and interaction with digital text documents. The system
creates tiles that contain faceted information about each
document. The tiles can be manipulated with a series of
controls to reveal or hide details, organize them according
to a particular hierarchy, or select a specific group.
By adapting the size of the tiles, the Ripper browser allows
researchers to keep the complete collection, and
the precise information they need about each document,
in view at all times. The Ripper browser was developed
with web-native technologies (HTML and JavaScript) and uses
the jQuery library. It is configured to use text collections
provided by the MONK Project. The Ripper browser is
part of an ongoing effort to understand the potential of
rich-prospect browsing and improve on our strategies for
designing rich-prospect tools. It has allowed us to experiment
further with meaningful representation, increased
our understanding of the importance of sequences, and
provided insight into new possibilities for organization
in visualizations.
Patrick Juola & Shelly Lukon,
Back-of-the-Book Index Generation
(Experiment in HPC)
This is a work in progress; as we have detailed
elsewhere (Juola, 2005, ACH/ALLC; Lukon and Juola,
2006, DH2006), we are working on a program to apply
standard machine learning (ML) techniques, including latent
semantic analysis (LSA),
to the problem of back-of-the-book index generation.
LSA implicitly uses huge document-by-term matrices
to determine which words appear in similar contexts
and are therefore good candidates for grouping under a
single index term.
The sheer size of this matrix makes it difficult to work with
unless HPC is available; one of the tools we are using is the 200+
node Beowulf cluster available at the Duquesne University
Computer Science Department. We analyze the document
to be indexed (which can in theory be arbitrarily
large but in practice will be about novel-sized) to select
candidate words (mostly nouns, via POS tagging) for indexing,
then use LSA to identify potential relationships
among those words.
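The kind of matrix involved can be illustrated with a small,
dependency-free Python sketch. It is a simplified stand-in, not the
authors' program: it builds a term-by-passage count matrix for a
hand-picked list of candidate words and compares term rows by cosine
similarity. LSA proper would first reduce the matrix with a truncated
singular value decomposition, a step omitted here. The passages,
the candidate list (which in the project would come from POS tagging),
the threshold, and all function names are assumptions.

```python
import math
from collections import Counter


def term_vectors(passages, candidates):
    """One row per candidate term, one column per passage (raw counts)."""
    counts = [Counter(p.lower().split()) for p in passages]
    return {term: [c[term] for c in counts] for term in candidates}


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_pairs(vectors, threshold=0.5):
    """Pairs of terms whose context vectors are similar enough to group."""
    terms = sorted(vectors)
    pairs = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


passages = [
    "ship sails harbour",
    "ship harbour storm",
    "garden roses bloom",
    "garden roses",
]
vectors = term_vectors(passages, ["ship", "harbour", "garden", "roses"])
print(related_pairs(vectors))
```

Even at this toy scale the matrix has one cell per term and passage,
which suggests why a novel-sized text quickly calls for a cluster.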
Peter Organisciak,
Bookmarklet for Immediate Text Analysis
(Improving TAPoR)
The idea is an online interface for generating
TAPoR bookmarklets on demand.
Bookmarklets are browser bookmarks that run JavaScript
code. They provide value to text analysis tools in two
ways: ease and ubiquity. They allow one-click connection
of content to tool and, more importantly, allow the tool to be
run on whatever content the user is viewing.
One problem with bookmarklets is that they are static,
which means that customization of the query is limited.
One solution would be to call up an interface every time
the bookmarklet is invoked. Doing so, however, would
undermine the core concept of ease. Instead, through an
interface for creating customized bookmarklets, a user
can create single-purpose bookmarklet buttons that run
the same command every time, immediately and directly.
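A generator of this kind might look like the Python sketch below, which
bakes a fixed set of options into a javascript: URL that a user could
save as a bookmark. It is a minimal sketch under assumptions: the
endpoint https://example.org/tapor/tool and the parameter names (tool,
word, window, url) are placeholders, not the real TAPoR interface, and
make_bookmarklet is a hypothetical name.

```python
from urllib.parse import quote, urlencode

# Hypothetical endpoint; real TAPoR tool URLs and parameters may differ.
TOOL_ENDPOINT = "https://example.org/tapor/tool"


def make_bookmarklet(tool, options):
    """Build a javascript: URL that sends the current page to one preset query."""
    # The customization is fixed at generation time, so the bookmark
    # itself needs no further interface when clicked.
    fixed = urlencode({"tool": tool, **options})
    script = (
        "(function(){"
        "var u=encodeURIComponent(location.href);"
        f"window.open('{TOOL_ENDPOINT}?{fixed}&url='+u);"
        "})();"
    )
    # Percent-encode the script so it survives being stored as a URL.
    return "javascript:" + quote(script, safe="")


link = make_bookmarklet("collocation", {"word": "witch", "window": 5})
print(link)
```

Each distinct combination of options yields its own bookmark, which
keeps the one-click behaviour while still allowing customized queries.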
Kirsten C. Uszkalo,
Throwing Bones (Idea for New Tool)
The Throwing Bones interface operates as a means to
discover meaningful relationships within a corpus of
texts. These relationships will appear as a series of piles,
which the user can zoom into and out of, shuffle through
and examine more closely for more comprehensive, annotated
information. For example, in the case of a corpus of early
modern witchcraft trials, a user might want to see the relationship
between animal familiars and accusers. After
shuffling, the top item in a pile would show the number
of familiars, while the cards beneath show the number of
accusers, illustrating a connection between the imagination
of accusers and the presence of familiars in trials
and texts. The piles could also be based on geographic,
temporal, textual, or relationship proximity. The concept
behind Throwing Bones is that the interface will not only
offer the pleasure of play, but also erudite and serendipitous
textual analysis.



Conference Info

Complete

ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Conference website: http://web.archive.org/web/20130307234434/http://mith.umd.edu/dh09/

Series: ADHO (4)

Organizers: ADHO
