SEASR integrates with Zotero to Provide Analytical Environment for Mashing up Other Analytical Tools

poster / demo / art installation
Authorship
  1. Loretta Auvil

    University of Illinois, Urbana-Champaign

  2. Boris Capitanu

    University of Illinois, Urbana-Champaign

  3. Xavier Llorà

    University of Illinois, Urbana-Champaign

  4. Michael Welge

    University of Illinois, Urbana-Champaign

  5. Bernie Ács

    University of Illinois, Urbana-Champaign

Work text

This paper describes a development effort to link two
humanities cyberscholarship infrastructure projects
supported by The Andrew W. Mellon Foundation. We
have created an extension to Zotero [1] that acts as a
bridge between the data stored by Zotero, and the suite
of analytic tools provided by SEASR [2]. This extension
provides users with the ability to apply a variety of data
analysis algorithms to the collections they build in Zotero
and to visualize the results directly in the browser.
This is accomplished by directly accessing the data model
provided by Zotero and converting that data model
into RDF, which makes it possible to exploit the analytical
capabilities of SEASR.
The SEASR environment provides a framework to integrate
data, analytics, and tool constructs, so that data
from one component can be passed to another. One of the
unique capabilities of SEASR is its support for
mashups, that is, the ability for users to
combine tools in efficient and effective ways. This paper
describes the coupling of two environments relevant to
humanists, Zotero and SEASR: Zotero’s data asset creation
is paired with the analytical capabilities of SEASR. Through
the use of Zotero’s plugin environment, we can execute
the analysis capabilities of the SEASR environment.
The following sections provide a description of the two
major pieces of this effort—Zotero and SEASR. Also
provided is a description of the major functions performed
by the combination of the two. These include:
data gathering, data analytics, and data visualization. We
end with a summary of the integration of the two efforts
and a view to our future work.
1. Background
1.1 Zotero
Zotero was selected because of its popularity with scholars
for recording, cataloging, and finding resources collected from
the Internet. Zotero was developed at the Center for History
and New Media, George Mason University, and is
a tool aimed at facilitating a user’s research process by
providing mechanisms for collecting, managing, and citing
internet resources. Zotero functions as an extension
of the popular open-source browser, Firefox, which allows
it to provide its services in the same environment
where the research is usually performed. One of the key
features provided by Zotero is the ability to automatically
extract metadata from online resources as part of
the resource collection process, and store it conveniently
on the user’s computer, allowing for offline retrieval of
this data on demand. Zotero also provides advanced tagging
and searching functionality, allowing the user to organize,
find, and visualize the collected resources.
Zotero includes a powerful metadata editor, allowing the
user to make additions/corrections to the automatically
extracted information. Users can add new fields, attach
screenshots and documents, create notes, and even create
relationships between the various resources collected.
Overall, with such a vast and diverse amount of information,
a mechanism for finding patterns or interesting
relationships between these resources would go a step
further in helping researchers discover and extract more
information from their collections. Enter SEASR.
1.2 SEASR
(Software Environment for the
Advancement of Scholarly Research)
SEASR analytics enhances scholars’ use of digital materials
by helping them uncover hidden information and
connections, supporting the study of assets from small
patterns drawn from a single text or chunk of text to
broader entity categories and relations across a million
words or a million books. SEASR is designed to enable
digital humanities scholars to rapidly design, build, and share software applications that support research and
collaboration.
The SEASR team developed Meandre [3], which is the
machinery for assembling and executing data flows—
software applications consisting of software components
that process data (such as by accessing a data store,
transforming the data from that store and analyzing or
visualizing the transformed results).
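As a rough illustration of the data-flow idea only (this is not Meandre's actual API), a flow can be pictured as an ordered chain of components, each consuming the output of the one before it. The component names and the `run_flow` helper below are hypothetical and chosen purely for this sketch.

```python
from collections import Counter
from functools import reduce
from typing import Any, Callable, Iterable, Sequence

# In this sketch a component is any callable taking one input and returning one output.
Component = Callable[[Any], Any]


def access_store(_: Any) -> list[str]:
    """Hypothetical data-access component: returns documents from a store."""
    return ["The whale surfaced.", "The whale dived again."]


def tokenize(docs: Iterable[str]) -> list[str]:
    """Transformation component: split into words, strip punctuation, lower-case."""
    return [w.strip(".,;:!?").lower() for d in docs for w in d.split()]


def count_words(tokens: Iterable[str]) -> Counter:
    """Analysis component: count how often each token occurs."""
    return Counter(tokens)


def run_flow(components: Sequence[Component], seed: Any = None) -> Any:
    """Pass the data through each component in order and return the final result."""
    return reduce(lambda data, component: component(data), components, seed)


word_counts = run_flow([access_store, tokenize, count_words])
```

Because each component only depends on the shape of its input, a different analysis or a visualization step could be swapped into the list without changing the others.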
SEASR is extensible: new analytics can be
added, such as support for linguistic analysis for different
time periods or languages, and entire
steps in the work process can be readjusted so that researchers can validate
results from their queries. Components can be created
from other programming projects. The SEASR environment
is data driven and includes a workbench to orchestrate
the flow of data through the different components.
All SEASR analytics are enabled as web service calls.
SEASR also provides publishing capabilities for flows
and components, enabling users to assemble a repository
of components for reuse and sharing. This allows users
to leverage other research and development efforts by
querying and integrating component descriptions that
have been published previously at other shareable repository
locations.
2. Data Gathering
Zotero’s data model is very flexible, allowing the user
to add new fields, create notes, attach documents and
screenshots, and establish relationships between resources.
At a minimum, Zotero adds the following information
for each resource that is added: title of the
resource, originating URL, and the dates when the resource
was created, modified, and accessed. For many
major research and library sites Zotero can automatically
extract the full reference information, which includes authorship
data, abstracts, page references, locations, etc.
This provides a wealth of information that can then be
submitted for analysis to SEASR.
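To make the conversion step concrete, the following is a minimal sketch of how the minimum fields listed above might be serialized as RDF in N-Triples syntax. It is not the extension's actual code: the `Item` class, the choice of Dublin Core terms for title and dates, and the `http://example.org/zotero-sketch#` namespace used for the access date are all assumptions made for illustration.

```python
from dataclasses import dataclass

DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical namespace, used here only because the sketch needs some predicate
# for the access date; it is not a vocabulary used by Zotero or SEASR.
EX = "http://example.org/zotero-sketch#"


@dataclass
class Item:
    """The minimum information described above for one collected resource."""
    url: str
    title: str
    created: str   # dates kept as plain ISO 8601 strings in this sketch
    modified: str
    accessed: str


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(item: Item) -> str:
    """Emit one triple per field, using the originating URL as the subject.

    Assumes the URL is already a valid IRI (no spaces or angle brackets).
    """
    subject = f"<{item.url}>"
    statements = [
        (DCTERMS + "title", item.title),
        (DCTERMS + "created", item.created),
        (DCTERMS + "modified", item.modified),
        (EX + "accessed", item.accessed),
    ]
    return "\n".join(f"{subject} <{p}> {literal(v)} ." for p, v in statements)


example = Item(
    url="http://example.org/article",
    title='On "Mashups"',
    created="2009-06-20",
    modified="2009-06-21",
    accessed="2009-06-22",
)
ntriples = to_ntriples(example)
```

Richer records (authors, abstracts, notes) would add further triples per item in the same way.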
Once the data are converted into RDF (a process which
is transparent to the user), they can be sent for processing, at
the user’s request, to any of a number of available data analysis
algorithms. When such a data processing request
is received, the extension establishes a communication
channel with the web service associated with the processing
flow, through which the RDF data are submitted.
After processing completes, the results are retrieved via
the same communication channel and displayed in a new
browser window. Depending on the complexity of the
type of processing requested, there may be a significant
delay until the results are retrieved.
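The request and response cycle described above could look roughly like the sketch below. The endpoint URL, the use of HTTP POST, and the `application/n-triples` content type are assumptions for illustration; the source does not specify the wire protocol the extension uses.

```python
import urllib.request


def build_request(flow_url: str, rdf_payload: str) -> urllib.request.Request:
    """Prepare a POST request that carries the RDF data to a flow's web service."""
    return urllib.request.Request(
        flow_url,
        data=rdf_payload.encode("utf-8"),
        headers={"Content-Type": "application/n-triples"},
        method="POST",
    )


def submit(flow_url: str, rdf_payload: str, timeout_s: float = 300.0) -> str:
    """Send the data and read the result back over the same connection.

    A generous timeout is used because complex processing may take a while.
    """
    request = build_request(flow_url, rdf_payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")


# Hypothetical endpoint; only the request is built here, nothing is sent.
request = build_request(
    "http://seasr.example.org/flows/hypothetical-tag-cloud",
    '<http://example.org/article> <http://purl.org/dc/terms/title> "A title" .',
)
```

In this sketch the caller would pass the string returned by `submit` to whatever opens the results in a new browser window.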
Figure 1. Plugin for Zotero that requests SEASR analytics.
The extension provides a flexible mechanism through
which the user can specify which data processing flows
they want to have access to, by configuring a list of SEASR
servers where these flows are hosted. This way, the
user can include any number of Zotero-compatible data
processing flows hosted by third-party organizations.
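One way to picture this configuration is as a registry that maps each configured server to the flows it hosts. The sketch below is hypothetical: the class names, the URL layout, and the idea of passing flow names in directly (rather than discovering them from the server) are simplifications for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    name: str
    endpoint: str


@dataclass
class ServerRegistry:
    """User-configured list of servers and the flows each one hosts."""
    flows_by_server: dict[str, list[Flow]] = field(default_factory=dict)

    def add_server(self, base_url: str, flow_names: list[str]) -> None:
        # A real plugin would presumably ask the server which flows it offers;
        # here the names are supplied by the caller.
        base = base_url.rstrip("/")
        self.flows_by_server[base] = [Flow(n, f"{base}/{n}") for n in flow_names]

    def remove_server(self, base_url: str) -> None:
        self.flows_by_server.pop(base_url.rstrip("/"), None)

    def available_flows(self) -> list[Flow]:
        """All flows across every configured server, sorted for stable display."""
        flows = (f for hosted in self.flows_by_server.values() for f in hosted)
        return sorted(flows, key=lambda f: (f.name, f.endpoint))


registry = ServerRegistry()
registry.add_server("http://seasr.example.org/flows/", ["tag-cloud", "clustering"])
registry.add_server("http://thirdparty.example.net/flows", ["timeline"])
flow_names = [f.name for f in registry.available_flows()]
```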
3. Data Analytics
The SEASR team has been integrating a variety of tools
as well as developing our own analytics. Currently we
have integrated natural language processing (NLP) tools
and current research algorithms from our data mining
collaborators as well as transformation components to
allow for data movement between the different components.
We have enabled some very simple and straightforward
requests, like word counts, part-of-speech
information, and entity extraction. We also have
additional machine learning approaches that can be leveraged,
like clustering, frequent pattern analysis, predictive
modeling, graph mining, and sequence analysis. We
have currently integrated D2K (Data to Knowledge) [4]
and T2K (Text to Knowledge) analysis, OpenNLP [5],
and GATE (General Architecture for Text Engineering)
[6]. This means that from your Zotero collection, you
can ask for a social network analysis based on authors
and other metadata. You can ask for a tag cloud of all
your notes. You can ask for a tag cloud of a particular
work or collection. You can cluster the documents from
your collection. You can track a character or set of terms
throughout a book or collection. You can look at extracted
entities like locations on a Google map [7]. You
can look at extracted entities like dates on a timeline like
Simile [8]. You can build a social network of the people
mentioned in your collection.
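As one example of what a social network analysis based on authors could compute, the sketch below derives weighted co-authorship edges from the author lists of a collection. It is an illustrative assumption about the analysis, not the SEASR component itself, and the author names are placeholders.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(author_lists: list[list[str]]) -> Counter:
    """Count how many items each pair of authors appears on together."""
    edges: Counter = Counter()
    for authors in author_lists:
        # Sort so that each pair has one canonical order; set() drops duplicates.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Placeholder author lists, one per item in a collection.
edges = coauthor_edges([
    ["Ada", "Ben"],
    ["Ben", "Ada", "Cy"],
    ["Cy"],
])
```

The resulting edge weights could then be handed to a link-node visualization, with heavier edges drawn thicker.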
4. Data Visualization
As with the data analysis, a number of visualization tools
exist, so we have been working to integrate with these
tools rather than redeveloping them. We have incorporated
visualizations from D2K as applets, such as frequent pattern analysis and several of the predictive modeling
visualizations. We have also leveraged code to create
a tag cloud [9]. We are providing link-node charts and
stacked bar charts via Flare [10]. The collage of example
visualizations below is meant to provide an idea of the
visual metaphors being used.
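For the tag cloud in particular, the core of the visual metaphor is a mapping from word frequency to font size. The sketch below shows one simple linear mapping; the function name and the point sizes are assumptions, and the tag cloud code cited above may scale differently.

```python
def tag_cloud_sizes(
    counts: dict[str, int], min_pt: float = 10.0, max_pt: float = 32.0
) -> dict[str, float]:
    """Map each word's count linearly onto a font size between min_pt and max_pt."""
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    if span == 0:
        # All words are equally frequent, so give them all the same size.
        return {word: max_pt for word in counts}
    return {
        word: min_pt + (count - lowest) * (max_pt - min_pt) / span
        for word, count in counts.items()
    }


sizes = tag_cloud_sizes({"whale": 9, "sea": 5, "ship": 1})
```

A logarithmic mapping is a common alternative when a few words dominate the counts.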
5. Future Work
We continue developing analysis and visualization capabilities
that can be leveraged by Zotero. As part of our
Pathways to SEASR Workshops, we are demonstrating
this tool integration and are establishing collaborations
with workshop teams. These teams are exploring specific
use cases to demonstrate scholarly research that can
be easily added to this plugin environment. We are looking
to improve the interaction between the plugin and the
SEASR framework and its ability to provide users with
visual interfaces for customizing the execution of flows.
Figure 2. Collage of visual metaphors available with
SEASR analytics.
6. Summary
In summary, we have created a tool that facilitates the
communication of Zotero collection data to SEASR
for further study and research. We have linked Zotero to SEASR,
a strong and flexible tool that can add research capabilities
to these text assets. SEASR allows for the use of
its existing analysis and visualization tools and, moreover,
it allows for the integration of other tools through a
mashup process. The result of this effort is a synergy—a
strengthening of both Zotero and SEASR—as useable
tools for cyberscholarship.
7. Acknowledgement
The SEASR project is funded by The Andrew W. Mellon
Foundation.
8. References
1. Zotero, http://www.zotero.org
2. SEASR, http://seasr.org
3. Llorà X, Ács B, Auvil LS, Capitanu B, Welge ME,
Goldberg DE (2008) Meandre: Semantic-Driven
Data-Intensive Flows in the Clouds, in Proceedings of IEEE
Fourth International Conference on eScience, 238-245,
IEEE Press.
4. D2K, http://alg.ncsa.uiuc.edu/do/tools/d2k
5. OpenNLP, http://opennlp.sourceforge.net/
6. H. Cunningham, D. Maynard, K. Bontcheva, V.
Tablan. GATE: A Framework and Graphical Development
Environment for Robust NLP Tools and Applications.
Proceedings of the 40th Anniversary Meeting of
the Association for Computational Linguistics (ACL’02).
Philadelphia, July 2002.
7. Google Maps API, http://code.google.com/apis/maps/
8. Simile Timeline, http://www.simile-widgets.org/timeline/
9. Tag Cloud, http://emumarketing.uoregon.edu/paul/2008/09/28/the-new-tag-cloud/
10. Flare, http://flare.prefuse.org/


Conference Info

Complete

ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009


Series: ADHO (4)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None