A Bibliographic Utility for Digital Humanities Projects

poster / demo / art installation
Authorship
  1. James Stout

    Brown University

  2. Clifford Wulfman

    Brown University

  3. Elli Mylonas

    Brown University

Work text

Introduction
Many, if not most, digital humanities projects have a
bibliographical component. For some projects the collection,
annotation and dissemination of bibliographical information
on a specific topic is the primary focus. The majority, however,
include bibliography as a resource. Over the years, our group has
worked on several projects that have included a bibliography. In
each case, we were given bibliographical information to include
at the outset, which the scholar working on the project would
edit and amplify over time.
In the past we wrote several one-off bibliographic management
systems, each for a particular project. Each tool was tailored
to and also limited by the needs of its project. Over time we
ended up with several bibliographic components, each slightly
different in the way it was implemented and the way it handled
data formats. We decided to settle upon a general-purpose tool,
in order to avoid writing any further single-use applications,
which were difficult and time-consuming to support.
We were primarily interested in a tool that could handle many
forms of bibliographic entity, from conventional scholarly
materials, serials and manuscripts to multimedia and websites.
It had to allow scholars to interact with it easily using a web
interface for single record input and editing, and also be capable
of loading large numbers of prepared records at once. We were
less concerned with output and text-formatting capabilities,
given the ability to produce structured XML output. We
expected to allow the individual projects to query the system
and format the output from the bibliographic tool.
Bibliographical management systems, although tedious to
implement, are a well-understood problem, and we originally
hoped to use a pre-existing tool. Upon surveying the field,
we realized that the available tools were primarily single-user
citation managers, like EndNote, BibTeX, and Zotero. Our users
were compiling and manipulating scholarly bibliographies,
so we wanted something that was familiar to academics.
However, we also wanted the flexibility and overall standards
compatibility of the library tools.
A great deal of the difficulty of bibliographic management
arises out of the complexity of bibliographic references
and the problem of representing them. Many libraries use a
form of MARC to identify the different pieces of information
about a book. MARC records are not ideal for storing and
displaying scholarly bibliography; they aren’t meant to handle
parts of publications, and they don’t differentiate explicitly
among genres of publication. Many personal bibliographical
software applications store their information using the RIS
format, which provides appropriate categories of information
for most bibliographic genres (e.g. book, article, manuscript,
etc.). Hoenicka points out the flaws in the underlying
representation of RIS, but agrees that it does a good job of
collecting appropriate citation information [Hoenicka2007a,
Hoenicka2007b]. We wanted to be able to use the RIS
categories, but not the RIS structures. We also hoped that we
would be able to store our bibliographic information in XML,
as we felt that it was a more versatile and appropriate format
for loosely structured, repeating, ordered data.
Personal bibliography tools were not appropriate for our
purposes because they were desktop applications, intended
to be used by a single person, and were optimized to produce
print output. At the time we were researching other tools,
RefDB had a very powerful engine, but no interface to speak
of. WIKINDX was not easily configurable and was not easy to
embed into other projects. Both RefDB and WIKINDX have
developed features over the last year, but we feel that the
system we developed is different enough in ways that render
it more versatile and easier to integrate into our projects.
Although RefDB is the system that is closest to Biblio, it uses
its own SQL-based system for storing information about records,
whereas we were looking for an XML-based system. RefDB
also allows the user to modify records in order to handle
new genres of material. However, the modifications take place
at the level of the user interface. As discussed below, Biblio
provides full control over the way that records are being
stored in the database.
Based on our survey of existing tools and the needs we had
identified for our projects, we decided to implement our own
bibliographic utility. We made sure to have a flexible internal
representation, and a usable interface for data entry and
editing. We left all but the most basic display up to the project
that would be using the bibliography.
Implementation
Biblio is an XForms- and XQuery-based system for managing
bibliographic records using the MODS XML format. The pages
in the system are served by a combination of Orbeon, an
XForms implementation, and eXist, a native XML database that
supports XQuery. Using a simple interface served by eXist,
the user can create, edit, import, export, and delete MODS
records from a central collection in the database. When the
user chooses to edit a record, they are sent to an Orbeon
XForms page that allows them to modify the record and save
it back to the database.
By basing the backend of our system on XML, we can take
advantage of its unique features. Among the most useful is the
ability to easily validate records at each point changes may
have occurred. Validation begins when a document is first
imported or created, to ensure that no invalid document is
allowed to enter the database. The document is validated
against the standard MODS schema, which allows us to easily
keep our system up to date with changes in the MODS format.
Once a record is in the system, we must ensure that future
editing maintains its integrity. Instead of simply validating upon
saving a document, we use on-the-fly XForms validation to
inform the user immediately of any mistakes. The notification
is an unobtrusive red exclamation mark that appears next to
the field containing the error.
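The idea of field-level error reporting can be illustrated with a minimal sketch. It is written in Python rather than XForms, and it does not perform MODS schema validation (the Python standard library has no XSD validator). The RULES table, the field_errors function, and the sample records are hypothetical and exist only for illustration; only the MODS namespace and element names come from the MODS standard.

```python
import re
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/mods/v3"}  # MODS version 3 namespace

# Hypothetical per-field rules: XPath -> (required?, pattern the text must match).
# These stand in for schema constraints; they are not taken from Biblio.
RULES = {
    "m:titleInfo/m:title": (True, re.compile(r"\S")),
    "m:originInfo/m:dateIssued": (False, re.compile(r"^\d{4}$")),
}


def field_errors(record):
    """Return the XPaths of the fields that break a rule, in rule order."""
    errors = []
    for path, (required, pattern) in RULES.items():
        elements = record.findall(path, NS)
        if required and not elements:
            errors.append(path)  # a required field is missing
        elif any(not pattern.search(el.text or "") for el in elements):
            errors.append(path)  # a field is present but malformed
    return errors


# Invented sample records, used only to exercise the check.
GOOD = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>A Sample Book</title></titleInfo>"
    "<originInfo><dateIssued>2008</dateIssued></originInfo>"
    "</mods>"
)
BAD = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title></title></titleInfo>"
    "<originInfo><dateIssued>c. 2008</dateIssued></originInfo>"
    "</mods>"
)

if __name__ == "__main__":
    print(field_errors(GOOD))
    print(field_errors(BAD))
```

An interface built on such a check could place a marker next to each field whose path appears in the returned list, which is roughly the role the red exclamation mark plays in the edit page.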
Although validation helps prevent mistakes, it does little to
protect the user from being exposed to the complexity and
generality of the MODS format. Because the MODS format is
designed to handle any type of bibliographic record, any single
record only needs a small subset of all the available elements.
Fortunately, most records can be categorized into “genres”,
such as “book” or “journal article”. Furthermore, each of these
genres will have several constant elements, such as their title.
To maximize workflow efficiency and ease of use, we have
designed a highly extensible genre system that only exposes
users to the options that are relevant for the genre of the record
they are currently editing. The genre governs what forms the
user sees on the edit page, and hence what elements they are
able to insert into their MODS record. The genre definition
can also set default values for elements, and can allow the user
to duplicate certain elements or groups of elements (such as
a <name> element holding an author). When a record is saved
to the database, any unused elements are stripped away.
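The stripping step can be sketched as follows. This is a Python illustration under stated assumptions, not Biblio's XQuery implementation: the strip_unused function is hypothetical, and "unused" is assumed to mean an element with no non-whitespace text and no remaining children, regardless of its attributes.

```python
import xml.etree.ElementTree as ET


def strip_unused(element):
    """Recursively remove children that hold no text and no children.

    Assumption: an element counts as "unused" when it has no
    non-whitespace text and no remaining child elements, even if it
    carries attributes (for example an empty <name type="personal">).
    """
    for child in list(element):  # copy the list: we remove while iterating
        strip_unused(child)  # clean the subtree first
        has_text = bool((child.text or "").strip())
        if not has_text and len(child) == 0:
            element.remove(child)
    return element


# Invented record in which the edit form left several fields blank.
RECORD = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Sample Title</title><subTitle/></titleInfo>"
    '<name type="personal"><namePart>  </namePart></name>'
    "<abstract></abstract>"
    "</mods>"
)

if __name__ == "__main__":
    strip_unused(RECORD)
    print(ET.tostring(RECORD, encoding="unicode"))
```

Working bottom-up matters here: a wrapper such as name only becomes removable once its empty namePart child has been removed.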
The genre system is also compatible with MODS records
that are imported from outside the system. We simply allow
the user to choose what genre best describes the imported
record, which allows the system to treat the record as if it
had been created by our system. The user can also select
“Previously Exported” if they are importing a document that
was created by our system, which will cause the appropriate
genre to be automatically selected.
We have designed a simple XML format for creating genre
definitions that makes it easy to add, remove, or change
what genres are available for the user. The format allows
the administrator to specify which elements are in a genre,
what type of field should be used (including auto-complete,
selection, and hidden), what the default value of a field should
be, and whether the user should be allowed to duplicate a
field. All of our predefined genres also use this format, so it is
easy to tweak the system to fit the needs of any organization.
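To make the description concrete, the sketch below shows what a genre definition of this kind might look like and how it could be read. The actual Biblio format is not reproduced here: the genre and field element names, their attributes, the "text" field type, and the load_genre function are all assumptions made for illustration. Only the field types auto-complete, selection, and hidden, the default values, and the duplication flag come from the description above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical genre definition; not the actual Biblio format.
GENRE_XML = """<genre name="book">
  <field element="titleInfo/title" type="text"/>
  <field element="name/namePart" type="auto-complete" repeatable="true"/>
  <field element="typeOfResource" type="hidden" default="text"/>
  <field element="language/languageTerm" type="selection" default="eng"/>
</genre>"""


@dataclass
class Field:
    element: str            # path of the MODS element the field edits
    field_type: str         # e.g. "auto-complete", "selection", "hidden"
    default: Optional[str]  # default value, or None when there is none
    repeatable: bool        # whether the user may duplicate the field


def load_genre(xml_text: str) -> Tuple[str, List[Field]]:
    """Parse a genre definition into its name and a list of fields."""
    root = ET.fromstring(xml_text)
    fields = [
        Field(
            element=f.get("element"),
            field_type=f.get("type", "text"),
            default=f.get("default"),
            repeatable=f.get("repeatable", "false") == "true",
        )
        for f in root.findall("field")
    ]
    return root.get("name"), fields


if __name__ == "__main__":
    name, fields = load_genre(GENRE_XML)
    print(name)
    for field in fields:
        print(field)
```

Keeping the definition declarative means a new genre can be added by writing one more XML file, without changing the code that builds the edit form.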
Finally, once users have accumulated a large set of records of
various genres, we use the power of XML again to enable the
user to search and sort their records on a wide and extensible
range of criteria. The administrator can easily specify new
search and sort criteria for any element in the MODS schema,
using the powerful XPath language.
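A rough sketch of XPath-driven search and sort criteria follows. Biblio runs its queries as XQuery inside eXist; the Python standard library stands in for that here and supports only a subset of XPath. The CRITERIA table, the function names, and the two sample records are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/mods/v3"}  # MODS version 3 namespace

# Hypothetical criteria table: criterion name -> XPath relative to <mods>.
# Adding a criterion means adding one entry here.
CRITERIA = {
    "title": "m:titleInfo/m:title",
    "author": "m:name/m:namePart",
    "year": "m:originInfo/m:dateIssued",
}

# Invented sample collection, used only to exercise the functions.
COLLECTION = ET.fromstring(
    '<modsCollection xmlns="http://www.loc.gov/mods/v3">'
    "<mods><titleInfo><title>Zebra Studies</title></titleInfo>"
    "<name><namePart>Doe, Jane</namePart></name>"
    "<originInfo><dateIssued>2001</dateIssued></originInfo></mods>"
    "<mods><titleInfo><title>Alpha Notes</title></titleInfo>"
    "<name><namePart>Roe, Richard</namePart></name>"
    "<originInfo><dateIssued>1999</dateIssued></originInfo></mods>"
    "</modsCollection>"
)
RECORDS = COLLECTION.findall("m:mods", NS)


def values(record, criterion):
    """Return every text value the criterion's XPath selects in a record."""
    return [el.text or "" for el in record.findall(CRITERIA[criterion], NS)]


def search(records, criterion, term):
    """Keep records where any selected value contains term (case-insensitive)."""
    term = term.lower()
    return [
        r for r in records
        if any(term in v.lower() for v in values(r, criterion))
    ]


def sort_by(records, criterion):
    """Sort records by the first selected value; records lacking it sort first."""
    return sorted(records, key=lambda r: (values(r, criterion) or [""])[0].lower())


if __name__ == "__main__":
    for record in sort_by(RECORDS, "title"):
        print(values(record, "title"), values(record, "year"))
```

Because each criterion is just a path expression, the same two functions serve any element in the schema.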
Results
The bibliographic tool that we built provides an intelligent
filter between the academic’s concept of bibliography and
the cataloger’s concept of bibliographic record. We adopted
the fundamental representation of a bibliographic entry, the
MODS structure, from the library community because we
want to use their tools and their modes of access. At the same
time, we don’t expect scholars and their students to interact
directly either with MODS as an XML structure or MODS as
a cataloging system, so we mediate it through a RIS-inspired
set of categories that inform the user interface. The choice of
an XML structure also benefits the programmer, as this makes
it very easy to integrate Biblio into an existing web application
or framework. The resulting bibliographic utility has an easily
configurable, easy to use data entry front end, and provides
a generic, standards-based, reconfigurable data store for
providing bibliographic information to digital projects.
Bibliography
[BibTeX] BibTeX. http://www.bibtex.org/
[Endnote] EndNote. http://www.endnote.com
[Hoenicka2007a] Marcus Hoenicka. “Deconstructing RIS
(part I)” (Blog entry).
http://www.mhoenicka.de/system-cgi/blog/index.php?itemid=515. Mar. 12, 2007.
[Hoenicka2007b] Marcus Hoenicka. “Deconstructing RIS
(part II)” (Blog entry).
http://www.mhoenicka.de/system-cgi/blog/index.php?itemid=567. Apr. 23, 2007.
[MODS] MODS. http://www.loc.gov/standards/mods/
[Orbeon] Orbeon. http://www.orbeon.com/
[RefDB] RefDB. http://refdb.sourceforge.net
[RIS] RIS. http://www.refman.com/support/risformat_intro.asp
[Wikindx] WIKINDX. http://wikindx.sourceforge.net
[Zotero] Zotero. http://www.zotero.org


Conference Info

Complete

ADHO - 2008

Hosted at University of Oulu

Oulu, Finland

June 25, 2008 - June 29, 2008

135 works by 231 authors indexed

Conference website: http://www.ekl.oulu.fi/dh2008/

Series: ADHO (3)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None