Cirilo Client: An application for data curation and content preservation

poster / demo / art installation
Authorship
  1. Elisabeth Steiner

    Karl-Franzens Universität Graz (University of Graz)

Work text

GAMS: A Fedora Commons instance

Since 2003, the Centre for Information Modeling - Austrian Centre for Digital Humanities at the University of Graz (Austria) has provided an infrastructure for a variety of DH projects. After years of building insular solutions, the Centre introduced a powerful yet flexible new infrastructure called GAMS (Geisteswissenschaftliches Asset Management System, AMS for the Humanities). It is based on the Fedora Commons architecture and therefore inherits the features Fedora already provides: full OAIS compliance, strict separation of data and metadata, and predefined interfaces such as OAI-PMH. A central advantage of the Fedora architecture is its object model: an asset consists of a primary source, some metadata, and virtual representations derived from the primary source. The object is completely self-descriptive: it knows about all changes that have been made to it, its version history, its datastreams, and its assigned context objects. Finally, it also knows about all of its possible representation forms. Each object thus contains all the information necessary to store, preserve, retrieve, and view it.
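The self-descriptive object described above can be pictured with a small Java sketch. It is an illustration only, not Fedora or GAMS code: the class names `AssetObject` and `DatastreamVersion`, the datastream ID `TEI_SOURCE`, and the PIDs are hypothetical. The sketch shows one idea from the paragraph, namely that writing to a datastream adds a version instead of overwriting, so the object keeps its own history.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; these are NOT classes from Fedora or Cirilo.
final class DatastreamVersion {
    final String mimeType;
    final String content;
    final Instant created;

    DatastreamVersion(String mimeType, String content, Instant created) {
        this.mimeType = mimeType;
        this.content = content;
        this.created = created;
    }
}

final class AssetObject {
    private final String pid;
    // Each datastream ID maps to its full version history, oldest first.
    private final Map<String, List<DatastreamVersion>> datastreams = new LinkedHashMap<>();
    // PIDs of the context objects this asset has been assigned to.
    private final List<String> contexts = new ArrayList<>();

    AssetObject(String pid) {
        this.pid = pid;
    }

    String pid() {
        return pid;
    }

    // Writing never overwrites: every call appends a new version.
    void write(String datastreamId, String mimeType, String content) {
        datastreams.computeIfAbsent(datastreamId, k -> new ArrayList<>())
                   .add(new DatastreamVersion(mimeType, content, Instant.now()));
    }

    // The most recent version of a datastream.
    DatastreamVersion current(String datastreamId) {
        List<DatastreamVersion> versions = datastreams.get(datastreamId);
        if (versions == null) {
            throw new IllegalArgumentException("No such datastream: " + datastreamId);
        }
        return versions.get(versions.size() - 1);
    }

    int versionCount(String datastreamId) {
        List<DatastreamVersion> versions = datastreams.get(datastreamId);
        return versions == null ? 0 : versions.size();
    }

    void assignContext(String contextPid) {
        contexts.add(contextPid);
    }

    List<String> contexts() {
        return Collections.unmodifiableList(new ArrayList<>(contexts));
    }
}
```

Calling `write("TEI_SOURCE", ...)` twice on the same object leaves two versions in its history, and `current("TEI_SOURCE")` returns the later one.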
Cirilo Client: Mass operations in Fedora made easy

Although Fedora is a powerful tool, front-end object management is not always easy, especially with regard to mass operations. The Centre has developed a tool for this use case that complements Fedora’s built-in Admin Client. Cirilo is a Java application developed for data curation and content preservation in Fedora-based repository systems. Content preservation and data curation in our sense include object creation and management, versioning, normalization and standards, and the choice of data formats.
Cirilo makes use of Fedora’s management API (API-M). It offers functions that are particularly suited to mass operations on Fedora repository objects, such as ingest or replacement processes. With Cirilo, ingest processes can be performed from the file system, from an eXist database, or from an Excel spreadsheet. During ingest, metadata is automatically extracted from the source document and written to the newly created object (for instance in DC format).
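To make the metadata extraction step more concrete, here is a minimal Java sketch that reads a TEI document and derives two Dublin Core fields from it. The text does not describe Cirilo's actual extraction rules, so the mapping used here (first TEI `title` to `dc:title`, first TEI `author` to `dc:creator`) and the class name `TeiToDublinCore` are assumptions made for illustration. Only standard JDK XML classes are used.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; the mapping rules are assumed, not taken from Cirilo.
final class TeiToDublinCore {
    private static final String TEI_NS = "http://www.tei-c.org/ns/1.0";

    // Parses a TEI document and returns the DC fields that could be derived.
    static Map<String, String> extract(String teiXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations to avoid external entity attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(teiXml)));

            Map<String, String> dc = new LinkedHashMap<>();
            putFirst(dc, "dc:title", doc, "title");
            putFirst(dc, "dc:creator", doc, "author");
            return dc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read TEI source", e);
        }
    }

    // Copies the text of the first matching TEI element (document order), if any.
    private static void putFirst(Map<String, String> dc, String dcName,
                                 Document doc, String teiName) {
        NodeList nodes = doc.getElementsByTagNameNS(TEI_NS, teiName);
        if (nodes.getLength() > 0) {
            dc.put(dcName, nodes.item(0).getTextContent().trim());
        }
    }
}
```

In a real ingest, the resulting fields would then be written to the new object's DC datastream through the repository's management interface; that call is deliberately left out here because its details are not given in the text.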
The client operates on a collection of predefined content models which can be used without further adjustments for standard workflow scenarios such as the management of collections of TEI objects. The content models, which are based on the Fedora object model, are class definitions: on the one hand, they define the (MIME) types of the contained datastreams; on the other hand, they designate the dissemination methods operating on these datastreams. Every object in the repository is an instance of one of these class definitions. The advantage of this concept is that even very complex data sources and workflows can be handled easily.
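The idea of a content model as a class definition can be sketched in Java as follows. This is an illustrative assumption rather than the way Fedora or Cirilo represent content models internally: the class `ContentModel`, the datastream IDs, and the disseminator name are hypothetical. The sketch shows the two roles named above: declaring the expected MIME type of each datastream and listing the dissemination methods, and checking whether an object conforms to that definition.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a content model as a class definition.
final class ContentModel {
    final String name;
    // Required datastream ID -> expected MIME type.
    private final Map<String, String> datastreamMimeTypes = new LinkedHashMap<>();
    // Names of the dissemination methods offered for instances of this model.
    private final List<String> disseminators = new ArrayList<>();

    ContentModel(String name) {
        this.name = name;
    }

    ContentModel requires(String datastreamId, String mimeType) {
        datastreamMimeTypes.put(datastreamId, mimeType);
        return this;
    }

    ContentModel disseminates(String methodName) {
        disseminators.add(methodName);
        return this;
    }

    List<String> disseminators() {
        return new ArrayList<>(disseminators);
    }

    // Checks an object's datastreams (ID -> MIME type) against the model.
    // An empty result means the object is a valid instance.
    List<String> validate(Map<String, String> objectDatastreams) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> required : datastreamMimeTypes.entrySet()) {
            String actual = objectDatastreams.get(required.getKey());
            if (actual == null) {
                problems.add("missing datastream " + required.getKey());
            } else if (!actual.equals(required.getValue())) {
                problems.add("datastream " + required.getKey() + " has type " + actual
                        + ", expected " + required.getValue());
            }
        }
        return problems;
    }
}
```

A TEI-like model could then be declared as `new ContentModel("TEI").requires("TEI_SOURCE", "application/tei+xml").disseminates("getHTML")`, and every candidate object checked with `validate` before ingest.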
Currently, the client offers various content models for specific purposes, with special emphasis on the TEI model. The TEI ingest processes can be flexibly customized: during ingest, policies for the extraction of semantic information can be applied, referenced images can be uploaded simultaneously, and ontology concepts can be resolved. A new content model currently in development creates the appropriate ontology objects, especially SKOS objects. A designated query object makes it possible to pose parameterized queries to the Mulgara triplestore. With the help of these ontology and query objects, dynamic indices can be created. A container object is available for creating collections, which makes it easy to organize resources. Finally, there are some models optimized for specific primary sources such as METS/MODS, HTML, PDF, BibTeX, or external resources accessible via a URL. A content model for linguistic resources is in development (in cooperation with ICLTT, Vienna). We are currently testing how controlled vocabularies and thesauri (for instance geonames.org) can be sensibly integrated into the system.
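A query object that poses queries with parameters can be pictured as a stored query template whose placeholders are filled in at request time. The text does not say how GAMS query objects are defined, so the following Java sketch rests on assumptions: the class `QueryTemplate`, the `$name` placeholder syntax, and the example query are all hypothetical. Each parameter value is inserted as an escaped string literal so that user input cannot change the structure of the query.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a parameterized triplestore query.
final class QueryTemplate {
    // Placeholders look like $name (letters, digits, underscore).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\w+)");

    private final String template;

    QueryTemplate(String template) {
        this.template = template;
    }

    // Replaces every placeholder with its value as a quoted string literal.
    String bind(Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = params.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unbound parameter: " + matcher.group(1));
            }
            String literal = "\"" + escapeLiteral(value) + "\"";
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as regex replacement syntax.
            matcher.appendReplacement(out, Matcher.quoteReplacement(literal));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    // Escapes backslashes and double quotes inside a string literal.
    private static String escapeLiteral(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

A template such as `SELECT ?s WHERE { ?s <http://purl.org/dc/elements/1.1/subject> $subject }` could be stored once and bound with a different `subject` value for each request, which is one way dynamic indices of the kind mentioned above might be produced.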
The user can assign numerous virtual representations via the client. The METS/MODS object is designed to be viewed in the DFG-Viewer. TEI objects can be used directly as input for Voyant Tools or the Versioning Machine. The members of a context object can be projected onto a map using Google Maps. In principle, any web-based service can be integrated into the infrastructure. User- and project-specific stylesheets are also frequently employed.
The Cirilo Client will be made available as an open source software project, including documentation, as a contribution of the Centre for Information Modeling - Austrian Centre for Digital Humanities to DARIAH-AT in 2014.
References

DARIAH-EU: www.dariah.eu [2013-10-28]
DFG-Viewer: dfg-viewer.de/ueber-das-projekt [2013-10-28]
Fedora Commons: www.fedora-commons.org [2013-10-28]
Google Maps: maps.google.at [2013-10-28]
Geisteswissenschaftliches Asset Management System, AMS for the Humanities: gams.uni-graz.at [2013-10-28]
Carl Lagoze, Sandy Payette, Edwin Shin, Chris Wilper (2005). Fedora: An Architecture for Complex Objects and their Relationships. arxiv.org/ftp/cs/papers/0501/0501012.pdf [2013-10-28]
Versioning Machine: v-machine.org [2013-10-28]
Voyant Tools: voyant-tools.org [2013-10-28]

Conference Info

Complete

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO