Pcube - Policarpo Petrocchi Project: The Architecture of a (Semantic) Digital Archive

paper
Authorship
  1. Federico Meschini

    Tuscia University

Work text

The aim of Pcube (the Policarpo Petrocchi Project) is to create a digital archive of the cultural and intellectual production of Policarpo Petrocchi. The archive will preserve and disseminate digital objects and their associated metadata using current technologies and metadata standards, and will implement features and functionality associated with the Semantic Web. Policarpo Petrocchi was an important intellectual and political figure during the second half of the nineteenth century. He is best known as the author of the “Novo dizionario universale della lingua italiana” (New Universal Dictionary of the Italian Language), which was widely used in the early twentieth century, but he also wrote many literary works, reviews and translations, such as L’Assommuàr, his translation of Émile Zola’s novel, and founded the society “Italia e Lavoro”.

Petrocchi’s cultural production has left an “information legacy” of extremely heterogeneous materials, which can be divided into three main categories. The first is archival material, including letters, press clippings and manuscripts. The second is published works, including literary works, educational and grammar books, novels and translations. The third is iconographic material, consisting of loose photographs and a family photo album.
Moreover, none of these materials has been digitized until now, and the related metadata have been catalogued in two separate databases. The first, for the archival description, was built with WinISIS <http://www.unesco.org/webworld/isis/isis.htm>, using a structure designed to be ISAD(G) compatible; the second, for the bibliographic records, was created with EasyCat <http://www.biblionauta.it/biblionauta/easycat.php>, using the ISBD(G) <http://www.ifla.org/VII/s13/pubs/isbdg.htm> format, with the possibility of UNIMARC <http://www.ifla.org/VI/3/p1996-1/sec-uni.htm> export. Although widely used in library and archival contexts, these two databases are far from optimal starting points: although an OPAC can be built on them using plug-ins and extensions, they cannot be directly integrated. This runs contrary to the goal of having a homogeneous and open system as the infrastructure for this digital archive, which must integrate electronic texts, digital images and metadata about both the analog and the digital objects. Currently, the best model for this kind of infrastructure is the Open Archival Information System (OAIS) reference model, with its concept of different types of “Information Packages” and, in particular, the
Archival Information Package (AIP).

With the OAIS model in mind, the architecture of the Policarpo Petrocchi Digital Archive has been designed with three levels. The first level is the data layer, perhaps the most delicate one, because it is where the roots of any interoperability between the different data sets actually lie. The materials can also be classified according to the nature of the digital representation that will be derived from them, either visual or textual. All the photographs, most of the letters and manuscripts, and the most important literary works will be digitized as images, following the guidelines of the Digital Library Federation <http://www.diglib.org/>, with a high-quality format for preservation and a lower-quality format for dissemination. XML will be used to encode the content of some of the archival documents and novels, and the metadata of all the analog items. EAD (Encoded Archival Description) <http://www.loc.gov/ead> is the most suitable metadata format for the archival documents, and an EAD finding aid can link to the digital representations, whether textual or iconographic. For published articles and books, the MODS <http://www.loc.gov/standards/mods> schema, a rich bibliographic metadata format, will be used. A further issue to be discussed is the conversion of the legacy data in the databases into EAD and MODS, either automatically, using their XML export features together with some string manipulation in a programming language, or, when needed, manually. The content of the books
and the manuscripts will be encoded following the TEI Guidelines. Consistency across all these standards, and across all the related digital objects, can be achieved with the METS <http://www.loc.gov/standards/mets> schema, whose primary function is to encode the descriptive, administrative and structural metadata of the items that make up a digital object (14).

The second level is the framework layer: the software architecture that provides the basic functions of a digital library on top of the data of the first level. Over the last couple of years the number of such software packages, and their availability as open source, has steadily increased (15). Nevertheless, the characteristics of the Policarpo Petrocchi Digital Archive call for a high level of customization, and the integrated use of the Apache Cocoon framework <http://cocoon.apache.org> with the eXist XML database <http://exist.sourceforge.net> is probably the most suitable choice. With these two programs, the XML technologies XQuery and XSLT offer many possibilities, from textual and metadata searches to electronic editions with multiple output formats and combined text/image visualization.

The final level is the semantic layer, which aims to create a network of
relationships among the items in the archive, and possibly external resources, and thereby to offer advanced navigation functions to the archive’s users. The ontology is modeled on the CIDOC Conceptual Reference Model (CRM) <http://cidoc.ics.forth.gr>, which “provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation”. This abstract model must be expressed in a concrete syntax, and for this role the ISO Topic Maps standard <http://www.topicmaps.org>, with its XTM serialization, has been chosen. The reason for this choice is the growing adoption of Topic Maps in the Digital Humanities community, compared, for example, with RDF/OWL.
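
The mapping from the CIDOC CRM to Topic Maps can be made more concrete with a small sketch. The Python code below, using only the standard library's `xml.etree.ElementTree`, builds a tiny XTM 1.0 document under one possible mapping, which is an assumption and not necessarily the one adopted in Pcube: CRM classes (E21 Person, E65 Creation, E73 Information Object) become topic types, and each CRM property (P14 carried out by, P94 has created) becomes a binary association type whose members play two hypothetical role types, `domain` and `range`. All topic identifiers are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# XTM 1.0 namespaces; XTM is registered as the default namespace on output
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Qualify a local name with the XTM 1.0 namespace."""
    return f"{{{XTM_NS}}}{tag}"


def topic_ref(parent: ET.Element, target_id: str) -> None:
    """Append <topicRef xlink:href="#target_id"/> to parent."""
    ET.SubElement(parent, q("topicRef"), {f"{{{XLINK_NS}}}href": f"#{target_id}"})


def add_topic(tm: ET.Element, topic_id: str, name: str,
              type_id: Optional[str] = None) -> ET.Element:
    """Add a topic with a base name and, optionally, a topic type."""
    topic = ET.SubElement(tm, q("topic"), {"id": topic_id})
    if type_id is not None:
        topic_ref(ET.SubElement(topic, q("instanceOf")), type_id)
    base = ET.SubElement(topic, q("baseName"))
    ET.SubElement(base, q("baseNameString")).text = name
    return topic


def add_association(tm: ET.Element, type_id: str,
                    members: List[Tuple[str, str]]) -> ET.Element:
    """Add an association; members are (role topic id, player topic id) pairs."""
    assoc = ET.SubElement(tm, q("association"))
    topic_ref(ET.SubElement(assoc, q("instanceOf")), type_id)
    for role_id, player_id in members:
        member = ET.SubElement(assoc, q("member"))
        topic_ref(ET.SubElement(member, q("roleSpec")), role_id)
        topic_ref(member, player_id)
    return assoc


def build_topic_map() -> ET.Element:
    tm = ET.Element(q("topicMap"))
    # Hypothetical mapping: CRM classes and properties declared as topics,
    # plus two generic role types for the two ends of each property
    for tid, label in [("E21_Person", "Person"),
                       ("E65_Creation", "Creation"),
                       ("E73_Information_Object", "Information Object"),
                       ("P14_carried_out_by", "carried out by"),
                       ("P94_has_created", "has created"),
                       ("domain", "domain"), ("range", "range")]:
        add_topic(tm, tid, label)
    # Archive items as typed topics (illustrative identifiers)
    add_topic(tm, "petrocchi", "Policarpo Petrocchi", "E21_Person")
    add_topic(tm, "novo_dizionario",
              "Novo dizionario universale della lingua italiana",
              "E73_Information_Object")
    add_topic(tm, "creation_dizionario", "Creation of the dictionary", "E65_Creation")
    # Each CRM property becomes a binary association between domain and range
    add_association(tm, "P14_carried_out_by",
                    [("domain", "creation_dizionario"), ("range", "petrocchi")])
    add_association(tm, "P94_has_created",
                    [("domain", "creation_dizionario"), ("range", "novo_dizionario")])
    return tm


if __name__ == "__main__":
    print(ET.tostring(build_topic_map(), encoding="unicode"))
```

Modeling the creation of the dictionary as a topic of its own, instead of linking Petrocchi directly to the work, keeps the map close to the event-centred structure of the CRM, at the cost of one extra topic per event.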

Starting from the particular case of the Pcube project, this paper will identify and outline general guidelines and best strategies for creating a digital archive from very different starting materials, so that these strategies can be applied to several other projects currently facing the same challenge of crossing the border from the old models of cultural heritage preservation to the new paradigms of the digital library.
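
The data-layer workflow described earlier, in which legacy catalogue records are converted to MODS and then bound together with their preservation and dissemination images in a METS document, can be sketched with Python's standard `xml.etree.ElementTree`. The legacy field names `title_statement` and `author_heading`, the file paths and all identifiers are hypothetical, since the actual export formats of the WinISIS and EasyCat databases are not described here; the administrative metadata section (`amdSec`) is left out for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespaces of METS, MODS v3 and XLink, registered with conventional prefixes
METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS_NS), ("mods", MODS_NS), ("xlink", XLINK_NS)):
    ET.register_namespace(prefix, uri)


def mets(tag: str) -> str:
    return f"{{{METS_NS}}}{tag}"


def mods(tag: str) -> str:
    return f"{{{MODS_NS}}}{tag}"


def legacy_to_mods(record: Dict[str, str]) -> ET.Element:
    """Map a flat legacy record (hypothetical field names) onto MODS."""
    # String manipulation: split an ISBD-style "Title / responsibility" statement
    title, _, responsibility = record["title_statement"].partition(" / ")
    root = ET.Element(mods("mods"))
    title_info = ET.SubElement(root, mods("titleInfo"))
    ET.SubElement(title_info, mods("title")).text = title.strip()
    # The inverted heading "Surname, Forename" is kept as a single namePart
    name = ET.SubElement(root, mods("name"), {"type": "personal"})
    ET.SubElement(name, mods("namePart")).text = record["author_heading"].strip()
    role = ET.SubElement(name, mods("role"))
    ET.SubElement(role, mods("roleTerm"),
                  {"type": "text", "authority": "marcrelator"}).text = "author"
    if responsibility:
        ET.SubElement(root, mods("note"),
                      {"type": "statement of responsibility"}).text = responsibility.strip()
    return root


def build_mets(record: Dict[str, str], page_ids: List[str]) -> ET.Element:
    """Wrap the MODS record and two image versions per page in one METS object."""
    doc = ET.Element(mets("mets"))
    # Descriptive metadata: the converted MODS record, embedded inline
    dmd = ET.SubElement(doc, mets("dmdSec"), {"ID": "dmd1"})
    wrap = ET.SubElement(dmd, mets("mdWrap"), {"MDTYPE": "MODS"})
    ET.SubElement(wrap, mets("xmlData")).append(legacy_to_mods(record))
    # One file group per use: preservation masters and dissemination copies
    uses = {"preservation": ("tif", "image/tiff"),
            "dissemination": ("jpg", "image/jpeg")}
    file_sec = ET.SubElement(doc, mets("fileSec"))
    for use, (ext, mimetype) in uses.items():
        grp = ET.SubElement(file_sec, mets("fileGrp"), {"USE": use})
        for pid in page_ids:
            f = ET.SubElement(grp, mets("file"),
                              {"ID": f"{pid}_{use}", "MIMETYPE": mimetype})
            ET.SubElement(f, mets("FLocat"),
                          {"LOCTYPE": "URL",
                           f"{{{XLINK_NS}}}href": f"images/{use}/{pid}.{ext}"})
    # Structural map: the book as a whole, then one div per page
    struct = ET.SubElement(doc, mets("structMap"))
    book = ET.SubElement(struct, mets("div"), {"TYPE": "book", "DMDID": "dmd1"})
    for order, pid in enumerate(page_ids, start=1):
        page = ET.SubElement(book, mets("div"), {"TYPE": "page", "ORDER": str(order)})
        for use in uses:
            ET.SubElement(page, mets("fptr"), {"FILEID": f"{pid}_{use}"})
    return doc


if __name__ == "__main__":
    # Hypothetical record, as it might look after an XML export from the catalogue
    sample = {
        "title_statement": "Novo dizionario universale della lingua italiana / Policarpo Petrocchi",
        "author_heading": "Petrocchi, Policarpo",
    }
    print(ET.tostring(build_mets(sample, ["p0001", "p0002"]), encoding="unicode"))
```

Embedding the MODS record inline with `mdWrap` keeps each digital object self-describing; METS also offers `mdRef`, which points to a metadata record stored elsewhere, an option that may suit records maintained in a separate XML database.
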
References
Aa. Vv. (2000). ISAD(G): General International Standard Archival Description. Ottawa, Canada: International Council on Archives.
Aa. Vv. (2002). Reference Model for an Open Archival Information System (OAIS). Washington, USA: Consultative Committee for Space Data Systems.
McDonough, J. (2002). Encoding Digital Objects with METS. In Tennant, R. (ed.), XML in Libraries. Neal-Schuman Publishers. pp. 167-180.
Meschini, F. (2006). TMS – TEI Management Systems. In Aa. Vv., ODOK ’05. VÖB.
Sperberg-McQueen, C. M. and Burnard, L. (eds) (2002). TEI P4: Guidelines for Electronic Text Encoding and Interchange. Text Encoding Initiative Consortium.
Tuohy, C. (2005). New Zealand Electronic Text Centre: Using XML Topic Map to present TEI. TEI Members Meeting 2005 presentation. Bulgaria.
Walsh, J. (2005). TM4DH (Topic Maps for Digital Humanities): Examples and an Open Source Toolkit. In ALLC/ACH 2005 Proceedings. Victoria, Canada.


Conference Info


ACH/ALLC / ACH/ICCH / ADHO / ALLC/EADH - 2006

Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)

Paris, France

July 5, 2006 - July 9, 2006

151 works by 245 authors indexed

The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002: a Steering Committee was appointed at the ALLC/ACH meeting in 2004, in Gothenburg, Sweden. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the ‘official’ ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.

Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/

Series: ACH/ICCH (26), ACH/ALLC (18), ALLC/EADH (33), ADHO (1)

Organizers: ACH, ADHO, ALLC

Tags
  • Keywords: None
  • Language: English
  • Topics: None