Finding the Same Artworks from Multiple Databases in Different Languages

poster / demo / art installation
Authorship
  1. 1. Taisuke Kimura

    College of Information Science and Engineering - Ritsumeikan University

  2. 2. Biligsaikhan Batjargal

    Kinugasa Research Organization - Ritsumeikan University

  3. 3. Fuminori Kimura

    Kinugasa Research Organization - Ritsumeikan University

  4. 4. Akira Maeda

    College of Information Science and Engineering - Ritsumeikan University

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


Finding the Same Artworks from Multiple Databases in Different Languages

Kimura
Taisuke

College of Information Science and Engineering, Ritsumeikan University, Japan
is0013hh@ed.ritsumei.ac.jp

Batjargal
Biligsaikhan

Kinugasa Research Organization, Ritsumeikan University, Japan
biligsaikhan@gmail.com

Kimura
Fuminori

Kinugasa Research Organization, Ritsumeikan University, Japan
fkimura@is.ritsumei.ac.jp

Maeda
Akira

College of Information Science and Engineering, Ritsumeikan University, Japan
amaeda@is.ritsumei.ac.jp

2014-12-19T13:50:00Z

Paul Arthur, University of Western Sidney

Locked Bag 1797
Penrith NSW 2751
Australia
Paul Arthur

Converted from a Word document

DHConvalidator

Paper

Poster

multilingual record linkage
Japanese arts
Ukiyo-e
GLAM

databases & dbms
metadata
multilingual / multicultural approaches
GLAM: galleries
libraries
archives
museums
English

This paper discusses a method for identifying the same artworks across multiple databases by using textual metadata written in different languages. As more and more libraries, museums, galleries, and archives are making their collections available online, it is essential to develop methods for accessing these vast and valuable collections of cultural heritage easily and thoroughly.
We aim to identify the same artworks whose metadata are in different languages since most existing methods, such as
record linkage (Fellegi and Sunter, 1969) and
duplicate detection (Bilenko and Mooney, 2003; Elmagarmid et al., 2007) use mainly one language.

We have developed a method for identifying the same Ukiyo-e prints in databases in English and Japanese (Batjargal et al., 2014). This method is particularly useful since the Japanese traditional woodblock printing—
Ukiyo-e—involves engraving, and many copies or variants of each particular work were made from the same woodblock. Most of these copies and variants were scattered around the world in the 19th century and are now stored in museums and galleries in many countries. Most of the metadata in these databases are available only in English or in the native language of each country. Titles are mostly written either as the transliteration of the original Japanese title or a translation into another language. Table 1 shows examples of metadata in databases that contain Ukiyo-e in their collections and the languages in which the titles are written.

One effective approach for identifying the same artworks in multiple image databases is to use image similarity calculations. Ukiyo-e.org
1 (Resig, 2013) provides a highly efficient method for identifying the same Ukiyo-e prints. The method exclusively uses image similarities rather than textual data to identify Ukiyo-e prints. Our textual-metadata-based method and image-similarity-based approach both have advantages and disadvantages. One of the advantages of our method is not having to harvest all the data from the databases beforehand. Furthermore, artifacts other than prints might not be suitable for identification using two-dimensional image similarities. It is better to combine both methods to obtain the most accurate results.

Original Ukiyo-e print

Title
Database

神奈川沖浪裏
(original title in Japanese)

神奈川沖浪裏
(original title in Japanese)

The Edo-Tokyo Museum

Kanagawa oki nami ura
(transliteration)
The great wave off shore of Kanagawa
(in English)

The Library of Congress

The great wave off Kanagawa
(in English)

National Gallery of Victoria

• La grande vague
• Sous la grande vague au large de Kanagawa
• Sous la vague au large de Kanagawa
(in French)

French Photo Agency

De grote golf bij Kanagawa
(in Dutch)

Netherlands State Museum

De grote golf bij KanegawaFugaku Sanjrokkei
(in Dutch)

Centre Céramique

• Große Woge
• Der Fuji hinter der großen Woge
• Die Welle
(in German)

Bildarchiv Foto Marburg

Proposed Method

We extended our previous method that identifies the same artworks by comparing transliterated Japanese titles with their English translations, both of which are written in the same script, i.e., Latin, by utilizing intermediate representations such as transcriptions or transliterations (Batjargal et al., 2014). In this paper, we aim to identify the same artworks by comparing original Japanese titles with their translations in other European languages. Our method utilizes proper nouns in a title as key elements for matching, and it uses translations of other words to further improve the matching accuracy. If a given word in the title is not a proper noun, we perform a literal translation using bilingual dictionaries—that is, all the words excluding the proper nouns are translated. The similarity degree is based on the weighting of matching words among titles in different languages. The similarity degree increases as matching words are found in the titles. In addition to the weighting of matching words, the partial string matching score is considered in the similarity calculation.
Figure 1 illustrates how the same artworks are identified from databases in different languages. First, the metadata elements are (1) translated as explained above. Then, the artworks are (2) filtered on the basis of the artist name by using our previous method that identifies uses artists’ various notations and aliases. Our previous method is capable of providing multilingual integrated access to diverse Ukiyo-e databases by generating links to miscellaneous Linked Data resources dynamically from the returned results using authority data. It allows users to access additional data about a certain artwork in multiple databases regardless of the languages and formats of each database (Batjargal et al., 2013). The proposed method then (3) calculates the similarity degree, and the artworks that have higher scores are (4) treated as the same as the given artwork just after obtaining the artworks from multiple databases.

Although this method has been tested only on Ukiyo-e databases, it does not depend on a particular type of art. Since this method requires only bilingual dictionaries as the language resource, it can be applied to other languages easily.

Experimental Evaluation
We have conducted preliminary experiments to evaluate the proposed method in finding the same artworks in multiple databases in different languages. In the experiments, 45 Japanese titles of the artist Katsushika Hokusai’s Ukiyo-e prints in the Edo-Tokyo Museum were checked against the 437 English titles in the Metropolitan Museum of Art, where each title has at least one identical record. The proposed method has achieved 86.7% accuracy in finding the same artworks between Japanese and English by successfully identifying 39 prints out of 45 (see Table 2). Further evaluations against other languages are under examination.

Condition

Success rate

Accuracy

Finding the same artworks between Japanese and English from multiple databases
39 of 45
86.7%

Table
. A preliminary experimental result of the proposed method

Summary
We developed a method for identifying the same artworks across multiple databases using textual metadata written in different languages. The proposed method is useful for both humanities researchers and database administrators. It provides researchers with easy and efficient ways of finding the same artworks in other databases regardless of the language. For database administrators, aggregating multiple metadata of the same artwork from various databases makes it possible to correct and/or enrich the existing metadata records to improve the quality of their database. The proposed method can be used to link the same or similar artworks across different databases, and it will contribute to enriching the Linked Open Data in the field of humanities (Batjargal et al., 2013).
Our future work includes extending the proposed method to other humanities disciplines and conducting evaluations against other languages.
Note
1. http://ukiyo-e.org/.

Bibliography

Batjargal, B., Kuyama, T., Kimura, F. and Maeda, A. (2013). Linked Data Driven Multilingual Access to Diverse Japanese Ukiyo-e Databases by Generating Links Dynamically.
Literary and Linguistic Computing, 28(4) (December): 522–30.

Batjargal, B., Kuyama, T., Kimura, F. and Maeda, A. (2014). Identifying the Same Records across Multiple Ukiyo-e Image Databases Using Textual Data in Different Languages.
Proceedings of Digital Libraries 2014: ACM/IEEE Joint Conference on Digital Libraries (
JCDL 2014)
and International Conference on Theory and Practice of Digital Libraries (
TPDL 2014), London, 8–12 September 2014.

Bilenko, M. and Mooney, R. (2003). On Evaluation and Training-Set Construction for Duplicate Detection. In
Proceedings of the KDD-03 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, Washington, DC, August 2003, pp. 7–12.

Elmagarmid, A., Ipeirotis, P. and Verykios, V. (2007). Duplicate Record Detection: A Survey.
IEEE Transactions on Knowledge and Data Engineering,
19(1) (January): 1–16.

Fellegi, I. P. and Sunter, A. B. (1969). A Theory for Record Linkage.
Journal of the American Statistical Association,
64(328) (December): 1183.

Resig, J. (2013). Aggregating and Analyzing Digitized Japanese Woodblock Prints.
3rd Annual Conference of the Japanese Association for Digital Humanities, Ritsumeikan University, Kyoto, Japan, 19–21 September 2013.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

Complete

ADHO - 2015
"Global Digital Humanities"

Hosted at Western Sydney University

Sydney, Australia

June 29, 2015 - July 3, 2015

280 works by 609 authors indexed

Series: ADHO (10)

Organizers: ADHO