Minna de Honkoku: Learning-driven Crowdsourced Transcription of 
Pre-modern Japanese Earthquake Records

paper, specified "long paper"
Authorship
  1. Yuta Hashimoto

    National Museum of Japanese History

  2. Yasuyuki Kano

    Kyoto University

  3. Ichiro Nakanishi

    Kyoto University

  4. Junzo Ohmura

    Bukkyo University

  5. Yoko Odagi

    Kyoto University

  6. Kentaro Hattori

    Kyoto University

  7. Tama Amano

    Kyoto University

  8. Tomoyo Kuba

    Kyoto University

  9. Haruno Sakai

    Tokyo Metropolitan Library

Work text


Introduction
In the last decade, crowdsourcing has become a major technique for transcribing large volumes of historical manuscripts. The volunteers of Transcribe Bentham (http://www.transcribe-bentham.da.ulcc.ac.uk/) have transcribed more than 19,000 pages of manuscripts written by Jeremy Bentham (Causer and Wallace 2012). More than 480,000 pages of weather observations from 19th-century US Government Arctic logbooks were transcribed by 4,730 people through the Old Weather project (https://www.oldweather.org/) (Eveleigh et al. 2013).

However, managing a crowdsourcing project remains a big challenge for humanities scholars. The following practical difficulties are encountered:

The need to draw public attention to the project successfully.
The need to encourage participants’ long-term involvement.
The difficulty of the tasks themselves: work that calls for crowdsourcing in humanities studies (e.g. transcribing ancient handwritten manuscripts) is often too hard for untrained participants.

In the case of Japanese studies, the last difficulty is particularly crucial: owing to the drastic change in the writing system that occurred at the end of the 19th century, 99% of modern Japanese people are unable to read kuzushiji, the classical calligraphic renderings of Japanese characters that were once common in both publishing and handwriting. As a result, crowdsourcing had not previously been applied successfully to pre-modern Japanese materials.

However, humanities scholars can use education to draw the attention of a large number of people, promote their long-term participation, and train them to tackle difficult tasks. The fundamental idea of this paper is to develop a crowdsourcing system embedded in a collaborative learning environment, one that enables learners to carry out crowdsourced tasks with their peers as part of their learning. Minna de Honkoku (https://honkoku.org/) is a crowdsourced transcription project for pre-modern Japanese earthquake records, developed on the basis of this idea by members of the Historical Earthquake Study Group (HESG) at Kyoto University. The name translates literally as “Transcribe with everyone”; an English video tutorial is available at https://www.youtube.com/watch?v=iX5xN4vZeao. In this paper, we briefly describe the aim, materials, approach, and results of Minna de Honkoku.

The background and aim of the project
HESG is a joint group of seismologists and historians, including the authors at Kyoto University, who have been studying pre-modern earthquake records for seismic research and disaster prevention. Because instrumental observation of earthquakes in Japan began only after the end of the 19th century, studying earlier earthquakes requires transcribing written records. Japanese seismologists have therefore developed extensive collaborations with historians and archivists.
However, the number of records to be transcribed is vast and cannot be handled by a small group of scholars. This prompted the members of HESG to consider crowdsourcing the transcription of historical earthquake records.
We set the first goal of our project, Minna de Honkoku, as transcribing all 114 books of the Ishimoto Collection, a set of historical earthquake records collected by the seismologist Mishio Ishimoto (1893-1940) and digitized by the Earthquake Research Institute (ERI), University of Tokyo. The books range from 14 to 268 pages in length, for a total of 6,386 pages. Each digital image in the collection contains two pages, as shown in Figure 1.

Figure 1: An example of two digitized pages in a book from the Ishimoto Collection

The challenge and our approach
The biggest challenge of our project is to crowdsource the reading of kuzushiji, which is illegible to most modern Japanese people other than trained experts. Our approach to this challenge is to design our crowdsourcing system as an online learning environment in which participants learn kuzushiji by transcribing the earthquake records collaboratively.

More specifically, Minna de Honkoku integrates crowdsourcing with online learning in the following two ways:

Collaboration with a mobile learning app: Minna de Honkoku collaborates with KuLA (Kuzushiji Learning App; Android version: https://play.google.com/store/apps/details?id=yuta.hashimoto.kula, iOS version: https://itunes.apple.com/jp/app/id1076911000), a mobile learning app for reading kuzushiji that was developed by one of the authors (Hashimoto et al. 2017) and has been downloaded 85,000 times since its release in 2016 (see Figure 2). After completing a set of basic lessons in reading kuzushiji, KuLA users are invited to Minna de Honkoku as an opportunity for more practical training in transcribing actual materials from pre-modern Japan. They can thus begin participating in the project as a continuation of their learning.

Collaborative learning through distributed proofreading: Transcribing kuzushiji correctly is quite difficult, and beginners usually make many mistakes. For quality control, Minna de Honkoku uses the “distributed proofreading” adopted by Project Gutenberg (Newby and Franks 2003), but with an educational purpose. When you finish transcribing an image from a book in the transcription editor of Minna de Honkoku (see Figure 3), your transcription is shared with and reviewed by other participants on a timeline that shows user activities in real time (see Figure 4). When another participant corrects your transcription, you receive a notification with feedback showing the mistakes you made and the corrections (see Figures 5 and 6).

Figure 2: Screenshots of KuLA

Figure 3: Transcription editor of Minna de Honkoku

Figure 4: The timeline view of user activities

Figure 5: The notification panel

Figure 6: Corrections made by another participant (added texts are colored in green and deleted texts in red)
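
The correction feedback described above (added text shown in green, deleted text in red) can be illustrated with a character-level diff. The following is a minimal sketch only, assuming Python's standard difflib module. The function names correction_feedback and render, the bracket markers standing in for colors, and the sample strings are hypothetical; the paper does not describe how Minna de Honkoku actually computes its diffs.

```python
from difflib import SequenceMatcher


def correction_feedback(original: str, corrected: str):
    """Compare a transcription with its corrected version, character by character.

    Returns a list of (tag, text) pairs where tag is "keep" (unchanged),
    "delete" (present only in the original) or "add" (present only in the
    corrected version).
    """
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # characters in long sequences.
    matcher = SequenceMatcher(None, original, corrected, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", original[i1:i2]))
            continue
        # "replace", "delete" and "insert" all reduce to an optional
        # deletion followed by an optional addition.
        if i1 < i2:
            ops.append(("delete", original[i1:i2]))
        if j1 < j2:
            ops.append(("add", corrected[j1:j2]))
    return ops


def render(ops):
    """Render feedback as plain text: [-deleted-] and {+added+} (hypothetical markers)."""
    marks = {"keep": "{}", "delete": "[-{}-]", "add": "{{+{}+}}"}
    return "".join(marks[tag].format(text) for tag, text in ops)


# Hypothetical sample: a beginner's transcription and a reviewer's correction.
ops = correction_feedback("大地震あり", "大地震有り")
print(render(ops))
```

In a web interface the "delete" and "add" segments would be colored rather than bracketed, but the underlying comparison could be the same.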

The results
The website of Minna de Honkoku was launched on January 10, 2017. The transcription of the 114 books (6,386 pages) of the Ishimoto Collection was completed on May 31, 2017, so our initial goal was reached less than five months after the project launch. We then extended our goal by adding another 223 books held at ERI. As of November 2017, volunteers had transcribed 271 of the 337 books (9,254 of 9,716 pages), including those from the Ishimoto Collection, for a total of 3.12 million characters.
A total of 3,457 people have registered an account, and 285 of them have transcribed at least one character on the website. Although we were unable to involve all registered users in the transcription process, a small number of regular volunteers have contributed eagerly to the project: 35 users have each transcribed more than 10,000 characters, and 6 of them more than 100,000.
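
To put these figures in proportion, the short calculation below derives completion and participation rates from the numbers reported above. It is an illustrative sketch: the variable names are our own, and the character count is the rounded "3.12 million" given in the text, so the per-page average is approximate.

```python
# Figures as reported in the text (status as of November 2017).
books_done, books_total = 271, 337
pages_done, pages_total = 9_254, 9_716
registered_users, active_transcribers = 3_457, 285
characters_transcribed = 3_120_000  # rounded value from the text


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


summary = {
    "books_completed_pct": percent(books_done, books_total),
    "pages_completed_pct": percent(pages_done, pages_total),
    "active_user_pct": percent(active_transcribers, registered_users),
    "avg_chars_per_page": round(characters_transcribed / pages_done),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

By this calculation roughly 80% of the books and 95% of the pages had been transcribed, while only about 8% of registered users had transcribed at least one character.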

The background and motivations of the participants
To understand the backgrounds and motivations of the participants, we administered an online questionnaire via Google Forms between March 8 and May 13, 2017. We obtained responses from 64 participants. The following is a brief summary of the results:

70% of respondents (45 people) are KuLA users.
We asked the respondents to choose their reasons for participating from 12 predefined options (up to three selections were allowed). The most frequently selected reasons were as follows:

“Transcribing historical manuscripts is fun” (70%, 45 choices). 
“I can learn from other participants’ transcriptions and reviews” (50%, 32 choices).
“I can contribute to seismic research and disaster prevention through the project” (44%, 28 choices).

The results above suggest the following: (1) KuLA works effectively as an “entrance” to Minna de Honkoku, and (2) the possibilities of collaborative learning greatly motivate the participants, although the most powerful motivation is the enjoyment gained from transcribing.
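
The percentages in the summary above follow directly from the reported counts and the 64 responses. The check below is a minimal sketch; the dictionary keys are shortened labels of our own, not the wording of the questionnaire.

```python
respondents = 64

# Counts reported in the questionnaire summary (labels abbreviated by us).
counts = {
    "KuLA users": 45,
    "transcribing is fun": 45,
    "learn from other participants": 32,
    "contribute to seismic research": 28,
}

# Share of respondents for each item, rounded to a whole percent.
shares = {label: round(100 * n / respondents) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Because each respondent could select up to three reasons, the shares of the individual reasons are not expected to sum to 100%.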

Conclusion
In this paper, we have described the background, aim, approach, and results of Minna de Honkoku, a crowdsourced transcription project for historical earthquake records of pre-modern Japan. It has often been said that crowdsourced transcription of pre-modern Japanese materials is not possible because reading kuzushiji is too difficult for untrained volunteers. However, our learning-centered approach appears to have achieved considerable success. The same approach may also be useful in other countries where changes in writing systems have made historical manuscripts difficult to read.

Lastly, the desire to learn is one of the most fundamental human characteristics, and fulfilling it is one of the important roles of a scholar as a teacher. We therefore believe that considering academic crowdsourcing in the context of education will bring beneficial outcomes.

Bibliography
Causer, T., and Wallace, V. (2012). Building a volunteer community: results and findings from Transcribe Bentham. Digital Humanities Quarterly, 6(2).
Eveleigh, A., et al. (2013). “I want to be a Captain! I want to be a Captain!”: Gamification in the Old Weather Citizen Science Project. Proceedings of the First International Conference on Gameful Design, Research, and Applications. ACM.
Hashimoto, Y., et al. (2017). The Kuzushiji Project: Developing a Mobile Learning Application for Reading Early Modern Japanese Texts. Digital Humanities Quarterly, 11(1).
Newby, G. B., and Franks, C. (2003). Distributed proofreading. In Proceedings of the 2003 Joint Conference on Digital Libraries (pp. 361-363). IEEE.


Conference Info

Complete

ADHO / EHD - 2018
"Puentes/Bridges"

Hosted at El Colegio de México, Universidad Nacional Autónoma de México (UNAM) (National Autonomous University of Mexico)

Mexico City, Mexico

June 26, 2018 - June 29, 2018

340 works by 859 authors indexed

Conference website: https://dh2018.adho.org/

Series: ADHO (13), EHD (4)

Organizers: ADHO