Haiku Author Recognition

paper, specified "short paper"
Authorship
  1. Lubomir Ivanov

    Iona College

Work text

1. Introduction
Haiku is a Japanese poetic form renowned for its brevity and expressiveness. Haiku derives from renga/renku, collaborative collections of verses with a 3-line opening hokku verse in the form 5-7-5 on (roughly equivalent to syllables). Matsuo Basho made the stand-alone hokku form famous, preserving the 5-7-5 on structure. The name haiku became associated with this form of hokku during the 19th century.

Four haiku authors rise in prominence above all. Matsuo Basho (17th century) is considered the “father” of haiku. Yosa Buson (18th century) focused on haiku as an art rather than a reflection of reality. Buson combined hokku with painting, inventing haiga (verse-painting). Kobayashi Issa (18th-19th century) reinvented haiku through his depth of feeling and humanism. In the second half of the 19th century, Masaoka Shiki critically re-evaluated the art of haiku (coining the term), breaking away from the traditional 5-7-5 form and popularizing the poetic style beyond Japan.

We present a study that employs authorship attribution techniques to determine the distinctiveness of poetic styles in haiku, focusing on the poetry of Basho, Buson, Issa, and Shiki. There has been little work in the field of haiku attribution. A theoretical study of phonological complexity in haiku was presented in [1]. An approach to automatic evaluation of the quality of haiku was presented in [2]. Another work [3] deals with identifying unintended haiku in text. We approach haiku attribution as a classification problem: given a set of attributed haikus, we train classifiers to recognize the writing style of each poet, and apply an ensemble of trained models to unattributed texts.

2. Our Haiku Corpus
The first step in creating our model was obtaining a haiku corpus.
There are three approaches:
  1. Use actual haikus written in hiragana (a Japanese syllabary).
  2. Use Roman alphabet transcriptions (rōmaji) of haikus.
  3. Use English translations of haikus.

While using hiragana haikus is arguably the best option, our software lacks the capability to process hiragana text. English translations of haikus are readily available, but while research suggests that the authorial signal is stronger than the translators’ [4], we do not know whether that applies to haiku. We opted to construct a corpus of rōmaji-transcribed haikus. This was difficult since most resources are either hiragana originals or translations. We obtained 723 haikus by Basho from [5], 842 haikus by Buson from [6], and 603 haikus by Issa from [7]. Finding transcribed Shiki haikus proved extremely challenging. Even though Shiki wrote over 24000 haikus, only a handful have been transcribed into rōmaji. Failing to secure transcriptions, we downloaded the full set of 24000 hiragana haikus from [8]. We then used an online hiragana-to-rōmaji transcription tool [9] to transcribe 967 randomly selected haikus by Shiki. Since many of the extracted haikus were organized alphabetically or by topic, we wrote Python code to randomly shuffle the order of the haikus for each author. A separate program split the haikus into files of 50 haikus each.

3. Attribution Methodology
Our attribution software is based on JGAAP [10] and implements an ensemble of classifier/stylistic-feature pairs [11,12]. For this study, we limited the set of stylistic features to character 2/3/4/5-grams (CnG), word 2/3-grams (WnG), vowel-initiated words (VIW), and first-word-in-sentence (FWIS). The classifiers used were support vector machines trained with sequential minimal optimization (SMO) and multilayer perceptrons (MLP).

4. Results
We conducted several experiments in which we randomly chose one 50-haiku file for each author and removed it from the training set.
We trained the classifiers on the remaining set of haikus using leave-one-out (L1O) validation. The results of the training for three sets of experiments are presented in Table 1:

Table 1: Training Accuracy for Basho, Buson, Issa, and Shiki

Next, we tested the authorship of the 50-haiku files that were left out of the training. The results of those experiments are presented in Table 2:

Table 2: Attribution Results for Basho, Buson, Issa, and Shiki

Even with a reduced set of stylistic features, the attribution is very strong and the author identification definitive. We conducted an additional set of experiments in which we used each of the trained models to test the authorship of five haikus by the 17th-18th century haiku poet Takarai Kikaku. The models were not trained on Kikaku, so, as expected, the results were split among two or more authors (Table 3):

Table 3: Attribution Results for Kikaku

Interestingly, Kikaku was a prominent student and disciple of Basho, yet none of the models makes that association. This is most likely due to the small number of Kikaku haikus tested.

5. Conclusion and Future Work
We presented results from haiku author identification experiments, which suggest that haiku authorship can be determined even with a limited set of stylistic features from rōmaji-transcribed haikus. Our next efforts will be to experiment with a larger set of haiku authors, with English translations, and, possibly, with hiragana haikus. Among the questions we wish to answer are:
  1. What is the minimal set of haikus sufficient to identify an author?
  2. Is the authorial signal stronger than the translator’s for haiku translations?
  3. Can prosodic features be used for haiku author identification?
  4. Does the historical period affect the accuracy of attribution?
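
To illustrate the corpus preparation from Section 2 (shuffling each author's haikus and splitting them into 50-haiku files) and the character n-gram features from Section 3, here is a minimal Python sketch. It is not the code used in the study: the function names `shuffle_and_chunk` and `char_ngrams`, the fixed seed, and the placeholder corpus are all assumptions made for illustration. The paper also does not say how a final partial group of fewer than 50 haikus was handled; this sketch simply keeps it as a shorter last chunk.

```python
import random
from collections import Counter


def shuffle_and_chunk(haikus, chunk_size=50, seed=0):
    """Shuffle a list of haikus and split it into chunks of chunk_size.

    The fixed seed is an assumption, used only to make runs repeatable.
    The last chunk may be shorter than chunk_size.
    """
    rng = random.Random(seed)
    shuffled = list(haikus)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return [shuffled[i:i + chunk_size]
            for i in range(0, len(shuffled), chunk_size)]


def char_ngrams(text, n):
    """Count overlapping character n-grams (spaces included) in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    # Placeholder strings standing in for 723 romaji haikus (hypothetical data).
    corpus = [f"haiku {i}" for i in range(723)]
    chunks = shuffle_and_chunk(corpus)
    print(len(chunks), "chunks; last chunk has", len(chunks[-1]), "haikus")

    # Character 2-gram profile of a short romaji phrase.
    print(char_ngrams("furu ike ya", 2))
```

Each chunk could then be written to its own file, and the n-gram counts from each file could serve as the input features for a classifier.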

Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO