An Assistant Tool for Verse-making in Basque Based on Two-Level Morphology

poster / demo / art installation
Authorship
  1. Bertol Arrieta

    University of the Basque Country

  2. Xabier Arregi

    University of the Basque Country

  3. Iñaki Alegria

    University of the Basque Country

Introduction

In this paper we present a specialised word generator designed as an assistant tool for Basque troubadours. The tool allows verse-writers to generate all the words that end in a given sequence of characters. Two aspects required particular attention: the size of the generated list and the need to rank the listed items by relevance.

This work can be seen as a way of re-using computational linguistic tools in the context of a Basque form of cultural expression. The technical foundation of the tool is a two-level morphological processor. Because words must be generated starting from their end, we had to invert the generation process.

"Bertsolaritza": What Is It?

"Bertsolaritza" (the Basque term for verse-making) is an oral or written literary form with a long tradition and great popularity in the Basque Country. Similar forms exist in other countries, such as Cuba.

While the written mode is similar to poetry, the oral mode has a peculiarity: troubadours sing verses without knowing the theme in advance. In other words, a theme is given to the singers, and within a few seconds they have to compose a set of verses on that theme. These verses must satisfy the formal conditions (metre and rhyme) of the discipline.

This verse-making task is quite difficult and requires great expertise. For that reason, some schools are devoted to teaching how to improvise this type of verse. In our view, the tool we present may be quite useful in these verse-schools. For some decades, an oral verse-making competition has been organised in the Basque Country every four years. The wide following of this event (thousands of Basques follow the competition with great interest, live or on TV) clearly demonstrates the importance of the discipline. The idea of designing the tool presented here arose from this background. We hope that the application will be a useful aid in finding rhymes, especially for inexperienced troubadours.

Reversing The Morphological Description

To build this tool we re-used a morphological analyser/generator for Basque developed a few years earlier (Alegria et al., 96) and integrated into several tools, such as spelling correctors and ICALL systems (Maritxalar et al., 97). The morphological description is based on Koskenniemi's two-level morphology model (Koskenniemi, 83).

The two-level system is based on two main components:

1. A lexicon in which the morphemes (lemmas and affixes) and the possible links among them (morphotactics) are defined. The lexicon is divided into sublexicons, and each lexicon entry specifies its morphotactic information by means of a continuation class, which is a set of sublexicons. Together, the sublexicons (nodes) and the continuation classes (arcs) define the graph of morphotactics.
2. A set of rules that controls the mapping between the lexical level and the surface level, that is, the changes that occur at the surface level when morphemes are linked, which are due to morphophonological transformations (morphophonemics).

To obtain an inverted morphological analyser/generator for Basque, we needed to reverse this morphological description. The goal is an inverted generator that produces all the possible forms corresponding to a known ending, rather than to a known beginning, and that orders its proposals according to their suitability as rhymes. We considered two ways of reversing the morphological description.
The first one consists of manipulating the automaton that is created from the morphological description of Basque. This option initially looked attractive because the lexicon and the rules would remain untouched; only the automaton would be manipulated. On closer analysis, however, we realised that the inverted Deterministic Finite Automaton (DFA) would become a Non-Deterministic Finite Automaton (NDFA) at an intermediate stage of the transformation, and that converting the NDFA back into a DFA would cause a combinatorial explosion.
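
The loss of determinism can be seen on a toy automaton. The following is a minimal sketch, not the automaton built from the Basque description: the five-state DFA for the words "cat" and "bat", and the names `reverse_automaton` and `nondeterministic_choices`, are assumptions made for illustration. Flipping every arc makes states that shared an outgoing target compete for the same symbol.

```python
from collections import defaultdict

# Toy DFA accepting {"cat", "bat"} (hypothetical example).
# State 0 is the start state and state 4 is the only final state.
DFA = {
    (0, "c"): 1,
    (0, "b"): 2,
    (1, "a"): 3,
    (2, "a"): 3,
    (3, "t"): 4,
}
START, FINALS = 0, {4}


def reverse_automaton(transitions, start, finals):
    """Flip every arc; old final states become start states and vice versa."""
    reversed_arcs = defaultdict(set)  # (state, symbol) -> set of target states
    for (src, symbol), dst in transitions.items():
        reversed_arcs[(dst, symbol)].add(src)
    return dict(reversed_arcs), set(finals), {start}


def nondeterministic_choices(arcs):
    """Return the (state, symbol) pairs that lead to more than one state."""
    return {key: targets for key, targets in arcs.items() if len(targets) > 1}


REVERSED, REV_STARTS, REV_FINALS = reverse_automaton(DFA, START, FINALS)
print(nondeterministic_choices(REVERSED))
```

In the reversed automaton, state 3 has two arcs labelled "a" (to states 1 and 2), so it is no longer deterministic. Making it deterministic again requires the subset construction, whose number of states can grow exponentially in the worst case.
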
The second option consists of manipulating and reversing the lexicon and the rules directly, before using the compilers (Karttunen and Beesley, 92; Karttunen, 93). This approach therefore requires programs that automatically invert the lexicon, the morphotactics and the phonological rules.
Considering the risks of the first option, we decided to develop the second. The process was divided into three steps:
Reversing the lexicon: This task inverts all the morphemes, that is, the order of the characters inside each morpheme is reversed. For instance, 'big' would be converted to 'gib'.
Converting the continuation classes into "backward classes": The basis of the morphotactics in the two-level model is the continuation class (Koskenniemi, 83). We wrote a script that converts the continuation classes into "backward classes", so that each inverted morpheme is associated with the group of morphemes that can precede it. This looks easy, but it raises some problems: lexicons containing final classes have to be defined as root lexicons, and consequently the backward class of the original root lexicons must be null.


For example: Let ADJECT be the continuation class of adjectives with two syllables or fewer. Suppose that this class consists of a single lexicon containing the suffixes -er and -est, and that the continuation class of these suffixes is null. Once the conversion has been made, -er and -est will be in the root lexicon, and their backward class will include the adjectives with two syllables or fewer.
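
The first two steps can be sketched on the English toy example above. This is an illustrative sketch under stated assumptions, not the script used in this work: the dictionary layout, the sublexicon name `SHORT_ADJ` and the function `invert_lexicon` are hypothetical, every entry of a sublexicon is assumed to share the same continuation class, and the original root lexicons are assumed to have no predecessors.

```python
from collections import defaultdict

# Hypothetical toy lexicon: sublexicon name -> list of (morpheme, continuation class).
# A continuation class is the tuple of sublexicons that may follow the morpheme;
# an empty tuple means that the morpheme can end a word.
LEXICON = {
    "SHORT_ADJ": [("big", ("ADJECT",)), ("small", ("ADJECT",)), ("thin", ("ADJECT",))],
    "ADJECT": [("er", ()), ("est", ())],
}
ROOTS = ["SHORT_ADJ"]


def invert_lexicon(lexicon, roots):
    """Reverse every morpheme and turn continuation classes into backward classes."""
    # Collect, for each sublexicon, the sublexicons that can precede it.
    predecessors = defaultdict(set)
    for name, entries in lexicon.items():
        for _, continuation in entries:
            for following in continuation:
                predecessors[following].add(name)

    inverted = {}
    new_roots = set()
    for name, entries in lexicon.items():
        # The original root lexicons end the inverted word: null backward class.
        backward = () if name in roots else tuple(sorted(predecessors[name]))
        # Reverse the characters of each morpheme ('big' -> 'gib').
        inverted[name] = [(morpheme[::-1], backward) for morpheme, _ in entries]
        # Sublexicons holding word-final morphemes become the new root lexicons.
        if any(not continuation for _, continuation in entries):
            new_roots.add(name)
    return inverted, sorted(new_roots)


inverted, new_roots = invert_lexicon(LEXICON, ROOTS)
print(new_roots)
print(inverted["ADJECT"])
```

Here `ADJECT` becomes the root lexicon, its entries become 're' and 'tse', and their backward class points to the short adjectives, whose own backward class is null.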

Reversing the rules: The rules are expressed as follows:


<correspondence> <operator> <left context> _ <right context>

To reverse a rule, only the contexts have to be changed: the left and right contexts are swapped and each one is reversed. The contexts are regular expressions, so it is necessary to distinguish between data (to be reversed) and regular-expression operators and reserved characters.
For example, the rule y:i <=> _ +: s #: ; will be converted to y:i <=> #: s +: _ ;
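
The context swap can be sketched as follows. This is a simplified illustration, not the program used in this work: the function name `reverse_rule` is hypothetical, and only flat contexts (whitespace-separated symbols or symbol pairs) are handled. Contexts containing brackets, alternation or repetition would need a real regular-expression parser, so the sketch rejects them.

```python
# Characters treated here as regular-expression operators (an assumption of this sketch).
REGEX_CHARS = set("[]()|*")


def reverse_rule(rule):
    """Reverse '<correspondence> <operator> <left context> _ <right context> ;'."""
    body = rule.strip().rstrip(";").strip()
    correspondence, operator, contexts = body.split(None, 2)
    left, _, right = contexts.partition("_")
    left_tokens, right_tokens = left.split(), right.split()

    for token in left_tokens + right_tokens:
        if REGEX_CHARS & set(token):
            raise ValueError(f"regular-expression operator not supported: {token!r}")

    # Swap the two contexts and reverse the order of the symbols in each one.
    # The symbols themselves (e.g. '+:') are data units and are kept intact.
    new_left = list(reversed(right_tokens))
    new_right = list(reversed(left_tokens))
    parts = [correspondence, operator, *new_left, "_", *new_right]
    return " ".join(parts) + " ;"


print(reverse_rule("y:i <=> _ +: s #:;"))
```

Applied to the rule above, the sketch yields `y:i <=> #: s +: _ ;`, which agrees with the conversion given in the text.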

Application To "Bertsolaritza": Finding Words That Rhyme With An Ending

Once the inverted analyser/generator for Basque had been developed, we reused it in an application that finds rhymes from the final part of a word. The application inverts the character sequence given by the user and then launches the generation with the inverted morphological generator. The output of the generation process, that is, all the words that match the given ending, must be inverted again before being shown to the user in a Tcl/Tk interface. In this way the tool returns all the Basque words that have the same final sequence of characters as the one given by the user, that is, all the words that rhyme with the given word ending.
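
The invert, generate, invert pipeline can be sketched as below. The inverted morphological generator itself is not reproduced here: `toy_inverted_generator` is a hypothetical stand-in that does prefix matching over a small list of inverted English forms, and `find_rhymes` is an assumed name.

```python
# Hypothetical stand-in for the inverted generator: a list of inverted word forms.
TOY_INVERTED_FORMS = [
    word[::-1] for word in ("biggest", "smallest", "thinnest", "guest", "bigger")
]


def toy_inverted_generator(inverted_prefix):
    """Return every inverted form that starts with the inverted ending."""
    return [form for form in TOY_INVERTED_FORMS if form.startswith(inverted_prefix)]


def find_rhymes(ending, generate_inverted):
    """Invert the user's ending, generate, and invert each result back."""
    inverted_ending = ending[::-1]
    inverted_words = generate_inverted(inverted_ending)
    return [word[::-1] for word in inverted_words]


print(find_rhymes("est", toy_inverted_generator))
```

Passing the generator as a parameter keeps the two inversions independent of how generation is done, so the stand-in could be replaced by a call to a real inverted generator.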

To make the application more useful, we had to deal with the huge number of generated words that match the sequence given by the user. Two solutions were implemented:

1. Establishing a categorisation or class partition among the morphemes, so that only one example (a representative of the class) is returned when all the elements of the class are suitable to be shown. For instance, if the input is 'est', instead of returning all the adjectives with the superlative suffix added (far too long a list!)

big + est --> biggest
small + est --> smallest
thin + est --> thinnest
...

the application will return only one example and a short explanation:

BIG + est --> biggest (ADJECTIVE + est)

2. Returning the words sorted in the order that verse-makers prefer. The quality of a rhyme is better if the word is neither compound nor declined. In the example above, a rhyme like 'guest' would be better than an inflected word like 'smallest'.
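
Both solutions can be sketched together. This is a minimal illustration under stated assumptions, not the selection module of the application: the `Candidate` record and the function `present` are hypothetical, and it is assumed that whenever a suffix matches the ending, every member of the stem class is a valid candidate, so that one representative per class is enough.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    surface: str               # word form shown to the user
    stem: str                  # stem of the word
    suffix: Optional[str]      # None for simple (non-declined) words
    stem_class: Optional[str]  # class label of the stem, e.g. "ADJECTIVE"


def present(candidates):
    """Simple words first, then one representative per (class, suffix) pair."""
    simple = sorted(c.surface for c in candidates if c.suffix is None)

    representatives = {}
    for c in candidates:
        if c.suffix is not None:
            # Keep only the first member seen for each class and suffix.
            representatives.setdefault((c.stem_class, c.suffix), c)

    collapsed = [
        f"{c.stem.upper()} + {suffix} --> {c.surface} ({cls} + {suffix})"
        for (cls, suffix), c in representatives.items()
    ]
    return simple + collapsed


CANDIDATES = [
    Candidate("biggest", "big", "est", "ADJECTIVE"),
    Candidate("smallest", "small", "est", "ADJECTIVE"),
    Candidate("thinnest", "thin", "est", "ADJECTIVE"),
    Candidate("guest", "guest", None, None),
]

for line in present(CANDIDATES):
    print(line)
```

For the ending 'est' the sketch lists 'guest' first, followed by the single line `BIG + est --> biggest (ADJECTIVE + est)`.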

Conclusions And Future Improvements

Basque is a pre-Indo-European language of unknown origin, quite different from the surrounding European languages. Basque declension has fourteen different forms for each of the singular, plural and indefinite numbers, and all of these forms are added at the end of the word. Moreover, Basque is an agglutinative language, in which morphemes can be attached to other morphemes. These characteristics show how important the final parts of Basque words are, which leads us to think that the inverted morphological analyser/generator could be useful for different applications. We have found an interesting use for such a generator in the world of "bertsolaritza": given that the final parts of words (rhymes) are very important in verses, the inverted morphological analyser/generator can be a valuable assistant tool for writing verses. Furthermore, an automatic method for inverting the morphological description has been defined. This method can be reused for any other language, provided that a two-level description is available.

As future work, we are considering (i) returning words with assonant rhyme; (ii) using semantics in the selection module in order to improve the order of presentation; and (iii) publishing the application as a web page.

Acknowledgements

We would like to thank Xerox for letting us use their tools, and especially Lauri Karttunen.

References

Alegria, I., Artola, X., Sarasola, K. and Urkia, M. (1996). Automatic Morphological Analysis of Basque. Literary and Linguistic Computing 11 (4): 193-203. Oxford University Press.
Karttunen, L. and Beesley, K.R. (1992). Two-Level Rule Compiler. Xerox ISTL-NLTT-1992-2.
Karttunen, L. (1993). Finite-State Lexicon Compiler. Xerox ISTL-NLTT-1993-04-02.
Karttunen, L. (1994). Constructing Lexical Transducers. Proc. of COLING '94. 406-411.
Koskenniemi, K. (1983). Two-level Morphology: A General Computational Model for Word-Form Recognition and Production. University of Helsinki, Dept. of General Linguistics. Publications No. 11.
Maritxalar, M., Diaz de Ilarraza, A. and Oronoz, M. (1997). From Psycholinguistic Modelling of Interlingua to a Computational Model. Proc. of CONLL97 Workshop (ACL Conference). Madrid 1997.
Lekuona et al. (1980). Bertsolaritza. Jakin 14. eta 15. Donostia 1980.


Conference Info

In review

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2000

Hosted at University of Glasgow

Glasgow, Scotland, United Kingdom

July 21, 2000 - July 25, 2000


Conference website: https://web.archive.org/web/20190421230852/https://www.arts.gla.ac.uk/allcach2k/

Series: ALLC/EADH (27), ACH/ICCH (20), ACH/ALLC (12)

Organizers: ACH, ALLC
