Corpus Linguistic Techniques to Reveal Cypriot Dialect Information

poster / demo / art installation
Authorship
  1. Katerina T. Frantzi

    Dept. of Mediterranean Studies - University of the Aegean

  2. Christiana Loukaidou

    Dept. of Mediterranean Studies - University of the Aegean

Work text

The objective of this work is to create and exploit a corpus of traditional poems in order to extract dialect information. We focus on the island of Cyprus. Although various collections of traditional Cypriot songs and poems exist, they are all in hard-copy form and have therefore only been processed manually. Language resources for minority languages are still in their infancy (McEnery et al., 2000). An electronic corpus, however, can be processed with corpus linguistic techniques completely, accurately and quickly, in a way that cannot be achieved manually (Biber et al., 1998; Ooi, 1998).
The Cypriot dialect belongs to the Eastern Greek dialects and has 18 local variations; together with the Tsakonian dialect, it is one of the two oldest Greek dialects still spoken (Κοντοσόπουλος, 2001). Because of Cyprus's rich history, the Cypriot dialect contains words of various origins: the ancient Cypriot dialect of the Achaeans, the Homeric age and the Hellenistic period, as well as Medieval French, Italian, Catalan, Arabic, Turkish and English. The dialect is very rich in words and expressions with ancient roots. Depending on the area, non-Cypriot Greeks may find the dialect hard, or even impossible, to understand. The dialect is used mainly in speech; in writing, it is used mainly to record spoken language, e.g. poems, fairy tales and theatre dialogues (Καρυολαίμου, 2001). Nowadays, however, because of the modern way of life, the media, tourism and education, young Cypriots tend to use common modern Greek instead of their dialect.
Cypriot traditional poems are not simply an expression of their creators. They express the people who shared the same geographical area at the time the poem was created: their lives, feelings and hopes. They fall into various types: love, satirical, immigration, historical, social, religious, celebratory, national and others. Their language is simple and direct, the same as that used by ordinary people. Exploiting the poems can therefore reveal dialect characteristics of real language use (Χατζηιωάννου, 1999a; Χατζηιωάννου, 1999b).
We apply corpus linguistic techniques to a corpus of Cypriot traditional songs that we created, in order to extract dialect information. We present a sample of the linguistic information extracted for the geographical areas studied. Our corpus currently consists of traditional poems from two geographical areas, Nicosia and Famagusta.
The Nicosia corpus contains 88,637 words and the Famagusta corpus 82,625 words. For comparisons between the two areas, we normalize frequencies to 88,000 words. We introduce three dialect phenomena that we are currently exploring.
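The normalization step can be sketched as follows, assuming the usual per-N scaling (raw count divided by corpus size, multiplied by 88,000). The function name `normalize` and the raw count of 100 are hypothetical; only the two corpus sizes come from the text above.

```python
# Corpus sizes (in words) reported for the two sub-corpora.
CORPUS_SIZES = {"Nicosia": 88_637, "Famagusta": 82_625}
BASE = 88_000  # common base used for cross-area comparisons


def normalize(raw_count: int, corpus_size: int, base: int = BASE) -> float:
    """Scale a raw frequency to a frequency per `base` words (assumed method)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * base / corpus_size


# Hypothetical raw count of 100 occurrences in each area.
for area, size in CORPUS_SIZES.items():
    print(area, round(normalize(100, size), 2))
```

Because Famagusta is the smaller sub-corpus, the same raw count is scaled up there (to about 106.51) and slightly down for Nicosia (to about 99.28).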
The first characteristic of the Cypriot dialect that we explored electronically is the use of the archaic verbal endings “-ασιν” and “-ουσιν” (pronounced “-asin” and “-usin”) for the third person plural of the past and present tense respectively, instead of the common modern Greek endings “-αν” and “-ουν” (pronounced “-an” and “-un”). For example, in the Nicosia corpus we find the dialect word “επήρασιν” (pronounced “epirasin”) instead of the common modern Greek “επήραν” (pronounced “epiran”), meaning “they took”. The archaic endings survived through the post-Byzantine period and are used interchangeably with the modern Greek ones, which appeared at the beginning of the medieval period. The archaic endings are believed to be used extensively in some areas.
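A first approximation of this extraction can be sketched by matching word endings after removing Greek accents, so that accented and unaccented spellings of an ending are counted together. This is a hypothetical illustration, not the authors' procedure: it matches surface suffixes only, so non-verbs ending in “-αν” (for example the conjunction “αν”, “if”) would be miscounted unless tokens are first filtered by a part-of-speech tagger, which the sketch assumes but does not include. The names `ENDING_PAIRS`, `strip_accents` and `count_endings` are our own.

```python
import re
import unicodedata
from collections import Counter

# Archaic ending -> common modern Greek counterpart (third person plural).
ENDING_PAIRS = {"ασιν": "αν", "ουσιν": "ουν"}


def strip_accents(text: str) -> str:
    """Lower-case text and drop combining marks such as the Greek tonos."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_endings(text: str) -> Counter:
    """Count tokens ending in each archaic or modern third-plural ending.

    Surface matching only: no check that the token is actually a verb.
    """
    counts = Counter()
    # Strip accents before tokenizing so \w+ never splits on combining marks.
    for token in re.findall(r"\w+", strip_accents(text)):
        for archaic, modern in ENDING_PAIRS.items():
            if token.endswith(archaic):
                counts[archaic] += 1
            elif token.endswith(modern):
                counts[modern] += 1
    return counts


print(count_endings("επήρασιν επήραν παίρνουσιν"))
```

The four endings are mutually exclusive as suffixes, so each token is counted at most once.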
Of all verbs ending in “-ασιν” or “-αν”, 25.11% end in “-ασιν” in Nicosia and 43.96% in Famagusta. In Famagusta, the share of the dialectal ending “-ασιν” is thus about 1.75 times that of Nicosia, close to double. The residents of Famagusta, who live in a rather rural area, use the dialect largely unchanged. Of all verbs ending in “-ουσιν” or “-ουν”, 25.73% end in “-ουσιν” in Nicosia and 32.79% in Famagusta, so the archaic ending “-ουσιν” is also used more in Famagusta than in Nicosia. We are exploring the phenomenon further in terms of which types of verbs prefer the archaic endings “-ουσιν” and “-ασιν” and which prefer the common modern Greek “-ουν” and “-αν”. Another parameter for future work is context analysis for both types of ending: could it be that the archaic endings prefer a specific syntactic or semantic environment?
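The percentages above can be read as the share of the archaic ending among all matching verb forms in an area. Writing that definition out, with the ratios computed from the reported figures, makes the cross-area comparison concrete; the symbols s and n are our own notation, not the authors'.

```latex
s_{\mathrm{area}} = 100 \times \frac{n_{\mathrm{archaic}}}{n_{\mathrm{archaic}} + n_{\mathrm{modern}}}

\frac{s_{\mathrm{Famagusta}}}{s_{\mathrm{Nicosia}}}\Big|_{\text{-ασιν}} = \frac{43.96}{25.11} \approx 1.75,
\qquad
\frac{s_{\mathrm{Famagusta}}}{s_{\mathrm{Nicosia}}}\Big|_{\text{-ουσιν}} = \frac{32.79}{25.73} \approx 1.27
```

On this reading, the gap between the two areas is noticeably smaller for “-ουσιν” than for “-ασιν”.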
A second interesting research question concerns the use of the dialect negatives “εν” (pronounced “en”) and “μεν” (pronounced “men”) instead of the common modern Greek “δεν” (pronounced “then”) and “μην” (pronounced “min”), all meaning “not”; for example, “εν παίρνω” (pronounced “en perno”) instead of “δεν παίρνω” (pronounced “then perno”), meaning “I do not take”. We compare the use of “εν” with “δεν” and of “μεν” with “μην”. Corpus processing shows that, in both geographical areas, this negative appears in its dialectal form “εν” in almost all occurrences (“εν” is found in 92% of the cases in the Nicosia corpus and 98% in the Famagusta corpus). The same holds for “μην”: the dialectal “μεν” is found in 96% of the cases in the Nicosia corpus and in 100% of the cases in the Famagusta corpus. As with the verb endings, these findings indicate that dialect characteristics are stronger in Famagusta than in Nicosia. Again, further work could examine the syntactic and semantic contexts that the rare common modern Greek negatives “δεν” and “μην” tend to prefer.
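Because “εν” and “μεν” are short, counting them as substrings would also hit longer words such as “δεν” or “ένας”; a minimal sketch of whole-token counting for each dialect/standard pair follows. The pairing dictionary, function names and sample sentence are illustrative assumptions, and the sketch does not resolve homographs (e.g. the learned preposition “εν”, “in”), which would still need manual checking.

```python
import re
import unicodedata
from collections import Counter

# Dialect negative -> common modern Greek negative.
NEGATIVE_PAIRS = {"εν": "δεν", "μεν": "μην"}


def tokens(text: str) -> list[str]:
    """Lower-case, strip accents and split into whole-word tokens."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"\w+", bare)


def dialect_shares(text: str) -> dict[str, float]:
    """Percentage of dialect forms within each dialect/standard pair."""
    counts = Counter(tokens(text))
    shares = {}
    for dialect, standard in NEGATIVE_PAIRS.items():
        total = counts[dialect] + counts[standard]
        shares[dialect] = 100 * counts[dialect] / total if total else 0.0
    return shares


sample = "εν παίρνω, δεν παίρνω, μεν φύγεις, ένας"
print(dialect_shares(sample))
```

Here “ένας” and “δεν” are not mistaken for “εν”, so the sample yields one “εν” against one “δεν” (50%) and one “μεν” against no “μην” (100%).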
Tsitakism, in which “κ” (“k”) is pronounced as a full “τζ” (“dj”), is another interesting dialect phenomenon to explore. Consider three words found in both geographical areas: “και” (pronounced “ke”), meaning “and”; “καιρός” (pronounced “keros”), meaning “time” or “weather”; and “κεφαλή” (pronounced “kefali”), meaning “head”. The corpus shows that the dialectal “τζ” (“dj”) is much more common than “κ” (“k”) in both geographical areas for all three words. In Nicosia, “καιρός” is used 23 times in its dialectal form and only once in its common modern Greek form, while in Famagusta it appears only in its dialectal form (10 occurrences). “κεφαλή” is used only in its dialectal form in both areas (10 occurrences in Nicosia, 1 in Famagusta). “και” (“and”) appears about 17 times more often in its dialectal form than in its common modern Greek form in Famagusta (379 vs. 22 occurrences), and about 7.5 times more often in Nicosia (1,640 vs. 218 occurrences). Again, the dialect phenomenon is stronger in Famagusta.
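The comparison can be sketched by generating a tsitakism spelling for each standard word and computing how many times more often the dialect form occurs. The rewrite rule used here (“κ” before a front-vowel spelling written as “τζ”) is a crude, hypothetical approximation that produces plausible dialect spellings for the three example words; it is not the authors' method, and real usage has exceptions. The counts in the last two lines are the figures for “and” reported above.

```python
import re

# Hypothetical rule: κ followed by a front-vowel spelling (ε, ι, η, υ, αι, ει, οι,
# with or without tonos) is rewritten as τζ. A rough approximation only.
_FRONT_K = re.compile(r"κ(?=[εέιίηήυύ]|[αεο][ιί])")


def tsitakize(word: str) -> str:
    """Return an approximate tsitakism spelling of a lower-case word."""
    return _FRONT_K.sub("τζ", word)


def times_more_often(dialect_count: int, standard_count: int) -> float:
    """How many times more often the dialect form occurs than the standard one."""
    if standard_count == 0:
        raise ValueError("standard form never occurs; ratio is undefined")
    return dialect_count / standard_count


for word in ("και", "καιρός", "κεφαλή"):
    print(word, "->", tsitakize(word))

print(round(times_more_often(379, 22), 1))    # Famagusta, "and"
print(round(times_more_often(1640, 218), 1))  # Nicosia, "and"
```

The ratios come out at about 17.2 and 7.5, in line with the rounded figures given in the text; a “κ” before a back vowel, as in “κόσμος”, is left unchanged.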
We have given a small sample of the dialect information that can be extracted by applying corpus linguistic techniques to a corpus of traditional Cypriot poems. Future work will focus on the following:
• Enhancement of the corpus with more traditional poems covering more Cypriot geographical areas.
• Continuous processing of the dynamic corpus to update the dialect knowledge with complete and accurate information.
• Enhancement of the corpus with other forms of dialect speech.
• Continuous processing of the dynamic corpus to extract extra-linguistic cultural information.
• Comparison of the extracted dialect and extra-linguistic information with that extracted from the Dodecanese traditional songs corpus, to gain knowledge of the similarities and differences among the dialects and local idioms (Frantzi, 2005).
• Construction of an electronic Cypriot dictionary of dialect words and collocations.
References
Biber, D., Conrad, S., Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Frantzi, K.T. (2005). Preserving and Exploiting the Dodecanese Traditional Songs. In Book of Extended Abstracts of the 1st South-Eastern European Digitization Initiative (SEEDI) Conference, Digital Re_Discovery of Culture (Physicality of Soul) – Playing Digital, Ohrid, FYROM, 11-14 September 2005, pp. 41-45.
McEnery, A.M., Baker, P., Burnard, L. (2000). Corpus Resources and Minority Language Engineering. In M. Gavrilidou, G. Carayannis, S. Markantonatou, S. Piperidis and G. Stainhaouer (eds), Proceedings of the Second International Conference on Language Resources and Evaluation, Athens, Greece, 2000, pp. 801-806.
Ooi, V. (1998). Computer Corpus Lexicography. Edinburgh: Edinburgh University Press.
Καρυολαίμου, Μ. (2001). Η Ελληνική γλώσσα στην Κύπρο [The Greek language in Cyprus]. In Εγκυκλοπαιδικός Οδηγός για τη γλώσσα [Encyclopaedic Guide to Language], Α.-Φ. Χριστίδης (scientific ed.), Ministry of Education and Religious Affairs, Thessaloniki: Centre for the Greek Language, pp. 180-184 (in Greek).
Κοντοσόπουλος, Γ.Ν. (2001). Διάλεκτοι και ιδιώματα της Νέας Ελληνικής [Dialects and Idioms of Modern Greek]. 3rd edition. Athens: Εκδόσεις Γρηγόρη [Grigoris Publications] (in Greek).
Χατζηιωάννου, Κ. (1999a). Γραμματική της Ομιλούμενης Κυπριακής Διαλέκτου με Ετυμολογικό προσάρτημα [Grammar of the Spoken Cypriot Dialect, with an Etymological Appendix]. Nicosia: Εκδόσεις Ταμασός [Tamasos Publications] (in Greek).
Χατζηιωάννου, Κ. (1999b). Ετυμολογικό Λεξικό της Ομιλούμενης Κυπριακής Διαλέκτου [Etymological Dictionary of the Spoken Cypriot Dialect]. Nicosia: Εκδόσεις Ταμασός [Tamasos Publications] (in Greek).


Conference Info

Complete

ACH/ALLC / ACH/ICCH / ADHO / ALLC/EADH - 2006

Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)

Paris, France

July 5, 2006 - July 9, 2006

151 works by 245 authors indexed

The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002: a Steering Committee was appointed at the ALLC/ACH meeting in 2004, in Gothenburg, Sweden. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the ‘official’ ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.

Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/

Series: ACH/ICCH (26), ACH/ALLC (18), ALLC/EADH (33), ADHO (1)

Organizers: ACH, ADHO, ALLC

Tags
  • Keywords: None
  • Language: English
  • Topics: None