gabmap - A Web Application for Measuring and Visualizing Distances Between Language Varieties

workshop / tutorial
Authorship
  1. John Nerbonne

    Rijksuniversiteit Groningen (University of Groningen)

  2. Charlotte Gooskens

    Rijksuniversiteit Groningen (University of Groningen)

  3. Peter Kleiweg

    Rijksuniversiteit Groningen (University of Groningen)

  4. Therese Leinonen

    Rijksuniversiteit Groningen (University of Groningen)

  5. Martijn Wieling

    Rijksuniversiteit Groningen (University of Groningen)

Work text
gabmap – A Web Application for Measuring and Visualizing Distances Between Language Varieties
Nerbonne, John, University of Groningen, j.nerbonne@rug.n
Gooskens, Charlotte, University of Groningen, c.s.gooskens@rug.nl
Kleiweg, Peter, University of Groningen, p.c.j.kleiweg@rug.nl
Leinonen, Therese, University of Groningen, t.leinonen@rug.nl
Wieling, Martijn, University of Groningen, wieling@gmail.com
In linguistics, especially in dialectology and comparative linguistics, we frequently ask how similar linguistic varieties are to one another, effectively asking how similar linguistic culture is from one site to another. We operationalize the question more specifically by asking, for example, how similar the vocabulary of one variety is to that of another, or, more interestingly, how similar the pronunciations of a set of varieties are, sampled via the pronunciations of the same set of at least 30 words at a range of sites. Since there may be thousands of words and hundreds of sites, the questions must be addressed computationally. The techniques embodied in the web application have been used in dozens of scholarly papers on dialectology (see references).

The gabmap application has been developed at the University of Groningen. It measures differences in linguistic samples, in particular sets of phonetic (or phonemic) transcriptions, and projects the results graphically onto maps. Gabmap provides a graphical user interface and implements not only the comparison of vocabulary or other categorical data (essentially as percentage overlap or percentage difference) but also that of pronunciations via edit distance. Because the software is implemented as a web application, users need not download it or keep it up to date by following releases. It is fairly user-friendly and easily accessible, and therefore enables experimentation with different techniques popular among linguists from various fields, especially dialectology and variationist linguistics.
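
As a rough illustration of the two kinds of comparison mentioned above, the following Python sketch computes a percentage difference for categorical data and an edit distance for transcriptions. It is not Gabmap's code: the function names and the sample data are invented for illustration, and plain unit-cost Levenshtein distance is assumed as the edit distance, without any normalization or segment weighting that a real analysis might apply.

```python
# Illustrative sketch only. All names and data here are hypothetical
# and do not reflect Gabmap's actual implementation.

def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two sequences: the minimum
    number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def categorical_difference(site_a, site_b):
    """Percentage of shared items for which two sites give different
    categorical responses (e.g. different lexical variants)."""
    shared = [item for item in site_a if item in site_b]
    if not shared:
        raise ValueError("sites have no items in common")
    differing = sum(1 for item in shared if site_a[item] != site_b[item])
    return 100.0 * differing / len(shared)


def pronunciation_distance(site_a, site_b):
    """Mean edit distance over the words transcribed at both sites.
    Each Unicode code point is treated as one segment, which is an
    assumption: combining diacritics would count as separate segments."""
    shared = [word for word in site_a if word in site_b]
    if not shared:
        raise ValueError("sites have no words in common")
    total = sum(edit_distance(site_a[w], site_b[w]) for w in shared)
    return total / len(shared)


# Invented sample data for two hypothetical sites.
vocabulary_x = {"girl": "variant1", "potato": "variant1"}
vocabulary_y = {"girl": "variant2", "potato": "variant1"}
transcriptions_x = {"milk": "mɛlk", "house": "hus"}
transcriptions_y = {"milk": "mɛlək", "house": "hys"}

print(categorical_difference(vocabulary_x, vocabulary_y))
print(pronunciation_distance(transcriptions_x, transcriptions_y))
```

Averaging over the shared word list is one simple way to turn word-level differences into a single distance per pair of sites; a table of such pairwise distances is the kind of input that mapping, multidimensional scaling, and clustering operate on.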

During the workshop we will give some theoretical background on dialectometry, followed by a tutorial in which the theory is put into practice through exercises showing how to use the web application. We have given similar courses in dialectology previously, for example at the Linguistic Society of America Linguistics Institute in 2005 at MIT and at the special meeting of the Forum Sprachvariation of the Internationale Gesellschaft für deutsche Dialektologie in Erlangen in Oct. 2010 (www.sprachwissenschaft.uni-erlangen.de/tagung/programm.shtml). The workshop proposed here will be like the second in that it will include hands-on sessions.

The workshop will be structured as follows:

Introduction to dialectometry
Data entry: uploading dialect data, creating and uploading maps
Data inspection: data distribution and error detection
Measuring linguistic distances
Graphical presentations of linguistic distances: dialect maps
Statistical analyses: multidimensional scaling and clustering
Data mining: identifying influential individual variables (words, pronunciation variants)
We have named the gabmap collaborators as co-authors of the tutorial, but only Nerbonne and at most one other will offer the tutorial. We can accommodate up to 20 participants.

We add a note for potential participants from non-linguistic fields. In theory one might ask the same questions of non-linguistic culture that we ask of linguistic culture, namely to what degree, e.g., the material culture of one settlement is similar to that of another. We suspect that one might attack the non-linguistic question using techniques similar to the ones we will demonstrate during this tutorial, but the point is purely theoretical so far, although we would welcome the chance to examine the question in a data-intensive way. If such studies are carried out, we suspect that at least the mapping facilities we demonstrate in this tutorial will be useful.

References:
Alewijnse, B., Nerbonne, J., van der Veen, L. & Manni, F. 2007 “A Computational Analysis of Gabon Varieties,” In P. Osenova et al. (eds.), Proceedings of the RANLP Workshop on Computational Phonology, 3–12 Borovetz

Gooskens, C. & Heeringa, W. 2004 “Perceptive evaluation of Levenshtein dialect distance measurements using Norwegian dialect data,” Language Variation and Change, 16(3) 189–207

Heeringa, W. 2004 Measuring dialect pronunciation differences using Levenshtein distance. Ph.D. thesis, University of Groningen

Kessler, B. 1995 “Computational dialectology in Irish Gaelic,” In Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics, 60–67 Dublin EACL

Leinonen, T. 2008 “Factor Analysis of Vowel Pronunciation in Swedish Dialects,” International Journal of Humanities and Arts Computing, 2(1-2) 189–204

Nerbonne, J. 2009 “Data-driven dialectology,” Language and Linguistics Compass, 3(1) 175–198

Nerbonne, J. & Siedle, C. 2005 “Dialektklassifikation auf der Grundlage Aggregierter Ausspracheunterschiede,” Zeitschrift für Dialektologie und Linguistik, 72(2) 129–147

Prokic, J., Nerbonne, J., Zhobov, V., Osenova, P., Simov, K., Zastrow, T. & Hinrichs, E. 2009 “The Computational Analysis of Bulgarian Dialect Pronunciation,” Serdica Journal of Computing, 3(3) 269–298

Spruit, M. 2006 “Measuring syntactic variation in Dutch dialects,” Literary and Linguistic Computing, special issue on Progress in Dialectometry: Toward Explanation [Nerbonne, J., Kretzschmar, W. (eds)], 21(4) 493–506

Yang, C. & Castro, A. 2008 “Representing Tone in Levenshtein Distance,” International Journal of Humanities and Arts Computing, 2(1-2) 205–219

Conference Info

ADHO - 2011
"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None