CAT tools in DH training

poster / demo / art installation
Authorship
  1. Anne Baillot

    Le Mans Université

  2. Loïc Barrault

    Le Mans Université

  3. Fethi Bougares

    Le Mans Université

Work text
Digital Humanities training in philology curricula seldom includes Computer-Assisted Translation (CAT), even though students in modern philologies are likely to use such digital methods or work environments during their studies, and may end up using them daily in their future jobs. Considered merely as tools, CAT methods do not really stand in their own right in a DH curriculum. Considered from a user-interface perspective, however, or as an approach that prompts reflection on the impact of machine learning methods on the Humanities, CAT methods (that is, their practice and the reflection on it) can legitimately be integrated into such a curriculum. One core aspect of this approach consists in reflecting on the impact that human translators can have, through their input in CAT tools, on the learning process of algorithms with respect to aspects such as the enrichment of style and the inclusion of context and idioms (categories that Berman defines as those which translation tends to “destroy” most [1]); in other words, on how the interaction between humans and the machine can be conceived as a reciprocal one.
This poster presents the way we are integrating CAT-based translation classes into a B.A. DH curriculum at the Institute for Modern Philologies at Le Mans Université. The first part of the poster is dedicated to the training setting: the number of teaching sequences, the methodological input, the implementation of a reflection on the method, and the evaluation of the students' work, both in terms of translation quality and in terms of their reflection on the way CAT methods impact the work process, based on their end-of-term assignment. Two translation environments are offered to the students: MateCAT, which allows them to learn how to edit a pre-processed translation, and a multimodal approach with image descriptions [2], which allows them to consider the impact of mental representations on the translation process. This infrastructure is based on the open-source CAT tool MateCAT [3], equipped with machine translation systems trained with Moses [4]. The poster will show the role the Computer Science research department played in setting up a solid infrastructure for these two environments, as well as the type of data that has been gathered from the students' input.
Another aspect of this teaching setting presented in the poster is the way the CAT-tools class is integrated into 1) the curriculum as a whole and 2) a DH training concept that starts at B.A. level, continues at M.A. level and extends to PhD candidates. The poster will also pay special attention to the coordination between the year-long teaching of regular classes and the organization of intensive training events, such as summer schools (one held in June 2018, another planned for spring 2020) or Translation Marathons like those held in February and March 2019 at Le Mans Université, to which B.A. students contribute. We will also discuss why such a pedagogical experiment can be beneficial to small disciplines such as German Studies in France.

One final aspect presented in the poster concerns the connection between the pedagogical setting and research. The input gained from the CAT training provides useful data for research projects in the Computer Science department. More specifically, working with the German Studies department makes it possible to collect high-quality human translations for German-French and French-German. We are organizing a machine translation shared task at the Fourth Conference on Machine Translation (WMT19, http://www.statmt.org/wmt19/) for a language pair that does not involve English on either the source or the target side. The gathered corpus will be used as data to train state-of-the-art machine translation systems involving deep learning techniques.

In the philologies, the class makes it possible to envision translation studies not as a specialty anchored in a single linguistic area, in which German, English and Spanish studies work independently of one another, but, on the contrary, as a transdisciplinary approach enriched by NLP-based research.

Bibliography
[1] Berman, A., Translation and Trials of the Foreign, in: The Experience of the Foreign: Culture and Translation in Romantic Germany. Albany: SUNY Press, 1992.

[2] Barrault, L., Bougares, F., Specia, L., Lala, C., Elliott, D. and Frank, S. (2018). Findings of the third shared task on multimodal machine translation, Third Conference on Machine Translation (WMT18).
[3] M. Federico, N. Bertoldi, M. Cettolo, M. Negri, M. Turchi, M. Trombetti, A. Cattelan, A. Farina, D. Lupinetti, A. Martines, A. Massidda, H. Schwenk, L. Barrault, F. Blain, P. Koehn, C. Buck, U. Germann, "The MateCat Tool", Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations, 2014.

[4] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, E. Herbst, "Moses: Open Source Toolkit for Statistical Machine Translation", Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic, June 2007.

Conference Info

In review

ADHO - 2019
"Complexities"

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

Series: ADHO (14)

Organizers: ADHO