A Neural OCR Engine for North Saami

Paper, specified as "short paper"
Authorship
  1. Andre Kåsen

    National Library, Norway

  2. Håvard Østli

    National Library, Norway

  3. Andrea M. Huus

    National Library, Norway

  4. Lars Johnsen

    National Library, Norway

Work text

The DH-LAB at the National Library of Norway announces that an open-source optical character recognition (OCR) engine for North Saami is under construction. North Saami is an under-resourced indigenous minority language recognized by the Norwegian state. The OCR engine is trained with Tesseract by means of cross-lingual model transfer. Evaluated on a held-out portion of the ground truth, the model reaches a bag-of-words F1 score of 0.98. The engine will be the first freely available OCR engine for North Saami.
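The abstract does not define how the bag-of-words F1 score is computed. A common reading is sketched below as an illustration only: both the ground-truth text and the OCR output are split into whitespace-separated words, compared as multisets (word order ignored, repeated words counted), and F1 is the harmonic mean of precision and recall over the shared words. The tokenization, the function name `bag_of_words_f1`, and the toy strings are assumptions, not the authors' evaluation code.

```python
from collections import Counter


def bag_of_words_f1(reference: str, hypothesis: str) -> float:
    """Bag-of-words F1 between a ground-truth text and an OCR output.

    Assumption (not from the source): tokens are whitespace-separated
    words compared as multisets, so word order is ignored and each
    repeated word counts as many times as it occurs.
    """
    ref_counts = Counter(reference.split())
    hyp_counts = Counter(hypothesis.split())
    # Multiset intersection: each word counts min(ref_count, hyp_count) times.
    overlap = sum((ref_counts & hyp_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())  # share of OCR words that are correct
    recall = overlap / sum(ref_counts.values())     # share of true words that were recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented toy example: the OCR output drops the caron on one letter (š -> s),
    # so 3 of 4 words match on both sides.
    truth = "Sámegiella lea oktasaš giella"
    ocr = "Sámegiella lea oktasas giella"
    print(round(bag_of_words_f1(truth, ocr), 2))  # prints 0.75
```

Because a single wrong character invalidates a whole word, a word-level score like this is stricter than a character-level one, which makes a value of 0.98 a demanding result for a language whose orthography relies heavily on diacritics.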


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO