Automatic Labeled Data Generation for Person Named Entity Disambiguation on the Ming Shilu

poster / demo / art installation
Authorship
  1. 1. Richard Tzong-Han Tsai

    Research Center for Humanities and Social Sciences - Academia Sinica, Department of Computer Science and Information Engineering - National Central University

  2. 2. Cheng-Han Wu

    Department of Computer Science and Information Engineering - National Central University

  3. 3. Pi-Ling Pai

    Research Center for Humanities and Social Sciences - Academia Sinica

  4. 4. I-Chun Fan

    Institute of History and Philology - Academia Sinica

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

One important task of historical research in DH is to identify person names from history texts. This task can be divided into two subtasks: person named entity recognition (PNER) and person named entity disambiguation (PNED). PNED is to link each PNE mention to a specific person profile in the reference knowledge base. The main challenge of machine-learning-based PNED is the lack of annotated data. We design an automatic approach to labeling the training data. We choose the Ming Shilu as our target history texts. We use the Ming-Qing Archives Name Authority Database as our reference knowledge base, which contains 14,070 government officials living in Ming dynasty. Our BERT-based model reaches an accuracy of 90.1%, which proves that our approach can generate labeled data for the PNED task of very high quality on Chinese history texts. For the general situation (including trivial instances), the accuracy is even higher (~98%).

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO