Obtaining More Expressive Corpus Distributions for Standardized Ancient Languages

Published in CHR 2021: Computational Humanities Research Conference, 2021

This paper introduces a latent variable model for ancient languages that aims to quantify the lexical influence that early authoritative works exert on their literary successors. The model jointly estimates the amount of word reuse, based on word unigrams and bigrams, and the date of composition of each text. We apply the model to a corpus of pre-Renaissance Latin texts composed between the 3rd c. BCE and the 14th c. CE. Our evaluation focusses on the structures of word reuse detected by the model, its temporal predictions, and the quality of the inferred diachronic distributions of words. The last of these is assessed using a newly designed task from the field of computational etymology.

Recommended citation: Hellwig, O., Sellmer, S., & Nehrdich, S. (2021). "Obtaining More Expressive Corpus Distributions for Standardized Ancient Languages." In Proceedings of the Computational Humanities Research Conference (CHR 2021), Amsterdam, The Netherlands.