Machine Translation for Asian Studies
With the advent of large language models, machine translation (MT) has become a widely used but little understood tool for research, language learning, and communication. GPT, Claude, and many other model series now allow researchers to access literature in different languages, and even to translate primary texts composed in classical languages for which few resources are available. But how can we evaluate the translation output of such models? How do we decide which model is best for our own research purposes, and how can we tweak it? How will MT impact language learning, which is fundamental for Asian Studies?
In the first part of this session, we will give an overview of the MT landscape for Asian Studies. Participants will learn how to use online interfaces and APIs to access language models for their own research needs. We will discuss different types of prompts and user-defined parameters such as temperature and maximum token length. We will demonstrate that results can differ radically depending on parameterization and prompting, both within a single model and across models.
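As a taste of the kind of API call we will practice, here is a minimal Python sketch using the OpenAI client library. The model name and the example sentence are placeholders chosen for illustration, not part of the workshop materials; the same pattern (a prompt plus a temperature and a token limit) carries over to other providers.

```python
# A minimal sketch of a translation request via the OpenAI Python SDK.
# The model name below is a placeholder; substitute whichever model you use.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    temperature=0.2,       # lower values -> more deterministic output
    max_tokens=512,        # cap on the length of the reply
    messages=[
        {"role": "system",
         "content": "You are a careful translator of Classical Chinese."},
        {"role": "user",
         "content": "Translate into English: 學而時習之，不亦說乎。"},
    ],
)

print(response.choices[0].message.content)
```

Rerunning the same request at a higher temperature, or with a differently worded prompt, will often yield a noticeably different rendering, which is exactly the kind of variation the first part of the session examines.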
In the second part, we will present Dharmamitra (dharmamitra.org), an open-source model trained on low-resource languages relevant to Asian Studies. Using examples from Sanskrit, Pali, Tibetan, and Classical Chinese, we will show how even difficult ancient languages are slowly becoming tractable for MT, and what caveats to consider when using such language models to translate them. A sketch of the local-inference workflow follows below.
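For open models, the workflow is similar but can run locally. The sketch below uses the Hugging Face transformers library with a hypothetical checkpoint name ("example-org/classical-mt" is a placeholder, not an actual Dharmamitra release); consult dharmamitra.org for the real model identifiers and loading instructions.

```python
# A minimal local-inference sketch with Hugging Face transformers.
# "example-org/classical-mt" is a hypothetical checkpoint name used
# for illustration; see dharmamitra.org for the actual model releases.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "example-org/classical-mt"  # placeholder identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A short Pali sentence as sample input (chosen for illustration,
# not a claim about the model's training data).
text = "sabbe sattā bhavantu sukhitattā"

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Running a model locally like this makes its behavior reproducible and inspectable, which matters when weighing the caveats of machine-translating classical texts.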
This is a hands-on workshop with exercises. Participants should bring their own computers; no programming experience is required.
Organizers
- Marcus Bingenheimer, Temple University
- Sebastian Nehrdich, University of California, Berkeley
Chair
- Marcus Bingenheimer, Temple University
Presenters
- Marcus Bingenheimer, Temple University
- Sebastian Nehrdich, University of California, Berkeley
- Xiang Wei, Temple University