Talks and presentations

Translation, OCR, and Semantic Retrieval: Current Status and Future Outlook of the Dharmamitra Ecosystem

December 21, 2025

Symposium, 仏教学とデジタル・ヒューマニティーズ国際シンポジウム (Buddhist Studies and Digital Humanities International Symposium), Tokyo, Japan

I presented on the current status and future outlook of the Dharmamitra ecosystem, covering translation, OCR, and semantic retrieval capabilities for Buddhist texts. The symposium was held at Tokyo International Forum Hall D5 and focused on “The Significance of Humanities and Research Infrastructure Development in the DX-AI Era.”

Dharmamitra: A Platform that Makes Translation and Discovery of Buddhist Texts Possible Across Language Barriers

December 21, 2025

Symposium, 11th Symposium of Humanistic Buddhism, Taiwan

I presented on the Dharmamitra platform as part of the panel “AI in the Fo Guang Dictionary of Buddhism English Translation Project and MITRA.” The panel showcased how emerging AI tools support large-scale Buddhist translation and lexicographical research. I introduced Dharmamitra as a collaborative AI-driven platform developed by Tohoku University with the Tsadra Foundation and Berkeley AI Research Lab, which employs Large Language Models for high-quality machine translation of Sanskrit, Pali, Tibetan, and Chinese alongside vector-based semantic retrieval.

Building the Foundations of Buddhist Philology through Digital Humanities: Exploring the Potential of the Tohoku University Digital Archives (ToUDA)

December 03, 2025

Workshop, Workshop and Symposium, Center for Integrated Japanese Studies (CIJS), Tohoku University, Sendai, Japan

I presented as part of the Digital Archive Research Unit at the Center for Integrated Japanese Studies (CIJS) at Tohoku University. The workshop and symposium was co-hosted by CIJS, the Tohoku University Digital Archives Steering Committee, and the Tohoku University Library. I delivered a lecture and participated in a panel discussion on the digitization of academic resources at Tohoku University and new developments in Buddhist textual studies with AI technology.

Machine Learning and Large Language Models in Buddhist Studies: The Dharmamitra Project

November 12, 2025

Talk, Goodman Lecture Series No. 32, Khyentse Foundation, Online

Recent advances in machine learning, particularly the advent of Large Language Models (LLMs) such as ChatGPT, are rapidly shaping new ways of accessing and interpreting knowledge preserved in textual form. This has far-reaching implications for the study of the Buddhist textual tradition. Applications once considered decades away, such as the fluent machine translation of Classical Tibetan or Chinese into English, are now commonly used by scholars at all levels, from early-career students to senior researchers. This talk will provide an overview of the tools that the Dharmamitra project currently offers the Buddhist Studies community, with a focus on machine translation and cross-lingual search for philological use cases. It will also introduce the underlying technical architecture of these tools and discuss both the capabilities and limitations of the current generation of language models for philological applications.

Dharmamitra & DharmaNexus: A New Set of Digital Tools for the Philological Study of Buddhist Texts

August 18, 2025

Presentation, ELTE BTK, Kodály terem, Budapest, Hungary

Traditional philological work on Buddhist sources often consists of laborious keyword searches across disparate corpora in multiple languages, followed by manual collation of parallels, a workflow that favours stamina over insight. Dharmamitra is an open-source platform that reduces those tasks to a matter of seconds using advanced computational and deep learning methods.

Is training deep neural embeddings worth the effort? A preliminary investigation of different representation methods for semantic similarity tasks in Buddhist Chinese and related languages of the Buddhist tradition

June 20, 2025

Presentation, Online workshop "Navigating Indra’s Net: Digital Approaches to Text Reuse-based Inter-textuality in Pre-Modern East Asian Texts" at the Hanmun Lab, Ruhr-Universität Bochum, Bochum, Germany (online)

This presentation is part of an online workshop on digital approaches to intertextuality in pre-modern East Asian texts. The talk will provide a preliminary investigation of different representation methods for semantic similarity tasks in Buddhist Chinese and related languages of the Buddhist tradition.

From Sthiramati to Dharmamitra: Developing Digital Tools for a New Age of Philological Buddhist Studies

June 13, 2025

Presentation, DH International Workshop at Keio University, Tokyo, Japan

This presentation was part of a workshop at Keio University, co-organized by Kakenhi Special Promotion Research “Compilation of the Reiwa Daizokyo as a Digital Research Infrastructure - Presentation of a Research Infrastructure Construction Model for Next-Generation Humanities (JP25H00001)” and the Research Infrastructure Hub, Research and Development Project for the DX of Humanities and Social Sciences.

Machine Translation for Asian Studies

March 14, 2025

Workshop, Annual Conference of the Association for Asian Studies, Columbus, Ohio

With the advent of large language models, machine translation (MT) has become a widely used but little understood tool for research, language learning, and communication. GPT, Claude, and many other model series now allow researchers to access literature in different languages, and even to translate primary texts composed in classical languages for which few resources are available. But how should the translation output of such machines be evaluated? How can researchers decide which model is best for their own research purposes, and how can they tweak it? How will MT impact language learning, which is fundamental to Asian Studies?

MITRASearch: Building Information Retrieval Systems for Classical Asian Languages in the Age of AI

March 13, 2025

Talk, CEAL (Council on East Asian Libraries) Technology Forum, Columbus, Ohio

Recent advances in artificial intelligence and natural language processing have revolutionized information retrieval and question-answering systems. This talk introduces MITRASearch, a specialized search platform designed for exploring Buddhist literature preserved across Classical Asian languages including Chinese, Tibetan, Sanskrit, and Pāli. The system leverages multilingual approximate search capabilities to enable scholars to identify parallel passages and conduct comparative analyses across different writing systems and translations. We demonstrate how large language models integrated into the Dharmamitra project enhance user interaction with search results, facilitating dynamic exploration of these classical texts. This innovation addresses the long-standing challenge of cross-linguistic textual research in Buddhist studies and offers new possibilities for digital humanities scholarship.

Dharmamitra: Developing a Toolkit for Philological Work on Premodern Asian Low-Resource Languages

November 25, 2024

Talk, Workshop: Case studies from current research projects - Conversations on Digital Scholarly Editing, Śivadharma Project Headquarters, Palazzo Giusso, L'Orientale University of Naples, Naples, Italy

This talk was presented as part of the workshop “Case studies from current research projects - Conversations on Digital Scholarly Editing” organized by Martina Dello Buono and Florinda De Simini at L’Orientale University of Naples.

MITRA: Beyond Just Machine Translation for Premodern Asian Low Resource Languages

October 25, 2024

Talk, Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD, USA

Recent years have seen the rise of multilingual language models that achieve high levels of performance on a large number of tasks, with some of them handling hundreds of languages at once. Premodern languages are usually underrepresented in such models, leading to poor performance in downstream applications. The Dharmamitra project aims to develop a diverse set of language models to address these shortcomings for the classical Asian low-resource languages Sanskrit, Tibetan, Classical Chinese, and Pali. These models provide solutions for low-level NLP tasks such as word segmentation and morpho-syntactic tagging, as well as high-level tasks including semantic search, machine translation, and general chatbot interaction. The talk will address the individual challenges and unique characteristics of the data involved, and the strategies deployed to address them. It will also demonstrate how these different tools can be combined in an application that goes beyond simple sentence-to-sentence machine translation, providing detailed grammatical explanations and corpus-wide search to support both early-stage language learners and experienced researchers with specific demands.

MITRA: Developing Language Models for Machine Translation and Search in Buddhist Source Languages

August 29, 2024

Talk, PNC 2024 Annual Conference and Joint Meetings, Seoul, Korea

Translation and search are among the fundamental problems when researching the textual source material of Buddhist traditions. MITRA has successfully developed machine translation models to ease access to this material. When it comes to search, the Dharmamitra project approaches this problem by using semantic embeddings that enable searching for related passages in different languages, regardless of whether the answer to the query is found in a text preserved in Pāli, Sanskrit, Tibetan, or Chinese. In addition to providing researchers with this powerful search system, Dharmamitra also provides a system for the automatic detection of similar text passages within the same language and across different languages. In my talk, I will demonstrate how these tools are designed and how researchers can access them and integrate them into their workflows.