On Tuesday, April 21, 2026, ten students from the Javanese Language, Literature, and Culture Study Program, Faculty of Cultural Sciences, Universitas Gadjah Mada, participated in an activity entitled “Developed Language Corpus” organized by the Language Office of the Special Region of Yogyakarta Province. The activity took place from 8:00 a.m. to 4:00 p.m. WIB in the Pandawa 2 Room at Swiss-Belboutique Yogyakarta. The ten students were Andini Nuraini, Arfia Kholifatul Ummamah, Bernadetta Rahayunin Tyas, David Sofyan Wilaynto, Diyah Pitaloka, Inoora Putri Haliza, Marseli Dwitanti, Maysa Putri Fatihah, Pingky Putri Khairani, and Wreksi Awinanggya Pinandhita.
During the activity, participants learned about corpus development and utilization, corpus-based preservation of the Javanese language, and techniques for collecting and processing corpus data. According to Maysa Putri Fatihah, one of the participants, in an online interview on April 27, 2026, the initial meeting focused on technical explanations and task distribution. The next stage involved collecting initial Javanese corpus data over a period of 20 days, followed by data processing, such as cleaning and editing, which lasted approximately one month. These initial stages were carried out by the participants, who were referred to in the program as field assistants. The subsequent stages, such as coding, metadata input, and finalization, were managed by the Yogyakarta Language Office.
The corpus tool used in the development process was AntConc. Javanese language data were collected from a range of written sources, including old manuscripts, books, book chapters, articles, academic works, mass media, newspaper reports, social media, blogs, literary works, folktales, letters, and speeches. All texts were edited before being entered into the system as corpus material.
Maysa said she was pleased to be part of the program. “I am very grateful to be able to learn something new, especially the technical aspects of corpus data processing, which were practiced directly. Previously, I had attended a public lecture on Javanese manuscript corpora, but it was only theoretical,” she explained.
The development of a Javanese language corpus is considered a strategic step in preserving Javanese as part of Indonesia’s regional linguistic heritage. Beyond preservation, the corpus also opens opportunities for use in the digital humanities, such as linguistic research, the development of teaching materials, and the design of data-driven curricula. The program also reflects collaboration between government and educational institutions in fostering linguistic innovation that remains relevant to contemporary developments.
Author : Haryo Untoro
Editor : Haryo Untoro

