Enhancing Javanese Emotion Classification: A Comparative Study of Cross-Lingual, Supervised, and Hybrid Transfer Learning using IndoBERTweet

Authors

  • Galih Setiawan Nurohim, Universitas Bina Sarana Informatika, Indonesia
  • Heribertus Ary Setyadi, Universitas Bina Sarana Informatika, Indonesia
  • Sigit Wahyudi, Sebelas Maret University, Indonesia
  • Paulus Tofan Rapiyanta, Universitas Bina Sarana Informatika, Indonesia

DOI:

https://doi.org/10.63158/journalisi.v8i3.1657

Keywords:

Emotion classification, Javanese Ngoko, cross-lingual transfer learning, IndoBERTweet, machine translation

Abstract

This research investigates the efficacy of transfer learning for five-class emotion classification in Javanese Ngoko. A parallel Indonesian–Javanese Ngoko corpus was synthesized by machine-translating 5,400 samples from the PRDECT-ID dataset, with translation quality checked on a preliminary expert-validated sample. Using IndoBERTweet as the backbone architecture, three paradigms were evaluated with identical hyperparameters: zero-shot transfer (E1), fully supervised learning (E2), and cross-lingual transfer learning (E3). Empirical results indicate that the cross-lingual transfer (E3) setup achieved peak performance (67.5% accuracy; 0.67 weighted F1) under the evaluated dataset and experimental setting. Per-class analysis showed that positive affect (Happy) remained stable across languages, whereas negative emotions (Sadness, Fear) degraded due to lexical divergence between the two languages. Training dynamics revealed early-onset overfitting, suggesting that model capacity exceeds the density of the current dataset. This work establishes a baseline benchmark for Javanese emotion classification and provides a reproducible machine-translated parallel corpus, emphasizing the need for future validation with native-speaker data to mitigate translation bias.
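
The two metrics reported in the abstract, accuracy and weighted F1, can be illustrated with a minimal sketch. It assumes the usual support-weighted definition of weighted F1 (each class's F1 weighted by its share of the true labels); the function names and the toy labels below are hypothetical and are not taken from the paper's code or data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's true-label count."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels for illustration only (not results from the study).
y_true = ["Happy", "Happy", "Sadness", "Fear", "Fear", "Anger"]
y_pred = ["Happy", "Happy", "Fear", "Fear", "Sadness", "Anger"]

print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
```

Because the weights follow class frequency, a class with many test samples influences weighted F1 more than a rare one, which is why per-class scores are still needed to see which emotions degrade.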

Published

2026-06-27

Section

Articles