Optimized K-Means Clustering for Web Server Anomaly Detection Using Elbow Method and Security-Rule Enhancements

  • Rahmawan Bagus Trianto, Politeknik Negeri Cilacap, Indonesia
  • Muhammad Abdul Muin, Politeknik Negeri Cilacap, Indonesia
  • Cahya Vikasari, Politeknik Negeri Cilacap, Indonesia
Keywords: anomaly detection, web server logs, K-Means, Elbow Method, security rules

Abstract

Anomaly detection in web server environments is essential for identifying early indicators of cyberattacks that arise from abnormal request behavior. Traditional signature-based mechanisms often fail to detect emerging or obfuscated threats, so more adaptive analytical approaches are needed. This study proposes an optimized anomaly detection model that uses K-Means clustering, with the number of clusters selected by the Elbow Method and the feature set extended with engineered security-rule features. Two datasets were used: a small dataset of 3,399 log entries from one VPS and a large dataset of 223,554 entries collected from three VPS nodes, all sourced from local production servers of the Department of Computer and Business, Politeknik Negeri Cilacap. The preprocessing pipeline includes timestamp normalization, removal of non-informative static resources, numerical feature scaling, and TF-IDF encoding of URL paths. Domain-driven security features (entropy scores, encoded-payload indicators, abnormal status-code ratios, and request-rate deviations) were integrated to improve anomaly separability. Experiments across five model configurations show that combining the larger dataset with rule-based features significantly enhances clustering performance, achieving a Silhouette Score of 0.9136 and a Davies–Bouldin Index of 0.4712. The results support the effectiveness of combining security-rule feature engineering with unsupervised learning for early-warning threat detection in web server environments.
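
As an illustration of two of the security-rule features named in the abstract (entropy scores and encoded-payload indicators), the following is a minimal sketch, not the authors' implementation. The function names, the regular expression, and the 24-character threshold for base64-like runs are all assumptions made for this example; the abstract does not specify how the features were computed.

```python
import math
import re
from collections import Counter

# Hypothetical rule (an assumption, not taken from the paper): flag
# percent-encoded bytes or long base64-like runs in the request URL.
ENCODED_RE = re.compile(r"%[0-9A-Fa-f]{2}|[A-Za-z0-9+/]{24,}={0,2}")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def has_encoded_payload(url: str) -> int:
    """Return 1 if the URL matches the hypothetical encoded-payload rule."""
    return 1 if ENCODED_RE.search(url) else 0


def security_features(url: str) -> dict:
    """Build a small feature dictionary for one requested URL."""
    return {
        "entropy": shannon_entropy(url),
        "encoded_payload": has_encoded_payload(url),
    }


if __name__ == "__main__":
    for url in ["/about", "/index.php?id=1%27%20OR%201=1"]:
        print(url, security_features(url))
```

In a pipeline like the one described, such per-request values would be scaled and concatenated with the TF-IDF and numerical features before clustering. A rule this simple will also flag legitimate long identifiers, so the threshold would need tuning against real logs.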

Published
2025-12-10
How to Cite
Trianto, R., Muin, M., & Vikasari, C. (2025). Optimized K-Means Clustering for Web Server Anomaly Detection Using Elbow Method and Security-Rule Enhancements. Journal of Information Systems and Informatics, 7(4), 3601-3625. https://doi.org/10.63158/journalisi.v7i4.1391
Section
Articles