URL-Based Phishing Detection Using a BERT-LSTM Model

Hilman Singgih Wicaksana; Usman Ependi; Ari Muzakir

doi:10.63158/journalisi.v8i1.1543

Authors

Hilman Singgih Wicaksana, Universitas Karya Husada Semarang, Indonesia, https://orcid.org/0000-0001-9486-9601
Usman Ependi, Universitas Bina Darma, Indonesia, https://orcid.org/0000-0002-5814-4045
Ari Muzakir, Universitas Bina Darma, Indonesia, https://orcid.org/0000-0002-4560-5893

DOI:

https://doi.org/10.63158/journalisi.v8i1.1543

Keywords:

Bayesian Optimization, BERT, Cybersecurity, Deep Learning, Phishing Detection

Abstract

The rising prevalence of phishing websites poses a substantial cybersecurity threat, because malicious URLs deceive users into revealing sensitive information. This study aims to improve phishing URL detection with a deep learning model that combines Bidirectional Encoder Representations from Transformers (BERT) with Long Short-Term Memory (LSTM). In this framework, BERT is fine-tuned on a phishing URL dataset and used to produce contextual embeddings of URL tokens, while Bayesian Optimization is used to select hyperparameter settings during model training. In the experiments, the BERT-LSTM model achieves a precision of 0.9299, recall of 0.9795, F1-score of 0.9540, accuracy of 0.9756, and ROC-AUC of 0.9962. The model consistently outperforms embedding-based methods such as Word2Vec, FastText, and GloVe, as well as a classical baseline that uses Logistic Regression with TF-IDF features. These findings suggest that the contextual embeddings generated by BERT capture structural patterns in URLs, which leads to more accurate phishing detection and makes the approach a promising component for cybersecurity systems.
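
The reported metrics can be cross-checked against one another, since the F1-score is by definition the harmonic mean of precision and recall. The following is a minimal sketch of that check in Python; the function name and the tolerance are illustrative choices, not part of the study, and the numeric inputs are the rounded values quoted in the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (rounded to four decimals).
reported_precision = 0.9299
reported_recall = 0.9795
reported_f1 = 0.9540

recomputed_f1 = f1_from_precision_recall(reported_precision, reported_recall)

# The inputs are rounded, so compare with a small tolerance
# (an assumed value chosen for this illustration).
consistent = abs(recomputed_f1 - reported_f1) < 5e-4

print(f"recomputed F1 = {recomputed_f1:.4f}, consistent = {consistent}")
```

Under this tolerance the recomputed value agrees with the reported F1-score, which is what one would expect if all three figures come from the same confusion matrix.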
