URL-Based Phishing Detection Using a BERT-LSTM Model
DOI: https://doi.org/10.63158/journalisi.v8i1.1543
Keywords: Bayesian Optimization, BERT, Cybersecurity, Deep Learning, Phishing Detection
Abstract
The rising prevalence of phishing websites poses substantial cybersecurity threats by deceiving users into revealing sensitive information through malicious URLs. This study aims to improve phishing URL detection with a deep learning model that combines Bidirectional Encoder Representations from Transformers (BERT) with Long Short-Term Memory (LSTM). In this framework, BERT is fine-tuned on a phishing URL dataset and used as a contextual embedding layer that represents URL tokens, while Bayesian Optimization is used to search for well-performing hyperparameter settings during training. Experimental results show that the BERT-LSTM model achieves strong detection performance, with a precision of 0.9299, recall of 0.9795, F1-score of 0.9540, accuracy of 0.9756, and ROC-AUC of 0.9962. The model consistently outperforms models built on static word embeddings (Word2Vec, FastText, and GloVe), as well as a classical baseline using Logistic Regression with TF-IDF features. These findings suggest that the contextual embeddings generated by BERT effectively capture structural patterns in URLs, leading to more accurate phishing detection and offering a promising approach for strengthening cybersecurity systems.
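The reported metrics are not independent: the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two figures above. The following is a minimal pure-Python sketch, not the authors' evaluation code; the function names and the toy confusion-matrix counts are illustrative assumptions, with phishing treated as the positive class.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary-classification metrics from confusion-matrix counts.

    The positive class is assumed to be 'phishing'.
    """
    precision = tp / (tp + fp)  # share of URLs flagged as phishing that really are
    recall = tp / (tp + fn)     # share of phishing URLs that get flagged
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Consistency check on the precision and recall reported in the abstract.
print(round(f1_score(0.9299, 0.9795), 4))  # 0.9541
```

The recomputed 0.9541 agrees with the reported 0.9540 to within the rounding of the inputs (the published F1 was presumably computed from unrounded precision and recall). ROC-AUC cannot be checked this way, because it depends on the model's scores across all decision thresholds rather than on a single confusion matrix.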
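The abstract states that Bayesian Optimization selects the hyperparameters but does not name the library, surrogate model, or search space. Purely to illustrate the technique, the sketch below assumes the textbook variant: a Gaussian-process surrogate with an RBF kernel and the expected-improvement (EI) acquisition function, searching one hypothetical hyperparameter rescaled to [0, 1]. The toy objective is an assumed stand-in for a validation score such as F1; in a real run each call would train and evaluate the BERT-LSTM model.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def cholesky(m):
    """Lower-triangular L with L @ L.T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def forward(low, b):
    """Solve low @ x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def backward(low, b):
    """Solve low.T @ x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_gp(xs, ys, noise=1e-6):
    """Factor the kernel matrix once; alpha = K^{-1} y."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    return low, backward(low, forward(low, ys))


def predict(xs, low, alpha, x):
    """GP posterior mean and standard deviation at a single point x."""
    kx = [rbf(x, a) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(kx, alpha))
    v = forward(low, kx)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """EI for maximisation under a Gaussian predictive distribution."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def bayes_opt(objective, lo, hi, n_init=3, n_iter=10, grid=200):
    """Maximise a 1-D black-box objective over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n_init - 1) for i in range(n_init)]
    ys = [objective(x) for x in xs]
    candidates = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    for _ in range(n_iter):
        # Standardise observed scores so the unit-variance GP prior fits them.
        mu = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mu) / sd for y in ys]
        low, alpha = fit_gp(xs, zs)
        best = max(zs)
        untried = [c for c in candidates if c not in xs]
        # Spend the next (expensive) evaluation where EI is highest.
        nxt = max(untried, key=lambda c: expected_improvement(
            *predict(xs, low, alpha, c), best))
        xs.append(nxt)
        ys.append(objective(nxt))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]


if __name__ == "__main__":
    # Hypothetical stand-in for "validation score vs. one hyperparameter".
    best_x, best_score = bayes_opt(lambda h: -(h - 0.3) ** 2, 0.0, 1.0)
    print(f"best h = {best_x:.3f}, score = {best_score:.5f}")
```

Each iteration refits the surrogate to every score observed so far and places the next training run where EI balances a high predicted score against high uncertainty, which is why this approach usually needs fewer training runs than a grid search. Dedicated tools (for example scikit-optimize's `gp_minimize`) additionally handle multi-dimensional and categorical search spaces and fit the kernel's own hyperparameters, which this sketch omits.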
License
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not been previously published, and is not under consideration for publication elsewhere.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights to the publisher, including commercial rights.