MACHINE LEARNING IN CYBERSECURITY: COMPREHENSIVE ANALYSIS AND DETECTION OF URL PHISHING ATTACKS
Abstract
In the face of escalating cyber threats, particularly phishing attacks, this research provides a comprehensive analysis of machine learning techniques for detecting phishing URLs. The study uses a curated dataset of 10,000 webpages, evenly split between phishing and legitimate sites and collected between January 2015 and June 2017. Features are extracted with Selenium WebDriver, which goes beyond conventional data collection methods. A broad range of machine learning algorithms was evaluated. These included ensemble methods such as the Random Forest Classifier and XGBoost, as well as more traditional models such as Logistic Regression and Gaussian Naive Bayes (GaussianNB). Performance was assessed using accuracy, precision, recall, and F1-score. The ensemble models performed best. XGBoost achieved an accuracy of 96.0% and an F1-score of 96.0%, the strongest results among the models tested. Beyond providing a thorough comparison of machine learning methods, this research shows that advanced ensemble techniques outperform traditional models on this cybersecurity task. It also points to future work on deep learning and on real-time deployment of these models, underscoring the potential of machine learning to strengthen defenses against continually evolving cyber threats.