Vol. 3, No. 3, March 2024

Exploiting the Capabilities of Classifiers to Examine a Website Defacement Data Set

Authors

Dept. of Information Systems, College of Science & Arts-Alnamas, University of Bisha, Saudi Arabia

https://orcid.org/0000-0003-2375-6911

Dept. of Science and Lab. Technology, College of Science and Technology, Jigawa State Polytechnic Dutse, Nigeria

Dept. of Information Systems, College of Science & Arts-Alnamas, University of Bisha, Saudi Arabia

Dept. of Computer Science, College of Computer Science and Information Tech., Jazan University, Saudi Arabia

Educational Technology Dept., College of Education, University of Bisha, Saudi Arabia

Abstract

Website defacement is the illegal act of electronically altering a website. In this paper, the capabilities of robust machine learning classifiers are exploited to select the best input feature set for evaluating a website’s defacement risk. A defacement mining data set was obtained from Zone-H, a private organization, and a sample of 93,644 data points was pre-processed and used for modelling. Using multi-dimensional features as input, extensive modelling computations were carried out to determine which outputs gave the best performance. Reason and hackmode contributed most to the evaluation of website defacement and were therefore chosen as outputs. Various machine learning models were examined, and decision tree (DT), k-nearest neighbours (k-NN), and random forest (RF) were found to be the most powerful algorithms for predicting the targets. The input variables ‘domain’, ‘system’, ‘web_server’, ‘redefacement’, ‘type’, ‘def_grade’, and ‘reason/hackmode’ were tested and used to shape the final model. The key performance factors of the models were calculated using cross-validation (CV) and are reported. Scores were averaged over the hyperparameter settings (i.e., max-depth, min-sample-leaf, weight, max-features, and CV) for both targets, and the learning algorithms ranked as RF > DT > k-NN. The reason and hackmode variables were thoroughly analysed, and the average accuracy scores for the reason and hackmode targets were 0.85 and 0.585, respectively. The results represent a significant development in modelling and optimizing website defacement risk. This study addresses key cybersecurity concerns, particularly website defacement.
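The evaluation idea described in the abstract (score a classifier by cross-validation, then average the scores over several hyperparameter settings) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the records are synthetic, the category values and the labelling rule are invented for illustration, the k-NN uses a simple Hamming distance over the categorical inputs, and the fold count and neighbour counts are arbitrary assumptions. Only the column names are taken from the abstract.

```python
import random
from collections import Counter

# Input column names as listed in the abstract; "hackmode" is the target here.
FEATURES = ["domain", "system", "web_server", "redefacement", "type", "def_grade"]


def make_toy_records(n=300, seed=0):
    """Build synthetic records. All values are invented, not Zone-H data."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        row = {
            "domain": rng.choice(["com", "org", "gov"]),
            "system": rng.choice(["Linux", "Windows", "FreeBSD"]),
            "web_server": rng.choice(["Apache", "IIS", "nginx"]),
            "redefacement": rng.choice([0, 1]),
            "type": rng.choice(["single", "mass"]),
            "def_grade": rng.choice(["home", "subdir"]),
        }
        # Hypothetical rule plus about 10% label noise, so the target is learnable.
        label = "sql_injection" if row["web_server"] == "IIS" else "file_inclusion"
        if rng.random() < 0.1:
            label = "other"
        row["hackmode"] = label
        records.append(row)
    return records


def hamming(a, b):
    """Count the input features on which two records disagree."""
    return sum(a[f] != b[f] for f in FEATURES)


def knn_predict(train, query, k):
    """Majority vote among the k training records closest to the query."""
    neighbours = sorted(train, key=lambda r: hamming(r, query))[:k]
    votes = Counter(r["hackmode"] for r in neighbours)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(records, k_neighbours, folds=5):
    """Mean accuracy over `folds` interleaved train/test splits."""
    scores = []
    for i in range(folds):
        test = records[i::folds]
        train = [r for j, r in enumerate(records) if j % folds != i]
        correct = sum(
            knn_predict(train, r, k_neighbours) == r["hackmode"] for r in test
        )
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


records = make_toy_records()
# Score a few candidate neighbour counts, then average and pick the best one.
results = {k: cross_val_accuracy(records, k) for k in (1, 5, 15)}
average_score = sum(results.values()) / len(results)
best_k = max(results, key=results.get)
print(results, average_score, best_k)
```

The same loop structure would apply to a decision tree or random forest by swapping `knn_predict` for another model's fit-and-predict step; comparing the resulting average scores across models is what yields a ranking of the kind reported in the abstract.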