STUDYING THE IMPACT OF DATASET BALANCING ON MACHINE LEARNING-BASED INTRUSION DETECTION SYSTEMS FOR IOT

Abdel-Hamid, Salma; Hegazy, Islam; Aref, Mostafa; Roushdy, Mohamed

doi:10.21608/ijicis.2024.317982.1352

Document Type : Original Article

Authors

¹ Computer Science Department, Faculty of Computers and Information Technology, Future University in Egypt

² Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University

³ Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

⁴ Faculty of Computer Science and Information Technology, Innovation University, Sharqia, Egypt

Abstract

Internet of Things (IoT) networks are integral to modern life because of their pervasive connectivity and automation capabilities. Intrusion Detection Systems (IDSs) are crucial in IoT ecosystems for countering attacks that can compromise devices and disrupt essential services, and they play a vital role in maintaining the confidentiality, integrity, and availability of data within these networks. The effectiveness of an IDS depends fundamentally on the robustness of its learning algorithm and the quality of the dataset used to train it. Class imbalance, in which some classes are represented by significantly fewer instances than others, is a common challenge in real-world datasets. This paper studies the impact of balancing the BoT-IoT dataset on the performance of Machine Learning (ML)-based IDSs using three algorithms: K-Nearest Neighbors (KNN), Gradient Boosting (GB), and Support Vector Machine (SVM). To address the class imbalance problem, we apply two resampling techniques: random upsampling and the Synthetic Minority Over-sampling Technique (SMOTE). We evaluate the models using several performance metrics, including accuracy, precision, recall, and F1-score. Our experimental findings indicate that balanced datasets lead to more dependable and robust IDSs that are capable of handling real-world data with varied class distributions.
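The two resampling techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_upsample` and `smote_like`, the toy data, and the parameter `k` are all hypothetical, and the SMOTE variant is simplified to plain interpolation between a minority sample and one of its nearest same-class neighbours. In practice a library such as imbalanced-learn (`RandomOverSampler`, `SMOTE`) would typically be used instead.

```python
import random
from collections import Counter


def random_upsample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(pool)))  # exact copy of an existing sample
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, k=2, seed=0):
    """Simplified SMOTE: interpolate between a sample and a same-class neighbour."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        if len(pool) < 2:
            continue  # a single sample has no neighbour to interpolate with
        for _ in range(target - n):
            i = rng.randrange(len(pool))
            base = pool[i]
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                (p for j, p in enumerate(pool) if j != i),
                key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: six majority-class points and two minority-class points.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [5, 5], [5, 6]]
y = ["majority"] * 6 + ["minority"] * 2

X_up, y_up = random_upsample(X, y)
X_sm, y_sm = smote_like(X, y)

print(Counter(y), Counter(y_up), Counter(y_sm))
```

Random upsampling only repeats existing minority samples, whereas the SMOTE-style step creates new points on the line segments between minority samples. A common precaution is to resample only the training split so that the test set keeps its original class distribution.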

Keywords