Federated Learning Enabled IDS for Internet of Things on non-IID Data

Document Type: Original Article

Authors

1 Computer Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

2 Head of Department of Computer Systems, Faculty of Computer and Information Sciences, Ain Shams University

3 Professor, Dept. of Computer Science, College of Engineering; Department of Electrical and Computer Engineering, Tennessee Technological University, USA

4 5 El-Khalyfa El-Ma'moun Street, Abbasia

Abstract

Critical applications in IoT systems are being targeted by attackers, which makes a smart intrusion detection system (IDS) crucial for protecting them. Centralized learning is commonly used to build smart IDSs and has been successful in IoT networks. However, IoT nodes in critical applications that hold highly sensitive information are unwilling to send their data across the network and share it with another party. To address this data-privacy problem, researchers proposed distributed and federated learning. Both methods keep the data inside the local network and let the edge devices carry out the learning process. In this research, a deep learning model is proposed to classify the behavior types in the CICIDS2017 dataset under three learning approaches: centralized, distributed, and federated. The experiments were performed by splitting the dataset over ten simulated nodes. The centralized approach achieved an F-Score of 98%. Distributed learning achieved an average F-Score of 78% over the ten nodes, and federated learning an average F-Score of 89% over the ten nodes. A comparative study of the centralized, distributed, and federated approaches is presented, together with the challenges that may arise from using each. Moreover, the effects of the data distribution, the number of local training rounds, and the number of global communication rounds on the efficiency of federated learning are evaluated. The federated approach shows promising accuracy improvements over distributed learning while also preserving data privacy.
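To make the contrast between local-only ("distributed") training and federated training on non-IID data concrete, the following is a minimal toy sketch. It assumes a FedAvg-style aggregation (sample-weighted averaging of node parameters), which the abstract does not specify, and it swaps the paper's deep model and CICIDS2017 features for a hypothetical one-feature logistic regression on synthetic data. It also assumes "distributed" means each node trains only on its own data. The `rounds` and `local_epochs` parameters stand in for the global communication rounds and local training rounds mentioned above; all function names are illustrative, not the authors' code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_train(weights, data, epochs, lr=0.1):
    """Plain SGD on logistic loss for a toy 1-feature model p = sigmoid(w*x + b)."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(w * x + b) - y  # derivative of the loss w.r.t. the logit
            w -= lr * err * x
            b -= lr * err
    return [w, b]

def fed_avg(client_weights, client_sizes):
    """Average parameter vectors, weighting each node by its local sample count."""
    total = sum(client_sizes)
    return [
        sum(wts[i] * n for wts, n in zip(client_weights, client_sizes)) / total
        for i in range(len(client_weights[0]))
    ]

def federated_training(clients, rounds, local_epochs):
    """Each global round: every node trains from the global model, then the server averages."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_train(global_weights, data, local_epochs) for data in clients]
        global_weights = fed_avg(updates, [len(data) for data in clients])
    return global_weights

def accuracy(weights, data):
    """Fraction of samples where the prediction (logit > 0 means 'attack') matches the label."""
    w, b = weights
    return sum((w * x + b > 0) == (y == 1) for x, y in data) / len(data)

# Hypothetical non-IID split: label 1 ("attack") iff x > 0. Nodes 0-4 hold only
# benign samples and nodes 5-9 only attack samples, ten samples per node.
benign = [(-(j + 0.5) / 50, 0) for j in range(50)]
attack = [((j + 0.5) / 50, 1) for j in range(50)]
clients = [benign[10 * k:10 * k + 10] for k in range(5)] + [
    attack[10 * k:10 * k + 10] for k in range(5)
]
pooled = benign + attack

# Give the local-only model the same total number of epochs as the federated run.
local_only = local_train([0.0, 0.0], clients[0], epochs=20 * 5)
federated = federated_training(clients, rounds=20, local_epochs=5)
print("local-only accuracy (node 0):", accuracy(local_only, pooled))
print("federated accuracy:", accuracy(federated, pooled))
```

In this toy setting, node 0 trained alone never sees an attack and ends up labeling every sample benign (accuracy 0.5 on the pooled data), whereas the averaged model is expected to separate the two classes. This mirrors, in a deliberately exaggerated way, why skewed per-node data hurts purely local training and why averaging across nodes helps; it does not reproduce the reported F-Scores.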

Keywords