A STUDY FOR MALWARE STATIC ANALYSIS CLASSIFICATION ALGORITHMS WITH DIFFERENT FEATURES EXTRACTORS

Shehata, Sara; Hegazy, Islam; El-Horabty, El-Sayed M.

doi:10.21608/ijicis.2023.242171.1300

Document Type : Original Article

Authors

¹ CS, FCIS, ASU

² Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University

³ Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University

Abstract

Smartphones are mobile devices that connect to the Internet through Wi-Fi, cellular data networks (3G, 4G, 5G), or tethering to another device. Once connected, they can access a wide range of online services and applications, including web browsing, social media, email, video streaming, and online gaming. As more data moves through these devices, malware attacks have increased significantly. Malware causes unexpected smartphone behavior, including altered phone bill charges, intrusive advertisements, confusing messages sent to contacts, unreliable performance, the appearance of new apps, unusual data use, and a noticeable drop in battery life. Smartphone users nevertheless remain vulnerable to malware attacks. To address this problem, we built a malware detection system that classifies malicious Android apps using static analysis of the APK's metadata file. The "Drebin" dataset relies on the Android manifest file as one of its key feature sources for Android malware detection. We investigated the following algorithms for static analysis: AdaBoost, ANN, decision trees, extra trees, K-nearest neighbors, lasso regression, logistic regression, MLP, naïve Bayes, random forests, ridge regression, support vector machines, and XGB. We apply two feature extractors, TF-IDF and word2vec, to the "Drebin" dataset to reduce its dimensionality. The experimental results show that TF-IDF performs better on the "Drebin" dataset.
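
As an illustration of the TF-IDF feature extraction named in the abstract, the following is a minimal sketch in plain Python. It makes several assumptions that the abstract does not confirm: each app is represented as a list of manifest tokens (here, permission strings), the IDF uses the common smoothed form ln((1 + N) / (1 + df)) + 1, and vectors are L2-normalised. The app names and token lists are hypothetical toy data, not samples from "Drebin", and the function `tfidf` is an illustrative helper, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT taken from the Drebin dataset):
# each app is described by the tokens found in its manifest.
apps = {
    "benign_app": [
        "android.permission.INTERNET",
        "android.permission.ACCESS_WIFI_STATE",
    ],
    "suspicious_app": [
        "android.permission.INTERNET",
        "android.permission.SEND_SMS",
        "android.permission.SEND_SMS",
    ],
}


def tfidf(docs):
    """Return {doc_name: {token: weight}} with L2-normalised TF-IDF weights.

    Assumption: smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    n_docs = len(docs)

    # Document frequency: in how many apps does each token occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)  # raw term counts per app
        raw = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors[name] = {t: w / norm for t, w in raw.items()}
    return vectors


vectors = tfidf(apps)
for name, vec in vectors.items():
    print(name, {t.rsplit(".", 1)[-1]: round(w, 3) for t, w in vec.items()})
```

In this toy example, tokens that appear in fewer apps receive a higher weight than tokens shared by every app, which is the property that lets a downstream classifier focus on distinguishing manifest entries. The resulting sparse vectors could then be passed to any of the classifiers listed in the abstract.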

Keywords