SMOTE-RUS: Combined Oversampling and Undersampling Technique to Classify the Imbalanced Autism Spectrum Disorder Dataset

Document Type : Original Article

Authors

1 Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

2 Department of Information Systems, Faculty of Computers and Information Sciences, Ain Shams University, Cairo, Egypt

Abstract

An imbalanced distribution of classes is a common issue in almost all classification problems, so familiarity with class-imbalance techniques is needed to handle it. Autism spectrum disorder (ASD) affects the development of the brain, and patients with autism therefore have limited ability to interact socially with others. Predicting the genes related to ASD is thus necessary for early diagnosis and treatment. Recent studies use different machine learning techniques to predict ASD genes, but they suffer from the imbalance of the ASD dataset. In this paper, recent ASD gene prediction models are used to compare the influence of different undersampling and oversampling algorithms on model performance. Moreover, a new combined technique (SMOTE-RUS) is proposed that uses the Synthetic Minority Oversampling Technique (SMOTE) together with random undersampling (RUS) to solve the imbalanced dataset problem. SMOTE-RUS is used to build an effective model for predicting ASD genes. The results show that the proposed technique yields a more robust gene prediction model and that it outperforms models that use a single resampling technique.
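
The general idea of combining SMOTE with random undersampling can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`smote`, `random_undersample`, `smote_rus`), the toy two-dimensional data, the number of neighbours `k`, and the common `target` size for both classes are all assumptions made for illustration, since the abstract does not state the resampling ratios used.

```python
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)

    def dist2(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep n_keep majority samples chosen uniformly without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


def smote_rus(minority, majority, target, k=5, seed=0):
    """Hypothetical combination: oversample the minority class up to `target`
    with SMOTE and undersample the majority class down to `target`."""
    rng = random.Random(seed)
    n_new = max(0, target - len(minority))
    new_minority = minority + smote(minority, n_new, k, rng)
    new_majority = random_undersample(majority, min(target, len(majority)), rng)
    return new_minority, new_majority


# Toy data (assumed for illustration): 10 minority and 100 majority samples.
data_rng = random.Random(42)
minority = [[data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)] for _ in range(10)]
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]

balanced_minority, balanced_majority = smote_rus(minority, majority, target=40)
print(len(balanced_minority), len(balanced_majority))
```

Meeting in the middle, rather than oversampling the minority class all the way up to the majority size or undersampling the majority class all the way down, limits both the number of synthetic samples and the amount of discarded majority data. The appropriate target size is dataset dependent and would need to be tuned.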

Keywords