Intelligent Model for Enhancing the Bankruptcy Prediction with Imbalanced Data Using Oversampling and CatBoost

Document Type : Original Article

Authors

1 Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

2 Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

3 Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University

Abstract

Bankruptcy prediction is one of the most significant problems in financial decision-making, because it helps protect financial institutions from severe risks. Most bankruptcy datasets suffer from an imbalanced distribution between output classes, which can lead to misclassification in the prediction results. This paper presents an efficient bankruptcy prediction model that handles the imbalanced-dataset problem by applying the Synthetic Minority Oversampling Technique (SMOTE) as a pre-processing step. It applies an ensemble-based machine learning classifier, namely Categorical Boosting (CatBoost), to distinguish between the active and inactive classes. Moreover, the proposed model uses three different feature selection techniques to reduce the dimensionality of the dataset and thereby increase predictive performance. The proposed model is evaluated on the most popular imbalanced bankruptcy dataset, the Polish dataset. The results demonstrate the efficiency of the proposed model, especially in terms of accuracy. The accuracies of the proposed model in predicting bankruptcy on the Polish datasets for the five years are 98%, 98%, 97%, 97% and 95%, respectively.
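
The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy two-feature data are assumptions made for illustration only. They are not taken from the paper, which does not state its implementation here, and in practice a library such as imbalanced-learn would normally be used instead.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Illustrative sketch only; names and defaults are assumptions, not the paper's code.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data (two features per company), not from the Polish dataset.
minority = [[1.0, 2.0], [2.0, 1.0], [1.5, 2.5], [3.0, 2.0]]  # e.g. bankrupt firms
n_majority = 20                                              # e.g. active firms

new_points = smote(minority, n_synthetic=n_majority - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority), "minority samples after oversampling")
```

Because every synthetic point lies between two existing minority samples, the oversampled class stays inside the region already occupied by the minority data. A classifier such as CatBoost would then be trained on the rebalanced training split only, so that the test data keeps its original class distribution.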

Keywords