Predicting Early-stage ASD using symptomatic dataset for Arab children: A comparison of K-fold cross validation and train-test approaches

Document Type : Original Article

Authors

1 Computer Science Department, Faculty of Computer and Information Sciences,

2 Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

3 Medical Studies Department, Faculty of Postgraduate Childhood Studies, Ain Shams University, Cairo, Egypt

4 Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University

Abstract

Early diagnosis of autism spectrum disorder (ASD) in toddlers is critical, yet confirming accurately that a child has ASD demands considerable clinical experience and time. Early diagnosis can help limit the development of the condition and give patients a better quality of life. To this end, many researchers have studied how to apply machine learning (ML) algorithms to build prediction models that support early diagnosis of ASD. In this research, we used a dataset that we collected, which focuses on Arab children with ASD, particularly in Egypt, to develop prediction models using ML algorithms, comparing two data-splitting approaches: 10-fold cross-validation and a train-test split. We evaluated the accuracy of Naïve Bayes, decision trees, Support Vector Machine (SVM), Logistic Regression (LR), and Artificial Neural Network (ANN). Experimental results show that cross-validation achieved an accuracy of 94.92% for Naïve Bayes, 87.81% for decision trees, 94.41% for SVM, 92.38% for LR, and 96.44% for ANN, while the train-test split achieved 93.22% for Naïve Bayes, 88.13% for decision trees, 91.52% for SVM, 89.83% for LR, and 91.52% for ANN.
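
The two evaluation protocols compared in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' pipeline: the data are synthetic binary "symptom" features, the classifier is a small Bernoulli Naïve Bayes written from scratch, the 30% test fraction is an arbitrary choice, and the helper names (`kfold_indices`, `holdout_indices`, `fit_nb`, `predict_nb`, `make_synthetic`) are hypothetical.

```python
import math
import random

def make_synthetic(n=200, n_features=10, seed=1):
    """Synthetic stand-in data: binary features that are more likely 1 for label 1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        p = 0.7 if label else 0.3
        X.append([int(rng.random() < p) for _ in range(n_features)])
        y.append(label)
    return X, y

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists for each of k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def holdout_indices(n, test_fraction=0.3, seed=0):
    """Single shuffled train-test split; test_fraction is an assumed value."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def fit_nb(X, y):
    """Bernoulli Naive Bayes with Laplace smoothing: class -> (log prior, P(feature=1))."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = math.log(len(rows) / len(X))
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[c] = (prior, probs)
    return model

def predict_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def score(c):
        prior, probs = model[c]
        return prior + sum(math.log(p if v else 1 - p) for v, p in zip(x, probs))
    return max(model, key=score)

def accuracy(X, y, train, test):
    """Train on the train indices and return the fraction of test rows predicted correctly."""
    model = fit_nb([X[i] for i in train], [y[i] for i in train])
    hits = sum(predict_nb(model, X[i]) == y[i] for i in test)
    return hits / len(test)

X, y = make_synthetic()

# 10-fold cross-validation: every sample is tested exactly once; report the mean.
cv_scores = [accuracy(X, y, tr, te) for tr, te in kfold_indices(len(X), k=10)]
cv_accuracy = sum(cv_scores) / len(cv_scores)

# Train-test split: a single held-out portion is tested once.
train, test = holdout_indices(len(X), test_fraction=0.3)
holdout_accuracy = accuracy(X, y, train, test)

print(f"10-fold CV accuracy: {cv_accuracy:.4f}")
print(f"Train-test accuracy: {holdout_accuracy:.4f}")
```

The practical difference is that cross-validation averages over ten test folds that together cover the whole dataset, whereas the train-test estimate depends on which samples happen to fall in the single held-out set, so on a small dataset it tends to vary more with the random seed.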

Keywords