Integration of Deep Learning Models for Enhanced Classification of Viral DNA Sequences Across Specific Viruses and Viral Families

Document Type : Original Article

Authors

1 Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University

2 Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University

3 Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, 11566, Egypt

Abstract

Genomic bioinformatics is continually challenged by the need for precise classification of viral DNA sequences, which is crucial for developing diagnostic and therapeutic strategies during viral outbreaks. This study presents a comprehensive approach that integrates two distinct deep learning models, a Genetic Algorithm (GA)-optimized Convolutional Neural Network (CNN) hybrid model and a CNN-Extreme Learning Machine (ELM) model, to enhance the classification of viral DNA sequences across specific viruses and viral families.
A comprehensive data preprocessing strategy is employed, in which both datasets undergo k-mer, label, and one-hot vector encoding. This allows for a uniform, comparative analysis across models and datasets. When applied to the more generic viral family dataset, the optimized GA-CNN demonstrates good adaptability, achieving an accuracy of 95.88% and outperforming the CNN-ELM. In contrast, the CNN-ELM, when tested on the specific virus dataset, maintains robust feature extraction capabilities and trains faster, but its accuracy of 92.7% is lower than that of the optimized GA-CNN model.
Training times are also compared. The CNN-ELM model is notably more efficient, training 34% faster than the GA-CNN. Moreover, both models are applied to the new generic dataset and compared with other deep learning models. The GA-CNN outperforms the other models, achieving the highest classification accuracy of 95.88%.
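
The preprocessing chain named in the abstract (k-mer, label, and one-hot vector encoding) can be illustrated with a minimal sketch. This is not the authors' pipeline: the k-mer length of 3, the stride of 1, the A/C/G/T alphabet, the handling of unknown k-mers, and all function names below are assumptions chosen for illustration only.

```python
from itertools import product


def kmers(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1, assumed)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Assign an integer label to every possible k-mer over the alphabet."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def label_encode(tokens, vocab):
    """Map each k-mer to its integer label; unknown k-mers (e.g. with N) get -1."""
    return [vocab.get(t, -1) for t in tokens]


def one_hot(labels, size):
    """Expand integer labels into one-hot rows; label -1 yields an all-zero row."""
    rows = []
    for lab in labels:
        row = [0] * size
        if lab >= 0:
            row[lab] = 1
        rows.append(row)
    return rows


# Hypothetical toy sequence, not taken from either dataset in the study.
vocab = build_vocab(k=3)
tokens = kmers("ACGTN", k=3)           # three overlapping 3-mers
labels = label_encode(tokens, vocab)   # the 3-mer containing N maps to -1
matrix = one_hot(labels, len(vocab))   # one row per k-mer, 64 columns
```

Under these assumptions, each sequence becomes a matrix with one row per k-mer and 4^k columns, which is the kind of fixed-width input a CNN front end can consume after padding or truncating sequences to a common length.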

Keywords