Comparative Study on Feature Selection Methods for Protein

Document Type : Original Article

Authors

1 Faculty of Computer and Information Sciences, Ain Shams University

2 Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

3 Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Abstract

The automated, high-throughput identification of protein function is one of the main problems in computational biology, and predicting a protein's structure is a crucial step in it. In recent years, a wide range of approaches for predicting protein structure has been proposed. They fall into two groups: sequence-based and database-based. Sequence-based methods seek to identify the principles behind protein structure and extract informative features from amino acid sequences. Database-based methods mine pre-existing public annotation databases. This study focuses on the sequence-based approach and uses amino acid sequences to predict protein activity. The amino acid composition approach, the amino acid tuple approach, and several optimization algorithms were compared. Different protein sequence data sets were used in our experiments, and five classifiers were tested. The best accuracy was 98% under 10-fold cross-validation, the highest performance on the Human dataset.
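
As an illustration of the two feature representations named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the 20 standard amino acids, treats a "tuple" as a contiguous k-residue window (k = 2 by default), and normalises counts to frequencies; the function names and these choices are assumptions made for the example.

```python
from itertools import product

# Assumption: features are defined over the 20 standard amino acids only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard amino acid (20 features)."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: residues.count(aa) / total for aa in AMINO_ACIDS}


def tuple_composition(sequence, k=2):
    """Return the relative frequency of each contiguous k-residue tuple (20**k features).

    Windows that contain a non-standard symbol are skipped rather than repaired.
    """
    seq = sequence.upper()
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=k)}
    counted = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            counted += 1
    if counted == 0:
        return {key: 0.0 for key in counts}
    return {key: value / counted for key, value in counts.items()}


if __name__ == "__main__":
    toy = "ACCA"  # toy sequence, not taken from any dataset in the study
    composition = amino_acid_composition(toy)
    dipeptides = tuple_composition(toy, k=2)
    print(composition["A"], composition["C"])
    print(len(dipeptides), dipeptides["AC"], dipeptides["CC"], dipeptides["CA"])
```

Either dictionary can be flattened in a fixed key order to give a feature vector for a classifier. The tuple representation grows as 20^k, which is one reason a feature selection step is typically applied before classification.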

Keywords