DATA CLEANING TOOL: USAGE OF FUZZY ROUGH SET THEORY AS MACHINE LEARNING PRE-PROCESSING

Document Type: Original Article

Authors

1 Information System Department Information Technology Department Faculty of Computers and Information, Mansoura University-Egypt

2 Information System Department Faculty of Computers and Information, Mansoura University-Egypt

3 Information Technology Department Faculty of Computers and Information, Mansoura University-Egypt

Abstract

Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or
trends, and is likely to contain many errors. Data preprocessing is a crucial phase in the data mining
process that involves techniques to resolve such issues. Feature selection is a popular data
preprocessing procedure that omits attributes from decision systems while still
maintaining the ability of those decision systems to distinguish different decision classes. A popular way to
evaluate attribute subsets with respect to this criterion is based on the notion of dependency degree. In
this paper, we conduct an experimental study using the generalized classical rough set framework for
data-based attribute selection and reduction, based on the notion of fuzzy decision reducts, to evaluate
the viability of fuzzy rough feature subset selection. Experimental results show that overall optimization
can be achieved with an average accuracy reduction of ±10.7%, against high reduction rates ranging
from 36% to 97% over attributes and from 1.7% to 44% over instances.
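
To illustrate the dependency degree mentioned in the abstract, the following is a minimal sketch of the classical (crisp) rough set version only, gamma_B(D) = |POS_B(D)| / |U|, where the positive region collects the objects whose equivalence class under the attribute subset B maps to a single decision class. It is not the fuzzy rough generalization studied in the paper, and the function name, the row layout, and the toy decision system are assumptions made for illustration.

```python
from collections import defaultdict


def dependency_degree(rows, attrs, decision):
    """Crisp rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|.

    rows     -- list of dicts, one per object (hypothetical layout)
    attrs    -- the conditional attribute subset B
    decision -- name of the decision attribute D
    """
    if not rows:
        raise ValueError("rows must be non-empty")

    # Group objects into equivalence classes by their values on attrs.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        classes[key].append(row[decision])

    # A class belongs to the positive region when all of its objects
    # share the same decision value.
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)


# Hypothetical toy decision system, not data from the paper.
toy = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
]

print(dependency_degree(toy, ["a", "b"], "d"))
print(dependency_degree(toy, ["b"], "d"))
```

In this crisp setting, an attribute subset that reaches the same dependency degree as the full attribute set, and from which no attribute can be dropped without lowering it, is a decision reduct. Fuzzy rough approaches replace the equivalence classes with graded similarity relations.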