Information Theoretic Fuzzy Approach to Knowledge Discovery in Databases
Professor Mark Last
Dept. of Computer Science and Engineering, University of South Florida
October 13, 2000
11 a.m.
Fuller Labs 320
Abstract
A unified, information-theoretic fuzzy approach to knowledge discovery in databases (KDD) is presented. The approach is aimed at automating the KDD tasks of feature selection, rule induction, classification, estimation, and data cleaning. We model the relationship between the input (predictive) and the target (dependent) attributes by a multilevel information-theoretic fuzzy network (IFN). A stepwise forward procedure is applied to determine the maximal number of hidden layers providing a statistically significant decrease in the conditional entropy of a target attribute. The network construction procedure has a built-in feature selection capability, since each hidden layer of the IFN is associated with a single input attribute. The run time of the algorithm is polynomial (quadratic) in the number of original features.
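As a rough illustration of the stepwise forward idea described above, the following Python sketch greedily adds the input attribute that most reduces the conditional entropy of the target. It is not the IFN construction procedure itself: the function names, the `min_gain` threshold (a simple stand-in for the statistical significance test mentioned in the abstract), and the toy records are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, target, given):
    """Estimate H(target | given) in bits from relative frequencies."""
    n = len(rows)
    # Counts of (values of the conditioning attributes, target value).
    joint = Counter((tuple(r[a] for a in given), r[target]) for r in rows)
    # Counts of the conditioning attribute values alone.
    cond = Counter(tuple(r[a] for a in given) for r in rows)
    return sum(c / n * log2(cond[g] / c) for (g, _), c in joint.items())


def forward_select(rows, inputs, target, min_gain=0.01):
    """Greedy forward selection driven by conditional entropy decrease.

    min_gain is a hypothetical fixed threshold used here in place of a
    statistical significance test.
    """
    selected = []
    remaining = list(inputs)
    current = conditional_entropy(rows, target, selected)
    while remaining:
        # Score every remaining attribute when added to the selected set.
        scores = {a: conditional_entropy(rows, target, selected + [a])
                  for a in remaining}
        best = min(remaining, key=scores.get)
        if current - scores[best] < min_gain:
            break  # no attribute gives a large enough entropy decrease
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected, current


# Toy records (made up for illustration): y copies a, while b is noise.
rows = [
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 1, "y": 0},
    {"a": 1, "b": 0, "y": 1},
    {"a": 1, "b": 1, "y": 1},
]
selected, remaining_entropy = forward_select(rows, ["a", "b"], "y")
```

Each step scores every remaining attribute once and at most one attribute is added per step, so the sketch performs a number of entropy evaluations that grows quadratically with the number of input attributes.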
The network connections leading to the last (target) layer represent association rules between values of input and target attributes. The connection weights can be analyzed to identify the most informative associations. The extracted rules can also be used for estimating the value of a target attribute in a new record. Data not complying with the network prediction is considered unreliable. A fuzzy-based approach is used to evaluate reliability degrees of target attributes. The data quality can be improved by removing the most unreliable data from the database or correcting it to the values predicted by the network.
The method has been applied to real-world datasets of varying dimensionality, comprising a mixture of numerical and nominal attributes. The obtained results show that the IFN approach generates accurate models, which tend to be more compact and stable than the models produced by other data mining methods, such as decision trees and the Naïve Bayes algorithm.
Host
Professor Dave Brown
