Pruning Based Interestingness of Mined Classification Patterns

Ahmed Al-Hegami
Department of Computer Science, University of Sana'a, Yemen

Abstract: Classification is an important problem in data mining. Decision tree induction is one of the most common techniques applied to solve the classification problem. Many decision tree induction algorithms have been proposed, based on different attribute selection and pruning strategies. Although the patterns induced by decision trees are easy to interpret and comprehend compared with the patterns induced by other classification algorithms, the constructed decision trees may contain hundreds or thousands of nodes, which makes them difficult for the user who examines the patterns to comprehend and interpret. For this reason, the question of how to construct a tree appropriately and how to choose good pruning criteria has long been a topic of considerable debate. The main objective of such criteria is to create a tree such that the classification accuracy on unseen data is maximized and the tree size is minimized. Most decision tree algorithms first apply a splitting criterion to construct a tree and then prune it to obtain an accurate, simple, and comprehensible tree. Even after pruning, the constructed decision tree may be extremely large and may reflect patterns that are not interesting from the user's point of view. In many scenarios, users are only interested in obtaining patterns that are interesting; thus, users may prefer a simple, interpretable, but only approximate decision tree to an accurate tree that involves a lot of detail. In this paper, we propose a pruning approach that captures the user's subjectivity in order to discover interesting patterns. The approach computes the subjective interestingness of patterns and uses it as a pruning criterion to prune away uninteresting ones. The proposed framework helps in reducing the size of the induced model and in maintaining the model. One of the features of the proposed approach is that it captures the user's background knowledge, which is monotonically augmented. The experimental results are quite promising.
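
The idea of pruning by subjective interestingness can be illustrated with a minimal sketch. It is not the measure defined in the paper: the `Rule` type, the Jaccard-based `novelty` score, the `prune_uninteresting` function, and the threshold of 0.5 are all assumptions introduced here for illustration. Each root-to-leaf path of a tree is treated as a rule, a rule is scored by how little it overlaps with what the user already knows, and rules that score below the threshold are pruned away. The background knowledge only grows, which mirrors the monotonic augmentation mentioned in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One root-to-leaf path of a decision tree (hypothetical representation)."""
    conditions: frozenset  # set of (attribute, value) tests along the path
    label: str             # class predicted at the leaf


def novelty(rule, known_rules):
    """Assumed score: 1 minus the highest Jaccard overlap between the rule's
    conditions and those of any known rule predicting the same class."""
    best_overlap = 0.0
    for known in known_rules:
        if known.label != rule.label:
            continue
        union = rule.conditions | known.conditions
        if union:
            overlap = len(rule.conditions & known.conditions) / len(union)
            best_overlap = max(best_overlap, overlap)
    return 1.0 - best_overlap


def prune_uninteresting(rules, known_rules, threshold=0.5):
    """Keep rules whose novelty reaches the threshold, then return the
    background knowledge extended with them (it never shrinks)."""
    kept = [r for r in rules if novelty(r, known_rules) >= threshold]
    updated_knowledge = set(known_rules) | set(kept)
    return kept, updated_knowledge


# Usage: the user already knows one rule; three rules are mined.
known = {Rule(frozenset({("outlook", "sunny"), ("humidity", "high")}), "no")}
mined = [
    Rule(frozenset({("outlook", "sunny"), ("humidity", "high")}), "no"),  # already known
    Rule(frozenset({("outlook", "rain"), ("wind", "strong")}), "no"),     # entirely new
    Rule(frozenset({("outlook", "sunny")}), "no"),                        # partly known
]
kept, known = prune_uninteresting(mined, known)
```

In this sketch the rule that repeats the user's existing knowledge is discarded, while the other two are kept and added to the background knowledge for later runs.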

Keywords: Knowledge discovery in databases, data mining, decision tree, domain knowledge, interestingness, novelty measure.

Received December 9, 2007; accepted March 30, 2008
