2018 Comparison of Rule Induction, Decision Trees and Formal Concept Analysis Approaches for Classification


Subject Headings: Rule Induction Algorithms; Ripper Algorithm; C4.5 Algorithm; In-Close Algorithm

Notes

Cited By

Quotes

Abstract

Rule-based learning algorithms are more transparent and easier to interpret than neural networks and deep learning algorithms. These properties make them effective for solving descriptive data mining tasks. The choice of an algorithm also depends on its ability to solve predictive tasks. The article compares the quality of solutions to binary and multiclass classification problems based on experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), and In-Close (formal concept analysis). The results of the experiments show that In-Close achieves the best classification quality of the three, while Ripper and C4.5 generate more compact rule sets.

1. Introduction

2. Rule-Based Learning Algorithms

2.1. Ripper

Ripper (Repeated Incremental Pruning to Produce Error Reduction) [4] is a rule induction algorithm based on reduced error pruning [5]. Ripper includes three stages of rule processing: building, optimization, and clean-up [6]. In the first stage, the training dataset is divided into a growing set and a pruning set. Rules are constructed on the basis of the growing set and then simplified with the help of the pruning set. At the optimization stage, all rules are reconsidered in order to reduce the error on the entire dataset. At the clean-up stage, each rule is checked to see whether it increases the Description Length (DL) of the whole rule set and the dataset; the DL quantifies the complexity of the rule sets that describe the dataset [7]. If a rule increases the DL, it is deleted. The grow/prune cycle is sketched below.
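
The following is a minimal, self-contained Python sketch of that grow/prune cycle under simplifying assumptions: categorical features, binary classification, and no optimization or DL-based clean-up stage. All names here (covers, grow_rule, prune_rule, ripper_binary) are illustrative and are not the paper's or WEKA's API.

    import random

    def covers(rule, x):
        # A rule is a list of (feature_index, value) tests, all of which must hold.
        return all(x[f] == v for f, v in rule)

    def grow_rule(grow_pos, grow_neg, n_features):
        # Greedily add the test with the best precision on the growing set,
        # until no covered negatives remain.
        rule, pos, neg = [], list(grow_pos), list(grow_neg)
        while neg:
            best = None
            for f in range(n_features):
                for v in {x[f] for x in pos}:
                    if (f, v) in rule:
                        continue
                    p = sum(covers(rule + [(f, v)], x) for x in pos)
                    n = sum(covers(rule + [(f, v)], x) for x in neg)
                    if p and (best is None or p / (p + n) > best[0]):
                        best = (p / (p + n), (f, v))
            if best is None:
                break
            rule.append(best[1])
            pos = [x for x in pos if covers(rule, x)]
            neg = [x for x in neg if covers(rule, x)]
        return rule

    def prune_rule(rule, prune_pos, prune_neg):
        # Drop trailing tests while precision on the pruning set does not drop.
        def precision(r):
            p = sum(covers(r, x) for x in prune_pos)
            n = sum(covers(r, x) for x in prune_neg)
            return p / (p + n) if p + n else 0.0
        while len(rule) > 1 and precision(rule[:-1]) >= precision(rule):
            rule = rule[:-1]
        return rule

    def ripper_binary(pos, neg, n_features, prune_frac=1/3, seed=0):
        # Learn an ordered rule list for the positive class, removing
        # covered positives after each accepted rule.
        rng = random.Random(seed)
        rules, pos, neg = [], list(pos), list(neg)
        while pos:
            rng.shuffle(pos)
            rng.shuffle(neg)
            k = max(1, int(len(pos) * (1 - prune_frac)))
            m = int(len(neg) * (1 - prune_frac))
            rule = grow_rule(pos[:k], neg[:m], n_features)
            if not rule:
                break
            rule = prune_rule(rule, pos[k:] or pos, neg[m:] or neg)
            rules.append(rule)
            pos = [x for x in pos if not covers(rule, x)]
        return rules

    # Toy usage: two categorical features (covering, reproduction).
    mammals = [("fur", "live"), ("fur", "live")]
    others = [("scales", "eggs"), ("feathers", "eggs")]
    print(ripper_binary(mammals, others, n_features=2))  # e.g. [[(0, 'fur')]]
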

To date, the Ripper algorithm is considered the state of the art in rule induction [8] and is implemented in the machine learning library WEKA under the name JRip [6].

2.2. C4.5

2.3. In-Close

3. Datasets

4. Results And Discussion

5. Conclusion

Thus, the leader in the test results is the In-Close algorithm. Furthermore, the simple voting classification scheme used with this algorithm can be improved, for example, by using information about the support of concepts and switching to a weighted voting classification scheme, as sketched below.
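
A hedged sketch of that weighted-voting idea, assuming each In-Close concept has already been turned into a classification rule with a class label and a support count (the names and data layout below are illustrative, not the paper's):

    from collections import defaultdict

    def classify_weighted(concepts, instance):
        # concepts: list of (intent, class_label, support), where intent is
        # the concept's attribute set. Every concept whose intent is
        # contained in the instance votes for its class, weighted by support.
        votes = defaultdict(float)
        for intent, label, support in concepts:
            if intent <= instance:       # all of the concept's attributes present
                votes[label] += support  # weighted vote instead of +1
        return max(votes, key=votes.get) if votes else None

    concepts = [({"fur"}, "mammal", 40),
                ({"eggs"}, "non-mammal", 25),
                ({"fur", "eggs"}, "mammal", 2)]
    print(classify_weighted(concepts, {"fur", "eggs"}))  # -> mammal (42 vs. 25)
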

The large number of concepts/rules generated by In-Close can be reduced by ranking them according to the degree to which they explain the training instances, as suggested in the JSM method of automatic hypothesis generation [19] (see the sketch below). In this way, an effective tool for solving descriptive data mining tasks can be obtained.
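
One way to realize such a reduction is a greedy ranking that repeatedly keeps the rule explaining the most not-yet-explained training instances; the set-cover formulation below is an assumption for illustration, not the JSM method itself:

    def reduce_rules(rules, train, covers, budget):
        # Keep at most `budget` rules, preferring those that explain
        # (cover) the largest number of still-uncovered training instances.
        uncovered, kept, pool = set(range(len(train))), [], list(rules)
        while uncovered and pool and len(kept) < budget:
            best = max(pool, key=lambda r: sum(covers(r, train[i]) for i in uncovered))
            if not any(covers(best, train[i]) for i in uncovered):
                break  # remaining rules explain nothing new
            kept.append(best)
            pool.remove(best)
            uncovered -= {i for i in uncovered if covers(best, train[i])}
        return kept
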

Acknowledgments

The reported study was funded by RFBR according to research project No. 16-07-00342a.

References


E. V. Kotelnikov, and V. R. Milov. (2018). "Comparison of Rule Induction, Decision Trees and Formal Concept Analysis Approaches for Classification." In: Journal of Physics: Conference Series, 1015. doi:10.1088/1742-6596/1015/3/032068