- (Kotelnikov & Milov, 2018) ⇒ E V Kotelnikov, and V R Milov. (2018). “Comparison of Rule Induction, Decision Trees and Formal Concept Analysis Approaches for Classification.” In: Proceedings of International Conference Information Technologies in Business and Industry 2018 IOP Publishing. IOP Conf. Series: Journal of Physics: Conf. Series 1015 (2018) 032068 doi:10.1088/1742-6596/1015/3/032068
Rule-based learning algorithms offer greater transparency and are easier to interpret than neural networks and deep learning algorithms. These properties make them effective for solving descriptive data mining tasks. The choice of an algorithm also depends on its ability to solve predictive tasks. The article compares classification quality on binary and multiclass problems through experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), and In-Close (formal concept analysis). The experimental results show that In-Close achieves the best classification quality in comparison with Ripper and C4.5; however, the latter two generate more compact rule sets.
2. Rule-Based Learning Algorithms
Ripper (Repeated Incremental Pruning to Produce Error Reduction) is a rule induction algorithm based on reduced error pruning. Ripper processes rules in three stages: building, optimization, and clean-up. In the first stage, the training dataset is divided into a growing set and a pruning set. Rules are constructed on the growing set and then simplified with the help of the pruning set. At the optimization stage, all rules are reconsidered in order to reduce the error on the entire dataset. At the clean-up stage, each rule is checked to see whether it increases the Description Length (DL) of the whole rule set and the dataset. The DL quantifies the complexity of the various rule sets that describe the dataset. If a rule increases the DL, it is deleted.
4. Results And Discussion
Thus, the leader in the test results is the In-Close algorithm. Furthermore, the simple voting classification scheme used for this algorithm can be improved, for example, by using information about the support of concepts and turning to a weighted voting classification scheme.
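The difference between the two voting schemes mentioned above can be shown in a short sketch. This is an illustrative assumption of how support-weighted voting might work, not the authors' code: each rule that matches an instance votes for its class, either with weight 1 (simple voting) or with a weight equal to its support (weighted voting).

```python
from collections import defaultdict

def classify(instance, rules, weighted=True):
    """rules: list of (conditions, label, support); conditions is a set of
    (feature_index, value) pairs that must all hold for the rule to fire."""
    votes = defaultdict(float)
    for conditions, label, support in rules:
        if all(instance[f] == v for f, v in conditions):
            votes[label] += support if weighted else 1.0
    # Return the class with the highest total vote, or None if no rule fires.
    return max(votes, key=votes.get) if votes else None
```

With two low-support rules for one class and a single high-support rule for another, the two schemes can disagree on the same instance, which is exactly the improvement the authors suggest exploiting.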
The large number of concepts/rules generated by In-Close can be reduced by ranking them according to the degree to which they explain the training instances, as suggested in the JSM method of automatic hypothesis generation. In this way, an effective tool for solving descriptive data mining tasks can be obtained.
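One simple way to realize the ranking idea above is a greedy set-cover heuristic: repeatedly keep the rule that explains the most still-unexplained training instances. This is a hedged sketch under that assumption; the exact explanatory-degree ranking of the JSM method is not specified in the excerpt.

```python
def rank_rules(rules, instances):
    """rules: list of (name, coverage_set) where coverage_set holds the ids of
    training instances the rule explains. Returns rule names in greedy order,
    stopping once every instance is explained or no rule adds coverage."""
    remaining = set(instances)
    order = []
    while remaining:
        name, cov = max(rules, key=lambda r: len(r[1] & remaining))
        if not cov & remaining:
            break  # no rule explains any remaining instance
        order.append(name)
        remaining -= cov
    return order
```

On a toy rule set this keeps only the few rules needed to cover the training data, discarding the redundant remainder, which is the compaction effect the authors are after.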