Subject Headings: Multilabel Classification, Bioinformatics


Hierarchical multilabel classification (HMC) is an extension of binary classification where an instance can be labelled with multiple classes that are organised in a hierarchy. A well-known application of this kind of problem is gene function prediction. A gene can have multiple functions at the same time, and these functions are hierarchically organised: a gene predicted to have a certain class should also be predicted to have all its superclasses, as given by the hierarchy. A straightforward approach to solve this problem would be to learn a binary classifier for each class separately and then to combine the predictions. However, this has several disadvantages: (1) learning is not very efficient, since a separate classifier has to be learned for each class, (2) binary classifiers have known problems with skewed class distributions and (3) the hierarchy constraint, implying that a class should be predicted along with all its superclasses, is not automatically fulfilled. The obvious alternative is to learn a single model that predicts all the different classes at once. In this paper we propose a method for learning decision trees that predicts for each instance a set of classes instead of a single class.
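
The hierarchy constraint in disadvantage (3) can be made concrete with a small sketch. It is not the decision tree method the paper proposes; it only shows why the combined output of independent per-class classifiers can violate the hierarchy, and how such an output could be repaired afterwards by adding the missing superclasses. The class names, the PARENT mapping and both function names are hypothetical, and the sketch assumes a tree-shaped hierarchy in which every class has at most one parent.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy hierarchy: each class maps to its parent class,
# with None marking a top-level class. Assumes a tree-shaped hierarchy.
PARENT: Dict[str, Optional[str]] = {
    "metabolism": None,
    "amino_acid_metabolism": "metabolism",
    "amino_acid_biosynthesis": "amino_acid_metabolism",
    "transport": None,
    "ion_transport": "transport",
}


def satisfies_hierarchy(labels: Set[str], parent: Dict[str, Optional[str]]) -> bool:
    """Return True if every predicted class comes with its direct parent.

    Checking direct parents is enough: if it holds for every label in the
    set, it holds transitively for all superclasses.
    """
    return all(parent[c] is None or parent[c] in labels for c in labels)


def close_under_superclasses(
    predicted: Iterable[str], parent: Dict[str, Optional[str]]
) -> Set[str]:
    """Add all superclasses of each predicted class to the label set."""
    closed: Set[str] = set()
    for label in predicted:
        node: Optional[str] = label
        # Walk up towards the root; stop early at a class that was already
        # added, because its own superclasses were added at that time.
        while node is not None and node not in closed:
            closed.add(node)
            node = parent[node]
    return closed


# Raw output of independent binary classifiers (hypothetical): two leaf
# classes were predicted, but none of their superclasses.
raw = {"amino_acid_biosynthesis", "ion_transport"}
repaired = close_under_superclasses(raw, PARENT)
print(satisfies_hierarchy(raw, PARENT))
print(satisfies_hierarchy(repaired, PARENT))
print(sorted(repaired))
```

A single model that predicts the whole class set at once can build this constraint into learning, which avoids the need for a separate repair step of this kind.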


Citation: 2006 DecisionTreesForHierarchicalMultilabelClassification
Author: Hendrik Blockeel, Leander Schietgat, Jan Struyf, Sašo Džeroski, Amanda Clare
Title: Decision Trees for Hierarchical Multilabel Classification: A case study in functional genomics
Journal: Proceedings of 10th European Conference on Principles and Practice of Knowledge Discovery in Databases
URL: http://www.cs.kuleuven.be/~jan/papers/HMCBNAIC.pdf
DOI: 10.1007/11871637
Year: 2006