2014 A Review on Multi-Label Learning Algorithms


Subject Headings: Multi-Label Learning.

Notes

Cited By

Quotes

Abstract

Multi-label learning studies the problem where each example is represented by a single instance while associated with a set of labels simultaneously. During the past decade, a significant amount of progress has been made toward this emerging machine learning paradigm. This paper aims to provide a timely review of this area with emphasis on state-of-the-art multi-label learning algorithms. Firstly, fundamentals of multi-label learning, including its formal definition and evaluation metrics, are given. Secondly and primarily, eight representative multi-label learning algorithms are scrutinized under common notations, with relevant analyses and discussions. Thirdly, several related learning settings are briefly summarized. As a conclusion, online resources and open research problems on multi-label learning are outlined for reference purposes.

I. Introduction

IV. Related Learning Settings

There are several learning settings related to multi-label learning that are worth some discussion, such as multi-instance learning [25], ordinal classification [29], multi-task learning [10], and data stream classification [31].

Multi-instance learning (Dietterich et al., 1997) studies the problem where each example is described by a bag of instances while associated with a single (binary) label. A bag is regarded as positive iff at least one of its constituent instances is positive. In contrast to multi-label learning, which models an object's ambiguities (complicated semantics) in the output (label) space, multi-instance learning can be viewed as modeling an object's ambiguities in the input (instance) space [113]. There have been some initial attempts toward exploiting the multi-instance representation for learning from multi-label data [109].
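
To make the bag-labeling assumption concrete, here is a minimal Python sketch (the function and the example bags are illustrative, not from the cited works): a bag's binary label is simply the disjunction of its instances' labels.

```python
def bag_label(instance_labels):
    """Multi-instance assumption: a bag is positive (+1) iff at least
    one of its constituent instances is positive, else negative (-1)."""
    return +1 if any(y == +1 for y in instance_labels) else -1

# Hypothetical bags of per-instance binary labels.
bags = [[-1, -1, +1], [-1, -1, -1], [+1, +1, -1]]
print([bag_label(b) for b in bags])  # -> [1, -1, 1]
```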

Ordinal classification [29] studies the problem where a natural ordering exists among all the class labels. In multi-label learning, we can accordingly assume an ordering of relevance on each class label to generalize the crisp membership (y_j ∈ {-1, +1}) into the graded membership (y_j ∈ {m_1, m_2, ..., m_k}, where m_1 < m_2 < ... < m_k). Therefore, graded multi-label learning accommodates the case where we can only provide vague (ordinal) rather than definite judgements on label relevance. Existing work shows that graded multi-label learning can be solved by transforming it into a set of ordinal classification problems (one for each class label), or a set of standard multi-label learning problems (one for each membership level) [12].
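
The second transformation (one standard multi-label problem per membership level) can be sketched as below; the function name and the integer grade encoding are hypothetical, and the level-m_1 problem is omitted since every grade trivially satisfies it.

```python
import numpy as np

# Hypothetical graded label matrix: rows are examples, columns are class
# labels, entries are membership grades encoded as 1 < 2 < 3
# (standing in for m1 < m2 < m3).
Y_graded = np.array([[1, 3, 2],
                     [2, 1, 1],
                     [3, 2, 1]])

def per_level_problems(Y, num_grades):
    """Reduce graded multi-label learning to standard multi-label
    problems, one per membership level: at level t, a label counts as
    relevant iff its grade is at least m_t."""
    return {t: (Y >= t).astype(int) for t in range(2, num_grades + 1)}

for t, Y_t in per_level_problems(Y_graded, 3).items():
    print(f"level m{t}:\n{Y_t}")
```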

Multi-task learning [10] studies the problem where multiple tasks are trained in parallel, such that the training information of related tasks is used as an inductive bias to help improve the generalization performance of the other tasks. Nonetheless, there are some essential differences between multi-task learning and multi-label learning to be noticed. Firstly, in multi-label learning all the examples share the same feature space, while in multi-task learning the tasks may live in the same or in different feature spaces. Secondly, in multi-label learning the goal is to predict the label subset associated with an object, whereas the purpose of multi-task learning is to have multiple tasks learned well simultaneously; it is not concerned with which task subset should be associated with an object (if we regard each label as a task), since it generally assumes that every object is involved in all tasks. Thirdly, in multi-label learning it is not rare (yet demanding) to deal with a large label space [90], while in multi-task learning it is not reasonable to consider a huge number of tasks. Nevertheless, techniques for multi-task learning might be used to benefit multi-label learning [56].

Data stream classification [31] studies the problem where real-world objects are generated online and must be processed in a real-time manner. Nowadays, streaming data with a multi-label nature widely exist in real-world scenarios such as instant news, emails, microblogs, etc. [70]. As is usual for streaming data analysis, the key factor in effectively classifying multi-label data streams is how to deal with the concept drift problem. Existing works model concept drift by updating the classifiers significantly whenever a new batch of examples arrives [68], by adopting the fading assumption that the influence of past data gradually declines as time evolves [53], [78], or by maintaining a change detector that raises an alert whenever a concept drift is detected [70].
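
As a rough illustration of the fading assumption, the sketch below maintains exponentially decayed per-label relevance frequencies over a stream of label batches. The class and its parameters are invented for illustration and do not correspond to any specific method from [53] or [78].

```python
import numpy as np

class FadingLabelStats:
    """Per-label relevance frequencies on a multi-label stream under the
    fading assumption: statistics from past batches decay by a constant
    factor 0 < decay < 1 whenever a new batch arrives (sketch only)."""

    def __init__(self, num_labels, decay=0.9):
        self.decay = decay
        self.counts = np.zeros(num_labels)  # faded relevance counts
        self.total = 0.0                    # faded number of examples seen

    def update(self, Y_batch):
        # Y_batch: (batch_size, num_labels) binary relevance matrix.
        self.counts = self.decay * self.counts + Y_batch.sum(axis=0)
        self.total = self.decay * self.total + len(Y_batch)

    def frequencies(self):
        return self.counts / max(self.total, 1e-12)

stats = FadingLabelStats(num_labels=3)
stats.update(np.array([[1, 0, 1], [0, 0, 1]]))
stats.update(np.array([[0, 1, 1]]))  # older batch now weighted by 0.9
print(np.round(stats.frequencies(), 2))
```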

V. Conclusion

In this paper, the state of the art of multi-label learning is reviewed in terms of paradigm formalization, learning algorithms, and related learning settings. In particular, instead of trying to go through all the learning techniques within the confined space, which would lead to only abridged introductions, we choose to elaborate the algorithmic details of eight representative multi-label learning algorithms, with references to other related works. Some online resources for multi-label learning are summarized in Table III, including academic activities (tutorials, workshops, a special issue), publicly available software, and data sets.

As discussed in Section II-A2, although the idea of exploiting label correlations has been employed by various multi-label learning techniques, there has not been any formal characterization of the underlying concept, nor any principled mechanism for the appropriate usage of label correlations. Recent research indicates that correlations among labels might be asymmetric, i.e. the influence of one label on another is not necessarily the same in the reverse direction [42], or local, i.e. different instances share different label correlations, with few correlations being globally applicable [43]. Nevertheless, a full understanding of label correlations, especially for scenarios with large output spaces, remains the holy grail of multi-label learning.
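
As a toy illustration of why label correlations can be asymmetric, the following sketch (a hypothetical helper, not the method of [42]) estimates conditional co-occurrence frequencies from a binary label matrix; the resulting matrix is generally not symmetric.

```python
import numpy as np

def conditional_cooccurrence(Y, eps=1e-12):
    """Estimate C[i, j] ~ P(label j relevant | label i relevant) from a
    binary label matrix Y (examples x labels). C is generally
    asymmetric: C[i, j] != C[j, i]."""
    Y = Y.astype(float)
    co = Y.T @ Y              # co[i, j] = #examples where both i, j hold
    support = np.diag(co)     # support[i] = #examples where label i holds
    return co / (support[:, None] + eps)

Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [0, 0, 1]])
C = conditional_cooccurrence(Y)
print(np.round(C, 2))  # e.g. C[0, 1] = 0.67 while C[1, 0] = 1.0
```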

As reviewed in Section III, multi-label learning algorithms are introduced with a focus on their algorithmic properties. One natural complement to this review would be to conduct thorough experimental studies to gain insights into the pros and cons of different multi-label learning algorithms. A recent attempt at extensive experimental comparison can be found in [62], where 12 multi-label learning algorithms are compared with respect to 16 evaluation metrics. Interestingly, though not surprisingly, the best-performing algorithm for both classification and ranking metrics turns out to be one based on ensemble learning techniques (i.e. a random forest of predictive clustering trees [52]). Nevertheless, empirical comparisons across a broad range of algorithms, or within a focused type (e.g. [ 79]), are worthwhile topics for further exploration.
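
For readers wishing to reproduce such comparisons, here is a minimal sketch of two commonly used multi-label metrics, Hamming loss and ranking loss. The paper's own definitions in Section II are not quoted above, so these follow the standard definitions; the example data are invented.

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of example-label pairs that are misclassified."""
    return float(np.mean(Y_true != Y_pred))

def ranking_loss(Y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs that are
    reversely ordered, i.e. the irrelevant label scores at least as
    high as the relevant one."""
    per_example = []
    for y, s in zip(Y_true, scores):
        rel = np.where(y == 1)[0]
        irr = np.where(y == 0)[0]
        if len(rel) == 0 or len(irr) == 0:
            continue  # pairwise loss is undefined for this example
        misordered = [s[r] <= s[i] for r in rel for i in irr]
        per_example.append(np.mean(misordered))
    return float(np.mean(per_example))

Y_true = np.array([[1, 0, 1], [0, 1, 0]])
scores = np.array([[0.9, 0.2, 0.4], [0.5, 0.6, 0.1]])
Y_pred = (scores >= 0.5).astype(int)
print(hamming_loss(Y_true, Y_pred), ranking_loss(Y_true, scores))
```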

References

(Zhang & Zhou, 2014) ⇒ Min-Ling Zhang, and Zhi-Hua Zhou. (2014). "A Review on Multi-Label Learning Algorithms." In: IEEE Transactions on Knowledge and Data Engineering. doi:10.1109/TKDE.2013.39