2010 RedefiningClassDefinitionsUsing

From GM-RKB

Quotes

Abstract

Two aspects are crucial when constructing any real-world supervised classification task: the set of classes whose distinction might be useful for the domain expert, and the set of classifications that can actually be distinguished by the data. Often, a set of labels is defined based on some initial intuition, but these labels are not the best match for the task. For example, labels have been assigned for land cover classification of the Earth, but it has been suspected that these labels are not ideal and that some classes may be best split into subclasses, whereas others should be merged. This paper formalizes this problem using three ingredients: the existing class labels, the underlying separability in the data, and a special type of input from the domain expert. We require a domain expert to specify an [math]\displaystyle{ L \times L }[/math] matrix of pairwise probabilistic constraints expressing their beliefs as to whether the [math]\displaystyle{ L }[/math] classes should be kept separate, merged, or split. This type of input is intuitive and easy for experts to supply. We then show that the problem can be solved by casting it as an instance of penalized probabilistic clustering (PPC). Our method, Class-Level PPC (CPPC), extends PPC and shows how its time complexity can be reduced from [math]\displaystyle{ O(N^2) }[/math] to [math]\displaystyle{ O(NL) }[/math] for the problem of class redefinition. We further extend the algorithm by presenting a heuristic to measure adherence to constraints and by providing a criterion for determining the model complexity (number of classes) for constraint-based clustering. We demonstrate and evaluate CPPC on artificial data and on our motivating domain of land cover classification. For the latter, an evaluation by domain experts shows that the algorithm discovers novel class definitions that are better suited to land cover classification than the original set of labels.
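One way to see where a reduction from O(N^2) to O(NL) can come from is sketched below. This is a minimal illustration under our own assumptions, not the authors' CPPC update: we assume the penalty for giving point i a (soft) assignment to cluster c is a sum over all other points j of W[y_i][y_j] times q[j][c], where y holds the original class labels, q holds soft cluster assignments, and W is an L-by-L weight matrix derived from the expert's class-level constraints. Because each weight depends only on the two class labels, the sum over N points collapses into a sum over L per-class totals. The function names, the toy data, and the sign convention (positive weight favors merging, negative favors keeping classes apart) are hypothetical.

```python
def naive_penalty(labels, q, W):
    """Instance-level sum over all pairs: O(N^2 * K) for N points, K clusters."""
    n, k = len(labels), len(q[0])
    out = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a point does not constrain itself
            w = W[labels[i]][labels[j]]  # weight depends only on the two class labels
            for c in range(k):
                out[i][c] += w * q[j][c]
    return out


def class_level_penalty(labels, q, W):
    """Same quantity in O(N * L * K) by first aggregating assignments per class."""
    n_classes, k = len(W), len(q[0])
    # S[l][c]: total soft assignment of class-l points to cluster c (one O(N*K) pass)
    S = [[0.0] * k for _ in range(n_classes)]
    for j, y in enumerate(labels):
        for c in range(k):
            S[y][c] += q[j][c]
    out = []
    for i, y in enumerate(labels):
        row = []
        for c in range(k):
            total = sum(W[y][l] * S[l][c] for l in range(n_classes))
            total -= W[y][y] * q[i][c]  # remove the i == j term that S includes
            row.append(total)
        out.append(row)
    return out


# Hypothetical toy data: 4 points, 3 original classes, 2 clusters.
labels = [0, 0, 1, 2]
q = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]
W = [[1.0, -0.5, 0.0],
     [-0.5, 1.0, 0.2],
     [0.0, 0.2, 1.0]]

fast = class_level_penalty(labels, q, W)
slow = naive_penalty(labels, q, W)
print(all(abs(a - b) < 1e-9 for ra, rb in zip(fast, slow) for a, b in zip(ra, rb)))
```

Both functions compute the same penalties; the class-level version touches each point a constant number of times per class and cluster, so its advantage grows with the ratio N/L, which is large whenever many data points share a small number of original classes.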

References

2010 RedefiningClassDefinitionsUsing
Author: Dan R. Preston, Carla E. Brodley, Roni Khardon, Damien Sulla-Menashe, Mark Friedl
Title: Redefining Class Definitions Using Constraint-based Clustering: An Application to Remote Sensing of the Earth's Surface
Journal: KDD-2010 Proceedings
DOI: 10.1145/1835804.1835908
Year: 2010