2008 UnsupervisedFeatureSelectionfor

(Boutsidis et al., 2008) ⇒ Christos Boutsidis, Michael W. Mahoney, and Petros Drineas. (2008). “[Unsupervised Feature Selection for Principal Components Analysis].” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1401903

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique, and has been widely applied on datasets in all scientific domains. We consider, both theoretically and empirically, the topic of unsupervised feature selection for PCA, by leveraging algorithms for the so-called Column Subset Selection Problem (CSSP). In words, the CSSP seeks the "best" subset of exactly k columns from an m x n data matrix A, and has been extensively studied in the Numerical Linear Algebra community. We present a novel two-stage algorithm for the CSSP. From a theoretical perspective, for small to moderate values of k, this algorithm significantly improves upon the best previously-existing results [24, 12] for the CSSP. From an empirical perspective, we evaluate this algorithm as an unsupervised feature selection strategy in three application domains of modern statistical data analysis: finance, document-term data, and genetics. We pay particular attention to how this algorithm may be used to select representative or landmark features from an object-feature matrix in an unsupervised manner. In all three application domains, we are able to identify k landmark features, i.e., columns of the data matrix, that capture nearly the same amount of information as does the subspace that is spanned by the top k "eigenfeatures."
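The CSSP objective described in the abstract can be illustrated with a short NumPy sketch. The abstract does not spell out the two stages, so the code below is only an assumed, simplified stand-in and not the authors' algorithm. A randomized stage samples candidate columns with probabilities proportional to their squared norms in the top-k right singular vectors. A deterministic stage then keeps exactly k of them by greedy column pivoting. The function names, the default candidate count `c = 4 * k`, and sampling without replacement are all hypothetical choices made for this illustration.

```python
import numpy as np


def cssp_residual(A, cols):
    """Frobenius norm of A minus its projection onto the span of the chosen columns."""
    C = A[:, cols]
    # pinv(C) @ A holds the least-squares coefficients for rebuilding A from C.
    return np.linalg.norm(A - C @ (np.linalg.pinv(C) @ A), "fro")


def pca_residual(A, k):
    """Frobenius norm error of the best rank-k approximation (top-k SVD)."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sqrt(np.sum(s[k:] ** 2))


def greedy_pivot(M, k):
    """Pick k columns of M, each time taking the column with the largest residual norm."""
    R = np.array(M, dtype=float)
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        if chosen:
            norms[chosen] = -1.0  # never pick the same column twice
        j = int(np.argmax(norms))
        chosen.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)  # remove the component along the picked column
    return chosen


def two_stage_select(A, k, c=None, seed=0):
    """Assumed two-stage selection: random candidate sampling, then pivoting."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    c = min(n, c if c is not None else 4 * k)  # hypothetical candidate count
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]  # top-k right singular vectors, shape (k, n)
    p = np.sum(Vk ** 2, axis=0)
    p = p / p.sum()  # sampling probabilities over the n columns
    # Stage 1 (randomized): draw c distinct candidate columns.
    cand = rng.choice(n, size=c, replace=False, p=p)
    # Stage 2 (deterministic): keep exactly k candidates by greedy pivoting.
    picks = greedy_pivot(Vk[:, cand], k)
    return sorted(int(cand[i]) for i in picks)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic object-feature matrix: rank-5 signal plus small noise.
    A = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 20))
    A += 0.01 * rng.normal(size=(50, 20))
    cols = two_stage_select(A, k=5)
    print("selected columns:", cols)
    print("column-subset residual:", cssp_residual(A, cols))
    print("PCA (rank-5) residual: ", pca_residual(A, 5))
```

Comparing the two printed residuals mirrors the abstract's claim in miniature: the k selected columns are judged by how close their reconstruction error comes to that of the top-k "eigenfeatures", which is always a lower bound for any choice of k columns.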

References


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2008 UnsupervisedFeatureSelectionfor	Christos Boutsidis, Michael W. Mahoney, Petros Drineas			Unsupervised Feature Selection for Principal Components Analysis				10.1145/1401890.1401903