2009 On Consistency and Sparsity for Principal Components Analysis in High Dimensions

From GM-RKB

Subject Headings: Principal Components Analysis

Notes

Cited By

Quotes

Author Keywords

Abstract

Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of [math]\displaystyle{ n }[/math] observations (or cases) of a vector with [math]\displaystyle{ p }[/math] variables. Contemporary datasets often have [math]\displaystyle{ p }[/math] comparable with or even much larger than [math]\displaystyle{ n }[/math]. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) that the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if [math]\displaystyle{ p(n)/n \to 0 }[/math]. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if [math]\displaystyle{ p(n) \gg n }[/math].
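The subset-selection idea in the abstract can be illustrated with a minimal numpy sketch. This is an assumption-laden illustration, not the paper's exact procedure: the function name, the toy data, and the fixed subset size `k` are all invented here for demonstration. It keeps the `k` coordinates with the largest sample variances, runs ordinary PCA on that reduced subset, and embeds the resulting leading eigenvector back into the full [math]\displaystyle{ p }[/math]-dimensional space.

```python
import numpy as np

def pca_on_high_variance_subset(X, k):
    """Illustrative sketch (not the paper's exact algorithm): keep the k
    coordinates with the largest sample variances, then run standard PCA
    on that subset to estimate the leading principal component."""
    # Sample variance of each of the p coordinates (rows = n observations).
    variances = X.var(axis=0)
    # Indices of the k coordinates with the largest sample variances.
    keep = np.argsort(variances)[-k:]
    # Standard PCA on the reduced n x k data matrix.
    Xk = X[:, keep] - X[:, keep].mean(axis=0)
    cov = (Xk.T @ Xk) / (Xk.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    leading = eigvecs[:, -1]            # eigenvector of the largest eigenvalue
    # Embed back into p dimensions: zeros outside the selected subset.
    xi = np.zeros(X.shape[1])
    xi[keep] = leading
    return xi

# Toy data in the p >> n regime: n = 50 observations of p = 200 variables,
# with one coordinate given an artificially large variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
X[:, 3] += 5 * rng.normal(size=50)      # inflate the variance of coordinate 3
xi = pca_on_high_variance_subset(X, k=10)
```

Because the estimated component is exactly zero outside the selected coordinates, the estimate is sparse by construction, which is what allows consistency even when [math]\displaystyle{ p(n) \gg n }[/math].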

1. Introduction

Suppose [math]\displaystyle{ \{x_i, i = 1, \ldots, n\} }[/math] is a dataset of [math]\displaystyle{ n }[/math] observations on [math]\displaystyle{ p }[/math] variables. Standard principal components analysis (PCA) looks for vectors [math]\displaystyle{ \xi }[/math] that maximize

[math]\displaystyle{ \mathrm{var}(\xi^T x_i) / \lVert \xi \rVert^2. \qquad (1) }[/math]

If [math]\displaystyle{ \xi_1, \ldots, \xi_k }[/math] have already been found by this optimization, then the maximum defining [math]\displaystyle{ \xi_{k+1} }[/math] is taken over vectors [math]\displaystyle{ \xi }[/math] orthogonal to [math]\displaystyle{ \xi_1, \ldots, \xi_k }[/math].

References

* (2009). "On Consistency and Sparsity for Principal Components Analysis in High Dimensions." doi:10.1198/jasa.2009.0121
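The sequential maximization just described has a well-known closed form: for centered data, the successive maximizers of (1) are the eigenvectors of the sample covariance matrix, ordered by decreasing eigenvalue, and the orthogonality constraint is satisfied automatically because eigenvectors of a symmetric matrix are mutually orthogonal. A minimal numpy sketch, with invented toy data:

```python
import numpy as np

# Sketch: compute the successive maximizers xi_1, ..., xi_p of (1) as the
# eigenvectors of the sample covariance matrix, in decreasing eigenvalue
# order. The data here are an illustrative toy example, not from the paper.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))           # n = 100 observations, p = 5 variables
Xc = X - X.mean(axis=0)                 # center each variable
S = (Xc.T @ Xc) / (Xc.shape[0] - 1)     # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
xis = eigvecs[:, order]                 # column j holds xi_{j+1}

def variance_ratio(xi, Xc):
    """The objective in (1): var(xi^T x) / ||xi||^2."""
    scores = Xc @ xi
    return scores.var(ddof=1) / (xi @ xi)

# Objective values attained by the successive components, in order.
ratios = [variance_ratio(xis[:, j], Xc) for j in range(5)]
```

The leading column of `xis` attains the largest objective value, equal to the largest eigenvalue of the sample covariance matrix, and each later column attains the maximum subject to orthogonality with the earlier ones.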