Dataset Dimensionality Reduction Task

A Dataset Dimensionality Reduction Task is a data transformation task that requires the creation of a lower-dimensional data representation, i.e. a mapping from a data set in a high-dimensional space to another data set with fewer data dimensions.

  • (Fodor, 2002) ⇒ Imola K. Fodor. (2002). “A Survey of Dimension Reduction Techniques.” LLNL technical report, UCRL-ID-148494.
    • QUOTE: In mathematical terms, the problem we investigate can be stated as follows: given the [math]p[/math]-dimensional random variable [math]\mathbf{x} = (x_1,\ldots,x_p)^T[/math], find a lower dimensional representation of it, [math]\mathbf{s} = (s_1,\ldots,s_k)^T[/math] with [math]k \le p[/math], that captures the content in the original data, according to some criterion. The components of [math]\mathbf{s}[/math] are sometimes called the hidden components. Different fields use different names for the [math]p[/math] multivariate vectors: the term "variable" is mostly used in statistics, while "feature" and "attribute" are alternatives commonly used in the computer science and machine learning literature.

      Throughout this paper, we assume that we have [math]n[/math] observations, each being a realization of the [math]p[/math]-dimensional random variable [math]\mathbf{x} = (x_1,\ldots,x_p)^T[/math] with mean [math]E(\mathbf{x}) = \boldsymbol{\mu} = (\mu_1,\ldots,\mu_p)^T[/math] and covariance matrix [math]E\{(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\} = \Sigma_{p \times p}[/math]. We denote such an observation matrix by [math]X = \{x_{i,j} : 1 \le i \le p, 1 \le j \le n\}[/math]. If [math]\mu_i[/math] and [math]\sigma_i = \sqrt{\Sigma_{(i,i)}}[/math] denote the mean and the standard deviation of the [math]i[/math]th random variable, respectively, then we will often standardize the observations [math]x_{i,j}[/math] by [math](x_{i,j} - \mu_i)/\sigma_i[/math], where …
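
The notation in the quote above can be illustrated with a minimal Python sketch. It treats the observation matrix as a list of p rows (one per variable) with n entries each, standardizes every variable, and then reduces p dimensions to k = 1 by projecting onto the leading principal direction. Principal component analysis is used here only as one common choice of "criterion"; the quoted passage does not prescribe it. The function names (`standardize`, `covariance`, `leading_direction`, `reduce_to_one_dimension`) are hypothetical, and dividing by n rather than n - 1 when estimating the standard deviation is an assumption, since the quote elides the estimator.

```python
import math


def standardize(X):
    """Standardize each variable (row) of a p x n observation matrix.

    Each entry x[i][j] becomes (x[i][j] - mu_i) / sigma_i, where mu_i and
    sigma_i are estimated from row i. Assumption: sigma_i divides by n.
    """
    Z = []
    for row in X:
        n = len(row)
        mu = sum(row) / n
        sigma = math.sqrt(sum((v - mu) ** 2 for v in row) / n)
        if sigma == 0:
            raise ValueError("cannot standardize a constant variable")
        Z.append([(v - mu) / sigma for v in row])
    return Z


def covariance(Z):
    """p x p covariance matrix of zero-mean rows (divides by n)."""
    p, n = len(Z), len(Z[0])
    return [
        [sum(Z[a][j] * Z[b][j] for j in range(n)) / n for b in range(p)]
        for a in range(p)
    ]


def leading_direction(C, iters=500):
    """Approximate the top eigenvector of C by power iteration.

    The asymmetric start vector lowers the chance of starting orthogonal
    to the top eigenvector; it does not rule that case out entirely.
    """
    p = len(C)
    w = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iters):
        v = [sum(C[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0:
            raise ValueError("power iteration collapsed to the zero vector")
        w = [c / norm for c in v]
    return w


def reduce_to_one_dimension(X):
    """Map p-dimensional observations to k = 1 hidden component each."""
    Z = standardize(X)
    w = leading_direction(covariance(Z))
    p, n = len(Z), len(Z[0])
    return [sum(w[i] * Z[i][j] for i in range(p)) for j in range(n)]


# Example: p = 3 variables, n = 4 observations.
X = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 0.0, 1.0, 0.0],
]
s = reduce_to_one_dimension(X)  # one value per observation
```

In this sketch each of the four observations, originally a point in three dimensions, is represented by a single number in `s`. Retaining more than one component would require additional eigenvectors, which this simple power iteration does not compute.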