Data De-Meaning Task

From GM-RKB

A Data De-Meaning Task is a data preprocessing task that subtracts the mean of each variable from its values, so that the transformed data has a mean of zero.



References

2016

  • http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html
    • QUOTE: The first and simple pre-processing approach is zero-center the data, and then normalize them, which is presented as two lines Python codes as follows:
      • X -= np.mean(X, axis = 0) # zero-center
      • X /= np.std(X, axis = 0) # normalize
    • where, X is the input data (NumIns×NumDim). Another form of this pre-processing normalizes each dimension so that the min and max along the dimension is -1 and 1 respectively. It only makes sense to apply this pre-processing if you have a reason to believe that different input features have different scales (or units), but they should be of approximately equal importance to the learning algorithm. In case of images, the relative scales of pixe
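The two pre-processing forms described in the quote above can be sketched as follows. This is a minimal illustration, not the quoted author's code: the data `X` is randomly generated and hypothetical, and the names `X_centered`, `X_standardized`, and `X_minmax` are introduced here only for the example. The sketch computes new arrays instead of updating `X` in place, since the in-place form `X -= np.mean(X, axis = 0)` raises a casting error when `X` has an integer dtype.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 instances (rows) by 3 features (columns) on different scales.
X = rng.normal(loc=[10.0, -5.0, 200.0], scale=[1.0, 20.0, 0.5], size=(100, 3))

# De-mean (zero-center): subtract each column's mean from that column.
X_centered = X - X.mean(axis=0)

# Normalize: divide each centered column by that column's standard deviation.
X_standardized = X_centered / X.std(axis=0)

# Alternative form: rescale each column so that its min is -1 and its max is 1.
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_minmax = 2.0 * (X - X_min) / (X_max - X_min) - 1.0
```

Note that the min-max form bounds each feature to [-1, 1] but does not, in general, produce a zero mean, so it is a rescaling and not a de-meaning step.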

2012

  • http://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia
    • QUOTE: In regression, it is often recommended to center the variables so that the predictors have mean 0. This makes it so the intercept term is interpreted as the expected value of $Y_i$ when the predictor values are set to their means. Otherwise, the intercept is interpreted as the expected value of $Y_i$ when the predictors are set to 0, which may not be a realistic or interpretable situation (e.g. what if the predictors were height and weight?). Another practical reason for scaling in regression is when one variable has a very large scale, e.g. if you were using population size of a country as a predictor. In that case, the regression coefficients may be on a very small order of magnitude (e.g. $10^{-6}$) which can be a little annoying when you're reading computer output, so you may convert the variable to, for example, population size in millions. The convention that you standardize predictions primarily exists so that the units of the regression coefficients are the same.
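The effect of centering on the intercept can be checked numerically. The sketch below is only an illustration under stated assumptions: the height and weight data, the coefficients used to generate `y`, and the helper `fit_ols` are all hypothetical and not taken from the quoted answer. It fits ordinary least squares with `np.linalg.lstsq` on raw and on de-meaned predictors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical predictors: height (cm) and weight (kg); a value of 0 is unrealistic for both.
height = rng.normal(170.0, 10.0, size=n)
weight = rng.normal(70.0, 12.0, size=n)
X = np.column_stack([height, weight])
# Hypothetical response built from made-up coefficients plus noise.
y = 5.0 + 0.3 * height + 0.1 * weight + rng.normal(0.0, 1.0, size=n)


def fit_ols(X, y):
    """Return (intercept, slopes) from an ordinary least-squares fit."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


# Fit once on the raw predictors and once on the de-meaned predictors.
b0_raw, slopes_raw = fit_ols(X, y)
b0_centered, slopes_centered = fit_ols(X - X.mean(axis=0), y)
```

With de-meaned predictors the slopes are unchanged, while the fitted intercept `b0_centered` equals the sample mean of `y`, that is, the predicted response when every predictor is at its mean. The raw intercept `b0_raw` is instead the predicted response at a height and weight of 0.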