Data Bin

From GM-RKB

A Data Bin is an interval that represents a range of data points that have been sorted by a Data Binning System.



References

2018a

2018b

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Data_binning Retrieved:2018-5-20.
    • Data binning or bucketing is a data pre-processing technique used to reduce the effects of minor observation errors. The original data values which fall in a given small interval, a bin, are replaced by a value representative of that interval, often the central value. It is a form of quantization.

      Statistical data binning is a way to group a number of more or less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals. It can also be used in multivariate statistics, binning in several dimensions at once.
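
The replacement of values by a bin's central value, as described in the quote above, can be illustrated with a minimal Python sketch. The function name `bin_to_centers`, the parameters `lo` and `width`, and the sample ages are assumptions made for illustration; they do not come from the quoted source.

```python
import math

def bin_to_centers(values, lo, width):
    """Replace each value with the center of the equal-width bin it falls in.

    Bins are the half-open intervals [lo + k*width, lo + (k+1)*width).
    `lo` and `width` are illustrative parameters (assumptions).
    """
    centers = []
    for v in values:
        k = math.floor((v - lo) / width)        # index of the bin containing v
        centers.append(lo + (k + 0.5) * width)  # central value of that bin
    return centers

# Hypothetical ages grouped into 10-year intervals starting at 20.
ages = [23, 27, 31, 38, 45, 52]
binned_ages = bin_to_centers(ages, lo=20, width=10)
print(binned_ages)
```

Here each age is mapped to the midpoint of its decade-wide interval, so nearby observations collapse onto the same representative value.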

2018c

2018d

  • (NMRProcFlow, 2018) ⇒ Bucketing. In: NMRProcFlow Quick Tutorial. Retrieved: 2018-05-20
    • QUOTE: An NMR spectrum may contain several thousands of points, and therefore of variables. In order to reduce the data dimensionality binning is commonly used. In binning the spectra are divided into bins (so-called buckets) and the total area within each bin is calculated to represent the original spectrum. The more simple approach consists to divide all the spectra with uniform areas width (typically 0.04 ppm). Due to the arbitrary division of peaks, one bin may contain pieces from two or more peaks which may affect the data analysis. We have chosen to implement the Adaptive, Intelligent Binning method (De Meyer et al. 2008) that attempt to split the spectra so that each area common to all spectra contains the same resonance, i.e. belonging to the same metabolite. In such methods, the width of each area is then determined by the maximum difference of chemical shift among all spectra.
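
The uniform bucketing approach mentioned in the quote above can be sketched as follows. This is only an illustration of fixed-width bucketing, not the Adaptive, Intelligent Binning method and not NMRProcFlow's code. It assumes evenly spaced spectrum points, so that the sum of intensities stands in for the area within each bucket. The function name `uniform_buckets`, the `start` parameter, and the sample data are hypothetical.

```python
import math
from collections import defaultdict

def uniform_buckets(ppm, intensity, width=0.04, start=0.0):
    """Sum intensities into uniform-width buckets along the ppm axis.

    Returns a dict mapping bucket index -> summed intensity.
    Assumption: points are evenly spaced, so the sum approximates the area.
    """
    totals = defaultdict(float)
    for shift, value in zip(ppm, intensity):
        index = math.floor((shift - start) / width)  # bucket containing this point
        totals[index] += value
    return dict(totals)

# Hypothetical spectrum fragment: chemical shifts (ppm) and intensities.
ppm = [0.01, 0.03, 0.05, 0.07, 0.09, 0.11]
intensity = [1.0, 2.0, 4.0, 1.0, 3.0, 5.0]
buckets = uniform_buckets(ppm, intensity)
print(buckets)
```

Six spectrum points are reduced to three bucket totals, which is the dimensionality reduction the quote describes. A peak that straddles a bucket boundary would be split between two buckets, which is the drawback the quote attributes to uniform division.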

2016

  • (Izrailev, 2016) ⇒ Sergei Izrailev (2016). Cut Numeric Values into Evenly Distributed Groups (PDF). Package ‘binr’ https://github.com/jabiru/binr
    • QUOTE: bins - Cuts points in vector x into evenly distributed groups (bins). bins takes 3 separate approaches to generating the cuts, picks the one resulting in the least mean square deviation from the ideal cut - length(x) / target.bins points in each bin - and then merges small bins unless exact.groups is TRUE. The 3 approaches are:
      1. Use quantiles, and increase the number of even cuts up to max.breaks until the number of groups reaches the desired number. See bins.quantiles.
      2. Start with a single bin with all the data in it and perform bin splits until either the desired number of bins is reached or there’s no reduction in error (the latter is ignored if exact.groups is TRUE). See bins.split.
      3. Start with length(table(x)) bins, each containing exactly one distinct value and merge bins until the desired number of bins is reached. If exact.groups is FALSE, continue merging until there’s no further reduction in error. See bins.merge.
      For each of these approaches, apply redistribution of points among existing bins until there’s no further decrease in error …
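
The idea of evenly distributed groups and the error measure named in the quote (mean square deviation of bin sizes from length(x) / target.bins) can be illustrated with a simplified Python sketch. This is not the binr implementation, which is an R package: the functions `equal_frequency_bins` and `mean_square_deviation` are hypothetical, and the sketch ignores ties, splitting, merging, and the redistribution step.

```python
def equal_frequency_bins(values, target_bins):
    """Split the sorted values into `target_bins` groups of near-equal size.

    Simplification (assumption): equal values are not forced into the same bin.
    """
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for k in range(target_bins):
        lo = k * n // target_bins        # first position of group k
        hi = (k + 1) * n // target_bins  # one past the last position of group k
        bins.append(ordered[lo:hi])
    return bins

def mean_square_deviation(bins, n_points, target_bins):
    """Mean squared difference between bin sizes and the ideal size."""
    ideal = n_points / target_bins
    return sum((len(b) - ideal) ** 2 for b in bins) / len(bins)

# Hypothetical data: ten points cut into three groups.
x = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
groups = equal_frequency_bins(x, target_bins=3)
error = mean_square_deviation(groups, len(x), 3)
print(groups, error)
```

With ten points and three target bins, no cut can give every bin exactly 10 / 3 points, so the error is small but nonzero. A selection step like the one the quote describes would compare this error across candidate cuts and keep the lowest.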

2008