Regression Tree Post-Pruning Algorithm


A Regression Tree Post-Pruning Algorithm is a Decision Tree Post-Pruning Algorithm that can solve a Regression Tree Post-Pruning Task.



References

2017

  • (Torgo, 2017) ⇒ Luís Torgo (2017). "Regression Trees". In: Encyclopedia of Machine Learning and Data Mining, pp. 1080-1083.
    • QUOTE: As with most nonparametric modeling techniques, regression trees may over-fit the training data, which inevitably leads to poor out-of-sample predictive performance. The standard procedure to fight this undesirable effect is to grow an overly large tree and then to use some reliable error estimation procedure to find the “best” sub-tree of this large model. This procedure is known as post-pruning a tree (Breiman et al. 1984[1]). An alternative is to stop tree growth sooner, in a process known as pre-pruning, which again needs to be guided by reliable error estimation to know when over-fitting is starting to occur. Although more efficient in computational terms, this latter alternative may stop tree growth too soon, even with look-ahead mechanisms.

       Post-pruning is usually carried out in a three-stage procedure: (i) a set of sub-trees of the initial tree is generated, (ii) some reliable error estimation procedure is used to obtain estimates for each member of this set, and (iii) some method based on these estimates is used to select one of these trees as the final tree model. Different methods exist for each of these steps. A common setup (e.g., Breiman et al. 1984) is to use error-complexity pruning to generate a sequence of nested sub-trees, whose error is then estimated by cross-validation. The final tree is selected using the x-SE rule, which starts with the sub-tree having the lowest estimated error and then selects the smallest tree whose estimated error is within x standard errors of that minimum (a frequent setting is to use 1 standard error).

      Variations on the subject of pruning regression trees include, among others, pre-pruning alternatives (e.g., Breiman and Meisel 1976[2] and Friedman 1979[3]), the use of different tree error estimators (see Torgo (1998)[4] for a comparative study and references to different alternatives), and the use of the MDL principle to guide the pruning (Robnik-Sikonja and Kononenko 1998[5]).
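
The three-stage procedure described in the quote above can be made concrete with scikit-learn, whose minimal cost-complexity pruning implements the error-complexity pruning of Breiman et al. (1984). The sketch below is illustrative rather than a reproduction of any referenced implementation: the synthetic Friedman #1 dataset, the 10-fold cross-validation setup, and all variable names are assumptions made for this example.

```python
# A minimal sketch of the three-stage post-pruning procedure, using
# scikit-learn's minimal cost-complexity pruning. Dataset and settings
# are illustrative assumptions, not taken from the referenced works.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

# Stage (i): grow an overly large tree and derive the nested sequence of
# sub-trees, indexed by the complexity parameter alpha.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[:-1]  # drop the last alpha (root-only tree)

# Stage (ii): estimate each sub-tree's error by 10-fold cross-validation.
means, std_errs = [], []
for alpha in alphas:
    mse = -cross_val_score(
        DecisionTreeRegressor(random_state=0, ccp_alpha=alpha),
        X, y, cv=10, scoring="neg_mean_squared_error",
    )
    means.append(mse.mean())
    std_errs.append(mse.std(ddof=1) / np.sqrt(len(mse)))
means, std_errs = np.asarray(means), np.asarray(std_errs)

# Stage (iii): the x-SE rule with x = 1 -- take the smallest sub-tree
# (largest alpha) whose estimated error lies within one standard error
# of the lowest estimated error.
best = means.argmin()
chosen_alpha = alphas[means <= means[best] + std_errs[best]].max()

final_tree = DecisionTreeRegressor(random_state=0, ccp_alpha=chosen_alpha)
final_tree.fit(X, y)
print(f"chosen alpha={chosen_alpha:.4f}, leaves={final_tree.get_n_leaves()}")
```

The selection step implements the x-SE rule with x = 1: among all sub-trees whose cross-validated error falls within one standard error of the minimum, the largest ccp_alpha (i.e., the smallest tree) is preferred, trading a statistically insignificant increase in estimated error for a simpler model.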

  1. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Statistics/probability series. Wadsworth & Brooks/Cole Advanced Books & Software, Belmont
  2. Breiman L, Meisel WS (1976) General estimates of the intrinsic variability of data in nonlinear regression models. J Am Stat Assoc 71:301–307
  3. Friedman JH (1979) A tree-structured approach to nonparametric multiple regression. In: Gasser T, Rosenblatt M (eds) Smoothing techniques for curve estimation. Lecture notes in mathematics, vol 757. Springer, Berlin/New York, pp 5–22
  4. Torgo L (1998) Error estimates for pruning regression trees. In: Nedellec C, Rouveirol C (eds) Proceedings of the 10th European Conference on Machine Learning, Chemnitz. LNAI, vol 1398. Springer
  5. Robnik-Sikonja M, Kononenko I (1998) Pruning regression trees with MDL. In: Proceedings of ECAI-98, Brighton