Gradient Descent Boosting-based Decision Tree (GDBT) Learning Algorithm

From GM-RKB

A Gradient Descent Boosting-based Decision Tree (GDBT) Learning Algorithm is a gradient descent boosting-based learning algorithm that applies gradient descent optimization to an ensemble of decision tree models, fitting each new tree to the errors (pseudo-residuals) of the ensemble built so far, so that each tree corrects the errors of the previous ones.
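The following is a minimal illustrative sketch of this idea (not a GM-RKB reference implementation), assuming squared-error regression and scikit-learn's DecisionTreeRegressor as the base learner; the function names fit_gbdt and predict_gbdt and the hyperparameter values are hypothetical choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_stages=100, learning_rate=0.1, max_depth=3):
    """Fit a gradient-boosted ensemble of regression trees for squared-error loss."""
    # F_0: the constant that minimizes squared error is the mean of y.
    f0 = float(np.mean(y))
    prediction = np.full(len(y), f0)
    trees = []
    for _ in range(n_stages):
        # Pseudo-residuals: negative gradient of the squared-error loss, y - F_{m-1}(x).
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)  # each new tree is trained on the current errors
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def predict_gbdt(f0, trees, X, learning_rate=0.1):
    """Sum the initial constant and the shrunken contributions of all trees."""
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)
```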



References

2024

  • (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting Retrieved:2024-3-13.
    • Gradient boosting is typically used with decision trees (especially CARTs) of a fixed size as base learners. For this special case, Friedman proposes a modification to the gradient boosting method which improves the quality of fit of each base learner.

      Generic gradient boosting at the m-th step would fit a decision tree [math]\displaystyle{ h_m(x) }[/math] to pseudo-residuals. Let [math]\displaystyle{ J_{m} }[/math] be the number of its leaves. The tree partitions the input space into [math]\displaystyle{ J_{m} }[/math] disjoint regions [math]\displaystyle{ R_{1m}, \ldots, R_{J_{m}m} }[/math] and predicts a constant value in each region. Using the indicator notation, the output of [math]\displaystyle{ h_m(x) }[/math] for input x can be written as the sum: [math]\displaystyle{ h_m(x) = \sum_{j=1}^{J_{m}} b_{jm} \mathbf{1}_{R_{jm}}(x), }[/math] where [math]\displaystyle{ b_{jm} }[/math] is the value predicted in the region [math]\displaystyle{ R_{jm} }[/math].[1]

      Then the coefficients [math]\displaystyle{ b_{jm} }[/math] are multiplied by some value [math]\displaystyle{ \gamma_m }[/math], chosen using line search so as to minimize the loss function, and the model is updated as follows: [math]\displaystyle{ F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \quad \gamma_m = \underset{\gamma}{\operatorname{arg\,min}} \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)). }[/math] Friedman proposes to modify this algorithm so that it chooses a separate optimal value [math]\displaystyle{ \gamma_{jm} }[/math] for each of the tree's regions, instead of a single [math]\displaystyle{ \gamma_m }[/math] for the whole tree. He calls the modified algorithm "TreeBoost". The coefficients [math]\displaystyle{ b_{jm} }[/math] from the tree-fitting procedure can then simply be discarded, and the model update rule becomes: [math]\displaystyle{ F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J_{m}} \gamma_{jm} \mathbf{1}_{R_{jm}}(x), \quad \gamma_{jm} = \underset{\gamma}{\operatorname{arg\,min}} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma). }[/math]
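A minimal sketch of one such per-leaf update (an assumed illustration, not Friedman's reference code) is given below, using least-absolute-deviation (LAD) loss, for which the loss-optimal [math]\displaystyle{ \gamma_{jm} }[/math] is the median of the residuals of the training instances falling in leaf [math]\displaystyle{ R_{jm} }[/math]; the helper name treeboost_step and its parameters are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def treeboost_step(X, y, current_pred, max_depth=3, learning_rate=0.1):
    """One TreeBoost-style update for least-absolute-deviation (LAD) loss."""
    residuals = y - current_pred
    # Pseudo-residuals: negative gradient of |y - F(x)| with respect to F(x).
    pseudo_residuals = np.sign(residuals)
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, pseudo_residuals)

    # Discard the fitted leaf constants b_{jm}; instead compute, per region R_{jm},
    # the loss-optimal gamma_{jm}, which for LAD loss is the median of the residuals
    # of the training instances falling in that leaf.
    leaf_of = tree.apply(X)  # leaf (region) index for each x_i
    gamma = {leaf: float(np.median(residuals[leaf_of == leaf]))
             for leaf in np.unique(leaf_of)}

    update = np.array([gamma[leaf] for leaf in leaf_of])
    new_pred = current_pred + learning_rate * update
    return new_pred, tree, gamma
```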


2009

  • http://www.dtreg.com/treeboost.htm
    • TreeBoost - Stochastic Gradient Boosting
    • "Boosting" is a technique for improving the accuracy of a predictive function by applying the function repeatedly in a series and combining the output of each function with weighting so that the total error of the prediction is minimized. In many cases, the predictive accuracy of such a series greatly exceeds the accuracy of the base function used alone.
    • The TreeBoost algorithm used by DTREG is optimized for improving the accuracy of models built on decision trees. Research has shown that models built using TreeBoost are among the most accurate of any known modeling technique.
    • The TreeBoost algorithm is functionally similar to Decision Tree Forests because it creates a tree ensemble and uses randomization during tree creation. However, a random forest builds the trees in parallel and they "vote" on the prediction; whereas TreeBoost creates a series of trees, and each tree in the series incrementally improves the prediction.
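To make this contrast concrete, here is a small usage sketch using scikit-learn's estimators as stand-ins (not DTREG's TreeBoost or Decision Tree Forest implementations); the dataset and hyperparameter values are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: trees are built independently (in parallel) and their predictions averaged.
forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)

# Stochastic gradient boosting: trees are built sequentially; subsample < 1.0 supplies the
# randomization (each tree sees a random fraction of the training rows).
boosted = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                    subsample=0.8, random_state=0).fit(X_train, y_train)

print("random forest R^2:    ", forest.score(X_test, y_test))
print("gradient boosting R^2:", boosted.score(X_test, y_test))
```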

2001

  • (Friedman, 2001) ⇒ Jerome H. Friedman. (2001). "Greedy Function Approximation: A Gradient Boosting Machine." In: The Annals of Statistics, 29(5).


  1. Note: in the case of usual CART trees, the trees are fitted using a least-squares loss, and so the coefficient [math]\displaystyle{ b_{jm} }[/math] for the region [math]\displaystyle{ R_{jm} }[/math] is simply the value of the output variable averaged over all training instances in [math]\displaystyle{ R_{jm} }[/math].
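     This is the standard least-squares argument (a short derivation added here for completeness, not part of the cited text): minimizing the in-region squared error over a single constant gives the region mean, [math]\displaystyle{ b_{jm} = \underset{b}{\operatorname{arg\,min}} \sum_{x_i \in R_{jm}} (y_i - b)^2 = \frac{1}{|R_{jm}|} \sum_{x_i \in R_{jm}} y_i, }[/math] since setting the derivative [math]\displaystyle{ -2 \sum_{x_i \in R_{jm}} (y_i - b) }[/math] to zero yields the average of the [math]\displaystyle{ y_i }[/math] in the region.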