2018 CatBoost: Unbiased Boosting with Categorical Features


Subject Headings: CatBoost, GBDT.

Notes

Cited By

Quotes

Abstract

Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDTs in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and learn best practices from studies that cast CatBoost in a positive light, as well as studies where CatBoost does not outshine other techniques, since we can learn lessons from both types of scenarios. Furthermore, as a Decision Tree-based algorithm, CatBoost is well-suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost’s effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in the literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach to cover studies related to CatBoost in a single work. This provides researchers with an in-depth understanding to help clarify proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.
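
To make the abstract's two technical points concrete (CatBoost's native handling of categorical features in heterogeneous data, and its sensitivity to hyper-parameters), the following is a minimal sketch using the CatBoost Python API. The dataset, column names, and hyper-parameter values are hypothetical and chosen only for illustration.

# Minimal sketch: CatBoost on mixed numeric/categorical data, with a small
# manual sweep over depth and learning rate to illustrate hyper-parameter
# sensitivity. All data and parameter values below are hypothetical.
from catboost import CatBoostClassifier, Pool
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical heterogeneous data: numeric and categorical columns mixed.
df = pd.DataFrame({
    "age":     [25, 32, 47, 51, 62, 23, 36, 44],
    "city":    ["NY", "LA", "NY", "SF", "SF", "LA", "NY", "SF"],
    "plan":    ["basic", "pro", "pro", "basic", "pro", "basic", "pro", "basic"],
    "churned": [0, 1, 0, 1, 0, 0, 1, 1],
})
X, y = df.drop(columns="churned"), df["churned"]
cat_features = ["city", "plan"]  # passed to CatBoost directly, no one-hot encoding needed

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
train_pool = Pool(X_train, y_train, cat_features=cat_features)
val_pool = Pool(X_val, y_val, cat_features=cat_features)

# Small manual sweep: validation loss can shift noticeably with depth and learning rate.
for depth in (4, 6):
    for lr in (0.03, 0.1):
        model = CatBoostClassifier(iterations=200, depth=depth,
                                   learning_rate=lr, verbose=False)
        model.fit(train_pool, eval_set=val_pool)
        print(depth, lr, model.get_best_score()["validation"])

In practice, the manual sweep would be replaced by a systematic search (for example, CatBoost's built-in grid_search method or scikit-learn's GridSearchCV); the point of the sketch is simply that settings such as depth and learning rate are among those a CatBoost study should tune rather than leave at defaults.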

References

Gleb Gusev, Liudmila Prokhorenkova, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin (2018). "CatBoost: Unbiased Boosting with Categorical Features."