Decision Tree Variants
In this post we discuss different decision tree variants.
There are two basic types of decision tree, plus several tree ensembles built on top of them.
- Classification Tree
- The simplest variant. See Bootstrap and Bagging for details.
- Regression Tree
- Uses variance as the splitting measure: the chosen split is the one that most reduces the target variance in the child nodes (a minimal scoring sketch follows below).
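As a minimal sketch (not a full CART implementation), the variance-reduction score for a candidate split could look like this; the helper name `variance_reduction` is mine:

```python
import numpy as np

def variance_reduction(y, left_mask):
    """Score a candidate split by how much it lowers the target variance.

    y         : 1-D array of regression targets at the current node
    left_mask : boolean array, True for samples routed to the left child
    """
    y_left, y_right = y[left_mask], y[~left_mask]
    if len(y_left) == 0 or len(y_right) == 0:
        return 0.0  # degenerate split, nothing gained
    n = len(y)
    child_var = (len(y_left) / n) * y_left.var() + (len(y_right) / n) * y_right.var()
    return y.var() - child_var  # larger is better

# Example: splitting at x <= 0.5 cleanly separates the two target clusters
y = np.array([1.0, 1.2, 0.9, 5.0, 5.2, 4.8])
x = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0])
print(variance_reduction(y, x <= 0.5))
```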
- AdaBoost with decision trees
- Similar to random forest in that it combines many trees, except the weight of each classifier and of each training sample is re-calculated during every iteration (a sketch of one boosting round follows below).
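A minimal sketch of one round of classical binary AdaBoost with a decision stump; labels are assumed to be in {-1, +1}, the weak learner is a scikit-learn tree, and the function name `adaboost_round` is mine:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_round(X, y, sample_weights):
    """One AdaBoost iteration with a depth-1 tree (stump); y must be in {-1, +1}."""
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=sample_weights)
    pred = stump.predict(X)

    # Weighted error of this round's classifier
    err = np.sum(sample_weights * (pred != y)) / np.sum(sample_weights)
    err = np.clip(err, 1e-10, 1 - 1e-10)           # guard against division by zero
    alpha = 0.5 * np.log((1 - err) / err)          # weight of this classifier in the ensemble

    # Re-weight samples: misclassified points count more in the next round
    sample_weights = sample_weights * np.exp(-alpha * y * pred)
    sample_weights /= sample_weights.sum()
    return stump, alpha, sample_weights
```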
- GBDT (Gradient Boosted Decision Tree), also known as MART (Multiple Additive Regression Trees)
- Think of it as analogous to a residual network: each new tree fits what the current ensemble still gets wrong.
- Two kinds of implementation:
- the new tree is trained on the negative gradient of the loss
- the new tree is trained on the difference (residual)
- Loop procedure (see the code sketch at the end of this GBDT section):
- initial setup: $y_{\text{label}} = y_{\text{actual}}$
- $y_n = y_{\text{label}} - y^{1}_{n-1}$, where $y^{1}_{n-1}$ is the ensemble prediction of all previous trees and $y_n$ is the new tree, learned from either the difference or the gradient
- $y^{1}_{n} = y^{1}_{n-1} + y_n$: the new tree is added to the ensemble
- $y_{\text{label}} =$ the difference (residual) or the negative gradient, which the next tree will fit
- Shrinkage version
- replace the ensemble-update step in the loop with $y^{1}_{n} = y^{1}_{n-1} + \text{step} \cdot y_n$; everything else stays the same
- requires more computational resources, since more trees are needed to reach the same training error
- helps prevent overfitting, although there is no theory proving it
- Other variants: stochastic sampling of features and bootstrap sampling of the training data.
- Reference
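Putting the loop above into code: a minimal sketch of difference-based GBDT with shrinkage for squared loss (where the residual equals the negative gradient), using shallow scikit-learn regression trees as the weak learners; `fit_gbdt`, `predict_gbdt`, and `learning_rate` (the `step` above) are my own names:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Difference-based GBDT for squared loss: each tree fits the current residual."""
    trees = []
    ensemble_pred = np.zeros(len(y), dtype=float)         # y^1_0 = 0
    for _ in range(n_trees):
        residual = y - ensemble_pred                      # y_label - y^1_{n-1}
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)                             # new tree learned from the residual
        ensemble_pred += learning_rate * tree.predict(X)  # shrinkage: y^1_n = y^1_{n-1} + step * y_n
        trees.append(tree)
    return trees

def predict_gbdt(trees, X, learning_rate=0.1):
    """Sum the shrunken predictions of all trees."""
    return learning_rate * sum(tree.predict(X) for tree in trees)
```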
- XGBoost
- Makes use of a second-order Taylor expansion of the loss (written out after the reference links below)
- The second-order information makes each boosting step more accurate, so fewer iterations are needed and training is faster
- Adds two regularization terms to keep the trees from overfitting. The objective is the training loss plus a per-tree penalty:
the loss function: $\text{Obj} = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)$
the regularization function: $\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$
- where $\gamma$ and $\lambda$ are two hyperparameters: $\gamma$ is a threshold on the gain required to make a split (it penalizes the number of leaves $T$), and $\lambda$ is a smoothing factor on the leaf weights $w_j$.
- Reference - Usage
- Reference - Induction
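To make the second-order Taylor expansion and the roles of $\gamma$ and $\lambda$ concrete, here is a sketch of the standard derivation from the XGBoost paper, with $g_i$ and $h_i$ denoting the first and second derivatives of the loss with respect to the previous prediction:

$$\text{Obj}^{(t)} \approx \sum_{i=1}^{n}\Big[g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2\Big] + \gamma T + \tfrac{1}{2}\lambda\sum_{j=1}^{T} w_j^2$$

For a fixed tree structure, the optimal weight of leaf $j$ with sample set $I_j$ is $w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$, which shows how $\lambda$ smooths the leaf weights; a split is kept only when its gain exceeds $\gamma$, which is how $\gamma$ acts as a threshold.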
- Features I don’t understand
- how parallel processing is implemented in XGBoost
- Its relationship to GBDT
- How it handles missing features
- If the 2nd-order derivative is used, why not the 3rd order?