A decision tree is an algorithm that can solve both classification and regression problems.
For regression, a decision tree works differently than for classification. In classification, the split is decided by Entropy or the Gini index; in regression, the split is decided by a criterion such as Mean Squared Error, Mean Absolute Error, friedman_mse, or the Poisson deviance.
Before diving into the algorithm, let's first learn two things.
1. Standard deviation for one attribute (feature):
- Count n (the total number of rows or values in the data or feature)
- Average (X_bar)
- Standard deviation (S)
- Coefficient of variation (CV)
The standard deviation is used for tree building (branching).
The coefficient of variation (CV) is used to decide when to stop branching; the count can be used as well (e.g. stop when a branch has fewer than a minimum number of rows).
The average is the value stored in the leaf nodes.
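The four quantities above can be sketched in a few lines of plain Python. The target values below are hypothetical (they follow the classic golf/weather "hours played" example often used to illustrate regression trees), and the population form of the standard deviation is assumed; some texts use the sample version instead.

```python
import math

# Hypothetical target values (hours played, from the classic weather example).
target = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]

n = len(target)                                  # Count (n)
x_bar = sum(target) / n                          # Average (X_bar)
# Population standard deviation (S).
s = math.sqrt(sum((x - x_bar) ** 2 for x in target) / n)
cv = (s / x_bar) * 100                           # Coefficient of variation (CV), in percent

print(f"n={n}, X_bar={x_bar:.2f}, S={s:.2f}, CV={cv:.2f}%")
```

Here a stopping rule like "stop branching when CV drops below 10%" would compare `cv` against that threshold, and `x_bar` is what the resulting leaf would predict.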
2. Standard deviation for two attributes (target and predictor): standard deviation reduction.
Standard deviation reduction (SDR) is the decrease in the standard deviation of the target after the dataset is split on an attribute: the standard deviation of the whole target minus the weighted average of the standard deviations within each branch.
Constructing the decision tree is all about finding the attribute that returns the highest standard deviation reduction.
Thanks for reading this blog :)