Decision Tree with Gini Index
If you don’t know about Entropy yet, check out this blog first.
What is Gini Index?
The Gini Index, or Gini Impurity, is calculated by subtracting the sum of the squared probabilities of each class from one. It mostly favors larger partitions and is very simple to implement.
In simple terms, it is the probability that a randomly chosen sample from the node would be misclassified if it were labeled randomly according to the class distribution in that node.
The formula:
Gini = 1 − Σᵢ (pᵢ)²
where pᵢ is the probability of class i in the node.
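Here is a minimal sketch of that formula in Python (the gini_impurity helper is just an illustrative name, not from any library):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: one minus the sum of squared class probabilities."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node scores 0; a 50/50 binary node scores the maximum of 0.5.
print(gini_impurity(["yes", "yes", "yes"]))       # 0.0
print(gini_impurity(["yes", "no", "yes", "no"]))  # 0.5
```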
After computing the Gini index, we compute Information Gain (just like with Entropy, but using Gini in its place).
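As a rough sketch (reusing the gini_impurity helper above), the gain of a split is the parent’s Gini minus the weighted average Gini of the children:

```python
def gini_gain(parent_labels, children):
    """Information gain of a split, measured with Gini impurity."""
    n = len(parent_labels)
    # Weight each child's impurity by the fraction of samples it holds.
    weighted = sum((len(child) / n) * gini_impurity(child) for child in children)
    return gini_impurity(parent_labels) - weighted

# Splitting a 50/50 node into two pure children gives the maximum gain of 0.5.
parent = ["yes", "yes", "no", "no"]
print(gini_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 0.5
```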
Entropy vs Gini
In this graph, the X-axis is the probability of the positive class, P(+), and the Y-axis is the impurity value you get after applying each formula.
Both methods work very similarly and are used for splitting the Decision Tree.
Entropy rises to a peak of 1 (at P(+) = 0.5) and then decreases; Gini, on the other hand, peaks at 0.5 and then decreases.
Entropy is more computationally expensive than Gini because its formula uses a logarithm, which costs extra when there is a lot of data.
Entropy lies between 0 and 1, while Gini lies between 0 and 0.5.
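You can verify both curves with a few lines of Python; this small sketch assumes a binary problem, where p is P(+):

```python
import math

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy (and log(0) is undefined)
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gini(p):
    return 1.0 - (p ** 2 + (1 - p) ** 2)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"P(+)={p:.2f}  entropy={entropy(p):.3f}  gini={gini(p):.3f}")
# Both peak at P(+) = 0.5: entropy reaches 1.000, Gini reaches 0.500.
```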
How does a Decision Tree find split values for numerical features?
For example, say we have one numerical feature in a classification problem.
Step 1: sort all values of the numerical feature.
Step 2: pick a threshold value (typically the midpoint between two consecutive sorted values).
Step 3: compute Entropy or Gini for the resulting split, then the information gain.
Steps 2 and 3 repeat over and over, with the threshold changing each time, and whichever split gives the best information gain is selected for the decision tree (see the sketch below).
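Here is a rough sketch of that threshold search, reusing gini_impurity from above (best_threshold is an illustrative name, not a library function):

```python
def best_threshold(values, labels):
    """Try midpoints between consecutive sorted values; keep the best Gini gain."""
    pairs = sorted(zip(values, labels))            # step 1: sort by feature value
    sorted_vals = [v for v, _ in pairs]
    sorted_labels = [l for _, l in pairs]
    n = len(sorted_labels)
    parent_gini = gini_impurity(sorted_labels)
    best_gain, best_t = -1.0, None
    for i in range(1, n):
        if sorted_vals[i] == sorted_vals[i - 1]:
            continue                               # identical values give no new split
        t = (sorted_vals[i] + sorted_vals[i - 1]) / 2   # step 2: candidate threshold
        left, right = sorted_labels[:i], sorted_labels[i:]
        # step 3: weighted Gini of the two children, then the gain over the parent
        child = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
        gain = parent_gini - child
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Toy example: one numerical feature with binary labels.
heights = [150, 160, 170, 180]
labels = ["no", "no", "yes", "yes"]
print(best_threshold(heights, labels))  # (165.0, 0.5)
```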
Thank you for reading this blog :)