Decision Tree with Gini index

If you don’t know about Entropy, follow this blog.

After Entropy, the other impurity measure to know is Gini.

What is Gini Index?

The Gini Index, or Gini Impurity, is calculated by subtracting the sum of the squared probabilities of each class from one. It tends to favor larger partitions and is very simple to implement.

In simple terms, it measures the probability that a randomly selected sample would be classified incorrectly if it were labeled at random according to the class distribution in the node.

Put another way (🤦‍♂️): the lower the Gini, the purer the node. A Gini of 0 means every sample in the node belongs to the same class.

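Formula:

$$\text{Gini} = 1 - \sum_{i=1}^{n} p_i^2$$

where $p_i$ is the probability (proportion) of class $i$ in the node and $n$ is the number of classes.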
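The calculation described above (one minus the sum of the squared class probabilities) can be sketched in a few lines of Python. This is only a minimal illustration; the function name `gini_impurity` and the toy labels are made up for this example.

```python
from collections import Counter

def gini_impurity(labels):
    """Return 1 - sum(p_i^2), where p_i is the proportion of class i in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here: an empty node counts as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Toy labels, invented for illustration
print(gini_impurity(["yes", "yes", "no", "no"]))    # 0.5 (evenly mixed, two classes)
print(gini_impurity(["yes", "yes", "yes", "yes"]))  # 0.0 (pure node)
```

A perfectly mixed two-class node gives 0.5 and a pure node gives 0.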

After computing the Gini index, we compute the Information Gain, just like with Entropy, but using Gini as the impurity measure instead.
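Here is a rough sketch of that gain calculation with Gini: take the impurity of the parent node and subtract the size-weighted impurity of the child nodes. The helper names `gini` and `gini_gain` and the toy data are assumptions for this example, not something from a specific library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Toy split, invented for illustration: it separates the classes perfectly
parent = ["yes", "yes", "yes", "no", "no", "no"]
left = ["yes", "yes", "yes"]
right = ["no", "no", "no"]
print(gini_gain(parent, left, right))  # 0.5
```

A perfect split removes all the impurity of the parent, so the gain equals the parent's Gini. A split that leaves both sides mixed gives a smaller gain.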

Entropy vs Gini

In this graph, the X-axis is the probability of the positive class, P(+), and the Y-axis is the output value of each formula.

Both methods work in a very similar way, and both are used to choose splits in a Decision Tree.

Entropy rises to a peak of 1 at P(+) = 0.5 and then decreases. Gini, on the other hand, peaks at 0.5 at the same point and then decreases.

Entropy takes more computational power than Gini because it uses a log in its formula, which adds up when there is a lot of data.

For a two-class problem, Entropy lies between 0 and 1 and Gini lies between 0 and 0.5.
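To see these ranges without the graph, here is a small sketch that evaluates both measures for a two-class problem at a few values of P(+). It assumes a base-2 log for Entropy; the function names `binary_entropy` and `binary_gini` are made up for this example.

```python
import math

def binary_entropy(p):
    """Entropy (base-2 log) of a two-class node where P(+) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy (avoids log(0))
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def binary_gini(p):
    """Gini impurity of a two-class node where P(+) = p."""
    return 1.0 - (p ** 2 + (1 - p) ** 2)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"P(+)={p:.1f}  entropy={binary_entropy(p):.3f}  gini={binary_gini(p):.3f}")
```

Both curves are 0 at P(+) = 0 and P(+) = 1 and reach their maximum at P(+) = 0.5, where Entropy is 1 and Gini is 0.5.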

How does a Decision Tree find split values for numerical features?

For example, say we have one numerical feature and it’s a classification problem.

step-1 : sort all the values of the numerical feature

step-2 : pick a threshold value and split the data into the values at or below it and the values above it

step-3 : compute Entropy or Gini for both sides, then compute the information gain

Steps 2 and 3 repeat, with the threshold value changing every time, and the threshold whose split gives the best information gain is the one selected.
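The steps above can be sketched roughly like this. It is a simplified illustration, not how any particular library implements it: the candidate thresholds are taken as midpoints between neighbouring sorted values (an assumption; the steps only say a threshold is picked), Gini is used as the impurity, and `best_threshold` plus the toy data are invented for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try every candidate threshold and return (threshold, gain) with the highest gain."""
    pairs = sorted(zip(values, labels))          # step 1: sort by feature value
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(ys)
    parent = gini(ys)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue                             # no threshold fits between equal values
        t = (xs[i - 1] + xs[i]) / 2              # step 2: midpoint candidate (assumption)
        left, right = ys[:i], ys[i:]             # values <= t go left, values > t go right
        weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        gain = parent - weighted                 # step 3: gain for this threshold
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Toy data, invented for illustration
ages = [22, 25, 30, 35, 40, 45]
buys = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(ages, buys))  # (32.5, 0.5)
```

Here the threshold 32.5 separates the two classes perfectly, so it gets the highest gain and would be chosen for the split.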

Thank you for Reading this blog :)
