Introduction to Decision Trees

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements. (Definition from Wikipedia.)

In simple terms, it has a tree-like structure (if you’re familiar with data structures, this is the same tree concept we’re going to use in this algorithm) where each internal node represents a “test” on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (the decision taken after computing all attributes).

Another one (😅): decision trees are a type of supervised machine learning in which the data is repeatedly split according to a certain parameter.

A decision tree follows a set of if-else conditions to visualize the data and classify it according to those conditions.
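To make the if-else picture concrete, here is a minimal sketch of a hand-written tree. The feature names (`outlook`, `humidity`), the threshold, and the class labels are hypothetical and chosen only for illustration; a real decision tree learns its tests from the data instead of having them typed in.

```python
def classify_play(outlook: str, humidity: float) -> str:
    """Hand-written decision tree; the features and threshold are made up."""
    # Root node: test on the 'outlook' attribute.
    if outlook == "sunny":
        # Decision node: a further test on 'humidity'.
        if humidity > 70:
            return "no"    # leaf node: class label
        return "yes"       # leaf node
    if outlook == "rainy":
        return "no"        # leaf node
    return "yes"           # leaf node for any other outlook


print(classify_play("sunny", 85.0))
print(classify_play("overcast", 60.0))
```

Each `if` is a test at a node, each outcome of the test is a branch, and each `return` is a leaf.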

Important Terminology

  1. Root Node: the topmost node, whose attribute is used to divide the data into two or more sets. The feature in this node is selected by an attribute selection technique.
  2. Branch or Sub-Tree: a part of the entire decision tree is called a branch or sub-tree.
  3. Splitting: dividing a node into two or more sub-nodes based on if-else conditions.
  4. Decision Node: a sub-node that is itself split into further sub-nodes is called a decision node.
  5. Leaf or Terminal Node: the end of the decision tree, a node that cannot be split into further sub-nodes.
  6. Pruning: removing sub-nodes from the tree is called pruning.
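The terms above can be mapped onto a small data structure. This is only a sketch: the `Node` class, its field names, and the example tree are assumptions made for illustration, not the layout any particular library uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a decision tree (illustrative structure only)."""
    feature: Optional[str] = None    # attribute tested here; None for a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)  # branch value -> sub-tree
    label: Optional[str] = None      # class label; set only on leaves

    def is_leaf(self) -> bool:
        return not self.children


def count_leaves(node: Node) -> int:
    """Number of leaf (terminal) nodes in the sub-tree rooted at `node`."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children.values())


def prune(node: Node, label: str) -> None:
    """Pruning: drop every sub-node below `node` and turn it into a leaf."""
    node.children.clear()
    node.feature = None
    node.label = label


# The root node splits on 'outlook'; the 'sunny' branch leads to a decision node.
tree = Node(feature="outlook", children={
    "sunny": Node(feature="humidity", children={
        "high": Node(label="no"),
        "normal": Node(label="yes"),
    }),
    "rainy": Node(label="no"),
})

print(count_leaves(tree))                    # leaves before pruning
prune(tree.children["sunny"], label="no")
print(count_leaves(tree))                    # leaves after pruning the 'sunny' sub-tree
```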

Meaning of Its Name

Each ‘decision’, ‘outcome’, or ‘reaction’ is obtained from a tree-structured algorithm, which is why it is called a “Decision Tree”.

Deep Dive in the Algorithm

We can construct a decision tree with one of two criteria:

  1. Gini
  2. Information Gain (in the code, it’ll be Entropy)
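For reference, the Gini criterion can be sketched in a few lines. This assumes the standard definition of Gini impurity, 1 minus the sum of the squared class proportions; the function name and the sample labels are illustrative only.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum(p_k ** 2) over the classes present in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0   # convention chosen here: an empty set has no impurity
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini(["yes", "yes", "no", "no"]))    # evenly mixed node
print(gini(["yes", "yes", "yes", "yes"]))  # pure node
```

A pure node scores 0, and the score grows as the classes become more evenly mixed.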

For this blog I’ll go with Information Gain and tell you how it works.

We will go through three steps:

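  1. Entropy: the measure of impurity, disorder, or uncertainty in a bunch of examples. Entropy controls how a decision tree decides to split the data, so it actually affects how a decision tree draws its boundaries.
  • Entropy E or H(S) = -P(positive) * log2(P(positive)) - P(negative) * log2(P(negative))

Here, P is probability and S is the set of rows being measured (for example, the rows that share one value of a feature). The logarithm is usually taken in base 2, so entropy is measured in bits.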
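The entropy formula above is easy to check numerically. Here is a minimal sketch for the two-class case, assuming a base-2 logarithm and the usual convention that 0 * log(0) counts as 0; the function name `binary_entropy` is my own.

```python
import math


def binary_entropy(p_positive: float) -> float:
    """Entropy in bits of a two-class set with the given positive fraction."""
    p_negative = 1.0 - p_positive
    total = 0.0
    for p in (p_positive, p_negative):
        if p > 0:                      # treat 0 * log(0) as 0
            total -= p * math.log2(p)
    return total


for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"P(positive)={p:.1f} -> H={binary_entropy(p):.3f}")
```

Entropy is 0 when every row has the same class and reaches its maximum of 1 bit when the two classes are split 50/50.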

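  2. Information Gain: the reduction in entropy obtained by splitting the data on a feature.
  • IG(T, a) = H(T) - sum over v of P(v) * H(T_v)   (definition from Wikipedia)

Here, H(T) is the entropy of the column T before the split, a is the feature being split on, v runs over the values that a can take, P(v) is the probability (the fraction of rows) of value v, and H(T_v) is the entropy of T within the rows where a = v.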

Let me take an example so it becomes easier to understand.
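As one small illustration, assume a hypothetical toy dataset of eight rows with two features (`outlook`, `windy`) and a target column (`play`); the data is invented for this sketch. The code computes the entropy of the target and then the information gain of each feature, taken as the parent entropy minus the weighted entropy of the subsets produced by the split (log base 2 assumed).

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    """Parent entropy minus the weighted entropy after splitting on `feature`."""
    parent = entropy([row[target] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


# Hypothetical toy data, invented for illustration.
rows = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

gains = {f: information_gain(rows, f, "play") for f in ("outlook", "windy")}
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")

best = max(gains, key=gains.get)
print("split on:", best)
```

The feature with the highest information gain becomes the test at the current node, and the same calculation is then repeated on each subset until the nodes are pure or nothing is left to split on.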


Advantages

Easy to understand (an eternal advantage for any algorithm).

Easy to interpret, and perfect for visual representation.

Works with both numerical and categorical features.

Requires little data preprocessing: no need for one-hot encoding, dummy variables, and so on.

Feature selection happens automatically.



Disadvantages

Low bias but high variance: a decision tree tends to overfit the training data.

In the end, it is a good algorithm. I have seen many people go directly for random forest before trying a decision tree. I know there is an overfitting problem, but I have also seen many problems solved by a decision tree with lower error than a random forest. So my advice is to try both, compare their accuracy and loss, and then choose the algorithm.

Thank you for reading this blog. That’s all for today :)