A Decision Tree is an algorithm that solves both classification and regression problems.

For classification, follow these blogs: 1, 2.

For regression, a Decision Tree works differently from classification. In classification, each split is decided by Entropy or Gini. In regression, each split is decided by Mean Squared Error, Mean Absolute Error, friedman_mse, or poisson.
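To make the regression case concrete, here is a minimal sketch of how a single split could be chosen with Mean Squared Error. It assumes one numeric feature and tries the midpoints between consecutive values as thresholds; the names `mse` and `best_split` are made up for this illustration and are not from any library.

```python
def mse(values):
    """Mean squared error of the values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs, ys):
    """Return (threshold, weighted_mse) for the best split 'x <= threshold'."""
    best = (None, float("inf"))
    candidates = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct x values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Weight each side's MSE by the share of samples it holds.
        weighted = (len(left) * mse(left) + len(right) * mse(right)) / len(ys)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Toy data: two clear groups of target values.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
threshold, score = best_split(xs, ys)
print(threshold, score)
```

On this toy data the chosen threshold falls between the two groups, because that split leaves each side with identical targets.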

Before diving into the algorithm, let’s first learn two things.

Boost 🏍🚀

It is one of the most powerful techniques for building predictive models.

Gradient boosting is a machine learning technique for regression, classification, and other tasks. It produces a prediction model in the form of an ensemble of weak prediction models (weak learners), typically decision trees, though other simple models such as linear regression can also serve as base learners.
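Here is a rough sketch of the idea, assuming squared-error loss and one-split trees (stumps) as the weak learners. For squared error, the residuals are the negative gradient, so each new stump is fit to what the current ensemble still gets wrong. The function names and the toy data are invented for this example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly add stumps fit to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # Residuals = negative gradient of the squared-error loss.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 2.0, 4.0, 4.5, 7.0, 7.5, 9.0]

mean_y = sum(ys) / len(ys)
baseline_mse = mean_squared_error(ys, [mean_y] * len(ys))
model = gradient_boost(xs, ys)
boosted_mse = mean_squared_error(ys, [model(x) for x in xs])
print(baseline_mse, boosted_mse)
```

Each stump alone is a weak model, but the sum of many small corrections gives a lower training error than the constant mean prediction.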

Weak learners are models that perform slightly better than random guessing. Strong learners are models that have arbitrarily good accuracy. Weak and strong learners are tools from computational learning theory and provide the basis for the development of the boosting class of ensemble methods.

An example of a weak learner is a Decision Tree. When we…

Regularization is a technique used for tuning the function by adding an additional penalty term in the error function. The additional term controls the excessively fluctuating function such that the coefficients don’t take extreme values.

In simple words, it adds a penalty on model complexity so that our model does not overfit.

In a Linear Regression example, suppose our input (x) is 5 and our best-fit line predicts 5, but the actual output is 4. Then the loss is 1, which is quite large. To reduce this kind of error on data the model has not seen, we use regularization.

Cost function = Loss + Regularization term
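As a small sketch of this formula, the function below assumes mean squared error as the loss and an L2 penalty (sum of squared weights times a strength `lam`) as the regularization term. The name `ridge_cost`, the parameter `lam`, and the toy data are assumptions made for this illustration.

```python
def ridge_cost(weights, bias, xs, ys, lam):
    """Cost = MSE loss + lam * sum of squared weights (bias is not penalized)."""
    n = len(ys)
    preds = [sum(w * xi for w, xi in zip(weights, x)) + bias for x in xs]
    loss = sum((y - p) ** 2 for y, p in zip(ys, preds)) / n
    penalty = lam * sum(w ** 2 for w in weights)
    return loss + penalty


# Toy data that the line y = 2x fits perfectly.
xs = [[1.0], [2.0], [3.0]]
ys = [2.0, 4.0, 6.0]

no_penalty = ridge_cost([2.0], 0.0, xs, ys, lam=0.0)    # loss only
with_penalty = ridge_cost([2.0], 0.0, xs, ys, lam=0.1)  # loss + 0.1 * 2.0**2
print(no_penalty, with_penalty)
```

Even with a perfect fit, the cost is no longer zero once the penalty is switched on, which is what pushes the weights away from extreme values.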

L2 regularization, also called Ridge regularization or Ridge regression

Ridge regression…

Mean Absolute Error, L1 loss

It is calculated as the average of the absolute differences between the actual and predicted outputs (y).
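A minimal sketch of that calculation, with made-up numbers:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all data points."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

# Absolute differences are 1, 0, 2, 1, so the average is 4 / 4.
mae = mean_absolute_error(actual, predicted)
print(mae)
```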


Random Forest works on both classification and regression problems.

Random Forest uses N Decision Trees as base models and gives a sample of the data to each Decision Tree to predict.

If you don’t know about Decision Trees, see blogs 1, 2.

If you don’t know about Entropy, follow this blog.

After Entropy, there is another measure: Gini.

What is Gini Index?

The Gini Index or Gini Impurity is calculated by subtracting the sum of the squared probabilities of each class from one. It mostly favors larger partitions and is very simple to implement.

In simple terms, it calculates the probability that a randomly selected data point would be classified incorrectly if it were labeled at random according to the class distribution.

Another way to say it (🤦‍♂️): it measures how often a randomly chosen sample would be mislabeled.
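The definition above (one minus the sum of squared class probabilities) can be sketched in a few lines. The function name `gini_impurity` and the labels are just for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum(p_k^2), where p_k is the share of class k among the labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


pure = gini_impurity(["cat", "cat", "cat", "cat"])   # only one class
mixed = gini_impurity(["cat", "cat", "dog", "dog"])  # two classes, 50/50
print(pure, mixed)
```

A node with only one class has impurity 0, and an even two-class mix gives 0.5, the maximum for two classes.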


A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements (definition from Wikipedia).

In simple terms, it has a tree-like structure (if you’re familiar with data structures, the tree concept from there is what we’re going to use in this algorithm) where each internal node represents a “test” on an attribute (e.g. …

In statistics, the logistic model is used to model the probability of a certain class or event existing such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events such as determining whether an image contains a cat, dog, lion, etc.

Logistic Regression came from Linear Regression. It is a linear model (because the logit of the estimated probability is a linear function of the parameters) and it is used for classification.
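To see why the logit makes the model linear, here is a small sketch. The parameter values `theta0` and `theta1` are hypothetical numbers picked for the example, not fitted values.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds: the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


# Hypothetical parameters of a one-feature logistic model.
theta0, theta1 = -4.0, 1.5


def predict_probability(x):
    return sigmoid(theta0 + theta1 * x)


xs = [0.0, 2.0, 4.0]
linear_scores = [theta0 + theta1 * x for x in xs]
recovered = [logit(predict_probability(x)) for x in xs]
print(linear_scores)
print(recovered)
```

The predicted probability is a curve in x, but taking its logit gives back the straight line `theta0 + theta1 * x` (up to floating-point rounding).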

Multi-class Classification

Meaning of its name?

Logistic comes from the logit function, which is used in this algorithm.

Regression is there because its underlying technique is quite the same as…

Linear with some Noise 😁

Before diving deep into the algorithm, let’s talk about:

What is Linear in Linear Regression?

Linear regression is called ‘Linear regression’ not because the x’s (the independent variables) are linear with respect to y (the dependent variable), but because the model is linear in the parameters (the thetas).

If you don’t understand the lines above, don’t worry. First read the whole blog, then follow this link to dive deep into this question.

Regression is a statistical method used to predict a continuous value.
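A quick sketch of what "linear in the parameters" means, using a made-up model with an x² term. The model is curved in x, yet combining two parameter vectors combines their predictions in exactly the same way.

```python
def predict(theta, x):
    """y = theta0 + theta1 * x + theta2 * x^2: curved in x, linear in theta."""
    return theta[0] + theta[1] * x + theta[2] * x ** 2


# Two arbitrary parameter vectors (integers keep the arithmetic exact).
theta_a = [1, 2, 3]
theta_b = [4, -1, 2]
x = 5

# Linear in the parameters: predicting with 2*a + 3*b ...
combined = [2 * a + 3 * b for a, b in zip(theta_a, theta_b)]
lhs = predict(combined, x)
# ... equals 2 * predict(a) + 3 * predict(b).
rhs = 2 * predict(theta_a, x) + 3 * predict(theta_b, x)

# Not linear in x: doubling x does not double the prediction.
doubled_x = predict(theta_a, 2 * x)
print(lhs, rhs, doubled_x, 2 * predict(theta_a, x))
```

This is why a model with polynomial features still counts as linear regression.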

What is Linear Regression?

Linear regression is a supervised machine learning algorithm that performs a regression task.

Linear Regression algorithm shows a relation between…

Before going into the depth of Naive Bayes, let’s talk about its name:

Why is the algorithm called the Naive Bayes’ Classifier?

First, the word classifier: it means using some training data to understand how the given input variables relate to the class.

Second, the word Bayes: the algorithm is based on Bayes’ theorem. Later in this blog I’ll talk about the formula.

Third, the word Naive: it assumes that each input variable is independent of the others, given the class. Later on I’ll illustrate this with an example.

What is Naive Bayes’ Algorithm?

Naive Bayes is a simple and powerful classification algorithm based on Bayes' theorem.

Bayes’ theorem,
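As a reminder, Bayes' theorem in its standard form is P(A|B) = P(B|A) · P(A) / P(B). The sketch below applies it to a toy spam example; all the probabilities are invented for illustration, and `posterior` is a hypothetical helper name.

```python
from fractions import Fraction


def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Made-up numbers: 20% of mails are spam; a given word appears
# in 60% of spam mails and in 10% of non-spam mails.
p_spam = Fraction(2, 10)
p_word_given_spam = Fraction(6, 10)
p_word_given_ham = Fraction(1, 10)

# Total probability of seeing the word in any mail.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
print(p_spam_given_word)
```

Using `Fraction` keeps the arithmetic exact, so the posterior comes out as a clean ratio instead of a rounded float.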

Purav Patel

