Naive Bayes Classifier

Before going deeper into Naive Bayes, let’s talk about its name.

Why is the algorithm called a Naive Bayes classifier?

First, the word "classifier": it means using some training data to learn how the given input variables relate to the class.

Second, the word "Bayes": the algorithm is based on Bayes’ theorem. I’ll cover the formula later in this blog.

Third, the word "Naive": it assumes that the input variables are independent of each other given the class. I’ll come back to this with an example.

What is the Naive Bayes algorithm?

Naive Bayes is a simple and powerful classification algorithm based on Bayes' theorem.

Bayes’ theorem:

P( c | x ) = P( x | c ) * P( c ) / P( x )

  • P( c | x ) is the posterior probability: the probability of class c given that x is observed.
  • P( x | c ) is the likelihood: the probability of observing x given class c.
  • P( c ) is the prior probability of the class.
  • P( x ) is the prior probability of the predictor (also called the evidence).
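
As a minimal sketch, the theorem can be written as a one-line function. The function name `posterior` and the input numbers are assumptions chosen only to show the arithmetic; they do not come from any dataset.

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(c | x) = P(x | c) * P(c) / P(x)."""
    if evidence == 0:
        raise ValueError("P(x) must be non-zero")
    return likelihood * prior / evidence

# Hypothetical numbers, chosen only to show the arithmetic.
p = posterior(Fraction(1, 2), Fraction(1, 5), Fraction(1, 4))
print(p)  # 2/5
```

Using `Fraction` keeps the result exact, which makes it easy to compare with a calculation done by hand.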

The algorithm uses two kinds of probability, both estimated from the training data:

  1. The probability of each class, P( c )
  2. The conditional probability of each input value given each class, P( x | c )

How does the Naive Bayes algorithm work?

Let’s take an example and see how the algorithm works.

In this example, Weather is the independent feature and Play is the dependent feature. We want to predict the probability that a person goes out to play, given the weather.

First, we make a frequency table.

The frequency table shows how many Yes and No outcomes occur for each value of the feature.

Second, we make a likelihood table.
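
Here is a rough sketch of how both tables could be built in Python. The 14 rows are rebuilt from the counts used in this post (5 Rainy days with 2 Yes and 3 No, 9 Yes and 5 No overall). The post does not give the split between Sunny and Overcast, so those days are pooled under a placeholder label "Other", which is my assumption.

```python
from collections import Counter
from fractions import Fraction

# 14 observations rebuilt from the counts used in this post:
# Rainy -> 2 Yes, 3 No; the remaining 9 days -> 7 Yes, 2 No.
# "Other" is a placeholder that pools Sunny and Overcast.
data = (
    [("Rainy", "Yes")] * 2 + [("Rainy", "No")] * 3
    + [("Other", "Yes")] * 7 + [("Other", "No")] * 2
)

# Frequency table: how often each (weather, play) pair occurs.
frequency = Counter(data)
class_totals = Counter(play for _, play in data)
weather_totals = Counter(weather for weather, _ in data)
n = len(data)

# Likelihood table: P(weather | play) = count(weather, play) / count(play).
likelihood = {
    (weather, play): Fraction(count, class_totals[play])
    for (weather, play), count in frequency.items()
}
# Class priors P(play) and evidence P(weather).
prior = {play: Fraction(count, n) for play, count in class_totals.items()}
evidence = {w: Fraction(count, n) for w, count in weather_totals.items()}

print(likelihood[("Rainy", "Yes")])     # 2/9
print(prior["Yes"], evidence["Rainy"])  # 9/14 5/14
```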

Let’s use the formula to find the probabilities:

1) P( Yes | Rainy ) = P( Rainy | Yes) * P( Yes ) / P( Rainy )

P( Rainy | Yes ) = P( Rainy ∩ Yes ) / P( Yes ) = (2/14) / (9/14) = 2/9

P(Yes) = 9/14 , P(Rainy) = 5/14

P( Yes | Rainy ) = (2/9 * 9/14) / (5/14) = 0.40

2) P( No | Rainy ) = P( Rainy | No ) * P( No ) / P( Rainy )

P( Rainy | No ) = P( Rainy ∩ No ) / P( No ) = (3/14) / (5/14) = 3/5

P(No) = 5/14 , P(Rainy) = 5/14

P( No | Rainy ) = (3/5 * 5/14) / (5/14) = 0.60

So the model will choose No, because it has the higher probability.
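
The same calculation can be checked with a few lines of Python. This is only a sketch: the variable names are mine, and the numbers are the ones from the worked example above.

```python
from fractions import Fraction

# Values taken from the worked example above.
prior = {"Yes": Fraction(9, 14), "No": Fraction(5, 14)}
likelihood_rainy = {"Yes": Fraction(2, 9), "No": Fraction(3, 5)}
p_rainy = Fraction(5, 14)

# Bayes' theorem applied once per class.
posterior = {
    label: likelihood_rainy[label] * prior[label] / p_rainy
    for label in prior
}
# The predicted class is the one with the highest posterior.
prediction = max(posterior, key=posterior.get)

print({label: float(p) for label, p in posterior.items()})
# {'Yes': 0.4, 'No': 0.6}
print(prediction)  # No
```

Since P( Rainy ) is the same divisor for both classes, it does not change which class wins. It only makes the two results add up to 1.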

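This example has only one input variable, so the other weather values (Sunny and Overcast) do not enter the calculation for Rainy. The "naive" part shows up once there are several input variables: the algorithm then multiplies the likelihood of each variable as if the variables were independent of one another given the class.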
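
To see what the independence assumption does with more than one input variable, here is a small sketch. The Rainy likelihoods and the priors come from this post. The second feature (`windy=True`) and its likelihoods are hypothetical, added only so there is something to multiply.

```python
from fractions import Fraction
from math import prod

prior = {"Yes": Fraction(9, 14), "No": Fraction(5, 14)}

# P(feature value | class). The Rainy values come from this post;
# the windy values are hypothetical, added only to get a second feature.
likelihoods = {
    "Yes": {"weather=Rainy": Fraction(2, 9), "windy=True": Fraction(1, 3)},
    "No": {"weather=Rainy": Fraction(3, 5), "windy=True": Fraction(3, 5)},
}

def naive_scores(observed):
    # Naive step: multiply the per-feature likelihoods as if the features
    # were independent of each other given the class.
    scores = {
        label: prior[label] * prod(likelihoods[label][f] for f in observed)
        for label in prior
    }
    # Normalise so the class probabilities add up to 1.
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

result = naive_scores(["weather=Rainy", "windy=True"])
print(result)
```

The model never looks at how weather and wind relate to each other. It only needs one small likelihood table per feature, which is why it is cheap to train.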


Advantages of Naive Bayes

  • It is easy to implement and fast.
  • It performs well in multi-class prediction.
  • When the independence assumption holds, a Naive Bayes classifier can perform better than other models such as logistic regression, and it needs less training data.
  • It performs well with categorical input variables compared to numerical ones.
  • For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).
  • It gives good results in many cases.


Disadvantages of Naive Bayes

  • If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a probability of 0 (zero) and will be unable to make a prediction. This is often known as "Zero Frequency". A common fix is a smoothing technique such as Laplace (add-one) estimation.
  • On the other hand, Naive Bayes is known to be a poor probability estimator, so the probability outputs from predict_proba should not be taken too seriously.
  • Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
  • Precision decreases when the dataset is small, because the probabilities are estimated from only a few examples.
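
The zero-frequency problem from the list above is commonly handled with additive (Laplace) smoothing. Below is a minimal sketch. The function name `smoothed_likelihood` and the parameter `alpha` are my own naming, and the zero count is a hypothetical case, not a value from the post's table.

```python
from fractions import Fraction

def smoothed_likelihood(count, class_total, n_categories, alpha=1):
    """P(value | class) with additive (Laplace) smoothing."""
    return Fraction(count + alpha, class_total + alpha * n_categories)

# Hypothetical case: a weather value never seen together with class "No"
# (count 0 out of 5 "No" days), with 3 possible weather values.
print(smoothed_likelihood(0, 5, 3))           # 1/8
print(smoothed_likelihood(0, 5, 3, alpha=0))  # 0, the unsmoothed estimate
```

Adding `alpha` to every count means no likelihood is ever exactly zero, so a single unseen value can no longer wipe out the whole product.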

Types of Naive Bayes

  1. Gaussian NB:- It is used for classification when the predictors take continuous rather than discrete values. We assume that these values are sampled from a Gaussian (normal) distribution.
  2. Multinomial NB:- It is used for discrete counts. It is mostly used for document classification problems, e.g. whether a document belongs to the category of sports, politics, technology, etc. The features/predictors used by the classifier are the frequencies of the words present in the document.
  3. Bernoulli NB:- This is similar to multinomial Naive Bayes, but the predictors are boolean variables. The parameters that we use to predict the class variable take only the values yes or no, for example whether a word occurs in the text or not. It works well for text classification when the features record word presence rather than word counts.
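
The difference between the three variants is mostly in how they look at a feature. Here is a small sketch of that idea. The vocabulary, the document, and the function name `gaussian_likelihood` are all made up for illustration.

```python
import math
from collections import Counter

def gaussian_likelihood(x, mean, std):
    """Gaussian NB: density of a continuous value under a normal curve."""
    exponent = -((x - mean) ** 2) / (2 * std ** 2)
    return math.exp(exponent) / (std * math.sqrt(2 * math.pi))

# Hypothetical document and vocabulary, for illustration only.
vocabulary = ["goal", "match", "vote"]
document = "goal goal match".split()

counts = Counter(document)
multinomial_features = [counts[w] for w in vocabulary]       # word counts
bernoulli_features = [int(w in counts) for w in vocabulary]  # word present?

print(multinomial_features)  # [2, 1, 0]
print(bernoulli_features)    # [1, 1, 0]
print(gaussian_likelihood(0.0, 0.0, 1.0))
```

In Gaussian NB, the mean and standard deviation would be estimated per class from the training data.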


It is a good, fast algorithm and works well on NLP problems. Everything has a good and a bad side, though. For example, if an NB model trained for text classification sees a new word at test time that was not in the training data, it will give that word a probability of 0. That is bad, because the word is then treated as strong evidence against the class.

That’s all. Thank you for reading this blog :)