Linear with some Noise 😁

Before diving deep into the algorithm, let’s talk about:

What is Linear in Linear Regression?

Linear regression is called ‘linear’ not because the x’s (the independent variables) are linear with respect to y (the dependent variable), but because the model is linear in its parameters, the thetas.
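
To make this concrete, here is a small pair of illustrative models (my own examples, not taken from any particular dataset). The first is curved in x but is still a sum of thetas times fixed functions of x, so it counts as linear regression. The second puts a theta inside an exponent, so it does not.

```latex
\hat{y} = \theta_0 + \theta_1 x + \theta_2 x^2
  \quad \text{(linear in the thetas: still linear regression)}

\hat{y} = \theta_0 + e^{\theta_1 x}
  \quad \text{(not linear in } \theta_1 \text{: not linear regression)}
```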

If you don’t understand the lines above, don’t worry. First, read the whole blog, and after that follow this link to dive deeper into this question.

Regression is a family of statistical methods used to predict a continuous value.

What is Linear Regression?

Linear regression is a supervised machine learning algorithm that performs a regression task.

The linear regression algorithm models the relationship between a dependent variable (y) and one or more independent variables (x), i.e., how the value of y changes as the value of x changes.

Equation of linear regression:

y_hat = m * x + c = w1 * x + w0 = theta1 * x + theta0

For this example, I’ll use y_hat = w1 * x + w0.

w1 is the slope of the line (how much y changes when x increases by one unit) and w0 is the intercept (the point where the line crosses the y-axis).

You can refer to them as weights.

x is the input variable and y_hat is the predicted output.
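
As a minimal sketch, the equation can be written as a tiny Python function. The name `predict` and the weight values below are hypothetical, picked only to show the arithmetic.

```python
def predict(x, w1, w0=0.0):
    """Return the predicted output y_hat = w1 * x + w0 for a single input x."""
    return w1 * x + w0


# Hypothetical weights, chosen only for illustration.
print(predict(3, w1=2.0, w0=1.0))  # 2.0 * 3 + 1.0 = 7.0
```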

How does Linear Regression work?

Let’s take an example,

where the (x, y) values are (1,1), (2,2), (3,3), (4,4) and we take w0 = 0.

So, y_hat = w1 * x

Here, I have taken three different w1 values.

For every w1 value, I am going to compute the cost function (an error function that measures how far the predictions are from the actual outputs) and then use gradient descent (which adjusts the weights to get a model with less error).
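
A common choice of cost function for this setup is the mean squared error. Assuming that form, with the conventional 1/(2n) scaling (some texts use 1/n instead), it can be written as:

```latex
J(w_1) = \frac{1}{2n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2,
\qquad \hat{y}_i = w_1 x_i
```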

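Here, in the cost function,

n is the total number of data points.

y is the actual output and y_hat is the predicted output.

The cost is computed in the same way for each of the three w1 values.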
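
Here is a rough sketch of that computation on the four example points. It assumes a mean squared error cost with 1/(2n) scaling, and the three w1 values (0.5, 1.0, 1.5) are my own illustrative picks, not necessarily the ones used in the original figures.

```python
# Data from the example; w0 is fixed at 0, so y_hat = w1 * x.
xs = [1, 2, 3, 4]
ys = [1, 2, 3, 4]


def cost(w1, xs, ys):
    """Mean squared error with 1/(2n) scaling (an assumed convention)."""
    n = len(xs)
    return sum((w1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


# Three illustrative w1 values (assumed for this sketch).
for w1 in (0.5, 1.0, 1.5):
    print(w1, cost(w1, xs, ys))
```

Under these assumptions the cost is zero at w1 = 1.0, where the line passes through every point, and it grows as w1 moves away from 1.0 in either direction.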

Now we have the cost function and the weight w1, so let’s move on to gradient descent.

Gradient descent is used to converge the weights and find the best model for us. If you don’t know about it, don’t worry, I’ll talk about gradient descent in my upcoming blogs. For now, please google it.

In simple language, it repeatedly adjusts the weights to move the loss toward its global minimum, which gives us the best model.
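
The standard gradient descent update rule, written here in the theta notation and assuming a cost function J of the two weights, is:

```latex
\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
```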


where j = 0, 1.

alpha is the learning rate.
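
To see the update rule in action, here is a minimal sketch of batch gradient descent on the four example points. It assumes a mean squared error cost with 1/(2n) scaling. The function name, the learning rate of 0.1 and the 2000 steps are illustrative choices of mine, not tuned values.

```python
xs = [1, 2, 3, 4]
ys = [1, 2, 3, 4]


def gradient_descent(xs, ys, alpha=0.1, steps=2000):
    """Fit y_hat = w1 * x + w0 by batch gradient descent on the MSE cost."""
    n = len(xs)
    w0, w1 = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):
        errors = [(w1 * x + w0) - y for x, y in zip(xs, ys)]
        grad_w0 = sum(errors) / n                             # dJ/dw0
        grad_w1 = sum(e * x for e, x in zip(errors, xs)) / n  # dJ/dw1
        # Update both weights together, using the same set of errors.
        w0 -= alpha * grad_w0
        w1 -= alpha * grad_w1
    return w0, w1


w0, w1 = gradient_descent(xs, ys)
print(w0, w1)  # should end up close to w0 = 0 and w1 = 1
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small makes convergence very slow, so alpha usually needs some experimentation.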


Advantages of Linear Regression

It is easy to implement, easy to interpret, and efficient to train.

Overfitting can be kept under control using dimensionality reduction, regularization, and cross-validation.

It performs well when the relationship between x and y is linear.


Disadvantages of Linear Regression

It is often prone to noise and overfitting.

It is sensitive to outliers.

It is prone to multicollinearity.

It assumes a linear relationship between the dependent and independent variables.

  • For the above example, if we have to predict for x = 5, our best model will output 5. But what if the actual output is not 5 but something else?
  • Then our model may be overfitted to the training data. To reduce overfitting, we add a penalty on the weights using regularization methods like Ridge, Lasso, and Elastic Net. In upcoming blogs I’ll write about these regularization methods.
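
As a rough illustration of what regularization does, the sketch below uses the closed-form Ridge solution for the one-weight model y_hat = w1 * x (w0 = 0) on the example data. The helper name `ridge_slope` and the penalty strengths are hypothetical. The penalty term is lam * w1^2 added to the sum of squared errors.

```python
xs = [1, 2, 3, 4]
ys = [1, 2, 3, 4]


def ridge_slope(xs, ys, lam):
    """Slope w1 minimising sum((w1*x - y)^2) + lam * w1^2 for y_hat = w1 * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical penalty strengths; lam = 0 is plain least squares.
for lam in (0.0, 1.0, 10.0):
    w1 = ridge_slope(xs, ys, lam)
    print(lam, w1, w1 * 5)  # slope and the prediction for x = 5
```

With lam = 0 the slope is exactly 1 and the prediction for x = 5 is 5. As lam grows, the slope shrinks toward 0, so the model fits the training points less tightly.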


At the end of the day, Linear Regression is a good model if our data has a linear form, but outliers make it a worse model.

Thanks for Reading This Blog :)