Linear with some Noise 😁

Before diving deep into the algorithm, let's talk about:

What is Linear in Linear Regression?

If you don't understand the lines above, don't worry. First read the whole blog, and after that follow this link to take a deep dive into this question.

Regression is a family of statistical methods used to predict a continuous value.

What is Linear Regression?

The Linear Regression algorithm models the relationship between a dependent variable (y) and one or more independent variables (x), i.e., how the value of the dependent variable y changes as the value of the independent variable changes.

Equation of Linear Regression:

y_hat = m * x + c = w1 * x + w0 = theta1 * x + theta0

For this example, I'll use y_hat = w1 * x + w0.


w1 is the slope of the line (how much y's value changes when x's value increases by one step) and w0 is the intercept (the point where the line crosses the y-axis).

You can refer to them as weights.

x is the input variable and y_hat is the predicted output.
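To make the equation concrete, here is a minimal sketch of the prediction step in Python. The function name `predict` and the weight values are my own picks for illustration only, not fitted values.

```python
def predict(x, w1, w0):
    """Return y_hat = w1 * x + w0 for a single input x."""
    return w1 * x + w0

# Hypothetical weights, chosen only to illustrate slope and intercept.
w1, w0 = 2.0, 1.0

print(predict(0, w1, w0))  # at x = 0 the prediction equals the intercept w0
print(predict(1, w1, w0))  # one step in x changes y_hat by the slope w1
print(predict(2, w1, w0))
```

With these weights, moving x from 1 to 2 raises y_hat by exactly w1, which is what "slope" means here.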

How Linear Regression works?

Here the (x, y) values are (1,1), (2,2), (3,3), (4,4), and we will take w0 = 0.

So, y_hat = w1 * x.

Here, I have taken three different w1 values.

For every w1 value, I am going to compute the cost function (the error function that measures how far the predictions are from the actual values) and then show how gradient descent uses it (to converge the weights toward a model with less error).

Here, in the cost function,

n is the total number of data points.

y is the actual output.

just like this,
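As a rough sketch of this step, the snippet below computes the cost for three w1 values on the data (1,1), (2,2), (3,3), (4,4) with w0 = 0. Two things here are my assumptions: the cost is taken to be the squared error averaged as 1 / (2 * n), which is one common convention, and the three w1 values (0.5, 1.0, 1.5) are hypothetical picks.

```python
xs = [1, 2, 3, 4]
ys = [1, 2, 3, 4]

def cost(w1, xs, ys):
    """Assumed cost: sum of squared errors divided by 2 * n, with w0 = 0."""
    n = len(xs)
    return sum((w1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

# Hypothetical w1 values, chosen for illustration.
for w1 in (0.5, 1.0, 1.5):
    print(w1, cost(w1, xs, ys))
# 0.5 0.9375
# 1.0 0.0
# 1.5 0.9375
```

The cost is zero at w1 = 1 because the line y_hat = x passes through every point, and it grows as w1 moves away from 1 in either direction.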

Now we have the cost function and the weight w1, so we can move on to gradient descent.

Gradient descent is used to converge the weights and find the best model for us. If you don't know about it, don't worry, I'll talk about gradient descent in my upcoming blogs. For now, please google it.

In simple language, it tries to move the weights step by step toward the global minimum of the loss, which gives us the best model.


where j = 0, 1 (one update for each weight, w0 and w1).

alpha is the learning rate.
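Here is a minimal sketch of the update for both weights, repeated until they settle. It assumes the usual rule w_j = w_j - alpha * (derivative of the cost with respect to w_j) and a squared-error cost scaled by 1 / (2 * n). The function name `gradient_step`, the learning rate 0.1, and the 2000 iterations are illustrative assumptions.

```python
xs = [1, 2, 3, 4]
ys = [1, 2, 3, 4]

def gradient_step(w0, w1, xs, ys, alpha):
    """One gradient descent update for w0 and w1 (assumed cost: squared error / (2 * n))."""
    n = len(xs)
    errors = [(w1 * x + w0) - y for x, y in zip(xs, ys)]
    grad_w0 = sum(errors) / n                              # derivative w.r.t. w0
    grad_w1 = sum(e * x for e, x in zip(errors, xs)) / n   # derivative w.r.t. w1
    # Both weights are updated together from the same errors.
    return w0 - alpha * grad_w0, w1 - alpha * grad_w1

w0, w1 = 0.0, 0.0   # starting weights
alpha = 0.1         # learning rate (assumed value)
for _ in range(2000):
    w0, w1 = gradient_step(w0, w1, xs, ys, alpha)

print(w0, w1)  # w0 ends up very close to 0 and w1 very close to 1
```

If alpha is too large the updates overshoot and the weights diverge, and if it is too small the loop needs many more iterations, so in practice the learning rate is tuned.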


On the plus side, overfitting can be handled well using dimensionality reduction techniques, regularization, and cross-validation.

It performs well on data with a linear relationship between the variables.


On the downside, it is sensitive to outliers.

It is prone to multicollinearity.

It assumes a linear relationship between the dependent and independent variables.

  • For the example above, if we have to predict for x = 5, our best model will output 5. But what if the actual output is not 5 but something else?
  • Then our model is overfitted to the training data. To reduce overfitting, we can use regularization methods like Ridge, Lasso, and Elastic Net, which add a penalty on the weights. In the upcoming blogs I'll write about these regularization methods.
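To give a small taste of what regularization does, here is a sketch of Ridge on the same four points with w0 fixed at 0. It assumes the Ridge objective sum((w1 * x - y)^2) + lam * w1^2, whose minimum has the closed form w1 = sum(x * y) / (sum(x * x) + lam). The function name `ridge_w1` and the penalty strength lam = 3.0 are hypothetical choices.

```python
xs = [1, 2, 3, 4]
ys = [1, 2, 3, 4]

def ridge_w1(xs, ys, lam):
    """Closed-form w1 minimising sum((w1*x - y)^2) + lam * w1^2, with w0 = 0."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# lam = 0.0 is plain least squares; lam = 3.0 is an assumed penalty strength.
for lam in (0.0, 3.0):
    w1 = ridge_w1(xs, ys, lam)
    print(lam, w1, w1 * 5)  # penalty, fitted slope, prediction at x = 5
```

With lam = 0 the slope is exactly 1 and the prediction at x = 5 is 5. With a positive lam the slope shrinks below 1, so the model no longer follows the training points perfectly.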


Thanks for Reading This Blog :)