Regularization is a technique for tuning a model by adding a penalty term to the error function. The penalty keeps the fitted function from fluctuating excessively by preventing the coefficients from taking extreme values.
In simple words, it adds a penalty so that our model does not overfit the training data.
In a linear regression example, suppose our input x is 5 and our best-fit line predicts 5, but the actual output is 4. Then the error is 1, which is large. A line that chases the training data too closely tends to make errors like this on new data, so we use regularization to keep the line simpler and reduce the loss on data the model has not seen.
Cost function = Loss + Regularization term
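To make "Loss + Regularization term" concrete, here is a minimal NumPy sketch that computes both parts for the toy example above. It assumes the loss is the sum of squared errors and ignores the intercept; `sse`, `regularized_cost`, and `lam` (standing in for λ) are hypothetical names chosen for illustration.

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of squared errors: the 'Loss' part of the cost."""
    return float(np.sum((y_true - y_pred) ** 2))

def regularized_cost(X, y, w, lam, penalty="l2"):
    """Cost = Loss + lam * penalty(w); lam plays the role of lambda. No intercept (assumption)."""
    loss = sse(y, X @ w)
    if penalty == "l2":
        reg = np.sum(w ** 2)        # ridge: squared magnitude of the coefficients
    elif penalty == "l1":
        reg = np.sum(np.abs(w))     # lasso: absolute magnitude of the coefficients
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return loss + lam * float(reg)

# Toy example: x = 5, the line predicts 5 (slope w = 1), the actual output is 4
X = np.array([[5.0]])
y = np.array([4.0])
w = np.array([1.0])

print(sse(y, X @ w))                       # 1.0
print(regularized_cost(X, y, w, lam=0.1))  # 1.1
```

The same loss can end up with different costs depending on how large the coefficients are, which is exactly what lets the penalty steer the fit toward smaller coefficients.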
L2 regularization or Ridge regularization or regression
Ridge regression is a model tuning method that is especially useful for data that suffers from multicollinearity. It adds the “squared magnitude” of the coefficients as a penalty term to the loss function.
Multicollinearity exists whenever an independent variable is highly correlated with one or more of the other independent variables in a multiple regression equation.
In simple words, multicollinearity happens when two or more independent features are highly correlated with each other. To deal with it, we can drop some of those features (if 5 independent features are multicollinear, we can drop 4 of them and keep one) or use regularization to solve the problem.
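A quick way to spot pairwise multicollinearity is to look at the correlation matrix of the features. This is only an illustrative sketch: the feature names `size`, `rooms`, and `age`, the synthetic data, the helper `highly_correlated_pairs`, and the 0.9 cutoff are all made up for this example. Pairwise correlation also misses collinearity that involves three or more features at once; variance inflation factors are a more thorough check.

```python
import numpy as np

def highly_correlated_pairs(X, names, threshold=0.9):
    """Return (name_i, name_j, r) for every feature pair with |Pearson r| > threshold."""
    corr = np.corrcoef(X, rowvar=False)   # each column of X is one feature
    pairs = []
    n_features = X.shape[1]
    for i in range(n_features):
        for j in range(i + 1, n_features):  # upper triangle only, skip the diagonal
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

rng = np.random.default_rng(0)
size = rng.normal(size=200)
rooms = 2 * size + rng.normal(scale=0.05, size=200)  # almost a rescaled copy of size
age = rng.normal(size=200)                           # unrelated feature
X = np.column_stack([size, rooms, age])

print(highly_correlated_pairs(X, ["size", "rooms", "age"]))
```

Only the `size`/`rooms` pair should be flagged here; dropping one of the two or switching to a regularized model are the two remedies described above.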
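In the ridge cost function, Cost = Σ(yᵢ − ŷᵢ)² + λ Σ βⱼ², λ is the tuning parameter that decides how much we want to penalize the flexibility of our model. With λ = 0 we get ordinary least squares; as λ grows, the coefficients are pushed closer to zero.
Ridge regression decreases the complexity of a model but does not reduce the number of variables: it shrinks the coefficients toward zero but, in practice, never sets any of them exactly to zero. Hence, it is not a good choice for feature selection.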
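To see ridge shrink coefficients without zeroing them, here is a minimal NumPy sketch based on the closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy, which minimizes Σ(yᵢ − ŷᵢ)² + λ Σ wⱼ² (w plays the role of β). It assumes centered data with no intercept; `ridge_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge: solve (X^T X + lam * I) w = X^T y. No intercept (assumes centered data)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)   # solving is more stable than forming the inverse

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
true_w = np.array([3.0, 0.0, -2.0])      # the middle feature is irrelevant
y = X @ true_w + rng.normal(scale=0.5, size=100)

for lam in [0.0, 10.0, 100.0, 1000.0]:
    w = ridge_fit(X, y, lam)
    print(lam, np.round(w, 3), "norm:", round(float(np.linalg.norm(w)), 3))
```

As λ increases, the overall size of the coefficient vector keeps shrinking, but even the irrelevant middle coefficient stays small rather than becoming exactly zero. With λ = 0 the result matches ordinary least squares.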
L1 regularization or Lasso Regularization or regression
Lasso regression stands for Least Absolute Shrinkage and Selection Operator.
L1 regularization adds the sum of the absolute values of the coefficients as a penalty term to the loss function: Cost = Σ(yᵢ − ŷᵢ)² + λ Σ |βⱼ|.
As a coefficient moves away from 0, this penalty grows, which pushes the model to keep the coefficients small in order to reduce the total cost.
The difference between ridge and lasso regression is that lasso tends to drive some coefficients exactly to zero, whereas ridge never sets a coefficient exactly to zero.
This is why lasso is used for feature selection.
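Lasso can produce exact zeros because minimizing the L1 cost one coefficient at a time leads to a soft-thresholding step: any coefficient whose pull from the data is weaker than the penalty is set exactly to 0. Below is a minimal coordinate-descent sketch for Σ(yᵢ − ŷᵢ)² + λ Σ |wⱼ|, assuming no intercept and a fixed number of sweeps instead of a convergence test. `lasso_fit`, `soft_threshold`, and the synthetic data are hypothetical, and this is not how any particular library implements lasso.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward 0 by t; anything with |z| <= t becomes exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, lam, n_sweeps=200):
    """Minimize sum((y - X w)**2) + lam * sum(|w|) by cyclic coordinate descent (no intercept)."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    col_sq = np.sum(X ** 2, axis=0)          # X_j^T X_j for each column j
    for _ in range(n_sweeps):
        for j in range(n_features):
            # partial residual: the error left after removing feature j's current contribution
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j
            # threshold is lam / 2 because the squared-error term has no 1/2 factor
            w[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return w

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, 0.0, -2.0, 0.0, 0.0])   # only features 0 and 2 matter
y = X @ true_w + rng.normal(scale=0.5, size=100)

w_lasso = lasso_fit(X, y, lam=50.0)
w_ridge = np.linalg.solve(X.T @ X + 50.0 * np.eye(5), X.T @ y)  # ridge with the same lambda
print("lasso:", np.round(w_lasso, 3))
print("ridge:", np.round(w_ridge, 3))
```

With λ = 50 the three irrelevant lasso coefficients should come out exactly 0, while ridge with the same λ keeps all five non-zero. Note that libraries such as scikit-learn's `Lasso` scale the squared-error term by 1/(2n), so their `alpha` is not directly comparable to λ here.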
Lasso struggles with some types of data. If the number of predictors (p) is greater than the number of observations (n), lasso selects at most n predictors as non-zero, even if all of the predictors are relevant (or may be useful on the test set).
If there are two or more highly collinear variables, lasso tends to select one of them more or less arbitrarily, which is not good for interpreting the data.
There is another regularization technique called Elastic Net that I was trying to understand, but laziness won and I fell asleep (😅😅).
There are other regularization techniques, such as dropout, data augmentation, and early stopping, that are used in neural networks, so I will talk about them later.
Thank you for reading this blog :)