In statistics, the logistic model is used to model the probability of a certain class or event existing, such as pass/fail, win/lose, alive/dead, or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc.
Logistic Regression grew out of Linear Regression: it is a linear model (because the logit of the estimated probability is a linear function of the parameters), but it is used for classification.
Meaning of its name?
"Logistic" comes from the logit function, which is used in this algorithm.
"Regression" is there because its underlying technique is much the same as in Linear "Regression".
Relation between Linear Regression and Logistic Regression
Let's take a classification problem, try to find the best-fit line with Linear Regression, and see where the problem lies and how Logistic Regression addresses it.
In the example, we have classification data whose labels lie at 0 and 1.
If we draw the line y = 0.5 and predict with Linear Regression, it predicts the output values fairly well. The problem arises when an outlier enters the picture: the best-fit line shifts slightly and some of the output values are predicted wrong.
This is where Logistic Regression comes in: it fits the data with the sigmoid function, an "S"-shaped curve.
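The comparison above can be sketched on a toy dataset (the data below is made up for illustration): a far-off outlier drags the linear best-fit line down and flips a prediction at the 0.5 threshold, while the sigmoid fit is unaffected.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy 1-D binary data: small x -> class 0, large x -> class 1,
# plus one outlier far to the right (x = 30).
X = np.array([[1], [2], [3], [4], [6], [7], [8], [9], [30]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

# Linear regression fits a straight line; threshold its output at 0.5.
lin = LinearRegression().fit(X, y)
lin_pred = (lin.predict(X) >= 0.5).astype(int)

# Logistic regression fits an S-shaped sigmoid instead.
log = LogisticRegression().fit(X, y)
log_pred = log.predict(X)

print("linear  :", lin_pred)
print("logistic:", log_pred)
```

Thresholding the straight line misclassifies the point at x = 6 because the outlier has flattened the fit; the sigmoid curve classifies every point correctly.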
What is the Sigmoid Function?
In order to map predicted values to probabilities, we use the sigmoid function: it maps any real value to a value between 0 and 1.
f(z) = 1 / (1 + e^(-z))
z = w0 + w1*x = ln(P / (1 - P)), which is the logit function.
f(x) = 1 / (1 + e^(-(w0 + w1*x)))
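The two formulas above are inverses of each other, which a few lines of Python can confirm numerically:

```python
import math

def sigmoid(z):
    """Map any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of probability p: the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

print(sigmoid(0))           # 0.5
print(logit(sigmoid(2.0)))  # recovers 2.0 (up to floating-point error)
```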
For Linear Regression we learned that gradient descent reaches the global minimum, because the mean-squared-error cost is convex. If we try the same squared-error cost with Logistic Regression, we can end up at a local minimum, because the cost becomes a non-convex function.
For Logistic Regression we therefore use the log loss, which compresses two functions, -ln(f(x)) when y = 1 and -ln(1 - f(x)) when y = 0, into one convex cost:
Cost = -[y*ln(f(x)) + (1 - y)*ln(1 - f(x))]
After taking its derivative, the gradient descent update has the same form as in Linear Regression.
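As a sketch (on made-up 1-D data), gradient descent on the log loss looks just like the linear-regression update: the gradient is the prediction-minus-target error times the input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D binary data, assumed for illustration.
X = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])

w0, w1 = 0.0, 0.0   # parameters
lr = 0.1            # learning rate

for _ in range(5000):
    p = sigmoid(w0 + w1 * X)   # predicted probabilities
    err = p - y                # same (prediction - target) form as linear regression
    w0 -= lr * err.mean()
    w1 -= lr * (err * X).mean()

preds = (sigmoid(w0 + w1 * X) >= 0.5).astype(int)
print(preds)
```

Because the log loss is convex, this plain gradient descent converges to the global minimum and classifies the separable toy data correctly.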
Advantages
Easy to implement and interpret, and very efficient to train.
Good accuracy on many simple datasets; it performs well when the dataset is linearly separable.
Its model coefficients can be interpreted as indicators of feature importance.
Disadvantages
It constructs only linear decision boundaries.
It doesn't handle a large number of categorical features/variables well.
Complex relationships are tough to capture with Logistic Regression.
Hyperparameter Tuning
- Solver: the optimization algorithm, analogous to gradient descent in linear regression.
solver = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'], default='lbfgs'
For a small dataset, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
For multiclass problems, only 'newton-cg', 'sag', 'saga', and 'lbfgs' handle the multinomial loss; 'liblinear' is limited to one-versus-rest schemes.
- Penalty: ['l1', 'l2', 'elasticnet', None], default='l2'
'newton-cg', 'sag', and 'lbfgs' support only the 'l2' penalty (or none).
Only 'saga' supports the 'elasticnet' penalty; 'l1' is supported by 'liblinear' and 'saga'.
- C: float, default=1.0
Inverse of regularization strength; must be a positive float.
As in support vector machines, smaller values specify stronger regularization.
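These hyperparameters can be tuned together with a grid search; a minimal sketch using scikit-learn on a synthetic dataset (the grid values are just example choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary-classification data for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Only solver-compatible combinations: 'liblinear' (good for small data)
# supports both 'l1' and 'l2' penalties.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "solver": ["liblinear"],
    "penalty": ["l1", "l2"],
}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

Mixing incompatible pairs (e.g. solver='lbfgs' with penalty='l1') raises an error, which is why the grid is restricted to combinations the solver supports.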
Thank you for reading this blog :)