The Logistic Regression Cost Function: A Deep Dive
Logistic regression is a popular classification algorithm, often used when we need to predict a binary outcome — for example, whether an email is spam (1) or not spam (0).
The goal of logistic regression is to find model parameters (weights) that produce probabilities close to the true labels. But how do we measure how "good" our parameters are? That’s where the cost function comes in.
Why Not Use Mean Squared Error (MSE)?
If we tried to use the Mean Squared Error from linear regression for classification, we’d run into trouble:
- Logistic regression uses the sigmoid function to map a linear score to a probability.
- The sigmoid is non-linear, and plugging it into MSE would produce a non-convex cost surface for logistic regression.
- A non-convex cost surface means gradient descent could get stuck in local minima, making optimization harder.
We need a cost function that is convex so gradient descent works reliably.
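Before going further, it helps to see the sigmoid concretely. A minimal sketch (the function name `sigmoid` is ours, not from any particular library):

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # 0.5: a score of 0 gives a 50/50 prediction
print(sigmoid(4.0))   # close to 1
print(sigmoid(-4.0))  # close to 0
```

Large positive scores push the probability toward 1, large negative scores toward 0, which is exactly what a binary classifier needs.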
From Likelihood to Cost
Instead of minimizing MSE, logistic regression uses Maximum Likelihood Estimation (MLE).
Our model outputs the probability:

$$h_\theta(x) = P(y = 1 \mid x; \theta) = \sigma(\theta^T x)$$

where $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ is the sigmoid function.
For a given training example $(x^{(i)}, y^{(i)})$, where $y^{(i)} \in \{0, 1\}$:

- If $y^{(i)} = 1$, we want $h_\theta(x^{(i)})$ to be close to 1.
- If $y^{(i)} = 0$, we want $h_\theta(x^{(i)})$ to be close to 0.
The probability of the correct class can be written compactly as:

$$P(y^{(i)} \mid x^{(i)}; \theta) = h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$

This works because:

- If $y^{(i)} = 1$, the term becomes $h_\theta(x^{(i)})$.
- If $y^{(i)} = 0$, the term becomes $1 - h_\theta(x^{(i)})$.
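The exponent trick is easy to verify in code. A small sketch (the helper name `prob_of_true_class` is ours):

```python
def prob_of_true_class(h, y):
    """h^y * (1-h)^(1-y): selects h when y == 1 and (1 - h) when y == 0."""
    return (h ** y) * ((1 - h) ** (1 - y))

h = 0.9  # model says: 90% chance the label is 1
print(prob_of_true_class(h, 1))  # 0.9 — probability assigned to a true label of 1
print(prob_of_true_class(h, 0))  # ~0.1 — probability assigned to a true label of 0
```

Raising to the power of 0 or 1 simply switches between the two cases, so one formula covers both labels.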
The Likelihood Function
For $m$ independent training examples, the likelihood is the product of the per-example probabilities:

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$
Maximizing this likelihood means finding parameters that make the data most probable.
Log-Likelihood
Multiplying many small probabilities can cause numerical underflow. To make computation easier, we take the logarithm, which turns the product into a sum:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
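The underflow problem is real, not hypothetical. A quick demonstration (the variable names are ours):

```python
import math

# 50 examples, each assigned a small probability by the model
probs = [1e-8] * 50

# Naive product: 1e-400 is below the smallest double, so it underflows to 0.0
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0

# Sum of logs stays perfectly representable
log_sum = sum(math.log(p) for p in probs)
print(log_sum)  # 50 * log(1e-8) ≈ -921.03
```

The product is lost entirely, while the log-likelihood keeps full precision — which is why every practical implementation works in log space.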
From Maximization to Minimization
Optimization algorithms like gradient descent usually minimize rather than maximize. So we take the negative of the log-likelihood, conventionally averaged over the $m$ examples (dividing by $m$ does not change the minimizer):

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
This is the logistic regression cost function — also called binary cross-entropy loss.
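A direct translation of the cost into code might look like this sketch (the `eps` clipping is our addition to avoid `log(0)` at exact 0/1 predictions; function name is ours):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average negative log-likelihood over m examples."""
    m = len(y_true)
    total = 0.0
    for y, h in zip(y_true, y_pred):
        h = min(max(h, eps), 1 - eps)  # keep h strictly inside (0, 1)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / m

print(binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8]))  # low cost: confident and correct
print(binary_cross_entropy([1, 0, 1], [0.1, 0.9, 0.2]))  # high cost: confident and wrong
```

Confident correct predictions yield a cost near zero; confident wrong predictions are penalized heavily.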
Why It Works
- The cost is 0 when predictions are perfect ($h_\theta(x) = 1$ for $y = 1$, $h_\theta(x) = 0$ for $y = 0$).
- The cost increases sharply as predictions move away from the truth.
- It’s convex, so gradient descent won’t get stuck in local minima.
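The "increases sharply" point is visible by evaluating the cost of a single example whose true label is 1 (helper name `cost_one` is ours):

```python
import math

def cost_one(y, h):
    """Per-example cost: -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -math.log(h) if y == 1 else -math.log(1 - h)

# True label is 1: cost falls toward 0 as h -> 1 and blows up as h -> 0
for h in (0.99, 0.9, 0.5, 0.1, 0.01):
    print(f"h={h:.2f}  cost={cost_one(1, h):.3f}")
```

The penalty grows without bound as the predicted probability approaches the wrong extreme, so the model is punished far more for a confident mistake than for an uncertain one.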
Key Takeaway:
The logistic regression cost function isn’t arbitrary — it comes directly from the principles of probability and maximum likelihood estimation, and it’s designed to ensure a convex optimization problem.
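To tie everything together, here is a minimal end-to-end sketch: batch gradient descent on the cost $J(\theta)$, using the standard gradient $\frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})\,x^{(i)}$. The data, hyperparameters, and function names are illustrative, not from any particular library:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the binary cross-entropy cost.

    Gradient of J(theta): (1/m) * sum_i (h_i - y_i) * x_i
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for xi, yi in zip(X, y):
            h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
            for j in range(n):
                grad[j] += (h - yi) * xi[j]
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta

# Toy data: first feature is a constant bias term; class 1 when x > 2
X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
y = [0, 0, 0, 1, 1, 1]
theta = train(X, y)
preds = [1 if sigmoid(sum(t * x for t, x in zip(theta, xi))) > 0.5 else 0 for xi in X]
print(preds)  # should recover the training labels
```

Because the cost surface is convex, this simple loop converges from any starting point — no random restarts or careful initialization needed.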