The Sigmoid Function: The Math That Powers Logistic Regression
When you first learn logistic regression, everyone says:
“We use the sigmoid function to map numbers to probabilities.”
But if you’re like most people, your next question is:
“Cool… but where does that magical S-shape even come from?”
To understand it fully, we need to take a short trip through logarithms, odds, and log-odds before finally arriving at the sigmoid.
Step 1: What is a Log?
A logarithm is the inverse of exponentiation.
If:

$$b^y = x$$

then:

$$\log_b(x) = y$$

In other words:

- A log answers the question: "To what power must I raise the base $b$ to get $x$?"
- For example, $2^3 = 8$, so $\log_2(8) = 3$.

In logistic regression, we specifically use the natural logarithm, denoted as:

$$\ln(x) = \log_e(x)$$

This means the base is $e$ (≈ 2.71828), the mathematical constant for continuous growth.

Example: $\ln(e) = 1$, because $e^1 = e$, and $\ln(1) = 0$, because $e^0 = 1$.
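A quick way to see these facts is a few lines of Python with the standard `math` module (a minimal sanity check, nothing specific to logistic regression yet):

```python
import math

# log_2(8) answers: "2 to what power gives 8?"
print(math.log2(8))       # 3.0

# The natural log (ln) uses base e ≈ 2.71828
print(math.e)             # 2.718281828459045
print(math.log(math.e))   # 1.0, because e^1 = e
print(math.log(1))        # 0.0, because e^0 = 1

# exp and log are inverses of each other
z = 1.5
print(math.log(math.exp(z)))  # ≈ 1.5
```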
Step 2: From Probability to Odds
Probability ($P$) is straightforward:

$$P = \frac{\text{number of times the event happens}}{\text{total number of trials}}$$

It’s the fraction of times an event happens out of all trials.

Example: A coin toss

- Probability of heads = 0.5

But statisticians often use odds:

$$\text{odds} = \frac{P}{1 - P}$$

This compares the probability of success to the probability of failure.

Example: If $P = 0.8$:

$$\text{odds} = \frac{0.8}{1 - 0.8} = \frac{0.8}{0.2} = 4$$

Meaning "4 to 1" odds: success is four times more likely than failure.
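As a small sketch, the probability-to-odds conversion is a one-liner; the `odds` helper below is just a name chosen for this post, not a standard library function:

```python
def odds(p: float) -> float:
    """Convert a probability p (with 0 < p < 1) into odds = p / (1 - p)."""
    return p / (1 - p)

print(odds(0.5))  # 1.0   -> "1 to 1": success and failure equally likely
print(odds(0.8))  # ≈ 4.0 -> "4 to 1": success four times more likely than failure
print(odds(0.2))  # 0.25  -> failure four times more likely than success
```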
Step 3: Odds to Log-Odds (Logit)
Now we take the log of the odds. This gives us the logit:

$$\text{logit}(P) = \ln\left(\frac{P}{1 - P}\right)$$

Why bother?

- Odds are non-negative (0 to ∞).
- Log-odds can take any real number (-∞ to +∞).
- This makes them perfect for linking probabilities to linear models.
Example:

If $P = 0.8$: $\text{logit}(0.8) = \ln(4) \approx 1.386$

If $P = 0.2$: $\text{logit}(0.2) = \ln(0.25) \approx -1.386$

Notice how the sign changes: a higher probability gives a positive logit, a lower probability gives a negative logit.
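Continuing the sketch, taking the natural log of the odds gives the log-odds; the hypothetical `logit` helper below reproduces the two example values:

```python
import math

def logit(p: float) -> float:
    """Log-odds of p: ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

print(logit(0.8))  # ≈ +1.386 (high probability -> positive log-odds)
print(logit(0.2))  # ≈ -1.386 (low probability  -> negative log-odds)
print(logit(0.5))  # 0.0      (even odds sit exactly at zero)
```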
Step 4: Logistic Regression’s Assumption
Logistic regression assumes a linear relationship between the log-odds and the inputs:

$$\ln\left(\frac{P}{1 - P}\right) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$$

Where:

- $w_1, \dots, w_n$ are weights
- $x_1, \dots, x_n$ are features
- $b$ is the bias
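A minimal sketch of what that linear part looks like in code; the weights, feature values, and bias here are made-up numbers purely for illustration:

```python
# Hypothetical fitted parameters for a model with three features
weights = [0.7, -1.2, 0.3]   # w_1, w_2, w_3
bias = 0.5                   # b

# One input example: feature values x_1, x_2, x_3
x = [2.0, 1.0, 4.0]

# The linear combination that logistic regression treats as the log-odds
z = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
print(z)  # ≈ 1.9, i.e. 0.7*2.0 - 1.2*1.0 + 0.3*4.0 + 0.5
```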
Step 5: Deriving the Sigmoid Function
Now let’s solve for $P$. For brevity, write $z = w_1 x_1 + \dots + w_n x_n + b$:

- Start with:
  $$\ln\left(\frac{P}{1 - P}\right) = z$$
- Exponentiate both sides (to remove the log):
  $$\frac{P}{1 - P} = e^z$$
- Multiply both sides by $(1 - P)$:
  $$P = e^z (1 - P)$$
- Expand:
  $$P = e^z - P e^z$$
- Add $P e^z$ to both sides:
  $$P + P e^z = e^z$$
- Factor out $P$:
  $$P (1 + e^z) = e^z$$
- Divide both sides by $(1 + e^z)$:
  $$P = \frac{e^z}{1 + e^z}$$
- Multiply numerator and denominator by $e^{-z}$:
  $$P = \frac{1}{1 + e^{-z}}$$

This is the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
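Turning the derivation into code: a minimal `sigmoid` implementation, plus a check that it undoes the `logit` from Step 3 and that both algebraic forms from the derivation agree (helper names are just for this sketch):

```python
import math

def sigmoid(z: float) -> float:
    """sigma(z) = 1 / (1 + e^(-z)): maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Inverse of the sigmoid: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

z = 1.9
print(sigmoid(z))                        # ≈ 0.8699
print(math.exp(z) / (1 + math.exp(z)))   # same value, the e^z / (1 + e^z) form

# sigmoid and logit undo each other
print(sigmoid(logit(0.8)))  # ≈ 0.8
```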
Step 6: Why the Sigmoid is Perfect for Logistic Regression
- Probability output: Always between 0 and 1.
- Smooth gradient: Great for optimization via gradient descent.
- Natural origin: Comes directly from transforming log-odds.
- Interpretability: Each weight shifts the log-odds linearly.
Step 7: Intuition & Shape
The curve:

- Looks like an S
- At $z = 0$, the output is 0.5
- For large positive $z$, the output approaches 1
- For very negative $z$, the output approaches 0
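A quick numeric sweep (reusing the `sigmoid` sketch from Step 5) makes the S-shape visible without plotting anything:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for z in [-6, -3, -1, 0, 1, 3, 6]:
    print(f"z = {z:+d}  ->  sigmoid(z) = {sigmoid(z):.4f}")

# Sample output (abridged):
# z = -6  ->  sigmoid(z) = 0.0025   (very negative z: output stays near 0)
# z = +0  ->  sigmoid(z) = 0.5000   (the midpoint of the S)
# z = +6  ->  sigmoid(z) = 0.9975   (large positive z: output approaches 1)
```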
Key Takeaway
The sigmoid function is not just a mathematical trick; it’s the direct result of connecting probabilities, odds, and logs in a way that allows a simple linear equation to model complex classification tasks.