The Sigmoid Function: The Math That Powers Logistic Regression
When you first learn logistic regression, everyone says:
“We use the sigmoid function to map numbers to probabilities.”
But if you’re like most people, your next question is:
“Cool… but where does that magical S-shape even come from?”
To understand it fully, we need to take a short trip through logarithms, odds, and log-odds before finally arriving at the sigmoid.
Step 1: What is a Log?
A logarithm is the inverse of exponentiation.
If:

$$b^y = x$$

then:

$$\log_b(x) = y$$

In other words:

- A log answers the question: "To what power must I raise the base $b$ to get $x$?"
- For example, $2^3 = 8$, so $\log_2(8) = 3$.

In logistic regression, we specifically use the natural logarithm, denoted as:

$$\ln(x) = \log_e(x)$$

This means the base is $e$ (≈ 2.71828), the mathematical constant for continuous growth.

Example: $\ln(e) = 1$, because $e^1 = e$, and $\ln(1) = 0$, because $e^0 = 1$.
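A quick way to see these facts is a few lines of Python with the standard `math` module (a minimal sanity check, nothing specific to logistic regression yet):

```python
import math

# log_2(8) answers: "2 to what power gives 8?"
print(math.log2(8))       # 3.0

# The natural log (ln) uses base e ≈ 2.71828
print(math.e)             # 2.718281828459045
print(math.log(math.e))   # 1.0, because e^1 = e
print(math.log(1))        # 0.0, because e^0 = 1

# exp and log are inverses of each other
z = 1.5
print(math.log(math.exp(z)))  # ≈ 1.5
```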
Step 2: From Probability to Odds
Probability ($P$) is straightforward:

$$P = \frac{\text{number of times the event happens}}{\text{total number of trials}}$$

It’s the fraction of times an event happens out of all trials.

Example: A coin toss

- Probability of heads = 0.5

But statisticians often use odds:

$$\text{odds} = \frac{P}{1 - P}$$

This compares the probability of success to the probability of failure.

Example: If $P = 0.8$:

$$\text{odds} = \frac{0.8}{1 - 0.8} = \frac{0.8}{0.2} = 4$$

Meaning "4 to 1" odds: success is four times more likely than failure.
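As a small sketch, the probability-to-odds conversion is a one-liner; the `odds` helper below is just a name chosen for this post, not a standard library function:

```python
def odds(p: float) -> float:
    """Convert a probability p (with 0 < p < 1) into odds = p / (1 - p)."""
    return p / (1 - p)

print(odds(0.5))  # 1.0   -> "1 to 1": success and failure equally likely
print(odds(0.8))  # ≈ 4.0 -> "4 to 1": success four times more likely than failure
print(odds(0.2))  # 0.25  -> failure four times more likely than success
```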
Step 3: Odds to Log-Odds (Logit)
Now we take the log of the odds. This gives us the logit:

$$\text{logit}(P) = \ln\left(\frac{P}{1 - P}\right)$$

Why bother?

- Odds are non-negative (0 to ∞).
- Log-odds can take any real number (-∞ to +∞).
- This makes them perfect for linking probabilities to linear models.
Example:

If $P = 0.8$: $\text{logit}(0.8) = \ln(4) \approx 1.386$

If $P = 0.2$: $\text{logit}(0.2) = \ln(0.25) \approx -1.386$

Notice how the sign changes: a higher probability gives a positive logit, a lower probability gives a negative logit.
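Continuing the sketch, taking the natural log of the odds gives the log-odds; the hypothetical `logit` helper below reproduces the two example values:

```python
import math

def logit(p: float) -> float:
    """Log-odds of p: ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

print(logit(0.8))  # ≈ +1.386 (high probability -> positive log-odds)
print(logit(0.2))  # ≈ -1.386 (low probability  -> negative log-odds)
print(logit(0.5))  # 0.0      (even odds sit exactly at zero)
```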
Step 4: Logistic Regression’s Assumption
Logistic regression assumes a linear relationship between the log-odds and the inputs:

$$\ln\left(\frac{P}{1 - P}\right) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$$

Where:

- $w_1, \dots, w_n$ are weights
- $x_1, \dots, x_n$ are features
- $b$ is the bias
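A minimal sketch of what that linear part looks like in code; the weights, feature values, and bias here are made-up numbers purely for illustration:

```python
# Hypothetical fitted parameters for a model with three features
weights = [0.7, -1.2, 0.3]   # w_1, w_2, w_3
bias = 0.5                   # b

# One input example: feature values x_1, x_2, x_3
x = [2.0, 1.0, 4.0]

# The linear combination that logistic regression treats as the log-odds
z = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
print(z)  # ≈ 1.9, i.e. 0.7*2.0 - 1.2*1.0 + 0.3*4.0 + 0.5
```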
Step 5: Deriving the Sigmoid Function
Now let’s solve for $P$. For brevity, write $z = w_1 x_1 + \dots + w_n x_n + b$:

- Start with:
  $$\ln\left(\frac{P}{1 - P}\right) = z$$
- Exponentiate both sides (to remove the log):
  $$\frac{P}{1 - P} = e^z$$
- Multiply both sides by $(1 - P)$:
  $$P = e^z (1 - P)$$
- Expand:
  $$P = e^z - P e^z$$
- Add $P e^z$ to both sides:
  $$P + P e^z = e^z$$
- Factor out $P$:
  $$P (1 + e^z) = e^z$$
- Divide both sides by $(1 + e^z)$:
  $$P = \frac{e^z}{1 + e^z}$$
- Multiply numerator and denominator by $e^{-z}$:
  $$P = \frac{1}{1 + e^{-z}}$$

This is the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
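Turning the derivation into code: a minimal `sigmoid` implementation, plus a check that it undoes the `logit` from Step 3 and that both algebraic forms from the derivation agree (helper names are just for this sketch):

```python
import math

def sigmoid(z: float) -> float:
    """sigma(z) = 1 / (1 + e^(-z)): maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Inverse of the sigmoid: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

z = 1.9
print(sigmoid(z))                        # ≈ 0.8699
print(math.exp(z) / (1 + math.exp(z)))   # same value, the e^z / (1 + e^z) form

# sigmoid and logit undo each other
print(sigmoid(logit(0.8)))  # ≈ 0.8
```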
Step 6: Why the Sigmoid is Perfect for Logistic Regression
- Probability output: Always between 0 and 1.
- Smooth gradient: Great for optimization via gradient descent.
- Natural origin: Comes directly from transforming log-odds.
- Interpretability: Each weight shifts the log-odds linearly.
Step 7: Intuition & Shape
The curve:

- Looks like an S
- At $z = 0$, the output is 0.5
- For large positive $z$, the output approaches 1
- For very negative $z$, the output approaches 0
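A quick numeric sweep (reusing the `sigmoid` sketch from Step 5) makes the S-shape visible without plotting anything:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for z in [-6, -3, -1, 0, 1, 3, 6]:
    print(f"z = {z:+d}  ->  sigmoid(z) = {sigmoid(z):.4f}")

# Sample output (abridged):
# z = -6  ->  sigmoid(z) = 0.0025   (very negative z: output stays near 0)
# z = +0  ->  sigmoid(z) = 0.5000   (the midpoint of the S)
# z = +6  ->  sigmoid(z) = 0.9975   (large positive z: output approaches 1)
```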
Key Takeaway
The sigmoid function is not just a mathematical trick; it’s the direct result of connecting probabilities, odds, and logs in a way that allows a simple linear equation to model complex classification tasks.