Neural Networks: Forward Propagation

Neural Networks: Model Representation (From Andrew Ng’s ML Course)

Let’s decode one of the most iconic diagrams from Andrew Ng’s Machine Learning course on Coursera — the foundational model representation of a feedforward neural network.




The Network Architecture

At its core, a neural network is just a collection of connected “neurons” organized into layers:

  1. Input Layer (Layer 1)

    • Inputs: x₁, x₂ (e.g., features like size and age of a house)

    • Special input: x₀ = 1 → this is the bias unit (acts like the intercept in a linear model)

  2. Hidden Layer (Layer 2)

    • Neurons: a₁⁽²⁾, a₂⁽²⁾ — these are activations, computed using the inputs and weights.

    • Another bias unit: a₀⁽²⁾ = 1

  3. Output Layer (Layer 3)

    • Final prediction: h_θ(x) — also known as the hypothesis

Every neuron in one layer is connected to every neuron in the next. These connections have weights (denoted Θ) that determine how strongly each input influences the output.


Forward Propagation

The data flows forward through the network in the following steps:




Step 1: From Inputs to Hidden Layer

Each hidden neuron computes a weighted sum (z) of the inputs:

z₁⁽²⁾ = Θ₁₀x₀ + Θ₁₁x₁ + Θ₁₂x₂
z₂⁽²⁾ = Θ₂₀x₀ + Θ₂₁x₁ + Θ₂₂x₂

These z values are passed through an activation function g(z) (commonly sigmoid or ReLU):

a₁⁽²⁾ = g(z₁⁽²⁾),  a₂⁽²⁾ = g(z₂⁽²⁾)

This gives us the activations for the hidden layer.
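These two computations can be sketched in a few lines of NumPy. The weight values here are made-up placeholders, not numbers from the course:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs; x0 = 1 is the bias unit
x0, x1, x2 = 1.0, 0.5, -1.2

# Hypothetical weight matrix Θ^(1): row i holds the weights into hidden neuron i
Theta1 = np.array([[0.1, 0.4, -0.3],   # Θ10, Θ11, Θ12
                   [0.2, -0.5, 0.6]])  # Θ20, Θ21, Θ22

# Weighted sums z for each hidden neuron
z1 = Theta1[0, 0]*x0 + Theta1[0, 1]*x1 + Theta1[0, 2]*x2
z2 = Theta1[1, 0]*x0 + Theta1[1, 1]*x1 + Theta1[1, 2]*x2

# Pass through the activation function to get the hidden activations
a1, a2 = sigmoid(z1), sigmoid(z2)
```

Each activation is a number between 0 and 1 (a property of the sigmoid), which becomes an input to the next layer.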

Step 2: From Hidden Layer to Output Layer

Now, the output neuron takes the hidden activations as inputs:

h_θ(x) = g(Θ₁₀⁽²⁾a₀⁽²⁾ + Θ₁₁⁽²⁾a₁⁽²⁾ + Θ₁₂⁽²⁾a₂⁽²⁾)

Again, a weighted sum followed by the activation function gives us the final prediction.
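This output step looks exactly like the hidden-layer step, just with activations in place of raw inputs. A minimal sketch, with hypothetical activation and weight values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hidden-layer activations; a0 = 1 is the bias unit
a0, a1, a2 = 1.0, 0.66, 0.32

# Hypothetical output weights Θ10^(2), Θ11^(2), Θ12^(2)
T10, T11, T12 = -0.4, 0.7, 0.9

# h_θ(x) = g(Θ10 a0 + Θ11 a1 + Θ12 a2): weighted sum, then activation
h = sigmoid(T10*a0 + T11*a1 + T12*a2)
```

With a sigmoid output, h can be read as an estimated probability for the positive class.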


The Power of Matrix Notation

To make things scalable (especially for deeper networks), we switch to matrix form:

z⁽ʲ⁾ = Θ⁽ʲ⁻¹⁾ a⁽ʲ⁻¹⁾

This computes all neurons in layer j simultaneously by matrix-multiplying the weight matrix with the activations from the previous layer.

Then apply the activation function:

a⁽ʲ⁾ = g(z⁽ʲ⁾)

This makes forward propagation fast and efficient.
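The whole forward pass then collapses to a matrix multiply per layer. A sketch of the vectorized version, again with made-up weights (Θ⁽¹⁾ is 2×3, Θ⁽²⁾ is 1×3 for the network in the diagram):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    # Layer 1 -> 2: prepend the bias unit x0 = 1, then one matrix multiply
    a1 = np.concatenate(([1.0], x))   # a^(1) = [1, x1, x2]
    z2 = Theta1 @ a1                  # z^(2) = Θ^(1) a^(1)
    a2 = sigmoid(z2)                  # a^(2) = g(z^(2))

    # Layer 2 -> 3: prepend the bias unit a0^(2) = 1
    a2 = np.concatenate(([1.0], a2))
    z3 = Theta2 @ a2                  # z^(3) = Θ^(2) a^(2)
    return sigmoid(z3)                # h_θ(x)

# Hypothetical learned weights
Theta1 = np.array([[0.1, 0.4, -0.3],
                   [0.2, -0.5, 0.6]])
Theta2 = np.array([[-0.4, 0.7, 0.9]])

h = forward_propagate(np.array([0.5, -1.2]), Theta1, Theta2)
```

Note that each matrix product computes every neuron in the layer at once, which is why deeper networks stay fast on vectorized hardware.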


Why Bias Units Matter

You might’ve noticed the special units x₀, a₀⁽²⁾, etc. These are bias units—neurons that always output 1. They help the network learn an intercept, improving flexibility and accuracy.
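A tiny illustration of why the intercept matters (weight values here are arbitrary): without a bias weight, a sigmoid neuron is forced to output exactly 0.5 whenever its input is zero; the bias lets it shift that threshold.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = 0.0  # evaluate both neurons at the origin

# No bias: output at x = 0 is pinned to g(0) = 0.5
no_bias = sigmoid(2.0 * x)

# With a bias weight of -3.0 (via the always-1 bias unit),
# the neuron can push its output at x = 0 well below 0.5
with_bias = sigmoid(-3.0 * 1.0 + 2.0 * x)
```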


Summary: What You’re Seeing in the Diagram

  • It’s a 3-layer feedforward neural network

  • Input layer → Hidden layer → Output layer

  • Each connection has a weight (Θ) that the model learns during training

  • Neurons calculate:
    weighted sum → activation function → output

  • This is called forward propagation
