Neural Networks: Forward Propagation

Neural Networks: Model Representation (From Andrew Ng’s ML Course)

Let’s decode one of the most iconic diagrams from Andrew Ng’s Machine Learning course on Coursera — the foundational model representation of a feedforward neural network.




The Network Architecture

At its core, a neural network is just a collection of connected “neurons” organized into layers:

  1. Input Layer (Layer 1)

    • Inputs: x₁, x₂ (e.g., features like size and age of a house)

    • Special input: x₀ = 1 → this is the bias unit (acts like the intercept in a linear model)

  2. Hidden Layer (Layer 2)

    • Neurons: a₁⁽²⁾, a₂⁽²⁾ — these are activations, computed using the inputs and weights.

    • Another bias unit: a₀⁽²⁾ = 1

  3. Output Layer (Layer 3)

    • Final prediction: h_θ(x) — also known as the hypothesis

Every neuron in one layer is connected to every neuron in the next. These connections have weights (denoted Θ) that determine how strongly each input influences the output.


Forward Propagation

The data flows forward through the network in the following steps:




Step 1: From Inputs to Hidden Layer

Each hidden neuron computes a weighted sum (z) of the inputs:

z₁⁽²⁾ = Θ₁₀x₀ + Θ₁₁x₁ + Θ₁₂x₂
z₂⁽²⁾ = Θ₂₀x₀ + Θ₂₁x₁ + Θ₂₂x₂

These z values are passed through an activation function g(z) (commonly sigmoid or ReLU):

a₁⁽²⁾ = g(z₁⁽²⁾),  a₂⁽²⁾ = g(z₂⁽²⁾)

This gives us the activations for the hidden layer.
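These two computations can be sketched in a few lines of NumPy. The weight values here are made-up placeholders, not numbers from the course:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs; x0 = 1 is the bias unit
x0, x1, x2 = 1.0, 0.5, -1.2

# Hypothetical weight matrix Θ^(1): row i holds the weights into hidden neuron i
Theta1 = np.array([[0.1, 0.4, -0.3],   # Θ10, Θ11, Θ12
                   [0.2, -0.5, 0.6]])  # Θ20, Θ21, Θ22

# Weighted sums z for each hidden neuron
z1 = Theta1[0, 0]*x0 + Theta1[0, 1]*x1 + Theta1[0, 2]*x2
z2 = Theta1[1, 0]*x0 + Theta1[1, 1]*x1 + Theta1[1, 2]*x2

# Pass through the activation function to get the hidden activations
a1, a2 = sigmoid(z1), sigmoid(z2)
```

Each activation is a number between 0 and 1 (a property of the sigmoid), which becomes an input to the next layer.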

Step 2: From Hidden Layer to Output Layer

Now, the output neuron takes the hidden activations as inputs:

h_θ(x) = g(Θ₁₀⁽²⁾a₀⁽²⁾ + Θ₁₁⁽²⁾a₁⁽²⁾ + Θ₁₂⁽²⁾a₂⁽²⁾)

Again, a weighted sum followed by the activation function gives us the final prediction.
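This output step looks exactly like the hidden-layer step, just with activations in place of raw inputs. A minimal sketch, with hypothetical activation and weight values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hidden-layer activations; a0 = 1 is the bias unit
a0, a1, a2 = 1.0, 0.66, 0.32

# Hypothetical output weights Θ10^(2), Θ11^(2), Θ12^(2)
T10, T11, T12 = -0.4, 0.7, 0.9

# h_θ(x) = g(Θ10 a0 + Θ11 a1 + Θ12 a2): weighted sum, then activation
h = sigmoid(T10*a0 + T11*a1 + T12*a2)
```

With a sigmoid output, h can be read as an estimated probability for the positive class.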


The Power of Matrix Notation

To make things scalable (especially for deeper networks), we switch to matrix form:

z⁽ʲ⁾ = Θ⁽ʲ⁻¹⁾ a⁽ʲ⁻¹⁾

This computes all neurons in layer j simultaneously by matrix-multiplying the weight matrix with the activations from the previous layer.

Then apply the activation function:

a⁽ʲ⁾ = g(z⁽ʲ⁾)

This makes forward propagation fast and efficient.
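The whole forward pass then collapses to a matrix multiply per layer. A sketch of the vectorized version, again with made-up weights (Θ⁽¹⁾ is 2×3, Θ⁽²⁾ is 1×3 for the network in the diagram):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    # Layer 1 -> 2: prepend the bias unit x0 = 1, then one matrix multiply
    a1 = np.concatenate(([1.0], x))   # a^(1) = [1, x1, x2]
    z2 = Theta1 @ a1                  # z^(2) = Θ^(1) a^(1)
    a2 = sigmoid(z2)                  # a^(2) = g(z^(2))

    # Layer 2 -> 3: prepend the bias unit a0^(2) = 1
    a2 = np.concatenate(([1.0], a2))
    z3 = Theta2 @ a2                  # z^(3) = Θ^(2) a^(2)
    return sigmoid(z3)                # h_θ(x)

# Hypothetical learned weights
Theta1 = np.array([[0.1, 0.4, -0.3],
                   [0.2, -0.5, 0.6]])
Theta2 = np.array([[-0.4, 0.7, 0.9]])

h = forward_propagate(np.array([0.5, -1.2]), Theta1, Theta2)
```

Note that each matrix product computes every neuron in the layer at once, which is why deeper networks stay fast on vectorized hardware.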


Why Bias Units Matter

You might’ve noticed the special units x₀, a₀⁽²⁾, etc. These are bias units—neurons that always output 1. They help the network learn an intercept, improving flexibility and accuracy.
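A tiny illustration of why the intercept matters (weight values here are arbitrary): without a bias weight, a sigmoid neuron is forced to output exactly 0.5 whenever its input is zero; the bias lets it shift that threshold.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = 0.0  # evaluate both neurons at the origin

# No bias: output at x = 0 is pinned to g(0) = 0.5
no_bias = sigmoid(2.0 * x)

# With a bias weight of -3.0 (via the always-1 bias unit),
# the neuron can push its output at x = 0 well below 0.5
with_bias = sigmoid(-3.0 * 1.0 + 2.0 * x)
```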


Summary: What You’re Seeing in the Diagram

  • It’s a 3-layer feedforward neural network

  • Input layer → Hidden layer → Output layer

  • Each connection has a weight (Θ) that the model learns during training

  • Neurons calculate:
    weighted sum → activation function → output

  • This is called forward propagation
