Neural Networks: Model Representation (From Andrew Ng’s ML Course)
Let’s decode one of the most iconic diagrams from Andrew Ng’s Machine Learning course on Coursera — the foundational model representation of a feedforward neural network.
The Network Architecture
At its core, a neural network is just a collection of connected “neurons” organized into layers:
- Input Layer (Layer 1)
  - Inputs: x₁, x₂ (e.g., features like the size and age of a house)
  - Special input: x₀ = 1 → this is the bias unit (it acts like the intercept in a linear model)
- Hidden Layer (Layer 2)
  - Neurons: a₁⁽²⁾, a₂⁽²⁾ — these are activations, computed from the inputs and the weights
  - Another bias unit: a₀⁽²⁾ = 1
- Output Layer (Layer 3)
  - Final prediction: h_Θ(x) — also known as the hypothesis
Every neuron in one layer is connected to every neuron in the next. These connections have weights (denoted Θ) that determine how strongly each input influences the output.
Forward Propagation
The data flows forward through the network in the following steps:
Step 1: From Inputs to Hidden Layer
Each hidden neuron computes a weighted sum (z) of the inputs:

z₁⁽²⁾ = Θ₁₀⁽¹⁾x₀ + Θ₁₁⁽¹⁾x₁ + Θ₁₂⁽¹⁾x₂
z₂⁽²⁾ = Θ₂₀⁽¹⁾x₀ + Θ₂₁⁽¹⁾x₁ + Θ₂₂⁽¹⁾x₂

These z values are passed through an activation function g(z) (commonly sigmoid or ReLU):

a₁⁽²⁾ = g(z₁⁽²⁾), a₂⁽²⁾ = g(z₂⁽²⁾)

This gives us the activations for the hidden layer.
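To make Step 1 concrete, here is a minimal Python sketch of the two hidden-unit computations. The weight and input values are made up purely for illustration; in practice the weights are learned during training:

```python
import math

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Inputs, including the bias unit x0 = 1 (feature values are illustrative)
x0, x1, x2 = 1.0, 0.5, -1.2

# Placeholder weights from layer 1 into each hidden neuron
t10, t11, t12 = 0.1, 0.3, -0.5    # weights into a1(2)
t20, t21, t22 = 0.2, -0.4, 0.6    # weights into a2(2)

# Weighted sum z for each hidden neuron, then the activation g(z)
z1 = t10 * x0 + t11 * x1 + t12 * x2
z2 = t20 * x0 + t21 * x1 + t22 * x2
a1, a2 = sigmoid(z1), sigmoid(z2)
```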
Step 2: From Hidden Layer to Output Layer
Now, the output neuron takes the hidden activations as inputs:

z₁⁽³⁾ = Θ₁₀⁽²⁾a₀⁽²⁾ + Θ₁₁⁽²⁾a₁⁽²⁾ + Θ₁₂⁽²⁾a₂⁽²⁾

Again, a weighted sum followed by the activation function gives us the final prediction:

h_Θ(x) = a₁⁽³⁾ = g(z₁⁽³⁾)
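Continuing the sketch from Step 1 (the hidden activations are hard-coded here so the snippet runs on its own; all numbers remain illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hidden-layer activations from Step 1, plus the bias unit a0(2) = 1
a0, a1, a2 = 1.0, 0.70, 0.35      # illustrative values

# Placeholder weights from layer 2 into the single output neuron
t10, t11, t12 = -0.3, 0.7, 0.2

# Same pattern as Step 1: weighted sum, then the activation function
z = t10 * a0 + t11 * a1 + t12 * a2
h = sigmoid(z)                     # h_Theta(x), the final prediction
```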
The Power of Matrix Notation
To make things scalable (especially for deeper networks), we switch to matrix form:

z⁽ʲ⁾ = Θ⁽ʲ⁻¹⁾ a⁽ʲ⁻¹⁾

This computes all the neurons in layer j simultaneously by matrix-multiplying the weight matrix with the activations from the previous layer. Then apply the activation function:

a⁽ʲ⁾ = g(z⁽ʲ⁾)

This makes forward propagation fast and efficient.
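Here is a vectorized sketch of the same 2-hidden-unit network using NumPy. The weight values are placeholders; the shapes follow the convention that each Θ matrix has one row per neuron in the next layer and one column per input (including the bias):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2])                    # raw features x1, x2

# Theta1: (2 hidden units) x (bias + 2 inputs)
# Theta2: (1 output unit) x (bias + 2 hidden units)
Theta1 = np.array([[0.1, 0.3, -0.5],
                   [0.2, -0.4, 0.6]])
Theta2 = np.array([[-0.3, 0.7, 0.2]])

a1 = np.concatenate(([1.0], x))              # prepend the bias unit x0 = 1
z2 = Theta1 @ a1                             # all hidden z values in one multiply
a2 = np.concatenate(([1.0], sigmoid(z2)))    # activations, with bias a0(2) = 1
z3 = Theta2 @ a2
h = sigmoid(z3)                              # h_Theta(x)
```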
Why Bias Units Matter
You might’ve noticed the special units x₀, a₀⁽²⁾, etc. These are bias units: neurons that always output 1. They help the network learn an intercept, improving flexibility and accuracy.
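One way to see why: without a bias term, a sigmoid neuron is forced to output g(0) = 0.5 whenever its weighted inputs sum to zero; the bias weight shifts the curve, just as the intercept shifts a line. A tiny sketch with arbitrary weights:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 2.0, -3.0                   # arbitrary input weight and bias weight
x = 0.0
print(sigmoid(w * x))              # no bias: stuck at 0.5 when x = 0
print(sigmoid(w * x + b))          # bias shifts the output (here, ~0.047)
```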
Summary: What You’re Seeing in the Diagram
- It’s a 3-layer feedforward neural network
- Input layer → Hidden layer → Output layer
- Each connection has a weight (Θ) that the model learns during training
- Neurons calculate: weighted sum → activation function → output
- This is called forward propagation